biondizzle/vllm - vllm

Author	SHA1	Message	Date
Cyrus Leung	ceca060501	[Deprecation] Deprecate `seed=None` (#29185) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 18:19:25 +00:00
Charlie Fu	75648b16dd	[ROCm][CI] Fix config/test_config_generation.py (#29142) Signed-off-by: charlifu <charlifu@amd.com>	2025-11-21 17:12:16 +00:00
Chendi.Xue	460d02a417	[NIXL] Fix after virtual block_size for host_buffer with heter kv_layout (#29122) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-11-21 08:55:27 -08:00
Mingyuan Ma	b4c8fbaae2	Add TRTLLM MoE NVFP4 kernel to CompressedTensorsW4A4MoeMethod (#28892) Signed-off-by: mingyuanm <mingyuanm@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-21 09:54:11 -07:00
rasmith	e99e467384	[CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lighting_attn.py (#29132) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-21 11:53:09 -05:00
Wentao Ye	a42ab317ac	[Log] Optimize startup log (#28948) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-21 08:46:20 -08:00
Aleksandr Malyshev	b7f1f490a6	Upstream triton fp4 weight preshuffle (#28888) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-11-21 11:34:46 -05:00
Woosuk Kwon	30b44a1598	GPU Model Runner V2 (#25266) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-21 08:20:55 -08:00
Wentao Ye	1f400c58b8	[CI] Add batch invariant test to ci (#27842) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-21 09:20:33 -07:00
rasmith	711241c13c	[CI/Build] Fix illegal memory access and unsupported test in kernels/attention/test_cache.py (#29118) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-21 10:58:38 -05:00
Cyrus Leung	d7219bcda3	[Misc] Move dynamic seed initialization to `EngineArgs` (#29165) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 15:27:44 +00:00
wangxiyuan	4050bae417	[Doc] Update plugin doc (#28532) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-21 14:57:26 +00:00
skaraban3807	f1805db1a6	[Perf] These changes enhance the NUMA functionality of vllm for systems with more than one NUMA nodes per socket (#25559) Signed-off-by: Siddappa Karabannavar <siddappa.karabannavar@amd.com>	2025-11-21 14:13:52 +00:00
Julien Denize	434f3d3eb8	Fix mistral config (#29172) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-11-21 14:01:20 +00:00
sfbemerk	2092ce8c39	Tool Call Parser logs should not contain user input / model output except on DEBUG (#29160) Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com> Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-21 20:57:19 +08:00
who who who	fc9f821d20	fix cross attention (#28346) Signed-off-by: fsx950223 <fsx950223@outlook.com>	2025-11-21 04:55:43 -08:00
Cyrus Leung	9452863088	Revert "Revert #28875 (#29159)" (#29179) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 04:27:43 -08:00
Bhagyashri	2b1b3dfa4b	Update Dockerfile to use gcc-toolset-14 and fix test case failures on power (ppc64le) (#28957) Signed-off-by: Bhagyashri <Bhagyashri.Gaikwad2@ibm.com>	2025-11-21 12:24:09 +00:00
Russell Bryant	cca2d2cdbe	[Core] Align whisper closer to other multimodal models (#27292) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-11-21 12:01:54 +00:00
Cyrus Leung	aab0102a26	[V0 deprecation] Remove more V0 references (#29088) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:56:59 +00:00
WeiQing Chen	b34129bf8e	[Misc] remove useless v1 env (#29164) Signed-off-by: David Chen <530634352@qq.com>	2025-11-21 01:41:20 -08:00
Cyrus Leung	4d7231e774	Revert #28875 (#29159)	2025-11-21 01:40:17 -08:00
Huamin Li	8ac3a41487	[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers (#29111) Signed-off-by: Huamin Li <3ericli@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 23:53:30 -08:00
Canlin Guo	7d6da483b0	[Minor][Clean] Remove the legacy assertion in video (#29150) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-20 23:52:34 -08:00
Chenheli Hua	e4c3182c68	[Small] Capture AttributeError when checking ray dependency. (#29024) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-11-20 22:54:10 -08:00
Alex Brooks	b4734b9550	[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-11-21 13:32:30 +08:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771)" (#29121) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Matthew Bonanni	11857a00b0	[Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry (#29103) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-20 20:24:43 -08:00
Boyuan Feng	8c25f9cfb6	[BugFix] skip combo kernel on cpu (#29129) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-21 11:50:59 +08:00
Cyrus Leung	56e96b37e4	[V0 Deprecation] Remove `best_of` (#29090) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:40:40 +08:00
Qidong Su	698024ecce	[Doc] update installation guide regarding aarch64+cuda pytorch build (#28875) Signed-off-by: Qidong Su <soodoshll@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-20 19:40:25 -08:00
jeremyteboul	0730414999	[Core] Add audio_embeds support to chat completions (#29059) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2025-11-21 11:39:47 +08:00
zhrrr	a982f5b5ea	[kernel][perf] support uncontiguous input for rms_norm kernel (#28103) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-20 19:39:09 -08:00
Cyrus Leung	0e741c12e3	[Bugfix] Fix Plamo3 rope handling (#29092) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-21 11:38:35 +08:00
Wentao Ye	56669c1f29	[CI] Fix mypy for `vllm/v1/worker` (#29037) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-21 11:36:07 +08:00
Hongxia Yang	3f5f36da3f	[ROCm] Fix for import when building with upstream triton for gfx1100 for gpt-oss serving (#29127) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2025-11-21 03:30:07 +00:00
Wentao Ye	e1eefa4c40	[Bug] Fix torch warning of tf32 usage (#29112) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-21 01:54:59 +00:00
Xiao Li	ed6ae1e36a	[AITER] [ROCm] Fix crash when loading llama4 model with old aiter version installed, fallback to forward_native implementation (#29124) Signed-off-by: Xiao Li <ilx@meta.com>	2025-11-20 17:54:35 -08:00
Jee Jee Li	9875be6431	[LoRA][2/2]Remove LoRA extra vocab (#28545) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-21 09:46:43 +08:00
Wentao Ye	df44df0143	[Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement (#28879) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-20 18:41:49 -07:00
Michael Goin	87cbbdff63	Update model references for OLMo3 (#29099) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-21 09:16:52 +08:00
Michael Goin	986ab5db63	[CI Bugfix] Fix Kernels DeepGEMM Test (H100) (#29106) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-20 16:42:33 -08:00
Rob Mulla	dd39f91edb	[Doc] cleanup TPU documentation and remove outdated examples (#29048) Signed-off-by: Rob Mulla <rob.mulla@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-21 00:05:59 +00:00
rasmith	c7a29d2c8d	[CI/Build] Remove skip global cleanup in test_struct_output_generate.py (#29022) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 21:44:37 +00:00
rasmith	8237ab8a2b	[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now (#29021) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 21:35:14 +00:00
Driss Guessous	3fd74189db	Fixes bench (#29058) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-11-20 21:21:54 +00:00
rasmith	5e5a7eb16f	[CI/Build] Make test_attention_selector.py run tests on correct platform (#29064) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-20 20:45:56 +00:00
rasmith	3d84ef9054	[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py (#29043) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 20:39:49 +00:00
Software Developer	4d01b64284	[Bugfix] - Add Trace Headers to Beam Search Path (#29100) Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>	2025-11-20 20:00:33 +00:00
Kevin H. Luu	114b0e2500	[chore] Update annotate release scripts (#29077) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-20 10:22:40 -08:00

14386 Commits