biondizzle/vllm - vllm

Author	SHA1	Message	Date
wang.yuqi	4ae77dfd42	[Frontend][1/n] Make pooling entrypoints request schema consensus \| CompletionRequest (#32395 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-16 06:17:04 +00:00
XiongfeiWei	73f635a75f	[Bug] Add TPU backend option (#32438 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2026-01-16 05:17:12 +00:00
cjackal	35bf5d08e8	[bugfix] Fix online serving crash when text type response_format is received (#26822 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>	2026-01-16 12:23:54 +08:00
Kebe	5de6dd0662	[Bugfix] [DeepSeek-V3.2] fix sparse_attn_indexer padding (#32175 ) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-16 03:21:55 +00:00
ltd0924	709502558c	[Model] Add Step3vl 10b (#32329 ) Signed-off-by: luotingdan <luotingdan@stepfun.com> Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: luotingdan <luotingdan@stepfun.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-15 19:04:16 -08:00
Micah Williamson	46f8a982b1	[ROCm][CI] Enable AITER Unified Attention On ROCm For gpt-oss Test (#32431 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-16 00:55:57 +00:00
Matthew Bonanni	bcf2333cd6	[CI] Fix LM Eval Large Models (H100) (#32423 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-16 00:52:49 +00:00
Michael Goin	83239ff19a	Add thread_n=64 support to Marlin MoE (#32360 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-15 16:45:44 -08:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257 ) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
Wentao Ye	aca5c51487	[Refactor] Remove unused file (#32422 )	2026-01-15 15:59:38 -07:00
Yongye Zhu	31c29257c8	[MoE Refactor][17/N] Apply Refactor to Bf16 (#31827 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-15 12:53:40 -08:00
Aleksandr Malyshev	8c11001ba2	[ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2026-01-15 14:13:08 -06:00
Richard Zou	bd292be0c0	[BugFix] Python file source reading can fail on UnicodeDecodeError (#32416 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-15 20:01:41 +00:00
TJian	41c544f78a	[ROCm] [CI] [Release] Rocm wheel pipeline with sccache (#32264 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-01-16 02:56:18 +08:00
Michael Goin	1be5a73571	[UX] Use kv_offloading_backend=native by default (#32421 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-15 18:55:11 +00:00
Lucas Wilkinson	c36ba69bda	[BugFix] Fix `assert x_s.shape[-1] == x_q.shape[-1] // group_shape[1]` in Blackwell Quantized MoE Test (#32362 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-15 10:19:12 -08:00
Matthias Gehre	047413375c	[Attention][AMD] Make flash-attn optional (#30361 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-01-15 17:18:24 +00:00
smit kadvani	74e4bb1c5a	fixing podman build issue (#32131 ) Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com> Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-01-15 11:07:08 -06:00
Wentao Ye	b34474bf2c	[Feature] Support async scheduling + PP (#32359 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-15 12:06:23 -05:00
Woosuk Kwon	6218034dd7	[Model Runner V2] Support FlashInfer backend & Fix CUDA Graph bug [1/2] (#32348 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-15 08:59:23 -08:00
Pleaplusone	77c16df31d	[ROCm][Bugfix] Disable hip sampler to fix deepseek's accuracy issue on ROCm (#32413 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-15 16:35:47 +00:00
Pleaplusone	130d6c9514	[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for `AiterFlashAttentionBackend` (#29887 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-15 15:29:53 +00:00
Dipika Sikka	361dfdc9d8	[Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-15 07:25:55 -08:00
Matthew Bonanni	8ebfacaa75	[Attention][MLA] Make `FLASHINFER_MLA` the default MLA backend on Blackwell, and TRTLLM the default prefill (#32339 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-15 09:49:57 -05:00
brian033	b89275d018	[ROCm] Improve error handling while loading quantized model on gfx120… (#31715 ) Signed-off-by: brian033 <85883730+brian033@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-01-15 04:16:00 -08:00
Cyrus Leung	28459785ff	[3/N] Group together media-related code (#32406 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 11:52:12 +00:00
rasmith	8853a50af2	[CI][BugFix][AMD][FP8] Fix test_rms_norm so it runs correctly on ROCm (#32372 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2026-01-15 19:05:54 +08:00
Douglas Lehr	c5891b5430	[ROCM] Add ROCm image build to release pipeline (#31995 ) Signed-off-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Doug Lehr <douglehr@amd.com>	2026-01-15 19:01:40 +08:00
Chauncey	707b44cc28	[Refactor] [11/N] to simplify the mcp architecture (#32396 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 18:49:31 +08:00
rongfu.leng	3a4e10c847	[Benchmark] [Feature] add vllm bench sweep startup command (#32337 ) Signed-off-by: lengrongfu <lenronfu@gmail.com>	2026-01-15 09:25:46 +00:00
Cyrus Leung	cbbae38f93	[2/N] Move cache factories to MM registry (#32382 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 01:02:30 -08:00
Cyrus Leung	cdba4c74b3	[Model] Avoid token selection in SigLIP pooling head (#32389 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 17:01:59 +08:00
seeksky	a52d1396a7	fix: avoid crash on zero-arg tool calls in glm4 parser (#32321 ) Signed-off-by: seekskyworld <djh1813553759@gmail.com>	2026-01-15 08:45:59 +00:00
dtc	1e584823f8	[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-01-15 16:31:16 +08:00
Chauncey	4c1c501a7e	[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 07:41:34 +00:00
Andreas Karatzas	ae1eba6a9a	[ROCm][CI] Pin transformers 4.57.3 to fix jina test failures (#32350 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-15 15:19:34 +08:00
Ofir Zafrir	e9ec2a72d8	[Bugfix] Fix stale `common_attn_metadata.max_seq_len` in speculative decoding with Eagle (#32312 ) Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com>	2026-01-15 06:39:37 +00:00
Lucas Wilkinson	2c9b4cf5bf	[BugFix] Fix DeepSeek-V3.1 + DeepGEMM incompatible scale shapes (#32361 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>	2026-01-15 06:32:22 +00:00
Ning Xie	9d7ae3fcdb	[code clean] remove duplicate check (#32376 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-15 05:29:34 +00:00
rasmith	3c2685645e	[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201 ) Signed-off-by: Randall Smith <ransmith@amd.com>	2026-01-15 05:04:34 +00:00
Micah Williamson	773d7073ae	[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-15 04:53:43 +00:00
kzwrime	edadca109c	[Bugfix] Add CpuCommunicator.dispatch and combine to fix DP+MoE inference (#31867 ) Signed-off-by: kunzh <zhikun.wu@outlook.com>	2026-01-15 04:50:48 +00:00
Li Wang	d86fc23bdd	[Misc] Remove redundant line (#32366 ) Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-15 04:29:56 +00:00
Shiyan Deng	375e5984fe	Support configure skip_special_tokens in openai response api (#32345 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com>	2026-01-15 04:07:26 +00:00
baonudesifeizhai	19b251fe3d	Fix optional parameter parsing in MiniMax M2 tool parser #32278 (#32342 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2026-01-15 04:05:48 +00:00
Ryan Rock	15422ed3f7	[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-15 04:01:42 +00:00
dolpm	8471b27df9	[compile] raise on compile_size implicit padding (#32343 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-14 20:46:56 +00:00
Lumosis	66652e8082	[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>	2026-01-14 20:10:01 +00:00
vllmellm	e27078ea80	[Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten (#32336 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-14 19:32:48 +00:00
Aleksandr Samarin	d084e9fca7	[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding (#26291 ) Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com> Signed-off-by: southfreebird <yvorott@gmail.com> Co-authored-by: southfreebird <yvorott@gmail.com>	2026-01-14 13:20:52 -05:00

14386 Commits