Aleksandr Malyshev
|
8c11001ba2
|
[ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2026-01-15 14:13:08 -06:00 |
|
Richard Zou
|
bd292be0c0
|
[BugFix] Python file source reading can fail on UnicodeDecodeError (#32416)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-01-15 20:01:41 +00:00 |
|
TJian
|
41c544f78a
|
[ROCm] [CI] [Release] Rocm wheel pipeline with sccache (#32264)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-16 02:56:18 +08:00 |
|
Michael Goin
|
1be5a73571
|
[UX] Use kv_offloading_backend=native by default (#32421)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-15 18:55:11 +00:00 |
|
Lucas Wilkinson
|
c36ba69bda
|
[BugFix] Fix assert x_s.shape[-1] == x_q.shape[-1] // group_shape[1] in Blackwell Quantized MoE Test (#32362)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-15 10:19:12 -08:00 |
|
Matthias Gehre
|
047413375c
|
[Attention][AMD] Make flash-attn optional (#30361)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-01-15 17:18:24 +00:00 |
|
smit kadvani
|
74e4bb1c5a
|
fixing podman build issue (#32131)
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com>
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-01-15 11:07:08 -06:00 |
|
Wentao Ye
|
b34474bf2c
|
[Feature] Support async scheduling + PP (#32359)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-15 12:06:23 -05:00 |
|
Woosuk Kwon
|
6218034dd7
|
[Model Runner V2] Support FlashInfer backend & Fix CUDA Graph bug [1/2] (#32348)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-15 08:59:23 -08:00 |
|
Pleaplusone
|
77c16df31d
|
[ROCm][Bugfix] Disable hip sampler to fix deepseek's accuracy issue on ROCm (#32413)
Signed-off-by: ganyi <ygan@amd.com>
|
2026-01-15 16:35:47 +00:00 |
|
Pleaplusone
|
130d6c9514
|
[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for AiterFlashAttentionBackend (#29887)
Signed-off-by: ganyi <ygan@amd.com>
|
2026-01-15 15:29:53 +00:00 |
|
Dipika Sikka
|
361dfdc9d8
|
[Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-15 07:25:55 -08:00 |
|
Matthew Bonanni
|
8ebfacaa75
|
[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32339)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-15 09:49:57 -05:00 |
|
brian033
|
b89275d018
|
[ROCm] Improve error handling while loading quantized model on gfx120… (#31715)
Signed-off-by: brian033 <85883730+brian033@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-01-15 04:16:00 -08:00 |
|
Cyrus Leung
|
28459785ff
|
[3/N] Group together media-related code (#32406)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-15 11:52:12 +00:00 |
|
rasmith
|
8853a50af2
|
[CI][BugFix][AMD][FP8] Fix test_rms_norm so it runs correctly on ROCm (#32372)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2026-01-15 19:05:54 +08:00 |
|
Douglas Lehr
|
c5891b5430
|
[ROCM] Add ROCm image build to release pipeline (#31995)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
|
2026-01-15 19:01:40 +08:00 |
|
Chauncey
|
707b44cc28
|
[Refactor] [11/N] to simplify the mcp architecture (#32396)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 18:49:31 +08:00 |
|
rongfu.leng
|
3a4e10c847
|
[Benchmark] [Feature] add vllm bench sweep startup command (#32337)
Signed-off-by: lengrongfu <lenronfu@gmail.com>
|
2026-01-15 09:25:46 +00:00 |
|
Cyrus Leung
|
cbbae38f93
|
[2/N] Move cache factories to MM registry (#32382)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-15 01:02:30 -08:00 |
|
Cyrus Leung
|
cdba4c74b3
|
[Model] Avoid token selection in SigLIP pooling head (#32389)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-15 17:01:59 +08:00 |
|
seeksky
|
a52d1396a7
|
fix: avoid crash on zero-arg tool calls in glm4 parser (#32321)
Signed-off-by: seekskyworld <djh1813553759@gmail.com>
|
2026-01-15 08:45:59 +00:00 |
|
dtc
|
1e584823f8
|
[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
|
2026-01-15 16:31:16 +08:00 |
|
Chauncey
|
4c1c501a7e
|
[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 07:41:34 +00:00 |
|
Andreas Karatzas
|
ae1eba6a9a
|
[ROCm][CI] Pin transformers 4.57.3 to fix jina test failures (#32350)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-15 15:19:34 +08:00 |
|
Ofir Zafrir
|
e9ec2a72d8
|
[Bugfix] Fix stale common_attn_metadata.max_seq_len in speculative decoding with Eagle (#32312)
Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com>
|
2026-01-15 06:39:37 +00:00 |
|
Lucas Wilkinson
|
2c9b4cf5bf
|
[BugFix] Fix DeepSeek-V3.1 + DeepGEMM incompatible scale shapes (#32361)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>
|
2026-01-15 06:32:22 +00:00 |
|
Ning Xie
|
9d7ae3fcdb
|
[code clean] remove duplicate check (#32376)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-15 05:29:34 +00:00 |
|
rasmith
|
3c2685645e
|
[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201)
Signed-off-by: Randall Smith <ransmith@amd.com>
|
2026-01-15 05:04:34 +00:00 |
|
Micah Williamson
|
773d7073ae
|
[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-15 04:53:43 +00:00 |
|
kzwrime
|
edadca109c
|
[Bugfix] Add CpuCommunicator.dispatch and combine to fix DP+MoE inference (#31867)
Signed-off-by: kunzh <zhikun.wu@outlook.com>
|
2026-01-15 04:50:48 +00:00 |
|
Li Wang
|
d86fc23bdd
|
[Misc] Remove redundant line (#32366)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2026-01-15 04:29:56 +00:00 |
|
Shiyan Deng
|
375e5984fe
|
Support configure skip_special_tokens in openai response api (#32345)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
|
2026-01-15 04:07:26 +00:00 |
|
baonudesifeizhai
|
19b251fe3d
|
Fix optional parameter parsing in MiniMax M2 tool parser #32278 (#32342)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
|
2026-01-15 04:05:48 +00:00 |
|
Ryan Rock
|
15422ed3f7
|
[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-01-15 04:01:42 +00:00 |
|
dolpm
|
8471b27df9
|
[compile] raise on compile_size implicit padding (#32343)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-14 20:46:56 +00:00 |
|
Lumosis
|
66652e8082
|
[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
|
2026-01-14 20:10:01 +00:00 |
|
vllmellm
|
e27078ea80
|
[Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten (#32336)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-01-14 19:32:48 +00:00 |
|
Aleksandr Samarin
|
d084e9fca7
|
[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding (#26291)
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
Signed-off-by: southfreebird <yvorott@gmail.com>
Co-authored-by: southfreebird <yvorott@gmail.com>
|
2026-01-14 13:20:52 -05:00 |
|
qli88
|
3a612322eb
|
[CI] Move rixl/ucx from Dockerfile.rocm_base to Dockerfile.rocm (#32295)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
|
2026-01-14 16:53:36 +00:00 |
|
Cyrus Leung
|
9ea07b41da
|
[1/N] Reorganize multimodal processing code (#32327)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 15:25:31 +00:00 |
|
Ning Xie
|
552b262936
|
rename tokenize serving api request id prefix to tokenize (#32328)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-14 14:52:20 +00:00 |
|
Chauncey
|
00e6402d56
|
[Frontend] track responsesAPI server_load (#32323)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-14 12:00:37 +00:00 |
|
Shanshan Shen
|
ce0946249d
|
[Misc] Make mem utils can be reused by other platforms (#32322)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-01-14 03:46:01 -08:00 |
|
Cyrus Leung
|
3f28174c6a
|
[Frontend] Standardize use of create_error_response (#32319)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 11:22:26 +00:00 |
|
Chauncey
|
769d0629e1
|
[Refactor] [9/N] to simplify the vLLM openai translations serving architecture (#32313)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-14 10:20:58 +00:00 |
|
Cyrus Leung
|
90db5b31e4
|
[Refactor] Move top-level dummy data generation to registry (#32310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 02:17:46 -08:00 |
|
Roger Wang
|
b8199f6049
|
[Model] Re-implement Qwen3Omni Audio Encoder (#32167)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-01-14 15:40:30 +08:00 |
|
sangho.lee
|
7e6f123810
|
Add Molmo2 multimodal model support (#30997)
Signed-off-by: sanghol <sanghol@allenai.org>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-14 15:33:09 +08:00 |
|
Chauncey
|
9312a6c03a
|
[Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture (#32260)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-14 07:26:24 +00:00 |
|