ltd0924
|
709502558c
|
[Model] Add Step3vl 10b (#32329)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-01-15 19:04:16 -08:00 |
|
Micah Williamson
|
46f8a982b1
|
[ROCm][CI] Enable AITER Unified Attention On ROCm For gpt-oss Test (#32431)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-16 00:55:57 +00:00 |
|
TomerBN-Nvidia
|
c277fbdf31
|
[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257)
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>
|
2026-01-15 16:15:05 -08:00 |
|
Yongye Zhu
|
31c29257c8
|
[MoE Refactor][17/N] Apply Refactor to Bf16 (#31827)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-01-15 12:53:40 -08:00 |
|
Michael Goin
|
1be5a73571
|
[UX] Use kv_offloading_backend=native by default (#32421)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-15 18:55:11 +00:00 |
|
Wentao Ye
|
b34474bf2c
|
[Feature] Support async scheduling + PP (#32359)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-15 12:06:23 -05:00 |
|
Dipika Sikka
|
361dfdc9d8
|
[Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-15 07:25:55 -08:00 |
|
Cyrus Leung
|
28459785ff
|
[3/N] Group together media-related code (#32406)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-15 11:52:12 +00:00 |
|
rasmith
|
8853a50af2
|
[CI][BugFix][AMD][FP8] Fix test_rms_norm so it runs correctly on ROCm (#32372)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2026-01-15 19:05:54 +08:00 |
|
Chauncey
|
707b44cc28
|
[Refactor] [11/N] to simplify the mcp architecture (#32396)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 18:49:31 +08:00 |
|
Cyrus Leung
|
cbbae38f93
|
[2/N] Move cache factories to MM registry (#32382)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-15 01:02:30 -08:00 |
|
dtc
|
1e584823f8
|
[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
|
2026-01-15 16:31:16 +08:00 |
|
Chauncey
|
4c1c501a7e
|
[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 07:41:34 +00:00 |
|
rasmith
|
3c2685645e
|
[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201)
Signed-off-by: Randall Smith <ransmith@amd.com>
|
2026-01-15 05:04:34 +00:00 |
|
Micah Williamson
|
773d7073ae
|
[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-15 04:53:43 +00:00 |
|
Ryan Rock
|
15422ed3f7
|
[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-01-15 04:01:42 +00:00 |
|
dolpm
|
8471b27df9
|
[compile] raise on compile_size implicit padding (#32343)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-14 20:46:56 +00:00 |
|
Lumosis
|
66652e8082
|
[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
|
2026-01-14 20:10:01 +00:00 |
|
Aleksandr Samarin
|
d084e9fca7
|
[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding (#26291)
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
Signed-off-by: southfreebird <yvorott@gmail.com>
Co-authored-by: southfreebird <yvorott@gmail.com>
|
2026-01-14 13:20:52 -05:00 |
|
Cyrus Leung
|
9ea07b41da
|
[1/N] Reorganize multimodal processing code (#32327)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 15:25:31 +00:00 |
|
Cyrus Leung
|
90db5b31e4
|
[Refactor] Move top-level dummy data generation to registry (#32310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 02:17:46 -08:00 |
|
sangho.lee
|
7e6f123810
|
Add Molmo2 multimodal model support (#30997)
Signed-off-by: sanghol <sanghol@allenai.org>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-14 15:33:09 +08:00 |
|
Chauncey
|
9312a6c03a
|
[Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture (#32260)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-14 07:26:24 +00:00 |
|
Hongxia Yang
|
048bb59728
|
AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
|
2026-01-13 23:25:10 -08:00 |
|
Andreas Karatzas
|
9d0d7f48d5
|
[ROCm][CI] Handle missing vision_config in Isaac model attention patch (#32281)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-14 07:21:26 +00:00 |
|
Yi Liu
|
50632adc58
|
Consolidate Intel Quantization Toolkit Integration in vLLM (#31716)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
|
2026-01-14 07:11:30 +00:00 |
|
Roberto L. Castro
|
8ef50d9a6b
|
[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
|
2026-01-13 15:22:53 -08:00 |
|
Rabi Mishra
|
69f8a0ea37
|
fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe (#31711)
Signed-off-by: rabi <ramishra@redhat.com>
|
2026-01-13 19:11:54 +00:00 |
|
Wentao Ye
|
f28125d87b
|
[Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-13 10:58:18 -08:00 |
|
Andrew Xia
|
af54d2e2d0
|
[responseAPI] support partial message generation (#32100)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-01-13 10:41:26 -08:00 |
|
Matthew Bonanni
|
2263d44b68
|
[4/N][Attention] Move MLA common to model_executor (#32060)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-13 09:08:45 -08:00 |
|
Chauncey
|
4f02cb2eac
|
[Refactor] [7/N] to simplify the vLLM lora serving architecture (#32251)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-13 15:37:34 +00:00 |
|
Cyrus Leung
|
252c011012
|
[Refactor] Remove MultiModalProfiler (#32254)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-13 15:10:20 +00:00 |
|
Matthew Bonanni
|
98f60e5acb
|
[6/N][Attention] Move utils to more appropriate locations (#32215)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-13 05:38:52 -08:00 |
|
Chauncey
|
fefce49807
|
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-13 13:01:39 +00:00 |
|
Cyrus Leung
|
232214b2ae
|
[Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-13 10:45:42 +00:00 |
|
Andreas Karatzas
|
df7e12715f
|
[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-13 15:14:30 +08:00 |
|
Xingyu Liu
|
80221e1884
|
[BugFix]Fix eagle draft_model_config and add tests (#31753)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
|
2026-01-12 23:09:36 -08:00 |
|
Andreas Karatzas
|
5e714f7ff4
|
[ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac vision encoder (#32233)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-12 22:33:59 -08:00 |
|
Nick Hill
|
c6bb5b5603
|
[BugFix] Fix engine crash caused by chat tools + response_format (#32127)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-13 10:33:14 +08:00 |
|
Andrew Xia
|
a307ac0734
|
[responsesAPI] add unit test for optional function tool call id (#32036)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2026-01-12 16:14:54 -08:00 |
|
Nicolò Lucchesi
|
f8bd8394e3
|
[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-12 20:38:49 +00:00 |
|
Or Ozeri
|
2be765b68a
|
[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-12 18:39:38 +00:00 |
|
Ilya Markov
|
1eb61ab34b
|
[Refactor] EPLB rebalance algo to NumPy (#30697)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-01-12 18:13:23 +00:00 |
|
Matthew Bonanni
|
20228cb851
|
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-12 09:13:56 -08:00 |
|
Cyrus Leung
|
7c0d3c5152
|
[Benchmark] Share data between SLA runs (#32184)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-13 01:12:22 +08:00 |
|
Nicolò Lucchesi
|
5b68107411
|
[Misc][PD] Fix get_attn_backend usage in transfer connectors (#31988)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-12 18:10:05 +01:00 |
|
Asaf Joseph Gardin
|
8fb2c135be
|
[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-01-12 17:02:38 +00:00 |
|
danielafrimi
|
3f72639d36
|
[FIX] Add NO_MUL activation support for modular kernel path (#31528)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
Co-authored-by: root <root@gpu-267.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: root <root@gpu-537.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@pool0-01777.cm.cluster>
|
2026-01-12 11:55:49 -05:00 |
|
Jaehyun An
|
6bc9c8473e
|
[MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384)
Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com>
|
2026-01-12 16:39:02 +00:00 |
|