Commit Graph

4744 Commits

Author SHA1 Message Date
Richard Zou
822e250ab7 [torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation (#36093)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-11 16:07:09 +00:00
Jhao-Ting Chen
5573894737 Kimi k2.5 MLA based eagle3 (#36361)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Izzy Putterman <iputterman@nvidia.com>
2026-03-11 11:36:11 -04:00
Harry Mellor
d5816c8c2f Fix tied weights in weight mapping test for Transformers v5 (#36788)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 15:10:26 +00:00
Wuxun Zhang
e584dce52b Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230)
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
2026-03-11 19:19:15 +08:00
Weiguang Li
724759684c [Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136)
Signed-off-by: OiPunk <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 03:13:06 -07:00
Richard Zou
09b6f99852 [compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#36358)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-11 03:12:03 -07:00
Angela Yi
13e79fc811 [ci] Update rtol for test_classification (#36556)
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
2026-03-11 03:08:16 -07:00
roikoren755
e661b9ee83 [NemotronH] Small fix reasoning parser (#36635)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-03-11 02:44:41 -07:00
Nicolò Lucchesi
098d844731 [NIXL][1/N] Refactor kernel_block_size detection (#35752)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-11 01:11:23 -07:00
Sladyn
4aaaf8c8ce feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503)
Signed-off-by: sladynnunes <snunes@usc.edu>
2026-03-11 04:35:33 +00:00
Wentao Ye
a8ff2cca92 [Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-03-10 21:25:30 -07:00
tunglinwood
42fadebecb [Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
2026-03-10 21:24:48 -07:00
Ning Xie
fe714dd507 [openapi server] log exception in exception handler(2/N) (#36201)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-03-10 20:16:30 -07:00
Nick Hill
65b2f405dc [Core] Simplify core kv-cache blocks initialization logic (#36521)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-10 20:20:02 +00:00
Nick Hill
2a68464c5b [Test] test_async_scheduling.py improvements (#36340)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-03-10 11:17:26 -07:00
Harry Mellor
f83b933b84 [CI] Bump mypy version to 1.19.1 (#36104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-10 09:18:28 -07:00
Hashem Hashemi
721ae79f50 Improvements to wvSplitKrc skinny GEMM solution (#34304)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-03-10 09:14:27 -07:00
Srinivasoo7
106ff69c4e feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342)
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com>
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
2026-03-10 14:43:40 +00:00
Jiangyun Zhu
ca5fb4bbd8 [Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs (#36595)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-03-10 07:39:01 -07:00
wang.yuqi
a3189a08b0 [Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-10 13:32:25 +00:00
Mark McLoughlin
234860399b [Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) (#36628)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-03-10 06:20:41 -07:00
Harry Mellor
c88510083b Fix Qwen2.5-VL test for Transformers v5 (#36532)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-10 12:05:34 +00:00
Chang Su
507ddbe992 feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve (#36169)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2026-03-10 03:29:59 -07:00
Harry Mellor
195c997203 Fix LFM2 MoE test for Transformers v5 (#36534)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-09 22:29:17 -07:00
Wentao Ye
7279374f91 [Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-09 20:55:58 -07:00
Hojin Yang
0836be3b03 [Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-10 10:59:19 +08:00
Andreas Karatzas
179547d62c [ROCm][CI] Fix ROCm GPT-OSS Eval test group (#36179)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-09 17:55:20 -07:00
Shaun Kotek
203a7f27da add nemotron v3 reasoning parser (#36393)
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>
Co-authored-by: root <root@gpu-259.slurm-workers-slurm.slurm.svc.cluster.local>
2026-03-09 15:11:41 -07:00
Micah Williamson
4ff9b045fe [ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-03-09 13:27:55 -05:00
Copilot
4b87ffbefb [torch.compile] Rename compile_ranges_split_points to compile_ranges_endpoints (#36027)
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-03-09 18:04:40 +00:00
Andreas Karatzas
1e0f917b34 [ROCm][CI] Fix logprob divergence for TitanML/tiny-mixtral under AITER rms_norm (#36101)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-09 12:07:44 -05:00
Andreas Karatzas
c174d54f86 [ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-09 12:02:41 -05:00
Roberto L. Castro
580864d81e [Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
2026-03-09 09:50:36 -07:00
Roberto L. Castro
2b28b9b269 [Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 (#35290)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-09 09:46:57 -07:00
Matthew Bonanni
77a73458e3 Reapply [Attention] Refactor check_and_update_config (#35122)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-03-09 07:17:14 -07:00
Isotr0py
b0906d8b02 [MM Encoder] Default to use TORCH_SDPA backend for ViT on Volta/Turing GPU (#36472)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-09 03:43:44 -07:00
Cyrus Leung
f96c3ab08c [Deprecation][1/2] Remove items deprecated in v0.18 (#36470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-09 03:43:23 -07:00
Xin Yang
dc6b578466 [Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-08 23:41:01 -07:00
liuzhenwei
1bc9c77f6d [XPU] Add test script of PD disaggregation (#36434)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2026-03-09 05:50:27 +00:00
Alex Brooks
65a4da1504 [Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
2026-03-09 05:46:23 +00:00
wang.yuqi
fff3711a24 [Frontend][2/n] Improve pooling entrypoints | embed. (#36110)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
2026-03-09 11:42:19 +08:00
wang.yuqi
dcf8862fd4 [Examples][1/n] Resettle basic examples. (#35579)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:22:53 -07:00
Jiangyun Zhu
e5ff140216 [cudagraph] fix cudagraph warning in deepseekv32 (#28044)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-03-08 20:27:41 -04:00
danisereb
0a6a3a1290 Add support for ModelOpt MXFP8 MoE models (#35986)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-03-08 13:00:05 -07:00
Andreas Karatzas
40077ea3de [CI] fix flaky empty responses and add diagnostic assertions in vision chat tests (#36341)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-08 14:42:24 +08:00
Wei Zhao
379689d533 [Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891) 2026-03-07 13:51:54 -08:00
PatchyTIS
a6be75dbd2 [Core] NGram GPU Implementation compatible with Async Scheduler (#29184) 2026-03-07 13:51:37 -08:00
Micah Williamson
ee54f9cdb9 [ROCm][CI] Accept Different But Valid Output for test_olmoe_tp (#35224) 2026-03-07 13:50:52 -08:00
Micah Williamson
fc4657756f [ROCm][CI] Enable AITER for failing test_gpt_oss test case on MI355 (#36174) 2026-03-07 13:50:17 -08:00
qli88
eebd14651f [CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416) 2026-03-07 13:49:56 -08:00