Commit Graph

14386 Commits

Author SHA1 Message Date
Bijaya Dangol
59d260f5e4 [Model] Add Grok-2 (#31847)
Signed-off-by: dangoldbj <dangoldbj23@gmail.com>
2026-01-08 04:59:48 -08:00
Patrick von Platen
18d4e481d0 [Voxtral] Fix speech transcription api (#31388)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2026-01-08 18:34:19 +08:00
Isotr0py
2972a05473 [MM Encoder]: Make MMEncoderAttention's scale takes effect properly (#31950)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 02:33:48 -08:00
Cyrus Leung
5576227bc1 [Model] Standardize common vision encoders (#31947)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:33:16 -08:00
Cyrus Leung
d1b6fe007f [Chore] Further cleanup pooler (#31951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:16:21 -08:00
omer-dayan
04a49669d1 RayLLM Bugfix - Preserve obj store URL for multi engine_config creation (#30803)
Signed-off-by: Omer Dayan <omdayan@nvidia.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 10:00:25 +00:00
BingjiaWang
96fcd3c267 [Misc] Support qwen3-next lora (#31719)
2026-01-08 09:27:50 +00:00
DevByteAI
1f214290d6 fix(compile): apply partition wrapper when loading AOT cached functions (#31536)
Signed-off-by: Devbyteai <abud6673@gmail.com>
Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 17:27:26 +08:00
Ryan Rock
8cbdc7eb94 [CI/Build] Enable test_kv_cache_events_dp for AMD (#31834)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2026-01-08 09:00:24 +00:00
Lumosis
b634e619bb Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>
2026-01-08 09:00:07 +00:00
Isotr0py
eac3b96ec0 [Models] Allow converting Qwen3-VL into Reranker model (#31890)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 08:10:15 +00:00
Zhiwei
573a1d1119 [ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905)
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
2026-01-08 15:47:44 +08:00
Shang Wang
33156f56e0 [docker] A follow-up patch to fix #30913: [docker] install cuda13 version of lmcache and nixl (#31775)
Signed-off-by: Shang Wang <shangw@nvidia.com>
2026-01-07 23:47:02 -08:00
Rabi Mishra
107cf8e92f fix(rocm): Add get_supported_kernel_block_sizes() to ROCM_ATTN (#31712)
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 15:46:07 +08:00
Zyyeric
63baa28cf5 [Model] Enable LoRA support for tower and connector in GLM4-V (#31652)
Signed-off-by: Zyyeric <eric1976808123@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-08 15:45:53 +08:00
Andy Liu
e5173d3bac [Bugfix] Remove the num_hidden_layers override for glm4_moe (#31745)
2026-01-08 15:45:10 +08:00
prashanth058
d3235cb503 [Fix] Enable mm_processor_cache with vision LoRA (#31927)
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
2026-01-08 15:31:51 +08:00
Nick Hill
287b37cda4 [BugFix] Fix spec decoding edge case bugs (#31944)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-08 15:31:03 +08:00
Chang Su
791b2fc30a [grpc] Support gRPC server entrypoint (#30190)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Signed-off-by: njhill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: njhill <nickhill123@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2026-01-07 23:24:46 -08:00
Lucas Wilkinson
be6a81f31b [chore] Update FA commit (#30460)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-07 23:24:18 -08:00
Ronald
2ab441befe [platform] add dp_metadata arg to set_additional_forward_context (#31942)
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
2026-01-08 06:56:44 +00:00
ShaanveerS
9572f74f15 [Model] Enable LoRA support for tower and connector in DotsOCR (#31825)
Signed-off-by: ShaanveerS <shaanver.singh@gmail.com>
2026-01-08 14:50:16 +08:00
Andreas Karatzas
5f2a473ff3 [ROCm][CI] v1 cpu offloading attention backend fix (#31833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 14:37:50 +08:00
Michael Goin
6b2a672e47 [Doc] Add Claude code usage example (#31188)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-08 13:50:23 +08:00
rasmith
f1b1bea5c3 [CI][BugFix][AMD] Actually skip tests marked @pytest.mark.skip_v1 (#31873)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2026-01-08 13:06:09 +08:00
Charlie Fu
cddbc2b4b2 [ROCm][CI] Add rocm support for run-multi-node-test.sh (#31922)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-08 04:36:39 +00:00
Andreas Karatzas
087a138963 [ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 04:35:25 +00:00
Andreas Karatzas
c4041f37a4 [ROCm][LoRA] Fix MoE accuracy regression by preserving float32 router weight scaling (#31931)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 04:17:56 +00:00
Richard Zou
a79079feef [BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-08 04:04:58 +00:00
Robert Shaw
9f6dcb71ae [MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2026-01-08 03:46:27 +00:00
Andreas Karatzas
8dd2419fa9 [CI] Skip Qwen-VL in multimodal processing tests due to flaky external dependency (#31932)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 02:58:01 +00:00
Rabi Mishra
39d82005f7 fix(rocm): add early return in get_flash_attn_version for ROCm (#31286)
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 10:28:07 +08:00
Rabi Mishra
25eef3dc2e feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645)
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 10:27:09 +08:00
Matthew Bonanni
0d7667419f [0/N][Attention] Fix miscellaneous pre-commit issues (#31924)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-08 01:15:17 +00:00
Robert Shaw
5dcd7ef1f2 [MoE Refactor][15/N] Apply Refactor to Fp8 (#31415)
2026-01-07 19:42:33 -05:00
Elvir Crnčević
ffc0a2798b Add back missing DeepEP LL params (#31911)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
2026-01-07 17:47:54 -05:00
Nick Hill
10ef65eded [BugFix] Fix bad words with speculative decoding (#31908)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-07 15:46:42 -05:00
Ilya Markov
6170d47d22 [EPLB] Optimize EPLB with numpy (#29499)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-01-07 15:21:35 -05:00
Xin Yang
0ada960a20 [Kernel] Support bias type in grouped_topk kernel (#31781)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-07 12:16:32 -08:00
Ning Xie
c907d22158 [refactor] refactor memory constants usage (#31865)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-07 18:37:31 +00:00
Michael Goin
f347ac6c34 [Perf] Fuse stride preparation for NVFP4 cutlass_moe (#31837)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-07 13:31:26 -05:00
Festus Ayobami Owumi
05f47bd8d2 [Doc] Fix: Correct vLLM announcing blog post link in docs (#31868)
Signed-off-by: enfinity <festusowumi@gmail.com>
2026-01-07 10:06:42 -08:00
roikoren755
bf184a6621 Enable quantized attention in NemotronH models (#31898)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-01-07 17:37:19 +00:00
Jee Jee Li
30399cc725 UX: add vLLM env info in '/server_info' (#31899)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-07 17:13:02 +00:00
Kfir Toledo
b89443b8d9 [KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761)
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
2026-01-07 16:59:43 +00:00
Marko Rosenmueller
1d9e9ae8a4 [Bugfix]: prevent leaking tokens in crash log (#30751)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
2026-01-07 16:15:19 +00:00
Cyrus Leung
b7036c87a1 [Refactor] Clean up pooler modules (#31897)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 00:07:43 +08:00
Kate Cheng
cc6dafaef2 [Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213)
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2026-01-07 10:53:54 -05:00
R3hankhan
1ab055efe6 [OpenAI] Extend VLLMValidationError to additional validation parameters (#31870)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-01-07 14:45:49 +00:00
Cyrus Leung
b665bbc2d4 [Chore] Migrate V0 attention utils (#31891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 13:44:36 +00:00