Cyrus Leung
aa125ecf0e
[Frontend] Improve error message (#31987)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 20:07:03 +00:00
Lucas Kabela
f16bfbe5bc
[Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders (#31627)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-08 14:33:24 -05:00
Michael Goin
87e07a6b46
Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" (#31978)
2026-01-08 11:31:53 -08:00
Woosuk Kwon
7508243249
[Model Runner V2] Simplify BlockTables with UVA (#31965)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-08 10:24:26 -08:00
Nicolò Lucchesi
83e1c76dbe
[CI][ROCm] Fix NIXL tests on ROCm (#31728)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-09 01:34:43 +08:00
Nishidha Panpaliya
a563866b48
Fix ijson build for Power. (#31702)
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2026-01-08 17:12:33 +00:00
Nick Hill
a3d909ad2b
[Misc] Tidy up some spec decode logic in GPUModelRunner (#31591)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-08 09:10:07 -08:00
Jee Jee Li
49568d5cf9
[Doc] Improve MM models LoRA notes (#31979)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-08 08:55:22 -08:00
danisereb
b8112c1d85
[Bugfix] Fix vllm serve failure with Nemotron Nano V3 FP8 (#31960)
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-01-08 16:08:37 +00:00
Chauncey
eaba8ece77
[Bugfix]: Fix Step3ReasoningParser missing is_reasoning_end_streaming (#31969)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-08 15:28:13 +00:00
yxing-bj
fe86be66c5
[Model] Support IQuestCoder model (#31575)
...
Signed-off-by: yxing <yxing@iquestlab.com>
2026-01-08 14:42:57 +00:00
Chauncey
1da3a5441a
[Docs]: update claude code url (#31971)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-08 14:04:55 +00:00
TJian
72c068b8e0
[CI] [Bugfix] Fix unbounded variable in run-multi-node-test.sh (#31967)
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-01-08 05:42:01 -08:00
Mary
7645bc524b
[OpenAI] Fix tool_choice=required streaming when output has trailing extra data (#31610)
...
Signed-off-by: maylikenoother <ogedengbemary19@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-01-08 21:01:42 +08:00
Ce Zhao
1123a87892
[Model] Enable LoRA support for Pixtral (#31724)
...
Signed-off-by: <>
Signed-off-by: 赵策 <alcor@zhaocedeMacBook-Air.local>
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com>
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>
2026-01-08 05:00:57 -08:00
tianshu-Michael-yu
03fd76c570
[Model] Add LFM2-VL model support (#31758)
...
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-08 05:00:27 -08:00
Bijaya Dangol
59d260f5e4
[Model] Add Grok-2 (#31847)
...
Signed-off-by: dangoldbj <dangoldbj23@gmail.com>
2026-01-08 04:59:48 -08:00
Patrick von Platen
18d4e481d0
[Voxtral] Fix speech transcription api (#31388)
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2026-01-08 18:34:19 +08:00
Isotr0py
2972a05473
[MM Encoder]: Make MMEncoderAttention's scale takes effect properly (#31950)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 02:33:48 -08:00
Cyrus Leung
5576227bc1
[Model] Standardize common vision encoders (#31947)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:33:16 -08:00
Cyrus Leung
d1b6fe007f
[Chore] Further cleanup pooler (#31951)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:16:21 -08:00
omer-dayan
04a49669d1
RayLLM Bugfix - Preserve obj store URL for multi engine_config creation (#30803)
...
Signed-off-by: Omer Dayan <omdayan@nvidia.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 10:00:25 +00:00
BingjiaWang
96fcd3c267
[Misc] Support qwen3-next lora (#31719)
2026-01-08 09:27:50 +00:00
DevByteAI
1f214290d6
fix(compile): apply partition wrapper when loading AOT cached functions (#31536)
...
Signed-off-by: Devbyteai <abud6673@gmail.com>
Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 17:27:26 +08:00
Ryan Rock
8cbdc7eb94
[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834)
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2026-01-08 09:00:24 +00:00
Lumosis
b634e619bb
Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635)
...
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>
2026-01-08 09:00:07 +00:00
Isotr0py
eac3b96ec0
[Models] Allow converting Qwen3-VL into Reranker model (#31890)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 08:10:15 +00:00
Zhiwei
573a1d1119
[ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905)
...
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
2026-01-08 15:47:44 +08:00
Shang Wang
33156f56e0
[docker] A follow-up patch to fix #30913: [docker] install cuda13 version of lmcache and nixl (#31775)
...
Signed-off-by: Shang Wang <shangw@nvidia.com>
2026-01-07 23:47:02 -08:00
Rabi Mishra
107cf8e92f
fix(rocm): Add get_supported_kernel_block_sizes() to ROCM_ATTN (#31712)
...
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 15:46:07 +08:00
Zyyeric
63baa28cf5
[Model] Enable LoRA support for tower and connector in GLM4-V (#31652)
...
Signed-off-by: Zyyeric <eric1976808123@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-08 15:45:53 +08:00
Andy Liu
e5173d3bac
[Bugfix] Remove the num_hidden_layers override for glm4_moe (#31745)
2026-01-08 15:45:10 +08:00
prashanth058
d3235cb503
[Fix] Enable mm_processor_cache with vision LoRA (#31927)
...
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
2026-01-08 15:31:51 +08:00
Nick Hill
287b37cda4
[BugFix] Fix spec decoding edge case bugs (#31944)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-08 15:31:03 +08:00
Chang Su
791b2fc30a
[grpc] Support gRPC server entrypoint (#30190)
...
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Signed-off-by: njhill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: njhill <nickhill123@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2026-01-07 23:24:46 -08:00
Lucas Wilkinson
be6a81f31b
[chore] Update FA commit (#30460)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-07 23:24:18 -08:00
Ronald
2ab441befe
[platform] add dp_metadata arg to set_additional_forward_context (#31942)
...
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
2026-01-08 06:56:44 +00:00
ShaanveerS
9572f74f15
[Model] Enable LoRA support for tower and connector in DotsOCR (#31825)
...
Signed-off-by: ShaanveerS <shaanver.singh@gmail.com>
2026-01-08 14:50:16 +08:00
Andreas Karatzas
5f2a473ff3
[ROCm][CI] v1 cpu offloading attention backend fix (#31833)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 14:37:50 +08:00
Michael Goin
6b2a672e47
[Doc] Add Claude code usage example (#31188)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-08 13:50:23 +08:00
rasmith
f1b1bea5c3
[CI][BugFix][AMD] Actually skip tests marked @pytest.mark.skip_v1 (#31873)
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2026-01-08 13:06:09 +08:00
Charlie Fu
cddbc2b4b2
[ROCm][CI] Add rocm support for run-multi-node-test.sh (#31922)
...
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-08 04:36:39 +00:00
Andreas Karatzas
087a138963
[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 04:35:25 +00:00
Andreas Karatzas
c4041f37a4
[ROCm][LoRA] Fix MoE accuracy regression by preserving float32 router weight scaling (#31931)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 04:17:56 +00:00
Richard Zou
a79079feef
[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-08 04:04:58 +00:00
Robert Shaw
9f6dcb71ae
[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2026-01-08 03:46:27 +00:00
Andreas Karatzas
8dd2419fa9
[CI] Skip Qwen-VL in multimodal processing tests due to flaky external dependency (#31932)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 02:58:01 +00:00
Rabi Mishra
39d82005f7
fix(rocm): add early return in get_flash_attn_version for ROCm (#31286)
...
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 10:28:07 +08:00
Rabi Mishra
25eef3dc2e
feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645)
...
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 10:27:09 +08:00
Matthew Bonanni
0d7667419f
[0/N][Attention] Fix miscellaneous pre-commit issues (#31924)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-08 01:15:17 +00:00