daniel-salib
|
a4ec0c5595
|
[Frontend] Add MCP tool streaming support to Responses API (#31761)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
|
2026-01-09 09:19:34 +08:00 |
|
Robert Shaw
|
0fa8dd24d2
|
[Bugfix] Fix Typo from NVFP4 Refactor (#31977)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-08 16:18:50 -08:00 |
|
Max Hu
|
6ebe34d6fa
|
[Feature] Add iteration level logging and enhance nvtx marker (#31193)
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
|
2026-01-09 00:13:39 +00:00 |
|
Nick Hill
|
11cec296dd
|
[BugFix] Add spec-decode-incompatible request param validation (#31982)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 00:08:21 +00:00 |
|
Robert Shaw
|
5825bbc1f7
|
[Quantization] Deprecate Long Tail of Schemes (#31688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-08 19:07:45 -05:00 |
|
Yongye Zhu
|
d62cfe546d
|
[MoE Refactoring][Bugfix]Wrap WNA16 Triton kernel into mk and change compressed tensor kernel selection (#31752)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-08 19:01:30 -05:00 |
|
Lucas Wilkinson
|
6cdf015c3c
|
[Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-08 15:20:49 -08:00 |
|
Dipika Sikka
|
5d3b6097ad
|
[Compressed-Tensors] Simplify NVFP4 Conditions, enable marlin support for NVFP4A16 MoEs (#30881)
|
2026-01-08 17:45:17 -05:00 |
|
bnellnm
|
e74698c27a
|
[Misc][Refactor] Add FusedMoERouter object (#30519)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-01-08 20:52:55 +00:00 |
|
Cyrus Leung
|
aa125ecf0e
|
[Frontend] Improve error message (#31987)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-08 20:07:03 +00:00 |
|
Lucas Kabela
|
f16bfbe5bc
|
[Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders (#31627)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-08 14:33:24 -05:00 |
|
Michael Goin
|
87e07a6b46
|
Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" (#31978)
|
2026-01-08 11:31:53 -08:00 |
|
Woosuk Kwon
|
7508243249
|
[Model Runner V2] Simplify BlockTables with UVA (#31965)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-08 10:24:26 -08:00 |
|
Nicolò Lucchesi
|
83e1c76dbe
|
[CI][ROCm] Fix NIXL tests on ROCm (#31728)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-09 01:34:43 +08:00 |
|
Nishidha Panpaliya
|
a563866b48
|
Fix ijson build for Power. (#31702)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
|
2026-01-08 17:12:33 +00:00 |
|
Nick Hill
|
a3d909ad2b
|
[Misc] Tidy up some spec decode logic in GPUModelRunner (#31591)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-08 09:10:07 -08:00 |
|
Jee Jee Li
|
49568d5cf9
|
[Doc] Improve MM models LoRA notes (#31979)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-08 08:55:22 -08:00 |
|
danisereb
|
b8112c1d85
|
[Bugfix] Fix vllm serve failure with Nemotron Nano V3 FP8 (#31960)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-01-08 16:08:37 +00:00 |
|
Chauncey
|
eaba8ece77
|
[Bugfix]: Fix Step3ReasoningParser missing is_reasoning_end_streaming (#31969)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-08 15:28:13 +00:00 |
|
yxing-bj
|
fe86be66c5
|
[Model] Support IQuestCoder model (#31575)
Signed-off-by: yxing <yxing@iquestlab.com>
|
2026-01-08 14:42:57 +00:00 |
|
Chauncey
|
1da3a5441a
|
[Docs]: update claude code url (#31971)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-08 14:04:55 +00:00 |
|
TJian
|
72c068b8e0
|
[CI] [Bugfix] Fix unbounded variable in run-multi-node-test.sh (#31967)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-08 05:42:01 -08:00 |
|
Mary
|
7645bc524b
|
[OpenAI] Fix tool_choice=required streaming when output has trailing extra data (#31610)
Signed-off-by: maylikenoother <ogedengbemary19@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-01-08 21:01:42 +08:00 |
|
Ce Zhao
|
1123a87892
|
[Model] Enable LoRA support for Pixtral (#31724)
Signed-off-by: 赵策 <alcor@zhaocedeMacBook-Air.local>
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com>
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>
|
2026-01-08 05:00:57 -08:00 |
|
tianshu-Michael-yu
|
03fd76c570
|
[Model] Add LFM2-VL model support (#31758)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-01-08 05:00:27 -08:00 |
|
Bijaya Dangol
|
59d260f5e4
|
[Model] Add Grok-2 (#31847)
Signed-off-by: dangoldbj <dangoldbj23@gmail.com>
|
2026-01-08 04:59:48 -08:00 |
|
Patrick von Platen
|
18d4e481d0
|
[Voxtral] Fix speech transcription api (#31388)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2026-01-08 18:34:19 +08:00 |
|
Isotr0py
|
2972a05473
|
[MM Encoder]: Make MMEncoderAttention's scale takes effect properly (#31950)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-08 02:33:48 -08:00 |
|
Cyrus Leung
|
5576227bc1
|
[Model] Standardize common vision encoders (#31947)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-08 02:33:16 -08:00 |
|
Cyrus Leung
|
d1b6fe007f
|
[Chore] Further cleanup pooler (#31951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-08 02:16:21 -08:00 |
|
omer-dayan
|
04a49669d1
|
RayLLM Bugfix - Preserve obj store URL for multi engine_config creation (#30803)
Signed-off-by: Omer Dayan <omdayan@nvidia.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-08 10:00:25 +00:00 |
|
BingjiaWang
|
96fcd3c267
|
[Misc] Support qwen3-next lora (#31719)
|
2026-01-08 09:27:50 +00:00 |
|
DevByteAI
|
1f214290d6
|
fix(compile): apply partition wrapper when loading AOT cached functions (#31536)
Signed-off-by: Devbyteai <abud6673@gmail.com>
Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-08 17:27:26 +08:00 |
|
Ryan Rock
|
8cbdc7eb94
|
[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-01-08 09:00:24 +00:00 |
|
Lumosis
|
b634e619bb
|
Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>
|
2026-01-08 09:00:07 +00:00 |
|
Isotr0py
|
eac3b96ec0
|
[Models] Allow converting Qwen3-VL into Reranker model (#31890)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-08 08:10:15 +00:00 |
|
Zhiwei
|
573a1d1119
|
[ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905)
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
|
2026-01-08 15:47:44 +08:00 |
|
Shang Wang
|
33156f56e0
|
[docker] A follow-up patch to fix #30913: [docker] install cuda13 version of lmcache and nixl (#31775)
Signed-off-by: Shang Wang <shangw@nvidia.com>
|
2026-01-07 23:47:02 -08:00 |
|
Rabi Mishra
|
107cf8e92f
|
fix(rocm): Add get_supported_kernel_block_sizes() to ROCM_ATTN (#31712)
Signed-off-by: rabi <ramishra@redhat.com>
|
2026-01-08 15:46:07 +08:00 |
|
Zyyeric
|
63baa28cf5
|
[Model] Enable LoRA support for tower and connector in GLM4-V (#31652)
Signed-off-by: Zyyeric <eric1976808123@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-08 15:45:53 +08:00 |
|
Andy Liu
|
e5173d3bac
|
[Bugfix] Remove the num_hidden_layers override for glm4_moe (#31745)
|
2026-01-08 15:45:10 +08:00 |
|
prashanth058
|
d3235cb503
|
[Fix] Enable mm_processor_cache with vision LoRA (#31927)
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
|
2026-01-08 15:31:51 +08:00 |
|
Nick Hill
|
287b37cda4
|
[BugFix] Fix spec decoding edge case bugs (#31944)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-08 15:31:03 +08:00 |
|
Chang Su
|
791b2fc30a
|
[grpc] Support gRPC server entrypoint (#30190)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Signed-off-by: njhill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: njhill <nickhill123@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2026-01-07 23:24:46 -08:00 |
|
Lucas Wilkinson
|
be6a81f31b
|
[chore] Update FA commit (#30460)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-07 23:24:18 -08:00 |
|
Ronald
|
2ab441befe
|
[platform] add dp_metadata arg to set_additional_forward_context (#31942)
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
|
2026-01-08 06:56:44 +00:00 |
|
ShaanveerS
|
9572f74f15
|
[Model] Enable LoRA support for tower and connector in DotsOCR (#31825)
Signed-off-by: ShaanveerS <shaanver.singh@gmail.com>
|
2026-01-08 14:50:16 +08:00 |
|
Andreas Karatzas
|
5f2a473ff3
|
[ROCm][CI] v1 cpu offloading attention backend fix (#31833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-08 14:37:50 +08:00 |
|
Michael Goin
|
6b2a672e47
|
[Doc] Add Claude code usage example (#31188)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-08 13:50:23 +08:00 |
|
rasmith
|
f1b1bea5c3
|
[CI][BugFix][AMD] Actually skip tests marked @pytest.mark.skip_v1 (#31873)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2026-01-08 13:06:09 +08:00 |
|