Commit Graph

12835 Commits

Author SHA1 Message Date
Kevin Šuc
ac9f9330e6 Rename --exclude-log-deltas to --enable-log-deltas (#32020)
Signed-off-by: Catacomba <kevinsuc16@gmail.com>
2026-01-09 15:30:40 +00:00
Isotr0py
2d0c5b630e [Doc] Remove hardcoded Whisper in example openai translation client (#32027)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-09 14:44:52 +00:00
Michael Goin
34cd32fe30 [Perf][Kernel] Fused SiLU+Mul+Quant kernel for NVFP4 cutlass_moe (#31832)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-01-09 07:40:33 -07:00
R3hankhan
8e27663b6a [CPU] Add head sizes 80 and 112 with vec16 fallback (#31968)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-01-09 22:14:46 +08:00
maang
7cdf7e2fe0 [Model] Remove redundant None check in DeepSeekOCR image input processing (#32016)
Signed-off-by: maang <maang_h@163.com>
2026-01-09 06:12:44 -08:00
Adolfo Victoria
bbf80ede43 Fix type error (#31999)
Signed-off-by: Adolfo Victoria <adolfokarim@gmail.com>
Co-authored-by: Adolfo Victoria <adovi@meta.com>
2026-01-09 22:03:32 +08:00
inkcherry
4505849b30 [ROCm][PD] add moriio kv connector. (#29304)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2026-01-09 14:01:57 +00:00
Roger Wang
db07433ce5 [Misc] Skip hashing kwargs if value is None (#32025)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-01-09 13:20:59 +00:00
Andreas Karatzas
e02706d2d2 [ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling (#32000)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-09 20:48:32 +08:00
Sophie du Couédic
b474782ad7 [Feature][Benchmarks] Custom dataset: read output length from dataset (#31881)
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
2026-01-09 12:40:59 +00:00
Bofeng Xue
55212c1404 fix: remove duplicate engine_id check in nixl_connector (#31948)
Signed-off-by: Bofeng BF1 Xue <xuebf1@Lenovo.com>
Co-authored-by: Bofeng BF1 Xue <xuebf1@Lenovo.com>
2026-01-09 12:13:17 +00:00
Xin Yang
e7b68f4d6c [Bugfix] Fix Triton FusedMoE LoRA (#30585)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-09 11:46:59 +00:00
vllmellm
1a19e9cd87 [Bugfix][ROCm]Fix Qwen3-Next-80B-A3B-Thinking inference and optimize non-standard block size (544) support under rocm_atten (#31380)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-09 19:28:02 +08:00
Cyrus Leung
c8ed39b9dd [Model] Reorganize pooling layers (#31973)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-09 11:02:14 +00:00
Andreas Karatzas
020732800c [Bugfix] Fix OpenAPI schema test failures (#31921)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-09 10:56:20 +00:00
Alex Brooks
dc77cb7129 [Bugfix] Fix Var Length Batched Padding in Granite Speech (#31906)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2026-01-09 10:28:43 +00:00
gnovack
bde38c11df fix lora moe sharding when rank < max_lora_rank (#31994)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-09 14:43:25 +08:00
Xin Yang
707b240d7e [Bugfix] Fix FusedMoE LoRA w2_output_size (#31949)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-09 00:54:05 -05:00
Nick Hill
29ce48221c [Cleanup] Remove obsolete spec decoding compatibility logic (#32003)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-09 05:44:18 +00:00
TJian
7a05d2dc65 [CI] [ROCm] Fix tests/entrypoints/test_grpc_server.py on ROCm (#31970)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-01-09 12:54:20 +08:00
Divakar Verma
a1648c4045 [ROCm][CI] Fix test_token_classification.py::test_bert_models (#31993)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-09 04:04:33 +00:00
RioS
e2d49ec2a4 [Bugfix] missing tokens occur in harmony streaming (#30437)
Signed-off-by: RioS <aa248424@gmail.com>
Signed-off-by: Ri0S <aa248424@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-01-09 03:59:34 +00:00
Xin Yang
8413868dab [Bugfix] Fix typo in FusedMoE LoRA reshape comment (#31992)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-08 18:46:05 -08:00
zhrrr
8ff4a99566 [Async][Feat] support apply penalty or bad_words for async + spec (#30495)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-09 02:31:50 +00:00
daniel-salib
a4ec0c5595 [Frontend] Add MCP tool streaming support to Responses API (#31761)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2026-01-09 09:19:34 +08:00
Robert Shaw
0fa8dd24d2 [Bugfix] Fix Typo from NVFP4 Refactor (#31977)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-08 16:18:50 -08:00
Max Hu
6ebe34d6fa [Feature] Add iteration level logging and enhance nvtx marker (#31193)
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
2026-01-09 00:13:39 +00:00
Nick Hill
11cec296dd [BugFix] Add spec-decode-incompatible request param validation (#31982)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-09 00:08:21 +00:00
Robert Shaw
5825bbc1f7 [Quantization] Deprecate Long Tail of Schemes (#31688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-08 19:07:45 -05:00
Yongye Zhu
d62cfe546d [MoE Refactoring][Bugfix]Wrap WNA16 Triton kernel into mk and change compressed tensor kernel selection (#31752)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-08 19:01:30 -05:00
Lucas Wilkinson
6cdf015c3c [Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
Dipika Sikka
5d3b6097ad [Compressed-Tensors] Simplify NVFP4 Conditions, enable marlin support for NVFP4A16 MoEs (#30881)
2026-01-08 17:45:17 -05:00
bnellnm
e74698c27a [Misc][Refactor] Add FusedMoERouter object (#30519)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-01-08 20:52:55 +00:00
Cyrus Leung
aa125ecf0e [Frontend] Improve error message (#31987)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 20:07:03 +00:00
Lucas Kabela
f16bfbe5bc [Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders (#31627)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-08 14:33:24 -05:00
Michael Goin
87e07a6b46 Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" (#31978)
2026-01-08 11:31:53 -08:00
Woosuk Kwon
7508243249 [Model Runner V2] Simplify BlockTables with UVA (#31965)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-08 10:24:26 -08:00
Nicolò Lucchesi
83e1c76dbe [CI][ROCm] Fix NIXL tests on ROCm (#31728)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-09 01:34:43 +08:00
Nishidha Panpaliya
a563866b48 Fix ijson build for Power. (#31702)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2026-01-08 17:12:33 +00:00
Nick Hill
a3d909ad2b [Misc] Tidy up some spec decode logic in GPUModelRunner (#31591)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-08 09:10:07 -08:00
Jee Jee Li
49568d5cf9 [Doc] Improve MM models LoRA notes (#31979)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-08 08:55:22 -08:00
danisereb
b8112c1d85 [Bugfix] Fix vllm serve failure with Nemotron Nano V3 FP8 (#31960)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-01-08 16:08:37 +00:00
Chauncey
eaba8ece77 [Bugfix]: Fix Step3ReasoningParser missing is_reasoning_end_streaming (#31969)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-08 15:28:13 +00:00
yxing-bj
fe86be66c5 [Model] Support IQuestCoder model (#31575)
Signed-off-by: yxing <yxing@iquestlab.com>
2026-01-08 14:42:57 +00:00
Chauncey
1da3a5441a [Docs]: update claude code url (#31971)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-08 14:04:55 +00:00
TJian
72c068b8e0 [CI] [Bugfix] Fix unbounded variable in run-multi-node-test.sh (#31967)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-01-08 05:42:01 -08:00
Mary
7645bc524b [OpenAI] Fix tool_choice=required streaming when output has trailing extra data (#31610)
Signed-off-by: maylikenoother <ogedengbemary19@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-01-08 21:01:42 +08:00
Ce Zhao
1123a87892 [Model] Enable LoRA support for Pixtral (#31724)
Signed-off-by: <>
Signed-off-by: 赵策 <alcor@zhaocedeMacBook-Air.local>
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com>
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>
2026-01-08 05:00:57 -08:00
tianshu-Michael-yu
03fd76c570 [Model] Add LFM2-VL model support (#31758)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-08 05:00:27 -08:00
Bijaya Dangol
59d260f5e4 [Model] Add Grok-2 (#31847)
Signed-off-by: dangoldbj <dangoldbj23@gmail.com>
2026-01-08 04:59:48 -08:00