Commit Graph

14386 Commits

Author SHA1 Message Date
Cyrus Leung
14df02b4e1 [Chore] Cleanup mem_utils.py (#31793)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 19:55:59 +08:00
BlankR
6ebb66ccea [Doc] Fix format of multimodal_inputs.md (#31800)
Signed-off-by: BlankR <hjyblanche@gmail.com>
2026-01-06 03:30:24 -08:00
wang.yuqi
43d384bab4 [CI] Increase the MTEB_EMBED_TOL threshold to 5e-4. (#31797)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-06 19:30:05 +08:00
Cyrus Leung
db318326a5 [Misc] Use deprecated for seed_everything (#31780)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 11:29:55 +00:00
Fadi Arafeh
799b5721f6 [cpu][bench] Add CPU paged attention benchmarks (#31720)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-06 10:57:57 +00:00
Cyrus Leung
97ca4c3b60 [Chore] Remove more V0 dead code from sequence.py (#31783)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 10:25:14 +00:00
Isotr0py
ee2e69d6cd [Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff (#31776)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 00:44:22 -08:00
Isotr0py
7101e0851f [Models]: Use MMEncoderAttention for MoonViT (#31738)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: h100 <h100@inferact.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: h100 <h100@inferact.ai>
2026-01-06 08:00:25 +00:00
vllmellm
e9717801bd [Bugfix][ROCm] Fix Unsupported attention metadata type for speculative decoding in eagle.py (#31714)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-06 07:53:22 +00:00
Cyrus Leung
da71d44410 [Doc] Show that use_audio_in_video is supported in docs (#30837)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-05 23:27:19 -08:00
Kevin McKay
1fb0209bbc [Bugfix][Hardware][AMD] Fix exception types in AITER MLA FP8 check (#31177)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 14:10:59 +08:00
Robert Shaw
81323ea221 [CI] Fix CPU MM PRocessor Test (#31764)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-06 04:22:18 +00:00
Michael Goin
e1cd7a5faf [Bugfix] Add init_workspace_manager to moe kernel benchmarks (#31042)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-05 19:14:33 -08:00
Michael Goin
a68e703c32 [UX] Add -ep shorthand for --enable-expert-parallel (#30890)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-05 19:13:36 -08:00
maang
cd1245a184 [Cleanup] Remove redundant decoder_layer_type assignment in Qwen2 (#31760)
Signed-off-by: maang <maang_h@163.com>
2026-01-05 18:09:18 -08:00
Wentao Ye
ffec815422 [Perf] Optimize additional fill(0) in cutlass moe, 2.9% E2E throughput improvement, 10.8% TTFT improvement (#31754)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-05 18:01:13 -08:00
maang
d386ab1412 [Docs] Improve malformed exception caused by backslash line continuations (#31694)
Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-05 17:51:54 -08:00
Michael Goin
ccb309a964 Revert "[CI Failure] Disable B200 tests while runner is broken" (#31750)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-01-05 17:26:33 -08:00
John Calderon
2f4e6548ef [Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874)
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2026-01-06 00:23:00 +00:00
Seiji Eicher
3c98c2d21b [CI/Build] Allow user to configure NVSHMEM version via ENV or command line (#30732)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-01-05 15:56:08 -08:00
Michael Goin
9513029898 [Bugfix] Properly apply v_scale for mimo_v2_flash (#31175)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-05 23:20:46 +00:00
Robert Shaw
f6c0009afa [Bugfix] Fix Broken ModelOpt NVFP4 MoE (#31742)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-05 23:18:38 +00:00
Yongye Zhu
776ca1e187 [MoE Refactor] Aiter Experts for BF16 MoE (#31542)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-01-05 14:52:59 -08:00
Wentao Ye
af9a7ec255 [Bug] Revert torch warning fix (#31585)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-05 22:31:21 +00:00
Matthew Bonanni
276e03b92c [CI][DeepSeek] Add nightly DeepSeek R1 lm_eval tests on H200 (#30356)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-05 17:17:59 -05:00
Nick Hill
32f4e4db00 [Cleanup] Remove deprecated fields from CachedRequestData class (#31734)
Signed-off-by: njhill <nickhill123@gmail.com>
2026-01-05 21:07:14 +00:00
amitz-nv
ee21291825 [Model] Nemotron Parse 1.1 Support (#30864)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-05 13:00:14 -08:00
Qidong Su
af1b07b0c5 [docker] install cuda13 version of lmcache and nixl (#30913)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
2026-01-05 12:50:39 -08:00
gnovack
c77a993cc2 pin lora_b moe weights on cpu (#31317)
Signed-off-by: gnovack <gnovack@amazon.com>
2026-01-05 12:15:40 -08:00
Roberto L. Castro
fdcc5176be [BugFix] Fix architecture flags to prevent issues on SM103 (#31150)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
2026-01-05 20:11:35 +00:00
Wang Kunpeng
5708297e4e [Misc][Model][Refactor] Pass the prefix into Linear layers (#31669)
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
2026-01-05 20:03:18 +00:00
baonudesifeizhai
02dbb933cb Fix GLM-4.6v flash tool calling in transformers 5.x (#31622)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2026-01-05 11:32:43 -08:00
Isotr0py
51e38a8e30 [Misc] Enable Paligemma's PrefixLM attention mask computation (#31725)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 03:31:49 +08:00
Or Ozeri
d8e38d4939 Triton Attention: Support cross-layers blocks (#30687)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-05 19:29:16 +00:00
kzwrime
21156ff199 [Bugfix] Add missing extra_tensors arg to DeviceCommunicatorBase.disp… (#31644)
Signed-off-by: kunzh <zhikun.wu@outlook.com>
2026-01-06 01:26:09 +08:00
RickyChen / 陳昭儒
c455b771fd [Bugfix][CPU] Fix RotaryEmbedding fallback causing gibberish with --enforce-eager (#31643)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
2026-01-06 01:25:38 +08:00
Michael Goin
eefa713a66 [CI Failure] Disable B200 tests while runner is broken (#31732)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-05 08:50:51 -08:00
Kevin Šuc
79ed460dd5 [Frontend] [Doc] Exclude log deltas feature (#30322)
Signed-off-by: Catacomba <kevinsuc16@gmail.com>
Signed-off-by: Kevin Šuc <kevinsuc16@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-05 16:34:35 +00:00
Isotr0py
6aa5b18e1d [v1] Add encoder-only/cross attention support to Triton Attention backend (#31406)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 00:00:23 +08:00
wang.yuqi
911d38ed99 [Model] Let more models to support the score template. (#31335)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-05 11:54:26 +00:00
zzzzwwjj
caaa482aca [platform] Support additional forward context for OOT (#31674)
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: zzzzwwjj <34335947+zzzzwwjj@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-05 10:25:13 +00:00
Yihua Cheng
b471aad41f [KVconnector][LMCache] remove the import of legacy LMCache code (#31704)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
2026-01-05 10:11:01 +00:00
Jee Jee Li
d5503ca7f9 [LoRA] LoRA PDL improvement (#31660)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-05 08:28:46 +00:00
Qiping Pan
a2ad15c070 [Model] Enable LoRA support for BLIP2 (#31620)
Signed-off-by: Qiping Pan <panqiping@outlook.com>
2026-01-05 08:02:24 +00:00
Tres
3133c192a3 [ROCM] Reorder arguments and rename parameters for rope_cached_thd_positions_2c_fwd_inplace (#29993)
Signed-off-by: Tres Popp <tres.popp@amd.com>
2026-01-05 15:37:57 +08:00
wang.yuqi
76fd458aa7 [CI] Bump sentence-transformer from 3.2.1 to 5.2.0 (#31664)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-04 21:45:01 -08:00
cjackal
e2701cc525 [Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser (#31581)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-01-05 05:42:47 +00:00
Tyler Michael Smith
fe8a9fbd2e [Bugfix] Fix EPLB state logging error (#31455)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-01-05 04:06:28 +00:00
Ning Xie
98b8b3abaa [log] enable max_log_len trim only when needed (#31482)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-05 03:55:43 +00:00
CHENYUE
346e56455a Add chat prefix completion feature to DeepSeek v3.2 (#31147) 2026-01-05 11:20:25 +08:00