Lucas Wilkinson
|
6040e0b6c0
|
[BugFix] Fix FI accuracy issue when used for MLA prefill (#26063)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-10-02 10:35:51 -07:00 |
|
Huy Do
|
05bf0c52a1
|
Update base image to 22.04 (jammy) (#26065)
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-10-02 10:35:51 -07:00 |
|
Lucas Wilkinson
|
c536881a7c
|
[BugFix] ChunkedLocalAttention is currently not CG compatible (#26034)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-10-02 10:35:51 -07:00 |
|
Lucas Wilkinson
|
ebce361c07
|
[BugFix][DP/EP] Fix CUTLASS MLA hang under load (#26026)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-10-02 10:35:50 -07:00 |
|
Lucas Wilkinson
|
e4beabd2c8
|
[BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 (#25988)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
v0.11.0rc4
|
2025-09-30 22:47:42 -07:00 |
|
Zhewen Li
|
febb688356
|
[Bugfix] Fix __syncwarp on ROCM (#25996)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:47:42 -07:00 |
|
Roger Wang
|
a1825fe645
|
[MM] Add text-only mode for Qwen3-VL (#26000)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:47:42 -07:00 |
|
Lucia Fang
|
bab9231bf1
|
[Model] MTP fallback to eager for DeepSeek v32 (#25982)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:47:38 -07:00 |
|
qizixi
|
c214d699fd
|
[spec decode] Consolidate speculative decode method name for MTP (#25232)
Signed-off-by: zixi-qi <qizixi@meta.com>
|
2025-09-30 22:47:11 -07:00 |
|
Jee Jee Li
|
c3dfb0f6dd
|
[Bench] Add DeepSeekV32 to MoE benchmark (#25962)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:36:24 -07:00 |
|
youkaichao
|
83f3c9beae
|
[bugfix][deepseek] fix flashmla kernel selection (#25956)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:36:24 -07:00 |
|
Nicolò Lucchesi
|
d0b178cef1
|
[NIXL] Add support for MLA caches with different latent dim (#25902)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:36:24 -07:00 |
|
Yongye Zhu
|
b3230e1ac0
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:36:24 -07:00 |
|
Lucas Wilkinson
|
03df0fb5d2
|
[BugFix] Fix DP/EP hang (#25906)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:36:10 -07:00 |
|
Wentao Ye
|
9471879bd4
|
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:32:47 -07:00 |
|
Roger Wang
|
ab5b6459df
|
[Bugfix] Fallback ViT attn backend to SDPA for blackwell (#25851)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:32:47 -07:00 |
|
Robert Shaw
|
8ce5d3198d
|
[P/D] NIXL Updates (#25844)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
v0.11.0rc3
|
2025-09-28 22:55:33 -07:00 |
|
JJJYmmm
|
09c2cbc04a
|
[Bugfix] fix Qwen3VLMoe load when pp > 1 (#25838)
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-28 22:55:17 -07:00 |
|
Isotr0py
|
4c347044c9
|
[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling (#25557)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: simon-mo <simon.mo@hey.com>
v0.11.0rc2
|
2025-09-27 23:35:12 -07:00 |
|
Roger Wang
|
19e7ab7315
|
[Bugfix] Fix Qwen3-VL regression from #24982 (#25814)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:35:11 -07:00 |
|
Roger Wang
|
6de3d431d9
|
[MM] Optimize memory profiling for scattered multimodal embeddings (#25810)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:35:11 -07:00 |
|
Nicolò Lucchesi
|
b14773bd64
|
[Bugfix][NIXL] Fix Async Scheduler timeout issue (#25808)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:35:11 -07:00 |
|
Tyler Michael Smith
|
26a7a33b88
|
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:35:03 -07:00 |
|
Michael Goin
|
5aa5811a16
|
[CI] Fix FlashInfer AOT in release docker image (#25730)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:32:55 -07:00 |
|
Wentao Ye
|
c2fa2d4dc9
|
[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL (#25788)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:32:55 -07:00 |
|
Russell Bryant
|
32335c8b34
|
Add option to restrict media domains (#25783)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:32:55 -07:00 |
|
Russell Bryant
|
04c2b26972
|
Add filtering for chat template kwargs (#25794)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:32:55 -07:00 |
|
Russell Bryant
|
ee10d7e6ff
|
Validate API tokens in constant time (#25781)
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:32:55 -07:00 |
|
Sage Moore
|
bb79c4da2f
|
Reduce the Cuda Graph memory footprint when running with DBO (#25779)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:32:55 -07:00 |
|
Clouddude
|
b761df963c
|
[Doc]: improve CPU(x86) build-wheel-from-source section (#25617)
Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com>
v0.11.0rc1
v0.11.1rc0
|
2025-09-26 10:26:33 -07:00 |
|
阿丹(adan)
|
33f6aaf972
|
Eagle3 that supports the Minicpm3 model (#24243)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: liudan <adan@minicpm.com>
Co-authored-by: liudan <liudan@qq.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
|
2025-09-26 10:04:57 -07:00 |
|
Jiangyun Zhu
|
56aafa8c0b
|
[Misc] fix unique_filepath (#25732)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-26 16:56:15 +00:00 |
|
Seiji Eicher
|
8d52f2b3a7
|
[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray (#25439)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
|
2025-09-26 09:43:30 -07:00 |
|
Lucas Wilkinson
|
984d18498a
|
[BugFix] Fix using dbo_decode_token_threshold always (and ignoring dbo_prefill_token_threshold) (#25622)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-26 16:22:49 +00:00 |
|
Isotr0py
|
d4d9899860
|
[Quantization] Add field to skip unquantized modules for GPTQ config (#25455)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-26 15:47:41 +00:00 |
|
Cyrus Leung
|
db1e42f627
|
[CI/Build] Fix some V1 tests not being run (#25569)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 20:52:36 +08:00 |
|
Cyrus Leung
|
bc9d7b5595
|
[CI/Build] Split up Distributed Tests (#25572)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 14:49:33 +02:00 |
|
wang.yuqi
|
fe6b19c314
|
[Bugfix] Properly abort pooling request. (#25734)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-26 05:47:34 -07:00 |
|
Chauncey
|
2827b3f4a3
|
[CI] Fix test_shared_storage_connector_hashes (#25748)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-26 20:46:17 +08:00 |
|
Chih-Chieh Yang
|
2b6b1d7809
|
[Model] Mamba2 varlen refactor (#21467)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
|
2025-09-26 11:31:14 +00:00 |
|
Cyrus Leung
|
633f943e30
|
[Doc] Update Batch-level DP docs (#25757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 02:37:40 -07:00 |
|
Xu Wenqing
|
b03b1b97f6
|
Support LongCat-Flash-Chat tool call (#24083)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
|
2025-09-26 09:25:39 +00:00 |
|
Sage Moore
|
dfb9af2014
|
[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk (#25698)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-09-26 01:25:28 -07:00 |
|
yyzxw
|
19f76ee68e
|
[misc] refactor speculative config (#25657)
Signed-off-by: zxw <1020938856@qq.com>
|
2025-09-26 01:22:06 -07:00 |
|
Icey
|
dd70437a4f
|
Remove cuda hard-code in compute_causal_conv1d_metadata (#25555)
Signed-off-by: Icey <1790571317@qq.com>
|
2025-09-26 01:19:20 -07:00 |
|
Tao He
|
99b3a504c5
|
[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-09-26 01:18:58 -07:00 |
|
Iceber Gu
|
6e30010d2f
|
fix: print outputt offline_inference/base/chat.py example (#25744)
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
|
2025-09-26 01:18:24 -07:00 |
|
xaguilar-amd
|
52621c8f5c
|
[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X (#25703)
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com>
|
2025-09-26 01:18:20 -07:00 |
|
Andrew Sansom
|
d48f4d6daf
|
perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled (#25739)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-26 01:18:09 -07:00 |
|
Andrew Sansom
|
e84e0735c7
|
fix: revert cast to cpu in MsgpackEncoder._encode_tensor to avoid hidden performance regressions (#25738)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-26 01:18:05 -07:00 |
|