Cyrus Leung
|
a358e4dffe
|
[Refactor] Make Renderer an abstract class (#33479)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-01 10:36:30 +08:00 |
|
René Honig
|
079781177a
|
fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels (#33417)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-01-31 14:06:42 -08:00 |
|
Roy Wang
|
63c0889416
|
[Misc] Fix flashinfer related tests (#33462)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-01-31 16:10:24 -05:00 |
|
smashyalts
|
1e86c802d4
|
Fix grammar (#33121)
Signed-off-by: smashyalts <smashyalts@gmail.com>
|
2026-01-31 09:59:34 -08:00 |
|
linhaifeng
|
fedf64332e
|
[Bugfix]: Fix display errors in TORCH_CHECK messages (#32942)
Signed-off-by: linhaifeng <1371675203@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-31 09:48:48 -08:00 |
|
Xiao Yang
|
2238a12c13
|
[Misc] support collect_env for endpoint /server_info (#33246)
Signed-off-by: yang.xiao <yang.xiao@daocloud.io>
|
2026-02-01 01:42:59 +08:00 |
|
Harry Mellor
|
ce0afe2451
|
Update huggingface-hub pin for the last time before Transformers v5 (#33473)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-31 09:14:24 -08:00 |
|
Cyrus Leung
|
88c3e114d8
|
[Refactor] Move MM data parsing outside processor (#33408)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-31 16:46:14 +00:00 |
|
Cyrus Leung
|
92924b2ddd
|
[Deprecation] Remove deprecated items related to pooling (#33477)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-31 08:44:40 -08:00 |
|
YunzhuLu
|
27cb2f678f
|
[Bugfix] Early-reject requests with MM data longer than encode cache capacity (#33110)
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-01-31 08:41:13 -08:00 |
|
jma99_2333
|
22d9a056d5
|
Support clear mm and encoder cache (#33452)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-01-31 15:22:25 +00:00 |
|
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
|
13b842f271
|
[BugFix][Router Replay] Capture Logical Experts with EPLB (#33013)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
|
2026-01-31 10:12:17 -05:00 |
|
Luka Govedič
|
15f40b20aa
|
[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Richard Zou <zou3519@gmail.com>
|
2026-01-31 06:48:34 -08:00 |
|
Cyrus Leung
|
793af538a3
|
[Doc] Update plugin deprecation notices (#33476)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-31 22:48:28 +08:00 |
|
cmunley1
|
6f5e7cda57
|
support return prompt token ids in responses (#33378)
|
2026-01-31 06:04:20 -08:00 |
|
Roy Wang
|
68feb76a6f
|
[Misc] Replace deprecated interface seed_everything (#33474)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-01-31 05:38:39 -08:00 |
|
Cyrus Leung
|
4cb59dea6a
|
[Bugfix] Fix incompatibility between #33372 and #32863 (#33475)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-31 05:21:32 -08:00 |
|
Angela Yi
|
608b556507
|
[ez] Add structured torch.compile logs (#33213)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-31 21:00:54 +08:00 |
|
Cyrus Leung
|
f0a1c8453a
|
[Frontend] Use new Renderer for Completions and Tokenize API (#32863)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-31 04:51:15 -08:00 |
|
caozuoba
|
8980001c93
|
[perf] v1/spec_decode: skip softmax for all-greedy rejection sampling (#32852)
Signed-off-by: hdj <1293066020@qq.com>
|
2026-01-31 09:51:26 +00:00 |
|
jennyyyyzhen
|
527bcd14d4
|
[ROCM] Enable aiter attn backend for qwen3-next model (#32492)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
|
2026-01-31 17:03:57 +08:00 |
|
Jinwu
|
f68e3ea4e1
|
[BugFix] Add synchronize in CutlassW4A8LinearKernel to ensure data is ready for use. (#33078)
Co-authored-by: jinwuguo <jinwuguo@tencent.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-31 08:14:54 +00:00 |
|
Yanan Cao
|
d5c41db35b
|
[Kernel] [Helion] [3/N] Helion kernel registry (#33203)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-01-31 15:38:46 +08:00 |
|
Fadi Arafeh
|
1618e25492
|
[CPU][Feat] Enable KleidiAI accelerated int4 dynamic quant with BF16 activations on Arm CPUs (#33122)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-01-31 07:16:22 +00:00 |
|
AutumnAurelium
|
f3888aca83
|
Add EAGLE3 support for AFMoE (#33111)
Signed-off-by: AutumnAurelium <88015631+AutumnAurelium@users.noreply.github.com>
|
2026-01-31 06:53:08 +00:00 |
|
Dimitrios Bariamis
|
f0bca83ee4
|
Add support for Mistral Large 3 inference with Flashinfer MoE (#33174)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-01-30 22:48:27 -08:00 |
|
Matthias Gehre
|
73419abfae
|
[Bugfix] Handle Asym W4A16 (ConchLinearKernel) for CT (#33200)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-01-31 06:21:51 +00:00 |
|
Nicolò Lucchesi
|
e77f162cf5
|
[Bugfix] Fix Qwen3ASR language asr tag in output (#33410)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-31 05:24:49 +00:00 |
|
Yanan Cao
|
8ecd213c0b
|
[Kernel] [Helion] [2/N] Helion kernel wrapper (#32964)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-01-31 12:53:01 +08:00 |
|
Francesco Fusco
|
5b55c0bea7
|
[Attention] Clarify comment explaining attn_logits +1 dimension (#33427)
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
|
2026-01-31 04:50:30 +00:00 |
|
Patrick von Platen
|
15e0bb9c42
|
[Streaming -> Realtime] Rename all voxtral related classes, fn, files (#33415)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2026-01-31 04:49:00 +00:00 |
|
Micah Williamson
|
6c64c41b4a
|
[ROCm][CI] Force max_num_seqs=1 on ROCm In test_sharded_state_loader to reduce flakiness (#33277)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-31 12:28:29 +08:00 |
|
Russell Bryant
|
a2ef06e1b3
|
[Misc] offest -> offset in comments and variable names (#33444)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-01-30 20:19:22 -08:00 |
|
Lucas Wilkinson
|
0a3c71e7e5
|
[BugFix] Fix whisper FA2 + full cudagraphs (#33360)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-31 12:15:06 +08:00 |
|
Michael Goin
|
29fba76781
|
[UX] Use gguf repo_id:quant_type syntax for examples and docs (#33371)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-31 12:14:54 +08:00 |
|
Isotr0py
|
9df152bbf6
|
[Misc] Align Qwen3-VL-embedding image example outputs with HF repo example (#33419)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-30 19:36:56 -08:00 |
|
Nick Hill
|
876a16f4fb
|
[ModelRunner V2] Fix spec decoding + logprobs (#33391)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-31 03:33:26 +00:00 |
|
Matthew Bonanni
|
aaa901ad55
|
[Attention] Move MLA forward from backend to layer (#33284)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-30 19:30:00 -08:00 |
|
Wentao Ye
|
010ec0c30e
|
[Deprecation] Deprecate seed_everything and scatter_mm_placeholders in v0.15 (#33362)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-31 02:54:16 +00:00 |
|
Alberto Ferrer
|
64a40a7ab4
|
[Bugfix] Fix typo in read_offset variable name (#33426)
Signed-off-by: Alberto Ferrer <albertof@barrahome.org>
|
2026-01-31 01:26:15 +00:00 |
|
Gregory Shtrasberg
|
31aedfe7d6
|
[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-01-30 19:05:23 -06:00 |
|
Michael Goin
|
67ebaff528
|
Refactor NVFP4 Linear utils for ModelOpt and CT (#33201)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-30 16:37:42 -08:00 |
|
Chendi.Xue
|
2b465570e6
|
[CI][HPU]accelerate hpu test by skip python re-install and clean container name (#33286)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2026-01-30 21:36:29 +00:00 |
|
Huy Do
|
9ca66ecc10
|
Indicate compile mode in the benchmark results (#32990)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2026-01-30 15:34:36 -05:00 |
|
Pavani Majety
|
c3a9752b0c
|
[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2026-01-30 10:30:46 -08:00 |
|
xuebwang-amd
|
f451b4558b
|
[Quantization][ROCm] Fix MoE weight loading to be robust (Qwen3_MoE/Qwen3_next as example models) (#33173)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
2026-01-30 17:50:23 +00:00 |
|
Vasiliy Kuznetsov
|
3f96fcf646
|
fix QERL attention import path (#33432)
Signed-off-by: vasiliy <vasiliy@fb.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-30 09:29:09 -08:00 |
|
Yanan Cao
|
6c1f9e4c18
|
[Kernel] [Helion] [1/N] Add Helion ConfigManager (#32740)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-01-30 12:19:19 -05:00 |
|
Harry Mellor
|
67239c4c42
|
Fix encoder-decoder model disabling mm processor cache (#33236)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 16:30:10 +00:00 |
|
Nicolò Lucchesi
|
8ece60768f
|
[CI] Qwen3-ASR transcriptions tests (#33414)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-30 16:17:56 +00:00 |
|