Daniele
|
f377333bd7
|
[Misc] add usedforsecurity=False in md5 hash call (#26357)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
|
2025-10-08 10:18:32 +00:00 |
|
Wentao Ye
|
f8607863d8
|
[Feature] Enable E8M0 by Default on Hopper for DeepGEMM, 5% E2E throughput improvement (#26197)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-08 15:33:56 +08:00 |
|
Utkarsh Sharma
|
335b28f7d1
|
[TPU] Rename tpu_commons to tpu_inference (#26279)
Signed-off-by: Utkarsh Sharma <utksharma@google.com>
Co-authored-by: Utkarsh Sharma <utksharma@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-10-07 23:30:52 -07:00 |
|
Ayush Satyam
|
5e65d6b2ad
|
fix[DP][v1]: Prevent hangs from mismatched worker configurations (#26218)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
|
2025-10-07 22:55:08 -07:00 |
|
Cyrus Leung
|
0d4f48fa10
|
[Bugfix] Incorrect MM data format in vllm bench throughput (#26395)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-08 13:52:19 +08:00 |
|
Barry Kang
|
127c8b782a
|
Add gather_indexer_k_quant_cache kernel (#25931)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-10-08 04:58:57 +00:00 |
|
Ayush Satyam
|
cd9890544b
|
fix(v1/kv_cache): resolve async KV transfer bug in cascade attention (#23485)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
|
2025-10-08 04:46:33 +00:00 |
|
Nick Hill
|
067da2d1df
|
[Core] Simplify setting new_token_ids in CachedRequestData (#26388)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-08 03:32:37 +00:00 |
|
isharif168
|
046118b938
|
Add SwigluOAI implementation for CPUFusedMOE (#26347)
Signed-off-by: Sharif Inamdar <sharif.inamdar@arm.com>
|
2025-10-07 20:17:49 -06:00 |
|
liangel-02
|
b32260ab85
|
[torchao] safetensors integration (#25969)
Signed-off-by: Angel Li <liangel@meta.com>
|
2025-10-07 20:12:35 -06:00 |
|
Lucas Wilkinson
|
f80e7866c0
|
[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-10-08 10:09:34 +08:00 |
|
Thomas Parnell
|
31a4b3e6c4
|
Revert #24446 and #26168 (#26332)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-10-07 16:38:19 -06:00 |
|
Benjamin Chislett
|
caf8b1c084
|
[Bugfix] Fix MTP+FlashInfer crash when trtllm kernels are available but disabled (#26361)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-07 22:12:26 +00:00 |
|
Michael Goin
|
1b86bd8e18
|
Add more libraries to rlhf.md (#26374)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-07 20:59:41 +00:00 |
|
Johnny Yang
|
59012df99b
|
[TPU] update TPU benchmark threshold (#25713)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
|
2025-10-07 13:53:09 -07:00 |
|
Benjamin Chislett
|
3d1f67616d
|
[Spec Decode] Enable efficient speculative decoding with FlashInfer-MLA (#25984)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-07 16:05:59 -04:00 |
|
Sergei Skvortsov
|
6ebaf43ee4
|
[V1] Logit processors for rejection sampler (#19482)
Signed-off-by: southfreebird <yvorott@gmail.com>
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Signed-off-by: Sergei Skvortsov <yvorott@gmail.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-10-07 13:02:49 -07:00 |
|
Morrison Turnansky
|
0c824fc46f
|
[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26113)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2025-10-07 12:53:43 -07:00 |
|
Pei-Lun Liao
|
eb577e4655
|
[Bugfix] Add missing sink tensor into flash attn cascade attn implementation (#26325)
|
2025-10-07 18:56:39 +00:00 |
|
Wentao Ye
|
8f36850f73
|
[Bug] Fix Shape Validation for Fallback while Enabling E8M0 for DeepGEMM (#26322)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-07 13:50:30 -04:00 |
|
Chen Zhang
|
29fd2662ba
|
[deepseek] add EP8 FusedMOE config for H200 and B200 (#26331)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-10-07 10:38:54 -07:00 |
|
Michael Goin
|
30a3e5af69
|
[CI] Add Qwen3 MoE NVFP4 to Blackwell lm-eval (#26316)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-07 10:36:15 -07:00 |
|
fxmarty-amd
|
a38c1bfe09
|
[ci] Rename test_mxfp4_moe.py to test_ocp_mx_moe.py (#26364)
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
|
2025-10-07 09:52:24 -07:00 |
|
Paul Pak
|
320feae6f5
|
[Model] Lfm2Moe (#26344)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
|
2025-10-07 16:03:05 +00:00 |
|
Cyrus Leung
|
1e4ecca1d0
|
[V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 15:42:31 +00:00 |
|
Cyrus Leung
|
c0a7b89d8e
|
[Misc] Move LRUCache into its own file (#26342)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 15:08:40 +00:00 |
|
antrec
|
6f59beaf0b
|
[Model] Add support for ModernBertForTokenClassification (#26340)
Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
Signed-off-by: antrec <antoine.recanati@gmail.com>
Co-authored-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-07 14:29:19 +00:00 |
|
fxmarty-amd
|
41f1cf38f2
|
[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 (#21166)
|
2025-10-07 09:35:26 -04:00 |
|
Isotr0py
|
08d26a1b7e
|
[Model] Use merge_by_field_config for MM models (Ovis family) (#26308)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-07 12:54:22 +00:00 |
|
fhl2000
|
63773a6200
|
[Docs] add docs for cuda graph v1 (#24374)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-07 05:25:05 -07:00 |
|
Sergio Paniego Blanco
|
883b42896a
|
Add TRL example notebook to RLHF docs (#26346)
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
|
2025-10-07 11:31:28 +00:00 |
|
Daniel Cámpora
|
e1098ced95
|
Add topk logits torch op for DS3.2. (#25945)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-10-07 10:07:32 +00:00 |
|
Grant Holmes (Ren)
|
d100d78eb3
|
Optimize KV cache distribution for asymmetric pipeline parallelism (#25164)
Signed-off-by: gholmes829 <g.holmes429@gmail.com>
|
2025-10-07 09:20:30 +00:00 |
|
Cyrus Leung
|
7e4cd070b0
|
[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts (#26336)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 16:46:44 +08:00 |
|
Snehlata
|
46b0779996
|
[BugFix] Update KV block hash type from BlockHash to ExternalBlockHash in kv_events_subscriber - #26264 (#26265)
Signed-off-by: atalhens <sneh.lata@nutanix.com>
|
2025-10-07 08:42:28 +00:00 |
|
Ayush Satyam
|
de342585ff
|
[Model] Define merge_by_field_config MM interface (R-T) (#26260)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 16:10:55 +08:00 |
|
Andrew Xia
|
185d8ed44f
|
[responsesAPI][bugfix] serialize harmony messages (#26185)
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-10-07 07:07:53 +00:00 |
|
Cyrus Leung
|
d9836d4517
|
[Deprecation] Deprecate LLM.set_tokenizer (#26333)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 06:50:57 +00:00 |
|
Ayush Satyam
|
5f7e8a916a
|
[Model] Define merge_by_field_config MM interface (U-Z) (#26261)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 06:45:49 +00:00 |
|
ahao-anyscale
|
4dbdf4a294
|
[BUG] Fix file parsing for load_format runai_streamer_sharded (#26324)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-10-07 11:23:07 +08:00 |
|
Michael Goin
|
c6873c4e6d
|
[UX] Support nested dicts in hf_overrides (#25727)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-07 11:19:16 +08:00 |
|
Sage Moore
|
2111b4643c
|
[Core] Simplify the Dp padding/should ubatch coordination logic (#25768)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-10-07 01:57:49 +00:00 |
|
Sage Moore
|
c50901f3b9
|
[Docs][DBO] Add initial doc that describes the DBO implementation (#26024)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-10-07 00:47:28 +00:00 |
|
Simon Mo
|
8229280a9c
|
[Misc] Define EP kernel arch list in Dockerfile (#25635)
Signed-off-by: Simon Mo <simon.mo@hey.com>
|
2025-10-07 00:05:33 +00:00 |
|
Benjamin Chislett
|
f77df94647
|
[Perf] Add decode full-graph support to FlashInfer-MLA backend (#26313)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-06 23:03:49 +00:00 |
|
Gregory Shtrasberg
|
f231e5bc21
|
[ROCm] Split AITER unified attention into its own backend (#25507)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-10-06 22:49:23 +00:00 |
|
Benjamin Chislett
|
2161efe978
|
[Bugfix] Allow skipping MoE in NVFP4 (fix for MTP) (#25987)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-06 16:16:30 -04:00 |
|
Varun Sundar Rabindranath
|
f23b4c04fd
|
[BugFix] Pad input buffers in _dummy_run (#26209)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-06 16:07:51 -04:00 |
|
Varun Sundar Rabindranath
|
93540958b8
|
[Docs] Fix broken table in moe_kernel_features doc (#26314)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-06 15:58:05 -04:00 |
|
Cyrus Leung
|
44b9af5bb2
|
[Benchmark] Enable MM Embedding benchmarks (#26310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-06 19:51:58 +00:00 |
|