Michael Goin
|
30a3e5af69
|
[CI] Add Qwen3 MoE NVFP4 to Blackwell lm-eval (#26316)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-07 10:36:15 -07:00 |
|
fxmarty-amd
|
a38c1bfe09
|
[ci] Rename test_mxfp4_moe.py to test_ocp_mx_moe.py (#26364)
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
|
2025-10-07 09:52:24 -07:00 |
|
Paul Pak
|
320feae6f5
|
[Model] Lfm2Moe (#26344)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
|
2025-10-07 16:03:05 +00:00 |
|
Cyrus Leung
|
1e4ecca1d0
|
[V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 15:42:31 +00:00 |
|
Cyrus Leung
|
c0a7b89d8e
|
[Misc] Move LRUCache into its own file (#26342)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 15:08:40 +00:00 |
|
antrec
|
6f59beaf0b
|
[Model] Add support for ModernBertForTokenClassification (#26340)
Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
Signed-off-by: antrec <antoine.recanati@gmail.com>
Co-authored-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-07 14:29:19 +00:00 |
|
fxmarty-amd
|
41f1cf38f2
|
[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 (#21166)
|
2025-10-07 09:35:26 -04:00 |
|
Isotr0py
|
08d26a1b7e
|
[Model] Use merge_by_field_config for MM models (Ovis family) (#26308)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-07 12:54:22 +00:00 |
|
fhl2000
|
63773a6200
|
[Docs] add docs for cuda graph v1 (#24374)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-07 05:25:05 -07:00 |
|
Sergio Paniego Blanco
|
883b42896a
|
Add TRL example notebook to RLHF docs (#26346)
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
|
2025-10-07 11:31:28 +00:00 |
|
Daniel Cámpora
|
e1098ced95
|
Add topk logits torch op for DS3.2. (#25945)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-10-07 10:07:32 +00:00 |
|
Grant Holmes (Ren)
|
d100d78eb3
|
Optimize KV cache distribution for asymmetric pipeline parallelism (#25164)
Signed-off-by: gholmes829 <g.holmes429@gmail.com>
|
2025-10-07 09:20:30 +00:00 |
|
Cyrus Leung
|
7e4cd070b0
|
[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts (#26336)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 16:46:44 +08:00 |
|
Snehlata
|
46b0779996
|
[BugFix] Update KV block hash type from BlockHash to ExternalBlockHash in kv_events_subscriber - #26264 (#26265)
Signed-off-by: atalhens <sneh.lata@nutanix.com>
|
2025-10-07 08:42:28 +00:00 |
|
Ayush Satyam
|
de342585ff
|
[Model] Define merge_by_field_config MM interface (R-T) (#26260)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 16:10:55 +08:00 |
|
Andrew Xia
|
185d8ed44f
|
[responsesAPI][bugfix] serialize harmony messages (#26185)
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-10-07 07:07:53 +00:00 |
|
Cyrus Leung
|
d9836d4517
|
[Deprecation] Deprecate LLM.set_tokenizer (#26333)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 06:50:57 +00:00 |
|
Ayush Satyam
|
5f7e8a916a
|
[Model] Define merge_by_field_config MM interface (U-Z) (#26261)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 06:45:49 +00:00 |
|
ahao-anyscale
|
4dbdf4a294
|
[BUG] Fix file parsing for load_format runai_streamer_sharded (#26324)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-10-07 11:23:07 +08:00 |
|
Michael Goin
|
c6873c4e6d
|
[UX] Support nested dicts in hf_overrides (#25727)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-07 11:19:16 +08:00 |
|
Sage Moore
|
2111b4643c
|
[Core] Simplify the Dp padding/should ubatch coordination logic (#25768)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-10-07 01:57:49 +00:00 |
|
Sage Moore
|
c50901f3b9
|
[Docs][DBO] Add initial doc that describes the DBO implementation (#26024)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-10-07 00:47:28 +00:00 |
|
Simon Mo
|
8229280a9c
|
[Misc] Define EP kernel arch list in Dockerfile (#25635)
Signed-off-by: Simon Mo <simon.mo@hey.com>
|
2025-10-07 00:05:33 +00:00 |
|
Benjamin Chislett
|
f77df94647
|
[Perf] Add decode full-graph support to FlashInfer-MLA backend (#26313)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-06 23:03:49 +00:00 |
|
Gregory Shtrasberg
|
f231e5bc21
|
[ROCm] Split AITER unified attention into its own backend (#25507)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-10-06 22:49:23 +00:00 |
|
Benjamin Chislett
|
2161efe978
|
[Bugfix] Allow skipping MoE in NVFP4 (fix for MTP) (#25987)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-06 16:16:30 -04:00 |
|
Varun Sundar Rabindranath
|
f23b4c04fd
|
[BugFix] Pad input buffers in _dummy_run (#26209)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-06 16:07:51 -04:00 |
|
Varun Sundar Rabindranath
|
93540958b8
|
[Docs] Fix broken table in moe_kernel_features doc (#26314)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-06 15:58:05 -04:00 |
|
Cyrus Leung
|
44b9af5bb2
|
[Benchmark] Enable MM Embedding benchmarks (#26310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-06 19:51:58 +00:00 |
|
Raushan Turganbay
|
7cd95dc8a3
|
[Bugfix] Fix gemma3 with transformers backend (#23178)
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Raushan Turganbay <raushan@huggingface.co>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-06 18:42:32 +00:00 |
|
Crefeda Rodrigues
|
c02058c222
|
Add bias handling to CPUFusedMOE kernel (#26289)
Signed-off-by: Crefeda Rodrigues <crefeda.rodrigues@arm.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Crefeda Rodrigues <65665931+cfRod@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Sharif Inamdar <Sharif.Inamdar@arm.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-10-06 18:39:10 +00:00 |
|
7mile
|
b2ea5ba677
|
[Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs (#26231)
Signed-off-by: seven-mile <i@7li.moe>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-06 18:24:22 +00:00 |
|
Karan Goel
|
824a3f403f
|
[Misc] auto_tune: kill specific vllm process (#26304)
Signed-off-by: Karan Goel <karangoel@google.com>
|
2025-10-06 18:02:51 +00:00 |
|
Rahul Tuli
|
05f6846ede
|
Support llama3 eagle3 head with llama4 verifier (#25961)
Signed-off-by: rahul-tuli <rtuli@redhat.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
|
2025-10-06 13:56:08 -04:00 |
|
Michael Goin
|
20db99cc69
|
[CI Bugfix] Make sure TRTLLM attention is available in test_blackwell_moe (#26188)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-06 13:50:11 -04:00 |
|
Yannick Schnider
|
6431be808f
|
[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-06 17:19:34 +00:00 |
|
Matthew Bonanni
|
4727a8afa7
|
[Attention] Remove unused reorder_batch method (#24463)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-06 13:13:39 -04:00 |
|
tomeras91
|
b8f603cebe
|
[Model] EVS support for nano_nemotron_vl (#26269)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
|
2025-10-07 00:23:37 +08:00 |
|
Chatcharin Sangbutsarakum
|
fc679696f8
|
Fix DotsOCR tensor type (#26281)
Signed-off-by: what_in_the_nim <chatcharinsang@gmail.com>
|
2025-10-06 12:23:43 +00:00 |
|
Raushan Turganbay
|
ab5e7d93f4
|
[Bugfix] Fix mrope in Transformers Backend (#26087)
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-06 11:40:50 +00:00 |
|
Harry Mellor
|
0340f45553
|
Support expert parallel load balancing in Transformers backend (#26287)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-06 11:20:16 +00:00 |
|
Cyrus Leung
|
19a00eb210
|
[Model] Use merge_by_field_config for MM models (Llava family) (#26280)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-06 09:45:26 +00:00 |
|
Cyrus Leung
|
391612e78b
|
[Frontend] Consolidate tokenizer init code (#26276)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-06 09:34:52 +00:00 |
|
abhisheksheth28
|
77c95f72f7
|
[Doc] add KAITO to integrations (#25521)
Signed-off-by: "Abhishek Sheth" <absheth@microsoft.com>
|
2025-10-06 17:30:03 +08:00 |
|
Aritra Roy Gosthipaty
|
59f30d0448
|
[Docs] Edit HF Inference Endpoints documentation (#26275)
Signed-off-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>
|
2025-10-06 10:13:09 +01:00 |
|
Roger Wang
|
43c146ca42
|
[Misc] Clean up unnecessary E501 ignore (#26274)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-10-06 07:29:18 +00:00 |
|
Yasmin Moslem
|
7c2ec0fe87
|
[Benchmarking] Add disable_shuffle option for dataset loading (#26258)
Signed-off-by: Yasmin Moslem <48152713+ymoslem@users.noreply.github.com>
|
2025-10-06 07:05:44 +00:00 |
|
dependabot[bot]
|
039b6bade3
|
Bump actions/stale from 10.0.0 to 10.1.0 (#26272)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2025-10-06 07:01:21 +00:00 |
|
Harry Mellor
|
6c04638214
|
Fix per file ruff ignores related to line length (#26262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-06 05:12:40 +00:00 |
|
wuhang
|
91ac7f764d
|
[CI][gpt-oss] Enable python tool tests in CI (#24315)
Signed-off-by: wuhang <wuhang6@huawei.com>
|
2025-10-06 04:20:06 +00:00 |
|