Zhonghua Deng
|
969bbc7c61
|
[Model] Add MiMo-V2-Flash support (#30836)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-19 17:17:03 +00:00 |
|
Elizabeth Thomas
|
41b6f9200f
|
Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-18 19:46:28 +00:00 |
|
Lucas Wilkinson
|
30bb19a760
|
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-17 23:50:15 -08:00 |
|
Zhengxu Chen
|
5f2f3fba1d
|
[compile] Fix CI for test_gpt2_cache_hit (#30902)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-17 20:22:23 -08:00 |
|
SungMinCho
|
a0b782f9cc
|
[Metrics] Model FLOPs Utilization estimation (#30738)
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-12-18 01:40:51 +00:00 |
|
Boyuan Feng
|
104003dc77
|
update piecewise cudagraph warning when splitting_ops=[] (#30728)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-12-16 06:09:34 -08:00 |
|
jiangkuaixue123
|
b9ff4f2a8d
|
[feature] extend DBO to XBO (#30120)
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
|
2025-12-16 00:04:01 -05:00 |
|
Michael Goin
|
a450c64a30
|
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args (#30708)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-15 20:18:02 +00:00 |
|
Harry Mellor
|
970713d4a4
|
Remove SkipValidation from ModelConfig (#30695)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-15 17:34:08 +00:00 |
|
Nicolò Lucchesi
|
185c22bf2f
|
[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector (#29805)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-15 11:17:58 +00:00 |
|
wang.yuqi
|
4429d934de
|
[Model] Automatic conversion of TokenClassification model (#30666)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-15 08:13:00 +00:00 |
|
Boyuan Feng
|
917fdae5b2
|
[Log] Skip piecewise cudagraph warn when using full cudagraph (#30657)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-12-15 02:49:45 +00:00 |
|
yifant-code
|
5ccf0efa84
|
[Bugfix] Improve error messages in ModelConfig validation (#30213)
Signed-off-by: ytian218 <ytian218@bloomberg.net>
Co-authored-by: ytian218 <ytian218@bloomberg.net>
|
2025-12-14 21:23:37 +08:00 |
|
Nicolò Lucchesi
|
0efd9f867c
|
[Core] Whisper Enable Encoder Batching (#29421)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-11 21:06:51 +00:00 |
|
Harry Mellor
|
cf3eacfe58
|
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 20:45:23 +00:00 |
|
Qiu
|
a11f4a81e0
|
[Misc][PCP&DCP] relocate PCP feature check (#30050)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-11 03:36:18 -08:00 |
|
wang.yuqi
|
a5f9fb5960
|
[Deprecation] Deprecation --convert reward, use --convert embed instead. (#30463)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-11 10:18:25 +00:00 |
|
Cyrus Leung
|
7e24e5d4d6
|
[Deprecation] Remove deprecated task, seed and MM settings (#30397)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 19:59:39 -08:00 |
|
Cyrus Leung
|
5a87d8b9b1
|
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 19:59:35 -08:00 |
|
Will Eaton
|
a9e4106f28
|
[P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-10 11:00:52 -08:00 |
|
Nicolò Lucchesi
|
c756fb6781
|
[Core] Whisper enable FULL_DECODE_ONLY CudaGraph (#30072)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-10 06:14:24 -08:00 |
|
PatrykSaffer
|
4c2e10ea19
|
[Bugfix] Fix cuda graph sizes when running with speculative decoding (#30330)
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
|
2025-12-10 00:47:07 +00:00 |
|
Benjamin Chislett
|
e858bfe051
|
[Cleanup] Refactor profiling env vars into a CLI config (#29912)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-09 13:29:33 -05:00 |
|
Laith Sakka
|
87aee9ed2b
|
Add evaluate_guards option to DynamicShapesConfig (#27432)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-08 10:46:15 -05:00 |
|
wang.yuqi
|
9e77ffca3f
|
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-08 08:10:09 +00:00 |
|
Isotr0py
|
b952f4d3c3
|
[v1] Add PrefixLM support to FlexAttention backend (#27938)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-07 15:51:36 +00:00 |
|
Cyrus Leung
|
e83b7e379c
|
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
|
2025-12-07 00:00:22 -08:00 |
|
Cyrus Leung
|
27f4c2fd46
|
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-06 23:15:42 -08:00 |
|
Wentao Ye
|
17eb25e327
|
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-07 04:44:50 +00:00 |
|
Nick Hill
|
4026ae31e9
|
[Misc] Move disable_nccl_for_dp_synchronization init logic into VllmConfig (#30161)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-05 20:59:04 -08:00 |
|
Rohan Potdar
|
40a046cd82
|
[Bugfix]: Fix TokenizerLike interface (#30009)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2025-12-05 20:56:40 -08:00 |
|
Harry Mellor
|
bf4a901af9
|
Better error when world size is larger than node and distributed_executor_backend is not set (#30140)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-05 20:53:52 -08:00 |
|
Bangsheng Tang
|
77e4472809
|
let draft model follow target model's config_format (#30152)
|
2025-12-05 13:33:42 -08:00 |
|
Ilya Markov
|
4e26d3b09e
|
[Compile] Conditional compilation. Introduce compile_ranges (#24252)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
|
2025-12-05 18:17:32 +00:00 |
|
Matthew Bonanni
|
66e674cdd5
|
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-12-05 09:48:43 -08:00 |
|
Alec S
|
2c174420f5
|
Reduce validation to a warning (#28749)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-05 14:02:49 +00:00 |
|
Max Hu
|
c2894d3883
|
[Feature] Add Layer-wise NVTX Support (#29990)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
|
2025-12-05 11:20:07 +00:00 |
|
amitz-nv
|
6038b1b04b
|
[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
|
2025-12-05 00:34:33 -08:00 |
|
Qiu
|
0098a6e3da
|
[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-04 21:40:51 -05:00 |
|
Mercykid-bash
|
1119f6e47a
|
Abstract eplb algo (#26471)
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Che Ruan <cr623@ic.ac.uk>
Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-04 19:09:09 +00:00 |
|
wang.yuqi
|
74c4d80c6c
|
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-04 13:44:15 +00:00 |
|
Arpit Khandelwal
|
dfdda96747
|
[Core] Remove forced None assignment for deprecated PassConfig flags (#29994)
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-04 09:15:04 +00:00 |
|
Xieyang Xu
|
ad32e3e19c
|
enable multi-node in external launcher mode (#29833)
|
2025-12-03 17:02:02 -08:00 |
|
Lumis Chen
|
9bcf92295a
|
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-12-03 16:06:57 +00:00 |
|
Chauncey
|
b78772c433
|
[Frontend] supports deepseekv32 chat template (#29837)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-03 20:53:44 +08:00 |
|
Yong Hoon Shin
|
69520bc695
|
Add logging for cudagraph related info (#29825)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-12-03 01:01:48 -08:00 |
|
Arpit Khandelwal
|
d7284a2604
|
[Core] Rename PassConfig flags as per RFC #27995 (#29646)
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-12-03 03:38:55 +00:00 |
|
Isotr0py
|
63b1da76ba
|
[Chore]: Reorganize gguf utils funtions under transformers_utils (#29891)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-02 17:33:23 +00:00 |
|
Harry Mellor
|
951445a52d
|
Remove default values from InitVars so that they're not stored (#29859)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-02 12:16:37 +00:00 |
|
Julien Denize
|
d8c6210eea
|
Add Mistral Large 3 and Ministral 3 (#29757)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
|
2025-12-02 10:29:00 +00:00 |
|