Lucia Fang
|
bab9231bf1
|
[Model] MTP fallback to eager for DeepSeek v32 (#25982)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:47:38 -07:00 |
|
qizixi
|
c214d699fd
|
[spec decode] Consolidate speculative decode method name for MTP (#25232)
Signed-off-by: zixi-qi <qizixi@meta.com>
|
2025-09-30 22:47:11 -07:00 |
|
Nicolò Lucchesi
|
d0b178cef1
|
[NIXL] Add support for MLA caches with different latent dim (#25902)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:36:24 -07:00 |
|
Yongye Zhu
|
b3230e1ac0
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-30 22:36:24 -07:00 |
|
Russell Bryant
|
32335c8b34
|
Add option to restrict media domains (#25783)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:32:55 -07:00 |
|
Russell Bryant
|
04c2b26972
|
Add filtering for chat template kwargs (#25794)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-27 23:32:55 -07:00 |
|
Jiangyun Zhu
|
56aafa8c0b
|
[Misc] fix unique_filepath (#25732)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-26 16:56:15 +00:00 |
|
Seiji Eicher
|
8d52f2b3a7
|
[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray (#25439)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
|
2025-09-26 09:43:30 -07:00 |
|
Cyrus Leung
|
db1e42f627
|
[CI/Build] Fix some V1 tests not being run (#25569)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 20:52:36 +08:00 |
|
Cyrus Leung
|
bc9d7b5595
|
[CI/Build] Split up Distributed Tests (#25572)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 14:49:33 +02:00 |
|
wang.yuqi
|
fe6b19c314
|
[Bugfix] Properly abort pooling request. (#25734)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-26 05:47:34 -07:00 |
|
Chauncey
|
2827b3f4a3
|
[CI] Fix test_shared_storage_connector_hashes (#25748)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-26 20:46:17 +08:00 |
|
Chih-Chieh Yang
|
2b6b1d7809
|
[Model] Mamba2 varlen refactor (#21467)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
|
2025-09-26 11:31:14 +00:00 |
|
Eugene Khvedchenya
|
392edee34a
|
EVS Support (Video tokens pruning) (#22980)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-26 11:54:54 +08:00 |
|
tomeras91
|
57329a8c01
|
[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 (#25708)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-09-25 16:10:29 -07:00 |
|
Ekagra Ranjan
|
e71b8e210d
|
[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-25 15:22:03 -07:00 |
|
Cyrus Leung
|
6b0fcbbf43
|
[Misc] Simplify test_argsort_mm_positions (#25690)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 18:23:01 +00:00 |
|
Matthew Bonanni
|
3468f17ebe
|
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-25 17:37:50 +00:00 |
|
Cyrus Leung
|
0ea80c87d9
|
[Model] Define merge_by_field_config MM interface (#25676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 17:13:07 +00:00 |
|
Nicolò Lucchesi
|
0754ac4c49
|
[Misc] Remove cruft file in repo (#25678)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-25 08:05:12 -07:00 |
|
Isotr0py
|
03858e6d1c
|
[Bugfix] Fix InternS1 video processing after Transformers v4.56 (#25644)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-25 14:46:04 +00:00 |
|
Cyrus Leung
|
2f17117606
|
[mypy] Fix wrong type annotations related to tuple (#25660)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 13:00:45 +00:00 |
|
Cyrus Leung
|
0bcc3a160d
|
[CI/Build] Fix flaky entrypoints test (#25663)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 12:19:40 +00:00 |
|
wang.yuqi
|
7f570f1caa
|
[V0 deprecation] Remove unreachable model_config.supported_tasks (#25642)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-25 11:26:31 +00:00 |
|
Agata Dobrzyniewicz
|
3c2b2ccece
|
[Bugfix] Add triton.language.tensor placeholder (#25649)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
|
2025-09-25 10:31:14 +00:00 |
|
Tyler Michael Smith
|
1260180c67
|
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-25 08:05:21 +00:00 |
|
Jacob Kahn
|
bc092ea873
|
Map CwmForCausalLM to llama and LlamaForCausalLM (#25611)
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-25 07:37:03 +00:00 |
|
XuruiYang
|
845adb3ec6
|
[Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
|
2025-09-24 21:53:40 -07:00 |
|
Wei Wei
|
05c19485a5
|
[Kernel] Support DCP for Triton backend (#25132)
Signed-off-by: Wei Wei <wwei6@meta.com>
|
2025-09-24 18:09:34 -07:00 |
|
Wentao Ye
|
1f29141258
|
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-24 18:52:36 -04:00 |
|
Shu Wang
|
54e42b72db
|
Support mnnvl all2allv from Flashinfer (#21003)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-24 14:38:16 -04:00 |
|
Tao Hui
|
e18b714b2e
|
[Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools in non-streaming output (#25405)
Signed-off-by: taohui <taohui3@gmail.com>
|
2025-09-24 20:58:00 +08:00 |
|
Jonas M. Kübler
|
58c360d9be
|
[Bug] fix import and unit test (#25558)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
|
2025-09-24 10:17:59 +00:00 |
|
Woosuk Kwon
|
2e19a848d4
|
[V0 Deprecation] Remove max_seq_len_to_capture (#25543)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-24 01:51:39 -07:00 |
|
Cyrus Leung
|
6488f3481b
|
[Misc]] Move processing context to multimodal directory (#25548)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-24 08:15:00 +00:00 |
|
Isotr0py
|
27ec3c78f3
|
[CI/Build] Fix v1 OOT registration test (#25547)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-24 08:03:13 +00:00 |
|
Chengji Yao
|
190c45a6af
|
[TPU][Bugfix] fix the missing apply_model in tpu worker (#25526)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-09-24 05:18:08 +00:00 |
|
Ben Browning
|
5caaeb714c
|
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls (#25514)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2025-09-24 03:20:38 +00:00 |
|
Benjamin Chislett
|
c30b405b8f
|
[Spec Decode] Enable FlashInfer Spec Decoding (#25196)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: lhsjohn <huashuoli@tencent.com>
|
2025-09-23 22:29:58 -04:00 |
|
0xNullPath
|
be0bb568c9
|
[Model] Support SeedOss Reason Parser (#24263)
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 18:15:51 -06:00 |
|
ahao-anyscale
|
c8bde93367
|
[BUG] Allows for RunAI Streamer and Torch.compile cache to be used together (#24922)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-09-23 18:13:32 -06:00 |
|
Doug Smith
|
7ad5e50adf
|
Improve output when failing json.loads() on structured output test (#25483)
Signed-off-by: dougbtv <dosmith@redhat.com>
|
2025-09-23 18:03:31 -06:00 |
|
kourosh hakhamaneshi
|
abad204be6
|
[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting (#25359)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-09-23 15:49:09 -07:00 |
|
Andrew Xia
|
95bc60e4cb
|
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2025-09-23 15:46:46 -07:00 |
|
Thomas Parnell
|
969b4da3a6
|
[V0 Deprecation] Remove placeholder attn (#25510)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-09-23 22:12:14 +00:00 |
|
Jialin Ouyang
|
4f8c4b890a
|
[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-23 15:11:14 -07:00 |
|
Isotr0py
|
ae002924e9
|
[CI/Build] Fix and re-enable v1 PP test on CI (#25496)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-23 21:58:25 +00:00 |
|
Ilya Markov
|
8bdd8b5c51
|
Enable symmetric memory all reduce by default only enabling for TP (#25070)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-23 15:53:00 -04:00 |
|
jiahanc
|
d5944d5146
|
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-09-23 15:44:35 -04:00 |
|
ElizaWszola
|
63400259d0
|
[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-09-23 12:03:10 -07:00 |
|