Fynn Schmitt-Ulms
|
9433acb8df
|
[Spec Decode] Add hidden states extraction system (#33736)
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
|
2026-03-02 14:29:09 -05:00 |
|
Andreas Karatzas
|
ec27b36b4b
|
[CI] Defining extended V1 e2e + engine tests (#35580)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-02 08:10:54 +00:00 |
|
Ryan Rock
|
87d319c52f
|
[AMD][CI] Support Triton attention with ExampleConnector (#34931)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-03-01 09:58:07 +02:00 |
|
Itay Alroy
|
dea268336f
|
[1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-02-28 04:46:42 +00:00 |
|
Lucas Wilkinson
|
1d532f9d8f
|
[DP] Only use DP padding when cudagraphs are actually used (#34102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-27 15:14:31 -05:00 |
|
Huamin Li
|
157722da75
|
[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2026-02-28 01:50:37 +08:00 |
|
Nicolò Lucchesi
|
cabdaa7619
|
[Misc] Move GPUModelRunner.prepare_kernel_block_sizes to utils (#35400)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-27 11:42:51 +08:00 |
|
Yiliu Dong
|
d940607629
|
[Core] Support min_tokens with speculative decoding (#32642)
Signed-off-by: qianlihuang <yiliu.dong@qq.com>
Co-authored-by: qianlihuang <yiliu.dong@qq.com>
|
2026-02-26 12:31:28 -05:00 |
|
Andreas Karatzas
|
9571e99945
|
[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 14:16:18 -08:00 |
|
Benjamin Chislett
|
ee59a7c615
|
[Tests] Add GSM8k check to SpecDec E2E tests (#34772)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-25 07:51:14 -05:00 |
|
Chen Zhang
|
8fae54faff
|
[Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache (#35157)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2026-02-24 22:00:19 -08:00 |
|
Harry Mellor
|
f7967577f5
|
Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM (#35203)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-24 22:00:06 -08:00 |
|
Nick Hill
|
dbf0da817a
|
[Core] Cleanup engine pause/sleep logic (#34528)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-24 19:33:34 -08:00 |
|
Andreas Karatzas
|
067c5d9ad1
|
[ROCm][CI] Added MI325 mirrors (#34923)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-24 13:37:15 -08:00 |
|
Benjamin Chislett
|
f5972a872f
|
[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726)
Signed-off-by: Shahar Mor <smor@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Shahar Mor <smor@nvidia.com>
Co-authored-by: Roi Koren <roik@nvidia.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-24 09:49:56 -08:00 |
|
Rohan Potdar
|
2ff4e51152
|
[ROCm] AITER fused RoPE+KVCache (#33443)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Co-authored-by: charlifu <charlifu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
|
2026-02-23 19:06:00 -08:00 |
|
haosdent
|
a2ba6a5244
|
[Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) (#34874)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-02-23 17:31:51 +01:00 |
|
Andreas Karatzas
|
5f68464f92
|
[ROCm][CI] Fix spec decode profile assertion and logprob test determinism (#35043)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-23 05:05:54 -08:00 |
|
Nicolò Lucchesi
|
ab6f3487a6
|
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-21 01:34:57 -08:00 |
|
Andreas Karatzas
|
54254f7a61
|
[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends (#34599)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-20 20:25:23 -08:00 |
|
zhongdaor-nv
|
a0fe7ea2f0
|
[feat] Add per-block extra_keys to KV events (#33304)
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 20:11:40 -08:00 |
|
Xin Yang
|
7a5adad480
|
[Kernel] Optimize sample_recovered_tokens_kernel (#34974)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-20 19:59:06 -08:00 |
|
Lucas Wilkinson
|
aaefc58ee0
|
[CI] Revert PRs 34818 and 33600 (#34979)
|
2026-02-20 13:25:50 -08:00 |
|
Wei Zhao
|
f24b2de3d3
|
[Test] Add FP8 KV Cache Testing for MLA Backends (#34473)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-02-20 18:51:58 +00:00 |
|
rasmith
|
0c1dc42748
|
[CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass (#33739)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-19 21:32:40 -08:00 |
|
Matthew Bonanni
|
662205d34e
|
[Bugfix] Fix Basic Models Test (#34818)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-19 14:49:07 -08:00 |
|
Kyle Sayers
|
64ac1395e8
|
[Docs] Clean up speculators docs (#34065)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-02-18 13:48:11 -08:00 |
|
Jongseok Park
|
c656ba3b4d
|
[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538)
Signed-off-by: js_park <cakeng@naver.com>
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com>
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-17 23:14:30 +00:00 |
|
Cyrus Leung
|
574fe75245
|
[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-17 05:29:01 -08:00 |
|
junuxyz
|
c61a98f529
|
[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-17 12:22:56 +00:00 |
|
Ekagra Ranjan
|
cd81cdb399
|
[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs (#31058)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-16 11:08:44 +00:00 |
|
Thomas Parnell
|
d5fe3f702c
|
[Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2026-02-14 13:15:56 -08:00 |
|
Cyrus Leung
|
73391a1baa
|
[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-14 10:14:21 -08:00 |
|
Andreas Karatzas
|
b3c14229b0
|
[ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests (#34538)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-14 07:32:09 -08:00 |
|
Harry Huang
|
c027541eaf
|
[Hybrid] Enable spec decoding in mamba cache align mode (#33705)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-02-13 13:02:28 -08:00 |
|
Ben Browning
|
fd267bc7b7
|
[Bugfix]: Fix structured output in multi-turn gpt-oss (#34454)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-13 11:12:48 -08:00 |
|
Aaron Hao
|
dddbff4624
|
[Core] Move pause and resume functions into engine (#34125)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-13 00:15:10 -08:00 |
|
haosdent
|
4137c5dfa7
|
[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-02-13 00:13:22 -08:00 |
|
Cyrus Leung
|
ea5ff3a1f6
|
[Refactor] Simplify BOS/EOS token handling (#34435)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 18:18:24 -08:00 |
|
Matthew Bonanni
|
f2c47886fd
|
[Attention] Add FlashInfer Sparse MLA backend (#33451)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-02-12 17:21:54 +00:00 |
|
Cyrus Leung
|
fb455ed547
|
[V0 Deprecation] Remove code related to per-request logits processors (#34400)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 20:44:28 +08:00 |
|
Cyrus Leung
|
b96f7314b4
|
[Refactor] Pass Renderer to Input Processor (#34329)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-11 19:38:11 -08:00 |
|
Lucas Wilkinson
|
c7914d30f9
|
Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-11 07:07:56 -08:00 |
|
Cyrus Leung
|
b5dcb372e4
|
[Misc] Clean up validation logic in input processor (#34144)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-10 19:29:29 -08:00 |
|
Pavani Majety
|
578977bb5e
|
[SM100] Resubmit FMHA FP8 prefill for MLA (#31195)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2026-02-10 16:18:43 -05:00 |
|
junuxyz
|
c5a66d1697
|
[Core][BugFix] Fix PP KV cache sharding memory validation (#33698)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-10 10:46:24 -05:00 |
|
Krish Gupta
|
748625cdaf
|
[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220)
Signed-off-by: KrxGu <krishom70@gmail.com>
|
2026-02-10 13:05:32 +00:00 |
|
Chen Zhang
|
97fa8f6590
|
[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2026-02-10 07:41:16 +00:00 |
|
Roger Wang
|
8a5e0e2b2b
|
[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-10 13:03:32 +08:00 |
|
Cyrus Leung
|
48312e579a
|
[Misc] Make PlaceholderRange.get_num_embeds a method (#34035)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 05:30:17 +00:00 |
|