Andreas Karatzas
|
ec27b36b4b
|
[CI] Defining extended V1 e2e + engine tests (#35580)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-02 08:10:54 +00:00 |
|
Huamin Li
|
157722da75
|
[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2026-02-28 01:50:37 +08:00 |
|
Yiliu Dong
|
d940607629
|
[Core] Support min_tokens with speculative decoding (#32642)
Signed-off-by: qianlihuang <yiliu.dong@qq.com>
Co-authored-by: qianlihuang <yiliu.dong@qq.com>
|
2026-02-26 12:31:28 -05:00 |
|
Andreas Karatzas
|
9571e99945
|
[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 14:16:18 -08:00 |
|
Benjamin Chislett
|
ee59a7c615
|
[Tests] Add GSM8k check to SpecDec E2E tests (#34772)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-25 07:51:14 -05:00 |
|
Thomas Parnell
|
d5fe3f702c
|
[Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2026-02-14 13:15:56 -08:00 |
|
Cyrus Leung
|
73391a1baa
|
[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-14 10:14:21 -08:00 |
|
Harry Huang
|
c027541eaf
|
[Hybrid] Enable spec decoding in mamba cache align mode (#33705)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-02-13 13:02:28 -08:00 |
|
Cyrus Leung
|
b96f7314b4
|
[Refactor] Pass Renderer to Input Processor (#34329)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-11 19:38:11 -08:00 |
|
Benjamin Chislett
|
af3162d3aa
|
[Spec Decode] Unified Parallel Drafting (#32887)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-05 12:37:18 -05:00 |
|
Nick Hill
|
add9f1fbd9
|
[Minor] Include StreamingInput in inputs package (#33856)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-05 04:38:20 +00:00 |
|
Harry Mellor
|
61e632aea1
|
Turn @config into a dataclass_transform (#31541)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-03 17:40:59 +00:00 |
|
Patrick von Platen
|
10152d2194
|
[Realtime API] Adds minimal realtime API based on websockets (#33187)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-30 18:41:29 +08:00 |
|
Cyrus Leung
|
c25dbee40d
|
[Model] Bump transformers version for test registry (#33100)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-26 18:53:22 +00:00 |
|
Joshua Deng
|
91601ff478
|
[Feature] add session based streaming input support to v1 (#28973)
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-24 12:06:28 -08:00 |
|
Harry Huang
|
5206e5e28c
|
[V1][Hybrid] Mamba Prefix Caching with align mode (#30877)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2026-01-23 09:56:48 -08:00 |
|
Micah Williamson
|
019e2c3b7c
|
[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-22 05:47:33 +00:00 |
|
Tomas Ruiz
|
4a5299c93f
|
feat: spec decode with draft models (#24322)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
|
2026-01-19 16:05:46 -05:00 |
|
Matthew Bonanni
|
20228cb851
|
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-12 09:13:56 -08:00 |
|
Andreas Karatzas
|
e02706d2d2
|
[ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling (#32000)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-09 20:48:32 +08:00 |
|
zhrrr
|
8ff4a99566
|
[Async][Feat] support apply penalty or bad_words for async + spec (#30495)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 02:31:50 +00:00 |
|
Benjamin Chislett
|
f7008ce1c4
|
[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-06 18:50:37 +00:00 |
|
Wentao Ye
|
af9a7ec255
|
[Bug] Revert torch warning fix (#31585)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-05 22:31:21 +00:00 |
|
Micah Williamson
|
fd8afdf38d
|
[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-18 10:27:37 +08:00 |
|
Matthew Bonanni
|
7eb6cb6c18
|
[Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-17 09:49:59 -08:00 |
|
Andreas Karatzas
|
783644e4ac
|
[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-12 03:54:56 +00:00 |
|
Harry Mellor
|
8781cd6b88
|
Add Eagle and Eagle3 support to Transformers modeling backend (#30340)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 17:02:10 +00:00 |
|
Wentao Ye
|
d6464f2679
|
[Chore] Fix torch precision warning (#30428)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-11 04:05:56 +00:00 |
|
Andreas Karatzas
|
ed7af3178a
|
[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e the test group (#29358)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-10 05:33:13 +00:00 |
|
Micah Williamson
|
7d80c73d42
|
[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-10 02:35:49 +00:00 |
|
Lucas Wilkinson
|
abe93bce59
|
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-12-09 17:18:10 -08:00 |
|
Wentao Ye
|
83319b44c2
|
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled (#29897)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-09 10:40:37 -05:00 |
|
Charlie Fu
|
6af70e11a0
|
[ROCm][CI] Fix test_max_len.py for Rocm (#29916)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
|
2025-12-08 16:58:30 -05:00 |
|
Divakar Verma
|
962d703818
|
[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-12-05 19:57:26 +00:00 |
|
Divakar Verma
|
afb1e5b380
|
[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-12-02 20:46:10 +00:00 |
|
Divakar Verma
|
e2fbfc955e
|
[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-12-02 05:27:46 +00:00 |
|
Divakar Verma
|
a690fb5bd6
|
[CI][ROCm] Fix test_correctness_sliding_window (#29243)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-02 04:53:27 +00:00 |
|
Nick Hill
|
44822d7ff2
|
[BugFix] Preserve spec decoding uniform decode when scheduling (#29759)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-01 17:15:52 -08:00 |
|
EanWang211123
|
37b15e97e8
|
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-27 22:05:45 -08:00 |
|
Nick Hill
|
4e57c6587f
|
[Core] Support logprobs with spec decode + async scheduling (#29223)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-25 12:55:24 -08:00 |
|
WeiQing Chen
|
b34129bf8e
|
[Misc] remove useless v1 env (#29164)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-11-21 01:41:20 -08:00 |
|
Nick Hill
|
5bdd155277
|
[CI] Fix async scheduling + spec decoding test flake (#28902)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-18 05:26:32 +00:00 |
|
Ronald
|
d8874c61a5
|
[Core] Async Scheduling X Spec Decoding Compatibility (#24799)
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-11-17 12:16:20 -08:00 |
|
Nick Hill
|
80b6080ddc
|
[BugFix] Fix async scheduling + chunked prefill + preemption (#28787)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-17 06:46:46 +08:00 |
|
Nick Hill
|
58e61e56b7
|
[Test] Rework e2e async scheduling tests (#28744)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-14 16:01:09 -08:00 |
|
Laith Sakka
|
2e0ad629b0
|
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-14 14:11:10 -08:00 |
|
Cyrus Leung
|
e2741f6cbc
|
[Chore] Rename SchedulerConfig.chunked_prefill_enabled (#28735)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-14 18:39:57 +00:00 |
|
Yong Hoon Shin
|
9324e10275
|
Fix KV sharing fast prefill with cudagraph enabled (#28537)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-14 11:53:42 +00:00 |
|
Yannick Schnider
|
119c4927b3
|
[Bugfix] Fix validate model input for decoder models (#27099)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-13 10:18:47 -08:00 |
|
Nicolò Lucchesi
|
19d91ece4b
|
[CI] Fix flaky test_eagle_correctness test (#28364)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-09 16:04:59 +00:00 |
|