111 Commits

Author SHA1 Message Date
Andreas Karatzas
ec27b36b4b [CI] Defining extended V1 e2e + engine tests (#35580)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-02 08:10:54 +00:00
Huamin Li
157722da75 [perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2026-02-28 01:50:37 +08:00
Yiliu Dong
d940607629 [Core] Support min_tokens with speculative decoding (#32642)
Signed-off-by: qianlihuang <yiliu.dong@qq.com>
Co-authored-by: qianlihuang <yiliu.dong@qq.com>
2026-02-26 12:31:28 -05:00
Andreas Karatzas
9571e99945 [ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-25 14:16:18 -08:00
Benjamin Chislett
ee59a7c615 [Tests] Add GSM8k check to SpecDec E2E tests (#34772)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-02-25 07:51:14 -05:00
Thomas Parnell
d5fe3f702c [Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2026-02-14 13:15:56 -08:00
Cyrus Leung
73391a1baa [Renderer] Move InputPreprocessor into Renderer (1/2) (#34510)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-14 10:14:21 -08:00
Harry Huang
c027541eaf [Hybrid] Enable spec decoding in mamba cache align mode (#33705)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
2026-02-13 13:02:28 -08:00
Cyrus Leung
b96f7314b4 [Refactor] Pass Renderer to Input Processor (#34329)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-11 19:38:11 -08:00
Benjamin Chislett
af3162d3aa [Spec Decode] Unified Parallel Drafting (#32887)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-02-05 12:37:18 -05:00
Nick Hill
add9f1fbd9 [Minor] Include StreamingInput in inputs package (#33856)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-05 04:38:20 +00:00
Harry Mellor
61e632aea1 Turn @config into a dataclass_transform (#31541)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-03 17:40:59 +00:00
Patrick von Platen
10152d2194 [Realtime API] Adds minimal realtime API based on websockets (#33187)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-30 18:41:29 +08:00
Cyrus Leung
c25dbee40d [Model] Bump transformers version for test registry (#33100)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 18:53:22 +00:00
Joshua Deng
91601ff478 [Feature] add session based streaming input support to v1 (#28973)
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-24 12:06:28 -08:00
Harry Huang
5206e5e28c [V1][Hybrid] Mamba Prefix Caching with align mode (#30877)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2026-01-23 09:56:48 -08:00
Micah Williamson
019e2c3b7c [ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-22 05:47:33 +00:00
Tomas Ruiz
4a5299c93f feat: spec decode with draft models (#24322)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
2026-01-19 16:05:46 -05:00
Matthew Bonanni
20228cb851 [3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-12 09:13:56 -08:00
Andreas Karatzas
e02706d2d2 [ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling (#32000)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-09 20:48:32 +08:00
zhrrr
8ff4a99566 [Async][Feat] support apply penalty or bad_words for async + spec (#30495)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-09 02:31:50 +00:00
Benjamin Chislett
f7008ce1c4 [Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-06 18:50:37 +00:00
Wentao Ye
af9a7ec255 [Bug] Revert torch warning fix (#31585)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-05 22:31:21 +00:00
Micah Williamson
fd8afdf38d [ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-18 10:27:37 +08:00
Matthew Bonanni
7eb6cb6c18 [Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Andreas Karatzas
783644e4ac [ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-12 03:54:56 +00:00
Harry Mellor
8781cd6b88 Add Eagle and Eagle3 support to Transformers modeling backend (#30340)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 17:02:10 +00:00
Wentao Ye
d6464f2679 [Chore] Fix torch precision warning (#30428)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-11 04:05:56 +00:00
Andreas Karatzas
ed7af3178a [ROCm][CI] Attempt to fix the failures under a subgroup of the e2e the test group (#29358)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-12-10 05:33:13 +00:00
Micah Williamson
7d80c73d42 [CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-10 02:35:49 +00:00
Lucas Wilkinson
abe93bce59 [Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-12-09 17:18:10 -08:00
Wentao Ye
83319b44c2 [Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled (#29897)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-09 10:40:37 -05:00
Charlie Fu
6af70e11a0 [ROCm][CI] Fix test_max_len.py for Rocm (#29916)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
2025-12-08 16:58:30 -05:00
Divakar Verma
962d703818 [Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-05 19:57:26 +00:00
Divakar Verma
afb1e5b380 [CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-02 20:46:10 +00:00
Divakar Verma
e2fbfc955e [CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-02 05:27:46 +00:00
Divakar Verma
a690fb5bd6 [CI][ROCm] Fix test_correctness_sliding_window (#29243)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-02 04:53:27 +00:00
Nick Hill
44822d7ff2 [BugFix] Preserve spec decoding uniform decode when scheduling (#29759)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-01 17:15:52 -08:00
EanWang211123
37b15e97e8 [Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-27 22:05:45 -08:00
Nick Hill
4e57c6587f [Core] Support logprobs with spec decode + async scheduling (#29223)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-25 12:55:24 -08:00
WeiQing Chen
b34129bf8e [Misc] remove useless v1 env (#29164)
Signed-off-by: David Chen <530634352@qq.com>
2025-11-21 01:41:20 -08:00
Nick Hill
5bdd155277 [CI] Fix async scheduling + spec decoding test flake (#28902)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-18 05:26:32 +00:00
Ronald
d8874c61a5 [Core] Async Scheduling X Spec Decoding Compatibility (#24799)
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-11-17 12:16:20 -08:00
Nick Hill
80b6080ddc [BugFix] Fix async scheduling + chunked prefill + preemption (#28787)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-17 06:46:46 +08:00
Nick Hill
58e61e56b7 [Test] Rework e2e async scheduling tests (#28744)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-14 16:01:09 -08:00
Laith Sakka
2e0ad629b0 Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-14 14:11:10 -08:00
Cyrus Leung
e2741f6cbc [Chore] Rename SchedulerConfig.chunked_prefill_enabled (#28735)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-14 18:39:57 +00:00
Yong Hoon Shin
9324e10275 Fix KV sharing fast prefill with cudagraph enabled (#28537)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 11:53:42 +00:00
Yannick Schnider
119c4927b3 [Bugfix] Fix validate model input for decoder models (#27099)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-13 10:18:47 -08:00
Nicolò Lucchesi
19d91ece4b [CI] Fix flaky test_eagle_correctness test (#28364)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-09 16:04:59 +00:00