Ekagra Ranjan
cd81cdb399
[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs ( #31058 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-16 11:08:44 +00:00
Cyrus Leung
ea5ff3a1f6
[Refactor] Simplify BOS/EOS token handling ( #34435 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 18:18:24 -08:00
Krish Gupta
748625cdaf
[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input ( #34220 )
...
Signed-off-by: KrxGu <krishom70@gmail.com >
2026-02-10 13:05:32 +00:00
Mark McLoughlin
2abd97592f
[KV Connector][Metrics] Do not count local prefix cache hits in connector queries ( #30522 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-05 09:57:27 +02:00
Nick Hill
fa4e0fb028
[Core] Don't schedule spec tokens with prefill chunks ( #33652 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-04 23:40:22 +00:00
Or Ozeri
8e32690869
[KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads ( #32255 )
...
Fixes a not-yet-reported case where it was possible for blocks to be
freed by an abort before an async transfer completed, resulting
in corrupted KV data.
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-02-04 11:16:34 +00:00
Nick Hill
52ee21021a
[BugFix][Spec Decoding] Fix negative accepted tokens metric crash ( #33729 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-03 23:34:41 +00:00
Cyrus Leung
c6e7404cc5
[Multimodal] Simplify MM input definitions ( #33331 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-29 13:32:04 +00:00
Wentao Ye
3e440786af
[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement ( #32618 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-28 20:30:32 +00:00
Joshua Deng
91601ff478
[Feature] add session based streaming input support to v1 ( #28973 )
...
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com >
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-24 12:06:28 -08:00
knlnguyen1802
378385b90c
[EC Connector] Optimize remote cache check in scheduler ( #32585 )
...
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
2026-01-22 03:30:59 +00:00
Or Ozeri
2be765b68a
[BugFix] scheduler: Fix ordering preserving of skipped requests ( #32173 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-12 18:39:38 +00:00
Or Ozeri
028599739d
[BugFix] scheduler: Fix resuming of preempted requests after async load ( #31583 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-10 12:39:25 -08:00
Lumosis
b634e619bb
Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. ( #31635 )
...
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com >
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com >
2026-01-08 09:00:07 +00:00
Nick Hill
32f4e4db00
[Cleanup] Remove deprecated fields from CachedRequestData class ( #31734 )
...
Signed-off-by: njhill <nickhill123@gmail.com >
2026-01-05 21:07:14 +00:00
Or Ozeri
4c6fd25880
kv_transfer: Rename the shared storage connectors ( #30201 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-12-08 20:46:09 -08:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )" ( #30199 )
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-06 23:15:42 -08:00
Harry Mellor
951445a52d
Remove default values from InitVars so that they're not stored ( #29859 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-02 12:16:37 +00:00
Zhuohan Li
d0cd728907
[Core] Support reseting all running requests' KV while calling reset_prefix_cache ( #28827 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-12-02 02:25:05 +00:00
wang.yuqi
67fc16cd8c
[Bugfix] If chunked_prefill is disabled, end the scheduling early. ( #28911 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-11-25 16:06:09 +08:00
Cyrus Leung
5a4802588e
[Misc] Further clean up chunked prefill and prefix caching init ( #29186 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-22 19:34:15 +08:00
Mark McLoughlin
c6fa3895e9
[KV Connector] Fix async connector prefix cache metrics ( #28585 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-11-21 17:45:00 -05:00
Jialin Ouyang
30b9c67743
Revert "[Redo] #26368 ( #28771 )" ( #29121 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-20 21:27:45 -08:00
Cyrus Leung
98b4d389ed
[Redo] #26368 ( #28771 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-14 22:47:41 -08:00
Nick Hill
ac86bff8cb
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… ( #28773 )
2025-11-14 20:24:00 -08:00
Jialin Ouyang
186352b270
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization ( #26368 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-14 16:04:04 -08:00
Cyrus Leung
e2741f6cbc
[Chore] Rename SchedulerConfig.chunked_prefill_enabled ( #28735 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-14 18:39:57 +00:00
Cyrus Leung
511a6b611d
[Config] Clean up SchedulerConfig initialization ( #28665 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-14 22:41:02 +08:00
Mark McLoughlin
6e25b1cddf
[KV Connector] Test async mode in scheduler tests ( #28550 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-13 18:30:59 -05:00
Chenguang Zheng
4ccffe561f
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation ( #25233 )
...
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com >
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Signed-off-by: herotai214 <herotai214@gmail.com >
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com >
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com >
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: herotai214 <herotai214@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com >
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com >
2025-11-11 18:58:33 -08:00
Kuntai Du
86dca07d9b
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator ( #28011 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-05 10:36:31 +00:00
Nick Hill
0cdbe7b744
[Core] Async scheduling + structured outputs compatibility ( #26866 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 00:35:04 +00:00
Kuntai Du
b853540388
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector ( #25712 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-24 23:34:18 -07:00
Tova Movshovitz
88afa11010
[Metrics] [KVConnector] Add connector prefix cache hit rate stats ( #26245 )
...
Signed-off-by: tovam <tovam@pliops.com >
2025-10-23 12:21:08 +02:00
Nick Hill
4aed506b65
[Core] Streamline some structured output related code ( #26737 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 23:27:44 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-12 09:51:31 -07:00
Harry Mellor
7c12763b24
Fix some typing issues found by mypy==1.18.2 ( #26596 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-10 18:21:25 +00:00
Chen Zhang
606b00e80f
[bugfix][DCP] fix block_size of hash in DCP prefix caching ( #26296 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-10 03:02:49 -07:00
Qier Li
d17f0fbf30
[Core][KVConnector] Propagate all tokens on resumed preemptions ( #24926 )
...
Signed-off-by: Qier Li <kevin44036@gmail.com >
Co-authored-by: Qier Li <qier@fb.com >
2025-10-09 14:43:31 +08:00
Elaine Zhao
f08919b7d1
[Bugfix] Respect min_tokens in scheduler stop check ( #26317 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
2025-10-08 14:08:24 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 07:06:22 -07:00
Huamin Li
7d6b03381e
[CI Failure] fix_test_auto_prefix_cache_support ( #26053 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-04 02:44:49 -07:00
Reza Barazesh
bc546f76a1
[CI] Move applicable tests to CPU ( #24080 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 14:45:20 +01:00
Simon Danielsson
e23cacda35
[Bugfix]: Clean up chunked prefill logging when using whisper ( #25075 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
2025-09-30 08:17:49 +00:00
Aaron Pham
29283e8976
[Chore] Cleanup guided namespace, move to structured outputs config ( #22772 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-18 09:20:27 +00:00
Flora Feng
69f46359dd
[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec ( #23779 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2025-08-29 18:36:57 +08:00
Hanchenli
5da4f5d857
[Bugfix] Fix for V1 priority scheduling crashes at preemption ( #23713 )
...
Signed-off-by: Hanchenli <lihanc2002@gmail.com >
2025-08-28 00:44:52 +00:00
Chenguang Zheng
d765cf01fe
[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests ( #22711 )
...
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-08-25 00:41:17 -07:00
Woosuk Kwon
c9b38be8aa
[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT ( #23041 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-18 17:20:38 -07:00