Nicolò Lucchesi
|
bc3700e0cd
|
[NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-18 11:53:30 +08:00 |
|
Matthew Bonanni
|
7eb6cb6c18
|
[Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-17 09:49:59 -08:00 |
|
Chendi.Xue
|
ae2e503dda
|
[NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-12-14 15:38:28 +00:00 |
|
Martin Hickey
|
f4417f8449
|
[KVConnector] Add KV events to KV Connectors (#28309)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2025-12-11 15:30:29 +01:00 |
|
Will Eaton
|
a9e4106f28
|
[P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-10 11:00:52 -08:00 |
|
Or Ozeri
|
4c6fd25880
|
kv_transfer: Rename the shared storage connectors (#30201)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-12-08 20:46:09 -08:00 |
|
Cyrus Leung
|
e83b7e379c
|
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
|
2025-12-07 00:00:22 -08:00 |
|
Cyrus Leung
|
27f4c2fd46
|
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-06 23:15:42 -08:00 |
|
rasmith
|
b12f4a9830
|
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-12-05 20:57:38 -08:00 |
|
Samuel Shen
|
7e31c3a3f6
|
[CI]: Remove unnecessary imports from test_lmache_integration (#30157)
Signed-off-by: Samuel Shen <slshen@uchicago.edu>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
|
2025-12-06 12:53:34 +08:00 |
|
Matthew Bonanni
|
66e674cdd5
|
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-12-05 09:48:43 -08:00 |
|
Mark McLoughlin
|
dff0a2b394
|
[NIXL] Add remote_request_id to kv_transfer_params (#29665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-12-05 09:43:48 -08:00 |
|
Nicolò Lucchesi
|
78c44fd722
|
[NIXL] Small cleanup of unused variables (#29618)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-05 18:17:36 +01:00 |
|
Mark McLoughlin
|
949a6a19d2
|
[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-12-05 15:52:45 +01:00 |
|
rasmith
|
5aa9b09040
|
[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-03 22:56:35 +08:00 |
|
Harry Mellor
|
951445a52d
|
Remove default values from InitVars so that they're not stored (#29859)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-02 12:16:37 +00:00 |
|
Matthew Bonanni
|
fc1d8be3dc
|
[Attention] Update attention imports (#29540)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-27 11:19:09 -05:00 |
|
rasmith
|
71362ffab4
|
[CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29253)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-23 04:42:49 +00:00 |
|
rasmith
|
a4fdf2405c
|
[CI/Build] Skip tests that require libcudart in test_lmcache_integration.py (#29228)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 10:59:39 +00:00 |
|
Jialin Ouyang
|
30b9c67743
|
Revert "[Redo] #26368 (#28771)" (#29121)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-20 21:27:45 -08:00 |
|
Or Ozeri
|
647464719b
|
[KVConnector][Core] Support cross-layer KV blocks (#27743)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-11-20 19:09:59 +01:00 |
|
Cyrus Leung
|
98b4d389ed
|
[Redo] #26368 (#28771)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-14 22:47:41 -08:00 |
|
Chendi.Xue
|
c9e665852a
|
[NIXL] heterogeneous block_size support (#26759)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-11-14 21:51:32 -08:00 |
|
Nick Hill
|
ac86bff8cb
|
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)
|
2025-11-14 20:24:00 -08:00 |
|
Jialin Ouyang
|
186352b270
|
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-14 16:04:04 -08:00 |
|
Mark McLoughlin
|
6e25b1cddf
|
[KV Connector] Test async mode in scheduler tests (#28550)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-13 18:30:59 -05:00 |
|
Nicolò Lucchesi
|
a7ef3eb0cd
|
[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282)
|
2025-11-11 16:57:43 +00:00 |
|
Nick Hill
|
289eb6c537
|
[Core] Simplify async KV output aggregation (#28327)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-09 09:44:13 -08:00 |
|
Nick Hill
|
67a2da890e
|
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-07 22:11:03 +00:00 |
|
Nicolò Lucchesi
|
68a72a5cc1
|
Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012)" (#28289)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-07 15:07:01 +00:00 |
|
Samuel Shen
|
40db194446
|
[CI]: Add LMCacheConnector Unit Tests (#27852)
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
|
2025-11-05 09:45:57 -08:00 |
|
Kuntai Du
|
86dca07d9b
|
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator (#28011)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-11-05 10:36:31 +00:00 |
|
Nick Hill
|
c9f66da8fd
|
[PerfFix] Avoid separate thread for MP executor shm spin (#28012)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-04 08:33:55 -08:00 |
|
Mark McLoughlin
|
58279c60b5
|
[KV Connector] Make KVCacheConfig an explicit constructor argument (#27887)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-03 23:00:49 -08:00 |
|
Yihua Cheng
|
e675118849
|
[Add] cmdline argument parsing for KV cache offloading modules (#27621)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-01 07:17:07 +00:00 |
|
Nick Hill
|
0cdbe7b744
|
[Core] Async scheduling + structured outputs compatibility (#26866)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-01 00:35:04 +00:00 |
|
GuanLuo
|
d6517be3cd
|
[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node (#26338)
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-10-31 10:16:00 -07:00 |
|
Kuntai Du
|
b853540388
|
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
|
2025-10-24 23:34:18 -07:00 |
|
kourosh hakhamaneshi
|
7e1d697b56
|
[Bugfix] Fix MultiConnector stats reconstruction across process boundaries (#27366)
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-10-24 17:08:05 +00:00 |
|
Nicolò Lucchesi
|
4dfdb821c8
|
[P/D] Dynamic kv_output_aggregator collect size (#26734)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-22 18:07:58 +02:00 |
|
Mark McLoughlin
|
4ca13a8667
|
[NIXL] Terminate handshake listener thread in shutdown (#26404)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-10-22 16:59:53 +02:00 |
|
Tyler Michael Smith
|
6c2eef5a5d
|
[P/D] KVConnector for decode benchmarking (#25986)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-10-21 16:30:47 -07:00 |
|
Nicolò Lucchesi
|
72f431e709
|
[Nixl] Minor refactor to handshake related metadata (#26410)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-21 09:07:47 +02:00 |
|
dongbo910220
|
8a297115e2
|
[Chore] Separate out hashing utilities from vllm.utils (#27151)
Signed-off-by: dongbo910220 <1275604947@qq.com>
|
2025-10-19 11:09:38 +08:00 |
|
Nick Hill
|
4aed506b65
|
[Core] Streamline some structured output related code (#26737)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-14 23:27:44 +00:00 |
|
Qier Li
|
720394de43
|
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats (#26046)
Signed-off-by: Qier Li <kevin44036@gmail.com>
|
2025-10-14 14:38:07 +00:00 |
|
Chendi.Xue
|
7e6edb1469
|
[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode (#26556)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-10-14 09:46:05 +00:00 |
|
Will Eaton
|
53c9a7cee2
|
[P/D] [NixlConnector] kv load recovery integration (#26171)
Signed-off-by: Will Eaton <weaton@redhat.com>
|
2025-10-13 08:48:04 -07:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Mark McLoughlin
|
784c231151
|
[NIXL] Ignore abort on already-finished request (#25067)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-10-10 12:21:56 +02:00 |
|