vllmellm
|
64deead719
|
[Bugfix] [ROCm] [UX]: revert Flex attention backend (#29371)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-25 06:56:06 +00:00 |
|
Harry Mellor
|
316c8492bf
|
Scheduled removal of guided_* config fields (#29326)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 05:24:05 +00:00 |
|
Chen Zhang
|
71df2a57ef
|
[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#29303)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-24 14:28:32 -08:00 |
|
vllmellm
|
e48b2e6848
|
[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic (#26980)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-24 15:24:49 +00:00 |
|
rasmith
|
3999442f1c
|
[CI/Build][AMD] Add check for flash_attn_varlen_func to test_tree_attention.py (#29252)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-23 04:45:08 +00:00 |
|
rasmith
|
71362ffab4
|
[CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29253)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-23 04:42:49 +00:00 |
|
Nick Hill
|
7df331c66b
|
[BugFix] Fix chunked prompt logprobs + preemption (#29071)
|
2025-11-22 16:07:18 -05:00 |
|
Nick Hill
|
d44a63c6d6
|
[BugFix] Fix returned logprobs with spec decode + prefill chunking (#29216)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-22 22:41:25 +08:00 |
|
Nicolò Lucchesi
|
066209a045
|
[Attention] Refactor FA block_size limitations to hybrid models only (#29084)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-22 06:38:44 -08:00 |
|
Cyrus Leung
|
5a4802588e
|
[Misc] Further clean up chunked prefill and prefix caching init (#29186)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-22 19:34:15 +08:00 |
|
rasmith
|
8e22da1d7f
|
[CI/Build] Don't add FLASHINFER backend in test_cpu_offloading.py (#29229)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 11:00:54 +00:00 |
|
rasmith
|
a4fdf2405c
|
[CI/Build] Skip tests that require libcudart in test_lmcache_integration.py (#29228)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 10:59:39 +00:00 |
|
Mark McLoughlin
|
c6fa3895e9
|
[KV Connector] Fix async connector prefix cache metrics (#28585)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-11-21 17:45:00 -05:00 |
|
Julien Denize
|
57430fc95c
|
Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-11-21 13:58:59 -08:00 |
|
Wentao Ye
|
1f400c58b8
|
[CI] Add batch invariant test to ci (#27842)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-21 09:20:33 -07:00 |
|
WeiQing Chen
|
b34129bf8e
|
[Misc] remove useless v1 env (#29164)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-11-21 01:41:20 -08:00 |
|
Jialin Ouyang
|
30b9c67743
|
Revert "[Redo] #26368 (#28771)" (#29121)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-20 21:27:45 -08:00 |
|
Cyrus Leung
|
56e96b37e4
|
[V0 Deprecation] Remove best_of (#29090)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 11:40:40 +08:00 |
|
rasmith
|
c7a29d2c8d
|
[CI/Build] Remove skip global cleanup in test_struct_output_generate.py (#29022)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-20 21:44:37 +00:00 |
|
rasmith
|
8237ab8a2b
|
[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now (#29021)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-20 21:35:14 +00:00 |
|
Or Ozeri
|
647464719b
|
[KVConnector][Core] Support cross-layer KV blocks (#27743)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-11-20 19:09:59 +01:00 |
|
Or Ozeri
|
c0c2dd1e0b
|
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-20 18:55:10 +08:00 |
|
Wentao Ye
|
2c52c7fd9a
|
[Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache (#29038)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-20 16:52:23 +08:00 |
|
Benjamin Chislett
|
fcbcba6c70
|
[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-19 19:17:48 -08:00 |
|
Wentao Ye
|
1607e664f0
|
[Bug] Fix Batch Invariant MLA test (#28967)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-19 21:18:32 +00:00 |
|
Qiu
|
2fd893b4ce
|
[Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
|
2025-11-19 15:52:44 -05:00 |
|
Didier Durand
|
09540cd918
|
[Doc]: fix typos in various files (#29010)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-19 04:56:21 -08:00 |
|
Chendi.Xue
|
c3e2978620
|
[NIXL] fix cpu PD after physical <> logical block_size PR (#28904)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-11-18 14:03:23 -05:00 |
|
Kevin H. Luu
|
c64c0b78de
|
[chore] Move the rest of wikimedia url to S3 (#28921)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-18 09:44:18 -08:00 |
|
Nicolò Lucchesi
|
f226a3f0c1
|
[CI][NIXL] Change default block_size for tests (#28927)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-18 09:22:30 -08:00 |
|
Nick Hill
|
5bdd155277
|
[CI] Fix async scheduling + spec decoding test flake (#28902)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-18 05:26:32 +00:00 |
|
Wentao Ye
|
a289cc1dde
|
[Test] Batch Invariant: Rename and organize tests (#27421)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-17 18:09:47 -05:00 |
|
Ronald
|
d8874c61a5
|
[Core] Async Scheduling X Spec Decoding Compatibility (#24799)
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-11-17 12:16:20 -08:00 |
|
Nick Hill
|
80b6080ddc
|
[BugFix] Fix async scheduling + chunked prefill + preemption (#28787)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-17 06:46:46 +08:00 |
|
Eldar Kurtić
|
e439c784fa
|
Add support for Eagle with separate lm-head and embed_tokens layers (#28549)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
|
2025-11-15 06:12:02 -08:00 |
|
Cyrus Leung
|
98b4d389ed
|
[Redo] #26368 (#28771)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-14 22:47:41 -08:00 |
|
Chendi.Xue
|
c9e665852a
|
[NIXL] heterogeneous block_size support (#26759)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-11-14 21:51:32 -08:00 |
|
Nick Hill
|
ac86bff8cb
|
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)
|
2025-11-14 20:24:00 -08:00 |
|
Jialin Ouyang
|
186352b270
|
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-14 16:04:04 -08:00 |
|
Nick Hill
|
58e61e56b7
|
[Test] Rework e2e async scheduling tests (#28744)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-14 16:01:09 -08:00 |
|
Laith Sakka
|
2e0ad629b0
|
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-14 14:11:10 -08:00 |
|
Marcin Ostrowski
|
0de4f217ab
|
[Bugfix] TypeError: 'NoneType' object is not callable (#27410)
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
|
2025-11-14 21:13:53 +00:00 |
|
Cyrus Leung
|
e2741f6cbc
|
[Chore] Rename SchedulerConfig.chunked_prefill_enabled (#28735)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-14 18:39:57 +00:00 |
|
Cyrus Leung
|
511a6b611d
|
[Config] Clean up SchedulerConfig initialization (#28665)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-14 22:41:02 +08:00 |
|
Nicolò Lucchesi
|
96b23b8e3b
|
[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-14 22:40:05 +08:00 |
|
Yong Hoon Shin
|
9324e10275
|
Fix KV sharing fast prefill with cudagraph enabled (#28537)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-14 11:53:42 +00:00 |
|
rasmith
|
93103575ce
|
[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate (#28311)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-13 22:41:29 -08:00 |
|
Mark McLoughlin
|
6e25b1cddf
|
[KV Connector] Test async mode in scheduler tests (#28550)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-13 18:30:59 -05:00 |
|
elvischenv
|
5d6ce2b960
|
[Perf] Support stream interval for reducing host overhead (#27869)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-11-13 13:21:25 -05:00 |
|
Yannick Schnider
|
119c4927b3
|
[Bugfix] Fix validate model input for decoder models (#27099)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-13 10:18:47 -08:00 |
|