inkcherry
|
4505849b30
|
[ROCm][PD] add moriio kv connector. (#29304)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2026-01-09 14:01:57 +00:00 |
|
Andreas Karatzas
|
e02706d2d2
|
[ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling (#32000)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-09 20:48:32 +08:00 |
|
Nick Hill
|
29ce48221c
|
[Cleanup] Remove obsolete spec decoding compatibility logic (#32003)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 05:44:18 +00:00 |
|
zhrrr
|
8ff4a99566
|
[Async][Feat] support apply penalty or bad_words for async + spec (#30495)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 02:31:50 +00:00 |
|
Nick Hill
|
11cec296dd
|
[BugFix] Add spec-decode-incompatible request param validation (#31982)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 00:08:21 +00:00 |
|
Lucas Wilkinson
|
6cdf015c3c
|
[Misc] Fix "Current vLLM config is not set." warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-08 15:20:49 -08:00 |
|
Ryan Rock
|
8cbdc7eb94
|
[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-01-08 09:00:24 +00:00 |
|
Lumosis
|
b634e619bb
|
Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>
|
2026-01-08 09:00:07 +00:00 |
|
Andreas Karatzas
|
5f2a473ff3
|
[ROCm][CI] v1 cpu offloading attention backend fix (#31833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-08 14:37:50 +08:00 |
|
Andreas Karatzas
|
087a138963
|
[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-08 04:35:25 +00:00 |
|
Richard Zou
|
a79079feef
|
[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-01-08 04:04:58 +00:00 |
|
Nick Hill
|
10ef65eded
|
[BugFix] Fix bad words with speculative decoding (#31908)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-07 15:46:42 -05:00 |
|
Kfir Toledo
|
b89443b8d9
|
[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761)
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
|
2026-01-07 16:59:43 +00:00 |
|
weiyu
|
e7596371a4
|
[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>
|
2026-01-07 16:07:16 +08:00 |
|
Benjamin Chislett
|
f7008ce1c4
|
[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-06 18:50:37 +00:00 |
|
Lucas Wilkinson
|
e0327c9db2
|
[Attention][1/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-06 04:05:17 -08:00 |
|
John Calderon
|
2f4e6548ef
|
[Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874)
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-01-06 00:23:00 +00:00 |
|
Wentao Ye
|
af9a7ec255
|
[Bug] Revert torch warning fix (#31585)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-05 22:31:21 +00:00 |
|
Nick Hill
|
32f4e4db00
|
[Cleanup] Remove deprecated fields from CachedRequestData class (#31734)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-05 21:07:14 +00:00 |
|
Or Ozeri
|
d8e38d4939
|
Triton Attention: Support cross-layers blocks (#30687)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-05 19:29:16 +00:00 |
|
Isotr0py
|
6aa5b18e1d
|
[v1] Add encoder-only/cross attention support to Triton Attention backend (#31406)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-06 00:00:23 +08:00 |
|
wangxiyuan
|
bb4337b34c
|
[Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2026-01-04 18:34:04 -08:00 |
|
Xingyu Liu
|
0eee877f67
|
[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
|
2026-01-02 15:13:15 -08:00 |
|
Nick Hill
|
bd877162eb
|
[BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-02 23:36:38 +08:00 |
|
Nicolò Lucchesi
|
ab1af6aa3e
|
[CI][NIXL] Split DPEP tests (#31491)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-30 07:26:12 -05:00 |
|
Sage
|
39512aba72
|
[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>
|
2025-12-30 00:17:16 +00:00 |
|
Alexei-V-Ivanov-AMD
|
d63b969675
|
[CI/ROCm] Fixing "V1 Test attention (H100)" test group. (#31187)
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
Signed-off-by: <>
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
Co-authored-by: root <root@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
|
2025-12-29 16:53:59 -05:00 |
|
Yifan Qiao
|
52bf066516
|
[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-12-26 18:25:46 -08:00 |
|
Kunshang Ji
|
5326c89803
|
[XPU][CI] skip test_preprocess_error_handling due to fork/spawn issue (#31381)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-12-26 21:40:44 +00:00 |
|
Nick Hill
|
81786c8774
|
[BugFix] Fix async scheduling + reasoning with struct output (#31332)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2025-12-25 23:01:02 +00:00 |
|
Richard Zou
|
254f6b9867
|
[Bugfix] Fix eagle dp tests on A100 (#31241)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-12-25 00:05:04 +00:00 |
|
Cyrus Leung
|
aa3868ecfe
|
[Chore] Remove unused noqas (#31263)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-24 05:38:46 -08:00 |
|
Michael Goin
|
8ee90c83f8
|
Add --max-model-len auto to auto-fit context to available memory (#29431)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-23 21:37:14 -08:00 |
|
Chen Zhang
|
538e830caa
|
[KVEvent] Use request.block_hash for parent block_hash (#30544)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>
|
2025-12-23 18:23:43 -08:00 |
|
Mark McLoughlin
|
f790068600
|
[Core] Add a random suffix to frontend-provided request IDs (#27987)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-12-23 13:05:39 -08:00 |
|
Cyrus Leung
|
bb62dda2c3
|
[Misc] Introduce encode_*_url utility function (#31208)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-23 13:45:21 +00:00 |
|
Pavani Majety
|
3e10262356
|
Revert "[SM100] Enable fp8 compute for prefill MLA (#30746)" (#31197)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-12-22 18:15:33 -08:00 |
|
Divakar Verma
|
78e5e62bbf
|
[AMD][CI] fix v1/engine test_preprocess_error_handling (#31192)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-12-23 01:28:19 +00:00 |
|
Lucas Wilkinson
|
de71747655
|
[SpecDecode] Simplified alternative padded-speculation acceptance rate fix (#29845)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-22 13:06:10 -08:00 |
|
Pavani Majety
|
b10f41c894
|
[SM100] Enable fp8 compute for prefill MLA (#30746)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-12-22 19:15:57 +00:00 |
|
Seiji Eicher
|
1ab5213531
|
Make engine core client handshake timeout configurable (#27444)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-12-19 20:38:30 +00:00 |
|
Nick Hill
|
2ac85a4544
|
[BugFix] Fix logprobs with spec decode and modified logits (#30846)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-18 19:58:28 -08:00 |
|
Nick Hill
|
45c0526ac9
|
[BugFix] Handle errors when preprocessing added requests (#30895)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-19 01:29:11 +00:00 |
|
Elizabeth Thomas
|
41b6f9200f
|
Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-18 19:46:28 +00:00 |
|
Isotr0py
|
700a5ad6c6
|
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface (#30684)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-19 02:04:19 +08:00 |
|
inkcherry
|
500f26e6d3
|
[Bugfix] fix DP-aware routing in OpenAI API requests (#29002)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-12-18 09:50:42 -08:00 |
|
Nicolò Lucchesi
|
bc3700e0cd
|
[NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-18 11:53:30 +08:00 |
|
Micah Williamson
|
fd8afdf38d
|
[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-18 10:27:37 +08:00 |
|
SungMinCho
|
a0b782f9cc
|
[Metrics] Model FLOPs Utilization estimation (#30738)
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-12-18 01:40:51 +00:00 |
|
Matthew Bonanni
|
7eb6cb6c18
|
[Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-17 09:49:59 -08:00 |
|