biondizzle/vllm

Author	SHA1	Message	Date
Lucas Kabela	a8c6ee9b78	[Performance Improvement] Update `batched_count_greater_than` to handle batch size 1 without recompile (#38933 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-04-09 23:51:31 +08:00
wang.yuqi	66c079ae83	[Frontend][4/n] Improve pooling entrypoints | pooling. (#39153 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-04-09 10:09:45 +00:00
sihao_li	e80e633927	[XPU] Skip VLLM_BATCH_INVARIANT for XPU in EAGLE DP test (#39164 ) Signed-off-by: sihao.li <sihao.li@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-04-09 12:45:16 +08:00
Chendi.Xue	ef5a226819	[PD][HeteroArch]Fix accuracy issue with CPU_ATTN as Decoder and Flash_ATTN as prefiller (#38935 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2026-04-09 11:19:07 +08:00
Wentao Ye	3352bf8b03	[CI Bug] Fix pre-commit issue in main (#39347 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-04-08 14:10:05 -07:00
triangleXIV	7c94ae16c6	[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service (#39102 ) Signed-off-by: triangle14 <y1019026570@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2026-04-08 14:03:17 -07:00
Rishi Puri	ad05edfbca	`tests/v1/e2e/spec_decode`: assert async scheduling is used (#39206 ) Signed-off-by: Rishi Puri <riship@nvidia.com> Signed-off-by: Rishi Puri <puririshi98@berkeley.edu> Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Flora Feng <4florafeng@gmail.com>	2026-04-08 20:30:03 +00:00
Wentao Ye	2018137242	[Feature] Batch invariant nvfp4 linear support (#39322 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-04-08 16:29:13 -04:00
Or Ozeri	512c5eb455	[kv_offload+HMA][5/N]: Track group block hashes and block IDs (#37109 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-04-08 19:50:28 +03:00
haosdent	8904fc4d19	[Bugfix] Fix V1 logprobs empty strings for multi-byte UTF-8 tokens when logprobs > 0 (#34875 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-04-08 15:30:00 +00:00
Giancarlo Delfin	5daf62271d	[Model Runner V2] Fuse probabilistic rejection sample kernels (#38496 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-04-07 17:37:37 -07:00
ibifrost	96b5004b71	[KVConnector] Support 3FS KVConnector (#37636 ) Signed-off-by: wuchenxin <wuchenxin.wcx@alibaba-inc.com> Signed-off-by: ibifrost <47308427+ibifrost@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2026-04-07 15:46:00 +00:00
Ronen Schaffer	7c139ab23f	[KV Offload] Clean up ARC/LRU refactoring leftovers: group ARC tests and fix stale comment (#38217 ) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>	2026-04-07 15:14:45 +03:00
Andreas Karatzas	a435e3108d	[ROCm][CI] Fix test repo-root assumptions (#39053 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-04-07 13:36:21 +08:00
zhanqiuhu	bfdc0a3a99	[NIXL][Mamba][3/N] Heterogeneous TP: 3-read conv state transfer (#37635 )	2026-04-06 19:07:02 +02:00
Walter Beller-Morales	e69a265135	[Feat][Core] safely abort requests when FSM fails to advance (#38663 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-04-06 08:00:16 -07:00
Julien Denize	fef56c1855	[Mistral Grammar] Support Grammar Factory (#38150 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2026-04-06 10:28:51 -04:00
Greg Pereira	4dd49b06f8	[Bug] Fix Import paths for `encoder_cudagraph` modules (#38997 ) Signed-off-by: greg pereira <grpereir@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-05 19:11:58 +00:00
Aaron Batilo	9a528260ef	[Bugfix][Spec Decode] Fix extract_hidden_states for VLM models (#38987 ) Signed-off-by: Aaron Batilo <abatilo@coreweave.com>	2026-04-05 02:41:54 -07:00
Yusuf Mohammad	46f02e00f2	[Bugfix] Fix AWQ models batch invariance issues (#38670 ) Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet> Signed-off-by: <> Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>	2026-04-03 14:54:15 +00:00
wliao2	32e0c0bfa2	refactor hard coded device string in test files under tests/v1 and tests/lora (#37566 ) Signed-off-by: Liao, Wei <wei.liao@intel.com>	2026-04-03 11:21:47 +08:00
zhanqiuhu	7b743ba953	[CI] Fix: pass string cache_dtype in test_register_kv_caches (#38836 )	2026-04-02 19:42:09 +00:00
wang.yuqi	a9b4f07ba2	[Frontend] Re-enable running MaxSim on GPU (#38620 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-04-03 00:03:13 +08:00
Chauncey	cbe7d18096	[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-04-01 09:56:45 -07:00
yzong-rh	dc0428ebb8	[NIXL][BUG] Fix Triton heterogeneous TP (#37940 ) Signed-off-by: Yifan <yzong@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-04-01 17:23:15 +02:00
Lucas Wilkinson	eb47454987	[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-04-01 00:15:53 -04:00
HarshRathva	17b72fd1c8	Fix priority preemption regression test in scheduler (#37051 ) Signed-off-by: HarshRathva <harshrathvaai@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-04-01 06:36:12 +03:00
Yifan Qiao	91e4521f9f	[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>	2026-03-31 17:58:37 -07:00
Wentao Ye	856589ed9a	[Refactor] Remove dead code in kv connector and model runner (#38383 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-31 17:05:23 -04:00
Matthew Bonanni	757068dc65	[Bugfix][Async] Fix async spec decoding with hybrid models (#38556 ) Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>	2026-03-31 11:08:54 -04:00
wliao2	4dfad17ed1	replace cuda_device_count_stateless() to current_platform.device_count() (#37841 ) Signed-off-by: Liao, Wei <wei.liao@intel.com> Signed-off-by: wliao2 <wei.liao@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 22:32:54 +08:00
Nicolò Lucchesi	7430389669	[Bugfix][CI] Skip flaky `test_eagle` test (#38566 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-31 09:42:37 -04:00
Matthew Bonanni	7d65463528	[WIP][CI][Bugfix] Fix `test_run_eagle_dp` (#38584 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-31 12:30:25 +02:00
Benjamin Chislett	494636b29d	[Feat][Spec Decode] DFlash (#36847 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-30 15:03:15 -04:00
Chendi.Xue	3b1dbaad4e	[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-30 16:47:30 +00:00
Collin McCarthy	1031c84c36	Fix ambiguous num_blocks for hybrid attn mamba (#37236 ) Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-30 11:09:45 +00:00
Nicolò Lucchesi	cc06b4e86b	[Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-30 09:41:50 +00:00
Andreas Karatzas	4f2ed5fddb	[ROCm][CI] Enable hybrid chunked prefill test (#38317 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-30 10:30:26 +08:00
Wentao Ye	995dea1354	[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-29 18:12:50 +00:00
yzong-rh	6dad4c5722	[Test] Fix flaky race condition in test_abort_final_step (#38414 ) Signed-off-by: Yifan <yzong@redhat.com>	2026-03-28 09:06:56 +00:00
dtc	6287e7fa20	[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-03-27 09:26:40 +01:00
Or Ozeri	7cc302dd87	[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-27 08:38:33 +03:00
Giancarlo Delfin	c32e97602d	[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-26 13:38:12 -07:00
Woosuk Kwon	144030c84e	Relocate Encoder CUDA graph manager (#38116 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-25 20:52:12 -07:00
Cyrus Leung	ba2f0acc2d	[Misc] Reorganize inputs (#35182 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-25 10:22:54 -07:00
Wentao Ye	1bf2ddd0ee	[Refactor] Rename `WAITING_FOR_FSM` to `WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR` (#38048 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-25 11:41:44 -04:00
Wentao Ye	d7e93e13fb	[Feature] EPLB Support for GPU Model Runner v2 (#37488 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-25 08:16:39 -07:00
Andrii Skliar	cd7643015e	[Feature] Support per-draft-model MoE backend via `--speculative-config` (#37880 ) Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: [Andrii Skliar] <askliar@nvidia.com> Co-authored-by: Andrii Skliar <askliar@nvidia.com>	2026-03-25 14:31:52 +00:00
Harry Mellor	d215d1efca	[Mypy] Better fixes for the `mypy` issues in `vllm/config` (#37902 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 06:14:43 -07:00
Gregory Shtrasberg	189ddefbfd	[ROCm] Attention selector reordering (#36702 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2026-03-25 17:42:56 +08:00

1143 Commits