biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Andreas Karatzas	4f2ed5fddb	[ROCm][CI] Enable hybrid chunked prefill test (#38317 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-30 10:30:26 +08:00
Kyle Sayers	d28d86e8a3	[QeRL] Fix online quantized reloading (#38442 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-03-29 14:56:41 -06:00
Wentao Ye	995dea1354	[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-29 18:12:50 +00:00
allgather	8c0b6267d7	[Transformers v5] fix missing pixtral/voxtral multimodal dispatch (#38410 ) Signed-off-by: allgather <all2allops@gmail.com>	2026-03-29 09:59:06 +00:00
Andreas Karatzas	43cc5138e5	[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-28 22:08:03 -07:00
Shubhra Pandit	5b8c30d62b	[Spec Decode, BugFix] Propagate norm_before_fc from Eagle3 speculator (#38111 ) Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>	2026-03-29 00:42:06 +00:00
haosdent	d39b8daf5f	[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-29 00:27:52 +00:00
Walter Beller-Morales	fafca38adc	[BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed (#38362 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-03-28 18:30:54 +00:00
Kunshang Ji	aa4eb0db78	[CI]revert initialize_model context manager (#38426 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-28 16:56:50 +00:00
Andreas Karatzas	af89140efc	[ROCm][CI] Fix UV install in Dockerfile.rocm to detect curl failures and retry (#38415 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-29 00:47:42 +08:00
haosdent	b2bc736b12	[CI] Fix Ernie4.5-VL initialization test (#38429 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-28 22:43:24 +08:00
whyiug	58c959a767	[Misc]: clean up non-core lint issues (#37049 ) Signed-off-by: whyiug <whyiug@hotmail.com>	2026-03-28 10:28:16 -04:00
Bvicii	bda3eda82d	[Bugfix] Disallow renderer_num_workers > 1 with mm processor cache (#38418 ) Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>	2026-03-28 06:32:52 -07:00
Michael Goin	2bf5b70ae8	[CI Bugfix] Pre-download missing FlashInfer headers in Docker build (#38391 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-03-28 06:09:00 -07:00
yzong-rh	6dad4c5722	[Test] Fix flaky race condition in test_abort_final_step (#38414 ) Signed-off-by: Yifan <yzong@redhat.com>	2026-03-28 09:06:56 +00:00
Liwen	171775f306	Fix Device Index for ROCm Ray Workers in MoE Benchmark (#38108 ) Signed-off-by: Liwen <53441624+li-liwen@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-28 08:27:11 +00:00
TJian	58a249bc61	[ROCm] [Release] Update ROCm variant from rocm700 to rocm721 (#38413 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-03-28 06:07:03 +00:00
IriKa	148a5c1226	[Bugfix]fix output Nan/Inf in marlin if dtype=float16 (#33972 ) Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>	2026-03-27 16:36:08 -07:00
Wei Zhao	b69bf2f0b1	[Perf] Use torch compile to fuse pack topk in trtllm moe (#37695 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>	2026-03-27 17:30:46 -06:00
rongfu.leng	88149b635e	Add nvidia h800 moe config (#31201 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2026-03-27 16:28:48 -07:00
Hongxia Yang	83a4df049d	[ROCm][Documentation] update quickstart and installation to include rocm nightly docker tips (#38367 ) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-27 23:20:19 +00:00
Gregory Shtrasberg	731285c939	[ROCm][CI/Build] ROCm 7.2.1 release version; torch 2.10; triton 3.6 (#38252 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-03-27 18:03:12 -05:00
Johnny	97d19197bc	[NVIDIA] Fix DGX Spark logic (#38126 ) Signed-off-by: johnnynunez <johnnynuca14@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com> Signed-off-by: Sathish Sanjeevi <sathish.krishnan.p.s@gmail.com> Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com> Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com> Co-authored-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Andreas Karatzas <akaratza@amd.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: Sathish Sanjeevi <SKPsanjeevi@users.noreply.github.com> Co-authored-by: Guillaume Guy <guillaume.c.guy@gmail.com> Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-27 15:26:07 -07:00
Giancarlo Delfin	384e4d5f48	[Model Runner V2] Rebuild attention metadata before eagle decode full… (#38311 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-27 13:46:42 -07:00
Nicolò Lucchesi	44a6528028	[CI] Skip failing test (#38369 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-27 13:25:19 -07:00
Kyle Sayers	648edcf729	[QeRL] Compose online quantization with quantized reloading (#38032 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-03-27 13:22:33 -07:00
Michael Goin	7ba425e916	Add short flag `-sc` for `--speculative-config` argument (#38380 ) Co-authored-by: Claude <noreply@anthropic.com>	2026-03-27 12:04:22 -07:00
Gregory Shtrasberg	b8665383df	[ROCm] Fix GPT-OSS import for triton 3.6 (#37453 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-03-27 18:00:57 +00:00
Rohan Potdar	0e9358c11d	[ROCm]: gpt-oss fusion/padding fixes (#38043 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: Andreas Karatzas <akaratza@amd.com>	2026-03-27 12:19:15 -04:00
Harry Mellor	21d2b53f88	Remove need for explicit `\n` in docstring lists for `--help` formatting (#38350 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-27 08:38:00 -07:00
Jonas M. Kübler	98e7f223b9	enable skipping of SW attention layers when using FP8 KV cache (#33695 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2026-03-27 07:25:02 -06:00
Juan Pérez de Algaba	b111f8a61f	fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit (#37952 ) Signed-off-by: jperezde <jperezde@redhat.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2026-03-27 09:02:10 -04:00
Sage Moore	497e234d38	[EPLB] Cleanup the transfer logic for the various eplb maps (#34520 ) Signed-off-by: Sage Moore <sagmoore@redhat.com> Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-27 10:18:46 +01:00
dtc	6287e7fa20	[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-03-27 09:26:40 +01:00
Shengqi Chen	84e439a9cb	[CI/Build] Move nightly wheel index generation to a single post-build step (#38322 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-27 07:44:18 +00:00
Yuichiro Utsumi	a1746ff9ec	[Doc] Clarify Helm chart location in deployment guide (#38328 ) Signed-off-by: Yuichiro Utsumi <utsumi.yuichiro@fujitsu.com> Signed-off-by: Yuichiro Utsumi <81412151+utsumi-fj@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-27 15:43:02 +08:00
Flora Feng	aee4c14689	[Bugfix] Fix Hermes tool parser when stream interval > 1 (#38168 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-27 14:42:26 +08:00
Bowen Bao	0ae89f18fd	[Refactor] Move FusedMoE hidden_size roundup to quant_method (#34285 ) Signed-off-by: Bowen Bao <bowenbao@amd.com>	2026-03-26 23:38:26 -07:00
wenjun liu	c2b17d71af	[CI] Add xpu auto-label rule for Intel GPU/XPU PRs (#38320 ) Signed-off-by: wendyliu235 <wenjun.liu@intel.com>	2026-03-27 14:22:38 +08:00
Li, Jiang	becaed6ec8	[CPU] Support CT W4A16 on CPU MP kernel (#38219 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-27 14:15:28 +08:00
Xiaoshuang Wang	a8eab8f30d	[Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975 ) Signed-off-by: wxsIcey <1790571317@qq.com> Signed-off-by: Icey <1790571317@qq.com>	2026-03-27 14:13:21 +08:00
cjackal	2babac0bed	[frontend] dump openai responses type by alias (#38262 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2026-03-27 05:58:20 +00:00
Or Ozeri	7cc302dd87	[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-27 08:38:33 +03:00
Bvicii	999dfc1622	[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789 ) Signed-off-by: Bvicii <yizhanhuang2002@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-26 22:17:00 -07:00
wenjun liu	d86060122a	[CI/Build] enable Intel XPU test flow with prebuilt image (#37447 ) Signed-off-by: wendyliu235 <wenjun.liu@intel.com>	2026-03-26 18:16:04 -07:00
Harry Mellor	f73bcb1c51	Various Transformers v5 config fixes (#38247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-26 23:06:59 +00:00
yzong-rh	28048bd6b0	[Bugfix] Add missing f-string prefix in xgrammar choices error message (#38162 ) Signed-off-by: Yifan Zong <yzong@redhat.com>	2026-03-26 21:43:03 +00:00
Giancarlo Delfin	c32e97602d	[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-26 13:38:12 -07:00
Wei Zhao	0904b6550d	Fix multi-node allreduce fusion (#38136 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: root <root@theia0053.lyris.clusters.nvidia.com>	2026-03-26 20:24:36 +00:00
Stig-Arne Grönroos	f26fcdfb9e	[Bugfix][ROCm] Fix lru_cache on paged_mqa_logits_module (#37547 ) Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>	2026-03-26 19:01:05 +00:00

15337 Commits