biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Rohan Potdar	0e9358c11d	[ROCm]: gpt-oss fusion/padding fixes (#38043 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: Andreas Karatzas <akaratza@amd.com>	2026-03-27 12:19:15 -04:00
Harry Mellor	21d2b53f88	Remove need for explicit `\n` in docstring lists for `--help` formatting (#38350 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-27 08:38:00 -07:00
Jonas M. Kübler	98e7f223b9	enable skipping of SW attention layers when using FP8 KV cache (#33695 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2026-03-27 07:25:02 -06:00
Juan Pérez de Algaba	b111f8a61f	fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit (#37952 ) Signed-off-by: jperezde <jperezde@redhat.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2026-03-27 09:02:10 -04:00
Sage Moore	497e234d38	[EPLB] Cleanup the transfer logic for the various eplb maps (#34520 ) Signed-off-by: Sage Moore <sagmoore@redhat.com> Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-27 10:18:46 +01:00
dtc	6287e7fa20	[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-03-27 09:26:40 +01:00
Shengqi Chen	84e439a9cb	[CI/Build] Move nightly wheel index generation to a single post-build step (#38322 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-27 07:44:18 +00:00
Yuichiro Utsumi	a1746ff9ec	[Doc] Clarify Helm chart location in deployment guide (#38328 ) Signed-off-by: Yuichiro Utsumi <utsumi.yuichiro@fujitsu.com> Signed-off-by: Yuichiro Utsumi <81412151+utsumi-fj@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-27 15:43:02 +08:00
Flora Feng	aee4c14689	[Bugfix] Fix Hermes tool parser when stream interval > 1 (#38168 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-27 14:42:26 +08:00
Bowen Bao	0ae89f18fd	[Refactor] Move FusedMoE hidden_size roundup to quant_method (#34285 ) Signed-off-by: Bowen Bao <bowenbao@amd.com>	2026-03-26 23:38:26 -07:00
wenjun liu	c2b17d71af	[CI] Add xpu auto-label rule for Intel GPU/XPU PRs (#38320 ) Signed-off-by: wendyliu235 <wenjun.liu@intel.com>	2026-03-27 14:22:38 +08:00
Li, Jiang	becaed6ec8	[CPU] Support CT W4A16 on CPU MP kernel (#38219 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-27 14:15:28 +08:00
Xiaoshuang Wang	a8eab8f30d	[Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975 ) Signed-off-by: wxsIcey <1790571317@qq.com> Signed-off-by: Icey <1790571317@qq.com>	2026-03-27 14:13:21 +08:00
cjackal	2babac0bed	[frontend] dump openai responses type by alias (#38262 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2026-03-27 05:58:20 +00:00
Or Ozeri	7cc302dd87	[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-27 08:38:33 +03:00
Bvicii	999dfc1622	[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789 ) Signed-off-by: Bvicii <yizhanhuang2002@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-26 22:17:00 -07:00
wenjun liu	d86060122a	[CI/Build] enable Intel XPU test flow with prebuilt image (#37447 ) Signed-off-by: wendyliu235 <wenjun.liu@intel.com>	2026-03-26 18:16:04 -07:00
Harry Mellor	f73bcb1c51	Various Transformers v5 config fixes (#38247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-26 23:06:59 +00:00
yzong-rh	28048bd6b0	[Bugfix] Add missing f-string prefix in xgrammar choices error message (#38162 ) Signed-off-by: Yifan Zong <yzong@redhat.com>	2026-03-26 21:43:03 +00:00
Giancarlo Delfin	c32e97602d	[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-26 13:38:12 -07:00
Wei Zhao	0904b6550d	Fix multi-node allreduce fusion (#38136 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: root <root@theia0053.lyris.clusters.nvidia.com>	2026-03-26 20:24:36 +00:00
Stig-Arne Grönroos	f26fcdfb9e	[Bugfix][ROCm] Fix lru_cache on paged_mqa_logits_module (#37547 ) Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>	2026-03-26 19:01:05 +00:00
TJian	bc9c6fbbe6	[ROCm] [Bugfix] [Release] Fix nightly rocm release pipeline (#38263 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-03-26 18:47:10 +00:00
Andreas Karatzas	bff9a1c266	[ROCm][CI] Override PYTORCH_ROCM_ARCH with detected GPU arch in test containers (#38165 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 18:33:45 +00:00
Andreas Karatzas	db01535e2b	[ROCm][CI] Add uv pip compile workflow for rocm-test.txt lockfile (#37930 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 12:44:01 -05:00
jennyyyyzhen	a4cf9b22ba	[ROCM][Bugfix] Use correct stride in cp_mha_gather_cache_kernel for hybrid model (#37228 ) Signed-off-by: jennyyyyzhen <yzhen@hmc.edu> Co-authored-by: yZhen <yZhen@fb.com>	2026-03-26 10:33:39 -07:00
Andreas Karatzas	9c3ae04bfe	[ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 16:51:18 +00:00
Andreas Karatzas	a8e48a7b85	[CI] Fix conch kernel crash on 3D input by reshaping to 2D before GEMM (#38178 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 11:46:03 -05:00
Divakar Verma	b9dbc5c4ab	[Mamba][APC] Add test case to compare apc outputs (#34977 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-03-26 16:40:35 +00:00
TJian	60af7b967b	[Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm (#37283 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-26 16:32:25 +00:00
Andreas Karatzas	bdc1719eb9	[ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test (#38137 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 09:26:46 -07:00
haosdent	0aac2048bf	[Bugfix] Restore CUDA graph persistent buffers for FP8 FlashMLA decode (#35175 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-26 16:13:39 +00:00
Chuan (Richard) Li	cb2263218e	[Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886 ) Signed-off-by: Li <chuali@amd.com>	2026-03-26 11:59:24 -04:00
Wentao Ye	e054f152fa	[CI] Add batch invariant test for b200 (#38014 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-26 11:54:54 -04:00
zhang-prog	0f5b526040	[Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232 ) Signed-off-by: zhangyue66 <zhangyue66@baidu.com>	2026-03-26 15:34:49 +00:00
Zhewen Li	be1a85b7a2	Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050 ) (#38169 ) Co-authored-by: Zhewen Li <zhewenli@inferact.ai>	2026-03-26 07:59:09 -07:00
Cyrus Leung	2e225f7bd2	[Renderer] Consolidate factory methods (#38218 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-26 12:19:22 +00:00
Jared Wen	757eafcf37	[bug-fix] GLM OCR Patch Merger context_dim (#37962 ) Signed-off-by: JaredforReal <w13431838023@gmail.com>	2026-03-26 05:11:21 -07:00
wang.yuqi	dcdc145893	[CI] Reorganize scoring tests (#38207 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-26 12:07:01 +00:00
Andreas Karatzas	f2d16207c7	[ROCm][CI] Fix flaky GPTQ compile correctness test (#38161 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 19:57:00 +08:00
Andreas Karatzas	37a83007fe	[ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-26 19:54:59 +08:00
Wentao Ye	bf5eec638d	[Refactor] Remove unused utils (#38153 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-26 17:08:19 +08:00
Mateusz Sokół	b1cb1d3d2c	DOC: Documentation pages fixes (#38125 ) Signed-off-by: Mateusz Sokół <mat646@gmail.com>	2026-03-26 16:55:42 +08:00
Kunshang Ji	6ae8bbd0c2	[XPU] Disable xpu graph by default (#38193 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-26 01:53:45 -07:00
Cyrus Leung	a9213c0ffe	[Doc] Fix outdated reference to CUDAGraphManager (#38209 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-26 01:52:38 -07:00
Cyrus Leung	502c41a8f6	[Model] Use helper function to run MM processors with token inputs (where applicable) (#38018 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-26 16:44:04 +08:00
Vadim Gimpelson	52069012fe	[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-26 01:21:47 -07:00
Fadi Arafeh	71161e8b63	[cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-03-26 07:03:31 +00:00
Terry Gao	38de822310	[Model] Add torch.compile support for InternVL vision encoder (#38049 ) Signed-off-by: tianrengao <terrygao87@gmail.com>	2026-03-25 23:52:29 -07:00
Jee Jee Li	2bfbdca23c	[Bugfix] Fix benchmark_fused_collective.py (#38082 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-25 23:51:00 -07:00

15309 Commits