biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Martin Vit	228023b3a5	[Bugfix][MoE] Fix 6-8% decode regression: prefer multi-stream shared expert overlap (#38990) Signed-off-by: Martin Vit <martin@voipmonitor.org> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-05 10:28:31 -04:00
Aaron Batilo	9a528260ef	[Bugfix][Spec Decode] Fix extract_hidden_states for VLM models (#38987) Signed-off-by: Aaron Batilo <abatilo@coreweave.com>	2026-04-05 02:41:54 -07:00
Robert Shaw	968ed02ace	[Quantization][Deprecation] Remove Petit NVFP4 (#32694) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-04-05 00:07:45 +00:00
Robert Shaw	7d266abb22	Revert "[vLLM IR] gemma_rms_norm" (#38998)	2026-04-04 17:48:08 -04:00
Xiaoshuang Wang	156405d243	[vLLM IR] gemma_rms_norm (#38780) Signed-off-by: Icey <1790571317@qq.com>	2026-04-04 13:55:52 -04:00
Artem Perevedentsev	99e5539a67	[Perf][GDN] Align TMA usage with upstream FLA (#38981) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-05 00:38:02 +08:00
Linkun	a88ce94bbb	[IR][RmsNorm] pass None if not has_weight (#38961) Signed-off-by: Linkun Chen <github@lkchen.net> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-04-04 11:02:30 -04:00
Ziming Qi	2a36d8fb72	[Bugfix][CPU] Fix macOS compatibility broken by #36487 (#38970) Signed-off-by: Ziming (2imi9) <148090931+2imi9@users.noreply.github.com>	2026-04-04 14:05:58 +00:00
lalit10	93726b2a1c	Refactor Arctic loading to use AutoWeightsLoader (#38955) Signed-off-by: Lalit Laxminarayan Bangad <lalitbangad@gmail.com> Co-authored-by: Lalit Laxminarayan Bangad <lalitbangad@meta.com>	2026-04-04 05:01:09 +00:00
Yongye Zhu	8617f8676b	[Bugfix] Fix DSV32 weight loading (#38870) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2026-04-03 19:57:52 -07:00
Andreas Karatzas	06fd9ffcc4	[ROCm][CI] Fix ROCm Dockerfile conftest generation for older Docker parsers (#38959) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-04-04 10:41:41 +08:00
Wentao Ye	cab4064cd5	[Bug] Fix workspace manager `_current_workspaces` size (#38853) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-04-04 01:29:45 +00:00
Wentao Ye	062f1a2d70	[Bug] Fix compile error for `swap_blocks_batch` in CUDA 13 (#38915)	2026-04-03 16:56:38 -07:00
elenalil-aws	81994e1d0e	[Bugfix][LoRA] Fix missing in_proj_z in Qwen3_5ForConditionalGenerati… (#38927) Signed-off-by: elenalil-aws <elenalil@amazon.com>	2026-04-03 23:30:09 +00:00
Andreas Karatzas	4b506ff90a	[ROCm][CI] Minor missing import patch (#38951) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-04-03 23:01:20 +00:00
Andreas Karatzas	5875bb2e9c	[ROCm][CI] Added back missing common deps (#38937) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-04-03 15:58:57 -07:00
Kevin H. Luu	f0d3ad9f3e	[ci] Remove soft fail for AMD image build job (#38941) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2026-04-03 20:42:33 +00:00
Divin Honnappa	121ea5a21f	Removed GPU state confirmation and cleanup steps. (#38238) Signed-off-by: Divin Honnappa <divin.honnappa@amd.com>	2026-04-03 13:11:08 -07:00
Jeffrey Wang	ab79863e6c	Remove MQ multi-node tests (#38934) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-04-03 20:00:08 +00:00
Nick Hill	5f1de2b14b	[Model Runner V2] Add config validation for not-yet-supported features (#38758) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-04-03 12:08:08 -07:00
yzong-rh	a5a623d961	[Bugfix] Re-enable Renormalize routing for TRT-LLM MoE experts (#38859) Signed-off-by: Yifan Zong <yzong@redhat.com>	2026-04-04 01:48:17 +08:00
Xiaoshuang Wang	f8c3af2d85	[vLLM IR] add `import_ir_kernels()` to support OOT platforms (#38807) Signed-off-by: Icey <1790571317@qq.com>	2026-04-03 17:25:19 +00:00
danisereb	50cd5674b3	Fix invalid logprobs with MTP enabled and sync scheduling (#38711) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-04-03 12:24:37 -04:00
Vasiliy Kuznetsov	7b1a7423be	[Frontend] new online quantization frontend (#38138) Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>	2026-04-03 11:58:39 -04:00
Nicolò Lucchesi	97f92c6b47	[KVConnector] Skip `register_kv_caches` on profiling (#38558) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-04-03 15:40:16 +00:00
Yusuf Mohammad	46f02e00f2	[Bugfix] Fix AWQ models batch invariance issues (#38670) Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet> Signed-off-by: <> Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>	2026-04-03 14:54:15 +00:00
Qiming Zhang	6b4872240f	[XPU] bump up xpu-kernel v0.1.5, transpose moe weights (#38342) Signed-off-by: mayuyuace <qiming1.zhang@intel.com> Signed-off-by: Qiming Zhang <qiming1.zhang@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-04-03 14:10:02 +00:00
Necofish	580090db6b	[Kernel] Add swapAB support for SM120 CUTLASS blockwise FP8 GEMM (#38325)	2026-04-03 15:49:59 +02:00
Artem Perevedentsev	cb10b7e80b	[GDN] Eliminate GPU->CPU sync in prepare_chunk_indices during prefill (#38361) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>	2026-04-03 13:38:02 +00:00
Mieszko Dziadowiec	bf8b022e60	[Intel][Triton] Support `round_int8` for Intel backend (#38825) Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-04-03 20:47:35 +08:00
xiangdong	40ee64c00e	[XPU][CI] Skip test_topp_only and test_topk_and_topp cases on Intel GPU in CI (#38904) Signed-off-by: zengxian <xiangdong.zeng@intel.com>	2026-04-03 20:44:52 +08:00
wufann	1b117cb0ac	[ROCm] Fix aiter persistent mode mla with q/o nhead<16 for kimi-k2.5 tp8 (#38615) Signed-off-by: wufann <36477220+wufann@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-03 03:54:00 -07:00
Anton Ivanov	abebd9323d	[CPU] Replace OMP initialization (#36487) Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>	2026-04-03 18:42:43 +08:00
Hyeonki Hong	25f2b55319	[Frontend] feat: add streaming support for token generation endpoint (#37171) Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io>	2026-04-03 10:20:32 +00:00
xiangdong	cb4ff07f8b	[XPU][CI] Skip test_topk_only cases on Intel GPU in CI (#38899) Signed-off-by: zengxian <xiangdong.zeng@intel.com>	2026-04-03 09:50:41 +00:00
Gregory Shtrasberg	a7d79fa133	[ROCm][CI/Build] Fix the pytest hook to properly print out the summary (#38585) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-04-03 17:24:26 +08:00
Netanel Haber	fa9e68022d	Fix Nano Nemotron VL regressions (#38655) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-04-03 15:22:06 +08:00
Isotr0py	5506435419	[Misc] Clean up Gemma4 implementation (#38872) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> (tag: v0.19.1rc0)	2026-04-03 05:47:02 +00:00
Yifan Qiao	311c981647	[MRV2][KVConnector] Fix missing build_connector_worker_meta (#38698) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>	2026-04-03 08:42:52 +03:00
Li, Jiang	21d7ecc5b0	[CI/Build] Add audio deps in Dockerfile.cpu (#38876) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-04-03 05:05:14 +00:00
Aaron Hao	4729b90838	[Bug] Add e_score_correction_bias to SKIP_TENSORS (#38746) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-04-02 21:15:05 -07:00
shunting314	8b141ed8c3	full cudagraph for flex-attn (#36298) Signed-off-by: shunting314 <shunting@meta.com>	2026-04-02 21:15:01 -07:00
Varun Sundar Rabindranath	2ad7c0335f	[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2026-04-02 21:14:57 -07:00
Bowen Bao	201d2ea5bf	[CI][ROCm] Add Qwen3.5-35B-A3B-MXFP4 model eval into CI (#38664) Signed-off-by: Bowen Bao <bowenbao@amd.com>	2026-04-03 04:05:45 +00:00
Bowen Bao	103f0de565	[ROCm][Quantization][1/N] Refactor quark_moe w_mxfp4 w/ oracle (#38774) Signed-off-by: Bowen Bao <bowenbao@amd.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-03 03:29:57 +00:00
wliao2	32e0c0bfa2	refactor hard coded device string in test files under tests/v1 and tests/lora (#37566) Signed-off-by: Liao, Wei <wei.liao@intel.com>	2026-04-03 11:21:47 +08:00
Itay Etelis	4a06e1246e	[Perf] Batch KV cache swap copies via cuMemcpyBatchAsync (#38460) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-04-03 03:13:23 +00:00
Carl Y	3bc2734dd0	[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518) Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>	2026-04-03 01:47:04 +00:00
Carl Y	1f5ec2889c	[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205) Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com> Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-02 21:16:11 -04:00
Yan Ma	ee3cf45739	[XPU] Initial support for GDN attention on Qwen3-next/Qwen3.5 (#33657) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-04-03 08:59:11 +08:00

15531 Commits