biondizzle/vllm

Author	SHA1	Message	Date
Mieszko Dziadowiec	bf8b022e60	[Intel][Triton] Support `round_int8` for Intel backend (#38825) Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-04-03 20:47:35 +08:00
xiangdong	40ee64c00e	[XPU][CI] Skip test_topp_only and test_topk_and_topp cases on Intel GPU in CI (#38904) Signed-off-by: zengxian <xiangdong.zeng@intel.com>	2026-04-03 20:44:52 +08:00
wufann	1b117cb0ac	[ROCm] Fix aiter persistent mode mla with q/o nhead<16 for kimi-k2.5 tp8 (#38615) Signed-off-by: wufann <36477220+wufann@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-03 03:54:00 -07:00
Anton Ivanov	abebd9323d	[CPU] Replace OMP initialization (#36487) Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>	2026-04-03 18:42:43 +08:00
Hyeonki Hong	25f2b55319	[Frontend] feat: add streaming support for token generation endpoint (#37171) Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io>	2026-04-03 10:20:32 +00:00
xiangdong	cb4ff07f8b	[XPU][CI] Skip test_topk_only cases on Intel GPU in CI (#38899) Signed-off-by: zengxian <xiangdong.zeng@intel.com>	2026-04-03 09:50:41 +00:00
Gregory Shtrasberg	a7d79fa133	[ROCm][CI/Build] Fix the pytest hook to properly print out the summary (#38585) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-04-03 17:24:26 +08:00
Netanel Haber	fa9e68022d	Fix Nano Nemotron VL regressions (#38655) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-04-03 15:22:06 +08:00
Isotr0py	5506435419	[Misc] Clean up Gemma4 implementation (#38872) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> (tag: v0.19.1rc0)	2026-04-03 05:47:02 +00:00
Yifan Qiao	311c981647	[MRV2][KVConnector] Fix missing build_connector_worker_meta (#38698) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>	2026-04-03 08:42:52 +03:00
Li, Jiang	21d7ecc5b0	[CI/Build] Add audio deps in Dockerfile.cpu (#38876) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-04-03 05:05:14 +00:00
Aaron Hao	4729b90838	[Bug] Add e_score_correction_bias to SKIP_TENSORS (#38746) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-04-02 21:15:05 -07:00
shunting314	8b141ed8c3	full cudagraph for flex-attn (#36298) Signed-off-by: shunting314 <shunting@meta.com>	2026-04-02 21:15:01 -07:00
Varun Sundar Rabindranath	2ad7c0335f	[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2026-04-02 21:14:57 -07:00
Bowen Bao	201d2ea5bf	[CI][ROCm] Add Qwen3.5-35B-A3B-MXFP4 model eval into CI (#38664) Signed-off-by: Bowen Bao <bowenbao@amd.com>	2026-04-03 04:05:45 +00:00
Bowen Bao	103f0de565	[ROCm][Quantization][1/N] Refactor quark_moe w_mxfp4 w/ oracle (#38774) Signed-off-by: Bowen Bao <bowenbao@amd.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-03 03:29:57 +00:00
wliao2	32e0c0bfa2	refactor hard coded device string in test files under tests/v1 and tests/lora (#37566) Signed-off-by: Liao, Wei <wei.liao@intel.com>	2026-04-03 11:21:47 +08:00
Itay Etelis	4a06e1246e	[Perf] Batch KV cache swap copies via cuMemcpyBatchAsync (#38460) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-04-03 03:13:23 +00:00
Carl Y	3bc2734dd0	[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518) Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>	2026-04-03 01:47:04 +00:00
Carl Y	1f5ec2889c	[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205) Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com> Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-02 21:16:11 -04:00
Yan Ma	ee3cf45739	[XPU] Initial support for GDN attention on Qwen3-next/Qwen3.5 (#33657) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-04-03 08:59:11 +08:00
Matthew Bonanni	05e68e1f81	[CI] Fix `test_nixl_connector` (#38838)	2026-04-02 17:52:13 -07:00
Vadim Gimpelson	771913e4a0	[Bugfix] Fix NVFP4+MTP crash: force unquantized mtp.fc for Qwen3.5 (#38832) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-04-03 04:45:57 +04:00
1096125073	71a9125c67	[New Model]: add support for telechat3 (#38510) Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn> Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>	2026-04-03 08:26:22 +08:00
Nicolò Lucchesi	66e86f1dbd	[Kernel] Mamba support different layout for Conv state (#37416)	2026-04-03 01:50:09 +02:00
Michael	bb39382b2b	[Bugfix]: Fix Gemma4ToolParser.__init__() missing `tools` parameter (#38847) Signed-off-by: Michael Hospedales <hospedales@me.com>	2026-04-02 14:35:19 -07:00
zhanqiuhu	7b743ba953	[CI] Fix: pass string cache_dtype in test_register_kv_caches (#38836)	2026-04-02 19:42:09 +00:00
Stefano Castagnetta	188defbd0b	[CI] Add flashinfer.py to attention test source deps (#38792) Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-04-02 19:24:29 +00:00
Luciano Martins	08ed2b9688	feat(models): implement Google Gemma 4 architecture support (MoE, Multimodal, Reasoning, Tool-Use) (#38826) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Luciano Martins <lucianomartins@google.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2026-04-02 11:13:28 -07:00
Yanan Cao	ecd5443dbc	Bump helion dependency from 0.3.2 to 0.3.3 (#38062) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-02 10:59:33 -07:00
Stefano Castagnetta	58262dec6e	[Bugfix] Fix test mocks after SM100 restriction in #38730 (#38791) Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-04-02 13:12:58 -04:00
Lucas Wilkinson	cb3935a8fc	[FA4] Update flash-attention to latest upstream FA4 (#38690) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-04-02 17:02:37 +00:00
Bowen Bao	82a006beeb	[CI][ROCm] Add gpt-oss w4a8 in CI (#38292) Signed-off-by: Bowen Bao <bowenbao@amd.com>	2026-04-03 00:06:01 +08:00
wang.yuqi	a9b4f07ba2	[Frontend] Re-enable running MaxSim on GPU (#38620) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-04-03 00:03:13 +08:00
Koushik Dutta	d9408ffba3	Triton MLA perf fixes (#33529) Signed-off-by: Koushik Dutta <koushd@gmail.com> Co-authored-by: root <root@ubuntu-nvidia.localdomain> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-04-02 09:40:01 -04:00
Yusuf Mohammad	16a65e4173	[Bugfix] Enable batch-invariant Triton matmul on all Ampere GPUs (SM 8x) (#38427) Signed-off-by: yusuf <yusufmohammad@live.com> Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet> Signed-off-by: Yusuf Mohammad <79484377+YM2132@users.noreply.github.com> Signed-off-by: <> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>	2026-04-02 09:29:58 -04:00
bsliu	c0817e4d39	[Model] Add support for Cheers multimodal model (#38788) Signed-off-by: bsliu <1187291748@qq.com> Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn>	2026-04-02 21:01:40 +08:00
Harry Mellor	dfe5e31689	Don't compile vision encoder for Transformers backend (#30518) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-04-02 12:42:29 +00:00
JartX	2ce3d0ce36	[Feature] KV cache per-token-head INT8/FP8 quantization (#38378) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: yangyang4991 <yangyang4991@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2026-04-02 08:13:26 -04:00
Jiangyun Zhu	4eefbf9609	[Perf] fuse kernels in gdn (#37813) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-04-02 11:52:18 +00:00
vllmellm	551b3fb39f	[ROCm] Enable VLLM triton FP8 moe for gfx1201, tuned for Qwen3-30B-A3B-FP8 tp=2 and Qwen/Qwen3.5-35B-A3B-FP8 tp=2 (#38086) Signed-off-by: big-yellow-duck <jeffaw99@hotmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-04-02 08:13:42 +00:00
Li, Jiang	c6f722b93e	[CPU] Support gelu act in cpu_fused_moe (#38770) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-04-02 14:14:32 +08:00
Xin Yang	9bd7231106	Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-04-01 22:02:32 -07:00
Yanan Cao	73f48ce559	[Kernel] [Helion] Use warning_once in get_gpu_name to prevent log spam (#38743) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>	2026-04-01 21:30:31 -07:00
Gregory Shtrasberg	3aab680e3e	[ROCm][Bugfix] Fix ROCm runtime failure due to missing symbol (#38750) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>	2026-04-01 21:30:11 -07:00
Sergey Zinchenko	5a2d420c17	[Bugfix] Use dedicated MM processor cache in /tokenize to prevent sender-cache pollution (#38545) Signed-off-by: Sergey Zinchenko <sergey.zinchenko.rnd@gmail.com>	2026-04-01 21:14:49 -07:00
Benjamin Chislett	5f96f9aff1	[Perf] DSV3.2 Indexer Fused Weights Projection (#38684) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-04-02 03:34:49 +00:00
Luka Govedič	694449050f	Fix multiline-format string for python 3.10 (#38739) Signed-off-by: Luka Govedic <luka.govedic@gmail.com>	2026-04-02 03:19:35 +00:00
Nick Hill	6241521dd2	[BugFix] Fix precommit breakage due to conflicting in-flight merges (#38759) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-04-01 15:35:06 -07:00
Kevin H. Luu	1785dc5501	Revert "[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831)" (#38751)	2026-04-02 06:34:28 +08:00

15652 Commits