Commit Graph

746 Commits

Turner Jabbour
4034c3d32e [Core] Move test utility to test file (#35672)
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
2026-03-02 10:56:03 -05:00
EdalatiAli
cb21972a97 [Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-03-01 23:31:19 -08:00
haosdent
6290470843 [Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-01 15:14:46 -05:00
Asaf Gardin
bbf81f9a92 [Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
2026-03-01 20:40:23 +08:00
Hashem Hashemi
7600642eae Add padding support to wvSplitK solution for skinny GEMMs (#33762)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-02-28 09:02:05 +00:00
Itay Alroy
dea268336f [1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
2026-02-28 04:46:42 +00:00
Yanan Cao
9098ce690c [Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to enable fx tracing and pattern matching (#34390)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-02-27 09:21:35 -08:00
Max Hu
9c3fe9936b Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580)
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Shang Wang <shangw@nvidia.com>
2026-02-27 20:20:23 +08:00
Michael Goin
4fec53cfcb [CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI (#34274)
2026-02-26 17:58:03 -07:00
Andrii Skliar
56a6371706 [Update] Use FlashInfer fast_decode_plan directly instead of replication (#34687)
Signed-off-by: Andrii <askliar@nvidia.com>
Co-authored-by: Andrii <askliar@nvidia.com>
2026-02-26 16:31:43 -08:00
Tyler Michael Smith
eb19955c37 [WideEP] Remove pplx all2all backend (#33724)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 14:30:10 -08:00
Elizabeth Thomas
c97234c08b fix(mxfp4): Disable monolithic path for TRITON backend with EP (#34270)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-25 13:33:42 -08:00
Kunshang Ji
8ad54a991b [Platform] Add current_platform.num_compute_units interface (#35042)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
2026-02-24 22:22:49 -08:00
Xin Yang
3bbb2046ff [Bugfix] Fix expert_ids padding values in moe_align_block_size kernel (#35161)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-02-24 17:14:24 -08:00
Eldar Kurtić
a87cc50859 [Attn,KV-cache] Use per-head scales in the attention selector (#34281)
Signed-off-by: Your Name <you@example.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
Co-authored-by: Eldar Kurtic <research@neuralmagic.com>
Co-authored-by: Your Name <you@example.com>
2026-02-24 09:02:43 -05:00
BadrBasowid
6af03f2394 [Refactor] [1/N] Reorganize kernel abstraction directory (#34055)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-02-24 06:47:22 +00:00
tacos8me
b7892a3bef [Model] Add NVFP4 quantization support for Step3.5-Flash (#34478)
Signed-off-by: tacos8me <ian@cloudhabit.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-22 12:30:46 -07:00
Yanan Cao
9d7577b2bd [Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names (#34928)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 19:55:51 -08:00
Yanan Cao
a6d0299c75 [Kernel] [Helion] [6/N] Add num_tokens dimension to silu_mul autotuning and dispatching (#34185)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-02-20 08:36:51 -08:00
Xin Yang
b1c4f0b265 [Kernel] Optimize grouped topk kernel (#34206)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-02-20 01:34:45 -08:00
rasmith
2b84ac669c [CI][AMD][BugFix] Use torch.testing.assert_close instead of assert torch.allclose in test_rocm_skinny_gemms.py (#34181)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-02-18 23:10:19 +00:00
Wenlong Wang
847a57cd12 [Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) (#34673)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-18 13:03:24 -08:00
rasmith
fcd6ac97ed [CI][AMD][BugFix] Skip tests in test_unquantized_backend_selection that should not run on ROCm (#34655)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-02-18 15:00:40 -05:00
Burkhard Ringlein
e24663c5a9 Add unit tests for fp8 output fusion of triton_attn (#34228)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-18 06:22:49 -05:00
ElizaWszola
a88b3be7c4 [Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-17 23:35:04 -08:00
haosdent
b68fd899d1 [Bugfix] Fix fused MoE int32 overflow in stride*offset without perf regression (#34507)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-16 17:58:49 -08:00
Isotr0py
71cd89264f [MM Encoder] Add Triton ViT attention backend (#32183)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-15 06:32:47 -08:00
haosdent
79f3fab05a [Bugfix] Handle num_expert_group=None in flashinfer block-scale FP8 MoE (#34494)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-02-14 23:25:46 -08:00
Marek Michalowski
742d214d6e [Bugfix] fix the import path in moe test utils.py (#34245)
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>
2026-02-13 00:13:45 -08:00
Cyrus Leung
372b2e762a [Bugfix] Standardize getting number of image patches/tokens (#34358)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 20:47:01 -08:00
Yanan Cao
96161fe978 [Kernel] [Helion] [4/N] Add silu_mul_fp8 Helion kernel (#33373)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-02-12 18:13:12 -08:00
amitz-nv
f120bd42d3 [Kernel] Support Flashinfer trtllm fused MoE non gated FP8 & NVFP4 (#33506)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
2026-02-12 13:06:58 -08:00
Michael Goin
ff1f83b056 [Refactor] Replace activation: str with MoEActivation enum (#33843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-02-11 17:29:32 -08:00
Wei Zhao
5aff2699bd Fix CI failure - Flashinfer Kernel tests (#34316)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-02-11 14:17:16 -08:00
Linda
275e0d2a99 [NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2026-02-11 12:38:11 +00:00
Hashem Hashemi
1b3540e6c6 Threshold fix wvSplitk for occasional CI fails (#34013)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-02-11 03:59:14 +00:00
bnellnm
d1481ba783 [MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-02-10 19:51:07 -05:00
Roberto L. Castro
afdce12c89 [Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 10:29:52 -05:00
xuebwang-amd
b129136c7a [ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-10 10:08:05 -05:00
Andrey Talman
f97ca67176 [Release 2.10] Update to Torch 2.10 - final release (#30525)
2026-02-08 13:51:09 -08:00
Hashem Hashemi
ed17f54c8b Perf tuning and expansion of cases covered for wvSplitKrc (#33493)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-02-07 05:33:11 -08:00
lukec
15a0b9e570 Fix spelling errors (#33978)
2026-02-06 23:58:50 -08:00
Wentao Ye
77c09e1130 [Refactor] Remove align block size logic in moe_permute (#33449)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-06 10:57:06 -08:00
Andreas Karatzas
350ca72c04 [ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-06 15:08:16 +00:00
Xinyu Chen
e969a169ef support view_from_cpu_tensor on XPU (#33868)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
2026-02-06 08:34:20 +00:00
Xin Yang
79028d4388 [Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568)
2026-02-05 20:34:00 -05:00
Hashem Hashemi
d5c4800112 Adds padding and perf improvements to wvSplitK_fp8 (#33527)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-02-05 22:16:02 +00:00
bnellnm
a57c8228ff [Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-05 18:07:18 +00:00
rasmith
c1395f72cd [CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-02-05 05:05:48 +00:00
R3hankhan
4dffc5e044 [CPU] Split attention dispatch by head_dim alignment (#32161)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-02-03 19:37:15 -08:00