biondizzle/vllm - vllm

Author	SHA1	Message	Date
rasmith	c1395f72cd	[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-05 05:05:48 +00:00
R3hankhan	4dffc5e044	[CPU] Split attention dispatch by head_dim alignment (#32161 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-03 19:37:15 -08:00
杨朱 · Kiki	b95cc5014d	[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 15:01:59 +08:00
danielafrimi	0aca8b8c62	[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) (#32790 ) Signed-off-by: dafrimi <dafrimi@nvidia.com>	2026-02-02 09:18:50 -05:00
csy0225	c3b40dc3e7	[Models] Step-3.5-Flash (#33523 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com> Co-authored-by: xiewuxun <xiewuxun@stepfun.com> Co-authored-by: zetaohong <i-hongzetao@stepfun.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-02 10:21:18 +08:00
Roy Wang	63c0889416	[Misc] Fix flashinfer related tests (#33462 ) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2026-01-31 16:10:24 -05:00
Yanan Cao	d5c41db35b	[Kernel] [Helion] [3/N] Helion kernel registry (#33203 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-31 15:38:46 +08:00
Dimitrios Bariamis	f0bca83ee4	Add support for Mistral Large 3 inference with Flashinfer MoE (#33174 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-01-30 22:48:27 -08:00
Yanan Cao	8ecd213c0b	[Kernel] [Helion] [2/N] Helion kernel wrapper (#32964 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-31 12:53:01 +08:00
Gregory Shtrasberg	31aedfe7d6	[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-01-30 19:05:23 -06:00
Michael Goin	67ebaff528	Refactor NVFP4 Linear utils for ModelOpt and CT (#33201 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-30 16:37:42 -08:00
Pavani Majety	c3a9752b0c	[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-01-30 10:30:46 -08:00
Yanan Cao	6c1f9e4c18	[Kernel] [Helion] [1/N] Add Helion ConfigManager (#32740 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-30 12:19:19 -05:00
Harry Mellor	a11bc12d53	Fix `test_moe.py` for Transformers v5 (#33413 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 14:03:25 +00:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Richard Zou	d9aa39a3bb	[torch.compile] Speed up MOE handling in forward_context (#33184 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-27 15:17:54 -08:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
Robert Shaw	5a93b9162b	[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>	2026-01-27 01:28:02 +00:00
Roberto L. Castro	fcb9df99bd	[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520 ) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-24 18:45:27 -07:00
7. Sun	0ccecf8833	[Tests] Standardize RNG seed utility across test files (#32982 ) Signed-off-by: 7. Sun <jhao.sun@gmail.com>	2026-01-24 06:47:14 +00:00
Michael Goin	4561f13985	[Refactor] Rename `gptq_marlin` to `marlin` to match MoE (#32952 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-23 16:48:12 -05:00
Matt	305e53ade8	[Hardware][AMD][CI][Bugfix] Fix Kernels Attention Cache test (#32904 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-23 16:24:26 +00:00
Fadi Arafeh	aac0b817fa	[CPU Backend][BugFix] Fix failing CPU MoE test (#32876 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-23 12:06:51 +00:00
Karan Bansal	fa6e599a61	[Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777 ) Signed-off-by: Karan Bansal <karanb192@gmail.com>	2026-01-23 08:22:37 +00:00
Luka Govedič	5e4e0e51f4	[torch.compile] Compile `CustomOp.forward_native` for `SiluAndMul` and `QuantFP8` to avoid raw torch ops inside opaque custom ops (#32806 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-22 19:52:26 -08:00
bnellnm	dc917cceb8	[MoE Refactor] Move `select_experts` from `FusedMoEQuantMethod` -> `FusedMoE` (#31996 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-22 18:21:35 -05:00
Xin Yang	d08b356ee0	[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-22 15:47:04 -05:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Richard Zou	654a71fc3c	[torch.compile] Improve Cold Start for MoEs (#32805 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-22 10:44:40 -05:00
Or Ozeri	421012b63a	OffloadingConnector: Support kernel_block_size != block_size (#30692 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-22 12:30:04 +00:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664 ) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Xin Yang	63227accf5	[Kernel] Add topk_sigmoid kernel (#31246 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-21 22:49:51 +00:00
elvischenv	808d6fd7b9	Bump Flashinfer to v0.6.1 (#30993 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-01-21 08:49:50 -08:00
whx	1861ae8aae	[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-21 11:38:04 -05:00
Robert Shaw	85f55c943c	[Quantization][Deprecation] Deprecate HQQ (#32681 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-21 09:32:40 -05:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
Lucas Wilkinson	b4f64e5b02	Update FlashMLA (#32491 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 13:03:37 +08:00
linhaifeng	7901109ea5	[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (#32603 ) Signed-off-by: linhaifeng <1371675203@qq.com>	2026-01-20 11:13:39 -05:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Yanan Cao	9d1e611f0e	[CI] Add Helion as an optional dependency (#32482 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-19 19:09:56 +00:00
Matt	11bbf86f6a	[CI][Hardware][AMD] Fix test_rotary_embedding_mla_cache_fused (#32408 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-19 08:25:47 +00:00
bnellnm	327a02d8db	[MoE Refactor] Separate Router into OO Classes (#30623 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-18 11:40:49 -05:00
Robert Shaw	4a6af8813f	[MoE Refactor] Move Test Impl into Test Dirs (#32129 ) Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-01-18 12:16:59 +08:00
Hashem Hashemi	7a1030431a	Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-01-16 11:45:04 -06:00
rasmith	6ca4f400d8	[CI][AMD] Skip test_permute_cols since the kernel is not used and not built for ROCm (#32444 ) Signed-off-by: Randall Smith <ransmith@amd.com>	2026-01-16 16:22:53 +08:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257 ) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
rasmith	8853a50af2	[CI][BugFix][AMD][FP8] Fix test_rms_norm so it runs correctly on ROCm (#32372 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2026-01-15 19:05:54 +08:00
rasmith	3c2685645e	[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201 ) Signed-off-by: Randall Smith <ransmith@amd.com>	2026-01-15 05:04:34 +00:00
Hongxia Yang	048bb59728	AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2026-01-13 23:25:10 -08:00
Roberto L. Castro	8ef50d9a6b	[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-13 15:22:53 -08:00

698 Commits