biondizzle/vllm - vllm

Author	SHA1	Message	Date
Xin Yang	b1c4f0b265	[Kernel] Optimize grouped topk kernel (#34206 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-20 01:34:45 -08:00
Robert Shaw	6874638bc4	[Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-18 07:42:36 -08:00
ElizaWszola	a88b3be7c4	[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-17 23:35:04 -08:00
Hongxia Yang	4a00a511bb	[BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site (#34653 ) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-02-17 19:19:41 -08:00
Pushpinder Singh	bcd65c1f6a	[Bugfix] Replace c10::optional with std::optional in topk kernel (#34467 ) Signed-off-by: Pushpinder Singh <pushpindersingh135@gmail.com>	2026-02-13 08:30:23 -08:00
Wei Zhao	59d53066d8	[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-13 08:11:26 -08:00
Hashem Hashemi	fac4e96940	small adjustment to wvSplitKrc (#34410 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-12 20:26:36 +00:00
Kyle Sayers	e9cd691132	[Bugfix] Fix Sparse24 Compressed Tensors models (#33446 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-11 23:15:16 -08:00
Li, Jiang	05339a7b20	[Bugfix][CPU] Fix llama4 inference on CPU (#34321 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-02-11 19:07:23 +08:00
R3hankhan	d1b837f0ae	[CPU] Enable FP16 (Half dtype) support for s390x (#34116 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-11 14:41:42 +08:00
Дзержи́нский	1485396abb	[Kernel] Apply 256bit LDG/STG To Activation Kernels (#33022 ) Signed-off-by: Dzerzhinsky <256908701+AstroVoyager7@users.noreply.github.com> Signed-off-by: Дзержи́нский <256908701+AstroVoyager7@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-10 19:31:51 -08:00
Kebe	5ee5c86eeb	[Bugfix][DeepSeek-V3.2] fix fp8 kvcache type cast (#33884 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2026-02-10 19:31:36 -08:00
Roberto L. Castro	afdce12c89	[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680 ) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-10 10:29:52 -05:00
Nikhil Gupta	caad9f1e01	[Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul (#33901 ) Signed-off-by: nikhil-arm <nikhil.gupta2@arm.com>	2026-02-09 18:04:41 +08:00
ihb2032	5a5c43511a	fix(cpu): fix mla_decode compilation on x86 without AVX512 (#34052 ) Signed-off-by: ihb2032 <hebome@foxmail.com> Co-authored-by: root <root@LAPTOP-FKNHV411.localdomain>	2026-02-09 08:55:41 +00:00
Hashem Hashemi	ed17f54c8b	Perf tuning and expansion of cases covered for wvSplitKrc (#33493 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-07 05:33:11 -08:00
Vel	bc32444b23	[Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support (#33517 ) Signed-off-by: code4me2 <velvetmoon222999@gmail.com>	2026-02-06 20:28:01 -08:00
Wentao Ye	77c09e1130	[Refactor] Remove align block size logic in `moe_permute` (#33449 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-06 10:57:06 -08:00
Gassan Salama	1363e3d6d5	[cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation (#32263 ) Signed-off-by: Gassan <gassan.salama@arm.com>	2026-02-06 15:01:48 +08:00
R3hankhan	ac04dd374f	[CPU] Add BF16 Kernel type for s390x (#33788 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-06 04:57:02 +00:00
Hashem Hashemi	d5c4800112	Adds padding and perf improvements to wvSplitK_fp8 (#33527 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-05 22:16:02 +00:00
R3hankhan	4dffc5e044	[CPU] Split attention dispatch by head_dim alignment (#32161 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-03 19:37:15 -08:00
Radu Salavat	e69c990c21	[Feature][CPU Backend]: Optimize ARM vectorization backend (#30329 ) Signed-off-by: Radu Salavat <radu.salavat@arm.com>	2026-02-02 20:17:56 -08:00
Lain	089cd4f002	fix cutlass_3x_gemm_fp8_blockwise on sm103a (#32224 ) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-02-02 11:47:46 -08:00
Kebe	528e9b1490	[Feature][Core] Support Fabric detection to adapt the MNNVL protocol for the GB series (#33540 ) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Thomas Vegas <tvegas@nvidia.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2026-02-02 22:55:46 +08:00
linhaifeng	fedf64332e	[Bugfix]: Fix display errors in TORCH_CHECK messages (#32942 ) Signed-off-by: linhaifeng <1371675203@qq.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-31 09:48:48 -08:00
Li, Jiang	8311f083bd	[Bugfix][CPU] Fix thread num for shared memory communication (#33317 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 03:26:58 -08:00
Didier Durand	31b25f6516	[Doc]: fixing multiple typos in diverse files (#33256 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 16:52:03 +08:00
Wentao Ye	c4e744dbd4	[Perf] Optimize `moe_permute` for CUTLASS FP8 (#32892 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-28 10:15:24 -08:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Harry Mellor	2eb673a088	Add flake8-implicit-str-concat rules to Ruff (#33191 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 04:56:10 +00:00
rasmith	58996f3589	[AMD][Kernel][BugFix] Use correct scale in concat_and_cache_ds_mla_kernel when on gfx942 (#32976 ) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2026-01-27 07:16:43 +00:00
dolpm	58a05b0ca1	[fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-26 16:59:44 -05:00
Roberto L. Castro	fcb9df99bd	[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520 ) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-24 18:45:27 -07:00
Michael Goin	4561f13985	[Refactor] Rename `gptq_marlin` to `marlin` to match MoE (#32952 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-23 16:48:12 -05:00
rasmith	6cc6d92be5	[CI][AMD][BugFix] Update wvSplitK (and other skinny_gemm wrappers) to ensure tensors passed will be made contiguous for the kernel (#32831 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2026-01-23 13:35:48 -08:00
Li, Jiang	5da4c7d789	[CI/Build][CPU] Fix failed pooling tests and macos smoke test (#32907 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-23 10:48:20 +00:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Fadi Arafeh	744ef30484	[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-22 18:55:23 +00:00
Or Ozeri	421012b63a	OffloadingConnector: Support kernel_block_size != block_size (#30692 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-22 12:30:04 +00:00
Xin Yang	63227accf5	[Kernel] Add topk_sigmoid kernel (#31246 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-21 22:49:51 +00:00
Robert Shaw	85f55c943c	[Quantization][Deprecation] Deprecate HQQ (#32681 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-21 09:32:40 -05:00
Wentao Ye	6c97b9b9b6	[Perf] Only clone when needed for `moe_permute` (#32273 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-20 11:34:39 -05:00
Wentao Ye	eebc58df0c	[Refactor] Remove unused cutlass moe problem size function (#32047 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-18 12:46:59 -08:00
Hashem Hashemi	7a1030431a	Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-01-16 11:45:04 -06:00
Michael Goin	83239ff19a	Add thread_n=64 support to Marlin MoE (#32360 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-15 16:45:44 -08:00
Wentao Ye	f28125d87b	[Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-13 10:58:18 -08:00
Kevin McKay	c60578de0a	[Bugfix][Hardware][AMD] Use dynamic WARP_SIZE in sampler vectorized_process (#31295 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2026-01-10 03:57:38 +00:00
PatrykSaffer	80fead8bf6	Fuse RoPE and MLA KV-cache write (#25774 ) Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com> Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai> Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-09 19:18:37 -08:00
Lucas Wilkinson	0a0aa07747	[Quant] Make static quant support all group shapes (#30833 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-09 12:49:27 -08:00

706 Commits