vllm/csrc/quantization/w8a8 at f9e2a75a1ee1d339ec5d885f842b3cfc27d71e02 - vllm

Files

Wentao Ye 308feab33f [Perf] Optimize cutlass moe problem size calculation, 5.3% E2E Throughput improvement, 2.2% TTFT improvement (#31830)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

2026-01-09 11:13:43 -08:00

cutlass

[Perf] Optimize cutlass moe problem size calculation, 5.3% E2E Throughput improvement, 2.2% TTFT improvement (#31830)

2026-01-09 11:13:43 -08:00

fp8

[Refactor] Reduce duplicate code in per_token_group_quant cuda kernels (#30496)

2025-12-12 16:45:23 -05:00

int8

[Bugfix] Fix test fused quant layernorm tests (#27865)

2025-11-08 14:31:33 -08:00

per_token_group_quant_8bit.h

[Refactor] Refactor FP8 & INT8 Quant Folder inside w8a8 (#25293)

2025-10-08 10:20:48 -04:00