vllm/csrc at 81c57f60a2c77d169dbec021bb58a467edf580f6

Files

Junhao Li 3303f134e0 [Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) (#22131)

Signed-off-by: Junhao Li <junhao@ubicloud.com>

2025-08-07 19:18:28 -07:00

attention

[Bugfix][ROCm] Fix for warp_size uses on host (#21205)

2025-07-24 00:37:19 -07:00

core

[Kernel] fp4 marlin kernel (#17687)

2025-05-10 19:58:49 -07:00

cpu

[Hardware][CPU] Build fix for ARM without BF16 (#21848)

2025-07-30 06:22:00 -07:00

cutlass_extensions

[Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) (#22131)

2025-08-07 19:18:28 -07:00

mamba/mamba_ssm

[v1] - Mamba1 Attention Metadata (#21249)

2025-08-06 17:03:42 -07:00

moe

[Refactor] Fix Compile Warning #1444-D (#21462)

2025-08-01 06:10:30 -07:00

prepare_inputs

[MISC] Remove unused variableds in C++ (#19609)

2025-06-15 20:05:28 -07:00

quantization

[Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) (#22131)

2025-08-07 19:18:28 -07:00

quickreduce

[Feature] add quick all reduce (#19744)

2025-06-26 20:54:24 -07:00

rocm

[Bugfix][ROCm] Fix for warp_size uses on host (#21205)

2025-07-24 00:37:19 -07:00

sparse/cutlass

[feat]: CUTLASS block scaled group gemm for SM100 (#19757)

2025-07-04 12:58:04 -06:00

activation_kernels.cu

Modularize fused experts and integrate PPLX kernels (#15956)

2025-05-14 13:11:54 -07:00

cache_kernels.cu

[Perf] Optimize reshape_and_cache_flash CUDA Kernel (#22036)

2025-08-01 19:18:51 -04:00

cache.h

[Attention] MLA with chunked prefill (#12639)

2025-02-21 15:30:12 -08:00

cuda_compat.h

[Bugfix][ROCm] Fix for warp_size uses on host (#21205)

2025-07-24 00:37:19 -07:00

cuda_utils_kernels.cu

[NVIDIA] Support nvfp4 quantization (#12784)

2025-02-12 19:51:51 -08:00

cuda_utils.h

[Attention] MLA with chunked prefill (#12639)

2025-02-21 15:30:12 -08:00

cuda_view.cu

[V1] Fully Transparent Implementation of CPU Offloading (#15354)

2025-03-31 20:22:34 +08:00

cumem_allocator.cpp

[core] improve error handling when wake up from sleep mode (#12981)

2025-02-10 09:38:57 +08:00

custom_all_reduce_test.cu

[Distributed] Add custom allreduce support for ROCM (#14125)

2025-03-31 22:49:12 -07:00

custom_all_reduce.cu

[Distributed] Add custom allreduce support for ROCM (#14125)

2025-03-31 22:49:12 -07:00

custom_all_reduce.cuh

fix: spelling (#16466)

2025-04-11 23:24:22 -07:00

custom_quickreduce.cu

[Feature] add quick all reduce (#19744)

2025-06-26 20:54:24 -07:00

dispatch_utils.h

Modularize fused experts and integrate PPLX kernels (#15956)

2025-05-14 13:11:54 -07:00

layernorm_kernels.cu

[perf] Add fused MLA QKV + strided layernorm (#21116)

2025-07-22 07:07:44 -07:00

layernorm_quant_kernels.cu

[perf] Add fused MLA QKV + strided layernorm (#21116)

2025-07-22 07:07:44 -07:00

ops.h

[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476)

2025-07-25 17:07:07 -07:00

permute_cols.cu

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

pos_encoding_kernels.cu

[Kernel] Have rotary embeddings support tensors (#18046)

2025-05-14 15:43:55 -07:00

sampler.cu

[BUG] Fix #20484. Support empty sequence in cuda penalty kernel (#20491)

2025-07-05 19:38:02 -07:00

torch_bindings.cpp

[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476)

2025-07-25 17:07:07 -07:00

type_convert.cuh

[torch.compile] Fuse RMSNorm with quant (#9138)

2024-11-08 21:20:08 +00:00