Commit Graph

4 Commits

Author SHA1 Message Date
mikaylagawarecki
7c080dd3c5 [4/n] Migrate FP4/W4A8 CUTLASS kernels to torch stable ABI (#37503)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
2026-03-31 10:21:13 -07:00
Wentao Ye
ce8cf9161d [Compile] Fix compile warning st256_cs in cuda_vec_utils.cuh (#36693)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-16 11:12:15 -04:00
Roberto L. Castro
580864d81e [Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
2026-03-09 09:50:36 -07:00
Roberto L. Castro
a201ad72d8 [Refactor][Kernel] Add global helper to deduplicate vectorized memory ops (#35105)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
2026-02-27 16:28:17 -08:00