[Kernel] Full Tensor Parallelism for LoRA Layers (#3524)

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-04-27 02:03:48 -05:00
parent 18d23f642a
commit eefeb16464
19 changed files with 686 additions and 111 deletions
--- a/csrc/punica/bgmv/bgmv_fp16_fp32_fp16.cu
+++ b/csrc/punica/bgmv/bgmv_fp16_fp32_fp16.cu
@@ -2,3 +2,4 @@
 #include "bgmv_impl.cuh"

 FOR_BGMV_WIDE_NARROW(INST_BGMV_TWOSIDE, nv_half, float, nv_half)
+FOR_INST_BGMV_WIDE_NARROW(INST_BGMV_ONESIDE, nv_half, float, nv_half)
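The two macro lines above use an X-macro pattern. A shape-list macro receives an instantiation macro and applies it once per shape pair. The sketch below is a minimal host-only C++ model of that pattern. It assumes that a two-sided instantiation emits both directions of a `(narrow, wide)` pair, and that a one-sided instantiation emits only the single `(feat_in, feat_out)` pair it is given. Under that reading, the added line would register extra shapes that need only one direction, such as shapes that arise when LoRA weights are sharded for tensor parallelism. All `SKETCH_*` macros, `Shape`, `has_shape`, and the sizes are hypothetical and do not reproduce punica's actual `bgmv_config.h` or `bgmv_impl.cuh`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for punica's shape lists; sizes are illustrative only.
// Each list calls f(in_T, out_T, W_T, a, b) once per shape pair.
#define SKETCH_FOR_WIDE_NARROW(f, in_T, out_T, W_T) \
  f(in_T, out_T, W_T, 16, 4096)                     \
  f(in_T, out_T, W_T, 16, 11008)

// Hypothetical one-direction list: pairs are already (feat_in, feat_out),
// e.g. a rank-16 LoRA A split across 2 GPUs, producing 8 outputs per shard.
#define SKETCH_FOR_INST_WIDE_NARROW(f, in_T, out_T, W_T) \
  f(in_T, out_T, W_T, 4096, 8)

struct Shape {
  int feat_in;
  int feat_out;
};

// Stand-in for a kernel instantiation: records the (feat_in, feat_out) shape
// instead of emitting an explicit template instantiation. The dtype
// arguments are accepted and ignored.
#define SKETCH_INST(feat_in, feat_out, in_T, out_T, W_T) Shape{feat_in, feat_out},

// Two-sided: one entry per direction, (narrow -> wide) and (wide -> narrow).
#define SKETCH_TWOSIDE(in_T, out_T, W_T, narrow, wide) \
  SKETCH_INST(narrow, wide, in_T, out_T, W_T)          \
  SKETCH_INST(wide, narrow, in_T, out_T, W_T)

// One-sided: exactly one entry, in the direction given.
#define SKETCH_ONESIDE(in_T, out_T, W_T, feat_in, feat_out) \
  SKETCH_INST(feat_in, feat_out, in_T, out_T, W_T)

// Mirrors the two lines in the diff. The type names are dropped during
// expansion, so nv_half does not need to be declared here.
const std::vector<Shape> kShapes = {
    SKETCH_FOR_WIDE_NARROW(SKETCH_TWOSIDE, nv_half, float, nv_half)
    SKETCH_FOR_INST_WIDE_NARROW(SKETCH_ONESIDE, nv_half, float, nv_half)
};

// True if a "kernel" for this (feat_in, feat_out) pair was registered.
bool has_shape(int feat_in, int feat_out) {
  for (const Shape& s : kShapes) {
    if (s.feat_in == feat_in && s.feat_out == feat_out) return true;
  }
  return false;
}
```

In this sketch, `kShapes` holds five entries. There are four from the two-sided list, covering both directions of `(16, 4096)` and `(16, 11008)`, and one from the one-sided list, `(4096, 8)`. The reverse pair `(8, 4096)` is not registered. Keeping one-directional shapes in a separate list avoids compiling kernels that would never be launched.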