[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)

Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Author: Varun Sundar Rabindranath
Date: 2025-03-18 05:47:53 -04:00 (committed by GitHub)
Parent: d1695758b2
Commit: 400d483e87
15 changed files with 245 additions and 2092 deletions


@@ -62,9 +62,10 @@ class LoRAModelRunnerMixin:
         if not self.lora_manager:
             raise RuntimeError("LoRA is not enabled.")
-        # Set is_prefill to True, so we always use the SGMV kernels.
-        # For cuda platforms, we have specialized triton kernels, and
-        # the cuda path ignores `is_prefill`.
+        # Set is_prefill to True, so we always use the SGMV kernels on
+        # non-cuda platforms.
+        # On cuda platforms we use the same kernels for prefill and
+        # decode and this flag is generally ignored.
         lora_mapping = LoRAMapping(token_lora_mapping,
                                    prompt_lora_mapping,
                                    is_prefill=True)
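
The comment change above encodes the dispatch semantics after this commit: `is_prefill` still selects between SGMV (prefill) and BGMV (decode) paths on non-CUDA platforms, while the CUDA path uses one unified set of triton kernels and ignores the flag. A minimal sketch of that dispatch logic, with illustrative names only (`select_kernel` and this simplified `LoRAMapping` are not vLLM's actual API):

```python
from dataclasses import dataclass


@dataclass
class LoRAMapping:
    # Per-token and per-prompt LoRA index mappings (simplified).
    token_lora_mapping: tuple
    prompt_lora_mapping: tuple
    is_prefill: bool = False


def select_kernel(mapping: LoRAMapping, platform: str) -> str:
    """Hypothetical kernel selection mirroring the comment above."""
    if platform == "cuda":
        # CUDA: the same triton kernels serve prefill and decode,
        # so `is_prefill` is ignored.
        return "triton_unified"
    # Non-CUDA platforms: the flag picks SGMV (prefill) or BGMV (decode).
    return "sgmv" if mapping.is_prefill else "bgmv"


mapping = LoRAMapping((0, 0, 1), (0, 1), is_prefill=True)
print(select_kernel(mapping, "cuda"))  # triton_unified
print(select_kernel(mapping, "cpu"))   # sgmv
```

Setting `is_prefill=True` unconditionally, as the diff does, is therefore harmless on CUDA and keeps the SGMV path active on platforms that still distinguish the two phases.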