vllm/tests/lora/test_punica_ops.py at 5719a4e4e601fb91274294d25370b7aad656d629

biondizzle/vllm

yugong333 ffe1fc7a28 Reduce the kernel overhead when the number of active LoRAs is smaller than max LoRAs. Multiple CUDA graphs are captured, one for each number of active LoRAs. (#32005)

Signed-off-by: Yu Gong <yu3.gong@gmail.com>

2026-02-02 12:30:06 -05:00

11 KiB
