vllm/tests/v1/cudagraph at 4293c00b84b968ed25f80dfd2af3bb34d1eeeef6 - vllm

Files

yugong333 ffe1fc7a28 Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005)

Signed-off-by: Yu Gong <yu3.gong@gmail.com>

2026-02-02 12:30:06 -05:00

__init__.py

2025-08-15 10:01:39 -04:00

test_cudagraph_dispatch.py

2026-02-02 12:30:06 -05:00

test_cudagraph_mode.py

2025-12-17 09:49:59 -08:00