biondizzle / vllm
vllm / tests / v1 / cudagraph @ f0d525171557e3fe74e8e6df52257f9d66831d3f
History

Latest commit ffe1fc7a28 by yugong333:
Reduce the kernel overhead when the number of active LoRAs is smaller than max LoRAs. Multiple CUDA graphs are captured, one per number of active LoRAs. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-02-02 12:30:06 -05:00
__init__.py
    [Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)
    2025-08-15 10:01:39 -04:00

test_cudagraph_dispatch.py
    Reduce the kernel overhead when the number of active LoRAs is smaller than max LoRAs. Multiple CUDA graphs are captured, one per number of active LoRAs. (#32005)
    2026-02-02 12:30:06 -05:00

test_cudagraph_mode.py
    [Attention] Update tests to remove deprecated env vars (#30563)
    2025-12-17 09:49:59 -08:00
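The commit message for #32005 describes capturing a separate CUDA graph for each number of active LoRAs, so replay launches kernels sized for the actual LoRA count instead of the configured maximum. A minimal Python sketch of that dispatch idea, with hypothetical names (`GraphCache`, `_capture`, `dispatch` are illustrative, not vLLM's actual API; the stand-in `replay` closure takes the place of a real `torch.cuda.CUDAGraph`):

```python
# Sketch: cache one "captured graph" per active-LoRA count, so dispatch
# replays the graph sized for the request rather than one sized for
# max_loras. All names here are hypothetical stand-ins for vLLM's
# internal CUDA-graph machinery.

class GraphCache:
    def __init__(self, max_loras: int):
        self.max_loras = max_loras
        self._graphs = {}  # num_active_loras -> replay callable

    def _capture(self, num_active: int):
        # Stand-in for CUDA graph capture: the LoRA count is baked into
        # the closure, as a real capture bakes it into kernel launches.
        def replay(num_tokens: int):
            return {"tokens": num_tokens, "loras_in_kernel": num_active}
        return replay

    def dispatch(self, num_active: int):
        # Capture lazily on first sight of this count, then reuse.
        if not 0 <= num_active <= self.max_loras:
            raise ValueError("active LoRA count out of range")
        if num_active not in self._graphs:
            self._graphs[num_active] = self._capture(num_active)
        return self._graphs[num_active]

cache = GraphCache(max_loras=8)
out = cache.dispatch(2)(16)  # kernels see 2 active LoRAs, not 8
```

The design point is the cache key: before this change a single graph captured at `max_loras` would pay the full LoRA kernel cost even when fewer adapters are active; keying captures by the active count trades extra capture memory for lower replay overhead.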