vllm/tests/v1/cudagraph at c4b9e6778f9d8054c1665b2d1c2cb0ee36e9e2f5 - vllm

Files

yugong333 ffe1fc7a28 Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005)

Signed-off-by: Yu Gong <yu3.gong@gmail.com>

2026-02-02 12:30:06 -05:00

__init__.py                  2025-08-15 10:01:39 -04:00
test_cudagraph_dispatch.py   2026-02-02 12:30:06 -05:00
test_cudagraph_mode.py       2025-12-17 09:49:59 -08:00