vllm/tests/v1/cudagraph at a8ffc4f0f2d02aa4e505dcc3c974d5ec6e00737c


Daisy-Ma-coder cfbee3d0e7 [CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)

Signed-off-by: qqma <qqma@amazon.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: qqma <qqma@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

2025-09-22 10:37:43 -07:00

__init__.py

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

2025-08-15 10:01:39 -04:00

test_cudagraph_dispatch.py

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

2025-08-15 10:01:39 -04:00

test_cudagraph_mode.py

[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)

2025-09-22 10:37:43 -07:00