biondizzle / vllm
vllm / tests / v1 / cudagraph @ f0d525171557e3fe74e8e6df52257f9d66831d3f
History

Latest commit ffe1fc7a28 by yugong333:
Reduce the kernel overhead when the number of active LoRAs is smaller than max LoRAs. Multiple CUDA graphs are captured, one per number of active LoRAs. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-02-02 12:30:06 -05:00
__init__.py
    [Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)
    2025-08-15 10:01:39 -04:00

test_cudagraph_dispatch.py
    Reduce the kernel overhead when the number of active LoRAs is smaller than max LoRAs. Multiple CUDA graphs are captured, one per number of active LoRAs. (#32005)
    2026-02-02 12:30:06 -05:00

test_cudagraph_mode.py
    [Attention] Update tests to remove deprecated env vars (#30563)
    2025-12-17 09:49:59 -08:00
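The commit message for #32005 describes capturing a separate CUDA graph for each number of active LoRAs, so replay launches kernels sized for the actual LoRA count instead of the configured maximum. A minimal Python sketch of that dispatch idea, with hypothetical names (`GraphCache`, `_capture`, `dispatch` are illustrative, not vLLM's actual API; the stand-in `replay` closure takes the place of a real `torch.cuda.CUDAGraph`):

```python
# Sketch: cache one "captured graph" per active-LoRA count, so dispatch
# replays the graph sized for the request rather than one sized for
# max_loras. All names here are hypothetical stand-ins for vLLM's
# internal CUDA-graph machinery.

class GraphCache:
    def __init__(self, max_loras: int):
        self.max_loras = max_loras
        self._graphs = {}  # num_active_loras -> replay callable

    def _capture(self, num_active: int):
        # Stand-in for CUDA graph capture: the LoRA count is baked into
        # the closure, as a real capture bakes it into kernel launches.
        def replay(num_tokens: int):
            return {"tokens": num_tokens, "loras_in_kernel": num_active}
        return replay

    def dispatch(self, num_active: int):
        # Capture lazily on first sight of this count, then reuse.
        if not 0 <= num_active <= self.max_loras:
            raise ValueError("active LoRA count out of range")
        if num_active not in self._graphs:
            self._graphs[num_active] = self._capture(num_active)
        return self._graphs[num_active]

cache = GraphCache(max_loras=8)
out = cache.dispatch(2)(16)  # kernels see 2 active LoRAs, not 8
```

The design point is the cache key: before this change a single graph captured at `max_loras` would pay the full LoRA kernel cost even when fewer adapters are active; keying captures by the active count trades extra capture memory for lower replay overhead.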