biondizzle/vllm
vllm/v1/attention/backends (at commit 4c31218f80e35c4d94097a792a15b7817381daf0)
Latest commit: 217db4baa6 [Bugfix][ROCm] Fix AITER MLA V1 (#17880)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-09 08:38:21 +00:00

Name             Last commit                                                                                  Last modified
mla              [Bugfix][ROCm] Fix AITER MLA V1 (#17880)                                                     2025-05-09 08:38:21 +00:00
__init__.py      [V1] Implement vLLM V1 [1/N] (#9289)                                                         2024-10-22 01:24:07 -07:00
flash_attn.py    [Core] Support full cuda graph in v1 (#16072)                                                2025-05-07 22:30:15 -07:00
flashinfer.py    [v1] AttentionMetadata for each layer (#17394)                                               2025-05-06 07:58:37 -07:00
pallas.py        [TPU] Fix the test_sampler (#17820)                                                          2025-05-08 05:51:33 -04:00
triton_attn.py   [Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (#16828)    2025-05-06 18:21:48 -04:00
utils.py         [v1] AttentionMetadata for each layer (#17394)                                               2025-05-06 07:58:37 -07:00