biondizzle / vllm
vllm/attention/ops/blocksparse_attention

Latest commit: 6ffa3f314c59e42238f1c5f923ff2839e0af9698 "[CI/Build] Avoid CUDA initialization (#8534)" by Cyrus Leung, 2024-09-18 10:38:11 +00:00
File                             Last commit                                                                                  Date
__init__.py                      [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
blocksparse_attention_kernel.py  [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
interface.py                     [CI/Build] Avoid CUDA initialization (#8534)                                                 2024-09-18 10:38:11 +00:00
utils.py                         [Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343)                          2024-07-12 10:47:17 +08:00
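
Note: the #4799 commit message indicates this package holds the block-sparse flash attention used by Phi-3-Small, which combines a local sliding window with vertical-stride key columns at block granularity. As a rough, self-contained sketch only (the function name and parameters below are hypothetical and are not vllm's actual interface), such a block mask can be expressed in plain PyTorch:

    import torch

    def blocksparse_block_mask(n_blocks: int, local_blocks: int, vert_stride: int) -> torch.Tensor:
        # Boolean mask over (query_block, key_block) pairs: a block is attended to
        # when it is causal (key block <= query block) and either falls inside the
        # local window or lies on a vertical-stride column.
        q = torch.arange(n_blocks).view(-1, 1)
        k = torch.arange(n_blocks).view(1, -1)
        causal = k <= q
        local = (q - k) < local_blocks
        vertical = (k + 1) % vert_stride == 0
        return causal & (local | vertical)

    # Example: 8 blocks, local window of 2 blocks, every 4th block column kept.
    print(blocksparse_block_mask(n_blocks=8, local_blocks=2, vert_stride=4).int())

The Triton kernel in blocksparse_attention_kernel.py presumably consumes a block-level pattern of this kind so that masked-out blocks are skipped entirely rather than computed and then zeroed.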