vllm/vllm/attention/backends at 309aaef8255fb832bf674c6ed7d9d84211629421

biondizzle/vllm

Cody Yu 309aaef825 [Bugfix] Fix decode tokens w. CUDA graph (#6757)

2024-07-24 22:33:56 -07:00

__init__.py

[Core] Refactor Attention Take 2 (#3462)

2024-03-25 04:39:33 +00:00

abstract.py

[Core] Modulize prepare input and attention metadata builder (#6596)

2024-07-23 00:45:24 +00:00

blocksparse_attn.py

[Core] Refactor _prepare_model_input_tensors - take 2 (#6164)

2024-07-17 09:37:16 -07:00

flash_attn.py

[Bugfix] Fix decode tokens w. CUDA graph (#6757)

2024-07-24 22:33:56 -07:00

flashinfer.py

[Bugfix] Fix decode tokens w. CUDA graph (#6757)

2024-07-24 22:33:56 -07:00

ipex_attn.py

[Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)

2024-07-16 15:31:32 -07:00

openvino.py

[Hardware][Intel] OpenVINO vLLM backend (#5379)

2024-06-28 13:50:16 +00:00

pallas.py

[Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)

2024-07-16 15:31:32 -07:00

rocm_flash_attn.py

[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543)

2024-07-20 09:39:07 -07:00

torch_sdpa.py

[Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)

2024-07-16 15:31:32 -07:00

utils.py

[Bugfix] Fix decode tokens w. CUDA graph (#6757)

2024-07-24 22:33:56 -07:00

xformers.py

[Core] Refactor _prepare_model_input_tensors - take 2 (#6164)

2024-07-17 09:37:16 -07:00