biondizzle/vllm
vllm/vllm/attention/ops at commit a4e2b268568b335d8fe37f8eaaa894cec3ba9397
Latest commit:
Wallas Henrique c27df94e1f [Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-11-25 12:23:32 -05:00
Name | Last commit | Date
blocksparse_attention | [Kernel] Explicitly specify other value in tl.load calls (#9014) | 2024-11-18 11:39:40 -08:00
__init__.py | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00
hpu_paged_attn.py | [Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143) | 2024-11-06 01:09:10 -08:00
ipex_attn.py | [Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355) | 2024-11-20 10:57:39 +00:00
paged_attn.py | Support Roberta embedding models (#9387) | 2024-11-14 21:23:29 +00:00
prefix_prefill.py | [Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850) | 2024-11-25 12:23:32 -05:00
triton_flash_attention.py | [ROCm][AMD][Bugfix] adding a missing triton autotune config (#4845) | 2024-05-16 10:46:52 -07:00