biondizzle/vllm
Path: vllm/vllm/attention/ops
Commit: 983a40a8bb2ef2a0ed9c5134d49358c38d6b03ae
Latest commit: d0a7a2769d [Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139) by Yu-Zhou, 2025-02-18 19:40:19 -08:00
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
File                          Last commit                                                                                                    Date
blocksparse_attention/        [Misc] Add SPDX-License-Identifier headers to python source files (#12628)                                    2025-02-02 11:58:18 -08:00
__init__.py                   [Core] Refactor Attention Take 2 (#3462)                                                                      2024-03-25 04:39:33 +00:00
hpu_paged_attn.py             [Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139)                                            2025-02-18 19:40:19 -08:00
ipex_attn.py                  [Misc] Add SPDX-License-Identifier headers to python source files (#12628)                                    2025-02-02 11:58:18 -08:00
nki_flash_attn.py             [Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921)  2025-02-11 21:12:37 -08:00
paged_attn.py                 [Misc] Add SPDX-License-Identifier headers to python source files (#12628)                                    2025-02-02 11:58:18 -08:00
prefix_prefill.py             [ROCm][V1] Add initial ROCm support to V1 (#12790)                                                            2025-02-13 22:21:50 -08:00
triton_decode_attention.py    [Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676)                                   2025-02-04 18:22:24 -08:00
triton_flash_attention.py     [Misc] Fix improper placement of SPDX header in scripts (#12694)                                              2025-02-03 11:16:59 -08:00
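
The modules listed above hold vLLM's backend-specific attention kernels (HPU, IPEX, Neuron NKI, ROCm/CUDA Triton), all built around a paged ("block-table") KV cache. As rough orientation only, the sketch below is a generic, hypothetical illustration of the block-table indexing idea behind paged attention; it is not vLLM's API, and every name in it (gather_keys, BLOCK_SIZE, etc.) is invented for this example.

```python
# Generic illustration (not the vLLM API): a paged KV cache maps a sequence's
# logical token positions to physical cache blocks through a per-sequence
# block table, so the cache for one sequence need not be contiguous in memory.
import numpy as np

BLOCK_SIZE = 4    # tokens per cache block (hypothetical value)
NUM_BLOCKS = 8    # physical blocks in the cache pool
HEAD_DIM = 2      # toy head dimension

# Physical key cache: [num_blocks, block_size, head_dim]
key_cache = np.random.rand(NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM).astype(np.float32)

# Block table for one sequence: logical block i lives in physical block block_table[i].
block_table = np.array([5, 2, 7])   # this sequence occupies 3 non-contiguous blocks
seq_len = 10                        # tokens actually written for this sequence

def gather_keys(key_cache, block_table, seq_len, block_size=BLOCK_SIZE):
    """Collect the keys of one sequence in logical token order."""
    keys = []
    for pos in range(seq_len):
        phys_block = block_table[pos // block_size]  # which physical block holds this token
        offset = pos % block_size                    # slot within that block
        keys.append(key_cache[phys_block, offset])
    return np.stack(keys)                            # [seq_len, head_dim]

keys = gather_keys(key_cache, block_table, seq_len)
print(keys.shape)  # (10, 2)
```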