Revert "[Kernel] Use flash-attn for decoding (#3648)" (#4820)

The LoRA 3 and 4 tests appear to fail with an illegal memory access since the reverted commit:

[2024-05-14 23:51:18,182 E 22 22] logging.cc:101: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered
Example: https://buildkite.com/vllm/ci/builds/7382#018f793d-1527-4e1c-ab59-c3a34ec55241

This reverts commit 1356df5.

SangBin Cho
2024-05-15 11:52:45 +09:00
committed by GitHub
parent 29bc01bf3b
commit 8a7cc254a0
6 changed files with 65 additions and 313 deletions

@@ -12,7 +12,7 @@ MODELS = [
     # "Deci/DeciLM-7b",  # Broken
     # "tiiuae/falcon-7b",  # Broken
     "EleutherAI/gpt-j-6b",
-    # "mosaicml/mpt-7b",  # Broken
+    "mosaicml/mpt-7b",
     # "Qwen/Qwen1.5-0.5B"  # Broken,
 ]