[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)

Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Author: Eric Xihui Lin
Date: 2024-05-25 01:00:52 -04:00
Committed by: GitHub
Parent commit: e64fde4b01
Commit: 8e192ff967
23 changed files with 2445 additions and 87 deletions
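For readers unfamiliar with the attention pattern the new kernel targets: blocksparse attention keeps only a subset of key blocks per query block, typically the most recent "local" blocks plus a vertically strided set of earlier blocks, which is the pattern family Phi-3-Small uses. Below is a minimal, illustrative sketch of such a block mask; the names (local_blocks, vert_stride) and the exact construction are assumptions for illustration, not this kernel's actual API.

import torch

def blocksparse_mask(n_blocks: int, local_blocks: int = 4,
                     vert_stride: int = 4) -> torch.Tensor:
    """Boolean (n_blocks, n_blocks) mask over attention blocks.

    A query block attends to (a) the `local_blocks` most recent key
    blocks and (b) every `vert_stride`-th earlier key block, all under
    a causal constraint. Illustrative sketch only, not the fused kernel.
    """
    q = torch.arange(n_blocks)[:, None]  # query block index, shape (n, 1)
    k = torch.arange(n_blocks)[None, :]  # key block index, shape (1, n)
    causal = k <= q                      # no attending to future blocks
    local = (q - k) < local_blocks       # recent blocks within the window
    strided = (k + 1) % vert_stride == 0  # periodic "vertical" columns
    return causal & (local | strided)

print(blocksparse_mask(8, local_blocks=2, vert_stride=3).int())

The fused kernel in this PR computes attention directly over the kept blocks rather than materializing a dense mask like this; the sketch only shows which blocks survive.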


@@ -63,4 +63,4 @@ def get_hf_text_config(config: PretrainedConfig):
         assert hasattr(config.text_config, "num_attention_heads")
         return config.text_config
     else:
-        return config
+        return config
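For context, the hunk above is the tail of get_hf_text_config, vLLM's helper for pulling the text sub-config out of a multimodal Hugging Face config. A sketch of the whole helper, reconstructed from the hunk header and context lines (the docstring and comment wording are assumed):

from transformers import PretrainedConfig

def get_hf_text_config(config: PretrainedConfig):
    """Return the text sub-config of a multimodal model config;
    a no-op for pure text models."""
    if hasattr(config, "text_config"):
        # Downstream code assumes the text config exposes attention
        # metadata such as `num_attention_heads`; verify that here.
        assert hasattr(config.text_config, "num_attention_heads")
        return config.text_config
    else:
        return config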