vllm/vllm/platforms at cb528d0585c0a2a876dfc3813c7fe6092a2549ae

biondizzle/vllm

Tao He 60f7624334 Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)

2025-05-12 19:52:47 -07:00

__init__.py

Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)

2025-05-07 00:07:30 -07:00

cpu.py

[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config (#17265)

2025-05-09 17:16:12 +00:00

cuda.py

Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)

2025-05-12 19:52:47 -07:00

hpu.py

[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)

2025-04-11 07:38:36 -07:00

interface.py

Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)

2025-05-12 19:52:47 -07:00

neuron.py

Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)

2025-05-07 00:07:30 -07:00

rocm.py

[Core] Use platform-agnostic device control for DP engine core (#17245)

2025-05-12 12:09:16 -07:00

tpu.py

Improve configs - the rest! (#17562)

2025-05-09 15:18:44 -07:00

xpu.py

[Hardware] add platform-specific request validation api (#16291)

2025-04-09 12:50:01 -07:00