biondizzle/vllm
vllm/vllm/model_executor at commit 289fc48ab73fb1eb610a72b4ddde9694e529bfba

Latest commit: Netanel Haber, "Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py" (#35653), 2026-03-04 08:43:13 -08:00
| Name | Last commit | Commit date |
| --- | --- | --- |
| `kernels` | [CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes (#34890) | 2026-02-26 05:00:10 +00:00 |
| `layers` | [Core] Add All-to-All communication backend for DCP (#34883) | 2026-03-04 10:01:57 -05:00 |
| `model_loader` | [Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681) | 2026-03-04 09:49:47 +00:00 |
| `models` | Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py (#35653) | 2026-03-04 08:43:13 -08:00 |
| `offloader` | [offloader] v2: Hide weight onloading latency via prefetching (#29941) | 2026-02-25 17:20:59 -08:00 |
| `warmup` | [MoE Refactor] Create MK for TRTLLM Kernels (#32564) | 2026-03-03 10:39:50 -08:00 |
| `__init__.py` | [Platform] Deprecate seed_everything (#31659) | 2026-01-04 18:34:04 -08:00 |
| `custom_op.py` | [Model Bash][DSR1] Add selective dynamic shape marking for CustomOp (#34900) | 2026-02-21 19:28:01 -05:00 |
| `parameter.py` | [QeRL] Layerwise Reloading (#32133) | 2026-01-30 08:50:05 -07:00 |
| `utils.py` | [BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262) | 2026-01-29 16:52:11 +08:00 |
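A few of the commits above describe concrete code-level changes that are easier to picture with a short sketch. The `model_loader` commit swaps the CUDA-specific cache release call for PyTorch's device-agnostic one. A minimal sketch of that pattern, assuming a PyTorch build that provides `torch.accelerator.empty_cache()`; the fallback logic here is illustrative, not vLLM's actual code:

```python
import torch


def clear_device_cache() -> None:
    # Newer PyTorch exposes a device-agnostic hook that covers CUDA, XPU, etc.
    if hasattr(torch, "accelerator") and hasattr(torch.accelerator, "empty_cache"):
        torch.accelerator.empty_cache()
    elif torch.cuda.is_available():
        # Older builds only offer the CUDA-specific call.
        torch.cuda.empty_cache()
```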
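The `models` commit (also the directory's latest) routes the attention in radio.py through MMEncoderAttention, i.e. a FlashAttention-backed path, instead of torch.sdpa. A rough, generic sketch of what such a swap looks like; MMEncoderAttention itself is vLLM-internal and its API is not reproduced here:

```python
import torch
import torch.nn.functional as F


def attn_sdpa(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    return F.scaled_dot_product_attention(q, k, v)


def attn_flash(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # flash-attn expects (batch, seq_len, num_heads, head_dim), fp16/bf16, on GPU.
    from flash_attn import flash_attn_func  # optional dependency

    out = flash_attn_func(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2))
    return out.transpose(1, 2)
```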
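The `offloader` v2 commit hides weight onloading latency by prefetching the next layer's weights while the current layer computes. A conceptual sketch of that overlap using a side CUDA stream; `cpu_weights`, `onload`, and the matmul are placeholders for vLLM's real offloader machinery, not its API:

```python
import torch


def forward_with_prefetch(cpu_weights: list[torch.Tensor], x: torch.Tensor) -> torch.Tensor:
    # x is assumed to already live on the GPU.
    copy_stream = torch.cuda.Stream()

    def onload(w: torch.Tensor) -> torch.Tensor:
        # Pinned host memory allows an asynchronous host-to-device copy.
        return w.pin_memory().to("cuda", non_blocking=True)

    with torch.cuda.stream(copy_stream):
        nxt = onload(cpu_weights[0])

    for i in range(len(cpu_weights)):
        # Make sure the prefetched weights have actually arrived before using them.
        torch.cuda.current_stream().wait_stream(copy_stream)
        w = nxt
        if i + 1 < len(cpu_weights):
            with torch.cuda.stream(copy_stream):
                nxt = onload(cpu_weights[i + 1])  # overlaps with the matmul below
        x = x @ w  # placeholder for the layer's real computation
    return x
```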
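The `__init__.py` entry deprecates seed_everything. Helpers by that name conventionally seed every RNG in one call, roughly as below; this is a generic illustration, not vLLM's exact implementation or its replacement:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and, on current PyTorch, CUDA generators
```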
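The `custom_op.py` commit adds selective dynamic-shape marking for CustomOp. In stock PyTorch, marking a tensor dimension as dynamic for torch.compile looks roughly like this; how vLLM selects which dimensions to mark is not shown:

```python
import torch


@torch.compile
def scale(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0


x = torch.randn(8, 16)
# Tell the compiler that dim 0 (e.g. the token count) varies between calls,
# so it traces with a symbolic size instead of recompiling per shape.
torch._dynamo.mark_dynamic(x, 0)
scale(x)
```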