vllm/vllm/model_executor at 5bec0b0ba3d313ac8b36e29517bdc938417d80ca - vllm


Wei Zhao 6da1310f91 [Bug] Fix TRTLLM Block FP8 MoE Monolithic (#36296)

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
(cherry picked from commit 84e436ed1c)

2026-03-10 19:08:18 -07:00

kernels

[CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes (#34890)

2026-02-26 05:00:10 +00:00

layers

[Bug] Fix TRTLLM Block FP8 MoE Monolithic (#36296)

2026-03-10 19:08:18 -07:00

model_loader

add mixed precision support for modelopt (#35047)

2026-02-26 21:56:24 +00:00

models

[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994)

2026-03-06 13:03:40 -08:00

offloader

[offloader] v2: Hide weight onloading latency via prefetching (#29941)

2026-02-25 17:20:59 -08:00

warmup

[MoE Refactor] Create MK for TRTLLM Kernels (#32564)

2026-03-03 10:39:50 -08:00

__init__.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00

custom_op.py

[Model Bash][DSR1] Add selective dynamic shape marking for CustomOp (#34900)

2026-02-21 19:28:01 -05:00

parameter.py

[QeRL] Layerwise Reloading (#32133)

2026-01-30 08:50:05 -07:00

utils.py

[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)

2026-01-29 16:52:11 +08:00