vllm/vllm/model_executor at 8332078cfdbd5e44e527893b695e79052d008172 - vllm

Files

Benjamin Chislett 8332078cfd [Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315)

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

2026-04-08 20:36:33 -04:00

kernels

[ROCm][Quantization] Add asymmetric INT8 quantization support to TritonInt8ScaledMMLinearKernel (#38501)

2026-04-06 09:42:10 +08:00

layers

[Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315)

2026-04-08 20:36:33 -04:00

model_loader

[Frontend] new online quantization frontend (#38138)

2026-04-03 11:58:39 -04:00

models

[Bugfix]Fix EP precision for Qwen3.5, Qwen3-Next (#39181)

2026-04-09 01:47:48 +04:00

offloader

Bugfix for offloading+prefetch for GLM-4.7-FP8 (#37178)

2026-03-17 21:22:09 +08:00

warmup

[MoE] Move DEEP_GEMM into experts/ subdirectory (#39005)

2026-04-08 19:23:08 +00:00

__init__.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00

custom_op.py

Add ability to replace oot ops when using lora (#37181)

2026-03-16 18:04:15 -07:00

parameter.py

[Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) (#37904)

2026-03-24 17:14:01 +00:00

utils.py

[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)

2026-01-29 16:52:11 +08:00