vllm/vllm/model_executor at 4e2ab1861d17d6df3d0220f7bb82df7864bbd218

biondizzle/vllm

rasmith 78434b923c [CI][AMD][BugFix][Kernel] Cast induction variable to int64 on MI350 for chunk_gated_delta_rule_fwd_kernel_h_blockdim64 to avoid illegal memory access (#39087)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

2026-04-08 16:57:18 +08:00

[ROCm][Quantization] Add asymmetric INT8 quantization support to TritonInt8ScaledMMLinearKernel (#38501)

2026-04-06 09:42:10 +08:00

[CI][AMD][BugFix][Kernel] Cast induction variable to int64 on MI350 for chunk_gated_delta_rule_fwd_kernel_h_blockdim64 to avoid illegal memory access (#39087)

2026-04-08 16:57:18 +08:00

[Frontend] new online quantization frontend (#38138)

2026-04-03 11:58:39 -04:00

[Attention][V0 Deprecation] Deprecate accept output buffer (#39125)

2026-04-07 17:14:58 -04:00

Bugfix for offloading+prefetch for GLM-4.7-FP8 (#37178)

2026-03-17 21:22:09 +08:00

[Perf] Change Trtllm fp8 MoE to use Shuffled Weights and BlockMajorK Layout (#38993)

2026-04-05 10:54:31 -04:00

__init__.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00

custom_op.py

Add ability to replace oot ops when using lora (#37181)

2026-03-16 18:04:15 -07:00

parameter.py

[Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) (#37904)

2026-03-24 17:14:01 +00:00

utils.py

[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)

2026-01-29 16:52:11 +08:00