vllm/vllm/model_executor at d0532bf38da5c8f4758e34e53a3708be0955d2db

biondizzle/vllm

Xin Yang d0532bf38d [Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683)

Signed-off-by: Xin Yang <xyangx@amazon.com>

2026-03-20 11:28:41 -06:00

[Bugfix] Fix ConchLinearKernel channelwise quantization (group_size=-1) (#37329)

2026-03-20 10:32:21 -05:00

[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683)

2026-03-20 11:28:41 -06:00

[BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync example for dpep (#37334)

2026-03-19 00:45:10 +00:00

[Pixtral] Enable Pixtral language model support Eagle3 (#37182)

2026-03-20 15:50:15 +00:00

Bugfix for offloading+prefetch for GLM-4.7-FP8 (#37178)

2026-03-17 21:22:09 +08:00

[Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM installed (#37358)

2026-03-19 17:58:33 +00:00

__init__.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00

custom_op.py

Add ability to replace oot ops when using lora (#37181)

2026-03-16 18:04:15 -07:00

parameter.py

[QeRL] Layerwise Reloading (#32133)

2026-01-30 08:50:05 -07:00

utils.py

[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)

2026-01-29 16:52:11 +08:00