vllm/vllm/model_executor at 6fa6e7ef0c9b485e8a684211e96691731aad6faa - vllm

Files

Roberto L. Castro 8ef50d9a6b [Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885)

Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>

2026-01-13 15:22:53 -08:00

layers

[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885)

2026-01-13 15:22:53 -08:00

model_loader

[FixBug] Improve exception string in tensorizer.py (#31680)

2026-01-11 05:01:53 -08:00

models

[6/N][Attention] Move utils to more appropriate locations (#32215)

2026-01-13 05:38:52 -08:00

warmup

[UX] Reduce DeepGEMM warmup log output to single progress bar (#30903)

2025-12-17 20:21:51 -08:00

__init__.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00

custom_op.py

[Doc] Add developer guide for CustomOp (#30886)

2026-01-09 16:21:11 +00:00

parameter.py

[Docs] Replace rst style double-backtick with md single-backtick (#27091)

2025-10-17 02:47:34 -07:00

utils.py

[Platform] Deprecate seed_everything (#31659)

2026-01-04 18:34:04 -08:00