vllm/tests/quantization at 2f186635cbcb38fd85e718a5b7ff9ec698cbb4f8 - vllm

Files

Linda 275e0d2a99 [NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)

Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>

2026-02-11 12:38:11 +00:00

__init__.py

[CI/Build] Move test_utils.py to tests/utils.py (#4425)

2024-05-13 23:50:09 +09:00

fp_quant.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

reference_mxfp4.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_auto_round.py

Consolidate Intel Quantization Toolkit Integration in vLLM (#31716)

2026-01-14 07:11:30 +00:00

test_blackwell_moe.py

[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)

2026-02-11 12:38:11 +00:00

test_compressed_tensors.py

Refactor NVFP4 Linear utils for ModelOpt and CT (#33201)

2026-01-30 16:37:42 -08:00

test_configs.py

[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)

2026-01-10 23:19:46 -08:00

test_cpu_offload.py

[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)

2026-01-10 23:19:46 -08:00

test_cpu_wna16.py

[XPU][9/N] clean up existing ipex code/doc (#34111)

2026-02-11 00:27:15 -08:00

test_experts_int8.py

[Quantization] Deprecate Long Tail of Schemes (#31688)

2026-01-08 19:07:45 -05:00

test_fp8.py

fix memory for online fp8 quantization with streaming weight load (#31914)

2026-02-02 14:17:42 -05:00

test_gptq_dynamic.py

[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)

2026-01-10 23:19:46 -08:00

test_gptq_v2.py

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00

test_lm_head.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_mixed_precision.py

[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239)

2025-11-11 12:05:22 -05:00

test_modelopt.py

[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957)

2025-12-21 22:34:49 -05:00

test_ptpc_fp8.py

[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)

2026-01-10 23:19:46 -08:00

test_quark.py

[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)

2025-12-10 19:59:35 -08:00

test_register_quantization_config.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_torchao.py

[QeRL] Layerwise Reloading (#32133)

2026-01-30 08:50:05 -07:00

utils.py

[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)

2026-01-10 23:19:46 -08:00