vllm/tests/compile/fusions_e2e at 490f17d0c7e21c3a4cd4c8e13b0e40840fc754c1 - vllm

Carl Y 1f5ec2889c [mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)

Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-02 21:16:11 -04:00

__init__.py

[CI][torch.compile] Reduce e2e fusion test time (#33293)

2026-02-04 19:09:03 -05:00

common.py

[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops (#34307)

2026-03-03 06:24:21 -08:00

conftest.py

[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)

2026-04-02 21:16:11 -04:00

models.py

[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)

2026-04-02 21:16:11 -04:00

test_tp1_quant.py

[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)

2026-04-02 21:16:11 -04:00

test_tp2_ar_rms.py

[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)

2026-04-02 21:16:11 -04:00

test_tp2_async_tp.py

[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)

2026-03-31 22:15:05 -04:00