Carl Y
|
1f5ec2889c
|
[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 21:16:11 -04:00 |
|
Monishver
|
c09ad767cd
|
Feature/silu block quant fusion v1 (#32996)
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
|
2026-04-01 18:50:43 +00:00 |
|
Luka Govedič
|
40bb175027
|
[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
|
2026-03-31 22:15:05 -04:00 |
|
BadrBasowid
|
077a9a8e37
|
[torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-31 14:15:50 -04:00 |
|
Yongye Zhu
|
87bd91892f
|
[MoE Refactor] Mxfp4 oracle rebased (#37128)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-21 03:37:04 +00:00 |
|
elvischenv
|
296839a1b0
|
[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-03-18 15:01:26 +00:00 |
|
Luka Govedič
|
9556af87d5
|
[torch.compile] Add support for non-contiguous fused RMSNorm + group quant (#36551)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
|
2026-03-11 10:56:55 -07:00 |
|
Copilot
|
4b87ffbefb
|
[torch.compile] Rename compile_ranges_split_points to compile_ranges_endpoints (#36027)
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-09 18:04:40 +00:00 |
|
TJian
|
fb7fdc49c4
|
[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops (#34307)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-03 06:24:21 -08:00 |
|
Jason Li
|
9d37941017
|
[torch.compile] Sequence Parallelism threshold compile ranges (#28672)
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
Signed-off-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-26 05:00:12 +00:00 |
|
Michael Goin
|
a4bd661fb3
|
[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default (#34924)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-23 17:34:41 -08:00 |
|
ElizaWszola
|
a88b3be7c4
|
[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-17 23:35:04 -08:00 |
|
Ikenna
|
906077181b
|
[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967)
Signed-off-by: Ikenna <ikennachifo@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-07 02:27:33 +00:00 |
|
Luka Govedič
|
ac32e66cf9
|
[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-06 04:19:49 -08:00 |
|
Luka Govedič
|
4d9513537d
|
[CI][torch.compile] Reduce e2e fusion test time (#33293)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-04 19:09:03 -05:00 |
|