Didier Durand
|
4979eb79da
|
[Doc]: fix typos in various files (#24821)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-15 01:08:52 -07:00 |
|
Wentao Ye
|
fc2dbcda8b
|
[Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (#24783)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-09-14 11:20:17 -04:00 |
|
Matthew Bonanni
|
7ba32aa60b
|
[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-12 15:45:53 -06:00 |
|
Hyogeun Oh (오효근)
|
41f17bf290
|
[Docs] Fix warnings in mkdocs build (continued) (#24740)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-09-12 06:43:15 -07:00 |
|
Wentao Ye
|
fcba05c435
|
[Bug] Fix Layer weight_block_size Assertion Issue (#24674)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-11 19:47:59 -04:00 |
|
Michael Goin
|
c3aea10dc8
|
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-11 15:43:14 -07:00 |
|
Duncan Moss
|
074854b24f
|
[Kernel][B200] mxfp4 fused cutlass moe (#23696)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-11 17:04:56 -04:00 |
|
Wentao Ye
|
a892b259b4
|
[Doc] Remove Useless Comments (#24687)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-11 12:25:47 -07:00 |
|
Jerry Zhang
|
2048c4e379
|
[torchao] Support quantization configs using module swap (#21982)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-09-10 23:53:24 -07:00 |
|
Gregory Shtrasberg
|
9a161307f5
|
[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-10 13:59:55 -07:00 |
|
tomeras91
|
08abfa78ec
|
[Bugfix] fix modelopt exclude_modules name mapping (#24178)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-10 10:20:46 -07:00 |
|
Hyogeun Oh (오효근)
|
ccee371e86
|
[Docs] Fix warnings in mkdocs build (continued) (#24092)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-10 06:23:28 -07:00 |
|
RoadToNowhereX
|
c0bd6a684a
|
Fix Auto_Round Quatization Loading on SM75 and Lower GPUs (#24217)
Signed-off-by: RoadToNowhereX <37441177+RoadToNowhereX@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-10 06:22:31 -07:00 |
|
Wei
|
0efdb5c3ba
|
[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading (#24154)
Signed-off-by: Wei Wei <wwei6@meta.com>
|
2025-09-10 04:27:53 +00:00 |
|
Didier Durand
|
f4962a6d55
|
[Doc]: fix typos in Python comments (#24417)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-08 00:22:16 -07:00 |
|
Woosuk Kwon
|
4172235ab7
|
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-06 16:15:18 -07:00 |
|
Isotr0py
|
00a4e56d8d
|
[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-06 09:23:12 -07:00 |
|
Didier Durand
|
35bf193864
|
[Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-05 19:41:12 -07:00 |
|
Didier Durand
|
83609ca91d
|
[Doc]: fix typos in Python comments (#24173)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-04 08:52:17 -07:00 |
|
nvjullin
|
37241077d5
|
[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 09:25:40 -04:00 |
|
bnellnm
|
e9b92dcd89
|
[Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-03 12:35:18 -04:00 |
|
Didier Durand
|
02d411fdb2
|
[Doc]: fix typos in Python comments (#24115)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:14:07 -07:00 |
|
Didier Durand
|
d7e1e59972
|
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:05:45 -07:00 |
|
co63oc
|
1bd007f234
|
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-02 20:44:50 -07:00 |
|
Kyle Sayers
|
1c41310584
|
[Bugfix] Fix transform_config parsing in Compressed Tensors (#23945)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-09-02 13:54:10 -04:00 |
|
Yan Ma
|
7be0cb8e9e
|
[XPU][Feature] fp8 online quantization support for XPU (#23148)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
|
2025-09-02 04:06:53 +00:00 |
|
Didier Durand
|
0235103cbb
|
[Doc]: fix typos in Python comments (#24042)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-01 19:07:45 -07:00 |
|
Jun-Howie
|
acc1a6e10a
|
Fix the bug related to loading GPTP INT3 weights. (#23328)
Signed-off-by: JunHowie <JunHowie@aliyun.com>
Co-authored-by: JunHowie <JunHowie@aliyun.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-01 05:39:57 +00:00 |
|
JartX
|
183a70967a
|
[BUGFIX] GPTQ quantization compatibility for Qwen3 MOE models (AutoGPTQ and AutoRound-GPTQ) (#23994)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-01 03:33:40 +00:00 |
|
Xin Yang
|
8fb85b7bb6
|
Add routed_scaling_factor to MoE grouped topk (#23123)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-29 21:36:48 -07:00 |
|
Michael Goin
|
b7adf94c4a
|
Tuned H100/H200 triton fp8 block configs for fused_qkv_a_proj (#23939)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-29 10:28:35 -07:00 |
|
elvischenv
|
16a45b3a28
|
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
|
2025-08-28 19:36:50 +00:00 |
|
Po-Han Huang (NVIDIA)
|
95089607fa
|
[Model][gpt-oss] Support DP+EP for GPT-OSS with FlashInfer trtllm-gen MoE (#23819)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-08-28 06:56:20 -07:00 |
|
Didier Durand
|
d3da2eea54
|
[Doc]: fix typos in Python scripts (#23828)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-28 05:37:38 -07:00 |
|
JartX
|
3462c1c522
|
[FIXBUG] Add return_success parameter to moe_wna16_weight_loader function (#22797)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-28 09:03:22 +00:00 |
|
Kyle Sayers
|
22feac8e95
|
[Transform] [Quantization] Add transforms to compressed tensors (#22486)
|
2025-08-28 02:43:48 -04:00 |
|
Michael Goin
|
a781e84ec2
|
[Perf] Tune configs for triton block fp8 gemm H100/H200 (#23748)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-28 11:12:53 +08:00 |
|
Michael Goin
|
f9ca2b40a0
|
[Bugfix] Fix Marlin NVFP4 for modelopt (#23659)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-27 17:48:16 -04:00 |
|
Yongye Zhu
|
082cc07ef8
|
DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100 (#23608)
|
2025-08-27 17:33:21 -04:00 |
|
Wentao Ye
|
3af47c3cc6
|
[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-08-27 14:09:08 +00:00 |
|
Dipika Sikka
|
d272415e57
|
[Quantization] Expand compressed-tensors MoE matching logic to support NFP4 + FP8 MoEs (#22674)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
|
2025-08-27 05:00:21 +00:00 |
|
Michael Goin
|
de02b07db4
|
[Bugfix] Lazy import gpt_oss_triton_kernels_moe for mxfp4 (#23678)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-27 09:34:57 +08:00 |
|
czhu-cohere
|
2c2b140ae8
|
[quantization] use channel scales for w4a8 + misc fixes (#23570)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-08-26 18:23:23 -07:00 |
|
nvjullin
|
f66673a39d
|
[Kernel] Added flashinfer fp8 per-tensor gemms (#22895)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-26 06:54:04 -07:00 |
|
Michael Goin
|
d52358c1e0
|
[Perf] Remove duplicated NVFP4 blockscales to save memory (#23379)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-26 19:16:33 +08:00 |
|
weiliang
|
ae067888d6
|
Update Flashinfer to 0.2.14.post1 (#23537)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-25 18:30:44 -07:00 |
|
Ming Yang
|
504d914314
|
[Perf] Add Triton config for DeepSeek V3 FP8 EP32 H200 (#23504)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-08-24 18:06:35 -07:00 |
|
czhu-cohere
|
e76e233540
|
[kernel] Support W4A8 on Hopper (#23198)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-08-24 06:18:04 +00:00 |
|
Daifeng Li
|
fa78de9dc3
|
Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527)
Signed-off-by: feng <fengli1702@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-22 20:53:21 -06:00 |
|
elvischenv
|
24d0c9e6ed
|
[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-22 22:09:05 +00:00 |
|