biondizzle/vllm

Author	SHA1	Message	Date
Didier Durand	4979eb79da	[Doc]: fix typos in various files (#24821 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-15 01:08:52 -07:00
Wentao Ye	fc2dbcda8b	[Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (#24783 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-14 11:20:17 -04:00
Matthew Bonanni	7ba32aa60b	[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705 ) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-12 15:45:53 -06:00
Hyogeun Oh (오효근)	41f17bf290	[Docs] Fix warnings in mkdocs build (continued) (#24740 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-09-12 06:43:15 -07:00
Wentao Ye	fcba05c435	[Bug] Fix Layer `weight_block_size` Assertion Issue (#24674 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-11 19:47:59 -04:00
Michael Goin	c3aea10dc8	[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-11 15:43:14 -07:00
Duncan Moss	074854b24f	[Kernel][B200] `mxfp4` fused cutlass moe (#23696 ) Signed-off-by: Duncan Moss <djm.moss@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-11 17:04:56 -04:00
Wentao Ye	a892b259b4	[Doc] Remove Useless Comments (#24687 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-11 12:25:47 -07:00
Jerry Zhang	2048c4e379	[torchao] Support quantization configs using module swap (#21982 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-09-10 23:53:24 -07:00
Gregory Shtrasberg	9a161307f5	[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-10 13:59:55 -07:00
tomeras91	08abfa78ec	[Bugfix] fix modelopt exclude_modules name mapping (#24178 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-10 10:20:46 -07:00
Hyogeun Oh (오효근)	ccee371e86	[Docs] Fix warnings in `mkdocs build` (continued) (#24092 ) Signed-off-by: Zerohertz <ohg3417@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-10 06:23:28 -07:00
RoadToNowhereX	c0bd6a684a	Fix Auto_Round Quatization Loading on SM75 and Lower GPUs (#24217 ) Signed-off-by: RoadToNowhereX <37441177+RoadToNowhereX@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-10 06:22:31 -07:00
Wei	0efdb5c3ba	[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading (#24154 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-09-10 04:27:53 +00:00
Didier Durand	f4962a6d55	[Doc]: fix typos in Python comments (#24417 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-08 00:22:16 -07:00
Woosuk Kwon	4172235ab7	[V0 deprecation] Deprecate V0 Neuron backend (#21159 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-06 16:15:18 -07:00
Isotr0py	00a4e56d8d	[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-06 09:23:12 -07:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
Didier Durand	83609ca91d	[Doc]: fix typos in Python comments (#24173 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-04 08:52:17 -07:00
nvjullin	37241077d5	[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 09:25:40 -04:00
bnellnm	e9b92dcd89	[Kernels] Overlap shared experts with send/recv (#23273 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-03 12:35:18 -04:00
Didier Durand	02d411fdb2	[Doc]: fix typos in Python comments (#24115 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 21:14:07 -07:00
Didier Durand	d7e1e59972	[Doc]: fix typos in Python comments (#24093 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 21:05:45 -07:00
co63oc	1bd007f234	fix some typos (#24071 ) Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-09-02 20:44:50 -07:00
Kyle Sayers	1c41310584	[Bugfix] Fix transform_config parsing in Compressed Tensors (#23945 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-09-02 13:54:10 -04:00
Yan Ma	7be0cb8e9e	[XPU][Feature] fp8 online quantization support for XPU (#23148 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>	2025-09-02 04:06:53 +00:00
Didier Durand	0235103cbb	[Doc]: fix typos in Python comments (#24042 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-01 19:07:45 -07:00
Jun-Howie	acc1a6e10a	Fix the bug related to loading GPTP INT3 weights. (#23328 ) Signed-off-by: JunHowie <JunHowie@aliyun.com> Co-authored-by: JunHowie <JunHowie@aliyun.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-01 05:39:57 +00:00
JartX	183a70967a	[BUGFIX] GPTQ quantization compatibility for Qwen3 MOE models (AutoGPTQ and AutoRound-GPTQ) (#23994 ) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-01 03:33:40 +00:00
Xin Yang	8fb85b7bb6	Add routed_scaling_factor to MoE grouped topk (#23123 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-29 21:36:48 -07:00
Michael Goin	b7adf94c4a	Tuned H100/H200 triton fp8 block configs for fused_qkv_a_proj (#23939 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-29 10:28:35 -07:00
elvischenv	16a45b3a28	[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671 ) Signed-off-by: jindih <jindih@nvidia.com> Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: jindih <jindih@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedic <lgovedic@redhat.com>	2025-08-28 19:36:50 +00:00
Po-Han Huang (NVIDIA)	95089607fa	[Model][gpt-oss] Support DP+EP for GPT-OSS with FlashInfer trtllm-gen MoE (#23819 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-08-28 06:56:20 -07:00
Didier Durand	d3da2eea54	[Doc]: fix typos in Python scripts (#23828 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-28 05:37:38 -07:00
JartX	3462c1c522	[FIXBUG] Add return_success parameter to moe_wna16_weight_loader function (#22797 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-28 09:03:22 +00:00
Kyle Sayers	22feac8e95	[Transform] [Quantization] Add transforms to compressed tensors (#22486 )	2025-08-28 02:43:48 -04:00
Michael Goin	a781e84ec2	[Perf] Tune configs for triton block fp8 gemm H100/H200 (#23748 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-28 11:12:53 +08:00
Michael Goin	f9ca2b40a0	[Bugfix] Fix Marlin NVFP4 for modelopt (#23659 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-27 17:48:16 -04:00
Yongye Zhu	082cc07ef8	DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100 (#23608 )	2025-08-27 17:33:21 -04:00
Wentao Ye	3af47c3cc6	[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-08-27 14:09:08 +00:00
Dipika Sikka	d272415e57	[Quantization] Expand compressed-tensors MoE matching logic to support NFP4 + FP8 MoEs (#22674 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-08-27 05:00:21 +00:00
Michael Goin	de02b07db4	[Bugfix] Lazy import gpt_oss_triton_kernels_moe for mxfp4 (#23678 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-27 09:34:57 +08:00
czhu-cohere	2c2b140ae8	[quantization] use channel scales for w4a8 + misc fixes (#23570 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-08-26 18:23:23 -07:00
nvjullin	f66673a39d	[Kernel] Added flashinfer fp8 per-tensor gemms (#22895 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-26 06:54:04 -07:00
Michael Goin	d52358c1e0	[Perf] Remove duplicated NVFP4 blockscales to save memory (#23379 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-26 19:16:33 +08:00
weiliang	ae067888d6	Update Flashinfer to 0.2.14.post1 (#23537 ) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Signed-off-by: siyuanf <siyuanf@nvidia.com> Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-25 18:30:44 -07:00
Ming Yang	504d914314	[Perf] Add Triton config for DeepSeek V3 FP8 EP32 H200 (#23504 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-08-24 18:06:35 -07:00
czhu-cohere	e76e233540	[kernel] Support W4A8 on Hopper (#23198 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-08-24 06:18:04 +00:00
Daifeng Li	fa78de9dc3	Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527 ) Signed-off-by: feng <fengli1702@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-22 20:53:21 -06:00
elvischenv	24d0c9e6ed	[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-08-22 22:09:05 +00:00

524 Commits