biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Andika Rachman	5e034f2e3d	[cpu][bench] Add Fused MoE Micro Benchmark for CPU Backend (#32092 ) Signed-off-by: andikarachman <andika.rachman.y@gmail.com>	2026-01-12 10:03:28 +00:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Robert Shaw	9f6dcb71ae	[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-01-08 03:46:27 +00:00
Robert Shaw	5dcd7ef1f2	[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415 )	2026-01-07 19:42:33 -05:00
Cyrus Leung	db318326a5	[Misc] Use `deprecated` for `seed_everything` (#31780 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-06 11:29:55 +00:00
Fadi Arafeh	799b5721f6	[cpu][bench] Add CPU paged attention benchmarks (#31720 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-06 10:57:57 +00:00
Michael Goin	e1cd7a5faf	[Bugfix] Add init_workspace_manager to moe kernel benchmarks (#31042 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-05 19:14:33 -08:00
Alfred	a0e9ee83c7	[Benchmark] Fix OOM during MoE kernel tuning for large models (#31604 ) Signed-off-by: Alfred <massif0601@gmail.com>	2026-01-02 22:24:51 +00:00
Amir Samani	030fc44914	use the same stream for cuda graph capture and replay for NCCL (#29207 ) Signed-off-by: Amir Samani <asamani@nvidia.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-25 19:10:03 +08:00
Michael Goin	06d490282f	[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size (#30897 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-21 09:41:57 -08:00
Michael Goin	0a1ab1e565	[Perf][Kernels] Vectorize `csrc/activations_kernels.cu` (#29512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-16 14:56:02 -08:00
Harry Mellor	cf3eacfe58	Standardise `get_rope` to use `rope_parameters["partial_rotary_factor"]`, not `rotary_dim` (#30389 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-11 20:45:23 +00:00
Ming Yang	fba8906930	[perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill (#29710 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-11 08:20:45 +00:00
Jinzhen Lin	879ddb09c3	[Kernel][MoE] optimize `moe_align_block_size` (#29642 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 01:58:47 -08:00
Varun Sundar Rabindranath	19bee6d12d	[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 18:04:59 +00:00
Jinzhen Lin	1656ad3704	[Kernel][Quantization] add w4a8 support for marlin kernel (#24722 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-11-29 07:19:33 -08:00
Didier Durand	eca7a8fb59	[Doc]: fix typos in various files (#29230 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-24 11:10:48 +00:00
Harry Mellor	a8b70304d6	Update `rope_scaling` to `rope_parameters` in preparation for Transformers v5 (#28542 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 09:06:36 -08:00
Kunshang Ji	2a2d5d2780	Replace `torch.cuda.Event` with `torch.Event` for better hardware compatibility (#26985 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-18 11:34:36 -08:00
Michael Goin	f9a4087182	Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-11 11:46:04 -05:00
Xin Yang	57201a6a4c	Fix rotary embedding benchmark script (#28323 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-11-10 21:57:12 -05:00
Ilya Markov	d17ecc6b19	[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-10 18:33:11 -05:00
Xiake Sun	03fa4d3fb3	[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373 ) Signed-off-by: Xiake Sun <xiake.sun@amd.com> Signed-off-by: Xiake Sun <xisun@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-10 04:53:40 +00:00
Michael Goin	f32229293e	Disable nm-testing models with issues in CI (#28206 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-06 06:19:07 -08:00
tomeras91	e4ee658672	[Model] add optimal triton fused moe configs for NemotronH MoE (#27967 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-11-04 12:59:43 +00:00
yugong333	2ec401bc39	Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-04 18:27:35 +08:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188 ) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Isotr0py	3125d79950	[Chore] Remove unused `PolyNorm` layer (#27110 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-17 19:03:43 +00:00
wangxiyuan	8f4b313c37	[Misc] rename torch_dtype to dtype (#26695 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-15 12:11:48 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Roberto L. Castro	96ad65b7fe	[Transform] [Quantization] Add QuTLASS support to vLLM (#24440 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: Andrei Panferov <andrei@panferov.org> Co-authored-by: Andrei Panferov <andrei@panferov.org> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-10 09:43:40 -07:00
Elvir Crnčević	7b03584de8	Silu v2 (#25074 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: elvircrn <elvircrn@gmail.com> Signed-off-by: Elvir Crnčević <elvircrn@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>	2025-10-10 15:19:53 +00:00
Lukas Geiger	6273fe8d3d	[Benchmarks] Fix imports in FP8 tuning script (#26407 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-08 16:31:59 +00:00
Lukas Geiger	338b1bf04f	[Benchmarks] Add support for Qwen 3 VL MoE tuning (#26419 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-08 14:01:08 +00:00
Harry Mellor	557b2e961d	Remove all cases of `fmt: on/off` (#26253 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:14 -07:00
Jiangyun Zhu	eb0fa43868	[Perf] Optimize `reshape_and_cache` CUDA Kernel (#25955 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Co-authored-by: Liu-congo <1502632128@qq.com>	2025-10-03 01:33:46 -07:00
ElizaWszola	502640c3f9	[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-10-02 19:35:13 +00:00
Jee Jee Li	67f3fb0844	[Bench] Add DeepSeekV32 to MoE benchmark (#25962 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-30 14:13:48 -07:00
Cyrus Leung	2f17117606	[mypy] Fix wrong type annotations related to tuple (#25660 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 13:00:45 +00:00
Tyler Michael Smith	1260180c67	Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-09-25 08:05:21 +00:00
Saman A. Pour	90b139cfff	Enable Fbgemm NVFP4 on Dense models (#25609 ) Signed-off-by: Saman Keon <samanamp@outlook.com>	2025-09-24 21:12:53 -07:00
Wentao Ye	1f29141258	[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-24 18:52:36 -04:00
Michael Goin	d83f3f7cb3	Fixes and updates to bench_per_token_quant_fp8 (#25591 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-09-24 08:30:15 -07:00
Chenxi Yang	0d235b874a	Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302 ) Signed-off-by: Chenxi Yang <cxyang@fb.com> Co-authored-by: Chenxi Yang <cxyang@fb.com>	2025-09-23 18:07:42 -06:00
ElizaWszola	63400259d0	[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-09-23 12:03:10 -07:00
Amir Samani	8c1c81a3de	[core] add nccl symmetric memory for all reduce (#24532 ) Signed-off-by: Amir Samani <asamani@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 14:33:06 -04:00
Burkhard Ringlein	100b630a60	[V1][Kernel] Add triton implementation for `reshape_and_cache_flash` (#24503 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-23 12:52:40 -04:00
Cyrus Leung	6c117cff7d	[Frontend] Pass API server count to each process (#23717 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-20 01:15:19 +08:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00

200 Commits