biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Jinzhen Lin	879ddb09c3	[Kernel][MoE] optimize `moe_align_block_size` (#29642 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 01:58:47 -08:00
Wentao Ye	a42ab317ac	[Log] Optimize startup log (#28948 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-21 08:46:20 -08:00
Wentao Ye	5031cd5d55	[Refactor] Optimize `select_experts` (#28069 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 18:53:15 -05:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Robert Shaw	26990d25dc	[Bugfix] Update device name for H200 detection (#28349 ) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-08 19:01:11 +00:00
Michael Goin	0852527647	[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-07 18:20:55 -08:00
Roger Meier	2918c1b49c	[Model] Use the same fused_moe configs for all H200 devices (#23642 ) Signed-off-by: Roger Meier <r.meier@siemens.com>	2025-10-30 17:36:56 +00:00
Danielle Robinson	9932ed6a83	[Kernel] Adding split_K implementation for fused_moe_lora (#27291 ) Signed-off-by: Danielle Robinson <dmmaddix@amazon.com> Signed-off-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Danielle Robinson <dmmaddix@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-27 02:05:24 -07:00
tomeras91	61089465a6	[Model] Add MoE support for NemotronH (#25863 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-10-23 10:27:23 +00:00
Chen Wu	5f6cbf60d6	[Feature][Kernel]FusedMoE LoRA (#21229 ) Signed-off-by: wuchen <cntryroa@gmail.com> Signed-off-by: banjuede <lmklhc@163.com> Signed-off-by: Chen Wu <cntryroa@gmail.com> Signed-off-by: Danielle Robinson <dmmaddix@amazon.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: bk-201 <joy25810@foxmail.com> Co-authored-by: wuchen <wuchen@zetyun.com> Co-authored-by: Nathan Van Gheem <vangheem@gmail.com> Co-authored-by: banjuede <lmklhc@163.com> Co-authored-by: Danielle Robinson <dmmaddix@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: bk-201 <joy25810@foxmail.com>	2025-10-21 03:01:37 +00:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
zhrrr	75c7ad9918	[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com>	2025-10-17 07:30:35 +00:00
Boyuan Feng	08405609cc	disable graph partition in custom op (#26952 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 11:08:47 +08:00
Bram Wasti	b2f78cbad4	[small][batch invariance] Rename the env and internal flags to simplify usage (#26855 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-10-16 21:40:25 +00:00
Bram Wasti	7d8975de84	Deepseek-v3 Batch Invariant on 8xH100 (#26609 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-15 22:06:02 -07:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x \| None` and `Union[x, y]` to `x \| y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
bnellnm	a462331e36	[Bugfix] Disable moe inplace for torch >= 2.9 (#26497 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-09 18:07:38 +00:00
bnellnm	da364615fc	[Kernels] Modular kernel refactor (#24812 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-08 17:51:52 -04:00
fxmarty-amd	41f1cf38f2	[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 (#21166 )	2025-10-07 09:35:26 -04:00
Harry Mellor	b893d661b1	Fix per file ruff ignores related to simplification (#26259 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 20:31:53 +00:00
Harry Mellor	4e256cadc2	Remove all references to `yapf` as it's no longer used (#26251 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:11 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Varun Sundar Rabindranath	7ef40bb983	[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels (#25488 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-03 20:13:13 -04:00
Varun Sundar Rabindranath	8c9117181d	[Misc] Remove typing.List (#26150 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-03 07:00:33 +00:00
XuruiYang	845adb3ec6	[Model] Add LongCat-Flash (#23991 ) Signed-off-by: yangxurui <yangxurui@meituan.com> Co-authored-by: yangxurui <yangxurui@meituan.com>	2025-09-24 21:53:40 -07:00
Michael Goin	7361ab379f	Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 22:48:40 +00:00
Bowen Wang	06a41334c7	[EPLB] Reduce EPLB Inference Overhead (#24573 ) Signed-off-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-22 16:31:05 +00:00
bnellnm	4bdf400218	[Bugfix] Fix chunked a2_scales in modular kernels (#25264 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-19 19:42:01 +00:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
Tahsin Tunan	cef32104b4	[FP8] Extend per-token-group quantization support to QuantFP8 (#24342 ) Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-09-16 18:31:06 -07:00
Jee Jee Li	04ad0dc275	[benchmark] Add triton version in the moe tuned config (#24769 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-16 14:10:54 +08:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
Xin Yang	8fb85b7bb6	Add routed_scaling_factor to MoE grouped topk (#23123 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-29 21:36:48 -07:00
Wentao Ye	3af47c3cc6	[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-08-27 14:09:08 +00:00
Xin Yang	8a3cd90af5	[Kernel] Add fused grouped_topk kernel for MoE (#23274 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-25 11:47:52 -07:00
Jee Jee Li	4d4061b6e7	[Kernel] Add cuda kernel for gpt_oss activation (#22951 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-17 05:03:24 +00:00
Michael Goin	94096a47c9	[UX] Separate marlin moe config logic from triton moe (#23006 )	2025-08-16 22:16:42 -04:00
bnellnm	8ad7285ea2	[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-15 14:46:00 -04:00
amirkl94	b4cef5e6c7	refactor: Change scaling factors calculation for flashinfer FusedMoE (#22812 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-15 06:19:31 +00:00
Simon Mo	f1f0d2fab8	Revert "[Kernel] Add cuda kernel for gpt_oss activation" (#22948 )	2025-08-14 17:38:10 -07:00
Jee Jee Li	81f4b96481	[Kernel] Add cuda kernel for gpt_oss activation (#22538 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-14 17:21:29 -07:00
Chi Zhang	98deac3879	[FEATURE] support custom vllm tuned config path for fused moe triton kernels (#22791 ) Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-08-13 20:27:25 +08:00
Wentao Ye	f7dcce7a4a	[Feature] Add `VLLM_USE_DEEP_GEMM_E8M0` Env to Control E8M0 Scale (#21968 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 09:39:08 -07:00
Isotr0py	7e8d685775	[Minor] Fix pre-commit error on main (#22579 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-10 00:08:23 -07:00
Jee Jee Li	0c5254b82a	[oss] Init gpt-oss bf16 support (#22508 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-09 20:19:13 -07:00
Varun Sundar Rabindranath	f703b923f3	[Misc] DeepGEMM : Avoid JIT generation in the hot-path (#22215 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-08 16:09:59 -07:00
Wentao Ye	d7b28f3415	[Log] DeepGEMM Update Log for Unaligned Problem Size (#22208 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-04 19:13:19 -07:00
JartX	3654847db5	feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733 )	2025-08-01 21:12:19 -04:00
amirkl94	207b750e19	[NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend (#21458 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-31 06:00:01 -07:00
Isotr0py	a4528f0cac	[Model]: Fused MoE for nomic-embed-text-v2-moe (#18321 ) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-29 03:13:27 -07:00

1 2 3

146 Commits