Caleb_Du
3e887d2e0c
permute/unpermute kernel for moe optimization ( #14568 )
...
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn >
2025-05-02 11:31:55 -07:00
Charlie Fu
ed2462030f
[Bugfix] Fix moe weight losing all extra attrs after process_weights_after_loading. ( #16854 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-04-28 21:05:07 +00:00
Lucas Wilkinson
7eb4255628
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales ( #16801 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-17 22:13:29 -07:00
Jinzhen Lin
d06ba4ed3f
[Kernel] moe wna16 marlin kernel ( #14447 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-14 20:05:22 -07:00
Michael Goin
c70cf0fe06
[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models ( #16038 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-10 15:08:47 +08:00
zxfan-cpu
ad971af8c7
[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 ( #16161 )
2025-04-07 20:48:47 -07:00
Lu Fang
55dcce91df
Upstream Llama4 Support to Main ( #16113 )
...
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com >
Signed-off-by: Chris Thi <chris.c.thi@gmail.com >
Signed-off-by: drisspg <drisspguessous@gmail.com >
Signed-off-by: Jon Swenson <jmswen@gmail.com >
Signed-off-by: Keyun Tong <tongkeyun@gmail.com >
Signed-off-by: Lu Fang <fanglu@meta.com >
Signed-off-by: Xiaodong Wang <xdwang@meta.com >
Signed-off-by: Yang Chen <yangche@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Lu Fang <lufang@fb.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Lucia Fang <fanglu@fb.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 08:06:27 -07:00
liuzhenwei
0812d8dd41
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe ( #15945 )
...
Signed-off-by: zhenwei <zhenweiliu@habana.ai >
2025-04-04 09:38:55 -07:00
Roger Wang
0e00d40e4f
[V1][Bugfix] Fix typo in MoE TPU checking ( #15927 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-01 23:46:42 -07:00
Alexander Matveev
7e4e709b43
[V1] TPU - Fix fused MOE ( #15834 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-31 22:58:07 -07:00
Robert Shaw
43ed4143c4
[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM ( #15587 )
...
Signed-off-by: ElizaWszola <eliza@neuralmagic.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: ElizaWszola <eliza@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com >
Co-authored-by: ElizaWszola <ewszola@redhat.com >
2025-03-27 06:47:25 +00:00
Robert Shaw
e1e0fd7543
[TPU] Avoid Triton Import ( #15589 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-27 06:43:02 +00:00
Mengqing Cao
fb22be5817
[moe][quant] add weight name case for offset ( #15515 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-03-27 04:50:29 +00:00
vllmellm
5ebf66748b
[FEAT][ROCm] Integrate Fused MoE Kernels from AITER ( #14967 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-03-26 16:30:30 +08:00
Gregory Shtrasberg
f533b5837f
[ROCm][Kernel] MoE weights padding ( #14454 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: charlifu <charlifu@amd.com >
2025-03-24 23:45:30 +00:00
liuzhenwei
5eeadc2642
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral ( #12303 )
...
Signed-off-by: zhenwei <zhenweiliu@habana.ai >
2025-03-24 09:48:40 -07:00
Thien Tran
95d680b862
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it ( #14681 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-03-13 20:43:18 -07:00
Li, Jiang
ff47aab056
[CPU] Upgrade CPU backend to torch-2.6 ( #13381 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-03-12 10:41:13 +00:00
Tyler Michael Smith
958adce478
[Bugfix] Fix use_direct_call condition in FusedMoE layer for ( #14382 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-06 14:17:21 -08:00
Tyler Michael Smith
cc2f9b32c8
[Distributed] Add enable_expert_parallel arg ( #14305 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-06 18:54:45 +00:00
Tyler Michael Smith
72c62eae5f
[V1] EP/TP MoE + DP Attention ( #13931 )
2025-03-04 21:27:26 -08:00
Jee Jee Li
67fc426845
[Misc] Print FusedMoE detail info ( #13974 )
2025-02-27 18:53:13 -05:00
Szymon Ożóg
7f0be2aa24
[Model] Deepseek GGUF support ( #13167 )
2025-02-27 02:08:35 -08:00
Michael Goin
07c4353057
[Model] Support Grok1 ( #13795 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-02-26 01:07:12 +00:00
Tyler Michael Smith
1e15aaef56
[Bugfix][Quantization] Fix FP8 + EP ( #13784 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-02-25 10:54:17 +08:00
Jongseok Park
781096e385
Expert Parallelism (EP) Support for DeepSeek V2 ( #12583 )
2025-02-24 07:33:20 -08:00
Dipika Sikka
7ca9934fe7
[Misc] Update w2 scale loading for GPTQMarlinMoE ( #12757 )
2025-02-06 01:02:14 -08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com >
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com >
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-02-02 11:58:18 -08:00
Dipika Sikka
eb5cb5e528
[BugFix] Fix parameter names and process_after_weight_loading for W4A16 MoE Group Act Order ( #11528 )
...
Signed-off-by: ElizaWszola <eliza@neuralmagic.com >
Co-authored-by: ElizaWszola <eliza@neuralmagic.com >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
2025-01-23 21:40:33 +00:00
Avshalom Manevich
263a870ee1
[Hardware][TPU] workaround fix for MoE on TPU ( #11764 )
2025-01-12 10:53:51 -05:00
Li, Jiang
aa1e77a19c
[Hardware][CPU] Support MOE models on x86 CPU ( #11831 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-01-10 11:07:58 -05:00
Robert Shaw
2339d59f92
[BugFix] Fix quantization for all other methods ( #11547 )
2024-12-26 22:23:29 -08:00
Simon Mo
f49777ba62
Deepseek v3 ( #11502 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
Co-authored-by: mgoin <michael@neuralmagic.com >
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com >
2024-12-26 16:09:44 -08:00
Michael Goin
2072924d14
[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization ( #11523 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: simon-mo <xmo@berkeley.edu >
Co-authored-by: simon-mo <simon.mo@hey.com >
Co-authored-by: simon-mo <xmo@berkeley.edu >
Co-authored-by: HandH1998 <1335248067@qq.com >
2024-12-26 15:33:30 -08:00
Cyrus Leung
fa6ecb9aa7
[Model] Clean up MiniCPMV ( #10751 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-29 04:47:06 +00:00
youkaichao
32176fee73
[torch.compile] support moe models ( #9632 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2024-10-27 21:58:04 -07:00
Luka Govedič
0f41fbe5a3
[torch.compile] Fine-grained CustomOp enabling mechanism ( #9300 )
2024-10-17 18:36:37 +00:00
Michael Goin
873edda6cf
[Misc] Support FP8 MoE for compressed-tensors ( #8588 )
2024-09-25 09:43:36 -07:00
Dipika Sikka
6cd5e5b07e
[Misc] Fused MoE Marlin support for GPTQ ( #8217 )
2024-09-09 23:02:52 -04:00
Wenxiang
1248e8506a
[Model] Adding support for MSFT Phi-3.5-MoE ( #7729 )
...
Co-authored-by: Your Name <you@example.com >
Co-authored-by: Zeqi Lin <zelin@microsoft.com >
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com >
2024-08-30 13:42:57 -06:00
Dipika Sikka
fc911880cc
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel ( #7766 )
...
Co-authored-by: ElizaWszola <eliza@neuralmagic.com >
2024-08-27 15:07:09 -07:00
Michael Goin
aae74ef95c
Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel ( #7527 )" ( #7764 )
2024-08-22 03:42:14 +00:00
Dipika Sikka
8678a69ab5
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel ( #7527 )
...
Co-authored-by: ElizaWszola <eliza@neuralmagic.com >
2024-08-21 16:17:10 -07:00
Dipika Sikka
d3bdfd3ab9
[Misc] Update Fused MoE weight loading ( #7334 )
2024-08-13 14:57:45 -04:00
Robert Shaw
683e3cb9c4
[ Misc ] fbgemm checkpoints ( #6559 )
2024-07-20 09:36:57 -07:00
Robert Shaw
dbe5588554
[ Misc ] non-uniform quantization via compressed-tensors for Llama ( #6515 )
2024-07-18 22:39:18 -04:00
Woosuk Kwon
c467dff24f
[Hardware][TPU] Support MoE with Pallas GMM kernel ( #6457 )
2024-07-16 09:56:28 -07:00
Woosuk Kwon
ec9933f4a5
[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod ( #6289 )
2024-07-15 19:02:14 +00:00
Robert Shaw
fb6af8bc08
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 ( #6417 )
2024-07-13 20:03:58 -07:00
Robert Shaw
7c008c51a9
[ Misc ] Refactor MoE to isolate Fp8 From Mixtral ( #5970 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
Co-authored-by: Michael Goin <michael@neuralmagic.com >
2024-07-02 21:54:35 +00:00