Varun Sundar Rabindranath
|
fb0571b077
|
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-16 12:53:11 -07:00 |
|
XuruiYang
|
845adb3ec6
|
[Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
|
2025-09-24 21:53:40 -07:00 |
|
Himanshu Jaju
|
0ec82edda5
|
[perf] Speed up align sum kernels (#21079)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
|
2025-07-21 11:19:23 -07:00 |
|
Richard Barnes
|
86debab54c
|
Fix numel() downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 (#17082)
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-01 06:48:10 +00:00 |
|
Wentao Ye
|
ffb2cd6b54
|
[Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 11:49:26 -07:00 |
|
bnellnm
|
f9c069c85e
|
Modularize fused experts and integrate PPLX kernels (#15956)
|
2025-05-14 13:11:54 -07:00 |
|
Michael Goin
|
2344192a55
|
Optimize moe_align_block_size for deepseek_v3 (#12850)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-13 18:43:37 -05:00 |
|
Shiyan Deng
|
f1042e86f0
|
[Misc] AMD Build Improvements (#12923)
|
2025-02-12 02:36:10 -08:00 |
|
Gregory Shtrasberg
|
5b19b93082
|
[ROCm][Kernel] Using the correct warp_size value
|
2025-02-05 19:15:08 -08:00 |
|
Yang Chen
|
95460fc513
|
[Kernel] port sgl moe_align_block_size kernels (#12574)
sgl_moe_align_block_size is based on:
ded9fcd09a
moe_align_block_size is based on:
ba5112ff69
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-02-03 13:09:50 +08:00 |
|
ElizaWszola
|
221d388cc5
|
[Bugfix][Kernel] Fix moe align block issue for mixtral (#12413)
|
2025-01-25 01:49:28 +00:00 |
|
Jinzhen Lin
|
1e60f87bb3
|
[Kernel] fix moe_align_block_size error condition (#12239)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-01-21 10:30:28 -08:00 |
|
Jinzhen Lin
|
750f4cabfa
|
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-01-20 16:42:16 -08:00 |
|
Simon Mo
|
f49777ba62
|
Deepseek v3 (#11502)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
|
2024-12-26 16:09:44 -08:00 |
|
Charlie Fu
|
59449095ab
|
[Performance][Kernel] Fused_moe Performance Improvement (#9384)
Signed-off-by: charlifu <charlifu@amd.com>
|
2024-10-24 15:37:52 -07:00 |
|