From 78cacf70d41d15d688bd493ebc85845f7f2a3d5d Mon Sep 17 00:00:00 2001
From: Zhean Xu <94977922+xuzhean@users.noreply.github.com>
Date: Wed, 26 Feb 2025 19:20:39 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 381adea..d3cbbdf 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ Despite its lightweight design, DeepGEMM's performance matches or exceeds expert
 
 ## Performance
 
-We test all shapes potentially used in DeepSeek-V3/R1 inference (including both prefilling and decoding, but without tensor parallelism) on H800 with NVCC 12.8. All speedup metrics are calculated in comparison to our internally and carefully optimized implementation based on CUTLASS 3.6.
+We test all shapes potentially used in DeepSeek-V3/R1 inference (including both prefilling and decoding, but without tensor parallelism) on H800 SXM5 with NVCC 12.8. All speedup metrics are calculated in comparison to our internally and carefully optimized implementation based on CUTLASS 3.6.
 
 DeepGEMM does not behave very well on some shapes, optimization PRs are welcomed if you are interested.