[fp8]Support fp8 deep gemm in paddle #73092
Your PR was submitted successfully. Thank you for contributing to this open-source project!
2dabc5f to a5c4aea
a5c4aea to f618e85
// See the License for the specific language governing permissions and
// limitations under the License.

// The file has been adapted from DeepSeek DeepEP project
- // The file has been adapted from DeepSeek DeepEP project
+ // The file has been adapted from DeepSeek DeepGEMM project
done

// The file has been adapted from DeepSeek DeepEP project
// Copyright (c) 2025 DeepSeek
// Licensed under the MIT License - https://github.com/deepseek-ai/DeepEP/blob/main/LICENSE
- // Licensed under the MIT License - https://github.com/deepseek-ai/DeepEP/blob/main/LICENSE
+ // Licensed under the MIT License - https://github.com/deepseek-ai/DeepGEMM/blob/main/LICENSE
The same applies to the later occurrences.
done
// TODO: remove some useless computation for unaligned Ms
#pragma unroll
for (uint32_t local_idx = 0; local_idx < BLOCK_M / WAVE_BLOCK_M; ++ local_idx) {
    auto m_offset = local_idx * WAVE_BLOCK_M;
Why are tabs used here? Please replace the tabs with spaces.
The same applies below.
done
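Why does the fp8 directory have no __init__.py? Without it, the fp8 directory is not a module, which can lead to import problems.
Also, the fp8 directory contains only one subdirectory, deep_gemm. Is it reserved for other modules to be added later?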
done
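To illustrate the `__init__.py` concern raised above: on Python 3 a directory without `__init__.py` can still be imported, but only as a namespace package, which has no `__file__` and is skipped by discovery tools that look for `__init__.py` (for example `setuptools.find_packages`). The sketch below is only a demonstration; the directory names `fp8_without_init` and `fp8_with_init` are assumptions made up for it and are not paths from this PR.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def describe_package(root: Path, name: str) -> dict:
    """Import `name` from `root` and report whether it has a backing file."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(name)
        # Namespace packages (no __init__.py) have no usable __file__.
        return {"has_file": getattr(module, "__file__", None) is not None}
    finally:
        sys.path.remove(str(root))
        sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical directory names, used only for this demonstration.
    (root / "fp8_without_init").mkdir()
    (root / "fp8_with_init").mkdir()
    (root / "fp8_with_init" / "__init__.py").write_text("")

    namespace_info = describe_package(root, "fp8_without_init")
    regular_info = describe_package(root, "fp8_with_init")

print("without __init__.py:", namespace_info)
print("with __init__.py:   ", regular_info)
```

Both imports succeed, which is why the problem is easy to miss locally; the difference tends to surface at packaging time, when the directory without `__init__.py` may be left out of the installed distribution.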
# keys = {k: keys[k] for k in sorted(keys.keys())}
# signature = (name, f"{keys}")
# if signature in self.tuned:
# return self.tuned[signature]
Please delete the dead code.
done
test/fp8/test_fp8_deep_gemm.py
Outdated
# x_np = np.full((num_groups, m, k), 3)
# y_np = np.full((num_groups, n, k), 2)
# x=paddle.to_tensor(x_np).astype(paddle.bfloat16)
# y=paddle.to_tensor(y_np).astype(paddle.bfloat16)
Please delete the unused comments.
done
test_gemm()
test_m_grouped_gemm_contiguous()
test_m_grouped_gemm_masked()
Please rewrite this in unittest form.
Will change this in a follow-up.
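For reference, a minimal sketch of what the requested unittest form might look like. It is not the PR's test: `gemm_under_test` is a hypothetical pure-Python stand-in for the FP8 kernel call, the sizes are arbitrary, and the NT layout (output = x · yᵀ, with x of shape (m, k) and y of shape (n, k)) is inferred from the `_nt` suffix and the shapes in the commented-out lines. The constant fills 3 and 2 from those lines are reused because they give a closed-form expected value of 3 · 2 · k per output element.

```python
import unittest


def gemm_under_test(x, y):
    """Hypothetical stand-in: computes x @ y^T in plain Python.

    The real test would call the FP8 GEMM kernel here instead.
    """
    return [[sum(a * b for a, b in zip(row, col)) for col in y] for row in x]


class TestFP8DeepGemm(unittest.TestCase):
    def setUp(self):
        # Arbitrary small sizes chosen for the sketch.
        self.num_groups, self.m, self.n, self.k = 2, 4, 3, 8
        self.expected = 3.0 * 2.0 * self.k

    def _assert_all_close(self, out):
        for row in out:
            for value in row:
                self.assertAlmostEqual(value, self.expected, delta=1e-3 * self.expected)

    def test_gemm(self):
        x = [[3.0] * self.k for _ in range(self.m)]
        y = [[2.0] * self.k for _ in range(self.n)]
        out = gemm_under_test(x, y)
        self.assertEqual((len(out), len(out[0])), (self.m, self.n))
        self._assert_all_close(out)

    def test_grouped_gemm(self):
        # One independent GEMM per group, mirroring (num_groups, m, k) inputs.
        for _ in range(self.num_groups):
            x = [[3.0] * self.k for _ in range(self.m)]
            y = [[2.0] * self.k for _ in range(self.n)]
            self._assert_all_close(gemm_under_test(x, y))
```

Saved as a test module, this can be run with `python -m unittest`; each of the three script-style functions in the original file would become one method in the same way.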
python/paddle/incubate/__init__.py
Outdated
    'gemm_fp8_fp8_bf16_nt',
    'm_grouped_gemm_fp8_fp8_bf16_nt_contiguous',
    'm_grouped_gemm_fp8_fp8_bf16_nt_masked',
]
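Please confirm: are these public APIs? If they are, each API needs:
- complete type hints
- complete documentation
Also, if they are public APIs, the current approach will not work here, because the environment that builds the docs presumably does not satisfy cuda_version >= 12.9 and 90 in paddle.version.cuda_archs(), right? That would prevent the official website from rendering the corresponding documentation.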
done
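The condition quoted in the comment above can be sketched as a small predicate. This is an illustration under assumptions, not the PR's code: the helper names are hypothetical, and the CUDA version is assumed to arrive as a dotted string. Parsing it into a tuple of integers avoids the ordering mistake a float comparison would make (as floats, 12.10 compares lower than 12.9).

```python
from typing import Optional, Sequence, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Turn a dotted version such as '12.9' into (12, 9); None if not numeric."""
    try:
        return tuple(int(part) for part in text.split("."))
    except ValueError:
        return None


def deep_gemm_supported(cuda_version: str, cuda_archs: Sequence[int]) -> bool:
    """Hypothetical helper mirroring: cuda_version >= 12.9 and 90 in cuda_archs."""
    version = parse_version(cuda_version)
    if version is None:
        # Non-numeric version string: treat as no usable CUDA toolkit.
        return False
    return version >= (12, 9) and 90 in cuda_archs


print(deep_gemm_supported("12.9", [80, 90]))
print(deep_gemm_supported("12.8", [90]))
print(deep_gemm_supported("12.10", [90]))
```

An environment that fails this check, such as the docs build machine the reviewer describes, would not expose the guarded names, so their documentation could not be rendered from there.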
python/paddle/incubate/__init__.py
Outdated
        m_grouped_gemm_fp8_fp8_bf16_nt_contiguous,
        m_grouped_gemm_fp8_fp8_bf16_nt_masked,
    )
except ImportError:
When would this ImportError occur?
Removed.
cb0cabb to 11654d3
11654d3 to 03c2430
LGTM for changes in DeepEP.
LGTM
/re-run approval
PR Category
Execute Infrastructure
PR Types
Others
Description
Add support for integrating FP8 DeepGEMM into Paddle.
pcard-67164