KEMBAR78
[fp8]Support fp8 deep gemm in paddle by risemeup1 · Pull Request #73092 · PaddlePaddle/Paddle · GitHub
Skip to content

Conversation

@risemeup1
Copy link
Contributor

@risemeup1 risemeup1 commented Jun 4, 2025

PR Category

Execute Infrastructure

PR Types

Others

Description

支持paddle接入fp8 deepgemm

pcard-67164

@paddle-bot
Copy link

paddle-bot bot commented Jun 4, 2025

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@risemeup1 risemeup1 force-pushed the support_fp8_deep_gemm_in_paddle branch 2 times, most recently from 2dabc5f to a5c4aea Compare June 4, 2025 08:48
@risemeup1 risemeup1 changed the title Support fp8 deep gemm in paddle [fp8]Support fp8 deep gemm in paddle Jun 4, 2025
@risemeup1 risemeup1 force-pushed the support_fp8_deep_gemm_in_paddle branch from a5c4aea to f618e85 Compare June 4, 2025 13:00
// See the License for the specific language governing permissions and
// limitations under the License.

// The file has been adapted from DeepSeek DeepEP project
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The file has been adapted from DeepSeek DeepEP project
// The file has been adapted from DeepSeek DeepGEMM project

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// The file has been adapted from DeepSeek DeepEP project
// Copyright (c) 2025 DeepSeek
// Licensed under the MIT License - https://github.com/deepseek-ai/DeepEP/blob/main/LICENSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Licensed under the MIT License - https://github.com/deepseek-ai/DeepEP/blob/main/LICENSE
// Licensed under the MIT License - https://github.com/deepseek-ai/DeepGEMM/blob/main/LICENSE

后同

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// TODO: remove some useless computation for unaligned Ms
#pragma unroll
for (uint32_t local_idx = 0; local_idx < BLOCK_M / WAVE_BLOCK_M; ++ local_idx) {
auto m_offset = local_idx * WAVE_BLOCK_M;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里怎么用的是 tab?使用空格替换掉 tab

下同

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么 fp8 目录没有 __init__.py?这样 fp8 目录不是一个 module,有潜在的 import 问题

另外,fp8 目录只有 deep_gemm 一个子目录是后续给其他模块预留的么?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 112 to 115
# keys = {k: keys[k] for k in sorted(keys.keys())}
# signature = (name, f"{keys}")
# if signature in self.tuned:
# return self.tuned[signature]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

无用代码删掉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 86 to 89
# x_np = np.full((num_groups, m, k), 3)
# y_np = np.full((num_groups, n, k), 2)
# x=paddle.to_tensor(x_np).astype(paddle.bfloat16)
# y=paddle.to_tensor(y_np).astype(paddle.bfloat16)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

无用注释删掉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +223 to +217
test_gemm()
test_m_grouped_gemm_contiguous()
test_m_grouped_gemm_masked()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改成 unittest 的形式

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

后续再改

'gemm_fp8_fp8_bf16_nt',
'm_grouped_gemm_fp8_fp8_bf16_nt_contiguous',
'm_grouped_gemm_fp8_fp8_bf16_nt_masked',
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确认下,这些是公开 API 么?这些如果是公开 API,那就需要为每个 API

  • 完善类型提示
  • 完善文档

另外,如果是公开 API,目前这里也是不可行的,因为编译文档的环境这里应该不满足 cuda_version >= 12.9 and 90 in paddle.version.cuda_archs() 吧?这会导致官网无法渲染相关文档

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

m_grouped_gemm_fp8_fp8_bf16_nt_contiguous,
m_grouped_gemm_fp8_fp8_bf16_nt_masked,
)
except ImportError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 ImportError 在什么时候会出现?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已删除

@risemeup1 risemeup1 force-pushed the support_fp8_deep_gemm_in_paddle branch 2 times, most recently from cb0cabb to 11654d3 Compare June 4, 2025 18:15
@risemeup1 risemeup1 force-pushed the support_fp8_deep_gemm_in_paddle branch from 11654d3 to 03c2430 Compare June 5, 2025 02:17
Copy link
Member

@SigureMo SigureMo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTMeow 🐾 for subprocess and print usage

Copy link
Contributor

@Hongqing-work Hongqing-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for changes in DeepEP.

Copy link
Contributor

@zhiqiu zhiqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@risemeup1
Copy link
Contributor Author

/re-run approval

@risemeup1 risemeup1 merged commit e964106 into PaddlePaddle:develop Jun 5, 2025
65 of 73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants