Update readme #306

gaoteng-git · 2021-06-17T07:56:31Z

No description provided.

cloudhan · 2021-06-17T08:35:19Z

tb_plugin/docs/gpu_utilization.md

                      For example, a kernel with only one thread per block can’t fully utilize each SM. 

-* Est. Achieved Occupancy: The bigger, the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm). 
+* Est. Achieved Occupancy: For most cases such as memory bandwidth bounded kernels, the higher the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm). 


Let's simply call it memory bounded instead of memory bandwidth bounded.

"Memory bounded" could be interpreted as either size bounded or bandwidth bounded, so I prefer to add "bandwidth".

cloudhan · 2021-06-17T08:45:29Z

tb_plugin/docs/gpu_utilization.md

                      For example, a kernel with only one thread per block can’t fully utilize each SM. 

-* Est. Achieved Occupancy: The bigger, the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm). 
+* Est. Achieved Occupancy: For most cases such as memory bandwidth bounded kernels, the higher the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm). 


I am actually not quite sure how to make this clear in the case of occupancy. The core logic behind occupancy is to over-subscribe the GPU to hide latency, since switching between warps is very fast. Once the target occupancy is achieved, raising it further does not improve performance.

See page 13 of GPU Performance Analysis and Optimization

I think we should remove the whole sentence here.
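As a rough illustration of the occupancy logic described above, here is a minimal sketch of a theoretical occupancy estimate. The per-SM limits (`max_warps_per_sm=64`, `max_blocks_per_sm=32`) and the function name are assumptions chosen for illustration, not values taken from the plugin or from any specific GPU. Register and shared-memory limits are deliberately ignored.

```python
import math


def theoretical_occupancy(threads_per_block, max_warps_per_sm=64,
                          max_blocks_per_sm=32, warp_size=32):
    """Estimate the fraction of warp slots on one SM that can be resident.

    All hardware limits here are hypothetical defaults; real limits depend
    on the GPU architecture and on register/shared-memory usage, which
    this sketch does not model.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    # A block always occupies whole warps, even if the last one is partly empty.
    warps_per_block = math.ceil(threads_per_block / warp_size)
    # Resident blocks are capped by both the block limit and the warp limit.
    blocks_by_warps = max_warps_per_sm // warps_per_block
    resident_blocks = min(max_blocks_per_sm, blocks_by_warps)
    active_warps = resident_blocks * warps_per_block
    return active_warps / max_warps_per_sm


for threads in (1, 32, 256, 768):
    print(threads, theoretical_occupancy(threads))
```

Under these assumed limits, very small blocks are capped by the block limit and leave half of the warp slots empty, while 256 threads per block fills every slot. A high value here only means there are enough resident warps to hide latency; it does not by itself imply a faster kernel.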

gaoteng-git · 2021-06-17T09:44:17Z

@cloudhan Nice reference. I've added it after this sentence as a reference. Giving users more information to judge by is good.

gaoteng-git added 2 commits June 17, 2021 15:51

update doc and tooltip

2362a42

update end2end file

d5a4799

facebook-github-bot added the cla signed label Jun 17, 2021

cloudhan reviewed Jun 17, 2021

cloudhan approved these changes Jun 17, 2021

update readme

cbcc53f

gaoteng-git marked this pull request as ready for review June 17, 2021 09:42

gaoteng-git merged commit 29c3c46 into pytorch:plugin/0.2 Jun 17, 2021
