Update readme #306
tb_plugin/docs/gpu_utilization.md
Outdated
  For example, a kernel with only one thread per block can’t fully utilize each SM.

- * Est. Achieved Occupancy: The bigger, the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm).
+ * Est. Achieved Occupancy: For most cases such as memory bandwidth bounded kernels, the higher the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm).
Let's simply call it "memory bounded" instead of "memory bandwidth bounded".
I am actually not sure how to make this clear in the case of occupancy. The core idea behind occupancy is to over-subscribe the GPU to hide latency, since switching between warps is very fast. Once the target occupancy is reached, raising it further does not improve performance.
See page 13 of GPU Performance Analysis and Optimization
I think we should remove the whole sentence here.
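
To make the occupancy point concrete, here is a minimal sketch of theoretical occupancy, defined as active warps per SM divided by the maximum warps an SM supports. The per-SM limits below are assumed values for illustration only (they vary by architecture), the function names are hypothetical, and register and shared-memory limits are ignored.

```python
import math

WARP_SIZE = 32           # threads per warp on NVIDIA GPUs
MAX_WARPS_PER_SM = 64    # ASSUMPTION: illustrative limit, varies by architecture
MAX_BLOCKS_PER_SM = 32   # ASSUMPTION: illustrative limit, varies by architecture


def theoretical_occupancy(threads_per_block):
    """Active warps per SM divided by the maximum warps per SM.

    Only the warp and block limits are modeled; registers and
    shared memory are ignored in this sketch.
    """
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Resident blocks are capped by both the block limit and the warp limit.
    resident_blocks = min(MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // warps_per_block)
    return resident_blocks * warps_per_block / MAX_WARPS_PER_SM


def lane_utilization(threads_per_block):
    """Fraction of lanes doing work in the warps a block occupies."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    return threads_per_block / (warps_per_block * WARP_SIZE)


for tpb in (1, 32, 64, 256, 1024):
    print(tpb, theoretical_occupancy(tpb), lane_utilization(tpb))
```

Under these assumed limits, a kernel with one thread per block is capped by the block limit and leaves 31 of every 32 lanes idle, while any block size from 64 threads up already reaches the maximum, so a higher number past the target does not help by itself.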
Memory bounded could be interpreted as size bounded or bandwidth bounded. So I prefer to add "bandwidth".
@cloudhan Cool reference URL. I've added it after this sentence as a reference. Giving users more information to judge by is good.
No description provided.