KEMBAR78
Add a pod viewer tool for analyzing TPU pod performance. by qiuminxu · Pull Request #2111 · tensorflow/tensorboard · GitHub
Skip to content

Conversation

@qiuminxu
Copy link
Contributor

@qiuminxu qiuminxu commented Apr 12, 2019

Motivation for features / changes

As TPU pod is provided to cloud TPU users, we need to tool to provide a global visualization of the performance of each core, as well as the communication across the cores for your system.

Technical description of changes

The pod viewer UI shows:

  • A step slider, which allows you to select which step you want to dive into.
  • A topology graph, which interactively visualizes your TPU core in the whole TPU system.
  • A step breakdown chart, which visualizes a breakdown of a step for all cores. This can be used to track where the bottleneck of the system is and whether there is a particular core slows down the system.
  • A chart showing the latency of all the send-recv channels.
  • A chart showing the latency of all-reduce ops.
  • Communication links, which visualizes the send and recv channels in the topology graph. A channel details card also shows up on the right, providing detailed information of the channel, such as size of data transferred, latency and bandwidth.

Screenshots of UI changes

pod_viewer_demo1
pod_viewer_demo2

Detailed steps to verify changes work correctly (as executed by you)

bazel run tensorboard/plugins/profile:profile_demo
bazel run tensorboard:tensorboard -- --logdir=/tmp/profile_demo

@qiuminxu qiuminxu requested a review from stephanwlee April 12, 2019 17:28
Copy link
Contributor

@stephanwlee stephanwlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed and most comments are stylistic nits.

  1. hlo and Ele seem to mean something important but I do not know what that would be. What do they mean?
  2. can you make sure the new license preambles are correct?
  3. for TypeScript files, can you add as much types as possible?

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@qiuminxu qiuminxu requested a review from stephanwlee April 16, 2019 16:34
Copy link
Contributor

@stephanwlee stephanwlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I want to see

  • proper syntax for denoting private property
  • use more computed property instead of using an observer to update bunch of properties
  • correct Polymer property default value provider (e.g., value: () => [],)
  • code style (whitespaces mostly)

Copy link
Contributor

@stephanwlee stephanwlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR now is in better shape then ever!
There are few stylistic nits and questions posed. After that, I think we are almost done!

@qiuminxu qiuminxu requested a review from stephanwlee April 20, 2019 00:03
Copy link
Contributor

@stephanwlee stephanwlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through the review. Code looks much readable now!

@stephanwlee stephanwlee merged commit 803c41a into tensorflow:master Apr 22, 2019
@qiuminxu
Copy link
Contributor Author

Thank you for the thorough review and many great suggestions!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants