[DebuggerV2] Flesh out graph execution data display #3528

caisq · 2020-04-17T21:09:22Z

Motivation for features / changes
- Continue developing DebuggerV2 plugin, specifically its GraphExecutionContainer that visualizes the details of the intra-graph execution events.
Technical description of changes
- Add necessary actions, selectors, reducers and effects to support the lazy, paged loading of GraphExecutions
- Add a cdk-virtual-scroll-viewport to GraphExecutionComponent
- The lazy loading is triggered by scrolling events on the cdk-virtual-scroll-viewport
- Displaying detailed debug-tensor summaries such as dtype, rank, shape, and numeric breakdown will be added in follow-up PRs. This PR only displays the tensor name and op type in the cdk-virtual-scroll-viewport.
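
The lazy, paged loading described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function name `pagesToRequest` and all of its parameters are assumptions. Given the index range currently visible in the viewport, it returns the pages that are neither already loaded nor already being loaded.

```typescript
// Hypothetical helper (not from this PR): decide which pages to request
// when the viewport shows items [beginIndex, beginIndex + displayCount).
function pagesToRequest(
  beginIndex: number,
  displayCount: number,
  numItems: number,
  pageSize: number,
  loadedData: {[index: number]: unknown},
  loadingPages: number[]
): number[] {
  const endIndex = Math.min(beginIndex + displayCount, numItems);
  const pages: number[] = [];
  for (let i = beginIndex; i < endIndex; i++) {
    const pageIndex = Math.floor(i / pageSize);
    const alreadyQueued = pages.indexOf(pageIndex) !== -1;
    const alreadyLoading = loadingPages.indexOf(pageIndex) !== -1;
    const alreadyLoaded = loadedData[i] !== undefined;
    // Request a page only if a visible item in it is missing and the page
    // is not already queued or in flight.
    if (!alreadyQueued && !alreadyLoading && !alreadyLoaded) {
      pages.push(pageIndex);
    }
  }
  return pages;
}
```

For example, with a page size of 2 and items 5 to 7 visible, the sketch asks for pages 2 and 3 when nothing is loaded, and only for page 2 when page 3 is already in flight.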
Screenshots of UI changes
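- Loaded state: https://user-images.githubusercontent.com/16824702/79614243-24f6e500-80ce-11ea-8b9f-412cb831a449.png
- Loading state (mat-spinner to be added in follow-up CLs): https://user-images.githubusercontent.com/16824702/79614131-e19c7680-80cd-11ea-8dad-2dfd4b998ada.png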
Detailed steps to verify changes work correctly (as executed by you)
- Unit tests added
- Manual testing against logdirs with real tfdbg2 data of different sizes

caisq · 2020-04-17T21:14:29Z

stephanwlee · 2020-04-20T16:59:33Z

tensorboard/plugins/debugger_v2/tf_debugger_v2_plugin/effects/debugger_effects.ts

+        this.store.select(getGraphExecutionDisplayCount)
+      ),
+      filter(([, runId, numGraphExecutions]) => {
+        return runId !== null && numGraphExecutions > 0;


To make it a tiny bit more readable, can we select only the things we need here? i.e., move L565-566 to L588?

stephanwlee · 2020-04-20T17:03:18Z

tensorboard/plugins/debugger_v2/tf_debugger_v2_plugin/effects/debugger_effects_test.ts

+    for (const {dataExists, page3Size, loadingPages} of [
+      {dataExists: false, page3Size: 0, loadingPages: [3]},
+      {dataExists: false, page3Size: 0, loadingPages: []},
+      {dataExists: true, page3Size: 2, loadingPages: []},
+    ]) {
+      it('triggers GraphExecution loading', fakeAsync(() => {


I'm unsure about Jasmine, but in Mocha at least, each spec needs a unique name in order to run as a separate spec. Please generate a unique name for each of these. Please also try changing the value at L917 to something faulty and check that the test fails.

Good catch. Fixed.
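
One way to give each generated spec a unique name, as requested above, is to derive the name from the loop parameters. The sketch below is not necessarily the fix that landed in this PR: `specName` is a hypothetical helper and the name format is an assumption. The parameter values are the ones from the diff.

```typescript
interface Case {
  dataExists: boolean;
  page3Size: number;
  loadingPages: number[];
}

const cases: Case[] = [
  {dataExists: false, page3Size: 0, loadingPages: [3]},
  {dataExists: false, page3Size: 0, loadingPages: []},
  {dataExists: true, page3Size: 2, loadingPages: []},
];

// Hypothetical helper: embed every parameter in the spec name so that no
// two iterations of the loop register a spec under the same name.
function specName(c: Case): string {
  return (
    'triggers GraphExecution loading: ' +
    `dataExists=${c.dataExists}, page3Size=${c.page3Size}, ` +
    `loadingPages=${JSON.stringify(c.loadingPages)}`
  );
}

// In a Jasmine test file, each name would be passed to `it(name, ...)`.
const specNames = cases.map(specName);
```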

stephanwlee · 2020-04-20T17:05:18Z

tensorboard/plugins/debugger_v2/tf_debugger_v2_plugin/effects/debugger_effects_test.ts

+        );
+
+        action.next(graphExecutionScrollToIndex({index: newScrollBeginIndex}));
+        tick(100);


What does this achieve? Is this to sequence the promise returned by the data source? Can we use proper async/await instead?

tick(100) is needed because of the debounceTime(100) in the effect created by onGraphExecutionScroll(). This test won't pass without it. 100 is therefore not an arbitrary number.

Using fakeAsync() with tick() is better than async/await because the passage of time is simulated, so the test should run in less time.
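
To illustrate why the test must advance the clock by the full debounce interval, here is a minimal, dependency-free sketch of a debounced handler driven by a fake clock. It is neither Angular's `fakeAsync()`/`tick()` nor RxJS's `debounceTime()`: `FakeClock` and `debounce` are hypothetical stand-ins written only for this illustration.

```typescript
// Hypothetical fake clock: timers fire only when tick() advances time.
class FakeClock {
  private now = 0;
  private nextId = 1;
  private timers: Array<{id: number; due: number; fn: () => void}> = [];

  setTimeout(fn: () => void, delayMs: number): number {
    const id = this.nextId++;
    this.timers.push({id, due: this.now + delayMs, fn});
    return id;
  }

  clearTimeout(id: number): void {
    this.timers = this.timers.filter((t) => t.id !== id);
  }

  // Advance virtual time and run every timer that has become due.
  tick(ms: number): void {
    this.now += ms;
    const due = this.timers.filter((t) => t.due <= this.now);
    this.timers = this.timers.filter((t) => t.due > this.now);
    for (const t of due) {
      t.fn();
    }
  }
}

// Hypothetical debounce: emit only the latest value once `waitMs` of
// virtual time has passed without a newer value arriving.
function debounce<T>(
  clock: FakeClock,
  waitMs: number,
  emit: (value: T) => void
): (value: T) => void {
  let pending: number | null = null;
  return (value: T) => {
    if (pending !== null) {
      clock.clearTimeout(pending);
    }
    pending = clock.setTimeout(() => {
      pending = null;
      emit(value);
    }, waitMs);
  };
}

const clock = new FakeClock();
const emitted: number[] = [];
const onScroll = debounce<number>(clock, 100, (index) => {
  emitted.push(index);
});

onScroll(5);
clock.tick(99); // Not enough virtual time: nothing is emitted yet.
const emittedBeforeDeadline = emitted.slice();
clock.tick(1); // 100 ms of virtual time have elapsed: the value is emitted.
```

Because the clock is virtual, advancing it by 100 ms takes no real time, which is the speed advantage mentioned above.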

stephanwlee · 2020-04-20T17:29:48Z

tensorboard/plugins/debugger_v2/tf_debugger_v2_plugin/store/debugger_reducers.ts

+        const pageIndex = Math.floor(i / pageSize);
+        if (graphExecutionDataLoadingPages.indexOf(pageIndex) !== -1) {
+          graphExecutionDataLoadingPages.splice(
+            graphExecutionDataLoadingPages.indexOf(pageIndex),
+            1
+          );
+        }


Is there a reason why we chose array for graphExecutionDataLoadingPages instead of an object?

graphExecutionDataLoadingPages maintains the pages that are currently being loaded. An object could serve the same purpose, perhaps a little more performantly if the number of pages is large. But the downside is that the key type will be a little arbitrary (perhaps just null?). We don't want to keep track of all the pages, because the number of pages can be large (consider a debugger run with 500k graph executions, which leads to 500k / 200 = 2500 keys in the object if we did that). In practice, the performance shouldn't matter either, because the number of loading pages should usually be small (the array grows as the user scrolls the list, at a debounced rate).
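
The bookkeeping described above can be sketched as follows. This is an illustration rather than the reducer in this PR: `pageIndexOf`, `markPageLoading`, and `markPageLoaded` are hypothetical names, and the functions return new arrays instead of splicing in place. The page size of 200 is taken from the arithmetic in the comment above.

```typescript
const PAGE_SIZE = 200; // Taken from the "500k / 200" arithmetic above.

function pageIndexOf(itemIndex: number, pageSize: number): number {
  return Math.floor(itemIndex / pageSize);
}

// Hypothetical: record that a page request is in flight (no duplicates).
function markPageLoading(loadingPages: number[], pageIndex: number): number[] {
  return loadingPages.indexOf(pageIndex) === -1
    ? [...loadingPages, pageIndex]
    : loadingPages;
}

// Hypothetical: drop a page from the in-flight list once its data arrives.
function markPageLoaded(loadingPages: number[], pageIndex: number): number[] {
  return loadingPages.filter((p) => p !== pageIndex);
}

let pagesInFlight: number[] = [];
pagesInFlight = markPageLoading(pagesInFlight, pageIndexOf(650, PAGE_SIZE)); // page 3
pagesInFlight = markPageLoading(pagesInFlight, pageIndexOf(799, PAGE_SIZE)); // still page 3
pagesInFlight = markPageLoading(pagesInFlight, pageIndexOf(800, PAGE_SIZE)); // page 4
const pagesInFlightAfterRequests = pagesInFlight.slice();
pagesInFlight = markPageLoaded(pagesInFlight, 3);

// Total number of pages for a run with 500k graph executions.
const totalPages = Math.ceil(500000 / PAGE_SIZE);
```

The array only ever holds the handful of pages currently in flight, whereas `totalPages` shows how many entries a structure tracking every page would need.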

stephanwlee · 2020-04-20T17:36:44Z

tensorboard/plugins/debugger_v2/tf_debugger_v2_plugin/store/debugger_reducers_test.ts

+          begin: 2,
+          end: 4,
+          graph_executions: [
+            createTestGraphExecution({op_name: 'TestOp_2'}),


Can you change this name so I can know whether we are expecting to overwrite or not?

Done. It is supposed to be overwritten. The test now reflects that.

stephanwlee · 2020-04-20T17:44:10Z

...gins/debugger_v2/tf_debugger_v2_plugin/views/graph_executions/graph_executions_component.css

+}
+
+.tensor-name-and-op-type {
+  direction: rtl;


Why do we use rtl in this file? What is this achieving? Is it abusing a11y for UI purposes? If so, how can we make these properly a11y compliant?

So the rationale for rtl is as follows. We often see very long node/tensor names in tf problems, e.g.,

resnet/residual_block_50/conv_30/Conv2D_0

resnet/residual_block_50/conv_31/Conv2D_0

When space is limited, we want to omit some parts of the strings with an ellipsis (...). But often the most interesting part of a node/tensor name is at the end or toward the end. So we want to omit the beginning part of the name. Using rtl together with text-overflow: ellipsis and white-space: nowrap achieves that purpose.

Ah, about that. While I am okay with this merging as is, I would like to talk about a nicer approach. Let's chat offline (I want to show you something).

stephanwlee · 2020-04-20T17:44:53Z

.../debugger_v2/tf_debugger_v2_plugin/views/graph_executions/graph_executions_component.ng.html


-  <!-- TODO(cais): Add cdk-virtual-scroll-viewport for graph executions -->
+  <cdk-virtual-scroll-viewport
+    #executionsScroll


Is this used? I do not see any references to this.

You're right. It's unused. Removed.

stephanwlee · 2020-04-20T17:46:07Z

.../debugger_v2/tf_debugger_v2_plugin/views/graph_executions/graph_executions_component.ng.html

+        </div>
+        <div *ngIf="graphExecutionData[i]; else dataLoading">
+          <div class="tensor-name-and-op-type">
+            <div class="tensor-name">


No AI: you probably mean span in a lot of these places. I cannot tell by just reading these though.

In these cases I do want div, because I want them to be on separate lines (e.g., tensor-name and op-type) and to be nested (e.g., tensor-name-and-op-type and tensor-name). But follow-up CLs will indeed use a lot of spans. Stay tuned.

stephanwlee · 2020-04-20T17:47:02Z

...ugins/debugger_v2/tf_debugger_v2_plugin/views/graph_executions/graph_executions_component.ts

  numGraphExecutions: number | null = null;
+
+  @Input()
+  graphExecutionData: {[index: number]: GraphExecution} = {};


components should not define default values. Prefer to use graphExecutionData!: {[index: number]: GraphExecution};

(it forces container to pass the input or remove unused input in the component)

Done here and elsewhere in this component file.
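
The difference between the two declarations can be sketched without Angular. This is a simplified illustration: the `@Input()` decorator is omitted, `GraphExecution` is reduced to two assumed fields, and the class names are hypothetical. With TypeScript's `strictPropertyInitialization` enabled, the `!` (definite assignment assertion) tells the compiler that the field is set elsewhere, here by the parent container, without supplying a default value.

```typescript
// Simplified stand-in for the real GraphExecution type (assumption).
interface GraphExecution {
  op_name: string;
  op_type: string;
}

// With a default: the component always has a value, even if the container
// never passes the input.
class WithDefault {
  graphExecutionData: {[index: number]: GraphExecution} = {};
}

// With `!`: no default is defined, so the value stays undefined until the
// container actually provides it.
class WithoutDefault {
  graphExecutionData!: {[index: number]: GraphExecution};
}

const withDefault = new WithDefault();
const withoutDefault = new WithoutDefault();
const defaultIsEmptyObject =
  Object.keys(withDefault.graphExecutionData).length === 0;
const unsetIsUndefined =
  (withoutDefault.graphExecutionData as unknown) === undefined;

// Simulate the container passing the input.
withoutDefault.graphExecutionData = {
  0: {op_name: 'TestOp_0', op_type: 'TestOpType'},
};
```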

stephanwlee · 2020-04-20T17:52:19Z

.../debugger_v2/tf_debugger_v2_plugin/views/graph_executions/graph_executions_container_test.ts

+  it('does not render execs viewport if # execs = 0', fakeAsync(() => {
+    const fixture = TestBed.createComponent(GraphExecutionsContainer);
+    store.overrideSelector(getNumGraphExecutions, 0);
+    fixture.autoDetectChanges();


interesting but prefer to use store.refreshState() and fixture.detectChanges() without tick and fakeAsync (is this for the virtual scrolling on @angular/material/cdk)?

Nonetheless, 500 in tick(500) feels very arbitrary.

Yes, the special test patterns you see here (including autoDetectChanges(), tick() and fakeAsync()) are used because of the virtual scrolling used by the component.

autoDetectChanges() seems to be required by the cdk scrolling. I couldn't get the test to pass by using refreshState() and detectChanges().

As for tick(500), I found it can be replaced with simply tick(). Done.

fakeAsync() is required for tick(). See https://angular.io/api/core/testing/tick.

caisq

Thanks for the review!


caisq added 9 commits April 16, 2020 16:35

[DebuggerV2] Flesh out graph execution data display

6150f26

Flesh out scrolling effect; Improve CSS

d7dd9c5

Add unit tests for selectors

4b31ddb

Add unit tests for reducers

20daabf

Adjust CSS

5eb5e10

Add unit tests for effect

d1ef4ab

Add container tests

4df718f

Fix loading spinner css

964c20c

Revert extraneous change

9e37e35

googlebot added the cla: yes label Apr 17, 2020

Tweak some comments

ab58b65

caisq marked this pull request as ready for review April 17, 2020 21:14

caisq requested a review from stephanwlee April 17, 2020 21:14

stephanwlee approved these changes Apr 20, 2020

Reply to comments

a7d71ba

caisq commented Apr 21, 2020

caisq added 2 commits April 20, 2020 23:10

Replace incorrect cdk_overlay with cdk_scrolling

cb0ce76

Refactor large effect pipe into two pipes

340bfa8

caisq merged commit cad3ec8 into tensorflow:master Apr 21, 2020
