KEMBAR78
line chart: recreate charts upon fatal renderer errors by psybuzz · Pull Request #5239 · tensorflow/tensorboard · GitHub
Skip to content

Conversation

@psybuzz
Copy link
Contributor

@psybuzz psybuzz commented Aug 13, 2021

The line chart's internal ChartImpl notifies the line chart when it is a WebGL
renderer that loses its context due to the page having too many active WebGL
contexts at once (>30 for my device).

This change adds logic to treat those errors as fatal when the line chart needs
to be updated. Scalar cards now handle the fatal error by destroying and
recreating the entire line chart component, fixing the blank chart issue.

Manually checked that charts are not blank after scrolling through lots of line
charts (used the tagFilter to view 2x the line charts, since elements are
duplicated).

Alternatives considered: calling restoreContext() [1] on the context does not
consistently fix the blank charts. When it does, it appears to lose the original
context configuration (antialias: true, precision: 'highp', alpha: true), leading to
low quality visuals [2].

Googlers, see b/196294346.

[1] https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_lose_context/restoreContext
[2] https://imgur.com/a/kKnhqYJ

Diffbase: #5237
Followup: none

@stephanwlee
Copy link
Contributor

Would you have a material you can share on whether this is happening because of too many contexts? Casual Google search tells me that this can happen when a canvas becomes invisible while first creating/drawing it. Also there was some conversation about the library attempting to recover from the event automatically mrdoob/three.js#5857 (comment). Even with the remediation, it is believable that Three.js will fail to recover when the canvas leaves the viewport but, again, I do not get an impression that this happens because of # of contexts (unless your memory is running out and browser is eagerly removing it).

If this truly happens only when becoming invisible while drawing the first frame, I think we can do away without propagating the information all the way up to scalar_card_component but instead have internal state in line_chart_component.ts and based on the disableUpdate value, we can decide whether to reattempt to bootstrap or not. WDYT?

psybuzz added a commit that referenced this pull request Aug 16, 2021
Each line chart component in the Time Series dashboard gets its own canvas and
webgl rendering context. After scrolling through too many line charts (for my
device after ~30), the page issues warnings in the console like "Too many active
WebGL contexts. Oldest context will be lost".

When the browser decides there are too many contexts, it disposes old ones
causing blank line charts. This change adds an event dispatched by the line chart's
internal `ChartImpl` to notify clients when the rendering context is lost.

This only applies to the WebGL renderer for now.  A followup change addresses the
remediation of blank scalar cards.

Googlers, see b/196294346.
Followup: #5239
@psybuzz psybuzz force-pushed the ts-webgl-restore2 branch from cf07b0b to a2b122d Compare August 16, 2021 21:05
@psybuzz
Copy link
Contributor Author

psybuzz commented Aug 16, 2021

From what I've seen manually, blank charts can be consistently triggered well after the canvas is initially drawn and visible. In the attached screencast, TensorBoard loads the dashboard containing 100s of scalar cards, then after scrolling through about 30-40, we scroll 3 scalar cards into view, and observe 3 "too many contexts" warnings in the console. After scrolling up, we can hide and reshow all the original cards (via entering a filter and removing the tag filter). Doing so produces exactly 3 blank cards, suggesting that for all these cards that were originally drawn with a visible DOM element, the only difference between a blank one and a non-blank one after the repro is the blank ones were the ones 'evicted' by the browser.

Based on that, I think it's safe to say that listening to some 'context lost' signal is necessary. It's unclear to me what the exact max limit on webgl contexts is or whether it's dependent on GPU memory, but this crbug suggests it might be device dependent in the future.

webglContextLossBlankChart.webm.zip

After scrolling through more and more cards, the warning just keep piling on.

image

I think we can do away without propagating the information all the way up to scalar_card_component but instead have internal state in line_chart_component.ts and based on the disableUpdate value, we can decide whether to reattempt to bootstrap or not. WDYT?

It might be possible to re-bootstrap the line chart from within line_chart_component without making scalar_card_component aware of things. However, given how stateful the LineChartComponent is, I opted to do the 'full re-creation' in ScalarCardComponent, as it is much less complexity.

@psybuzz psybuzz requested a review from stephanwlee August 16, 2021 22:17
Copy link
Contributor

@stephanwlee stephanwlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, can we make it so that the context loss issue is contained in the LineChartComponent (and it "blinks" using its own ngIf) instead of propagating it all the way up to the ScalarCardComponent?

Also, if your empirical and crbugs are correct, then depending on screen, this bug will happen no matter what if there are so many charts above the fold, right? Perhaps we should be using half SVG and half Canvas in the future.

@psybuzz psybuzz force-pushed the ts-webgl-restore2 branch from 80613b4 to f0dfe0b Compare August 17, 2021 22:38
@psybuzz psybuzz force-pushed the ts-webgl-restore2 branch from f0dfe0b to 2c3794d Compare August 17, 2021 22:41
@psybuzz
Copy link
Contributor Author

psybuzz commented Aug 17, 2021

Ready for another look!

At the very least, can we make it so that the context loss issue is contained in the LineChartComponent (and it "blinks" using its own ngIf) instead of propagating it all the way up to the ScalarCardComponent?

Done.

Also, if your empirical and crbugs are correct, then depending on screen, this bug will happen no matter what if there are so many charts above the fold, right?

Yes. After hiding and showing cards (e.g. enter the filtered view then clear the filter), the early cards were blank.

Perhaps we should be using half SVG and half Canvas in the future.

Yeah, something to consider in the future. This webgl limitation was quite surprising to me, and I think TensorBoard's use case is fairly unique among webapps.

@psybuzz psybuzz requested a review from stephanwlee August 17, 2021 22:49
private onContextLost() {
// Since context may be lost when the component is hidden or does not need
// updates, the teardown and recreation happens lazily.
this.isRenderingContextLost = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, I thought you'd dispose lineChart here and set it to this.lineChart = null. Would that make the code more readable in anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, yes a bit more expected.

/**
* Minimally and imperatively updates the chart library depending on prop changed.
* Minimally and imperatively updates the chart library depending on prop
* changed. When adding new `lineChart.*()` calls, keep this in sync with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any lineChart.* API invoked in recoverRendererIfNeeded. Do you truly need them to be synced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment to be this.lineChart.set*(). If we added a this.lineChart.setZMode(...) inside updateLineChart one day, we'd need the corresponding flag this.zModeUpdated = true inside recoverRendererIfNeeded so that any changes to initial values of this hypothetical "zMode" would persist after recovery.

Comment on lines 675 to 679
expect(fixture.componentInstance.chart.getLineChartForTest()).toBe(
chartBefore
);
expect(disposeSpy).toHaveBeenCalledTimes(0);
expectChartUpdateSpiesToHaveBeenCalledTimes(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear whether these assertions are truly testing whether the line chart is attempting to recover. I guess assertion of lineChartBefore === lineChartAfter is the one? If so, put a semantic name on the assertion method so we can change the assertion easily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.

Copy link
Contributor Author

@psybuzz psybuzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

private onContextLost() {
// Since context may be lost when the component is hidden or does not need
// updates, the teardown and recreation happens lazily.
this.isRenderingContextLost = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, yes a bit more expected.

/**
* Minimally and imperatively updates the chart library depending on prop changed.
* Minimally and imperatively updates the chart library depending on prop
* changed. When adding new `lineChart.*()` calls, keep this in sync with
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment to be this.lineChart.set*(). If we added a this.lineChart.setZMode(...) inside updateLineChart one day, we'd need the corresponding flag this.zModeUpdated = true inside recoverRendererIfNeeded so that any changes to initial values of this hypothetical "zMode" would persist after recovery.

Comment on lines 675 to 679
expect(fixture.componentInstance.chart.getLineChartForTest()).toBe(
chartBefore
);
expect(disposeSpy).toHaveBeenCalledTimes(0);
expectChartUpdateSpiesToHaveBeenCalledTimes(1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.

@psybuzz psybuzz merged commit 82cfd86 into tensorflow:master Aug 18, 2021
yatbear pushed a commit to yatbear/tensorboard that referenced this pull request Mar 27, 2023
Each line chart component in the Time Series dashboard gets its own canvas and
webgl rendering context. After scrolling through too many line charts (for my
device after ~30), the page issues warnings in the console like "Too many active
WebGL contexts. Oldest context will be lost".

When the browser decides there are too many contexts, it disposes old ones
causing blank line charts. This change adds an event dispatched by the line chart's
internal `ChartImpl` to notify clients when the rendering context is lost.

This only applies to the WebGL renderer for now.  A followup change addresses the
remediation of blank scalar cards.

Googlers, see b/196294346.
Followup: tensorflow#5239
yatbear pushed a commit to yatbear/tensorboard that referenced this pull request Mar 27, 2023
The line chart's internal `ChartImpl` notifies the line chart when it is a WebGL
renderer that loses its context due to the page having too many active WebGL
contexts at once (>30 for my device).

This change adds logic to treat those errors as fatal when the line chart needs
to be updated.  Scalar cards now handle the fatal error by destroying and
recreating the entire line chart component, fixing the blank chart issue.

Manually checked that charts are not blank after scrolling through lots of line
charts (used the tagFilter to view 2x the line charts, since elements are
duplicated).

Alternatives considered: calling restoreContext() [1] on the context does not
consistently fix the blank charts.  When it does, it appears to lose the original
context configuration (antialias: true, precision: 'highp', alpha: true), leading to
low quality visuals [2].

Googlers, see b/196294346.

[1] https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_lose_context/restoreContext
[2] https://imgur.com/a/kKnhqYJ

Diffbase: tensorflow#5237
dna2github pushed a commit to dna2fork/tensorboard that referenced this pull request May 1, 2023
Each line chart component in the Time Series dashboard gets its own canvas and
webgl rendering context. After scrolling through too many line charts (for my
device after ~30), the page issues warnings in the console like "Too many active
WebGL contexts. Oldest context will be lost".

When the browser decides there are too many contexts, it disposes old ones
causing blank line charts. This change adds an event dispatched by the line chart's
internal `ChartImpl` to notify clients when the rendering context is lost.

This only applies to the WebGL renderer for now.  A followup change addresses the
remediation of blank scalar cards.

Googlers, see b/196294346.
Followup: tensorflow#5239
dna2github pushed a commit to dna2fork/tensorboard that referenced this pull request May 1, 2023
The line chart's internal `ChartImpl` notifies the line chart when it is a WebGL
renderer that loses its context due to the page having too many active WebGL
contexts at once (>30 for my device).

This change adds logic to treat those errors as fatal when the line chart needs
to be updated.  Scalar cards now handle the fatal error by destroying and
recreating the entire line chart component, fixing the blank chart issue.

Manually checked that charts are not blank after scrolling through lots of line
charts (used the tagFilter to view 2x the line charts, since elements are
duplicated).

Alternatives considered: calling restoreContext() [1] on the context does not
consistently fix the blank charts.  When it does, it appears to lose the original
context configuration (antialias: true, precision: 'highp', alpha: true), leading to
low quality visuals [2].

Googlers, see b/196294346.

[1] https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_lose_context/restoreContext
[2] https://imgur.com/a/kKnhqYJ

Diffbase: tensorflow#5237
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants