-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[DebuggerV2] Display detailed tensor debug-values in graph- and eager-execution components #3541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| tensorDebugMode: TensorDebugMode, | ||
| array: number[] | null | ||
| ): DebugTensorValue { | ||
| switch (+tensorDebugMode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need + part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Removed it.
| */ | ||
| export function parseDebugTensorValue( | ||
| tensorDebugMode: TensorDebugMode, | ||
| array: number[] | null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't read the full code yet so it may be very misinformed: Does it make sense to better type the value? Wherever this value is coming from, it looks like it is possible to use discriminative types instead of coercing the types throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. See my reply to your other comment about the possibility of wrong array length below. After adding the length checks the coercion here is no longer necessary. Removed them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean but at least for some of the types, you can type as such:
interface NoTensor {
mode: TensorDebugMode.NO_TENSOR,
array: null
}
interface CurtHealth {
mode: TensorDebugMode. CURT_HEALTH,
array: [number, number],
}
interface ConciseHealth {
mode: TensorDebugMode. CONCISE_HEALTH,
array: [
number, /* what is this */
number, /* what is this */
number, /* what is this */
number, /* what is this */
number,
],
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I like the extra compile-time type safety it gives us. Done.
| }); | ||
|
|
||
| describe('CONCISE_HEALTH', () => { | ||
| it('all healthy', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these test specs are describing context, not behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a sweeping update all these it names in this file.
| ]); | ||
| expect(debugValue).toEqual({ | ||
| size: 1000, | ||
| numPositiveInfs: 22, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it weird that certain properties are missing. Can we just set them to 0 or does 0 signify something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described in the doc string of this function, this function will omit 0-size categories. The rationale is that the return value is meant to be consumed by the UI, which needs to be concise. For instance, if a tensor has only nans but no infs, we want to show something like
size=100, nan x 30
instead of
size=100, nan x 30, -inf x 0, +inf x 0,
which is space-consuming and distracting. I expanded the comment a little to clarify that.
| * (e.g., counts of -inf, +inf and nan), the corresponding fields | ||
| * in the returned object will be defined only if the count is non-zero. | ||
| */ | ||
| export function parseDebugTensorValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
high level question: I find this rather tedious. What happens if we just have hard dependency on jspb or OSS equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little tedious. But this is the only place where this logic needs to exist in tensorboard. The background is that for efficiency, DebugNumericsSummaryV2 ops return a fixed-size 1D tensor for all instrumented tensors. The advantages of this format is that
- It is smaller in size compared to other options such as a proto or serialized proto
- All of such small tensors can be potentially stacked and transferred over device boundaries as a whole in future performance optimizations.
As such, the 1D vector must have a contract in what each number means (given the TensorDebugMode). This function here knows and carries out this contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too convinced about the performance argument. Protobuf, sure, may not be as efficient at representing such values as what you did here but it surely is closer to this than, say, JSON (jspb encodes values as array of arrays; it does not transfer binaries).
I am still okay with this because jspb dependency (closure version) on FE is quite not pleasant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
| </div> | ||
| <div | ||
| *ngIf="numPositiveFinites !== undefined && numPositiveFinites > 0" | ||
| z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| styles: [ | ||
| ` | ||
| :host { | ||
| display: inline-block; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is wrong to have block element (flex) inside inline-block. Just make the host an inline-flex instead? and remove div.flexbox
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised the CSS here and in related places.
| expect(tagElements.length).toEqual(3); | ||
| expect(countElements.length).toEqual(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toEqual -> toBe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Fixes all instances like this in this file.
| class="output-slot-container" | ||
| > | ||
| <div class="output-slot-number"> | ||
| Output slot {{ i }}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i + 1? or do you want it to read Output slot 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationale is that tensorflow's tf.Tensor object has 0-based slot indices in their names like unique:0 and unique:1. See: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/ops.py#L462. We want to be consistent with that. Added a comment here.
| width: 100%; | ||
| } | ||
|
|
||
| .tensor-debug-info { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any components using this in this PR but please always think about how inline element cannot have block element inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed these two unused classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
| class="output-slot-container" | ||
| > | ||
| <div class="output-slot-number"> | ||
| Output slot {{ i }}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationale is that tensorflow's tf.Tensor object has 0-based slot indices in their names like unique:0 and unique:1. See: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/ops.py#L462. We want to be consistent with that. Added a comment here.
| expect(tagElements.length).toEqual(3); | ||
| expect(countElements.length).toEqual(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Fixes all instances like this in this file.
| </div> | ||
| <div | ||
| *ngIf="numPositiveFinites !== undefined && numPositiveFinites > 0" | ||
| z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| }); | ||
|
|
||
| it('has pos inf', () => { | ||
| const debugValue = parseDebugTensorValue(TensorDebugMode.CONCISE_HEALTH, [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good question. I should add a check for the array length for each switch case in parseDebugTensroValue(). Done. And done adding unit tests for those error conditions.
| ]); | ||
| expect(debugValue).toEqual({ | ||
| size: 1000, | ||
| numPositiveInfs: 22, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described in the doc string of this function, this function will omit 0-size categories. The rationale is that the return value is meant to be consumed by the UI, which needs to be concise. For instance, if a tensor has only nans but no infs, we want to show something like
size=100, nan x 30
instead of
size=100, nan x 30, -inf x 0, +inf x 0,
which is space-consuming and distracting. I expanded the comment a little to clarify that.
| * (e.g., counts of -inf, +inf and nan), the corresponding fields | ||
| * in the returned object will be defined only if the count is non-zero. | ||
| */ | ||
| export function parseDebugTensorValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little tedious. But this is the only place where this logic needs to exist in tensorboard. The background is that for efficiency, DebugNumericsSummaryV2 ops return a fixed-size 1D tensor for all instrumented tensors. The advantages of this format is that
- It is smaller in size compared to other options such as a proto or serialized proto
- All of such small tensors can be potentially stacked and transferred over device boundaries as a whole in future performance optimizations.
As such, the 1D vector must have a contract in what each number means (given the TensorDebugMode). This function here knows and carries out this contract.
| tensorDebugMode: TensorDebugMode, | ||
| array: number[] | null | ||
| ): DebugTensorValue { | ||
| switch (+tensorDebugMode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Removed it.
| */ | ||
| export function parseDebugTensorValue( | ||
| tensorDebugMode: TensorDebugMode, | ||
| array: number[] | null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. See my reply to your other comment about the possibility of wrong array length below. After adding the length checks the coercion here is no longer necessary. Removed them.
| width: 100%; | ||
| } | ||
|
|
||
| .tensor-debug-info { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed these two unused classes.
| styles: [ | ||
| ` | ||
| :host { | ||
| display: inline-block; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised the CSS here and in related places.
| * (e.g., counts of -inf, +inf and nan), the corresponding fields | ||
| * in the returned object will be defined only if the count is non-zero. | ||
| */ | ||
| export function parseDebugTensorValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too convinced about the performance argument. Protobuf, sure, may not be as efficient at representing such values as what you did here but it surely is closer to this than, say, JSON (jspb encodes values as array of arrays; it does not transfer binaries).
I am still okay with this because jspb dependency (closure version) on FE is quite not pleasant.
| */ | ||
| export function parseDebugTensorValue( | ||
| tensorDebugMode: TensorDebugMode, | ||
| array: number[] | null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean but at least for some of the types, you can type as such:
interface NoTensor {
mode: TensorDebugMode.NO_TENSOR,
array: null
}
interface CurtHealth {
mode: TensorDebugMode. CURT_HEALTH,
array: [number, number],
}
interface ConciseHealth {
mode: TensorDebugMode. CONCISE_HEALTH,
array: [
number, /* what is this */
number, /* what is this */
number, /* what is this */
number, /* what is this */
number,
],
}…-execution components (tensorflow#3541) * Motivation for features / changes * Let the relatively new `GraphExecutionComponent` display details information about debugger-instrumented tensors, such as dtype, rank, size, shape, breakdown of numerical values by type (inf, nan, zero, etc.), and whether inf/nan exists. * In the older `ExecutionDataComponent` for eager tensors, replace the old Angular code that serves a similar purpose with the same Component as used by `GraphExecutionComponent`. This improves unity and reduces maintenance overhead going forward. * Technical description of changes * Add type `DebugTensorValue` to store/debugger_types.ts. The new interface defines the possible types of data available from various `TensorDebugMode`s (an existing enum). * Add helper function `parseDebugTensorValue()` in a new file store/debug_tensor_value.ts to parse an array representation of TensorDebugMode-specific debugger-generated tensor data into a `DebugTensorValue`. * Add `DebugTensorValueComponent` and its various subcomponents in new folder views/debug-_tensor_value/ * Use `DebugTensorValueComponent` from `GraphExecutionComponent` and `ExecutionDataComponent`. * Screenshots of UI changes * CURT_HEALTH mode:  * CONCISE_HEALTH mode:  * SHAPE mode:  * FULL_HEALTH mode:  * Detailed steps to verify changes work correctly (as executed by you) * Unit tests are added for the new helper method and new component and its subcomponents * Manual verification against logdirs with real tfdbg2 data.
…-execution components (#3541) * Motivation for features / changes * Let the relatively new `GraphExecutionComponent` display details information about debugger-instrumented tensors, such as dtype, rank, size, shape, breakdown of numerical values by type (inf, nan, zero, etc.), and whether inf/nan exists. * In the older `ExecutionDataComponent` for eager tensors, replace the old Angular code that serves a similar purpose with the same Component as used by `GraphExecutionComponent`. This improves unity and reduces maintenance overhead going forward. * Technical description of changes * Add type `DebugTensorValue` to store/debugger_types.ts. The new interface defines the possible types of data available from various `TensorDebugMode`s (an existing enum). * Add helper function `parseDebugTensorValue()` in a new file store/debug_tensor_value.ts to parse an array representation of TensorDebugMode-specific debugger-generated tensor data into a `DebugTensorValue`. * Add `DebugTensorValueComponent` and its various subcomponents in new folder views/debug-_tensor_value/ * Use `DebugTensorValueComponent` from `GraphExecutionComponent` and `ExecutionDataComponent`. * Screenshots of UI changes * CURT_HEALTH mode:  * CONCISE_HEALTH mode:  * SHAPE mode:  * FULL_HEALTH mode:  * Detailed steps to verify changes work correctly (as executed by you) * Unit tests are added for the new helper method and new component and its subcomponents * Manual verification against logdirs with real tfdbg2 data.
GraphExecutionComponentdisplay details information about debugger-instrumented tensors, such as dtype, rank, size, shape, breakdown of numerical values by type (inf, nan, zero, etc.), and whether inf/nan exists.ExecutionDataComponentfor eager tensors, replace the old Angular code that serves a similar purpose with the same Component as used byGraphExecutionComponent. This improves unity and reduces maintenance overhead going forward.DebugTensorValueto store/debugger_types.ts. The new interface defines the possible types of data available from variousTensorDebugModes (an existing enum).parseDebugTensorValue()in a new file store/debug_tensor_value.ts to parse an array representation of TensorDebugMode-specific debugger-generated tensor data into aDebugTensorValue.DebugTensorValueComponentand its various subcomponents in new folder views/debug-_tensor_value/DebugTensorValueComponentfromGraphExecutionComponentandExecutionDataComponent.