KEMBAR78
uploader: display binary object bytes in `tensorboard dev list` output by caisq · Pull Request #3464 · tensorflow/tensorboard · GitHub
Skip to content

Conversation

@caisq
Copy link
Contributor

@caisq caisq commented Apr 1, 2020

  • Motivation for features / changes
    • A part of fulfilling feature request b/152749189: Displaying number of blob bytes contained by an experiment in the output of tensorboard dev list
  • Technical description of changes
    • Pairing CL/304187797
    • Use the newly added Experiment.total_blob_bytes field.
    • Adjust the ordering of the items slightly: this fits the intuitive hierarchy better IMO
      • Before: Scalars -> Runs -> Tags
      • After: Runs -> Tags -> Scalars -> Blob bytes (if exist)
    • Given the long length of the new label string ("Binary object bytes"), the width of the first column is increased from 12 to 20.
  • Screenshots of UI changes
    • image
    • Note the actual number of bytes in the screenshot above is incorrect. But that's due to an orthogonal issue that @ericdnielsen is fixing.
  • Detailed steps to verify changes work correctly (as executed by you)
    • Tested against a locally-running environment running on the said pairing CL.

@nfelt
Copy link
Contributor

nfelt commented Apr 1, 2020

FWIW I think @GalOshri had not wanted to call these "blobs" publicly, though I don't know exactly how else we would describe this.

@caisq caisq changed the title Uploader: Display blob bytes in tensorboard dev list output uploader: Display blob bytes in tensorboard dev list output Apr 2, 2020
@caisq caisq changed the title uploader: Display blob bytes in tensorboard dev list output uploader: Display binary object bytes in tensorboard dev list output Apr 2, 2020
@caisq caisq requested review from GalOshri and ericdnielsen April 2, 2020 02:17
@caisq caisq marked this pull request as ready for review April 2, 2020 02:17
@caisq
Copy link
Contributor Author

caisq commented Apr 2, 2020

Added @GalOshri to reviewers list for his comments on the wording and appearance of the new screen output.

@caisq caisq requested a review from bileschi April 2, 2020 13:12
@caisq caisq changed the title uploader: Display binary object bytes in tensorboard dev list output uploader: display binary object bytes in tensorboard dev list output Apr 2, 2020
@wchargin wchargin self-requested a review April 2, 2020 19:31
("Tags", str(experiment.num_tags)),
("Scalars", str(experiment.num_scalars)),
]
if experiment.total_blob_bytes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this conditional? Shouldn’t we show that experiments with no
blobs have 0 blob bytes?

If this is intended to account for old servers, it should be handled
with a fieldmask in the response instead, so that “zero” and “unset”
aren’t conflated. But it doesn’t seem terribly important to me that we
treat old servers specially.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote in the PR description:

Blob bytes is shown only if it's non-zero
Rationale: The text output is already pretty verbose, requiring tedious manual scrolling to find an experiment. We want to avoid further increasing the size of the text unnecessarily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it more. I agree it's better to make it unconditional, because the "name" and "description" fields show something even when there is no name or description. It's better to be consistent with that. Done making the change and updating screenshot.

]
if experiment.total_blob_bytes:
data.append(
("Binary object bytes", str(experiment.total_blob_bytes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s avoid whitespace in the field name. As described in #2941, this
format is designed to be easily parseable with something like:

tensorboard dev list |
    awk '$1 == "Id" { id = $2 } $1 == "Scalars" && $2 > 1000 { print id }'

With spaces in "Binary object bytes", the $1 == "Scalars" would have
to be something like ($1 " " $2 " " $3) == "Binary object bytes",
which is pretty ugly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more important that this format be clearly intelligible to humans than that it be machine parseable. If we want to make it easier to parse, we should add some non-space delimiter between the field name and the value (like :) or (my preference), just add an actually standard output format option like JSON, rather than artificially constraining how much information we provide to users.

There was already an internal thread about the naming to try to determine what to call this field (to avoid "blob") and I don't think there is a one-word name that is sufficiently unambiguous since we need to specify that the unit here is bytes rather than the number of blobs or blob sequences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either of those options—an unambiguous delimiter (: is kind of awkward
due to the URLs) or a --json/--output json flag—works for me, too.
Currently (in master), the output is clear to both humans and machines,
so I would prefer not to regress from that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion. I think "Binary object bytes" Is a nice tradeoff balance clarity and conciseness.

I like the --json option better than :. : feels a little awkward as @wchargin mentioned. Do you want me to add the json mode in this PR? It feels like a sufficiently distinct feature that merits a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like "Binary object bytes" is the best option. Can we add an explanation for this in the list command help text?

Adding a JSON option seems like a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @GalOshri . I'll merge the PR as is soon.

I filed b/153232102 to track the proposed JSON output feature. It is sufficiently distinct from the goal of this PR to merit a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wchargin I'm not sure if you want to take a look again at this PR. If you do, please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I’ll take a look at the --json change and then revisit here.
Thanks!

]
if experiment.total_blob_bytes:
data.append(
("Binary object bytes", str(experiment.total_blob_bytes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update fieldmask above to actually request total_blob_bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 642 to 650
response.experiments.add(
experiment_id="789",
name="one",
num_scalars=8,
total_tensor_bytes=1234,
total_blob_bytes=0,
num_runs=1,
num_tags=10,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these test changes for? It looks like all the prod code changes
are in uploader_main.py, which isn’t covered by exporter_test, so I
don’t see why they’re relevant. (All tests pass either with or without
these changes.) I also don’t see why the changes are desirable: the RPC
path in this test invokes StreamExperiments with a fieldmask that does
not include these extra metadata fields, so why change the server to
populate them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the changes to this file.

@caisq
Copy link
Contributor Author

caisq commented Apr 7, 2020

@wcharing #3480 is merged into this PR. Conflicts are resolved. "Binary object bytes" (corresponding to the JSON key "binary_object_bytes") is now added to formatters.py. Unit tests are added in formatter_test.py. Screenshot is updated.

@caisq caisq requested a review from wchargin April 7, 2020 21:10
Copy link
Contributor

@wchargin wchargin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great; thank you!

@caisq caisq merged commit 1ad7b3d into tensorflow:master Apr 7, 2020
bileschi pushed a commit to bileschi/tensorboard that referenced this pull request Apr 15, 2020
tensorflow#3464)

* Motivation for features / changes
  * A part of fulfilling feature request b/152749189: Displaying number of blob bytes contained by an experiment in the output of `tensorboard dev list`
* Technical description of changes
  * Pairing CL/304187797
  * Use the newly added `Experiment.total_blob_bytes` field.
  * Adjust the ordering of the items slightly: this fits the intuitive hierarchy better IMO
    * Before: Scalars -> Runs -> Tags 
    * After: Runs -> Tags -> Scalars -> Blob bytes (if exist)
  * Given the long length of the new label string ("Binary object bytes"), the width of the first column is increased from 12 to 20.
* Screenshots of UI changes
  - ![image](https://user-images.githubusercontent.com/16824702/78719773-7c3cce80-78f2-11ea-8ac5-ed323be974fd.png)
  - Note the actual number of bytes in the screenshot above is incorrect. But that's due to an orthogonal issue that @ericdnielsen is fixing.
* Detailed steps to verify changes work correctly (as executed by you)
  * Tested against a locally-running environment running on the said pairing CL.
bileschi pushed a commit that referenced this pull request Apr 15, 2020
#3464)

* Motivation for features / changes
  * A part of fulfilling feature request b/152749189: Displaying number of blob bytes contained by an experiment in the output of `tensorboard dev list`
* Technical description of changes
  * Pairing CL/304187797
  * Use the newly added `Experiment.total_blob_bytes` field.
  * Adjust the ordering of the items slightly: this fits the intuitive hierarchy better IMO
    * Before: Scalars -> Runs -> Tags 
    * After: Runs -> Tags -> Scalars -> Blob bytes (if exist)
  * Given the long length of the new label string ("Binary object bytes"), the width of the first column is increased from 12 to 20.
* Screenshots of UI changes
  - ![image](https://user-images.githubusercontent.com/16824702/78719773-7c3cce80-78f2-11ea-8ac5-ed323be974fd.png)
  - Note the actual number of bytes in the screenshot above is incorrect. But that's due to an orthogonal issue that @ericdnielsen is fixing.
* Detailed steps to verify changes work correctly (as executed by you)
  * Tested against a locally-running environment running on the said pairing CL.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants