Aggregate numeric values when slicing (What-If Tool) #2678

grovina · 2019-09-19T12:19:02Z

Motivation for features / changes

When slicing by a numeric feature, aggregate values in intervals instead of creating a slice for each different value.

Technical description of changes

The current implementation creates one slice for each different value of the feature. This might result in a huge number of slices that become unmanageable (imagine a slicing by features like "age" or "salary", which can have hundreds of unique values).
To work around this issue, we aggregate the values into buckets (like a histogram), then show aggregated statistics and performance results for each bucket.
Nothing changes for categorical features.

Screenshots of UI changes

Detailed steps to verify changes work correctly (as executed by you)

For both multi and age demos, wait for data to load and then:

Open the Performance tab
Slice by different features and check that:
- Categorical features behave just like before (compare with demos in prod at https://pair-code.github.io/what-if-tool)
- Numeric features with up to 10 unique values also remain the same
- Numeric features with more than 10 unique values are aggregated in buckets, and results are coherent with the correct totals.

Alternate designs / implementations considered

I opted for the equally-sized intervals (as opposed to the equally-populated ones, for example), mainly because they should be simpler to understand and because one of comparative columns is precisely the number of examples in each slice. The number of slices is controlled by a variable that could become a setting if needed.
As for notations, I also considered other a simple a — b, but the mathematical intervals are particularly precise and seem to be accessible enough.
Finally, generating the bucket edges beforehand was important to make the values look ok, since we would oftentimes get things like [17.43333333333, 20] printed otherwise. I tried to simply calculate them ad hoc, but having consistent and simple display turned out to be more complex than expected.

cc @jameswex and @tolga-b

This is a new boolean field `isNumeric` in `distanceStats_` to allow us to check is a feature is numeric instead of looking for the existance of `.stdDev`.

Each slice of the dataset is identified by a key. Currently, when slicing by a feature, each value becomes a slice and therefore a key. To facilitate how we handle with generating slices, and particularly to allow us to group multiple values into the same key (buckets for numerical features), we wrap the slice key generation in a separate function. This commit doesn't change any aggregating behaviour but makes way to the next steps.

We iterate over the dataset using the slice key function to attribute a key to each example. This avoids the usage of the `undefined` dummy keys, since all cases will be naturally covered by seeing all examples. Since we are now using a single function to handle slice keys throughout the code, we can change the key attribution logics in a consistent way.

We create buckets for each aggregated feature, so that we can associate each slice of that feature with an interval instead of a value. The getSliceKey_ function now checks if the feature has bucket to be aggregated and returns the slice key of the appropriate interval in the positive case. It uses the value of the feature as key otherwise (just as before).

`numBuckets` controls in how many buckets the range of values for a feature should be divided. This number is also used to avoid aggregating in buckets when we have less unique values than buckets.

When slicing by numeric features aggregated in buckets, displaying the buckets in order is a particularly useful feature. Alphabetical ordering is not enough for this purpose: we need to parse the number in the interval and order it by its value. Also, let's make this ordering the default one when aggregating.

jameswex

thanks for this! really excited for this feature. some minor comments

jameswex · 2019-09-19T13:34:40Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

+          numBuckets: {
+            type: Number,
+            value: 10,
+          },


can you add this as a numeric paper-input element below the slice by dropdown (when a numeric feature is selected) with a range of 2-10 and default value of 2? can use a <template is="dom-if...> wrapping the element so it only appears in that case. same thing with another dropdown under the secondary slice if that is numeric too. then ensure the buckets are recalculated whenever numPrimaryBuckets or numSecondaryBuckets changes.

jameswex · 2019-09-19T13:35:05Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

            type: Boolean,
            value: false,
          },
+          featureBucketEdges_: Object,


add short comment on what it is

jameswex · 2019-09-19T13:38:13Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

-          const feature1 = this.selectedBreakdownFeature;
-          if (feature1 == '') {
-            this.selectedSecondBreakdownFeature = '';
-          }


probably still want this logic here, which clears out the secondary slice selection when the primary slice selection is cleared. otherwise if you have both slices set, then clear the primary, the view goes back to showing only overall stats, but then when you select a slice again, the secondary slice is already selected as whatever it was last and you don't just see the primary slice you selected.

Absolutely. Thanks for catching this!

jameswex · 2019-09-19T13:42:04Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

-                  }
-                }
+          this.visdata.forEach(
+            function(item) {


use arrow notation to simplify this, remove need for .bind(this) :
this.visdata.forEach(item => { blah blah.... return blah;});

jameswex · 2019-09-19T13:45:25Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

+            if (edges) {
+              for (let i = 1; i < edges.length; i++) {
+                if (
+                  example[feature] < edges[i] ||


how does this handle when an example is missing the feature value for the selected feature?

When example[feature] is undefined, this condition is false and we end up returning undefined (after this loop). This results in a key called "undefined" where all these values are grouped.
I think I can make it more explicit and human-friendly, though, by treating it separately.

jameswex · 2019-09-19T13:46:50Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

+            }.bind(this)
+          );
+          const thresholds = Object.values(thresholdsMap);
+          this.set('selectedFeatureSort', 'Alphabetical');


why reset the sort order when slices change? i can imagine wanting to keep viewing slices by something like false positive rate as i change slicing criteria.

Sure. It makes sense.

jameswex · 2019-09-19T13:47:50Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

+         * Calculate edges between buckets for aggregating numeric features.
+         * We do this beforehand to round numbers and avoid ugly interval labels.
+         */
+        calculateFeatureBucketEdges_: function() {


* Add explanation to `featureBucketEdges_`. * Treat unselection of slicing feature. * Use arrow function to simplify `forEach` using outer `this`.

When slicing by a feature, group missing values in a '?' key.

When slicing by a numeric feature, values can be aggregated to buckets. We're adding an input field to control in how many buckets the values will be aggregated. This only appears for numeric features.

grovina

There you go, James :)

grovina · 2019-09-19T14:03:28Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

-          const feature1 = this.selectedBreakdownFeature;
-          if (feature1 == '') {
-            this.selectedSecondBreakdownFeature = '';
-          }


Absolutely. Thanks for catching this!

grovina · 2019-09-19T14:04:04Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

            type: Boolean,
            value: false,
          },
+          featureBucketEdges_: Object,


grovina · 2019-09-19T14:06:35Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

-                  }
-                }
+          this.visdata.forEach(
+            function(item) {


grovina · 2019-09-19T14:07:36Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

+            }.bind(this)
+          );
+          const thresholds = Object.values(thresholdsMap);
+          this.set('selectedFeatureSort', 'Alphabetical');


Sure. It makes sense.

grovina · 2019-09-19T14:07:54Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

+         * Calculate edges between buckets for aggregating numeric features.
+         * We do this beforehand to round numbers and avoid ugly interval labels.
+         */
+        calculateFeatureBucketEdges_: function() {


grovina · 2019-09-19T14:12:21Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

+            if (edges) {
+              for (let i = 1; i < edges.length; i++) {
+                if (
+                  example[feature] < edges[i] ||


When example[feature] is undefined, this condition is false and we end up returning undefined (after this loop). This results in a key called "undefined" where all these values are grouped.
I think I can make it more explicit and human-friendly, though, by treating it separately.

grovina · 2019-09-19T15:39:19Z

...ractive_inference/tf_interactive_inference_dashboard/tf-interactive-inference-dashboard.html

+          numBuckets: {
+            type: Number,
+            value: 10,
+          },


jameswex · 2019-09-19T16:38:00Z

Thanks. Only issue is that the you don't necessary end up with equally-sized intervals when slicing by a numeric feature. Slice by age with 4 buckets and some buckets are width 10 and some width 20 (and the initial one is width 13 due to the min age of 17. Can the logic be adjusted so they all end up of width 10 or 20? Or we would need to document this in the tool to avoid confusion

Make it slightly more precise than the order of magnitude of the standard deviation for the values of that feature.

grovina · 2019-09-19T16:58:17Z

Thanks. Only issue is that the you don't necessary end up with equally-sized intervals when slicing by a numeric feature. Slice by age with 4 buckets and some buckets are width 10 and some width 20 (and the initial one is width 13 due to the min age of 17. Can the logic be adjusted so they all end up of width 10 or 20? Or we would need to document this in the tool to avoid confusion

Hey James, this is purely due to rounding the edges of the buckets. Increasing precision by 1 order of magnitude should solve it:

Note that the 3rd bucket ([49, 64)) has width 15, and the others 16: that's pretty much as good as it gets for reasonably rounded numbers. What that rounding does is to make sure these oscillations are smaller than the order of magnitude of the std. deviation (actually 1 order below it after that last commit).

jameswex · 2019-09-19T17:01:58Z

@nfelt please review or assign for review

jameswex · 2019-09-20T18:23:49Z

@stephanwlee please review or assign for review, thx

nfelt · 2019-09-23T18:49:50Z

@jameswex is there a reason this needs an additional review? I think your approval is sufficient to merge this since it only affects WIT code.

jameswex · 2019-09-23T19:58:20Z

@jameswex is there a reason this needs an additional review? I think your approval is sufficient to merge this since it only affects WIT code.

Historically we've given TB a chance to review our code before we commit to the repo, but I'm ok merging this with just my approval if you are.

nfelt · 2019-09-23T20:26:41Z

Fair enough - I appreciate the notice! But I think going forward it's fine to merge after your review for PRs that only affect the WIT plugin codebase. Of course if there are bigger structural changes or cases where you want specific reviewer expertise you're still welcome to request review, but don't feel like you need to be blocked on a TB core team reviewer for changes like this one.

grovina added 7 commits September 17, 2019 10:18

Store whether feature is numeric in distance stats

62a8e59

This is a new boolean field `isNumeric` in `distanceStats_` to allow us to check is a feature is numeric instead of looking for the existance of `.stdDev`.

Add auxiliar function to deep clone objects

937bb0e

Add variable to configure slicing aggregation

0b1b185

`numBuckets` controls in how many buckets the range of values for a feature should be divided. This number is also used to avoid aggregating in buckets when we have less unique values than buckets.

googlebot added the cla: yes label Sep 19, 2019

Merge branch 'master' into grovina-wit-aggregate-numeric

9e051f4

jameswex reviewed Sep 19, 2019

View reviewed changes

grovina added 3 commits September 19, 2019 15:15

Address simple PR comments

d9c5139

* Add explanation to `featureBucketEdges_`. * Treat unselection of slicing feature. * Use arrow function to simplify `forEach` using outer `this`.

Handle missing values more explicitly when slicing

2f09050

When slicing by a feature, group missing values in a '?' key.

Add inputs to control number of slicing buckets

0b4f860

When slicing by a numeric feature, values can be aggregated to buckets. We're adding an input field to control in how many buckets the values will be aggregated. This only appears for numeric features.

grovina commented Sep 19, 2019

View reviewed changes

Increase precision in bucket edges rounding

aa86a13

Make it slightly more precise than the order of magnitude of the standard deviation for the values of that feature.

jameswex approved these changes Sep 19, 2019

View reviewed changes

jameswex requested a review from nfelt September 19, 2019 17:01

jameswex requested a review from stephanwlee September 23, 2019 14:06

stephanwlee requested review from psybuzz and removed request for stephanwlee September 23, 2019 15:04

Merge branch 'master' into grovina-wit-aggregate-numeric

5e075b6

jameswex approved these changes Sep 24, 2019

View reviewed changes

jameswex merged commit 8ecff39 into tensorflow:master Sep 24, 2019

Aggregate numeric values when slicing (What-If Tool) #2678

Aggregate numeric values when slicing (What-If Tool) #2678

Uh oh!

Conversation

grovina commented Sep 19, 2019

Uh oh!

jameswex left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

grovina left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jameswex commented Sep 19, 2019

Uh oh!

grovina commented Sep 19, 2019

Uh oh!

jameswex commented Sep 19, 2019

Uh oh!

jameswex commented Sep 20, 2019

Uh oh!

nfelt commented Sep 23, 2019

Uh oh!

jameswex commented Sep 23, 2019

Uh oh!

nfelt commented Sep 23, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants