KEMBAR78
feat: Update the AccessEntry class with a new condition attribute and unit tests by chalmerlowe · Pull Request #2163 · googleapis/python-bigquery · GitHub
Skip to content

Conversation

@chalmerlowe
Copy link
Collaborator

@chalmerlowe chalmerlowe commented Apr 16, 2025

Updates the AccessEntry class with a new condition attribute and unit tests.

Specifically:

  • Adds a condition attribute to the AccessEntry class
  • Updates the pre-existing Condition class with several dunder methods (eq, hash, etc)
  • Adds unit tests to put the condition setters, getters, etc through their paces
  • Adds unit tests to exercise the new Condition dunder methods

This work is in support of internal bug: b/330869964

@chalmerlowe chalmerlowe requested review from a team as code owners April 16, 2025 19:49
@chalmerlowe chalmerlowe requested a review from suzmue April 16, 2025 19:49
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Apr 16, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Apr 16, 2025
@tswast tswast self-requested a review April 18, 2025 18:36
Comment on lines 534 to 538
# The api_repr for an AccessEntry object is expected to be a dict with
# only a few keys. Two keys that may be present are role and condition.
# Any additional key is going to have one of ~eight different names:
# userByEmail, groupByEmail, domain, dataset, specialGroup, view,
# routine, iamMember
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this. Why are we leaving out some data?

Couldn't we do:

access_entry = cls()
access_entry._properties = resource.copy()
return access_entry

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tswast

Thanks for this question. I am open to other approaches.

This approach is only slightly modified from the existing code. The existing code attempted to account for a peculiarity in how we assign values to the "possible" attributes that an AccessEntry "needs to have" and "might have" based on the way we have defined it.

I tweaked the existing code to maintain backwards compatibility and avoid rewrites in things like existing unittests, where possible.

The api_repr for an AccessEntry object is composed of the following keys (internal link):

  1. ROLE
  2. ONE OF THE FOLLOWING (and as I understand it, only one) key:value pairs (as listed in the proto above):
    a. userByEmail: some_value
    b. groupByEmail: some_value
    c. domain: some_value
    d. specialGroup: some_value
    e. IamMember: some_value
    f. view: some_value
    g. routine: some_value
    h. dataset: some_value
  3. CONDITION

Each of the three items above is stored in the _properties dict (in one form or another, see below).

Depending on how the AccessEntry object gets configured, each of a-h above MAY also be associated with a corresponding attribute (ie access_entry.domain, access_entry.user_by_email) but this is not enforced.

Which element a through h will be provided as part of the api_repr is an unknown.

Despite the presence of one of a through h, to instantiate the AccessEntry class users would need to provide role, entity_type, and entity_id or rely on the defaults. Thus since we received a key a through h there must be a way to translate the key to entity_type and to translate from the value associated with that key to entity_id.

Once the AccessEntry object is created users also have the ability to populate other attributes that align on a one-to-one basis with item a through h by using a setter. (This appears to be included as a convenience feature, does NOT happen automagically during initialization, and is not a strict requirement).

If we receive an api_repr resource we use the logic in .from_api_repr to try and extract the a through h key (without which one we got) to assign it to entity_type and then assign the value to entity_id.

We first pop the "role" if it exists and we pop the "condition" if it exists and the only thing that should be left in the dict is the single element a or b or c, etc. We have no expectation that there will be an additional key:value pair in the dictionary. Thus we do not expect to be dropping any values.

If my interpretation of how this works is wrong, please let me know.

An alternative approach could be to do a key lookup for any key in a set of eight keys, but that seems less resilient than the pop method (ie if a ninth key gets added to the API definition, the current PR still works).

Copy link
Contributor

@tswast tswast Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current approach is still quite brittle. If/when the backend adds additional properties, similar to how condition is being added now, the user will have no way of accessing those, as _properties isn't being saved. This can especially be a problem when doing a round trip of downloading access entries, modifying them and re-uploading them. Losing a property like condition (or whatever new thing comes in future) can have security consequences.

For these reasons, I strongly encourage changing from_api_repr to my proposal

access_entry = cls()
access_entry._properties = resource.copy()
return access_entry

and modify entity_type to look at _properties for keys other than condition and role. In the case of two or more remaining keys, we can perhaps use an allow list of the entity types we know about to account for some new condition-like property in future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tswast PTAL and approve, as appropriate.

I updated the _from_api_repr() method.
And fixed a number of downstream elements that relied upon the output of the original version of that method.

Comment on lines 1225 to 1229
return (
self.expression == other.expression
and self.title == other.title
and self.description == other.description
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could do the _key() trick here too and in __hash__ to avoid a bit of duplication / potential errors if any new properties are added.

exp_resource = {
"role": None,
"dataset": {
"dataset": DatasetReference("my-project", "my_dataset"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not great. What happens when we convert this resource to JSON and send it to the REST API? I would expect that json.dumps or the equivalent would fail on such on object.

Are we missing something in the @dataset.setter? Looks like we need a clause like

        if isinstance(value, DatasetReference):
            value = value.to_api_repr()

@tswast tswast self-requested a review April 29, 2025 13:03
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much Chalmer. This leaves the code base in much better shape than it started. 🏕️

Comment on lines +497 to +499
if self.entity_type:
entity_type = self.entity_type
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: This could use the "walrus operator" to save a call to the entity_type property.

Suggested change
if self.entity_type:
entity_type = self.entity_type
else:
if not (entity_type := self.entity_type):

@chalmerlowe chalmerlowe merged commit 7301667 into main Apr 29, 2025
18 checks passed
@chalmerlowe chalmerlowe deleted the feat-b330869964-update-accessentry-class branch April 29, 2025 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the googleapis/python-bigquery API. size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants