feat: Add dtype parameters to to_geodataframe functions by chalmerlowe · Pull Request #2176 · googleapis/python-bigquery · GitHub

@chalmerlowe chalmerlowe commented May 12, 2025

This change adds support for bool_dtype, int_dtype, float_dtype, and string_dtype parameters to the to_geodataframe method in RowIterator and QueryJob.

These parameters allow you to specify the desired pandas dtypes for boolean, integer, float, and string columns when converting BigQuery results to GeoDataFrames.

The changes include:

  • Updating RowIterator.to_geodataframe to accept and pass these dtype parameters to the underlying to_dataframe method.
  • Updating QueryJob.to_geodataframe to accept and pass these dtype parameters to the underlying RowIterator.to_geodataframe method.
  • Adding unit tests to verify the correct handling of these parameters.
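The forwarding described in the list above can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `SketchRowIterator` and `SketchQueryJob` are hypothetical stand-ins, the `None` defaults are an assumption, and the stand-in `to_dataframe` returns a plain dict instead of building a pandas DataFrame so the snippet runs without any dependencies.

```python
from typing import Any, Dict, Optional


class SketchRowIterator:
    """Hypothetical stand-in for RowIterator (not the library's code)."""

    def to_dataframe(
        self,
        bool_dtype: Optional[Any] = None,
        int_dtype: Optional[Any] = None,
        float_dtype: Optional[Any] = None,
        string_dtype: Optional[Any] = None,
    ) -> Dict[str, Any]:
        # The real method builds a pandas DataFrame. This stand-in only
        # records which dtype arguments reached it.
        return {
            "bool_dtype": bool_dtype,
            "int_dtype": int_dtype,
            "float_dtype": float_dtype,
            "string_dtype": string_dtype,
        }

    def to_geodataframe(
        self,
        geography_column: Optional[str] = None,
        bool_dtype: Optional[Any] = None,
        int_dtype: Optional[Any] = None,
        float_dtype: Optional[Any] = None,
        string_dtype: Optional[Any] = None,
    ) -> Dict[str, Any]:
        # Forward every dtype argument unchanged to to_dataframe.
        frame = self.to_dataframe(
            bool_dtype=bool_dtype,
            int_dtype=int_dtype,
            float_dtype=float_dtype,
            string_dtype=string_dtype,
        )
        frame["geography_column"] = geography_column
        return frame


class SketchQueryJob:
    """Hypothetical stand-in for QueryJob (not the library's code)."""

    def __init__(self, rows: SketchRowIterator) -> None:
        self._rows = rows

    def to_geodataframe(
        self,
        geography_column: Optional[str] = None,
        bool_dtype: Optional[Any] = None,
        int_dtype: Optional[Any] = None,
        float_dtype: Optional[Any] = None,
        string_dtype: Optional[Any] = None,
    ) -> Dict[str, Any]:
        # Delegate to the row iterator, passing the dtype arguments through.
        return self._rows.to_geodataframe(
            geography_column=geography_column,
            bool_dtype=bool_dtype,
            int_dtype=int_dtype,
            float_dtype=float_dtype,
            string_dtype=string_dtype,
        )


# The dtype values chosen at the job level arrive intact at to_dataframe.
result = SketchQueryJob(SketchRowIterator()).to_geodataframe(
    geography_column="geog", int_dtype="Int64", string_dtype="string"
)
```

Naming each parameter explicitly at every layer, rather than forwarding `**kwargs`, keeps the accepted arguments visible in the signature and the generated docs; whether the PR makes the same choice is not stated here.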

Similar to #1529
Fixes #1902 🦕

@chalmerlowe chalmerlowe requested review from a team as code owners May 12, 2025 11:41
@chalmerlowe chalmerlowe requested a review from Neenu1995 May 12, 2025 11:41
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label May 12, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label May 12, 2025
@chalmerlowe chalmerlowe assigned tswast and unassigned chelsea-lin May 14, 2025

# autodoc/autosummary flags
autoclass_content = "both"
autodoc_default_options = {"members": True, "inherited-members": True}
Contributor

I'm curious about this change. What inherited members were causing problems? IIRC, there are a few methods defined in the base class for jobs that we want to make sure are documented.

Collaborator Author

This change to autodoc_default_options in conf.py was made by Owlbot.
The same goes for the removal of:
"google/cloud/bigquery_v2/**", # Legacy proto-based types.


Are there plans to restore the missing docs, such as reservation and job_timeout_ms on the *JobConfig classes?

"matplotlib == 3.9.2; python_version == '3.9'",
"matplotlib >= 3.10.3; python_version >= '3.10'",
]
tqdm = ["tqdm >= 4.23.4, < 5.0.0"]
Contributor

[No action required] I'm curious. What forced the tqdm upgrade? 4.23.4 is still quite old, so I'm OK with this. I don't think we need to support folks who are stuck in 2016 for 4.7.4.

Collaborator Author

Version 4.7.4 produced an error during unit tests indicating that an attribute was not present (or something similar; sorry, I don't recall the specifics of every error I tried to resolve). When I searched for the error, the identified cause was that older versions of tqdm did not include that attribute, so an upgrade was necessary.

I opted for 4.23.4 because it is the same version we are using in python-bigquery-pandas.

chalmerlowe and others added 2 commits May 14, 2025 12:02
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
@chalmerlowe chalmerlowe merged commit ebfd0a8 into main May 14, 2025
18 checks passed
@chalmerlowe chalmerlowe deleted the feat-geodataframe-dtypes branch May 14, 2025 17:34

Development

Successfully merging this pull request may close these issues.

Support string_dtype, etc. in to_geodataframe

4 participants