
ValueError encountered when to_dataframe returns empty resultset with JSON field #1580

@jdub55

Description



Environment details

(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ python --version
Python 3.9.0
(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ pip --version
pip 23.1.2 from /opt/conda/envs/bigquery_reproduce/lib/python3.9/site-packages/pip (python 3.9)
(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ pip show google-cloud-bigquery
Name: google-cloud-bigquery
Version: 3.11.0
Summary: Google BigQuery API client library
Home-page: https://github.com/googleapis/python-bigquery
Author-email: googleapis-packages@google.com
Location: /opt/conda/envs/bigquery_reproduce/lib/python3.9/site-packages
Requires: google-api-core, google-cloud-core, google-resumable-media, grpcio, packaging, proto-plus, protobuf, python-dateutil, requests
(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ pip show google-cloud-bigquery-storage
Name: google-cloud-bigquery-storage
Version: 2.20.0
Summary: Google Cloud Bigquery Storage API client library
Home-page: https://github.com/googleapis/python-bigquery-storage
Author-email: googleapis-packages@google.com
License: Apache 2.0
Location: /opt/conda/envs/bigquery_reproduce/lib/python3.9/site-packages
Requires: google-api-core, proto-plus, protobuf

(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ pip list
Package                       Version
----------------------------- --------
cachetools                    5.3.1
certifi                       2023.5.7
charset-normalizer            3.1.0
db-dtypes                     1.1.1
google-api-core               2.11.0
google-auth                   2.19.1
google-cloud-bigquery         3.11.0
google-cloud-bigquery-storage 2.20.0
google-cloud-core             2.3.2
google-crc32c                 1.5.0
google-resumable-media        2.5.0
googleapis-common-protos      1.59.0
grpcio                        1.55.0
grpcio-status                 1.55.0
idna                          3.4
numpy                         1.24.3
packaging                     23.1
pandas                        2.0.2
pip                           23.1.2
proto-plus                    1.22.2
protobuf                      4.23.2
pyarrow                       12.0.0
pyasn1                        0.5.0
pyasn1-modules                0.3.0
python-dateutil               2.8.2
pytz                          2023.3
requests                      2.31.0
rsa                           4.9
setuptools                    67.7.2
six                           1.16.0
tzdata                        2023.3
urllib3                       1.26.16
wheel                         0.40.0

Steps to reproduce

When BigQuery returns an empty result set, the Python SDK needs to build the Arrow schema from the BigQuery schema (https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/table.py#L1844-L1853), so it executes bq_to_arrow_data_type (https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/_pandas_helpers.py#L225-L246).

It looks like there is currently no JSON entry in that type mapping:
https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/_pandas_helpers.py#L147-L162
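
A minimal workaround sketch, assuming the private BQ_TO_ARROW_SCALARS dictionary at the lines linked above is where the scalar type mapping lives (it is in the versions reported here, but it is internal API and may move or be renamed in other releases): registering a string type for JSON before calling to_dataframe() lets the schema conversion succeed, since BigQuery serializes JSON values as strings.

import pyarrow
from google.cloud.bigquery import _pandas_helpers

# Workaround sketch relying on a private attribute: map the JSON scalar
# type to pyarrow.string (the type factory, not an instance, matching the
# other entries in the mapping).
_pandas_helpers.BQ_TO_ARROW_SCALARS["JSON"] = pyarrow.string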

Code example

Error:

/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py:267: UserWarning: Unable to determine type for field 'json'.
  warnings.warn("Unable to determine type for field '{}'.".format(bq_field.name))
Traceback (most recent call last):
  File "/home/jupyter/smapi/test_bq.py", line 12, in <module>
    df = bqexecutor.execute_query(sql)
  File "/home/jupyter/smapi/smapi/core/data_providers/sql/bq_executor.py", line 119, in execute_query
    querydf = rows.to_dataframe()
  File "/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/table.py", line 2151, in to_dataframe
    progress_bar_type=progress_bar_type,
  File "/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/table.py", line 1840, in to_arrow
    return pyarrow.Table.from_batches(record_batches)
  File "pyarrow/table.pxi", line 3747, in pyarrow.lib.Table.from_batches
ValueError: Must pass schema, or at least one RecordBatch

Python version: 3.9.0

Package versions:

google-cloud-bigquery 3.10.0
google-cloud-bigquery-storage 2.19.1
pyarrow 8.0.0
pandas 1.5.3
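
The ValueError itself is raised by pyarrow, not by the BigQuery client: bq_to_arrow_schema returns None when any field type cannot be determined (hence the UserWarning above), so pyarrow.Table.from_batches ends up called with zero record batches and no usable schema, a combination it rejects. A small sketch reproducing that behavior with pyarrow alone (field1 is just an illustrative column name):

import pyarrow as pa

# No batches and no schema: pyarrow has nothing to infer a schema from.
try:
    pa.Table.from_batches([])
except ValueError as exc:
    print(exc)  # Must pass schema, or at least one RecordBatch

# With an explicit schema, the empty case is well-defined.
empty = pa.Table.from_batches([], schema=pa.schema([("field1", pa.string())]))
print(empty.num_rows)  # 0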

How to reproduce?

test_bq.py:

from google.cloud import bigquery

client = bigquery.Client()

table_id = "gdw-dev-gdml.abehsu.reproduce_issue"

# Define schema
schema = [
    bigquery.SchemaField("field1", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("field2", "JSON", mode="REQUIRED"),
]

table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)

# Perform a query.
QUERY = f"""
select * from {table_id}
"""

query_job = client.query(QUERY)  # API request
df = query_job.to_dataframe()

print(df)

python test_bq.py

Created table gdw-dev-gdml.abehsu.reproduce_issue

Stack trace

/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py:267: UserWarning: Unable to determine type for field 'field2'.
  warnings.warn("Unable to determine type for field '{}'.".format(bq_field.name))
Traceback (most recent call last):
  File "/home/jupyter/smapi/test_bq.py", line 41, in <module>
    df = query_job.to_dataframe()
  File "/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1800, in to_dataframe
    return query_result.to_dataframe(
  File "/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/table.py", line 2150, in to_dataframe
    record_batch = self.to_arrow(
  File "/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/table.py", line 1848, in to_arrow
    return pyarrow.Table.from_batches(record_batches, schema=arrow_schema)
  File "pyarrow/table.pxi", line 3747, in pyarrow.lib.Table.from_batches
ValueError: Must pass schema, or at least one RecordBatch
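
Until the type mapping gains a JSON entry, a caller-side fallback is one way around this. fallback_to_dataframe below is a hypothetical helper sketched for this report, not part of the library; it assumes an empty DataFrame with the right column names is an acceptable result for the empty case:

import pandas as pd

def fallback_to_dataframe(query_job):
    # Hypothetical helper: if the Arrow conversion fails (e.g. an empty
    # result set containing a JSON field), build an empty DataFrame from
    # the column names in the result schema instead.
    rows = query_job.result()
    try:
        return rows.to_dataframe()
    except ValueError:
        return pd.DataFrame(columns=[field.name for field in rows.schema])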



Labels

api: bigquery (issues related to the googleapis/python-bigquery API)
priority: p3 (desirable enhancement or fix; may not be included in next release)
type: feature request ('nice-to-have' improvement, new feature, or different behavior or design)
