feat: Provide option to have multiple null markers · Issue #830 · googleapis/python-bigquery · GitHub
feat: Provide option to have multiple null markers #830

@dshinzie

Description

Currently, load jobs accept only a single string value in the null_marker param. It would be great if this could take an array of strings, since a file may contain several string representations that should be loaded as null, e.g.

LoadJobConfig(
    schema=schema,
    skip_leading_rows=1,
    source_format=SourceFormat.CSV,
    write_disposition=WriteDisposition.WRITE_TRUNCATE,
    null_marker=['NA', '', '#NA', 'na']
)

An alternative that many have proposed is to load all your data into one table, then run a process that "cleans" it and loads the correct values into a second, final table. This works but takes extra time. Supporting multiple null markers seems like a simple way to solve problems with loading CSVs into BigQuery tables that have specific schemas (e.g. a field with a numeric data type).
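
Until something like this is supported, one possible client-side workaround is to rewrite every null representation to a single value before starting the load job. The sketch below is only an illustration using the Python standard library; `normalize_nulls`, `NULL_MARKERS`, and the sample data are hypothetical names and values, not part of python-bigquery.

```python
import csv
import io

# Hypothetical set of strings to treat as null (taken from the example above).
NULL_MARKERS = {"NA", "", "#NA", "na"}


def normalize_nulls(src, dst, null_markers=NULL_MARKERS, replacement=""):
    """Copy CSV rows from src to dst, rewriting each null marker to `replacement`.

    Note: this checks every cell, including header cells, and does not
    distinguish quoted from unquoted values.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [replacement if cell in null_markers else cell for cell in row]
        )


# Small in-memory example; a real file would be opened with newline="".
raw = io.StringIO("id,score\n1,NA\n2,#NA\n3,4.5\n")
cleaned = io.StringIO()
normalize_nulls(raw, cleaned)
print(cleaned.getvalue())
```

The cleaned file could then be loaded with the existing single-string null_marker set to the same value as `replacement`. This avoids the second table but adds a pass over the file on the client, so it may not suit very large inputs.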

Metadata

Labels

api: bigquery (Issues related to the googleapis/python-bigquery API.)
external (This issue is blocked on a bug with the actual product.)
status: blocked (Resolving the issue is dependent on other work.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)