Fix polars cast column image #7800
The Image() type is set to have a storage of Maybe we can convert
@lhoestq thanks for the review. Just to be thorough, I checked the concat example and this seems to work:

```python
import sys
from pathlib import Path

# Import the local checkout (<repo>/src) instead of an installed `datasets`
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

import pandas as pd
import polars as pl
from datasets import Dataset, Image, concatenate_datasets

image_path = "tests/features/data/test_image_rgb.jpg"

# Same image path, once via polars and once via pandas
df_pl = pl.DataFrame({"image": [image_path]})
dset_pl = Dataset.from_polars(df_pl).cast_column("image", Image())

df_pd = pd.DataFrame({"image": [image_path]})
dset_pd = Dataset.from_pandas(df_pd).cast_column("image", Image())

concatenated = concatenate_datasets([dset_pl, dset_pd])
print(concatenated._data)
```

outputs:

```
ConcatenationTable
image: struct<bytes: binary, path: string>
child 0, bytes: binary
child 1, path: string
----
image: [
-- is_valid: all not null
-- child 0 type: binary
[null]
-- child 1 type: string
["tests/features/data/test_image_rgb.jpg"],
-- is_valid: all not null
-- child 0 type: binary
[null]
-- child 1 type: string
["tests/features/data/test_image_rgb.jpg"]]
```

(Not quite sure though if this is really what you meant.) I agree that there could be a lot of problems if we rely on implicit conversion, therefore I updated the PR. I also checked the exception handling locally and it works. I am unsure, though, whether we want to create such large objects in the CI; if desired, I can add a test for that.
Awesome! LGTM :)
This reverts commit aa7f2a9.
Apologies @lhoestq @CloseChoice, I unintentionally reverted this PR earlier. Leaving it as is.
Fixes #7765
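The problem here is that polars uses the pyarrow `large_string` type for string columns such as image paths, while pandas and others use the plain `string` type, so `cast_column(..., Image())` did not work on datasets created with `Dataset.from_polars`. This PR solves that and adds a test.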
Outputs: