fix(webdataset): don't .lower() field_name by YassineYousfi · Pull Request #7726 · huggingface/datasets · GitHub

@YassineYousfi
Contributor

This fixes cases where keys contain uppercase identifiers.

@YassineYousfi YassineYousfi changed the title webdataset: consistent .lower() for keys fix(webdataset): consistent .lower() for keys Aug 5, 2025
@YassineYousfi YassineYousfi changed the title fix(webdataset): consistent .lower() for keys fix(webdataset): don't .lower() for keys Aug 5, 2025
@YassineYousfi YassineYousfi changed the title fix(webdataset): don't .lower() for keys fix(webdataset): don't .lower() field_name Aug 5, 2025
@YassineYousfi
Contributor Author

fixes: #7732

- data_extension = field_name.split(".")[-1]
+ data_extension = field_name.split(".")[-1].lower()
  if data_extension in cls.DECODERS:
      current_example[field_name] = cls.DECODERS[data_extension](current_example[field_name])
Member

We need it lowered to check whether it's in cls.DECODERS, no?

Contributor Author

Yes: in the proposed fix, data_extension is lowercased but field_name is not.

Member

ah yes !
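
For readers following the thread, the behaviour agreed on above can be sketched in isolation. This is a minimal illustration, not the actual `datasets` code: the module-level `DECODERS` dict and the `decode_fields` helper are hypothetical stand-ins for `cls.DECODERS` and the loop that fills `current_example`, and the two decoders are chosen only for the demo.

```python
import json

# Hypothetical stand-in for cls.DECODERS: lowercase extension -> decoder.
DECODERS = {
    "json": json.loads,
    "txt": lambda raw: raw.decode("utf-8"),
}


def decode_fields(example):
    """Decode known field types in place, keeping each field name's case."""
    for field_name in list(example):
        # Lowercase only the extension used for the lookup, not the key itself.
        data_extension = field_name.split(".")[-1].lower()
        if data_extension in DECODERS:
            example[field_name] = DECODERS[data_extension](example[field_name])
    return example


example = decode_fields({"Caption.TXT": b"a cat", "meta.JSON": b'{"id": 1}'})
print(example)  # {'Caption.TXT': 'a cat', 'meta.JSON': {'id': 1}}
```

The keys come back with their original casing ("Caption.TXT", "meta.JSON"), while the decoder lookup still succeeds because only the extension is lowercased.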

Member

@lhoestq lhoestq left a comment

LGTM :) Can you just run make style before we merge?

This will fix the code formatting for the CI.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq
Member

lhoestq commented Aug 20, 2025

CI failures are unrelated, merging :)

@lhoestq lhoestq merged commit 896616c into huggingface:main Aug 20, 2025
5 of 14 checks passed