Fix small bugs with async map by lhoestq · Pull Request #7445 · huggingface/datasets · GitHub

Conversation

@lhoestq
Member

@lhoestq lhoestq commented Mar 11, 2025

helpful for the next PR to enable parallel image/audio/video decoding and make multimodal datasets go brr (e.g. for lerobot)

  • fix with_indices
  • fix resuming with state_dict() / load_state_dict() - omg that wasn't easy
  • remove unnecessary decoding in map() to enable parallelism in FormattedExampleIterable later

small bonus: keeping features in batch()

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq lhoestq merged commit f09db01 into main Mar 13, 2025
12 of 15 checks passed
@lhoestq lhoestq deleted the fix-async-map-resuming branch March 13, 2025 10:38


2 participants