Document the HF_DATASETS_CACHE environment variable in the datasets cache documentation by Harry-Yang0518 · Pull Request #7532 · huggingface/datasets · GitHub


@Harry-Yang0518
Contributor

This pull request updates the Datasets documentation to include the HF_DATASETS_CACHE environment variable. The current documentation only mentions HF_HOME for overriding the default cache directory, but HF_DATASETS_CACHE is also supported and lets users set a custom cache location specifically for datasets stored in Arrow format.

This addition is based on the discussion in #7457, where users noted that this variable was missing from the documentation even though it works. The update adds a new section to cache.mdx that explains how to use HF_DATASETS_CACHE, with an example.
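The sketch below illustrates the lookup order for the datasets cache directory as we understand it from the `datasets` config module: an explicit HF_DATASETS_CACHE wins, otherwise the cache falls back to `$HF_HOME/datasets`, and HF_HOME itself defaults to `$XDG_CACHE_HOME/huggingface` (with XDG_CACHE_HOME defaulting to `~/.cache`). The function `resolve_datasets_cache` is hypothetical and is not part of the `datasets` API; it only mirrors the assumed precedence so it can be checked without installing the library.

```python
import os
from pathlib import Path


def resolve_datasets_cache(env=None):
    """Hypothetical helper mirroring the assumed datasets cache lookup order.

    Assumed precedence:
      1. HF_DATASETS_CACHE, if set
      2. $HF_HOME/datasets
      3. $XDG_CACHE_HOME/huggingface/datasets (XDG_CACHE_HOME defaults to ~/.cache)
    """
    env = os.environ if env is None else env
    # An explicit HF_DATASETS_CACHE overrides everything else.
    if "HF_DATASETS_CACHE" in env:
        return Path(env["HF_DATASETS_CACHE"]).expanduser()
    # Otherwise derive the location from HF_HOME, which itself
    # defaults to a "huggingface" folder under XDG_CACHE_HOME.
    xdg_cache = env.get("XDG_CACHE_HOME", "~/.cache")
    hf_home = env.get("HF_HOME", os.path.join(xdg_cache, "huggingface"))
    return Path(hf_home).expanduser() / "datasets"


print(resolve_datasets_cache({"HF_HOME": "/scratch/hf"}))
```

In practice the variable is usually set in the shell before Python starts (for example `export HF_DATASETS_CACHE=/path/to/cache`), since `datasets` appears to read its cache configuration when it is first imported.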

This change aims to improve clarity and help users better manage their cache directories when working in shared environments or with limited local storage.

Closes #7457.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq
Member

lhoestq commented Apr 28, 2025

Your clarification in your comment on #7480 sounds great. Would you like to update this PR to include it?

@Harry-Yang0518
Contributor Author

Hi @lhoestq, I’ve updated the documentation to reflect the clarifications discussed in #7480. Let me know if anything else is needed!

Member

@lhoestq lhoestq left a comment


Thanks !

I also took the liberty of removing unnecessary \

@lhoestq lhoestq merged commit b1bfe15 into huggingface:main May 6, 2025


Development

Successfully merging this pull request may close these issues.

Document the HF_DATASETS_CACHE env variable

3 participants