Better Drive files download failure · Issue #1482 · tensorflow/datasets · GitHub
Better Drive files download failure #1482

@Conchylicultor

Description

Downloads from Google Drive URLs sometimes fail with NonMatchingChecksumError: Artifact https://drive.google.com/... has wrong checksum.

Explanation: Drive sometimes rejects the download attempt, so the rejection page is downloaded instead of the data and the checksum no longer matches. This can happen:

  • If the user is based in China (using a VPN should help).
  • If there are too many downloads of the same file.
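
To check whether a failed download is actually a rejection page, you can inspect the first bytes of the downloaded file. The sketch below is only an illustration and is not part of TFDS: the helper names `looks_like_html` and `sha256_of` are hypothetical, and it assumes the rejection page is an ordinary HTML document, which the issue does not confirm.

```python
import hashlib

# Assumption: a Drive rejection page is plain HTML, so it starts with one of
# these markers (compared case-insensitively, after leading whitespace).
_HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(path, probe_size=512):
    """Return True if the first bytes of the file look like an HTML page."""
    with open(path, "rb") as f:
        head = f.read(probe_size).lstrip().lower()
    return head.startswith(_HTML_MARKERS)


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `looks_like_html` returns True for a file that should be an archive or a binary dataset, the checksum mismatch most likely comes from a rejected download rather than from corrupted data.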

Currently, the best solution is to download the data manually (https://www.tensorflow.org/datasets/overview#manual_download_if_download_fails) rather than relying on the automated download that Drive rejected.

Otherwise:

  • Try the download again later.
  • Try from a different computer.
  • Rather than downloading the file in each Colab session, load the dataset from a GCS bucket. See instructions.

I'm not sure there can be a fix on the Google Drive side that would still prevent abuse.
On the TFDS side, we could make the error message more explicit when we detect a Drive URL.
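
A more explicit message could look roughly like the sketch below. This is a hypothetical illustration, not the TFDS implementation: `is_drive_url` and `checksum_error_message` are invented names, and matching only the `drive.google.com` host is an assumption (other Drive hosts may need to be added).

```python
from urllib.parse import urlparse

# Assumption: only this host is treated as Google Drive in this sketch.
_DRIVE_HOSTS = {"drive.google.com"}

_MANUAL_DOWNLOAD_DOC = (
    "https://www.tensorflow.org/datasets/overview"
    "#manual_download_if_download_fails"
)


def is_drive_url(url):
    """Return True if the URL's host is a known Google Drive host."""
    host = urlparse(url).hostname or ""
    return host.lower() in _DRIVE_HOSTS


def checksum_error_message(url):
    """Build the checksum error text, with extra guidance for Drive URLs."""
    msg = f"Artifact {url} has wrong checksum."
    if is_drive_url(url):
        msg += (
            " Google Drive may have rejected the download, in which case a"
            " rejection page was saved instead of the data. Try again later,"
            f" or download the file manually: {_MANUAL_DOWNLOAD_DOC}"
        )
    return msg
```

Checking the parsed host rather than searching the whole URL string avoids false positives when `drive.google.com` only appears in a path or query parameter.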

Labels: enhancement (New feature or request)