feat: safely resume interrupted downloads by cojenco · Pull Request #294 · googleapis/google-resumable-media-python · GitHub
Conversation

@cojenco
Contributor

@cojenco cojenco commented Jan 18, 2022

If a retryable error occurs mid-download, the download now resumes writing to the stream from offset_of_last_byte_received rather than restarting from the beginning of the file. This resolves the data integrity issues caused by retried downloads.

  • for interrupted downloads, safely resume by reading from offset_of_last_byte_received using a ranged GET request, and include the object generation as a URL query parameter to make sure the same object content is requested
  • adds support for download instances to track information such as object_generation and bytes downloaded
  • adds tests
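
The resume step described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `build_resume_request` and its parameters are hypothetical names, and the example URL is made up. It only shows the idea of combining an open-ended `Range` header with a pinned `generation` query parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_resume_request(media_url, bytes_downloaded, object_generation=None):
    """Return (url, headers) for resuming a download at ``bytes_downloaded``.

    Hypothetical helper for illustration only; not the library's API.
    """
    headers = {}
    if bytes_downloaded > 0:
        # Open-ended range: everything from the first missing byte onward.
        headers["Range"] = "bytes={:d}-".format(bytes_downloaded)

    scheme, netloc, path, query, fragment = urlsplit(media_url)
    params = parse_qsl(query, keep_blank_values=True)
    # Pin the generation so a retry cannot silently read a newer version of
    # the object. A generation already present in the URL is left untouched.
    if object_generation is not None and "generation" not in dict(params):
        params.append(("generation", str(object_generation)))
    url = urlunsplit((scheme, netloc, path, urlencode(params), fragment))
    return url, headers


# Example: 4 bytes already written to the stream before the interruption.
resume_url, resume_headers = build_resume_request(
    "https://storage.googleapis.com/download/storage/v1/b/bucket/o/obj?alt=media",
    bytes_downloaded=4,
    object_generation=1641234567890,
)
```

The first request of a download passes `bytes_downloaded=0`, so no `Range` header is sent and the whole object is requested.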

Fixes #284

@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/google-resumable-media-python API. label Jan 18, 2022
@cojenco cojenco marked this pull request as ready for review January 19, 2022 17:37
@cojenco cojenco requested review from a team as code owners January 19, 2022 17:37
Contributor

@tritone tritone left a comment

Couple thoughts, generally looking really good to me!

General comment, can you give some more details about how you tested this out with the emulator?

# data corruption for that byte range alone.
if self._expected_checksum is None and self._checksum_object is None:
    # `_get_expected_checksum()` may return None even if a checksum was
    # requested, in which case it will emit an info log _MISSING_CHECKSUM.
Contributor

What causes this case to happen? Transcoding?

Contributor Author

This is due to retried requests being range requests. For range requests, as noted here, there's no way to detect data corruption for that byte range alone.

Therefore, here we retrieve the expected checksum/checksum object only once for the initial download request. Then we calculate and validate the checksum when the download completes.
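
A rough sketch of that approach, under stated assumptions: the function name and the simulated interruptions are hypothetical, and MD5 with base64 encoding is used only as an example digest. The point is that one checksum object is created for the initial request, updated with every chunk across retries, and compared against the expected value only once, when the download completes.

```python
import base64
import hashlib


def download_with_resume(payload, expected_md5_b64, fail_after=(4, 9)):
    """Simulate a download interrupted at the given absolute offsets.

    ``payload`` stands in for the remote object. Hypothetical sketch, not
    the library's implementation.
    """
    checksum = hashlib.md5()  # created once, reused across all retries
    received = bytearray()
    interruptions = list(fail_after)
    while len(received) < len(payload):
        start = len(received)  # offset of the first missing byte
        stop = interruptions.pop(0) if interruptions else len(payload)
        chunk = payload[start:stop]  # body of the simulated ranged response
        received.extend(chunk)
        checksum.update(chunk)
    # Validate only after the final byte has arrived.
    actual = base64.b64encode(checksum.digest()).decode("ascii")
    if actual != expected_md5_b64:
        raise ValueError(
            "checksum mismatch: {} != {}".format(actual, expected_md5_b64)
        )
    return bytes(received)


payload = b"This is a simple text file.\n"
expected = base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")
result = download_with_resume(payload, expected)
```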

if self._stream is not None:
    request_kwargs["stream"] = True

# Assign object generation if generation is specified in the media url.
Contributor

Would this happen via a user specifying a generation on the object? Were we not respecting this previously?

Contributor Author

Yep, this would happen via a user specifying a generation on the object. Previously, we respected that only through download.media_url.

A property, download._object_generation, is added. It records the object generation from either (1) the generation query param in the media_url, or (2) the object generation in the initial response header. This specific line of code does (1) and retrieves it from the media_url.
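
The two sources can be sketched as below. This is an illustration only: `resolve_object_generation` is a hypothetical name, and the header name `x-goog-generation` is an assumption, since the comment above does not name the header.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed header name; the comment above only says "initial response header".
_GENERATION_HEADER = "x-goog-generation"


def resolve_object_generation(media_url, response_headers=None):
    """Return the object generation as an int, or None if it is unknown.

    Hypothetical helper for illustration; not the library's API.
    """
    # (1) A generation pinned by the caller in the media URL takes priority.
    query = parse_qs(urlsplit(media_url).query)
    if "generation" in query:
        return int(query["generation"][0])
    # (2) Otherwise fall back to the generation reported by the first response.
    if response_headers:
        for name, value in response_headers.items():
            if name.lower() == _GENERATION_HEADER:  # header names ignore case
                return int(value)
    return None


from_url = resolve_object_generation("https://example.com/o/obj?alt=media&generation=42")
from_header = resolve_object_generation(
    "https://example.com/o/obj?alt=media", {"X-Goog-Generation": "1641234567890"}
)
```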

P.S. It's tricky because only limited information is passed from python-storage to resumable-media-python. A resumable-media-python download instance only knows the specified object generation from its media_url, and the "object" itself isn't retained in a download.


self._process_response(result)

# With decompressive transcoding, GCS serves back the whole file regardless of the range request,
Contributor

Wondering if this should be highlighted as a shortcoming in the decompressive transcoding docs: not being able to resume a download may be costly.

Contributor Author

It's mentioned in the very bottom section of the decompressive transcoding docs. I agree we can add notes on how retries may be impacted in this sense.

@cojenco
Contributor Author

cojenco commented Jan 24, 2022

Couple thoughts, generally looking really good to me!

General comment, can you give some more details about how you tested this out with the emulator?

Thanks for the review! I've added data integrity checks and test cases to the retry conformance tests (open PR). The changes in this PR are tested against the testbench using the above-mentioned tests.

Before the changes, the conformance tests fail as shown below. With the changes made in this PR, the conformance tests pass when run locally.

  File "/tmpfs/src/github/python-storage/tests/conformance/test_conformance.py", line 93, in blob_download_as_bytes
    assert stored_contents == payload
AssertionError: assert b'ThisThisThi... text file.\n' == b'This is a s... text file.\n'
  At index 4 diff: b'T' != b' '
  Full diff:
  - b'This is a simple text file.\n'
  ?       ^
  + b'ThisThisThis is a simple text file.\n'
  ?       ^^  +++++++
=========================== short test summary info ============================
FAILED tests/conformance/test_conformance.py::test-S8-storage.objects.get-blob_download_to_filename-0
FAILED tests/conformance/test_conformance.py::test-S8-storage.objects.get-client_download_blob_to_file-0
FAILED tests/conformance/test_conformance.py::test-S8-storage.objects.get-blob_download_as_bytes-0
FAILED tests/conformance/test_conformance.py::test-S8-storage.objects.get-blobreader_read-0
FAILED tests/conformance/test_conformance.py::test-S8-storage.objects.get-blob_download_as_text-0
5 failed, 555 passed, 5 skipped, 7 warnings in 287.55s (0:04:47)
nox > Command py.test -n auto --quiet tests/conformance failed with exit code 1
nox > Session conftest_retry-3.8 failed.
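
The duplicated `ThisThisThis` prefix in the log above is consistent with each retry re-sending the object from byte 0 into a stream that already holds partial data. The simulation below reproduces that output and contrasts it with resuming from the last received offset. It is a sketch under assumptions: the function, the 4-byte interruption point, and the two failures are chosen for illustration and are not taken from the testbench.

```python
import io

PAYLOAD = b"This is a simple text file.\n"


def download(stream, resume_from_offset, interrupt_after=4, failures=2):
    """Write PAYLOAD to ``stream``, failing ``failures`` times mid-transfer."""
    bytes_downloaded = 0
    for attempt in range(failures + 1):
        # Old behavior: every retry starts again at byte 0.
        # New behavior: every retry starts at the first missing byte.
        start = bytes_downloaded if resume_from_offset else 0
        body = PAYLOAD[start:]
        if attempt < failures:
            body = body[:interrupt_after]  # connection drops mid-response
        stream.write(body)
        bytes_downloaded += len(body)
    return stream.getvalue()


buggy = download(io.BytesIO(), resume_from_offset=False)
fixed = download(io.BytesIO(), resume_from_offset=True)
print(buggy)  # b'ThisThisThis is a simple text file.\n'
print(fixed)  # b'This is a simple text file.\n'
```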

@tritone
Contributor

tritone commented Feb 10, 2022

This is looking really good in general. Based on offline discussion, I would recommend moving the decompressive transcoding feature to a TODO and moving ahead with the rest of this PR. Some transcoding details may take a while to resolve, and it's important that we still land the rest of this PR, which is a major fix to the retry logic for downloads.

Contributor

@andrewsg andrewsg left a comment

LGTM pending @tritone comment resolutions. Thank you!

@cojenco
Contributor Author

cojenco commented Feb 11, 2022

Thanks Chris and Andrew! I've moved the transcoding feature, tracking in #303

@cojenco cojenco added the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 11, 2022
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 11, 2022
@cojenco cojenco merged commit b363329 into googleapis:main Feb 11, 2022
Development

Successfully merging this pull request may close these issues.

Stream reset on retry

3 participants