KEMBAR78
Add support for downloading a directory as a compressed tarball by ekangmonyet · Pull Request #544 · static-web-server/static-web-server · GitHub
Skip to content

Conversation

@ekangmonyet
Copy link
Contributor

@ekangmonyet ekangmonyet commented May 21, 2025

Description

Support downloading a directory as a compressed tarball.

  • Respond with compressed tarball when requested file is a directory and query has download-archive parameter set.
  • When listing said <directory>, add a link with href ?download-archive=1 for user to download all content in directory as compressed tarball (see screenshot).

Discussion & other notes

  • Is the "virtual file" approach appropriate? What should the server do when there is an existing .tar.gz file?
    • A radical alternative is to introduce an /api/ group, which may also be used by other non-static-file features such as health endpoint
    • Query param seems to be a more versatile and robust solution, e.g. /<directory>/?download-archive=gzip
    • UPDATE: I feel like query param is the sanest way to go, unless I missed any other caveats or considerations.
  • Since this will technically "list directory", if an option is used to toggle this feature, it should respect the directory listing option and possibly warn user/error on conflicting combination.
  • TODO: set follow_symlinks() option for tar builder accordingly
  • TODO: refactor and cleanup
  • TODO: feature flag and runtime options (WIP)
  • TODO: test: dir list index - download link follows dir download opt
  • TODO: test: dir download - normal, no follow_symlinks
  • TODO: verify config file
  • TODO: handle ignore hidden file
    • handle recursive append dir ourselves
  • TODO: polish testcase
  • TODO: content disposition unicode handling Apparently, non-ASCII is allowed by standard, and coincidentally there is a bug that does not validate the string the way it is documented, so everything works! see: HeaderValue::from_str does not validate that the string is valid ascii hyperium/http#519

Related Issue

implement #67

Motivation and Context

This feature allows user to download the content of the entire directory all at once.

How Has This Been Tested?

  • Added download_targz, download_targz_no_symlinks testcase to check downloaded tarball content against directory content
  • Added test to ensure download link exists when option is enabled, and ensure it doesnt exists when disabled

Screenshots (if appropriate):

(Final implementation)
image

(Previous POC)
image

Interface of this feature on other system

Github
A button in dropdown menu.
image

Jenkins
A link below list of files, separated from the list itself
image

@semanticdiff-com
Copy link

semanticdiff-com bot commented May 21, 2025

Review changes with  SemanticDiff

Changed Files
File Status
  src/directory_listing.rs  1% smaller
  Cargo.lock Unsupported file format
  Cargo.toml Unsupported file format
  docs/content/configuration/command-line-arguments.md Unsupported file format
  docs/content/configuration/config-file.md Unsupported file format
  docs/content/configuration/environment-variables.md Unsupported file format
  docs/content/features/directory-listing.md Unsupported file format
  src/directory_listing_download.rs  0% smaller
  src/handler.rs  0% smaller
  src/lib.rs  0% smaller
  src/server.rs  0% smaller
  src/settings/cli.rs  0% smaller
  src/settings/file.rs  0% smaller
  src/settings/mod.rs  0% smaller
  src/static_files.rs  0% smaller
  src/testing.rs  0% smaller
  tests/dir_listing.rs  0% smaller
  tests/dir_listing_download.rs  0% smaller
  tests/static_files.rs  0% smaller

@ekangmonyet
Copy link
Contributor Author

This is an early draft of the feature, implementing a prototype for #67.

Disclaimer: I do not have much experience in async Rust (or even that much experience in Rust in general), my implementation might be subpar/unidiomatic. All kind of feedback is very much appreciated.

@ekangmonyet ekangmonyet marked this pull request as draft May 21, 2025 11:40
@joseluisq
Copy link
Collaborator

joseluisq commented May 24, 2025

Sorry for the delay, here are a few comments on the feature. I will add more if needed during the next days.
Regarding the code, I will comment on that directly.

But let me know what your thoughts are on this.

--

  • UPDATE: I feel like query param is the sanest way to go, unless I missed any other caveats or considerations.

Agree, let's go with a query param for the directory download.
I suggest this query: /?download that will be the short version of /?download=tarball. Let's support only the first (/?download) for now. If we want to add zip or another format in the future then we can just extend the query to be like the second (/?download=tarball|zipball|etc).

  • Since this will technically "list directory", if an option is used to toggle this feature, it should respect the directory listing option and possibly warn user/error on conflicting combination.

What do you mean exactly by "conflicting combination"?
The question is, do we want to make this directory download feature dependent on the directory listing, or should this feature be standalone and also complement the directory listing instead (if enabled)?

For example, I think this feature will be behind an option flag like directory-download=tarball (just an idea), but it could work regardless of the directory listing. Of course, if the directory listing is enabled with the download, we have to display the link in the HTML template.

About the link text on the directory listing.

I believe we can just say "download tarball" or "download tar.gz" whatever fits better. Something like:

image

@ekangmonyet
Copy link
Contributor Author

Sorry for the delay

Hey, please take your time, don't worry about it! Thanks for the review too!

I suggest this query: /?download that will be the short version of /?download=tarball.

Yeap sure, download will do just fine. It is not too relevant for now, but does tarball always equate to gzipped tar? What if we want to support XZ or bzip2?

What do you mean exactly by "conflicting combination"?

IMO in some cases, directory-listing will be treated as a security toggle, i.e. in no way expose the list of content in the directory. Thus, if somehow the user disabled directory-listing but enabled directory-download, I guess that will be conflicting, in the sense that the content can now be leaked by downloading the tarball?

I believe we can just say "download tarball" or "download tar.gz" whatever fits better.

Yeap sure, sounds great!

@joseluisq
Copy link
Collaborator

Yeap sure, download will do just fine. It is not too relevant for now, but does tarball always equate to gzipped tar? What if we want to support XZ or bzip2?

Good question, I was thinking of only supporting in the future tar.gz and zip as those two are the popular ones (Unix/Windows).
To be consistent, let's forget the tarball or zipball thing and use the extension as a reference.
We will use /?download (which means tar.gz) and later expand it to /?download=tar.gz or /?download=zip if needed in the future.

The text link can also be download tar.gz.

IMO in some cases, directory-listing will be treated as a security toggle, i.e. in no way expose the list of content in the directory. Thus, if somehow the user disabled directory-listing but enabled directory-download, I guess that will be conflicting, in the sense that the content can now be leaked by downloading the tarball?

I got it. In that case, let's make it dependent on directory-listing=true, so if a user needs download functionality, the user has to enable it like this:

$ static-web-server -p 1234 -d ./public \
    --directory-listing --directory-listing-download=tar.gz

The --directory-listing-download option should be empty by default (which means it is disabled).
This will also allow us later if we need to support zip, to just expand it to a list like --directory-listing-download=tar.gz,zip.

Copy link
Collaborator

@joseluisq joseluisq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's for now, I believe you are still refactoring the code and adding the SWS feature option. SWS options can be added via src/settings/cli.rs and src/settings/file.rs.

So I will wait for your changes first.
Just let me know if you have questions.

@joseluisq joseluisq added enhancement New feature or request v2 v2 release labels May 24, 2025
@ekangmonyet
Copy link
Contributor Author

I think it's almost ready for review, just a few questions/confirmations:

  • This feature should probably respect the ignore_hidden_files option as well, to implement that I think we have to replace the append_dir_all with our own implementation. Do we want to do this in this commit as well?
  • I tried dancing with clap and the current state is the best I can do:
    • --directory-listing-download=targz to enable
    • --directory-listing-download=none to disable (mixing none with other non-none values will panic, should we handle this differently as well?, I can't find a way to do it all in clap, yet)
    • future: --directory-listing-download=targz,zip,...
    • this flag requires -z (or the long one)
  • on error handling:
    • I think we can still try our best to respond if error occurs when making the header (most likely due to code changes/upstream behavior changes), instead of responding with 500?
    • For compression task, the only way to signal error is by calling sender.abort(), which IMO doesn't do much anyway. The most likely error seems to be due to broken pipe, i.e. user aborted the download midway. In that situation, there is nothing we can do since the session has already been closed anyway.

@joseluisq
Copy link
Collaborator

  • This feature should probably respect the ignore_hidden_files option as well, to implement that I think we have to replace the append_dir_all with our own implementation. Do we want to do this in this commit as well?

Yes, since ignore_hidden_files skips dotfiles, we have to implement it. It may not be a big deal to implement it. See https://docs.rs/async-tar/latest/src/async_tar/builder.rs.html#585.

  • I tried dancing with clap and the current state is the best I can do:
    • --directory-listing-download=targz to enable
    • --directory-listing-download=none to disable (mixing none with other non-none values will panic, should we handle this differently as well?, I can't find a way to do it all in clap, yet)
    • future: --directory-listing-download=targz,zip,...
    • this flag requires -z (or the long one)

What if we represent the directory_listing_download as Vec<Option<DirDownloadFmt>> and remove the enum's None case of DirDownloadFmt (not tested)?

I will investigate from my side as well. It would be much better to support an enum, but if not possible, we could use a simple String instead, where an empty string means disabled. Remember that if we add zip archive support, we could use both, like --directory-listing-download=targz,zip. If we use a string, then only check for s.is_empty() or s == "targz".
In the future, we could read just the string value and split it by a comma separator.

  • on error handling:

    • I think we can still try our best to respond if error occurs when making the header (most likely due to code changes/upstream behavior changes), instead of responding with 500?
    • For compression task, the only way to signal error is by calling sender.abort(), which IMO doesn't do much anyway. The most likely error seems to be due to broken pipe, i.e. user aborted the download midway. In that situation, there is nothing we can do since the session has already been closed anyway.

This sounds fine to me. If we do not panic the server thread, we are fine.

@ekangmonyet
Copy link
Contributor Author

ekangmonyet commented May 27, 2025

What if we represent the directory_listing_download as Vec<Option<DirDownloadFmt>> and remove the enum's None case of DirDownloadFmt (not tested)?

Do we absolutely need to allow explicitly disabling using --directory-listing-download=, or will the lack of said option be sufficient? If that's the case, None can be removed.

append `download-archive=<anything>` to query params of a directory will
trigger tarball compression
instead of adding it as a file entry, make it part of the dir index page
since the link is always constant anyway
- make dld a feature flag
- handle conditional compilation
- fix clippy errors
Technically it should be safe to unwrap as the filename will not contain
invisible ASCII characters...

Unless... the filename is non-ascii. Which needs to be encoded
differently as 172 (0x007F) is illegal.

TODO: handle unicode
still can't get `--download-list-format=` to work though
remove unrelated formatting change
@ekangmonyet ekangmonyet marked this pull request as ready for review May 27, 2025 18:38
@ekangmonyet ekangmonyet changed the title WIP: feat: download directory as compressed tarball WIP: May 27, 2025
@ekangmonyet ekangmonyet changed the title WIP: feat: download directory as compressed tarball May 27, 2025
Copy link
Collaborator

@joseluisq joseluisq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick review and encountered a few minor changes. I will continue reviewing and testing in a second round.

@joseluisq joseluisq changed the title feat: download directory as compressed tarball Add support for download a directory as compressed tarball May 28, 2025
@joseluisq joseluisq changed the title Add support for download a directory as compressed tarball Add support for downloading a directory as a compressed tarball May 28, 2025
Copy link
Collaborator

@joseluisq joseluisq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few last minor suggestions. It looks fine to me overall.

Copy link
Collaborator

@joseluisq joseluisq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. Ready to be available in the upcoming release.
Thanks!

@joseluisq joseluisq merged commit 89f5846 into static-web-server:master May 31, 2025
35 checks passed
@joseluisq joseluisq added this to the v2.37.0 milestone May 31, 2025
@ekangmonyet ekangmonyet deleted the dir-archive branch May 31, 2025 14:57
@joseluisq joseluisq linked an issue Jun 1, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request v2 v2 release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ability to download zip, tar etc?

2 participants