Definitions
By out-of-container relative URL string, I mean a relative URL string that has a number of double-dot path segments ("..") high enough to conceptually go outside the container.
For instance, "../../../../EPUB/content.xhtml" is an out-of-container relative URL string given the following container:
/
├── mimetype
├── META-INF
│   └── container.xml
└── EPUB
    └── content.xhtml
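To make the definition above concrete, here is a minimal sketch that counts how far a relative URL string climbs. The helper name `escapesContainer` and its `baseDir` parameter (the directory of the referencing file, relative to the container root) are hypothetical and not taken from any EPUB specification. It assumes the input is a path-relative string with no scheme and no leading "/", and it does not handle percent-encoded dot segments.

```typescript
// Hypothetical helper: returns true if `ref` climbs above the container root
// when resolved from a file located in directory `baseDir` (e.g. "META-INF").
function escapesContainer(baseDir: string, ref: string): boolean {
  // Depth of the referencing file's directory below the container root.
  let depth = baseDir.split("/").filter((s) => s.length > 0).length;
  // Ignore any query or fragment; only the path part matters here.
  const path = ref.split(/[?#]/)[0];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue; // no change in depth
    if (segment === "..") {
      depth -= 1;
      if (depth < 0) return true; // climbed above the container root
    } else {
      depth += 1;
    }
  }
  return false;
}

console.log(escapesContainer("META-INF", "../../../../EPUB/content.xhtml"));
console.log(escapesContainer("META-INF", "../EPUB/content.xhtml"));
```

With the container shown above, the first call reports an out-of-container string and the second does not, since a single ".." from META-INF only reaches the root.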
Problem
In previous versions of EPUB, the URL definition was unclear (see #1888), but I believe the intent was to disallow out-of-container URL strings.
In the #1898 proposal, out-of-container URL strings are conforming, but the base URL of the container is defined such that an out-of-container URL string is necessarily parsed into an in-container URL.
For instance, after #1898, the URL string "../../../../EPUB/content.xhtml" will be parsed to the same URL as the URL string "EPUB/content.xhtml".
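This behavior can be illustrated with the URL Standard parser exposed as the `URL` class. The root URL `https://container.example/` below is a hypothetical stand-in, not the value that #1898 defines; the only property the sketch relies on is that the container root sits at the root path of its host, so excess ".." segments have nowhere to go.

```typescript
// Hypothetical container root URL, assumed to sit at the root path of its host.
const rootURL = "https://container.example/";

// Excess ".." segments are dropped once the parser reaches the root path.
const leaky = new URL("../../../../EPUB/content.xhtml", rootURL).href;
const plain = new URL("EPUB/content.xhtml", rootURL).href;

console.log(leaky);
console.log(plain);
console.log(leaky === plain);
```

A legacy or non-conforming reading system that resolves references against real file system paths may not collapse the extra segments in the same way.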
But as we noted in the #1898 proposal, using an out-of-container URL string will likely lead to interoperability issues with legacy or non-conforming reading systems (RS).
In addition, as I said earlier, I believe the intent in previous versions of EPUB was to disallow them.
Proposal
I think we should forbid out-of-container URL strings.
Here's a proposal (assuming #1898 is merged).
Replace:
In the OCF Abstract Container, when a file uses a URL string to reference another file in the container, the string MUST be a path-relative-scheme-less-URL string, optionally followed by U+0023 (#) and a URL-fragment string.
by something along the lines of:
In the OCF Abstract Container, a relative-URL string MUST be a container relative URL string.
A URL string url is a container relative URL string if it is a path-relative-scheme-less-URL string and the following steps return true:
- Let testURLRecord be the result of applying the URL parser to url with "https://example.org/A/" as the base URL.
- Let testURLString be the result of applying the URL serializer to testURLRecord.
- If testURLString does not start with "https://example.org/A/", then return false.
- Set testURLRecord to the result of applying the URL parser to url with "https://example.org/B/" as the base URL.
- Set testURLString to the result of applying the URL serializer to testURLRecord.
- If testURLString does not start with "https://example.org/B/", then return false.
- Return true.
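The steps above can be sketched in TypeScript, where `new URL(input, base)` runs the URL Standard parser and `.href` gives the serialization. The function name `passesContainerRelativeSteps` is hypothetical. The sketch covers only the two-base test: it does not check that the input is a path-relative-scheme-less-URL string, and treating a parse failure as `false` is an assumption, since the steps do not say what happens in that case.

```typescript
// The two test bases from the proposed steps.
const TEST_BASES = ["https://example.org/A/", "https://example.org/B/"];

// Hypothetical name. Implements only the parse/serialize/prefix steps;
// the path-relative-scheme-less-URL string precondition is NOT checked here.
function passesContainerRelativeSteps(url: string): boolean {
  for (const base of TEST_BASES) {
    let serialized: string;
    try {
      // URL parser followed by URL serializer.
      serialized = new URL(url, base).href;
    } catch {
      // Assumption: a string the parser rejects is not container relative.
      return false;
    }
    // The result must stay under the directory it was resolved against.
    if (!serialized.startsWith(base)) return false;
  }
  return true;
}

for (const candidate of [
  "EPUB/content.xhtml",
  "../../../../EPUB/content.xhtml",
  "/EPUB/content.xhtml",
  "../A/content.xhtml",
]) {
  console.log(candidate, passesContainerRelativeSteps(candidate));
}
```

The last candidate shows why two bases are used: "../A/content.xhtml" resolves back under "https://example.org/A/" and would pass a single-base check, but it fails once it is resolved against "https://example.org/B/".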
Explanation
The proposal above intends to override the URL standard definition of relative-URL string, so that:
- scheme-relative- and path-absolute-URL strings are not allowed (in other words, URL strings starting with "/" are not allowed)
- "exceeding" or "leaky" URL strings are not allowed (in other words, URL strings with enough ".." path segments to go "outside" the container are not allowed)
The intent is that our restrictions on relative-URL strings apply even when we refer to a broader "category" of URL strings, such as a relative-URL-with-fragment string.
In some way, it is monkey patching the URL standard definition. Monkey patches are usually not considered a good thing. But I do not see how to do otherwise: for the document formats we own (e.g. Package Document), we can easily define what is a valid URL string; but for other formats used in EPUB (e.g. HTML), they directly refer to the URL standard so I don't see an alternative to tweaking the definition.
Editorial consequences
We will be able to replace all our uses of:
path-relative-scheme-less-URL string, optionally followed by U+0023 (#) and a URL-fragment string
by
relative-URL-with-fragment string (which is a bit more readable).
We may no longer need to assume the properties of the container root URL in the core spec, as they really only apply to out-of-container URLs.
We still need those in the RS spec, to specify how reading systems must process non-conforming URLs.