@@ -23,7 +23,7 @@ The self-extracting ZIP files created by SingleFile are essentially regular ZIP
By default the ZIP payload is stored in a comment node (i.e. wrapped by `<!--` and `-->` tags). However, if the payload contains the closing tag (i.e. `-->`), then the payload is wrapped in another pair of tags (i.e. tags of `noscript` or `script` or `xmp` or `plaintext` elements) whose closing tag does not conflict with the payload. Within this HTML page, there is also a script weighing approximately 50KB designed to extract and display the ZIP payload when the file is opened in a web browser and interpreted as a web page.
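
The wrapper selection described above can be sketched as follows. This is a minimal illustration, not SingleFile's actual code: the names `WRAPPERS` and `wrap_payload` are hypothetical, the order of the fallback elements is an assumption, and the `plaintext` element is omitted for brevity.

```python
# Candidate wrappers in order of preference. The comment node is the default
# named in the text; the order of the fallback elements is an assumption.
WRAPPERS = [
    (b"<!--", b"-->"),
    (b"<noscript>", b"</noscript>"),
    (b"<script>", b"</script>"),
    (b"<xmp>", b"</xmp>"),
]


def wrap_payload(payload: bytes) -> bytes:
    """Wrap payload in the first wrapper whose closing tag it does not contain."""
    # Closing tags are compared case-insensitively (ASCII only).
    lowered = payload.lower()
    for opening, closing in WRAPPERS:
        if closing not in lowered:
            return opening + payload + closing
    raise ValueError("every candidate closing tag occurs in the payload")
```

A payload without `-->` stays in a comment node, while one that contains `-->` falls through to the next candidate whose closing tag is absent.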
-The purpose of the embedded script is to read the ZIP payload as binary data, extract it, and then display the extracted page with its resources. Initially, the script can use the `window.fetch()` method to read the HTML page in binary form and retrieve the ZIP payload. However, this API doesn't work in Chromium-based and WebKit-based browsers when the page is accessed from the local file system due to security restrictions. To circumvent this, the page is encoded in `windows-1251`, and binary data is directly retrieved from the Document Object Model (DOM) when using the "universal" self-extracting ZIP format. The choice of `windows-1251 encoding` is preferred over `UTF-8` because it preserves all bytes without significant data loss.
+The purpose of the embedded script is to read the ZIP payload as binary data, extract it, and then display the extracted page with its resources. Initially, the script can use the `window.fetch()` method to read the HTML page in binary form and retrieve the ZIP payload. However, this API doesn't work in Chromium-based and WebKit-based browsers when the page is accessed from the local file system due to security restrictions. To circumvent this, the page is encoded in `windows-1251`, and binary data is directly retrieved from the Document Object Model (DOM) when using the "universal" self-extracting ZIP format. The choice of `windows-1251` encoding is preferred over `UTF-8` because it preserves all bytes without significant data loss.
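
The preference for a single-byte encoding over `UTF-8` can be illustrated with a small round-trip check. This is a sketch under assumptions, not the embedded script: `round_trips` is a hypothetical helper, and the sample skips byte `0x98` because Python's `cp1251` codec leaves it undefined (a browser's decoding table may treat that byte differently).

```python
def round_trips(data: bytes, encoding: str) -> bool:
    """Return True if decoding then re-encoding reproduces the original bytes."""
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace") == data


# Every byte value except 0x98, which Python's cp1251 codec does not define.
sample = bytes(b for b in range(256) if b != 0x98)

single_byte_ok = round_trips(sample, "cp1251")
utf8_ok = round_trips(sample, "utf-8")
```

With this sample the single-byte codec round-trips, whereas `UTF-8` does not: invalid sequences decode to replacement characters, so the original bytes cannot be recovered from the decoded text.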
Regardless of page encoding, all instances of `CR` (Carriage Return) and `CR+LF` (Carriage Return and Line Feed) bytes are replaced with `LF` (Line Feed) bytes when read from the DOM. As a consequence, additional data also needs to be incorporated into the page to compensate for this data loss. This task is accomplished by the `sfz-extra-data` element, which contains both the necessary data and the offset specifying the start of the ZIP payload encoded in base64 when using the "universal" self-extracting ZIP format. The data in this element is read by the embedded script before extracting the ZIP payload in order to restore `CR` and `CR+LF` bytes. Finally, because the ZIP specification tolerates no more than 64KB of arbitrary data after the ZIP payload, this element is positioned at the end of the HTML page, or at the beginning when it weighs more than 64KB.
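
The newline loss and its repair can be sketched as below. The record layout is an assumption made for illustration (a list of `(offset, original bytes)` pairs). The actual contents of `sfz-extra-data` are not specified here, and `normalize_newlines` and `restore_newlines` are hypothetical names.

```python
def normalize_newlines(data: bytes) -> tuple[bytes, list[tuple[int, bytes]]]:
    """Replace CR and CR+LF with LF, recording what was replaced and where.

    Each record is (offset in the normalized output, original bytes).
    """
    out = bytearray()
    records = []
    i = 0
    while i < len(data):
        if data[i] == 0x0D:  # CR
            if i + 1 < len(data) and data[i + 1] == 0x0A:  # CR+LF
                records.append((len(out), b"\r\n"))
                i += 2
            else:
                records.append((len(out), b"\r"))
                i += 1
            out.append(0x0A)  # what a DOM read would leave behind
        else:
            out.append(data[i])
            i += 1
    return bytes(out), records


def restore_newlines(normalized: bytes, records: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original bytes from normalized data and the saved records."""
    out = bytearray()
    previous = 0
    for offset, original in records:
        out += normalized[previous:offset]
        out += original
        previous = offset + 1  # skip the LF that stood in for the original
    out += normalized[previous:]
    return bytes(out)
```

The first function mimics what happens when the payload is read from the DOM, and the second plays the role of the embedded script, using the saved records to recover the exact bytes before extraction.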