Package org.bzdev.io

Class ZipDocFile

java.lang.Object
java.util.zip.ZipFile
org.bzdev.io.ZipDocFile
All Implemented Interfaces:
Closeable, AutoCloseable

public class ZipDocFile extends ZipFile
Class to read entries in a ZIP-based file format with an embedded media type. These files are typically created by using ZipDocWriter or one of its subclasses. A number of file formats are based on the ZIP archive format, including Java JAR files and OpenDocument files. The file format supported by this class is one in which documents based on ZIP archives are self-labeled with their media types.

The Zip Document format is a ZIP archive whose first entry is a directory named "META-INF/", with a size of 0, and with an extra field for that entry denoting a media type. By convention, the first element in the extra field for the first entry should be the one providing the media type, with a two-byte header ID of 0xFACE (stored in little-endian order). The media type is encoded in UTF-8, but because every character that is legal in a media type is a US-ASCII character, the encoded bytes are also valid US-ASCII. The media type is not null-terminated. This results in the following byte sequences at the start of the file (byte ranges are inclusive):

 Bytes 0 to 3: 50 4B 03 04
 Bytes 8 to 9:  00 00
 Bytes 14 to 25: 00 00 00 00 00 00 00 00 00 00 00 00
 Bytes 26 to 27: 09 00
 Bytes 28 to 29: (4 + the length of the media type)
                 in little-endian byte order; a larger value if
                 other information is included
 Bytes 30 to 38: the characters "META-INF/"  (in UTF-8 encoding)
 Bytes 39 to 40: CE FA  (0xFACE in little-endian byte order)
 Bytes 41 to 42: the length of the media type in little-endian order
 Bytes 43 to (42 + mtlen): the characters making up the media type
                 encoded using UTF-8, where mtlen is the number of
                 characters (and thus bytes) in the media type
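The byte layout above can be checked with a short sketch that builds the first bytes of a labeled file and parses them back. The class `ZipDocHeaderSketch` and its methods are hypothetical and are not part of the org.bzdev.io API. The values written for fields the layout does not fix (version needed, flags, time, date) are assumptions, and the parser assumes the 0xFACE record is the first record in the extra field, as the convention recommends.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: builds and checks the first bytes of a Zip Document
// file following the byte layout above. Values for fields the layout leaves
// open (version needed, flags, time, date) are assumptions.
final class ZipDocHeaderSketch {
    static final byte[] SIGNATURE = {0x50, 0x4B, 0x03, 0x04};
    static final byte[] FIRST_NAME = "META-INF/".getBytes(StandardCharsets.UTF_8);
    static final int MEDIA_TYPE_ID = 0xFACE;

    // Local file header for the "META-INF/" entry, labeled with mediaType.
    static byte[] buildHeader(String mediaType) {
        byte[] mt = mediaType.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(43 + mt.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(SIGNATURE);                       // bytes 0-3
        buf.putShort((short) 10);                 // bytes 4-5: version needed (assumed)
        buf.putShort((short) 0);                  // bytes 6-7: flags (assumed)
        buf.putShort((short) 0);                  // bytes 8-9: method 0 = stored
        buf.putInt(0);                            // bytes 10-13: time/date (assumed)
        buf.putInt(0).putInt(0).putInt(0);        // bytes 14-25: CRC-32 and sizes
        buf.putShort((short) FIRST_NAME.length);  // bytes 26-27: 09 00
        buf.putShort((short) (4 + mt.length));    // bytes 28-29: extra-field length
        buf.put(FIRST_NAME);                      // bytes 30-38
        buf.putShort((short) MEDIA_TYPE_ID);      // bytes 39-40: CE FA
        buf.putShort((short) mt.length);          // bytes 41-42: mtlen
        buf.put(mt);                              // bytes 43 to (42 + mtlen)
        return buf.array();
    }

    // Returns the media type, or null if head does not match the layout.
    // Assumes the 0xFACE record is the first record in the extra field.
    static String parseMediaType(byte[] head) {
        if (head.length < 43) return null;
        ByteBuffer buf = ByteBuffer.wrap(head).order(ByteOrder.LITTLE_ENDIAN);
        if (!Arrays.equals(Arrays.copyOfRange(head, 0, 4), SIGNATURE)) return null;
        if (buf.getShort(8) != 0) return null;                  // not stored
        for (int i = 14; i < 26; i++) {
            if (head[i] != 0) return null;                      // CRC-32 and sizes
        }
        if ((buf.getShort(26) & 0xFFFF) != FIRST_NAME.length) return null;
        if (!Arrays.equals(Arrays.copyOfRange(head, 30, 39), FIRST_NAME)) return null;
        int extraLen = buf.getShort(28) & 0xFFFF;
        if ((buf.getShort(39) & 0xFFFF) != MEDIA_TYPE_ID) return null;
        int mtlen = buf.getShort(41) & 0xFFFF;
        if (4 + mtlen > extraLen || 43 + mtlen > head.length) return null;
        return new String(head, 43, mtlen, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] head = buildHeader("application/vnd.example+zip"); // hypothetical type
        System.out.println(head.length + " bytes: " + parseMediaType(head));
    }
}
```

For the hypothetical media type used in `main`, the header is 70 bytes long: 43 fixed bytes followed by the 27 bytes of the media type.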
 

The file can be read by any software that can process ZIP files. Applications using this file format can store data, typically metadata, in the META-INF directory. The rationale for this format is to make it easy for classification engines or similar software to determine a document's type.
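Because the label is an ordinary ZIP extra field, a program can read it using only `java.util.zip`. The following sketch assumes that `ZipInputStream.getNextEntry()` exposes the local-header extra field through `ZipEntry.getExtra()`. The class `MediaTypeSniffer` and its `sample` helper are hypothetical, and `sample` writes only a minimal labeled archive, not a complete file of the kind ZipDocWriter produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: reading the media-type label with only java.util.zip.
final class MediaTypeSniffer {
    static final int MEDIA_TYPE_ID = 0xFACE;

    // Returns the media type stored in the first entry's extra field, or null.
    // Assumes the 0xFACE record is the first record of that extra field.
    static String detect(InputStream in) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry first = zis.getNextEntry();
            if (first == null || !"META-INF/".equals(first.getName())) return null;
            byte[] extra = first.getExtra();
            if (extra == null || extra.length < 4) return null;
            ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
            int id = buf.getShort(0) & 0xFFFF;
            int len = buf.getShort(2) & 0xFFFF;
            if (id != MEDIA_TYPE_ID || 4 + len > extra.length) return null;
            return new String(extra, 4, len, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical helper: writes a minimal labeled archive for trying detect.
    static byte[] sample(String mediaType) throws IOException {
        byte[] mt = mediaType.getBytes(StandardCharsets.UTF_8);
        byte[] extra = ByteBuffer.allocate(4 + mt.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) MEDIA_TYPE_ID)
                .putShort((short) mt.length)
                .put(mt)
                .array();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            ZipEntry dir = new ZipEntry("META-INF/");
            dir.setMethod(ZipEntry.STORED);   // stored entry: size 0, CRC 0
            dir.setSize(0);
            dir.setCompressedSize(0);
            dir.setCrc(0);
            dir.setExtra(extra);
            zos.putNextEntry(dir);
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("content.txt")); // hypothetical entry
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sample("application/vnd.example+zip"); // hypothetical type
        System.out.println(detect(new ByteArrayInputStream(zip)));
    }
}
```

A stream-based check like this only reads the beginning of the archive, which suits tools that identify files by inspecting their first bytes.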

In a few cases (e.g., the files generated by ImageSequenceWriter), the ZIP file represents a sequence of objects, and some subsequences may consist of the same object repeated multiple times. To store these efficiently, the ZIP entry for the first object in such a subsequence can be tagged with a repetition count, provided as an argument to the method ZipDocWriter.nextOutputStream(String,boolean,int,int). The names chosen for the entries should normally be such that the missing items can be filled in without risk of a name conflict. The tag is a ZIP-file extra header whose ID is 0xFCDA, whose length is 4, and whose value is a positive 32-bit integer, with all three fields stored in little-endian byte order, the normal convention for ZIP files.
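A sketch of encoding and scanning the 0xFCDA record follows. It assumes the extra field is a sequence of ID/length/data records, as in ordinary ZIP files, so the repetition tag may follow other records such as the 0xFACE media-type record. `RepetitionExtra` is a hypothetical helper rather than the library's implementation; `decode` returns 1 when no tag is present, matching the default repetition count of 1.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the 0xFCDA repetition-count extra-field record.
final class RepetitionExtra {
    static final int REPETITION_ID = 0xFCDA;

    // Encodes one record: header ID, data length (4), then the count.
    static byte[] encode(int count) {
        if (count < 1) throw new IllegalArgumentException("count must be positive");
        return ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) REPETITION_ID)
                .putShort((short) 4)
                .putInt(count)
                .array();
    }

    // Scans an extra field (a sequence of ID/length/data records) and returns
    // the repetition count, or 1 (the default) when no 0xFCDA record exists.
    static int decode(byte[] extra) {
        if (extra == null) return 1;
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        int off = 0;
        while (off + 4 <= extra.length) {
            int id = buf.getShort(off) & 0xFFFF;
            int len = buf.getShort(off + 2) & 0xFFFF;
            if (off + 4 + len > extra.length) break;      // truncated record
            if (id == REPETITION_ID && len == 4) return buf.getInt(off + 4);
            off += 4 + len;                               // skip to next record
        }
        return 1;
    }

    public static void main(String[] args) {
        byte[] rec = encode(3);
        StringBuilder hex = new StringBuilder();
        for (byte b : rec) hex.append(String.format("%02X ", b));
        System.out.println(hex.toString().trim() + " -> " + decode(rec));
    }
}
```

Running `main` prints `DA FC 04 00 03 00 00 00 -> 3`: the header ID, the length, and the count, each in little-endian order.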

When a Zip Document file is created, several entry names are reserved: "META-INF/", "META-INF/counters", and "META-INF/repetitionMap". The entry "META-INF/counters" contains two 32-bit two's-complement integers in little-endian byte order. The first is the actual number of ZIP entries in the file, excluding those whose names start with "META-INF/". The second is the number of entries, again excluding those whose names start with "META-INF/", but including repetitions.
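The two counters can be illustrated with a small sketch. `CountersSketch` is hypothetical. Its `compute` method takes a map from entry names to repetition counts (an input shape assumed for illustration), skips names starting with "META-INF/", and counts each remaining entry once for the first value and `count` times for the second. `read` uses `InputStream.readNBytes(int)`, which requires Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch of the META-INF/counters payload: two little-endian
// 32-bit integers {actual entries, entries including repetitions}.
final class CountersSketch {
    static final String COUNTERS = "META-INF/counters";

    // Computes both counters from entry names mapped to repetition counts.
    static int[] compute(Map<String, Integer> repetitions) {
        int actual = 0;
        int withRepetitions = 0;
        for (Map.Entry<String, Integer> e : repetitions.entrySet()) {
            if (e.getKey().startsWith("META-INF/")) continue;  // excluded from both
            actual += 1;
            withRepetitions += e.getValue();
        }
        return new int[] {actual, withRepetitions};
    }

    static byte[] encode(int actual, int withRepetitions) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(actual).putInt(withRepetitions).array();
    }

    // Returns {actual, withRepetitions} decoded from the 8-byte payload.
    static int[] parse(byte[] data) {
        if (data.length < 8) throw new IllegalArgumentException("expected 8 bytes");
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return new int[] {buf.getInt(0), buf.getInt(4)};
    }

    // Reads the counters entry from an archive; null if the entry is absent.
    static int[] read(ZipFile zf) throws IOException {
        ZipEntry e = zf.getEntry(COUNTERS);
        if (e == null) return null;
        try (InputStream in = zf.getInputStream(e)) {
            return parse(in.readNBytes(8));                     // Java 11+
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            Map<String, Integer> reps = new LinkedHashMap<>();
            reps.put("META-INF/", 1);
            reps.put("frame1.png", 1);   // hypothetical entry names
            reps.put("frame2.png", 4);
            int[] k = compute(reps);
            int[] c = parse(encode(k[0], k[1]));
            System.out.println(c[0] + " " + c[1]);
            return;
        }
        try (ZipFile zf = new ZipFile(args[0])) {
            int[] c = read(zf);
            System.out.println(c == null ? "no counters entry" : c[0] + " " + c[1]);
        }
    }
}
```

With no arguments, `main` prints `2 5`; given a file name, it prints the counters stored in that archive.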

The reserved entry "META-INF/repetitionMap" is a US-ASCII file that uses CRLF as its line separator. Each line contains two values separated by a space: an entry name and the name of the actual entry that holds its data. Each name is URL-encoded, with UTF-8 as the character set for the unencoded name. The repetitionMap entry might not be present if the repetition count is 1 for all entries. A repetition count of 1 is the default value; the count includes the original entry. Entries whose repetition count is 1 do not appear in the repetitionMap entry.
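A sketch for reading and writing the repetition map follows. It assumes that the first name on each line is the repeated entry (the one not physically stored) and the second is the entry holding the data, and that the URL encoding is the `application/x-www-form-urlencoded` form produced by `java.net.URLEncoder`. `RepetitionMapSketch` and its methods are hypothetical; the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode` require Java 10 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch of the META-INF/repetitionMap format.
// Assumes URLEncoder/URLDecoder match the encoding the format uses.
final class RepetitionMapSketch {
    static final String MAP_ENTRY = "META-INF/repetitionMap";

    // Parses CRLF-terminated lines of the form "encodedName encodedActualName".
    static Map<String, String> parse(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String line : text.split("\r\n")) {
            if (line.isEmpty()) continue;
            String[] parts = line.split(" ");
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed line: " + line);
            }
            map.put(URLDecoder.decode(parts[0], StandardCharsets.UTF_8),
                    URLDecoder.decode(parts[1], StandardCharsets.UTF_8));
        }
        return map;
    }

    // Formats a map as US-ASCII text with CRLF line separators.
    static String format(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append(' ')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
              .append("\r\n");
        }
        return sb.toString();
    }

    // Name of the entry that actually stores the data for 'name'; entries
    // absent from the map (repetition count 1) store their own data.
    static String actualName(Map<String, String> map, String name) {
        return map.getOrDefault(name, name);
    }

    // Reads the map from an archive; empty if the entry is absent.
    static Map<String, String> read(ZipFile zf) throws IOException {
        ZipEntry e = zf.getEntry(MAP_ENTRY);
        if (e == null) return new LinkedHashMap<>();
        try (InputStream in = zf.getInputStream(e)) {
            return parse(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("frames/0002.png", "frames/0001.png");  // hypothetical names
        map.put("frames/0003.png", "frames/0001.png");
        String text = format(map);
        System.out.print(text);
        System.out.println(actualName(parse(text), "frames/0003.png"));
    }
}
```

Because `URLEncoder` encodes spaces as `+` and `/` as `%2F`, an encoded name never contains a space, so splitting each line at its single space is unambiguous.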