JTE - Jigdo Template Export




Jigdo is a useful tool to help in the distribution of large files like CD and DVD images. See Richard Atterer's site for more details. Debian CD and DVD ISO images are published on the web in jigdo format so that end users can download them more efficiently.

Jigdo is generic and powerful - it can be used for any large files that are made up of smaller files. However, this generality comes at a cost. Creating jigdo files from ISO images is quite inefficient - to work out which files are included in the ISO image, jigdo has to calculate and compare checksums of every possible file and every extent in the image. Essentially it has to brute-force the image. On my home system, it can take several hours to do this for a 4.5GB DVD image.

There are a few ways to improve this that I can see:

  1. Modify jigdo so it knows about the internals of ISO images and can efficiently scan them (bad, not very generic for jigdo)
  2. Write a helper tool to dump extra information for jigdo to use alongside the ISO image (helper tool written, but modifying jigdo to use this looks HARD)
  3. Patch genisoimage to write .jigdo and .template files alongside the ISO image

I've now done the third of these, and called it JTE (or Jigdo Template Export). The code works fine, and runs in a very small fraction of the time taken to run genisoimage and jigdo separately. The output .jigdo and .template files work correctly, i.e. jigdo-file and the wrapper script jigdo-mirror accept them and will generate an ISO image that exactly matches the original.

Current versions of JTE now also come with some extra tools:

mkimage, a simple and very fast tool to reconstruct image files from .jigdo and .template files. It doesn't have any logic to cope with downloading missing files, but will list the missing files that are needed. It is also much faster for people (like me!) who already have full local mirrors.

iso-image.pl is a CGI script to wrap around mkimage if you'd like to be able to offer images for HTTP download without using up multiple gigabytes of disk space. For added network efficiency, the Perl CGI also supports HTTP/1.1 byte ranges so clients can resume aborted downloads.

jigit is a first attempt at a user-friendly wrapper for mkimage on a user's machine.

parallel-sums is a simple extra utility to generate checksums quickly and efficiently, reading file data only once and calculating checksums using multiple algorithms in parallel.
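The single-read idea behind parallel-sums can be sketched in a few lines of Python. This is only an illustration, not the actual parallel-sums code: the function name, the default algorithm list and the chunk size are all assumptions, and the hashes here are updated one after another per chunk rather than truly concurrently.

```python
import hashlib


def parallel_sums(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Read the file once and feed every chunk to all of the hash objects."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Each block of file data is read from disk only once,
            # then passed to every checksum algorithm in turn.
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Calling parallel_sums("/home/steve/test1.iso") would return a dictionary mapping each algorithm name to its hex digest. The saving comes from the disk I/O, which is done once instead of once per algorithm.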

The addition of these extra tools means that I'm now distributing JTE as a source tarball and Debian packages rather than just a genisoimage patch.

NEW: The packages and source are now named jigit to match the wrapper script.



Download

Jigit has now been uploaded and accepted into the Debian archive, so you should get binary packages from there (current is 1.20-1). Source and backported versions are in the download area alongside the current ChangeLog. If you want to verify the integrity of the source then check the md5sum against the .asc there. Older versions were signed with my GPG key ID 88C7C1F7, and newer ones with my new 4096R key 3442684E.

jigit is also maintained in git; see http://git.einval.com/cgi-bin/gitweb.cgi?p=jigit.git. Current development is on the "stable" branch, for historical reasons.


How to use JTE

To use the jigdo creation code, specify the location of the output .jigdo and .template files alongside the ISO image. You can also specify the minimum size beneath which files will just be dropped into the binary template file data rather than listed as separate files to be found on the mirror, and exclude patterns to ignore certain files in the same way. And paths in the original filesystem can be mapped onto more global namespaces using the [Servers] section in the .jigdo file. For example:

genisoimage -J -r -o /home/steve/test1.iso \
        -jigdo-jigdo /home/steve/test1.jigdo \
        -jigdo-template /home/steve/test1.template \
        -jigdo-min-file-size 16384 \
        -jigdo-ignore "README*" \
        -jigdo-force-md5 "/pool/" \
        -jigdo-map Debian=/mirror/debian \
        -md5-list /home/steve/md5.list

If the -jigdo-* options are not used, the normal genisoimage execution path is not affected at all. The above invocation will create 3 output files (.iso, .jigdo and .template). Multiple -jigdo-ignore and -jigdo-map options are accepted, for multiple ignore and map patterns.

Use the -md5-list option to specify the location of a list of files and their md5sums in normal md5sum format. genisoimage will then compare the checksum of each file it is asked to write against the checksum of that file in the list. It will abort on any mismatches. The MD5 list file must list all the files that are expected to be found and listed in the output .jigdo file. The -jigdo-force-md5 option specifies a path where all files are expected to have an MD5 entry (e.g. /pool/). Then if any files do not have a match, they must have been corrupted and genisoimage will abort.
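The checking logic described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the genisoimage code: the function names are hypothetical, the list is assumed to be plain md5sum output keyed by the same path that is being checked, and the -jigdo-force-md5 path is treated as a simple substring match.

```python
import hashlib


def load_md5_list(path):
    """Parse md5sum-style lines: '<hex digest>  <filename>'."""
    sums = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            digest, name = line.split(None, 1)
            # md5sum marks binary-mode entries with a leading '*'
            sums[name.lstrip("*")] = digest.lower()
    return sums


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_file(path, md5_list, force_path=None):
    """Return True if the file may be listed in the .jigdo file.

    Raises ValueError where the text above says genisoimage would abort:
    on a checksum mismatch, or on a missing entry under the forced path.
    """
    expected = md5_list.get(path)
    if expected is None:
        if force_path is not None and force_path in path:
            raise ValueError(f"{path}: no MD5 entry, but one is required")
        return False  # not in the list, so its data goes into the template
    if file_md5(path) != expected:
        raise ValueError(f"{path}: MD5 mismatch")
    return True
```

A file with no entry outside the forced path is simply not listed, while a mismatch or a missing entry under the forced path stops the run.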

More options have been added from version 1.2 onwards so that you can specify the location of boot files within the ISO image. Previously the four architectures alpha, hppa, mips and mipsel needed separate tools to make an ISO image bootable. This also made life very hard when trying to produce jigdo files. Now that code is in genisoimage, life is much easier!

-alpha-boot <FILE> Specify the location of the boot image (relative to the root of the ISO image)
-hppa-cmdline <CMDLINE> Specify the hppa boot command line. Separate elements with commas or spaces.
-hppa-kernel-32 <FILE> Specify the location of the 32-bit boot image (relative to the root of the ISO image)
-hppa-kernel-64 <FILE> Specify the location of the 64-bit boot image (relative to the root of the ISO image)
-hppa-bootloader <FILE> Specify the location of the bootloader code (iplboot, relative to the root of the ISO image)
-hppa-ramdisk <FILE> Specify the location of the ramdisk (relative to the root of the ISO image)
-mips-boot <FILE> Specify the location of the boot image (relative to the root of the ISO image)
-mipsel-boot <FILE> Specify the location of the boot image (relative to the root of the ISO image). Mipsel is awkward because we have to parse the ELF header of the boot loader and write some locations from it into the boot sector. Ick!


How JTE works

I've hooked all the places in genisoimage where it will normally write image data. All the normal data write calls (directory entries etc.) I simply copy through and build into the template file. Any file data entries are instead passed through with information about the original file. If that file is large enough (see -jigdo-min-file-size above), I grab the filename and the MD5 of the file's data. If that MD5, size and length match an entry in the md5-list, I can just write a file match record into the template file (and then the jigdo file) instead of the file data itself.
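The decision made at each write hook can be sketched in Python like this. It is a simplified model and not the real genisoimage patch: the record classes, the function name and the in-memory md5_list dictionary (MD5 digest mapped to file size) are all hypothetical, the real template is a binary file rather than a list, and sector padding is ignored.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RawData:
    """Bytes stored directly in the template (directory entries, small files)."""
    data: bytes


@dataclass
class FileMatch:
    """Reference to a file that can be fetched from a mirror instead."""
    name: str
    size: int
    md5: str


def build_template(writes, md5_list, min_file_size=16384):
    """writes: iterable of (data, filename); filename is None for non-file data.
    md5_list: dict mapping an MD5 hex digest to the expected file size.
    """
    records = []
    image_md5 = hashlib.md5()  # first checksum: the whole image
    for data, filename in writes:
        image_md5.update(data)
        if filename is not None and len(data) >= min_file_size:
            digest = hashlib.md5(data).hexdigest()  # second checksum: this file
            if md5_list.get(digest) == len(data):
                records.append(FileMatch(filename, len(data), digest))
                continue
        # Anything else is copied through; adjacent raw chunks are merged.
        if records and isinstance(records[-1], RawData):
            records[-1].data += data
        else:
            records.append(RawData(data))
    return records, image_md5.hexdigest()
```

In this model every block of file data is hashed twice, once for the image as a whole and once for the individual file.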


How fast is JTE?

On my laptop (600MHz P3, slow laptop disk) I can make a template file in parallel with the ISO image from a typical 500MB data set in about 2 minutes. By simply not creating the ISO (-o /dev/null), this time halves. The data set I'm using here is a copy of the woody i386 r2 update CD, as it's a handy image I had lying around.

On my faster home server machine (1.7GHz Athlon, 512MB RAM, fast SCSI disks), I can produce a 7GB DVD iso image with the jigdo and template files in about 8 minutes. A debian-cd run from start to finish to create DVD images takes about 25 minutes per architecture.

Genisoimage is normally I/O-bound on this system, but when running the jigdo creation code it becomes CPU-bound, because it runs 2 MD5 checksums on each data block that it sees. To boost performance when creating images on a large SMP machine, running several copies of debian-cd in parallel should parallelise nicely - ideally run the CD and DVD versions of each architecture together to get maximum benefit from the dentry and page cache.


How to use mkimage

mkimage is a faster, local-only version of "jigdo-file make-image", again written in portable C. It takes a few options:

-f <MD5 file> Specify a file containing MD5sums for files we should attempt to use when rebuilding the image
-j <jigdo file> Specify the input jigdo file
-t <template file> Specify the input template file
-m <item=path> Map <item> to <path> to find the files in the mirror
-M <Missing file> Don't attempt to build the image; just verify that all the components needed are available. If some are missing, list them in the specified file.
-v Make the output logging more verbose.
-l <log file> Specify a logfile. If not specified, will log to stderr just like genisoimage
-q Don't bother checking md5sums of the input files, or of the output image. WARNING: this may lead to corrupt images, but is much faster.
-s <start offset> Specify where to start in the image (in bytes). If not specified, will start at the beginning (offset 0). Added for iso-image.pl use
-e <end offset> Specify where to end in the image (in bytes). If not specified, will run all the way to the end of the image. Added for iso-image.pl use
-z Don't attempt to reassemble the image; simply parse the image descriptor in the template file and print the image size. Added for iso-image.pl use

Specifying a start or end offset implies -q - it's difficult to check MD5 sums if the full image is not generated!
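To show how the -m mapping and the -s/-e offsets fit together, here is a rough Python model of reassembling part of an image. It is not the mkimage implementation: the record tuples, the function names and the "Label:relative/path" part naming are assumptions made for illustration, and the end offset is treated as exclusive here, which may not match what -e actually does.

```python
import os


def parse_map(arg):
    """Split an 'item=path' argument such as 'Debian=/mirror/debian'."""
    item, _, path = arg.partition("=")
    if not item or not path:
        raise ValueError(f"expected item=path, got {arg!r}")
    return item, path


def resolve(part, mapping):
    """Turn 'Debian:pool/x.deb' into a local path using the mapping."""
    label, _, rel = part.partition(":")
    return os.path.join(mapping[label], rel)


def write_range(records, mapping, out, start=0, end=None):
    """records: list of ('data', bytes) or ('file', 'Label:rel/path', size).

    Writes the image bytes in the half-open range [start, end) to 'out'.
    """
    offset = 0  # position of the current record within the full image
    for record in records:
        kind = record[0]
        size = len(record[1]) if kind == "data" else record[2]
        lo = max(start, offset)
        hi = offset + size if end is None else min(end, offset + size)
        if lo < hi:  # this record overlaps the requested range
            if kind == "data":
                out.write(record[1][lo - offset:hi - offset])
            else:
                with open(resolve(record[1], mapping), "rb") as f:
                    f.seek(lo - offset)
                    out.write(f.read(hi - lo))
        offset += size
```

Because only the overlapping records are opened, a request for a small slice near the end of the image does not need to read the files that come before it. It also shows why a partial run cannot verify the whole-image checksum.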


How to use iso-image.pl

iso-image.pl is a small perl wrapper script written to drive mkimage and turn it into a CGI. It will parse the incoming request (including byte-ranges) and call mkimage to actually generate the image pieces wanted.
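The byte-range handling can be illustrated with a small Python function that turns an HTTP Range header into a start and end offset. This is a sketch rather than a translation of iso-image.pl: the function name is hypothetical, only a single range is handled, and the returned end offset is exclusive, which would need adjusting if mkimage expects an inclusive value for -e.

```python
def parse_byte_range(header, image_size):
    """Return (start, end) with end exclusive, or None if there is no usable range."""
    if not header or not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):]
    if "," in spec:
        return None  # multiple ranges are not handled in this sketch
    first, sep, last = spec.partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form 'bytes=-N': the last N bytes of the image
            length = int(last)
            if length <= 0:
                return None
            start = max(image_size - length, 0)
            end = image_size
        else:
            start = int(first)
            # The last byte position in HTTP is inclusive, so add one
            end = image_size if last == "" else min(int(last) + 1, image_size)
    except ValueError:
        return None
    if start >= end:
        return None
    return start, end
```

A client resuming an aborted download typically sends the open-ended form, for example "bytes=500-", which this function maps to everything from offset 500 up to the image size (obtainable from mkimage -z).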

Configuration is simple: place iso-image.pl in a cgi-bin directory and set the various paths in the script.


How to use jigit

jigit will automatically download the files needed to create a Debian/Ubuntu CD, using as many files available locally as possible before downloading any that are missing.

Configure jigit by editing /etc/jigit.conf or $HOME/.jigit.conf. The two settings that matter are:

HOST The base URL of the jigit update site.
TMPDIR Where jigit should store its temporary files. These may take up a LOT of space!

Then run jigit <CD name> to grab the CD you need. If you have any local files that may reduce the amount you need to download (e.g. from an earlier CD or a local mirror), tell jigit where to find them. It will automatically look in your apt cache too.

It should look something like this:

sledge:~$ jigit wibble
Downloading config:   http://tack/jigit/wibble.conf
Downloading jigdo:    http://tack/jigit/wibble.jigdo
Downloading template: http://tack/jigit/wibble.template
If you have a mirror, or any previous CD or CD image(s) available,
where are they mounted?
Say "none" if you have none; separate multiple entries with spaces
> [none] 

Checking MD5 sums of files in /var/cache/apt/archives:
Checking MD5 sums of files in /mirror/jigit-test/jigit/files:

Unable to recreate image from template file /mirror/jigit-test/jigit/jigdo/wibble.template
/mirror/jigit-test/jigit/jigdo/missing-list contains the list of missing files
Need to download 739 files to complete the image
    0 files missing; all needed files available                              

Image should be 467847168 bytes
Image MD5 should be 90bc9f792371c5c0dae185450d8e9f23
Creating ISO image /mirror/jigit-test/jigit/jigdo/wibble.iso
 100.00%  template data                                               
Output image MD5 is 90bc9f792371c5c0dae185450d8e9f23
Output image length is 467847168
Image created successfully in /mirror/jigit-test/jigit/jigdo/wibble.iso


External integration

The current released version of debian-cd in etch supports JTE out of the box. Wodim ships with integrated JTE code too.


What's left to do?

  1. Testing! :-) This is where you lot come in! Please play with this some more and let me know if you have any problems, especially with data corruption.
  2. More documentation.
  3. Support for non-local mirrors in mkimage.