Archive compression and modification code snippets

Creating new archives

There are two slightly different APIs for creating new archives:

The first API is designed to work with one particular archive format, such as Zip. The second API allows archive format independent programming.

Archive test structure

Some archive formats, such as GZip, only support compression of a single file, while others allow multiple files and folders to be compressed. To demonstrate how such archives can be created, a test file and folder structure is required. The following snippets use a static structure defined by the CompressArchiveStructure class:

##INCLUDE_SNIPPET(CompressArchiveStructure)

Creating archives with the archive format specific API

The archive format specific API provides easy access to the archive configuration methods (e.g. for setting the compression level). It also uses archive format specific item description interfaces (like IOutItemZip for Zip). Different archive formats support different archive item properties. These interfaces provide access to the properties supported by the corresponding archive format, while unsupported properties remain hidden.

Let's see how different archives can be created using the archive format specific API.

Creating Zip archive using archive format specific API

Creating a Zip archive using the archive format specific API was already presented in the "first steps". The key parts of the code are:

##INCLUDE_SNIPPET(CompressNonGenericZip)

If you run this program on Linux (substituting the platform-specific JAR for your system) with

$ java -cp ‹path-to-lib›/sevenzipjbinding.jar:‹path-to-lib›/sevenzipjbinding-Linux-amd64.jar:. \
       CompressNonGenericZip compressed.zip

you will get the output

##INCLUDE_OUTPUT(CompressNonGenericZip)

The archive file compressed.zip should be created. It contains the files and folders specified in the CompressArchiveStructure class.

$ 7z l compressed.zip
Listing archive: compressed.zip

--
Path = compressed.zip
Type = zip
Physical Size = 718

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-09-09 08:56:42 .....           16           16  info.txt
2015-09-09 08:56:42 .....          100          100  random-100-bytes.dump
2015-09-09 08:56:42 .....           38           38  dir1/file1.txt
2015-09-09 08:56:42 D....            0            0  dir2
2015-09-09 08:56:42 .....           38           38  dir2/file2.txt
------------------- ----- ------------ ------------  ------------------------
                                   192          192  4 files, 1 folders

Creating 7-Zip archive using archive format specific API

Creating a 7z archive is a little easier than creating a Zip archive. The main difference is the implementation of the MyCreateCallback.getItemInformation() method. 7z doesn't need the relatively complex calculation of the attribute property, because it provides a sensible default behavior.

##INCLUDE_SNIPPET(CompressNonGeneric7z)

For instructions on how to run the snippet and check the results, see Creating Zip archive using archive format specific API.
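To make the difference concrete, the kind of attribute calculation that a Zip item needs, and that 7z makes unnecessary, might look roughly like the sketch below. It is not taken from the snippets above and calls no 7-Zip-JBinding API: the class name and all constants are defined locally for illustration, on the assumption that the low 16 bits carry Windows-style attribute flags and the high 16 bits carry a Unix mode.

```java
/**
 * Sketch of an attribute calculation for Zip items.
 * All names and constants are local assumptions, not library API.
 */
final class ZipAttributeSketch {
    // Windows-style attribute bit marking a directory.
    static final int FILE_ATTRIBUTE_DIRECTORY = 0x10;
    // Flag signalling that the upper 16 bits hold a Unix mode.
    static final int FILE_ATTRIBUTE_UNIX_EXTENSION = 0x8000;

    // Unix file type bits, written in octal as in chmod.
    static final int S_IFREG = 0100000; // regular file
    static final int S_IFDIR = 0040000; // directory

    /** Combines the Windows bits (low 16) with a Unix mode (high 16). */
    static int attributes(boolean isDirectory) {
        int attr = FILE_ATTRIBUTE_UNIX_EXTENSION;
        int mode;
        if (isDirectory) {
            attr |= FILE_ATTRIBUTE_DIRECTORY;
            mode = S_IFDIR | 0755; // drwxr-xr-x
        } else {
            mode = S_IFREG | 0644; // -rw-r--r--
        }
        return attr | (mode << 16);
    }

    public static void main(String[] args) {
        System.out.printf("file:      0x%08X%n", attributes(false));
        System.out.printf("directory: 0x%08X%n", attributes(true));
    }
}
```

With 7z, a callback can usually leave this property unset and rely on the defaults.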

Creating Tar archive using archive format specific API

Creating tar archives is pretty much the same as creating 7z archives, since the default values for most properties are good enough in most cases. Note that the tar archive format does have an attribute property. However, due to the Unix nature of tar, it is named PosixAttributes, and the meaning of its bits is different.

##INCLUDE_SNIPPET(CompressNonGenericTar)

For instructions on how to run the snippet and check the results, see Creating Zip archive using archive format specific API.
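As a rough illustration of how POSIX attribute bits differ from Windows-style ones, the following sketch decodes a mode value into the notation printed by "ls -l". It assumes that PosixAttributes follows the usual Unix st_mode layout (file type in the upper bits, nine permission bits at the bottom); that layout, the class name and the constants are assumptions made for this example, not something the library documentation above states.

```java
/** Sketch: renders a POSIX mode value the way "ls -l" does. */
final class PosixModeSketch {
    static final int S_IFMT  = 0170000; // mask for the file type bits
    static final int S_IFDIR = 0040000; // directory
    static final int S_IFLNK = 0120000; // symbolic link

    static String describe(int mode) {
        StringBuilder sb = new StringBuilder();
        int type = mode & S_IFMT;
        sb.append(type == S_IFDIR ? 'd' : type == S_IFLNK ? 'l' : '-');
        String rwx = "rwx";
        // Walk the nine permission bits from owner-read (0400)
        // down to other-execute (0001).
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? rwx.charAt((8 - bit) % 3) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(0100644)); // regular file, rw-r--r--
        System.out.println(describe(0040755)); // directory, rwxr-xr-x
    }
}
```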

Creating GZip archive using archive format specific API

GZip is a stream archive format, meaning that it can only compress a single file. This simplifies the programming quite a bit. In the following snippet, a single message passed as the second command line parameter gets compressed. Note that, like non-stream archive formats, GZip also supports the optional Path and LastModificationTime properties for the single archive item.

##INCLUDE_SNIPPET(CompressNonGenericGZip)

For instructions on how to run the snippet and check the results, see Creating Zip archive using archive format specific API.
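The single-item nature of a stream format can also be seen outside 7-Zip-JBinding. The sketch below deliberately uses the JDK's own java.util.zip classes rather than the 7-Zip-JBinding API, so it is only an analogy: one message goes in, one compressed stream comes out, and there is no notion of multiple items. Unlike the snippet above, it sets neither a path nor a modification time. The class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch: GZip round trip of a single message with the JDK classes. */
final class GzipStreamSketch {
    static byte[] compress(String message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(message.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] packed = compress("Hello, GZip!");
        System.out.println(decompress(packed));
    }
}
```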

Creating BZip2 archive using archive format specific API

Like GZip, BZip2 is a stream archive format. It compresses a single archive item and supports no additional archive item properties at all.

##INCLUDE_SNIPPET(CompressNonGenericBZip2)

For instructions on how to run the snippet and check the results, see Creating Zip archive using archive format specific API.

Creating archives with the generic API

One of the great features of 7-Zip (and thus of 7-Zip-JBinding) is the ability to write archive format independent code that supports most or even all of the archive formats supported by 7-Zip. The following code snippet accepts the required archive format as the first parameter and compresses the test data into the specified archive format.

The key steps for writing generic compression code are shown in the following snippet:

##INCLUDE_SNIPPET(CompressGeneric)

Now you can run the snippet with different parameters to create archives in different formats. The last parameter specifies how many archive items from the CompressArchiveStructure should be compressed. This number should be between 1 and 5 for 7z, Zip and Tar, and must be 1 for the stream formats GZip and BZip2.

After these runs, a set of compressed_generic.* archives should exist, each with the corresponding contents.
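The rule for the item count parameter can be checked before any compression starts. The following sketch shows one way to do that. The Format enum is a hypothetical stand-in defined only for this example, not the library's own format type, and the limit of 5 simply mirrors the number of items in the test structure.

```java
/** Sketch: validates the item count parameter for a given format. */
final class ItemCountSketch {
    /** Hypothetical stand-in for the formats discussed on this page. */
    enum Format {
        SEVEN_ZIP(false), ZIP(false), TAR(false), GZIP(true), BZIP2(true);

        final boolean singleItemOnly; // true for stream formats

        Format(boolean singleItemOnly) {
            this.singleItemOnly = singleItemOnly;
        }
    }

    // Items in the test structure: 4 files and 1 folder.
    static final int STRUCTURE_SIZE = 5;

    static boolean isValidCount(Format format, int count) {
        if (format.singleItemOnly) {
            return count == 1; // stream formats hold exactly one item
        }
        return count >= 1 && count <= STRUCTURE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(isValidCount(Format.ZIP, 5));  // within range
        System.out.println(isValidCount(Format.GZIP, 2)); // too many for a stream format
    }
}
```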


Modifying existing archives

7-Zip-JBinding provides an API for archive modification. Especially for small changes, modifying an archive is much faster than extracting it and compressing it again. Like the compression API, the archive modification API offers archive format specific and archive format independent variants.

The following snippets show the steps of the modification process in detail, using the archive format independent API. The archive to be modified is one of the Zip, 7z or Tar archives created by the corresponding compression snippets on this page. The structure of those archives is specified in the CompressArchiveStructure class.

Altering existing archive items

The first snippet modifies one existing item with index 2 (info.txt, 16 bytes):

##INCLUDE_SNIPPET(UpdateAlterItems)

If you run this program on Linux (substituting the platform-specific JAR for your system) with

$ java -cp ‹path-to-lib›/sevenzipjbinding.jar:‹path-to-lib›/sevenzipjbinding-Linux-amd64.jar:. \
       UpdateAlterItems /testdata/snippets/to-update.7z updated.7z

you will get the output

##INCLUDE_OUTPUT(UpdateAlterItems)

Now let's look at the original and modified archives:

$ 7z l /testdata/snippets/to-update.7z
...

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-09-14 07:57:09 .....           38          159  dir1/file1.txt
2015-09-14 07:57:09 .....           38               dir2/file2.txt
2015-09-14 07:57:09 .....           16               info.txt
2015-09-14 07:57:09 .....          100               random-100-bytes.dump
2015-09-14 07:57:09 D....            0            0  dir2/
------------------- ----- ------------ ------------  ------------------------
                                   192          159  4 files, 1 folders
$ 7z l updated.7z
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-09-14 07:57:09 .....           38          151  dir1/file1.txt
2015-09-14 07:57:09 .....           38               dir2/file2.txt
2015-09-14 07:57:09 .....          100               random-100-bytes.dump
2015-09-14 07:57:09 .....           10           16  info2.txt
2015-09-14 07:57:09 D....            0            0  dir2/
------------------- ----- ------------ ------------  ------------------------
                                   186          167  4 files, 1 folders

As you can see, the file "info.txt" (16 bytes) was replaced with the file "info2.txt" (10 bytes).

Adding and removing archive items

Now let's see how archive items can be added and removed. Removing an archive item requires reindexing. In the previous snippet, each archive item had the same index in the old archive and in the new archive. After removing one item, however, all subsequent indexes in the new archive change and become smaller than the corresponding indexes in the old archive. Here is an example of removing an item C with index 2:

Index:          0      1      2      3      4
Old archive:   (A)    (B)    (C)    (D)    (E)
New archive:   (A)    (B)    (D)    (E)

Here the index of D in the old archive is 3, but in the new archive it is 2.
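The reindexing in this example can be sketched as a small mapping function from an index in the new archive back to the index in the old archive. The class and method names are made up for illustration. In a real update callback, this is the kind of translation that would be applied before looking up an item in the old archive.

```java
/** Sketch: maps new-archive indexes back to old-archive indexes. */
final class RemoveIndexSketch {
    /** Returns the old-archive index that a new-archive index refers to. */
    static int oldIndexFor(int newIndex, int removedIndex) {
        // Items before the removed one keep their index,
        // items after it have moved down by one.
        return newIndex < removedIndex ? newIndex : newIndex + 1;
    }

    public static void main(String[] args) {
        String[] oldArchive = {"A", "B", "C", "D", "E"};
        int removedIndex = 2; // item C
        for (int newIndex = 0; newIndex < oldArchive.length - 1; newIndex++) {
            int oldIndex = oldIndexFor(newIndex, removedIndex);
            System.out.println(newIndex + " -> " + oldArchive[oldIndex]);
        }
    }
}
```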

To add a new item, the item count passed to the IOutArchive.updateItems() method must be increased. In the callback, the new item with the new index (which doesn't map to any item in the old archive) should be initialized exactly as new items are initialized during a compression operation. The next snippet removes one item and adds a new one:

##INCLUDE_SNIPPET(UpdateAddRemoveItems)
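Independently of the library, the bookkeeping for a combined add and remove operation can be planned up front. The following sketch builds a table that tells, for every index in the new archive, which old item it corresponds to, or that it is a newly added item. The class name, the NEW_ITEM marker and the choice to append new items at the end are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch: plans the index mapping for removing and adding items. */
final class UpdatePlanSketch {
    /** Marker for indexes that have no counterpart in the old archive. */
    static final int NEW_ITEM = -1;

    static int[] plan(int oldCount, Set<Integer> removed, int addedCount) {
        int kept = 0;
        for (int old = 0; old < oldCount; old++) {
            if (!removed.contains(old)) {
                kept++;
            }
        }
        // The length of this table is the item count of the new archive.
        int[] newToOld = new int[kept + addedCount];
        int next = 0;
        for (int old = 0; old < oldCount; old++) {
            if (!removed.contains(old)) {
                newToOld[next++] = old;
            }
        }
        // New items are appended after all surviving old items.
        Arrays.fill(newToOld, next, newToOld.length, NEW_ITEM);
        return newToOld;
    }

    public static void main(String[] args) {
        // Five old items, remove index 2, add one new item.
        Set<Integer> removed = new HashSet<>(Arrays.asList(2));
        System.out.println(Arrays.toString(plan(5, removed, 1)));
    }
}
```

For an archive with five items, removing index 2 and adding one item keeps the item count at five, which matches the listings further down (4 files, 1 folder before and after).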

If you run this program on Linux (substituting the platform-specific JAR for your system) with

$ java -cp ‹path-to-lib›/sevenzipjbinding.jar:‹path-to-lib›/sevenzipjbinding-Linux-amd64.jar:. \
       UpdateAddRemoveItems ‹git›/testdata/snippets/to-update.7z updated.7z

you will get the output

##INCLUDE_OUTPUT(UpdateAddRemoveItems)

Now let's look at the original and modified archives:

$ 7z l /testdata/snippets/to-update.7z
...

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-09-14 07:57:09 .....           38          159  dir1/file1.txt
2015-09-14 07:57:09 .....           38               dir2/file2.txt
2015-09-14 07:57:09 .....           16               info.txt
2015-09-14 07:57:09 .....          100               random-100-bytes.dump
2015-09-14 07:57:09 D....            0            0  dir2/
------------------- ----- ------------ ------------  ------------------------
                                   192          159  4 files, 1 folders
$ 7z l updated.7z
...

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-09-14 07:57:09 .....           38          151  dir1/file1.txt
2015-09-14 07:57:09 .....           38               dir2/file2.txt
2015-09-14 07:57:09 .....          100               random-100-bytes.dump
2015-09-16 21:43:52 .....           11           16  data.dmp
2015-09-14 07:57:09 D....            0            0  dir2/
------------------- ----- ------------ ------------  ------------------------
                                   187          167  4 files, 1 folders

As you can see, the file info.txt is gone and the file data.dmp (11 bytes) appears in the archive.

Troubleshoot problems using tracing

One of the weak sides of the 7-Zip compression engine is its rather simple error reporting. If some provided data doesn't satisfy the compressor, it fails without any descriptive error message. One way to get a clue is to use the 7-Zip-JBinding tracing feature. Here is code that passes an invalid data size for item 1 and thus fails.

##INCLUDE_SNIPPET(CompressWithError)

If you run this program, you will get the following error message printed to System.err:

##INCLUDE_OUTPUT(CompressWithErrorErr)

The error message provides no useful information for finding the bug. But since the snippet enables tracing by calling IOutArchive.setTrace(true), the trace log gets printed to System.out.

##INCLUDE_OUTPUT(CompressWithError)

As you can see, the tracing stops just after 7-Zip retrieved the size of the data for item 1. This suggests that the value for the size of the data of item 1 may be causing the failure. In this small example, as in most other cases, this helps to find the problem.