Enhancement to MQMR

Capitalware has an MQ solution called MQ Message Replication (MQMR).

MQ Message Replication will clone messages being written (via MQPUT or MQPUT1 API calls) to an application’s output queue, and MQMR will write the exact same messages to ‘n’ target queues (‘n’ can be up to 100). When MQMR replicates a message, both the message data and the message’s MQMD structure are cloned. This means that the fields of the MQMD structure (i.e. PutTime, MessageId, CorrelId, UserId, etc.) will be exactly the same as in the original message’s MQMD structure.

MQMR includes two auxiliary programs:

  • The MQ Queue To SQLite DB (MQ2SDB) program offloads MQ messages to an SQLite database.
  • The SQLite DB To MQ Queue (SDB2MQ) program loads SQLite database rows into messages in an MQ queue.

The SQLite databases created by the MQ2SDB program can grow extremely large when thousands or tens of thousands of messages are offloaded to them. A quick solution would be to run a nightly job that compresses/zips the previous day’s SQLite databases to free up disk space, or to move the SQLite databases to a different file system.

I had a thought: why not add an option to the MQ2SDB program to compress the message data before it is written to the SQLite database, and add code to the SDB2MQ program to decompress the data when it is put to a queue?
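The idea can be sketched in a few lines. This is an illustrative Python sketch (not MQ2SDB’s actual C code, and the table and column names are invented for the example): compress on the offload side, store the result as a BLOB, and decompress on the load side before the message would be put back to a queue.

```python
import sqlite3
import zlib

# Illustrative schema, not MQ2SDB's actual one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (msg_id INTEGER PRIMARY KEY, data BLOB)")

message = b"<order><item>widget</item></order>" * 100  # sample payload

# Offload path (MQ2SDB side): compress, then insert as a BLOB.
conn.execute("INSERT INTO messages (data) VALUES (?)", (zlib.compress(message),))
conn.commit()

# Load path (SDB2MQ side): read the BLOB and decompress before the MQPUT.
(blob,) = conn.execute("SELECT data FROM messages").fetchone()
restored = zlib.decompress(blob)

assert restored == message
print(f"original {len(message)} bytes, stored {len(blob)} bytes")
```

The message round-trips byte-for-byte, and the repetitive XML stores in a fraction of its original size.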

I did a fair amount of research, and compression algorithms are almost as complex as encryption algorithms. They are also far more dependent on the data than encryption algorithms are: the type of data and the structure of the data dictate how well and how fast a compression algorithm will perform.

I decided it was best to add a variety of lossless compression algorithms, so that end-users can select the compression algorithm that best fits their data.

The MQ2SDB program supports the following 8 lossless compression algorithms:

  • LZ1 (aka LZ77) – I used Andy Herbert’s modified version with a pointer length bit-width of 5.
  • LZ4 – It is promoted as extremely fast (which it is).
  • LZW – I used Michael Dipperstein’s implementation of Lempel-Ziv-Welch.
  • LZMA Fast – I used the LZMA SDK from 7-Zip with a Level set to 4.
  • LZMA Best – I used the LZMA SDK from 7-Zip with a Level set to 5.
  • RLE – Run Length Encoding – I wrote the code from pseudo code – very basic stuff.
  • ZLIB Fast – I used Rich Geldreich’s miniz implementation of ZLIB with a Level of Z_BEST_SPEED.
  • ZLIB Best – I used Rich Geldreich’s miniz implementation of ZLIB with a Level of Z_BEST_COMPRESSION.
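The “Fast” vs. “Best” pairs above are the same algorithm run at different levels. Here is a rough sketch of that trade-off using Python’s standard-library bindings as stand-ins for miniz and the LZMA SDK (Python’s LZMA presets are not identical to the SDK’s levels 4 and 5, so the numbers are only illustrative):

```python
import lzma
import zlib

data = b"<msg><field>value</field></msg>" * 1000  # sample repetitive payload

zlib_fast = zlib.compress(data, zlib.Z_BEST_SPEED)        # level 1
zlib_best = zlib.compress(data, zlib.Z_BEST_COMPRESSION)  # level 9
lzma_fast = lzma.compress(data, preset=4)                 # stand-in for "Fast"
lzma_best = lzma.compress(data, preset=5)                 # stand-in for "Best"

for name, out in [("ZLIB Fast", zlib_fast), ("ZLIB Best", zlib_best),
                  ("LZMA Fast", lzma_fast), ("LZMA Best", lzma_best)]:
    print(f"{name}: {len(data)} -> {len(out)} bytes "
          f"({len(data) / len(out):.2f} to 1)")
```

Higher levels spend more CPU time searching for matches in exchange for a smaller output, which is exactly the LZMA Fast/Best and ZLIB Fast/Best split in the list above.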

So, how do you know which compression algorithm is best for the end-user’s data? Well, to take the guesswork out of it, I wrote a simple program called TESTCMPRSN. It applies all 8 compression algorithms to a file and displays the results.

Here’s an example of TESTCMPRSN program being run against a 2.89MB XML file:

C:\test>testcmprsn.exe msg5.xml
testcmprsn version 0.0.1 (Windows64) {Sep  2 2020}

msg5.xml size is 3034652 (2.89MB)
Time taken to perform memcpy() is 1.0757ms

Algorithm               Compressed      Compression     Compression     Decompression
                           Size         Time in ms        Ratio           Time in ms
LZ1                 375173 (366.38KB)     541.6782       8.09 to 1          5.6972
LZ4                 140692 (137.39KB)       4.9557      21.57 to 1          1.3401
LZMA Fast            75967 (74.19KB)       49.4750      39.95 to 1         10.7603
LZMA Best            71453 (69.78KB)      463.8315      42.47 to 1         10.7566
LZW                 186484 (182.11KB)      76.0163      16.27 to 1         19.8878
RLE                4054366 (3.87MB)         8.1609       0.75 to 1          9.4421
ZLIB Fast           151404 (147.86KB)      15.3561      20.04 to 1          6.8379
ZLIB Best            84565 (82.58KB)       60.6147      35.89 to 1          6.0363
testcmprsn is ending.

Clearly, LZMA Best crushed it. It reduced a 2.89MB file to just 69.78KB, but at a cost of 463.8315 milliseconds. A better option for that type of data is to use LZMA Fast, but if speed is what you want then LZ4 is by far the better choice.

As a benchmark, the TESTCMPRSN program performs a memcpy() of the data, so that the end-user can compare each compression algorithm’s compression time against the memcpy() time.

As they say: your mileage will vary. The only way to know which compression algorithm will work best for your data is to test it. Note: RLE should only be used with alphanumeric data (plain text) that has repeating characters and never with binary data.
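A minimal RLE encode/decode (not MQMR’s actual RLE code) makes the warning concrete: each run becomes a (count, byte) pair, so data with no runs doubles in size, which is exactly why RLE produced a 0.75 to 1 “ratio” on the XML file above.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run of repeated bytes as a (count, byte) pair."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(packed: bytes) -> bytes:
    """Expand each (count, byte) pair back into a run."""
    out = bytearray()
    for i in range(0, len(packed), 2):
        out += packed[i + 1:i + 2] * packed[i]
    return bytes(out)

text = b"AAAAAAAAAABBBBBBBBBB"  # long runs: 20 bytes encode to 4
binary = bytes(range(20))       # no runs: 20 bytes encode to 40

assert rle_decode(rle_encode(text)) == text
print(len(rle_encode(text)), len(rle_encode(binary)))  # prints: 4 40
```

Run-heavy text shrinks 5 to 1 while run-free binary data doubles, so RLE is only worth selecting when you know the data has repeating characters.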

I have completed a wide variety of tests and everything looks good.

If anyone would like to test out the latest release, then send an email to support@capitalware.com.

Regards,
Roger Lacroix
Capitalware Inc.
