Firebird News

Tuesday, May 17, 2005

Blob Compress -- Some Numbers

Jim Starkey wrote on firebird-architect (after a long
discussion about advantages of compressing blobs):

I took a "documents" table from one of my production databases and
crunched some numbers. The table had 1,377 blobs, summarized as follows:

MIMETYPE                       COUNT  Average Size  Average Compressed Size
-----------------------------  -----  ------------  -----------------------
application/msword               768        122767                    88694
application/octet-stream         420        108876                    82103
application/pdf                  153       1048402                   896624
application/vnd.lotus-wordpro      4         41423                    17888
application/vnd.ms-excel           3         79872                    20694
application/vnd.rn-realmedia       9       3583755                  3272342
application/x-macbinary            1         19968                     1702
image/gif                          3         37806                    37745
image/jpeg                         1        133523                   124465
image/pjpeg                        6        245316                   238838
text/html                          8         16459                     4528
text/plain                         1             1                        9
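
As a reading aid (not part of the original message), the per-type compression ratios implied by the table can be worked out with a few lines of Python. The figures are copied from the table; the names ROWS and summarize are invented for this sketch. Since the table lists averages that appear to be rounded, totals rebuilt from count times average only approximate the true aggregates.

```python
# Figures copied from the table: (mimetype, count, average size, average compressed size).
ROWS = [
    ("application/msword", 768, 122767, 88694),
    ("application/octet-stream", 420, 108876, 82103),
    ("application/pdf", 153, 1048402, 896624),
    ("application/vnd.lotus-wordpro", 4, 41423, 17888),
    ("application/vnd.ms-excel", 3, 79872, 20694),
    ("application/vnd.rn-realmedia", 9, 3583755, 3272342),
    ("application/x-macbinary", 1, 19968, 1702),
    ("image/gif", 3, 37806, 37745),
    ("image/jpeg", 1, 133523, 124465),
    ("image/pjpeg", 6, 245316, 238838),
    ("text/html", 8, 16459, 4528),
    ("text/plain", 1, 1, 9),
]


def summarize(rows):
    """Return per-type compressed/original ratios and estimated totals.

    The totals are estimates: they multiply each count by an average,
    so they can differ slightly from the exact aggregate sizes.
    """
    ratios = {mime: comp / orig for mime, _, orig, comp in rows}
    total_orig = sum(count * orig for _, count, orig, _ in rows)
    total_comp = sum(count * comp for _, count, _, comp in rows)
    return ratios, total_orig, total_comp


if __name__ == "__main__":
    ratios, total_orig, total_comp = summarize(ROWS)
    # Best-compressing types first (smallest ratio).
    for mime, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{mime:30s} {ratio:6.3f}")
    print(f"estimated overall ratio: {total_comp / total_orig:.3f}")
```

Read this way, formats that are already compressed (GIF, JPEG, RealMedia and, to a lesser degree, PDF) barely shrink, while HTML and Excel files shrink to roughly a quarter of their size.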

The aggregate size of the blobs was 334,948,746 bytes. The blobs represent
whatever the government workers in the city of Amesbury, Massachusetts
thought was worth sharing. Normal content is managed in Word, which
explains the heavy skew. The Word documents had a total of about 1200
images, mostly jpegs.

I compressed each blob with zlib using default settings, writing both
the original and the compressed versions to a new table. The aggregate
size of the compressed blobs was 271,077,508 bytes.
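
The compression step can be sketched in Python, whose standard zlib module wraps the same zlib library. This is only an illustration under stated assumptions, not the code used for the measurements: the helper names compress_blob and compression_ratio are hypothetical, and "default settings" is taken to mean zlib's default compression level.

```python
import zlib


def compress_blob(data: bytes) -> bytes:
    """Compress one blob with zlib at its default compression level."""
    return zlib.compress(data)


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Compressed size as a fraction of the original size."""
    return compressed_size / original_size


if __name__ == "__main__":
    # A one-byte blob grows, because the zlib format adds a 2-byte header
    # and a 4-byte Adler-32 checksum around the deflate data.
    tiny = b"x"
    packed = compress_blob(tiny)
    assert zlib.decompress(packed) == tiny  # round trip is lossless
    print(len(tiny), "->", len(packed))

    # Overall ratio from the aggregate sizes quoted in the message.
    print(f"{compression_ratio(334_948_746, 271_077_508):.3f}")
```

The aggregate figures work out to a ratio of roughly 0.81, a saving of about 19 percent. The fixed framing overhead also accounts for the text/plain row in the table, where a 1-byte blob compresses to 9 bytes.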

I started a Netfrastructure server from scratch and fetched all
uncompressed blobs from the new table. I then restarted the server and
fetched and decompressed all compressed blobs. The elapsed time for
fetching the uncompressed blobs was about 64 seconds; the elapsed time
for fetching and decompressing the compressed blobs was about 58 seconds.
