Thursday, December 29, 2011

Pig: storing data in compressed format

It's really easy to store data in compressed bz2 format with Pig. All you need to do is add a '.bz2' extension to the name of the output directory. This works with store functions that support compression, such as the built-in PigStorage. For example:
STORE data into 'compressedoutput.bz2' USING SomeStore();


Taadaa!!! You are done!
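For the curious: the codec is picked from the suffix of the output location, and the part files inside that directory typically carry the codec's extension too (e.g. part-m-00000.bz2). Here is a small Python sketch that only mimics that idea outside Hadoop; `store`, `CODECS` and the part-file naming are my own illustrative assumptions, not Pig's actual implementation.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical suffix -> opener table. It only mimics the idea that the
# output location's extension selects the codec; this is not Pig's code.
CODECS = {".bz2": bz2.open, ".gz": gzip.open}


def store(records, output_dir, part=0):
    """Write tab-separated records to one part file and return its path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"part-m-{part:05d}")
    opener = open  # no recognised suffix: write plain text
    for suffix, codec_open in CODECS.items():
        if output_dir.endswith(suffix):
            path += suffix  # the part file gets the codec's extension
            opener = codec_open
            break
    with opener(path, "wt") as f:
        for rec in records:
            f.write("\t".join(map(str, rec)) + "\n")
    return path


# Usage: the directory name mirrors the Pig example above.
with tempfile.TemporaryDirectory() as tmp:
    part = store([("alice", 1), ("bob", 2)], os.path.join(tmp, "compressedoutput.bz2"))
    with open(part, "rb") as raw:
        magic = raw.read(3)  # every bzip2 stream starts with b"BZh"
    with bz2.open(part, "rt") as f:
        content = f.read()

print(os.path.basename(part), magic, repr(content))
```

Ending the directory name in '.gz' instead makes the same helper write gzip, which mirrors Pig's support for both extensions.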

Another advantage of bz2 is that the format is splittable. So if a subsequent MapReduce job reads this data, each compressed file can be split across several map tasks instead of being read by a single one (as happens with gzip), giving more parallelism.
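Why can bz2 be split when gzip can't? A bzip2 stream is a sequence of independently compressed blocks, each starting with the 48-bit marker 0x314159265359, so a reader dropped into the middle of a file can scan forward to the next marker and start decompressing from there. As far as I know, Hadoop's bzip2 codec relies on this idea. Below is a minimal Python sketch of the scanning step only; `block_offsets` is a hypothetical helper, the test data is synthetic, and the sketch does not decompress the blocks it finds.

```python
import bz2
import random

# 48-bit marker that opens every bzip2 block (the BCD digits of pi).
BLOCK_MAGIC = 0x314159265359


def block_offsets(data):
    """Return the bit offsets where bzip2 block markers start.

    Blocks are bit-aligned, not byte-aligned, so we search a bit string.
    """
    bits = "".join(f"{b:08b}" for b in data)
    pattern = f"{BLOCK_MAGIC:048b}"
    offsets = []
    pos = bits.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = bits.find(pattern, pos + 1)
    return offsets


# ~350 kB of pseudo-random text; compresslevel=1 uses ~100 kB blocks,
# so the compressed stream contains several independent blocks.
rng = random.Random(42)
text = "".join(rng.choice("abcdefghij ") for _ in range(350_000)).encode()
compressed = bz2.compress(text, compresslevel=1)
offsets = block_offsets(compressed)

# Pretend a map task's split starts a quarter of the way into the file:
# it skips forward to the first block marker at or after that point.
split_start_bit = (len(compressed) // 4) * 8
first_block_in_split = next(o for o in offsets if o >= split_start_bit)

print(f"{len(offsets)} blocks, first at bit {offsets[0]}, "
      f"split resumes at bit {first_block_in_split}")
```

A real reader also has to finish the block that straddles the end of its split; this sketch leaves that part out.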

Awesomely easy, isn't it?!