According to one embodiment of the present invention, a system includes at least one processor. The system partitions a document into a plurality of data blocks, wherein each data block comprises one or more complete logical units of the document. A plurality of sub-documents is produced from the plurality of data blocks. The sub-documents are processed in parallel by a plurality of processing elements. Embodiments of the present invention further include a method and a computer program product for processing a document in parallel in substantially the same manner as described above.
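
The following is a minimal illustrative sketch of the partition, sub-document, and parallel-processing steps summarized above; it is not the claimed implementation. It assumes, purely for illustration, that the document is a CSV text with a header row, that a "complete logical unit" is one row, and that each sub-document is formed by prepending the header to a data block. The names `partition`, `make_sub_documents`, `process`, and `process_in_parallel`, the `amount` column, and the use of a thread pool as the "processing elements" are all hypothetical.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def partition(body: str, max_block_chars: int) -> list[str]:
    """Split text into blocks of whole lines; a line is never cut in half.

    Assumption: one line (row) is one complete logical unit.
    """
    blocks, current, size = [], [], 0
    for unit in body.splitlines(keepends=True):
        # Close the current block if adding this unit would exceed the limit.
        if current and size + len(unit) > max_block_chars:
            blocks.append("".join(current))
            current, size = [], 0
        current.append(unit)
        size += len(unit)
    if current:
        blocks.append("".join(current))
    return blocks


def make_sub_documents(blocks: list[str], header: str) -> list[str]:
    """Turn each data block into a standalone sub-document by adding the header."""
    return [header + block for block in blocks]


def process(sub_document: str) -> int:
    """Example per-sub-document work: sum the hypothetical 'amount' column."""
    reader = csv.DictReader(io.StringIO(sub_document))
    return sum(int(row["amount"]) for row in reader)


def process_in_parallel(document: str, max_block_chars: int = 10,
                        workers: int = 4) -> list[int]:
    """Partition, build sub-documents, and process them concurrently."""
    header, _, body = document.partition("\n")
    blocks = partition(body, max_block_chars)
    sub_documents = make_sub_documents(blocks, header + "\n")
    # A thread pool stands in for the plurality of processing elements.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, sub_documents))


if __name__ == "__main__":
    doc = "id,amount\n1,10\n2,20\n3,30\n4,40\n"
    partial_sums = process_in_parallel(doc, max_block_chars=10)
    print(partial_sums, sum(partial_sums))
```

Because every block ends on a row boundary, each sub-document can be parsed on its own, and the per-block results can be combined afterwards (here, by summing the partial sums). A process pool or separate machines could replace the thread pool under the same partitioning scheme.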