TODOs:

- PEP-8 conformance.
- Better text extraction / layout analysis.
- Better API Documentation.
- Robust error handling.
- Any special handling for linearized PDFs?
- Handle crypt filter. (More sample documents are needed!)