TODOs:
  - PEP-8 conformance.
  - Better text extraction / layout analysis.
  - Better API documentation.
  - Robust error handling.
  - Any special handling for linearized PDFs?
  - Handle crypt filters. (More sample documents are needed!)