Reading .pdf files comfortably on Kindle

Lately I have been playing around with the idea of getting an e-book reader. I read so many papers, academic books and novels that it would be a real lifesaver to keep some of them on a gadget made just for that. As far as academic papers are concerned, it could save whole forests, since I would no longer have to print them out in order to read them.

Just as I was thinking about this, a certain special somebody gave me the Kindle Paperwhite pictured below as a gift, one of the best and most appropriately targeted gifts I have ever received.

[Image: A Kindle Paperwhite device]

Reading e-books on a Kindle Paperwhite is a breeze and an amazing experience. You can read under almost any lighting conditions, and the battery lasts so long that running out is rarely a concern. Moreover, the text reflows to fit the screen of your e-reader, and you can adjust the font size to suit your preferred reading style. PDF files, unfortunately, are not nearly as easy to work with on an e-reader.

All the academic papers that I know of and am interested in reading come in .pdf format, and, as I said above, .pdf is really not the most suitable format for an e-reader. In my experience, the Kindle Paperwhite does let you pinch-zoom, but it is so hard to reach the zoom level you want that, in my opinion, it just isn't worth it. You can always use popular software like Calibre to convert your documents from .pdf to a more e-reader-friendly format, but in my experience the end result is not very good and does not make for a pleasant reading experience.

After a little research around the web I came across this very informative post, which analyzes all the options we have for reading .pdf files on an e-reader. Here I would like to build on it by sharing my successful experience with the last tool he presents in his post, k2pdfopt.

k2pdfopt is a command-line tool that turns any .pdf into another .pdf that is adjusted and resized for the screen dimensions of your particular e-reader device. It can also use OCR libraries to recognize the text of the .pdf, which lets you take notes on it, highlight the text or even look up words in the dictionary. Let's see the steps you have to follow to make it work for you.

  • Download k2pdfopt: First of all, go and download k2pdfopt from the download link, choosing the build for your system. It is available for Windows, Linux and Mac OS X!
  • Download Tesseract OCR: You can skip this step if you don't need to highlight text, recognize it and use the dictionary, but why wouldn't you want to? It's very simple to accomplish. Go to the Tesseract download page and choose the appropriate language data file for your language. For example, in my case, at the time of writing (version 3.02 of Tesseract was the latest), I downloaded “tesseract-ocr-3.02.eng.tar.gz” for the English language data and “tesseract-ocr-3.02.jpn.tar.gz” for the Japanese OCR data.
  • Installing Tesseract OCR: Put all the language data inside a directory on your computer. Let's assume that this directory is Path/To/Tesseract/. Next, depending on your operating system, create an environment variable called TESSDATA_PREFIX and set it to the directory where you keep the downloaded data, in our case Path/To/Tesseract/, and you are good to go. Remember that on Windows you may need to restart your system after setting the environment variable. For a more in-depth explanation of how to do this, check the k2pdfopt OCR page and the Tesseract ReadMe.
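On Linux or Mac OS X with a bash-like shell, the installation step above might look like the following sketch. The directory $HOME/tesseract is a hypothetical stand-in for Path/To/Tesseract/, and the loop assumes the downloaded tesseract-ocr-3.02.*.tar.gz archives sit in the current directory. The folder layout Tesseract expects under TESSDATA_PREFIX has varied between versions, so check the Tesseract ReadMe for yours.

```shell
# Assumption: $HOME/tesseract stands in for Path/To/Tesseract/; change as needed.
TESSDIR="$HOME/tesseract"
mkdir -p "$TESSDIR"

# Unpack any language-data archives found in the current directory
# (e.g. tesseract-ocr-3.02.eng.tar.gz, tesseract-ocr-3.02.jpn.tar.gz).
for archive in tesseract-ocr-3.02.*.tar.gz; do
  [ -e "$archive" ] || continue   # the glob matched nothing: skip it
  tar -xzf "$archive" -C "$TESSDIR"
done

# Point Tesseract (and therefore k2pdfopt) at the language data for this session.
export TESSDATA_PREFIX="$TESSDIR/"
echo "TESSDATA_PREFIX is set to $TESSDATA_PREFIX"
```

The export only lasts for the current terminal session; to keep it, add the same export line to a startup file such as ~/.bashrc. On Windows, the variable can be set through the System Properties environment-variable dialog or with the built-in setx command.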

Once these steps are done, you are ready to use the software to make those .pdf files comfortably viewable on your Kindle or any other e-reader device you may have. All you have to do is open a terminal window and call the k2pdfopt program with the right parameters. Below is an example invocation of the program:

k2pdfopt document.pdf -ocr -ocrlang eng -dev kpw -bp -f2p -1

Let us analyze the call to the program a bit here.

  • document.pdf: This is the input .pdf file we would like to convert for comfortable reading in the e-reader device.
  • -ocr: This option enables Optical Character Recognition (OCR) with the Tesseract engine that we downloaded above.
  • -ocrlang eng: This option selects the OCR language that Tesseract will use. If for example you had a Japanese text you would have to use -ocrlang jpn and the program would perform OCR on the Japanese text. It works, I have tried it.
  • -dev kpw: This option selects the screen resolution of the device that we would like the new .pdf to be optimized for. In the example above I used the value kpw, which stands for Kindle Paperwhite, but the program offers many other preset values for various e-reader devices, such as k2 for the Kindle 2 and nookst for the Nook Simple Touch. If your device has no preset, you can specify its dimensions manually via the -w and -h options.
  • -bp: This is a very important option that makes the program start a new output page whenever the input document has a page break. You want this option turned on unless you like having page breaks in random places in the output document.
  • -f2p <val>: Fit-to-page option (Taken directly from the program’s documentation). The quantity controls fitting tall or small contiguous objects (like figures or photographs) to the device screen. Normally these are fit to the width of the device, but if they are too small or too tall, then if =10, for example, they are allowed to be 10% wider (if too small) or narrower (if too tall) than the screen in order to fit better. Use -1 to fit the object no matter what. Use -2 as a special case–all “red-boxed” regions (see -sm option) are placed one per page. Default is -f2p 0. See also -jf. Note: -f2p -2 will automatically also set -vb -2 to exactly preserve the spacing in the red-boxed region. If you want to compress the vertical spacing in the red-boxed region, use -f2p -2 -vb -1.
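If you convert papers often, it may help to wrap the invocation above in a small shell function. The sketch below assumes a bash-like shell and k2pdfopt on your PATH; the name kindle_pdf is hypothetical, and the function only reuses the options explained above, taking the OCR language as an optional second argument.

```shell
# Hypothetical helper (not part of k2pdfopt): convert one PDF for the
# Kindle Paperwhite with the options discussed above.
# Usage: kindle_pdf FILE.pdf [OCR-LANGUAGE]
kindle_pdf() {
  if [ "$#" -lt 1 ]; then
    echo "usage: kindle_pdf FILE.pdf [ocr-language]" >&2
    return 2
  fi
  local input="$1"
  local lang="${2:-eng}"   # English OCR data by default; pass jpn for Japanese

  if ! command -v k2pdfopt >/dev/null 2>&1; then
    echo "k2pdfopt was not found in PATH" >&2
    return 127
  fi

  k2pdfopt "$input" -ocr -ocrlang "$lang" -dev kpw -bp -f2p -1
}
```

You would then run kindle_pdf paper.pdf for an English paper, or kindle_pdf paper.pdf jpn for a Japanese one. For a device without a preset, swap -dev kpw for the -w and -h options described above.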

After we run this command we get another PDF file, formatted and OCRed for our e-reader device. It has the same name as the input document with a _k2opt suffix appended (document.pdf becomes document_k2opt.pdf). All you have to do is transfer it to your e-reader and enjoy reading and learning.
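Because the output name is predictable, converting a whole folder of papers (something a reader asks about in the comments) only takes a short loop. This is a minimal sketch for a POSIX-style shell, assuming k2pdfopt is on your PATH and writes the converted file next to the original; k2opt_name and convert_dir are hypothetical helper names. It reuses the options from the example above and skips files that are already conversions or already have a converted copy.

```shell
# Hypothetical helper: the name of the converted copy of a PDF
# (the input name with _k2opt added before the .pdf extension).
k2opt_name() {
  printf '%s\n' "${1%.pdf}_k2opt.pdf"
}

# Hypothetical helper: convert every PDF in a directory (default: the current one).
# Assumes the converted copy is written next to the original.
convert_dir() {
  local dir="${1:-.}"
  local f
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue                     # no PDFs: the glob stayed literal
    case "$f" in *_k2opt.pdf) continue ;; esac  # skip outputs of earlier runs
    if [ -e "$(k2opt_name "$f")" ]; then
      continue                                  # a converted copy already exists
    fi
    k2pdfopt "$f" -ocr -ocrlang eng -dev kpw -bp -f2p -1
  done
}
```

Running convert_dir ~/papers would then process every new PDF in that folder and leave previously converted ones alone. Conversion with OCR can be slow, so a large folder may take a while.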

I hope this post is of use to some of you who are in the same position as me, trying to figure out how to use a Kindle or other e-reader to make your research and reading easier. If you have any comments, feedback or questions, do not hesitate to leave a comment below.

Comments (16)

  1. Perfect! I’ve had this problem with many PDFs, usually resulting in me giving up and not reading them.
    Didn’t realise there was a solution – thank you.

  2. raine

    As Tesseract’s ReadMe points out, the easiest way to install it on Mac is by using homebrew.

  3. badsykes

    Hi
    Is there any way to automate this for several files simultaneously? I mean, instead of specifying one PDF, I want to select a folder with multiple PDFs and convert them all at once. Thx for the informative post.

  4. Lefteris

    You can just write a script to automate the conversion of multiple files, e.g. in bash or Python, depending on your operating system.

  5. badsykes

    Thx for the answer..

  6. badsykes

    grrr… The conversion procedure is very slow…

  7. badsykes

    Sorry for the multiple posts. I used k2pdfopt and Tesseract and it made the viewing worse. The first page was converted OK, but on the rest the writing was very, very small. Here is another way of converting files to a Kindle format for better viewing. Hope it helps.

    http://ebook.online-convert.com/convert-to-azw3

  8. Lefteris

    In order to properly use k2pdfopt you have to study its manual a bit and use the proper parameters, which you obviously did not. The purposes of the tool you linked and the one described in this article differ a bit.

    But if with the tool you linked your job is done then that’s great. Enjoy your ebooks!

  9. badsykes

    You are right. I assumed the commands you wrote were already optimized for the Kindle. I didn't have the patience to read the manual. I have too many ebooks to read and I needed something quick to optimize them for the Kindle…

    Thanks again..

  10. Chris

    Thanks for the informative article and info. You saved me a lot of time and made things much easier for me.

  11. Matteo

    Thanks for the post, it helped me a lot!
    Anyway, I am not able to use the dictionary with the transformed pdf (the dictionary works only on some random words). Do you know how I can improve the dictionary function?

  12. don myatt

    Thrilled to find your link. I too have struggled with moving docs to the Kindle PW (with no success).

    Can you update the link in para 4, “this very informative post”?

    efharistou (thank you)
    don

  13. Lefteris

    link updated. The author of that post must have moved things around in their website

  14. CCCalboy

    Many thanks for helping. Windows users can now skip the ‘set environment variable’ step. I was able to use the Tesseract Windows installer, which takes care of the necessary change.

  15. How does this compare with using a tablet (iPad or Android)?
    I ask because PDF support is more or less good there, and on the other hand I have never used a Kindle…

  16. Lefteris

    Unfortunately I would say that it’s not as good. Reading .pdf on a tablet would look better, but nothing compares to the ease of reading, especially on the eyes, that a Kindle has.

    PDF files definitely need a lot of work on the Kindle though :(
