I wasn’t sure what I was going to do when I stopped working, but I did know I wanted to catch up on my reading. There was a reasonably long list of books folks had recommended or I had earmarked for reading “when I had the time”. Obtaining ebooks is an easy, well-defined task, so it was one of the first things I did to prepare for the break.
I had recently installed Plex on the box connected to our TV and was blown away by how easy it was to use and that it had a web interface out of the box. I fully expected to find a similar app for ebooks, but there didn’t seem to be anything comparable.
Maybe it’s because Kindle has the market cornered when it comes to ebook management, but the closest, most full-featured independent ebook library manager I came across was Calibre. Don’t get me wrong, it’s got a slew of features, such as conversion between a myriad of formats (pdf, mobi, epub, you name it), and like Plex it has a web interface out of the box. It’s the web interface I was disappointed with: it seems to have been started some time ago and is in need of a refresh. It doesn’t even let you upload ebooks from the interface 🙁
So at the end of a fortnight, I should have something which will let you:
- Upload an ebook PDF
- Populate data about the ebook from Amazon, including book cover, ratings, etc.
- Search for uploaded books, including text within the book
- Filter available books
- Download the PDF and a text version (which I would need for search indexing anyway)
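To make the search item a little more concrete, here is a minimal sketch of a full-text index over extracted book text. Everything in it is an assumption for illustration: the `{ id, text }` shape, the function names, and the crude tokenizer are hypothetical, and a real version would more likely lean on an existing search engine.

```javascript
// Split text into lowercase word tokens (ASCII letters and digits only,
// which is a simplifying assumption).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build an inverted index: a map from each word to the set of book ids
// whose text contains it. The { id, text } shape is hypothetical.
function buildIndex(books) {
  const index = new Map();
  for (const { id, text } of books) {
    for (const word of tokenize(text)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(id);
    }
  }
  return index;
}

// Return the sorted ids of books that contain every word in the query.
function search(index, query) {
  const words = tokenize(query);
  if (words.length === 0) return [];
  let result = null;
  for (const word of words) {
    const ids = index.get(word) || new Set();
    result = result === null
      ? new Set(ids)
      : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort();
}
```

Calling `search(buildIndex(books), 'some words')` returns only the books whose text contains all of the query words, which is the simplest useful behaviour for the "text within the book" search.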
Here’s my notepad drawing of how I think it will work:
Research today turned up a few things. There aren’t many native PDF parsing libraries in Node.js. Mozilla’s pdf.js, used in Firefox for PDF viewing, seems to be the most popular native route, but shelling out to OS-available tools like Poppler’s pdftotext is also an option, with libraries like pdftotextjs.
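As a rough idea of what the shelling-out route might look like without a wrapper library, here is a small sketch using Node’s built-in `child_process`. It assumes the `pdftotext` binary from Poppler is installed and on the PATH; the function names and the `layout` option are my own, made up for illustration.

```javascript
const { execFile } = require('child_process');

// Build the argument list for Poppler's pdftotext.
// The trailing '-' asks pdftotext to write the text to stdout, not a file.
function buildPdftotextArgs(pdfPath, { layout = false } = {}) {
  const args = ['-enc', 'UTF-8'];
  if (layout) args.push('-layout'); // try to keep the physical page layout
  args.push(pdfPath, '-');
  return args;
}

// Run pdftotext and resolve with the extracted text.
// Assumption: the pdftotext binary is available on the PATH.
function extractText(pdfPath, options) {
  return new Promise((resolve, reject) => {
    execFile(
      'pdftotext',
      buildPdftotextArgs(pdfPath, options),
      { maxBuffer: 64 * 1024 * 1024 }, // a whole book can be a lot of text
      (err, stdout) => (err ? reject(err) : resolve(stdout))
    );
  });
}
```

Something like `extractText('book.pdf').then((text) => console.log(text.length))` would then hand back the text for indexing. Using `execFile` with an argument array rather than a shell string means an uploaded file name with spaces or odd characters is never interpreted by a shell.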
There are also some interesting conditions attached to Amazon’s Product Advertising API, which is used to obtain product information. Goodreads, which used to use Amazon’s API to obtain book info, switched away because of overly restrictive conditions in the usage agreement:
(e) You will not, without our express prior written approval, use any Product Advertising Content on or in connection with any site or application designed or intended for use with a mobile phone or other handheld device (which prohibition does not apply to any site that is not designed or intended for use with such devices but that may be accessible by such devices, such as a non-mobile-optimized site accessed via an internet browser on a tablet device), or any television set-top box (e.g., digital video recorders, cable or satellite boxes, streaming video players, blu-ray players, or dvd players) or Internet-enabled television (e.g., GoogleTV, Sony Bravia, Panasonic Viera Cast, or Vizio Internet Apps).
Really? No mobile sites at all? Funnily enough, Goodreads later got acquired by Amazon, but that’s a different story altogether.
It’s going to be fun 😀