An unintuitive secret of reading books on computers: reading PDFs with their original typesetting is much better than reading ebooks, which treat text like a 4chan shitposter and come with impoverished reading software.
But… where to get the PDFs?! A survey & suggestions for future work:
Google Play:
👍 ~smooth workflow; clean pages
👎 PDFs lack a text layer, so they're not searchable or selectable; only recent books are available as PDFs
archive.org:
👍 has many older books Play lacks; includes an OCR'd text layer
👎 OCR errors; photo noise; clunkier workflow
Z-Library:
👍 occasionally has clean PDFs for books which others lack
👎 PDFs are often EPUB->PDF conversions (the worst!); more illegal
One fun project idea: maybe you could improve upon the missing or poor text layers in Play's and archive.org's PDFs by building a tool that combines EPUBs and PDFs, aligning the EPUB's original text onto the PDF pages via OCR.
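A minimal sketch of the alignment step, assuming (hypothetically) that an OCR engine has already produced a list of `(word, box)` pairs for one page, and that the EPUB text for roughly that page has been extracted as a list of words. `align_epub_to_ocr` and the box format are illustrative names, not an existing tool. It uses the standard library's `difflib.SequenceMatcher` to align the two word sequences, keeping the OCR word positions but substituting the EPUB's clean spelling.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical OCR output format: each word has a box (x0, y0, x1, y1) in page coordinates.
Box = Tuple[float, float, float, float]
OcrWord = Tuple[str, Box]


def normalize(word: str) -> str:
    """Lowercase and drop punctuation so 'fox,' and 'fox' compare equal."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def union_box(boxes: List[Box]) -> Box:
    """Smallest box covering all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def align_epub_to_ocr(epub_words: List[str], ocr_words: List[OcrWord]) -> List[OcrWord]:
    """Build a corrected text layer: EPUB words placed at the OCR words' positions."""
    a = [normalize(w) for w, _ in ocr_words]
    b = [normalize(w) for w in epub_words]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    layer: List[OcrWord] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            # One-to-one: keep the OCR box, take the EPUB's spelling (fixes misreads).
            for i, j in zip(range(i1, i2), range(j1, j2)):
                layer.append((epub_words[j], ocr_words[i][1]))
        elif tag == "replace":
            # Unequal spans (e.g. OCR merged or split words): put the EPUB text
            # in the union of the OCR boxes.
            box = union_box([ocr_words[i][1] for i in range(i1, i2)])
            layer.append((" ".join(epub_words[j1:j2]), box))
        # "delete": OCR-only tokens (running heads, page numbers, noise) are dropped.
        # "insert": EPUB-only words with no visible counterpart are skipped.
    return layer


if __name__ == "__main__":
    # Hypothetical OCR output for one line: a misread "Tbe" plus a page number "17".
    ocr = [("Tbe", (10, 10, 30, 20)), ("quick", (35, 10, 70, 20)),
           ("brown", (75, 10, 110, 20)), ("fox", (115, 10, 140, 20)),
           ("jumps", (145, 10, 180, 20)), ("17", (300, 500, 310, 510))]
    epub = ["The", "quick", "brown", "fox", "jumps"]
    for word, box in align_epub_to_ocr(epub, ocr):
        print(word, box)
```

In the demo, "Tbe" is replaced by "The" at the same position and the stray "17" is dropped because it has no EPUB counterpart. A word-level diff is a crude choice: a real tool would likely need fuzzy character-level matching, per-page windowing of the EPUB text, and handling for hyphenation and footnotes.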
Maybe you could improve the EPUB reading experience by extracting text-block layout parameters from the PDFs through computer vision, i.e., estimating the text block's width and height, the line height, and the font size of the original typesetting. A similar technique could map page numbers.
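The simplest version of that layout-estimation step might use projection profiles, a classic document-analysis technique swapped in here as an assumption rather than anything the thread specifies: sum the ink in each pixel row to find text lines, and in each column to find the block's horizontal extent. This sketch assumes a deskewed, already-binarized, single-column page given as a list of 0/1 rows; `estimate_layout` is a hypothetical helper. All measurements are in pixels, so converting them to points for an EPUB reader would also need the scan's DPI.

```python
from statistics import median
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # assumed input: rows of 0/1 pixels, 1 = ink


def ink_runs(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where the profile has at least min_ink ink pixels."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = i
        elif count < min_ink and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def estimate_layout(page: Bitmap) -> dict:
    """Estimate text-block bounds, line pitch and line height from projection profiles."""
    row_profile = [sum(row) for row in page]         # ink per pixel row
    col_profile = [sum(col) for col in zip(*page)]   # ink per pixel column
    lines = ink_runs(row_profile)
    cols = ink_runs(col_profile)
    if not lines or not cols:
        return {}  # blank page
    starts = [s for s, _ in lines]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "block_top": lines[0][0],
        "block_bottom": lines[-1][1],
        "block_left": cols[0][0],
        "block_right": cols[-1][1],
        "line_count": len(lines),
        # Distance between consecutive line tops; the median smooths out
        # variation from ascenders. A proxy for leading (line height).
        "line_pitch": median(gaps) if gaps else None,
        # Height of each inked band (ascender to descender); a rough
        # proxy for font size.
        "median_line_ink_height": median(e - s for s, e in lines),
    }
```

Running heads, page numbers, and figures would inflate the block bounds, so a practical version would probably discard small or outlying runs before measuring.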
Related: while ebook reading software is truly impoverished, PDF software is also almost universally unimaginative and unserious about the task of reading. Would love to see more work there…
