An unintuitive secret of reading books on computers: reading PDFs with original typesetting is much better than reading ebooks, which treat text like a 4chan shitposter and have impoverished reading software.
But… where to get the PDFs?! A survey & suggestions for future work:
Google Play:
👍 ~smooth workflow; clean pages
👎 PDFs lack text layer, so they're not searchable or selectable; only recent books available in PDF
archive.org:
👍 has many older books Play lacks; includes OCR'd text layer
👎 OCR errors; photo noise; clunkier workflow
Z-Library:
👍 occasionally has clean PDFs for books which others lack
👎 PDFs are often EPUB->PDF conversions (the worst!); more illegal
One fun project idea: maybe you could improve upon the poor text layers in Play / archive.org's PDFs with a tool which OCRs the PDF pages and aligns the EPUB's original text onto them.
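A rough sketch of the alignment step, assuming the OCR pass has already produced a list of `(word, box)` pairs for a page and the EPUB has been flattened into a list of words. Both input shapes and the name `correct_text_layer` are hypothetical, and `difflib` stands in for whatever sequence aligner a real tool would use:

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase and strip punctuation so small differences still match."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def correct_text_layer(ocr_words, epub_words):
    """Replace OCR'd words with the EPUB's words, keeping the OCR boxes.

    ocr_words:  list of (text, box) pairs from OCR (hypothetical shape).
    epub_words: list of words from the EPUB, in reading order.
    """
    ocr_keys = [normalize(text) for text, _ in ocr_words]
    epub_keys = [normalize(text) for text in epub_words]
    matcher = SequenceMatcher(None, ocr_keys, epub_keys, autojunk=False)

    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            # One-to-one span: trust the EPUB's spelling, keep the OCR position.
            for (_, box), clean in zip(ocr_words[i1:i2], epub_words[j1:j2]):
                corrected.append((clean, box))
        elif tag in ("replace", "delete"):
            # Ambiguous span (e.g. a running header): keep the OCR text as-is.
            corrected.extend(ocr_words[i1:i2])
        # "insert": EPUB words with no OCR counterpart have no box, so skip them.
    return corrected
```

Aligning at the word level means an OCR misread only gets corrected when it sits between words that matched; a real tool would probably also need fuzzy matching for split or merged words, and some way to find which stretch of the EPUB corresponds to each page.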
Maybe you could improve the EPUB reading experience by extracting text block layout parameters from the PDFs through computer vision: i.e. try to estimate the text block width/height, line height, and font size in the original typesetting. A similar technique could map page numbers.
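One simple way to get at a few of these numbers is a projection profile: count the ink in every pixel row and column of a binarized page image. The sketch below assumes the page is already a clean 0/1 bitmap given as a list of rows (a hypothetical input format); `text_block_metrics` is an illustrative name, and real scans would need deskewing and a noise threshold first:

```python
from statistics import median


def text_block_metrics(bitmap):
    """Estimate text block size and line pitch from a binary page image.

    bitmap: list of rows, each a list of 0 (paper) or 1 (ink).
    All results are in pixels.
    """
    row_ink = [sum(row) for row in bitmap]
    col_ink = [sum(col) for col in zip(*bitmap)]
    rows = [y for y, ink in enumerate(row_ink) if ink]
    cols = [x for x, ink in enumerate(col_ink) if ink]
    if not rows:
        return None  # blank page

    # A text line starts wherever an inked row follows an empty row.
    line_tops = [y for y in rows if y == 0 or row_ink[y - 1] == 0]
    # Line pitch: distance from the top of one line to the top of the next.
    pitches = [b - a for a, b in zip(line_tops, line_tops[1:])]

    return {
        "block_width": cols[-1] - cols[0] + 1,
        "block_height": rows[-1] - rows[0] + 1,
        "line_count": len(line_tops),
        "line_pitch": median(pitches) if pitches else None,
    }
```

Converting the pixel values to points requires the scan's resolution, and font size would have to be inferred separately (for instance from the height of the inked runs), so this covers only the block dimensions and line spacing.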
Related: while e-book reading software is truly impoverished, PDF software is also almost universally unimaginative and unserious for the task of reading. Would love to see more work there…
