Still no nearer to reading a batch of PDFs and outputting the needed data without actually opening them
-
-
the tesseract lib itself does its one narrow job very well, but building out a practical ETL pipeline to plug it into is 99% of the work and is very hard. pdfminer made the ETL pipeline part as easy as passing in a file handle.
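for anyone wondering what "passing in a file handle" looks like for a whole batch, here is a rough sketch, not the exact code from this thread. It assumes the pdfminer.six package and its `extract_text` helper; the `extract_batch` name and the optional `extract` parameter are made up for illustration. pdfminer reads the text embedded in the PDF, so scanned pages with no text layer would still need OCR first.

```python
from pathlib import Path
from typing import BinaryIO, Callable, Dict, Optional


def extract_batch(
    folder: str,
    extract: Optional[Callable[[BinaryIO], str]] = None,
) -> Dict[str, str]:
    """Return {filename: extracted text} for every *.pdf directly inside folder.

    `extract` is a hypothetical hook: any callable that takes an open binary
    file handle and returns text. It defaults to pdfminer's extract_text.
    """
    if extract is None:
        # Imported lazily so a stand-in extractor can be used without pdfminer.
        from pdfminer.high_level import extract_text  # pip install pdfminer.six
        extract = extract_text

    results: Dict[str, str] = {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # The whole "pipeline" step: open the file and pass the handle in.
        with pdf_path.open("rb") as handle:
            results[pdf_path.name] = extract(handle)
    return results
```

calling `extract_batch("invoices")` (the folder name is only an example) would give back a dict of filename to text, which you can then search for the fields you need without opening any file by hand.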
-
it has a mild learning curve, but I figured it out enough for practical use in under a week of lots of swearing-at-my-monitor work sessions. was worth it though.
-
this was helpful for the learning curve: https://www.youtube.com/watch?v=k34wRxaxA_c …
-
I was trying to follow this video earlier today. Nevertheless encouraged.