Conversation

Replying to
This will send large chunks of NDJSON (none broken in the middle of a JSON object); each chunk can be handed to a multiprocessing worker and then split and decoded there. It should be able to process hundreds of thousands of lines per second if the I/O is fast enough. We'll see.
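A minimal sketch of what that could look like, assuming NDJSON arrives on sys.stdin; the 256 KiB chunk size comes from the reply below, while the function names (chunks, decode_chunk) are illustrative, not from the thread:

```python
import json
import sys
from multiprocessing import Pool

CHUNK = 256 * 1024  # read size; 256K, as mentioned in the reply


def decode_chunk(chunk: bytes) -> list:
    # Worker side: split the chunk into lines and decode each JSON object.
    return [json.loads(line) for line in chunk.splitlines() if line]


def chunks(stream=sys.stdin.buffer, size=CHUNK):
    # Producer side: read big blocks and cut at the last newline so no
    # JSON object is split across chunks; carry the partial tail forward.
    tail = b""
    while True:
        block = stream.read(size)
        if not block:
            if tail:
                yield tail  # final line without a trailing newline
            return
        data = tail + block
        cut = data.rfind(b"\n")
        if cut == -1:
            tail = data            # no complete line yet; keep reading
            continue
        yield data[:cut + 1]       # everything up to the last full line
        tail = data[cut + 1:]      # partial line carried into the next chunk


if __name__ == "__main__":
    with Pool() as pool:
        for records in pool.imap(decode_chunk, chunks()):
            pass  # consume decoded records here
```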
Replying to
Assuming that reading stdin hits disk somewhere, in this approach you are reading 256 KB of data upfront from "slow media" and then finding the lines in memory in this 256 KB 'data' buffer. "for line in sys.stdin" has to go to slow media for _each_ line. I assume your code above is incomplete, because I'd expect a for-loop after reading 256 KB from stdin in which you keep looking for subsequent lines in 'data' and keep dispatching them to your pipelines. You stop only when what's left is a partial NDJSON line, then you "refill" from stdin. HTH.
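A sketch of that refill loop, under the same assumptions; "dispatching to your pipelines" is stood in by a plain yield, and lines_from_stdin is a hypothetical name:

```python
import sys

BUF = 256 * 1024  # refill size; 256K per the reply


def lines_from_stdin(stream=sys.stdin.buffer, size=BUF):
    # Scan 'data' for complete lines and dispatch each one; go back to
    # the (slow) stream only when all that remains is a partial line.
    data = b""
    while True:
        block = stream.read(size)
        if not block:
            if data:
                yield data  # final line without a trailing newline
            return
        data += block
        start = 0
        while True:
            nl = data.find(b"\n", start)
            if nl == -1:
                break              # only a partial line left; refill
            yield data[start:nl]   # dispatch one complete line
            start = nl + 1
        data = data[start:]        # keep the partial tail for the next refill
```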