Maybe wrong but I don't think what I was doing can be done using dask w/out 1st writing to disk actually... @seaandsailor @usethespacebar
I will give it a go but since everything has to be either from disk or cast to a numpy array/pandas df first it actually seems way slower!
So far I have tried the latter and it's super slow. I will try writing the pairs of items to disk though and doing delayed loading.
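Not from the thread itself, but a minimal sketch of what that delayed-loading approach might look like, assuming each pair had been saved to its own `.npy` file (filenames, shapes, and the temp-dir layout are all made up for illustration):

```python
import os
import tempfile
import numpy as np
import dask
import dask.array as da

# Simulate pairs already sitting on disk: one small .npy file per pair
# (hypothetical layout, just for the sketch).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(4):
    p = os.path.join(tmpdir, f"pair_{i}.npy")
    np.save(p, np.random.rand(2, 3))  # each file holds one pair of 3-d points
    paths.append(p)

# Wrap each load in dask.delayed so nothing is read until compute time.
lazy_loads = [dask.delayed(np.load)(p) for p in paths]
arrays = [da.from_delayed(x, shape=(2, 3), dtype=float) for x in lazy_loads]
pairs = da.stack(arrays)   # lazy (n_pairs, 2, 3) array; no I/O yet
result = pairs.compute()   # this is when the files are actually read
```

Because `da.from_delayed` only records shape and dtype, the graph stays cheap until `.compute()` triggers the disk reads.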
Replying to @o_guest @usethespacebar
don't forget you're simulating data but in real life it would be already on disk. Dask is meant to be used when you cannot load all the data
Replying to @seaandsailor @usethespacebar
actually I figured out how you do it w/out storing on disk — create np arrays in a loop & concatenate http://dask.pydata.org/en/latest/array-stack.html#concatenate … cc @neuromusic
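A rough sketch of the loop-and-concatenate pattern from the linked dask docs (array sizes and the reduction at the end are invented for the example):

```python
import numpy as np
import dask.array as da

# Build small NumPy chunks in a loop and concatenate them lazily,
# so dask never needs the data to come from disk.
chunks = [
    da.from_array(np.random.rand(1000, 3), chunks=(500, 3))
    for _ in range(4)
]
big = da.concatenate(chunks, axis=0)  # lazy (4000, 3) array
mean = big.mean(axis=0).compute()     # computed chunk by chunk
```

Note the caveat raised earlier in the thread still applies: the NumPy pieces are created up front, so this saves scheduling and chunked computation, not peak memory for the inputs themselves.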
Replying to @o_guest @seaandsailor
but technically I would need to have the same method for on disk as I haven't got the pairs stored on disk either only single points
Replying to @o_guest @seaandsailor
so iterating over combs might be useful here — but this would be in contrast to the method currently being used with multiprocessing
Let's see! 