I think the operation is so simple that you always see a slowdown due to the overhead of sending data to other processes and back.
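For context, a minimal sketch (not from the thread) of why that happens: when the per-item work is trivial, the cost of pickling the data, shipping it to the workers, and shipping results back dominates, so Pool.map loses to a plain loop.

```python
import time
from multiprocessing import Pool

def cheap(x):
    # A trivial operation: the work per item is far smaller than the
    # cost of pickling x, sending it to a worker, and sending the
    # result back.
    return x * x

if __name__ == "__main__":
    data = list(range(1_000_000))

    t0 = time.perf_counter()
    serial = [cheap(x) for x in data]
    t1 = time.perf_counter()

    with Pool(4) as pool:
        parallel = pool.map(cheap, data)
    t2 = time.perf_counter()

    # On most machines the parallel version is slower here, because the
    # per-item work doesn't amortize the interprocess-communication cost.
    print(f"serial:   {t1 - t0:.2f}s")
    print(f"parallel: {t2 - t1:.2f}s")
```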
Replying to @seaandsailor
The thing is, though, it will run out of memory the other way, so this way has to be better for big data, right?
Replying to @o_guest
There's another issue. I'll try to fix it because it will be easier than explaining it on Twitter, then we can talk about it :)
Replying to @o_guest
Here's a faster version: https://gist.github.com/jfsantos/8184653991558e30a9eab8613a6ea20f. The trick is to create a list with all the combinations at once and use Pool.map.
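The gist isn't reproduced here; a minimal sketch of the idea, assuming the task is pairwise Euclidean distances over the rows of X (pair_distance is a hypothetical stand-in for the gist's function):

```python
from itertools import combinations
from multiprocessing import Pool

import numpy as np

def pair_distance(pair):
    # Each task receives one (point, point) pair and returns its
    # Euclidean distance.
    a, b = pair
    return np.linalg.norm(a - b)

if __name__ == "__main__":
    X = np.random.rand(500, 3)
    # Build the full list of pairs up front so Pool.map can chunk it
    # across workers instead of feeding tasks one at a time.
    pairs = list(combinations(X, 2))
    with Pool() as pool:
        distances = pool.map(pair_distance, pairs)
```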
Replying to @seaandsailor @o_guest
It's still not faster than the other versions, but it's not far from the plain loop. You'll see bigger performance gains when the function is more expensive.
Replying to @seaandsailor @o_guest
You can also improve the runtime by using the scipy implementation and sending blocks of points to it instead of the entire X, then concatenating the results.
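A minimal sketch of that blocking approach, assuming Euclidean distances via scipy.spatial.distance.cdist (block_distances is a hypothetical helper name):

```python
from functools import partial
from multiprocessing import Pool

import numpy as np
from scipy.spatial.distance import cdist

def block_distances(block, ref):
    # Distances from one block of rows to every reference row:
    # one (block_size x n) slab of the full distance matrix.
    return cdist(block, ref)

if __name__ == "__main__":
    X = np.random.rand(4000, 3)
    blocks = np.array_split(X, 8)  # split X into 8 row blocks
    with Pool() as pool:
        # partial pins the full X as the reference set; each worker
        # computes one slab of the distance matrix.
        parts = pool.map(partial(block_distances, ref=X), blocks)
    D = np.vstack(parts)  # reassemble the full n x n matrix
```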
Replying to @seaandsailor
Oh, that sounds sensible. Do you know how this is done?
Replying to @o_guest @seaandsailor
I tried to do it and failed miserably: many of the points were not being included in the computed pairs.
Replying to @o_guest
Also, if you're facing this kind of issue a lot, you might want to take a look at Dask: http://dask.pydata.org/en/latest/.
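One way Dask could be applied here, sketched with dask.delayed wrapping the same chunked cdist idea (an assumption, not necessarily what the tweet had in mind):

```python
import numpy as np
from dask import compute, delayed
from scipy.spatial.distance import cdist

X = np.random.rand(4000, 3)

# Build a lazy task graph: one cdist call per block of rows.
tasks = [delayed(cdist)(block, X) for block in np.array_split(X, 8)]

# Dask schedules the tasks across workers, and we reassemble the
# slabs into the full distance matrix.
D = np.vstack(compute(*tasks))
```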
OK, will do. Thanks for all the tips, I really appreciate it.
