Is it map-reducible? Or do you need to retain context for all-vs-all at all times?
-
-
-
I have been given a solution that samples the distances, and this does the job. I do not (yet) understand the code very well.
-
I understood enough to pluck it from where it was, fix the globals (naughty naughty), use it on dummy data, and compare with pdist.
-
It's very obviously sampling using a step, but it can possibly cope with step = 1 (i.e., sample all), as it's written with multiprocessing.
-
The solution exists because I have taken over the coding bit of the project from the previous coder/pd.
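The inherited code isn't shown in this thread, so the following is only a guess at what "sampling using a step" could look like: keep every `step`-th point and run pdist on that subset. The helper name `sampled_pdist` is hypothetical, and the multiprocessing part is left out.

```python
import numpy as np
from scipy.spatial.distance import pdist


def sampled_pdist(points, step=1):
    """Pairwise distances among every `step`-th point (hypothetical helper).

    With step=1 nothing is skipped, so the result is the same as pdist(points).
    """
    subset = points[::step]  # keep rows 0, step, 2*step, ...
    return pdist(subset)     # condensed vector of len(subset) choose 2 distances
```

With step = 10 the subset is 10 times smaller, so roughly 100 times fewer distances are computed and stored. Comparing the step = 1 result against a direct pdist call on dummy data is a quick sanity check.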
End of conversation
New conversation -
-
-
Maybe you should use a different algorithm. Using KNN or any lazy learner is bad for big data most of the time... Try SVM or XGBoost.
-
this isn't a neural network... I just need to calculate pairwise distances...
-
I have some x, y points, that's it.
End of conversation
New conversation -
-
-
afaik scipy & numpy copy arrays when operating, so maybe try using iterators? https://gist.github.com/alep/603b41674dcb0b1fd5d2b752d386840d …
-
ah good tip! Explains the reason it's running out of memory, I guess. I'll see what I can do. Just 1 call to pdist currently.
End of conversation
New conversation -
-
-
Had this with MATLAB. n points require an n×n matrix. I use a for loop and divide the problem into chunks of the max variable size, e.g. 10 chunks of n×(n/10).
-
basically what I have been given as the solution by somebody else, yep!
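A minimal sketch of that chunking idea in Python, assuming the end goal is a summary of the distances (here a histogram) rather than the full matrix. The function name `chunked_distance_histogram`, the `bin_edges` argument, and the default `chunk_size` are made up for illustration; only `cdist`, `np.triu_indices`, and `np.histogram` are real library calls.

```python
import numpy as np
from scipy.spatial.distance import cdist


def chunked_distance_histogram(points, bin_edges, chunk_size=1000):
    """Histogram all pairwise distances without building the n x n matrix."""
    n = len(points)
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Distances from this block of rows to itself and to all later points.
        block = cdist(points[start:stop], points[start:])
        # Keep only entries with column > row, so each pair is counted once
        # and the zero self-distances are dropped.
        rows, cols = np.triu_indices(stop - start, k=1, m=n - start)
        counts += np.histogram(block[rows, cols], bins=bin_edges)[0]
    return counts
```

Peak memory is set by one block of about chunk_size × n distances (plus the index arrays of similar size), so `chunk_size` is the knob to turn when RAM runs out. Distances outside `bin_edges` are silently ignored by `np.histogram`.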
End of conversation
New conversation -
-
-
Can't you use data streaming or chunk processing?
-
I probably can, I'll Google how!
End of conversation
New conversation -
-
-
Erratum: not quite 400 GB (rather 50-ish), but you get my point ;)
-
I don't think it used that much of my RAM, which can fit 50 GB. But I have a solution now, just need to try it out tomorrow.
End of conversation
New conversation -
-
-
How many "items" are you computing the distance for? 10k * 10k * 64-bit precision floats > 400 GB
-
tens of millions
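For a rough sense of scale, the storage can be worked out directly. This sketch assumes 8-byte (float64) distances; pdist returns the condensed form with n(n-1)/2 entries, while a full square matrix has n² entries. The helper names are hypothetical.

```python
def condensed_bytes(n, itemsize=8):
    """Bytes for the n*(n-1)/2 distances that pdist returns (float64 by default)."""
    return n * (n - 1) // 2 * itemsize


def square_bytes(n, itemsize=8):
    """Bytes for a full n x n distance matrix."""
    return n * n * itemsize


for n in (10_000, 100_000, 10_000_000):
    print(f"n={n:>10,}: condensed {condensed_bytes(n) / 1e9:,.1f} GB, "
          f"square {square_bytes(n) / 1e9:,.1f} GB")
```

By this arithmetic 10,000 points need about 0.4 GB in condensed form and 100,000 points about 40 GB, while 10 million points would need roughly 400,000 GB, which is why the full set of distances cannot be held in memory at that size.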
End of conversation
New conversation -
-
-
@cMadan or try randomly sampling pairs from all pairs and iterate
-
Any specific package suggestion?
-
hm, not that I know, probably need to code it yourself... :P
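Coding it yourself can be fairly short with NumPy. Below is a sketch, assuming the points are an (n, 2) array and that sampling pairs with replacement is acceptable (the same pair can be drawn twice). The function name `sample_pair_distances` and its parameters are hypothetical.

```python
import numpy as np


def sample_pair_distances(points, n_pairs, seed=0):
    """Euclidean distances for n_pairs random pairs of distinct points."""
    rng = np.random.default_rng(seed)
    n = len(points)
    i = rng.integers(0, n, size=n_pairs)
    # Draw j from the n - 1 indices other than i: sample 0..n-2, then shift
    # values at or above i up by one so that j never equals i.
    j = rng.integers(0, n - 1, size=n_pairs)
    j = j + (j >= i)
    return np.linalg.norm(points[i] - points[j], axis=1)
```

Memory use scales with `n_pairs` rather than n², so the call can be repeated with different seeds and the results accumulated until the distance distribution stops changing.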
End of conversation
New conversation -
