I wonder what would happen if, in the least squares problem, one used an approximate decomposition approach instead of Cholesky's. Faster but too noisy, or failure to converge? (Maybe the noise is good somehow; it's ML, you never know.)
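For reference, a minimal sketch (not the authors' code) of the exact solve being discussed: in Kanerva-Machine-style memories the addressing weights w come from a regularized least-squares fit of the query code z against the memory M, typically via the normal equations and a Cholesky factorization. Shapes, the regularizer lam, and the function name are illustrative assumptions.

```python
# Exact least-squares addressing: given memory M (m slots x code dim c) and a
# query code z, solve  min_w ||z - M^T w||^2 + lam ||w||^2  with Cholesky.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_w_cholesky(M, z, lam=1e-3):
    """Exact addressing weights; hypothetical shapes: M is (m, c), z is (c,)."""
    m = M.shape[0]
    gram = M @ M.T + lam * np.eye(m)      # (m, m) Gram matrix, O(m^2 c) to form
    chol = cho_factor(gram)               # Cholesky factorization, O(m^3)
    return cho_solve(chol, M @ z)         # back-substitution, O(m^2)

rng = np.random.default_rng(0)
M = rng.normal(size=(64, 32))             # 64 memory slots, 32-dim codes
z = rng.normal(size=32)
w = solve_w_cholesky(M, z)
print(w.shape, np.linalg.norm(z - M.T @ w))  # reconstruction residual
```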
-
:-) Tried that a bit, approximating with a network (not described here)! Slightly worse performance, sometimes a lot worse. But definitely faster, and it changes the scaling to m^2. The original Kanerva Machine also used a separate network for w, conditioned on z but not on M, which was worse.
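A minimal sketch of the amortized alternative mentioned above: a small network predicts the addressing weights w directly from z (here, as in the original Kanerva Machine, without conditioning on M), skipping the O(m^3) Cholesky solve. The architecture, hidden width, and the rough O(m^2) per-query cost (assuming hidden width on the order of m) are illustrative assumptions, not a description of the authors' network.

```python
# Amortized addressing: predict w = f(z) with a tiny MLP instead of solving
# the least-squares problem exactly. Untrained here; in practice the network
# would be trained end-to-end or to mimic the exact solve.
import numpy as np

class AmortizedAddresser:
    def __init__(self, code_dim, num_slots, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=code_dim ** -0.5, size=(code_dim, hidden))
        self.W2 = rng.normal(scale=hidden ** -0.5, size=(hidden, num_slots))

    def __call__(self, z):
        """Predict w from z alone (no conditioning on the memory M)."""
        h = np.tanh(z @ self.W1)
        return h @ self.W2

rng = np.random.default_rng(1)
M = rng.normal(size=(64, 32))
z = rng.normal(size=32)
addresser = AmortizedAddresser(code_dim=32, num_slots=64)
w_approx = addresser(z)                    # one cheap forward pass per query
print(np.linalg.norm(z - M.T @ w_approx))  # approximate; quality depends on training
```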