I will be presenting “Similarity of Neural Network Representations Revisited” (joint work with @Mo_Norouzi, @honglaklee, and @geoffreyhinton) at @icmlconf Thursday (tomorrow!) at 12:15 in Hall A (Deep Learning) and at Poster #20. Paper/code: http://cka-similarity.github.io . Thread below.
Idea: We can measure similarity of representations as the sum of squared similarities between all possible pairs of features, or as similarity of similarities between examples. If “similarity” = dot product, these are equivalent. Normalization yields a similarity index in [0, 1].
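To make the equivalence concrete, here is a minimal NumPy sketch (function name and data are illustrative, not from the paper's code): with the dot product as the similarity, the sum of squared similarities over all feature pairs, ||Yᵀ X||²_F, equals the inner product between the example-similarity matrices, ⟨XXᵀ, YYᵀ⟩_F, and normalizing gives an index in [0, 1].

```python
import numpy as np

def linear_cka(X, Y):
    """Normalized similarity between representations X (n x p1) and Y (n x p2).

    Rows are examples, columns are features; both are centered column-wise.
    The numerator ||Y^T X||_F^2 is simultaneously the sum of squared dot
    products over all feature pairs and <XX^T, YY^T>_F, the similarity
    between the two example-similarity matrices.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    numerator = np.linalg.norm(Y.T @ X, ord='fro') ** 2
    denominator = (np.linalg.norm(X.T @ X, ord='fro')
                   * np.linalg.norm(Y.T @ Y, ord='fro'))
    return numerator / denominator

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
# Identical representations score 1 (up to floating point); the index is
# also invariant to orthogonal transforms and isotropic scaling.
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
print(linear_cka(X, X), linear_cka(X, 2.0 * X @ Q))
```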
This idea has been discovered many times. Centered kernel alignment (http://www.jmlr.org/papers/volume13/cortes12a/cortes12a.pdf) is a generalization where an arbitrary kernel is used to measure inter-example similarity. Sometimes an RBF kernel is better, but often the dot product (linear kernel) works fine.
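In the general form, CKA operates on Gram (inter-example similarity) matrices directly, so any kernel can be plugged in. A minimal NumPy sketch, assuming the common median-distance heuristic for the RBF bandwidth (the helper names and the bandwidth fraction are illustrative choices, not the paper's exact settings):

```python
import numpy as np

def center_gram(K):
    """Double-center a Gram matrix: K_c = H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    """CKA between two (uncentered) n x n Gram matrices."""
    Kc, Lc = center_gram(K), center_gram(L)
    hsic = np.sum(Kc * Lc)
    return hsic / np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))

def rbf_gram(X, sigma_frac=0.5):
    """RBF Gram matrix with bandwidth set as a fraction of the median
    squared pairwise distance (an illustrative heuristic)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    sigma2 = sigma_frac * np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
# Linear kernel: cka(X X^T, X X^T) recovers the linear case above.
print(cka(X @ X.T, X @ X.T), cka(rbf_gram(X), rbf_gram(X)))
```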
Linear CKA is closely connected to canonical correlation analysis (CCA). The main difference is that CKA weights directions in representation space by the variance they explain. CCA-based similarity resembles CKA on orthonormalized representations, and thus weights all directions equally.
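One way to see the contrast in code: orthonormalizing each representation (e.g. via QR) discards the variance weighting, and the resulting index, the mean squared canonical correlation, becomes invariant to any invertible linear transform of either representation. A minimal NumPy sketch (the function name is mine, and this assumes full-column-rank representations with more examples than features):

```python
import numpy as np

def mean_squared_cca(X, Y):
    """Mean squared canonical correlation between X (n x p1) and Y (n x p2).

    QR orthonormalization makes every direction count equally, unlike
    linear CKA, which weights directions by variance explained.
    Assumes centered-rank p1/p2 data with n > max(p1, p2).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    p = min(X.shape[1], Y.shape[1])
    return np.linalg.norm(Qy.T @ Qx, ord='fro') ** 2 / p

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
A = rng.normal(size=(8, 8))  # (almost surely) invertible linear transform
# CCA cannot distinguish X from X @ A: same subspace, so the score is 1.
print(mean_squared_cca(X, X @ A))
```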
Why weight by variance? For neural networks trained from different random inits, large principal components are similar, but small PCs are not. Below, we color examples according to the first two PCs from a layer of one net and plot them on PCs from another net. pic.twitter.com/GfxgDD6etl
This suggests a simple sanity check for similarity indexes: We train architecturally identical networks from different inits, and we check whether a layer from net A is most similar to the architecturally corresponding layer from net B. CKA passes; CCA-based indexes do not. pic.twitter.com/60gqTt49Qv
Finally, we show that CKA gives some insight into what goes wrong when networks don’t behave as we’d like. We train a shallow and a very deep CNN. The deep net is *less* accurate than the shallower net; CKA shows that representations in the second half are essentially identical. pic.twitter.com/kDKrizFPNv
If you’re interested in applying this method to your own data, check out our Colab at http://cka-similarity.github.io or come find me in person at ICML or CVPR!