Ari Morcos
@arimorcos
Research Scientist at @facebookai (FAIR) working to build a science of deep learning. Formerly at @DeepMindAI. Opinions are my own.

San Mateo, CA
arimorcos.com
Joined April 2009

Tweets

    Ari Morcos‏ @arimorcos 21 Jun 2018

    Check out our latest preprint (with @maithra_raghu and Samy Bengio) on the relationship between representational similarity and generalization and optimization in CNNs, and training and sequence dynamics in RNNs! 1/N Blog: https://ai.googleblog.com/2018/06/how-can-neural-network-similarity-help.html … Paper: http://arxiv.org/abs/1806.05759

    9:37 AM - 21 Jun 2018
    5 replies 18 retweets 97 likes
      1. New conversation
      2. Ari Morcos‏ @arimorcos 21 Jun 2018

        Building off @maithra_raghu's awesome previous work on SVCCA (https://arxiv.org/abs/1706.05806), we first develop a new weighted version of CCA distance which better distinguishes between signal and noise in representational similarity. pic.twitter.com/tz20LpOGMx

        1 reply 0 retweets 1 like
      3. Ari Morcos‏ @arimorcos 21 Jun 2018

        Applying CCA to generalizing and memorizing networks, we found that generalizing networks converged to more similar solutions than memorizing networks, especially in later layers, suggesting that while there are lots of ways to memorize, there are only a few ways to generalize. pic.twitter.com/IRnAxumSMJ

        1 reply 0 retweets 1 like
      4. Ari Morcos‏ @arimorcos 21 Jun 2018

        Inspired by the lottery ticket hypothesis (https://arxiv.org/abs/1803.03635 from @jefrankle), we reasoned that wider networks should converge to more similar solutions than narrow networks since larger networks are more likely to "win the initialization lottery," and indeed, they do! pic.twitter.com/1vrCZ0Kr4e

        1 reply 0 retweets 2 likes
      5. Ari Morcos‏ @arimorcos 21 Jun 2018

        CCA also reveals clusters of solutions across different initializations and learning rates; these nearly perfectly match the clusters found in my previous work (https://arxiv.org/abs/1803.06959), suggesting that nets with similar performance often learn highly diverse solutions. pic.twitter.com/I51KXIwbOm

        1 reply 0 retweets 3 likes
      6. Ari Morcos‏ @arimorcos 21 Jun 2018

        Moving to RNNs, we found that, like CNNs, RNNs exhibit bottom-up convergence dynamics, in which layers close to the input converge to their final representations earlier in training than layers close to the output. pic.twitter.com/JBFbka83zl

        1 reply 0 retweets 1 like
      7. Ari Morcos‏ @arimorcos 21 Jun 2018

        Because representations in RNNs change over a sequence, CCA is an ideal tool to study the extent to which they vary non-linearly. Interestingly, we found that the non-linearity of recurrence depends on the number of past inputs to an RNN. Lots more to do in this direction though! pic.twitter.com/E0UOHZrtVn

        1 reply 0 retweets 1 like
      8. Ari Morcos‏ @arimorcos 21 Jun 2018

        Check out the paper for more detail and controls with other distance metrics, etc. The code for CCA is also open-source (https://github.com/google/svcca/), so apply CCA to your networks and see what you find! (A minimal illustrative sketch follows this thread.)

        0 replies 2 retweets 4 likes
      9. End of conversation
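
    A rough sketch of the kind of analysis the open-source code enables: mean CCA similarity between two activation matrices, written directly in NumPy rather than relying on the repository's API. The function name, shapes, and data below are illustrative assumptions, not the authors' code.

        import numpy as np

        def cca_similarity(acts1, acts2, eps=1e-10):
            """Mean canonical correlation between two activation matrices.

            acts1, acts2: arrays of shape (num_neurons, num_datapoints),
            i.e. each row is one neuron's responses to the same inputs.
            """
            # Center each neuron's responses over the dataset.
            acts1 = acts1 - acts1.mean(axis=1, keepdims=True)
            acts2 = acts2 - acts2.mean(axis=1, keepdims=True)

            def orthonormal_basis(acts):
                # Rows of vt span the row space of acts; drop near-zero
                # directions so the whitening stays well conditioned.
                _, s, vt = np.linalg.svd(acts, full_matrices=False)
                return vt[s > eps * s.max()]

            v1 = orthonormal_basis(acts1)
            v2 = orthonormal_basis(acts2)

            # Canonical correlations are the singular values of the product
            # of the two whitened bases.
            corrs = np.linalg.svd(v1 @ v2.T, compute_uv=False)
            return corrs.mean()

        # Illustrative usage: one layer's activations from two networks
        # trained from different seeds, recorded on the same 1000 inputs.
        # Independent random matrices stand in for real activations here,
        # so this only demonstrates the expected shapes and call.
        rng = np.random.default_rng(0)
        layer_a = rng.standard_normal((64, 1000))
        layer_b = rng.standard_normal((64, 1000))
        print(cca_similarity(layer_a, layer_b))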
      1. New conversation
      2. Daniel Bear‏ @recursus 21 Jun 2018
        Replying to @arimorcos @maithra_raghu

        Looks really interesting Ari! In the RNN case, are you suggesting that these task-optimized networks are not necessarily falling into “attractor states” over the course of the input sequence?

        1 reply 0 retweets 0 likes
      3. Ari Morcos‏ @arimorcos 21 Jun 2018
        Replying to @recursus @maithra_raghu

        Thanks, Dan! The RNN-over-the-course-of-a-sequence work is definitely the least fleshed out, so I'm hesitant to draw any strong conclusions, but it does seem like the hidden states over the course of a sequence aren't that similar subject to a linear transform.

        1 reply 0 retweets 0 likes
      4. Ari Morcos‏ @arimorcos 21 Jun 2018
        Replying to @arimorcos @recursus @maithra_raghu

        In fact, the similarity between CCA, cosine and Euclidean distance suggests that there isn't necessarily even a strong rotation/scale component at all. (A rough numerical sketch of this comparison follows this conversation.)

        1 reply 0 retweets 0 likes
      5. Ari Morcos‏ @arimorcos 21 Jun 2018
        Replying to @arimorcos @recursus @maithra_raghu

        To try to work out if this was due to the input or recurrent weights, we tried repeating the same input and measuring distance between states. CCA then shows the states as similar, suggesting that the recurrent dynamics are not inherently non-linear.

        1 reply 0 retweets 0 likes
      6. Ari Morcos‏ @arimorcos 21 Jun 2018
        Replying to @arimorcos @recursus @maithra_raghu

        But, oddly, this result changes depending on how many previous inputs the RNN has seen during inference, so if you've shown the RNN 30,000 previous tokens, this effect decreases substantially. This seems to suggest that the non-linearity of the dynamics depends on the network's history.

        1 reply 0 retweets 0 likes
      7. Ari Morcos‏ @arimorcos 21 Jun 2018
        Replying to @arimorcos @recursus @maithra_raghu

        Not really sure what to make of that result, but definitely an interesting question for future work!

        1 reply 0 retweets 0 likes
      8. Daniel Bear‏ @recursus 21 Jun 2018
        Replying to @arimorcos @maithra_raghu

        Looking forward to close reading!

        0 replies 0 retweets 0 likes
      9. End of conversation
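
    A rough numerical sketch of the three-way distance comparison discussed above, using synthetic hidden states rather than a trained RNN; the shapes, timestep choice, and distance conventions are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        # Hidden states at two timesteps, collected over 512 sequences:
        # shape (num_units, num_sequences).
        h_t = rng.standard_normal((128, 512))
        h_tk = rng.standard_normal((128, 512))

        def mean_cca(a, b):
            # Mean canonical correlation; invariant to any invertible
            # linear transform applied to either representation.
            a = a - a.mean(axis=1, keepdims=True)
            b = b - b.mean(axis=1, keepdims=True)
            va = np.linalg.svd(a, full_matrices=False)[2]
            vb = np.linalg.svd(b, full_matrices=False)[2]
            return np.linalg.svd(va @ vb.T, compute_uv=False).mean()

        cca_dist = 1.0 - mean_cca(h_t, h_tk)
        cosines = np.sum(h_t * h_tk, axis=0) / (
            np.linalg.norm(h_t, axis=0) * np.linalg.norm(h_tk, axis=0))
        cos_dist = 1.0 - cosines.mean()
        euc_dist = np.linalg.norm(h_t - h_tk, axis=0).mean()
        print(cca_dist, cos_dist, euc_dist)

    If the CCA distance stays high even though CCA is free to discount any linear transform, and the cosine and Euclidean distances tell a similar story, that is the sense in which the hidden states differ by more than a rotation or rescaling.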
      1. New conversation
      2. Martin Mundt‏ @mundt_martin 22 Jun 2018
        Replying to @arimorcos @poolio @maithra_raghu

        Thanks for the great work! Have you tried inverting the currently common design scheme of increasing layer width with depth, i.e. larger early widths with a monotonic decrease, as is common in MLPs? I've found the correlation to flip & performance to be unaffected in these cases.

        1 reply 0 retweets 0 likes
      3. Ari Morcos‏ @arimorcos 22 Jun 2018
        Replying to @mundt_martin @poolio @maithra_raghu

        No, we haven't tried that, but it would be interesting to see what happens in that case!

        0 replies 0 retweets 0 likes
      4. End of conversation
      1. New conversation
      2. Martin Mundt‏ @mundt_martin 22 Jun 2018
        Replying to @arimorcos @poolio @maithra_raghu

        In your larger-width networks, how much of the increasing similarity do you think is due to a greater subset of weights not deviating much from their initial state, and thus a question of how similar the samples from the initialization scheme are? Or is this what you mean by "winning the lottery"?

        1 reply 0 retweets 0 likes
      3. Ari Morcos‏ @arimorcos 22 Jun 2018
        Replying to @mundt_martin @poolio @maithra_raghu

        In the Appendix, we have a random-weights control which shows that the increase in similarity is not merely due to increased layer size. The lottery ticket hypothesis frames it as larger networks having a higher chance of a lucky init in a subnetwork.

        1 reply 0 retweets 0 likes
      4. Martin Mundt‏ @mundt_martin 22 Jun 2018
        Replying to @arimorcos @poolio @maithra_raghu

        Ah thanks! I should have taken a look at the appendix :)

        0 replies 0 retweets 0 likes
      5. End of conversation
