Jascha

@jaschasd

San Francisco
Joined August 2009

Tweets

  1. Jan 24

    This has surprisingly little effect on prediction accuracy, but does improve the match between networks considered by theory and used in practice. With Roman Novak, , and . Implementation in .

  2. Jan 24

    This makes NTK training dynamics dissimilar from those of standard finite width networks. (Infinite width Bayesian networks, NNGPs, don't suffer from this problem.) In we derive infinite width kernels for the *standard* parameterization, resolving this.

  3. Jan 24

    Research on the Neural Tangent Kernel (NTK) almost exclusively uses a non-standard neural network parameterization, where activations are divided by sqrt(width), and weights are initialized to have variance 1 rather than variance 1/width. (A minimal sketch contrasting the two parameterizations appears after this list.)

  4. Retweeted
    Dec 12, 2019

    I'll be presenting this work with and tomorrow morning (Friday Dec 13) at 11:30am at the NeurIPS Deep Inverse Problems workshop ()

  5. Retweeted
    Dec 12, 2019
  6. Retweeted
    Dec 12, 2019

    Visit our poster today (12/12 Thu) at 10:45am! #175 "Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent" - work with , , , Roman Novak, , Jeffrey Pennington

  7. Dec 6, 2019
  8. Dec 6, 2019

    Infinite width networks (NNGPs and NTKs) are the most promising lead for theoretical understanding in deep learning. But running experiments with them currently resembles the dark age of ML research before ubiquitous automatic differentiation. Neural Tangents fixes that. (A short usage sketch appears after this list.)

  9. Retweeted
    Oct 29, 2019

    1/ I can't teach you how to dougie but I can teach you how to compute the Gaussian Process corresponding to infinite-width neural network of ANY architecture, feedforward or recurrent, eg: resnet, GRU, transformers, etc ... RT plz💪

  10. Oct 23, 2019

    Even better -- the code runs in your browser in colab!

  11. Sep 27, 2019

    Neural reparameterization improves structural optimization! By parameterizing physical design in terms of the (constrained) output of a neural network, we propose stronger and more elegant bridges, skyscrapers, and cantilevers. With @shoyer and @samgreydanus. (A toy sketch of the reparameterization idea appears after this list.)

  12. Retweeted
    May 23, 2019
  13. May 12, 2019

    Takeaways: 1) A prescription for adjusting SGD hyperparameters with width 2) Generalization strictly improves with width 3) Test accuracy is predicted surprisingly well by a single scalar (eq 4) 4) There is a critical width, beyond which optimal hyperparameters are unachievable

  14. May 12, 2019

    A careful empirical study of the effect of network width on generalization and fixed learning rate SGD, for MLPs, convnets, resnets, and batch norm. With superstar resident Daniel Park, and + Sam Smith.

  15. Retweeted
    May 9, 2019

    Exciting work in the evolution approach and meta learning! — blowing us away with neuron local meta learned update rules and their surprising generalization ability (across datasets, data modalities, and architecture!!)

  16. Retweeted
    May 9, 2019
  17. Retweeted
    May 2, 2019

    1/ Does batchnorm make the optimization landscape smoother? says yes, but our new paper shows BN causes gradient explosion in randomly initialized deep BN nets. Contradiction? We clarify below. (An illustrative numerical sketch of the gradient-explosion claim appears after this list.)

  18. Mar 22, 2019

    : is the most fun to work with. He always has extremely novel ideas ... and makes the most mesmerizing animations.

  19. Mar 20, 2019

    Including a massive, well-curated dataset mapping hyperparameter configurations to model performance. This may be a useful resource in your own research.

  20. Mar 7, 2019

    Wonderful collaboration with powerhouses Jeffrey Pennington, Vinay Rao, and . Thanks to Sergey Ioffe, , and for comments and discussion along the way.

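The Jan 24 thread above (items 1-3) contrasts the NTK parameterization with the standard one. The sketch below is a minimal illustration of that difference for a single dense layer, written in JAX; it is not the authors' implementation, and all function and variable names are illustrative.

    # Minimal illustration (not the authors' code) of the two dense-layer
    # parameterizations. Both give outputs with the same distribution at
    # initialization, but gradients with respect to the weights are scaled
    # differently, which changes gradient-descent dynamics.
    import jax.numpy as jnp
    from jax import random

    def dense_standard(key, x, width):
        # Standard parameterization: weights drawn with variance 1/fan_in,
        # output used as-is.
        fan_in = x.shape[-1]
        W = random.normal(key, (fan_in, width)) / jnp.sqrt(fan_in)
        return x @ W

    def dense_ntk(key, x, width):
        # NTK parameterization: weights drawn with variance 1, and the
        # pre-activations divided by sqrt of the incoming width.
        fan_in = x.shape[-1]
        W = random.normal(key, (fan_in, width))
        return x @ W / jnp.sqrt(fan_in)

    key = random.PRNGKey(0)
    x = random.normal(key, (8, 128))
    # Both outputs have approximately unit variance at initialization.
    print(dense_standard(key, x, 256).std(), dense_ntk(key, x, 256).std())
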
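
The Dec 6 announcement above refers to the Neural Tangents library. The sketch below follows the pattern of the library's public examples; the exact API (the stax combinators, nt.predict.gradient_descent_mse_ensemble) may differ between versions, so treat it as a rough outline rather than a definitive usage guide.

    # Rough sketch, modeled on Neural Tangents' public examples: one network
    # description yields a finite-width model plus closed-form NNGP/NTK kernels.
    from jax import random
    import neural_tangents as nt
    from neural_tangents import stax

    init_fn, apply_fn, kernel_fn = stax.serial(
        stax.Dense(512), stax.Relu(),
        stax.Dense(512), stax.Relu(),
        stax.Dense(1),
    )

    key = random.PRNGKey(0)
    x_train = random.normal(key, (20, 32))
    y_train = random.normal(key, (20, 1))
    x_test = random.normal(key, (5, 32))

    # Closed-form NTK of the corresponding infinite-width network.
    k_train_train = kernel_fn(x_train, x_train, 'ntk')

    # Mean test prediction of the infinitely wide network trained to
    # convergence by gradient descent on MSE loss.
    predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
    y_test_mean = predict_fn(x_test=x_test, get='ntk')
    print(k_train_train.shape, y_test_mean.shape)
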
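
The Sep 27 tweet above describes reparameterizing a physical design as the output of a neural network. The toy sketch below uses a made-up quadratic objective in place of the structural-optimization physics, purely to show the reparameterization pattern; none of it comes from the paper's code.

    # Toy sketch of neural reparameterization with a stand-in objective:
    # instead of descending on the design directly, descend on the weights
    # of a small network whose output *is* the design.
    import jax.numpy as jnp
    from jax import random, grad

    def objective(design):
        # Hypothetical stand-in for a physical objective (e.g. compliance).
        target = jnp.linspace(-1.0, 1.0, design.shape[0])
        return jnp.sum((design - target) ** 2)

    def optimize_direct(design, steps=200, lr=0.1):
        # Baseline: gradient descent on the design variables themselves.
        for _ in range(steps):
            design = design - lr * grad(objective)(design)
        return design

    def mlp_design(params, z):
        # Design expressed as the output of a tiny MLP with a fixed input z.
        W1, W2 = params
        return jnp.tanh(z @ W1) @ W2

    def optimize_reparam(params, z, steps=200, lr=0.1):
        # Same objective, but gradient descent on the network weights.
        loss = lambda p: objective(mlp_design(p, z))
        for _ in range(steps):
            grads = grad(loss)(params)
            params = [p - lr * g for p, g in zip(params, grads)]
        return mlp_design(params, z)

    key = random.PRNGKey(0)
    k1, k2, k3 = random.split(key, 3)
    z = random.normal(k1, (16,))
    params = [0.1 * random.normal(k2, (16, 32)), 0.1 * random.normal(k3, (32, 64))]
    print(objective(optimize_direct(jnp.zeros(64))))
    print(objective(optimize_reparam(params, z)))
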
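
The May 2 retweet above claims that randomly initialized deep batch-norm networks suffer gradient explosion. The sketch below is an illustrative numerical check of that claim on a plain fully-connected ReLU network with a hand-rolled batch norm (no learned scale or shift); it is not the paper's experiment, and the architecture and readout are assumptions.

    # Illustrative check: gradient norm at initialization vs. depth in a
    # fully-connected network with batch normalization before each ReLU.
    import jax.numpy as jnp
    from jax import random, grad

    def batchnorm(z, eps=1e-5):
        # Normalize each feature across the batch; no learned scale/shift.
        return (z - z.mean(0)) / jnp.sqrt(z.var(0) + eps)

    def forward(x, weights):
        h = x
        for W in weights:
            h = jnp.maximum(batchnorm(h @ W), 0.0)   # linear -> BN -> ReLU
        return h

    def grad_norm_at_init(key, depth, width=128, batch=64):
        keys = random.split(key, depth + 1)
        weights = [random.normal(k, (width, width)) / jnp.sqrt(width)
                   for k in keys[:depth]]
        x = random.normal(keys[-1], (batch, width))
        # Gradient of a simple scalar readout with respect to the input.
        loss = lambda inp: jnp.sum(forward(inp, weights) ** 2)
        return jnp.linalg.norm(grad(loss)(x))

    key = random.PRNGKey(0)
    for depth in (5, 20, 50):
        # The gradient norm grows rapidly with depth at initialization.
        print(depth, grad_norm_at_init(key, depth))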
