- "NNs must be trained by backpropagation" - "hybrid approaches like SVMs on top of NN features will always work better" - "backpropagation in any form is biologically implausible" - "CNNs are nothing like the human visual cortex & certainly don't predict its activations"
- "small NNs can't be trained directly, so NNs must need to be big" - "NNs only learn task-specific features, certainly no kind of hidden or latent 'dark knowledge'" - [style transfer arrives] "Who ordered that?" - "simple SGD is the worst update rule"
- "simple SGD is the worst update rule" - "simple self-supervision like next-frame prediction can't learn semantics" - "adversarial examples're easy to fix & won't transfer, well, won't blackbox transfer, well, won't transfer to realworld, well..." - [batchnorm arrives] "Oops."
- "big NNs overfit by memorizing data" - "you can't train 1000-layer NNs but that's OK, that wouldn't be useful anyway" - "big minibatches don't generalize" - "NNs aren't Bayesian at all"
- "convolutions are only good for images; only LSTM RNNs can do translation/seq2seq/generation/meta-learning" - "you need small learning rates, not superhigh ones, to get fast training" (superconvergence) - "memory/discrete choices aren't differentiable"
- [CycleGAN arrives] "Who ordered that?"
- "you can't learn to generate raw audio, it's too low-level"
- "you need bilingual corpora to learn translation"
- "you need shortcut connections, not new activations or initializations, to train 1000-layer nets"
- "NNs can't do zero-shot or few-shot learning" - "NNs can't do planning, symbolic reasoning, or deductive logic" - "NNs can't do causal reasoning" - "pure self-play is unstable and won't work"
- "only convolutions and LSTM RNNs can do (translation/...), not feedfoward NNs with attention" - "learning deep environment models is unstable and won't work" - "OK maybe we do need pretraining, er, 'warmup', after all"
- "we need hierarchical RL to learn long-term strategies" (not bruteforce PPO) - "you can't reuse minibatches for faster training, it never works" (https://arxiv.org/abs/1806.07353 ) - ...
(As @karpathy says, 'neural networks want to work' - and they are very patient with us as we figure out every possible way to train them orders of magnitude worse, slower, and bigger than necessary...)