Would love to hear details about this!
-
Replying to @xuenay
The only one I wrote up was this one. The then-most-hyped version of RL+backprop turned out to work less well than RL+perceptron. https://www.ijcai.org/Proceedings/91-2/Papers/018.pdf
-
Replying to @Meaningness @xuenay
The one that got me really annoyed was XOR. The narrative was that Minsky & Papert unfairly killed perceptrons with that, and you could learn XOR if you added hidden layers. 1/
-
Replying to @Meaningness @xuenay
This turned out not to be true in any interesting sense. You can compute XOR with a feedforward network, but backprop won’t learn it reliably, nor in a reasonable length of time. It has to get lucky. 2/
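As a concrete illustration of the first half of that claim, here is a minimal hand-built network that computes XOR; the particular weights and thresholds are one standard construction, not something from the thread.

```python
# A hand-built 2-2-1 threshold network that computes XOR exactly.
# h1 acts as OR, h2 as AND, and the output fires when h1 is on and h2 is off.
# These weights are a textbook construction chosen for illustration.
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # OR(x1, x2)
    h2 = step(x1 + x2 - 1.5)    # AND(x1, x2)
    return step(h1 - h2 - 0.5)  # h1 AND NOT h2 == XOR(x1, x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
```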
-
Replying to @Meaningness
OH. And I thought I was doing something wrong back when I was trying to get an intuition for backprop by simulating it by hand in a spreadsheet, but couldn't get it to converge for XOR.
-
Replying to @xuenay
I did this 25 years ago, but my recollection is that backprop can only find the solution by accident. There’s no guiding gradient. You have to set the hyperparameters to force a random walk over the whole space, and hope it falls into the golf hole eventually.
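A minimal sketch of the kind of experiment being recalled, with all specifics (network size, learning rate, initialization range, epoch budget) chosen by me for illustration: a 2-2-1 sigmoid network trained on XOR by plain batch gradient descent, restarted from several random initializations. How many restarts end up learning the truth table depends on those choices.

```python
# Sketch only: tiny sigmoid MLP trained on XOR with hand-rolled backprop.
# Hyperparameters are illustrative guesses, not values from the thread.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(seed, lr=0.5, epochs=20000):
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-1, 1, (2, 2)); b1 = np.zeros(2)   # input -> hidden
    W2 = rng.uniform(-1, 1, (2, 1)); b2 = np.zeros(1)   # hidden -> output
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # hidden activations
        Y = sigmoid(H @ W2 + b2)            # network outputs
        dY = (Y - T) * Y * (1 - Y)          # delta at output (squared error)
        dH = (dY @ W2.T) * H * (1 - H)      # delta backpropagated to hidden
        W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return bool(np.all((Y > 0.5) == (T > 0.5)))  # learned the truth table?

wins = sum(train_xor(seed) for seed in range(20))
print(f"{wins}/20 random restarts learned XOR")
```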
-
Replying to @Meaningness @xuenay
This means that it takes O(2^n) steps to learn n-bit XOR, and the probability of getting permanently stuck in a local minimum (wrong answer) goes up rapidly with n too. (I forget exactly how rapidly.)
-
Replying to @Meaningness @xuenay
That said, parity is an unusually hard problem to solve with a feedforward network (see circuit complexity), although trivial if you process it as a sequence. The same is true for us: easy to compute a 20-bit parity step by step, impossible in one glance.
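A sketch of the "trivial as a sequence" half of that point: parity of 20 bits computed with a one-bit running XOR, one step per bit (the bit string below is arbitrary example data).

```python
# Sequential parity: keep a one-bit accumulator and XOR each bit in turn.
from functools import reduce
from operator import xor

bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

parity = 0
for b in bits:       # one tiny step per bit
    parity ^= b
print(parity)

print(reduce(xor, bits))  # same computation written as a fold
```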
-
Oh, I should also have said: I meant bitwise parallel XOR (which does not require a deep circuit), not parity. Bitwise XOR takes backprop exponential time to learn.
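My reading of "bitwise parallel XOR", offered as an assumption rather than a quote: each output bit is the XOR of one pair of corresponding input bits, so no output depends on more than two inputs and the mapping needs no deep circuit.

```python
# Bitwise parallel XOR (as I read the tweet): output bit i = a[i] XOR b[i].
# Each output bit depends on exactly two input bits; the bits never interact.
import numpy as np

a = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # arbitrary example inputs
b = np.array([0, 0, 1, 0, 1, 1, 1, 0])
print(a ^ b)
```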
-
Parallel XOR is still not a great test of anything, but the right defense of backprop should have been “XOR is not a good test; here is a good test, and backprop passes.”
-
To be totally careful, I should also say that my memory is that it's exponential, but it was a long time ago and I may be wrong.
-
Replying to @Meaningness @xuenay
It's probably sensitive to the objective, e.g., squared error on the output as an n-dim real vector should train each bit independently. Regardless, it's true that "ML is magic" is often the intuitive takeaway, which is wrong and misleading!
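A small sketch of the decomposition being suggested, with made-up numbers: a summed squared error over an n-dimensional output splits into per-bit terms, so the gradient at output unit i involves only that bit's error. Whether the bits then train independently end to end still depends on whether they share hidden units.

```python
# Per-bit decomposition of summed squared error (made-up illustrative numbers).
import numpy as np

y = np.array([0.8, 0.3, 0.6])  # network outputs for 3 bits
t = np.array([1.0, 0.0, 1.0])  # targets

loss = np.sum((y - t) ** 2)    # = sum over bits of (y_i - t_i)^2
grad = 2 * (y - t)             # d(loss)/d(y_i) depends only on bit i
print(loss, grad)
```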