With backprop training I get 2.3% error on MNIST with just a 2-layer net of 1000 hidden units, no convolutional layers. Pretty good AFAIK.
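A minimal numpy sketch of a 2-layer, 1000-hidden-unit net in that spirit; the tanh hidden activation, the softmax + cross-entropy output, the learning rate, and the random toy batch are all illustrative assumptions, not the setup described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one MNIST mini-batch: 784-dim inputs, 10 classes.
X = rng.standard_normal((64, 784))
y = rng.integers(0, 10, size=64)
Y = np.eye(10)[y]                         # one-hot targets

H = 1000                                  # hidden units, as in the tweet
W1 = rng.standard_normal((784, H)) * 0.01
b1 = np.zeros(H)
W2 = rng.standard_normal((H, 10)) * 0.01
b2 = np.zeros(10)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for step in range(10):                    # a few SGD steps on the toy batch
    # Forward pass.
    h = np.tanh(X @ W1 + b1)              # hidden layer (assumed tanh)
    p = softmax(h @ W2 + b2)              # output class probabilities

    # Backward pass; softmax + cross-entropy makes the output delta p - Y.
    d2 = (p - Y) / len(X)
    dW2 = h.T @ d2
    db2 = d2.sum(axis=0)
    dh = (d2 @ W2.T) * (1.0 - h ** 2)     # tanh derivative
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # Plain SGD update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```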
-
Probably the ReLU layers, which are a conv-net staple, make the difference. Don't know Keras well enough to suggest a model.
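For reference, a rough Keras equivalent with a ReLU hidden layer (a sketch assuming the tf.keras API; the layer sizes and optimizer are placeholders, not a model from the thread):

```python
# Hypothetical Keras model; sizes and optimizer are illustrative assumptions.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),                     # flattened MNIST image
    keras.layers.Dense(1000, activation="relu"),   # ReLU hidden layer
    keras.layers.Dense(10, activation="softmax"),  # class probabilities
])
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```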
-
Another thing that can make a difference is softmax as the output activation function vs the normal 0/1 vector. Tempted to try…
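For clarity, softmax squashes the raw output scores into a probability distribution that sums to 1, unlike independent 0/1 output units; a tiny sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])     # raw output-layer scores
print(softmax(scores))                 # [0.659 0.242 0.099] -- sums to 1
target = np.array([1.0, 0.0, 0.0])     # the "normal 0/1" one-hot vector
```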
-
Disregard the last tweet, I'm actually using softmax already.
-
Just modified my NN to use a ReLU activation function, let's see if it converges at all…
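ReLU and the derivative that backprop needs are one-liners; a sketch in numpy (not the author's code):

```python
import numpy as np

def relu(z):
    # Forward: max(0, z) elementwise
    return np.maximum(0.0, z)

def relu_grad(z):
    # Subgradient used in backprop: 1 where z > 0, else 0
    return (z > 0).astype(z.dtype)
```

Since ReLU is unbounded above, smaller initial weights or a lower learning rate often help convergence compared to sigmoid/tanh nets.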
-
Oh, also I see that the cost function is different in your case: I'm using RMS, which is not ideal with softmax AFAIK.
-
You should be using cross entropy. Probability distributions are not a Euclidean space.
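Cross entropy also pairs naturally with softmax: the gradient of the combined softmax + cross-entropy output layer reduces to p - y, which keeps the backward pass simple. A sketch (illustrative, not the thread's code):

```python
import numpy as np

def cross_entropy(p, y):
    # p: softmax output, y: one-hot target; eps guards against log(0)
    eps = 1e-12
    return -np.sum(y * np.log(p + eps))

p = np.array([0.7, 0.2, 0.1])    # softmax output
y = np.array([1.0, 0.0, 0.0])    # one-hot target
print(cross_entropy(p, y))       # ~0.357
print(p - y)                     # gradient wrt the pre-softmax logits
```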