This is getting more attention than expected, so full acknowledgements: thanks to Christopher Mattern (from @DeepMindAI), who mentioned this to me as a fun fact over Friday drinks about two years ago, and to @owencm, whose random afternoon conversation with me turned into this tiny project

-
Sure, but... 45% accuracy is not exactly good. You can get close to 88% with a linear classifier. You can get 95% with nearest-neighbor/L2 distance. No deep learning necessary. But if you want more than 99% without losing your computational shirt, go with ConvNets.
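For reference, the nearest-neighbor/L2 baseline mentioned here is only a few lines. A minimal pure-Python sketch (function and argument names are made up for illustration; real MNIST use would vectorize this with NumPy):

```python
def nearest_neighbor_predict(train, test_image):
    """1-NN under squared L2 distance.

    train: list of (pixels, label) pairs, pixels a flat list of floats.
    test_image: a flat list of floats of the same length.
    """
    best_label, best_dist = None, float("inf")
    for pixels, label in train:
        # Squared L2 distance to this training image.
        d = sum((a - b) ** 2 for a, b in zip(pixels, test_image))
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label
```

Squared distance suffices since argmin is unchanged by the monotone square root.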
-
Thanks! That's true
I would not recommend that anyone use this classifier in earnest.
I was surprised that it works this well at all, and that it beats nearest-neighbor on pixel sums.
At best, it's a simple proof of concept for information-theoretic approaches
End of conversation
New conversation -
Nice! For completeness, a link to some of the original classification-by-compression work: https://www.cs.waikato.ac.nz/~eibe/pubs/Frank_categorization.full.ps.gz
-
Thanks! I had looked around a bit for similar papers but hadn't found much; it seems well known in the statistical compression community. Indeed, I have to thank Christopher Mattern (from
@DeepMindAI) for mentioning this over drinks a few years ago as a fun fact/idea
End of conversation
New conversation -
I was recently wondering about something similar: you could probably just count the pixels (i.e., sum the pixel values) to classify MNIST images with ~50% accuracy, which wouldn't be too bad.
-
Actually, we tried that.
You only get 20% accuracy, so zip compression indeed performs significantly better.
If you scroll down in the Jupyter Notebook, you can see results for summing on both binarized MNIST and vanilla MNIST.
https://github.com/BlackHC/mnist_by_zip/blob/master/MNIST_by_zip.ipynb
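The pixel-sum baseline discussed above can be sketched as: compute each class's mean pixel sum on the training set, then assign a test image to the class with the nearest mean. This is a hedged sketch with hypothetical names; the notebook's actual implementation may differ:

```python
from collections import defaultdict

def fit_pixel_sum_means(train):
    """train: iterable of (pixels, label); pixels is a flat list of values.

    Returns {label: mean pixel sum over that class's training images}.
    """
    sums = defaultdict(list)
    for pixels, label in train:
        sums[label].append(sum(pixels))
    return {label: sum(v) / len(v) for label, v in sums.items()}

def predict_by_pixel_sum(means, pixels):
    # Classify by nearest class-mean pixel sum (a 1-D nearest-centroid rule).
    s = sum(pixels)
    return min(means, key=lambda label: abs(means[label] - s))
```

The feature is a single scalar per image, which is why it tops out far below even a linear classifier on raw pixels.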
End of conversation
New conversation -
Yeah, I mean, ummm... it is definitely creative. Btw, shouldn't even random coin flips get like 50 percent accuracy?
-
Random baseline accuracy is 10%: MNIST has ten classes, not two
End of conversation
New conversation -
“We are uncertain whether this is an appraisal of zip compression or an indictment of the MNIST dataset.”
-
After 30 years of optimizing on it, the MNIST test set is no longer a test set; it is rather a validation set.
New conversation -
You can get 45% accuracy on binarized MNIST using class-wise compression and counting bits
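A minimal sketch of the class-wise compression idea, assuming each class's training images have been concatenated into one byte string (the names here are hypothetical; the notebook uses zip on binarized MNIST): compress each class corpus with and without the test image appended, and pick the class that pays the fewest extra compressed bytes, i.e., the shortest code length for the image.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Compressed length in bytes at maximum zlib compression level."""
    return len(zlib.compress(data, 9))

def classify_by_compression(class_corpora, image_bytes):
    """class_corpora: {label: concatenated training bytes for that class}.

    The class whose corpus lets the image compress with the fewest
    extra bytes wins; fewer extra bytes means the image looks more
    like that class's training data to the compressor.
    """
    def extra_bytes(corpus):
        return compressed_size(corpus + image_bytes) - compressed_size(corpus)
    return min(class_corpora, key=lambda label: extra_bytes(class_corpora[label]))
```

This is the classic information-theoretic intuition: a compressor trained (implicitly) on class c approximates the class-conditional distribution, and shorter codes correspond to higher likelihood under it.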