Context: regarding the new BiT-M models. I'm personally excited because I know the Facebook WSL models make a big difference in practice. There's an easy-to-miss variant that looks more practical: the 2x-wide, 152-layer model. Competitive benchmarks, way less memory needed. https://twitter.com/wightmanr/status/1263615215108870151
Yeah, it's a bit of a tease not to have access to those models. But in my mind, we probably just got our new best set of models for transfer learning, for free, so I can't complain too much! I just wish they'd tabulate the # of parameters and top-1 accuracy somewhere.