Last year BERT revolutionized NLP, and a large number of improvements over the original implementation have appeared since then: MT-DNN, RoBERTa, ALBERT. The main feature of these models is...