Human explanations are *contrastive*: they explain why an event happened *instead of* another event (the contrast case). Making model explanations contrastive could thus make them more useful and user-friendly. However, this property has largely been ignored in interpretable NLP. 2/6
We present Minimal Contrastive Editing, or MiCE, a two-stage approach to generating contrastive explanations of model predictions. A MiCE explanation is a modification of an input that causes the model being explained to change its prediction to a given contrast prediction. 3/6
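In code terms, the contract a MiCE edit satisfies looks roughly like the toy check below (placeholder names, not our actual code; `predict_label` stands in for whatever model is being explained):

```python
# Toy sketch of the contract a MiCE edit satisfies (placeholder names, not the MiCE codebase).
def is_contrastive_edit(predict_label, original_text, edited_text, contrast_label):
    """True if editing the input flips the model's prediction to the requested contrast label."""
    return (predict_label(original_text) != contrast_label
            and predict_label(edited_text) == contrast_label)

# Hypothetical sentiment example:
# is_contrastive_edit(model, "The plot was dull.", "The plot was delightful.", "positive")
```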
MiCE has two stages: In Stage 1, we train an Editor model to make edits targeting given contrast labels. In Stage 2, we use the Editor with binary search (over how much of the input to mask) and beam search to find the edits that yield the highest contrast prediction probabilities from the model. 4/6
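A simplified sketch of the Stage 2 search (not our actual implementation; `mask_fn`, `editor_fill_fn`, `prob_fn`, and `predict_fn` are assumed caller-supplied callables):

```python
def stage2_search(text, contrast_label, mask_fn, editor_fill_fn, prob_fn, predict_fn,
                  num_rounds=4, beam_size=4):
    """Sketch of Stage 2: binary search over how much of the input to mask,
    beam search over Editor infills, keep the edit with the highest
    contrast-label probability. All four callables are placeholders."""
    lo, hi = 0.0, 1.0                        # fraction of input tokens to mask
    best_edit, best_prob = None, 0.0
    for _ in range(num_rounds):
        frac = (lo + hi) / 2
        masked = mask_fn(text, frac)         # hide roughly `frac` of the tokens
        # The Stage-1 Editor beam-generates infills conditioned on the contrast label.
        candidates = editor_fill_fn(masked, contrast_label, beam_size)
        # Score each candidate by the predictor's probability for the contrast label.
        prob, edit = max((prob_fn(c, contrast_label), c) for c in candidates)
        if prob > best_prob:
            best_edit, best_prob = edit, prob
        # If this masking level already flips the prediction, try masking less
        # (a smaller, more minimal edit); otherwise allow a larger mask.
        if predict_fn(edit) == contrast_label:
            hi = frac
        else:
            lo = frac
    return best_edit, best_prob
```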
Experiments on classification/multiple-choice Q&A show that MiCE edits are not only contrastive, but also *minimal* and *fluent*, consistent with human contrastive edits. 5/6
Finally, we show how MiCE edits support two use cases in NLP system development: discovering dataset artifacts (ex: IMDB edit below) and debugging incorrect model predictions (ex: RACE edit below). Feel free to reach out with any questions or comments! 6/6