Conversation

To make this difference more concrete, imagine that a model initially makes a correct prediction, and an ACE yields an edited input for which the model changes its output to a different label, one that a human would also assign to that edited input.
An example of this kind of edit is the first example in Table 5 in our appendix. This edit would not qualify as a TAP, given that the human/true label for the edited input would also change with the edit. Thus, TAP methods would not generate this edit.
However, this edit would not be a good TAP, since its mixed signals make it unclear what the true label for the edited input is. Thus, TAP methods may be designed to exclude edits with mixed signals, though such examples are of interest to ACE generation methods.
This goal differs from the goal of ACEs, which is to explain. For explanation purposes, the ACE in Table 5 is still useful, even though it did not deceive the model, as it allows us to verify that the model got the initial prediction right for the right reasons.
This larger difference in goals may also influence methodology in additional ways, which point to interesting directions for future work: 1) Unlike in work on adversarial examples, the goal of research on contrastive edits is to achieve strict minimality, as we discuss in Sect. 5.