This means that many ACEs we’d want our methods to generate would not count as TAPs and thus would be excluded by adversarial methods.
To make this difference more concrete, imagine a model that originally makes a correct prediction, and an ACE that edits the input so the model's output changes to a different label, one that a human would also assign to the edited input.
An example of this kind of edit is the first example in Table 5 in our appendix. This edit would not qualify as a TAP, given that the human/true label for the edited input would also change with the edit. Thus, TAP methods would not generate this edit.
As another example, for the input in Table 5, if we saw that only editing “3/10” -> “9/10” led to the contrast prediction, this edit would be an ACE that highlights a dataset artifact—a reliance on the numerical rating.
However, this edit would not be a good TAP, since it’s unclear what the true label for this edited input is due to its mixed signals. Thus, TAP methods may be designed to exclude edits with mixed signals, though such examples are of interest to ACE generation methods.
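As a toy illustration of the point above (the classifier and all names here are hypothetical, not the model from the paper), a single-span edit like “3/10” -> “9/10” counts as an ACE when it alone flips the prediction, exposing an over-reliance on the rating:

```python
import re

def is_contrastive_edit(predict, original, edited):
    """An edit is contrastive if the model's prediction flips."""
    return predict(original) != predict(edited)

# Hypothetical artifact-laden classifier: it looks only at "<n>/10",
# mimicking a model that over-relies on the numerical rating.
def toy_predict(text):
    m = re.search(r"(\d+)/10", text)
    return "positive" if m and int(m.group(1)) >= 5 else "negative"

original = "Terrible pacing and weak dialogue. 3/10"
edited   = "Terrible pacing and weak dialogue. 9/10"
print(is_contrastive_edit(toy_predict, original, edited))  # True
```

Here the edit flips the prediction even though the rest of the review is unchanged, which is exactly the kind of mixed-signal edit a TAP method would exclude but an ACE method would want to surface.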
Secondly, constraint 4 points to a larger difference in the goals of TAPs and ACEs—adversarial examples are meant to deceive models, not to interpret them. (This chapter offers a nice discussion of this difference: christophm.github.io/interpretable-).
This goal differs from the goal of ACEs, which is to explain. For explanation purposes, the ACE in Table 5 is still useful, even though it did not deceive the model, as it allows us to verify that the model got the initial prediction right for the right reasons.
This larger difference in goals may also influence methodology in additional ways, which point to interesting directions for future work: 1) Unlike work on adversarial examples, research on contrastive edits aims for strict minimality, as we discuss in Sect. 5.
2) Another goal of ACEs is to create edits that are immediately understandable to people, which could require different, more human-centered definitions of minimality.
For instance, we may want our edits to be minimal in the sense of being contiguous, since connected edits are more understandable than edits that alter disconnected parts of the input. This is also not of interest to work on adversarial examples.
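To make the contiguity notion concrete, here is a minimal sketch (the tokenization and function names are my own, not from the paper) that checks whether a token-level substitution edit forms one connected span:

```python
def edited_positions(orig_tokens, new_tokens):
    """Positions where two equal-length token sequences differ."""
    return [i for i, (a, b) in enumerate(zip(orig_tokens, new_tokens)) if a != b]

def is_contiguous_edit(orig_tokens, new_tokens):
    """True if all substituted tokens form a single connected span."""
    pos = edited_positions(orig_tokens, new_tokens)
    return len(pos) > 0 and pos[-1] - pos[0] == len(pos) - 1

# A single-span edit ("3/10" -> "9/10") is contiguous:
orig    = "I give this movie 3/10 despite the acting".split()
span    = "I give this movie 9/10 despite the acting".split()
print(is_contiguous_edit(orig, span))     # True

# Edits that alter disconnected parts of the input are not:
scatter = "I gave this movie 3/10 despite the directing".split()
print(is_contiguous_edit(orig, scatter))  # False
```

A human-centered minimality criterion might prefer the first edit over the second even if both flip the model's prediction and change the same number of tokens.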
Hopefully this clarifies how the goals of adversarial examples and contrastive edits come apart and why we’d expect to see this difference in methods for the two. And I will check out this paper—Thanks for sharing!
