Replying to
Now, I don't closely follow the adversarial literature, but my understanding is that in vision, it is possible to synthesize targeted adversarial attacks (especially with glassbox access). Perhaps given that adversarial attacks are harder to generate in NLP, the emphasis ...
you are placing is on the shortcoming of current adversarial methods (but that you believe sufficiently improved TAP generators are valid ACE generators). If so, I should note that this was at least very unclear to me; my reading was you viewed these as separate but related.
Replying to
Hi Rishi, thanks for your question! You are right that TAPs and ACEs both satisfy constraints 1-3 but not 4. However, it does not follow that methods for generating TAPs are also good generators of ACEs; I’ll try to clarify why here.
To make this difference more concrete, imagine a model makes a correct prediction originally, and an ACE results in an input for which the model changes its output to another label that a human would also give for that edited input.
An example of this kind of edit is the first example in Table 5 in our appendix. This edit would not qualify as a TAP, given that the human/true label for the edited input would also change with the edit. Thus, TAP methods would not generate this edit.
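The distinction can be made concrete with a small sketch. It is only one reading of the definitions used in this thread, not code from the paper: the `Edit` record, the two predicates, and the sentiment labels are all hypothetical names chosen for illustration. It assumes a TAP must change the model's output while leaving the human/true label unchanged, and that an ACE only needs to move the model's output to the contrast label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Labels before and after an edit (hypothetical record, for illustration)."""
    model_before: str  # model's label on the original input
    model_after: str   # model's label on the edited input
    human_before: str  # human/true label on the original input
    human_after: str   # human/true label on the edited input


def flips_model(edit: Edit) -> bool:
    """True if the model's output changes under the edit."""
    return edit.model_before != edit.model_after


def qualifies_as_tap(edit: Edit) -> bool:
    # Assumption: an adversarial edit must change the model's output
    # while the human/true label stays the same.
    return flips_model(edit) and edit.human_before == edit.human_after


def qualifies_as_ace(edit: Edit, contrast_label: str) -> bool:
    # Assumption: a contrastive edit only needs to move the model's output
    # to the contrast label; the human label is free to change with it.
    return flips_model(edit) and edit.model_after == contrast_label


# The case described above: the model was right originally, and after the
# edit both the model and a human would give the new label.
edit = Edit(
    model_before="positive",
    model_after="negative",
    human_before="positive",
    human_after="negative",
)

print(qualifies_as_tap(edit))              # the true label moved with the edit
print(qualifies_as_ace(edit, "negative"))  # the model reached the contrast label
```

Under these assumptions the edit fails the TAP check but passes the ACE check, which is the gap a TAP generator would leave.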
However, this edit would not be a good TAP, since its mixed signals make the true label of the edited input unclear. TAP methods may therefore be designed to exclude edits with mixed signals, even though such examples are of interest to ACE generation methods.
The goal of adversarial examples, which is to deceive the model, differs from the goal of ACEs, which is to explain. For explanation purposes, the ACE in Table 5 is still useful, even though it did not deceive the model, as it allows us to verify that the model got the initial prediction right for the right reasons.
This larger difference in goals may also influence methodology in additional ways, which point to interesting directions for future work: 1) Unlike in work on adversarial examples, the goal of research on contrastive edits is to achieve strict minimality, as we discuss in Section 5.