
As I understand, a contrastive edit (CE) must satisfy:
1) f(x') = y'.
And a CE ideally satisfies:
2) Minimality: x and x' are close (i.e. d(x, x') < epsilon, for some metric d).
3) Fluency: x' is fluent.
As shorthand, let us call a CE satisfying 2 and 3 an ACE (Amazing CE).
As I understand, a targeted adversarial perturbation (TAP) must satisfy 1 and 2 (perhaps even for a small epsilon), with some work also considering fluency, i.e. 3. As you point out, an adversarial perturbation is also expected to 4) have the same true label.
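To keep the four conditions straight, here is a small illustrative sketch. Everything in it is hypothetical scaffolding, not from any cited implementation: `model`, `human_label`, `dist`, and `is_fluent` stand in for a classifier, a human annotator, a distance metric, and a fluency check.

```python
# Illustrative predicates for the conditions discussed above.
# `model`, `human_label`, `dist`, and `is_fluent` are hypothetical
# stand-ins, not any paper's actual API.

def is_ce(model, x, x_edit, y_target):
    # Condition 1: the model's prediction on the edit is the target label.
    return model(x_edit) == y_target

def is_ace(model, x, x_edit, y_target, dist, is_fluent, epsilon):
    # Conditions 1-3: a CE that is also minimal and fluent.
    return (is_ce(model, x, x_edit, y_target)
            and dist(x, x_edit) < epsilon   # 2: minimality
            and is_fluent(x_edit))          # 3: fluency

def is_tap(model, x, x_edit, y_target, dist, is_fluent, epsilon, human_label):
    # Conditions 1-4: an ACE whose true (human) label is unchanged.
    return (is_ace(model, x, x_edit, y_target, dist, is_fluent, epsilon)
            and human_label(x_edit) == human_label(x))  # 4: same true label
```

Under these definitions, `is_tap` implies `is_ace` but not conversely: an edit that legitimately changes the human label satisfies 1-3 while failing 4.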
But then, isn't every TAP an ACE? And, therefore, aren't working methods for generating TAPs also good generators of ACEs (under the conditions you have stated)? This seems to contradict your last sentence in Section 6 on Adversarial Examples.
I will say that I implicitly agree that adversarial examples do not (necessarily) feel like compelling explanations (especially if they are disfluent; perhaps also depending on the degree of minimality). But, under the conditions for a CE and ACE, shouldn't TAPs be valid?
Now, I don't closely follow the adversarial literature, but my understanding is that in vision it is possible to synthesize targeted adversarial attacks (especially with glassbox access). Perhaps, given that adversarial attacks are harder to generate in NLP, the emphasis you are placing is on the shortcomings of current adversarial methods (but you believe sufficiently improved TAP generators are valid ACE generators). If so, I should note that this was at least very unclear to me; my reading was that you viewed these as separate but related.
Hi Rishi, thanks for your question! You are right that TAPs and ACEs both satisfy constraints 1-3, while only TAPs must satisfy 4. However, it does not follow that methods to generate TAPs are also good generators of ACEs; I'll try to clarify why.
To make this difference more concrete, imagine the model originally makes a correct prediction, and an ACE edits the input so that the model's output changes to another label that a human would also assign to the edited input.
An example of this kind of edit is the first example in Table 5 in our appendix. This edit would not qualify as a TAP, given that the human/true label for the edited input would also change with the edit. Thus, TAP methods would not generate this edit.
Yup, I agree that a "perfect" TAP generator necessarily will fail to generate many valid ACEs, given TAPs are a strict subset of ACEs in terms of the conditions they must satisfy (perhaps part of my question is whether there are unstated/informal additional desiderata for ACEs).
Indeed, it seems this could be strengthened by arguing that the space of TAPs vanishes in the space of ACEs as models become more accurate. And it seems like having a high-coverage ACE generator might be desirable (e.g. it better elucidates the local decision boundary).