Replying to
As I understand, a contrastive edit (CE) must satisfy: 1) f(x') = y'. A CE ideally also satisfies: 2) Minimality - x and x' are close (i.e. d(x, x') < epsilon, for some metric d); 3) Fluency - x' is fluent. As shorthand, let us call a CE satisfying 2 and 3 an ACE (Amazing CE).
As I understand, a targeted adversarial perturbation (TAP) must satisfy 1 and 2 (perhaps even for a small epsilon), and some work also considers fluency, i.e. 3. As you point out, an adversarial perturbation is also expected to 4) have the same true label.
But then, isn't every TAP an ACE? And, therefore, aren't working methods for generating TAPs also good generators of ACEs (under the conditions you have stated)? This seems to contradict your last sentence in Section 6 on Adversarial Examples.
I will say that I implicitly agree that adversarial examples do not (necessarily) feel like compelling explanations (especially if they are disfluent; perhaps also depending on the degree of minimality). But, under the conditions for a CE and ACE, shouldn't TAPs be valid?
Now, I don't closely follow the adversarial literature, but my understanding is that in vision, it is possible to synthesize targeted adversarial attacks (especially with glassbox access). Perhaps, given that adversarial attacks are harder to generate in NLP, the emphasis you are placing is on the shortcomings of current adversarial methods (and you believe sufficiently improved TAP generators would be valid ACE generators). If so, I should note that this was very unclear to me; my reading was that you viewed these as separate but related.
Replying to
Hi Rishi, thanks for your question! You are right that TAPs and ACEs both satisfy constraints 1-3 but not 4. However, it does not follow that methods to generate TAPs are also good generators of ACEs; I'll try to clarify why here.
An example of this kind of edit is the first example in Table 5 in our appendix. This edit would not qualify as a TAP, given that the human/true label for the edited input would also change with the edit. Thus, TAP methods would not generate this edit.
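To make the distinction concrete, here is a minimal Python sketch of conditions 1-4 as discussed in this thread. Everything in it is a hypothetical stand-in: the keyword classifier `f`, the token-level metric `token_distance`, the always-true `fluent` check, and the two toy edits are assumptions for illustration, not the paper's method and not the Table 5 example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Edit:
    x: str                  # original input
    x_edit: str             # edited input x'
    target: str             # contrast label y'
    true_label_before: str  # human label for x
    true_label_after: str   # human label for x'


def token_distance(a: str, b: str) -> int:
    """Toy metric d: differing token positions plus the length difference."""
    ta, tb = a.split(), b.split()
    return sum(s != t for s, t in zip(ta, tb)) + abs(len(ta) - len(tb))


def f(text: str) -> str:
    """Hypothetical classifier: keyword-based sentiment."""
    return "negative" if {"not", "terrible"} & set(text.split()) else "positive"


def fluent(text: str) -> bool:
    """Placeholder fluency check (assumption: every toy input is fluent)."""
    return True


def is_ce(e: Edit, model: Callable[[str], str]) -> bool:
    return model(e.x_edit) == e.target                 # condition 1


def is_minimal(e: Edit, eps: int) -> bool:
    return token_distance(e.x, e.x_edit) < eps         # condition 2


def is_ace(e: Edit, model: Callable[[str], str], eps: int) -> bool:
    return is_ce(e, model) and is_minimal(e, eps) and fluent(e.x_edit)  # 1-3


def is_tap(e: Edit, model: Callable[[str], str], eps: int) -> bool:
    same_true_label = e.true_label_before == e.true_label_after         # condition 4
    return is_ce(e, model) and is_minimal(e, eps) and same_true_label


# The true label flips along with the model's prediction.
label_changing = Edit("the movie was good", "the movie was terrible",
                      "negative", "positive", "negative")
# The true label stays positive, but the toy model is fooled by "not".
label_preserving = Edit("the movie was good", "the movie was not bad",
                        "negative", "positive", "positive")

print(is_ace(label_changing, f, 3), is_tap(label_changing, f, 3))
print(is_ace(label_preserving, f, 3), is_tap(label_preserving, f, 3))
```

Under these assumptions, the label-preserving edit passes both checks, while the label-changing edit satisfies conditions 1-3 but fails condition 4, so a generator restricted to TAPs would never propose it.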