Conversation

As I understand it, a targeted adversarial perturbation (TAP) must satisfy 1 and 2 (perhaps even for a small epsilon), with some work also considering fluency, i.e. 3. As you point out, an adversarial perturbation is also expected to 4) have the same true label.
But then, isn't every TAP an ACE? And, therefore, aren't working methods for generating TAPs also good generators of ACEs (for the conditions you have stated)? This seems to contradict your last sentence in Section 6 on Adversarial Examples.
I will say that I implicitly agree that adversarial examples do not (necessarily) feel like compelling explanations (especially if they are disfluent; perhaps also depending on the degree of minimality). But, under the conditions for a CE and ACE, shouldn't TAPs be valid?
Now, I don't closely follow the adversarial literature, but my understanding is that in vision, it is possible to synthesize targeted adversarial attacks (especially with glassbox access). Perhaps given that adversarial attacks are harder to generate in NLP, the emphasis ...
you are placing is on the shortcoming of current adversarial methods (but that you believe sufficiently improved TAP generators are valid ACE generators). If so, I should note that this was at least very unclear to me; my reading was you viewed these as separate but related.
Hi Rishi, thanks for your question! You are right that TAPs and ACEs share constraints 1-3 but not 4. However, it does not follow that methods to generate TAPs are also good generators of ACEs. I’ll try to clarify why here.
To make this difference more concrete, imagine a model makes a correct prediction originally, and an ACE results in an input for which the model changes its output to another label that a human would also give for that edited input.
However, this edit would not be a good TAP, since it’s unclear what the true label for this edited input is due to its mixed signals. Thus, TAP methods may be designed to exclude edits w/ mixed signals, though such examples are of interest to ACE generation methods.