I found this to be a cool paper to read over the holidays! I did have a question though (even after reading section 6), regarding the relationship between contrastive edits and (targeted) adversarial examples. 1/n
Quote Tweet
Excited to share our preprint, "Explaining NLP Models via Minimal Contrastive Editing (MiCE)" 🐭 This is joint work with @anmarasovic and @mattthemathman Link to paper: arxiv.org/pdf/2012.13985 Thread below 👇 1/6
Here is some notation (I tried to stick with the paper's notation):
Input: x
Edited input: x' = e(x)
Model: f
Original output/prediction: y = f(x)
Contrast(ive) output/prediction: y'
Precondition: y' != y
2/n
As I understand, a contrastive edit (CE) must satisfy:
1) f(x') = y'
And a CE ideally satisfies:
2) Minimality - x and x' are close (i.e. d(x, x') < epsilon, for some metric d).
3) Fluency - x' is fluent
As shorthand, let us call a CE satisfying 2 and 3 an ACE (Amazing CE).
As I understand, a targeted adversarial perturbation (TAP) must satisfy 1 and 2 (perhaps even for a small epsilon), and some work also considers fluency, i.e. 3. As you point out, an adversarial perturbation is also expected to 4) have the same true label.
But then, isn't every (fluent) TAP an ACE? And, therefore, aren't methods that successfully generate TAPs also good generators of ACEs (for the conditions you have stated)? This seems to contradict your last sentence in Section 6 on Adversarial Examples.
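To make the question concrete, conditions 1-4 can be written as simple predicates. This is only a minimal sketch under stated assumptions, not anything taken from the MiCE paper: the function names, the choice of token-level edit distance for d, the `is_fluent` and `true_label` callables, and the toy model are all hypothetical.

```python
from typing import Callable, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance, an assumed stand-in for the metric d."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_ce(f: Callable[[str], str], x: str, x_prime: str, y_prime: str) -> bool:
    """Precondition y' != f(x), plus condition 1: f(x') = y'."""
    return f(x) != y_prime and f(x_prime) == y_prime


def is_minimal(x: str, x_prime: str, epsilon: int) -> bool:
    """Condition 2: d(x, x') < epsilon."""
    return edit_distance(x.split(), x_prime.split()) < epsilon


def is_ace(f, x, x_prime, y_prime, epsilon, is_fluent: Callable[[str], bool]) -> bool:
    """Conditions 1, 2 and 3 (fluency is delegated to a caller-supplied check)."""
    return is_ce(f, x, x_prime, y_prime) and is_minimal(x, x_prime, epsilon) and is_fluent(x_prime)


def is_tap(f, x, x_prime, y_prime, epsilon, true_label: Callable[[str], str]) -> bool:
    """Conditions 1, 2 and 4 (the true label must not change)."""
    return (
        is_ce(f, x, x_prime, y_prime)
        and is_minimal(x, x_prime, epsilon)
        and true_label(x) == true_label(x_prime)
    )


# Toy model (hypothetical): predicts "neg" whenever the token "film" appears.
def toy_model(text: str) -> str:
    return "neg" if "film" in text.split() else "pos"


x = "the movie is good"
x_prime = "the film is good"  # one-token edit; a human would still label it "pos"

tap_ok = is_tap(toy_model, x, x_prime, "neg", epsilon=2, true_label=lambda s: "pos")
ace_ok = is_ace(toy_model, x, x_prime, "neg", epsilon=2, is_fluent=lambda s: True)
```

In this toy case the same edit passes both checks: once fluency holds, nothing in conditions 1-3 excludes an edit that also satisfies condition 4.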
I will say that I implicitly agree that adversarial examples do not (necessarily) feel like compelling explanations (especially if they are disfluent; perhaps also depending on the degree of minimality). But, under the conditions for a CE and ACE, shouldn't TAPs be valid?
Now, I don't closely follow the adversarial literature, but my understanding is that in vision, it is possible to synthesize targeted adversarial attacks (especially with glassbox access). Perhaps given that adversarial attacks are harder to generate in NLP, the emphasis ...
you are placing is on the shortcomings of current adversarial methods (but that you believe sufficiently improved TAP generators are valid ACE generators). If so, I should note that this was at least very unclear to me; my reading was that you viewed these as separate but related.