Does training models with free-text rationales facilitate learning *for the right reasons*? 🤔
We ask this question in our #EMNLP2022 paper, "Does Self-Rationalization Improve Robustness to Spurious Correlations?" arxiv.org/abs/2210.13575
🧵 1/5
We train 6 model types on NLI & commonsense QA, with and without free-text rationales, and measure robustness to spurious correlations through (1) challenge datasets and (2) test sets where relying on spurious correlations would lead to incorrect answers 👩‍🏫
2/5
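For concreteness, here is a minimal sketch of the two training regimes, assuming a T5-style seq2seq setup; the exact I/O templates below are illustrative, not the paper's:

```python
from typing import Optional, Tuple

def make_example(premise: str, hypothesis: str, label: str,
                 rationale: Optional[str] = None) -> Tuple[str, str]:
    """Return a (source, target) pair for seq2seq training."""
    # The source side is identical in both regimes.
    source = f"nli premise: {premise} hypothesis: {hypothesis}"
    if rationale is None:
        # Standard training: the target is the label alone.
        target = label
    else:
        # Self-rationalization: the target is the label plus a free-text
        # rationale, so the model learns to explain its own prediction.
        target = f"{label} explanation: {rationale}"
    return source, target

src, tgt = make_example(
    premise="A man is playing a guitar on stage.",
    hypothesis="A musician is performing.",
    label="entailment",
    rationale="Playing a guitar on stage is a musical performance.",
)
print(src)  # nli premise: ... hypothesis: ...
print(tgt)  # entailment explanation: ...
```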
While results are model/task-specific, we observe some general trends 📈:
- Data: improvements tend to appear in lower-resource settings, while self-rationalization can hurt in higher-resource settings
- Model size: Within model families, larger models benefit more from rationales
3/5
We also find that *rationale content* affects results: training with ECQA's positive rationales improves robustness, while free-flow or negative rationales hurt it.
4/5
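As a rough illustration of the rationale-type comparison, only the explanation source changes between conditions; the field names below are hypothetical, not the exact ECQA schema:

```python
from typing import Dict

def ecqa_target(ex: Dict[str, str], rationale_type: str) -> str:
    """Compose a seq2seq target using one of ECQA's three rationale types."""
    if rationale_type == "positive":      # properties supporting the answer
        expl = ex["positive_properties"]
    elif rationale_type == "negative":    # properties refuting distractors
        expl = ex["negative_properties"]
    elif rationale_type == "freeflow":    # single free-flow explanation
        expl = ex["free_flow_explanation"]
    else:
        raise ValueError(f"unknown rationale type: {rationale_type}")
    return f'{ex["answer"]} explanation: {expl}'

example = {
    "answer": "library",
    "positive_properties": "A library is a place with many books to borrow.",
    "negative_properties": "A bookstore sells books rather than lending them.",
    "free_flow_explanation": "People go to a library to borrow books.",
}
print(ecqa_target(example, "positive"))
```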
Overall, the variability of our results suggests that, despite the appeal of self-rationalization for increasing model trustworthiness, training with rationales can have the unintended effect of *increasing* reliance on spurious features and biases 🚨
5/5
