We train 6 model types on NLI & commonsense QA with/without free-text rationales and measure robustness to spurious correlations via (1) challenge datasets and (2) test sets where relying on spurious correlations would lead to incorrect answers 👩‍🏫 2/n
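The robustness check described above can be sketched as comparing accuracy on a standard test set vs. a challenge set where the shortcut feature points to the wrong label. This is an illustrative sketch, not the paper's code; the toy "lexical overlap" shortcut model and all names here are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions matching the gold labels."""
    assert len(predictions) == len(labels)
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def robustness_gap(model, standard_set, challenge_set):
    """Accuracy drop from standard to challenge data; a large positive
    gap suggests the model leans on spurious correlations."""
    std_acc = accuracy([model(x) for x, _ in standard_set],
                       [y for _, y in standard_set])
    chl_acc = accuracy([model(x) for x, _ in challenge_set],
                       [y for _, y in challenge_set])
    return std_acc - chl_acc

# Toy NLI model that relies entirely on a spurious feature:
# predict "entailment" whenever premise/hypothesis words overlap.
shortcut_model = lambda x: "entailment" if x["overlap"] else "contradiction"

# On standard data the shortcut happens to work; on the challenge set,
# overlapping pairs are deliberately labeled "contradiction", so the
# shortcut fails everywhere.
standard = [({"overlap": True}, "entailment"),
            ({"overlap": False}, "contradiction")]
challenge = [({"overlap": True}, "contradiction"),
             ({"overlap": True}, "contradiction")]

print(robustness_gap(shortcut_model, standard, challenge))  # → 1.0
```

A gap near 0 would indicate the model's accuracy does not depend on the shortcut holding; the maximal gap of 1.0 here reflects a model that is pure shortcut.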
While results are model/task-specific, we observe some general trends 📈:
- Data: Improvements tend to be in lower-resource settings, & self-rationalization can hurt in higher-resource settings
- Model size: Within model families, larger models benefit more from rationales 3/n
Overall, the variability of our results suggests that, despite the appeal of self-rationalization models for increasing model trustworthiness, self-rationalization training can have the unintended effect of *increasing* reliance on spurious features and biases 🚨 5/5