Replying to
Would love to see something like this, but a scoring system that evaluates the quality of the study behind the paper (sufficient N, sufficient participant diversity, double-blind RCT, replicability, etc.)
Replying to
Thank you for using ours as an example :) Hope to share our follow-up work soon: “large language model (papers) can self-explain”
Quote Tweet
(1/8) *new paper* “LLMs can self-improve” w/ *self-generated CoTs* (“logical dark knowledge”), no GT labels:
- SoTA (74.4% -> 82.1% GSM8K, 90.0% -> 94.4% OpenBookQA, 63.4% -> 67.9% ANLI-A3) by fine-tuning
- SoTA “zero-shot” (GSM8K 70.1% -> 74.2%) by prompting
arxiv.org/abs/2210.11610
Replying to
yes, this is an issue. we did not originally design this with the notion that people would select large amounts of text (but they do!)