I’ve been studying the dynamics of reader memory with the mnemonic medium, running experiments on interventions, etc. A big challenge has been that I’m trying to understand changes in a continuous value (depth of encoding) through rough, discrete measurements (remembered / didn’t).
I can approximate a continuous measure by looking at populations: “X% of users in situation Y remembered.” Compare that % for situations Y and Y’ to sorta measure an effect. This works reasonably well when many users are “just on the edge” of remembering, and poorly otherwise…
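To make that comparison concrete, here’s a minimal sketch with made-up counts (statsmodels is just one common way to test a difference in proportions):

```python
# Compare the % remembered in situations Y and Y' (counts are made up).
from statsmodels.stats.proportion import proportions_ztest

remembered = [620, 540]   # users who remembered, in Y and Y'
shown = [1000, 1000]      # users measured in each situation

z, p = proportions_ztest(remembered, shown)
print(remembered[0] / shown[0] - remembered[1] / shown[1])  # 0.08: an 8-point effect
print(p)  # p-value for the difference in proportions
```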
It’s a threshold function on the underlying distribution. Imagine that a person will remember something iff their depth-of-encoding (a hidden variable), plus some situational random noise, is greater than some threshold. Our population measure can distinguish A vs A’ (distributions straddling the threshold), but not B vs B’ (distributions far to one side).
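A minimal simulation of that threshold model (all parameters made up) shows why: the same latent shift is visible near the threshold and nearly invisible far from it.

```python
# Threshold model: a reader remembers iff encoding + noise > threshold.
import numpy as np

rng = np.random.default_rng(0)
threshold = 0.0
n = 100_000

def pct_remembered(encoding_mean, encoding_sd):
    encoding = rng.normal(encoding_mean, encoding_sd, n)  # hidden variable
    noise = rng.normal(0.0, 1.0, n)                       # situational noise
    return (encoding + noise > threshold).mean()

# Distribution straddling the threshold: a small shift shows up clearly.
print(pct_remembered(-0.2, 1.0), pct_remembered(0.2, 1.0))   # ~0.44 vs ~0.56

# Distribution far below the threshold: the same shift barely registers.
print(pct_remembered(-3.0, 1.0), pct_remembered(-2.6, 1.0))  # ~0.017 vs ~0.033
```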
So it works pretty well initially, when the distribution’s spread out. e.g.: I’ve been running an RCT on retry mechanics. Of readers who forget an answer while reading an essay, about 20% more will succeed in their first review if the in-essay prompt gave them a chance to retry.
But it doesn’t work well when the distribution’s skewed to one side. e.g.: I’ve run RCTs manipulating schedules. You might think shortened intervals would help struggling readers, but they have little effect on the population measure; (likely) they just nudge some readers closer to the threshold.
Lack of a good continuous measure makes it hard to characterize the dynamics of what’s going on, which makes it hard to make iterative improvements. I’ll need to find some good solution here. Unfortunately, response times are (AFAICT) not a strong enough predictor to use.
Incidentally, this is part of why Ebbinghaus used nonsense syllables: he was memorizing sequences he’d *never* remember on the first try in subsequent tests. But it’d take less time to re-learn well-rehearsed sequences—time savings as a continuous proxy for depth of encoding.
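For reference, the savings measure works as a continuous score even when recall itself fails entirely (a minimal sketch; the example numbers are made up):

```python
def savings(original_time, relearning_time):
    # Ebbinghaus's savings score: fraction of the original learning effort
    # saved when re-learning the same material later.
    # Deeper encoding -> faster re-learning -> higher savings.
    return (original_time - relearning_time) / original_time

print(savings(20.0, 13.0))  # 0.35: 35% of the original effort was saved
```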
(Yes, I’m aware that some memory systems ask users to subjectively “grade” their memory 1-5, which would be slightly less discrete. I suspect it probably doesn’t add enough measurement resolution to be worth the user burden, but could be worth trying.)
The thing I have to keep reminding myself about a statement like the retry result above is that it does *not* mean that the mechanic causes a 20% increase in depth-of-encoding. It’s more likely a fairly small increase for a large number of people right below the threshold.
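A quick simulation of that reading (numbers entirely made up): a modest per-person boost applied to a population clustered just below the threshold produces a 20-point jump in the pass rate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Latent encoding strengths clustered just below the recall threshold (0).
encoding = rng.normal(-0.25, 1.0, 100_000)
boost = 0.5  # hypothetical per-person effect of the retry mechanic

print((encoding > 0).mean())          # ~0.40 pass rate without retry
print((encoding + boost > 0).mean())  # ~0.60 pass rate with retry
```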
Would stem-completion questions be useful here, since they can register weaker memory traces? Matching? Multiple choice? You could use code to vary the number of letters provided in the stem, depending on expected trace strength.
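A minimal sketch of that idea (the function name and reveal policy are made up):

```python
def make_stem(answer: str, letters_revealed: int) -> str:
    # Reveal the first `letters_revealed` characters; blank out the rest.
    # A weaker expected trace gets a longer stem (more scaffolding).
    return answer[:letters_revealed] + "_" * (len(answer) - letters_revealed)

print(make_stem("retrieval", 2))  # re_______
print(make_stem("retrieval", 5))  # retri____
```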
Yes, quite possibly! There’s a nice paper which includes an experiment on the retrieval impact of varying hint stem lengths: cognitiveresearchjournal.springeropen.com/articles/10.11
May be able to better infer a continuous memory trace measure from how that factor changes. Hm hm…
Right, moving from binary outcome to a "stem completion letters threshold" would be good. Ridiculously nonlinear due to some letters being easier to guess than others, of course,
but maybe one could use psycholinguistic databases to assess the word-neighborhood size of different stems, and thus choose stems such that the nonlinearity is reduced somewhat.
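A rough sketch of what that could look like (the lexicon and target are stand-ins; a real version would pull neighborhood statistics from a psycholinguistic database):

```python
def choose_stem(answer: str, lexicon: list[str], target_completions: int = 2) -> str:
    # Pick the stem length whose number of plausible completions (per the
    # lexicon) is closest to a target, to even out guessability across items.
    best_stem, best_score = answer[:1], float("inf")
    for k in range(1, len(answer)):
        stem = answer[:k]
        completions = sum(word.startswith(stem) for word in lexicon)
        score = abs(completions - target_completions)
        if score < best_score:
            best_stem, best_score = stem, score
    return best_stem + "_" * (len(answer) - len(best_stem))

lexicon = ["retrieval", "retrograde", "return", "rethink", "retina", "rest"]
print(choose_stem("retrieval", lexicon))  # retr_____
```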