Thus far, most of the prevailing sentiment that this data isn't real seems to rest on circumstantial evidence: very little sign that the company exists, insider knowledge of how hard it is to connect EHR data, etc.
And that is all pretty convincing, but I wanted to find a statistical 'proof' - something like what ultimately exposed the Wansink papers and other frauds. I wanted to find numbers that cannot exist. And so far, I haven't found any.
(That doesn't mean they aren't there, just that I haven't found them yet)
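To make "numbers that cannot exist" concrete: one well-known check of this kind is the GRIM test, which asks whether a reported mean of integer-valued responses is arithmetically possible for the stated sample size. The sketch below only illustrates the genre and is not a check that was run on the data discussed in this thread. The function name `grim_consistent` and the example values are made up for illustration.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """Return True if some integer total over n integer-valued responses
    could produce the reported mean at the given rounding precision."""
    half_unit = 0.5 * 10 ** (-decimals)  # largest possible rounding error
    nearest_total = round(reported_mean * n)
    # Only integer totals next to the implied sum can match the mean.
    for total in (nearest_total - 1, nearest_total, nearest_total + 1):
        if abs(total / n - reported_mean) <= half_unit + 1e-9:
            return True
    return False


# Illustrative values only (hypothetical, not taken from any paper).
possible = grim_consistent(3.48, 25)    # 87 / 25 = 3.48 exactly
impossible = grim_consistent(3.47, 25)  # no integer total / 25 rounds to 3.47
```

The check only has teeth when the sample is small relative to the reported precision. With large samples, nearly every mean is attainable, which is one reason a big dataset may offer no such handle.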
So there's another possibility that I want to discuss. What if there's a "real" (fake) dataset? This gets a bit weird to talk about publicly - sort of an "If I Did It" thing - but go with me on this...
It really isn't *that* hard to simulate data to have simple patterns that you want. And the easiest way to make these papers look convincing is to create a "real" (fake) dataset, then run "analyses" on all of the fake dataset, so they're internally consistent.
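A minimal sketch of that point, assuming a toy setup: generate patient-level rows with a built-in effect, then compute every "result" from those same rows. All names here (`simulate_patients`, `death_counts`) and all numbers (baseline risk, risk ratio, subgroup share) are hypothetical choices for illustration, not a description of how any actual dataset was produced.

```python
import random


def simulate_patients(n, base_risk=0.10, risk_ratio=1.3, seed=0):
    """Hypothetical generator: each row is (treated, older, died)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated = rng.random() < 0.5
        older = rng.random() < 0.4
        # Bake in the pattern we "want": higher risk if older or treated.
        risk = base_risk * (1.5 if older else 1.0) * (risk_ratio if treated else 1.0)
        died = rng.random() < risk
        rows.append((treated, older, died))
    return rows


def death_counts(rows, treated, older=None):
    """Return (deaths, patients) for one arm, optionally within an age subgroup."""
    subset = [r for r in rows
              if r[0] == treated and (older is None or r[1] == older)]
    return sum(r[2] for r in subset), len(subset)


rows = simulate_patients(20000)
overall = death_counts(rows, treated=True)
younger = death_counts(rows, treated=True, older=False)
elderly = death_counts(rows, treated=True, older=True)
control = death_counts(rows, treated=False)
```

Because every table is tallied from the same rows, subgroup counts add up to the overall counts automatically. That is exactly the kind of arithmetic agreement a forensic check would look for, so there is nothing for it to catch.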
Now, a lot of folks are claiming that the solution here is "open data" - figuring that if SSD is asked to hand over the data, he just won't agree, and that's the end of the story. He won't produce the data because the data don't exist.
But...what if he does produce the data? What then?
Playing this scenario out, he can turn over the "real" (fake) dataset, say that he just needed time to make sure that it was properly de-identified and had all legal agreements or whatever, and then no amount of statistical forensics will prove that it didn't exist.
So I am a bit stuck. There seems to be an assumption that asking him to provide the data will be game over because he cannot produce the data. I don't think that is a stone-cold lock, either.
And, since I'm less pro-open-data than much of academic Twitter, I do feel compelled to point out that open data (while it has presumed advantages) is not going to be a foolproof solution, either.
Yes, this has occurred to me too. You can't just release data like this; it's far too readily reidentifiable. Much more complex than many are making it out to be.