Snippet on Random Sampling vs. Bias

This is a snippet from mathematician Ben Orlin on the Infinite Loops podcast:

Ben Orlin
Ask any statistician: statistics is all about handling the random error in sampling. And, like, we've got great ways of putting constraints on that and sort of knowing how much error might arise. But if your sample's biased, you're screwed, right? There's the famous story of the beginning of Gallup polling. I think this was the 1936 election, when FDR was up for reelection, and, you know, Reader's Digest had a huge circulation, and they polled their readership on the basis of, like, 2 million people. You know, in a country of, I don't know, a hundred million, they polled tons of the population, and they're like, okay, Landon is going to beat Roosevelt. And then Gallup, you know, got 1,000 people, a tiny, tiny fraction, maybe a few thousand, and said, it's going to be FDR; he's going to win reelection. And it was the difference between a huge, hugely biased sample, which doesn't really tell you that much, and a small random sample, which does. And that's it. I don't know. I find that students, people learning and approaching samples from the outside, just get it backwards. Bias doesn't really occur to them. They don't worry about bias in a sample, but they're always worried about random error. You know, you poll 50 people and it's like, oh, well, what if we just got 50 people who really, really like boba tea, or 50 people who hate boba tea? And it's like, well, that wouldn't happen randomly. Like, you know, with 50, maybe, but that's not really the thing you need to be worried about. The thing you need to be worried about is if you went out and asked your friends, because your friends are not random. That's biased. But if you actually could just poll random people from St. Paul, you'd be fine. I mean, it's a great sample if it's actually random.
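To make the contrast concrete, here is a small simulation sketch in Python. It is not from the podcast: the electorate, the 60% FDR share, and the assumption that Landon voters are three times as likely to end up in the biased frame are all made-up numbers chosen for illustration, and `poll_simulation` and `margin_of_error` are hypothetical helper names. The margin of error uses the standard normal approximation, z * sqrt(p(1 - p) / n), which describes random error in a simple random sample only; no formula of this kind corrects for a biased frame.

```python
import math
import random


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def poll_simulation(seed: int = 1936, size: int = 1_000_000):
    rng = random.Random(seed)
    # Hypothetical electorate: 1 = votes for FDR, 0 = votes for Landon (assumed 60/40 split).
    population = [1 if rng.random() < 0.60 else 0 for _ in range(size)]
    true_share = sum(population) / size

    # Hypothetical biased frame: Landon voters are assumed 3x as likely to be reached.
    reach = {1: 0.10, 0: 0.30}
    biased = [v for v in population if rng.random() < reach[v]]
    biased_share = sum(biased) / len(biased)

    # Small simple random sample: every voter is equally likely to be chosen.
    small = rng.sample(population, 1000)
    small_share = sum(small) / len(small)

    return true_share, len(biased), biased_share, small_share


if __name__ == "__main__":
    true_share, n_biased, biased_share, small_share = poll_simulation()
    print(f"true FDR share:           {true_share:.3f}")
    print(f"biased poll (n={n_biased:,}): {biased_share:.3f}")
    print(f"random poll (n=1,000):    {small_share:.3f} "
          f"+/- {margin_of_error(small_share, 1000):.3f}")
```

Under these assumptions the biased poll collects roughly 180,000 responses, lands near 33% for FDR, and confidently picks the wrong winner, while the 1,000-person random sample lands near 60% with a margin of error of about 3 points. Making the biased sample bigger only shrinks its random error around the wrong answer.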
Jim O’Shaughnessy
Yeah, and that's one of my little hobby horses, or soapboxes that I like to be on: self-selected samples, right? Like, the challenge that I have is getting people to understand what a self-selected sample is. And not to beat up on anyone, but, like, the book The Millionaire Next Door, right? So this book sold millions and millions of copies, and I read it, and I'm like, this is all utter bullshit. And I just started thinking about it. I'm like, well, wait a minute. What kind of millionaire will devote two days, or whatever the amount of time it was, to answering this very detailed questionnaire for $1,000? And the answer is: that guy or gal, exactly. They have nothing in common with real millionaires, in my opinion. And it's pervasive through culture, though. Whenever I read something like "science says" or "according to," I immediately kind of go, okay, where are the planet axioms here? Is this a self-selected sample, or is this a Reader's Digest sample? People don't intuitively go, okay, what type of person might subscribe to Reader's Digest? That in and of itself is a biased group. And then your example of your friends, that's a super biased group. How do you disambiguate that? How do you get people to understand that?
Ben Orlin
I wish I knew. I mean, to me, that's one of the great epistemological problems that we all face: you go around the world and you just don't get a random experience of the world. You get your little slice, you get the people you know, who are probably very similar to you in lots of ways. Yeah, I think about this a lot, because the kind of questions that we all care about are the ones you don't get data on. You know, like, how do people act when they're in conflict, when they're fighting with each other? Or, what makes a good relationship? Or how do people approach death? You don't get a random sample of this. I don't know how the world approaches this. I know how my very particular demographic world approaches this. It's really hard to break out of. I think the first thing you can do, and I think math helps with this, is just to see that that's the problem: you're seeing this tiny, tiny little slice that's very biased to be like you. The world around you is going to resemble you a little bit. The exercise I like is, I'll ask a class: think about Wikipedia. What fraction of pages do you think have pictures? If you pick a random Wikipedia page, what's the chance it has a picture? And it's like, I never go to a Wikipedia page without a picture. Like John Travolta, yeah, they've got pictures of John Travolta. There's pictures everywhere. So I've never seen anybody guess below 90%. Maybe one kid who didn't use Wikipedia very much guessed like 80% or something. But everybody thinks it's going to be 95-plus. A lot of people think it's going to be effectively 100. And then you say, okay, everybody in the room, go find a couple of Wikipedia pages and report back. And so, yeah, they'll go find the Wikipedia page for chicken. It's like, oh, yeah, there's a picture of a chicken right there. And they come back and, yeah, it's 100% in the sample. And I say, okay, great.
Now, the reason it's Wikipedia and not Instagram or, you know, TikTok or some equivalent is that social media doesn't give you a random button. But Wikipedia is this beautiful human institution here in the 21st century. It's built to be accessible and transparent, and so they've got a random button. So click "Random article," do it 10 times, and come back and tell me how many have pictures. And so, you know, with a room of 20 kids, you get 200 examples, and it's about half. Because what you realize is, when you look at Wikipedia, you go to the page for the Oppenheimer movie, and then you go to the page for the Barbie movie, and so, yeah, they have pictures on those. But if you click a random article, you get the 2010 National Swimming Championships in Belarus, or you get a train station in Sri Lanka, or you get, you know, a midterm election in Ireland in 1997. Like, you get stuff that you don't think of. Those aren't the pages you think to go to. And you realize, like, oh, I'm just going to the big famous pages. I'm getting these very biased glimpses of reality. And if you can really randomly sample it, you see that so much is missing from your daily experience. We all see the big stuff. We all see the famous stuff. There's stuff that's visible and shows up over and over again. And then there's the invisible kind of dark matter of every population.
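The Wikipedia exercise can be sketched the same way. The wiki below is entirely hypothetical: 100,000 pages whose popularity falls off as 1/rank, with the assumption that the most popular 10% of pages have a picture 95% of the time and the rest 45% of the time, so about half of all pages have one. "Browsing" draws pages in proportion to popularity, the way a student lands on Oppenheimer or Barbie; the "Random article" button draws every page with equal probability. The helper names are illustrative, and the interval is the simple normal-approximation (Wald) interval.

```python
import math
import random


def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def build_wiki(num_pages: int = 100_000, seed: int = 0):
    """Hypothetical wiki: page at rank r has popularity 1/(r+1); top 10% usually have pictures."""
    rng = random.Random(seed)
    popularity = [1 / (rank + 1) for rank in range(num_pages)]
    has_picture = [
        rng.random() < (0.95 if rank < num_pages // 10 else 0.45)
        for rank in range(num_pages)
    ]
    return popularity, has_picture


def picture_rate(pages, has_picture):
    """Fraction of the sampled pages that have a picture."""
    return sum(has_picture[i] for i in pages) / len(pages)


def compare(clicks: int = 200, seed: int = 1):
    popularity, has_picture = build_wiki()
    rng = random.Random(seed)
    indices = range(len(has_picture))
    # Browsing: pages are visited in proportion to their popularity.
    browsed = rng.choices(indices, weights=popularity, k=clicks)
    # "Random article" button: every page is equally likely.
    uniform = [rng.randrange(len(has_picture)) for _ in range(clicks)]
    return picture_rate(browsed, has_picture), picture_rate(uniform, has_picture)


if __name__ == "__main__":
    browsed_rate, uniform_rate = compare()
    print(f"browsing popular pages: {browsed_rate:.0%} have pictures")
    print(f"random-article button:  {uniform_rate:.0%} have pictures")
    # Hypothetical class result: 100 of 200 random clicks had pictures.
    lo, hi = proportion_ci(100, 200)
    print(f"95% CI for 100/200: {lo:.2f} to {hi:.2f}")
```

Under these assumptions, browsing reports something like 85% of pages with pictures, while the uniform sample reports about 50%. If, say, 100 of a class's 200 random clicks had pictures, the 95% interval would run from roughly 43% to 57%: wide enough to reflect random error, but centered on the right answer.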
