Speaker 1: You can take it for a test drive if you want.
Speaker 2: Oh, sure. Why not? The seatbelt, it's digging into the side of my neck.
Speaker 1: Don't worry about that. These cars are designed using the latest safety standards and features, so I'm sure it'll be fine once you get it all set up.
Speaker 2: Sounds like a good thing, so all right. Well, let me just move this seat a little bit higher first so I can see. Oh, now the steering wheel seems too close.
Speaker 1: Yeah, I hear this from a lot of women who come in, but this model lets you preset up to three different seat configurations.
Katy Milkman: The comfort factor in many cars, especially for women, could be improved, but car makers haven't addressed the root cause of this problem. And it's actually also a safety issue. It was just last year that researchers created the first crash test dummy based on the average female body. Cars are still tested and designed using a reference model based on the average male body. It's just one example of the thousands of ways our world is designed using data that's biased, data that only draws on a select portion of the relevant population. In this episode, we explore how seemingly small missteps in the way we gather data can distort our understanding of the world.
I'm Dr. Katy Milkman, and this is Choiceology, an original podcast from Charles Schwab. It's a show about the psychology and economics behind our decisions. We bring you true stories illuminating predictable quirks of human behavior that can help or hinder you, and then we examine how these stories relate to the latest research in behavioral science. We do it all to help you make better judgments and avoid costly mistakes.
W. Joseph Campbell: Things were pretty bleak in America in the 1930s. There was a worldwide depression that had hit in 1929, and the effects were still being felt in the mid-1930s.
Katy Milkman: This is W. Joseph Campbell.
W. Joseph Campbell: I'm Professor W. Joseph Campbell of the School of Communication at American University in Washington D.C. The worst of the Great Depression had maybe eased, but there were still maybe seven million Americans out of work. And it was also internationally a pretty troubled time as well.
Katy Milkman: Americans were looking for strong leadership during these turbulent years.
W. Joseph Campbell: Franklin D. Roosevelt had been elected president fairly easily in 1932 and was swept into the White House largely because of the effects of the Great Depression. By 1936, Roosevelt had initiated a number of policies that were collectively called the New Deal to rescue Americans from the worst effects of the Great Depression and to put many of them to work in various public works type projects.
Katy Milkman: This was a controversial approach back in the 1930s. Many voters were concerned about the cost of these social programs.
W. Joseph Campbell: And that helped to give some hope to the Republicans, who had been battered badly, and their nominee in 1936 was Alf Landon, the governor of Kansas. He was a very even-keeled kind of guy, soft-spoken. He had been a successful governor. He was a pleasant enough person, but he wasn't a dynamic person. He just didn't have the outgoing personality that Roosevelt had projected. Roosevelt just loved the campaign trail and loved meeting people. He was gregarious and outgoing and just loved being on the hustings.
Katy Milkman: These two candidates were squaring up for the 1936 election. In the 1930s, newspapers and magazines were key sources of information about national politics, and these publications were starting to experiment with a new way of reporting on public opinion.
W. Joseph Campbell: The best known was the poll conducted by the Literary Digest magazine. It had begun in 1924 to conduct nationwide election polls for president, and this was a device that the Literary Digest used to help drum up subscriptions. It sent millions of postcard ballots to people all across the country, and attached to this postcard ballot was an opportunity for the recipient to subscribe to the Literary Digest. The Literary Digest had achieved a very impressive track record in recent presidential elections. It had pegged the election of 1924, when Calvin Coolidge won the presidency, correctly. It had pegged the election of 1928, when Herbert Hoover won the election, correctly. It had pegged the 1932 election, in which Franklin Roosevelt won, correctly. A lot of people were inclined to compliment the Literary Digest for its uncanny accuracy.
Katy Milkman: 1936 was shaping up to be a tight race, and just as they always did, Literary Digest conducted an election poll. The magazine sent out 10 million surveys in a bid to predict whether Alf Landon or Franklin Roosevelt would win the presidential election.
W. Joseph Campbell: And this time the Literary Digest's response rate was about 24%. These days, that would be a terrific response rate for a public opinion poll.
Katy Milkman: But by 1936, Literary Digest was not the only publication conducting polls.
W. Joseph Campbell: George Gallup began his work in 1935, and by 1936 he was ready to begin sampling the American public on the presidential election to see who was likely to win that race. Gallup was joined in 1936 by a couple of other pollsters.
Katy Milkman: But the public looked at these new pollsters with some skepticism.
W. Joseph Campbell: Their polls were not regarded necessarily with the same kind of weight or the same sort of interest even as the Literary Digest poll. They just didn't have that track record. There were just all kinds of data being thrown around and shared, and it was really hard for people to figure out, "OK, which one should we focus on? Which one should we believe?" And the Literary Digest stood out because it had that track record. It set an agenda for people, at least for some news organizations, because it was the oracle of public opinion.
Katy Milkman: And that oracle of public opinion made an astonishing claim in 1936.
W. Joseph Campbell: Overwhelmingly, the respondents, based on the returns of its mail-in poll, figured that Alf Landon, the Republican from Kansas, would carry 33 out of 48 states and win 370 electoral votes to Roosevelt's 15 states and 161 electoral votes. In other words, the Literary Digest was projecting a sweeping repudiation by landslide proportions of Franklin Roosevelt and his New Deal policies. Because the Literary Digest was an established polling organization essentially, and because it had been right on other presidential campaigns in the past, the Republicans and Alf Landon took a lot of confidence from this poll. There was no reason to think it was going to be wrong this time.
Speaker 4: With 175 out of 2,442 precincts reporting, Iowa gives Landon 8,439 and Roosevelt …
W. Joseph Campbell: And as the returns began to roll in on election night from Connecticut and Pennsylvania, places that had then been strongholds of Republican support, they clearly showed Roosevelt in a commanding lead. New York state was another place that was clearly in his favor, and those results continued to pile up, because he won every other state except for Maine and Vermont.
Katy Milkman: It was just the opposite of what the Literary Digest had predicted. It was a landslide for Roosevelt.
W. Joseph Campbell: This landslide, the Literary Digest later called it a super landslide, swept the country, and it was over fairly early on election night. There was a lot of surprise as to how sweeping the election was. There was a lot of newspaper criticism for the Literary Digest, rightly so. I mean the Digest was off by almost 20 percentage points. This was a deep embarrassment for the publication, for its poll, for a survey that had never been wrong before. The New York Herald Tribune, then one of the leading newspapers in the country, referred to the "rout and the wreck" of the Literary Digest and its poll and that its once proud infallibility had been punctured once and for all. And the Philadelphia Inquirer referred to the Literary Digest's polling embarrassment as "a crash of faith and a shattering of idols."
Katy Milkman: Failed presidential candidate Alf Landon seemed to take his defeat in stride.
W. Joseph Campbell: His congratulatory telegram to Roosevelt was very charitable. He pledged his support, said that the country had spoken, and that the president could count on Landon's backing in the future.
Katy Milkman: But the owners of the Literary Digest struggled to make sense of their embarrassing mistake.
W. Joseph Campbell: The magazine could not come up with an explanation for the deep inaccuracy of its mail-in poll. It rejected the notion that the poll should have been weighted, which is a term for the statistical adjustment of the data. Literary Digest didn't want to present anything other than a straightforward reflection of what the public had to say.
Katy Milkman: Meanwhile, competitors to the magazine jumped on this opportunity.
W. Joseph Campbell: Gallup was eager to establish his franchise as the go-to polling operation in the country, eager to supplant the Literary Digest. Gallup seized on the notion that the mailing list of the Literary Digest was distorted in favor of more upscale people. The Digest mailing list was drawn from automobile registrations, from telephone directories, and other sources. So Gallup surmised that people with automobile registrations and telephones were more upscale, more affluent, and therefore more likely to vote for Landon than Roosevelt in the 1936 election. Gallup hammered this explanation throughout his career, arguing that such mail-in polls "are methodologically inaccurate and, therefore, should not be relied upon."
Katy Milkman: But there may have been a different reason for the Literary Digest's mistake.
W. Joseph Campbell: A far more compelling and, I think, more persuasive explanation for the Literary Digest's polling failure of 1936 is not the sampling problem. It's more likely non-response bias. In other words, some people were more motivated to send back the Literary Digest's mail-in poll than other people. The problem with the Literary Digest's poll was that they did not know who was actually filling it out and who was returning it. In other words, opponents of Franklin Roosevelt were more likely to send in these postcard ballots to the Literary Digest, which compiled them, tabulated them, and came up with this very, very distorted estimate of how the 1936 election would go.
Katy Milkman: The Literary Digest never recovered from their polling error. Collecting information from a distorted subset of the population and projecting the wrong winner of the 1936 presidential election quite reasonably destroyed readers' confidence.
W. Joseph Campbell: The Literary Digest never polled again because it essentially went out of business. By 1938, it had ceased publication. These polling numbers project a sense of certainty that is terribly appealing to people. The numbers, the data, generated by polls seem to be unambiguous and certain. And that helps to explain some of the enduring appeal of polling.
Katy Milkman: Dr. W. Joseph Campbell is a professor of communications at American University and the author of Lost in a Gallup: Polling Failure in U.S. Presidential Elections. You can learn more about his work in the show notes and at schwab.com/podcast.
The story of the Literary Digest's massive failure in predicting the outcome of the 1936 presidential election illustrates a common error in the way data is collected. W. Joseph Campbell referred to two features of the Literary Digest poll, which probably both played a role in its demise. First, respondents weren't selected at random. They were invited if they had registered an automobile or appeared in a telephone directory, which Gallup plausibly claimed skewed the sample toward more affluent voters. Second, the Literary Digest made no attempt to track who chose to mail back their surveys or to ensure the respondents were representative. It's likely that voters who were most frustrated with Roosevelt's policies and had more of an axe to grind were overeager to respond and express their dismay. Both of these issues, inviting a non-representative sample to participate in the poll and collecting responses only from the skewed subset of voters who chose to reply, likely doomed the poll to failure.
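To make those two mechanisms concrete, here is a minimal Python sketch, an editorial aside rather than anything from the episode, that simulates a mail-in poll. Every number in it, the true vote split, the mailing-list skew, the response rates, and the hypothetical `poll` helper itself, is invented purely for illustration; it is not a model of the actual 1936 electorate.

```python
import random

random.seed(36)

# Hypothetical electorate: 55% Roosevelt, 45% Landon supporters.
# These shares are invented for illustration, not the actual 1936 results.
N = 1_000_000
electorate = ["Roosevelt"] * int(N * 0.55) + ["Landon"] * int(N * 0.45)

def poll(population, frame_bias=0.0, landon_response=0.24,
         roosevelt_response=0.24, n_mailed=100_000):
    """Simulate a mail-in straw poll and return Landon's share of returned ballots.

    frame_bias: extra probability that a Landon supporter makes it onto the
        mailing list (standing in for car registrations and phone directories).
    landon_response / roosevelt_response: probability that a supporter mails
        the ballot back (standing in for non-response bias).
    """
    mailed = []
    while len(mailed) < n_mailed:
        voter = random.choice(population)
        on_list_prob = 0.5 + (frame_bias if voter == "Landon" else 0.0)
        if random.random() < on_list_prob:
            mailed.append(voter)
    returned = [v for v in mailed
                if random.random() < (landon_response if v == "Landon"
                                      else roosevelt_response)]
    return sum(v == "Landon" for v in returned) / len(returned)

# Random mailing list, equal response rates: close to the true 45% Landon share.
print(round(poll(electorate), 3))

# Upscale mailing list plus more motivated anti-Roosevelt respondents:
# the same electorate now looks like a comfortable Landon "win."
print(round(poll(electorate, frame_bias=0.3,
                 landon_response=0.35, roosevelt_response=0.15), 3))
```

With a random mailing list and equal response rates, the simulated poll tracks the true Landon share; skew the list toward the affluent and let anti-Roosevelt voters return their ballots more eagerly, and the same electorate produces a Landon "landslide" on paper.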
Those problems are both examples of something called selection bias. Selection bias can be a fatal error in election polling but also in any other context where decisions are based on data, like medical research, car safety testing, and policy analysis. Those of us who work with data professionally are always on the lookout for this bias because it can be difficult to spot until after the fact. We've talked before on this show about how hard it can be to disentangle correlation from causation. Selection bias is one of the sneakiest problems that can bite us when we try to interpret a relationship between two things as if they're cause and effect. Selection bias can sneak up on you anytime you try to draw conclusions from data if you aren't on the lookout for it.
My guest today is an economist who is best known for her data-driven advice on pregnancy and parenting and for her ability to sniff out selection bias and draw valid conclusions from whatever the latest research study happens to claim about how you can raise a healthy, happy child. She's penned multiple bestsellers on evidence-based parenting and was named one of the 100 most influential people of the year last year by Time magazine as a result of her work. Emily Oster is the JJE Goldman Sachs University Professor of Economics at Brown University.
Hi, Emily. Thank you so much for taking the time to talk to me today.
Emily Oster: Thank you so much for having me.
Katy Milkman: I want to start really basic and just ask you if you could define what selection bias means.
Emily Oster: So a lot of times when we look at data, we're interested in some characteristic of the data. So let's say you were interested in how tall people are in the population, so you wanted to look at just what's the average height in the population. If you took a bunch of random people, just picked randomly out of the population, and measured them, that would be a good way to estimate the average height. But now think about what happens if you selected a particular group. Let's say you only selected men, and you only measured the height of men. Well, then your estimate of the average would be too high because men are on average taller than women. That's a form of what we call selection bias: our estimate of the height is biased, in this case too high, because we've selected a group that is not representative of the whole population. In this example, we selected men, even though we want to speak to the whole population. So that's a simple definition of selection bias.
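As a rough illustration of that height example, here is a short Python sketch, an editorial aside with made-up height distributions and an invented `mean_height` helper; the numbers are assumptions chosen only to show the direction of the bias, not real anthropometric data.

```python
import random

random.seed(0)

# Toy population of 100,000 people; heights in centimeters drawn from rough,
# illustrative distributions (the means and spreads are assumptions, not data).
population = ([("man", random.gauss(175, 7)) for _ in range(50_000)] +
              [("woman", random.gauss(162, 6)) for _ in range(50_000)])

def mean_height(sample):
    return sum(height for _, height in sample) / len(sample)

print(round(mean_height(population), 1))    # true population average

# A random sample lands close to that true average.
random_sample = random.sample(population, 1_000)
print(round(mean_height(random_sample), 1))

# A selected sample of men only overstates the average: selection bias.
men_only = [person for person in population if person[0] == "man"][:1_000]
print(round(mean_height(men_only), 1))
```

The men-only estimate comes out several centimeters above the true average, which is exactly the kind of gap a representative sample avoids.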
Katy Milkman: Awesome. OK, I want to do something that I don't normally do on this show that I know you're going to be really good at, which is I want to try playing a game. The name of the game is "Name the Potential Source …
Emily Oster: Is it poker? I'm so good at that.
Katy Milkman: It is not poker. OK, so the game, I made it up this weekend. It's called "Name the Potential Source of Selection Bias in That Claim."
Emily Oster: That does sound like something I would be better at than poker.
Katy Milkman: I'm going to make up some headlines. These are not true statements, but they're plausible. Even if they haven't ever existed in the world, my guess is you'll say, "Oh, I've seen something like that." And I think you're going to be like, "Here's what I'm concerned about in terms of selection bias" right away.
Emily Oster: All right, I'm ready.
Katy Milkman: OK, here we go. A New Yorker survey of readers found that most Americans read at least 10 books per year.
Emily Oster: The people who read The New Yorker, that's such a direct one, right? The people who read The New Yorker, they're already reading The New Yorker, which is basically a book. And so now you're taking a bunch of people who are already reading books and asking them, "How many books do you read a year?" They're already people who read, and so that's going to be a group that's selected to read much more than the average person in the U.S.
Katy Milkman: All right. Kids who spend more time in front of screens score lower on the SATs, so don't let your kids spend their time in front of screens.
Emily Oster: My concern: an education confound. Parental background and parental education are highly correlated with screen use and also correlated with SAT scores.
Katy Milkman: OK, great. So basically, the kids who are being allowed to watch screens less have parents who are better educated?
Emily Oster: Exactly.
Katy Milkman: And that's actually causing it. OK, awesome. Intermittent fasting is great for your health, so you should try it.
Emily Oster: The kinds of people who choose to do intermittent fasting are also doing all kinds of other things that are good for their health, like exercising and eating well when they're eating. And also, they're more educated, and so they have more resources and more access to medical technology and all kinds of other things. So that is responsible for the health, not the intermittent fasting.
Katy Milkman: All right. This one is going to feel close to home if those didn't already. Moms who breastfeed have dramatically smarter kids, so you should never feed your baby formula.
Emily Oster: So there we actually know in the data that the confound is largely with education; that's the most significant confound there. So education, parental family background: more educated, wealthier moms with more resources are more likely to breastfeed. Those characteristics are independently associated with child test scores, and it's very hard to separate those out in the kinds of analyses you're talking about in that headline.
Katy Milkman: All right, this is my last one. Everyone who eats Quaker Oats loves them, so you surely will too.
Emily Oster: That's amazing because there's actually a lot of ads that are like this. So let's say everyone tried Quaker Oats, and half the people like Quaker Oats and half the people didn't. Well, people who didn't like Quaker Oats, they would just not eat them anymore, and the people who did like them would keep eating them. And so then you come back later, and you ask the people, "Do you like the Quaker Oats?" The people who are eating them, it's all the people who like them in the first place. There's such a direct selection there.
Katy Milkman: It seemed like a good one to end with for that reason.
Emily Oster: I like it. It's good.
Katy Milkman: OK. Thank you for playing my silly game. Hopefully, that gives people more of an intuition for the way you think about and bust these selection biases. That was really fun. OK, so I want to get into some research. Actually, a lot of the research papers you've written are about rooting out sources of selection bias in data to find answers to practically important questions. And I was wondering if you have a favorite paper you could tell us about in this vein?
Emily Oster: Yeah, sure. So my favorite paper that I've written in this vein is about the dynamics of this process. It's a paper where the leading example is vitamin E. Vitamins are an example of the kind of thing where there's a lot of selection in whether people take them, and then if you link them with health, you're concerned that the kinds of people who take vitamins are different from those who don't. What I was interested in is how that relationship evolves as we tell people different things about the value of these vitamins. So vitamin E is an example where, in the mid-1990s, people got told that vitamin E was really good for heart health and preventing cancer. There was this whole idea that vitamin E is this magical thing you should be taking a lot of.
And when you look in the data, you can see that when people get told vitamin E is good, more people start taking vitamin E, but it's not just random people. It is precisely the people who are also doing all kinds of other stuff for their health. So it's the people who exercise more, the people who don't smoke, people who eat vegetables, people with more education. There's a whole range of things that are predictive of adopting vitamin E once people are told that it's good for them. And as a result, if you look at, for example, the link between vitamin E consumption and mortality over time, actually, before we told people it was good for you, there's a little bit of a link. And after we tell people it's good for you, there's a much stronger link. So if you look after we tell people about this, it's like, "Wow, vitamin E really makes you not die!" But it's just because the people who started taking vitamin E are selected in all of these other positive health ways. Then in 2004, they realized not only is vitamin E not especially good for you, but actually, if you take too much, it can kill you.
Katy Milkman: From a randomized controlled trial.
Emily Oster: From a randomized controlled trial, actually. So then there's a meta-analysis of randomized controlled trials, which says actually in high doses it's really bad and in moderate doses it doesn't matter. And as a result of that, then all of those people who started taking it before, they stopped taking it. And this link with mortality goes back down. So it's an example, I think, which really problematizes some of these observational studies and just digs into, "Why is this going on? What are the actual mechanisms by which this is happening?"
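Here is a hedged Python sketch of the mechanism described above, an editorial aside rather than Emily Oster's actual analysis or data. The vitamin is given zero true effect on mortality, and all uptake and death rates (and the `assign_vitamin` and `apparent_benefit` helpers) are invented for illustration; the point is only that the naive taker-versus-non-taker comparison strengthens once healthy people selectively adopt the supplement.

```python
import random

random.seed(1)

N = 200_000
# Half the simulated population has a generally healthy lifestyle.
people = [{"healthy": random.random() < 0.5} for _ in range(N)]

def assign_vitamin(people, extra_uptake_if_healthy):
    # Baseline uptake of 10%; after a "vitamin E is good for you" message,
    # uptake rises mostly among people who already live healthily (selection).
    for p in people:
        bump = extra_uptake_if_healthy if p["healthy"] else 0.0
        p["takes_e"] = random.random() < 0.10 + bump

def apparent_benefit(people):
    # The vitamin has zero true effect here: mortality depends only on lifestyle
    # (5% for healthy people, 10% otherwise).
    for p in people:
        p["died"] = random.random() < (0.05 if p["healthy"] else 0.10)
    takers = [p for p in people if p["takes_e"]]
    others = [p for p in people if not p["takes_e"]]
    death_rate = lambda group: sum(p["died"] for p in group) / len(group)
    # A positive value looks like "vitamin E lowers mortality."
    return death_rate(others) - death_rate(takers)

# Before the recommendation: uptake is unrelated to lifestyle, gap is near zero.
assign_vitamin(people, extra_uptake_if_healthy=0.0)
print(round(apparent_benefit(people), 4))

# After the recommendation: healthy people adopt it, and a spurious "benefit"
# appears even though the vitamin does nothing in this simulation.
assign_vitamin(people, extra_uptake_if_healthy=0.30)
print(round(apparent_benefit(people), 4))
```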
Katy Milkman: Oh, that's amazing. When do you feel like our tendency to be naive about selection bias issues is most problematic for people in their daily lives?
Emily Oster: So I think this can become quite problematic when it causes people to make choices that are otherwise very difficult for them or otherwise make them upset. So something like breastfeeding, which I write a lot about, where I think we spend a lot of time telling people about benefits of breastfeeding that are not supported in the data. Not that there are no small benefits of breastfeeding, but many of the things we say, like breastfeeding will make your kid smarter, thinner, healthier later, those are just not supported in the best data. Because breastfeeding is hard for a lot of people, and because it doesn't work for a lot of people, and because it can be a source of tremendous shame, I think that's where we really should try to move away from telling people things that are not supported by data, when they are then going to feel bad about them, and it's going to negatively impact other aspects of their life.
Katy Milkman: So I can't go back to work, or I spend six hours a day pumping because I believe that I'm doing something horrible for my child and destroying their future by not breastfeeding. That's costly.
Emily Oster: Or I sink into postpartum depression. Actually, one of the things I heard a lot from people, actually from people's spouses, was "My partner is really depressed about this. It's not working, and she's killing herself to do this. And she's really sad, and I just think if she could let up the pressure with some better evidence, she would feel better." And I think that's a real cost to people.
Katy Milkman: Yeah. What do you think our listeners should do differently in their lives, in their personal lives and their decisions about their finances, now that they know a little bit more about selection bias?
Emily Oster: The most important thing is don't lurch. So when you see these new studies about whatever, finances, health, any of this space, there is such a strong temptation to say, OK, here's the new finding, and I'm going to change my behavior based on that and do something different. And it is occasionally true that some new finding is actually very believable and very reliable, and it's something that you should change your behavior about. But most findings are not that good, or if they are, they are better interpreted in light of all of what we know before. So there's a huge literature in most of these spaces that already exists. Don't change what you're doing based on one headline or one study. Try to think about everything together.
Katy Milkman: And by the way, that's great advice in general about your finances, so that's particularly useful …
Emily Oster: Just because you see something, don't sell everything or buy everything. Just take a deep breath.
Katy Milkman: Take a deep breath.
Emily Oster: Take a deep breath.
Katy Milkman: I love that. That's a perfect place to end. Thank you so much for your time today, Emily. This has been really helpful.
Emily Oster: Thank you so much for having me.
Katy Milkman: Emily Oster is the JJE Goldman Sachs University Professor of Economics at Brown University. She's also the author of several bestselling books on pregnancy and parenting, including most recently The Family Firm: A Data-Driven Guide to Better Decision Making in the Early School Years. And I highly recommend her terrific newsletter, Parent Data, which you can find on Substack. You can find out more about Emily and her work in the show notes and at schwab.com/podcast.
Could selection bias be impacting your investment strategy? Our sister podcast, Financial Decoder, takes an in-depth look at important financial decisions and how to guard against the cognitive and emotional biases that might affect them. Check it out at schwab.com/financialdecoder or wherever you get your podcasts.
Selection bias is a topic that I spend a lot of time discussing with my Wharton students because whenever we collect or analyze data to reach a conclusion at home or at work, we're often quick to feel we now know the truth. Say you send out a survey to a group of friends to get their feedback on the name for your new business, and when there's a clear favorite, you're certain that name is the right choice. But maybe you don't consider that your friends aren't in any way representative of the customer base you're hoping to serve, who are on average 20 years older and unlikely to get the millennial pun in the name you picked.
The over-reliance on white male patients for medical studies for decades meant that medical advances improved patient outcomes disproportionately for this group, and polling a biased sample of people means failing to forecast who will win an election. So, knowing that, how can you avoid being tricked by selection bias? The key is to first ask, "Am I sure my data is representative? Have I collected information from the right people?" Next, if you're looking at a relationship between two variables, and you're tempted to say one thing caused the other, you should always ask yourself, "Could it possibly be selection bias?" Once you start thinking this way, you'll notice that selection bias is rampant in the information presented to you. You'll get a lot more skeptical, but I'll also wager that you'll be able to make much better decisions.
You've been listening to Choiceology, an original podcast from Charles Schwab. If you've enjoyed the show, we'd be really grateful if you'd leave us a review on Apple Podcasts, a rating on Spotify, or feedback wherever you listen. You can also follow us for free in your favorite podcasting app. And if you want more of the kinds of insights we bring you on Choiceology about how to improve your decisions, you can order my book, How to Change, or sign up for my monthly newsletter, Milkman Delivers, at katymilkman.com/newsletter. In two weeks, the story of an incredibly difficult and consequential decision, reached in part using the principles of superforecasting. I'm Dr. Katy Milkman. Talk to you soon.
Speaker 6: For important disclosures, see the show notes, or visit schwab.com/podcast.