Speaker 1: It’s been two years since a devastating wildfire rolled through this valley town. Many homeowners are still waiting on insurance settlements. Lawyer Sam Barber is working on dozens of unresolved cases.
Speaker 2: It’s the inconsistency that bothers me the most. You’ll have neighbors whose homes are right next to each other. They’ve suffered the exact same damage, but they’re treated totally differently by the insurance adjusters.
Speaker 1: A California Department of Insurance spokesperson says the vast majority of claims …
Katy Milkman: That scenario, which is based on a true story, isn’t uncommon. Floods, earthquakes, fires, and other natural disasters can damage your house and your neighbor’s house in equal measure. But the insurance settlements can vary dramatically, even when neighbors have similar homes, similar damage, and even identical insurance policies. In this episode of Choiceology, we’ll dive into a phenomenon that has major implications for homeowners …
Speaker 1: Several small earthquakes rattled the nerves of Seattle …
Katy Milkman: The world of sports …
Speaker 3: Meanwhile, the International Skating Union has tweaked its figure skating judging criteria …
Katy Milkman: And even the complex world of wine.
It’s also the topic of Daniel Kahneman’s upcoming book. I’ll speak with him later in the episode about his current preoccupation with noise in decision-making.
I’m Dr. Katy Milkman, and this is Choiceology, an original podcast from Charles Schwab. It’s a show about the psychology and economics behind our decisions. We bring you true stories involving high-stakes choices, and then we explore the latest research and behavioral science to help you make better judgments and avoid costly mistakes.
G.M. Pucilowski: I very much enjoy drinking a lot of different wines. I love pinot noirs and I love zinfandels. I have a wine cellar down in the garage.
Katy Milkman: This is Pooch.
G.M. Pucilowski: My name is G.M. “Pooch” Pucilowski, and I’ve been the wine judge and chief judge of the California State Fair for, well, I was there for 26 years.
Katy Milkman: The California State Fair Wine Competition is a big deal.
G.M. Pucilowski: It’s crazy. We are in two separate warehouses. One warehouse on the fairgrounds is loaded with all the wines. All the wines that came in are all stacked in neat rows, and then we have tables all over the place. And then you come into the judges’ room, and we’ve got four judges to a panel, 16 panels. We’ve got a person that is kind of a clerk that takes care of all the judges’ scores. We’ve got people running around on carts that are delivering the wines. We’ve got a command center that’s sitting up a little bit high to oversee everything. And then we got a whole crew that runs around and empties the glasses after the judges are done judging it. They’re putting out the celery and meat and other things for the judges to taste. And it’s like an unbelievable three-ring circus going on. There are so many things happening, you can’t even imagine. It is just unbelievable.
Katy Milkman: The state fair plays an important role in promoting California wines.
G.M. Pucilowski: And the purpose of it is originally to give credibility to California wines, to say to the world, “Hey, we’ve got winners here. We’ve got wines that you should be trying.”
Katy Milkman: The stakes are high, especially for smaller wineries. A gold medal or a double gold can mean a huge boost in sales.
G.M. Pucilowski: It means recognition. It means repeat business. You get customers from it, you get followers, and the majority of wineries out there are small wineries. They don’t have the money to market in advertising, so I think competitions are just one of those ways to help wineries distinguish themselves.
Katy Milkman: The State Fair takes wine judging really seriously, in part because of the business impact of its medals.
G.M. Pucilowski: We required all of our judges to take a qualification exam. We’re not looking for you to pick good wines. We’re looking for you to be consistent in your wines. There are judges that tend to score wines higher. They tend to give them higher medals. And there are judges that are very strict judges, and they find fault everywhere, and so they would be kind of a low-scoring judge. We don’t care whether you’re a low-scoring judge or a high-scoring judge. What we care about is that you’re judging consistently. That’s important.
Katy Milkman: Wine judges develop their knowledge and taste over years of study and practice. Many are winemakers themselves or wine merchants or sommeliers. The judges must be able to distinguish between subtle differences in color, texture, aroma, and flavor.
G.M. Pucilowski: We’re judging wines on their varietal characteristics. And what that means is, every grape has certain characteristics. As an example, we’ll take sauvignon blanc. If I had in front of me a glass of sauvignon blanc, I would be taking that glass and the first thing I’d be looking for is a very light color. The next thing I’d be looking for is, I’d be swirling the wine and I’d be smelling it. Now, for sauvignon blanc, I would be looking for grassy, grapefruit, passion fruit, fresh-mowed lawn type of a smell. That’s a kind of a smell that I want to find in a good sauvignon blanc, to me. The last thing I do is I’d put it in my mouth, I’d taste it, and I’d go, “Yeah, OK, this is it. This is a gold because this is really what I expect the sauvignon blanc to smell and taste like.”
Now, if it didn’t quite have all of those qualities, and maybe it was a little bit lacking in some of those varietal characteristics, and maybe it wasn’t quite a gold, but maybe I’d give it a silver. If the wine didn’t have very much of the qualities at all that I just described for sauvignon blanc, if it still tasted good, somewhat looked like a sauvignon blanc, I’d probably give it a bronze medal. And if it had no rhyme or reason, it didn’t smell like a sauvignon blanc, I couldn’t tell what it smelled like, if it didn’t have the right color, it just didn’t come across as a sauvignon blanc, then I would give it a no medal. So that’s my range from no medal to gold.
Katy Milkman: So that’s a short version of the judging criteria for a specific varietal in a wine competition. But there are other important aspects to the process too. First of all, it’s a blind test. The judges don’t see labels or names, just numbers.
G.M. Pucilowski: Now, once I’ve scored and I got a sheet in front of me, I’ve got 12 wines sitting in a semicircle in front of me. All 12 wines have a four-digit number randomly assigned on it. Now, all of these glasses in front of me are all listed on my sheet. They’re all in the same order that they’re out on the table. So I would pick up 4040, and I would smell it. I’d taste it. I’d write a few notes down. When I’m done scoring all those wines, those are the scores that I give to the clerk. Those are what we call my first judgment.
Katy Milkman: The thing is, that first judgment was often inconsistent with the assessments of the other judges. One judge might feel that the wine was worthy of a gold medal. Another might feel that it didn’t deserve a medal at all. That inconsistency was interesting to Pooch and to his wine-judging colleague, Robert Hodgson.
G.M. Pucilowski: Yes, Robert was the owner of Fieldbrook Winery in Northern California, and he was one of my wine judges. And Bob was, I’m going to say somewhere around 1998, he was judging for me. Winemakers always seem to be some of the best judges. So Bob and I had a lot of conversations. Somewhere in those conversations, we talked about testing the judges, but how would you go about testing the judges? And Bob came up with some ideas and how it could be done. Since he was a statistician, we said, “Wow, OK, maybe this would work.” So that’s where I sort of got Bob from behind judging the wines to helping us in the backroom, trying to come up with a way of scoring the judges. Judge the judges.
Katy Milkman: Pooch worked with Bob Hodgson to devise a way to measure the consistency of wine scoring. Hodgson had a background in statistical analysis from his previous career as an oceanographer. And his years as a winemaker and judge meant that he was familiar with the problems inherent in the judging process.
G.M. Pucilowski: I think he didn’t like judging. I think he was discouraged also by the fact that his wines would get a gold medal at a certain competition, and it didn’t get a gold medal in another competition. And we just kind of got together and said, “Let’s go for it. Let’s make this thing happen.”
Katy Milkman: Around 2003, Pooch and Bob Hodgson devised a clever strategy to judge the judges. As Pooch mentioned, the flights, the groups of various wines presented to the judges during the competition, they’re anonymized. Each wine is assigned a four-digit number. Hodgson and Pooch figured they could repeat some of the wines in the flight. They settled on a plan to repeat four wines, each of which was poured into three glasses. Those 12 glasses were then interspersed randomly with the remaining 18 single wines. The judges assumed they were tasting 30 different wines, when they were really only tasting 22. Bob Hodgson could then measure any inconsistency in scoring between glasses of the very same wine by the very same judge.
They discovered the judges were often inconsistent in judging the exact same wine in three different glasses. At this point, Pooch started to wonder if there was anything they could do to improve the results.
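A minimal sketch of how the repeated-glass design could be scored, offered as an illustration rather than as Hodgson's actual method. The numeric medal encoding, the judge and wine names, and every score below are hypothetical; the spread measure (highest minus lowest score for the same wine) is an assumption chosen for simplicity.

```python
from statistics import mean

# Assumed encoding of the medal scale: 0 = no medal, 1 = bronze, 2 = silver, 3 = gold.
# Hypothetical data: each judge scored four repeated wines, three glasses of each.
scores = {
    "judge_a": {"wine_1": [3, 3, 2], "wine_2": [1, 1, 1], "wine_3": [2, 2, 2], "wine_4": [0, 1, 0]},
    "judge_b": {"wine_1": [3, 0, 2], "wine_2": [1, 3, 0], "wine_3": [2, 0, 3], "wine_4": [0, 2, 3]},
}

def judge_inconsistency(replicates):
    """Average, over the repeated wines, of (highest score - lowest score)
    across the glasses of the same wine. Zero means perfectly consistent."""
    return mean(max(glasses) - min(glasses) for glasses in replicates.values())

inconsistency = {judge: judge_inconsistency(reps) for judge, reps in scores.items()}

# List judges from most to least consistent.
for judge, value in sorted(inconsistency.items(), key=lambda item: item[1]):
    print(f"{judge}: mean spread = {value:.2f} medal steps")
```

Because each judge is compared only with their own scores for the same wine, a strict judge and a generous judge can both come out as consistent, which is the property Pooch said the competition cared about.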
G.M. Pucilowski: If the judges aren’t being consistent, as we found out they weren’t, I’d look at myself and I’d go, “What am I doing? What is it that’s causing this problem? What could we do? What could we change?” That’s my whole purpose of doing this with Bob was to—how can I get a better way of doing it?
Katy Milkman: Bob Hodgson’s statistical analysis showed that the degree of inconsistency varied from judge to judge. There was essentially a bell curve.
G.M. Pucilowski: Fifteen percent of those judges did really well. They were very close. They nailed most of the wines. They did very well. Fifteen percent on the very end, they were bad. They were giving no medals. They were giving gold. They were all over the board. That left about 70%. They weren’t really bad. They weren’t really great.
Katy Milkman: The problem was, in any bell curve, there are top performers and under-performers. That’s how a distribution works. But often, these outliers aren’t meaningful or stable. Still, Pooch and Bob hoped they could get somewhere with what they were seeing in the data.
G.M. Pucilowski: The next year, because we were hoping, “Wow, maybe we could take these top 15% and teach the 70% how to be better.” Well, unfortunately, those 15% weren’t there the next year. Those 15% sort of fell back into the 70%. And we got a new group of 15% of judges who did really well. The one constant, those that were on the bottom, the 15% that were on the bottom of the bell, they were always on the bottom of the bell every year. So I could easily see where I could start peeling off 15% of the judges, bring in another 15% to make up the difference. And I think we were finally getting somewhere with it, and it was becoming some useful information.
Katy Milkman: They now had data that demonstrated a high degree of variability in the judging of wine, with only 10 to 15% of judges achieving consistency in their assessments in a given year. They hadn’t figured out a solution, but they felt it was time to share this information in the industry. As you might guess, not everyone was happy.
G.M. Pucilowski: Yes. There was pushback coming from my board of directors. Every one of them was in the wine industry in one way or the other. Then when Bob proposed that we publish this, yeah, there was some pushback on it. There was a little bit of controversy on whether we should mention the California State Fair, but we kind of felt that, sooner or later, it was going to leak out that it was the State Fair that was doing it. So we might as well just be upfront about it and say it was the California State Fair.
Katy Milkman: They made the analysis public. And Bob Hodgson’s findings caused shock waves in the world of wine competitions.
G.M. Pucilowski: I think there was a lot of other competitions that didn’t like it. Especially when someone goes out and says, “Your judges aren’t very good.”
Katy Milkman: While the industry reaction was often negative, Pooch did his best to use what they’d learned to improve the consistency of wine-judging.
G.M. Pucilowski: Every year, I tried to make changes.
Katy Milkman: But no matter what, there was still wide variation in judging, in part because it’s hard for judges to avoid being influenced by the exact sensations they’re experiencing when they take a given sip. And in part because human judgments are by nature just fickle.
G.M. Pucilowski: So if I’m smelling wine that’s got an oaky smell to it, I go, “OK, great. All right.” And I taste it. Then I pick up another wine that’s got an oaky smell. It may not be as strong because I’m just already sensitized to the oakiness that I just smelled in the wine before it. So there’s that possibility sensory fatigue is setting in.
Katy Milkman: Lots of factors, sensory fatigue or overload, the time of day, the order of tasting, and even personal preferences are all things that conspire to affect a judge’s decisions. There’s also something even more difficult to grasp, and that’s the arbitrary nature of judgment itself.
G.M. Pucilowski: It’s very subjective. We tend to like what we like and drink what we drink because we like it. And I don’t know anybody in the wine business that doesn’t have opinions, but wine judges are out there to do the best job they can. And I think part of why we sit around and discuss the wines after we rated them is, in case I missed something. I’m just human. I miss things.
Katy Milkman: Pooch left the State Fair job in 2012 after a change in management. He continues to be active in the world of wine, and still sees opportunities to improve wine judging.
G.M. Pucilowski: And I think that more work needs to be done on this. I think we’re just barely scratching the surface. But because of the slamming that people get by doing it, it’s going to prevent a lot of wine competitions from wanting to do it. I think another 10 years would have been just awesome to be able to continue doing this kind of a study. Really believe that.
Katy Milkman: Despite the industry blowback from his work with Bob Hodgson, Pooch has no regrets.
G.M. Pucilowski: Oh, I’m so happy we did it. I feel good that we did it, and I wished that I could encourage other competitions to do it.
Katy Milkman: G.M. “Pooch” Pucilowski is a speaker, writer, wine judge, and educator. He runs wine appreciation classes through his University of Wine. I have links in the show notes and at schwab.com/podcast, where we also link to Robert Hodgson’s research paper, examining the reliability of judges and wine competitions.
After hearing this story, you may think, “Well, of course taste is subjective, and of course you’re going to get variability in judging.” But remember that Pooch was looking for consistency from each judge relative to their own past appraisals. And even then, there was a substantial amount of variability. Wine judging, from a behavioral science perspective, is a very noisy process. Not noisy as in loud and annoying, but noisy in terms of variability in the results it produces and the influence of irrelevant information.
Let me give you a simple example of this type of noise. Take a bathroom scale. A scale that consistently shows three pounds over the actual weight is biased. It’s predictably out of whack. A bathroom scale that measures three pounds over on one measurement, and then two pounds under, and then five pounds over, while measuring the same person’s weight just a few minutes apart—that’s noisy. It’s variable and unpredictable.
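The bathroom scale example can be put into numbers with a short sketch. The true weight of 150 pounds is an assumption made only for illustration; the errors (+3 every time for the biased scale, and +3, -2, +5 for the noisy one) come from the example above. Bias is taken here as the average error and noise as the standard deviation of the errors.

```python
from statistics import mean, pstdev

true_weight = 150.0  # hypothetical true weight in pounds

biased_scale = [153.0, 153.0, 153.0]  # always three pounds over
noisy_scale = [153.0, 148.0, 155.0]   # three over, two under, five over

def bias_and_noise(readings, truth):
    """Return (bias, noise): the mean error and the spread of the errors."""
    errors = [reading - truth for reading in readings]
    return mean(errors), pstdev(errors)

biased_bias, biased_noise = bias_and_noise(biased_scale, true_weight)
noisy_bias, noisy_noise = bias_and_noise(noisy_scale, true_weight)

print(f"biased scale: bias = {biased_bias:+.1f} lb, noise = {biased_noise:.2f} lb")
print(f"noisy scale:  bias = {noisy_bias:+.1f} lb, noise = {noisy_noise:.2f} lb")
```

The spread of the readings is the same whatever the true weight is, so the noise figure could be computed even if the true weight were unknown; only the bias figure needs it.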
Noise can come into play in many types of decisions. Doctors in emergency rooms might make different diagnoses when seeing exactly the same symptoms. Appraisers at credit-rating agencies might give different risk scores to statistically indistinguishable applicants. Insurance adjusters might assign substantially different values to objectively identical claims. Gymnastics judges might score the same routine quite differently from one day to the next.
And there’s a cost to this variability. It could prevent a budding entrepreneur from securing a loan to start a business. It could lead an insurance company to underpay for life-changing claims. Patients might receive conflicting diagnoses from different doctors. Athletes might miss out on medals. An ever-growing body of research has proved that our decisions are often internally inconsistent.
That is, if you look at, say, a judge making bail decisions day in and day out, and you have all the information the judge used to make each decision, you’ll be able to detect a huge amount of variability in those decisions, with no apparent cause. Sometimes, the information on which we base our decisions is itself noisy, and that’s another source of variability. Noise is the subject of a book that Daniel Kahneman is in the midst of writing with Cass Sunstein and Olivier Sibony. I sat down with Danny Kahneman recently at his apartment in New York City to talk about his thoughts on noise, and why he felt it’s such an important topic to cover in his new book. I started out by asking Danny to tell me a little bit about why he got interested in writing a book on noise after so many years spent studying other topics.
Daniel Kahneman: It really started with an observation and when I was doing some consulting with an insurance company. I had the idea of running a study with underwriters and also with claims adjusters, who are people who have to put a value when a claim comes in, and some details about it are available. Somebody has the task of setting a value, an estimate of how much that claim will cost the company. So we had those two kinds of people, underwriters and claims adjusters.
And let’s focus on underwriters for a moment. So we had them construct realistic problems, the kind of problems that these underwriters encounter every day, large problems, significant problems. And we showed it to 50 underwriters. Each of them had to put a dollar value on a risk. And we asked the executives the following questions: In percentages, by how much would you expect two underwriters to differ? And there is a number that, for some reason, everybody agrees on that number, I don’t know why, but it’s about 10%. That’s what people expect in a well-run firm. Ten percent looks tolerable.
You know that they can’t be expected to be identical, but you don’t really want them to be wildly different. It turns out that there is a correct answer by how much they do differ. And it’s 50%, five zero. So it’s five times larger than the executives expect. And in some sense, when the variability is that large, there’s a serious question whether you need those underwriters, that is, whether you might not have an algorithm that would actually predict much more accurately. So this is quite deep. And what made the observation striking was that nobody in the firm had ever thought about that. They did not know they had a noise problem. They had a large noise problem. And so we became interested in how general that is, and it seems to be fairly general.
So when somebody comes into the country and asks for asylum … asylum judges, it’s a complete lottery. Some of them approve more than 90%; others approve less than 10%. So in society, we have a lot of situations where there are different functionaries that should be interchangeable. But it turns out they’re not. And we call that system noise. What makes this interesting to me is that all my life I’ve studied biases. And noise is a different kind of error. But it turns out that you can measure noise without knowing the correct answer. Whereas, you can’t measure biases that easily without knowing the correct answer. And also, the kinds of things that you can do to control noise may be different from the kinds of things you would do to control biases.
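A rough sketch of how the underwriter comparison could be computed, under stated assumptions. The five dollar values are hypothetical, and defining the difference between two underwriters as their absolute gap divided by their average is an assumption; the conversation does not spell out the formula. Nothing in the calculation uses a correct answer, which is the point made above about measuring noise.

```python
from itertools import combinations
from statistics import mean

# Hypothetical dollar values that five underwriters put on the same risk.
quotes = [9500, 16000, 12000, 7000, 20000]

def relative_difference(a, b):
    """Absolute gap between two judgments, divided by their average."""
    return abs(a - b) / ((a + b) / 2)

# Compare every possible pair of underwriters on the same case.
pair_diffs = [relative_difference(a, b) for a, b in combinations(quotes, 2)]
noise_index = mean(pair_diffs)

print(f"average difference between two underwriters: {noise_index:.0%}")
```

Under this definition, quotes of 75 and 125 differ by 50%, while the roughly 10% the executives expected would correspond to quotes of about 95 and 105.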
Katy Milkman: There’s a couple of elements I’m totally fascinated by. One of them that I’m thinking about now is the magnitude of how far off people are relative to expectations. Do you feel like that’s sort of the key thing that struck you and made this feel like a topic that was worth digging into is that we think noise is a small error margin?
Daniel Kahneman: I mean, people by and large completely ignore noise. When you look at popular books on decision-making, and you look for the topic of noise or reliability, you don’t find it. So the relative amount of attention that is paid to biases and to noise, the ratio is very, very large indeed. There has been real concentration of effort in thinking about biases. There’s a good reason for that, and this is that biases are better stories, and biases have causes, and people like to think causally, and they like to think in terms of systems that you can tell stories about. Noise doesn’t lend itself very well to a good story. So for those reasons and many others, noise is basically thoroughly neglected, and that’s why we decided to write a book, just to redress the balance. I mean, it’s not that people don’t know about it, so it’s not that we have many things to say that are very surprising. We just want to make noise more difficult to ignore.
Katy Milkman: It sounds like you think the cure to this is algorithms. And given that we vastly underestimate the noise when we have humans making choices, that makes it even more critical than we’d already thought, given past research on bias, to get algorithms in place.
Daniel Kahneman: It’s very well known that when you compare people to algorithms that, by and large, at least half of the time, algorithms beat people hands down. And the other half, it’s about a tie. But we know why people are inferior to algorithms, and noise is a big part of it. So there is compelling evidence that it’s because of noise in large part that people are inferior to algorithms. So by removing noise, you could …
Katy Milkman: Better people.
Daniel Kahneman: You could improve … Yeah.
Katy Milkman: It’s the future. Thank you, Danny.
Daniel Kahneman: OK.
Katy Milkman: Actually, this was really, really fun for me, so thanks for taking the time. I appreciate it.
Daniel Kahneman: No problem.
Katy Milkman: Daniel Kahneman is the Eugene Higgins Emeritus Professor of Psychology and Public Affairs at the Woodrow Wilson School at Princeton University. He won the 2002 Nobel Prize in Economic Sciences and in 2011 wrote the bestselling book Thinking, Fast and Slow. He’s currently working on a book about noise. I have links in the show notes and at schwab.com/podcast.
The existence of noise in decision-making is behind a lot of recent trends in investing, like the rise of index funds and ETFs, as well as the automated services referred to as robo-advisors. To learn about the role digital advice can play with your investments and financial plan, check out our sister podcast, Financial Decoder. The episode “How Do You Get Started on Your Financial To-Do List” is particularly useful. You can find it at schwab.com/financialdecoder or wherever you listen to podcasts.
As Danny mentioned, one solution to the problem of noise is to introduce algorithms into the decision-making process. Algorithms, while not perfect, are also not subject to the influence of irrelevant information like time of day, hunger level, or simple human fickleness. Let’s think about what this would look like if we returned to our wine judging problem. We spoke with James Hutchinson, who worked for the Royal Society of Chemistry for a number of years, and has given quite a bit of thought to this challenge.
James Hutchinson: So there’s two main elements that we think about when we’re looking at wine quality. One is how it actually tastes on the palate, and the second is all of the complex aromas and smells that give it all of that different flavor complexity. We get sourness from acidity, we get sweetness from sugar, and we get bitterness from alcohol.
That’s where it comes to subjectivity, that ultimately it comes down to a person, whoever’s tasting, to be able to interpret all of that complex information that they’re getting from the palate and from the aroma, from the nose, and turning that into some kind of judgment around whether that represents something that is quality.
Katy Milkman: To counter some of the effects of subjectivity and noise, James suggests that wine competitions could eventually introduce chemical analysis into the process.
James Hutchinson: We’ve even got a new technology that’s being developed in New Zealand at the moment, and they’ve developed a groundbreaking new technique that can use UV-visible spectroscopy on cloudy solutions.
And so what I think would be really exciting is if we could develop, for instance, some machine-learning and artificial-intelligence technologies that would be able to objectively create a link between the complex chemicals that we know we can already measure within a glass of wine and make a judgment of quality in different circumstances that doesn’t rely on someone, on a human, to taste the wine and then make an interpretation themselves.
Katy Milkman: We might not be there yet, but it seems that chemical analysis combined with artificial intelligence might go a long way to reducing noise and delivering a more objective judgment of a wine’s qualities.
The fact that we humans are far less consistent in our own judgments than we expect to be is important to recognize. It means a heavier reliance on algorithms can make the world a fairer and more predictable place. But the noise in human judgment oddly has an upside. There’s a fascinating study from 2008 that shows a way you can use it to improve your decisions. Two psychologists named Edward Vul and Harold Pashler teamed up to see if the noisiness of people’s judgments could maybe be turned into an asset, helping them make higher-quality decisions.
Here was their logic. It’s a well-established statistical fact that if you want a high-quality forecast of an unknown quantity—say, the number of trees on earth, or the number of garbage trucks in Chicago, or even a forecast of the U.S. gross national product next year—you’ll actually do better if you average multiple people’s independent guesses than if you just make one guess. So if you’re not sure how many miles away the moon is from the Earth, for instance, rather than asking one person and assuming her guess is likely to be as good as any other, you’d be better off asking five people separately and averaging their answers. The reason is that the noise in their estimates will cancel out, and you’ll end up with a better average answer. People often talk about the wisdom of crowds, and this is what they’re referring to. James Surowiecki even wrote a bestselling book on the topic.
The cool thing about the noise in our own judgments that’s drawn Danny Kahneman’s attention is that it actually means we have what psychologists call “a crowd within.” That is, on one day, our judgments aren’t that well correlated with our estimates on another day. We saw this with wine judges, and it’s true of almost all judgments we make. So a couple of psychologists wondered, “Could we use that crowd within to help people?” They ran a study where they gave 428 people a bunch of tough questions to answer like, “What percentage of the world’s airports are in the United States?” They told people to give their best guess, but here’s where things got interesting. They actually had everyone guess twice. Some people guessed two times in a row, so they gave pretty similar guesses because they remembered their first response. Other people guessed twice but made their second guess three weeks after their first.
Two cool discoveries emerged from the study. First, in all cases, averaging people’s two guesses produced more accurate judgments than taking just one guess alone. So by averaging our repeated judgments, we can cancel out error or noise and zoom in on signal. But second, averaging two guesses people made that were three weeks apart led to significantly higher-quality guesses than averaging two guesses made in quick succession. So it was helpful to inject more noise into people’s guesses by giving them time to forget their initial estimates and come up with a truly distinct new opinion.
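A small simulation can illustrate why the delayed second guess helps more. This is a sketch under assumptions, not a reproduction of the study. Each guess is modeled as the truth plus a random error, and the second guess partly repeats the first error. The correlation values (0.8 for back-to-back guesses, 0.2 for guesses weeks apart), the noise level, and the number of trials are all made up for illustration.

```python
import math
import random

def simulate(correlation, trials=20000, noise_sd=10.0, seed=0):
    """Return (mean squared error of one guess,
               mean squared error of the average of two guesses)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    single_sq = 0.0
    averaged_sq = 0.0
    for _ in range(trials):
        first_error = rng.gauss(0.0, noise_sd)
        fresh_error = rng.gauss(0.0, noise_sd)
        # The second guess carries over part of the first error (memory)
        # and adds some new, independent noise.
        second_error = (correlation * first_error
                        + math.sqrt(1 - correlation ** 2) * fresh_error)
        single_sq += first_error ** 2
        averaged_sq += ((first_error + second_error) / 2) ** 2
    return single_sq / trials, averaged_sq / trials

single, immediate = simulate(correlation=0.8)  # assumed: guesses in a row
_, delayed = simulate(correlation=0.2)         # assumed: guesses weeks apart

print(f"one guess:                    {single:.1f}")
print(f"average of immediate guesses: {immediate:.1f}")
print(f"average of delayed guesses:   {delayed:.1f}")
```

The less the second error resembles the first, the more of it cancels in the average. The model includes no shared bias; any error that a person repeats in both guesses would survive averaging.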
Now, I don’t want you to walk away thinking that the noisiness of our judgments is great news. In general, it’s not. It’s cause for humility and cause for relying more on algorithms and less on fickle humans to ensure we make fair and reliable choices. But it does give us an opportunity to harness the crowd within when we’re trying to make a forecast.
You’ve been listening to Choiceology, an original podcast from Charles Schwab. If you’ve enjoyed the show, I’d be really grateful if you’d leave us a review on Apple Podcasts. It helps other people find the show. You can also subscribe to the show for free in your favorite podcasting apps. Next time, we’ll look at a subtle nudge that can have an outsize influence on people’s choices in business, health, and technology. I’m Dr. Katy Milkman. Talk to you next time.
Speaker 8: For important disclosures, see the show notes or visit schwab.com/podcast.