Archive for July, 2011

One Pill, Two Pills, Red Pills Blue Pills

Tuesday, July 26th, 2011

Why do the same drugs look different? Pills, trade dress, and public health is the title of a paper by Jeremy Greene and Aaron Kesselheim published earlier this month in the New England Journal of Medicine (subscription required). It discusses the legal concept of trade dress as applied to pharmaceuticals. Trade dress, as best I can understand, is not the same as a trademark, though the two are closely related. Trade dress refers to the overall branding and appearance of a product, and prevents competitors from copying a product’s look and feel to confuse consumers or piggyback on its reputation.

One requirement for trade dress is that it must be non-functional. It also must be distinctive in the sense that it identifies the source of the product as a well-known brand. Apple’s sleek aluminum appearance on its computers seems to (sort of) fit that definition. It’s non-functional in the sense that one could produce a product with similar quality and durability with other colors or materials, and customers are associating the style with the Apple brand more and more. I’m not sure if the degree is enough to qualify for trade dress protection, but it’s going in that direction.

Here’s a clearer-cut example: if I wanted to start a new fast-food restaurant called Wally’s, I couldn’t flip McDonald’s golden arches upside down to make a ‘W’ and use the same color scheme and fonts. Such a move would confuse customers, who might think my restaurant is an offshoot of McDonald’s.

Similarly, pharmaceuticals attempt to create brand association by making their pills a certain color or shape. AstraZeneca advertises Nexium as the “purple pill” while Pfizer’s Viagra is recognized as a blue diamond. These attributes are non-functional and help to distinguish the product from generic competitors.

But are they really non-functional? The placebo effect is well-known; patients often feel better when given a pill that contains no active agent. The placebo effect is, not surprisingly, stronger for conditions with a large psychological component, such as depression, anxiety, pain, or impotence, and it has also been demonstrated in arthroscopic knee surgery. However, for a given condition, there is no one empirical placebo effect; the placebo’s effectiveness can vary by the color and smell of the pill, or by how much a physician talks up the benefits of treatment.

In a pair of famous studies from the 1970s, Italian researchers showed that the color of a placebo significantly affected sleep quality and duration. In fact, the color effect differed between men and women: women found a blue placebo about as effective as the actual blue pill, while an orange placebo was inferior to the actual treatment. Men showed little difference between the blue and orange placebos, though they did slightly better with the active drug than with either placebo.

(It’s often difficult in clinical trials to create a placebo that exactly matches the active drug in size, shape, and smell. Some inquisitive patients like to break capsules apart to attempt to figure out which arm of the trial they are on. But not all clinical trials use placebos – in some cases it’s impractical or unethical, and other times it makes more sense to compare an experimental treatment to a standard existing treatment.)

Brand name pharmaceuticals have used trade dress as a way of distinguishing their products from generic competitors. Courts have upheld this especially when the generic is not chemically equivalent. However, in one case (Adderall), the drug was marketed with a color scheme for distinguishing between different doses. The color scheme was ruled to be functional, and thus not protected under trade dress.

In addition to the placebo effect, having different-colored generics can increase prescription errors and decrease medication adherence, especially for people on multiple-drug regimens who use appearance as one way of identifying their pills. Is there a definitive policy that would be appropriate for everyone? I don’t know, but it’s a fascinating interplay of law, medicine, and business.



Greene J & Kesselheim A. Why Do the Same Drugs Look Different? Pills, Trade Dress, and Public Health. NEJM 7 Jul 2011.

Lucchelli PE, Cattaneo AD, & Zattoni J. Effect of Capsule Colour and Order of Administration of Hypnotic Treatments. European Journal of Clinical Pharmacology 1978.

Cattaneo AD, Lucchelli PE, & Filippucci G. Sedative Effects of Placebo Treatment. European Journal of Clinical Pharmacology 1970.

Moerman D. Meaningful Placebos – Controlling the Uncontrollable. NEJM 14 Jul 2011.

Enserink M. Can the Placebo be the Cure? Science 9 Apr 1999.

Book review: Lies My Teacher Told Me, Part 1

Saturday, July 23rd, 2011

Lies My Teacher Told Me was written by sociologist James Loewen, first published in 1995 and revised in 2007.

The title is provocative, but the book is not so much an attack on teachers as on the consortium of American high school history textbook publishers. Loewen’s complaint is that the massive textbooks offer quantity over quality; students muddle through their history curriculum by memorizing and soon forgetting a litany of dates, places, and names, without ever connecting or appreciating the bigger themes.

An even bigger problem is that the textbooks tell an incomplete history, or sometimes the wrong one. If one could summarize their main theme in a single sentence, it would have to do with a neat and clean founding myth, that the United States from the beginning was destined to be a beacon for the world. This hubris shies away from messy details and controversy, and presents history as engraved truth rather than acknowledging continuing arguments among historians. In doing so, the textbooks miss an opportunity to engage students in debate over what really happened, boring them with detail after detail instead of allowing them to explore human behavior for themselves, through primary and secondary historical documents that represent multiple perspectives.

The state-sponsored high school history curriculum naturally puts the United States and its government in the best possible light. The major exception is slavery, which is so obvious and ingrained that it cannot possibly be ignored. But negative topics such as American Indian relations, interventions in Latin America and elsewhere, and some of the U.S. government’s own destructive domestic policies are glossed over or ignored. Students are left with a portrayal of historical figures that is too neat and clean, robbing them of the chance to understand the human faults of the people whose names they are supposed to remember.

I’ll proceed with a detailed reflection on the second chapter, since it epitomizes much of the rest of the book and focuses on a familiar figure, Christopher Columbus.

Here is the Columbus story more or less told by the history textbooks:

Christopher Columbus came from humble beginnings in Genoa, Italy, (doesn’t everything come from humble beginnings these days?) and ventured as far as Iceland and West Africa. He became convinced that the Earth was round and thus he could reach the East faster by sailing west. He lobbied monarchs across Europe for funding for an expedition. Finally, Ferdinand and Isabella of Spain agreed to underwrite a modest expedition, which Isabella paid for in part by pawning crown jewels. With three small ships and a motley crew, Columbus departed in the late summer of 1492. He kept a separate log of distance traveled so the crew wouldn’t know they were really farther from home than they thought, but nevertheless, the crew nearly mutinied. After a storm-filled voyage, they landed in the West Indies on October 12, whereupon Columbus glorified God and claimed the land for the King of Spain. He returned home and eventually made three more voyages in search of gold and a passage to Asia, but his monumental discoveries were never fully appreciated in his lifetime. Columbus died in obscurity, never convinced that he had found a New World.

Now the version told in Lies:

Columbus claimed to have been born in Genoa, but there is some evidence that he might not be from there. He also claimed to have visited Iceland in 1477; if so, he surely would have heard the Norse sagas about settlements on Greenland and Newfoundland. Whether he intended to sail to Asia or discover new lands will probably never be known. That the world was round had been known at least since the Greeks; few learned people in Europe and probably elsewhere would have believed in a flat earth in 1492. Columbus did petition numerous monarchs to fund an expedition, and received funding from Queen Isabella, although the selling of the crown jewels appears to be a myth. The trip lasted two months (five weeks across the open ocean after a stopover at the Canary Islands), with good weather and no mutiny, just some sailors getting on each other’s nerves after spending weeks together. Columbus did keep a separate log, but his motive was to keep his route to the Indies secret. When they returned to Spain, news of the discovery was well received and Columbus was approved for a second, larger voyage. His subsequent voyages resulted in murder, mutilation, and enslavement of inhabitants of present-day Haiti and other Caribbean islands, although a gold discovery in 1499 did make Columbus very well off, and he left a considerable fortune to his heirs.

The Lies version appears a lot closer to the consensus truth, at least with regard to the details above. But Lies goes further in suggesting there were numerous pre-Columbian contacts, and puts itself on shaky archaeological ground. For example, there’s this sentence on p. 39: “Ancient Roman and Carthaginian coins keep turning up all over the Americas, causing some archaeologists to conclude that Roman seafarers visited the Americas more than once.”

To quote Pat Kessler, that’s misleading. “Some archaeologists” means a stubborn minority. In fact, the referenced journal article, from Current Anthropology, is an attempt by Jeremiah Epstein to disprove the authenticity of all of the alleged coin discoveries. None of the finds were made at archaeological sites where the material could be accurately dated; instead, they mostly turned up in people’s gardens after Europeans began migrating en masse to the North American interior in the 19th century.

Similar problems also apply to another example: some statues found in Mayan lands that supposedly resemble African faces. Lies claims there is a possibility that Phoenician sailors reached America in the first millennium B.C.E. (The Phoenicians were based in modern-day Lebanon and ran a successful trade empire throughout the Mediterranean for several centuries.) Again, many of the statues were purchased in modern times, not found among ruins, and the faces could easily be Mayan or, as some have suggested, jaguars. Lies at least acknowledges this much, concluding “Most archaeologists think they were Mayan, so including the Afro-Phoenicians [in history texts] must be done as a mere possibility – an ongoing controversy.” Lies also suggests that West Africans were voyaging to Brazil during the 1400s.

The thing is, there will always be archaeologists and authors who espouse farfetched hypotheses to get attention; as in the media, the mundane and true rarely makes news. Perhaps Loewen is simply advocating inclusion of these claims as a way of making history more interesting to students. But a lot of people are susceptible to conspiracy theories, and giving such claims equal time sounds an awful lot like the intelligent design argument. (One reason I think people still fail to accept that the Theory of Evolution is the only general natural-history theory that belongs in the science curriculum is that most non-scientists think theory is a synonym for hypothesis. But the Theory of Evolution is supported by mountains of evidence, and is no more a hypothesis than is Newton’s “Theory” of Gravity.)

In Guns, Germs, and Steel, Jared Diamond takes a different approach: he acknowledges competing claims but then goes on to refute the more ridiculous ones. Ironically, Lies references Guns multiple times, but Loewen seems to forget about it whenever it refutes a claim he advocates including. I think Guns would be an excellent alternative to many world history textbooks, and it contains many of the sorts of perspectives Loewen finds lacking in standard textbooks.

A table on p. 40 lists 15 or so possible pre-Columbian contacts. Of these, I would bet on only three. One, a Norse settlement on Greenland (which lasted over 400 years but never had more than a few thousand inhabitants) and its repeated contact with the North American mainland. Two, repeated contact across the Bering Strait between Siberians and Inuits. Three, Polynesian contact with South America. Spreading out from Southeast Asia, Austronesian voyagers and their Polynesian descendants colonized islands as remote as Hawaii, Madagascar, and Easter Island, a tiny speck hundreds of miles from any other inhabited island and about 2,200 miles west of South America. Even earlier, seafarers settled Australia at least 40,000 years ago; that voyage would have required crossing a channel at least 50 miles wide, even during an Ice Age, making it the first known use of oceangoing watercraft in human history.

The other fault I see in Lies is that while it is critical of standard textbooks for portraying a clean, rosy founding myth, it swings the pendulum too far in the other direction by embracing too much political correctness and white guilt. This is why Loewen brings up the supposed pre-Columbian contact by Phoenicians and Africans; he wants to find non-European centered themes to talk about. And while the American Indian genocide certainly deserves a lot of attention, its perpetrators, white Europeans, are by no means the lone guilty party in history. In every world region, including Africa, Asia, pre-Columbian America, and Polynesia, whenever two technologically-mismatched civilizations have clashed, the lesser one has usually been subjected to enslavement, torture, and/or genocide. This is a significant attribute of human nature, and one that deserves careful study.

In Chapter 4, Loewen does a nice job of highlighting the influence of American Indians on white culture well into the 19th century, an influence that helped make white American culture distinct from British or broader European culture. However, their contributions, and the successful societies they built, were marginalized in the historical record once whites realized they needed a way to explain the theft of Indian land, which was easier if Indians were remembered as “primitive”. The influence of the Spanish, French, and Dutch was also largely brushed aside. We saw this same revisionist history a few posts ago in The Evolution of God.

However, it’s far too easy to portray the Indians as noble, peaceful tribes who did nothing but care for the earth. War, treachery, and violence were prevalent within and between Indian societies, just as many white individuals sought to help, co-exist, and trade with Indians. But Loewen asserts that Indian warfare only increased after contact with Europeans, when they acquired guns and learned of new military techniques. Oh.

Another theme now frequently put forth by politically correct types is that the West grew rich by “exploiting” native civilizations in other parts of the world. Again, this is very much worth discussing, but while we’re in the spirit of including multiple viewpoints, let’s include Milton Friedman. Friedman asserts in this video that administering colonies always cost the mother country more than it received from the arrangement. He also points out that for the native inhabitants who survived the initial wars and disease, quality of life improved through contact and trade. One of my friends recently visited Alaska and met a local Inuit, who told him that “before whites arrived, we spent our entire lives attending to basic survival needs.” As Friedman says in the video, “The wheel had not yet been invented in parts of Africa by the end of the 19th century.” I don’t know who is right, Friedman or his politically correct counterparts. But Friedman and his colleagues spent decades empirically researching issues like these and were right about a lot of other things; I would at least entertain the possibility that they are right on the colonization claim.

In a nice coincidence, John Stossel’s show this week was entitled “Politically Incorrect History,” presenting a wide range of myths perpetuated in American history classes. I especially liked the authors of One Nation Under Sex: How the Private Lives of Presidents, First Ladies, and Their Lovers Changed the Course of American History, which I’ll have to add to my reading list. Also Ben Franklin’s treatise on selecting an older mistress. No wonder Franklin never finished his autobiography!

More on sports and society

Friday, July 15th, 2011

A few baseball players make $10-$20 million every year in base salary. Doctors make good salaries, but nowhere near that range*. Teachers make much less. Outrageous! Is this a symptom of misplaced priorities in society? This Bart Hinkle column takes on that myth. A few key paragraphs:

Jeter’s millions might be a good deal for the Yankees, but don’t they stray from what is often called “social justice”? What does it say about a society that pays a teacher thousands and a shortstop millions?

At this point it helps to consider Wilt Chamberlain, who was once to basketball what Jeter is to baseball today. In what has become known as the Wilt Chamberlain Hypothetical, the late philosopher Robert Nozick invites us to consider whether Chamberlain is entitled to the fruits of his game-playing labor.

Suppose, Nozick said, that there is a society in which wealth has been distributed ideally, however you want to define “ideal.” (In this case, let’s say everyone has exactly the same amount of money.) Now suppose Chamberlain signs a contract that entitles him to 25 cents out of every admission ticket sale. In the course of a season, 1 million people attend the games to watch him play. At season’s end Chamberlain ends up $250,000 richer than anyone else.

Is this unjust? If so, why?

* Actually, it makes more sense to compare the highest paid baseball players to the highest paid doctors, or average baseball players to average doctors. While a few hundred baseball players have million dollar contracts, the major league minimum is just over $400,000, but if we go down to AA minor league ball, the base salary is $1,500/month. There are a lot more baseball players in the minors than the majors. And there are a lot more professional doctors than professional baseball players. I don’t know who the richest doctors are, but I’m guessing Dr. Phil is way up there.

The key detail is that baseball players are part of a general class called entertainers. (You could argue that so in fact is Dr. Phil.) People are, collectively, willing to freely pay lots of money to be entertained by the very best entertainers. In that case, the very best entertainers provide a very valuable service to society – they make lots of people happy.

However, the nature of their work provides an important clue as to why the pay scale in the entertainment business is so skewed compared to the medical or education industries. Derek Jeter can, every season, play in front of 7 million people who spend $20, $30, $40, or more per ticket, plus millions more who watch on television. The best and most sought-after doctors cannot see 7 million patients in a year, or even in a lifetime.

Another wrinkle in this complex calculus is this: people are willing to pay a huge surcharge to watch the very best athletes (think of the difference between major and minor league ticket prices). However, suppose we could somehow keep the 500 best baseball players in every generation from playing baseball (the public would have no knowledge of them). The major leagues would instead be populated by what would have been lower-grade major leaguers and high minor leaguers, who would end up being the ones making millions in our hypothetical scenario. The entertainment value of the game would probably not decrease noticeably from what it would have been with the 500 included. The second-best 500 are still really, really good, and a few would look like bona fide superstars in the absence of the “exiled” superstars. In contrast, if one were to remove the best 500 doctors in the country, the quality of medical care and the pace of innovation would almost certainly decline. Not only do these doctors treat patients directly, but their methods and research can be taught to other doctors, inspire derivative research, and so on. On the other hand, even the best doctors rely on collaboration with colleagues and support staff, so perhaps their (relatively) more uniform pay distribution is somewhat fitting.

It’s inarguable that the enormous rise in baseball salaries has coincided with an enormous rise in the revenue generated by the sport. In 1975, total attendance was a hair under 30 million. Last year, it was 73 million. Nominal ticket prices averaged $3.30 in 1975 (real price $13.22 in 2010 dollars); they were $26.59 last year. As inflation-adjusted ticket prices have doubled in 35 years, attendance has more than doubled, for what is essentially the same product.
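The real-price figures above come from a standard CPI deflator. Here is a minimal sketch; the CPI values are approximate annual averages I’m assuming, which is why the result lands near, but not exactly on, the $13.22 figure quoted above:

```python
# Deflate a nominal price into 2010 dollars using the ratio of CPI values.
# The CPI-U annual averages below are approximate assumptions, not from the column.
CPI_1975 = 53.8
CPI_2010 = 218.1

def real_price(nominal, cpi_then, cpi_now):
    """Convert a historical nominal price to current dollars."""
    return nominal * cpi_now / cpi_then

print(round(real_price(3.30, CPI_1975, CPI_2010), 2))  # about 13.38
```

The same two-line calculation covers the attendance-revenue comparison: any nominal dollar figure from 1975 gets multiplied by roughly 4 to land in 2010 dollars.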

You could make a lot of conclusions with those numbers and others like them. The big trend I see is that society has a lot more money to spend on leisure and entertainment than it did a generation ago. Proponents of health care reform point to increasing health care expenses as evidence that reform is necessary. However, in an increasingly wealthy society, that is to be expected. As the real cost of basic needs like shelter, food, and transportation decrease (as they unequivocally have over the last century, recent blips in gas and food prices notwithstanding), people have more to spend on electronics, baseball tickets, and health care. Why are the first two viewed as positive or neutral while the third is taken as an alarm that requires massive government intervention?

To be fair, some proponents of health care reform are saying that the problem isn’t that we spend too much on health care, but that we don’t spend enough for certain groups of people. But many in those groups buy tickets to baseball games, and other sporting events, and movies, and big televisions to watch baseball games and movies, then turn around and claim they can’t afford the medical care they want. The solution is to subsidize their medical care by taxing people like Derek Jeter, whom the subsidy recipients paid, under no obligation, to be entertained. How is this arrangement any more “fair” than the supposed “unfairness” of income inequality which results from the Chamberlain Hypothetical?

This example becomes even more paradoxical if we substitute doctors for baseball players. Most doctors (a) make good salaries, but (b) incur lots of debt in medical school, and (c) work long hours in medical school and as doctors. Some people think that’s a good tradeoff, and some don’t. In a diverse and tolerant society, that difference of opinion wouldn’t be a big deal, but anyway, I’ve heard lots of talk about expanding health care coverage, and little talk about expanding the supply of health care providers. The downstream effect of giving more services to some people would therefore be that other people get fewer services, or existing providers have to work more to provide the extra services. Presumably they would get paid more for the extra work, but many are already in the income brackets that would be taxed to subsidize the extra care. The net effect is that doctors would be required to “pay” for a portion of their own work, which sounds an awful lot like servitude.

Maybe becoming an athletic trainer, rather than going to med school, would start to look like the brighter option for would-be doctors.

Book review: Scorecasting, Part 5

Tuesday, July 12th, 2011

There are eight more chapters, but I won’t spoil them all. There’s some stuff on the NFL draft, steroids, icing the kicker, and a chapter that argues against the existence of momentum. I said it first, I said it first!

(Actually, Amos Tversky said it first, but as he said, “I’ve been in a thousand arguments over this topic. I’ve won them all, and I’ve convinced no one.”)

The next-to-last chapter is subtitled “Why ‘four out of his last five’ almost surely means four of six,” and provides a nice rebuttal of streak- and small-sample-size-based statistics. Statistics with a larger sample size (e.g., a whole season) are almost always more predictive of future performance (even near-term) than statistics with a smaller sample size (e.g., the last five games). In fact, it would probably be more meaningful to list a player’s career batting statistics than just the current season’s, or perhaps his statistics over his last 500 plate appearances to account for young and old players whose performance deviates from their career average.

Another chapter is subtitled “Why American Idol is a fairer contest than an NFL overtime.” This should be obvious to anyone who watches the NFL; the winner of the coin toss wins 61% of overtimes, versus 39% for coin-toss losers. In fact, 37% of the time, the loser never even gets the ball. I’m a bit dumbfounded that the NFL hasn’t abandoned their model in favor of the college version (in which both teams get a chance on offense regardless of whether the first team scores). Others have proposed letting the kicking team kick from their 35-yard line instead of the 30, resulting in more touchbacks and probably evening out the winning percentages. Another outside-the-box idea is to have a sealed bid auction, with coaches putting in a bid for what yard line they would agree to take the ball. Whichever team bids closest to their own end zone gets possession at the spot of their bid.
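The sealed-bid proposal is simple enough to state as code. This is only my sketch of the rule as described; the function name and the tie-handling are my own assumptions:

```python
def overtime_auction(bid_a, bid_b):
    """Resolve a sealed-bid overtime possession auction.

    Each bid is the yard line, measured from the bidder's own end zone,
    at which that team is willing to take the ball. The team willing to
    start closer to its own end zone wins possession at its bid.
    """
    if bid_a == bid_b:
        return None  # equal bids: fall back to a coin flip
    return ("A", bid_a) if bid_a < bid_b else ("B", bid_b)

# Team A offers to start at its own 15; team B only at its own 22.
print(overtime_auction(15, 22))  # A takes the ball at its own 15
```

The appeal of the design is that each coach reveals how much field position his team actually needs, so neither side can complain about the coin.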

However, the more I think about it, if two teams are tied after regulation, they are probably pretty even, and in the small sample space of overtime, luck is going to have a large role in determining the winner anyway. A coin flip lends expediency to the proceedings. Similarly, in soccer, some have suggested that penalty kicks are a crude means of deciding a winner if the match remains tied after overtime, akin to having a free throw shooting contest at the end of a tied basketball game. But overtime matches in international soccer last over 2 hours (that’s all game time). Teams are only allowed three substitutions the entire game (no extras in overtime), and substituted players may not return. Matches are typically played outdoors in warm climates. The players are exhausted, and having multiple overtimes like hockey and basketball would likely result in an increasingly slow and conservative game. Since soccer is low-scoring anyway, such an affair could take a long, long time to settle. Penalty kicks at least employ some element of skill to decide a winner in a timely manner.

Michael Phelps won one of his eight gold medals at the Beijing Olympics by 0.01 seconds. Had the margin been any closer, the race would have been a tie. Of course, at some level, someone would have reached the finish line first, but the difference would have been smaller than our ability to accurately measure it. Similarly, in democratic elections, the margin of victory is sometimes so small that by law a recount is conducted. Inevitably, the recount turns up many cases of ambiguity: voters do not mark their ballots clearly, some voters vote illegally or more than once, and ballots are lost or miscounted. What is clear is that we cannot measure the results of any election with more than a few thousand votes cast with perfect fairness and accuracy. When the margin of victory is smaller than the estimated inaccuracy, say 0.1% of the total votes*, it’s probably just as fair to decide the winner by a coin flip as by recounts and lawsuits. A coin flip would certainly be faster and cheaper than court challenges and the make-up-rules-as-we-go process that arises to deal with unforeseen ambiguities, although it wouldn’t afford partisans the opportunity to flood the contested district and continue smearing the opposing candidate for months after the election.

* Of course, we’d still have a problem in elections that finished close to the threshold but slightly over it: the loser would present challenges in an attempt to bring the certified total within the threshold thus forcing a coin flip.


I’ll also make some comments on the last chapter re: Are the Chicago Cubs cursed? If not, then why are the Cubs so futile?

The authors do a nice job distinguishing “bad luck” from simply “being bad”. An unlucky team is one that outscores its opponents over an entire season but loses a lot of close games and ends up with a .500 record, or one that is consistently good but finishes in 2nd place far more often than 1st. A bad team just finishes at the bottom of its division a lot. Interestingly, the Houston Astros are put forth as a team that has consistently won a lot of games but has almost no postseason success to show for it.

Here are the Cubs compared to another team that was supposedly cursed for most of the 20th century: Boston, which finally won a championship in 2004 after an 86-year drought.

Chicago Cubs (1901-1950): 10 1st place finishes, 6 2nds, 3 lasts, 2-8 in World Series (titles in 1907, 1908)
Boston Red Sox (1901-1950): 7 1st place finishes, 9 2nds, 10 lasts (all but one between 1922-1932, the decade after the sale of Babe Ruth and numerous other stars), won 5/6 World Series appearances (no WS played after 1904 AL championship)

Chicago Cubs (1951-2000): 2 1st place finishes, 4 2nds, 11 lasts, 0-3 in postseason series (no WS appearances since 1945)
Boston Red Sox (1951-2000): 6 1st place finishes, 8 2nds, 1 last, lost all 3 WS appearances and went 3-5 in all other postseason series.

Eh, so it’s a wash for the first half of the century. The Cubs maybe got a little unlucky going 2-8 in World Series. The Red Sox were the best franchise in baseball from 1901-1918, then sold all their stars and had an awful decade. But since World War II, the Cubs have just been bad. You could make a case that the Red Sox had bad luck: from 1946-2000, they played in four World Series and lost them all in the seventh game. They lost one-game playoffs to get into the postseason in 1948 and 1978, and lost what was effectively a one-game playoff on the final day of the 1949 season.

About the only case you could make for bad luck derailing the Cubs was 2003, a year in which they won their division for the first time in 14 years and a postseason series for the first time in 95 years (the NLDS vs. Atlanta). Although their opponent in the NLCS, Florida, was the wild card, the Cubs actually finished the regular season with 3 fewer wins. Both teams had about league-average offenses and above-average run prevention, so no clear advantage for either side by any large-sample numbers. The Cubs did get out to a 3-1 series lead, and after losing Game 5 in Florida 4-0, they led 3-0 after 7 innings in Game 6 at home and 5-3 after 4 innings in Game 7 at home. But everyone points to a certain foul ball in the 8th inning of Game 6 as supposed evidence of a “curse”.

The Bartman play should have been the least of the Cubs’ worries. Here’s some exculpatory evidence in favor of Bartman (and the book mentions the first three):

1. The key to Florida’s eight run rally wasn’t so much the foul ball, but the Cubs’ shortstop, Alex Gonzalez, dropping a potential inning-ending double play grounder two batters later.
2. The Cubs’ veteran left-fielder, Moises Alou, unnecessarily threw a fit after failing to catch the foul pop, possibly unnerving the rest of the team.
3. The batter was Florida’s singles-hitting second baseman, Luis Castillo, who was subsequently walked by Cubs’ ace Mark Prior with a three-run lead and Florida’s 3-4-5 hitters next in line.
4. At least one other set of hands was reaching for the ball. The left-field umpire, whose name has never been mentioned, was positioned 15 feet away and did not call fan interference.

The Cubs have simply had bad teams for the most part. Next, the authors hypothesize a reason: the franchise still draws lots of fans even when they lose, in large part because of Wrigley Field (and, as the book points out, cheap beer, cheap being relative at $5). Financially, the Cubs have less incentive to win than most other teams. On top of that, a large part of their appeal seems to stem from their role as ‘lovable losers’.

An attendance comparison with the crosstown Chicago White Sox is made (data shown on p. 245 for 1998-2009). Despite usually fielding poorer teams, the Cubs outdrew the White Sox every one of those years. The only year it was even close was 2006, when the White Sox were the defending World Series champs (their first title in 88 years) and the Cubs were dead last in the NL.

The concept of attendance elasticity is introduced, i.e. how much a team’s attendance varies with its win percentage. An elasticity of 1 means a given percentage change in win percentage is matched by the same percentage change in attendance; below 1 means year-to-year changes in win percentage exert less influence on attendance. The Cubs’ elasticity is 0.6; I’m not sure how to put that number in context, but it’s the lowest in baseball.
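Out of curiosity, here’s roughly how such an elasticity could be computed (a sketch only; the book doesn’t spell out its method, and the season-by-season numbers below are made up for illustration):

```python
import numpy as np

def attendance_elasticity(win_pct, attendance):
    """Estimate elasticity as the slope of log(attendance) on log(win_pct).

    A slope of 1.0 means a 1% change in win percentage is matched by a
    1% change in attendance; below 1.0 means attendance is less sensitive.
    """
    slope, _intercept = np.polyfit(np.log(win_pct), np.log(attendance), 1)
    return slope

# Hypothetical season-by-season data (NOT any team's real numbers).
win_pct = np.array([0.420, 0.480, 0.550, 0.500, 0.460])
attendance = np.array([2.60, 2.75, 2.95, 2.85, 2.70])  # millions of fans

print(round(attendance_elasticity(win_pct, attendance), 2))
```

In a log-log regression like this, the slope is the elasticity: a slope of 0.6 would mean a 1% change in win percentage is associated with only a 0.6% change in attendance.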

While this theory is intriguing and probably has some merit, I would like to see the same analysis applied to all teams over a longer time period. The financial structure of baseball has changed drastically during the Cubs’ losing century. Until 1970 or so, Wrigley Field was not that unique, as there were many urban neighborhood-style parks. Now the only other one remaining is Boston’s Fenway Park, which is also usually filled to near-capacity despite expensive tickets. Yet Boston’s attendance elasticity is 0.9, close to the league average, and the Red Sox ended their curse by winning two titles in the 2000s. I am not sure what to make of this.

There is another looming factor, inept management, which Scorecasting alludes to briefly but never really embraces. There are so many teams in similar markets with similar stadiums that nevertheless have drastically different results. Kansas City, St. Louis, Cincinnati, Cleveland, and Pittsburgh are all small market teams in the rust belt. St. Louis has been very successful in recent decades, with numerous division titles and a championship in 2006. Cincinnati and Cleveland have had mixed success, while Pittsburgh and Kansas City are consistent cellar-dwellers. I have to think that management, scouting, and personnel decisions, as well as the owners’ willingness to spend on payroll, have a lot to do with that variance.

In 1933, after 9 last place finishes in 11 seasons, the Red Sox were bought by Tom Yawkey. Yawkey’s family and trust owned the Red Sox for 70 years, which coincided perfectly with their mildly successful but unlucky period. In 2002, a group led by John Henry bought the Red Sox and began pouring money into the team’s roster, igniting an arms race in the AL East. Within 5 years, Boston won two championships. One of their first moves was to hire stats guru Bill James as a consultant. Around the same time, the Cubs hired as their manager one Dusty Baker, who infamously remarked that big, slow power hitters who walk a lot and thus get on base were detrimental to his offense because they “clog up the basepaths”.

The Ricketts family purchased the Cubs and Wrigley Field from the Tribune Company in 2009 for $900 million. We’ll see if that changes anything. As Cubs announcer Jack Brickhouse once said, “Everyone is entitled to a bad century.”

Book review: Scorecasting, Part 4

Saturday, July 9th, 2011

Chapters 9 and 10 try to find an explanation for persistent home-field advantage. If one were to only read two chapters, I recommend these two. First, they deal with a well-known principle with an elusive answer. Second, they form an excellent example of proper scientific process. Formulate a testable hypothesis, isolate the hypothesized mechanism as much as possible, adjust for other factors that might influence the mechanism, test your hypothesis with lots of quantitative data, then think of other possible, alternative explanations and test them too.

Both chapters use an actual game as an illustrative example, but by themselves these examples do nothing to prove the theory because they form a sample size of two. However, they provide a source for funny quotes, like this:

“The Blazers were introduced in a lifeless and staccato monotone that recalled the no-purchase-necessary-void-where-prohibited-consult-your-doctor-if-your-erection-lasts-more-than-four-hours-nobody-is-listening-to-me disclaimers at the end of commercials.”

“Then it was time to introduce YOURRRRRR SAN ANTONIOOOOOO SPURRRRRRSS!!! … as the players took the floor to thunderous applause, voluptuous dancers with black-and-silver skirts aerosoled onto their impossibly sculpted bodies did elaborate pirouettes. Charles Lindbergh was barely treated to this kind of fanfare when his plane touched down in Paris.”

That home-field advantage exists is not in dispute. Chapter 9 begins with a table listing the home winning percentage for various sports and leagues. For the NBA, it is a little over 60%.  The NFL and NHL are in the upper 50s, and MLB is 54%. Soccer leagues worldwide vary from 60-70%, with an average of around 63%.

It should be acknowledged that baseball has the lowest home winning percentage even though it is the only sport on the list with a built-in advantage for home teams – they bat last, which means they know exactly how many runs are needed to win or tie before they bat in the 9th inning or extra innings. The NL, I think, has another advantage for home teams – a pitcher has the opportunity to complete the top of an inning before possibly being removed for a pinch hitter in the bottom of the inning. (No AL vs NL data are given, so I’m not sure if that is empirically true.)

What’s remarkable is that while there is variation among the sports, the percentages are very consistent within a given sport when one looks at different historical periods, and also for different leagues of the same sport. For example, the NBA and WNBA have the same home winning percentage, as do MLB and Japanese professional baseball. The only exceptions are soccer, which has some variation, and NCAA basketball and football, which are higher than the corresponding professional leagues. My first instinct when seeing the college/pro discrepancy is that the college numbers are biased because dominant programs schedule a lot of patsy opponents on their non-conference home schedule. (The host school will pay the no-namers to come, so the no-name schools get money and their players get some exposure. The host school gets a virtually guaranteed win in front of their boosters.) In fact, when they adjust for strength of schedule, the college home winning percentages revert almost exactly to that of the corresponding pro leagues.

Here are some conventional explanations for home-field advantage that are refuted: (1) positive crowd energy gives the home team a boost, (2) faraway travel makes road teams weary, and (3) familiar surroundings or quirks of the home team’s field give them an advantage. A fourth explanation, schedule bias, has some effect, but accounts for only about 20% of home-court advantage in the NBA, the sport in which its effect is largest. Of course, in pro basketball, schedules are made by the league and are supposed to be balanced; good teams cannot choose to play inferior teams exclusively at home like in college. However, it turns out that the NBA and NHL like their home teams to win; it translates into happier fans who are more likely to buy extra merchandise on their way out or come back for another game. They favor home teams by scheduling road teams to play on back-to-back nights more often, giving rested home teams a chance to play against tired road teams. This manipulation is more difficult given the nature of MLB and NFL schedules, although the NFL is going in the direction of having a game every night of the week, especially in December after the college season finishes, leaving free nights on the television schedule in which deprived Americans would otherwise not be able to watch football.

The first explanation, positive crowd energy, is tested by isolating the effect of the crowd on a player’s performance as much as possible. In basketball, this means looking at free throw percentages; the shooter isn’t influenced by coaches, teammates, referees, or opponents, only (maybe) the crowd. But the data show that home and away free-throw percentages are exactly the same (76% in the NBA), despite the thunderstick-beating, noodle-waving, and “questioning of the chastity of the shooter’s sister” that routinely attempt to distract visiting players.

Finding an isolated situation in which the performance of away players is not influenced by the context of the game is more difficult in other sports, but here are a few situations that come close. In football, kickers’ and punters’ statistics are the same home and away. In hockey, home-ice advantage disappears during shootouts, although it’s still present in overtime. In baseball, pitchers’ velocity and accuracy (measured by computerized PITCHf/x data; analysts must have been salivating to get pitch-by-pitch data years before it became available) are the same home and away. (44% of pitches are in the strike zone according to the pitch-tracking software. This I found to be surprisingly low, since most in-game stats I see show 60-70% strikes, but that counts balls swung at and balls put into play as strikes. I don’t have all the data I need to make this calculation, but combining the numbers above with the umpire accuracy percentages from Chapter 1, it appears that on the order of 20-25% of all major league pitches involve batters swinging at balls outside the strike zone.)
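That back-of-envelope estimate can be made explicit. In this sketch, the 44% in-zone figure comes from the book; the overall strike rate and the called-ball allowance are round numbers I’m assuming for illustration:

```python
# Back-of-envelope estimate (assumed round numbers, not the book's exact data):
# roughly what fraction of all pitches are swings at balls outside the zone?
in_zone = 0.44          # fraction of pitches in the strike zone (from the book)
overall_strikes = 0.63  # fraction of pitches recorded as strikes in-game (assumed)
in_zone_called_balls = 0.05  # in-zone pitches taken but called balls (assumed)

# Strikes ~= in-zone pitches (minus those mistakenly called balls)
#            + swings at out-of-zone pitches.
out_of_zone_swings = overall_strikes - (in_zone - in_zone_called_balls)
print(f"{out_of_zone_swings:.0%}")  # prints 24%
```

Shuffling the assumed inputs within plausible ranges keeps the answer in the 20-25% neighborhood, which is all the estimate claims.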

Another hypothesis I’ve heard (the authors don’t discuss this one) is that people are descended from humans who passionately protected their home territory, and thus harbor an evolutionary instinct to defend one’s home turf in competition.

So, what is driving home field advantage? This question is the title of Chapter 10, which the authors spend making a thorough case that referee bias is the true cause. Officials apparently respond to group pressure and an innate psychological desire to please people (i.e. fans), or at least not get mugged on the way out. Here’s the key supporting evidence:

1. Across multiple sports, discrepancies in calls, favoring the home team, are more prevalent for more ambiguous decisions made by the official, like loose ball fouls in basketball, hooking and holding penalties in hockey, and close ball and strike decisions and stolen base calls in baseball. There are no discrepancies for non-judgment calls, like shot clock violations or delay of game for shooting the puck into the stands, suggesting that general sloppy play on the part of the visitors is not causing refs to make more calls against them.

2. Bias increases in situations that have a high impact on the outcome of the game, like in the 9th inning or 4th quarter of a tied game.

3. In parts of the game in which a referee has essentially no role, like the act of free throw shooting, penalty kicks in soccer, or shootouts in hockey, home and away success rates are virtually identical.

4. In soccer, the referee makes one decision every game that has nothing to do with any particular play on the field: deciding how much “injury time” to add on to the end of regulation time (the clock runs continuously after it starts, even during stoppages in play, but the referee is supposed to estimate the amount of time lost to injuries and other fracases, and add it back on at the end). When the home team is trailing by a goal, referees add on average twice as much injury time (4 min vs 2) as when the home team is winning.

5. Referee bias is stronger in games with higher attendance and in stadiums in which fans are closer to the field. In a natural experiment in Italian soccer, some teams were forced to play without fans due to hooligan violence, and discrepancies in fouls between home and away teams disappeared.

6. As mentioned before, home field advantage has been remarkably consistent across decades, just as the role of officials has stayed largely the same. However, in 1999, the NFL did change one important aspect of officiating – it allowed instant replay challenges. Before 1999, home teams would fumble the ball as frequently as visitors, but somehow they managed to recover those fumbles at a higher rate. After instant replay was instituted, the advantage disappeared. However, discrepancies in penalties, which cannot be reviewed, remained exactly what they were before.

I don’t think this conclusion is really damning to referees; it just shows they are as human as the rest of us. The bias is subtle enough that it’s rarely recognized by the naked eye. (A lot of partisan fans probably smell home cooking when their team is on the road, but shrug it off at home.) Instead, officials appear to be deferring to the crowd on a few ambiguous calls each game when the crowd is most vocal, which usually corresponds to the most important times in the match.

Still, in retrospect this seems so obvious that I feel I should have suspected it before. The key is that ranking sports by the strength of their home field advantage matches up almost perfectly to ranking sports by the amount of officials’ influence on the game. Basketball refs make subjective calls or non-calls on almost every play*, as players constantly bump into each other. The question of whether a foul has been committed is usually one of degree. By contrast, in baseball there are not more than a couple bang-bang plays each game. Most umpire judgments involve pitches on the edge of the strike zone.

* Pat Riley once devised a strategy that called for his players to foul on almost every play. He reasoned that the officials weren’t psychologically willing to call that many fouls, and gambled that his team could make up for a few extra fouls by gaining an advantage on plays on which a foul wasn’t called.

However, I didn’t realize the extent of referees’ influence on soccer. Soccer is the one sport I have refereed, so part of this is probably wishful thinking. (I reffed youth soccer, which was basically neutral because I got an earful from both sets of spectators.) It turns out that in addition to injury time decisions, awarding a penalty kick has a huge impact on the match. Soccer matches are naturally low scoring and the penalty conversion rate is 75%. For this reason, referees call or do not call fouls in the penalty area differently from the rest of the field; the rulebook doesn’t advise this, but the higher threshold for penalty-area fouls is widely accepted (see Chapter 1, swallowing the whistle). Referees can also unduly influence outcomes with their decisions to give out yellow and red cards. Americans usually label these reprimands as “warnings” and “ejections”, respectively, but soccer aficionados prefer the British terms “caution” and “send-off”. Bon voyage!

As noted in the third paragraph above, soccer home-field advantage varies throughout the world. Interestingly, Africa has the lowest, 60%, although Africa and Asia are grouped together and the data only go back to 2005. It is also widely believed that African referees call fewer fouls than European referees, leading to rougher matches with less referee influence, possibly explaining the lower home field advantage there.

So, in a way, it is fans who make the difference. But their influence is indirect, channeled through umpires, referees, and officials.

Book review: Scorecasting, Part 3

Thursday, July 7th, 2011

Chapter 5 – Offense wins championships, too
Is defense really more important than offense?

This too is a short chapter, but it presents a bevy of simple statistical analyses across many sports which together provide pretty strong evidence that defense is not more important than offense in the postseason, but rather equally important.

The last paragraph pretty much sums it up: “It’s not defense that wins championships. In virtually every sport, you need either a stellar offense or a stellar defense, and having both is even better. Instead of coming with the ‘defense wins championships’ cliche, a brutally honest coach might more aptly, if less inspirationally, say: ‘Defense is less sexy and no more essential than offense. But I urge it, anyway.’ ”

A few points on the statistics – their basic method is to rank teams according to their offensive and defensive ability (i.e. points scored and points allowed during the regular season), then look at head-to-head postseason contests and count up how many times the better offensive or defensive team won. There are some minor differences, which are said to be not statistically significant, but more importantly, the differences aren’t consistent across similar, independent outcomes. That suggests that the differences that do appear are really just noise. For example, better defenses do a little better in Super Bowls but better offenses do a little better when looking at all NFL postseason games. These are mostly independent outcomes; only 1/11 of all NFL postseason games are Super Bowls. An example of two non-independent outcomes is postseason wins and postseason series wins in baseball, basketball, or hockey. Since the team that wins more games always wins the series, those two outcomes are highly correlated. One would expect to get very similar results regardless of which of those two outcomes is chosen for analysis.
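The counting method itself is simple bookkeeping; here’s a sketch with a tiny hypothetical dataset (my made-up numbers, not the book’s):

```python
# Sketch of the book's counting method (hypothetical data, not real results):
# rank teams by offense (points scored) and defense (points allowed), then
# count how often the better-offense or better-defense team wins head-to-head.

# team -> (points_scored, points_allowed) from a hypothetical regular season
teams = {
    "A": (450, 300),
    "B": (420, 280),
    "C": (390, 350),
    "D": (360, 310),
}

# hypothetical postseason results as (winner, loser) pairs
games = [("A", "B"), ("B", "C"), ("A", "D"), ("B", "D")]

better_offense_wins = sum(1 for w, l in games if teams[w][0] > teams[l][0])
better_defense_wins = sum(1 for w, l in games if teams[w][1] < teams[l][1])

print(better_offense_wins, better_defense_wins)  # prints: 4 3
```

Note that defense is "better" when points allowed is lower, hence the flipped comparison; the book then asks whether either count is reliably larger than the other across many games.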

Writer and statistician Nate Silver has published some research that suggests defense is more important in baseball playoff games. It’s been a while since I read it, and it sounded convincing although it didn’t suggest a huge impact – more analogous to tweaking basic strategy in blackjack based on the count instead of “it’s a completely different game in the postseason!”

Baseball defense amounts to keeping the other team from scoring runs, which is divided into pitching and fielding. Pitching is by far the more important of the two – it’s difficult to say exactly how much – but sabermetricians have come up with some justifiable metrics, such as defense-independent pitching statistics (DIPS) and fielding measures such as fielding runs above average (FRAA). For non-pitchers, players’ offensive contribution to winning usually far exceeds their defensive contribution. Except for some very rare, very amazing defensive players (see Smith, Ozzie), great individual defense does not usually compensate for below-average offense.


Chapter 6 – The Value of a Blocked Shot
Why Dwight Howard’s 232 blocked shots are worth less than Tim Duncan’s 149

The chapter opens with an interesting side story on how Chicago professor John Huizinga became Yao Ming’s agent. It’s basically about Huizinga’s and Sandy Weil’s efforts to quantify the value of every blocked shot – some are worth more than others. This would probably be of interest to someone who watches basketball a lot more than I do.

Empty-net goals in hockey and interceptions of Hail Mary passes right before halftime are offered as more obvious examples of events that are not as valuable as most goals/interceptions but counted just the same. When Christian Laettner played for the Timberwolves, he once griped that his missed half-court buzzer-beaters should not be counted against his shot percentage. (Laettner actually seems like a pretty good dude; among other things he had some reasonable thoughts on the signing of Ricky Rubio when the media cornered him at a camp he was running a few weeks ago.)

The moral of the chapter is that it’s often easier to count things than measure value. Indeed, a lot of the backlash from sports insiders against newfangled statistical methods is due to this confusion. Whereas the new statistics use seemingly complicated formulas to measure value, the insiders point out that these formulas cannot capture everything a player does, such as a known shot blocker deterring or altering shots without ever blocking them. That retort has some validity, although the new statistics are certainly better than traditional counting statistics. RBI and Runs are highly context-dependent; other than home runs, they have more to do with the players near you in the batting order managing to get hits in the same inning you do. I turned on a ballgame last month and saw a table flashed on the screen that was something like Players Whose Teams Have Best W-L Record When The Player Scores a Run. That belongs in the Hall-of-Fame of meaningless statistics.

Again, some parallels to life outside sports are put forth. Certificates are awarded to students for perfect attendance, but attendance doesn’t measure what the student learned – good point. Investors care about how many stocks they own but not enough about their value – come again? The price of a stock, which is easily knowable, reflects its value (present value, obviously; future value is unknown). I don’t see many investors bragging about the number of distinct stocks in their portfolio. In fact, market-based prices are remarkably efficient “statistics” by which consumers can judge the value of products that often involve multiple inputs, complex production processes, and transportation halfway around the world (see I, Pencil).


Chapter 7 – Rounding First
Why .299 hitters are so much more rare (and maybe more valuable) than .300 hitters

The psychological importance of benchmark numbers is explored. Rationally speaking, a .299 hitter is worth no less to a team than a .300 hitter, just like the difference between a $10.00 shirt and a $9.99 shirt is negligible. In fact, the difference between a .300 hitter and a .275 hitter is about one hit per week. But since .300 is a benchmark number, reaching it brings media adulation and probably bigger contracts. Players therefore employ an end-game strategy in the last game of the season (assuming their team’s playoff status is settled, which is usually the case), sometimes leaving the game once the mark is reached so they don’t dip below it by making an out later. And there’s this remarkable sentence: “In the last quarter century, no player hitting .299 has ever drawn a base on balls in his final plate appearance of the season.”

Kirby Puckett’s autobiography describes such a scenario. In 1990, he signed a big contract, but was in danger of averaging below .300 for the first time in five seasons. “In the bottom of the eighth inning of the final game of the year someone figured that a hit would get me to .300. I’m up against Dave Burba. Matt Sinatro was catching, and I worked the count to 3-2. Suddenly Sinatro announces from behind his mask, ‘Now what have we here?! A guy batting .299, 3-2 count, here it comes.’ What he meant was – good luck. Get a hit and I’m at .300. I thought that was neat of Sinatro. I think the ump said something, too, like, ‘Come on, Puck.’ The ump might make a statement like that on the last day of the year with nothing but that batting mark riding on the pitch. During the ‘real’ season, never.

“I knew that Burba would throw it hard because that’s his pitch, and I knew he didn’t want to walk me. I hit a bullet to shortstop and Omar Vizquel dove for the ball and then threw me out. Unbelievable. But I wasn’t angry. I hit the ball as hard as I could, and once I hit it, it’s out of my control. I ended up batting .298.”

I’m surprised Scorecasting doesn’t mention Ted Williams. In 1941, Williams wasn’t going for .300 but for .400. If .300 signifies very good, .400 is a mark of baseball immortality. On September 10, Williams’ average peaked at .413, then started sliding downward. Before the final three-game series, he was hitting .401; he went 1-4 in the first game, putting his average at .39955 with a doubleheader remaining. The Red Sox were well behind the first-place Yankees, so his manager offered him the day off, but Williams refused, later quipping that rounding up to .400 would not have been legitimate. He played and went 6-8 in the doubleheader to finish at .406; notably, he was already at .404 at the start of the final game and played it anyway. No player has hit .400 since; Tony Gwynn was at .394 when the final third of the 1994 season was canceled by a players’ strike.
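The arithmetic checks out using the commonly cited hit and at-bat totals (179-for-448 entering the final day, 6-for-8 in the doubleheader; my figures, not anything from the book):

```python
from fractions import Fraction

# Ted Williams, 1941, final day of the season.
# Commonly cited totals: 179-for-448 entering the doubleheader, then 6-for-8.
before = Fraction(179, 448)
after = Fraction(179 + 6, 448 + 8)

print(f"{float(before):.5f}")  # prints 0.39955 (would have rounded up to .400)
print(f"{float(after):.3f}")   # prints 0.406
```

So sitting out would have left him at .39955, a "rounded" .400 – exactly the outcome Williams considered illegitimate.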


Chapter 8 – Thanks, Mr. Rooney
Why black NFL coaches are doing worse than ever – and why that’s a good thing

Research by sociologist Janice Madden is cited, which showed that during the 1990s, there were few (1-3 per season) black NFL coaches, but their teams outperformed those of white coaches, indicating that the bar was set higher for black coaches to be hired (if you were black, you had to be better than average to even get offered a head coaching job). After the league adopted the “Rooney Rule” in 2003, requiring teams to interview at least one minority candidate for coaching vacancies, the number of black coaches increased, but on average their teams did just as well as those with white coaches.

The Onion famously reported in 2007 that Lovie Smith became the first black coach to lose a Super Bowl.

(If you didn’t get the dry humor, it’s because Smith lost to Tony Dungy, whom the mainstream media lauded as the first black coach to win a Super Bowl.)

Book Review: Scorecasting, Part 2

Tuesday, July 5th, 2011

Chapter 3 – How Competitive are Competitive Sports?
Why the Pittsburgh Steelers are so successful and the Pittsburgh Pirates so unsuccessful

I don’t know why this chapter was included. It’s only four pages long; there are a few numbers thrown out, but most of it is babble and, well, wrong. I have a more in-depth essay on competitive balance in baseball here.

Naturally, they bring up the Yankees, with a full page devoted to an account of their World Series victory parade. There are some apples to wolverines* comparisons of the Yankees’ percentage of World Series titles to the market shares of leading companies in various industries, then some numbers showing that the World Series winners in 2007-2009 vastly outspent the runners-up in terms of player salaries. So what? That implies the runners-up had to beat some higher-payroll teams to get to the Series. As I wrote before, payroll probably has a moderate, positive correlation to winning. But counterexamples abound.

* I found Chuck Klosterman’s excerpt on apples to oranges so amusing and true that I no longer use the phrase.

Then it gets worse: “However, the reason for the Yankees’ extraordinary success is more complex than [payroll]. Just about everything in baseball’s structure militates against parity.” The next paragraph is about the 162 game season and 5 and 7 game playoff series being large sample sizes (well, larger than football), meaning “the better team will win the majority of the time.”

Where to begin? Parity is built into baseball because a high luck factor is intrinsic to the game itself. The worst team in baseball beats the best team in a single game about 30% of the time. Even in 5 or 7 game series between playoff-caliber teams, the team with the worse regular season record wins on the order of 1/3 of the time or more. The 162 game season is meant to even out the luck, allow the best teams into the playoffs, and keep the mediocre ones out. It did that until baseball let wild cards in, but p. 61 has a lament that “only” 8 teams make the playoffs. Also, the 162 game season is reflective of the physical nature of baseball that allows athletes to play almost every day for half a calendar year. It’s not in place to reduce “parity”.
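That 1/3 figure is easy to sanity-check with a simple binomial model (a sketch assuming independent games at a fixed per-game win probability, which is of course an oversimplification):

```python
from math import comb

def series_win_prob(p, best_of=7):
    """Probability that a team with per-game win probability p wins a
    best-of-N series, assuming games are independent."""
    need = (best_of + 1) // 2  # wins required to clinch (4 in a best-of-7)
    # Sum over k = number of games the eventual winner loses before clinching.
    return sum(comb(need - 1 + k, k) * p**need * (1 - p)**k
               for k in range(need))

# Even a team that wins 55% of individual games (a big edge in baseball)
# drops a best-of-7 series almost 40% of the time.
print(round(series_win_prob(0.55), 3))             # prints 0.608
print(round(series_win_prob(0.55, best_of=5), 3))  # prints 0.593
```

In other words, a short series between two playoff-caliber teams is close to a coin flip, consistent with the worse team winning on the order of a third of the time.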

This is contrasted to the NFL, which supposedly has high parity because with a 16 game schedule and one-game playoff “series”, one little choke can ruin a team’s prospect of a championship. No, no, no! Football is scheduled the way it is because of the extreme physical demands of the sport, not to achieve parity. In a single football game, the better team wins a much higher percentage of the time than in baseball; this works against parity. Parity is not achieved by the schedule but by the NFL’s salary cap and television revenue sharing structure.

Then we return to the subtitle, supposedly now knowing why the small-market Pittsburgh Steelers have been successful and the Pittsburgh Pirates unsuccessful. Again, so what? The Pirates, way back in the early 90s, were successful before they were unsuccessful. Their city didn’t change. St. Louis, Atlanta, and Minnesota are examples of small or mid-market baseball cities with extended periods of winning in the last 20 years. The New York Mets, Los Angeles Dodgers, and Chicago Cubs are large market teams that have continued losing.

“Trying to predict who will win the next Super Bowl is a fool’s errand, but trying to predict who will win the next World Series is far easier.” Really? Are you guys saying that before the 2010 season you were more confident of the San Francisco Giants and Texas Rangers meeting in the World Series than of Green Bay and Pittsburgh meeting in the Super Bowl? I would have had the Giants and Rangers no higher than 5th-best in their respective leagues at the start of the season, while Pittsburgh had consistently been one of the top three teams in the AFC (with Indy and NE) for years, and Green Bay was high on everyone’s list from what I could tell.

As a concluding thought, I note that the entire premise of parity might be a false desire. Complete parity would turn the league into a champion-of-the-month club. Every game would be a crapshoot, and the league would be as boring as it would be if it were in a state of low competitive balance.


Chapter 4 – Tiger Woods is human (and not for the reason you think)
How Tiger Woods is just like the rest of us, even when it comes to playing golf

Taken literally, the title is not surprising (I am still in a critical mood from the last chapter).

Let’s see what it says.

The book gets back on track starting with some messianic predictions from Tiger’s father, Earl: “Tiger will do more than any man in history to change the course of humanity.” Then the requisite comparison to Jack Nicklaus – Woods has won roughly 30% of all tournaments he has entered, while Nicklaus won 12%.

The thing about Nicklaus is that he was ridiculously good in majors. Tiger is right on pace in terms of wins, but Jack is still way ahead if we look at top 3 finishes:

Jack Nicklaus – 18 major championship wins, 19 2nds, 9 3rds
Tiger Woods – 14 major championship wins, 6 2nds, 1 3rd

If consecutive, Nicklaus’s top 3s would have stretched for 11 1/2 golf seasons! Actually, all but 4 occurred in a 20-year stretch, 1962-1981.

But back to the point of the chapter, which pertains to all golfers, not just Woods. An analysis of over 2 million PGA tour putts revealed that putts for par are made at a higher rate than putts for birdie from the same distance. The authors cite the loss aversion principle as the reason; players’ aversion to making a bogey is stronger than their desire to make a birdie. In theory, which stroke the putt is for shouldn’t make a difference in how the player approaches it; economists would call the number of previous strokes on the hole a “sunk cost”. Interestingly, the research shows that players are more successful on par putts because they are more aggressive (they leave more birdie putts short). As a statistician, I would love to see the distribution of scores for some of these putt ranges – are players making more par putts but also blowing more past the hole, and thus making more three-putts in those situations?

The loss aversion principle is common in stock investing; investors don’t like to sell a stock that has lost value since they bought it, even if a better investment is available. However, this analogy isn’t perfect, since loss aversion would appear to deliver suboptimal outcomes in investing, but the change in strategy to avoid a short-term loss (bogey) in golf appears to benefit players.

The principle is then applied to baseball, to look at the outcomes of 3-2 counts that were once 3-0 compared to those that were once 0-2. The loss aversion theory predicts that pitchers who get ahead 0-2 are already chalking up an out (makes sense; I think batting averages after 0-2 counts are under .150), but if the count runs to 3-2, they change strategy to avoid the “loss” of an out. The authors state that pitchers in that situation also play more aggressively by throwing more off-speed pitches and fewer fastballs (51.5%), compared with 55.4% fastballs in a 3-2 count that started 3-0. I’m not sure that off-speed pitches are unequivocally considered more “aggressive”; I think a pitcher who starts 3-0, gets two strikes, and throws a ‘here-it-is, hit-it’ fastball could also be considered aggressive. The authors appear to be defining aggressive as throwing a pitch that is harder to hit but also harder to locate. In any case, the pitchers trying to avoid a loss get better results (opponents’ batting average of .220 vs. .231 in the opposite scenario; these differences aren’t big, but I assume the sample size is huge). They say that other offensive metrics are consistent with the difference in BA, and give the numbers for slugging, but not for OBP, which would be the most logical metric to provide for a 3-2 count.

There’s definitely something (subtle) going on there, but I don’t know if I would chalk it up to loss aversion just yet. I think a lot has to do with the pitch sequence. For example, a pitcher who gets ahead 0-2 often “wastes” a pitch or two in order to set the hitter up for an out pitch later, or nibbles around the corners for a few pitches. The umpire bias brought up in Chapter 1 probably comes into play here too. The difference in outcomes appears legitimate, but I’m not positive they have the explanation nailed down.

On p. 77, there’s a reference to the work of Richard Thaler in developing the economic loss aversion theory and the related endowment effect. The Duke basketball student ticket lottery is given as an example. I swear I’ve read about the exact same study before, maybe in Superfreakonomics? Thaler et al.’s theory is legitimate for economics, but I’m not sure it’s being correctly applied to sports. Another example of supposed evidence: in the NBA, teams that have a big lead and then lose it play more aggressively. An alternative explanation is that it takes a lot of energy for a losing team to get back in the game in basketball and hockey. Teams that lose a lead late only need one more push to put away a tired opponent, and perhaps were taking it easy themselves while the comeback was occurring, which could explain why so many comebacks are left unfinished. And again, what’s with “aggressiveness” being posited as the mechanism through which loss aversion is attempted? Are investors or real estate owners who hang onto a property that has lost value being aggressive? Other than the golf study, the sporting examples appear inconclusive at best.


Book Review: Scorecasting, Part 1

Sunday, July 3rd, 2011

Scorecasting combines two of my favorite subjects – sports and science/statistics.  In the spirit of Freakonomics and Moneyball, the authors apply scientific methods to analyze and sometimes overturn conventional wisdom in sports.  The authors are Tobias Moskowitz, a behavioral economist at the University of Chicago, and L. Jon Wertheim, a writer for Sports Illustrated.  One thing I really like is that they generalize their theories to realms outside sports, taking advantage of readers’ familiarity with sports to introduce them to similar principles in other fields.

Here are a few of the chapter titles and subtitles: why fans and leagues want officials to miss calls, why coaches make decisions that reduce their team’s chances of winning, offense wins championships too, what is driving home field advantage, there’s no I in Team, but there is an m and an e, why Dominican baseball players are more likely to use steroids – and American players are more likely to smoke weed, and are the Chicago Cubs cursed?


Chapter 1 – Whistle Swallowing
Why fans and leagues want officials to miss calls

This mostly focuses on “letting players decide the game” at the end of close contests.  The chapter contrasts two such examples, that of referee Mike Carey in the 2008 Super Bowl (I find the Roman numbering of Super Bowls annoying and refuse to reference them that way from now on.  I cannot recall the significance of Super Bowl XVIIICVLXSLUURRRP.) and lineswoman Shino Tsurubuchi in the 2009 U.S. Open tennis tournament.  Carey was widely praised for a non-call near the end of the New York Giants’ upset of previously unbeaten New England; he did not call ‘in-the-grasp’, which would have turned David Tyree’s game-changing ‘velcro catch’ play into a sack.  Tsurubuchi on the other hand called a foot-fault on Serena Williams on the next-to-last point of a U.S. Open semifinal; Williams then threw a tantrum and was docked a point.  That point happened to be match point so the match was over.  Inexplicably, fans and commentators (including, not surprisingly, John McEnroe) blasted the lineswoman for making the call instead of Williams for her tirade.  That the call was both objective and correct did not seem to matter to critics; although foot-faults are objective calls, they are nevertheless not always called.

As the subtitle suggests, the implication is that fans don’t necessarily want a game to be officiated ‘by the book’ or even consistently, and officials seem to respond near the end of games by calling fewer penalties or fouls in all sports.  Cleverly, the authors refute the notion that players could simply be committing fewer fouls at the end of tight games by showing that ‘non-judgment’ calls, like shot clock violations or too-many-men-on-the-ice penalties, do not decrease.  There are few non-judgment calls in baseball, but even there umpires have some discretion when calling balls and strikes.  It turns out that umpires don’t like to end an at-bat themselves by calling a third strike or a fourth ball; they would rather the hitter settle it by putting the ball in play or swinging.  Comparing umpire calls to computerized pitch tracking, the authors found that if a pitch is in the strike zone (as determined by the computer), umpires will correctly call a strike 93% of the time on 3-0 counts.  But if the count is 0-2, they correctly call the strike only 58% of the time.  (Wow!)

The authors go on to make a distinction between sins of commission (wrong calls) and sins of omission (wrong non-calls), and conclude that the reaction is far worse for the former – officials will take more flak for making a call they shouldn’t have than for not making a call they should have.  Because this tendency intensifies at the end of close games, serious bias can result, usually favoring the team that is behind.  The team trying to make a comeback usually plays more aggressively, and can get away with it if officials swallow their whistles.  One of the more memorable examples was in the 2005 NCAA men’s basketball tournament, when Illinois beat Arizona in overtime to advance to the Final Four after making up a 15-point deficit in the last four minutes of regulation, resulting in a number of oh-my’s from announcer Dick Enberg*.  The comeback was aided by a series of steals and turnovers, on most of which Illinois could have been called for a foul.  That the game was in Chicago could have exacerbated the situation, but there will be more on that in a later chapter.

Business managers, stockbrokers, and parents who decide not to vaccinate their children are listed as examples of people who are susceptible to sins of commission bias.  In these cases, the severe consequences of inaction are not given as much weight as possible risks that result from actually doing something.  I’ll throw in my own example: the FDA.

There are two possible errors an FDA commissioner can make.  One is to approve a drug or treatment too quickly based on incomplete evidence: patients die, suffer serious side effects, or waste resources on something that doesn’t make them better.  This is a sin of commission.  The other error is to delay or reject a treatment that really would be beneficial: patients are worse off than they would have been if given access to the treatment.  This is a sin of omission.  The difference in reaction between the two is striking.  The first case often involves headlines, recalls, and heads rolling.  The second often involves…nothing, or maybe a few relatively unnoticed criticisms.  Patients suffer silently, but blame their suffering on their disease, not on a failed treatment.  The commissioner keeps his or her job.  Because of this asymmetry, I am almost certain that the FDA behaves too conservatively, approving fewer treatments than would be optimal – not to mention the many innovations that are never attempted because of the maze of regulations and restrictions in the approval process.  My proposed remedy is something like the European system, in which any one of a number of private agencies can certify new treatments (think Underwriters Laboratories).  Some of these agencies might use strict criteria and others looser criteria.  Doctors and patients could then choose a treatment based on which stamps-of-approval it has, according to their own level of risk tolerance.


Chapter 2 – Go For It
Why coaches make decisions that reduce their team’s chances of winning

This chapter introduces Kevin Kelley, an Arkansas high school football coach.  I’ve read several articles on him; he’s the one whose teams never punt (even if pinned inside their own 20) and do onside kickoffs when they are winning.  Kelley has calculated probabilities of scoring based on field position, and concluded that having possession of the ball is much more important than 20-30 yards of extra field position.  His strategy is doubly effective because it is original, and forces opposing teams to spend practice time on things they are not used to, like onside kicks and trick plays.

Other quantitative experts have confirmed the superiority of Kelley’s strategy and calculated the optimal scenarios in which to punt or go for it on 4th down, but NFL coaches haven’t adopted anything close to the optimal strategy.  The reason put forth is that they fear being ridiculed if they call for something unconventional and it doesn’t work out, even if it was the right call statistically speaking.  In short, coaches are more interested in job security than in winning when the two are seemingly at odds with each other.

A few other things that caught my attention:

  • p. 45 – why do baseball managers insist upon using their closer in the 9th inning only?  I have always maintained that your best reliever should be summoned in a 7th-inning tie with runners on base, instead of starting the 9th inning when your team has a 2-run lead and the opponents have nobody on base.  “Saves” are a virtually meaningless statistic.  Baseball Prospectus’s Baseball Between the Numbers has a full chapter on this.
  • p. 43 – criticism of basketball coaches’ decisions to sit a star player with foul trouble.  While I agree with this sentiment (one of the sentences notes that a player might foul out of the game if he plays, but if the coach sits him on the bench, he guarantees that he’s out of the game), I am, however, confused by their justification in terms of the plus-minus metric.  Plus-minus in basketball sounds the same as plus-minus in hockey: a player gets a ‘plus 1’ if his team scores a point while he is on the floor, and a ‘minus 1’ if the opponent scores.  However, they claim that for an average NBA player, plus-minus is almost two points lower in the fourth quarter than in the first quarter.  I do not understand how this is possible for the “average” player, since plus-minus has to balance to zero among all players (unlike hockey, where it doesn’t completely balance because of shorthanded and extra-attacker goals).  I could see it for “star” players, since the stars usually play against each other in the fourth quarter, meaning better competition.  A star could have a higher plus-minus in earlier quarters when he’s playing and the other team’s star is resting.
  • p. 46 – pulling the goalie.  I am on the fence on this one, since I have never really liked pulling the goalie, but I’ll submit to empirical evidence if it’s there.  It’s exciting when it works, but too many times I’ve seen a team give up an empty-net goal only to subsequently score with little time remaining and lose by one.  From the text: “We found that NHL teams pull their goalies too late (on average with only 1:08 left in the game when down by one goal and with 1:30 left when down by two goals).  By our calculations, pulling the goalie one minute or even two minutes earlier would increase the chances of tying the game from 11.6 percent to 17.6 percent.”  I don’t see a referenced paper for this topic, but I would guess they are coming up with these percentages by simulating games based on the rate of extra-attacker goals, the rate of empty-net goals, and the rate of even-strength goals.  If true, that’s a valid start, but it overlooks line changes, which are a dicey proposition when your net is empty.  Most NHL shifts last 30-40 seconds, although stars can be extended to 60-70 seconds at the end of a game, and that’s exactly when teams pull their goalie on average.  If you have a two-goal deficit, you can pull earlier and hope to get a quick goal, which gives you a chance to briefly rest players or change personnel.  If you get down to 20 seconds without a goal or whistle, you’re screwed anyway.  I stand by my position that the goalie should not be pulled until the final minute.
  • Another thought on pulling-the-goalie – a few years ago longtime Boston University coach Jack Parker pulled his goalie mid-game when his team had a two-man advantage, creating a 6-on-3, reasoning that the chance of the opponent scoring was negligible.  (He seems to have gotten the idea from former NHLer Uwe Krupp.)  I like that move.
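My guess about their method can be sketched as a toy Monte Carlo.  Everything below is my own construction: the per-second scoring rates are round-number assumptions, not figures from the book, so only the qualitative pattern (earlier pulls raise the chance of tying) should be trusted:

```python
import random

# Illustrative per-second scoring rates. Round-number assumptions,
# NOT figures from the book.
EVEN_STRENGTH = 2.5 / 3600   # ~2.5 goals per 60 minutes per team at 5-on-5
EXTRA_ATTACKER = 6.0 / 3600  # trailing team's rate with the goalie pulled
EMPTY_NET = 12.0 / 3600      # leading team's rate shooting at an empty net

def prob_tie(pull_at, horizon=120, trials=10_000):
    """Estimate P(trailing team ties) when it trails by one goal with
    `horizon` seconds left and pulls its goalie at `pull_at` seconds."""
    ties = 0
    for _ in range(trials):
        deficit = 1
        for t in range(horizon, 0, -1):   # t = seconds remaining
            pulled = t <= pull_at
            us = EXTRA_ATTACKER if pulled else EVEN_STRENGTH
            them = EMPTY_NET if pulled else EVEN_STRENGTH
            if random.random() < us:
                deficit -= 1
                if deficit == 0:          # tied it up; stop counting here
                    ties += 1
                    break
            if random.random() < them:
                deficit += 1              # empty-net goal against
    return ties / trials

for pull_at in (60, 90, 120):
    print(f"pull with {pull_at:3d}s left: P(tie) = {prob_tie(pull_at):.3f}")
```

A real analysis would layer in the line-change problem I mentioned, score effects, and faceoff locations, but even this crude version shows the trade-off the book is describing: more time with the extra attacker buys more chances to tie, at the cost of more empty-net goals against.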

Back with more later.


* For the Warrior hockey players who likely make up about 2/3 of this blog’s readership, you would not have seen the Illinois-Arizona game live because it coincided with one of the practices right before the Cottage Grove tournament.  Except for Morlok, who did not play in the tournament because he was attending the U of Illinois.  Also, I like Dick Enberg and the comment about him was meant to be humorous; just wanted to make that clear since commentators are often a subject of derision on these pages.


Saturday, July 2nd, 2011

Yesterday, a day before the Wimbledon final, commentator Mary Carillo, in an interview with finalist Maria Sharapova: “You have much more experience than she does.  How is that going to be a factor?  It has to be to your advantage, right?”

2004 Wimbledon Final
Serena Williams, age 22, 7 previous Grand Slam finals, 6 previous Grand Slam championships
Maria Sharapova, age 17, 0 previous Grand Slam finals
Sharapova wins 6-1, 6-4

2011 Wimbledon Final
Maria Sharapova, age 24, 4 previous Grand Slam finals, 3 previous Grand Slam championships
Petra Kvitova, age 21, 0 previous Grand Slam finals
Kvitova wins 6-3, 6-4

(I did not cherry pick these; Sharapova has been in exactly two Wimbledon finals)

I think the postseason/championship experience factor is vastly overplayed.  Part of the reason is that in most sports, an athlete’s prime seasons come 3-6 years after the age at which most players break into the major league ranks.  A player at the lower end of that range is just as good right now as someone at the upper end, but hasn’t played long enough to accrue championships.  Of course, we don’t know which 3-year veterans will go on to be perennial all-stars and which will merely have serviceable professional careers.

Experience probably has some effect in the sense that veteran players can make subtle adjustments in specific situations because they’ve seen them a thousand times before in their professional careers.  But most commentators equate experience to “handling pressure situations,” which is questionable.

Thwarted by Turtles

Friday, July 1st, 2011

This is the opening of a NYT article on NHL free-agency:

Confusion mounted across the N.H.L. on the eve of Friday’s free-agency frenzy. Among the complicating factors were turtles trundling across an airport runway and the legalization of gay marriage.

The turtles came into play Wednesday night, as about 150 crawled across the runway at Kennedy International Airport in New York, delaying the flight believed to be carrying Jaromir Jagr from the Czech Republic to North America.

But it was not clear Thursday whether he had actually arrived in New York. One report placed him in London watching tennis at Wimbledon.

Jagr’s agent, Petr Svoboda, said Thursday that “Jaromir is definitely in the States” but that he did not know where. Asked about the Wimbledon report, Svoboda said, “He was there, but he’s not anymore.”