Errors, blunders & lies

David S. Salsburg, author of “The Lady Tasting Tea”*, which I enjoyed greatly, hits the spot again with his new book, Errors, Blunders & Lies: How to Tell the Difference. It’s all about a fundamental statistical equation: Observation = model + error. The errors, of course, are normal and must be expected. But blunders and lies cannot be tolerated.

The section on errors concludes with my favorite chapter: “Regression and Big Data”. There Salsburg endorses my favorite way to avoid over-fitting of happenstance results: hold back at random 10 percent of the data and see how well these outcomes are predicted by the 90 percent you regress.** Whenever I tried this on manufacturing data, it became very clear that our high-powered statistical models worked very well for predicting what happened last month. 😉 They were worthless for seeing into the future.
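
If you’d like to try this hold-back check yourself, here is a minimal sketch assuming nothing beyond NumPy; the data are simulated for illustration, not taken from any real process:

```python
# Hold back a random 10% of the data, regress on the remaining 90%, then see
# how well the held-out points are predicted. Simulated straight-line data.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + rng.normal(0, 2, 200)   # true signal plus normal error

idx = rng.permutation(len(x))
hold = idx[:len(x) // 10]             # random 10% held back for validation
train = idx[len(x) // 10:]            # the 90% you regress

slope, intercept = np.polyfit(x[train], y[train], 1)
pred = slope * x[hold] + intercept
rmse = np.sqrt(np.mean((y[hold] - pred) ** 2))
print(f"Hold-out RMSE: {rmse:.2f}")   # a blow-up here flags over-fitting
```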

Another personal favorite is the bit on the spurious correlations that Italian statistician Carlo Bonferroni*** guarded against, also known as “will-o’-the-wisps” per Francis Anscombe, the founder of Yale’s statistics school.

If you are looking for statistical insights that come without all the dreary mathematical details, this book on “Errors, Blunders & Lies” will be just the ticket. Salsburg concludes with a timely heads-up on the statistical lies caused by “curbstoning” (reported here by the New York Post), which may soon combine with gerrymandering (see my previous post) to create a perfect storm of data tampering in the upcoming census. We’d all do well to sharpen up our savvy on stats!

The old saying is that “figures will not lie,” but a new saying is “liars will figure.” It is our duty, as practical statisticians, to prevent the liar from figuring; in other words, to prevent him from perverting the truth, in the interest of some theory he wishes to establish.

– Carroll D. Wright, U.S. government statistician, speaking to the 1889 Convention of Commissioners of Bureaus of Statistics of Labor.

*Based on the story told here.

**An idea attributed to the inventor of modern-day statistics, R. A. Fisher, and endorsed by famed mathematician John Tukey, who suggested the hold-back be 10 percent.

***See my blog on Bonferroni of Bergamo.


Gerrymanderers may soon be sent packing for doing too much cracking

Wisconsin Governor Scott Walker and his cohort of Republicans might have gone too far in redrawing their state’s political boundaries to their advantage. Last November, a federal district court declared these maneuvers, called gerrymandering,* unconstitutional. However, as discussed in this Chicago Tribune article, the Supreme Court might consider overturning the ruling, these gerrymanders being partisan rather than racially discriminatory.

One of the most infamous of all gerrymandered districts, 1992’s 12th Congressional District in North Carolina, is pictured here. It became known as the “I-85 district” because for long stretches it was no wider than the freeway connecting the desired populations of voters.

North Carolina’s 12th was a kind of in vitro offspring of an unromantic union: Father was the 1980s/1990s judicial and administrative decisions under the Voting Rights Act, and Mother was the partisan and personal politics that have traditionally been at redistricting’s core. The laboratory that made this birth possible was the computer technology that became available for the 1990s redistricting cycle. The progeny won no Beautiful Baby contests.

– North Carolina Redistricting Cases: The 1990s, posted at the Minnesota Legislature website

You may wonder, as I did, how gerrymandering works. The latest issue of Nature explains it with their graphic on “packing and cracking” here. Also, see the figures on measuring compactness. Mathematicians approach this in various ways, e.g., comparing the area of the district with that of the smallest convex polygon that surrounds it (called the convex hull). Quantifying the fairness of boundaries creates a great deal of contention, with the measure chosen for the greatest advantage of whoever is wielding the figures.
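
To make the convex-hull comparison concrete, here is a minimal sketch assuming the Python shapely package; the coordinates are invented for illustration:

```python
# Compactness as the ratio of a district's area to the area of its convex
# hull: 1.0 for a convex shape, sinking toward 0 as the shape contorts.
from shapely.geometry import Polygon

def convex_hull_ratio(coords):
    """District area divided by the area of its convex hull."""
    district = Polygon(coords)
    return district.area / district.convex_hull.area

square = [(0, 0), (1, 0), (1, 1), (0, 1)]                      # compact
sliver = [(0, 0), (4, 0), (4, 0.2), (1, 0.2), (1, 1), (0, 1)]  # contorted
print(convex_hull_ratio(square))  # 1.0
print(convex_hull_ratio(sliver))  # about 0.57, far less compact
```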

Partisan gerrymandering, if not outlawed, will be catalyzed by the 2020 census. Keep an eye on this.

*A word coined in 1812 when Massachusetts Governor Elbridge Gerry redrew a district north of Boston into the shape of a salamander.


One-factor-at-a-time (OFAT) food experiments not very nourishing

Knowing of my interest in experiment design, my son-in-law (a newly minted PhD chemist) showed me his book on Cooking for Geeks. It offers a lot of fun detail on chemistry for a fellow like him. As a chemical engineer by profession, I like that too. Furthermore, I am all for the author’s enthusiasm for experimentation. However, his methodology, quoted below, lacks any sophistication or statistical power.

Make a recipe twice, changing just one thing (cookies: melt the butter or not?), and see what changes (if anything). If you’re not sure which way to do something, try both and see what happens. You’re guaranteed to learn something—possibly something the recipe writer didn’t even understand.

– Jeff Potter, author of Cooking for Geeks

Potter is not alone in remaining mired in OFAT and sample sizes of one (n=1). This is also the methodology of the prestigious Cook’s Illustrated, as seen in this experiment on roasting ribs. Chris Kimball, who launched this magazine and, until recently, hosted “America’s Test Kitchen” on PBS, contacted me soon after Forbes recommended Stat-Ease software for multivariable testing (MVT) in March of 1996 (“The New Mantra: MVT”). I gave him a briefing on multifactor (as I prefer to deem it) design of experiments. However, Chris told me that his cooks were artists, not scientists, and they would not take to anything other than n=1 OFAT. That works only when you make gross changes, such as roasting at 250 versus 450 degrees F. Even then, I’d like to see at least 4 runs of each level done in a randomized plan, and, better yet, a multifactor experiment.
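
For contrast, here is a minimal sketch of a two-level multifactor plan for three hypothetical cookie factors; the factors and levels are my own inventions, not Potter’s or Kimball’s:

```python
# A 2^3 full factorial design: every combination of three two-level factors,
# run in random order. Unlike OFAT, it can expose interactions, such as
# melted butter mattering only at the higher oven temperature.
from itertools import product
import random

factors = {
    "butter": ["solid", "melted"],
    "oven_F": [350, 425],
    "chill_dough": ["no", "yes"],
}

runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
random.shuffle(runs)  # randomize run order to guard against time trends

for i, run in enumerate(runs, start=1):
    print(f"Run {i}: {run}")
```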

The one nice thing about these poorly executed food experiments is that you can re-do them yourself. I might take on the question of roasting ribs, for example. Yum!


Statistics to make distracted drivers more aware this month

April is now Mathematics and Statistics Awareness Month (formerly it was just math, no stats). It is also Distracted Driving Awareness Month.

Putting these two themes together brings us to data published this month by Zendrive, a San Francisco-based startup that uses smartphone sensors to measure drivers’ behavior. They claim that 90% of collisions are due to human error, of which 1 in 4 stem from phone use while driving.

These statistics are very worrying to start off with.  But, according to this blog, it gets far worse when you drill down on Zendrive’s 3-month analysis of 3 million anonymous drivers, who made 570 million trips and covered 5.6 billion miles:

  • Drivers used their phones on 88 percent of the trips
  • They spent 3.5 minutes per hour on calls (an enormous amount of time considering that even a few seconds of distraction can create dire consequences)

About a third of US states prohibit use of hand-held phones while driving. Does this reduce distraction? The stats posted by Zendrive are not definitive.

It seems to me that hands-free must be far safer. However, this ranking of driving distractions* (benchmarked against plain driving, which rates 1) does not provide much support for what is seemingly obvious:

  1. Listening to the radio — 1.21
  2. Listening to a book on tape — 1.75
  3. Talking on a hands-free cellphone — 2.27
  4. Talking with a passenger in the front seat — 2.33
  5. Talking on a hand-held cellphone — 2.45
  6. Interacting with a speech recognition e-mail or text system — 3.06

For all the fuss about talking on the phone, whether hands-free or not, it does not cause any more distraction than chatting with a passenger.

This list does not include texting, which Consumer Reports figures is 23 times more distracting than talking on your cell phone while driving.**

Please avoid any distractions when you drive, especially texting.

*Source: This 10/16/15 Boston Globe OpEd

**Posted here


Slackers rule by a nerd’s law

The of and to a “in” is that it, for you, was with “on”.  Profound?  No.  These are the top 14 most commonly used words according to this Vsauce video by Michael Stevens.  He goes on to reveal a “bizarre” pattern in which the second word (“of”) appears one-half as often as the first, the third (“and”) one-third as frequently, and so on; that is, each word’s frequency is proportional to one over its rank.  This phenomenon is known as Zipf’s law, after George Kingsley Zipf, author of Human Behavior and the Principle of Least Effort, published in 1949.
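
As a quick check on the arithmetic, here is a minimal sketch of what Zipf’s law predicts, taking Stevens’s 6% figure for “the” as the starting point:

```python
# Zipf's law: the share of the rank-r word is roughly (top word's share)/r.
top_share = 0.06  # "the", per Stevens
words = "the of and to a in is that it for you was with on".split()
for rank, word in enumerate(words, start=1):
    print(f"{rank:>2}  {word:<5} predicted share = {top_share / rank:.2%}")
```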

“The” leads the list, accounting for 6% of word use by Stevens’s reckoning.  Another study of 743 billion words found in Google Books, by their director of research, came up with “the” occurring 7.14 percent of the time.  See this Abacaba video for entertaining and informative bubble charts on word frequencies by use, length and gender.

By the way, I learned a new term from Stevens: “hapax legomenon”—a word that only appears once in a book, that is, at the extreme end of the frequency chart ruled by Zipf’s law.  I am now on the lookout for these rarities so I can stop a casual conversation in its tracks by announcing my discovery of a hapax legomenon. ; )

Zipf’s law does not just apply to words; for example, this mysterious rule also governs the size of cities, as explained by this post on Gizmodo.

The driving force for this regularity in frequency distributions is the tendency for people to put in as little effort as they can, that is, slacking off for the most part.

That is it.

*For bringing this to my attention, I credit Nathaniel Chapman, an undergraduate researcher going for a Master’s degree in chemical engineering at the South Dakota School of Mines and Technology.


National Beer Day–A fine time for fun facts and paying homage to a wickedly smart brewer from Guinness

Yesterday marked the anniversary of the end of American prohibition of beer in 1933, albeit only for beer up to 3.2% alcohol by weight. This date every year in the USA has become a day to endorse President Roosevelt’s observation at the time that “I think this would be a good time for a beer.”

It’s also a great time to pay homage to master brewer William S. Gosset of Guinness, the “Student” of Student’s t-test, a method for extracting the essence of discovery from small samples of data, such as those he generated from his experiments on dry stout. For the whole story, see this wonderful writeup by Priceonomics on The Guinness Brewer Who Revolutionized Statistics.
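
For a taste of what Gosset’s method does, here is a minimal sketch of a two-sample t-test assuming SciPy; the brewing measurements are hypothetical, not Gosset’s actual data:

```python
# Student's t-test on two small samples: is the difference in means real, or
# within what normal error would produce? Hypothetical extract yields.
from scipy import stats

variety_a = [61.2, 63.1, 59.8, 62.5, 60.9]
variety_b = [64.0, 65.3, 63.2, 66.1, 64.8]

t_stat, p_value = stats.ttest_ind(variety_a, variety_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # small p suggests a real difference
```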

“He possessed a wickedly fertile imagination and more energy and focus than a St. Bernard in a snowstorm. An obsessive observer, counter, cyclist, and cricket nut, the self-styled brewer had a sizzle for invention, experiment, and the great outdoors.”

– Stephen Ziliak

Glory to Gosset, a brilliant boffin of beer! Beyond recognizing him, here are other fun facts and figures that I gleaned from the International Business Times’ post yesterday on National Beer Day:

  • If an empty beer glass makes you fearful, you suffer from Cenosillicaphobia. Say that after having a few.
  • Women “brewsters” pioneered beer making 5,000 years ago. Let’s tip our caps to these wonderful ladies.
  • Guinness estimated that at one time about 93,000 liters of beer were lost in the beards of Englishmen every year. Gross! Along those lines, a brewmaster in Oregon developed a brew using yeast collected from his own beard. Yuk!
  • In ancient Babylon if a person brewed a bad batch, he was drowned. Come on, lighten up!

Cheers for beers!


If you finish reading this headline, your attention span beats a goldfish

Jo Craven McGinty, in her column The Numbers in today’s Wall Street Journal, debunks this report by Statistic Brain that our attention span has eroded to below that of a goldfish, presumably due to so many distractions nowadays.

My feeling is that the average person truly can concentrate on one thing for only 8 seconds. Where Statistic Brain goes wrong is by overestimating the attention span of a goldfish. I put my pet Pancho (pictured) to the test with a very attractive lure. He came nowhere near 9 seconds of focus, despite me yelling “pay attention!” repeatedly. In fact, he never stopped long enough for me to get a good photo; notice how it’s out of focus.

OK, hold on, I’m getting a text message…


Reject love at first sight until you achieve sufficient sample size

Ok, this headline is a bit misleading. It’s not how quickly you fall in love that’s the problem, according to statisticians, it’s falling for the first potential mate that comes along. In other words, they calculate that only fools rush in. ; )

The optimal process for finding the love of your life is this:

  1. Estimate the number (“n”) of people you will date in your life.
  2. Take the square root (√) of n. This is your minimum (“m”).
  3. Keep records on the first m people you date and rank them by attraction; the top rating among them is your benchmark (“b”). (I advise a 1-9 scale, the odd number allowing for those who are so-so, them being rated a 5.) Dump every one of them.  (Statisticians have no heart when it comes to algorithms like this.)
  4. After you dump m dates, settle down with the first one who exceeds b (see the simulation sketch below). Ideally they will rate 10. (Yes, I know this goes above the scale, but that is true love.)
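
Here is a minimal simulation of the square-root rule, under the assumption that dates arrive in random order with unique attraction scores; all numbers are illustrative:

```python
# Reject the first sqrt(n) candidates, remember the best of them as the
# benchmark, then settle for the first later candidate who beats it.
import math
import random

def sqrt_rule(n, rng):
    """Return the rank (1 = best of all n) of the partner the rule selects."""
    scores = rng.sample(range(n), n)  # hidden attraction scores, higher = better
    m = round(math.sqrt(n))           # the benchmark group, all dumped
    benchmark = max(scores[:m])
    for s in scores[m:]:
        if s > benchmark:
            return n - s              # rank 1 means the overall best
    return n - scores[-1]             # no one beat the benchmark; settle for the last

rng = random.Random(1)
ranks = [sqrt_rule(100, rng) for _ in range(10_000)]
print(f"Median rank of chosen partner: {sorted(ranks)[len(ranks) // 2]}")
print(f"Picked the very best: {sum(r == 1 for r in ranks) / len(ranks):.0%}")
```

Swapping m for round(n / math.e) turns this into the classic 37% rule discussed below.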

I credit the Wall Street Journal last Friday (Feb. 10)* for alerting me to this recipe for finding a soul mate. However, this 2014 article by Slate breaks it down a bit better, IMO. They report that out of a choice of 10 people (n), the √n method (dictating you dump the first 3-4 potential partners) will get you someone that’s three-quarters (75%) perfect. Not good enough? Then go for 100 candidates (ditching the initial 10 suitors) and increase your score to around 90 percent.

Still not satisfied? Revert to the original benchmark of 37% rejection (the reciprocal of Euler’s number e, the base of the natural logarithm) based on the first calculations for the marriage problem, which came out in 1960. However, I suggest you make it easier on yourself (and those who desire you but have no shot) by opening up your search sooner via the square-root rule. Just keep reminding yourself, after settling down, that it could have been a lot worse if you had been a fool rushing in on your first love.

“If you end up marrying the second best person, life is probably not going to be rotten.”

– Neil Bearden, Decision Behavior Laboratory, University of Arizona, author of “Skip the Square Root of n”, Journal of Mathematical Psychology, 9 June 2005.

Happy Valentine’s Day!

* “In Love, Probability Calculus Suggests Only Fools Rush In”.


Patriots make a mockery of 249 to 1 odds against them

Check out this Super Bowl win probability chart by ESPN Stats & Info.  It stays bottomed out at an Atlanta Falcons victory from halftime to the end of regulation, after which the Patriots ultimately prevail.  When New England settled for a field goal to cut their deficit to 16 points (28 to 12) with 9 minutes and 44 seconds left in the game, the ESPN algorithm registered a 0.4% probability for them to win.  That computes to 249 to 1 odds against a Patriots victory. Ha!
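
For anyone checking the conversion, a win probability p implies (1 - p)/p to 1 odds against, which a one-liner confirms:

```python
p = 0.004                                 # ESPN's 0.4% win probability
print(f"{(1 - p) / p:.0f} to 1 against")  # prints "249 to 1 against"
```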

I am not terribly surprised that a team could overcome such odds.  The reason is that on December 29, 2006, I attended the Insight Bowl in Tempe, Arizona, where the Red Raiders of Texas Tech, after falling behind 38-7 with 7:47 remaining in the third quarter, rallied to score 31 unanswered points and ultimately defeat my Gophers in overtime.  At the time it was the greatest comeback ever in a bowl game, matched only after another decade passed, with the 2016 Alamo Bowl victory by the TCU Horned Frogs, who trailed the Oregon Ducks 0-31 at halftime.  But the Ducks had more time than my Gophers to throw away their sure victory.  I entered our 2006 chances of victory in this football win probability calculator.  It says 100.00% that Minnesota must win.  Ha!

The laws of probability, so true in general, so fallacious in particular.

– Edward Gibbon


A good New Year’s resolution: If you do not exercise, start now–a little goes a long way

I read a cheery Associated Press report today by their Chief Medical Writer Marilynn Marchione: “It’s all good: Any exercise cuts your risk of death.”  What impresses me is the sample size: 64,000 adults whom the UK researchers interviewed and then tracked for death rates.  Another surprise is that almost two-thirds of these individuals did not exercise.  These slackers could reduce their risk of dying by 30 percent if they would just get out for a walk now and then.  Come on, people!

“A particularly encouraging finding was that a physical activity frequency as low as one or two sessions per week was associated with lower mortality risks.”

– Researchers from the National Centre for Sport and Exercise Medicine–East Midlands at Loughborough University
