Posts Tagged statistics

2017—A prime year for statistics

To cap off the year, I present half a dozen wacky new statistics:

  • 2017 was a “sexy” prime, that is, a prime 6 years beyond the previous one, 2011 (“sex” is Latin for six).
  • By 2050 the plastic trash floating in the oceans will outweigh the fish. (Source: Robert Samuelson, “The Top 10 Stats of 2017”, Washington Post, 12/27/17.)
  • University of Warwick statistician Nathan Cunningham debunked the “i-before-e except after c” rule based on evaluating 350,000 English words: The ratio of “ie” to “ei” is exactly the same for the after-c words as it is for all words in general. Weird science!
  • After digging into data compiled by the National UFO Reporting Center (NUFORC), Sam Monfort, a doctoral student in Human Factors and Applied Cognition at George Mason University, concluded that UFOs are visiting at all-time highs. Americans sight UFOs at a rate that exceeds the worldwide median by 300 times. Far out!
  • In May, an Australian cat named Omar was confirmed by the BBC as the world’s biggest at nearly 4 feet long and over 30 pounds. My oh meow!
  • Nearly a thousand people dressed up like penguins in Youngstown, Ohio this October to break the world record. Coincidentally, National Geographic reported on December 13 that the fossilized remains of a giant, man-sized penguin were found in New Zealand. Eerie!
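
The sexy-prime claim in the first bullet is easy to check for yourself. Here is a minimal Python sketch using plain trial division (the helper name `is_prime` is my own invention, not a library function):

```python
def is_prime(n: int) -> bool:
    """Trial division; plenty fast for numbers the size of calendar years."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# A "sexy" pair is two primes that differ by 6.
sexy_pair = is_prime(2011) and is_prime(2017)

# Any primes strictly between the two years would mean 2011 was not "the last one".
between = [year for year in range(2012, 2017) if is_prime(year)]

print(sexy_pair, between)
```

If the bullet is right, the first value is true and the list of in-between primes comes back empty.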


‘Roid rage

Let’s not get caught off guard by an Earth-killing asteroid. As Dylan Thomas said: “Do not go gentle into that good night, …rage against the dying of the light.” 

That is the mission of NASA.  If you are reading this, chances are that asteroid 2012 TC4 whizzed by today at 30,000 miles per hour, closely monitored by a network of observatories. Check out the details at this NASA website. They take asteroid defense very seriously.  Their plans for redirecting asteroids will be tested in 2022 on Didymos B, the smaller body of the double asteroid Didymos, as explained here.

Keep in mind that asteroid 1950 DA, about three-quarters of a mile wide and big enough to destroy our planet, has a 0.1% chance of hitting the Earth in 2880.  In case NASA does not succeed in its defense efforts, start digging now and you might get hunkered down deep enough to survive for a short while after that.


Errors, blunders & lies

David S. Salsburg, author of “The Lady Tasting Tea”*, which I enjoyed greatly, hits the spot again with his new book, Errors, Blunders & Lies: How to Tell the Difference. It’s all about a fundamental statistical equation: Observation = model + error. The errors, of course, are normal and must be expected. But blunders and lies cannot be tolerated.

The section on errors concludes with my favorite chapter: “Regression and Big Data”. There Salsburg endorses my favorite way to avoid over-fitting of happenstance results—hold back at random 10 percent of the data and see how well these outcomes are predicted by the 90 percent you regress.** Whenever I tried this on manufacturing data it became very clear that our high-powered statistical models worked very well for predicting what happened last month. 😉 They were worthless for seeing into the future.
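
For anyone who wants to try the hold-back trick, here is a minimal Python sketch. It is my own illustration, not code from Salsburg’s book: the data are made up (a straight line plus random error), and the helper names `fit_line` and `rmse` are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Made-up data: y = 2x + noise. Real process data would go here.
xs = [random.uniform(0, 10) for _ in range(100)]
data = [(x, 2.0 * x + random.gauss(0, 1)) for x in xs]

# Hold back a random 10 percent; regress on the other 90 percent.
random.shuffle(data)
n_hold = len(data) // 10
holdout, train = data[:n_hold], data[n_hold:]

def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b

def rmse(points, a, b):
    """Root-mean-square prediction error of the fitted line."""
    return (sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)) ** 0.5

a, b = fit_line(train)
print(f"fit RMSE (90%):      {rmse(train, a, b):.2f}")
print(f"hold-out RMSE (10%): {rmse(holdout, a, b):.2f}")
```

With an honest model like this one the two errors should land close together. A hold-out error far above the fit error is the warning sign that the model has latched onto happenstance.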

Another personal favorite is the bit on spurious correlations that Italian statistician Carlo Bonferroni*** guarded against, also known as “will-o’-the-wisps” per Francis Anscombe, the founder of Yale’s statistics department.
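
To see why Bonferroni’s guard matters, consider a back-of-the-envelope Python sketch. The 20 comparisons and the assumption that they are independent are mine, chosen only for illustration:

```python
alpha = 0.05   # per-test significance level
m = 20         # number of independent comparisons (made up for illustration)

# Chance of at least one false positive across m independent tests.
fwer_naive = 1 - (1 - alpha) ** m

# Bonferroni: test each comparison at alpha / m instead.
alpha_bonf = alpha / m
fwer_bonf = 1 - (1 - alpha_bonf) ** m

print(f"uncorrected: {fwer_naive:.2f}")
print(f"Bonferroni:  {fwer_bonf:.3f}")
```

Uncorrected, the odds of chasing at least one will-o’-the-wisp come to roughly 64 percent. Dividing the threshold by the number of tests pulls that back under the intended 5 percent.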

If you are looking for statistical insights that come without all the dreary mathematical details, this book on “Errors, Blunders & Lies” will be just the ticket. Salsburg concludes with a timely heads-up on the statistical lies caused by “curbstoning” (reported here by the New York Post), which may soon combine with gerrymandering (see my previous post) to create a perfect storm of data tampering in the upcoming census. We’d all do well to sharpen up our savvy on stats!

The old saying is that “figures will not lie,” but a new saying is “liars will figure.” It is our duty, as practical statisticians, to prevent the liar from figuring; in other words, to prevent him from perverting the truth, in the interest of some theory he wishes to establish.

– Carroll D. Wright, U.S. government statistician, speaking to the 1889 Convention of Commissioners of Bureaus of Statistics of Labor.

*Based on the story told here.

**An idea attributed to the inventor of modern-day statistics, R. A. Fisher, and endorsed by famed mathematician John Tukey, who suggested the hold-back be 10 percent.

***See my blog on Bonferroni of Bergamo.


Statistics to make distracted drivers more aware this month

April is now Mathematics and Statistics Awareness Month (formerly it was just math, no stats). It is also Distracted Driving Awareness Month.

Putting these two themes together brings us to data published this month by Zendrive, a San Francisco-based startup that uses smartphone sensors to measure drivers’ behavior. They claim that 90% of collisions are due to human error, and that 1 in 4 of these stems from phone use while driving.

These statistics are very worrying to start off with.  But, according to this blog, it gets far worse when you drill down on Zendrive’s 3-month analysis of 3 million anonymous drivers, who made 570 million trips and covered 5.6 billion miles:

  • Drivers used their phones on 88 percent of the trips
  • They spent 3.5 minutes per hour on calls (an enormous amount of time considering that even a few seconds of distraction can create dire consequences)
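
To put those big numbers on a per-driver footing, here is a quick bit of Python arithmetic. It uses only the figures quoted above; the variable names are mine.

```python
drivers = 3_000_000
trips = 570_000_000
miles = 5_600_000_000
call_minutes_per_hour = 3.5

trips_per_driver = trips / drivers                 # over the 3-month window
miles_per_trip = miles / trips
share_of_drive_time = call_minutes_per_hour / 60   # fraction of each driving hour

print(f"{trips_per_driver:.0f} trips per driver")
print(f"{miles_per_trip:.1f} miles per trip")
print(f"{share_of_drive_time:.1%} of driving time on calls")
```

That works out to 190 trips per driver, just under 10 miles per trip, and nearly 6 percent of every hour behind the wheel spent on calls.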

About a third of US states prohibit use of hand-held phones while driving. Does this reduce distraction? The stats posted by Zendrive are not definitive.

It seems to me that hands-free must be far safer. However, this ranking of driving distractions* (benchmarked to plain driving, which rates a 1) does not provide much support for what is seemingly obvious:

  1. Listening to the radio — 1.21
  2. Listening to a book on tape — 1.75
  3. Talking on a hands-free cellphone — 2.27
  4. Talking with a passenger in the front seat — 2.33
  5. Talking on a hand-held cellphone — 2.45
  6. Interacting with a speech recognition e-mail or text system — 3.06

For all the fuss about talking on the phone, whether hands-free (2.27) or hand-held (2.45), it causes about the same distraction as chatting with a passenger (2.33).

This list does not include texting, which Consumer Reports figures is 23 times more distracting than talking on your cell phone while driving.**

Please avoid any distractions when you drive, especially texting.

*Source: This 10/16/15 Boston Globe OpEd

**Posted here


“Bright line” rules are simple but not very bright

Just the other day a new term came to light for me—a “bright line” rule.  Evidently this is commonplace legal jargon that traces back to at least 1946 according to this language log.  It refers to “a clear, simple, and objective standard which can be applied to judge a situation” by this definition.

I came across the term in this statement* on p-values and statistical significance from the American Statistical Association (ASA):

“Practices that reduce data analysis or scientific inference to mechanical ‘bright-line’ rules (such as ‘p < 0.05’) for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision-making.”

The ASA goes on to say:

“Researchers should bring many contextual factors into play to derive scientific inferences, including the design of the study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis.”

It is hard to argue with the adage that if the p-value is high, the null will fly, that is, results cannot be deemed statistically significant.  However, I’ve never bought into 0.05 being the bright-line rule.  It is good to see the ASA dulling down this overly simplistic statistical standard.

I can see the value of “bright line” rules in legal processes, a case in point being the requirement that the Miranda warning be given to advise US citizens of their rights when being arrested.  However, it is ludicrous to apply such dogmatism to statistics.

*(The American Statistician, v70, #2, May 2016, p131)


Men who have children make more money and live longer–correlation or causation?

Hey guys, if you want to make more money and live longer, have kids.  Anyway, that seems to be the gist of two studies reported this month, at least from my perspective as a father of five.  Here is the scoop:

  • “Men in the top 1 percent distribution level live about 15 years longer than men in the bottom 1 percent on the income distribution in the United States.” – Raj Chetty, professor of economics at Stanford University, quoted in this report by NPR on an article he lead-authored in The Journal of the American Medical Association on “The Association Between Income and Life Expectancy in the United States, 2001-2014”.
  • Working fathers enjoy a 21% ‘wage bonus’ over childless colleagues, according to a study by the United Kingdom’s Trades Union Congress reported here.

Before you run off madly making babies, remember that correlation may not be causation.  For example, as reported in this exposé by Slate, statistics indicate that eating ice cream turns people into killers.  Could that really be the scoop?
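
Here is a minimal Python sketch of how such a spurious link can arise. Everything in it is hypothetical: I assume a lurking variable (daily temperature) that drives both ice cream sales and crime, while neither one influences the other. The helper names `pearson` and `residuals` are my own.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def residuals(y, x):
    """What is left of y after removing a straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [c - (my + b * (a - mx)) for a, c in zip(x, y)]

# Hypothetical daily data: temperature drives both series.
temp = [random.uniform(0, 30) for _ in range(2000)]
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temp]
crime = [1.0 * t + random.gauss(0, 5) for t in temp]

raw = pearson(ice_cream, crime)
adjusted = pearson(residuals(ice_cream, temp), residuals(crime, temp))
print(f"raw correlation:                 {raw:.2f}")
print(f"after adjusting for temperature: {adjusted:.2f}")
```

The raw correlation comes out strong even though the simulated ice cream never touches the simulated crime. Once the temperature trend is removed from both series, the correlation falls to about zero.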



American Statistical Association (ASA) defends itself against P-shooters

With the fundamental statistic of P value coming under severe attack, e.g., it being banned in early 2015 by the Basic and Applied Social Psychology (BASP) journal, the ASA has been compelled to issue an unprecedented press release with guidelines for avoiding misuse of hypothesis testing by scientists claiming significant experimental results.*  “The ASA statement is intended to steer research into a ‘post p<0.05 era,’” said Ron Wasserstein, the ASA’s executive director.

“To pounce on tiny P values and ignore the larger question is to fall prey to the ‘seductive certainty of significance.’”

– Geoff Cumming, emeritus psychologist, La Trobe University, Melbourne, Australia

The ASA statement on “Statistical Significance and P-Values” can be seen here.  It includes 6 guidelines on proper use of this essential tool for assessing research data, beginning with the assertion that “P-values can indicate how incompatible the data are with a specified statistical model.”
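
As a concrete case of that first guideline, here is a minimal Python sketch with a made-up experiment: 15 heads in 20 coin flips, tested against the “specified statistical model” of a fair coin. The numbers are mine, not the ASA’s.

```python
from math import comb

n, heads = 20, 15   # made-up experiment: 15 heads in 20 flips
p_null = 0.5        # the specified statistical model: a fair coin

# One-sided p-value: probability of a result at least this extreme if the model holds.
p_value = sum(
    comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
    for k in range(heads, n + 1)
)
print(f"P(at least {heads} heads | fair coin) = {p_value:.4f}")
```

The result, about 0.02, says only that such data would be unusual for a fair coin. It does not give the probability that the coin is fair, which is one of the misreadings the ASA statement warns against.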

*See, for example, this Nature article that claims P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume.


Big data puts an end to the reign of statistics

Michael S. Malone of the Wall Street Journal proclaimed last month* that

One of the most extraordinary features of big data is that it signals the end of the reign of statistics.  For 400 years, we’ve been forced to sample complex systems and extrapolate.  Now, with big data, it is possible to measure everything…

Based on what I’ve gathered (admittedly only a small and probably unrepresentative sample), I think this is very unlikely.  Nonetheless, if I were a statistician, I would reposition myself as a “Big Data Scientist”.

*“The Big-Data Future Has Arrived”, 2/22/16.


A Data Sherlock’s best friend: IBM’s Watson

According to this report last week by eWeek, more than 1 million users have registered for IBM’s Watson Analytics service since it launched a little over 1 year ago.  Evidently this artificially intelligent (AI) statistician-in-a-box will enable “citizen data scientists” to decipher patterns in the massive pile of information that now flows in from all quarters.  Current clients featured by eWeek range from a multinational law firm using it to identify new areas of practice to a UK care provider looking for factors that improve worker safety.  IBM itself now operates an enterprise called Watson Health that deciphers medical imagery, and they bought the digital assets of the Weather Company to help businesses defend themselves against Mother Nature.*

Unfortunately for one of the early adopters of Watson—the MD Anderson Cancer Center at University of Texas (UT)—AI’s current IQ still falls far short of initial hopes.

“On Jeopardy! [Where Watson made its name 5 years ago by defeating the human champions] there’s a right answer to the question [actually the right question for the answer], but, in the medical world, there are often just informed decisions.”

— Lynda Chin, chief innovation officer for health affairs, UT

So it seems that, for the moment at least, human statistical Sherlocks will not be replaced by AIs overseen by amateurs at sleuthing out the culprits for cancer or other highly prized information.  However, Watson might be as capable an assistant as ‘his’ literary namesake.

*1/6/16 Financial Times “Big Read” on “Artificial Intelligence”, p 5 sidebar.


Sine illusion makes peaks and valleys on graphs look overly variable

An article in the latest Journal of Computational and Graphical Statistics (JCGS, Vol 24, Num 4, Dec 2015, p1170) alerted me to a fascinating misperception called the “sine illusion” that causes misinterpretation of trends in variability.  See it nicely illustrated here by vision researcher Michael Bach.  The JCGS authors, Susan VanderPlas and Heike Hofmann, detail “Signs of the Sine Illusion—Why We Need to Care” and provide methods to counteract its misleading effects.

If you see a scatter plot that goes up and down with seemingly large scatter at the bends, get out a ruler to get the true perspective.  That is my take-home message for those like me who like to be accurate in their assessments of data.
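
Why does the ruler help? One rough geometric reading, which is my own assumption rather than anything quoted from the JCGS article, is that the eye tends to judge the width of a band at right angles to the curve instead of vertically. A minimal Python sketch for a band of constant vertical spread around y = sin(x) (the function name `perpendicular_width` is hypothetical, and the formula is a first-order approximation):

```python
from math import cos, sqrt, pi

L = 1.0  # constant vertical spread of the data around the curve y = sin(x)

def perpendicular_width(x: float) -> float:
    """Approximate width of the band measured at right angles to the curve."""
    slope = cos(x)  # derivative of sin(x)
    return L / sqrt(1 + slope**2)

for label, x in [("steepest part (x = 0)", 0.0), ("peak (x = pi/2)", pi / 2)]:
    print(f"{label}: vertical = {L:.2f}, perpendicular = {perpendicular_width(x):.2f}")
```

The vertical spread is the same everywhere, yet the band is only about 70 percent as wide across the steep stretch as it is at the peak. That would make the scatter at the bends look inflated, and a ruler held vertically sets the record straight.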

“The illusion is explained in terms of a perceptual compromise between the vertical extent and the greater overall dimensions of the section at the turn of the sine-wave figure.”

– RH Day and EJ Stecher, “Sine of an illusion,” Perception, 20; 1991, 49–55.
