Archive for June, 2010

A breadth of fresh error

This weekend’s Wall Street Journal features a review by Stats.org editor Trevor Butterworth of a new book titled Wrong: Why Experts Keep Failing Us – And How to Know When Not to Trust Them.  The book undermines scientists, as well as financial wizards, doctors and all others who feel they are almost always right and thus never in doubt.  In fact, it turns out that these experts may be wrong nearly as often as they are right in their assertions.  Butterworth prescribes as a remedy the tools of uncertainty that applied statisticians employ to good effect.

Unfortunately the people funding consultants and researchers do not want to hear any equivocation in stated results.  However, it’s vital that experts convey the possible variability in their findings if we are to gain a true picture of what may, indeed, transpire.

“Error is to be expected and not something to be scorned or obscured.”

— Trevor Butterworth


Tasty tidbits gleaned by a news-starved junky for stats trivia

The June 10th “Views” section of the International Herald Tribune (the global edition of the New York Times) offered a few choice bits for me to savor after nearly two weeks traveling abroad without an American newspaper.

  • A report on a June 1-7 telephone survey by Stanford University that asked 1000 American adults whether they believe in global warming.  A pie chart illustrated that about 75% do believe in global warming, 20% do not, and 5% “don’t believe in pie charts”.  I suspect that the author of this editorial, Jon A. Krosnick, a professor of communications at Stanford, meant this last slice of the chart to represent those who are undecided, but the graphic designers (Fogleson-Lubliner) figured they’d have some fun.
  • Olivia Judson’s comments on “Galton’s legacy” note that this preeminent British statistician once published a comment in Nature (June 25, 1885 “Measure of Fidget”) that gauged boredom by how much the audience squirmed during particularly wearisome presentations.  I wish I had thought of this “amusing way of passing an otherwise dull” lecture before attending two statistical conferences over the last several weeks.  Based on this 2005 assessment of “Nodding and napping in medical lectures”, the more things change the more they stay the same, at least so far as presentations are concerned.  The only difference is cost.  For example, the authors figure that at a typical 1 hour talk to 100 high-powered professionals, say master statisticians, perhaps as much as $20,000 goes up in snores.

“Nodding was common, but whether in agreement with the speaker or in reverie remains undetermined.”

— Kenneth Rockwood (Dalhousie University), Christopher J. Patterson (McMaster University), David B. Hogan (University of Calgary)


Bonferroni of Bergamo

Bonferroni corrected

Uncorrected (random results)

I enjoyed a fine afternoon in the old Città Alta of Bergamo in northern Italy – a city in the sky that the Venetians, at the height of their power as the “most serene republic,” walled off as their western-most outpost in the 17th century.

In statistical circles this town is most notable for being the birthplace of Carlo Emilio Bonferroni.  You may have heard of the “Bonferroni Correction” – a method that addresses the problem of multiple comparisons.

For example, when I worked for General Mills the head of quality control in Minneapolis would mix up a barrel of flour and split it into 10 samples, carefully sealed in air-tight containers, for each of the mills to test in triplicate for moisture.  At this time I had just learned how to do the t-test for comparing two means.  Fortunately for the various QC supervisors, no one asked me to analyze the results, because I would have simply taken the highest moisture value and compared it to the lowest one.  Given that there are 45 possible pair-wise comparisons (10*9/2), this biased selection (high versus low) is likely to produce a result that tests significant at the 0.05 level (1 out of 20).
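The arithmetic behind that claim can be sketched in a few lines of Python. The independence assumption below is mine, not the post's: all 45 comparisons draw on the same 10 means, so treat the result only as a rough guide to how quickly false positives pile up.

```python
from math import comb

mills = 10
alpha = 0.05  # per-comparison significance level

# Number of distinct pairs of mills: 10*9/2
pairs = comb(mills, 2)

# Chance of at least one false positive across all pairs,
# ASSUMING the comparisons are independent (an approximation).
familywise_error = 1 - (1 - alpha) ** pairs

print(pairs)                       # 45
print(round(familywise_error, 2))  # 0.9
```

Under that simplifying assumption, roughly 9 times out of 10 at least one pair of mills would test "significant" by chance alone.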

This is a sadistical statistical scheme for a Machiavellian manager because of the intimidating false positives (Type I error).  In the simulation pictured, using the random number generator in Design-Expert® software (based on a nominal value of 100), you can see how, with the significance threshold set at 0.05 for the least-significant-difference (LSD) bars (derived from t-testing), the supervisors of Mills 4 and 7 appear to be definitely discrepant.  Shame on them!  Chances are that the next month’s inter-laboratory collaborative testing would cause others to be blamed for random variation.

In the second graph I used a 0.005 significance level – 1/10th as much per the Bonferroni Correction.  That produces a more sensible picture — all the LSD bars overlap, so no one can be fingered for being out of line.
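For readers without Design-Expert, here is a rough Monte Carlo sketch of the same idea in plain Python. It is not the simulation pictured: the standard deviation of 1, the 2000 trials and the seed are arbitrary assumptions, and the critical t values are read from a standard t table for 20 error degrees of freedom (30 results minus 10 mill means). It estimates how often the highest and lowest mills look different when all 10 truly share the same mean of 100.

```python
import random
from statistics import mean

MILLS, REPS = 10, 3
NOMINAL, SD = 100.0, 1.0       # SD is an assumed value for illustration
DF_ERROR = MILLS * (REPS - 1)  # 20 error degrees of freedom

# Two-sided critical t values for 20 df, taken from a standard t table
T_CRIT = {0.05: 2.086, 0.005: 3.153}

def high_vs_low_t(rng):
    """Simulate one round of testing; return t for highest vs lowest mill."""
    data = [[rng.gauss(NOMINAL, SD) for _ in range(REPS)] for _ in range(MILLS)]
    means = [mean(row) for row in data]
    # Pooled within-mill variance (the ANOVA mean square error)
    sse = sum((x - m) ** 2 for row, m in zip(data, means) for x in row)
    mse = sse / DF_ERROR
    std_err = (2 * mse / REPS) ** 0.5  # standard error of a difference of two means
    return (max(means) - min(means)) / std_err

def false_alarm_rate(alpha, trials=2000, seed=1):
    """Fraction of simulated rounds where high vs low tests significant."""
    rng = random.Random(seed)
    hits = sum(high_vs_low_t(rng) > T_CRIT[alpha] for _ in range(trials))
    return hits / trials

rate_uncorrected = false_alarm_rate(0.05)
rate_corrected = false_alarm_rate(0.005)
print(rate_uncorrected, rate_corrected)
```

Note that the 0.005 level follows the post (0.05 divided by 10). A strict Bonferroni correction over all 45 pairs would divide 0.05 by 45, giving about 0.0011, which is stricter still.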

By the way, the overall F-test on this data set produces a p value of 0.63 – not significant.

Since Bonferroni’s death a half-century ago in 1960, much more sophisticated procedures have been developed to correct for multiple comparisons.  Nevertheless, by any measure of comparative value, Bergamo can consider this native son as one of those who significantly stood above most others in terms of his contributions to the world.


Priming R&D managers to allow sufficient runs for a well-designed experiment

I am learning a lot this week at the Third European DOE User meeting in Lucerne, which features many excellent applications of DOE to industrial problems.   Here’s an interesting observation from Pavel Nesladek, a technical expert from the Advanced Mask Technology Center of Dresden, Germany.  He encounters severe pressure to find answers in minimal time at the least possible cost.   Pavel found that whatever number of runs he asked to do for a given design of experiment, his manager would press for fewer.  However, he learned that by asking for a prime number, these questions would be preempted, presumably because this seemed to be so precise that it must not be tampered with! For example, Pavel really needed to complete 20 runs for adequate power and resolution in a troubleshooting experiment, so he asked for 23 and got it.  Tricky!  Perhaps you stats-savvy readers who need a certain sample size to accomplish your objective might try this approach.  Prima!
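Pavel's trick is easy to automate. As a tongue-in-cheek sketch (the function names are my own invention, not anything from his talk), this Python snippet finds the smallest prime at or above the number of runs you actually need:

```python
def is_prime(n):
    """Return True if n is a prime number (simple trial division)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def runs_to_request(needed):
    """Smallest prime that is at least the number of runs needed."""
    n = needed
    while not is_prime(n):
        n += 1
    return n

print(runs_to_request(20))  # 23
```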
