Archive for category design of experiments

Response surface methods hit the spot for optimizing projectile hurling siege engines

Posted by mark in design of experiments, physics on November 12, 2025

A few weeks ago, Professor Ernst Ferg, Associate Professor – Physical Chemistry at South Africa’s Nelson Mandela University, bounced (pun intended) some questions off me about deploying response surface methods (RSM) on a catapult operated for educational purposes by three of his students. I built up power and developed insights on relative performance of the artillerists by rebuilding his results into a fully replicated blocked design.

Now, aided by Stat-Ease software for DOE, you can see surprisingly close agreement on the central composite design’s center-point set-up (red dots) for the catapult (the reason for this soon to be revealed).

Pooling all the results into one model produced a very impressive 3D graph of distance as a function of the two biggest factors—release angle (A) and cup elevation (B).

Being impressed by Ernst’s initiative to teach his students RSM, I asked him to send me pictures of them operating the catapult. Ernst replied, “LoL, I am approaching this very much in the digital way of things: I made them use Virtual Catapult© from SigmaZone.”

It turns out that Ernst learned about the Virtual Catapult (free!) from Tom Keenan—one of four DOE educators who shared their experiences Teaching Design of Experiments in Higher Education. Tom said that “I love the way that it shoots the ball but doesn’t give you the measurement. It comes to rest next to a tape measure that the students have to read.”

Tom also likes the way that the SigmaZone simulation incorporates some variability, thus every student gets slightly differing results. He collects the results in blocks and does an analysis similar to what I did for Ernst, being watchful of students who deviate from the others.

Fun!

PS: After gaining possession of a South Dakota Mines trebuchet from Professor Dave Dixon (one of the four panelists), I enlisted my son Hank to run an RSM optimization on this more efficient counterweight-driven cousin of the catapult (powered by torsion). We ran a Box-Behnken design, which simplified the operation to only 3 levels of each factor (versus 5 levels required for a central composite design). Ultimately, we worked out a setup that would shoot a salt-weighted racquetball over our backyard bush into a bucket on the upper level of our play fort. Empowering! For all the details on our trebuchet experiment (and pictures), see Messing With Medieval Missile Machines (Part 2).
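The 3-versus-5 level count can be checked by listing the coded design points. Here is a minimal Python sketch, assuming three factors and a rotatable central composite design with axial distance (2^k)^(1/4). The function names are made up for illustration, and the actual trebuchet factors and settings are not reproduced here.

```python
from itertools import combinations, product


def central_composite(k):
    """Coded points of a rotatable central composite design for k factors."""
    alpha = (2 ** k) ** 0.25  # axial ("star") distance for rotatability
    points = [list(p) for p in product((-1.0, 1.0), repeat=k)]  # factorial core
    for i in range(k):
        for a in (-alpha, alpha):
            point = [0.0] * k
            point[i] = a  # one factor pushed out to +/- alpha
            points.append(point)
    points.append([0.0] * k)  # a single center point
    return points


def box_behnken(k):
    """Coded points of a Box-Behnken design (pairwise construction, used here for k = 3)."""
    points = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1.0, 1.0), repeat=2):
            point = [0.0] * k
            point[i], point[j] = a, b  # two factors at +/-1, the rest at mid-level
            points.append(point)
    points.append([0.0] * k)  # a single center point
    return points


def levels(design, factor=0):
    """Distinct coded settings that one factor takes across the design."""
    return sorted({round(point[factor], 3) for point in design})


ccd = central_composite(3)
bbd = box_behnken(3)
print("CCD:", len(ccd), "points, levels", levels(ccd))
print("BBD:", len(bbd), "points, levels", levels(bbd))
```

In practice both designs get replicated center points, which this sketch leaves out to keep the comparison of levels plain.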

catapult, trebuchet


The secret sauce in Guinness beer?

Posted by mark in design of experiments on August 9, 2024

I highly recommend Scientific American’s May 25 Opinion by Jack Murtagh explaining How the Guinness Brewery Invented the Most Important Statistical Method in Science. It nicely illustrates the t test, a landmark statistical method developed by William Sealy Gosset to assess a key ingredient in Guinness beer for ideal bitterness and preservation: soft resin content in hop flowers. Gosset calculated that a 1% difference in the amount of soft resins in the hops, the best and cheapest being purchased from Oregon,* increased their value to the brewery by almost 11%.

“Near the start of the 20th century, Guinness had been in operation for almost 150 years and towered over its competitors as the world’s largest brewery. Until then, quality control on its products had consisted of rough eyeballing and smell tests. But the demands of global expansion motivated Guinness leaders to revamp their approach to target consistency and industrial-grade rigor. The company hired a team of brainiacs and gave them latitude to pursue research questions in service of the perfect brew.”
– Jack Murtagh

Back in 2017 on National Beer Day, celebrated yearly on April 7 to commemorate the end of USA’s prohibition of its sale, I saluted Gosset and his very useful t-test of the significance of one treatment versus another, that is, a simple comparative experiment.**

“They began to accumulate data and, at once, they ran into difficulties because their measurements varied. The effects they were looking for were not usually clearcut or consistent, as they had expected, and they had no way of judging whether the differences they found were effects of treatment or accident. Two difficulties were confounded: the variation was high and the observations were few.”
– Joan Fisher Box,*** “Guinness, Gosset, Fisher, and Small Samples,” Statistical Science, Vol. 2, No. 1 (Feb., 1987), pp. 45-52

To see how the t-test works, check out this awesome graphical app developed by Evan Miller. Using Stat-Ease software, I cross-checked it against a case study (Example 3.3) from the second edition of Box, Hunter and Hunter’s textbook Statistics for Experimenters. It lays out a simple comparative experiment by a tomato gardener who randomly splits 11 plants for treatment either with her standard fertilizer (A) or a far more expensive one (B) that supposedly produces far better yields. Here are the yield results in pounds, which you can assess using the t test:

A (standard fertilizer): 29.9, 11.4, 25.3, 16.5, 21.1
B (new fertilizer): 26.6, 23.7, 28.5, 14.2, 17.9, 24.3

On average the new fertilizer increases the yield by nearly 2 pounds, but is the difference statistically significant? That would be good to know! I have the answer, but it would be no fun to tell you, being so easy to find out for yourself.
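If you would rather do the arithmetic yourself than use an app, here is a minimal Python sketch of the pooled two-sample t test. It assumes the first row of yields is fertilizer A and the second is B (the second row has the higher mean, matching the text), uses equal-variance pooling, and compares against the tabulated two-sided 5% critical value for 9 degrees of freedom. The function name is my own.

```python
from math import sqrt
from statistics import mean, variance

# Tomato yields in pounds (assumed: first row = A, second row = B)
standard = [29.9, 11.4, 25.3, 16.5, 21.1]
new = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]


def pooled_t(a, b):
    """Return (t statistic, degrees of freedom, pooled standard deviation)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / std_error, df, sqrt(pooled_var)


t_stat, df, pooled_sd = pooled_t(standard, new)

T_CRITICAL = 2.262  # two-sided 5% point of Student's t with 9 df, from standard tables

print(f"difference in means: {mean(new) - mean(standard):.2f} lb")
print(f"pooled standard deviation: {pooled_sd:.2f} lb")
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
print("significant at 5%" if abs(t_stat) > T_CRITICAL else "not significant at 5%")
```

The pooled standard deviation it prints is the between-plant variation that the power discussion in the PS refers to.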

PS: Due to the large variation between plants (a greater than 6-pound standard deviation!), this tomato study is badly underpowered. If you do an experiment like this, do anything possible to get more consistent results. Then assess power for whatever the difference is that makes changing fertilizers worthwhile. For example, let’s say that with better plant management you got the standard deviation reduced to 3 pounds and a difference of 4 pounds is needed at a minimum to make the switch in fertilizer cost-effective. Then, using Stat-Ease software’s power calculator, I figure you would need to test 3-dozen plants each in your randomized experiment to achieve an 80% probability of detecting a difference of 4 pounds given a 3-pound standard deviation. I hope you like tomatoes!

*As reported by Eat This Podcast in their 4/10/18 post on Guinness and the value of statistics

**National Beer Day–A fine time for fun facts and paying homage to a wickedly smart brewer from Guinness

***I was very fortunate to meet Joan Fisher Box in 2019 as related in this StatsMadeEasy blog.

beer


Mentos volcano rocks Rapid City

Posted by mark in design of experiments, Education on April 12, 2021

It was my pleasure to oversee another outstanding collection of fun experiments by the Chemical and Biological Engineering (CBE) students at South Dakota School of Mines and Technology (SDSMT) for this Spring semester’s Applied Design of Experiments for the Chemical Industry class presented by Stat-Ease. They continued the excellent tradition established by the class of 2020, which I reported in my blog on “DOE It Yourself” hits the spot for distance-learning projects.

As promised, I am highlighting a few of the many A+ projects in StatsMadeEasy, particularly those with engaging videos. My first selection goes to Dakin Nolan, Erick Hoon and Jared Wilson for their “DOE Soda and Mentos Experiment”. They studied the “heterogenous nucleation of gases on a surface” caused by type of soda, its temperature and volume versus the quantity of Mentos. See the results in the video (“the moment you’ve all been waiting for”). Do not miss the grand finale (“The Masterpiece”) that shows what happens if you mix 15 Mentos in a 2-liter bottle of hot Diet Coke.

It’s hard to say how high the cola spouted in the blowout at the end, but it must have made a big sticky mess of the surrounding area. At similar conditions but at a more prudent maximum of 3 Mentos (the highest level actually tested in the DOE), Design-Expert predicts a peak of 310 inches, an impressive 25-plus feet of magma.

Further work will be needed to optimize the dosage of Mentos. Perhaps 15 of the sugary oblate spheroids may be overkill. There’s always room for improvement, as well as more fun, making volcanoes.

volcano


Experiment reveals secret to maximizing microwave popcorn—Part one: Setup

Posted by mark in design of experiments, Uncategorized on December 27, 2020

Energized by a new tool in Design-Expert® software (DX) for modeling counts (to be discussed in Part 2: Analysis of results), I laid out a design of experiments (DOE) aimed at reducing the number of unpopped kernels (UPK) from microwaved popcorn. I figured that counting the UPKs would be a far more precise measure of popcorn loss than weighing them, as done in this prior study by me and my son Hank.

My new experiment varied the following two factors in a replicated, full, multilevel, categorical design done with my General Electric (GE) Spacemaker microwave oven:

A. Preheat with 1 cup of water at 1 minute on high, No [L1] vs Yes [L2]

B. Timing, GE default [L1] vs GE++ [L2] vs Popcorn Expert app [L3]

I tested the preheating (factor A) before and found it to be unproductive. However, after seeing it on this list of microwave ‘hacks’, I decided to try again. Perhaps my more precise measuring of UPK might show preheating to be of some help after all.

The timing alternatives (factor B) came about when I discovered Popcorn Expert AI Cooking Assistant for systematically applying the #1 hack—the two-second rule: When this much time passes between pops, stop.

By the way, I also tried the third hack—pouring the popcorn into a covered glass bowl, but that failed completely—causing a very alarming “SENSOR ERROR”. It turns out that the GE Spacemaker uses humidity to determine when your popcorn is done. The plastic cover prevented moisture from escaping. Oops! Next time I try this it will be with a perforated lid.

While researching the user manual for the first time since buying the Spacemaker 15 years ago (engineers rarely read instructions) and learning about the humidity angle for the first time, I also found out that pressing 9 twice after beginning the popcorn cook added 20 and then 10 more seconds (++) at the end.

The original experimental design of 12 runs (2×3 replicated) was laid out in a randomized recipe sheet by DX, all of them done using 3-ounce bags of Jolly Time, Simply Popped Sea Salt microwave popcorn. Due to a few mistakes by the machine operator (me) misreading the run sheet, two extra runs got added. No harm done: more being better for statistical power.
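For readers without DOE software, a randomized run sheet for a replicated 2×3 design is easy to sketch in Python. This is a generic stand-in, not how DX builds its sheet. The function name and the seed value are arbitrary choices for illustration, and the two unplanned extra runs are not included.

```python
import random
from itertools import product

preheat = ["No", "Yes"]                                # factor A levels
timing = ["GE default", "GE++", "Popcorn Expert app"]  # factor B levels
REPLICATES = 2


def run_sheet(seed=None):
    """All A x B combinations, each repeated REPLICATES times, in random order."""
    runs = [combo for combo in product(preheat, timing) for _ in range(REPLICATES)]
    random.Random(seed).shuffle(runs)  # a fixed seed makes the sheet reproducible
    return runs


sheet = run_sheet(seed=2020)
for number, (a, b) in enumerate(sheet, start=1):
    print(f"Run {number:2d}: preheat = {a:3s} | timing = {b}")
```

Randomizing the order guards against time-related effects, such as the oven warming up over successive bags, being mistaken for factor effects.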

Part 2 of this two-part blog will delve into the analysis details, but it became readily apparent from a one-to-one comparison that the default popcorn setting of my GE microwave came up far short of Popcorn Expert for reducing UPK. However, the “++” adjustment closed the gap, as you will see.

To be continued…

popcorn


Statisticians earn residuals by airing errors

Posted by mark in design of experiments, pop on October 10, 2020

A new book by David S. Salsburg provides a series of Cautionary Tales in Designed Experiments. Salsburg wrote the classic The Lady Tasting Tea, which I read with great delight. I passed along the titular story (quite amazing!) in a book review (article #4) for the July 2004 DOE FAQ Alert.

Salsburg’s cautionary tales offer a quick read with minimal mathematics on what can go wrong with poorly designed or badly managed experiments, mainly medical. I especially liked his story of the Lanarkshire Milk Experiment of 1930, which attempted to test whether pasteurization removed all the “good”. Another funny bit from Salsburg, also related in The Lady Tasting Tea and passed along by me in my review, stems from his time doing clinical research at Pfizer when a manager complained about him making too many “errors”. He changed this statistical term to “residuals” to make everyone happy.

With all the controversy now about clinical trials of Covid-19 vaccines and the associated politics, Cautionary Tales in Designed Experiments offers a welcome look with a light touch at how far science has progressed over the past century in its experimental protocols.

“It is the well-designed randomized experiment that provides the final ‘proof’ of the finding. The terminology often differs from field to field. Atomic physicists look for “six sigma” deviations, structure-activity chemists look for a high percentage of variance accounted for, and medical scientists describe the “specificity” and “sensitivity” of measurements. But all of it starts with statistically based design of experiments.”
– David S. Salsburg, conclusion to Cautionary Tales in Designed Experiments

book review


Magic of multifactor testing revealed by fun physics experiment: Part Three—the details and data

Posted by mark in design of experiments, Education, Uncategorized on September 2, 2020

Detail on factors:

Ball type (bought for $3.50 each from Five Below (www.fivebelow.com)):
- 4 inch, 41 g, hollow, licensed (Marvel Spiderman) playball from Hedstrom (Ashland, OH)
- 4 inch, 159 g, energy high bounce ball from PPNC (Yorba Linda, CA)
Temperature (equilibrated by storing overnight or longer):
- Freezer at about -4 F
- Room at 72 to 76 F with differing levels of humidity
Drop height (released by hand):
- 3 feet
- 6 feet
Floor surface:
- Oak hardwood
- Rubber, 3/4″ thick, Anti Fatigue Comfort Floor Mat by Sky Mats (www.skymats.com)

Measurement:

Measurements done with Android PhyPhox app “(In)Elastic”. Record T₁ and H₁, the time and height (calculated) of the first bounce. As a check, note H₀, the estimated drop height, which is already known (specified by the low and high levels of factor C).

Data:

Std #	Run #	A: Ball type	B: Temp deg F	C: Height feet	D: Floor type	Time seconds	Height centimeters
1	16	Hollow	Room	3	Wood	0.618	46.85
2	6	Solid	Room	3	Wood	0.778	74.14
3	3	Hollow	Freezer	3	Wood	0.510	31.91
4	12	Solid	Freezer	3	Wood	0.326	13.02
5	8	Hollow	Room	6	Wood	0.829	84.33
6	14	Solid	Room	6	Wood	1.119	153.54
7	1	Hollow	Freezer	6	Wood	0.677	56.17
8	4	Solid	Freezer	6	Wood	0.481	28.34
9	5	Hollow	Room	3	Rubber	0.598	43.92
10	10	Solid	Room	3	Rubber	0.735	66.17
11	2	Hollow	Freezer	3	Rubber	0.559	38.27
12	7	Solid	Freezer	3	Rubber	0.478	28.03
13	15	Hollow	Room	6	Rubber	0.788	76.12
14	11	Solid	Room	6	Rubber	0.945	109.59
15	9	Hollow	Freezer	6	Rubber	0.719	63.43
16	13	Solid	Freezer	6	Rubber	0.693	58.96

Observations:

Run 7: First drop produced result >2 sec with height of 494 cm. This is >16 feet! Obviously something went wrong. My guess is that the mic on my phone is having trouble picking up the sound of the softer solid ball and missed a bounce or two. In any case, I redid the bounce.
- Starting run 8, I will record Height 0 in Comments as a check against bad readings.
Run 8: Had to drop 3 times to get time registered due to such small, quiet and quick bounces.
- Could have tried changing setting for threshold provided by the (In)Elastic app.
Run 14: Showing as outlier for height so it was re-run. Results came out nearly the same: 1.123 s (vs 1.119 s) and 154.62 cm (vs 153.54 cm). After transforming by square root these results fell into line. This makes sense by physics, being that distance fallen is a function of time squared.
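The time-squared relationship can be checked against the data table. The sketch below assumes that the recorded time is the full flight time between the first and second impacts, so the ball falls from its peak for half of it, giving h = g·T²/8, and that g = 9.81 m/s². Both are my assumptions about how the app arrives at its calculated height, and the function name is my own.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def bounce_height_cm(flight_time_s):
    """Peak height of a bounce lasting flight_time_s: h = g * T^2 / 8, in centimeters."""
    return G * flight_time_s ** 2 / 8 * 100


# (time in seconds, height in cm) pairs copied from the data table above
rows = [(0.618, 46.85), (0.778, 74.14), (0.326, 13.02), (1.119, 153.54), (0.945, 109.59)]

for time_s, table_cm in rows:
    calc_cm = bounce_height_cm(time_s)
    print(f"T = {time_s:.3f} s: calculated {calc_cm:7.2f} cm, table {table_cm:7.2f} cm")
```

The small mismatches are what you would expect from times rounded to the nearest millisecond. Because height goes as time squared, the square root of height is linear in time, which is why that transformation pulled run 14 back into line.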

Suggestions for future:

- Rather than drop the balls by eye from a mark on the wall, do so from a more precise mechanism to be more consistent and precise for height
- Adjust up for 3/4″ loss in height of drop due to thickness of mat
- Drop multiple times for each run and trim off outliers before averaging (or use median result)
- Record room temp to nearest degree

home experiments


Magic of multifactor testing revealed by fun physics experiment: Part Two—the amazing results

Posted by mark in design of experiments, Uncategorized on August 31, 2020

The 2020 pandemic provided a perfect opportunity to spend time doing my favorite thing: Experimenting!

Read Part One of this three-part blog to learn what inspired me to investigate the impact of the following four factors on the bounciness of elastic spheroids:

A. Ball type: Hollow or Solid

B. Temperature: Room vs Freezer

C. Drop height: 3 vs 6 feet

D. Floor surface: Hardwood vs Rubber

Design-Expert® software (DX) provides the astonishing result: Neither the type of ball (factor A) nor the differing surfaces (factor D) produced significant main effects on first-bounce time (directly related to height per physics). I will now explain.

Let’s begin with the Pareto Chart of effects on bounce time (scaled to t-values).

First observe the main effects of A (ball type) and D (floor surface) falling far below the t-Value Limit: They are insignificant (p>>0.05). Weird!

Next, skipping by the main effect of factor B (temperature) for now (I will get back to that shortly), notice that C—the drop height—towers high above the more conservative Bonferroni Limit: The main effect of drop height is very significant. The orange shading indicates that increasing drop height creates a positive effect—it increases the bounce time. This makes perfect sense based on physics (and common knowledge).

Now look at the multi-view Model Graphs for all four main effects.

The plot at the lower left shows how the bounce time increased with height. The least-significant-difference ‘dumbbells’ at either end do not overlap. Therefore, the increase is significant (p<0.05). The slope quantifies the effect—very useful for engineering purposes.

However, as DX makes clear by its warnings, the other three main effects, A, B and D, must be approached with great caution because they interact with each other. The AB and BD interactions will tell the true story of the complex relationship of ball type (A), ball temperature (B) and floor material (D).

See by the interaction plot how the effect of ball type depends on the temperature. At room temperature (the top red line), going from the hollow to the solid ball produces a significant increase in bounce time. However, after being frozen, the balls behaved completely opposite—hollow beating solid (bottom green line). These opposing effects caused the main effect of ball type (factor A) to cancel!

Incredibly (I’ve never seen anything like this!), the same thing happened with the floor surface: The main effect of floor type got washed out by the opposite effects caused by changing temperature from room (ambient) to that in the freezer (below 0 degrees F).
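The cancellation can be checked by hand from the bounce times in the Part Three data table. Here is a minimal Python sketch, assuming the usual -1/+1 coding of a two-level factorial in standard order. It computes raw effects only (average at the high level minus average at the low level), not the t-values or significance limits that DX reports, and the function names are my own.

```python
# Bounce times (seconds) in standard order, copied from the Part Three data table.
# Coding assumed here: A hollow = -1 / solid = +1, B room = -1 / freezer = +1,
# C 3 ft = -1 / 6 ft = +1, D wood = -1 / rubber = +1.
times = [0.618, 0.778, 0.510, 0.326, 0.829, 1.119, 0.677, 0.481,
         0.598, 0.735, 0.559, 0.478, 0.788, 0.945, 0.719, 0.693]


def level(run, factor):
    """Coded level of a factor (0 = A ... 3 = D) for a 0-based standard-order run."""
    return 1 if (run >> factor) & 1 else -1


def effect(term):
    """Average response where the term is +1 minus the average where it is -1."""
    contrast = 0.0
    for run, y in enumerate(times):
        sign = 1
        for letter in term:  # an interaction's sign is the product of its factors' signs
            sign *= level(run, "ABCD".index(letter))
        contrast += sign * y
    return contrast / (len(times) / 2)


effects = {term: effect(term) for term in ("A", "B", "C", "D", "AB", "BD")}
for term, value in effects.items():
    print(f"{term:>2}: {value:+.3f} s")
```

Run as written, the main effects of A and D come out at only about 0.03 and 0.02 seconds, while the AB and BD interactions are several times larger in magnitude, which is the washout described above.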

Changing one factor at a time (OFAT) in this elastic spheroid experiment leads to a complete fail. Only by going to the multifactor testing approach of statistical DOE (design of experiments) can researchers reveal breakthrough interactions. Furthermore, by varying factors in parallel, DOE reveals effects far faster than OFAT.

If you still practice old-fashioned scientific methods, give DOE a try. You will surely come out far ahead of your OFAT competitors.

P.S. Details on the elastic-spheroid experiment procedures will be laid out in Part 3 of this series.

home experiments


Magic of multifactor testing revealed by fun physics experiment: Part One—the setup

Posted by mark in design of experiments, Education on August 23, 2020

The behavior of elastic spheres caught my attention due to a proposed, but not completed, experiment on ball bounciness turned in by a student from the South Dakota School of Mines and Technology.* I decided to see for myself what would happen.

To start, I went shopping for suitable elastic spheres. As pictured, I found two ball-toys with the same diameter—one of them with an eye-catching Spider-Man graphic.

My grandkids all thought that “Spidey” would bounce higher than the other ball, the one in swirly blue and yellow. Little did they know just by looking that “Swirley” was the one with superpowers, it being made from exceptionally elastic, solid synthetic rubber. Sadly, Spidey turned out to be a hollow airhead. This became immediately obvious when I dropped the two balls side by side from shoulder height. Spidey rebounded only to my knee while Swirley shot back nearly to the original drop level, which really amazed the children.

My next idea for the bouncy experiment came from Frugal Fun for Boys and Girls, a website that provides many great science projects. Their bouncy ball experiment focuses on the effect of temperature as seen here.

However, I could see one big problem straight away: How can you get an accurate measure of bounce height? That led me to an amazing cell-phone app called Phyphox (Physics Phone Experiments) which provided an ingenious way to calculate how high a ball bounces by listening to it hit the floor.** Watch this short video to see how. (If you are a physicist, stay on for how the narrator of the demo, Sebastian Staacks, worked out all his calculations for the Phyphox (In)elastic tool.)

The third factor came easy: Height of drop. To make this obvious but manageable, I chose three versus six feet.

The fourth and final factor occurred to me while washing dishes. We recently purchased a thick rubber mat for easy cleanup and comfortable standing in front of our sink. I realized that this would provide a good contrast to our hardwood floors for bounce height, the softer surface being obviously inferior.

To recap, the four factors and their levels I tested were:

A. Ball type: Hollow or Solid

B. Temperature: Room vs Freezer

C. Drop height: 3 vs 6 feet

D. Floor surface: Hardwood vs Rubber

Using Design-Expert® software (DX) I then laid out a two-level, full factorial of 16 runs in random order. To be sure of temperature being stabilized, I did only one run per day, recording the time of the first bounce and its height (calculated by the Phyphox boffins as detailed in the videos).

When I completed the experiment and analyzed the results using DX, I was astounded to see that neither the type of ball nor the differing surfaces produced significant main effects. That made no sense based on my initial demonstrations on side-by-side bounce for the two balls on the floor versus the rubber mat.

Keeping in mind that my experiment provided a multifactor test of two other variables, perhaps you can guess what happened. I will give you a hint: Factors often interact to produce surprising results, such as time and temperature suddenly coming together to create a fire (or as I would say as a chemical engineer—an “exothermic reaction”).

Stay tuned for Part 2 of this blog on my elastic spheroid experiment to see how the factors interacted in delightful ways that, once laid out, make perfect sense even to non-physicists.

*For background on my class and an impressive list of home experiments, see “DOE It Yourself” hits the spot for distance-learning projects.

**I credit Rhett Allain of Wired for alerting me to Phyphox via his 8/16/18 post on Three Science Experiments You Can Do With Your Phone. From there he provides a link to a prior, more detailed, post on Modeling a Bouncing Ball.

home experiments


Business community discovers that “Experimentation Works”

Posted by mark in design of experiments on April 21, 2020

Last month the Wall Street Journal “Bookshelf” (3/15/20, David A. Shaywitz) featured a review of a book about The Surprising Power of Business Experiments.

“Tests at Microsoft in 2012 revealed that a tiny adjustment in the way its Bing search engine displayed ad headlines resulted in a 12% increase in revenue, translating into an extra $100 million annually for the company in the U.S. alone.”
– Stefan Thomke, author of Experimentation Works: The Surprising Power of Business Experiments.

It’s great to see attention paid to the huge advantages gained from statistically rigorous experiments. However, vastly greater returns await those willing to go beyond simple-comparative one-factor A/B testing to multifactor design of experiments. The reason is obvious: Only by testing more than one factor at a time can interactions be discovered.

A case in point is provided by an experiment I did on postcard advertisements. It produced a non-intuitive finding that, unlike marketers, our engineering clients preferred less colorful layouts. Knowing this, we succeeded in increasing our response at a far lower printing cost. See the proof in the interaction plot at the conclusion of this white paper on That Voodoo We Do – Marketers Are Embracing Statistical Design of Experiments.

Another compelling example of the value of multifactor testing is illustrated by website-conversion results* shown here—produced from a replicated, full, two-level factorial design.

The key to a more than 5-fold increase in clicks turned out to be the combination of going to a modern font (factor A) with a more compelling button label (C). A third factor (B), background being white versus blue, did not create a significant effect, which also provided valuable insights on the drivers for conversion.

Why settle for testing only one factor when, without investing much more time, if any, you can investigate many factors and, as a huge bonus, detect possible interactions?

*From Pochiraju & Seshadri, Essentials of Business Analytics, 2019, Springer, p 737.

marketing


Enlightenment by an accidental statistician under the Great Comet of 1996

Posted by mark in design of experiments, history on October 21, 2019

A small, but select, group of people came Friday to University of Wisconsin, Madison for the celebration of George E. P. Box’s 100th birthday, including his second wife Joan Fisher, whose father Ronald invented modern-day design of experiments (DOE) and the whole field of industrial statistics. Box, who doubled down on Fisher by his development of response surface methods (RSM), went by the name “Pel”. This nickname stemmed from the second of his middle names “Edward Pelham” (E. P. not standing for Elvis Presley as some who admired him thought more apropos).

In my blog on March 30, 2013—just after his death, I relayed stories of my two memorable encounters with Box. Friday marked my first visit to UW-Madison since I last saw him in 1996 for his short-course on DOE. Looking over Lake Mendota from the Memorial Union Terrace brought back memories of the incredible view during my class, when Comet Hyakutake peaked in spectacular fashion before rapidly diminishing. I rate Hyakutake on par with Hale-Bopp that came a year later, just as I view Box and Fisher as the luminaries for DOE.

Inspired by the Centenary, I ordered a copy of Box’s autobiography—The Accidental Statistician, which he completed in the last year of his life. I look forward to reading more about this remarkable fellow.

The video presented by Box at the time of publication—March 2013—provides a sampling of the stories he told to inspire experimenters to be more observant and methodical:

How a monk discovered the secret to making champagne,
What to make of seeing bloody Mr. Jones running down the street pursued by Mrs. Jones with a hatchet (good one for this Halloween season!).

https://www.youtube.com/watch?v=svmKEhsp1Gg

George Box
