The residuals are "in control," meaning their long-term variation is no greater than their short-term variation and the grand mean really does look like a central tendency. This stationary series also plots as a straight line on normal probability paper, so it's consistent with the assumption that the residuals vary normally around a mean of zero. 20
In addition to spikes, trends, and shifts, data footprints include cycles, mixtures, stratifications, instability, etc., each being the footprint of a different species of cause. 21
The Allegory of the Fluoropolymer
The viscosity of a fluoropolymer was plotted batch-by-batch for two reactors, C and D. The chart for Reactor D (second panel) was shown to the class.
Figure 14: Viscosity of a fluoropolymer from two reactors when the polymer grade was changed.
Behold! A shift! Viscosity jumped from just under 32 for grade 400 to just over 35 for grade 500. The class response: Big deal, it was supposed to do that! Polymer 500 is thicker. 22 Then I showed them the chart for Reactor C, which did not exhibit the expected shift. Prior to the class, I did not know which reactor was doing the wrong thing; but I knew one of them was.
This strategy of Matching Charts is a simple tactic for root cause analysis. 23
When adjustments go bad.
"No purely mathematical procedure can make a good figure out of any number of bad ones."—W. Edwards Deming, Statistical Adjustment of Data
The Allegory of the Cookies
A dough is machine-kneaded, rolled into a wide continuous sheet, and cut into cookies, which then ride a belt, 24 abreast, through a baking oven. Each half-hour, a sample of three cookies is taken from the oven exit, one each from the left, center, and right sides of the belt, and tested with a reflectance meter called an Agtron. The Xbar and R chart below shows what appears to be a stationary series.
Figure 15: Each subgroup consists of a cookie from the left, center, and right sides of the oven. The points hug the center line a little too tightly. Nearly all are within one standard deviation.
The expression "too good to be true" comes to mind. The points hug the center line a little too much, especially the ranges. Unlike the paste weights, most of the variation here is within the subgroups. To find the major causes of variation, we have to "crack open the subgroups" and see what's inside. What's inside is variation across the oven belt, so let's plot each location separately.
Figure 16: Time series for each position from oven. Breakdown reveals stratification.
This reveals a stratification effect. The left-side cookie is consistently darker than the other two, an "apple" in a sample of "oranges." This was due to an imbalance in the air flow in the oven.
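Here is a minimal sketch of why a stratified subgroup makes the points hug the center line. The readings are invented for illustration (they are not the data behind the figures), and I assume that a lower reflectance reading means a darker cookie. The constants A2 and D4 are the usual table values for subgroups of three.

```python
import random
import statistics

# Standard Shewhart chart constants for subgroups of size n = 3.
A2, D4 = 1.023, 2.574

# HYPOTHETICAL readings, not the article's data. The left side runs darker
# (assumed here to mean a lower reflectance reading).
random.seed(1)
POSITIONS = ("left", "center", "right")
TRUE_MEANS = (44.0, 48.0, 48.5)
subgroups = [[random.gauss(mu, 0.4) for mu in TRUE_MEANS] for _ in range(20)]

xbars = [statistics.mean(g) for g in subgroups]
ranges = [max(g) - min(g) for g in subgroups]
grand_mean = statistics.mean(xbars)
mean_range = statistics.mean(ranges)

# Xbar limits sit A2 * Rbar from the center line; one "sigma zone" is a third of that.
half_width = A2 * mean_range
one_sigma = half_width / 3
hugging = sum(abs(x - grand_mean) <= one_sigma for x in xbars)

print(f"Xbar limits: {grand_mean - half_width:.2f} to {grand_mean + half_width:.2f}")
print(f"Range upper limit: {D4 * mean_range:.2f}")
print(f"{hugging} of {len(xbars)} subgroup means lie within one sigma of the center line")

# Crack open the subgroups: average each belt position separately.
position_means = {
    name: statistics.mean([g[i] for g in subgroups])
    for i, name in enumerate(POSITIONS)
}
print(position_means)
```

Because the left-to-right difference sits inside every subgroup, it inflates the average range and therefore the limits, while the subgroup means themselves barely move. The per-position averages show where the variation actually lives.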
To reduce the variation, correct the air flow. But suppose you can't? Then we might adjust the data by subtracting the mean value for that location and plotting the residuals. 24 Having done so with the three oven zones, the new limits reveal hitherto hidden signals of time-related changes.
Figure 17: "Cookie color change anomalies." Adjusted data now reveals time-related assignable causes.
The degree of stratification changed (starting around Sample 7) and the mean value dipped at Samples 13-14. But since you can't fit an average to an unstable process, the adjustment itself is now suspect and we must be careful about how we interpret and rely on the analysis.
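The adjustment can be sketched in a few lines, under loud assumptions: the readings are invented, the dip at samples 13 and 14 is planted by hand, and the limits are ordinary Xbar limits for subgroups of three (A2 = 1.023). The helper name out_of_limits is mine, not anything from the original analysis.

```python
import random
import statistics

A2 = 1.023  # Shewhart Xbar-chart constant for subgroups of size 3

# HYPOTHETICAL readings: a fixed offset per belt position, plus a dip
# in all three positions at samples 13 and 14 (1-based).
random.seed(2)
OFFSETS = (44.0, 48.0, 48.5)
subgroups = []
for sample in range(1, 21):
    dip = -1.5 if sample in (13, 14) else 0.0
    subgroups.append([random.gauss(mu + dip, 0.3) for mu in OFFSETS])

def out_of_limits(groups):
    """Return 1-based sample numbers whose subgroup mean falls outside Xbar limits."""
    xbars = [statistics.mean(g) for g in groups]
    rbar = statistics.mean([max(g) - min(g) for g in groups])
    center = statistics.mean(xbars)
    return [i + 1 for i, x in enumerate(xbars) if abs(x - center) > A2 * rbar]

# Adjustment: subtract each position's overall mean, leaving residuals.
position_means = [statistics.mean(col) for col in zip(*subgroups)]
residuals = [[x - m for x, m in zip(g, position_means)] for g in subgroups]

before = out_of_limits(subgroups)
after = out_of_limits(residuals)
print("signals before adjustment:", before)
print("signals after adjustment: ", after)
```

With the position effect removed, the within-subgroup ranges shrink, the limits tighten, and the planted dip stands out. Note that each position mean was computed over the whole run, dip included, so the residuals are slightly biased by the very instability they expose. That is the caution above in miniature.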
"We do not have, in fact, an observing network capable of ensuring long-term climate records free of time-dependent biases."—NOAA, USCRN Program Overview
Summary
Science is based on data, and data collection and analysis are messy and subject to many influences. Difficulties arise in the definition, instrumentation, collection, and adjustment of the data.
Measurements are not always comparable, and great care must be taken to ensure operational definitions meaningful to the research objectives. Never use data collected for other purposes if you don't know how the measurements were in fact carried out.
Probabilities depend on the statistical model applied, and for an unstable process there may be no such model. Even for stable processes, the appropriate model may not be the Normal; yet many folks use the Good Ol' Bell Curve as a substitute for thought.
Data often need adjustments before use. There may be outliers or missing data points, or even external constraints. The three angles of a triangular part ought to sum to 180°. If they don't, the measured angles must be corrected. 25
But an unstable process cannot be meaningfully adjusted. The adjustment of data—to external constraints, or to a regression model or average—is fraught with peril, the most dangerous of which is forgetting that you're working with adjusted data at all.
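For the triangle constraint mentioned above, one simple possibility is a least-squares adjustment that shares the closure error among the three angles in proportion to each measurement's variance. This is a sketch under my own assumptions, with invented measurements and a hypothetical function name; it is one way of doing the correction, not necessarily the one a given analyst would choose.

```python
def adjust_angles(measured, variances=(1.0, 1.0, 1.0), total=180.0):
    """Least-squares adjustment of angles so they sum to `total`.

    The closure error is shared out in proportion to each angle's
    measurement variance; equal variances give an equal split.
    """
    error = sum(measured) - total
    weight_sum = sum(variances)
    return [m - error * v / weight_sum for m, v in zip(measured, variances)]

# HYPOTHETICAL measurements in degrees; they sum to 180.6.
measured = [59.8, 60.5, 60.3]

# Equal precision: each angle gives up a third of the 0.6 degree excess.
adjusted = adjust_angles(measured)
print([round(a, 2) for a in adjusted], round(sum(adjusted), 6))

# If the third angle is assumed to be measured less precisely,
# it absorbs more of the correction.
weighted = adjust_angles(measured, variances=(1.0, 1.0, 4.0))
print([round(a, 2) for a in weighted], round(sum(weighted), 6))
```

Either way the adjusted angles sum to 180°, but they are no longer the measurements. Anything computed from them inherits the assumptions that went into the adjustment.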
Exaggerated by some, downplayed by others, all of these considerations come into play for climate statistics, a topic we may return to when the Allegories are Explained.
Endnotes
1 Sheds a new light on "falsification," too, doesn't it?
2 Yeah, I know. Makes you wonder about the aether thingie.
3 Duh? Variation among different individuals is called "reproducibility." Measurement systems also possess various degrees of bias and linearity, stability, and resolution.
4 W. Edwards Deming, Quality, Productivity, and Competitive Position (MIT Press, 1982), pp. 326–327, citing Halliday and Resnick, Fundamentals of Physics (Wiley, 1974).
5 But see João Magueijo, Faster Than the Speed of Light (Perseus Pub., 2003) and "Eifelheim" (Analog, Nov. 1986)
6 American Society for Testing and Materials
7 Richard Feynman, "Cargo Cult Science" (Commencement address, Caltech, 1974)
8 "Fact" from L. factum est, "that which has been made or accomplished."
9 Yes. The author admits to committing irony here.
10 The problem is worse when the "instrument" is a questionnaire.
11 This is why models are almost always reported as more precise than they really are. Scientists are interested in how tightly they have locked in the model; users are interested in the precision of the predictions.
12 The mean of all the sample values except the extreme values; esp. where the extremes are likely to be faulty measurements.
13 Yes. It's called kriging [hard g]. Basically, the missing value at location X is estimated as the average of the surrounding locations. Details, and complications, are beyond the scope of this essay.
14 A good discussion of sampling in general: A Sampler on Sampling, by Bill Williams. More technical, but still accessible: Case Studies in Sample Design, by A. C. Rosander.
15 Western Electric Quality Control Handbook (Bonnie Small, ed.)
16 Otherwise, you are a Neoplatonic woo-woo.
17 Ha ha. Sorry. I couldn't resist.
18 Invented by Walter Shewhart of old Bell Labs: The Economic Control of Quality of Manufactured Product (Van Nostrand, 1931).
19 This "tampering" actually increases overall variation!
20 Note: "is consistent with." Avoid saying that the data or the process "is normal." The real world can do whatever it darn well pleases. Be grateful some statistical models are often useful approximations.
21 This useful taxonomy is derived from the Western Electric Quality Control Handbook (Bonnie Small, ed.)
22 And I have always wondered: What if I had shown them Reactor C first?
23 Turned out, Reactor C was piped differently than D (and A and B).
24 In climate science residuals are called "anomalies."
25 And there are at least two legitimate ways of doing so. See Deming, Statistical Adjustment of Data.
Useful Reading
• Box, George E. P., William G. Hunter, and J. Stuart Hunter. Statistics for Experimenters (John Wiley & Sons, 1978)
• Deming, W. Edwards, Statistical Adjustment of Data (Dover, 1964)
• Deming, W. Edwards, H.F. Dodge, Leslie Simon, et al. "Control Chart Method of Analyzing Data," American War Standard Z1.2-1941 (American Standards Association)
• Duhem, Pierre. "Some Reflections on the Subject of Experimental Physics" (1894) in Duhem, Essays in the History and Philosophy of Science, tr. Ariew and Barker. (Hackett Publishing, 1996).
• Feynman, Richard. "Cargo Cult Science" (Commencement address, Caltech, Engineering and Science, 1974); on-line at http://calteches.library.caltech.edu/51/2/CargoCult.pdf
• Flynn, Michael. "Garbage Out: The Fine Art of Putting Garbage In," Annual Quality Congress Transactions (ASQC, 1986).
• Rosander, A.C. Case Studies in Sample Design (Marcel Dekker, 1971)
• Shewhart, Walter. The Economic Control of Quality of Manufactured Product (Van Nostrand, 1931)
• Small, Bonnie, ed. Western Electric Quality Control Handbook (Mack Printing Co., 1956)
• Williams, Bill. A Sampler on Sampling (John Wiley & Sons, 1968)
* * *
Foreshadowing and the Ides of March: How to (sort of) Hint at Things to Come
Special Feature Richard A. Lovett | 3159 words
There is a scene in "Romeo and Juliet" where a worried Juliet exclaims, "O God, I have an ill-divining soul! Methinks I see thee [Romeo]... as one dead in the bottom of the tomb." It's a classic example of foreshadowing. Juliet fears that Romeo will die... which is exactly what ultimately happens.
Foreshadowing is the art of hinting at what is to come. It's the scary music when characters in a horror film enter an abandoned house, or Han Solo in Star Wars saying, "I have a bad feeling about this," when he, Luke, Chewbacca, and Princess Leia escape down a trash chute into what turns out to be a giant trash compactor.
English teachers tend to make a big deal of foreshadowing because it's easy to spot and fun to discuss. But that can make beginning writers think everything must be overtly foreshadowed.
"[Foreshadowing] shouldn't be too obvious to most readers," says novelist and Analog contributor Brenda Cooper. In most cases, she adds, readers shouldn't even realize that an event was foreshadowed until it actually happens. Nebula-winning novelist Gregory Benford agrees. "Otherwise, readers sniff the ending and lose interest," he says.
But this doesn't mean that foreshadowing is unimportant. Properly done, it serves the critical function of setting the stage for what is to follow. If you do too little of it, you get something like this:
Martin Smith was sitting on the couch flipping through his grandmother's old photo album when he was interrupted by a knock at the door. He got up, walked across the living room, and opened it.
Outside, a pizza deliveryman was waiting. The deliveryman pulled out a gun and fired three shots. Martin dropped dead. The deliveryman stepped across his body into the living room and quickly spotted the photo album. "That's convenient," he muttered, scooped it up, shut the door, and headed back to his car.
That's extremely unsatisfying because Martin's murder and the theft of the photo album come out of the blue. One moment Martin thinks he's getting a pizza... and ten words later, he's dead. I might just as well have written, "That night a thief, posing as a pizza deliveryman, knocked on Martin Smith's door, gunned him down, and stole his grandmother's old photo album."
How can we fix it?
There are lots of possibilities, but one is to start with a prior scene in which Martin is at work, where he gets a call from someone claiming to be a museum curator, who offers him $250, $500, $1,000, possibly even more, until Martin tells him the thing isn't for sale at any price, and hangs up. Martin could be disturbed enough to tell this to his best friend (the real protagonist of our story), who will remember it after Martin dies.
Hours later, preferably after some intervening action (possibly including character development to help establish other protagonists to carry on once Martin is gone), we can now have the following scene:
Martin Smith was tired. Too tired to do anything but sit on the couch watching the evening news. The North Koreans were again threatening to nuke half the globe, while down on Fifth Street some idiot had carjacked an old lady's car, only to discover that in the rush-hour traffic there was nowhere to go. He'd fled on foot, but only after leaving the poor woman in critical condition. But Martin was having trouble focusing. Why did anyone want his grandmother's old photos?
Ignoring the TV, he walked to the bookcase and pulled out the album. But it was just a bunch of family photos, plus vacation shots labeled with things that looked like airport codes. MSQ w/ Bill & Cindy —that type of thing. His grandmother had met people wherever she went and seemed to have photographed them all.
He returned to the couch, intending to look up MSQ on Google, but was interrupted by a knock at the door. Glancing out the window, he spotted a pizza-delivery car by the curb. Odd, he thought, I didn't order a pizza.
Walking all five paces across his tiny living room, he opened the door and found himself facing an unusually old pizza man. Midfifties at least, with steel-grey eyes untouched by the smile attempting to curl his lips.
"Martin Smith?"
It took Martin a moment to notice the gun poking from beneath the pizza box.
"Uh..." Stepping back, he tried to slam the door, but was too late. The gun popped—once, twice, three times—a muffled poof, barely audible above the TV.
Martin fell backward and the gunman stepped over him, quickly spying the photo album on the coffee table. "That's convenient," he muttered as though Martin were still alive to hear. He scooped it up, kicked Martin's feet far enough back into the apartment to allow the door to close, and headed for the street.
Better? I hope so. I added a lot of things, some of which, such as the parade of bleak news on the television, the fact Martin never ordered a pizza, and his grandmother's habit of photographing strangers, should help build suspicion that Martin might be in danger. Now when he dies, the reader may not be happy (we might rather see Martin slam the door and escape out a window, photo album clutched under one arm, gunshots ringing behind him) but at least we're not caught totally unprepared.
Poor Miss Bobbit
That's one type of foreshadowing: the use of hints to set up the expectation of danger. But foreshadowing can be far more explicit.
Truman Capote's 1948 short story, "Children on Their Birthdays," opens with the line, "Yesterday afternoon the six o'clock bus ran over Miss Bobbit." Just to make sure we realize the accident is fatal, Capote has his narrator continue: "I'm not sure what there is to be said about it; after all, she was only ten years old, still I know no one of us in this town will forget her."
Capote hasn't simply hinted that Miss Bobbit's life might be short; he's told us so in no uncertain words.
Norman Spinrad did the same in his 1983 Nebula-nominated novel The Void Captain's Tale.
That novel began: "I am Genro Kane Gupta, Void Captain of the Dragon Zephyr, and mayhap this is my todtentale. Of necessity, it is also the tale of Void Pilot Dominique Alia Wu, but she is gone into the Great and Only." A few paragraphs later, Spinrad adds:
On the ninth Jump... the consciousness of Void Pilot Dominique Alia Wu left its material matrix and did not return, though the Dragon Zephyr somehow survived this Blind Jump.
The ship is now marooned about a score light-years from the nearest habited star, without a Pilot.
Again, the ending hasn't just been hinted at. As one reviewer says, "[T]he narrator lays down the entire tale on the first page. There is no suspense regarding what will occur, only how and in what manner it will be presented." 1
It's an incredibly risky thing for a writer to do. Benford, in fact, thinks Spinrad's decision damaged his story. 2 So why would a writer do this?
I can only speak for myself, and I've only done it once, in "Neptune's Treasure" (Analog, Jan/Feb 2010). That novella began:
How old were you when you first saw death? Me, I'll call it twenty-two. 3 It's a good number: one year beyond that at which you can vote and drink... no more arbitrary than the events that killed John Pilkin. The same ones that nearly got me and Floyd killed, too. Somehow, though, seeing someone else die is a whole bunch more real than dodging the bullet yourself.
Like Capote with poor Miss Bobbit, I'm making sure readers know, even before they meet him, that John Pilkin is going to die. My reason is that the story is about Floyd and Brittney, and the impact on them of Pilkin's death (which occurs only about one-third of the way through the story). I don't want readers getting too invested in Pilkin himself.
Capote seems to be doing something similar. The story isn't about Miss Bobbit's death; it's about the effect she has on the town before she dies. Spinrad's novel is about sexual obsession strong enough to condemn ten thousand people to slow death. It is not about survival, which may be why he tips the ending before he starts.
Tachyon Explosions
There's an intermediate level of foreshadowing in which the author hints at events to come more explicitly than needed merely for tone setting, but less explicitly than Capote or Spinrad. Thus, we might read:
I should have paid more attention when Alex Ryder was explaining the risks of reversing the neutron flow in the tachyon-pulse generator. After all, the man was missing the middle three fingers on his right hand.
There are a lot of reasons why a writer might do this, but one of the simplest, suggested by blogger Gerry Visco, 4 is to pull the reader through slow sections in the narrative. If you have to spend two pages explaining tachyon-powered space drives, it might help to begin with the threat of disaster to come.
Visco suggests a similar mainstream example: "Susan had no idea when she paid her five dollars for the afternoon matinee that she had just made one of the biggest mistakes of her life." The nice thing about this kind of foreshadowing is that the reader isn't sure what type of calamity is about to befall. Will the tachyon-pulse generator blow up? And what's going to happen to Susan? Will she be taken hostage by terrorists? Step into a time warp? Meet her ex-mother-in-law? The possibilities are endless.
Analog Science Fiction and Fact - 2014-07 Page 33