
The Neuroscience of Intelligence


by Richard J Haier


  Dr. Rauscher’s key clarification regarding the claim that increased general intelligence was an erroneous inference from the original report is correct, but confusion arose in part because the axis of their figure (reproduced here as Figure 5.1) was labeled “IQ equivalents.” Moreover, the University press release that accompanied the original Nature publication contributed to the confusion. My copy of that release (embargoed until 6 PM EDT, October 13, 1993) begins with the finding about “spatial intelligence,” but then quotes the researchers as saying, “Thus, the IQs of subjects participating in the music condition were 8–9 points above their scores in the other two conditions.” The distinction between spatial and general intelligence could have been clearer. After the 1999 exchange in Nature, controversy about a Mozart Effect on any mental ability test persisted, and more studies were published with conflicting and inconsistent results.

  Another, larger meta-analysis was published in 2010 and this one is widely regarded as the final blow (Pietschnig et al., 2010). This comprehensive analysis included nearly 40 studies and over 3,000 participants. An important feature of this analysis was the inclusion of unpublished studies because studies with negative results often fail to be published. This can bias the literature toward positive studies. Another feature was separate analyses for studies done by the original authors of the 1993 report for comparison with studies done by other researchers. Overall, this meta-analysis showed a small effect for Mozart on spatial task performance and a nearly identical small effect for other music conditions. Including the unpublished studies corrected this result to an even smaller effect. The results from only those studies done by the original researchers generally showed greater effects than those from studies by other investigators, indicating a confounding influence of lab affiliation. The meta-analysis authors concluded that, “On the whole, there is little left that would support the notion of a specific enhancement of spatial task performance through exposure to the Mozart sonata KV 448.” The title of this meta-analysis paper said it all, “Mozart effect – Shmozart effect.”
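  To see why adding unpublished studies can shrink a pooled effect, here is a minimal sketch of how a meta-analysis combines studies. It assumes a simple fixed-effect, inverse-variance model chosen for brevity (not necessarily the model Pietschnig et al. used), and every effect size and variance in it is hypothetical, not data from their paper.

```python
"""Toy fixed-effect meta-analysis using inverse-variance weights.

All effect sizes and variances below are HYPOTHETICAL illustrations,
not data from Pietschnig et al. (2010).
"""
from math import sqrt


def pooled_effect(studies):
    """Return (pooled effect d, standard error) for (d, variance) pairs.

    Each study is weighted by 1 / variance, so precise studies count more.
    """
    weights = [1.0 / var for _, var in studies]
    total = sum(weights)
    d_pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / total
    return d_pooled, sqrt(1.0 / total)


# Hypothetical published studies reporting positive effects.
published = [(0.40, 0.05), (0.30, 0.04), (0.25, 0.06)]
# Hypothetical unpublished studies that found essentially nothing.
unpublished = [(0.00, 0.05), (0.05, 0.04)]

d_pub, se_pub = pooled_effect(published)
d_all, se_all = pooled_effect(published + unpublished)
print(f"published only:         d = {d_pub:.2f} (SE {se_pub:.2f})")
print(f"with unpublished added: d = {d_all:.2f} (SE {se_all:.2f})")
```

With these made-up inputs the pooled estimate drops from about 0.32 to about 0.20, which is the general pattern the text describes when the "file drawer" of null results is opened.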

  Whether the newborns of Georgia got their CDs or not, it is clear that the public understanding of the 1993 Nature report went far beyond what the study’s authors intended. The original study had been conducted at my university under the auspices of the prestigious Center for Learning and Memory, and I knew the senior author, Dr. Gordon Shaw. Although I was unaware of the study before it was published, I subsequently had a number of affable conversations with him. A physicist by training, he was interested in the brain and problem-solving and, before he passed away, he was developing a theory relating the complexity of music composition to cognition. He regretted the widespread misunderstandings about the original finding and general intelligence, but he remained convinced that music and cognition were linked in a positive way. Considerable research supports this view and his work with Dr. Rauscher helped stimulate interest in this important area. Whatever the many rich benefits of music exposure and training are, increased intelligence, general or spatial, is not one of them. The Mozart Effect should be a cautionary tale for any researcher who claims dramatic increases in IQ after an intervention. Unfortunately, the lessons have not been taken to heart, and such claims continue.

  5.2 Case 2: You Must Remember This, and This, and This …

  Another extraordinary claim about increasing intelligence was published as the cover article in the Proceedings of the National Academy of Sciences (PNAS) (Jaeggi et al., 2008). Mozart was not mentioned, but the report claimed that training on a difficult task of working memory resulted in a “dramatic” improvement on a test of fluid intelligence. As noted in Chapter 1, fluid intelligence (often denoted Gf) is highly correlated with the g-factor, and many intelligence researchers regard the two terms as synonymous. Moreover, this surprising finding was augmented by two important observations: the effect increased with more training, suggesting a kind of dose response, and the effect transferred from the memory training task to an “entirely different” test of abstract reasoning. The authors concluded, “Thus, in contrast to many previous studies, we conclude that it is possible to improve Gf without practicing the testing tasks themselves, opening a wide range of applications.”

  This pronouncement was a bombshell. It received wide media coverage and public attention. Like the original Mozart report, this report also was seized upon by researchers intent on showing that general intelligence could not be something fixed or genetic because it could be increased dramatically with a memory training exercise. For most experienced intelligence researchers, however, this claim immediately was reminiscent of the 1989 cold fusion claim of an astonishing breakthrough thought to be impossible by most physicists. The cold fusion result turned out to be a heat measurement error made by eminent researchers in one field who were inexperienced in the measurement technicalities required in another field. Could measurement error of fluid intelligence possibly be a factor in the PNAS report? You know where this is going.

  The rationale for the PNAS memory training study was simple. Memory is a well-established component of intelligence, so improving memory by training could improve intelligence. Even setting aside the possibility that both could be related to a third underlying factor, a critical requirement for testing this simple train of thought is that the training task must be independent of the intelligence test. In other words, the memory training effect should transfer to a completely different test that did not require memory. For example, training a person to memorize the order of cards in a deck might transfer to their ability to remember a sequence of 52 random numbers, but that would be unremarkable because both are similar tests of memory. It would be more impressive if training to memorize cards resulted in better scores on a test of analogies (analogy tests usually have high g-loadings). It would be even more impressive if four weeks of training to memorize cards resulted in twice the improvement on analogy tests as two weeks of card training.

  For the PNAS memory training experiment, 35 university students were randomly assigned to one of four training programs (one student dropped out, leaving 34 participants), and 35 other students were assigned to four control groups that received no training. Of the 70 students, the male/female ratio was about 50–50. Thus, each group was very small, with only about eight participants, male and female combined. Each participant was tested before and after training (or over the same intervals for controls) on either the Raven’s Advanced Progressive Matrices (RAPM, discussed in previous chapters and used for only one training group) or the similar Bochumer Matrizen-Test (BOMAT) of abstract reasoning, used for three training groups. There was no explanation for using two tests instead of one. Each test had two forms, one for the pre-test and one for the post-test. The four training programs differed in the number of training sessions between the pre- and post-testing: 8, 12, 17, and 19 days. These same intervals defined pre- and post-testing for the four control groups, who received no training.
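  Why do groups of about eight matter? A rough sketch of statistical power makes the point. It uses a normal approximation to a two-group comparison (which slightly overstates power at very small samples) and assumes, purely for illustration, a two-sided test at alpha = 0.05 and a true "medium" effect of d = 0.5; neither value comes from the study.

```python
"""Approximate power of a two-group comparison (normal approximation).

Illustrative assumptions, not from the study: two-sided alpha = 0.05
and a true standardized effect of d = 0.5.
"""
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test with equal group sizes."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


for n in (8, 17, 35):
    print(f"n = {n:2d} per group -> power ~ {approx_power(0.5, n):.2f}")
```

Under these assumptions, a comparison with eight people per group would detect a real medium-sized effect only about 17% of the time, and even pooling to about 35 per group gives roughly a coin flip.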

  Memory training for all four groups used a well-known task in cognitive psychology. It’s called the n-back test, where n stands for any integer. A long series of random numbers, letters, or other elements is presented to the participant one at a time on a computer screen. In the 1-back version with letters, for example, the participant presses a button whenever a letter is repeated twice in a row, that is, whenever the current letter is the same as the one 1 back in the series. This is quite easy because it requires keeping only one letter in working memory until the next letter appears. If the next letter is not the same, no button is pressed and the new letter must now be remembered until the next letter appears. In the 2-back version, the button is pressed if the same letter was presented 2 letters before. This requires keeping two letters in working memory. The 3-back version is more difficult, and the 4-, 5-, and 6-back versions become considerably more difficult. Note that the letters (or numbers or whatever elements are used) can be presented visually or through earphones. For this study, participants were trained to do both a visual and an auditory version simultaneously. I’m not kidding. To understate the obvious, this is quite difficult, and it is surprising that only one person dropped out. A more detailed description of the task and a link to an animation showing how it works is in Textbox 5.1.
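  The rule for a single n-back stream is mechanical enough to state as code. This is a minimal sketch, not the software used in the study: a "target" is any item identical to the one n positions earlier, and a set of button presses can be scored as hits, misses, and false alarms. The letter sequence is hypothetical.

```python
def nback_targets(stream, n):
    """Indices i where stream[i] matches the item n positions earlier."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [i for i in range(n, len(stream)) if stream[i] == stream[i - n]]


def score_responses(stream, n, presses):
    """Count hits, misses, and false alarms for a set of button presses."""
    targets = set(nback_targets(stream, n))
    pressed = set(presses)
    return {
        "hits": len(targets & pressed),
        "misses": len(targets - pressed),
        "false_alarms": len(pressed - targets),
    }


letters = list("BKKTBTQB")  # hypothetical letter sequence
print(nback_targets(letters, 1))  # -> [2]  (K immediately follows K)
print(nback_targets(letters, 2))  # -> [5]  (T matches the T two back)
print(score_responses(letters, 2, [5, 6]))
# -> {'hits': 1, 'misses': 0, 'false_alarms': 1}
```

Note how the same sequence yields different targets as n changes: the participant must hold the last n items in mind and update them on every new item, which is what makes higher levels so demanding.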

  Textbox 5.1: The n-back test

  The dual n-back test is illustrated in Figure 5.2 (Jaeggi et al., 2008). The 2-back version is shown, using spatial positions and letters as the elements. The person presses a button whenever the same element is repeated with one intervening element. The spatial positions are presented one at a time visually, and the letters are presented one at a time through headphones. In the dual version, the spatial and the letter elements are presented simultaneously for 500 milliseconds each, with 2,500 milliseconds between elements. The top row of Figure 5.2 shows a sequence of spatial positions (white squares) presented one at a time, starting on the left. The middle element should trigger a button press because it repeats the identical element 2-back (the element at the left end of the row). The bottom row shows the letter version. The middle element “C” should trigger a button press because it is an identical repeat of the “C” 2-back (at the very left end of the row). The “C” at the right end of the row is also a trigger because it is an identical repeat of the middle “C” 2-back. Once a person learns to do this difficult memory task better than chance, they move on to the harder 3-back version, which in turn progresses to 4-back and 5-back versions, and so on until the person can no longer perform better than chance. All this is a bit tricky to understand the first time you read it, but once you get how the n-back works, you’ll appreciate how difficult the training becomes. Remember that the claim is that training on this task increases your fluid intelligence (without giving you headaches). Animated demonstrations of the 2-back test can be found on this book’s website (www.cambridge.org/us/academic/subjects/psychology/cognition/neuroscience-intelligence).

  Figure 5.2 Illustration of the dual n-back memory task. This is a 2-back example. Two versions are run simultaneously. The top row shows the visual–spatial version. The location of the white box in each presentation must be remembered. If the same location is repeated after one intervening presentation, a button is pressed because the same location is repeated 2 presentations back. The bottom row shows the auditory letter version. Each letter presentation is made through earphones. When the same letter is repeated 2-back, a button is pressed. After training on each version separately, both versions are presented simultaneously and people practice until they can perform 3-back, 4-back or more better than chance. In this illustration the order of presentation is from left to right, one presentation at a time.

  Reprinted with permission, Jaeggi et al. (2008). See also Animations 5.1 and 5.2 on this book’s website (www.cambridge.org/us/academic/subjects/psychology/cognition/neuroscience-intelligence).
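  The dual task in Textbox 5.1 can be sketched as two n-back streams scored independently, with a rule for moving up a level. Only the timings (500 ms per element, 2,500 ms between elements) come from the text. The 3 × 3 grid coding of positions, the accuracy threshold `ADVANCE_AT`, and the letters other than “C” in the example block are hypothetical; the progression rule Jaeggi et al. actually used may differ from this one.

```python
"""Dual n-back sketch: two streams scored independently, plus a level rule.

Hypothetical assumptions, for illustration only: positions are cells 0-8
of a 3 x 3 grid, and the level rises when accuracy in BOTH streams
reaches ADVANCE_AT. The rule Jaeggi et al. (2008) actually used may differ.
"""
from dataclasses import dataclass

STIMULUS_MS = 500  # each element lasts 500 ms (from the text)
GAP_MS = 2500      # 2,500 ms between elements (from the text)
ADVANCE_AT = 0.8   # hypothetical accuracy needed to move up a level


@dataclass
class Trial:
    position: int  # visual-spatial element (grid cell)
    letter: str    # auditory element (spoken letter)


def targets(items, n):
    """Indices where an item repeats the item n steps back."""
    return {i for i in range(n, len(items)) if items[i] == items[i - n]}


def accuracy(items, n, presses):
    """Fraction of scorable trials (index >= n) answered correctly."""
    scorable = range(n, len(items))
    if len(scorable) == 0:
        raise ValueError("need more than n trials to score a block")
    hits = targets(items, n)
    correct = sum((i in hits) == (i in presses) for i in scorable)
    return correct / len(scorable)


def next_level(trials, n, spatial_presses, letter_presses):
    """Move up to n + 1 only if both streams reach ADVANCE_AT."""
    pos_acc = accuracy([t.position for t in trials], n, spatial_presses)
    let_acc = accuracy([t.letter for t in trials], n, letter_presses)
    return n + 1 if min(pos_acc, let_acc) >= ADVANCE_AT else n


def block_seconds(num_trials):
    """Duration of a block, assuming each trial is stimulus plus gap."""
    return num_trials * (STIMULUS_MS + GAP_MS) / 1000


# Hypothetical 2-back block echoing Figure 5.2: "C" recurs 2-back twice,
# and the spatial position recurs 2-back once.
block = [Trial(p, c) for p, c in zip([4, 0, 4, 7, 2], "CHCRC")]
print(targets([t.letter for t in block], 2))    # -> {2, 4}
print(targets([t.position for t in block], 2))  # -> {2}
print(next_level(block, 2, {2}, {2, 4}))        # -> 3 (both streams perfect)
print(next_level(block, 2, {3}, {2, 4}))        # -> 2 (spatial stream misses)
print(block_seconds(20))                        # -> 60.0
```

Requiring both streams to pass before advancing is one plausible design; it keeps a participant from progressing on the easier modality alone.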

  The BOMAT test of abstract reasoning is based on visual analogies and is similar to the Raven’s test described in Chapter 1. In the BOMAT, a 5 × 3 matrix has a figure in each cell except one cell, which is blank. The missing figure must be determined from logical rules derived from the other components (shape, color, pattern, number, spatial arrangement of the elements of the figure). The person taking the test must recognize the structure of the matrix and select, from six possible answers, the one that logically completes the matrix. The Raven’s test uses a 3 × 3 matrix, so fewer elements must be retained in working memory while solving each item than with the 5 × 3 matrix used in the BOMAT. This is why the BOMAT is more of a working memory test and why it is similar to the n-back. This similarity undercuts the claim that training on the n-back transfers to a completely different test of fluid intelligence (Moody, 2009).
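  The difference in load is simple arithmetic. This sketch assumes, as a crude proxy, that every visible figure in an item counts once toward working-memory load; real item difficulty also depends on the rules involved.

```python
def figures_to_hold(rows, cols, blanks=1):
    """Visible figures in a matrix item: every cell except the blank(s)."""
    return rows * cols - blanks


bomat = figures_to_hold(5, 3)  # 5 x 3 matrix, one blank
raven = figures_to_hold(3, 3)  # 3 x 3 matrix, one blank
print(f"BOMAT: {bomat}, RAPM: {raven}, ratio ~ {bomat / raven:.2f}")
# -> BOMAT: 14, RAPM: 8, ratio ~ 1.75
```

By this rough count, a BOMAT item presents about 75% more figures to keep in mind than a Raven’s item.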

  The results of training are shown in Figure 5.3. They appeared clear-cut to the authors, but to most intelligence researchers their meaning was far less clear. All the participants in the training sessions were combined into one group (N = 34) and all the controls into another group (N = 35). Average n-back difficulty for the training group increased from about 3-back at the start to about 5-back at the end. The groups did not differ on the pre-test of abstract reasoning. On average, both groups’ abstract reasoning scores increased at post-test: by about 1 point for the control group and about 2 points for the training group. Note that these are not IQ points; they are the number of correctly answered items on the test. This small difference was statistically significant and described as “substantially superior.” When the intelligence test score increase was graphed against days of practice, the group with 8 days showed less than a 1-point increase, whereas the group with 19 days of practice showed nearly a 5-point gain.

  Figure 5.3 The line graphs that claimed a “landmark” result for memory training. Panel (a) shows pre and post n-back training fluid intelligence test scores (y-axis) for training and control groups. Panel (b) shows the gain on intelligence test scores (y-axis) plotted against the number of training days.

  Reprinted with permission, Jaeggi et al. (2008).

  The authors boldly concluded, “The finding that cognitive training can improve Gf [fluid intelligence] is a landmark result because this form of intelligence has been claimed to be largely immutable. Instead of regarding Gf as an immutable trait, our data provide evidence that, with appropriate training, there is potential to improve Gf. Moreover, we provide evidence that the amount of Gf gain critically depends on the amount of training time. Considering the fundamental importance of Gf in everyday life and its predictive power for a large variety of intellectual tasks and professional success, we believe that our findings may be highly relevant to applications in education.” I do not know whether they contacted the Governor of Georgia, or any other state, with this newsflash, but they ignited a memory training frenzy.

  The first devastating critique came quickly (Moody, 2009). Dr. Moody pointed out several serious flaws in the PNAS cover article that rendered the results uninterpretable. The most important was that the BOMAT used to assess fluid reasoning was administered in a flawed manner. The items are arranged from easy ones to very difficult ones. Normally, the test-taker is given 45 minutes to complete as many of the 29 problems as possible. This important fact was omitted from the PNAS report. The PNAS study allowed only 10 minutes to complete the test, so any improvement was limited to relatively easy items because the time limit precluded getting to the harder items that are most predictive of Gf, especially in a sample of college students with restricted range. This non-standard administration transformed the BOMAT from a test of fluid intelligence into a test of easy visual analogies with, at best, an unknown relationship to fluid intelligence. Interestingly, the one training group that was tested on the RAPM showed no improvement. A crucial difference between the two tests is that the BOMAT requires the test-taker to keep 14 visual figures in working memory to solve each problem, whereas the RAPM requires holding only eight (in each matrix, one element is blank until the problem is solved). Thus, performance on the BOMAT depends more heavily on working memory. This is exactly what the n-back task trains, especially because the version used for training included the spatial position of matrix elements, quite similar to the format used in the BOMAT problems (see Textbox 5.1). As noted by Moody, “Rather than being ‘entirely different’ from the test items on the BOMAT, this [n-back] task seems well-designed to facilitate performance on that test.” When this flaw is considered along with the small samples and issues surrounding small change scores of single tests, it is hard to understand the peer review and editorial processes that led to a featured publication in PNAS claiming an extraordinary finding contrary to the weight of evidence from hundreds of previous reports.
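  One reason change scores on a single test are treacherous comes from classical test theory: when two test forms have equal variances, the reliability of the difference (post minus pre) is r_DD = ((r_pre + r_post) / 2 − r_pre_post) / (1 − r_pre_post). The sketch below applies that formula; the reliabilities and the pre/post correlation plugged in are hypothetical, not values reported for the BOMAT or RAPM.

```python
def difference_score_reliability(r_pre, r_post, r_pre_post):
    """Reliability of post-minus-pre scores, assuming equal variances.

    r_pre, r_post: reliabilities of the pre-test and post-test forms.
    r_pre_post:    correlation between pre-test and post-test scores.
    """
    if not r_pre_post < 1:
        raise ValueError("pre/post correlation must be below 1")
    mean_rel = (r_pre + r_post) / 2
    return (mean_rel - r_pre_post) / (1 - r_pre_post)


# Hypothetical values: two fairly reliable forms that correlate well.
for r_xy in (0.5, 0.6, 0.7):
    rel = difference_score_reliability(0.80, 0.80, r_xy)
    print(f"pre/post r = {r_xy:.1f} -> change-score reliability ~ {rel:.2f}")
```

Even when each form is reasonably reliable on its own, the gain score can be much less reliable, so a change of one or two items on a single short test carries a lot of noise.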

  Subsequent n-back/intelligence research has progressed in stages similar to those in the Mozart Effect story. Dr. Jaeggi and colleagues published a series of papers addressing some of the key design flaws of the original study and reported results consistent with their original report (Jaeggi et al., 2010, 2011, 2014), as did some other researchers. Far more studies by other investigators, however, failed to replicate the original claim of increased Gf. This was especially true of studies with more sophisticated designs that used larger samples and multiple cognitive tests, so that Gf could be estimated as a latent variable alongside other intelligence factors, to determine whether improved n-back performance transferred to increased intelligence scores (Chooi & Thompson, 2012; Colom et al., 2013; Harrison et al., 2013; Melby-Lervag & Hulme, 2013; Redick et al., 2013; Shipstead et al., 2012; Thompson et al., 2013; Tidwell et al., 2014; von Bastian & Oberauer, 2013, 2014).

  Undaunted by these independent failures to replicate, Jaeggi’s group published their own meta-analysis, including the negative studies. Their analysis supported a 4-point IQ increase due to n-back training (Au et al., 2015). They ignored warnings about IQ conversions and change scores, and they failed to note that 4 points is the estimated standard error of measurement of IQ tests. Other researchers quickly reanalyzed this meta-analysis (Bogg & Lasecki, 2015). They concluded that the small effect reported by Au and colleagues likely resulted from the small sample sizes of most of the included studies, which left them statistically underpowered and biased toward spurious results. Therefore, they cautioned that the small training effects on Gf could be artifacts. Another comprehensive independent meta-analysis of 47 studies concluded that there were no sustainable transfer effects for memory training (Schwaighofer et al., 2015), although the authors encouraged more research with better study designs. Finally, there is also some evidence that small apparent increases in test scores after memory training can be due to improved task strategies rather than to increased intelligence (Hayes et al., 2015).
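  The 4-point figure can be checked against the classical test theory formula for the standard error of measurement, SEM = SD × sqrt(1 − reliability). This sketch uses the conventional IQ standard deviation of 15 and assumes a reliability of 0.93, an illustrative value rather than one taken from any particular test manual. It also assumes independent measurement errors at pre-test and post-test when computing the error of a change score.

```python
from math import sqrt

IQ_SD = 15  # conventional standard deviation of the IQ scale


def sem(sd, reliability):
    """Classical test theory standard error of measurement."""
    return sd * sqrt(1 - reliability)


error = sem(IQ_SD, 0.93)       # reliability of 0.93 is an assumed value
change_error = sqrt(2) * error  # assumes independent errors at pre and post
gain = 4                        # the meta-analytic gain discussed in the text

print(f"SEM of one score     ~ {error:.1f} IQ points")
print(f"95% band, one score  ~ +/- {1.96 * error:.1f} points")
print(f"SEM of a change score ~ {change_error:.1f} points")
print(f"a {gain}-point gain      = {gain / error:.2f} SEMs")
```

Under these assumptions, a 4-point gain is about one standard error of a single score and smaller than the standard error of a pre/post change, which is why such a gain is hard to distinguish from measurement noise.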

 
