Forensic Psychology


by Graham M Davies


  9.5 DETECTING DECEPTION FROM VERBAL CONTENT

  Is it possible to detect deception on the basis of what is being said? If so, what should one listen for, and is it better to use human coders or computer programs? Or is it better to ignore the verbal content altogether and instead use special equipment to analyse the speaker’s voice? We address these questions by examining a number of deception detection methods and, where possible, report on each method’s discriminative power (see Vrij, 2015a, for a detailed review).

  9.5.1 Statement Validity Analysis

  Statement validity analysis (SVA) is the most widely used technique for assessing veracity on the basis of verbal content. It was originally developed in Germany and Sweden for assessing children’s accounts of alleged sexual abuse (Trankell, 1963; Undeutsch, 1967). Underlying the technique is the Undeutsch hypothesis, which states that if a child’s statement is based on the memory of an actual experience, it will differ in content and quality from a statement based on fabrication (Steller & Köhnken, 1989). A full SVA is a four-stage procedure (Köhnken, 2004). The first stage consists of a thorough analysis of the case file, which in turn forms the basis for generating hypotheses about the source of the statement. Second, a semi-structured interview is conducted, in which the child tells his or her own story. The interview is audio-taped and transcribed. Third, the statement is assessed in terms of credibility, using the so-called Criteria-Based Content Analysis (CBCA). The CBCA is based on a list of 19 criteria, for example “logical structure”, “descriptions of interactions” and “self-deprecation”. Finally, by using a so-called validity checklist, alternative explanations for the CBCA outcome are considered (Steller & Köhnken, 1989).

  The CBCA constitutes the core of the SVA. Its 19 criteria are grouped into five categories: (1) general characteristics; (2) specific contents; (3) peculiarities of content; (4) motivation-related contents; and (5) offence-specific elements. The presence of each of the 19 criteria is rated, and the more criteria that are present, and the stronger the presence of each criterion, the stronger the hypothesis that the memory is based on a genuine personal experience (see Köhnken, 2004, for a full list and an in-depth discussion of the criteria).
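  The logic of CBCA scoring can be illustrated with a few lines of code. The sketch below is a minimal, hypothetical tally: the 0-2 rating scale, the abbreviated criterion list and the idea of summing ratings are simplifications for illustration, since real CBCA evaluations are made qualitatively by trained evaluators.

```python
# Minimal sketch of CBCA-style tallying (illustrative assumptions only).
# The 0-2 presence scale and the shortened criterion list are hypothetical;
# the published instrument lists 19 criteria in five categories and is not
# reduced to a simple sum in practice.

CRITERIA = [
    "logical structure",             # general characteristics
    "descriptions of interactions",  # specific contents
    "self-deprecation",              # motivation-related contents
    # ... remaining criteria omitted for brevity
]

def cbca_score(ratings: dict) -> int:
    """Sum presence ratings (0 = absent, 1 = present, 2 = strongly present)."""
    return sum(ratings.get(criterion, 0) for criterion in CRITERIA)

example = {"logical structure": 2,
           "descriptions of interactions": 1,
           "self-deprecation": 0}

# The higher the total, the stronger the hypothesis that the statement is
# based on a genuine personal experience (the Undeutsch hypothesis).
print(cbca_score(example))
```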

  To what extent can CBCA discriminate between genuine and fabricated accounts? Before attempting to answer this question, one needs to consider that although the CBCA is widely used in court in some countries (Vrij, 2008), it is problematic to use real-life cases to assess the diagnostic value of the technique. In most cases where the technique is used, there is no evidence other than the statement itself. In such cases, we simply do not know whether the accounts are based on genuine or fabricated experiences (i.e. the “ground truth” is unknown).

  In order to obtain information on the technique’s diagnostic value, laboratory studies are needed. Vrij (2005a) reviewed the first 37 studies from laboratory settings (there are now at least 50 studies on SVA/CBCA). The majority of these studies focused on adults’ statements, which is not considered problematic, as the Undeutsch hypothesis is not limited to children’s statements (Köhnken, 2004). Vrij’s review showed an overall accuracy rate of 73%, and the technique proved to be equally good at detecting truthful and fabricated accounts.

  9.5.2 Reality Monitoring

  The term reality monitoring (RM) has been used in basic memory research for many years, and refers to people’s ability to discriminate between self-experienced and imagined events (Johnson & Raye, 1981). Research on RM supports the notion that real experiences are products of perceptual processes, whereas imagined events are products of reflective processes. As a consequence, memories of real events tend to differ from memories of imagined events. Specifically, memories of real events tend to contain more perceptual information (e.g. details concerning taste, touch, smell) and contextual information (i.e. spatial and temporal details), than memories of imagined events.

  Alonso-Quecuty (1992) was the first to suggest that RM could be used as a tool not only to distinguish between one’s own real and imagined events, but also between other people’s real and imagined events, thus framing RM as a tool for distinguishing truthful from deceptive accounts. Many researchers have picked up on her idea (Sporer, 2004). Masip, Sporer, Garrido, and Herrero (2005) provided the first systematic overview of RM studies conducted within the deception detection framework. Overall accuracy was 75%, and the technique proved to be about equally accurate in detecting truths and lies and worked equally well for children’s and adults’ statements. In sum, considering that the technique rests upon a well-established theoretical framework and that the criteria are relatively easy to learn, the RM technique is an interesting alternative to CBCA.

  9.5.3 Scientific Content Analysis

  The scientific content analysis (SCAN) technique was developed by Sapir, a former Israeli polygraph examiner. As with SVA and RM, the underlying assumption of SCAN is that a statement based on memory of an actual experience differs in content from a statement based on invention. SCAN uses written statements, preferably handwritten by the examinee (to ensure that the examinee’s own words are produced). The list of SCAN criteria is extensive, and includes “Denial of allegations”, “Emotions” and “Change in language”. A list of 12 SCAN criteria is used in research (Smith, 2001), in workshops where the technique is taught (Driscoll, 1994) and in field observations of the technique (Bogaard, Meijer, & Vrij, 2014). Compared to the CBCA and RM, SCAN is much less standardised in terms of coding.

  To date, there has been little research on the diagnostic value of SCAN (Nahari, Vrij, & Fisher, 2012). To our knowledge there are only a handful of published studies on SCAN. The laboratory studies show that truthful and deceptive statements do not differ with regard to the SCAN criteria tested (Nahari et al., 2012; Porter & Yuille, 1996), and for the field studies the ground truth is unknown (Driscoll, 1994; Smith, 2001). The name of the method implies scientific status, but one should be aware that the scientific evidence supporting the technique is very meagre (Shearer, 1999).

  9.6 COMPUTER-BASED LINGUISTIC ANALYSIS

  Yet another approach to the detection of deception is to examine the linguistic structure of statements. Scientific studies within this approach began to appear as early as the late 1960s, and the basic idea is easy to grasp: people’s choice of words may reveal more about their underlying mental states than the actual message does (Pennebaker & King, 1999). There are several methods for conducting a linguistic analysis (essentially, decomposing natural-language text to the word level), but we will restrict our discussion to one method.

  9.6.1 Linguistic Inquiry and Word Count (LIWC)

  Linguistic Inquiry and Word Count (LIWC) is a computer-based technique that creates linguistic profiles by categorising words into different classes, such as (1) standard language dimensions (e.g. pronouns and articles), (2) psychological processes (e.g. emotional and sensory processes) and (3) relativity (e.g. space and time).

  Studies using LIWC have found that some words are less frequent in deceptive statements (e.g. first-person pronouns), whereas others are more frequent (e.g. negative emotional words). Although LIWC has been shown to discriminate between deceptive and truthful statements at better-than-chance levels, an examination of the hit rates reveals room for improvement. For example, Newman, Pennebaker, Berry, and Richards (2003) found an average hit rate (over three studies) of 67%. Interestingly, recent research shows that automatic coding of RM criteria in liars’ and truth tellers’ statements using the LIWC software produced fewer verbal cues to deception than manual coding of the very same criteria (Vrij, Mann, Kristen, & Fisher, 2007). Finally, and importantly, decomposing text to the word level discards context. Since the context of a statement is important in forensic settings (e.g. the statement’s development over time), this loss may limit the usefulness of linguistic analysis in legal contexts.
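  As a rough illustration of the word-level approach, the sketch below counts how large a proportion of a statement’s words fall into a few hand-made categories. The mini-dictionaries and category names are invented stand-ins; the actual LIWC software uses validated dictionaries containing thousands of entries.

```python
# Illustrative word-category counting in the spirit of LIWC.
# The word lists below are invented examples, not the LIWC dictionaries.
import re

CATEGORIES = {
    "first_person_pronouns": {"i", "me", "my", "mine", "myself"},
    "negative_emotion": {"hate", "angry", "afraid", "hurt", "sad"},
}

def category_proportions(text: str) -> dict:
    """Return, for each category, the share of words that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero for empty input
    return {name: sum(word in wordlist for word in words) / total
            for name, wordlist in CATEGORIES.items()}

statement = "I was afraid he would hurt me, so I took my keys and ran."
print(category_proportions(statement))
# Fewer first-person pronouns and more negative emotion words have been
# reported as (weak) correlates of deception (Newman et al., 2003).
```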

  9.6.2 Computer Analysis of Voice Stress

  Another approach to deception detection is to analyse the voice itself and neglect the verbal content. We will discuss two such approaches: voice stress analysis and layered voice-stress analysis.

  9.6.2.1 Voice stress analysis (VSA)

  Sometimes called the psychological stress evaluator or PSE, the technique rests on the assumption that by measuring the activity in the muscles responsible for producing speech, it may be possible to infer the speaker’s mental state (e.g. experiences of stress). The main phenomena of interest are so-called “micro tremors”, weak involuntary muscle activities that can be detected with electrodes. It is well established that such tremors occur in large groups of muscles, for example the biceps. However, there is very little scientific evidence for the existence of tremors in the muscles that produce speech (Shipp & Izdebski, 1981). If there is no tremor in the muscles that produce speech, there is no tremor to measure in the voice. Even if it were possible to show tremor in the voice, it would still be necessary to demonstrate an association between (1) a certain type of tremor and lying and (2) another type of tremor and telling the truth. Taken together, the VSA seems to suffer from problems relating to both reliability and validity (see the report of the National Research Council, 2003, for a similarly sceptical view).

  9.6.2.2 Layered voice-stress analysis (LVA)

  This is a more recent method, and its advocates claim that it rests upon highly sophisticated technology. The LVA uses a computer program to analyse errors that occur when a raw signal (sound) is digitised. These errors are very difficult for the human ear to pick up, but it is argued that they can be measured by more refined methods. Such errors are not exclusive to the human voice, and can be found for any type of sound (e.g. a clock ticking or a washing machine). The LVA offers statistical output on two such errors, and uses these to calculate a so-called “truth value”. Experts on forensic phonetics have equated the diagnostic value of the LVA to flipping a coin (Eriksson & Lacerda, 2007).

  9.7 PSYCHO-PHYSIOLOGICAL DETECTION OF DECEPTION

  The third major branch of research on deception detection focuses on differences in psycho-physiological patterns, which are typically measured using the polygraph.

  9.7.1 Development of Psycho-Physiological Detection of Deception

  Physiological deception detection approaches have a long history. In China, suspected transgressors were forced to chew rice powder and spit it out. If the rice powder was still dry, the suspect was deemed to be guilty (Sullivan, 2001). Underlying this approach is the assumption that liars and truth tellers differ in terms of physiological responses – in the case of the rice powder technique, a decrease in saliva production was interpreted as a result of fear of being caught lying.

  The polygraph in its present form is an instrument built to measure the same kinds of physiological processes that the ancient Chinese technique attempted to tap. Although the modern polygraph is more technically sophisticated, its basic function is the same today as it was almost 100 years ago (Grubin & Madsen, 2005). The polygraph measures at least three physiological systems, all governed by the autonomic nervous system (Fiedler, Schmid, & Stahl, 2002): typically galvanic skin response (sweating from the palm), cardiovascular activity such as systolic and diastolic blood pressure (measured by a cuff on the upper arm) and breathing patterns (measured by sensors attached around the chest).

  PHOTO 9.2 The polygraph measures at least three physiological systems: typically galvanic skin response, cardiovascular activity and breathing patterns.

  Source: © Andrey Burmakin/Shutterstock

  In the US, polygraph testing is used in many areas of law enforcement and in security screening (Honts, 2004), and polygraph tests play a role in the legal systems of Belgium, Canada, Israel, Japan, Korea, Mexico, Thailand and Turkey (Honts, 2004; Meijer & Verschuere, 2015; Pollina, Dollins, Senter, Krapohl, & Ryan, 2004; Vrij, 2008). Use varies across these countries; in Japan, polygraph evidence is generally admissible (Hira & Furumitsu, 2002), whereas most countries are restrictive about the use of polygraph tests in court (Vrij, 2008).

  9.7.2 The Control Question Test (CQT)

  There are two main families of polygraph tests (Palmatier & Rovner, 2015). The most frequently used is the Control Question Test (CQT), sometimes referred to as the Comparison Question Test (Honts, 2004; Honts & Reavy, 2015), which is widely applied in law enforcement in the US, Canada and Israel (Ben-Shakhar, Bar-Hillel, & Kremnitzer, 2002).

  The CQT is administered in several stages (Lykken, 1998). In the introductory phase, rapport is established, basic information is obtained and the subject is invited to provide free recall. Questions are then formulated, and the subject and the polygraph examiner discuss these questions. The first reason for this is to establish that the subject has understood all the questions. The second reason is that the examiner wants to be sure that the subject will respond to the questions with “yes” or “no” (Vrij, 2008). After this, the question phase commences. This phase is run several times, and the responses are averaged across the different test occasions. There are three categories of question. The first category is irrelevant questions or neutral questions (“Is your last name Morris?”, “Do you live in the United States?”); these questions are not included in the analysis of the results. The second category of questions is the relevant questions that directly concern the crime under investigation (“Did you break into the house on Stanley Street?” “Did you shoot Mr. Philip?”). The third category is the control questions, which concern likely transgressions in the past, unrelated to the event under scrutiny (“Before the age of 25, did you ever take something that did not belong to you?”). These questions are designed to force everyone to give a deceptive response, both because they are vague enough to cover the most frequent transgressions (such as lies for social purposes), and because the subject has been steered into denying such transgression during the introductory phase of the test. The purpose of the control questions is to establish a deception baseline, to which responses to the relevant questions are compared. In simple terms, the comparison of interest is the difference in physiological response to the relevant and the control questions. The basic premise is that a guilty subject will react more strongly to the relevant questions than to the control questions (as they are lying on both questions, with the more serious lie being told in response to the relevant question), while the opposite pattern is expected for innocent people (as they are telling the truth on the relevant question, Honts & Reavy, 2015).
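  The comparison at the heart of the CQT can be summarised in a short sketch. The numbers, the use of a single skin-conductance measure and the decision threshold below are purely illustrative assumptions; real charts are scored by trained examiners across several physiological channels and scoring systems.

```python
# Sketch of the CQT comparison logic (illustrative assumptions only):
# responses are averaged over repeated question sequences, and reactions
# to the relevant questions are compared with reactions to the controls.
from statistics import mean

# Hypothetical skin-conductance responses from three chart repetitions.
relevant_responses = [0.82, 0.91, 0.86]  # relevant (crime-related) questions
control_responses = [0.55, 0.60, 0.58]   # control (probable-lie) questions

difference = mean(relevant_responses) - mean(control_responses)

# Guilty subjects are expected to react more strongly to the relevant
# questions; innocent subjects more strongly to the control questions.
THRESHOLD = 0.10  # hypothetical cut-off, not an established standard
if difference > THRESHOLD:
    outcome = "deception indicated"
elif difference < -THRESHOLD:
    outcome = "no deception indicated"
else:
    outcome = "inconclusive"

print(outcome, round(difference, 2))
```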

  9.7.2.1 Validity of the CQT

  The polygraph has been evaluated using both field and laboratory approaches. Field studies show that the CQT is rather good at classifying guilty suspects. In an overview by Vrij (2008), it was concluded that more than 80% of guilty suspects failed the test. However, the accuracy rates are lower for innocent suspects. This indicates that the test has a tendency towards false positive errors (classifying innocent suspects as guilty), which poses a problem in the legal system, as false positives are considered more severe mistakes than false negatives (classifying guilty suspects as innocent). The results of field studies must, however, be interpreted with caution. The main problem with these studies is establishing the ground truth: unambiguous knowledge about whether the subject is actually guilty or not. In some studies, a main source of information used for classification is confession evidence, and it is well known that innocent people sometimes confess to crimes they have not committed (see Chapter 8 and Kassin, 2004; Kassin et al., 2010).
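  To make the error terminology concrete, the sketch below computes false positive and false negative rates from hypothetical outcome counts. The figures are invented for illustration only and are not taken from any of the studies cited here.

```python
# Illustrative error-rate arithmetic for polygraph outcomes.
# All counts are hypothetical and serve only to show the definitions.
guilty_failed = 84     # guilty suspects classified as deceptive (hits)
guilty_passed = 16     # guilty suspects classified as truthful (false negatives)
innocent_passed = 65   # innocent suspects classified as truthful (correct rejections)
innocent_failed = 35   # innocent suspects classified as deceptive (false positives)

false_positive_rate = innocent_failed / (innocent_failed + innocent_passed)
false_negative_rate = guilty_passed / (guilty_passed + guilty_failed)

print(f"False positive rate: {false_positive_rate:.0%}")  # innocent judged deceptive
print(f"False negative rate: {false_negative_rate:.0%}")  # guilty judged truthful
```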

  In laboratory studies, the ground truth is not problematic. Rather, the challenge is to create externally valid situations, taking into account, among other things, the high-stakes nature of polygraph tests in the investigative context. One problem with giving an overall accuracy figure for laboratory-based studies on the polygraph is variability in the criteria for deciding whether a study is externally valid. The review by Vrij (2008) showed a hit rate between 74% and 82% for guilty suspects, but with a pronounced error rate for innocent suspects. In an overview by Honts (2004), the average accuracy rate was 91%, with no prominent tendency for either false negatives or false positives. A third summary of studies produced an overall accuracy of 86% (NRC, 2003). In conclusion, although it is difficult to provide exact figures, field and laboratory studies indicate that the CQT has some discriminative value.

  9.7.2.2 Problems with the CQT

  The CQT has been the target of harsh criticism (Ben-Shakhar & Furedy, 1990; Lykken, 1998). There is no room in this chapter to discuss these criticisms fully, but we will briefly present a core argument against the use of the CQT. A central assumption of the CQT is that innocent suspects will respond with more arousal to the control questions than to the relevant questions. This assumption is far from safe. For example, it is conceivable that innocent suspects would react more strongly to the details of the crime they are being falsely accused of (the relevant questions) than to a control question about a rather mild transgression in the past.

 
