Invisible Women: Exposing Data Bias in a World Designed for Men
Voice recognition has also been suggested as a solution to smartphone-associated RSI,19 but this actually isn’t much of a solution for women, because voice-recognition software is often hopelessly male-biased. In 2016, Rachael Tatman, a research fellow in linguistics at the University of Washington, found that Google’s speech-recognition software was 70% more likely to accurately recognise male speech than female speech20 – and it’s currently the best on the market.21
Clearly, it is unfair for women to pay the same price as men for products that deliver an inferior service to them. But there can also be serious safety implications. Voice-recognition software in cars, for example, is meant to decrease distractions and make driving safer. But it can have the opposite effect if it doesn’t work – and often, it doesn’t work, at least for women. An article on car website Autoblog quoted a woman who had bought a 2012 Ford Focus, only to find that its voice-command system listened only to her husband, even though he was in the passenger seat.22 Another woman called the manufacturer for help when her Buick’s voice-activated phone system wouldn’t listen to her: ‘The guy told me point-blank it wasn’t ever going to work for me. They told me to get a man to set it up.’ Immediately after writing these pages I was with my mother in her Volvo Cross Country watching her try and fail to get the voice-recognition system to call her sister. After five failed attempts I suggested she try lowering the pitch of her voice. It worked first time.
As voice-recognition software has become more sophisticated, its use has branched out to numerous fields, including medicine, where errors can be just as grave. A 2016 paper analysed a random sample of a hundred notes dictated by attending emergency physicians using speech-recognition software, and found that 15% of the errors were critical, ‘potentially leading to miscommunication that could affect patient care’.23 Unfortunately these authors did not sex-disaggregate their data, but papers that have done so report significantly higher transcription error rates for women than for men.24 Dr Syed Ali, the lead author of one of the medical dictation studies, observed that his study’s ‘immediate impact’ was that women ‘may have to work somewhat harder’ than men ‘to make the [voice recognition] system successful’.25 Rachael Tatman agrees: ‘The fact that men enjoy better performance than women with these technologies means that it’s harder for women to do their jobs. Even if it only takes a second to correct an error, those seconds add up over the days and weeks to a major time sink, time your male colleagues aren’t wasting messing with technology.’
Thankfully for frustrated women around the world, Tom Schalk, the vice president of voice technology at car navigation system supplier ATX, has come up with a novel solution to fix the ‘many issues with women’s voices’.26 What women need, he said, was ‘lengthy training’ – if only women ‘were willing’ to submit to it. Which, sighs Schalk, they just aren’t. Just like the wilful women buying the wrong stoves in Bangladesh, women buying cars are unreasonably expecting voice-recognition software developers to design a product that works for them when it’s obvious that the problem needing fixing is the women themselves. Why can’t a woman be more like a man?
Rachael Tatman rubbishes the suggestion that the problem lies in women’s voices rather than the technology that doesn’t recognise them: studies have found that women have ‘significantly higher speech intelligibility’,27 perhaps because women tend to produce longer vowel sounds28 and tend to speak slightly more slowly than men.29 Meanwhile, men have ‘higher rates of disfluency, produce words with slightly shorter durations, and use more alternate (‘sloppy’) pronunciations’.30 With all this in mind, voice-recognition technology should, if anything, find it easier to recognise female rather than male voices – and indeed, Tatman writes that she has ‘trained classifiers on speech data from women and they worked just fine, thank you very much’.
Of course, the problem isn’t women’s voices. It’s our old friend, the gender data gap. Speech-recognition technology is trained on large databases of voice recordings, called corpora. And these corpora are dominated by recordings of male voices. As far as we can tell, anyway: most don’t provide a sex breakdown of the voices they contain, which is in itself a data gap, of course.31 When Tatman looked into the sex ratio of speech corpora, only TIMIT (‘the single most popular speech corpus in the Linguistic Data Consortium’) provided data broken down by sex. It was 69% male. But contrary to what these findings imply, it is in fact possible to find recordings of women speaking: according to the data on its website, the British National Corpus (BNC)32 is sex-balanced.33
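For the technically curious, auditing a corpus for this particular data gap is not difficult when the metadata exists. The sketch below is a minimal illustration in Python, assuming a hypothetical speakers.csv metadata file with a ‘sex’ column; the file name and layout are placeholders, not the actual format of TIMIT or any other real corpus.

```python
# Minimal sketch: auditing the sex balance of a speech corpus.
# Assumes a hypothetical metadata file 'speakers.csv' with one row per
# speaker and a 'sex' column ('M'/'F'); the file and column names are
# illustrative, not the layout of TIMIT or any real corpus.
import csv
from collections import Counter

def sex_breakdown(metadata_path: str) -> dict:
    counts = Counter()
    with open(metadata_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["sex"].strip().upper()] += 1
    total = sum(counts.values())
    return {sex: n / total for sex, n in counts.items()}

if __name__ == "__main__":
    print(sex_breakdown("speakers.csv"))  # e.g. {'M': 0.69, 'F': 0.31}
```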
Voice corpora are not the only male-biased databases we’re using to produce what turn out to be male-biased algorithms. Text corpora (made up of a wide variety of texts, from novels to newspaper articles to legal textbooks) are used to train translation software, CV-scanning software, and web search algorithms. And they are riddled with gendered data gaps. Searching the BNC34 (100 million words from a wide range of late twentieth-century texts) I found that female pronouns consistently appeared at around half the rate of male pronouns.35 The 520-million-word Corpus of Contemporary American English (COCA) also has a 2:1 male-to-female pronoun ratio despite including texts as recent as 2015.36 Algorithms trained on these gap-ridden corpora are being left with the impression that the world actually is dominated by men.
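The pronoun count itself is a simple exercise. Here is a minimal sketch in Python that tallies male and female pronouns in any plain-text file; corpus.txt is a placeholder, since the BNC and COCA are licensed corpora queried through their own interfaces rather than raw text files.

```python
# Minimal sketch: counting gendered pronouns in a plain-text corpus.
# 'corpus.txt' is a placeholder; the BNC and COCA are licensed corpora
# with their own query tools, not simple text files.
import re
from collections import Counter

MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def pronoun_ratio(path: str) -> float:
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            for token in re.findall(r"[a-z']+", line.lower()):
                if token in MALE:
                    counts["male"] += 1
                elif token in FEMALE:
                    counts["female"] += 1
    # Avoid division by zero on an empty or pronoun-free file.
    return counts["male"] / max(counts["female"], 1)

if __name__ == "__main__":
    print(f"male:female pronoun ratio = {pronoun_ratio('corpus.txt'):.2f}")
```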
Image datasets also seem to have a gender data gap problem: a 2017 analysis of two commonly used datasets containing ‘more than 100,000 images of complex scenes drawn from the web, labeled with descriptions’ found that images of men greatly outnumber images of women.37 A University of Washington study similarly found that women were under-represented on Google Images across the forty-five professions they tested, with CEO being the most divergent result: 27% of CEOs in the US are female, but women made up only 11% of the Google Image search results.38 Searching for ‘author’ also delivered an imbalanced result, with only 25% of the Google Image results for the term being female compared to 56% of actual US authors, and the study also found that, at least in the short term, this discrepancy did affect people’s views of a field’s gender proportions. For algorithms, of course, the impact will be more long term.
As well as under-representing women, these datasets are misrepresenting them. A 2017 analysis of common text corpora found that female names and words (‘woman’, ‘girl’, etc.) were more associated with family than career; it was the opposite for men.39 A 2016 analysis of a popular publicly available dataset based on Google News found that the top occupation linked to women was ‘homemaker’ and the top occupation linked to men was ‘Maestro’.40 Also included in the top ten gender-linked occupations were philosopher, socialite, captain, receptionist, architect and nanny – I’ll leave it to you to guess which were male and which were female. The 2017 image dataset analysis also found that the activities and objects included in the images showed a ‘significant’ gender bias.41 One of the researchers, Mark Yatskar, saw a future where a robot trained on these datasets who is unsure of what someone is doing in the kitchen ‘offers a man a beer and a woman help washing dishes’.42
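The Google News findings come from word embeddings, where these associations can be read off as nearest-neighbour and analogy queries. The sketch below shows the general shape of such a query using gensim and its pretrained Google News vectors (a sizeable download); it illustrates the technique rather than reproducing the 2016 paper’s exact analysis, and the words that come back will depend on the vectors used.

```python
# Minimal sketch: probing gendered associations in word embeddings.
# Uses gensim's downloader for the pretrained Google News word2vec
# vectors (a large download); illustrative only, not the 2016 paper's
# exact methodology.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# Analogy query: 'man' is to 'doctor' as 'woman' is to ... ?
print(vectors.most_similar(positive=["doctor", "woman"],
                           negative=["man"], topn=3))

# Direct similarity of occupation words to gendered words.
for job in ["homemaker", "maestro", "receptionist", "philosopher"]:
    print(job,
          "woman:", round(vectors.similarity(job, "woman"), 3),
          "man:", round(vectors.similarity(job, "man"), 3))
```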
These cultural stereotypes can be found in artificial intelligence (AI) technologies already in widespread use. For example, when Londa Schiebinger, a professor at Stanford University, used translation software to translate a newspaper interview with her from Spanish into English, both Google Translate and Systran repeatedly used male pronouns to refer to her, despite the presence of clearly gendered terms like ‘profesora’ (female professor).43 Google Translate will also convert Turkish sentences with gender-neutral pronouns into English stereotypes. ‘O bir doktor’ (which means ‘S/he is a doctor’) is translated into English as ‘He is a doctor’, while ‘O bir hemşire’ (which means ‘S/he is a nurse’) is rendered ‘She is a nurse’. Researchers have found the same behaviour for translations into English from Finnish, Estonian, Hungarian and Persian.
The good news is that we now have this data – but whether or not coders will use it to fix their male-biased algorithms remains to be seen. We have to hope that they will, because machines aren’t just reflecting our biases. Sometimes they are amplifying them – and by a significant amount. In the 2017 images study, pictures of cooking were over 33% more likely to involve women than men, but algorithms trained on this dataset connected pictures of kitchens with women 68% of the time. The paper also found that the higher the original bias, the stronger the amplification effect, which perhaps explains how the algorithm came to label a photo of a portly balding man standing in front of a stove as female. Kitchen > male pattern baldness.
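To make the amplification arithmetic concrete: if women are 33% more likely than men to appear in cooking images, women account for roughly 57% of those images (1.33 out of every 2.33), yet the trained model labels the cook as a woman 68% of the time. The sketch below works through that comparison using a simplified version of the bias measure; the published metric averages the same kind of gap over many activities.

```python
# Minimal sketch: comparing dataset bias with model-prediction bias for
# a single activity ('cooking'). A simplified illustration of the idea
# behind the 2017 paper's metric, not the paper's exact formula.
def bias_toward_women(women_count: int, men_count: int) -> float:
    return women_count / (women_count + men_count)

# Training data: women '33% more likely' than men to appear in cooking images.
train_bias = bias_toward_women(women_count=133, men_count=100)  # ~0.57

# Model predictions on the same kind of images.
pred_bias = bias_toward_women(women_count=68, men_count=32)     # 0.68

amplification = pred_bias - train_bias
print(f"training bias:  {train_bias:.2f}")
print(f"predicted bias: {pred_bias:.2f}")
print(f"amplification: +{amplification:.2f}")
```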
James Zou, assistant professor of biomedical science at Stanford, explains why this matters. He gives the example of someone searching for ‘computer programmer’ on a program trained on a dataset that associates that term more closely with a man than a woman.44 The algorithm could deem a male programmer’s website more relevant than a female programmer’s – ‘even if the two websites are identical except for the names and gender pronouns’. So a male-biased algorithm trained on corpora marked by a gender data gap could literally do a woman out of a job.
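Zou’s mechanism can be shown with toy numbers. In the sketch below, two otherwise identical ‘web pages’ are scored against the query ‘computer programmer’ by cosine similarity of averaged word vectors; the tiny hand-made vectors deliberately place ‘programmer’ closer to ‘he’ than to ‘she’, standing in for what a corpus-trained embedding would have learned. Everything here is illustrative, not any real search engine’s ranking function.

```python
# Minimal sketch: how a biased embedding can re-rank identical pages.
# The vectors are tiny hand-made stand-ins for corpus-trained
# embeddings; only the names and pronouns differ between documents.
import numpy as np

# Toy 3-d embedding in which 'programmer' leans towards 'he'.
emb = {
    "programmer": np.array([1.0, 0.9, 0.0]),
    "computer":   np.array([1.0, 0.0, 0.0]),
    "portfolio":  np.array([0.3, 0.0, 0.3]),
    "john":       np.array([0.0, 1.0, 0.0]),
    "he":         np.array([0.0, 1.0, 0.0]),
    "mary":       np.array([0.0, 0.0, 1.0]),
    "she":        np.array([0.0, 0.0, 1.0]),
}

def doc_vector(words):
    return np.mean([emb[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = doc_vector(["computer", "programmer"])
page_john = ["john", "he", "computer", "programmer", "portfolio"]
page_mary = ["mary", "she", "computer", "programmer", "portfolio"]

print("John's page:", round(cosine(query, doc_vector(page_john)), 3))
print("Mary's page:", round(cosine(query, doc_vector(page_mary)), 3))
# John's page scores higher despite the pages being otherwise identical.
```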
But web search is only scraping the surface of how algorithms are already guiding decision-making. According to the Guardian, 72% of US CVs never reach human eyes,45 and robots are already involved in the interview process, with their algorithms trained on the posture, facial expressions and vocal tone of ‘top-performing employees’.46 Which sounds great – until you start thinking about the potential data gaps: did the coders ensure that these top-performing employees were gender and ethnically diverse and, if not, does the algorithm account for this? Has the algorithm been trained to account for socialised gender differences in tone and facial expression? We simply don’t know, because the companies developing these products don’t share their algorithms – but let’s face it, based on the available evidence, it seems unlikely.
AI systems have been introduced to the medical world as well, to guide diagnoses – and while this could ultimately be a boon to healthcare, it currently feels like hubris.47 The introduction of AI to diagnostics seems to be accompanied by little to no acknowledgement of the well-documented and chronic gaps in medical data when it comes to women.48 And this could be a disaster. It could, in fact, be fatal – particularly given what we know about machine learning amplifying already-existing biases. With our body of medical knowledge being so heavily skewed towards the male body, AIs could make diagnosis for women worse, rather than better.
And, at the moment, barely anyone is even aware that we have a major problem brewing here. The authors of the 2016 Google News study pointed out that not a single one of the ‘hundreds of papers’ about the applications for word-association software recognised how ‘blatantly sexist’ the datasets are. The authors of the image-labelling paper similarly noted that they were ‘the first to demonstrate structured prediction models amplify bias and the first to propose methods for reducing this effect’.
Our current approach to product design is disadvantaging women. It’s affecting our ability to do our jobs effectively – and sometimes to even get jobs in the first place. It’s affecting our health, and it’s affecting our safety. And perhaps worst of all, the evidence suggests that when it comes to algorithm-driven products, it’s making our world even more unequal. There are solutions to these problems if we choose to acknowledge them, however. The authors of the women = homemaker paper devised a new algorithm that reduced gender stereotyping (e.g. ‘he is to doctor as she is to nurse’) by over two-thirds, while leaving gender-appropriate word associations (e.g. ‘he is to prostate cancer as she is to ovarian cancer’) intact.49 And the authors of the 2017 study on image interpretation devised a new algorithm that decreased bias amplification by 47.5%.
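The fix devised by the women = homemaker authors works by identifying a gender direction in the embedding space (roughly, the line running from ‘he’ to ‘she’) and removing that component from words that should be gender-neutral, while leaving genuinely gendered pairs alone. Below is a minimal sketch of that neutralising step on toy vectors; the published method uses several definitional pairs and an additional equalising step.

```python
# Minimal sketch: neutralising a gender direction in word vectors.
# Toy vectors and a single 'he'-'she' direction; the published method
# uses several definitional pairs plus an extra equalising step.
import numpy as np

emb = {
    "he":     np.array([1.0, 0.0, 0.2]),
    "she":    np.array([-1.0, 0.0, 0.2]),
    "doctor": np.array([0.4, 0.8, 0.1]),   # leans male in this toy space
    "nurse":  np.array([-0.5, 0.7, 0.1]),  # leans female
}

# 1. Estimate the gender direction.
g = emb["he"] - emb["she"]
g = g / np.linalg.norm(g)

def neutralise(v: np.ndarray) -> np.ndarray:
    """Remove the component of v that lies along the gender direction."""
    return v - (v @ g) * g

# 2. Neutralise occupation words that should not carry gender.
for word in ["doctor", "nurse"]:
    v = neutralise(emb[word])
    print(word, "gender component before:", round(emb[word] @ g, 2),
          "after:", round(v @ g, 2))
```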
CHAPTER 9
A Sea of Dudes
When Janica Alvarez was trying to raise funds for her tech start-up Naya Health Inc. in 2013, she struggled to get investors to take her seriously. In one meeting, ‘investors Googled the product and ended up on a porn site. They lingered on the page and started cracking jokes’, leaving Alvarez feeling like she was ‘in the middle of a fraternity’.1 Other investors were ‘too grossed out to touch her product or pleaded ignorance’, with one male investor saying ‘I’m not touching that; that’s disgusting.’2 And what was this vile, ‘disgusting’ and incomprehensible product Alvarez was pitching? Reader, it was a breast pump.
The odd thing is, the breast-pump industry is one that is ripe for ‘disruption’, as Silicon Valley would have it. Breast-pumping is huge business in the US in particular: given the lack of legally mandated maternity leave, for most American women breast-pumping is the only option if they want to follow their doctors’ recommendations and breastfeed their babies for at least six months (in fact, the American Academy of Pediatrics recommends that women try to breastfeed for at least twelve months).3
And one company, Medela, has pretty much cornered the market. According to the New Yorker, ‘Eighty per cent of hospitals in the United States and the United Kingdom stock Medela’s pumps, and its sales increased thirty-four per cent in the two years after the passage of the Affordable Care Act, which mandated coverage of lactation services, including pumps.’ But the Medela pump is just not very good. Writing for the New Yorker,4 Jessica Winter described it as a ‘hard, ill-fitting breast shield with a bottle dangling from it’, which, as it sucks milk out of a woman’s breast, ‘pulls and stretches the breast like it’s taffy, except that taffy doesn’t have nerve endings’.5 And although some women manage to make it work hands-free, most can’t because it doesn’t fit well enough. So they just have to sit and hold their personal milking contraptions to their breasts, for twenty minutes at a time, several times a day.
So, to sum up: captive market (currently estimated at $700 million with room to grow)?6 Check. Products that aren’t serving consumer needs? Check. Why aren’t investors lapping it up?
Addressing the under-representation of women in positions of power and influence is often framed as a good in itself. And, of course, it is. It is a matter of justice that women have the same chance of success as their equally qualified male colleagues. But female representation is about more than a specific woman who does or doesn’t get a job, because female representation is also about the gender data gap. As we saw with Sheryl Sandberg’s story about pregnancy parking, there will be certain female needs men won’t think to cater for because they relate to experiences that men simply won’t have. And it’s not always easy to convince someone a need exists if they don’t have that need themselves.
Dr Tania Boler, founder of women’s health tech company Chiaro, thinks that the reluctance to back female-led companies is partly a result of the ‘stereotype that men like great design and great tech and women don’t’. But is this stereotype based in reality, or is it possible that the problem isn’t tech-blind women so much as woman-blind tech, created by a woman-blind tech industry and funded by woman-blind investors?
A substantial chunk of tech start-ups are backed by venture capitalists (VCs) because they can take risks where banks can’t.7 The problem is that 93% of VCs are men,8 and, ‘men back men’, explains Debbie Wosskow, co-founder of AllBright, a members’ club, academy, and fund that backs female-led businesses. ‘We need to have more women writing cheques. And men need to recognise that backing women is a great investment.’ Wosskow tells me that when she was in the process of setting up AllBright with her friend Anna Jones, the former CEO of Hearst, ‘men who should know better, to be honest’ would ‘frequently’ tell them, ‘That’s lovely, it’s great that you and Anna have set up a charity.’ Wosskow bristles at this. ‘We’re not a charity. We’re doing this because women deliver great economic returns.’
The data suggests she’s not wrong. Research published in 2018 by Boston Consulting Group found that although on average female business owners receive less than half the level of investment their male counterparts get, they produce more than twice the revenue.9 For every dollar of funding, female-owned start-ups generate seventy-eight cents, compared to male-owned start-ups, which generate thirty-one cents. They also perform better over time, ‘generating 10% more in cumulative revenue over a five-year period’.
This may be partly because women are ‘better suited for leadership than men’.10 That was the conclusion of a study conducted by BI Norwegian Business School, which identified the five key traits (emotional stability, extraversion, openness to new experiences, agreeableness and conscientiousness) of a successful leader. Women scored higher than men in four out of the five. But it may also be because the women who do manage to make it through are filling a gender data gap: studies have repeatedly found that the more diverse a company’s leadership is, the more innovative the company is.11 This could be because women are just innately more innovative – but more likely is that the presence of diverse perspectives makes businesses better informed about their customers. Certainly, innovation is strongly linked to financial performance.
And when it comes to consumer electronics for women, Boler says, innovation has been sorely lacking. ‘There’s never been much innovation in consumer electronics for women,’ she says. ‘It’s always focused on a very superficial aesthetic level: turn something pink, or turn something into a piece of jewellery, rather than taking account of the fact that technology can solve real problems for women.’ The result has been a chronic lack of investment, meaning that ‘the actual technology that’s used in medical devices for women is sort of a kickback from the 1980s’.
When I interview her early in 2018 Tania Boler is about to launch her own breast pump, and she is scathing about what is currently available on the market. ‘It’s just horrible,’ she says, bluntly. ‘It’s painful, it’s loud, it’s difficult to use. It’s quite humiliating.’ I think of trying to hold a conversation with my sister-in-law as she sits on the sofa with her top off, her breasts wired up to a machine. ‘It’s not even that complicated to get it right,’ Boler adds. The notion that ‘it would be nice to pump while you’re able to do something else, rather than having to spend hours a day sitting there chained to this noisy machine’ should, she says, be ‘a basic requirement’. But somehow, it hasn’t been. When I ask her why she thinks this is, Boler muses that perhaps it’s different for her because she’s a woman. So ‘I just go in with: “As a woman what do I want from this?”’