TEAR DOWN THIS WALL
How can these barriers be overcome? A number of promising technologies, some available today and others still emerging, offer a chance of solving these challenges in part or in whole, allowing for the possibility of entirely new risk transfer paradigms - many will include an insurance element, but some may be quasi-insurance or so fundamentally different that we will no longer recognize them as insurance. These technologies are highlighted in the remaining chapters of Part 3.
Need for trust:
• Blockchain
• Peer-to-peer (P2P) technology including social media, user reviews, etc.
Need for real-time data & ability to make sense of it:
• Telematics
• Internet of Things (IoT) including smart home technologies
• Natural language processing (NLP)
• Artificial intelligence (AI) including machine learning (ML)
• Aerial imagery
• Cloud storage
Need for customer engagement:
• Digital marketing
• Chatbots
• Robotic process automation (RPA)
• User interfaces (UI) and user experiences (UX)
CHAPTER 15 - INTELLIGENCE SQUARED
BIG DATA AND BIGGER DATA
To insurance carriers and many of the data providers that support the insurance ecosystem, the term Big Data that has become popularized in recent years may seem redundant. In fact, the insurance industry in general was one of the first adopters of batch processing technology in the 1960s and 1970s, taking advantage of the tech revolution in mainframe computing. Batch processing - a once-per-day, nightly process - was perfect for the insurance industry. Policies issued during the day by agents could be uploaded and coverage bound for the following day, with the requisite paperwork sent to the new policyholders. In addition, renewals could be processed or non-renewed overnight as well, sending the proper documents and/or legal notices to customers. All of this new mainframe technology replaced stacks and stacks of physical papers and the need for an army of personnel to process all of that paperwork - a major efficiency gain over literally shuffling papers.
Before the rise of the digital economy over the past 25 years, the insurance industry was the proud owner of some of the largest private databases in the world (recall the State Farm system that holds 10x the information housed in the Library of Congress).[98] Insurance incumbents also employed some of the very first data scientists, who developed advanced statistical techniques to query, analyze, synthesize and make predictions about the future using models. In the insurance industry this job is called an actuary, and the work is properly described as actuarial science. In fact, at the risk of greatly oversimplifying their work, actuaries are at the heart of what makes running an insurance carrier possible. While any single policyholder runs the (low) risk of a calamitous event causing significant financial harm if not outright ruin, the chances that a diverse set of policyholders will all have the same event befall them are virtually zero. By leveraging the law of large numbers among other statistical principles and techniques, actuaries can determine an appropriate price to charge - known as a rate or premium - that will cover the expected future losses (plus contingencies). Actuaries determine rates for a certain group of customers, or risk pool, that has similar characteristics and has credibility. In this context, credibility means that the risk pool is large enough to show a statistically significant difference in loss performance from other groups, justifying a different rate.
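To make the idea concrete, the sketch below simulates a simple pool of policyholders and shows the observed average loss per policy converging toward its expected value as the pool grows - the law of large numbers at work. All numbers (loss probability, severity, loading) are hypothetical and far simpler than any real rating exercise.

```python
# Illustrative sketch of the law of large numbers in rate-making.
# All figures are hypothetical; real rating plans are far more involved.
import random

def average_loss_per_policy(num_policies, loss_prob=0.05, avg_severity=10_000):
    """Simulate a pool where each policy has a small chance of one loss."""
    total_losses = sum(
        avg_severity for _ in range(num_policies) if random.random() < loss_prob
    )
    return total_losses / num_policies

random.seed(42)
for pool_size in (100, 10_000, 1_000_000):
    observed = average_loss_per_policy(pool_size)
    print(f"{pool_size:>9,} policies -> avg loss per policy ~ ${observed:,.0f}")

# Expected loss per policy ("pure premium") = frequency x severity = $500.
# A rate adds a contingency/expense loading on top of that expectation.
pure_premium = 0.05 * 10_000
rate = pure_premium * 1.25  # hypothetical 25% loading
print(f"Pure premium ${pure_premium:,.0f}, indicated rate ${rate:,.0f}")
```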
Insurers have traditionally been in the business of risk segmentation. This involves identifying subsets within the general population of insureds that have similar characteristics that predict future losses and “pooling” them together in a cohort for rating and underwriting. How does Big Data help?
The more data that a carrier has, the more opportunities there are to segment risk.
The better a carrier segments risk relative to its competitors, the higher the likelihood it can grow profitably and thrive.
Over the course of time, carriers have gotten quite sophisticated at risk segmentation, with millions - and some with billions - of pricing cells in their rating plans.[99] Most of the characteristics used in risk segmentation are directly observable characteristics that serve as proxies for hard-to-observe characteristics that are more directly related to losses.[100] For example, age and gender (where allowed) are used to price auto premiums today. A 16-year-old male is more likely to have auto losses than a 16-year-old female, and both are much more likely to have losses than a 35-year-old (male or female). These characteristics are directly observable, but they serve as proxy variables that are generally correlated with an individual's driving behavior. The reason 16-year-old drivers perform worse is that they get into more accidents than a 35-year-old, and this is because they do not have very much experience driving. They make bad judgments - drive too fast, change lanes quickly, are slow to brake - that result in accidents.
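As a hypothetical sketch of how such rating factors multiply out into pricing cells, consider the toy example below. The base rate and relativities are invented for illustration and are not drawn from any actual rating plan.

```python
# Hypothetical sketch of how rating factors combine into pricing cells.
# Base rate and relativities below are invented for illustration only.
BASE_RATE = 600  # hypothetical annual base premium in dollars

AGE_FACTOR = {16: 2.50, 25: 1.30, 35: 1.00}
GENDER_FACTOR = {"M": 1.10, "F": 1.00}  # where allowed by regulation

def premium(age, gender):
    """Multiply the base rate by the relativity for each rating variable."""
    return BASE_RATE * AGE_FACTOR[age] * GENDER_FACTOR[gender]

for age in (16, 35):
    for gender in ("M", "F"):
        print(f"age {age}, {gender}: ${premium(age, gender):,.0f}")

# Each additional rating variable multiplies the number of pricing cells.
cells = len(AGE_FACTOR) * len(GENDER_FACTOR)
print(f"{cells} pricing cells from just two variables")
```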
Traditionally, many of the variables or rating factors used in pricing and risk selection were collected on the insurance application. The application is generally thought of as a formality today in personal lines underwriting (less so in commercial lines), but strictly speaking the process for obtaining coverage usually starts with a licensed agent taking an application from a customer who is seeking insurance. As part of the application process, which is required in order to receive a quote, customers must answer a series of questions about themselves and the exposure they are seeking coverage for. Historically, the application process could take some time to complete: customers would set an appointment to meet with an insurance agent in a physical office and spend time completing the information on a paper form or answering questions from a customer service representative (CSR) who then input the responses into a computer system. Once the application was complete, a decision was often made by applying underwriting rules either to decline the risk because it did not meet the carrier's underwriting criteria or to provide a quote. (In some cases, a quote may be provided but the request for a policy may still need to be reviewed by underwriting to determine acceptability.) An independent agent collects variables on the application for a number of carriers and then provides quotes from multiple carriers for the consumer to choose among.
In today's world, where consumers prefer quicker decisions and streamlined processes, taking 30-60 minutes to provide a quote for an auto, homeowners or renters insurance policy seems archaic to many shoppers. Carriers compete on the speed of their application process to provide a quote. To do so, carriers have had to look closely at their applications to determine:
1.which questions can be eliminated because they do not provide significant value for pricing or underwriting the policy
2.which questions can be populated from public records and other third-party data sources
3.which additional rating and underwriting factors should be used based on independent third-party data, such as insurance credit score, that is not provided by the applicant
Over time, in general, insurance applications have become shorter to complete prior to obtaining a quote and underwriting decision, but the amount of information collected about the exposure may have grown due to the greater amount of third-party data provided by companies that specialize in this area. The amount of data collected by the agent and passed along to the carrier on any given customer was once considered large. By today's standards, however, much of the information collected is rather small - it consists mostly of information captured using standardized data fields.
STAR SCHEMA
Historically, insurance data (along with many other industries) was configured in two different ways for two different purposes:
• conducting transactions
• analyzing results
Transactional data is information that is captured and used for some sort of transaction: in a nutshell, to conduct business - get a quote, buy a policy, make an adjustment, file a claim. For transactional data, speed of data capture is essential because the data is being captured in real time - in the moment - and any undue delays in processing and completing that transaction lead to a poor user experience and could ultimately result in a loss of business. Data entry forms that capture fielded data, along with limited textual descriptions and image capture if needed, are tuned to take in information and store it in a “flat” format that moves the data from memory to disk as quickly as possible.
Analytical data is information that is used to assess business results and is ultimately used for decision making (some refer to this process as making data-driven decisions, often contrasted with other ways of making decisions, usually conventional wisdom or other non-scientific methods). For analytical data, the speed that matters is the ability to query the data - that is, to ask questions of it. A fundamental concept in information technology is that the speed needed to query data is a different kind of speed than the speed needed to capture transactional data.
•To make querying data for analysis as fast as possible, the data must go through an Extract, Transform and Load (ETL) process that optimizes it for analytical purposes (a minimal sketch of the idea follows this list).
•Often, the data is stored in either a relational database (if it is small) or a data warehouse and/or data mart environment (if it is large).
◦ One benefit of having separate data stores for transactional and analytical data is that directly running analytical queries against the transactional data can slow down the ability for transactions to be processed.
◦ Another benefit is that applications and databases can be tuned to run optimally for either quick data capture and storage or for querying to facilitate data extraction and analysis.
◦ A third benefit is that analytical data stores can be supplemented with metadata - data about data - that helps provide additional context but does not require direct input from a user or employee. For instance, each transaction that is processed can record a date and timestamp that documents precisely when the transaction was completed, which can be used to summarize the number or value of transactions for a given day, week, month, quarter or year.
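The sketch below illustrates the ETL idea with a couple of hypothetical transaction records: raw records are extracted, a date and month are derived as metadata during the transform step, and the result is loaded into a small analytical table that is easy to summarize. It assumes the pandas library; the field names are invented.

```python
# Minimal ETL sketch: extract raw transaction records, transform them
# (deriving metadata such as the transaction date), and load them into an
# analytical table. Field names are hypothetical.
from datetime import datetime
import pandas as pd

# Extract: raw records as they might come out of a transactional system
raw_transactions = [
    {"policy_id": "P-1001", "amount": 850.00, "timestamp": "2021-03-04T09:15:00"},
    {"policy_id": "P-1002", "amount": 1200.00, "timestamp": "2021-03-04T16:40:00"},
]

# Transform: parse timestamps and derive reporting-friendly columns
def transform(record):
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        "policy_id": record["policy_id"],
        "amount": record["amount"],
        "transaction_date": ts.date(),
        "transaction_month": ts.strftime("%Y-%m"),
    }

# Load: append to an analytical table tuned for querying, not capture
analytical_table = pd.DataFrame([transform(r) for r in raw_transactions])

# The derived metadata columns make time-based summaries trivial
print(analytical_table.groupby("transaction_month")["amount"].agg(["count", "sum"]))
```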
Insurance companies typically have a plethora of data, generally from multiple transactional systems, that they seek to merge into a single database, data warehouse or data mart. While this is simple in concept, it is extremely challenging in practice. In reality, several such data warehouses or data marts may exist, each supporting a different functional area such as Claims or HR, as bridging the gap between large systems is often exceedingly difficult, costly and time consuming. Data warehouses have traditionally supported basic reporting needs, either through paper reports traditionally used by executives, managers and business analysts or by feeding the dashboards and data cubes that are more common today. Increasingly, these analytical data stores are being used not just by traditional analysts but by data scientists as well. Data scientists use predictive algorithms and advanced analytics to gain deeper insights into the business than is possible with standard tables and graphs.
Data warehouses and analytical data stores can grow to be quite large as they contain thousands or millions of transactions across multiple systems over the course of many years. Structuring these data stores in a way that saves on storage space by removing redundant information through relationships between data tables was historically important. For example, instead of recording details about each employee that conducted a transaction every time, an employee ID could be associated with each transaction and used to look up information in a different table that stores information specific to the employee, such as their job title, experience level, tenure with the company, etc. A database schema is designed by an information architect to most efficiently relate the data with one another and optimize the tradeoff between the speed of analytical queries and storage space. Such a structure is commonly known as a relational database, and the data within it is termed structured data.
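The arrangement described above - a central table of transactions carrying foreign keys into lookup (dimension) tables such as the employee table - is the classic star schema from which this section takes its name. A minimal sketch of that fact/dimension split, using pandas and hypothetical tables and columns:

```python
# Minimal star schema sketch: a fact table of transactions keyed to a
# dimension table of employees. Tables and columns are hypothetical.
import pandas as pd

# Dimension table: one row per employee, stored once
dim_employee = pd.DataFrame({
    "employee_id": [101, 102],
    "job_title": ["Underwriter", "Claims Adjuster"],
    "tenure_years": [12, 3],
})

# Fact table: one row per transaction, carrying only the foreign key
fact_transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "employee_id": [101, 102, 101],
    "premium": [850.00, 0.00, 1200.00],
})

# Analytical queries join the fact table back to its dimensions
joined = fact_transactions.merge(dim_employee, on="employee_id")
print(joined.groupby("job_title")["premium"].sum())
```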
SENSORY OVERLOAD
All of the time and effort spent on capturing, storing, processing, reporting on and analyzing data has traditionally been focused on fielded (structured) data. Fielded data is captured in a standardized manner using drop-down menus, radio buttons, checkboxes and the like. This fielded, standardized data is best suited for a whole host of numerical exercises such as summarizing into counts, totals, averages and the like. Since the only data optimized for analysis was data that could be stored in relational tables, much time and effort went into studying and understanding this information, prioritizing it over other forms of data such as textual descriptions, voice and audio data, still images and videos. These types of data do not work well in a relational database environment and do not lend themselves easily to numerical analysis or searches. Additionally, they “cost” more in terms of their storage requirements, as they generally take up a much larger amount of disk space than fielded data. If incumbents are unable to fully leverage the “big data” they have locked away today, they will likely struggle to make sense of a “bigger data” world tomorrow using the same tried and true techniques. This new world of data explosion is here now and includes even greater amounts of data from new sources such as sensors. The opportunity cost of not keeping up with these new data streams will only grow over time.
A LIFESAVER TO THOSE DROWNING IN DATA
These new forms of data come at a high cost: they are exceedingly difficult to make sense of using traditional data storage technologies and analytical software packages. But the world is rapidly changing.
•Storage and processing power are cheaper than ever with the advent of cloud computing
•Predictive algorithms - critical to gaining the unique insights that are key to competing successfully in today's world - work better on larger quantities of data
•Relating data to one another costs precious time that could be better spent analyzing the data directly.
In a new world full of Internet-enabled sensors that are continuously streaming data, old analytical tools need to be supplemented with new tools that require new storage and analytical approaches. Aside from blockchain, the technology that could most revolutionize insurance is Artificial Intelligence (AI). AI is a broad term that encompasses many different technologies, such as the TensorFlow framework, that enable the use of different sorts of algorithms. These algorithms are broadly described as machine learning and include a range of techniques including K-means clustering, random forests, neural networks, support vector machines, and more.[101] Possibilities also include computer vision to process images and natural language processing (NLP)[102] to handle text and voice data.
Similar to blockchain, the potential use cases for AI are vast. Data from IDC suggests that global insurance IT spending on cognitive and AI technologies will grow from $205M in 2016 to $1,441M in 2021, a CAGR of 48%. Of that estimated spending in 2021, $119M is expected on hardware, $571M on software and $752M on services.[103] In thinking about AI in the insurance context, consider the following possibilities:
•Better use and greater insights from relational data such as trends hidden in claims data
•Ability to recognize patterns and gain insights from unstructured data including text, voice and images such as drivers of loss dollars based on adjuster field notes and pictures
•Enable new use cases from sensor data from telematics and smart home devices, such as shutting off the main water line based on unusual water flow patterns in the house and a suspected leak verified by the homeowner using a mobile app (see the sketch after this list)
•Continual learning based on data collected from each and every claim starting with first notice of loss to claims estimate to confirmation of repairs being completed, providing real-time feedback to product managers, actuaries and underwriters
•Consistency in decisions based on analytics, not subjective human judgment
•24/7/365 availability of chatbots to handle first notice of loss contacts with consistent service quality that never degrades no matter the time of day
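As one hypothetical illustration of the smart home use case above (the water-leak bullet), the sketch below flags a sustained, unusually high flow reading. The thresholds and readings are invented; a real system would learn a household's normal usage patterns rather than hard-code a limit.

```python
# Hypothetical sketch of leak detection from smart-home water flow data.
# Thresholds and readings are invented for illustration only.
from statistics import mean

def suspect_leak(flow_readings, window=6, threshold_lpm=12.0):
    """Flag a possible leak if the recent average flow stays unusually high."""
    if len(flow_readings) < window:
        return False
    return mean(flow_readings[-window:]) > threshold_lpm

# Simulated liters-per-minute readings taken every 10 minutes overnight
readings = [0.0, 0.5, 0.0, 14.8, 15.2, 14.9, 15.1, 15.0, 14.7]

if suspect_leak(readings):
    print("Unusual sustained flow detected - ask homeowner to confirm via app")
    # e.g., send a push notification; shut the main valve if confirmed
```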
The most immediate use of AI in insurance appears to be better use of existing data sources to gain new insights. Actuaries and data scientists have used predictive algorithms in insurance for a long time, but much of their work has been focused - that is, supervised. Put another way, analysts look for ways that certain data inputs lead to one or more data outcomes and apply different statistical techniques in order to determine which inputs are the best predictors, based on statistical tests that serve as diagnostic indicators. For example, actuaries hypothesize about which variables are most highly correlated with claims, create a data set with all of the relevant information and use statistics to find predictive relationships so that they can charge an appropriate premium based on those factors. Many such use cases exist; they are powerful and have led to a great deal of success in insurance. However, any supervised learning approach is inherently limited by the decisions made by data analysts and business experts.
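A minimal sketch of that supervised workflow, using randomly generated data and scikit-learn's logistic regression in place of the more elaborate models actuaries actually use: the analyst picks the candidate predictors, and the model estimates how strongly each relates to the claim outcome.

```python
# Hypothetical sketch of the supervised approach: the analyst chooses the
# candidate predictors, and the model estimates how they relate to claims.
# The data is randomly generated for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Candidate rating variables chosen by the analyst
driver_age = rng.integers(16, 80, size=n)
annual_mileage = rng.normal(12_000, 4_000, size=n)

# Simulated outcome: younger, higher-mileage drivers have more claims
claim_prob = 1 / (1 + np.exp(0.08 * (driver_age - 30) - annual_mileage / 20_000))
had_claim = rng.random(n) < claim_prob

X = np.column_stack([driver_age, annual_mileage / 1_000])  # mileage in thousands
model = LogisticRegression(max_iter=1_000).fit(X, had_claim)

# Fitted coefficients indicate which chosen inputs predict claims
print(dict(zip(["driver_age", "annual_mileage_000s"], model.coef_[0].round(4))))
```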
By contrast, an unsupervised approach, where the machine itself examines the data and finds the most relevant patterns, is not as common today in insurance. One major hurdle that unsupervised learning approaches have faced in the past is the lack of computing resources to effectively run this type of analysis at scale. Another hurdle is a failure of imagination: many actuaries, underwriters, product managers, claims and financial analysts are more comfortable with tried-and-true approaches using supervised learning techniques. In particular, actuarial science has a long history of using a variety of advanced statistical techniques - but unsupervised learning has not been among them.
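For contrast, a minimal unsupervised sketch using K-means on the same sort of (randomly generated) policyholder features: no outcome variable is supplied, and the algorithm simply proposes groupings that analysts must then interpret. The features and cluster count are illustrative choices, not recommendations.

```python
# Hypothetical sketch of an unsupervised approach: K-means is asked to find
# groupings in policyholder data without being told what to predict.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 3_000

features = np.column_stack([
    rng.integers(16, 80, size=n),          # driver age
    rng.normal(12_000, 4_000, size=n),     # annual mileage
    rng.poisson(0.2, size=n),              # prior claims count
])

# Scale features so no single variable dominates the distance metric
scaled = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)

# Cluster sizes; the segments themselves still need human interpretation
print(np.bincount(kmeans.labels_))
```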