by Phil Martin
If a threat is being accepted, then the residual risk has been determined and formally accepted by the business owner.
All controls have been mapped to each threat they will address.
Anytime the scope or attributes of a software application changes, the threat model should be revalidated.
Data Classification
At its core, data classification is simply the act of assigning a level of sensitivity, called a label, to each piece of data in an organization. The level is based on the resulting impact to the organization if the data were to be disclosed, altered or lost. NIST SP 800-18 is a good source to help in classifying information based on CIA, resulting in a categorization of high, medium and low.
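To make the categorization concrete, here is a minimal sketch in Python, assuming a simple high-watermark rule in which the overall label is the worst of the three CIA impact ratings. The function and level names are illustrative, not taken from NIST SP 800-18 itself.

```python
# A high-watermark categorization: the overall classification is the
# highest impact rating among confidentiality, integrity and availability.
LEVELS = ["low", "medium", "high"]  # ordered least to most severe

def classify(confidentiality: str, integrity: str, availability: str) -> str:
    """Return the overall classification label for a piece of data."""
    impacts = (confidentiality, integrity, availability)
    return max(impacts, key=LEVELS.index)

print(classify("low", "low", "medium"))   # -> medium
print(classify("high", "medium", "low"))  # -> high
```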
The first thing we need to do is to classify information by its criticality and sensitivity. By doing this, we can apply protective measures in proportion to the information’s business value instead of making wild guesses. If done properly, most organizations discover that the vast majority of data is neither critical nor sensitive, meaning we have just saved ourselves a huge amount of money by not protecting those assets. Keep that in mind when people complain about the time and effort required for proper classification!
The protection of information assets must be prioritized if the organization is budget constrained, which is almost always the case. By attaching value to the various assets, it becomes much easier to figure out which should be protected if we can only cover a portion of all valued assets. Arriving at a monetary value can be difficult though – we can choose to use the cost of creating or replacing the information, or how much damage would result if the information were to be leaked. In some cases there simply is no way to arrive at a value since the loss would completely devastate the company, as with trade secrets. A useful approach is to create levels of value, with the lowest representing very little value and the highest representing those assets that are priceless. Assets that have no discernible owner and show no evidence of use for some period of time would be assigned the lowest value.
A second approach that might be much easier to execute is to define critical business processes and figure out what information and physical assets are required for those processes. By tying assets to revenue generation, it becomes clear what is important and what is not.
But, those two approaches are designed to identify value, or criticality. The other attribute that is important is the sensitivity of information. Whereas criticality is concerned with the impact if we lose control of information, sensitivity is concerned with the impact if we accidentally disclose information - in this case we have not lost control of the information, but others are now aware of facts that can then be used to damage the organization. In this case the data owner is the best person to identify the classification level.
Figure 106: Sample Data Classification Labeling Scheme
Data classification is an absolute necessity if we are going to be serious about security governance, because it prevents us from over-protecting low-value assets and under-protecting high-value assets. The longer we wait to perform classification, the harder it becomes. However, we must be sure not to over-rotate and declare that everything is of high value – this often happens in a high-blame environment where no one wants to be charged with failing to protect information properly. A useful way to combat this is for IT to charge business units for storage of higher-classified information – hitting the pocketbook always makes people think twice. One last item that must be addressed if we hope to become a security-conscious organization is to ensure that every asset has a defined owner who is accountable for it – a RACI matrix is a great tool for making this happen.
A data classification strategy gives us the framework on which to create a road map, which is the series of steps to be executed to implement our strategy. A good security strategy will mitigate risk while supporting business goals, and will show us how to embed good security practices into every area of the business.
The main objective of data classification is to lower the cost of data protection and maximize the return on investment when data is protected. This can be accomplished by implementing only the needed levels of security controls on data assets based on their categorization. In other words, security controls must be commensurate with the classification level. For example, there is no point in encrypting data that is to be publicly disclosed, or in implementing full-fledged load balancing and redundancy controls for data whose loss would have a very limited adverse effect on organizational operations, assets or individuals. In addition to lowering the cost of data protection and maximizing ROI, data classification can also increase the quality of risk-based decisions. Since the data’s quality and characteristics are known once it is classified, the decisions made to protect it can be appropriately informed. Figure 106 illustrates a sample flow designed to help classify data.
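As a small illustration of “commensurate” controls, the sketch below maps each classification label to a baseline control set, so public data is never burdened with high-end protections. The labels and control names are hypothetical, not drawn from any standard.

```python
# Hypothetical mapping from classification label to minimum controls.
BASELINE_CONTROLS = {
    "public":       {"integrity checks"},
    "internal":     {"integrity checks", "access control"},
    "confidential": {"integrity checks", "access control",
                     "encryption at rest", "encryption in transit"},
    "restricted":   {"integrity checks", "access control",
                     "encryption at rest", "encryption in transit",
                     "redundancy", "audit logging"},
}

def required_controls(label: str) -> set[str]:
    """Return the minimum control set for a classification label."""
    return BASELINE_CONTROLS[label]

print(required_controls("public"))  # note: no encryption for public data
```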
Only the data owner, or business owner, should be allowed to make decisions on data classifications, not the IT department. In general, the data owner is responsible for the following activities:
Properly classifying information assets.
Validating that the appropriate security controls have been implemented according to classification.
Defining the lists of users and access criteria.
Ensuring that IT is backing the data up properly.
The data owner may optionally delegate these responsibilities to a data custodian. While a custodian may carry out the duties, the data owner is still ultimately responsible for ensuring all activities are executed properly.
Data lifecycle management, or DLM, manages data assets based on attributes such as file types and age. On the other hand, information lifecycle management, or ILM, focuses on the contents of files. There is a tendency to see DLM as a product, but in reality, it is a process based on policies. So, you will not find a DLM product, but you might find many products that help with DLM.
Data classification is usually the first step when executing DLM. A common type of storage solution for DLM is the use of hierarchical storage management, or HSM, which is made up of different types of storage media including RAID drives, solid state drives, optical storage and tapes. The most-frequently accessed data will be stored on the fastest types of media, such as RAID. Less-important or infrequently-accessed data assets will be stored on slower and less expensive media such as optical disks or tapes. From a security perspective, we need to understand that optical disks and tapes are removable media, and therefore present a greater risk of theft.
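The tiering decision itself can be as simple as a policy function over access frequency and age. The following sketch is an illustration only (the thresholds and tier names are invented assumptions), not how any particular HSM product works.

```python
# Illustrative HSM-style tiering: hot data on fast media, cold data on
# cheaper media. Optical and tape are removable and carry theft risk.
def storage_tier(accesses_last_30_days: int, age_days: int) -> str:
    if accesses_last_30_days > 100:
        return "RAID"          # fastest, most expensive
    if accesses_last_30_days > 10 or age_days < 365:
        return "solid state"
    if age_days < 5 * 365:
        return "optical"       # removable media
    return "tape"              # removable media

print(storage_tier(accesses_last_30_days=250, age_days=30))  # -> RAID
print(storage_tier(accesses_last_30_days=0, age_days=2000))  # -> tape
```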
Regulations, Privacy and Compliance
As a result of ever-growing data breaches, the federal government has become increasingly involved in creating legislation requiring companies to up their security ‘foo’ to a minimal level. The cost of non-compliance, coupled with the natural loss of public trust, has made organizations pay attention to security at a level that has never before been seen.
Three types of personal data have been defined and are addressed by various laws and regulations. They are:
Personal Health Information, or PHI, which includes any data that describes the current or historical health of an individual.
Personally Identifiable Information, or PII, which is any combination of data that could be used to uniquely identify a single individual.
Personal Financial Information, or PFI, which reflects the current or historical financial state of an individual.
Significant Regulations and Privacy Acts
The following sections are not intended to be comprehensive, but instead provide just enough information for you to know when it is applicable to your situation. If a given law is applicable, you will need to dive in further to figure out specific requirements for your organization.
There are some significant challenges when attempting to obey the referenced laws. Most regulations are not very specific, leaving organizations to interpret requirements on their own. As a result, an auditor must rely on his or her own experience when interpreting each law or regulation. Further complications arise when dealing with multiple jurisdictions. Laws coming from Europe are almost always more stringent than U.S. or Asian laws, yet data will need to flow across international boundaries and may be subject to different requirements based on physical location. This is very difficult to manage when an application rides on top of the Internet, which is purposefully not tied to geographic locations.
Sarbanes-Oxley Act (SOX)
The Sarbanes-Oxley Act, or SOX, was enacted in 2002 in response to a series of spectacular frauds at companies such as Enron, Tyco International and WorldCom. In each case executives conspired to defraud stockholders and used loopholes to hide their activities. SOX improves quality and transparency in financial reporting by requiring independent audits of publicly-held companies. SOX consists of 11 titles that mandate specific requirements for financial reporting. Two sections within SOX have become the most prominent: Section 302 covers corporate responsibility for financial controls, while Section 404 deals with management’s assessment of internal controls. SOX requires that the strength of internal controls be assessed and a report generated describing their adequacy and effectiveness.
BASEL II
BASEL II is the name for the European Financial Regulatory Act designed to protect against risk and fraud in financial operations. Because it was designed to be an international standard, it can impact U.S. banks.
Gramm-Leach-Bliley Act (GLBA)
The Gramm-Leach-Bliley Act, or GLBA, is designed to protect PFI (personal financial information) contained within financial institutions such as banks and credit unions. Also known as the Financial Modernization Act of 1999, it has three primary components. A financial privacy rule governs the collection and disclosure of PFI for both financial and non-financial companies. A safeguards rule, which applies only to financial institutions, mandates the design, implementation and maintenance of safeguards to protect customer information. Pretexting provisions provide consumer protection from companies and individuals who falsely pretend, or pretext, to have a need to access the consumer’s PFI.
Health Insurance Portability and Accountability Act (HIPAA)
The Health Insurance Portability and Accountability Act, or HIPAA, is structured very similarly to GLBA, but targets PHI, or personal health information.
Data Protection Act
The Data Protection Act of 1998 is concerned with PII, or personally identifiable information. The European Union Personal Data Protection Directive, or EUDPD, and Canada’s version, called the Personal Information Protection and Electronic Documents Act, or PIPEDA, for the most part offer the same coverage as the Data Protection Act. In essence, the acts state that any PII collected must be deleted if the original need is no longer applicable. As a result, software must be designed and implemented with deletion or de-identification mechanisms.
Computer Misuse Act
The Computer Misuse Act puts into law what should already be common sense – don’t use a computer in an unauthorized or criminal manner. Specifically, this law defines activities such as hacking, unauthorized access, unauthorized modification of content and planting of viruses as criminal offenses.
Mobile Device Privacy Act
The Mobile Device Privacy Act was a bill introduced in 2012 that was never passed. It required mobile device sellers, manufacturers, service providers and app authors to disclose the existence of any monitoring software. Since 2012 numerous instances of parties abusing the average consumer have been reported, so it is likely this bill will be resurrected at some point. The takeaway for us is to keep this in mind as we design and implement software.
State Security Breach Laws
As of the date of this edition, no federal law has been passed requiring notification upon discovery of a security breach. However, every state in the union has at least one such law on the books. Given that California almost always has the most stringent laws, we should take special note of California Civil Code 1798.82, commonly called Senate Bill 1386. This law affirms the need to delete PII when it is no longer needed, but more importantly requires any company doing business in the state of California to notify the owners of PII if their data has been breached, or even if the company reasonably believes the data has been accessed in an unauthorized manner.
Privacy and Software Development
In this section we are going to go over some basic principles and guidelines to help you implement a proper level of privacy in your software. Just because we implement security well does not necessarily mean that we have sufficiently protected privacy. In fact, we could even implement confidentiality in every way we can think of and still miss some crucial privacy element because we are not up to date on laws and regulations. Privacy should absolutely be thought about as its own security issue that must be directly addressed. This is why it is so important to understand which laws, regulations or standards apply before the design phase starts.
One of the best approaches to jump-start the privacy initiative is to carry out data classification, which helps to identify the data elements requiring privacy protection. By categorizing data into privacy tiers based on the business impact if the information is disclosed, altered or destroyed, we can be confident that the proper level of controls can be identified.
Beyond that, the following checklist is crucial to complete if we hope to have an effective privacy program. First, establish a privacy policy that is enforceable. Then gain the support of executives and top-level management. Finally, educate people on privacy requirements and controls.
Based on the various laws, regulations and best practices, we can break down the high-level privacy requirements into five distinct rules, which the sketch following this list applies in code:
If you don’t need it, don’t collect it.
If you need it, inform the user before collecting it.
If you need it for processing only, don’t persist it.
If you have to persist it, keep it only as long as the retention policy states and then delete it.
Don’t archive it unless there is an explicit retention requirement.
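The sketch below applies the five rules to a single data element at collection time. The field names and the example element are hypothetical; in practice the answers come from your privacy policy and legal counsel.

```python
from dataclasses import dataclass

@dataclass
class DataElement:
    name: str
    needed: bool                    # rule 1: don't collect what you don't need
    user_informed: bool             # rule 2: inform the user first
    needed_beyond_processing: bool  # rule 3: persist only if required
    retention_mandated: bool        # rule 5: archive only if required

def may_collect(e: DataElement) -> bool:
    return e.needed and e.user_informed

def may_persist(e: DataElement) -> bool:
    # rule 4 (delete when retention expires) governs data already persisted
    return may_collect(e) and e.needed_beyond_processing

def may_archive(e: DataElement) -> bool:
    return may_persist(e) and e.retention_mandated

ssn = DataElement("ssn", needed=True, user_informed=True,
                  needed_beyond_processing=False, retention_mandated=False)
print(may_collect(ssn), may_persist(ssn), may_archive(ssn))  # True False False
```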
The best places to solicit user agreement to data collection are the Acceptable Use Policy, or AUP, and the splash screens and banners displayed during the login process. However, AUPs must be complementary to information security policies, not contradictory.
Now let’s go over some time-tested techniques to protect privacy when developing software.
Data Anonymization
When importing production data into a test environment, PII must be protected or de-identified, which alters PII so that it can no longer be traced to a specific individual. Data anonymization is the act of removing private information from data, thereby removing the ability to link the data to any one individual. We can achieve this using four different methods – replacement, suppression, generalization and perturbation – each demonstrated in the sketch following the descriptions below.
Replacement, also sometimes called substitution, replaces identifiable information with data that cannot be traced back to an individual. For example, we can replace the primary account number of a credit cardholder with dummy data.
Suppression, sometimes called omission, removes identifiable information from the data. Instead of storing all digits for a credit card account number, we retain only the last four digits.
Generalization replaces sensitive data with a more general form that cannot be traced back to an individual. An example of this approach is to replace birth dates with the year only, using ‘01/01’ as the month and day.
Perturbation, sometimes called randomization, randomizes the contents of a given field, rendering it unrecognizable.
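Here is a minimal sketch of all four methods applied to a single customer record. The field names and formats are illustrative assumptions.

```python
import random

record = {
    "name": "Jane Doe",
    "card_number": "4111111111111111",
    "birth_date": "1984-07-23",
    "salary": 72500,
}

# Replacement (substitution): swap the account number for dummy data.
replaced = {**record, "card_number": "0000000000000000"}

# Suppression (omission): retain only the last four digits.
suppressed = {**record, "card_number": "*" * 12 + record["card_number"][-4:]}

# Generalization: keep the birth year only, using 01/01 as month and day.
generalized = {**record, "birth_date": record["birth_date"][:4] + "-01-01"}

# Perturbation (randomization): add noise so the true value is unrecoverable.
perturbed = {**record, "salary": record["salary"] + random.randint(-5000, 5000)}

for label, r in [("replaced", replaced), ("suppressed", suppressed),
                 ("generalized", generalized), ("perturbed", perturbed)]:
    print(label, r)
```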
For anonymization to be considered successful, unlinkability must be achieved. In other words, the provider of the information cannot be identified. However, successful data anonymization does not necessarily guarantee total privacy protection due to the possibility of an inference attack. This attack aggregates and correlates information to deduce relationships between data elements. While anonymization ensures individual fields cannot be traced back to an individual, it does not promise that multiple fields used together will not leak privacy information, as is evident in an aggregation attack.
Disposition
Any software and the data it contains are vulnerable until completely disposed of. The use of PII simply increases this danger. Disposal does not end when we logically delete data from a persisted storage mechanism. Hard drives can contain sensitive information long after an application believes it has been destroyed. Most privacy regulations require the sanitization of hardware before it can be reused. Although we have previously covered this information, let’s revisit it.
Sanitization for electronic media includes, in increasing levels of security:
Overwriting, sometimes called formatting or clearing, which uses software or hardware products to format media (a sketch of a simple overwrite follows this list).
Degaussing, sometimes called purging, which exposes the media to a strong magnetic field designed to disrupt the recorded data.
Destruction, which is the act of physically disintegrating, pulverizing, melting, incinerating or shredding the storage device.
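As a cautionary sketch only, overwriting a single file might look like the code below. Note that on journaling filesystems and SSDs with wear leveling, in-place overwrites are not guaranteed to reach the original blocks, so real sanitization should follow a vetted procedure such as NIST SP 800-88, or escalate to degaussing or destruction.

```python
import os

def overwrite_and_delete(path: str, passes: int = 3) -> None:
    """Overwrite a file with random bytes several times, then delete it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))  # overwrite contents in place
            f.flush()
            os.fsync(f.fileno())       # push this pass to the device
    os.remove(path)
```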
Some privacy regulations require that the people involved in disposal activities receive the proper training and follow the appropriate disposal procedures.