Sharks in the Moat


by Phil Martin

Figure 31: The Iron Triangle

However, the belief that proper security will delay the final delivery or create an inferior product because we cannot work on the “really important stuff” is extremely short-sighted and incorrect. We will discuss why that is later.

Figure 32: Relative cost of fixing code issues at different stages of the SDLC

‘Security as an afterthought’ is the second reason that proper security is seldom implemented correctly. In this case, the value of security is not as apparent as the value of building in just one more feature, and the thought is ‘We can just tack it on at the end if we really need it.’ The fallacy here is the belief that adding it later will not increase the cost of the project by ‘much’. Unfortunately for that poor project, Figure 32 illustrates the real cost of delaying implementation. Some studies estimate that it is 100 times costlier to implement a feature after the product is already in production than if the team had simply taken the time upfront to design it in.

The final reason that security is often not implemented is the belief that it will negatively impact usability. For example, let’s say that two requirements are added to the project in the name of security:

1) All access requests should be authorized and logged

2) All passwords must be 15 characters long

The product is rolled out, only to discover that it performs too slowly because of the extreme amount of access validation and logging.

Additionally, the security team discovers that users are writing their passwords down on sticky notes and leaving them on their monitors because the passwords are too difficult to remember. As a result, the entire rollout fails, and the product is sent back to have those security features removed.

In this scenario, no one thought to check something called psychological acceptability before coding started. While we will cover this term in more detail later, it balances the need for security with usability and performance.

Of course, we need to be realistic here – just because we have implemented a slew of security features does not necessarily make a product ‘secure’. For example, we can implement strong authentication methods and ensure that our code creates an audit trail, but if we forget to turn on these features in the production environment, they do us little good. Likewise, just because a vendor claims their product is secure doesn’t mean it is. ‘Trust but verify’ should always be our mantra, and the quality assurance process is how we verify that security really has been delivered. This applies whether we build our own software or purchase it from an external third party.

Chapter 9: A Good Security Profile

Now that we’ve set the stage for holistic security, let’s talk about where we’re going next. There are a number of security concepts that any application – regardless of the platform or programming language – should implement. If you’re a developer, you are probably already envisioning code snippets and unit tests. Likewise, QA folks might already be dreaming of test cases. However, we must broaden our thought processes to encompass four different stages of the software lifecycle: requirements, design, implementation and testing. Each stage is just as important as any other. As an example, consider confidentiality. We can design great confidentiality controls, but if we never implement them, what good does it do us? If we turn that around, trying to implement confidentiality controls without a thorough treatment in the requirements stage will leave us with numerous security gaps.

We can divide the security concepts into two groups, as shown in Figure 33. Core concepts represent the most important features and are often stand-alone feature sets, while Design concepts are patterns that should be followed along the way. All of them must be understood and followed if a truly secure application is going to be created. Within each concept, we will discuss how to approach it in each of the four stages, from requirements through testing.

Figure 33: Security Concepts

If you take the first letter from the top row of core concepts, you wind up with ‘CIA’. The acronym CIA is a well-known term in the security industry that stands for confidentiality, integrity and availability. My understanding is that the FBI, NSA and DHS are all very jealous of this fact. We will reference CIA many times throughout this book, so remember what it stands for.

Now, if you are a project manager or product owner, you might be tempted to skip some of these details. And granted, a lot of this information is going to be very in-depth. But it is important that you have a grasp of these concepts, and that is why they are presented at the front of this book in a section that everyone should read, regardless of their role.

While we will cover all of the security subjects in this section, you will also encounter the occasional side trip to a related topic. These tangents will be necessary to lay some groundwork required to grasp some of the core security concepts.

Let’s get to it!

Chapter 10: Confidentiality

Imagine private messaging a Facebook friend with some juicy details about a mutual acquaintance, only to discover later that the message was broadcast on your timeline for all to see – including the acquaintance. Besides being extremely embarrassed, you would also be the victim of a loss of confidentiality, because something that was supposed to remain a secret no longer was. In short, confidentiality prevents the unauthorized disclosure of information.

A loss of confidentiality can result from either purposeful or accidental causes. For example, your friend might have reposted your message on purpose so that everyone could see it. Or perhaps Facebook had a bug in their software which accidentally revealed your private messages. Just such a thing was reported to have happened with Facebook back in 2012, but it turned out to be a misunderstanding of how timelines worked. Regardless of why it happens, a real disclosure could result in fines, a loss of reputation, lawsuits or any number of bad things.

Confidentiality protects much more than just your private Facebook musings, though. Identity theft can almost always be traced back to a loss of confidentiality due to data breaches. Those breaches can in turn be tied directly to a lack of the proper protection mechanisms. Imagine if you logged into your banking account and saw someone else’s checking activity. You can bet someone would be fired over such gross negligence in protecting confidentiality.

Of course, don’t make the mistake of thinking in terms of the Internet only. If someone sneaks a peek at a business proposal printed out on paper, confidentiality has been lost. Or perhaps someone overhears a conversation in a restaurant about your plans to ask someone out on a date. Your confidential plans are no longer a secret, are they?

But this book is interested in software, not the personal details of your love life. So, let’s talk about how to write requirements that will ensure confidentiality of a software system’s information.

The very first thing we need to do is to establish what information needs to be protected. Just because data happens to be in our database doesn’t mean we need to go to any great lengths to keep it from leaking out to the general public. For example, our website that calculates the market value of used cars will most certainly contain a list of vehicle makes and models for a user to choose from. But that list is available to pretty much everyone in the world, so why bother keeping it a secret? As a rule, any data that is publicly available requires no confidentiality protection. Now keep in mind that confidentiality protects data from being disclosed – it says nothing about protecting data from being altered or lost. We will cover that later when discussing integrity and availability. But if people can freely access information somewhere other than our own data repository, it probably needs no confidentiality protection at all.

When it comes right down to it, we can group all digital data into two categories – structured and unstructured. Data contained in a database with tables, rows and columns is very orderly and can be easily searched, which makes databases great examples of structured data. Other digital data that does not follow a predefined format or organization is said to be unstructured data. Think of an XML file – XML follows some very specific rules on formatting and how elements are arranged, but unless the data within an XML file is arranged in a well-known manner, it is difficult to parse through and recognize data fields. As far as we can tell, it’s just a jumble of nicely-formatted ‘stuff’. But the moment we inject some sort of reliable mapping – say an XML schema that goes with the XML data – it becomes very structured. Examples of unstructured data include images, videos, emails, documents and free-form text because they do not all follow a common format.

Now, why do we care if data is structured or unstructured? Because with structured data we can apply different classifications to different pieces of data within the same ‘container’, but if something is unstructured we have to apply the same classification to everything within the container. As an example, we might label the ‘Social Security Number’ column in a table as containing private data but label the ‘Department’ column in the same table as public data. We can do that because all rows in a table have the exact same columns – they all have the same structure.
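To make the difference concrete, here is a minimal sketch, assuming a hypothetical two-level labeling scheme (PUBLIC and PRIVATE) and a hypothetical employee table. Because every row shares the same columns, a label is attached once per column; an unstructured document gets one label for the whole thing. Treating unlabeled columns as PRIVATE is a fail-safe choice made for this sketch, not something the text prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical classification labels, for illustration only."""
    PUBLIC = "public"
    PRIVATE = "private"


# Structured data: every row has the same columns, so a label is attached
# to each column once and automatically covers every row in the table.
COLUMN_LABELS = {
    "employee_id": Sensitivity.PUBLIC,
    "department": Sensitivity.PUBLIC,
    "ssn": Sensitivity.PRIVATE,  # Social Security Number
}


def public_view(row: dict) -> dict:
    """Keep only fields whose column is labeled PUBLIC.

    Columns missing from COLUMN_LABELS are treated as PRIVATE (fail-safe default).
    """
    return {
        col: val
        for col, val in row.items()
        if COLUMN_LABELS.get(col, Sensitivity.PRIVATE) is Sensitivity.PUBLIC
    }


# Unstructured data: an email or a scanned document has no columns to label
# individually, so a single label covers the whole container.
@dataclass
class Document:
    content: str
    label: Sensitivity


row = {"employee_id": 42, "department": "Finance", "ssn": "123-45-6789"}
print(public_view(row))  # {'employee_id': 42, 'department': 'Finance'}
memo = Document("Draft merger proposal", Sensitivity.PRIVATE)
```

Note that the memo cannot be partially disclosed the way the table row can: whatever label it carries applies to every word inside it.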

Data classification is the act of assigning a label, or level of sensitivity, to all information assets. Sensitivity is a measurement of how impactful it would be to an organization if a specific data asset were to be disclosed, altered or destroyed. Although we haven’t covered integrity or availability yet, those two terms, together with confidentiality, map directly to the three outcomes we just mentioned. In other words, the following is true:

Confidentiality protects information from being incorrectly disclosed

Integrity protects information from being incorrectly altered

Availability protects information from being incorrectly destroyed

All three concepts are used to arrive at a classification label, which is then used to divide data assets into separate buckets. Since we need to really understand all three concepts (confidentiality, integrity and availability) before diving deeper into classification, let’s put that topic on hold until later and get back to confidentiality.

Figure 34: Confidentiality Protection Mechanisms

We mentioned earlier that public data – sometimes called directory information – needs no confidentiality protection, but what about non-public data? Private data by definition will always require some type of confidentiality protection mechanism, so let’s go over the possibilities. Figure 34 shows the various mechanisms we are about to cover. At a high level, we can choose one of two directions: masking the data or disguising it in secret writing.

Masking is the weaker of the two options and is carried out by asterisking or X’ing out the information needing to be hidden. You have probably seen this happen a lot with password fields – as you type in a character, an asterisk is shown instead of the character typed in. This approach is primarily used to protect against shoulder surfing attacks, which are characterized by someone looking over another person’s shoulder and viewing sensitive information. Other masking examples include hiding all but the last four digits of credit card numbers or social security numbers when they are printed on receipts or displayed on a screen.
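The receipt-style masking described above can be sketched in a few lines. The function name is hypothetical, and the choice to leave separators such as '-' visible is an assumption of this sketch. Keep in mind that masking only changes what is displayed; the full value still exists wherever it is stored and needs protection there.

```python
def mask_all_but_last_four(value: str, mask_char: str = "*") -> str:
    """Replace every digit except the last four with mask_char.

    Separators such as '-' or ' ' are left in place so the format stays recognizable.
    """
    total_digits = sum(ch.isdigit() for ch in value)
    digits_seen = 0
    masked = []
    for ch in value:
        if ch.isdigit():
            digits_seen += 1
            # Only the final four digits remain visible.
            masked.append(ch if digits_seen > total_digits - 4 else mask_char)
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_all_but_last_four("4111-1111-1111-1234"))  # ****-****-****-1234
print(mask_all_but_last_four("123-45-6789"))          # ***-**-6789
```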

Secret writing is a much better solution and can be broken down into two types – covert and overt. The goals of each are quite different. Covert writing results in the data being hidden within some type of media or form with the intent that only the intended recipient will notice the data or message. With this approach confidentiality is wholly dependent on the information remaining undiscovered until it is ‘supposed’ to be discovered. Overt writing mechanisms don’t worry about being discovered, and in fact make no effort to conceal their existence. Overt confidentiality is maintained by the fact that only the intended recipient has the capability to decipher the secret message.

The most basic forms of covert secret writing are steganography and digital watermarking. Steganography is more commonly referred to as invisible ink writing and is the art of camouflaging or hiding some form of writing in a static medium such as a painting or a seemingly innocent paragraph. This approach is commonly found in military espionage communications. Digital watermarking is the process of embedding information into a digital signal in the form of audio, video or an image. Digital watermarking can be carried out in two ways: visible and invisible. In visible watermarking, there is no special effort to conceal the information, and it is in plain sight. If you have ever downloaded a copyright-protected image, it will often be overlaid with some type of branding so that you cannot simply take the image and use it for your own purposes. This does not really protect confidentiality and is more akin to copyright protection. Invisible watermarking, on the other hand, conceals the message within other media, and the watermark can be used to uniquely identify the originator of the signal. This allows us to use digital watermarking not only for confidentiality but for authentication purposes as well. Invisible watermarking is also mostly used for copyright protection, to deter and prevent someone from making an unauthorized copy of the digital media. As we mentioned, covert secret writing is really not all that useful when it comes to confidentiality.
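One common digital steganography technique is least-significant-bit (LSB) embedding, sketched below as a toy. A plain list of integers from 0 to 255 stands in for image pixel values (an assumption made to keep the example self-contained), and the function names are hypothetical. Each cover value changes by at most 1, which is why the message goes unnoticed, but nothing is encrypted: anyone who knows to read the low bits recovers the message. That is exactly why covert writing depends on staying undiscovered.

```python
def hide_message(cover: list[int], message: bytes) -> list[int]:
    """Hide message bits in the least significant bit of each cover value (0-255)."""
    # Unpack each byte into 8 bits, most significant bit first.
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = list(cover)
    for idx, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[idx] = (stego[idx] & ~1) | bit
    return stego


def reveal_message(stego: list[int], length: int) -> bytes:
    """Read back `length` bytes from the least significant bits."""
    out = bytearray()
    for b in range(length):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (stego[b * 8 + i] & 1)
        out.append(byte)
    return bytes(out)


pixels = [200] * 64                     # stand-in for image data
stego = hide_message(pixels, b"attack")
print(reveal_message(stego, 6))         # b'attack'
```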

Much more suited to our confidentiality purposes is overt secret writing. Sometimes called cryptography, this approach includes both encryption and hashing functions. We’ll dive in pretty deep with both of these topics later, but let’s get at least a passing familiarity with them now. The idea behind overt secret writing is to use a well-known algorithm to transform the secret into a form that cannot be traced back to the original secret. A key point to make here is that the algorithm used to transform the secret is well-known, and there is no need to hide the resulting ciphertext, or the string of bits resulting from encoding the original secret message. Covert methods require the message to remain hidden, while overt methods (encryption and hashing) put the ciphertext right out there in plain sight and dare you to figure out the original message.

Encryption requires a key – which is simply a string of 0s and 1s – to render the plain text (the original secret message) completely unreadable as ciphertext (the result of encryption). We won’t go into the magic of how that happens right now, but the important thing to realize is that whoever has the key used to encrypt plain text into ciphertext can use the key to decrypt the ciphertext back into the original plain text.
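To see why holding the key is all that matters, here is a sketch of the simplest symmetric cipher, the one-time pad: the key is a random string of bits as long as the message, and encryption XORs the two together. Because XOR is its own inverse, the same key turns the ciphertext back into plain text. This is an illustration only, with hypothetical function names; a one-time pad key must never be reused, and real systems use vetted algorithms such as AES through an established library.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Random key exactly as long as the message (one-time pad requirement)."""
    return secrets.token_bytes(length)


def _xor(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the message length")
    return bytes(d ^ k for d, k in zip(data, key))


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    return _xor(plaintext, key)


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the same key reverses the transformation.
    return _xor(ciphertext, key)


message = b"meet at noon"
key = generate_key(len(message))
ciphertext = encrypt(message, key)
print(decrypt(ciphertext, key))  # b'meet at noon'
```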

At this point encryption and hashing take two very different paths and solve separate problems, both related to confidentiality. The primary difference between encryption and hashing is one of reversibility – any text that has been encrypted can be decrypted to get the original text. But hashing cannot be reversed – once the ciphertext has been created, there is no way to figure out what the original message was. At first hashing may seem like a useless exercise – after all, what is the point of transforming a secret message if you can’t reverse it to see the original text? Well, it turns out that while hashing has no value in getting a secret message to someone else, it’s a great way to detect if a message has been altered. We’ll leave the rest of that discussion for later when we get into Integrity.
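A short sketch shows the properties that make hashing useful even though it cannot be reversed. SHA-256 from Python's standard `hashlib` module is used here only as a familiar example; the text does not name a specific algorithm. The same input always produces the same digest, a small change produces a completely different one, and the digest is the same size no matter how long the input is. Note that `hashlib` offers no function to turn a digest back into its input.

```python
import hashlib


def digest(message: str) -> str:
    """SHA-256 hash of a UTF-8 message, as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


original = digest("Transfer $100 to Alice")
same = digest("Transfer $100 to Alice")
altered = digest("Transfer $900 to Alice")

print(original == same)     # True: same input always yields the same digest
print(original == altered)  # False: a one-character change gives a different digest
print(len(original), len(digest("x" * 10_000)))  # 64 64: output size is fixed
```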

However, there is another great use for hashing that does apply directly to confidentiality – storing passwords in a database. If we save user passwords in a database in clear text, they can be easily stolen if an attacker gets his hands on the database. We can make it much harder for the attacker if we were to encrypt the password before saving it. However, if we store the encryption key in the database with the encrypted password, the hacker will simply chuckle at our poor attempt to make his life more difficult. If we store the encryption key somewhere else – say in a configuration file – it may cause the attacker some heartburn, but a determined hacker will still get to the original password.

But if we hash the password, the attacker’s life suddenly gets a lot tougher. One of the nuances of hashing is that it produces the exact same ciphertext – called the hash value or message digest – every time as long as the plain text remains the same. So, instead of storing the password in clear text or an encrypted form of the password that might be stolen and decrypted, we store the computed hash of it. Because a hash cannot be reversed, there is no way to calculate the original password from the stored value. Now, all we need to do is have a user give us their password, hash it using the same method we used to hash the password stored in the database, and see if the results are exactly the same. If they match, then we know the user gave us their correct password!
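The store-then-compare flow above can be sketched with Python's standard library. The sketch adds two refinements the text does not describe but that are common practice: a random per-user salt, so identical passwords do not produce identical stored values and precomputed tables of hashes are useless, and a deliberately slow key-derivation function (PBKDF2 via `hashlib.pbkdf2_hmac`), so each guess costs an attacker real time. The function names and iteration count are illustrative assumptions; check current guidance, such as the OWASP Password Storage Cheat Sheet, before picking a work factor.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor; higher means slower hashing for attackers (and us).
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in the database instead of the password."""
    salt = secrets.token_bytes(16)  # unique per user, stored alongside the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the supplied password the same way and compare against the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```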

To this point, we have covered the need for data classification and the mechanisms surrounding confidentiality. But we’ve really only considered data that is ‘at-rest’, or data that has been persisted to some type of permanent storage such as a database. Confidentiality should also cover data that is ‘in-use’ and ‘in-transit’. Here is how we define each of those three terms:

At-Rest data has been persisted to permanent storage, such as a database or an archive

In-Transit data is being transmitted over networks, and is also called data-in-motion

In-Use data is held in computer memory for processing

To ensure true security for at-rest data, we need to state confidentiality requirements for the data from the moment it is created to the point at which it is either transferred out of our control or is destroyed. For example, some data might be considered to be ‘top secret’ until a certain point in time at which it becomes public knowledge and no longer needs to be protected. This happens quite often with classified government secrets that move to the public domain in response to a Freedom of Information Act request – the data is still retained, but the confidentiality needs change drastically. The best way to handle this unavoidable change is to protect data according to its classification, and not based on the data itself. Then, when data moves from ‘top secret’ to ‘unclassified’, the appropriate mechanisms automatically kick in.
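Here is a minimal sketch of "protect by classification, not by data", assuming hypothetical labels and hypothetical control names. The required controls are looked up from the record's current label every time, so relabeling a record from top secret to unclassified changes its handling immediately, with no per-record rules to rewrite.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    """Hypothetical labels; real schemes vary by organization."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    TOP_SECRET = 2


# Protection rules hang off the label, not off any individual record.
REQUIRED_CONTROLS = {
    Classification.UNCLASSIFIED: set(),
    Classification.CONFIDENTIAL: {"encrypt_at_rest", "access_logging"},
    Classification.TOP_SECRET: {"encrypt_at_rest", "access_logging", "need_to_know_approval"},
}


@dataclass
class Record:
    content: str
    label: Classification

    def required_controls(self) -> set[str]:
        # Looked up on every call, so a relabel takes effect immediately.
        return REQUIRED_CONTROLS[self.label]


report = Record("Mission details", Classification.TOP_SECRET)
print(sorted(report.required_controls()))
# ['access_logging', 'encrypt_at_rest', 'need_to_know_approval']
report.label = Classification.UNCLASSIFIED  # e.g. released to the public domain
print(sorted(report.required_controls()))   # []
```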
