
Sharks in the Moat


by Phil Martin


  Common examples of bastion hosts include firewalls, routers, DNS servers, web servers and mail servers.

  Metrics

  Back in the day, before organizational security was a recognized issue and before any regulations required it, the only way for security personnel to convince the people controlling the purse strings to support security initiatives was to prey on their fears. This approach was seldom successful and earned security personnel the reputation of being the ‘sky is falling’ people. It often went something like this – “If we don’t spend $10,000 on a new firewall, hackers will be able to get in and steal our proprietary data and software anytime they wish and give it to the world, resulting in us going out of business.” Without any proof to back these claims up, the warnings fell on deaf ears. Now, how differently do you think that conversation would have turned out if security had been able to show metrics proving hackers were attempting to get in? Objectivity would have entered the conversation instead of fear-mongering. Not only do proper metrics give us a reason for increasing security, they also allow us to measure how much progress we have made in achieving a more secure state. A key risk indicator, or KRI, is a metric used to measure that progress. While most KRIs are not specific to security, security KRIs must be among the metrics that an organization watches closely.

  Because decisions are made based on KRIs, the quality of those decisions will be directly influenced by the quality of the metrics a KRI is based on. If we have metrics of sub-par quality, then our decisions will be poor as well. So then, what does a quality metric look like? It turns out they always have five characteristics in common – each will be consistent, quantitative, objective, relevant and inexpensive.

  A consistent metric produces the same result each time it is derived from the same data set. For example, if we gather all logs from the last two weeks and calculate how many breach attempts were recorded, the number from the first calculation should be identical to the number produced from the 20th calculation. Any significant deviation between the two implies there is a factor at play that we don’t understand.

  A quantitative metric is precise and expressed in terms of a number rather than a category or level. If we are judging the number of failed authentication attempts in the last two weeks, reporting ‘High’ can mean anything, whereas reporting that 17% of all attempts failed is a number we can sink our teeth into and use to detect trends.
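To make the distinction concrete, here is a minimal sketch of turning raw authentication events into a quantitative metric. The event structure and field names are invented for illustration, not taken from the book:

```python
# Hypothetical illustration: a quantitative metric (a percentage) computed
# from raw events, rather than a qualitative label like 'High'.
def failed_auth_rate(events):
    """Return the percentage of failed attempts in a list of event dicts."""
    total = len(events)
    if total == 0:
        return 0.0
    failed = sum(1 for e in events if e["outcome"] == "failure")
    return 100.0 * failed / total

events = [
    {"user": "fred", "outcome": "success"},
    {"user": "fred", "outcome": "failure"},
    {"user": "may",  "outcome": "success"},
    {"user": "may",  "outcome": "success"},
]
print(f"{failed_auth_rate(events):.0f}% of attempts failed")  # 25% of attempts failed
```

Because the calculation is deterministic, it also satisfies the consistency property: running it twice over the same data set always yields the same number.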

  An objective metric produces the same result regardless of who is doing the collecting. If Fred tells us that 12% of all authentication attempts have failed in the last two weeks but May tells us that it is closer to 20%, then our metric is very subjective. We need to go back and find out why the discrepancy exists before using the metric.

  A relevant metric is useful for the purpose for which it is being collected and can be said to be contextually specific. As an example, if a metric reports the percentage of failed authentication attempts, then it is of some value. But if the metric is even more specific, telling us the number of failed authentication attempts split up by internal and external sources, then we know even more and can decide whether the more immediate danger comes from an insider or a hacker on the outside.

  Finally, a good metric must be inexpensive to capture. A metric captured through automated means, such as a program sifting through an audit log, will always be cheaper than paying a person to look through a spreadsheet.

  Figure 41 lists each quality, and how good and bad metrics compare.

  Attribute    Good Metrics               Bad Metrics
  Collection   Consistent                 Inconsistent
  Expressed    Quantitative               Qualitative
  Results      Objective                  Subjective
  Relevance    Contextually Specific      Contextually Irrelevant
  Cost         Inexpensive (automated)    Expensive (manual)

  Figure 41: Characteristics of Metrics

  Auditing

  To ensure that monitoring is being properly implemented, and that the results are the best possible, organizations often turn to an external party to come in and perform an audit. An auditor will examine the system, select a set of events to be tested, and watch to see that monitoring properly captures the events and provides effective metrics. While this is a great method to determine how well an organization is complying with regulatory and governance requirements, an audit by itself will do nothing to increase compliance – it can only tell us how compliant we already are. Therefore, it is a detective control, but can be used to uncover insider attacks and fraudulent activities. Audits are very common these days and are often mandated by regulatory requirements. If an audit finds that a company has fallen out of compliance, it can result in some fairly serious consequences.

  Some of the specific areas that an auditor normally checks are the following:

  Authentication cannot be bypassed.

  Rights and privileges are working properly.

  Audit trails are being properly generated.

  Patches are up-to-date.

  Unnecessary ports and services have been disabled.

  Data records maintained by different people or teams can be reconciled.

  Authorized transactions are accurate and complete.

  Physical access to systems containing sensitive data is restricted to authorized people only.

  In general, auditing will validate that security policies are being followed, and that CIA is properly implemented.

  Incident Management

  Whereas monitoring activities are designed to deter and detect attempts to breach security, the reality is that an attacker will eventually get through our defenses. If monitoring is up-to-par, then we should be able to recognize when this happens. At this point, we must switch from a detection mode to a more reactive stance and handle the security breach, or incident. This reactive mode is called incident management and is comprised of the various protocols that kick in, informing employees of the specific steps that will need to be followed.

  NIST SP 800-61 “Computer Security Incident Handling Guide” provides some great tips on how to handle incidents efficiently. The very first step to carry out when an incident is detected is to establish if the activity truly represents an ‘incident’. For example, an employee might report that someone has breached physical security, only to find out that the suspicious individual is a package delivery person who inadvertently took a wrong turn. On the other hand, if the incident is real, then the next step is to identify the type of incident we are dealing with. Are we under a DoS attack, experiencing unauthorized access, or perhaps have encountered malicious code?

  Once we have confirmed that a valid incident has occurred and identified its type, we then need to take steps to minimize the potential loss or destruction, followed by whatever actions are necessary to restore the expected levels of service to the business. This will include correction, mitigation, removal and remediation of the weakness. During this time, we hopefully have been following the established protocols and policies regarding the communication and reporting of activities to both internal and external groups.

  Now that we have a good overview of incident handling, let’s dive in just a bit deeper in some areas.

  Events, Alerts, and Incidents

  Each activity that incident management handles can be classified as an event, an alert or an incident. An event is any activity that attempts to change the state of an object. This covers just about anything that can be observed within a network, system or software. A server performing an unexpected reboot is an event, as are suspicious logins occurring at 2:30 AM. When an event is detected, it is examined – usually in an automated manner – and compared against known malicious activity patterns. If a match is found, then an alert is generated, which sends a notification to a human or another system requesting that attention be paid to the suspicious activity. If an event has negative consequences, then we say it is an adverse event. If an event violates or might violate security policies, then it is called a security incident. An alert represents a potentially adverse event. Figure 42 illustrates the relationship between the triad of events, alerts and incidents.
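The event-to-alert step described above can be sketched in a few lines. The patterns and event strings here are invented for illustration; a real system would match against a curated signature database:

```python
# Sketch (not from the book): compare each observed event against known
# malicious activity patterns; a match promotes the event to an alert.
import re

# Hypothetical patterns representing known-malicious activity
MALICIOUS_PATTERNS = [
    re.compile(r"failed login .* root"),
    re.compile(r"unexpected reboot"),
]

def classify(event_text):
    """Return 'alert' if the event matches a malicious pattern, else 'event'."""
    for pattern in MALICIOUS_PATTERNS:
        if pattern.search(event_text):
            return "alert"   # potentially adverse -> notify a human or system
    return "event"           # observed activity, no known-bad pattern matched

print(classify("failed login attempt for user root"))  # alert
print(classify("scheduled backup completed"))          # event
```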

  Figure 42: Relationships between Events, Alerts and Incidents

  Types of Incidents

  We can group incidents into five categories – DoS, malicious code, unauthorized access, inappropriate usage and multiple component.

  A denial of service, or DoS, incident is the most common type of security event, and acts by preventing an authorized user from accessing a network, system or software by exhausting available resources.

  Malicious code is represented by software code that infects a host, such as viruses, worms and Trojan Horses. This can result from activities such as a phishing attack, inserting a compromised USB key, or installing infected software.

  Unauthorized access is experienced when a person gains physical or logical access to a protected resource without being explicitly granted that right. If credentials are stolen, and a malicious actor uses those credentials to access a sensitive system, then we have a case of unauthorized access.

  When a person – usually an employee – violates the acceptable use of system resources or perhaps organizational policies, we have encountered inappropriate usage. For example, extensive viewing of social media sites while on the clock is probably a clear violation of expected employee behavior, and the security team will need to work with HR to correct such a situation. In extreme cases, legal or law enforcement will need to become involved.

  When an incident is comprised of two or more ‘sub-incidents’, we have a multiple component incident. A classic example is when an attacker leverages SQL injection to drop a table. In this case we have two separate incidents – a SQL injection vulnerability, and the act of dropping a table – that roll up into a single multiple component incident. Another example might be the installation of a Trojan horse that then opens a backdoor for the attacker to issue further instructions.

  Because of the vast array of possibilities, recognizing and categorizing incidents can quickly overwhelm green or inexperienced employees. To help with this, a diagnosis matrix listing categories and their associated symptoms is often helpful, allowing a person to quickly narrow down the number of possible categories.

  Incident Response Process

  When executing a response to incidents, successful organizations often find themselves repeating the same series of four steps, so let’s go over that pattern and see how a proper response is carried out. The four steps are preparation, detection/analysis, containment/eradication/recovery, and post-incident activity, as shown in Figure 43.

  Figure 43: Incident Response Steps

  Preparation

  Before we encounter the first incident, we have to be prepared. During this step we establish a formal incident response plan and implement controls that the initial risk assessment called for. Specifically, we must carry out the following activities:

  Establish incident response policies and procedures.

  Create an incident response team, or IRT, that will handle incidents as they come up.

  Periodically perform a risk assessment to keep risk at or below the acceptable risk level. This has the effect of keeping the number of incidents to a manageable number.

  Create an SLA that documents the expected actions and minimum response times for the IRT.

  Identify both internal and external people that may need to be contacted for certain incidents.

  Assemble all required tools and resources that the IRT will need when acting. Examples include contact lists, network diagrams, backup configurations, forensic software, port lists, security patches, encryption software and monitoring tools.

  Carry out training on how to execute the processes as detailed in the incident response plan, or IRP.

  Detection and Analysis

  The second step in the incident response process is to detect an event, followed by thorough analysis. Detection is normally carried out by manually monitoring activity or by employing an IDS or IPS, but at the root of any approach we will find good logging. If software is not producing the right information in a log in real-time, we have little hope of detecting a situation before it becomes a significant and harmful incident. Since logs hold raw data, log analysis itself can be broken down into four steps – collection, normalization, correlation and visualization. The analysis itself may need to be automated depending on the amount of data being collected. Let’s look at the four steps of log analysis, as illustrated in Figure 44.

  Figure 44: The Steps of Detection & Analysis

  When collecting data for subsequent analysis, we can leverage several different sources. Network IDS and host IDS logs are useful, as are network access control list logs that capture details anytime someone requests access to a resource. OS logs that capture successful or failed authentication attempts as well as exception conditions can be very useful, especially when correlated with other logs. If software is written correctly, it will be producing its own logs describing how both users and processes are interacting. While database logs are extremely useful, they are hard to generate due to the negative performance impact on a difficult-to-scale resource. Additionally, if impersonation is being used at the tier connecting to the database, the usefulness of the information may be limited since it does not reflect activity tied to an end-user. Logs must be protected against tampering – if we cannot trust what the logs tell us, then what is the point in collecting the data? Integrity can be assured by using hashing values to detect alterations.
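One common way to make log tampering detectable, as a sketch of the hashing idea above, is a hash chain: each entry’s hash covers the previous hash, so altering any earlier entry invalidates everything after it. This is an illustrative construction, not the book’s prescribed mechanism:

```python
# Sketch: protect log integrity with a SHA-256 hash chain, so that any
# alteration of an earlier entry invalidates every later hash.
import hashlib

def chain_logs(entries):
    """Return (entry, hash) pairs where each hash covers the previous hash."""
    prev = ""
    chained = []
    for entry in entries:
        digest = hashlib.sha256((prev + entry).encode()).hexdigest()
        chained.append((entry, digest))
        prev = digest
    return chained

def verify(chained):
    """Recompute the chain and confirm no entry was altered."""
    prev = ""
    for entry, digest in chained:
        expected = hashlib.sha256((prev + entry).encode()).hexdigest()
        if digest != expected:
            return False
        prev = digest
    return True

logs = chain_logs(["02:00 login ok", "02:05 login failed"])
print(verify(logs))                            # True
logs[0] = ("02:00 login FAILED", logs[0][1])   # tamper with the first entry
print(verify(logs))                            # False
```

In production the chain anchor would also be stored somewhere the attacker cannot reach, such as a write-once store or a separate log host.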

  After we collect data, almost exclusively through logs, we will need to normalize that data, or parse the log files to extract important details. The use of regular expressions can help tremendously with this step. An important activity during this step is to synchronize log entries among multiple log sources. For example, to create a true picture of activity between 2:00 AM and 3:00 AM, we might need to intersperse log entries from the web server, mid-tier servers, the IDS log and host OS logs into a single, sequential list of events and activities. If the time stamps in each log are not synchronized, then we will not have an accurate picture of who did what and when, in the order that it actually occurred.
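The normalization step above can be sketched as follows: parse raw lines from two hypothetical log sources with a regular expression, then merge them into one chronological timeline. The log formats and contents are invented for the example:

```python
# Sketch of normalization: regex-extract fields from raw log lines, then
# interleave entries from multiple sources into a single sequential list.
import re

LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")

def normalize(source, raw_lines):
    """Extract (timestamp, source, message) tuples from raw log lines."""
    parsed = []
    for line in raw_lines:
        m = LINE_RE.match(line)
        if m:
            parsed.append((m.group(1), source, m.group(2)))
    return parsed

web = ["02:14:03 GET /login 200", "02:14:07 POST /login 401"]
ids = ["02:14:05 signature match: sql-injection probe"]

# Sorting on the timestamp field interleaves both sources chronologically.
timeline = sorted(normalize("web", web) + normalize("ids", ids))
for ts, src, msg in timeline:
    print(ts, src, msg)
```

Note that this only produces a truthful ordering if the source hosts’ clocks are synchronized, which is exactly why the chapter stresses time synchronization.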

  The third step in analyzing logs is to correlate log activity, or to deduce from log activities a real threat or the presence of a threat agent. For example, if suspicious data was entered into a text field, we would then need to look for an error condition that could tip off a SQL injection vulnerability to an attacker. Or perhaps we noticed a large number of failed authentication attempts, and we will need to tie them all to a single source before concluding that an actual threat exists. This step does two things for us – it allows us to recognize patterns and then helps to categorize the attack. This step can be manually intensive, so the frequency of such efforts must match the value of the data being protected. We would carry out correlation activities more often for a system containing PCI cardholder data than for a system hosting blogs.
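The failed-authentication example above reduces to grouping events by source and flagging any source over a threshold. The event shape and threshold here are assumptions made for the sketch:

```python
# Sketch of correlation: tie failed authentication events back to their
# source address and flag any source exceeding a threshold.
from collections import Counter

def correlate_failures(events, threshold=3):
    """Return source addresses with at least `threshold` failed attempts."""
    counts = Counter(e["src"] for e in events if e["outcome"] == "failure")
    return {src: n for src, n in counts.items() if n >= threshold}

events = [
    {"src": "10.0.0.5", "outcome": "failure"},
    {"src": "10.0.0.5", "outcome": "failure"},
    {"src": "10.0.0.5", "outcome": "failure"},
    {"src": "10.0.0.9", "outcome": "failure"},
    {"src": "10.0.0.9", "outcome": "success"},
]
print(correlate_failures(events))  # {'10.0.0.5': 3}
```

Three scattered failures mean little; three failures from one source is a pattern, which is the whole point of the correlation step.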

  The fourth and final step when performing log analysis is to visualize the correlated data. The point of this step is to turn reams of potentially useful – and perhaps useless – data into something that human brains can easily grasp. While there is nothing like the human mind to detect patterns, this will only work if we first eliminate the noise.

  Now that we have discussed how to effectively carry out a log analysis, let’s return to the larger conversation around the second step in an incident response process – detection and analysis. There are quite a few activities we will need to carry out if we want to capture useful data on which to make decisions, as shown in Figure 45.

  Figure 45: Seven Activities for Detection & Analysis Visualization Step

  If possible, we will want to implement automated monitoring software or use devices such as IDS and IPS appliances. If this source generates notifications, pay attention to them. If the number of alerts becomes too large, then tune the monitoring software to decrease the volume instead of ignoring it! For manual reporting processes when something of interest is found, ensure a phone number or email is widely known to both internal and external people.

  Most logging capabilities will have some type of ‘verbosity’ setting that dials the amount of logged data up or down. This setting should reflect the sensitivity of the system. For example, the default for all systems might be set to ‘Information Only’ while systems processing PCI cardholder data might be set to ‘Detailed’.
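As a sketch of this idea using Python’s standard library `logging` module, sensitivity tiers can be mapped onto verbosity levels; the tier names here are invented to mirror the example in the text:

```python
# Sketch: map a system's sensitivity tier to a log verbosity level using
# the standard library's logging module (tier names are hypothetical).
import logging

SENSITIVITY_TO_LEVEL = {
    "default": logging.INFO,    # 'Information Only'
    "pci":     logging.DEBUG,   # 'Detailed', for cardholder-data systems
}

def configure_logger(name, sensitivity):
    """Return a logger whose verbosity reflects the system's sensitivity."""
    logger = logging.getLogger(name)
    logger.setLevel(SENSITIVITY_TO_LEVEL[sensitivity])
    return logger

app = configure_logger("billing", "pci")
print(logging.getLevelName(app.level))  # DEBUG
```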

  If using multiple sources for logging – and let’s face it, if you do not have multiple sources then you probably are not generating enough logs – then it is very helpful to implement some type of centralized log collection capability. If a centralized capture is not possible (they are very expensive, after all) it is important to ensure all host clocks are synchronized properly to ensure a single list of events in chronological order can be generated.

 
