Sharks in the Moat


by Phil Martin


  Chapter 18: Accountability

  In my early days I was in charge of managing a SaaS application that allowed customer service representatives to log in and manage customer accounts. The application properly implemented an authorization scheme restricting which users could create new user accounts. At some point, a customer rep apparently got tired of having to create new user accounts and started giving customers the right to create their own accounts, which directly violated multiple policies. Once this extreme laziness was discovered, the CTO called me up and asked a perfectly legitimate question – “Who was the culprit?” Unfortunately, I was unable to tell him, because the application did not properly implement an audit trail to show who granted that access. In short, the concept of accountability had never been enforced, and I was never able to positively identify who did what, and when. Since that time, I have been a major audit-trail zealot.

  Auditing is a security concept in which privileged or critical activities and events are logged and tracked. This has two primary purposes. First, auditing allows us to reconstruct past scenarios that resulted in a defect or outage. Second, it allows us to identify people or processes that took an incorrect action and hold them accountable for it. Auditing is a passive detection control, meaning that it monitors activity and identifies issues after the fact.

  Audit information, however, must be recorded and persisted in real-time as activities happen, and we call this an audit trail. This can mean the data is written to a log file, sent to a database, or transferred to another system across the network that takes care of the rest. It does not necessarily imply a real-time analysis of the data, although that can be implemented if absolutely required.

  Now, what information should we include in an audit trail? At a minimum, we must capture the who, what, where, when and the delta. The ‘who’ is the identity of the person or process carrying out the action. The ‘what’ is the type of operation being executed, such as create, read, update or delete (CRUD). The ‘where’ represents the object being operated on. The ‘when’ is the date and time, down to the millisecond, at which the operation was performed, and the ‘delta’ represents the changes that resulted.

  As an example, suppose a user decided to change the online price of a widget. The audit entry would look something like this:

  Who: ‘[email protected]’

  What: ‘Update’

  Where: ‘Item, ID: 293001’

  When: ‘07/01/2021 11:14.2930’

  Delta: ‘Item price (was $15.80, now $21.50)’
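
  To make this concrete, here is a minimal sketch of how such an entry might be modeled in code. The class name, field names and the sample identity are purely illustrative, and any language or persistence mechanism could be substituted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEntry:
    """One audit-trail record capturing the who, what, where, when and delta."""
    who: str    # authenticated identity of the person or process
    what: str   # operation performed: Create, Read, Update or Delete
    where: str  # object being operated on
    delta: str  # the changes that resulted
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

entry = AuditEntry(
    who="sally.jones@example.com",   # illustrative identity only
    what="Update",
    where="Item, ID: 293001",
    delta="Item price (was $15.80, now $21.50)",
)
```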

  The audit trail itself must be protected from modification as well – otherwise how can we trust what it claims happened? New logs must never overwrite older logs unless a policy explicitly states this is allowed, and that policy should be based on regulatory requirements if applicable. If there is ever a conflict between an organizational policy and regulatory requirements, the regulatory requirements must always be followed. Unfortunately, retention policies often significantly increase storage space needs, and this eventuality should be planned for.

  While we previously said that auditing is a passive detection control, it can act as a deterrent control as well. If users know their actions are being audited, then it could deter them from carrying out a malicious or fraudulent activity. Of course, this depends on the users having knowledge of the activity, so logon banners should mention this capability.

  Properly implementing an audit trail is not a trivial task – if not done properly, the result can range from useless log entries to bringing down the entire application when traffic increases. One of the biggest hurdles to overcome is implementing a full-featured audit trail without decreasing performance to an unacceptable level. Go ahead and accept that performance will be negatively impacted – but if done properly, it should not result in more than a 10% decrease. This will often necessitate using some type of asynchronous persistence, such that the code simply gathers the information, fires it off to another process and continues, with the external process taking on the burden of persistence in real-time.
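
  As a rough illustration of asynchronous persistence, the sketch below uses Python's standard logging queue facilities: the application thread only enqueues the record, while a background listener carries the cost of writing it out. The logger name and file name are simply placeholders.

```python
import logging
import logging.handlers
import queue

# The request thread only drops records onto an in-memory queue (cheap);
# a background listener thread takes on the slow persistence work.
audit_queue = queue.Queue(-1)                    # unbounded queue
queue_handler = logging.handlers.QueueHandler(audit_queue)

file_handler = logging.FileHandler("audit.log")  # the slow, durable sink
listener = logging.handlers.QueueListener(audit_queue, file_handler)
listener.start()

audit_log = logging.getLogger("audit")
audit_log.setLevel(logging.INFO)
audit_log.addHandler(queue_handler)

audit_log.info("Update Item 293001: price was $15.80, now $21.50")
# listener.stop() at shutdown flushes any queued records
```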

  Another challenge with logging is that too much information might be captured to allow an effective and streamlined review process to take place. To eliminate this problem, all log entries should be categorized so that we can later filter based on the categorization. For example, categories might include ‘Critical’, ‘Security’, ‘Administrative’, ‘Information’ and ‘Debug’. Of course, the last thing you want is for a system to routinely log ‘Debug’ entries, so it should have some type of capability to turn logging levels on and off in real-time in a production environment without having to bring the system down. However, the capability to adjust logging levels must be secured, as it could allow an attacker to turn off logging while carrying out nefarious actions, and then simply re-enable it to hide the fact that logging was disabled for a time. This would directly impact both accountability and non-repudiation.
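
  A minimal sketch of runtime level adjustment might look like the following; the role name and the authorization check are hypothetical stand-ins for whatever access control your application already uses.

```python
import logging

audit_log = logging.getLogger("audit")
audit_log.setLevel(logging.INFO)   # 'Debug' entries are suppressed by default

def set_audit_level(requestor_roles: set, level: int) -> None:
    """Adjust the logging level at runtime, but only for authorized callers.

    The check matters: an attacker able to silence logging could hide their
    tracks, so treat this as a privileged operation and audit its use.
    """
    if "security-admin" not in requestor_roles:
        raise PermissionError("Not authorized to change audit logging levels")
    audit_log.warning("Audit level changed to %s", logging.getLevelName(level))
    audit_log.setLevel(level)

# Temporarily capture debug-level detail in production without a restart:
set_audit_level({"security-admin"}, logging.DEBUG)
```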

  Developers have a nasty habit of logging sensitive information in an audit trail, such as the password value used in unsuccessful authentication attempts. I have even seen the actual password logged for successful authentication attempts! Audit trails must be examined to ensure that this type of information is not captured, or, if it is, that the actual values are masked or hashed to prevent an attacker from accessing the information from a log file.
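
  One hedged approach is to run any sensitive value through a masking or hashing helper before it ever reaches the log, along these lines (in practice a salted hash would be preferable to the bare hash shown here):

```python
import hashlib

def mask(value: str, visible: int = 2) -> str:
    """Keep only the first couple of characters so the log stays useful for
    troubleshooting without revealing the full secret."""
    return value[:visible] + "*" * max(len(value) - visible, 0)

def fingerprint(value: str) -> str:
    """A one-way hash lets reviewers correlate repeated values without the
    log ever containing the value itself."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

# Instead of logging the raw password on a failed logon attempt:
attempted = "Hungryliom123"
print(f"Failed logon for sally; password {mask(attempted)} "
      f"(fingerprint {fingerprint(attempted)})")
```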

  We have already discussed the minimal information that should be captured for each audit entry – the who, what, when, where and delta. However, each system and business is unique, so the requirements must be discussed with business managers before finalizing. Furthermore, which operations or activities should be logged will most certainly depend on the nature of the application and should be explicitly spelled out in requirements. Good examples of activities to log might be anytime a price is changed, a discount is applied by a sales agent, or when a customer changes their banking information. The requirements team may very well have to elicit audit trail requirements from the business owner. Let’s look at some good examples of accountability requirements:

  “All unsuccessful logon attempts will be logged with the timestamp and originating IP address.”

  “When an item’s price is changed, an audit trail entry should be created showing the before and after price.”

  “Audit entries should always append and never overwrite previous entries.”

  “Audit entries must be retained for 5 years.”
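
  The first requirement above might be satisfied with something as simple as the following sketch; the logger name, username and IP address are illustrative only, and note that the attempted password is deliberately never recorded.

```python
import logging
from datetime import datetime, timezone

security_log = logging.getLogger("security")

def record_failed_logon(username: str, source_ip: str) -> None:
    """Log every unsuccessful logon attempt with a timestamp and the
    originating IP address (and never the attempted password)."""
    security_log.warning(
        "Failed logon for %s from %s at %s",
        username,
        source_ip,
        datetime.now(timezone.utc).isoformat(),
    )

record_failed_logon("sally.jones", "203.0.113.42")
```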

  I recall that at one time I took over a very problematic application that suffered from frequent outages. In the first few weeks of leading the development team, I discussed the issues with the architect and engineering managers and made a point of asking how well logging was performing. After receiving a very rosy picture that the application had a deep level of logging implemented, we moved on. During the next outage, I asked the architect to describe the root cause as indicated from the logs. It was at that point I discovered there was absolutely no logging around the problem area. I then realized that logging was only being applied in a given area after it had a pattern of causing outages. The development team had never properly written an audit trail, and instead simply slapped some code around when they felt it was needed. This is a prime example of how NOT to implement an audit trail. After three months of re-architecting the solution and applying proper logging at all levels, we finally had a stable product.

  The lesson here is that a properly implemented audit trail capability has the side-effect of increasing stability and quality because it can force the proper architecture of an application. The inverse is just as applicable – not building in good logging and an audit trail from the beginning can have some serious negative side-effects. Aside from encouraging a good architecture, auditing is crucial for forensic purposes. For log entries to be admissible in court, the design must allow us to prove beyond doubt that they have not been tampered with. To that end, the design should include hashing of entire log files so that the hash value can be computed later to validate the information has not been changed. Some highly-secure applications even hash each individual log entry and include the hash of the previous entry in the entry about to be added. While fairly complex to implement, this approach can show exactly which entry was modified, and is even able to prove that entries were removed.
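
  A simplified version of that entry-chaining technique might look like this; the field names and JSON encoding are assumptions, but folding the previous entry's hash into the next entry is the essence of the approach.

```python
import hashlib
import json

def append_entry(chain: list, entry: dict) -> None:
    """Append an entry whose hash covers both its own content and the
    previous entry's hash, forming a tamper-evident chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    chain.append({**entry, "prev_hash": prev_hash,
                  "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(chain: list) -> bool:
    """Recompute every hash; a modified, missing or reordered entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k not in ("hash", "prev_hash")}
        expected = hashlib.sha256(
            (json.dumps(body, sort_keys=True) + prev_hash).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

chain = []
append_entry(chain, {"who": "sally", "what": "Update", "where": "Item 293001"})
append_entry(chain, {"who": "sally", "what": "Delete", "where": "Item 293002"})
assert verify(chain)
chain[0]["where"] = "Item 999999"   # tampering...
assert not verify(chain)            # ...is detected
```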

  The design team must not forget that there are two types of users – people and processes. Background activities are often overlooked when writing requirements and designing the audit capabilities. It is also far better to be overly aggressive in applying logging, and to ensure it can be turned off if needed. As my story pointed out, it is far more difficult and dangerous to try and add logging after the fact than to simply build it in from the beginning.

  Since we will need to ensure logs are not overwritten, capacity constraints need to be considered including how to estimate the amount of storage space required. Care should be taken to not contradict external regulatory or internal retention requirements.

  We’ve already discussed the danger of logging sensitive information such as passwords. To drive the point home, consider a scenario in which we log the password entered by the user for failed authentication attempts. If a user mistypes their password as ‘Hungryliom123’ and we record it in clear text, an attacker will very easily figure out that the correct password is ‘Hungrylion123’.

  When logging identifies a user by their identity, we should not simply accept whatever identity was typed in by the user. Instead, once authentication has been successfully completed, all subsequent log entries should use the authenticated principal name and the system timestamp. For example, in an IIS web application we can log the value provided by Request.ServerVariables["LOGON_USER"]. In SQL Server we can use the GETDATE() function to access the date and time as the server sees it instead of that value being passed from another tier.
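
  In other words, the entry should be assembled from values the server already trusts. A minimal sketch, assuming a session object populated by the authentication layer, might be:

```python
from datetime import datetime, timezone

def build_audit_entry(session: dict, operation: str, target: str) -> dict:
    """Derive 'who' and 'when' from trusted sources, never from user input."""
    return {
        # Identity set by the authentication layer at logon, not typed in now
        "who": session["authenticated_principal"],
        "what": operation,
        "where": target,
        # Server clock, not a timestamp passed up from another tier
        "when": datetime.now(timezone.utc).isoformat(),
    }

entry = build_audit_entry({"authenticated_principal": "sally.jones"},
                          "Update", "Item, ID: 293001")
```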

  Non-repudiation allows us to definitively state that a unique person or process was responsible for a specific action. The interesting thing about this concept is that it does not require us to implement anything special by itself – the ability to provide non-repudiation is a result of properly implementing two other core security concepts – identification as carried out by authentication, and the existence of an audit trail as carried out by accountability. When we can look at an audit trail and see that a unique identity carried out a specific action, we have achieved non-repudiation. The person or process then has a limited ability to claim they did not carry out the action because we have definitive proof in the audit trail that it happened as described by the audit entry. To be able to make this claim, we should have previously performed the following steps:

  1) Test the code that authenticates to ensure it functions properly and has the correct level of security precautions implemented.

  2) Test the code producing the audit trail to ensure it produces accurate results.

  Now, notice we chose our words very carefully. We did not say ‘they cannot claim they did not carry out the action’, but instead we said, ‘they have a limited ability to claim they did not carry out the action’. Since there is no such thing as an un-hackable system, it is always possible that a bad actor authenticated as a valid user and carried out the operation. Therefore, a modicum of common sense must be applied.

  Since non-repudiation is a result of properly designing authentication and accountability, there are no specific design topics that we need to cover.

  Chapter 19: Least Privilege

  The principle of least privilege states that a person or process should be given only the minimum permissions for the minimum amount of time necessary to complete a given operation. This can be applied at two levels – when granting permissions to an end-user of a system, and when granting permissions to processes within the system itself. The goal of this concept is to minimize the damage that could result from an accidental or intentional security breach. Examples of least privilege include the military rule of a ‘need-to-know’ clearance level, modular programming and non-administrative accounts.

  The military security rule of need-to-know limits the disclosure of information to only those who have been authorized to access the information. Best practices suggest it is better to have many administrators with limited access instead of creating a single ‘super user’ account. This also aids in applying the separation of duties principle.

  Modular programming is a software design technique in which smaller modules are created with a high degree of cohesion. Good software engineering design will emphasize modules that are highly cohesive and loosely coupled at the same time. This encourages reuse and increases readability and maintainability. We will be covering the concepts of cohesiveness and coupling in just a bit.

  The use of non-administrative accounts encourages the use of least privilege. For example, many databases have a built-in account with super user privileges, often named something like ‘sysadmin’. It is very dangerous for server applications to log in using these credentials as an attacker can easily perform destructive operations such as dropping tables or creating other database user accounts. Instead, creating and using accounts such as ‘datareader’ and ‘datawriter’ will greatly reduce the ability for an attacker to cause damage.
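
  A sketch of that idea is shown below; the account names mirror the ‘datareader’ and ‘datawriter’ example, while the credential store and operation names are hypothetical.

```python
# Hypothetical credential sets for purpose-built accounts; the application
# never holds the 'sysadmin' credentials at all.
ACCOUNTS = {
    "read":  {"user": "datareader", "password": "<from secret store>"},
    "write": {"user": "datawriter", "password": "<from secret store>"},
}

def credentials_for(operation: str) -> dict:
    """Hand back the least-privileged account able to perform the operation."""
    return ACCOUNTS["read"] if operation in ("select", "read") else ACCOUNTS["write"]

# A reporting query connects as 'datareader'; even if that code path is
# compromised, the attacker cannot drop tables or create other accounts.
creds = credentials_for("select")
```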

  Chapter 20: Separation of Duties

  The separation of duties principle requires that a single critical operation be separated into two or more steps, each carried out by a different person or process. This is sometimes called compartmentalization.

  An example of this principle in action is the use of split keys. When a system uses encryption to protect sensitive data, the encryption key must be stored in a manner that is accessible from code. However, if an attacker is able to access the key, then the sensitive data is at risk of being decrypted. To mitigate this vulnerability, an architect will often require that the key be split into two portions, with one portion being stored in a configuration file and the second residing elsewhere, such as in a registry.
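
  A simple sketch of reassembling a split key might look like the following; the configuration file, section and environment variable names are placeholders (an environment variable stands in here for the registry), and an XOR split is shown rather than simple concatenation so that neither half reveals any bits of the real key.

```python
import configparser
import os

def load_encryption_key() -> bytes:
    """Reassemble the key from two separately stored halves; compromising
    either location alone yields nothing usable."""
    config = configparser.ConfigParser()
    config.read("app.ini")                                 # first half: config file
    half_one = bytes.fromhex(config["crypto"]["key_part"])
    half_two = bytes.fromhex(os.environ["APP_KEY_PART"])   # second half: elsewhere
    # XOR the two equal-length halves to recover the real key
    return bytes(a ^ b for a, b in zip(half_one, half_two))
```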

  Another common example in software development is requiring code reviews to be carried out by a developer who did not write the source code being reviewed. Likewise, not allowing the developer who writes code to deploy it to production is also a good separation of duties candidate.

  This principle can reduce the amount of damage caused by a single person or process and can discourage insider fraud due to the need for collusion between multiple parties.

  Chapter 21: Defense in Depth

  Defense in depth addresses the risk that a point within an infrastructure can be completely compromised by defeating a single control. By adding multiple layers of security safeguards, an attacker will be forced to overcome many controls of varying designs before gaining access to the inner workings. Beyond simply increasing security, implementing a second layer of defense can often discourage curious and non-determined attackers. This approach is sometimes called a layered defense.

  One example of this tactic is to first validate all input, followed by disallowing the use of string concatenation to create dynamic SQL queries. This is very effective in defeating SQL injection attacks.
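
  A brief sketch of those two layers, using an illustrative customers table, might be:

```python
import sqlite3

def find_customer(conn: sqlite3.Connection, customer_id: str):
    # Layer 1: validate the input against its expected format
    if not customer_id.isdigit():
        raise ValueError("customer_id must be numeric")
    # Layer 2: no string concatenation; bind the value as a parameter instead
    cur = conn.execute("SELECT id, name FROM customers WHERE id = ?",
                       (int(customer_id),))
    return cur.fetchone()
```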

  Another example is to encode all output to a web browser, and to not allow the use of embedded scripts, to defeat a cross-site scripting (XSS) attack. While not using scripts is hardly in line with modern websites, it nonetheless may be a practical approach for high-security environments.
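
  Output encoding can be as simple as escaping user-supplied text before it is written into the page, as in this sketch:

```python
import html

def render_comment(user_comment: str) -> str:
    """Encode user-supplied text so any embedded markup is displayed as text
    rather than executed as script."""
    return "<p>" + html.escape(user_comment) + "</p>"

# '<script>steal()</script>' becomes '&lt;script&gt;steal()&lt;/script&gt;'
print(render_comment("<script>steal()</script>"))
```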

  A final example is the use of security zones created by unique subnets. In this approach web servers live in the demilitarized zone (DMZ), mid-tier servers live in a secondary zone, and the database resides in a highly-restricted zone. This would require an attacker to navigate and overcome three different networks.

  Chapter 22: Fail Secure

  Fail secure happens when a system experiences a design or implementation failure and defaults to a secure state. A common application of this principle is found with electronic door locks, which on a power interruption will either fail secure, meaning the door reverts to a locked state, or fail safe, in which the door automatically unlocks and remains so until power is restored. The term ‘safe’ is meant to apply to people – in other words, if a door automatically unlocks on a power failure, it is safe for people to exit the building instead of being locked in. The term ‘secure’ refers to the building that is being protected. Sometimes people claim fail secure and fail safe are the same thing, but obviously this is not the case.

  The same principle can be applied to software security, where failing secure indicates software can rapidly return to a secure state such that confidentiality, integrity and availability are maintained. It also implies that the software can reliably continue to function while it is under attack.

  The SD3 initiative states that software should be secure by design, secure by default and secure by deployment. In other words, we should purposefully design security into software such that it reverts to a secure state when attacked or experiences a failure and maintains that security all the way through the final deployment to the production environment.

  An example of a fail secure design is locking out user accounts after a specific number of failed authentication attempts. This is referred to as clipping.
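
  A bare-bones sketch of such a lockout, with a hypothetical threshold of five attempts, might be:

```python
from collections import defaultdict

MAX_ATTEMPTS = 5                      # the clipping level
failed_attempts = defaultdict(int)
locked_accounts = set()

def record_failed_attempt(username: str) -> None:
    """Once the threshold is reached the account fails secure: it locks and
    stays locked until an administrator intervenes."""
    failed_attempts[username] += 1
    if failed_attempts[username] >= MAX_ATTEMPTS:
        locked_accounts.add(username)

def is_locked(username: str) -> bool:
    return username in locked_accounts
```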

  Another example might be disallowing the pattern of swallowing exceptions in code and continuing as if nothing happened. Instead, a try/catch block should be written to handle the exception in a safe manner, which usually includes logging the error for later review.
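
  For example, rather than an empty catch block, the handler should record the failure and report it to the caller; the business function below is just a stand-in:

```python
import logging

log = logging.getLogger("orders")

def update_order_total(order_id: int, percent: float) -> None:
    """Stand-in for the real business logic; may raise on bad input."""
    if percent <= 0:
        raise ValueError("discount must be positive")

def apply_discount(order_id: int, percent: float) -> bool:
    try:
        update_order_total(order_id, percent)
        return True
    except Exception:
        # Do not swallow the exception: record it for later review and report
        # the failure to the caller instead of continuing as if nothing happened.
        log.exception("Failed to apply %s%% discount to order %s", percent, order_id)
        return False
```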

  Associated with this pattern is the need to avoid revealing too much information when communicating to the user that a problem was encountered. Doing so is an example of information leakage that could enable an attacker to focus his attention on a specific vulnerability. Suppose that our code tries to execute the following SQL statement:

 
