by Phil Martin
To this point, we have defined a QFE and service pack in reference to an external vendor. When it comes to our own internally developed products, a QFE, or hotfix, will almost always update application code, and a service pack is simply a new version of an existing product. So, whether we are talking about an external QFE and service pack or an internal hotfix and version release, they behave essentially the same.
While individual patches are reactive in that they address known vulnerabilities, patch and vulnerability management is seen as a proactive process, since it is designed to mitigate vulnerabilities found by someone else before an attacker has the chance to exploit the same weakness in our own systems. That is why it is crucial to apply patches periodically even if we have not experienced a successful attack.
Patching is not without risks, though. When a fix has not been thoroughly regression tested, it can easily cause an outage by creating a new problem. This happens a lot with smaller vendors and is frequently seen with internal products. Performing your own internal testing of an external patch, even if the vendor is a huge, global entity such as Microsoft, is crucial to maintaining a stable environment. This will require a simulated production environment on which to run the tests, which of course costs money. Both upstream and downstream dependencies of the software must be tested. For example, if we apply a patch to a third-party application, we must ensure the underlying operating system is not affected, as well as consumers of the application. And don’t forget to perform your own regression testing to ensure that existing functionality is not broken as a result of the new patch or service pack. From a security point of view, the minimum security baseline (MSB) must be revalidated to ensure the update has not exposed additional weaknesses. If a patch successfully addresses a security issue, then the MSB must be updated.
Due to the overhead of regression testing, it is a rare patch that addresses a single vulnerability. Instead, multiple vulnerabilities might be mitigated with a single patch, and it is crucial that we understand which ones are included with a patch. Otherwise, we will lose track of how vulnerabilities have been managed and we will never have a good grasp on our current security stance.
Most large external vendors provide some sort of subscription service for delivering patches and service packs. Applying these resources in a timely manner is crucial, as it is always a race against the bad guy to fix a weakness before he can exploit it. However, this clock does not necessarily start ticking when the first attacker figures out the vulnerability – often the ‘start’ button on the clock is pressed the moment a patch is released. Think about it – if you were a hacker, and Microsoft just released an emergency patch, what would be the first thing you would do? Reverse engineer the patch, that’s what. Once you figured out the weakness they are trying to ‘fix’, you would then exploit the heck out of it before everyone applies the patch. When a patch comes out, it’s like blood in the water for hacker sharks, so test, test, test and then apply the patch as soon as you can! The most vulnerable time for any software is the period between a patch being released and when it is applied.
Proper internal patching should follow a well-documented process, such as the following (a brief sketch of how these steps might be tracked appears after the list):
1) Notify the users or owners of the software about the upcoming patch.
2) Test both upstream and downstream dependencies in a simulated environment.
3) Document the change along with a rollback plan in case things go badly.
4) Identify maintenance windows or times when the patch can be applied.
5) Install the patch.
6) Test the patch post-installation with a proper amount of regression tests.
7) Validate that the patch did not regress the state of security and that the MSB remains intact.
8) Monitor the patched systems to catch unexpected side-effects.
9) If a rollback was executed, conduct a post-mortem to gain lessons-learned. If it was successful, update the MSB.
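To make these steps more concrete, here is a minimal sketch of a change record that tracks the state of a patch as it moves through the process above. It is purely illustrative; the class and field names are assumptions, not part of any standard or tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class PatchChange:
    # Illustrative record only; field names are assumptions for demonstration.
    patch_id: str
    affected_systems: List[str]
    rollback_plan: str                    # step 3: documented rollback plan
    maintenance_window: str               # step 4: agreed installation time
    estimated_duration_minutes: int       # needed by the CAB to approve the change
    success_criteria: str                 # how we will know the patch worked
    users_notified: bool = False          # step 1: owners and users informed
    dependencies_tested: bool = False     # step 2: upstream/downstream tested
    cab_approved: bool = False            # approval from the change advisory board
    history: List[str] = field(default_factory=list)

    def log(self, message: str) -> None:
        # Keep a running record that can later serve as audit evidence.
        self.history.append(f"{datetime.now().isoformat()} {message}")

    def ready_to_install(self) -> bool:
        # Steps 1 through 4 must be complete before the patch is installed (step 5).
        return self.users_notified and self.dependencies_tested and self.cab_approved
```

A record like this would be updated as each step completes and retained afterward as evidence that changes are controlled within the organization.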
When documenting the patch change, you should include the estimated time to complete the installation and how you will know it was successful. This documentation is crucial to obtain the approval necessary to install the patch. The change advisory board, or CAB, is the body that reviews documentation and provides approval if it passes muster. This documentation can also be used in the future as evidence during an audit to determine if changes are controlled within an organization.
A reboot of the system is often required for a patch to complete installation, so the best time to install a patch should be carefully selected. This will be a time when there is minimal use of the system, usually in the wee hours of the morning. With global applications that service all time zones, there is seldom a window that is significantly better than the others.
NIST SP 800-40 provides a few best practices for the patching process. While most are common sense or have already been discussed, there are a few good nuggets worth noting:
Establish a patch and vulnerability group, or PVG. This group oversees the patching process and ensures its efficient execution.
Prioritize patches and use a phased deployment where appropriate. This means that a subset of systems is updated at the same time, followed by the next subset if the first was successful, and so on. This allows us to limit the blast radius of a failed patching cycle (a brief rollout sketch follows this list).
Use automatic updating as appropriate, where patches are applied without prior testing. The quality of a vendor’s previous patches must be considered, as well as the impact if an untested patch fails.
Periodically test the effectiveness of the organization’s patch and vulnerability management program.
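The phased-deployment idea can be expressed in a few lines of code. The sketch below is an assumption-laden illustration: apply_patch and health_check stand in for whatever patching and monitoring tooling the organization actually uses.

```python
from typing import Callable, List

def phased_deployment(systems: List[str],
                      apply_patch: Callable[[str], bool],
                      health_check: Callable[[str], bool],
                      phase_size: int = 5) -> None:
    """Roll a patch out in small subsets to limit the blast radius of a failure."""
    for start in range(0, len(systems), phase_size):
        phase = systems[start:start + phase_size]
        for host in phase:
            if not apply_patch(host):
                raise RuntimeError(f"Patch failed on {host}; halting rollout for rollback")
        # Only move on to the next subset if every host in this phase is still healthy.
        if not all(health_check(host) for host in phase):
            raise RuntimeError("Post-patch health check failed; halting rollout")
```

Stopping at the first unhealthy phase is what keeps a bad patch from taking down the entire fleet at once.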
Backups, Recovery and Archiving
The maturity of an organization can often be measured solely on how often backups are tested. In fact, if I had to come up with an organizational maturity scale based solely on backup and recovery processes, it would look something like the following:
Maturity / Activity
Crying baby: No backups are taken
Teenager with acne: Full backups are taken sporadically
20-something trying to be an adult: Full and differential backups are taken on a scheduled basis
Professional adult: Backups are tested by restoring to a non-production environment
Seer sitting on top of a mountain dispensing sage advice: Backups are tested in a production environment during a maintenance window
The bottom line here is that if backups are never tested, Murphy’s Law says they will surely fail when you need them most.
It is a best practice to always back up a system before applying patches or service packs. You never know when a patch will go wildly wrong, and a proper backup and recovery process might mean the difference between an elegant rollback and an unmitigated disaster. Another frequent use of backups – other than recovering from a disaster – is to restore a system to a state before it was infected by malware. This is especially important in the modern age of ransomware, where an attacker will encrypt mission-critical files and destructively delete the originals. Unless we have secure backups, we may have no choice but to meet the attacker’s demands.
It is not sufficient to simply have backups hanging around, though. They should be protected with the same security that is applied to the raw data, as an attacker will specifically look for a backup source instead of targeting the well-protected original. It is human nature to leave backups in a less-protected state and forget about them, and this provides a much easier way to access sensitive data than going head-to-head with a production-level security environment. Since backups are frequently copied and moved using removable media, physical access to unencrypted backups is also a great opportunity for an attacker. The recovery process itself should take security into account as well. We can take great care to encrypt backups, but if the key is freely available without requiring secure access, then what is the point of backup encryption?
Older data is normally archived to slower, less expensive persistent storage solutions. For example, any records older than 6 months might be moved to optical disk and stored on a shelf, freeing up valuable disk space on a database server. The information is still available in an emergency – it just might take a few hours to get to it. In the same way that backups must be protected, archived data should receive the same level of secure protection as the live data, including encryption management, logical access control and physical security.
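The kind of archival job described above might look something like the following sketch, which exports records older than six months to a file destined for offline storage and then removes them from the live table. The database, table, and column names are hypothetical.

```python
import csv
import sqlite3
from datetime import datetime, timedelta

# Hypothetical example: export orders older than six months for offline storage,
# then delete them from the live table to free up space on the database server.
CUTOFF = (datetime.now() - timedelta(days=180)).isoformat()

conn = sqlite3.connect("orders.db")  # assumed database and schema
rows = conn.execute(
    "SELECT id, created_at, payload FROM orders WHERE created_at < ?", (CUTOFF,)
).fetchall()

with open("orders_archive.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "payload"])
    writer.writerows(rows)

conn.execute("DELETE FROM orders WHERE created_at < ?", (CUTOFF,))
conn.commit()
conn.close()
```

The exported file would then be written to whatever offline medium the organization uses and protected with the same controls as the live data.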
Both backups and archived data should have hashes calculated and stored separately from the data, so we know if they have been tampered with. Encryption keys should be protected just like keys used in the live environment. When being used as part of a forensics investigation, a complete chain of custody should be established and followed for both backups and archived data. When this information is no longer needed, it must be disposed of in the proper manner. Which is an excellent segue into our next topic – disposal.
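Here is one way those integrity hashes could be produced and checked, assuming SHA-256 and a manifest file stored separately from the backups themselves; the file paths and manifest format are illustrative assumptions.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute a SHA-256 digest of a backup or archive file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(backup_dir: Path, manifest_path: Path) -> None:
    # The manifest must live somewhere separate from the backups it describes.
    manifest = {p.name: sha256_of(p) for p in backup_dir.glob("*.bak")}
    manifest_path.write_text(json.dumps(manifest, indent=2))

def verify_manifest(backup_dir: Path, manifest_path: Path) -> bool:
    # Recompute every hash and compare it to the separately stored value.
    manifest = json.loads(manifest_path.read_text())
    return all(sha256_of(backup_dir / name) == digest
               for name, digest in manifest.items())
```

A mismatch during verification tells us the backup or archive has been altered since the manifest was written.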
Disposal
Every IT asset has a limited shelf-life. This time period is fairly predictable when it comes to hardware, as either the hardware wears out and must be replaced, or newer hardware that is faster and better comes along. With data, we can judge shelf-life based on data retention policies. Software, however, is the wildcard – we might think an application will be useful for 10 years, when it turns out that changing market conditions require us to leave it behind after 6 months.
On the opposite side of the spectrum, that same software might easily be in-use 30 or 40 years from now. If you think I am overblowing that last statement just a bit, where do you think the hysteria around Y2K came from? I was there as a young developer and lived through it, and I can tell you that despite the lack of doomsday scenarios that actually played out on January 1, 2000, the threat was very real and scary. It all started when programmers back in the 1960s and 1970s wrote code that only worked until 12/31/1999, thinking “My code will never be around by then – why bother?” A secondary problem was with how leap years were calculated, but again the same mentality prevented proper code from being written. Unfortunately, those programs were still in use by some of the biggest institutions in the world when 1999 rolled around. It was only because of the billions of dollars companies paid out to fix the issues in the late 1990s that a catastrophe was averted.
Back to the topic at hand – software shelf-life. At some point all software must be retired. As software continues to be used without being updated, risk will continue to accumulate. Once risk has exceeded an organization’s acceptable level, that software must be retired. This ‘retirement’ is called disposal and may sometimes be referred to as sun-setting or decommissioning.
End-of-Life Policies
Every software product and its related data and documents should have an end-of-life policy, or an EOL policy, established well in advance of its retirement. NIST SP 800-30 discusses the need for risk mitigation as part of the disposal activities to ensure residual data is not disclosed or lost. When dealing with a COTS product, or a commercial off-the-shelf product, the EOL starts by notifying customers of the last date on which the product will be sold. This allows customers to start planning their migration activities in advance.
An EOL policy should contain the following elements at a minimum (a structured sketch of these elements follows the list):
The criteria used to make a decision to sun-set the product.
A notice referring to all hardware and software that is being discontinued or replaced.
How long support will continue to be provided from the end of sale date to the final disposition date.
Recommended alternatives for migration along with the versions that will be supported in the future.
Dates at which maintenance releases, workarounds and patches will no longer be available.
Contract renewal terms in cases of licensed or third-party software.
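The same elements could be captured as a structured record so they do not get lost in a document somewhere. The field names below simply mirror the list above and are purely illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class EndOfLifePolicy:
    # Field names are assumptions that mirror the EOL policy elements listed above.
    product: str
    sunset_criteria: List[str]            # criteria used to decide on retirement
    discontinued_components: List[str]    # hardware and software being replaced
    end_of_sale: date
    final_disposition: date               # support runs from end of sale to this date
    migration_alternatives: List[str]     # recommended replacements and supported versions
    last_patch_date: date                 # no maintenance releases or patches after this
    contract_renewal_terms: str = ""      # for licensed or third-party software
```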
Sun-Setting Criteria
The first item we mentioned as part of an EOL policy referenced the criteria used to decide if a product should be retired. While both hardware and software should be considered, this book is about software so let’s focus on those criteria only. The following is a general list of conditions under which a product should be retired. While any of these can be used to justify retirement of a product, it is sometimes wise to require multiple options to be true before pulling the ‘sun-setting trigger’.
The risk from new threats and attacks cannot be brought down to levels that fall beneath the organization’s acceptable risk level.
Contracts requiring the use of the software have ended, and the cost of continued use cannot be justified.
The software is no longer under warranty, or it no longer has a valid support option.
The software is no longer compatible with hardware on which it must run. This can often be true of legacy applications.
Newer software can provide the same level of functionality but in a more secure manner.
Sun-Setting Processes
We would never think of rolling out new software without a plan on how to be successful. Likewise, disposing of existing software requires the same level of diligence to ensure a happy outcome. Otherwise we run the risk of losing crucial data, experiencing gaps in capabilities until a replacement system can be brought online, or encountering pushback from users who ‘liked the old system a lot better’. Just like most processes in the IT world, there is a recommended list of best practices we should follow to ensure things go smoothly.
First of all, have a replacement ready before the old software has been disposed of. This might sound like a ‘DUH!’ statement, but it is surprising how many organizations turn off a system well before it is time. The replacement system should be purchased, tested and deployed into the right environment, with sufficient validation that it is working. Then, and only then, can we retire the old system once we obtain the necessary approvals from the authorized decision makers. Both the asset inventory and configuration management databases should be updated to reflect the system being retired and the new system being brought in.
Consider the impact to automated capabilities that will result when the old system is turned off. For example, will this trigger monitoring alarms that think a crucial system just went offline? Any reference that a process – manual or automated – has to the retiring system must be removed or turned off.
Each termination access control, or TAC, must be executed to ensure proper removal of access rights. It is not enough to think that since the system will no longer be available, we can leave those rights intact, as they might allow access into other systems that continue to run, and the orphaned access rights will more than likely be forgotten about as time goes by. If we must reproduce the same access rights in the new system, do not copy them – recreate them from scratch in the new system. This forces us to revisit existing access rights and to make sure each still applies.
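A minimal sketch of executing those termination access controls appears below. The IdentityStore interface and its methods are stand-ins for whatever directory or IAM tooling is actually in place; they are assumptions, not a real API.

```python
from typing import Iterable, Protocol

class IdentityStore(Protocol):
    # Stand-in for a real directory or IAM API; method names are hypothetical.
    def grants_for_system(self, system_id: str) -> Iterable[str]: ...
    def revoke(self, grant_id: str) -> None: ...

def execute_termination_access_controls(idm: IdentityStore, system_id: str) -> int:
    """Revoke every access grant tied to the retiring system.

    Rights needed in the replacement system are recreated from scratch there,
    never copied, which forces a fresh review of whether each one still applies.
    """
    revoked = 0
    for grant in idm.grants_for_system(system_id):
        idm.revoke(grant)
        revoked += 1
    return revoked
```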
Archive both the system and data offline, in case it must be reloaded due to a regulatory requirement, or we discover later that the migration process was not completely successful.
Do not rely on uninstall scripts to remove software – always follow them up with a deletion process to ensure complete removal. Uninstall scripts can leave behind a log of the activity, which can contain sensitive data. All secure software removal processes must have a manual follow-up at the end to verify proper removal and to delete any residual artifacts left by automated uninstall scripts.
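Part of that follow-up can itself be scripted. The sketch below checks a few likely leftover locations and removes them; every path listed is an assumption and would need to be replaced with wherever the retired product actually kept its files, logs and configuration.

```python
import shutil
from pathlib import Path

# Hypothetical leftover locations for the retired application.
RESIDUAL_PATHS = [
    Path("/opt/legacy-app"),
    Path("/var/log/legacy-app"),
    Path("/etc/legacy-app"),
]

def remove_residual_artifacts() -> None:
    for path in RESIDUAL_PATHS:
        if path.exists():
            # Uninstall scripts often leave logs behind that can hold sensitive data.
            shutil.rmtree(path)
            print(f"Removed leftover artifact: {path}")
        else:
            print(f"Verified absent: {path}")
```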
Information Disposal and Media Sanitization
When a retiring system processes sensitive or confidential data, it is crucial to ensure that persistent storage devices have been completely purged of all data to prevent information disclosure. Disposal is the act of discarding media without paying attention to residual data left on the storage medium. Sanitization is the act of clearing data from the storage medium before disposal occurs, and the possible options are illustrated in Figure 52. We sanitize media and then dispose of it. Sanitization applies to two types of storage – physical and electronic representation of the data.
Figure 51: The Various Methods of Sanitization
Physical storage of data is best represented by printed hardcopies of the information on paper, but also includes printer or fax ribbons, drums and platens – any device used in the production of the physical medium. Most people do not know what a ‘platen’ is, but it is a metal roller used to imprint characters onto paper and can record residual imprints. The roller in a typewriter is an example. These assets are usually uncontrolled and simply discarded in the trash where an attacker can easily ‘dumpster dive’ and retrieve the sensitive information. Electronic storage occurs when we store information in bits and bytes. The most common examples of this media include hard drives, RAM, ROM, USB drives, mobile computing devices, and networking equipment. There are three approaches we can use to properly dispose of both types of information storage – clearing, purging or destroying the medium.
Clearing applies to electronic storage only and is the process of logically overwriting storage space with random, non-sensitive data. This is not a 100% effective solution and can result in data remanence, where sensitive data remains intact on the storage device after clearing has finished. For write-once read-many devices, sometimes called WORM devices, clearing is obviously not applicable as the original data cannot be overwritten.
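As a simplified illustration of clearing, the sketch below overwrites a file’s contents with random bytes before deleting it. Note that on SSDs with wear-leveling and on journaling file systems a logical overwrite like this can still leave remnants behind, which is exactly the data remanence limitation described above.

```python
import os
from pathlib import Path

def clear_file(path: Path, passes: int = 1) -> None:
    """Logically overwrite a file with random data, then delete it.

    A simplified example of clearing; it does not defeat wear-leveling,
    journaling, or other copies the storage layer may keep.
    """
    size = path.stat().st_size
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))   # overwrite with random, non-sensitive data
            f.flush()
            os.fsync(f.fileno())        # force the overwrite out to the device
    path.unlink()                       # remove the file entry itself
```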
Figure 52: Data Sanitization and Decision Flow