Mitigating Security Update Risks: Part 1

A Professional Services Perspective

On July 19, 2024, news anchors all over the world did their best to explain a complex cybersecurity issue to their respective audiences. While we think they did a great job, and deeply appreciate their journalistic efforts, we wish they didn’t have to. The global scale and particular nature of this cybersecurity issue led to widespread impact on availability. Many CISOs are now questioning the wisdom of automatic security content updates.

While the exact details of the underlying technical issue are still unclear as of this writing, we do know that a faulty update file was deployed to a portion of the install base before the Availability impact was discovered. This is not the first time such an issue has occurred in the cybersecurity industry, and it is unlikely to be the last. Even the most rigorous quality assurance processes cannot catch 100% of issues. Even a chance as small as 0.0004% is a risk worth mitigating, and one we can balance against the risk of deploying an update too slowly.


Best Practices

Trellix Professional Services strongly advises that customers exercise due diligence with all software and security content updates, regardless of source, prior to deploying enterprise-wide.

Customers using the Information Technology Infrastructure Library (ITIL) v3 or v4 will be familiar with Change Management/Change Enablement processes. While the exact details of the processes differ from customer to customer, Trellix Professional Services is extremely familiar with the framework.

When evaluating various software and security content updates provided by Trellix, we consider a variety of factors to determine the best approach for a particular customer:

  1. Size and complexity of the customer environment
    1. A customer with 20 machines deserves as much care and diligence as a customer with 200,000 machines, but scale has certain benefits for testing.
    2. Customers with complex geographic footprints and widely heterogeneous software environments will require larger test groups to catch local/business unit specific issues as they arise.
  2. Customer reported risk profile
    1. Different customers have different threat landscapes, depending on their industry, geography, technical sophistication, specific critical assets, and corporate culture.
    2. The relative importance of Confidentiality, Integrity, and Availability to specific areas of the customer environment.
      1. Confidentiality - The need to ensure that only authorized users access an information system, and to only the extent that they are authorized.
      2. Integrity - The need to ensure that information and system components are not altered, degraded, and/or deleted inappropriately by any actor regardless of authorization status.
      3. Availability - The need to ensure that all authorized users can access the information system and any data to the extent that they are authorized.
      4. A customer with a water or electrical plant may have a greater need to ensure Integrity and Availability in their operations systems but care more about Confidentiality and Integrity in their billing systems.
    3. Customers with extremely high security requirements may need compressed testing timelines, so that full testing activities are still performed while testing and deploying on the same day as release.
    4. Some customers prefer to adopt slowly, while others insist on having the latest features and updates as soon as possible. We respect and support both approaches within the bounds of reason.
  3. The particulars of the software or security content update being deployed
    1. Daily content updates can be reliably tested with automated test groups representing geographic and/or department-level machine profiles. They can deploy at N-0, N-1, or N-2 (day of release, one day later, or two days later) and may be covered by a standard change.
    2. Countermeasure updates, updates that change filter drivers or hooks into third-party applications, and customer configuration changes enabling major features require more stringent testing. Testing may take multiple days or weeks, with full documentation of test cases and test results, and may require Normal or Emergency Changes depending on urgency.
  4. Availability of a test environment
    1. Mature and well-funded cybersecurity practices may have separate testing, pre-production, or development environments in which to complete initial testing.
    2. While these are not necessary for testing updates, they can provide an additional layer of safety.
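
The distinction drawn in factor 3 can be sketched as a small lookup. This is a minimal illustration in Python, not a Trellix API: the names `UpdateKind`, `ChangeType`, `ChangePlan`, and `plan_change` are hypothetical, and the mapping only encodes the two cases described above. A real change enablement process would weigh the other factors as well.

```python
from dataclasses import dataclass
from enum import Enum


class UpdateKind(Enum):
    DAILY_CONTENT = "daily content update"
    COUNTERMEASURE = "countermeasure update"
    FILTER_DRIVER = "filter driver change"
    THIRD_PARTY_HOOK = "hook into a third-party application"
    MAJOR_FEATURE_CONFIG = "configuration change enabling a major feature"


class ChangeType(Enum):
    STANDARD = "Standard"
    NORMAL = "Normal"
    EMERGENCY = "Emergency"


@dataclass(frozen=True)
class ChangePlan:
    change_type: ChangeType
    automated_test_groups: bool  # True when automated test groups are sufficient
    documented_test_cases: bool  # True when test cases and results must be documented


def plan_change(kind: UpdateKind, urgent: bool = False) -> ChangePlan:
    """Map an update kind to the handling described in factor 3 (hypothetical helper)."""
    if kind is UpdateKind.DAILY_CONTENT:
        # Daily content: automated test groups, may be covered by a standard change.
        return ChangePlan(ChangeType.STANDARD, True, False)
    # Everything else needs more stringent, documented testing;
    # urgency decides between a Normal and an Emergency Change.
    change_type = ChangeType.EMERGENCY if urgent else ChangeType.NORMAL
    return ChangePlan(change_type, False, True)
```

For example, `plan_change(UpdateKind.FILTER_DRIVER)` yields a Normal Change with documented test cases, while the same call with `urgent=True` yields an Emergency Change.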

Trellix Professional Services brings together the above pieces to provide tailored update testing procedures to our customers, targeted to their specific risk tolerances and operational requirements.

Can you show me an example?

Absolutely!

  • Our example customer
    • 20k Microsoft Windows endpoints
    • 5k servers split 80/20 Windows and Linux
    • Healthcare vertical split between
      • Inpatient
      • Outpatient
      • Visiting nurse services
    • Located in a single major metro area
      • Approximately 30 branch offices
    • No testing environment
  • Analysis
    • Modern Healthcare relies on integrated technology systems for every step of providing life-saving services. Certain areas are highly critical and may impact the ability of the customer organization to save lives, deliver babies, and maintain the health of the community.
    • Risk of Availability and Integrity impacts from content updates must be weighed against the risk of Confidentiality, Integrity, and Availability impacts from ransomware and other cyber attacks.
  • Our testing recommendations:
    • Daily content updates
      • N-0 group - day of release
        • 200 workstations and laptops, split between 70 at each campus and two at each branch location
        • 0 servers
        • Exclude ER/ICU and surgical - these are high intensity/high availability requirement machines and should rightfully be treated differently from standard machines
      • N-1 group - 24 hours later
        • Majority of workstations and laptops excluding ER/ICU and surgical
        • Suitable test group representing 10% of ER/ICU and surgical machines
        • 150 servers, 120 Windows and 30 Linux, spread across functional groups
      • N-2 group - 48 hours from day-of-release
        • Balance of ER/ICU and Surgical machines
        • Balance of servers
  • Software updates
    • Follow all customer change enablement procedures, perform and document tests, and halt the rollout to fix forward any identified issues
    • Workstation test group
      • Test N-0 as soon as available
      • Phase in deployment to test machines over 1 month, workstations only
      • Phase A: 20 machines; Phase B: 200 machines; Phase C: 2,000 machines
      • Document results and provide to change control board
    • Wide deployment - workstations
      • Phase in deployment groups with one week between each phase: A, 10% (bringing our total to 20%); B, 20% (total 40%); C, 20% (total 60%); D, 20% (total 80%); and E, 20% (total 100%)
      • ER/ICU and surgical machines should start in phase D and finish out phase E
    • Server test group
      • Test N-0 as soon as Workstation deployment reaches Phase C
      • Three test phases representing 1%, 5%, and 10% of the environment
      • Document results and provide to change control board
    • Wide deployment - servers
      • Phase-In deployment groups representing 20%/40%/60%/80%/100% with a week between each phase
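
The group sizes and phase percentages in this example can be checked with a little arithmetic. The sketch below is a hypothetical planning helper in Python, not part of any Trellix product. It assumes two campuses (the example does not state the number, but 2 × 70 plus 30 × 2 gives the stated 200 machines) and treats the start date of wide deployment as an input.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

WORKSTATIONS = 20_000
SERVERS = 5_000
BRANCH_OFFICES = 30  # "approximately 30" in the example
CAMPUSES = 2         # assumption: not stated in the example


def n0_workstations() -> int:
    """N-0 daily content group: 70 machines per campus plus two per branch."""
    return 70 * CAMPUSES + 2 * BRANCH_OFFICES


def server_rings() -> Dict[str, int]:
    """Servers per daily content group: none in N-0, 150 in N-1, the balance in N-2."""
    n1 = 120 + 30  # Windows + Linux
    return {"N-0": 0, "N-1": n1, "N-2": SERVERS - n1}


def test_phase_share() -> float:
    """Fraction of workstations covered by software test phases A, B, and C."""
    return (20 + 200 + 2_000) / WORKSTATIONS


def wide_deployment(start: date) -> List[Tuple[str, int, date]]:
    """Cumulative workstation coverage (%) per phase, one week between phases.

    `start` is the date on which phase A begins.
    """
    cumulative = [20, 40, 60, 80, 100]
    return [
        (phase, pct, start + timedelta(weeks=week))
        for week, (phase, pct) in enumerate(zip("ABCDE", cumulative))
    ]
```

Under these assumptions the workstation test phases cover about 11% of the fleet, which is roughly in line with wide-deployment phase A bringing the running total to 20%, and a wide deployment starting on a given Monday reaches phase E four weeks later.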

In the next few days, we will post a series of blogs that explain how Trellix’s technology solutions support these best practices, along with additional best practices for configuration management, testing, and deployment within Trellix technology solutions. We will also include instructions explaining how to engage Trellix Professional Services to start building the process and procedure muscle, technical knowledge, and testing infrastructure in your organization.

Conclusion

While incidents like the July 19, 2024 CrowdStrike outage are disruptive, Trellix offers customers Transparency, Choice, and Responsibility in our software and content update processes. This enables customers to exercise due diligence and consistent testing in their environments, which can dramatically limit impacts and reduce the risks to their organizations. If your organization needs help putting in that effort, consider Trellix Professional Services. We offer various strategic and technical services that can help your team test, automate, operate, and govern your cyber defenses. Contact your Trellix account manager, or if you haven’t got one yet, let us know you’re interested. Tell them you want fully tested software and security content updates to get the ball rolling!

Continue reading the series: Part 1 (Intro), Part 2 (Product Features ePO, EDR, and ENS), Part 3 (Product Features Endpoint Forensics), and Part 4 (Testing Procedures for ePO, EDR, ENS, and HX).

Make sure you read this disclaimer after all that recommendations goodness:

This document and the information contained herein describe computer security research for educational purposes only and for the convenience of Trellix customers. This document contains information on Trellix products, services, and/or processes in development. All information provided here is subject to change without notice at Trellix’s sole discretion. Contact your Trellix representative for the latest forecast, schedule, specifications, and roadmaps.
