On July 19, 2024, a major IT outage caused by a faulty update from security software provider CrowdStrike brought critical infrastructure worldwide to a standstill. The incident affected banks, airlines, medical facilities, and media outlets, preventing them from offering essential services. Here are key insights and steps for risk managers and compliance professionals to avoid similar crises in the future.
What Happened?
The problem mainly affected Microsoft Windows users and was caused by an update to a driver used by CrowdStrike Falcon, software that detects threats on Windows computers. The update triggered the “blue screen of death” (BSOD) on affected machines. CrowdStrike CEO George Kurtz confirmed that the issue was due to a defect in a single content update for Windows hosts, not a security breach or cyberattack.
Bram De Buyser, founder of data and AI consultancy firm Arcology, explained: “[Crowdstrike] includes antivirus, but also many other monitoring and invasive stuff to determine if a machine is vulnerable or even attacked. To do that, the software installs drivers. Crowdstrike has pushed an update to one of those drivers, and that update broke it, so now, whenever Windows tries to load that driver it crashes.”
Looking to the Future
This incident highlights the global over-reliance on a few key software providers and the interconnectedness of IT systems. Steps to mitigate future risks include:
- Thorough Risk Assessments: Evaluate not only internal systems but also third-party dependencies. Regularly assess the risk posed by critical vendors and software providers.
- Rigorous Software Testing: Implement testing regimes for all software updates so that defects are caught before they reach critical systems. Test extensively in staging environments before deployment.
- Holistic Cyber Resilience: Adopt a multi-layered defense strategy encompassing software solutions, robust policies, regular training, and proactive threat hunting. Ensure continuous monitoring of controls to detect and respond to threats promptly.
- Continuous Controls Monitoring: Implement systems that provide real-time visibility into the effectiveness of security measures and rapid response to emerging threats.
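One common way to put the testing and monitoring steps above into practice is a staged (ring-based) rollout, where an update reaches a small group of machines first and only proceeds if that group stays healthy. The sketch below is illustrative only: the `Ring` class, the `apply_update` callback, and the 1% failure threshold are assumptions made for this example, not a description of how CrowdStrike or any particular vendor deploys updates.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ring:
    """A group of hosts that receives an update together (hypothetical model)."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,  # assumed threshold; tune to your risk appetite
) -> List[str]:
    """Apply an update ring by ring and halt once a ring fails too often.

    `apply_update` is a caller-supplied function that returns True when a
    host is healthy after the update. Returns the names of the rings that
    completed within the failure threshold.
    """
    completed: List[str] = []
    for ring in rings:
        if not ring.hosts:
            continue  # nothing to update in an empty ring
        failures = sum(1 for host in ring.hosts if not apply_update(host))
        failure_rate = failures / len(ring.hosts)
        if failure_rate > max_failure_rate:
            # Stop here so later, larger rings never receive the faulty update.
            break
        completed.append(ring.name)
    return completed
```

Ordering the rings from smallest to largest (for example staging, then a small production canary group, then everyone else) means a defective update is caught while it affects only a handful of machines.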
Martin Greenfield, CEO of Quod Orbis, emphasizes: “This incident serves as a reminder that even industry-leading solutions can falter, potentially leaving entire sectors vulnerable. The widespread impact of this outage also highlights the interconnectedness of global IT systems and the potential for cascading failures.”
Better Business Continuity Planning
Organizations need well-rehearsed business continuity plans to handle such outages effectively:
- Document Impacts: Record the effects of outages in business impact assessments, detailing impacts on stakeholders, including clients and regulatory bodies.
- Determine Acceptable Risks: Identify acceptable levels of disruption and plan actions accordingly. Develop scenarios and continuity plans to address potential outages.
- Operational Resilience: Ensure scenarios and continuity plans are in place, as required by regulations such as the EU's Digital Operational Resilience Act (DORA). Regularly update and practice these plans to ensure preparedness.
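The "acceptable risks" step above can be made concrete by recording, for each critical service, the maximum downtime the business has agreed to tolerate and comparing it with the recovery time achieved in the latest continuity exercise. The following is a minimal sketch under stated assumptions: the `ServiceTolerance` fields, the service names, and the hour figures are all hypothetical and would come from your own business impact assessment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ServiceTolerance:
    """One row of a (hypothetical) business impact assessment."""
    service: str
    max_tolerable_downtime_hours: float  # agreed acceptable level of disruption
    tested_recovery_hours: float         # measured in the latest continuity exercise


def tolerance_breaches(services: Iterable[ServiceTolerance]) -> List[str]:
    """Return the services whose tested recovery time exceeds the agreed tolerance."""
    return [
        s.service
        for s in services
        if s.tested_recovery_hours > s.max_tolerable_downtime_hours
    ]


if __name__ == "__main__":
    # Illustrative figures only.
    services = [
        ServiceTolerance("payments", 2.0, 5.0),
        ServiceTolerance("customer-portal", 8.0, 4.0),
    ]
    print(tolerance_breaches(services))  # prints ['payments']
```

Any service returned by `tolerance_breaches` is a candidate for a revised continuity plan or a formally accepted risk.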
Laura Fox, founder of Canary Risk, underscores the importance of readiness: “The best risk and resilience teams will have prepared their businesses for such outages. IT teams can then focus on fixing the issues with the support of a well-thought-out plan. Does this mean all problems are solved and there are no impacts? Absolutely not. But the operational impact on well-prepared businesses that have invested in resilience is likely far lower.”
Key Takeaways for Risk Managers and Compliance Professionals
- Conduct Comprehensive Data Audits: Ensure thorough documentation of data collection and usage practices. Regularly review and update these audits to reflect changes in data handling and regulatory requirements.
- Review Pricing Algorithms: Implement transparency and fairness in algorithmic pricing systems. Ensure algorithms are regularly tested and audited for bias and compliance.
- Enhance Data Governance: Develop robust frameworks prioritizing ethical considerations. Implement policies that go beyond compliance to address the broader ethical implications of data use.
- Stay Informed: Keep up to date with evolving regulations and best practices in data privacy and consumer protection. Participate in industry forums and training to stay ahead of regulatory changes.
- Invest in Resilience: Prioritize operational resilience through continuous monitoring, regular testing, and proactive planning. Ensure that business continuity plans are comprehensive, regularly tested, and updated to handle new threats.
By implementing these strategies, risk managers and compliance professionals can better prepare their organizations to handle and mitigate the impacts of similar crises in the future. Preparing for such incidents involves technical readiness, strategic communication, and comprehensive risk management planning.