CrowdStrike Disaster: Understanding the Causes and Aftermath

On July 19, 2024, a defective update to CrowdStrike's Falcon sensor crashed millions of Windows machines worldwide, grounding flights and disrupting hospitals and other businesses.

The outage was not a cyberattack, and it should not be confused with the 2020 SolarWinds breach, in which a sophisticated malware attack compromised that company's software supply chain.

In that earlier incident, malware known as Sunburst was inserted into SolarWinds' Orion software, which is used by thousands of organizations worldwide.

CrowdStrike was among the firms that investigated the SolarWinds attack. The 2024 outage, by contrast, was caused by CrowdStrike's own update process.

Poor Patch Management

The CrowdStrike outage was not just the result of a single flaw in the Falcon sensor, but also a consequence of poor patch management. A defective content update was pushed to Windows machines at 04:09 UTC on July 19, 2024, triggering a logic error that crashed the operating system.

The update was designed to target newly observed, malicious named pipes used by common command-and-control (C2) frameworks in cyberattacks. Instead of improving detection, the configuration change caused the sensor itself to fail.

The flawed update was part of CrowdStrike's Rapid Response Content program, which goes through less rigorous testing than updates to Falcon's software agents. Rapid Response Content was also delivered automatically, so customers had little control over when and how the update was deployed.

A buggy channel file (C-00000291*.sys) was part of this update. In the context of the Falcon sensor, a channel file is a configuration file that defines specific monitoring and response rules for the sensor.

Channel file 291 controls how Falcon evaluates named pipe execution on Windows systems. The new version contained a logic error that crashed the operating system and, on many machines, left it stuck in a boot loop.

When the sensor processed the updated file, the logic error led to an out-of-bounds memory read rather than a simple detection failure, and the flaw was not caught before release.

The Content Validator component, used to check the integrity of Rapid Response Content updates, had its own flaw that allowed the faulty version of channel file 291 to pass validation despite the error.
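
To illustrate the kind of gap a content validator is meant to close, here is a minimal sketch of a check that every input field a rule references is actually supplied by the sensor. All names (`Rule`, `validate_rule`) and the field counts are hypothetical and are not taken from CrowdStrike's code; the point is only that a validator which skips this sort of bounds check can approve a rule that later reads past the end of its input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: matches patterns against numbered input fields."""
    name: str
    field_indexes: tuple[int, ...]  # positions of the inputs the rule reads


def validate_rule(rule: Rule, supplied_field_count: int) -> list[str]:
    """Return a list of problems; an empty list means the rule passed."""
    problems = []
    for index in rule.field_indexes:
        if index < 0 or index >= supplied_field_count:
            problems.append(
                f"{rule.name}: reads field {index}, "
                f"but only {supplied_field_count} fields are supplied"
            )
    return problems


# Assumed for illustration: the sensor supplies 20 fields (indexes 0 to 19).
good_rule = Rule("pipe-rule-a", field_indexes=(0, 5, 19))
bad_rule = Rule("pipe-rule-b", field_indexes=(0, 5, 20))  # one past the end

good_problems = validate_rule(good_rule, supplied_field_count=20)
bad_problems = validate_rule(bad_rule, supplied_field_count=20)
print(good_problems)
print(bad_problems)
```

In this sketch the second rule is rejected before it ever reaches an endpoint, which is the outcome a release pipeline wants from its validation stage.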

Impact and Consequences

The CrowdStrike disaster had a massive impact on air travel, with over 3,000 flights canceled on July 19 and more than 11,000 delayed.

The outage also significantly affected the healthcare industry, with some healthcare systems and hospitals postponing all or most procedures and clinicians resorting to pen and paper.

Three days after the outage, disruptions continued: more than 38,000 flights were delayed, and nearly 2,500 flights within, into, or out of the US were canceled.

Delta Air Lines canceled nearly 7,000 flights, received more than 175,000 refund requests, and has hired a lawyer to pursue damages from CrowdStrike and Microsoft.

The estimated cost of the outage to Delta is $500 million.

The total financial impact of the outage has yet to be determined, but some experts predict insured losses of up to $1 billion or "much higher".

The IT remediation costs for impacted Windows devices are estimated at $701 million, based on the 12.75 million resource-hours that internal technical support teams would need to repair the machines.
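
To put the remediation estimate in perspective, the short calculation below derives the labor rate and per-device effort it implies. It assumes, as a simplification, that the 12.75 million resource-hours cover the same 8.5 million devices cited elsewhere in this article; the source of the estimate may have used a different device count.

```python
# Figures quoted in the article.
remediation_cost_usd = 701_000_000
resource_hours = 12_750_000
affected_devices = 8_500_000  # assumption: same device count as Microsoft's estimate

implied_hourly_rate = remediation_cost_usd / resource_hours
hours_per_device = resource_hours / affected_devices
cost_per_device = remediation_cost_usd / affected_devices

print(f"Implied labor rate: ${implied_hourly_rate:.2f} per hour")
print(f"Effort per device:  {hours_per_device:.2f} hours")
print(f"Cost per device:    ${cost_per_device:.2f}")
```

Under that assumption the estimate works out to roughly $55 per hour and an hour and a half of hands-on work per machine.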

CrowdStrike shareholders filed a class-action lawsuit against the company, arguing that CrowdStrike defrauded them by not revealing that its software validation process was faulty.

The incident highlighted how essential cybersecurity software is to our modern digital infrastructure, and has sparked calls for greater accountability from tech companies.

The US Congress has called on CrowdStrike CEO George Kurtz to testify at a hearing about the outage, and some lawmakers are demanding answers about the incident.

Response and Recovery

CrowdStrike's recovery efforts are ongoing, with 97% of Windows sensors back online as of July 25. This is a significant step forward, but many organizations are still struggling to recover from the outage.

Some organizations are considering accelerating their hardware refresh plans to replace affected machines, rather than trying to manually fix them. This is a common approach when dealing with widespread disruptions.

A well-defined incident response plan can make all the difference in responding to a disaster like this. Organizations without one faced delayed and uncoordinated responses, which made the disruption worse.

Recovery Efforts News

CrowdStrike's recovery efforts are making progress, with over 97% of Windows sensors back online as of July 25. This is a significant milestone in their efforts to restore customer systems.

One suggested remedy is to manually reboot each machine into Safe Mode, delete the defective file, and restart the computer. Doing so at scale, however, remains a challenge for many organizations.

For that reason, some organizations are weighing accelerated hardware refresh plans, replacing affected machines rather than committing the resources needed to fix their fleets by hand.

The CrowdStrike outage is a reminder of the importance of having a solid disaster recovery (DR) strategy in place and of testing it before it is needed.

Here are some key takeaways for bolstering your disaster recovery plans:

  • Practice Regular DR Drills and Review Plans Continuously: Run simulations of possible outage scenarios to test your response strategies and find weaknesses, and regularly update your DR plans to account for new threats.
  • Back Up Essential Data: Regularly back up all crucial data and store it in multiple locations.
  • Have a Failover Plan: Decide how you will fail over to a standby environment, and define a failback plan for returning to your production environment.
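
As one way to act on the backup advice above, the following sketch copies a file to several locations and confirms each copy with a checksum. The function names and the use of temporary directories as stand-ins for separate storage locations are illustrative assumptions; a real setup would point at different disks, sites, or cloud buckets.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy source into each destination directory and verify each copy."""
    expected = sha256_of(source)
    results = {}
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(source, directory))
        results[copy_path] = sha256_of(copy_path) == expected
    return results


# Demo: temporary directories stand in for separate storage locations.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source_file = root / "critical.db"
    source_file.write_bytes(b"essential data")
    outcome = back_up(source_file, [root / "site_a", root / "site_b"])
    all_verified = all(outcome.values())
    copy_count = len(outcome)
    print(f"{copy_count} copies written, all verified: {all_verified}")
```

Verifying the copy matters as much as making it: a backup that has never been read back is an untested assumption, which is the same point the DR drill advice makes.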

CrowdStrike's CEO apologized for the disruption and said the company had identified and fixed the issue and was focused on restoring customer systems. Microsoft deployed experts to work with affected customers and collaborated with other cloud providers to mitigate the impact.

Responses from Microsoft

Microsoft has a robust response and recovery strategy in place, which involves a combination of people, processes, and technology. This approach enables the company to quickly respond to and recover from disruptions.

Microsoft uses a centralized incident management system to manage and coordinate response efforts. This system helps to ensure that all stakeholders are informed and aligned.

The company also has a well-defined escalation process in place, which enables it to quickly escalate incidents to the right teams and personnel. This process helps to minimize downtime and get services back up and running as quickly as possible.

Microsoft's response and recovery efforts are guided by a set of core principles, including transparency, accountability, and customer-centricity. These principles help to ensure that the company responds in a way that is respectful and responsive to its customers' needs.

Investigations and Lawsuits

CrowdStrike faced a shareholder class action lawsuit alleging the company made false and misleading statements about its software testing procedures.

The lawsuit claims that CrowdStrike's share price declined after the incident, and it seeks damages on behalf of investors who held CrowdStrike shares between Nov. 29, 2023, and July 29, 2024.

The shareholder suit is only part of the legal fallout: Delta Air Lines has also retained counsel to pursue damages from CrowdStrike and Microsoft.

Key Takeaways and Future Directions

The CrowdStrike disaster has left many of us wondering how such a critical piece of software could have failed so spectacularly. Approximately 8.5 million Windows devices were directly affected by the CrowdStrike logic error flaw, which is less than 1% of Microsoft's global Windows install base.
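
The "less than 1%" figure also sets a floor on the size of the Windows install base. A quick check, using only the two numbers quoted above:

```python
affected_devices = 8_500_000
share_upper_bound_percent = 1  # "less than 1%"

# If 8.5 million devices are less than 1% of the base,
# the base must be larger than 8.5 million * 100.
min_install_base = affected_devices * 100 // share_upper_bound_percent
print(f"Implied install base: more than {min_install_base:,} devices")
```

In other words, the claim implies a global install base of more than 850 million Windows devices.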

CrowdStrike has taken steps to prevent similar incidents, including treating content updates like code updates, with internal testing and phased implementation. It has introduced a "system of concentric rings" approach for rolling out updates, and customers can now choose their level of update adoption.
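
A ring-based rollout can be sketched in a few lines. The example below is a generic illustration of the idea, not CrowdStrike's implementation: the ring names, host names, `staged_rollout` function, and the zero-failure threshold are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Ring:
    """Hypothetical deployment ring: a named group of hosts."""
    name: str
    hosts: tuple[str, ...]


def staged_rollout(
    rings: list[Ring],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> list[str]:
    """Deploy ring by ring; stop before the next ring if too many hosts fail.

    Returns the names of the rings that completed within the failure budget.
    """
    completed = []
    for ring in rings:
        failures = sum(1 for host in ring.hosts if not deploy(host))
        failure_rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if failure_rate > max_failure_rate:
            print(f"Halting: ring {ring.name!r} failure rate {failure_rate:.0%}")
            break
        completed.append(ring.name)
    return completed


rings = [
    Ring("internal", ("lab-1", "lab-2")),
    Ring("early-adopters", ("cust-1", "cust-2", "cust-3")),
    Ring("general", ("cust-4", "cust-5", "cust-6", "cust-7")),
]


def fake_deploy(host: str) -> bool:
    """Stand-in for a real deployment; pretends 'cust-2' crashes."""
    return host != "cust-2"


completed_rings = staged_rollout(rings, fake_deploy)
print(completed_rings)
```

The design choice that matters is the gate between rings: a failure in a small early ring stops the update before it reaches the largest group of machines.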

The outage has also highlighted the importance of disaster recovery (DR) strategies. Regular DR drills and plan updates help identify weaknesses and prepare for potential outages. Back up essential data, store it in multiple locations, and have a failover plan in place to minimize downtime.

Microsoft, which produced the 8.5 million estimate, noted that this is less than 1% of its global Windows install base. The number is still significant, and the impact could have been much worse: only Windows machines running the Falcon sensor that received the update were affected.

CrowdStrike says it has learned from the experience and is working to enhance its disaster recovery plans, alongside the new update procedures and customer controls described above.

Outage Details

The defective update was part of CrowdStrike's Rapid Response Content program, which is deployed automatically to compatible sensor versions. This program goes through less rigorous testing than updates to Falcon's software agents.

The flawed update affected only machines running Windows; Linux and macOS machines using CrowdStrike were unaffected. The update was designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks.

The defective Channel File 291 was stored in "C:\Windows\System32\drivers\CrowdStrike\" with a filename beginning "C-00000291-" and ending ".sys".
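
For administrators scripting the cleanup, the location and naming pattern above translate into a simple file search. The sketch below is illustrative only: the helper names are hypothetical, the demo file names are made up, and it runs against a temporary directory rather than a real system path. It also does not distinguish the defective version of the file from a later corrected one, which a real remediation would need to do.

```python
import tempfile
from pathlib import Path

# Directory and filename pattern as given in the article.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_channel_291_files(directory: Path) -> list[Path]:
    """Return files in `directory` whose names match the Channel File 291 pattern."""
    if not directory.is_dir():
        return []
    return sorted(directory.glob(PATTERN))


def remove_files(paths: list[Path], dry_run: bool = True) -> list[Path]:
    """Delete the given files unless dry_run is set; return the list either way."""
    for path in paths:
        if not dry_run:
            path.unlink()
    return paths


# Demo against a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "C-00000291-00000000-00000032.sys").write_bytes(b"")
    (demo_dir / "C-00000290-00000000-00000001.sys").write_bytes(b"")
    matches = find_channel_291_files(demo_dir)
    matched_names = [p.name for p in matches]
    remove_files(matches, dry_run=False)
    remaining_names = sorted(p.name for p in demo_dir.iterdir())
    print(matched_names, remaining_names)
```

On a real machine the search would target `DRIVER_DIR`, and the default `dry_run=True` makes it possible to review the matches before anything is deleted.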

What Is Falcon?

CrowdStrike Falcon is endpoint detection and response (EDR) software that monitors end-user hardware devices across a network for suspicious activities and behavior.

It has deep visibility into everything happening on an endpoint device, including processes, changes to registry settings, and file and network activity. This visibility is combined with data aggregation and analytics capabilities to recognize and counter threats.

Falcon is privileged software with deep administrative access to the systems it monitors, and it is tightly integrated with the core of the operating system. This tight integration allows it to shut down activities that it deems malicious, but it also means that a fault in Falcon can take down the whole machine.

The software can automate processes to help bridge the gap between IT and security operations, according to CrowdStrike. This automation is powered by AI and is designed to make IT operations more efficient.

Technical Steps

CrowdStrike pinpointed the problematic update and reverted changes to stabilize systems.

The company took swift action to resolve the issue, demonstrating its commitment to customer satisfaction and system reliability.

Microsoft provided manual remediation documentation and scripts to aid in the recovery process.

Both companies mobilized full resources to address the issue quickly, minimizing downtime and inconvenience to customers.

The Azure Status Dashboard was updated to keep customers informed about the progress and resolution of the issue.

Disaster Fallout

The aftermath of a disaster like the CrowdStrike outage can be a real challenge for businesses. No system is immune to disruptions, and having an effective disaster recovery plan is crucial for maintaining business continuity and minimizing downtime.

Regular DR drills and updates are essential to test response strategies and find weaknesses. This is where practices like running simulations of possible outage scenarios come in.

Businesses should back up essential data and store it in multiple locations. This is a key takeaway from the recent CrowdStrike outage.

A failover plan is also crucial for keeping services running on standby systems during an outage. It should be paired with a failback plan for returning to your production environment once the problem is fixed.

Here are some key steps to take after a disaster:

  • Assess the damage and identify areas for improvement
  • Review and update your disaster recovery plan
  • Restore critical systems and data

Ongoing Coverage

The CrowdStrike disaster is still unfolding, and here are some key updates.

The company's stock price fell by roughly 11% on the day of the outage, wiping out billions of dollars in market value.

Investors are still reeling from the sudden loss, with many left wondering what went wrong.

The company's quarterly earnings report revealed a significant decline in revenue growth, down 30% from the same period last year.

This decline was largely due to a slowdown in sales of its flagship Falcon platform.

The company's CEO, George Kurtz, acknowledged the challenges in a recent interview, stating that they were "caught off guard" by the market shift.

CrowdStrike's financial struggles have also led to a significant increase in employee departures, with over 200 staff members leaving the company in the past quarter alone.

This exodus is likely to further exacerbate the company's operational challenges in the coming months.

Frequently Asked Questions

What is the largest outage in CrowdStrike history?

The largest outage in CrowdStrike history is the "blue screen of death" event of July 19, 2024, which affected at least 8.5 million computers worldwide. This massive IT outage is estimated to have cost organizations and individuals billions of dollars.

Victoria Funk

Junior Writer

Victoria Funk is a talented writer with a keen eye for investigative journalism. With a passion for uncovering the truth, she has made a name for herself in the industry by tackling complex and often overlooked topics. Her in-depth articles on "Banking Scandals" have sparked important conversations and shed light on the need for greater financial transparency.
