Nariman Mani, P.Eng., PhD Computer and Software Engineering

    Are Memory Unsafe Languages on Their Way Out? A Closer Look at the July 19, 2024 CrowdStrike Outage

    July 20, 2024

    On July 19, 2024, a faulty update from CrowdStrike, a leading cybersecurity company, crashed Windows machines and disrupted businesses worldwide. The incident, widely discussed in the tech community, highlighted the vulnerabilities associated with memory unsafe languages, particularly C++.

    Inspired by Zach Vorhies's analysis of the stack trace dump on X (Twitter), I break down what happened below:

    Impact

    The system crash led to a major outage, affecting a wide range of sectors:

    • Business Operations: Many financial services and doctors’ offices faced significant disruptions, halting transactions and communications.
    • Media and Broadcasting: Several TV broadcasters went offline, affecting news and entertainment services.
    • Air Travel: The aviation sector was particularly hard hit, with planes grounded, flights delayed, and airports issuing advisories to passengers.

    These disruptions underscored the critical nature of reliable software systems and the potential widespread consequences of programming errors.

    What Happened?

    The outage was most likely triggered by a programming error involving a null pointer in C++. The error led to an attempt to read from an invalid memory address (0x9c or 156 in decimal), resulting in a system-level crash. Here's a detailed breakdown of the technical aspects:

    The Problem:

    • Null Pointer Issue: The code created a pointer variable intended to point to an object in memory. However, due to an error, this pointer remained null, meaning it pointed to no valid memory location.
    • Missing Null Check: The code then used this null pointer to access the object's data. Standard practice is to check whether a pointer is null before dereferencing it (e.g., if (obj == NULL) { ... }); that check appears to have been missing.
    • Invalid Memory Access: Reading through the null pointer produced an invalid memory access. The code tried to read from memory address 0x9c, as shown in the stack trace dump, which suggests it was reading a field located 0x9c bytes from the start of the object. Essentially, it was “NULL + 0x9C = 0x9C”.
    • Memory Access Violation: The faulting code ran at the system level rather than in an ordinary application, so Windows could not simply terminate one misbehaving program. To prevent further corruption, it halted the entire machine, producing the Blue Screen of Death (BSOD).

    Stack Trace Details:

    • Read Address: 0x000000000000009c
    • Error Code: 0xc0000005 (STATUS_ACCESS_VIOLATION) - The instruction at 0x%p referenced memory at 0x%p. The memory could not be read.
    • Process Name: “System”, indicating it was a system-level issue.
    • Result: The stack trace shows that the system attempted to access an invalid memory location, resulting in the crash.

    Lessons Learned

    From this incident, key lessons can be drawn for CrowdStrike, for Microsoft, and by extension for the broader tech community:

    • CrowdStrike:

      • Implement more rigorous code reviews and automated testing to catch such errors before they reach production.
      • Utilize code safety tools that automatically check for null pointers and other potential issues.
      • Consider transitioning system drivers to modern programming languages like Rust, which are designed to prevent these kinds of errors.
    • Microsoft:

      • Develop better policies for rolling back defective drivers to minimize outages.
      • Enhance tooling to catch such errors in system drivers before they cause significant disruption.
      • Avoid pushing risky updates directly to customers without thorough validation.

    The Bigger Picture: Moving Away from Memory Unsafe Languages

    This incident highlights a broader industry conversation: Is it time to move away from memory unsafe languages like C++? Earlier this year, on February 26, 2024, the White House Office of the National Cyber Director published a report urging developers to move away from memory unsafe languages in favor of safer alternatives such as Java, Go, and Rust. The CrowdStrike incident might be a tipping point that accelerates the shift to these safer languages, enhancing security and reliability across critical systems.

    Join the Conversation

    Is this the beginning of the end for memory unsafe languages in critical systems? Will we see a significant industry shift towards languages like Rust, Java, and Go? Let's continue this important discussion as we work towards a safer and more reliable digital world.

    Credit: Zach Vorhies
    Ref: In Rust we trust? White House Office urges memory safety
