CrowdStrike Update: Latest News, Lessons Learned from a Retired Microsoft Engineer

Published 2024-07-24
Dave brings you up to date on the CrowdStrike IT outage and considers its broader implications. For my book on the spectrum, see: amzn.to/3XLJ8kY

Follow me for updates!
Twitter: @davepl1968
Facebook: fb.com/davepl

1. Introduction to the CrowdStrike Falcon IT Outage:
• Overview of the recent CrowdStrike Falcon IT outage and its impact on various industries.
2. Technical Details of the Outage:
• Explanation of the faulty sensor configuration update and how it led to system crashes (BSOD) on Windows systems.
• Specifics about the corrupted “Channel File 291.”
3. Impact and Response:
• Description of the scale of the outage, affecting approximately 8.5 million devices worldwide.
• Steps taken by CrowdStrike to deploy a fix and provide mitigation guidance to affected customers.
4. Previous Issues with Linux Systems:
• Recap of earlier incidents where CrowdStrike updates caused crashes on Debian and Rocky Linux systems.
5. CrowdStrike on macOS:
• Discussion about CrowdStrike’s security solutions for macOS and their use of Apple’s System Extensions.
6. Kernel vs. User Mode in Security Software:
• Analysis of why kernel-mode access is used by CrowdStrike and the associated risks.
• Historical context of kernel vs. user mode in Windows drivers.
7. Regulatory Challenges:
• Narrative on Microsoft’s attempt to introduce an API to prevent such issues and the regulatory hurdles faced from the European Union, which deemed it anticompetitive.
8. Conspiracy Theories and Broader Lessons:
• Overview of conspiracy theories that emerged around the outage.
• Lessons to be learned from the incident, drawing a parallel to the Tylenol crisis management.

I'm long since retired, and any opinions are mine alone; not a spokesperson!

All Comments (21)
  • @hotzemusic
    "Standing around with their disks in their hand" was such a great quote lol
  • @m4rt_
    "Never attribute to malice that which is adequately explained by stupidity." - Hanlon's razor
  • “It choked, turned blue and died” I love the way you explain these technical items in such a clear and funny way.
  • @BFLmouse
    Way back in high school my programming teacher taught us that our user interface code had to be bulletproof. A 'bullet' was then defined as a hyperactive ten year old at the keyboard. One of the first tests he would do is to mash both hands on the keyboard. My program was expected to handle that gracefully. I've been using that philosophy on everything I do ever since.
  • @spidalack
    "The code just raw dogs it and hopes for the best" You almost caused me to spit out my drink
  • This was an obvious CrowdStrike procedure error. What should have happened: 1. The update is sent to automated testing 2. The tests return with crashes in 2-5 minutes 3. The guy that made the all-0 file would be very embarrassed for that mistake. 4. The code that apparently doesn't check the input would have been flagged for improvement 5. Nobody outside the dev company would have heard of it. Just another day in the office, doing kernel-level work responsibly.
  • @uzaiyaro
    We had our own Tylenol scare in Australia. A woman sent threats to Arnott’s, a biscuit company, saying that she had contaminated a line of biscuits in some states (if I remember correctly), and Arnott’s response was to recall every line of biscuits in every state immediately. They also published full page ads of the police reports, and any updates that had come from the police. They instructed people to not eat any of their biscuits. When they found the woman and dealt with her, Arnott’s didn’t lose any market share, in fact they gained market share because of their demonstrated trustworthiness. It has also become a case study in how to deal with a crisis in business.
  • Falcon represents a significant burden for developers, acting as corporate security bloatware that penalizes them for basic tasks like compiling code. This results in lost productivity and siphons money from companies without providing a tangible return on investment. Executives often overlook these issues until a major crisis occurs, yet the decision-makers responsible are rarely held accountable. Consulting a financial advisor could help companies evaluate the true cost and financial impact of such security measures on their overall operations and productivity.
  • @XH13
    My first C course back in school. The prof asked us to write a program that asked for a date d and a number n, and returned the date d+n days. Beginner stuff, but you have to start somewhere. When the first student completed the assignment, the prof went to their computer and when asked for a date, he typed: "did you trust the user input ?" Program crashed, lesson learned.
  • @roycsinclair
    Most telling is that CrowdStrike has already done the same thing to smaller code bases, i.e. those two Linux versions mentioned and has obviously not fixed the internal processes that allowed corrupt updates to be released. Thank you Dave for getting more of the story.
  • @homebrew07
    Please make a video addressing your history with SharewareOnline and the multiple PUPs and rogue AVs you had hosted and advertised there.
  • "I try to never attribute to malice that which can be sufficiently explained by incompetence." Greatest statement ever
  • @Hamiltron_
    “Their code just kind of raw dogged it” 😂😂 One of many quotable moments in this video. Bravo, you 56 year old gem of a nerd.
  • @lindoran
    There is so much value added here. You can really tell that Dave has had to present -- "ok, what had happened was" to a design review before. The bit about standing around with their disks in their hands ... that was priceless sir :D
  • Thanks, Dave, especially the excellent review of the Tylenol crisis. I was working for J&J, teaching Mr. Burke to use his new IBM PC at the time. He was a marvelous leader and manager!
  • Regarding the Linux issues with CrowdStrike: The important difference there is those were problems with the falcon-sensor itself. Those were discovered through the normal server patching process. Since we patch test machines first, we were able to find the problem before it hit any production servers. Having the problem in an automatically downloaded channel update is a big difference.
  • My new favorite quote I heard from this situation is "Any sufficiently advanced incompetence is indistinguishable from malice" and I felt that
  • @ReverendTed
    I took a couple of entry-level programming classes in college, back in the 90s. The understanding that you couldn't trust user inputs to be valid was one of the first things we learned. To be fair, they taught us that while explaining we wouldn't have to worry about input validation/vetting in our entry-level projects. Even so, anyone in the periphery of Internet culture is aware of user-input code injection exploits, like the "drop table" memes.
  • This is an exceptionally well done video. I'm also 56 years old, and was delighted to hear you use the term "bork". I say it all the time, and my young employees have no idea what I'm talking about.
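
Several comments above come back to the same rule: never trust input. As a rough illustration only, here is a minimal Python sketch of the classroom exercise one commenter describes (read a date d and a number n, return the date n days later) that rejects bad input instead of crashing. The function name `add_days` and the year-month-day input format are assumptions made for this example; they are not from the video or the comments.

```python
from datetime import date, datetime, timedelta


def add_days(date_text: str, days_text: str) -> date:
    """Return the date that is `days_text` days after `date_text`.

    Both arguments arrive as raw, untrusted strings. Any bad input is
    reported as a ValueError with a readable message.
    (Hypothetical helper; the year-month-day format is an assumption.)
    """
    try:
        # strptime raises ValueError on anything that is not year-month-day.
        start = datetime.strptime(date_text.strip(), "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"not a valid year-month-day date: {date_text!r}") from None

    try:
        n = int(days_text.strip())
    except ValueError:
        raise ValueError(f"not a whole number of days: {days_text!r}") from None

    try:
        # Both timedelta() and the addition can overflow for extreme values.
        return start + timedelta(days=n)
    except OverflowError:
        raise ValueError("result is outside the supported date range") from None
```

Typing the professor's sentence from the comment above in place of a date now produces a clear error message instead of a crash, and valid input such as `add_days("2024-07-24", "10")` returns a `date` as usual.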