This isn’t a gloat post. In fact, I was completely oblivious to this massive outage until I tried to check my bank balance and it wouldn’t log in.

Apparently Visa Paywave, banks, some TV networks, EFTPOS, etc. have gone down. Flights have had to be cancelled because some airlines’ systems have also gone down. Gas stations and public transport systems are inoperable, and numerous Windows systems and Microsoft services are affected. (At least according to one of my local MSMs.)

Seems insane to me that one company’s messed-up update could cause so much global disruption and take so many systems down :/ This is exactly why centralisation of services, and large corporations gobbling up smaller companies to become behemoth services, is so dangerous.

  • shirro · English · 138 up, 6 down · edited · 4 months ago

    It isn’t even a Linux vs Windows thing but a competent at your job vs don’t know what the fuck you are doing thing. Critical systems are immutable and isolated or as close as reasonably possible. They don’t do live updates of third party software and certainly not software that is running privileged and can crash the operating system.

    I couldn’t face working in corporate IT with this sort of bullshit going on.

    • rozodru@lemmy.world · 60 up · 4 months ago

      This is just like “what not to do in IT/dev/tech 101” right here. Ever since I got into the industry, literally decades ago at this point, I was always told, even in school: “Never test in production, never roll anything out to production on a Friday, if you’re unsure have someone senior code review.” Crowdstrike failed to do any of that. Even the most junior of junior devs should know better. So the fact that this update was allowed to go through…I mean blame the juniors, the seniors, the PMs, the CTOs, everyone. If your shit is so critical that a couple of bad lines of poorly written code (which apparently is what it was) can cripple the majority of the world…yeah, Crowdstrike is done.

      • magic_lobster_party@kbin.run · 35 up · 4 months ago

        It’s incredible how an issue of this magnitude didn’t get discovered before they shipped it. It’s not exactly an issue that happens in some niche cases. It’s happening on all Windows computers!

        This can only happen if they didn’t test their product at all before releasing to production. Or worse: maybe they did test, got the error, and just went “eh, it’s probably just something wrong with the test systems”, and then shipped anyway.

        This is just stupid.

    • CalcProgrammer1@lemmy.ml · 30 up, 2 down · edited · 4 months ago

      It’s also a “don’t allow third party proprietary shit into your kernel” issue. If the driver was open source it would actually go through a public code review and the issue would be more likely to get caught. Even if it did slip through, people would publicly have a fix by now with all the eyes on the code. It also wouldn’t get pushed to everyone simultaneously under the control of a single company; it would get tested and packaged by distributions before making it to end users.

      • Aphelion@lemm.ee · 6 up, 1 down · 4 months ago

        It’s actually a “test things first and have a proper change control process” thing. Doesn’t matter if it’s open source, closed source scummy bullshit or even coded by God: you always test it first before hitting deploy.

        • cybersandwich@lemmy.world · 12 up · 4 months ago

          And roll it out in a controlled fashion: 1% of machines, 10%, 25%…no issues? Do the rest.
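As a rough illustration of the staged rollout described here, the sketch below deploys to a growing fraction of machines and stops at the first stage that reports a failure. Everything in it is hypothetical: the `staged_rollout` function, the `deploy` callback (assumed to return `True` when a host is healthy after the update), and the stage fractions, which extend the 1% / 10% / 25% from the comment with a final 100%. It is not how Crowdstrike or any particular vendor ships updates.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], bool],  # hypothetical: True if the host is healthy afterwards
    stages: Sequence[float] = (0.01, 0.10, 0.25, 1.0),
    max_failure_rate: float = 0.0,
) -> dict:
    """Deploy to growing fractions of hosts, halting at the first unhealthy stage."""
    attempted = 0  # how many hosts have received the update so far
    failed = []
    for fraction in stages:
        # Cumulative target for this stage; always move forward by at least one host.
        target = min(len(hosts), max(attempted + 1, round(len(hosts) * fraction)))
        batch = hosts[attempted:target]
        batch_failures = [host for host in batch if not deploy(host)]
        failed.extend(batch_failures)
        attempted = target
        # Stop before widening the blast radius if this stage looks bad.
        if batch and len(batch_failures) / len(batch) > max_failure_rate:
            return {"completed": False, "attempted": attempted, "failed": failed}
        if attempted == len(hosts):
            break
    return {"completed": True, "attempted": attempted, "failed": failed}


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(200)]
    # An update that breaks every machine is stopped after the first small batch.
    print(staged_rollout(fleet, deploy=lambda host: False))
```

With a fleet of 200 and an update that fails everywhere, only the first batch is touched instead of all 200 machines. A real system would also wait and monitor between stages, which this sketch leaves out.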

          How this didn’t get caught by testing seems impossible to me.

          The implementation/rollout strategy just seems bonkers. I feel bad for all of the field support guys who have had their next few weeks ruined, the sys admins who won’t sleep for 3 days, and all of the innocent businesses that got roped into it.

          A couple of local shops are fucked this morning. Kinda shocked they’d be running Crowdstrike, but then these aren’t big businesses. They are probably using managed service providers who are now swamped, and who knows when they’ll get back online.

          One was a bakery. They couldn’t sell all the bread they made this morning.

          • No1 · 3 up · edited · 4 months ago

            One shop I was at had a manual process going with cash only purchases.

            That blew up when I ordered 3 things and the ‘cashier’ didn’t know how to add them together. They didn’t have the Windows calculator available 🤣

            I told them the total and change to give me, but lent them the calculator on my phone so they could verify for themselves 🤣

      • Morphit@feddit.uk · 2 up · 4 months ago

        It’s not that clear-cut a problem. There seem to be two elements: the kernel driver had a memory safety bug, and a definitions file was deployed incorrectly, triggering the bug. The kernel driver definitely deserves a lot of scrutiny, and static analysis should have told them this bug existed. The live updates are a bit different since this is a real-time response system. If malware starts actively exploiting a software vulnerability, they can’t wait for distribution maintainers to package their mitigation; it has to be deployed ASAP. They certainly should roll out definitions progressively and monitor for anything anomalous, but it has to be quick or the malware could beat them to it.

        This is more a code safety issue than CI/CD strategy. The bug was in the driver all along, but it had never been triggered before so it passed the tests and got rolled out to everyone. Critical code like this ought to be written in memory safe languages like Rust.
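One way to picture the "definitions file triggers a latent bug" half of this is a loader that validates every record before any code indexes into it, and keeps the last known-good definitions if anything looks wrong. The sketch below is only an illustration under made-up assumptions: the comma-separated format, `EXPECTED_FIELDS`, and the function names are all hypothetical and have nothing to do with Crowdstrike's actual file format or driver. Python would raise an `IndexError` rather than read stray memory, so this shows the validation step, not the memory-safety problem itself.

```python
EXPECTED_FIELDS = 4  # hypothetical: how many fields the consuming code indexes into


class InvalidDefinition(ValueError):
    """Raised when a definition record does not match the expected shape."""


def parse_definition(line: str, expected_fields: int = EXPECTED_FIELDS) -> list:
    """Parse one comma-separated record, rejecting anything malformed."""
    fields = [field.strip() for field in line.rstrip("\n").split(",")]
    if len(fields) != expected_fields:
        raise InvalidDefinition(
            f"expected {expected_fields} fields, got {len(fields)}"
        )
    if any(not field for field in fields):
        raise InvalidDefinition("empty field in definition")
    return fields


def load_definitions(lines, current):
    """Return the new definitions only if every record validates.

    On any malformed record the previous (known-good) set is kept, so a bad
    file degrades to "no update" instead of crashing the consumer.
    """
    try:
        return [parse_definition(line) for line in lines]
    except InvalidDefinition:
        return current


if __name__ == "__main__":
    known_good = [["rule-1", "pipe", "match", "block"]]
    print(load_definitions(["rule-2,file,match,alert"], known_good))
    print(load_definitions(["rule-3,file,match"], known_good))  # short record: keeps known_good
```

The design choice here is to fail closed on the update rather than on the machine: a rejected file costs some protection for a while, whereas trusting it costs the host.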

    • umbrella@lemmy.ml · 15 up · 4 months ago

      I couldn’t face working in corporate IT with this sort of bullshit going on.

      I’m taking it you don’t work in IT anymore then?

    • Aceticon@lemmy.world · 14 up · edited · 4 months ago

      More generally: delegate anything critical to a 3rd party and you’ve just put your business at the mercy of the quality (or lack thereof) of their own business processes which you do not control, which is especially dangerous in the current era of “cheapest as possible” hiring practices.

      Having been in IT for almost 3 decades, a lesson I learned long ago, and which I’ve also been applying to my own things (such as having my own domain for my own e-mail address rather than using something like Google), is that you should avoid as much as possible having your mission-critical or hard-to-replace stuff depend on a 3rd party, especially if the dependency is live (i.e. actively connected rather than just buying and installing their software).

      I’ve managed to avoid quite a lot of the recent enshittification exactly because I’ve been playing it safe in this domain for 2 decades.

      • SayCyberOnceMore@feddit.uk · English · 25 up · 4 months ago

        No it’s Crowdstrike… we’re just seeing an issue with their Windows software, not their Linux software.

        • Sethayy@sh.itjust.works · 3 up, 9 down · 4 months ago

          That being said, Microsoft still did hire Crowdstrike and give them the keys to release an update like this.

          End result still is Windows having more issues than Linux.

          • SquigglyEmpire@lemmy.world · 6 up · 4 months ago

            Huh? Crowdstrike is an antivirus product; you’re only affected if you bought and installed it on your Windows devices. Crowdstrike also had issues with their Linux version a few weeks ago, but that one was thankfully less severe.