
A bug’s life



(*edit – R Bala*)

IN scenes that could be the basis for future disaster movies, the world was shaken by the fallout of an IT blunder. Industries such as travel were hit especially hard as airports came to a standstill, grounding thousands of flights and stranding many more passengers.

As of July 23, KLIA2, which bore the brunt of the global IT outage in Malaysia, had reportedly restored its systems within four days of the incident. Elsewhere, other airports are back in business, slowly recovering or still affected.

However, airports were not the only casualties. Banks, schools, retailers and railways around the world were also affected to varying degrees, and analysts say the repercussions of the outage could ripple through entire supply chains for weeks.

So, what brought down these industries? Here is what is known.

Cybersecurity failure

The roots of the chaos can be traced to July 19, when US cybersecurity firm CrowdStrike released a sensor configuration update for its Falcon programme, which is widely installed on Windows hosts and, to a lesser extent, on Mac and Linux machines.

CrowdStrike specialises in ransomware, malware and internet security products for businesses and large corporations. Its Falcon sensor is a cybersecurity programme that provides partially automated malware protection, antivirus support, incident response and other security features.

According to the company, because Falcon is a cloud-based technology, updates are applied to it automatically, multiple times a day. But what was supposed to be a routine update on that fateful Friday contained a coding error.

It sent millions of Windows computers worldwide to the infamous “Blue Screen of Death” (BSoD), while Mac and Linux computers were not affected.

Coding and logic error

Like a body repeatedly trying to fight off an infection, the affected computers then fell into a reboot loop, with each restart triggering another BSoD. The error itself, which CrowdStrike calls a “logic error”, was caused by a bug stemming from a coding mistake.
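To see why simply restarting does not help, here is a deliberately simplified Python simulation. It assumes, hypothetically, that the faulty file is loaded on every start-up and that nothing on disk changes between attempts; the file names and the `boot` and `reboot_until_up` functions are invented for illustration and do not model how Windows or Falcon actually boot.

```python
from typing import Optional, Set

def boot(startup_files: Set[str], faulty_file: str) -> bool:
    """Simulated boot: succeeds only if the faulty file is not loaded."""
    return faulty_file not in startup_files

def reboot_until_up(startup_files: Set[str], faulty_file: str,
                    max_attempts: int = 5) -> Optional[int]:
    """Restart repeatedly; return the attempt number that succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        if boot(startup_files, faulty_file):
            return attempt
        # Simulated BSoD: the disk is unchanged, so the next boot fails the same way.
    return None

# Hypothetical file names, for illustration only.
files = {"windows_kernel", "channel_file_291"}
print(reboot_until_up(files, "channel_file_291"))  # None: stuck in the loop

files.discard("channel_file_291")  # the manual fix: remove the file
print(reboot_until_up(files, "channel_file_291"))  # 1: boots on the first try
```

In this model the only way out is to change what is on disk, which is what the manual clean-up had to do.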

The Falcon programme works by hooking into the Microsoft Windows operating system (OS) at the kernel level. Running there gives it high privileges and the ability to monitor operations across the OS in real time.

The flawed update was contained in what CrowdStrike calls a “channel file”, a type of file that provides configuration updates for behavioural protections. The July 19 update, channel file 291, was meant to improve how Falcon evaluates named pipe execution on Microsoft Windows.

With channel file 291, CrowdStrike inadvertently triggered a logic error in Falcon sensor version 7.11 and above, causing the sensor to crash and, with it, the Windows systems it ran on.
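As a generic, hypothetical illustration (not CrowdStrike's code, and not a statement of its confirmed root cause), the Python sketch below shows how a configuration update alone, with no change to the program itself, can break software that trusts its input: a rule asks for an input value that was never supplied. The rule format and both functions are invented for this example.

```python
from typing import Dict, List, Union

Rule = Dict[str, Union[int, str]]

def rule_matches(rule: Rule, inputs: List[str]) -> bool:
    # Unsafe: trusts the rule's field index without checking bounds.
    return inputs[rule["field"]] == rule["value"]

def rule_matches_checked(rule: Rule, inputs: List[str]) -> bool:
    # Defensive: a rule that points past the end simply does not match.
    index = rule["field"]
    return 0 <= index < len(inputs) and inputs[index] == rule["value"]

inputs = ["pipe_name", "process", "user"]     # three values supplied
bad_rule = {"field": 3, "value": "anything"}  # refers to a fourth value

print(rule_matches_checked(bad_rule, inputs))  # False
try:
    rule_matches(bad_rule, inputs)
except IndexError as exc:
    # Python raises a catchable exception here; in a kernel driver an
    # invalid memory read can instead halt the whole operating system.
    print("crash:", exc)
```

Python gives the program a chance to recover; code running inside the Windows kernel has no such safety net, which is why a crash there takes the whole machine down with a BSoD.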

A layperson might say: “Well, then just delete the file.” That is easier said than done because, as mentioned above, the BSoD loop prevented computers from booting up normally.

Untangling the mess

The only way to delete the file was to boot each machine manually into Safe Mode or the Windows Recovery Environment, a job that typically required an IT administrator. And that is for just one Windows computer.
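CrowdStrike's published workaround was to boot into Safe Mode or the Windows Recovery Environment, open the `C:\Windows\System32\drivers\CrowdStrike` folder, delete the file matching `C-00000291*.sys` and restart normally. The core of that step is sketched below in Python purely to show the logic; in practice it was typically done by hand from a recovery command prompt, where Python is generally not available. The function name is hypothetical, and the demo runs on a temporary folder with invented file names rather than the real system folder.

```python
import tempfile
from pathlib import Path
from typing import List

PATTERN = "C-00000291*.sys"  # file pattern named in CrowdStrike's workaround

def remove_channel_file_291(folder: Path) -> List[str]:
    """Delete files in `folder` matching PATTERN; return the names removed."""
    removed = []
    for path in sorted(folder.glob(PATTERN)):
        path.unlink()
        removed.append(path.name)
    return removed

# Demo on a throwaway folder; these file names are invented.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("C-00000291-00000000-00000001.sys",
                 "C-00000290-00000000-00000001.sys"):
        (folder / name).touch()
    print(remove_channel_file_291(folder))  # ['C-00000291-00000000-00000001.sys']
    print(sorted(p.name for p in folder.iterdir()))  # ['C-00000290-00000000-00000001.sys']
```

Only files matching the exact pattern are removed; other channel files in the folder are left in place.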

Now imagine a corporation with more than a thousand Windows computers hit by CrowdStrike's update. Then imagine hundreds of corporations, companies and industries all over the world, with complex IT infrastructures and encrypted drives that each needed a recovery key before the file could even be reached.

This is why the IT outage lasted several days, even though it took CrowdStrike less than two hours on July 19 itself to release a fix: machines already stuck in the boot loop generally could not download it.

In Malaysia, the outage reportedly affected only KLIA2, where passengers were forced to check in manually for their flights, while Digital Minister Gobind Singh Deo posted on X (formerly Twitter) that the authorities were closely monitoring the situation.

Experts have since highlighted the hazards of cloud-based technology and automated software updates, given the global reach of cloud services.

Is this a one-off or an ominous warning that future errors could have more dire consequences? The incident certainly highlights mankind’s over-dependence on technology and how a small error could lead to chaos on a global scale.
