Facebook’s family of applications (Facebook, Instagram, WhatsApp, and Messenger) went down last night for almost six hours. The outage started after 9 pm IST, and users were unable to access any of the services, including signing in to other sites with their Facebook accounts. According to Downdetector, this was the largest outage it has recorded, with over 10.6 million problem reports from all over the world. The U.S. had the most reports at over 1.7 million, while India had close to 300 thousand.
What caused the outage?
The outage lasted nearly six hours, and the services were back on Tuesday morning. Security researcher Brian Krebs reported that the outage stemmed from a routine BGP update that went wrong. The downtime was prolonged because the update also locked out the remote users who could have reverted the configuration changes. Further, people who had physical access to the hardware did not have the network access needed to make changes.
Later, in a blog post, Facebook wrote that configuration changes on the backbone routers that coordinate network traffic between its data centers caused issues that interrupted network communication, which in turn took all of its services down.
“We want to make clear at this time we believe the root cause of this outage was a faulty configuration change. We also have no evidence that user data was compromised as a result of this downtime.”
TL;DR: What is BGP?
To understand this, let’s first get one thing straight: the internet is not one single network. Rather, it’s a network of networks, interconnected through multiple networking protocols. As such, there’s no direct flight from your computer to, let’s say, Facebook’s servers. Instead, the route to Facebook is a path with multiple hop points (interconnections). BGP facilitates these interconnections. The name BGP (Border Gateway Protocol) comes from where it operates: on the routers that sit at the border between one network and the next.
Now, let’s get into a bit more detail. Every computer has an address (an IP address) that lets other computers find it on the internet. Let’s say you try to access Facebook.com. How does your request get there?
Your computer forwards (“forwarding” in networking terms) the request to your local gateway (usually your ISP, e.g. Airtel), which then “routes” it to another network (usually a bigger one) to find a suitable path to Facebook. But how does Airtel know which network to forward the request to next? BGP. It lets one network learn which destinations other networks can reach, through something called “peering”.
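To make the “which network next?” decision concrete, here is a minimal sketch of the lookup a router performs once BGP has filled in its routing table. It assumes the common longest-prefix-match rule; the prefixes (documentation ranges) and the neighbor names are made up for illustration and are not Airtel’s or Facebook’s real routes.

```python
import ipaddress

# Hypothetical routing table for an ISP router: prefix -> next-hop network.
# In practice BGP populates a table like this; these entries are invented.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-transit",   # default route
    ipaddress.ip_network("203.0.113.0/24"): "peer-A",
    ipaddress.ip_network("203.0.113.128/25"): "peer-B",      # more specific
}

def next_hop(destination, routes=ROUTES):
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None  # no route at all: the packet is dropped
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(next_hop("203.0.113.200"))  # peer-B
print(next_hop("203.0.113.5"))    # peer-A
print(next_hop("198.51.100.7"))   # upstream-transit
```

The router only chooses the next hop; it is up to that neighbor to repeat the same lookup and pass the request along.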
Therefore, using BGP, Airtel would have peered with a bunch of other ISPs (or networks) to create a path that allows your computer to eventually reach Facebook. But what happens if BGP stops working (because of a misconfiguration, a cable cut, etc.)? The network would simply announce that the Facebook address is no longer valid (as Tom Warren found out in the screenshot below). It would also let its “peers” know the same. As a result, other networks would no longer have a route to Facebook and could not deliver connections to it.
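The effect of a withdrawal spreading from peer to peer can be sketched with a toy model. This is not how real BGP works internally (there is no path selection or routing policy here); it only illustrates the flooding idea. The topology, the network names, and the `192.0.2.0/24` documentation prefix are all hypothetical.

```python
from collections import deque

# Hypothetical peering topology; names are illustrative only.
PEERS = {
    "facebook": ["transit-1"],
    "transit-1": ["facebook", "airtel", "transit-2"],
    "transit-2": ["transit-1", "other-isp"],
    "airtel": ["transit-1"],
    "other-isp": ["transit-2"],
}

def propagate(origin, prefix, tables, announce=True):
    """Flood an announcement (or withdrawal) of prefix outward from origin."""
    seen = {origin}
    queue = deque([(origin, [])])  # (network, its route towards the origin)
    while queue:
        network, route = queue.popleft()
        for peer in PEERS[network]:
            if peer in seen:
                continue
            seen.add(peer)
            peer_route = [network] + route  # the peer reaches origin via network
            if announce:
                tables[peer][prefix] = peer_route
            else:
                tables[peer].pop(prefix, None)  # forget the route entirely
            queue.append((peer, peer_route))

tables = {name: {} for name in PEERS}
propagate("facebook", "192.0.2.0/24", tables)
print(tables["airtel"])   # {'192.0.2.0/24': ['transit-1', 'facebook']}

propagate("facebook", "192.0.2.0/24", tables, announce=False)
print(tables["airtel"])   # {}
```

After the withdrawal, no network in the model has an entry for the prefix, so none of them has a next hop to hand the traffic to.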
How is DNS involved?
In very simple terms, once the BGP errors had propagated through multiple networks, those networks also failed to resolve DNS queries for Facebook’s domains, since Facebook’s own DNS servers had become unreachable too. (DNS, the Domain Name System, translates a human-readable domain name into the IP address behind it.)
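The dependency between the two systems can be sketched as follows: a name can only be resolved if the network still has a route to the server that answers for it. This is a simplified model with no caching or retries, and the domain name, addresses, and helper functions are hypothetical placeholders rather than Facebook’s real setup.

```python
import ipaddress

# Hypothetical data using documentation ranges and a reserved example domain.
NAMESERVER_IP = "192.0.2.53"              # authoritative DNS server
ZONE = {"social.example": "192.0.2.10"}   # domain name -> IP address

def has_route(ip, routed_prefixes):
    """True if any currently announced prefix covers ip."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p) for p in routed_prefixes)

def resolve(name, routed_prefixes):
    """Return the IP for name, or None if the nameserver cannot be reached."""
    if not has_route(NAMESERVER_IP, routed_prefixes):
        return None  # the query never arrives, so resolution fails
    return ZONE.get(name)

print(resolve("social.example", ["192.0.2.0/24"]))  # 192.0.2.10
print(resolve("social.example", []))                # None
```

In the second call the DNS records themselves are unchanged; the lookup fails only because the prefix containing the nameserver is no longer announced.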