ThousandEyes, which tracks internet and cloud traffic, is providing Network World with weekly updates on the performance of three categories of service provider: ISPs, cloud providers and UCaaS providers.
As COVID-19 continues to spread, forcing employees to work from home, ISPs, cloud providers and conferencing services, also known as unified communications as a service (UCaaS) providers, are experiencing increased traffic.
ThousandEyes is monitoring how these increases affect outages and the performance of these providers. It will provide Network World with a roundup of the week’s interesting events in the delivery of these services, and Network World will summarize them here. Stop back next week for another update.
Update April 13
During the week of April 6-12, service outages for ISPs, cloud providers and conferencing services dropped overall. They went from 298 to 177 globally (40%, a six-week low), and in the U.S. dropped from 129 to 72 (44%).
Globally, ISP outages were down from 229 to 141 (38%), and in the U.S. were down from 100 to 56 (44%).
Cloud provider outages were also down overall, from 25 to 19 (24%), ThousandEyes says, but jumped from one to six (500%) in the U.S., the highest rate of increase in seven weeks. Even so, the U.S. total was relatively low. “Again, cloud providers are doing quite well,” ThousandEyes says.
Conferencing services recovered from a spike the week before, and all nine of the week’s outages were in the U.S. Globally, outages dropped from 29 to nine (69%), and in the U.S. from 25 to nine (64%).
Update April 6
Outages for ISPs globally were down 9.13% during the week of March 30 from the week before, dropping from 252 to 229, while U.S. outages were down 16.7%, dropping from 120 to 100. Public cloud outages rose worldwide from 22 to 25, and in the U.S. there was one outage, up from zero the previous week.
Outages for collaboration apps rose dramatically, increasing more than 260% globally and more than 500% in the U.S. over the week before. In actual numbers, outages rose from eight to 29 worldwide and from four to 25 in the U.S.
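The week-over-week percentages quoted in these updates follow directly from the raw outage counts. As a minimal sketch (the function name `percent_change` and the dictionary layout are illustrative choices, not anything ThousandEyes publishes), the April 6 figures can be reproduced like this:

```python
def percent_change(previous: int, current: int) -> float:
    """Return the change from previous to current as a percentage of previous."""
    if previous == 0:
        # A rise from zero has no defined percentage; report raw counts instead.
        raise ValueError("percent change is undefined when the previous count is zero")
    return (current - previous) / previous * 100


# Outage counts quoted in the April 6 update: (previous week, current week).
counts = {
    "ISP, global": (252, 229),
    "ISP, U.S.": (120, 100),
    "Collaboration apps, global": (8, 29),
    "Collaboration apps, U.S.": (4, 25),
}

for label, (previous, current) in counts.items():
    change = percent_change(previous, current)
    print(f"{label}: {previous} -> {current} ({change:+.1f}%)")
```

The zero check is why the U.S. public cloud figure for that week (one outage, up from zero) is reported as a raw count rather than as a percentage.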
ISP Cogent Communications suffered what ThousandEyes called a significant outage April 1 from 12:30 p.m. to 12:35 p.m. Pacific time that affected the ability of users to connect to sites and services such as Office 365. Because Cogent peers with other providers, the customers of those providers might have experienced disruption to some services as well.
Yelp and some applications and sites hosted by AWS and Cloudflare were unreachable between 12:35 and 12:40 p.m. Pacific time on April 1 when Russian ISP Rostelecom leaked illegitimate IP address prefixes to its ISP peers, including Level 3. Such leaks lead to incorrect or less-than-optimal routing, according to ThousandEyes.
In this case, the leak improperly inserted Rostelecom into the network path between users and the affected providers. Level 3 propagated those improperly advertised routes to its peers, setting off a chain of events that led to massive traffic drops during the outage time.
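How a leaked route can pull traffic onto an unintended path can be sketched in a few lines. The example below rests on an assumption the ThousandEyes account does not confirm: that the leaked announcement is more specific than the legitimate one, so routers performing longest-prefix matching prefer it. The prefixes and AS numbers are reserved documentation values, not the networks involved on April 1, and `best_route` is a hypothetical helper.

```python
import ipaddress


def best_route(table, destination):
    """Return (prefix, as_path) for the most specific prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return best, table[best]


# Hypothetical routing table: prefix -> AS path toward the origin (AS64497).
table = {
    ipaddress.ip_network("203.0.113.0/24"): ["AS64496", "AS64497"],
}
before = best_route(table, "203.0.113.10")

# Assumed leak: a more specific /25 whose path runs through the leaking
# network (AS64511), which a peer then accepts and propagates.
table[ipaddress.ip_network("203.0.113.0/25")] = ["AS64496", "AS64511", "AS64497"]
after = best_route(table, "203.0.113.10")

print("before leak:", before[0], "via", " -> ".join(before[1]))
print("after leak: ", after[0], "via", " -> ".join(after[1]))
```

Real BGP route selection weighs more attributes than prefix length, so this shows only one way a leak can insert a network into the path.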
Update March 31
Looking at data over the past six weeks, ThousandEyes finds that the combined worldwide service outages among ISPs, public cloud providers, conferencing services and edge networks (content-delivery networks, DNS, and security as a service) have risen 42%.
Cloud-provider performance hasn’t been affected much at all, and in fact multiple weeks last year had a much higher number of outages.
Week of March 23
Between the week of March 16 and March 23, the outages suffered by ISPs worldwide went down from 230 to 203, nearly 12% lower. In the U.S., the number of outages rose from 100 to 107, up 7%.
Public cloud outages were down both worldwide and in the U.S. Worldwide, they dropped from 21 to 15 (down 28%), and in the U.S. dropped from six to zero. There was a service disruption to Google traffic due to a router failure in Atlanta, but it did not meet ThousandEyes’ definition of an outage, and it wasn’t related to COVID-19.
Collaboration applications also showed a decline in outages from the week before, dropping from 15 to six worldwide, and down from seven to three in the U.S., reductions of 60% and 57%, respectively.
ThousandEyes highlighted what it considered significant outages:
- “Cogent Communications suffered yet another significant outage this week — its fifth major outage this month. The outage occurred within parts of Cogent’s network in Northern California and Oregon and impacted users connecting to sites and services in those regions, including projectbaseline.com, the website of Verily’s much-publicized COVID-19 testing program.”
- “For approximately 20 minutes on March 25th, ThousandEyes observed that some users located on the East Coast may not have been able to reach Google services due to 100% traffic loss. A short time later, Google’s SVP of Engineering tweeted that the incident was due to a router failure in Atlanta, Georgia. US users outside of the Northeast were also impacted intermittently, although they would have experienced the incident as site errors when trying to reach some Google sites, such as google.com. The HTTP server errors seen during this period are consistent with an inability to reach the backend systems necessary to correctly load various services. Any traffic traversing the affected region — connecting from Google’s front-end servers to backend services — may have been impacted and seen the resulting server errors.”
With the increased use of remote-access VPNs, major carriers are reporting dramatic increases in their network traffic – with Verizon reporting a 20% week-over-week increase, and Vodafone reporting an increase of 50%.
While there has been no corresponding spike in outages in service provider networks, over the past six weeks there has been a steady increase in outages across multiple provider types both worldwide and in the U.S., all according to ThousandEyes, which keeps track of internet and cloud traffic.
This includes “a concerning upward trajectory” since the beginning of March of ISP outages worldwide that coincides with the spread of COVID-19, according to a ThousandEyes blog by Angelique Medina, the company’s director of product marketing. ISP outages worldwide hovered around 150 per week between Feb. 10 and March 19, but then increased to between just under 200 and about 225 during the following three weeks.
In the U.S., those numbers were a little over 50 in the first time range, reaching about 100 during the first week of March. “That early March level has been mostly sustained over the last couple of weeks,” Medina writes.
Cogent Communications was one ISP with nearly identical large-scale outages on March 11 and March 18, with “disruptions for the fairly lengthy period (by Internet standards) of 30 minutes,” she wrote.
Hurricane Electric suffered an outage March 20 that was less extensive and shorter than Cogent’s but included smaller disruptions that altogether affected hundreds of sites and services, she wrote.
Public-cloud provider networks have withstood the effects of COVID-19 well, with slight increases in the number of outages in the U.S., but otherwise relatively level around the world. The possible reason: “Major public cloud providers, such as AWS, Microsoft Azure, and Google Cloud, have built massive global networks that are incredibly well-equipped to handle traffic surges,” Medina wrote. And when these networks do have major outages it’s due to routing or infrastructure state changes, not traffic congestion.
Some providers of collaboration applications, the likes of Zoom, Webex, Microsoft Teams and RingCentral, also experienced performance problems between March 9 and March 20. ThousandEyes doesn’t name them, but does list performance numbers for what it describes as “the top three” UCaaS providers. One actually showed improvements in availability, latency, packet loss and jitter. The other two “showed minimal (in the grand scheme of things) degradations on all fronts — not surprising given the unprecedented strain they’ve been under,” according to the blog.
Each provider showed spikes in traffic loss over the time period that ranged from less than 1% to more than 4% in one case. In the case of one provider, “outages within its own network spiked last week, meaning that the network issues impacting users were taking place on infrastructure managed by the provider versus an external ISP.”
“Outage incidents within large UCaaS provider networks are fairly infrequent; however, the recent massive surge in usage is clearly stressing current design limits. Capacity is reportedly being added across the board to meet new service demands,” according to the blog.
Meanwhile, ThousandEyes has introduced a new feature on its site, a Global Internet Outages Map that is updated every few minutes. It shows recent and ongoing outages.
Google outage unrelated to COVID-19
On March 26 Google suffered a 20-minute outage on the East Coast of the U.S., apparently from a router failure in Atlanta, ThousandEyes said, agreeing with a statement put out by Google to that effect.
That problem affected other regions of the U.S., as evidenced by Google sites such as google.com intermittently returning server errors. “These 500 server errors are consistent with an inability to reach the backend systems necessary to correctly load various services,” ThousandEyes said in a statement. “Any traffic traversing the affected region — connecting from Google’s front-end servers to backend services — may have been impacted and seen the resulting server errors.”
ThousandEyes posted interactive results of tests it ran about the outage.