Tech leaders told students and faculty in a town hall that much of the fault lay in equipment failures, including cabling issues and MAC flap incidents that flooded the network with unchecked waves of traffic, resulting in a sequence of crashes. IT staff told the room there was also a configuration issue at one of the school’s central servers, as well as at least one more issue that the school was still struggling to pinpoint almost a week later.
The outage is frustrating on its own, but it becomes even more of a pain point when you realize that IT was not only blind to the possibility of a hardware failure on this scale, but also unaware of how vulnerable its infrastructure was to single points of failure.
New and old networking tech requires vigilant monitoring
In colleges across the country, campus IT teams have followed enterprises in retiring the hardware-based network architectures they historically relied on to keep users online, replacing them with configurations built for the cloud era.
As campus life depends more and more on applications that require connectivity, the cost of supporting all of these tools, and the networks they run over, becomes prohibitive when the campus buys, manages, and monitors all of its own networking hardware. Moreover, most legacy network hardware simply isn’t built to handle the multi-gigabit capacity increases that campus networks now demand, which is why many schools have moved away from hardware-dense networks in favor of cloud-delivered solutions.
With digital initiatives apparently outpacing the evolution of Amherst College’s campus network, however, it’s no wonder that a “perfect storm” of traffic was able to take down a central server and set off a domino effect of outages.
But what’s most glaring is that, outdated hardware aside, campus IT doesn’t appear to have been using any holistic, end-to-end performance monitoring that might have raised the alarm before the failures took Amherst’s network dark.
Seeing network activity hop-by-hop, across campus and the public Internet
Even for networks that haven’t adopted significant cloud architectures, network performance monitoring and diagnostics are critical. Solutions that view the network from the end user’s perspective can proactively alert IT to potential problems, instead of leaving network teams to trust that their hardware will keep performing because of its “good track record.”
Here’s why network performance monitoring is so important
Even after a network goes down, teams should be able to deploy a monitoring solution that traces network paths “hop-by-hop,” both inside and outside the LAN, to identify where things have gone dark. Even if IT can’t resolve the problem immediately, this tells the team where the fault lies and where to focus its efforts.
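To make the hop-by-hop idea concrete, here is a minimal Python sketch that assumes a monitoring tool has already probed the path and returned one record per hop. The `HopResult` record format and the `locate_dark_segment` helper are hypothetical, not any vendor’s API, and the addresses come from private and documentation ranges. The sketch treats only trailing silence as the break, because a single quiet hop followed by responsive ones is usually a router that ignores probes rather than a failure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HopResult:
    """One hop of a traceroute-style probe (hypothetical record format)."""
    ttl: int                 # hop number along the path
    address: Optional[str]   # address of the responding router, None if silent
    rtt_ms: Optional[float]  # round-trip time in milliseconds, None if silent


def locate_dark_segment(hops: Sequence[HopResult]) -> Optional[tuple[Optional[str], int]]:
    """Find where a probed path goes dark.

    Returns (last responsive address, TTL of the first silent hop after it),
    or None if the final probed hop answered. Only trailing silence counts:
    a quiet hop followed by responsive ones is usually a router that ignores
    probes, not a break in the path.
    """
    last_ok_index = -1
    for i, hop in enumerate(hops):
        if hop.address is not None:
            last_ok_index = i
    if last_ok_index == len(hops) - 1:
        return None  # empty path, or the last probed hop responded
    last_address = hops[last_ok_index].address if last_ok_index >= 0 else None
    return last_address, hops[last_ok_index + 1].ttl


# Hypothetical probe results for one campus-to-cloud path.
path = [
    HopResult(1, "10.0.0.1", 0.8),      # campus access switch
    HopResult(2, None, None),           # quiet router; later hops still answer
    HopResult(3, "10.0.255.1", 2.1),    # campus core
    HopResult(4, "203.0.113.9", 9.7),   # ISP edge
    HopResult(5, None, None),
    HopResult(6, None, None),
]
print(locate_dark_segment(path))  # ('203.0.113.9', 5)
```

Here the answer points past the campus core to the ISP side of the path. That is exactly the kind of hint that tells IT whose network to look at first, even before the root cause is known.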
But it’s not just a matter of using any network performance monitoring to give IT some assurance when things go dark. When so much rides on the network, from dorm access to meal plans, the more granular the data teams can gather (for example, every hop an application’s traffic takes, not only between local network endpoints but also across ISP networks when it crosses the public Internet), the better they can keep students and staff happy.
As a result of the incident, Amherst has committed to retiring its hardware-centric systems in favor of cloud-delivered environments. Before, during, and after any network overhaul of this scale, comprehensive performance monitoring needs to be running so that teams can:
a. Baseline performance before retiring hardware, so they know, at a minimum, what to expect from the new configuration and can justify the change
b. Stay ahead of any hiccups during the overhaul that could hurt end users and make IT decision makers wary of similar upgrades in the future
c. Ensure that every stakeholder in the overhaul, from the ISPs delivering bandwidth to the providers of third-party network management systems, holds up its end of the deal by meeting its service level agreements (SLAs), and provide ongoing monitoring for the users on the network.
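As a rough illustration of points a and c, the Python sketch below assumes latency samples in milliseconds collected before and after a migration. It builds a simple baseline and flags a later period that either regresses past that baseline or breaches an SLA target. The names `build_baseline` and `check_period`, the 20% tolerance, the 30 ms SLA figure, and the sample values are all illustrative assumptions, not figures from Amherst or any provider.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


@dataclass
class Baseline:
    median_ms: float
    p95_ms: float


def build_baseline(samples: Sequence[float]) -> Baseline:
    """Summarize pre-migration latency samples (ms) as a baseline."""
    return Baseline(statistics.median(samples), percentile(samples, 95))


def check_period(samples: Sequence[float], baseline: Baseline,
                 sla_p95_ms: float, tolerance: float = 1.2) -> List[str]:
    """Compare a later period against the baseline and an SLA target.

    Returns human-readable findings; an empty list means nothing was flagged.
    """
    findings = []
    p95 = percentile(samples, 95)
    if p95 > baseline.p95_ms * tolerance:
        findings.append(f"p95 latency {p95:.1f} ms exceeds baseline "
                        f"{baseline.p95_ms:.1f} ms by more than {tolerance - 1:.0%}")
    if p95 > sla_p95_ms:
        findings.append(f"p95 latency {p95:.1f} ms breaches SLA target of {sla_p95_ms:.1f} ms")
    return findings


# Illustrative samples only, not measurements from any real network.
before = [12, 13, 12, 14, 13, 12, 15, 13, 14, 13]
healthy = [12, 13, 13, 14, 12, 13, 14, 13, 12, 13]
degraded = [13, 14, 13, 40, 14, 13, 45, 14, 13, 14]

baseline = build_baseline(before)
print(check_period(healthy, baseline, sla_p95_ms=30))  # []
for finding in check_period(degraded, baseline, sla_p95_ms=30):
    print(finding)  # reports both a baseline regression and an SLA breach
```

A real deployment would use far more samples and keep separate baselines per path or per application. The nearest-rank percentile is used here only because it is easy to check by hand.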
More connected schools and colleges will only help students hone their skills for a future (and, really, a present) where connectivity is a critical part of everything they do. Upgrading university networks for that future is essential, but doing it carefully will help prevent major outages and open up more opportunities for learning.