In September 2018, a data center in Vietnam suffered a severe outage that lasted nearly 2.5 hours. The event bewildered internet users and raised considerable doubt about how data centers are operated. As more workloads move to the cloud and demand increases, it would not be surprising for cloud customers to experience more outages. Yet this has not proved to be the case, at least not directly. According to the Uptime Institute’s 2022 annual outage analysis, public cloud outages over the past three years have occurred at about the same rate as they have historically. Even so, in Uptime’s 2022 Data Center Resiliency Survey, 80% of data center managers and operators said they had experienced an outage within the past three years.
Causes of Outages Have Not Changed Much
Analyzing human error, with a view to preventing it, has always been challenging for data center operators. The cause of a failure can lie in how well a process was taught, in how tired, well trained, or well resourced the staff are, or in whether the equipment itself was unnecessarily difficult to operate.
Networking- and connectivity-related issues represent another leading cause of significant outages. These outages occur primarily because IT architectures and application topologies are becoming more complex by the day. Organizations today increasingly rely on a mix of on-premises hardware and services, multiple cloud providers, and third-party APIs running in containers and virtual machines.
In such environments, every technology involved can function as designed and yet, once combined into a single system, create the conditions that lead to an outage.
Increase in Recovery Time & Cost
According to Miles, increasing recovery times, like human error, also affect the cloud reliability equation.
Agile programming methodologies, DevOps, and automated continuous integration and deployment (CI/CD) pipelines all push updates from development into production more quickly. This can create situations where IT administrators (who are tasked with the mundane business of keeping the lights on) and developers don’t talk to one another about important production-related issues. Without that communication, IT is not up to date on all of the changes taking place in the production environments it is charged with maintaining. When outages happen, it can take IT much longer to identify the root causes and decide what to do about them.
Understanding these customer concerns, USDC Technology pays meticulous attention to every phase of data center construction and operation. Our consultancy services offer an integrated design-build-operate process. Our accredited team verifies and validates that the design is consistent and meets the requirements of standards such as Uptime Tier, ANSI/TIA-942, and TCVN9250, which reduces unnecessary iterations in reaching a design those standards accept. We also help you own a standardized facility with optimized construction and operating costs.