Data centers: strategic choices in preparation for digital and economic resilience
The COVID-19 health crisis has demonstrated, if proof were needed, how much the continued operation of data centers, and of IT infrastructure in general, underpins national stability. Which best practices should data centers implement to support the digital and economic resilience of companies and of society as a whole?
Continuity of digital services: the primary requirements remain the same
The stability of a business or public service depends in large part on the efficient operation of its data systems. This growing dependence led France to adopt the essential service operator (ESO) status[1] as an extension of the Operator of Vital Importance (OIV) status. An ESO is an organization that provides an essential service, relies on data systems to do so, and whose interruption of service would have a significant impact on the functioning of the economy or business.
The COVID-19 health crisis has further highlighted this dependence: telecommunications operators and ISPs, hospital data systems, the continued operation of institutions, widespread remote working, distance learning, and medical teleconsultations all require critical availability. The lockdown policies applied worldwide have resulted in a significant increase in Internet traffic, of up to 70% in some European countries according to KPMG[2]. The infrastructures underpinning these services have withstood the load so far because they were designed to be resilient.
A design that guarantees service stability
Acting as a container for digital infrastructures, the data center plays an essential role in application availability. Its architecture is defined in accordance with the service continuity objectives set out in the Uptime Institute reference framework (Tier I to Tier IV, with availability increasing from 99.671% to 99.995% uptime). Furthermore, a data center operator may implement site replication (“dual sites”) for immediate service resumption in the event of a total power outage.
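To make these percentages more concrete, they can be converted into the downtime they allow over a year. The short Python sketch below does only that arithmetic for the two endpoints quoted above; the function name and the 365-day year are illustrative assumptions, not part of the Uptime Institute framework.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, implied by an availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# Only the two endpoints quoted in the text are used here.
tier_availability = {"Tier I": 99.671, "Tier IV": 99.995}

for tier, pct in tier_availability.items():
    hours = annual_downtime_hours(pct)
    print(f"{tier}: {pct}% uptime -> about {hours:.1f} h "
          f"({hours * 60:.0f} min) of downtime per year")
```

Under these assumptions, 99.671% corresponds to roughly 28.8 hours of downtime per year, while 99.995% corresponds to about 26 minutes.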
In response to surging demand for certain services, as is currently the case for collaborative work applications, service providers may rely on cloud-based technologies, which can supply additional data processing and storage resources with a high level of flexibility and within a very short lead time. By decentralizing traffic (edge data centers), the territorial coverage strategy adopted by major digital service providers also helps make services available to all.
Connectivity strategy to maintain fluid traffic
Digital services owe their efficient delivery to the continuity of landline and cellular network service, which is achieved through the interconnection of the various national operators. In addition, investments in decentralizing CDN (Content Delivery Network) functions have helped improve the resilience of major high-bandwidth content platforms by reducing the distance between content and where it is consumed. At company level, access to multiple competing connectivity providers also allows data centers to spread risk and to guarantee traffic flow.
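The risk-spreading effect of using several connectivity providers can be illustrated with a simple probability calculation. The Python sketch below assumes that provider outages are statistically independent, which is optimistic in practice (shared ducts or upstream carriers create common failure points), and the 99% availability figure is purely hypothetical.

```python
from math import prod


def combined_availability(link_availabilities):
    """Probability that at least one link is up, assuming independent failures."""
    return 1 - prod(1 - a for a in link_availabilities)


# Hypothetical figure: each provider link is available 99% of the time.
single = combined_availability([0.99])
dual = combined_availability([0.99, 0.99])

print(f"One provider:  {single:.4%}")
print(f"Two providers: {dual:.4%}")
```

With these illustrative numbers, a second independent provider raises availability from 99% to 99.99%. The real gain depends on how independent the two paths actually are.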
During periods of high demand, IT departments may employ several prioritization strategies to ensure the resilience of critical applications, adapting the service delivery level to the criticality and type of each flow, in particular through the use of SD-WAN (Software-Defined Wide Area Network). The question of flow prioritization also applies to the Internet, notably with regard to the competing delivery demands of “professional” data flows (applications used for remote working, for example) and “recreational” data flows (video streaming, for example).
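The idea of serving flows according to their criticality can be sketched as a strict-priority queue. The Python example below is a simplified illustration only: the traffic classes, their ranks, and the `FlowScheduler` name are hypothetical, and it does not represent how any particular SD-WAN product classifies or routes traffic.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical traffic classes; a lower rank is served first.
PRIORITY = {"critical": 0, "professional": 1, "recreational": 2}


@dataclass(order=True)
class QueuedFlow:
    rank: int                       # priority rank of the traffic class
    seq: int                        # arrival order, breaks ties within a class
    name: str = field(compare=False)


class FlowScheduler:
    """Strict-priority scheduler: highest-priority flow first, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def submit(self, name: str, traffic_class: str) -> None:
        flow = QueuedFlow(PRIORITY[traffic_class], next(self._seq), name)
        heapq.heappush(self._heap, flow)

    def drain(self) -> list[str]:
        """Return the names of all queued flows in the order they would be served."""
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).name)
        return order


scheduler = FlowScheduler()
scheduler.submit("video streaming", "recreational")
scheduler.submit("video conference", "professional")
scheduler.submit("hospital records sync", "critical")
scheduler.submit("remote desktop", "professional")
print(scheduler.drain())
```

In this sketch the critical flow is served first, the two professional flows follow in arrival order, and the recreational flow comes last. Real deployments usually combine such classification with bandwidth guarantees so that low-priority traffic is slowed rather than starved.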
BCP and BRP: business life insurance
Within an organization, whether public or private, preparing for the unexpected is not limited to IT tools, even if, as we have seen, these are a major pillar of business stability. The company must take preparatory steps to manage any unexpected event that could prevent business activity from being carried out effectively, whether because production equipment is inaccessible or because teams are unable to complete their tasks.
In the current situation, business continuity plans (BCP) and business resumption plans (BRP), which are part of the everyday work of data center operators, take on their full meaning. What primarily differentiates the two is their time frame. Whereas the BCP responds to short-term risk with procedures that allow an organization's essential activities to be sustained, the purpose of the BRP is to resume all activity in the most orderly and expedient manner. The BRP is therefore more of a “long-term” solution, and is generally implemented following a sudden shutdown of an entire production system: a major fault on production equipment, an epidemic, etc.
Above all, the BCP and BRP must maintain or even increase the level of security and safety of central sites, which become all the more critical in a crisis. In concrete terms, this entails adopting new certification procedures, reducing physical contact when accessing sensitive sites, installing CCTV in buildings closed to the public for added peace of mind, or establishing a backup security team able to replace any operative who is absent, due to illness for example.
The BCP and BRP must be based on clear and structured written procedures if they are to be effective. In particular, these procedures must cover all subjects relating to the three pillars that make up any organization today: human resources, production tools and digital infrastructures. Only if this documentation exists and is relevant will an organization be able to tackle all types of unexpected event while sustaining the smallest possible losses.
[1] The EU Directive relating to the security of network and information systems defines the rules for identifying ESOs.
[2] https://www.forbes.com/sites/markbeech/2020/03/25/covid-19-pushes-up-internet-use-70-streaming-more-than-12-first-figures-reveal/