
Part of cyber resilience is considering what to do when the worst happens, and that worst-case scenario is, sadly, all but inevitable at some point. It will take the form of a significant incident, a disaster from which the school or college needs to recover, and planning for this should have produced a Disaster Recovery (DR) plan. But what should such a plan look like?
I have given this quite a bit of thought. Should a disaster recovery plan be a long and detailed document, or something much simpler and more digestible?
On one hand we might want the long document with all the detail: in the event of a disaster we will want as much information as possible, first to isolate and manage the incident and later to recover from it. The issue is that when a fire has been lit under the IT Services team by an incident, the last thing anyone wants to do is wade through a long and complex document. I have seen a disaster plan which included lots of Gantt charts with estimated timelines for different parts of the recovery, but how can we predict these with any accuracy against the multitude of potential scenarios? Additionally, the information you will actually need is likely to depend very much on the nature of the incident.
The flip side is a much more manageable document, one which is easier to digest and turn to in a high-stress crisis, but whose brevity means it will lack some of the detail you may want. That said, a shorter document is easier to rehearse and prepare with when running simulated and desktop incidents, so that staff remember the structure and are largely able to act without needing to refer too often to the supporting DR plan. It is also more likely to be applicable across a wider range of scenarios.
The above, however, suggests only two options, detail or brevity and ease of use, but my thinking on DR has led me to conclude that we need both. We need a brief incident plan which is general enough to fit almost every possible incident. It should set out how an incident is called and which roles will need to be filled, including contact details for the various people who might fill each role. It should cover the initial steps only: getting the incident team together so that they can respond to the specific nature of the incident in hand. It is the outline process for calling an incident and for its initial management.
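By way of illustration only, the sketch below shows the sort of skeleton such a brief plan might have, expressed as a small Python structure simply so it can be version-controlled and printed as a one-page reference. Every role, name, number and step here is a placeholder of my own, not a template from any standard, and your plan may well look quite different.

```python
# A minimal sketch of what a brief incident plan might capture: how an
# incident is called, who fills each role, and the first few steps only.
# All roles, names and contact details below are placeholders.

incident_plan = {
    "how_to_call_an_incident": (
        "Any member of IT Services or SLT can call an incident by phoning "
        "the incident lead; if unavailable, phone the deputy."
    ),
    "roles": {
        "incident_lead": {"primary": "A. Name, 01234 000001", "deputy": "B. Name, 01234 000002"},
        "technical_lead": {"primary": "C. Name, 01234 000003", "deputy": "D. Name, 01234 000004"},
        "communications": {"primary": "E. Name, 01234 000005", "deputy": "F. Name, 01234 000006"},
    },
    "initial_steps": [
        "Confirm the incident and note the time it was called.",
        "Assemble the incident team, in person or via an out-of-band channel.",
        "Agree initial isolation and containment actions for this specific incident.",
        "Agree the next check-in time and who is communicating with whom.",
    ],
}

if __name__ == "__main__":
    # Print the plan as a quick one-page reference.
    print(incident_plan["how_to_call_an_incident"])
    for role, contacts in incident_plan["roles"].items():
        print(f"{role}: {contacts['primary']} (deputy: {contacts['deputy']})")
    for i, step in enumerate(incident_plan["initial_steps"], 1):
        print(f"{i}. {step}")
```

The point is less the format and more the brevity: everything above fits on a single page that can be rehearsed until it is largely remembered.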
Then we need the reference information which will aid in the identification and management of, and eventual recovery from, an incident. Most of this should already exist in proper documentation of systems, setup and processes, but it is often missing. When things are busy the focus is on setting things up, deploying technology or fixing issues, and documenting activities, configurations and so on is put off for another day, a day which often never comes. I think the creation of this documentation may actually be key.
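As a rough sketch of what I mean by documentation, the record below shows the kind of detail worth capturing per system. The field names and the example values are my own suggestions, not a standard; the test is simply whether someone other than the person who built the system could begin a recovery from it.

```python
# A sketch of a per-system documentation record. Field names and example
# values are illustrative assumptions, not a recognised template.

from dataclasses import dataclass
from typing import List

@dataclass
class SystemRecord:
    name: str                  # what the system is (placeholder name below)
    owner: str                 # who in IT Services looks after it
    vendor_support: str        # support contact and contract/SLA reference
    depends_on: List[str]      # other systems or services it needs to run
    backup_location: str       # where backups live and how to reach them
    restore_steps: List[str]   # outline recovery steps, in order
    last_restore_test: str     # when a restore was last actually tested

example = SystemRecord(
    name="Example-MIS",        # hypothetical system
    owner="Network Manager",
    vendor_support="Vendor helpdesk, contract reference TBC",
    depends_on=["core switch", "domain controller", "internet access"],
    backup_location="Nightly backup to an off-site/cloud store",
    restore_steps=[
        "Provision a replacement server or VM.",
        "Restore the latest known-good backup.",
        "Re-point DNS and test staff and student access.",
    ],
    last_restore_test="Not yet tested",
)
```

If the "last restore test" field makes you wince, that is rather the point: the documentation and the testing go hand in hand.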
Conclusion
The specifics of a DR plan will vary with your context, so I don’t think there is a single solution. For me there are three key factors.
- Having a basic plan which is well understood in relation to calling an “incident” and the initial phases of managing it. This needs to be clear and accessible so that it is useful in a potentially high-stress situation.
- Having documentation of your systems and setup to aid recovery. This is often forgotten during setup or when changes are made; however, when responding to an incident, detailed documentation can be key.
- Testing your processes to build familiarity, to ensure they work as intended, and to adjust them as needed.
DR planning is critical because we increasingly need to treat an incident as inevitable; the better prepared we are, the greater our potential for minimising the impact of that incident on our school or college.


As we use more and more cloud services, internet access and the school’s internet provision become critically important. Given this, when looking at internet service provision, firewalls and core switches, the two main areas I would consider are doubling up where finances allow, or carefully examining the service level agreement (SLA) along with any penalties proposed where service levels are not met. For firewalls and core switches, cold spares with a lower specification may also be an option, minimising cost while still allowing for quick recovery in the event of an issue. When looking at providers’ SLAs and their support offering for when things go wrong, consider whether it is next-business-day on-site support or return to base, for example, and how long their anticipated recovery period is.
In the case of edge switches and Wi-Fi access points we are likely to have large numbers, especially on larger sites. I would suggest that heat mapping is key at the outset of a Wi-Fi deployment, to make sure Wi-Fi will work across the site. For resiliency when things go wrong, my view is an N+1 approach: establishing a spare, or a quantity of spares, based on the total number of units in use and the level of risk deemed acceptable. A high acceptance of risk means fewer spares, whereas a low acceptance of risk may lead to a greater number of spares.
Cables break too, and various small animals love to chew on them given half a chance, so spare cables are worth keeping as well.
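To put a rough number on that N+1 idea, the sketch below shows one way of turning fleet size and risk appetite into a spares count. The ratios are purely my own assumptions for illustration, not an industry standard, and the item counts are made up; adjust both to your own context and budget.

```python
# Illustrative N+1 spares calculation. The spare ratios are assumptions,
# not a standard: higher risk acceptance means fewer spares.

import math

SPARE_RATIOS = {"high": 0.02, "medium": 0.05, "low": 0.10}

def recommended_spares(units_in_use: int, risk_appetite: str) -> int:
    """Return a suggested spares count, always at least one (the '+1')."""
    ratio = SPARE_RATIOS[risk_appetite]
    return max(1, math.ceil(units_in_use * ratio))

if __name__ == "__main__":
    # Hypothetical fleet sizes for a larger site.
    for item, count in [("Wi-Fi access points", 120), ("edge switches", 30)]:
        for appetite in ("high", "medium", "low"):
            print(f"{item} ({count} in use), {appetite} risk acceptance: "
                  f"{recommended_spares(count, appetite)} spare(s)")
```

Whatever figures you settle on, the useful discipline is writing them down and revisiting them when the estate grows or the budget changes.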