How to Test Your Cybersecurity Incident Response Plan

Written by: Trevor Meers

You’re listening to best practices, and you care about protecting your organization’s operations. So you’ve written a solid incident response plan and shared it with your team.

But does it work in the real world? To quote the philosopher Mike Tyson, “Everyone has a plan until they get punched in the mouth.” And when a real data security incident occurs, the punches will be flying. So you need to regularly test your cybersecurity incident response plan, along with the humans and technology that will carry it out. In this blog, we’ll share the three most common testing approaches, from basic discussions to full simulations.

Choosing the Right Incident Response Plan Test

If you’re not motivated to do regular testing, others may provide the incentive you need. Major third-party compliance frameworks such as SOC 2 and PCI DSS, for example, require an annual test of your incident response plan, even though they rarely specify an exact testing approach. And your organization’s cybersecurity maturity and risk level may indicate that you need semi-annual or quarterly tests. You’ll get more practice and more frequent opportunities to spot parts of the plan that have gone out of date.

Your biggest customers may be pickier than the frameworks about the testing approach. At HBS, we increasingly see contracts requiring vendors to run annual incident response tests to prove they can deliver their services, even if they face a major cybersecurity incident. We’ve seen some companies recently tell potential vendors that their testing process isn’t rigorous enough. That left the vendor to decide whether the contract was valuable enough to justify the time and expense of running a thorough simulation every year to satisfy the customer.

Below are the three most common levels of incident response plan testing, arranged by order of escalating realism and, thus, ability to truly reveal your plan’s value.

Tabletop Exercise

This is the most basic and theoretical test of your incident response plan. You’ll gather all the key players in a conference room, throw out several breach scenarios and have everyone talk through their part of the response, as dictated by the plan.

You’ll get definite value from this approach. In most organizations, questions like the ones below tend to highlight some holes and generate some meaningful to-do items after the meeting:

“How long will it take to actually get all of our data restored from backup?” You may know that you have backups, but have you ever drilled down into what the restoration process actually looks like? Most teams leave a tabletop exercise with an urgent assignment to work out the details on this front.
“Where can I find this list of phone numbers for every employee?” It probably turns out that the list is saved somewhere on the company server—which may be exactly what is compromised in your scenario. Now where do you go to get the numbers?
“How long do we have to restore things before we violate the service level agreements in our key contracts?” There’s a big difference between having several days to restore operations and having 8 hours before you’re in breach of contract. Defining your exact window may change your plan.

“What does this step mean when it says _______?” If you get that question, it’s time to rewrite parts of the plan to ensure that it can’t be misunderstood, even by people working under extreme stress.

The tabletop exercise’s weakness, of course, is its theoretical nature. When someone answers a question confidently, do you trust that they’ve done the homework to prove that their answer is correct and capturing the full implications? HBS vCISO Jeff Franklin points out that, “Colonial Pipeline paid the ransom to get the decryption key, and it still took them days to decrypt their data.” You could get a false sense of security if your only testing comes from conversations in a meeting room with no pressure bearing down as your organization goes dark.

Walkthrough

In this test, your team actually steps through the plan as it’s designed, stopping short only at the point of actually flipping a switch to do something like turning on the building’s alternative power source. Typical walkthrough activities include:

Calling people listed in the plan to confirm that the phone numbers work, they answer in a timely fashion, etc.
Walking to different parts of the building to confirm that things will happen as the plan specifies. Can you, for example, walk into the IT department unannounced and find the team required by the incident response plan?
Sending test messages. If the plan calls for an all-company e-mail telling everyone to evacuate the building, you’ll send that e-mail to confirm that it works (with a bold note at the top of the message stating that it’s only a test).
Confirming real-world timelines. During the walk-through, the incident response team will walk into someone’s office and find out how long it would really take to accomplish their assignment if they had to start right now. Is that person even in the office anymore? If not, can you reach them promptly? Can they accomplish their assigned tasks remotely?

With all of these tests, remember to include the human resources aspects of the situation. If you plan to use a text alert to find out who is out of the building, do you expect people on vacation to answer that text? If your operations go down for a half day, do you send everyone home? Do they still get paid?

Cutover

In this test, you’ll pull out all the stops to truly find out how your team handles a crisis. In a cutover, an incident response leader might walk into the IT department and declare that the team needs to switch from on-prems servers to the cloud. Right now. Then you see what happens. Does everything actually come on through the alternate system? Does it take longer than expected?

Some HBS customers go so far as killing the power to their building to force the generator to kick in and see what happens. (Note that no one does a cutover of all systems simultaneously.)

One big eye-opener for most organizations comes when they try to restore data from backup. Many teams find that the restore takes far longer than they expected, that the team isn’t fully trained to perform the backup quickly or that your equipment can’t handle moving that much data in the required timeframe.

Clearly, a cutover test will create an operational interruption. If your plan allows eight hours to restore critical data, you could be down that long during a cutover test. For that reason, we’ve seen some clients turn down big contracts that required an annual cutover test. With that downtime and staff commitment factored in once a year, the contract’s profit margin wasn’t compelling enough.

Third-party Facilitators

No matter which test you choose, it’s wise to have outside observers/advisors present. If you believe that you’ve achieved a high level of cybersecurity maturity, you can probably manage the tests internally. But it’s still wise to have a third party watch, take notes and share feedback during the post-game analysis. An expert cybersecurity observer will always see issues that you don’t even know about.

“One side knows the business, and one side knows incident response planning,” says Franklin. “You want to marry those two to manage that responsibility.”

Even if you do hire a third-party advisor to run your tests, HBS recommends that someone from your team provide the hands-on leadership during the test. This gives your team critical experience that will benefit your organization for a long time.

If you need help identifying the right testing method for you, contact the HBS consulting team today.