Saturday, January 2, 2021

No, IT is not about data (alone)


In my first weeks working at an OT specialist company, I was often told that IT people do not understand OT, because OT is all about physical processes while IT is about data.

The comparison is flawed. Data matters in both OT and IT: readings must be protected, and their integrity is of great importance. I think there is a better comparison: if OT revolves around physical processes, its counterpart revolves around administrative processes. OT brews the beer; the counterpart sends the bill. Both are processes. 

The image below is a generalization of "a process". That could be brewing beer, or sending the bill. Input and output in OT are physical; in the counterpart they are data. Both need steering information, and both use automation.

There are differences in the impact of a disruption, though. Having to throw away a million liters of beer and brew it again because of a quality problem is quite different from sending a bill too late. 

You would expect the consequences on the OT side to weigh heavier than those on the side of the counterpart. Yet in OT, the one responsible for brewing the beer often handles the automation himself, while the one sending out the bill has set up an entire IT department.

Time for change?

Wednesday, November 11, 2020

Cyber security value


When you damage your car, you do not immediately think of a total loss. It is much more likely there is a dent that can easily be repaired; that chance is far greater than the chance of a total loss. Logical? Not when it comes to IT risks! There we traditionally work with maximum damage. But that can be done differently.

I wrote an article about it for Schade Magazine. (Dutch, Google translate)

Wednesday, October 21, 2020

Monday, October 5, 2020

We need Top Management awareness


Awareness training in cybersecurity is often the equivalent of telling road users to be careful. It is a good idea, but it has limited impact. On the road we have rules, such as keep right, give way to traffic coming from the right, and do not go faster than 50 km/h. Next to that, we have infrastructure to support safe usage: traffic lights, pedestrian crossings, and guardrails. And finally, we have legislation and enforcement by the police. Our society is aware of the fact that these measures are all necessary to support safe road usage. In cybersecurity we need this as well. I call it Top Management Awareness. 

The standard way of addressing cybersecurity in your company is to organize it in accordance with ISO 27001. This standard makes it very clear that top management is accountable for a properly functioning Information Security Management System. Top management should be aware of this accountability and of its implications. Just as for road users, top management needs to provide rules, infrastructure, enforcement, et cetera. This is not always the case. 

In my experience this lack of awareness is due to two reasons: 

  1. Optimism bias 
    The risk is perceived as lower than it actually is. Many security problems arise from technical flaws, which are hard to explain to people without an IT background. Misusing these technical imperfections is perceived to be so difficult that the chance someone will do so seems very low. On top of that, the idea that the company is an unlikely target persists. This leads to the notion that cybersecurity is exaggerated by the specialists and does not need much management attention. 

  2. Fatalistic thinking 
    Cybersecurity seems too hard to implement. ISO 27001 includes 114 controls. NIST SP 800-53, which is more detailed, contains 965 controls. Fatalistic thinking refers to the belief that a sufficient level of security is impossible to achieve. 

The quickest way of creating awareness at the top of an organisation is a cybersecurity disaster. It immediately removes the optimism bias. The meme that circulated on Twitter a few weeks ago sums it up very well: before the fact it is just a risk, perceived as very low; after the event there is tangible damage. Not only is the company damaged, personal reputations are too. When the press is knocking on the door, it is the CFO who needs to explain the situation, not the security manager. 

The smarter way is to perform a realistic quantitative risk analysis. This method makes the risk more tangible and more actionable, because the effect of adding extra controls can be made visible. Quantitative risk analysis gives management control over the level of risk they are willing to take and removes the ground for fatalistic thinking. Once top management understands the need to act and has the means to act, we have gained true top management awareness.

Tuesday, January 14, 2020

(re)Move the budget!


Are you expecting your IT department to work on a budget and keep things secure? Think again! But you could probably lift some weight off their shoulders, so they can do more with less.

A few weeks ago, I saw parallels between Expedition Robinson – a reality television program in which contestants are put into survival situations – and the cybersecurity industry. There was a challenge where contestants had to carry sand-filled bags around their necks and run a course. The last one to finish had to leave the game and choose another contestant to hand over his sandbags to. In the last race an unfortunate guy had to carry almost all of the bags, and he lost to a contestant who had to carry only a few. The weight was so heavy he could barely walk, let alone run.

In IT it’s often the CIO who has to carry all the bags. Departments demand more and more new services, but won’t allow old ones to be removed. The IT department is expected to keep up with the latest technology AND to keep the old systems (often called ‘legacy’) running. The weight becomes heavier and heavier. 

This is because legacy is harder to maintain. Older applications may need older operating systems, older databases and sometimes even older hardware to run on. Some of these components may run out of supplier support; at that point security patches are no longer available and extra mitigating measures must be put in place. You need expertise for every component, but since people want to move on with modern technology (they have a career too), that expertise becomes harder to find. 

To exploit a vulnerability, an attacker must have at least one applicable tool or technique that can connect to a system weakness. In this frame, vulnerability is also known as the attack surface or exposure. The more different systems are in use, the bigger the exposure, and the bigger the risk.

At the same time, departments usually only pay for the initial project implementing new systems, but do not have to pay the maintenance costs. Maintenance is part of the IT budget. Leaving ‘old stuff’ running comes at no cost to them. It is free shopping, while replacing the legacy will cost them. It is a kind of prisoner’s dilemma between the supplying IT department and the various consuming departments: they can have an overall better, more secure IT system if they cooperate and stop acting on individual interests.

Running legacy is not only insecure, it is also expensive. Decreasing the cost will improve security, but the opposite is also true: improving security will decrease the cost. Security can have a positive ROI. 

As a CISO you could leverage the cost of IT to improve overall security by making the actual costs of your high-risk legacy applications clear. Analyse the dependencies and take into account that every component needs to be maintained. Then visualize which department is responsible for these hidden costs. In my experience this will trigger an important discussion: why is the IT budget so unevenly spread? Why does IT carry all the load, while the departments carry none?


Wednesday, November 6, 2019

Clean up!


I recently visited the Risk and Resilience Festival at the University of Twente. One of the keynote speakers was Maarten van Aalst, director of the international Red Cross Red Crescent Climate Centre and professor at the University of Twente. He told a story about Bangladesh and how the frequent flooding led to increasing numbers of casualties. It was too simple, he said, to blame climate change. A much bigger factor was the growing population: the more people live in the Ganges Delta, the bigger the exposure. It reminded me of the fact that exposure is often something we can influence.

A common approach in IT risk management is to carry out a Threat and Vulnerability Assessment (TVA). In cyber security, a threat is a possible danger that might exploit a vulnerability to breach security and cause harm. A threat can be "intentional" (i.e. hacking: an individual or a criminal organization), "accidental" (e.g. the possibility of a computer malfunctioning, or of a natural disaster such as an earthquake, a fire, or a tornado), or otherwise a circumstance, capability, action, or event. A vulnerability is a weakness which can be exploited by a threat actor, such as an attacker, to perform unauthorized actions within a computer system.

Exposure

To exploit a vulnerability, an attacker must have at least one applicable tool or technique that can connect to a system weakness. In this frame, vulnerability is also known as the attack surface or exposure. However, in most TVAs exposure does not get much attention. When evaluated, exposure is usually estimated in a high-mid-low manner and then combined with ‘severity’ in a heatmap. The recommended way to minimize exposure is often to either patch a system to remove the vulnerability or move a vulnerable system into a separate network to hide it from an attacker.

While the increasing human casualties in Bangladesh are mainly due to the rising population, the increasing number of hacks is (also) due to the growing number of assets. Assets, in this context, can be hardware, applications, platforms and data. The more you have, the higher your exposure.

Although it is common to believe that ‘what you don’t own you don’t have to protect’, organizations do not seem to prioritize the inventory of assets. When performing a very basic assessment of the security status of an organization, I prefer to use the CIS security controls as a baseline. The CIS identifies 6 basic controls. The top three are:

  1. Inventory and Control of Hardware Assets
  2. Inventory and Control of Software Assets
  3. Continuous Vulnerability Management

The top two are about being aware of what you own, both hardware and software. In my experience, most organizations have no complete overview; in fact, it is often far from complete. Basic questions regarding purpose and function remain unanswered, the common reason being that organizations find it ‘impossible’ to keep track of so many different assets. If that is the case, they cannot manage the vulnerabilities in those assets either - and it is safe to assume there are many vulnerabilities present. A high number of vulnerabilities combined with a large exposure adds up to a big risk.

One way to mitigate this risk is to try and manage the vulnerabilities: organizations implement a vulnerability scanner. This automates the inventory of vulnerabilities, but fixing the detected vulnerabilities still requires manual effort. It is not uncommon to find multiple vulnerabilities on one server, and when scanning only a few hundred IP addresses you can easily end up with an excessively lengthy report. Moreover, the scanner cannot detect and report old or uncommon assets, and monitoring network traffic is difficult. Of course it is possible to sniff the network, but relying on artificial intelligence to discern genuine traffic from malicious traffic would be quite a naïve mistake.

It may not be feasible in Bangladesh to diminish the exposure by moving people out of the Ganges Delta, but in IT we can. If we want to decrease the risk, the best way is to remove as many assets as possible from the hypothetical risk scenario. This decreases the exposure and the effort needed to perform vulnerability management. The best security investment you can make might be to get rid of your legacy applications.

How to convince your environment?

I know from experience that removing legacy is not a sexy subject. Of course no one will really object to cleaning up - who wants to be against that? However, removing legacy usually requires replacing one or more old applications with a new, up-to-date one. This can be very costly and it may impact the way of working employees are used to. When faced with the choice between implementing a new application and a project to clean up legacy, the latter will come second.

One way to create some urgency is to make the risk visible. A breach in a legacy application may lead to severe damage. We recommend that organizations perform a quantitative risk analysis on the legacy systems and then calculate the risk. It is important to remember that this initiative will likely save significant capital in the long-term.

Monday, July 1, 2019

Quantitative Risk Analysis



IT security is essentially about reducing information-related risk to an acceptable level at an acceptable cost. For this reason, the process begins with an extensive risk assessment – a tried and tested process that can be improved. I am inspired by the work of Douglas Hubbard on this topic. Here’s why.

Traditional risk analysis starts by creating a list of things that could go wrong. We estimate the maximum impact as well as probability and we multiply these figures to “calculate” the risk.

In some cases, this works reasonably well. The (financial) risk of a laptop being stolen from a parked car can be estimated like this. When a company owns a few thousand laptops it will happen several times per year. After a few years the number becomes predictable. The value of a laptop is harder to estimate. During presentations, when I ask the value of a laptop I’m holding up, the response of the audience is usually in the range of the list price of the laptop. After I tell them it holds the only copy of a research project someone has been working on for six weeks, the estimates inevitably rise. The estimates might increase further once we know that the laptop has shareholder information or a customer database saved on its hard drive.

Not all laptops will contain this kind of valuable information. However, these variables are often not adequately addressed in traditional IT risk management. The standard risk calculation is as follows: the estimated chance a laptop is stolen is 2% (per year), the estimated share of laptops containing valuable information is 50%, so the probability of the risk is 2% x 50% = 1% (per year).
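The calculation is a simple product of the two estimates. A minimal sketch, using the example figures above:

```python
# The example estimates, per laptop per year
p_theft = 0.02      # chance a laptop is stolen
p_valuable = 0.50   # share of laptops holding valuable information

# Probability of the risk: both conditions must hold
p_risk = p_theft * p_valuable
print(f"{p_risk:.0%}")  # 1%
```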

Even for such a simple risk, the analysis is elusively complex. Risks with a very high impact and a low probability are even more elusive. What if your company is put on some blacklist and your cloud-based system, including all your data, becomes unavailable because the provider is no longer allowed to do business with you? Not a very likely scenario, I admit, but what if the provider is hacked or goes bankrupt? That scenario is perhaps more probable. Worst case you lose all your data, but again this is not very likely. In traditional risk analyses the risk will be calculated using the maximum impact:

risk = (almost zero) x (ridiculous amount of money) = nonsense.

Reality is not black and white. The impact of an unwanted event is not zero or maximum damage. Fire in a datacentre is usually very local, caused by a failing power supply, for example; a datacentre burning down to the ground is not very common. The damage in any incident will most likely be in the range of tens of thousands of euros, while the maximum damage runs into the millions. This method completely ignores the fact that in most cases the damage is probably a lot lower.

One may argue that the traditional methods are not ideal for absolute values, but that at least they offer a strategy for prioritizing risks.

In traditional risk analysis risks are categorized in levels of impact (Negligible, minor, moderate, …) and levels of likelihood (improbable, seldom, occasional, …). A matrix is created where all the identified risks are plotted. Usually it is color-coded to categorize high impact and high probability (red), low impact and low probability (green) and everything in between (gradients). It’s often called a heatmap.

Graph source: https://www.sketchbubble.com/en/presentation-risk-heatmap.html 
Hubbard gives a few examples in his book. For this example, the category "seldom" is defined as >1%-25% and the category "catastrophic" as >10 million, as in the heatmap example above. Let’s assume we identified two risks:

risk A: likelihood is 2%, impact is 10 million
risk B: likelihood is 20%, impact is 100 million

When we calculate the risk as likelihood x impact, risk B is 100 times risk A. They would, however, be plotted in the same cell of the matrix. In the same way, two risks that are very similar could end up in different cells.
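The mismatch is easy to verify by applying the expected-loss formula (likelihood times impact) to the two example risks:

```python
def expected_loss(likelihood: float, impact: float) -> float:
    """Expected annual loss: likelihood of the event times its impact."""
    return likelihood * impact

risk_a = expected_loss(0.02, 10_000_000)    # 200,000
risk_b = expected_loss(0.20, 100_000_000)   # 20,000,000
print(risk_b / risk_a)  # 100.0
```

Both risks land in the same "seldom"/"catastrophic" cell of the heatmap, yet their expected losses differ by a factor of 100.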

A way to address the fact that there is variance in the impact of any event is to assume that the probability of any amount of damage follows a normal distribution, as shown in the diagram below. The mean value has the highest probability; very low and very high values have a low probability. The bandwidth can vary: the mean value of the blue, red and yellow lines is zero in each case, but the probability that we are between -0.5 and 0.5 is much higher with the blue line than with the yellow line.

Graph source: https://en.wikipedia.org/wiki/Normal_distribution#/media/File:Normal_Distribution_PDF.svg
Applied to the stolen laptop example, the risk can now be defined as:

Risk = (2% chance per year a laptop is stolen) x 
(90% probability the impact is between 1000 and 25000 euro)

A giant data leak resulting in millions of damages may occur when a single laptop is stolen, but the probability of that is very low (less than 5%, probably much lower).
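This range-based definition lends itself to simulation. The sketch below is my own illustration, not from Hubbard's book: it fits a lognormal distribution to the 90% interval of 1,000-25,000 euro (a lognormal rather than the normal distribution shown above, since damage cannot be negative; a common choice in this style of analysis) and simulates annual theft losses for a hypothetical fleet of 2,000 laptops.

```python
import math
import random

# Estimates from the example: 2% chance per year a laptop is stolen,
# 90% probability the damage lies between 1,000 and 25,000 euro.
P_THEFT = 0.02
LOW, HIGH = 1_000.0, 25_000.0  # bounds of the 90% confidence interval

# Fit a lognormal to the 90% interval. 1.645 is the z-score that
# bounds the central 90% of a normal distribution.
Z90 = 1.645
mu = (math.log(LOW) + math.log(HIGH)) / 2
sigma = (math.log(HIGH) - math.log(LOW)) / (2 * Z90)

def simulate_year(n_laptops: int) -> float:
    """Total theft damage for one simulated year."""
    total = 0.0
    for _ in range(n_laptops):
        if random.random() < P_THEFT:
            total += random.lognormvariate(mu, sigma)
    return total

random.seed(42)
losses = sorted(simulate_year(2_000) for _ in range(1_000))
print(f"median annual loss: {losses[500]:,.0f} euro")
print(f"95th percentile   : {losses[950]:,.0f} euro")
```

Instead of a single "likelihood x maximum impact" number, the output is a distribution of outcomes, from which management can read off the loss level they are willing to accept.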

Note that we now define the risk as a chance, or probability, of damage instead of an absolute amount of money. In my opinion, this models the real world, where we can predict the overall course of the impact of events but not the impact of individual events. People who play the lottery will "win" about half of the money they pay for their tickets, but the chance of any one person winning the jackpot is minuscule.