Part III: AIOps 2.0 - The Evolution of Correlation
Static correlation is the process of investigating historical logs and events to analyze a failure or breach after an incident. By applying historical data to a statistical correlation model, we can analyze log data and identify complex patterns from past events. This can also help us discover threats that may have compromised the network’s security, or provide information about an ongoing attack (ManageEngine, 2021).
This type of correlation involves obtaining a batch of operational data from event and log sources and performing a statistical analysis or statistical learning task against the batch of data. This is typically done with a classical clustering algorithm such as K-Means or hierarchical clustering. As an output of the clustering algorithm, every event type is assigned to a static prototype group. These groups are analyzed by human operators, and groups that are determined to be understandable and actionable are deployed in production as static rules. For example, if event type 1 and event type 2 belong to the same prototype group, then every time they occur within a time window they are grouped together.
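The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the event-type names, the two-dimensional feature vectors (standing in for, e.g., co-occurrence statistics), and the tiny hand-rolled K-Means are all hypothetical.

```python
# Sketch: cluster event types into static prototype groups, then apply the
# resulting 'static rule' to events in a time window. All names and feature
# vectors below are illustrative assumptions.
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny K-Means: returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical feature vector per event type (e.g. co-occurrence statistics).
event_types = ["link_down", "bgp_flap", "disk_full", "io_error"]
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

labels = kmeans(features, k=2)
prototype_group = dict(zip(event_types, labels))

# The deployed static rule: events whose types share a prototype group are
# merged whenever they occur in the same time window.
def correlate(window_events):
    groups = {}
    for ev in window_events:
        groups.setdefault(prototype_group[ev], []).append(ev)
    return list(groups.values())

print(correlate(["link_down", "disk_full", "bgp_flap"]))
```

Note that once `prototype_group` is approved and deployed, nothing in the rule itself ever re-runs the clustering; that freezing of the mapping is exactly the static quality discussed in the cons below.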
Pros:
- Potential for significant compression based on an initial application of clustering across all the output prototypes
Cons:
- Once a cluster is ‘approved’ it becomes a static entity that will only change if a human operator decides to modify it, either based on tribal knowledge or further statistical analysis.
- Rules are built on top of these static clusters, e.g. notification rules, assignment rules, escalation rules, etc.
- Maintenance of the clusters and the upstream rules grows exponentially as the number of clusters grows.
- Any change in the underlying fabric of the network / services requires additional statistical analysis and merging new knowledge into the existing cluster prototype ‘library’.
- Can’t capture ‘fuzzy’ boundaries across clusters. An event type either belongs to a cluster or it doesn’t, but the reality is more subtle.
Level 2B. Static Correlation with some statistical enhancements (SCSE)
This type of correlation involves defining a set of code-books or recipes that specify a predetermined relationship between event types. Once the relationship is established, statistical learning techniques can be applied to reduce intra-group noise by filtering out events from the group if they do not meet a specified threshold of information gain or entropy. In other words, even if a particular event meets the code-book specification for membership, it may be filtered out when it occurs so often that it does not provide any useful additional information.
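One way to read "threshold of information gain or entropy" is as a Shannon surprisal cutoff: an event type whose empirical frequency makes it nearly certain carries almost no information, so it is dropped from the group. The sketch below assumes that interpretation; the event names, counts, and the 2-bit threshold are illustrative, not taken from any real code-book.

```python
# Hedged sketch of entropy-based filtering inside a code-book group: event
# types that fire so often that they carry little information (low Shannon
# surprisal, -log2 p) are dropped even though they match the code-book.
import math
from collections import Counter

def filter_by_information(group_events, history, min_bits=2.0):
    """Keep only events whose surprisal in the historical stream >= min_bits."""
    counts = Counter(history)
    total = sum(counts.values())
    kept = []
    for ev in group_events:
        p = counts[ev] / total            # empirical frequency of this type
        if -math.log2(p) >= min_bits:     # rare enough to be informative
            kept.append(ev)
    return kept

# Historical stream: 'heartbeat_ok' fires constantly, 'raid_degraded' is rare.
history = ["heartbeat_ok"] * 60 + ["fan_warning"] * 3 + ["raid_degraded"] * 1
group = ["heartbeat_ok", "fan_warning", "raid_degraded"]
print(filter_by_information(group, history))  # → ['fan_warning', 'raid_degraded']
```

Here `heartbeat_ok` matches the code-book but occurs in roughly 94% of the history (about 0.09 bits of surprisal), so it is filtered out, while the rarer events survive.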
Pros:
- Potential for significant compression once a large number of code-books are developed;
Cons:
- Code-books are typically defined using mostly tribal knowledge, with some statistical tools to provide backup or guideposts;
- Once a code-book is defined it becomes a static entity that will only change if a human operator decides to modify it. Rules are built on top of these statically generated clusters, e.g. notification rules, assignment rules, escalation rules, etc.;
- Maintenance of the code-books and the upstream rules grows exponentially as the number of code-books grows;
- Any change in the underlying fabric of the network / services requires additional analysis and merging new knowledge into the existing code-book ‘library’;
- Can’t capture ‘fuzzy’ boundaries across clusters. An event type either belongs to a cluster or it doesn’t, but the reality is more subtle;
- Applying statistical tests (entropy) to filter membership helps a little, but doesn’t allow for much variability in the dynamics of the network;
UPCOMING!
In the next part of AIOps 2.0, we’ll dig further into each of these levels and into Level 3: Dynamic AI-centric (DAC) event correlation