"Anomaly Detection: A Comprehensive Guide" by Jason Walsh

Table of Contents

  • Anomaly Detection: An Introduction
  • Graphical and Exploratory Analysis Techniques
  • Statistical Techniques in Anomaly Detection
  • Machine Learning Methods for Outlier Analysis
  • Application to AI/LLM Monitoring
  • Evaluating Performance in Anomaly Detection Techniques
  • Case Study 1: Anomalies in Energy Data
  • Case Study 2: Detecting Anomalies in Time Series Data
  • Research

Anomaly Detection: An Introduction

Anomaly detection, also known as outlier detection, is the identification of rare items, events, or observations that deviate significantly from the expected pattern in a dataset. In the context of AI and machine learning systems, anomaly detection has become increasingly critical for monitoring model behavior, detecting system failures, identifying security threats, and ensuring the reliability of agentic systems.

Anomaly detection has become a core operational concern for modern AI systems. As large language models (LLMs) and autonomous agents become more prevalent, detecting unusual behavior patterns helps prevent costly errors, identify model drift, and catch security vulnerabilities before they escalate. For instance, an AI agent making an unusually high number of API calls, generating responses with unexpected token lengths, or exhibiting sudden changes in decision patterns may indicate configuration errors, adversarial attacks, or system degradation.

Types of Anomalies

  • Point anomalies: Individual data points that deviate from normal patterns (e.g., a single API request taking 10 seconds when the average is 200 ms)
  • Contextual anomalies: Data points that are anomalous only in a specific context (e.g., high API usage during off-peak hours; see the sketch after this list)
  • Collective anomalies: Collections of data points that together represent anomalous behavior (e.g., a sequence of LLM responses gradually increasing in toxicity)
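
Contextual anomalies are the easiest to miss with a single global threshold. A minimal sketch of the idea (the hourly request counts below are synthetic): score each point only against observations that share its context, here the hour of day.

import numpy as np

# Synthetic hourly API request counts over 14 days: ~300 requests during
# business hours, ~40 overnight, plus one 3 AM spike of 250 requests.
hours = np.tile(np.arange(24), 14)
requests = np.where((hours >= 8) & (hours <= 20), 300.0, 40.0)
requests[3] = 250.0  # normal volume for daytime, anomalous at 3 AM

def detect_contextual_anomalies(values, contexts, threshold=3.0):
    """Flag points that are extreme relative to their own context group."""
    flags = np.zeros(len(values), dtype=bool)
    for c in np.unique(contexts):
        mask = contexts == c
        group = values[mask]
        sigma = group.std()
        if sigma > 0:
            flags[mask] = np.abs(group - group.mean()) / sigma > threshold
    return flags

# A global z-score misses index 3 (250 sits between 40 and 300), but
# conditioning on hour of day catches it.
anomalies = detect_contextual_anomalies(requests, hours)
print(f"Contextual anomalies at indices: {np.where(anomalies)[0]}")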

Graphical and Exploratory Analysis Techniques

Visual exploration remains one of the most effective first steps in anomaly detection. Techniques include:

  • Box plots and violin plots for identifying outliers in distributions
  • Scatter plots and pair plots for multivariate anomaly visualization
  • Time series plots for temporal anomalies
  • Heatmaps for correlation analysis and pattern recognition

For AI systems monitoring, dashboards displaying metrics like response latency, token consumption, error rates, and agent decision distributions provide immediate visual feedback on system health.
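
As a starting point, a minimal matplotlib sketch (the latency values below are synthetic) combining two of these views: a box plot of response latencies next to the same series over time, with the spikes highlighted.

import numpy as np
import matplotlib.pyplot as plt

# Synthetic response latencies (seconds) with two injected spikes.
rng = np.random.default_rng(42)
latencies = rng.normal(loc=0.25, scale=0.03, size=200)
latencies[[50, 140]] = [2.1, 3.4]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Box plot: the spikes appear as points beyond the whiskers.
ax1.boxplot(latencies)
ax1.set_ylabel("latency (s)")
ax1.set_title("Distribution view")

# Time series: the same spikes stand out against the baseline.
ax2.plot(latencies, lw=0.8)
spikes = latencies > 1.0  # simple visual threshold for highlighting
ax2.scatter(np.where(spikes)[0], latencies[spikes], color="red", zorder=3)
ax2.set_xlabel("request index")
ax2.set_title("Temporal view")

plt.tight_layout()
plt.show()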

Statistical Techniques in Anomaly Detection

Z-Score Method

The Z-score measures how many standard deviations a data point lies from the mean. Points with |Z| > 3 are typically considered anomalies. Note that for small samples the maximum attainable |Z| is bounded (at most √(n−1) with the population standard deviation that scipy.stats.zscore uses by default), so a lower threshold is appropriate.

import numpy as np
from scipy import stats

def detect_anomalies_zscore(data, threshold=3):
    """
    Detect anomalies using Z-score method.

    Args:
        data: Array of numerical values
        threshold: Number of standard deviations for anomaly threshold

    Returns:
        Boolean array indicating anomalies
    """
    z_scores = np.abs(stats.zscore(data))
    return z_scores > threshold

# Example: Detect anomalous LLM response times.
# With only 8 samples, |Z| can never exceed sqrt(7) ~ 2.65, so the default
# threshold of 3 would miss the spike; use a lower threshold here.
response_times = np.array([0.2, 0.25, 0.3, 0.22, 5.5, 0.28, 0.19, 0.31])
anomalies = detect_anomalies_zscore(response_times, threshold=2.5)
print(f"Anomalous response times at indices: {np.where(anomalies)[0]}")

Interquartile Range (IQR) Method

The IQR method is robust to outliers in the data itself and works well for skewed distributions.

def detect_anomalies_iqr(data, multiplier=1.5):
    """
    Detect anomalies using Interquartile Range method.

    Args:
        data: Array of numerical values
        multiplier: IQR multiplier (typically 1.5 for outliers, 3.0 for extreme outliers)

    Returns:
        Boolean array indicating anomalies
    """
    Q1 = np.percentile(data, 25)
    Q3 = np.percentile(data, 75)
    IQR = Q3 - Q1

    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR

    return (data < lower_bound) | (data > upper_bound)

# Example: Detect anomalous token usage
token_counts = np.array([450, 520, 480, 510, 3500, 490, 475, 530])
anomalies = detect_anomalies_iqr(token_counts)
print(f"Anomalous token usage: {token_counts[anomalies]}")

Isolation Forest

Isolation Forest isolates points by recursive random partitioning: anomalies take fewer splits to separate and thus receive shorter average path lengths. It is particularly effective for high-dimensional data and doesn't rely on explicit distance or density measures.

from sklearn.ensemble import IsolationForest

def detect_anomalies_isolation_forest(data, contamination=0.1):
    """
    Detect anomalies using Isolation Forest algorithm.

    Args:
        data: 2D array of features
        contamination: Expected proportion of anomalies in dataset

    Returns:
        Array of predictions (-1 for anomalies, 1 for normal)
    """
    clf = IsolationForest(contamination=contamination, random_state=42)
    predictions = clf.fit_predict(data)
    return predictions

# Example: Detect anomalous agent behavior patterns
agent_metrics = np.array([
    [0.2, 450, 0.95],  # [latency, tokens, success_rate]
    [0.3, 520, 0.93],
    [5.5, 3500, 0.45], # Anomaly: high latency, high tokens, low success
    [0.25, 480, 0.94],
    [0.28, 510, 0.96]
])
predictions = detect_anomalies_isolation_forest(agent_metrics, contamination=0.2)  # 1 of 5 rows is anomalous
print(f"Anomaly indices: {np.where(predictions == -1)[0]}")

Machine Learning Methods for Outlier Analysis

Autoencoders for Anomaly Detection

Neural network-based autoencoders learn to reconstruct normal patterns. High reconstruction error indicates anomalies.

import torch
import torch.nn as nn

class AnomalyAutoencoder(nn.Module):
    def __init__(self, input_dim, encoding_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 16),
            nn.ReLU(),
            nn.Linear(16, encoding_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(encoding_dim, 16),
            nn.ReLU(),
            nn.Linear(16, input_dim)
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

def detect_anomalies_autoencoder(model, data, threshold):
    """
    Detect anomalies based on reconstruction error.

    Args:
        model: Trained autoencoder model
        data: Input data tensor
        threshold: Reconstruction error threshold

    Returns:
        Boolean array indicating anomalies
    """
    model.eval()
    with torch.no_grad():
        reconstructed = model(data)
        mse = torch.mean((data - reconstructed) ** 2, dim=1)
    return mse.numpy() > threshold
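
The function above assumes a trained model and a chosen threshold. A minimal training sketch, using synthetic stand-in data (in practice, train on historical metrics known to be anomaly-free, standardized so that latency, token counts, and rates share a comparable scale):

# Synthetic normal-behavior metrics; replace with real, scaled history.
normal_data = torch.randn(1000, 4) * 0.1 + torch.tensor([0.3, 0.5, 0.5, 0.1])

model = AnomalyAutoencoder(input_dim=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Train to reconstruct normal behavior only.
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(normal_data), normal_data)
    loss.backward()
    optimizer.step()

# Pick the threshold from a high percentile of training reconstruction error.
model.eval()
with torch.no_grad():
    errors = torch.mean((normal_data - model(normal_data)) ** 2, dim=1)
threshold = torch.quantile(errors, 0.95).item()

anomalies = detect_anomalies_autoencoder(model, normal_data, threshold)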

One-Class SVM

One-Class SVM learns a decision boundary around the normal data; points falling outside it are flagged as anomalies.

from sklearn.svm import OneClassSVM

def detect_anomalies_ocsvm(data, nu=0.1):
    """
    Detect anomalies using One-Class SVM.

    Args:
        data: 2D array of features
        nu: Upper bound on fraction of training errors

    Returns:
        Array of predictions (-1 for anomalies, 1 for normal)
    """
    clf = OneClassSVM(nu=nu, kernel='rbf', gamma='auto')
    predictions = clf.fit_predict(data)
    return predictions
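
A brief usage sketch, reusing the agent_metrics array from the Isolation Forest example. SVMs are sensitive to feature scale, so standardizing first is usually advisable; with only five points the result is sensitive to nu, but the distant row should be flagged.

from sklearn.preprocessing import StandardScaler

scaled = StandardScaler().fit_transform(agent_metrics)
predictions = detect_anomalies_ocsvm(scaled, nu=0.2)
print(f"Anomaly indices: {np.where(predictions == -1)[0]}")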

Application to AI/LLM Monitoring

For monitoring agentic systems and LLM deployments, combine multiple techniques:

class LLMMonitor:
    """Monitor LLM and agent behavior for anomalies."""

    def __init__(self):
        self.isolation_forest = IsolationForest(contamination=0.05)
        self.fitted = False

    def fit(self, historical_metrics):
        """
        Train on historical normal behavior.

        Args:
            historical_metrics: Array of shape (n_samples, n_features)
                               Features: [latency, tokens, cost, error_rate, ...]
        """
        self.isolation_forest.fit(historical_metrics)
        self.fitted = True

    def detect_anomalies(self, current_metrics):
        """
        Detect anomalies in current system behavior.

        Returns:
            dict with anomaly scores and specific alerts
        """
        if not self.fitted:
            raise ValueError("Monitor must be fitted first")

        predictions = self.isolation_forest.predict(current_metrics)
        scores = self.isolation_forest.score_samples(current_metrics)

        alerts = []
        for i, (pred, score) in enumerate(zip(predictions, scores)):
            if pred == -1:
                metric = current_metrics[i]
                if metric[2] > 10.0:  # cost threshold
                    alerts.append(f"Cost anomaly: ${metric[2]:.2f}")
                if metric[0] > 5.0:   # latency threshold
                    alerts.append(f"Latency anomaly: {metric[0]:.2f}s")
                if metric[3] > 0.1:   # error rate threshold
                    alerts.append(f"Error rate anomaly: {metric[3]:.1%}")

        return {
            'anomaly_detected': bool((predictions == -1).any()),
            'anomaly_scores': scores,
            'alerts': alerts
        }

# Example usage
monitor = LLMMonitor()
historical_data = np.random.normal(loc=[0.3, 500, 0.5, 0.01],
                                   scale=[0.1, 50, 0.1, 0.005],
                                   size=(1000, 4))
monitor.fit(historical_data)

# Check current metrics
current = np.array([[0.35, 520, 0.55, 0.012],  # Normal
                    [8.2, 4500, 15.3, 0.25]])   # Anomaly
result = monitor.detect_anomalies(current)
print(f"Alerts: {result['alerts']}")

Evaluating Performance in Anomaly Detection Techniques

Performance evaluation for anomaly detection requires special care because anomalies are rare: with extreme class imbalance, raw accuracy is misleading. The standard metrics are listed below, with a short scikit-learn sketch after the lists:

  • Precision: Of detected anomalies, how many are true anomalies?
  • Recall: Of all true anomalies, how many were detected?
  • F1-Score: Harmonic mean of precision and recall
  • ROC-AUC: Area under the receiver operating characteristic curve

For AI systems, additional metrics include:

  • Time to detection (latency between anomaly occurrence and alert)
  • False positive rate (critical for avoiding alert fatigue)
  • Cost of missed anomalies (business impact)
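
Given labeled evaluation data, the first four metrics follow directly from scikit-learn. A minimal sketch with made-up labels and scores: note that detectors like Isolation Forest emit -1/1 predictions and scores where lower means more anomalous, so the predictions are remapped and the score sign flipped.

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Hypothetical ground truth (1 = anomaly) and detector output.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0])
raw_pred = np.array([1, 1, -1, 1, -1, 1, -1, 1])  # -1 = anomaly, sklearn style
scores = np.array([0.1, 0.2, -0.4, 0.15, -0.3, 0.12, -0.05, 0.18])  # lower = more anomalous

y_pred = (raw_pred == -1).astype(int)  # remap to 1 = anomaly

print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1:        {f1_score(y_true, y_pred):.2f}")
# Flip the sign so that higher score = more anomalous before computing AUC.
print(f"ROC-AUC:   {roc_auc_score(y_true, -scores):.2f}")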

Case Study 1: Anomalies in Energy Data

Case Study 2: Detecting Anomalies in Time Series Data

Research

Author: Jason Walsh

j@wal.sh

Last Updated: 2025-12-22 12:54:05
