Data Mining for Intrusion Detection: A Critical Review

Klaus Julisch. From: Applications of Data Mining in Computer Security (Eds. D. Barbará and S. Jajodia)

Knowledge Discovery in Databases (KDD)
• Five steps:
• (1) Understanding the application domain
• (2) Data integration and selection
• (3) Data mining
• (4) Pattern evaluation
• (5) Knowledge representation

Data Mining Meets Intrusion Detection
• IDS: Misuse detection and anomaly detection
• Misuse detection: Requires a collection of known attacks
• Anomaly detection: Requires a profile of normal user or system behavior
• IDS: Host-based and network-based
• Host-based: Analyze host-bound audit sources such as audit trails, system logs, or application logs
• Network-based: Analyze packets captured on a network
• MADAM ID (Columbia University): Learns classifiers that distinguish between intrusions and normal activities
• (i) Training connection records are partitioned into normal connection records and intrusion connection records
• (ii) Frequent episode rules are mined separately for the two categories of training data; patterns found only in the intrusion data form the intrusion-only patterns
• (iii) Intrusion-only patterns are used to derive additional attributes that are indicative of intrusive behavior
• (iv) The initial training records are augmented with the new attributes
• (v) A classifier is learned that distinguishes normal records from intrusion records; this classifier, the misuse IDS, is the end product of MADAM ID
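Steps (i) to (v) can be sketched as follows. This is a simplified illustration, not MADAM ID itself: frequent itemsets over single records stand in for frequent episode rules (which span sequences of records), a one-attribute decision stump stands in for a real rule learner, and all function names, attributes, and records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(records, min_support, max_size=2):
    """Attribute-value itemsets (up to max_size items) with support >= min_support."""
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())  # sorted by attribute name for a stable order
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(records)
    return {itemset for itemset, c in counts.items() if c / n >= min_support}


def intrusion_only_patterns(normal, intrusions, min_support=0.5):
    # Steps (i)-(ii): mine each partition separately, keep patterns only intrusions share.
    return frequent_itemsets(intrusions, min_support) - frequent_itemsets(normal, min_support)


def augment(record, patterns):
    # Steps (iii)-(iv): add one boolean attribute per intrusion-only pattern.
    items = set(record.items())
    extended = dict(record)
    for i, pattern in enumerate(sorted(patterns, key=repr)):
        extended[f"pattern_{i}"] = set(pattern) <= items
    return extended


def train_stump(records, labels):
    # Step (v): pick the single derived attribute that best separates the classes.
    features = [k for k in records[0] if k.startswith("pattern_")]
    if not features:
        return lambda rec: False

    def accuracy(f):
        return sum(r[f] == y for r, y in zip(records, labels)) / len(records)

    best = max(features, key=accuracy)
    return lambda rec: rec.get(best, False)


# Usage with made-up connection records.
normal = [
    {"service": "http", "flag": "SF", "dst_port": "80"},
    {"service": "http", "flag": "SF", "dst_port": "80"},
    {"service": "smtp", "flag": "SF", "dst_port": "25"},
]
intrusions = [
    {"service": "telnet", "flag": "S0", "dst_port": "23"},
    {"service": "http", "flag": "S0", "dst_port": "80"},
]
patterns = intrusion_only_patterns(normal, intrusions)
train = [augment(r, patterns) for r in normal + intrusions]
clf = train_stump(train, [False] * len(normal) + [True] * len(intrusions))
print(clf(augment({"service": "ftp", "flag": "S0", "dst_port": "21"}, patterns)))  # True
```

Here `service=http` is frequent in both partitions and is therefore discarded, while `flag=S0` survives as an intrusion-only pattern and becomes the attribute the stump relies on.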

ADAM
• Network-based anomaly detection system
• Learns normal network behavior from attack-free training data and represents it as a set of association rules (the profile)
• At run time, the records of the past δ seconds are continuously mined for new association rules that are not contained in the profile; these rules are sent to a classifier that separates false positives from true positives
• Its association rules are conjunctions of attribute-value pairs of the form $\bigwedge_{i} (A_i = v_i)$
• Each association rule must include the source host, destination host, and destination port among its attributes
• Multi-level association rules have been introduced to capture coordinated and distributed attacks
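A rough sketch of ADAM's profile and sliding-window mining, assuming rules are represented as frequent conjunctions of attribute-value pairs that always contain the three required attributes. The optional attributes, the minimum count threshold, and all names are hypothetical, and the classifier that ADAM applies to the new rules is not sketched.

```python
from collections import Counter
from itertools import combinations

REQUIRED = ("src_host", "dst_host", "dst_port")  # must appear in every rule
OPTIONAL = ("service", "flag")                   # hypothetical extra attributes


def mine_rules(records, min_count):
    """Conjunctions (A1 = v1) AND ... AND (An = vn) that contain all REQUIRED
    attributes and occur in at least min_count records."""
    counts = Counter()
    for rec in records:
        base = tuple((a, rec[a]) for a in REQUIRED)
        extras = [(a, rec[a]) for a in OPTIONAL if a in rec]
        for k in range(len(extras) + 1):
            for combo in combinations(extras, k):
                counts[base + combo] += 1
    return {rule for rule, c in counts.items() if c >= min_count}


def window(records, now, delta):
    """Records whose timestamp lies in the last delta seconds, i.e. (now - delta, now]."""
    return [r for r in records if now - delta < r["time"] <= now]


def suspicious_rules(profile, records, now, delta, min_count):
    # Rules frequent in the current window but absent from the profile;
    # in ADAM these would be handed to a classifier (omitted here).
    return mine_rules(window(records, now, delta), min_count) - profile


# Usage with made-up data: the profile covers ordinary web traffic.
training = [
    {"time": t, "src_host": "10.0.0.5", "dst_host": "10.0.0.1",
     "dst_port": 80, "service": "http", "flag": "SF"}
    for t in range(10)
]
profile = mine_rules(training, min_count=3)

live = [dict(r, time=100 + i) for i, r in enumerate(training[:5])] + [
    {"time": 100 + i, "src_host": "10.0.0.9", "dst_host": "10.0.0.1",
     "dst_port": 23, "service": "telnet", "flag": "S0"}
    for i in range(5)
]
new_rules = suspicious_rules(profile, live, now=104, delta=5, min_count=3)
for rule in sorted(new_rules):
    print(rule)
```

The web traffic inside the window reproduces rules already in the profile and is filtered out; only the four rules describing the telnet connections remain.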

Clustering of Unlabeled ID Data
• Main focus: Training anomaly detection systems over noisy data
• The number of normal elements in the training data is assumed to be significantly larger than the number of anomalous elements
• Anomalous elements are assumed to be qualitatively different from normal ones
• Thus, anomalies appear as outliers that stand out from the normal data, so explicitly modeling outliers amounts to anomaly detection
• Use of clustering: normal data may cluster into similar groups and intrusive data into others; since intrusions are rare, the intrusive clusters are small
• Real-time data is compared with the clusters to determine a classification
• A network-based anomaly detection system has been built on this approach
• In addition to intrinsic attributes (e.g., source host, destination host, start time), connection records also include derived attributes such as the number of failed login attempts, the number of file-creation operations, and various counts and averages over temporally adjacent connection records
• Euclidean distance is used to measure the similarity between connection records
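The clustering idea can be illustrated with a simple single-pass, fixed-width clustering under Euclidean distance. The slide does not name a specific algorithm, so this choice is an assumption, as are the cluster width, the "small cluster" threshold, the function names, and the assumption that features are already scaled to comparable ranges.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def fixed_width_clustering(points, width):
    """Single pass: join the nearest cluster if its center is within `width`,
    otherwise the point becomes the center of a new cluster."""
    centers, sizes = [], []
    for p in points:
        if centers:
            i = nearest(p, centers)
            if math.dist(p, centers[i]) <= width:
                sizes[i] += 1
                continue
        centers.append(p)
        sizes.append(1)
    return centers, sizes


def label_small_clusters(sizes, min_fraction):
    """Mark clusters holding less than min_fraction of all points as anomalous."""
    total = sum(sizes)
    return [s / total < min_fraction for s in sizes]


def classify(point, centers, anomalous, width):
    # Points far from every cluster, or close to a small cluster, are anomalous.
    i = nearest(point, centers)
    if math.dist(point, centers[i]) > width or anomalous[i]:
        return "anomalous"
    return "normal"


# Usage: two scaled features per connection record (made-up values).
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (0.2, 0.0), (0.0, 0.2), (0.1, 0.2), (0.2, 0.1)]
attack = [(5.0, 5.0)]
centers, sizes = fixed_width_clustering(normal + attack, width=0.5)
anomalous = label_small_clusters(sizes, min_fraction=0.2)
print(classify((0.05, 0.1), centers, anomalous, width=0.5))  # normal
print(classify((5.1, 5.0), centers, anomalous, width=0.5))   # anomalous
```

No labels are used during training; the single attack record ends up in a cluster of its own, and that cluster is flagged only because it is small.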

Mining the Alarm Stream
• Applying data mining to the alarms triggered by an IDS
• (i) Model the normal alarm stream so as to henceforth raise the severity of “abnormal” alarms
• (ii) Extract predominant alarm patterns that a human expert can understand and act upon, e.g., by writing filters or patching a weak IDS signature
• Manganaris et al.:
• Model alarms as tuples (t, A), where t is a timestamp and A is an alarm type
• All other attributes of an alarm are ignored
• The profile of normal alarm behavior is learned as follows:
• The time-ordered alarm stream is partitioned into bursts
• Association rules are mined from the bursts
• The resulting rules form the profile of normal alarms
• At run time, various tests are carried out to determine whether an alarm burst is normal
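A minimal sketch of the burst-based profile, assuming bursts are separated by gaps longer than a fixed threshold and that the profile consists of frequent sets of co-occurring alarm types (the itemsets from which association rules would be derived). The gap threshold, the run-time test, the alarm type names, and all function names are hypothetical and do not reproduce the actual tests of Manganaris et al.

```python
from collections import Counter
from itertools import combinations


def split_into_bursts(alarms, max_gap):
    """Split time-ordered (t, A) tuples wherever the gap between consecutive
    timestamps exceeds max_gap (a hypothetical burst criterion)."""
    bursts, current, last_t = [], [], None
    for t, a in alarms:
        if last_t is not None and t - last_t > max_gap:
            bursts.append(current)
            current = []
        current.append((t, a))
        last_t = t
    if current:
        bursts.append(current)
    return bursts


def alarm_type_sets(burst):
    """All sets of one or two distinct alarm types occurring together in a burst."""
    types = sorted({a for _, a in burst})
    return [c for k in (1, 2) for c in combinations(types, k)]


def learn_profile(bursts, min_support):
    # Fraction of bursts in which each alarm-type set occurs.
    counts = Counter(s for b in bursts for s in alarm_type_sets(b))
    return {s for s, c in counts.items() if c / len(bursts) >= min_support}


def is_normal(burst, profile):
    # Hypothetical test: every alarm type and co-occurring pair must be in the profile.
    return all(s in profile for s in alarm_type_sets(burst))


history = [(0, "PortScan"), (1, "PortScan"), (2, "PingSweep"),
           (100, "PortScan"), (101, "PingSweep"),
           (200, "PortScan"), (201, "PingSweep")]
bursts = split_into_bursts(history, max_gap=10)
profile = learn_profile(bursts, min_support=0.5)
print(len(bursts))                                                  # 3
print(is_normal([(300, "PortScan"), (301, "PingSweep")], profile))  # True
print(is_normal([(400, "PortScan"), (401, "Overflow")], profile))   # False
```

Only the timestamp and alarm type of each alarm are used, matching the (t, A) model above.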

Clifton and Gengo; Julisch:
• Mine historical alarm logs to find new knowledge that reduces the future alarm load, e.g., by writing filtering rules that discard false positives
• Tools: Frequent episode rules, attribute-oriented induction
• Attribute-oriented induction: Repeatedly replaces attribute values by more abstract values
• E.g., IP addresses by networks, timestamps by weekdays, and ports by port ranges; the generalization hierarchies are provided by the user
• Generalization merges previously distinct alarms into a few classes, condensing huge alarm logs into short and comprehensible summaries; this reduces the alarm load by 80%
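Attribute-oriented induction can be sketched as a loop that raises one attribute at a time to a more abstract level until the alarm log collapses into a small number of classes. The hierarchies below (/24 networks, UTC weekdays, privileged vs. non-privileged ports), the "most distinct values first" heuristic, the stopping criterion, and the alarm records are all assumptions for illustration, not Julisch's exact algorithm.

```python
import ipaddress
from collections import Counter
from datetime import datetime, timezone

# Hypothetical user-supplied hierarchies: level 0 is the raw value,
# each further level is more abstract, and the last level is "ANY".
HIERARCHIES = {
    "src_ip": [
        lambda v: v,
        lambda v: str(ipaddress.ip_network(f"{v}/24", strict=False)),  # IPv4 only
        lambda v: "ANY-IP",
    ],
    "time": [
        lambda v: v,
        lambda v: datetime.fromtimestamp(v, tz=timezone.utc).strftime("%A"),
        lambda v: "ANY-TIME",
    ],
    "dst_port": [
        lambda v: v,
        lambda v: "privileged" if v < 1024 else "non-privileged",
        lambda v: "ANY-PORT",
    ],
}


def summarize(alarms, max_classes):
    """Repeatedly generalize one attribute until at most max_classes distinct
    generalized alarms remain (or nothing can be generalized further)."""
    levels = {attr: 0 for attr in HIERARCHIES}

    def gen(alarm, attr):
        return HIERARCHIES[attr][levels[attr]](alarm[attr])

    def current_summary():
        return Counter(
            (a["signature"],) + tuple(gen(a, attr) for attr in HIERARCHIES)
            for a in alarms
        )

    summary = current_summary()
    while len(summary) > max_classes:
        candidates = [attr for attr in HIERARCHIES
                      if levels[attr] + 1 < len(HIERARCHIES[attr])]
        if not candidates:
            break
        # Assumed heuristic: abstract the attribute with the most distinct values.
        target = max(candidates, key=lambda attr: len({gen(a, attr) for a in alarms}))
        levels[target] += 1
        summary = current_summary()
    return summary


alarms = [
    {"signature": "SIG-A", "src_ip": f"10.0.0.{i + 1}",
     "time": 1_700_000_000 + 60 * i, "dst_port": port}
    for i, port in enumerate([80, 443, 22, 8080])
]
for cls, count in summarize(alarms, max_classes=2).items():
    print(count, cls)
```

Four distinct alarms are condensed into two classes that share one network and one weekday and differ only in port range; each class is the kind of compact summary an analyst could turn into a filtering rule.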

Isolated application of data mining techniques can be a dangerous activity, leading to the discovery of meaningless or misleading patterns
• Data mining without a proper understanding of the application domain should be avoided
• The validation step is extremely important
