Automating Analysis of Large-Scale Botnet Probing Events

Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson* • Lab for Internet and Security Technology (LIST), Northwestern University • * UC Berkeley / ICSI

Motivation • Administrators ask: does this attack specifically target us? • Can we answer this question with only limited information observed locally in the enterprise? • (Figure labels: IPv4 space, botnets, enterprise, administrators)

Motivation • Can we infer the probing strategy used by botnets? • Can we infer whether a botnet probing attack specifically targets a certain network, or whether we are just part of a larger, indiscriminate attack? • Can we extrapolate global botnet properties given limited local information?

Agenda • Motivation • Basic framework • Discover the botnet probing strategies • Extrapolate global properties • Evaluation • Conclusions

Botnet Probing Events • Big spikes in the number of probers are mainly caused by botnets

System Framework See the paper for subtle system details.

Discover the Botnet Probing Strategies • Use statistical tests to understand probing strategies • Leverage existing statistical tests • Monotonic trend checking: detect whether bots probe the IP space monotonically • Uniformity checking: detect whether bots scan the IP range uniformly • Design our own • Hitlist (liveness) checking: detect whether bots avoid the dark IP space • Dependency checking: do the bots scan independently, or are they coordinated?

Design Space

Hitlist Checking • Configure the sensor to be half darknet and half honeynet • Use metric θ = (# sources in darknet) / (# sources in honeynet) • Threshold: 0.5
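The θ metric can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the decision direction (θ below the threshold suggests a hitlist, because bots that skip dark addresses rarely show up in the darknet half) is an assumption drawn from the slide's description.

```python
def hitlist_theta(darknet_sources, honeynet_sources):
    """Ratio of distinct sources seen in the darknet half to those in the honeynet half."""
    dark = len(set(darknet_sources))
    live = len(set(honeynet_sources))
    if live == 0:
        raise ValueError("no sources observed in the honeynet half")
    return dark / live


def uses_hitlist(darknet_sources, honeynet_sources, threshold=0.5):
    # Assumption: a low ratio means the bots mostly avoid dark addresses.
    return hitlist_theta(darknet_sources, honeynet_sources) < threshold
```

For example, one source seen in the darknet half against four in the honeynet half gives θ = 0.25, which falls under the 0.5 threshold.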

Agenda • Motivation • Basic framework • Discover the botnet probing strategies • Extrapolate global properties • Global scan scope, total # of bots, total # of scans, total scan rate for each bot • Evaluation • Conclusions

Extrapolate Global Properties: Basic Ideas and Validation • Observe the packet fields that change in a predictable pattern across consecutive probes • IPID: a field in the IP header used for IP defragmentation • Ephemeral port number: the source port used by bots • Both increment by a fixed amount per scan • Validation • IPID continuity: all versions of Windows and MacOS • Ephemeral port number continuity: botnet source code study • Agobot, Phatbot, Spybot, SDbot, rxBot, etc. • Control experiments with NAT

Estimate Global Scan Rate of Each Bot • Count the IPID and ephemeral port # changes • Recover from overflow (wraparound) of the IPID and ephemeral port number • Estimate the rate with linear regression when the correlation coefficient > 0.99 • Counter overestimation: use the lesser of the two estimates • (Figure: IPID plotted against time T)
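The rate estimation steps can be sketched as follows. This is a rough illustration under stated assumptions, not the system's code: the function names are hypothetical, each counter is assumed to grow by `increment` per scan (default 1), and the wraparound modulus for ephemeral ports (`port_modulus`) is a placeholder because the real port range depends on the operating system.

```python
import math


def unwrap(values, modulus=65536):
    """Undo counter wraparound: each drop in value is assumed to be one overflow."""
    out, offset, prev = [], 0, None
    for v in values:
        if prev is not None and v < prev:
            offset += modulus
        out.append(v + offset)
        prev = v
    return out


def fit_rate(times, counters, min_r=0.99):
    """Least-squares slope of counter vs. time, or None if the fit is too weak."""
    n = len(times)
    mt, mc = sum(times) / n, sum(counters) / n
    sxx = sum((t - mt) ** 2 for t in times)
    syy = sum((c - mc) ** 2 for c in counters)
    sxy = sum((t - mt) * (c - mc) for t, c in zip(times, counters))
    if sxx == 0 or syy == 0:
        return None
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return sxy / sxx if r > min_r else None


def estimate_scan_rate(times, ipids, ports, port_modulus=65536, increment=1):
    """Scans per time unit: the lesser of the IPID-based and port-based estimates."""
    candidates = [
        fit_rate(times, unwrap(ipids)),
        fit_rate(times, unwrap(ports, port_modulus)),
    ]
    rates = [c / increment for c in candidates if c is not None]
    return min(rates) if rates else None
```

Taking the minimum reflects the slide's point about overestimation: other traffic from the same host can also advance either counter, so the smaller slope is the safer estimate.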

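Extrapolate Global Scan Scope • Example: bot_i is observed sending n_i = 100 probes locally • Total scans from bot_i: scan rate R_i × scan time T_i = 100 × 1000 = 100,000 • Local/global ratio: n_i / (R_i × T_i) = 100 / 100,000 = 0.001 • Aggregate over multiple bots • (Figure labels: IPv4 space, botnets)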
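The arithmetic on this slide can be written out as a short sketch. Several things here are assumptions rather than statements from the slides: the function names, the pooled way of aggregating bots (total local probes over total estimated scans), and the final step that divides the sensor size by the ratio, which only makes sense if the bots scan their range uniformly.

```python
def local_global_ratio(bots):
    """bots: iterable of (n_local, scan_rate, scan_time) tuples, one per bot."""
    bots = list(bots)
    local = sum(n for n, _, _ in bots)
    total = sum(rate * duration for _, rate, duration in bots)
    return local / total


def global_scope(bots, sensor_size):
    """Estimated number of addresses in the scanned range (assumes uniform scanning)."""
    return sensor_size / local_global_ratio(bots)
```

With the slide's numbers, `local_global_ratio([(100, 100, 1000)])` is 0.001, so a sensor of 2,560 addresses (ten /24 blocks) would correspond to a scanned range of about 2.56 million addresses.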

Extrapolate Global # of Bots • Idea: similar to mark and recapture • Assumption: all bots have the same global scan range • Bots observed in the first half: m1 = 1000 • Bots observed in the second half: m2 = 1000 • Bots observed in both halves: m12 = 250 • Estimated total: M = m1 × m2 / m12 = 1000 × 1000 / 250 = 4000 bots
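The mark-and-recapture estimate above translates directly into set operations. This is a minimal sketch with a hypothetical function name; it takes the source addresses seen in each half of the event and applies M = m1 × m2 / m12.

```python
def estimate_total_bots(first_half_sources, second_half_sources):
    """Mark-and-recapture style estimate of the total bot population."""
    m1 = set(first_half_sources)
    m2 = set(second_half_sources)
    m12 = len(m1 & m2)  # bots observed in both halves
    if m12 == 0:
        raise ValueError("no overlap between halves; estimate is undefined")
    return len(m1) * len(m2) / m12
```

The estimate is undefined when no bot appears in both halves, so the sketch raises an error instead of returning a misleading number.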

Dataset • Based on a honeynet of ten /24 blocks at a national lab (LBNL) • 293 GB of packet traces over 24 months (2006-07) • 203 botnet probing events observed in total • Average observed # of bots per event: 980 • Mainly on SMB/WINRPC, VNC, Symantec, MSSQL, HTTP, Telnet • Size of the system: 13,900 lines: Bro (6,000), Python (4,000), C++ (2,500), R (1,400)

Property Checking Results • More than 80% of events show uniform scanning • Validated through visualization; the results are highly accurate

Extrapolation Results • Most of the extrapolated global scopes are of /8 size, which means the botnets do not specifically target the enterprise (LBNL) • Validation with DShield data • DShield: the largest Internet alert repository • Find the /8 prefixes in DShield with sufficient source (bot) overlap with the honeynet events • Due to the incompleteness of DShield data, 12 events were validated • Calculate the scan scope in each /8 based on the sensor coverage ratio

Extrapolation Validation • Define scope factor as max(DShield/Honeynet, Honeynet/DShield) • 75% of events are within 1.35 • All are within 1.5 • (Figure: CDF of the scope factor)
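The scope factor is a symmetric ratio, so it is always at least 1, and 1 means the two estimates agree exactly. A small sketch with hypothetical function names follows; the helper for the fraction of events within a bound mirrors how the "75% within 1.35" figure would be read off a CDF.

```python
def scope_factor(dshield_scope, honeynet_scope):
    """Symmetric disagreement ratio between the two scope estimates (>= 1)."""
    return max(dshield_scope / honeynet_scope, honeynet_scope / dshield_scope)


def fraction_within(factors, bound):
    """Share of events whose scope factor does not exceed the bound."""
    factors = list(factors)
    return sum(1 for f in factors if f <= bound) / len(factors)
```

Using a symmetric ratio means an overestimate by a factor of 1.5 and an underestimate by a factor of 1.5 are scored the same.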

Conclusions • Developed a set of statistical approaches to assess four properties of botnet probing strategies • Designed approaches to extrapolate the global properties of a scan event based on a limited local view • Through real-world validation based on DShield, we show that our schemes are promisingly accurate

Backup

Event size distribution

Extrapolate the scope • Probes observed locally • Local/global ratio • Estimate global probing rate • Probing time window

Monotonic trend checking • Goal: detect whether the bots probe the IP space monotonically • E.g. simple sequential probing • Technique: • Mann-Kendall trend test • Intuition: check whether the aggregated sign value (sign(Ai+1 - Ai)) falls outside the range that randomness can produce • When most (>80%) senders in an event follow a trend, we label the event as following a trend
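A textbook Mann-Kendall test can be sketched as below. Note the assumptions: this is the standard all-pairs form with a normal approximation and no tie correction, which may differ in detail from the variant used in the system; the significance level `alpha=0.05` is a placeholder, since the slide does not state one; and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def mann_kendall_z(seq):
    """Normalized Mann-Kendall statistic for a sequence of probed addresses."""
    n = len(seq)
    # S sums the sign of every later-minus-earlier difference.
    s = sum((seq[j] > seq[i]) - (seq[j] < seq[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var = n * (n - 1) * (2 * n + 5) / 18  # variance of S without tie correction
    if s > 0:
        return (s - 1) / math.sqrt(var)
    if s < 0:
        return (s + 1) / math.sqrt(var)
    return 0.0


def has_trend(seq, alpha=0.05):
    """Two-sided test: is S outside the range expected from a random order?"""
    if len(seq) < 3:
        return False
    return abs(mann_kendall_z(seq)) > NormalDist().inv_cdf(1 - alpha / 2)


def event_follows_trend(per_sender_sequences, min_fraction=0.8):
    """Label the event as trending when more than 80% of senders show a trend."""
    flags = [has_trend(list(s)) for s in per_sender_sequences]
    return sum(flags) / len(flags) > min_fraction
```

A strictly sequential scanner produces the largest possible S for its sequence length, so it is flagged easily, while an alternating high/low sequence yields an S near zero.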

Uniformity Checking • Goal: detect whether the botnet scans the IP range uniformly • Technique: • Chi-square test • Intuition: put the addresses into bins; the number of scans observed in each bin should be similar • Significance level of 0.5%
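The binning and the chi-square statistic can be sketched with the standard library alone. The assumptions are these: ten equal-width bins (the slides do not give a bin count), a tabulated critical value of 23.589 for 9 degrees of freedom at the 0.5% significance level, and hypothetical function names throughout.

```python
CRITICAL_DF9_ALPHA_0005 = 23.589  # chi-square table value: 9 d.o.f., 0.5% level


def bin_counts(addresses, lo, hi, bins=10):
    """Count probes per equal-width bin over the sensor range [lo, hi]."""
    counts = [0] * bins
    span = hi - lo + 1
    for a in addresses:
        counts[(a - lo) * bins // span] += 1
    return counts


def chi_square_statistic(counts):
    """Pearson chi-square statistic against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def looks_uniform(counts, critical_value=CRITICAL_DF9_ALPHA_0005):
    # Fail to reject uniformity when the statistic stays under the critical value.
    return chi_square_statistic(counts) <= critical_value
```

The default critical value only applies to ten bins; a different bin count needs the table value for the matching degrees of freedom (bins minus one).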

Dependency Checking • Goal: do the bots try to stay out of each other's way? • Idea: count the number of addresses that receive zero scans and compare it with the confidence interval for the independent random case
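One way to realize this idea is to model independent bots as throwing m scans uniformly at random over n sensor addresses and to derive an interval for the number of untouched addresses. The sketch below is an assumption-laden illustration, not the authors' method: it uses the exact mean and variance of the empty-address count with a normal approximation, a 95% confidence level chosen for the example, and hypothetical function names.

```python
import math
from statistics import NormalDist


def zero_scan_interval(n_addresses, n_scans, confidence=0.95):
    """Interval for the count of unscanned addresses under independent uniform scans."""
    n, m = n_addresses, n_scans
    p0 = (1 - 1 / n) ** m    # P(a given address is never hit)
    p00 = (1 - 2 / n) ** m   # P(two given addresses are both never hit)
    mean = n * p0
    var = max(n * p0 + n * (n - 1) * p00 - mean ** 2, 0.0)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(var)
    return mean - half_width, mean + half_width


def looks_coordinated(scan_counts, confidence=0.95):
    """scan_counts[i] = number of scans address i received (needs >= 2 addresses)."""
    zeros = sum(1 for c in scan_counts if c == 0)
    lo, _ = zero_scan_interval(len(scan_counts), sum(scan_counts), confidence)
    # Assumption: far fewer untouched addresses than chance suggests that
    # the bots divide the space among themselves.
    return zeros < lo
```

As a sanity check, 1,000 independent scans over 1,000 addresses leave about 368 addresses untouched on average, whereas perfectly coordinated bots that hit each address exactly once leave none.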
