
Anomaly Detection of Web-based Attacks



  1. Anomaly Detection of Web-based Attacks Christopher Kruegel & Giovanni Vigna CCS ‘03 Presented By: Payas Gupta

  2. Outline • Web-based attacks: XSS, buffer overflow, directory traversal, input validation, Code Red • Anomaly detection vs. misuse detection • Data model

  3. Web-based attacks • XSS attacks • Buffer overflow • Directory traversal • Input validation • Code Red • Anomaly detection vs. misuse detection

  4. Data Model • Only GET requests with no request header are considered • Example access-log entry: 169.229.60.105 - johndoe [6/Nov/2002:23:59:59 -0800] "GET /scripts/access.pl?user=johndoe&cred=admin" 200 2122 • The path is /scripts/access.pl; the query string is user=johndoe&cred=admin • Only the query string is modeled, not the path • For a query q with attributes a1=v1 and a2=v2, the attribute set is Sq = {a1, a2}
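Extracting the attribute set Sq from a request URI can be sketched with the standard library (the function name is illustrative):

```python
from urllib.parse import urlparse, parse_qsl

def attributes(uri):
    """Extract the attribute names S_q = {a1, a2, ...} and the
    attribute-value pairs from the query string of a request URI."""
    pairs = parse_qsl(urlparse(uri).query)
    return [a for a, _ in pairs], dict(pairs)

attrs, values = attributes("/scripts/access.pl?user=johndoe&cred=admin")
# attrs  -> ['user', 'cred']
# values -> {'user': 'johndoe', 'cred': 'admin'}
```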

  5. Detection model • Each model m is associated with a weight wm • Each model returns a probability pm • A value of pm close to 0 indicates an anomalous event; equivalently, a value of (1 − pm) close to 1 indicates an anomalous event
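A minimal sketch of the weighted combination of model outputs (threshold handling omitted; the equal weights are illustrative):

```python
def anomaly_score(probabilities, weights):
    """Combine per-model probabilities p_m (close to 0 = anomalous) into
    one weighted score: sum of w_m * (1 - p_m).  A high score signals an
    anomalous query."""
    return sum(w * (1.0 - p) for p, w in zip(probabilities, weights))

# Two well-behaved models and one suspicious one, all weighted equally
score = anomaly_score([0.9, 0.95, 0.05], [1.0, 1.0, 1.0])
```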

  6. Attribute Length • Normal parameters • Fixed-size tokens (e.g. session identifiers) • Short strings (input from HTML forms) • So the length does not vary much for parameters of a given program • Malicious activity • E.g. a buffer overflow requires an unusually long parameter • Goal: approximate the actual but unknown distribution of the parameter lengths and detect deviations from normal

  7. Learning & Detection • Learning • Calculate the mean μ and variance σ² of the lengths l1, l2, ..., ln of the parameter across the n queries that contain this attribute • Detection • Use the Chebyshev inequality: p(|x − μ| > |l − μ|) < σ² / (l − μ)² • This bound is deliberately weak, resulting in a high degree of tolerance • Only obvious outliers are flagged as suspicious

  8. Attribute character distribution • Attribute values have regular structure and mostly printable characters • The character frequencies of query parameters of the same program are similar • The relative character frequencies of the attribute are sorted in descending order • Normal • Frequencies slowly decrease in value • Malicious • Frequencies drop extremely fast (a peak caused by a single dominating character) or hardly at all (random values) • Example: 'passwd' (ASCII 112 97 115 115 119 100) gives sorted relative frequencies 0.33, 0.17, 0.17, 0.17, 0.17 and 0 for the remaining 251 characters, so ICD(0) = 0.33, ICD(1) to ICD(4) = 0.17, ICD(5) onward = 0
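The sorted character distribution of a single value can be computed directly; this reproduces the 'passwd' example:

```python
from collections import Counter

def char_distribution(value):
    """Relative character frequencies of a string, sorted in descending
    order and padded with zeros to 256 entries (one per byte value)."""
    counts = Counter(value.encode())
    freqs = sorted((c / len(value) for c in counts.values()), reverse=True)
    return freqs + [0.0] * (256 - len(freqs))

dist = char_distribution("passwd")
# dist[0] = 2/6 ('s' occurs twice), dist[1..4] = 1/6, the other 251 entries are 0
```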

  9. Why is it useful? • Cannot be evaded by some well-known attempts to hide malicious code in the string • E.g. no-op instructions substituted by instructions with similar behavior (add rA,rA,0) still leave an abnormal character distribution • But not useful when the attack causes only a small change in the payload distribution

  10. Learning and detection • Learning • For each query attribute, its character distribution is stored • The idealized character distribution (ICD) is obtained by averaging all the stored character distributions (in the figure, the distributions of queries q1, q2, q3 are averaged)

  11. Learning and detection (cont...) • Detection uses a Pearson chi-square test • It is not necessary to operate on all 256 values of the ICD; a small number of intervals (bins) is used • Calculate observed and expected frequencies • Oi = observed frequency for each bin • Ei = relative frequency of each bin in the ICD × length of the attribute • Compute the chi-square statistic χ² = Σ (Oi − Ei)² / Ei • Read the corresponding probability off a predefined chi-square table
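A sketch of the binned chi-square comparison, assuming six rank-interval bins over the sorted distribution; skipping bins with zero expected frequency is a simplification of this sketch, and the learned ICD is a toy example:

```python
from collections import Counter

BINS = [(0, 0), (1, 3), (4, 6), (7, 11), (12, 15), (16, 255)]

def chi_square(value, learned_icd):
    """Pearson chi-square statistic comparing the character distribution
    of one observed attribute value against a learned ICD (a list of 256
    descending relative frequencies)."""
    n = len(value)
    counts = sorted(Counter(value.encode()).values(), reverse=True)
    counts += [0] * (256 - len(counts))
    stat = 0.0
    for lo, hi in BINS:
        observed = sum(counts[lo:hi + 1])
        expected = sum(learned_icd[lo:hi + 1]) * n
        if expected > 0:  # simplification: skip empty bins
            stat += (observed - expected) ** 2 / expected
    return stat

# Toy ICD, as if learned from many 'passwd'-like values
learned = [2/6, 1/6, 1/6, 1/6, 1/6] + [0.0] * 251
stat = chi_square("passwd", learned)   # a perfect match gives ~0
```

A value like "AAAAAA", whose mass sits in a single character, yields a much larger statistic and hence a low probability.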

  12. Structural inference • The structure of an attribute is the regular grammar that describes all of its normal, legitimate values • Why?? • An attacker can craft an attack in a manner that makes its manifestation appear more regular • For example, non-printable characters can be replaced by groups of printable characters

  13. Learning and detection • Basic approach: generalize the grammar as long as it seems reasonable, and stop before too much structural information is lost • Uses a Markov model and Bayesian probability • The model is a probabilistic NFA • Each state s has a set of ns possible output symbols o, which are emitted with probability ps(o) • Each transition t is marked with a probability p(t), the likelihood that the transition is taken

  14. Learning and detection (cont...) • Example (figure): an automaton with a start state, two branches, and a terminal state; the probability of a word is the sum, over all paths that emit it, of the products of transition and emission probabilities • For the word ‘ab’: P(w) = (1.0 · 0.3 · 0.5 · 0.2 · 0.5 · 0.4) + (1.0 · 0.7 · 1.0 · 1.0 · 1.0 · 1.0)
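The path-summing computation can be sketched as follows; the toy automaton (states S, A, B, F and their probabilities) is illustrative, not the one from the slide:

```python
def word_probability(word, transitions, emissions, start, final):
    """Probability the structural model assigns to a word: the sum over
    all state paths of the product of transition probabilities p(t) and
    per-state emission probabilities p_s(o)."""
    def walk(state, rest):
        if not rest:                                   # word consumed: step to the final state
            return transitions.get((state, final), 0.0)
        total = 0.0
        for (src, dst), pt in transitions.items():
            if src == state and dst != final:
                total += pt * emissions.get((dst, rest[0]), 0.0) * walk(dst, rest[1:])
        return total
    return walk(start, word)

# Hypothetical two-branch automaton: S -> A emits 'a', S -> B emits 'b'
T = {("S", "A"): 0.5, ("S", "B"): 0.5, ("A", "F"): 1.0, ("B", "F"): 1.0}
E = {("A", "a"): 1.0, ("B", "b"): 1.0}
p_a = word_probability("a", T, E, "S", "F")   # single path: 0.5 * 1.0 * 1.0
```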

  15. Learning and detection (cont...) • The probability of the training set is obtained by adding the probabilities calculated for each input training element

  16. Learning and detection (cont...) • Aim: maximize the product of the model's probability and the likelihood of the training data • There is a conflict between simple models that tend to over-generalize and models that perfectly fit the data but are too complex • Simple model: high model probability, but the likelihood of producing the training data is extremely low, so the product is low • Complex model: high likelihood, but low model probability, so the product is still low • The model is built from the input data and states are then merged, using the Viterbi algorithm

  17. Learning and detection (cont...) • Detection • Problem: even a legitimate input that has been seen regularly during the training phase may receive a very small probability value, because the probability values of all possible input words sum to 1 • Therefore the model returns 1 if the word is a valid output of the grammar, and 0 if it cannot be derived from the grammar

  18. Token finder • Determines whether the values of an attribute are drawn from a limited set of possible alternatives (an enumeration) • When a malicious user passes illegal values to the application, the attack can be detected

  19. Learning and detection • Learning • Enumeration: the number of different parameter values is bounded by some threshold t • Random: the number of different argument instances grows proportionally with the total number of occurrences • Distinguish the two cases by calculating a statistical correlation

  20. Learning and detection (cont...) • A correlation value < 0 indicates an enumeration; > 0 indicates random values • Detection • In the enumeration case, an unexpected value makes the model return 0, otherwise 1; in the random case the model always returns 1
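The correlation test can be sketched as follows; only the sign of the covariance matters, so the normalisation is omitted, and the sample values are made up:

```python
def token_finder(values):
    """Classify an attribute as enumeration or random: correlate
    f(i) = i with g, where g grows by 1 on a previously unseen value and
    shrinks by 1 on a repeat.  Negative correlation -> enumeration."""
    seen, g, gs = set(), 0, []
    for v in values:
        g = g + 1 if v not in seen else g - 1
        seen.add(v)
        gs.append(g)
    n = len(values)
    mf, mg = (n + 1) / 2, sum(gs) / n          # means of f and g
    cov = sum((i - mf) * (b - mg) for i, b in zip(range(1, n + 1), gs))
    return "enumeration" if cov < 0 else "random"

kind = token_finder(["on", "off"] * 20)   # repeats dominate -> enumeration
```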

  21. Attribute presence or absence • Client-side programs, scripts, or HTML forms pre-process the data and transform it into a suitable request, so the set of attributes is regular • Hand-crafted attacks focus on exploiting a vulnerability in the code that processes a certain parameter value, and little attention is paid to the other attributes of the request

  22. Learning and detection • Learning • Build a model of acceptable subsets • Record each distinct subset Sq = {ai, ..., ak} of attributes that is seen during the training phase • Detection • For each query, the algorithm performs a lookup of the query's attribute set • If the set was encountered during training, return 1; otherwise 0
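The presence/absence model reduces to set membership over recorded attribute sets; a minimal sketch (class and method names are illustrative):

```python
class AttributeSetModel:
    """Record each distinct attribute set S_q seen during training; at
    detection time return 1 for a known set and 0 otherwise."""
    def __init__(self):
        self.known = set()

    def train(self, attrs):
        self.known.add(frozenset(attrs))   # order does not matter here

    def detect(self, attrs):
        return 1 if frozenset(attrs) in self.known else 0

m = AttributeSetModel()
m.train(["user", "cred"])
```

Using frozensets makes the lookup insensitive to attribute order, which is handled by the separate order model on the next slides.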

  23. Attribute order • Legitimate invocations of server-side programs often contain the same parameters in the same order • Hand-crafted attacks often do not • Test whether the order in a given query is consistent with the model deduced during the learning phase

  24. Learning and detection • Learning: • Build a set O of attribute pairs that capture order constraints: • Each vertex vi in a directed graph G is associated with the corresponding attribute ai • For every query, the ordered list of its attributes is processed • For each attribute pair (as, at) in this list, with s ≠ t and 1 <= s, t <= i, a directed edge is inserted into the graph from vs to vt

  25. Learning and detection (cont...) • Graph G contains all order constraints imposed by queries in the training data • One attribute is ordered before another if there is • a directed edge, or • a directed path between the corresponding vertices • Detection • Given a query with attributes a1, a2, ..., ai and a set of order constraints O, test all the parameter pairs (aj, ak) with j ≠ k and 1 <= j, k <= i • If a pair violates a constraint, return 0; otherwise 1
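A sketch of the order model; for brevity it checks direct edges only (the paper also follows paths and prunes cycles), and the class name and sample attributes are illustrative:

```python
from itertools import combinations

class AttributeOrderModel:
    """Learn order constraints as directed edges (a_s, a_t); a query is
    flagged if some pair (a_j, a_k) appears while the learned graph only
    orders a_k before a_j."""
    def __init__(self):
        self.edges = set()

    def train(self, attrs):
        for s, t in combinations(attrs, 2):   # s precedes t in this query
            self.edges.add((s, t))

    def detect(self, attrs):
        for j, k in combinations(attrs, 2):
            if (k, j) in self.edges and (j, k) not in self.edges:
                return 0                      # order constraint violated
        return 1

m = AttributeOrderModel()
m.train(["user", "cred", "id"])
```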

  26. Evaluation • Data sets: Apache web server access logs from • Google • University of California, Santa Barbara • Technical University, Vienna • 1000 queries used for training • All the rest for testing

  27. Model Validation

  28. The data sets contained a significant number of entries caused by the Nimda and Code Red worms; these were removed • Only queries that result from the invocation of existing programs are included in the training and detection process • For Google, thresholds were also adjusted to account for the higher variability of its traffic

  29. Detection effectiveness

  30. Conclusions • An anomaly-based intrusion detection system for web servers • Takes advantage of application-specific correlations between server-side programs and the parameters used in their invocation • Parameter characteristics are learned from the input data • Tested on data from Google and two universities, one in the US and one in Europe

  31. Q / A
