
A Framework for Detection and Measurement of Phishing Attacks

A Framework for Detection and Measurement of Phishing Attacks. Reporter: Li, Fong Ruei, National Taiwan University of Science and Technology. Reference: Workshop on Rapid Malcode, Proceedings of the 2007 ACM Workshop on Recurring Malcode, Alexandria, Virginia, USA. Session: Threats.



Presentation Transcript


  1. Slide 1 (of 35) A Framework for Detection and Measurement of Phishing Attacks • Reporter: Li, Fong Ruei, National Taiwan University of Science and Technology

  2. Slide 2 (of 35) Reference • Workshop on Rapid Malcode, Proceedings of the 2007 ACM Workshop on Recurring Malcode, Alexandria, Virginia, USA • SESSION: Threats • Pages: 1 - 8 • Year of Publication: 2007 • ISBN: 978-1-59593-886-2

  3. Slide 3 (of 35) Outline • Introduction • Phishing URL Types • Modeling Phishing URLs • Feature Analysis • Training With Features • Analysis and Findings • Conclusion

  4. Slide 4 (of 35) INTRODUCTION • Phishing is a form of identity theft • It combines social engineering techniques • with sophisticated attack vectors • to harvest financial information from unsuspecting consumers. • Often a phisher tries to lure her victim into clicking a URL pointing to a rogue page.

  5. Slide 5 (of 35) PHISHING URL TYPES • We examined a black list of phishing URLs maintained by Google • This black list is used to provide phishing protection in Firefox

  6. Slide 6 (of 35) PHISHING URL TYPES • The prominent obfuscation techniques are: • Type I: Obfuscating the Host with an IP address • Type II: Obfuscating the Host with another Domain • Type III: Obfuscating with large host names • Type IV: Domain unknown or misspelled
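
The Type I case (host replaced by an IP address) can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the function name `host_is_ip_address` is hypothetical, and it only recognizes IP addresses in standard dotted or IPv6 notation, so hosts written in hexadecimal or single-integer form would need extra handling.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url: str) -> bool:
    """Return True when the URL's host is a literal IPv4 or IPv6 address."""
    host = urlsplit(url).hostname  # None when the URL has no host part
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an IP address, so treat it as a regular host name.
        return False
    return True
```

For example, `host_is_ip_address("http://192.0.2.10/signin")` is true, while a URL with an ordinary host name such as `http://www.example.com/` is not flagged.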

  7. Slide 7 (of 35) PHISHING URL TYPES

  8. Slide 8 (of 35) MODELING PHISHING URLS • We use a logistic regression classifier • To train the model, we built a training black list and a training white list as follows: • We use 1245 URLs from the Google black list as our training black list • We use a list of the top 1000 most popular URLs as the basis of our training white list

  9. Slide 9 (of 35) MODELING PHISHING URLS • Feature Analysis • We categorize our features into four groups: • Page Based • Domain Based • Type Based • Word Based

  10. Slide 10 (of 35) MODELING PHISHING URLS • Page Based: • Page Rank is a numeric value on a scale of [0,1] • It indicates the relative importance of a page within a set of web pages

  11. Slide 11 (of 35) MODELING PHISHING URLS • Page Based: • Page Rank distribution for the hostnames of the white list and black list URLs

  12. Slide 12 (of 35) MODELING PHISHING URLS • Domain Based • This category contains only one feature: • whether or not the URL’s domain name can be found in the White Domain Table.

  13. Slide 13 (of 35) MODELING PHISHING URLS • Domain Based • 51.2% of the white list URLs were present in the table • 0.2% of the black list URLs were found in this table.

  14. Slide 14 (of 35) MODELING PHISHING URLS • Type Based • Type I URL • Almost none of the non-phishing (white list) URLs in our training data contain host obfuscation • A significant portion of the phishing URLs are host obfuscated with an IP address. • Type II URL • A portion of the black list URLs are Type II URLs.

  15. Slide 15 (of 35) MODELING PHISHING URLS • Type Based • Distribution of Type I and Type II URLs in the training data

  16. Slide 16 (of 35) MODELING PHISHING URLS • Type Based • Type III URL • We determine the number of characters present after an organization name in the hostname

  17. Slide 17 (of 35) MODELING PHISHING URLS • Type Based • non-phishing URL • http://by124fd.bay124.hotmail.msn.com/cgi-bin/getmsg • 0 characters after msn.com and before the path separator • the maximum number noticed in a white list URL is 14 characters • Type III phishing URLs • 7.34 characters (on average) after the target and before the path separator • a maximum of 63 characters
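
The Type III measurement can be illustrated with a short sketch. This assumes the organization's domain (for example `msn.com`) is already known and that every character following it in the hostname is counted, including the dot; the slides do not say how the organization is identified or whether separators count, and the function name `chars_after_organization` is hypothetical.

```python
from urllib.parse import urlsplit


def chars_after_organization(url, organization):
    """Count hostname characters that follow the organization's domain.

    Returns None when the organization does not appear in the hostname.
    """
    host = (urlsplit(url).hostname or "").lower()
    position = host.find(organization.lower())
    if position == -1:
        return None
    # Everything between the end of the organization name and the path separator.
    return len(host) - (position + len(organization))
```

Applied to the slide's hotmail URL with `msn.com` as the organization, the count is 0. A hypothetical Type III URL such as `http://www.paypal.com.example.net/login` with `paypal.com` as the target has the trailing `.example.net` counted instead.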

  18. Slide 18 (of 35) MODELING PHISHING URLS • Word Based Features • Phishing URLs are found to contain several suggestive word tokens • login and signin are very often found in a phishing URL • We discarded all tokens with length < 5 • these contain several common URL parts such as http:// and www. • We discarded organization name tokens • We further removed query parameters
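
The token filtering steps listed on this slide can be sketched as follows. The slides do not describe the tokenizer, so splitting on non-alphanumeric characters is an assumption, as are the function name `word_tokens` and the `organization_names` parameter.

```python
import re
from urllib.parse import urlsplit


def word_tokens(url, organization_names=frozenset()):
    """Extract candidate word tokens from a URL's host and path."""
    parts = urlsplit(url)
    # Use only host and path, which drops the scheme and the query parameters.
    text = f"{parts.netloc}{parts.path}".lower()
    tokens = re.split(r"[^a-z0-9]+", text)
    # Keep tokens of length >= 5 that are not organization names.
    return [t for t in tokens if len(t) >= 5 and t not in organization_names]
```

With the hypothetical URL `http://www.example.net/paypal/signin/login.php?user=abc` and `{"paypal"}` as the organization names, short parts such as `www` and `php` fall below the length threshold, `paypal` is discarded as an organization name, and `signin` and `login` survive as word features.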

  19. Slide 19 (of 35) MODELING PHISHING URLS • Distribution of these features in our training set

  20. Slide 20 (of 35) MODELING PHISHING URLS • Training With Features • Our labeled data consisted of 2508 URLs • 1245 were phishing URLs • 1263 were benign URLs • Phishing URLs were placed under the positive (true) class • non-phishing ones were under the negative (false) class • 66% of URLs were used for training and the remaining 34% were used as the test set
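
To make the training setup concrete, here is a minimal, standard-library sketch of a randomized 66/34 split followed by logistic regression fitted with plain gradient descent. It is not the authors' implementation: the toy data, the two binary features, the learning rate, and all function names are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def split_train_test(rows, labels, train_fraction=0.66, seed=0):
    """Shuffle the data and split it into training and test portions."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(weights, bias, x):
    """Label 1 (phishing) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(bias + sum(w * v for w, v in zip(weights, x))) >= 0.5 else 0


# Hypothetical toy data: [host_is_ip_address, domain_in_white_table]
rows = [[1, 0]] * 30 + [[0, 0]] * 20 + [[0, 1]] * 30 + [[0, 0]] * 20
labels = [1] * 50 + [0] * 50  # 1 = phishing (positive), 0 = benign (negative)

train_x, train_y, test_x, test_y = split_train_test(rows, labels)
weights, bias = train_logistic_regression(train_x, train_y)
```

On this toy data the weight of the first feature comes out positive (an IP-address host points towards phishing) and the weight of the second comes out negative (a white-listed domain points towards benign). In practice a library implementation of logistic regression would normally replace the hand-written training loop.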

  21. Slide 21 (of 35) MODELING PHISHING URLS • To indicate the relative strength of each feature in identifying a phishing URL, we report the corresponding odds ratios, e^coefficient
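
The odds ratio follows from the standard form of a logistic regression model. The symbols below (p for the probability that a URL is phishing, x_i for the feature values, and β_i for the fitted coefficients) are notation introduced here for illustration and do not appear on the slides.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{i} \beta_i x_i
\qquad\Longrightarrow\qquad
\text{odds ratio of feature } i = e^{\beta_i}
```

Increasing x_i by one unit while holding the other features fixed multiplies the odds p/(1-p) by e^{β_i}, so an odds ratio above 1 marks a feature that points towards phishing and a ratio below 1 marks one that points towards a benign URL.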

  22. Slide 22 (of 35) MODELING PHISHING URLS

  23. Slide 23 (of 35) MODELING PHISHING URLS • Evaluation Result • We evaluated the trained model on the 34% test set split. • We performed our evaluation over multiple runs with randomized partitioning. • This evaluation gave us an average accuracy of 97.31% with • True Positive Rate of 95.8% • False Positive Rate of 1.2%.
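
The three reported measures can be computed from true and predicted labels as in the sketch below. It uses the standard definitions, with phishing as the positive class (1) and benign as the negative class (0); the function name `evaluation_rates` is hypothetical.

```python
def evaluation_rates(true_labels, predicted_labels):
    """Return (accuracy, true positive rate, false positive rate)."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == 1 and predicted == 1:
            tp += 1  # phishing URL correctly flagged
        elif truth == 1:
            fn += 1  # phishing URL missed
        elif predicted == 1:
            fp += 1  # benign URL wrongly flagged
        else:
            tn += 1  # benign URL correctly passed
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    true_positive_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, true_positive_rate, false_positive_rate
```

Averaging these values over several randomized train/test partitions gives figures of the kind quoted on this slide.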

  24. Slide 24 (of 35) ANALYSIS AND FINDINGS • We collected several million URLs from August 20th to August 31st, 2006 • The data consisted of two main components: • unique URLs which are visited each day • consecutive look up requests to these URLs

  25. Slide 25 (of 35) ANALYSIS AND FINDINGS • Average Phishing URLs per day • The average number of phishing URLs which have been visited from Google’s toolbar in a day. • We find that on average there are • 777 phishing URL attacks in a day • 5073 viewers of phishing pages in a day

  26. Slide 26 (of 35) ANALYSIS AND FINDINGS • Average Phishing URLs per day • The distribution of phishing attacks on each day of our study.

  27. Slide 27 (of 35) ANALYSIS AND FINDINGS • Average Phishing URLs per day

  28. Slide 28 (of 35) ANALYSIS AND FINDINGS • Average Phishing URLs per day

  29. Slide 29 (of 35) ANALYSIS AND FINDINGS • Average Potential Phishing Victims per day • Determine how many users interact with a phishing page • A user that has any interaction at a site classified as phishing is regarded as a potential phishing victim.

  30. Slide 30 (of 35) ANALYSIS AND FINDINGS • Average Potential Phishing Victims per day • Based on the number of users who view phishing pages in a day, we can further infer the Potential Success Rate of a phisher as follows:
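
The formula itself did not survive in this transcript. As a hedged sketch only, the conclusion's statement that a percentage of the users who view phishing pages are potential victims suggests a simple ratio of potential victims to viewers. That reading, the function name `potential_success_rate`, and the counts used in the usage note are assumptions, not values taken from the slides.

```python
def potential_success_rate(potential_victims, viewers):
    """Fraction of phishing-page viewers who interacted with the page.

    Assumed definition: potential victims divided by viewers.
    """
    if viewers == 0:
        return 0.0
    return potential_victims / viewers
```

For instance, a hypothetical day with 418 interacting users out of 5073 viewers gives `potential_success_rate(418, 5073)`, a rate a little above 8%.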

  31. Slide 31 (of 35) ANALYSIS AND FINDINGS • Average Potential Phishing Victims per day • The distribution of phishing attacks on each day of our study.

  32. Slide 32 (of 35) ANALYSIS AND FINDINGS • Distribution of Phishing by Organization

  33. Slide 33 (of 35) ANALYSIS AND FINDINGS • Geographical Distribution of Phishing • To determine the country that hosts a particular phishing URL, we used Google’s IP to Geo-Location infrastructure.

  34. Slide 34 (of 35) Anti-Phishing Tools

  35. Slide 35 (of 35) CONCLUSION • We use our features in a logistic regression classifier that achieves a very high accuracy. • One of the major contributions of this work is a large scale measurement study conducted on Google Toolbar URLs • On average we found around 777 unique phishing pages per day, and on average 8.24% of the users who view phishing pages are potential phishing victims
