
Detecting Influenza Outbreaks by Analyzing Twitter Messages

Detecting Influenza Outbreaks by Analyzing Twitter Messages. By Aron Culotta. Jedsada Chartree, 02/28/11. Outline: Introduction, Motivations, Data, Methodology, Results, Conclusion, References. Introduction: growing interest in monitoring disease outbreaks using the Internet.


Presentation Transcript


  1. Detecting Influenza Outbreaks by Analyzing Twitter Messages • By Aron Culotta • Jedsada Chartree • 02/28/11

  2. Outline • Introduction • Motivations • Data • Methodology • Results • Conclusion • References

  3. Introduction • Growing interest in monitoring disease outbreaks using the Internet • The rapid growth of Twitter

  4. Motivations • Developing methods that can reliably track influenza-like illness (ILI) rates in real time.

  5. Data • ILI rates from the U.S. Centers for Disease Control and Prevention (CDC) • Twitter data • A 36-week period from August 29, 2009 to May 8, 2010.

  6. Data • The ILI rates from the CDC’s weekly tracking statistics (09/05/09 to 05/08/10) • The number of Twitter messages collected per week

  7. Methodology • Gathering the ILI rates and Twitter messages • Finding the correlation between the ILI rates and Twitter messages: logit(P) = β1 · logit(Q(W, D)) + β2 + ε • P = the proportion of the population exhibiting ILI symptoms • W = {w1, …, wk} = a set of k keywords • D = the document collection • β1, β2 = the coefficients • ε = the error term • Q(W, D) = the fraction of documents in D that match W, |D_W| / |D| • logit(P) = ln(P / (1 − P))
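
The definitions on slide 7 suggest a simple linear regression in logit space, logit(P) = β1 · logit(Q(W, D)) + β2 + ε. The following is a minimal sketch of that idea, not the author's implementation. It assumes that a document "matches" W when it contains at least one keyword after naive whitespace tokenization, and the weekly values are made-up illustrative numbers, not data from the study.

```python
import math

def logit(p):
    """logit(p) = ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a real number back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def query_fraction(documents, keywords):
    """Q(W, D): fraction of documents containing at least one keyword in W.

    Assumption: matching is done on lowercase whitespace-separated tokens.
    """
    matched = sum(
        1 for doc in documents
        if any(k in doc.lower().split() for k in keywords)
    )
    return matched / len(documents)

def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = b1 * x + b2; returns (b1, b2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b2 = mean_y - b1 * mean_x
    return b1, b2

# Hypothetical weekly query fractions and ILI rates (illustration only).
weekly_q = [0.002, 0.004, 0.008, 0.005]
weekly_ili = [0.010, 0.020, 0.040, 0.025]

b1, b2 = fit_simple_regression(
    [logit(q) for q in weekly_q],
    [logit(p) for p in weekly_ili],
)

# Predict the ILI rate for a new week from its query fraction.
predicted_ili = inv_logit(b1 * logit(0.006) + b2)
print(f"b1={b1:.3f}, b2={b2:.3f}, predicted ILI rate={predicted_ili:.4f}")
```

Working in logit space keeps the predicted proportion inside (0, 1) after the inverse transform, which a plain linear fit on raw proportions would not guarantee.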

  8. Methodology • Filtering spurious matches (noise) • The number of messages containing the keyword “flu” together with each of several keywords that might lead to spurious correlations.

  9. Methodology • Filtering spurious matches by supervised learning - Training a document classifier using logistic regression
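
As a rough illustration of training a document classifier with logistic regression, the sketch below fits a bag-of-words model by stochastic gradient ascent using only the standard library. The training messages, labels, learning rate, and epoch count are all hypothetical choices for illustration; the slides do not specify the features or the training procedure.

```python
import math

def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption of this sketch)."""
    return set(text.lower().split())

def sigmoid(z):
    """Logistic function, clamped to avoid overflow in math.exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(docs, labels, epochs=200, lr=0.5):
    """Fit binary bag-of-words logistic regression by stochastic gradient ascent."""
    weights = {}
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            feats = tokenize(text)
            p = sigmoid(bias + sum(weights.get(w, 0.0) for w in feats))
            err = y - p  # gradient of the log-likelihood w.r.t. the score
            bias += lr * err
            for w in feats:
                weights[w] = weights.get(w, 0.0) + lr * err
    return weights, bias

def predict_proba(text, weights, bias):
    """Estimated probability that a message reports an actual ILI case."""
    return sigmoid(bias + sum(weights.get(w, 0.0) for w in tokenize(text)))

# Hypothetical labeled messages: 1 = reports illness, 0 = spurious match.
train_docs = [
    "i have the flu and a fever",
    "home sick with flu",
    "got my flu shot today",
    "swine flu vaccine news",
    "flu knocked me out sick",
    "flu shot clinic open",
]
train_labels = [1, 1, 0, 0, 1, 0]

weights, bias = train_logistic(train_docs, train_labels)
print(predict_proba("home sick with flu and a fever", weights, bias))
print(predict_proba("flu shot clinic open today", weights, bias))
```

In practice a library implementation with regularization and a much larger labeled set would be preferable; the point here is only to show what the classifier consumes and produces.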

  10. Methodology • Filtering spurious matches by supervised learning - Combining filtering with regression 1. Soft classifier

  11. Methodology • Filtering spurious matches by supervised learning - Combining filtering with regression 2. Hard classifier • Applying both classifiers to the simple linear model.
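
One plausible reading of the soft and hard variants is sketched below. It assumes that the soft version sums the classifier's positive-class probabilities over the keyword-matched messages, that the hard version counts matched messages whose probability exceeds 0.5, and that both divide by the total number of messages |D|. The probabilities and the total are hypothetical.

```python
def soft_query_fraction(probabilities, total_docs):
    """Expected fraction of messages that are true ILI reports.

    `probabilities` holds the classifier's positive-class probability for
    each keyword-matched message; `total_docs` is |D|.
    """
    return sum(probabilities) / total_docs

def hard_query_fraction(probabilities, total_docs, threshold=0.5):
    """Fraction of messages whose predicted label is positive."""
    return sum(1 for p in probabilities if p > threshold) / total_docs

# Hypothetical classifier outputs for five matched messages out of 1,000 total.
matched_probs = [0.9, 0.8, 0.3, 0.6, 0.1]
total = 1000

print(soft_query_fraction(matched_probs, total))
print(hard_query_fraction(matched_probs, total))
```

Either fraction can then replace Q(W, D) as the input to the same logit regression, which is how the filtering and the regression combine.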

  12. Methodology • Evaluating false alarms by simulation - Sample 1,000 messages deemed to be spurious. - Sample with replacement an increasing number of the spurious messages and add them to the original message set. - Use the same trained regression models.
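
The simulation steps on slide 12 can be sketched as follows. This is an assumption-laden toy version: the message sets are tiny and invented, and `toy_filter` is a hypothetical stand-in for the trained classifier. It shows how injecting spurious messages inflates the unfiltered query fraction while a filter keeps the estimate from rising.

```python
import random

def has_keyword(message, keyword="flu"):
    """Naive keyword match on lowercase whitespace tokens."""
    return keyword in message.lower().split()

def keyword_fraction(messages, keyword="flu"):
    """Unfiltered query fraction: share of messages containing the keyword."""
    return sum(1 for m in messages if has_keyword(m, keyword)) / len(messages)

def filtered_fraction(messages, keep, keyword="flu"):
    """Query fraction counting only matches that the filter accepts."""
    hits = sum(1 for m in messages if has_keyword(m, keyword) and keep(m))
    return hits / len(messages)

def inject_spurious(original, spurious_pool, n_added, rng):
    """Sample n_added spurious messages with replacement and append them."""
    return original + rng.choices(spurious_pool, k=n_added)

def toy_filter(message):
    """Hypothetical stand-in for a trained classifier's hard decision."""
    tokens = message.lower().split()
    return "shot" not in tokens and "vaccine" not in tokens

# Invented data: 10 genuine reports among 100 messages, plus a spurious pool.
original = ["i have the flu"] * 10 + ["nice weather today"] * 90
spurious_pool = ["flu shot today", "swine flu vaccine update"]

rng = random.Random(0)  # fixed seed so the run is repeatable
for n_added in (0, 50, 100):
    augmented = inject_spurious(original, spurious_pool, n_added, rng)
    print(n_added,
          round(keyword_fraction(augmented), 3),
          round(filtered_fraction(augmented, toy_filter), 3))
```

Feeding both fractions through the same trained regression model, as the slide describes, would then show how far each variant's ILI estimate drifts as more false messages are added.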

  13. Results Fitted and predicted ILI rates using regression over query fractions of Twitter messages

  14. Results Fitted and predicted ILI rates using regression over query fractions of Twitter messages

  15. Results Correlation results with refinements of the flu query

  16. Results Correlation results with refinements of the flu query

  17. Results

  18. Results • Number of false messages added

  19. Conclusion • The proposed method can be used to track influenza rates from Twitter messages. • The proposed false-alarm evaluation performs satisfactorily.

  20. References • Aron Culotta. 2010. Detecting influenza outbreaks by analyzing Twitter messages. • Jeremy Ginsberg et al. 2009. Detecting influenza epidemics using search engine query data.
