CS 548 – Project 3

Presentation Transcript


  1. CS 548 – Project 3: Association Rules
     Skyler Whorton – March 29, 2012

  2. Correlation coefficient
     • Symmetric measure of correlation
     • Compute a contingency table with support counts:

                   B       not B
           A       f11     f10      f1x
           not A   f01     f00      f0x
                   fx1     fx0      N

     • Use the formula (phi coefficient):
           $\phi = \frac{f_{11} f_{00} - f_{01} f_{10}}{\sqrt{f_{1x}\, f_{x1}\, f_{0x}\, f_{x0}}}$
     • Weka code in AprioriItemSet.java:

           public double correlationForRule(AprioriItemSet premise, AprioriItemSet consequence,
                                            int premiseCount, int consequenceCount) {
               // Compute contingency table entries
               double N = (double) m_totalTransactions;
               double f11 = (double) m_counter;        // support count of "A and B" (this item set)
               double f1x = (double) premiseCount;     // support count of A
               double fx1 = (double) consequenceCount; // support count of B
               double f0x = N - f1x;
               double fx0 = N - fx1;
               double f10 = f1x - f11;
               double f01 = fx1 - f11;
               // Support count of “not A and not B”
               double f00 = fx0 - f10;
               // Calculate ratio numerator and denominator
               double num = f11 * f00 - f01 * f10;
               double denom = Math.sqrt(f1x * fx1 * f0x * fx0);
               // Return ratio (NaN when A or B occurs in every transaction or in none)
               return num / denom;
           }
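The Weka method relies on Weka's internal fields (m_totalTransactions, m_counter). Below is a minimal standalone sketch of the same phi computation from four support counts, so it can be checked by hand. The class name PhiCoefficient and the explicit NaN guard are illustrative assumptions, not part of Weka. With hypothetical counts N = 100, |A| = 25, |B| = 40 and |A and B| = 20, the table gives f10 = 5, f01 = 20 and f00 = 55, so phi = (20·55 − 20·5) / √(25·40·75·60) = 1000 / √4,500,000 ≈ 0.471.

```java
/**
 * Standalone sketch of the phi (correlation) coefficient for a rule A -> B,
 * using the same contingency-table arithmetic as correlationForRule.
 * PhiCoefficient is a hypothetical helper class, not a Weka class.
 */
public final class PhiCoefficient {

    private PhiCoefficient() {}

    /**
     * @param n   total number of transactions (N)
     * @param f11 transactions containing both A and B
     * @param f1x transactions containing A
     * @param fx1 transactions containing B
     * @return phi in [-1, 1], or NaN if A or B occurs in every transaction or in none
     */
    public static double phi(double n, double f11, double f1x, double fx1) {
        double f0x = n - f1x;   // transactions without A
        double fx0 = n - fx1;   // transactions without B
        double f10 = f1x - f11; // A but not B
        double f01 = fx1 - f11; // B but not A
        double f00 = fx0 - f10; // neither A nor B
        double denom = Math.sqrt(f1x * fx1 * f0x * fx0);
        if (denom == 0.0) {
            return Double.NaN;  // a marginal count is 0 or N, so phi is undefined
        }
        return (f11 * f00 - f01 * f10) / denom;
    }

    public static void main(String[] args) {
        // Hypothetical counts: N = 100, |A| = 25, |B| = 40, |A and B| = 20
        System.out.println(phi(100, 20, 25, 40)); // about 0.4714 (= sqrt(2) / 3)
        // Independent items: 20 = 50 * 40 / 100, so phi is 0
        System.out.println(phi(100, 20, 50, 40)); // 0.0
    }
}
```

Because phi is symmetric in A and B, the same value is reported for A -> B and B -> A, which is what "symmetric measure" on the slide refers to.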

  3. College data
     • Pre-processing:
       • Equal-frequency discretization into 3 bins, “Lo,” “Med,” “Hi”
       • Binarize into item-type attributes
       • Remove id, name, state
     • Objectives:
       • Which groups of features are highly associated?
       • Which are associated with high tuition costs?
       • How do trends differ between public and private schools?
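The discretize-then-binarize steps can be sketched as below. This is a hand-rolled approximation that assumes cut points at the 1/3 and 2/3 ranks of the sorted values; Weka's own Discretize filter (with equal-frequency binning enabled) may place its cut points differently. The class and method names are hypothetical, and the tuition values in main are made up for illustration, not taken from the college data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: equal-frequency binning into Lo/Med/Hi, then one item per row. */
public final class EqualFrequencyBinning {

    static final String[] LABELS = {"Lo", "Med", "Hi"};

    private EqualFrequencyBinning() {}

    /** Returns the bin label ("Lo", "Med" or "Hi") for every value, in input order. */
    public static String[] discretize(double[] values) {
        int n = values.length;
        if (n == 0) {
            return new String[0];
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Cut points at the 1/3 and 2/3 ranks. Tied values always share a bin,
        // so bin sizes are only approximately equal when ties occur.
        double cut1 = sorted[n / 3];
        double cut2 = sorted[2 * n / 3];
        String[] labels = new String[n];
        for (int i = 0; i < n; i++) {
            double v = values[i];
            labels[i] = v < cut1 ? LABELS[0] : (v < cut2 ? LABELS[1] : LABELS[2]);
        }
        return labels;
    }

    /** Binarizes one attribute: each row becomes the single item attribute + bin label. */
    public static List<String> toItems(String attribute, double[] values) {
        List<String> items = new ArrayList<>();
        for (String label : discretize(values)) {
            items.add(attribute + label);
        }
        return items;
    }

    public static void main(String[] args) {
        double[] tuition = {4000, 9000, 12000, 18000, 25000, 31000}; // hypothetical values
        System.out.println(toItems("inStateTuition", tuition));
        // [inStateTuitionLo, inStateTuitionLo, inStateTuitionMed, inStateTuitionMed, inStateTuitionHi, inStateTuitionHi]
    }
}
```

Item names built this way (attribute name plus bin label) match the style of the items in the mined rules, such as inStateTuitionHi.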

  4. College data: CAR (class association) rules
     • inStateTuitionHi, stuFacRatioLo → priv
     • numFtUndergradHi, inStateTuitionLo, pctAlumniGiveLo → pub
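In a class association rule the consequent is restricted to the class item (here priv or pub); Weka's Apriori offers a "car" option for mining rules of this form. The sketch below, which is an illustration rather than Weka code, computes the support and confidence of one such rule over binarized rows. The class name ClassRuleStats is hypothetical, and the four rows in main are invented to make the arithmetic easy to follow (premise in 3 rows, premise with priv in 2, so confidence 2/3 and support 2/4).

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: support and confidence of a class association rule premise -> class. */
public final class ClassRuleStats {

    private ClassRuleStats() {}

    /** Number of rows containing every item in the given set. */
    public static int count(List<Set<String>> rows, Set<String> items) {
        int c = 0;
        for (Set<String> row : rows) {
            if (row.containsAll(items)) {
                c++;
            }
        }
        return c;
    }

    /** Number of rows containing the whole premise and the class label. */
    private static int countWithClass(List<Set<String>> rows, Set<String> premise, String classLabel) {
        int c = 0;
        for (Set<String> row : rows) {
            if (row.containsAll(premise) && row.contains(classLabel)) {
                c++;
            }
        }
        return c;
    }

    /** Support: fraction of all rows that contain the premise and the class label. */
    public static double support(List<Set<String>> rows, Set<String> premise, String classLabel) {
        return rows.isEmpty() ? 0.0 : (double) countWithClass(rows, premise, classLabel) / rows.size();
    }

    /** Confidence: rows with premise and class, divided by rows with the premise. */
    public static double confidence(List<Set<String>> rows, Set<String> premise, String classLabel) {
        int premiseCount = count(rows, premise);
        return premiseCount == 0 ? 0.0 : (double) countWithClass(rows, premise, classLabel) / premiseCount;
    }

    public static void main(String[] args) {
        // Hypothetical binarized rows, one per college; the class item is "priv" or "pub"
        List<Set<String>> rows = List.of(
                Set.of("inStateTuitionHi", "stuFacRatioLo", "priv"),
                Set.of("inStateTuitionHi", "stuFacRatioLo", "priv"),
                Set.of("inStateTuitionHi", "stuFacRatioLo", "pub"),
                Set.of("inStateTuitionLo", "stuFacRatioHi", "pub"));
        Set<String> premise = Set.of("inStateTuitionHi", "stuFacRatioLo");
        System.out.println(support(rows, premise, "priv"));    // 0.5
        System.out.println(confidence(rows, premise, "priv")); // 0.6666666666666666
    }
}
```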

  5. ASSISTments data
     • Dataset of 241 teachers, 1,500 problem sets, 1M logs
     • Can I make Netflix/Amazon-style recommendations based on these problem set data? (No.)
       • Logs too sparse: only ~4,000 items total
       • On average, a teacher's transaction contains only ~1% of the items
       • Highest-supported rule: its premise occurs only 31 times
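One way to put a number on "sparse" is density: the average transaction width divided by the number of distinct items. Assuming the ~4,000 items on the slide count every teacher/problem-set pair, and that the 1,500 problem sets are the distinct items, density is roughly 4,000 / (241 × 1,500) ≈ 0.011, in line with the ~1% width quoted above. The sketch below is a hypothetical helper; note that it measures the item universe from the data itself, so items that never occur are not counted.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: average fraction of the item universe present in one transaction. */
public final class Sparsity {

    private Sparsity() {}

    /** Total item occurrences / (number of transactions * number of distinct items seen). */
    public static double density(List<Set<String>> transactions) {
        Set<String> universe = new HashSet<>();
        long occurrences = 0;
        for (Set<String> t : transactions) {
            universe.addAll(t);
            occurrences += t.size();
        }
        if (transactions.isEmpty() || universe.isEmpty()) {
            return 0.0;
        }
        return (double) occurrences / ((double) transactions.size() * universe.size());
    }

    public static void main(String[] args) {
        // Slide figures under the assumed interpretation: 4,000 / (241 * 1,500)
        System.out.println(4000.0 / (241 * 1500)); // about 0.011
        // Hypothetical teacher transactions over problem sets ps1..ps4
        List<Set<String>> toy = List.of(Set.of("ps1", "ps2"), Set.of("ps3"), Set.of("ps1", "ps4"));
        System.out.println(density(toy)); // 5 / (3 * 4), about 0.4167
    }
}
```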

  6. ASSISTments data

  7. ASSISTments data
     • Findings
       • Problem set associations
         • “Evaluating Expressions” → “Equation Solving (1)”, etc.
         • Mined associations are highly confident
         • Not enough data to make many recommendations
       • Wide, sparse data set
         • Use leverage and lift to your advantage
       • Few highly supported itemsets
         • Teachers assigning similar content
         • Similar account creation date, and/or
         • Similar school e-mail domains
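On a wide, sparse data set, confidence alone can be misleading, so lift and leverage compare a rule against what independence would predict: lift = P(A and B) / (P(A) P(B)), where 1 means independence, and leverage = P(A and B) − P(A) P(B), where 0 means independence. Lift tends to be large for rare items, while leverage also reflects how often the rule applies. Weka's Apriori can rank rules by lift or leverage instead of confidence through its metric type setting. The helper below is a hypothetical sketch from support counts; with made-up counts N = 100, |A| = 25, |B| = 40 and |A and B| = 20, confidence is 0.8, lift is 2.0 and leverage is 0.1.

```java
/** Hypothetical sketch: rule interestingness measures from support counts. */
public final class RuleMetrics {

    private RuleMetrics() {}

    /** confidence = count(A and B) / count(A). */
    public static double confidence(double f11, double f1x) {
        return f11 / f1x;
    }

    /** lift = P(A and B) / (P(A) P(B)) = f11 * N / (f1x * fx1); 1 means independence. */
    public static double lift(double n, double f11, double f1x, double fx1) {
        return (f11 * n) / (f1x * fx1);
    }

    /** leverage = P(A and B) - P(A) P(B); 0 means independence. */
    public static double leverage(double n, double f11, double f1x, double fx1) {
        return f11 / n - (f1x / n) * (fx1 / n);
    }

    public static void main(String[] args) {
        // Hypothetical counts: N = 100, |A| = 25, |B| = 40, |A and B| = 20
        System.out.println(confidence(20, 25));        // 0.8
        System.out.println(lift(100, 20, 25, 40));     // 2.0
        System.out.println(leverage(100, 20, 25, 40)); // about 0.1
    }
}
```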
