Feature Engineering Studio September 9, 2013
Welcome to Problem Proposal Day • Rules for Presenters • Rules for the Rest of the Class
Rules for Presenters • Talk for 3 minutes on: • Data set • What variable will you predict? • What kind of variables will you use to predict it? • Why is this worth doing? • Remember to send me your slides (if any)
Rules for Audience • After the presentation • Ask quick questions • Give quick suggestions
Criteria • Everyone • Is the problem genuinely important? (usable or publishable) • Is there a good measure of ground truth? • Only if you know what you’re talking about • Is there rich enough data to distill meaningful features? • Is there enough data to be able to take advantage of data mining?
Rules for Audience • Be polite! • No interrupting • No rambling • No being mean
First Step • Get into the right collaborative spirit • You are officially encouraged (though not required) to sing along • http://www.youtube.com/watch?v=pd_5-2kCzfs • 0:25
Presentations • Alphabetical Order Based on Last Name • Tie-Breaker: First Name
For next week • Think about how to improve your problem proposal • Rewrite your problem proposal based on the feedback you got today • Then email it to me for further feedback and a “thumbs-up” before the next class
Assignment 2 • Data Familiarization: “Mucking Around” • Get your data set • Open it in Excel • Look at your ground truth label (if you have one) • Look at other key variables • What does each variable mean semantically? • If numerical, what are its max, min, average, stdev? Create histograms of key variables. • If categorical, what is the distribution of each value?
Assignment 2 • Data Familiarization: “Mucking Around” • Write a brief report for me • You don’t need to prepare a presentation • But be ready to discuss what you learn about your data
What if you don’t have data yet? • Get your data • If you can’t get your data before class, email me at least 48 hours before class and I’ll send you a practice data set
How to compute in Excel • If numerical, what are its max, min, average, stdev? • If categorical, what is the distribution of each value? • Using Class2Data
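For anyone who wants to cross-check their Excel numbers outside the spreadsheet, the same summaries can be sketched in Python. This is only an illustration, not the in-class Excel walkthrough: the column names and values below are hypothetical stand-ins (Class2Data is not reproduced here), and the helper names `summarize_numeric` and `summarize_categorical` are made up for this sketch. The standard deviation is the sample version, which is what Excel's STDEV also computes.

```python
import statistics
from collections import Counter

# Hypothetical sample columns; these are NOT taken from Class2Data.
time_on_task = [12.0, 7.5, 30.0, 18.5, 7.5, 22.0]
action_type = ["hint", "attempt", "attempt", "hint", "attempt", "skip"]


def summarize_numeric(values):
    """Return max, min, average, and sample standard deviation."""
    return {
        "max": max(values),
        "min": min(values),
        "average": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample stdev (n - 1 denominator)
    }


def summarize_categorical(values):
    """Return {category: (count, share of all rows)}."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}


numeric_summary = summarize_numeric(time_on_task)
categorical_summary = summarize_categorical(action_type)

for name, value in numeric_summary.items():
    print(f"{name}: {value:.2f}")
for category, (n, share) in sorted(categorical_summary.items()):
    print(f"{category}: {n} rows ({share:.0%})")
```

If the Python and Excel numbers disagree, the usual suspects are blank cells, text stored in a numeric column, or a population versus sample standard deviation.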
How to do a histogram in Excel • Using Class2Data
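The idea behind a histogram (pick bin edges, then count how many values fall in each bin) can also be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the data values are hypothetical rather than taken from Class2Data, the bins are fixed-width and closed on the left, and the function name `histogram` and its parameters are invented for this example.

```python
def histogram(values, bin_width, start):
    """Count values in fixed-width bins [lo, lo + bin_width).

    Assumes start <= min(values); returns a list of (bin_lower_edge, count).
    """
    if start > min(values):
        raise ValueError("start must not exceed the smallest value")
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return [(start + i * bin_width, counts[i]) for i in range(n_bins)]


# Hypothetical values; NOT taken from Class2Data.
time_on_task = [12.0, 7.5, 30.0, 18.5, 7.5, 22.0]
bins = histogram(time_on_task, bin_width=10, start=0)

# Print a crude text histogram, one row per bin.
for lower_edge, count in bins:
    print(f"{lower_edge:>4} | {'#' * count}")
```

Trying a few different bin widths is worthwhile: a distribution that looks smooth with wide bins can show gaps or spikes with narrow ones.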
Next Class • 9/23 Feature distillation in Excel (Asgn. 2 due) • Do the assignment • Read the readings
Upcoming Classes • 9/23 Feature distillation in Excel (Asgn. 2 due) • 9/25 Special session on prediction models • Come to this if you don’t know why student-level cross-validation is important, or if you don’t know what J48 is • 9/30 Advanced feature distillation in Excel (Asgn. 3 due) • 10/2 Special session on RapidMiner • Come to this if you’ve never built a classifier or regressor in RapidMiner (or a similar tool) • Statistical significance tests using linear regression don’t count…
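As a preview of the student-level cross-validation mentioned for the 9/25 session, here is a minimal sketch of the core idea: every row belonging to a given student goes into the same fold, so a model is never tested on a student it has already seen during training. The student IDs, the function names, and the round-robin fold assignment are all assumptions made for illustration; a real analysis would normally shuffle students before assigning folds, and this is not presented as the method used in class or in RapidMiner.

```python
def student_level_folds(student_ids, k):
    """Assign each row a fold number so all rows of one student share a fold."""
    students = sorted(set(student_ids))
    # Round-robin assignment of students (not rows) to folds; a simplification.
    fold_of_student = {s: i % k for i, s in enumerate(students)}
    return [fold_of_student[s] for s in student_ids]


def train_test_indices(row_folds, test_fold):
    """Split row indices into training rows and held-out test rows."""
    train = [i for i, f in enumerate(row_folds) if f != test_fold]
    test = [i for i, f in enumerate(row_folds) if f == test_fold]
    return train, test


# Hypothetical log: one entry per row, giving the student who produced it.
student_ids = ["s1", "s1", "s2", "s3", "s2", "s4", "s3", "s1"]
row_folds = student_level_folds(student_ids, k=2)
train_rows, test_rows = train_test_indices(row_folds, test_fold=0)

train_students = {student_ids[i] for i in train_rows}
test_students = {student_ids[i] for i in test_rows}
print("train students:", sorted(train_students))
print("test students:", sorted(test_students))
```

With ordinary row-level folds, rows from the same student would typically land on both sides of the split, so the reported performance would say little about how the model behaves on new students.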