
A Perspective on the Data



Presentation Transcript


  1. A Perspective on the Data
     Ajit Paul Singh, M.Sc. Candidate, Dept. of Computing Science, University of Alberta

  2. Machine Learning
     • Systems that use experience to improve at a given task
     • Data as experience
     • Supervised vs. unsupervised learning
     SNP focus: supervised learning

  3. The Running Example

  4. Data Assumptions
     • Samples are independent and identically distributed (IID)
     • Dealing with patients/tuples
     • One set → complex distribution → more training data
     • Split into subsets → many simpler distributions → less training data per problem
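
The trade-off on this slide can be illustrated with a toy data set. The sketch below assumes a hypothetical `Patient` record (a tuple of SNP genotypes coded 0/1/2, a hypothetical `subgroup` attribute, and a case/control label) and shows how splitting one training set into per-subgroup subsets leaves each subproblem with fewer tuples to learn from. The records are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record: one patient = one tuple of SNP genotypes plus a label.
# Genotypes are assumed to be coded 0/1/2 (copies of the minor allele).
@dataclass(frozen=True)
class Patient:
    snps: tuple[int, ...]
    subgroup: str   # hypothetical attribute used to split the data
    label: int      # 1 = case, 0 = control

def split_by_subgroup(patients):
    """Partition patients into one training set per subgroup."""
    groups = defaultdict(list)
    for p in patients:
        groups[p.subgroup].append(p)
    return dict(groups)

patients = [
    Patient((0, 1, 2), "A", 1),
    Patient((1, 1, 0), "A", 0),
    Patient((2, 0, 1), "B", 1),
    Patient((0, 0, 0), "B", 0),
    Patient((1, 2, 2), "C", 1),
    Patient((0, 1, 1), "C", 0),
]

subsets = split_by_subgroup(patients)
print("one model:", len(patients), "training tuples")
for name, group in sorted(subsets.items()):
    print(f"subgroup {name}:", len(group), "training tuples")
```

With six patients in three subgroups, a single model trains on all six tuples, while each per-subgroup model sees only two: the simpler distributions come at the cost of less data per problem.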

  5. Defining the Task
     • Predictive
        • Diagnosing members of the public
           • Rare class issue
        • Diagnosing clinic referrals
           • Is the training set representative of the patients who will be tested?
        • Subtyping cancer patients
     • Feature selection
        • Find interesting SNPs for further study
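
The rare-class issue, and the question of whether the training set is representative, can be made concrete with Bayes' rule. A classifier with fixed sensitivity and specificity yields very different positive predictive values (the chance that a positive call is a true case) depending on how common the condition is in the population being tested. The sensitivity, specificity and prevalence values below are illustrative assumptions, not figures from the presentation.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(case | positive call) by Bayes' rule:
    PPV = sens * p / (sens * p + (1 - spec) * (1 - p))."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative numbers only (assumed, not from the presentation).
sens, spec = 0.90, 0.90
for setting, prevalence in [("general public", 0.01), ("clinic referrals", 0.30)]:
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"{setting:>16}: prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

With these assumed numbers, a positive call is correct only about 8% of the time in the general public but roughly 79% of the time among clinic referrals, so a classifier trained and evaluated on one population may behave very differently on the other.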

  6. Measuring Improvement
     • Competitors
        • Human experts using clinical data
        • Diagnostic tests (e.g., BRCA1 truncations)
        • Other learners using genetic markers
     • Benefits of Polyomx
        • Accuracy, cost, speed
     • Need for a baseline to compare against
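
A natural baseline for any learner is the trivial classifier that always predicts the most frequent class. The sketch below compares a learner's accuracy against that baseline on a hypothetical, imbalanced test set; the labels and predictions are invented for illustration.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical test set: 95 controls (0) followed by 5 cases (1).
labels = [0] * 95 + [1] * 5
# Hypothetical learner output: 2 false alarms among the controls,
# then 3 of the 5 cases found and 2 missed.
predictions = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2

baseline = majority_baseline_accuracy(labels)
learner = accuracy(predictions, labels)
print(f"baseline accuracy: {baseline:.2f}")
print(f"learner accuracy:  {learner:.2f}")
```

Here the learner's 96% accuracy looks strong in isolation, yet it beats the always-"control" baseline by only one percentage point, which is why a baseline is needed before claiming improvement.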

  7. Issues to Consider
     • Missing data
     • Negative control features

  8. Types of Missing Data
     • Missing Completely At Random (MCAR)
     • Missing At Random (MAR)
     • Censored
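
The three missingness mechanisms can be told apart in a small simulation. The sketch below assumes a hypothetical lab value `y` that rises with an always-observed `age`, then hides values (1) completely at random, (2) at random given age, and (3) by censoring at an assumed detection limit. Comparing the naive complete-case mean with the true mean shows why the mechanism matters: dropping missing rows leaves the mean unbiased under MCAR but biased under MAR, and treating censored values as exact understates large values. All distributions, rates and the limit are made-up assumptions.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 1000

# Hypothetical data: age is always observed; y is a lab value that rises with age.
age = [rng.uniform(20, 80) for _ in range(n)]
y = [0.5 * a + rng.gauss(0, 5) for a in age]

# MCAR: each y is missing with the same 20% probability, independent of everything.
y_mcar = [None if rng.random() < 0.2 else v for v in y]

# MAR: missingness depends only on the observed age (patients over 60 are assumed
# to skip the test half the time), not on the unobserved y itself.
y_mar = [None if a > 60 and rng.random() < 0.5 else v for a, v in zip(age, y)]

# Censored: values above an assumed detection limit are recorded as the limit,
# together with a flag meaning "the true value is at least this large".
LIMIT = 35.0
y_cens = [(min(v, LIMIT), v >= LIMIT) for v in y]

def complete_case_mean(values):
    """Mean over observed entries only (the naive way to ignore missing data)."""
    return mean(v for v in values if v is not None)

true_mean = mean(y)
print(f"true mean          : {true_mean:.2f}")
print(f"MCAR complete-case : {complete_case_mean(y_mcar):.2f}")
print(f"MAR complete-case  : {complete_case_mean(y_mar):.2f}")
print(f"censored as exact  : {mean(v for v, _ in y_cens):.2f}")
```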

  9. Negative Control Features
     • SNPs were hand-selected
     • Feature selection problem
        • Measuring relevance of selected features
     • Prediction problem
        • Ensuring the learner is robust
     • Add negative control features
        • Features that are probably irrelevant
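
One simple way to use negative control features is sketched below under assumed data: append features that are independent of the label by construction, score every feature with the same relevance measure, and treat a hand-selected SNP as informative only if it outscores the best negative control. The relevance score (absolute difference in mean genotype between cases and controls) is a swapped-in choice for illustration, not the presentation's method, and all allele frequencies and counts are hypothetical.

```python
import random

rng = random.Random(42)   # fixed seed so the example is repeatable
N_PATIENTS = 400
N_SNPS, N_CONTROLS = 3, 20   # hypothetical: 3 hand-selected SNPs, 20 negative controls

def genotype(minor_allele_freq):
    """Number of minor alleles (0, 1 or 2) drawn for one individual."""
    return (rng.random() < minor_allele_freq) + (rng.random() < minor_allele_freq)

rows, labels = [], []
for _ in range(N_PATIENTS):
    y = int(rng.random() < 0.5)
    # Assumed effect: selected SNPs have allele frequency 0.5 in cases, 0.2 in controls.
    snps = [genotype(0.5 if y else 0.2) for _ in range(N_SNPS)]
    # Negative controls: same distribution for everyone, so irrelevant by design.
    controls = [genotype(0.3) for _ in range(N_CONTROLS)]
    rows.append(snps + controls)
    labels.append(y)

def relevance(j):
    """Absolute difference in mean genotype between cases and controls for feature j."""
    cases = [r[j] for r, y in zip(rows, labels) if y == 1]
    ctrls = [r[j] for r, y in zip(rows, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls))

scores = [relevance(j) for j in range(N_SNPS + N_CONTROLS)]
threshold = max(scores[N_SNPS:])  # best score achieved by pure noise
for j in range(N_SNPS):
    verdict = "kept" if scores[j] > threshold else "indistinguishable from noise"
    print(f"SNP {j}: score={scores[j]:.3f} ({verdict})")
print(f"best negative control score: {threshold:.3f}")
```

Using the best negative control as the cut-off means the threshold adapts to how much a purely irrelevant feature can score by chance on this particular sample size.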
