Learning Probabilistic Relational Models
Nir Friedman (Hebrew University, nir@cs.huji.ac.il), Lise Getoor (Stanford University, getoor@cs.stanford.edu), Daphne Koller (Stanford University, koller@cs.stanford.edu), Avi Pfeffer (Stanford University, avi@cs.stanford.edu)
Learning from Relational Data
• Data sources
  • relational and object-oriented databases
  • frame-based knowledge bases
  • the World Wide Web
• Traditional approaches
  • work well with flat representations: fixed-length attribute-value vectors
  • assume IID samples
• Problems
  • attributes must be fixed in advance, so only a limited set of structures can be represented
  • the IID assumption may not hold
Our Approach
• Probabilistic Relational Models (PRMs)
  • a rich representation language that models relational dependencies and probabilistic dependencies
• Learning PRMs from data stored in relational databases
  • parameter estimation
  • model selection
Outline
• Motivation
• Probabilistic relational models
  • Probabilistic logic programming [Poole 1993; Ngo & Haddawy 1994]
  • Probabilistic object-oriented knowledge [Koller & Pfeffer 1997, 1998; Koller, Levy & Pfeffer 1997]
• Learning PRMs
• Experimental results
• Conclusions
Probabilistic Relational Models
• Combine the advantages of predicate logic and Bayesian networks (BNs):
  • natural domain modeling: objects, properties, relations
  • generalization over a variety of situations
  • compact, natural probability models
• Integrate uncertainty with the relational model:
  • properties of domain entities can depend on properties of related entities
  • uncertainty over the relational structure of the domain
Relational Schema
• Describes the types of objects and relations in the database
• Classes and their attributes:
  • Professor: Popularity, Teaching-Ability, Stress-Level
  • Student: Intelligence, Performance
  • Course: Difficulty, Rating
  • Registration: Grade, Satisfaction
• Relationships:
  • Teach (Professor, Course)
  • Take (Student, Registration)
  • In (Registration, Course)
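A relational schema is a set of classes, each with descriptive attributes, plus relationships between classes. The sketch below encodes the school schema as plain Python data; the names `RelClass`, `Relationship` and `Schema` are hypothetical, and reading Teach, Take and In as linking (Professor, Course), (Student, Registration) and (Registration, Course) is an assumption about the original figure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelClass:
    """A class of the relational schema with its descriptive attributes."""
    name: str
    attributes: tuple


@dataclass(frozen=True)
class Relationship:
    """A binary relationship between two classes (hypothetical encoding)."""
    name: str
    source: str
    target: str


@dataclass
class Schema:
    classes: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_class(self, name, *attributes):
        self.classes[name] = RelClass(name, tuple(attributes))

    def add_relationship(self, name, source, target):
        # Both endpoints must already be declared classes.
        if source not in self.classes or target not in self.classes:
            raise KeyError(f"unknown class in relationship {name}")
        self.relationships.append(Relationship(name, source, target))

    def neighbors(self, cls):
        """Classes reachable from cls through one relationship, in either direction."""
        out = set()
        for r in self.relationships:
            if r.source == cls:
                out.add(r.target)
            if r.target == cls:
                out.add(r.source)
        return out


school = Schema()
school.add_class("Professor", "Popularity", "Teaching-Ability", "Stress-Level")
school.add_class("Student", "Intelligence", "Performance")
school.add_class("Course", "Difficulty", "Rating")
school.add_class("Registration", "Grade", "Satisfaction")
school.add_relationship("Teach", "Professor", "Course")
school.add_relationship("Take", "Student", "Registration")
school.add_relationship("In", "Registration", "Course")
```

`school.neighbors("Registration")` returns the classes one relationship away, which is the notion of "neighboring classes" that the phased structure search relies on later.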
Example instance I
• Professor Prof. Gump: Popularity = high, Teaching-Ability = medium, Stress-Level = low
• Student John Doe: Intelligence = high, Performance = average
• Student Jane Doe: Intelligence = high, Performance = average
• Course Phil142: Difficulty = low, Rating = high
• Course Phil101: Difficulty = low, Rating = high
• Three registrations, each shown as Reg #5639: Grade = A, Satisfaction = 3
What's Uncertain?
• Objects
• Relations
• Attribute values
• Illustrated on the example instance above, here with an additional Student, Judy Dunn: Intelligence = high, Performance = high
Attribute Uncertainty
• Fixed skeleton:
  • the set of objects in each class
  • the relations between them
• Uncertainty over the assignment of values to attributes
• Example: the same instance with most attribute values unknown (???): Prof. Gump (Popularity, Teaching-Ability, Stress-Level), Students John Deer and Jane Doe (Intelligence, Performance), Courses Phil142 and Phil101 (Difficulty, Rating) and one Reg #5639 (Grade, Satisfaction); the other two Reg #5639 objects keep Grade = A, Satisfaction = 3
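Under attribute uncertainty the skeleton is fixed, and the random variables are exactly the pairs x.A for every object x in the skeleton and every attribute A of its class. The minimal sketch below enumerates them; the object identifiers (`r1`, `JaneDoe`, ...) and the dictionaries `OBJECTS`, `RELATIONS` and `observed` are hypothetical stand-ins for the slide's example.

```python
# Attributes of each class, following the schema slide.
ATTRIBUTES = {
    "Professor": ("Popularity", "Teaching-Ability", "Stress-Level"),
    "Student": ("Intelligence", "Performance"),
    "Course": ("Difficulty", "Rating"),
    "Reg": ("Grade", "Satisfaction"),
}

# Hypothetical skeleton: the objects of each class are fixed ...
OBJECTS = {
    "Professor": ("Gump",),
    "Student": ("JohnDeer", "JaneDoe"),
    "Course": ("Phil142", "Phil101"),
    "Reg": ("r1", "r2", "r3"),
}
# ... and so are the relations between them; they create no random variables here.
RELATIONS = {
    "r1": {"Student": "JaneDoe", "Course": "Phil142"},
    "r2": {"Student": "JaneDoe", "Course": "Phil101"},
    "r3": {"Student": "JohnDeer", "Course": "Phil101"},
}


def random_variables(objects, attributes):
    """Every pair (x, A), i.e. x.A, for an object x and an attribute A of its class."""
    return [(x, a) for cls, xs in objects.items() for x in xs for a in attributes[cls]]


def unobserved(objects, attributes, observed):
    """The x.A variables whose value is '???', i.e. absent from the observed assignment."""
    return [v for v in random_variables(objects, attributes) if v not in observed]


# Partial instance: only some attribute values are known.
observed = {("r1", "Grade"): "A", ("r1", "Satisfaction"): 3,
            ("r2", "Grade"): "A", ("r2", "Satisfaction"): 3}

# Sanity check: every relation endpoint is an object of the skeleton.
all_objects = {x for xs in OBJECTS.values() for x in xs}
assert all(t in all_objects for slots in RELATIONS.values() for t in slots.values())
```

Here the skeleton yields 17 variables (3 + 4 + 4 + 6), of which 13 are unobserved.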
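PRM: Dependencies
[Figure: the classes Professor (Popularity, Teaching-Ability, Stress-Level), Student (Intelligence, Performance), Course (Difficulty, Rating) and Reg (Grade, Satisfaction), with arrows marking probabilistic dependencies between attributes]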
PRM: Dependencies (cont.)
• The example instance (Prof. Gump; Students John Doe and Jane Doe; Courses Phil142 and Phil101), extended with a new Student, John Deer (Intelligence = low, Performance = average), and a registration Reg #5639 with Grade = ? and Satisfaction = 3
• The same class-level dependencies apply to every object, so the unknown Grade depends on attributes of the objects the registration is related to
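One way to picture class-level dependencies is that each parent of Reg.Grade is reached by following a chain of reference slots from the registration to a related object. The sketch below assumes a hypothetical structure in which Reg.Grade depends on Reg.Student.Intelligence and Reg.Course.Difficulty; the names `SLOTS`, `VALUES`, `PARENTS`, `CPT` and the CPT numbers are made up for illustration and are not taken from the slides.

```python
# Reference slots of each Reg object in a small hypothetical skeleton.
SLOTS = {
    "r1": {"Student": "JaneDoe", "Course": "Phil142"},
    "r2": {"Student": "JohnDeer", "Course": "Phil101"},
}
# Known attribute values of the related objects.
VALUES = {
    ("JaneDoe", "Intelligence"): "high",
    ("JohnDeer", "Intelligence"): "low",
    ("Phil142", "Difficulty"): "low",
    ("Phil101", "Difficulty"): "low",
}
# Hypothetical dependency structure: each parent is (slot chain, attribute).
PARENTS = {
    ("Reg", "Grade"): [(("Student",), "Intelligence"), (("Course",), "Difficulty")],
}
# Illustrative CPT P(Reg.Grade | Intelligence, Difficulty); the numbers are made up.
CPT = {
    ("high", "low"): {"A": 0.7, "B": 0.2, "C": 0.1},
    ("low", "low"): {"A": 0.3, "B": 0.4, "C": 0.3},
}


def follow(obj, chain):
    """Follow a slot chain such as ('Student',) from obj to the related object."""
    for slot in chain:
        obj = SLOTS[obj][slot]
    return obj


def parent_values(obj, cls, attr):
    """Values of obj.attr's parents, found by following each parent's slot chain."""
    return tuple(VALUES[(follow(obj, chain), a)] for chain, a in PARENTS[(cls, attr)])


def grade_distribution(reg):
    # The same class-level CPT is shared by every Reg object.
    return CPT[parent_values(reg, "Reg", "Grade")]
```

For the new student's registration `r2`, `parent_values("r2", "Reg", "Grade")` is `("low", "low")`, so its unknown grade is predicted from that row of the shared CPT.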
PRM: aggregate dependencies
• Student Jane Doe (Intelligence = high, Performance = average) has three registrations:
  • Reg #5077: Grade = C, Satisfaction = 2
  • Reg #5054: Grade = C, Satisfaction = 1
  • Reg #5639: Grade = A, Satisfaction = 3
• Problem: if an attribute depends on the attributes of all of an object's registrations, objects with different numbers of registrations need CPTs of different sizes
• The dependency diagram marks such a dependency with an aggregate (avg)
PRM: aggregate dependencies
• [Figure: the dependency structure over Professor, Student, Course and Reg, with some dependencies passing through aggregates (count, avg)]
• Aggregates: sum, min, max, avg, mode, count
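An aggregate collapses a variable-size multiset of parent values into a single value, so the child's CPT keeps a fixed size. The sketch below applies the slide's aggregates to Jane Doe's grades; the numeric grade encoding `GRADE_POINTS` and the three-way `bucket` discretization are assumptions made for illustration.

```python
from statistics import mean, mode

# Assumed numeric encoding of letter grades.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2}

# The aggregates listed on the slide, applied to a multiset of parent values.
AGGREGATES = {"sum": sum, "min": min, "max": max, "avg": mean, "mode": mode, "count": len}


def aggregate(name, values):
    return AGGREGATES[name](values)


def bucket(avg_points):
    """Hypothetical discretization so the aggregated parent has a small, fixed domain."""
    if avg_points >= 3.5:
        return "high"
    if avg_points >= 2.5:
        return "medium"
    return "low"


# Jane Doe's three registrations (C, C, A) from the slide.
jane = [GRADE_POINTS[g] for g in ("C", "C", "A")]
jane_parent = bucket(aggregate("avg", jane))
```

Whether a student has one registration or ten, the aggregated parent takes one of three values, so a table such as P(Student.Performance | bucket(avg(grades))) has the same size for every student.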
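PRM: Summary
• A PRM specifies
  • a probabilistic dependency structure S: a set of parents for each attribute X.A
  • a set of local probability models θ
• Given a skeleton structure σ, a PRM specifies a probability distribution over instances I, i.e., over the attribute values of all objects in σ:

$$P(\mathcal{I} \mid \sigma, \mathcal{S}, \theta) = \prod_{X} \prod_{A \in \mathcal{A}(X)} \prod_{x \in \sigma(X)} P\big(\mathcal{I}_{x.A} \mid \mathcal{I}_{\mathrm{Pa}(x.A)}\big)$$

where the products range over classes X, the attributes 𝒜(X) of each class, and the objects σ(X) of that class in σ, and I_{x.A} is the value of attribute A in object x.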
Learning PRMs
• Input: a database (instance I over Student, Course and Reg) and its relational schema; output: a PRM over the same classes
• Parameter estimation
• Structure selection
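Parameter estimation in PRMs
• Assume a known dependency structure S
• Goal: estimate the PRM parameters θ, the entries in the local probability models
• A parameterization θ is good if it is likely to generate the observed data, instance I
• MLE principle: choose θ* to maximize the log-likelihood

$$\ell(\theta : \mathcal{I}, \sigma, \mathcal{S}) = \log P(\mathcal{I} \mid \sigma, \mathcal{S}, \theta) = \sum_{X} \sum_{A \in \mathcal{A}(X)} \sum_{x \in \sigma(X)} \log P\big(\mathcal{I}_{x.A} \mid \mathcal{I}_{\mathrm{Pa}(x.A)}\big)$$

• Crucial property: decomposition, with separate terms for different X.A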
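The decomposition can be made concrete: the log-likelihood is a sum of one term per attribute X.A, and each term involves only that attribute's local model, so each can be maximized separately. The minimal sketch below assumes each family is given as `(parent_values, value)` pairs, one per object, with table CPDs; the function name and the tiny instance are hypothetical.

```python
import math


def log_likelihood_terms(families, cpts):
    """Per-attribute terms of l(theta : I, sigma, S).

    families: {"X.A": [(parent_values, value), ...]} with one pair per object x.
    cpts:     {"X.A": {parent_values: {value: probability}}}.
    """
    return {
        key: sum(math.log(cpts[key][u][v]) for u, v in rows)
        for key, rows in families.items()
    }


# Tiny hypothetical instance: two students and one registration.
families = {
    "Student.Intelligence": [((), "high"), ((), "low")],
    "Reg.Grade": [(("high",), "A")],
}
cpts = {
    "Student.Intelligence": {(): {"high": 0.5, "low": 0.5}},
    "Reg.Grade": {("high",): {"A": 0.5, "C": 0.5}},
}
terms = log_likelihood_terms(families, cpts)
total = sum(terms.values())
```

Changing the Reg.Grade table changes only `terms["Reg.Grade"]`, which is why the MLE can be computed one X.A at a time.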
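ML parameter estimation
• The Reg table (Grade, Satisfaction) is joined with the Student table (Intelligence, Performance) and the Course table (Difficulty, Rating) to collect sufficient statistics
• The sufficient statistics are counts; for a table CPD, the MLE of each entry is

$$\theta^*_{v \mid u} = \frac{N(X.A = v,\ \mathrm{Pa}(X.A) = u)}{N(\mathrm{Pa}(X.A) = u)}$$

• DB technology is well suited to computing sufficient statistics: Count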
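Counting joint assignments across joined tables is exactly what a SQL `GROUP BY ... COUNT(*)` does. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table and column names and the six toy rows are hypothetical, and it assumes Reg.Grade has parents Student.Intelligence and Course.Difficulty.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, intelligence TEXT);
CREATE TABLE Course  (id INTEGER PRIMARY KEY, difficulty TEXT);
CREATE TABLE Reg     (id INTEGER PRIMARY KEY, student_id INTEGER,
                      course_id INTEGER, grade TEXT);
INSERT INTO Student VALUES (1, 'high'), (2, 'low');
INSERT INTO Course  VALUES (1, 'low'), (2, 'high');
INSERT INTO Reg VALUES (1, 1, 1, 'A'), (2, 1, 2, 'B'), (3, 2, 1, 'B'),
                       (4, 1, 1, 'A'), (5, 2, 2, 'C'), (6, 1, 1, 'B');
""")

# Sufficient statistics N(intelligence, difficulty, grade), computed by the database.
rows = conn.execute("""
    SELECT s.intelligence, c.difficulty, r.grade, COUNT(*)
    FROM Reg r
    JOIN Student s ON r.student_id = s.id
    JOIN Course  c ON r.course_id  = c.id
    GROUP BY s.intelligence, c.difficulty, r.grade
""").fetchall()

counts = defaultdict(dict)
for intel, diff, grade, n in rows:
    counts[(intel, diff)][grade] = n

# MLE: theta*[grade | u] = N(grade, u) / N(u), one row per parent configuration u.
theta = {
    u: {g: n / sum(by_grade.values()) for g, n in by_grade.items()}
    for u, by_grade in counts.items()
}
```

Only the aggregated counts leave the database, so the Python side never touches individual rows.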
Model Selection
• Idea:
  • define a scoring function
  • do local search over legal structures
• Key components:
  • scoring models
  • legal models
  • searching model space
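The local-search idea can be sketched as greedy hill-climbing: from the current structure, try every single-edge addition or deletion, keep only legal results, and move to the best-scoring neighbor until no move improves the score. In this minimal sketch a structure is a set of class-level edges, legality is approximated by plain acyclicity (the slides refine this later), and `TARGET` with its toy score is a hypothetical stand-in for a real scoring function.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn-style check that the directed graph (nodes, edges) has no cycle."""
    indeg = {n: 0 for n in nodes}
    for _, child in edges:
        indeg[child] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for p, c in edges:
            if p == n:
                indeg[c] -= 1
                if indeg[c] == 0:
                    frontier.append(c)
    return seen == len(nodes)


def neighbors(nodes, structure):
    """Structures reachable by adding or deleting a single edge."""
    for edge in permutations(nodes, 2):
        yield structure - {edge} if edge in structure else structure | {edge}


def greedy_search(nodes, score, start=frozenset()):
    current, best = start, score(start)
    while True:
        candidates = [s for s in neighbors(nodes, current) if is_acyclic(nodes, s)]
        if not candidates:
            return current, best
        cand = max(candidates, key=score)
        if score(cand) <= best:
            return current, best  # local optimum: no move improves the score
        current, best = cand, score(cand)


# Hypothetical toy score that rewards two "true" edges and penalizes others.
NODES = ("Student.Intelligence", "Student.Performance", "Reg.Grade")
TARGET = {("Student.Intelligence", "Reg.Grade"), ("Reg.Grade", "Student.Performance")}


def toy_score(s):
    return len(s & TARGET) - 0.5 * len(s - TARGET)


result, best = greedy_search(NODES, toy_score)
```

With this toy score the search adds the two target edges and then stops, since every further move lowers the score.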
Scoring Models
• Bayesian approach:
  • closed-form solution
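The slide does not spell out the closed form. A common choice, assumed here by analogy with Bayesian network learning, is Dirichlet priors on each row of each table CPD, which gives a marginal likelihood that decomposes over attributes X.A and parent configurations u, with counts N and hyperparameters α (α_u = Σ_v α_{u,v}, N_u = Σ_v N_{u,v}):

$$\log P(\mathcal{I} \mid \sigma, \mathcal{S}) = \sum_{X.A} \sum_{u} \Big[ \log \frac{\Gamma(\alpha_u)}{\Gamma(\alpha_u + N_u)} + \sum_{v} \log \frac{\Gamma(\alpha_{u,v} + N_{u,v})}{\Gamma(\alpha_{u,v})} \Big]$$

The sketch below computes this with `math.lgamma`, assuming a symmetric per-cell hyperparameter `alpha`; the function names are hypothetical.

```python
from math import lgamma


def family_log_marginal(counts, num_values, alpha=1.0):
    """Closed-form log marginal likelihood of one family X.A.

    counts:     {parent_config: {value: N}} sufficient statistics
    num_values: size of the domain of X.A
    alpha:      symmetric Dirichlet hyperparameter per cell (an assumption)
    Zero counts may be omitted: they contribute lgamma(alpha) - lgamma(alpha) = 0.
    """
    a_u = alpha * num_values
    score = 0.0
    for row in counts.values():
        n_u = sum(row.values())
        score += lgamma(a_u) - lgamma(a_u + n_u)
        for n in row.values():
            score += lgamma(alpha + n) - lgamma(alpha)
    return score


def structure_log_marginal(families):
    """The score decomposes: sum family scores over all attributes X.A.

    families: iterable of (counts, num_values) pairs, one per X.A.
    """
    return sum(family_log_marginal(c, k) for c, k in families)
```

For a binary attribute with no parents observed once as A and once as B, the score is log(1/2 · 1/3) = −log 6, matching the sequential predictive probabilities under a uniform Dirichlet prior.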
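Legal Models
• Dependency ordering over the attributes of objects in the skeleton: an edge y.b → x.a whenever X.A depends on Y.B and y is the object related to x through the corresponding relation
• A PRM defines a coherent probability model over a skeleton σ if this dependency ordering is acyclic
• [Figure: Researcher (Reputation) and Paper (Accepted), linked by the author-of relation]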
Guaranteeing Acyclicity
• How do we guarantee that a PRM is acyclic for every skeleton?
• From the PRM dependency structure S, build the class-level dependency graph: an edge Y.B → X.A if X.A depends directly on Y.B
• Attribute stratification: if the dependency graph is acyclic, then the dependency ordering is acyclic for any skeleton σ
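Stratification is a check on the class-level graph only, so it can be done once per structure rather than once per skeleton. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` on a cycle; the dictionary encoding of S, with a slot chain attached to each parent and then ignored, is an assumption, and both example structures are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def is_stratified(dependencies):
    """dependencies: {"X.A": [(slot_chain, "Y.B"), ...]}.

    Builds the class-level graph with an edge Y.B -> X.A for every parent,
    ignoring the slot chain, and reports whether that graph is acyclic.
    """
    graph = {child: {parent for _, parent in parents}
             for child, parents in dependencies.items()}
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# Hypothetical school structure: acyclic at the class level.
SCHOOL = {
    "Reg.Grade": [(("Student",), "Student.Intelligence"), (("Course",), "Course.Difficulty")],
    "Student.Performance": [(("Reg",), "Reg.Grade")],
}
# Hypothetical genetics structure: Person.M-chrom depends on the mother's chromosomes.
GENETICS = {
    "Person.M-chrom": [(("Mother",), "Person.M-chrom"), (("Mother",), "Person.P-chrom")],
    "Person.P-chrom": [(("Father",), "Person.M-chrom"), (("Father",), "Person.P-chrom")],
    "Person.B-type": [((), "Person.M-chrom"), ((), "Person.P-chrom")],
}
```

The genetics structure fails the test because dropping the slot chain turns "depends on the mother's M-chrom" into a self-loop on Person.M-chrom, even though no actual person is their own ancestor.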
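Limitation of stratification
• [Figure: a Person with a Father and a Mother (both Persons), each having M-chromosome, P-chromosome and Blood-type]
• In the class-level dependency graph over Person.M-chrom, Person.P-chrom and Person.B-type, a person's chromosomes depend on the parents' chromosomes; this creates a cycle, so stratification cannot tell whether the model is acyclic (???)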
Guaranteed acyclic relations
• [Figure: the same Person, Father and Mother model with M-chromosome, P-chromosome and Blood-type]
• Prior knowledge: the Father-of relation is acyclic
• Hence a dependence of Person.A on Person.Father.B cannot induce cycles
Guaranteeing acyclicity
• With guaranteed acyclic (g.a.) relations, some cycles in the dependency graph are guaranteed to be safe
• We color the edges in the dependency graph:
  • yellow, X.B → X.A: within a single object
  • green, Y.B → X.A: via a g.a. relation
  • red, Y.B → X.A: via other relations
• A cycle is safe if
  • it has a green edge
  • it has no red edge
• [Figure: the colored dependency graph over Person.M-chrom, Person.P-chrom and Person.B-type]
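Checking every cycle explicitly is expensive, but the rule can be tested with two reachability checks: a cycle through a red edge u → v exists iff u is reachable from v, and a cycle with neither green nor red edges is an all-yellow cycle. The sketch below is a minimal implementation under that reading; the edge-list encoding is hypothetical, and treating the Mother relation as guaranteed acyclic alongside Father is an assumption.

```python
def reachable(graph, start, goal):
    """Iterative DFS: is goal reachable from start along directed edges?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def all_cycles_safe(edges):
    """edges: list of (source, target, color), color in {'yellow', 'green', 'red'}.

    True iff every cycle has at least one green edge and no red edge.
    """
    full, yellow = {}, {}
    for u, v, color in edges:
        full.setdefault(u, []).append(v)
        if color == "yellow":
            yellow.setdefault(u, []).append(v)
    for u, v, color in edges:
        # A cycle through the red edge u -> v exists iff u is reachable from v.
        if color == "red" and reachable(full, v, u):
            return False
        # A cycle using only yellow edges has no green edge, so it is unsafe.
        if color == "yellow" and reachable(yellow, v, u):
            return False
    return True


# Genetics example; Mother and Father are both assumed guaranteed acyclic (green).
GENETICS = [
    ("Person.M-chrom", "Person.M-chrom", "green"),  # via Mother
    ("Person.P-chrom", "Person.M-chrom", "green"),  # via Mother
    ("Person.M-chrom", "Person.P-chrom", "green"),  # via Father
    ("Person.P-chrom", "Person.P-chrom", "green"),  # via Father
    ("Person.M-chrom", "Person.B-type", "yellow"),  # within the same person
    ("Person.P-chrom", "Person.B-type", "yellow"),
]
```

Adding a hypothetical red edge from Person.B-type back to Person.M-chrom would close a cycle through a red edge and make the structure unsafe.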
Searching Model Space
• Phase 0: consider only dependencies within a class
• [Figure: candidate moves over Course, Student and Reg, each evaluated by its score, e.g. Add C.A → C.B and Delete S.I → S.P]
Phased structure search
• Phase 1: consider dependencies from "neighboring" classes, via schema relations
• [Figure: candidate moves over Course, Student and Reg, each evaluated by its score, e.g. Add C.A → R.B and Add S.I → R.C]
Phased structure search
• Phase 2: consider dependencies from "further" classes, via relation chains
• [Figure: candidate moves over Course, Student and Reg, each evaluated by its score, e.g. Add C.A → S.P and Add S.I → C.B]
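The phases differ only in which candidate parents are considered, which can be derived from distances in the schema's class graph. The sketch below uses a breadth-first search over classes and, as an assumption, lets phase k keep every candidate within k relation hops so that earlier candidates stay available; it ignores the exact slot chain and any aggregation a one-to-many chain would need. The adjacency and attribute dictionaries are hypothetical encodings of the Student, Reg, Course schema.

```python
from collections import deque


def classes_at_distance(adjacency, start):
    """Breadth-first search over the schema's class graph: {class: hop count}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        c = queue.popleft()
        for n in adjacency[c]:
            if n not in dist:
                dist[n] = dist[c] + 1
                queue.append(n)
    return dist


def candidate_parents(adjacency, attributes, cls, attr, phase):
    """Candidate parents Y.B of cls.attr in a phase: classes within `phase` hops."""
    dist = classes_at_distance(adjacency, cls)
    out = []
    for other, d in dist.items():
        if d <= phase:
            out.extend(f"{other}.{b}" for b in attributes[other]
                       if not (other == cls and b == attr))
    return sorted(out)


ADJACENCY = {"Student": ["Reg"], "Reg": ["Student", "Course"], "Course": ["Reg"]}
ATTRS = {
    "Student": ["Intelligence", "Performance"],
    "Reg": ["Grade", "Satisfaction"],
    "Course": ["Difficulty", "Rating"],
}
```

For Student.Performance, phase 0 offers only Student.Intelligence, phase 1 adds the Reg attributes, and phase 2 reaches Course through the Student, Reg, Course chain.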
Experimental Results: Movie Domain (real data)
• 11,000 movies, 7,000 actors
• Classes: Actor (Gender), Appears (Role-type), Movie (Process, Decade, Genre)
• Source: http://www-db.stanford.edu/movies/doc.html
Genetics domain (synthetic data)
• Person (M-chromosome, P-chromosome, Blood-type), with a Father and a Mother, each also a Person
• Blood-Test (Contaminated, Result)
Experimental Results
[Plot: median likelihood score versus dataset size (200 to 800), with the gold standard shown for comparison]
Future directions
• Learning in complex real-world domains
  • drug treatment regimes
  • collaborative filtering
• Missing data
• Learning with structural uncertainty
• Discovery
  • hidden variables
  • causal structure
  • class hierarchy
Conclusions
• PRMs are a natural extension of BNs:
  • well-founded (probabilistic) semantics
  • compact representation of complex models
• Powerful learning techniques
  • build on BN learning techniques
  • can learn directly from relational data
• Parameter estimation
  • efficient, effective exploitation of DB technology
• Structure identification
  • builds on well-understood theory
  • major issues:
    • guaranteeing coherence
    • search heuristics