1 / 19

Program 3

BCNF vs 3NF Program 3 BCNF Written recursively Pseudo code: doBCNF(schema r , fds) Let fd = BCNF violating dependency in fds if fd is null then return else split into two tables table1 = fd[domain] U fd[range] table2 = r – fd[range] doBCNF(table2, fds) BCNF Results

liam
Download Presentation

Program 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BCNF vs 3NF Program 3

  2. BCNF • Written recursively • Pseudo code: doBCNF(schema r , fds) Let fd = BCNF violating dependency in fds if fd is null then return else split into two tables table1 = fd[domain] U fd[range] table2 = r – fd[range] doBCNF(table2, fds)

  3. BCNF Results • Average time taken to do decomposition • 36ms • Pros: • Easy to code • Fast • Cons: • Fds lost (see sample inputs)

  4. BCNF Sample Inputs • Example From Class • Input: • R = {A,B,C,D,E,G} • F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG} • Output: • { {C,A}, {D,E,G}, {B,C,D} } • Lost Fds: • ACD->B, AB->C,BE->C,GC->BD,CE->AG

  5. Another BCNF Example • From Class Also • Input: • R = {C,T,H,R,S,G} • F = {CS->G, C->T, HR->C, HS->R, HT->R} • Output: • { {C,S,G}, {C,T}, {H,R,C}, {H,R,S} } • Lost Fds • HT->R

  6. Beers Example (from book) • Input: • R = {beer, manf, bar, bar_addr, license, person, phone, person_addr, price} • F = { {beer -> manf}, {bar -> bar_addr license}, {person -> person_addr, phone}, {bar, beer -> price } } • Output: • { {beer, manf}, {bar, bar_addr, license}, {person, phone, person_addr}, {beer, bar, price} } • Lost Fds • None

  7. Movies Example • Input: • R={Movie, Year, Director, Studio, LeadActor, LeadActress, Duration, BoxOfficeRevenue, Awards, Rating} • F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} } • Output • {Movie,Year,Director,Studio,Duration,BoxOfficeRevenue,Awards,Rating}, {Year,LeadActor,Movie} ,{Year,LeadActor,LeadActress} • Lost Fds • LeadActress,Year->Movie

  8. 3NF Synthesis • Implementation • Broke into modules that performed a separate step in the algorithm • MinimalCover, MergeLHS, FormSubSchema, MergeSubShema, AddMissing, AddKey • Average Time Taken: 200ms • Pros: • Lossless • Cons: • Slow and complex

  9. 3NF Example 1 • Example From Class • Input: • R = {A,B,C,D,E,G} • F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG} • Output: • { {A,B,C}, {B,C,D}, {D,E,G}, {B,E,C}, {C,G,B}, {C,E,G} }

  10. 3NF Example 2 • From Class Also • Input: • R = {C,T,H,R,S,G} • F = {CS->G, C->T, HR->C, HS->R, HT->R} • Output: • { {S,C,G}, {C,T}, {R,H,C}, {S,H,R}, {T,H,R} }

  11. 3NF Example 3 • Input: • R = {beer, manf, bar, bar_addr, license, person, phone, person_addr, price} • F = { {beer -> manf}, {bar -> bar_addr license}, {person -> person_addr, phone}, {bar, beer -> price } } • Output: • { {beer, manf}, {bar,bar_addr,license}, {person,phone,person_addr}, {beer,bar,price} } • Same as BCNF

  12. 3NF Example 4 • Input: • R={Movie, Year, Director, Studio, LeadActor, LeadActress, Duration, BoxOfficeRevenue, Awards, Rating} • F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} } • Output • {Movie,Year,Director,Studio,Duration,BoxOfficeRevenue,Awards,Rating}, {Year,LeadActor,Movie} ,{Year,LeadActor,LeadActress}, {Year,LeadActress,Movie}

  13. Hybrid Approach • First use 3NF to generate temporary schemas • Next, use BCNF on each of the temporary schemas to further decompose • Eliminate possible redundancies • Average Time: 227ms (which is to be expected)

  14. Hybrid Example 1 • Example From Class • Input: • R = {A,B,C,D,E,G} • F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG} • Output: • { {C,A}, {B,C,D}, {D,E,G}, {E,B,C}, {G,C,B}, {E,C,G} } • Lost Fds • ?

  15. Hybrid: Movies • Input: • R={Movie, Year, Director, Studio, LeadActor, LeadActress, Duration, BoxOfficeRevenue, Awards, Rating} • F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} } • Output • {Movie,Year,Director,Studio,Duration,BoxOfficeRevenue,Awards,Rating}, {Year,LeadActor,Movie} ,{Year,LeadActor,LeadActress}, {Year,LeadActress,Movie} • Same as 3NF

  16. Data Mining Program 4

  17. Data Mining • Artificial Neural Network • Weka Implementation • ANN with error back propagation • Oracle-XE Database • Java Connectivity • Oracle Thin Driver • Get database data with java and use Weka to build ANN and classify

  18. Data Set • Census Data • Predict Income Class of US citizens given census data. • Income Class: <=$50,000 or >$50,000 • For simplicity (difficult to predict exact income) • > 30,000 tuples • CSV: • http://cs.uga.edu/~mcknight/nlp/data.csv • Database: (login required) • http://128.192.101.74:9090/apex

  19. Results • ANN with 27 internal nodes, 55 input nodes • Training Data: • 66% of original data used for training • the remaining 33% used as test set for classification • Training Time: 2050.91 seconds • Error: Correctly classified: 84.0665 % Mean absolute error: 0.1698 Root mean squared error: 0.3558 Relative absolute error: 46.6323 % Root relative squared error: 83.9026 %

More Related