New Approaches for Data Reduction in Generalized Multi-valued Decision Information System (GMDIS): Case Study of Rheumatic Fever Patients. By Abd El-Monem M. Kozea, Mohamed M. E. Abd El-Monsef, Soaad Abd El-Badie Attia El-Afify Mathematics Department, Faculty of Science,
New Approaches for Data Reduction in Generalized Multi-valued Decision Information System (GMDIS): Case Study of Rheumatic Fever Patients
WRSTA2006, 13 August 2006
By Abd El-Monem M. Kozea, Mohamed M. E. Abd El-Monsef, Soaad Abd El-Badie Attia El-Afify
Mathematics Department, Faculty of Science, Tanta University, Egypt
Email: savvymore@yahoo.com
Outline
• Motivation / Introduction
• Basic Concepts of Rough Sets
• Rheumatic Fever Data: Characteristics
• New Thinking
• Generalized Multi-Valued Decision Information System (GMDIS)
• New Approaches for Data Reduction in GMDIS
  • Non-equivalence relations
  • Topological spaces
  • Degree of dependencies in GMDIS
• Reduct Algorithms based on GMDIS
• Rheumatic Fever GMDIS Reduction: Worked example
• Conclusion and Future Work
Motivation / Introduction (1)
• Rough set (RS) theory was developed by Zdzislaw Pawlak in the early 1980s.
• RS theory is based on equivalence relations, which partition the domain into classes.
• It is a mathematical tool for dealing with incomplete data, for inducing approximations of concepts, and for discovering patterns hidden in data.
• It can be used for feature selection, data reduction, identifying partial or total dependencies in data, handling null values and missing data, and generating decision rules.
• Rough set features:
  • It is applicable to problems with both numeric and descriptive attributes.
  • It is capable of finding all minimal knowledge representations.
  • It is highly automated and based on strict rules.
• A multi-valued information system (MIS) generalizes the single-valued information system (SIS).
• In a multi-valued information system, attribute functions are allowed to map elements to sets of attribute values.
Motivation / Introduction (2)
• This work initiates a new approach for data reduction in GMDIS:
  • The Single-Valued Decision Information System (SDIS) is converted to a GMDIS.
  • Two general relations are defined.
  • New classes are constructed using the general relations.
  • The measure of decision dependency on the condition attributes is studied.
• To evaluate the performance of the approach, it is applied to a rheumatic fever dataset.
Rough Set Theory: Basic Concepts
• Information/Decision Systems (Tables)
• Indiscernibility
• Set Approximation
• Reducts and Core
• Rough Membership
• Dependency of Attributes
Information Systems/Tables
• An IS is a pair $(U, A)$.
• $U$ is a non-empty finite set of objects.
• $A$ is a non-empty finite set of attributes such that $a : U \to V_a$ for every $a \in A$.
• $V_a$ is called the value set of $a$.
Decision Systems/Tables
• DS: $T = (U, A \cup \{d\})$
• $d \notin A$ is the decision attribute (more than one decision attribute can also be considered).
• The elements of $A$ are called the condition attributes.
Information System Types
• The first concept of IS was developed by Grzymala-Busse (1988). There are many types of IS, as follows:
• Single-valued Information System (SIS): $(U, At, \{V_a : a \in At\}, f_a)$
  • The data take a single value for each element.
• Single-valued Decision Information System (SDIS): $(U, At \cup D, \{V_a : a \in At\}, f_a)$
• Multi-valued Information System (MIS)
  • An ordinary information system whose values are sets.
• Multi-valued Decision Information System (MDIS)
Rheumatic Fever Data: Characteristics
• The rheumatic fever patient data were obtained from Tanta University Hospital, Egypt.
• All patients are 9 to 12 years old, with a history of arthritis that began at age 3 to 5.
• The disease has many symptoms; it usually starts at a young age and stays with the patient throughout life.
• The following table shows seven patients characterized by 8 symptoms (attributes), which are used to decide the diagnosis of each patient (the decision attribute).
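Rheumatic Fever Data: Characteristics

| Attribute symbol | Refers to | Attribute values and what they refer to |
|---|---|---|
| S | Sex | s1 = Male; s2 = Female |
| F | Pharyngitis | f1 = Yes; f2 = No |
| A | Arthritis | a0 = No arthritis; a1 = Began in the knee; a2 = Began in the ankle |
| R | Carditis | r1 = Affected; r2 = Not affected |
| K | Chorea | k1 = Yes; k2 = No |
| E | ESR | e1 = Normal; e2 = High |
| P | Abdominal pain | p1 = Absent; p2 = Present |
| H | Headache | h1 = Yes; h2 = No |
| D | Diagnosis | d1 = Rheumatic arthritis; d2 = Rheumatic carditis; d3 = Rheumatic arthritis and carditis |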
New Thinking (1)
• A multi-valued information system (MIS) generalizes the single-valued information system (SIS).
• In a multi-valued information system, attribute functions are allowed to map elements to sets of attribute values.
• Can an SDIS be converted to an MDIS, and vice versa?
New Thinking (2)
• Two methods are initiated, both based on collecting attributes:
  • Convert the SIS to a MIS and vice versa.
  • Convert the SDIS to an MDIS and vice versa.
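Worked Example 1 (SDIS): Rheumatic Fever SDIS Data

| Patient | S | F | A | R | K | E | P | H | D |
|---|---|---|---|---|---|---|---|---|---|
| x1 | s2 | f1 | a1 | r1 | k1 | e1 | p1 | h2 | d3 |
| x2 | s1 | f1 | a1 | r1 | k1 | e2 | p1 | h1 | d3 |
| x3 | s2 | f1 | a2 | r1 | k2 | e1 | p1 | h2 | d3 |
| x4 | s1 | f1 | a1 | r2 | k2 | e1 | p1 | h2 | d1 |
| x5 | s1 | f2 | a0 | r1 | k2 | e1 | p2 | h2 | d2 |
| x6 | s1 | f1 | a1 | r1 | k2 | e2 | p1 | h2 | d3 |
| x7 | s1 | f1 | a2 | r1 | k2 | e1 | p1 | h1 | d3 |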
Worked Example 2 (MDIS): Converted Data Description (MDIS)

| Attribute symbol | Refers to | Attribute value | Refers to |
|---|---|---|---|
| α | {S, K} | α1 | S → s1 |
| | | α2 | K → k1 |
| | | α3 | {S, K} → {s2, k2} |
| β | {F, A, E} | β1 | F → f1 |
| | | β2 | A → a1 |
| | | β3 | A → a2 |
| | | β4 | E → e1 |
| | | β5 | {F, A, E} → {f2, a0, e2} |
| δ | {R, P, H} | δ1 | R → r1 |
| | | δ2 | P → p1 |
| | | δ3 | H → h1 |
| | | δ4 | {R, P, H} → {r2, p2, h2} |
| D | Diagnosis | d1 | Rheumatic arthritis |
| | | d2 | Rheumatic carditis |
| | | d3 | Rheumatic arthritis and carditis |
Worked Example 3 (MDIS): Rheumatic Fever MDIS Data

| Patient | α | β | δ | D |
|---|---|---|---|---|
| x1 | {α2} | {β1, β2, β4} | {δ1, δ2} | {d3} |
| x2 | {α1, α2} | {β1, β2} | {δ1, δ2, δ3} | {d3} |
| x3 | {α3} | {β1, β3, β4} | {δ1, δ2} | {d3} |
| x4 | {α1} | {β1, β2, β4} | {δ2} | {d1} |
| x5 | {α1} | {β4} | {δ1} | {d2} |
| x6 | {α1} | {β1, β2} | {δ1, δ2} | {d3} |
| x7 | {α1} | {β1, β3, β4} | {δ1, δ2, δ3} | {d3} |
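The conversion by collecting attributes can be sketched in Python. This is a minimal sketch rather than the authors' implementation: the names `RULES` and `to_mdis`, the ASCII codes `alpha1` to `delta4` (standing for α1 to δ4), and the reading that a grouped code such as α3 applies only when all of its listed values occur together are assumptions. Patients x1, x2, x4 and x5 of the SDIS table are used as input.

```python
# Rows x1, x2, x4 and x5 of the single-valued decision table (SDIS).
SDIS = {
    "x1": {"S": "s2", "F": "f1", "A": "a1", "R": "r1", "K": "k1",
           "E": "e1", "P": "p1", "H": "h2", "D": "d3"},
    "x2": {"S": "s1", "F": "f1", "A": "a1", "R": "r1", "K": "k1",
           "E": "e2", "P": "p1", "H": "h1", "D": "d3"},
    "x4": {"S": "s1", "F": "f1", "A": "a1", "R": "r2", "K": "k2",
           "E": "e1", "P": "p1", "H": "h2", "D": "d1"},
    "x5": {"S": "s1", "F": "f2", "A": "a0", "R": "r1", "K": "k2",
           "E": "e1", "P": "p2", "H": "h2", "D": "d2"},
}

# Each grouped attribute lists (code, condition) pairs taken from the
# conversion table. Assumption: a code is assigned only when every
# attribute in its condition has the listed value.
RULES = {
    "alpha": [("alpha1", {"S": "s1"}),
              ("alpha2", {"K": "k1"}),
              ("alpha3", {"S": "s2", "K": "k2"})],
    "beta": [("beta1", {"F": "f1"}),
             ("beta2", {"A": "a1"}),
             ("beta3", {"A": "a2"}),
             ("beta4", {"E": "e1"}),
             ("beta5", {"F": "f2", "A": "a0", "E": "e2"})],
    "delta": [("delta1", {"R": "r1"}),
              ("delta2", {"P": "p1"}),
              ("delta3", {"H": "h1"}),
              ("delta4", {"R": "r2", "P": "p2", "H": "h2"})],
}


def to_mdis(row):
    """Map one single-valued row to set-valued grouped attributes."""
    mdis_row = {}
    for group, rules in RULES.items():
        mdis_row[group] = {
            code
            for code, condition in rules
            if all(row[attr] == value for attr, value in condition.items())
        }
    mdis_row["D"] = {row["D"]}  # the decision stays a singleton set
    return mdis_row


MDIS = {patient: to_mdis(row) for patient, row in SDIS.items()}
```

Under these assumptions, `MDIS["x1"]["alpha"]` is `{"alpha2"}` and `MDIS["x5"]["beta"]` is `{"beta4"}`, which agrees with the corresponding rows of the MDIS table.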
Generalized Multi-Valued Decision Information System (GMDIS)
Initiated a New Approach
• A new approach for data reduction in the Generalized Multi-Valued Decision Information System (GMDIS) is initiated.
• The SDIS is converted to a GMDIS.
• Two general relations are defined on the condition attributes and the decision attribute.
• New classes, which are used for data reduction, are constructed from the general relations.
• The measure of decision dependency on the condition attributes is studied.
• To evaluate the performance of the approach, the reduct approach is applied to a rheumatic fever dataset to assess its ability and accuracy.
Generalized Multi-valued Decision Information System
A Generalized Multi-valued Information System can be defined as follows.
$$GMIS = (U, At, \{\psi_a : a \in At\}, f_a, \{\eta_B : B \subseteq At\}) \quad (1)$$
A Generalized Multi-valued Decision Information System can be defined as follows.
$$GMDIS = (U, At \cup D, \{\psi_a : a \in At\}, f_a, \{\eta_B : B \subseteq At\}) \quad (2)$$
Set Approximations in GMDIS (1)
• A general relation has the form
$$\eta_B = \{(x, y) : f_a(x) \text{ depends on } f_a(y)\}$$
• The relations used on the condition attributes and on the decision attribute are
$$\chi\eta_B = \{(x, y) : f_a(x) \subseteq f_a(y),\ \forall a \in B,\ B \subseteq At\} \quad (1)$$
$$\eta_B = \{(x, y) : f_a(y) \subseteq f_a(x),\ \forall a \in B,\ B \subseteq At\} \quad (2)$$
$$\eta_D = \{(x, y) : f_D(x) \subseteq f_D(y)\} \quad (3)$$
• The set of all intersections of members of the family of classes $\{A_i\}$ is defined as the Meeting Point Relation (MPR), which can be written as:
$$\mu_\eta = \{A_i \cap A_j : A_i \neq A_j,\ i \neq j\} \quad (4)$$
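The subset relation and its converse can be computed directly on set-valued data. The following is a minimal Python sketch, assuming the classes of a relation are its after-sets {y : (x, y) in the relation}; the function names and the ASCII codes `alpha1` to `alpha3` are hypothetical. The α column of the rheumatic fever MDIS table is used as input.

```python
# The alpha column of the MDIS table (alpha1..alpha3 stand for the codes
# of the grouped attribute alpha = {S, K}).
TABLE = {
    "x1": {"alpha": {"alpha2"}},
    "x2": {"alpha": {"alpha1", "alpha2"}},
    "x3": {"alpha": {"alpha3"}},
    "x4": {"alpha": {"alpha1"}},
    "x5": {"alpha": {"alpha1"}},
    "x6": {"alpha": {"alpha1"}},
    "x7": {"alpha": {"alpha1"}},
}


def subset_relation(table, attrs):
    """Pairs (x, y) with f_a(x) a subset of f_a(y) for every a in attrs."""
    return {
        (x, y)
        for x in table
        for y in table
        if all(table[x][a] <= table[y][a] for a in attrs)
    }


def superset_relation(table, attrs):
    """Pairs (x, y) with f_a(y) a subset of f_a(x) for every a in attrs."""
    return {
        (x, y)
        for x in table
        for y in table
        if all(table[y][a] <= table[x][a] for a in attrs)
    }


def after_set(relation, x):
    """Class of x, assumed here to be the after-set {y : (x, y) in relation}."""
    return {y for (u, y) in relation if u == x}


sub = subset_relation(TABLE, ["alpha"])
sup = superset_relation(TABLE, ["alpha"])
```

Neither relation is symmetric, so the classes may overlap instead of partitioning U: under the subset relation the class of x1 is {x1, x2} while the class of x2 is {x2}.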
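Set Approximations in GMDIS (2)
$$POS_{\eta_B}(D) = \bigcup_{X \in A_{\eta_D}} \underline{\eta_B} X, \quad B \subseteq At \quad (5)$$
where, for any subset $X \subseteq U$, the lower and upper approximations are defined by
$$\underline{\eta_B} X = \bigcup \{\eta_B x : \eta_B x \subseteq X\}, \quad B \subseteq At$$
$$\overline{\eta_B} X = \bigcup \{\eta_B x : \eta_B x \cap X \neq \emptyset\}, \quad B \subseteq At \quad (6)$$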
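The lower and upper approximations and the positive region can be sketched in Python as unions of classes. This is an illustrative sketch under stated assumptions: the class of x is taken to be {y : f(x) is a subset of f(y)} for the single grouped attribute α, the decision classes are the groups of patients sharing a diagnosis, and the names `eta_class`, `lower`, `upper` and `positive_region` are hypothetical.

```python
# Alpha column and diagnosis column of the rheumatic fever MDIS table.
ALPHA = {
    "x1": {"alpha2"}, "x2": {"alpha1", "alpha2"}, "x3": {"alpha3"},
    "x4": {"alpha1"}, "x5": {"alpha1"}, "x6": {"alpha1"}, "x7": {"alpha1"},
}
DECISION = {"x1": "d3", "x2": "d3", "x3": "d3", "x4": "d1",
            "x5": "d2", "x6": "d3", "x7": "d3"}
U = sorted(ALPHA)


def eta_class(x):
    """Class of x: all y whose alpha value set contains that of x (assumption)."""
    return frozenset(y for y in U if ALPHA[x] <= ALPHA[y])


CLASSES = {eta_class(x) for x in U}


def lower(X):
    """Union of the classes that are contained in X."""
    return set().union(*[c for c in CLASSES if c <= X])


def upper(X):
    """Union of the classes that intersect X."""
    return set().union(*[c for c in CLASSES if c & X])


def positive_region():
    """Union of the lower approximations of the decision classes."""
    decision_classes = {}
    for x, d in DECISION.items():
        decision_classes.setdefault(d, set()).add(x)
    return set().union(*[lower(X) for X in decision_classes.values()])


D3 = {x for x, d in DECISION.items() if d == "d3"}
```

For X = `D3` (the patients diagnosed d3), `lower(D3)` is {x1, x2, x3}, because the class {x2, x4, x5, x6, x7} is not contained in X, and `upper(D3)` is the whole of U.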
Suggested New Technique: Consideration (1)
1. The set of attributes $B \subseteq At$ is called a reduct if $\tau_B \leq \tau_D$ and $B$ is minimal, where
$$\tau_B \leq \tau_D \iff \forall G \in \tau_B,\ \exists G' \in \tau_D \text{ s.t. } G \subset G', \quad G, G' \neq U$$
2. The attribute $a \in At$ is called the principal attribute (PA) if $\tau_a \succ \tau_b,\ \forall b \in At,\ b \neq a$; if $\tau_a = \tau_b$, then both $a$ and $b$ are principal attributes.
Suggested New Technique: Consideration (2)
• The set of attributes with equal highest degree of dependency is the PA of the GMDIS.
• Let the set of all reducts of any SDIS be $\Psi = \{R_1, R_2, \dots, R_n\}$, and let the set of reducts for the GMDIS obtained with the new approach be $\Psi' = \{R'_1, R'_2, \dots, R'_n\}$. Then $\Psi'$ is said to be a refinement of $\Psi$ if
$$\forall R'_i \in \Psi',\ \exists R_i \in \Psi \text{ s.t. } R'_i \subseteq R_i$$
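The refinement condition is a simple containment test and can be sketched in Python as follows. The function name `is_refinement` is hypothetical; the example sets are the GMDIS reduct {S, K} and three of the attribute sets listed as Red(At) on the discernibility-function slide.

```python
def is_refinement(psi_prime, psi):
    """True when every reduct in psi_prime is contained in some reduct of psi."""
    return all(
        any(set(r_prime) <= set(r) for r in psi)
        for r_prime in psi_prime
    )


# Three of the SDIS attribute sets from the discernibility-function slide.
PSI = [{"S", "R", "K"}, {"S", "A", "R"}, {"R", "E"}]
# The reduct reported for the GMDIS: alpha = {S, K}.
PSI_PRIME = [{"S", "K"}]

gmdis_refines_sdis = is_refinement(PSI_PRIME, PSI)
sdis_refines_gmdis = is_refinement(PSI, PSI_PRIME)
```

Here `gmdis_refines_sdis` is `True` because {S, K} is contained in {S, R, K}, while the converse test is `False`.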
Simplified Reducts
• The set of simplified reducts, denoted SRED(At), is the set of all reducts that remain after the supersets of each reduct are omitted from RED(At).
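Omitting supersets can be sketched in a few lines of Python. The function name `simplified_reducts` and the input collection `TOY_RED` are hypothetical; the toy sets are illustrative only and are not results from the rheumatic fever data.

```python
def simplified_reducts(reducts):
    """Keep only the sets that have no proper subset in the collection."""
    sets = {frozenset(r) for r in reducts}
    return {r for r in sets if not any(other < r for other in sets)}


# Made-up collection in which two sets are supersets of others.
TOY_RED = [{"S", "K"}, {"S", "K", "R"}, {"R", "E"}, {"R", "E", "H"}]
TOY_SRED = simplified_reducts(TOY_RED)
```

With this input, {S, K, R} and {R, E, H} are dropped because each strictly contains another member of the collection.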
GMDIS Reduction Algorithms
• Algorithm 1: GMDIS Reduct
• Algorithm 2: GMDIS PA Algorithm
GMDIS Reduction Algorithms: GMDIS Reduct
Input: a GMDIS $= (U, At \cup D, \{\psi_a : a \in At\}, f_a, \{\eta_B : B \subseteq At\})$
Output: $R$, a minimal attribute subset, $R \subseteq At$
(1) $R \leftarrow \{\}$
(2) Do
(3)     $GMDIS_R \leftarrow R$
(4)     Loop $a \in (At - R)$
(5)         If $\tau_{R \cup \{a\}} \leq \tau_D$
(6)             $GMDIS_R \leftarrow R \cup \{a\}$
(7)     $R \leftarrow GMDIS_R$
(8) Until $\tau_R \leq \tau_D$
(9) Return $R$
where $R \equiv$ Reduct.
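The control flow of the GMDIS Reduct algorithm can be sketched in Python with the topological test left abstract. This is a sketch under stated assumptions, not the authors' code: `is_fine_enough(B)` is a hypothetical callable standing in for the test that the topology of B is finer than that of D, the step numbers in the comments follow a QuickReduct-style reading of the slide, and the early exit when no attribute can be added is a guard introduced here so that the loop always terminates.

```python
def gmdis_reduct(attributes, is_fine_enough):
    """Greedy search for an attribute subset accepted by is_fine_enough."""
    reduct = frozenset()                          # step (1)
    while True:                                   # step (2)
        candidate = reduct                        # step (3)
        for a in attributes:                      # step (4): a in At - R
            if a in reduct:
                continue
            if is_fine_enough(reduct | {a}):      # step (5)
                candidate = reduct | {a}          # step (6)
        progressed = candidate != reduct
        reduct = candidate                        # step (7)
        if is_fine_enough(reduct):                # step (8)
            return reduct                         # step (9)
        if not progressed:
            # Guard added in this sketch: no single attribute helps,
            # so stop instead of looping forever.
            return None


# Toy predicate for illustration only: any subset containing "alpha" passes.
found = gmdis_reduct(["alpha", "beta", "delta"], lambda B: "alpha" in B)
not_found = gmdis_reduct(["alpha", "beta", "delta"], lambda B: False)
```

Passing the test as a callable keeps the loop independent of how the topologies are built and compared, which the slides do not spell out.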
GMDIS Reduction Algorithms: GMDIS PA Algorithm
Input: a GMDIS $= (U, At \cup D, \{\psi_a : a \in At\}, f_a, \{\eta_B : B \subseteq At\})$
Output: $PA$, a set of principal attributes, $PA \subseteq At$
(1) $PA \leftarrow \{\}$
(2) Do
(3)     Loop $a \in At$
(4)         Loop $b \in At$
(5)             If $\tau_a \succ \tau_b$
(6)                 $PA \leftarrow PA \cup \{a\}$
(7)         End Loop
(8)     End Loop
(9) Return $PA$
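A Python sketch of principal-attribute selection follows. It assumes the comparison between two attributes can be summarized by a numeric degree of dependency, so that an attribute is principal when no other attribute has a higher degree and ties are all kept; the function name, the `degree` callable and the numbers in `TOY_DEGREES` are hypothetical and are not results from the rheumatic fever data.

```python
def principal_attributes(attributes, degree):
    """Attributes whose degree is at least that of every other attribute."""
    principal = set()
    for a in attributes:                       # outer loop over a in At
        if all(degree(a) >= degree(b)          # inner loop over b in At
               for b in attributes if b != a):
            principal.add(a)                   # PA <- PA U {a}
    return principal


# Made-up degrees for three unnamed attributes p, q, r.
TOY_DEGREES = {"p": 0.5, "q": 0.25, "r": 0.5}
toy_pa = principal_attributes(list(TOY_DEGREES), TOY_DEGREES.get)
```

With these made-up degrees, `p` and `r` tie for the highest value, so both are returned.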
Rheumatic Fever GMDIS Reduction: Worked Example
• The new approach is applied to the MDIS rheumatic fever data, which becomes a GMDIS by using the relation
$$\chi\eta_B = \{(x, y) : f_a(x) \subseteq f_a(y),\ \forall a \in B,\ B \subseteq At\}$$
• We conclude that $\{\alpha\}$ is the reduct and that it is the PA of the GMDIS; the same result is obtained using the second consideration.
$$RED(At) = \{\alpha\} = \{S, K\}$$
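Discernibility Matrix versus GMDIS
Rheumatic Fever Data Discernibility Matrix

| | x1 | x2 | x3 | x4 | x5 | x6 | x7 |
|---|---|---|---|---|---|---|---|
| x1 | ∅ | | | | | | |
| x2 | ∅ | ∅ | | | | | |
| x3 | ∅ | ∅ | ∅ | | | | |
| x4 | {S,R,K} | {R,K,E,H} | {S,A,R} | ∅ | | | |
| x5 | {S,F,A,K,P} | {F,A,K,E,P,H} | {S,F,A,P} | {F,A,R,P} | ∅ | | |
| x6 | ∅ | ∅ | ∅ | {R,E} | {F,A,E,P} | ∅ | |
| x7 | ∅ | ∅ | ∅ | {A,R,H} | {F,A,P,H} | ∅ | ∅ |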
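The entries of the discernibility matrix can be recomputed from the single-valued table. The sketch below assumes the usual decision-relative convention that matches the matrix above: a cell lists the condition attributes on which two patients differ when their diagnoses differ, and is empty otherwise. The names `ROWS`, `TABLE` and `discernibility_matrix` are hypothetical, and only patients x1 to x6 are included.

```python
from itertools import combinations

CONDITIONS = "SFARKEPH"  # condition attribute symbols, in table order

# Rows x1..x6 of the SDIS table: eight condition values, then the decision.
ROWS = {
    "x1": "s2 f1 a1 r1 k1 e1 p1 h2 d3",
    "x2": "s1 f1 a1 r1 k1 e2 p1 h1 d3",
    "x3": "s2 f1 a2 r1 k2 e1 p1 h2 d3",
    "x4": "s1 f1 a1 r2 k2 e1 p1 h2 d1",
    "x5": "s1 f2 a0 r1 k2 e1 p2 h2 d2",
    "x6": "s1 f1 a1 r1 k2 e2 p1 h2 d3",
}
TABLE = {x: dict(zip(CONDITIONS + "D", values.split()))
         for x, values in ROWS.items()}


def discernibility_matrix(table):
    """Lower-triangular matrix keyed by (later patient, earlier patient)."""
    matrix = {}
    for x, y in combinations(table, 2):
        if table[x]["D"] != table[y]["D"]:
            # Different diagnoses: record the attributes that tell them apart.
            matrix[(y, x)] = {a for a in CONDITIONS
                              if table[x][a] != table[y][a]}
        else:
            # Same diagnosis: nothing needs to be discerned.
            matrix[(y, x)] = set()
    return matrix


MATRIX = discernibility_matrix(TABLE)
```

For example, `MATRIX[("x4", "x1")]` is {S, R, K} and `MATRIX[("x6", "x4")]` is {R, E}, as in the corresponding cells of the matrix.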
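The discernibility function
$$
\begin{aligned}
f_{At} ={} & (S \lor R \lor K) \land (R \lor K \lor E \lor H) \land (S \lor A \lor R) \land (S \lor F \lor A \lor K \lor P) \\
& \land (F \lor A \lor K \lor E \lor P \lor H) \land (S \lor F \lor A \lor P) \land (F \lor A \lor R \lor P) \land (R \lor E) \\
& \land (F \lor A \lor E \lor P) \land (A \lor R \lor H) \land (F \lor A \lor P \lor H)
\end{aligned}
$$
$$
\begin{aligned}
Red(At) = \{ & \{S \lor R \lor K\},\ \{S \lor A \lor R\},\ \{S \lor F \lor A \lor P\},\ \{F \lor A \lor R \lor P\},\ \{R \lor E\}, \\
& \{F \lor A \lor E \lor P\},\ \{A \lor R \lor H\},\ \{F \lor A \lor P \lor H\} \}
\end{aligned}
$$
$$RED(At) = \{\alpha\} = \{S, K\}$$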
Final Note
The reducts obtained by the GMDIS approach are contained in the reducts obtained on the SDIS using the discernibility matrix, which means that the new approach gives more reduction.
Conclusion
• The new approach for data reduction in GMDIS is a generalization of the MDIS case.
• The approach reduces to Pawlak's approach if the system is single-valued and the relations are equivalence relations.
• It opens the way for other approaches to data reduction that use recent general topological concepts such as pre-open sets and semi-open sets.
• In many real-life situations, using attributes one at a time does not represent their actual effect, so it is necessary to consider subsets of the attributes as multiple criteria.
• The reduct approach was applied to a rheumatic fever dataset to assess its ability and accuracy.
Acknowledgment
• The authors greatly appreciate and thank the following people for their valuable comments and advice:
  • Dr. K. E. Sturtz, Air Force Research Laboratory, Wright Patterson Air Force Base, Ohio
  • Prof. Aboul Ella Hassanien, Cairo University
  • Prof. E. Rady, I.S.S.R., Cairo University
  • Dr. A. S. Salama, Pure Mathematics Dept., Faculty of Science, Tanta University
Thank you for listening.