190 likes | 324 Views
The need for Temporal Databases. (1) Need for coping with temporal data (both VT and TT). (2) Just adding 1 (or 2, or 4) temporal attributes (and maybe some ad-hoc procedures) does not work!.
E N D
The need for Temporal Databases (1) Need for coping with temporal data (both VT and TT) (2) Just adding 1 (or 2, or 4) temporal attributes (and maybe some ad-hoc procedures) does not work! (3) First, a rigorous semantic framework is needed, to give formal specification to the implementation. Properties: clearness, expressiveness, upward compatibiliy. Ex. BCDM (4) Second, the implementation must be proven to respect the semantics. Core issue here: efficient (1-NF) implementations hardly grant uniqueness of representation. Ex TSQL2
Ad-hoc approaches are complex and are not likely to work Example 1. Projection (and temporal coalescing) Question: salary history of Andrea “Intuitive” SQL query: SELECT Salary,VT_start,VT_end FROM Employee WHERE Name=“Andrea”
Ad-hoc approaches are complex and are not likely to work Result obtained: Desired result:
Ad-hoc approaches are complex and are not likely to work How to get the desired result using SQL92: CREATE TABLE Temp(Salary, VT_start,VT_end) AS SELECT Salary, VT_start, VT_end FROM Employee WHERE Name = “Andrea”; Repeat UPDATE Temp T1 SET (T1.VT_end)=SELECT MAX(T2.VT_end) WHERE T1.Salary=T2.Salary AND T1.VT_start < T2.VT_Start AND T1.VT_end >= T2.VT_start AND T1.VT_end < T2.VT_end WHERE EXISTS (SELECT * FROM Temp AS T2 WHERE T1.Salary=T2.Salary AND T1.VT_start < T2.VT_Start AND T1.VT_end >= T2.VT_start AND T1.VT_end < T2.VT_end) Until no tuples updated
Ad-hoc approaches are complex and are not likely to work How to get the desired result using SQL92 (continues!): DELETE FROM Temp T1 WHERE EXISTS (SELECT * FROM Temp AS T2 WHERE T1.Salary = T2.Salary AND ((T1.VT_start > T2.VT_Start) AND (T1.VT_end <= T2.VT_end)) OR ((T1.VT_start >= T2.VT_Start) AND (T1.VT_end < T2.VT_end))
Ad-hoc approaches are complex and are not likely to work Underlying semantic phenomenon: Projection on temporal relations involves temporal coalescing about value equivalent tuples When it occurs (SQL): Whenever a proper subset of the attributes of the relations is chosen in the SELECT part of the query
Ad-hoc approaches are complex and are not likely to work How to get the desired result using a Temporal DB (ex. TSQL2) SELECT Salary FROM Employee WHERE Name = “Andrea”
Ad-hoc approaches are complex and are not likely to work Example 2. Join (and temporal intersection) Employee1 Employee2 Query: “combined” history of both Andrea’s salary and title
Ad-hoc approaches are complex and are not likely to work “Intuitive” SQL query: SELECT Salary, Title, Emp1.VT_start, Emp1.VT_end Emp2.VT_start, Emp2.VT_end FROM Employee1, Employee2 WHERE Employee1.Name=“Andrea” AND Employee1.Name=“Andrea”
Ad-hoc approaches are complex and are not likely to work Result obtained:
Ad-hoc approaches are complex and are not likely to work Result desired:
Ad-hoc approaches are complex and are not likely to work How to get the desired result using SQL92: SELECT Employee1.Name,Salary,Dept,Employee1.VT_start,Employee1.VT_end FROM Employee1, Employee2 WHERE Employee1.Name=Employee2.Name AND Employee2.VT_start <= Employee1.VT_start AND Employee1.VT_end < Employee2.VT_end UNION SELECT Employee1.Name,Salary,Dept,Employee1.VT_start,Employee2.VT_end FROM Employee1, Employee2 WHERE Employee1.Name=Employee2.Name AND Employee1.VT_start >= Employee2.VT_start AND Employee2.VT_end < Employee1.VT_end AND Employye1.VT_start < Employee2.VT_end UNION SELECT Employee1.Name,Salary,Dept,Employee2.VT_start,Employee1.VT_end FROM Employee1, Employee2 WHERE Employee1.Name=Employee2.Name AND Employee2.VT_start > Employee2.VT_start AND Employee1.VT_end < Employee2.VT_end AND Employye2.VT_start < Employee1.VT_end UNION SELECT Employee1.Name,Salary,Dept,Employee2.VT_start,Employee2.VT_end FROM Employee1, Employee2 WHERE Employee1.Name=Employee2.Name AND Employee2.VT_start > Employee1.VT_start AND Employee2.VT_end < Employee1.VT_end
Ad-hoc approaches are complex and are not likely to work Underlying semantic phenomenon: Join (Cartesian product) on temporal relations involves temporal intersection When it occurs (SQL): Whenever more than one relation is used in the FROM part of the query Note: the number of terms in the SQL union is 2n, where n is the number of relations in the FROM part
Ad-hoc approaches are complex and are not likely to work How to get the desired result using a Temporal DB (ex. TSQL2) SELECT Salary, Title FROM Employee1, Employee2 WHERE Employee1.Name=“Andrea” AND Employee1.Name=“Andrea”
Ad-hoc approaches are complex and are not likely to work • Until now, just two simple examples concerning: • SELECT a subset of attributes ( loop to do colaescing) • FROM with >1 relations (exponential union to do intersection) • And what about: • Union, difference, …, nested queries • Temporal predicates • Primary\secondary keys • Aggregate functions • Integrity constraints • Multiple (user-defined!) granularities • …… • arbitrary combinations of all the above issues ?
Ad-hoc approaches are complex and are not likely to work • Key message: • Dealing with temporal data is a general problem in DB’s • Difficult problems (often “hidden” ones) have to be faced WHY? (informally) (1) Relational model assumes independence of attribute values Time attributes predicates about when the values of the other attributes hold in the modeled reality (VT) or are current in the DB (TT)
Ad-hoc approaches are complex and are not likely to work WHY? (informally) (2) Relational model assumes all data\knowledge is explicit The association of time to tuples involves “semantic assumptions” Such assumptions are part of our “commonsense” way of looking at time. However, they are NOT supported by the relational model (as long as they are left implicit) Making the semantic of time explicit is a complex task (everywhere, not only in DBs)
Ad-hoc approaches are complex and are not likely to work • From a Software Engeneering point of view: • Letting applications solve the problem in an ad-hoc way is: • Both cost and time expensive • Likely to lead to errors in the applications • Likely to make integration (shared data) between applications impossible
Ad-hoc approaches are complex and are not likely to work Temporal DB: an area of research aiming at providing once-and-forall principled and integrated solution to the problem