100 likes | 184 Views
SWiM Panel on Stream Query Languages. Jennifer Widom Stan Zdonik Carlo Zaniolo Dennis Shasha Mike Franklin. Format. Four representative continuous queries over data streams I’ll present them in English We’ll all present or them in our favorite query language
E N D
SWiM Panel on Stream Query Languages Jennifer Widom Stan Zdonik Carlo Zaniolo Dennis Shasha Mike Franklin
Format • Four representative continuous queries over data streams • I’ll present them in English • We’ll all present or them in our favorite query language • Demos at lunch • As we go we’ll develop a list of the most salient language features, differences, issues, tradeoffs, etc. (need scribe at board)
Stream Query Repository • The four queries were chosen from a repository we’ve been creating at Stanford • Informal description + relational schemas for four applications • English description of queries • Queries in various languages – you write them, send us the URL http://www-db.stanford.edu/stream/sqr
The Queries • Windowing and aggregation, self-join or subquery • Packets(pID, length, [time]) • Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour • Windowing, stream “splitting,” stored relation • SquirrelSensors(sID, region, [time]) • SquirrelType(sID, type) // stored relation • Create an alert when more than 20 type ‘A’ squirrels are in Jennifer’s backyard
The Queries (cont’d) • Stream self-joins • SquirrelChirps(sID, loc, [time]) • Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other • Super bonus query – windowing, stream transformations • Packets(pID, src, dest, length, [time]) • Log packet count and total packet length for each “flow” – new flow defined by > 2 minute gap for (src,dest) pair
Query 1 in CQL • Packets(pID, length, [time]) • Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour Select Istream() From Packets [Now] Where length > (Select 2 Avg(length) From Packets [Range 1 Hour])
Query 2 in CQL • SquirrelSensors(sID, region, [time]) • SquirrelType(sID, type) // stored relation • Create an alert when more than 20 type ‘A’ squirrels are in Jennifer’s backyard Select alert() From SquirrelSensors [Partition by sID Rows 1] SS, SquirrelType ST Where SS.region = 'JW yard' And SS.sID = ST.sID And ST.type = 'A' Having Count() > 20
Query 3 in CQL • SquirrelChirps(sID, loc, [time]) • Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other Select Rstream(S1.sID, S1.loc, S2.sID, S2.loc, S3.sID, S3.loc) From SquirrelChirps [Now] S1, SquirrelChirps [Range 10 Seconds] S2, SquirrelChirps [Range 10 Seconds] S3 Where S1.sID <> S2.sID And S1.sID <> S3.sID And S2.sID <> S3.sID And distance(S1.loc, S2.loc) < 5 meters And distance(S1.loc, S3.loc) < 5 meters And distance(S2.loc, S3.loc) < 5 meters
Query 4 (super bonus) in CQL • Packets(pID, src, dest, length, [time]) • Log packet count and total packet length for each “flow” – new flow defined by > 2 minute gap for (src,dest) pair FlowStartStream: Istream( (Select P., 1 as flowStartBit From Packets [Now] as P Where (src, dest) Not In (Select src, dest From Packets [Range 2 Minutes] Where pID <> P.pID)) Union (Select P., 0 as flowStartBit From Packets [Now] as P Where (src, dest) In (Select src, dest From Packets [Range 2 Minutes] Where pID <> P.pID)))
Query 4 in CQL (cont’d) FlowIDStream: Select Istream(P., flowID) From FlowStartStream [Now] P, (Select src, dest, Count() as flowID From FlowStartStream Where flowStartBit = 1 Group By src, dest) ID Where P.src = ID.src and P.dest = ID.dest FlowTable: Select src, dest, Count(), Sum(length) From FlowIDStream Group By src, dest, flowID