
SPL Enhancements InfoSphere Streams Version 3.0

Howard Nasgaard SPL Compiler, SPL Runtime & Standard Toolkit Development. SPL Enhancements InfoSphere Streams Version 3.0. Important Disclaimer. THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.


Presentation Transcript


  1. Howard Nasgaard SPL Compiler, SPL Runtime & Standard Toolkit Development SPL Enhancements InfoSphere Streams Version 3.0

  2. Important Disclaimer THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: • CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR • ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE. The information on the new product is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information on the new product is for informational purposes only and may not be incorporated into any contract. The information on the new product is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.

  3. Agenda • Walk-through of new SPL and Standard Toolkit changes and additions

  4. Problem • How do I ingest and work on XML data in a streams application?

  5. XML Support • ‘xml’ added as a first-class datatype • stream<xml x> .... • Checked for form • Can also specify a schema • stream<xml<"mySchema"> x> • Checked for form and validity • Schema can be a file or a web-based URI • A local file is recommended • A relative path is resolved against the data directory root

  6. XML Support - Conversion • rstring <-> xml • xml x = (xml) "<doc>...</doc>"r; • Checked for form • xml<"schema"> xs = (xml<"schema">)"<doc>...</doc>"r; • Checked for form and validity • rstring s = (rstring)x; • Form and validity checking done only when needed • xml x = (xml)xs; • Not checked • xs = (xml<"schema">)x; • Checked for validity • Validation failure at runtime will throw an exception • Set of built-in functions available to convert to xml • Can be used with return code in logic

  7. XML Support - Conversion • xml <-> tuple • type T = tuple<int32 i, rstring s>; • mutable T t = {....}; • mutable xml x = (xml)t; • Converted to xml in “Serialized Tuple Model” format • Schema provided with Streams (serializedTupleModel.xsd) • mutable T t = (T)x; • Validated against tuple model schema • No conversion from ustring to xml directly • Must go through rstring

  8. XML Literals • New literal type added for XML • "<a b=\"hi\">x</a>"x; • String syntax extended to ease use in XML literals • ' (single quote) can now be used to delimit strings • '<a b="hi">x</a>'x; • Embedded new-lines are now legal in string literals • '<a> <b>x</b> </a>'x; • Can be used in all string literals

  9. XML Support - Encoding • XML literals are assumed to be in UTF-8 encoding • Source files are in UTF-8 so any explicit encoding must be too • '<?xml ... encoding="xxxxx"?> ...'x; • Compile-time error if xxxxx is not UTF-8 • rstring expressions that contain XML data... • Are assumed to be encoded in UTF-8 if no encoding specified • Must contain valid characters if encoding is specified • Error raised at cast time if not

  10. XML Support – Source/Sink Operators • Source operators can read attributes of xml type • stream<xml x> In = FileSource() {...} • xml is checked for form and validated (if there is a schema) • Sink operators can write attributes of xml type • () as Out = FileSink(stream<xml x> In) {...} • csv, txt as quoted xml literals • bin in serialized form • No validation (assumed valid) • Source can read a “traditional” XML file in line or block format • Requires XMLParse operator to be useful • No validation in Source operators • Sink operators cannot write XML using line or block format • XML must first be converted to rstring or blob.

  11. XML Support - XMLParse Operator • Converts xml data to tuples • Input attribute can be rstring, ustring, blob, xml • Input data can be in multiple lines/blocks with rstring, ustring or blob • “line” format can be used to read “traditional” XML files • If the input is a ustring, any encoding directive is ignored • Operator validates xml • Input in rstring, ustring or blob can contain multiple, sequential XML documents • A window marker punctuation is generated at the end of each XML document • XMLParse will not produce attributes of xml type

  12. XML Support – XMLParse Operator • Generates one or more output streams • Each stream • Corresponds to a subtree within the XML (i.e., an element) • Requires a “trigger” expression • Trigger expression • rstring containing an XPath expression that defines a node set • Tuples are generated for each node in the node set • Must start at the root of the document • param trigger : "/doc/a"; • Two mechanisms to specify the mapping of XML to tuple content • Implicit: content derived from the output stream schema • Explicit: content specified in the output clauses

  13. XML Support – Implicitly deriving tuple content • The tuple schema representation of an XML element is: • type Element = tuple<map<rstring, rstring> _attrs, rstring _text [, NestedTuples]*> • _attrs contains all the attribute name/value pairs • _text contains the text content between the open/close tag • Additional tuples or lists of tuples represent nested elements • Defining an output stream schema that follows this notion allows the XMLParse operator to generate a SAX parser that will extract the desired information • An example:

  14. XML Support – XMLParse example <a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a> • Things to note: • The nested tuples for sub-elements ‘d’ and ‘e’ do not have a map for attributes. Not needed. • The trigger expression "/a" always starts with ‘/’ • type aElem = tuple<map<rstring,rstring> _attrs, rstring _text, tuple<rstring _text> d, list<tuple<rstring _text>> e>; stream<aElem> O = XMLParse(…) { param trigger : "/a"; }
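The implicit mapping on this slide can be modelled outside Streams. The sketch below is Python, not SPL, and only imitates the shape of the `aElem` tuple for the sample document; the helper names `text_of` and `implicit_a_elem` are hypothetical, and trimming whitespace from `_text` is an assumption the slides do not confirm.

```python
import xml.etree.ElementTree as ET

SAMPLE = '<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>'


def text_of(elem):
    # Assumption: surrounding whitespace is trimmed (not stated in the slides).
    return (elem.text or "").strip()


def implicit_a_elem(xml_text):
    """Return one dict per node matched by the trigger "/a", shaped like aElem."""
    root = ET.fromstring(xml_text)
    if root.tag != "a":  # the trigger starts at the document root
        return []
    d = root.find("d")
    return [{
        "_attrs": dict(root.attrib),                      # map<rstring,rstring> _attrs
        "_text": text_of(root),                           # rstring _text
        "d": {"_text": text_of(d) if d is not None else ""},   # tuple<rstring _text> d
        "e": [{"_text": text_of(e)} for e in root.findall("e")],  # list of tuples
    }]


print(implicit_a_elem(SAMPLE))
```

Because the nested tuples for `d` and `e` carry no `_attrs` entry, any attributes on those elements are simply not collected, which mirrors the "Not needed" note on the slide.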

  15. XML Support – Another XMLParse example <a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a> • Things to note: • Map has been replaced by scalar b and list<scalar>[1] c • param flatten : attributes • Puts XML attributes in SPL scalar (or list-of-scalar) attributes of the same name • SPL attribute b has type int32 • Everything in XML is considered rstring by default • Specifying a non-rstring type causes a conversion • type aElem = tuple<int32 b, list<rstring>[1] c, rstring _text, tuple<rstring _text> d, list<tuple<rstring _text>> e>; stream<aElem> O = XMLParse(…) { param trigger : "/a"; flatten : attributes; }

  16. XML Support – Still another XMLParse example <a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a> • Things to note: • The nested tuple<rstring _text> d is reduced to rstring d • rstring SPL attributes not named _text are assumed to refer to the text content of a nested element by that name • The list<tuple<rstring _text>> e is reduced to list<rstring> e • The map is back? Why? • type aElem = tuple<map<rstring,rstring> _attrs, rstring _text, rstring d, list<rstring> e>; stream<aElem> O = XMLParse(…) { param trigger : "/a"; flatten : elements; }

  17. XML Support – Implicitly deriving tuple content • Reduction of maps/tuples to scalars is referred to as flattening • Reduction of maps/tuples to scalars can only be done for XML attributes OR elements, not both • rstring b could mean element b or attribute b. • You must tell the XMLParse operator which one you want • param flatten : attributes/elements/none (default none) • XML attribute or element content not represented in the tuple schema will be ignored • You do not need to fully represent the XML structure in the schema • What if my XML happens to have an element named _text? • The params textName and attributesName can change the default names _text and _attrs.
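To make the two flattening modes concrete, here is a Python model (not SPL) of the tuple shapes from the two preceding examples. The function names are hypothetical, and whitespace trimming of `_text` is an assumption.

```python
import xml.etree.ElementTree as ET

SAMPLE = '<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>'


def flatten_attributes(root):
    """Model of flatten : attributes. XML attributes become same-named fields."""
    return {
        "b": int(root.get("b")),           # declared int32, so the rstring is converted
        "c": [root.get("c")],              # declared list<rstring>[1]
        "_text": (root.text or "").strip(),
        "d": {"_text": root.findtext("d")},
        "e": [{"_text": e.text} for e in root.findall("e")],
    }


def flatten_elements(root):
    """Model of flatten : elements. Nested elements collapse to their text."""
    return {
        "_attrs": dict(root.attrib),       # attributes stay in the map
        "_text": (root.text or "").strip(),
        "d": root.findtext("d"),           # rstring d
        "e": [e.text for e in root.findall("e")],  # list<rstring> e
    }


root = ET.fromstring(SAMPLE)
print(flatten_attributes(root))
print(flatten_elements(root))
```

Only one of the two modes applies at a time because a bare name such as `b` would otherwise be ambiguous between attribute `b` and element `b`; that is why the map reappears when elements are flattened.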

  18. XML Support – Explicitly specifying tuple content • As with other operators, expressions in the output clause assign values to the output tuple attributes • Expressions use custom output functions to specify the mapping of XML data to SPL attributes • rstring XPath(rstring xpathExpn) • <tuple T> XPath(rstring xpathExpn, T tupleLiteral) • list<rstring> XPathList(rstring xpathExpn) • <any T> list<T> XPathList(rstring xpathExpn, T elements) • map<rstring,rstring> XPathMap(rstring xpathExpn) • Each of these functions requires an XPath expression relative to the trigger expression, or the containing expression • Examples, please!

  19. XML Support – Example of using explicit specification <a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a> • Things to note: • Trigger expression says output a tuple for each “e” subtree • XPath expression “text()” specifies what to get from the ‘e’ subtree, the ‘e’ element’s text content in this case • Everything else in the XML is ignored • Two tuples would be output for the example XML • No naming convention for tuple attributes • stream<rstring s> O = XMLParse(...) { param trigger : "/a/e"; output O : s = XPath("text()"); }

  20. XML Support – Example of using explicit specification <a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a> • Things to note: • Trigger expression says output a tuple for each “a” subtree • XPath expression “@b” specifies that we want the content of XML attribute ‘b’ • The output of a custom output function (COF) must be cast explicitly if the target is not an rstring • One tuple would be output for the example XML • stream<int32 i> O = XMLParse(...) { param trigger : "/a"; output O : i = (int32)XPath("@b"); }

  21. XML Support – Example of using explicit specification <a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a> • Things to note: • Trigger expression says output a tuple for each “a” subtree • XPath expression “e/text()” specifies that we want the content of XML element ‘e’ and the XPathList function returns a list of all ‘e’ contents • One tuple would be output for the example XML with two values in list l • stream<list<rstring> l> O = XMLParse(...) { param trigger : "/a"; output O : l = XPathList("e/text()"); }

  22. XML Support – Example of using explicit specification <a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a> • Things to note: • XPath expression “@*” specifies we want all attributes • XPathMap function returns the map containing all the attributes • One tuple will be output with a map containing two key/value pairs • stream<map<rstring, rstring> attrs> O = XMLParse(...) { param trigger : "/a"; output O : attrs = XPathMap("@*"); }
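The four explicit examples can be reproduced with a small Python model. This is not how XMLParse is implemented; it is a sketch that assumes simple slash-separated triggers, and `trigger_nodes` is a hypothetical helper. Python's ElementTree supports only a subset of XPath, so `text()`, `@b` and `@*` are expressed with the equivalent element accessors.

```python
import xml.etree.ElementTree as ET

SAMPLE = '<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>'


def trigger_nodes(root, trigger):
    """Return the nodes selected by an absolute path such as "/a" or "/a/e"."""
    parts = trigger.strip("/").split("/")
    if parts[0] != root.tag:        # the trigger must start at the document root
        return []
    rest = "/".join(parts[1:])
    return root.findall(rest) if rest else [root]


root = ET.fromstring(SAMPLE)

# Slide 19: trigger "/a/e", s = XPath("text()"), one tuple per 'e'
slide19 = [{"s": e.text} for e in trigger_nodes(root, "/a/e")]

# Slide 20: trigger "/a", i = (int32)XPath("@b")
slide20 = [{"i": int(a.get("b"))} for a in trigger_nodes(root, "/a")]

# Slide 21: trigger "/a", l = XPathList("e/text()")
slide21 = [{"l": [e.text for e in a.findall("e")]} for a in trigger_nodes(root, "/a")]

# Slide 22: trigger "/a", attrs = XPathMap("@*")
slide22 = [{"attrs": dict(a.attrib)} for a in trigger_nodes(root, "/a")]

print(slide19, slide20, slide21, slide22, sep="\n")
```

The number of dictionaries in each result matches the tuple counts stated on the slides: two for the `/a/e` trigger and one for each `/a` trigger.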

  23. XML Support – XMLParse Operator • Some other behavior • If an SPL attribute assignment does NOT contain XPath, XPathList or XPathMap, then the expression will be resolved from the input stream • If an SPL attribute assignment is omitted, the XMLParse operator will try to generate an implicit assignment using a default XPath or XPathList expression • The parsing parameter controls its error processing: • strict: logs an error and terminates the operator • permissive: logs an error and continues

  24. XML Support – spl-schema-from-xml utility • Given complex XML, crafting either tuple schemas for implicit generation, or output clauses for explicit generation, could be difficult • Enter the spl-schema-from-xml utility • Given a representative XML document it will: • generate a set of typedefs for the tuple schema to support the full XML • optionally generate output clauses for each trigger specified • optionally generate a schema for the XML • optionally generate a composite operator wrapping the XMLParse operator • optionally generate a main composite with a source, sink and a call to the parser composite • You can tailor the output from the utility to suit your needs • You can tell it to flatten elements or attributes

  25. Sample spl-schema-from-xml output <a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a> spl-schema-from-xml -o a.spl -t '/a' --composite Parse --mainComposite Main data/test1.xml

  26. Sample spl-schema-from-xml output
          use spl.XML::*;
          composite Parse(input input0; output output0) {
              type
                  static a_type = tuple<map<rstring, rstring> _attrs, rstring _text, a_d_type d, list<a_e_type> e>;
                  static a_d_type = tuple<rstring _text>;
                  static a_e_type = tuple<rstring _text>;
              graph
                  stream<a_type> output0 = XMLParse(input0) {
                      param
                          trigger : "/a";
                          parsing : permissive; // log and ignore errors
                      output output0 :
                          _attrs = XPathMap("@*"),
                          _text = XPath("text()"),
                          d = XPath("d", {_text = XPath("text()")}),
                          e = XPathList("e", {_text = XPath("text()")});
                      // *trigger: /a
                  }
          }
          composite Main() {
              graph
                  stream<rstring s> Input = FileSource() {
                      param
                          file : "test1.xml";
                          format : line;
                  }
                  stream<Parse.a_type> X0 = Parse(Input) { }
                  () as O0 = FileSink(X0) {
                      param file : "out0.dat";
                  }
          }

  27. XML Support – Standard Library Support Functions • A number of new functions have been added to the library • Safe conversions from string to xml • <xml X, string T> void convertToXML(mutable X xmlResult, T input, mutable int32 error); • <xml X, string T> public bool convertToXML (mutable X xmlResult, T input); • An XQuery engine is added as an alternative to the XMLParse operator • <xml X > public list<rstring> xquery (X input, rstring xqueryExpression); • <xml X > public list<rstring> xquery (X input, rstring xqueryExpression, mutable int32 error); • And numerous more flavors • All return a list of rstrings with the query results
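The difference between a cast that throws and a "safe" conversion that reports failure can be sketched in Python. This is only a model of the pattern: `convert_to_xml` is a hypothetical name, it checks well-formedness only (no schema validation), and it does not reproduce the signatures of the SPL `convertToXML` functions.

```python
import xml.etree.ElementTree as ET


def convert_to_xml(text):
    """Return (ok, document). ok is False when the text is not well-formed XML."""
    try:
        return True, ET.fromstring(text)
    except ET.ParseError:
        # Report the failure to the caller instead of letting the exception escape.
        return False, None


ok, doc = convert_to_xml("<doc>hello</doc>")
print(ok, doc.tag if ok else None)

ok_bad, _ = convert_to_xml("<doc>")   # missing end tag
print(ok_bad)
```

Returning a status lets operator logic branch on bad input, for example to drop or log the tuple, rather than terminating on an exception.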

  28. XML Support – XQuery example
          type T = tuple<int32 id, tuple<rstring b, list<int32> x, float64 d> a, rstring c>;
          stream<T> OutTuples = Custom (Data) {
              logic onTuple Data: {
                  // extract string 'c'
                  mutable list<rstring> results = xquery(Data.xmlVar, "/something/bar/c/text()");
                  mutable rstring s = results[0];
                  // extract string 'b' attribute in 'a'
                  mutable tuple<rstring b, list<int32> x, float64 d> a = {};
                  results = xquery(Data.xmlVar, "/something/bar/a/@bdata");
                  a.b = results[0];
                  // extract list<int32> 'x' attribute in 'a'
                  results = xquery(Data.xmlVar, "/something/bar/foo/text()");
                  for (rstring r in results)
                      appendM (a.x, (int32) r);
                  // extract float64 'd' attribute in 'a'
                  results = xquery(Data.xmlVar, "/something/bar/a/d/text()");
                  a.d = (float64) results[0];
                  // submit the final result
                  submit ({id = Data.id, a = a, c = s}, OutTuples);
              }
          }

  29. XML Support – Database Toolkit • All database toolkit operators have been extended to support XML • XML converted to/from char data for DB that doesn’t support XML • DB2 PureXML capabilities are accessible if using DB2 V9.7 or later

  30. Problem • How do I use a for statement to iterate over, and modify, a list? • In SPL the for statement looks like: list<rstring> l = [...]; for (rstring entry in l) { l = "????"; } • This doesn’t work because entry is an rstring value, not an index into the list • You need a list of indexes with the same number of entries as the list you are iterating over: for (int32 i in indexes) { l[i] = "????"; }

  31. More Efficient ‘for’ Loops
      • Introduce a set of ‘range’ functions
          list<int32> range(int32 limit);                          // returns [0, ..., limit-1]
          list<int32> range(int32 start, int32 limit);             // returns [start, ..., limit-1]
          list<int32> range(int32 start, int32 limit, int32 step); // returns [start, start+step, ...], every value < limit
          <list T> list<int32> range(T l);                         // returns [0, ..., size(l)-1]
      • Use:
          mutable list<rstring> myList = ["hi", "there"];
          for (int32 i in range(myList)) { myList[i] = upper(myList[i]); }
      • Compiled into a plain C++ loop when used inside a for loop

  32. More Efficient ‘for’ loops
      • SPL:
          logic ....: {
              mutable list<rstring> myList = ["hi", "there"];
              for (int32 i in range(myList)) {
                  myList[i] = upper(myList[i]);
                  println(myList[i]);
              }
          }
      • Generated C++:
          SPL::list<SPL::rstring > myList = ...;
          SPL::int32 temp = myList.size();
          for (SPL::int32 i = 0; i < temp; i++) {
              myList.at(i) = ...::upper(myList.at(i));
              ...::println(myList.at(i));
          }
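The semantics of the four `range` overloads can be checked with a small Python model. `spl_range` and `upper_in_place` are hypothetical names; the model follows the return values stated on the slides for positive steps, and how SPL treats zero or negative steps is not covered here.

```python
def spl_range(*args):
    """Model of the SPL range overloads described on the slides."""
    if len(args) == 1 and isinstance(args[0], list):
        # range(list): indexes [0, ..., size(l)-1]
        return list(range(len(args[0])))
    # range(limit), range(start, limit), range(start, limit, step)
    return list(range(*args))


def upper_in_place(items):
    """Modify a list while iterating, using indexes rather than element values."""
    for i in spl_range(items):
        items[i] = items[i].upper()
    return items


my_list = ["hi", "there"]
upper_in_place(my_list)
print(my_list)
print(spl_range(0, 10, 3))
```

Iterating over indexes is what makes in-place updates possible; iterating over the elements themselves would only hand the loop body a copy of each value.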

  33. Other SPL Changes • SPADE to SPL translator removed • Must install Streams 2.x if you need translation • submit([tuple|punct], portNo) functions added • Enable dynamic port selection • Will raise an exception at runtime if the port is invalid • Return statement allowed in logic clause to enable simplification • Does not affect the normal processing of tuples in the generated primitive operator. • New Perl-compatible regex functions added • regexMatchPerl • regexReplacePerl • Both rstring and ustring variants
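Dynamic port selection can be illustrated with a toy Python model. The `Outputs` class and `route_by_id` function are hypothetical, and routing on `id` modulo the port count is just one example policy. The only behavior taken from the slide is that the port is chosen by number at run time and an invalid number raises an exception.

```python
class Outputs:
    """Toy stand-in for an operator's output ports."""

    def __init__(self, port_count):
        self.ports = [[] for _ in range(port_count)]

    def submit(self, tup, port_no):
        # Model of submit(tuple, portNo): invalid ports fail at run time.
        if not 0 <= port_no < len(self.ports):
            raise IndexError(f"invalid output port {port_no}")
        self.ports[port_no].append(tup)


def route_by_id(outputs, tup):
    # Hypothetical policy: pick the port from the tuple's id.
    outputs.submit(tup, tup["id"] % len(outputs.ports))


out = Outputs(2)
for n in range(5):
    route_by_id(out, {"id": n})
print(out.ports)
```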

  34. Problem • I want to write a primitive operator with custom output functions that can be nested within an output assignment • You saw this in the examples of the XMLParse operator

  35. Operator Model Changes • Allow Custom Output Functions to be nested within an expression • Recall from XMLParse: • output O : attr = XPathList("...", XPathList(...)); • outputPortOpenSetType • <allowNestedCustomOutputFunctions> true/false • Allow Custom Output Functions to be used in a param expression • parameterType • <customOutputFunction> - name of a COF that can appear

  36. Operator Model Changes • To support nested COFs and COFs in a param expression • Compiler will optionally generate an expression tree into the Operator Instance Model (OIM) • APIs provided in the OIM interface to walk the expression tree and query characteristics • APIs documented in html in doc/spl/operator/code-generation-api/perl • Also support for generation of C++ code from the expression tree • Use documented in the Toolkit Developer’s Reference

  37. Problem • I have a Streams application that Imports from, or Exports to, another streams application • I would like to dynamically update the Export properties or the Import subscription

  38. Export Property/Import Subscription Update from SPL code • Allows SPL programs to query/update properties/subscriptions without having to use primitive operators. • getOutputPortExportProperties • setOutputPortExportProperties • getInputPortImportSubscription • setInputPortImportSubscription • Port must come from an Import or go to an Export operator that uses a subscription/property • Triggers a disconnect/reconnect • Use: setInputPortImportSubscription('stock == "IBM"', 0u);

  39. Problem • I provide a toolkit that is used in various countries. I would like to be able to load strings in a language appropriate for the locale in effect where my toolkit is used. • Within those strings I would like numeric values, for example, to be formatted in a locale sensitive way.

  40. Localization Support • Utilizes ICU under the covers • Resource bundle creation • Locale-sensitive loading mechanism • Translatable strings contained in XLIFF files • XML Localization Interchange File Format • Specify .xlf files in info.xml file • Resource bundles built during toolkit indexing • C++ header and Perl module generated • Standard library functions added to load resource • loadAndFormatResource • Localized strings available at compile-time and run-time • Localization sample program • Documented in the Toolkit Development Reference

  41. XLIFF File <xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.1 _path_/xliff-core-1.1.xsd"> <file datatype="plaintext" original="root.txt" source-language="en" target-language="en" xml:space="preserve"> <body> <group> <trans-unit id="1" extraData="MESSAGE_1" resname="MSG0001"> <source>English: A message emitted at compile time.</source> </trans-unit> <trans-unit id="2" extraData="MESSAGE_2" resname="MSG0002"> <source>English: A message emitted at run time. A formatted value ''{0,number,currency}''.</source> </trans-unit> </group> </body> </file> </xliff>
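To show what the toolkit indexer has to read from such a file, the Python sketch below extracts the translatable units from a trimmed copy of the XLIFF document above (the schema-location attributes are omitted). This is not the Streams tooling: `load_messages` is a hypothetical helper, and keying the result by `extraData` is an assumption based on that value matching the `MESSAGE_1`/`MESSAGE_2` names used in the usage slide.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.1"}

XLIFF = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file datatype="plaintext" original="root.txt" source-language="en" target-language="en">
    <body>
      <group>
        <trans-unit id="1" extraData="MESSAGE_1" resname="MSG0001">
          <source>English: A message emitted at compile time.</source>
        </trans-unit>
        <trans-unit id="2" extraData="MESSAGE_2" resname="MSG0002">
          <source>English: A message emitted at run time. A formatted value ''{0,number,currency}''.</source>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>"""


def load_messages(xliff_text):
    """Map each trans-unit's extraData name to its resource name and source text."""
    root = ET.fromstring(xliff_text)
    messages = {}
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        messages[unit.get("extraData")] = {
            "resname": unit.get("resname"),
            "text": source.text,
        }
    return messages


for name, entry in load_messages(XLIFF).items():
    print(name, entry["resname"], entry["text"])
```

The `{0,number,currency}` placeholder is left untouched here; in the deck it is ICU that formats the substituted value in a locale-sensitive way.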

  42. Usage:
          <%
          # Add a require for the Perl module that contains the subroutine that loads and formats the string.
          require MyResource;
          # Emit the message using an SPL helper method
          SPL::CodeGen::println(MyResource::MESSAGE_1());
          %>
          // Add an include for the header that contains the macro which loads and formats the message
          #include "MyResource.h"
          // Tuple processing for non-mutating ports
          void MY_OPERATOR::process(Tuple const & tuple, uint32_t port) {
              const IPort0Type & t = static_cast<const IPort0Type &>(tuple);
              // Get the loaded and formatted message and initialize the output tuple
              SPL::rstring r = MESSAGE_2(t.get_i());
              // Add a message to the runtime log
              SPLAPPLOG(L_INFO, r, "test");
          }

  43. Standard Toolkit Changes

  44. Problem • I find the Beacon operator somewhat limited in how I can use it as a stream generator. Is there another way of generating tuples?

  45. Custom Operator as a Source
      • A Custom operator with no inputs can act as a source
          (stream<int32 a> A; stream<int32 a> B) = Custom() {
              logic
                  state : mutable int32 i = 9;
                  onProcess : {
                      for (int32 x in range(10)) {
                          submit ({a = x + i}, A);
                          submit ({a = 6 + x + i}, B);
                          i++;
                      }
                  }
          }
      • onProcess clause added
      • Only allowed in a Custom operator
      • Only allowed if there are no input ports
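Tracing the loop by hand shows what each output stream receives. The Python function below is a direct transliteration of the `onProcess` body, with lists standing in for streams A and B; `on_process` is a hypothetical name.

```python
def on_process():
    """Replay the onProcess loop and collect what is submitted to A and B."""
    i = 9                       # state : mutable int32 i = 9
    stream_a, stream_b = [], []
    for x in range(10):
        stream_a.append({"a": x + i})
        stream_b.append({"a": 6 + x + i})
        i += 1                  # i advances together with x
    return stream_a, stream_b


a_tuples, b_tuples = on_process()
print([t["a"] for t in a_tuples])
print([t["a"] for t in b_tuples])
```

Because both `x` and `i` increase by one on every iteration, the values on each stream go up in steps of two.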

  46. Problem • My Streams application imports data from another application, but I am only interested in part of the data. I have to add a Functor to filter out a lot of the imported data. That wastes a lot of transport time.

  47. Filter Support for Import • A new filter param added to the Import operator • boolean or rstring expression • type streamT = int64 value, rstring str, int32 x; stream<streamT> I = Import () { param subscription : "a >= 55"; filter : value < 0 && str == "foo"; } • Filtering will be performed at the Export operator, and only matching tuples will be seen at the Import operator

  48. Filter Support for Import • Filter expressions are the same as subscription expressions: • int64, rstring, float64, lists of same • Export operator has a new parameter: • allowFilter: true/false; • If allowFilter is false, an Import with a filter parameter will not connect to the Export operator • New metric added to PE output port (per connection): • TuplesFilteredOut • Shows number of tuples not sent over connection • New functions to query/update filter expression • getInputPortImportFilterExpression • setInputPortImportFilterExpression • Update is asynchronous
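The point of export-side filtering is that rejected tuples never cross the connection. The Python model below captures that idea together with the `allowFilter` rule and a per-connection counter analogous to `TuplesFilteredOut`. The `ExportPort` class and its methods are hypothetical and say nothing about how Streams implements the transport.

```python
class ExportPort:
    """Toy model of an exporting port that applies each importer's filter."""

    def __init__(self, allow_filter=True):
        self.allow_filter = allow_filter
        self.connections = []

    def connect(self, predicate=None):
        # An importer that brings a filter cannot connect when filters are disallowed.
        if predicate is not None and not self.allow_filter:
            return None
        conn = {"filter": predicate, "received": [], "tuples_filtered_out": 0}
        self.connections.append(conn)
        return conn

    def submit(self, tup):
        for conn in self.connections:
            if conn["filter"] is None or conn["filter"](tup):
                conn["received"].append(tup)      # sent over the connection
            else:
                conn["tuples_filtered_out"] += 1  # dropped before transport


port = ExportPort(allow_filter=True)
# Same predicate as the slide: value < 0 && str == "foo"
conn = port.connect(lambda t: t["value"] < 0 and t["str"] == "foo")
for tup in [{"value": -1, "str": "foo"}, {"value": 5, "str": "foo"}, {"value": -2, "str": "bar"}]:
    port.submit(tup)
print(conn["received"], conn["tuples_filtered_out"])
```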

  49. Problem • I have noticed, when using windows in my primitive operator, that they cache every tuple within the library. • It occurs to me that, at least with tumbling windows, it shouldn’t be necessary to cache the tuples.

  50. Window Library Extended to Optimize Tumbling Windows • In Streams Version 2.0 the window library cached all tuples • Can use a lot of memory • In many cases it is not necessary to cache all the tuples • e.g., computing the average of attribute price in a tumbling window • Requires only a count and a running total • In Version 3.0 the window library is extended with Summarizers • Aggregate operator updated to use this optimization
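The average example shows why caching is unnecessary: a count and a running total are enough. The Python sketch below illustrates the idea with a count-based tumbling window. `AverageSummarizer` and `tumbling_averages` are hypothetical names and do not reflect the window library's C++ interface.

```python
class AverageSummarizer:
    """Keeps only a count and a running total instead of the tuples themselves."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, price):
        self.count += 1
        self.total += price

    def result(self):
        return self.total / self.count


def tumbling_averages(prices, window_size):
    """Emit one average per full window of window_size prices."""
    averages = []
    summary = AverageSummarizer()
    for price in prices:
        summary.add(price)
        if summary.count == window_size:
            averages.append(summary.result())
            summary = AverageSummarizer()   # the window tumbles: start afresh
    return averages


print(tumbling_averages([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 3))
```

Memory use stays constant per window regardless of its size; the trade-off is that only aggregates expressible as incremental updates can be computed this way.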
