
Toolkit Enhancements InfoSphere Streams Version 3.0

Paul Bye and Mike Accola


Presentation Transcript


  1. Toolkit Enhancements – InfoSphere Streams Version 3.0
  Paul Bye and Mike Accola

  2. Important Disclaimer
  THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
  • CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
  • ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
  The information on the new product is intended to outline our general product direction and should not be relied on in making a purchasing decision. It is provided for informational purposes only and may not be incorporated into any contract. It is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for our products remain at our sole discretion.

  3. Agenda
  • Toolkit Repackaging for Streams Version 3.0
  • Database Toolkit XML Support
  • Database Toolkit Optional Control Port
  • Database Toolkit Netezza Native Loader

  4. Toolkit Repackaging for Streams Version 3.0
  • The Streams Financial Toolkit and Mining Toolkit are now included in the base InfoSphere Streams installation along with the rest of the toolkits
  • They were shipped and installed separately in prior releases of Streams
  • No changes are necessary to SPL source files that use these toolkits from previous releases
  • Application Makefiles and command-line sc invocations used to build applications with the Financial or Mining Toolkit in a previous release need to be adjusted to point to the new toolkit locations

    OLD: sc -a -z -t /mining/toolkit/install/location
    NEW: sc -a -z -t $(STREAMS_INSTALL)/toolkits/com.ibm.streams.mining
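  As a sketch of the Makefile adjustment described above (the application name and main composite are illustrative, not from the slides):

    # Hedged sketch: adjusting a Makefile for the repackaged Mining Toolkit.
    # MyApp and my.namespace are illustrative placeholders.

    # OLD: toolkit installed at a separately chosen location
    # TOOLKIT_PATH = /mining/toolkit/install/location

    # NEW: toolkit shipped inside the Streams installation
    TOOLKIT_PATH = $(STREAMS_INSTALL)/toolkits/com.ibm.streams.mining

    all:
		sc -a -z -t $(TOOLKIT_PATH) -M my.namespace::MyApp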

  5. Database Toolkit – XML Support
  • With the addition of the native 'xml' SPL type, the Database Toolkit has been enhanced to exchange XML data between the database and the Streams application
  • Tuple attributes of type 'xml' can be written to, or read from, database column types of CHAR, VARCHAR, NCHAR, or NVARCHAR
  • Native column types are now specified in the connections.xml file with the new <native_schema> and <column> elements, which replace the deprecated <external_schema> and <attribute> elements
  • Example:

    OLD:
    <external_schema>
      <attribute name="id" type="int32"/>
      <attribute name="name" type="rstring" length="15"/>
    </external_schema>

    NEW:
    <native_schema>
      <column name="id" type="INTEGER"/>
      <column name="name" type="VARCHAR" length="15"/>
    </native_schema>

  6. Example – ODBCAppend
  Input tuple schema (int32 id, xml xmldata) → ODBCAppend → DB table: INTEGER (id), VARCHAR (xmldata)

  SPL Application:

    ////////////////////////////////////////////////
    // Read from file
    ////////////////////////////////////////////////
    stream<int32 id, xml xmldata> tabledata = FileSource() {
      param
        file      : "tabledata.csv";
        format    : csv;
        initDelay : 5.0;
    }

    ////////////////////////////////////////////////
    // Write data to the database
    ////////////////////////////////////////////////
    () as DBSink = ODBCAppend(tabledata) {
      param
        connection         : "DBXML";
        access             : "TableWithXML";
        connectionDocument : "./etc/connections.xml";
    }

  connections.xml:

    <connection_specifications>
      <connection_specification name="DBXML">
        <ODBC database="mydb" user="user" password="password"/>
      </connection_specification>
    </connection_specifications>
    <access_specifications>
      <access_specification name="TableWithXML">
        <table tablename="XMLTABLE"/>
        <uses_connection connection="DBXML"/>
        <native_schema>
          <column name="id" type="INTEGER"/>
          <column name="xmldata" type="VARCHAR" length="15"/>
        </native_schema>
      </access_specification>
    </access_specifications>

  7. Example – ODBCSource
  DB table: INTEGER (id), VARCHAR (xmldata) → ODBCSource → output tuple schema (int32 id, xml xmldata)

  SPL Application:

    ////////////////////////////////////////////////
    // Read data from the database
    ////////////////////////////////////////////////
    stream<int32 id, xml xmldata> dbdata = ODBCSource() {
      param
        connection         : "DBXML";
        access             : "TableWithXML";
        connectionDocument : "./etc/connections.xml";
    }

  connections.xml:

    <connection_specifications>
      <connection_specification name="DBXML">
        <ODBC database="mydb" user="user" password="password"/>
      </connection_specification>
    </connection_specifications>
    <access_specifications>
      <access_specification name="TableWithXML">
        <query query="SELECT * FROM XMLTABLE" replays="1" isolation_level="READ_COMMITTED"/>
        <uses_connection connection="DBXML"/>
        <native_schema>
          <column name="id" type="INTEGER"/>
          <column name="xmldata" type="VARCHAR" length="15"/>
        </native_schema>
      </access_specification>
    </access_specifications>

  8. DB2 pureXML Support
  • If using a DB2 database, the Database Toolkit operators can read from and write to DB2 pureXML table columns
  • Can provide additional XML validation if DB2 is configured for it
  • Can use DB2's support for the XQuery language to run queries from a Streams application. Example:

    <query query="SELECT XMLQUERY('for $d in $doc/cusInfo return &lt;out&gt;{$d/name}&lt;/out&gt;' passing info as \&quot;doc\&quot;) from PERSONTEST as c where XMLEXISTS('$i/cusInfo[company=\&quot;IBM\&quot;]' passing c.info as \&quot;i\&quot;)"
           replays="1" isolation_level="READ_COMMITTED"/>

  • DB2 pureXML fields are specified in the connections.xml <column> element as type="XML"
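  As a hedged sketch, a DB2 pureXML table matching the PERSONTEST query above might be created as follows. The column names are assumptions inferred from the query (c.info); the DDL is not part of the slides:

    -- Hedged sketch: a DB2 table with a pureXML column, resembling
    -- the PERSONTEST table referenced in the query above.
    CREATE TABLE PERSONTEST (
      id   INTEGER NOT NULL PRIMARY KEY,
      info XML
    );

    -- XMLPARSE converts the character string into a validated XML value.
    INSERT INTO PERSONTEST (id, info)
      VALUES (1, XMLPARSE(DOCUMENT
        '<cusInfo><name>Jane</name><company>IBM</company></cusInfo>'));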

  9. Example – ODBCSource with DB2 pureXML
  DB table: INTEGER (id), XML (xmldata) → ODBCSource → output tuple schema (int32 id, xml xmldata)

  SPL Application:

    ////////////////////////////////////////////////
    // Read data from the database
    ////////////////////////////////////////////////
    stream<int32 id, xml xmldata> dbdata = ODBCSource() {
      param
        connection         : "DBXML";
        access             : "TableWithXML";
        connectionDocument : "./etc/connections.xml";
    }

  connections.xml:

    <connection_specifications>
      <connection_specification name="DBXML">
        <ODBC database="mydb" user="user" password="password"/>
      </connection_specification>
    </connection_specifications>
    <access_specifications>
      <access_specification name="TableWithXML">
        <query query="SELECT * FROM XMLTABLE" replays="1" isolation_level="READ_COMMITTED"/>
        <uses_connection connection="DBXML"/>
        <native_schema>
          <column name="id" type="INTEGER"/>
          <column name="xmldata" type="XML" length="200"/>
        </native_schema>
      </access_specification>
    </access_specifications>

  10. Database Toolkit – Optional Control Port
  • All Database Toolkit operators that use an ODBC connection now accept an optional "control" input port, which can be used to change operator configuration at runtime
  • Initially, the control port only supports changing the connection password
  • Additional support (e.g. user ID, delay time) may be added in the future
  • The control port schema is a tuple containing exactly two rstring attributes
  • The first attribute contains the predefined name of the configuration option being set (e.g. "connection.password")
  • The second attribute contains the value for that configuration option (e.g. "mynewpassword")
  • The control port is port 0 if the operator does not have a required input port, or port 1 if it does
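  The two-attribute schema described above can be written as an SPL type. This is a sketch: the attribute names are illustrative, and only the two-rstring shape comes from the slides:

    // Sketch of the control port schema described above.
    // Attribute names are illustrative; the operator only requires
    // a tuple of exactly two rstring attributes.
    type ControlPortType = tuple<rstring name, rstring value>;

    // Example control tuple to change the connection password at runtime:
    //   { name = "connection.password", value = "mynewpassword" }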

  11. Example – ODBCSource with control port
  Control port schema (rstring name, rstring value) → ODBCSource → output tuples

  SPL Application:

    ////////////////////////////////////////////////
    // Read new password from file
    ////////////////////////////////////////////////
    stream<rstring name, rstring value> configdata = FileSource() {
      param
        file      : "password.csv";
        format    : csv;
        initDelay : 5.0;
      output
        configdata : name = "connection.password";
    }

    ////////////////////////////////////////////////
    // Read data from the database
    ////////////////////////////////////////////////
    stream<DBOutputSchema> dbdata = ODBCSource(configdata) {
      param
        connection         : "DBXML";
        access             : "TableWithXML";
        connectionDocument : "./etc/connections.xml";
    }

  12. Database Toolkit Netezza Native Loader
  • Enhancement to the Database Toolkit in Streams Version 3.0
  • Uses Netezza's External Table interface, which allows for high-speed data inserts (faster than ODBC)
  • Based on versions of the operators published on developerWorks
    Note: the interface has changed in the new versions of the operators

  13. New Operators
  • NetezzaPrepareLoad
    • Takes an input stream (tuple)
    • Generates a delimited string that can be consumed by NetezzaLoad. The format of the string is defined by the user in a connections.xml file (similar to what is done for ODBCAppend)
  • NetezzaLoad
    • Takes an input stream with one rstring attribute containing the delimited string from NetezzaPrepareLoad
    • Loads the records into the specified Netezza table

  14. Basic Usage
  source → NetezzaPrepareLoad → NetezzaLoad

    ////////////////////////////////////////////////
    // Prepare the string to load
    ////////////////////////////////////////////////
    stream<rstring buf> preparedData = NetezzaPrepareLoad(dataSource) {
      param
        access         : "access1";
        escapeCharList : [","];
        delimiter      : ",";
    }

    ////////////////////////////////////////////////
    // Load the record into Netezza
    ////////////////////////////////////////////////
    () as myLoad = NetezzaLoad(preparedData) {
      param
        connection : "conn1";
        access     : "access1";
        delimiter  : ",";
        EscapeChar : "\\";
    }

  15. Additional Use Cases – Example A
  source → NetezzaPrepareLoad → ThreadedSplit → two NetezzaLoad operators

    ////////////////////////////////////////////////
    // Prepare the string to load
    ////////////////////////////////////////////////
    stream<rstring buf> preparedData = NetezzaPrepareLoad(dataSource) {
      param
        access         : "access1";
        escapeCharList : [","];
        delimiter      : ",";
    }

    ////////////////////////////////////////////////
    // Split down two paths
    ////////////////////////////////////////////////
    (stream<rstring buf> preparedData1;
     stream<rstring buf> preparedData2) = ThreadedSplit(preparedData) {
      param
        bufferSize : 1000u;
    }

    ////////////////////////////////////////////////
    // Two load operators
    ////////////////////////////////////////////////
    () as myLoad1 = NetezzaLoad(preparedData1) {
      param
        connection : "conn1";
        access     : "access1";
        delimiter  : ",";
        EscapeChar : "\\";
    }

    () as myLoad2 = NetezzaLoad(preparedData2) {
      param
        connection : "conn1";
        access     : "access1";
        delimiter  : ",";
        EscapeChar : "\\";
    }

  16. Additional Use Cases – Example B
  source → ThreadedSplit → two NetezzaPrepareLoad operators → NetezzaLoad

    ////////////////////////////////////////////////
    // Split down two paths
    ////////////////////////////////////////////////
    (stream<MySchema> dataSource1;
     stream<MySchema> dataSource2) = ThreadedSplit(dataSource) {
      param
        bufferSize : 1000u;
    }

    ////////////////////////////////////////////////
    // Two prepare operators
    ////////////////////////////////////////////////
    stream<rstring buf> preparedData1 = NetezzaPrepareLoad(dataSource1) {
      param
        access         : "access1";
        escapeCharList : [","];
        delimiter      : ",";
    }

    stream<rstring buf> preparedData2 = NetezzaPrepareLoad(dataSource2) {
      param
        access         : "access1";
        escapeCharList : [","];
        delimiter      : ",";
    }

    ////////////////////////////////////////////////
    // Load the records into Netezza
    ////////////////////////////////////////////////
    () as myLoad = NetezzaLoad(preparedData1, preparedData2) {
      param
        connection : "conn1";
        access     : "access1";
        delimiter  : ",";
        EscapeChar : "\\";
    }

  17. Other Notes
  • Both Netezza operators have optional error output ports
  • NetezzaLoad has an optional input port for updating password information
