SQL CLR Demystified

SQL CLR Demystified A look at the integration of the CLR into SQL Server Matt Whitfield Atlantis Interactive UK Ltd

Overview of CLR integration SqlContext Stored Procedures Scalar & Table Functions CLR DML Triggers CLR DDL Triggers Aggregate Functions CLR Types Visual Studio Database Projects A look at permission sets Examples: Data maniplation for performance Environment manipulation for maintenance What will be covered

Overview • CLR integration into SQL Server allows us to run CLR code that conforms to certain restrictions • CLR integration means that we can use common business rule validation code in the middle tier and the database • For some operations, CLR integration allows us to extract maximum performance • CLR integration allows us to perform maintenance tasks that would have been very difficult or impossible before

CLR Disabled by Default • CLR integration is disabled by default in SQL Server, and therefore must be enabled: EXECsp_configure‘clr enabled’, 1; RECONFIGURE WITH OVERRIDE; GO • This can also be done using the Surface Area Configuration tool.

What Framework version? • The CLR in SQL Server is always loaded as the 2.0 runtime • Until .NET 4.0, a single CLR host process could only load a single version of the .NET runtime • It is possible future versions of SQL Server may allow more up-to-date runtime versions to be loaded • We can confirm the loaded version with: select*fromsys.dm_clr_properties • However, we can use types from .NET 3.0 and 3.5, as these releases did not update mscorlib

Some Things To Note • When SQL Server loads assemblies, they are cached in memory. When the O/S signals memory pressure to SQL Server, explicit garbage collection may be run, and the assemblies may be unloaded. This can cause performance issues if it happens frequently. • SQL CLR code cannot be costed accurately by the query optimiser, as it does not look at what the code actually does – this can affect execution plans. • SQL CLR code can sometimes prevent parallelism, and SQL CLR procedures are usually single threaded. Sometimes this can hurt performance, other times you may find you achieve the same amount of work in the same time using 1 thread instead of 8.

SqlContext • SqlContext is a class that gives us access to four members: • IsAvailable • Pipe • TriggerContext • WindowsIdentity • IsAvailable indicates whether the other members can be used

SqlContext.Pipe • Pipe is the main point of interaction with the SQL Server when returning result sets or messages • Messages can be sent to the client using the Send(string) function: SqlContext.Pipe.Send(“Hello World”); • Single row result sets can be sent using the Send(SqlDataRecord) function.

Multi-row Result Sets • SendResultsStart is used to specify the schema of the result set • No data is sent by the SendResultsStart call • 0 or more calls to SendResultsRow are then sent with data • Empty result sets can be sent by making no calls to SendResultsRow • SendResultsEnd marks the end of the result set

SqlContext.WindowsIdentity • This returns a standard System.Security.Principal.WindowsIdentity object • The property can only be used under EXTERNAL_ACCESS or UNSAFE permission sets • For 99% of CLR work, using this property is not required

SqlContext.TriggerContext • TriggerContext gives you access to: • ColumnCount – number of columns in the target table • EventData – SqlXml that returns the same as the EVENTDATA T-SQL function • TriggerAction – a TriggerAction enumeration value that specifies which action caused the trigger to fire • IsUpdatedColumn – method that determines if a column was updated – same as the T-SQL UPDATE() function • …but what about the inserted and deleted virtual tables that are available in normal triggers?

Context Connection • To access data from within CLR code, the special ‘context connection’ connection string is used… using (SqlConnection connection = new SqlConnection("context connection=true")) { connection.Open(); ... } • We can then run any SQL statement as we would normally from .NET • The inserted and deleted virtual tables are available through this method when in a DML trigger

Stored Procedures • What is a stored procedure? • A procedure that contains a sequence of operations, which may affect existing data, return result sets or both • CLR Stored Procedures are no different • CLR Stored Procedures are implemented as static members of a class • The containing class is irrelevant to the execution of the procedure, and is an organisational unit only

Hello World public partial classStoredProcedures { [Microsoft.SqlServer.Server.SqlProcedure] public static void HelloWorld() { SqlContext.Pipe.Send("Hello World!"); } } • Procedure is just a method, decorated with the SqlProcedure attribute. • Procedure is registered in SQL Server as follows: CREATE PROCEDURE [dbo].[HelloWorld] AS EXTERNAL NAME [SQLAssembly].[StoredProcedures].[HelloWorld] GO

Simple Data Access • In part 1: • we will open a context connection • we will execute a command • we will read the results, and use those to populate a Dictionary object • In part 2: • We will create record metadata • We will create a record set • We will populate the record set from the Dictionary • While a very mundane example, and not something you would usually use the CLR for, it shows the basic mechanics of moving data into and out of the CLR

Simple Data Access (1/2) public static void RowCounter() { Dictionary<string, int> typeDictionary = newDictionary<string, int>(); using (SqlConnection connection = newSqlConnection("context connection=true")) { connection.Open(); using (SqlCommand command = newSqlCommand("SELECT [type] from [sys].[objects];", connection)) { using (SqlDataReader reader = command.ExecuteReader()) { while (reader.Read()) { string type = reader.GetString(0); if (typeDictionary.ContainsKey(type)) { typeDictionary[type] = typeDictionary[type] + 1; } else { typeDictionary.Add(type, 1); } } } } } ...

Simple Data Access (2/2) ... SqlDataRecord record = newSqlDataRecord(newSqlMetaData[] { newSqlMetaData("Type", SqlDbType.NChar, 2), newSqlMetaData("Count", SqlDbType.Int) }); SqlContext.Pipe.SendResultsStart(record); foreach (KeyValuePair<string, int> kvp in typeDictionary) { record.SetString(0, kvp.Key); record.SetInt32(1, kvp.Value); SqlContext.Pipe.SendResultsRow(record); } SqlContext.Pipe.SendResultsEnd(); }

Things to note • That we did not create a new SqlDataRecord object for every row, we just populated the existing object • That the name exposed to SQL Server does not have to match the name of the implementing method • The return type of the stored procedure can be int to return a value, as with T-SQL stored procedures

Scalar Functions • What is a scalar function? • A statement or set of statements that return a single value of a specific type based on zero or more parameters • CLR Scalar Functions are no different • CLR Scalar Functions are implemented much the same as stored procedures – a single static method of a class

String Length Scalar Function [Microsoft.SqlServer.Server.SqlFunction] public static long fn_LengthTestCLR(string input) { return input.Length; } CREATE FUNCTION [dbo].[fn_LengthTestCLR] (@input nvarchar (MAX)) RETURNS bigint AS EXTERNAL NAME [SQLAssembly].[UserDefinedFunctions].[fn_LengthTestCLR] GO

Performance Comparison • Consider the equivalent T-SQL function, and the following SQL Statement: CREATE FUNCTION dbo.fn_LengthTest(@input varchar(MAX)) RETURNS [bigint] AS BEGIN RETURN LEN(@input) END SELECTSUM(dbo.fn_LengthTest(o.[name])) FROM [sys].[objects] [o] CROSS JOIN [sys].[all_columns] [ac] CROSS JOIN [sys].[all_columns] [ac1]

CLR would be slower, right? • No! • On a dual core 2GHz CPU – • SQL Function – 9.036 seconds • SQL CLR Function – 3.918 seconds • Overhead of calling a SQL CLR function is lower than that of calling a T-SQL function

Table Functions • What is a table function? • A statement or set of statements that return a single result set with a defined schema based on zero or more parameters • CLR Table Functions can only be multi-statement equivalents, there is no concept of an ‘inline CLR table function’ • Implemented as a pair of methods – one that returns an IEnumerable, and one that sets outbound parameters for each object – both are static members

Regular Expression Function • In Part 1: • We will implement an object that we can return to hold the data required to fill each row • In Part 2: • We will create a list of our storage object • We will run a regular expression match based on the parameters • We will populate our list with the results of the regular expression • We will return the list • In Part 3: • We will set the values to be stored in each row

A Regular Expression Function (1/3) private struct_match { public readonly int MatchNumber; public readonly int GroupNumber; public readonly string CaptureValue; public _match(int matchNumber, int groupNumber, string captureValue) { MatchNumber = matchNumber; GroupNumber = groupNumber; CaptureValue = captureValue; } }

A Regular Expression Function (2/3) [SqlFunction(FillRowMethodName = "FillRow", TableDefinition = "MatchNumber int, " + "GroupNumber int, " + "CaptureValue nvarchar(MAX)")] public staticIEnumerable GetCaptureGroupValues(string input, string pattern) { List<_match> matchList = newList<_match>(); int matchIndex = 0; foreach (Match m inRegex.Matches(input, pattern)) { int groupIndex = 0; foreach (Group g in m.Groups) { matchList.Add(new_match(matchIndex, groupIndex++, g.Value)); } matchIndex++; } return matchList; }

A Regular Expression Function (3/3) public static void FillRow(Object obj, out int MatchNumber, out int GroupNumber, outSqlString CaptureValue) { _match match = (_match)obj; MatchNumber = match.MatchNumber; GroupNumber = match.GroupNumber; CaptureValue = match.CaptureValue; }

Creation SQL CREATE FUNCTION [dbo].[GetCaptureGroupValues] (@input [nvarchar] (MAX), @pattern [nvarchar] (MAX)) RETURNS TABLE ([MatchNumber] [int] NULL, [GroupNumber] [int] NULL, [CaptureValue] [nvarchar] (MAX) NULL) AS EXTERNAL NAME [SQLAssembly].[UserDefinedFunctions].[GetCaptureGroupValues] GO • Again, the name exposed to SQL Server does not have to match the name of the main body function • The SQL definition can specify different column names and parameter names to the main method and fill row method

Things to note • That we used a struct for the storage object, rather than a class, because of reduced instantiation costs • That our function was decorated with a SqlFunction attribute that named the fill row method and specified the shape of the output table • That the fill row method took an object as it’s first parameter, and had an outbound parameter for each of the columns we wanted to fill

CLR Triggers • Known bug in Visual Studio 2005 and 2008 that means SqlTrigger attribute does not allow you to specify the schema of the target table • Fixed in Visual Studio 2010 • Workaround is to comment out the SqlTrigger attribute, and deploy manually

DML Triggers • What is a DML Trigger? • A statement or set of statements run when data in a target table is modified • CLR DML Triggers are no different • CLR DML Triggers still have access to the inserted and deleted virtual tables • CLR DML Triggers are implemented as static methods in a class

A Simple DML Trigger • We will specify the event type and target object using the SqlTrigger attribute • We will connect to the database using the context connection • We will select the rows from the inserted virtual table, and count them • We will send a message to the SqlContext pipe stating the number of rows

A Simple DML Trigger [Microsoft.SqlServer.Server.SqlTrigger(Target="CallLog", Event="FOR UPDATE")] public static void TestDMLTrigger() { int i = 0; using (SqlConnection connection = new SqlConnection("context connection=true")) { connection.Open(); using (SqlCommand command = newSqlCommand("SELECT * from inserted;", connection)) { using (SqlDataReader reader = command.ExecuteReader()) { while (reader.Read()) { i++; } } } } SqlContext.Pipe.Send(i.ToString() + " rows in inserted"); }

DDL Triggers • What is a DDL Trigger? • A statement or set of statements run when a schema modification event occurs • CLR DDL Triggers are no different • CLR DDL Triggers have access to the event data via SqlContext.TriggerContext • CLR DDL Triggers are implemented as static methods in a class

A Simple DDL Trigger • We will specify the target and the event type using the SqlTrigger attribute • We will create an XmlDocument based on the TriggerContext object from the SqlContext object • We will then use XPath to find the created object’s schema and name • We will send a message to the SqlContext pipe stating the name of the created object

A Simple DDL Trigger [Microsoft.SqlServer.Server.SqlTrigger(Target="DATABASE", Event="FOR CREATE_TABLE")] public static void TestDDLTrigger() { XmlDocument document = newXmlDocument(); document.LoadXml(SqlContext.TriggerContext.EventData.Value); string objectName = document.SelectSingleNode("//SchemaName").InnerText + "." + document.SelectSingleNode("//ObjectName").InnerText; SqlContext.Pipe.Send("Table " + objectName + " was created."); }

Aggregate Functions • What is an aggregate function? • A function that takes a series of values and accumulates them into a single, aggregated result • CLR Aggregate functions are no different • Implemented as a class with four methods: • Init • Accumulate • Merge • Terminate • Aggregate functions in SQL Server 2005 can take only one parameter, in SQL Server 2008 this restriction is lifted

Aggregate Function Methods • Init method is called first, here we initialise our member fields • Accumulate is called to add a value to the aggregate • Merge is called to add together two aggregates (for example, as a result of parallelism) • Terminate is called to return our value

A Simple Aggregate • We will use the SqlUserDefinedAggregate attribute to specify the serialisation format of our aggregate • We will track the minimum value in Accumulate, and also track that we have seen values • In Merge, we will check if the other aggregate class has seen values - if it has, then we take the minimum value seen • In Terminate we will return the minimum value we have seen if we have seen any values, or NULL if we have not seen any values

A Simple Aggregate (1/2) [Serializable] [Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)] public structMinValue { int minValue; bool hasValues; public void Init() { minValue = Int32.MaxValue; hasValues = false; } public void Accumulate(SqlInt32 Value) { hasValues = true; if (Value.Value < minValue) { minValue = Value.Value; } } ...

A Simple Aggregate (2/2) ... public void Merge(MinValue Group) { if (Group.hasValues) { if (Group.minValue < minValue) { minValue = Group.minValue; } hasValues = true; } } publicSqlInt32 Terminate() { if (hasValues) { return minValue; } else { returnSqlInt32.Null; } } }

SqlUserDefinedAggregate • This attribute contains parameters which can affect the query optimiser, and can cause incorrect results if set wrong: • IsInvariantToDuplicates – should only be true if aggregating the same value many times does not affect the result • IsInvariantToNulls – should only be true if aggregating NULL values does not affect the result • IsInvariantToOrder – should only be true if the result is not dependent on the order of the aggregated values • IsNullIfEmpty – should only be true if the aggregate of 0 values is NULL

Native and UserDefined Serialisation • If your aggregate contains reference types then you must implement custom serialisation, and specify Format.UserDefined in the SqlUserDefinedAggregate attribute • When specifying UserDefined format, your aggregate must implement the IBinarySerialize interface, and specify it’s maximum size in the SqlUserDefinedAggregate attribute • Maximum byte size can be 1 to 8000 in SQL Server 2005, or -1 for any value between 8001 bytes and 2 GB in SQL Server 2008. • The same serialisation interface applies to both Aggregates and User Defined Types

CLR Types • What is a user defined type? • A type based on a system type, which may be bound to old-style rules and defaults • CLR Types are different • CLR Types can be complex, and can have properties and methods • CLR Types can have static methods (and the built in CLR Types in SQL Server 2008 have some)

A Simple CLR Type • Part 1: • We will define the serialisation format of the type • We will provide the IsNull property and the Null static property to return a NULL instance • Part 2: • We will define methods to convert the type to and from strings • Part 3: • We will define a property for the symbol, a method to change the symbol and a static method to create a new currency with a defined symbol and value • Part 4: • We will declare our member fields and provide implementation of the IBinarySerialize interface to load and store our type

A Simple CLR Type (1/4) [Serializable] [Microsoft.SqlServer.Server.SqlUserDefinedType(Format.UserDefined, MaxByteSize=24)] public structCurrency : INullable, IBinarySerialize { public bool IsNull { get { return _null; } } public staticCurrency Null { get { Currency h = new Currency(); h._null = true; return h; } }

A Simple CLR Type (2/4) public override string ToString() { return _symbol + _value.ToString(); } public staticCurrency Parse(SqlString s) { if (s.IsNull || string.IsNullOrEmpty(s.Value) || s.Value.Length < 2 || string.Equals(s.Value, "NULL", StringComparison.OrdinalIgnoreCase)) { return Currency.Null; } Currency u = newCurrency(); u._symbol = s.Value[0]; u._value = Decimal.Parse(s.Value.Substring(1)); return u; }

A Simple CLR Type (3/4) public char Symbol { get { return _symbol; } } [SqlMethod(IsMutator=true)] public void ChangeSymbol(char newSymbol) { _symbol = newSymbol; } public staticCurrency CreateCurrency(char symbol, Decimal value) { Currency c = newCurrency(); c._symbol = symbol; c._value = value; return c; }

A Simple CLR Type (4/4) private char _symbol; privateDecimal _value; private bool _null; public void Read(BinaryReader r) { _null = r.ReadBoolean(); _symbol = r.ReadChar(); _value = r.ReadDecimal(); } public void Write(BinaryWriter w) { w.Write(_null); w.Write(_symbol); w.Write(_value); } }

The CLR Type in use • Using our static method to create an instance: DECLARE @i [Currency] SET @i = [Currency]::[CreateCurrency]('$', 0.53) PRINTCONVERT([varchar], @i) • Using our instance mutator method to change the symbol: DECLARE @i [Currency] SET @i = [Currency]::[CreateCurrency]('$', 0.53) SET @i.ChangeSymbol('£') PRINTCONVERT([varchar], @i) • Retrieving the value of a property: DECLARE @i [Currency] SET @i = [Currency]::[CreateCurrency]('$', 0.53) PRINT @i.Symbol

SQL CLR Demystified