When MatchAll() is invoked, it returns an instance of the RegexReader class. In its
constructor, RegexReader assigns the passed-in regular expression, input string, and options
to its data members. Then, at initialization time, SQL Server invokes RegexReader’s
GetEnumerator() instance method, which returns an instance of RegexEnumerator, which
does all the real work, utilizing the members of the RegexReader object that is passed into
its constructor and set to its private _reader object.
Reset() is called in RegexEnumerator’s constructor so that it can initialize its members in
the following way:
RegexEnumerator uses a private Regex object (_rex) for performing the match and
stores the resulting array of Match objects (Match[]) in the private _match member.
The ordinal number of the match is kept in _matchIndex and initialized to 0 (in case
there are no matches).
When Reset() is complete, it is up to SQL Server to iterate through the matches by
calling MoveNext().
MoveNext() does the work of re-creating the row (represented as a private array of object
called _current) for every successful match stored in _match:
_current[0] is set to the value of _matchIndex (incremented on a per-match basis) and
corresponds to the output table column MatchIndex (defined in the TableDefinition
named parameter).
_current[1] is set to the value of an XML document that is built for every match and
contains subnodes for each group and group capture. This value corresponds to the
output table column GroupList.
When SQL Server uses the RegexEnumerator, it first calls MoveNext() and then uses the
Current property.
Next, execution passes to the method specified in FillRowMethodName (FillMatchAll()).
Finally, the CLR passes the latest value of _current to FillMatchAll() as the row
parameter. Each out parameter of FillMatchAll() is set to the value for the columns in the
output row.
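The Reset()/MoveNext()/Current/fill-row sequence described above can be sketched outside SQL Server. The following is a minimal illustration, not the chapter's RegexEnumerator: the class name MatchRowEnumerator and its FillRow() helper are hypothetical, and the second column holds the matched text rather than the XML group list, to keep the sketch short.

```csharp
using System;
using System.Collections;
using System.Text.RegularExpressions;

// Hypothetical, simplified stand-in for the chapter's RegexEnumerator.
public sealed class MatchRowEnumerator : IEnumerator
{
    private readonly Regex _rex;
    private readonly string _input;
    private Match _match;       // current match; null before the first MoveNext()
    private int _matchIndex;    // ordinal number of the current match
    private object[] _current;  // the row handed to the fill-row method

    public MatchRowEnumerator(string pattern, string input)
    {
        _rex = new Regex(pattern);
        _input = input;
        Reset();
    }

    // Puts the enumerator back in front of the first match.
    public void Reset()
    {
        _match = null;
        _matchIndex = 0;
        _current = null;
    }

    // Advances to the next match and rebuilds the row for it.
    public bool MoveNext()
    {
        _match = (_match == null) ? _rex.Match(_input) : _match.NextMatch();
        if (!_match.Success)
            return false;
        _matchIndex++;
        _current = new object[] { _matchIndex, _match.Value };
        return true;
    }

    public object Current
    {
        get { return _current; }
    }

    // Plays the role of the fill-row method: unpacks one row into out parameters.
    public static void FillRow(object row, out int matchIndex, out string matchText)
    {
        object[] columns = (object[])row;
        matchIndex = (int)columns[0];
        matchText = (string)columns[1];
    }
}
```

A caller that loops on MoveNext() and passes Current to FillRow() each time mimics what SQL Server and the CLR do when they consume a table-valued function.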
NOTE
If this implementation seems daunting, the best way to overcome that is to walk
through the function line by line in debug mode, using Visual Studio.
Developing Managed User-Defined Types (UDTs)
In the preceding section, you used a managed user-defined type (UDT) called
RegexPattern to store the regular expression pattern. In this section, you explore how
custom UDTs are built and used in SQL Server.
The first thing to note is that although the name UDT is the same as the extended data
types built using SQL Server 2000, they are by no means the same in SQL Server 2008.
SQL Server 2000’s UDTs were actually retro-named “alias data types” in SQL Server 2005.
SQL Server 2008 UDTs are structs (value types) built using the .NET Framework.
To create a UDT of your own, you right-click your Visual Studio project and then select
Add, User-Defined Type. Next, you should name both the class and its autogenerated
method RegexPattern. Notice the attribute used to decorate the RegexPattern struct:
SqlUserDefinedType. Its constructor has the following parameters:
Format—Tells SQL Server how serialization (and its complement, deserialization) of
the struct should be done. You specify Format.Native to let SQL Server handle
serialization for you. You specify Format.UserDefined to do your own serialization.
When Format.UserDefined is specified, the struct must implement the
IBinarySerialize interface to explicitly take the values from string (or int, or
whatever the value passed into the constructor of the type is) back to binary and
vice versa.
A named parameter list—This list contains the following:
IsFixedLength—Tells SQL Server that the byte count of the struct is the same
for all its instances.
IsByteOrdered—Tells SQL Server that the bytes of the struct are ordered so that
it may be used in binary comparisons, as with ORDER BY, GROUP BY, or PARTITION
BY clauses, in indexing, and when the UDT is a primary or foreign key.
MaxByteSize—Tells SQL Server not to allow more than the specified number of
bytes to be held in an instance of the UDT. The overall limit is 8KB. You must
specify this when using Format.UserDefined.
Name—Tells the deployment routine what to call the UDT when it is created in
the database.
ValidationMethodName—Tells SQL Server which method of the struct to use to
validate it when it has been deserialized (in certain cases).
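To make the Format.UserDefined and MaxByteSize discussion concrete, here is a minimal sketch of the kind of round trip that IBinarySerialize's Read() and Write() methods perform, using only BinaryWriter and BinaryReader over a MemoryStream. The PatternSerialization class and its method names are hypothetical, and the 250-byte limit is simply the MaxByteSize value used in this chapter's listing. The sketch runs without SQL Server, so it illustrates the byte layout only, not how SQL Server itself invokes the interface.

```csharp
using System;
using System.IO;

// Hypothetical helper that mirrors what a UDT's Write()/Read() pair might do.
public static class PatternSerialization
{
    // Assumed limit, taken from MaxByteSize=250 in the chapter's listing.
    public const int MaxByteSize = 250;

    // Serializes the pattern text as a length-prefixed string.
    public static byte[] Write(string pattern)
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(pattern);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the pattern text back from its serialized form.
    public static string Read(byte[] bytes)
    {
        using (MemoryStream stream = new MemoryStream(bytes))
        using (BinaryReader reader = new BinaryReader(stream))
        {
            return reader.ReadString();
        }
    }

    // True when the serialized pattern fits within the assumed byte limit.
    public static bool FitsInMaxByteSize(string pattern)
    {
        return Write(pattern).Length <= MaxByteSize;
    }
}
```

Because the serialized form carries a length prefix in addition to the text, the byte count is slightly larger than the pattern itself, which is worth keeping in mind when choosing a MaxByteSize value.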
The implementation contract for any UDT is as follows:
It must provide a static method called Parse(), used by SQL Server for conversion to
the struct from a string.
It must provide an instance method that overrides the default ToString() method
for converting from the struct to a string.
It must implement the INullable interface, providing a Boolean instance property
called IsNull, used by SQL Server to determine whether an instance is null.
It must have a static property called Null of the type of the struct. This property
returns an instance of the struct whose value is null (that is, where IsNull is true for
that instance). (This concept seems to be derived from the “null object” design
pattern.)
Also, you need to be aware that UDTs can have only read-only static fields, they cannot
use inheritance, and they cannot have overloaded methods (except the constructor, whose
overloads are mainly used when ADO.NET is the calling context).
Given these fairly stringent requirements, Listing 46.6 provides an implementation of a
UDT representing a regular expression pattern.
LISTING 46.6 A UDT Representing a Regular Expression Pattern
using System;
using System.Data;
using System.Data.Sql;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
//added
using System.Text.RegularExpressions;

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedType(
    Format.UserDefined, // requires IBinarySerialize
    IsFixedLength=false,
    IsByteOrdered=true,
    MaxByteSize=250,
    ValidationMethodName = "RegexPatternValidator"
)]
public struct RegexPattern : INullable, IBinarySerialize
{
    //instance data fields
    private Regex _reg;
    private bool _null;

    //constructor
    public RegexPattern(String Pattern)
    {
        _reg = new Regex(Pattern);
        _null = (Pattern == String.Empty);
    }

    //instance method
    public override string ToString()
    {
        return _reg.ToString();
    }

    //instance property
    public bool IsNull
    {
        get
        {
            if (_reg == null || _reg.ToString() == string.Empty)
            {
                return true;
            }
            else
                return false;
        }
    }

    //static property
    public static RegexPattern Null
    {
        get
        {
            RegexPattern NullInstance = new RegexPattern();
            NullInstance._null = true;
            return NullInstance;
        }
    }

    //static method
    public static RegexPattern Parse(SqlString Pattern)
    {
        if (Pattern.IsNull)
            return Null;
        else
        {
            RegexPattern u = new RegexPattern((String)Pattern);
            return u;
        }
    }

    //private instance method
    private bool RegexPatternValidator()
    {
        return (_reg.ToString() != string.Empty);
    }

    //instance method
    public Int32 Match(String Input)
    {
        // match the raw input; Regex.Escape is meant for patterns, not input text
        Match m = _reg.Match(Input);
        if (m != null)
            return Convert.ToInt32(m.Success);
        else
            return 0;
    }

    //instance property
    public bool IsFullStringMatch
    {
        get
        {
            Match m = Regex.Match(_reg.ToString(), @"\^.+\$");
            if (m != null)
                return m.Success;
            else
                return false;
        }
    }

    //instance method
    [SqlMethod(
        DataAccess = DataAccessKind.None,
        IsMutator = false,
        IsPrecise = true,
        OnNullCall = false,
        SystemDataAccess = SystemDataAccessKind.None
    )]
    public Int32 MatchingGroupCount(SqlString Input)
    {
        // match the raw input; Regex.Escape is meant for patterns, not input text
        Match m = _reg.Match(Input.ToString());
        if (m != null)
            return m.Groups.Count;
        else
            return 0;
    }

    //static method
    [SqlMethod(
        DataAccess = DataAccessKind.None,
        IsMutator = false,
        IsPrecise = true,
        OnNullCall = false,
        SystemDataAccess = SystemDataAccessKind.None
    )]
    public static bool UsesLookaheads(RegexPattern p)
    // must be static to be called with :: syntax
    {
        Match m = Regex.Match(p.ToString(), @
        if (m != null)
            return m.Success;
        else
            return false;
    }

    #region IBinarySerialize Members
    public void Read(System.IO.BinaryReader r)
    {
        _reg = new Regex(r.ReadString());
    }

    public void Write(System.IO.BinaryWriter w)
    {
        w.Write(_reg.ToString());
    }
    #endregion
}
As you can see by scanning this code, it meets the required implementation contract. In
addition, it declares static and instance methods, as well as instance properties. Both
static and instance methods can optionally be decorated with the SqlMethod attribute. By
default, methods of UDTs are declared to be nondeterministic and nonmutator, meaning
that they do not change the value of the instance.
You use the named parameters of the constructor for SqlMethod to override this and other
behaviors. These are its named parameters:
DataAccess—Tells SQL Server whether the method will access user table data on the
server in its body. If you provide the enum value DataAccessKind.None, some
optimizations may be made.
SystemDataAccess—Tells SQL Server whether the method will access system table
data on the server in its body. Again, if you provide the enum value
SystemDataAccessKind.None, some optimizations may be made.
IsDeterministic—Tells SQL Server whether the method always returns the same
values, given the same input parameters.
IsMutator—Must be set to true if the method changes the state of the instance.
Name—Tells the deployment routine what to call the method when it is registered in
the database.
OnNullCall—When set to false, SQL Server returns null without invoking the method
if any arguments to the method are null.
InvokeIfReceiverIsNull—Indicates whether to invoke the method if the instance of
the struct itself is null.
To create this type in SQL Server without using Visual Studio, you use the CREATE TYPE
DDL syntax, as follows:
CREATE TYPE RegexPattern EXTERNAL NAME SQLCLR.RegexPattern
Note that DROP TYPE TypeName is also available, but there is no ALTER TYPE statement.
Let us add a few words on the code in Listing 46.6. The constructor of RegexPattern
validates the expression passed to it via the constructor of
System.Text.RegularExpressions.Regex.
If you pass an invalid regex to the T-SQL SET statement (when declaring a variable of type
RegexPattern) or when the UDT is used as a table column data type and a value is
modified, the Regex class does its usual pattern validation, as it does in the .NET world.
Let’s look at some of the ways you can use your UDT. The following example shows how
to call all the public members (both static and instance) of RegexPattern:
DECLARE @rp RegexPattern
SET @rp = '(\w+)\s+?(?!bar)'
SELECT
    @rp.ToString() AS ToString,
    @rp.IsFullStringMatch AS FullStringMatch,
    @rp.Match('uncle freddie') AS Match,
    @rp.MatchingGroupCount('loves elken') AS GroupCount,
    RegexPattern::UsesLookaheads(@rp) AS UsesLH
go
ToString           FullStringMatch Match GroupCount UsesLH
------------------ --------------- ----- ---------- ------
(\w+)\s+?(?!bar)   0               1     2          1
(1 row(s) affected)
Note that static members can be called (without an instance, that is) by using the
following new syntax:
TypeName::MemberName(OptionalParameters)
To try this, you can create a table and populate it as shown here:
CREATE TABLE dbo.RegexTest
(
PatternId int IDENTITY(1,1),
Pattern RegexPattern
)
GO
INSERT RegexTest SELECT '\d+'
INSERT RegexTest SELECT 'foo (?:bar)'
INSERT RegexTest SELECT '(\s+()'
Msg 6522, Level 16, State 2, Line 215
A .NET Framework error occurred during execution of user defined
routine or aggregate
'RegexPattern':
System.ArgumentException: parsing "(\s+()" - Not enough )'s.
System.ArgumentException:
   at System.Text.RegularExpressions.RegexParser.ScanRegex()
   at System.Text.RegularExpressions.RegexParser.Parse(String re,
      RegexOptions op)
   at System.Text.RegularExpressions.Regex..ctor(String pattern,
      RegexOptions options,
      Boolean useCache)
   at System.Text.RegularExpressions.Regex..ctor(String pattern)
   at RegexPattern..ctor(String Pattern)
   at RegexPattern.Parse(SqlString Pattern)
Do you see what happens when you try to insert an invalid regex pattern into the Pattern
column (the third INSERT statement)? The parenthesis count is off, and the CLR tells you
so in the query window’s output.
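The same validation can be observed outside SQL Server, because it comes from the Regex constructor itself, which throws an ArgumentException for a malformed pattern. The following is a small sketch; the PatternCheck class and GetPatternError() method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper showing the validation that the Regex constructor performs.
public static class PatternCheck
{
    // Returns null when the pattern compiles; otherwise returns the parser's message.
    public static string GetPatternError(string pattern)
    {
        try
        {
            new Regex(pattern);
            return null;
        }
        catch (ArgumentException ex)
        {
            return ex.Message;
        }
    }
}
```

Calling GetPatternError(@"(\s+()") returns a non-null message describing the unbalanced parenthesis, whereas GetPatternError(@"\d+") returns null. The exact wording of the message varies between .NET versions.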
Because the UDT has the IsByteOrdered named parameter set to true, you can index this
column (based on the struct’s serialized value) and use it in ORDER BY clauses. Here’s
an example:
CREATE NONCLUSTERED INDEX PatternIndex ON dbo.RegexTest(Pattern)
GO
SELECT
    Pattern.ToString() AS PatString,
    RegexPattern::UsesLookaheads(Pattern) AS UsesLookaheads
FROM RegexTest
ORDER BY Pattern
go
PatString     UsesLookaheads
------------- --------------
\d+           0
foo (?:bar)   1
(2 row(s) affected)
Back in ADO.NET, you can access the UDT by using the new SqlDbType.Udt enum
value. To try this, you can add a new C# Windows application to your sample solution.
You can add a project reference to your sample project ("SQLCLR") and then add a using
statement for System.Data.SqlClient. Then you should add a list box called lbRegexes to
the form. Finally, you should add a button called btnCallUDT to the form, double-click it,
and add the code in Listing 46.7 to the body of its OnClick event handler.
LISTING 46.7 Using a UDT from ADO.NET in a Client Application
private void btnCallUDT_Click(object sender, EventArgs e)
{
    using (SqlConnection c =
        new SqlConnection(ConfigurationManager.AppSettings["connstring"]))
    {
        using (SqlCommand s = new SqlCommand("SELECT Pattern FROM dbo.RegexTest", c))
        {
            c.Open();
            // disposing the reader closes it (and, with CloseConnection, the connection)
            using (SqlDataReader r = s.ExecuteReader(CommandBehavior.CloseConnection))
            {
                while (r.Read())
                {
                    RegexPattern p = (RegexPattern)r.GetValue(0);
                    lbRegexes.Items.Add(p.ToString());
                }
            }
        }
    }
}
In this example, you selected all the rows from the sample table dbo.RegexTest and then
cast the Pattern column values into RegexPattern structs. Finally, you called the
ToString() method of each struct, adding the text of the regex as a new item in the list box.
You can also create SqlParameter objects to be mapped to UDT columns by using code
such as the following:
SqlParameter p = new SqlParameter("@Pattern", SqlDbType.Udt);
p.UdtTypeName = "RegexPattern";
// verbatim string, so the backslashes reach the regex unchanged
p.Value = new RegexPattern(@"\d+\s+\d+");
command.Parameters.Add(p);
Finally, keep in mind that FOR XML does not implicitly serialize UDTs. You have to do that
yourself, as in the following example:
SELECT Pattern.ToString() AS '@Regex'
FROM dbo.RegexTest
FOR XML PATH('Pattern'), ROOT('Patterns'), TYPE
go
<Patterns>
  <Pattern Regex="\d+" />
  <Pattern Regex="foo (?:bar)" />
</Patterns>
Developing Managed User-Defined Aggregates (UDAs)
A highly specialized feature of SQL Server 2008, managed user-defined aggregates (UDAs)
provide the capability to aggregate column data based on user-defined criteria built in to
.NET code You can now extend the (somewhat small) list of aggregate functions usable
inside SQL Server to include those you custom-define
NOTE
If you’ve been following the examples in this chapter sequentially, at this point, you
need to drop the sample table dbo.RegexTest to redeploy the assembly after creating
the UDA example.
The implementation contract for a UDA requires the following:
An instance method called Init(), used to initialize any data fields in the struct,
particularly the field that contains the aggregated value.
An instance method called Terminate(), used to return the aggregated value to the
UDA’s caller.
An instance method called Accumulate(), used to add the value in the current row to the
growing value.
An instance method called Merge(), used when SQL Server breaks an aggregation task
into multiple threads of execution (SQL Server actually uses a thread abstraction
called a task), each of which needs to merge the value stored in its instance of the
UDA with the growing value.
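The life cycle implied by this contract can be simulated in plain C#. The sketch below is an illustration only: SumOfSquares and AggregateDemo are hypothetical names, the struct omits the SqlUserDefinedAggregate attribute and SQL types that a deployable UDA needs, and the per-row method is named Accumulate(), which is the name SQL Server's documented contract uses. The Run() method stands in for SQL Server splitting the rows across two tasks and merging the partial results.

```csharp
using System;

// Hypothetical plain struct; a real UDA would also carry the
// SqlUserDefinedAggregate attribute and use SQL types.
public struct SumOfSquares
{
    private long _total;

    // Resets the running value before any rows are seen.
    public void Init() { _total = 0; }

    // Folds one row's value into the running value.
    public void Accumulate(int value) { _total += (long)value * value; }

    // Combines a partial result computed by another instance.
    public void Merge(SumOfSquares other) { _total += other._total; }

    // Returns the final aggregated value.
    public long Terminate() { return _total; }
}

public static class AggregateDemo
{
    // Simulates two tasks, each aggregating part of the rows, followed by a merge.
    public static long Run(int[] firstPart, int[] secondPart)
    {
        SumOfSquares a = new SumOfSquares();
        a.Init();
        foreach (int v in firstPart) a.Accumulate(v);

        SumOfSquares b = new SumOfSquares();
        b.Init();
        foreach (int v in secondPart) b.Accumulate(v);

        a.Merge(b);
        return a.Terminate();
    }
}
```

Because Merge() only adds the two partial totals, the result does not depend on how the rows were divided between the two instances, which is the property that makes parallel aggregation safe for this particular aggregate.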
UDAs cannot do any data access, nor can they have any side effects, meaning they
cannot change the state of the database. They take only a single input parameter, of any
type. You can also add public methods or properties other than those required by the
contract (such as the IsPrime() method used in the following example).
Like UDTs, UDAs are structs. They are decorated with the SqlUserDefinedAggregate
attribute, which has the following parameters for its constructor:
Format—Tells SQL Server how serialization (and its complement, deserialization) of
the struct should be done. This has the same possible values and meaning as
described earlier for SqlUserDefinedType.
A named parameter list—This list contains the following:
IsInvariantToDuplicates—Tells SQL Server whether the UDA behaves
differently with respect to duplicate values passed in from multiple rows.
IsInvariantToNulls—Tells SQL Server whether the UDA behaves differently
when null values are passed to it.
IsInvariantToOrder—Tells SQL Server whether the UDA cares about the order
in which column values are fed to it.