Subtree queries
The primary work of a hierarchy is returning the hierarchy as a set. The adjacency list method used
similar methods for scanning up or down the hierarchy. Not so with materialized path. Searching down a
materialized path is a piece of cake, but searching up the tree is a real pain.
Searching down the hierarchy with materialized path
Navigating down the hierarchy and returning a subtree of all nodes under a given node is where the
materialized path method really shines.
Check out the simplicity of this query:
SELECT BusinessEntityID, ManagerID, MaterializedPath
  FROM HumanResources.Employee
  WHERE MaterializedPath LIKE '1,263,%'
Result:
BusinessEntityID ManagerID MaterializedPath
That’s all it takes to find a node’s subtree. Because the materialized path for every node in the subtree
is just a string that begins with the subtree’s parent’s materialized path, it’s easily searched with a LIKE
operator and a % wildcard in the WHERE clause.
It’s important that the LIKE search string includes the comma before the % wildcard; otherwise,
searching for 1,263% would find 1,2635, which would be an error, of course.
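To watch the comma pitfall happen without touching AdventureWorks, here is a minimal Python sketch that runs the same two LIKE patterns against an in-memory SQLite table. The table layout, the invented IDs 2635 and 9001, and the trailing comma on every stored path are assumptions made for illustration; SQLite stands in for SQL Server only because it is easy to run anywhere.

```python
import sqlite3

# Hypothetical sample rows (id, materialized path). IDs 2635 and 9001 are
# invented, and the trailing comma on each path is an assumption.
ROWS = [
    (1, "1,"),
    (263, "1,263,"),
    (270, "1,263,270,"),
    (2635, "1,2635,"),
    (9001, "1,2635,9001,"),
]


def subtree_ids(pattern):
    """Return the IDs whose materialized path matches the LIKE pattern."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Employee (BusinessEntityID INTEGER, MaterializedPath TEXT)"
    )
    con.executemany("INSERT INTO Employee VALUES (?, ?)", ROWS)
    cur = con.execute(
        "SELECT BusinessEntityID FROM Employee "
        "WHERE MaterializedPath LIKE ? ORDER BY BusinessEntityID",
        (pattern,),
    )
    ids = [row[0] for row in cur]
    con.close()
    return ids


with_comma = subtree_ids("1,263,%")    # only node 263 and its descendants
without_comma = subtree_ids("1,263%")  # also drags in node 2635's branch
print(with_comma)
print(without_comma)
```

With the comma in place, only the 263 branch comes back; without it, the unrelated 2635 branch sneaks into the result.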
Searching up the hierarchy with materialized path
Searching up the hierarchy means searching for all the ancestors, or the chain of command, for a
given node. The nice thing about a materialized path is that the full list of ancestors is right there in the
materialized path. There’s no need to read any other rows.
Therefore, to get the parent nodes, you need to parse the materialized path to return the IDs of each
parent node and then join to this set of IDs to get the parent nodes.
The trick is to extract it quickly. Unfortunately, SQL Server 2008 lacks a simple split function. There are two
options: build a CLR function that uses the C# split function or build a T-SQL table-valued user-defined
function to parse the string.
Writing a C# CLR function to split a string is a relatively straightforward task:
using Microsoft.SqlServer.Server;
using System.Data.SqlClient;
using System;
using System.Collections;

public class ListFunctionClass
{
    // Table-valued CLR function: returns one row per comma-delimited element
    [SqlFunction(FillRowMethodName = "FillRow",
        TableDefinition = "list nvarchar(max)")]
    public static IEnumerator ListSplitFunction(string list)
    {
        string[] listArray = list.Split(new char[] {','});
        Array array = listArray;
        return array.GetEnumerator();
    }

    // Called once per element to fill the single output column
    public static void FillRow(Object obj, out String sc)
    {
        sc = (String)obj;
    }
}
Adam Machanic, SQL Server MVP and one of the sharpest SQL Server programmers
around, went on a quest to write the fastest CLR split function possible. The result is
posted on SQLBlog.com at http://tinyurl.com/dycmxb
But I’m a T-SQL guy, so unless there’s a compelling need to use CLR, I’ll opt for T-SQL. There are
a number of T-SQL string-split solutions available. I’ve found that the performance depends on the
length of the delimited strings. Erland Sommarskog’s website analyzes several T-SQL split solutions:
http://www.sommarskog.se/arrays-in-sql-2005.html
Of Erland’s solutions, the one I prefer for shorter length strings such as these is in the ParseString
user-defined function:
-- up the hierarchy
-- parse the string
CREATE FUNCTION dbo.ParseString (@list varchar(200))
RETURNS @tbl TABLE (ID INT) AS
BEGIN
  -- code by Erland Sommarskog
  DECLARE @pos int,
          @valuelen int,
          @nextpos int
  SELECT @pos = 0, @nextpos = 1
  WHILE @nextpos > 0
    BEGIN
      SELECT @nextpos = charindex(',', @list, @pos + 1)
      SELECT @valuelen = CASE WHEN @nextpos > 0
                              THEN @nextpos
                              ELSE len(@list) + 1
                         END - @pos - 1
      INSERT @tbl (ID)
        VALUES (substring(@list, @pos + 1, @valuelen))
      SELECT @pos = @nextpos
    END
  RETURN
END
go
-- list the IDs stored in one node's materialized path
SELECT ID
  FROM HumanResources.Employee
    CROSS APPLY dbo.ParseString(MaterializedPath)
  WHERE BusinessEntityID = 270
go
-- join the parsed IDs back to the table to return the ancestor rows
DECLARE @MatPath VARCHAR(200)
SELECT @MatPath = MaterializedPath
  FROM HumanResources.Employee
  WHERE BusinessEntityID = 270
SELECT E.BusinessEntityID, MaterializedPath
  FROM dbo.ParseString(@MatPath) AS P
    JOIN HumanResources.Employee E
      ON P.ID = E.BusinessEntityID
  ORDER BY MaterializedPath
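The parse-then-join idea is easier to see outside the database. The following is a minimal Python sketch, not the T-SQL implementation: the `EMPLOYEES` dictionary is a hypothetical stand-in for the Employee table, the helper names are invented, and the paths are assumed to end with a trailing comma.

```python
# Hypothetical stand-in for the Employee table: id -> materialized path.
# The trailing comma on each path is an assumption.
EMPLOYEES = {
    1: "1,",
    263: "1,263,",
    270: "1,263,270,",
}


def parse_path(path):
    """Split a comma-delimited path into integer IDs, skipping empty pieces."""
    return [int(piece) for piece in path.split(",") if piece.strip()]


def chain_of_command(node_id):
    """Return (id, path) for every node named in node_id's own path."""
    ids = parse_path(EMPLOYEES[node_id])
    # The "join": keep only the IDs that actually exist in the table.
    return [(i, EMPLOYEES[i]) for i in ids if i in EMPLOYEES]


chain = chain_of_command(270)
for emp_id, path in chain:
    print(emp_id, path)
```

As with the T-SQL version, the list includes the starting node itself because its own ID is the last entry in its path.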
Is the node in the subtree?
Because the materialized-path pattern is so efficient at finding subtrees, the best way to determine
whether a node is in a subtree is to reference the LIKE-based subtree query in a WHERE clause, similar
to the adjacency list solution:
-- Does 270 work for 263
SELECT 'True'
  WHERE 270 IN
    (SELECT BusinessEntityID
       FROM HumanResources.Employee
       WHERE MaterializedPath LIKE '1,263,%')
Determining the node level
Determining the current node level using the materialized-path pattern is as simple as counting the
commas in the materialized path. The following function uses CHARINDEX to locate the commas and make
quick work of the task:
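CREATE FUNCTION dbo.MaterializedPathLevel
  (@Path VARCHAR(200))
RETURNS TINYINT
AS
BEGIN
  DECLARE
    @Position TINYINT = 1,
    @Lv TINYINT = 0;
  -- each pass looks for the next comma; the loop ends when CHARINDEX returns 0
  WHILE @Position > 0
    BEGIN;
      SET @Lv += 1;
      SELECT @Position = CHARINDEX(',', @Path, @Position + 1);
    END;
  -- the loop runs once more than the number of commas found, so subtract 1
  RETURN @Lv - 1;
END;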
Testing the function:
SELECT dbo.MaterializedPathLevel('1,20,56,345,1010') AS Level
Result:
Level
-----
4
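Since the level is nothing more than a comma count, the arithmetic is easy to check by hand. Here is a small Python sketch of that count; the function name and the sample paths (each assumed to end with a trailing comma) are made up for illustration.

```python
def path_level(path):
    """Return the number of commas in a materialized path."""
    return path.count(",")


# Hypothetical paths, each ending with a trailing comma (an assumption).
levels = {path: path_level(path) for path in ("1,", "1,263,", "1,263,270,")}
for path, lv in levels.items():
    print(path, lv)
```

Note that the count depends on the storage convention: with a trailing comma the root lands on level 1, and without one it lands on level 0.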
A function may be easily called within an update query, so pre-calculating and storing the level is a
trivial process. The next script adds a Level column, updates it using the new function, and then takes a
look at the data:
ALTER TABLE HumanResources.Employee
  ADD Level TINYINT
go
UPDATE HumanResources.Employee
  SET Level = dbo.MaterializedPathLevel(MaterializedPath)
SELECT BusinessEntityID, MaterializedPath, Level
  FROM HumanResources.Employee
Result (abbreviated):
BusinessEntityID MaterializedPath Level
Storing the level can be useful; for example, being able to query the node’s level makes writing
single-level queries significantly easier. Using the function in a persisted calculated column with an index
works great.
Single-level queries
Whereas the adjacency list pattern was simpler for single-level queries than for returning
complete subtrees, the materialized-path pattern excels at returning subtrees but makes it more difficult to
return just a single level. Neither solution excels at returning a specific level in a hierarchy
on its own. It is possible with the adjacency list pattern, but it requires some recursive functionality. For the
materialized-path pattern, if the node’s level is also stored in the table, then the level can be easily added to
the WHERE clause, and the queries become simple.
This query locates all the nodes one level down from the CEO. The CTE locates the
MaterializedPath and the Level for the CEO, and the main query’s join conditions filter
the query to the next level down:
-- Query Search 1 level down
WITH CurrentNode(MaterializedPath, Level) AS
  (SELECT MaterializedPath, Level
     FROM HumanResources.Employee
     WHERE BusinessEntityID = 1)
SELECT BusinessEntityID, ManagerID, E.MaterializedPath, E.Level
  FROM HumanResources.Employee E
    JOIN CurrentNode C
      ON E.MaterializedPath LIKE C.MaterializedPath + '%'
        AND E.Level = C.Level + 1
Result:
BusinessEntityID ManagerID MaterializedPath Level
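The two join conditions, a path prefix match plus a level offset, can be sketched in a few lines of Python. This is an illustration under assumptions, not the query itself: the `PATHS` dictionary, its IDs, the helper names, and the trailing-comma convention are all hypothetical.

```python
# Hypothetical rows: id -> materialized path (trailing comma assumed).
PATHS = {
    1: "1,",
    263: "1,263,",
    264: "1,263,264,",
    270: "1,263,270,",
    273: "1,273,",
}


def level(path):
    """Level of a node, taken as the number of commas in its path."""
    return path.count(",")


def nodes_at_depth(paths, start_id, depth):
    """IDs in start_id's subtree that sit exactly `depth` levels below it."""
    start_path = paths[start_id]
    target = level(start_path) + depth
    return sorted(
        node_id
        for node_id, path in paths.items()
        if path.startswith(start_path) and level(path) == target
    )


one_down = nodes_at_depth(PATHS, 1, 1)
two_down = nodes_at_depth(PATHS, 1, 2)
print(one_down, two_down)
```

Changing the depth argument is all it takes to reach a different level, which is the same flexibility the stored Level column gives the T-SQL query.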
An advantage of this method over the single join method used for finding single-level queries for the
adjacency list pattern is that this method can be used to find any specific level, not just the nearest level.
Locating the single-level query up the hierarchy is the same basic outer query, but the CTE/subquery
uses the up-the-hierarchy subtree query instead, parsing the materialized path string.
Reparenting the materialized path
Because the materialized-path pattern stores the entire tree in the materialized path value in each node,
when the tree is modified by inserting, updating, or deleting a node, the entire affected subtree must
have its materialized path recalculated.
Each node’s path contains the path of its parent node, so if the parent node’s path changes, so do the
children. This will propagate down and affect all descendants of the node being changed.
The brute force method is to reexecute the user-defined function that calculates the materialized path. A
more elegant method, when it applies, is to use the REPLACE T-SQL function.
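To make the propagation concrete, here is a minimal Python sketch of reparenting by swapping the path prefix, which is the idea behind the REPLACE approach. The `PATHS` data, the `reparent` helper, and the trailing-comma convention are assumptions for illustration. The sketch rewrites only the leading prefix, whereas T-SQL REPLACE substitutes every occurrence of the search string.

```python
# Hypothetical rows: id -> materialized path (trailing comma assumed).
PATHS = {
    1: "1,",
    263: "1,263,",
    264: "1,263,264,",
    265: "1,263,264,265,",
    270: "1,263,270,",
}


def reparent(paths, node_id, new_parent_id):
    """Return a new dict with node_id and its whole subtree moved."""
    old_prefix = paths[node_id]
    new_parent_path = paths[new_parent_id]
    if new_parent_path.startswith(old_prefix):
        raise ValueError("cannot move a node under itself or its own descendant")
    new_prefix = new_parent_path + str(node_id) + ","
    return {
        emp_id: (
            new_prefix + path[len(old_prefix):]
            if path.startswith(old_prefix)
            else path
        )
        for emp_id, path in paths.items()
    }


# Move node 264, and with it node 265, from under 263 to under 270.
moved = reparent(PATHS, 264, 270)
for emp_id in sorted(moved):
    print(emp_id, moved[emp_id])
```

Only the rows whose path starts with the moved node's old path change; every other row is left alone.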
Indexing the materialized path
Indexing the materialized path requires only a non-clustered index on the materialized path column.
Because the level column is used in some searches, depending on the usage, it’s also a candidate for a
non-clustered index. If so, then a composite index of the level and materialized path columns would be
the best-performing option.
Materialized path pros and cons
There are some points in favor of the materialized-path pattern:
■ The strongest point in its favor is that it contains the actual references to every node in its
hierarchy. This gives the pattern considerable durability and consistency. If a node is deleted
or updated accidentally, the remaining nodes in its subtree are not orphaned. The tree can be
reconstructed. If Jean Trenary is deleted, the materialized path of the IT department employees
remains intact.
■ The materialized-path pattern is the only pattern that can retrieve an entire subtree with a
single index seek. It’s wicked fast.
■ Reading a materialized path is simple and intuitive. The keys are there to read in plain text.
On the down side, there are a number of issues, including the following:
■ The key sizes can become large; at 10 levels deep with an integer key, the keys can be 40–80
bytes in size. This is large for a key.
■ Constraining the hierarchy is difficult without the use of triggers or complex check constraints.
Unlike the adjacency list pattern, you cannot easily enforce that a parent node exists.
■ Simple operations like “get me the parent node” are more complex without the aid of helper
functions.
■ Inserting new nodes requires calculating the materialized path, and reparenting the materialized
path requires recalculating the materialized paths for every node in the affected subtree.
For an OLTP system this can be a very expensive operation and lead to a large amount of
contention. Offloading the maintenance of the hierarchy to a background process can alleviate
this. An option is to combine adjacency and path solutions; one provides ease of maintenance
and one provides performance for querying.
The materialized path is my favorite hierarchy pattern and the one I use in Nordic (my SQL Server
object relational façade) to store the class structure.
Using the New HierarchyID
For SQL Server 2008, Microsoft has released a new data type targeted specifically at solving the
hierarchy problem. Working through the materialized-path pattern was a good introduction to HierarchyID
because HierarchyID is basically a binary version of materialized path.
HierarchyID is implemented as a CLR data type with CLR methods, but you don’t need to enable
CLR to use HierarchyID. Technically speaking, the CLR is always running. Disabling the CLR only
disables installing and running user-programmed CLR assemblies.
To jump right into the HierarchyID, this first query exposes the raw data. The
OrganizationNode column in the HumanResources.Employee table is a HierarchyID
column. The second column simply returns the binary data from OrganizationNode. The
third column, HierarchyID.ToString(), uses the .ToString() method to convert the
HierarchyID data to text. The last column returns the values stored in a calculated column that’s set to
the .GetLevel() method:
-- View raw HierarchyID Data
SELECT E.BusinessEntityID, P.FirstName + ' ' + P.LastName as 'Name',
    OrganizationNode, OrganizationNode.ToString() as 'HierarchyID.ToString()',
    OrganizationLevel
  FROM HumanResources.Employee E
    JOIN Person.Person P
      ON E.BusinessEntityID = P.BusinessEntityID
Result (abbreviated):
BusinessEntityID OrganizationNode HierarchyID.ToString() OrganizationLevel
In the third column, you can see data that looks similar to the materialized path pattern, but there’s a
significant difference. Instead of storing a delimited path of ancestor primary keys, HierarchyID is
intended to store the relative node position, as shown in Figure 17-6.
FIGURE 17-6
The AdventureWorks Information Services Department with HierarchyID nodes displayed
Adventure Works 2008 Information Service Department
1 Ken Sánchez /
    263 Jean Trenary /5/
        264 Stephanie Conroy /5/1/
            265 Ashvini Sharma /5/1/1/
            266 Peter Connelly /5/1/2/
        267 Karen Berg /5/2/
        268 Ramesh Meyyappan /5/3/
        269 Dan Bacon /5/4/
        270 François Ajenstat /5/5/
        271 Dan Wilson /5/6/
        272 Janaina Bueno /5/7/
Walking through a few examples in this hierarchy, note the following:
■ The CEO is the root node, so his HierarchyID is just /.
■ If all the nodes under Ken were displayed, then Jean would be the fifth node. Her relative node
position is the fifth node under Ken, so her HierarchyID is /5/.
■ Stephanie is the first node under Jean, so her HierarchyID is /5/1/.
■ Ashvini is the first node under Stephanie, so his node is /5/1/1/.
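The text form makes it easy to reason about a node's depth and its parent by hand. The following Python sketch models only the string form returned by ToString(); it is not how SQL Server stores or evaluates the binary value, and the helper names are hypothetical.

```python
def labels_of(node):
    """Break a text-form node such as '/5/1/' into its labels ['5', '1']."""
    return [label for label in node.split("/") if label]


def node_level(node):
    """Depth of a node in its text form: '/' is 0, '/5/' is 1, '/5/1/' is 2."""
    return len(labels_of(node))


def parent_of(node):
    """Text form of the parent node, or None for the root '/'."""
    labels = labels_of(node)
    if not labels:
        return None
    return "/" + "".join(label + "/" for label in labels[:-1])


for node in ("/", "/5/", "/5/1/", "/5/1/1/"):
    print(node, node_level(node), parent_of(node))
```

Each label is a position among siblings rather than a primary key, so /5/1/ says "first child of the fifth child of the root" and carries no BusinessEntityID values.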
Selecting a single node
Even though HierarchyID stores the data in binary, it’s possible to filter by a HierarchyID data
type column in a WHERE clause using the text form of the data:
SELECT E.BusinessEntityID, P.FirstName + ' ' + P.LastName as 'Name', E.JobTitle
  FROM HumanResources.Employee E
    JOIN Person.Person P
      ON E.BusinessEntityID = P.BusinessEntityID
  WHERE OrganizationNode = '/5/5/'
Result:
BusinessEntityID Name JobTitle
---------------- ----------------- ----------------------
270 François Ajenstat Database Administrator
Scanning for ancestors
Searching for all ancestor nodes is relatively easy with HierarchyID. There’s a great CLR method,
IsDescendantOf(), that tests any node to determine whether it’s a descendant of another node
and returns either true or false. The following WHERE clause tests each row to determine whether the
@EmployeeNode is a descendant of that row’s OrganizationNode:
WHERE @EmployeeNode.IsDescendantOf(OrganizationNode) = 1
The full query returns the ancestor list for François. The script must first store François’ HierarchyID
value in a local variable. Because the variable is a HierarchyID, the IsDescendantOf() method
may be applied. The fourth column displays the same test used in the WHERE clause:
DECLARE @EmployeeNode HierarchyID
SELECT @EmployeeNode = OrganizationNode
  FROM HumanResources.Employee
  WHERE OrganizationNode = '/5/5/' -- François Ajenstat the DBA
SELECT E.BusinessEntityID, P.FirstName + ' ' + P.LastName as 'Name', E.JobTitle,
    @EmployeeNode.IsDescendantOf(OrganizationNode) as Test
  FROM HumanResources.Employee E
    JOIN Person.Person P
      ON E.BusinessEntityID = P.BusinessEntityID
  WHERE @EmployeeNode.IsDescendantOf(OrganizationNode) = 1
Result:
BusinessEntityID Name JobTitle Test
---------------- ----------------- ---------------------------- ----
263 Jean Trenary Information Services Manager 1
270 François Ajenstat Database Administrator 1
Performing a subtree search
The IsDescendantOf() method is easily flipped around to perform a subtree search locating all
descendants. The trick is that either side of the IsDescendantOf() method can use a variable or
column. In this case the variable goes in the parameter and the method is applied to the column. The
result is the now familiar AdventureWorks Information Service Department:
DECLARE @ManagerNode HierarchyID
SELECT @ManagerNode = OrganizationNode
  FROM HumanResources.Employee
  WHERE OrganizationNode = '/5/' -- Jean Trenary - IT Manager
SELECT E.BusinessEntityID, P.FirstName + ' ' + P.LastName as 'Name',
    OrganizationNode.ToString() as 'HierarchyID.ToString()',
    OrganizationLevel
  FROM HumanResources.Employee E
    JOIN Person.Person P
      ON E.BusinessEntityID = P.BusinessEntityID
  WHERE OrganizationNode.IsDescendantOf(@ManagerNode) = 1
Result:
BusinessEntityID Name HierarchyID.ToString() OrganizationLevel
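Flipping the same test in either direction is easy to see with the text form of the nodes. This Python sketch treats "is a descendant of" as a prefix match on the ToString() form, which is only a model of the behavior and not how SQL Server evaluates the binary value. The `NODES` dictionary and the function names are hypothetical; the node values come from Figure 17-6.

```python
# A few nodes from Figure 17-6, keyed by the text form of the HierarchyID.
NODES = {
    "/": "Ken Sánchez",
    "/5/": "Jean Trenary",
    "/5/1/": "Stephanie Conroy",
    "/5/1/1/": "Ashvini Sharma",
    "/5/5/": "François Ajenstat",
}


def is_descendant_of(node, candidate_ancestor):
    """Prefix model of the test; a node counts as its own descendant."""
    return node.startswith(candidate_ancestor)


def ancestors_of(node):
    """Variable on the left: which rows is this node a descendant of?"""
    return [n for n in sorted(NODES) if is_descendant_of(node, n)]


def subtree_of(node):
    """Variable in the parameter: which rows descend from this node?"""
    return [n for n in sorted(NODES) if is_descendant_of(n, node)]


up = ancestors_of("/5/5/")
down = subtree_of("/5/1/")
print([NODES[n] for n in up])
print([NODES[n] for n in down])
```

The only thing that changes between the two searches is which side of the test holds the fixed node, just as in the two T-SQL queries.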
Single-level searches
Single-level searches were presented first for the adjacency list pattern because they were the
simpler searches. For HierarchyID searches, a single-level search is more complex and builds on
the previous searches. In fact, a single-level HierarchyID search is really nothing more than an