CHAPTER 30: GRAPHS IN SQL
The most common way to model a graph in SQL is with an adjacency list model. Each edge of the graph is shown as a pair of nodes in which the ordering matters, and any values associated with that edge are shown in another column.
30.1 Basic Graph Characteristics
The following code is from John Gilson. This code uses an adjacency list model of the graph, with nodes in a separate table. This is the most common method for modeling graphs in SQL.
CREATE TABLE Nodes
(node_id INTEGER NOT NULL PRIMARY KEY);

CREATE TABLE AdjacencyListGraph
(begin_node_id INTEGER NOT NULL
   REFERENCES Nodes (node_id),
 end_node_id INTEGER NOT NULL
   REFERENCES Nodes (node_id),
 PRIMARY KEY (begin_node_id, end_node_id),
 CHECK (begin_node_id <> end_node_id));
It is also possible to load an acyclic directed graph into a nested set model by splitting the nodes.
CREATE TABLE NestedSetsGraph
(node_id INTEGER NOT NULL
   REFERENCES Nodes (node_id),
 lft INTEGER NOT NULL CHECK (lft >= 1) PRIMARY KEY,
 rgt INTEGER NOT NULL UNIQUE,
 CHECK (rgt > lft),
 UNIQUE (node_id, lft));
To split nodes, start at the sink nodes and move up the tree. When you come to a node with an indegree greater than one, replace it with that many copies of the node, one under each of its superiors. Continue to do this until you get to the root. The acyclic graph will become a tree, but with duplicated node values. There are advantages to this model; we will discuss them in Section 30.3.
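The splitting step can be sketched outside SQL. The following Python is only an illustration under stated assumptions: the function name dag_to_nested_sets and the letter node ids are hypothetical, the input must be acyclic, and the walk runs top-down from the roots rather than bottom-up from the sinks. For an acyclic graph both directions give each shared node one copy per superior.

```python
from collections import defaultdict

def dag_to_nested_sets(nodes, edges):
    """Return (node_id, lft, rgt) rows, copying shared nodes under each parent.

    Assumes the graph is acyclic; a cycle would recurse without end.
    """
    children = defaultdict(list)
    has_parent = set()
    for begin, end in edges:
        children[begin].append(end)
        has_parent.add(end)

    rows = []
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter
        for child in children[node]:
            visit(child)          # a node with two parents is visited twice
        counter += 1
        rows.append((node, lft, counter))

    for node in nodes:
        if node not in has_parent:  # roots have indegree zero
            visit(node)
    return sorted(rows, key=lambda row: row[1])

# Diamond graph: D has indegree two, so it is split into two copies.
rows = dag_to_nested_sets(
    ["A", "B", "C", "D"],
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
)
for row in rows:
    print(row)
```

Node D ends up with two (lft, rgt) pairs, which is why the table's key is on lft rather than on node_id.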
30.1.1 All Nodes in the Graph
To view all nodes in the graph, use the following:
CREATE VIEW GraphNodes (node_id) AS
SELECT DISTINCT node_id FROM NestedSetsGraph;
30.1.2 Path Endpoints
A path through a graph is a traversal of consecutive nodes along a sequence of edges. Clearly, the node at the end of one edge in the sequence must also be the node at the beginning of the next edge in the sequence. The length of the path is the number of edges that are traversed along the path.
Path endpoints are the first and last nodes of each path in the graph. For a path of length zero, the path endpoints are the same node. If there is more than one path between two nodes, each path will be distinguished by its own distinct set of number pairs for the nested-set representation.
If there is only one path, P, between two nodes, but P is a subpath of more than one distinct path, then the endpoints of P will have number pairs for each of these greater paths. As a canonical form, the least-numbered pairs are returned for these endpoints.
CREATE VIEW PathEndpoints
 (begin_node_id, end_node_id,
  begin_lft, begin_rgt,
  end_lft, end_rgt)
AS
SELECT G1.node_id, G2.node_id,
       G1.lft, G1.rgt, G2.lft, G2.rgt
  FROM (SELECT node_id, MIN(lft), MIN(rgt)
          FROM NestedSetsGraph
         GROUP BY node_id) AS G1 (node_id, lft, rgt)
       INNER JOIN
       NestedSetsGraph AS G2
       ON G2.lft >= G1.lft
          AND G2.lft < G1.rgt;
30.1.3 Reachable Nodes
If a node is reachable from another node, then a path exists from the one node to the other. It is assumed that every node is reachable from itself.
CREATE VIEW ReachableNodes (begin_node_id, end_node_id) AS
SELECT DISTINCT begin_node_id, end_node_id FROM PathEndpoints;
30.1.4 Edges
Edges are pairs of adjacent connected nodes in the graph. If edge E is represented by the pair of nodes (n0, n1), then n1 is reachable from n0 in a single traversal.
CREATE VIEW Edges (begin_node_id, end_node_id)
AS
SELECT begin_node_id, end_node_id
  FROM PathEndpoints AS PE
 WHERE begin_node_id <> end_node_id
   AND NOT EXISTS
       (SELECT *
          FROM NestedSetsGraph AS G
         WHERE G.lft > PE.begin_lft
           AND G.lft < PE.end_lft
           AND G.rgt > PE.end_rgt);
30.1.5 Indegree and Outdegree
The indegree of a node, n, is the number of distinct edges ending at n. Because the view outer-joins from GraphNodes, nodes that have an indegree of zero are returned with a count of zero. To determine the indegree of all nodes in the graph:
CREATE VIEW Indegree (node_id, node_indegree)
AS
SELECT N.node_id, COUNT(E.begin_node_id)
  FROM GraphNodes AS N
       LEFT OUTER JOIN
       Edges AS E
       ON N.node_id = E.end_node_id
 GROUP BY N.node_id;
The outdegree of a node, n, is the number of distinct edges beginning at n. Because of the LEFT OUTER JOIN, nodes that have an outdegree of zero are returned with a count of zero.
To determine the outdegree of all nodes in the graph:
CREATE VIEW Outdegree (node_id, node_outdegree)
AS
SELECT N.node_id, COUNT(E.end_node_id)
FROM GraphNodes AS N
LEFT OUTER JOIN
Edges AS E
ON N.node_id = E.begin_node_id
GROUP BY N.node_id;
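As a check on how these views fit together, here is a small sketch that loads a four-node diamond graph (A to B and C, both of which lead to D) into an in-memory SQLite database through Python. The data, the letter node ids, and the omission of the Nodes table are my assumptions for illustration. SQLite does not accept a column list after a derived-table alias, so the subquery in PathEndpoints names its columns with AS instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NestedSetsGraph
(node_id CHAR(1) NOT NULL,
 lft INTEGER NOT NULL PRIMARY KEY CHECK (lft >= 1),
 rgt INTEGER NOT NULL UNIQUE,
 CHECK (rgt > lft));

-- Node D has two parents, so it appears twice after splitting.
INSERT INTO NestedSetsGraph VALUES
 ('A', 1, 10), ('B', 2, 5), ('D', 3, 4), ('C', 6, 9), ('D', 7, 8);

CREATE VIEW GraphNodes (node_id) AS
SELECT DISTINCT node_id FROM NestedSetsGraph;

CREATE VIEW PathEndpoints
 (begin_node_id, end_node_id, begin_lft, begin_rgt, end_lft, end_rgt) AS
SELECT G1.node_id, G2.node_id, G1.lft, G1.rgt, G2.lft, G2.rgt
  FROM (SELECT node_id, MIN(lft) AS lft, MIN(rgt) AS rgt
          FROM NestedSetsGraph
         GROUP BY node_id) AS G1
       INNER JOIN NestedSetsGraph AS G2
       ON G2.lft >= G1.lft AND G2.lft < G1.rgt;

CREATE VIEW Edges (begin_node_id, end_node_id) AS
SELECT begin_node_id, end_node_id
  FROM PathEndpoints AS PE
 WHERE begin_node_id <> end_node_id
   AND NOT EXISTS
       (SELECT * FROM NestedSetsGraph AS G
         WHERE G.lft > PE.begin_lft
           AND G.lft < PE.end_lft
           AND G.rgt > PE.end_rgt);

CREATE VIEW Indegree (node_id, node_indegree) AS
SELECT N.node_id, COUNT(E.begin_node_id)
  FROM GraphNodes AS N
       LEFT OUTER JOIN Edges AS E
       ON N.node_id = E.end_node_id
 GROUP BY N.node_id;

CREATE VIEW Outdegree (node_id, node_outdegree) AS
SELECT N.node_id, COUNT(E.end_node_id)
  FROM GraphNodes AS N
       LEFT OUTER JOIN Edges AS E
       ON N.node_id = E.begin_node_id
 GROUP BY N.node_id;
""")

edges = conn.execute(
    "SELECT begin_node_id, end_node_id FROM Edges ORDER BY 1, 2").fetchall()
indegree = dict(conn.execute("SELECT node_id, node_indegree FROM Indegree"))
outdegree = dict(conn.execute("SELECT node_id, node_outdegree FROM Outdegree"))
print(edges)
print(indegree)
print(outdegree)
```

On this data the Edges view recovers the four original edges from the nested sets, and the root A and the sink D both appear in the degree views with a count of zero on one side.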
30.1.6 Source, Sink, Isolated, and Internal Nodes
A source node of a graph has a positive outdegree but an indegree of zero; that is, it has edges leading from, but not to, the node. This assumes there are no isolated nodes (nodes belonging to no edges).
CREATE VIEW SourceNodes (node_id, lft, rgt)
AS
SELECT node_id, lft, rgt
FROM NestedSetsGraph AS G1
WHERE NOT EXISTS
(SELECT *
FROM NestedSetsGraph AS G2
WHERE G1.lft > G2.lft
AND G1.lft < G2.rgt);
Likewise, a sink node of a graph has positive indegree but an outdegree of zero; that is, it has edges leading to, but not from, the node. This assumes there are no isolated nodes.
CREATE VIEW SinkNodes (node_id)
AS
SELECT node_id
FROM NestedSetsGraph AS G1
WHERE lft = rgt - 1
AND NOT EXISTS
(SELECT *
FROM NestedSetsGraph AS G2
WHERE G1.node_id = G2.node_id
AND G2.lft < G1.lft);
An isolated node belongs to no edges; i.e., it has zero indegree and zero outdegree.
CREATE VIEW IsolatedNodes (node_id, lft, rgt)
AS
SELECT node_id, lft, rgt
  FROM NestedSetsGraph AS G1
 WHERE lft = rgt - 1
   AND NOT EXISTS
       (SELECT *
          FROM NestedSetsGraph AS G2
         WHERE G1.lft > G2.lft
           AND G1.lft < G2.rgt);
An internal node of a graph has an indegree greater than zero and an outdegree greater than zero; that is, it acts as both a source and a sink.
CREATE VIEW InternalNodes (node_id)
AS
SELECT node_id
  FROM (SELECT node_id, MIN(lft) AS lft, MIN(rgt) AS rgt
          FROM NestedSetsGraph
         WHERE lft < rgt - 1
         GROUP BY node_id) AS G1
 WHERE EXISTS
       (SELECT *
          FROM NestedSetsGraph AS G2
         WHERE G1.lft > G2.lft
           AND G1.lft < G2.rgt);
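The four categories are defined purely by indegree and outdegree, so they can be cross-checked against an edge list without the nested-set numbering. This is a minimal Python sketch of the definitions only, not of the views; the function name classify_nodes and the sample graph with its extra node E are hypothetical.

```python
def classify_nodes(nodes, edges):
    """Label each node as source, sink, isolated, or internal by its degrees."""
    indegree = {n: 0 for n in nodes}
    outdegree = {n: 0 for n in nodes}
    for begin, end in set(edges):   # count distinct edges only
        outdegree[begin] += 1
        indegree[end] += 1

    kinds = {}
    for n in nodes:
        if indegree[n] == 0 and outdegree[n] == 0:
            kinds[n] = "isolated"
        elif indegree[n] == 0:
            kinds[n] = "source"     # edges lead from it, none lead to it
        elif outdegree[n] == 0:
            kinds[n] = "sink"       # edges lead to it, none lead from it
        else:
            kinds[n] = "internal"
    return kinds

kinds = classify_nodes(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
)
print(kinds)
```

Testing for the isolated case first is what lets the source and sink tests drop the "assumes there are no isolated nodes" caveat.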
30.2 Paths in a Graph
Finding a path in a graph is the most important commercial application of graphs. Graphs model transportation networks, electrical and cable systems, process control flow, and thousands of other things.
A path, P, of length L from a node n0 to a node nk in the graph is defined as a traversal of (L + 1) contiguous nodes along a sequence of edges, where the first node is node number 0 and the last is node number k (so k = L).
CREATE VIEW Paths (begin_node_id, end_node_id, this_node_id, seq_nbr,
begin_lft, begin_rgt, end_lft, end_rgt,
this_lft, this_rgt)
AS
SELECT PE.begin_node_id, PE.end_node_id, G1.node_id,
(SELECT COUNT(*)
FROM NestedSetsGraph AS G2
WHERE G2.lft > PE.begin_lft
AND G2.lft <= G1.lft
AND G2.rgt >= G1.rgt),
PE.begin_lft, PE.begin_rgt,
PE.end_lft, PE.end_rgt,
G1.lft, G1.rgt
FROM PathEndpoints AS PE
INNER JOIN
NestedSetsGraph AS G1
ON G1.lft BETWEEN PE.begin_lft
AND PE.end_lft
AND G1.rgt >= PE.end_rgt;
30.2.1 Length of Paths
The length of a path is the number of edges that are traversed along the path. A path of n nodes has a length of (n - 1).
CREATE VIEW PathLengths
(begin_node_id, end_node_id,
path_length,
begin_lft, begin_rgt,
end_lft, end_rgt)
AS
SELECT begin_node_id, end_node_id, MAX(seq_nbr),
begin_lft, begin_rgt, end_lft, end_rgt
FROM Paths
GROUP BY begin_lft, end_lft, begin_rgt, end_rgt,
begin_node_id, end_node_id;
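To make the (n - 1) rule concrete, here is a tiny Python sketch that checks that consecutive nodes in a path really are joined by edges and then returns the length. The function name path_length and the sample edges are assumptions for illustration.

```python
def path_length(path, edges):
    """Return the number of edges traversed by a path given as a node list."""
    edge_set = set(edges)
    # The end node of each edge must be the begin node of the next one,
    # which is exactly what pairing consecutive nodes enforces.
    for begin, end in zip(path, path[1:]):
        if (begin, end) not in edge_set:
            raise ValueError(f"no edge from {begin} to {end}")
    return len(path) - 1

sample_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
three_node_length = path_length(["A", "B", "D"], sample_edges)
single_node_length = path_length(["A"], sample_edges)
print(three_node_length, single_node_length)
```

A single node is a path of length zero, which matches the convention that every node is reachable from itself.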
30.2.2 Shortest Path
The following code gives the shortest path length between all nodes, but it does not tell you what the actual path is. There are other queries that use the new CTE feature and recursion, which we will discuss in Section 30.3.
CREATE VIEW ShortestPathLengths
 (begin_node_id, end_node_id,
  path_length,
  begin_lft, begin_rgt,
  end_lft, end_rgt)
AS
SELECT PL.begin_node_id, PL.end_node_id, PL.path_length,
       PL.begin_lft, PL.begin_rgt, PL.end_lft, PL.end_rgt
  FROM (SELECT begin_node_id, end_node_id,
               MIN(path_length) AS path_length
          FROM PathLengths
         GROUP BY begin_node_id, end_node_id) AS MPL
       INNER JOIN
       PathLengths AS PL
       ON MPL.begin_node_id = PL.begin_node_id
          AND MPL.end_node_id = PL.end_node_id
          AND MPL.path_length = PL.path_length;
30.2.3 Paths by Iteration
First, let’s build a graph that has a cost associated with each edge and put it into an adjacency list model.
INSERT INTO Edges (out_node, in_node, cost)
VALUES ('A', 'B', 50),
       ('A', 'C', 30),
       ('A', 'D', 100),
       ('A', 'E', 10),
       ('C', 'B', 5),
       ('D', 'B', 20),
       ('D', 'C', 50),
       ('E', 'D', 10);
To find the shortest paths from one node to the other nodes it can reach, we can write this recursive VIEW:
CREATE VIEW ShortestPaths (out_node, in_node, path_length) AS
WITH RECURSIVE Paths (out_node, in_node, path_length) AS
(SELECT out_node, in_node, 1 FROM Edges
UNION ALL
SELECT E1.out_node, P1.in_node, P1.path_length + 1
FROM Edges AS E1, Paths AS P1
WHERE E1.in_node = P1.out_node)
SELECT out_node, in_node, MIN(path_length)
FROM Paths
GROUP BY out_node, in_node;
out_node  in_node  path_length
==============================
'A'       'B'      1
'A'       'C'      1
'A'       'D'      1
'A'       'E'      1
'C'       'B'      1
'D'       'B'      1
'D'       'C'      1
'E'       'B'      2
'E'       'C'      2
'E'       'D'      1
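The recursive query can be tried directly in SQLite from Python. This is a sketch under assumptions: the section does not show the DDL for Edges, so the CREATE TABLE below is my guess at a reasonable declaration, and the query is run as a plain SELECT rather than wrapped in a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed declaration; the section only shows the INSERT statement.
CREATE TABLE Edges
(out_node CHAR(1) NOT NULL,
 in_node CHAR(1) NOT NULL,
 cost INTEGER NOT NULL,
 PRIMARY KEY (out_node, in_node));

INSERT INTO Edges (out_node, in_node, cost)
VALUES ('A', 'B', 50), ('A', 'C', 30), ('A', 'D', 100), ('A', 'E', 10),
       ('C', 'B', 5), ('D', 'B', 20), ('D', 'C', 50), ('E', 'D', 10);
""")

shortest = conn.execute("""
WITH RECURSIVE Paths (out_node, in_node, path_length) AS
(SELECT out_node, in_node, 1
   FROM Edges
 UNION ALL
 SELECT E1.out_node, P1.in_node, P1.path_length + 1
   FROM Edges AS E1, Paths AS P1
  WHERE E1.in_node = P1.out_node)
SELECT out_node, in_node, MIN(path_length)
  FROM Paths
 GROUP BY out_node, in_node
 ORDER BY out_node, in_node
""").fetchall()

for row in shortest:
    print(row)
```

The recursion stops only because this sample graph is acyclic. With UNION ALL and a cycle in Edges, the CTE would keep producing longer and longer paths, so a depth limit or cycle test would be needed.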
To find the shortest paths without recursion, stay in a loop and add one edge at a time to the set of paths defined so far:
CREATE PROCEDURE IteratePaths()
LANGUAGE SQL
MODIFIES SQL DATA
BEGIN
DECLARE old_path_tally INTEGER;
SET old_path_tally = 0;
DELETE FROM Paths; -- clean out working table
INSERT INTO Paths
SELECT out_node, in_node, 1
  FROM Edges; -- load the edges
-- add one edge to each path
WHILE old_path_tally < (SELECT COUNT(*) FROM Paths)
DO SET old_path_tally = (SELECT COUNT(*) FROM Paths);
   INSERT INTO Paths (out_node, in_node, lgth)
   SELECT E1.out_node, P1.in_node, (1 + P1.lgth)
     FROM Edges AS E1, Paths AS P1
    WHERE E1.in_node = P1.out_node
      AND NOT EXISTS -- path is not here already
          (SELECT *
             FROM Paths AS P2
            WHERE E1.out_node = P2.out_node
              AND P1.in_node = P2.in_node);
END WHILE;
END;
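The control flow of the loop is easier to see in a few lines of Python. This sketch mirrors the procedure's logic on the sample edges; the function name iterate_paths is hypothetical, and a dictionary keyed on (out_node, in_node) stands in for the Paths working table, so each pair is kept only once.

```python
def iterate_paths(edges):
    """Grow the set of known paths one edge at a time until nothing new appears."""
    paths = {(out_node, in_node): 1 for out_node, in_node in edges}  # load the edges
    old_path_tally = 0
    while old_path_tally < len(paths):
        old_path_tally = len(paths)
        new_rows = {}
        for e_out, e_in in edges:
            for (p_out, p_in), lgth in paths.items():
                # Prepend the edge when it ends where the known path begins
                # and the resulting pair is not in the table already.
                if e_in == p_out and (e_out, p_in) not in paths:
                    key = (e_out, p_in)
                    new_rows[key] = min(1 + lgth, new_rows.get(key, 1 + lgth))
        paths.update(new_rows)
    return paths

sample_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
                ("C", "B"), ("D", "B"), ("D", "C"), ("E", "D")]
paths = iterate_paths(sample_edges)
for pair in sorted(paths):
    print(pair, paths[pair])
```

Because a pair is added only the first time it becomes reachable, and every pass extends paths by exactly one edge, the first length recorded for a pair is also its shortest.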
The least cost path is basically the same algorithm, but instead of a constant of one for the path length, we use the actual costs of the edges:
CREATE PROCEDURE IterateCheapPaths ()
LANGUAGE SQL
MODIFIES SQL DATA
BEGIN
DECLARE old_path_cost INTEGER;
SET old_path_cost = 0;
DELETE FROM Paths; -- clean out working table
INSERT INTO Paths
SELECT out_node, in_node, cost
  FROM Edges; -- load the edges
-- add one edge to each path
WHILE old_path_cost < (SELECT COUNT(*) FROM Paths)
DO SET old_path_cost = (SELECT COUNT(*) FROM Paths);
   INSERT INTO Paths (out_node, in_node, cost)
   SELECT E1.out_node, P1.in_node, (E1.cost + P1.cost)
     FROM Edges AS E1
          INNER JOIN
          (SELECT out_node, in_node, MIN(cost)
             FROM Paths
            GROUP BY out_node, in_node)
          AS P1 (out_node, in_node, cost)
          ON E1.in_node = P1.out_node
             AND NOT EXISTS
                 (SELECT *
                    FROM Paths AS P2
                   WHERE E1.out_node = P2.out_node
                     AND P1.in_node = P2.in_node
                     AND P2.cost <= E1.cost + P1.cost);
END WHILE;
END;
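For comparison, the least-cost idea can be sketched in Python as a relaxation loop that runs until no cost improves, in the spirit of the Bellman-Ford algorithm. This is not a transcription of the procedure: it overwrites the best known cost in place instead of inserting new rows, it stops when nothing changes rather than when the row count stops growing, and it assumes there are no negative-cost cycles. The function name cheapest_paths is hypothetical.

```python
def cheapest_paths(edges):
    """edges maps (out_node, in_node) to cost; returns least cost per reachable pair."""
    best = dict(edges)  # start with the single-edge paths
    changed = True
    while changed:
        changed = False
        for (e_out, e_in), e_cost in edges.items():
            # Snapshot the items because best is updated inside the loop.
            for (p_out, p_in), p_cost in list(best.items()):
                if e_in != p_out:
                    continue
                key = (e_out, p_in)
                candidate = e_cost + p_cost
                if candidate < best.get(key, float("inf")):
                    best[key] = candidate   # found a cheaper route
                    changed = True
    return best

costed_edges = {
    ("A", "B"): 50, ("A", "C"): 30, ("A", "D"): 100, ("A", "E"): 10,
    ("C", "B"): 5, ("D", "B"): 20, ("D", "C"): 50, ("E", "D"): 10,
}
best = cheapest_paths(costed_edges)
for pair in sorted(best):
    print(pair, best[pair])
```

On the sample data the direct edge from A to D costs 100, but the route through E costs only 20, which is the kind of case a path-length count cannot see.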
30.2.4 Listing the Paths
I took the data for this table from the book Introduction to Algorithms (Cormen, Leiserson, and Rivest 1990), page 518. This book was very popular in college courses in the United States. I made one decision that will be important later: I added self-traversal edges (i.e., the node is both the out_node and the in_node of an edge) with weights of zero.
INSERT INTO Edges VALUES ('s', 's', 0);
INSERT INTO Edges VALUES ('s', 'u', 3);
INSERT INTO Edges VALUES ('s', 'x', 5);
INSERT INTO Edges VALUES ('u', 'u', 0);
INSERT INTO Edges VALUES ('u', 'v', 6);
INSERT INTO Edges VALUES ('u', 'x', 2);
INSERT INTO Edges VALUES ('v', 'v', 0);
INSERT INTO Edges VALUES ('v', 'y', 2);
INSERT INTO Edges VALUES ('x', 'u', 1);
INSERT INTO Edges VALUES ('x', 'v', 4);
INSERT INTO Edges VALUES ('x', 'x', 0);
INSERT INTO Edges VALUES ('x', 'y', 6);
INSERT INTO Edges VALUES ('y', 's', 3);
INSERT INTO Edges VALUES ('y', 'v', 7);
INSERT INTO Edges VALUES ('y', 'y', 0);
I am not happy about this approach, because I have to decide the maximum number of edges in a path before I start looking for an answer. But this solution will work, and I know that a path will have no more than the total number of nodes in the graph. Let’s create a table to hold the paths:
CREATE TABLE Paths
(step1 CHAR(2) NOT NULL,
step2 CHAR(2) NOT NULL,
step3 CHAR(2) NOT NULL,
step4 CHAR(2) NOT NULL,
step5 CHAR(2) NOT NULL,
total_cost INTEGER NOT NULL,
path_length INTEGER NOT NULL,
PRIMARY KEY (step1, step2, step3, step4, step5));
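The text so far shows only the table, so what follows is a guess at how its rows might be produced, written in Python rather than SQL. It is a sketch under assumptions: the function name five_step_paths is hypothetical, and I take path_length to mean the number of hops that are not self-traversals. The zero-weight self-traversal edges are what allow a path with fewer than four edges to be padded out to exactly five steps.

```python
from itertools import product

EDGES = {
    ("s", "s"): 0, ("s", "u"): 3, ("s", "x"): 5,
    ("u", "u"): 0, ("u", "v"): 6, ("u", "x"): 2,
    ("v", "v"): 0, ("v", "y"): 2,
    ("x", "u"): 1, ("x", "v"): 4, ("x", "x"): 0, ("x", "y"): 6,
    ("y", "s"): 3, ("y", "v"): 7, ("y", "y"): 0,
}
NODES = sorted({node for edge in EDGES for node in edge})

def five_step_paths(start, end):
    """Return (step1..step5, total_cost, path_length) rows from start to end."""
    rows = []
    for middle in product(NODES, repeat=3):
        steps = (start, *middle, end)
        hops = list(zip(steps, steps[1:]))
        if all(hop in EDGES for hop in hops):
            total_cost = sum(EDGES[hop] for hop in hops)
            # Assumption: self-traversals are padding and do not count as edges.
            path_length = sum(1 for a, b in hops if a != b)
            rows.append((*steps, total_cost, path_length))
    return rows

rows = five_step_paths("s", "y")
cheapest = min(row[5] for row in rows)
print(cheapest)
print([row for row in rows if row[5] == cheapest])
```

Several different routes from s to y tie at the minimum cost on this data, so a query against such a table has to decide whether it wants one cheapest path or all of them.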