CHAPTER 25: ARRAYS IN SQL
 j INTEGER NOT NULL CHECK (j > 0),
 CHECK ((SELECT MAX(i) FROM MyMatrix)
        = (SELECT COUNT(i) FROM MyMatrix)),
 CHECK ((SELECT MAX(j) FROM MyMatrix)
        = (SELECT COUNT(j) FROM MyMatrix)));
The constraints ensure that the subscripts of each element are within the proper range. I am starting my subscripts at one, but a little change in the logic would allow any value.
25.3.1 Matrix Equality
This test for matrix equality is from the article “SQL Matrix Processing” (Mrdalj, Vujovic, and Jovanovic 1996). Two matrices are equal if their cardinalities and the cardinality of their intersection are all equal.
SELECT COUNT(*) FROM MatrixA
UNION
SELECT COUNT(*) FROM MatrixB
UNION
SELECT COUNT(*)
  FROM MatrixA AS A, MatrixB AS B
 WHERE A.i = B.i
   AND A.j = B.j
   AND A.element = B.element;
You have to decide how to use this query in your context. If it returns one row, the matrices are the same; otherwise, they are different.
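One way to use the query is to count the rows it returns. The following is a minimal sketch, not part of the original article: it assumes SQLite (through Python's standard sqlite3 module) as the engine, a (i, j, element) layout for both tables, and a hypothetical helper name, matrices_equal.

```python
import sqlite3

# The three-way UNION from the text: duplicate counts collapse into one row.
EQUALITY_QUERY = """
SELECT COUNT(*) FROM MatrixA
UNION
SELECT COUNT(*) FROM MatrixB
UNION
SELECT COUNT(*)
  FROM MatrixA AS A, MatrixB AS B
 WHERE A.i = B.i AND A.j = B.j AND A.element = B.element
"""


def matrices_equal(a_rows, b_rows):
    """Return True when the UNION yields exactly one row (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    for name in ("MatrixA", "MatrixB"):
        # Assumed table layout: one row per (i, j) cell.
        con.execute(
            f"CREATE TABLE {name} (i INTEGER NOT NULL, j INTEGER NOT NULL, "
            "element INTEGER NOT NULL, PRIMARY KEY (i, j))"
        )
    con.executemany("INSERT INTO MatrixA VALUES (?, ?, ?)", a_rows)
    con.executemany("INSERT INTO MatrixB VALUES (?, ?, ?)", b_rows)
    counts = con.execute(EQUALITY_QUERY).fetchall()
    con.close()
    return len(counts) == 1


if __name__ == "__main__":
    same = [(1, 1, 5), (1, 2, 6)]
    changed = [(1, 1, 5), (1, 2, 7)]
    print(matrices_equal(same, same))
    print(matrices_equal(same, changed))
```

When one element differs, the intersection count drops below the two cardinalities, so the UNION keeps more than one distinct value.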
25.3.2 Matrix Addition
Matrix addition and subtraction are possible only between matrices of the same dimensions. The obvious way to do the addition is simply:
SELECT A.i, A.j, (A.element + B.element) AS total
  FROM MatrixA AS A, MatrixB AS B
 WHERE A.i = B.i
   AND A.j = B.j;
But properly, you ought to add some checking to be sure the matrices match. We can assume that both start numbering subscripts with either one or zero.
SELECT A.i, A.j, (A.element + B.element) AS total
FROM MatrixA AS A, MatrixB AS B
WHERE A.i = B.i
AND A.j = B.j
AND (SELECT COUNT(*) FROM MatrixA) =
(SELECT COUNT(*) FROM MatrixB)
AND (SELECT MAX(i) FROM MatrixA) =
(SELECT MAX(i) FROM MatrixB)
AND (SELECT MAX(j) FROM MatrixA) =
(SELECT MAX(j) FROM MatrixB);
Likewise, to make the addition permanent, you can use the same basic query in an UPDATE statement:
UPDATE MatrixA
SET element = element + (SELECT element
FROM MatrixB
WHERE MatrixB.i = MatrixA.i
AND MatrixB.j = MatrixA.j)
WHERE (SELECT COUNT(*) FROM MatrixA)
=(SELECT COUNT(*) FROM MatrixB)
AND (SELECT MAX(i) FROM MatrixA)
= (SELECT MAX(i) FROM MatrixB)
AND (SELECT MAX(j) FROM MatrixA)
= (SELECT MAX(j) FROM MatrixB);
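The effect of the dimension checks is easiest to see by running the guarded SELECT against matching and mismatched inputs. This is a small sketch under stated assumptions: SQLite via Python's sqlite3 module stands in for the SQL engine, the ORDER BY is added only to make the output deterministic, and add_matrices is a hypothetical helper name.

```python
import sqlite3

# The guarded addition query from the text, with balanced parentheses.
GUARDED_ADD = """
SELECT A.i, A.j, (A.element + B.element) AS total
  FROM MatrixA AS A, MatrixB AS B
 WHERE A.i = B.i
   AND A.j = B.j
   AND (SELECT COUNT(*) FROM MatrixA) = (SELECT COUNT(*) FROM MatrixB)
   AND (SELECT MAX(i) FROM MatrixA) = (SELECT MAX(i) FROM MatrixB)
   AND (SELECT MAX(j) FROM MatrixA) = (SELECT MAX(j) FROM MatrixB)
 ORDER BY A.i, A.j
"""


def add_matrices(a_rows, b_rows):
    """Return the summed cells, or an empty list if the shapes differ."""
    con = sqlite3.connect(":memory:")
    for name in ("MatrixA", "MatrixB"):
        con.execute(
            f"CREATE TABLE {name} (i INTEGER NOT NULL, j INTEGER NOT NULL, "
            "element INTEGER NOT NULL, PRIMARY KEY (i, j))"
        )
    con.executemany("INSERT INTO MatrixA VALUES (?, ?, ?)", a_rows)
    con.executemany("INSERT INTO MatrixB VALUES (?, ?, ?)", b_rows)
    totals = con.execute(GUARDED_ADD).fetchall()
    con.close()
    return totals


if __name__ == "__main__":
    a = [(1, 1, 1), (1, 2, 2)]
    print(add_matrices(a, [(1, 1, 10), (1, 2, 20)]))  # shapes match
    print(add_matrices(a, [(1, 1, 10)]))              # shapes differ
```

Because the checks sit in the WHERE clause, a mismatch does not raise an error; it simply produces an empty result, which the caller has to treat as a failure.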
25.3.3 Matrix Multiplication
Multiplication by a scalar constant is direct and easy:
UPDATE MyMatrix
SET element = element * :constant;
Matrix multiplication is not as big a mess as might be expected. Remember that the first matrix must have the same number of columns as the second matrix has rows. That means C[i, j] is the sum over k of A[i, k] * B[k, j], which we can show with an example:
CREATE TABLE MatrixA
(i INTEGER NOT NULL
   CHECK (i BETWEEN 1 AND 10), -- pick your own bounds
 k INTEGER NOT NULL
   CHECK (k BETWEEN 1 AND 10), -- must match MatrixB.k range
 element INTEGER NOT NULL,
 PRIMARY KEY (i, k));
MatrixA
i k element
===================
1 1 2
1 2 -3
1 3 4
2 1 -1
2 2 0
2 3 2
CREATE TABLE MatrixB
(k INTEGER NOT NULL
   CHECK (k BETWEEN 1 AND 10), -- must match MatrixA.k range
 j INTEGER NOT NULL
   CHECK (j BETWEEN 1 AND 4), -- pick your own bounds
 element INTEGER NOT NULL,
 PRIMARY KEY (k, j));
MatrixB
k j element
==================
1 1 -1
1 2 2
1 3 3
2 1 0
2 2 1
2 3 7
3 1 1
3 2 1
3 3 -2
CREATE VIEW MatrixC(i, j, element)
AS SELECT i, j, SUM(MatrixA.element * MatrixB.element)
     FROM MatrixA, MatrixB
    WHERE MatrixA.k = MatrixB.k
    GROUP BY i, j;
This is taken directly from the definition of multiplication.
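To check the view against the sample data, the tables can be loaded into a scratch database. The sketch below assumes SQLite via Python's sqlite3 module; the CHECK bounds are left out for brevity, and the view uses column aliases instead of a column list, which is an adaptation rather than the form shown in the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MatrixA
(i INTEGER NOT NULL, k INTEGER NOT NULL, element INTEGER NOT NULL,
 PRIMARY KEY (i, k));
CREATE TABLE MatrixB
(k INTEGER NOT NULL, j INTEGER NOT NULL, element INTEGER NOT NULL,
 PRIMARY KEY (k, j));

-- Sample data from the two tables shown in the text.
INSERT INTO MatrixA VALUES
 (1, 1, 2), (1, 2, -3), (1, 3, 4),
 (2, 1, -1), (2, 2, 0), (2, 3, 2);
INSERT INTO MatrixB VALUES
 (1, 1, -1), (1, 2, 2), (1, 3, 3),
 (2, 1, 0), (2, 2, 1), (2, 3, 7),
 (3, 1, 1), (3, 2, 1), (3, 3, -2);

-- C[i, j] = SUM over k of A[i, k] * B[k, j]
CREATE VIEW MatrixC AS
SELECT MatrixA.i AS i, MatrixB.j AS j,
       SUM(MatrixA.element * MatrixB.element) AS element
  FROM MatrixA, MatrixB
 WHERE MatrixA.k = MatrixB.k
 GROUP BY MatrixA.i, MatrixB.j;
""")

product = con.execute(
    "SELECT i, j, element FROM MatrixC ORDER BY i, j"
).fetchall()
print(product)
```

For the sample data the product has rows (2, 5, -23) and (3, 0, -7), which matches the result of multiplying the two matrices by hand.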
25.3.4 Other Matrix Operations
The transposition of a matrix is easy to do:
CREATE VIEW TransA (i, j, element)
AS SELECT j, i, element FROM MatrixA;
Again, you can make the change permanent with an UPDATE statement:
UPDATE MatrixA
SET i = j, j = i;
Multiplication by a column or row vector is just a special case of matrix multiplication, but a bit easier. Given the vector V and MatrixA:
SELECT A.i, SUM(A.element * V.element)
  FROM MatrixA AS A, VectorV AS V
 WHERE V.j = A.j
 GROUP BY A.i;
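As a sketch of the matrix-by-column-vector case, the example below matches each vector element to the matrix column with the same subscript j and sums along each row i. The table layouts MatrixA(i, j, element) and VectorV(j, element), the sample values, and the use of SQLite through Python's sqlite3 module are all assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MatrixA
(i INTEGER NOT NULL, j INTEGER NOT NULL, element INTEGER NOT NULL,
 PRIMARY KEY (i, j));
CREATE TABLE VectorV
(j INTEGER NOT NULL PRIMARY KEY, element INTEGER NOT NULL);

-- A 2 x 3 matrix and a 3-element column vector (illustrative values).
INSERT INTO MatrixA VALUES
 (1, 1, 2), (1, 2, -3), (1, 3, 4),
 (2, 1, -1), (2, 2, 0), (2, 3, 2);
INSERT INTO VectorV VALUES (1, 1), (2, 2), (3, 3);
""")

# Pair column j of the matrix with element j of the vector, then sum per row.
result = con.execute("""
SELECT A.i, SUM(A.element * V.element) AS element
  FROM MatrixA AS A, VectorV AS V
 WHERE V.j = A.j
 GROUP BY A.i
 ORDER BY A.i
""").fetchall()
print(result)
```

The first row gives 2*1 + (-3)*2 + 4*3 = 8 and the second gives (-1)*1 + 0*2 + 2*3 = 5.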
Cross tabulations and other statistical functions traditionally use an array to hold data. But you do not need a matrix for them in SQL.
It is possible to do other matrix operations in SQL, but the code becomes so complex, and the execution time so long, that it is simply not worth the effort. If a reader would like to submit queries for eigenvalues and determinants, I will be happy to put them in future editions of this book.
25.4 Flattening a Table into an Array
Reports and data warehouse summary tables often want to see an array laid horizontally across a line. The original one element/one column approach to mapping arrays was based on seeing such reports and duplicating that structure in a table. A subscript is often an enumeration, denoting a month or another time period, rather than an integer.
For example, a row in a “Salesmen” table might have a dozen columns, one for each month of the year, each of which holds the total commission earned in a particular month. The year is really an array, subscripted by the month. The subscripts-and-value approach requires more work to produce the same results.
It is often easier to explain a technique with an example. Let us imagine a company that collects time cards from its truck drivers, each with the driver’s name, the week within the year (numbered 0 to 51 or 52, depending on the year), and his total hours. We want to produce a report with one line for each driver and six weeks of his time across the page. The Timecards table looks like this:
CREATE TABLE Timecards
(driver_name CHAR(25) NOT NULL,
 week_nbr INTEGER NOT NULL
   CONSTRAINT valid_week_nbr
   CHECK (week_nbr BETWEEN 0 AND 52),
 work_hrs INTEGER
   CONSTRAINT zero_or_more_hours
   CHECK (work_hrs >= 0),
 PRIMARY KEY (driver_name, week_nbr));
We need to “flatten out” this table to get the desired rows for the report. First, create a working storage table from which the report can be built:
CREATE TEMPORARY TABLE TimeReportWork -- working storage
(driver_name CHAR(25) NOT NULL,
 wk1 INTEGER, -- important that these columns are NULL-able
 wk2 INTEGER,
 wk3 INTEGER,
 wk4 INTEGER,
 wk5 INTEGER,
 wk6 INTEGER);
Notice two important points about this table. First, there is no primary key; second, the weekly data columns are NULL-able. This table is then filled with time card values:
INSERT INTO TimeReportWork (driver_name, wk1, wk2, wk3, wk4, wk5, wk6)
SELECT driver_name,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr THEN work_hrs ELSE 0 END) AS wk1,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 1 THEN work_hrs ELSE 0 END) AS wk2,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 2 THEN work_hrs ELSE 0 END) AS wk3,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 3 THEN work_hrs ELSE 0 END) AS wk4,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 4 THEN work_hrs ELSE 0 END) AS wk5,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 5 THEN work_hrs ELSE 0 END) AS wk6
  FROM Timecards
 WHERE week_nbr BETWEEN (:rpt_week_nbr - 5) AND :rpt_week_nbr
 GROUP BY driver_name;
The number of weeks in the WHERE clause will vary with the period covered by the report. The parameter :rpt_week_nbr is the “week of the report,” and the query computes backwards from it for the prior five weeks. If a driver did not work in a particular week, the corresponding weekly column gets a zero hour total. However, if the driver has not worked at all in the last six weeks, we could lose him completely (no time cards, no summary). Depending on the nature of the report, you might consider using an OUTER JOIN to a Personnel table to be sure you have all the drivers’ names.
The NULLs are coalesced to zero in this example, but if you drop the ELSE 0 clauses, the SUM() will have to deal with a week of all NULLs and return a NULL. This enables you to tell the difference between a driver who was missing for the reporting period and a driver who worked zero hours but turned in a time card for that period. That difference could be important for computing the payroll.
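The difference between keeping and dropping the ELSE 0 clauses can be seen by running both forms of the query side by side. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the helper flatten_timecards, its keep_nulls flag, and the two sample drivers are hypothetical, and the query is run as a plain SELECT rather than an INSERT into the working table.

```python
import sqlite3


def flatten_timecards(rows, rpt_week_nbr, keep_nulls=False):
    """Pivot six weeks of time cards into one row per driver.

    With keep_nulls=True the ELSE 0 clauses are dropped, so a week
    with no time card comes back as NULL (None) instead of zero.
    """
    else_part = "" if keep_nulls else " ELSE 0"
    week_cols = ",\n           ".join(
        f"SUM(CASE WHEN week_nbr = :rpt_week_nbr - {n} "
        f"THEN work_hrs{else_part} END) AS wk{n + 1}"
        for n in range(6)
    )
    query = f"""
    SELECT driver_name,
           {week_cols}
      FROM Timecards
     WHERE week_nbr BETWEEN (:rpt_week_nbr - 5) AND :rpt_week_nbr
     GROUP BY driver_name
     ORDER BY driver_name
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Timecards "
        "(driver_name CHAR(25) NOT NULL, "
        " week_nbr INTEGER NOT NULL CHECK (week_nbr BETWEEN 0 AND 52), "
        " work_hrs INTEGER CHECK (work_hrs >= 0), "
        " PRIMARY KEY (driver_name, week_nbr))"
    )
    con.executemany("INSERT INTO Timecards VALUES (?, ?, ?)", rows)
    report = con.execute(query, {"rpt_week_nbr": rpt_week_nbr}).fetchall()
    con.close()
    return report


if __name__ == "__main__":
    # 'Bob' turned in a card with zero hours for week 8.
    cards = [("Al", 10, 40), ("Al", 9, 35), ("Bob", 8, 0)]
    print(flatten_timecards(cards, 10))
    print(flatten_timecards(cards, 10, keep_nulls=True))
```

In the second form, Bob's wk3 column is 0 while his other weeks are NULL, which is exactly the distinction between a zero-hour time card and a missing one.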
25.5 Comparing Arrays in Table Format
It is often necessary to compare one array or set of values with another when the data is represented in a table. Remember that comparing a set with a set does not involve ordering the elements, whereas an array does. For this discussion, let us create two tables, one for employees and one for their dependents. The children are subscripted in the order of their births; that is, 1 is the oldest living child, 2 is the second oldest, and so forth.
CREATE TABLE Employees
(emp_id INTEGER PRIMARY KEY,
 emp_name CHAR(15) NOT NULL,
 ... );
CREATE TABLE Dependents
(emp_id INTEGER NOT NULL, -- the parent
 kid CHAR(15) NOT NULL, -- the array element
 birthorder INTEGER NOT NULL, -- the array subscript
 PRIMARY KEY (emp_id, kid));
The query “Find pairs of employees whose children have the same set of names” is very restrictive, but we can make it more so by requiring that the children be named in the same birth order. Both Mr. X and Mr. Y must have exactly the same number of dependents; both sets of names must match. We can assume that no parent has two children with the same name (George Foreman does not work here) or born at the same time (we will order twins). Let us begin by inserting test data into the Dependents table, thus:
Dependents
emp_id kid birthorder
==========================
1 'Dick' 2
1 'Harry' 3
1 'Tom' 1
2 'Dick' 3
2 'Harry' 1
2 'Tom' 2
3 'Dick' 2
3 'Harry' 3
3 'Tom' 1
4 'Harry' 1
4 'Tom' 2
5 'Curly' 2
5 'Harry' 3
5 'Moe' 1
In this test data, employees 1, 2, and 3 all have dependents named ‘Tom’, ‘Dick’, and ‘Harry’. The birth order is the same for the children of employees 1 and 3, but not for employee 2. For testing purposes, you might consider adding an extra child to the family of employee 3, and so forth, to play with this data.
Though there are many ways to solve this query, this approach will give us some flexibility that others would not. Construct a VIEW that gives us the number of dependents for each employee:
CREATE VIEW Familysize (emp_id, tally)
AS SELECT emp_id, COUNT(*)
     FROM Dependents
    GROUP BY emp_id;
Create a second VIEW that holds pairs of employees who have families of the same size. (This VIEW is also useful for other statistical work, but that is another topic.)
CREATE VIEW Samesize (emp_id1, emp_id2, tally)
AS SELECT F1.emp_id, F2.emp_id, F1.tally
FROM Familysize AS F1, Familysize AS F2
WHERE F1.tally = F2.tally
AND F1.emp_id < F2.emp_id;
We will test for set equality by doing a self-JOIN on the dependents of employees with families of the same size. If one set can be mapped onto another with no children left over, and in the same birth order, then the two sets are equal.
SELECT D1.emp_id, ' named his ',
S1.tally, ' kids just like ',
D2.emp_id
FROM Dependents AS D1, Dependents AS D2, Samesize AS S1
WHERE S1.emp_id1 = D1.emp_id
AND S1.emp_id2 = D2.emp_id
AND D1.kid = D2.kid
AND D1.birthorder = D2.birthorder
GROUP BY D1.emp_id, D2.emp_id, S1.tally
HAVING COUNT(*) = S1.tally;
If birth order is not important, then drop the predicate D1.birthorder = D2.birthorder from the query.
This is a form of exact relational division, with a second column equality test as part of the criteria.
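The whole approach can be tried against the test data in a scratch database. The sketch below assumes SQLite via Python's sqlite3 module; the views use column aliases instead of column lists, the literal strings are dropped from the SELECT list, and same_named_families with its match_birth_order flag is a hypothetical helper added for illustration.

```python
import sqlite3

SETUP = """
CREATE TABLE Dependents
(emp_id INTEGER NOT NULL,      -- the parent
 kid CHAR(15) NOT NULL,        -- the array element
 birthorder INTEGER NOT NULL,  -- the array subscript
 PRIMARY KEY (emp_id, kid));

INSERT INTO Dependents VALUES
 (1, 'Dick', 2), (1, 'Harry', 3), (1, 'Tom', 1),
 (2, 'Dick', 3), (2, 'Harry', 1), (2, 'Tom', 2),
 (3, 'Dick', 2), (3, 'Harry', 3), (3, 'Tom', 1),
 (4, 'Harry', 1), (4, 'Tom', 2),
 (5, 'Curly', 2), (5, 'Harry', 3), (5, 'Moe', 1);

CREATE VIEW Familysize AS
SELECT emp_id, COUNT(*) AS tally
  FROM Dependents
 GROUP BY emp_id;

CREATE VIEW Samesize AS
SELECT F1.emp_id AS emp_id1, F2.emp_id AS emp_id2, F1.tally AS tally
  FROM Familysize AS F1, Familysize AS F2
 WHERE F1.tally = F2.tally
   AND F1.emp_id < F2.emp_id;
"""


def same_named_families(match_birth_order=True):
    """Return (emp_id1, emp_id2) pairs whose children have the same names."""
    order_test = "AND D1.birthorder = D2.birthorder" if match_birth_order else ""
    query = f"""
    SELECT D1.emp_id, D2.emp_id
      FROM Dependents AS D1, Dependents AS D2, Samesize AS S1
     WHERE S1.emp_id1 = D1.emp_id
       AND S1.emp_id2 = D2.emp_id
       AND D1.kid = D2.kid
       {order_test}
     GROUP BY D1.emp_id, D2.emp_id, S1.tally
    HAVING COUNT(*) = S1.tally
     ORDER BY D1.emp_id, D2.emp_id
    """
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    pairs = con.execute(query).fetchall()
    con.close()
    return pairs


if __name__ == "__main__":
    print(same_named_families())                         # names and birth order
    print(same_named_families(match_birth_order=False))  # names only
```

With the birth order predicate, only employees 1 and 3 pair up; without it, employee 2 matches both of them as well.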
CHAPTER 26
Set Operations
By set operations, I mean union, intersection, and set differences, where the sets in SQL are tables. These are the basic operators used in elementary set theory, which has been taught in the United States public school systems for decades. Since the relational model is based on sets, you would expect that SQL would have had a good variety of set operators from the start. However, this was not the case. Standard SQL has added the basic set operators, but they are still not common in actual products.
There is another problem in SQL that you did not have in high school set theory. SQL tables are multisets (also called bags), which means that, unlike sets, they allow duplicate elements (rows or tuples). Dr. Codd’s relational model is stricter and uses only true sets. SQL handles these duplicate rows with an ALL or DISTINCT modifier in different places in the language; ALL preserves duplicates, and DISTINCT removes them.
So that we can discuss the result of each operator formally, let R be a row that is a duplicate of some row in TableA, or of some row in TableB, or of both. Let m be the number of duplicates of R in TableA and let n be the number of duplicates of R in TableB, where (m >= 0) and (n >= 0). Informally, the engines will pair off the two tables on a row-per-row basis in set operations. We will see how this works for each operator.
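As a quick illustration of the m and n notation, the sketch below loads m copies of a row into TableA and n copies into TableB, then counts what UNION ALL and UNION return. It assumes SQLite via Python's sqlite3 module, a single-column table layout, and a hypothetical helper named union_counts.

```python
import sqlite3


def union_counts(m, n):
    """Return (rows from UNION ALL, rows from UNION) for m and n duplicates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TableA (x INTEGER NOT NULL)")
    con.execute("CREATE TABLE TableB (x INTEGER NOT NULL)")
    # The row R is the single value 7, repeated m and n times.
    con.executemany("INSERT INTO TableA VALUES (?)", [(7,)] * m)
    con.executemany("INSERT INTO TableB VALUES (?)", [(7,)] * n)
    all_rows = con.execute(
        "SELECT x FROM TableA UNION ALL SELECT x FROM TableB"
    ).fetchall()
    distinct_rows = con.execute(
        "SELECT x FROM TableA UNION SELECT x FROM TableB"
    ).fetchall()
    con.close()
    return len(all_rows), len(distinct_rows)


if __name__ == "__main__":
    print(union_counts(2, 3))
```

UNION ALL keeps all (m + n) copies of R, while plain UNION removes the duplicates and keeps one.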