Joe Celko's SQL for Smarties: Advanced SQL Programming


582 CHAPTER 25: ARRAYS IN SQL

 j INTEGER NOT NULL CHECK (j > 0),
 CHECK ((SELECT MAX(i) FROM MyMatrix) = (SELECT COUNT(i) FROM MyMatrix)),
 CHECK ((SELECT MAX(j) FROM MyMatrix) = (SELECT COUNT(j) FROM MyMatrix)));

The constraints see that the subscripts of each element are within proper range. I am starting my subscripts at one, but a little change in the logic would allow any value.

25.3.1 Matrix Equality

This test for matrix equality is from the article “SQL Matrix Processing” (Mrdalj, Vujovic, and Jovanovic 1996). Two matrices are equal if their cardinalities and the cardinality of their intersection are all equal.

SELECT COUNT(*) FROM MatrixA
UNION
SELECT COUNT(*) FROM MatrixB
UNION
SELECT COUNT(*)
  FROM MatrixA AS A, MatrixB AS B
 WHERE A.i = B.i
   AND A.j = B.j
   AND A.element = B.element;

You have to decide how to use this query in your context. If it returns a single row, the three counts agree and the matrices are the same; otherwise, they are different.
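As a rough illustration of the equality test, the sketch below runs the three-way UNION in an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite, the helper name matrices_equal, and the sample data are assumptions made for this example; the tables follow the (i, j, element) layout used in the query.

```python
import sqlite3

def matrices_equal(rows_a, rows_b):
    """Return True when the three-way UNION of counts collapses to one row."""
    con = sqlite3.connect(":memory:")
    for name in ("MatrixA", "MatrixB"):
        con.execute(
            f"CREATE TABLE {name} (i INTEGER NOT NULL, j INTEGER NOT NULL, "
            "element INTEGER NOT NULL, PRIMARY KEY (i, j))"
        )
    con.executemany("INSERT INTO MatrixA VALUES (?, ?, ?)", rows_a)
    con.executemany("INSERT INTO MatrixB VALUES (?, ?, ?)", rows_b)
    counts = con.execute(
        """
        SELECT COUNT(*) FROM MatrixA
        UNION
        SELECT COUNT(*) FROM MatrixB
        UNION
        SELECT COUNT(*)
          FROM MatrixA AS A, MatrixB AS B
         WHERE A.i = B.i AND A.j = B.j AND A.element = B.element
        """
    ).fetchall()
    con.close()
    # UNION removes duplicate counts, so equal matrices leave exactly one row.
    return len(counts) == 1

# Hypothetical 2 x 2 test matrices; "changed" differs in one element.
same = [(1, 1, 5), (1, 2, 7), (2, 1, 0), (2, 2, 3)]
changed = [(1, 1, 5), (1, 2, 7), (2, 1, 0), (2, 2, 4)]
print(matrices_equal(same, same))     # True
print(matrices_equal(same, changed))  # False
```

Wrapping the query in a function that tests for a one-row result is only one way to use it; an EXISTS or a count of the UNION rows inside SQL would serve equally well.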

25.3.2 Matrix Addition

Matrix addition and subtraction are possible only between matrices of the same dimensions. The obvious way to do the addition is simply:

SELECT A.i, A.j, (A.element + B.element) AS total FROM MatrixA AS A, MatrixB AS B

WHERE A.i = B.i AND A.j = B.j;

But properly, you ought to add some checking to be sure the matrices match. We can assume that both start numbering subscripts with either one or zero.


25.3 Matrix Operations in SQL 583

SELECT A.i, A.j, (A.element + B.element) AS total
  FROM MatrixA AS A, MatrixB AS B
 WHERE A.i = B.i
   AND A.j = B.j
   AND (SELECT COUNT(*) FROM MatrixA)
       = (SELECT COUNT(*) FROM MatrixB)
   AND (SELECT MAX(i) FROM MatrixA)
       = (SELECT MAX(i) FROM MatrixB)
   AND (SELECT MAX(j) FROM MatrixA)
       = (SELECT MAX(j) FROM MatrixB);
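To see what the dimension checks buy you, here is a small sketch that runs the guarded addition in SQLite via Python's sqlite3 module. The engine, the helper name add_matrices, the added ORDER BY, and the sample matrices are assumptions for illustration only. When the shapes differ, the guard predicates are false for every row and the query returns nothing instead of a partial sum.

```python
import sqlite3

GUARDED_ADD = """
SELECT A.i, A.j, (A.element + B.element) AS total
  FROM MatrixA AS A, MatrixB AS B
 WHERE A.i = B.i
   AND A.j = B.j
   AND (SELECT COUNT(*) FROM MatrixA) = (SELECT COUNT(*) FROM MatrixB)
   AND (SELECT MAX(i) FROM MatrixA) = (SELECT MAX(i) FROM MatrixB)
   AND (SELECT MAX(j) FROM MatrixA) = (SELECT MAX(j) FROM MatrixB)
 ORDER BY A.i, A.j
"""

def add_matrices(rows_a, rows_b):
    """Load two (i, j, element) matrices and return the guarded sum."""
    con = sqlite3.connect(":memory:")
    for name, rows in (("MatrixA", rows_a), ("MatrixB", rows_b)):
        con.execute(
            f"CREATE TABLE {name} (i INTEGER NOT NULL, j INTEGER NOT NULL, "
            "element INTEGER NOT NULL, PRIMARY KEY (i, j))"
        )
        con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    result = con.execute(GUARDED_ADD).fetchall()
    con.close()
    return result

# Hypothetical data: two 2 x 2 matrices and one 1 x 4 matrix.
square_a = [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4)]
square_b = [(1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 2, 40)]
wide = [(1, 1, 1), (1, 2, 2), (1, 3, 3), (1, 4, 4)]

print(add_matrices(square_a, square_b))
# [(1, 1, 11), (1, 2, 22), (2, 1, 33), (2, 2, 44)]
print(add_matrices(square_a, wide))  # [] because the shapes differ
```

Note that the 1 x 4 matrix has the same element count as the 2 x 2 one, so it is the MAX(i) and MAX(j) comparisons, not the COUNT(*) comparison, that reject it.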

Likewise, to make the addition permanent, you can use the same basic query in an UPDATE statement:

UPDATE MatrixA
   SET element = element + (SELECT element
                              FROM MatrixB
                             WHERE MatrixB.i = MatrixA.i
                               AND MatrixB.j = MatrixA.j)
 WHERE (SELECT COUNT(*) FROM MatrixA)
       = (SELECT COUNT(*) FROM MatrixB)
   AND (SELECT MAX(i) FROM MatrixA)
       = (SELECT MAX(i) FROM MatrixB)
   AND (SELECT MAX(j) FROM MatrixA)
       = (SELECT MAX(j) FROM MatrixB);

25.3.3 Matrix Multiplication

Multiplication by a scalar constant is direct and easy:

UPDATE MyMatrix

SET element = element * :constant;

Matrix multiplication is not as big a mess as might be expected. Remember that the first matrix must have the same number of columns as the second matrix has rows. That means each element C[i, j] is the sum over k of A[i, k] * B[k, j], which we can show with an example:

CREATE TABLE MatrixA
(i INTEGER NOT NULL
   CHECK (i BETWEEN 1 AND 10), -- pick your own bounds
 k INTEGER NOT NULL
   CHECK (k BETWEEN 1 AND 10), -- must match MatrixB.k range
 element INTEGER NOT NULL,
 PRIMARY KEY (i, k));

MatrixA

i k element

===================

1 1 2

1 2 -3

1 3 4

2 1 -1

2 2 0

2 3 2

CREATE TABLE MatrixB
(k INTEGER NOT NULL
   CHECK (k BETWEEN 1 AND 10), -- must match MatrixA.k range
 j INTEGER NOT NULL
   CHECK (j BETWEEN 1 AND 4), -- pick your own bounds
 element INTEGER NOT NULL,
 PRIMARY KEY (k, j));

MatrixB

k j element

==================

1 1 -1

1 2 2

1 3 3

2 1 0

2 2 1

2 3 7

3 1 1

3 2 1

3 3 -2

CREATE VIEW MatrixC (i, j, element)
AS SELECT i, j, SUM(MatrixA.element * MatrixB.element)
     FROM MatrixA, MatrixB
    WHERE MatrixA.k = MatrixB.k
    GROUP BY i, j;


This is taken directly from the definition of multiplication.
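The sample data for MatrixA and MatrixB can be pushed through the MatrixC view to check the arithmetic. The sketch below does that in SQLite via Python's sqlite3 module; the engine is an assumption for the example, and the range CHECK constraints are left out to keep it short.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MatrixA (i INTEGER NOT NULL, k INTEGER NOT NULL,
                      element INTEGER NOT NULL, PRIMARY KEY (i, k));
CREATE TABLE MatrixB (k INTEGER NOT NULL, j INTEGER NOT NULL,
                      element INTEGER NOT NULL, PRIMARY KEY (k, j));

-- the 2 x 3 and 3 x 3 sample matrices from the text
INSERT INTO MatrixA VALUES (1,1,2), (1,2,-3), (1,3,4),
                           (2,1,-1), (2,2,0), (2,3,2);
INSERT INTO MatrixB VALUES (1,1,-1), (1,2,2), (1,3,3),
                           (2,1,0), (2,2,1), (2,3,7),
                           (3,1,1), (3,2,1), (3,3,-2);

CREATE VIEW MatrixC (i, j, element)
AS SELECT i, j, SUM(MatrixA.element * MatrixB.element)
     FROM MatrixA, MatrixB
    WHERE MatrixA.k = MatrixB.k
    GROUP BY i, j;
""")

product = con.execute(
    "SELECT i, j, element FROM MatrixC ORDER BY i, j"
).fetchall()
for row in product:
    print(row)
# (1, 1, 2)
# (1, 2, 5)
# (1, 3, -23)
# (2, 1, 3)
# (2, 2, 0)
# (2, 3, -7)
```

For instance, C[1, 3] is 2 * 3 + (-3) * 7 + 4 * (-2) = -23: the join pairs each A[1, k] with B[k, 3], and the GROUP BY sums the three products.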

25.3.4 Other Matrix Operations

The transposition of a matrix is easy to do:

CREATE VIEW TransA (i, j, element)

AS SELECT j, i, element FROM MatrixA;

Again, you can make the change permanent with an UPDATE statement:

UPDATE MatrixA

SET i = j, j = i;
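A quick way to convince yourself that the view and the UPDATE agree is to run both on the same data. This sketch uses SQLite through Python's sqlite3 module, which is an assumption of the example, as is the sample matrix. The table is declared without a primary key here so that the sketch does not depend on when a particular engine checks uniqueness while the subscripts are being swapped.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- keyless on purpose; see the note above (an assumption of this sketch)
CREATE TABLE MatrixA (i INTEGER NOT NULL, j INTEGER NOT NULL,
                      element INTEGER NOT NULL);
INSERT INTO MatrixA VALUES (1,1,2), (1,2,-3), (1,3,4),
                           (2,1,-1), (2,2,0), (2,3,2);
CREATE VIEW TransA (i, j, element)
AS SELECT j, i, element FROM MatrixA;
""")

# Transposition as a view: the base table is left alone.
via_view = con.execute(
    "SELECT i, j, element FROM TransA ORDER BY i, j"
).fetchall()

# Transposition made permanent: both right-hand sides use the old values.
con.execute("UPDATE MatrixA SET i = j, j = i")
via_update = con.execute(
    "SELECT i, j, element FROM MatrixA ORDER BY i, j"
).fetchall()

print(via_view == via_update)  # True
print(via_update)
# [(1, 1, 2), (1, 2, -1), (2, 1, -3), (2, 2, 0), (3, 1, 4), (3, 2, 2)]
```

The swap works in one statement because the SET clause is evaluated against the row as it was before the update, so no temporary column is needed.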

Multiplication by a column or row vector is just a special case of matrix multiplication, but a bit easier. Given the column vector VectorV (j, element) and MatrixA (i, j, element), join on the shared subscript j and sum within each row i:

SELECT A.i, SUM(A.element * V.element) AS element
  FROM MatrixA AS A, VectorV AS V
 WHERE V.j = A.j
 GROUP BY A.i;
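As a sketch of the column-vector case, the example below multiplies a 2 x 3 matrix by a three-element vector in SQLite via Python's sqlite3 module. It assumes MatrixA has columns (i, j, element) and VectorV has columns (j, element), and it joins on the shared subscript j so that each output row i is the sum of A[i, j] * V[j]. The engine and the sample values are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MatrixA (i INTEGER NOT NULL, j INTEGER NOT NULL,
                      element INTEGER NOT NULL, PRIMARY KEY (i, j));
CREATE TABLE VectorV (j INTEGER NOT NULL PRIMARY KEY,
                      element INTEGER NOT NULL);

-- hypothetical sample data: a 2 x 3 matrix and a column vector (1, 2, 3)
INSERT INTO MatrixA VALUES (1,1,2), (1,2,-3), (1,3,4),
                           (2,1,-1), (2,2,0), (2,3,2);
INSERT INTO VectorV VALUES (1,1), (2,2), (3,3);
""")

matrix_times_vector = con.execute("""
    SELECT A.i, SUM(A.element * V.element) AS element
      FROM MatrixA AS A, VectorV AS V
     WHERE V.j = A.j       -- pair each matrix column with its vector entry
     GROUP BY A.i          -- one output element per matrix row
     ORDER BY A.i
""").fetchall()

print(matrix_times_vector)  # [(1, 8), (2, 5)]
```

Row 1 works out to 2 * 1 + (-3) * 2 + 4 * 3 = 8 and row 2 to (-1) * 1 + 0 * 2 + 2 * 3 = 5. For a row vector on the left, the join would be on the row subscript and the grouping on the column subscript instead.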

Cross tabulations and other statistical functions traditionally use an array to hold data. But you do not need a matrix for them in SQL.

It is possible to do other matrix operations in SQL, but the code becomes so complex, and the execution time so long, that it is simply not worth the effort. If a reader would like to submit queries for eigenvalues and determinants, I will be happy to put them in future editions of this book.

25.4 Flattening a Table into an Array

Reports and data warehouse summary tables often want to see an array laid horizontally across a line. The original one element/one column approach to mapping arrays was based on seeing such reports and duplicating that structure in a table. A subscript is often an enumeration, denoting a month or another time period, rather than an integer.

For example, a row in a “Salesmen” table might have a dozen columns, one for each month of the year, each of which holds the total commission earned in a particular month. The year is really an array, subscripted by the month. The subscripts-and-value approach requires more work to produce the same results.

It is often easier to explain a technique with an example. Let us imagine a company that collects time cards from its truck drivers, each with the driver’s name, the week within the year (numbered 0 to 51 or 52, depending on the year), and his total hours. We want to produce a report with one line for each driver and six weeks of his time across the page. The Timecards table looks like this:

CREATE TABLE Timecards
(driver_name CHAR(25) NOT NULL,
 week_nbr INTEGER NOT NULL
   CONSTRAINT valid_week_nbr
   CHECK (week_nbr BETWEEN 0 AND 52),
 work_hrs INTEGER
   CONSTRAINT zero_or_more_hours
   CHECK (work_hrs >= 0),
 PRIMARY KEY (driver_name, week_nbr));

We need to “flatten out” this table to get the desired rows for the report. First, create a working storage table from which the report can be built:

CREATE TEMPORARY TABLE TimeReportWork -- working storage
(driver_name CHAR(25) NOT NULL,
 wk1 INTEGER, -- important that these columns are NULL-able
 wk2 INTEGER,
 wk3 INTEGER,
 wk4 INTEGER,
 wk5 INTEGER,
 wk6 INTEGER);

Notice two important points about this table. First, there is no primary key; second, the weekly data columns are NULL-able. This table is then filled with time card values:

INSERT INTO TimeReportWork (driver_name, wk1, wk2, wk3, wk4, wk5, wk6)
SELECT driver_name,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr THEN work_hrs ELSE 0 END) AS wk1,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 1 THEN work_hrs ELSE 0 END) AS wk2,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 2 THEN work_hrs ELSE 0 END) AS wk3,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 3 THEN work_hrs ELSE 0 END) AS wk4,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 4 THEN work_hrs ELSE 0 END) AS wk5,
       SUM(CASE WHEN week_nbr = :rpt_week_nbr - 5 THEN work_hrs ELSE 0 END) AS wk6
  FROM Timecards
 WHERE week_nbr BETWEEN (:rpt_week_nbr - 5) AND :rpt_week_nbr
 GROUP BY driver_name;

The number of weeks in the WHERE clause will vary with the period covered by the report. The parameter :rpt_week_nbr is the “week of the report,” and the query computes backwards from it for the prior five weeks. If a driver did not work in a particular week, the corresponding weekly column gets a zero hour total. However, if the driver has not worked at all in the last six weeks, we could lose him completely (no time cards, no summary). Depending on the nature of the report, you might consider using an OUTER JOIN to a Personnel table to be sure you have all the drivers’ names.

The NULLs are coalesced to zero in this example, but if you drop the ELSE 0 clauses, the SUM() will have to deal with a week of all NULLs and return a NULL. This enables you to tell the difference between a driver who was missing for the reporting period and a driver who worked zero hours but turned in a time card for that period. That difference could be important for computing the payroll.
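The difference between keeping and dropping the ELSE 0 clauses is easier to see with data. The sketch below runs the flattening query both ways in SQLite through Python's sqlite3 module. The engine, the helper flatten_sql (which only generates the six repetitive SUM(CASE ...) columns), the driver names, and the hours are all assumptions made for this example.

```python
import sqlite3

def flatten_sql(else_clause):
    """Build the six-week pivot; else_clause is ' ELSE 0' or ''."""
    cols = ",\n       ".join(
        f"SUM(CASE WHEN week_nbr = :rpt_week_nbr - {k} "
        f"THEN work_hrs{else_clause} END) AS wk{k + 1}"
        for k in range(6)
    )
    return f"""
SELECT driver_name,
       {cols}
  FROM Timecards
 WHERE week_nbr BETWEEN :rpt_week_nbr - 5 AND :rpt_week_nbr
 GROUP BY driver_name
 ORDER BY driver_name"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Timecards (driver_name CHAR(25) NOT NULL, "
    "week_nbr INTEGER NOT NULL CHECK (week_nbr BETWEEN 0 AND 52), "
    "work_hrs INTEGER CHECK (work_hrs >= 0), "
    "PRIMARY KEY (driver_name, week_nbr))"
)
con.executemany(
    "INSERT INTO Timecards VALUES (?, ?, ?)",
    [("Able", 10, 40), ("Able", 9, 35), ("Able", 5, 20),
     ("Baker", 8, 0),       # turned in a card for zero hours
     ("Charlie", 1, 45)],   # no card inside the six-week window
)

params = {"rpt_week_nbr": 10}
with_zero = con.execute(flatten_sql(" ELSE 0"), params).fetchall()
with_null = con.execute(flatten_sql(""), params).fetchall()

print(with_zero)
# [('Able', 40, 35, 0, 0, 0, 20), ('Baker', 0, 0, 0, 0, 0, 0)]
print(with_null)
# [('Able', 40, 35, None, None, None, 20),
#  ('Baker', None, None, 0, None, None, None)]
```

With ELSE 0, Baker's zero-hour card for week 8 is indistinguishable from the weeks with no card at all; without it, only wk3 is zero and the rest are NULL. Charlie has no card in the window, so he drops out of both results, which is the case an OUTER JOIN to a personnel table would cover.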

25.5 Comparing Arrays in Table Format

It is often necessary to compare one array or set of values with another when the data is represented in a table. Remember that comparing a set with a set does not involve ordering the elements, whereas comparing arrays does. For this discussion, let us create two tables, one for employees and one for their dependents. The children are subscripted in the order of their births; i.e., 1 is the oldest living child, 2 is the second oldest, and so forth.

CREATE TABLE Employees
(emp_id INTEGER PRIMARY KEY,
 emp_name CHAR(15) NOT NULL,
 ...);

CREATE TABLE Dependents
(emp_id INTEGER NOT NULL, -- the parent
 kid CHAR(15) NOT NULL, -- the array element
 birthorder INTEGER NOT NULL, -- the array subscript
 PRIMARY KEY (emp_id, kid));


The query “Find pairs of employees whose children have the same set of names” is very restrictive, but we can make it more so by requiring that the children be named in the same birth order. Both Mr. X and Mr. Y must have exactly the same number of dependents; both sets of names must match. We can assume that no parent has two children with the same name (George Foreman does not work here) or born at the same time (we will order twins). Let us begin by inserting test data into the Dependents table, thus:

Dependents

emp_id kid birthorder

==========================

1 'Dick' 2

1 'Harry' 3

1 'Tom' 1

2 'Dick' 3

2 'Harry' 1

2 'Tom' 2

3 'Dick' 2

3 'Harry' 3

3 'Tom' 1

4 'Harry' 1

4 'Tom' 2

5 'Curly' 2

5 'Harry' 3

5 'Moe' 1

In this test data, employees 1, 2, and 3 all have dependents named ‘Tom’, ‘Dick’, and ‘Harry’. The birth order is the same for the children of employees 1 and 3, but not for employee 2.

For testing purposes, you might consider adding an extra child to the family of employee 3, and so forth, to play with this data.

Though there are many ways to solve this query, this approach will give us some flexibility that others would not. Construct a VIEW that gives us the number of dependents for each employee:

CREATE VIEW Familysize (emp_id, tally)

AS SELECT emp_id, COUNT(*) FROM Dependents GROUP BY emp_id;


Create a second VIEW that holds pairs of employees who have families of the same size. (This VIEW is also useful for other statistical work, but that is another topic.)

CREATE VIEW Samesize (emp_id1, emp_id2, tally)

AS SELECT F1.emp_id, F2.emp_id, F1.tally

FROM Familysize AS F1, Familysize AS F2

WHERE F1.tally = F2.tally

AND F1.emp_id < F2.emp_id;

We will test for set equality by doing a self-JOIN on the dependents of employees with families of the same size. If one set can be mapped onto another with no children left over, and in the same birth order, then the two sets are equal.

SELECT D1.emp_id, ' named his ',

S1.tally, ' kids just like ',

D2.emp_id

FROM Dependents AS D1, Dependents AS D2, Samesize AS S1

WHERE S1.emp_id1 = D1.emp_id

AND S1.emp_id2 = D2.emp_id

AND D1.kid = D2.kid

AND D1.birthorder = D2.birthorder

GROUP BY D1.emp_id, D2.emp_id, S1.tally

HAVING COUNT(*) = S1.tally;

If birth order is not important, then drop the predicate D1.birthorder = D2.birthorder from the query.

This is a form of exact relational division, with a second column equality test as part of the criteria.
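The test data can be run through the two views and the final query to see both variants side by side. This sketch uses SQLite via Python's sqlite3 module; the engine, the helper name same_named_families, and the trimmed SELECT list (two employee numbers and the tally, without the string literals) are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dependents
(emp_id INTEGER NOT NULL, kid CHAR(15) NOT NULL,
 birthorder INTEGER NOT NULL, PRIMARY KEY (emp_id, kid));

INSERT INTO Dependents VALUES
 (1,'Dick',2), (1,'Harry',3), (1,'Tom',1),
 (2,'Dick',3), (2,'Harry',1), (2,'Tom',2),
 (3,'Dick',2), (3,'Harry',3), (3,'Tom',1),
 (4,'Harry',1), (4,'Tom',2),
 (5,'Curly',2), (5,'Harry',3), (5,'Moe',1);

CREATE VIEW Familysize (emp_id, tally)
AS SELECT emp_id, COUNT(*) FROM Dependents GROUP BY emp_id;

CREATE VIEW Samesize (emp_id1, emp_id2, tally)
AS SELECT F1.emp_id, F2.emp_id, F1.tally
     FROM Familysize AS F1, Familysize AS F2
    WHERE F1.tally = F2.tally
      AND F1.emp_id < F2.emp_id;
""")

def same_named_families(con, respect_birth_order=True):
    """Pairs of employees whose children match by name (and optionally order)."""
    order_pred = "AND D1.birthorder = D2.birthorder" if respect_birth_order else ""
    return con.execute(f"""
        SELECT D1.emp_id, D2.emp_id, S1.tally
          FROM Dependents AS D1, Dependents AS D2, Samesize AS S1
         WHERE S1.emp_id1 = D1.emp_id
           AND S1.emp_id2 = D2.emp_id
           AND D1.kid = D2.kid
           {order_pred}
         GROUP BY D1.emp_id, D2.emp_id, S1.tally
        HAVING COUNT(*) = S1.tally
         ORDER BY D1.emp_id, D2.emp_id
    """).fetchall()

ordered = same_named_families(con)
unordered = same_named_families(con, respect_birth_order=False)
print(ordered)    # [(1, 3, 3)]
print(unordered)  # [(1, 2, 3), (1, 3, 3), (2, 3, 3)]
```

Employee 5 has a family of three and shares the name 'Harry' with the others, but one matching child out of three fails the HAVING test, so no pair involving employee 5 appears in either result.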


CHAPTER 26: SET OPERATIONS

By set operations, I mean union, intersection, and set differences, where the sets in SQL are tables. These are the basic operators used in elementary set theory, which has been taught in the United States public school systems for decades. Since the relational model is based on sets, you would expect that SQL would have had a good variety of set operators from the start. However, this was not the case. Standard SQL has added the basic set operators, but they are still not common in actual products.

There is another problem in SQL that you did not have in high school set theory. SQL tables are multisets (also called bags), which means that, unlike sets, they allow duplicate elements (rows or tuples). Dr. Codd’s relational model is stricter and uses only true sets. SQL handles these duplicate rows with an ALL or DISTINCT modifier in different places in the language; ALL preserves duplicates, and DISTINCT removes them.

So that we can discuss the result of each operator formally, let R be a row that is a duplicate of some row in TableA, or of some row in TableB, or of both. Let m be the number of duplicates of R in TableA and let n be the number of duplicates of R in TableB, where (m >= 0) and (n >= 0). Informally, the engines will pair off the two tables on a row-per-row basis in set operations. We will see how this works for each operator.
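As a small illustration of the ALL versus DISTINCT distinction in terms of m and n, the sketch below puts m = 2 copies of a row in TableA and n = 3 copies in TableB, then compares UNION ALL with plain UNION. It runs in SQLite via Python's sqlite3 module; the engine, the single column val, and the row value 'R' are assumptions for the example, and only the union operator is shown.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TableA (val TEXT NOT NULL);
CREATE TABLE TableB (val TEXT NOT NULL);
INSERT INTO TableA VALUES ('R'), ('R');         -- m = 2 duplicates of R
INSERT INTO TableB VALUES ('R'), ('R'), ('R');  -- n = 3 duplicates of R
""")

# ALL preserves duplicates: every copy from both tables survives.
union_all = con.execute(
    "SELECT val FROM TableA UNION ALL SELECT val FROM TableB"
).fetchall()

# Plain UNION behaves like UNION DISTINCT: duplicates are removed.
union_distinct = con.execute(
    "SELECT val FROM TableA UNION SELECT val FROM TableB"
).fetchall()

print(len(union_all))       # 5, that is, m + n
print(len(union_distinct))  # 1
```

Here UNION ALL returns m + n copies of R, while UNION collapses them to a single row.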
