The value of nr here is 20 (20 rows).
Row by row, the CUME_DIST calculation is:
CNAME TEMP RANK rownum cr calculation CD
The PERCENT_RANK and CUME_DIST functions are very specialized and far less common than RANK or ROW_NUMBER. Also, in our examples we have depicted only one grouping, that is, one partition. A PARTITION BY clause may be added to the analytic clause of the function, and sub-grouping and sub-PERCENT_RANKs and CUME_DISTs may also be reported.
For example, using our Employee table with PERCENT_RANK and CUME_DIST:
SELECT empno, ename, region,
RANK() OVER(PARTITION BY region ORDER BY curr_salary)
of PERCENT_RANK and CUME_DIST as per the previous algorithms.
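The two calculations can be sketched outside the database as well. The following Python fragment is only an illustration, assuming the usual definitions: PERCENT_RANK is (rank - 1)/(nr - 1) and CUME_DIST is the number of rows with a value less than or equal to the current one, divided by nr. The function name and the sample values are hypothetical and are not taken from the Employee table.

```python
def percent_rank_and_cume_dist(values):
    """Return (value, rank, percent_rank, cume_dist) tuples in ascending order."""
    ordered = sorted(values)
    nr = len(ordered)  # number of rows in the (single) partition
    result = []
    for v in ordered:
        # RANK: tied values share the lowest position in the ordering
        rank = 1 + sum(1 for w in ordered if w < v)
        percent_rank = (rank - 1) / (nr - 1) if nr > 1 else 0.0
        # CUME_DIST: fraction of rows at or below the current value
        cume_dist = sum(1 for w in ordered if w <= v) / nr
        result.append((v, rank, percent_rank, cume_dist))
    return result

# Hypothetical sample data with one tie
rows = percent_rank_and_cume_dist([10, 20, 20, 40])
for row in rows:
    print(row)
```

With a PARTITION BY clause, the same calculation would simply be repeated separately for each partition.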
SQL for Analysis in Data Warehouses, Oracle Corporation, Redwood Shores, CA, Oracle9i Data Warehousing Guide, Release 2 (9.2), Part Number A96520-01.
For an excellent discussion of how Oracle 10g has improved querying, see “DSS Performance in Oracle Database 10g,” an Oracle white paper, September 2003. This article shows how the Optimizer has been improved in 10g.
Chapter 4
Aggregate Functions Used as Analytical Functions (Analytical Functions II)
The Use of Aggregate Functions in SQL
Many of the common aggregate functions can be used as analytical functions: SUM, AVG, COUNT, STDDEV, VARIANCE, MAX, and MIN. The aggregate functions used as analytical functions offer the advantage of partitioning and ordering as well. As an example, say you want to display each person’s employee number, name, original salary, and the average salary of all employees. This cannot be done with a query like the following because you cannot mix aggregates and row-level results:
SELECT empno, ename, orig_salary,
AVG(orig_salary)
FROM employee
ORDER BY ename
Gives:
SELECT empno, ename, orig_salary,
       *
ERROR at line 1:
ORA-00937: not a single-group group function
But we can use a Cartesian product/virtual table like this:
SELECT e.empno, e.ename, e.orig_salary,
x.aos "Avg salary"
FROM employee e,
(SELECT AVG(orig_salary) aos FROM employee) x
ORDER BY ename
This type of query is borderline cumbersome and may
be done far more easily using AVG in an analytical function:
SELECT empno, ename, orig_salary,
AVG(orig_salary) OVER() "Avg salary"
FROM employee
This display looks off-balance due to the decimal points in the average salary. We can modify the displayed result by nesting the analytical function inside an ordinary row-level function; a better version of the query with a ROUND function added would be:
SELECT empno, ename, orig_salary,
ROUND(AVG(orig_salary) OVER()) "Avg salary"
FROM employee
The aggregate/analytical function uses an argument to specify which column is aggregated/analyzed (orig_salary). It should also be noted that there is a null OVER clause. When the OVER clause is null, as it is here, the function is said to be a reporting function and applies to the entire dataset.
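As a rough illustration of what a reporting function does, the Python sketch below computes one average over the whole dataset and attaches it to every row, which is the effect of AVG(orig_salary) OVER(). The employee names and salaries are hypothetical.

```python
# Hypothetical (ename, orig_salary) rows; not the book's Employee table
employees = [("Alice", 30000), ("Bob", 40000), ("Carol", 50000)]

# A null OVER clause means one window: the entire dataset
overall_avg = sum(salary for _, salary in employees) / len(employees)

# Every row keeps its own detail and also reports the same aggregate
report = [(name, salary, round(overall_avg)) for name, salary in employees]
for row in report:
    print(row)
```

Unlike a GROUP BY query, no rows are collapsed: three rows go in and three rows come out.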
We can use partitioning in the OVER clause of the aggregate-analytical function like this:
SELECT empno, ename, orig_salary, region,
ROUND(AVG(orig_salary) OVER(PARTITION BY region)) "Avg Salary"
FROM employee
ORDER BY region, ename
In this version of the query, we now have the average by region reported along with the other ordinary row data for an individual.
The result of the row-level reporting may be used in arithmetic in the result set. Suppose we wanted to see the difference between a person’s salary and the average for his or her region. This example shows that query:
SELECT empno, ename, region, curr_salary, orig_salary,
ROUND(AVG(orig_salary) OVER(PARTITION BY region)) "Avg-group",
ROUND(orig_salary - AVG(orig_salary) OVER(PARTITION BY region)) "Diff."
FROM employee
ORDER BY region, ename
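The partitioned average and the difference column can be mimicked in a few lines of Python. This is a sketch under assumed data: the names, regions, and salaries are hypothetical, and Python's round() handles exact .5 ties differently from Oracle's ROUND, so the sample values avoid ties.

```python
from collections import defaultdict

# Hypothetical (ename, region, orig_salary) rows
employees = [
    ("Ann", "East", 30000),
    ("Bo", "East", 50000),
    ("Cy", "West", 20000),
    ("Di", "West", 21000),
]

# PARTITION BY region: collect the salaries of each region
salaries_by_region = defaultdict(list)
for _, region, salary in employees:
    salaries_by_region[region].append(salary)

region_avg = {r: sum(s) / len(s) for r, s in salaries_by_region.items()}

# Each row reports its own salary, its region's average, and the difference
report = [
    (name, region, salary, round(region_avg[region]), round(salary - region_avg[region]))
    for name, region, salary in employees
]
for row in report:
    print(row)
```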
COLUMN portion FORMAT 99.9999
SELECT ename, curr_salary,
curr_salary/SUM(curr_salary) OVER() Portion
FROM employee
ORDER BY curr_salary
Notice that the PORTION column adds up to 100%:
COLUMN total FORMAT 9.9999
SELECT sum(o.portion) Total
FROM
(SELECT i.ename, i.curr_salary,
i.curr_salary/SUM(i.curr_salary) OVER() Portion
FROM employee i
ORDER BY i.curr_salary) o
Gives:
TOTAL
-------
1.0000
The above query showing the fraction of salary apportioned to each individual can be done in one step with an analytical function called RATIO_TO_REPORT, which is used like this:
COLUMN portion2 LIKE portion
SELECT ename, curr_salary,
curr_salary/SUM(curr_salary) OVER() Portion,
RATIO_TO_REPORT(curr_salary) OVER() Portion2
FROM employee
SELECT ename, curr_salary, region,
curr_salary/SUM(curr_salary) OVER(PARTITION BY Region) Portion,
RATIO_TO_REPORT(curr_salary) OVER(PARTITION BY Region) Portion2
FROM employee
Notice that the portion amounts add to 1.000 in each region:
SELECT ename, curr_salary, region,
curr_salary/SUM(curr_salary) OVER(PARTITION BY Region) Portion,
RATIO_TO_REPORT(curr_salary) OVER(PARTITION BY Region) Portion2
FROM employee
UNION
SELECT null, TO_NUMBER(null), region, sum(P1), sum(p2)
FROM
(SELECT ename, curr_salary, region,
curr_salary/SUM(curr_salary) OVER(PARTITION BY Region) P1,
RATIO_TO_REPORT(curr_salary) OVER(PARTITION BY Region) P2
FROM employee)
GROUP BY region
ORDER BY 3,2
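The arithmetic behind the partitioned RATIO_TO_REPORT, and the check that each region's portions add to 1, can be sketched in Python as follows. The rows are hypothetical stand-ins for the Employee table.

```python
from collections import defaultdict

# Hypothetical (ename, region, curr_salary) rows
rows = [
    ("Ann", "East", 30000),
    ("Bo", "East", 50000),
    ("Cy", "West", 20000),
    ("Di", "West", 60000),
]

# SUM(curr_salary) OVER(PARTITION BY region)
region_total = defaultdict(float)
for _, region, salary in rows:
    region_total[region] += salary

# RATIO_TO_REPORT: each row's value divided by its partition total
portions = [(name, region, salary, salary / region_total[region])
            for name, region, salary in rows]

# The summary rows of the UNION: portions summed per region
portion_sum = defaultdict(float)
for _, region, _, portion in portions:
    portion_sum[region] += portion

print(portions)
print(dict(portion_sum))
```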
A similar report can be had without the UNION workaround with the following SQL*Plus formatting commands included in a script:
BREAK ON region
COMPUTE sum of portion ON region
SELECT ename, curr_salary, region,
curr_salary/SUM(curr_salary) OVER(PARTITION BY Region) Portion,
RATIO_TO_REPORT(curr_salary) OVER(PARTITION BY Region) Portion2
FROM employee
Christina 55000 302197802 302197802
******
Windowing Subclauses with Physical Offsets in Aggregate Analytical Functions
A windowing subclause is a way of capturing several rows of a result set (i.e., a “window”) and reporting the result in one “window row.” An example of this technique would be in applications where one wants to smooth data by finding a moving average. Moving averages are most often calculated based on sorted data and on a physical offset of rows. Once we have established how the physical (row) offsets function, we will explore logical (range) offsets. To illustrate the moving average using physical offsets, suppose we have some observations that have these values:
Suppose further we know that the data is noisy; that is, it contains a random factor that is added or subtracted from what we might consider a “true” value. One way to smooth out the data and remove some of the random noise is to use a moving average on ordered data by taking an average over a fixed number of physical rows above and below each row. A moving average will operate in a window so that if the moving average is based on, say, three numbers (n = 3), the windows and their reported window rows would be:
These calculations result in this display of the data:
Time Value Moving Average
of readings with only the “inside” numbers smoothed.
In Oracle’s analytical functions, the way the aggregate functions work is that the end points are reported, but they are based on averages that include nulls in rows preceding and past the data points. In Oracle, nulls in calculations involving aggregate functions are ignored. Consider, for example, this query:
SELECT ename, curr_salary
FROM empwnulls
UNION
SELECT 'The average ', average
FROM
(SELECT avg(curr_salary) average
FROM empwnulls)
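The effect of ignoring nulls can be seen in a small Python sketch, where None plays the role of NULL. The salary values are hypothetical; the point is only that the divisor is the count of non-null values, not the count of rows.

```python
# Hypothetical curr_salary column with two NULLs (None)
salaries = [40000, None, 50000, None, 60000]

# AVG skips NULLs: both the sum and the count use non-null values only
non_null = [s for s in salaries if s is not None]
avg_ignoring_nulls = sum(non_null) / len(non_null) if non_null else None

# For contrast: treating NULL as zero and dividing by the row count
avg_nulls_as_zero = sum(non_null) / len(salaries)

print(avg_ignoring_nulls, avg_nulls_as_zero)
```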
Returning to our simple example and the moving averages we have computed thus far:
Time Value Moving Average
The end points would be calculated as follows:
AVG(attribute1) OVER (ORDER BY attribute2
ROWS BETWEEN x PRECEDING
AND y FOLLOWING)
where attribute1 and attribute2 do not have to be the same attribute. Attribute2 defines the window, and attribute1 defines the value on which to operate. The designation of “ROWS” means we will use a physical offset. The x and y values are the row limits: the number of physical rows preceding (x) and following (y) the current row that are included in the window. (Later, we will look at another way to do these problems using a logical offset, RANGE, instead of ROWS.)
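A minimal Python sketch of this physical-offset window is shown below. It assumes the values are already sorted by the ordering attribute, and the function name and sample numbers are hypothetical. At the end points the window simply contains fewer rows, which matches the behavior described above where the missing rows are ignored.

```python
def moving_average(values, x, y):
    """Average over x rows preceding through y rows following each row.

    `values` must already be in window order (the ORDER BY attribute).
    """
    averages = []
    for i in range(len(values)):
        # Rows that do not exist before the first or after the last row
        # are simply absent from the window
        window = values[max(0, i - x): i + y + 1]
        averages.append(sum(window) / len(window))
    return averages

# Hypothetical observations, ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
ma = moving_average([10, 20, 60, 30], x=1, y=1)
print(ma)
```

The first and last results are averages of two rows; the inside results are averages of three.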
The ORDER BY in the analytical clause is absolutely necessary, and only one attribute may be used for ordering in the function. Also, only numeric or date data types would make sense in calculations of aggregates. Here is the above example in SQL using physical offsets for the moving average on a table called Testma:
SELECT * FROM testma;
Which gives:
MTIME MVALUE - -
ORDER BY mtime
Gives:
MTIME MVALUE MA - - -
If the ordering subclause is changed, then the row-ordering is done first and then the moving average:
SELECT mtime, mvalue,
AVG(mvalue) OVER(ORDER BY mvalue
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) ma
FROM testma
COLUMN ma FORMAT 99.999
COLUMN sum LIKE ma
COLUMN "sum/3" LIKE ma
SELECT mtime, mvalue,
AVG(mvalue) OVER(ORDER BY mtime
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) ma,
SUM(mvalue) OVER(ORDER BY mtime
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) sum,
(SUM(mvalue) OVER(ORDER BY mtime
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING))/3 "Sum/3"
FROM testma
ORDER BY mtime
Also, we can use the COUNT aggregate analytical function to show how many rows are included in each window like this:
SELECT mtime, mvalue,
COUNT(mvalue) OVER(ORDER BY mtime
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) Howmanyrows
FROM testma
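The relationship between the windowed SUM, COUNT, and AVG can be sketched in Python. The values are hypothetical, not the contents of Testma; the sketch only shows that AVG equals SUM divided by the window's COUNT, so SUM/3 agrees with AVG on the inside rows but not at the end points, where the window holds two rows.

```python
# Hypothetical mvalue column, already ordered by mtime
values = [10, 20, 60, 30]

# ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING for every row
windows = [values[max(0, i - 1): i + 2] for i in range(len(values))]

how_many_rows = [len(w) for w in windows]        # COUNT(mvalue) OVER(...)
window_sum = [sum(w) for w in windows]           # SUM(mvalue) OVER(...)
window_avg = [sum(w) / len(w) for w in windows]  # AVG(mvalue) OVER(...)
sum_over_3 = [s / 3 for s in window_sum]         # the "Sum/3" column

print(how_many_rows, window_avg, sum_over_3)
```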
An Expanded Example of a Physical Window
We will need some additional data to look at more examples of windowing functions Let us consider the following data of some fictitious stock whose symbol is FROG:
COLUMN price FORMAT 9999.99
SELECT *
FROM stock
WHERE symb like 'FR%'
ORDER BY symb desc, dte
Which gives:
SYMB  DTE          PRICE
----- --------- --------
FROG  03-JAN-06    62.45
FROG  04-JAN-06    63.22
FROG  05-JAN-06    62.81
FROG  06-JAN-06    63.13
FROG  09-JAN-06    63.52
FROG  10-JAN-06    64.30
FROG  11-JAN-06    65.11
FROG  12-JAN-06    65.07
FROG  13-JAN-06    65.67
FROG  16-JAN-06    65.60
FROG  17-JAN-06    65.99
FROG  18-JAN-06    66.11
FROG  19-JAN-06    66.26
FROG  20-JAN-06    67.03
FROG  23-JAN-06    67.51
FROG  24-JAN-06    67.23
FROG  25-JAN-06    67.43
FROG  26-JAN-06    67.27
FROG  27-JAN-06    66.85
FROG  30-JAN-06    66.95
FROG  31-JAN-06    67.82
FROG  01-FEB-06    68.21
FROG  02-FEB-06    68.60
FROG  03-FEB-06    68.76
FROG  06-FEB-06    69.55
FROG  07-FEB-06    69.89
FROG  08-FEB-06    70.18
FROG  09-FEB-06    70.18
28 rows selected.
To see how the moving average window can expand, we can change the clause ROWS BETWEEN x PRECEDING AND y FOLLOWING to have different values for x and y. In fact, x and y do not have to be the same value at all. For example, suppose we let x = 3 and y = 1, which gives more weight to the three days before the row-window date and less to the one day after. The query and result look like this:
COLUMN ma FORMAT 99.999
SELECT dte, price,
AVG(price) OVER(ORDER BY dte
ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) ma
FROM stock
WHERE symb like 'FR%'
ORDER BY dte
Giving:
DTE          PRICE      MA
--------- -------- -------
03-JAN-06    62.45  62.835
04-JAN-06    63.22  62.827
05-JAN-06    62.81  62.903
06-JAN-06    63.13  63.325
09-JAN-06    63.52  63.650
10-JAN-06    64.30  64.015
11-JAN-06    65.11  64.226
12-JAN-06    65.07  64.734
13-JAN-06    65.67  65.150
16-JAN-06    65.60  65.488
The trailing end is done similarly:
02-FEB-06    68.60  68.068
03-FEB-06    68.76  68.588
06-FEB-06    69.55  69.002
07-FEB-06    69.89  69.396  (68.60 + 68.76 + 69.55 + 69.89 + 70.18)/5
08-FEB-06    70.18  69.712  (68.76 + 69.55 + 69.89 + 70.18 + 70.18)/5
09-FEB-06    70.18  69.950  (69.55 + 69.89 + 70.18 + 70.18)/4
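The trailing-end arithmetic can be checked with a short Python sketch. It uses only the last six FROG prices listed above (02-FEB-06 through 09-FEB-06), so only the final three results correspond to complete windows from the full table; the variable names are illustrative.

```python
# FROG prices for 02-FEB-06 .. 09-FEB-06, in date order
prices = [68.60, 68.76, 69.55, 69.89, 70.18, 70.18]

x, y = 3, 1  # ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING

def window_average(values, i, x, y):
    """Average of the rows from i - x through i + y that actually exist."""
    window = values[max(0, i - x): i + y + 1]
    return sum(window) / len(window)

# Rows for 07-FEB-06, 08-FEB-06 and 09-FEB-06 (indexes 3, 4, 5)
trailing = [round(window_average(prices, i, x, y), 3) for i in (3, 4, 5)]
print(trailing)
```

The last row has no following row, so its window holds four prices rather than five.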
We can clarify the demonstration a bit by displaying which rows are used in these moving average calculations with two other analytical functions: FIRST_VALUE and LAST_VALUE. These two functions tell us which rows are used in the calculation of the window function for each row.
COLUMN first FORMAT 9999.99
COLUMN last LIKE first
SELECT dte, price,
AVG(price) OVER(ORDER BY dte
ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) ma,
FIRST_VALUE(price) OVER(ORDER BY dte
ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) first,
LAST_VALUE(price) OVER(ORDER BY dte
ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) last
FROM stock
WHERE symb like 'F%'
ORDER BY dte
Giving:
DTE          PRICE      MA    FIRST     LAST
--------- -------- ------- -------- --------
03-JAN-06    62.45  62.835    62.45    63.22
04-JAN-06    63.22  62.827    62.45    62.81
05-JAN-06    62.81  62.903    62.45    63.13
06-JAN-06    63.13  63.325    63.13    63.52
09-JAN-06    63.52  63.650    63.13    64.30
10-JAN-06    64.30  64.015    63.13    65.11
11-JAN-06    65.11  64.226    63.13    65.07
12-JAN-06    65.07  64.734    63.52    65.67
13-JAN-06    65.67  65.150    64.30    65.60
16-JAN-06    65.60  65.488    65.11    65.99
17-JAN-06    65.99  65.688    65.07    66.11
18-JAN-06    66.11  65.926    65.67    66.26
19-JAN-06    66.26  66.198    65.60    67.03
20-JAN-06    67.03  66.580    65.99    67.51
25-JAN-06    67.43  67.294    67.03    67.27
26-JAN-06    67.27  67.258    67.51    66.85
27-JAN-06    66.85  67.146    67.23    66.95
30-JAN-06    66.95  67.264    67.43    67.82
31-JAN-06    67.82  67.420    67.27    68.21
01-FEB-06    68.21  67.686    66.85    68.60
02-FEB-06    68.60  68.068    66.95    68.76
03-FEB-06    68.76  68.588    67.82    69.55
06-FEB-06    69.55  69.002    68.21    69.89
07-FEB-06    69.89  69.396    68.60    70.18
08-FEB-06    70.18  69.712    68.76    70.18
09-FEB-06    70.18  69.950    69.55    70.18
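What FIRST_VALUE and LAST_VALUE report for a physical window can be sketched in Python as the first and last existing rows of each window. The sketch below again uses only the last six FROG prices (02-FEB-06 through 09-FEB-06), so only the final three pairs line up with the full result above; the function name is hypothetical.

```python
# FROG prices for 02-FEB-06 .. 09-FEB-06, in date order
prices = [68.60, 68.76, 69.55, 69.89, 70.18, 70.18]

def first_and_last(values, x, y):
    """(FIRST_VALUE, LAST_VALUE) of ROWS BETWEEN x PRECEDING AND y FOLLOWING."""
    pairs = []
    last_index = len(values) - 1
    for i in range(len(values)):
        first = values[max(0, i - x)]          # window start, clipped at row 0
        last = values[min(last_index, i + y)]  # window end, clipped at last row
        pairs.append((first, last))
    return pairs

pairs = first_and_last(prices, x=3, y=1)
print(pairs[3:])  # rows for 07-FEB-06, 08-FEB-06, 09-FEB-06
```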
Displaying a Running Total Using SUM as an Analytical Function
As we noted earlier, the aggregate function SUM may be used as an analytical function (as may AVG, MAX, MIN, COUNT, STDDEV, and VARIANCE). The SUM function is most easily seen when using a cumulative total calculation. For example, suppose we have the following receipts for a cash register application for several weeks ordered by date and location (DTE, LOCATION):
SELECT * FROM store
ORDER BY dte, location
Giving:
LOCATION   DTE         RECEIPTS
---------- --------- ----------
MOBILE     07-JAN-06      724.6
PROVO      07-JAN-06     969.61
MOBILE     08-JAN-06      88.76
PROVO      08-JAN-06     662.45
MOBILE     09-JAN-06     705.47
PROVO      09-JAN-06     928.37
MOBILE     10-JAN-06     217.26
PROVO      10-JAN-06      664.9
MOBILE     11-JAN-06      16.13
PROVO      11-JAN-06     694.51
MOBILE     12-JAN-06     421.59
PROVO      12-JAN-06     413.12
MOBILE     13-JAN-06     403.95
PROVO      13-JAN-06     645.78
MOBILE     14-JAN-06     831.12
PROVO      14-JAN-06     678.41
MOBILE     15-JAN-06     783.57
PROVO      15-JAN-06     491.05
MOBILE     16-JAN-06     878.15
PROVO      16-JAN-06     635.75
MOBILE     17-JAN-06     968.89
PROVO      17-JAN-06     378.25
MOBILE     18-JAN-06        351
PROVO      18-JAN-06     882.51
MOBILE     19-JAN-06     975.73
PROVO      19-JAN-06      24.52
MOBILE     20-JAN-06        191
PROVO      20-JAN-06      542.2
MOBILE     21-JAN-06     462.92
PROVO      21-JAN-06     294.19
MOBILE     22-JAN-06     707.57
PROVO      22-JAN-06     729.92
MOBILE     23-JAN-06     919.61
PROVO      23-JAN-06     272.24
MOBILE     24-JAN-06     217.91
PROVO      24-JAN-06     554.12
Now, suppose we’d like to have a running total of the receipts regardless of the location. One way to obtain this display is to use SUM and a slightly different physical offset. Previously we used this analytical function:
ROWS UNBOUNDED PRECEDING
This means that we will start with the first row and use all rows up to the current row of the window.
COLUMN "Running total" FORMAT 99,999.99
SELECT dte "Date", location, receipts,
SUM(receipts) OVER(ORDER BY dte
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) "Running total"
FROM store
WHERE dte < '10-Jan-2006'
ORDER BY dte, location
Date      LOCATION     RECEIPTS Running total
--------- ---------- ---------- -------------
07-JAN-06 MOBILE          724.6        724.60
07-JAN-06 PROVO          969.61      1,694.21
08-JAN-06 MOBILE          88.76      1,782.97
08-JAN-06 PROVO          662.45      2,445.42
09-JAN-06 MOBILE         705.47      3,150.89
09-JAN-06 PROVO          928.37      4,079.26
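The running total above can be reproduced with a few lines of Python as a cross-check. The sketch takes the six receipts in the order displayed and accumulates from the first row up to the current row, which is what ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW expresses; rounding to two decimals is added only to hide floating-point noise.

```python
from itertools import accumulate

# Receipts for 07-JAN-06 .. 09-JAN-06 in the displayed order
receipts = [724.6, 969.61, 88.76, 662.45, 705.47, 928.37]

# Each row's window is every row from the first through the current one
running_total = [round(total, 2) for total in accumulate(receipts)]
print(running_total)
```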
UNBOUNDED FOLLOWING
The clause UNBOUNDED FOLLOWING is used for the end of the window Such a command is used like this:
SELECT dte "Date", location, receipts,
SUM(receipts) OVER(ORDER BY dte
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) "Running total"
FROM store
WHERE dte < '10-Jan-2006'
ORDER BY dte, location
Which results in:
Date      LOCATION     RECEIPTS Running total
--------- ---------- ---------- -------------
07-JAN-06 MOBILE          724.6       4079.26
07-JAN-06 PROVO          969.61       3354.66
08-JAN-06 MOBILE          88.76       2385.05
08-JAN-06 PROVO          662.45       2296.29
09-JAN-06 MOBILE         705.47       1633.84
09-JAN-06 PROVO          928.37        928.37
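The same six receipts can be used to sketch the CURRENT ROW AND UNBOUNDED FOLLOWING window in Python: each row reports the sum of itself and every row after it. Accumulating over the reversed list and reversing the result is one simple way to get that; the rounding is again only for display.

```python
from itertools import accumulate

# Receipts for 07-JAN-06 .. 09-JAN-06 in the displayed order
receipts = [724.6, 969.61, 88.76, 662.45, 705.47, 928.37]

# Sum from the current row through the last row:
# accumulate from the end, then restore the original row order
remaining_total = [round(t, 2) for t in accumulate(reversed(receipts))][::-1]
print(remaining_total)
```

The first row carries the grand total and the last row carries only its own receipts, the mirror image of the UNBOUNDED PRECEDING result.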