    AND c.eff_beg_dt <= cl.row_crt_dt AND c.eff_end_dt > cl.row_crt_dt
    AND c.asr_beg_dt <= cl.row_crt_dt AND c.asr_end_dt > cl.row_crt_dt
WHERE cl.claim_amt > p.copay_amt
ORDER BY cl.adjud_dt, c.client_nbr, p.policy_nbr, p.eff_beg_dt;
To conclude this section, we show what this query might look like if the SQL language supported PERIOD datatypes, and also our taxonomy of Allen relationships. We suppose that the taxonomy node [fills1] is represented by the reserved word INCLUDES. With a SQL language like this, the Asserted Versioning schema no longer has pairs of dates to represent its two time periods. Instead, it has the single columns asr_per and eff_per.
SELECT c.client_nbr, c.client_nm,
       p.policy_nbr, p.policy_type, p.copay_amt,
       cl.service_dt, cl.claim_amt, cl.adjud_dt
FROM Claim cl
INNER JOIN Policy_AV p
    ON p.policy_oid = cl.policy_oid
    AND p.eff_per INCLUDES cl.service_dt
    AND p.asr_per INCLUDES cl.adjud_dt
INNER JOIN Client_AV c
    ON c.client_oid = p.client_oid
    AND c.eff_per INCLUDES cl.row_crt_dt
    AND c.asr_per INCLUDES cl.row_crt_dt
WHERE cl.claim_amt > p.copay_amt
ORDER BY cl.adjud_dt, c.client_nbr, p.policy_nbr, p.eff_beg_dt;
In either form, what is striking about the query is its simplicity relative to the complexity of the bi-temporal semantics that underlies it. Unlike queries in the standard temporal model and, for that matter, uni-temporal queries in the alternative temporal model as well, this query does not assemble a collection of rows and then proceed to check for temporal gaps and temporal overlaps within sub-selected collections of those rows. Asserted Versioning enforces bi-temporal semantics once, as the data is being created and modified, rather than each time the data is queried.
In Other Words
With appropriate temporal extensions to the SQL language, the expression of all thirteen Allen relationships, and of this and other relationships which are combinations of those thirteen relationships, would be greatly simplified. The first thing that is needed to support predicates for these relationships is to provide a PERIOD datatype, as we discussed in Chapter 3. With that datatype available, SQL could express each of the relationships we have discussed with one binary predicate relating two time periods (not two pairs of dates).
For example, instead of having to request data associated with two time periods such that the first starts before the second and ends after the second starts but before the second ends, we could simply request data associated with two time periods such that the first [overlaps] the second.
Or, instead of having to request data associated with two time periods such that the first doesn’t start after the second and doesn’t end before the second, we could simply request data associated with two time periods such that the first [fills] the second.
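As a rough illustration of the two predicates just described, the following Python sketch expresses each one as a single binary test on two time periods. The `Period` type, the function names and the integer clock ticks are assumptions made for this example only; the definitions simply transcribe the wording of the two preceding paragraphs.

```python
from typing import NamedTuple


class Period(NamedTuple):
    """A closed-open time period [begin, end), measured in clock ticks."""
    begin: int
    end: int


def overlaps(first: Period, second: Period) -> bool:
    # The first starts before the second, and ends after the second
    # starts but before the second ends.
    return first.begin < second.begin < first.end < second.end


def fills(first: Period, second: Period) -> bool:
    # The first doesn't start after the second and doesn't end before it.
    return first.begin <= second.begin and first.end >= second.end


# Hypothetical periods; ticks are months counted from an arbitrary origin.
policy = Period(1, 13)
claim_window = Period(3, 4)
print(fills(policy, claim_window))
print(overlaps(Period(1, 6), Period(4, 10)))
print(overlaps(Period(1, 6), Period(6, 10)))
```

Each predicate takes two periods rather than four dates, which is the simplification the PERIOD datatype is meant to provide.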
It is clearly easier to think about what information one wants from the database at the higher level of abstraction provided by this new datatype and these new relationships, rather than at the level of abstraction in which begin and end dates have to be used, as they are in the original formulation of the example. And it is just as clearly easier to write the corresponding SQL.
But even with today’s SQL, which lacks these temporal extensions, Asserted Versioning manages assertion and effective time date pairs as user-defined PERIOD datatypes, and supports all the Allen relationships as well as the other relationships in our Allen relationship taxonomy. Asserted Versioning thus provides a migration path to the day when these extensions are supported in the SQL standard and in commercial DBMSs.
Glossary References
Glossary entries whose definitions form strong interdependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.
We note, in particular, that none of the nodes in the Asserted Versioning taxonomy of Allen relationships are included in this list. In general, we leave taxonomy nodes out of these lists since they are long enough without them.
Allen relationships
Asserted Versioning Framework (AVF)
episode
clock tick
closed-open
contiguous
granularity
effective begin date
effective end date
object
PERIOD datatype
point in time
time period
temporal entity integrity (TEI)
temporal referential integrity (TRI)
the alternative temporal model
the standard temporal model
version
OPTIMIZING ASSERTED VERSIONING DATABASES
Bi-Temporal, Conventional, and Non-Temporal Databases
Data Volumes in Bi-Temporal and in Conventional Databases
Response Times in Bi-Temporal and in Conventional Databases
The Optimization Drill: Modify, Monitor, Repeat
Performance Tuning Bi-Temporal Tables Using Indexes
General Considerations
Indexes to Optimize Queries
Indexes to Optimize Temporal Referential Integrity
Other Techniques for Performance Tuning Bi-Temporal Tables
Avoiding MAX(dt) Predicates
NULL vs. 12/31/9999
Partitioning
Clustering
Materialized Query Tables
Standard Tuning Techniques
Glossary References
One concern about Asserted Versioning is with how well it will perform. We believe that with recent improvements in technology, and with the use of the physical design techniques described in this chapter, Asserted Versioning databases can achieve performance very close to that of conventional databases. This is especially true for queries, which are usually the most frequent kind of access to any relational database. The AVF, our own implementation of Asserted Versioning, is designed to operate well with large data volume databases supporting a high volume of mixed-type data retrieval requests.
Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00015-7
Copyright © 2010 Elsevier Inc. All rights of reproduction in any form reserved.
Bi-Temporal, Conventional, and Non-Temporal Databases
In this section, we compare data volumes and response times in bi-temporal and in conventional databases. We find that differences in both data volumes and response times are generally quite small, and are usually not good reasons for hesitating to implement bi-temporal data in even the largest databases of the world’s largest corporations.
Data Volumes in Bi-Temporal and in Conventional Databases
It might seem that a bi-temporal database will have a lot more data in it than a conventional database, and will consequently take a lot longer to process. It is true that the size of a bi-temporal database will be larger than that of an otherwise identical database which contains only current data about persistent objects. But in our consulting engagements, which span several decades and dozens of clients, we have found that in most mission-critical systems, temporal data is jury-rigged into ostensibly non-temporal databases.
There are any number of ways that this may happen. For example, in some systems a version date is added to the primary key of selected tables. In other systems, more advanced forms of best practice versioning (as described in Chapter 4) are employed. Sometimes, history will be captured by triggering an insert into a history table every time a particular non-temporal table is modified. Another approach is to generate a series of periodic snapshot tables that capture the state of a non-temporal table at regular intervals.
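To make the triggered-history approach concrete, here is a minimal sketch using Python's sqlite3 module. The table, column and trigger names and the sample policy are hypothetical, and SQLite stands in for whatever DBMS a given shop uses; it is meant only to show how every update quietly adds a row to a second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policy (policy_nbr TEXT PRIMARY KEY, copay_amt INTEGER);
CREATE TABLE policy_hist (
    policy_nbr  TEXT,
    copay_amt   INTEGER,
    replaced_at TEXT
);
-- Copy the old image of a row into the history table on every update.
CREATE TRIGGER policy_keep_history
AFTER UPDATE ON policy
BEGIN
    INSERT INTO policy_hist (policy_nbr, copay_amt, replaced_at)
    VALUES (OLD.policy_nbr, OLD.copay_amt, datetime('now'));
END;
""")

# Hypothetical data: one policy whose copay is changed once.
conn.execute("INSERT INTO policy VALUES ('P861', 20)")
conn.execute("UPDATE policy SET copay_amt = 30 WHERE policy_nbr = 'P861'")

current = conn.execute("SELECT policy_nbr, copay_amt FROM policy").fetchall()
history = conn.execute("SELECT policy_nbr, copay_amt FROM policy_hist").fetchall()
print(current, history)
```

The base table still looks non-temporal, but the database as a whole now holds one extra row per modification.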
Of course, a database with no temporal data at all will certainly be smaller than the same database with temporal data. But adding up the overhead associated with embedded best practice versioning, or with triggered history, periodic snapshots or some combination of these and other techniques, the amount of data in a so-called non-temporal database may be as much or even more than the amount of data in a bi-temporal database.
Throughout this book, we have been using the terms “non-temporal database” and “conventional database” as equivalent expressions. But now we have a reason to distinguish them. From now on, we will call a database “non-temporal” only if it contains no temporal data about persistent objects at all.¹ And from now on, we will use the term “conventional database” to refer to databases that may or may not contain temporal data about persistent objects (and that usually do), but that do not contain explicitly bi-temporal tables and instead incorporate temporal data by using variations on one or more of the ad hoc methods we have described.
Response Times in Bi-Temporal and in Conventional Databases
At the level of individual tables, a table lacking temporal data will clearly have less data than an otherwise identical table that also contains temporal data. But even if a table has more data than another table, it may perform nearly as well as that other table because response times are usually not linear in the amount of data in the target table.
Response times will be approximately linear in the amount of data in the table in the case of full table scans, but will almost never be linear for direct access reads. A direct (random) read to a table with five million rows will perform almost as well as a direct read to a table with only one million rows, provided that the table is indexed properly and that the number of non-leaf index levels is the same. And, in most cases, the number of levels will be the same, or very close to it.
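A back-of-the-envelope calculation suggests why the number of index levels so rarely differs. The sketch below assumes an idealized, fully packed B-tree with a fanout of 200 entries per index page; the fanout is an assumption chosen for illustration, and real values depend on key length and page size.

```python
def index_levels(row_count: int, fanout: int) -> int:
    """Smallest number of B-tree levels whose capacity covers row_count entries."""
    levels, capacity = 0, 1
    while capacity < row_count:
        capacity *= fanout
        levels += 1
    return levels


FANOUT = 200  # assumed entries per index page, for illustration only

for rows in (1_000_000, 5_000_000):
    print(rows, index_levels(rows, FANOUT))
```

Under these assumptions both tables need three levels, so a direct read walks the same number of index pages in either case. Only when the row count crosses a power of the fanout does an extra level appear.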
In addition, when adding in the overhead of triggers of an exponentially growing number of dependents, and of the often inefficient SQL used to access and maintain data in conventional databases, it is likely that using the AVF to manage temporal data in an Asserted Versioning database will prove to be a more efficient method of managing temporal data than directly invoking DBMS methods to manage temporal data in a conventional database.
The Optimization Drill: Modify, Monitor, Repeat
Performance optimization, also known as “performance tuning”, is usually an iterative approach to making and then monitoring modifications to an application and its database. It could involve adjusting the configuration of the database and server, or making changes to the applications and the SQL that maintain and query the database. As authors of this book, we can’t participate in the specific modify and monitor iterative processes being carried on by any of our readers and their IT organizations. But we can describe factors that are likely to apply to any Asserted Versioning implementation.
¹ The point of adding “about persistent objects”, of course, is to distinguish between objects and events, as we did in our taxonomy in Chapter 2. So a “non-temporal database”, in this new sense, may contain event tables, i.e. tables of transactions. And it may also contain fact-dimension data marts. What it may not contain is data about any historical (or future) states of persistent objects.
These factors include the number of users, the complexity of the application and the SQL, the volatility of the data, and the DBMS and server platform. The major DBMSs may optimize varying configurations differently, and may have extensions that can be used to simplify and improve a “plain vanilla” implementation of Asserted Versioning.
In this chapter, we will take a broad brush approach and, in general, discuss optimization techniques that apply to the temporalization of any relational database, regardless of what industry its owning organization is part of, and regardless of what types of applications it supports. Each reader will need to review these recommendations and determine if and how they apply to specific databases and applications that she may be responsible for.
To repeat once more, as we read the following sections: although we use the term “date” in this book to describe the delimiters of assertion and effective time periods, those delimiters can actually be of any granularity, such as a day, minute, second or microsecond. We use a month as the clock tick granularity in many of our examples. But in most cases, a finer level of granularity will be chosen, such as a timestamp representing the smallest clock tick supported by the DBMS.
Performance Tuning Bi-Temporal Tables Using Indexes
Many indexes are designed using something similar to a B-tree (balanced tree) structure, in which each node points to its next-level child nodes, and the leaf nodes contain pointers to the desired data. These indexes are used by working down from the top of the hierarchy until the leaf node containing the desired pointer is reached. Each pointer is a specific index value paired with the physical address, page or row id of the row that matches that value. From that point, the DBMS can do a direct read and retrieve the I/O page that contains the desired data.
B-tree indexes for bi-temporal tables work no differently than B-tree indexes for non-temporal tables. Knowing how these indexes work, our design objective is to construct indexes that will optimize the speed of access to the most frequently accessed data. In bi-temporal tables, we believe, that will almost always be the currently asserted current versions of the objects represented in those tables. As index designers, our task is two-fold. First, we need to determine the best columns to index on. Then we need to arrange those columns in the best sequence.
General Considerations
The physical sequence of columns within an index has a significant impact on the performance of queries that use that index. Our objective is to get to the desired row in a table with the minimum amount of I/O activity against the index, followed by a single direct read to the table itself. So in determining the sequence of columns in an index, a good idea is to put the most frequently used lookup columns in the leftmost (initial) nodes of the index. These columns are often the columns that make up the business key, or perhaps some other identifier such as the primary key, or a foreign key.
Against asserted version tables, most queries will be similar to queries against non-temporal tables except that a few temporal predicates will be added to the queries. These temporal predicates eliminate rows whose assertion time periods and/or effective time periods are not what the query is looking for.
An object that is represented by exactly one row in a non-temporal table may be represented by any number of rows in a temporal table. But for normal business use, the one current row in the temporal table, i.e. the row which corresponds to that one row in the non-temporal table, is likely to be accessed much more frequently than any of the other rows. Unless we properly combine temporal columns with non-temporal columns in the index, access to that current row may require us to scan through many past or future rows to get to it.
Of course, we are talking about both a scan of index leaf pages, as well as the more expensive scan of the table itself. When specific rows are being searched for, and when they may or may not be clustered close to one another in physical storage, we want to minimize any type of scan.
Another important consideration in determining the optimal sequence of columns in an index is that optimizers may decide not to use a column in an index unless values have been provided for all the columns to its left, those being the columns that help to more directly trace a path through the higher levels of the index tree, using the columns that match supplied predicates. So if we design an index with its temporal columns too far to the right, and with unqualified columns prior to them, a scan might still be triggered whenever the optimizer looks for the one current row for the object being queried. On the other hand, as we will see, the solution is not to simply make the temporal columns left-most in the index.
There will usually be many more non-current rows for an object, in an asserted version table, than the one current row for that object. The table may contain any number of rows representing the history of the object, and any number of rows representing anticipated future states of the object. The table may contain any number of no longer asserted rows for that object, as well as rows that we are not yet prepared to assert.
So what we want the optimizer to do is to jump as directly as possible to the one currently asserted current version for an object, without having to scan through a potentially large number of non-current rows.
Indexes to Optimize Queries
Let’s look at an example. We will assume that it is currently September 2011. So the next time the clock ticks, according to the clock tick granularity used in this book, it will be October 2011.
In the table shown in Figure 15.1, there are nine rows representing the object whose object identifier is 55. Three of those rows are historical versions. Their effectivity periods are past. They represent past states of the object they refer to. We designate them with “pe” (past effective) in the state column of the table.²
Another three of those rows are no longer asserted. Their assertion periods are past. They represent claims that we once made, claims that the statements which those rows made about the objects which they represented were true statements. But now we no longer make those claims. They exist in the assertion time past. We designate these rows with “pa” (past asserted) in the state column of the table.
² The state and row # columns are not columns of the table itself. They are metadata about the rows of the table, just like the row # column in the tables shown in other chapters in this book.
Two of those rows are not yet asserted. They are deferred assertions. We are not yet willing to claim that the statements made by those rows are true statements. We designate these rows with “fa” (future asserted) in the state column of the table.
There is one current row representing the object whose identifier is 55. This row is currently asserted and, within current assertion time, became effective in August 2009 and will remain in effect until further notice. Note, however, that it will remain asserted only until October 2012. At that time, if nothing in the data changes, the database will cease to say that the data for object 55 is Kiwi from August 2009 until further notice. Instead, it will say that data for object 55 is Kiwi from August 2009 to December 2013, and that from December 2013 until further notice, it will be Grapes. We designate this earlier, but current, row with “cc” (currently asserted current version) in the state metadata column of the table.
The SQL to retrieve the one current row for object 55 is:
SELECT data
FROM mytable
WHERE oid = 55
  AND eff_beg_dt <= Now() AND eff_end_dt > Now()
  AND asr_beg_dt <= Now() AND asr_end_dt > Now()
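The query can be tried out against the nine rows of Figure 15.1. The sketch below is one way to do so with Python's sqlite3 module, under several assumptions: the row values are as reconstructed from the figure, month clock ticks are stored as 'YYYY-MM' text (which compares correctly as strings), '9999-12' stands in for the figure's 9999, and a `:now` parameter replaces Now() so that the "current" month can be chosen. The helper name `current_data` is hypothetical.

```python
import sqlite3

# (oid, eff_beg_dt, eff_end_dt, asr_beg_dt, asr_end_dt, data), rows 1 to 9.
ROWS = [
    (55, "2009-01", "9999-12", "2009-01", "2009-02", "Apples"),
    (55, "2009-01", "2009-03", "2009-02", "9999-12", "Apples"),
    (55, "2009-03", "9999-12", "2009-02", "2009-06", "Berries"),
    (55, "2009-03", "2009-06", "2009-06", "9999-12", "Berries"),
    (55, "2009-06", "9999-12", "2009-06", "2009-08", "Cherries"),
    (55, "2009-06", "2009-08", "2009-08", "9999-12", "Cherries"),
    (55, "2009-08", "9999-12", "2009-08", "2012-10", "Kiwi"),
    (55, "2009-08", "2013-12", "2012-10", "9999-12", "Kiwi"),
    (55, "2013-12", "9999-12", "2012-10", "9999-12", "Grapes"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mytable (
           oid INTEGER, eff_beg_dt TEXT, eff_end_dt TEXT,
           asr_beg_dt TEXT, asr_end_dt TEXT, data TEXT)"""
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)", ROWS)


def current_data(oid, now):
    """Return the data of rows that are both in effect and asserted at `now`."""
    sql = """SELECT data FROM mytable
             WHERE oid = :oid
               AND eff_beg_dt <= :now AND eff_end_dt > :now
               AND asr_beg_dt <= :now AND asr_end_dt > :now"""
    return [row[0] for row in conn.execute(sql, {"oid": oid, "now": now})]


print(current_data(55, "2011-09"))  # the "cc" row as of September 2011
print(current_data(55, "2013-12"))  # after the deferred assertions take over
```

With September 2011 as the current month, only the "cc" row qualifies. Moving the clock forward to December 2013 selects the Grapes row instead, with no change to the data.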
Most optimizers will use the index tree to locate the row id (rid) of the qualifying row or rows using, first of all, the columns that have direct matching predicates, such as EQUALS or IN, columns which are sometimes called match columns. These optimizers will also use the index tree for a column with a range predicate, such as BETWEEN or LESS THAN OR EQUAL TO (<=), provided that it is the first column in the index or the first column following the direct match columns.
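One way to watch this behavior is to ask a DBMS for its access plan. The sketch below uses SQLite through Python's sqlite3 module purely as a convenient stand-in; other optimizers report their plans differently. The index name and its column order (the match column oid first, followed by two temporal columns) are hypothetical choices for illustration, not a recommended design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mytable (
           oid INTEGER, eff_beg_dt TEXT, eff_end_dt TEXT,
           asr_beg_dt TEXT, asr_end_dt TEXT, data TEXT)"""
)
# Hypothetical index: match column first, then temporal columns.
conn.execute(
    "CREATE INDEX ix_oid_asr_eff ON mytable (oid, asr_end_dt, eff_end_dt)"
)

plan = conn.execute(
    """EXPLAIN QUERY PLAN
       SELECT data FROM mytable
       WHERE oid = 55
         AND eff_beg_dt <= :now AND eff_end_dt > :now
         AND asr_beg_dt <= :now AND asr_end_dt > :now""",
    {"now": "2011-09"},
).fetchall()

# The last column of each plan row is SQLite's human-readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan names the index the optimizer chose and, in SQLite's case, the predicates it was able to apply through the index tree. The remaining predicates are checked against the rows that the index search returns.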
state  row #  oid  eff-beg  eff-end  asr-beg  asr-end  data
pa     1      55   Jan09    9999     Jan09    Feb09    Apples
pe     2      55   Jan09    Mar09    Feb09    9999     Apples
pa     3      55   Mar09    9999     Feb09    Jun09    Berries
pe     4      55   Mar09    Jun09    Jun09    9999     Berries
pa     5      55   Jun09    9999     Jun09    Aug09    Cherries
pe     6      55   Jun09    Aug09    Aug09    9999     Cherries
cc     7      55   Aug09    9999     Aug09    Oct12    Kiwi
fa     8      55   Aug09    Dec13    Oct12    9999     Kiwi
fa     9      55   Dec13    9999     Oct12    9999     Grapes
Figure 15.1 A Bi-Temporal Table