the row representing that assertion will cease to be asserted on that date even if no correcting assertion is supplied to replace it. The last reason an assertion end date may be changed is to lock an assertion which has been updated or deleted by a deferred transaction, until the resulting deferred assertion becomes current. We will have more to say about deferred transactions, deferred assertions and locking in Chapter 13.
Now() and UTC
Keeping our notation DBMS agnostic, and keeping the clock tick granularity generic, we will refer to the current moment, to right now, as Now().7 SQL Server may use getdate(), and DB2 may use Current Timestamp or Current Date. Depending on our clock tick duration, we might need to use a date formatting function to set the current granularity. In our examples, we generally use one month as our clock tick granularity. However, for our purposes, Now() can take on values at whatever level of granularity we choose to use, including day, second or microsecond.
Now() is usually assumed to represent the current moment by using local time. But local time may change because of entering or leaving Daylight Saving Time. And another issue is time zone. At any one time, data about to update a database may exist in a different time zone than the database itself. Users about to retrieve data from a database may exist in a different time zone than the database itself. And, of course, federated queries may attempt to join data from databases located in different time zones.
So the data values returned by Now() can change for reasons other than the passage of time. Daylight Saving Time can change those values. At any one point in time, those values can differ because of time zones. Clearly, we need a reference framework, and a set of values, that will not change for any reason other than the passage of time, and that will be the same value, at any point in time, the world over and year round.
This reference framework is Coordinated Universal Time (UTC).8 To make use of UTC, our Asserted Versioning Framework will convert local time to UTC on maintenance and queries, and will store Asserted Versioning temporal parameters, such as begin and end dates, in UTC.
7 Now() is a function that returns the current date. It is not a value. However, we will often use it to designate a specific point in time. For example, we may say that a time period starts at Now() and continues on until 12/31/9999. This is a shorthand way of emphasizing that, whenever that time period was created, it was given as its begin date the value returned by Now() at that moment.
8 However, even in UTC, some variations in time values do not reflect the passage of time. We are referring here to the periodic adjustments in UTC made by adding or removing leap seconds, as we described in an earlier section of this chapter.
For example, with Policy_AV being an asserted version table of insurance policies, we would insert a policy like this:
INSERT INTO Policy_AV (oid, asr_beg_dt )
VALUES (55, CURRENT TIMESTAMP - CURRENT TIMEZONE )
Queries will perform better if we do the time conversion before using the value as a selection predicate in the SQL itself. This is because most optimizers treat functions that appear in predicates as non-indexable. For example, in DB2, we should write:
SET :my-cut = TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE
SELECT ... FROM ...
WHERE oid = 55
AND asr_beg_dt <= :my-cut
AND asr_end_dt > :my-cut
rather than
SELECT ... FROM ...
WHERE oid = 55
AND asr_beg_dt <= TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE
AND ...
However, if these functions are used for display purposes, then there is no reason to exclude them from the queries. For example:
SELECT asr_beg_dt + CURRENT TIMEZONE AS my_local_asr_beg_dt
FROM ...
It would also be useful to add alternate columns for the temporal dates in our views that have the translation to local time performed already.
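The following sketch, written in Python with the standard sqlite3 module rather than in DB2, illustrates the same pattern: convert local time to UTC before storing it, compute the UTC cutoff once outside the SQL so that the predicate compares a column with a plain value, and translate back to local time only for display. The fixed UTC-5 offset, the helper names to_utc_text and from_utc_text, and the text representation of timestamps are assumptions made for illustration; they are not part of the AVF.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical local zone: a fixed UTC-5 offset, chosen only for illustration.
LOCAL_TZ = timezone(timedelta(hours=-5))
FMT = "%Y-%m-%d %H:%M:%S"


def to_utc_text(local_dt):
    """Convert an aware local datetime to a UTC string suitable for storage."""
    return local_dt.astimezone(timezone.utc).strftime(FMT)


def from_utc_text(utc_text, tz=LOCAL_TZ):
    """Translate a stored UTC string back to local time, for display only."""
    utc_dt = datetime.strptime(utc_text, FMT).replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(tz)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Policy_AV (oid INTEGER, asr_beg_dt TEXT, asr_end_dt TEXT)"
)

# Maintenance: convert local time to UTC before storing it.
local_insert_time = datetime(2012, 4, 23, 9, 0, 0, tzinfo=LOCAL_TZ)
conn.execute(
    "INSERT INTO Policy_AV (oid, asr_beg_dt, asr_end_dt) VALUES (?, ?, ?)",
    (55, to_utc_text(local_insert_time), "9999-12-31 00:00:00"),
)

# Query: compute the UTC cutoff once, outside the SQL, and bind it as a value.
my_cut = to_utc_text(datetime(2012, 6, 1, 12, 0, 0, tzinfo=LOCAL_TZ))
rows = conn.execute(
    "SELECT oid, asr_beg_dt FROM Policy_AV "
    "WHERE oid = ? AND asr_beg_dt <= ? AND asr_end_dt > ?",
    (55, my_cut, my_cut),
).fetchall()

# Display: translate back to local time only after the rows are selected.
local_view = [(oid, from_utc_text(beg).isoformat()) for oid, beg in rows]
```

Because the stored strings share one fixed-width format, comparing them as text orders them chronologically, which is what lets the bound cutoff act as an ordinary selection value.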
The Very Concept of Bi-Temporality
Business IT professionals were using tables with both an
effective date and a physical row create date by the early 90s.9
But they were doing so with apparently no knowledge of
9 Or timestamps, or other datatypes. We remind the reader that, throughout this book,
we use the date datatype for all temporal columns, and a first of the month value for
all our dates This simplifies the presentation without affecting any of the semantics.
In real business applications, of course, these columns would often be timestamps.
academic work on bi-temporality. At that time, these version tables which also contained a row create date were state of the art in best practice methods for managing temporal data. We will discuss them in the next chapter.
With a row creation date, of course, any query can be restricted to the rows present in a table as of any specific date by including a WHERE clause predicate that qualifies only those rows whose create date is less than or equal to the specified date. With two effective dates, tables like these are also able to specify one of the two temporal dimensions that make up full bi-temporality.
The standard temporal model uses the term “valid time” where we use the term “effective time”. But the difference is purely verbal. We have found no differences between how valid time works in the standard model, and how effective time works in Asserted Versioning. We use “effective time” because it is the preferred term among business IT professionals, and also because it readily adapts itself to other grammatical forms such as “becoming effective” and “in effect”.
The standard model states that “(v)alid time captur(es) the history of a changing reality, and transaction time captur(es) the sequence of states of a changing table. A table supporting both is termed a ‘bi-temporal table’” [2000, Snodgrass, p. 20]. But as we will see later, Asserted Versioning does not define bi-temporality in exactly the same way. The difference lies primarily in the second of the two temporal dimensions, what computer scientists call “transaction time” and what we call “assertion time”. While a transaction begin date always indicates when a row is physically inserted into a table, an assertion begin date indicates when we are first willing to assert, or claim, that a row is a true statement about the object it represents, during that row’s effective (valid) time period, and also that the quality of the row’s data is good enough to base business decisions on.
In the standard temporal model, the beginning of a transaction time period is the date on which the row is created. Obviously, once the row is created, that date cannot be changed. But in the Asserted Versioning temporal model, an assertion time period begins either on the date a row is created, or on a later date. Because an assertion begin date is not necessarily the same as the date on which its row is physically created, Asserted Versioning needs, in addition to the pair of dates that define this time period, an additional date which is the physical creation date of each row. That date serves as an audit trail, and as a means of reconstructing a table as it physically existed at any past point in time.
What are these rows with future assertion begin dates? To take a single example, they might be rows for which we have some of the business data, but not all of it, rows which are in the process of being made ready “for prime time”. These rows, which may be assertions about past, present or future versions, are not yet ready, we will say, to become part of the production data in the table, not yet ready to become rows that we are willing to present to the world and of which we are willing to say “We stand behind the statements these rows make. We claim that the statements they make are (or are likely to become) true, and that the information these rows provide meets the standards of reliability understood (or explicitly stated) to apply to all rows in this table”.
So the semantics of the standard temporal model are fully supported by Asserted Versioning. But Asserted Versioning adds the semantics of what we call deferred assertions, and which we have just briefly described. As we will see in later chapters, deferred assertions are just one kind of internalized pipeline dataset, and the internalization of pipeline datasets can eliminate a large part of the IT maintenance budget by eliminating the need to manage pipeline datasets as distinct physical objects.
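As a minimal sketch of the distinction drawn above, the following Python fragment separates rows that are currently asserted from deferred assertions, and uses the row creation date to reconstruct which rows were physically present on a past date. The column name row_crt_dt, the Row class and the three helper functions are hypothetical, and effective time is omitted to keep the example short.

```python
from dataclasses import dataclass
from datetime import date

END_OF_TIME = date(9999, 12, 31)  # conventional "until further notice" value


@dataclass(frozen=True)
class Row:
    oid: int
    asr_beg_dt: date  # when we are first willing to assert the row
    asr_end_dt: date  # closed-open: no longer asserted from this date on
    row_crt_dt: date  # hypothetical name for the physical creation date


def currently_asserted(rows, now):
    """Rows whose assertion period [asr_beg_dt, asr_end_dt) contains `now`."""
    return [r for r in rows if r.asr_beg_dt <= now < r.asr_end_dt]


def deferred(rows, now):
    """Rows already physically present whose assertion begins in the future."""
    return [r for r in rows if r.row_crt_dt <= now < r.asr_beg_dt]


def physically_present(rows, as_of):
    """Rows that physically existed in the table on the date `as_of`."""
    return [r for r in rows if r.row_crt_dt <= as_of]


table = [
    # Asserted from the day it was created, as in the standard model.
    Row(55, date(2012, 1, 1), END_OF_TIME, date(2012, 1, 1)),
    # Created in February, but not asserted until June: a deferred assertion.
    Row(56, date(2012, 6, 1), END_OF_TIME, date(2012, 2, 1)),
]
now = date(2012, 3, 1)
asserted_oids = [r.oid for r in currently_asserted(table, now)]
deferred_oids = [r.oid for r in deferred(table, now)]
```

In this sketch the first row behaves exactly as a row would under the standard model, while the second shows why the assertion begin date alone cannot serve as the audit trail.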
Allen Relationships
Allen relationships describe all possible positional relationships between two time periods along a common timeline. This includes the special case of one or both time periods being a point in time, i.e. being exactly one clock tick in length.
There are 13 Allen relationships in total. Six have a corresponding inverse relationship, and one does not. Standard treatments of the Allen relationships may be found in both [2000, Snodgrass] and [2002, Date, Darwen, Lorentzos]. We have found it useful to reconstruct the Allen relationships as a binary taxonomy. Our taxonomy is shown in Figure 3.4.
In this diagram, the leaf nodes include a graphic in which there are two timelines, each represented by a dashed line. All the leaf nodes but one have an inverse, and that one is italicized; when two time periods are identical, they do not have a distinct inverse. Thus, this taxonomy specifies 13 leaf-node relationships which are, in fact, precisely the 13 Allen relationships.
The names of the Allen relationships are standard, and have been since Allen wrote his seminal article in 1983. But those names, and the names of the higher-level nodes in our own taxonomy of the Allen relationships, are also expressions in ordinary language. In order to distinguish between the ordinary language and the technical uses of these terms, we will include the names of Allen relationships and our other taxonomy nodes in brackets when we are discussing them. We will also underline the non-leaf node relationships in the taxonomy, to emphasize that they are relationships we have defined, and are not one of the Allen relationships.
In the following discussion, the first time period in a pair of them is the one that is either earlier than the other, or not longer than the other.
Given two time periods on a common timeline, either they have at least one clock tick in common or they do not. If they do, we will say that they [intersect] one another. If they do not, we will say that they [exclude] one another.
If there is an [intersects] relationship between two time periods, then either one [fills] the other or each [overlaps] the other. If one time period [fills] another, then all its clock ticks are also in the time period it [fills], but not necessarily vice versa. If one time period [overlaps] another, then the latter also overlaps the former; but, being the later of the two time periods, we say that the latter time period has the inverse relationship, [overlaps⁻¹].
In the overlaps cases, each has at least one clock tick that the other does not have, as well as having at least one clock tick that the other does have.
[Figure 3.4 The Asserted Versioning Allen Relationship Taxonomy. Root node: Time Period Relationships Along a Common Timeline. Node labels recoverable from the diagram: Equals, Occupies, Aligns, Starts, Finishes, During.]
If two time periods [exclude] one another, then they do not share any clock ticks, and they are either non-contiguous or contiguous. If there is at least one clock tick between them, they are non-contiguous and we say that one is [before] the other. Otherwise they are contiguous and we say that one [meets] the other.
If one time period [fills] the other, then either they are [equal], or one [occupies] the other. If they are [equal], then neither has a clock tick that the other does not have. If one [occupies] the other, then all the clock ticks in the occupying time period are also in the occupied time period, but not vice versa.
If one time period [occupies] the other, then either they share an [aligns] relationship, or one occurs [during] the other. If they are aligned, then they either start on the same clock tick or end on the same clock tick, and we say that one either [starts] or [finishes] the other. Otherwise, one occurs [during] the other, beginning after the other and ending before it. Note that if two time periods are aligned, one cannot both [start] and [finish] the other because if it did, it would be [equal] to the other.
If one time period [starts] another, they both begin on the same clock tick. If one [finishes] the other, they both end on the same clock tick. If one time period [occupies] another, but they are not aligned, then one occurs [during] the other.
Now let’s consider the special case in which one of the two time periods is a point in time, i.e. is exactly one clock tick in length, and the other one contains two or more clock ticks. This point in time may either [intersect] or [exclude] the time period. If the point in time [intersects] the time period, it also [fills] and [occupies] that time period. If it [aligns] with the time period, then it either [starts] the time period or [finishes] it. Otherwise, the point in time occurs [during] the time period. If the point in time [excludes] the time period, then either may be [before] the other, or they may [meet].
Finally, let’s consider one more special case, that in which both the time periods are points in time. Those two points in time may be [equal], or one may be [before] the other, or they may [meet]. There are no other Allen relationships possible for them.
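The taxonomy just described can be sketched as a small classifier. The following Python fragment is an illustration only: it assumes closed-open periods whose clock ticks are integers, and the names Period, allen, ancestors, and the "^-1" suffix used to mark an inverse relationship are our own choices for this sketch.

```python
from typing import NamedTuple


class Period(NamedTuple):
    beg: int  # first clock tick in the period
    end: int  # first clock tick after the period (closed-open); beg < end


# Non-leaf taxonomy nodes above each leaf relationship, from the root down.
TAXONOMY = {
    "before": ["excludes"],
    "meets": ["excludes"],
    "overlaps": ["intersects"],
    "equals": ["intersects", "fills"],
    "during": ["intersects", "fills", "occupies"],
    "starts": ["intersects", "fills", "occupies", "aligns"],
    "finishes": ["intersects", "fills", "occupies", "aligns"],
}


def _forward(a, b):
    """Return the relationship of a to b, or None if only an inverse applies."""
    if a.end < b.beg:
        return "before"      # at least one clock tick lies between them
    if a.end == b.beg:
        return "meets"       # contiguous, no shared clock tick
    if a == b:
        return "equals"
    if a.beg == b.beg and a.end < b.end:
        return "starts"
    if a.end == b.end and a.beg > b.beg:
        return "finishes"
    if a.beg > b.beg and a.end < b.end:
        return "during"
    if a.beg < b.beg < a.end < b.end:
        return "overlaps"
    return None


def allen(a, b):
    """Name the Allen relationship of period a to period b."""
    rel = _forward(a, b)
    return rel if rel is not None else _forward(b, a) + "^-1"


def ancestors(rel):
    """Taxonomy nodes above a leaf relationship (inverse suffix ignored)."""
    return TAXONOMY[rel.replace("^-1", "")]


# Every pair of periods over a few clock ticks, then every pair of points.
all_periods = [Period(b, e) for b in range(6) for e in range(b + 1, 7)]
all_names = {allen(a, b) for a in all_periods for b in all_periods}
points = [Period(t, t + 1) for t in range(4)]
point_names = {allen(a, b) for a in points for b in points}
```

Enumerating every pair of periods over a handful of clock ticks yields exactly 13 distinct names, and restricting both periods to a single clock tick leaves only [equals], [before], [meets] and the inverses of the last two.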
As we will see later, four of these Allen relationship categories are especially important. They will be discussed in later chapters, but we choose to mention them here.
(i) The [intersects] relationship is important because for a temporal insert transaction to be valid, its effective time period cannot intersect that of any episode for the same object which is already in the target table. By the same token, for a temporal update or delete transaction to be valid, the target table must already contain at least one episode for the same object whose effective time period does [intersect] the time period designated by the transaction.
(ii) The [fills] relationship is important because violations of the temporal analog of referential integrity always involve the failure of a child time period to [fill] a parent time period. We will be frequently discussing this relationship from the parent side, and we would like to avoid having to say things like “... failure of a parent time period to be filled by a child time period”. So we will use the term “includes” as a synonym for “is filled by”, i.e. as a synonym for [fills⁻¹]. Now we can say “... failure of a parent time period to include a child time period”.
(iii) The [before] relationship is important because it distinguishes episodes from one another. Every episode of an object is non-contiguous with every other episode of the same object, and so for each pair of them, one of them must be [before] the other.
(iv) The [meets] relationship is important because it groups versions for the same object into episodes. A series of versions for the same object that are all contiguous, i.e. that all [meet], fall within the same episode of that object.
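The roles of these four relationships can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AVF's logic: periods are closed-open (begin, end) pairs of dates for a single object, and the function names are hypothetical.

```python
from datetime import date


def intersects(a, b):
    """Two closed-open periods share at least one clock tick."""
    return a[0] < b[1] and b[0] < a[1]


def fills(child, parent):
    """Every clock tick of `child` is also a clock tick of `parent`."""
    return parent[0] <= child[0] and child[1] <= parent[1]


def episodes(versions):
    """Group one object's versions into episodes.

    Versions that [meet] extend the current episode; a gap, i.e. a
    [before] relationship, starts a new one. Assumes the versions do
    not [intersect].
    """
    result = []
    for beg, end in sorted(versions):
        if result and result[-1][1] == beg:
            result[-1] = (result[-1][0], end)
        else:
            result.append((beg, end))
    return result


def insert_is_valid(new_period, existing_episodes):
    """(i) A temporal insert may not [intersect] any existing episode."""
    return not any(intersects(new_period, ep) for ep in existing_episodes)


def update_is_valid(target_period, existing_episodes):
    """(i) A temporal update or delete must [intersect] some episode."""
    return any(intersects(target_period, ep) for ep in existing_episodes)


def child_is_included(child_period, parent_episodes):
    """(ii) Some parent episode must include the child period."""
    return any(fills(child_period, ep) for ep in parent_episodes)


versions = [
    (date(2012, 1, 1), date(2012, 4, 1)),
    (date(2012, 4, 1), date(2012, 7, 1)),    # [meets] the previous version
    (date(2012, 10, 1), date(2013, 1, 1)),   # gap: a second episode
]
eps = episodes(versions)
```

With these three versions, eps holds two episodes, January through June 2012 and October through December 2012; an insert for July through September is valid because it only [meets] its neighbors, while an update for the same period finds nothing to act on.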
Advanced Indexing Strategies
Indexes are one way to improve performance. And it should be clear that it would be a serious performance handicap if we could not define indexes over either or both of the two time periods of a bi-temporal table. But this proves to be more complex than it might at first sight appear to be.
The issue is that traditional indexes contain pointers to rows, pointers which are based on discrete values, while the two time periods of rows in bi-temporal tables are not discrete values, but rather an unbroken and non-overlapping sequence of such values. Such rows occupy points in effective (valid) time or in assertion (transaction) time only as a limit case. What they really occupy are intervals along those two timelines. That’s the reason we need two dates to describe each of them. Traditional balanced-tree indexes work well with discrete values, including such discrete values as dates. But they don’t work well with intervals of time, i.e. with time periods.
But indexing methods which manage intervals are being developed. Specifically, some bi-temporal indexing methods manage the two intervals for a bi-temporal object as a single unit, which would appear as a rectangle on a Cartesian graph in which one temporal dimension is represented by the X-axis and the other by the Y-axis.
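To make the rectangle picture concrete, the following Python sketch treats a bi-temporal row as a rectangle in the plane of effective time and assertion time. It is not an index; it only shows the geometric tests that an interval-aware index would have to answer. Integer clock ticks, closed-open periods, and the names BiTemporalRow, covers and rectangles_intersect are assumptions made for this illustration.

```python
from typing import NamedTuple


class BiTemporalRow(NamedTuple):
    eff_beg: int  # effective (valid) time period, closed-open
    eff_end: int
    asr_beg: int  # assertion (transaction) time period, closed-open
    asr_end: int


def covers(row, eff_tick, asr_tick):
    """Point-in-rectangle test.

    True if, at assertion time `asr_tick`, the row was asserted as
    describing its object at effective time `eff_tick`.
    """
    return (row.eff_beg <= eff_tick < row.eff_end
            and row.asr_beg <= asr_tick < row.asr_end)


def rectangles_intersect(a, b):
    """True only if the rows' periods intersect on both timelines."""
    return (a.eff_beg < b.eff_end and b.eff_beg < a.eff_end
            and a.asr_beg < b.asr_end and b.asr_beg < a.asr_end)


row = BiTemporalRow(eff_beg=1, eff_end=5, asr_beg=3, asr_end=9)
later_version = BiTemporalRow(eff_beg=5, eff_end=8, asr_beg=3, asr_end=9)
later_assertion = BiTemporalRow(eff_beg=4, eff_end=8, asr_beg=8, asr_end=12)
```

Here row and later_version are adjacent along the effective time axis and so do not intersect, while row and later_assertion share clock ticks on both axes.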
Another approach is to manage each of the two temporal dimensions separately. One reason for taking this approach is that, for the standard temporal model, the two temporal dimensions behave differently. Specifically, for the standard model, transaction time always moves forwards, whereas valid time can move forwards or backwards. This means that a bi-temporal row can be inserted into a table proactively in valid time, but can never be inserted into a table proactively in transaction time.
Asserted Versioning, as we have already pointed out, supports both forwards and backwards movement in both temporal dimensions. So for Asserted Versioning, there is no difference in behavior which would justify separating the two temporal dimensions for indexing purposes. Specifically, Asserted Versioning supports both proactive (future-dated) versions and proactive assertions (i.e. deferred assertions) and also both retroactive versions and an approval transaction which can move deferred assertions backwards in time, but not prior to Now().
In Chapter 15, we will describe certain indexing strategies that will improve performance using today’s DBMS index designs.
Temporal Extensions to SQL
Following [2000, Snodgrass], we will refer to a future release of the SQL language that will contain temporal extensions as SQL3. A more detailed discussion may be found in that book, although we should note that the book is, at the time of publication of this book, 10 years old.
Temporal Upward Compatibility
One issue related to changes in the SQL standard is temporal upward compatibility. In describing SQL3, Snodgrass states that “(t)emporal upward compatibility at its core says this: ‘Take an application that doesn’t involve time, that concerns only the current reality. Alter one or more of the tables so that they now have temporal support. The application should run as before, without changing a single line of code’” [2000, Snodgrass, p. 449].
This cannot be an objective for Asserted Versioning, because we are limited to current SQL, not to a future release of SQL that builds temporal extensions into the language itself. But we can come close. We can achieve this objective for queries by using a view which filters out all but current data, and by redirecting existing queries to that view. We can achieve this objective for temporal inserts, updates and deletes by defining default effective and assertion dates for these transactions. These will be default dates that cause these transactions, as written by their authors, and as parsed and submitted to the DBMS by the AVF, to physically insert and update current assertions of the current versions of the objects they reference.
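A minimal sketch of both halves of this approach, using Python's standard sqlite3 module, is shown here. The view name Policy_Current, the effective date column names, the policy_type column, and the use of column defaults in place of the AVF's own transaction handling are all assumptions made for illustration; the sketch covers inserts only, not temporal updates or deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Policy_AV (
    oid         INTEGER,
    eff_beg_dt  TEXT DEFAULT (date('now')),  -- hypothetical column names
    eff_end_dt  TEXT DEFAULT '9999-12-31',   -- for the effective period
    asr_beg_dt  TEXT DEFAULT (date('now')),
    asr_end_dt  TEXT DEFAULT '9999-12-31',
    policy_type TEXT                         -- hypothetical business column
);

-- The view filters out everything but currently asserted, currently
-- effective rows, so a non-temporal query can simply be pointed at it.
CREATE VIEW Policy_Current AS
SELECT oid, policy_type
FROM Policy_AV
WHERE eff_beg_dt <= date('now') AND eff_end_dt > date('now')
  AND asr_beg_dt <= date('now') AND asr_end_dt > date('now');
""")

# An insert as a non-temporal application would write it: the default
# dates make it a current assertion of a current version.
conn.execute("INSERT INTO Policy_AV (oid, policy_type) VALUES (55, 'PPO')")

# A past version and a deferred assertion, both hidden by the view.
conn.execute(
    "INSERT INTO Policy_AV VALUES "
    "(56, '2000-01-01', '2001-01-01', '2000-01-01', '9999-12-31', 'HMO')"
)
conn.execute(
    "INSERT INTO Policy_AV VALUES "
    "(57, '2000-01-01', '9999-12-31', '9000-01-01', '9999-12-31', 'POS')"
)

current = conn.execute(
    "SELECT oid, policy_type FROM Policy_Current ORDER BY oid"
).fetchall()
```

SQLite's date('now') returns the current date in UTC, which matches the convention of storing temporal parameters in UTC.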
The PERIOD Datatype
A second issue related to changes in the SQL standard is the need for a PERIOD datatype. This new datatype will not change the semantics of temporal data management, but it will simplify the expression of those semantics. For one thing, a single column will replace a pair of dates. This will simplify the way that Allen relationships are specified in SQL statements. For example, it will change the expression with which we ask whether a point in time is or is not contained in a period of time. Where P and T are, respectively, a point in time and a period of time, and T1 and T2 are dates delimiting T using the closed-open representation, we currently must ask whether P is a clock tick within T like this:
WHERE T1 <= P AND P < T2
With the PERIOD datatype, and the new Allen relationship operators that will accompany it (including such derivative operators as those used in our taxonomy), we will be able to ask the same question like this:
WHERE T OCCUPIES P
A PERIOD datatype will also make it easier to enforce constraints on time periods, such as ensuring that two time periods do not intersect. When representing time periods by means of begin and end dates, this is impossible to do with only an index. Here’s why.
Consider the time period represented by the closed-open pair [4/23/2012 – 8/04/2014]. Suppose that we want to define an exclusive index on time periods. The problem is that there is no way, by means of any standards-compliant indexing method available with today’s technology, to exclude [3/12/2011 – 4/24/2012], or [10/15/2012 – 9/30/2013], or [6/1/2014 – 12/31/2014], or any other time period that in fact [intersects] the time period designated by [4/23/2012 – 8/04/2014]. The index sees two columns of values, and knows that the combination of values must be unique in each instance. That’s all it sees, that’s all it knows, and that’s all it can enforce.
But if we had a PERIOD datatype, and SQL extensions and indexing methods that could recognize and manage that datatype, then all the Allen relationships among time periods could be easily expressed, and the very important [excludes] relationship could be enforced by means of a unique index.
Lacking that future technology, and the standards needed to ensure interoperability across different vendor implementations, the AVF contains its own code that effectively turns pairs of dates into a user-defined PERIOD datatype.
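The gap between what a unique index checks and what an [excludes] constraint requires can be sketched in Python, using the dates from the example above. The Period class and its methods are hypothetical stand-ins for a user-defined period type; they are not the AVF's code.

```python
from datetime import date
from typing import NamedTuple


class Period(NamedTuple):
    beg: date  # closed: first clock tick in the period
    end: date  # open: first clock tick after the period

    def contains(self, point):
        """The closed-open test: T1 <= P AND P < T2."""
        return self.beg <= point < self.end

    def intersects(self, other):
        """The two periods share at least one clock tick."""
        return self.beg < other.end and other.beg < self.end


existing = Period(date(2012, 4, 23), date(2014, 8, 4))
candidates = [
    Period(date(2011, 3, 12), date(2012, 4, 24)),
    Period(date(2012, 10, 15), date(2013, 9, 30)),
    Period(date(2014, 6, 1), date(2014, 12, 31)),
]

# All a unique index over the two date columns can check:
# is this pair of values different from the pair already present?
unique_index_accepts = [c != existing for c in candidates]

# What an [excludes] constraint actually needs to check.
period_check_accepts = [not c.intersects(existing) for c in candidates]
```

Every candidate passes the uniqueness test, since no pair of dates repeats, yet every one is rejected by the period-aware test.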
Temporal Primary Keys
A third issue related to changes in the SQL standard is support for temporal primary keys. With those temporal extensions, we will be able to declare a temporal primary key to the DBMS and, by the same token, declare temporal foreign keys as well. But what is it we will be declaring? Temporal tags added to physically unique identifiers of rows of otherwise non-temporal tables? Or something more?
If a SQL standard for bi-temporality, when we eventually have one, is a standard for adding two temporal tags to rows in otherwise non-temporal tables, and providing a PERIOD data type and Allen relationship operators to manage the data thus tagged, then most of the semantics of bi-temporality will have been left out of that standard, and left up to developers. The managed objects of temporal data management are not physical rows in physical tables. They are collections of one or more rows which represent temporally delimited claims about temporally delimited statements about what real-world persistent objects were like, are like, or will be like.
As long as every database table contains one and only one row for each instance of the type indicated by the table, it is easy to forget about the semantics and concentrate on the mechanics. Primary key uniqueness is mechanics; its role in eliminating row-level synonyms (and its failure to address the problem of row-level homonyms) is the semantics that are easily and almost always overlooked. Foreign key referential integrity is mechanics; its role in expressing existence dependencies among the objects represented by those rows is the semantics that are easily and almost always overlooked.
It has been this one-to-one correlation between rows and the objects they represent that has made it easy to give short shrift to semantics, and to then get right down to what really fascinates engineering types: the mechanics of making things work. But once we attempt to manage both the epistemological