Managing Time in Relational Databases - P5



the row representing that assertion will cease to be asserted on that date even if no correcting assertion is supplied to replace it. The last reason an assertion end date may be changed is to lock an assertion which has been updated or deleted by a deferred transaction, until the resulting deferred assertion becomes current. We will have more to say about deferred transactions, deferred assertions and locking in Chapter 13.

Now() and UTC

Keeping our notation DBMS agnostic, and keeping the clock tick granularity generic, we will refer to the current moment, to right now, as Now().⁷ SQL Server may use getdate(), and DB2 may use Current Timestamp or Current Date. Depending on our clock tick duration, we might need to use a date formatting function to set the current granularity. In our examples, we generally use one month as our clock tick granularity. However, for our purposes, Now() can take on values at whatever level of granularity we choose to use, including day, second or microsecond.

Now() is usually assumed to represent the current moment by using local time. But local time may change because of entering or leaving Daylight Saving Time. And another issue is time zone. At any one time, data about to update a database may exist in a different time zone than the database itself. Users about to retrieve data from a database may exist in a different time zone than the database itself. And, of course, federated queries may attempt to join data from databases located in different time zones.

So the data values returned by Now() can change for reasons other than the passage of time. Daylight Saving Time can change those values. At any one point in time, those values can differ because of time zones. Clearly, we need a reference framework, and a set of values, that will not change for any reason other than the passage of time, and that will be the same value, at any point in time, the world over and year round.

This reference framework is Coordinated Universal Time (UTC).⁸ To make use of UTC, our Asserted Versioning Framework will convert local time to UTC on maintenance and queries, and

7 Now() is a function that returns the current date. It is not a value. However, we will often use it to designate a specific point in time. For example, we may say that a time period starts at Now() and continues on until 12/31/9999. This is a shorthand way of emphasizing that, whenever that time period was created, it was given as its begin date the value returned by Now() at that moment.

8 However, even in UTC, some variations in time values do not reflect the passage of time. We are referring here to the periodic adjustments in UTC made by adding or removing leap seconds, as we described in an earlier section of this chapter.


will store Asserted Versioning temporal parameters, such as begin and end dates, in UTC. For example, with Policy_AV being an asserted version table of insurance policies, we would insert a policy like this:

INSERT INTO Policy_AV (oid, asr_beg_dt)
VALUES (55, CURRENT TIMESTAMP - CURRENT TIMEZONE)
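As a rough illustration of the conversion described above, the following Python sketch (Python rather than DB2 SQL, so that it can be run without a database) normalizes local timestamps to UTC before they would be stored in a column such as asr_beg_dt. The helper name to_utc and the fixed offsets are assumptions made for this example; they are not part of the Asserted Versioning Framework.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_dt: datetime) -> datetime:
    """Convert a timezone-aware local timestamp to UTC."""
    if local_dt.tzinfo is None:
        # Without a known offset, a wall-clock value does not identify an instant.
        raise ValueError("naive timestamp: the local UTC offset must be known")
    return local_dt.astimezone(timezone.utc)


# Hypothetical fixed offsets, chosen only for illustration.
UTC_MINUS_5 = timezone(timedelta(hours=-5))
UTC_PLUS_1 = timezone(timedelta(hours=1))

# The same instant, as read from wall clocks in two different time zones.
local_a = datetime(2012, 1, 1, 7, 0, tzinfo=UTC_MINUS_5)
local_b = datetime(2012, 1, 1, 13, 0, tzinfo=UTC_PLUS_1)

# Both local readings normalize to one and the same UTC value.
asr_beg_a = to_utc(local_a)
asr_beg_b = to_utc(local_b)
print(asr_beg_a.isoformat(), asr_beg_b.isoformat())
```

Storing the normalized value means that two sessions in different time zones, acting at the same moment, would record the same assertion begin date.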

Queries will perform better if we do the time conversion before using the value as a selection predicate in the SQL itself. This is because most optimizers treat functions that appear in predicates as non-indexable. For example, in DB2, we should write:

SET :my-cut = TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE
SELECT ... FROM ...
WHERE oid = 55
AND asr_beg_dt <= :my-cut
AND asr_end_dt > :my-cut

rather than

SELECT ... FROM ...
WHERE oid = 55
AND asr_beg_dt <= TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE
AND ...
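The preferred pattern can be sketched in Python, with SQLite standing in for DB2 purely so that the example runs anywhere. The conversion to UTC happens once in application code, and the query receives a plain comparison value. The table contents, the index name and the helper utc_text are assumptions made for this illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy_av (oid INTEGER, asr_beg_dt TEXT, asr_end_dt TEXT)"
)
conn.execute("CREATE INDEX ix_policy_asr ON policy_av (oid, asr_beg_dt, asr_end_dt)")
conn.executemany(
    "INSERT INTO policy_av VALUES (?, ?, ?)",
    [
        (55, "2012-01-01 00:00:00", "2012-06-01 00:00:00"),
        (55, "2012-06-01 00:00:00", "9999-12-31 00:00:00"),
    ],
)


def utc_text(local_dt: datetime) -> str:
    """Render a timezone-aware timestamp as UTC text that sorts chronologically."""
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Local value supplied by the user: 21:00 on May 31 at a UTC-5 offset.
my_local_time_value = datetime(
    2012, 5, 31, 21, 0, tzinfo=timezone(timedelta(hours=-5))
)

# Convert once, outside the SQL, so the predicates compare columns to constants.
my_cut = utc_text(my_local_time_value)

rows = conn.execute(
    "SELECT asr_beg_dt, asr_end_dt FROM policy_av "
    "WHERE oid = ? AND asr_beg_dt <= ? AND asr_end_dt > ?",
    (55, my_cut, my_cut),
).fetchall()
print(my_cut, rows)
```

Note that the local time falls on May 31, yet the UTC value falls on June 1, so the row selected is the one whose assertion period begins on June 1.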

However, if these functions are used for display purposes, then there is no reason to exclude them from the queries. For example:

SELECT asr_beg_dt + CURRENT TIMEZONE AS my_local_asr_beg_dt
FROM ...

It would also be useful to add alternate columns for the temporal dates in our views that have the translation to local time performed already.

The Very Concept of Bi-Temporality

Business IT professionals were using tables with both an effective date and a physical row create date by the early 1990s.⁹ But they were doing so with apparently no knowledge of academic work on bi-temporality. At that time, these version tables which also contained a row create date were state of the art in best practice methods for managing temporal data. We will discuss them in the next chapter.

With a row creation date, of course, any query can be restricted to the rows present in a table as of any specific date by including a WHERE clause predicate that qualifies only those rows whose create date is less than or equal to the specified date. With two effective dates, tables like these are also able to specify one of the two temporal dimensions that make up full bi-temporality.

9 Or timestamps, or other datatypes. We remind the reader that, throughout this book, we use the date datatype for all temporal columns, and a first of the month value for all our dates. This simplifies the presentation without affecting any of the semantics. In real business applications, of course, these columns would often be timestamps.

The standard temporal model uses the term “valid time” where we use the term “effective time”. But the difference is purely verbal. We have found no differences between how valid time works in the standard model, and how effective time works in Asserted Versioning. We use “effective time” because it is the preferred term among business IT professionals, and also because it readily adapts itself to other grammatical forms such as “becoming effective” and “in effect”.

The standard model states that “(v)alid time captur(es) the history of a changing reality, and transaction time captur(es) the sequence of states of a changing table. A table supporting both is termed a ‘bi-temporal table’” [2000, Snodgrass, p. 20]. But as we will see later, Asserted Versioning does not define bi-temporality in exactly the same way. The difference lies primarily in the second of the two temporal dimensions, what computer scientists call “transaction time” and what we call “assertion time”. While a transaction begin date always indicates when a row is physically inserted into a table, an assertion begin date indicates when we are first willing to assert, or claim, that a row is a true statement about the object it represents, during that row’s effective (valid) time period, and also that the quality of the row’s data is good enough to base business decisions on.

In the standard temporal model, the beginning of a transaction time period is the date on which the row is created. Obviously, once the row is created, that date cannot be changed. But in the Asserted Versioning temporal model, an assertion time period begins either on the date a row is created, or on a later date. Because an assertion begin date is not necessarily the same as the date on which its row is physically created, Asserted Versioning needs, in addition to the pair of dates that define this time period, an additional date which is the physical creation date of each row. That date serves as an audit trail, and as a means of reconstructing a table as it physically existed at any past point in time.
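The distinction between what a table physically contained and what it asserted can be made concrete with a small Python sketch. The column name row_crt_dt and the two helper functions are assumptions introduced for this illustration; only asr_beg_dt and asr_end_dt appear in the SQL examples earlier in the text.

```python
from dataclasses import dataclass
from datetime import date

END_OF_TIME = date(9999, 12, 31)


@dataclass(frozen=True)
class Row:
    oid: int
    asr_beg_dt: date  # first date on which the row is asserted
    asr_end_dt: date  # closed-open end of the assertion time period
    row_crt_dt: date  # hypothetical name for the physical creation date


def physically_present(rows, as_of):
    """Rows that physically existed in the table on the given date."""
    return [r for r in rows if r.row_crt_dt <= as_of]


def asserted(rows, as_of):
    """Rows whose assertion time period contains the given date."""
    return [r for r in rows if r.asr_beg_dt <= as_of < r.asr_end_dt]


rows = [
    # Asserted from the day it was created.
    Row(55, date(2012, 1, 1), END_OF_TIME, date(2012, 1, 1)),
    # Created in February, but not asserted until June.
    Row(56, date(2012, 6, 1), END_OF_TIME, date(2012, 2, 1)),
]

as_of = date(2012, 3, 1)
print([r.oid for r in physically_present(rows, as_of)])
print([r.oid for r in asserted(rows, as_of)])
```

In March, both rows are physically in the table, but only the first is asserted. The creation date answers the audit question, and the assertion period answers the question of what we were claiming to be true.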


What are these rows with future assertion begin dates? To take a single example, they might be rows for which we have some of the business data, but not all of it, rows which are in the process of being made ready “for prime time”. These rows, which may be assertions about past, present or future versions, are not yet ready, we will say, to become part of the production data in the table, not yet ready to become rows that we are willing to present to the world and of which we are willing to say “We stand behind the statements these rows make. We claim that the statements they make are (or are likely to become) true, and that the information these rows provide meets the standards of reliability understood (or explicitly stated) to apply to all rows in this table”.

So the semantics of the standard temporal model are fully supported by Asserted Versioning. But Asserted Versioning adds the semantics of what we call deferred assertions, and which we have just briefly described. As we will see in later chapters, deferred assertions are just one kind of internalized pipeline dataset, and the internalization of pipeline datasets can eliminate a large part of the IT maintenance budget by eliminating the need to manage pipeline datasets as distinct physical objects.

Allen Relationships

Allen relationships describe all possible positional relationships between two time periods along a common timeline. This includes the special case of one or both time periods being a point in time, i.e., being exactly one clock tick in length.

There are 13 Allen relationships in total. Six have a corresponding inverse relationship, and one does not. Standard treatments of the Allen relationships may be found in both [2000, Snodgrass] and [2002, Date, Darwen, Lorentzos]. We have found it useful to reconstruct the Allen relationships as a binary taxonomy. Our taxonomy is shown in Figure 3.4.

In this diagram, the leaf nodes include a graphic in which there are two timelines, each represented by a dashed line. All the leaf nodes but one have an inverse, and that one is italicized; when two time periods are identical, they do not have a distinct inverse. Thus, this taxonomy specifies 13 leaf-node relationships which are, in fact, precisely the 13 Allen relationships.

The names of the Allen relationships are standard, and have been since Allen wrote his seminal article in 1983. But those names, and the names of the higher-level nodes in our own taxonomy of the Allen relationships, are also expressions in ordinary language. In order to distinguish between the ordinary language and the technical uses of these terms, we will include the names of Allen relationships and our other taxonomy nodes in brackets when we are discussing them. We will also underline the non-leaf node relationships in the taxonomy, to emphasize that they are relationships we have defined, and are not one of the Allen relationships.

In the following discussion, the first time period in a pair of them is the one that is either earlier than the other, or not longer than the other.

Given two time periods on a common timeline, either they have at least one clock tick in common or they do not. If they do, we will say that they [intersect] one another. If they do not, we will say that they [exclude] one another.

If there is an [intersects] relationship between two time periods, then either one [fills] the other or each [overlaps] the other. If one time period [fills] another, then all its clock ticks are also in the time period it [fills], but not necessarily vice versa. If one time period [overlaps] another, then the latter also overlaps the former; but, being the later of the two time periods, we say that the latter time period has the inverse relationship, [overlaps⁻¹]. In the overlaps cases, each has at least one clock tick that the other does not have, as well as having at least one clock tick that the other does have.

Figure 3.4 The Asserted Versioning Allen Relationship Taxonomy. (Diagram not reproduced. Root node: Time Period Relationships Along a Common Timeline. Node labels recoverable from the extracted text: Equals, Occupies, Aligns, Starts, Finishes, During.)

If two time periods [exclude] one another, then they do not share any clock ticks, and they are either non-contiguous or contiguous. If there is at least one clock tick between them, they are non-contiguous and we say that one is [before] the other. Otherwise they are contiguous and we say that one [meets] the other.

If one time period [fills] the other, then either they are [equal], or one [occupies] the other. If they are [equal], then neither has a clock tick that the other does not have. If one [occupies] the other, then all the clock ticks in the occupying time period are also in the occupied time period, but not vice versa.

If one time period [occupies] the other, then either they share an [aligns] relationship, or one occurs [during] the other. If they are aligned, then they either start on the same clock tick or end on the same clock tick, and we say that one either [starts] or [finishes] the other. Otherwise, one occurs [during] the other, beginning after the other and ending before it. Note that if two time periods are aligned, one cannot both [start] and [finish] the other because if it did, it would be [equal] to the other.

If one time period [starts] another, they both begin on the same clock tick. If one [finishes] the other, they both end on the same clock tick. If one time period [occupies] another, but they are not aligned, then one occurs [during] the other.

Now let’s consider the special case in which one of the two time periods is a point in time, i.e., is exactly one clock tick in length, and the other one contains two or more clock ticks. This point in time may either [intersect] or [exclude] the time period. If the point in time [intersects] the time period, it also [fills] and [occupies] that time period. If it [aligns] with the time period, then it either [starts] the time period or [finishes] it. Otherwise, the point in time occurs [during] the time period. If the point in time [excludes] the time period, then either may be [before] the other, or they may [meet].

Finally, let’s consider one more special case, that in which both the time periods are points in time. Those two points in time may be [equal], or one may be [before] the other, or they may [meet]. There are no other Allen relationships possible for them.
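The taxonomy walked through above can be expressed as a short classification routine. The following Python sketch assumes closed-open periods over integer clock ticks; the names Period, allen, PARENT and ancestors, and the "^-1" suffix used to mark an inverse relationship, are conventions invented for this example.

```python
from itertools import product
from typing import NamedTuple


class Period(NamedTuple):
    beg: int  # first clock tick in the period
    end: int  # first clock tick after the period (closed-open)


def allen(a: Period, b: Period) -> str:
    """Name the Allen relationship of a to b; '^-1' marks an inverse."""
    if a.beg >= a.end or b.beg >= b.end:
        raise ValueError("a period must contain at least one clock tick")
    if a == b:
        return "equals"
    # [excludes]: no clock tick in common.
    if a.end < b.beg:
        return "before"
    if b.end < a.beg:
        return "before^-1"
    if a.end == b.beg:
        return "meets"
    if b.end == a.beg:
        return "meets^-1"
    # [intersects]: at least one clock tick in common.
    if a.beg == b.beg:
        return "starts" if a.end < b.end else "starts^-1"
    if a.end == b.end:
        return "finishes" if a.beg > b.beg else "finishes^-1"
    if b.beg < a.beg and a.end < b.end:
        return "during"
    if a.beg < b.beg and b.end < a.end:
        return "during^-1"
    return "overlaps" if a.beg < b.beg else "overlaps^-1"


# Each taxonomy node mapped to its parent node, as described in the text.
PARENT = {
    "fills": "intersects", "overlaps": "intersects",
    "equals": "fills", "occupies": "fills",
    "aligns": "occupies", "during": "occupies",
    "starts": "aligns", "finishes": "aligns",
    "before": "excludes", "meets": "excludes",
}


def ancestors(relationship: str) -> list:
    """Higher-level taxonomy nodes above a leaf, nearest first."""
    node = relationship.replace("^-1", "")
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


# Enumerate every pair of periods over five clock ticks.
periods = [Period(b, e) for b, e in product(range(5), range(1, 6)) if b < e]
found = {allen(a, b) for a in periods for b in periods}
print(sorted(found))
```

Enumerating all pairs of periods over a small timeline yields exactly 13 distinct names, and a one-tick period placed inside a longer one is classified as [during], which matches the special cases described in the text.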

As we will see later, four of these Allen relationship categories are especially important. They will be discussed in later chapters, but we choose to mention them here.

(i) The [intersects] relationship is important because for a temporal insert transaction to be valid, its effective time period cannot intersect that of any episode for the same object which is already in the target table. By the same token, for a temporal update or delete transaction to be valid, the target table must already contain at least one episode for the same object whose effective time period does [intersect] the time period designated by the transaction.

(ii) The [fills] relationship is important because violations of the temporal analog of referential integrity always involve the failure of a child time period to [fill] a parent time period. We will be frequently discussing this relationship from the parent side, and we would like to avoid having to say things like “... failure of a parent time period to be filled by a child time period”. So we will use the term “includes” as a synonym for “is filled by”, i.e., as a synonym for [fills⁻¹]. Now we can say “... failure of a parent time period to include a child time period”.

(iii) The [before] relationship is important because it distinguishes episodes from one another. Every episode of an object is non-contiguous with every other episode of the same object, and so for each pair of them, one of them must be [before] the other.

(iv) The [meets] relationship is important because it groups versions for the same object into episodes. A series of versions for the same object that are all contiguous, i.e., that all [meet], fall within the same episode of that object.
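Points (i), (iii) and (iv) can be illustrated together. The Python sketch below groups the versions of one object into episodes by testing [meets], and then checks whether a proposed temporal insert would [intersect] an existing episode. The function names and the sample dates are assumptions made for this example, and the real validation performed by the AVF is not shown in this text.

```python
from datetime import date

END_OF_TIME = date(9999, 12, 31)


def intersects(a, b):
    """True if closed-open periods a and b share at least one clock tick."""
    return a[0] < b[1] and b[0] < a[1]


def group_into_episodes(versions):
    """Group one object's versions (closed-open effective periods) into episodes."""
    episodes = []
    for version in sorted(versions):
        if episodes:
            previous = episodes[-1][-1]
            if intersects(previous, version):
                raise ValueError("versions of one object must not [intersect]")
            if previous[1] == version[0]:
                # The previous version [meets] this one: same episode.
                episodes[-1].append(version)
                continue
        # First version, or a gap after the previous one: start a new episode.
        episodes.append([version])
    return episodes


def insert_is_valid(episodes, period):
    """A temporal insert is valid only if it intersects no existing episode."""
    spans = [(ep[0][0], ep[-1][1]) for ep in episodes]
    return not any(intersects(span, period) for span in spans)


versions = [
    (date(2012, 3, 1), date(2012, 6, 1)),
    (date(2012, 1, 1), date(2012, 3, 1)),
    (date(2012, 9, 1), END_OF_TIME),
]
episodes = group_into_episodes(versions)
print(len(episodes))
print(insert_is_valid(episodes, (date(2012, 6, 1), date(2012, 9, 1))))
print(insert_is_valid(episodes, (date(2012, 5, 1), date(2012, 7, 1))))
```

Because the periods are closed-open, an insert that exactly fills the gap between two episodes does not intersect either of them.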

Advanced Indexing Strategies

Indexes are one way to improve performance. And it should be clear that it would be a serious performance handicap if we could not define indexes over either or both of the two time periods of a bi-temporal table. But this proves to be more complex than it might at first sight appear to be.

The issue is that traditional indexes contain pointers to rows, pointers which are based on discrete values, while the two time periods of rows in bi-temporal tables are not discrete values, but rather an unbroken and non-overlapping sequence of such values. Such rows occupy points in effective (valid) time or in assertion (transaction) time only as a limit case. What they really occupy are intervals along those two timelines. That’s the reason we need two dates to describe each of them. Traditional balanced-tree indexes work well with discrete values, including such discrete values as dates. But they don’t work well with intervals of time, i.e., with time periods.

But indexing methods which manage intervals are being developed. Specifically, some bi-temporal indexing methods manage the two intervals for a bi-temporal object as a single unit, which would appear as a rectangle on a Cartesian graph in which one temporal dimension is represented by the X-axis and the other by the Y-axis.

Another approach is to manage each of the two temporal dimensions separately. One reason for taking this approach is that, for the standard temporal model, the two temporal dimensions behave differently. Specifically, for the standard model, transaction time always moves forwards, whereas valid time can move forwards or backwards. This means that a bi-temporal row can be inserted into a table proactively in valid time, but can never be inserted into a table proactively in transaction time.

Asserted Versioning, as we have already pointed out, supports both forwards and backwards movement in both temporal dimensions. So for Asserted Versioning, there is no difference in behavior which would justify separating the two temporal dimensions for indexing purposes. Specifically, Asserted Versioning supports both proactive (future-dated) versions and proactive assertions (i.e., deferred assertions) and also both retroactive versions and an approval transaction which can move deferred assertions backwards in time, but not prior to Now().

In Chapter 15, we will describe certain indexing strategies that will improve performance using today’s DBMS index designs.

Temporal Extensions to SQL

Following [2000, Snodgrass], we will refer to a future release of the SQL language that will contain temporal extensions as SQL3. A more detailed discussion may be found in that book, although we should note that the book is, at the time of publication of this book, 10 years old.

Temporal Upward Compatibility

One issue related to changes in the SQL standard is temporal upward compatibility. In describing SQL3, Snodgrass states that “(t)emporal upward compatibility at its core says this: ‘Take an application that doesn’t involve time, that concerns only the current reality. Alter one or more of the tables so that they now have temporal support. The application should run as before, without changing a single line of code’” [2000, Snodgrass, p. 449].

This cannot be an objective for Asserted Versioning, because we are limited to current SQL, not to a future release of SQL that builds temporal extensions into the language itself. But we can come close. We can achieve this objective for queries by using a view which filters out all but current data, and by redirecting existing queries to that view. We can achieve this objective for temporal inserts, updates and deletes by defining default effective and assertion dates for these transactions. These will be default dates that cause these transactions, as written by their authors, and as parsed and submitted to the DBMS by the AVF, to physically insert and update current assertions of the current versions of the objects they reference.
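The query half of this approach can be sketched with a view that exposes only currently asserted, currently effective rows. The example below uses Python with SQLite as a stand-in DBMS so that it is runnable. The view name policy, the columns eff_beg_dt, eff_end_dt and policy_type, and the sample rows are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE policy_av (
        oid         INTEGER,
        eff_beg_dt  TEXT,
        eff_end_dt  TEXT,
        asr_beg_dt  TEXT,
        asr_end_dt  TEXT,
        policy_type TEXT
    );

    -- Existing queries are redirected to this view, which keeps only the
    -- currently asserted rows of the currently effective versions.
    CREATE VIEW policy AS
    SELECT oid, policy_type
    FROM policy_av
    WHERE eff_beg_dt <= date('now') AND eff_end_dt > date('now')
      AND asr_beg_dt <= date('now') AND asr_end_dt > date('now');
    """
)

conn.executemany(
    "INSERT INTO policy_av VALUES (?, ?, ?, ?, ?, ?)",
    [
        # A past version: no longer in effect.
        (55, "2010-01-01", "2011-01-01", "2010-01-01", "9999-12-31", "HMO"),
        # The current version, currently asserted.
        (55, "2011-01-01", "9999-12-31", "2011-01-01", "9999-12-31", "PPO"),
        # A withdrawn assertion about the current version.
        (55, "2011-01-01", "9999-12-31", "2010-06-01", "2011-01-01", "POS"),
        # A deferred assertion: not yet asserted.
        (55, "2011-01-01", "9999-12-31", "9990-01-01", "9999-12-31", "EPO"),
    ],
)

# A query written with no knowledge of time still returns one row per object.
current = conn.execute("SELECT oid, policy_type FROM policy WHERE oid = 55").fetchall()
print(current)
```

SQLite evaluates date('now') in UTC, which is in keeping with the earlier recommendation to store temporal parameters in UTC.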

The PERIOD Datatype

A second issue related to changes in the SQL standard is the need for a PERIOD datatype. This new datatype will not change the semantics of temporal data management, but it will simplify the expression of those semantics. For one thing, a single column will replace a pair of dates. This will simplify the way that Allen relationships are specified in SQL statements. For example, it will change the expression with which we ask whether a point in time is or is not contained in a period of time. Where P and T are, respectively, a point in time and a period of time, and T1 and T2 are dates delimiting T using the closed-open representation, we currently must ask whether P is a clock tick within T like this:

WHERE T1 <= P AND P < T2

With the PERIOD datatype, and the new Allen relationship operators that will accompany it (including such derivative operators as those used in our taxonomy), we will be able to ask the same question like this:

WHERE P OCCUPIES T

A PERIOD datatype will also make it easier to enforce constraints on time periods, such as ensuring that two time periods do not intersect. When representing time periods by means of begin and end dates, this is impossible to do with only an index. Here’s why.

Consider the time period represented by the closed-open pair [4/23/2012 – 8/04/2014]. Suppose that we want to define an exclusive index on time periods. The problem is that there is no way, by means of any standards-compliant indexing method available with today’s technology, to exclude [3/12/2011 – 4/24/2012], or [10/15/2012 – 9/30/2013], or [6/1/2014 – 12/31/2014], or any other time period that in fact [intersects] the time period designated by [4/23/2012 – 8/04/2014]. The index sees two columns of values, and knows that the combination of values must be unique in each instance. That’s all it sees, that’s all it knows, and that’s all it can enforce.
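The limitation can be demonstrated directly. In the Python sketch below, SQLite stands in for any DBMS with ordinary unique indexes; the table t, its column names and the helper intersecting_rows are assumptions made for this example. The unique index rejects an exact duplicate but accepts all three intersecting periods named in the text, so the intersection test has to be written as a separate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (oid INTEGER, beg_dt TEXT, end_dt TEXT)")
conn.execute("CREATE UNIQUE INDEX ux_t ON t (oid, beg_dt, end_dt)")

# The original time period, in closed-open form.
conn.execute("INSERT INTO t VALUES (1, '2012-04-23', '2014-08-04')")

# Three periods that intersect it. The unique index accepts every one of
# them, because no combination of column values is repeated.
for beg, end in [
    ("2011-03-12", "2012-04-24"),
    ("2012-10-15", "2013-09-30"),
    ("2014-06-01", "2014-12-31"),
]:
    conn.execute("INSERT INTO t VALUES (1, ?, ?)", (beg, end))
accepted = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# Only an exact duplicate of both dates is rejected.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO t VALUES (1, '2012-04-23', '2014-08-04')")
except sqlite3.IntegrityError:
    duplicate_rejected = True


def intersecting_rows(conn, oid, beg, end):
    """Rows for the object whose closed-open period intersects [beg, end)."""
    return conn.execute(
        "SELECT beg_dt, end_dt FROM t WHERE oid = ? AND beg_dt < ? AND end_dt > ?",
        (oid, end, beg),
    ).fetchall()


print(accepted, duplicate_rejected)
print(len(intersecting_rows(conn, 1, "2012-04-23", "2014-08-04")))
```

A check of this kind, run before each insert, is one way application code can approximate the constraint that a PERIOD-aware index would enforce declaratively.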


But if we had a PERIOD datatype, and SQL extensions and indexing methods that could recognize and manage that datatype, then all the Allen relationships among time periods could be easily expressed, and the very important [excludes] relationship could be enforced by means of a unique index. Lacking that future technology, and the standards needed to ensure interoperability across different vendor implementations, the AVF contains its own code that effectively turns pairs of dates into a user-defined PERIOD datatype.

Temporal Primary Keys

A third issue related to changes in the SQL standard is support for temporal primary keys. With those temporal extensions, we will be able to declare a temporal primary key to the DBMS and, by the same token, declare temporal foreign keys as well. But what is it we will be declaring? Temporal tags added to physically unique identifiers of rows of otherwise non-temporal tables? Or something more?

If a SQL standard for bi-temporality, when we eventually have one, is a standard for adding two temporal tags to rows in otherwise non-temporal tables, and providing a PERIOD data type and Allen relationship operators to manage the data thus tagged, then most of the semantics of bi-temporality will have been left out of that standard, and left up to developers. The managed objects of temporal data management are not physical rows in physical tables. They are collections of one or more rows which represent temporally delimited claims about temporally delimited statements about what real-world persistent objects were like, are like, or will be like.

As long as every database table contains one and only one row for each instance of the type indicated by the table, it is easy to forget about the semantics and concentrate on the mechanics. Primary key uniqueness is mechanics; its role in eliminating row-level synonyms (and its failure to address the problem of row-level homonyms) is the semantics that are easily and almost always overlooked. Foreign key referential integrity is mechanics; its role in expressing existence dependencies among the objects represented by those rows is the semantics that are easily and almost always overlooked.

It has been this one-to-one correlation between rows and the objects they represent that has made it easy to give short shrift to semantics, and to then get right down to what really fascinates engineering types: the mechanics of making things work. But once we attempt to manage both the epistemological
