Managing Time in Relational Databases, Part 4


approach is better at tracking changes to persistent objects and to relationships other than metric balances.

State Temporal Data: Uni-Temporal and Bi-Temporal Data

At this point in our discussion, we are concerned with state data rather than with event data, and with state data that is queryable rather than state data that needs to be reconstructed. What then are the various options for managing temporal queryable state data?

First of all, we need to recognize that there are two kinds of states to manage. One is the state of the things we are interested in, the states those things pass through as they change over time. But there is another kind of state, that being the state of the data itself. Data, such as rows in tables, can be in one of two states: correct or incorrect. (As we will see in Chapter 12, it can also be in a third state, one in which it is neither correct nor incorrect.) Version tables and assertion tables record, respectively, the state of objects and the state of our data about those objects.

Uni-Temporal State Data

In a conventional Customer table, each row represents the current state of a customer. Each time the state of a customer changes, i.e. each time a row is updated, the old data is overwritten with the new data. By adding one (or sometimes two) date(s) or timestamp(s) to the primary key of the table, it becomes a uni-temporal table. But since we already know that there are two different temporal dimensions that can be associated with data, we know to ask “What kind of uni-temporal table?”

As we saw in the Preface, there are uni-temporal version tables and uni-temporal assertion tables. Version tables keep track of changes that happen in the real world, changes to the objects represented in those tables. Each change is recorded as a new version of an object. Assertion tables keep track of corrections we have made to data we later discovered to be in error. Each correction is recorded as a new assertion about the object. The versions make up a true history of what happened to those objects. The assertions make up a virtual logfile of corrections to the data in the table.
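To make the distinction concrete, here is a minimal Python sketch (not taken from the book) of the same customer held in a uni-temporal version table and in a uni-temporal assertion table. The customer data, the names `Row`, `as_of` and `END_OF_TIME`, the use of 9999-12-31 as an open end date, and the closed-open reading of each date pair are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Assumed convention: 9999-12-31 stands for "until further notice".
END_OF_TIME = date(9999, 12, 31)


@dataclass(frozen=True)
class Row:
    customer_id: str
    begin: date   # first day of the period
    end: date     # first day after the period (closed-open, an assumption)
    name: str


def as_of(rows, customer_id, when):
    """Return the row for customer_id whose period contains `when`, or None."""
    for row in rows:
        if row.customer_id == customer_id and row.begin <= when < row.end:
            return row
    return None


# Version table: begin/end delimit when the customer really was in that state.
# Hypothetical history: the customer changed her name on 2010-06-01.
versions = [
    Row("C1", date(2009, 1, 1), date(2010, 6, 1), "Jane Smith"),
    Row("C1", date(2010, 6, 1), END_OF_TIME, "Jane Jones"),
]

# Assertion table: begin/end delimit when we claimed the row was correct.
# Hypothetical correction: a misspelling was fixed on 2009-03-15.
assertions = [
    Row("C1", date(2009, 1, 1), date(2009, 3, 15), "Jane Smyth"),
    Row("C1", date(2009, 3, 15), END_OF_TIME, "Jane Smith"),
]

# A question about the world: what was the customer's name on 2009-12-31?
print(as_of(versions, "C1", date(2009, 12, 31)).name)
# A question about the data: what did our table say on 2009-02-01?
print(as_of(assertions, "C1", date(2009, 2, 1)).name)
```

The two tables have the same shape, which is why the question “What kind of uni-temporal table?” matters: the same date pair means a period in the life of the customer in one table and a period in the life of the data in the other.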

Usually, when table-level temporal data is discussed, the tables turn out to be version tables, not assertion tables. In their book describing the alternative temporal model [2002, Date, Darwen, Lorentzos], the authors focus on uni-temporal versioned data. Bi-temporality is not even alluded to until the penultimate chapter, at which point it is suggested that “logged time history” tables be used to manage the other temporal dimension. Since bi-temporality receives only a passing mention in that book, we choose to classify the alternative temporal model as a uni-temporal model.

In IT best practices for managing temporal data, which we will discuss in detail in Chapter 4, once again the temporal tables are version tables, and error correction is an issue that is mostly left to take care of itself.⁴ For the most part, it does so by overwriting incorrect data.⁵ This is why we classify IT best practices as uni-temporal models.

The Alternative Temporal Model

What we call the alternative temporal model was developed by Chris Date, Hugh Darwen and Dr. Nikos Lorentzos in their book Temporal Data and the Relational Model (Morgan Kaufmann, 2002).⁶ This model is based in large part on techniques developed by Dr. Lorentzos to manage temporal data by breaking temporal durations down into temporally atomic components, applying various transformations to those components, and then re-assembling the components back into those temporal durations. As the authors note, the applicability of this technique is not restricted to temporal data.

As we said, except for the penultimate chapter in that book, the entire book is a discussion of uni-temporal versioned tables. In that chapter, the authors recommend that if there is a requirement to keep track of the assertion time history of a table (which they call “logged-time history”), it be implemented by means of an auxiliary table which is maintained by the DBMS.

⁴ Lacking criteria to distinguish the best from the rest, the term “best practices” has come to mean little more than “standard practices”. What we call “best practices”, and which we discuss in Chapter 4, are standard practices we have seen used by many of our clients.

⁵ An even worse solution is to mix up versions and assertions by creating a new row, with a begin date of Now(), both every time there is a real change, and also every time there is an error in the data to correct. When that happens, we no longer have a history of the changes things went through, because we cannot distinguish versions from corrections. And we no longer have a “virtual logfile” of corrections because we don’t know how far back the corrections should actually have taken effect.

⁶ The word “model”, as used here and also in the phrases “alternative model” and “Asserted Versioning model”, obviously doesn’t refer to a data model of specific subject matter. It means something like theory, but with an emphasis on its applicability to real-world problems. So “the relational model”, as we use the term, for example, means something like “relational theory as implemented in current relational technology”.


In addition, these authors do not attempt, in their book, to explain how this method of managing temporal data would work with current relational technology. Like much of the computer science research on temporal data, they allude to SQL operators and other constructs that do not yet exist, and so their book is in large part a recommendation to the standards committees to adopt the changes to the SQL language which they describe.

Because our own concern is with how to implement temporal concepts with today’s technologies, and also with how to support both kinds of uni-temporal data, as well as fully bi-temporal data, we will have little more to say about the alternative temporal model in this book.

Best Practices

Over several decades, a best practice has emerged in managing temporal queryable state data. It is to manage this kind of data by versioning otherwise conventional tables. The result is versioned tables which, logically speaking, are tables which combine the history tables and current tables described previously. Past, present and future states of customers, for example, are kept in one and the same Customer table. Corrections may or may not be flagged; but if they are not, it will be impossible to distinguish versions created because something about a customer changed from versions created because past customer data was entered incorrectly. On the other hand, if they are flagged, the management and use of these flags will quickly become difficult and confusing.

There are many variations on the theme of versioning, which we have grouped into four major categories. We will discuss them in Chapter 4.

The IT community has always used the term “version” for this kind of uni-temporal data. And this terminology seems to reflect an awareness of an important concept that, as we shall see, is central to the Asserted Versioning approach to temporal data. For the term “version” naturally raises the question “A version of what?”, to which our answer is “A version of anything that can persist and change over time”. This is the concept of a persistent object, and it is, most fundamentally, what Asserted Versioning is about.

Bi-Temporal State Data

We now come to our second option, which is to manage both versions and assertions and, most importantly, their interdependencies. This is bi-temporal data management, the subject of both Dr. Rick Snodgrass’s book [2000, Snodgrass] and of our book.


The Standard Temporal Model

What we call the standard temporal model was developed by Dr. Rick Snodgrass in his book Developing Time-Oriented Database Applications in SQL (Morgan Kaufmann, 2000). Based on the computer science work current at that time, and especially on the work Dr. Snodgrass and others had done on the TSQL (temporal SQL) proposal to the SQL standards committees, it shows how to implement both uni-temporal and bi-temporal data management using then-current DBMSs and then-current SQL.

We emphasize that, as we are writing, Dr. Snodgrass’s book is a decade old. We use it as our baseline view of computer science work on bi-temporal data because most of the computer science literature exists in the form of articles in scientific journals that are not readily accessible to many IT professionals. We also emphasize that Dr. Snodgrass did not write that book as a compendium of computer science research for an IT audience. Instead, he wrote it as a description of how some of that research could be adapted to provide a means of managing bi-temporal data with the SQL and the DBMSs available at that time.

One of the greatest strengths of the standard model is that it discusses and illustrates both the maintenance and the querying of temporal data at the level of SQL statements. For example, it shows us the kind of code that is needed to apply the temporal analogues of entity integrity and referential integrity to temporal data. And for any readers who might think that temporal data management is just a small step beyond the versioning they are already familiar with, many of the constraint-checking SQL statements shown in Dr. Snodgrass’s book should suffice to disabuse them of that notion.

The Asserted Versioning Temporal Model

What we call the Asserted Versioning temporal model is our own approach to managing temporal data. Like the standard model, it attempts to manage temporal data with current technology and current SQL.

The Asserted Versioning model of uni-temporal and bi-temporal data management supports all of the functionality of the standard model. In addition, it extends the standard model’s notion of transaction time by permitting data to be physically added to a table prior to the time when that data will appear in the table as production data, available for use. This is done by means of deferred transactions, which result in deferred assertions, those being the inserted, updated or logically deleted rows resulting from those transactions.⁷ Deferred assertions, although physically co-located in the same tables as other data, will not be immediately available to normal queries. But once time in the real world reaches the beginning of their assertion periods, they will, by that very fact, become currently asserted data, part of the production data that makes up the database as it is perceived by its users.

We emphasize that deferred assertions are not the same thing as rows describing what things will be like at some time in the future. Those latter rows are current claims about what things will be like in the future. They are ontologically post-dated. Deferred assertions are rows describing what things were, are, or will be like, but rows which we are not yet willing to claim make true statements. They are epistemologically post-dated.
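The difference between the two kinds of post-dating can be illustrated with a small Python sketch. It is not the book's schema: the class `BiTemporalRow`, its column names, the helper functions, the sample rows and the closed-open reading of each period are all hypothetical, chosen only to show that one test looks at the effective period and the other at the assertion period.

```python
from dataclasses import dataclass
from datetime import date

END_OF_TIME = date(9999, 12, 31)  # assumed marker for an open-ended period


@dataclass(frozen=True)
class BiTemporalRow:
    object_id: str
    eff_begin: date   # effective (version) period: when the state holds
    eff_end: date
    asr_begin: date   # assertion period: when we claim the row is true
    asr_end: date
    status: str


def is_currently_asserted(row, now):
    return row.asr_begin <= now < row.asr_end


def is_deferred(row, now):
    """Epistemologically post-dated: not yet asserted as of `now`."""
    return row.asr_begin > now


def is_future_version(row, now):
    """Ontologically post-dated: asserted now, about a state that starts later."""
    return is_currently_asserted(row, now) and row.eff_begin > now


def production_rows(table, now):
    """Rows a normal query would see: only the currently asserted ones."""
    return [row for row in table if is_currently_asserted(row, now)]


table = [
    # Asserted now, in effect now.
    BiTemporalRow("C1", date(2010, 1, 1), END_OF_TIME,
                  date(2010, 1, 1), END_OF_TIME, "gold"),
    # Asserted now, but describing a state that begins in March.
    BiTemporalRow("C2", date(2010, 3, 1), END_OF_TIME,
                  date(2010, 1, 10), END_OF_TIME, "platinum"),
    # Deferred assertion: stored in the table, not asserted until February.
    BiTemporalRow("C3", date(2010, 1, 1), END_OF_TIME,
                  date(2010, 2, 1), END_OF_TIME, "silver"),
]

today = date(2010, 1, 15)
visible = [row.object_id for row in production_rows(table, today)]
```

In this sketch nothing has to be moved or updated for the deferred row to become production data: evaluating `production_rows` with a date on or after 2010-02-01 simply starts returning it, which mirrors the statement that deferred assertions become currently asserted “by that very fact”.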

Another way that Asserted Versioning differs from the standard temporal model is in the encapsulation and simplification of integrity constraints. The encapsulation of integrity constraints is made possible by distinguishing temporal transactions from physical transactions. Temporal transactions are the ones that users write. The corresponding physical transactions are what the DBMS applies to asserted version tables. The Asserted Versioning Framework (AVF) uses an API to accept temporal transactions. Once it validates them, the AVF translates each temporal transaction into one or more physical transactions. By means of triggers generated from a combination of a logical data model together with supplementary metadata, the AVF enforces temporal semantic constraints as it submits physical transactions to the DBMS.

The simplification of these integrity constraints is made possible by introducing the concept of an episode. With non-temporal tables, a row representing an object can be inserted into that table at some point in time, and later deleted from the table. After it is deleted, of course, that table no longer contains the information that the row was ever present. Corresponding to the period of time during which that row existed in that non-temporal table, there would be an episode in an asserted version table, consisting of one or more temporally contiguous rows for the same object. So an episode of an object in an asserted version table is in effect during exactly the period of time that a row for that object would exist in a non-temporal table. And just as a deletion in a conventional table can sometime later be followed by the insertion of a new row with the same primary key, the termination of an episode in an asserted version table can sometime later be followed by the insertion of a new episode for the same object.

⁷ The term “deferred transaction” was suggested by Dr. Snodgrass during a series of email exchanges which the authors had with him in the summer of 2008.

In a non-temporal table, each row must conform to entity integrity and referential integrity constraints. In an asserted version table, each version must conform to temporal entity integrity and temporal referential integrity constraints. As we will see, the parallels are in more than name only. Temporal entity integrity really is entity integrity applied to temporal data. Temporal referential integrity really is referential integrity applied to temporal data.
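The notions of an episode and of temporal entity integrity can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AVF's implementation: each version is reduced to a hypothetical (begin, end) pair of dates for a single object, periods are read as closed-open, and the function names `episodes` and `violates_tei` are invented here. The overlap test is one plausible reading of temporal entity integrity, namely that no two versions of the same object may occupy the same clock tick.

```python
from datetime import date


def violates_tei(versions):
    """True if any two versions of the object overlap in time.

    After sorting by begin date it is enough to compare neighbours:
    if any two periods overlap, some adjacent pair overlaps too.
    """
    ordered = sorted(versions)
    return any(prev_end > next_begin
               for (_, prev_end), (next_begin, _) in zip(ordered, ordered[1:]))


def episodes(versions):
    """Merge temporally contiguous versions into episodes.

    Assumes the versions do not overlap. Two versions are contiguous when
    one ends exactly where the next begins (closed-open periods).
    """
    merged = []
    for begin, end in sorted(versions):
        if merged and merged[-1][1] == begin:
            merged[-1] = (merged[-1][0], end)   # extend the current episode
        else:
            merged.append((begin, end))         # a gap starts a new episode
    return merged


# Hypothetical versions of one customer: two contiguous versions, a gap
# (the customer was "deleted"), then a new episode a few months later.
customer_versions = [
    (date(2010, 1, 1), date(2010, 4, 1)),
    (date(2010, 4, 1), date(2010, 7, 1)),
    (date(2011, 1, 1), date(2011, 6, 1)),
]

customer_episodes = episodes(customer_versions)
```

Each merged pair corresponds to the span during which a single row for that customer would have existed in a non-temporal table, from its insertion to its deletion.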

Glossary References

Glossary entries whose definitions form strong inter-dependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.

as-is
as-was
Asserted Versioning
Asserted Versioning Framework (AVF)
episode
persistent object
state
thing
physical transaction
temporal transaction
temporal entity integrity (TEI)
temporal referential integrity (TRI)
the alternative temporal model
the Asserted Versioning temporal model
the standard temporal model


Part 2 AN INTRODUCTION TO ASSERTED VERSIONING

Chapter Contents

3 The Origins of Asserted Versioning: Computer Science Research
4 The Origins of Asserted Versioning: The Best Practices
5 The Core Concepts of Asserted Versioning
6 Diagrams and Other Notations
7 The Basic Scenario

Part 1 provided the context for Asserted Versioning, a history and a taxonomy of various ways in which temporal data has been managed over the last several decades. Here in Part 2, we introduce Asserted Versioning itself and prepare the way for the detailed discussion in Part 3 of how Asserted Versioning actually works.

In Chapter 3, we discuss the origins of Asserted Versioning in computer science research. Based on the work of computer scientists, we introduce the concepts of a clock tick and an atomic clock tick, the latter of which, in their terminology, is called a chronon. We go on to discuss the various ways in which time periods are represented by pairs of dates or of timestamps, since SQL does not directly support the concept of a time period.

There are only a finite number of ways that two time periods can be situated, with respect to one another, along a common timeline. For example, one time period may entirely precede or entirely follow another, they may partially overlap or be identical, they may start at different times but end at the same time, and so on. These different relationships among pairs of time periods have been identified and catalogued, and are called the Allen relationships. They will play an important role in our discussions of Asserted Versioning because there are various ways in which we will want to compare time periods. With the Allen relationships as a completeness check, we can make sure that we have considered all the possibilities.
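As a rough illustration of that completeness check, the following Python sketch classifies any two time periods into exactly one of thirteen relationships. Several things here are assumptions rather than the book's own notation: each period is a pair of integer clock ticks read as closed-open, the function name `allen` is invented, and the labels are the names commonly used for Allen's relationships, which may differ from the names used later in this book.

```python
def allen(b1, e1, b2, e2):
    """Name the relationship of period [b1, e1) to period [b2, e2)."""
    if not (b1 < e1 and b2 < e2):
        raise ValueError("each period must begin before it ends")

    # The periods share no clock tick.
    if e1 < b2:
        return "before"
    if e2 < b1:
        return "after"
    if e1 == b2:
        return "meets"
    if e2 == b1:
        return "met-by"

    # From here on the periods share at least one clock tick.
    if b1 == b2 and e1 == e2:
        return "equals"
    if b1 == b2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if b1 > b2 else "finished-by"
    if b2 < b1 and e1 < e2:
        return "during"
    if b1 < b2 and e2 < e1:
        return "contains"
    # Only partial overlap is left; begins and ends all differ.
    return "overlaps" if b1 < b2 else "overlapped-by"


# Compare a range of hypothetical periods against the fixed period [5, 8).
samples = [(1, 3), (1, 5), (1, 6), (5, 6), (6, 7), (6, 8), (5, 8),
           (1, 8), (1, 9), (5, 9), (6, 9), (8, 9), (9, 10)]
names = [allen(b, e, 5, 8) for b, e in samples]
```

Because every branch returns and the final line handles whatever remains, any valid pair of periods receives exactly one label, which is the property that makes the relationships usable as a checklist.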

Another important section of this chapter discusses the difference between the computer science notion of transaction time, and our own notion of assertion time. This difference is based on our development of the concepts of deferred transactions and deferred assertions, and on their subsumption under the more general concept of a pipeline dataset.

In Chapter 4, we discuss the origins of Asserted Versioning in IT best practices, specifically those related to versioning. We believe that these practices are variations on four basic methods of versioning data. In this chapter, we present each of these methods by means of examples which include sample tables and a running commentary on how inserts, updates and deletes affect the data in those tables.

In Chapter 5, we present the conceptual foundations of Asserted Versioning. The core concepts of objects, episodes, versions and assertions are defined, a discussion which leads us to the fundamental statement of Asserted Versioning, that every row in an asserted version table is the assertion of a version of an episode of an object. We continue on to discuss how time periods are represented in asserted version tables, how temporal entity integrity and temporal referential integrity enforce the core semantics of Asserted Versioning, and finally how Asserted Versioning internalizes the complexities of temporal data management.

In Chapter 6, we introduce the schema common to all asserted version tables, as well as various diagrams and notations that will be used in the rest of the book. We also introduce the topic of how Asserted Versioning supports the dynamic views that hide the complexities of that schema from query authors who would otherwise likely be confused by that complexity.

When an object is represented by a row in a non-temporal table, the sequence of events begins with the insertion of that row, continues with zero or more updates, and either continues on with no further activity, or ends when the row is eventually deleted. When an object is represented in an asserted version table, the result includes one row corresponding to the insert in the non-temporal table, additional rows corresponding to the updates to the original row in the non-temporal table, and an additional row if a delete eventually takes place. This sequence of events constitutes what we call the basic scenario of activity against both conventional and asserted version tables. In Chapter 7, we describe how the basic scenario works when the target of that activity is an asserted version table.

Glossary References

Glossary entries whose definitions form strong inter-dependencies are grouped together in the following list. The same Glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, Glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.

Allen relationships
time period
assertion
version
episode
object
assertion time
transaction time
atomic clock tick
chronon
clock tick
deferred assertion
deferred transaction
pipeline dataset
temporal entity integrity
temporal referential integrity


Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH

CONTENTS

The Roots of Asserted Versioning
Computer Science Research
Clocks and Clock Ticks
Time Periods and Date Pairs
The Very Concept of Bi-Temporality
Allen Relationships
Advanced Indexing Strategies
Temporal Extensions to SQL
Glossary References

We begin this chapter with an overview of the three sources of Asserted Versioning: computer science work on temporal data; best practices in the IT profession related to versioning; and original work by the authors themselves. We then spend the rest of this chapter discussing computer science contributions to temporal data management, and the relevance of some of these concepts to Asserted Versioning.

The Roots of Asserted Versioning

Over the last three decades, the computer science community has done extensive work on temporal data, and especially on bi-temporal data. During that same period of time, the IT community has developed various forms of versioning, all of which are methods of managing one of the two kinds of uni-temporal data. Asserted Versioning may be thought of as a method of managing both uni- and bi-temporal data which, unlike the standard model of temporal data management, recognizes that rows in bi-temporal tables represent versions of things and that,

Managing Time in Relational Databases Doi: 10.1016/B978-0-12-375041-9.00003-0

Title: A Taxonomy of Bi-Temporal Data Management Methods
Publisher: Morgan Kaufmann
Subject: Relational Databases