approach is better at tracking changes to persistent objects and
to relationships other than metric balances.
State Temporal Data: Uni-Temporal
and Bi-Temporal Data
At this point in our discussion, we are concerned with state
data rather than with event data, and with state data that is
queryable rather than state data that needs to be reconstructed.
What then are the various options for managing temporal
queryable state data?
First of all, we need to recognize that there are two kinds of
states to manage. One is the state of the things we are interested
in, the states those things pass through as they change over time.
But there is another kind of state, that being the state of the data
itself. Data, such as rows in tables, can be in one of two states:
correct or incorrect. (As we will see in Chapter 12, it can also
be in a third state, one in which it is neither correct nor
incorrect.) Version tables and assertion tables record, respectively,
the state of objects and the state of our data about those objects.
Uni-Temporal State Data
In a conventional Customer table, each row represents the
current state of a customer. Each time the state of a customer
changes, i.e. each time a row is updated, the old data is overwritten
with the new data. By adding one (or sometimes two) date(s) or
timestamp(s) to the primary key of the table, it becomes a
uni-temporal table. But since we already know that there are two
different temporal dimensions that can be associated with data, we
know to ask “What kind of uni-temporal table?”
As we saw in the Preface, there are uni-temporal version
tables and uni-temporal assertion tables. Version tables keep
track of changes that happen in the real world, changes to the
objects represented in those tables. Each change is recorded as
a new version of an object. Assertion tables keep track of
corrections we have made to data we later discovered to be in error.
Each correction is recorded as a new assertion about the object.
The versions make up a true history of what happened to those
objects. The assertions make up a virtual logfile of corrections
to the data in the table.
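The difference between the two kinds of uni-temporal table can be sketched with a small in-memory SQLite database. This is only an illustration under stated assumptions: the table names, the column names (`eff_begin`, `asr_begin`), the sample customer and the helper `as_of` are all hypothetical, and a single begin date is added to the primary key rather than a pair of dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Version table: the date in the key says when the customer's
-- state took effect in the real world (hypothetical column names).
CREATE TABLE customer_version (
    cust_id   TEXT NOT NULL,
    eff_begin TEXT NOT NULL,
    name      TEXT NOT NULL,
    PRIMARY KEY (cust_id, eff_begin)
);
-- Assertion table: the date in the key says when we began to
-- claim that the row's data is correct.
CREATE TABLE customer_assertion (
    cust_id   TEXT NOT NULL,
    asr_begin TEXT NOT NULL,
    name      TEXT NOT NULL,
    PRIMARY KEY (cust_id, asr_begin)
);
""")

# A real-world change: the customer was renamed on 2010-06-01.
conn.executemany(
    "INSERT INTO customer_version VALUES (?, ?, ?)",
    [("C1", "2009-01-01", "Acme"), ("C1", "2010-06-01", "Acme Corp")],
)

# A correction: the name was mistyped, and the error was fixed on 2009-03-15.
conn.executemany(
    "INSERT INTO customer_assertion VALUES (?, ?, ?)",
    [("C1", "2009-01-01", "Amce"), ("C1", "2009-03-15", "Acme")],
)

def as_of(table, date_col, cust_id, when):
    """Return the name on the latest row whose date is on or before `when`."""
    row = conn.execute(
        f"SELECT name FROM {table} WHERE cust_id = ? AND {date_col} <= ? "
        f"ORDER BY {date_col} DESC LIMIT 1",
        (cust_id, when),
    ).fetchone()
    return row[0] if row else None

# What was the customer called in the real world at the start of 2010?
version_name = as_of("customer_version", "eff_begin", "C1", "2010-01-01")
# What did our data say about the customer in February 2009?
asserted_name = as_of("customer_assertion", "asr_begin", "C1", "2009-02-01")
```

The two queries have the same shape, but they answer different questions: the first asks what the customer was like, the second asks what we used to say the customer was like.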
Usually, when table-level temporal data is discussed, the
tables turn out to be version tables, not assertion tables. In their
book describing the alternative temporal model [2002, Date,
Darwen, Lorentzos], the authors focus on uni-temporal
versioned data. Bi-temporality is not even alluded to until the
penultimate chapter, at which point it is suggested that “logged time history” tables be used to manage the other temporal dimension. Since bi-temporality receives only a passing mention
in that book, we choose to classify the alternative temporal model as a uni-temporal model.
In IT best practices for managing temporal data, which we will discuss in detail in Chapter 4, once again the temporal tables are version tables, and error correction is an issue that is mostly left to take care of itself.4 For the most part, it does so
by overwriting incorrect data.5 This is why we classify IT best practices as uni-temporal models.
The Alternative Temporal Model
What we call the alternative temporal model was developed
by Chris Date, Hugh Darwen and Dr. Nikos Lorentzos in their book Temporal Data and the Relational Model (Morgan-Kaufmann, 2002).6 This model is based in large part on techniques developed by Dr. Lorentzos to manage temporal data
by breaking temporal durations down into temporally atomic components, applying various transformations to those components, and then re-assembling the components back into those temporal durations, a technique, as the authors note, whose applicability is not restricted to temporal data.
As we said, except for the penultimate chapter in that book, the entire book is a discussion of uni-temporal versioned tables.
In that chapter, the authors recommend that if there is a requirement to keep track of the assertion time history of a table (which they call “logged-time history”), it be implemented by means of
an auxiliary table which is maintained by the DBMS.
4 Lacking criteria to distinguish the best from the rest, the term “best practices” has come to mean little more than “standard practices”. What we call “best practices”, and which we discuss in Chapter 4, are standard practices we have seen used by many of our clients.
5 An even worse solution is to mix up versions and assertions by creating a new row, with a begin date of Now(), both every time there is a real change, and also every time there is an error in the data to correct. When that happens, we no longer have a history
of the changes things went through, because we cannot distinguish versions from corrections. And we no longer have a “virtual logfile” of corrections because we don’t know how far back the corrections should actually have taken effect.
6 The word “model”, as used here and also in the phrases “alternative model” and
“Asserted Versioning model”, obviously doesn’t refer to a data model of specific subject matter. It means something like theory, but with an emphasis on its applicability to real-world problems. So “the relational model”, as we use the term, for example, means something like “relational theory as implemented in current relational technology”.
42 Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS
In addition, these authors do not attempt, in their book, to
explain how this method of managing temporal data would work
with current relational technology. Like much of the computer
science research on temporal data, they allude to SQL operators
and other constructs that do not yet exist, and so their book is in
large part a recommendation to the standards committees to
adopt the changes to the SQL language which they describe.
Because our own concern is with how to implement temporal
concepts with today’s technologies, and also with how to
support both kinds of uni-temporal data, as well as fully bi-temporal
data, we will have little more to say about the alternative
temporal model in this book.
Best Practices
Over several decades, a best practice has emerged in
managing temporal queryable state data. It is to manage this kind of
data by versioning otherwise conventional tables. The result is
versioned tables which, logically speaking, are tables which
combine the history tables and current tables described previously.
Past, present and future states of customers, for example, are
kept in one and the same Customer table. Corrections may or
may not be flagged; but if they are not, it will be impossible to
distinguish versions created because something about a
customer changed from versions created because past customer
data was entered incorrectly. On the other hand, if they are
flagged, the management and use of these flags will quickly
become difficult and confusing.
There are many variations on the theme of versioning, which
we have grouped into four major categories. We will discuss
them in Chapter 4.
The IT community has always used the term “version” for this
kind of uni-temporal data. And this terminology seems to reflect
an awareness of an important concept that, as we shall see, is
central to the Asserted Versioning approach to temporal data. For the
term “version” naturally raises the question “A version of what?”, to
which our answer is “A version of anything that can persist and
change over time”. This is the concept of a persistent object, and
it is, most fundamentally, what Asserted Versioning is about.
Bi-Temporal State Data
We now come to our second option, which is to manage
both versions and assertions and, most importantly, their
interdependencies. This is bi-temporal data management, the
subject of both Dr. Rick Snodgrass’s book [2000, Snodgrass] and
of our book.
The Standard Temporal Model
What we call the standard temporal model was developed
by Dr. Rick Snodgrass in his book Developing Time-Oriented Database Applications in SQL (Morgan-Kaufmann, 2000). Based on the computer science work current at that time, and especially on the work Dr. Snodgrass and others had done
on the TSQL (temporal SQL) proposal to the SQL standards committees, it shows how to implement both uni-temporal and bi-temporal data management using then-current DBMSs and then-current SQL.
We emphasize that, as we are writing, Dr. Snodgrass’s book is
a decade old. We use it as our baseline view of computer science work on bi-temporal data because most of the computer science literature exists in the form of articles in scientific journals that are not readily accessible to many IT professionals. We also emphasize that Dr. Snodgrass did not write that book as a compendium of computer science research for an IT audience. Instead, he wrote it as a description of how some of that research could be adapted to provide a means of managing bi-temporal data with the SQL and the DBMSs available at that time.
One of the greatest strengths of the standard model is that it discusses and illustrates both the maintenance and the querying
of temporal data at the level of SQL statements. For example, it shows us the kind of code that is needed to apply the temporal analogues of entity integrity and referential integrity to temporal data. And for any readers who might think that temporal data management is just a small step beyond the versioning they are already familiar with, many of the constraint-checking SQL statements shown in Dr. Snodgrass’s book should suffice
to disabuse them of that notion.
The Asserted Versioning Temporal Model
What we call the Asserted Versioning temporal model is our own approach to managing temporal data. Like the standard model, it attempts to manage temporal data with current technology and current SQL.
The Asserted Versioning model of uni-temporal and bi-temporal data management supports all of the functionality of the standard model. In addition, it extends the standard model’s notion of transaction time by permitting data to be physically added to a table prior to the time when that data will appear
in the table as production data, available for use. This is done
by means of deferred transactions, which result in deferred assertions, those being the inserted, updated or logically deleted
rows resulting from those transactions.7 Deferred assertions,
although physically co-located in the same tables as other data,
will not be immediately available to normal queries. But once
time in the real world reaches the beginning of their assertion
periods, they will, by that very fact, become currently asserted
data, part of the production data that makes up the database
as it is perceived by its users.
We emphasize that deferred assertions are not the same thing
as rows describing what things will be like at some time in the
future. Those latter rows are current claims about what things
will be like in the future. They are ontologically post-dated.
Deferred assertions are rows describing what things were, are,
or will be like, but rows which we are not yet willing to claim
make true statements. They are epistemologically post-dated.
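The distinction between the two kinds of post-dating can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the `Row` class, its field names, the half-open treatment of periods, the `FOREVER` marker and the sample data are hypothetical, and are not the schema described later in the book.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date(9999, 12, 31)  # assumed marker for "until further notice"

@dataclass(frozen=True)
class Row:
    oid: str
    eff_begin: date   # when the described state begins in the real world
    eff_end: date     # exclusive end of that state
    asr_begin: date   # when we begin to claim the row is true
    asr_end: date     # exclusive end of that claim
    data: str

def currently_asserted(rows, now):
    """Rows we are willing to claim are true as of `now`."""
    return [r for r in rows if r.asr_begin <= now < r.asr_end]

def in_effect(rows, when):
    """Rows describing what things are like at `when`."""
    return [r for r in rows if r.eff_begin <= when < r.eff_end]

rows = [
    # Asserted now, and about the present.
    Row("C1", date(2010, 1, 1), date(2011, 1, 1), date(2010, 1, 1), FOREVER, "type A"),
    # Ontologically post-dated: asserted now, but about the future.
    Row("C1", date(2011, 1, 1), FOREVER, date(2010, 1, 1), FOREVER, "type B"),
    # Epistemologically post-dated (a deferred assertion): physically
    # present, but not asserted until 2010-09-01.
    Row("C2", date(2010, 1, 1), FOREVER, date(2010, 9, 1), FOREVER, "type C"),
]

today = date(2010, 6, 1)
visible = currently_asserted(rows, today)          # excludes the deferred row
current_state = in_effect(visible, today)          # excludes the future version
visible_later = currently_asserted(rows, date(2010, 9, 1))
```

Nothing is done to the deferred row to make it visible. Once the clock reaches the start of its assertion period, the same filter simply starts returning it.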
Another way that Asserted Versioning differs from the
standard temporal model is in the encapsulation and simplification
of integrity constraints. The encapsulation of integrity
constraints is made possible by distinguishing temporal transactions
from physical transactions. Temporal transactions are the ones
that users write. The corresponding physical transactions are
what the DBMS applies to asserted version tables. The Asserted
Versioning Framework (AVF) uses an API to accept temporal
transactions. Once it validates them, the AVF translates each
temporal transaction into one or more physical transactions.
By means of triggers generated from a combination of a logical
data model together with supplementary metadata, the AVF
enforces temporal semantic constraints as it submits physical
transactions to the DBMS.
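The idea that one temporal transaction becomes one or more physical transactions can be illustrated with a deliberately simplified sketch. It is not the AVF's algorithm: it handles only a single temporal dimension (effective time), ignores assertion time and triggers, and every name in it (`translate_update`, `apply_physical`, the dictionary keys, the `FOREVER` marker) is hypothetical.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # assumed marker for "until further notice"

def translate_update(table, oid, new_data, now):
    """Validate a temporal update ("from now on, oid has new_data") and
    return the physical operations that would carry it out."""
    current = [v for v in table
               if v["oid"] == oid and v["eff_begin"] <= now < v["eff_end"]]
    if not current:
        raise ValueError(f"no version of {oid} is in effect on {now}")
    old = current[0]
    if old["eff_begin"] == now:
        raise ValueError("this sketch does not handle same-day re-versioning")
    return [
        # Physical update: end the current version at `now`.
        ("close", oid, old["eff_begin"], now),
        # Physical insert: a new version taking over the rest of the period.
        ("insert", {"oid": oid, "eff_begin": now,
                    "eff_end": old["eff_end"], "data": new_data}),
    ]

def apply_physical(table, ops):
    """Apply the physical operations to the in-memory table."""
    for op in ops:
        if op[0] == "close":
            _, oid, eff_begin, new_end = op
            for v in table:
                if v["oid"] == oid and v["eff_begin"] == eff_begin:
                    v["eff_end"] = new_end
        else:
            table.append(op[1])

table = [{"oid": "C1", "eff_begin": date(2010, 1, 1),
          "eff_end": FOREVER, "data": "type A"}]
ops = translate_update(table, "C1", "type B", date(2010, 6, 1))
apply_physical(table, ops)
```

The user states one intention, and the translation layer decides which rows must be touched. Keeping validation inside that layer is what lets the integrity rules be encapsulated rather than rewritten in every application.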
The simplification of these integrity constraints is made
possible by introducing the concept of an episode. With non-temporal
tables, a row representing an object can be inserted into that table
at some point in time, and later deleted from the table. After it is
deleted, of course, that table no longer contains the information
that the row was ever present. Corresponding to the period of
time during which that row existed in that non-temporal table,
there would be an episode in an asserted version table, consisting
of one or more temporally contiguous rows for the same object.
So an episode of an object in an asserted version table is in effect
during exactly the period of time that a row for that object would
exist in a non-temporal table. And just as a deletion in a
conventional table can sometime later be followed by the insertion of a
new row with the same primary key, the termination of an
episode in an asserted version table can sometime later be followed by the insertion of a new episode for the same object.
7 The term “deferred transaction” was suggested by Dr. Snodgrass during a series of
email exchanges which the authors had with him in the summer of 2008.
In a non-temporal table, each row must conform to entity integrity and referential integrity constraints. In an asserted version table, each version must conform to temporal entity integrity and temporal referential integrity constraints. As we will see, the parallels are in more than name only. Temporal entity integrity really is entity integrity applied to temporal data. Temporal referential integrity really is referential integrity applied to temporal data.
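A short sketch may help make episodes and temporal entity integrity concrete. It rests on assumptions that go beyond this passage: time periods are treated as half-open (begin, end) date pairs, temporal entity integrity is read as "no two versions of the same object overlap in time", and the function names are hypothetical.

```python
from datetime import date

def violates_tei(versions):
    """True if any two versions of one object overlap in time.
    After sorting by begin date, checking neighbours is sufficient."""
    ordered = sorted(versions)
    return any(next_begin < prev_end
               for (_, prev_end), (next_begin, _) in zip(ordered, ordered[1:]))

def episodes(versions):
    """Merge one object's versions into episodes: maximal runs of
    versions in which each one begins exactly where the last ended.
    Assumes the versions have already passed the overlap check."""
    result = []
    for begin, end in sorted(versions):
        if result and result[-1][1] == begin:
            result[-1] = (result[-1][0], end)   # contiguous: extend the episode
        else:
            result.append((begin, end))         # gap: a new episode starts
    return result

# Three versions of one customer: two contiguous, then a gap.
c1_versions = [
    (date(2010, 1, 1), date(2010, 6, 1)),
    (date(2010, 6, 1), date(2011, 1, 1)),
    (date(2011, 3, 1), date(2012, 1, 1)),
]
c1_episodes = episodes(c1_versions)
c1_ok = not violates_tei(c1_versions)

# A fourth version that collides with the first two.
bad_versions = c1_versions + [(date(2010, 5, 1), date(2010, 7, 1))]
bad_found = violates_tei(bad_versions)
```

Here the first two versions form one episode, which corresponds to a row that existed in a non-temporal table from January 2010 until it was deleted in January 2011. The third version begins a second episode, which corresponds to a later re-insertion of a row with the same primary key.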
Glossary References
Glossary entries whose definitions form strong interdependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways
at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.
as-is
as-was
Asserted Versioning
Asserted Versioning Framework (AVF)
episode
persistent object
state
thing
physical transaction
temporal transaction
temporal entity integrity (TEI)
temporal referential integrity (TRI)
the alternative temporal model
the Asserted Versioning temporal model
the standard temporal model
Part 2 AN INTRODUCTION TO ASSERTED VERSIONING
Chapter Contents
3 The Origins of Asserted Versioning: Computer Science Research
4 The Origins of Asserted Versioning: The Best Practices
5 The Core Concepts of Asserted Versioning
6 Diagrams and Other Notations
7 The Basic Scenario
Part 1 provided the context for Asserted Versioning, a history
and a taxonomy of various ways in which temporal data has
been managed over the last several decades. Here in Part 2, we
introduce Asserted Versioning itself and prepare the way for
the detailed discussion in Part 3 of how Asserted Versioning
actually works.
In Chapter 3, we discuss the origins of Asserted Versioning in
computer science research. Based on the work of computer
scientists, we introduce the concepts of a clock tick and an
atomic clock tick, the latter of which, in their terminology, is
called a chronon. We go on to discuss the various ways in which
time periods are represented by pairs of dates or of timestamps,
since SQL does not directly support the concept of a time period.
Managing Time in Relational Databases Doi: 10.1016/B978-0-12-375041-9.00024-8
There are only a finite number of ways that two time periods
can be situated, with respect to one another, along a common
timeline. For example, one time period may entirely precede or entirely follow another, they may partially overlap or be identical, they may start at different times but end at the same time, and so on. These different relationships among pairs of time periods have been identified and catalogued, and are called the Allen relationships. They will play an important role in our discussions of Asserted Versioning because there are various ways in which we will want to compare time periods. With the Allen relationships as a completeness check, we can make sure that we have considered all the possibilities.
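The claim that the possibilities are finite can be checked with a small classifier. The sketch below uses the thirteen relationship names from Allen's original catalogue, which may differ from the labels used later in this book. The function name is hypothetical, and integers stand in for dates or timestamps.

```python
def allen_relation(a, b):
    """Name the Allen relationship of period a to period b.
    Each period is a (begin, end) pair with begin < end."""
    (a1, a2), (b1, b2) = a, b
    if a1 >= a2 or b1 >= b2:
        raise ValueError("each period must begin before it ends")
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    # From here on, the two periods share at least part of the timeline.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"

# Completeness check: enumerate every pair of periods over four clock ticks.
periods = [(s, e) for s in range(4) for e in range(s + 1, 4)]
found = {allen_relation(p, q) for p in periods for q in periods}
```

Enumerating every pair of periods over just four clock ticks is enough to produce all thirteen relationships, and no fourteenth ever appears. That is the sense in which the Allen relationships serve as a completeness check.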
Another important section of this chapter discusses the difference between the computer science notion of transaction time, and our own notion of assertion time. This difference is based on our development of the concepts of deferred transactions and deferred assertions, and on their subsumption under the more general concept of a pipeline dataset.
In Chapter 4, we discuss the origins of Asserted Versioning in
IT best practices, specifically those related to versioning. We believe that these practices are variations on four basic methods
of versioning data. In this chapter, we present each of these methods by means of examples which include sample tables and a running commentary on how inserts, updates and deletes affect the data in those tables.
In Chapter 5, we present the conceptual foundations of Asserted Versioning. The core concepts of objects, episodes, versions and assertions are defined, a discussion which leads us to the fundamental statement of Asserted Versioning, that every row in an asserted version table is the assertion of a version of
an episode of an object. We continue on to discuss how time periods are represented in asserted version tables, how temporal entity integrity and temporal referential integrity enforce the core semantics of Asserted Versioning, and finally how Asserted Versioning internalizes the complexities of temporal data management.
In Chapter 6, we introduce the schema common to all asserted version tables, as well as various diagrams and notations that will be used in the rest of the book. We also introduce the topic of how Asserted Versioning supports the dynamic views that hide the complexities of that schema from query authors who would otherwise likely be confused by that complexity.
When an object is represented by a row in a non-temporal table, the sequence of events begins with the insertion of that row, continues with zero or more updates, and either continues
on with no further activity, or ends when the row is eventually deleted. When an object is represented in an asserted version
table, the result includes one row corresponding to the insert in
the non-temporal table, additional rows corresponding to the
updates to the original row in the non-temporal table, and an
additional row if a delete eventually takes place. This sequence
of events constitutes what we call the basic scenario of activity
against both conventional and asserted version tables. In
Chapter 7, we describe how the basic scenario works when the target
of that activity is an asserted version table.
Glossary References
Glossary entries whose definitions form strong
interdependencies are grouped together in the following list. The
same Glossary entries may be grouped together in different ways
at the end of different chapters, each grouping reflecting the
semantic perspective of each chapter. There will usually be
several other, and often many other, Glossary entries that are not
included in the list, and we recommend that the Glossary be
consulted whenever an unfamiliar term is encountered.
Allen relationships
time period
assertion
version
episode
object
assertion time
transaction time
atomic clock tick
chronon
clock tick
deferred assertion
deferred transaction
pipeline dataset
temporal entity integrity
temporal referential integrity
3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH
CONTENTS
The Roots of Asserted Versioning
Computer Science Research
Clocks and Clock Ticks
Time Periods and Date Pairs
The Very Concept of Bi-Temporality
Allen Relationships
Advanced Indexing Strategies
Temporal Extensions to SQL
Glossary References
We begin this chapter with an overview of the three sources of
Asserted Versioning: computer science work on temporal data;
best practices in the IT profession related to versioning; and
original work by the authors themselves. We then spend the rest
of this chapter discussing computer science contributions to
temporal data management, and the relevance of some of these
concepts to Asserted Versioning.
The Roots of Asserted Versioning
Over the last three decades, the computer science community
has done extensive work on temporal data, and especially on
bi-temporal data. During that same period of time, the IT
community has developed various forms of versioning, all of which are
methods of managing one of the two kinds of uni-temporal data.
Asserted Versioning may be thought of as a method of
managing both uni- and bi-temporal data which, unlike the standard
model of temporal data management, recognizes that rows
in bi-temporal tables represent versions of things and that,
Managing Time in Relational Databases Doi: 10.1016/B978-0-12-375041-9.00003-0