Managing Time in Relational Databases – P16


…beginning assertion time of the not-yet-approved parent. We are working on the problem as this book goes to press. We know that the problem is not insoluble. But we also know that it is difficult.

Glossary References

Glossary entries whose definitions form strong interdependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.

We note, in particular, that the nine terms used to refer to the act of giving a truth value to a statement, listed in the section The Semantics of Deferred Assertion Time, are not included in this list. Nor are nodes in our Allen Relationship taxonomy or our State Transformation taxonomy included in this list.

12/31/9999

clock tick

closed-open

Now()

Allen relationships

approval transaction

assertion group date

deferred assertion group

deferred assertion

deferred transaction

empty assertion time

fall into currency

fall out of currency

far future assertion time

near future assertion time

override

lock

retrograde movement

Asserted Versioning Framework (AVF)

assertion begin date

assertion end date

assertion time period


assertion time

assertion

closed assertion

conventional table

dataset

episode

open episode

statement

hand-over clock tick

instance

type

managed object

object

oid

persistent object

thing

occupied

represented

match

replace

supercede

withdraw

pipeline dataset

inflow pipeline dataset

inflow pipeline

outflow pipeline dataset

outflow pipeline

production data

production database

production dataset

production table

row creation date

temporal dimension

temporal entity integrity (TEI)

temporal foreign key (TFK)

temporal referential integrity (TRI)

the standard temporal model


transaction table

transaction time

version

effective begin date

effective end date

effective time period


RE-PRESENTING INTERNALIZED PIPELINE DATASETS

CONTENTS

Internalized Pipeline Datasets 292

Pipeline Datasets as Queryable Objects 296

Posted History: Past Claims About the Past 297

Posted Updates: Past Claims About the Present 298

Posted Projections: Past Claims About the Future 299

Current History: Current Claims About the Past 300

Current Data: Current Claims About the Present 301

Current Projections: Current Claims About the Future 303

Pending History: Future Claims About the Past 304

Pending Updates: Future Claims About the Present 305

Pending Projections: Future Claims About the Future 306

Mirror Images of the Nine-Fold Way 307

The Value of Internalizing Pipeline Datasets 308

Glossary References 309

In Chapter 12, we introduced the concept of pipeline datasets. These are files, tables or other physical datasets in which the managed object itself represents a type and contains multiple managed objects each of which represents an instance of that type, and which in turn themselves contain instances of other types. Using the language of tables, rows and columns, these managed objects are tables, the instances they contain are rows, and those last-mentioned types are the columns of those tables, whose instances describe the properties and relationships of the objects represented by those rows.

Because our focus is temporal data management at the level of tables and rows, and not at the level of databases, we have discussed pipeline datasets as though there were a distinct set of them for each production table. Figure 13.1 shows one conventional table, and a set of eight pipeline datasets related to it.


What Figure 13.1 illustrates is a simplification of the always complex and usually messy physical database environment which IT departments everywhere must manage. Pipeline datasets may often contain data targeted at, or derived from, several tables within that database. They do not necessarily target, or derive from, single tables within a database. In addition, the IT industry has only the broadest of categories of pipeline datasets, categories such as batch transaction tables, logfiles of processed transactions, history tables, or staging areas where unusually complicated data transformations are carried out before the data is moved back into the production tables from whence it originated.

Figure 13.1 shows eight different types of pipeline datasets surrounding a conventional table of current data. These nine datasets align with the set of nine categories of temporal data which we introduced in Chapter 12.

Given a bi-temporal framework of two temporal dimensions, in each of which data can exist in the past, the present or the future, this set of nine categories is what results from the intersection of those two temporal dimensions. In addition, since the past, present and future are clear and distinct within each temporal dimension, and since each dimension is clear and distinct from the other, the result of this intersection is a set of nine categories which are themselves clear and distinct, which are, precisely, jointly exhaustive and mutually exclusive. Like our taxonomies, they cover all the ground there is to cover, and they don't overlap. Like our taxonomies, they are what mathematicians call a partitioning of their domain. Like our taxonomies, they assure us that in our discussions, we won't overlook anything and we won't confuse anything with anything else.

[Figure 13.1 Physically Distinct Pipeline Datasets: a conventional table of Current Data surrounded by the eight pipeline datasets Posted History, Current History, Pending History, Posted Updates, Pending Updates, Posted Projections, Current Projections and Pending Projections.]

In the previous chapter, we showed how to physically internalize one particular kind of pipeline dataset within the production tables which are their destinations or points of origin. We showed how to turn them from distinct physical collections of data into logical collections of data that share residence in a single physical table.
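To make this concrete, here is a minimal sketch of what such a single physical table might look like. The table name Policy_AV, the sample business columns and the exact column names and data types are hypothetical; only the roles of the temporal columns are taken from the glossary terms listed above (oid, effective begin/end date, assertion begin/end date, row creation date), along with the closed-open convention and 12/31/9999 as the latest representable date.

```sql
-- A minimal sketch of an asserted version table; names and types are
-- hypothetical, but the column roles follow this chapter's glossary terms.
CREATE TABLE Policy_AV (
    oid                  INTEGER       NOT NULL,  -- identifies the persistent object
    effective_begin_date DATE          NOT NULL,  -- when this version goes into effect
    effective_end_date   DATE          NOT NULL,  -- closed-open: first date no longer in effect
    assertion_begin_date DATE          NOT NULL,  -- when we begin to assert the row's statement
    assertion_end_date   DATE          NOT NULL,  -- 12/31/9999 until the assertion is withdrawn
    row_creation_date    DATE          NOT NULL,  -- when the row was physically created
    policy_status        CHAR(8),                 -- sample business data
    policy_amount        DECIMAL(10,2)            -- sample business data
);
```

With these columns in place, each internalized pipeline dataset is simply a subset of rows picked out by predicates on the two time periods, rather than a separate physical file or staging table.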

The internalization of pipeline datasets is illustrated in Figure 13.2. These internalizations of pipeline datasets are not themselves managed objects to either the operating system or the DBMS. They are managed objects only to the AVF. The operating system recognizes and manages database instances, but is neither aware of nor can manage tables, rows, columns or the other managed objects that exist within database instances. As for the DBMS, once these pipeline datasets are internalized, all it sees is the production table itself, and the columns and rows of that table.

In this chapter, we show how to re-present these internalized datasets as queryable objects. We use the hyphenated form “re-present” advisedly. We do mean that we will show how to represent those internalized datasets as queryable objects, in the ordinary sense of the word “represent”. But we also wish to emphasize that we are re-presenting, i.e. presenting again, things whose presence we had removed.¹ Those things are the physical pipeline datasets which, in the previous chapter, we showed how to internalize within the production tables which are their destinations or points of origin.

[Figure 13.2 Internalized Pipeline Datasets: an Asserted Version Table containing Posted History, Posted Updates, Current Data, Current History, Pending History, Pending Updates, Posted Projections, Current Projections and Pending Projections.]

1. We also wish to avoid confusion with our technical term represent, in which an object, we say, is represented in an effective time clock tick within an assertion time clock tick just in case business data describing that object exists on an asserted version row whose assertion and effective time periods contain those clock tick pairs.

For example, we show how to provide, as queryable objects, all the pending transactions against a production table, or a logfile of posted transactions that have already been applied to that table, or a set of data from that table which we currently claim to be true, or that same set of data but as it was originally entered and prior to any corrections that may have been made to it.
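As one way of picturing this, an internalized pipeline dataset can be re-presented as an ordinary view over the asserted version table. The sketch below is only an illustration against the hypothetical Policy_AV schema introduced above, using CURRENT_DATE where the book writes Now(); it re-presents the Current Data dataset, i.e. rows that are currently asserted and currently in effect.

```sql
-- A sketch only: the Current Data pipeline dataset re-presented as a view,
-- i.e. rows currently asserted and currently in effect.
CREATE VIEW Policy_Current_Data AS
SELECT oid, policy_status, policy_amount
FROM   Policy_AV
WHERE  assertion_begin_date <= CURRENT_DATE
  AND  assertion_end_date   >  CURRENT_DATE   -- assertion time period contains Now()
  AND  effective_begin_date <= CURRENT_DATE
  AND  effective_end_date   >  CURRENT_DATE;  -- effective time period contains Now()
```

Analogous views, changing only which of the comparisons against CURRENT_DATE are reversed, would re-present each of the other eight datasets.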

We do not claim that any of these eight types of pipeline dataset correspond to data that supports a specific business need. For the most part, that will not be the case. For example, auditors will frequently want to look at Posted History pipeline datasets, i.e. at the rows that belong to that logical category of temporal data. But they will usually want to see current assertions about the historical past of the objects they are interested in, along with those past assertions. The current assertions about historical data are logically part of, as we will see, the Posted Updates pipeline dataset. So to provide queryable objects corresponding to their specific business requirements, auditors will usually write queries directly against asserted version tables, queries that combine and filter data from any number of these pipeline datasets.
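A query of the kind such an auditor might write could look like the following sketch, which again assumes the hypothetical Policy_AV schema; the object identifier and the past effective date are placeholder values. It returns every assertion, past or current, about what the object was like on that date.

```sql
-- A sketch only: for one object, the versions in effect on a past date of
-- interest, showing both withdrawn (past) assertions and the current assertion.
SELECT oid, policy_status, policy_amount,
       assertion_begin_date, assertion_end_date
FROM   Policy_AV
WHERE  oid = 1234                                 -- hypothetical object identifier
  AND  effective_begin_date <= DATE '2009-06-01'  -- hypothetical past effective date
  AND  effective_end_date   >  DATE '2009-06-01'
  AND  assertion_begin_date <= CURRENT_DATE       -- exclude deferred (future) assertions
ORDER BY assertion_begin_date;
```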

To take another example, the Pending Projections pipeline dataset does not distinguish data in the near assertion time future from data in the far assertion time future. Yet deferred assertions with an assertion begin date that will become current an hour from now serve an entirely different business purpose than deferred assertions whose assertion begin date is January 1st, 5000. So to provide queryable objects corresponding to real business requirements, we will often have to write queries that filter out rows from within a single pipeline dataset, and combine rows from multiple pipeline datasets.
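Such a filter is just a predicate on the assertion begin date. The sketch below, still against the hypothetical Policy_AV schema, keeps only deferred assertions that will fall into currency within the next day; the one-day horizon is an arbitrary illustrative choice, and the interval arithmetic syntax varies from one DBMS to another.

```sql
-- A sketch only: near future deferred assertions, i.e. rows whose assertion
-- time will begin within the next day, leaving far future deferrals out.
SELECT oid, policy_status, effective_begin_date, assertion_begin_date
FROM   Policy_AV
WHERE  assertion_begin_date >  CURRENT_DATE                      -- deferred assertions
  AND  assertion_begin_date <= CURRENT_DATE + INTERVAL '1' DAY   -- near future only
ORDER BY assertion_begin_date;
```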

Internalized Pipeline Datasets

We can say what things used to be like, what they are like, and also what they will be like. These statements we can make are statements about, respectively, the past, the present and the future. In a table in a database, each row makes one such statement. In conventional tables, however, the only rows are ones that make statements about the present.

These things we say represent what we claim is true. Of course, as we saw in Chapter 12, we can equally well say that they represent what we accept as true, agree is true, assent to or assert as true, or believe, know or think is true. For now, we'll just call them our truth claims, or simply our claims, about the statements made by rows in our tables.

Besides what we currently claim is true, there are also claims that we once made but are no longer willing to make. These are statements that, based on our current understanding of things, are not true, or should no longer be considered as reliable sources of information. It is also the case that we may have statements—whether about the past, the present or the future—that we are not yet willing to claim are true, but which nonetheless are “works in progress” that we intend to complete and that, at that time, we will be willing to claim are true. Or perhaps they are complete, and we are pretty certain that they are correct, but we are waiting on a business decision-maker to review them and approve them for release as current assertions. The former is a set of transactions about to be applied to the database. The latter is a set of data in a staging area, either waiting for additional work to be performed on it, or waiting for review and approval.

So if statements may be about what things were, are or will be like, and claims about statements may have once been made and later repudiated, or be current claims, or be claims that we are not yet willing to make but might at some time in the future be willing to make, then the intersection of facts and claims creates a matrix of nine temporal combinations. That matrix is shown in Figure 13.3.²

[Figure 13.3 Facts, Claims and Time: a three-by-three matrix crossing what we used to claim, what we currently claim and what we will claim with what things used to be like, what things are like and what things will be like; each cell reads out one of the nine combinations, e.g. “what we currently claim things used to be like”.]

2. With the substitution of the word “claims” for “beliefs”, this is the same matrix shown in Figure 12.1. Chapter 12 also contains a discussion of the interchangeability of “claims”, “beliefs” and several other terms. We note, however, that “claims” is a stronger word than “beliefs” in this sense, that some of the things we believe are true are things we are nonetheless not yet willing to claim are true. We take “claims”, and “asserts” or “assertions”, to be synonymous, and the other equivalent terms discussed in Chapter 12 to be terminological variations that appear more or less suitable in different contexts.


The reason we are interested in the intersection of facts and claims is that rows in database tables are both. All rows in database tables represent factual claims. One aspect of the row is that it represents a statement of fact. The other aspect is that it represents a claim that that statement of fact is, in fact, true. This is just as true of conventional tables as it is of asserted version tables.

When dealing with periods of time, as we are, the past includes all and only those periods of time which end before Now(). The future includes all and only those periods of time which begin after Now(). The present includes all and only those periods of time which include Now().

Every row in a bi-temporal table is tagged with two periods of time, which we call assertion time and effective time. Consequently, every row falls into one of these nine categories. Conventional tables contain rows which exist in only one of these nine temporal combinations. They are rows which represent current claims about what things are currently like. But since conventional tables do not contain any of the other eight categories of rows, their rows don't need explicit time periods to distinguish them from rows in those other categories. And in conventional tables, of course, they don't have them.
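As a concrete rendering of the tests just described, the nine-fold classification can be written as a pair of CASE expressions over a row's two time periods. This is a sketch against the hypothetical Policy_AV schema used earlier, with CURRENT_DATE standing in for Now() and the closed-open convention assumed; the pair of labels it produces corresponds to one cell of the matrix in Figure 13.3.

```sql
-- A sketch only: classify each row by where its two time periods stand
-- relative to Now(); together the two labels name one of the nine categories.
SELECT oid,
       CASE
         WHEN assertion_end_date   <= CURRENT_DATE THEN 'Posted'   -- assertion time ended before Now()
         WHEN assertion_begin_date >  CURRENT_DATE THEN 'Pending'  -- assertion time begins after Now()
         ELSE 'Current'                                            -- assertion time contains Now()
       END AS assertion_time_category,
       CASE
         WHEN effective_end_date   <= CURRENT_DATE THEN 'History'     -- effective time ended before Now()
         WHEN effective_begin_date >  CURRENT_DATE THEN 'Projections' -- effective time begins after Now()
         ELSE 'Updates/Data'                                          -- effective time contains Now()
       END AS effective_time_category
FROM   Policy_AV;
```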

Both the assertion and the effective time periods of conventional rows are co-extensive with their physical presence in their tables. They begin to be asserted, and also go into effect, when they are created; and they remain asserted, and also remain in effect, until they are deleted. They don't keep track of history because they aren't interested in it. They don't distinguish updates which correct mistakes in data from updates which keep data current with a changing reality, ultimately because the business doesn't notice the difference, or is willing to tolerate the ambiguity in the data.

So conventional tables, all in all, are a poor kind of thing. They do less than they could, and less than the business needs them to do. They overwrite history. They don't distinguish between correcting mistakes and making changes to keep up with a changing world. And these conventional tables, as we all know, make up the vast majority of all persistent object tables managed by IT departments.

We put up with tables like these because the IT profession isn't yet aware that there is an alternative and because, by dint of hard work, we can make up for the shortcomings of these tables. Data which falls into one of the other eight categories can usually be found somewhere, or reconstructed from data that can be found somewhere. If all else fails, DBMS archives and backups, and their associated transaction logs, will usually enable us to recreate any state that the database has been in. They will allow us to re-present six of the nine temporal categories we have identified.³

The three categories that cannot be re-presented from backups and logfiles are the three categories of future claims—things we are going to make our databases say (unless we change our minds) about what things once were like, or are like now, or may be like in the future. Future claims often start out as scribbled notes on someone's desk. But once inside the machine, they exist in transaction datasets, in collections of data that are intended, at some time or other, to be applied to the database and become currently asserted data.

In the previous chapter, we called the eight categories of data which are not current claims about the present, pipeline datasets, collections of data that exist at various points along the pipelines leading into production tables or leading out from them. As physically separate from those production tables, these collections of data are generally not immediately available for business use. Usually, IT technical personnel must do some work on these physical files or tables before a business user can query them for information.

This takes time, and until the work is complete, the information is not available. By the time the work is complete, the business value of the information may be much reduced. This work also has its costs in terms of how much time those technicians must spend to prepare that data to be queried. In addition, even without special requests for information in them, these physical datasets, taken together, constitute a significant management cost for IT.

With multiple points of rest in the pipelines leading into and out of production database tables, there are multiple points at which data can be lost. For example, data can be accidentally deleted before any copies are made. For datasets in the inflow pipelines, and which have not yet made it into the database itself, the only recourse for lost data is to reacquire or recreate the data. If prior datasets in the pipeline have already been

3 That’s the idea, anyway In reality, this “data of last resort” isn’t always there when

we go looking for it Backups and logfiles are rarely kept forever, so the data we need

may have been purged or written over There will inevitably be occasional intervals

during which the system hiccupped, and simply failed to capture the data in the first

place If the data is still available, it might not be in a readily accessible format because

of schema changes made after it was captured.
