
Managing Time in Relational Databases, Part 17


Pending Updates exist in what we called, in the previous chapter, either the assertion-time near future or the assertion-time far future. Those in the near future have an assertion begin date close enough to Now() that the business is willing to let the passage of time make them current. Near future deferred assertions would typically have a begin date that will become current in the next few seconds, hours, days or weeks.

In a conventional database, pending updates are transactions accumulated in an external batch transaction file, or perhaps in a batch transaction table within the database.

Far future deferred assertions are the internalization of data located in what are often called staging areas. They are collections of data that are typically more complicated than usual to update. By placing them in far future assertion time, we guarantee that they will not inadvertently become current assertions simply because of the passage of time. They can become current assertions only when, presumably after a review-and-approve process, the business releases them into near-future assertion time.

Pending Projections: Future Claims About the Future

The Pending Projections dataset consists of all those rows in an asserted version table which lie in both the assertion time future and in the effective time future. Its subject matter is things as they may turn out to be. Its rows are claims about what currently lies in the future, but claims which we are not yet willing to make. Pending Projections are a record of what we may eventually be willing to say things are going to be like.

Here is the view which re-presents Pending Projections. With the suffix “Pend_Proj” standing for “pending projections”, it looks like this:

CREATE VIEW Policy_Pend_Proj AS
SELECT oid, asr_beg_dt, asr_end_dt, eff_beg_dt, eff_end_dt, client, type, copay
FROM Policy_AV
WHERE asr_beg_dt > Now()
AND eff_beg_dt > Now()

Figure 13.12 Pending Projections. [Figure: the nine-fold grid of what we used to claim, what we currently claim and what we will claim, set against what things used to be like, what things are like and what things will be like; the cell shown is "what we will claim things will be like".]

Chapter 13 RE-PRESENTING INTERNALIZED PIPELINE DATASETS

As we have seen with our other re-presented pipeline datasets, Pending Projections include both the assertion and effective time period as part of the unique identifier because both temporal dimensions are specified as ranges, and neither as points in time.

Mirror Images of the Nine-Fold Way

As we said in Chapter 9, effective time exists within assertion time. First, logically speaking, we make a statement about how things are. Next, logically speaking, we make a truth claim about that statement.

Most of our queries against bi-temporal tables will specify a point in assertion time (most commonly Now()) and then ask for rows asserted at that point in time that were in effect at some point or period of effective time. For example, we might ask for all policies that were in effect on August 23, 2008, as we currently believe them to have been. Or we might ask for all policies which we currently claim were in effect any time in the first half of 2008.

Pinning down a point in assertion time, and then asking for versions of objects claimed at that point in time to be correct, is the general form that queries will take when posed by business users. But we can look at bi-temporal data from the opposite point of view as well. We can pin down a point or period in effective time, and ask for everything we ever asserted about things at that point in time.

It would not be too misleading to call this the auditor’s point of view. From this point of view, we are interested in the history of our claims about what is true, not in the history of what actually happened out there in the world. Of course, we could also ask for all future assertions about a given point in effective time. But auditors, by the nature of their work, have little interest in future assertions. By the same token, they are very interested in past assertions, along with current ones. So an auditor’s mirror-image of the nine categories reduces to a set of six categories, those shown in Figure 13.13.

These views that auditors are interested in are physically the same ones we have already described. The “mirror-image” is in perspective, not in content.
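The auditor's perspective can be sketched as a query that pins a point in effective time and returns every past and current assertion about it. The example below is an illustration only: the table layout, the sample rows, the `assertion_history` function and the `now` parameter are assumptions made for the sketch, and sqlite3 simply stands in for whatever DBMS hosts the asserted version table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Policy_AV (
        oid INTEGER, asr_beg_dt TEXT, asr_end_dt TEXT,
        eff_beg_dt TEXT, eff_end_dt TEXT, copay INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO Policy_AV VALUES (?, ?, ?, ?, ?, ?)",
    [
        # withdrawn assertion: what we used to claim about 2008
        (1, "2008-01-01", "2009-03-01", "2008-01-01", "2009-01-01", 20),
        # the correction: what we currently claim about 2008
        (1, "2009-03-01", "2011-01-01", "2008-01-01", "2009-01-01", 25),
        # deferred assertion: not yet claimed, of little interest to an auditor
        (1, "2011-01-01", "9999-12-31", "2008-01-01", "2009-01-01", 30),
    ],
)


def assertion_history(conn, oid, eff_point, now):
    """Past and current assertions about one object at one effective date."""
    sql = """
        SELECT asr_beg_dt, asr_end_dt, copay,
               CASE WHEN asr_end_dt <= :now THEN 'past' ELSE 'current' END
        FROM Policy_AV
        WHERE oid = :oid
          AND eff_beg_dt <= :eff_point AND eff_end_dt > :eff_point
          AND asr_beg_dt <= :now          -- excludes future assertions
        ORDER BY asr_beg_dt
    """
    params = {"oid": oid, "eff_point": eff_point, "now": now}
    return conn.execute(sql, params).fetchall()


history = assertion_history(conn, 1, "2008-08-23", "2010-06-15")
for row in history:
    print(row)
```

The effective-time predicate is fixed while assertion time is left open on the past side, which is the reverse of the usual business query.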


The Value of Internalizing Pipeline Datasets

The cost of managing physical pipeline datasets is high. This cost is seldom discussed because it is universally thought to be just an inevitable cost of doing business. Bringing down this cost is a matter of doing all those various things that IT management has done for decades, and continues to do. Quality control procedures are put in place so errors don’t creep into our databases and later have to be backed out. The platform costs of storing, transforming, and moving data into and out of pipeline datasets are controlled by minimizing redundancy, and by moving datasets up and down the storage hierarchy. Software that sets up and runs production schedules minimizes the human costs of scheduling work involving these pipeline datasets.

But the work of managing pipeline datasets is tedious. And whenever the management of these datasets is a one-off kind of thing, i.e. whenever the development group has to manage these datasets rather than the IT Operations group that handles scheduled maintenance, errors in managing them are not uncommon.

Asserted Versioning does not offer a way to more efficiently manage pipeline datasets. It offers a way to eliminate them and, consequently, eliminate the totality of their management costs! There will always be some circumstances in which data must be manipulated in external pipeline datasets. But these can become the exception rather than the rule.

In place of these pipeline datasets, Asserted Versioning stores the information contained in those pipeline datasets internally, within the production tables that are their sources and destinations. Pending transactions can be stored within the production tables themselves. Posted transactions can be, too. Data staging areas can also exist as semantically distinct sets of rows, physically contained within production tables. Pipeline datasets, then, cease to exist as distinct physical objects. They become virtualized, as semantically distinct collections of rows all physically existing within the same tables, re-presented in different views.

Figure 13.13 The Auditor’s Mirror Image of the Nine-Fold Way. [Figure: a grid of six cells, what we used to claim and what we currently claim, each set against what things used to be like, what things are like now and what things will be like.]

We may think that the principal cost elimination benefit of internalizing pipeline datasets is that it reduces the number of distinct datasets that programs, SQL and production scheduling software have to identify and manage. This is a reduction in the cost of the mechanics of pipeline datasets. Instead of assembling data from multiple tables, it already exists all in one place.

But the more significant cost reduction has to do with the semantics of pipeline datasets. With all data about the same things in the same place, we will, all of us, find all of it when we go looking for it. The most junior member of the business community will find the same set of data for his queries that the most senior member does. There won’t be differences in completeness of the source data, or quality of that data, as there so often are in today’s business world and today’s collections of business data.

When we need any of this data, we won’t have to go looking for it. All of the data about what we once thought was true, or what we currently think is true, or what we are not yet willing to assert is true, will be available by simply changing the assertion point-in-time selection criterion on views and queries. By changing that predicate in a WHERE clause to a past point in assertion time, we will be able to access the internalized re-presentation of posted transactions. By changing the predicate to a future point in time, we will be able to access the internalized re-presentation of pending transactions.

By the same token, we will be able to access historical data about what things used to be like from the same table that contains data about what they are like right now, and that may also contain data about what those things are going to be like sometime in the future. Again, it will be as easy as changing a predicate in a WHERE clause.
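As a rough illustration of how little changes between these queries, the sketch below runs one parameterized query against one table and varies only the assertion point in time. The table contents, the `as_asserted` function and its parameter names are hypothetical, and sqlite3 is used only to make the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Policy_AV (
        oid INTEGER, asr_beg_dt TEXT, asr_end_dt TEXT,
        eff_beg_dt TEXT, eff_end_dt TEXT, policy_type TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO Policy_AV VALUES (?, ?, ?, ?, ?, ?)",
    [
        (7, "2009-01-01", "2010-02-01", "2009-01-01", "9999-12-31", "HMO"),  # posted
        (7, "2010-02-01", "2010-12-01", "2009-01-01", "9999-12-31", "PPO"),  # current
        (7, "2010-12-01", "9999-12-31", "2009-01-01", "9999-12-31", "POS"),  # pending
    ],
)


def as_asserted(conn, asr_point, eff_point):
    """Policy types in effect at eff_point, as asserted at asr_point."""
    sql = """
        SELECT policy_type
        FROM Policy_AV
        WHERE asr_beg_dt <= :asr_point AND asr_end_dt > :asr_point
          AND eff_beg_dt <= :eff_point AND eff_end_dt > :eff_point
        ORDER BY policy_type
    """
    rows = conn.execute(sql, {"asr_point": asr_point, "eff_point": eff_point})
    return [r[0] for r in rows]


EFF = "2010-06-15"
used_to_claim = as_asserted(conn, "2009-06-01", EFF)    # past assertion time
currently_claim = as_asserted(conn, "2010-06-15", EFF)  # assertion time = now
not_yet_claimed = as_asserted(conn, "2011-06-01", EFF)  # future assertion time
print(used_to_claim, currently_claim, not_yet_claimed)
```

All three results come from the same physical table. Only the value bound to the assertion-time predicate differs.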

Glossary References

Glossary entries whose definitions form strong inter-dependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.


We note, in particular, that none of the nine types of pipeline dataset are included in this list. In general, we leave category sets out of these lists, but recommend that the reader look them up in the Glossary.

as-is
as-was
Asserted Versioning Framework (AVF)
assertion time
statement
conventional table
non-temporal table
deferred assertion
far future assertion time
near future assertion time
instance
type
managed object
object
oid
queryable object
pipeline dataset
inflow pipeline
outflow pipeline
internalization of pipeline datasets
re-presentation of pipeline datasets
production database
production table
temporal dimension
temporal transaction
version


ALLEN RELATIONSHIP AND OTHER QUERIES

Allen Relationship Queries
Time Period to Time Period Queries
Point in Time to Period of Time Queries
Point in Time to Point in Time Queries
A Claims Processing Example
In Other Words
Glossary References

In this chapter, we examine each of the thirteen Allen relationships, as well as each non-leaf node in the taxonomy of Allen relationships which we introduced in Chapter 3. We describe the Allen relationships as they hold between two time periods, between a time period and a point in time, and also between two points in time. We show how these relationships are expressed in terms of time periods represented with the closed-open convention, and we provide a sample query for each one. After a section in which we illustrate how much simpler these queries would be to express if we had a PERIOD datatype, we conclude this chapter by discussing queries which involve temporal joins.

Figure 14.1 shows our taxonomy of the Allen relationships. Those relationships are the leaf nodes in this taxonomy. Every leaf node has an inverse relationship, except the [equals] relationship. We italicize that relationship name to emphasize that it has no inverse. So counting the [equals] relationship, and the six leaf nodes and their inverses, we have the full set of thirteen Allen relationships. We also underline the non-leaf node relationships in the taxonomy, to emphasize that they are relationships we have defined, and are not one of the Allen relationships.
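The count of thirteen can be checked mechanically. The following sketch classifies any two closed-open periods into exactly one Allen relationship; it assumes integer clock ticks, periods written as (begin, end) with begin < end, and an ad hoc "^-1" suffix for inverse relationships. The function names are illustrative and are not taken from the book.

```python
from itertools import combinations


def _forward(xb, xe, yb, ye):
    """Return the non-inverse relationship of x to y, or None if there is none."""
    if xe < yb:
        return "before"        # at least one clock tick separates x from y
    if xe == yb:
        return "meets"         # x ends exactly where y begins (closed-open)
    if xb < yb < xe < ye:
        return "overlaps"
    if xb == yb and xe < ye:
        return "starts"
    if yb < xb and xe < ye:
        return "during"
    if yb < xb and xe == ye:
        return "finishes"
    return None


def allen(b1, e1, b2, e2):
    """Name the Allen relationship of period [b1, e1) to period [b2, e2)."""
    if not (b1 < e1 and b2 < e2):
        raise ValueError("each period needs begin < end")
    if b1 == b2 and e1 == e2:
        return "equals"        # the one relationship with no inverse
    forward = _forward(b1, e1, b2, e2)
    if forward is not None:
        return forward
    # Otherwise the second period stands in a forward relationship to the first.
    return _forward(b2, e2, b1, e1) + "^-1"


# Enumerate every pair of periods over six clock ticks and collect the names.
periods = list(combinations(range(6), 2))
names = {allen(b1, e1, b2, e2) for (b1, e1) in periods for (b2, e2) in periods}
print(sorted(names))
```

Six forward relationships, their six inverses and [equals] are the only names the enumeration produces, which matches the count given in the text.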

Managing Time in Relational Databases. DOI: 10.1016/B978-0-12-375041-9.00014-5
Copyright © 2010 Elsevier Inc. All rights of reproduction in any form reserved.


Many of the Allen relationships are used by the AVF to enforce TEI and TRI. For example, as we pointed out in Chapter 3, the [intersects] relationship is important because it defines TEI. If two asserted versions of the same object share even a single effective time clock tick, within shared assertion time, then they [intersect], and violate TEI. Otherwise, they don’t. The [fills] relationship is important because it defines TRI. If a TRI relationship fails, it is because there is no episode of the referenced parent object which temporally includes, i.e. [fills⁻¹], that of the child version. The [before] relationship is important because it distinguishes episodes from one another. Every episode of an object is non-contiguous with every other episode of the same object, and so one of them must be [before] the other.
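To make these three roles concrete, here is a small sketch of the corresponding predicates over closed-open periods, together with a TEI check and a TRI check built on them. It is not the AVF's code. It assumes that all versions passed in share assertion time, that a period is a (begin, end) pair of dates, and that [fills] means the child period lies within the parent period or equals it; the sample data is invented.

```python
from datetime import date
from itertools import combinations


def intersects(p, q):
    """True if closed-open periods p and q share at least one clock tick."""
    return p[0] < q[1] and q[0] < p[1]


def fills(child, parent):
    """True if the child period lies within, or equals, the parent period."""
    return parent[0] <= child[0] and child[1] <= parent[1]


def before(p, q):
    """True if p ends at least one clock tick before q begins."""
    return p[1] < q[0]


def tei_violations(versions):
    """Pairs of versions of the same object whose effective periods intersect.

    Each version is (oid, eff_beg, eff_end); shared assertion time is assumed.
    """
    return [
        (v, w)
        for v, w in combinations(versions, 2)
        if v[0] == w[0] and intersects(v[1:], w[1:])
    ]


def tri_satisfied(child_period, parent_episodes):
    """True if some parent episode temporally includes the child period."""
    return any(fills(child_period, episode) for episode in parent_episodes)


versions = [
    ("P1", date(2008, 1, 1), date(2008, 6, 1)),
    ("P1", date(2008, 6, 1), date(2009, 1, 1)),   # meets the first: no violation
    ("P1", date(2008, 12, 1), date(2009, 3, 1)),  # intersects the second
    ("P2", date(2008, 1, 1), date(2009, 1, 1)),   # different object
]
parent_episodes = [
    (date(2008, 1, 1), date(2009, 1, 1)),
    (date(2009, 6, 1), date(9999, 12, 31)),
]
violations = tei_violations(versions)
print(len(violations))
print(tri_satisfied((date(2008, 3, 1), date(2008, 9, 1)), parent_episodes))
print(before(parent_episodes[0], parent_episodes[1]))
```

A child period that spans the gap between the two parent episodes fails the TRI check, because no single episode includes it.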

As for queries issued by business users, we have found that many ad hoc queries, and perhaps the majority of them, are queries about episodes, not about versions. That is, they are queries that want (i) the begin and end date of the episode and, for business data, (ii) the last version of past episodes, the current version of current episodes, or the latest version of future episodes. Because of the importance of episodes to queries, the SQL examples in this chapter will select episodes. The last, current or latest version contains the business data. The episode begin date that is on every version, and the version’s own effective end date, provide the effective time period of the episode itself.

Figure 14.1 The Asserted Versioning Allen Relationship Taxonomy. [Figure: taxonomy tree rooted at "Time Period Relationships Along a Common Timeline"; the node labels that survive extraction are Before, Meets, Equals, Occupies, Starts, Finishes, During and Aligns.]

Chapter 14 ALLEN RELATIONSHIP AND OTHER QUERIES

We also note that the SQL in many of the following examples does not represent typical queries that a business would write. Each of these queries focuses on one specific Allen relationship, and shows how to express it in SQL. In particular, these sample queries do not include typical join criteria. Instead, the only join criteria used in these examples are two time periods and the Allen relationship between them.

Another reason these sample queries don’t look very real-world is that they select from two of the tables in our sample database that don’t have much to do with one another. In particular, there is no TRI relationship between them. They are the Policy and Wellness Program tables. If we had used, for example, the Client and Policy tables instead, many of the queries would have been more realistic.

But TRI-related tables cannot illustrate all of the Allen relationships. In fact, every instance of a TRI relationship involves a parent and a child time period that is an instance of one of seven of the Allen relationships. This leaves six other Allen relationships that TRI-related tables cannot illustrate.

Nevertheless, as overly simple and unrealistic as most of these sample queries may be, they are the foundation for all queries that express temporal relationships. No query will ever need to express a temporal relationship that is not one of these relationships. So if we know how to write the temporal predicates in these queries, we will know how to write any temporal predicate for any query.

Allen Relationship Queries

The value of reviewing all the Allen relationships in terms of queries against asserted version tables is that, as we already know, the Allen relationships are exhaustive. There are no positional relationships along a common timeline, among time periods and/or points in time, other than these. Thus, by showing how to write a query for each one of them, as well as for the groups of them identified in our taxonomy, we will have provided the basic material out of which any query against any asserted version table may be expressed.

In addition to the thirteen Allen relationships themselves, our taxonomy provides five additional relationships, each of which is a logical combination of two or more Allen relationships. And these combinations are not formed simply by stringing together Allen relationships with OR predicates. Although they are, necessarily, logically equivalent to the OR’d set of those relationships, they are often much simpler expressions, easier to understand and faster when executed.
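The difference between an OR'd set and its simpler equivalent can be seen with [intersects]. Under the assumption that two periods intersect exactly when they share a clock tick, [intersects] covers nine of the thirteen Allen relationships (all except [before], [meets] and their inverses). The sketch below, with hypothetical function names and integer clock ticks, writes it both ways and checks by brute force that the two forms agree.

```python
from itertools import combinations


def intersects_ord(b1, e1, b2, e2):
    """[intersects] spelled out as nine Allen relationships joined by OR."""
    return (
        (b1 == b2 and e1 == e2)                 # equals
        or (b1 < b2 and b2 < e1 and e1 < e2)    # overlaps
        or (b2 < b1 and b1 < e2 and e2 < e1)    # overlaps, inverse
        or (b1 == b2 and e1 < e2)               # starts
        or (b1 == b2 and e2 < e1)               # starts, inverse
        or (b2 < b1 and e1 < e2)                # during
        or (b1 < b2 and e2 < e1)                # during, inverse
        or (b2 < b1 and e1 == e2)               # finishes
        or (b1 < b2 and e1 == e2)               # finishes, inverse
    )


def intersects_simple(b1, e1, b2, e2):
    """The same relationship as two comparisons on closed-open periods."""
    return b1 < e2 and b2 < e1


# Compare the two forms on every pair of periods over seven clock ticks.
periods = list(combinations(range(7), 2))
mismatches = [
    (p, q)
    for p in periods
    for q in periods
    if intersects_ord(*p, *q) != intersects_simple(*p, *q)
]
print(len(mismatches))
```

In SQL the simple form becomes two comparisons in a WHERE clause, which is shorter to read than the nine-way OR and gives an optimizer less to evaluate.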

In these sample queries, we will not include predicates for assertion time, and will pretend that our sample tables are uni-temporal versioned tables. This eliminates unnecessary detail from these examples. We will do this by using two version table views, shown below: V_Wellness_Program_Curr_Asr and V_Policy_Curr_Asr. The former is a view of all currently asserted Wellness Program versions. The latter is a view of all currently asserted Policy versions.

CREATE VIEW V_Wellness_Program_Curr_Asr AS
SELECT * FROM Wellness_Program_AV
WHERE asr_beg_dt <= Now() AND asr_end_dt > Now()

CREATE VIEW V_Policy_Curr_Asr AS
SELECT * FROM Policy_AV
WHERE asr_beg_dt <= Now() AND asr_end_dt > Now()

In these example queries, as we said before, we will be selecting episodes, not versions. For the two tables used in this chapter, these are the views which provide episodes as queryable managed objects:

CREATE VIEW V_Wellness_Program_Epis AS
SELECT wp.wellpgm_oid, wp.epis_beg_dt, wp.eff_end_dt AS epis_end_dt,
       wp.wellpgm_nm, wp.wellpgm_nbr, wp.wellpgm_cat_cd
FROM V_Wellness_Program_Curr_Asr AS wp
WHERE wp.eff_end_dt =
      (SELECT MAX(wpx.eff_end_dt)
       FROM V_Wellness_Program_Curr_Asr AS wpx
       WHERE wpx.wellpgm_oid = wp.wellpgm_oid
       AND wpx.epis_beg_dt = wp.epis_beg_dt)

CREATE VIEW V_Policy_Epis AS
SELECT pol.policy_oid, pol.epis_beg_dt, pol.eff_end_dt AS epis_end_dt,
       pol.policy_type, pol.copay_amt, pol.client_oid, pol.policy_nbr
FROM V_Policy_Curr_Asr AS pol
WHERE pol.eff_end_dt =
      (SELECT MAX(px.eff_end_dt)
       FROM V_Policy_Curr_Asr AS px
       WHERE px.policy_oid = pol.policy_oid
       AND px.epis_beg_dt = pol.epis_beg_dt)
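The effect of the episode views is easier to see with data. The sketch below, an illustration under stated assumptions rather than the book's own test case, loads three invented policy versions into an in-memory SQLite database and defines a cut-down V_Policy_Epis over them. V_Policy_Curr_Asr is created here as a plain table standing in for the currently-asserted view, and only a subset of the columns is carried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the V_Policy_Curr_Asr view: currently asserted versions only.
conn.execute(
    """
    CREATE TABLE V_Policy_Curr_Asr (
        policy_oid INTEGER, epis_beg_dt TEXT,
        eff_beg_dt TEXT, eff_end_dt TEXT, copay_amt INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO V_Policy_Curr_Asr VALUES (?, ?, ?, ?, ?)",
    [
        # first episode: two contiguous versions sharing one episode begin date
        (1, "2008-01-01", "2008-01-01", "2008-06-01", 20),
        (1, "2008-01-01", "2008-06-01", "2009-01-01", 25),
        # second episode, after a gap in effective time
        (1, "2009-06-01", "2009-06-01", "9999-12-31", 30),
    ],
)
# Each episode is represented by the version with the latest effective end date.
conn.execute(
    """
    CREATE VIEW V_Policy_Epis AS
    SELECT pol.policy_oid, pol.epis_beg_dt,
           pol.eff_end_dt AS epis_end_dt, pol.copay_amt
    FROM V_Policy_Curr_Asr AS pol
    WHERE pol.eff_end_dt =
          (SELECT MAX(px.eff_end_dt)
           FROM V_Policy_Curr_Asr AS px
           WHERE px.policy_oid = pol.policy_oid
             AND px.epis_beg_dt = pol.epis_beg_dt)
    """
)
episodes = conn.execute(
    "SELECT policy_oid, epis_beg_dt, epis_end_dt, copay_amt "
    "FROM V_Policy_Epis ORDER BY epis_beg_dt"
).fetchall()
for episode in episodes:
    print(episode)
```

Three versions collapse to two episodes. Each episode row carries the episode begin date found on every version and the business data of the episode's last version.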


These episode views are the query-side work of defining an episode datatype. The AVF presents episodes as maintainable managed objects. These views present episodes as queryable managed objects.

This is a very important point. Both computer science research and IT practice have shown the importance of the concept of a string of one or more contiguous clock ticks with a known location in time. SQL does not directly support this concept; and so instead, we, and others, have to write code to exclude gaps and overlaps occurring in the timespan between a pair of dates or timestamps. A PERIOD datatype is the direct support needed for this concept. This datatype implements this concept at the correct level of abstraction.

By the same token, our own research and practice have shown the importance of the concept of an episode, a string of one or more contiguous and non-overlapping versions of the same object. Without that concept, and the concepts of objects and versions on which it depends, there is also no concept of temporal entity integrity and temporal referential integrity. Without that concept, collections of rows are defined, as needed, within each SQL statement. As we can see with both the standard and alternative temporal models, their SQL insert, update and delete statements do result in bi-temporal data that satisfies what we call TEI and TRI. Their SQL queries do find episodes, when they need them, past assertions when they need them, and so on. But the level of abstraction is wrong, for the same reason that getting the same results with a pair of dates that one would get with a PERIOD datatype is wrong.

So we now have two views which externalize, as queryable managed objects, the best data we currently have (i.e. our currently asserted data) about policy episodes and wellness program episodes. Now, using these two views, we will define another view that we will use to illustrate each of the Allen relationships. This is the view V_Allen_Example. This view will keep the examples as small and easy to understand as possible, eliminating all extraneous and repetitive detail while focusing on the Allen relationships themselves. Here is the V_Allen_Example view:

CREATE VIEW V_Allen_Example AS
SELECT wp.wellpgm_oid, pol.policy_oid,
       wp.epis_beg_dt AS wp_epis_beg_dt,
       wp.epis_end_dt AS wp_epis_end_dt,
       pol.epis_beg_dt AS pol_epis_beg_dt,
       pol.epis_end_dt AS pol_epis_end_dt
