MANAGING TIME IN

RELATIONAL DATABASES

Companion Web site

Ancillary materials are available online at:

www.elsevierdirect.com/companions/9780123750419

MANAGING TIME IN

RELATIONAL DATABASES

How to Design, Update and Query Temporal Data

TOM JOHNSTON

RANDALL WEIS

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann Publishers is an imprint of Elsevier

Morgan Kaufmann Publishers is an imprint of Elsevier.

30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

This book is printed on acid-free paper.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein.

In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

Application submitted

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-375041-9

For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.elsevierdirect.com

Printed in the United States of America

10 11 12 13 14 5 4 3 2 1

ABOUT THE AUTHORS

Tom Johnston

Tom Johnston is an independent consultant specializing in the design and management of data at the enterprise level. He has a doctorate in Philosophy, with an academic concentration in ontology, logic and semantics. He has spent his entire working career in business IT, in such roles as programmer, systems programmer, analyst, systems designer, data modeler and enterprise data architect. He has designed and implemented systems in over a dozen industries, including healthcare, telecommunications, banking, manufacturing, transportation and retailing. His current research interests are (i) the management of bi-temporal data with today’s DBMS technology; (ii) overcoming this newest generation of information stovepipes (for example, in medical records and national security databases) by more cleanly separating the semantics of data from the syntax of its representation; and (iii) providing additional semantics for the relational model of data by supplementing its first-order predicate logic statements with modalities such as time and person.

Randall J. Weis

Randall J. Weis, founder and CEO of InBase, Inc., has more than 24 years of experience in IT, specializing in enterprise data architecture, including the logical and physical modeling of very large database (VLDB) systems in the financial, insurance and health care industries.

He has been implementing systems with stringent temporal and performance requirements for over 15 years. The bi-temporal pattern he developed for modeling history, retroactivity and future dating was used for the implementation of IBM’s Insurance Application Architecture (IAA) model. This pattern allows the multidimensional temporal view of data as of any given effective and assertion points in time.

InBase, Inc. has developed software used by many of the nation’s largest companies, and is known for creating the first popular mainframe spellchecker, Lingo, early in Randy’s career. Weis has been a senior consultant at InBase and other companies, such as PricewaterhouseCoopers LLP, Solving IT International Inc., Visual Highway and Beyond If Informatics. Randy has been a presenter at various user groups, including Guide, Share, Midwest Database Users Group and Camp IT Expo, and has developed computer courses used in colleges and corporate training programs.

Randy has been married to his wife Marina for over 30 years, and has 3 children, Matt, Michelle and Nicolle. He plays guitar and sings; he enjoys running, and has run several marathons. He also creates web sites and produces commercial videos. He may be reached via email at randyw@inbaseinc.com.

PREFACE

Over time, things change: things like customers, products, accounts, and so forth. But most of the data we keep about things describes what they are like currently, not what they used to be like. When things change, we update the data that describes them so that the description remains current. But all these things have a history, and many of them have a future as well, and often data about their past or about their future is also important.

It is usually possible to restore and then to retrieve historical data, given enough time and effort. But businesses are finding it increasingly important to access historical data, as well as data about the future, without those associated delays and costs. More and more, business value attaches to the ability to directly and immediately access non-current data as easily as current data, and to do so with equivalent response times.

Conventional tables contain data describing what things are currently like. But to provide comparable access to data describing what things used to be like, and to what they may be like in the future, we believe it is necessary to combine data about the past, the present and the future in the same tables. Tables which do this, which contain data about what the objects they represent used to be like and also data about what they may be like later on, together with data about what those objects are like now, are versioned tables.
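As a minimal sketch of the idea, and not the schema this book goes on to develop, the following Python example uses an in-memory SQLite table in which each row is one version of a customer, carrying the period during which it is in effect. The table name, column names, sample dates and the use of 9999-12-31 as an open-ended end date are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical versioned table: each row describes a customer during one
# effective-time period [eff_begin, eff_end). Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_version (
        customer_id TEXT NOT NULL,
        status      TEXT NOT NULL,
        eff_begin   TEXT NOT NULL,  -- first day the version is in effect
        eff_end     TEXT NOT NULL   -- first day it is no longer in effect
    )
""")
conn.executemany(
    "INSERT INTO customer_version VALUES (?, ?, ?, ?)",
    [
        ("C1", "standard", "2008-01-01", "2009-06-01"),  # what C1 used to be like
        ("C1", "VIP",      "2009-06-01", "2011-01-01"),  # a later version
        ("C1", "retired",  "2011-01-01", "9999-12-31"),  # open-ended last version
    ],
)

def status_as_of(customer_id, as_of_date):
    """Return the status in effect on as_of_date, or None if no version applies.

    ISO-8601 date strings compare correctly as plain text, so no date
    parsing is needed for this sketch.
    """
    row = conn.execute(
        "SELECT status FROM customer_version "
        "WHERE customer_id = ? AND eff_begin <= ? AND ? < eff_end",
        (customer_id, as_of_date, as_of_date),
    ).fetchone()
    return row[0] if row else None

print(status_as_of("C1", "2008-07-15"))  # standard
print(status_as_of("C1", "2010-03-01"))  # VIP
print(status_as_of("C1", "2012-01-01"))  # retired
```

The same query shape serves past, present and future dates, which is the point of keeping all three in one table.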

Versioned tables are one of two kinds of uni-temporal tables. In this book, we will show how the use of versioned tables lowers the cost and increases the value of temporal data, data that describes what things used to be like as well as what they are like now, and sometimes what they will be like as well. Costs, as we will see, are lowered by simplifying the design, maintenance and querying of temporal data. Value, as we will see, is increased by providing faster and more accurate answers to queries that access temporal data.

Another important thing about data is that, from time to time, we occasionally get it wrong. We might record the wrong data about a particular customer’s status, indicating, for example, that a VIP customer is really a deadbeat. If we do, then as soon as we find out about the mistake, we will hasten to fix it by updating the customer’s record with the correct data.

But that doesn’t just correct the mistake. It also covers it up. Auditors are often able to reconstruct erroneous data from backups and logfiles. But for the ordinary query author, no trace remains in the database that the mistake ever occurred, let alone what the mistake was, or when it happened, or for how long it went undetected.

Fortunately, we can do better than that. Instead of overwriting the mistake, we can keep both the original customer record and its corrected copy in the same table, along with information about when and for how long the original was thought to be correct, and when we finally realized it wasn’t and then did something about it. Moreover, while continuing to provide undisturbed, directly queryable, immediate access to the data that we currently believe is correct, we can also provide that same level of access to data that we once believed was correct but now realize is not correct.

There is no generally accepted term for this kind of table. We will call it an assertion table. Assertion tables, as we will see, are essential for recreating reports and queries, at a later time, when the objective is to retrieve the data as it was originally entered, warts and all. Assertion tables are the second of the two kinds of uni-temporal tables. The same data management methods which lower the cost and increase the value of versioned data also lower the cost and increase the value of asserted data.
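The VIP/deadbeat correction described above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the authors’ implementation: the class name, field names, helper functions, sample dates and the 9999-12-31 placeholder for "still asserted" are all hypothetical.

```python
from dataclasses import dataclass

OPEN = "9999-12-31"  # assumed placeholder meaning "still asserted"

@dataclass
class AssertedRow:
    customer_id: str
    status: str
    asr_begin: str  # date we began asserting this row as correct
    asr_end: str    # date we stopped asserting it (OPEN if we still do)

def correct(table, customer_id, new_status, today):
    """Withdraw the currently asserted row and add a corrected copy.

    Nothing is overwritten: the mistaken row stays in the table with its
    assertion period closed, so the error remains queryable.
    """
    for row in table:
        if row.customer_id == customer_id and row.asr_end == OPEN:
            row.asr_end = today
    table.append(AssertedRow(customer_id, new_status, today, OPEN))

def as_asserted_on(table, customer_id, date):
    """Return the status we were asserting for the customer on the given date."""
    for row in table:
        if row.customer_id == customer_id and row.asr_begin <= date < row.asr_end:
            return row.status
    return None

# A VIP customer is mistakenly recorded as a deadbeat, then corrected.
table = [AssertedRow("C1", "deadbeat", "2010-01-10", OPEN)]
correct(table, "C1", "VIP", "2010-03-05")

print(as_asserted_on(table, "C1", "2010-02-01"))  # deadbeat (the mistake, as entered)
print(as_asserted_on(table, "C1", "2010-04-01"))  # VIP (what we now believe)
```

A report originally run on 2010-02-01 can be recreated by querying with that date, warts and all, while ordinary queries use today’s date and see only the corrected row.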

There are also tables which combine versions and assertions, and combine them in the sense that every row in these tables is both a version and an assertion. These tables contain data about what we currently believe the objects they represent were/are/will be like, data about what we once believed but no longer believe those objects were/are/will be like, and also data about what we may in the future come to believe those objects were/are/will be like. Tables like these, tables whose rows contain data about both the past, the present and the future of things, and also about the past, the present and the future of our beliefs about those things, are bi-temporal tables.
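To make the two time dimensions concrete, here is a hedged sketch in Python with an in-memory SQLite table in which every row carries both an effective period and an assertion period. The table layout, column names and sample data are assumptions for illustration; the book’s own schema may differ.

```python
import sqlite3

# Hypothetical bi-temporal table: each row is both a version (effective
# period) and an assertion (period during which we held it to be true).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_bitemporal (
        customer_id TEXT NOT NULL,
        status      TEXT NOT NULL,
        eff_begin   TEXT NOT NULL,
        eff_end     TEXT NOT NULL,
        asr_begin   TEXT NOT NULL,
        asr_end     TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO customer_bitemporal VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Believed from 2010-01-10 until the correction on 2010-03-05.
        ("C1", "deadbeat", "2010-01-01", "9999-12-31", "2010-01-10", "2010-03-05"),
        # Asserted since the correction: VIP during 2010, retired afterwards.
        ("C1", "VIP",      "2010-01-01", "2011-01-01", "2010-03-05", "9999-12-31"),
        ("C1", "retired",  "2011-01-01", "9999-12-31", "2010-03-05", "9999-12-31"),
    ],
)

def status(customer_id, effective_date, assertion_date):
    """What did we believe, on assertion_date, the customer was like on effective_date?"""
    row = conn.execute(
        "SELECT status FROM customer_bitemporal "
        "WHERE customer_id = ? "
        "  AND eff_begin <= ? AND ? < eff_end "
        "  AND asr_begin <= ? AND ? < asr_end",
        (customer_id, effective_date, effective_date,
         assertion_date, assertion_date),
    ).fetchone()
    return row[0] if row else None

print(status("C1", "2010-02-01", "2010-02-01"))  # deadbeat: belief held at the time
print(status("C1", "2010-02-01", "2010-04-01"))  # VIP: current belief about the same day
print(status("C1", "2011-06-01", "2010-04-01"))  # retired: current belief about the future
```

Fixing the assertion date and varying the effective date walks through the history of the customer; fixing the effective date and varying the assertion date walks through the history of our beliefs about the customer.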

In spite of several decades of work on temporal data, and a growing awareness of the value of real-time access to it, little has been done to help IT professionals manage temporal data in real-world databases. One reason is that a temporal extension to the SQL language has yet to be approved, even though a proposal to add temporal features to the language was submitted over fifteen years ago. Lacking approved standards to guide them, DBMS vendors have been slow to build temporal support into their products.

In the meantime, IT professionals have developed home-grown support for versioning, but have paid almost no attention to bi-temporality. In many cases, they don’t know what bi-temporality is. In most cases, their business users, unaware of the benefits of bi-temporal data, don’t know to ask for such functionality.

And among those who have at least heard of bi-temporality, or to whom we have tried to explain it, we have found two common responses. One is that Ralph Kimball solved this problem a long time ago with his three kinds of slowly changing dimensions. Another is that we can get all the temporal functionality we need by simply versioning the tables to which we wish to add temporal data.

But both responses are mistaken. Slowly changing dimensions do not support bi-temporal data management at all. Nor does versioning. Both are methods of managing versions; but both also fall, as we shall see, far short of the extensive support for versioning that Asserted Versioning provides.

Objectives of this Book

Seamless Access to Temporal Data

One objective of this book is to describe how to manage uni-temporal and bi-temporal data in relational databases in such a way that they can be seamlessly accessed together with current data.¹ By “seamlessly” we mean (i) maintained with transactions simple enough that anyone who writes transactions against conventional tables could write them; (ii) accessed with queries simple enough that anyone who writes queries against conventional tables could write them; and (iii) executed with performance similar to that for transactions and queries that target conventional data only.

¹ Both forms of temporal data can be implemented in non-relational databases also. For that matter, they can be implemented with a set of flat files. We use the language of relational technology simply because the ubiquity of relational database technology makes that terminology a lingua franca within business IT departments.

Encapsulation of Temporal Data Structures and Processes

A second objective is to describe how to encapsulate the complexities of uni-temporal and bi-temporal data management. These complexities are nowhere better illustrated than in a book published ten years ago by Dr. Richard Snodgrass, the leading computer scientist in the field. In this book, Developing Time-Oriented Database Applications in SQL (Morgan-Kaufmann, San Francisco, 2000), Dr. Snodgrass provides extensive examples of temporal schemas and also of the SQL, for several different relational DBMSs, that is required to make uni- and bi-temporality work, and especially to enforce the constraints that must be satisfied as temporal data is created and maintained. Many of these SQL examples are dozens of lines long, and quite complex.

This is not the kind of code that should be written over and over again, each time a new database application is developed. It is code that ensures the integrity of the database regardless of the applications that use that database. And so until that code is written by vendors into their DBMS products, it is code that should exist as an interface between applications and the DBMS that manages the database: a single codebase used by multiple applications, developed and maintained independently of the applications that will use it. A codebase which plays this role is sometimes called a data access layer or a persistence and query service framework.

So we have concluded that the best way to provide temporal functionality for databases managed with today’s DBMSs, and accessed with today’s SQL, is to encapsulate that complexity. Asserted Versioning does this. In doing so, it also provides an enterprise solution to the problem of managing temporal data, thus supporting both the semantic and physical interoperability of temporal data across all the databases in the enterprise.

Asserted Versioning encapsulates the design, maintenance and querying of both uni-temporal and bi-temporal data. Design encapsulation means that data modelers do not have to design temporal data structures. Instead, declarative specifications replace that design work. These declarations specify, among other things, which entities in a logical data model are to become bi-temporal tables when physically generated, which column or columns constitute business keys unique to the object represented, and between which pairs of tables there will exist a temporal form of referential integrity.
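One way to picture such declarations is as plain data that a generator could read. The Python sketch below is purely hypothetical: the class names, fields and the `validate` helper are invented for illustration and do not reproduce the format Asserted Versioning actually uses. It records only the three kinds of declaration named above.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TemporalTableSpec:
    """Hypothetical declaration: this entity becomes a bi-temporal table."""
    entity: str
    business_key: tuple  # column(s) that uniquely identify the object represented

@dataclass(frozen=True)
class TemporalForeignKey:
    """Hypothetical declaration: temporal referential integrity between two tables."""
    child: str
    parent: str

@dataclass
class TemporalModelSpec:
    tables: list = field(default_factory=list)
    foreign_keys: list = field(default_factory=list)

    def validate(self):
        """Return the foreign keys that mention an entity not declared as temporal."""
        names = {t.entity for t in self.tables}
        return [fk for fk in self.foreign_keys
                if fk.child not in names or fk.parent not in names]

# Illustrative declarations for a two-entity logical model.
spec = TemporalModelSpec(
    tables=[
        TemporalTableSpec("Customer", ("customer_nbr",)),
        TemporalTableSpec("Policy", ("policy_nbr",)),
    ],
    foreign_keys=[TemporalForeignKey(child="Policy", parent="Customer")],
)

print(spec.validate())  # [] because both ends of the relationship are declared
```

The design choice being illustrated is that the modeler states what should be temporal and leaves the temporal columns, keys and constraints to be generated from those statements.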

Maintenance encapsulation and query encapsulation mean, as we indicated earlier, that inserts, updates and deletes to bi-temporal tables, and queries against them, are simple enough that anyone who could write them against non-temporal tables could also write them against Asserted Versioning’s temporal tables. Maintenance encapsulation, in the Asserted Versioning Framework (AVF) we are developing, is provided by an API, Calls to which may be replaced by native SQL issued directly to a


Title	Managing time in relational databases
Authors	Tom Johnston, Randall Weis
Type	Book
Year of publication	2010
City	Amsterdam