
Architect's Examination of Form and Function: The Dimensional Model


A detailed assessment and evaluation of data warehouse system functionality and how it applies to the dimensional data model using tools that the architect works with.


Table of contents

3.1 The Entity Relationship Model Form
3.2 An Organized Performance Architecture Response
6.1 Function Limiting Characteristics of the Dimensional Form
6.1.1 The Dimensional Form Does Not Extend Well
6.1.2 The Dimensional Form Is Not Flexible
6.1.3 The Form Does Not Describe the Business
7.1 Client A
7.2 Client B
8 System Architecture Form to Fulfill Multiple Functions
8.2 Integrating Model Form with Technology Form


1 Introduction

"It is the pervading law of all things organic and inorganic, of all things physical and metaphysical, of all things human and all things superhuman, of all true manifestations of the head, of the heart, of the soul, that the life is recognizable in its expression, that form ever follows function. This is the law."

Louis Sullivan

"Form follows function - that has been misunderstood. Form and function should be one, joined in a spiritual union."

Frank Lloyd Wright

To be an architect of information solutions is to understand the concept of form following function intuitively, as a matter of nature, because design (the creation of form) is about enabling informational function. Taking the title "architect" affirms one's use of a conscious, method-based design decision process for aligning form with functional needs.

As one examines form's relationship to function within the dimensional model, the evaluation of the model form must not be based solely on Sullivan's statement, but also on Wright's: form not only follows function; function also follows form.

The concept of form and function unity highlights that form is not only based on function but also limits it, often strictly. Form and function are bound together in a cause-and-effect relationship: function is the cause of the form, while form both facilitates function and limits it. When considering the data warehouse function, one considers the overall goal of delivering information that allows the business to measure its activity and understand the impact of its actions in the marketplace. This high-level statement of function, though, is far too general for the evaluation of model form. As will be demonstrated, a more detailed understanding of system functionality is needed before determining how to apply a model form.

The function-limiting impact of form is often overlooked in design, particularly data model design. When a specific design form is implemented, are the broader limits it places on function considered? What system design steps are needed to mitigate those limitations?

Too often, data practitioners apply the form they know best, the latest form they've come to appreciate, or a form that is deemed a "best practice" in their circles.

True architects are not practitioners of "best practices." They practice the application of forms to function based on principles derived from cause-and-effect analysis.

The architect studies the relationship of form and function, of cause and effect, and then applies forms specific to the required functions. The architect deals with the complexity of the


2 Model Characteristics

One generally thinks about a model form in terms of certain characteristics. Through the evaluation of these characteristics and examination of model form, it becomes evident how the form aligns with, supports and limits function in relation to data and information delivery.

o The model's ability to extend
  - to extend a data model for new content/capability without disruption and redesign of processes
o The model's ability to be flexible
  - to support multiple purposes or functions
o The model's ability to describe the business and subjects within the corporate structure
  - to document the business using data
o The model's ability to support any valid business question
  - to answer business questions without specific design structuring
  - not a matter of ease or performance but a matter of ability
o The model's ability to efficiently and quickly answer business questions (report query performance)
  - to provide acceptable query performance for corporate decision support and analysis
o The model's ability to demonstrate business performance
  - to measure business performance

The critical examination of the limiting aspects of the dimensional model gives the architect the foundational principles necessary to understand the application of dimensional form in Information Architecture solutions.


3 Dimensional Model Architectural Origins

The dimensional model form is designed to greatly simplify database optimization for queries that would otherwise be applied against an Entity Relationship (ER) model. Because the dimensional model is a design response used to overcome the limits of the ER form, the ER form and its characteristics must first be examined as a basis for comparison.

3.1 The Entity Relationship Model Form

1 To free the collection of relations from undesirable insertion, update and deletion dependencies;

2 To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;

3 To make the relational model more informative to users;

4 To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

E. F. Codd, "Further Normalization of the Data Base Relational Model"

Each of Codd's goals not only provides insight into ER model function but is also instructive as to the reasons for the dimensional model form.

The Data Architect produces an ER model that describes the business through "entities" representing each of the objects, actors, organizational fictions, contracts, business activities and others in the business landscape. If it can be named as a subject, it must be represented as an entity within the model. Each entity is given an identifier known as the primary key. Additional attributes are added that describe the primary key and nothing else.

Foreign key relationships document each business relationship existing between entities. These relationships are instilled in the model logically rather than by direct data association. This distinction is fundamental to the examination of the ER and dimensional model form characteristics and their ability to deliver specific functionality.

This examination won't delve into the application of normalization rules, except to state that many modelers deal with normalization intuitively, as a matter of entity definition and attribute evaluation, when creating the ER model. Normalization rules represent a way of thinking about the evaluation of data content in model development. Normalization ensures that all entities are defined purely and that all business relationships within the model are defined logically rather than by physical association.
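To make the ER form concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer/product/order schema, table names and sample rows are hypothetical, chosen only for illustration. It shows each entity carrying its own primary key, relationships declared as foreign keys rather than by copying data onto one row, and the customer-to-product association recovered at query time by following the key path.

```python
import sqlite3

# Hypothetical normalized (ER-style) schema: one table per entity, each with a
# primary key; relationships are declared as foreign keys, not copied data.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES sales_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO product VALUES (?, ?)", [(10, "Widget"), (11, "Gadget")])
conn.execute("INSERT INTO sales_order VALUES (100, 1, '2014-01-15')")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)", [(100, 10, 3), (100, 11, 1)])

# The relationship is enforced logically: an order for a nonexistent customer fails.
try:
    conn.execute("INSERT INTO sales_order VALUES (101, 99, '2014-01-16')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Customer and product are associated only by following the key path
# customer -> sales_order -> order_line -> product at query time.
rows = conn.execute("""
    SELECT c.name, p.name, ol.quantity
    FROM customer c
    JOIN sales_order so ON so.customer_id = c.customer_id
    JOIN order_line ol  ON ol.order_id = so.order_id
    JOIN product p      ON p.product_id = ol.product_id
    ORDER BY p.name
""").fetchall()
print(rejected, rows)
```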


As one examines Codd's goals, it is evident that they align with some of the model characteristics discussed previously. Those characteristics are:

- extensibility
- flexibility
- ability to describe the subject
- ability to support any valid business question

Codd's fourth goal may appear somewhat cryptic, but it is central to an architect's understanding of both model forms and of how they support Codd's preceding goals.

In a fully normalized model there is no statistical data relationship bias that emphasizes one relationship or eliminates another, because relationships are implemented logically. Data that is not normalized associates values physically on the same row, creating a bias. When data is organized this way, certain questions can be answered while others cannot.

Applying the rules of normalization ensures that no bias exists toward one type of business question or another.
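A small, hypothetical illustration of relationship bias, again with sqlite3: the same customer and order data is held once in normalized form and once as a wide table with customer attributes copied onto each order row. The wide table answers the question it was shaped for, but it cannot see customers who never ordered, so even a simple customer count by region comes out wrong. Table names and data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: customer and order are separate entities related by a key.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount      REAL
);
INSERT INTO customer VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East'), (3, 'Initech', 'East');
INSERT INTO sales_order VALUES (100, 1, 250.0), (101, 1, 75.0), (102, 2, 40.0);

-- Denormalized: customer attributes copied physically onto each order row.
CREATE TABLE order_wide AS
SELECT o.order_id AS order_id, o.amount AS amount,
       c.customer_id AS customer_id, c.name AS name, c.region AS region
FROM sales_order o JOIN customer c ON c.customer_id = o.customer_id;
""")

# The question the wide table is shaped for: revenue by region.
revenue_by_region = dict(conn.execute(
    "SELECT region, SUM(amount) FROM order_wide GROUP BY region"))

# Questions it cannot answer: a customer with no orders never appears on an order row.
no_orders = [name for (name,) in conn.execute("""
    SELECT c.name FROM customer c
    LEFT JOIN sales_order o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL""")]
east_customers_normalized = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE region = 'East'").fetchone()[0]
east_customers_wide = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM order_wide WHERE region = 'East'").fetchone()[0]

print(revenue_by_region, no_orders, east_customers_normalized, east_customers_wide)
```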

One can ask any valid business question of a normalized model. By following the model's logically implemented (foreign key) relationships, one will always get the answer. There is no need to know future questions in advance. This always works, provided every entity germane to the question is represented within the model and each relationship between those entities is documented logically. As long as one is willing to write the necessary queries and wait, the model will answer.

Therefore, the normalized entity relationship model form is designed for flexibility, to answer any business question. It eliminates relationship bias by describing each entity purely and documenting all business relationships logically, providing data relationship neutrality.

Extensibility is another outcome of eliminating relationship bias, as will be seen later.

The normalized form that provides this functionality also limits it. To answer anything beyond simple business questions, one must write complex queries with many joins that follow relational paths, often identifying specific content within data sets using correlated sub-queries. A query may need to aggregate mixed levels of detail to common GROUP BY levels and use outer joins, complicating query optimization. In some cases, temp tables and multiple query steps are needed. In data warehousing, all of this query complexity leads to access and join serialization issues, compounded by heavy I/O from reading, buffering and sorting large volumes of data.
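The following sketch (hypothetical schema and question, SQLite dialect) gives a feel for the kind of query described above: revenue from each customer's most recent order, by region, including regions with no sales. Even this modest question needs a four-table join path, outer joins to keep empty regions, and a correlated sub-query to find each customer's latest order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region      (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer    (customer_id INTEGER PRIMARY KEY, name TEXT,
                          region_id INTEGER REFERENCES region(region_id));
CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY,
                          customer_id INTEGER REFERENCES customer(customer_id),
                          order_date TEXT);
CREATE TABLE order_line  (order_id INTEGER REFERENCES sales_order(order_id),
                          product_id INTEGER, quantity INTEGER, unit_price REAL);
INSERT INTO region      VALUES (1, 'East'), (2, 'West'), (3, 'North');
INSERT INTO customer    VALUES (1, 'Acme', 2), (2, 'Globex', 1);
INSERT INTO sales_order VALUES (100, 1, '2014-01-10'), (101, 1, '2014-02-01'),
                               (102, 2, '2014-01-20');
INSERT INTO order_line  VALUES (100, 10, 5, 2.0), (101, 10, 1, 2.0),
                               (101, 11, 2, 3.5), (102, 11, 4, 3.5);
""")

# Revenue from each customer's most recent order, by region, keeping regions
# with no sales: multi-table join path, outer joins and a correlated sub-query.
rows = conn.execute("""
    SELECT r.name,
           COALESCE(SUM(ol.quantity * ol.unit_price), 0) AS latest_order_revenue
    FROM region r
    LEFT JOIN customer c     ON c.region_id = r.region_id
    LEFT JOIN sales_order so ON so.customer_id = c.customer_id
                            AND so.order_date = (SELECT MAX(s2.order_date)
                                                 FROM sales_order s2
                                                 WHERE s2.customer_id = c.customer_id)
    LEFT JOIN order_line ol  ON ol.order_id = so.order_id
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()
print(rows)
```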

No one wants to wait hours for BI report results. In the early days of data warehousing, on at least one RDBMS, the longer a query ran, the more likely it was to end in error due to the database's concurrency architecture.

3.2 An Organized Performance Architecture Response

At the time of the first edition of Ralph Kimball's The Data Warehouse Toolkit, most data warehouses were hosted on SMP database servers. These servers do not scale parallel processing linearly as MPP clusters do, and their limits often led to a variety of very restrictive data forms intended to improve query performance.

The introduction of the dimensional model provided an organized, systematic design basis for a performance architecture form, leading to predictable query optimization.

It also addressed another issue of the time: the dimensional model is much simpler to write queries against. Hand coding queries against an ER model for any sort of complicated reporting requires a good deal of skill, experience and time. While users still write some queries by hand, Business Intelligence software has reduced that need by providing a metadata-driven abstraction that interprets the physical data model for the user.

When dimensional models are properly designed for reporting, queries require only selection of the needed attributes and measures, direct joins to the required dimensions, WHERE or JOIN filters, appropriate aggregate functions and GROUP BY clauses (and perhaps a HAVING clause).
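As a hedged sketch of that query shape, the example below assumes a hypothetical star with a sales fact table and date and product dimensions. The report query uses only the building blocks listed above: attribute and measure selection, a direct join to each dimension, a WHERE filter, an aggregate, GROUP BY and HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (date_key INTEGER REFERENCES dim_date(date_key),
                          product_key INTEGER REFERENCES dim_product(product_key),
                          amount REAL);
INSERT INTO dim_date    VALUES (20140105, '2014-01', 2014), (20140210, '2014-02', 2014),
                               (20131220, '2013-12', 2013);
INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO fact_sales  VALUES (20140105, 1, 120.0), (20140105, 2, 30.0),
                               (20140210, 1, 60.0), (20140210, 1, 70.0),
                               (20140210, 2, 500.0), (20131220, 1, 999.0);
""")

# One direct join per dimension, a filter, an aggregate, GROUP BY and HAVING.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.amount) AS sales
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    WHERE d.year = 2014
    GROUP BY d.month, p.category
    HAVING SUM(f.amount) > 100
    ORDER BY d.month, p.category
""").fetchall()
print(rows)
```

Compare the join structure with a normalized equivalent: each dimension is exactly one join away from the facts.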


4 The Dimensional Model Form

Dimensional modeling achieves its performance advantage by designing denormalizations into data organizations specific to answering a limited range of business questions. These denormalizations place data in physical relationships, eliminating the logical business-based relationships that follow an entity-to-entity-to-entity form in favor of more direct relationships between report-grouping reference data and business metrics.

In other words, the dimensional model form creates explicit relationship biases to simplify queries, reduce I/O and eliminate query optimization complexity, which delivers answers to business questions efficiently and quickly.
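One common instance of this denormalization, sketched here with hypothetical tables, is collapsing a product -> subcategory -> category chain into a single product dimension row. The joins run once when the dimension is built, so report queries reach the category through one join from the fact table instead of walking the chain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized chain: category <- subcategory <- product.
CREATE TABLE category    (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subcategory (subcategory_id INTEGER PRIMARY KEY, name TEXT,
                          category_id INTEGER REFERENCES category(category_id));
CREATE TABLE product     (product_id INTEGER PRIMARY KEY, name TEXT,
                          subcategory_id INTEGER REFERENCES subcategory(subcategory_id));
INSERT INTO category    VALUES (1, 'Hardware');
INSERT INTO subcategory VALUES (10, 'Fasteners', 1);
INSERT INTO product     VALUES (100, 'Hex bolt', 10), (101, 'Wing nut', 10);

-- Flatten the chain into one dimension row per product: the relationships
-- become physical (same row) instead of logical (foreign keys).
CREATE TABLE dim_product AS
SELECT p.product_id AS product_key, p.name AS product_name,
       s.name AS subcategory_name, c.name AS category_name
FROM product p
JOIN subcategory s ON s.subcategory_id = p.subcategory_id
JOIN category c    ON c.category_id = s.category_id;
""")

rows = conn.execute("SELECT * FROM dim_product ORDER BY product_key").fetchall()
print(rows)
```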

The pattern of denormalization follows the form of a central table, called a fact table, containing one or more business measurements called facts. The facts may be sourced from a variety of transactional and reference sources, all of which may be used in combination to answer certain classes of business questions.

Every fact table row carries the context of a time period: a date, or a date and time together. The period may be a date or a higher-level period such as a week, month, quarter or year. Facts may be transactional, point-in-time snapshots of the state of metrics, or period-based aggregates.
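To illustrate the fact forms named above, the following Python sketch derives two of them from hypothetical account transactions (the transaction-grain facts): a monthly period aggregate of net movement, and a month-end point-in-time balance snapshot. The account identifiers, amounts and month-end rule are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-grain facts: one row per account movement.
transactions = [
    (date(2014, 1, 5), "A-1", 100.0),
    (date(2014, 1, 20), "A-1", -30.0),
    (date(2014, 2, 3), "A-1", 50.0),
    (date(2014, 1, 9), "B-7", 200.0),
]

def period_aggregate(rows):
    """Period-based aggregate: net movement per (account, (year, month))."""
    totals = defaultdict(float)
    for d, account, amount in rows:
        totals[(account, (d.year, d.month))] += amount
    return dict(totals)

def month_end_snapshot(rows, months):
    """Point-in-time snapshot: each account's balance at the end of each month.

    A row is produced for every account and month, even with no activity, and
    balances are semi-additive (summable across accounts, not across months).
    """
    accounts = sorted({account for _, account, _ in rows})
    return {
        (account, period): sum(amount for d, acct, amount in rows
                               if acct == account and (d.year, d.month) <= period)
        for account in accounts
        for period in sorted(months)
    }

months = [(2014, 1), (2014, 2)]
print(period_aggregate(transactions))
print(month_end_snapshot(transactions, months))
```

Note that the snapshot carries account B-7 into February with an unchanged balance, while the period aggregate has no February row for it.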

The fact table also has foreign key attributes relating the fact rows to reference tables called dimensions. A dimension may represent a single entity, but it typically contains attributes drawn from, or derived from, multiple entities describing a subject.

Typically, at least one dimension associated with the fact table is based on an entity with a natural, business-based relationship to the business activity represented in the fact table. There are usually other dimensions that are one or two entities removed from that business activity. There may also be additional dimensions related to the facts that must be derived by processing other business activity.

Keep in mind that if a source does not document all of the data relationships, for example a customer's origination sales channel, those relationships must be derived by processing business activity records such as sales or service orders.
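A minimal sketch of such a derivation, assuming a hypothetical business rule that a customer's origination channel is the channel recorded on that customer's earliest sales order. The rule, field names and data are illustrative; a real rule would come from the business.

```python
from datetime import date

# Hypothetical sales-order records: the channel is recorded on each order,
# but the source never records it on the customer.
orders = [
    {"customer": "C1", "order_date": date(2014, 3, 2), "channel": "Web"},
    {"customer": "C1", "order_date": date(2013, 11, 8), "channel": "Retail"},
    {"customer": "C2", "order_date": date(2014, 1, 15), "channel": "Call Center"},
]

def origination_channel(order_rows):
    """Derive each customer's origination channel from their earliest order.

    Same-day ties are broken by channel name: an arbitrary but deterministic
    rule chosen for this sketch.
    """
    first = {}
    for row in sorted(order_rows, key=lambda r: (r["order_date"], r["channel"])):
        first.setdefault(row["customer"], row["channel"])  # keep only the earliest
    return first

print(origination_channel(orders))
```

The derived value would then be stored as a customer dimension attribute, so reports never repeat the derivation.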

One must also build into the processing and structure of the star schema all of the complex processing that would otherwise be needed against the entity relationship model to bring data into a common, simplified form fit for answering functionally similar business questions.

The philosophy of the dimensional model is to do all of the processing once to form a common basis for a class of business questions or analysis, storing the results of that processing in the star schema so that BI queries avoid the complex processing at report runtime. It is a "process once, use it many times" approach.

The end result should be a star schema capable of delivering measurements through simple SELECT, JOIN, WHERE and GROUP BY statements.
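The "process once, use it many times" idea can be sketched as follows, with hypothetical source tables: the join and aggregation run once in a load step that materializes a monthly fact table, after which each report is a short SELECT ... GROUP BY against the stored result. The load function name and the month grain are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized sources.
CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
CREATE TABLE order_line  (order_id INTEGER, product_id INTEGER, quantity INTEGER, unit_price REAL);
INSERT INTO sales_order VALUES (1, 7, '2014-01-05'), (2, 7, '2014-01-28'), (3, 8, '2014-02-11');
INSERT INTO order_line  VALUES (1, 10, 2, 5.0), (2, 10, 1, 5.0), (2, 11, 3, 2.5), (3, 11, 4, 2.5);
""")

def load_monthly_fact(conn):
    """'Process once': run the join and aggregation a single time and store it."""
    conn.executescript("""
    DROP TABLE IF EXISTS fact_monthly_sales;
    CREATE TABLE fact_monthly_sales AS
    SELECT substr(so.order_date, 1, 7)      AS month_key,
           ol.product_id                    AS product_key,
           SUM(ol.quantity)                 AS units,
           SUM(ol.quantity * ol.unit_price) AS revenue
    FROM sales_order so
    JOIN order_line ol ON ol.order_id = so.order_id
    GROUP BY month_key, product_key;
    """)

load_monthly_fact(conn)

# 'Use it many times': each report is now a simple query on the stored result.
revenue_by_month = conn.execute(
    "SELECT month_key, SUM(revenue) FROM fact_monthly_sales "
    "GROUP BY month_key ORDER BY month_key").fetchall()
units_by_product = conn.execute(
    "SELECT product_key, SUM(units) FROM fact_monthly_sales "
    "GROUP BY product_key ORDER BY product_key").fetchall()
print(revenue_by_month, units_by_product)
```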


5 The Dimensional Model Function

One concludes that the dimensional form is a performance architecture intended to improve report query performance. So far, however, a full understanding of why dimensional models perform so well, and what limits them, has not yet been presented.

The star schema is designed to measure the business. It has a business-function orientation, as opposed to the subject-area orientation of the ER model.

The form centralizes a series of measures (facts), surrounded by attributes that give business context to those measurements.

While some consumers may refer to the content as subjects, the real orientation is business reporting and analysis. It may be Sales Analysis or Risk Analysis, but these are organized to support specific business functions rather than to provide general data about a subject. Instead of being presented as it exists in an ER model or in the source, data is organized to support decisions.

Some of Webster's definitions of the word "information" are:

2 "INTELLIGENCE, NEWS"

3 "FACTS, DATA"

Architects do not design dimensional models that deliver measurements (facts) randomly as data. The purpose is to deliver organized information to business clients that supports their decision-making function.

To be "information," measures have to be organized and presented with functional context; without that, they are simply data. Providing data is what an ER model does: it delivers data without bias, and it is up to the consumer to discern how to turn it into information. In a dimensional model, much of that work of organizing data as information is performed in advance of report execution.

Therefore, a primary function for which the dimensional form is employed is that of a performance architecture built upon the direct structuring of information for specific business functions.

It is important to make this distinction because there are other means of implementing performance architectures for delivering information that do not rely on data denormalization in a database.

Nor is this to say that dimensional model content is the final state of information organization. In systems that employ the dimensional form, it represents the foundational state of information, which is further organized in reporting to deliver KPIs, comparisons, trends, graphics and other business-oriented presentations of information.


6 The Limits of Single Form Design

All that has been examined to this point represents the foundation for the remaining examination.

Architects realize that there are limits to form. An automobile maker creates a variety of forms for different functional needs, and each of those forms has recognizable limits. A Freightliner semi-truck with a raised-roof sleeper and Hendrickson AIRTEK front axles and suspension is designed for long-distance freight hauling in comfort, but it is not functional for the morning commute. One might drive it downtown, but the fuel consumption would empty the wallet and, guaranteed, it won't fit in the parking garage.

Clearly, design form has limits. The architect's role is to understand those limits and produce system designs that integrate design forms to fulfill functional requirements.

And the forms available for examination include not only model forms but also a wide variety of technology-based design forms.

6.1 Function Limiting Characteristics of the Dimensional Form

When properly applied, the dimensional model is a powerful performance architecture form for delivering information to businesses. Like the ER form, however, the dimensional form has limitations in its recognized function.

6.1.1 The Dimensional Form Does Not Extend Well

The ability to extend is a relative evaluation comparing one form to another. The evaluation is really about how much disruption to processes and existing data, and how much retesting, is involved in extending existing implementations.

Purveyors of the dimensional model sometimes state that extending the dimensional form is as easy as adding new attributes to dimensions, or adding new dimensions and dimensional keys to an existing fact table from a specific point in time forward, and backfilling attributes and foreign keys with the standard default values for NULL or Not Applicable.
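For reference, the simple extension path just described might look like this in SQLite, using a hypothetical promotion dimension. A reserved key of -1 (an assumed convention) represents "Not Applicable", and ALTER TABLE backfills existing fact rows with it. The sketch covers only the schema change: historical rows remain unattributed, and any new sourcing or integration needed to populate the key still has to be built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES (20140105, 1, 120.0), (20140210, 2, 80.0);

-- New dimension seeded with a reserved 'Not Applicable' member; key -1 is a
-- hypothetical convention for fact rows that predate the new content.
CREATE TABLE dim_promotion (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
INSERT INTO dim_promotion VALUES (-1, 'Not Applicable'), (1, 'Spring Sale');

-- Extend the fact table; existing rows are backfilled with the default key.
ALTER TABLE fact_sales ADD COLUMN promotion_key INTEGER NOT NULL DEFAULT -1;

-- Rows loaded from this point forward carry a real promotion key.
INSERT INTO fact_sales VALUES (20140315, 1, 95.0, 1);
""")

rows = conn.execute("""
    SELECT p.promotion_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_promotion p ON p.promotion_key = f.promotion_key
    GROUP BY p.promotion_name
    ORDER BY p.promotion_name
""").fetchall()
print(rows)
```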

The reality of dimensional model extension is rather different.

1 Changes in Processing

Even when this approach can be taken, the addition of new content means a change in existing processing. Aside from additional sourcing, the processing typically involves integration with content sourced from multiple entity sources. If the target is an existing fact

Date posted: 03/07/2014, 08:17
