
Data Warehousing Architecture and Implementation, Part 4


Hardware or Operating System Platforms

The following evaluation criteria can be applied to hardware and operating system platforms:

• Scalability. The warehouse solution can scale up in terms of space and processing power. This scalability is particularly important if the warehouse is projected to grow at a rapid rate.

• Financial stability. The product vendor has proven to be a strong and visible player in the hardware segment, and its financial performance indicates growth or stability.

• Price/performance. The product performs well in a price/performance comparison with other vendors of similar products.

• Delivery lead time. The product vendor can deliver the hardware or an equivalent service unit within the required time frame. If the unit is not readily available within the same country, there may be delays due to importation logistics.

• Reference sites. The hardware vendor has a reference site that is using a similar unit for the same purpose. The warehousing team can either arrange a site visit or interview representatives from the reference site. Alternatively, an onsite test of the unit can be conducted, especially if no reference is available.

• Availability of support. Support for the hardware and its operating system is available, and support response times are within the acceptable down time for the warehouse.
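One way to compare candidate platforms against criteria like these is a simple weighted score. The sketch below is only an illustration: the function name `score_platform`, the 1-to-5 rating scale, and the example weights are all assumptions, not something the text prescribes.

```python
def score_platform(ratings, weights):
    """Return the weighted average of per-criterion ratings.

    ratings: criterion name -> rating (assumed scale: 1 = poor, 5 = excellent)
    weights: criterion name -> relative importance (need not sum to 1)
    """
    missing = set(weights) - set(ratings)
    if missing:
        # Refuse to score a platform that has not been rated on every criterion.
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(ratings[name] * w for name, w in weights.items()) / total_weight


# Hypothetical weights and ratings, for illustration only.
weights = {"scalability": 3, "price_performance": 2, "support": 1}
vendor_a = {"scalability": 4, "price_performance": 3, "support": 5}
print(round(score_platform(vendor_a, weights), 2))
```

The weights make the team's priorities explicit, which is usually more valuable than the final number itself.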

How Does Data Warehousing Affect My Existing Systems?

Existing operational systems are the source of internal warehouse data. Extractions can take place only during the batch windows of the operational systems, typically after office hours. If batch windows are sufficiently large, warehouse-related activities will have little or no disruptive effect on normal, day-to-day operations.
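The batch-window constraint can be enforced with a small guard before an extraction job starts. This is a minimal sketch; the function name `in_batch_window` and the 22:00 to 05:00 window are assumptions chosen for illustration. The only subtle point is that an after-hours window usually crosses midnight.

```python
from datetime import time


def in_batch_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls inside the window [start, end)."""
    if start <= end:
        # Window within a single day, e.g. 01:00 to 04:00.
        return start <= now < end
    # Window that crosses midnight, e.g. 22:00 to 05:00.
    return now >= start or now < end


# Hypothetical after-hours window.
WINDOW_START, WINDOW_END = time(22, 0), time(5, 0)
print(in_batch_window(time(23, 30), WINDOW_START, WINDOW_END))
print(in_batch_window(time(12, 0), WINDOW_START, WINDOW_END))
```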

Improvement Areas in Operational Systems

Data warehousing does, however, highlight areas where the existing operational systems can be improved, particularly the following two:

• Missing data items. Decisional information needs almost always require the collection of data that are currently outside the scope of the existing systems. If possible, the existing systems are extended to support the collection of such data. The team will have to study alternatives to data collection if the operational systems cannot be modified (for example, if the operational system is an application package whose warranties will be void if modifications are made).

• Insufficient data quality. The data warehouse effort may also identify areas where the data quality of the operational systems can be improved. This is especially true for data items that are used to uniquely identify customers, such as social security numbers.
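A quality problem in an identifying field typically shows up as missing, malformed, or duplicated values. The sketch below audits one such field. The function name `audit_identifier`, the field name `ssn`, and the `NNN-NN-NNNN` pattern are assumptions for illustration; a real audit would use the formats and rules of the actual source system.

```python
import re
from collections import Counter

# Assumed format for the identifier; adjust to the real source data.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def audit_identifier(records, field="ssn"):
    """Report row positions with missing or malformed values, plus duplicated values."""
    missing, malformed = [], []
    seen = Counter()
    for position, record in enumerate(records):
        value = (record.get(field) or "").strip()
        if not value:
            missing.append(position)
        elif not SSN_PATTERN.match(value):
            malformed.append(position)
        else:
            seen[value] += 1
    duplicates = sorted(value for value, count in seen.items() if count > 1)
    return {"missing": missing, "malformed": malformed, "duplicates": duplicates}


# Made-up sample rows.
customers = [
    {"ssn": "123-45-6789"},
    {"ssn": ""},
    {"ssn": "123456789"},
    {"ssn": "123-45-6789"},
]
print(audit_identifier(customers))
```

A report like this gives the warehousing team something concrete to feed back to the owners of the operational system.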

The data warehouse implementation team should continuously provide constructive feedback regarding the operational systems. Easy improvements can be quickly implemented, and improvements that require significant effort and resources can be prioritized during IT planning.

By ensuring that each rollout of a data warehouse phase is always accompanied by a review of the existing systems, the warehousing team can provide valuable inputs to plans for enhancing operational systems.

Data Warehousing and Its Impact on Other Enterprise Initiatives

By its enterprise-wide nature, a data warehousing initiative will naturally have an impact on other enterprise initiatives, two of which are discussed below.

How Does Data Warehousing Tie In with BPR?

Data warehousing refers to the gamut of activities that support the decisional information requirements of the enterprise. BPR is "the radical redesign of strategic and value-added processes, and the systems, policies, and organizational structures that support them, to optimize the work flows and productivity in an organization."

Most BPR projects have focused on the optimization of operational business processes. Data warehousing, on the other hand, focuses on optimizing the decisional (or decision-making) processes within the enterprise. It can be said that data warehousing is the technology enabler for reengineering decisional processes.

The ready availability of integrated data for corporate decision-making also has implications for the organizational structure of the enterprise. Most organizations are structured or designed to collect, summarize, report, and direct the status of operations (i.e., there is an operational monitoring purpose). The availability of integrated data at different levels of detail may encourage a flattening of the organization structure.

Data warehouses also provide the enterprise with the measures for gauging competitive standing. The use of the warehouse leads to insights as to what drives the enterprise. These insights may quickly lead to business process reengineering initiatives in the operational areas.

How Does Data Warehousing Tie In with Intranets?

The term intranet refers to the use of Internet technologies for internal corporate networks. Intranets have been touted as cost-effective, client/server solutions to enterprise computing needs. Intranets are also popular due to the universal, easy-to-learn, easy-to-use front-end, i.e., the web browser.

The web-publishing nature of the Internet, and the browser's metaphor of searching for information, are consistent with the data warehouse's querying metaphor. The availability of many web-based tools that draw their data from relational database structures has naturally encouraged the use of web technology as a means for delivering warehouse data to end-users.

A data warehouse with a web-enabled front-end therefore provides enterprises with interesting options for intranet-based solutions.

With the introduction of technologies that enable secure connections over the public Internet infrastructure, enterprises now also have a cost-effective way of distributing or delivering warehouse data to users in multiple locations.

When Is a Data Warehouse Not Appropriate?

Not all organizations are ready for a data warehousing initiative. Below are two instances when a data warehouse is simply inappropriate.

When the Operational Systems Are Not Ready

The data warehouse is populated with information primarily from the operational systems of the enterprise. A good indicator of operational system readiness is the amount of IT effort focused on operational systems.

A number of telltale signs indicate a lack of readiness. These include the following:

• Many new operational systems are planned for development or are in the process of being deployed. Much of the enterprise's IT resources will be assigned to this effort and will therefore not be available for data warehousing projects.

• Many of the operational systems are legacy applications that require much firefighting. The source systems are brittle or unstable and are candidates for replacement. IT resources are also directed at fighting operational system fires.

• Many of the operational systems require major enhancements and must be overhauled. If the operational systems require major enhancements, then chances are these systems do not sufficiently support the day-to-day operations of the enterprise. Again, IT resources will be directed to enhancement or replacement efforts. Furthermore, deficient operational systems almost always fail to capture all the data required to meet the decisional information needs of business managers.

Regardless of the reason for a lack of operational system readiness, the bottom line is simple: an enterprise-wide data warehouse is out of the question due to the lack of adequate source systems. However, this does not preclude a phased data warehousing initiative, as illustrated in Figure 4-2.

Figure 4-2 Data Warehouse Rollout Strategy

The enterprise may opt for an interleaved deployment of systems. A series of projects can be conducted, where a project to deploy an operational system is followed by a project that extends the scope of the data warehouse to encompass the newly stabilized operational system.

The main focus of the majority of IT staff remains on deploying the operational systems. However, a data warehouse scope extension project is initiated as each operational system stabilizes. This project extends the data warehouse scope with data from each new operational system.

Note, however, that this approach may create unrealistic end-user expectations, particularly during earlier rollouts. The scope and strategy should therefore be communicated clearly and consistently to all users. Most, if not all, business users will understand that enterprise-wide views of data are not possible while most of the operational systems are not feeding the warehouse.

When the Need Is Operational Integration

Despite its ability to provide integrated data for decisional information needs, a data warehouse does not in any way contribute to meeting the operational information needs of the enterprise. Data warehouses are refreshed at best on a daily basis. They do not integrate data quickly enough or often enough for operational management purposes.

If the enterprise needs operational integration, then the typical data warehouse deployment (as shown in Figure 4-3) is insufficient.

Figure 4-3 Traditional Data Warehouse Architecture

Instead, the enterprise needs an Operational Data Store and its accompanying front-end applications. As mentioned in Chapter 1, flash monitoring and reporting tools are often likened to a dashboard that is constantly refreshed to provide operational management with the latest information about enterprise operations. Figure 4-4 illustrates the Operational Data Store architecture.

Figure 4-4 The Data Warehouse and the Operational Data Store

[...] corresponds to data as of a specific point in time.

How Do I Manage or Control a Data Warehouse Initiative?

There are several ways to manage or control a data warehouse project. Note that most of the techniques described below are useful in any technology project.

Milestones. Clearly defined milestones provide project management and the Project Sponsor with regular checkpoints to track the progress of the data warehouse development effort. Milestones should be far enough apart to show real progress, but not so far apart that senior management becomes uneasy or loses focus and commitment. In general, one data warehouse rollout should be treated as one project, lasting anywhere from three to six months.

Incremental Rollouts, Incremental Investments. Avoid biting off more than you can chew; projects that are gigantic leaps forward are more likely to fail. Instead, break up the data warehouse initiative into incremental rollouts. By doing so, you give the warehouse team manageable but ambitious targets and clearly defined deliverables.

Applying a phased approach also has the added benefit of allowing the Project Sponsor and the warehousing team to set priorities and manage end-user expectations. The benefits of each rollout can be measured separately, and the data warehouse is justified on a phase-per-phase basis.

A phased approach, however, requires an overall architect so that each phase also lays the foundation for subsequent warehousing efforts, and earlier investments remain intact.

Clearly Defined Rollout Scopes. To the maximum extent possible, clearly define the scope of each rollout to set the expectations of both senior management and warehouse end-users. Each rollout should deliver useful functionality. As in most development projects, the project manager will be walking the fine line between increasing the scope to better meet user needs and ruthlessly controlling the scope to meet the rollout deadline.

Individually Cost-Justified Rollouts. The scope of each rollout determines the corresponding rollout cost. Each rollout should be cost-justified on its own merits to ensure appropriate return on investment. However, this practice should not preclude long-term architectural investments that do not have an immediate return in the same rollout.

Plan to Have Early Successes. Data warehousing is a long-term effort that must have early and continuous successes that justify the length of the journey. Focus early efforts on areas that can deliver highly visible success, and that success will increase organizational support.

Plan to Be Scalable. Initial successes with the data warehouse will result in a sudden demand for increased data scope, increased functionality, or both! The warehousing environment and design must both be scalable to deal with increased demand as needed.

Reward Your Team. Data warehousing is hard work, and teams need to know their work is appreciated. A motivated team is always an asset in long-term initiatives.

In Summary

The Chief Information Officer (CIO) has the unenviable task of juggling the limited IT resources of the enterprise. He or she makes the resource assignment decisions that determine the skill sets of the various IT project teams.

Unfortunately, data warehousing is just one of the many projects on the CIO's plate. If the enterprise is still in the process of deploying operational systems, data warehousing will naturally be at a lower priority.

CIOs also have the difficult responsibility of evolving the enterprise's IT architecture. They must ensure that the addition of each new system, and the extension of each existing system, contributes to the stability and resiliency of the overall IT architecture.

Fortunately, data warehouse and operational data store technologies allow CIOs to migrate reporting and analytical functionality from legacy or operational environments, thereby creating a more robust and stable computing environment for the enterprise.

Chapter 5 The Project Manager

The warehouse Project Manager is responsible for any and all technical activities related to planning, designing, and building a data warehouse. Under ideal circumstances, this role is fulfilled by internal IT staff. It is not unusual, however, for this role to be outsourced, especially for early or pilot projects, because warehousing technologies and techniques are so new.

How Do I Roll Out a Data Warehouse Initiative?

If you are starting a data warehouse initiative, there are three main things to keep in mind. Always start with a planning activity. Always implement a pilot project as your "proof of concept." And always extend the functionality of the warehouse in an iterative manner.

Start with a Data Warehouse Planning Activity

The scope of a data warehouse varies from one enterprise to another. The desired scope and scale are typically determined by the information requirements that drive the warehouse design and development. These requirements, in turn, are driven by the business context of the enterprise: the industry, the fierceness of competition, and the state of the art in industry practices.

Regardless of the industry, however, it is advisable to start a data warehouse initiative with a short planning activity. The Project Manager should launch and manage the activities listed below.

Decisional Requirements Analysis. Start with an analysis of the decision support needs of the organization. The warehousing team must understand the user requirements and attempt to map these to the data sources available. The team also designs potential queries or reports that can meet the stated information requirements.

Note that unlike system development projects for OLTP applications, the information needs of decisional users cannot be pinned down and are frequently changing. The Requirements Analysis team should therefore gain enough of an understanding of the business to be able to anticipate likely changes to end-user requirements.

Decisional Source System Audit. Conduct an audit of all potential sources of data. This crucial and very detailed task verifies that data sources exist to meet the decisional information needs identified during requirements analysis. There is no point in designing a warehouse schema that cannot be populated because of a lack of source data. Similarly, there is no point in designing reports or queries when data are not available to generate them. Log all data items that are currently not supported or provided by the operational systems and submit these to the CIO as inputs for IT planning.
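At its core, the audit is a mapping from required data items to the source systems that can supply them, with the unmatched items logged as gaps. The following is a minimal sketch of that bookkeeping; the function name `audit_sources`, the system names, and the field names are all hypothetical.

```python
def audit_sources(required_items, source_catalog):
    """Map each required data item to its supplying systems and list the gaps.

    required_items: data items identified during requirements analysis
    source_catalog: source system name -> set of fields it provides
    """
    coverage = {}
    gaps = []
    for item in required_items:
        suppliers = sorted(
            system for system, fields in source_catalog.items() if item in fields
        )
        if suppliers:
            coverage[item] = suppliers
        else:
            # No operational system provides this item; log it for IT planning.
            gaps.append(item)
    return coverage, gaps


# Hypothetical requirements and catalog.
required = ["customer_id", "order_amount", "customer_segment"]
catalog = {
    "billing": {"customer_id", "order_amount"},
    "crm": {"customer_id"},
}
coverage, gaps = audit_sources(required, catalog)
print(coverage)
print(gaps)
```

Items with more than one supplier deserve a second look as well, since the team must decide which system is the authoritative source.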

Logical and Physical Warehouse Schema Design (Preliminary). The results of requirements analysis and source system audit serve as inputs to the design of the warehouse schema. The schema details all fact and dimension tables and fields, as well as the data sources for each warehouse field. The preliminary schema produced as part of the warehousing planning activity will be progressively refined with each rollout of the data warehouse.

The goal of the team is to design a data structure that will be resilient enough to meet the constantly changing information requirements of warehouse end-users.
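To make the idea of fact and dimension tables concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are invented for illustration; they are not the schema of any particular warehouse, and a production design would live in the chosen RDBMS rather than an in-memory database.

```python
import sqlite3

# A hypothetical star schema: one fact table keyed to two dimension tables.
SCHEMA = """
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    region        TEXT
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT NOT NULL,
    month         INTEGER,
    year          INTEGER
);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    sales_amount REAL NOT NULL
);
"""


def build_demo_warehouse():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(1, "Acme", "North"), (2, "Globex", "South")],
    )
    conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01', 1, 2024)")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 20240101, 100.0), (1, 20240101, 50.0), (2, 20240101, 70.0)],
    )
    return conn


def sales_by_region(conn):
    """A typical decisional query: total the fact measure by a dimension attribute."""
    return conn.execute(
        """
        SELECT c.region, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()


print(sales_by_region(build_demo_warehouse()))
```

New questions can often be answered by adding a dimension attribute or a new dimension table without restructuring the fact table, which is one reason this layout tolerates changing requirements.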

Other Concerns. The three tasks described above should also provide the warehousing team with an understanding of:

• the required warehouse architecture;

• the appropriate phasing and rollout strategy; and

• the ideal scope for a pilot implementation

The data warehouse plan must also evaluate the need for an ODS layer between the operational systems and the data warehouse.

You can find additional information on the above activities in Part III, Process.

Implement a Proof-of-Concept Pilot

Start with a pilot implementation as the first rollout for data warehousing. Pilot projects have the advantage of being small and manageable, thereby providing the organization with a data warehouse "proof of concept" that has a good chance of success.

Determine the functional scope of a pilot implementation based on two factors:

• The degree of risk the enterprise is willing to take. The project difficulty increases as the number of source systems, users, and locations increases. Politically sensitive areas of the enterprise are also very high risk.

• The potential for leveraging the pilot project. Avoid constructing a "throwaway" prototype for the pilot project. The pilot warehouse must have actual value to the enterprise.

Figure 5-1 is a matrix for assessing the pilot project.

Figure 5-1 Selecting Pilot Projects: Risk vs Reward

Avoid high-risk projects with very low reward possibilities. Ideally, the pilot project has low or manageable risk factors but has a highly visible impact on the way decisions are made in the enterprise. An early and high-profile success will increase the grassroots support of the warehousing initiative.
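The risk-versus-reward guidance can be expressed as a small lookup. Only two quadrants are stated in the text (low risk with high reward is ideal; high risk with low reward should be avoided). The labels for the other two quadrants, the two-level scale, and the function name `assess_pilot` are assumptions made for this sketch and are not taken from Figure 5-1.

```python
def assess_pilot(risk: str, reward: str) -> str:
    """Classify a candidate pilot by its risk and reward levels ('low' or 'high')."""
    levels = {"low", "high"}
    if risk not in levels or reward not in levels:
        raise ValueError("risk and reward must each be 'low' or 'high'")
    if risk == "low" and reward == "high":
        return "ideal pilot"
    if risk == "high" and reward == "low":
        return "avoid"
    # The remaining quadrants are not ranked in the text; treat them as judgment calls.
    return "judgment call"


# Hypothetical candidate projects.
candidates = {
    "sales reporting": ("low", "high"),
    "enterprise-wide profitability": ("high", "high"),
    "archive cleanup": ("high", "low"),
}
for name, (risk, reward) in candidates.items():
    print(name, "->", assess_pilot(risk, reward))
```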

Extend Functionality Iteratively

Once the warehouse pilot is in place and is stable, implement subsequent rollouts of the data warehouse to continuously layer new functionality or extend existing warehousing functionality on a cost-justifiable, prioritized basis, illustrated by the diagram in Figure 5-2.

Figure 5-2 Iterative Extension of Functionality, i.e., Evolution

Top-Down. Drive all rollouts by a top-down study of user requirements. Note that decisional requirements are subject to constant change; the team will never be able to fully document and understand the requirements, simply because the requirements change as the business situation changes. Don't fall into the trap of wanting to analyze everything to extreme detail (i.e., analysis paralysis).

Bottom-Up. While some team members are working top-down, other team members are working bottom-up. The results of the bottom-up study serve as the reality check for the rollout: some of the top-down requirements will quickly become unrealistic, given the state and contents of the intended source systems. End users should be quickly informed of limitations imposed by source system data to properly manage their expectations.

Back-End. Each rollout or iteration extends the back-end (i.e., the server component) of the data warehouse. Warehouse subsystems are created or extended to extract, transform, clean, and integrate more data. Warehouse data structures are extended to support a larger scope of data. Aggregate records are computed and loaded. Metadata records are populated as required.

Front-End. The front-end (i.e., client component) of the warehouse is also extended by deploying the existing data access and retrieval tools to more users and by deploying new tools (e.g., data mining tools, new decision support applications) to warehouse users. The availability of more data implies that new reports and new queries can be defined.
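The back-end steps described above (clean, integrate, then compute aggregates) can be sketched in a few lines. This is a toy illustration under stated assumptions: the record layout with `customer` and `amount` fields, the cleaning rules, and the function names are hypothetical, and real warehouse subsystems are far more involved.

```python
from collections import defaultdict


def clean(record):
    """Normalize one raw source record; return None if it is unusable."""
    customer = (record.get("customer") or "").strip().upper()
    try:
        amount = float(record.get("amount"))
    except (TypeError, ValueError):
        return None  # amount missing or not numeric
    if not customer:
        return None  # no usable customer identifier
    return {"customer": customer, "amount": amount}


def integrate(*sources):
    """Combine the cleaned records from several source extracts into one list."""
    merged = []
    for source in sources:
        for record in source:
            cleaned = clean(record)
            if cleaned is not None:
                merged.append(cleaned)
    return merged


def aggregate(records):
    """Compute an aggregate record (total amount) per customer."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer"]] += record["amount"]
    return dict(totals)


# Made-up extracts from two hypothetical source systems.
billing = [{"customer": " acme ", "amount": "100.5"}, {"customer": "", "amount": "5"}]
orders = [{"customer": "ACME", "amount": 20}, {"customer": "globex", "amount": "n/a"}]
print(aggregate(integrate(billing, orders)))
```

Extending the warehouse in a later rollout then amounts to passing one more extract to `integrate`, provided its records can be cleaned into the same shape.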


How Important Is the Hardware Platform?

Although the mainframe environment is also used as a data warehouse platform, data warehousing hardware discussions typically revolve around two main types of hardware technologies: symmetric multiprocessing (SMP) and massively parallel processing (MPP) servers.

SMPs. Symmetric multiprocessing (SMP) hardware has multiple processors that share one memory (see Figure 5-3). This type of architecture is often referred to as the "Shared Everything" architecture. When additional computing power is required, additional CPUs are added to the machine (although there is a limit to the number) or several SMP machines are clustered together.

Figure 5-3 SMP Hardware Configuration

MPPs. In contrast, massively parallel processing (MPP) hardware supports multiple nodes, where each node has one or more processors, each with its own memory (see Figure 5-4). Additional nodes can be added to increase processing power.

Figure 5-4 MPP Hardware Configuration

The choice between SMP and MPP is influenced by a number of factors, including the complexity of the query environment, the price/performance ratio, the proven processing capacity of the hardware platform with the target RDBMS, the anticipated warehouse applications, and the foreseen increases in warehouse size and users.

For example, complex queries that involve multiple table joins might realize better performance with an MPP configuration. MPPs, though, are generally more expensive. Clustered SMPs may provide a highly scalable implementation with better price/performance benefits.
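The practical consequence of each MPP node having its own memory is that data must be partitioned across nodes, each node works only on its own partition, and partial results are combined at the end. The sketch below simulates that pattern sequentially in a single process purely to show the data flow; it is not a parallel implementation, and the function names, the integer `customer_id` key, and the modulo partitioning rule are assumptions.

```python
from collections import Counter


def partition(rows, node_count, key):
    """Assign each row to a node by its (integer) key, so no data is shared between nodes."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[row[key] % node_count].append(row)
    return nodes


def local_totals(node_rows, key, measure):
    """The work one node does using only the rows held in its own memory."""
    totals = Counter()
    for row in node_rows:
        totals[row[key]] += row[measure]
    return totals


def partitioned_sum(rows, node_count, key="customer_id", measure="amount"):
    """Combine the per-node partial totals into the final answer."""
    combined = Counter()
    for node_rows in partition(rows, node_count, key):
        combined.update(local_totals(node_rows, key, measure))
    return dict(combined)


# Made-up fact rows.
sales = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": 5},
    {"customer_id": 1, "amount": 7},
]
print(partitioned_sum(sales, node_count=2))
```

The answer does not depend on the number of nodes, which is what allows nodes to be added as the warehouse grows. On SMP hardware, by contrast, all processors could read the same in-memory rows without a partitioning step.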

What Technologies Are Involved?

Several types of technologies are used to make data warehousing possible. These technology types are enumerated briefly below. You can find more information in Part 4, Technology.

• Source systems. The operational systems of the enterprise are the most likely source systems for a data warehouse. The warehouse may also make use of external data sources from third parties.

Date posted: 14/08/2014, 06:22
