Building Fast-Performing Database Models
“Man is a slow, sloppy, and brilliant thinker; the machine is fast, accurate, and stupid.”
(William M. Kelly)
Man is ingenious. Only by instructing a database properly will it perform well.
Much of the information in this chapter has already been discussed, perhaps even analyzed, and often alluded to, in previous chapters of this book. This chapter is intended to take everything you have learned so far (all the theory) and begin the process of putting it into practice. This chapter describes various factors affecting database performance tuning, as applied to different database model types. Anything obviously repeated from a previous chapter should be considered as being doubly significant with respect to database modeling. Database performance is the most important factor as far as any database or database model is concerned. If performance is not acceptable, your database model does not service the end-users in an acceptable manner. End-users are your clients and thus the most important people in your life, at least as far as your database is concerned.
A client can be a direct customer or an indirect customer. An indirect client could be someone else's customer: your customer's client.
Previous chapters have explained database modeling theory. This chapter forms a bridge between the database modeling and related theoretical concepts described in the previous chapters, and a large ongoing case study in the chapters that follow. The following chapters dig into the practical aspects of database modeling by describing and demonstrating the full process of thinking about, analyzing, designing, and building a database model in a real-world environment.

By the end of this chapter, you should have a slightly less theoretical and slightly more real-world picture of database modeling techniques.
In this chapter, you learn about the following:
❑ Factors affecting tuning of different types of database models
❑ All the detail on writing efficient queries
❑ Helping database performance by using appropriate indexing
❑ Views
❑ Application caching
The Needs of Different Database Models
Performance tuning of different database model types depends solely on what the database is servicing, in terms of the applications connected to that database. All the theory about tuning database models has been discussed in previous chapters. Essentially, everything now needs to be tied together. All the theory you have so far been bombarded with is now explained from the point of view of why and how it is used. Different database model types are tuned in different ways. In general, a database model can be tuned based on what its dependent applications require. It comes down to what the end-users need. The two extreme ends of the scale are the OLTP database model and the data warehouse database model. The following sections break down the aspects of different types of databases, based on the performance survival needs of the different database model types.
Factors Affecting OLTP Database Model Tuning
An OLTP database services the Internet. The primary characteristics of OLTP databases are as follows:
❑ Large user population — OLTP databases have an immeasurably large user population, all trying to get at the same information at once.
❑ Very high concurrency — Concurrency implies a very high degree of sharing of the same information.
❑ Large database size — OLTP databases range from small to large in size, depending on application type and user population. A large, globally available online book retailer might have a multitude of servers all over the world. A site advertising local night spots for only a single city, in a specific country, has local appeal and, thus, potentially far less information.
❑ Reaction time — Real-time, instantaneous reaction to database changes and activities is essential. If you withdraw cash from an ATM at your bank and then check your statement online an hour or so later, you would expect to see the transaction. Similarly, if you purchase something online, you would hope to see the transaction on your credit card account within minutes, if not seconds.
❑ Small transactions — Users retrieve single records or very small joins.
❑ Granularity — Many OLTP database models are highly normalized structures, but this is often a mistake. OLTP databases allow access to small chunks of data; the problem, however, is that sometimes those small chunks of data can actually equate to large multiple-table joins caused by excessive normalization. If a table structure is normalized to the point of catering to all business rules in the table structure, performance problems may well arise, even for users seeking to view 10 to 20 records on a single screen. A prime example of this is a user logging onto a bank account and getting a bank statement. If all the information on a single sheet of paper (a short Web page) is spread across a multitude of tables, that user could become seriously irritated waiting for all the data to be glued together (if a response takes more than seven seconds). Thousands of other users could be accessing the same data at the same time.
❑ Manageability — This is usually possible, but quite often difficult. OLTP database user populations are generally globally based, round the clock, 365 days a year. This can make managing an OLTP database complex and awkward.
❑ Service window — As already stated, OLTP databases must be alert, awake, and ready for use permanently. This is an ideal, but many service providers sell themselves based on the ability to provide availability at slightly less than 100 percent. Less than 100 percent service time allows for small servicing windows of time.
Factors Affecting Client-Server Database Model Tuning
There are plenty of client-server environments servicing small numbers of users, in the range of tens of users or fewer. The primary characteristics of client-server databases are as follows:
❑ Small user population — A company can be small or large, on a local- or wide-area network. Predicting and measuring internal company use is much easier than trying to cater to OLTP database capacity requirements.
❑ Low level of concurrency — Company-wide client-server databases have measurable user populations. These populations can be extremely small or relatively large, but the service requirement is quantifiable because the user population is measurable. OLTP database requirements are actually quantifiable as well; however, OLTP user populations are immeasurably larger, and OLTP database use can often show sudden increases (or decreases), even occasional massive spikes (jumps in end-users). Client-server database concurrency levels are much more predictable than those of OLTP databases. Predictability implies the ability to prepare for and cater to application requirements more easily.
❑ Database size — Client-server databases are usually small in size. Anything too large, and a client-server architecture simply won't be able to cope with requirements. One solution to overuse of client-server architectures is extremely costly hardware. At that stage, costs can probably be reduced by implementing OLTP and data warehouse architectural approaches.
❑ Reaction time — Client-server reaction times are generally acceptable as real-time for single-record user interface actions, and perhaps minutes for reporting requirements.
❑ Small and large transactions — Client-server environments combine both small and large transactions, in the form of user interface connectivity to data plus reporting needs, which are small enough to manage at the same time. This type of service is possible because both user population numbers and concurrency requirement levels are low.
❑ Granularity — All items of data are often relatively small, and table structures can be more mathematical in nature. Client-server databases can even incorporate large quantities of business rule structure into table structures by utilizing very high levels of normalization, beyond 3NF.
Once again, application of high-level normalization is, in my opinion, often more mathematical than practical. Let applications do the number crunching, and leave the database to store the data. Don't put too much processing into the database. It is quite possible, but it can become very complicated to manage, change, and administer. Modern application SDKs are more than capable of intense processing and number crunching. The purpose of a relational database is to store and apply structure to data. Object databases manage processing inside database objects well. Relational databases do not!
❑ Manageability — Data is fairly easily manageable, not only because parameters are small and quantifiable, but also because everyone goes home at night, giving plenty of down time for maintenance.
❑ Service window — See this same explanation in the previous section, “Factors Affecting OLTP
Database Model Tuning.”
Factors Affecting Data Warehouse Database Model Tuning
Data warehouses are all about seriously large amounts of data and a very few — often very technically challenging — application environments:
❑ Minimal user population — Administrators, developers, and analytical-type end-users typically access data warehouses. Those analytical end-users are usually knowledgeable, at executive or middle-management level. One of the primary purposes of storing lots and lots of old data in a data warehouse is to help with forecasting for the future. This type of user population finds this type of information extremely useful.
❑ Very low concurrency — There is very little data sharing in a data warehouse. Most activity is read-only, or bulk updates to fact tables when the database is not being used for reporting and analysis. Concurrency is not really an issue.
❑ Frightening database size — Data warehouses can become incredibly large. Administrators and developers must decide how much detail to retain, when to remove data, when to summarize, and what to summarize. A lot of these decisions are made during production, when the data warehouse is in use. It is very difficult to predict what will be needed during the design and development phases. Ad-hoc queries can cause serious problems if a data warehouse is very large. User education in relation to how to code proper joins may be essential; otherwise, the provision of efficiency-providing structures, such as pre-built joins and aggregations in materialized views, can help.
Materialized views copy data, allowing access to physical copies of data and avoiding underlying table access, expensive joins, and aggregations. A relational database allowing the use of materialized views uses something called query rewrite. Query rewrite is where requested access to a table in a query is potentially replaced with access to a much smaller, more efficient materialized view. I/O and processing activity are substantially reduced. Query performance is helped enormously. (A brief sketch of a materialized view appears at the end of this list.)
❑ Reaction time — Data warehouse reaction times of hours, and perhaps even longer, are acceptable. Reaction times depend on various factors, such as the physical size of the data warehouse database, the complexity of end-user reporting and analytical requests, the granularity of data, and general end-user understanding of the scale of data warehouses.
❑ Incredibly large transactions — Users retrieve large amounts of data, using both simple reporting and highly complex analytical techniques. The fewer tables in joins, the better. Updates are best performed periodically, in large batch operations.
❑ Very low granularity — A star schema is the best route to adopt for a data warehouse because it minimizes the potential number of tables in joins. A star schema contains a single large fact table connected to a single layer of very small, descriptive, static dimensional tables. Very small tables can be joined with a single very large table fairly efficiently. When joins involve more than one very large table, serious performance problems can arise.
❑ Very demanding manageability — Because of their size, extremely large databases can become difficult to manage. The larger a database becomes, the more time and hardware resources are needed to use and alter that data. Demanding manageability is gradually replaced with more sophisticated means of handling sheer database size, such as hugely expensive hardware and special tricks (clustering, partitioning, parallel processing, and materialized views). Data warehouses are, more often than not, largely read-only structures. This gives far more flexibility, allowing for more available options to cope with a very demanding physical database size.
❑ Service window — Data warehouse service windows are generally not an issue, because end-user usage consists of occasional bursts of furious I/O activity rather than constant usage, as with an OLTP database. Most I/O activity is read-only. This, of course, depends on the real-time capability of a data warehouse. Real-time reporting requirements in a data warehouse complicate everything substantially, requiring constant real-time updating.
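To make the materialized view note above concrete, the following is a minimal sketch in Oracle-style SQL. The SALE table is borrowed from the Figure 8-1 ERD snippet shown later in this chapter; treating it as a fact table, and the view name itself, are illustrative assumptions:

-- Hypothetical pre-stored aggregate of the SALE fact table (Oracle-style syntax)
CREATE MATERIALIZED VIEW MV_SALES_BY_ISBN
ENABLE QUERY REWRITE -- lets the optimizer substitute this view for matching queries
AS SELECT ISBN, COUNT(*) AS NUM_SALES, SUM(SALE_PRICE) AS TOTAL_SALES
FROM SALE GROUP BY ISBN;

With query rewrite enabled, a query aggregating SALE by ISBN could be answered from this much smaller, pre-built copy, rather than by scanning and aggregating the full fact table.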
One way to alleviate performance issues with data warehouses is the use of data marts. A data mart is a subsection of a larger single data warehouse. A large data warehouse can consist of a number of very large fact tables, linked to the same dimensions. A data mart can be pictured as a single large fact table (perhaps a one- or two-fact-table star schema) linked to a single set of dimensions.
Understanding Database Model Tuning
The biggest problem with database model tuning is that it really must be done during the design phase, and preferably before any development is complete. This is the case as far as tables and their inter-relationships are concerned. Data warehouses, being mostly read-only environments, are not as restrictive with production-phase changes. Read-only environments can take advantage of specialized database structures that overlay, duplicate, and summarize data in tables. Materialized views are used extensively in data warehouses, and even in some OLTP databases. A materialized view allows for copying of table data, either as individual tables or as joins. The result is a physical copy of data. Queries then execute against the materialized view copy, which is built based on the requirements of a single query or a group of queries. The result is better performance.

Tuning a database model is the most difficult and expensive option, because SQL code depends on the structure of the underlying database model; extensive application code changes can result. The database model underpins and supports everything else. Changes to a database model can cause major application changes, obviously applying after the development of application code. The point is that database model tuning changes (such as changes to underlying tables) can affect everything else. Changing everything from the database model up is very expensive, because everything is dependent on the database model; everything must be changed. This is why it is so important to get the database model correct before development begins. Unfortunately, we don't live in an ideal world, but we can strive for it. Big changes to database model table structure can often result in what amounts to full rewrites of application software.
An effective way to performance-tune a database model after development is complete is the creation of alternate indexing. Stored procedures can also help, by compartmentalizing, speeding up, and organizing what already exists.
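As a brief, hedged illustration using the EDITION table from the Figure 8-1 ERD snippet shown later in this chapter (the index name is hypothetical, and this assumes the foreign key is not already indexed), alternate indexing added after development can be as simple as:

-- Hypothetical alternate index supporting joins from EDITION to PUBLISHER
CREATE INDEX XFK_EDITION_PUBLISHER ON EDITION (PUBLISHER_ID);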
When it comes to database model tuning, at the worst and most expensive end of the scale are normalization, denormalization, changes to referential integrity and table structure, and anything else that changes the basic table structure. At best, and with minimal intrusion on existing tables and relationships, alternate indexing, materialized views, clustering, and other such tricks can help to enhance a database model without messing around with critical underlying table structure. Database objects such as materialized views and clustering help to circumvent table changes by creating copies and overlays of existing table structures, without affecting those existing tables, and obviously avoiding changes to any dependent application coding already written, tested, debugged, and in general use in a production environment. The downside to overlaying and copying is that there is a limit to how many things such as materialized views can be created. Too many can hinder rather than help performance.
So, now you know why OLTP databases need less granularity, some denormalization, and small quantities of data. The same applies at the other extreme, in that data warehouses need highly denormalized (simple) table structures to minimize the number of tables in join queries, thus not severely impeding data warehouse reporting performance.
Writing Efficient Queries
Efficient SQL code is primarily about efficient queries using the SELECT command. The SELECT command allows use of a WHERE clause, reducing the amount of data read. The WHERE clause is used to return (or not return) specific records. The UPDATE and DELETE commands can also have a WHERE clause and, thus, they can also be performance-tuned with respect to WHERE clause use, reducing the amount of data accessed.
Performance tuning of the INSERT command to add records to a database is often the job of both developers and administrators. This is because end-users usually add data to a database through the use of applications. Metadata change commands (such as CREATE TABLE and ALTER TABLE) are more a matter of database administration. Thus, INSERT commands and metadata commands are not relevant to this text.
In an OLTP (transactional) database, small transactions and high concurrency are the most important aspects. Accuracy of SQL code and matching indexes is critical. In data warehouses, large queries and batch updates are prevalent. Therefore, in data warehouses, large complex queries are executed against as few tables as possible, minimizing the number of tables in join queries. Joining too many tables at once in a query can have the most significant impact of all on query performance, in both OLTP and data warehouse databases. Data warehouses simply exacerbate the problem because of their huge data quantities. There are some general philosophical rules of thumb to follow when performance-tuning SQL code:
❑ Database model design supports SQL code — The quality of SQL code depends completely on the quality of the database model design, not only from the perspective of correct levels of normalization and denormalization, but also from the point of view of using appropriate structures. For example, a data warehouse database model design is needed for a data warehouse, because the over-normalized, granular, deep normal form tables often used in OLTP databases are completely inappropriate to the very large transactions, across many tables, required by data warehouses.
❑ The KISS rule (Keep It Simple, Stupid) — Any type of program code broken into simple (preferably independent) pieces is always easier to "wrap your head around." Simple SQL commands with simple clauses are easy to write and easy to tune. Longer and more complicated queries are more difficult to write, and it's more difficult to get them to produce the proper results. Performance tuning is an additional step. If you have to tune some big, nasty queries because they are running too slowly, well, what can I say? If you had kept it simple, making them run faster would probably be a lot easier, and a lot more possible. Simplify first if over-complexity is an issue. At the very least, simplicity can help you understand precisely what a query is doing, without giving you a headache just staring at lines of meaningless SQL code.
❑ Good table structure allows for easy construction of SQL code — Be aware of anything controlling the way or manner in which SQL code is constructed and written, other than, of course, the database model. In an ideal table structure, SQL code should be written directly from those table structures, or as subsets of them, not the other way around. Writing SQL code should not be difficult. You should not get a constant impression (a nagging doubt or hunch) that the table structure doesn't quite fit. The structure of the database model should make for ease of SQL code construction. After all, SQL code supports applications. Don't forget that SQL code rests on the database table structure. If there is any kind of mismatch between application requirements and database structure, there is likely something wrong with the database model. Performance-tuning SQL code in a situation such as this will likely be a serious problem.
❑ Breaking down into the smallest pieces — Break down the construction of SQL code commands, such as queries and DML commands (INSERT, UPDATE, and DELETE). Do not break down non-query and non-DML commands. For example, do not continually connect to and disconnect from a database for every single code snippet of SQL database access executed; either connect for a period of time for each user, or connect at the start and end of sessions. On the other hand, make extensive use of subqueries if it helps to make coding easier. You can always merge subqueries back into the parent query later on.
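As a small, hedged sketch of this last point, using tables from the Figure 8-1 ERD snippet shown later in this chapter (the author name is a made-up example value), a filter is often easier to write first as a subquery:

SELECT TITLE FROM PUBLICATION
WHERE AUTHOR_ID IN (SELECT AUTHOR_ID FROM AUTHOR WHERE NAME = 'Some Author');

Later, if required, the subquery can be merged back into the parent query as an equivalent join:

SELECT P.TITLE FROM PUBLICATION P JOIN AUTHOR A USING (AUTHOR_ID)
WHERE A.NAME = 'Some Author';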
There is a set of specific ways in which the most basic elements of SQL code can be constructed to ensure good processing performance. A number of general areas are important to the most basic rules of query performance tuning. Examine how each factor is affected by the underlying structure of the database model:
❑ The SELECT command — This includes how many tables are involved in SELECT commands. These factors have a highly significant impact on the performance of queries. The more granular a database model, the more tables are retrieved from at once. The manner in which fields are retrieved can also affect performance, but table numbers in joins are more significant, especially in larger databases.
❑ The WHERE clause — This includes how records are filtered. Comparison conditions dictate how a WHERE clause is applied to records, such as retrieving only those records with the vowel "a" in an author's name. A comparison condition is the main factor determining the construction of a WHERE clause. There are different types of comparison conditions. The manner in which records are filtered in a query can affect the way in which a query executes. The result is a highly significant impact on performance. Indexing has a very significant effect on how well WHERE clause filtering performs.

The most important thing to remember is that the SQL code, and its potential to execute with acceptable speed, is completely dependent on the underlying structure of a database model. Queries are quite literally constructed from the tables and the relationships between those tables.
❑ The GROUP BY clause — The GROUP BY clause is used to aggregate records into summarized groups of records retrieved from a database. Groupings are best achieved as a direct mapping onto one-to-many relationships between tables (a brief sketch of such a grouping follows this list). Materialized views are commonly used in data warehouses to pre-calculate and pre-store GROUP BY clause aggregations.
❑ Joins — A join query retrieves records from multiple tables, joining tables based on related field values between those tables. Typically, relationships are based on referential integrity established between primary and foreign keys in two tables. Perhaps the most significant factor in making queries execute at an acceptable speed is how tables are joined, and how many tables are in joins (as stated previously). When considering a database model design, the more granular a database model is (the more tables you have and the more it is broken down into small pieces), the larger the number of tables in join queries will be. In a data warehouse, this is generally much more significant, because data warehouses contain huge volumes of data; however, even in OLTP databases, with a multitude of miniscule-sized transactions, large joins with ten or more tables can kill performance just as effectively as more than two large tables in a join query in a data warehouse.
Joins are important to performance. The database model design can have a most profound effect on join query performance if the database model has too many little-bitty tables (too much granularity or normalization).
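Here is the brief sketch promised in the GROUP BY item above. It uses the one-to-many relationship between the AUTHOR and PUBLICATION tables from the Figure 8-1 ERD snippet shown later in this chapter; the grouping maps directly onto that relationship:

SELECT A.NAME, COUNT(*) AS PUBLICATIONS
FROM AUTHOR A JOIN PUBLICATION P USING (AUTHOR_ID)
GROUP BY A.NAME;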
The SELECT Command
The SELECT command is used to query the database. There are a number of points to remember when intending to build efficient queries:
❑ Querying all fields — Retrieving specific field names is very slightly more efficient than retrieving all fields using the * character. The * character requires the added overhead of metadata interpretation lookups into the metadata dictionary, to find the fields in the table. In highly concurrent, very busy databases (OLTP databases), continual data dictionary lookups can stress out a database's concurrency handling capacity. Consider the following query:
SELECT NAME FROM AUTHOR;
This is faster than this query:
SELECT * FROM AUTHOR;
❑ Reading indexes — If there is an index, use it. Reading field values directly from an index, without reading the table at all, is faster because the index occupies less physical space. There is, therefore, less I/O activity. In the ERD snippet shown in Figure 8-1, reading the EDITION table with the following query should be able to force a direct read of the index, because primary keys are automatically indexed. The ISBN field is the primary key for the EDITION table:

SELECT ISBN FROM EDITION;
Figure 8-1: Reading indexes instead of tables.
Not all database engines allow direct index scans, even when a SELECT command might encourage it.
❑ Simple aliases — Shorter alias names can help to keep SQL code more easily readable, particularly for programmers having to make changes in the future. Maintainable code is less prone to error and much easier to tune properly. Consider the following query:
SELECT A.NAME, P.TITLE, E.ISBN FROM AUTHOR A JOIN PUBLICATION P USING (AUTHOR_ID) JOIN EDITION E USING (PUBLICATION_ID);
This is much easier to deal with than this query:
SELECT AUTHOR.NAME, PUBLICATION.TITLE, EDITION.ISBN FROM AUTHOR JOIN PUBLICATION USING (AUTHOR_ID) JOIN EDITION USING (PUBLICATION_ID);
Why? There is less code. Less code is easier to handle. Less is more, in this case.
(Figure 8-1, referenced above, is an ERD snippet containing the SALE, EDITION, PUBLISHER, PUBLICATION, and RANK tables; it notes that foreign keys may or may not be indexed.)
Filtering with the WHERE Clause
The WHERE clause can be used either to include wanted records or exclude unwanted records (or both). The WHERE clause can be built in specific ways, allowing for faster execution of SQL code. Use of the WHERE clause can be applied to tune SQL statements simply by attempting to match WHERE clause specifications to indexes, sorted orders, and physical ordering in tables. In other words, filter according to how the metadata is constructed.
The WHERE clause is used to filter records and can, therefore, be placed in all three of the SELECT, UPDATE, and DELETE commands.
There are numerous points to keep in mind when building efficient filtering:
❑ Single record searches — The best filters utilize a primary key on a single table, preferably finding a single record only, or a very small number of records. This query finds the only author with a primary key identifier of 10:
SELECT * FROM AUTHOR WHERE AUTHOR_ID = 10;
❑ Record range searches — Using the >, >=, <, and <= operators executes range searching. Range searches are not as efficient as using equality with an = operator. A group of rows, rather than a single row, is targeted. Range searching can still use indexing and is fairly efficient. This query finds the range of all author records with identifiers between 5 and 10, inclusive:
SELECT * FROM AUTHOR WHERE AUTHOR_ID >= 5 AND AUTHOR_ID <= 10;
❑ Negative WHERE clauses — Negative filters using NOT, !=, or <> (an operator that differs between database engines) try to find something that is not there. Indexes are ignored and the entire table is read. This query reads all records, excluding only the record with author identifier 10:
SELECT * FROM AUTHOR WHERE AUTHOR_ID != 10;
❑ The LIKE operator — Beware of the LIKE operator. It usually involves a full scan of a table and ignores indexing. If searching for a small number of records, this could be extremely inefficient. When searching for 10 records in 10 million, it is best to find those 10 records only, using something like equality, and not to pull them from all 10 million records, because all those 10 million records are read. The following query finds all authors with the vowel "a" in their names, reading the entire table:

SELECT * FROM AUTHOR WHERE NAME LIKE '%a%';
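As a hedged aside (optimizer behavior varies between database engines, and this assumes an index exists on the NAME field), many engines can still use an index when a pattern has no leading wildcard; it is the leading wildcard that forces the full table scan:

SELECT * FROM AUTHOR WHERE NAME LIKE 'a%'; -- no leading wildcard; an index on NAME may be used
SELECT * FROM AUTHOR WHERE NAME LIKE '%a%'; -- leading wildcard; the entire table is read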
❑ Functions in the WHERE clause — Any type of functional expression used in a WHERE clause must be used carefully. Functions are best not used where you expect a SQL statement to use an index. In the following query, the function prohibits use of an index created on the PRINT_DATE field:

SELECT * FROM EDITION WHERE TO_CHAR(PRINT_DATE,'DD-MON-YYYY')='01-JAN-2005';
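A common workaround, sketched here in Oracle-style syntax and assuming PRINT_DATE is stored as a DATE, is to apply the function to the literal instead, leaving the indexed field bare so that an index on PRINT_DATE remains usable:

-- The conversion is moved onto the constant, so the PRINT_DATE field is compared directly
SELECT * FROM EDITION WHERE PRINT_DATE = TO_DATE('01-JAN-2005','DD-MON-YYYY');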