Also, it should be noted that users will be unable to connect to the Book_List table for the duration of the index build. Essentially, SQL Server has to physically order those millions of records to align with the definition of the clustered index. Let's see what the index took out of my hide by way of space. The former index space for this table was 8K and data space was over 3 Gig. What does sp_spaceused tell me now? See Figure 4.17.
Figure 4.17: Building the clustered index has increased the index_size to 5376KB
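For reference, a call along the following lines would produce the kind of output shown in the figure. This is a minimal sketch: it assumes the current database is the one that holds Book_List, which the text does not name.

```sql
-- Report row count, reserved space, data size, index_size and unused space
-- for a single table. Run it in the database that contains Book_List
-- (the database name is not given in the text, so no USE statement here).
EXEC sp_spaceused N'Book_List';
```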
An increase in index_size to 5376K does not seem too significant. When you create a clustered index, the database engine takes the data in the heap (table) and physically sorts it. In the simplest terms, a heap and a clustered table (a table with a clustered index) both store the actual data; one is just physically sorted. So, I would not expect adding a clustered index on the Read_ID column to cause much growth in index_size.
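The statement that builds such an index might look like the sketch below. The index name IX_Book_List_Read_ID and the dbo schema are my assumptions for illustration; only the table and column names come from the text.

```sql
-- Hypothetical index name and schema; Book_List and Read_ID are from the text.
-- Building a clustered index rewrites the heap in Read_ID order, so on a
-- multi-million-row table expect it to take a while and to need working space.
CREATE CLUSTERED INDEX IX_Book_List_Read_ID
    ON dbo.Book_List (Read_ID);
```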
However, while the data size and index size for the Book_List table did not grow significantly, the space allocated for the database did double, as you can see from Figure 4.18.
Figure 4.18: Creating the clustered index caused the data file to double in size
So not only did the index addition take the table offline for the duration of the build, 12 minutes, it also doubled the space on disk. The reason for the growth is that SQL Server had to do all manner of processing to reorganize the data from a heap to a clustered table, and additional space, almost double, was required to accommodate this migration. Notice, though, that after the process has completed there is nearly 50% free space in the expanded file.
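If you prefer a query to a screenshot, the allocated and free space per file can be checked with something like the following. It is a sketch that assumes it is run in the context of the database holding Book_List; the column aliases are my own.

```sql
-- size and FILEPROPERTY(..., 'SpaceUsed') are both expressed in 8 KB pages,
-- so dividing by 128.0 converts them to megabytes.
SELECT name                                               AS logical_file_name,
       type_desc,
       size / 128.0                                       AS allocated_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128.0            AS used_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128.0   AS free_mb
FROM sys.database_files;
```

After an index build like this one, the free_mb figure for the data file is where the leftover space would show up.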
The question remains, did I benefit from adding this index, and do I need to add any covering non-clustered indexes? First, let's consider the simple query shown in Listing 4.2. It returns data based on a specified range of Read_ID values (I know that the Read_ID values range from 1 to 2902000).
Select book_list.Read_ID,
       book_list.Person
from book_list
where book_list.Read_ID between 756000 and 820000
Listing 4.2: A query on the Read_ID column
This query returned 64,001 records in 2 seconds which, at first glance, appears to be the sort of performance I'd expect. However, to confirm this, I need to examine the execution plan, as shown in Figure 4.19.
Figure 4.19: Beneficial use of clustered index for the Book_list table
You can see that an Index Seek operation was used, which indicates that this index has indeed served our query well. It means that the engine was able to retrieve all of the required data based solely on the key values stored in the index. If, instead, I had seen an Index Scan, this would indicate that the engine decided to scan every single row of the index in order to retrieve the ones required. An Index Scan is similar in concept to a table scan and both are generally inefficient, especially when dealing with such large record sets. However, the query engine will sometimes choose to do a scan even if a usable index is in place if, for example, a high percentage of the rows need to be returned. This is often an indicator of an inefficient WHERE clause.
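Alongside the graphical plan, the difference between a seek and a scan usually shows up in the number of pages read. A sketch of how I might check that for the Listing 4.2 query, using the standard session settings:

```sql
-- Print logical/physical reads and CPU/elapsed time to the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT book_list.Read_ID,
       book_list.Person
FROM book_list
WHERE book_list.Read_ID BETWEEN 756000 AND 820000;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

A seek on a narrow range should report far fewer logical reads than a scan of the whole table; the actual numbers depend on your data, so none are shown here.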
Let's say I now want to query a field that is not included in the clustered index, such as the Read_Date. I would like to know how many books were read on July 24th of 2008. The query would look something like that shown in Listing 4.3.
Select count(book_list.Read_ID),
       book_list.Read_Date
from book_list
where book_list.Read_Date between '07/24/2008 00:00:00'
                              and '07/24/2008 11:59:59'
Group By book_list.Read_Date
Listing 4.3: A query that is not covered by the clustered index
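One thing to be aware of: without an AM/PM marker, SQL Server reads '11:59:59' as just before noon, so the range as printed covers only the morning of July 24th. If the intent is the whole day, a half-open range is a common way to write it. The sketch below assumes Read_Date is a datetime column and uses the language-neutral 'YYYYMMDD' literal format; the timings and row counts quoted in the text come from the query as printed, not from this variant.

```sql
-- Half-open range: everything from midnight on July 24th up to,
-- but not including, midnight on July 25th.
SELECT COUNT(book_list.Read_ID) AS books_read,   -- alias is my own
       book_list.Read_Date
FROM book_list
WHERE book_list.Read_Date >= '20080724'
  AND book_list.Read_Date <  '20080725'
GROUP BY book_list.Read_Date;
```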
Executing this query, and waiting for the results to return, is a bit like watching paint dry or, something I like to do frequently, watching a hard drive defragment. It took 1 minute and 28 seconds to complete and returned 123 records, with an average count of 1000 books read on 7/24/2008 per record.
The execution plan for this query, not surprisingly, shows that an index scan was utilized, as you can see in Figure 4.20.
Figure 4.20: Clustered index scan for field with no index
What was a bit surprising, though, is that the memory allocation for SQL Server shot up through the roof as this query was executed. Figure 4.21 shows the memory consumption at 2.51G, which is pretty drastic considering the system only has 2G of RAM.
Figure 4.21: Memory utilization resulting from date range query
The reason for the memory increase is that, since there was no available index to limit the data for the query, SQL Server had to load several million records into the buffer cache in order to give me back the 123 rows I needed. Unless you have enabled AWE and set max server memory to, say, 2G less than total server memory (see memory configurations for SQL Server in Chapter 1), the server is going to begin paging and thrashing disks as SQL Server grabs more than its fair share of memory. This will have a substantial impact on performance.
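As a reminder of what that cap looks like in practice, here is a minimal sketch. The 6144 MB figure is purely an assumption for illustration (a hypothetical 8G server, leaving 2G for the OS); it is not a recommendation and not the setting used on my test system.

```sql
-- 'max server memory (MB)' is an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Assumed value: 6144 MB on a hypothetical server with 8 GB of RAM.
EXEC sp_configure 'max server memory (MB)', 6144;
RECONFIGURE;

-- Confirm the value currently in use.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)';
```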
If there is one thing that I know for sure with regard to SQL Server configuration
[...] has completed many minutes ago, my SQL Server instance still hovers at 2.5G of memory used, most of it by SQL Server.
It's clear that I need to create indexes that will cover the queries I need to run, and so spare SQL Server such an expensive index scan. I know that this is not always possible in a production environment, with many teams of developers all writing their own queries in their own style, but in my isolated environment it is an attainable goal.
The first thing I need to do is restart SQL Server to get back down to a manageable level of memory utilization. While there are other methods to reduce the memory footprint, such as freeing the buffer cache (DBCC DROPCLEANBUFFERS), I have the luxury of an isolated environment, and restarting SQL Server will give me a "clean start" for troubleshooting. Having done this, I can add two non-clustered indexes, one which will cover queries on the Book field and the other the Read_Date field.
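The steps in this paragraph could be sketched roughly as follows. The index names and the dbo schema are hypothetical; the column names Book and Read_Date are taken from the text. The first part is only the alternative to a restart and is best kept to test systems.

```sql
-- Alternative to a restart on a test box: write dirty pages to disk,
-- then drop the clean pages from the buffer cache.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Hypothetical index names; one index per column mentioned in the text.
CREATE NONCLUSTERED INDEX IX_Book_List_Book
    ON dbo.Book_List (Book);

CREATE NONCLUSTERED INDEX IX_Book_List_Read_Date
    ON dbo.Book_List (Read_Date);
```

Because the table is clustered on Read_ID, each non-clustered index also carries the Read_ID key, which is why an index on Read_Date alone can satisfy a query that counts Read_ID grouped by Read_Date.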
Having created the two new indexes, let's take another look at space utilization in the Book_List table, using sp_spaceused, as shown in Figure 4.22.
Figure 4.22: Increased index size for 2 non clustered indexes
The index_size has risen from 5MB to 119MB, which seems fairly modest and an excellent trade-off, assuming we get the expected boost in the performance of the Read_Date query.
If you are a DBA, working alongside developers who give you their queries for analysis, this is where you hold your breath. Breath held, I click execute. And … the query went from 1 minute 28 seconds to 2 seconds without even a baby's burp in SQL Server memory. The new execution plan, shown in Figure 4.23, tells the full story.