Windows Internals: Covering Windows Server 2008 and Windows Vista (Part 17)


    lkd> dt nt!_SECTION_OBJECT_POINTERS 0x86724504
       +0x000 DataSectionObject  : 0x867548f0
       +0x004 SharedCacheMap     : 0x86b48388
       +0x008 ImageSectionObject : (null)

The DataSectionObject field points at the control area for the file's data section. Dumping the control area with the !ca command shows, among other fields, the file object that backs the section:

    lkd> !ca 0x867548f0
    ...
    Section Ref       1     Pfn Ref          46     Mapped Views    4
    User Ref          0     WaitForDel        0     Flush Count     0
    File Object 86960228    ModWriteCount     0     System Views    0
    Flags (8008080) File WasPurged Accessed
    File: \Program Files\Debugging Tools for Windows (x86)\debugger.chw

Next look at the file object referenced by the control area with this command:

    lkd> !fileobj 0x86960228
    \Program Files\Debugging Tools for Windows (x86)\debugger.chw
    Device Object: 0x84a69a18   \Driver\volmgr
    FsContext: 0x9ae3e768   FsContext2: 0xad4a0c78
    Private Cache Map: 0x86b48460
    CurrentByteOffset: 156
    Cache Data:
      Section Object Pointers: 86724504
      Shared Cache Map: 86b48388   File Offset: 156 in VACB number 0
      Vacb: 84738b30
      Your data is at: b1e00156

The private cache map pointed to by the file object records, among other things, the read-ahead history for this file handle:

    lkd> dt nt!_PRIVATE_CACHE_MAP 0x86b48460
    ...
       +0x018 BeyondLastByte1 : _LARGE_INTEGER 0x14a
       +0x020 FileOffset2     : _LARGE_INTEGER 0x14a
    ...

10.5 File System Interfaces

The first time a file's data is accessed for a read or write operation, the file system driver is responsible for determining whether some part of the file is mapped in the system cache. If it's not, the file system driver must call the CcInitializeCacheMap function to set up the per-file data structures described in the preceding section.


Once a file is set up for cached access, the file system driver calls one of several functions to access the data in the file. There are three primary methods for accessing cached data, each intended for a specific situation:

■ The copy method copies user data between cache buffers in system space and a process buffer in user space.

■ The mapping and pinning method uses virtual addresses to read and write data directly from and to cache buffers.

■ The physical memory access method uses physical addresses to read and write data directly from and to cache buffers.

File system drivers must provide two versions of the file read operation—cached and noncached—to prevent an infinite loop when the memory manager processes a page fault. When the memory manager resolves a page fault by calling the file system to retrieve data from the file (via the device driver, of course), it must specify this noncached read operation by setting the "no cache" flag in the IRP.

Figure 10-10 illustrates the typical interactions between the cache manager, the memory manager, and file system drivers in response to user read or write file I/O. The cache manager is invoked by a file system through the copy interfaces (the CcCopyRead and CcCopyWrite paths).

To process a CcFastCopyRead or CcCopyRead read, for example, the cache manager creates a view in the cache to map a portion of the file being read and reads the file data into the user buffer by copying from the view. The copy operation generates page faults as it accesses each previously invalid page in the view, and in response the memory manager initiates noncached I/O into the file system driver to retrieve the data corresponding to the part of the file mapped to the page that faulted.
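To make the copy interface concrete, here is a minimal sketch of a cached read path in a file system driver, assuming a file object that has already been set up for caching. The routine name and error handling are hypothetical; CcCopyRead is the cache manager's documented copy routine.

    #include <ntifs.h>

    // Hypothetical FSD read path for a file already set up for caching.
    NTSTATUS MyFsdCachedRead(
        PFILE_OBJECT FileObject,   // file object with an initialized cache map
        LONGLONG Offset,           // byte offset to read from
        ULONG Length,              // number of bytes to read
        BOOLEAN CanWait,           // FALSE if the caller can't block
        PVOID Buffer,              // destination buffer
        PIO_STATUS_BLOCK IoStatus)
    {
        LARGE_INTEGER fileOffset;
        fileOffset.QuadPart = Offset;

        // CcCopyRead copies from the cache's views into Buffer. Any page of
        // the view that isn't yet valid raises a page fault, which the memory
        // manager resolves with a noncached read back into this file system.
        if (!CcCopyRead(FileObject, &fileOffset, Length, CanWait, Buffer,
                        IoStatus)) {
            // Data wasn't resident and we weren't allowed to block; a real
            // driver would post the request to a worker thread and retry.
            return STATUS_PENDING;
        }
        return IoStatus->Status;
    }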

The next three sections explain these cache access mechanisms, their purpose, and how they're used.

10.5.1 Copying to and from the Cache


Because the system cache is in system space, it is mapped into the address space of every process. As with all system space pages, however, cache pages aren't accessible from user mode because that would be a potential security hole. (For example, a process might not have the rights to read a file whose data is currently contained in some part of the system cache.) Thus, user application file reads and writes to cached files must be serviced by kernel-mode routines that copy data between the cache's buffers in system space and the application's buffers residing in the process address space. The functions that file system drivers can use to perform this operation are listed in Table 10-2.

You can examine read activity from the cache via the performance counters or per-processor variables stored in the processor's control block (KPRCB) listed in Table 10-3.

10.5.2 Caching with the Mapping and Pinning Interfaces

Just as user applications read and write data in files on a disk, file system drivers need to read and write the data that describes the files themselves (the metadata, or volume structure data). Because the file system drivers run in kernel mode, however, they could, if the cache manager were properly informed, modify data directly in the system cache. To permit this optimization, the cache manager provides the functions shown in Table 10-4. These functions permit the file system drivers to find where in virtual memory the file system metadata resides, thus allowing direct modification without the use of intermediary buffers.


If a file system driver needs to read file system metadata in the cache, it calls the cache manager's mapping interface to obtain the virtual address of the desired data. The cache manager touches all the requested pages to bring them into memory and then returns control to the file system driver. The file system driver can then access the data directly.

If the file system driver needs to modify cache pages, it calls the cache manager's pinning services, which keep the pages active in virtual memory so that they cannot be reclaimed.

The pages aren't actually locked into memory (such as when a device driver locks pages for direct memory access transfers). Most of the time, a file system driver will mark its metadata stream "no write", which instructs the memory manager's mapped page writer (explained in Chapter 9) not to write the pages to disk until explicitly told to do so. When the file system driver unpins (releases) them, the cache manager releases its resources so that it can lazily flush any changes to disk and release the cache view that the metadata occupied. The mapping and pinning interfaces solve one thorny problem of implementing a file system: buffer management. Without directly manipulating cached metadata, a file system must predict the maximum number of buffers it will need when updating a volume's structure. By allowing the file system to access and update its metadata directly in the cache, the cache manager eliminates the need for buffers, simply updating the volume structure in the virtual memory the memory manager provides. The only limitation the file system encounters is the amount of available memory.
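As a rough sketch of that pinning flow, the fragment below updates a metadata record in place, assuming a hypothetical stream file object and record layout; CcPinRead, CcSetDirtyPinnedData, and CcUnpinData are the cache manager's pinning routines.

    #include <ntifs.h>

    // Hypothetical update of a metadata record through the pinning interface.
    NTSTATUS MyFsdUpdateMetadataRecord(
        PFILE_OBJECT MetadataStream,  // stream file object for the metadata
        LONGLONG RecordOffset,        // where the record lives in the stream
        PVOID NewRecord,
        ULONG RecordLength)
    {
        LARGE_INTEGER offset;
        PVOID bcb;      // buffer control block identifying the pinned range
        PVOID buffer;   // cache-resident virtual address of the record

        offset.QuadPart = RecordOffset;

        // Pin the range: the pages stay valid in the cache (though not locked
        // into physical memory) until CcUnpinData is called.
        if (!CcPinRead(MetadataStream, &offset, RecordLength, PIN_WAIT,
                       &bcb, &buffer)) {
            return STATUS_CANT_WAIT;
        }

        // Modify the metadata directly in the cache; no intermediate buffer.
        RtlCopyMemory(buffer, NewRecord, RecordLength);

        // Mark the pinned data dirty so the lazy writer flushes it; a logging
        // file system would pass the LSN of the change rather than NULL.
        CcSetDirtyPinnedData(bcb, NULL);
        CcUnpinData(bcb);
        return STATUS_SUCCESS;
    }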

You can examine pinning and mapping activity in the cache via the performance counters or per-processor variables stored in the processor's control block (KPRCB) listed in Table 10-5.


10.5.3 Caching with the Direct Memory Access Interfaces

In addition to the mapping and pinning interfaces used to access metadata directly in the cache, the cache manager provides a third interface to cached data: direct memory access (DMA). The DMA functions are used to read from or write to cache pages without intervening buffers, such as when a network file system is doing a transfer over the network.

The DMA interface returns to the file system the physical addresses of cached user data (rather than the virtual addresses, which the mapping and pinning interfaces return), which can then be used to transfer data directly from physical memory to a network device. Although small amounts of data (1 KB to 2 KB) can use the usual buffer-based copying interfaces, for larger transfers the DMA interface can result in significant performance improvements for a network server processing file requests from remote systems.

To describe these references to physical memory, a memory descriptor list (MDL) is used. (MDLs were introduced in Chapter 9.) The four separate functions described in Table 10-6 make up the cache manager's DMA interface.
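A minimal sketch of the MDL read path, as a network file server might use it, follows. The send routine is a placeholder, and a production caller would wrap the Cc calls in a try/except block because they raise exceptions on failure rather than returning errors.

    #include <ntifs.h>

    VOID MyNetSendMdlChain(PMDL MdlChain, ULONG Length);  // placeholder

    // Hypothetical server path: send cached file data without copying it.
    VOID MyServerSendFileData(
        PFILE_OBJECT FileObject,
        LONGLONG Offset,
        ULONG Length)
    {
        LARGE_INTEGER fileOffset;
        PMDL mdlChain = NULL;
        IO_STATUS_BLOCK ioStatus;

        fileOffset.QuadPart = Offset;

        // Returns an MDL chain describing the physical pages of the cached
        // data; no intermediate buffer copy is made.
        CcMdlRead(FileObject, &fileOffset, Length, &mdlChain, &ioStatus);

        // The network stack can transfer straight out of physical memory
        // using the MDL chain (placeholder call).
        MyNetSendMdlChain(mdlChain, (ULONG)ioStatus.Information);

        // Tell the cache manager the pages are no longer in use.
        CcMdlReadComplete(FileObject, mdlChain);
    }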


You can examine MDL activity from the cache via the performance counters or per-processor variables stored in the processor's control block (KPRCB) listed in Table 10-7.

10.6 Fast I/O

Whenever possible, reads and writes to cached files are handled by a high-speed mechanism named fast I/O. Fast I/O is a means of reading or writing a cached file without going through the work of generating an IRP, as described in Chapter 7. With fast I/O, the I/O manager calls the file system driver's fast I/O routine to see whether I/O can be satisfied directly from the cache manager without generating an IRP.

Because the cache manager is architected on top of the virtual memory subsystem, file system drivers can use the cache manager to access file data simply by copying to or from pages mapped to the actual file being referenced, without going through the overhead of generating an IRP.

Fast I/O doesn't always occur. For example, the first read or write to a file requires setting up the file for caching (mapping the file into the cache and setting up the cache data structures, as explained earlier in the section "Cache Data Structures"). Also, if the caller specified an asynchronous read or write, fast I/O isn't used because the caller might be stalled during paging I/O operations required to satisfy the buffer copy to or from the system cache, and thus would not really be getting the requested asynchronous I/O operation. But even on a synchronous I/O, the file system driver might decide that it can't process the I/O operation by using the fast I/O mechanism, say, for example, if the file in question has a locked range of bytes (as a result of calls to the Windows LockFile and UnlockFile functions). Because the cache manager doesn't know what parts of which files are locked, the file system driver must check the validity of the read or write, which requires generating an IRP. The decision tree for fast I/O is shown in Figure 10-11.

These steps are involved in servicing a read or a write with fast I/O:

1. A thread performs a read or write operation.

2. If the file is cached and the I/O is synchronous, the request passes to the fast I/O entry point of the file system driver stack. If the file isn't cached, the file system driver sets up the file for caching so that the next time, fast I/O can be used to satisfy a read or write request.

3. If the file system driver's fast I/O routine determines that fast I/O is possible, it calls the cache manager's read or write routine to access the file data directly in the cache. (If fast I/O isn't possible, the file system driver returns to the I/O system, which then generates an IRP for the I/O and eventually calls the file system's regular read routine.)

4. The cache manager translates the supplied file offset into a virtual address in the cache.

5. For reads, the cache manager copies the data from the cache into the buffer of the process requesting it; for writes, it copies the data from the buffer to the cache.

6. One of the following actions occurs:

❏ For reads where FILE_FLAG_RANDOM_ACCESS wasn't specified when the file was opened, the read-ahead information in the caller's private cache map is updated. Read-ahead may also be queued for files for which the FO_RANDOM_ACCESS flag is not specified.

❏ For writes, the dirty bit of any modified page in the cache is set so that the lazy writer will know to flush it to disk.

❏ For write-through files, any modifications are flushed to disk.

The performance counters or per-processor variables stored in the processor's control block (KPRCB) listed in Table 10-8 can be used to determine the fast I/O activity on the system.
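For file systems whose cached read and write paths need no special handling, the FsRtl library supplies stock fast I/O routines that implement essentially this sequence. A minimal sketch of wiring them up in DriverEntry (the dispatch-table variable is arbitrary; FsRtlCopyRead and FsRtlCopyWrite call the cache manager's copy routines and return FALSE when fast I/O isn't possible, which triggers the IRP fallback described in step 3):

    #include <ntifs.h>

    static FAST_IO_DISPATCH MyFastIoDispatch;

    NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject,
                         PUNICODE_STRING RegistryPath)
    {
        UNREFERENCED_PARAMETER(RegistryPath);

        RtlZeroMemory(&MyFastIoDispatch, sizeof(MyFastIoDispatch));
        MyFastIoDispatch.SizeOfFastIoDispatch = sizeof(FAST_IO_DISPATCH);
        MyFastIoDispatch.FastIoRead  = FsRtlCopyRead;   // cached synchronous reads
        MyFastIoDispatch.FastIoWrite = FsRtlCopyWrite;  // cached synchronous writes

        DriverObject->FastIoDispatch = &MyFastIoDispatch;

        // The regular IRP_MJ_READ/IRP_MJ_WRITE dispatch routines must still
        // be registered for the noncached and IRP-based paths (omitted here).
        return STATUS_SUCCESS;
    }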


10.7 Read-Ahead and Write-Behind

In this section, you'll see how the cache manager implements reading and writing file data on behalf of file system drivers. Keep in mind that the cache manager is involved in file I/O only when a file is opened without the FILE_FLAG_NO_BUFFERING flag and then read from or written to using the Windows I/O functions (for example, using the Windows ReadFile and WriteFile functions). Mapped files don't go through the cache manager, nor do files opened with the FILE_FLAG_NO_BUFFERING flag set.

Note: When an application uses the FILE_FLAG_NO_BUFFERING flag to open a file, its file I/O must start at device-aligned offsets and be of sizes that are a multiple of the alignment size; its input and output buffers must also be device-aligned virtual addresses. For file systems, this usually corresponds to the sector size (512 bytes on NTFS, typically, and 2,048 bytes on CDFS). One of the benefits of the cache manager, apart from the actual caching performance, is the fact that it performs intermediate buffering to allow arbitrarily aligned and sized I/O.
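A small user-mode sketch of satisfying those alignment rules, using example paths: GetDiskFreeSpace reports the sector size, and VirtualAlloc returns page-aligned memory, which satisfies sector alignment for typical devices.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        DWORD sectorsPerCluster, bytesPerSector, freeClusters, totalClusters;
        GetDiskFreeSpaceA("C:\\", &sectorsPerCluster, &bytesPerSector,
                          &freeClusters, &totalClusters);

        HANDLE file = CreateFileA("C:\\Temp\\data.bin", GENERIC_READ,
                                  FILE_SHARE_READ, NULL, OPEN_EXISTING,
                                  FILE_FLAG_NO_BUFFERING,   // bypass the cache
                                  NULL);
        if (file == INVALID_HANDLE_VALUE) return 1;

        // Offset (0), length, and buffer address must all be sector-aligned.
        DWORD length = 16 * bytesPerSector;
        void *buffer = VirtualAlloc(NULL, length, MEM_COMMIT | MEM_RESERVE,
                                    PAGE_READWRITE);   // page-aligned address
        if (buffer == NULL) { CloseHandle(file); return 1; }

        DWORD bytesRead = 0;
        ReadFile(file, buffer, length, &bytesRead, NULL);
        printf("Read %lu bytes (sector size %lu)\n", bytesRead, bytesPerSector);

        VirtualFree(buffer, 0, MEM_RELEASE);
        CloseHandle(file);
        return 0;
    }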

10.7.1 Intelligent Read-Ahead

The cache manager uses the principle of spatial locality to perform intelligent read-ahead by predicting what data the calling process is likely to read next based on the data that it is reading currently. Because the system cache is based on virtual addresses, which are contiguous for a particular file, it doesn't matter whether they're juxtaposed in physical memory. File read-ahead for logical block caching is more complex and requires tight cooperation between file system drivers and the block cache because that cache system is based on the relative positions of the accessed data on the disk, and, of course, files aren't necessarily stored contiguously on disk. You can examine read-ahead activity by using the Cache: Read Aheads/sec performance counter or the CcReadAheadIos system variable.

Reading the next block of a file that is being accessed sequentially provides an obvious performance improvement, with the disadvantage that it will cause head seeks. To extend read-ahead benefits to cases of strided data accesses (both forward and backward through a file), the cache manager maintains a history of the last two read requests in the private cache map for the file handle being accessed, a method known as asynchronous read-ahead with history. If a pattern can be determined from the caller's apparently random reads, the cache manager extrapolates it. For example, if the caller reads page 4000 and then page 3000, the cache manager assumes that the next page the caller will require is page 2000 and prereads it.

Note: Although a caller must issue a minimum of three read operations to establish a predictable sequence, only two are stored in the private cache map.

To make read-ahead even more efficient, the Win32 CreateFile function provides a flag indicating forward sequential file access: FILE_FLAG_SEQUENTIAL_SCAN. If this flag is set, the cache manager doesn't keep a read history for the caller for prediction but instead performs sequential read-ahead. However, as the file is read into the cache's working set, the cache manager unmaps views of the file that are no longer active and, if they are unmodified, directs the memory manager to place the pages belonging to the unmapped views at the front of the standby list so that they will be quickly reused. It also reads ahead two times as much data (2 MB instead of 1 MB, for example). As the caller continues reading, the cache manager prereads additional blocks of data, always staying about one read (of the size of the current read) ahead of the caller.

The cache manager's read-ahead is asynchronous because it is performed in a thread separate from the caller's thread and proceeds concurrently with the caller's execution. When called to retrieve cached data, the cache manager first accesses the requested virtual page to satisfy the request and then queues an additional I/O request to retrieve additional data to a system worker thread. The worker thread then executes in the background, reading additional data in anticipation of the caller's next read request. The preread pages are faulted into memory while the program continues executing so that when the caller requests the data, it's already in memory.

For applications that have no predictable read pattern, the FILE_FLAG_RANDOM_ACCESS flag can be specified when the CreateFile function is called. This flag instructs the cache manager not to attempt to predict where the application is reading next and thus disables read-ahead. The flag also stops the cache manager from aggressively unmapping views of the file as the file is accessed, so as to minimize the mapping/unmapping activity for the file when the application revisits portions of the file.
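In user-mode terms, the two hints look like this (a minimal sketch; the file names are examples):

    #include <windows.h>

    void OpenWithAccessHints(void)
    {
        // Sequential hint: no read history is kept, and the cache manager
        // performs doubled, aggressive sequential read-ahead.
        HANDLE seq = CreateFileA("C:\\Temp\\source.iso", GENERIC_READ,
                                 FILE_SHARE_READ, NULL, OPEN_EXISTING,
                                 FILE_FLAG_SEQUENTIAL_SCAN, NULL);

        // Random hint: read-ahead is disabled and views stay mapped longer.
        HANDLE rnd = CreateFileA("C:\\Temp\\index.db", GENERIC_READ,
                                 FILE_SHARE_READ, NULL, OPEN_EXISTING,
                                 FILE_FLAG_RANDOM_ACCESS, NULL);

        if (seq != INVALID_HANDLE_VALUE) CloseHandle(seq);
        if (rnd != INVALID_HANDLE_VALUE) CloseHandle(rnd);
    }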

10.7.2 Write-Back Caching and Lazy Writing

The cache manager implements a write-back cache with lazy write. This means that data written to files is first stored in memory in cache pages and then written to disk later. Thus, write operations are allowed to accumulate for a short time and are then flushed to disk all at once, reducing the overall number of disk I/O operations.

The cache manager must explicitly call the memory manager to flush cache pages because otherwise the memory manager writes memory contents to disk only when demand for physical memory exceeds supply, as is appropriate for volatile data. Cached file data, however, represents nonvolatile disk data. If a process modifies cached data, the user expects the contents to be reflected on disk in a timely manner.


The decision about how often to flush the cache is an important one. If the cache is flushed too frequently, system performance will be slowed by unnecessary I/O. If the cache is flushed too rarely, you risk losing modified file data in the cases of a system failure (a loss especially irritating to users who know that they asked the application to save the changes) and running out of physical memory (because it's being used by an excess of modified pages).

To balance these concerns, once per second the cache manager's lazy writer function executes on a system worker thread and queues one-eighth of the dirty pages in the system cache to be written to disk. If the rate at which dirty pages are being produced is greater than the amount the lazy writer had determined it should write, the lazy writer writes an additional number of dirty pages that it calculates are necessary to match that rate. System worker threads from the systemwide critical worker thread pool actually perform the I/O operations.
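As a simplified illustration of that policy (a model of the behavior described above, not the actual kernel code):

    // Illustrative model of the lazy writer's once-a-second scan: queue an
    // eighth of the dirty pages, plus enough extra to keep up when dirty
    // pages are being produced faster than they are being written.
    ULONG LazyWriterPagesToQueue(ULONG totalDirtyPages,
                                 ULONG pagesDirtiedLastSecond,
                                 ULONG pagesWrittenLastSecond)
    {
        ULONG pagesToWrite = totalDirtyPages / 8;   // base: one-eighth per pass
        if (pagesDirtiedLastSecond > pagesWrittenLastSecond) {
            // falling behind: add enough pages to match the production rate
            pagesToWrite += pagesDirtiedLastSecond - pagesWrittenLastSecond;
        }
        return pagesToWrite;
    }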

Note: The cache manager provides a means for file system drivers to track when and how much data has been written to a file. After the lazy writer flushes dirty pages to the disk, the cache manager notifies the file system, instructing it to update its view of the valid data length for the file. (The cache manager and file systems separately track the valid data length for a file in memory.)

You can examine the activity of the lazy writer via the cache performance counters or per-processor variables stored in the processor's control block (KPRCB) listed in Table 10-9.

EXPERIMENT: Watching the Cache Manager in Action

In this experiment, we'll use Process Monitor to view the underlying file system activity, including cache manager read-ahead and write-behind, when Windows Explorer copies a large file (in this example, a CD-ROM image) from one local directory to another. First, configure Process Monitor's filter to include the source and destination file paths, the Explorer.exe and System processes, and the ReadFile and WriteFile operations. In this example, the c:\source.iso file was copied to c:\programming\source.iso, so the filter is configured as follows:

You should see a Process Monitor trace like the one shown here after you copy the file:


The first few entries show the initial I/O processing performed by the copy engine and the first cache manager operations. Here are some of the things that you can see:

■ The initial 1-MB cached read from Explorer at the first entry. The size of this read depends on an internal matrix calculation based on the file size and can vary from 128 KB to 1 MB. Because this file was large, the copy engine chose 1 MB.

■ The 1-MB read is followed by 16 64-KB noncached reads. Noncached reads typically indicate activity due to page faults or cache manager access. A closer look at the stack trace for these events, which you can see by double-clicking an entry and choosing the Stack tab, reveals that indeed the CcCopyRead cache manager routine, which is called by the NTFS driver's read routine, causes the memory manager to fault the source data into physical memory:

■ After these 64-KB page fault I/Os, the cache manager's read-ahead mechanism starts reading the file, which includes the System process's subsequent noncached 2-MB read at the 1-MB offset. Because of the file size and Explorer's read I/O sizes, the cache manager chose 2 MB as the optimal read-ahead size.


The stack trace for one of the read-ahead operations, shown next, confirms that one of the cache manager's worker threads is performing the read-ahead.

After this point, Explorer's 1-MB reads aren't followed by 64-KB page faults, because the read-ahead thread stays ahead of Explorer, prefetching the file data with its 2-MB noncached reads. Eventually, after reading about 4 MB of the file, Explorer starts performing writes to the destination file. These are sequential, cached 64-KB writes. After about 32 MB of reads, the first WriteFile operation from the System process occurs, shown here:

The write operation’s stack trace, shown here, indicates that the memory manager’s mapped page writer thread was actually responsible for the write:


This occurs because, for the first couple of megabytes of data, the cache manager hadn't started performing write-behind, so the memory manager's mapped page writer began flushing the modified destination file data (see Chapter 9 for more information on the mapped page writer).

To get a clearer view of the cache manager operations, remove Explorer from Process Monitor's filter so that only the System process operations are visible, as shown next.

With this view, it's much easier to see the cache manager's 16-MB write-behind operations (the maximum write sizes are 1 MB on client versions of Windows and 32 MB on server versions; this experiment was performed on a server system). The Time Of Day column shows that these operations occur almost exactly 1 second apart. The stack trace for one of the write-behind operations, shown here, verifies that a cache manager worker thread is performing write-behind:


As an added experiment, try repeating this process with a remote copy instead (from one Windows system to another) and by copying files of varying sizes. You'll notice some different behaviors by the copy engine and the cache manager, both on the receiving and sending sides.

Disabling Lazy Writing for a File

If you create a temporary file by specifying the flag FILE_ATTRIBUTE_TEMPORARY in a call to the Windows CreateFile function, the lazy writer won't write dirty pages to the disk unless there is a severe shortage of physical memory or the file is explicitly flushed. This characteristic of the lazy writer improves system performance—the lazy writer doesn't immediately write data to a disk that might ultimately be discarded. Applications usually delete temporary files soon after closing them.
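A minimal sketch of creating such a temporary file (the path is an example; FILE_FLAG_DELETE_ON_CLOSE is a common companion flag so the file disappears on the last close):

    #include <windows.h>

    HANDLE CreateScratchFile(void)
    {
        return CreateFileA(
            "C:\\Temp\\scratch.tmp",
            GENERIC_READ | GENERIC_WRITE,
            0,                              // not shared
            NULL,
            CREATE_ALWAYS,
            FILE_ATTRIBUTE_TEMPORARY |      // lazy writer keeps dirty pages in memory
            FILE_FLAG_DELETE_ON_CLOSE,      // deleted automatically on last close
            NULL);
    }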

Forcing the Cache to Write Through to Disk

Because some applications can't tolerate even momentary delays between writing a file and seeing the updates on disk, the cache manager also supports write-through caching on a per–file object basis; changes are written to disk as soon as they're made. To turn on write-through caching, set the FILE_FLAG_WRITE_THROUGH flag in the call to the CreateFile function. Alternatively, a thread can explicitly flush an open file by using the Windows FlushFileBuffers function when it reaches a point at which the data needs to be written to disk. You can observe cache flush operations that are the result of write-through I/O requests or explicit calls to FlushFileBuffers via the performance counters or per-processor variables stored in the processor's control block (KPRCB) shown in Table 10-10.
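Both options look like this in a small sketch (the file name is an example):

    #include <windows.h>

    void WriteDurably(const void *data, DWORD length)
    {
        // Option 1: per-file-object write-through; every write goes to disk
        // as soon as it is made.
        HANDLE file = CreateFileA("C:\\Temp\\journal.dat", GENERIC_WRITE, 0,
                                  NULL, OPEN_ALWAYS, FILE_FLAG_WRITE_THROUGH,
                                  NULL);
        if (file == INVALID_HANDLE_VALUE) return;

        DWORD written = 0;
        WriteFile(file, data, length, &written, NULL);

        // Option 2 (for a normally cached handle): flush explicitly at the
        // point where the data must be on disk.
        FlushFileBuffers(file);
        CloseHandle(file);
    }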


Flushing Mapped Files

If the lazy writer must write data to disk from a view that's also mapped into another process's address space, the situation becomes a little more complicated, because the cache manager will only know about the pages it has modified. (Pages modified by another process are known only to that process because the modified bit in the page table entries for modified pages is kept in the process private page tables.) To address this situation, the memory manager informs the cache manager when a user maps a file. When such a file is flushed in the cache (for example, as a result of a call to the Windows FlushFileBuffers function), the cache manager writes the dirty pages in the cache and then checks to see whether the file is also mapped by another process. When the cache manager sees that the file is, the cache manager then flushes the entire view of the section to write out pages that the second process might have modified. If a user maps a view of a file that is also open in the cache, when the view is unmapped, the modified pages are marked as dirty so that when the lazy writer thread later flushes the view, those dirty pages will be written to disk. This procedure works as long as the sequence occurs in the following order:

1. A user unmaps the view.

2. A process flushes file buffers.

If this sequence isn't followed, you can't predict which pages will be written to disk.

EXPERIMENT: Watching Cache Flushes

You can see the cache manager map views into the system cache and flush pages to disk by running the Reliability and Performance Monitor, adding the Data Maps/sec and Lazy Write Flushes/sec counters, and then copying a large file from one location to another. The generally higher line in the following screen shot shows Data Maps/sec and the other shows Lazy Write Flushes/sec.

10.7.3 Write Throttling


The file system and cache manager must determine whether a cached write request will affect system performance and then schedule any delayed writes. First the file system asks the cache manager whether a certain number of bytes can be written right now without hurting performance, by using the CcCanIWrite function and blocking that write if necessary. For asynchronous I/O, the file system sets up a callback with the cache manager for automatically writing the bytes when writes are again permitted, by calling CcDeferWrite. Otherwise, it just blocks and waits on CcCanIWrite to continue. Once it's notified of an impending write operation, the cache manager determines how many dirty pages are in the cache and how much physical memory is available. If few physical pages are free, the cache manager momentarily blocks the file system thread that's requesting to write data to the cache. The cache manager's lazy writer flushes some of the dirty pages to disk and then allows the blocked file system thread to continue. This write throttling prevents system performance from degrading because of a lack of memory when a file system or network server issues a large write operation.
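A minimal sketch of that sequence from the file system's side follows; the routine names other than CcCanIWrite, CcDeferWrite, and CcCopyWrite are hypothetical.

    #include <ntifs.h>

    VOID MyFsdDeferredWrite(PVOID Context1, PVOID Context2);  // posted callback

    NTSTATUS MyFsdCachedWrite(
        PFILE_OBJECT FileObject,
        PLARGE_INTEGER Offset,
        PVOID Buffer,
        ULONG Length,
        BOOLEAN CanBlock)      // FALSE for asynchronous I/O
    {
        // Ask the cache manager whether Length bytes can be written now
        // without hurting performance; with CanBlock == TRUE this blocks
        // until the answer is yes.
        if (!CcCanIWrite(FileObject, Length, CanBlock, FALSE /* first try */)) {
            // Asynchronous path: register a callback that the cache manager
            // invokes when writes of this size are permitted again.
            CcDeferWrite(FileObject, MyFsdDeferredWrite, Buffer, NULL,
                         Length, FALSE);
            return STATUS_PENDING;
        }

        if (!CcCopyWrite(FileObject, Offset, Length, CanBlock, Buffer)) {
            return STATUS_PENDING;  // couldn't complete without blocking
        }
        return STATUS_SUCCESS;
    }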

Note: The effects of write throttling are global to the system because the resource it is based on, available physical memory, is global to the system. This means that if heavy write activity to a slow device triggers write throttling, writes to other devices will also be throttled.

The dirty page threshold is the number of pages that the system cache will allow to be dirty before throttling cached writers. This value is computed at system initialization time and depends on the product type (client or server). Two other values are also computed—the top dirty page threshold and the bottom dirty page threshold. Depending on memory consumption and the rate at which dirty pages are being processed, the lazy writer calls the internal function CcAdjustThrottle, which, on server systems, performs dynamic adjustment of the current threshold based on the calculated top and bottom values. This adjustment is made to preserve the read cache in cases of a heavy write load that will inevitably overrun the cache and become throttled. Table 10-11 lists the algorithms used to calculate the dirty page thresholds.

Write throttling is also useful for network redirectors transmitting data over slow communication lines. For example, suppose a local process writes a large amount of data to a remote file system over a 9600-baud line. The data isn't written to the remote disk until the cache manager's lazy writer flushes the cache. If the redirector has accumulated lots of dirty pages that are flushed to disk at once, the recipient could receive a network timeout before the data transfer completes. By using the CcSetDirtyPageThreshold function, the cache manager allows network redirectors to set a limit on the number of dirty cache pages they can tolerate (for each stream), thus preventing this scenario. By limiting the number of dirty pages, the redirector ensures that a cache flush operation won't cause a network timeout.
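The per-stream cap itself is a single call. A sketch, with an arbitrary example threshold:

    // Hypothetical redirector setup: allow at most 256 dirty cache pages on
    // this stream so a lazy writer flush can't saturate a slow link.
    VOID MyRedirInitStream(PFILE_OBJECT FileObject)
    {
        CcSetDirtyPageThreshold(FileObject, 256);   // example value
    }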

EXPERIMENT: Viewing the Write-Throttle Parameters


The !defwrites kernel debugger command dumps the values of the kernel variables the cache manager uses when determining whether it should throttle write operations, including the number of dirty pages in the file cache (CcTotalDirtyPages):

10.7.4 System Threads

As mentioned earlier, the cache manager performs lazy write and read-ahead I/O operations by submitting requests to the common critical system worker thread pool. However, it does limit the use of these threads to one less than the total number of critical system worker threads for small and medium memory systems (two less than the total for large memory systems).

Internally, the cache manager organizes its work requests into four lists (though these are serviced by the same set of executive worker threads):

■ The express queue is used for read-ahead operations.

■ The regular queue is used for lazy write scans (for dirty data to flush), write-behinds, and lazy closes.

■ The fast teardown queue is used when the memory manager is waiting for the data section owned by the cache manager to be freed so that the file can be opened with an image section instead, which causes CcWriteBehind to flush the entire file and tear down the shared cache map.

■ The post tick queue is used for the cache manager to internally register for a notification after each "tick" of the lazy writer thread, in other words, at the end of each pass.


To keep track of the work items the worker threads need to perform, the cache manager creates its own internal per-processor look-aside list, a fixed-length list—one for each processor—of worker queue item structures. (Look-aside lists are discussed in Chapter 9.) The number of worker queue items depends on system size: 32 for small-memory systems, 64 for medium-memory systems, 128 for large-memory client systems, and 256 for large-memory server systems. For cross-processor performance, the cache manager also allocates a global look-aside list at the same sizes as just described.

10.8 Conclusion

The cache manager provides a high-speed, intelligent mechanism for reducing disk I/O and increasing overall system throughput. By caching on the basis of virtual blocks, the cache manager can perform intelligent read-ahead. By relying on the global memory manager's mapped file primitive to access file data, the cache manager can provide the special fast I/O mechanism to reduce the CPU time required for read and write operations, and also leave all matters related to physical memory management to the single Windows global memory manager, thus reducing code duplication and increasing efficiency.


11 File Systems

In this chapter, we present an overview of the file system formats supported by Windows. We then describe the types of file system drivers and their basic operation, including how they interact with other system components, such as the memory manager and the cache manager. Following that is a description of how to use Process Monitor from Windows Sysinternals (at www.microsoft.com/technet/sysinternals) to troubleshoot a wide variety of file system access problems.

In the balance of the chapter, we first describe the Common Log File System (CLFS), a transactional logging virtual file system implemented on the native Windows file system format, NTFS. Then we focus on the on-disk layout of NTFS and its advanced features, such as compression, recoverability, quotas, symbolic links, transactions (which use the services provided by CLFS), and encryption.

To fully understand this chapter, you should be familiar with the terminology introduced in Chapter 8, including the terms volume and partition. You'll also need to be acquainted with these additional terms:

■ Sectors are hardware-addressable blocks on a storage medium. Hard disks for x86 systems almost always define a 512-byte sector size; however, Windows also supports large sector disks (typically 4,096-byte sectors), which allow access to even larger volumes. Thus, if the sector size is the standard 512 bytes and the operating system wants to modify the 632nd byte on a disk, it must write a 512-byte block of data to the second sector on the disk. (This arithmetic is worked through in the sketch following this list.)

■ File system formats define the way that file data is stored on storage media, and they affect a file system's features. For example, a format that doesn't allow user permissions to be associated with files and directories can't support security. A file system format can also impose limits on the sizes of files and storage devices that the file system supports. Finally, some file system formats efficiently implement support for either large or small files or for large or small disks. NTFS and exFAT are examples of file system formats that offer a different set of features and usage scenarios.

■ Clusters are the addressable blocks that many file system formats use. Cluster size is always a multiple of the sector size, as shown in Figure 11-1. File system formats use clusters to manage disk space more efficiently; a cluster size that is larger than the sector size divides a disk into more manageable blocks. The potential trade-off of a larger cluster size is wasted disk space, or internal fragmentation, that results when file sizes aren't perfect multiples of cluster sizes (also illustrated in the sketch following this list).


■ Metadata is data stored on a volume in support of file system format management. It isn't typically made accessible to applications. Metadata includes the data that defines the placement of files and directories on a volume, for example.
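The sector and cluster arithmetic in these definitions can be checked in a few lines (a standalone sketch using the example numbers from the text):

    #include <stdio.h>

    int main(void)
    {
        // Byte 632 on a 512-byte-sector disk: zero-based offset 631 falls in
        // sector 631 / 512 = 1, i.e. the second sector, at offset 119.
        unsigned long long byteOffset = 632 - 1;
        unsigned long long sectorSize = 512;
        printf("sector %llu, offset %llu\n",
               byteOffset / sectorSize, byteOffset % sectorSize);

        // Internal fragmentation: a 10,000-byte file in 4,096-byte clusters
        // occupies 3 clusters and wastes 2,288 bytes of slack.
        unsigned long long fileSize = 10000, clusterSize = 4096;
        unsigned long long clusters = (fileSize + clusterSize - 1) / clusterSize;
        printf("%llu clusters, %llu bytes of slack\n",
               clusters, clusters * clusterSize - fileSize);
        return 0;
    }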

11.1 Windows File System Formats

Windows includes support for the following file system formats:

CDFS

CDFS (\Windows\System32\Drivers\Cdfs.sys), or CD-ROM file system, is a read-only file system driver that supports a superset of the ISO-9660 format as well as a superset of the Joliet disk format. While the ISO-9660 format is relatively simple and has limitations such as ASCII uppercase names with a maximum length of 32 characters, Joliet is more flexible and supports Unicode names of arbitrary length, for example. If structures for both formats are present on a disk (to offer maximum compatibility), CDFS uses the Joliet format. CDFS has a couple of restrictions:

■ A maximum file size of 4 GB

■ A maximum of 65,535 directories

CDFS is considered a legacy format because the industry has adopted the Universal Disk Format (UDF) as the standard for optical media.

UDF

The Windows UDF file system implementation is OSTA (Optical Storage Technology Association) UDF-compliant. (UDF is a subset of the ISO-13346 format with extensions for formats such as CD-R and DVD-R/RW.) OSTA defined UDF in 1995 as a format to replace the ISO-9660 format for magneto-optical storage media, mainly DVD-ROM. UDF is included in the DVD specification and is more flexible than CDFS. The UDF file system format has the following traits:

■ Directory and file names can be 254 ASCII or 127 Unicode characters long.

■ Files can be sparse. (Sparse files are defined later in the chapter.)

■ File sizes are specified with 64 bits.


■ Support for access control lists (ACLs)

■ Support for alternate data streams

The UDF driver supports UDF versions up to 2.60. The UDF format was designed with rewritable media in mind. The Windows UDF driver (\Windows\System32\Drivers\Udfs.sys) provides read-write support for DVD-RAM, CD-R/RW, and DVD+-R/RW drives when using UDF 2.50, and read-only support when using UDF 2.60. However, Windows does not implement support for certain UDF features such as named streams and access control lists.

FAT12, FAT16, and FAT32

Windows supports the FAT file system primarily to enable upgrades from previous versions of Windows, for compatibility with other operating systems in multiboot systems, and as a format for flash drives or memory cards. The Windows FAT file system driver is implemented in \Windows\System32\Drivers\Fastfat.sys.

The name of each FAT format includes a number that indicates the number of bits the format uses to identify clusters on a disk. FAT12's 12-bit cluster identifier limits a partition to storing a maximum of 2^12 (4,096) clusters. Windows uses cluster sizes from 512 bytes to 8 KB in size, which limits a FAT12 volume size to 32 MB.

Note: All FAT file system types reserve the first two clusters and the last 16 clusters of a volume, so the number of usable clusters for a FAT12 volume, for instance, is slightly less than 4,096.

FAT16, with a 16-bit cluster identifier, can address 2^16 (65,536) clusters. On Windows, FAT16 cluster sizes range from 512 bytes (the sector size) to 64 KB, which limits FAT16 volume sizes to 4 GB. The cluster size Windows uses depends on the size of a volume. The various sizes are listed in Table 11-1. If you format a volume that is less than 16 MB as FAT by using the format command or the Disk Management snap-in, Windows uses the FAT12 format instead of FAT16.
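The volume-size limits quoted above are simply the cluster count multiplied by the maximum cluster size; a quick check of the arithmetic:

    #include <stdio.h>

    int main(void)
    {
        // FAT12: 2^12 clusters x 8-KB maximum cluster size = 32 MB.
        unsigned long long fat12Max = (1ULL << 12) * (8ULL << 10);
        // FAT16: 2^16 clusters x 64-KB maximum cluster size = 4 GB.
        unsigned long long fat16Max = (1ULL << 16) * (64ULL << 10);

        printf("FAT12 limit: %llu MB\n", fat12Max >> 20);   // 32
        printf("FAT16 limit: %llu GB\n", fat16Max >> 30);   // 4
        return 0;
    }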


Entries in the file allocation table define file-allocation chains (shown in Figure 11-3) for files and directories, where the links in the chain are indexes to the next cluster of a file's data. A file's directory entry stores the starting cluster of the file. The last entry of the file's allocation chain is the reserved value of 0xFFFF for FAT16 and 0xFFF for FAT12. The FAT entries for unused clusters have a value of 0. You can see in Figure 11-3 that FILE1 is assigned clusters 2, 3, and 4; FILE2 is fragmented and uses clusters 5, 6, and 8; and FILE3 uses only cluster 7. Reading a file from a FAT volume can involve reading large portions of a file allocation table to traverse the file's allocation chains.
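A minimal sketch of that traversal, using a small in-memory table as a toy stand-in for the on-disk FAT and the FILE2 chain from Figure 11-3:

    #include <stdint.h>
    #include <stdio.h>

    #define FAT16_EOC 0xFFFF   // reserved end-of-chain marker
    #define FAT_FREE  0x0000   // unused cluster

    int main(void)
    {
        // FILE2 from Figure 11-3 is fragmented across clusters 5, 6, and 8.
        uint16_t fat[16] = { FAT_FREE };
        fat[5] = 6;
        fat[6] = 8;
        fat[8] = FAT16_EOC;

        // Follow the chain from the starting cluster in the directory entry.
        for (uint16_t cluster = 5; cluster != FAT16_EOC; cluster = fat[cluster]) {
            printf("read cluster %u\n", cluster);   // 5, 6, 8
        }
        return 0;
    }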

The root directory of FAT12 and FAT16 volumes is preassigned enough space at the start of a volume to store 256 directory entries, which places an upper limit on the number of files and directories that can be stored in the root directory. (There's no preassigned space or size limit on FAT32 root directories.) A FAT directory entry is 32 bytes and stores a file's name, size, starting cluster, and time stamp (last-accessed, created, and so on) information. If a file has a name that is Unicode or that doesn't follow the MS-DOS 8.3 naming convention, additional directory entries are allocated to store the long file name. The supplementary entries precede the file's main entry. Figure 11-4 shows a sample directory entry for a file named "The quick brown fox." The system has created a THEQUI~1.FOX 8.3 representation of the name (that is, you don't see a "." in the directory entry because it is assumed to come after the eighth character) and used two more directory entries to store the Unicode long file name. Each row in the figure is made up of 16 bytes.

Trang 24

FAT32 uses 32-bit cluster identifiers but reserves the high 4 bits, so in effect it has 28-bit cluster identifiers. Because FAT32 cluster sizes can be as large as 32 KB, FAT32 has a theoretical ability to address 8-terabyte (TB) volumes. Although Windows works with existing FAT32 volumes of larger sizes (created in other operating systems), it limits new FAT32 volumes to a maximum of 32 GB. FAT32's higher potential cluster numbers let it manage disks more efficiently than FAT16; it can handle up to 128-GB volumes with 512-byte clusters. Table 11-2 shows default cluster sizes for FAT32 volumes.


Besides the higher limit on cluster numbers, other advantages FAT32 has over FAT12 and FAT16 include the fact that the FAT32 root directory isn't stored at a predefined location on the volume, the root directory doesn't have an upper limit on its size, and FAT32 stores a second copy of the boot sector for reliability. A limitation FAT32 shares with FAT16 is that the maximum file size is 4 GB because directories store file sizes as 32-bit values.

exFAT

Designed by Microsoft, the Extended File Allocation Table file system (exFAT, also called FAT64) is an improvement over the traditional FAT file systems and is specifically designed for flash drives. The main goal of exFAT is to provide some of the advanced functionality offered by NTFS, but without the metadata structure overhead and metadata logging that create write patterns not suited to flash media.


As the FAT64 name implies, the file size limit is increased to 2^64, allowing files up to 16 exabytes. This change is also matched by an increase in the maximum cluster size, which is currently implemented as 32 MB but can be as large as 2^255 sectors. exFAT also adds a bitmap that tracks free clusters, which improves the performance of allocation and deletion operations. Finally, exFAT allows more than 1,000 files in a single directory. These characteristics result in increased scalability and support for large disk sizes.

Additionally, exFAT implements certain features previously available only in NTFS, such as support for access control lists (ACLs) and transactions (called Transaction-Safe FAT, or TFAT). While the Windows Embedded CE implementation of exFAT includes these features, the version of exFAT in Windows Vista and Windows Server 2008 does not.

Note: ReadyBoost (described in Chapter 9) does not work with exFAT-formatted flash drives.

NTFS

As we said at the beginning of the chapter, the NTFS file system is the native file system format of Windows. NTFS uses 64-bit cluster numbers. This capacity gives NTFS the ability to address volumes of up to 16 exaclusters; however, Windows limits the size of an NTFS volume to that addressable with 32-bit clusters, which is slightly less than 256 TB (using 64-KB clusters). Table 11-3 shows the default cluster sizes for NTFS volumes. (You can override the default when you format an NTFS volume.) NTFS also supports 2^32 - 1 files per volume. The NTFS format allows for files that are 16 exabytes in size, but the implementation limits the maximum file size to 16 TB.

NTFS includes a number of advanced features, such as file and directory security, alternate data streams, disk quotas, sparse files, file compression, symbolic (soft) and hard links, support for transactional semantics, and encryption. One of its most significant features is recoverability. If a system is halted unexpectedly, the metadata of a FAT volume can be left in an inconsistent state, leading to the corruption of large amounts of file and directory data. NTFS logs changes to metadata in a transactional manner so that file system structures can be repaired to a consistent state with no loss of file or directory structure information. (File data can be lost, however.) Additionally, the NTFS driver in Windows also implements self-healing, a mechanism through which it makes most minor repairs to corruption of file system on-disk structures while Windows is running and without requiring a reboot. We'll describe NTFS data structures and advanced features in detail later in this chapter.
