After studying this chapter, you should be able to: Describe the basic concepts of files and file systems, understand the principal techniques for file organization and access, define B-trees, explain file directories, understand the requirements for file sharing.
Chapter 12 File Management
• Files are the central element of most applications
– A file serves as input to applications
– A file serves as output, kept for long-term storage and later access
• Desirable properties of files:
– Long-term existence
– Controlled sharing between processes
– Structure that is convenient for particular applications
File Structure: Fields and Records
• Fields
– Basic element of data
• e.g., student’s last name
– Contains a single value
– Characterized by its length and data type
• Records
– Collection of related fields
• e.g., a student record
– Treated as a unit
File Structure: File and Database
• File
– Collection of similar records
– Treated as a single entity and may be referenced by name
– Access control restrictions usually apply at the file level
• Database
– Collection of related data
– Explicit relationships exist among elements
– Consists of one or more files
A Big Picture
• How to identify and locate a selected file?
• How to enforce user access control in shared systems?
• How to organize records as a sequence of blocks for I/O?
• Individual block I/O requests must be scheduled to optimize performance
• How to organize records in a file and access a particular record in a file?
File Organization
• The basic operations that a user or application may perform on a file are performed at the record level
– The file is viewed as having some structure that organizes the records
• File organization refers to the logical structuring of records
– It is determined by the way in which files are accessed (the access method)
Criteria for File Organization
• Important criteria include:
– Short access time
– Ease of update
– Economy of storage
– Simple maintenance
– Reliability
Criteria for File Organization
• Priority will differ depending on the use
– For batch-mode file processing, rapid retrieval of a single record is of minimal concern
• These criteria may conflict
– Indexes are a primary means of increasing the speed of access to data, but they consume extra storage, which conflicts with economy of storage
The Pile
• Data are collected in the order they arrive
– No structure
• The purpose is to accumulate a mass of data and save it
• Records may have different fields
– Each field should be self-describing (field name + value)
– The length of each field must be known, either from delimiters, from an explicit length subfield, or from a default length for the field type
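As a minimal sketch of the two requirements above, assuming a hypothetical text layout `name:length:value` for every field (not a standard format), a pile record can carry its own field names and use a length subfield so the reader knows where each value ends:

```python
# Hypothetical pile-record layout: each field is stored as "name:length:value".
# The length subfield says exactly how many characters the value occupies, so
# values may contain any character (including ':'); names must not contain ':'.

def encode_record(fields: dict[str, str]) -> str:
    """Serialize a record whose fields describe themselves (name + length + value)."""
    return "".join(f"{name}:{len(value)}:{value}" for name, value in fields.items())


def decode_record(data: str) -> dict[str, str]:
    """Parse the self-describing layout produced by encode_record."""
    fields: dict[str, str] = {}
    pos = 0
    while pos < len(data):
        name_end = data.index(":", pos)          # the field name ends at the first ':'
        len_end = data.index(":", name_end + 1)  # the length subfield ends at the next ':'
        length = int(data[name_end + 1:len_end])
        value_start = len_end + 1
        fields[data[pos:name_end]] = data[value_start:value_start + length]
        pos = value_start + length               # the next field starts right after the value
    return fields
```

Because each record names its own fields, two records in the same pile can have entirely different field sets.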
The Pile
• Records are accessed by exhaustive search
• Used when data are collected and stored prior to processing, or when data are not easy to organize
• Uses space well when data vary in size
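Because a pile has no ordering and no index, a lookup has to examine every record. A minimal sketch, assuming records are held as Python dicts in arrival order (the name `search_pile` is hypothetical):

```python
# A pile: records kept in arrival order; records may carry different fields.
Pile = list[dict[str, str]]


def search_pile(pile: Pile, field: str, value: str) -> list[dict[str, str]]:
    """Exhaustive search: with no ordering or index, every record must be examined."""
    matches = []
    for record in pile:                  # no structure to exploit, so scan everything
        if record.get(field) == value:   # records lacking the field simply do not match
            matches.append(record)
    return matches
```

For example, `search_pile([{"name": "An", "dept": "CS"}, {"sensor": "t1"}, {"name": "Binh"}], "name", "Binh")` returns `[{"name": "Binh"}]` after looking at all three records.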
The Sequential File
• A fixed format is used for records
• All records are of the same length
– They have the same number of fixed-length fields in a particular order
• Only the values of fields need to be stored
• Field names and lengths are attributes of the file structure
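A minimal sketch of the fixed format using Python's `struct` module. The three fields (an 8-byte ID, a 20-byte last name, a 4-byte year) are a hypothetical file structure: names and lengths are stated once in the file structure, and each record stores only the values.

```python
import struct

# Hypothetical file structure: field names and widths are defined here once,
# not repeated inside every record.
FIELD_NAMES = ("student_id", "last_name", "year")
RECORD_FORMAT = struct.Struct("<8s20si")  # 8 + 20 + 4 = 32 bytes, no alignment padding


def pack_record(student_id: str, last_name: str, year: int) -> bytes:
    # struct pads short byte strings with NUL bytes (and truncates long ones)
    return RECORD_FORMAT.pack(student_id.encode(), last_name.encode(), year)


def unpack_record(raw: bytes) -> dict:
    sid, name, year = RECORD_FORMAT.unpack(raw)
    values = (sid.rstrip(b"\0").decode(), name.rstrip(b"\0").decode(), year)
    return dict(zip(FIELD_NAMES, values))


def read_record(data: bytes, i: int) -> dict:
    """With fixed-length records, the i-th record starts at byte i * record size."""
    size = RECORD_FORMAT.size
    return unpack_record(data[i * size:(i + 1) * size])
```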
The Sequential File
• Key field
– Uniquely identifies the record
– Records are stored in key sequence
• Optimal for batch applications that process all the records
• Easily stored on tape and disk
• Poor performance for interactive applications
– Finding one record requires a sequential search of the file for a key match, which causes considerable processing and delay
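A sketch of that key-match search, assuming the file is modeled as a list of `(key, value)` pairs in ascending key order. Key sequence lets the scan stop once it passes the key, but a lookup still reads about half the file on average.

```python
from typing import Optional


def sequential_search(records: list[tuple[int, str]], key: int) -> tuple[Optional[str], int]:
    """Scan records stored in ascending key order; return (value, records examined)."""
    examined = 0
    for k, value in records:
        examined += 1
        if k == key:
            return value, examined
        if k > key:   # already past where the key would be, so it is not in the file
            break
    return None, examined
```

With keys 0, 10, ..., 990, finding key 500 examines 51 records; looking for the absent key 505 examines 52 before giving up.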
Indexed Sequential File
• An index is added to support random access
– An index record contains a key field and a pointer into the main file
– The index is itself a sequential file
– To search for a record:
• Search the index to find the highest key value that is equal to or precedes the desired key value
• Continue the search in the main file at the location indicated by the pointer
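The two-step search above can be sketched as follows. The sparse index with one entry per `every` records is a hypothetical choice. For brevity the index is searched with binary search (`bisect`) instead of being scanned sequentially; both find the same entry.

```python
from bisect import bisect_right
from typing import Optional


def build_index(main: list[tuple[int, str]], every: int) -> list[tuple[int, int]]:
    """Hypothetical sparse index: one (key, position) entry for every `every`-th record."""
    return [(main[pos][0], pos) for pos in range(0, len(main), every)]


def indexed_lookup(main: list[tuple[int, str]], index: list[tuple[int, int]],
                   key: int) -> tuple[Optional[str], int]:
    """Return (value, number of main-file records examined)."""
    keys = [k for k, _ in index]
    i = bisect_right(keys, key) - 1       # highest index key that is <= the desired key
    if i < 0:
        return None, 0                    # the key precedes every record in the file
    examined = 0
    for k, value in main[index[i][1]:]:   # continue sequentially from the pointer
        examined += 1
        if k == key:
            return value, examined
        if k > key:
            break
    return None, examined
```

With keys 0, 10, ..., 990 and an index entry every 10 records, finding key 530 jumps to the entry for 500 and examines only 4 main-file records.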
Indexed Sequential File Example
• Consider searching for a particular key value in a sequential file with 1 million records
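A worked estimate, under stated assumptions: the index has n = 1000 entries whose keys are spread evenly over the N = 10^6 records of the main file (the index size is an assumption for illustration), and both the index and the main file are searched sequentially, reading on average half of whatever is scanned.

```latex
% N = records in the main file, n = index entries (assumed evenly spaced)
\begin{aligned}
\text{sequential file:}\quad \bar{A} &\approx \frac{N}{2} = \frac{10^{6}}{2} = 500\,000 \\
\text{indexed sequential file:}\quad \bar{A} &\approx \frac{n}{2} + \frac{N/n}{2}
  = \frac{10^{3}}{2} + \frac{10^{6}/10^{3}}{2} = 500 + 500 = 1000
\end{aligned}
```

Under these assumptions, the index reduces the average search from about half a million record accesses to about a thousand.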
Indexed Sequential File
• An overflow file is added
• A new record is added to the overflow file and is located by following a pointer from its predecessor record
• The indexed sequential file is occasionally merged with the overflow file in batch mode
• Greatly reduces the time required to access a single record, without sacrificing the sequential nature of the file
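A minimal in-memory model of the overflow mechanism, assuming pointers of the form `("main", i)` or `("overflow", i)`. All names are hypothetical, and for brevity the predecessor is found by walking the chain instead of through the index. A sequential scan still follows key order, and `merge` plays the role of the batch reorganization.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

# A pointer names an area and a slot: ("main", i) or ("overflow", i); None ends the chain.
Ptr = Optional[tuple[str, int]]


@dataclass
class Rec:
    key: int
    value: str
    next: Ptr = None


class IndexedSequentialFile:
    """Hypothetical model of a main file plus an overflow file."""

    def __init__(self, sorted_items: list[tuple[int, str]]):
        self._load(sorted_items)

    def _load(self, items: list[tuple[int, str]]) -> None:
        # Main-file records are contiguous; each one points at the next slot.
        self.main = [Rec(k, v, ("main", i + 1)) for i, (k, v) in enumerate(items)]
        if self.main:
            self.main[-1].next = None
        self.overflow: list[Rec] = []
        self.head: Ptr = ("main", 0) if self.main else None

    def _get(self, ptr: tuple[str, int]) -> Rec:
        area, i = ptr
        return (self.main if area == "main" else self.overflow)[i]

    def scan(self) -> Iterator[tuple[int, str]]:
        """Sequential processing in key order: follow pointers from the head."""
        ptr = self.head
        while ptr is not None:
            rec = self._get(ptr)
            yield rec.key, rec.value
            ptr = rec.next

    def insert(self, key: int, value: str) -> None:
        """Append the record to the overflow file and link it from its predecessor."""
        pred: Ptr = None
        ptr = self.head
        while ptr is not None and self._get(ptr).key < key:
            pred, ptr = ptr, self._get(ptr).next
        self.overflow.append(Rec(key, value, ptr))  # the new record points at its successor
        new_ptr = ("overflow", len(self.overflow) - 1)
        if pred is None:
            self.head = new_ptr                     # new smallest key
        else:
            self._get(pred).next = new_ptr

    def merge(self) -> None:
        """Batch reorganization: rewrite the main file in key order, empty the overflow."""
        self._load(list(self.scan()))
```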
Indexed File
• Allows variable-length records
• Uses multiple indexes for different key fields
– An exhaustive index contains one entry for every record in the main file
– A partial index contains entries only for records in which the field of interest exists
• When a new record is added to the main file, all of the index files must be updated
• Used mostly in applications where
– timeliness of information is critical and
– data are rarely processed exhaustively
– examples: airline reservation systems and inventory control systems
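A sketch of an indexed file with one exhaustive and one partial index. The class name and the airline-style fields `flight` and `seat` are hypothetical. Every add touches every index, and the partial index simply skips records that lack its field.

```python
from collections import defaultdict


class IndexedFile:
    """Hypothetical sketch: records are reached through one index per key field."""

    def __init__(self, exhaustive_fields: list[str], partial_fields: list[str]):
        self.records: list[dict] = []
        self.exhaustive = set(exhaustive_fields)
        # field -> value -> positions of matching records in the main file
        self.indexes = {f: defaultdict(list) for f in exhaustive_fields + partial_fields}

    def add(self, record: dict) -> int:
        missing = [f for f in self.exhaustive if f not in record]
        if missing:  # an exhaustive index needs an entry for every record
            raise ValueError(f"record lacks exhaustively indexed field(s) {missing}")
        pos = len(self.records)
        self.records.append(record)
        for field, index in self.indexes.items():  # every index is updated
            if field in record:                    # partial indexes skip records without the field
                index[record[field]].append(pos)
        return pos

    def lookup(self, field: str, value) -> list[dict]:
        return [self.records[p] for p in self.indexes[field].get(value, [])]
```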
Directory Elements
• Basic Information
– File name: must be unique within its directory
– File type: e.g., text, binary
– File organization
• Address Information
– Volume: device on which the file is stored
– Starting address: e.g., cylinder, track on disk
– Size used: in bytes, words, or blocks
– Size allocated: maximum size of the file
Directory Elements
• Access Control Information
– Owner: able to grant/deny access to other users and to change these privileges
– Access information: e.g., the user's name and password for each authorized user
– Permitted actions: controls reading, writing, executing, and transmitting over a network
• Usage Information
– Date created, identity of creator, date of last read access, identity of last reader, date last modified
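One possible in-memory shape for a directory entry, grouping the elements listed above. The field names, the use of block numbers for addresses and sizes, and the action strings are all assumptions for illustration. Only the owner may change privileges, matching the "Owner" element.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class DirectoryEntry:
    # Basic information
    name: str                 # unique within its directory
    file_type: str            # e.g. "text", "binary"
    organization: str         # e.g. "sequential", "indexed sequential"
    # Address information
    volume: str               # device on which the file is stored
    start_block: int          # starting address (a block number in this sketch)
    size_used: int            # in blocks
    size_allocated: int       # maximum size of the file, in blocks
    # Access control information
    owner: str
    permitted: dict[str, set[str]] = field(default_factory=dict)  # user -> allowed actions
    # Usage information
    creator: str = ""
    created: datetime = field(default_factory=datetime.now)
    last_reader: Optional[str] = None
    last_read: Optional[datetime] = None
    last_modified: Optional[datetime] = None

    def grant(self, by: str, user: str, action: str) -> None:
        """Only the owner may change who is allowed to do what."""
        if by != self.owner:
            raise PermissionError(f"{by} is not the owner of {self.name}")
        self.permitted.setdefault(user, set()).add(action)

    def allows(self, user: str, action: str) -> bool:
        return action in self.permitted.get(user, set())
```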
Hierarchical, or Tree-Structured Directory
• Master directory with user directories underneath it
• Each user directory may have subdirectories and files as entries
Hierarchical, or Tree-Structured Directory
• Access restrictions on directories are easy to enforce
• Collections of files are easy to organize
• Minimizes the difficulty of assigning unique names
• The tree structure allows users to find a file by following a path from the root (master) directory down various branches until the file is reached
• The series of directory names, culminating in the file name itself, constitutes a pathname for the file
• Duplicate filenames are possible as long as they have different pathnames
• Files are referenced relative to the working directory unless an explicit full pathname is used
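A sketch of pathname resolution, assuming directories are nested dicts, files are plain values, and `/` separates components with a leading `/` marking a full pathname (a Unix-style convention assumed here). It also shows two files with the same name coexisting under different pathnames.

```python
# A directory maps entry names to subdirectories (dicts) or files (here, just their contents).
Tree = dict


def resolve(root: Tree, working_dir: list[str], pathname: str):
    """Follow a pathname down the tree.

    A pathname starting with "/" is a full pathname, followed from the root;
    any other pathname is followed from the working directory.
    """
    components = [] if pathname.startswith("/") else list(working_dir)
    components += [c for c in pathname.split("/") if c]  # drop empty pieces around "/"
    node = root
    for name in components:
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(pathname)
        node = node[name]
    return node
```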
File Sharing
• In a multiuser system, there is almost always a requirement to allow files to be shared among a number of users
• Two issues
– Access rights
– Management of simultaneous access
Access Rights
• A wide variety of access rights have been used by various systems
– often as a hierarchy, with each right implying those that precede it
• Knowledge
– The user can only determine that the file exists and who its owner is
Access Rights cont…
• Execution
– The user can load and execute a program but cannot copy it, e.g., proprietary programs
• Reading
– The user can read the file for any purpose, including copying and execution
• Appending
– The user can add data to the file but cannot modify or delete any of the file's contents
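Because the rights form a hierarchy, a check can reduce to comparing positions. A minimal sketch, assuming an `IntEnum` ordering that covers only the rights named above (any higher rights would be added after `APPENDING`):

```python
from enum import IntEnum


class Right(IntEnum):
    # Ordered so that each right implies every right with a smaller value.
    KNOWLEDGE = 1   # may learn that the file exists and who owns it
    EXECUTION = 2   # may load and run it, but not copy it
    READING = 3     # may read it for any purpose, including copying and execution
    APPENDING = 4   # may add data, but not modify or delete existing contents


def permits(granted: Right, requested: Right) -> bool:
    """In a hierarchy, holding a right means holding every right that precedes it."""
    return granted >= requested
```

For example, `permits(Right.READING, Right.EXECUTION)` is true, while `permits(Right.EXECUTION, Right.READING)` is false.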
Simultaneous Access
• When more than one user is granted access to append to or update a file, the OS or file management system must enforce discipline
• A user may lock the entire file or individual records during an update
• Mutual exclusion and deadlock are issues for shared access (cf. the readers/writers problem)
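A sketch of record-level locking, assuming a hypothetical shared file of numeric records with one `threading.Lock` per record. An update that touches two records always takes their locks in ascending record order. That fixed order rules out the circular wait that could otherwise deadlock two concurrent updates. Lock ordering is one standard deadlock-prevention technique, chosen here for illustration.

```python
import threading


class LockedFile:
    """Hypothetical shared file of numeric records with one lock per record."""

    def __init__(self, values: list[int]):
        self.records = list(values)
        self.locks = [threading.Lock() for _ in values]

    def transfer(self, src: int, dst: int, amount: int) -> None:
        """Update two records as one unit under record-level locks."""
        if src == dst:
            return  # nothing to do; also avoids acquiring the same non-reentrant lock twice
        first, second = sorted((src, dst))  # always lock in ascending record order
        with self.locks[first], self.locks[second]:
            self.records[src] -= amount
            self.records[dst] += amount
```

Two threads moving amounts in opposite directions between records 0 and 1 both lock record 0 first, so neither can hold one lock while waiting for the other.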
Blocks and Records
• Records are the logical unit of access of a structured file
• Blocks are the unit of I/O with secondary storage
• For I/O to be performed, records must be organized as blocks
• Three methods of blocking are common
– Fixed-length blocking
– Variable-length spanned blocking
– Variable-length unspanned blocking
Fixed Blocking
• Fixed-length records are used, and an integral number of records are stored in a block
[Figure: Fixed Blocking]
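Storing an integral number of fixed-length records per block leaves any remainder of the block unused. A small calculator for that arithmetic (the function name and the dict it returns are illustrative choices):

```python
def fixed_blocking(block_size: int, record_size: int, n_records: int) -> dict:
    """Fixed blocking: an integral number of fixed-length records per block."""
    if record_size > block_size:
        raise ValueError("fixed blocking needs records no larger than a block")
    per_block = block_size // record_size                # records per block (blocking factor)
    unused_per_block = block_size - per_block * record_size
    blocks = -(-n_records // per_block)                  # ceiling division
    return {
        "records_per_block": per_block,
        "unused_bytes_per_block": unused_per_block,
        "blocks": blocks,
    }
```

For example, 1000 records of 120 bytes in 512-byte blocks fit 4 per block, leave 32 bytes unused in each block, and need 250 blocks.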
Variable-Length Spanned Blocking
• Variable-length records are used and are packed into blocks with no unused space
• Some records may span multiple blocks
– Continuation is indicated by a pointer to the successor block
• Efficient for storage and does not limit the size of records
Variable Blocking: Spanned
• Difficult to implement
• Records that span two blocks require two I/O operations
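A sketch of spanned packing: records are laid back to back, and the function reports which blocks each record occupies. The number of blocks a record touches is the number of block reads needed to fetch it. As a simplifying assumption, the bytes a real system would reserve for the successor-block pointer are ignored.

```python
def spanned_layout(record_sizes: list[int], block_size: int) -> list[range]:
    """Pack variable-length records back to back with no unused space.

    Returns, for each record, the range of block numbers it occupies.
    (Simplification: space for the successor-block pointer is not reserved.)
    """
    layout = []
    offset = 0
    for size in record_sizes:
        if size <= 0:
            raise ValueError("record sizes must be positive")
        first = offset // block_size
        last = (offset + size - 1) // block_size
        layout.append(range(first, last + 1))
        offset += size
    return layout
```

For records of 300, 300, 600 and 100 bytes in 512-byte blocks, the second and third records each straddle a block boundary, so they need two reads while the others need one. A single 2000-byte record simply spans four blocks, since spanning places no limit on record size.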
Variable-Length Unspanned Blocking
• Uses variable-length records without spanning
• Wastes space in most blocks, because the remainder of a block cannot be used if the next record is larger than the remaining unused space
• Limits record size to the size of a block
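For comparison, a sketch of unspanned packing: records are placed in order, and a record that does not fit in the remainder of the current block starts a new block, leaving that remainder unused. Records larger than a block are rejected.

```python
def unspanned_layout(record_sizes: list[int], block_size: int) -> tuple[list[int], int]:
    """Pack records in order without spanning.

    Returns (block number of each record, total unused bytes across all blocks used).
    """
    placement = []
    block, used, wasted = 0, 0, 0
    for size in record_sizes:
        if size > block_size:
            raise ValueError(f"record of {size} bytes exceeds the {block_size}-byte block")
        if used + size > block_size:    # remainder too small for this record: leave it unused
            wasted += block_size - used
            block, used = block + 1, 0
        placement.append(block)
        used += size
    if record_sizes:
        wasted += block_size - used     # unused tail of the last block
    return placement, wasted
```

With records of 300, 300, 400 and 100 bytes in 512-byte blocks, the records land in blocks 0, 1, 2, 2, and 436 of the 1536 bytes in those three blocks go unused.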
[Figure: Variable Blocking: Unspanned]
Revisit the Big Picture
• The file directory describes the location of all files plus their attributes
• Only authorized users are allowed to access particular files in particular ways
• Records must be organized as a sequence of blocks for output and unblocked after input
• Individual block I/O requests must be scheduled to optimize performance
• The user views the file as having some structure that organizes the records; different access methods reflect different file structures