172 ● Module II / Information Technologies
Chapter 5 / Data Resource Management ● 173
fields describing attributes such as the person’s name, Social Security number, and rate of pay. Fixed-length records contain a fixed number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths. Another way of looking at a record is that it represents a single instance of an entity. Each record in an employee file describes one specific employee.
Normally, the first field in a record is used to store some type of unique identifier for the record. This unique identifier is called the primary key. The value of a primary key can be anything that will serve to uniquely identify one instance of an entity and distinguish it from another. For example, if we wanted to uniquely identify a single student from a group of related students, we could use a student ID number as a primary key. As long as no one shared the same student ID number, we would always be able to identify the record of that student. If no specific data can be found to serve as a primary key for a record, the database designer can simply assign each record a unique sequential number so that no two records will ever have the same primary key.
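As a minimal sketch of the primary-key idea, the Python below keeps student records in a dictionary keyed by student ID and refuses a second record with the same key. The `StudentRecord` and `StudentFile` names and their fields are hypothetical illustrations, not part of any DBMS; `add_without_key` shows the fallback of assigning a unique sequential number when no natural key exists.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class StudentRecord:
    student_id: int  # primary key: must be unique across the file
    name: str
    major: str


class StudentFile:
    """A hypothetical in-memory file of records indexed by primary key."""

    def __init__(self) -> None:
        self._records: dict[int, StudentRecord] = {}
        self._next_id = count(start=1)  # source of sequential surrogate keys

    def add(self, record: StudentRecord) -> None:
        # Reject a record whose key is already in use, so each key
        # identifies exactly one instance of the entity.
        if record.student_id in self._records:
            raise KeyError(f"duplicate primary key: {record.student_id}")
        self._records[record.student_id] = record

    def add_without_key(self, name: str, major: str) -> StudentRecord:
        # No natural key available: assign the next unused sequential number.
        key = next(self._next_id)
        while key in self._records:
            key = next(self._next_id)
        record = StudentRecord(key, name, major)
        self._records[key] = record
        return record

    def lookup(self, student_id: int) -> StudentRecord:
        return self._records[student_id]


students = StudentFile()
students.add(StudentRecord(1001, "Ana Ruiz", "Finance"))
auto = students.add_without_key("Li Wei", "Marketing")
print(students.lookup(1001).name)  # Ana Ruiz
print(auto.student_id)             # 1
```

A dictionary is used here only because it makes uniqueness easy to see; a real DBMS enforces the same rule with a primary-key constraint and an index.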
A group of related records is a data file (sometimes referred to as a table or flat file).
When it is independent of any other files related to it, a single table may be referred to as a flat file. As a point of accuracy, the term flat file may be defined either narrowly or more broadly. Strictly speaking, a flat file database should consist of nothing but data and delimiters. More broadly, the term refers to any database that exists in a single file in the form of rows and columns, with no relationships or links between records and fields except the table structure. Regardless of the name used, any grouping of related records in tabular (row-and-column) form is called a file. Thus, an employee file would contain the records of the employees of a firm.
Files are frequently classified by the application for which they are primarily used, such as a payroll file or an inventory file, or by the type of data they contain, such as a document file or a graphical image file. Files are also classified by their permanence, for example, a payroll master file versus a payroll weekly transaction file. A transaction file contains records of all transactions occurring during a period and might be used periodically to update the permanent records contained in a master file. A history file is an obsolete transaction or master file retained for backup purposes or for long-term historical storage, called archival storage.
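The relationship between a master file and a transaction file can be sketched as a periodic batch update. In this minimal Python sketch, both files are assumed to be keyed by employee number, and the transaction codes `"add"`, `"change"`, and `"delete"` are hypothetical; real payroll systems define their own record formats. The old master is returned unchanged so it could be retained as a history file.

```python
from copy import deepcopy


def apply_transactions(master, transactions):
    """Return (new_master, history) after applying one period's transactions.

    master: dict mapping employee number -> record (dict of fields)
    transactions: list of (code, employee_number, fields) tuples
    """
    history = deepcopy(master)      # old master kept for backup/archival storage
    new_master = deepcopy(master)
    for code, emp_no, fields in transactions:
        if code == "add":
            if emp_no in new_master:
                raise ValueError(f"employee {emp_no} already exists")
            new_master[emp_no] = dict(fields)
        elif code == "change":
            new_master[emp_no].update(fields)  # KeyError if no such record
        elif code == "delete":
            del new_master[emp_no]
        else:
            raise ValueError(f"unknown transaction code: {code}")
    return new_master, history


payroll_master = {
    101: {"name": "Lee", "rate_of_pay": 20_000},
    102: {"name": "Patel", "rate_of_pay": 28_000},
}
weekly_transactions = [
    ("change", 101, {"rate_of_pay": 21_000}),
    ("add", 103, {"name": "Kim", "rate_of_pay": 25_000}),
    ("delete", 102, {}),
]
updated, history_file = apply_transactions(payroll_master, weekly_transactions)
print(sorted(updated))  # [101, 103]
```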
A database is an integrated collection of logically related data elements. A database consolidates records previously stored in separate files into a common pool of data elements that provides data for many applications. The data stored in a database are independent of the application programs using them and of the type of storage devices on which they are stored.
FIGURE 5.2 Examples of the logical data elements in information systems: a Human Resource Database containing a Payroll File and a Benefits File, each made up of employee records with Name and SS No. fields (payroll records add a Salary field; benefits records add an Insurance field). Note especially the examples of how data fields, records, files, and databases relate.
Thus, databases contain data elements describing entities and relationships among entities. For example, Figure 5.3 outlines some of the entities and relationships in a database for an electric utility. Also shown are some of the business applications (billing, payment processing) that depend on access to the data elements in the database.
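The entities and relationships listed for the electric utility database can be sketched as a small relational schema using Python's built-in sqlite3 module. The table names, column names, and sample values below are assumptions invented for illustration (the figure names only the entities and relationships), and recording payments against bills is one reasonable modeling choice among several. Each relationship becomes a foreign key, and a payment-processing style query then reads across the linked tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE meter (                      -- customers use meters
    meter_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE meter_reading (
    reading_id  INTEGER PRIMARY KEY,
    meter_id    INTEGER NOT NULL REFERENCES meter(meter_id),
    read_on     TEXT NOT NULL,
    kwh         REAL NOT NULL
);
CREATE TABLE bill (                       -- bills sent to customers
    bill_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
CREATE TABLE payment (                    -- customers make payments
    payment_id  INTEGER PRIMARY KEY,
    bill_id     INTEGER NOT NULL REFERENCES bill(bill_id),
    amount      REAL NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Rivera')")
conn.execute("INSERT INTO meter VALUES (10, 1)")
conn.execute("INSERT INTO meter_reading VALUES (100, 10, '2009-08-31', 812.5)")
conn.execute("INSERT INTO bill VALUES (500, 1, 97.50)")
conn.execute("INSERT INTO payment VALUES (900, 500, 50.00)")

# Payment-processing style query: balance still due on each bill.
rows = conn.execute("""
SELECT c.name, b.amount - COALESCE(SUM(p.amount), 0) AS balance_due
FROM customer AS c
JOIN bill AS b ON b.customer_id = c.customer_id
LEFT JOIN payment AS p ON p.bill_id = b.bill_id
GROUP BY b.bill_id
""").fetchall()
print(rows)  # [('Rivera', 47.5)]
```

Because the foreign keys are enforced, inserting a meter for a customer who does not exist is rejected, which is one way a DBMS keeps the relationships in the figure consistent.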
As stated at the beginning of the chapter, just about all the data we use are stored in some type of database. A database doesn’t need to look complex or technical to be a database; it just needs to provide a logical method of organization and easy access to the data stored in it. You probably use one or two rapidly growing databases just about every day: How about Facebook, MySpace, or YouTube?
All of the pictures, videos, songs, messages, chats, icons, e-mail addresses, and everything else found on each of these popular social networking Web sites are stored as fields, records, files, or objects in large databases. The data are stored in such a way as to ensure that they are easy to access, can be shared by their respective owners, and can be protected from unauthorized access or use. When you stop to think about how simple it is to use and enjoy these databases, it is easy to forget how large and complex they are.
For example, in July 2006, YouTube reported that viewers watched more than 100 million videos every day, with 2.5 billion videos viewed in June 2006 alone. In May 2006, users added 50,000 videos per day, and this increased to 65,000 videos per day by July. In January 2008 alone, almost 79 million users watched more than 3 billion videos on YouTube. In August 2006, The Wall Street Journal published an article revealing that YouTube was hosting approximately 6.1 million videos (requiring about 45 terabytes of storage space) and had approximately 500,000 user accounts. As of March 2008, a YouTube search turned up approximately 77.3 million videos and 2.89 million user channels.
Perhaps an even more compelling example of ease of access versus complexity is found in the popular social networking Web site Facebook. Some of the basic statistics are nothing short of amazing! Facebook reports more than 200 million users, with more than 100 million logging in at least once each day. The average user has established 120 friend relationships. More than 850 million photos, 8 million videos, 1 billion pieces of content, and 2.5 million events are uploaded or created each month.
More than 40 language translations are currently available on the site, with more than 50 more in development. More than 52,000 software applications exist in the Facebook Application Directory, and over 30 million active users access Facebook through their mobile devices. The size of Facebook’s databases is best measured in petabytes; one petabyte is equal to one quadrillion bytes. All of this grew from a database and a simple access method launched in 2004 from a dorm room at Harvard University.
FIGURE 5.3 Some of the entities and relationships in a simplified electric utility database. Note a few of the business applications that access the data in the database.
Electric Utility Database. Entities: customers, meters, bills, payments, meter readings. Relationships: bills sent to customers, customers make payments, customers use meters, . . . Applications: billing, meter reading, payment processing, service start/stop.
Source: Adapted from Michael V. Mannino, Database Application Development and Design (Burr Ridge, IL: McGraw-Hill/Irwin, 2001), p. 6.
The important point here is that all of these videos, user accounts, and information are easily accessed because the data are stored in a database system that organizes them so that any particular item can be found on demand.
The relationships among the many individual data elements stored in databases are based on one of several logical data structures, or models. Database management system (DBMS) packages are designed to use a specific data structure to provide end users with quick, easy access to information stored in databases. Five fundamental database structures are the hierarchical, network, relational, object-oriented, and multidimensional models.
Simplified illustrations of the first three database structures are shown in Figure 5.4.
Database Structures
FIGURE 5.4 Example of three fundamental database structures. They represent three basic ways to develop and express the relationships among the data elements in a database.
Hierarchical structure: a Department data element at the root, with Project A and Project B data elements beneath it and Employee 1 and Employee 2 data elements below the projects.
Network structure: Department A and Department B, Project A and Project B, and Employees 1, 2, and 3, where a record can be linked to more than one other record.
Relational structure: a Department table (Deptno, Dname, Dloc, Dmgr) with rows Dept A, Dept B, and Dept C, and an Employee table (Empno, Ename, Etitle, Esalary, Deptno) in which Emp 1 and Emp 2 belong to Dept A; Emp 3, Emp 4, and Emp 6 to Dept B; and Emp 5 to Dept C.
Early mainframe DBMS packages used the hierarchical structure, in which the relationships between records form a hierarchy or treelike structure. In the traditional hierarchical model, all records are dependent and arranged in multilevel structures, consisting of one root record and any number of subordinate levels. Thus, all of the relationships among records are one-to-many because each data element is related to only one element above it. The data element or record at the highest level of the hierarchy (the department data element in this illustration) is called the root element. Any data element can be accessed by moving progressively downward from a root and along the branches of the tree until the desired record (e.g., the employee data element) is located.
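The top-down access path of the hierarchical model can be sketched as a tree walk. This minimal Python sketch uses a hypothetical `Node` class standing in for a record; each node is attached to exactly one parent, which is what makes every relationship one-to-many, and a search must start at the root and descend along the branches.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str   # e.g., "department", "project", "employee"
    name: str
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)  # the child gets exactly one parent: this node
        return child


def find_path(root: Node, kind: str, name: str):
    """Depth-first search starting at the root; returns the names of the
    records on the path to the target, or None if it is not in the tree."""
    if root.kind == kind and root.name == name:
        return [root.name]
    for child in root.children:
        path = find_path(child, kind, name)
        if path is not None:
            return [root.name] + path
    return None


dept = Node("department", "Department A")   # root element
proj_a = dept.add(Node("project", "Project A"))
proj_b = dept.add(Node("project", "Project B"))
proj_a.add(Node("employee", "Employee 1"))
proj_b.add(Node("employee", "Employee 2"))

print(find_path(dept, "employee", "Employee 2"))
# ['Department A', 'Project B', 'Employee 2']
```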
The network structure can represent more complex logical relationships and is still used by some mainframe DBMS packages. It allows many-to-many relationships among records; that is, the network model can access a data element by following one of several paths because any data element or record can be related to any number of other data elements. For example, in Figure 5.4, departmental records can be related to more than one employee record, and employee records can be related to more than one project record. Thus, you could locate all employee records for a particular department or all project records related to a particular employee.
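The many-to-many links described above can be sketched with Python dictionaries of sets, storing each link in both directions so a record can be reached along several paths. The department, employee, and project names are placeholders, and the dictionaries are only an illustration of the logical links, not of how a network DBMS physically stores them.

```python
from collections import defaultdict

# Each record can be linked to any number of other records.
dept_to_emps = defaultdict(set)
emp_to_projects = defaultdict(set)
project_to_emps = defaultdict(set)


def link_employee(dept, emp):
    dept_to_emps[dept].add(emp)


def assign(emp, project):
    # Store the link both ways so either record can be the starting point.
    emp_to_projects[emp].add(project)
    project_to_emps[project].add(emp)


link_employee("Department A", "Employee 1")
link_employee("Department A", "Employee 2")
link_employee("Department B", "Employee 3")
assign("Employee 1", "Project A")
assign("Employee 2", "Project A")
assign("Employee 2", "Project B")   # one employee, several projects
assign("Employee 3", "Project B")   # one project, several employees

# All employee records for a particular department:
print(sorted(dept_to_emps["Department A"]))  # ['Employee 1', 'Employee 2']
# All project records related to a particular employee:
print(sorted(emp_to_projects["Employee 2"]))  # ['Project A', 'Project B']
# The reverse path: everyone working on Project B.
print(sorted(project_to_emps["Project B"]))  # ['Employee 2', 'Employee 3']
```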
It should be noted that neither the hierarchical nor the network data structures are commonly found in the modern organization. The next data structure we discuss, the relational data structure, is the most common of all and serves as the foundation for most modern databases in organizations.
The relational model is the most widely used of the three database structures. It is used by most microcomputer DBMS packages, as well as by most midrange and mainframe systems. In the relational model, all data elements within the database are viewed as being stored in the form of simple two-dimensional tables, sometimes referred to as relations. The tables in a relational database are flat files that have rows and columns. Each row represents a single record in the file, and each column represents a field. The major difference between a flat file and a database is that a flat file can only have data attributes specified for one file. In contrast, a database can specify data attributes for multiple files simultaneously and can relate the various data elements in one file to those in one or more other files.
Figure 5.4 illustrates the relational database model with two tables representing some of the relationships among departmental and employee records. Other tables, or relations, for this organization’s database might represent the data element relationships among projects, divisions, product lines, and so on. Database management system packages based on the relational model can link data elements from various tables to provide information to users. For example, a manager might want to retrieve and display an employee’s name and salary from the employee table in Figure 5.4, as well as the name of the employee’s department from the department table, by using their common department number field (Deptno) to link or join the two tables. See Figure 5.5. The relational model can relate data in any one file with data in another file if both files share a common data element or field. Because of this, information can be created by retrieving data from multiple files even if they are not all stored in the same physical location.
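The manager’s request above can be sketched with Python’s built-in sqlite3 module. The column names follow Figures 5.4 and 5.5 (Deptno, Dname, Dloc, Dmgr, Empno, Ename, Etitle, Esalary), and the employee-to-department assignments match Figure 5.4, but the names, titles, salaries, and locations are invented for illustration. The query matches rows on the common Deptno field to return each employee’s name and salary together with the department name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    Deptno TEXT PRIMARY KEY,
    Dname  TEXT, Dloc TEXT, Dmgr TEXT
);
CREATE TABLE employee (
    Empno   TEXT PRIMARY KEY,
    Ename   TEXT, Etitle TEXT, Esalary INTEGER,
    Deptno  TEXT REFERENCES department(Deptno)  -- common field used for the join
);
""")
conn.executemany("INSERT INTO department VALUES (?, ?, ?, ?)", [
    ("Dept A", "Accounting", "Denver", "Emp 1"),
    ("Dept B", "Marketing", "Boston", "Emp 3"),
    ("Dept C", "Research", "Austin", "Emp 5"),
])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", [
    ("Emp 1", "Adams", "Manager", 52000, "Dept A"),
    ("Emp 2", "Baker", "Clerk", 31000, "Dept A"),
    ("Emp 3", "Chen", "Manager", 58000, "Dept B"),
    ("Emp 4", "Diaz", "Analyst", 44000, "Dept B"),
    ("Emp 5", "Evans", "Manager", 61000, "Dept C"),
    ("Emp 6", "Fox", "Analyst", 42000, "Dept B"),
])

# Employee name and salary from one table, department name from the other.
rows = conn.execute("""
SELECT e.Ename, e.Esalary, d.Dname
FROM employee AS e
JOIN department AS d ON e.Deptno = d.Deptno
ORDER BY e.Empno
""").fetchall()
for row in rows:
    print(row)
# first line printed: ('Adams', 52000, 'Accounting')
```

Nothing in the two tables stores the combined result; the DBMS builds it on request from the shared Deptno values.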
Hierarchical Structure
Network Structure
Relational Structure
FIGURE 5.5 Joining the employee and department tables in a relational database enables you to access data selectively in both tables at the same time. The figure shows the Department table (Deptno, Dname, Dloc, Dmgr) and the Employee table (Empno, Ename, Etitle, Esalary, Deptno) from Figure 5.4, linked through their common Deptno field.
Three basic operations can be performed on a relational database to create useful sets of data. The select operation is used to create a subset of records that meet a stated criterion. For example, a select operation might be used on an employee database to create a subset of records that contain all employees who make more than $30,000 per year and who have been with the company more than three years. Another way to think of the select operation is that it temporarily creates a table whose rows are the records that meet the selection criteria.
The join operation can be used to combine two or more tables temporarily so that a user can see relevant data in a form that looks like it is all in one big table. Using this operation, a user can ask for data to be retrieved from multiple files or databases without having to go to each one separately.
Finally, the project operation is used to create a subset of the columns contained in the temporary tables created by the select and join operations. Just as the select operation creates a subset of records that meet stated criteria, the project operation creates a subset of the columns, or fields, that the user wants to see. Using a project operation, the user can decide not to view all of the columns in the table but instead only those that have the data necessary to answer a particular question or construct a specific report.
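The three operations can be sketched in plain Python by treating a table as a list of dictionaries, one per row. This is a minimal illustration of the relational-algebra ideas, not how a DBMS actually executes queries. The table contents and the `years` field are hypothetical; the more-than-$30,000 and more-than-three-years criteria come from the select example above. `project` drops duplicate rows, as projection does in relational algebra, and `join` is a simple nested-loop equijoin on one shared field.

```python
def select(table, predicate):
    """Select: keep only the rows (records) that satisfy the criterion."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Project: keep only the named columns (fields), dropping duplicate rows."""
    seen, result = set(), []
    for row in table:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result


def join(left, right, on):
    """Join: pair rows from two tables whose common field values match."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[on] == right_row[on]
    ]


employees = [
    {"Ename": "Adams", "Esalary": 52000, "years": 6, "Deptno": "Dept A"},
    {"Ename": "Baker", "Esalary": 31000, "years": 2, "Deptno": "Dept A"},
    {"Ename": "Chen",  "Esalary": 58000, "years": 4, "Deptno": "Dept B"},
]
departments = [
    {"Deptno": "Dept A", "Dname": "Accounting"},
    {"Deptno": "Dept B", "Dname": "Marketing"},
]

# Employees earning more than $30,000 with more than three years of service,
# joined to their department, showing only the name and department name.
senior = select(employees, lambda r: r["Esalary"] > 30000 and r["years"] > 3)
report = project(join(senior, departments, on="Deptno"), ["Ename", "Dname"])
print(report)
# [{'Ename': 'Adams', 'Dname': 'Accounting'}, {'Ename': 'Chen', 'Dname': 'Marketing'}]
```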
Because of the widespread use of relational models, an abundance of commercial products exist to create and manage them. Leading mainframe relational database applications include Oracle 10g from Oracle Corp. and DB2 from IBM. A very popular midrange database application is SQL Server from Microsoft. The most commonly used database application for the PC is Microsoft Access.
The multidimensional model is a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data. You can visualize multidimensional structures as cubes of data and cubes within cubes of data. Each side of the cube is considered a dimension of the data. Figure 5.6 is an example that shows that each dimension can represent a different category, such as product type, region, sales channel, and time.
Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions. For example, a single cell may contain the total sales for a product in a region for a specific sales channel in a single month. A major benefit of multidimensional databases is that they provide a compact and easy-to-understand way to visualize and manipulate data elements that have many interrelationships. As a result, multidimensional databases have become the most popular database structure for the analytical databases that support online analytical processing (OLAP) applications, in which fast answers to complex business queries are expected. We discuss OLAP applications in Chapter 10.
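A multidimensional structure can be pictured as a mapping from one coordinate per dimension to an aggregated value. This minimal Python sketch uses the four dimensions named in the text (product, region, sales channel, month) with invented sales figures; the `rollup` helper is a hypothetical name for summing away the dimensions you do not keep, the kind of summary an OLAP query asks for.

```python
from collections import defaultdict

DIMENSIONS = ("product", "region", "channel", "month")

# Each cell holds an aggregated total for one combination of dimension members.
cube = {
    ("TV",  "East", "Retail", "January"):  120_000,
    ("TV",  "East", "Online", "January"):   45_000,
    ("TV",  "West", "Retail", "January"):   98_000,
    ("VCR", "East", "Retail", "January"):   30_000,
    ("TV",  "East", "Retail", "February"): 110_000,
}


def rollup(cube, keep):
    """Aggregate the cube down to the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[tuple(coords[i] for i in idx)] += value
    return dict(totals)


# One cell: TV sales in the East through the retail channel in January.
print(cube[("TV", "East", "Retail", "January")])  # 120000
# Total sales by product and region, across all channels and months.
print(rollup(cube, ["product", "region"]))
# {('TV', 'East'): 275000, ('TV', 'West'): 98000, ('VCR', 'East'): 30000}
```

An OLAP server would typically precompute and index many of these rollups so that such queries return quickly; the dictionary here simply makes the cell-and-dimension idea visible.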
The object-oriented model is considered one of the key technologies of a new generation of multimedia Web-based applications. As Figure 5.7 illustrates, an object consists of data values describing the attributes of an entity, plus the operations that can be performed upon the data. This encapsulation capability allows the object-oriented model to handle complex types of data (graphics, pictures, voice, and text) more easily than other database structures.
The object-oriented model also supports inheritance; that is, new objects can be automatically created by replicating some or all of the characteristics of one or more parent objects. Thus, in Figure 5.7, the checking and savings account objects can inherit both the common attributes and operations of the parent bank account object. Such capabilities have made object-oriented database management systems (OODBMS) popular in computer-aided design (CAD) and a growing number of other applications. For example, object technology allows designers to develop product designs, store them as objects in an object-oriented database, and replicate and modify them to create new product designs. In addition, multimedia Web-based applications for the Internet and corporate intranets and extranets have become a major application area for object technology.
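The bank account example in Figure 5.7 maps naturally onto classes in an object-oriented language. This Python sketch is illustrative only: the attribute and operation names follow the figure, but the interest formulas, the credit-line check, and the method bodies are assumptions, and the statement-printing operations are omitted. Both subclasses inherit `deposit`, `withdraw`, and `get_owner` from `BankAccount` and add operations of their own.

```python
class BankAccount:
    """Parent object: attributes and operations common to every account."""

    def __init__(self, customer: str, balance: float = 0.0, interest: float = 0.0):
        self.customer = customer
        self.balance = balance
        self.interest = interest  # assumed to be an annual rate, e.g., 0.04 for 4%

    def deposit(self, amount: float) -> None:
        self.balance += amount

    def withdraw(self, amount: float) -> None:
        self.balance -= amount

    def get_owner(self) -> str:
        return self.customer


class SavingsAccount(BankAccount):
    """Inherits every BankAccount attribute and operation, then adds its own."""

    def __init__(self, customer: str, balance: float = 0.0, interest: float = 0.0):
        super().__init__(customer, balance, interest)
        self.number_of_withdrawals = 0

    def withdraw(self, amount: float) -> None:
        super().withdraw(amount)  # reuse the inherited operation
        self.number_of_withdrawals += 1

    def calculate_interest_paid(self) -> float:
        # Assumed formula: one quarter of simple interest on the current balance.
        return self.balance * self.interest / 4


class CheckingAccount(BankAccount):
    """Inherits from BankAccount and adds a credit line."""

    def __init__(self, customer: str, balance: float = 0.0,
                 interest: float = 0.0, credit_line: float = 0.0):
        super().__init__(customer, balance, interest)
        self.credit_line = credit_line

    def withdraw(self, amount: float) -> None:
        # Allow overdrawing only up to the credit line.
        if self.balance - amount < -self.credit_line:
            raise ValueError("withdrawal exceeds credit line")
        super().withdraw(amount)

    def calculate_interest_owed(self) -> float:
        # Assumed formula: one month of simple interest on any overdrawn amount.
        return max(0.0, -self.balance) * self.interest / 12


savings = SavingsAccount("Lopez", balance=1000.0, interest=0.04)
savings.deposit(50.0)    # inherited unchanged from BankAccount
savings.withdraw(250.0)  # overridden to count withdrawals
print(savings.get_owner(), savings.balance, savings.number_of_withdrawals)
# Lopez 800.0 1
```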
Relational Operations
Multidimensional Structure
Object-Oriented Structure
FIGURE 5.6 An example of the different dimensions of a multidimensional database. The panels show nested cubes of data whose dimensions include products (Camera, TV, VCR, Audio), geography (East, West, South; San Francisco, Los Angeles, Denver), time (January through April, Qtr 1), scenario (Actual, Budget, Forecast, Variance), and measures (Sales, COGS, Margin, Total Expenses, Profit).
FIGURE 5.7 The checking and savings account objects can inherit common attributes and operations from the bank account object.
Bank Account Object. Attributes: Customer, Balance, Interest. Operations: Deposit (amount), Withdraw (amount), Get owner.
Savings Account Object (inherits from Bank Account Object). Attributes: Number of withdrawals, Quarterly statement. Operations: Calculate interest paid, Print quarterly statement.
Checking Account Object (inherits from Bank Account Object). Attributes: Credit line, Monthly statement. Operations: Calculate interest owed, Print monthly statement.
Source: Adapted from Ivar Jacobson, Maria Ericsson, and Agneta Jacobson, The Object Advantage: Business Process Reengineering with Object Technology (New York: ACM Press, 1995), p. 65. Copyright © 1995, Association for Computing Machinery. Used by permission.
Object technology proponents argue that an object-oriented DBMS can work with complex data types such as document and graphic images, video clips, audio segments, and other subsets of Web pages much more efficiently than relational database management systems can. However, major relational DBMS vendors have countered by adding object-oriented modules to their relational software. Examples include multimedia object extensions to IBM’s DB2 and Oracle’s object-based “cartridges” for Oracle 10g. See Figure 5.8.
The hierarchical data structure was a natural model for the databases used for the structured, routine types of transaction processing characteristic of many business operations in the early years of data processing and computing. Data for these operations can easily be represented by groups of records in a hierarchical relationship. However, as time progressed, there were many cases in which information was needed about records that did not have hierarchical relationships. For example, in some organizations, employees from more than one department can work on more than one project (refer to Figure 5.4). A network data structure could easily handle this many-to-many relationship, whereas a hierarchical model could not. As a result, the more flexible network structure became popular for these types of business operations. Like the hierarchical structure, however, the network model could not easily handle ad hoc requests for information, because its relationships must be specified in advance. This limitation pointed to the need for the relational model.
Relational databases enable an end user to receive information easily in response to ad hoc requests. That’s because not all of the relationships among the data elements in a relationally organized database need to be specified when the database is created.
Database management software (such as Oracle 11g, DB2, Access, and Approach) creates new tables of data relationships by using parts of the data from several tables.
Thus, relational databases are easier for programmers to work with and easier to maintain than the hierarchical and network models.
The major limitation of the relational model is that relational database management systems cannot process large numbers of business transactions as quickly and efficiently as those based on the hierarchical and network models; they also cannot process complex, high-volume applications as well as the object-oriented model can. This performance gap has narrowed with the development of advanced relational database software with object-oriented extensions. The use of database management software based on the object-oriented and multidimensional models is growing steadily, as these technologies play a greater role in OLAP and Web-based applications.
Evaluation of Database Structures
FIGURE 5.8
Databases can supply data to a wide variety of analysis packages, allowing for data to be displayed in graphical form.
Source: Courtesy of Microsoft®.