data modeling tools, analysis and design tools, and tools for documenting and testing applications.
Circular Structure A data structure consisting of three or more entity types forming cyclical relationships where the first is related to the second, the second to the third, and so on, and finally the last is related back to the first. In a good data model, circular structures are resolved.
Composite Key Primary key made up of more than one attribute.
Concatenated Key Same as Composite Key.
Conceptual Completeness Conceptual completeness of a data model implies that it is a complete representation of the information requirements of the organization.
Conceptual Correctness Conceptual correctness of a data model implies that it is a true replica of the information requirements of the organization.
Conceptual Data Model A generic data model capturing the true meaning of the information requirements of an organization. Does not conform to the conventions of any class of database systems such as hierarchical, network, relational, and so on.
Conceptual Entity Type Set representing the type of the objects, not the physical objects themselves.
Data Dictionary Repository holding the definitions of the data structures in a database. In a relational database, the data dictionary contains the definitions of all the tables, columns, and so on.
Data Integrity Accuracy and consistency of the data stored in the organization’s database system.
Data Manipulation Operations for altering data in the database. Data manipulation includes retrieval, addition, update, and deletion of data.
Data Mining Knowledge discovery process. Data mining algorithms uncover hidden relationships and patterns from a given set of data on which they operate. Knowledge discovery is automatic, not through deliberate search and analysis by analysts.
Data Model Representation of the real-world information requirements that gets implemented in a computer system. A data model provides a method and means for describing real-world information by using specific notations and conventions.
Data Repository Storage of the organization’s data in databases. Stores all data values that are part of the databases.
Data View See User View.
Data Warehouse A specialized database having a collection of transformed and integrated data, stored for the purpose of providing strategic information to the organization.
Database Repository where an ordered, integrated, and related collection of the organization’s data is stored for the purpose of computer applications and information sharing.
Database Administration Responsibility for the technical aspects of the organization’s database. Includes the physical design and handling of the technical details such as database security, performance, day-to-day maintenance, backup, and recovery. Database administration is more technical than managerial.
Database Administrator (DBA) Specially trained technical person performing the database administration functions in an organization.
Database Practitioners Includes the set of IT professionals such as analysts, data modelers, designers, programmers, and database administrators who design, build, deploy, and maintain database systems.
DBMS Database Management System. Software system to store, access, maintain, manage, and safeguard the data in databases.
DDLC Database Development Life Cycle. A complete process from beginning to end, with distinct phases for defining information requirements, creating the data model, designing the database, implementing the database, and maintaining it thereafter.
Decomposition of Relations Splitting of relations or tables into smaller relations for the purpose of normalizing them.
Degree The number of entity types or object sets that participate in a relationship. For a binary relationship the degree is 2.
Dimension Entity Type In a STAR schema, a dimension entity type represents a business dimension such as customer or product along which metrics like sales are analyzed.
DKNF Domain Key Normal Form. This is the ultimate goal in transforming a relation into the highest normal form. A relation is in DKNF if it represents one topic and all of its business rules can be expressed through domain constraints and key relationships.
Domain The set of all permissible data values and data types for an attribute of an entity type.
DSS Decision Support System. Application that enables users to make strategic decisions. Decision support systems are driven by specialized databases.
End-Users See Users.
Entity A real-world “thing” of interest to an organization.
Entity Instance A single occurrence of an entity type. For example, a single invoice is an instance of the entity type called INVOICE.
Entity Integrity A rule or constraint to ensure the correctness of an entity type or relational table.
ERD Entity-Relationship Diagram. A graphical representation of entities and their relationships in the Entity-Relationship data modeling technique.
Entity Set The collection of all entity instances of a particular type of entity.
Entity Type Refers to the type of entity occurrences in an entity set. For example, all customers of an organization form the CUSTOMER entity type.
E-R Data Modeling Design technique for creating an entity-relationship diagram from the information requirements.
Evolutionary Modeling Data modeling as promoted by the Agile Software Development movement. This is a type of iterative modeling methodology where the model evolves in “creation-feedback-revision” cycles.
External Data Model Definition of the data structures in a database that are of interest to various user groups in an organization. It is the way users view the database from outside.
Fact Entity Type In a STAR schema, a fact entity type represents the metrics such as sales that are analyzed along business dimensions such as customer or product.
Feasibility Study One of the earlier phases in DDLC, conducting a study of the readiness of an organization and the technological, economic, and operational feasibility of a database system for the organization.
Fifth Normal Form (5NF) A relation that is already in the fourth normal form and without any join dependencies.
First Normal Form (1NF) A relation that has no repeating groups of values for a set of attributes in a single row.
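As a rough illustration of removing a repeating group, the sketch below flattens a record whose repeating group holds several sets of values into one row per set. The record layout, the function name `to_first_normal_form`, and the attribute names (`order_no`, `customer`, `items`, `product`, `qty`) are assumptions made up for this example; they do not come from the text.

```python
def to_first_normal_form(records, group_field):
    """Return one flat row per set of values in the repeating group."""
    rows = []
    for record in records:
        # Attributes outside the repeating group are copied into every row.
        fixed = {k: v for k, v in record.items() if k != group_field}
        for group_values in record[group_field]:
            row = dict(fixed)
            row.update(group_values)
            rows.append(row)
    return rows


# Hypothetical unnormalized data: each order carries a repeating group of items.
orders = [
    {"order_no": 1, "customer": "Acme",
     "items": [{"product": "bolt", "qty": 10}, {"product": "nut", "qty": 20}]},
    {"order_no": 2, "customer": "Zenith",
     "items": [{"product": "bolt", "qty": 5}]},
]

flat_rows = to_first_normal_form(orders, "items")
```

Each resulting row holds a single value per attribute, which is the condition the 1NF entry describes.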
Foreign Key An attribute in a relational table used for establishing a direct relationship with another table, known as the parent table. The values of the foreign key attribute are drawn from the primary key values of the parent table.
Fourth Normal Form (4NF) A relation that is already in the third normal form and without any multivalued dependencies.
Functional Dependency The value of an attribute B in a relation depending on the value of another attribute A. For every instance of attribute A, its value uniquely determines the value of attribute B in the relation.
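The definition of a functional dependency can be checked against sample data. The sketch below tests whether the rows of a table are consistent with a dependency A → B; the function name and the sample attributes (`zip_code`, `city`, `name`) are hypothetical. Note that sample rows can only disprove a dependency: a passing check means no counterexample was found, not that the business rule holds.

```python
def holds_functional_dependency(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[b] for b in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows.
customers = [
    {"name": "Ann", "zip_code": "07001", "city": "Avenel"},
    {"name": "Bob", "zip_code": "07001", "city": "Avenel"},
    {"name": "Cat", "zip_code": "08817", "city": "Edison"},
]

zip_determines_city = holds_functional_dependency(customers, ["zip_code"], ["city"])
zip_determines_name = holds_functional_dependency(customers, ["zip_code"], ["name"])
```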
Generalization The concept that some entity types are general cases of other entity types. The entity types in the general cases are known as super-types.
Generalizing Specialists A trend among software developers, as promoted by the agile software development movement, where specialists acquire more and more diverse skills and expand their horizons. Accordingly, data modelers are no longer specialists with just data modeling skills.
Gerund Representation of a relationship between two entity types as an entity type itself.
Homonyms Two or more data elements having the same name but containing different data.
Identifier One or more attributes whose values can uniquely identify the instances of an entity type.
Identifying Relationship A relationship between two entity types where one entity type depends on another entity type for its existence. For example, the entity type ORDER-DETAIL cannot exist without the entity type ORDER.
Inheritance The property that subsets inherit the attributes and relationships of their superset.
Intrinsic Characteristics Basic or inherent properties of an object or entity.
IT Information Technology. Covers all computing and data communications in an organization. Typically, the CIO is responsible for IT operations in an organization.
Iterative Modeling This implies that the modeling process is not strictly carried out in a sequential manner such as modeling all entity types, modeling all relationships, modeling all attributes, and so on. Iterative modeling allows the data modeler to constantly go back, verify, readjust, and ensure cohesion and completeness.
Key One or more attributes whose values can uniquely identify the rows of a relational table.
Logical Data Model Also sometimes referred to as a conventional data model, consists of the logical data structure representing the information requirements of an organization. This data model conforms to the conventions of a class of database systems such as hierarchical, network, relational, and so on. The logical data model for a relational database system consists of tables or relations.
Logical Design Process of designing and creating a logical data model.
Matrix Consists of members or elements arranged in rows and columns. In the relational data model, a table or relation may be compared to a matrix, thereby making it possible to apply matrix algebra functions to the data represented in the table.
MDDBMS Multi-dimensional database management system. Used to create and manage multi-dimensional databases for OLAP.
Meta-data Data about the data of an organization.
Model Transformation Process of mapping and transforming the components of a conceptual data model to those of a logical or conventional data model.
MOLAP Multidimensional Online Analytical Processing. An analytical processing technique in which multidimensional data cubes are created and stored in separate proprietary databases.
Normal Form A state of a relation or table, free from incorrect dependencies among the attributes. See also Boyce-Codd Normal Form, First Normal Form, Second Normal Form, and Third Normal Form.
Normalization The step-by-step method of transforming a random table into a set of normalized relations free from incorrect dependencies and conforming to the rules of the relational data model.
Null Value A value of an attribute, different from zero or blank, that indicates a missing, non-applicable, or unknown value.
OLAP Online Analytical Processing. Powerful software systems providing extensive multidimensional analysis, complex calculations, and fast response times. Usually present in data warehousing systems.
Physical Data Model Data model representing the information requirements of an organization at a physical level of hardware and system software, consisting of the actual components such as data files, blocks, records, storage allocations, indexes, and so on.
Physical Design Process of designing the physical data model.
Practitioners See Database Practitioners.
Primary Key A single attribute or a set of attributes that uniquely identifies an instance of an object set or entity type and is chosen as the primary key.
RDBMS Relational Database Management System.
Referential Integrity Refers to two relational tables that are directly related. Referential integrity between related tables is established if non-null values in the foreign key attribute of the child table are primary key values in the parent table.
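The referential integrity rule can be expressed as a small check over two tables held in memory. This is a minimal sketch under assumed names: `orphan_foreign_keys`, the ORDER and ORDER DETAIL sample rows, and their column names are hypothetical. Null foreign key values are represented by `None` and are skipped, matching the "non-null values" wording of the definition.

```python
def orphan_foreign_keys(child_rows, foreign_key, parent_rows, primary_key):
    """Return child rows whose non-null foreign key matches no parent primary key."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return [
        row for row in child_rows
        if row[foreign_key] is not None and row[foreign_key] not in parent_keys
    ]


# Hypothetical parent and child tables.
order_rows = [{"order_no": 1}, {"order_no": 2}]
order_detail_rows = [
    {"line": 1, "order_no": 1},
    {"line": 2, "order_no": None},  # null foreign key: allowed by the rule
    {"line": 3, "order_no": 9},     # no such parent row: violates the rule
]

violations = orphan_foreign_keys(order_detail_rows, "order_no", order_rows, "order_no")
```

Referential integrity holds for the pair of tables exactly when the returned list is empty.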
Relation In relational database systems, a relation is a two-dimensional table with columns and rows, conforming to relational rules.
Relational Data Model A conventional or logical data model where data is perceived as two-dimensional tables with rows and columns. Each table represents a business object; each column represents an attribute of the object; each row represents an instance of the object.
Relational Database A database system built based on the relational data model.
Relationship A relationship between two object sets or entity types represents the associations of the instances of one object set with the instances of the other object set. Unary, binary, or ternary relationships are the common ones depending on the number of object sets participating in the relationship. A unary relationship is recursive: instances of an object set are associated with instances of the same object set. Relationships may be mandatory or optional based on whether some instances may or may not participate in the relationship.
Repeating Group A group of attributes in a relation that has multiple sets of values for the attributes.
ROLAP Relational Online Analytical Processing. An online analytical processing technique in which multidimensional data cubes are created on the fly by the relational database engine.
Second Normal Form (2NF) A relation that is already in the first normal form and without partial key dependencies.
Set Theory Mathematical concept where individual members form a set. Set operations can be used to combine or select members from sets in several ways. In a relational data model, the rows or tuples of a table or relation may be considered as forming a set. As such, set operations may be applied to manipulation of data represented as tables.
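To make the connection between tables and sets concrete, the sketch below models two small relations as Python sets of tuples and applies the usual set operations to them. The relation names and the sample rows are invented for illustration only.

```python
# Each relation is modeled as a set of tuples of the form (customer_id, city).
east_customers = {(1, "Boston"), (2, "Newark")}
premium_customers = {(2, "Newark"), (3, "Denver")}

# Union: rows that appear in either relation (duplicates collapse).
all_customers = east_customers | premium_customers

# Intersection: rows that appear in both relations.
east_and_premium = east_customers & premium_customers

# Difference: rows in the first relation that are not in the second.
east_not_premium = east_customers - premium_customers

# Selection: rows of one relation that satisfy a condition.
newark_customers = {row for row in all_customers if row[1] == "Newark"}
```

Because a set holds no duplicate members, the union contains the shared row only once, which mirrors the rule that a relation has no duplicate tuples.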
Specialization The concept that some entity types are special cases of other entity types. The entity types in the special cases are known as sub-types.
SQL Structured Query Language. Has become the standard language interface for relational databases.
Stakeholders All people in the organization who have a stake in the success of the data system.
STAR Schema The arrangement of the collection of fact and dimension entity types in the dimensional data model, resembling a star formation, with the fact entity type placed in the middle and surrounded by the dimension entity types. Each dimension entity type is in a one-to-many relationship with the fact entity type.
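A STAR schema can be sketched with Python's standard `sqlite3` module and an in-memory database. The table names (`customer_dim`, `product_dim`, `sales_fact`), the columns, and the sample rows are all assumptions chosen for this example. The fact table sits in the middle and carries one foreign key per dimension, so each dimension row can relate to many fact rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);

-- Fact table: one foreign key per dimension plus the metric being analyzed.
CREATE TABLE sales_fact (
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    sales_amount REAL
);

INSERT INTO customer_dim VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO product_dim  VALUES (10, 'bolt'), (20, 'nut');
INSERT INTO sales_fact   VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 25.0);
""")

# Analyze the sales metric along the customer dimension.
sales_by_customer = conn.execute("""
    SELECT c.customer_name, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN customer_dim AS c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
```

Swapping `customer_dim` for `product_dim` in the join gives the same metric along the product dimension.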
Strategic Information May refer to information in an organization used for making strategic decisions.
Strong Entity An entity on which a weak entity depends for its existence. See also Weak Entity.
Sub-types See Specialization.
Subset An entity type that is a special case of another entity type known as the superset.
Super-types See Generalization.
Superset An entity type that is a general case of another entity type known as the subset.
Surrogate Key A unique value generated by the computer system used as a key for a relation. A surrogate key has no business meaning apart from the computer system.
Synonyms Two or more data elements containing the same data but having different names.
Syntactic Completeness Syntactic completeness of a data model implies that the modeling process has been carried out completely to produce a good data model for the organization.
Syntactic Correctness Syntactic correctness of a data model implies that the representation using the appropriate symbols does not violate any rules of the modeling technique.
Third Normal Form (3NF) A relation that is already in the second normal form and without any transitive dependencies, that is, dependencies of non-key attributes on the primary key through other non-key attributes, not directly.
Transitive Dependency In a relation, the dependency of a non-key attribute on the primary key through another non-key attribute, not directly.
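One common way to remove a transitive dependency is to split the relation by projection. In the hypothetical table below, `emp_id` is the primary key, `dept_id` depends on it directly, and `dept_name` depends on it only through `dept_id`. The helper `project` and all attribute names are assumptions for this sketch.

```python
def project(rows, attributes):
    """Project rows onto the given attributes, dropping duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical relation with the transitive dependency emp_id -> dept_id -> dept_name.
employees = [
    {"emp_id": 1, "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 2, "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 3, "dept_id": "D2", "dept_name": "Finance"},
]

# Decompose into two relations; dept_id links them as a foreign key.
employee_rel = project(employees, ["emp_id", "dept_id"])
department_rel = project(employees, ["dept_id", "dept_name"])
```

After the split, each department name is stored once, so renaming a department touches a single row.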
Triad A set of three related entity types where one of the relationships is redundant. Triads must be resolved in a refined data model.
Tuple A row in a relational table.
UML Unified Modeling Language. Its forerunners constitute the wave of object-oriented analysis and design methods of the 1980s and 1990s. UML is a unified language because it directly unifies the leading methods of Booch, Rumbaugh, and Jacobson. OMG (Object Management Group) has adopted UML as a standard.
User View View of the database by a single user group. Therefore, a data view of a particular user group includes only those parts of the database that group is concerned with. The collection of all data views of all the user groups constitutes the total data model.
Users In connection with data modeling, the term users includes all people who use the data system that is built based on the particular data model.
Weak Entity An entity that depends for its existence on another entity known as a strong entity. For example, the entity type ORDER DETAIL cannot exist without the entity type ORDER. See also Strong Entity.
XML eXtensible Markup Language. Introduced to overcome the limitations of HTML. XML is extensible, portable, structured, and descriptive. In a very limited way, it may be used in data modeling.
Aggregation See Relationships, special cases of, aggregation
Agile movement, the, 376 – 379
generalizing specialists, 379
philosophies, 378
principles, 378
See Data modeling, agile modeling principles
See also Modeling, agile; Modeling,
evolutionary
Assembly structures, 147 – 148
Attribute, checklist for validation of, 178 – 180
Attributes, 100, 158 – 178
constraints for, 169 – 170
null values, 170
range, 170
type, 170
value set, 169
data, as, 161
domain, definition of, 164
domains, 164 – 169
attribute values, for, 166
information content, 165
misrepresented, 167
split, 167
names, 163
properties or characteristics, 158
relationships of, 160
types of, 171 – 175
optional, 173
simple and composite, 171
single-valued and multi-valued, 171
stored and derived values, with, 172
values, 162
Business intelligence, 300
Business rules, incorporation of, 25
Case study
E-R model, 84
UML model, 87
Categorization See Specialization / Generalization, categorization
Circular structures See Relationships, design issues of, circular structures
Class diagram, 62 See also UML
Conceptual and physical entity types, 145 – 147
Conceptual model symbols and meanings, 77
Data lifecycle, 7 – 9
Data mining, 334 – 342
OLAP versus data mining, 336
techniques, 338
data modeling for, 341
Data model
communication tool, 5
components of, 18 – 20
database blueprint, 5
external, 13, 75
conceptual, 14 – 15, 75
identifying components, 77 – 80
review procedure, 76 – 77
logical, 15 – 17, 75, 104 – 107
transformation steps, 107 – 110
Data Modeling Fundamentals By Paulraj Ponniah
Copyright © 2007 John Wiley & Sons, Inc.
Data model (Continued)
physical, 17, 76, 111 – 112
quality, 26 – 29, 348
approach to good modeling, 351
assurance process, 365 – 373
aspects of, 365
assessment of, 370
stages of, 366
definitions, of, 351 – 360
checklists, 358
dimensions, 361
good and bad models, 349
meaning of, 360
relational, 109
symbols, 19 – 20
Data model diagram, review of, 103 – 104
Data modeling
agile modeling principles, application
of, 34 – 35
approaches, 36 – 38, 44 – 47
data mining, for, 341
data warehouse, for the, 38 – 39
methods and techniques
IDEF1X, 51
Information Engineering, 50
Object Role Modeling (ORM), 55
Peter Chen (E-R) modeling, 48
Richard Barker’s, 53
XML, 57
steps of, 20 – 26
tips, practical, 392 – 421
bill-of-materials, 409
iterative modeling, 399 – 401
cycles, establishing, 399
increments, 400
partial models, integration of, 401
layout, conceptual model, 409 – 417
adding texts, 416
component arrangement, 410
visual highlights, 417
legal entities, 402
locations and places, 403
logical data model, 417 – 421
persons, 407
requirements definition, 393 – 396
stakeholder participation, 396 – 399
time periods, 405
Data system development life cycle.
See DDLC
Data warehouse, 301 – 325
data staging, 304
data storage, 304
dimensional to relational, 322
families of STARS, 321
information delivery, 305
modeling
business data, dimensional nature of, 306
dimensional modeling, 308 – 312
dimension entity type, 309, 313
fact entity type, 309, 314
information package, 307
snowflake schema, 318
source data, 304
STAR schema, 312 – 318
data granularity, 315, 317
degenerate dimensions, 316
factless fact entity type, 316
fully additive measures, 315
semi-additive measures, 315
technologies, 302
Database design
conceptual to relational, 243
informal, 272
model transformation method
attributes to columns, 250
entity types to relations, 250
identifiers to keys, 252
transformation of relationships, 252 – 267
mandatory and optional conditions, 261 – 265
transformation summary, 267
when to use, 248
traditional method, 244
Databases, post-relational, 39 – 40
DDLC, 29 – 33
design, 31
implementation, 31
phases and tasks, 32
process, starting the, 30
requirements definition, 30
roles and responsibilities, 33
Decision-support systems, 296 – 301
data modeling for, 301
history of, 297
Dimensional analysis See OLAP systems, dimensional analysis
Domains See Attributes, domains
E-R modeling See Data modeling, methods and techniques; Peter Chen (E-R) modeling
Entity, checklist for validation of, 153 – 155
Entity integrity See Relational model, entity integrity
Entity types
aggregation, 129
association, 129
category of, 127
definition, comprehensive, 116
existence dependency, 132
homonyms, 125
ID dependency, 132
identifying, 120
intersection, 129
regular, 128
strong, 128
subtype, 128
supertype, 128
synonyms, 125
IDEF1X See Data modeling, methods and
techniques, IDEF1X
Identifiers or keys, 101, 175 – 178
generalization hierarchy, in, 177 – 178
guidelines for, 176
keys, definitions of, 175
need for, 175
Informal design, 272 – 276
potential problems, 273 – 276
addition anomaly, 276
deletion anomaly, 275
update anomaly, 275
Information engineering See Data modeling,
methods and techniques; Information
engineering
Information levels, 11 – 13
Integration definition for information
modeling See Data modeling,
methods and techniques,
IDEF1X
Key See also Identifiers or keys
composite, 176
natural, 176
primary, 176
surrogate, 176
Meta-modeling, 40
Modeling, agile, 379 – 385
documentation, 383
feasibility, 384
practices
additional, 383
primary, 381
principles
auxiliary, 381
basic, 380
Modeling, evolutionary, 385 – 387
benefits of, 387
flexibility, need for, 386
nature of, 386
Modeling time dimension, 149
Normalization methodology, 276 – 291
fundamental normal forms, 278 – 285
Boyce-Codd normal form, 284
first normal form, 278
second normal form, 279
third normal form, 281
higher normal forms, 285 – 288
domain-key normal form, 288
fifth normal form, 287
fourth normal form, 286
normalization as verification, 291
steps, 277, 290
OLAP systems, 325 – 333
data modeling for, 332
dimensional analysis, 326
features, 325
hypercubes, 328
MOLAP, 330
ROLAP, 330
Online analytical processing See OLAP systems
ORM See Data modeling, methods and techniques; Object Role Modeling
Peter Chen See Data modeling, methods and techniques; Peter Chen (E-R) modeling
Process modeling, 40
Quality See Data model, quality
Recursive structures, 145
Referential integrity See Relational model, referential integrity
Relational model, 231 – 242
columns as attributes, 234
entity integrity, 240
functional dependencies, 242
mathematical foundation, 232
modeling concept, single, 232
notation for, 237
referential integrity, 240
relation or table, 233