Chapter 6
Data Types
Copyright © 2006 Addison-Wesley. All rights reserved.
Chapter 6 Topics
• Introduction
• Primitive Data Types
• Character String Types
• User-Defined Ordinal Types
Introduction
• A data type defines a collection of data values and
a set of predefined operations on those values
• Some data types are specified by type constructors, which are used to form type expressions ('[]', '*',
'()' in C for array, pointer, function, respectively)
• A descriptor is the collection of the attributes of a variable. It is used for type checking and by
allocation and deallocation operations
Primitive Data Types
• Almost all programming languages provide a set of primitive data types
– Primitive data types: Those not defined in terms
of other data types
• Some primitive data types are merely
reflections of the hardware
• Others require little software support
Primitive Data Types: Integer
• Almost always an exact reflection of the hardware
so the mapping is trivial
– Java: byte, short, int, long
– Ada: SHORT_INTEGER, INTEGER, and LONG_INTEGER
• In sign-magnitude notation, the leftmost bit is set to indicate negative and
the remainder of the bit string represents the
absolute value of the number; most hardware uses two's complement instead
Primitive Data Types: Floating Point
• Model real numbers, but only as
approximations for most real values
• Languages for scientific use support at least
two floating-point types (e.g., float and
double; sometimes more)
• The representation usually mirrors the hardware format
(commonly IEEE 754), but not always
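The point that floating-point values are only approximations can be seen in a short Python sketch. The variable names are illustrative, and the `struct` round trip is just one convenient way to force a value through a 32-bit format.

```python
import math
import struct

# 0.1 has no exact binary representation, so the stored value is only close to 0.1.
total = 0.1 + 0.2
print(total == 0.3)              # exact comparison fails
print(math.isclose(total, 0.3))  # tolerance-based comparison succeeds

# Round-trip 0.1 through a 32-bit float to show that precision depends on the type.
as_single = struct.unpack("f", struct.pack("f", 0.1))[0]
print(as_single == 0.1)          # single precision keeps fewer bits than double
```

This is why comparisons of floating-point results are normally written with a tolerance rather than with exact equality.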
IEEE Floating Point Formats
Primitive Data Types: Decimal
• For business applications
• Store a fixed number of decimal digits
• The representation is called
binary coded decimal (BCD)
– Two digits per byte
• Advantage
– Accuracy
• Disadvantages
– The range of values is restricted because no
exponents are allowed
– BCD takes more storage than a binary representation
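A minimal sketch of packed BCD in Python, assuming the two-digits-per-byte layout mentioned above. The helper names `to_packed_bcd` and `from_packed_bcd` are hypothetical and not part of any library.

```python
def to_packed_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits into BCD, two digits per byte."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("only decimal digits are allowed")
    if len(digits) % 2:          # pad to an even number of digits
        digits = "0" + digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def from_packed_bcd(packed: bytes) -> str:
    """Unpack BCD bytes back into a string of decimal digits."""
    return "".join(f"{byte >> 4}{byte & 0x0F}" for byte in packed)


packed = to_packed_bcd("1995")
print(packed.hex())               # each hex nibble is one decimal digit
print(from_packed_bcd(packed))
# Storage cost: 1995 needs 2 bytes (16 bits) in BCD but only 11 bits in binary.
print(len(packed), (1995).bit_length())
```

Each digit is stored exactly, which is the accuracy advantage; the price is that a nibble holds only ten of its sixteen possible values.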
Primitive Data Types: Boolean
Primitive Data Types: Character
• Stored as numeric codings
• Most commonly used coding: ASCII
• An alternative, 16-bit coding: Unicode
– Includes characters from most natural languages
– Originally used in Java
– C# and JavaScript also support Unicode
Character String Types
• Values are sequences of characters
Character String Types Operations
Character String Type in Certain Languages
NAME1 := NAME1 & NAME2; (catenation)
NAME1(2:7) (substring reference)
• C
• Java
– Primitive via the String class (not arrays of char)
– String objects cannot be changed (they are immutable)
– StringBuffer is a class for changeable string objects
• SNOBOL4 (a string manipulation language)
Character String Length Options
• Static length string: FORTRAN 90, COBOL, Pascal,
Java's String class, …
– FORTRAN 90: CHARACTER (LEN = 15) NAME1, NAME2
• Limited dynamic length string: C/C++
– In C-based languages, '\0' is used to indicate the end of a string's characters
• Dynamic length string: SNOBOL4, Perl, JavaScript
– It requires the overhead of dynamic storage
Character String Implementation
• Static length: compile-time descriptor
• Limited dynamic length: may need a run-time descriptor for length (but not in C/C++)
• Dynamic length: need run-time descriptor
– Strings can be stored in a linked list, or
– Strings can be stored completely in adjacent
storage cells
Character String Implementation
• Descriptor fields
– Static string (compile-time descriptor): length, address
– Limited dynamic string (run-time descriptor): maximum length, current length, address
– Dynamic string (run-time descriptor): current length, address
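A run-time descriptor for a limited dynamic length string can be sketched in Python as follows. The class name `LimitedDynamicString` and its fields are assumptions made for illustration; a Python list stands in for the address of the storage.

```python
from dataclasses import dataclass, field


@dataclass
class LimitedDynamicString:
    """Run-time descriptor: maximum length, current length, and storage."""
    max_length: int
    current_length: int = 0
    storage: list = field(default_factory=list)  # stands in for the address

    def assign(self, text: str) -> None:
        # The run-time check uses the descriptor's maximum length.
        if len(text) > self.max_length:
            raise ValueError("string exceeds declared maximum length")
        self.storage = list(text)
        self.current_length = len(text)


name = LimitedDynamicString(max_length=15)
name.assign("Ada Lovelace")
print(name.current_length, name.max_length)
```

A static length string would need only the length and address, both known at compile time, so no such run-time object is required.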
User-Defined Ordinal Types
• An ordinal type is one in which the range of
possible values can be easily associated with the set of positive integers
• Examples of primitive ordinal types in Java
– integer
– char
– boolean
• In many languages, users can define two kinds
of ordinal types: enumeration and subrange
Enumeration Types
• All possible values, which are named
constants, are provided in the definition
• Ada:
type DAYS is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
• Design issues
– Is an enumeration constant allowed to appear in
more than one type definition, and if so, how is the
Example
• Pascal - cannot reuse constants; they can be
used for array subscripts, for loops, case
selectors; no input or output; can be compared
• C/C++: cannot reuse constants in a given
referencing environment. Enumeration
values are implicitly converted to integer
int main(void) {
    enum months {Jan = 1, Feb, Mar, Apr, May, Jun,
                 Jul, Aug, Sep, Oct, Nov, Dec};
    int m = Mar;  /* implicit conversion to int: m is 3 */
    return 0;
}
Evaluation of Enumerated Type
• Aid to readability, e.g., no need to code a
color as a number
• Aid to reliability, e.g., compiler can check:
– Operations (don't allow colors to be added)
– No enumeration variable can be assigned a value outside its defined range
– Ada, C#, and Java 5.0: enumeration type variables
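The readability and reliability points can be illustrated with Python's standard `enum` module. The `Color` type is a made-up example, and Python performs these checks at run time rather than in a compiler.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


favorite = Color.GREEN
print(favorite.name)          # readable name instead of a bare number

try:
    Color.RED + Color.BLUE    # arithmetic on enumeration values is rejected
except TypeError as err:
    print("rejected:", type(err).__name__)

try:
    Color(7)                  # no value outside the defined set can be created
except ValueError as err:
    print("rejected:", type(err).__name__)
```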
Subrange Evaluation
• Aid to readability
– Make it clear to readers that variables of a
subrange type can store only a certain range of values
• Reliability
– Assigning a value to a subrange variable that is outside the specified range is detected as an
error
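Python has no built-in subrange types, so the following is only a sketch of the reliability check described above. The `Subrange` class is hypothetical, and the check happens at run time on every assignment.

```python
class Subrange:
    """A value restricted to the inclusive range low..high."""

    def __init__(self, low: int, high: int, value: int):
        self.low, self.high = low, high
        self.value = value          # goes through the checked setter below

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, new_value: int) -> None:
        if not self.low <= new_value <= self.high:
            raise ValueError(f"{new_value} is outside {self.low}..{self.high}")
        self._value = new_value


day_of_month = Subrange(1, 31, 15)
day_of_month.value = 28             # accepted
try:
    day_of_month.value = 32         # detected as an error
except ValueError as err:
    print(err)
```

The rejected assignment leaves the previous value (28) in place.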
Array Types
• An array is an aggregate of homogeneous
data elements in which an individual
element is identified by its position in the
aggregate, relative to the first element
Array Design Issues
• What types are legal for subscripts?
• Are subscripting expressions range checked?
• How many subscripts are allowed?
• When does allocation take place?
• Can array objects be initialized?
• Are any kinds of slices allowed?
Array Indexing
• Indexing (or subscripting) is a mapping
from indices to elements
array_name(index_value)
• Index syntax
– FORTRAN, PL/I, Ada use parentheses
– Most other languages use square brackets
Array Index (Subscript) Types
• FORTRAN, C: integer only
• Pascal: any ordinal type (integer, Boolean,
char, enumeration)
• Ada: integer or enumeration (includes
Boolean and char)
• Java: integer types only
• C/C++, Perl, and Fortran do not specify
Subscript Binding and Array Categories
• Static array: subscript ranges are statically
bound and storage allocation is static (before
run-time)
– Advantage: execution efficiency (no dynamic
allocation)
• Fixed stack-dynamic array: subscript ranges are
statically bound, but the allocation is done at
declaration elaboration time during execution
– Advantage: space efficiency
Subscript Binding and Array Categories (cont.)
• Stack-dynamic array: subscript ranges are
dynamically bound and the storage allocation
is dynamic (at run-time)
– Advantage: flexibility (the size of an array need not
be known until the array is to be used)
– Ada:
procedure foo (size: integer) is
M: array (1..size, 1..size) of real;
• The compiler arranges for a pointer to M to reside at a
static offset from the frame pointer
Subscript Binding and Array Categories (cont.)
• Fixed heap-dynamic array: storage binding
is dynamic but fixed after allocation (i.e.,
binding is done when requested and storage
is allocated from heap, not stack)
– Example: In FORTRAN 90
INTEGER, ALLOCATABLE, DIMENSION(:, :) :: MAT
ALLOCATE (MAT(10, NUMBER_OF_COLS))
Subscript Binding and Array Categories (cont.)
• Heap-dynamic array: binding of subscript
ranges and storage allocation is dynamic and can change any number of times
– Advantage: Flexibility (arrays can grow or shrink during program execution)
– Example: In Perl
@alpha = ("a".."z");
push(@array, <element>);
pop(@array);
Subscript Binding and Array Categories (cont.)
• C/C++ arrays that include the static modifier are
static
• C/C++ arrays without the static modifier are fixed
stack-dynamic
• Ada arrays can be stack-dynamic
• C/C++ provide fixed heap-dynamic arrays
• C# includes a second array class ArrayList that
Number of subscripts
• FORTRAN I allowed up to three
• FORTRAN 77 allows up to seven
• Others - no limit
Array Initialization
• Some languages allow initialization at the
time of storage allocation
– It allows the rows to have different lengths,
without devoting space to holes at the ends of the rows
"Tuesday", "Wednesday",
"Thursday", "Friday",
An example in C/C++
Slices
– A slice is a mechanism for referencing part of an array
as a unit
• Slices are only useful in languages that have array operations
Slice Examples
• Fortran 90
INTEGER MAT(1:3, 1:3), CUBE(1:3, 1:3, 1:4)
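For comparison, a Python sketch of the same idea on a 3x3 matrix. This is an analog rather than Fortran syntax: Python lists are 0-based, and the element values are made up.

```python
# A 3x3 matrix stored as a list of rows (0-based, unlike Fortran's 1-based MAT).
mat = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

second_row = mat[1][:]                        # a whole row as a unit
second_column = [row[1] for row in mat]       # a whole column as a unit
upper_left = [row[0:2] for row in mat[0:2]]   # a 2x2 sub-matrix

print(second_row)
print(second_column)
print(upper_left)
```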
Array Operations
• APL provides the most powerful array processing operations for vectors (V) and matrixes (M) as
well as unary operators
⌽V reverses the elements of V
⌽M reverses the columns of M
⊖M reverses the rows of M
⍉M transposes M
A + B is valid whether A and B are scalar variables, vectors,
or matrixes
Array Operations (cont.)
• Ada allows
– The assignment, arithmetic, relational operators
type ARY_INT is array(1..6) of INTEGER;
flag : BOOLEAN;
crowd, Group1, Group2 : ARY_INT;
Group1 := (12, 17, -1, 3, -100, 5);
Group2 := (13, -2, 22, 1, 1242, -12);
Array Operations (cont.)
– Logical operators
type ARY_BOOL is array(1..4) of BOOLEAN;
Result, Answer1, Answer2 : ARY_BOOL;
Answer1 := (TRUE, FALSE, TRUE, FALSE);
Answer2 := (TRUE, FALSE, FALSE, TRUE);
Result := Answer1 and Answer2;
Array Operations (cont.)
• Fortran 90 has a very rich set of array operations
– Any of the built-in arithmetic operators will take arrays as operands; the result is an array, of the same shape as the operands, whose elements are the result of applying the operator to corresponding elements
– Slices of the same shape can be intermixed in
array operations, even if the arrays from which
they were sliced have very different shapes
• It also includes library functions for matrix
multiplication, matrix transpose, and vector dot product
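The kinds of operations listed above can be sketched in plain Python. These helper functions are illustrative stand-ins written for this example, not the Fortran 90 intrinsics themselves.

```python
def elementwise_add(a, b):
    """Add two equally shaped matrices element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def transpose(m):
    """Turn the rows of m into columns."""
    return [list(column) for column in zip(*m)]


def dot(u, v):
    """Vector dot product."""
    return sum(x * y for x, y in zip(u, v))


def matmul(a, b):
    """Matrix product: each result element is a row of a dotted with a column of b."""
    b_columns = transpose(b)
    return [[dot(row, column) for column in b_columns] for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(elementwise_add(a, b))
print(transpose(a))
print(dot([1, 2, 3], [4, 5, 6]))
print(matmul(a, b))
```

Note that the elementwise sum and the matrix product are different operations on the same operands.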
Implementation of Arrays
• An access function maps subscript expressions
to an address in the array
• A single-dimensioned array is a list of
adjacent memory cells
• Hardware memory is linear. The elements of multi-dimensional arrays must be mapped
onto the single-dimensioned memory
• Row-major or column-major order
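The mapping from two subscripts to a linear address can be written out as a short Python sketch. It assumes 0-based subscripts, and the base address, element size, and array shape are made-up numbers.

```python
def row_major_offset(i, j, n_cols):
    """Offset (in elements) of a[i][j] when rows are stored one after another."""
    return i * n_cols + j


def column_major_offset(i, j, n_rows):
    """Offset (in elements) of a[i][j] when columns are stored one after another."""
    return j * n_rows + i


def address(base, offset, element_size):
    """Byte address of the element at the given element offset."""
    return base + offset * element_size


# Hypothetical 3x4 array of 8-byte elements starting at address 1000.
n_rows, n_cols, base, size = 3, 4, 1000, 8
print(address(base, row_major_offset(1, 2, n_cols), size))
print(address(base, column_major_offset(1, 2, n_rows), size))
```

The same element lands at different addresses under the two layouts, which is why code that shares arrays across languages has to agree on the order.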
Implementation of Arrays (cont.)
• The difference between row- and column-major layout can be important for programs that use
nested loops to access all the elements of a
large, multi-dimensional array
– If a small array is accessed frequently, all or most
of its elements are likely to remain in the cache
– For a large array, each miss will bring into the
cache not only the desired element, but the next several elements as well
– If elements are accessed across cache lines, then almost every access will result in a cache miss
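A rough way to see the effect is to count how often consecutive accesses land in a different cache line. The sketch below is a deliberately crude model: it assumes a row-major array, a line of 8 elements, and it counts line changes rather than simulating a real cache.

```python
def line_changes(n_rows, n_cols, by_rows, elements_per_line=8):
    """Count how often consecutive accesses fall in different cache lines,
    for an n_rows x n_cols array stored in row-major order."""
    if by_rows:
        order = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    else:
        order = [(i, j) for j in range(n_cols) for i in range(n_rows)]
    changes, previous_line = 0, None
    for i, j in order:
        line = (i * n_cols + j) // elements_per_line
        if line != previous_line:
            changes += 1
        previous_line = line
    return changes


print(line_changes(64, 64, by_rows=True))    # inner loop walks along a row
print(line_changes(64, 64, by_rows=False))   # inner loop walks down a column
```

With these assumed sizes, the row-wise loop changes line once per 8 accesses, while the column-wise loop changes line on every access.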
Compile-Time Descriptors
Associative Arrays
• An associative array is an unordered
collection of data elements that are indexed
by an equal number of values called keys
• Each element of an associative array is a pair
of entities: <key, value>
• The design issues that are specific for
associative arrays are:
– What is the form of references to elements?
– Is the size of an associative array static or
dynamic?
Associative Arrays - Perl
• In Perl, associative arrays are often called
hashes, because their
elements are stored and retrieved with hash functions
• The size of a Perl hash is dynamic: It grows when a new element is added and shrinks
when an element is deleted, and also it is
Associative Arrays – Perl (cont.)
• Structure and Operations
– Every hash variable must begin with a %
– Hashes can be set to literal values with the assignment statement
%fruit = ("apples" => 3, "oranges" => 6);
– Subscripting is done using braces and keys
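The same operations have a close analog in Python's built-in `dict`, shown here as a sketch rather than as Perl syntax. The extra key `"pears"` is made up for the example.

```python
fruit = {"apples": 3, "oranges": 6}   # analogous to the %fruit literal

fruit["pears"] = 4                    # adding an element grows the table
del fruit["apples"]                   # deleting an element shrinks it

print(fruit["oranges"])               # element reference by key
print("apples" in fruit)              # existence test
print(len(fruit))
```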
Record Types
• A record is a possibly heterogeneous
aggregate of data elements in which the
individual elements are identified by names
Definition of Records in COBOL
• COBOL uses level numbers to show nested
Definition of Records in Ada
• In Ada:
type DATE is
record
Month : INTEGER range 1..12;
Day : INTEGER range 1..31;
Year : INTEGER range 1776..2100;
end record;
type PERSON is
record
References to Records
• Most languages use dot notation
• Fully qualified references must include all record names
– COBOL:
MID OF EMP-NAME OF EMP-REC
– Ada
Student: PERSON; … Student.Birthday.Year …
• Elliptical references allow leaving out record
names as long as the reference is unambiguous
– COBOL
FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC
are elliptical references to the employee's first name
– Pascal
with Student do Name := 'Michael';
Operations on Records
• Assignment is very common if the types are identical
• Ada allows record comparison
• COBOL provides MOVE CORRESPONDING
– Copies a field of the source record to the
corresponding field in the target record
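The idea behind MOVE CORRESPONDING can be sketched in Python with dataclasses: copy each source field whose name also exists in the target. The record types and the `move_corresponding` helper are hypothetical, and this is an analogy rather than COBOL semantics in full.

```python
from dataclasses import dataclass, fields


@dataclass
class InputRecord:
    name: str
    hours: int
    rate: float


@dataclass
class OutputRecord:
    name: str
    hours: int
    net_pay: float = 0.0


def move_corresponding(source, target):
    """Copy every field of source whose name also appears in target."""
    target_names = {f.name for f in fields(target)}
    for f in fields(source):
        if f.name in target_names:
            setattr(target, f.name, getattr(source, f.name))


src = InputRecord("Pat", 40, 12.5)
dst = OutputRecord("", 0)
move_corresponding(src, dst)
print(dst)
```

Fields without a counterpart (`rate` in the source, `net_pay` in the target) are left alone.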
Implementation of record types
A compile-time
descriptor for a record
Memory layout – an example in Pascal
type element = record
metallic: Boolean end;
Evaluation and Comparison to Arrays
• Arrays are used when all the data values have the same type and are processed in the same way
• Records are used when the collection of data
values is heterogeneous and the different fields are not processed in the same way
• Access to array elements is much slower than
access to record fields, because subscripts are
Union Types
• A union is a type whose variables are
allowed to store different type values at
different times during execution
• Design issues
– Should type checking be required?
– Should unions be embedded in records?
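One way to make type checking of a union possible is to store an indicator of the current variant next to the value. The Python sketch below illustrates that idea; the `TaggedValue` class and `as_int` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TaggedValue:
    tag: str                          # records which variant is currently stored
    payload: Union[int, float, str]


def as_int(value: TaggedValue) -> int:
    # Checking the tag is the type check that an unchecked union would skip.
    if value.tag != "int":
        raise TypeError(f"value currently holds {value.tag}, not int")
    return value.payload


cell = TaggedValue("int", 42)
print(as_int(cell))
cell = TaggedValue("float", 3.5)      # the same variable now holds another type
try:
    as_int(cell)
except TypeError as err:
    print(err)
```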
Discriminated vs. Free Unions
• Fortran, C/C++ provide union constructs in which there is no language support for type checking; the union in these languages is
called a free union
• Type checking of unions requires that each
union include a type indicator called a tag (or