Lecture: Chapter 6, Data Types

Chapter 6

Data Types

Chapter 6 Topics

• Introduction

• Primitive Data Types

• Character String Types

• User-Defined Ordinal Types

Introduction

• A data type defines a collection of data values and a set of predefined operations on those values

• Some data types are specified by type constructors, which are used to form type expressions ('[]', '*', and '()' in C for array, pointer, and function, respectively)

• A descriptor is the collection of the attributes of a variable. It is used for type checking and by allocation and deallocation operations

Primitive Data Types

• Almost all programming languages provide a set of primitive data types

– Primitive data types: those not defined in terms of other data types

• Some primitive data types are merely reflections of the hardware

• Others require a little software support

Primitive Data Types: Integer

• Almost always an exact reflection of the hardware, so the mapping is trivial

– Java: byte, short, int, long

– Ada: SHORT_INTEGER, INTEGER, and LONG_INTEGER

• In sign-magnitude notation, the leftmost bit is set to indicate a negative number and the remainder of the bit string represents the absolute value of the number; most current hardware instead stores negative integers in two's complement
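The two encodings of a negative integer can be compared with a small C sketch. The function names and the 8-bit width are assumptions chosen for illustration; they do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Encode v (-127..127) in 8-bit sign-magnitude: bit 7 is the sign,
   bits 0-6 hold the absolute value. */
uint8_t sign_magnitude8(int v) {
    assert(v >= -127 && v <= 127);
    uint8_t magnitude = (uint8_t)(v < 0 ? -v : v);
    return (uint8_t)((v < 0 ? 0x80u : 0x00u) | magnitude);
}

/* Encode v (-128..127) in 8-bit two's complement. Conversion to an
   unsigned type is defined as reduction modulo 256, which yields
   exactly the two's-complement bit pattern. */
uint8_t twos_complement8(int v) {
    assert(v >= -128 && v <= 127);
    return (uint8_t)v;
}
```

For -5, the first function yields the pattern 1000 0101 and the second 1111 1011; non-negative values are encoded identically in both schemes.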

Primitive Data Types: Floating Point

• Model real numbers, but only as approximations for most real values

• Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more)

• The representation usually mirrors the hardware (most current machines use the IEEE 754 formats), but not always
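The fact that floating-point values are only approximations can be observed directly. The sketch below assumes IEEE 754 double precision; both helper names are hypothetical.

```c
#include <assert.h>

/* Returns 1 if a + b compares exactly equal to expected. The volatile
   store forces the sum to be rounded to double before the comparison. */
int exact_sum_equals(double a, double b, double expected) {
    volatile double sum = a + b;
    return sum == expected;
}

/* Returns 1 if x and y differ by at most tolerance. */
int nearly_equal(double x, double y, double tolerance) {
    assert(tolerance >= 0.0);
    double difference = x - y;
    if (difference < 0.0) difference = -difference;
    return difference <= tolerance;
}
```

Neither 0.1 nor 0.2 has an exact binary representation, so `exact_sum_equals(0.1, 0.2, 0.3)` is false under IEEE 754 doubles, while a tolerance-based comparison such as `nearly_equal(0.1 + 0.2, 0.3, 1e-9)` succeeds. Sums of exactly representable values, such as 0.5 + 0.25, compare equal.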

IEEE Floating Point Formats

Primitive Data Types: Decimal

• For business applications

• Store a fixed number of decimal digits

• The representation is called binary-coded decimal (BCD)

– Two digits per byte

• Advantage

– Accuracy

• Disadvantages

– The range of values is restricted because no exponents are allowed

– BCD takes more storage than binary representation
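A minimal sketch of packed BCD (two digits per byte) follows. The function name, the most-significant-digit-first order, and the error convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack the decimal digits of value into out[0..nbytes-1] as packed BCD:
   two digits per byte, most significant digits first, zero-padded on
   the left. Returns 0 on success, -1 if value has too many digits. */
int bcd_pack(unsigned long value, uint8_t *out, size_t nbytes) {
    assert(out != NULL);
    for (size_t i = nbytes; i > 0; i--) {
        unsigned low = (unsigned)(value % 10);
        value /= 10;
        unsigned high = (unsigned)(value % 10);
        value /= 10;
        out[i - 1] = (uint8_t)((high << 4) | low);
    }
    return value == 0 ? 0 : -1;
}
```

Packing 1234 into two bytes gives 0x12 0x34. The storage cost noted above is visible here: two bytes of packed BCD hold at most 9999, whereas two bytes of pure binary hold values up to 65535.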

Primitive Data Types: Boolean

Primitive Data Types: Character

• Stored as numeric codings

• Most commonly used coding: ASCII

• An alternative, 16-bit coding: Unicode

– Includes characters from most natural languages

– Originally used in Java

– C# and JavaScript also support Unicode

Character String Types

• Values are sequences of characters

Character String Types Operations

Character String Type in Certain Languages

NAME1 := NAME1 & NAME2; (catenation)

NAME1(2:7) (substring reference)

• C

• Java

– Primitive via the String class (not arrays of char)

– String objects cannot be changed (they are immutable)

– StringBuffer is a class for changeable string objects

• SNOBOL4 (a string manipulation language)

Character String Length Options

• Static length string: FORTRAN 90, COBOL, Pascal, Java's String class, …

– FORTRAN 90: CHARACTER (LEN = 15) NAME1, NAME2

• Limited dynamic length string: C/C++

– In C-based languages, '\0' is used to indicate the end of a string's characters

• Dynamic length string: SNOBOL4, Perl, JavaScript

– It requires the overhead of dynamic storage allocation

Character String Implementation

• Static length: compile-time descriptor

• Limited dynamic length: may need a run-time descriptor for length (but not in C/C++)

• Dynamic length: needs a run-time descriptor

– Strings can be stored in a linked list, or

– Strings can be stored completely in adjacent storage cells

Character String Implementation

Static string            Limited dynamic string   Dynamic string
Length                   Maximum length           Current length
Address                  Current length           Address
                         Address
Compile-time descriptor  Run-time descriptor      Run-time descriptor
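A run-time descriptor for a limited dynamic length string could be sketched in C as follows. The struct, its field names, and the truncation policy are hypothetical; C itself uses no such descriptor and relies on the '\0' terminator instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical run-time descriptor for a limited dynamic length string. */
typedef struct {
    size_t max_length;      /* fixed when the string is created */
    size_t current_length;  /* changes as the value changes */
    char  *address;         /* storage for max_length characters */
} LimitedString;

/* Copy src into s, truncating to s->max_length. Returns the number of
   characters stored. No terminator is written: the descriptor records
   the length. */
size_t limited_assign(LimitedString *s, const char *src) {
    assert(s != NULL && s->address != NULL && src != NULL);
    size_t n = strlen(src);   /* C convention: scan for the '\0' */
    if (n > s->max_length) n = s->max_length;
    memcpy(s->address, src, n);
    s->current_length = n;
    return n;
}
```

With a five-character buffer, assigning "abc" sets the current length to 3, and assigning "abcdefgh" stores only "abcde". Truncation is one possible policy; a language could equally raise an error.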

User-Defined Ordinal Types

• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers

• Examples of primitive ordinal types in Java

– integer

– char

– boolean

• In many languages, users can define two kinds of ordinal types: enumeration and subrange

Enumeration Types

• All possible values, which are named constants, are provided in the definition

• Ada:

type DAYS is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);

• Design issues

– Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the

Example

• Pascal: constants cannot be reused; they can be used for array subscripts, for loops, and case selectors; they can be compared, but cannot be input or output

• C/C++: constants cannot be reused in a given referencing environment. Enumeration values are implicitly converted to integer

int main(void) {
    enum months {Jan = 1, Feb, Mar, Apr, May, Jun,
                 Jul, Aug, Sep, Oct, Nov, Dec};
    return 0;
}
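The implicit conversion of enumeration values to integers in C can be illustrated with a short sketch. The two helper functions are hypothetical and exist only to show that no cast is needed.

```c
#include <assert.h>

enum months {Jan = 1, Feb, Mar, Apr, May, Jun,
             Jul, Aug, Sep, Oct, Nov, Dec};

/* An enum value converts to int without a cast. */
int month_number(enum months m) {
    return m;
}

/* Because enumerators are plain integers in C, arithmetic on them
   compiles; here it computes how many months follow m in the year. */
int months_remaining(enum months m) {
    assert(m >= Jan && m <= Dec);
    return Dec - m;
}
```

Since `Jan` is given the value 1 and the rest count upward, `month_number(Feb)` is 2 and `months_remaining(Oct)` is 2. The compiler accepts this arithmetic silently, which is the reliability weakness that stricter enumeration types avoid.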

Evaluation of Enumerated Type

• Aid to readability, e.g., no need to code a color as a number

• Aid to reliability, e.g., compiler can check:

– Operations (don't allow colors to be added)

– No enumeration variable can be assigned a value outside its defined range

– Ada, C#, and Java 5.0: enumeration type variables

Subrange Evaluation

• Aid to readability

– Makes it clear to readers that variables of a subrange type can store only a certain range of values

• Reliability

– Assigning a value to a subrange variable that is outside the specified range is detected as an error
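C has no subrange types, so the range check that a subrange provides has to be written by hand. The sketch below is one hypothetical way to emulate it at run time; the type and function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subrange descriptor: the inclusive bounds a value
   must respect. */
typedef struct {
    int low;
    int high;
} Subrange;

/* Store value in *target only if it lies within range.
   Returns 0 on success, -1 on a range error (target is unchanged). */
int subrange_assign(Subrange range, int value, int *target) {
    assert(target != NULL && range.low <= range.high);
    if (value < range.low || value > range.high) return -1;
    *target = value;
    return 0;
}
```

With a range of 1 to 31 for a day of the month, assigning 15 succeeds and assigning 32 is rejected. A language with built-in subranges inserts an equivalent check automatically.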

Array Types

• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element

Array Design Issues

• What types are legal for subscripts?

• Are subscripting expressions range checked?

• How many subscripts are allowed?

• When does allocation take place?

• Can array objects be initialized?

• Are any kinds of slices allowed?

Array Indexing

• Indexing (or subscripting) is a mapping from indices to elements

array_name(index_value)

• Index syntax

– FORTRAN, PL/I, Ada use parentheses

– Most other languages use square brackets

Array Index (Subscript) Types

• FORTRAN, C: integer only

• Pascal: any ordinal type (integer, Boolean, char, enumeration)

• Ada: integer or enumeration (includes Boolean and char)

• Java: integer types only

• C/C++, Perl, and Fortran do not specify

Subscript Binding and Array Categories

• Static array: subscript ranges are statically bound and storage allocation is static (before run-time)

– Advantage: execution efficiency (no dynamic allocation)

• Fixed stack-dynamic array: subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution

– Advantage: space efficiency

Subscript Binding and Array Categories (cont.)

• Stack-dynamic array: subscript ranges are dynamically bound and the storage allocation is dynamic (at run-time)

– Advantage: flexibility (the size of an array need not be known until the array is to be used)

– Ada:

procedure foo (size: integer) is
M: array (1 .. size, 1 .. size) of real;

• The compiler arranges for a pointer to M to reside at a static offset from the frame pointer

• Fixed heap-dynamic array: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested and storage is allocated from the heap, not the stack)

– Example: In FORTRAN 90

INTEGER, ALLOCATABLE, DIMENSION (:, :) :: MAT
ALLOCATE (MAT(10, NUMBER_OF_COLS))

• Heap-dynamic array: binding of subscript ranges and storage allocation is dynamic and can change any number of times

– Advantage: flexibility (arrays can grow or shrink during program execution)

– Example: In Perl

@alpha = ("a" .. "z");
push(@array, <element>);
pop(@array);

• C/C++ arrays that include the static modifier are static

• C/C++ arrays without the static modifier are fixed stack-dynamic

• Ada arrays can be stack-dynamic

• C/C++ provide fixed heap-dynamic arrays

• C# includes a second array class ArrayList that
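Three of the array categories available in C can be shown side by side. The function names and array sizes in this sketch are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Static array: storage is bound before run time and persists
   across calls. */
int record_call(void) {
    static int calls[1];
    return ++calls[0];
}

/* Fixed stack-dynamic array: the size is fixed at compile time, but
   storage is allocated each time the function is entered. */
int sum_first_squares(void) {
    int squares[4];
    int total = 0;
    for (int i = 0; i < 4; i++) {
        squares[i] = i * i;
        total += squares[i];
    }
    return total;
}

/* Fixed heap-dynamic array: the size is chosen at run time and the
   storage comes from the heap. Returns NULL if allocation fails;
   the caller must free() the result. */
int *make_zeroed(size_t n) {
    assert(n > 0);
    return calloc(n, sizeof(int));
}
```

Successive calls to `record_call` return 1, then 2, because the static array keeps its contents between calls, whereas `squares` is created afresh on every call. The heap array lives until it is explicitly freed.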

Number of subscripts

• FORTRAN I allowed up to three

• FORTRAN 77 allows up to seven

• Others - no limit

Array Initialization

• Some languages allow initialization at the time of storage allocation

– It allows the rows to have different lengths, without devoting space to holes at the ends of the rows

"Tuesday", "Wednesday",

"Thursday", "Friday",

An example in C/C++

Slices

– A slice is a mechanism for referencing part of an array as a unit

• Slices are only useful in languages that have array operations

Slice Examples

• Fortran 90

INTEGER MAT(1:3, 1:3), CUBE(1:3, 1:3, 1:4)

Array Operations

• APL provides the most powerful array processing operations for vectors (V) and matrices (M), as well as unary operators

⌽V reverses the elements of V

⌽M reverses the columns of M

⊖M reverses the rows of M

⍉M transposes M

A + B whether A and B are scalar variables, vectors, or matrices

Array Operations (cont.)

• Ada allows

– The assignment, arithmetic, and relational operators

type ARY_INT is array(1 .. 6) of INTEGER;
flag : BOOLEAN;
crowd, Group1, Group2 : ARY_INT;
Group1 := (12, 17, -1, 3, -100, 5);
Group2 := (13, -2, 22, 1, 1242, -12);

– Logical operators

type ARY_BOOL is array(1 .. 4) of BOOLEAN;
Result, Answer1, Answer2 : ARY_BOOL;
Answer1 := (TRUE, FALSE, TRUE, FALSE);
Answer2 := (TRUE, FALSE, FALSE, TRUE);
Result := Answer1 and Answer2;

• Fortran 90 has a very rich set of array operations

– Any of the built-in arithmetic operators will take arrays as operands; the result is an array of the same shape as the operands, whose elements are the result of applying the operator to corresponding elements

– Slices of the same shape can be intermixed in array operations, even if the arrays from which they were sliced have very different shapes

• It also includes library functions for matrix multiplication, matrix transpose, and vector dot product

Implementation of Arrays

• The access function maps subscript expressions to an address in the array

• A single-dimensioned array is a list of adjacent memory cells

• Hardware memory is linear. The elements of multi-dimensional arrays must be mapped onto the single-dimensioned memory

• Row-major or column-major order
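The mapping from a two-dimensional subscript pair onto linear memory can be written out for both orders. The sketch assumes zero-based subscripts and counts offsets in elements rather than bytes; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in elements) of element (i, j) in an array with ncols
   columns that is stored row by row. */
size_t row_major_offset(size_t i, size_t j, size_t ncols) {
    assert(j < ncols);
    return i * ncols + j;
}

/* Offset of the same element when the array, which has nrows rows,
   is stored column by column. */
size_t column_major_offset(size_t i, size_t j, size_t nrows) {
    assert(i < nrows);
    return j * nrows + i;
}
```

For a 3 x 4 array, element (1, 2) sits at offset 1 * 4 + 2 = 6 in row-major order and at offset 2 * 3 + 1 = 7 in column-major order. The address is the base address plus the offset times the element size.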

Implementation of Arrays (cont.)

• The difference between row- and column-major layout can be important for programs that use nested loops to access all the elements of a large, multi-dimensional array

– If a small array is accessed frequently, all or most of its elements are likely to remain in the cache

– For a large array, each miss will bring into the cache not only the desired element, but the next several elements as well

– If elements are accessed across cache lines, then almost every access will result in a cache miss
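The two loop orders can be sketched in C, which stores arrays in row-major order. The array dimensions and function names are assumptions; both functions compute the same sum and differ only in the order in which memory is touched.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* Inner loop walks along a row: consecutive accesses touch adjacent
   memory cells, so each cache line fetched is fully used. */
long sum_row_by_row(int a[ROWS][COLS]) {
    assert(a != NULL);
    long total = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            total += a[i][j];
    return total;
}

/* Inner loop walks down a column: consecutive accesses are COLS
   elements apart, so a large array may miss on nearly every access. */
long sum_column_by_column(int a[ROWS][COLS]) {
    assert(a != NULL);
    long total = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            total += a[i][j];
    return total;
}
```

An array this small fits in the cache either way; the difference in running time only becomes noticeable when the array is much larger than the cache.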

Compile-Time Descriptors

Associative Arrays

• An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys

• Each element of an associative array is a pair of entities: <key, value>

• The design issues that are specific to associative arrays are:

– What is the form of references to elements?

– Is the size of an associative array static or dynamic?

Associative Arrays - Perl

• In Perl, associative arrays are often called hashes, because their elements are stored and retrieved with hash functions

• The size of a Perl hash is dynamic: it grows when a new element is added and shrinks when an element is deleted, and also it is

Associative Arrays – Perl (cont.)

• Structure and Operations

– Every hash variable must begin with a %

– Hashes can be set to literal values with the assignment statement

%fruit = ("apples" => 3, "oranges" => 6);

– Subscripting is done using braces and keys
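How keys are stored and retrieved with a hash function can be sketched in C, which has no built-in associative array. This is a minimal illustration with a fixed capacity and linear probing, not a description of how Perl implements hashes; every name in it is hypothetical, and deletion is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CAPACITY = 16 };   /* fixed size, unlike a Perl hash */

typedef struct {
    const char *keys[CAPACITY];   /* NULL marks an empty slot */
    int values[CAPACITY];
} Table;

/* A simple string hash; any reasonable hash function would do. */
static size_t hash(const char *key) {
    size_t h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h % CAPACITY;
}

/* Find the slot holding key, or the empty slot where it belongs.
   Returns CAPACITY if the table is full and the key is absent. */
static size_t find_slot(const Table *t, const char *key) {
    size_t start = hash(key);
    for (size_t probe = 0; probe < CAPACITY; probe++) {
        size_t i = (start + probe) % CAPACITY;
        if (t->keys[i] == NULL || strcmp(t->keys[i], key) == 0) return i;
    }
    return CAPACITY;
}

/* Insert or update. Returns 0 on success, -1 if the table is full.
   Only the key pointer is stored; the caller keeps the string alive. */
int table_put(Table *t, const char *key, int value) {
    assert(t != NULL && key != NULL);
    size_t i = find_slot(t, key);
    if (i == CAPACITY) return -1;
    t->keys[i] = key;
    t->values[i] = value;
    return 0;
}

/* Look up key. Returns 1 and writes *value if found, otherwise 0. */
int table_get(const Table *t, const char *key, int *value) {
    assert(t != NULL && key != NULL && value != NULL);
    size_t i = find_slot(t, key);
    if (i == CAPACITY || t->keys[i] == NULL) return 0;
    *value = t->values[i];
    return 1;
}
```

A table initialized with `Table fruit = {0};` starts empty. After storing "apples" with 3 and "oranges" with 6, looking up "apples" yields 3, and looking up a key that was never stored reports that it is absent.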

Record Types

• A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names

Definition of Records in COBOL

• COBOL uses level numbers to show nested

Definition of Records in Ada

• In Ada:

type DATE is
record
Month : INTEGER range 1 .. 12;
Day : INTEGER range 1 .. 31;
Year : INTEGER range 1776 .. 2100;
end record;

type PERSON is
record

References to Records

• Most languages use dot notation

• Fully qualified references must include all record names

– COBOL:

MID OF EMP-NAME OF EMP-REC

– Ada:

Student: PERSON; … Student.Birthday.Year …

• Elliptical references allow leaving out record names as long as the reference is unambiguous

– COBOL:

FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC are elliptical references to the employee's first name

– Pascal (with statement):

with Student do Name := 'Michael';

Operations on Records

• Assignment is very common if the types are identical

• Ada allows record comparison

• COBOL provides MOVE CORRESPONDING

– Copies a field of the source record to the corresponding field in the target record

Implementation of record types

A compile-time descriptor for a record

Memory layout – an example in Pascal

type element = record

metallic: Boolean end;

Evaluation and Comparison to Arrays

• Arrays are used when all the data values have the same type and are processed in the same way

• Records are used when the collection of data values is heterogeneous and the different fields are not processed in the same way

• Access to array elements is much slower than access to record fields, because subscripts are

Union Types

• A union is a type whose variables are allowed to store different type values at different times during execution

• Design issues

– Should type checking be required?

– Should unions be embedded in records?

Discriminated vs Free Unions

• Fortran, C, and C++ provide union constructs in which there is no language support for type checking; the union in these languages is called a free union

• Type checking of unions requires that each union include a type indicator called a tag (or
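The difference between a free union and a tagged one can be sketched in C. Since the language does not check union accesses, the tag here is maintained by the programmer; all type and function names are hypothetical.

```c
#include <assert.h>

/* Free union: nothing records which member was stored last. */
union Number {
    int    as_int;
    double as_double;
};

/* Discriminated union emulated by hand: a tag travels with the value. */
typedef enum { TAG_INT, TAG_DOUBLE } Tag;

typedef struct {
    Tag tag;
    union Number value;
} TaggedNumber;

TaggedNumber make_int(int i) {
    TaggedNumber n;
    n.tag = TAG_INT;
    n.value.as_int = i;
    return n;
}

TaggedNumber make_double(double d) {
    TaggedNumber n;
    n.tag = TAG_DOUBLE;
    n.value.as_double = d;
    return n;
}

/* Read the value as a double, consulting the tag before touching
   the union so that only the member last stored is read. */
double to_double(TaggedNumber n) {
    assert(n.tag == TAG_INT || n.tag == TAG_DOUBLE);
    return n.tag == TAG_INT ? (double)n.value.as_int : n.value.as_double;
}
```

The constructors keep the tag and the stored member consistent, so `to_double(make_int(3))` is 3.0 and `to_double(make_double(2.5))` is 2.5. Nothing stops other code from writing to `value` directly and leaving the tag stale, which is why the language-level check matters.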

Title: Data Types
Publisher: Addison-Wesley
Type: chapter
Year of publication: 2006