Chapter 6: Data Types

• Primitive Data Types

• Character String Types

• User-Defined Ordinal Types

• Array Types

• A data type defines a collection of data values and a set of predefined operations on those values
• Some data types are specified by type operators, which are used to form type expressions ('[]', '*', '()' in C for array, pointer, and function, respectively)
• A descriptor is the collection of the attributes of a variable. It is used for type checking and by allocation and deallocation operations
• Design issue: what operations are provided for variables of the type, and how are they specified?

Primitive Data Types

• Almost all programming languages provide a set of primitive data types
– Primitive data types: those not defined in terms of other data types
• Some primitive data types are merely reflections of the hardware
• Others require only a little software support

Primitive Data Types: Integer

• Almost always an exact reflection of the hardware, so the mapping is trivial
– Java: byte, short, int, long
– Ada: SHORT_INTEGER, INTEGER, and LONG_INTEGER
• Sign-magnitude notation: the leftmost bit is set to indicate negative, and the remainder of the bit string represents the absolute value of the number
• Most computers now use two's complement to store negative integers, which is convenient for addition and subtraction
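The two notations for negative integers can be compared with a short Python sketch. The helper names (sign_magnitude, twos_complement, add_bits) and the 8-bit width are assumptions chosen for illustration; they do not come from the slides.

```python
def sign_magnitude(n, bits=8):
    """Return n as a bit string: one sign bit followed by the absolute value."""
    if abs(n) >= 1 << (bits - 1):
        raise ValueError("value does not fit in the given width")
    sign = 1 if n < 0 else 0
    return format((sign << (bits - 1)) | abs(n), f"0{bits}b")


def twos_complement(n, bits=8):
    """Return n as a two's complement bit string."""
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    return format(n & ((1 << bits) - 1), f"0{bits}b")


def add_bits(a, b, bits=8):
    """Add two bit strings as unsigned integers, discarding any carry out."""
    return format((int(a, 2) + int(b, 2)) & ((1 << bits) - 1), f"0{bits}b")


print(sign_magnitude(-5))                                   # 10000101
print(twos_complement(-5))                                  # 11111011
print(add_bits(twos_complement(-5), twos_complement(5)))    # 00000000
print(add_bits(sign_magnitude(-5), sign_magnitude(5)))      # 10001010
```

With two's complement, the ordinary unsigned adder gives -5 + 5 = 0. The same adder applied to the sign-magnitude strings yields 10001010, which reads as -10, so sign-magnitude would need separate handling of the sign bit.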

Primitive Data Types: Floating Point

• Model real numbers, but only as approximations for most real values
• Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more)
• Usually exactly like the hardware, but not always

IEEE Floating Point Formats
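As a rough illustration, the IEEE 754 single-precision format packs 1 sign bit, 8 exponent bits (biased by 127), and 23 fraction bits into 32 bits. The sketch below pulls those fields apart using Python's standard struct module; the function name float32_fields is an assumption made for this example.

```python
import struct


def float32_fields(x):
    """Split x, rounded to IEEE 754 single precision, into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent (bias 127)
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    return sign, exponent, fraction


print(float32_fields(1.0))        # (0, 127, 0)
print(float32_fields(-0.15625))   # (1, 124, 2097152)
print(0.1 + 0.2 == 0.3)           # False: most decimal fractions are only approximated
```

The last line shows the approximation at work: neither 0.1 nor 0.2 has an exact binary representation, so their sum differs slightly from the stored value of 0.3.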

Primitive Data Types: Decimal

• For business applications

• Store a fixed number of decimal digits

• The representation is called binary coded decimal (BCD)
– Two digits per byte
– The range of values is restricted because no exponents are allowed
– Wastes memory
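A minimal sketch of packed BCD, assuming non-negative integers and the hypothetical helper names to_packed_bcd and from_packed_bcd: each byte holds two decimal digits, one per 4-bit half.

```python
def to_packed_bcd(n):
    """Encode a non-negative integer as packed BCD, two decimal digits per byte."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits          # pad to an even number of digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def from_packed_bcd(data):
    """Decode packed BCD bytes back into an integer."""
    return int("".join(f"{byte >> 4}{byte & 0x0F}" for byte in data))


packed = to_packed_bcd(1234)
print(packed.hex())              # 1234
print(len(packed))               # 2
print(from_packed_bcd(packed))   # 1234
```

Two BCD bytes cover only 0 to 9999, whereas two bytes of pure binary cover 0 to 65535, which is the memory cost the slide refers to.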

Primitive Data Types: Boolean

• Simplest of all
• Range of values: two elements, one for "true" and one for "false"
• Could be implemented as bits, but often as bytes
– Advantage: readability

Primitive Data Types: Character

• Stored as numeric codings
• Most commonly used coding: ASCII
• An alternative, 16-bit coding: Unicode (UCS-2)
– Includes characters from most natural languages
– Originally used in Java
– C# and JavaScript also support Unicode
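The idea that characters are stored as numeric codes can be seen directly in Python, used here only as a convenient illustration. The helper fits_in_ascii and the sample characters are assumptions of this sketch.

```python
def fits_in_ascii(text):
    """Report whether every character of text has an ASCII code."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


ascii_code = ord("A")                        # numeric code of 'A'
letter = chr(97)                             # character whose code is 97
utf16 = "\u00e9".encode("utf-16-be")         # 'e' with acute accent as one 16-bit unit

print(ascii_code)                # 65
print(letter)                    # a
print(utf16)                     # b'\x00\xe9'
print(fits_in_ascii("A"))        # True
print(fits_in_ascii("\u00e9"))   # False
```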

Character String Types

• Values are sequences of characters

Character String Types Operations

Character String Type in Certain Languages

NAME1 := NAME1 & NAME2; (catenation)

NAME1(2:7) (substring reference)

• C
– Not primitive
– Use char arrays and a library of functions that provide operations

Character String Type in Certain Languages (cont.)

• Java
– Primitive via the String class (not arrays of char)
– String objects cannot be changed (they are immutable)
– StringBuffer is a class for changeable string objects
• SNOBOL4 (a string manipulation language)
– Primitive
– Many operations, including elaborate pattern matching

LETTER = 'abcdefghijklmnopqrstuvwxyz';
WORDPAT = BREAK(LETTER) SPAN(LETTER) WORD

Character String Length Options

• Static length string: FORTRAN 90, COBOL, Pascal, Java's String class, …
– FORTRAN 90: CHARACTER (LEN = 15) NAME1, NAME2
• Limited dynamic length string: C/C++
– In C-based languages, '\0' is used to indicate the end of a string's characters
• Dynamic length string: SNOBOL4, Perl, JavaScript
– It requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility
• Ada supports all three string length options
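The limited dynamic length option can be sketched by imitating a C-style buffer in Python: the capacity is fixed when the buffer is created, and a zero byte marks where the current string ends. The names c_strlen and store and the 8-byte capacity are assumptions for this illustration, not part of any C library binding.

```python
def c_strlen(buffer):
    """Count the bytes before the first zero byte, as C's strlen does."""
    for length, byte in enumerate(buffer):
        if byte == 0:
            return length
    raise ValueError("no terminator found")


def store(buffer, text):
    """Copy text into the fixed-size buffer and terminate it with a zero byte."""
    if len(text) + 1 > len(buffer):
        raise ValueError("string exceeds the declared maximum length")
    buffer[:len(text)] = text
    buffer[len(text)] = 0


name = bytearray(8)       # fixed capacity: at most 7 characters plus the terminator
store(name, b"cat")
print(c_strlen(name))     # 3
store(name, b"horse")
print(c_strlen(name))     # 5
```

The current length varies between 0 and the fixed maximum, and no separate length field is kept: the terminator carries that information, which is why C needs no run-time descriptor for string length.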

Character String Implementation

• Static length: compile-time descriptor
• Limited dynamic length: may need a run-time descriptor for length (but not in C/C++)
• Dynamic length: needs a run-time descriptor
– Strings can be stored in a linked list, or
– Strings can be stored completely in adjacent storage cells

Character String Implementation (cont.)

Descriptor fields:
• Static string (compile-time descriptor): length, address
• Limited dynamic string (run-time descriptor): maximum length, current length, address
• Dynamic string (run-time descriptor): current length, address

User-Defined Ordinal Types

• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers
• Examples of primitive ordinal types in Java
– integer
– char
– boolean
• In many languages, users can define two kinds of ordinal types: enumeration and subrange

• Design issues for enumeration types:
– Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant checked?
– Are enumeration values coerced to integer?

Example

• Pascal: cannot reuse constants; they can be used for array subscripts, for loops, and case selectors; no input or output; can be compared

• C/C++: like Pascal, cannot reuse constants in a given referencing environment
• Enumeration values are implicitly converted to integer

#include <stdio.h>

int main(void) {
    enum months {Jan = 1, Feb, Mar, Apr, May, Jun,
                 Jul, Aug, Sep, Oct, Nov, Dec};
    enum months month;
    /* The enumeration value is converted to int for %d. */
    printf("%d\n", month = Sep);
    return 0;
}

Evaluation of Enumerated Type

• Aid to readability, e.g., no need to code a color as a number
• Aid to reliability, e.g., the compiler can check:
– Operations (don't allow colors to be added)
– No enumeration variable can be assigned a value outside its defined range
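Both reliability checks can be observed with Python's standard enum module, used here as a stand-in. Python performs these checks at run time rather than at compile time, and the Color type and helper names are assumptions of this sketch.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


def can_add(a, b):
    """Report whether a + b is an allowed operation."""
    try:
        a + b
    except TypeError:
        return False
    return True


def is_valid_color(value):
    """Report whether value corresponds to a member of Color."""
    try:
        Color(value)
    except ValueError:
        return False
    return True


print(can_add(Color.RED, Color.BLUE))   # False: colors cannot be added
print(is_valid_color(2))                # True
print(is_valid_color(7))                # False: outside the defined range
```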

subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
…
Day1: Days;
Day2: Weekdays;
Day2 := Day1;

Subrange Evaluation

• Aid to readability
– Makes it clear to readers that variables of a subrange type can store only a certain range of values
• Reliability
– Assigning a value to a subrange variable that is outside the specified range is detected as an error
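A subrange check can be approximated in a language without subrange types by validating each assignment explicitly. The Subrange class below is a hypothetical sketch, with the bounds 1 and 100 chosen only as an example.

```python
class Subrange:
    """A closed integer range low..high that validates candidate values."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def check(self, value):
        """Return value unchanged, or raise if it lies outside the range."""
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value


index = Subrange(1, 100)
print(index.check(42))       # 42
try:
    index.check(101)
except ValueError as error:
    print(error)             # 101 is outside 1..100
```

In a language with built-in subranges the compiler inserts this test automatically, so the programmer cannot forget it.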

Array Types

• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element

Array Design Issues

• What types are legal for subscripts?

• Are subscripting expressions range checked?

• How many subscripts are allowed?

• When does allocation take place?

• Can array objects be initialized?

• Are any kinds of slices allowed?
• FORTRAN, PL/I, and Ada use parentheses for subscripts
• Most other languages use brackets

Array Index (Subscript) Types

• FORTRAN, C: integer only

• Pascal: any ordinal type (integer, Boolean, char, enumeration)

• Ada: integer or enumeration (includes Boolean and char)

• Java: integer types only

• C/C++, Perl, and Fortran do not specify range checking

• Java, ML, C# specify range checking
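Range checking amounts to testing each subscript against the array bounds before the access. Python already raises IndexError for an out-of-range list subscript (although negative subscripts count from the end), so the hypothetical helper get_checked below exists only to make the test visible.

```python
def get_checked(items, subscript):
    """Return items[subscript], rejecting any subscript outside 0..len-1."""
    if not 0 <= subscript < len(items):
        raise IndexError(
            f"subscript {subscript} out of range 0..{len(items) - 1}")
    return items[subscript]


scores = [70, 85, 90]
print(get_checked(scores, 2))    # 90
try:
    get_checked(scores, 3)
except IndexError as error:
    print(error)                 # subscript 3 out of range 0..2
```

A language that does not specify range checking would instead read whatever happens to lie beyond the array, which is the reliability cost of skipping the test.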

Subscript Binding and Array Categories

• Static array: subscript ranges are statically bound and storage allocation is static (before run-time)
– Advantage: execution efficiency (no dynamic allocation)

Subscript Binding and Array Categories (cont.)

• Stack-dynamic array: subscript ranges are dynamically bound and the storage allocation is dynamic (at run-time)
– Advantage: flexibility (the size of an array need not be known until the array is to be used)
– Ada:

procedure foo (size: integer) is
    M: array (1..size, 1..size) of real;

• The compiler arranges for a pointer to M to reside at a static offset from the frame pointer

• Fixed heap-dynamic array: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested and storage is allocated from the heap, not the stack)
– Example: in FORTRAN 90

INTEGER, ALLOCATABLE, DIMENSION (:, :) :: MAT
ALLOCATE (MAT(10, NUMBER_OF_COLS))
DEALLOCATE (MAT)

– Example: malloc/free (C), new/delete (C++)
• Heap-dynamic array: binding of subscript ranges and storage allocation is dynamic and can change any number of times
– Advantage: flexibility (arrays can grow or shrink during program execution)
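Python lists behave like heap-dynamic arrays and give a quick way to see growth and shrinkage during execution. The variable names and values are arbitrary choices for this sketch.

```python
values = []                  # starts empty
for n in (10, 20, 30):
    values.append(n)         # grows one element at a time
grown = len(values)

values.pop()                 # shrinks by removing the last element
del values[0]                # shrinks by removing the first element

print(grown, len(values), values)   # 3 1 [20]
```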

• C/C++ arrays that include the static modifier are static
• C/C++ arrays without the static modifier are fixed stack-dynamic
• Ada arrays can be stack-dynamic
• C/C++ provide fixed heap-dynamic arrays
• C# includes a second array class, ArrayList, that provides fixed heap-dynamic arrays

• Perl and JavaScript support heap-dynamic arrays

Number of subscripts

• FORTRAN I allowed up to three

• FORTRAN 77 allows up to seven

• Others: no limit

SCORE : array (1..5) of Integer :=

• A slice is some substructure of an array
– It is a mechanism for referencing part of an array as a unit
• Slices are only useful in languages that have array operations
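Slicing can be illustrated with Python lists, where a contiguous part of a vector or a whole row of a nested list is referenced as a unit. A column is not directly sliceable in a plain nested list, so it is built element by element here; the sample data is arbitrary.

```python
vector = [2, 4, 6, 8, 10]
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

middle = vector[1:4]                          # elements at positions 1, 2, 3
second_row = matrix[1]                        # a whole row as a unit
second_column = [row[1] for row in matrix]    # a column, gathered from each row

print(middle)          # [4, 6, 8]
print(second_row)      # [4, 5, 6]
print(second_column)   # [2, 5, 8]
```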

• APL reversal operator:
– ⌽V reverses the elements of V
– ⌽M reverses the columns of M

Array Operations (cont.)

• Ada allows
– The assignment, arithmetic, and relational operators

type ARY_INT is array(1..6) of INTEGER;
flag : BOOLEAN;
crowd, Group1, Group2 : ARY_INT;
crowd := Group1 + Group2;

– Logical operators

type ARY_BOOL is array(1..4) of BOOLEAN;
Result, Answer1, Answer2 : ARY_BOOL;
Answer1 := (TRUE, FALSE, TRUE, FALSE);
Answer2 := (TRUE, FALSE, FALSE, TRUE);
Result := Answer1 and Answer2;
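What such whole-array operators compute can be sketched element by element in Python. The helper elementwise and the integer values in group1 and group2 are assumptions of this example; the Boolean values mirror Answer1 and Answer2 above.

```python
import operator


def elementwise(op, left, right):
    """Apply a binary operator to corresponding elements of two equal-length lists."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    return [op(a, b) for a, b in zip(left, right)]


group1 = [1, 2, 3, 4, 5, 6]
group2 = [10, 20, 30, 40, 50, 60]
crowd = elementwise(operator.add, group1, group2)

answer1 = [True, False, True, False]
answer2 = [True, False, False, True]
result = elementwise(operator.and_, answer1, answer2)

print(crowd)    # [11, 22, 33, 44, 55, 66]
print(result)   # [True, False, False, False]
```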

• Fortran 90 has a very rich set of array operations
– Any of the built-in arithmetic operators will take arrays as operands; the result is an array, of the same shape as the operands, whose elements are the result of applying the operator to corresponding elements
– Slices of the same shape can be intermixed in array operations, even if the arrays from which they were sliced have very different shapes
• Fortran 90 also includes library functions for matrix multiplication, matrix transpose, and vector dot product

Implementation of Arrays

• The access function maps subscript expressions to an address in the array
• A single-dimensioned array is a list of adjacent memory cells
• Hardware memory is linear: it is usually a simple sequence of bytes. So elements of multi-dimensional arrays must be mapped onto the single-dimensioned memory
• Row-major or column-major order
– Fortran uses column-major order; most other languages use row-major order
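The two storage orders can be made concrete by flattening a small matrix both ways. The function names and the 2-by-3 sample matrix are assumptions for this sketch.

```python
def flatten_row_major(matrix):
    """Lay out the matrix row by row."""
    return [element for row in matrix for element in row]


def flatten_column_major(matrix):
    """Lay out the matrix column by column."""
    return [row[j] for j in range(len(matrix[0])) for row in matrix]


matrix = [[1, 2, 3],
          [4, 5, 6]]

print(flatten_row_major(matrix))      # [1, 2, 3, 4, 5, 6]
print(flatten_column_major(matrix))   # [1, 4, 2, 5, 3, 6]
```

In row-major order the elements of one row are neighbors in memory; in column-major order the elements of one column are.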

Implementation of Arrays (cont.)

• The difference between row- and column-major layout can be important for programs that use nested loops to access all the elements of a large multi-dimensional array
– If a small array is accessed frequently, all or most of its elements are likely to remain in the cache
– For a large array, each miss will bring into the cache not only the desired element, but the next several elements as well
– If elements are accessed across cache lines, then almost every access will result in a cache miss

Compile-Time Descriptors

[Figure: compile-time descriptors for a single-dimensioned array and a multi-dimensional array]

• Single-dimensioned array:

address(a[k]) = address(a[lower_bound]) + ((k - lower_bound) * element_size)

• Two-dimensional array (row-major order):

address(a[i, j]) = address(a[row_lb, col_lb]) + (((i - row_lb) * N) + (j - col_lb)) * element_size

where N is the number of columns
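The access functions translate directly into code. In this sketch the base address 1000, the lower bounds of 1, and the 4-byte element size are made-up values used only to work through the arithmetic.

```python
def address_1d(base, k, lower_bound, element_size):
    """Address of a[k] in a single-dimensioned array starting at base."""
    return base + (k - lower_bound) * element_size


def address_2d(base, i, j, row_lb, col_lb, n_columns, element_size):
    """Address of a[i, j] in a row-major two-dimensional array starting at base."""
    return base + ((i - row_lb) * n_columns + (j - col_lb)) * element_size


print(address_1d(1000, 3, 1, 4))             # 1008
print(address_2d(1000, 2, 3, 1, 1, 5, 4))    # 1028
```

For the second call, one full row of 5 elements is skipped, then 2 more elements, giving an offset of (5 + 2) * 4 = 28 bytes.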

• The design issues that are specific to associative arrays are:
– What is the form of references to elements?
– Is the size of an associative array static or dynamic?

Associative Arrays - Perl

• In Perl, associative arrays are often called hashes, because in the implementation their elements are stored and retrieved with hash functions
• The size of a Perl hash is dynamic: it grows when a new element is added and shrinks when an element is deleted; it is also emptied by assignment of the empty literal

Associative Arrays – Perl (cont.)

• Structure and Operations
– Hash variables begin with %
– Scalar variable names begin with $

%fruit = ("apples" => 3, "oranges" => 6);

– Subscripting is done using braces and keys
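The same operations can be tried with a Python dict, which plays the role of a Perl hash in this sketch; the syntax differs from Perl (brackets instead of braces, no % or $ prefixes), and the key "pears" is an invented example.

```python
fruit = {"apples": 3, "oranges": 6}

oranges = fruit["oranges"]       # look up a value by key
fruit["pears"] = 2               # adding a key grows the structure
size_after_add = len(fruit)
del fruit["apples"]              # deleting a key shrinks it
size_after_delete = len(fruit)
fruit = {}                       # assigning the empty literal empties it

print(oranges, size_after_add, size_after_delete, len(fruit))   # 6 3 2 0
```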

Record Types

• A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names
• Design issues:
– What is the syntactic form of references to the field?
– Are elliptical references allowed?

Definition of Records in COBOL

• COBOL uses level numbers to show nested records

Definition of Records in Ada

• In Ada:

type DATE is
    record
        Month : INTEGER range 1..12;
        Day : INTEGER range 1..31;
        Year : INTEGER range 1776..2100;
    end record;

• Most languages use dot notation
• Fully qualified references must include all record names
– COBOL:

MID OF EMP-NAME OF EMP-REC

– Ada:

Student: PERSON; … Student.Birthday.Year …

• Elliptical references allow leaving out record names as long as the reference is unambiguous
– COBOL:

FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC are elliptical references to the employee's first name

– Pascal:

with Student do Name := 'Michael';
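Nested records and fully qualified dot-notation references can be sketched with Python dataclasses. The Date and Person types and the field values are assumptions chosen to echo the Student.Birthday.Year reference; Python has no elliptical references, so every record name must be written out.

```python
from dataclasses import dataclass


@dataclass
class Date:
    month: int
    day: int
    year: int


@dataclass
class Person:
    name: str
    birthday: Date


student = Person(name="Michael", birthday=Date(month=7, day=4, year=1990))

print(student.birthday.year)   # 1990 (fully qualified reference)
student.name = "Michelle"      # fields are referenced and updated by name
print(student.name)            # Michelle
```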
