• Primitive Data Types
• Character String Types
• User-Defined Ordinal Types
• Array Types
• A data type defines a collection of data values and
a set of predefined operations on those objects
• Some data types are specified by type operators
which are used to form type expressions (‘[]’, ‘*’,
‘()’ in C for array, pointer, function, respectively)
• A descriptor is the collection of the attributes of a variable. It is used for type checking and by
allocation and deallocation operations
• Design issue: What operations are provided for
variables of the type and how are they specified?
Copyright © 2006 Addison-Wesley. All rights reserved.
Primitive Data Types
• Almost all programming languages provide a set
of primitive data types
– Primitive data types: Those not defined in terms
of other data types
• Some primitive data types are merely reflections
of the hardware
• Others require only a little software support
Primitive Data Types: Integer
• Almost always an exact reflection of the hardware,
so the mapping is trivial
– Java: byte, short, int, long
– Ada: SHORT_INTEGER, INTEGER, and LONG_INTEGER
• In sign-magnitude notation, the leftmost bit is set to indicate a negative value and the remainder of the bit string represents the
absolute value of the number
• Most computers now use a notation called two’s
complement to store negative integers, which is convenient for addition and subtraction
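The two notations above can be compared with a small Python sketch. The function names and the 8-bit default width are illustrative assumptions, not part of the slides. The last function shows why two's complement is convenient: the bit patterns are added as plain unsigned numbers and the carry is discarded.

```python
def to_sign_magnitude(value: int, bits: int = 8) -> str:
    """Bit string with a leading sign bit followed by the absolute value."""
    limit = (1 << (bits - 1)) - 1
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in {bits} bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")


def to_twos_complement(value: int, bits: int = 8) -> str:
    """Bit string of value in two's complement notation."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def add_twos_complement(a: int, b: int, bits: int = 8) -> int:
    """Add two values as unsigned bit patterns, drop the carry, reinterpret.

    Results outside the representable range wrap around, as in hardware.
    """
    mask = (1 << bits) - 1
    raw = ((a & mask) + (b & mask)) & mask
    sign_bit_set = raw >> (bits - 1)
    return raw - (1 << bits) if sign_bit_set else raw


print(to_sign_magnitude(-5))       # 10000101
print(to_twos_complement(-5))      # 11111011
print(add_twos_complement(-5, 3))  # -2
```

No special case for negative operands is needed in the addition, which is the convenience the slide refers to.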
Primitive Data Types: Floating Point
• Model real numbers, but only as approximations for most real values
• Languages for scientific use support at least two
floating-point types (e.g., float and double;
sometimes more)
• Usually exactly like the hardware, but not always
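A short Python sketch can make the "only approximations" point concrete. It assumes that Python's float is an IEEE double, which holds on typical platforms; the helper name as_single is hypothetical and simply rounds a value through the single-precision format.

```python
import struct
from decimal import Decimal


def as_single(x: float) -> float:
    """Round a double to IEEE single precision and convert it back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


# Decimal(0.1) shows the exact value stored for the literal 0.1,
# which differs slightly from the real number 0.1.
print(Decimal(0.1))

# Rounding errors accumulate, so this comparison is False.
print(0.1 + 0.2 == 0.3)

# The single-precision approximation of 0.1 differs from the double one,
# while 0.5 (a power of two) is represented exactly in both.
print(as_single(0.1) == 0.1)
print(as_single(0.5) == 0.5)
```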
IEEE Floating Point Formats
Primitive Data Types: Decimal
• For business applications
• Store a fixed number of decimal digits
• The representation is called
binary coded decimal (BCD)
– Two digits per byte
– The range of values is restricted because no
exponents are allowed
– Wastes memory
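The "two digits per byte" layout can be sketched in Python as follows. The function names are hypothetical, and the sketch handles unsigned digit strings only (no sign and no decimal point).

```python
def pack_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits into BCD, two digits per byte."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of decimal digits")
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def unpack_bcd(data: bytes) -> str:
    """Recover the digit string: high nibble first, then low nibble."""
    return "".join(f"{byte >> 4}{byte & 0x0F}" for byte in data)


packed = pack_bcd("1995")
print(packed.hex())        # 1995
print(unpack_bcd(packed))  # 1995
```

Two bytes hold only the values 0 to 9999 here, whereas the same two bytes used as a binary integer hold 0 to 65535, which illustrates the memory cost noted above.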
Primitive Data Types: Boolean
• Simplest of all
• Range of values: two elements, one for “true” and one for “false”
• Could be implemented as bits, but often as
bytes
– Advantage: readability
Primitive Data Types: Character
• Stored as numeric codings
• Most commonly used coding: ASCII
• An alternative, 16-bit coding: Unicode
– Includes characters from most natural languages
– Originally used in Java
– C# and JavaScript also support Unicode
Character String Types
• Values are sequences of characters
Character String Types Operations
Character String Type in Certain Languages
NAME1 := NAME1 & NAME2; (catenation)
NAME1(2:7) (substring reference)
• C
– Not primitive
– Use char arrays and a library of functions that
provide operations
Character String Type in Certain Languages
• Java
– Primitive via the String class (not arrays of char)
– String objects cannot be changed (they are immutable)
– StringBuffer is a class for changeable string objects
• SNOBOL4 (a string manipulation language)
– Primitive
– Many operations, including elaborate pattern
matching
LETTER = ‘abcdefghijklmnopqrstuvwxyz’;
WORDPAT = BREAK(LETTER) SPAN(LETTER) WORD
Character String Length Options
• Static length string: FORTRAN 90, COBOL, Pascal,
Java’s String class, …
– FORTRAN 90: CHARACTER (LEN = 15) NAME1, NAME2
• Limited dynamic length string: C/C++
– In C-based languages, ‘\0’ is used to indicate the end of a string’s characters
• Dynamic length string: SNOBOL4, Perl, JavaScript
– It requires the overhead of dynamic storage
allocation and deallocation but provides maximum
flexibility
• Ada supports all three string length options
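The limited dynamic length option can be sketched in Python as a fixed-capacity buffer whose current length is found by scanning for a terminator, in the spirit of C strings. The class name LimitedString and its methods are hypothetical and serve only as an illustration.

```python
class LimitedString:
    """Fixed-capacity character buffer ended by a zero byte, as in C."""

    def __init__(self, max_length: int):
        self.max_length = max_length
        # One extra cell is reserved for the terminator; all cells start as 0.
        self._cells = bytearray(max_length + 1)

    def assign(self, text: str) -> None:
        data = text.encode("ascii")
        if len(data) > self.max_length:
            raise ValueError("string exceeds the declared maximum length")
        if 0 in data:
            raise ValueError("the terminator cannot appear inside the string")
        self._cells[:len(data)] = data
        self._cells[len(data)] = 0  # mark the end of the characters

    def length(self) -> int:
        # No stored length: scan for the terminator, like C's strlen.
        return self._cells.index(0)

    def value(self) -> str:
        return self._cells[:self.length()].decode("ascii")


name = LimitedString(8)
name.assign("hello")
print(name.length())  # 5
name.assign("hi")
print(name.value())   # hi
```

The storage never grows beyond the declared maximum, while the current length varies between 0 and that maximum.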
Character String Implementation
• Static length: compile-time descriptor
• Limited dynamic length: may need a run-time descriptor for length (but not in C/C++)
• Dynamic length: need run-time descriptor
– Strings can be stored in a linked list, or
– Strings can be stored completely in adjacent
storage cells
Character String Implementation
Compile-time descriptor for static strings: Static string | Length | Address
Run-time descriptor for limited dynamic strings: Limited dynamic string | Max length | Current length | Address
Run-time descriptor for dynamic strings: Dynamic string | Current length | Address
User-Defined Ordinal Types
• An ordinal type is one in which the range of
possible values can be easily associated with the set of positive integers
• Examples of primitive ordinal types in Java
– integer
– char
– boolean
• In many languages, users can define two kinds
of ordinal types: enumeration and subrange
– Is an enumeration constant allowed to appear in more
than one type definition, and if so, how is the type of
an occurrence of that constant checked?
– Are enumeration values coerced to integer?
Example
• Pascal: cannot reuse constants; they can be
used for array subscripts, for loops, case
selectors; no input or output; can be compared
• C/C++, like Pascal, cannot reuse constants in a given referencing environment. Enumeration values are implicitly converted to integer
#include <stdio.h>
int main(void) {
    enum months {Jan = 1, Feb, Mar, Apr, May, Jun,
                 Jul, Aug, Sep, Oct, Nov, Dec};
    enum months month;
    /* Sep is converted to its integer value, so this prints 9 */
    printf("%d\n", month = Sep);
}
Evaluation of Enumerated Type
• Aid to readability, e.g., no need to code a color
as a number
• Aid to reliability, e.g., compiler can check:
– Operations (don’t allow colors to be added)
– No enumeration variable can be assigned a value outside its defined range
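The reliability points above, and the design question of coercion to integer, can be illustrated with Python's standard enum module. Python is used here only as an illustration language; Color, Month, and the two helper functions are hypothetical names. Enum members are not coerced to integers, while IntEnum members are, which mirrors the C behavior.

```python
from enum import Enum, IntEnum


class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


# Functional API: members are numbered from 1, so SEP is 9.
Month = IntEnum("Month", "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")


def can_add(a, b) -> bool:
    """Report whether a + b is a permitted operation."""
    try:
        a + b
    except TypeError:
        return False
    return True


def is_valid_color(value) -> bool:
    """Report whether value lies in the defined range of Color."""
    try:
        Color(value)
    except ValueError:
        return False
    return True


print(can_add(Color.RED, Color.GREEN))  # False: colors cannot be added
print(can_add(Month.SEP, 1))            # True: IntEnum coerces to int
print(Month.SEP == 9)                   # True
print(is_valid_color(7))                # False: outside the defined range
```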
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
…
Day1: Days;
Day2: Weekdays;
Day2 := Day1;
Subrange Evaluation
• Aid to readability
– Make it clear to the readers that variables of
subrange types can store only a certain range of values
• Reliability
– Assigning a value to a subrange variable that is outside the specified range is detected as an
error
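The reliability check can be sketched as a run-time test in Python. The Subrange class is a hypothetical stand-in for a subtype such as Index with range 1..100; in Ada the corresponding check is inserted by the compiler rather than written by hand.

```python
class Subrange:
    """Inclusive integer range that validates values before assignment."""

    def __init__(self, low: int, high: int):
        self.low = low
        self.high = high

    def check(self, value: int) -> int:
        """Return value unchanged, or raise if it is outside low..high."""
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value


index_range = Subrange(1, 100)
i = index_range.check(42)     # accepted
try:
    i = index_range.check(0)  # rejected: detected as an error
except ValueError as err:
    print(err)
```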
Array Types
• An array is an aggregate of homogeneous data elements in which an individual element is
identified by its position in the aggregate,
relative to the first element
Array Design Issues
• What types are legal for subscripts?
• Are subscripting expressions range checked?
• How many subscripts are allowed?
• When does allocation take place?
• Can array objects be initialized?
• Is any kind of slice allowed?
• Index syntax
– FORTRAN, PL/I, Ada use parentheses
– Most other languages use brackets
Array Index (Subscript) Types
• FORTRAN, C: integer only
• Pascal: any ordinal type (integer, Boolean, char, enumeration)
• Ada: integer or enumeration (includes Boolean and char)
• Java: integer types only
• C/C++, Perl, and Fortran do not specify range checking
• Java, ML, C# specify range checking
Subscript Binding and Array Categories
• Static array: subscript ranges are statically
bound and storage allocation is static (before run-time)
– Advantage: execution efficiency (no dynamic
allocation)
Subscript Binding and Array Categories (cont.)
• Stack-dynamic array: subscript ranges are
dynamically bound and the storage allocation is dynamic (at run-time)
– Advantage: flexibility (the size of an array need not
be known until the array is to be used)
– Ada:
procedure foo (size: integer) is
M: array (1..size, 1..size) of real;
• The compiler arranges for a pointer to M to reside at a static offset from the frame pointer
Subscript Binding and Array Categories (cont.)
• Fixed heap-dynamic array: storage binding is dynamic but fixed after allocation (i.e., binding
is done when requested and storage is allocated from heap, not stack)
– Example: In FORTRAN 90
INTEGER, ALLOCATABLE, DIMENSION (:, :) :: MAT
ALLOCATE (MAT(10, NUMBER_OF_COLS))
DEALLOCATE (MAT)
– Example: malloc/free (C), new/delete (C++)
Subscript Binding and Array Categories (cont.)
• Heap-dynamic array: binding of subscript
ranges and storage allocation is dynamic and can change any number of times
– Advantage: Flexibility (arrays can grow or shrink during program execution)
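Python lists behave like heap-dynamic arrays, so they give a convenient sketch of growth and shrinkage during execution. The function name and the sample values below are illustrative assumptions.

```python
def demo_heap_dynamic():
    """Grow and shrink a list, recording its length after each step."""
    data = []
    sizes = []
    for value in (10, 20, 30):
        data.append(value)    # the array grows one element at a time
        sizes.append(len(data))
    data.pop()                # remove the last element: the array shrinks
    sizes.append(len(data))
    del data[0]               # remove the first element: it shrinks again
    sizes.append(len(data))
    return data, sizes


data, sizes = demo_heap_dynamic()
print(data)   # [20]
print(sizes)  # [1, 2, 3, 2, 1]
```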
Subscript Binding and Array Categories (cont.)
• C/C++ arrays that include static modifier are
static
• C/C++ arrays without static modifier are fixed
stack-dynamic
• Ada arrays can be stack-dynamic
• C/C++ provide fixed heap-dynamic arrays
• C# includes a second array class ArrayList that
provides heap-dynamic arrays
• Perl and JavaScript support heap-dynamic arrays
Number of subscripts
• FORTRAN I allowed up to three
• FORTRAN 77 allows up to seven
• Others: no limit
SCORE : array (1..5) of Integer :=
• A slice is some substructure of an array
– It is a mechanism for referencing part of an array
as a unit
• Slices are only useful in languages that have
array operations
V reverses the elements of V
M reverses the columns of M
Arrays Operations (cont.)
• Ada allows
– The assignment, arithmetic, relational operators
type ARY_INT is array(1..6) of INTEGER;
flag : BOOLEAN;
crowd, Group1, Group2 : ARY_INT;
crowd := Group1 + Group2;
Arrays Operations (cont.)
– Logical operators
type ARY_BOOL is array(1..4) of BOOLEAN;
Result, Answer1, Answer2 : ARY_BOOL;
Answer1 := (TRUE, FALSE, TRUE, FALSE);
Answer2 := (TRUE, FALSE, FALSE, TRUE);
Result := Answer1 and Answer2;
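The element-by-element meaning of these array operations can be sketched in Python. The helper elementwise is a hypothetical name; the Boolean operands are the values from the Ada example above, while the integer operands are made up for illustration.

```python
def elementwise(op, left, right):
    """Apply a binary operation to corresponding elements of two arrays."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    return [op(a, b) for a, b in zip(left, right)]


answer1 = [True, False, True, False]
answer2 = [True, False, False, True]
result = elementwise(lambda a, b: a and b, answer1, answer2)
print(result)  # [True, False, False, False]

group1 = [1, 2, 3, 4, 5, 6]        # illustrative values
group2 = [10, 20, 30, 40, 50, 60]  # illustrative values
crowd = elementwise(lambda a, b: a + b, group1, group2)
print(crowd)   # [11, 22, 33, 44, 55, 66]
```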
Arrays Operations (cont.)
• Fortran 90 has a very rich set of array operations
– Any of the built-in arithmetic operators will take arrays as operands; the result is an array, of the
same shape as the operands, whose elements are the result of applying the operator to corresponding elements
– Slices of the same shape can be intermixed in
array operations, even if the arrays from which they were sliced have very different shapes
• FORTRAN 90 also includes library functions for
matrix multiplication, matrix transpose, and
vector dot product
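What the three library operations compute can be sketched in plain Python with lists of lists. The function names are hypothetical; this is not how Fortran implements its intrinsics, only an illustration of the results they produce.

```python
def dot_product(u, v):
    """Sum of the products of corresponding elements of two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


def transpose(m):
    """Swap rows and columns of a rectangular matrix."""
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    """Matrix product: entry (i, j) is row i of a dotted with column j of b."""
    return [[dot_product(row, column) for column in zip(*b)] for row in a]


print(dot_product([1, 2, 3], [4, 5, 6]))         # 32
print(transpose([[1, 2, 3], [4, 5, 6]]))         # [[1, 4], [2, 5], [3, 6]]
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```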
Implementation of Arrays
• Access function maps subscript expressions to
an address in the array
• A single-dimensioned array is a list of adjacent
memory cells
• Hardware memory is linear: it is usually a simple
sequence of bytes. So elements of
multi-dimensional arrays must be mapped onto the
single-dimensioned memory
• Row-major or column-major order
– Fortran uses column-major order; most other
languages use row-major order
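The two mappings of a matrix onto linear memory can be sketched in Python. The function names and the sample 3 x 3 matrix are illustrative assumptions.

```python
def flatten_row_major(matrix):
    """Store the matrix row by row: the last subscript varies fastest."""
    return [element for row in matrix for element in row]


def flatten_column_major(matrix):
    """Store the matrix column by column: the first subscript varies fastest."""
    n_columns = len(matrix[0])
    return [row[j] for j in range(n_columns) for row in matrix]


matrix = [
    [3, 4, 7],
    [6, 2, 5],
    [1, 3, 8],
]
print(flatten_row_major(matrix))     # [3, 4, 7, 6, 2, 5, 1, 3, 8]
print(flatten_column_major(matrix))  # [3, 6, 1, 4, 2, 3, 7, 5, 8]
```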
Implementation of Arrays (cont.)
• The difference between row- and column-major
layout can be important for programs that use nested loops to access all the elements of a
large multi-dimensional array
– If a small array is accessed frequently, all or most
of its elements are likely to remain in the cache
– For a large array, each miss will bring into the
cache not only the desired element, but the next several elements as well
– If elements are accessed across cache lines, then almost every access will result in a cache miss
Compile-Time Descriptors
[Figure: compile-time descriptors for a single-dimensioned array and a multi-dimensional array]
Implementation of Arrays (cont.)
• Single-dimensioned array:
address(a[k]) =
address(a[lower_bound]) + ((k - lower_bound) * element_size)
• Double-dimensioned array (row-major order):
address(a[i, j]) =
address(a[row_lb, col_lb]) + ((i - row_lb) * N + (j - col_lb)) * element_size
where N is the number of columns
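The two access functions translate directly into Python. The function and parameter names are hypothetical, and the base address 1000 and element size 4 in the usage lines are made-up values for illustration.

```python
def address_1d(base: int, k: int, lower_bound: int, element_size: int) -> int:
    """Address of a[k], where base is the address of a[lower_bound]."""
    return base + (k - lower_bound) * element_size


def address_2d(base: int, i: int, j: int, row_lb: int, col_lb: int,
               n_columns: int, element_size: int) -> int:
    """Address of a[i, j] in row-major order.

    base is the address of a[row_lb, col_lb]. The element is preceded by
    (i - row_lb) complete rows of n_columns elements each, plus
    (j - col_lb) elements in its own row.
    """
    offset = (i - row_lb) * n_columns + (j - col_lb)
    return base + offset * element_size


print(address_1d(1000, 5, 1, 4))           # 1016
print(address_2d(1000, 2, 3, 1, 1, 4, 4))  # 1024
```

In the second call, one full row of four elements and two further elements precede a[2, 3], giving an offset of six elements, or 24 bytes.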
• The design issues that are specific to
associative arrays are:
– What is the form of references to elements?
– Is the size of an associative array static or
dynamic?
Associative Arrays - Perl
• In Perl, associative arrays are often called
hashes, because in the implementation their elements are stored and retrieved with hash
functions
• The size of a Perl hash is dynamic: It grows
when a new element is added and shrinks when
an element is deleted, and also it is emptied by assignment of the empty literal
Associative Arrays – Perl (cont.)
• Structure and Operations
– Hash variables begin with %
– Scalar variable names begin with $
%fruit = ("apples" => 3, "oranges" => 6);
– Subscripting is done using braces and keys
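The same dynamic behavior can be sketched with a Python dict, which is an analogous associative array rather than a Perl hash. The keys apples and oranges come from the Perl example above; pears and the variable names are illustrative additions.

```python
fruit = {"apples": 3, "oranges": 6}
sizes = [len(fruit)]

fruit["pears"] = 4            # adding a new key grows the structure
sizes.append(len(fruit))

del fruit["apples"]           # deleting a key shrinks it
sizes.append(len(fruit))

oranges = fruit["oranges"]    # subscripting by key retrieves the value

fruit = {}                    # assigning the empty literal empties it
sizes.append(len(fruit))

print(oranges)  # 6
print(sizes)    # [2, 3, 2, 0]
```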
Record Types
• A record is a possibly heterogeneous aggregate
of data elements in which the individual
elements are identified by names
• Design issues:
– What is the syntactic form of references to the field?
– Are elliptical references allowed?
Definition of Records in COBOL
• COBOL uses level numbers to show nested
records
Definition of Records in Ada
• In Ada:
type DATE is
record
Month : INTEGER range 1..12;
Day : INTEGER range 1..31;
Year : INTEGER range 1776..2100;
end record;
• Most languages use dot notation
• Fully qualified references must include all record names
– COBOL:
MID OF EMP-NAME OF EMP-REC
– Ada
Student: PERSON; … Student.Birthday.Year …
• Elliptical references allow leaving out record
names as long as the reference is unambiguous
– COBOL
FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC
are elliptical references to the employee’s first name
– Pascal
with Student do Name := 'Michael';
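Nested records and fully qualified dot-notation references can be sketched with Python dataclasses. The class and field names loosely follow the DATE and PERSON examples above; the range checks and the sample date are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Date:
    month: int
    day: int
    year: int

    def __post_init__(self):
        # Range checks in the spirit of the Ada range constraints.
        if not 1 <= self.month <= 12:
            raise ValueError("month must be in 1..12")
        if not 1 <= self.day <= 31:
            raise ValueError("day must be in 1..31")
        if not 1776 <= self.year <= 2100:
            raise ValueError("year must be in 1776..2100")


@dataclass
class Person:
    name: str       # fields may have different types (heterogeneous)
    birthday: Date  # a record nested inside another record


student = Person(name="Michael", birthday=Date(month=7, day=4, year=1990))

# Fully qualified reference: every enclosing record name is written out.
print(student.birthday.year)  # 1990
```

Python has no elliptical references, so the enclosing names cannot be omitted here.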