Chapter 14 ■ The basics
Unfortunately, most programming languages do not enforce even these simple, logical rules. Thus it is largely the responsibility of the programmer to ensure that procedures and functions do not have side effects. A side effect is any change to information outside a method caused by a call, other than through the parameters to a procedure. Most programming languages do not prevent programmers from directly accessing and modifying data objects (global data) defined outside of the local environment of the method. Along with pointers and the goto statement, global data has come to be regarded as a major source of programming problems. We shall see in Chapter 15 (on object-oriented features of programming languages) how, in classes, access to global data is controlled.
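Here is a minimal Java sketch; the class and method names are hypothetical. The first method has a side effect on class-level data, which stands in for global data. The second communicates only through its parameters and its return value:

```java
// A method with a side effect versus a method without one.
class SideEffectDemo {
    // Class-level data, visible to every method: a stand-in for global data.
    static int total = 0;

    // Has a side effect: besides returning a value, it changes 'total',
    // which lies outside the method.
    static int addAndRecord(int x, int y) {
        total = total + x + y;
        return x + y;
    }

    // No side effect: the result depends only on the parameters,
    // and nothing outside the method is changed.
    static int add(int x, int y) {
        return x + y;
    }

    public static void main(String[] args) {
        int a = add(2, 3);           // total is still 0
        int b = addAndRecord(2, 3);  // total is now 5
        System.out.println(a + " " + b + " " + total); // prints "5 5 5"
    }
}
```

A caller reading only the call `addAndRecord(2, 3)` has no hint that `total` changes, and this hidden dependency is what makes side effects hard to reason about.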
Many abstractions, particularly those which manipulate recursive data structures such as lists, graphs, and trees, are more concisely described recursively. Some languages, for example early versions of Cobol and Fortran, do not support recursion.
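Take a binary tree: it is either empty or a node with two subtrees. A method that counts its nodes can follow that definition directly. This is a minimal Java sketch, and the names are hypothetical:

```java
// A recursive data structure (a binary tree) and a recursive method over it.
class TreeDemo {
    static final class Node {
        final int value;
        final Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Mirrors the structure: an empty tree has size 0; otherwise the
    // size is 1 plus the sizes of the two subtrees.
    static int size(Node n) {
        if (n == null) {
            return 0;
        }
        return 1 + size(n.left) + size(n.right);
    }

    public static void main(String[] args) {
        Node tree = new Node(2, new Node(1, null, null), new Node(3, null, null));
        System.out.println(size(tree)); // prints 3
    }
}
```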
14.9 ● Parameter-passing mechanisms
We have seen that, ideally:
■ parameters are passed to a procedure so that the procedure will accomplish some task. There is no need for information to be passed back to the caller, so there is no need for parameter values to change.
■ functions communicate a value back to the caller as the return value. So again there is no need for parameter values to be changed.
Two major schemes for parameters have emerged:
■ call by value (termed value parameters) – this means that a copy of the information is passed as the parameter. Therefore the method can use the information but cannot change it.
■ call by reference (termed reference parameters) – this means that a pointer to the information is passed as the parameter. Therefore the method can both access and change the information.
These pointers are not a problem because the pointers are not themselves accessible to the programmer. (The programmer cannot access or change the pointer, merely the information pointed to.) The pointer is simply the mechanism for communicating the information. We discuss programming using pointers in Chapter 15 on object-oriented programming.
The programming language could enforce a discipline where procedures and functions can only be supplied with value parameters, but most do not. A number of parameter-passing schemes are employed in programming languages, but no language provides a completely safe and secure parameter-passing mechanism.
There is a performance consideration for value parameters. Passing by value is inefficient for passing large, aggregate data structures such as an array, as a copy must be made. In such situations, it is commonplace to pass the data structure by reference even if the parameter should not be modified by the method.
Java provides the following scheme. All primitive data items are passed by value, which is most commendable, but all proper objects are passed by reference (strictly, what is passed by value is a reference to the object). No distinction is made between procedures and functions. Thus a method of either type (procedure or function) can modify any non-primitive parameter, and it is left to the programmer to enforce a discipline over changing parameters. A small concession is that a method cannot change the caller's pointer to an object, for example to make it point to another object.
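The following minimal Java sketch shows three cases; the class and method names are hypothetical:

* a primitive parameter, whose changes the caller never sees;
* an object, whose contents the method can change;
* an attempt to make the parameter refer to another object, which the caller also never sees.

```java
// How Java treats primitive and object parameters.
class ParameterDemo {
    static final class Counter {
        int count;
    }

    // The int is passed by value: only the local copy changes.
    static void increment(int n) {
        n = n + 1;
    }

    // The method receives a copy of the reference, so it can change
    // the object that the caller's variable refers to.
    static void increment(Counter c) {
        c.count = c.count + 1;
    }

    // Assigning a new object to the parameter changes only the local
    // copy of the reference; the caller's variable is unaffected.
    static void replace(Counter c) {
        c = new Counter();
        c.count = 100;
    }

    public static void main(String[] args) {
        int n = 0;
        increment(n);
        Counter c = new Counter();
        increment(c);
        replace(c);
        System.out.println(n + " " + c.count); // prints "0 1"
    }
}
```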
Traditional Fortran employs only a single parameter-passing mode: call by reference. Thus, undesirably, all actual parameters in Fortran may potentially be changed by any subroutine or function. The programmer is responsible for ensuring the safe implementation of input and output parameters. Using call by reference, the location of the actual parameter is bound to the formal parameter. The formal and actual parameter names are thus aliases: modification of the formal parameter automatically modifies the actual parameter. This is what you might expect of a language where arrays are often used and the performance hit of copying arrays is unacceptable. Fortran 77 also, unfortunately, restricts the type of result that may be returned from functions to scalar types only (i.e. not arrays etc.).
Visual Basic.Net provides the choice of value or reference parameters, described by the key words ByVal (the default) and ByRef in the method header. But when an object is passed ByVal, what is copied is a reference to the object, so the method can still change the object itself.
In C#, by default, primitive data items are passed by value and objects are passed by reference. But you can pass a primitive data item by reference if the parameter is preceded by the key word ref in both the method header and the method call. You can also precede an object name by ref, in which case you are passing a pointer to a pointer. This means that the method can make the caller's variable refer to an entirely different object.
Call by value-result is often used as an alternative to call by reference for input-output parameters. It avoids the use of aliases at the expense of copying. Parameters passed by value-result are initially treated as in call by value: a copy of the value of the actual parameter is passed to the formal parameter, which again acts as a local variable. Manipulation of the formal parameter does not immediately affect the actual parameter. On exit from the procedure, the final value of the formal parameter is assigned into the actual parameter.
Call by result may be used as an alternative to call by reference for output parameters. Parameters passed by result are treated exactly as those passed by value-result, except that no initial value is assigned to the local formal parameter.
Ada identifies three types of parameter:
■ input parameters to allow a method read-only access to an actual parameter. The actual parameter is purely an input parameter; the method should not be able to modify the value of the actual parameter.
■ output parameters to allow a procedure write-only access to an actual parameter. The actual parameter is purely an output parameter; the procedure should not be able to read the value of the actual parameter.
■ input-output parameters to allow a procedure read-and-write access to an actual parameter. The value of the actual parameter may be modified by the procedure.
Ada (prior to Ada 2012) only allows input parameters to functions. The parameter-passing mechanisms used in Ada (described as in, out and in out) would therefore seem to be ideal. However,
Ada does not, in general, specify whether they are to be implemented using sharing or copying. This freedom benefits the language implementer, since the space requirements of the parameter can be used to determine whether sharing or copying should be used. However, it can be troublesome to the programmer. In the presence of aliases, call by value-result and call by reference may return different results.
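The difference can be seen by simulating both mechanisms in a minimal Java sketch, with all names hypothetical. A one-element array stands in for the storage of an actual parameter. The procedure is called with the same variable for both parameters, so the formal parameters are aliases. The sketch assumes that copy-out happens left to right. A language may leave that order unspecified, which is itself a further source of trouble.

```java
// Simulates call by reference and call by value-result for a procedure
// with two parameters, called with the same variable for both.
class AliasingDemo {
    // Call by reference: each formal parameter is the caller's storage
    // (modelled as a one-element array), so x and y are aliases.
    static void byReference(int[] x, int[] y) {
        x[0] = x[0] + 1;
        y[0] = y[0] * 10;   // sees the value just written through x
    }

    // Call by value-result: copy in, work on local variables,
    // then copy out on exit (assumed here to be left to right).
    static void byValueResult(int[] xCell, int[] yCell) {
        int x = xCell[0];   // copy in
        int y = yCell[0];
        x = x + 1;
        y = y * 10;         // uses the original value, not x's new one
        xCell[0] = x;       // copy out
        yCell[0] = y;       // the last copy-out wins when aliased
    }

    public static void main(String[] args) {
        int[] a = {1};
        byReference(a, a);
        int[] b = {1};
        byValueResult(b, b);
        System.out.println(a[0] + " " + b[0]); // prints "20 10"
    }
}
```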
14.10 ● Primitive data types
Programmers are accustomed to being provided with a rudimentary set of primitive data types. These are provided built in and ready made by the programming language. They usually include:
■ Boolean
■ char
■ integer
■ real or floating point
These data types are accompanied by a supporting cast of operations (relational, arithmetic, etc.). For each type, it should be possible to clearly define the form of the literals or constants which make up the type. For example, the constants true and false make up the set of constants for the type Boolean. Similarly, we should be able to define the operations for each type. For the type Boolean, these might include the operations =, <>, not, and, and or.
In most languages the primitive data types are not true objects (in the sense of objects created from classes). But in Eiffel and Smalltalk, every data type is a proper object and can be treated just like any other object.
For certain application domains, advanced computation facilities, such as extended precision real numbers or long integers, are essential. The ability to specify the range of integers and reals, and the precision to which reals are represented, reduces the dependence on the physical characteristics of a particular machine, such as its word size. This increases the portability of programs. However, some languages (for example C and C++) leave the issue of the precision and range of numbers to the compiler writer for the particular target machine. Java gets around this sloppiness by precisely defining the representation of all its built-in data types. Whatever machine a program is executed on, the expectation is that data is represented in exactly the same manner. Thus the program will produce exactly the same behavior, whatever the machine.
14.11 ● Data typing
A data type is a set of data objects and a set of operations applicable to all objects of that type. Almost all languages can be thought of as supporting this concept to some extent. Many languages require the programmer to define explicitly the type (e.g. integer or character) of all objects to be used in a program. To a greater or lesser extent, depending on the individual language, this information prescribes the operations that can be applied to the objects. Thus, we could state, for example, that Fortran, Cobol, C, C++, Ada, C#, Visual Basic.Net and Java are all typed languages. However, only Ada, C#, Visual Basic.Net and Java would be considered strongly typed languages.
A language is said to be strongly typed if it can be determined at compile-time whether or not each operation performed on an object is consistent with the type of that object. Operations inconsistent with the type of an object are considered illegal. A strongly typed language therefore forces the programmer to consider more closely how objects are to be defined and used within a program. The additional information provided to the compiler by the programmer allows the compiler to perform automatic type checking operations and discover type inconsistencies. Studies have shown that programs written in strongly typed languages are clearer, more reliable, and more portable. Strong typing necessarily places some restrictions on what a programmer may do with data objects. However, this apparent decrease in flexibility is more than compensated for by the increased security and reliability of the ensuing programs.
Languages such as Lisp, APL, and POP-2 allow a variable to change its type at run-time. This is known as dynamic typing, as opposed to the static typing found in languages where the type of an object is permanently fixed. Where dynamic typing is employed, type checking must occur at run-time rather than compile-time. Dynamic typing provides additional freedom and flexibility, but at a cost. More discipline is required on the part of the programmer so that the freedom provided by dynamic typing is not abused. That freedom is often very useful, even necessary, in some applications. For example, problem-solving programs which use sophisticated artificial intelligence techniques for searching complex data structures would be very difficult to write in languages without dynamic typing.
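Java is statically typed. Still, a variable whose declared type is Object can refer to objects of different classes at different times, which gives a rough feel for the trade-off. In this minimal sketch (names hypothetical), the commented-out assignment would be rejected at compile-time, whereas the cast is checked only when it executes. This only approximates dynamic typing as found in Lisp: the declared type of the variable itself never changes.

```java
// Compile-time versus run-time type checking in Java.
class TypeCheckDemo {
    // The declared type of 'value' is Object, so the compiler cannot know
    // what kind of object it holds; the cast is checked at run-time.
    static boolean castSucceeds(Object value) {
        try {
            Integer i = (Integer) value;   // checked when executed
            return i != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int count = 3;
        // count = "three";   // rejected at compile-time: incompatible types

        Object anything = count;                     // holds an Integer...
        System.out.println(castSucceeds(anything));  // prints true
        anything = "three";                          // ...and now a String
        System.out.println(castSucceeds(anything));  // prints false
    }
}
```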
What issues need to be considered when evaluating the data type facilities provided
by a programming language? We suggest the following list:
■ does the language provide an adequate set of primitive data types?
■ can these primitives be combined in useful ways to form aggregate or structured data types?
■ does the language allow the programmer to define new data types? How well do such new data types integrate with the rest of the language?
■ to what extent does the language support the notion of strong typing?
■ when are data types considered equivalent?
■ are type conversions handled in a safe and secure manner?
■ is it possible for the programmer to circumvent automatic type checking operations?
14.12 ● Strong versus weak typing
The debate as to whether strongly typed languages are preferable to weakly typed languages closely mirrors the earlier debate among programming language aficionados about the virtues of the goto statement. The pro-goto group argued that the construct was required and its absence would restrict programmers. The anti-goto group contended that indiscriminate use of the construct encouraged the production of “spaghetti-like” code.
The weakly typed languages group similarly argues that some types of programs are very difficult, if not impossible, to write in strongly typed languages. For example, a program that manipulates graphical images will sometimes need to perform arithmetic on the image and at other times examine the data bit-by-bit.
The strongly typed languages group argues that the increased reliability and security outweigh these disadvantages. A compromise has been struck: strong typing is generally seen as highly desirable, but languages provide well-defined escape mechanisms to circumvent type checking for those instances where it is truly required.
Weakly typed languages such as Fortran and C provide little compile-time type checking support. However, they do provide the ability to view the representation of information as different types. For example, using the equivalence statement in Fortran, a programmer is able to subvert typing:
integer a
logical b
equivalence (a, b)
The variable b is a logical, which is the Fortran term for Boolean. The equivalence declaration states that the variables a and b share the same memory. While economy of storage is the primary use of the equivalence statement, it also allows the same storage to be interpreted as representing an integer in one case and a logical (Boolean) in the second. The programmer can now apply both arithmetic operations and logical operations on the same storage simply by choosing the appropriate alias (a or b) to reference it.
This incredible language feature is dangerous because programs using it are unclear. Moreover, such programs are not portable, because the representations used for integers and Booleans are usually machine dependent.
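Unlike Fortran's equivalence, Java makes this kind of reinterpretation explicit. The standard library methods Float.floatToIntBits and Float.intBitsToFloat view the 32 bits of a float as an int and vice versa. Because the language fixes the float format (IEEE 754 single precision), the result does not depend on the machine. The sketch below is minimal, and its class and helper names are hypothetical. Its last line performs integer arithmetic on the bit pattern (adding 1 to the exponent field) and views the result as a float again, much as the a and b aliases above allow.

```java
// Viewing the same 32 bits as a float and as an int, explicitly.
class BitViewDemo {
    // Returns the IEEE 754 bit pattern of f, viewed as an int.
    static int bitsOf(float f) {
        return Float.floatToIntBits(f);
    }

    // Interprets the bits of an int as a float.
    static float floatOf(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int bits = bitsOf(1.0f);
        System.out.println(Integer.toHexString(bits)); // prints 3f800000
        System.out.println(floatOf(bits));             // prints 1.0
        // Integer arithmetic on the bit pattern: bit 23 is the lowest
        // bit of the exponent field, so this doubles the value.
        System.out.println(floatOf(bits + (1 << 23))); // prints 2.0
    }
}
```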
For a small number of programming applications, the ability to circumvent typing to gain access to the underlying physical representation of data is essential. How can this be provided in a language that is strongly typed? The best solution is probably to force the programmer to state explicitly in the code that they wish to violate the type checking operations of the language. This approach is taken by Ada, where an object may be reinterpreted as being of a different type only by using the unchecked conversion facility.
The question of conversion between types is inextricably linked with the strength of typing in a language. Fortran, being weakly typed, performs many conversions (or coercions) implicitly during the evaluation of arithmetic expressions. These implicit conversions may result in a loss of information and can be dangerous to the programmer. As we saw earlier, Fortran allows mixed mode arithmetic and freely converts reals to integers on assignment.
Java and other strongly typed languages perform implicit conversions only when there will be no accompanying loss of information. Thus, an assignment of an integer to a real variable results in implicit conversion of the integer to a real – the programmer does nothing. (Java makes one minor exception: an int may be implicitly converted to a float, which can lose precision for very large values.) However, an attempt to assign a real value to an integer variable will result in a type incompatibility error. Such an assignment must be carried out using an explicit conversion function. That is, the programmer is forced by the language to explicitly consider the loss of information implied by the use of the conversion function.
In Java, for example, a real can be converted to an integer, but only by using an explicit casting operator:
float f = 1.2345f;
int i = (int) f;
The casting operator is the name of the destination type, enclosed in brackets – in this case (int). When this is used, the compiler accepts that the programmer is truly asking for a conversion and is responsibly aware of the possible consequences.
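The rule can be seen in action in a minimal sketch; the class and method names are hypothetical. The commented-out line would be rejected by the compiler. The explicit cast discards the fractional part, truncating toward zero. The final lines show the int-to-float case: the widening is implicit, yet 2^24 + 1 cannot be represented exactly as a float.

```java
// Implicit widening versus explicit narrowing conversions in Java.
class ConversionDemo {
    static double widen(int i) {
        double d = i;          // implicit: every int fits exactly in a double
        return d;
    }

    static int narrow(float f) {
        // int i = f;          // rejected at compile-time: possible lossy conversion
        int i = (int) f;       // explicit cast: the fraction is discarded
        return i;
    }

    public static void main(String[] args) {
        System.out.println(widen(7));          // prints 7.0
        System.out.println(narrow(1.2345f));   // prints 1
        System.out.println(narrow(-2.7f));     // prints -2 (truncates toward zero)
        float big = 16777217;                  // 2^24 + 1, implicitly widened
        System.out.println((int) big);         // prints 16777216
    }
}
```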
SELF-TEST QUESTION
14.3 Java provides shift and bitwise Boolean operations for integers. Does this violate strong typing?
14.13 ● User-defined data types (enumerations)
The readability, reliability, and data abstraction capabilities of a language are enhanced if the programmer can extend the primitive data types provided by the language. The ability to define user-defined types separates the languages C, C++ and Ada from their predecessors. For example, consider the following definition of a C++ enumerated type, which is introduced by the key word enum:
enum Day {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday,
Sunday};
The type Day is a new type. Variables of this type may only take on values that are the literals of that type (that is, Monday, Tuesday, etc.). Now we can declare a variable of this type, as follows:
Day today;
And we can perform such operations as:
today = Monday;
if (today == Saturday) { /* ... */ }
We also get some type checking carried out by the compiler. Assignments such as the following will be flagged as errors by the compiler:
today = January;
today = 7;
In a language without this facility, we are forced to map the days of the week onto integers, so that 1 means Monday, etc. But then we get no help from the compiler when we write (correct) statements, such as:
int today;
today = 2;
or even the “illegal”
today = 0;
since today is an integer variable and therefore may be assigned any integer value.
SELF-TEST QUESTION
14.4 Make the case for user-defined types.
Enumerated types, such as the C++ facility described above, have their limitations. An enumerated type can be declared, variables created, and assignments and comparisons carried out, but these are the only operations and we cannot create any more. For example, in the above example we cannot give Day a method nextDay. Moreover, different (unscoped) enums in the same scope cannot contain identical names. For example, we are prevented from writing:
enum Weekend {Saturday, Sunday};
because the names clash with those already in enum Day.
Arguably, if the language provides classes (Chapter 15) it does not need enums. In fact the Java enum facility is almost a class.
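A minimal Java sketch lifts both limitations of the C++ enum; the names are hypothetical, and the constants follow Java's upper-case convention. Day has a method nextDay. A second enum, Weekend, reuses the names SATURDAY and SUNDAY without any clash, because Java enum constants are qualified by their type.

```java
// In Java an enum is almost a class: it can have methods, and its
// constants are scoped by the type, so two enums may share names.
enum Day {
    MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY;

    // A method of the type itself, which a C++ enum cannot have.
    Day nextDay() {
        Day[] days = values();
        return days[(ordinal() + 1) % days.length];
    }
}

// No clash with Day.SATURDAY and Day.SUNDAY.
enum Weekend { SATURDAY, SUNDAY }

class EnumDemo {
    public static void main(String[] args) {
        Day today = Day.SUNDAY;
        System.out.println(today.nextDay());    // prints MONDAY
        System.out.println(Weekend.SATURDAY);   // prints SATURDAY
        // today = 7;   // rejected at compile-time: an int is not a Day
    }
}
```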
14.14 ● Arrays
Composite data types allow the programmer to model structured data objects. The most common aggregate data abstraction provided by programming languages is the array: a collection of homogeneous elements (all elements of the same type) which may be referenced through their positions (usually an integer) within the collection. Arrays are characterized by the type of their elements and by the index or subscript range or ranges, which specify the size, the number of dimensions and how individual elements of the array may be referenced.
For example, the Java array definition shown below defines an array named table. It is a one-dimensional array of integers with the subscript varying from 0 through 9. In Java, subscripts always start at 0, betraying the C origins of the language as a language close to machine instructions.
int table[] = new int[10];
Individual elements of the array are referenced by specifying the array name and an expression for each subscript, for example, table[2].
The implementation of arrays in programming languages raises the following considerations:
■ what restrictions are placed on the element type? For complete freedom of expression there should be no restrictions.
■ valid indices should be any subrange of numbers (e.g. 2010 to 2020).
■ at what time must the size of an array be known? The utility of arrays in a programming language is governed by the time (compile-time or run-time) at which the size of the array must be known.
■ what operations may be applied to complete arrays? For example, it is very convenient to be able to carry out array assignment or comparison between compatible arrays using a single concise statement.
■ are convenient techniques available for the initialization of arrays?
The time at which a size must be specified for an array has implications on how the array may be used. In some languages the size of an array must be defined statically – the size and subscript ranges are required to be known at compile-time. This has the advantage of allowing the compiler to generate code automatically to check for out-of-range subscripts. However, the disadvantage of this simple scheme is that, to allow the program to accommodate data sets of differing sizes, we would like to delay deciding the size of the array until run-time. In Java, the size of an array can be decided at run-time, when the array is created with new, but from then on it is fixed. So, as in most languages, if a size that varies during execution is needed, a dynamic data structure is the answer (see Chapter 15).
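A minimal Java sketch illustrates these points; the names are hypothetical:

* the size of the table comes from the command line when one is supplied, so it is a run-time value;
* every subscript is checked as the program runs;
* a java.util.ArrayList serves as the dynamic structure where the amount of data grows.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayDemo {
    // The size is an ordinary run-time value, but once the array
    // has been created its length cannot change.
    static int[] makeTable(int size) {
        return new int[size];
    }

    // Every subscript is checked when the program runs.
    static boolean inRange(int[] table, int index) {
        try {
            int element = table[index];
            return element == element;   // always true if reached
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        int[] table = makeTable(size);
        System.out.println(table.length);
        System.out.println(inRange(table, size - 1));   // prints true
        System.out.println(inRange(table, size));       // prints false
        // For data that grows, a dynamic structure such as ArrayList is used.
        List<Integer> values = new ArrayList<>();
        values.add(42);
        values.add(7);
        System.out.println(values.size());              // prints 2
    }
}
```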
SELF-TEST QUESTION
14.5 Argue for and against the language making array subscripts start at 0.
14.15 ● Records (structures)
Data objects in problem domains are not always simply collections of homogeneous objects (same types). Rather, they are often collections of heterogeneous objects (different types). Although such collections can be represented using arrays, many programming languages provide a record data aggregate. Records (or structures, as they are termed in C and C++) are generalizations of arrays where the elements (or fields) may be of different types and where individual components are referenced by (field) name rather than by position.
For example, the C++ struct definition shown below describes information relating to a time. Each object of type Time has three components named hour, minute and second.
struct Time {
    int hour;
    int minute;
    int second;
};
We can now declare a variable of this type:
Time time;
Components of records are selected by name. The method used by Ada, PL/1 and C++ first specifies the variable and then the component. For example,
time.minute = 46;
Each component of a record may be of any type – including aggregate types, such as arrays and records. Similarly, the element type of an array might be a record type. Programming languages which provide such data abstractions as arrays and records, and allow them to be combined orthogonally in this fashion, allow a wide range of real data objects to be modeled in a natural fashion.
The languages Cobol, PL/1, C, C++, C# and Ada support records. (In C, C++ and C# a record is termed a struct.) The Java language traditionally did not provide records as described above, because this facility can simply be implemented as a class, using the object-oriented features of the language (see Chapter 15). Simply declare a class with the requisite fields within it. (Since Java 16, a record declaration provides a compact form of such a class, though its fields cannot be changed after creation.)
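A minimal Java sketch follows the class approach; the class names are hypothetical. It declares a class containing only the three fields of the C++ struct, so that they can be assigned individually as in the C++ example. It then combines the class with an array, the orthogonal combination discussed above. Unlike a C++ array of structs, a Java array of objects holds references, so each element has to be created with new.

```java
// A class used as a record: fields only, selected by name.
class Time {
    int hour;
    int minute;
    int second;
}

class RecordDemo {
    // Records and arrays combine freely: here, an array of Time records.
    static Time[] makeTimetable(int n) {
        Time[] times = new Time[n];
        for (int i = 0; i < n; i++) {
            times[i] = new Time();   // each element must be created
            times[i].hour = 9 + i;
        }
        return times;
    }

    public static void main(String[] args) {
        Time time = new Time();
        time.minute = 46;                // component selected by name
        Time[] day = makeTimetable(3);
        System.out.println(time.minute + " " + day[2].hour); // prints "46 11"
    }
}
```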
SELF-TEST QUESTION
14.6 Make the case for arrays and records.
Summary
In this chapter we have surveyed the basic characteristics that a programming language should have from the viewpoint of the software engineer. It seems that small things – like syntax – can affect software reliability and maintenance.
Some people think that a language should be rich in features – and therefore powerful. Other people think that a language should be small but elegant so that it can be mastered completely by the programmer.
The following issues are considered to be important:
■ matching the language to the application area of the project
■ clarity, simplicity, and orthogonality
■ syntax
■ control abstractions
■ primitive data types
■ data typing
■ enumerations
■ arrays
■ records (structures)
Exercises
14.1 Suppose that you were asked to design a new programming language for software engineering:
■ select and justify a set of control structures
■ select and justify a set of primitive data types
14.2 Argue either for or against strong typing in a programming language.
14.3 How many kinds of looping structure do we need in a programming language? Make suggestions.
14.4 From the discussion in this chapter, list the possible problem features with either programming languages in general or a programming language of your choice.
14.5 “In language design, small is beautiful.” Discuss.
14.6 Argue for or against the inclusion of the break statement in a programming language.
14.7 The language Lisp has the ultimate simple syntax. Every statement is a list. For example:
(+ 1 2)
returns the sum of the parameters.
Investigate the syntax of Lisp and discuss whether every language could and should have syntax that is as simple.