
The New C Standard- P15


Document information

Basic information

Title: The New C Standard- P15
School: Unknown University
Field: Computer Science
Type: Guide
Year of publication: 2009
City: Unknown City
Pages: 100
Size: 718.83 KB


Contents


Neither of these reasons could be said to contain an actual benefit. The cost associated with side effects in controlling expressions is the possibility that they will go unnoticed by a reader of the source (especially if

The most common form of side effect in a controlling expression is assignment, in particular simple assignment. The case where the author of the code intended to type an equality operator, rather than a simple assignment operator, is a fault, and these coding guidelines are not intended to recommend against the use of constructs that are obviously faults. However, it is possible that a reader of the visible source will mistake a simple assignment for an equality operator (the token == is much more likely than = in the context of a controlling expression), and reducing the likelihood of such a mistake occurring is also a cost reduction.

This discussion has referred to controlling expressions as if these costs and benefits apply to their use in all contexts (i.e., selection and iteration statements). The following example shows that writing code to avoid the occurrence of side effects in controlling expressions contained within iteration statements requires two, rather than one, assignments to be used:

   extern int glob_1,

Duplicating the assignment to glob_1 creates a maintenance dependency (any changes to one statement need to be reflected in the other). The increase in cost caused by this maintenance dependency is assumed to be greater than the cost reduction achieved from reducing the likelihood of a simple assignment operator being mistakenly treated as an equality operator.

Cg 1740.1

The simple assignment operator shall not occur in the controlling expression of an if statement.
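Below is a minimal sketch of what adhering to Cg 1740.1 might look like. The data source next_value() is hypothetical: it returns successive array elements, with 0 marking the end. Together with the other names, it stands in for glob_1 and glob_2 and is illustrative only. The if statement needs just one relocated assignment. The while loop needs the assignment twice, which is the maintenance dependency that argues against extending the guideline to iteration statements.

```c
#include <stddef.h>

/* Hypothetical data source standing in for glob_2: returns successive
   array elements; the array is assumed to end with a 0 element. */
static int next_value(const int *vals, size_t *pos)
{
    return vals[(*pos)++];
}

/* if statement: the assignment is moved out of the controlling expression,
   as Cg 1740.1 requires. Originally: if (v = next_value(vals, &pos)) ... */
int first_is_nonzero(const int *vals)
{
    size_t pos = 0;
    int v;

    v = next_value(vals, &pos);
    if (v != 0)
        return 1;
    return 0;
}

/* while statement: keeping the side effect out of the controlling expression
   needs the assignment twice, and the two copies must be kept in step. */
int count_nonzero_prefix(const int *vals)
{
    size_t pos = 0;
    int count = 0;
    int v;

    v = next_value(vals, &pos);       /* first copy */
    while (v != 0) {
        count++;
        v = next_value(vals, &pos);   /* second copy */
    }
    return count;
}
```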


6.8.4 Selection statements

1741

Experience has shown that there are a variety of other constructs, appearing in a controlling expression, that developers have difficulty comprehending, or simply miscomprehend when scanning the source. However, no other constructs are discussed here. The guideline recommendation dealing with the use of the assignment operator has the benefit of simplicity and frequency of occurrence. It was difficult enough analyzing the cost/benefit case for simple assignment; others are welcome to address more complicated cases.

Experience shows that many developers use the verbal form “if expression is not true then” when thinking about the condition under which an else form is executed. This use of not can lead to double negatives when reading some expressions. For instance, possible verbal forms of expressing the conditions under which the arms of an if statement are executed include:

   if (!x)
      a();
   else
      b(); /* Executed if not x is not true */
           /* Executed if not x is equal to 0 */
           /* Executed if x is not equal to 0 */

   if (x != y)
      c();
   else
      d(); /* Executed if x is not equal to y is not true */

The possible linguistic impact of the ! operator on expression comprehension is discussed elsewhere.

In the translated form of this book’s benchmark programs, 1.3% of selection-statements and 4% of iteration-statements have a controlling expression that is a constant expression. Use of simple, iterative, flow analysis enables a further 0.6% of all controlling expressions to be evaluated to a constant expression at translation time.

   if (sizeof(enum {b, a}) != sizeof(int))
      return a; // a == 1
   return b;    // which b?
   }

In C89, the declaration enum {b, a} persists after the if statement terminates; but in C99, the implied block that encloses the entire if statement limits the scope of that declaration; therefore the function different returns different values in C89 and C99. The Committee views such cases as unintended artifacts of allowing declarations as operands of cast and sizeof operators; and this change is not viewed as a serious problem.
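The Rationale fragment above can be completed into a compilable sketch. The file scope declaration enum {a, b}; is an assumption, chosen to be consistent with the // a == 1 and // which b? comments. Compiled as C99 or later, different() returns 1 whichever branch is taken. The then arm returns the inner a (value 1), and the final return sees the outer b (value 1). Under C89 scoping, the final return would instead have seen the inner b (value 0).

```c
/* Assumed file scope declaration: a == 0, b == 1. */
enum {a, b};

int different(void)
{
    /* enum {b, a} declares b == 0 and a == 1; in C99 and later their scope
       ends with the if statement (the selection statement is a block). */
    if (sizeof(enum {b, a}) != sizeof(int))
        return a;       /* inner a: 1 */
    return b;           /* C99: outer b (1); C89: inner b (0) */
}
```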

See the following C sentence for a further discussion on the rationale.

Developers are more likely to be tripped up by the lifetime issues associated with compound literals than by those associated with enumeration constants. For instance, in:

   if (f(p=&(struct S){1, 2}))
      /* ... */
   val=p->mem_1;

the lifetime of the storage whose address is assigned to p ends when the execution of the if statement terminates. Ensuring that developers are aware of this behavior is an educational issue. However, developers intentionally relying on the pointed-to storage continuing to exist (which it is likely to, at least until storage needs to be allocated to another object) is a potential guideline issue. Until experience has been gained on how developers use compound literals, it is not known whether this issue is simply an interesting theoretical idea or a real practical problem.
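Here is a sketch of one way to avoid relying on storage whose lifetime has ended. It assumes a hypothetical struct S with int members mem_1 and mem_2 (only mem_1 appears above) and a hypothetical predicate f. The compound literal is created in the enclosing block, before the if statement, so its lifetime extends to the end of that block rather than ending with the if statement.

```c
/* Hypothetical layout for struct S. */
struct S { int mem_1; int mem_2; };

/* Hypothetical predicate standing in for f() in the fragment above. */
static int f(const struct S *s) { return s->mem_1 != 0; }

int read_after_if(void)
{
    int val = 0;
    /* Compound literal at function body scope: its lifetime lasts until
       the closing brace of this function, not just the if statement. */
    struct S *p = &(struct S){1, 2};

    if (f(p))
        val = 10;
    val += p->mem_1;    /* well defined: p still designates a live object */
    return val;
}
```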

1742 Each associated substatement is also a block whose scope is a strict subset of the scope of the selection statement.

Commentary

Rationale

A new feature of C99: A common coding practice is always to use compound statements for every selection and iteration statement because this guards against inadvertent problems when changes are made in the future. Because this can lead to surprising behavior in connection with certain uses of compound literals (§6.5.2.5), the concept of a block has been expanded in C99.

Given the following example involving three different compound literals:

   extern void fn(int*, int*);
   int examp(int i, int j)
   {
      int *p, *q;
      if (*(q = (int[2]){i, j})) fn(p = (int[5]){9, 8, 7, 6, 5}, q);
      else fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);
      return *p;
   }

it seemed surprising that just introducing compound statements also introduced undefined behavior:

   extern void fn(int*, int*);
   int examp(int i, int j)
   {
      int *p, *q;
      if (*(q = (int[2]){i, j})) {
         fn(p = (int[5]){9, 8, 7, 6, 5}, q);
      } else {
         fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);
      }
      return *p;
   }

as if written:

   extern void fn(int*, int*);
   int examp(int i, int j) {
      int *p, *q;
      {
         if (*(q = (int[2]){i, j})) {
            // *q is guaranteed to designate an object
            fn(p = (int[5]){9, 8, 7, 6, 5}, q);
         } else {
            // *q is guaranteed to designate an object
            fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);
         }
      }
      // *q is not guaranteed to designate an object
      return *p; // *p is not guaranteed to designate an object
   }

If compound literals are defined in selection or iteration statements, their lifetimes are limited to the implied enclosing block; therefore the definition of “block” has been moved to this section. This change is compatible with similar C++ rules.

C90

The following example illustrates the rather unusual combination of circumstances needed for the specification change, introduced in C99, to result in a change of behavior.

   extern void f(int);
   enum {a, b} glob;

compound statement contains a single statement, the use of braces is redundant and their presence decreases the amount of information visible on a display (the number of available lines is fixed and each brace usually occupies one line). However, experience has shown that in some cases the presence of these braces can:

• Provide additional visual cues that can reduce the effort needed, by readers, to comprehend a sequence of statements. However, the presence of these redundant braces reduces the total amount of information immediately visible, to a reader, on a single display (i.e., the amount of source code that can be seen without expending motor effort giving editor commands to change the display contents). The way in which these costs and benefits trade off against each other is not known.

• Help prevent faults being introduced when code is modified (i.e., where a modification results in unintended changes to the syntactic bindings of blocks to statement headers). Experience shows that nested if statements are the most common construct whose modification results in unintended changes to the syntactic bindings of blocks.

In the following example the presence of braces provides both visual cues that the else does not bind to the outer if and additional evidence (its indentation provides counter evidence because it provides an interpretation that the intent is to bind to the outer if) that it is intended to bind to the inner if.

Blocks occur in a number of statements; is there a worthwhile cost/benefit in a guideline recommendation specifying that these blocks always be a compound statement?

The block associated with a switch statement is invariably a compound statement. A guideline recommendation that braces be used is very likely to be redundant in this case. Iteration statements are not as common as selection statements and are much less likely to be nested (in other iteration statements) than selection statements (compare Figure 1739.2 and Figure 1763.2), and experience suggests developer comprehension of such constructs is significantly affected by the use of braces. Experience suggests that the nested if statement is the only construct where the benefit of the use of braces is usually greater than the cost.

Cg 1742.1

The statement forming the block associated with either arm of an if statement shall not be an if statement.

6.4p4 The value of a condition that is an expression is the value of the expression, implicitly converted to bool for statements other than switch; if that conversion is ill-formed, the program is ill-formed.

If only constructs that are available in C are used, the set of possible expressions is the same.

This subsection concentrates on the form of the controlling expression.

The value of a controlling expression is used to make one of two choices. Values used in this way are generally considered to have a boolean role. Some languages require the controlling expression to have a boolean type, and their translators enforce this requirement. Some coding guideline documents contain recommendations that effectively try to duplicate this boolean type requirement found in other languages. Recommendations based on type not only face technical problems in their wording and implementation (caused by the implicit promotions and conversions performed in C), but also fail to address the real issues of developer comprehension and performance.

In the context of an if statement, do readers of the source distinguish between expressions that have two possible values (i.e., boolean roles), and expressions that may have more than two values being used in a context where an implicit test against zero is performed? Is the consideration of boolean roles cultural baggage carried over to C by developers who have previously used them in other languages? Do readers who have only ever programmed in C make use of boolean roles, or do they think in terms of a test against zero?

In the absence of studies of developer mental representations of algorithmic and source code constructs, it is not possible to reliably answer these questions. Instead, the following discussion looks at the main issues involved in making use of boolean roles and making use of the implicit test against zero special case.

A boolean role is not about the type of an expression (prior to the introduction of the type _Bool in C99, a character type was often used as a stand-in), but about the possible values an expression may have and how they are used. The following discussion applies whether a controlling expression has an integer, floating, or pointer type.

In some cases the top-level operator of a controlling expression returns a result that is either zero or one (e.g., the relational and equality operators). The visibility, in the source, of such an operator signals its boolean role to readers. However, in other cases (see Table 1763.2) developers write controlling expressions that do not contain explicit comparisons (the value of a controlling expression is implicitly compared against zero). What are the costs and benefits of omitting an explicit comparison? The following code fragment contains examples of various ways of writing a controlling expression:
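Below is a sketch of several such forms; the function names are hypothetical. Each pair tests the same condition, once relying on the implicit comparison against zero and once writing the comparison explicitly. The same applies to integer, pointer and floating operands. The last function shows a name (mimbo_flag, from the discussion of naming conventions) that suggests a boolean role without any visible comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Each pair tests the same condition; only the visible form differs. */
int implicit_int(int x)        { if (x)         return 1; return 0; }
int explicit_int(int x)        { if (x != 0)    return 1; return 0; }

int implicit_ptr(const int *p) { if (p)         return 1; return 0; }
int explicit_ptr(const int *p) { if (p != NULL) return 1; return 0; }

int implicit_flt(double d)     { if (d)         return 1; return 0; }
int explicit_flt(double d)     { if (d != 0.0)  return 1; return 0; }

/* A name suggesting two states, used without an explicit comparison. */
int uses_flag(bool mimbo_flag) { if (mimbo_flag) return 1; return 0; }
```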


Does the presence of an explicit visual (rather than an implicit, in the developer’s mind) comparison reduce either the cognitive effort needed to comprehend the if statement or the likelihood of readers making mistakes? Given sufficient practice, readers can learn to automatically process if (x) as if it had been written as if (x != 0). The amount of practice needed to attain an automatic level of performance is unknown. Another unknown is the extent to which the token sequence != 0 acts as a visual memory aid.

When the visible form of the controlling expression is denoted by a single object (which may be an ordinary identifier, or the member of a structure, or some other construct where a value is obtained from an object), that name may provide information on the values it represents. To obtain this information, readers might make use of the following:

• Software development conventions. In the software development community (and other communities) the term flag is generally understood to refer to something that can be in one of two states. For instance, the identifier mimbo_flag is likely to be interpreted as having two possible values relating to a mimbo, rather than referring to the national flag of Mimbo. Some naming conventions contain a greater degree of uncertainty than others. For instance, identifiers whose names contain the character sequence status sometimes represent more than two values.

• Natural language knowledge. Speakers of English regard some prepositions as being capable of representing two states. For instance, a cat is or is not black. This natural language usage is often adopted when selecting identifier names. For instance, is_flobber is likely to be interpreted as representing one of two states (being a, or having the attribute of, flobber or not).

• Real world knowledge. A program sometimes needs to take account of information from the real world. For instance, height above the ground is an important piece of information in an airplane flight simulator, with zero height having a special status.

• Application knowledge. The design of a program invariably makes use of knowledge about the application domain it operates within. For instance, the term ziffer may be used within the application domain that a program is intended to operate in. Readers of the source will need the appropriate application knowledge to interpret the role of this identifier.

• Program implementation conventions. The design of a program involves creating and using various conventions. For instance, a program dealing with book printing may perform special processing for books that don’t contain any pages (e.g., num_pages being zero is a special case).

• Conventions and knowledge from different sources may be mixed together. For instance, the identifier name current_color suggests that it represents color information. This kind of information is not usually thought about in terms of numeric values, and there are certainly more than two colors. However, assigning values to symbolic qualities is a common software development convention, as is assigning a special interpretation to certain values (e.g., using zero to represent no known color, a program implementation convention).

The likelihood of a reader assuming that an identifier name has a boolean role will depend on the cultural beliefs and conventions they share with the author of the source. There is also the possibility that, rather than using the identifier name to deduce a boolean role, readers may use the context in which it occurs to infer a boolean role. This is an example of trust based usage. Requiring that values always be compared (against true/false or zero/nonzero) leads to infinite regression, as in the sequence:

of zero or one (e.g., an equality or relational operator), it is possible to generate machine code that performs a branch rather than returning a value that is then compared. Some processors have a single instruction that performs a comparison and branch, while others have separate instructions (the comparison instruction setting processor condition flags that are then tested by a conditional branch instruction). On some processors simply loading a value into a register also results in a comparison against zero being made, with the appropriate processor condition flags being set. The use of conditional instructions is discussed elsewhere.

Some coding guideline documents recommend that the else form always be present, even if it contains no executable statements. Such a recommendation has the benefit of ensuring that there are never any mismatching if/else pairs. However, the same effect can be achieved by requiring nested if statements to be enclosed in braces (this issue is discussed elsewhere). The cost of adding empty else forms increases

6.8.4.2 The switch statement 1748

1746 If the first substatement is reached via a label, the second substatement is not executed.

Commentary

The flow of control of a sequence of statements is not influenced by how they were initially reached. The label may be reached as a result of executing a switch statement, or a goto statement. The issue of jumping into nested blocks or the body of iteration statements is discussed elsewhere.
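Here is a sketch of the situation covered by sentence 1746, using a hypothetical function enter_then_arm. A goto transfers control to a label inside the first substatement of an if statement whose controlling expression would otherwise be false. Once the labeled statement completes, execution continues after the whole if statement, so the else arm is not executed.

```c
/* Hypothetical function: when via_label is nonzero, control enters the
   first substatement through a label, bypassing the controlling expression. */
int enter_then_arm(int via_label)
{
    int trace = 0;

    if (via_label)
        goto then_arm;

    if (trace > 100) {      /* false on the normal path */
then_arm:
        trace += 1;         /* first substatement */
    } else {
        trace += 10;        /* second substatement: skipped after the goto */
    }
    return trace;
}
```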

C++

The C++ Standard does not explicitly specify the behavior for this case.

Other Languages

This statement applies to all programming languages that support jumps into more deeply nested blocks.

1747 An else is associated with the lexically nearest preceding if that is allowed by the syntax.

Languages that support nesting of conditional statements need a method of resolving which construct an else binds to. The rules used include the following:

• Not supporting, in the language syntax, unbracketed nesting (i.e., requiring braces or begin/end pairs) within the then arm. For instance, Algol 60 permits the usage IF q1 THEN a1 ELSE IF q2 THEN a2 ELSE a3, but the following is a syntax violation: IF q1 THEN IF q2 THEN a1 ELSE a2 ELSE a3.

• Using a matching token to pair with the if. A closing keyword is a common choice (fi in Algol 68, end if in Ada, while the C preprocessor uses #endif). In this case the bracketing formed by the if/fi pair prevents any ambiguity occurring.

• Like C, using the nearest preceding rule.
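Below is a sketch of C's nearest preceding if rule, using two hypothetical functions. In unbraced, the indentation suggests that the else belongs to the outer if, but it binds to the inner one. This form also violates Cg 1742.1, because the then arm of the outer if is itself an if statement. In braced, the then arm is a compound statement, so the else binds to the outer if, as the indentation intends.

```c
/* The else binds to the nearest preceding if (the inner one),
   whatever the indentation suggests. */
int unbraced(int a, int b)
{
    int r = 0;
    if (a)
        if (b)
            r = 1;
    else            /* misleading indentation: belongs to if (b) */
        r = 2;
    return r;
}

/* Braces on the then arm make the else bind to the outer if. */
int braced(int a, int b)
{
    int r = 0;
    if (a) {
        if (b)
            r = 1;
    } else
        r = 2;
    return r;
}
```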

Coding Guidelines

If the guideline recommendation on using braces is followed, there will only ever be one lexically preceding if that an else can be associated with. Some coding guideline documents recommend that an if statement always have an associated else form, even if it only contains the null statement.

A switch statement uses the exact value of its controlling expression, and it is not possible to guarantee the exact value of an expression having a floating type (there is a degree of unpredictability in the value between different implementations). For this reason, implementations are not required to support controlling expressions having a floating type.

C++

6.4.2p2


The condition shall be of integral type, enumeration type, or of a class type for which a single conversion function to integral or enumeration type exists (12.3).

If only constructs that are available in C are used, the set of possible expressions is the same.

Coding Guidelines

A controlling expression, in a switch statement, having a boolean role might be thought to be unusual, an if statement being considered more appropriate. However, the designer may be expecting the controlling expression to evolve to a non-boolean role, or the switch statement may once have contained more case labels.

Table 1748.1: Occurrence of switch statements having a controlling expression of the given type (as a percentage of all switch statements). Based on the translated form of this book’s benchmark programs.

   Type              %
   unsigned long     18.7
   unsigned char     12.4
   unsigned int      10.0
   unsigned short     2.8
   long               0.9
   other-types        0.2

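If a switch statement has an associated case or default label within the scope of an identifier with a variably modified type, the entire switch statement shall be within the scope of that identifier.133)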

[Figure; axis: case value density. Caption fragment: no-default results were scaled so that the total count of switch statements matched those that included a default label.]

Figure 1748.2: Number of case/default labels having a given number of statements following them (statements from any nested switch statements did not contribute towards the count of a label). Based on the visible form of the .c files.

Commentary

The declaration of an identifier having variably modified type can occur in one of the sequence of statements labeled by a case or default, provided it appears within a compound statement that does not contain any other case or default labels associated with that switch statement, or it appears after the last case or default label in the switch statement. In the compound statement case, the variably modified type will not be within the scope of any case or default labels (its lifetime terminates at the end of the compound statement).
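Below is a sketch of the two placements just described. The function name is hypothetical, and n is assumed to be greater than zero, because a variable length array must have a positive size. The first array is confined to a compound statement containing no case or default labels. The second follows the last label of the switch. Before C23, a label cannot be immediately followed by a declaration, hence the null statement after default:.

```c
/* Hypothetical function: both arms sum values held in a VLA of length n. */
int vla_in_switch(int kind, int n)
{
    int total = 0;

    switch (kind) {
    case 0:
        {   /* no other case or default labels inside this compound statement */
            int buf[n];                 /* variably modified type */
            for (int i = 0; i < n; i++)
                buf[i] = i + 1;
            for (int i = 0; i < n; i++)
                total += buf[i];
        }   /* lifetime of buf ends here, before any later label */
        break;
    default:
        ;   /* null statement: pre-C23, a label cannot precede a declaration */
        int tail[n];                    /* follows the last case/default label */
        for (int i = 0; i < n; i++)
            tail[i] = 2;
        for (int i = 0; i < n; i++)
            total += tail[i];
        break;
    }
    return total;
}
```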

The wording of the requirement is overly strict in that it prohibits uses that might be considered well behaved. For instance:

Attempting to create wording to support such edge cases was considered to be a risk (various ambiguities may later be found in it) that was not worth the benefit. Additional rationale for this requirement is discussed elsewhere.

C90

Support for variably modified types is new in C99.

C++

Support for variably modified types is new in C99 and is not specified in the C++ Standard.

The C++ Standard contains the additional requirement that (the wording in a subsequent example suggests that being visible rather than in scope is more accurate terminology):

6.7p3

It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program that jumps77) from a point where a local variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has POD type (3.9) and is declared without an initializer (8.5).


Coding Guidelines

Some coding guideline documents recommend that all switch statements contain a default label. There does not appear to be an obvious benefit (as defined by these coding guideline subsections, although there may be benefits for other reasons) for such a guideline recommendation. To adhere to the guideline, developers simply need to supply a default label and an associated null statement. There are a number of situations where adhering to such a guideline recommendation leads to the creation of redundant code (e.g., if all possible values are covered by the case labels, either because they handle all values that the controlling expression can take or because execution of the switch statement is conditional on an if statement that guarantees the controlling expression is within a known range).
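Here is a sketch of the two redundant-default situations just mentioned; the names are hypothetical. In dx, every enumeration value has its own case label. In small_word_length, the enclosing if guarantees that the controlling expression is in the range 0 to 2, so a default label would never be selected.

```c
/* Hypothetical enumeration: every value has a case label below. */
enum direction { NORTH, EAST, SOUTH, WEST };

int dx(enum direction d)
{
    switch (d) {
    case NORTH: return 0;
    case EAST:  return 1;
    case SOUTH: return 0;
    case WEST:  return -1;
    }
    return 0;   /* not reached for valid enumeration values */
}

/* Length of the English word for 0, 1 or 2; the if guarantees the range,
   so a default label inside the switch would be dead code. */
int small_word_length(int v)
{
    if (v >= 0 && v <= 2) {
        switch (v) {
        case 0: return 4;   /* "zero" */
        case 1: return 3;   /* "one"  */
        case 2: return 3;   /* "two"  */
        }
    }
    return -1;
}
```

When a switch over an enumerated type has no default label, GCC's -Wswitch (enabled by -Wall) reports enumeration constants that lack a case label. A default label suppresses that report.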

Usage

In the visible form of the .c files, 72.8% of switch statements contain a default label.

1752 (Any enclosed switch statement may have a default label or case constant expressions with values that duplicate case constant expressions in the enclosing switch statement.)

Commentary

This specification (semantics in a Constraints clause) clarifies the interpretation to be given to the phrase “in the same switch statement” appearing earlier in this Constraints clause.
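Below is a sketch of what the parenthesized sentence permits; the function name is hypothetical. The inner switch has its own default label and reuses the case value 1, which also labels the enclosing case. Neither is a constraint violation, because each case and default label belongs to the nearest enclosing switch statement.

```c
/* Hypothetical function: case 1 and default appear in both switch statements. */
int nested_switch(int outer, int inner)
{
    switch (outer) {
    case 1:
        switch (inner) {
        case 1:  return 11;   /* same value as the enclosing case 1 */
        default: return 10;   /* the inner switch's own default */
        }
    case 2:
        return 20;
    default:
        return 0;
    }
}
```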

Semantics

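1753 A switch statement causes control to jump to, into, or past the statement that is the switch body, depending on the value of a controlling expression, and on the presence of a default label and the values of any case labels on or in the switch body.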

Commentary

This defines the term switch body. Developers also use the terminology body of the switch.

It is possible to write a switch statement as an equivalent sequence of if statements. However, experience shows that in some cases the switch statement appears to require significantly less (cognitive) effort to comprehend than a sequence of if statements.
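Here is a sketch of that equivalence, using a hypothetical classification of operator characters. The switch form and the sequence of if statements return the same value for every input. Consecutive case labels that share a body (case '+': case '-':) correspond to a || in the if form, and the default label corresponds to the final return.

```c
/* Hypothetical cost table written as a switch statement... */
int cost_switch(char op)
{
    switch (op) {
    case '+':
    case '-': return 1;
    case '*': return 3;
    case '/': return 10;
    default:  return 0;
    }
}

/* ...and as an equivalent sequence of if statements. */
int cost_if(char op)
{
    if (op == '+' || op == '-')
        return 1;
    if (op == '*')
        return 3;
    if (op == '/')
        return 10;
    return 0;
}
```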

Common Implementations

Many processors include some form of instruction (often called an indirect jump) that indexes into a table (commonly known as a jump table) to obtain a location to jump to. The extent to which it is considered to be more efficient to use such an instruction, rather than a series of if statements, varies between processors (whose behavior varies for the situation where the index is out of range of the jump table) and implementations (the sophistication of the available optimizer). The presence of a default label creates additional complications in that all values of the controlling expression not covered by a case label need to be explicitly handled. Spuler[1300] discusses the general issues.
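Translators implement a jump table in machine code, but its logic can be modeled in C with an array of function pointers indexed by the controlling expression. This is a sketch of the technique, not of any particular translator's output. The handler functions and the case range 0 to 2 are hypothetical. The explicit range check plays the part of the out-of-range handling, sending uncovered values to the default handler.

```c
#include <stddef.h>

/* Hypothetical handlers standing in for the code following each label. */
static int on_zero(void)    { return 100; }
static int on_one(void)     { return 101; }
static int on_two(void)     { return 102; }
static int on_default(void) { return -1; }

/* Jump table modeling: switch (v) { case 0: case 1: case 2: default: } */
static int (*const table[])(void) = { on_zero, on_one, on_two };

int dispatch(int v)
{
    /* Values outside the table's range must be handled explicitly. */
    if (v < 0 || (size_t)v >= sizeof table / sizeof table[0])
        return on_default();
    return table[v]();
}
```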

Some translators implement switch statements as a series of if statements. Knowledgeable developers know that, in such implementations, placing the most frequently executed case labels before the less frequently executed ones can provide a worthwhile performance improvement. Some translators[22, 588] provide an option that allows the developer to specify whether a jump table, a sequence of if statements, or some other method should be used.

Optimal execution time performance is not the only factor that implementations need to consider. The storage occupied by the jump table sometimes needs to be taken into account. In a simple implementation it is proportional to the difference between the maximum and minimum values appearing in the case labels (which may not be considered an efficient use of storage if there are only a few case labels used within this range). A more sophisticated technique than using a series of if statements is to create a binary tree of case label values and jump addresses, the value of the controlling expression being used to walk this tree to obtain the destination address. Some optimizers split the implementation into a jump table for those case label values that are contiguous and a binary tree for the outlying values.
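Here is a sketch of the search-based alternative for sparse case values. In place of an explicit binary tree, it uses a binary search (the standard library function bsearch) over a sorted array of case values. Both approaches need a number of comparisons proportional to the logarithm of the number of case labels. The table contents and the result values, which stand in for jump addresses, are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical sparse case labels, sorted by value; result stands in for
   the destination address of each case. */
struct case_entry { int value; int result; };

static const struct case_entry cases[] = {
    { -40, 1 }, { 7, 2 }, { 300, 3 }, { 5000, 4 }, { 100000, 5 }
};

/* bsearch passes the key first, then a pointer to an array element. */
static int cmp_case(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int v = ((const struct case_entry *)elem)->value;
    return (k > v) - (k < v);
}

/* Returns default_result when no case value matches (the default label). */
int sparse_switch(int v, int default_result)
{
    const struct case_entry *e = bsearch(&v, cases,
                                         sizeof cases / sizeof cases[0],
                                         sizeof cases[0], cmp_case);
    return e ? e->result : default_result;
}
```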


Translator vendors targeting modern processors face an additional problem. Successful processors often have a range of different implementations, creating a processor family (e.g., the Intel Pentium series). These different processor implementations usually have different performance characteristics and, in the case of the switch statement, different levels of sophistication in branch prediction. How does a translator decide whether to use a jump table or if statements when the optimal code varies between different implementations of a particular processor?

A study by Uh and Whalley[1420] compared (see Table 1753.1) the performance of a series of if statements and the equivalent jump table implementation. For three of the processors it was worth using a jump table when more than two if statements were likely to be executed. In the case of the ULTRASPARC-1 the figure was more than eight if statements executed (this was put down to the lack of hardware support for branch prediction of indirect jumps).

Table 1753.1: Performance comparison (in seconds) of some implementation techniques for a series of if statements (contained in a loop that iterated 10,000,000 times) using (1) linear search (LS), or (2) indirect jump (IJ), for a variety of processors in the SPARC family. br is the average number of branches per loop iteration. Based on Uh and Whalley.[1420]

When the controlling expression is denoted by an object having a character type, the possible range of values is known to fit in a byte. Even relatively simple optimizers often check for, and make use of, this special case.

Other Languages

Many languages have a single integer type, so there is no conversion to perform for case label values.

Strongly typed languages usually require that the type of the case label value be compatible with the type of the controlling expression; there are not usually any implicit conversions. Enumerated constants are often defined to be separate types that are not compatible with any integer type.

Coding Guidelines

This C sentence deals with the relationship between individual case label values and the controlling expression. The following points deal with the relationship between different case label values within a given switch statement:

• Mixing case labels whose values are represented using both character constants and integer constants is making use of representation information (in this context the macro EOF might be interpreted in its symbolic form of representing an end-of-file character, rather than an integer constant). There does not appear to be a worthwhile benefit in having a deviation that permits the use of the integer constant 0 rather than the character constant '\0', on the grounds of improved reader recognition performance. The character constant '\0' is the most commonly occurring character constant (10% of all character constants in the visible form of the .c files, even if it only represents 1% of all constant tokens denoting the value 0).

• Mixing case labels whose values are represented using both enumeration constants and some other form of constant representation (e.g., an integer constant) is making use of the underlying representation of the enumerated constants. The same is also true if enumerated constants from different enumeration types are mixed.

• Mixing integer constants represented using decimal, hexadecimal, or octal lexical forms. The issue of visually mixing integer constants having different lexical forms is discussed elsewhere.
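Here is a sketch of a switch whose case labels follow the points above. It assumes a hypothetical function that classifies values returned by getchar. All character values are written as character constants (including '\0' rather than 0), and EOF appears only in its symbolic form. Because EOF is required to be negative, it cannot duplicate any of the character constants used here.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical classifier for values returned by getchar(). */
int classify(int c)
{
    switch (c) {
    case '\0': return 0;    /* rather than case 0 */
    case '\n': return 1;    /* rather than case 10 or case 0x0a */
    case ' ':
    case '\t': return 2;
    case EOF:  return -1;   /* symbolic form, not an integer constant */
    default:   return 3;
    }
}
```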

Floating-point literals are very rarely seen in case labels. The guideline recommendation dealing with exact comparison of floating-point values is applicable to this usage.

6.8.5 Iteration statements 1763

1761 133) That is, the declaration either precedes the switch statement, or it follows the last case or default label associated with the switch that is in the block containing the declaration.

Commentary

If the declaration is not followed by any case or default labels, all references to the identifier it declares can only occur in the statements that follow it (which can only be reached via a jump to preceding case or default labels, unless a goto statement jumps to an ordinary label within the statement list).

   switch (expr)
   { int i = 4;

and thus if the controlling expression has a nonzero value, the call to the printf function will access an indeterminate value. Similarly, the call to the function f cannot be reached.

Commentary

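Objects with static storage duration are initialized on program startup.

1763
iteration-statement:
      while ( expression ) statement
      do statement while ( expression ) ;
      for ( expression_opt ; expression_opt ; expression_opt ) statement
      for ( declaration expression_opt ; expression_opt ) statement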

Commentary

The terms loop header or head of the loop are sometimes used to refer to the source code location containing the controlling expression of a loop (in the case of a for statement, they might be applied to all three components bracketed by parentheses).

It is often claimed that programs spend 90% of their time executing 10% of their code. This characteristic is only possible if the time is spent in a subset of the program's iteration statements, or a small number of functions called within those statements. While there is a large body of published research on program performance, there is little evidence to back up this claim (one study[1344] found that 88% of the time was spent in 20% of the code, while analysis[1455] of some small embedded applications found that 90% of the time was spent in loops). It may be that researchers are attracted to applications which spend their time in loops because there are often opportunities for optimization. Most existing, published, execution time measurements are based on engineering and scientific applications; for database-oriented applications[1160] and operating systems[1390] loops have not been found to be so important.

The ; specified as the last token of a do statement is not needed to reduce the difficulty of parsing C source. It is simply part of an adopted convention.

C90

Support for the form:

    FOR I=start TO end BY step

Some languages (e.g., BCPL, Modula-2) require step to be a translation time constant. Both Ada and Pascal require for statements to have a step size of one. Ada uses the syntax:

which also acts as the definition of counter.

Cobol supports a PERFORM statement, which is effectively a while statement:

    PERFORM UNTIL quantity > 1000

The equivalent looping construct in Fortran is known as a do statement. A relatively new looping construct, at least in the Fortran Standard, is FORALL. This is used to express a looping computation in a form that can more easily be translated for parallel execution. Some languages (e.g., Modula-2, Pascal) use the keywords repeat/until instead of do/while, while other languages (e.g., Ada) do not support an iteration statement with a test at the end of the loop.

A few languages (e.g., Icon[236], which uses the term generators) have generalized the looping construct to provide what are commonly known as iterators. An iterator enumerates the members of a set (a mechanism for accessing each enumerated member is provided in the language), usually in some unspecified order, and has a loop termination condition.


Common Implementations

Many programs spend a significant percentage of their time executing iteration statements. The following are some of the ways in which processor and translator vendors have responded to this common usage:

• Translator vendors wanting to optimize the quality of generated machine code have a number of optimization techniques available to them (0, translator optimizations). A traditional loop optimization is strength reduction[280] (which replaces costly computations by less expensive ones), while more ambitious optimizers might perform hoisting of loop invariants and loop unrolling (1774, loop unrolling). Loop invariants are expressions whose value does not vary during the iteration of a loop; such expressions can be hoisted to a point just outside the start of the loop. Traditionally translators have only performed loop unrolling on for statements. (Translation time information on the number of loop iterations and step size is required; this information can often be obtained from the expressions in the loop header, i.e., the loop body does not need to be analyzed.)

More sophisticated optimizations include making use of data dependencies to order the accesses to storage (988, data dependency). As might be expected with such a performance critical construct, a large number of other optimization techniques are also available.

• Processor vendors want to design processors that will execute programs as quickly as possible. Holding the executed instructions in a processor's cache saves the overhead of fetching them from storage, and most processors cache both instructions and object values (0, cache). Some processors (usually DSPs) have what is known as a zero overhead loop buffer (effectively a software controlled instruction cache). The sequence of instructions in such a loop buffer can be repetitively executed with zero loop overhead (the total loop count may be encoded in the looping instruction or be contained in a register). Because of their small size (the Agere DSP16000[6] loop buffer has a limit of 31 instructions) and restrictions on the instructions that may be executed (e.g., no instructions that change the flow of control), optimizers can have difficulty making good use of such buffers.[1419]

The characteristics of loop usage often mean that successive array elements are accessed on successive loop iterations (i.e., storage accesses have spatial locality). McKinley and Temam[932] give empirical results on the effect of loops on cache behavior (based on Fortran source).

Some CISC processors support a decrement/increment and branch on nonzero instruction,[323, 625] ideal for implementing loops whose termination condition is the value zero (something that can be arranged in handwritten assembler, but which rarely happens in loops written in higher-level languages; see Table 1763.1). The simplifications introduced by the RISC design philosophy did away with this kind of instruction; programs written in high-level languages did not contain enough loops of the right kind to make it cost effective to support such an instruction. However, one application domain where a significant amount of code is still written in assembler (because of the comparatively poor performance of translator generated machine code; 0, translator performance vs assembler) is that addressed by DSP processors (0, DSP processors), which often contain such decrement (and/or increment) branch instructions (the SC140 DSP core[989] includes hardware loop counters that support up to four levels of loop nesting).

The C compiler for the Unisys e-@ction Application Development Solutions[1424] uses the JGD processor instruction to optimize the loop iteration test. However, this usage limits the maximum number of loop iterations to $2^{35} - 2$, a value that is very unlikely to be reached in a commercial program (a trade-off made by the compiler implementors between simplicity and investing effort to handle very rare situations).
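The first bullet above mentions strength reduction and the hoisting of loop invariants. The following sketch applies both by hand to show their effect; the function names are hypothetical, and an optimizer would perform the equivalent transformation on the generated code rather than on the source.

```c
#include <stddef.h>

/* Before: scale * offset does not change inside the loop (a loop
   invariant), and the index i * stride needs a multiplication on every
   iteration. */
long sum_strided(const int *a, size_t n, size_t stride, int scale, int offset)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i * stride] * (scale * offset);
    return total;
}

/* After: the invariant product is hoisted out of the loop, and the
   multiplication i * stride is replaced by adding stride to a running
   index (strength reduction). */
long sum_strided_optimized(const int *a, size_t n, size_t stride, int scale, int offset)
{
    long total = 0;
    const int factor = scale * offset;   /* hoisted loop invariant */
    size_t index = 0;                    /* always equal to i * stride */

    for (size_t i = 0; i < n; i++)
    {
        total += a[index] * factor;
        index += stride;                 /* an addition replaces the multiplication */
    }
    return total;
}
```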

Obtaining an estimate of the execution time of a sequence of statements may require estimating the number of times an iteration statement will iterate. Some implementations provide a mechanism for the developer to provide iteration count information to the translator. For instance, the translator for the TMS320C6000[1373] supports the following usage:

    #pragma MUST_ITERATE (30) /* Will loop at least 30 times */

Another approach is for the translator to deduce the information from the source.[567]

Program loops may not always be expressed using an iteration-statement (for instance, they may be created using a goto statement). Ramalingam[1158] gives an algorithm for identifying loops in almost linear time.
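A loop need not be written as an iteration-statement. The hypothetical function `sum_to_goto` below builds a loop from an if statement and a backward goto, the kind of construct a loop-identification algorithm has to recognize; `sum_to_while` performs the same computation using an iteration-statement.

```c
/* Sums 1 .. n using an if statement and a backward goto. */
int sum_to_goto(int n)
{
    int total = 0;
    int i = 1;

top:
    if (i <= n)
    {
        total += i;
        i++;
        goto top;       /* the back edge that makes this a loop */
    }
    return total;
}

/* The same loop expressed as an iteration-statement. */
int sum_to_while(int n)
{
    int total = 0;
    int i = 1;

    while (i <= n)
    {
        total += i;
        i++;
    }
    return total;
}
```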


Figure 1763.2: Number of functions containing iteration-statements nested to the given maximum nesting level; for embedded C[397] (whose data was multiplied by a constant to allow comparison) and the visible form of the .c files (zero nesting depth denotes functions not containing any iteration-statements).

Table 1763.1: Occurrence of various kinds of for statement controlling expressions (as a percentage of all such expressions). Where object is a reference to a single object, which may be an identifier, a member (e.g., s.m, s->m->n, or a[expr]); assignment is an assignment expression, integer-constant is an integer constant expression, and expression denotes expressions that contain arithmetic and shift operators. Based on the visible form of the .c files.

    Abstract Form of for loop header                                       %
    assignment ; identifier < identifier ; identifierv++                 33.2
    assignment ; identifier < integer-constant ; identifierv++           11.3
    assignment ; identifier ; assignment                                  7.0
    assignment ; identifier < expression ; identifierv++                  3.3
    assignment ; identifier < identifier ; ++videntifier                  2.7
    assignment ; identifier != identifier ; assignment                    2.5
    assignment ; identifier <= identifier ; identifierv++                 2.2
    assignment ; identifier >= integer-constant ; identifierv             1.6
    assignment ; identifier < function-call ; identifierv++               1.4
    assignment ; identifier < identifier ; identifierv++, identifierv++   1.4

Table 1763.2: Occurrence of various kinds of while statement controlling expressions (as a percentage of all while statements). Where object is a reference to a single object, which may be an identifier, a member (e.g., s.m, s->m->n, or a[expr]); assignment is an assignment expression, integer-constant is an integer constant expression, and expression denotes expressions that contain arithmetic and shift operators. Based on the visible form of the .c files.

    Abstract Form of Control Expression      %
    function-call                          4.4
    object > integer-constant              4.0
    object != integer-constant             1.2
    objectv > integer-constant             1.1


for (enum fred { jim, sheila = 10 } i = jim; i < sheila; i++) // loop body

Proposed Committee Response: The intent is clear enough; fred, jim, and sheila are all identifiers which do not denote objects with auto or register storage classes, and are not allowed in this context.

C90

Support for this functionality is new in C99.

C++

6.4p2 The declarator shall not specify a function or an array. The type-specifier-seq shall not contain typedef and shall not declare a new class or enumeration.

    for (static int ls; /* constraint violation */


Other Languages

In some languages (e.g., Ada and Algol 68) the identifier used as a loop control variable in a for statement is implicitly declared to have the appropriate type (based on the type of the expressions denoting the start and end values).

Coding Guidelines

The ability to declare identifiers in this context is new in C99, and at the time of this writing there is insufficient experience with its use to know whether any guideline recommendation is worthwhile.
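A sketch of what the constraint on the declaration in clause-1 permits (the function `sum_squares` is hypothetical). The declarations that are not permitted appear only in a comment, since a translator is required to diagnose a constraint violation.

```c
/* Hypothetical function. clause-1 may declare objects with storage
   class auto (the default) or register. */
int sum_squares(int n)
{
    int total = 0;

    for (register int i = 1; i <= n; i++)
        total += i * i;

    /* Not permitted in clause-1:
         for (static int s = 0; s < n; s++) ...
         for (enum e { A, B } k = A; k < B; k++) ...
       The first violates the storage class constraint; the second
       declares identifiers that are not objects, which the proposed
       committee response above treats as not allowed. */
    return total;
}
```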

Semantics

1766 An iteration statement causes a statement called the loop body to be executed repeatedly until the controlling expression compares equal to 0.

Commentary

This defines the term loop body. The term loop is commonly used as a noun by developers to refer to constructs associated with iteration statements (which are rarely referred to as iteration statements by developers). For instance, the terms loop statement, or simply a loop, are commonly used by developers.

Execution of the loop may also terminate because a break, goto, or return statement is executed. The discussion on the evaluation of the controlling expression in an if statement is applicable here (1744, if statement operand compare against 0).

It is often necessary to access a block of storage (e.g., to copy it somewhere else, or to calculate a checksum of its contents). For anything other than the smallest of blocks the overhead of a loop can be significant:

    void send(register unsigned char *to,

The above loop requires a comparison after every item copied. Unrolling the loop would reduce the number of comparisons per item copied. However, because count is not known at translation time, an optimizer is unlikely to perform loop unrolling. The loop can be unrolled by hand, making sure that the code also handles the situation where the number of items being copied is not an exact multiple of the loop unroll factor. A technique proposed by Tom Duff[378] (usually referred to as Duff's device) is (the original example used *to, i.e., the bytes were copied to some memory mapped serial device):

    void send(register unsigned char *to,
    ...
        case 0: do{ *to++ = *from++;
    ...
    }
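The simple loop and Duff's device can be sketched as follows. This is a reconstruction for illustration, not the original code: the function names are hypothetical, register is omitted, and the destination is an ordinary array (to++) rather than the memory mapped device register of Duff's original. Like the original, both functions assume count is greater than zero.

```c
/* Straightforward copy: one loop test per item copied. Assumes count > 0. */
void send_simple(unsigned char *to, const unsigned char *from, int count)
{
    do
        *to++ = *from++;
    while (--count > 0);
}

/* Duff's device: the loop body is unrolled eight times, and the switch
   jumps into the middle of the do statement so that the count % 8
   leftover items are copied on the first pass. Assumes count > 0. */
void send_duff(unsigned char *to, const unsigned char *from, int count)
{
    int n = (count + 7) / 8;   /* number of passes through the do loop */

    switch (count % 8)
    {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

For a count of 10, the switch enters at case 2, copies two items, and the remaining pass of the do statement copies the other eight.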


C++

The C++ Standard converts the controlling expression to type bool and expresses the termination condition in terms of true and false. The final effect is the same as in C.

Other Languages

In many other languages the model of a for loop involves a counter being incremented (or decremented) from a start value to an end value, while the model of a while loop (or whatever it is called) is of something that iterates until some condition is met. There is considerable overlap between these two models (it is always possible to rewrite one form of loop in terms of the other). The differences between the two kinds of loop are purely conceptual ones, created by developer loop classification models. Loop classification is often based on deciding whether a loop has the attributes needed to be considered a for loop (e.g., the number of iterations being known before the first iteration starts), all other loops being classified as while loops. Early versions of Fortran performed the loop termination test at the end of the loop. This meant that loops always iterated at least once, even if the test was false on the first iteration.

Coding Guidelines

Some coding guideline documents recommend that loop termination only occur when the condition expressed in the controlling expression becomes equal to zero. A number of benefits are claimed to accrue from adhering to this recommendation. These include readers being able to quickly find out the conditions under which the loop terminates (by looking at the loop's controlling expression, which might only be a benefit for one form of reading) and the desire not to jump across the control flow. It is always possible to transform source

    extern int glob;
    static int arr[NUM_ELEMS];


Looking at the controlling expression in f_1 and f_2, it appears to be easy to deduce the condition under which the loop will terminate. However, in both cases the body of the loop contains a test that also effectively terminates the loop (in the case of f_2 the body of the loop has increased in complexity by the introduction of an if statement). The function f_3 handles the case where guidelines recommend against modifying the loop control variable.

Any guideline recommendation needs to be based on a comparison of the costs and benefits of the loop constructs in these functions (and other cases). Your author knows of no studies that provide the information needed to make such a comparison. For this reason this coding guideline subsection is silent on the issue of how loops might terminate. A loop where it is known, at translation time, that the number of iterations is zero is a loop containing redundant code. The issue of redundant code is discussed elsewhere (190, redundant code).

1767 The repetition occurs regardless of whether the loop body is entered from the iteration statement or by a jump.

Commentary

This is a requirement on the implementation.

This sentence was added by the response to DR #268.

Other Languages

Many languages (e.g., Pascal, Ada) treat loop bodies as indivisible entities and do not permit a jump into them (although it is usually possible to jump out of them).

Coding Guidelines

Some coding guideline documents recommend against jumping into the body of a loop. One argument is that a reader of the source may not notice that a loop could be entered in this way and make a modification that fails to take this case into account (i.e., introduce a fault).

There are a variety of situations where jumping into the body of a loop may result in code that is less likely to contain faults and be less costly to maintain (see the example given for the goto statement; 1790, goto).
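The following sketch (the function `count_after_jump` is hypothetical) enters the body of a while statement via goto. Because repetition does not depend on how the body was entered, the controlling expression is evaluated once the body completes and the loop continues as usual; the body executes once even when the controlling expression would have been false on entry.

```c
/* Hypothetical function: enters the loop body via goto instead of
   through the controlling expression. Returns how many times the body
   executed. */
int count_after_jump(int limit)
{
    int i = 0;
    int executions = 0;

    goto inside;              /* bypasses the first evaluation of i < limit */
    while (i < limit)
    {
inside:
        executions++;
        i++;
    }                         /* after the body, i < limit is evaluated as usual */
    return executions;
}
```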

EXAMPLE


6.8.5.1 The while statement

    for (int i = 0; i < 10; i++)
        g((struct S){.mem_1 = i, .mem_2 = 42});


6.8.5.2 The do statement 1773

Coding Guidelines

Why do developers choose to use a while statement rather than a for statement? Technically a loop can be written using either kind of statement. Both forms of iteration statement are likely to involve initializing, testing, and modifying one or more objects that systematically change over successive iterations. The for statement places these three components in a contiguous, visibly prominent, location. Other reasons for the choice (65.5% for, 34.5% while) include:

• C culture. The use of a particular kind of loop to perform a particular operation may be something that developers learn as part of the process of becoming a C programmer. Measurements of the two looping constructs (see Table 1763.1 and Table 1763.2) show that for statements often count up to some value and while statements iterate until an equality operator is true. The pattern of usage seen in the source is the sum of the operations (e.g., always using a for statement to loop over the elements of an array and a while statement to loop over a linked list) required to implement the application.

• Individual habits. While learning to program, a developer may have chosen (perhaps a random selection, or purely a class exercise to practice using a language construct) to use a particular construction to perform some operation. Reuse of the same construction to perform the same, or similar, operations leads to it becoming established as part of their repertoire. The pattern of usage seen in source code is the sum of individual habits.
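The two usage patterns mentioned in the C culture bullet above can be sketched as follows (struct node and both functions are hypothetical).

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node
{
    int value;
    struct node *next;
};

/* for statement: count up over the elements of an array. */
int sum_array(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* while statement: follow links until a null pointer is reached. */
int sum_list(const struct node *p)
{
    int total = 0;
    while (p != NULL)
    {
        total += p->value;
        p = p->next;
    }
    return total;
}
```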

In both cases the choice of for/while involves a process of algorithmic problem classification. Which most closely matches the developer's mental model of the operations being performed? At the time of this writing there is insufficient information to evaluate whether there is a cost/benefit case for the use of while statements, rather than for statements. These coding guidelines do not discuss this issue any further.

There is a commonly seen idiom that uses side effects in the evaluation of the controlling expression to modify the value of an object in the controlling expression (i.e., the loop control variable; 1774, loop control variable). The discussion on controlling expressions in an if statement (1740, controlling expression if statement) showed that removing such side effects in the controlling expression of while statements would incur the cost of having to create and maintain two identical statements (one outside the loop and one inside). Your author is not able to estimate if this cost is less than the potential benefits of not having the side effects in the controlling expression. For this reason no guideline recommendation is made for while statements. The duplication cost is not applicable to a do statement (because the loop is always executed at least once). Given that the use of a do statement is relatively rare and that developers are likely to be familiar with the side effect idioms that occur in controlling expressions, no guideline recommendation is given here either.
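The familiar form of this idiom is `while ((c = getchar()) != EOF)`. The sketch below uses a string rather than getchar so that it is self-contained (the function names are hypothetical), and shows the duplicated assignment that removing the side effect requires.

```c
#include <stddef.h>

/* Idiom: the controlling expression assigns to ch (a side effect) and
   then tests the value assigned. Counts the characters before the '\0'. */
size_t count_chars_idiom(const char *s)
{
    size_t count = 0;
    char ch;

    while ((ch = *s++) != '\0')
        count++;
    return count;
}

/* No side effect in the controlling expression: the assignment to ch now
   has to appear twice, once before the loop and once at the end of the body. */
size_t count_chars_no_side_effect(const char *s)
{
    size_t count = 0;
    char ch = *s;              /* first copy of the assignment */

    while (ch != '\0')
    {
        count++;
        s++;
        ch = *s;               /* second copy, to be kept in step with the first */
    }
    return count;
}
```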

Example

One use of the do statement is to solve the dangling semicolon problem that can occur when a function-like macro replaces a function call (1933, macro function-like). Bracketing a sequence of statements with braces creates a compound statement, which does not require a terminating semicolon. In most contexts a semicolon following a function-like macro invocation is a harmless null statement. However, when the invocation forms the first arm of an if statement that has an else arm, the presence of the semicolon is a syntax violation. Enclosing the sequence of statements in the body of a do statement, whose controlling expression is false, avoids this problem.
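A minimal sketch of the technique; the macro `SWAP_INT` and the function `sort_pair` are hypothetical. The brace-only version of the macro is shown in a comment because using it in the if statement below would not compile.

```c
/* Hypothetical macro. The brace-only version
       #define SWAP_BRACES(a, b) { int t = (a); (a) = (b); (b) = t; }
   fails in the if statement below: the compound statement completes the
   if statement, the ';' is then a null statement, and the else has no
   matching if. Wrapping the statements in a do statement whose
   controlling expression is 0 makes the invocation plus ';' a single
   statement. */
#define SWAP_INT(a, b) do { int swap_tmp_ = (a); (a) = (b); (b) = swap_tmp_; } while (0)

/* Puts *first and *second into ascending order; returns 1 if they were swapped. */
int sort_pair(int *first, int *second)
{
    int swapped = 1;

    if (*first > *second)
        SWAP_INT(*first, *second);   /* the ';' completes the do statement */
    else
        swapped = 0;

    return swapped;
}
```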


6.8.5.3 The for statement

for (i = a; i < j; i += sizeof(enum {b, a}))


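Except for the behavior of a continue statement in the loop body, the statement

    for ( expression-1 ; expression-2 ; expression-3 ) statement

and the sequence of statements

    expression-1 ;
    while ( expression-2 ) {
        statement
        expression-3 ;
    }

are equivalent.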

Like the C90 Standard, the C++ Standard specifies the semantics in terms of an equivalent while statement. However, the C++ Standard uses more exact wording, avoiding the possible ambiguities present in the C90 wording.
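The qualification about continue matters because, in the while form, a continue skips expression-3 at the end of the body. A small sketch (hypothetical function names) showing a for statement and a while statement that has to repeat the increment before its continue:

```c
/* for statement: continue transfers control to the end of the loop body,
   after which expression-3 (i++) is still evaluated. Sums the odd values
   in 0 .. n-1. */
int sum_odd_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
    {
        if (i % 2 == 0)
            continue;
        total += i;
    }
    return total;
}

/* Rewritten in the while form: expression-3 is now an ordinary statement
   at the end of the body, which continue would skip, so the increment has
   to be repeated before the continue. */
int sum_odd_while(int n)
{
    int total = 0;
    int i = 0;
    while (i < n)
    {
        if (i % 2 == 0)
        {
            i++;        /* without this, the loop would not terminate for n > 0 */
            continue;
        }
        total += i;
        i++;
    }
    return total;
}
```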

Other Languages

In most other languages the ordering of expressions puts the controlling expression last. Or, to be more exact, an upper or lower bound for the loop control variable appears last. Most other languages do not support having anything other than the loop control variable (1774, loop control variable) tested against a value that is evaluated once, before the first iteration. Some languages (e.g., Ada, Algol 68, and Pascal) do not allow the loop control variable to be modified by the body of the loop.

Common Implementations

Loop unrolling is the process of decreasing the number of iterations a loop makes by duplicating the statement in the loop body.[329] For instance:

    for (loop = 0; loop < 10; loop++)

Loop unrolling reduces the number of jumps performed (which can be a significant saving when the loop body is short) and, by increasing the number of statements in the loop body, creates optimization opportunities (which, in the above example, could result in two loop bodies executing in less time than twice the time for a single iteration). When the iteration count is not exactly divisible by the unrolling factor, copies of the loop body may need to occur before the start, or after the end, of the loop statement.

At a minimum, loop unrolling requires knowing the number of loop iterations, and the amount by which the loop control variable is incremented, at translation time. Implementations often place further restrictions on the loops that they unroll (requiring the loop body to consist of a single basic block is a common restriction).
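Unrolling by hand, with a remainder loop handling counts that are not a multiple of the unroll factor, can be sketched as follows (the function names and the factor of 4 are arbitrary choices). Unlike an optimizer, the hand-written form does not need the iteration count at translation time.

```c
#include <stddef.h>

/* Rolled loop: one test and one jump per element. */
void scale_rolled(int *a, size_t n, int factor)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= factor;
}

/* Unrolled by a factor of 4: the body is duplicated, giving one test and
   jump per four elements. When n is not a multiple of 4, a copy of the
   original loop after the unrolled one handles the leftover elements. */
void scale_unrolled(int *a, size_t n, int factor)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
    {
        a[i]     *= factor;
        a[i + 1] *= factor;
        a[i + 2] *= factor;
        a[i + 3] *= factor;
    }
    for (; i < n; i++)       /* remainder: at most 3 iterations */
        a[i] *= factor;
}
```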

Arbitrary amounts of loop unrolling (e.g., iterating 10 times over 100 copies of a loop body where the original is known to iterate 1000 times) do not necessarily guarantee improved performance. Duplicating

Coding Guidelines

Writers of coding guideline documents often regard the components of a for statement as having attributes that other loop statements don't have (e.g., they have an associated loop control variable). While it can be argued that many of these authors have simply grafted onto C concepts that only exist in other languages (or perhaps the encapsulation of all of the loop control information in one visually delimited area of the source triggers a cognitive response that leads readers to make implicit assumptions), if a sufficient number of developers associate these attributes with for statements then they become part of the culture of C and need to be considered here. Other loop conceptualization issues are discussed elsewhere (1772, while statement).

This subsection discusses one attribute commonly associated with for statements that is not defined by the C Standard, the so-called loop control variable (or simply loop variable, or loop counter). A loop control variable is more than simply a concept that might occur during developer discussion; many coding guideline documents make recommendations about its use (e.g., a loop control variable should not be modified during execution of the body of the loop, or have floating-point type). Which of the potentially four different objects that might occur, for instance, in the most common form of loop header (see Table 1763.1) is the loop control variable?

    for (lcv_1=0; lcv_2 < lcv_3; lcv_4++)

The following algorithm frequently returns an answer that has been found to be acceptable to developers (it is based on the previous standard and has not been updated to reflect the potential importance of objects declared in clause-1). Note that the algorithm may return zero, or multiple, answers; a union or structure member selection operator and its two operands are treated as a single object, but both an array and any objects in its subscript are treated as separate objects and therefore possible loop control variables:

1. list all objects appearing in expression-2 (the controlling expression). If this contains a single object, it is the loop control variable (33.2% of cases in the .c files),

2. remove all objects that do not appear in expression-3 (which is evaluated on every loop iteration). If a single object remains, that is the loop control variable (91.8% of cases in the .c files),

3. remove all objects that do not appear in clause-1 (which is only evaluated once, prior to loop iteration). If a single object remains, that is the loop control variable (86.2% of cases in the .c files).
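A minimal sketch of the three filtering steps, assuming the objects appearing in each component have already been extracted from the source and are represented by their names. The function names and this representation are assumptions for illustration, not part of the measurements described above.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears in list[0 .. count-1], otherwise 0. */
static int appears_in(const char *const list[], size_t count, const char *name)
{
    for (size_t k = 0; k < count; k++)
        if (strcmp(list[k], name) == 0)
            return 1;
    return 0;
}

/* Keeps, at the front of cand, only those candidates that appear in
   filter; returns how many were kept. */
static size_t keep_if_present(const char *cand[], size_t count,
                              const char *const filter[], size_t nfilter)
{
    size_t kept = 0;
    for (size_t k = 0; k < count; k++)
        if (appears_in(filter, nfilter, cand[k]))
            cand[kept++] = cand[k];
    return kept;
}

/* cand initially holds the n2 distinct objects of expression-2. On return
   the first N entries (N being the return value) are the possible loop
   control variables; N may be 0 or greater than 1. */
size_t loop_control_candidates(const char *cand[], size_t n2,
                               const char *const expr3[], size_t n3,
                               const char *const clause1[], size_t n1)
{
    size_t count = n2;                                  /* step 1 */
    if (count == 1)
        return count;
    count = keep_if_present(cand, count, expr3, n3);    /* step 2 */
    if (count == 1)
        return count;
    return keep_if_present(cand, count, clause1, n1);   /* step 3 */
}
```

For `for (i = 0; i < n; i++)` step 2 leaves only i; for the `lcv_1` ... `lcv_4` header shown earlier, no object survives and the result is zero.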

Unlike the example given above, in practice the same object often appears as an operand somewhere within all three components (see Figure 1774.1).

Because the controlling expression is evaluated on every iteration of the loop, the loop control variable can appear in contexts that are not supported in other languages (because most evaluate the three loop components only once, prior to the first iteration). For instance:

    for (lcv_1=0, lcv_2=0; a1[lcv_1] < a2[lcv_2]; lcv_1++, lcv_2+=2)

Experience shows that developers often assume that, in a for statement, modification of any loop control variables only occurs within the loop header. This leads to them forming beliefs about properties of the loop, for instance, that it loops 10 times. There tend to be fewer assumptions made about the use of while statements (which might not even be thought to have a loop control variable associated with them), and the following guideline is likely to cause developers to use this form of looping construct.

Figure 1774.1: Number of possible loop control variables appearing in expression-2 (square-box), after filtering against the objects appearing in expression-3 (cross), and after filtering against the objects appearing in clause-1 (bullet). Based on the visible form of the .c files.

Cg 1774.1

A loop control variable shall not be modified during the execution of the body of a for statement.
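The following sketch (hypothetical function names) contrasts a for statement that violates Cg 1774.1 with a while statement performing the same scan, which is the kind of rewrite the guideline is likely to prompt.

```c
#include <stddef.h>

/* Counts characters, treating a backslash and the character following it
   as a single character. Violates Cg 1774.1: i is modified in the body,
   so the loop header no longer shows how i changes. */
int count_logical_for(const char *s, size_t len)
{
    int count = 0;
    for (size_t i = 0; i < len; i++)
    {
        if (s[i] == '\\')
            i++;                         /* skip the escaped character */
        count++;
    }
    return count;
}

/* The same scan as a while statement; every change to i is in one place. */
int count_logical_while(const char *s, size_t len)
{
    int count = 0;
    size_t i = 0;
    while (i < len)
    {
        i += (s[i] == '\\') ? 2 : 1;     /* also skips the escaped character */
        count++;
    }
    return count;
}
```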

Some coding guideline documents recommend that loop control variables not have floating-point type. It might be thought that such a recommendation only makes sense in languages where the loop termination condition involves an equality test (in C this case is covered by the guideline recommendation dealing with the type of the operands of the equality operators; 1214.1, equality operators not floating-point operands). However, the controlling expression in a C for statement can contain relational operators, which can also have a dependence on the accuracy of floating-point operations. For instance, it is likely that the author of the following fragment expects the loop to iterate 10 times. However, it is possible that 10 increments of i result in it having the value 9.9999, and loop termination not occurring until after the eleventh iteration (on implementations using IEEE 754 arithmetic, whole numbers of this magnitude are represented exactly, so this particular loop does iterate 10 times there; increments such as 0.1, which have no exact binary representation, are more likely to show the problem).

    for (float i=0.0; i < 10.0; i++)

A possible developer response to a guideline recommendation that loop control variables not have floating-point type is to use a while statement (which is not covered by the algorithm for deducing loop control variables). Some of the issues associated with the finite accuracy of operations on floating-point values can be addressed with guideline recommendations. However, the difficulty of creating wording for a recommendation dealing with the use of floating-point values to control the number of loop iterations is such that none is attempted here.
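The dependence on accuracy can be sketched with an increment of 0.1 (the function names and `MAX_STEPS` are hypothetical). On implementations using IEEE 754 double arithmetic without extra precision (FLT_EVAL_METHOD equal to 0), ten additions of 0.1 produce a value slightly less than 1.0, so `steps_fp` typically records 11 values, while `steps_int` always records 10.

```c
#define MAX_STEPS 16   /* hypothetical upper bound on recorded values */

/* Floating-point loop control variable: the iteration count depends on
   the rounding of repeated additions of 0.1. Records each value of x and
   returns the number of iterations. */
int steps_fp(double out[MAX_STEPS])
{
    int count = 0;
    for (double x = 0.0; x < 1.0 && count < MAX_STEPS; x += 0.1)
        out[count++] = x;
    return count;
}

/* Integer loop control variable: always 10 iterations. Each floating
   value is computed from i rather than accumulated, so rounding errors
   do not build up from one iteration to the next. */
int steps_int(double out[MAX_STEPS])
{
    int count = 0;
    for (int i = 0; i < 10; i++)
        out[count++] = i / 10.0;
    return count;
}
```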

Table 1774.1: Occurrence of sequences of components omitted from a for statement header (as a percentage of all for statements). Based on the visible form of the .c files.

Coding Guidelines

The controlling expression in a for statement is sometimes written so that its evaluation also has the side effect of modifying the value of the loop control variable, removing the need for expression-3. A developer may have any of a number of reasons for using such an expression, from use of an idiom to misplaced concern for efficiency (many of the issues associated with side effects within the controlling expression are the same as those that apply to while statements).
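A common instance of this usage counts down with an unsigned loop control variable; the test and the decrement are combined in the controlling expression, so expression-3 is left empty (the function `sum_reverse` is hypothetical).

```c
#include <stddef.h>

/* Sums a[0 .. n-1] in reverse order. The controlling expression i-- > 0
   both tests and modifies i, so expression-3 is omitted; the body sees
   i = n-1, ..., 0. The test happens before the decrement, so the
   unsigned i never needs to become negative for the loop to stop. */
long sum_reverse(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = n; i-- > 0; )
        total += a[i];
    return total;
}
```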

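If clause-1 is a declaration, the scope of any variables it declares is the remainder of the declaration and the entire loop, including the other two expressions;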
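The scope rule, and the reason a loop control variable is sometimes still declared outside the loop, can be illustrated with two hypothetical functions.

```c
/* The loop control variable is needed after the loop, so it cannot be
   declared in clause-1. Returns the index of the first negative element,
   or n if there is none. */
int first_negative(const int *a, int n)
{
    int index;
    for (index = 0; index < n; index++)
        if (a[index] < 0)
            break;
    return index;
}

/* i and j are declared in clause-1: their scope includes expression-2 and
   expression-3, and ends with the loop. Sums pairs of elements taken from
   both ends (the middle element of an odd-length array is not included). */
int sum_pairs(const int *a, int n)
{
    int total = 0;
    for (int i = 0, j = n - 1; i < j; i++, j--)
        total += a[i] + a[j];
    /* A reference to i or j here would not compile: they are out of scope. */
    return total;
}
```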

Declaring the loop control variable via clause-1 has the benefit of localizing the visual context over which it is referenced. Possible costs include having to modify existing habits (e.g., looking for the declaration at

Saying in words what is specified in the syntax. A for statement loop header is essentially a means of visually highlighting the various components of a loop.

C++

The C++ Standard does not make this observation, which can be deduced from the syntax.

Other Languages

Being able to omit the specification for the initial value of a loop counter (i.e., clause-1) is unique to C (and C++). Most languages allow their equivalent of expression-3 to be omitted and use a default value (usually either 1 or -1).

Coding Guidelines

Why would a developer choose to omit either of these constructs in a for statement, rather than using a while statement? This issue is discussed elsewhere (1763, iteration statement).

Specifying that an omitted expression-2 is replaced by a nonzero constant allows a more useful meaning to be given to those cases where clause-1 or expression-3 are present than replacing it by the constant 0 would.

Omitting expression-2 creates a loop that can never terminate via a condition in the loop header. Executing a break, goto, or return statement (or a call to the longjmp library function) can cause execution of the loop to terminate. The term infinite loop is often used to describe a for statement where the controlling expression has been omitted. In some freestanding environments the main body of a program consists of an infinite loop that is only terminated when electrical power to the processor is switched off.
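A minimal sketch of a loop whose controlling expression is omitted (the function `sum_until_negative` is hypothetical); the only way out is the break statement in the loop body.

```c
/* Sums values up to, but not including, the first negative one.
   expression-2 is omitted, so the loop header never terminates the loop.
   Assumes the array contains a negative value. */
int sum_until_negative(const int *values)
{
    int total = 0;

    for (;;)
    {
        if (*values < 0)
            break;
        total += *values++;
    }
    return total;
}
```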

Other Languages

Most languages require that the loop termination condition be explicitly specified. In Ada the loop header is optional (a missing header implies an infinite loop).

Common Implementations

The standard describes an effect that most implementations do not implement as stated. A comparison that is unconditionally true can be replaced by an unconditional jump.


is a nonzero constant probably requires less effort to comprehend (because it is not an implicit special case) for less experienced developers.

Example

    #define TRUE 1

    void f(void)
    {
        for (;;)
            { /* */ }


6.8.6 Jump statements 1782

stages down the pipeline and the following instruction is already in the pipeline behind it. Until the processor executes the branch instruction it does not know which location to fetch the next instruction from; a pipeline stall has occurred. Branch instructions are relatively common, which means that pipeline stalls can have a significant performance impact. The main techniques used by processor vendors to reduce the impact of stalls are discussed in the following C sentences.

One of the design principles of RISC was to expose some of the underlying processor details to the translator, in the hope that translators would make use of this information to improve the performance of the generated machine code. Some of the execution delays caused by branch instructions have been exposed. For instance, many RISC processors have what is known as a delay slot immediately after a branch instruction. The instruction in this delay slot is always executed before the jump occurs (some processors allow delay slot instructions following a conditional branch to be annulled). This delay slot simplifies the processor by moving some of the responsibility for keeping the pipeline full to the translator writer (who at worst fills it with a no-op instruction). Most processors have a single delay slot, but the Texas Instruments TMS320C6000[1372] has five.

Fetching the instructions that will soon be executed requires knowing the address of those instructions. In the case of function calls the destination address is usually encoded as part of the instruction; however, the function return address is usually held on the stack (along with other housekeeping information). Maintaining a second stack, containing only function return addresses, has been proposed, along with speculative execution (and repair of this stack if the speculated path turns out not to be the control flow path finally chosen[1276]).
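The return-address stack idea can be illustrated with a small C simulation. This is a minimal sketch under stated assumptions, not a description of any particular processor: the capacity `RAS_SIZE`, the type `struct ras`, and the functions `ras_call` and `ras_predict_return` are hypothetical names, addresses are modeled as plain integers, overflow simply overwrites the oldest entry, and the repair needed after mis-speculation is not modeled.

```c
#include <stddef.h>

#define RAS_SIZE 4   /* hypothetical capacity; real hardware varies */

/* Hypothetical return-address stack: a small circular buffer consulted
   to predict where a return instruction will transfer control. */
struct ras {
    unsigned long entry[RAS_SIZE];
    size_t top;      /* index of the next free slot, modulo RAS_SIZE */
    size_t count;    /* number of valid entries, at most RAS_SIZE */
};

/* On a call: push the address of the instruction after the call.
   When the stack is full the oldest entry is silently overwritten. */
static void ras_call(struct ras *r, unsigned long return_addr)
{
    r->entry[r->top] = return_addr;
    r->top = (r->top + 1) % RAS_SIZE;
    if (r->count < RAS_SIZE)
        r->count++;
}

/* On a return: pop the predicted target address.
   Returns 0 when the stack is empty, meaning no prediction is available. */
static unsigned long ras_predict_return(struct ras *r)
{
    if (r->count == 0)
        return 0;
    r->top = (r->top + RAS_SIZE - 1) % RAS_SIZE;
    r->count--;
    return r->entry[r->top];
}
```

With four entries, a call chain nested five deep loses its oldest return address, so the final return gets no prediction from this stack; deeply nested or recursive calls are where such a predictor runs out of entries.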

Calder, Grunwald, and Srivastava[192] studied the behavior of branches in library functions, looking for common patterns that occurred across all calls.

Coding Guidelines

The continue and break statements are a form of goto statement. Some developers consider them to be a structured goto and treat them differently than a goto statement. The controversy over the use of the goto statement has not abated since Dijkstra's, now legendary, letter to the editor was published in 1968.[363] Many reasons have been given for why source code should not contain goto statements; Dijkstra's was based on human cognition. Knuth argued that in some cases use of goto provided the best solution.[761]
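To illustrate the sense in which continue and break are a form of goto, the two hypothetical functions below, `sum_structured` and `sum_goto`, are intended to behave identically. In the second, each break or continue is replaced by a goto whose label sits where that statement transfers control: just after the loop for break, and at the end of the loop body for continue. This is a sketch for illustration, not a recommended style.

```c
/* Sum the positive values in a[0..n-1], stopping at the first zero. */
int sum_structured(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
    {
        if (a[i] == 0)
            break;              /* leave the loop */
        if (a[i] < 0)
            continue;           /* skip to the next iteration */
        total += a[i];
    }
    return total;
}

/* Same behavior, with break and continue written as goto statements. */
int sum_goto(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
    {
        if (a[i] == 0)
            goto loop_exit;     /* equivalent of break */
        if (a[i] < 0)
            goto loop_continue; /* equivalent of continue */
        total += a[i];
    loop_continue: ;            /* null statement: a label must label a statement */
    }
loop_exit:
    return total;
}
```

Because loop_continue is at the end of the loop body, the for statement's i++ is still evaluated after the jump, exactly as it is after continue.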

Edsger W. Dijkstra

My second remark is that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

The heated debate on the use of the goto statement has generated remarkably little empirical research.[488] Are guideline recommendations against the use of goto simply a hangover from the days when developers had few structured programming constructs (e.g., compound statements) available in the language they used, or is there a worthwhile cost/benefit in recommending against their use?

It is possible to transform any C program containing jump statements to one that does not contain any jump statements. This may involve the introduction of additional while statements, if statements, and the definition of new objects having a boolean type. An algorithm for performing this transformation, while maintaining the topology of the original flow graph and the same order of efficiency, is given by Ashcroft and Manna.[59] Ammarguellat[30] gives an algorithm that avoids code replication and normalizes all control-flow cycles into single-entry single-exit while loops. In practice automated tools tend to take a simpler approach to transformation.[405] The key consideration does not appear to be the jump statement itself, but the destination statement relative to the statement performing the jump. This issue is discussed elsewhere (see 1783, jump statement causes jump to).
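A small hand-written sketch of the kind of rewrite described above, assuming a search that must leave two nested loops as soon as a match is found (the names `find_goto`, `find_flag`, `ROWS`, and `COLS` are hypothetical). The second version removes the goto by introducing an object of boolean type that is tested in both controlling expressions. It is not either of the published algorithms cited above, only an instance of the same general idea; note that return is itself a jump statement, and the goto-free version still ends with one.

```c
#include <stdbool.h>

#define ROWS 3   /* hypothetical matrix dimensions */
#define COLS 4

/* Version with a jump: leave both loops as soon as key is found. */
bool find_goto(int m[ROWS][COLS], int key, int *row, int *col)
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            if (m[r][c] == key)
            {
                *row = r;
                *col = c;
                goto found;
            }
    return false;
found:
    return true;
}

/* goto-free version: a boolean object carries the exit condition
   into both controlling expressions. */
bool find_flag(int m[ROWS][COLS], int key, int *row, int *col)
{
    bool found = false;
    for (int r = 0; r < ROWS && !found; r++)
        for (int c = 0; c < COLS && !found; c++)
            if (m[r][c] == key)
            {
                *row = r;
                *col = c;
                found = true;
            }
    return found;
}
```

The cost of the second form is an extra object and an extra test in every evaluation of each controlling expression, which is one reason such transformations do not automatically make code easier to comprehend.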

Usage

Numbers such as those given in Table 1782.1 and Table 1782.2 depend on the optimizations performed by an implementation. For instance, unrolling a frequently executed loop will reduce the percentage of branch


Table 1782.1: Dynamic occurrence of different kinds of instructions that can change the flow of control. %Instructions Altering Control Flow is expressed as a percentage of all executed instructions. All but the last row are expressed as percentages of these control-flow-altering instructions only. The kinds of instructions that change control flow are: conditional branches CB, unconditional branches UB, indirect procedure calls IC, procedure calls PC, procedure returns Ret, and other breaks Oth (e.g., signals and switch statements). Instructions Between Branches is the mean number of instructions between conditional branches. Based on Calder, Grunwald, and Zorn.[193]

Program | %Instructions Altering Control Flow | %CB | %UB | %IC | %PC | %Ret | %Oth | %Conditional Branch Taken | Instructions Between Branches

Table 1782.2: Number of static conditional branch sites that are responsible for the given quantile percentage of dynamically executed conditional branches. For instance, 19 conditional branch sites are responsible for over 50% of the conditional branches dynamically executed by burg. Static count is the total number of conditional branch instructions in the program image. Of the 17,565 static branch sites, 69 branches account for the execution of 50% of all dynamic conditional branches. Not all branches will be executed during each program execution, because many branches are only encountered during error conditions, or may reside in unreachable or unused code. Based on Calder, Grunwald, and Zorn.[193]
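The effect of loop unrolling on the number of conditional branches executed can be approximated in C by counting how often each loop's controlling expression is evaluated. The sketch below assumes a hypothetical counter `branch_tests`, a helper `check`, and an unroll factor of four followed by a remainder loop; a translator that unrolls a loop does this in the generated machine code, and the actual branch count depends on the code it generates.

```c
#include <stddef.h>

/* Hypothetical counter: evaluations of loop controlling expressions,
   used as a rough proxy for executed conditional branches. */
unsigned long branch_tests;

static int check(int cond)
{
    branch_tests++;
    return cond;
}

/* Straightforward loop: one test per element, plus the final test. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; check(i < n); i++)
        s += a[i];
    return s;
}

/* Unrolled by four, with a remainder loop for the last n % 4 elements. */
long sum_unrolled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; check(i + 4 <= n); i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; check(i < n); i++)
        s += a[i];
    return s;
}
```

For ten elements the rolled loop evaluates its controlling expression 11 times, while the unrolled version evaluates its two controlling expressions 6 times in total, summing the same ten elements.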

Some languages allow labels to be treated as types. In such languages jump statements can jump between functions (there is usually a requirement that the function jumped to must have an active invocation in the current call chain at the time the jump statement is executed). In Algol 68 a label is, in effect, a function that jumps to that label. It is possible to call that function or take its address. In C terms:


Many processors have span-dependent branch instructions. That is, there is a short form (measured in number of bytes) that can only branch relatively small distances, and a long form that can branch over longer distances. When storage usage needs to be minimized it may be possible to use a jump chain to branch to a place using a short-form instruction, rather than a direct jump to that place using a long-form instruction (at the cost of reduced execution performance).[859]

Coding Guidelines

The term spaghetti code is commonly used to describe code containing a number of jump statements whose various destinations cause the control flow to cross the other control flows (a graphical representation of the control flow, using lines to represent flow, resembles cooked spaghetti, i.e., it is intertwined).

Jumps can be split into those cases where the destination is the most important consideration and those where the jump/destination pair needs to be considered, as follows:

• Jumping to the start/end of a function/block: the destination being in the same or an outer block relative to the jump. This has a straightforward interpretation as restarting/finishing the execution of a function/block. The statement jumped to may not be the last (some termination code may need to be executed, or a guideline recommendation that functions have a single point of exit may cause the label


Figure 1783.1: Number of goto statements having a given number of visible source lines between a goto statement and its destination label (negative values denote backward jumps). Based on the translated form of this book's benchmark programs.


Figure 1783.2: Number of goto statements and labels having a given scope nesting level (nesting level 1 is the outermost block of a function definition), and on the right the difference in scope levels between a goto statement and its corresponding labeled statement (negative values denote a jump to a less nested scope). Based on the translated form of this book's benchmark programs.

• Jumping within the same block. This is the most common kind of goto statement found in C source (see right plot of Figure 1783.2).
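The first category, a jump to the end of a function, is commonly used in C to route every exit path through shared termination code. A minimal sketch, assuming a hypothetical function `duplicate_twice` that needs two heap buffers: each failure jumps forward to the label `cleanup`, which releases whatever was allocated and then performs the function's single return.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: copy src into a temporary buffer, then into a
   second buffer returned to the caller; all exits go through cleanup. */
int duplicate_twice(const char *src, char **out)
{
    int status = -1;          /* assume failure until the end */
    char *tmp = NULL;
    char *copy = NULL;

    tmp = malloc(strlen(src) + 1);
    if (tmp == NULL)
        goto cleanup;         /* forward jump to the end of the function */
    strcpy(tmp, src);

    copy = malloc(strlen(tmp) + 1);
    if (copy == NULL)
        goto cleanup;
    strcpy(copy, tmp);

    *out = copy;
    copy = NULL;              /* ownership passed to the caller */
    status = 0;

cleanup:
    free(copy);               /* free(NULL) is a no-op */
    free(tmp);
    return status;
}
```

The alternative, repeating the calls to free before each early return, duplicates the termination code at every exit; the label makes the destination, rather than the jump, the thing a reader needs to understand.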

Jump statements written by the developer can create a flow of control that requires a lot of effort to comprehend. Some guideline documents recommend against the use of any jump statement (including a return, unless it is the last statement within a function). Comprehending the flow of control is an important part of comprehending a function. The use of jump statements can increase the cost of comprehension (by increasing the complexity of the flow of control) and can increase the probability that the comprehension process reaches the wrong conclusion (unlike other constructs that change the flow of control, there are not usually any additional visual clues, e.g., indentation, in the source signaling the presence of a jump statement). However, there is no evidence to suggest that the cost of the alternatives is any lower and consequently no guideline recommendation is made here.

Any statements that appear after a jump-statement, in the same block, are dead code (see 190, dead code), unless control can reach them through a label, including a case or default label.
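Whether a statement following a jump is dead depends on whether control can reach it through a label. The hypothetical function `classify` below contains both cases: the unlabeled statements after return and goto can never be executed, while the statements carrying a case label or an ordinary label are jump targets and remain reachable.

```c
/* Hypothetical function illustrating reachability after jump statements. */
int classify(int x)
{
    switch (x)
    {
    case 0:
        return 10;
        x++;            /* dead: follows return and has no label */
    case 1:             /* reachable: the case label is a jump target */
        return 20;
    default:
        break;
    }
    goto done;
    x = 99;             /* dead: follows goto and has no label */
done:                   /* reachable via the goto above */
    return x;
}
```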

Usage

A study by Gellerich, Kosiol, and Ploedereder[488] analyzed goto usage in Ada and C. In the translated form of this book's benchmark programs 20.6% of goto statements jumped to a label that occurred textually before them in the source code.


6.8.6.1 The goto statement

In C90 this wording appeared in a footnote, where it was an observation about the equivalence between a for statement and a while statement. In C99 this wording plays the role of a specification.

is incremented and one of the operands appearing in expression-2
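The for/while equivalence referred to above can be sketched with two hypothetical functions, `sum_for` and `sum_while`. The while version follows the order of evaluation that C99 specifies for a for statement: clause-1 once, then expression-2 before each execution of the loop body, then expression-3 after each execution of the body. The two are interchangeable only because the body contains no continue statement; in the while version a continue would skip the increment, which is why this rewrite is usually stated with an exception for continue.

```c
/* for statement: clause-1; expression-2; expression-3 */
int sum_for(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Equivalent while statement (valid because the body has no continue). */
int sum_while(int n)
{
    int total = 0;
    {                       /* block limits the scope of i, as in the for */
        int i = 1;          /* clause-1, evaluated once */
        while (i <= n)      /* expression-2, the controlling expression */
        {
            total += i;     /* loop body */
            i++;            /* expression-3, after each execution of the body */
        }
    }
    return total;
}
```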


Other Languages

Some languages that support nested function definitions (e.g., Pascal) only require that the label name be visible (i.e., it is possible to jump to a label in a different function). Other languages (e.g., Algol 68) give labels block scope, which restricts their visibility. Perl has a form of goto that causes the function named by the label to be called. However, when that function returns, control is returned to the function that called the function that performed the goto (i.e., the behavior is as if control returns to the goto, which then immediately performs a return).
