Neither of these reasons could be said to contain an actual benefit. The cost associated with side effects in
controlling expressions is the possibility that they will go unnoticed by a reader of the source (especially if
kinds of
The most common form of side effect in a controlling expression is assignment, in particular simple
assignment. The case where the author of the code intended to type an equality operator, rather than a simple
assignment operator, is a fault, and these coding guidelines are not intended to recommend against the use of
constructs that are obviously faults. However, it is possible that a reader of the visible source will mistake
a simple assignment for an equality operator (the token == is much more likely than = in the context of a
controlling expression), and reducing the likelihood of such a mistake occurring is also a cost reduction.
This discussion has referred to controlling expressions as if these costs and benefits apply to their use in
all contexts (i.e., selection and iteration statements). The following example shows that writing code to avoid
the occurrence of side effects in controlling expressions contained within iteration statements requires two,
rather than one, assignments to be used:
1 extern int glob_1,
Duplicating the assignment to glob_1 creates a maintenance dependency (any changes to one statement
need to be reflected in the other). The increase in cost caused by this maintenance dependency is assumed to
be greater than the cost reduction achieved from reducing the likelihood of a simple assignment operator
being mistaken for an equality operator.
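The trade-off described above can be sketched as follows. Only the name glob_1 comes from the text; next_value and the two summing functions are hypothetical, introduced purely for illustration.

```c
static int glob_1;

/* Hypothetical data source: yields 3, 2, 1 and then 0. */
static int next_value(int *state)
{
    return (*state > 0) ? (*state)-- : 0;
}

/* Side effect in the controlling expression: a single assignment. */
int sum_with_side_effect(void)
{
    int state = 3, total = 0;

    while ((glob_1 = next_value(&state)) != 0)
        total += glob_1;
    return total;
}

/* No side effect in the controlling expression: the assignment
   has to appear twice, and the two copies must be kept in step. */
int sum_without_side_effect(void)
{
    int state = 3, total = 0;

    glob_1 = next_value(&state);
    while (glob_1 != 0) {
        total += glob_1;
        glob_1 = next_value(&state);
    }
    return total;
}
```

Both functions compute the same result; the second form is the one that introduces the maintenance dependency.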
Cg 1740.1
The simple assignment operator shall not occur in the controlling expression of an if statement.
6.8.4 Selection statements
Experience has shown that there are a variety of other constructs, appearing in a controlling expression, that developers have difficulty comprehending, or simply miscomprehend, when scanning the source. However, no other constructs are discussed here. The guideline recommendation dealing with the use of the assignment operator has the benefit of simplicity and frequency of occurrence. It was difficult enough analyzing the cost/benefit case for simple assignment, and others are welcome to address more complicated cases.
Experience shows that many developers use the verbal form “if expression is not true then” when thinking about the condition under which an else form is executed. This use of not can lead to double negatives when reading some expressions. For instance, possible verbal forms of expressing the conditions under which the arms of an if statement are executed include:
if (!x)
   a();
else
   b(); /* Executed if not x is not true */
        /* Executed if not x is equal to 0 */
        /* Executed if x is not equal to 0 */

if (x != y)
   c();
else
   d(); /* Executed if x is not equal to y is not true */
The possible linguistic impact of the ! operator on expression comprehension is discussed elsewhere.
In the translated form of this book’s benchmark programs 1.3% of selection-statements and 4% of
iteration-statements have a controlling expression that is a constant expression. Use of simple, iterative, flow analysis enables a further 0.6% of all controlling expressions to be evaluated to a constant expression at translation time.
if (sizeof(enum {b, a}) != sizeof(int)) return a; // a == 1
return b; // which b?
}
In C89, the declaration enum {b, a} persists after the if-statement terminates; but in C99, the implied block that encloses the entire if statement limits the scope of that declaration; therefore the function returns different values in C89 and C99. The Committee views such cases as unintended artifacts of allowing declarations as operands of cast and sizeof operators; and this change is not viewed as a serious problem.
See the following C sentence for a further discussion on the rationale.
Developers are more likely to be tripped up by the lifetime issues associated with compound literals than
enumeration constants. For instance, in:
1 if (f(p=&(struct S){1, 2}))
3 val=p->mem_1;
the lifetime of the storage whose address is assigned to p ends when the execution of the if statement
terminates. Ensuring that developers are aware of this behavior is an educational issue. However, developers
intentionally relying on the pointed-to storage continuing to exist (which it is likely to, at least until storage
needs to be allocated to another object) is a potential guideline issue. However, until experience has been
gained on how developers use compound literals it is not known whether this issue is simply an interesting
theoretical idea or a real practical problem.
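One way of avoiding the lifetime issue is to give the storage a name declared outside the if statement. The following is a minimal sketch; the members of struct S, the body of f, and the function read_after_if are assumptions made for illustration and do not come from the text.

```c
struct S { int mem_1; int mem_2; };

/* Hypothetical predicate standing in for f in the text. */
static int f(const struct S *s)
{
    return s->mem_1 < s->mem_2;
}

int read_after_if(void)
{
    struct S tmp = {1, 2};  /* named object: exists until the function returns */
    struct S *p;
    int val = 0;

    if (f(p = &tmp))
        val = 10;
    /* Had the controlling expression used p = &(struct S){1, 2}, the
       storage pointed to by p would not be guaranteed to exist here. */
    val += p->mem_1;
    return val;
}
```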
1742 Each associated substatement is also a block whose scope is a strict subset of the scope of the selection statement.
Commentary
Rationale
A new feature of C99: A common coding practice is always to use compound statements for every selection
and iteration statement because this guards against inadvertent problems when changes are made in the
future. Because this can lead to surprising behavior in connection with certain uses of compound literals
(§6.5.2.5), the concept of a block has been expanded in C99.
Given the following example involving three different compound literals:
extern void fn(int*, int*);
int examp(int i, int j)
{
int *p, *q;
if (*(q = (int[2]){i, j})) fn(p = (int[5]){9, 8, 7, 6, 5}, q);
else fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);
return *p;
}
it seemed surprising that just introducing compound statements also introduced undefined behavior:
extern void fn(int*, int*);
int examp(int i, int j)
{
int *p, *q;
    if (*(q = (int[2]){i, j})) {
        fn(p = (int[5]){9, 8, 7, 6, 5}, q);
    } else {
        fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);
    }
    return *p;
}
as if written:
extern void fn(int*, int*);
int examp(int i, int j) {
    int *p, *q;
    {
        if (*(q = (int[2]){i, j})) {
            // *q is guaranteed to designate an object
            fn(p = (int[5]){9, 8, 7, 6, 5}, q);
        } else {
            // *q is guaranteed to designate an object
            fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);
        }
    }
    // *q is not guaranteed to designate an object
    return *p; // *p is not guaranteed to designate an object
}
If compound literals are defined in selection or iteration statements, their lifetimes are limited to the implied enclosing block; therefore the definition of “block” has been moved to this section. This change is compatible with similar C++ rules.
C90
The following example illustrates the rather unusual combination of circumstances needed for the specification change, introduced in C99, to result in a change of behavior:
1 extern void f(int);
2 enum {a, b} glob;
compound statement contains a single statement the use of braces is redundant and their presence decreases
the amount of information visible on a display (the number of available lines is fixed and each brace usually
occupies one line). However, experience has shown that in some cases the presence of these braces can:
• Provide additional visual cues that can reduce the effort needed, by readers, to comprehend a sequence
of statements. However, the presence of these redundant braces reduces the total amount of information
immediately visible, to a reader, on a single display (i.e., the amount of source code that can be seen
without expending motor effort giving editor commands to change the display contents). The way in
which these costs and benefits trade off against each other is not known.
• Help prevent faults being introduced when code is modified (i.e., where a modification results in
unintended changes to the syntactic bindings of blocks to statement headers). Experience shows that
nested if statements are the most common construct whose modification results in unintended changes
to the syntactic bindings of blocks.
In the following example the presence of braces provides both visual cues that the else does not bind
to the outer if and additional evidence (its indentation provides counter evidence because it provides an
interpretation that the intent is to bind to the outer if) that it is intended to bind to the inner if.
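The example referred to did not survive in this copy of the text. The following sketch shows the kind of code being described; the function names and return values are hypothetical. In both functions the else is indented as if it paired with the outer if, and in both it binds to the inner if. A translator may warn about the misleading indentation.

```c
/* Braces around the nested if make the binding of the else visible. */
int classify_braced(int x, int y)
{
    int r = 0;

    if (x) {
        if (y)
            r = 1;
    else                /* indentation suggests the outer if ...        */
            r = 2;      /* ... the braces show it is the inner if       */
    }
    return r;
}

/* Without braces the indentation is the only visual cue, and it is wrong. */
int classify_unbraced(int x, int y)
{
    int r = 0;

    if (x)
        if (y)
            r = 1;
    else
        r = 2;          /* executed when x is nonzero and y is zero */
    return r;
}
```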
Blocks occur in a number of statements. Is there a worthwhile cost/benefit in a guideline recommendation
specifying that these blocks always be a compound statement?
The block associated with a switch statement is invariably a compound statement. A guideline
recommendation that braces be used is very likely to be redundant in this case. Iteration statements are not as
common as selection statements and much less likely to be nested (in other iteration statements) than selection
statements (compare Figure 1739.2 and Figure 1763.2), and experience suggests developer comprehension
of such constructs is significantly affected by the use of braces. Experience suggests that the nested if
statement is the only construct where the benefit of the use of braces is usually greater than the cost.
Cg 1742.1
The statement forming the block associated with either arm of an if statement shall not be an if
statement.
6.4p4
The value of a condition that is an expression is the value of the expression, implicitly converted to bool for
statements other than switch; if that conversion is ill-formed, the program is ill-formed.
If only constructs that are available in C are used the set of possible expressions is the same.
This subsection concentrates on the form of the controlling expression.
The value of a controlling expression is used to make one of two choices. Values used in this way are generally considered to have a boolean role. Some languages require the controlling expression to have
a boolean type and their translators enforce this requirement. Some coding guideline documents contain recommendations that effectively try to duplicate this boolean type requirement found in other languages. Recommendations based on type not only face technical problems in their wording and implementation (caused by the implicit promotions and conversions performed in C), but also fail to address the real issues of developer comprehension and performance.
In the context of an if statement do readers of the source distinguish between expressions that have two possible values (i.e., boolean roles), and expressions that may have more than two values being used in a context where an implicit test against zero is performed? Is the consideration of boolean roles a cultural baggage carried over to C by developers who have previously used them in other languages? Do readers who have only ever programmed in C make use of boolean roles, or do they think in terms of a test against zero?
In the absence of studies of developer mental representations of algorithmic and source code constructs, it is not possible to reliably answer these questions. Instead the following discussion looks at the main issues involved in making use of boolean roles and making use of the implicit test against zero special case.
A boolean role is not about the type of an expression (prior to the introduction of the type _Bool in C99, a character type was often used as a stand-in), but about the possible values an expression may have and how they are used. The following discussion applies whether a controlling expression has an integer, floating, or pointer type.
In some cases the top-level operator of a controlling expression returns a result that is either zero or one (e.g., the relational and equality operators). The visibility, in the source, of such an operator signals its boolean role to readers. However, in other cases (see Table 1763.2) developers write controlling expressions that do not contain explicit comparisons (the value of a controlling expression is implicitly compared against zero). What are the costs and benefits of omitting an explicit comparison? The following code fragment contains examples of various ways of writing a controlling expression:
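The original code fragment is missing from this copy. The sketch below is a stand-in, not the author's example: it pairs the implicit and explicit forms for an integer, a floating, and a pointer operand. The function and variable names are hypothetical.

```c
#include <stddef.h>

/* Returns 1 when each implicit test against zero selects the same
   arm as its explicitly written counterpart. */
int forms_agree(int x, double d, const char *p)
{
    int implicit = 0, explicit_cmp = 0;

    if (x)          implicit |= 1;       /* integer, implicit comparison */
    if (x != 0)     explicit_cmp |= 1;   /* integer, explicit comparison */

    if (d)          implicit |= 2;       /* floating */
    if (d != 0.0)   explicit_cmp |= 2;

    if (p)          implicit |= 4;       /* pointer */
    if (p != NULL)  explicit_cmp |= 4;

    return implicit == explicit_cmp;
}
```

The two forms are semantically identical; any difference between them lies in how readers process the visible source.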
Does the presence of an explicit visual (rather than an implicit, in the developer’s mind) comparison reduce
either the cognitive effort needed to comprehend the if statement or the likelihood of readers making
mistakes? Given sufficient practice readers can learn to automatically process if (x) as if it had been
written as if (x != 0). The amount of practice needed to attain an automatic level of performance is
unknown. Another unknown is the extent to which the token sequence != 0 acts as a visual memory aid.
When the visible form of the controlling expression is denoted by a single object (which may be an
ordinary identifier, or the member of a structure, or some other construct where a value is obtained from an
object) that name may provide information on the values it represents. To obtain this information readers
might make use of the following:
• Software development conventions. In the software development community (and other communities)
the term flag is generally understood to refer to something that can be in one of two states. For instance,
the identifier mimbo_flag is likely to be interpreted as having two possible values relating to a mimbo,
rather than referring to the national flag of Mimbo. Some naming conventions contain a greater degree
of uncertainty than others. For instance, identifiers whose names contain the character sequence status
sometimes represent more than two values.
• Natural language knowledge. Speakers of English regard some prepositions as being capable of
representing two states. For instance, a cat is or is not black. This natural language usage is often
adopted when selecting identifier names. For instance, is_flobber is likely to be interpreted as
representing one of two states (being a, or having the attribute of, flobber or not).
• Real world knowledge. A program sometimes needs to take account of information from the real
world. For instance, height above the ground is an important piece of information in an airplane flight
simulator, with zero height having a special status.
• Application knowledge. The design of a program invariably makes use of knowledge about the
application domain it operates within. For instance, the term ziffer may be used within the application
domain that a program is intended to operate in. Readers of the source will need the appropriate
application knowledge to interpret the role of this identifier.
• Program implementation conventions. The design of a program involves creating and using various
conventions. For instance, a program dealing with book printing may perform special processing for
books that don’t contain any pages (e.g., num_pages being zero is a special case).
• Conventions and knowledge from different sources may be mixed together. For instance, the identifier name
current_color suggests that it represents color information. This kind of information is not usually
thought about in terms of numeric values and there are certainly more than two colors. However,
assigning values to symbolic qualities is a common software development convention, as is assigning
a special interpretation to certain values (e.g., using zero to represent no known color, a program
implementation convention).
The likelihood of a reader assuming that an identifier name has a boolean role will depend on the cultural
beliefs and conventions they share with the author of the source. There is also the possibility that rather than
using the identifier name to deduce a boolean role, readers may use the context in which it occurs to infer a
boolean role. This is an example of trust based usage. Requiring that values always be compared (against
true/false or zero/nonzero) leads to infinite regression, as in the sequence:
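The sequence itself is missing from this copy. A sketch of what such a regression looks like is given below; the function name is hypothetical and the chain could be continued indefinitely. Each added comparison yields a value that itself has a boolean role and so, under the requirement, needs comparing again.

```c
/* Every controlling expression below tests the same condition. */
int regress(int x)
{
    int taken = 0;

    if (x)                       taken++;
    if (x != 0)                  taken++;
    if ((x != 0) != 0)           taken++;
    if (((x != 0) != 0) != 0)    taken++;   /* ... and so on */
    return taken;
}
```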
of zero or one (e.g., an equality or relational operator), it is possible to generate machine code that performs
a branch rather than returning a value that is then compared. Some processors have a single instruction that performs a comparison and branch, while others have separate instructions (the comparison instruction setting processor condition flags that are then tested by a conditional branch instruction). On some processors simply loading a value into a register also results in a comparison against zero being made, with the appropriate processor condition flags being set. The use of conditional instructions is discussed elsewhere.
Some coding guideline documents recommend that the else form always be present, even if it contains
no executable statements. Such a recommendation has the benefit of ensuring that there are never any mismatching if/else pairs. However, the same effect can be achieved by requiring nested if statements
to be enclosed in braces (this issue is discussed elsewhere). The cost of adding empty else forms increases
6.8.4.2 The switch statement
1746 If the first substatement is reached via a label, the second substatement is not executed.
Commentary
The flow of control of a sequence of statements is not influenced by how they were initially reached, in the
flow of control. The label may be reached as a result of executing a switch statement, or a goto statement.
The issue of jumping into nested blocks or the body of iteration statements is discussed elsewhere.
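A minimal sketch of the behavior, using a goto statement to reach a label inside the first substatement; the function name and the trace values are hypothetical.

```c
/* Bit 0 of the result is set if the first substatement ran,
   bit 1 if the second substatement ran. */
int arms_executed(int jump_in)
{
    int trace = 0;

    if (jump_in)
        goto inside;

    if (0) {
inside:
        trace |= 1;   /* reached only via the label */
    } else {
        trace |= 2;   /* skipped when the first substatement was entered */
    }
    return trace;
}
```

When control arrives at inside via the goto, execution continues to the end of the first substatement and then past the else arm, exactly as if the first substatement had been selected in the usual way.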
C++
The C++ Standard does not explicitly specify the behavior for this case.
Other Languages
This statement applies to all programming languages that support jumps into more deeply nested blocks.
1747 An else is associated with the lexically nearest preceding if that is allowed by the syntax.
Languages that support nesting of conditional statements need a method of resolving which construct an
else binds to. The rules used include the following:
• Not supporting in the language syntax unbracketed nesting (i.e., requiring braces or begin/end pairs)
within the then arm. For instance, Algol 60 permits the usage IF q1 THEN a1 ELSE IF q2 THEN
a2 ELSE a3, but the following is a syntax violation: IF q1 THEN IF q2 THEN a1 ELSE a2 ELSE
a3.
• Using a matching token to pair with the if. The keyword fi is a common choice (used by Algol
68; Ada uses end if, while the C preprocessor uses endif). In this case the bracketing formed by the if/fi prevents
any ambiguity occurring.
• Like C, using the nearest preceding rule.
Coding Guidelines
If the guideline recommendation on using braces is followed there will only ever be one lexically preceding
if that an else can be associated with. Some coding guideline documents recommend that an if statement
always have an associated else form, even if it only contains the null statement.
A switch statement uses the exact value of its controlling expression and it is not possible to guarantee
the exact value of an expression having a floating type (there is a degree of unpredictability in the value
between different implementations). For this reason implementations are not required to support controlling
expressions having a floating type.
C++
6.4.2p2
The condition shall be of integral type, enumeration type, or of a class type for which a single conversion function
to integral or enumeration type exists (12.3).
If only constructs that are available in C are used the set of possible expressions is the same.
Coding Guidelines
A controlling expression, in a switch statement, having a boolean role might be thought to be unusual, an
if statement being considered more appropriate. However, the designer may be expecting the type of the controlling expression to evolve to a non-boolean role, or the switch statement may have once contained more case labels.
Table 1748.1: Occurrence of switch statements having a controlling expression of the given type (as a percentage of all switch
statements). Based on the translated form of this book’s benchmark programs.
Type             %      Type             %
unsigned long    18.7   unsigned short   2.8
unsigned char    12.4   long             0.9
unsigned int     10.0   other-types      0.2
modified type, the entire switch statement shall be within the scope of that identifier.133)
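[Figure: plot not recoverable; surviving axis label: case value density (logarithmic scale, 1 to 100). Surviving caption text: no default results were scaled so that the total count of switch statements matched those that included a default label.]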
Figure 1748.2: Number of case/default labels having a given number of statements following them (statements from any
nested switch statements did not contribute towards the count of a label). Based on the visible form of the .c files.
Commentary
The declaration of an identifier having variably modified type can occur in one of the sequences of statements
labeled by a case or default, provided it appears within a compound statement that does not contain any
other case or default labels associated with that switch statement, or it appears after the last case or
default label in the switch statement. In the compound statement case the variably modified type will
not be within the scope of any case or default labels (its lifetime terminates at the end of the compound
statement).
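The compound statement form can be sketched as follows. The function and object names are hypothetical, n is assumed to be positive, and the code assumes a translator that supports variable length arrays (support is optional in later revisions of the C Standard).

```c
int sum_squares(int kind, int n)
{
    int total = 0;

    switch (kind) {
    case 0:
        total = n;
        break;
    case 1: {
        /* The variably modified type is confined to this compound
           statement, which contains no case or default labels of
           the enclosing switch statement. */
        int squares[n];

        for (int i = 0; i < n; i++) {
            squares[i] = i * i;
            total += squares[i];
        }
        break;
    }
    default:
        total = -1;
        break;
    }
    return total;
}
```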
The wording of the requirement is overly strict in that it prohibits uses that might be considered well
behaved. For instance:
Attempting to create wording to support such edge cases was considered to be a risk (various ambiguities
may later be found in it) that was not worth the benefit. Additional rationale for this requirement is discussed
elsewhere.
C90
Support for variably modified types is new in C99.
C++
Support for variably modified types is new in C99 and is not specified in the C++ Standard.
The C++ Standard contains the additional requirement that (the wording in a subsequent example suggests
that being visible rather than in scope is more accurate terminology):
6.7p3
It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program
that jumps 77) from a point where a local variable with automatic storage duration is not in scope to a point where
it is in scope is ill-formed unless the variable has POD type (3.9) and is declared without an initializer (8.5).
Coding Guidelines
Some coding guideline documents recommend that all switch statements contain a default label. There
does not appear to be an obvious benefit (as defined by these coding guideline subsections, although there may
be benefits for other reasons) for such a guideline recommendation. To adhere to the guideline developers
simply need to supply a default label and an associated null statement. There are a number of situations
where adhering to such a guideline recommendation leads to the creation of redundant code (e.g., if all
possible values are covered by the case labels, either because they handle all values that the controlling
expression can take or because execution of the switch statement is conditional on an if statement that
guarantees the controlling expression is within a known range).
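Both situations can be sketched as follows. The enumeration, the function names, and the returned values are hypothetical and chosen only for illustration.

```c
enum color { RED, GREEN, BLUE };   /* hypothetical enumeration */

/* Every enumeration constant has a case label; a default label with
   a null statement would add nothing that could be executed. */
int wavelength_nm(enum color c)
{
    int nm = 0;

    switch (c) {
    case RED:   nm = 700; break;
    case GREEN: nm = 546; break;
    case BLUE:  nm = 436; break;
    }
    return nm;
}

/* The enclosing if statement guarantees that d is in the range 0 to 2,
   so the case labels cover every value the switch can be given. */
int digit_name_length(int d)
{
    int len = 0;

    if (d >= 0 && d <= 2) {
        switch (d) {
        case 0: len = 4; break;   /* "zero" */
        case 1: len = 3; break;   /* "one"  */
        case 2: len = 3; break;   /* "two"  */
        }
    }
    return len;
}
```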
Usage
In the visible form of the .c files, 72.8% of switch statements contain a default label.
1752 (Any enclosed switch statement may have a default label or case constant expressions with values that
duplicate case constant expressions in the enclosing switch statement.)
Commentary
This specification (semantics in a Constraints clause) clarifies the interpretation to be given to the phrase “in
the same switch statement” appearing earlier in this Constraints clause.
Semantics
on the value of a controlling expression, and on the presence of a default label and the values of any case
labels on or in the switch body
Commentary
This defines the term switch body. Developers also use the terminology body of the switch.
It is possible to write a switch statement as an equivalent sequence of if statements. However, experience
shows that in some cases the switch statement appears to require significantly less (cognitive) effort to
comprehend than a sequence of if statements.
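The equivalence can be illustrated with a small sketch. The command codes and function names are hypothetical.

```c
/* Hypothetical command codes. */
enum { CMD_STOP = 0, CMD_START = 1, CMD_PAUSE = 2 };

int priority_switch(int cmd)
{
    switch (cmd) {
    case CMD_STOP:  return 3;
    case CMD_START: return 2;
    case CMD_PAUSE: return 1;
    default:        return 0;
    }
}

/* The same selection written as a sequence of if statements. */
int priority_if(int cmd)
{
    if (cmd == CMD_STOP)
        return 3;
    else if (cmd == CMD_START)
        return 2;
    else if (cmd == CMD_PAUSE)
        return 1;
    else
        return 0;
}
```

In the switch form the controlling expression appears once and the alternatives line up as labels; in the if form the reader has to check that every comparison involves the same operand.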
Common Implementations
Many processors include some form of instruction (often called an indirect jump) that indexes into a table
(commonly known as a jump table) to obtain a location to jump to. The extent to which it is considered
to be more efficient to use such an instruction, rather than a series of if statements, varies between
processors (whose behavior varies for the situation where the index is out of range of the jump table) and
implementations (the sophistication of the available optimizer). The presence of a default label creates
additional complications in that all values of the controlling expression, not covered by a case label, need to
be explicitly handled. Spuler [1300] discusses the general issues.
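The idea behind a jump table can be modeled in C using an array of function pointers. This is a source-level analogy, not what any particular translator generates; the handler functions and their return values are hypothetical.

```c
/* Hypothetical handlers for the case label values 0 to 3. */
static int handle_0(void) { return 10; }
static int handle_1(void) { return 11; }
static int handle_2(void) { return 12; }
static int handle_3(void) { return 13; }
static int handle_default(void) { return -1; }

typedef int (*handler)(void);

/* One entry per value between the lowest and highest case label. */
static const handler jump_table[] = { handle_0, handle_1, handle_2, handle_3 };

int dispatch(int value)
{
    /* Converting to unsigned turns negative values into large ones,
       so a single comparison performs the whole range check. */
    unsigned idx = (unsigned)value;

    if (idx >= sizeof jump_table / sizeof jump_table[0])
        return handle_default();   /* values not covered by a case label */
    return jump_table[idx]();      /* models the indirect jump */
}
```

The range check is the explicit handling of uncovered values mentioned above; it is needed whether or not the source contains a default label, because an out of range index must not be used.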
Some translators implement switch statements as a series of if statements. Knowledgeable developers
know that, in such implementations, placing the most frequently executed case labels before the less
frequently executed ones can provide a worthwhile performance improvement. Some translators [22, 588]
provide an option that allows the developer to specify whether a jump table, sequence of if statements, or
some other method should be used.
Optimal execution time performance is not the only factor that implementations need to consider. The
storage occupied by the jump table sometimes needs to be taken into account. In a simple implementation it
is proportional to the difference between the maximum and minimum values appearing in the case labels
(which may not be considered an efficient use of storage if there are only a few case labels used within this
range). A more sophisticated technique than using a series of if statements is to create a binary tree of case
label values and jump addresses. The value of the controlling expression is used to walk this tree to
obtain the destination address. Some optimizers split the implementation into a jump table for those case
label values that are contiguous and a binary tree for the outlying values.
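For sparse case label values, the search can be modeled as follows. The sketch uses a binary search over a sorted array in place of an explicit binary tree (the comparisons performed are the same as for a balanced tree); the table contents and names are hypothetical, and the target member stands in for a jump address.

```c
#include <stddef.h>

struct case_entry {
    int value;    /* case label value */
    int target;   /* stands in for the address to jump to */
};

/* Sparse case label values, in ascending order. */
static const struct case_entry cases[] = {
    {3, 100}, {40, 101}, {500, 102}, {6000, 103}, {70000, 104}
};

int lookup_case(int value, int default_target)
{
    size_t lo = 0;
    size_t hi = sizeof cases / sizeof cases[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cases[mid].value == value)
            return cases[mid].target;
        if (cases[mid].value < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return default_target;   /* no matching case label */
}
```

A jump table covering the same five values would need tens of thousands of entries, which is the storage cost the paragraph above refers to.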
Translator vendors targeting modern processors face an additional problem. Successful processors often contain a range of different implementations, creating a processor family (e.g., the Intel Pentium series). These different processor implementations usually have different performance characteristics, and in the case
of the switch statement different levels of sophistication in branch prediction. How does a translator make the decision on whether to use a jump table or if statements when the optimal code varies between different implementations of a particular processor?
A study by Uh and Whalley [1420] compared (see Table 1753.1) the performance of a series of if statements and the equivalent jump table implementation. For three of the processors it was worth using a jump table when more than two if statements were likely to be executed. In the case of the UltraSPARC-1 the figure was more than eight if statements executed (this was put down to the lack of hardware support for branch prediction of indirect jumps).
Table 1753.1: Performance comparison (in seconds) of some implementation techniques for a series of if statements (contained
in a loop that iterated 10,000,000 times) using (1) linear search (LS), or (2) indirect jump (IJ), for a variety of processors in the SPARC family. br is the average number of branches per loop iteration. Based on Uh and Whalley [1420].
When the controlling expression is denoted by an object having a character type the possible range of values
is known to fit in a byte. Even relatively simple optimizers often check for, and make use of, this special case.
Other Languages
Many languages have a single integer type, so there is no conversion to perform for case label values.
Strongly typed languages usually require that the type of the case label value be compatible with the type
of the controlling expression; there are not usually any implicit conversions. Enumerated constants are often
defined to be separate types that are not compatible with any integer type.
Coding Guidelines
This C sentence deals with the relationship between individual case label values and the controlling
expression. The following points deal with the relationship between different case label values within a
given switch statement:
• Mixing case labels whose values are represented using both character constants and integer constants
is making use of representation information (in this context the macro EOF might be interpreted in
its symbolic form of representing an end-of-file character, rather than an integer constant). There
does not appear to be a worthwhile benefit in having a deviation that permits the use of the integer
constant 0 rather than the character constant '\0', on the grounds of improved reader recognition
performance. The character constant '\0' is the most commonly occurring character constant (10%
of all character constants in the visible form of the .c files, even if it only represents 1% of all constant
tokens denoting the value 0).
• Mixing case labels whose values are represented using both enumeration constants and some other
form of constant representation (e.g., an integer constant) is making use of the underlying representation
of the enumerated constants. The same is also true if enumerated constants from different enumeration
types are mixed.
• Mixing integer constants represented using decimal, hexadecimal, or octal lexical forms. The issue of
visually mixing integer constants having different lexical forms is discussed elsewhere.
Floating point literals are very rarely seen in case labels. The guideline recommendation dealing with exact
comparison of floating-point values is applicable to this usage.
6.8.5 Iteration statements
1761 133) That is, the declaration either precedes the switch statement, or it follows the last case or default label
associated with the switch that is in the block containing the declaration.
Commentary
If the declaration is not followed by any case or default labels, all references to the identifier it declares
can only occur in the statements that follow it (which can only be reached via a jump to preceding case or
default labels, unless a goto statement jumps to an ordinary label within the statement list).
switch (expr)
{
int i = 4;
and thus if the controlling expression has a nonzero value, the call to the printf function will access an
indeterminate value. Similarly, the call to the function f cannot be reached.
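The effect can be sketched in a form that avoids reading the indeterminate value. The function below is a hypothetical variation, not the example from the C Standard: the initializer of i is never executed because the jump to a case or default label bypasses it, so i only has a known value on the path through case 0.

```c
int switch_skips_init(int expr)
{
    int result;

    switch (expr) {
        int i = 4;      /* the object exists, but this initialization
                           is never executed */
    case 0:
        i = 17;
        /* falls through into the default code */
    default:
        /* i is read only when it was assigned in case 0 */
        result = (expr == 0) ? i : -1;
    }
    return result;
}
```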
Commentary
Objects with static storage duration are initialized on program startup.
1763
iteration-statement:
The terms loop header or head of the loop are sometimes used to refer to the source code location containing
the controlling expression of a loop (in the case of a for statement it might be applied to all three components
bracketed by parentheses).
It is often claimed that programs spend 90% of their time executing 10% of their code. This characteristic
is only possible if the time is spent in a subset of the program’s iteration statements, or a small number of
functions called within those statements. While there is a large body of published research on program
performance, there is little evidence to back up this claim (one study[1344]found that 88% of the time wasspent in 20% of the code, while analysis[1455]of some small embedded applications found that 90% of thetime was spent in loops) It may be that researchers are attracted to applications which spend their time inloops because there are often opportunities for optimization Most of existing, published, execution timemeasurements are based on engineering and scientific applications, for database oriented applications[1160]and operating systems[1390]loops have not been found to be so important
The ; specified as the last token of a do statement is not needed to reduce the difficulty of parsing C source. It is simply part of an adopted convention.
C90
Support for the form:
FOR I=start TO end BY step
Some languages (e.g., BCPL, Modula-2) require step to be a translation time constant. Both Ada and Pascal require for statements to have a step size of one. Ada uses the syntax:
which also acts as the definition of counter.
Cobol supports a PERFORM statement, which is effectively a while statement.
PERFORM UNTIL quantity > 1000
The equivalent looping construct in Fortran is known as a do statement. A relatively new looping construct,
at least in the Fortran Standard, is FORALL. This is used to express a looping computation in a form that can more easily be translated for parallel execution. Some languages (e.g., Modula-2, Pascal) use the keywords
repeat/until instead of do/while, while other languages (e.g., Ada) do not support an iteration statement with a test at the end of the loop.
A few languages (e.g., Icon,[236] which uses the term generators) have generalized the looping construct to provide what are commonly known as iterators. An iterator enumerates the members of a set (a mechanism for accessing each enumerated member is provided in the language), usually in some unspecified order, and has a loop termination condition.
Common Implementations
Many programs spend a significant percentage of their time executing iteration statements. The following
are some of the ways in which processor and translator vendors have responded to this common usage:
• Translator vendors wanting to optimize the quality of generated machine code have a number of
optimization techniques available to them. A traditional loop optimization is strength reduction[280]
(which replaces costly computations by less expensive ones), while more ambitious optimizers might
perform hoisting of loop invariants and loop unrolling. Loop invariants are expressions whose value
does not vary during the iteration of a loop; such expressions can be hoisted to a point just outside
the start of the loop. Traditionally translators have only performed loop unrolling on for statements.
(Translation time information on the number of loop iterations and step size is required; this information
can often be obtained from the expressions in the loop header, i.e., the loop body does not need to
be analyzed.)
More sophisticated optimizations include making use of data dependencies to order the accesses to
storage. As might be expected with such a performance critical construct, a large number of other
optimization techniques are also available.
• Processor vendors want to design processors that will execute programs as quickly as possible. Holding
the executed instructions in a processor’s cache saves the overhead of fetching them from storage, and
most processors cache both instructions and object values. Some processors (usually DSPs) have what
is known as a zero overhead loop buffer (effectively a software controlled instruction cache). The
sequence of instructions in such a loop buffer can be repetitively executed with zero loop overhead
(the total loop count may be encoded in the looping instruction or be contained in a register). Because
of their small size (the Agere DSP16000[6] loop buffer has a limit of 31 instructions) and restrictions
on instructions that may be executed (e.g., no instructions that change the flow of control), optimizers
can have difficulty making good use of such buffers.[1419]
The characteristics of loop usage often mean that successive array elements are accessed on successive
loop iterations (i.e., storage accesses have spatial locality). McKinley and Temam[932] give empirical
results on the effect of loops on cache behavior (based on Fortran source).
Some CISC processors support a decrement/increment and branch on nonzero instruction,[323, 625] ideal
for implementing loops whose termination condition is the value zero (something that can be arranged
in handwritten assembler, but which rarely happens in loops written in higher-level languages; see
Table 1763.1). The simplifications introduced by the RISC design philosophy did away with this kind
of instruction; programs written in high-level languages did not contain enough loops of the right kind
to make it cost effective to support such an instruction. However, one application domain where a
significant amount of code is still written in assembler (because of the comparatively poor performance
of translator generated machine code) is that addressed by DSP processors, which often contain such
decrement (and/or increment) branch instructions (the SC140 DSP core[989] includes hardware loop
counters that support up to four levels of loop nesting).
The C compiler for the Unisys e-@ction Application Development Solutions[1424] uses the JGD
processor instruction to optimize the loop iteration test. However, this usage limits the maximum
number of loop iterations to 2^35 − 2, a value that is very unlikely to be reached in a commercial
program (a trade-off made by the compiler implementors between simplicity and investing effort to
handle very rare situations).
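The translator optimizations named above can be illustrated by applying them by hand. The following is a minimal sketch, not taken from any particular translator; the function names and parameters are hypothetical. It shows a loop invariant being hoisted, a multiplication in the subscript being replaced by an addition (strength reduction), and the loop being rewritten to count down to zero, the form suited to a decrement and branch on nonzero instruction.

```c
#include <stddef.h>

/* Straightforward form: scale * offset is loop invariant, and
   i * stride is recomputed on every iteration. */
long sum_naive(const long *a, size_t n, size_t stride, long scale, long offset)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i * stride] + scale * offset;
    return total;
}

/* Hand-optimized form of the same computation. */
long sum_optimized(const long *a, size_t n, size_t stride, long scale, long offset)
{
    long total = 0;
    long invariant = scale * offset;  /* hoisted: computed once, outside the loop */
    size_t index = 0;                 /* strength reduction: add stride instead of multiplying */

    for (size_t remaining = n; remaining != 0; remaining--) {  /* terminates on zero */
        total += a[index] + invariant;
        index += stride;
    }
    return total;
}
```

Both functions access the same elements in the same order, so they return the same value for any arguments that are valid for the first.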
Obtaining an estimate of the execution time of a sequence of statements may require estimating the number
of times an iteration statement will iterate. Some implementations provide a mechanism for the developer to
provide iteration count information to the translator. For instance, the translator for the TMS320C6000[1373]
supports the following usage:
#pragma MUST_ITERATE (30) /* Will loop at least 30 times */
Another approach is for the translator to deduce the information from the source.[567]
Program loops may not always be expressed using an iteration-statement (for instance, they may be created using a goto statement). Ramalingam[1158] gives an algorithm for identifying loops in almost linear time.
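As an illustration of a loop that contains no iteration-statement, the following sketch (hypothetical function names) builds the same loop once from a label and a goto statement and once from a for statement.

```c
/* Sums 1..n using a loop built from a label and a goto statement. */
unsigned sum_with_goto(unsigned n)
{
    unsigned total = 0;
    unsigned i = 1;

top:
    if (i <= n) {
        total += i;
        i++;
        goto top;   /* backward jump forms the loop */
    }
    return total;
}

/* The same computation written with a for statement. */
unsigned sum_with_for(unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 1; i <= n; i++)
        total += i;
    return total;
}
```

A tool that only looks for the keywords for, while, and do would not see the loop in the first function; the backward jump has to be found by analyzing the flow of control.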
Figure 1763.2: Number of functions containing iteration-statements nested to the given maximum nesting level; for
embedded C[397] (whose data was multiplied by a constant to allow comparison) and the visible form of the .c files (zero nesting
depth denotes functions not containing any iteration-statements).
Table 1763.1: Occurrence of various kinds of for statement controlling expressions (as a percentage of all such expressions).
Where object is a reference to a single object, which may be an identifier, a member (e.g., s.m, s->m->n, or a[expr]); assignment
is an assignment expression, integer-constant is an integer constant expression, and expression denotes expressions that
contain arithmetic and shift operators. Based on the visible form of the .c files.

Abstract Form of for loop header                                            %
assignment ; identifier < identifier ; identifier v++                    33.2
assignment ; identifier < integer-constant ; identifier v++              11.3
assignment ; identifier ; assignment                                      7.0
assignment ; identifier < expression ; identifier v++                     3.3
assignment ; identifier < identifier ; ++v identifier                     2.7
assignment ; identifier != identifier ; assignment                        2.5
assignment ; identifier <= identifier ; identifier v++                    2.2
assignment ; identifier >= integer-constant ; identifier v                1.6
assignment ; identifier < function-call ; identifier v++                  1.4
assignment ; identifier < identifier ; identifier v++ , identifier v++    1.4
Table 1763.2: Occurrence of various kinds of while statement controlling expressions (as a percentage of all while statements).
Where object is a reference to a single object, which may be an identifier, a member (e.g., s.m, s->m->n, or a[expr]); assignment
is an assignment expression, integer-constant is an integer constant expression, and expression denotes expressions that
contain arithmetic and shift operators. Based on the visible form of the .c files.

Abstract Form of Control Expression     %    Abstract Form of Control Expression     %
function-call                         4.4    object != integer-constant            1.2
object > integer-constant             4.0    object v > integer-constant           1.1
for (enum fred { jim, sheila = 10 } i = jim; i < sheila; i++)
   // loop body
Proposed Committee Response
The intent is clear enough; fred, jim, and sheila are all identifiers which do not denote objects with auto or
register storage classes, and are not allowed in this context.
C90
Support for this functionality is new in C99.
C++
6.4p2 The declarator shall not specify a function or an array. The type-specifier-seq shall not contain typedef
and shall not declare a new class or enumeration.
for (static int ls; /* constraint violation */
}
Other Languages
In some languages (e.g., Ada and Algol 68) the identifier used as a loop control variable in a for statement is
implicitly declared to have the appropriate type (based on the type of the expressions denoting the start and
end values).
Coding Guidelines
The ability to declare identifiers in this context is new in C99 and at the time of this writing there is insufficient
experience with its use to know whether any guideline recommendation is worthwhile.
Semantics
1766 An iteration statement causes a statement called the loop body to be executed repeatedly until the controlling
expression compares equal to 0.
Commentary
This defines the term loop body. The term loop is commonly used as a noun by developers to refer to
constructs associated with iteration statements (which are rarely referred to as iteration statements by
developers). For instance, the terms loop statement, or simply a loop, are commonly used by developers.
Execution of the loop may also terminate because a break, goto, or return statement is executed. The
discussion on the evaluation of the controlling expression in an if statement is applicable here.
It is often necessary to access a block of storage (e.g., to copy it somewhere else, or to calculate a checksum
of its contents). For anything other than the smallest of blocks the overhead of a loop can be significant.
void send(register unsigned char *to,
The above loop requires a comparison after every item copied. Unrolling the loop would reduce the number
of comparisons per item copied. However, because count is not known at translation time an optimizer is
unlikely to perform loop unrolling. The loop can be unrolled by hand, making sure that the code also handles
the situation where the number of items being copied is not an exact multiple of the loop unroll factor. A
technique proposed by Tom Duff[378] (usually referred to as Duff’s device) is (the original example used *to,
i.e., the bytes were copied to some memory mapped serial device):
void send(register unsigned char *to,
case 0: do{ *to++ = *from++;
}
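Only fragments of the two listings survive here. The following is a minimal sketch of the widely published memory-to-memory form of Duff’s device, alongside the plain loop it replaces; it is not the listing from this text. The function names, the use of size_t, and the explicit check for a zero count are assumptions made for illustration.

```c
#include <stddef.h>

/* Plain copy: one loop test for every byte copied. */
void copy_simple(unsigned char *to, const unsigned char *from, size_t count)
{
    while (count-- > 0)
        *to++ = *from++;
}

/* Duff's device, unrolled by a factor of 8: the switch jumps into the
   middle of the do loop to copy the count % 8 leftover bytes first,
   after which every pass through the loop copies 8 bytes. */
void copy_duff(unsigned char *to, const unsigned char *from, size_t count)
{
    size_t passes;

    if (count == 0)              /* the do loop always executes at least once */
        return;
    passes = (count + 7) / 8;    /* number of times the loop test is evaluated */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```

The case labels inside the do statement are what allow the jump into the loop body; the falls through between them are intentional.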
C++
The C++ Standard converts the controlling expression to type bool and expresses the termination condition
in terms of true and false. The final effect is the same as in C.
Other Languages
In many other languages the model of a for loop involves a counter being incremented (or decremented) from
a start value to an end value, while the model of a while loop (or whatever it is called) is something that iterates until some condition is met. There is considerable overlap between these two models (it is always possible to rewrite one form of loop in terms of the other). The differences between the two kinds of loop are purely conceptual ones, created by developer loop classification models. Loop classification is often based on deciding whether a loop has the attributes needed to be considered a for loop (e.g., the number of iterations being known before the first iteration starts), all other loops being classified as while loops. Early versions of Fortran performed the loop termination test at the end of the loop. This meant that loops always iterated at least once, even if the test was false on the first iteration.
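The difference between testing at the start and at the end of a loop can be seen in C by comparing a while statement with a do statement. The following sketch (hypothetical function names) counts how many times each loop body is executed for a given limit.

```c
/* Test at the top: the body is not executed when limit is 0. */
unsigned iterations_while(unsigned limit)
{
    unsigned count = 0;
    unsigned i = 0;

    while (i < limit) {
        count++;
        i++;
    }
    return count;
}

/* Test at the bottom: the body is executed once even when limit is 0. */
unsigned iterations_do(unsigned limit)
{
    unsigned count = 0;
    unsigned i = 0;

    do {
        count++;
        i++;
    } while (i < limit);
    return count;
}

/* The while loop above rewritten as a for statement. */
unsigned iterations_for(unsigned limit)
{
    unsigned count = 0;

    for (unsigned i = 0; i < limit; i++)
        count++;
    return count;
}
```

For any nonzero limit the three functions return the same value; they differ only when the test is false before the first iteration.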
Coding Guidelines
Some coding guideline documents recommend that loop termination only occur when the condition expressed
in the controlling expression becomes equal to zero. A number of benefits are claimed to accrue from adhering
to this recommendation. These include readers being able to quickly find out the conditions under which the loop terminates (by looking at the loop’s controlling expression; which might only be a benefit for one form of reading) and the desire not to jump across the control flow. It is always possible to transform source
extern int glob;
static int arr[NUM_ELEMS];
Looking at the controlling expression in f_1 and f_2 it appears to be easy to deduce the condition under
which the loop will terminate. However, in both cases the body of the loop contains a test that also effectively
terminates the loop (in the case of f_2 the body of the loop has increased in complexity by the introduction
of an if statement). The function f_3 handles the case where guidelines recommend against modifying the
variable.
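The listing containing f_1, f_2, and f_3 is not reproduced in full here. As a stand-in, the following sketch contrasts the two styles under discussion using hypothetical functions (they are not the author’s f_1 to f_3): one leaves the loop with a break statement, the other expresses every termination condition in the controlling expression.

```c
#include <stddef.h>

/* Early exit using break: the loop header does not show every way
   in which the loop can terminate. */
int find_with_break(const int *arr, size_t n, int wanted)
{
    int found_at = -1;

    for (size_t i = 0; i < n; i++) {
        if (arr[i] == wanted) {
            found_at = (int)i;
            break;
        }
    }
    return found_at;
}

/* Every termination condition appears in the controlling expression,
   at the cost of an extra test on each iteration. */
int find_with_flag(const int *arr, size_t n, int wanted)
{
    int found_at = -1;

    for (size_t i = 0; i < n && found_at < 0; i++) {
        if (arr[i] == wanted)
            found_at = (int)i;
    }
    return found_at;
}
```

Both functions return the index of the first matching element, or -1 when there is none; which form is cheaper for readers to comprehend is the open question.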
Any guideline recommendation needs to be based on a comparison of the costs and benefits of the loop
constructs in these functions (and other cases). Your author knows of no studies that provide the information
needed to make such a comparison. For this reason this coding guideline subsection is silent on the issue of
how loops might terminate. A loop where it is known, at translation time, that the number of iterations is
zero, is a loop containing redundant code. The issue of redundant code is discussed elsewhere.
1767 The repetition occurs regardless of whether the loop body is entered from the iteration statement or by a
jump.DR268
Commentary
This is a requirement on the implementation.
This sentence was added by the response to DR #268.
Other Languages
Many languages (e.g., Pascal, Ada) treat loop bodies as indivisible entities and do not permit a jump into
them (although it is usually possible to jump out of them).
Coding Guidelines
Some coding guideline documents recommend against jumping into the body of a loop. One argument is that
a reader of the source may not notice that a loop could be entered in this way and make a modification that
fails to take this case into account (i.e., introduces a fault).
There are a variety of situations where jumping into the body of a loop may result in code that is less
likely to contain faults and be less costly to maintain (see the example given for the goto statement).
EXAMPLE
for (int i = 0; i < 10; i++)
   g((struct S){.mem_1 = i, .mem_2 = 42});
6.8.5.1 The while statement
Coding Guidelines
Why do developers choose to use a while statement rather than a for statement? Technically a loop can be
written using either kind of statement. Both forms of iteration statement are likely to involve initializing,
testing, and modifying one or more objects that systematically change over successive iterations. The for
statement places these three components in a contiguous, visibly prominent, location. Other reasons for the
choice (65.5% for, 34.5% while) include:
• C culture. The use of a particular kind of loop to perform a particular operation may be something that
developers learn as part of the process of becoming a C programmer. Measurements of the two looping
constructs (see Table 1763.1 and Table 1763.2) show that for statements often count up to some value
and while statements iterate until an equality operator is true. The pattern of usage seen in the source
is then the sum of the operations (e.g., always using a for statement to loop over the elements of an
array and a while statement to loop over a linked list) required to implement the application.
• Individual habits. While learning to program a developer may have chosen (perhaps a random selection,
or purely a class exercise to practice using a language construct) to use a particular construction to
perform some operation. Reuse of the same construction to perform the same, or similar, operations
leads to it becoming established as part of their repertoire. The pattern of usage seen in source code
is then the sum of individual habits.
In both cases the choice of for/while involves a process of algorithmic problem classification: which most
closely matches the developer’s mental model of the operations being performed? At the time of this writing
there is insufficient information to evaluate whether there is a cost/benefit case for the use of while statements,
rather than for statements. These coding guidelines do not discuss this issue any further.
There is a commonly seen idiom that uses side effects in the evaluation of the controlling expression to
modify the value of an object in the controlling expression (i.e., the loop control variable). The discussion
on controlling expressions in an if statement showed that removing such side effects in the controlling
expression of while statements would incur the cost of having to create and maintain two identical statements
(one outside the loop and one inside). Your author is not able to estimate if this cost was less than the
potential benefits of not having the side effects in the controlling expression. For this reason no guideline is
applicable to a do statement (because the loop is always executed at least once). Given that the use of a do
statement is relatively rare and that developers are likely to be familiar with the side effect idioms that occur
in controlling expressions, no guideline recommendation is given here.
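The trade-off described for while statements can be made concrete. In the sketch below, next_char is a hypothetical helper standing in for a library read function; the first counting function uses the side effect idiom, the second avoids it by writing the call twice.

```c
/* Hypothetical helper: returns the next character of *src and advances
   *src, or -1 when the end of the string has been reached. */
static int next_char(const char **src)
{
    if (**src == '\0')
        return -1;
    return (unsigned char)*(*src)++;
}

/* Idiom: the controlling expression assigns to c as a side effect. */
unsigned count_spaces_idiom(const char *text)
{
    unsigned spaces = 0;
    int c;

    while ((c = next_char(&text)) != -1)
        if (c == ' ')
            spaces++;
    return spaces;
}

/* No side effect in the controlling expression: the call now appears
   twice, and both copies have to be kept in step during maintenance. */
unsigned count_spaces_duplicated(const char *text)
{
    unsigned spaces = 0;
    int c = next_char(&text);

    while (c != -1) {
        if (c == ' ')
            spaces++;
        c = next_char(&text);
    }
    return spaces;
}
```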
Example
One use of the do statement is to solve the dangling semicolon problem that can occur when a function-like
macro replaces a function call. Bracketing a sequence of statements with braces creates a compound statement,
which does not require a terminating semicolon. In most contexts a semicolon following a function-like
macro invocation is a harmless null statement. However, as the following example shows, when it forms
the first arm of an if statement that contains an else arm, the presence of a semicolon is a syntax violation.
Enclosing a sequence of statements in the body of a do statement, whose controlling expression is false,
avoids this problem.
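The original example is not reproduced here; the following is a minimal sketch of the same idea, using hypothetical macro and function names. SWAP_BRACES shows the form that fails and is deliberately left unused.

```c
/* Two hypothetical macros that swap the values of two int objects. */
#define SWAP_BRACES(a, b) { int tmp_ = (a); (a) = (b); (b) = tmp_; }
#define SWAP_DO(a, b)     do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Puts *lo and *hi into ascending order; returns 1 if they were swapped. */
int order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_DO(*lo, *hi);  /* expands to one statement, completed by the ; */
    else
        return 0;           /* already in order */
    /* Writing SWAP_BRACES(*lo, *hi); in place of SWAP_DO would not
       translate: the ; after the closing brace is a null statement that
       ends the if statement, leaving the else without a matching if. */
    return 1;
}
```

Because the controlling expression of the do statement is the constant 0, the macro body is executed exactly once.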
6.8.5.3 The for statement
for (i = a; i < j; i += sizeof(enum {b, a}))
Except for the behavior of acontinuestatement in the loop body, the statement
and the sequence of statements
Like the C90 Standard, the C++ Standard specifies the semantics in terms of an equivalent while statement.
However, the C++ Standard uses more exact wording, avoiding the possible ambiguities present in the C90
wording.
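The exception made for the continue statement can be illustrated as follows. This is a sketch with hypothetical function names, not the wording of either standard: in the for statement a continue still causes expression-3 to be evaluated, whereas in a hand-written while equivalent the increment has to be repeated before each continue.

```c
/* Sums the odd values below limit. In a for statement, continue jumps
   to the evaluation of expression-3 (here i++). */
unsigned sum_odd_for(unsigned limit)
{
    unsigned total = 0;

    for (unsigned i = 0; i < limit; i++) {
        if (i % 2 == 0)
            continue;
        total += i;
    }
    return total;
}

/* The while rewrite: continue jumps straight to the controlling
   expression, so the increment must be duplicated before it;
   without the duplicate the loop would never advance past i == 0. */
unsigned sum_odd_while(unsigned limit)
{
    unsigned total = 0;
    unsigned i = 0;

    while (i < limit) {
        if (i % 2 == 0) {
            i++;
            continue;
        }
        total += i;
        i++;
    }
    return total;
}
```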
Other Languages
In most other languages the ordering of expressions puts the controlling expression last. Or to be more exact,
an upper or lower bound for the loop control variable appears last. Most other languages do not support
having anything other than the loop control variable tested against a value that is known at translation time.
Some languages (e.g., Ada, Algol 68, and Pascal) do not allow the loop control variable to be modified by
the body of the loop.
Common Implementations
Loop unrolling is the process of decreasing the number of iterations a loop makes by duplicating the statements
in the loop body.[329] For instance:
for (loop = 0; loop < 10; loop++)
Loop unrolling reduces the number of jumps performed (which can be a significant saving when the loop
body is short) and, by increasing the number of statements in the loop body, creates optimization opportunities
(which, in the above example, could result in two loop bodies executing in less time than twice the time for a
single iteration). When the iteration count is not exactly divisible by the loop body unrolling factor, copies of
the loop body may need to occur before the start, or after the end, of the loop statement.
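The listing that accompanied this description is truncated. The following sketch, with hypothetical function names and an unroll factor of 4 chosen for illustration, shows a loop unrolled by hand, with the leftover iterations handled after the end of the unrolled loop.

```c
#include <stddef.h>

/* Original loop: one test and one jump per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by a factor of 4: one test and one jump per four elements. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)      /* the n % 4 iterations left over */
        total += a[i];
    return total;
}
```

Here the iteration count is only known during execution, which is why the second loop is needed; when the count is known at translation time a translator can emit the exact number of leftover copies instead.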
At the minimum, loop unrolling requires knowing the number of loop iterations and the amount by which
the loop control variable is incremented, at translation time. Implementations often place further restrictions
on loops before they unroll them (requiring the loop body to consist of a single basic block is a common
restriction).
Arbitrary amounts of loop unrolling (e.g., iterating 10 times over 100 copies of a loop body where the
original is known to iterate 1000 times) do not necessarily guarantee improved performance. Duplicating
Coding Guidelines
Writers of coding guideline documents often regard the components of a for statement as having attributes that other loop statements don’t have (e.g., they have an associated loop control variable). While it can be argued that many of these authors have simply grafted onto C concepts that only exist in other languages (or perhaps the encapsulation of all of the loop control information in one visually delimited area of the source triggers a cognitive response that triggers implicit assumptions in readers), if a sufficient number of developers associate these attributes with for statements then they become part of the culture of C and need to be considered here. Other loop conceptualization issues are discussed elsewhere.
This subsection discusses one attribute commonly associated with for statements that is not defined by
the C Standard, the so-called loop control variable (or simply loop variable, or loop counter). A loop control
variable is more than simply a concept that might occur during developer discussion; many coding guideline documents make recommendations about its use (e.g., a loop control variable should not be modified during
execution of the body of the loop, or have floating-point type). Which of the potentially four different objects that might occur, for instance, in the most common form of loop header (see Table 1763.1) is the loop control variable?
for (lcv_1=0; lcv_2 < lcv_3; lcv_4++)
The following algorithm frequently returns an answer that has been found to be acceptable to developers (it is based on the previous standard and has not been updated to reflect the potential importance of objects declared in clause-1). Note that the algorithm may return zero, or multiple answers; a union or structure member selection operator and its two operands is treated as a single object, but both an array and any objects
in its subscript are treated as separate objects and therefore possible loop control variables:
1. list all objects appearing in expression-2 (the controlling expression). If this contains a single object,
it is the loop control variable (33.2% of cases in the .c files),
2. remove all objects that do not appear in expression-3 (which is evaluated on every loop iteration). If
a single object remains, that is the loop control variable (91.8% of cases in the .c files),
3. remove all objects that do not appear in clause-1 (which is only evaluated once, prior to loop iteration).
If a single object remains, that is the loop control variable (86.2% of cases in the .c files).
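The three steps above can be sketched in code. The representation is an assumption made for illustration: each object appearing in a loop header is given one bit, so that the set of objects in each component of the header is a bit mask; the type and function names are hypothetical.

```c
/* A set of objects, one bit per object (illustrative encoding). */
typedef unsigned object_set;

/* Number of objects in a set. */
static int set_size(object_set s)
{
    int n = 0;

    for (; s != 0; s &= s - 1)   /* clears the lowest set bit each time */
        n++;
    return n;
}

/* Returns the set of candidate loop control variables, which may be
   empty or may contain more than one object. */
object_set loop_control_candidates(object_set clause_1,
                                   object_set expression_2,
                                   object_set expression_3)
{
    object_set candidates = expression_2;   /* step 1 */

    if (set_size(candidates) == 1)
        return candidates;
    candidates &= expression_3;             /* step 2 */
    if (set_size(candidates) == 1)
        return candidates;
    candidates &= clause_1;                 /* step 3 */
    return candidates;
}
```

For a header such as for (i = 0; i < n; i++), with i encoded as bit 1 and n as bit 2, the call loop_control_candidates(1u, 3u, 1u) selects i at step 2. A header in which four different objects appear, one per operand position, yields the empty set.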
Unlike the example given above, in practice the same object often appears as an operand somewhere within all three components (see Figure 1774.1).
Because the controlling expression is evaluated on every iteration of the loop, the loop control variable can appear in contexts that are not supported in other languages (because most evaluate the three loop components only once, prior to the first iteration). For instance:
for (lcv_1=0, lcv_2=0; a1[lcv_1] < a2[lcv_2]; lcv_1++, lcv_2+=2)
Experience shows that developers often assume that, in a for statement, modification of any loop control variables only occurs within the loop header. This leads to them forming beliefs about properties of the loop, for instance, that it loops 10 times. There tend to be fewer assumptions made about the use of while statements (which might not even be thought to have a loop control variable associated with them) and the following guideline is likely to cause developers to use this form of looping construct.
Figure 1774.1: Number of possible loop control variables appearing in expression-2 (square-box) after filtering against the
objects appearing in expression-3 (cross) and after filtering against the objects appearing in clause-1 (bullet). Based on the
visible form of the .c files. (Axis label: Possible loop control variables.)
Cg 1774.1
A loop control variable shall not be modified during the execution of the body of a for statement.
Some coding guideline documents recommend that loop control variables not have floating-point type. It
might be thought that such a recommendation only makes sense in languages where the loop termination
condition involves an equality test (in C this case is covered by the guideline recommendation dealing
with the type of the operands of the equality operators). However, the controlling expression in a C for
statement can contain relational operators, which can also have a dependence on the accuracy of floating-point
operations. For instance, it is likely that the author of the following fragment expects the loop to iterate
10 times. However, it is possible that 10 increments of i result in it having the value 9.9999, and loop
termination not occurring until after the eleventh iteration.
for (float i=0.0; i < 10.0; i++)
A possible developer response to a guideline recommendation that loop control variables not have floating-point
type is to use a while statement (which is not covered by the algorithm for deducing loop control
variables). Some of the issues associated with the finite accuracy of operations on floating-point values
can be addressed with guideline recommendations. However, the difficulty of creating wording for a
recommendation dealing with the use of floating-point values to control the number of loop iterations is such
that none is attempted here.
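The dependence on floating-point accuracy is easiest to observe with a step that has no exact binary representation. The following sketch is an illustration with assumed values (a step of 0.1 and a bound of 1.0, which are not taken from the fragment above) and hypothetical function names; the number of iterations performed by the first function depends on how the implementation evaluates floating-point operations.

```c
/* Loop controlled by a floating-point object. Because 0.1 cannot be
   represented exactly, the accumulated value of x after ten additions
   may be slightly below or slightly above 1.0, so this function may
   return 10 or 11. */
unsigned iterations_fp(void)
{
    unsigned count = 0;

    for (double x = 0.0; x < 1.0; x += 0.1)
        count++;
    return count;
}

/* Integer loop control variable; the floating-point value is derived
   from it, so the loop always iterates exactly 10 times. */
unsigned iterations_int(double *last_x)
{
    unsigned count = 0;

    for (int i = 0; i < 10; i++) {
        *last_x = i * 0.1;
        count++;
    }
    return count;
}
```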
Table 1774.1: Occurrence of sequences of components omitted from a for statement header (as a percentage of all for
statements). Based on the visible form of the .c files.
Coding Guidelines
The controlling expression in a for statement is sometimes written so that its evaluation also has the side effect of modifying the value of the loop control variable, removing the need for expression-3. A developer may have any of a number of reasons for using such an expression, from use of an idiom to misplaced concern for efficiency (many of the issues associated with side effects within the controlling expression are the same as those that apply to while statements).
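A sketch of this usage, with hypothetical function names, is given below. The first function modifies the loop control variable in the controlling expression and omits expression-3; the second keeps the modification in expression-3.

```c
#include <stddef.h>

/* The controlling expression both tests and decrements i, so that
   expression-3 is omitted; within the body i runs from n-1 down to 0. */
long sum_backwards(const int *a, size_t n)
{
    long total = 0;

    for (size_t i = n; i-- > 0; )
        total += a[i];
    return total;
}

/* The same traversal with the modification in expression-3; the
   subscript now needs adjusting because i runs from n down to 1. */
long sum_backwards_plain(const int *a, size_t n)
{
    long total = 0;

    for (size_t i = n; i > 0; i--)
        total += a[i - 1];
    return total;
}
```

The first form is a common idiom for counting down with an unsigned type, where a test of the form i >= 0 would always be true.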
and the entire loop, including the other two expressions;
Declaring the loop control variable via clause-1 has the benefit of localizing the visual context over which
it is referenced. Possible costs include having to modify existing habits (e.g., looking for the declaration at
Saying in words what is specified in the syntax. A for statement loop header is essentially a means of
visually highlighting the various components of a loop.
C++
The C++ Standard does not make this observation, which can be deduced from the syntax.
Other Languages
Being able to omit the specification for the initial value of a loop counter (i.e., clause-1) is unique to C
(and C++). Most languages allow their equivalent of expression-3 to be omitted and use a default value
(usually either 1 or -1).
Coding Guidelines
Why would a developer choose to omit either of these constructs in a for statement, rather than using a
while statement? This issue is discussed elsewhere.
Specifying that an omitted expression-2 is replaced by a nonzero constant allows a more useful meaning to
be given to those cases where clause-1 or expression-3 are present, than by replacing it by the constant 0.
Omitting expression-2 creates a loop that can never terminate via a condition in the loop header. Executing
a break, goto, or return statement (or a call to the longjmp library function) can cause execution of the
loop to terminate. The term infinite loop is often used to describe a for statement where the controlling
expression has been omitted. In some freestanding environments the main body of a program consists of an
infinite loop that is only terminated when electrical power to the processor is switched off.
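A short sketch of a loop with an omitted controlling expression that is terminated from within its body (the function name is hypothetical):

```c
/* Returns the number of decimal digits needed to print value. */
unsigned count_digits(unsigned value)
{
    unsigned digits = 0;

    for (;;) {              /* the omitted expression-2 is treated as nonzero */
        digits++;
        value /= 10;
        if (value == 0)
            break;          /* the only way out of this loop */
    }
    return digits;
}
```

Nothing in the loop header tells the reader when the loop ends; the termination condition has to be found by reading the body.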
Other Languages
Most languages require that the loop termination condition be explicitly specified. In Ada the loop header is
optional (a missing header implies an infinite loop).
Common Implementations
The standard describes an effect that most implementations do not implement as stated. A comparison that is
unconditionally true can be replaced by an unconditional jump.
is a nonzero constant probably requires less effort to comprehend (because it is not an implicit special case) for less experienced developers.
Example
#define TRUE 1

void f(void)
{
for (;;)
   { /* */ }

6.8.6 Jump statements
stages down the pipeline and the following instruction is already in the pipeline behind it. Until the processor
executes the branch instruction it does not know which location to fetch the next instruction from; a pipeline
stall has occurred. Branch instructions are relatively common, which means that pipeline stalls can have a
significant performance impact. The main techniques used by processor vendors to reduce the impact of
stalls are discussed in the following C sentences.
One of the design principles of RISC was to expose some of the underlying processor details to the
translator, in the hope that translators would make use of this information to improve the performance
of the generated machine code. Some of the execution delays caused by branch instructions have been
exposed. For instance, many RISC processors have what is known as a delay slot immediately after a branch
instruction. The instruction in this delay slot is always executed before the jump occurs (some processors
allow delay slot instructions following a conditional branch to be annulled). This delay slot simplifies the
processor by moving some of the responsibility for keeping the pipeline full to the translator writer (who at
worst fills it with a no-op instruction). Most processors have a single delay slot, but the Texas Instruments
TMS320C6000[1372] has five.
Fetching the instructions that will soon be executed requires knowing the address of those instructions. In
the case of function calls the destination address is usually encoded as part of the instruction; however, the
function return address is usually held on the stack (along with other housekeeping information). Maintaining
a second stack, containing only function return addresses, has been proposed, along with speculative execution
(and stack repair if the speculation does not occur along the control flow path finally chosen[1276]).
Calder, Grunwald, and Srivastava[192] studied the behavior of branches in library functions, looking for
common patterns that occurred across all calls.
Coding Guidelines
The continue and break statements are a form of goto statement. Some developers consider them to be a
structured goto and treat them differently than a goto statement. The controversy over the use of the goto
statement has not abated since Dijkstra’s, now legendary, letter to the editor was published in 1968.[363] Many
reasons have been given for why source code should not contain goto statements; Dijkstra’s was based on
human cognition. Knuth argued that in some cases use of goto provided the best solution.[761]
Edsger W. Dijkstra
My second remark is that our intellectual powers are rather geared to master static relations and that our powers
to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise
programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and
the dynamic process, to make the correspondence between the program (spread out in text space) and the process
(spread out in time) as trivial as possible.
The heated debate on the use of the goto statement has generated remarkably little empirical research.[488]
Are guideline recommendations against the use of goto simply a hangover from the days when developers
had few structured programming constructs (e.g., compound statements) available in the language they used,
or is there a worthwhile cost/benefit in recommending against their use?
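The observation that continue and break are a form of goto can be made concrete with a small sketch. The function names, label names, and the task itself (summing the positive values that precede the first zero) are invented for illustration; the two functions are intended to behave identically.

```c
#include <stddef.h>

/* Sums the positive values that precede the first zero,
   written with continue and break. */
int sum_structured(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] < 0)
            continue;           /* skip to the loop increment */
        if (values[i] == 0)
            break;              /* leave the loop */
        total += values[i];
    }
    return total;
}

/* The same loop with each continue/break spelled as a goto. */
int sum_with_goto(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] < 0)
            goto next_iteration;
        if (values[i] == 0)
            goto loop_exit;
        total += values[i];
next_iteration: ;               /* continue lands just before i++ */
    }
loop_exit:
    return total;
}
```

In the goto form the reader has to locate each label to learn where control goes, whereas the destinations of continue and break are fixed by the enclosing loop.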
It is possible to transform any C program containing jump statements to one that does not contain any
jump statements. This may involve the introduction of additional while statements, if statements, and
the definition of new objects having a boolean type. An algorithm for performing this transformation,
while maintaining the topology of the original flow graph and the same order of efficiency, is given by
Ashcroft and Manna.[59] Ammarguellat[30] gives an algorithm that avoids code replication and normalizes
all control-flow cycles into single-entry single-exit while loops. In practice automated tools tend to take a
simpler approach to transformation.[405] The key consideration does not appear to be the jump statement
itself, but the destination statement relative to the statement performing the jump. This issue is discussed
elsewhere (see 1783, jump statement causes jump to).
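As a rough illustration of the kind of rewriting involved, the sketch below removes a goto by hand, introducing a boolean object and while statements. It is not the Ashcroft and Manna algorithm or any other cited algorithm, the function and object names are hypothetical, and the trailing return (itself a jump statement) is left in place for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns the index of the first row containing wanted, or -1.
   A goto leaves both loops at once. */
int find_row_goto(const int *cells, size_t rows, size_t cols, int wanted)
{
    int result = -1;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            if (cells[r * cols + c] == wanted) {
                result = (int)r;
                goto done;
            }
done:
    return result;
}

/* Same search without the goto: a boolean object records that the
   search is over, and both loops become while statements testing it. */
int find_row_flag(const int *cells, size_t rows, size_t cols, int wanted)
{
    int result = -1;
    bool found = false;
    size_t r = 0;
    while (!found && r < rows) {
        size_t c = 0;
        while (!found && c < cols) {
            if (cells[r * cols + c] == wanted) {
                result = (int)r;
                found = true;
            }
            c++;
        }
        r++;
    }
    return result;
}
```

The second form has no goto, but it has an extra object and two extra tests per iteration, which is the trade-off behind the cost/benefit question raised above.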
Usage
Numbers such as those given in Table 1782.1 and Table 1782.2 depend on the optimizations performed by
an implementation. For instance, unrolling a frequently executed loop will reduce the percentage of branch
6.8.6 Jump statements
Table 1782.1: Dynamic occurrence of different kinds of instructions that can change the flow of control. %Instructions Altering
Control Flow is expressed as a percentage of all executed instructions. All but the last row are expressed as percentages of
these, control flow altering, instructions only. The kinds of instructions that change control flow are: conditional branches CB,
unconditional branches UB, indirect procedure calls IC, procedure calls PC, procedure returns Ret, and other breaks Oth (e.g.,
signals and switch statements). Instructions between branches is the mean number of instructions between conditional branches.
Based on Calder, Grunwald, and Zorn.[193]
Program | %Instructions Altering Control Flow | %CB | %UB | %IC | %PC | %Ret | %Oth | %Conditional Branch Taken | Instructions Between Branches
Table 1782.2: Number of static conditional branch sites that are responsible for the given quantile percentage of dynamically
executed conditional branches. For instance, 19 conditional branch sites are responsible for over 50% of the dynamically executed
branches executed by burg. Static count is the total number of conditional branch instructions in the program image. Of the
17,565 static branch sites, 69 branches account for the execution of 50% of all dynamic conditional branches. Not all branches
will be executed during each program execution because many branches are only encountered during error conditions, or may
reside in unreachable or unused code. Based on Calder, Grunwald, and Zorn.[193]
Some languages allow labels to be treated as types. In such languages jump statements can jump between
functions (there is usually a requirement that the function jumped to must have an active invocation in the
current call chain at the time the jump statement is executed). In Algol 68 a label is in effect a function that
jumps to that label. It is possible to call that function or take its address. In C terms:
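The following is only a rough analogue built from the standard setjmp and longjmp facilities. Every name in it is hypothetical. A jmp_buf stands in for the label, a small function plays the part of "the function that jumps to that label", and its address is passed to another function; as with the languages described above, the function jumped to must still be active in the call chain.

```c
#include <setjmp.h>

static jmp_buf finish_label;    /* hypothetical: stands in for a label */

/* "Calling the label": transfers control to the setjmp site, which must
   belong to a function that is still active in the call chain. */
static void jump_to_finish(void)
{
    longjmp(finish_label, 1);
}

/* The "label" is passed around like any other function address. */
static void search(const int *values, int count, int wanted,
                   void (*found)(void))
{
    for (int i = 0; i < count; i++)
        if (values[i] == wanted)
            found();            /* control does not come back here */
}

/* Returns 1 if wanted occurs in values, 0 otherwise. */
int contains(const int *values, int count, int wanted)
{
    if (setjmp(finish_label) != 0)
        return 1;               /* reached via jump_to_finish */
    search(values, count, wanted, jump_to_finish);
    return 0;                   /* search returned normally: not found */
}
```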
Many processors have span-dependent branch instructions. That is, a short-form (measured in number of bytes) that can only branch relatively small distances, while a long-form can branch over longer distances. When storage usage needs to be minimized it may be possible to use a jump chain to branch to a place using
a short-form instruction, rather than a direct jump to that place using a long-form instruction (at the cost of reduced execution performance).[859]
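A minimal sketch of the jump chain decision follows, under loudly stated assumptions: the short form is taken to hold a signed 8-bit displacement measured from the branch's own address, and the structure and function names are invented. Real processors differ in both the displacement range and the point from which it is measured.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption for illustration only: a signed 8-bit displacement. */
enum { SHORT_MIN = -128, SHORT_MAX = 127 };

static bool fits_short(long from, long to)
{
    long displacement = to - from;
    return displacement >= SHORT_MIN && displacement <= SHORT_MAX;
}

/* An already placed branch instruction and the place it jumps to. */
struct branch { long address; long target; };

/* Returns where a short-form branch at `from` could go in order to end
   up at `target`: the target itself if it is in range, otherwise the
   address of an existing branch to the same target that is in range
   (a jump chain), or -1 if a long-form instruction is needed. */
long short_form_destination(long from, long target,
                            const struct branch *existing, size_t count)
{
    if (fits_short(from, target))
        return target;
    for (size_t i = 0; i < count; i++)
        if (existing[i].target == target &&
            fits_short(from, existing[i].address))
            return existing[i].address;
    return -1;
}
```

Going through the chain saves the bytes of a long-form instruction but executes two branches instead of one, which is the performance cost mentioned above.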
Coding Guidelines
The term spaghetti code is commonly used to describe code containing a number of jump statements whose various destinations cause the control flow to cross the other control flows (a graphical representation of the control flow, using lines to represent flow, resembled cooked spaghetti, i.e., it is intertwined).
Jumps can be split into those cases where the destination is the most important consideration and those where the jump/destination pair need to be considered, as follows:
• Jumping to the start/end of a function/block, the destination being in the same or outer block relative
to the jump. This has a straightforward interpretation as restarting/finishing the execution of a function/block. The statement jumped to may not be the last (some termination code may need to be executed, or a guideline recommendation that functions have a single point of exit may cause the label
Figure 1783.1: Number of goto statements having a given number of visible source lines between a goto statement and its
destination label (negative values denote backward jumps). Based on the translated form of this book’s benchmark programs.
Figure 1783.2: Number of goto statements and labels having a given scope nesting level (nesting level 1 is the outermost block
of a function definition), and on the right the difference in scope levels between a goto statement and its corresponding labeled
statement (negative values denote a jump to a less nested scope). Based on the translated form of this book’s benchmark programs.
• Jumping within the same block. This is the most common kind of goto statement found in C source
(see right plot of Figure 1783.2).
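The first kind of jump listed above, to termination code near the end of a function, might look like the following sketch. The function name, the label name, and the task (duplicating two strings) are assumptions made for illustration. The label is not on the last statement, because storage may need to be released before the single return.

```c
#include <stdlib.h>
#include <string.h>

/* Copies both strings into newly allocated storage.
   Returns 0 on success, -1 on failure (nothing is then handed back). */
int duplicate_pair(const char *first, const char *second,
                   char **first_copy, char **second_copy)
{
    int status = -1;
    char *a = NULL;
    char *b = NULL;

    a = malloc(strlen(first) + 1);
    if (a == NULL)
        goto finish;
    b = malloc(strlen(second) + 1);
    if (b == NULL)
        goto finish;

    strcpy(a, first);
    strcpy(b, second);
    *first_copy = a;
    *second_copy = b;
    a = b = NULL;               /* ownership has passed to the caller */
    status = 0;

finish:                         /* termination code, then the only exit */
    free(a);
    free(b);
    return status;
}
```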
Jump statements written by the developer can create a flow of control that requires a lot of effort to
comprehend. Some guideline documents recommend against the use of any jump statement (including a
return, unless it is the last statement within a function). Comprehending the flow of control is an important
part of comprehending a function. The use of jump statements can increase the cost of comprehension (by
increasing the complexity of the flow of control) and can increase the probability that the comprehension
process reaches the wrong conclusion (unlike other constructs that change the flow of control, there are not
usually any additional visual clues, e.g., indentation, in the source signaling the presence of a jump statement).
However, there is no evidence to suggest that the cost of the alternatives is any lower and consequently no
guideline recommendation is made here.
Any statements that appear after a jump-statement, in the same block, are dead code (see 190, dead code).
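A small sketch of the dead code case, with an invented function: the assignment following the break is in the same block and has no label, so it can never be executed.

```c
/* Returns the index of the first negative value, or -1 if there is none. */
int first_negative(const int *values, int count)
{
    int index = -1;
    for (int i = 0; i < count; i++) {
        if (values[i] < 0) {
            index = i;
            break;
            index = -1;         /* dead code: follows a jump statement */
        }
    }
    return index;
}
```

Some translators issue a diagnostic for such statements, but they are not required to.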
Usage
A study by Gellerich, Kosiol, and Ploedereder[488] analyzed goto usage in Ada and C. In the translated
form of this book’s benchmark programs 20.6% of goto statements jumped to a label that occurred textually
before them in the source code.
In C90 this wording appeared in a footnote, where it was an observation about the equivalence between a
for statement and a while statement. In C99 this wording plays the role of a specification.
is incremented and one of the operands appearing in expression-2
6.8.6.1 The goto statement
Other Languages
Some languages that support nested function definitions (e.g., Pascal) only require that the label name be visible (i.e., it is possible to jump to a label in a different function). Other languages (e.g., Algol 68) give labels block scope, which restricts their visibility. Perl has a form of goto that causes the function named by the label to be called. However, when that function returns, control is returned to the function that called the function that performed the goto (i.e., the behavior is as-if control returns to the goto, which then immediately performs a return).
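In C terms, the as-if behavior described for the Perl form can be pictured as a call whose result is returned immediately. This is only an analogy. Both function names are hypothetical, and details of the Perl construct such as argument passing are not modeled.

```c
/* Hypothetical callee, standing in for the function named by the label. */
static int twice(int x)
{
    return 2 * x;
}

/* Roughly what the function performing the goto does: the callee runs
   and its result goes straight back to this function's caller. */
int performs_goto(int x)
{
    return twice(x + 1);        /* nothing after the call executes here */
}
```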