All objects defined outside the body of a function have static storage duration. The storage for such objects is
initialized before program startup, so the initializer list can only consist of constant expressions. This constraint only differs
from an equivalent one for initializers by being framed in terms of “occurring outside the body of a function”
rather than “an object that has static storage duration.”
outside of an object definition. Because there need be no associated type definition, a type name must be
specified (for initializers the type is obtained from the type of the object being initialized).
Other Languages
A form of compound literal is supported in some languages (e.g., Ada, Algol 68, CHILL, and Extended
Pascal). These languages do not always require a type name to be given. The type of the parenthesized list of
expressions is deduced from the context in which it occurs.
Coding Guidelines
From the coding guideline point of view, the use of compound literals appears fraught with potential pitfalls,
including the use of the term compound literal, which suggests a literal value, not an unnamed object.
However, this construct is new in C99 and there is not yet sufficient experience in its use to know if any
specific guideline recommendations might apply to it.
1058 It provides an unnamed object whose value is given by the initializer list.81)
Commentary
The difference between this kind of unnamed object and that created by a call to a memory allocation function
(e.g., malloc) is that its definition includes a type and it has a storage duration other than allocated (i.e.,
either static or automatic).
Other Languages
Some languages treat their equivalent of compound literals as just that, a literal. For instance, like other
literals, it is not possible to take their address.
Common Implementations
In those cases where a translator can deduce that storage need not be allocated for the unnamed object, the
as-if rule can be used, and it need not allocate any storage. This situation is likely to occur for compound
literals because, unless their address is taken (explicitly using the address-of operator, or in the case of an
array type implicit conversion to pointer type), they are only assigned a value at one location in the source
code. At their point of definition, and use, a translator can generate machine code that operates on their
constituent values directly rather than copying them to an unnamed object and operating on that.
Coding Guidelines
Guideline recommendations applicable to the unnamed object are the same as those that apply to objects
having the same storage duration. For instance, the guideline recommendation dealing with assigning the
address assigned
Example
The following example not only requires that storage be allocated for the unnamed object created by the
compound literal, but that the value it contains be reset on every iteration of the loop.
Coding Guidelines
Developers are unlikely to write expressions, such as (int){1}, when (int)1 had been intended (on
standard US PC-compatible keyboards the pair of characters ( { and the pair ) } appear on four different
keys). Such usage may occur through the use of parameterized macros. However, at the time of this writing
there is insufficient experience with use of this new language construct to know whether any guideline
recommendation is worthwhile.
Example
The following all assign a value to loc. The first two assignments involve an lvalue to value conversion. In
the second two assignments the operand being assigned is already a value.
Example
1064 The value of the compound literal is that of an unnamed object initialized by the initializer list.
Commentary
The distinction between a compound literal acting as if the initializer list was its value, and an unnamed
object (initialized with values from the initializer list) being its value, is only apparent when the address-of
operator is applied to it. The creation of an unnamed object does not mean that locally allocated storage is a
factor in this distinction. Implementations of languages where compound literals are defined to be literals
sometimes use locally allocated temporary storage to hold their values. C implementations may find they can
optimize away allocation of any actual unnamed storage.
Common Implementations
If a compound literal occurs in a context where its value is required (e.g., assignment) there are obvious
opportunities for implementations to use the values of the initializer list directly. C99 is still too new to know
whether most implementations will make use of this optimization.
Coding Guidelines
The distinction between the value of a compound literal being an unnamed object and being the values of the initializer list could be viewed as an unnecessary complication that is not worth educating a developer about. Until more experience has been gained with the kinds of mistakes developers make with compound literals,
it is not possible to recommend any guidelines.
Other Languages
The storage duration specified by other languages, which support some form of compound literal, varies. Some allow the developer to choose (e.g., Algol 68), others require them to be dynamically allocated (e.g., Ada), while in others (e.g., Fortran and Pascal) the issue is irrelevant because it is not possible to obtain their address.
1066 otherwise, it has automatic storage duration associated with the enclosing block.
A parallel can be drawn between an object definition that includes an initializer and a compound literal (that
is the definition of an unnamed object). The lifetime of the associated objects starts when the block that
contains their definition is entered. However, the objects are not assigned their initial value, if any, until the
declaration is encountered during program execution.
The unnamed object associated with a compound literal is initialized each time the statement that contains
it is encountered during program execution. Previous invocations, which may have modified the value of the
literal does not cause any new storage to be allocated. Recursive calls to a function containing a compound
literal will cause different storage to be allocated, for the unnamed object, for each nested call.
To what extent is it worth trying to optimize compound literals made up of a list of constant expressions;
for instance, by detecting those that are never modified, or by placing them in a static region of storage
that can be copied from or pointed at? The answer to these and many other optimization issues relating to
compound literals will have to wait until translator vendors get a feel for how their customers use this new, to
C, construct.
Coding Guidelines
Parallels can be drawn between the unnamed object associated with a compound literal and the temporaries
created in C++. Experience has shown that C++ developers sometimes assume that the lifetime of a temporary
is greater than it is required to be by that language’s standard. Based on this experience it is to be expected
that developers using C might make similar mistakes with the lifetime of the unnamed object associated with
a compound literal. Only time will tell whether these mistakes will be sufficiently common, or serious, that
the benefits of being able to apply the address-of operator to a compound literal (the operator that needs to be
used to extend the range of statements over which an unnamed object can be accessed) are outweighed by
the probable cost of faults.
The guideline recommendation dealing with assigning the address of an object to a pointer object, whose lifetime is greater than that of the addressed object, is applicable here.
Commentary
They are the same except
• initializer lists don’t create objects, they are simply a list of values with which to initialize an object; and
• the type is deduced from the object being initialized, not a type name.
A strictly conforming program can deduce if an implementation uses the same object for two string literals,
or compound literals, by performing an equality comparison on their addresses (an infinite number of
comparisons would be needed to deduce whether an implementation always used distinct objects). This
permission for string literals is also specified elsewhere.
The only way a const-qualified object can be modified is by casting a pointer to it to a non-const-qualified
pointer. Such usage results in undefined behavior. The undefined behavior, if the pointer was used to modify
such an unnamed object that was not distinct, could also modify the values of other compound literal
objects.
Other Languages
Most languages do not consider any kind of literal to be modifiable, so whether they share the same storage
locations is not an issue.
Common Implementations
The extent to which developers will use compound literals having a const-qualified type, for which storage
is allocated and whose values form a sharable subset with another compound literal, remains to be seen.
Without such usage it is unlikely that implementors of optimizers will specifically look for savings in this
area, although they may come about as a consequence of optimizations not specifically aimed at compound
literals.
Example
In the following there is an opportunity to overlay the two unnamed objects containing zero values.
1069 EXAMPLE 1 The file scope definition
int *p = (int []){2, 4};
initializes p to point to the first element of an array of two ints, the first having the value two and the second,
four. The expressions in this compound literal are required to be constant. The unnamed object has static
storage duration.
Commentary
This usage, rather than the more obvious int p[] = {2, 4};, can arise because the initialization value is
derived through macro replacement. The same macro replacement is used in noninitialization contexts.
1070 EXAMPLE 2 In contrast, in
void f(void) {
p is assigned the address of the first element of an array of two ints, the first having the value previously
pointed to by p and the second, zero. The expressions in this compound literal need not be constant. The
unnamed object has automatic storage duration.
Commentary
The assignment of values to the unnamed object occurs before the value of the right operand is assigned to p.
drawline((struct point){.x=1, .y=1},
         (struct point){.x=3, .y=4});
Or, if drawline instead expected pointers to struct point:
drawline(&(struct point){.x=1, .y=1},
         &(struct point){.x=3, .y=4});
Commentary
This usage removes the need to create a temporary in the calling function. The arguments are passed by value, like any other structure argument.
1072 EXAMPLE 4 A read-only compound literal can be specified through constructions like:
(const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}
The need to discuss an implementation’s ability to share storage for string literals occurs because it is
possible to detect such sharing in a conforming program (e.g., by comparing two pointers assigned the
addresses of two distinct, in the visible source code, string literals). The C Committee chose to permit this
implementation behavior. (There were existing implementations, when the C90 Standard was being drafted,
that shared storage.)
1075 EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage duration and has type array of char, but need not be modifiable; the last
two have automatic storage duration when they occur within the body of a function, and the first of these two
is modifiable.
Commentary
In all three cases, a pointer to the start of storage is returned and the first 16 bytes of the storage allocated
will have the same set of values. If all three expressions occurred in the same source file, the first and third
could share the same storage even though their storage durations were different. Developers who see a
potential storage saving in using a compound literal instead of a string literal (the storage for one only need
be allocated during the lifetime of its enclosing block) also need to consider potential differences in the
number of machine code instructions that will be generated. Overall, there may be no savings.
1076 EXAMPLE 6 Like string literals, const-qualified compound literals can be placed into read-only memory and
can even be shared. For example,
(const char []){"abc"} == "abc"
might yield 1 if the literals’ storage is shared.
Commentary
In this example pointers to the first element of the compound literal and a string literal are being compared
for equality. Permission to share the storage allocated for a compound literal only applies to those having a
const-qualified type (there is no such restriction on string literals).
Coding Guidelines
Comparing strings using an equality operator, rather than a call to the strcmp library function, is a common
beginner mistake. Training is the obvious solution.
Usage
In the visible source of the .c files 0.1% of string literals appeared as the operand of the equality operator
(representing 0.3% of the occurrences of this operator).
1077 EXAMPLE 7 Since compound literals are unnamed, a single compound literal cannot specify a circularly
linked object. For example, there is no way to write a self-referential compound literal that could be used as
the function argument in place of the named object endless_zeros below:
struct int_list { int car; struct int_list *cdr; };
struct int_list endless_zeros = {0, &endless_zeros};
eval(endless_zeros);
A modification using pointer types, and an additional assignment, creates a circularly linked list that uses the storage of the unnamed object:
The following statement would not have achieved the same result:
because the second compound literal would occupy a distinct object, different from the first. The value of endless_zeros in the second compound literal would be pointing at the unnamed object allocated for the first compound literal.
Other Languages
Algol 68 supports the creation of circularly linked objects (see the Other Languages subsection in the following C sentence).
1078 EXAMPLE 8 Each compound literal creates only a single object in a given scope:
int f (void) {
The function f() always returns the value 1.
Note that if an iteration statement were used instead of an explicit goto and a labeled statement, the lifetime
of the unnamed object would be the body of the loop only, and on entry next time around p would have an indeterminate value, which would result in undefined behavior.
1079 Forward references: type names (6.7.6), initialization (6.7.8).
6.5.3 Unary operators
1080
unary-expression:
    postfix-expression
    ++ unary-expression
    -- unary-expression
    unary-operator cast-expression
    sizeof unary-expression
    sizeof ( type-name )
unary-operator: one of
    & * + - ~ !
Commentary
Note that the operand of unary-operator is a cast-expression, not a unary-expression. A unary operator
usually refers to an operator that takes a single argument. Technically all of the operators listed here, plus the
postfix increment and decrement operators, could be considered as being unary operators.
Rationale
Unary plus was adopted by the C89 Committee from several implementations, for symmetry with unary minus.
Other Languages
Some languages (i.e., Ada and Pascal) specify the unary operators to have lower precedence than the
multiplicative operators; for instance, -x/y is equivalent to -(x/y) in Ada, but (-x)/y in C. Most languages
call all operators that take a single operand unary operators.
Languages that support the unary + operator include Ada, Fortran, and Pascal. Some languages use the
keyword NOT rather than !. In the case of Cobol this keyword can also appear to the left of an operator,
indicating negation of the operator (i.e., NOT < meaning not less than).
Coding Guidelines
Coding guidelines need to be careful in their use of the term unary operator. Its meaning, as developers
understand it, may be different from its actual definition in C. The operators in a unary-expression occur
to the left of the operand. The only situation where a developer’s incorrect assumption about precedence
relationships might lead to a difference between predicted and actual behavior is when a postfix operator
occurs immediately to the right of the unary-expression.
Dev943.1
Except when sizeof ( type-name ) is immediately followed visually by a token having the lexical form
of an additive operator, if a unary-expression is not immediately followed by a postfix operator it need
not be parenthesized.
Although the expression sizeof (int)-1 may not occur in the visible source code, it could easily occur as
the result of macro replacement of the operand of the sizeof operator. This is one of the reasons behind the
guideline recommendation specifying the parenthesizing of macro bodies (without parentheses the expression
Table columns (data not recovered): Token Sequence, % Occurrence of First Token, % Occurrence of Second Token
Figure 1080.1: Number of integer-constants having a given value appearing as the operand of the unary minus and unary ~
operators. Based on the visible form of the .c files.
Table 1080.2: Occurrence of the unary-operators, prefix ++, and prefix -- having particular operand types (as a percentage of
all occurrences of the particular operator; an _ prefix indicates a literal operand). Based on the translated form of this book’s
benchmark programs.
The use of an operand of type bool with the prefix ++ operator is deprecated (5.3.2p1); there is no
corresponding entry in annex D, but the proposed response to C++ DR #145 inserted one. In the case of the
decrement operator:
5.3.2p1
The operand shall not be of type bool.
A C source file containing an instance of the prefix -- operator applied to an operand having type _Bool is
likely to result in a C++ translator issuing a diagnostic.
Coding Guidelines
Enumerated types are usually thought about in symbolic rather than arithmetic terms. The increment and
decrement operators can also be given a symbolic interpretation. They are sometimes thought about in terms
of moving on to the next symbolic name in a list. This move to next operation relies on the enumeration
constants being represented by successive numeric values. While this usage is making use of representation
information, there is often a need to step through a series of symbolic names (and C provides no other built-in
mechanism), for instance, iterating over the named constants defined by an enumerated type.
Dev569.1
The operand of a prefix increment or decrement operator may have an enumerated type, provided the
enumeration constants defined by that type have successive numeric values.
Semantics
1082 The value of the operand of the prefix ++ operator is incremented.
From the coding guidelines perspective, uses of these operators can be grouped into three categories:
1. The only operator in an expression statement. In this context the result returned by the operation is ignored. The statement simply increments/decrements its operand. Use of the prefix, rather than the postfix, form does not follow the pattern seen at the start of most visible source code statement lines, which is
an identifier followed by an operator (see Figure 940.2). A reader’s scanning of the source looking for objects that are modified will be disrupted by the initial operator. For this reason, use of the postfix form is recommended.
so that a prefix operator does not occur in the same expression as other operators. The evaluation can
be moved back before the containing expression (see the postfix operators for a fuller discussion of this point).
in serial rather than nested form)
3. The third point is the same as for the postfix operators.
1083 The result is the new value of the operand after incrementation.
Other Languages
Pascal contains the succ operator. This returns the successor value (i.e., it adds one to its operand), but it
does not modify the value of an object appearing as its operand.
1084 The expression ++E is equivalent to (E+=1).
Commentary
The expression ++E need not be equivalent to E=E+1 (e.g., the expression E may contain a side effect).
C++
C++ lists an exception (5.3.2p1) for the case when E has type bool. This is needed because C++ does not
define its boolean type in the same way as C. The behavior of this operator on operands is defined as a special
case in C++. The final result is the same as in C.
1085 See the discussions of additive operators and compound assignment for information on constraints, types,
[Note: see the discussions of addition (5.7) and assignment operators (5.17) for information on conversions ]
There is no mention that the conditions described in these clauses also apply to this operator.
1086 The prefix -- operator is analogous to the prefix ++ operator, except that the value of the operand is decremented.
where the effects are not analogous
C++
The prefix -- operator is not analogous to the prefix ++ operator in that its operand may not have type bool.
Other Languages
Pascal contains the pred reserved identifier. This returns the predecessor value, but does not modify the
value of its operand.
Coding Guidelines
The guideline recommendation for the prefix ++ operator has been worded to apply to either operator.
1087 Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).
6.5.3.2 Address and indirection operators
Constraints
1088 The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator,
or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class
specifier.
that the object so declared will be heavily used. [Note: the hint can be ignored and in most implementations it will be ignored if the address of the object is taken —end note]
Source developed using a C++ translator may contain occurrences of the unary & operator applied to an operand declared with the register storage-class specifier, which will cause a constraint violation if processed by a C translator.
Coding Guidelines
In itself, use of the address-of operator is relatively harmless. The problems occur subsequently when the value returned is used to access storage. The following are three, coding guideline related, consequences of being able to take the address of an object:
• It provides another mechanism for accessing the individual bytes of an object representation (a pointer
to an object can be cast to a pointer to character type, enabling the individual bytes of an object
representation to be accessed).
• It is an alias for the object having that address.
• It provides a mechanism for accessing the storage allocated to an object after the lifetime of that object
has terminated.
Assigning the address of an object potentially increases the scope over which that object can be accessed.
When is it necessary to increase the scope of an object? What are the costs/benefits of referring to an object
using its address rather than its name? (If a larger scope is needed, could an object’s definition be moved to a
scope where it is visible to all source code statements that need to refer to it?)
The parameter-passing mechanism in C is pass by value. What is often known as pass by reference is
achieved, in C, by explicitly passing the address of an object. Different calls to a function having
pass-by-reference arguments can involve different objects in different calls. Passing arguments, by reference,
to functions is not a necessity; it is possible to pass information into and out of functions using file scope
objects.
Assigning the address of an object creates an alias for that object. It then becomes possible to access the
same object in more than one way. The use of aliases creates technical problems for translators (the behavior
implied by the use of the restrict keyword was introduced into C99 to help get around this problem) and
can require developers to use additional cognitive resources (they need to keep track of aliased objects).
A classification often implicitly made by developers is to categorize objects based on how they are
accessed, the two categories being those accessed by the name they were declared with and those accessed
via pointers. A consequence of using this classification is that developers overlook the possibility, within a
sequence of statements, of a particular object being modified via both methods. When readers are aware of an
object having two modes of reference (a name and a pointer dereference) is additional cognitive effort needed
to comprehend the source? Your author knows of no research on this subject. These coding guidelines
discuss the aliasing issue purely from the oversight point of view (faults being introduced because of lack of
information), because there is no known experimental evidence for any cognitive factors.
One way of reducing aliasing issues at the point of object access is to reduce the number of objects whose
addresses are taken. Is it possible to specify a set of objects whose addresses should not be taken and what
are the costs of having no alternatives for these cases? Is the cost worth the benefit? Restricting the operands
of the address operator to be objects having block scope would limit the scope over which aliasing could
occur. However, there are situations where the addresses of objects at file scope need to be used, including:
• An argument to a function could be an object with block scope, or file scope; for instance, the qsort
function might be called.
• In resource-constrained environments it may be decided not to use dynamic storage allocation. For
instance, all of the required storage may be defined at file scope and pointers to objects within this
storage used by the program.
• The return from a function call is sometimes a pointer to an object, holding information. It may
simplify storage management if this is a pointer to an object at file scope.
The following guideline recommendation ensures that the storage allocated to an object is not accessed once
the object’s lifetime has terminated.
Cg1088.1
The address of an object shall not be assigned to another object whose scope is greater than that of
the object assigned.
Dev1088.1
An object defined in block scope, having static storage duration, may have its address assigned to any
other object.
A function designator can appear as the operand of the address-of operator. However, taking the address of a function is redundant. This issue is discussed elsewhere. Likewise for objects having an array type.
In the following it is not possible to take the address of a or any of its elements.
In fact this object is virtually useless (the identifier a can appear as the operand to the sizeof operator). If allocated memory is not permitted (we know the memory requirements of the following on program startup):
Depending on the context in which it occurs, there may be restrictions on the pointed-to type (because of the type of the result).
1098
C++
type, or a pointer to a function type
C++ does not permit the unary * operator to be applied to an operand having a pointer to void type.
For operands with static storage duration, the value of the address operator may be a constant (objects having
an array type also need to be indexed with a constant expression). There is no requirement that the address
of an object be the same between different executions of the same program image (for objects with static
storage duration) or different executions of the same function (for objects with automatic storage duration).
All external function references are resolved during translation phase 8. Any identifier denoting a function
definition will have been resolved.
79
C90
This sentence is new in C99 and summarizes what the unary & operator does.
C++
Like C90, the C++ Standard specifies a pointer to its operand (5.3.1p1). But later on (5.3.1p2) goes on to say:
“In particular, the address of an object of type “cv T” is “pointer to cv T,” with the same cv-qualifiers.”
Other Languages
Many languages do not contain an address-of operator. Fortran 95 has an address assignment operator, =>.
The left operand is assigned the address of the right operand.
Common Implementations
Early versions of K&R C treated p=&x as being equivalent to p&=x.[734]
In the case of constant addresses the value used in the program image is often calculated at link-time. For
objects with automatic storage duration, their address is usually calculated by adding a known, at translation
time, value (the offset of an object within its local storage area) to the value of the frame pointer for that
function invocation. Addresses of elements, or members, of objects can be calculated using the base address
of the object plus the offset of the corresponding subobject.
Having an object appear as the operand of the address-of operator causes many implementations to play
safe and not attempt to perform some optimizations on that object. For instance, without sophisticated pointer
analysis, it is not possible to know which object a pointer dereference will access. (Implementations often
assume all objects that have had their address taken are possible candidates, others might use information on
the pointed-to type to attempt to reduce the set of possible accessed objects.) This often results in no attempt
being made to keep the values of such objects in registers.
Implementations’ representation of addresses is discussed elsewhere.
1092 If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the
result is as if both were omitted, except that the constraints on the operators still apply and the result is not an
lvalue.
Commentary
The only effect of the operator pair &* is to remove any lvalueness from the underlying operand. The
combination *& returns an lvalue if its operand is an lvalue. This specification is consistent with the behavior
of the last operator applied controlling lvalue-ness. This case was added in C99 to cover a number of existing
coding idioms; for instance:
#include <stddef.h>
This behavior is not specified in C++. Given that either operator could be overloaded by the developer to have
a different meaning, such a specification would be out of place.
At the time of this writing a response to C++ DR #232 is being drafted (a note from the Oct 2003 WG21 meeting says: “We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.”)
Common Implementations
Some C90 implementations did not optimize the operator pair &* into a no-op. In these implementations the behavior of the unary * operator was not altered by the subsequent address-of operator. C99 implementations are required to optimize away the operator pair &*.
1093 Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.
This requirement was not explicitly specified in the C90 Standard. It was the subject of DR #076, which was closed by adding this wording to the C99 Standard.
C++
This behavior is not specified in C++. Given that either operator could be overloaded by the developer to have a different meaning, such a specification would be out of place. The response to C++ DR #232 may specify the behavior for this case.
Common Implementations
This requirement describes how all known C90 implementations behave.
Coding Guidelines
The expression &a[index], in the visible source code, could imply
• a lack of knowledge of C semantics (why wasn’t a+index written?),
• that the developer is trying to make the intent explicit, and
• that the developer is adhering to a coding standard that recommends against the use of pointer arithmetic; the authors of such standards often view (a+index) as pointer arithmetic, but a[index] as an array index (the equivalence between these two forms being lost on them; see 989, array subscript identical to).
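The equivalence of &a[index] and a+index can be sketched as follows. The names N_ELEMS and sum_elems are hypothetical; the loop uses the one-past-the-end address as its bound, a use that relies on the [] operand not being evaluated.

```c
#include <assert.h>
#include <stddef.h>

#define N_ELEMS 4   /* hypothetical array size for this sketch */

/* Sums an array, using &a[N_ELEMS] as the loop bound. */
int sum_elems(void)
{
    int a[N_ELEMS] = {1, 2, 3, 4};
    int total = 0;

    /* The two spellings denote the same address. */
    assert(&a[2] == a + 2);

    /* &a[N_ELEMS] is treated as a + N_ELEMS; no element is accessed. */
    for (int *p = a; p != &a[N_ELEMS]; p++)
        total += *p;

    return total;
}
```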
1094 Otherwise, the result is a pointer to the object or function designated by its operand.
Commentary
There is no difference between the use of objects having a pointer type and using the address-of operator. For instance, the result of the address-of operator could be assigned to an object having the appropriate pointer type, and that object used interchangeably with the value assigned to it.
Other Languages
Some languages (e.g., Pascal and Ada) use the postfix operator ^. Other languages, such as Algol 68 and Fortran 95, implicitly perform the indirection operation. In this case, an occurrence of an operand having a pointer type is dereferenced to return the value of the pointed-to object.
Coding Guidelines
Some coding guideline documents place a maximum limit on the number of indirection operators that can be successively applied. The rationale is that deeply nested indirections can be difficult to comprehend. Is there any substance to this claim?
Expressions, such as ***p, are similar to nested function calls in that they have to be comprehended in a right-to-left order. The issue of nested constructions in natural language is discussed in that earlier C sentence (see 1000).
At the time of this writing there is insufficient experimental evidence to enable a meaningful cost/benefit analysis to be performed and these coding guidelines say nothing more about this issue.
If sequences of unary * operators are needed in an expression, it is because an algorithm’s data structures make the usage necessary. In practice, long sequences of indirections using the unary * operator are rare. Like the function call case, it may be possible to provide a visual form that provides a higher-level interpretation and hides the implementation’s details of the successive indirections.
An explicit unary * operator is not the only way of specifying an indirection. Both the array subscript, [], and member selection, ->, binary operators imply an indirection (see 1031, member selection). Developers rarely use the form (*s).m ((&s)->m), the form s->m (s.m) being much more obvious and natural. While the expression s1->m1->m2->m3 is technically equivalent to (*(*(*s1).m1).m2).m3, it is comprehended in a left-to-right order.
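The equivalence between the two spellings can be checked with a small sketch. The structure types leaf, mid, and root and the function chain_forms_agree are hypothetical; only the member names m1, m2, and m3 come from the text.

```c
#include <assert.h>

/* Hypothetical three-level structure chain. */
struct leaf { int m3; };
struct mid  { struct leaf *m2; };
struct root { struct mid *m1; };

/* Reads the same member through both spellings and returns its value. */
int chain_forms_agree(void)
{
    struct leaf l = { 42 };
    struct mid  m = { &l };
    struct root r = { &m };
    struct root *s1 = &r;

    int arrow = s1->m1->m2->m3;        /* read left to right */
    int star  = (*(*(*s1).m1).m2).m3;  /* read inside out */

    assert(arrow == star);
    return arrow;
}
```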
The operand could be an object, with some pointer to function type, or it could be an identifier denoting a function that has been implicitly converted to a pointer to function type. This result is equivalent to the original function designator. Depending on the context in which it occurs this function designator may be
Common Implementations
For most implementations the result is an address of a storage location. Whether there is a function definition (translated machine code) at that address is not usually relevant until an attempt is made to call the designated function (using the result).
Coding Guidelines
Because of the implicit conversions a translator is required to perform, the unary * operator is not required to cause the designated function to be called. There are a number of situations that can cause such usage to appear in source code: the token sequence may be in automatically generated source code, or the sequence may occur in developer-written source via arguments passed to macros, or developers may apply it to objects having a pointer to function type because they are unaware of the implicit conversions that need to be performed.
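A minimal sketch of why the unary * operator is redundant in a function call. The names twice and call_forms_agree are hypothetical. Each application of * to a pointer to function yields a function designator, which is immediately converted back to a pointer, so any number of * operators gives the same call.

```c
#include <assert.h>

/* Hypothetical function used as the call target. */
static int twice(int x)
{
    return 2 * x;
}

/* Calls the same function through five equivalent spellings. */
int call_forms_agree(void)
{
    int (*fp)(int) = twice;   /* designator converted to pointer */

    int a = twice(3);
    int b = fp(3);
    int c = (*fp)(3);
    int d = (***fp)(3);       /* every * result is converted back */
    int e = (*twice)(3);

    assert(a == b && b == c && c == d && d == e);
    return a;
}
```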
Example
The indirection operator produces a result that allows the pointed-to object to be treated like an anonymous object. The result can appear in the same places that an identifier (defined to be an object of the same type) can appear. The resulting lvalue might not be a modifiable lvalue (see 724, modifiable lvalue). There may already be an identifier that refers to the same object. If two or more different access paths to an object exist, it is said to be aliased (see 971, object aliased).
Common Implementations
Some processors (usually CISC) have instructions that treat their operand as an indirect reference. For instance, an indirect load instruction obtains its value from the storage location pointed to by the storage location that is the operand of the instruction.
1098 If the operand has type “pointer to type”, the result has type “type”.
Commentary
The indirection operator removes one level of pointer from the operand’s type. The operand is required to have pointer type (see 1089, unary * operand has pointer type). In many contexts the result type of a pointer to function type will be implicitly converted back to a pointer type (see 732, function designator converted to type).
1099 If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.84)
Commentary
The standard does not provide an all-encompassing definition of what an invalid value is. The footnote gives some examples (see 1114, footnote 84). An invalid value has to be created before it can be assigned and this may involve a conversion operation. Those pointer conversions for which the standard defines the behavior do not create (see 743, pointer to void)
what is commonly known as an out of bounds array access is specified here
do perform a few checks.
Forward references: storage-class specifiers (6.7.1), structure and union specifiers (6.7.2.1).
6.5.3.3 Unary arithmetic operators
Constraints
1101 The operand of the unary + or - operator shall have arithmetic type;
There are algorithms (e.g., in graphics applications) that require the bits in an integer value to be complemented, and processors invariably contain an instruction for performing this operation. Complementing the bits in a floating-point value is a very rarely required operation and processors do not contain such an instruction. This constraint reflects this common usage.
Other Languages
While many languages do not contain an equivalent of the ~ operator, their implementations sometimes include it as an extension.
Coding Guidelines
Some coding guideline documents only recommend against the use of operands having a signed type. The argument is that the representation of unsigned types is defined by the standard, while signed types might have one of several representations. In practice, signed types almost universally have the same representation: two’s complement (see 612, two’s complement). However, the possibility of variability of integer representation across processors is not the only important issue here. The ~ operator treats its operand as a sequence of bits, not a numeric value. As such it may be making use of representation information and the guideline recommendation dealing with this issue would be applicable (see 569.1, representation information using).
C++
The C++ Standard does not specify any requirements on the type of the operand of the ! operator.
5.3.1p8
The operand of the logical negation operator ! is implicitly converted to bool (clause 4);
But the behavior is only defined if operands of scalar type are converted to bool:
The following are two possible ways of thinking about this operator:
1 As a shorthand form of a comparison against zero in a conditional expression. That is, in the same way the two forms if (x) and if (x != 0) are equivalent, the two forms if (!x) and if (x == 0) are equivalent.
2 As a logical negation operator that reverses the state of a boolean value (it can take as its operand a value in either of the possible boolean representation models and map it to the model that uses 0/1 for its boolean representation; see 476, boolean role).
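Both views of the ! operator can be shown in a short sketch. The function name normalize_bool is hypothetical. The assertion expresses the first view (!x has the same value as x == 0), and the return value expresses the second (a double application maps any scalar truth value onto the 0/1 model).

```c
#include <assert.h>

/* Hypothetical helper: maps any int truth value onto 0 or 1. */
int normalize_bool(int x)
{
    /* View 1: !x is a comparison of x against zero. */
    assert(!x == (x == 0));

    /* View 2: negating twice yields 0 for zero, 1 for anything else. */
    return !!x;
}
```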
A double negative is very often interpreted as a positive statement in English (e.g., “It is not unknown for double negatives to occur in C source”). The same semantics apply in C. However, in some languages (e.g., Spanish) a double negative is interpreted as making the statement more negative (this usage does occur in casual English speech, e.g., “you haven’t seen nothing yet”, but it is rare and frowned on socially [120]).
The token ! is commonly called the not operator. This term is a common English word whose use in a sentence is similar to its use in a C expression. Through English language usage the word not, or an equivalent form, can appear as part of an identifier spelling (e.g., not_finished, no_signal, or unfinished). The
A simple expression containing a double negation is likely to require significantly more cognitive resources to comprehend than one that does not. Changing the semantic associations of an identifier from (those implied by) not_finished to finished would require that occurrences of not_finished be changed to !finished (plus associated changes to any appearances of the identifier as the operand of the ! or the equality operators).
Calculating the difference in cognitive cost/benefit between using an identifier spelling that represents a negated form and one that does not requires information on a number of factors. For instance, whether any double negative forms actually appear in the source, the extent to which the not spelling form provides a good fit to the application domain, and any cognitive cost differences between the alternative forms not_finished and !finished. Given the uncertainty in the cost/benefit analysis no guideline recommendation is given here.
Table 1103.1: Occurrence of the unary ! operator in various contexts (as a percentage of all occurrences of this operator and the percentage of all occurrences of the given context that contains this operator). Based on the visible form of the .c files.
Use of the unary + operator is very rare in developer-written source. If it appears immediately after the = operator in existing code, the possible early K&R interpretation might be applicable. The usage is now sufficiently rare that a discussion on whether to do nothing, replace every occurrence by the sequence +=, introduce a separating white-space character, parenthesize the value being assigned, or do something else is not considered worthwhile.
Example
1 /*
The two contexts in which the integer promotions would not be performed, unless the unary + operator is applied, are the right operand of a simple assignment (see 1303, simple assignment) and the operand of the sizeof operator (see 1119, sizeof result of).
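The sizeof case can be observed directly. In this minimal sketch (the function name size_after_unary_plus is hypothetical) the operand c has type char, while +c has the promoted type.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size of the type of +c, where c has type char. */
size_t size_after_unary_plus(void)
{
    char c = 'a';

    /* Without the unary +, no promotion occurs inside sizeof. */
    assert(sizeof(c) == 1);

    /* The unary + forces the integer promotions on its operand. */
    return sizeof(+c);
}
```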
1106 The result of the unary - operator is the negative of its (promoted) operand.
Commentary
The expression -x is not always equivalent to 0-x; for instance, if x has the value 0.0, the results will be -0.0 and 0.0, respectively.
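The difference between -x and 0-x can be made visible with the signbit macro from <math.h>. This is a sketch that assumes an implementation using IEEE 754 arithmetic in the default round-to-nearest mode, translated without options that relax the handling of signed zeros; the function name is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Returns 1 when, for x == 0.0, -x is negative zero and 0-x is
   positive zero (assumes IEEE 754, round-to-nearest). */
int negation_differs_from_subtraction(void)
{
    double x = 0.0;
    double neg = -x;      /* sign bit flipped: -0.0 */
    double sub = 0 - x;   /* 0.0 - 0.0: +0.0 when rounding to nearest */

    /* The two zeros compare equal; only the sign bit tells them apart. */
    assert(neg == sub);

    return signbit(neg) != 0 && signbit(sub) == 0;
}
```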
Common Implementations
Most processors include a single instruction that performs the negation operation. On many RISC processors this instruction is implemented by the assembler using an alias of the subtract instruction (for integer operands only). On such processors there is usually a register hardwired to contain the value zero (the IBM/Motorola
floating-point representations, the negation operator simply changes the value of the sign bit.
Coding Guidelines
If the operand has an unsigned type, the result will always be a positive or zero value. This issue is discussed elsewhere (see 1101.1, unary minus unsigned operand).
Example
The expression -1 is the unary - operator applied to the integer constant 1.
1107 The integer promotions are performed on the operand, and the result has the promoted type.
The integer promotions may convert an unsigned type to a signed type. However, this can only happen if the signed type can represent all of the values of the unsigned type (see 715, signed integer represent all unsigned integer values). This is reflected in the guideline recommendation for unsigned types (see 1101.1, unary minus unsigned operand).
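The interaction between unary minus, unsigned operands, and the integer promotions can be sketched as follows. The function name is hypothetical, and the return value assumes an implementation where int can represent every value of unsigned char (true of common hosted implementations).

```c
#include <assert.h>
#include <limits.h>

/* Negates an unsigned int and an unsigned char holding the value 1. */
int unsigned_negation_demo(void)
{
    unsigned int u = 1;
    unsigned char uc = 1;

    /* No promotion to a signed type: the result wraps modulo
       UINT_MAX + 1, so it is never negative. */
    assert(-u == UINT_MAX);

    /* uc is promoted to int first (assumed wider than unsigned char),
       so the negation is performed on a signed value. */
    return -uc;
}
```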
1108 The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set).
Commentary
The term bitwise not is sometimes used to denote this operator (it is sometimes also referred to by the character used to represent it, tilde). Because its use is much less frequent than logical negation (see 1111, logical negation result is), this term is rarely shortened.
Common Implementations
Most processors have an instruction that performs this operation. An alternative implementation is to exclusive-or the operand with an all-bits-one value (containing the same number of bits as the promoted type). The Unisys A Series [1423] uses signed magnitude representation and if an operand has an unsigned type, the sign bit in the object representation (which is treated as a padding bit) is not affected by the bitwise
a processor whose int size differs from the original processor on which the code executed can cause latent differences in behavior to appear. For instance, if sizeof(int)==sizeof(short) on the original processor, then any integer promotions on operands having type short would not increase the number of bits in the value representation and a program may have an implicit dependency on this behavior occurring. Moving to a processor where sizeof(int) > sizeof(short) may require modifications to explicitly enforce this dependency. The issues involved in guideline recommendations that only deliver a benefit when a program is ported to a processor whose integer widths are different from the original processor are discussed elsewhere.
1110 If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.
The issues surrounding the use of bitwise operations to perform equivalent arithmetic operations are discussed
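The arithmetic equivalence for unsigned operands, and the exclusive-or alternative mentioned under Common Implementations, can be checked with a small sketch (the function name complement is hypothetical).

```c
#include <assert.h>
#include <limits.h>

/* Returns ~e after checking two equivalent formulations. */
unsigned int complement(unsigned int e)
{
    unsigned int r = ~e;

    /* Maximum representable value minus the operand. */
    assert(r == UINT_MAX - e);

    /* Exclusive-or with an all-bits-one value of the same width. */
    assert(r == (e ^ UINT_MAX));

    return r;
}
```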
The term not (or logical not) is often used to denote this operator. The much less frequently used operator,
C++
5.3.1p8
its value is true if the converted operand is false and false otherwise
This difference is only visible to the developer in one case (see 1112, logical negation). In all other situations the behavior is the same.
The implementation of this operator often depends on the context in which it occurs. The machine code generated can be very different if the result value is used to decide the control flow (e.g., it is the final operation in the evaluation of a controlling expression; see 1744, if statement operand compare against 0) than if the result value is the operand of further operators. In the control flow case an actual value of 0 or 1 is not usually required. On many processors loading a value from storage into a register will set various bits in a conditional flags register (these flag bit settings usually specify some relationship between the value loaded and zero, e.g., equal to, less than, etc.). A processor’s conditional branch instructions use the current settings of combinations of these bits to decide whether to take the branch or not. When the result is used as an operand in further operations, a 0 or 1 value is needed; the generated machine code is often more complex. A common solution is the following pseudo machine code sequence (which leaves the result in REG_1):
While the result is specified in numeric terms, most occurrences of this operator are as the top-level operator in a controlling expression (see Usage below). These contexts are usually considered in boolean rather than
1112 The result has type int.
5.3.1p8 The type of the result is bool
The difference in result type will result in a difference of behavior if the result is the immediate operand of the sizeof operator. Such usage is rare.
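The one context where the result type is visible can be sketched in C as follows (the function name is hypothetical). Whatever the operand type, the result of ! has type int in C, so the size reported is that of int rather than that of the operand.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size of the type of !x for a long double operand. */
size_t size_of_logical_not(void)
{
    long double x = 1.0L;

    assert(!x == 0);

    /* The operand's type does not matter: the result type is int. */
    return sizeof(!x);
}
```

A C++ translator would report sizeof(bool) for the same expression, which may be a different value.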
• The ! character is not frequently encountered in formal education, and it may be easy to miss in a visual scan of source (no empirical studies using the ! character are known to your author).
• The equality operator, ==, is sometimes mistyped as an assignment operator, =.
is !. If the operand is thought of as being arithmetic, the conceptually closest operator is ==.
A number of studies have investigated the impact of negation in reasoning tasks. In natural languages negation comes in a variety of linguistic forms (e.g., “no boys go to class”, “few boys go to class”, “some boys go to class”) and while the results of these studies [707] of human performance using these forms may be of interest to some researchers, they don’t have an obvious mapping to C language usage (apart from the obvious one that negating a sentence involves an additional operator, the negation, which itself needs cognitive resources to process).
Usage
The visible form of the .c files contains 95,024 instances of the operator ! (see Table 912.2 for information on punctuation frequencies) and 27,008 instances of the token sequence == 0 (plus 309 instances of the form == 0x0). Integer constants appearing as the operand of a binary operator occur 28 times more often as the right operand than as the left operand.
1114 84) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)).
Commentary
This footnote sentence should really have been referenced from a different paragraph, where these equivalences are discussed (see 1092, &*).
C90
This equivalence was not supported in C90, as discussed in the response to DR #012, #076, and #106.
C++
At the moment the C++ Standard specifies no such equivalence, explicitly or implicitly. However, this situation may be changed by the response to DR #232.
1115 It is always true that if E is a function designator or an lvalue that is a valid operand of the unary & operator, *&E is a function designator or an lvalue equal to E.
Commentary
This statement can be deduced from the specifications of the two operators concerned.
1116 If *P is an lvalue and T is the name of an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Commentary
The result of the cast operator is not an lvalue (see 1131, footnote 85). However, if the operand is a pointer, the pointed-to object does not lose its lvalue-ness. This sentence simply points out the type of the result of the operations and its lvalue-ness; it does not give any additional semantics to the cast or dereference.
C++
The C++ Standard makes no such observation.
1117 Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.
Commentary
This list contains some examples of invalid values that may appear directly in the source; it is not exhaustive (another example is dereferencing a pointer-to function). The invalid values may also be the result of an operation that has undefined behavior. For instance, using pointer arithmetic to create an address that does not correspond to any physical memory location supported by a particular computing system. (In virtual memory systems this case would correspond to an unmapped address.)
On many implementations the offsetof macro expands to an expression that dereferences the null pointer.
values from being created in the first place
6.5.3.4 The sizeof operator
Constraints
1118 The sizeof operator shall not be applied to an expression that has function type or an incomplete type, to the
If the sizeof operator accepted a bit-field as an operand, it would have to return a value measured in bits for all its operands (see 1119, sizeof result of).
C++
The C++ Standard contains a requirement that does not exist in C.
A C source file that defines a type within a sizeof expression is likely to cause a C++ translator to issue a diagnostic. Defining a type within a sizeof expression is rarely seen in C source.
Table 1118.1: Occurrence of the sizeof operator having particular operand types (as a percentage of all occurrences of this operator). Based on the translated form of this book’s benchmark programs.
The operand referred to is the execution-time value of the operand. In the case of string literals, escape sequences will have been converted to a single or multibyte character (see 133, translation phase 5). In these cases the value returned by the sizeof operator does not correspond to the number of characters visible in the source code. Most of the uses of the result of this operator work at the byte, not the bit, level; for instance, the argument of a memory-allocation function, which operates in units of bytes. Having to divide the result by CHAR_BIT, for most uses, would not be worth the benefit of being able to accept bit-field members.
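The mismatch between visible source characters and the value returned for a string literal can be seen in a short sketch (the function name literal_size is hypothetical). The literal below contains six source characters between the quotes, but each escape sequence becomes a single execution character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the size in bytes of a literal containing two escapes. */
size_t literal_size(void)
{
    /* \t and \n each count as one character at execution time. */
    assert(strlen("\tab\n") == 4);

    /* sizeof also counts the terminating null character. */
    return sizeof("\tab\n");
}
```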
Other Languages
The SIZE attribute in Ada returns the number of bits allocated to hold the object, or type. The BIT_SIZE intrinsic in Fortran 90 returns the number of bits in its integer argument; the SIZE intrinsic returns the number of elements in an array.
Common Implementations
A few vendors have extended the sizeof operator. For instance, Diab Data [359] supports a second argument to the parenthesized form of the sizeof operator. The value of this argument changes the information returned (e.g., if the value of the second argument is 1 the alignment of the type is returned, if it is 2 a unique value denoting the actual type is returned).
Coding Guidelines
The size of an object, or type, is representation information and the guideline recommendation dealing with the use of representation information might be thought to be applicable (see 569.1, representation information using). However, in some contexts many uses of the sizeof operator are symbolic (see 822, symbolic name). The contexts in which the size of an operand is often used include the following:
• A call to a storage allocation function requires the number of bytes to allocate
• When copying the representation of an object, either to another object or to a binary file, the number
of bytes to be copied is required
• When an object is being overlaid over the same storage as another object (using a union or pointer to
object type), the sizes in the two types need to agree
• When calculating the range of values representable by the operand (based on the number of bits it
contains)
In some of the uses in these contexts the result of the sizeof operator is treated as a symbolic value, the size of its operand, with no interest in its numeric properties. While in others the result is manipulated as an arithmetic value; it is an intermediate value used in the calculation of the final value. However, a strong case can be made for claiming that certain kinds of arithmetic operations are essentially symbolic in nature:
• Multiplication of the result (e.g., to calculate the size of an array of objects)
• Division of the result (e.g., to calculate how many objects will fit in a given amount of storage)
• Subtracting from the result (e.g., to calculate the offset of the character that is third from the end of a string literal)
• Adding to the result (e.g., calculating the size of an array needed to hold several strings)
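The multiplication and division cases in the list above might look like the following sketch. The names BUF_BYTES and doubles_that_fit are hypothetical; the sizes involved are used only symbolically, as "the size of one element".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUF_BYTES 64   /* hypothetical storage budget, in bytes */

/* Returns how many doubles fit in BUF_BYTES, after allocating and
   releasing an array of that many elements (0 if allocation fails). */
size_t doubles_that_fit(void)
{
    /* Division: how many objects fit in a given amount of storage. */
    size_t n = BUF_BYTES / sizeof(double);

    /* Multiplication: the size of an array of n objects. */
    double *p = malloc(n * sizeof *p);
    if (p == NULL)
        return 0;

    assert(n * sizeof *p <= BUF_BYTES);
    free(p);
    return n;
}
```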
Dev 569.1
The sizeof operator may be used provided the only operators applied to its result (and the result of these operations) are divide and multiply.
Dev 569.1
The sizeof operator whose operand has an array type may be used provided the only operators applied to its result (and the result of these operations) are divide, multiply, addition, and subtraction.
For simplicity the deviation wording permits some unintended uses of representation information. For instance, the deviations permit both of the expressions sizeof(array_of_int)-5 and sizeof(array_of_char)-5. There is a difference between the two in that in the former case the developer is either making use of representation information or forgot to write sizeof(array_of_int)-5*sizeof(int) (these guideline recommendations are not intended to recommend against constructs that are faults). Character types are
bytes returned. For instance, sizeof(a_double * b_double) always returns the size of the type specified by the C semantics, not the size of the representation used by the implementation when multiplying two objects of type double.
A size determined from the type of the operand need not provide an accurate indication of the range of values representable in that operand type (it provides an upper bound on the range of values that can be stored in an object of that type). A type may contain padding bytes, which will be included in its size. In the case of floating-point types, it is also possible that an expression is evaluated to a greater precision than implied by its type. Using the sizeof operator for this purpose is covered by the guideline recommendation dealing
1121 The result is an integer.
Commentary
To be exact, the result has an integer type, size_t.
C90
In C90 the result was always an integer constant. The C99 contexts in which the result is not an integer constant all involve constructs that are new in C99.
C++
Like C90, the C++ Standard specifies that the result is a constant. The cases where the result is not a constant require the use of types that are not supported by C++.
1122 If the type of the operand is a variable length array type, the operand is evaluated;
Commentary
The number of elements in the variable length array is not known until its index expression is evaluated. This evaluation may cause side effects. The requirement specified in this C sentence is weakened by a later sentence in the standard. It is possible that the operand may only be partially evaluated (see 1584, sizeof VLA unspecified).
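A side effect in the size expression of a variable length array type can be observed with a minimal sketch. The function name is hypothetical, and the sketch assumes a translator that supports variable length arrays (required in C99, optional in later revisions). Because the value of the size expression affects the result here, the expression is evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Returns sizeof(int[*n]) and, as a side effect of evaluating the
   operand, increments the object pointed to by n. */
size_t vla_sizeof_side_effect(int *n)
{
    assert(*n > 0);

    /* The operand is a VLA type, so its size expression is evaluated. */
    return sizeof(int[(*n)++]);
}
```

Calling this function with n pointing at an object holding 3 yields the size of three int objects and leaves 4 in that object.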
Commentary
A full expression having a sizeof operator as its top-level operator, with such an operand, can occur anywhere that an integer constant can occur. The size is obtained from the type of the operand. This information is available during translation. (There is no need to generate any machine code to evaluate the operand, and this requirement prohibits such generation.) Although the operand is not evaluated, any operators that appear in it will still cause the integer promotions and usual arithmetic conversions to be performed (see 675, integer promotions; 706, usual arithmetic conversions).
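Both halves of this behavior, no evaluation of the operand but promotions still determining its type, can be sketched as follows (the function name is hypothetical).

```c
#include <assert.h>
#include <stddef.h>

/* Returns sizeof(c + c) for a char c; the object pointed to by i is
   left unchanged even though it appears in an increment expression. */
size_t sizeof_does_not_evaluate(int *i)
{
    char c = 0;

    /* The operand is not evaluated, so *i is not incremented. */
    size_t s = sizeof((*i)++);
    assert(s == sizeof(int));

    /* The integer promotions still apply: c + c has the promoted type. */
    return sizeof(c + c);
}
```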
Coding Guidelines
Some coding guideline documents recommend that the operand of the sizeof operator should not contain any side effects. In practice such usage is very rarely seen and no such guideline recommendation is given here.
1124 When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
Commentary
The number of bits in the representation of a character type is irrelevant. By definition the number of bytes in a character type is one (see 53, byte addressable unit).
Coding Guidelines
Developers sometimes associate a byte as always containing eight bits. On hosts where the character type is 16 bits, this can lead to the incorrect assumption that applying sizeof to a character type will return the value 2. These issues are discussed elsewhere (see 307, CHAR_BIT macro).
1125 When applied to an operand that has array type, the result is the total number of bytes in the array.85)
structure
unnamed padding1424
1127 The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t, defined in
The committee has deliberated and decided that more than one interpretation is reasonable.
There is no requirement on implementations to provide a definition of the type size_t that is capable of representing the number of bytes in any object that the implementation is capable of allocating storage for. It is the implementation’s responsibility to ensure that the type it uses for size_t internally is the same as the typedef definition of size_t in the supplied header, <stddef.h>. If these types differ, the implementation is not conforming.
A developer can define a typedef whose name is size_t (subject to the constraints covering declarations of identifiers). Such a declaration does not affect the type used by a translator as its result type for the sizeof operator.
C++
5.3.3p1
; the result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined
The C++ Standard does not explicitly specify any behavior when the operand of sizeof has a derived type. A C++ implementation need not document how the result of the sizeof operator applied to a derived type is calculated.
Coding Guidelines
Use of the sizeof operator can sometimes produce results that surprise developers. The root cause of the surprising behavior is usually that the developer forgot that the result of the sizeof has an unsigned type (which causes the type of the other operand, of a binary operator, to be converted to an unsigned type). Developers forgetting about the unsignedness of the result of a sizeof is not something that can be addressed by a guideline recommendation.
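A typical instance of the surprise is a comparison between a negative int and the result of sizeof. The sketch below uses hypothetical function names and assumes the common case where size_t has a conversion rank at least that of int, so the int operand is converted to size_t.

```c
#include <assert.h>
#include <stddef.h>

/* Compares i against sizeof(int); i is converted to size_t first. */
int less_than_size(int i)
{
    return i < sizeof(int);
}

/* Same comparison, with the size explicitly converted to int. */
int less_than_size_signed(int i)
{
    return i < (int)sizeof(int);
}

/* Checks the surprising and the intended outcome for i == -1. */
void demo_unsigned_surprise(void)
{
    assert(!less_than_size(-1));        /* -1 becomes a huge value */
    assert(less_than_size_signed(-1));  /* arithmetic expectation  */
}
```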
1128 EXAMPLE 1 A principal use of the sizeof operator is in communication with routines such as storage allocators and I/O systems. A storage-allocation function might accept a size (in bytes) of an object to allocate and return a pointer to void. For example:
extern void *alloc(size_t);
double *dp = alloc(sizeof *dp);
The implementation of the alloc function should ensure that its return value is aligned suitably for conversion to a pointer to double.
Commentary
Measurements of existing source (see Table 1080.1) show that this usage represents at most 14% of all uses of the sizeof operator.
1129 EXAMPLE 2 Another use of the sizeof operator is to compute the number of elements in an array:
sizeof array / sizeof array[0]
Commentary
The declaration of an object having an array type may not contain an explicit value for the size, but obtain it
from the number of elements in an associated initializer
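The case described in this commentary, an array whose size comes from its initializer, might be sketched as follows (the names primes and prime_count are hypothetical). The element count then tracks any change made to the initializer.

```c
#include <assert.h>
#include <stddef.h>

/* The array size is taken from the number of initializers. */
static const int primes[] = {2, 3, 5, 7, 11};

/* Returns the number of elements in primes. */
size_t prime_count(void)
{
    size_t n = sizeof primes / sizeof primes[0];

    assert(n * sizeof primes[0] == sizeof primes);
    return n;
}
```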
Other Languages
Some languages provide built-in support for obtaining the bounds or the number of elements in an array. For instance, Fortran has the intrinsic functions LBOUND and UBOUND; Ada specifies the attributes first and last to return the lower and upper bounds of an array, respectively.
1130 EXAMPLE 3 In this example, the size of a variable length array is computed and returned from a function:
#include <stddef.h>
size_t fsize3(int n) {
} int main() {
qualified
array of
1571
the code treat the parameter as having an array or pointer type. However, in the context of an operand to the sizeof operator, there is an important difference in behavior.
cast-expression syntax
cast-expression:
unary-expression
( type-name ) cast-expression
Commentary
A cast-expression is also a unary operator. Given that the evaluation of a sequence of unary operators always occurs in a right-to-left order, the lower precedence of the cast operator is not significant.
Measurements by Stiff, Chandra, Ball, Kunchithapadam, and Reps [1327] of 1.36 MLOC (SPEC95 version of gcc, binutils, production code from a Lucent Technologies product and a few other programs) showed a total of 23,947 casts involving 2,020 unique types. Of these 15,704 involved scalar types (not involving a structure, union, or function pointer) and 447 function pointer types. Of the remaining casts 7,796 (1,276 unique types) involved conversions between pointers to void/char and pointers to structure (in either direction) and 1,053 (209 unique types) conversions between pointers to structs.
Constraints
1134 Unless the type name specifies a void type, the type name shall specify qualified or unqualified scalar type and the operand shall have scalar type.
Commentary
Casting to the void type is a method of explicitly showing that the value of the operand is discarded. Casting a value having a structure or union type has no obvious meaning. (Would corresponding member names be assigned to each other? What would happen to those members that did not correspond to a member in the other type?)
C++
There is no such restriction in C++ (which permits the type name to be a class type). However, the C++ Standard contains a requirement that does not exist in C.
5.4p3
Types shall not be defined in casts.
A C source file that defines a type within a cast is likely to cause a C++ translator to issue a diagnostic (this
switch ((enum {E1, E2, E3})glob) /* does not affect the conformance status of the program */
Common Implementations
gcc supports the casting of scalar types to union types. The scalar type must have the same type as one of the members of the union type. The cast is treated as being equivalent to assigning to the member having that type. This extension removes the need to know the name of the union member.
Coding Guidelines
In this discussion a suffixed literal will be treated as an explicit cast of a literal value, while an unsuffixed literal is not treated as such. An explicit cast is usually interpreted as showing that the developer intended the conversion to take place. It is taken as a statement of intent. It is often assumed, by readers of the source, that an explicit cast specifies the final type of the operand. An explicit cast followed by an implicit one is suspicious; it suggests that either the original developer did not fully understand what was occurring or that subsequent changes have modified the intended behavior.