The New C Standard- P11


All objects defined outside the body of a function have static storage duration. The storage for such objects is initialized before program startup, so their initializers can only consist of constant expressions. This constraint only differs from an equivalent one for initializers by being framed in terms of “occurring outside the body of a function” rather than “an object that has static storage duration.”

outside of an object definition. Because there need be no associated type definition, a type name must be specified (for initializers the type is obtained from the type of the object being initialized).

Other Languages

A form of compound literal is supported in some languages (e.g., Ada, Algol 68, CHILL, and Extended Pascal). These languages do not always require a type name to be given. The type of the parenthesized list of expressions is deduced from the context in which it occurs.

Coding Guidelines

From the coding guideline point of view, the use of compound literals appears fraught with potential pitfalls, including the use of the term compound literal, which suggests a literal value, not an unnamed object. However, this construct is new in C99 and there is not yet sufficient experience in its use to know if any specific guideline recommendations might apply to it.

1058 It provides an unnamed object whose value is given by the initializer list.81)

Commentary

The difference between this kind of unnamed object and that created by a call to a memory allocation function (e.g., malloc) is that its definition includes a type and it has a storage duration other than allocated (i.e., either static or automatic).

Other Languages

Some languages treat their equivalent of compound literals as just that, a literal. For instance, like other literals, it is not possible to take their address.

Common Implementations

In those cases where a translator can deduce that storage need not be allocated for the unnamed object, the as-if rule can be used, and it need not allocate any storage. This situation is likely to occur for compound literals because, unless their address is taken (explicitly using the address-of operator, or in the case of an array type implicit conversion to pointer type), they are only assigned a value at one location in the source code. At their point of definition, and use, a translator can generate machine code that operates on their constituent values directly rather than copying them to an unnamed object and operating on that.

Guideline recommendations applicable to the unnamed object are the same as those that apply to objects having the same storage duration. For instance, the guideline recommendation dealing with assigning the address of an object to a pointer applies.

Example

The following example not only requires that storage be allocated for the unnamed object created by the compound literal, but that the value it contains be reset on every iteration of the loop.
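A minimal sketch of the kind of code this example describes. The function name sum_after_reset and the values used are illustrative assumptions, not the original listing.

```c
/* Hypothetical illustration: the address of the compound literal is taken,
 * so storage must exist for it, and its initializer is applied again each
 * time the expression is reached. */
int sum_after_reset(int n)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        int *p = &(int){10};   /* unnamed object set back to 10 on every iteration */

        total += *p;           /* always reads 10 */
        *p += i;               /* this change is lost when the literal is reinitialized */
    }
    return total;
}
```

Calling sum_after_reset(3) adds 10 three times, because the modification made through p never survives to the next iteration.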


Developers are unlikely to write expressions, such as (int){1}, when (int)1 had been intended (on standard US PC-compatible keyboards the pair of characters ( { and the pair ) } appear on four different keys). Such usage may occur through the use of parameterized macros. However, at the time of this writing there is insufficient experience with use of this new language construct to know whether any guideline recommendation is worthwhile.

Example

The following all assign a value to loc. The first two assignments involve an lvalue to value conversion. In the second two assignments the operand being assigned is already a value.
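A sketch of four such assignments, since the original listing is not reproduced here. Only loc comes from the text; glob and assign_all are hypothetical names.

```c
int glob = 3;   /* hypothetical file scope object */

int assign_all(void)
{
    int loc;
    int sum = 0;

    loc = glob;      sum += loc;  /* glob is an lvalue, converted to its value */
    loc = (int){1};  sum += loc;  /* a compound literal is an lvalue, also converted */
    loc = 2;         sum += loc;  /* an integer constant is already a value */
    loc = (int)2;    sum += loc;  /* the result of a cast is a value, not an lvalue */
    return sum;
}
```

The running sum is only there so that each assignment has an observable effect.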

Example

1064 The value of the compound literal is that of an unnamed object initialized by the initializer list.

Commentary

The distinction between a compound literal acting as if the initializer list was its value, and an unnamed object (initialized with values from the initializer list) being its value, is only apparent when the address-of operator is applied to it. The creation of an unnamed object does not mean that locally allocated storage is a factor in this distinction. Implementations of languages where compound literals are defined to be literals sometimes use locally allocated temporary storage to hold their values. C implementations may find they can optimize away allocation of any actual unnamed storage.

If a compound literal occurs in a context where its value is required (e.g., assignment) there are obvious opportunities for implementations to use the values of the initializer list directly. C99 is still too new to know whether most implementations will make use of this optimization.


The distinction between the value of a compound literal being an unnamed object and being the values of the initializer list could be viewed as an unnecessary complication that is not worth educating a developer about. Until more experience has been gained with the kinds of mistakes developers make with compound literals, it is not possible to recommend any guidelines.


Other Languages

The storage duration specified by other languages, which support some form of compound literal, varies. Some allow the developer to choose (e.g., Algol 68), others require them to be dynamically allocated (e.g., Ada), while in others (e.g., Fortran and Pascal) the issue is irrelevant because it is not possible to obtain their address.

1066 otherwise, it has automatic storage duration associated with the enclosing block.

A parallel can be drawn between an object definition that includes an initializer and a compound literal (that is the definition of an unnamed object). The lifetime of the associated objects starts when the block that contains their definition is entered. However, the objects are not assigned their initial value, if any, until the declaration is encountered during program execution.

The unnamed object associated with a compound literal is initialized each time the statement that contains it is encountered during program execution. Previous invocations, which may have modified the value of the … literal does not cause any new storage to be allocated. Recursive calls to a function containing a compound literal will cause different storage to be allocated, for the unnamed object, for each nested call.
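The point about recursive calls can be sketched as follows. The function walk and its parameters are hypothetical, and the code only relies on the fact that two objects whose lifetimes overlap must have different addresses.

```c
/* Hypothetical check: each active invocation owns its own unnamed object.
 * Pass a null pointer as outer for the outermost call. */
int walk(int depth, int *outer)
{
    int *cell = &(int){depth};   /* unnamed object belonging to this call */
    int distinct = (outer == 0) || (cell != outer);

    if (depth > 0)
        distinct = distinct && walk(depth - 1, cell);

    /* Nested calls used their own storage, so *cell still holds depth. */
    return distinct && (*cell == depth);
}
```

A call such as walk(3, 0) returns 1 when every nested call saw storage different from that of its caller.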

To what extent is it worth trying to optimize compound literals made up of a list of constant expressions;

for instance, by detecting those that are never modified, or by placing them in a static region of storage

that can be copied from or pointed at? The answer to these and many other optimization issues relating to

compound literals will have to wait until translator vendors get a feel for how their customers use this new, to

C, construct.

Parallels can be drawn between the unnamed object associated with a compound literal and the temporaries created in C++. Experience has shown that C++ developers sometimes assume that the lifetime of a temporary is greater than it is required to be by that language's standard. Based on this experience it is to be expected that developers using C might make similar mistakes with the lifetime of the unnamed object associated with a compound literal. Only time will tell whether these mistakes will be sufficiently common, or serious, that the benefits of being able to apply the address-of operator to a compound literal (the operator that needs to be used to extend the range of statements over which an unnamed object can be accessed) are outweighed by the probable cost of faults.


The guideline recommendation dealing with assigning the address of an object to a pointer object, whose lifetime is greater than that of the addressed object, is applicable here.

Commentary

They are the same except

• initializer lists don’t create objects, they are simply a list of values with which to initialize an object; and

• the type is deduced from the object being initialized, not a type name.


A strictly conforming program can deduce if an implementation uses the same object for two string literals, or compound literals, by performing an equality comparison on their addresses (an infinite number of comparisons would be needed to deduce whether an implementation always used distinct objects). This permission for string literals is also specified elsewhere.

The only way a const-qualified object can be modified is by casting a pointer to it to a non-const-qualified pointer. Such usage results in undefined behavior. The undefined behavior, if the pointer was used to modify such an unnamed object that was not distinct, could also modify the values of other compound literal objects.

Other Languages

Most languages do not consider any kind of literal to be modifiable, so whether they share the same storage locations is not an issue.

The extent to which developers will use compound literals having a const-qualified type, for which storage is allocated and whose values form a sharable subset with another compound literal, remains to be seen. Without such usage it is unlikely that implementors of optimizers will specifically look for savings in this area, although they may come about as a consequence of optimizations not specifically aimed at compound literals.

Example

In the following there is an opportunity to overlay the two unnamed objects containing zero values.
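A sketch of code offering such an opportunity, given that the original listing is not reproduced here. The function name zeros_may_share and the array sizes are assumptions.

```c
/* Hypothetical example: both unnamed objects are const-qualified and the
 * shorter one holds a leading subset of the other's values, so a translator
 * is permitted, but not required, to overlay them. */
int zeros_may_share(void)
{
    const int *two  = (const int []){0, 0};
    const int *four = (const int []){0, 0, 0, 0};

    /* Either outcome is conforming: 1 only if the storage is shared. */
    return two == four;
}
```

A program must not depend on either result, which is why the comparison is only reported and never acted upon.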

1069 EXAMPLE 1 The file scope definition

int *p = (int []){2, 4};

initializes p to point to the first element of an array of two ints, the first having the value two and the second, four. The expressions in this compound literal are required to be constant. The unnamed object has static storage duration.

Commentary

This usage, rather than the more obvious int p[] = {2, 4};, can arise because the initialization value is derived through macro replacement. The same macro replacement is used in noninitialization contexts.

1070 EXAMPLE 2 In contrast, in

void f(void)
{
int *p;
/*...*/
p = (int [2]){*p};
/*...*/
}

p is assigned the address of the first element of an array of two ints, the first having the value previously pointed to by p and the second, zero. The expressions in this compound literal need not be constant. The unnamed object has automatic storage duration.

Commentary

The assignment of values to the unnamed object occurs before the value of the right operand is assigned to p.

drawline((struct point){.x=1, .y=1},
(struct point){.x=3, .y=4});

Or, if drawline instead expected pointers to struct point:

drawline(&(struct point){.x=1, .y=1},
&(struct point){.x=3, .y=4});

Commentary

This usage removes the need to create a temporary in the calling function. The arguments are passed by value, like any other structure argument.
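Both calling forms can be exercised with a sketch such as the following. The two drawline variants are hypothetical stand-ins that return the squared length of the line so that the result can be checked. The member names of struct point follow the example.

```c
struct point { int x, y; };

/* Hypothetical stand-in taking its arguments by value. */
int drawline_by_value(struct point from, struct point to)
{
    int dx = to.x - from.x;
    int dy = to.y - from.y;

    return dx * dx + dy * dy;
}

/* Hypothetical stand-in taking pointers to struct point. */
int drawline_by_pointer(const struct point *from, const struct point *to)
{
    return drawline_by_value(*from, *to);
}

int draw_both(void)
{
    int a = drawline_by_value((struct point){.x = 1, .y = 1},
                              (struct point){.x = 3, .y = 4});
    int b = drawline_by_pointer(&(struct point){.x = 1, .y = 1},
                                &(struct point){.x = 3, .y = 4});

    return (a == b) ? a : -1;
}
```

In neither call does the caller have to declare a named struct point object.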

1072 EXAMPLE 4 A read-only compound literal can be specified through constructions like:

(const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}


The need to discuss an implementation’s ability to share storage for string literals occurs because it is possible to detect such sharing in a conforming program (e.g., by comparing two pointers assigned the addresses of two distinct, in the visible source code, string literals). The C Committee chose to permit this implementation behavior. (There were existing implementations, when the C90 Standard was being drafted, that shared storage.)

1075 EXAMPLE 5 The following three expressions have different meanings:

"/tmp/fileXXXXXX"

(char []){"/tmp/fileXXXXXX"}

(const char []){"/tmp/fileXXXXXX"}

The first always has static storage duration and has type array of char, but need not be modifiable; the last two have automatic storage duration when they occur within the body of a function, and the first of these two is modifiable.

Commentary

In all three cases, a pointer to the start of storage is returned and the first 16 bytes of the storage allocated will have the same set of values. If all three expressions occurred in the same source file, the first and third could share the same storage even though their storage durations were different. Developers who see a potential storage saving in using a compound literal instead of a string literal (the storage for one only need be allocated during the lifetime of its enclosing block) also need to consider potential differences in the number of machine code instructions that will be generated. Overall, there may be no savings.

1076 EXAMPLE 6 Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. For example,

(const char []){"abc"} == "abc"

might yield 1 if the literals’ storage is shared.

Commentary

In this example pointers to the first element of the compound literal and a string literal are being compared for equality. Permission to share the storage allocated for a compound literal only applies to those having a const-qualified type (there is no such restriction on string literals).

Comparing strings using an equality operator, rather than a call to the strcmp library function, is a common beginner mistake. Training is the obvious solution.
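The difference between the two kinds of comparison can be sketched as follows. The function names same_text and same_object are hypothetical.

```c
#include <string.h>

/* Compares the characters of the two strings. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Compares the two addresses only, which is what == does on pointers. */
int same_object(const char *a, const char *b)
{
    return a == b;
}
```

Given char buf[] = "abc";, same_text(buf, "abc") is nonzero while same_object(buf, "abc") is zero, because buf is an array object of its own and cannot share storage with the string literal.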

Usage

In the visible source of the .c files 0.1% of string literals appeared as the operand of the equality operator (representing 0.3% of the occurrences of this operator).

1077 EXAMPLE 7 Since compound literals are unnamed, a single compound literal cannot specify a circularly linked object. For example, there is no way to write a self-referential compound literal that could be used as the function argument in place of the named object endless_zeros below:

struct int_list { int car; struct int_list *cdr; };

struct int_list endless_zeros = {0, &endless_zeros};

eval(endless_zeros);


A modification using pointer types, and an additional assignment, creates a circularly linked list that uses the storage of the unnamed object:

The following statement would not have achieved the same result:

because the second compound literal would occupy a distinct object, different from the first. The value of endless_zeros in the second compound literal would be pointing at the unnamed object allocated for the first compound literal.
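The two listings referred to above are not reproduced here. A sketch of what they describe, reusing struct int_list and endless_zeros from the example and adding a hypothetical checking function circular_check, might be:

```c
struct int_list { int car; struct int_list *cdr; };

int circular_check(void)
{
    /* Pointer form plus an additional assignment: a single unnamed
     * object whose cdr member points back at itself. */
    struct int_list *endless_zeros = &(struct int_list){0, 0};
    endless_zeros->cdr = endless_zeros;

    /* Not equivalent: this second compound literal is a distinct object
     * whose cdr member points at the first unnamed object. */
    struct int_list *second = &(struct int_list){0, endless_zeros};

    return endless_zeros->cdr == endless_zeros
        && second != endless_zeros
        && second->cdr == endless_zeros;
}
```

The function returns 1 because the first list is circular and the second literal occupies different storage.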

Other Languages

Algol 68 supports the creation of circularly linked objects (see the Other Languages subsection in the following C sentence).

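1078 EXAMPLE 8 Each compound literal creates only a single object in a given scope:

struct s { int i; };

int f (void)
{
struct s *p = 0, *q;
int j = 0;

again:
q = p, p = &((struct s){ j++ });
if (j < 2) goto again;

return p == q && q->i == 1;
}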

The function f() always returns the value 1.

Note that if an iteration statement were used instead of an explicit goto and a labeled statement, the lifetime of the unnamed object would be the body of the loop only, and on entry next time around p would have an indeterminate value, which would result in undefined behavior.

1079 Forward references: type names (6.7.6), initialization (6.7.8).

6.5.3 Unary operators

1080

unary-expression:

postfix-expression

++ unary-expression

-- unary-expression

unary-operator cast-expression

sizeof unary-expression

sizeof ( type-name )

unary-operator: one of

& * + - ~ !

Commentary

Note that the operand of unary-operator is a cast-expression, not a unary-expression. A unary operator usually refers to an operator that takes a single argument. Technically all of the operators listed here, plus the postfix increment and decrement operators, could be considered as being unary operators.

Rationale

Unary plus was adopted by the C89 Committee from several implementations, for symmetry with unary minus.

Other Languages

Some languages (e.g., Ada and Pascal) specify the unary operators to have lower precedence than the multiplicative operators; for instance, -x/y is equivalent to -(x/y) in Ada, but (-x)/y in C. Most languages call all operators that take a single operand unary operators.

Languages that support the unary + operator include Ada, Fortran, and Pascal. Some languages use the keyword NOT rather than !. In the case of Cobol this keyword can also appear to the left of an operator, indicating negation of the operator (i.e., NOT < meaning not less than).

Coding guidelines need to be careful in their use of the term unary operator. Its meaning, as developers understand it, may be different from its actual definition in C. The operators in a unary-expression occur to the left of the operand. The only situation where a developer’s incorrect assumption about precedence relationships might lead to a difference between predicted and actual behavior is when a postfix operator occurs immediately to the right of the unary-expression.

Dev943.1

Except when sizeof ( type-name ) is immediately followed visually by a token having the lexical form of an additive operator, if a unary-expression is not immediately followed by a postfix operator it need not be parenthesized.
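The exception for sizeof ( type-name ) can be illustrated with a sketch in which the operand arrives through macro replacement. The macro and function names are hypothetical.

```c
#include <stddef.h>

#define MINUS_ONE_BARE   (int)-1     /* macro body not parenthesized */
#define MINUS_ONE_PAREN  ((int)-1)   /* macro body parenthesized */

/* Expands to sizeof (int)-1, which is parsed as (sizeof (int)) - 1. */
size_t bare_result(void)
{
    return sizeof MINUS_ONE_BARE;
}

/* Expands to sizeof ((int)-1), the size of an expression having type int. */
size_t paren_result(void)
{
    return sizeof MINUS_ONE_PAREN;
}
```

The two functions differ by one because (int)-1 is a cast-expression, not a unary-expression, so it cannot be the operand of sizeof without enclosing parentheses.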

Although the expression sizeof (int)-1 may not occur in the visible source code, it could easily occur as the result of macro replacement of the operand of the sizeof operator. This is one of the reasons behind the guideline recommendation specifying the parenthesizing of macro bodies (without parentheses the expression


Figure 1080.1: Number of integer-constants having a given value appearing as the operand of the unary minus and unary ~ operators. Based on the visible form of the .c files.


Table 1080.2: Occurrence of the unary-operators, prefix ++, and prefix -- having particular operand types (as a percentage of all occurrences of the particular operator; an _ prefix indicates a literal operand). Based on the translated form of this book’s benchmark programs.

The use of an operand of type bool with the prefix ++ operator is deprecated (5.3.2p1); there is no corresponding entry in annex D, but the proposed response to C++ DR #145 inserted one. In the case of the decrement operator:

5.3.2p1

The operand shall not be of type bool.

A C source file containing an instance of the prefix -- operator applied to an operand having type _Bool is likely to result in a C++ translator issuing a diagnostic.

Enumerated types are usually thought about in symbolic rather than arithmetic terms. The increment and decrement operators can also be given a symbolic interpretation. They are sometimes thought about in terms of moving on to the next symbolic name in a list. This move to next operation relies on the enumeration constants being represented by successive numeric values. While this usage is making use of representation information, there is often a need to step through a series of symbolic names (and C provides no other built-in mechanism), for instance, iterating over the named constants defined by an enumerated type.

real operands

Dev569.1

The operand of a prefix increment or decrement operator may have an enumerated type, provided the enumeration constants defined by that type have successive numeric values.
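A sketch of the usage this deviation permits. The enumerated type colour and the terminating constant COLOUR_COUNT are hypothetical.

```c
/* The constants take the successive values 0, 1, 2 and 3. */
enum colour { RED, GREEN, BLUE, COLOUR_COUNT };

int count_colours(void)
{
    int visited = 0;

    /* ++c moves on to the next named constant because the values are successive. */
    for (enum colour c = RED; c < COLOUR_COUNT; ++c)
        visited++;

    return visited;
}
```

Had one of the constants been given an explicit, nonsuccessive value, the loop would visit values that correspond to no named constant.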

Semantics


1082 The value of the operand of the prefix ++ operator is incremented.

From the coding guidelines perspective, uses of these operators can be grouped into three categories:

1 The only operator in an expression statement. In this context the result returned by the operation is ignored. The statement simply increments/decrements its operand. Use of the prefix, rather than the postfix, form does not follow the pattern seen at the start of most visible source code statement lines: an identifier followed by an operator (see Figure 940.2). A reader’s scanning of the source looking for objects that are modified will be disrupted by the initial operator. For this reason, use of the postfix form is recommended.

2 … so that a prefix operator does not occur in the same expression as other operators. The evaluation can be moved back before the containing expression (see the postfix operators for a fuller discussion of this point) … in serial rather than nested form).

3 The third point is the same as for the postfix operators.

1083 The result is the new value of the operand after incrementation.

Other Languages

Pascal contains the succ operator. This returns the successor value (i.e., it adds one to its operand), but it does not modify the value of an object appearing as its operand.

1084 The expression ++E is equivalent to (E+=1).

Commentary

The expression ++E need not be equivalent to E=E+1 (e.g., the expression E may contain a side effect).
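A sketch of an operand containing a side effect. The names next_cell, cell, and calls are hypothetical. The side effect is a function call that counts how often it has been invoked.

```c
static int cell;    /* the object being incremented */
static int calls;   /* number of times the operand expression was evaluated */

static int *next_cell(void)
{
    calls++;
    return &cell;
}

int calls_for_prefix(void)
{
    calls = 0;
    cell = 0;
    ++*next_cell();                    /* operand evaluated once */
    return calls;
}

int calls_for_expanded(void)
{
    calls = 0;
    cell = 0;
    *next_cell() = *next_cell() + 1;   /* operand evaluated twice */
    return calls;
}
```

Both forms leave cell holding 1, but the number of calls made differs, which is the sense in which the two forms are not equivalent.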

C++

C++ lists an exception (5.3.2p1) for the case when E has type bool. This is needed because C++ does not define its boolean type in the same way as C. The behavior of this operator on operands is defined as a special case in C++. The final result is the same as in C.

1085 See the discussions of additive operators and compound assignment for information on constraints, types, side effects, and conversions and the effects of operations on pointers.

[Note: see the discussions of addition (5.7) and assignment operators (5.17) for information on conversions. ]

There is no mention that the conditions described in these clauses also apply to this operator.

1086 The prefix -- operator is analogous to the prefix ++ operator, except that the value of the operand is decremented.

where the effects are not analogous

C++

The prefix -- operator is not analogous to the prefix ++ operator in that its operand may not have type bool.

Other Languages

Pascal contains the pred reserved identifier. This returns the predecessor value, but does not modify the value of its operand.

The guideline recommendation for the prefix ++ operator has been worded to apply to either operator.

1087 Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).

6.5.3.2 Address and indirection operators

Constraints

1088 The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.


that the object so declared will be heavily used. [Note: the hint can be ignored and in most implementations it will be ignored if the address of the object is taken. - end note]

Source developed using a C++ translator may contain occurrences of the unary & operator applied to an operand declared with the register storage-class specifier, which will cause a constraint violation if processed by a C translator.

In itself, use of the address-of operator is relatively harmless. The problems occur subsequently when the value returned is used to access storage. The following are three, coding guideline related, consequences of being able to take the address of an object:

• It provides another mechanism for accessing the individual bytes of an object representation (a pointer to an object can be cast to a pointer to character type, enabling the individual bytes of an object representation to be accessed).

• It is an alias for the object having that address.

• It provides a mechanism for accessing the storage allocated to an object after the lifetime of that object has terminated.

Assigning the address of an object potentially increases the scope over which that object can be accessed. When is it necessary to increase the scope of an object? What are the costs/benefits of referring to an object using its address rather than its name? (If a larger scope is needed, could an object's definition be moved to a scope where it is visible to all source code statements that need to refer to it?)

The parameter-passing mechanism in C is pass by value. What is often known as pass by reference is achieved, in C, by explicitly passing the address of an object. Different calls to a function having pass-by-reference arguments can involve different objects in different calls. Passing arguments, by reference, to functions is not a necessity; it is possible to pass information into and out of functions using file scope objects.

Assigning the address of an object creates an alias for that object. It then becomes possible to access the same object in more than one way. The use of aliases creates technical problems for translators (the behavior implied by the use of the restrict keyword was introduced into C99 to help get around this problem) and can require developers to use additional cognitive resources (they need to keep track of aliased objects).

A classification often implicitly made by developers is to categorize objects based on how they are accessed, the two categories being those accessed by the name they were declared with and those accessed via pointers. A consequence of using this classification is that developers overlook the possibility, within a sequence of statements, of a particular object being modified via both methods. When readers are aware of an object having two modes of reference (a name and a pointer dereference) is additional cognitive effort needed to comprehend the source? Your author knows of no research on this subject. These coding guidelines discuss the aliasing issue purely from the oversight point of view (faults being introduced because of lack of information), because there is no known experimental evidence for any cognitive factors.

One way of reducing aliasing issues at the point of object access is to reduce the number of objects whose addresses are taken. Is it possible to specify a set of objects whose addresses should not be taken and what are the costs of having no alternatives for these cases? Is the cost worth the benefit? Restricting the operands of the address operator to be objects having block scope would limit the scope over which aliasing could occur. However, there are situations where the addresses of objects at file scope need to be used, including:

• An argument to a function could be an object with block scope, or file scope; for instance, the qsort function might be called.

• In resource-constrained environments it may be decided not to use dynamic storage allocation. For instance, all of the required storage may be defined at file scope and pointers to objects within this storage used by the program.

• The return from a function call is sometimes a pointer to an object, holding information. It may simplify storage management if this is a pointer to an object at file scope.

The following guideline recommendation ensures that the storage allocated to an object is not accessed once the object’s lifetime has terminated.

Cg1088.1

The address of an object shall not be assigned to another object whose scope is greater than that of the object assigned.

Dev1088.1

An object defined in block scope, having static storage duration, may have its address assigned to any other object.
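A sketch of the guideline and its deviation. All of the names used (saved, remember_static, remember_automatic, bump) are hypothetical, and the violating function is excluded from compilation.

```c
static int *saved;   /* file scope pointer, wider in scope than any block scope object */

/* Permitted by the deviation: counter has block scope but static storage
 * duration, so it still exists after the function returns. */
int *remember_static(void)
{
    static int counter;

    saved = &counter;
    return saved;
}

#if 0
/* Violates the guideline: local ceases to exist when the function returns,
 * leaving saved pointing at storage whose lifetime has ended. */
void remember_automatic(void)
{
    int local = 0;

    saved = &local;
}
#endif

int bump(void)
{
    return ++*remember_static();   /* safe: the pointed-to object is still alive */
}
```

Successive calls to bump return 1, 2, and so on, showing that the object referred to through saved persists between calls.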

A function designator can appear as the operand of the address-of operator. However, taking the address of a function is redundant. This issue is discussed elsewhere. Likewise for objects having an array type.

In the following it is not possible to take the address of a or any of its elements.

In fact this object is virtually useless (the identifier a can appear as the operand to the sizeof operator). If allocated memory is not permitted (we know the memory requirements of the following on program startup):

Depending on the context in which it occurs, there may be restrictions on the pointed-to type (because of the type of the result).

C++

type, or a pointer to a function type

C++ does not permit the unary * operator to be applied to an operand having a pointer to void type.

For operands with static storage duration, the value of the address operator may be a constant (objects having an array type also need to be indexed with a constant expression). There is no requirement that the address of an object be the same between different executions of the same program image (for objects with static storage duration) or different executions of the same function (for objects with automatic storage duration).

All external function references are resolved during translation phase 8. Any identifier denoting a function definition will have been resolved.

C90

This sentence is new in C99 and summarizes what the unary & operator does.

C++

Like C90, the C++ Standard specifies a pointer to its operand (5.3.1p1). But later on (5.3.1p2) goes on to say:

“In particular, the address of an object of type “cv T” is “pointer to cv T,” with the same cv-qualifiers.”

Other Languages

Many languages do not contain an address-of operator. Fortran 95 has an address assignment operator, =>. The left operand is assigned the address of the right operand.

Early versions of K&R C treated p=&x as being equivalent to p&=x.[734]

In the case of constant addresses the value used in the program image is often calculated at link-time. For objects with automatic storage duration, their address is usually calculated by adding a known, at translation time, value (the offset of an object within its local storage area) to the value of the frame pointer for that function invocation. Addresses of elements, or members, of objects can be calculated using the base address of the object plus the offset of the corresponding subobject.

Having an object appear as the operand of the address-of operator causes many implementations to play safe and not attempt to perform some optimizations on that object. For instance, without sophisticated pointer analysis, it is not possible to know which object a pointer dereference will access. (Implementations often assume all objects that have had their address taken are possible candidates, others might use information on the pointed-to type to attempt to reduce the set of possible accessed objects.) This often results in no attempt being made to keep the values of such objects in registers.

Implementations’ representation of addresses is discussed elsewhere.

1092 If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.

Commentary

The only effect of the operator pair &* is to remove any lvalueness from the underlying operand. The combination *& returns an lvalue if its operand is an lvalue. This specification is consistent with the behavior of the last operator applied controlling lvalue-ness. This case was added in C99 to cover a number of existing coding idioms; for instance:

#include <stddef.h>

This behavior is not specified in C++. Given that either operator could be overloaded by the developer to have a different meaning, such a specification would be out of place.

At the time of this writing a response to C++ DR #232 is being drafted (a note from the Oct 2003 WG21 meeting says: “We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.”)

Some C90 implementations did not optimize the operator pair &* into a no-op. In these implementations the behavior of the unary * operator was not altered by the subsequent address-of operator. C99 implementations are required to optimize away the operator pair &*.

1093 Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.

8 * For instance, we might want to loop over an object, using

This requirement was not explicitly specified in the C90 Standard. It was the subject of DR #076, which was closed by adding this wording to the C99 Standard.

C++

This behavior is not specified in C++. Given that either operator could be overloaded by the developer to have a different meaning, such a specification would be out of place. The response to C++ DR #232 may specify the behavior for this case.

This requirement describes how all known C90 implementations behave.

The expression &a[index], in the visible source code, could imply

• a lack of knowledge of C semantics (why wasn’t a+index written?),

• that the developer is trying to make the intent explicit, and

• that the developer is adhering to a coding standard that recommends against the use of pointer arithmetic; the authors of such standards often view (a+index) as pointer arithmetic, but a[index] as an array index (the equivalence between these two forms being lost on them; see sentence 989, array subscript identical to).

1094 Otherwise, the result is a pointer to the object or function designated by its operand.

Commentary

There is no difference between the use of objects having a pointer type and using the address-of operator. For instance, the result of the address-of operator could be assigned to an object having the appropriate pointer type, and that object used interchangeably with the value assigned to it.

Other Languages

Some languages (e.g., Pascal and Ada) use the postfix operator ^. Other languages (Algol 68 and Fortran 95) implicitly perform the indirection operation. In this case, an occurrence of an operand having a pointer type is dereferenced to return the value of the pointed-to object.

Some coding guideline documents place a maximum limit on the number of simultaneous indirection operators that can be successively applied. The rationale being that deeply nested indirections can be difficult to comprehend. Is there any substance to this claim?

Expressions, such as ***p, are similar to nested function calls in that they have to be comprehended in a right-to-left order. The issue of nested constructions in natural language is discussed in that earlier C sentence (see sentence 1000).

At the time of this writing there is insufficient experimental evidence to enable a meaningful cost/benefit analysis to be performed and these coding guidelines say nothing more about this issue.

If sequences of unary * operators are needed in an expression, it is because an algorithm’s data structures make the usage necessary. In practice, long sequences of indirections using the unary * operator are rare. Like the function call case, it may be possible to provide a visual form that provides a higher-level interpretation and hides the implementation’s details of the successive indirections.

An explicit unary * operator is not the only way of specifying an indirection. Both the array subscript, [], and member selection, ->, binary operators imply an indirection (see sentence 1031, member selection). Developers rarely use the form (*s).m ((&s)->m), the form s->m (s.m) being much more obvious and natural. While the expression s1->m1->m2->m3 is technically equivalent to (*(*(*s1).m1).m2).m3, it is comprehended in a left-to-right order.

The operand could be an object, with some pointer to function type, or it could be an identifier denoting a function that has been implicitly converted to a pointer to function type. This result is equivalent to the original function designator. Depending on the context in which it occurs this function designator may be

For most implementations the result is an address of a storage location. Whether there is a function definition (translated machine code) at that address is not usually relevant until an attempt is made to call the designated function (using the result).

Because of the implicit conversions a translator is required to perform, the unary * operator is not required to cause the designated function to be called. There are a number of situations that can cause such usage to appear in source code: the token sequence may be in automatically generated source code, or the sequence may occur in developer-written source via arguments passed to macros, or developers may apply it to objects having a pointer to function type because they are unaware of the implicit conversions that need to be performed.

Example

The indirection operator produces a result that allows the pointed-to object to be treated like an anonymous object. The result can appear in the same places that an identifier (defined to be an object of the same type) can appear. The resulting lvalue might not be a modifiable lvalue (see sentence 724, modifiable lvalue). There may already be an identifier that refers to the same object. If two or more different access paths to an object exist, it is said to be aliased (see sentence 971, object aliased).

Some processors (usually CISC) have instructions that treat their operand as an indirect reference. For instance, an indirect load instruction obtains its value from the storage location pointed to by the storage location that is the operand of the instruction.

1098 If the operand has type “pointer to type”, the result has type “type”. [unary * result type]

Commentary

The indirection operator removes one level of pointer from the operand’s type. The operand is required to have pointer type (see sentence 1089, unary * operand has pointer type). In many contexts the result type of a pointer to function type will be implicitly converted back to a pointer type (see sentence 732, function designator converted to type).

1099 If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.84)

Commentary

The standard does not provide an all-encompassing definition of what an invalid value is. The footnote gives some examples (see sentence 1114, footnote 84). An invalid value has to be created before it can be assigned and this may involve a conversion operation. Those pointer conversions for which the standard defines the behavior do not create (see sentence 743, pointer to void)

what is commonly known as an out of bounds array access is specified here

do perform a few checks.

Forward references: storage-class specifiers (6.7.1), structure and union specifiers (6.7.2.1).

6.5.3.3 Unary arithmetic operators

Constraints

1101 The operand of the unary + or - operator shall have arithmetic type;

There are algorithms (e.g., in graphics applications) that require the bits in an integer value to be complemented, and processors invariably contain an instruction for performing this operation. Complementing the bits in a floating-point value is a very rarely required operation and processors do not contain such an instruction. This constraint reflects this common usage.

Other Languages

While many languages do not contain an equivalent of the ~ operator, their implementations sometimes include it as an extension.

Some coding guideline documents only recommend against the use of operands having a signed type. The argument is that the representation of unsigned types is defined by the standard, while signed types might have one of several representations. In practice, signed types almost universally have the same representation, two’s complement (see sentence 612, two’s complement). However, the possibility of variability of integer representation across processors is not the only important issue here. The ~ operator treats its operand as a sequence of bits, not a numeric value. As such it may be making use of representation information and the guideline recommendation dealing with this issue would be applicable (see guideline 569.1, representation information).

C++

The C++ Standard does not specify any requirements on the type of the operand of the ! operator.

5.3.1p8

The operand of the logical negation operator ! is implicitly converted to bool (clause 4);

But the behavior is only defined if operands of scalar type are converted to bool:

The following are two possible ways of thinking about this operator:

1 As a shorthand form of a comparison against 0 in a conditional expression. That is, in the same way the two forms if (x) and if (x != 0) are equivalent, the two forms if (!x) and if (x == 0) are equivalent.

2 As a logical negation operator that reverses the state of a boolean value (it can take as its operand a value in either of the possible boolean representation models and map it to the model that uses 0/1 for its boolean representation; see sentence 476, boolean role).

A double negative is very often interpreted as a positive statement in English (e.g., “It is not unknown for double negatives to occur in C source”). The same semantics apply in C. However, in some languages (e.g., Spanish) a double negative is interpreted as making the statement more negative (this usage does occur in casual English speech, e.g., “you haven’t seen nothing yet”, but it is rare and frowned on socially[120]).

The token ! is commonly called the not operator. This term is a common English word whose use in a sentence is similar to its use in a C expression. Through English language usage the word not, or an equivalent form, can appear as part of an identifier spelling (e.g., not_finished, no_signal, or unfinished).

A simple expression containing a double negation is likely to require significantly more cognitive resources to comprehend than one that does not. Changing the semantic associations of an identifier from (those implied by) not_finished to finished would require that occurrences of not_finished be changed to !finished (plus associated changes to any appearances of the identifier as the operand of the ! or the equality operators).

Calculating the difference in cognitive cost/benefit between using an identifier spelling that represents a negated form and one that does not requires information on a number of factors. For instance, whether any double negative forms actually appear in the source, the extent to which the not spelling form provides a good fit to the application domain, and any cognitive cost differences between the alternative forms not_finished and !finished. Given the uncertainty in the cost/benefit analysis no guideline recommendation is given here.

Table 1103.1: Occurrence of the unary ! operator in various contexts (as a percentage of all occurrences of this operator and the percentage of all occurrences of the given context that contains this operator). Based on the visible form of the .c files.

Use of the unary + operator is very rare in developer-written source. If it appears immediately after the = operator in existing code, the possible early K&R interpretation might be applicable. The usage is now sufficiently rare that a discussion on whether to do nothing, replace every occurrence by the sequence +=, introduce a separating white-space character, parenthesize the value being assigned, or do something else is not considered worthwhile.

Example

The two contexts in which the integer promotions would not be performed, unless the unary + operator is applied, are the right operand of a simple assignment and the operand of the sizeof operator (see sentences 1303, simple assignment, and 1119, sizeof result of).

1106 The result of the unary - operator is the negative of its (promoted) operand.

Commentary

The expression -x is not always equivalent to 0-x; for instance, if x has the value 0.0, the results will be -0.0 and 0.0, respectively.

Most processors include a single instruction that performs the negation operation. On many RISC processors this instruction is implemented by the assembler using an alias of the subtract instruction (for integer operands only). On such processors there is usually a register hardwired to contain the value zero (the IBM/Motorola

floating-point representations, the negation operator simply changes the value of the sign bit.

If the operand has an unsigned type, the result will always be a positive or zero value. This issue is discussed elsewhere (unary minus, unsigned operand).

Example

The expression -1 is the unary - operator applied to the integer constant 1.

1107 The integer promotions are performed on the operand, and the result has the promoted type.

The integer promotions may convert an unsigned type to a signed type. However, this can only happen if the signed type can represent all of the values of the unsigned type (see sentence 715, signed integer represent all unsigned integer values). This is reflected in the guideline recommendation for unsigned types (see guideline 1101.1, unary minus unsigned operand).

1108 The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). [bitwise complement result is]

Commentary

The term bitwise not is sometimes used to denote this operator (it is sometimes also referred to by the character used to represent it, tilde). Because its use is much less frequent than logical negation, this term is rarely shortened (see sentence 1111, logical negation result is).

Most processors have an instruction that performs this operation. An alternative implementation is to exclusive-or the operand with an all-bits-one value (containing the same number of bits as the promoted type). The Unisys A Series[1423] uses signed magnitude representation and if an operand has an unsigned type, the sign bit in the object representation (which is treated as a padding bit) is not affected by the bitwise

a processor whose int size differs from the original processor on which the code executed can cause latent differences in behavior to appear. For instance, if sizeof(int) == sizeof(short) on the original processor, then any integer promotions on operands having type short would not increase the number of bits in the value representation and a program may have an implicit dependency on this behavior occurring. Moving to a processor where sizeof(int) > sizeof(short) may require modifications to explicitly enforce this dependency. The issues involved in guideline recommendations that only deliver a benefit when a program is ported to a processor whose integer widths are different from the original processor are discussed elsewhere.

1110 If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.

The issues surrounding the use of bitwise operations to perform equivalent arithmetic operations are discussed

The term not (or logical not) is often used to denote this operator. The much less frequently used operator,

C++

5.3.1p8

its value is true if the converted operand is false and false otherwise

This difference is only visible to the developer in one case (see sentence 1112, logical negation). In all other situations the behavior is the same.

The implementation of this operator often depends on the context in which it occurs. The machine code generated can be very different if the result value is used to decide the control flow (e.g., it is the final operation in the evaluation of a controlling expression; see sentence 1744, if statement operand compare against 0) than if the result value is the operand of further operators. In the control flow case an actual value of 0 or 1 is not usually required. On many processors loading a value from storage into a register will set various bits in a conditional flags register (these flag bit settings usually specify some relationship between the value loaded and zero, e.g., equal to, less than, etc.). A processor’s conditional branch instructions use the current settings of combinations of these bits to decide whether to take the branch or not. When the result is used as an operand in further operations, a 0 or 1 value is needed; the generated machine code is often more complex. A common solution is the following pseudo machine code sequence (which leaves the result in REG_1):

While the result is specified in numeric terms, most occurrences of this operator are as the top-level operator in a controlling expression (see Usage below). These contexts are usually considered in boolean rather than

1112 The result has type int.

5.3.1p8 The type of the result is bool

The difference in result type will result in a difference of behavior if the result is the immediate operand of the sizeof operator. Such usage is rare.

• The ! character is not frequently encountered in formal education, and it may be easy to miss in a visual scan of source (no empirical studies using the ! character are known to your author).

• The equality operator, ==, is sometimes mistyped as an assignment operator, =.

is !. If the operand is thought of as being arithmetic, the conceptually closest operator is ==.

A number of studies have investigated the impact of negation in reasoning tasks. In natural languages negation comes in a variety of linguistic forms (e.g., “no boys go to class”, “few boys go to class”, “some boys go to class”) and while the results of these studies[707] of human performance using these forms may be of interest to some researchers, they don’t have an obvious mapping to C language usage (apart from the obvious one that negating a sentence involves an additional operator, the negation, which itself needs cognitive resources to process).

Usage

The visible form of the .c files contains 95,024 instances of the operator ! (see Table 912.2 for information on punctuation frequencies) and 27,008 instances of the token sequence == 0 (plus 309 instances of the form == 0x0). Integer constants appearing as the operand of a binary operator occur 28 times more often as the right operand than as the left operand.

1114 84) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). [footnote 84]

Commentary

This footnote sentence should really have been referenced from a different paragraph, where these equivalences are discussed (see sentence 1092, &*).

C90

This equivalence was not supported in C90, as discussed in the response to DR #012, #076, and #106.

C++

At the moment the C++ Standard specifies no such equivalence, explicitly or implicitly. However, this situation may be changed by the response to DR #232.

1115 It is always true that if E is a function designator or an lvalue that is a valid operand of the unary & operator, *&E is a function designator or an lvalue equal to E. [*&]

Commentary

This statement can be deduced from the specifications of the two operators concerned.

1116 If *P is an lvalue and T is the name of an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.

Commentary

The result of the cast operator is not an lvalue (see sentence 1131, footnote 85). However, if the operand is a pointer, the pointed-to object does not lose its lvalue-ness. This sentence simply points out the type of the result of the operations and its lvalue-ness; it does not give any additional semantics to the cast or dereference.

C++

The C++ Standard makes no such observation.

1117 Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

Commentary

This list contains some examples of invalid values that may appear directly in the source; it is not exhaustive (another example is dereferencing a pointer-to function). The invalid values may also be the result of an operation that has undefined behavior. For instance, using pointer arithmetic to create an address that does not correspond to any physical memory location supported by a particular computing system. (In virtual memory systems this case would correspond to an unmapped address.)

On many implementations the offsetof macro expands to an expression that dereferences the null pointer.

values from being created in the first place

6.5.3.4 The sizeof operator

Constraints

1118 The sizeof operator shall not be applied to an expression that has function type or an incomplete type, to the

If the sizeof operator accepted a bit-field as an operand, it would have to return a value measured in bits for all its operands (see sentence 1119, sizeof result of).

C++

The C++ Standard contains a requirement that does not exist in C.

A C source file that defines a type within a sizeof expression is likely to cause a C++ translator to issue a diagnostic. Defining a type within a sizeof expression is rarely seen in C source.

Table 1118.1: Occurrence of the sizeof operator having particular operand types (as a percentage of all occurrences of this operator). Based on the translated form of this book’s benchmark programs.

The operand referred to is the execution-time value of the operand. In the case of string literals, escape sequences will have been converted to a single or multibyte character (see sentence 133, translation phase 5). In these cases the value returned by the sizeof operator does not correspond to the number of characters visible in the source code. Most of the uses of the result of this operator work at the byte, not the bit, level; for instance, the argument of a memory-allocation function, which operates in units of bytes. Having to divide the result by CHAR_BIT, for most uses, would not be worth the benefit of being able to accept bit-field members.

Other Languages

The SIZE attribute in Ada returns the number of bits allocated to hold the object, or type. The BIT_SIZE intrinsic in Fortran 90 returns the number of bits in its integer argument; the SIZE intrinsic returns the number of elements in an array.

A few vendors have extended the sizeof operator. For instance, Diab Data[359] supports a second argument to the parenthesized form of the sizeof operator. The value of this argument changes the information returned (e.g., if the value of the second argument is 1 the alignment of the type is returned, if it is 2 a unique value denoting the actual type is returned).

The size of an object, or type, is representation information and the guideline recommendation dealing with the use of representation information might be thought to be applicable (see guideline 569.1, representation information using). However, in some contexts many uses of the sizeof operator are symbolic (see sentence 822, symbolic name). The contexts in which the size of an operand is often used include the following:

• A call to a storage allocation function requires the number of bytes to allocate

• When copying the representation of an object, either to another object or to a binary file, the number

of bytes to be copied is required

• When an object is being overlaid over the same storage as another object (using a union or pointer to

object type), the sizes in the two types need to agree

• When calculating the range of values representable by the operand (based on the number of bits it

contains)

In some of the uses in these contexts the result of the sizeof operator is treated as a symbolic value (the size of its operand, with no interest in its numeric properties), while in others the result is manipulated as an arithmetic value; it is an intermediate value used in the calculation of the final value. However, a strong case can be made for claiming that certain kinds of arithmetic operations are essentially symbolic in nature:

• Multiplication of the result (e.g., to calculate the size of an array of objects)

• Division of the result (e.g., to calculate how many objects will fit in a given amount of storage)

• Subtracting from the result (e.g., to calculate the offset of the character that is third from the end of a string literal)

• Adding to the result (e.g., calculating the size of an array needed to hold several strings)

Dev 569.1

The sizeof operator may be used provided the only operators applied to its result (and the result of these operations) are divide and multiply.

Dev 569.1

The sizeof operator whose operand has an array type may be used provided the only operators applied to its result (and the result of these operations) are divide, multiply, addition, and subtraction.

For simplicity the deviation wording permits some unintended uses of representation information. For instance, the deviations permit both of the expressions sizeof(array_of_int)-5 and sizeof(array_of_char)-5. There is a difference between the two in that in the former case the developer is either making use of representation information or forgot to write sizeof(array_of_int)-5*sizeof(int) (these guideline recommendations are not intended to recommend against constructs that are faults). Character types are

bytes returned. For instance, sizeof(a_double * b_double) always returns the size of the type specified by the C semantics, not the size of the representation used by the implementation when multiplying two objects of type double.

A size determined from the type of the operand need not provide an accurate indication of the range of values representable in that operand type (it provides an upper bound on the range of values that can be stored in an object of that type). A type may contain padding bytes, which will be included in its size. In the case of floating-point types, it is also possible that an expression is evaluated to a greater precision than implied by its type. Using the sizeof operator for this purpose is covered by the guideline recommendation dealing

1121 The result is an integer.

Commentary

To be exact, the result has an integer type, size_t.

C90

In C90 the result was always an integer constant. The C99 contexts in which the result is not an integer constant all involve constructs that are new in C99.

C++

Like C90, the C++ Standard specifies that the result is a constant. The cases where the result is not a constant require the use of types that are not supported by C++.

1122 If the type of the operand is a variable length array type, the operand is evaluated; [sizeof operand evaluated]

Commentary

The number of elements in the variable length array is not known until its index expression is evaluated. This evaluation may cause side effects. The requirement specified in this C sentence is weakened by a later sentence in the standard. It is possible that the operand may only be partially evaluated (see sentence 1584, sizeof VLA unspecified).

Commentary

A full expression having a sizeof operator as its top-level operator, with such an operand, can occur anywhere that an integer constant can occur. The size is obtained from the type of the operand. This information is available during translation. (There is no need to generate any machine code to evaluate the operand, and this requirement prohibits such generation.) Although the operand is not evaluated, any operators that appear in it will still cause the integer promotions and usual arithmetic conversions to be performed (see sentences 675, integer promotions, and 706, usual arithmetic conversions).

Some coding guideline documents recommend that the operand of the sizeof operator should not contain any side effects. In practice such usage is very rarely seen and no such guideline recommendation is given here.

1124 When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. [sizeof char defined to be 1]

Commentary

The number of bits in the representation of a character type is irrelevant. By definition the number of bytes in a character type is one (see sentence 53, byte addressable unit).

Developers sometimes associate a byte as always containing eight bits. On hosts where the character type is 16 bits, this can lead to the incorrect assumption that applying sizeof to a character type will return the value 2. These issues are discussed elsewhere (see sentence 307, CHAR_BIT macro).

1125 When applied to an operand that has array type, the result is the total number of bytes in the array.85)

(see sentence 1424, structure unnamed padding)

1127 The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t, defined in

The committee has deliberated and decided that more than one interpretation is reasonable.

There is no requirement on implementations to provide a definition of the type size_t that is capable of representing the number of bytes in any object that the implementation is capable of allocating storage for. It is the implementation’s responsibility to ensure that the type it uses for size_t internally is the same as the typedef definition of size_t in the supplied header, <stddef.h>. If these types differ, the implementation is not conforming.

A developer can define a typedef whose name is size_t (subject to the constraints covering declarations of identifiers). Such a declaration does not affect the type used by a translator as its result type for the sizeof operator.

C++

5.3.3p1

; the result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined

The C++ Standard does not explicitly specify any behavior when the operand of sizeof has a derived type. A C++ implementation need not document how the result of the sizeof operator applied to a derived type is calculated.

Use of the sizeof operator can sometimes produce results that surprise developers. The root cause of the surprising behavior is usually that the developer forgot that the result of the sizeof has an unsigned type (which causes the type of the other operand, of a binary operator, to be converted to an unsigned type). Developers forgetting about the unsignedness of the result of a sizeof is not something that can be addressed by a guideline recommendation.

1128 EXAMPLE 1 A principal use of the sizeof operator is in communication with routines such as storage allocators and I/O systems. A storage-allocation function might accept a size (in bytes) of an object to allocate and return a pointer to void. For example:

extern void *alloc(size_t);

double *dp = alloc(sizeof *dp);

The implementation of the alloc function should ensure that its return value is aligned suitably for conversion to a pointer to double.

Commentary

Measurements of existing source (see Table 1080.1) show that this usage represents at most 14% of all uses of the sizeof operator.

1129 EXAMPLE 2 Another use of the sizeof operator is to compute the number of elements in an array:

sizeof array / sizeof array[0]

Commentary

The declaration of an object having an array type may not contain an explicit value for the size, but obtain it from the number of elements in an associated initializer.

Other Languages

Some languages provide built-in support for obtaining the bounds or the number of elements in an array. For instance, Fortran has the intrinsic functions LBOUND and UBOUND; Ada specifies the attributes first and last to return the lower and upper bounds of an array, respectively.

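1130 EXAMPLE 3 In this example, the size of a variable length array is computed and returned from a function:

#include <stddef.h>
size_t fsize3(int n) {
    char b[n+3];      /* variable length array */
    return sizeof b;  /* execution time sizeof */
}
int main() { size_t size; size = fsize3(10); /* fsize3 returns 13 */ return 0; }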

the code treat the parameter as having an array or pointer type. However, in the context of an operand to the sizeof operator, there is an important difference in behavior (see sentence 1571, qualified array of).

cast-expression syntax

cast-expression:

unary-expression

( type-name ) cast-expression

Commentary

A cast-expression is also a unary operator. Given that the evaluation of a sequence of unary operators always occurs in a right-to-left order, the lower precedence of the cast operator is not significant.

Measurements by Stiff, Chandra, Ball, Kunchithapadam, and Reps[1327] of 1.36 MLOC (SPEC95 version of gcc, binutils, production code from a Lucent Technologies product and a few other programs) showed a total of 23,947 casts involving 2,020 unique types. Of these 15,704 involved scalar types (not involving a structure, union, or function pointer) and 447 function pointer types. Of the remaining casts 7,796 (1,276 unique types) involved conversions between pointers to void/char and pointers to structure (in either direction) and 1,053 (209 unique types) conversions between pointers to structs.

Constraints

1134 Unless the type name specifies a void type, the type name shall specify qualified or unqualified scalar type and the operand shall have scalar type.

Commentary

Casting to the void type is a method of explicitly showing that the value of the operand is discarded. Casting a value having a structure or union type has no obvious meaning (Would corresponding member names be assigned to each other? What would happen to those members that did not correspond to a member in the other type?)
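A minimal sketch of the void cast follows. The functions log_event and notify and the counter events_logged are hypothetical names used only for illustration.

```c
static int events_logged;   /* hypothetical counter, for illustration only */

/* Returns the running total; not every caller has a use for it. */
int log_event(void)
{
    return ++events_logged;
}

void notify(void)
{
    /* The cast to void states that the result is deliberately discarded.
     * The call itself is still evaluated for its side effect. */
    (void)log_event();
}
```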

C++

There is no such restriction in C++ (which permits the type name to be a class type). However, the C++ Standard contains a requirement that does not exist in C.

5.4p3

Types shall not be defined in casts

A C source file that defines a type within a cast is likely to cause a C++ translator to issue a diagnostic (this


switch ((enum {E1, E2, E3})glob) /* does not affect the conformance status of the program */

gcc supports the casting of scalar types to union types. The scalar type must have the same type as one of the members of the union type. The cast is treated as being equivalent to assigning to the member having that type. This extension removes the need to know the name of the union member.
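The extension described above might be used as in the following sketch. The union tag number and the function union_cast_demo are hypothetical. The extension is only attempted when the translator defines __GNUC__, and a standard C fallback is shown for comparison.

```c
union number {
    int    i;
    double d;
};

int union_cast_demo(void)
{
#ifdef __GNUC__
    /* GNU C extension: the operand has type int, which matches member i,
     * so the cast behaves like storing 42 into that member. The
     * __extension__ keyword asks the translator not to issue pedantic
     * diagnostics for this expression. */
    union number u = __extension__ (union number)42;
#else
    /* Standard C: the member has to be named explicitly. */
    union number u;
    u.i = 42;
#endif
    return u.i;
}
```

The fallback branch shows what the extension saves: the member name i never appears in the cast form.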


In this discussion a suffixed literal will be treated as an explicit cast of a literal value, while an unsuffixed literal is not treated as such. An explicit cast is usually interpreted as showing that the developer intended the conversion to take place. It is taken as a statement of intent. It is often assumed, by readers of the source, that an explicit cast specifies the final type of the operand. An explicit cast followed by an implicit one is suspicious; it suggests that either the original developer did not fully understand what was occurring or that subsequent changes have modified the intended behavior.
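The suspicious pattern, an explicit cast whose result is then implicitly converted again, can be sketched as follows. The function name narrow_then_widen is hypothetical, and the example assumes the converted value fits in a short.

```c
/* The explicit cast suggests that short is the intended final type, yet
 * the initialization implicitly converts the result again, to long. A
 * reader cannot tell which of the two types the author really wanted. */
long narrow_then_widen(double d)
{
    long result = (short)d;   /* explicit double -> short, implicit short -> long */
    return result;
}
```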


Title: Compound Literals Commentary
Category: essay
Year of publication: 2009
