The New C Standard- P13



6.7.2.1 Structure and union specifiers 1401

Commentary

This wording specifies that the form:

struct-or-union identifier opt { struct-declaration-list }

declares a new type. Other forms of structure declaration that omit the braces either declare an identifier as a tag or refer to a previous declaration.

Other Languages

Whether or not a structure or union type definition is a new type may depend on a language’s type compatibility rules. Languages that use structural equivalence may treat different definitions as being the same type (usually employing rules similar to those used by C for type compatibility across translation units).
633 compatible separate translation units

1400 The struct-declaration-list is a sequence of declarations for the members of the structure or union.

Commentary

This sentence says in words what is specified in the syntax.

1401 If the struct-declaration-list contains no named members, the behavior is undefined.

Commentary

The syntax does not permit the struct-declaration-list to be empty. However, it is possible for every member to be unnamed (an unnamed bit-field).

C++

9p1

An object of a class consists of a (possibly empty) sequence of members and base class objects.

Source developed using a C++ translator may contain class types having no members. This usage will result in undefined behavior when processed by a C translator.

Other Languages

The syntax of other languages invariably requires at least one member to be declared and does not permit zero-sized types to be defined.

Common Implementations

Most implementations issue a diagnostic when they encounter a struct-declaration-list that does not contain any named members. However, many implementations also implicitly assume that all declared objects have a nonzero size and, after issuing the diagnostic, may behave unpredictably when this assumption is not met.

Coding Guidelines

This construct did not occur in the source code used for this book’s code measurements and in practice occurrences are likely to be very rare (until version 3.3.1, gcc reported “internal compiler error” for many uses of objects declared to have such a type), so a guideline recommendation is not considered worthwhile.


1402 The type is incomplete until after the } that terminates the list.

In the second definition the closing } (the one before the x) completes the type and the sizeof operator can be applied to the type.
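A minimal sketch of the rule that a type is incomplete until its closing brace. The names `node` and `node_size` are hypothetical and do not come from the book's own example, which did not survive here. Inside the braces only a pointer to the type can be declared; after the closing brace `sizeof` can be applied.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;    /* pointer to the still incomplete type: permitted */
    /* struct node copy;     not permitted: struct node is incomplete here   */
};                        /* the closing brace completes the type */

/* sizeof may be applied once the type is complete. */
static size_t node_size(void)
{
    return sizeof(struct node);
}
```

The commented-out member shows the kind of declaration that the incompleteness rule rules out.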

Java uses references for all non-primitive types. Storage for members having such types need not be allocated in the class type that contains the member declaration, and there is no requirement that the number of elements allocated to a member having array type be known at translation time.

Table 1403.1: Occurrence of structure member types (as a percentage of the types of all such members). Based on the translated form of this book’s benchmark programs.

Type             %      Type              %     Type      %     Type      %
int              15.8   unsigned short    7.7   char *    2.3   void*()   1.3
other-types      12.7   struct            7.2   enum      1.9   float     1.2
unsigned char    11.1   unsigned long     5.2   long      1.8   short     1.0
unsigned int     10.4   unsigned          4.0   char      1.8   int*()    1.0
struct *          8.8   unsigned char[]   3.1   char[]    1.5


Table 1403.2: Occurrence of union member types (as a percentage of the types of all such members). Based on the translated form of this book’s benchmark programs.

Type              %      Type             %     Type             %     Type      %
struct            46.9   unsigned int     3.8   double           1.9   char[]    1.3
other-types       11.3   char *           2.8   enum             1.7   union *   1.1
struct *           8.3   unsigned long    2.4   unsigned char    1.5
int                6.0   unsigned short   2.1   struct[]         1.3
unsigned char[]    4.3   long             2.1   ( struct * )[]   1.3

1404 In addition, a member may be declared to consist of a specified number of bits (including a sign bit, if any).

Commentary

An object that consists of a specified number of bits can only be declared inside a structure or union type declaration.

Other Languages

Some languages (e.g., CHILL) provide a mechanism for specifying how the elements of arrays are laid out and the number of bits they occupy. Languages in the Pascal family support the concept of subranges. A subrange allows the developer to specify the minimum and maximum range of values that an object needs to be able to represent. The implementation is at liberty to allocate whatever resources are needed to satisfy this requirement (some implementations simply allocate an integer’s worth of storage, while others allocate the minimum number of bytes needed).

Why would a developer want to specify the number of bits to be used in an object representation? This level of detail is usually considered to be low-level implementation information. Possible reasons for this usage include:

• Minimizing the amount of storage used by structure objects. This remains, and is likely to continue to remain, an important concern in applications where available storage is very limited (usually for cost reasons).

• There is existing code, originally designed to run in a limited storage environment. The fact that storage requirements are no longer an issue is rarely a cost-effective rationale for spending resources on removing bit-field specifications from declarations.

• Mapping to a hardware device. Devices are often interfaced via particular storage locations (organized as sequences of bits), or transfer data in some packed format. Being able to mirror the bit sequences of the hardware using some structure type can be a useful abstraction (which can require the specification of the number of bits to be allocated to each object).

• Mapping to some protocol-imposed layout of bits. For instance, the fields in a network data structure (e.g., TCP headers).
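The hardware-mapping reason in the list above can be sketched as follows. The type `status_reg`, its field names, and its widths are hypothetical and do not describe any real device; the order in which the fields are allocated within the storage unit is implementation-defined, so a real mapping has to be checked against the translator being used.

```c
#include <assert.h>

/* Hypothetical device status word; names and widths are illustrative only. */
struct status_reg {
    unsigned int ready    : 1;
    unsigned int error    : 1;
    unsigned int channel  : 4;
    unsigned int reserved : 2;
};

/* Store a channel number and return what the 4-bit field then holds.
   Storing into an unsigned bit-field reduces the value modulo 16. */
static unsigned int stored_channel(unsigned int channel)
{
    struct status_reg reg = {0};
    reg.channel = channel;
    return reg.channel;
}
```

For instance, `stored_channel(9)` yields 9, while a value that needs more than four bits is reduced modulo 16 when it is stored.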

The following are some of the arguments that can be made for not using bit-field types:

• Many of the potential problems associated with objects declared to have an integer type, whose rank is less than int, also apply to bit-fields. However, one difference between them is that developers do not habitually use bit-fields, to the extent that character types are used. If developers don’t use bit-fields out of habit, but put some thought into deciding that their use is necessary, a guideline recommendation would be redundant (treating guideline recommendations as prepackaged decision aids).
  480.1 object int type only
  0 coding guidelines introduction


• The specification of bit-field types involves a relatively large number of implementation-defined behaviors, dealing with how bit-fields are allocated in storage. However, recommending against the use of bit-fields only prevents developers from using one of the available techniques for accessing sequences of bits within objects. It is not obvious that bit-fields offer the least cost/benefit of all the available techniques (although some coding guideline documents do recommend against the use of bit-fields).

Bit-fields may be used to interface to some externally imposed storage layout requirements.

1405 Such a member is called a bit-field;104)

Commentary

This sentence specifies in words the interpretation to be given to the syntax.

Other Languages

Declarations in languages in the Pascal family require the range of values that need to be representable to be specified in the declaration. The number of bits used is implementation-defined.

If the value 0 or 1 is stored into a nonzero-width bit-field of type _Bool, the value of the bit-field shall compare equal to the value stored.
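A small sketch of the _Bool requirement just stated, using a hypothetical structure `flags` with a one-bit member. The helper name `bool_field_round_trips` is also an assumption made for illustration.

```c
#include <assert.h>

struct flags {
    _Bool enabled : 1;   /* nonzero-width bit-field of type _Bool */
};

/* Store 0 or 1 and report whether the field compares equal to the value stored. */
static int bool_field_round_trips(int value)
{
    struct flags f;
    f.enabled = value;
    return f.enabled == value;
}
```

Only the values 0 and 1 are covered by the requirement, so those are the only arguments the helper is meant to receive.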


Commentary

There is no requirement on implementations to allocate the smallest possible storage unit. They may even allocate more bytes than sizeof(int).

Other Languages

Languages that support some form of object layout specification often require developers to specify the storage unit and the bit offset, within that unit, where the storage for an object starts.
1390 struct/union syntax

Many implementations allocate the same storage unit for bit-fields as they do for the type int. The only difference is that they will often allocate storage for more than one bit-field in such storage units.
1410 bit-field packed into

Implementations that support bit-field types having a rank different from int usually base the properties of the storage unit used (e.g., alignment and size) on those of the type specifier used.
1395 bit-field shall have type

Like other integer types, the storage unit used to hold bit-field types is decided by the implementation.
569.1 representation information

if ((&x.m_3 - &x.m_1) == sizeof(int))

if ((&x.m_3 - &x.m_1) == 2*sizeof(int))
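The two tests above are all that survives of a larger example. As a hedged sketch only, assume the object x had two char members, m_1 and m_3, with a bit-field declared between them. The type `probe` and the function `storage_between` are hypothetical; the distance between the two named members then hints at the size of the storage unit the implementation chose for the bit-field.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of the missing declaration: a bit-field between two chars. */
struct probe {
    char m_1;
    signed int m_2 : 3;
    char m_3;
};

/* Number of bytes from the start of m_1 to the start of m_3. */
static size_t storage_between(void)
{
    return offsetof(struct probe, m_3) - offsetof(struct probe, m_1);
}
```

Comparing the result against `sizeof(int)` or `2*sizeof(int)`, as the fragment does, is itself a use of implementation-defined behavior.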

This is a requirement on the implementation. However, any program written to verify what the implementation has done has to make use of other implementation-defined behavior. This requirement does not guarantee that all adjacent bit-fields will be packed in any way. An implementation could choose its addressable storage unit to be a byte, limiting the number of bit-fields that it is required to pack. However, if the storage unit used by an implementation is a byte, this requirement means that all members in the following declaration must be allocated storage in the same byte.


alignment 39

values from the two storage units into two registers, followed by a sequence of shift, bitwise-AND, and bitwise-OR operations. This wording allows implementation vendors to choose whether they want to support this usage, or leave bits in the storage unit unused.

Other Languages

Even languages that contain explicit mechanisms for specifying storage layout sometimes allow implementations to place restrictions on how objects straddle storage unit boundaries.

Common Implementations

Implementations that do not have alignment restrictions can access the appropriate bytes in a single load or store instruction and do not usually include a special case to handle overlapping storage units. Some processors include instructions[985] that can load/store a particular sequence of bits from/to storage.


While there is no requirement that the ordering be the same for each sequence of bit-field declarations (within a structure type), it would be surprising if an implementation used a different ordering for different declarations. Many implementations use the allocation order implied by the order in which bytes are allocated within multibyte objects.

The guideline recommendation dealing with the use of representation information is applicable here.
569.1 representation information using

Example

Commentary

This behavior differs from that of the non-bit-field members, which is implementation-defined.
1421 member alignment

C++

The wording in the C++ Standard refers to the bit-field, not the addressable allocation unit in which it resides. Does this wording refer to the alignment within the addressable allocation unit?

9.6p1

Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.

Implementations that support bit-field types having a rank different from int usually base the properties of the alignment used on those of the type specifier used.
1395 bit-field shall have type

The guideline recommendation dealing with the use of representation information is applicable here.
569.1 representation information using

1414 A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field.106)
bit-field unnamed

Commentary

Memory mapped devices and packed data sometimes contain sequences of bits that have no meaning assigned to them (sometimes called holes). When creating a sequence of bit-fields that map onto the meaningful values, any holes also need to be taken into account. Unnamed bit-fields remove the need to create an anonymous name (sometimes called a dummy name) to denote the bit sequences occupied by the holes. In some cases the design of a data structure might involve having some spare bits, between certain members, for future expansion.
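A sketch of unnamed bit-fields standing in for holes and spare bits. The type `ctrl_word`, its members, and their widths are hypothetical and chosen purely for illustration.

```c
#include <assert.h>

/* Hypothetical control word with a hole and some spare bits. */
struct ctrl_word {
    unsigned int mode : 3;
    unsigned int      : 2;   /* hole: no meaning assigned, so no name needed */
    unsigned int irq  : 1;
    unsigned int      : 2;   /* spare bits kept for future expansion */
};

/* Set the named fields and return the mode; unnamed fields cannot be accessed. */
static unsigned int set_and_get_mode(unsigned int mode)
{
    struct ctrl_word w = {0};   /* unnamed members take no part in initialization */
    w.mode = mode;
    w.irq = 1;
    return w.mode;
}
```

Because the holes have no names, there is nothing for a reader to misread as a meaningful member.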


declares abits to have type array of bit-field, rather than being a bit-field of an array type (which would also violate a constraint). Similarly pbits has type pointer to bit-field.
1395 bit-field shall have type

One of the principles that the C committee derived from the spirit of C was that an operation should not expand to a surprisingly large amount of machine code. Arrays of bit-fields potentially require the generation of machine code to perform relatively complex calculations, compared to non-bit-field element accesses, to calculate the offset of an element from the array index, and to extract the necessary bits.
14 spirit of C

The C pointer model is based on the byte as the smallest addressable storage unit. As such it is not possible to express the address of individual bits within a byte.
53 byte addressable unit

Other Languages

Some languages (e.g., Ada, CHILL, and Pascal) support arrays of objects that only occupy some of the bits of a storage unit. When translating such languages, calling a library routine that extracts the bits corresponding to the appropriate element is often a cost-effective implementation technique. Not only does the offset need to be calculated from the index, but the relative position of the bit sequence within a storage unit will depend on the value of the index (unless its width is an exact division of the width of the storage unit). Pointers to objects that do not occupy a complete storage unit are rarely supported in any language.
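The kind of library routine described above can be sketched in C with explicit shifts and masks. This is an illustration under stated assumptions, not any particular implementation's routine: elements are stored least significant bit first in 32-bit units, widths are between 1 and 31 bits, and the names `packed_get` and `packed_set` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT_BITS 32u

/* Fetch element `index` of a packed array of `width`-bit elements (1..31). */
static uint32_t packed_get(const uint32_t *units, size_t index, unsigned width)
{
    size_t bit = index * width;                    /* absolute bit offset */
    size_t unit = bit / UNIT_BITS;                 /* which storage unit  */
    unsigned shift = (unsigned)(bit % UNIT_BITS);  /* position inside it  */
    uint32_t mask = (UINT32_C(1) << width) - 1u;
    uint32_t value = units[unit] >> shift;

    if (shift + width > UNIT_BITS)                 /* element straddles two units */
        value |= units[unit + 1] << (UNIT_BITS - shift);
    return value & mask;
}

/* Store the low `width` bits of `value` into element `index`. */
static void packed_set(uint32_t *units, size_t index, unsigned width, uint32_t value)
{
    size_t bit = index * width;
    size_t unit = bit / UNIT_BITS;
    unsigned shift = (unsigned)(bit % UNIT_BITS);
    uint32_t mask = (UINT32_C(1) << width) - 1u;

    value &= mask;
    units[unit] = (units[unit] & ~(mask << shift)) | (value << shift);
    if (shift + width > UNIT_BITS) {
        unsigned done = UNIT_BITS - shift;         /* bits already stored */
        units[unit + 1] = (units[unit + 1] & ~(mask >> done)) | (value >> done);
    }
}
```

With a width of 3 the element at index 10 starts at bit 30 and straddles two units, which is the case where the position within the unit depends on the index.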

1419 105) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int,
footnote 105

This footnote is new in C99.

1420 106) An unnamed bit-field structure member is useful for padding to conform to externally imposed layouts.
footnote 106

Commentary

Bit-fields, named or otherwise, are in general useful for padding to conform to externally imposed layouts. By their nature unnamed bit-fields do not provide any naming information that might help reduce the effort needed to comprehend the source code.

1421 Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
member alignment

Commentary

The standard does not require the alignment of other kinds of objects to be documented. Developers sometimes need to be able to calculate the offsets of members of structure types (the offsetof macro was introduced into C90 to provide a portable method of obtaining this information). Knowing the size of each member, the relative order of members, and their alignment requirements is invariably sufficient information

address may depend on the types of the members (because of the requirement that a pointer to an object behave the same as a pointer to the first element of an array having the same object type).
1165 additive operators

1422 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
member address increasing
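The ordering requirement can be observed with the offsetof macro and with pointer comparisons. The structure `record` and both function names below are hypothetical; only the relative order of the members is guaranteed, while the actual offsets depend on the implementation's alignment choices.

```c
#include <assert.h>
#include <stddef.h>

struct record {
    char   tag;
    double weight;
    short  count;
};

/* Offsets increase in declaration order; the first member is at offset zero. */
static int offsets_increase(void)
{
    return offsetof(struct record, tag) == 0
        && offsetof(struct record, tag) < offsetof(struct record, weight)
        && offsetof(struct record, weight) < offsetof(struct record, count);
}

/* The same ordering seen through pointers to members of one object. */
static int addresses_increase(void)
{
    struct record r;
    return (char *)&r.tag < (char *)&r.weight
        && (char *)&r.weight < (char *)&r.count;
}
```

Neither function says anything about how much padding sits between the members.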

Commentary

Although not worded as such, this is effectively a requirement on the implementation. It is consistent with a requirement on the result of comparisons of pointers to members of the same structure object. Prior to

• Making use of information on the layout of members to overlay the storage they occupy with other objects.

By specifying this ordering requirement the committee prevented implementations from using a different ordering (for optimization reasons), increasing the chances that existing practices would continue to work as expected (these practices also rely on other implementation-defined behaviors). The cost of breaking existing code and reducing the possibility of being able to predict member storage layout was considered to outweigh any performance advantages that might be obtained from allowing implementations to choose the relative order of members.
1421 member alignment

be minimized. A few Pascal (and Ada) implementations reorder members to reduce the storage they use, or to change alignments to either reduce the total storage requirements or to reduce access costs for some frequently used members.

The quantity and quality of analysis needed to deduce when it is possible to reorder members of structures has deterred implementors from attempting to make savings, for the general case, in this area. Some impressive savings have been made by optimizers[751] for languages that do not make this pointer to member guarantee. Palem and Rabbah[1062] looked at the special case of dynamically allocated objects used to create tree structures; such structures usually require the creation of many objects having the same type. A common characteristic of some operations on tree structures is that an access to an object, using a particular member name, is likely to be closely followed by another access to an object using the same member name. Rather than simply reordering members, they separated out each member into its own array, based on dynamic profiles of member accesses (the Trimaran[1399] and gcc compilers were modified to handle this translation internally; it was invisible to the developer). For instance in:


dynamically allocating storage for an object having type struct T resulted in storage for the two arrays being allocated. A second dynamic allocation request requires no storage to be allocated; the second array element from the first allocation can be used. If tree structures are subsequently walked in an order that is close to the order in which they are built, there is an increased probability that members having the same name will be in the same cache line. Using a modified gcc to process seven data intensive benchmarks resulted in an average performance improvement of 24% on Intel Pentium II and III, and 9% on Sun Ultra-Sparc-II. An analysis of the Olden benchmark using the same techniques by Shin, Kim, Kim and Han[1254] found that L1 and L2 cache misses were reduced by 23% and 17% respectively and cache power consumption was reduced by 18%.
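The member-splitting idea can be sketched by hand, although the work described above performed it inside the translator. Everything here is an assumption made for illustration: the pool size, the two members, the use of indexes instead of pointers, and the names `T_pool`, `pool_alloc`, and `sum_keys`.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

/* As the developer might write it: struct T { int key; struct T *next; };
   After splitting, each member lives in its own array, indexed by object number. */
struct T_pool {
    int       key[POOL_SIZE];
    ptrdiff_t next[POOL_SIZE];   /* index of the next object, -1 for none */
    size_t    used;              /* number of objects handed out so far   */
};

/* "Allocate" an object: storage already exists, so just claim the next index. */
static ptrdiff_t pool_alloc(struct T_pool *pool)
{
    if (pool->used == POOL_SIZE)
        return -1;               /* pool exhausted */
    pool->next[pool->used] = -1;
    return (ptrdiff_t)pool->used++;
}

/* Walk a list; successive accesses to key[] touch adjacent storage. */
static long sum_keys(const struct T_pool *pool, ptrdiff_t head)
{
    long total = 0;
    for (ptrdiff_t i = head; i != -1; i = pool->next[i])
        total += pool->key[i];
    return total;
}
```

Objects allocated one after another end up with their `key` values next to each other, which is the cache-line effect the text describes.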

Franz and Kistler[453] describe an optimization that splits objects across non-contiguous storage areas to improve cache performance. However, their algorithm only applies to strongly typed languages where developers cannot make assumptions about member layout, such as Java.

Zhang and Gupta[1545] developed what they called the common-prefix and narrow-data transformations. These compress 32-bit integer values and 32-bit address pointers into 15 bits. This transformation is dynamically applied (the runtime system checks to see if the transformation can be performed) to the members of dynamically allocated structure objects, enabling two adjacent members to be packed into a 32-bit word (a bit is used to indicate a compressed member). The storage optimization comes from two commonly seen behaviors: (1) integer values tend to be small (the runtime system checks whether the top 18 bits are all 1’s or all 0’s), and (2) the addresses of the links, in a linked data structure, are often close to the address of the object they refer to (the runtime system checks whether the two addresses have the same top 17 bits). Extra machine code has to be generated to compress and uncompress members, which increases code size (average of 21% on the user code, excluding linked libraries) and lowers runtime performance (average 30%). A reduction in heap usage of approximately 25% was achieved (the Olden benchmarks were
pointer compressing members
benchmark
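The two runtime checks described above can be sketched as follows. This is only an illustration of the bit tests stated in the text, not the authors' code; the function names are hypothetical and addresses are modeled as 32-bit unsigned values.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow data: the top 18 bits are all 0's or all 1's, so 15 bits suffice. */
static int compressible_int(int32_t value)
{
    uint32_t top = (uint32_t)value >> 14;      /* the top 18 bits */
    return top == 0 || top == UINT32_C(0x3FFFF);
}

/* Common prefix: the link and the object holding it share their top 17 bits. */
static int compressible_ptr(uint32_t link, uint32_t self)
{
    return (link >> 15) == (self >> 15);
}
```

A value that passes `compressible_int` lies in the range -16384 to 16383, which is exactly what a 15-bit signed field can hold.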

The order of storage layout of the members in a structure type is representation information that is effectively guaranteed. It would be possible to use this information, in conjunction with the offsetof macro, to write code to access specific members of a structure, using pointers to other members. However, use of information on the relative ordering of structure members tends not to be code based, but data based (the same object is interpreted using different types). The coding guideline issues associated with the layout of types are
layout

1423 A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa.
pointer to structure points at initial member

Commentary

Although not worded as such, this is effectively a requirement on the implementation. The only reason for preventing implementations inserting padding at the start of a structure type is existing practice (and the resulting existing code that treats the address of a structure object as being equal to the address of the first member of that structure).
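A short sketch of the initial-member guarantee, using a hypothetical structure `header` and a hypothetical helper name. The conversions go in both directions, matching the "and vice versa" in the sentence.

```c
#include <assert.h>
#include <stddef.h>

struct header {
    int    kind;
    double payload;
};

/* Convert a structure pointer to a pointer to its first member, and back. */
static int initial_member_matches(void)
{
    struct header h = { 42, 1.5 };
    int *first = (int *)&h;                          /* suitably converted */
    struct header *back = (struct header *)&h.kind;  /* and vice versa     */
    return *first == 42
        && back == &h
        && offsetof(struct header, kind) == 0;
}
```

The same reasoning does not extend to later members, whose offsets depend on implementation-defined alignment.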

Other Languages

Most languages do not go into this level of representation detail


layout algorithm used by another language, or even that used by another execution environment.
alignment 1421

The standard does not guarantee that two structure types having exactly the same member types have exactly the same storage layout, unless they are part of a common initial sequence.

Unnamed padding occupies storage bytes within an object. The pattern of bits set, or unset, within these bytes can be accessed explicitly by a conforming program (using the memcpy or memset library functions). They may also be accessed implicitly during assignment of structure objects. It is the values of these bytes that are a potential cause of unexpected behavior when the memcmp (amongst others) library function is used to compare two objects having structure type.
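The memcmp hazard can be sketched with a hypothetical structure `pair` that is likely, though not guaranteed, to contain padding between its members. The sketch compares member by member, which never inspects padding; the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

struct pair {
    char c;
    long n;    /* padding is likely, but not guaranteed, between c and n */
};

/* Member-by-member comparison ignores any padding bytes. */
static int pair_equal(const struct pair *a, const struct pair *b)
{
    return a->c == b->c && a->n == b->n;
}

/* Two objects with equal members but deliberately different padding bytes. */
static int members_equal_despite_padding(void)
{
    struct pair a, b;
    memset(&a, 0x00, sizeof a);
    memset(&b, 0xFF, sizeof b);
    a.c = b.c = 'x';
    a.n = b.n = 7;
    /* memcmp(&a, &b, sizeof a) may report a difference here. */
    return pair_equal(&a, &b);
}
```

Clearing both objects with memset before assigning members is a common way of making memcmp usable in practice, but the values of padding bytes after a store to a member are not something the standard pins down.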


member when written to

Other Languages

Pascal supports a construct, called a variant tag, that can be used by implementations to check that the member being read from was the last member assigned to. However, use of this construct does require that developers explicitly declare such a tag within the type definition. A few implementations perform the check suggested by the language standard. Ada supports a similar construct and implementations are required to perform execution time checks, when a member is accessed, on what it calls the discriminant (which holds information on the last member assigned to).

distinguish between different pointer types and different integer types having the same size) where the

member read is different from the last member stored in

1427 A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
union members start same address

Commentary

Although not worded as such, this is effectively a requirement on the implementation. A consequence of this requirement is that all members of a union type have the same offset from the start of the union, zero. A previous requirement dealt with pointer equality between different members of the same union object. This C sentence deals with pointer equality between a pointer to an object having the union type and a pointer to one of the members of such an object.
1207 pointer to union members compare equal
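A sketch of the consequence that every union member starts at offset zero. The union `value` and the helper name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

union value {
    int           i;
    double        d;
    unsigned char bytes[sizeof(double)];
};

/* A pointer to the union, suitably converted, equals a pointer to each member. */
static int members_share_address(void)
{
    union value v;
    return (void *)&v == (void *)&v.i
        && (void *)&v == (void *)&v.d
        && (void *)&v == (void *)v.bytes
        && offsetof(union value, d) == 0;
}
```

Only addresses are compared; no member is read, so the question of which member was last stored does not arise.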


Other Languages

Strongly typed languages do not usually (Algol 68 does) provide a mechanism that returns the addresses of members of union (or structure) objects. The result of this C requirement (that all members have the same address) is not always specified, or implemented, in other languages. It may be more efficient on some processors, for instance, for members to be aligned differently (given that in many languages unions may only be contained within structure declarations and so could follow other members of a structure).

The fact that pointers to different types can refer to the same storage location, without the need for any form of explicit type conversion, is something that optimizers performing points-to analysis need to take into account.

it is the alignment requirements of the member types, rather than their size, that determines whether there is any unnamed padding at the end of the union type. When one member has a type that often requires alignment on an even address and another member contains an odd number of bytes, it is likely that some unnamed padding will be used.

C++

The only time this possibility is mentioned in the C++ Standard is under the sizeof operator:

5.3.3p2

When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.

Other Languages

The algorithms used to assign offsets to structure members are common to implementations of many languages, including the rationale for unnamed padding at the end. Few language definitions explicitly call out the fact that structure or union types may have unnamed padding at their end.

Most implementations use the same algorithm for assigning member offsets and creating unnamed padding for all structure and union types in a program, even when these types are anonymous (performing the analysis to deduce whether the padding is actually required is not straightforward). Such an implementation strategy is likely to waste a few bytes in some cases. But it has the advantage that, for a given implementation and set of translator options, the same structure declarations always have the same size (there may not be any standard’s requirement for this statement to be true, but there is sometimes a developer expectation that it is true).


Unnamed padding is a representation detail associated with storage layout. That this padding may occur after the last declared member is simply another surprise awaiting developers who try to make use of storage layout details. The guideline recommendation dealing with the use of representation information is applicable.
1354 storage layout
569.1 representation information

The Committee introduced this special case, in C99, to provide a standard defined method of using what has become known as the struct hack. Developers sometimes want a structure object to contain an array object whose number of elements is decided during program execution. A standard, C90, well defined, technique is to have a member point at dynamically allocated storage. However, some developers, making use of representation information, caught onto the idea of simply declaring the last member to be an array of one element. Storage for the entire structure object is dynamically allocated, with the storage allocation request including sufficient additional storage for the necessary extra array elements. Because array elements are contiguous and implementations are not required to perform runtime checks on array indexes, the additional storage could simply be treated as being additional array elements. This C90 usage causes problems for translators that perform sophisticated flow analysis, because the size of the object being accessed does not correspond to the size of the type used to perform the access. Should such translators play safe and treat all structure types containing a single element array as their last member as if they will be used in a struct hack manner?

The introduction of flexible array members, in C99, provides an explicit mechanism for developers to indicate to the translator that objects having such a type are likely to have been allocated storage to make use of the struct hack.

The presence of a member having an incomplete type does not cause the structure type that contains it to have an incomplete type.
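A sketch of the C99 way of writing what the struct hack achieved. The type `message` and the function `message_new` are hypothetical; the allocation request adds room for the array elements to the size of the structure, which remains a complete type despite its incomplete last member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct message {
    size_t length;
    char   text[];     /* flexible array member: must be the last member */
};

/* Allocate a message holding `length` bytes copied from `src`.
   Returns NULL if the allocation fails; the caller frees the result. */
static struct message *message_new(const char *src, size_t length)
{
    struct message *m = malloc(sizeof *m + length);
    if (m == NULL)
        return NULL;
    m->length = length;
    memcpy(m->text, src, length);
    return m;
}
```

A single call to free releases both the fixed members and the array, which is one of the attractions of this idiom over a separately allocated pointed-to array.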

C90

The issues involved in making use of the struct hack were raised in DR #051. The response pointed out that declaring the member to be an array containing fewer elements and then allocating extra storage for additional elements was not strictly conforming. However, declaring the array to have a large number of elements and allocating storage for fewer elements was strictly conforming.


All known C90 implementations exhibit the expected behavior for uses of the struct hack. However, some static analysis tools issue a diagnostic on calls to malloc that request an amount of storage that is not consistent (e.g., smaller or not an exact multiple) with the size of the type pointed to by any explicit cast of its return value.

Is the use of flexible array members more or less error prone than using any of the alternatives?

The struct hack is not widely used, or even widely known about by developers (although there may be some development communities that are familiar with it). It is likely that many developers will not be expecting this usage. Use of a member having a pointer type, with the pointed-to object being allocated during program execution, is a more common idiom (although more statements are needed to allocate and deallocate storage; and experience suggests that developers sometimes forget to free up the additional pointed-to storage, leading to storage leakage).

From the point of view of static analysis the appearance of a member having an incomplete type provides explicit notification of likely usage, while the appearance of a member having a completed array type is likely to be taken at face value. Without more information on developer usage, expectations, and kinds of mistakes made it is not possible to say anything more on these possible usages.

1430 this is called a flexible array member.

The following are some situations where the member is ignored:

• forming part of a common initial sequence, even if it is the last member,

• compatibility checking across translation units, and

• if an initializer is given in a declaration (this is consistent with the idea that the usage for this type is to allocate variably sized objects via malloc).

1432 that replaces the flexible array member with an array of unspecified length.106) In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.
structure size with

Commentary

The C99 specification required implementations to put any padding before the flexible array member. However, several existing implementations (e.g., GNU C, Compaq C, and Sun C) put the padding after the flexible array member. Because of the efficiency gains that might be achieved by allowing implementations to put the padding after the flexible array member, the committee decided to sanction this form of layout. The wording was changed by the response to DR #282.


1433 However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed;

Commentary

The structure object acts as if it effectively grows to fill the available space (but it cannot shrink to smaller than the storage required to hold all the other members).
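The "longest array that would not make the structure larger than the object" rule amounts to a division. The sketch below uses the structure from the standard's own example; the function name `effective_elements` is hypothetical, and the function is an illustration of the arithmetic, not something an implementation is required to provide.

```c
#include <assert.h>
#include <stddef.h>

struct s { int n; double d[]; };

/* Number of elements d behaves as having in an object of `allocated` bytes. */
static size_t effective_elements(size_t allocated)
{
    size_t start = offsetof(struct s, d);
    if (allocated <= start)
        return 0;                      /* no room for even one element */
    return (allocated - start) / sizeof(double);
}
```

Leftover bytes that are too few to hold another element are simply unused, which is why a request that is not an exact multiple of the element size still has a well defined element count.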

1434 the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array.

Commentary

This is a requirement on the implementation. It effectively prevents an implementation inserting additional padding before the flexible array member, dependent on the size of the array. Fixing the offset of the flexible array member makes it possible for developers to calculate the amount of additional storage required to accommodate a given number of array elements.
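Because the offset is fixed, the storage needed for a given number of elements can be computed up front. The structure `samples` and the function `samples_bytes` are hypothetical; the sketch never returns less than the size of the structure itself, which is a conservative choice rather than something the text requires, and it does not guard against arithmetic overflow for very large counts.

```c
#include <assert.h>
#include <stddef.h>

struct samples { int n; double d[]; };

/* Bytes to request so that d can hold `count` elements. */
static size_t samples_bytes(size_t count)
{
    size_t needed = offsetof(struct samples, d) + count * sizeof(double);
    return needed < sizeof(struct samples) ? sizeof(struct samples) : needed;
}
```

Many developers write `sizeof(struct samples) + count * sizeof(double)` instead, which may request a few bytes more than necessary when the structure has trailing padding.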

1435 If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

insufficient storage has been allocated (assuming sizeof(float) != 1) for there to be more than zero elements in the array type of the member mem_2. However, the requirements in the C Standard are written on the assumption that it is not possible to create a zero sized object, hence this as-if specification.

Other Languages

Few languages support the declaration of object types requiring zero bytes of storage

1436 EXAMPLE Assuming that all array members are aligned the same, after the declarations:

struct s { int n; double d[]; };
struct ss { int n; double d[1]; };

the three expressions:

sizeof (struct s)
offsetof(struct s, d)
offsetof(struct ss, d)

have the same value. The structure struct s has a flexible array member d.

If sizeof (double) is 8, then after the following code is executed:

struct s *s1;
struct s *s2;
s1 = malloc(sizeof (struct s) + 64);
s2 = malloc(sizeof (struct s) + 46);


and assuming that the calls to malloc succeed, the objects pointed to by s1 and s2 behave, for most purposes, as if the identifiers had been declared as:

struct { int n; double d[8]; } *s1;
struct { int n; double d[5]; } *s2;

Following the further successful assignments:

s1 = malloc(sizeof (struct s) + 10);
s2 = malloc(sizeof (struct s) + 6);

they then behave as if the declarations were:

struct { int n; double d[1]; } *s1, *s2;


6.7.2.2 Enumeration specifiers

1439 enum-specifier:
    enum identifier opt { enumerator-list }
    enum identifier opt { enumerator-list , }
    enum identifier

enumerator-list:
    enumerator
    enumerator-list , enumerator

enumerator:
    enumeration-constant
    enumeration-constant = constant-expression

Many languages do not use a keyword to denote an enumerated type; the type is implicit in the general declaration syntax. Those languages that support enumeration constants do not always allow an explicit value to be given to an enumeration constant. The value is specified by the language specification (invariably using the same algorithm as C, when no explicit values are provided).

Support for enumeration constants was not included in the original K&R specification (support for this functionality was added during the early evolution of C[1199]). Many existing C90 implementations support a trailing comma at the end of an enumerator-list.

A general discussion on enumeration types is given elsewhere (sentence 517: enumeration, set of named constants).

The order in which enumeration constants are listed in an enumeration type declaration often follows

some rule, for instance:

• Application conventions (e.g., colors of rainbow, kings of England, etc.)

• Human conventions (e.g., increasing size, direction such as left-to-right or clockwise, alphabetic order, etc.)

• Numeric values (e.g., baud rate, Roman numerals, numeric value of enumeration constant, etc.)


organize information for later recall), there may be more than one possible ordering, or it may not be possible to create a meaningful ordering. For this reason no guideline recommendation is made here.

Do the visual layout factors that apply to the declaration of objects (sentence 1348.1: init-declarator, one per source line) also apply to enumeration constants?

The following are some of the differences between the declarations of enumeration constants and objects:

• There are generally significantly fewer declarations of enumeration constants than objects in a program (which might rule out a guideline recommendation on the grounds of applying to a construct that rarely occurs in source).

• Enumeration constants are usually declared amongst other declarations at file scope (i.e., they are not visually close to statements). One consequence of this is that, based on declarations being read on an as-needed basis (sentence 770: reading, kinds of), the benefits of maximizing the amount of surrounding code that appears on the display at the same time are likely to be small.

The following guideline recommendation is given for consistency with other layout recommendations.

Cg 1439.1

No more than one enumeration constant definition shall occur on each visible source code line.
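A layout that follows this recommendation might look like the sketch below. The enumeration and its constant names are hypothetical; the values are ordered numerically, one of the ordering rules listed earlier.

```c
#include <assert.h>

/* One enumeration constant definition per visible source line. */
enum baud_rate {
    BAUD_1200 = 1200,
    BAUD_2400 = 2400,
    BAUD_9600 = 9600
};
```

Writing the same definition as `enum baud_rate { BAUD_1200 = 1200, BAUD_2400 = 2400, BAUD_9600 = 9600 };` declares identical constants but does not follow the recommendation.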

The issue of enumeration constant naming conventions is discussed elsewhere.

Table 1439.1: Some properties of the set of values (the phrase all values refers to all the values in a particular enumeration definition) assigned to the enumeration constants in enumeration definitions. Based on the translated form of this book's benchmark programs.

Property                                                                      %
All values assigned implicitly                                                60.1
All values are bitwise distinct and zero is not used                          8.6
One or more constants share the same value                                    2.9
All values are continuous, i.e., number of enumeration constants
equals maximum value minus minimum value plus 1                               80.4


Constraints

1440 The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

Commentary

This constraint is consistent with the requirement that the value of a constant be in the range of representable values for its type (sentence 823: constant, representable in its type). Enumeration constants are defined to have type int (sentence 864: enumeration constant, type).

C++

7.2p1

The constant-expression shall be of integral or enumeration type.

7.2p4

If an initializer is specified for an enumerator, the initializing value has the same type as the expression.

Source developed using a C++ translator may contain enumeration initialization values that would be a constraint violation if processed by a C translator.

#include <limits.h>

Some implementations support enumeration constants having values that are only representable in the types unsigned int, long, or unsigned long.

The requirement is that the constant expression have a value that is representable as an int. The only requirement on its type is that it be an integer type. The constant expression may have a type other than int because of the use of a macro name that happens to have some other type, or because one of its operands happens to have a different type. If the constant expression consists, in the visible source, of an integer constant containing a suffix, it is possible that the original author or subsequent readers may assume some additional semantics are implied. However, such occurrences are rare and for this reason no guideline covering this case is given here.
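The distinction between the type of the defining expression and the value it must have can be sketched as follows. The macro BUF_LIMIT and the enumeration names are hypothetical, and the commented-out line shows the kind of definition that violates the C90/C99 constraint.

```c
#include <assert.h>
#include <limits.h>

#define BUF_LIMIT 4096UL     /* hypothetical macro having type unsigned long */

enum limits_demo {
    LD_SMALLEST = INT_MIN,   /* lowest value representable as an int */
    LD_LARGEST  = INT_MAX,   /* highest value representable as an int */
    LD_BUF      = BUF_LIMIT, /* expression type is unsigned long, value fits in int */
    LD_SUFFIXED = 10L        /* suffixed expression, but the constant still has type int */
    /* LD_TOO_BIG = INT_MAX + 1L    value not representable as an int */
};
```

Whatever the type of the defining expression, each constant declared here has type int, so `sizeof LD_BUF` and `sizeof LD_SUFFIXED` both equal `sizeof(int)`.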

There may be relationships between different enumeration constants in the same enumeration type. The issue of explicitly showing this relationship in the definition, using the names of those constants rather than purely numeric values, is a software engineering one and is not discussed further in these coding guidelines.

The enumeration constants defined by an enumerated type are a set of identifiers that provide a method of naming members having a particular property. These properties are usually distinct and in many cases the values used to represent them are irrelevant.

Semantics


1441 The identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted.107)

the closing brace, the type of each enumerator is the type of its initializing value.

In C the type of an enumeration constant is always int, independently of the integer type that is compatible with its enumeration type.
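One way to observe this is with the C11 _Generic selection, as in the sketch below. The IS_INT macro and the enumeration are hypothetical names introduced for illustration, and a C11 (or later) translator is assumed.

```c
#include <assert.h>

enum small { SMALL_A, SMALL_B };

/* Evaluates to 1 when the expression has type int, 0 otherwise. */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* The constants have type int, whatever integer type the
   implementation chose as compatible with enum small. */
static int constants_are_int(void)
{
    return IS_INT(SMALL_A) && IS_INT(SMALL_B);
}
```

Applying the same macro to an object declared as `enum small` may give a different answer, because the compatible type of the enumerated type is implementation-defined.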


Other Languages

Most languages that contain enumerator types treat the associated enumerated constants as belonging to a unique type that is not compatible with type int. In these languages an enumeration constant must be explicitly cast (Pascal provides a built-in function, ord) before it can appear where a constant having type int may appear.


Other Languages

Not all languages that support enumeration constants allow the value, used to represent them during program execution, to be specified in their definition.

Some guideline documents recommend against assigning an explicit value to an enumeration constant. Such recommendations limit enumeration types to having a symbolic role only. It has the effect of giving developers no choice but to use object-like macros (sentence 1931: macro, object-like) to create sets of identifiers having bit-set roles. Using macros instead of enumerations makes it much more difficult for static analysis tools to deduce an association between identifiers (it may still be made apparent to human readers by grouping of macro definitions and appropriate commenting), which in turn will reduce their ability to flag suspicious use of such identifiers.
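A bit-set role expressed with explicitly valued enumeration constants might look like the following sketch. The permission names and the helper perm_has are hypothetical.

```c
#include <assert.h>

/* Explicit values give each constant its own bit; grouping them in one
   enumeration keeps the association visible to tools as well as readers. */
enum perm {
    PERM_READ  = 1 << 0,
    PERM_WRITE = 1 << 1,
    PERM_EXEC  = 1 << 2
};

/* Nonzero when every bit of `wanted` is set in `held`. */
static int perm_has(unsigned held, unsigned wanted)
{
    return (held & wanted) == wanted;
}
```

The equivalent object-like macros (`#define PERM_READ 1`, and so on) would produce the same values, but nothing in the source would tie the three identifiers together.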

1443 If the first enumerator has no =, the value of its enumeration constant is 0.

Commentary

This choice is motivated by common usage and the fact that arrays are zero based. Most enumeration types contain relatively few enumeration constants and many do not explicitly assign a value to any of them (sentence 298: limit, enumeration constants).

Other Languages

This is the common convention specified by other languages, or by implementations of other languages that do not specify the initial value.

1444 Each subsequent enumerator with no = defines its enumeration constant as the value of the constant expression obtained by adding 1 to the value of the previous enumeration constant.

Commentary

If the previous enumeration constant had the value INT_MAX, adding one will produce a value that cannot be represented in an int (sentence 1440: enumeration constant, representable in int).

Other Languages

This is the common convention specified by other languages, or by implementations of other languages that do not specify the initial value.
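The two rules (start at 0, otherwise add 1 to the previous constant) can be seen in a small sketch. The enumeration and its constant names are hypothetical.

```c
#include <assert.h>

enum level {
    LOW,         /* no =, first enumerator: value 0 */
    MID,         /* no =: previous value plus 1, so 1 */
    HIGH = 10,   /* explicit value restarts the sequence */
    TOP          /* no =: previous value plus 1, so 11 */
};
```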

1445 (The use of enumerators with = may produce enumeration constants with values that duplicate other values in the same enumeration.)

Commentary

When such enumeration constants are tested for equality with each other the result will be 1 (true), because it is their values, not their spellings, that are compared.

C++

The C++ Standard does not explicitly mention this possibility, although it does give an example, 7.2p2, of an enumeration type containing more than one enumeration constant having the same value.

Other Languages

No language known to your author that supports the explicit definition of enumeration constant values prohibits the appearance of duplicate values in the same enumeration.

There are two ways in which more than one enumeration constant, in the same enumerated type, can have the same value. Either the values were explicitly assigned, or at least one of the values was implicitly assigned. This usage may be an oversight, or it may be intentional (i.e., fixing the names of the first and last enumeration constant when it is known that new members may be added at a later date). These guideline recommendations are not intended to recommend against the creation of faults in code (sentence 0: guidelines, not faults). What of the intended usage?


Do readers of the source assume there are no duplicate values among different enumeration constants from the same enumerated type? Unfortunately, use of enumeration constants is not sufficiently common among developers to provide the experience needed to answer this question.
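An intentional use of duplicate values, fixing the names of the first and last members, might be sketched as below. The enumeration, the alias names and both helper functions are hypothetical.

```c
#include <assert.h>

enum month {
    JAN,
    FEB,
    MAR,
    MONTH_FIRST = JAN,   /* deliberately duplicates the value of JAN */
    MONTH_LAST  = MAR    /* updated if members are added after MAR */
};

/* Constants compare by value, so aliases are equal to their targets. */
static int same_value(enum month a, enum month b)
{
    return a == b;
}

/* Number of members between the first and last alias, inclusive. */
static int month_count(void)
{
    return MONTH_LAST - MONTH_FIRST + 1;
}
```

Code that loops from `MONTH_FIRST` to `MONTH_LAST` does not need to change when a member is appended, provided the alias is updated at the same time.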

1446 The enumerators of an enumeration are also known as its members.

1447 Each enumerated type shall be compatible with an integer type;

The integer types include the enumeration types (sentence 519: integer types). The change of wording in the C99 Standard removes a circularity in the specification.

C++

The underlying type of an enumeration may be an integral type that can represent all the enumerator values.

While it is possible that source developed using a C++ translator may select a different integer type than a particular C translator, there is no effective difference in behavior because different C translators may also select different types.

Other Languages

Most languages that support enumerated types treat such types as being unique types, that is, not compatible with any other type.


Experience shows that developers are often surprised by some behaviors that occur when a translator selects a type other than int for the compatible type. The two attributes that developers appear to assume an enumerated type to have are promoting to a signed type (rather than unsigned) and being able to represent all the values that type int can (if values other than those in the enumeration definition are assigned to the object).

If the following guideline recommendation on enumerated types being treated as not being compatible with any integer type is followed, these assumptions are harmless.

Experience with enumerated types in more strongly typed languages has shown that the diagnostics issued when objects having these types, or their members, are mismatched in operations with other types, are a very effective method of locating faults. Also, a number of static analysis tools[502, 694, 1176] perform checks on the use of objects having an enumerated type and their associated enumeration constants (see footnote 1447.1).

1448 The choice of type is implementation-defined,108) but shall be capable of representing the values of all the members of the enumeration.

Commentary

This is a requirement on the implementation.

C90

The requirement that the type be capable of representing the values of all the members of the enumeration was added by the response to DR #071.

Other Languages

Languages that support enumeration types do not usually specify low level implementation details, such as the underlying representation.

Most implementations choose the type int. A few implementations attempt to minimize the amount of storage occupied by each enumerated type. They do this by selecting the compatible type to be the integer type with the lowest rank that can represent all constant values used in the definition of the contained enumeration constants.
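The properties that vary between implementations can be probed at run time, as in the sketch below. The enumeration and the two helper functions are hypothetical, and the results they return are implementation-defined, so no particular output is claimed.

```c
#include <assert.h>
#include <limits.h>

enum tiny { T_ZERO, T_ONE, T_MAX = 100 };

/* 1 if the compatible type chosen for enum tiny behaves as signed
   after promotion, 0 otherwise. */
static int tiny_is_signed(void)
{
    return (enum tiny)-1 < 0;
}

/* Number of bits of storage occupied by an object of type enum tiny. */
static unsigned tiny_bits(void)
{
    return (unsigned)(sizeof(enum tiny) * CHAR_BIT);
}
```

On an implementation that selects int, tiny_is_signed() returns 1 and tiny_bits() matches the width of int; one that minimizes storage could report a single byte and an unsigned type instead. The only portable guarantee is that every member value, here up to 100, can be represented.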

1447.1 However, this is not necessarily evidence of a worthwhile benefit. Vendors do sometimes add features to a product because of a perceived rather than actual benefit.


A definition of an enumeration type may not include (most don't) enumeration constants for each of the possible values that can be represented in the underlying value representation (invariably some integer type). The guideline recommendation that both operands of a binary operator have the same enumerated type limits,

be expected to apply to the given enumeration type (these are discussed in the C sentence for the respective operators). The following example shows two possibilities:

1449 The enumerated type is incomplete until after the } that terminates the list of enumerator declarations.


Following the closing brace of an enum-specifier, each enumerator has the type of its enumeration.

Example

The definition:

is not permitted (it is not possible to take the size of an incomplete type). But:

is conforming.
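A sketch of the two cases, assuming hypothetical names e_tag, e1, e2 and e_obj (they are not taken from the text), is:

```c
#include <assert.h>

/* Not permitted: inside the braces enum e_tag is still incomplete,
   so sizeof cannot be applied to it.
   enum e_tag { e1 = sizeof(enum e_tag) };                          */

/* Conforming: the type is complete once the closing brace has been
   passed, so sizeof can appear in the declarator that follows it.  */
enum e_tag { e1, e2 } e_obj[sizeof(enum e_tag)];
```

The array e_obj has as many elements as there are bytes in an object of type enum e_tag, a number that depends on the compatible type chosen by the implementation.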

1450 EXAMPLE The following fragment:

enum hue { chartreuse, burgundy, claret=20, winedark };
enum hue col, *cp;
col = claret;
cp = &col;
if (*cp != burgundy)
/* ... */

makes hue the tag of an enumeration, and then declares col as an object that has that type and cp as a pointer to an object that has that type. The enumerated values are in the set { 0, 1, 20, 21 }.

and in Ada as:

type hue is (chartreuse, burgundy, claret, winedark);
for hue use (chartreuse => 0, burgundy => 1, claret => 20, winedark => 21);

1451 Forward references: tags (6.7.2.3).

1452 107) Thus, the identifiers of enumeration constants declared in the same scope shall all be distinct from each other and from other identifiers declared in ordinary declarators.

Commentary

This requirement can be deduced from the fact that enumeration constants are in the same name space as ordinary identifiers (sentence 444: name space), they have no linkage, and that only one identifier with these attributes shall (a constraint)

C++

The C++ Standard does not explicitly make this observation.

Other Languages

Ada permits the same identifier to be defined as an enumeration constant in a different enumerated type in the same scope. References to such identifiers have to be explicitly disambiguated.


1453 108) An implementation may delay the choice of which integer type until all enumeration constants have been seen.

storage requirements needed to represent an object having an enumerated type until all of the members of that type had been processed. In the example below, a single pass implementation that minimizes the storage allocated, and uses different representations for pointers to different scalar types, would not be able to evaluate sizeof(enum e_T *) at the point its value is needed to give a value to e2.

This behavior is required of a C++ implementation because:

7.2p5 The underlying type of an enumeration is an integral type that can represent all the enumerator values defined in the enumeration.

C++ requirement that enforces the C one given above.


6.7.2.3 Tags

1455 Where two declarations that use the same tag declare the same type, they shall both use the same choice of struct, union, or enum.

The C90 Standard did not explicitly specify this constraint. While the behavior was therefore undefined, it is unlikely that the behavior of any existing code will change when processed by a C99 translator (and no difference is flagged here).

C++

7.1.5.3p3

The class-key or enum keyword present in the elaborated-type-specifier shall agree in kind with the declaration to which the name in the elaborated-type-specifier refers.

Incomplete types are needed to support the declaration of mutually recursive structure and union types. It is not possible to create a mutually recursive enumerated type and a declaration making use of self-referencing recursion is an edge case that does not appear to be of practical use.

C90

This C99 requirement was not specified in C90, which did not contain any wording that ruled out the declaration of an incomplete enumerated type (and confirmed by the response to DR #118). Adding this constraint brings the behavior of enumeration types in line with that for structure and union types (sentence 1118: sizeof, constraints).

Source code containing declarations of incomplete enumerator types will cause C99 translators to issue a diagnostic, where a C90 translator was not required to issue one.

C++

3.3.1p5

[Note: if the elaborated-type-specifier designates an enumeration, the identifier must refer to an already declared enum-name.

3.4.4p2


If the elaborated-type-specifier refers to an enum-name and this lookup does not find a previously declared enum-name, the elaborated-type-specifier is ill-formed.

declared in one file (see footnote 422.1)

a tag, denoting the same type, is declared more than once is when its type refers to another type in some mutually recursive way.


1458 The type is incomplete109) until the closing brace of the list defining the content, and complete thereafter.

Commentary

The closing brace that defines its content may occur in a separate declaration. Incomplete types are one of the three kinds of types defined in C (sentence 475: incomplete types). The only other incomplete type is void, which can never be completed (sentence 523: void is incomplete type).


C90

The C99 Standard more clearly specifies the intended behavior, which had to be inferred in the C90 Standard (sentence 1457: tag declarations, same scope).

C++

The C++ Standard specifies this behavior for class definitions (9.1p1), but does not explicitly specify this behavior for declarations in different scope.

If the guideline recommendation dealing with the reuse of identifier names (sentence 792.3: identifier, reusing names) is followed there will never be two distinct types with the same name. The case of distinct tags being declared with function prototype scope does not need a guideline recommendation. Such a declaration will render the function uncallable, as no type can be declared to be compatible with its parameter type. A translator will issue a diagnostic if a call to it occurs in the source.

1460 Each declaration of a structure, union, or enumerated type which does not include a tag declares a distinct type.

Commentary

A declaration of a structure or union type that includes a tag may declare a distinct type, or it may refer to a previously declared distinct type.

If one of the identifiers declared is a typedef name, it will be possible to refer to the type in other contexts (sentence 1468: footnote 110). If the identifier being declared is an object there is no standard defined way of referring to its type. Such types are sometimes known as anonymous types.

Two types have compatible type if they are the same (sentence 631: compatible type if). Types that are distinct are not the same.


A type specifier of the form

struct-or-union identifier opt { struct-declaration-list }

or

enum identifier { enumerator-list }

or

enum identifier { enumerator-list , }

declares a structure, union, or enumerated type.

Commentary

This specification provides semantics for a subset of the possible token sequences supported by the syntax of type-specifier (sentence 1378: type specifier, syntax). The difference between this brace delimited form and the semicolon terminated form (sentence 1464: struct tag;) is similar to the difference between the brace delimited and semicolon terminated form of function declarations (i.e., one specifies content and the other doesn't).

content

list defines

Commentary

This defines the terms structure content, union content, or enumeration content, which is the content referred to by the constraint requirement. The content is the members of the type declared, plus any type declarations


The term tag is not used in C++, which calls the equivalent construct a class name.

Table 1463.1: Occurrence of types declared with tag names (as a percentage of all occurrences of each keyword). Based on the visible form of the .c and .h files.

This form of declaration either declares, or redeclares, the identifier, as a tag, in the current scope. The following are some of the uses for this form of declaration:

• To support mutually referring declarations when there is the possibility that a declaration of one of the structures

• To provide a mechanism for information hiding. Developers can declare a tag in an interface without specifying the details of a type's implementation.

• In automatically generated code, where the generator does not yet have sufficient information to fully define the content of the type, but still needs to refer to it.
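The information hiding use listed above can be sketched in a single file as follows. The counter type and its three functions are hypothetical; in practice the first part would live in a header and the second in one source file.

```c
#include <assert.h>
#include <stdlib.h>

/* Interface part: the tag is declared, its content is not. Code that
   sees only these lines can hold pointers to struct counter but cannot
   access members or apply sizeof. */
struct counter;
struct counter *counter_new(void);
int counter_next(struct counter *c);
void counter_free(struct counter *c);

/* Implementation part: only here is the content visible. */
struct counter { int value; };

struct counter *counter_new(void)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

int counter_next(struct counter *c)
{
    return ++c->value;
}

void counter_free(struct counter *c)
{
    free(c);
}
```

A client calls `counter_new()`, passes the returned pointer to `counter_next()` as often as needed, and finishes with `counter_free()`, without ever seeing the member named value.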

1465 109) An incomplete type may only be used when the size of an object of that type is not needed.

Commentary

When is the size of an object not needed? Who, or what, needs the size and when do they need it?

The implementation needs the size of objects to allocate storage for them (sentence 1354: object, reserve storage). When does storage need to be allocated for an object? In theory, not until the object is encountered during program execution (and in practice for a few languages). However, delaying storage allocation until program execution incurs a high performance penalty. Knowing the size during translation enables much more efficient machine code to be generated. Also, knowing the size when the type is first encountered (if the size has to be known by the implementation) can simplify the job of writing a translator (many existing translators operated in a single pass).

The size of an object having an incomplete array type is not needed to access an element of that array (sentence 728: incomplete array, indexing).
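Indexing through an incomplete array type can be sketched as follows. The array name table and the accessor table_get are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

extern int table[];   /* incomplete array type: number of elements unknown here */

/* Indexing needs only the element size, not the number of elements,
   so this function can be translated before the array is defined. */
static int table_get(size_t i)
{
    return table[i];
}

int table[] = { 10, 20, 30 };   /* this definition completes the type */
```

Applying `sizeof table` before the final definition would not be permitted, because there the size of the object is needed.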

The Committee responses to defect reports (e.g., DR #017) asking where the size of an object is needed do not provide a list of places. Now the wording has been moved to a footnote, perhaps this discussion will subside.

C90


It declares a tag that specifies a type that may be used only when the size of an object of the specified type is not needed.

The above sentence appears in the main body of the standard, not a footnote.

The C99 wording is more general in that it includes all incomplete types. This is not a difference in

The C++ Standard contains no such rule, but enumerates the cases:

3.9p8 [Note: the rules for declarations and expressions describe in which contexts incomplete types are prohibited. ]

Other Languages

Knowing the size of objects is an issue in all computer languages. When the size needs to be known is sometimes decided by high-level issues of language design (some languages require their translators to effectively perform more than one pass over the source code); other times it is decided by implementation techniques.

semantic processing. An optimizer may perform multiple passes over the internal representation of statements in a function, deciding how best to generate machine code for them.

1466 It is not needed, for example, when a typedef name is declared to be a specifier for a structure or union, or when a pointer to or a function returning a structure or union is being declared.

This is one area where vendors are often silent on how their language extensions operate. For instance, the gcc typeof operator returns the type of its operand. However, the associated documentation says nothing about the case of the operand type being incomplete and having a tag that is identical to another definition occurring within the scope in which the typeof occurred. One interpretation (unsupported by any specification from the vendor) of the following:


is that both x and y are being declared as being pointers to the type of f, that is an incomplete type, and that the declaration of the tag fred, in a nested scope, has no effect on the declaration of y.

In these contexts the commonly used methods for mapping source code to machine code need to know the number of bytes in a type's object representation.

C90

The specification shall be complete before such a function is called or defined.

The form of wording has been changed from appearing to be a requirement (which would not be normative in a footnote) to being commentary.

1468 110) If there is no identifier, the type can, within the translation unit, only be referred to by the declaration of which it is a part.

x and y are compatible with each other. They both have the same anonymous type, but the object z has a different anonymous type. Note that the types of the objects x, y, and z would be considered to be compatible if they occurred in different translation units (sentence 633: compatible, separate translation units).
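Declarations with the properties described for x, y and z might look like the sketch below. The member name m and the function copy_demo are hypothetical, as is the exact form of the three declarations.

```c
#include <assert.h>

struct { int m; } x, y;   /* one declaration: x and y share one anonymous type */
struct { int m; } z;      /* a second declaration: a distinct anonymous type */

static void copy_demo(void)
{
    x.m = 7;
    y = x;        /* same type, so structure assignment is permitted */
    /* z = x;        incompatible types within one translation unit */
    z.m = x.m;    /* the members can still be copied individually */
}
```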

C90

This observation was new in the C90 Standard.


C++

7.1.5.3p1

If an elaborated-type-specifier is the sole constituent of a declaration, the declaration is ill-formed unless

The C++ Standard does not list enum identifier ; among the list of exceptions and a conforming C++ translator is required to issue a diagnostic for any instances of this usage.

The C++ Standard agrees with this footnote for its second reference in the C90 Standard.

1471 If a type specifier of the form

struct-or-union identifier

occurs other than as part of one of the above forms, and no other declaration of the identifier as a tag is visible, then it declares an incomplete structure or union type, and declares the identifier as the tag of that type.111)

Commentary

The forms of struct-or-union identifier excluded by this wording are the identifier being followed by a semicolon or a left brace. The remaining possible occurrences of this form are described elsewhere (sentence 1466: size not needed, examples) and include:

C++

The C++ Standard does not explicitly discuss this kind of construction/occurrence, although 3.9p6 and 3.9p7 discuss this form of incomplete type.

When no other declaration is visible at the point this type specifier occurs, should this usage be permitted? Perhaps it was intended that a tag be visible at the point in the source where this type specifier occurs. However, not having a prior declaration visible is either harmless (intended or otherwise), or will cause a diagnostic to be issued by a translator.

A pointer to an incomplete structure or union type is a more strongly typed form of generic pointer than a pointer to void. Whether this use of pointer to incomplete types, for information hiding purposes, is worthwhile can only be decided by the developer.

1472 If a type specifier of the form

struct-or-union identifier

or

enum identifier

occurs other than as part of one of the above forms, and a declaration of the identifier as a tag is visible, then it specifies the same type as that other declaration, and does not redeclare the tag.


Commentary

The forms of struct-or-union identifier excluded by this wording are the identifier being followed by a semicolon or a left brace. This is the one form that is not a declaration.

a member selection operator to a subscript operator). Also, experience suggests that this kind of change in type is not common.

• Changes in an object's structure type may occur as a program evolves. For instance, the object x may have structure type t_1 because it needs to represent information denoted by a few of the members of that type. At a later time the type t_1 may be subdivided into several structure types, with the members referenced by x being declared in the type t_1_3. Developers then have the choice of changing the declaration of x to be t_1_3, or leaving it alone. However, the prior use of a typedef name, rather than a tag, is unlikely to result in any cost savings when changing the declaration of x (i.e., developers are likely to have declared x to have type t_1, rather than a synonym of that type, so the declaration of x will have to be edited either way).

• What are the cognitive costs and benefits associated with the presence, or absence, of a keyword in the source of a declaration? There is a cost to readers of the visible source in having to process an extra token (i.e., the keyword). However, the visual presence of this keyword may reduce the cognitive effort needed to deduce the kind of declaration being made. There does not appear to be a significant cost/benefit difference between any of these cognitive issues.

1473 EXAMPLE 1 This mechanism allows declaration of a self-referential structure.

struct tnode {
    int count;
    struct tnode *left, *right;
};

The following alternative formulation uses the typedef mechanism:

[Figure 1472.1: Number of files containing a given number of references to each tag previously defined in the visible source of that file (times, bullet, square; the definition itself is not included in the count), tags with no definition visible in the .c file (triangle; i.e., it is defined in a header) and anonymous structure/union/enumeration definitions (star). Based on the visible form of the .c files.]

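typedef struct tnode TNODE;
struct tnode {
    int count;
    TNODE *left, *right;
};

EXAMPLE 2 To illustrate the use of prior declaration of a tag to specify a pair of mutually referential structures, the declarations

struct s1 { struct s2 *s2p; /* ... */ }; // D1
struct s2 { struct s1 *s1p; /* ... */ }; // D2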

specify a pair of structures that contain pointers to each other. Note, however, that if s2 were already declared as a tag in an enclosing scope, the declaration D1 would refer to it, not to the tag s2 declared in D2. To eliminate this context sensitivity, the declaration

struct s2;

may be inserted ahead of D1. This declares a new tag s2 in the inner scope; the declaration D2 then completes the specification of the new type.
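Put together, the three declarations and a use of the resulting types might look like the sketch below. The member id and the function link_pair are hypothetical additions that stand in for the elided members.

```c
#include <assert.h>
#include <stddef.h>

struct s2;                               /* declares the tag s2 in this scope */
struct s1 { struct s2 *s2p; int id; };   /* D1: refers to the s2 declared above */
struct s2 { struct s1 *s1p; int id; };   /* D2: completes struct s2 */

/* Make the two objects point at each other. */
static void link_pair(struct s1 *a, struct s2 *b)
{
    a->s2p = b;
    b->s1p = a;
}
```

After `link_pair(&a, &b)`, the expression `a.s2p->s1p` leads back to `a`.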


optimization techniques are based on the principle of “cacheing”: under certain circumstances the compiler can remember the last value accessed (read or written) from a location, and use this retained value the next time that location is read. (The memory, or “cache”, is typically a hardware register.) If this memory is a machine register, for instance, the code can be smaller and faster using the register rather than accessing external memory.
