6.7.2.1 Structure and union specifiers
Commentary
This wording specifies that the form:
struct-or-union identifier opt { struct-declaration-list }
declares a new type. Other forms of structure declaration that omit the braces either declare an identifier as a tag or refer to a previous declaration.
Other Languages
Whether or not a structure or union type definition is a new type may depend on a language's type compatibility rules. Languages that use structural equivalence may treat different definitions as being the same type (usually employing rules similar to those used by C for type compatibility across translation units).
1400 The struct-declaration-list is a sequence of declarations for the members of the structure or union.
Commentary
Say in words what is specified in the syntax.
1401 If the struct-declaration-list contains no named members, the behavior is undefined.
Commentary
The syntax does not permit the struct-declaration-list to be empty. However, it is possible for
unnamed
C++
9p1
An object of a class consists of a (possibly empty) sequence of members and base class objects.
Source developed using a C++ translator may contain class types having no members. This usage will result in undefined behavior when processed by a C translator.
Other Languages
The syntax of other languages invariably requires at least one member to be declared and does not permit zero-sized types to be defined.
Common Implementations
Most implementations issue a diagnostic when they encounter a struct-declaration-list that does not contain any named members. However, many implementations also implicitly assume that all declared objects have a nonzero size and, after issuing the diagnostic, may behave unpredictably when this assumption is not met.
Coding Guidelines
This construct did not occur in the source code used for this book’s code measurements and in practice occurrences are likely to be very rare (until version 3.3.1, gcc reported “internal compiler error” for many uses of objects declared to have such a type). A guideline recommendation is not considered worthwhile.
1402 The type is incomplete until after the } that terminates the list.
In the second definition the closing } (the one before the x) completes the type and the sizeof operator can be applied to the type.
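The following is a minimal sketch of the point at which a structure type becomes complete. It is not the example from the original text; the names list_node, outer, inner, and inner_copy_size are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Inside its own definition the type is still incomplete, but a
 * pointer to an incomplete type is permitted. */
struct list_node {
    struct list_node *next;
    int value;
};  /* struct list_node is complete after this brace */

struct outer {
    struct inner { int a; int b; } x;        /* the } before x completes struct inner */
    unsigned char raw[sizeof(struct inner)]; /* so sizeof can be applied here */
    /* unsigned char bad[sizeof(struct outer)];  not permitted:
     * struct outer is incomplete until its own closing brace */
};

/* Hypothetical helper: number of bytes in the raw member. */
size_t inner_copy_size(void)
{
    return sizeof(((struct outer *)0)->raw);  /* operand is not evaluated */
}
```

The commented-out member shows the case that a translator is expected to reject, because the enclosing type has not yet been completed.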
Java uses references for all non-primitive types. Storage for members having such types need not be allocated in the class type that contains the member declaration and there is no requirement that the number of elements allocated to a member having array type be known at translation time.
Table 1403.1: Occurrence of structure member types (as a percentage of the types of all such members). Based on the translated form of this book’s benchmark programs.
Type             %      Type              %     Type      %     Type       %
int              15.8   unsigned short    7.7   char *    2.3   void *()   1.3
other-types      12.7   struct            7.2   enum      1.9   float      1.2
unsigned char    11.1   unsigned long     5.2   long      1.8   short      1.0
unsigned int     10.4   unsigned          4.0   char      1.8   int *()    1.0
struct *          8.8   unsigned char[]   3.1   char[]    1.5
Table 1403.2: Occurrence of union member types (as a percentage of the types of all such members). Based on the translated form of this book’s benchmark programs.
Type              %      Type             %     Type              %     Type      %
struct            46.9   unsigned int     3.8   double            1.9   char[]    1.3
other-types       11.3   char *           2.8   enum              1.7   union *   1.1
struct *           8.3   unsigned long    2.4   unsigned char     1.5
int                6.0   unsigned short   2.1   struct[]          1.3
unsigned char[]    4.3   long             2.1   ( struct * )[]    1.3
1404 In addition, a member may be declared to consist of a specified number of bits (including a sign bit, if any).
Commentary
The ability to declare an object that consists of a specified number of bits is only possible inside a structure or union type declaration.
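As an illustration, the sketch below declares three members with explicit widths. The structure name flags, its member names, and the helper store_mode are hypothetical; only the behavior of the unsigned member is relied on, since the value range of a signed bit-field depends on the implementation's integer representation.

```c
#include <assert.h>

struct flags {
    unsigned int ready : 1;  /* one bit: values 0 and 1 */
    unsigned int mode  : 3;  /* three bits: values 0 to 7 */
    signed int   delta : 4;  /* four bits, one of which is the sign bit */
};

/* Hypothetical helper: store a value in the 3-bit member and read it back.
 * Conversion to an unsigned bit-field is reduced modulo 2 to the power 3. */
unsigned int store_mode(unsigned int value)
{
    struct flags f = { 0 };
    f.mode = value;
    return f.mode;
}
```

A value that does not fit, such as 9, is stored as 9 modulo 8 in the unsigned member.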
Other Languages
Some languages (e.g., CHILL) provide a mechanism for specifying how the elements of arrays are laid out and the number of bits they occupy. Languages in the Pascal family support the concept of subranges. A subrange allows the developer to specify the minimum and maximum range of values that an object needs to be able to represent. The implementation is at liberty to allocate whatever resources are needed to satisfy this requirement (some implementations simply allocate an integer's worth of storage, while others allocate the minimum number of bytes needed).
Coding Guidelines
Why would a developer want to specify the number of bits to be used in an object representation? This level of detail is usually considered to be low-level implementation information. Possible reasons for this usage include:
• Minimizing the amount of storage used by structure objects. This remains, and is likely to continue to remain, an important concern in applications where available storage is very limited (usually for cost reasons).
• There is existing code, originally designed to run in a limited storage environment. The fact that storage requirements are no longer an issue is rarely a cost-effective rationale for spending resources on removing bit-field specifications from declarations.
• Mapping to a hardware device. These are often interfaced via particular storage locations (organized as sequences of bits), or transfer data in some packed format. Being able to mirror the bit sequences of the hardware using some structure type can be a useful abstraction (which can require the specification of the number of bits to be allocated to each object).
• Mapping to some protocol-imposed layout of bits. For instance, the fields in a network data structure (e.g., TCP headers).
The following are some of the arguments that can be made for not using bit-field types:
• Many of the potential problems associated with objects declared to have an integer type, whose rank is less than int, also apply to bit-fields. However, one difference between them is that developers do not habitually use bit-fields, to the extent that character types are used. If developers don’t use bit-fields out of habit, but put some thought into deciding that their use is necessary, a guideline recommendation would be redundant (treating guideline recommendations as prepackaged decision aids).
• The specification of bit-field types involves a relatively large number of implementation-defined behaviors, dealing with how bit-fields are allocated in storage. However, recommending against the use of bit-fields only prevents developers from using one of the available techniques for accessing sequences of bits within objects. It is not obvious that bit-fields offer the least cost/benefit of all the available techniques (although some coding guideline documents do recommend against the use of bit-fields).
Bit-fields may be used to interface to some externally imposed storage layout requirements.
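One of the other available techniques for accessing sequences of bits is explicit masking and shifting on an integer object. The sketch below assumes a hypothetical 16-bit register whose low four bits hold a channel number and whose next three bits hold a gain value; the macro and function names are illustrative and do not come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout: bits 0-3 channel, bits 4-6 gain. */
#define CHANNEL_MASK  0x000Fu
#define CHANNEL_SHIFT 0u
#define GAIN_MASK     0x0070u
#define GAIN_SHIFT    4u

/* Extract the field selected by mask and shift. */
uint16_t field_get(uint16_t reg, uint16_t mask, unsigned int shift)
{
    return (uint16_t)((reg & mask) >> shift);
}

/* Return reg with the selected field replaced by value. */
uint16_t field_set(uint16_t reg, uint16_t mask, unsigned int shift,
                   uint16_t value)
{
    uint16_t cleared  = (uint16_t)(reg & (uint16_t)~mask);
    uint16_t inserted = (uint16_t)(((unsigned int)value << shift) & mask);
    return (uint16_t)(cleared | inserted);
}
```

Unlike bit-field members, the position of each field is fixed by the source code rather than by the implementation, at the cost of more verbose accesses.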
1405 Such a member is called a bit-field;104)
Commentary
Specifying in words the interpretation to be given to the syntax.
Other Languages
Declarations in languages in the Pascal family require the range of values that need to be representable to be specified in the declaration. The number of bits used is implementation-defined.
If the value 0 or 1 is stored into a nonzero-width bit-field of type _Bool, the value of the bit-field shall compare equal to the value stored.
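A small sketch of this requirement follows; the structure status and the helper roundtrip are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct status {
    _Bool connected : 1;
    _Bool dirty     : 1;
};

/* Store v in a one-bit _Bool bit-field and report whether the
 * value read back compares equal to the value stored. */
bool roundtrip(bool v)
{
    struct status s = { 0 };
    s.dirty = v;
    return s.dirty == v;
}
```

Both roundtrip(0) and roundtrip(1) are expected to return true on a conforming implementation.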
Commentary
There is no requirement on implementations to allocate the smallest possible storage unit. They may even allocate more bytes than sizeof(int).
Other Languages
Languages that support some form of object layout specification often require developers to specify the storage unit and the bit offset, within that unit, where the storage for an object starts.
Common Implementations
Many implementations allocate the same storage unit for bit-fields as they do for the type int. The only difference is that they will often allocate storage for more than one bit-field in such storage units.
Implementations that support bit-field types having a rank different from int usually base the properties of the storage unit used (e.g., alignment and size) on those of the type specifier used.
Coding Guidelines
Like other integer types, the storage unit used to hold bit-field types is decided by the implementation.
if ((&x.m_3 - &x.m_1) == sizeof(int))
if ((&x.m_3 - &x.m_1) == 2*sizeof(int))
This is a requirement on the implementation. However, any program written to verify what the implementation has done has to make use of other implementation-defined behavior. This requirement does not guarantee that all adjacent bit-fields will be packed in any way. An implementation could choose its addressable storage unit to be a byte, limiting the number of bit-fields that it is required to pack. However, if the storage unit used by an implementation is a byte, this requirement means that all members in the following declaration must be allocated storage in the same byte.
values from the two storage units into two registers, followed by a sequence of shift, bitwise-AND, and bitwise-OR operations. This wording allows implementation vendors to choose whether they want to support this usage, or leave bits in the storage unit unused.
Other Languages
Even languages that contain explicit mechanisms for specifying storage layout sometimes allow implementations to place restrictions on how objects straddle storage unit boundaries.
Common Implementations
Implementations that do not have alignment restrictions can access the appropriate bytes in a single load or store instruction and do not usually include a special case to handle overlapping storage units. Some processors include instructions[985] that can load/store a particular sequence of bits from/to storage.
Common Implementations
While there is no requirement that the ordering be the same for each sequence of bit-field declarations (within a structure type), it would be surprising if an implementation used a different ordering for different declarations. Many implementations use the allocation order implied by the order in which bytes are allocated within multibyte objects.
Coding Guidelines
The guideline recommendation dealing with the use of representation information is applicable here.
Example
Commentary
This behavior differs from that of the non-bit-field members, which is implementation-defined.
C++
The wording in the C++ Standard refers to the bit-field, not the addressable allocation unit in which it resides. Does this wording refer to the alignment within the addressable allocation unit?
9.6p1
Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.
Common Implementations
Implementations that support bit-field types having a rank different from int usually base the properties of the alignment used on those of the type specifier used.
Coding Guidelines
The guideline recommendation dealing with the use of representation information is applicable here.
1414 A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field.106)
Commentary
Memory-mapped devices and packed data sometimes contain sequences of bits that have no meaning assigned to them (sometimes called holes). When creating a sequence of bit-fields that map onto the meaningful values, any holes also need to be taken into account. Unnamed bit-fields remove the need to create an anonymous name (sometimes called a dummy name) to denote the bit sequences occupied by the holes. In some cases the design of a data structure might involve having some spare bits, between certain members, for future expansion.
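The sketch below shows unnamed bit-fields standing in for holes and spare bits. The layout of control_reg is hypothetical, and how the members are actually positioned within a storage unit remains implementation-defined.

```c
#include <assert.h>

struct control_reg {
    unsigned int enable : 1;
    unsigned int        : 3;  /* hole: bits with no assigned meaning */
    unsigned int mode   : 2;
    unsigned int        : 2;  /* spare bits for future expansion */
};

/* Unnamed members take no part in initialization, so the two
 * initializers below apply to enable and mode respectively. */
unsigned int mode_after_init(void)
{
    struct control_reg r = { 1, 2 };
    return r.mode;
}
```

Because the holes have no names, they cannot be read or written by accident, and no dummy identifiers appear in the source.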
declares abits to have type array of bit-field, rather than being a bit-field of an array type (which would also violate a constraint). Similarly pbits has type pointer to bit-field.
One of the principles that the C committee derived from the spirit of C was that an operation should not expand to a surprisingly large amount of machine code. Arrays of bit-fields potentially require the generation of machine code to perform relatively complex calculations, compared to non-bit-field element accesses, to calculate the offset of an element from the array index, and to extract the necessary bits.
The C pointer model is based on the byte as the smallest addressable storage unit. As such it is not possible to express the address of individual bits within a byte.
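The kind of calculation involved can be seen in a hand-written emulation of an array of one-bit elements. This is only a sketch under the assumption that elements are packed into an array of unsigned char; the names bit_get and bit_set are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Read element index: split the index into a byte offset and a
 * bit position within that byte, then shift and mask. */
int bit_get(const unsigned char *bits, size_t index)
{
    return (bits[index / CHAR_BIT] >> (index % CHAR_BIT)) & 1;
}

/* Write element index using the same offset calculation. */
void bit_set(unsigned char *bits, size_t index, int value)
{
    unsigned char mask = (unsigned char)(1u << (index % CHAR_BIT));
    if (value)
        bits[index / CHAR_BIT] |= mask;
    else
        bits[index / CHAR_BIT] &= (unsigned char)~mask;
}
```

Every access needs a division, a remainder, a shift, and a mask, which is the extra machine code that a built-in array of bit-fields would have to generate implicitly.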
Other Languages
Some languages (e.g., Ada, CHILL, and Pascal) support arrays of objects that only occupy some of the bits of a storage unit. When translating such languages, calling a library routine that extracts the bits corresponding to the appropriate element is often a cost-effective implementation technique. Not only does the offset need to be calculated from the index, but the relative position of the bit sequence within a storage unit will depend on the value of the index (unless its width is an exact division of the width of the storage unit). Pointers to objects that do not occupy a complete storage unit are rarely supported in any language.
1419 105) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int,
This footnote is new in C99.
1420 106) An unnamed bit-field structure member is useful for padding to conform to externally imposed layouts.
Commentary
Bit-fields, named or otherwise, are in general useful for padding to conform to externally imposed layouts.
Coding Guidelines
By their nature unnamed bit-fields do not provide any naming information that might help reduce the effort needed to comprehend the source code.
1421 Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
Commentary
The standard does not require the alignment of other kinds of objects to be documented. Developers sometimes need to be able to calculate the offsets of members of structure types (the offsetof macro was introduced into C90 to provide a portable method of obtaining this information). Knowing the size of each member, the relative order of members, and their alignment requirements is invariably sufficient information.
may depend on the types of the members (because of the requirement that a pointer to an object behave the same as a pointer to the first element of an array having the same object type).
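A short sketch of the offsetof macro in use follows. The structure record and the helper weight_via_offset are hypothetical; the actual offsets depend on the implementation's alignment choices, so the code only relies on the value that offsetof returns.

```c
#include <assert.h>
#include <stddef.h>

struct record {
    char   tag;
    double weight;  /* padding may precede this member */
    short  count;
};

/* Read the weight member through its byte offset from the start
 * of the object, rather than through the member name. */
double weight_via_offset(const struct record *r)
{
    const unsigned char *base = (const unsigned char *)r;
    return *(const double *)(base + offsetof(struct record, weight));
}
```

Using offsetof avoids hard-coding assumptions about member sizes and padding into the source.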
1422 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
Commentary
Although not worded as such, this is effectively a requirement on the implementation. It is consistent with a requirement on the result of comparisons of pointers to members of the same structure object. Prior to
• Making use of information on the layout of members to overlay the storage they occupy with other objects.
By specifying this ordering requirement the committee prevented implementations from using a different ordering (for optimization reasons), increasing the chances that existing practices would continue to work as expected (these practices also rely on other implementation-defined behaviors). The cost of breaking existing code and reducing the possibility of being able to predict member storage layout was considered to outweigh any performance advantages that might be obtained from allowing implementations to choose the relative order of members.
be minimized. A few Pascal (and Ada) implementations reorder members to reduce the storage they use, or to change alignments to either reduce the total storage requirements or to reduce access costs for some frequently used members.
Common Implementations
The quantity and quality of analysis needed to deduce when it is possible to reorder members of structures has deterred implementors from attempting to make savings, for the general case, in this area. Some impressive savings have been made by optimizers[751] for languages that do not make this pointer to member guarantee. Palem and Rabbah[1062] looked at the special case of dynamically allocated objects used to create tree structures; such structures usually require the creation of many objects having the same type. A common characteristic of some operations on tree structures is that an access to an object, using a particular member name, is likely to be closely followed by another access to an object using the same member name. Rather than simply reordering members, they separated out each member into its own array, based on dynamic profiles of member accesses (the Trimaran[1399] and gcc compilers were modified to handle this translation internally; it was invisible to the developer). For instance in:
dynamically allocating storage for an object having type struct T resulted in storage for the two arrays being allocated. A second dynamic allocation request requires no storage to be allocated; the second array element from the first allocation can be used. If tree structures are subsequently walked in an order that is close to the order in which they are built, there is an increased probability that members having the same name will be in the same cache line. Using a modified gcc to process seven data-intensive benchmarks resulted in an average performance improvement of 24% on Intel Pentium II and III, and 9% on Sun Ultra-Sparc-II. An analysis of the Olden benchmark using the same techniques by Shin, Kim, Kim and Han[1254] found that L1 and L2 cache misses were reduced by 23% and 17% respectively and cache power consumption was reduced by 18%.
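The idea of storing each member in its own array can be sketched by hand, as below. This is not the translation performed by the modified compilers, which was applied internally and was invisible to the developer; the pool type, its fixed capacity, and the function names are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

enum { POOL_CAPACITY = 256 };  /* hypothetical fixed capacity */

/* One array per member, instead of one object per node. */
struct split_pool {
    int    *key;   /* every key member, stored contiguously */
    long   *next;  /* every link member, as an element index (-1 = none) */
    size_t  used;
};

/* Returns the index of a fresh element, or -1 on failure. The first
 * request allocates both arrays; later requests take the next element. */
long split_alloc(struct split_pool *p)
{
    if (p->key == NULL) {
        p->key  = malloc(POOL_CAPACITY * sizeof *p->key);
        p->next = malloc(POOL_CAPACITY * sizeof *p->next);
        if (p->key == NULL || p->next == NULL) {
            free(p->key);
            free(p->next);
            p->key = NULL;
            p->next = NULL;
            return -1;
        }
        p->used = 0;
    }
    if (p->used == POOL_CAPACITY)
        return -1;  /* this sketch does not grow the arrays */
    p->next[p->used] = -1;
    return (long)p->used++;
}

void split_release(struct split_pool *p)
{
    free(p->key);
    free(p->next);
    p->key = NULL;
    p->next = NULL;
    p->used = 0;
}
```

With this layout the key values of consecutively allocated elements are adjacent in storage, which is the property the cache-line argument above depends on.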
Franz and Kistler[453] describe an optimization that splits objects across non-contiguous storage areas to improve cache performance. However, their algorithm only applies to strongly typed languages where developers cannot make assumptions about member layout, such as Java.
Zhang and Gupta[1545] developed what they called the common-prefix and narrow-data transformations. These compress 32-bit integer values and 32-bit address pointers into 15 bits. This transformation is dynamically applied (the runtime system checks to see if the transformation can be performed) to the members of dynamically allocated structure objects, enabling two adjacent members to be packed into a 32-bit word (a bit is used to indicate a compressed member). The storage optimization comes from the commonly seen behavior: (1) integer values tend to be small (the runtime system checks whether the top 18 bits are all 1’s or all 0’s), and (2) the addresses of the links, in a linked data structure, are often close to the address of the object they refer to (the runtime system checks whether the two addresses have the same top 17 bits). Extra machine code has to be generated to compress and uncompress members, which increases code size (average of 21% on the user code, excluding linked libraries) and lowers runtime performance (average 30%). A reduction in heap usage of approximately 25% was achieved (the Olden benchmarks were used).
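The two runtime checks described above can be sketched as follows, assuming 32-bit values held in the fixed-width types from <stdint.h>. The function names are hypothetical and the sketch covers only the tests, not the packing itself or any other part of the published transformation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Narrow-data check: the top 18 bits (bits 31..14) are all 0's or
 * all 1's, so the value can be rebuilt from its low 15 bits. */
bool fits_in_15_bits(int32_t value)
{
    uint32_t top18 = (uint32_t)value >> 14;
    return top18 == 0u || top18 == 0x3FFFFu;
}

/* Common-prefix check: the link and the object it is stored in
 * have the same top 17 bits (bits 31..15). */
bool shares_top_17_bits(uint32_t link_address, uint32_t object_address)
{
    return (link_address >> 15) == (object_address >> 15);
}
```

The shifts are performed on unsigned values, so the result does not depend on how the implementation shifts negative numbers.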
Coding Guidelines
The order of storage layout of the members in a structure type is representation information that is effectively guaranteed. It would be possible to use this information, in conjunction with the offsetof macro, to write code to access specific members of a structure, using pointers to other members. However, use of information on the relative ordering of structure members tends not to be code based, but data based (the same object is interpreted using different types). The coding guideline issues associated with the layout of types are discussed elsewhere.
1423 A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa.
Commentary
Although not worded as such, this is effectively a requirement on the implementation. The only reason for preventing implementations inserting padding at the start of a structure type is existing practice (and the resulting existing code that treats the address of a structure object as being equal to the address of the first member of that structure).
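The guarantee can be illustrated with a small sketch; the structure header and the helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

struct header {
    int    kind;     /* initial member: no padding may precede it */
    double payload;
};

/* A pointer to the structure, suitably converted, points to kind. */
int *kind_of(struct header *h)
{
    return (int *)h;
}

/* ... and vice versa: a pointer to the initial member, suitably
 * converted, points to the enclosing structure object. */
struct header *from_kind(int *k)
{
    return (struct header *)k;
}

bool same_address(struct header *h)
{
    return (void *)h == (void *)&h->kind;
}
```

The conversion back is only meaningful when the int object really is the initial member of a struct header object.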
Other Languages
Most languages do not go into this level of representation detail.
layout algorithm used by another language, or even that used by another execution environment.
The standard does not guarantee that two structure types having exactly the same member types have exactly the same storage layout, unless they are part of a common initial sequence.
Unnamed padding occupies storage bytes within an object. The pattern of bits set, or unset, within these bytes can be accessed explicitly by a conforming program (using the memcpy or memset library functions). They may also be accessed implicitly during assignment of structure objects. It is the values of these bytes that is a potential cause of unexpected behavior when the memcmp (amongst others) library function is used to compare two objects having structure type.
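One way of sidestepping the padding issue is to compare objects member by member instead of calling memcmp on the whole object. The sketch below assumes a hypothetical structure sample, in which many implementations insert padding after the char member.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct sample {
    char c;   /* padding bytes commonly follow this member */
    int  i;
};

/* Compare only the named members; padding bytes are never examined,
 * so their values cannot affect the result. */
bool sample_equal(const struct sample *a, const struct sample *b)
{
    return a->c == b->c && a->i == b->i;
}
```

Two objects whose members hold equal values compare equal here even when their padding bytes differ, a case in which memcmp may report a difference.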
member when written to
Other Languages
Pascal supports a construct, called a variant tag, that can be used by implementations to check that the member being read from was the last member assigned to. However, use of this construct does require that developers explicitly declare such a tag within the type definition. A few implementations perform the check suggested by the language standard. Ada supports a similar construct and implementations are required to perform execution time checks, when a member is accessed, on what it calls the discriminant (which holds information on the last member assigned to).
Common Implementations
distinguish between different pointer types and different integer types having the same size) where the member read is different from the last member stored in.
1427 A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
Commentary
Although not worded as such, this is effectively a requirement on the implementation. A consequence of this requirement is that all members of a union type have the same offset from the start of the union, zero. A previous requirement dealt with pointer equality between different members of the same union object. This C sentence deals with pointer equality between a pointer to an object having the union type and a pointer to one of the members of such an object.
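A minimal sketch of this consequence follows; the union value and the helper members_share_address are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

union value {
    int           i;
    double        d;
    unsigned char bytes[sizeof(double)];
};

/* Every member starts at the address of the union object itself. */
bool members_share_address(union value *u)
{
    return (void *)u == (void *)&u->i
        && (void *)u == (void *)&u->d
        && (void *)u == (void *)u->bytes;
}
```

The same fact can be expressed at translation time: offsetof applied to any member of the union yields zero.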
Other Languages
Strongly typed languages do not usually (Algol 68 does) provide a mechanism that returns the addresses of members of union (or structure) objects. The result of this C requirement (that all members have the same address) is not always specified, or implemented, in other languages. It may be more efficient on some processors, for instance, for members to be aligned differently (given that in many languages unions may only be contained within structure declarations and so could follow other members of a structure).
Common Implementations
The fact that pointers to different types can refer to the same storage location, without the need for any form of explicit type conversion, is something that optimizers performing points-to analysis need to take into account.
it is the alignment requirements of the member types, rather than their size, that determine whether there is any unnamed padding at the end of the union type. When one member has a type that often requires alignment on an even address and another member contains an odd number of bytes, it is likely that some unnamed padding will be used.
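The situation can be sketched with a union containing a short and a three-byte array. The union odd and the helper odd_trailing_bytes are hypothetical, the amount of padding is implementation-defined, and the C11 _Alignof operator is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>

union odd {
    short s;      /* often aligned on an even address */
    char  c[3];   /* an odd number of bytes */
};

/* Bytes in the union beyond those needed by its largest member. */
size_t odd_trailing_bytes(void)
{
    return sizeof(union odd) - sizeof(((union odd *)0)->c);
}
```

On an implementation where short occupies two bytes and requires two-byte alignment, the union is likely to occupy four bytes, one of which is unnamed padding.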
C++
The only time this possibility is mentioned in the C++ Standard is under the sizeof operator:
5.3.3p2
When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.
Other Languages
The algorithms used to assign offsets to structure members are common to implementations of many languages, including the rationale for unnamed padding at the end. Few language definitions explicitly call out the fact that structure or union types may have unnamed padding at their end.
Common Implementations
Most implementations use the same algorithm for assigning member offsets and creating unnamed padding for all structure and union types in a program, even when these types are anonymous (performing the analysis to deduce whether the padding is actually required is not straightforward). Such an implementation strategy is likely to waste a few bytes in some cases. But it has the advantage that, for a given implementation and set of translator options, the same structure declarations always have the same size (there may not be any standard’s requirement for this statement to be true, but there is sometimes a developer expectation that it is true).
Coding Guidelines
Unnamed padding is a representation detail associated with storage layout. That this padding may occur after the last declared member is simply another surprise awaiting developers who try to make use of storage layout details. The guideline recommendation dealing with the use of representation information is applicable.
The Committee introduced this special case, in C99, to provide a standard-defined method of using what has become known as the struct hack. Developers sometimes want a structure object to contain an array object whose number of elements is decided during program execution. A standard, C90, well-defined technique is to have a member point at dynamically allocated storage. However, some developers, making use of representation information, caught onto the idea of simply declaring the last member to be an array of one element. Storage for the entire structure object is dynamically allocated, with the storage allocation request including sufficient additional storage for the necessary extra array elements. Because array elements are contiguous and implementations are not required to perform runtime checks on array indexes, the additional storage could simply be treated as being additional array elements. This C90 usage causes problems for translators that perform sophisticated flow analysis, because the size of the object being accessed does not correspond to the size of the type used to perform the access. Should such translators play safe and treat all structure types containing a single element array as their last member as if they will be used in a struct hack manner?
The introduction of flexible array members, in C99, provides an explicit mechanism for developers to indicate to the translator that objects having such a type are likely to have been allocated storage to make use of the struct hack.
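A minimal sketch of allocating an object whose last member is a flexible array member is given below. The structure message and the function message_new are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct message {
    size_t length;
    char   text[];   /* flexible array member (C99) */
};

/* Allocate a message with room for a copy of src, including its
 * terminating null character. Returns NULL if allocation fails. */
struct message *message_new(const char *src)
{
    size_t len = strlen(src);
    struct message *m = malloc(sizeof *m + len + 1);
    if (m == NULL)
        return NULL;
    m->length = len;
    memcpy(m->text, src, len + 1);
    return m;
}
```

A single call to free releases both the fixed members and the array, which is one of the practical attractions of this layout over a separately allocated pointed-to array.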
The presence of a member having an incomplete type does not cause the structure type that contains it to have an incomplete type.
C90
The issues involved in making use of the struct hack were raised in DR #051. The response pointed out that declaring the member to be an array containing fewer elements and then allocating extra storage for additional elements was not strictly conforming. However, declaring the array to have a large number of elements and allocating storage for fewer elements was strictly conforming.
Common Implementations
All known C90 implementations exhibit the expected behavior for uses of the struct hack. However, some static analysis tools issue a diagnostic on calls to malloc that request an amount of storage that is not consistent (e.g., smaller or not an exact multiple) with the size of the type pointed to by any explicit cast of its return value.
Coding Guidelines
Is the use of flexible array members more or less error prone than using any of the alternatives?
The struct hack is not widely used, or even widely known about by developers (although there may be some development communities that are familiar with it). It is likely that many developers will not be expecting this usage. Use of a member having a pointer type, with the pointed-to object being allocated during program execution, is a more common idiom (although more statements are needed to allocate and deallocate storage; and experience suggests that developers sometimes forget to free up the additional pointed-to storage, leading to storage leakage).
From the point of view of static analysis the appearance of a member having an incomplete type provides explicit notification of likely usage, while the appearance of a member having a completed array type is likely to be taken at face value. Without more information on developer usage, expectations, and kinds of mistakes made it is not possible to say anything more on these possible usages.
1430 this is called a flexible array member.
The following are some situations where the member is ignored:
• forming part of a common initial sequence, even if it is the last member,
• compatibility checking across translation units, and
• if an initializer is given in a declaration (this is consistent with the idea that the usage for this type is to allocate variably sized objects via malloc).
1432 that replaces the flexible array member with an array of unspecified length.106) In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.
Commentary
The C99 specification required implementations to put any padding before the flexible array member. However, several existing implementations (e.g., GNU C, Compaq C, and Sun C) put the padding after the flexible array member. Because of the efficiency gains that might be achieved by allowing implementations to put the padding after the flexible array member the committee decided to sanction this form of layout. The wording was changed by the response to DR #282.
1433 However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed;
Commentary
The structure object acts as if it effectively grows to fill the available space (but it cannot shrink to smaller than the storage required to hold all the other members).
1434 the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array.
Commentary
This is a requirement on the implementation. It effectively prevents an implementation inserting additional padding before the flexible array member, dependent on the size of the array. Fixing the offset of the flexible array member makes it possible for developers to calculate the amount of additional storage required to accommodate a given number of array elements.
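The calculation can be sketched as follows, using a structure with the same shape as the one in the standard's own example. The helper names s_alloc_size and s_capacity are hypothetical, and the sketch ignores the possibility of arithmetic overflow for very large counts.

```c
#include <assert.h>
#include <stddef.h>

struct s { int n; double d[]; };

/* Bytes to request for a struct s holding count elements. The result
 * is never smaller than sizeof(struct s), which may include trailing
 * padding after the flexible array member's offset. */
size_t s_alloc_size(size_t count)
{
    size_t needed = offsetof(struct s, d) + count * sizeof(double);
    return needed < sizeof(struct s) ? sizeof(struct s) : needed;
}

/* Number of elements of d that fit in an allocation of the given
 * size, which is assumed to be at least sizeof(struct s). */
size_t s_capacity(size_t allocated_bytes)
{
    return (allocated_bytes - offsetof(struct s, d)) / sizeof(double);
}
```

Because the offset of d does not depend on how many elements are allocated, the two functions are inverses of each other for any nonzero count on typical implementations.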
1435 If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
insufficient storage has been allocated (assuming sizeof(float) != 1) for there to be more than zero elements in the array type of the member mem_2. However, the requirements in the C Standard are written on the assumption that it is not possible to create a zero-sized object, hence this as-if specification.
Other Languages
Few languages support the declaration of object types requiring zero bytes of storage.
1436 EXAMPLE Assuming that all array members are aligned the same, after the declarations:
struct s { int n; double d[]; };
struct ss { int n; double d[1]; };
the three expressions:
sizeof (struct s)
offsetof(struct s, d)
offsetof(struct ss, d)
have the same value. The structure struct s has a flexible array member d.
If sizeof (double) is 8, then after the following code is executed:
struct s *s1;
struct s *s2;
s1 = malloc(sizeof (struct s) + 64);
s2 = malloc(sizeof (struct s) + 46);
and assuming that the calls to malloc succeed, the objects pointed to by s1 and s2 behave, for most purposes,
as if the identifiers had been declared as:
struct { int n; double d[8]; } *s1;
struct { int n; double d[5]; } *s2;
Following the further successful assignments:
s1 = malloc(sizeof (struct s) + 10);
s2 = malloc(sizeof (struct s) + 6);
they then behave as if the declarations were:
struct { int n; double d[1]; } *s1, *s2;
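The as-if array lengths in this example follow from integer division of the storage beyond the offset of d by the element size. A minimal sketch of that arithmetic, assuming the object is at least as large as the offset of the flexible array member (the helper name `flex_elems` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

struct s { int n; double d[]; };

/* Number of whole elements of d that fit in an object of obj_size bytes.
   A result of 0 corresponds to the case where the member behaves as if
   it had one element that must not be accessed. */
static size_t flex_elems(size_t obj_size)
{
    assert(obj_size >= offsetof(struct s, d));
    return (obj_size - offsetof(struct s, d)) / sizeof(double);
}
```

Under the example's assumptions (sizeof (double) is 8 and sizeof (struct s) equals the offset of d), 64 extra bytes give 8 elements, 46 give 5, and 10 give 1.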
6.7.2.2 Enumeration specifiers
1439 enum-specifier:
    enum identifier opt { enumerator-list }
    enum identifier opt { enumerator-list , }
    enum identifier
enumerator-list:
Many languages do not use a keyword to denote an enumerated type; the type is implicit in the general
declaration syntax. Those languages that support enumeration constants do not always allow an explicit
value to be given to an enumeration constant. The value is specified by the language specification (invariably
using the same algorithm as C, when no explicit values are provided).
Common Implementations
Support for enumeration constants was not included in the original K&R specification (support for this
functionality was added during the early evolution of C [1199]). Many existing C90 implementations support a
trailing comma at the end of an enumerator-list.
Coding Guidelines
A general discussion on enumeration types is given elsewhere (517 enumeration, set of named constants).
The order in which enumeration constants are listed in an enumeration type declaration often follows
some rule, for instance:
• Application conventions (e.g., colors of rainbow, kings of England, etc.)
• Human conventions (e.g., increasing size, direction, such as left-to-right or clockwise, alphabetic
order, etc.)
• Numeric values (e.g., baud rate, Roman numerals, numeric value of enumeration constant, etc.)
organize information for later recall), there may be more than one possible ordering, or it may not be possible
to create a meaningful ordering. For this reason no guideline recommendation is made here.
Do the visual layout factors that apply to the declaration of objects (1348.1 init-declarator, one per source line) also apply to enumeration constants?
The following are some of the differences between the declarations of enumeration constants and objects:
• There are generally significantly fewer declarations of enumeration constants than objects in a program (which might rule out a guideline recommendation on the grounds of applying to a construct that rarely occurs in source).
• Enumeration constants are usually declared amongst other declarations at file scope (i.e., they are not visually close to statements). One consequence of this is that, based on declarations being read on
an as-needed basis (770 reading, kinds of), the benefits of maximizing the amount of surrounding code that appears on the
display at the same time are likely to be small.
The following guideline recommendation is given for consistency with other layout recommendations.
Cg 1439.1
No more than one enumeration constant definition shall occur on each visible source code line.
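A layout that follows this recommendation might look like the sketch below. The enumeration `baud_rate` and its constants are hypothetical, picked to match the numeric-value ordering mentioned earlier.

```c
#include <assert.h>

/* One enumeration constant per visible source line,
   listed in increasing numeric order. */
enum baud_rate {
    BAUD_1200 = 1200,
    BAUD_2400 = 2400,
    BAUD_9600 = 9600
};

/* The ordering rule can be checked during translation (C11). */
static_assert(BAUD_1200 < BAUD_2400 && BAUD_2400 < BAUD_9600,
              "constants are listed in increasing order");
```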
The issue of enumeration constant naming conventions is discussed elsewhere
Table 1439.1: Some properties of the set of values assigned to the enumeration constants in enumeration definitions (the phrase all values refers to all the values in a particular enumeration definition). Based on the translated form of this book’s benchmark programs.

Property                                                              %
All values assigned implicitly                                     60.1
All values are bitwise distinct and zero is not used                8.6
One or more constants share the same value                          2.9
All values are continuous, i.e., number of enumeration constants
equals maximum value minus minimum value plus 1                    80.4
Constraints
1440 The expression that defines the value of an enumeration constant shall be an integer constant expression that
has a value representable as an int.
Commentary
This constraint is consistent with the requirement that the value of a constant be in the range of representable
values for its type (823 constant, representable in its type). Enumeration constants are defined to have type int (864 enumeration constant type).
C++
7.2p1
The constant-expression shall be of integral or enumeration type.
7.2p4
If an initializer is specified for an enumerator, the initializing value has the same type as the expression.
Source developed using a C++ translator may contain enumeration initialization values that would be a
constraint violation if processed by a C translator.
1 #include <limits.h>
2
Common Implementations
Some implementations support enumeration constants having values that are only representable in the types
unsigned int, long, or unsigned long.
Coding Guidelines
The requirement is that the constant expression have a value that is representable as an int. The only
requirement on its type is that it be an integer type. The constant expression may have a type other than int
because of the use of a macro name that happens to have some other type, or because one of its operands
happens to have a different type. If the constant expression consists, in the visible source, of an integer
constant containing a suffix, it is possible that the original author or subsequent readers may assume some
additional semantics are implied. However, such occurrences are rare and for this reason no guideline
covering this case is given here.
There may be relationships between different enumeration constants in the same enumeration type. The
issue of explicitly showing this relationship in the definition, using the names of those constants rather than
purely numeric values, is a software engineering one and is not discussed further in these coding guidelines.
The enumeration constants defined by an enumerated type are a set of identifiers that provide a method of
naming members having a particular property. These properties are usually distinct and in many cases the
values used to represent them are irrelevant.
Semantics
1441 The identifiers in an enumerator list are declared as constants that have type int and may appear wherever
the closing brace, the type of each enumerator is the type of its initializing value
In C the type of an enumeration constant is always int, independently of the integer type that is compatible with its enumeration type.
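This can be checked during translation with a C11 generic selection. The sketch below is illustrative only: the macro `IS_INT` and the enumeration names are hypothetical, and it assumes every enumerator value is representable in int.

```c
#include <assert.h>

enum small { S_ZERO, S_ONE };   /* compatible type is chosen by the implementation */
enum { E_LONG = 10L };          /* initializer has type long, value fits in int */

/* Evaluates to 1 if the expression has type int, 0 otherwise. */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

static_assert(IS_INT(S_ONE), "an enumeration constant has type int");
static_assert(IS_INT(E_LONG), "the type of the initializer is not retained");
static_assert(sizeof(S_ONE) == sizeof(int), "same size as int");
```

Note that an object declared with type `enum small` may have a different size; only the constants are required to have type int.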
Other Languages
Most languages that contain enumerated types treat the associated enumeration constants as belonging to
a unique type that is not compatible with type int. In these languages an enumeration constant must be explicitly cast (Pascal provides a built-in function, ord) before it can appear where a constant having type
int may appear.
Other Languages
Not all languages that support enumeration constants allow the value, used to represent them during program
execution, to be specified in their definition
Coding Guidelines
Some guideline documents recommend against assigning an explicit value to an enumeration constant.
Such recommendations limit enumeration types to having a symbolic role only. This has the effect of giving
developers no choice but to use object-like macros (1931 macro, object-like) to create sets of identifiers having bit-set roles. Using
macros instead of enumerations makes it much more difficult for static analysis tools to deduce an association
between identifiers (it may still be made apparent to human readers by grouping of macro definitions and
appropriate commenting), which in turn will reduce their ability to flag suspicious use of such identifiers.
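A bit-set role expressed with explicitly valued enumeration constants might be sketched as follows. The names `file_mode`, `FM_READ`, `FM_WRITE`, `FM_EXEC`, and `mode_has` are hypothetical.

```c
#include <assert.h>

/* Each constant occupies a distinct bit, so the explicit values matter. */
enum file_mode {
    FM_READ  = 0x1,
    FM_WRITE = 0x2,
    FM_EXEC  = 0x4
};

static_assert((FM_READ & FM_WRITE) == 0 && (FM_WRITE & FM_EXEC) == 0 &&
              (FM_READ & FM_EXEC) == 0, "flags are bitwise distinct");

/* Nonzero if every bit of flag is set in mode. */
static int mode_has(unsigned mode, enum file_mode flag)
{
    return (mode & (unsigned)flag) == (unsigned)flag;
}
```

Because the three identifiers are declared in one enumeration, a tool can associate them with each other and with the parameter type of `mode_has`, which is not possible for three unrelated object-like macros.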
1443 If the first enumerator has no =, the value of its enumeration constant is 0.
Commentary
This choice is motivated by common usage and the fact that arrays are zero based. Most enumeration types
contain relatively few enumeration constants and many do not explicitly assign a value to any of them (298 limit, enumeration constants).
Other Languages
This is the common convention specified by other languages, or by implementations of other languages that
do not specify the initial value
1444 Each subsequent enumerator with no = defines its enumeration constant as the value of the constant
expression obtained by adding 1 to the value of the previous enumeration constant.
Commentary
If the previous enumeration constant had the value INT_MAX, adding one will produce a value that cannot be
represented in int.
Other Languages
This is the common convention specified by other languages, or by implementations of other languages that
do not specify the initial value
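The two rules for implicit values can be seen in a short sketch that uses the enumeration from the standard's own EXAMPLE for this subclause; the checks are performed during translation with C11 static_assert.

```c
#include <assert.h>

enum hue { chartreuse, burgundy, claret = 20, winedark };

static_assert(chartreuse == 0, "first enumerator with no = is 0");
static_assert(burgundy == chartreuse + 1, "previous value plus 1");
static_assert(winedark == claret + 1, "counting resumes from the explicit value");
```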
1445 (The use of enumerators with = may produce enumeration constants with values that duplicate other values in
the same enumeration.)
Commentary
When such enumeration constants are tested for equality with each other the result will be 1 (true), because it
is their values not their spellings that are compared.
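A small sketch of duplicated values, using hypothetical names (`weekday`, `WD_FIRST`, `WD_LAST`) that mark the first and last members of an enumeration:

```c
#include <assert.h>

enum weekday {
    WD_FIRST = 0,
    WD_MON = WD_FIRST,   /* explicitly assigned duplicate of WD_FIRST */
    WD_TUE,              /* implicitly WD_MON + 1 */
    WD_WED,
    WD_LAST = WD_WED     /* explicitly assigned duplicate of WD_WED */
};

static_assert(WD_MON == WD_FIRST, "values, not spellings, are compared");
static_assert(WD_LAST == WD_WED, "values, not spellings, are compared");
```

One consequence is that a switch statement cannot have case labels for both WD_MON and WD_FIRST, because the two labels would have the same value.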
C++
The C++ Standard does not explicitly mention this possibility, although it does give an example, 7.2p2, of an
enumeration type containing more than one enumeration constant having the same value.
Other Languages
No language known to your author that supports the explicit definition of enumeration constant values
prohibits the appearance of duplicate values in the same enumeration.
Coding Guidelines
There are two ways in which more than one enumeration constant, in the same enumerated type, can have
the same value. Either the values were explicitly assigned, or at least one of the values was implicitly
assigned. This usage may be an oversight, or it may be intentional (i.e., fixing the names of the
first and last enumeration constant when it is known that new members may be added at a later date). These
guideline recommendations are not intended to recommend against the creation of faults in code (0 guidelines, not faults). What of
the intended usage?
Do readers of the source assume there are no duplicate values among different enumeration constants from the same enumerated type? Unfortunately, use of enumeration constants is not sufficiently common among developers to provide the experience needed to answer this question.
1446 The enumerators of an enumeration are also known as its members.
Each enumerated type shall be compatible with an integer type;
The integer types include the enumeration types (519 integer types). The change of wording in the C99 Standard removes a
circularity in the specification.
C++
The underlying type of an enumeration may be an integral type that can represent all the enumerator values.
While it is possible that source developed using a C++ translator may select a different integer type than a particular C translator, there is no effective difference in behavior because different C translators may also select different types.
Other Languages
Most languages that support enumerated types treat such types as being unique types that are not compatible with any other type.
Coding Guidelines
Experience shows that developers are often surprised by some behaviors that occur when a translator selects
a type other than int for the compatible type. The two attributes that developers appear to assume an
enumerated type to have are promoting to a signed type (rather than unsigned) and being able to represent
all the values that type int can (if values other than those in the enumeration definition are assigned to the
object).
If the following guideline recommendation on enumerated types being treated as not being compatible
with any integer type is followed, these assumptions are harmless.
Experience with enumerated types in more strongly typed languages has shown that the diagnostics issued
when objects having these types, or their members, are mismatched in operations with other types, are a very
effective method of locating faults. Also a number of static analysis tools [502, 694, 1176] perform checks on the
use of objects having an enumerated type and their associated enumeration constants (see footnote 1447.1).
1448 The choice of type is implementation-defined,108) but shall be capable of representing the values of all the
members of the enumeration.
Commentary
This is a requirement on the implementation
C90
The requirement that the type be capable of representing the values of all the members of the enumeration
was added by the response to DR #071
Other Languages
Languages that support enumeration types do not usually specify low level implementation details, such as
the underlying representation
Common Implementations
Most implementations choose the type int. A few implementations attempt to minimize the amount of storage
occupied by each enumerated type. They do this by selecting the compatible type to be the integer type with
the lowest rank that can represent all constant values used in the definition of the contained enumeration
constants.
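The only portable guarantee is that every member value survives being stored in an object of the enumerated type. A minimal sketch, with hypothetical names `tiny`, `wide`, and `wide_roundtrip`:

```c
#include <assert.h>
#include <limits.h>

enum tiny { T_A, T_B, T_C };                      /* may occupy less storage than int */
enum wide { W_MIN = INT_MIN, W_MAX = INT_MAX };   /* needs the full range of int */

/* Store a value in an object of type enum wide and read it back.
   The compatible type must represent W_MIN and W_MAX, and therefore
   every int value between them. */
static int wide_roundtrip(int v)
{
    enum wide w = (enum wide)v;
    return (int)w;
}
```

Code that prints or compares `sizeof(enum tiny)` is relying on an implementation-defined choice, so its result may differ between translators and between option settings of the same translator.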
1447.1 However, this is not necessarily evidence of a worthwhile benefit. Vendors do sometimes add features to a product because of a
perceived rather than actual benefit.
A definition of an enumeration type may not include (most don’t) enumeration constants for each of the possible values that can be represented in the underlying value representation (invariably some integer type). The guideline recommendation that both operands of a binary operator have the same enumerated type limits,
be expected to apply to the given enumeration type (these are discussed in the C sentence for the respective operators). The following example shows two possibilities:
1449 The enumerated type is incomplete until after the } that terminates the list of enumerator declarations.
Following the closing brace of an enum-specifier, each enumerator has the type of its enumeration.
Example
The definition:
is not permitted (it is not possible to take the size of an incomplete type) But:
is conforming
1450 EXAMPLE The following fragment:
enum hue { chartreuse, burgundy, claret=20, winedark };
enum hue col, *cp;
col = claret;
cp = & col;
if (*cp != burgundy) /* */
makes hue the tag of an enumeration, and then declares col as an object that has that type and cp as a
pointer to an object that has that type. The enumerated values are in the set { 0, 1, 20, 21 }.
and in Ada as:
1 type
3 for hue use (chartreuse => 0, burgundy => 1, claret => 20, winedark => 21);
1451 Forward references: tags (6.7.2.3).
1452 107) Thus, the identifiers of enumeration constants declared in the same scope shall all be distinct from each
other and from other identifiers declared in ordinary declarators.
Commentary
This requirement can be deduced from the fact that enumeration constants are in the same name space as
ordinary identifiers, they have no linkage, and that only one identifier with these attributes shall (a constraint)444 name space
C++
The C++ Standard does not explicitly make this observation.
Other Languages
Ada permits the same identifier to be defined as an enumeration constant in a different enumerated type in
the same scope References to such identifiers have to be explicitly disambiguated
1453 108) An implementation may delay the choice of which integer type until all enumeration constants have been
560 storage requirements needed to represent an object having an enumerated type until all of the members
of that type had been processed. In the example below, a single pass implementation that minimizes the storage allocated, and uses different representations for pointers to different scalar types, would not be able
to evaluate sizeof(enum e_T *) at the point its value is needed to give a value to e2.
This behavior is required of a C++ implementation because:
7.2p5 The underlying type of an enumeration is an integral type that can represent all the enumerator values defined in
C++ requirement that enforces the C one given above
6.7.2.3 Tags
1455 Where two declarations that use the same tag declare the same type, they shall both use the same choice of
struct, union, or enum.
The C90 Standard did not explicitly specify this constraint While the behavior was therefore undefined, it
is unlikely that the behavior of any existing code will change when processed by a C99 translator (and no
difference is flagged here)
C++
7.1.5.3p3
The class-key or enum keyword present in the elaborated-type-specifier shall agree in kind with the
declaration to which the name in the elaborated-type-specifier refers.
Incomplete types are needed to support the declaration of mutually recursive structure and union types It is
not possible to create a mutually recursive enumerated type and a declaration making use of self-referencing
recursion is an edge case that does not appear to be of practical use
C90
This C99 requirement was not specified in C90, which did not contain any wording that ruled out the
declaration of an incomplete enumerated type (and confirmed by the response to DR #118). Adding this
constraint brings the behavior of enumeration types in line with that for structure and union types (1118 sizeof constraints).
Source code containing declarations of incomplete enumerated types will cause C99 translators to issue a
diagnostic, where a C90 translator was not required to issue one.
C++
3.3.1p5
[Note: if the elaborated-type-specifier designates an enumeration, the identifier must refer to an
already declared enum-name
3.4.4p2
If the elaborated-type-specifier refers to an enum-name and this lookup does not find a previously declared
enum-name, the elaborated-type-specifier is ill-formed
declared in one file (see footnote 422.1)
a tag, denoting the same type, is declared more than once is when its type refers to another type in some mutually recursive way.
1458 The type is incomplete109) until the closing brace of the list defining the content, and complete thereafter.
Commentary
The closing brace that defines its content may occur in a separate declaration. Incomplete types are one of
the three kinds of types defined in C (475 incomplete types). The only other incomplete type is void, which can never be completed (523 void is incomplete type).
C90
The C99 Standard more clearly specifies the intended behavior, which had to be inferred in the C90 Standard (1457 tag declarations, same scope).
C++
The C++ Standard specifies this behavior for class definitions (9.1p1), but does not explicitly specify this
behavior for declarations in different scope.
Coding Guidelines
If the guideline recommendation dealing with the reuse of identifier names is followed (792.3 identifier, reusing names) there will never be
two distinct types with the same name. The case of distinct tags being declared with function prototype scope
does not need a guideline recommendation. Such a declaration will render the function uncallable, as no type
can be declared to be compatible with its parameter type. A translator will issue a diagnostic if a call to it
occurs in the source.
1460 Each declaration of a structure, union, or enumerated type which does not include a tag declares a distinct
type.
Commentary
A declaration of a structure or union type that includes a tag may declare a distinct type, or it may refer to a
previously declared distinct type.
If one of the identifiers declared is a typedef name, it will be possible to refer to the type in other contexts (1468 footnote 110).
If the identifier being declared is an object there is no standard defined way of referring to its type. Such
types are sometimes known as anonymous types.
Two types have compatible type if they are the same (631 compatible type if). Types that are distinct are not the same.
A type specifier of the form
struct-or-union identifier opt { struct-declaration-list }
or
enum identifier { enumerator-list }
or
enum identifier { enumerator-list , }
declares a structure, union, or enumerated type
Commentary
This specification provides semantics for a subset of the possible token sequences supported by the syntax of
type-specifier (1378 type specifier syntax). The difference between this brace delimited form and the semicolon terminated form (1464 struct tag;) is
similar to the difference between the brace delimited and semicolon terminated form of function declarations
(i.e., one specifies content and the other doesn’t).
content
list defines
Commentary
This defines the terms structure content, union content, or enumeration content, which is the content referred
to by the constraint requirement The content is the members of the type declared, plus any type declarations
The term tag is not used in C++, which calls the equivalent construct a class name.
Table 1463.1: Occurrence of types declared with tag names (as a percentage of all occurrences of each keyword) Based on the
visible form of the c and h files.
This form of declaration either declares, or redeclares, the identifier, as a tag, in the current scope. The
following are some of the uses for this form of declaration:
• To support mutually referring declarations when there is the possibility that a declaration of one of the
structures
• To provide a mechanism for information hiding. Developers can declare a tag in an interface without
specifying the details of a type’s implementation,
• In automatically generated code, where the generator does not yet have sufficient information to fully
define the content of the type, but still needs to refer to it.
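The information hiding use can be sketched as below. The example is shown as a single file for brevity; in practice the interface part would live in a header. All names (`struct counter` and the `counter_*` functions) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Interface: the tag is declared, the content is not.  Users can only
   hold pointers to the type, whose size they never need. */
struct counter;
struct counter *counter_new(void);
void counter_bump(struct counter *c);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

/* Implementation: this declaration completes the type. */
struct counter { int value; };

struct counter *counter_new(void)
{
    return calloc(1, sizeof(struct counter));   /* zero-initialized, or NULL */
}

void counter_bump(struct counter *c)
{
    assert(c != NULL);
    c->value++;
}

int counter_value(const struct counter *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_free(struct counter *c)
{
    free(c);
}
```

A translation unit that sees only the interface cannot apply sizeof to `struct counter` or access its members, so the representation can change without affecting callers.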
1465 109) An incomplete type may only be used when the size of an object of that type is not needed.
Commentary
When is the size of an object not needed? Who, or what needs the size and when do they need it?
The implementation needs the size of objects to allocate storage for them (1354 object reserve storage). When does storage need to
be allocated for an object? In theory, not until the object is encountered during program execution (and
in practice for a few languages). However, delaying storage allocation until program execution incurs a
high performance penalty. Knowing the size during translation enables much more efficient machine code to
be generated. Also, knowing the size when the type is first encountered (if the size has to be known by the
implementation) can simplify the job of writing a translator (many existing translators operated in a
single pass).
The size of an object having an incomplete array type is not needed to access an element of that array (728 incomplete array indexing).
The Committee responses to defect reports (e.g., DR #017) asking where the size of an object is needed
do not provide a list of places Now the wording has been moved to a footnote, perhaps this discussion will
subside
C90
It declares a tag that specifies a type that may be used only when the size of an object of the specified type is not needed
The above sentence appears in the main body of the standard, not a footnote
The C99 wording is more general in that it includes all incomplete types This is not a difference in
The C++ Standard contains no such rule, but enumerates the cases:
3.9p8 [Note: the rules for declarations and expressions describe in which contexts incomplete types are prohibited ]
Other Languages
Knowing the size of objects is an issue in all computer languages. When the size needs to be known is sometimes decided by high-level issues of language design (some languages require their translators to effectively perform more than one pass over the source code), other times it is decided by implementation techniques
semantic processing An optimizer may perform multiple passes over the internal representation of statements
in a function, deciding how best to generate machine code for them
1466 It is not needed, for example, when a typedef name is declared to be a specifier for a structure or union, or
Common Implementations
This is one area where vendors are often silent on how their language extensions operate. For instance, the
gcc typeof operator returns the type of its operand. However, the associated documentation says nothing about the case of the operand type being incomplete and having a tag that is identical to another definition occurring within the scope that the typeof occurred. One interpretation (unsupported by any specification from the vendor) of the following:
is that both x and y are being declared as being pointers to the type of f, that is an incomplete type, and that
the declaration of the tag fred, in a nested scope, has no effect on the declaration of y.
In these contexts the commonly used methods for mapping source code to machine code need to know the
number of bytes in a type’s object representation.
C90
The specification shall be complete before such a function is called or defined
The form of wording has been changed from appearing to be a requirement (which would not be normative
in a footnote) to being commentary
1468 110) If there is no identifier, the type can, within the translation unit, only be referred to by the declaration of
x and y are compatible with each other. They both have the same anonymous type, but the object z has a
different anonymous type. Note that the types of the objects x, y, and z would be considered to be compatible
if they occurred in different translation units (633 compatible, separate translation units).
C90
This observation was is new in the C90 Standard
C++
7.1.5.3p1
If an elaborated-type-specifier is the sole constituent of a declaration, the declaration is ill-formed
unless
The C++ Standard does not list enum identifier ; among the list of exceptions and a conforming C++
translator is required to issue a diagnostic for any instances of this usage.
The C++ Standard agrees with this footnote for its second reference in the C90 Standard.
1471 If a type specifier of the form
struct-or-union identifier
occurs other than as part of one of the above forms, and no other declaration of the identifier as a tag is visible,
then it declares an incomplete structure or union type, and declares the identifier as the tag of that type.111)
Commentary
The forms of struct-or-union identifier, excluded by this wording, are the identifier being followed
by a semicolon or a left brace. The remaining possible occurrences of this form are described elsewhere (1466 size not needed, examples) and
include:
C++
The C++ Standard does not explicitly discuss this kind of construction/occurrence, although 3.9p6 and 3.9p7
discuss this form of incomplete type.
Coding Guidelines
When no other declaration is visible at the point this type specifier occurs, should this usage be permitted?
Perhaps it was intended that a tag be visible at the point in the source where this type specifier occurs
However, not having a prior declaration visible is either harmless (intended or otherwise), or will cause a
diagnostic to be issued by a translator
A pointer to an incomplete structure or union type is a more strongly typed form of generic pointer than
a pointer to void Whether this use of pointer to incomplete types, for information hiding purposes, is
worthwhile can only be decided by the developer
If a type specifier of the form
struct-or-union identifier
or
enum identifier
occurs other than as part of one of the above forms, and a declaration of the identifier as a tag is visible, then
it specifies the same type as that other declaration, and does not redeclare the tag.
Commentary
The forms of struct-or-union identifier, excluded by this wording, are the identifier being followed
by a semicolon or a left brace. This is the one form that is not a declaration.
a member selection operator to a subscript operator). Also experience suggests that this kind of change
in type is not common.
• Changes in an object’s structure type may occur as a program evolves. For instance, the object x may have structure type t_1 because it needs to represent information denoted by a few of the members of that type. At a later time the type t_1 may be subdivided into several structure types, with the members referenced by x being declared in the type t_1_3. Developers then have the choice of changing the declaration of x to be t_1_3, or leaving it alone. However, the prior use of a typedef name, rather than
a tag, is unlikely to result in any cost savings when changing the declaration of x (i.e., developers are likely to have declared x to have type t_1, rather than a synonym of that type, so the declaration of x
will have to be edited in either case).
• What are the cognitive costs and benefits associated with the presence, or absence, of a keyword in the source of a declaration? There is a cost to readers in having to process an extra token (i.e., the keyword) in the visible source. However, the visual presence of this keyword may reduce the cognitive effort needed to deduce the kind of declaration being made. There does not appear to be a significant cost/benefit difference between any of these cognitive issues.
1473 EXAMPLE 1 This mechanism allows declaration of a self-referential structure.
struct tnode {
    int count;
    struct tnode *left, *right;
};
Figure 1472.1: Number of files containing a given number of references to each tag previously defined in the visible source
of that file (times, bullet, square; the definition itself is not included in the count), tags with no definition visible in the c file
(triangle; i.e., it is defined in a header) and anonymous structure/union/enumeration definitions (star). Based on the visible form
of the c files.
The following alternative formulation uses the typedef mechanism:
typedef struct tnode TNODE;
struct tnode {
    int count;
    TNODE *left, *right;
};
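A self-referential structure of this kind is typically used to build linked data structures. The following minimal sketch builds and measures a small binary tree; the helper functions `tnode_new`, `tnode_size`, and `tnode_free` are hypothetical and not part of the standard's example.

```c
#include <assert.h>
#include <stdlib.h>

struct tnode {
    int count;
    struct tnode *left, *right;   /* pointers to the type being defined */
};

/* Allocate a leaf node holding count; returns NULL on failure. */
static struct tnode *tnode_new(int count)
{
    struct tnode *t = malloc(sizeof *t);
    if (t != NULL) {
        t->count = count;
        t->left = t->right = NULL;
    }
    return t;
}

/* Number of nodes in the tree rooted at t (0 for an empty tree). */
static int tnode_size(const struct tnode *t)
{
    return t == NULL ? 0 : 1 + tnode_size(t->left) + tnode_size(t->right);
}

/* Release every node in the tree rooted at t. */
static void tnode_free(struct tnode *t)
{
    if (t != NULL) {
        tnode_free(t->left);
        tnode_free(t->right);
        free(t);
    }
}
```

The members left and right can be declared inside the structure because only a pointer to the incomplete type is needed at that point, not its size.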
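EXAMPLE 2 To illustrate the use of prior declaration of a tag to specify a pair of mutually referential structures, the declarations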
struct s1 { struct s2 *s2p; /* */ }; // D1
struct s2 { struct s1 *s1p; /* */ }; // D2
specify a pair of structures that contain pointers to each other. Note, however, that if s2 were already declared
as a tag in an enclosing scope, the declaration D1 would refer to it, not to the tag s2 declared in D2. To eliminate
this context sensitivity, the declaration
struct s2;
may be inserted ahead of D1. This declares a new tag s2 in the inner scope; the declaration D2 then completes
the specification of the new type.
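The effect of the inserted declaration can be seen in a sketch where a tag s2 is already visible at file scope. The member names `id` and `outer_marker` and the function `inner_scope_demo` are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct s2 { int outer_marker; };   /* tag s2 declared in an enclosing scope */

static int inner_scope_demo(void)
{
    struct s2;                                /* new tag s2 in this scope */
    struct s1 { struct s2 *s2p; int id; };    /* D1: refers to the inner s2 */
    struct s2 { struct s1 *s1p; int id; };    /* D2: completes the inner s2 */

    struct s1 a = { NULL, 1 };
    struct s2 b = { &a, 2 };
    a.s2p = &b;                               /* same type only because of struct s2; */

    assert(a.s2p->s1p == &a);
    return a.s2p->s1p->id * 10 + b.s1p->s2p->id;
}
```

Without the `struct s2;` line, the member s2p in D1 would point to the file scope type, and the assignment `a.s2p = &b` would mix pointers to two distinct types.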
optimization techniques are based on the principle of “cacheing”: under certain circumstances the compiler can remember the last value accessed (read or written) from a location, and use this retained value the next time that location is read. (The memory, or “cache”, is typically a hardware register.) If this memory is a machine register, for instance, the code can be smaller and faster using the register rather than accessing external memory.