The standard does specify a minimum limit on the number of characters a translator must consider as
significant. Implementations are free to ignore characters once this limit is reached. The ignored characters
do not form part of another token. It is as if they did not appear in the source at all.
C90
The C90 Standard does not explicitly state this fact.
Other Languages
Few languages place limits on the maximum length of an identifier that can appear in a source file. Like C,
some specify a lower limit on the number of characters that must be considered significant.
Coding Guidelines
Using a large number of characters in an identifier spelling has many potential benefits; for instance, it
provides the opportunity to supply a lot of information to readers, or to reduce dependencies on existing
reader knowledge by spelling words in full rather than using abbreviations. There are also potential costs;
for instance, long identifiers can cause visual layout problems in the source (requiring new-lines within an expression
in an attempt to keep the maximum line length within the bounds that can be viewed within a fixed-width
window), or increase the cognitive effort needed to visually scan source containing them.
The length of an identifier is not itself directly a coding guideline issue. However, length is indirectly
involved in many identifier memorability, confusability, and usability issues, which are discussed elsewhere.
Usage
The distribution of identifier lengths is given in Figure 792.7.
UCN
Commentary
Using other UCNs results in undefined behavior (in some cases even using these UCNs can be a constraint
violation). These character encodings could be thought of as representing letters in the specified national
A collating sequence may not be defined for these universal character names. In practice, the lack of a defined
collating sequence is not an implementation problem, because a translator only ever needs to compare the
spelling of one identifier for equality with another identifier, which involves a simple character-by-character
comparison (the issue of the ordering of diacritics is handled by not allowing them to occur in an identifier).
Support for this functionality is new and the extent to which implementations are likely to check that
UCN values fall within the list given in annex D is not known.
Coding Guidelines
The intended purpose for supporting universal character names in identifiers is to reduce the developer effort needed to comprehend source. Identifiers spelled in the developer’s native tongue are more immediately recognizable (because of greater practice with those characters) and also have semantic associations that are more readily brought to mind.
The ISO 10646 Standard does not specify which languages contain the characters it specifies (although it does give names to some sets of characters that correspond to a language that contains them). The written forms of some human languages share common characters; for instance, the characters a through z (and their uppercase forms) appear in many European orthographies. The following discussion refers to using UCNs from more than one human language. This is to be taken to mean using UCNs that are not part of the written form of the native language of the developer (the case of developers having more than one native language is not considered). For instance, the character a is used in both Swedish and German; the character û is used in Swedish, but not German; the character ß is used in German but not Swedish. Both Swedish and German developers would be familiar with the character a, but the character ß would be considered foreign to a Swedish developer, and the character û foreign to the German.
Some coding guideline documents recommend against the use of UCNs. Their use within identifiers can increase the portability cost of the source. The use of UCNs is an economic issue; the potential cost of not permitting their use in identifiers needs to be compared against the potential portability benefits. (Alternatively, the benefits of using UCNs could be compared against the possible portability costs.)
Given the purpose of using UCNs, is there any rationale for identifiers to contain characters from more than one human language? As an English speaker, your author can imagine a developer wanting to use an English word, or its common abbreviation, as a prefix or suffix to an identifier name. Perhaps an Urdu speaker can imagine a similar usage with Urdu words. The issue is whether the use of characters in the same identifier from different human languages has meaning to the developers who write and maintain the source.
Identifiers very rarely occur in isolation. Should all the identifiers in the same function, or even source file, only contain UCNs that form the set of characters used by a single human language? Using characters from different human languages when it is possible to use only characters from a single language potentially increases the cost of maintenance. Future maintainers are either going to have to be familiar with the orthography and semantics of the two human languages used or spend additional time processing instances of identifiers containing characters they are not familiar with. However, in some cases it might not be possible to enforce a single human language rule. For instance, a third-party library may contain callable functions whose spellings use characters from a human language different from that used in the source code that contains calls to it.
Support for the use of UCNs in identifiers is new in C99 (and other computer languages) and at the time of this writing there is almost no practical experience available on the sort of mistakes that developers make with them.
Table 797.1: The Unicode digit encodings.
Encoding Range   Language                Encoding Range   Language
0030–0039        ISO Latin-1             0BE7–0BEF        Tamil (has no zero)
0660–0669        Arabic–Indic            0C66–0C6F        Telugu
06F0–06F9        Eastern Arabic–Indic    0CE6–0CEF        Kannada
0966–096F        Devanagari              0D66–0D6F        Malayalam
09E6–09EF        Bengali                 0E50–0E59        Thai
0A66–0A6F        Gurmukhi                0ED0–0ED9        Lao
0AE6–0AEF        Gujarati                FF10–FF19        Fullwidth digits
0B66–0B6F        Oriya
C++
This requirement is implied by the terminal non-name used in the C++ syntax. Annex E of the C++ Standard
does not list any UCN digits in the list of supported UCN encodings.
Other Languages
Java has a similar requirement.
Coding Guidelines
The extent to which different cultural conventions support the use of a digit as the first character in an
identifier is not known to your author. At some future date the Committee may choose to support the writing
of integer constants using UCNs. If this happens, any identifiers that start with a UCN designating a digit
are liable to result in syntax violations. There does not appear to be a worthwhile benefit in a guideline
recommendation dealing with the case of an identifier beginning with a UCN designating a digit.
in identifiers;
Commentary
Prior to C99 there was no standardized method of representing nonbasic source character set characters
in the source code. Support for multibyte characters in string literals and constants was specified in C90;
some implementations extended this usage to cover identifiers. They are now officially sanctioned to do this.
Support for the ISO 10646 Standard is new in C99. However, there are a number of existing implementations that use a multibyte encoding scheme and this usage is likely to continue for many years. The C committee
recognized the importance of this usage and does not force developers to go down a UCN-only path.
The standard says nothing about the behavior of the __func__ reserved identifier in the case when a function name is spelled using wide characters.
implementation-defined mapping of the source file characters, and an implementation may choose to support
multibyte characters in identifiers via this route.
Other Languages
While other language standards may not mention multibyte characters, the problem they address is faced by implementations of those languages. For this reason, it is to be expected that some implementations of other languages will contain some form of support for multibyte characters.
existing software is converted to use it.
Common Implementations
It is common to find translators aimed at the Japanese market supporting JIS, shift-JIS, and EUC encodings (see Table 243.3). These encodings use different numeric values than those given in ISO 10646 to represent the same national character.
800
When preprocessing tokens are converted to tokens during translation phase 7, if a preprocessing token could
be converted to either a keyword or an identifier, it is converted to a keyword.
Commentary
The Committee could have created a separate name space for keywords and allowed developers to define identifiers having the same spelling as a keyword. The complexity added to a translator by such a specification would be significant (based on implementation experience for languages that support this functionality), while a developer’s inability to define identifiers having these spellings was considered a relatively small inconvenience.
C90
This wording is a simplification of the convoluted logic needed in the C90 Standard to deduce from a constraint what C99 now says in semantics. The removal of this C90 constraint is not a change of behavior, since it was not possible to write a program that violated it.
Extended characters were not available in C90, so the suggestion in this footnote does not apply.
Other Languages
Issues involving third-party linkers are common to most language implementations that compile to machine
code. Some languages, for instance Java, define the characteristics of an implementation at translation
and execution time. The Java language specification goes to the extreme (compared to other languages) of
specifying the format of the generated object code file.
Common Implementations
There is a long-standing convention of prefixing externally visible identifier names with an underscore
character when information on them is written out to an object file. There is little experience available on
implementation issues involving UCNs, but many existing linkers do assume that identifiers are encoded
using 8-bit characters.
Coding Guidelines
The encoding of external identifiers only needs to be considered when interfacing to, or from, code written in
another language. Cross-language interfacing is outside the scope of these coding guidelines.
universal character name
Commentary
Some linkers may not support an occurrence of the backslash (\) character in an identifier name. One solution
to this problem is to create names that cannot be declared in the source code by the developer; for instance,
by deleting the \ characters and prefixing the name with a digit character.
Common Implementations
There are no standards for encoding of universal character names in object files. The requirement to support
this form of encoding is too new for it to be possible to say anything about common encodings.
Commentary
Here the word long does not have any special meaning. It simply suggests an identifier containing many
characters.
All characters are significant.20)
C identifiers that differ after the last significant character will cause a diagnostic to be generated by a C++
translator.
Annex B contains an informative list of possible implementation limits. However, “these quantities
are only guidelines and do not determine compliance.”
Internal identifiers only need to be processed by the translator and the standard is in a strong position to
2.10p1 All characters are significant.20)
References to the same C identifier, which differs after the last significant character, will cause a diagnostic
to be generated by a C++ translator.
There is also an informative annex which states:
Annex Bp2 Number of initial characters in an internal identifier or a macro name [1024]
Number of initial characters in an external identifier [1024]
Coding Guidelines
While the C90 minimum limits for the number of significant characters in an identifier might be considered
unacceptable by many developers, the C99 limits are sufficiently generous that few developers are likely to
complain.
Automatically generated C source sometimes relies on a large number of significant characters in an
identifier. This can occur because of the desire to simplify the implementation of the generator. Character
sequences at different offsets within an identifier might be reserved for different purposes. A predefined default
character sequence is used to pad the identifier spelling where necessary.
As the following example shows, it is possible for a program’s behavior to change, both when the number
of significant characters is increased and when it is decreased.
1 /*
2 * Yes, C99 does specify 64 significant characters in an internal
3 * identifier But to keep this example within the page width
4 * we have taken some liberties.
14 * If there are 34 significant characters, the following operand
15 * will resolve to the locally declared object.
16 *
17 * If there are 35 significant characters, the following operand
18 * will resolve to the globally declared object.
28 * If there are 34 significant characters, the following operand
29 * will resolve to the globally declared object.
30 *
31 * If there are 33 significant characters, the following operand
32 * will resolve to the locally declared object.
34 _1 _2 _3 _bb++;
35 }
The following issues need to be addressed:
• All references to the same identifier should use the same character sequence; that is, all characters are
intended to be significant. References to the same identifiers that differ in nonsignificant characters
need to be treated as faults.
• Within how many significant characters should different identifiers differ? Should identifiers be
required to differ within the minimum number of significant characters specified by the standard, or
can a greater number of characters be considered significant?
Readers do not always carefully check all characters in the spelling of an identifier. The contribution made by
characters occurring in different parts of an identifier will depend on the pattern of eye movements employed
by readers, which in turn may be affected by their reasons for reading the source, plus cultural factors (e.g., the
direction in which they read text in their native language, or the significance of word endings in their native language). Characters occurring at both ends of an identifier are used by readers (at least native English- and
Figure 806.1: Occurrence of unique identifiers whose significant characters match those of a different identifier (as a percentage
of all unique identifiers in a program), for various numbers of significant characters. Based on the visible form of the .c files.
Identifiers that differ in a single significant character may be considered to be
• different identifiers by a translator, but considered to be the same identifier by some readers of the source (because they fail to notice the difference)
• the same identifiers by a translator (because the difference occurs in a nonsignificant character), but considered to be different identifiers by some readers of the source (because they treat all characters as being significant)
• identifiers by both a translator and some readers of the source
The possible reasons for readers making mistakes are discussed elsewhere, as are the guideline developer
extern int e1;
extern long el;
extern int a_longer_more_meaningful_name;
extern int a_longer_more_meeningful_name;
extern int a_meaningful_more_longer_name;
Commentary
While the obvious implementation strategy is to ignore the nonsignificant characters, the standard does not
require implementations to use this strategy. To speed up identifier lookup many implementations use a
hashed symbol table; the hash value for each identifier is computed from the sequence of characters it
contains. Computing this hash value as the characters are read in, to form an identifier, saves a second pass
over those same characters later. If nonsignificant characters were included in the original computed hash
value, a subsequent occurrence of that identifier in the source, differing in nonsignificant characters, would
result in a different hash value being calculated and a strong likelihood that the hash table lookup would fail.
Developers generally expect implementations to ignore nonsignificant characters. An implementation that
behaved differently because identifiers differed in nonsignificant characters might not be regarded as being
very user friendly. Highlighting misspellings that occur in nonsignificant characters is not always seen in a
positive light by some developers.
C++
In C++ all characters are significant, thus this statement does not apply in C++.
Other Languages
Some languages specify that nonsignificant characters are ignored and have no effect on the program, while
others are silent on the subject.
Common Implementations
Most implementations simply ignore nonsignificant characters. They play no part in identifier lookup in
symbol tables.
Coding Guidelines
The coding guideline issues relating to the number of characters in an identifier that should be considered
809 Forward references: universal character names (6.4.3), macro replacement (6.10.3).
6.4.2.2 Predefined identifiers
Semantics
brace of each function definition, the declaration
static const char __func__[] = "function-name";
Commentary
Implicitly declaring __func__ immediately after the opening brace in a function definition means that
the first, developer-written declaration within that function can access it. Giving __func__ static storage
duration enables its address to be referred to outside the lifetime of the function that contains it (e.g., enabling
a call history to be displayed at some later stage of program execution). This is not a storage overhead
because space needs to be allocated for the string literal denoted by __func__. The const qualifier ensures
that any attempts to modify the value cause undefined behavior. The identifier __func__ has an array type,
and is not a string literal, so the string concatenation that occurs in translation phase 6 is not applicable.
is not necessary when that object is defined using the const qualifier. gcc also supports the built-in form
__FUNCTION__
Example
Debugging code in functions can provide useful information. But when there are lots of functions, the quantity of useless information can be overwhelming. Controlling which functions are to output debugging information by using conditional compilation requires that code be edited and the program rebuilt.
The names of functions can be used to dynamically control which functions are to output debugging information. This control not only reduces the amount of information output, but can also reduce execution time by orders of magnitude (output can be a resource-intense operation).
10 * Use the name of the function to control whether debugging is
11 * switched on/off. lookup is only called the first time this code
12 * is executed, thereafter the value f_l->enabled can be used.
14 #define D_func_trace(func_name, code) { \
15 static func list * f _l = NULL; \
16 if (f _l ? f _l->enabled : lookup(&f _l, func_name)) \
6 * A fixed list of functions and their debug mode.
7 * We could be more clever and make this a list which
8 * could be added to as a program executes.
22 * Loop through lookup_table looking for a match against f_name.
23 * If a match is found, add f_list to the traces_seen list and
24 * return the value of enabled for that entry.
31 * Loop through lookup_table looking for a match against f_name.
32 * If a match is found, loop over its traces_seen list setting
33 * the enabled flag to new_enabled.
34 *
35 * This function can switch on/off the debugging output from
36 * any registered function.
38 }
translated into the execution character set as indicated in translation phase 5.
Commentary
Having the name appearing as if in translation phase 5 avoids any potential issues caused by macro names
defined with the spelling of keywords or the name __func__. It also enables a translator to have an identifier name and type predefined internally, ready to be used when this reserved identifier is encountered. Translation phase 5 is also where characters get converted to their corresponding members in the execution character set,
an essential requirement for spelling a function name. In many implementations the function name written to the object file, or program image, is different from the one appearing in the source. This translation phase 5
9 * The implicit declaration does not appear until after preprocessing.
10 * So there is no declaration 'static const char __func__[] = "f";'
11 * visible to the preprocessor (which would result in __func__ being
12 * mapped to CNUF and "f" rather than "g" being output).
Names beginning with __ are reserved for use by a C++ implementation. This leaves the way open for a C++
implementation to use this name for some purpose.
6.4.3 Universal character names
815
universal character name syntax
It is intended that this syntax notation not be visible to the developer, when reading or writing source code
that contains instances of this construct. That is, a universal-character-name aware editor displays the
ISO 10646 glyph representing the numeric value specified by the hex-quad sequence value. Without such
editor support, the whole rationale for adding these characters to C, allowing developers to read and write
identifiers in their own language, is voided.
It is difficult to imagine developers regularly using UCNs with an editor that does not display UCNs in
some graphical form. A guideline recommending the use of such an editor would not be telling developers
anything they did not already know.
A number of theories about how people recognize words have been proposed. One of the major issues yet
to be resolved is the extent to which readers make use of whole word recognition versus mapping character
sequences to sound (phonological coding). Support for UCNs increases the possibility that developers will
encounter unfamiliar characters in source code. The issue of developer performance in handling unfamiliar
6 foo(\\u0123); /* Does contain a UCN */
0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.62)
in the basic source character set. The ranges 0D800 through 0DBFF and 0DC00 through 0DFFF are known
as the surrogate ranges. The purpose of these ranges is to allow representation of rare characters in future versions of the Unicode standard.
This constraint means that source files cannot contain the UCN equivalent for any members of the basic source character set.
Rationale
UCNs are not permitted to designate characters from the basic source character set in order to permit fast compilation times for C programs. For some real world programs, compilers spend a significant amount of time merely scanning for the characters that end a quoted string, or end a comment, or end some other token. Although it is trivial for such loops in a compiler to be able to recognize UCNs, this can result in a surprising amount of overhead.
A UCN is constrained not to specify a character short identifier in the range 0000 through 0020 or 007F through 009F inclusive for the same reason: this avoids allowing a UCN to designate the newline character. Since different implementations use different control characters or sequences of control characters to represent newline, UCNs are prohibited from representing any control character.
C++
2.2p2 If the hexadecimal value for a universal character name is less than 0x20 or in the range 0x7F–0x9F (inclusive),
or if the universal character name designates a character in the basic source character set, then the program is ill-formed.
The range of hexadecimal values that are not permitted in C++ is a subset of those that are not permitted in C. This means that source which has been accepted by a conforming C translator will also be accepted by a conforming C++ translator, but not the other way around.
817 Universal character names may be used in identifiers, character constants, and string literals to designate
characters that are not in the basic character set.
Commentary
UCNs may also appear in comments. However, comments do not have a lexical structure to them. Inside a
comment, character sequences starting with \u are not treated as UCNs by a translator, although other tools
may choose to do so in this context. The mapping of UCNs in character constants and string literals to the
execution character set occurs in translation phase 5.
The constraint on the range of values that a UCN may take prevents them from being used to represent
keywords.
C++
The C++ Standard also supports the use of universal character names in these contexts, but does not say in
words what it specifies in the syntax (although 2.2p2 comes close for identifiers).
Other Languages
In Java, UnicodeInputCharacters can represent any character and is mapped in lexical translation step
1. It is possible for every character in the source to appear in this form. The mapping only occurs once, so
\u005cu005a becomes \u005a, not Z (005c is the Unicode value for \ and 005a is the Unicode value
for Z).
Coding Guidelines
UCNs in character constants and string literals are used to represent characters that are output when a program
is executed, or in identifiers to provide more readable source code. In the former case it is possible that
UCNs from different natural languages will need to be represented. In the latter case it might be surprising if
source code contained UCNs from different languages. This usage is a complex one involving issues outside
of these coding guidelines (e.g., configuration management and customer requirements) and your author has
insufficient experience to know whether any guideline recommendations might be worthwhile.
Some of the coding guideline issues relating to the use of characters outside of the basic execution
The standard specifies how UCNs are represented in source code. A development environment may choose to
provide, to developers, a visible representation of the UCN that matches the glyph with the corresponding
numeric value in ISO 10646. The ISO 10646 BNF syntax for short identifiers is:
{ U | u } [ {+}(xxxx | xxxxx | xxxxxx) | {-}xxxxxxxx ]
where x represents a hexadecimal digit.
is likely to result in more significant characters being retained in identifiers having external linkage.
819
nnnn (and whose eight-digit short identifier is 0000nnnn)
Commentary
It was possible to represent all of the characters specified by versions 1 and 2 of the Unicode-sponsored character set using four-digit short identifiers. Version 3 introduced characters whose representation value requires more than four digits.
that existing tools (e.g., editors) continue to be able to process source files.
The control characters may have special meaning for some tools that process source files (e.g., a communications program used for sending source down a serial link).
syntax
895
value does not change. What the C Standard calls a constant-expression developers often shorten to constant.
Footnote 21
21) The term “literal” generally designates, in this International Standard, those tokens that are called “constants”
The C++ Standard also includes string-literal and boolean-literal in the list of literals, but it does
not include enumeration constants in the list of literals. However:
7.2p1
required
The C++ terminology more closely follows common developer terminology by using literal (a single token)
and constant (a sequence of operators and literals whose value can be evaluated at translation time). The value
of a literal is explicit in the sequence of characters making up its token. A constant may be made up of more
than one token or be an identifier. The operands in a constant have to be evaluated by the translator to obtain
its result value. C uses the more easily confused terminology of integer-constant (a single token) and
constant-expression (a sequence of operators, integer-constant and floating-constant whose
value can be evaluated at translation time).
Other Languages
Languages that support types not supported by C (e.g., sets) sometimes allow constants having
these types to be specified (e.g., in Pascal ['a', 'd'] represents a set containing two characters). Fortran
supports complex literal constants (e.g., (1.0, 2.0) represents the complex number 1.0 + 2.0i).
Many languages do not support (e.g., Java until version 1.5) some form of enumeration-constant.
Coding Guidelines
Constants are the mechanism by which numeric values are written into source code. The term constant is
used because the numeric values do not change during program execution (and are known at translation time;
although in some cases a person reading the source may only know that the value used will be one of a list of
possible values because the definition of a macro may be conditional on the setting of some translation time
The use of constants in source code creates a number of possible maintenance issues, including:
• A constant value, representing some quantity, often needs to occur in multiple locations within source
code. Searching for and replacing all occurrences of a particular numeric value in the code is an error
prone process. It is not possible, for instance, to know that all 15s occurring in the source code have
the same semantic association and some may need to remain unchanged. (Your author was once told
by a developer, whose source contained lots of 15s, that the UK government would never change
value-added tax from 15%; a few years later it changed to 17.5%.)
• On encountering a constant in the source, a reader usually needs to deduce its semantic association
(either in the application domain or its internal algorithmic function). While its semantics may be very
familiar to the author of the source, the association between value and semantics may not be so readily
made by later readers.
• A cognitive switch may need to be made because of the representation used for the constant (e.g.,
floating point, hexadecimal integer, or character constant).
One solution to these problems is to use an identifier to give a symbolic name to the constant, and to use
that symbolic name wherever the constant would have appeared in the source. Changes to the value of the
constant can then be made by a single modification to the definition of the identifier and a well-chosen name
can help readers make the appropriate semantic association. The creation of a symbolic name provides two
pieces of information:
1 The property represented by that symbolic name. For instance, the maximum value of a particular type (INT_MAX), whether an implementation supports some feature (__STDC_IEC_559__), a means of specifying some operation (SEEK_SET), or a way to obtain information (FE_OVERFLOW).
2 A method of operating on the symbolic name to access the property it represents. For instance, arithmetic operations (INT_MAX), testing in a conditional preprocessing directive (__STDC_IEC_559__), passing as an argument to a library function (SEEK_SET); passing as an argument to a library function, possibly in combination with other symbolic names (FE_OVERFLOW).
Operating on symbolic names involves making use of representation information. (Assignment, or argument passing, is the only time that representation might not be an issue.) The extent to which the use of representation information will be considered acceptable will depend on the symbolic name. For instance, FE_OVERFLOW appearing as the operand of a bitwise operator is to be expected, but its appearance as the operand of an arithmetic operator would be suspicious.
The use of symbolic names is rarely seen by developers as applying to all constants that occur in source code. In some cases the following are claimed:
• The constants are sufficiently well known that there is no need to give them a name.
• The number of occurrences of particular constants is not sufficient to warrant creating a name for them.
• Operations involving some constant values occur so frequently that their semantic associations are obvious to developers; for instance, assigning 0 or adding 1.
It is true that not all numeric values are meaningless to everybody. A few values are likely to be universally known (at least to Earth-based developers). For instance, there are 60 seconds in a minute, 60 minutes in an hour, and 24 hours in a day. The value 24 occurring in an expression involving time is likely to represent hours in a day. Many values will only be well known to developers working within a given application domain, such as atomic physics (e.g., the value 6.6261E-34). Between these extremes are other values; for instance, 3.14159 will be instantly recognized by developers with a mathematics background. However, developers without this background may need to think about what it represents. There is the possibility that developers who have grown up surrounded by other mathematically oriented people will be completely unaware that others do not recognize the obvious semantic association for this value.
A constant having a particular semantic association may only occur once in the source. However, the issue is not how many times a constant having a particular semantic association occurs, but how many times the particular constant value occurs. The same constant value can appear because of different semantic associations. A search for a sequence of digits (a constant value) will locate all occurrences, irrespective of semantic association.
While an argument can always be made for certain values being sufficiently well known that there is no benefit in replacing them by identifiers, the effort/time taken in discussions on what values are sufficiently well known to warrant standing on their own, instead of an identifier, is likely to be significantly greater than the sum total of all the extra seconds taken to type the identifier.
The constant values 0 and 1 occur very frequently in source code (see Figure 825.1). Experience suggests that the semantic associations tend to be that of assigning an initial value in the case of 0 and accessing a preceding or following item in the case of 1. The coding guideline issues are discussed in the subsections that deal with the different kinds of constants (e.g., integer, or floating).
What form of definition should a symbolic name denoting a constant value have? Possibilities include the following:
• Macro names. These are seen by developers as being technically the same as constants in that they are replaced by the numeric value of the constant during translation (there can also be an unvoiced bias toward perceived efficiency here).
• Enumeration constants. The purpose of an enumerated type is to associate a list of constants with each
other. This is not to say the definition of an enumerated type containing a single enumeration constant
should not occur, but this usage would be unusual. Enumeration constants share the same unvoiced
developer bias as macro names: perceived efficiency.
• Objects initialized with the constant. This approach is advocated by some coding guideline documents
for C++. The extent to which this is because an object declared with the const qualifier really is
constant and a translator need not allocate storage for it, or because use of the preprocessor (often
called the C preprocessor, as if it were not also in C++) is frowned on in the C++ community, is left
to the reader to decide.
The enumeration constant versus macro name issue is discussed in detail elsewhere (see 517 enumeration set of named constants).
What name to choose? The constant 6.6261E-34 illustrates another pitfall. Planck’s constant is almost
universally represented, within the physics community, using the letter h (a closely related constant is ħ,
the reduced Planck constant). A developer might be tempted to make use of this idiom to name the value,
perhaps even trying to find a way of using UCNs to obtain the appropriate h. The single letter h probably
gives no more information than the value. The name PLANCK_CONSTANT is self-evident. The developer
attitude that anybody who does not know what 6.6261E-34 represents has no business reading the source is
not very productive or helpful.
Table 822.1: Occurrence of different kinds of constants (as a percentage of all tokens) Based on the visible form of the c and
This is something of a circular definition in that a constant’s value is also used to determine its type. The
lexical form of a constant is also a factor in determining which of a number of possible types it may take (see 824 constant type determined by form and value). An
unsuffixed constant that is too large to be represented in the type long long, or a suffixed constant that is
larger than the type with the greatest rank applicable to that suffix, violates this requirement (unless there is
some extended integer type supported by the implementation into whose range the value falls).
It can be argued that all floating constants are in range if the implementation supports ±∞.
There is a similar constraint for enumeration constants (see 1440 enumeration constant representable in int).
C++
The C++ Standard has equivalent wording covering integer-literals (2.13.1p3), character-literals
(2.13.2p3) and floating-literals (2.13.3p1). For enumeration-literals their type depends on the
context in which the question is asked:
7.2p4
the closing brace, the type of each enumerator is the type of its initializing value
7.2p5
The underlying type of an enumeration is an integral type that can represent all the enumerator values defined in the enumeration.
float f_3 = 1e-99999999999999999999999999999999999999999999999; /* Approximately zero */
float f_4 = 0e-99999999999999999999999999999999999999999999999; /* Exact zero */
2.13.1p2 The type of an integer literal depends on its form, value, and suffix.
2.13.3p1 The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float,
There are no similar statements for the other kinds of literals, although C++ does support suffixes on the floating types. However, the syntactic form of string literals, character literals, and boolean literals determines their type.
Coding Guidelines
The type of a constant, unlike object types, can vary between implementations. For instance, the integer
constant 40000 can have either the type int or long int. The suffix on the integer constant 40000u only
ensures that it has one of the listed unsigned integer types. The coding guideline issues associated with the
possibility that the type of a constant can vary between implementations are discussed elsewhere (see 835 integer constant
type first in list).
6.4.4.1 Integer constants
825 integer constant syntax
integer-constant:
    decimal-constant integer-suffix opt
    octal-constant integer-suffix opt
    hexadecimal-constant integer-suffix opt
decimal-constant:
    nonzero-digit
    decimal-constant digit
octal-constant:
    0
    octal-constant octal-digit
hexadecimal-constant:
    hexadecimal-prefix hexadecimal-digit
    hexadecimal-constant hexadecimal-digit
hexadecimal-prefix: one of
Integer constants are created in translation phase 7 when the preprocessing tokens pp-number are converted
into tokens denoting various forms of constant (see 136 translation phase 7). Integer-constants always denote nonnegative values. The
character sequence -1 consists of the two tokens {-} {1}, a constant expression (see 1322 constant expression
syntax).
An integer-suffix can be used to restrict the set of possible types the constant can have; it also specifies
the lowest rank an integer constant may have (which for ll or LL leaves few further possibilities). The U, or
u, suffix indicates that the integer constant is unsigned.
All translation time integer constants are nonnegative. The character sequence -1 consists of the token
sequence unary minus followed by the decimal-constant 1. Support for translation time negative constants
in the lexical grammar would create unjustified complexity by requiring lexers to disambiguate binary from unary operator uses in, for instance, X-1.
C90
Support for long-long-suffix and the nonterminal hexadecimal-prefix is new in C99.
C++
The C++ syntax is identical to the C90 syntax.
Support for long-long-suffix and the nonterminal hexadecimal-prefix is not available in C++.
Common Implementations
Some implementations specify that the prefix 0b (or 0B) denotes an integer constant expressed in binary notation. Over the years the C Committee received a number of requests for such a prefix to be added to the C Standard. The Committee did not see sufficient utility for this prefix to be included in C99. The C embedded systems TR (see 18 Embedded C TR) specifies h and H to denote the types short fract or short accum, and one of k, K,
r, and R to denote a fixed-point type.
The IBM ILE C compiler[627] supports a packed decimal data type. The suffix d or D may be used to specify that a literal has this type. Microsoft C supports the suffixes i8, i16, i32, and i64 denoting integer constants having the types byte (an extension), short, int, and __int64, respectively.
Other Languages
Although Ada supports integer constants having bases between 2 and 16 (e.g., 2#1101# is the binary representation for 10#13#), few other languages support the use of suffixes. Ada also supports the use of underscores within an integer-constant to make the value more readable.
Coding Guidelines
A study by Brysbaert[174] found that the time taken for a person to process an Arabic integer between 1 and
99 was a function of the logarithm of its magnitude, the frequency of the number (based on various estimates
of its frequency of occurrence in everyday life; see Dorogovtsev et al.[373] for measurements of numbers appearing in web pages), and sometimes the number of syllables in the spoken form of the value. Subject response times varied from approximately 300 ms for values close to zero, to approximately 550 ms for values in the nineties.
Experience shows that the long-suffix l is often visually confused with the nonzero-digit 1 (footnote 825.1).
If a long-suffix is required, only the form L shall be used.
If a long-long-suffix is required, only the form LL shall be used.
As previously pointed out, constants appearing in the visible form of the source often signify some quantity
with real world semantics attached to it (see 822 constant syntax). However, uses of the integer constants 0 and 1 in the visible source often have no special semantics associated with their usage. They also represent a significant percentage of the total number of integer constants in the source code (see Figure 825.1). The frequency of occurrence of these values (most RISC processors dedicate a single register to permanently hold the value zero) comes about through commonly seen program operations. These operations include: code to count the number of occurrences of entities, or that contain loops, or index the previous or next element of an array (not that 0 or
1 could not also have similar semantic meaning to other constant values).
A blanket requirement that all integer constants be represented in the visible source by symbolic names fails to take into account that a large percentage of the integer constants used in programs have no special
that has measured the visual similarity of digits with letters.
[Figure 825.1 plot not reproduced: two panels, decimal-constant value and hexadecimal-constant value; axis ticks at 1, 10, 100, 1,000, 10,000, and 100,000.]
Figure 825.1: Number of integer constants having the lexical form of a decimal-constant (the literal 0 is also included in this
set) and hexadecimal-constant that have a given value. Based on the visible form of the c and h files.
meaning associated with them. In particular the integer constants 0 and 1 occur so often (see Figure 825.1)
that having to justify why each of them need not be replaced by a symbolic name would have a high cost for
an occasional benefit.
No integer constant, other than 0 and 1, shall appear in the visible source code, other than as the sole
preprocessing token in the body of a macro definition or in an enumeration definition.
Some developers are sloppy in the use of integer constants, using them where a floating constant was the
appropriate type. The presence of a period makes it explicitly visible that a floating type is being used. The
general issue of integer constant conversions is discussed elsewhere (see 835.2 integer constant with suffix, not immediately converted).
surprising because the significant digits of a set of values created by randomly sampling from a variety of
different distributions converge to a logarithmic distribution (i.e., Benford’s law).[583] While the results for
decimal-constant (see Figure 825.2) may appear to be a reasonable fit, applying a chi-squared test shows
the fit to be remarkably poor ($\chi^2 = 132{,}398$). The first nonzero digit of hexadecimal-constants appears to
be approximately evenly distributed.
Table 825.1: Occurrence of various kinds of integer-constants (as a percentage of all integer constants; note that zero is
included in the decimal-constant count rather than the octal-constant count). Based on the visible form of the c and h
[Figure 825.2 plot not reproduced: horizontal axis, first non-zero digit (1 to 9 and A to F); vertical axis ticks at 1, 10, and 100; series: decimal, hexadecimal.]
Figure 825.2: Probability of a decimal-constant or hexadecimal-constant starting with a particular digit; based on c files.
Dotted lines are the probabilities predicted by Benford’s law (for values expressed in base 10 and base 16), i.e., $\log(1 + d^{-1})$, where $d$ is the numeric value of the digit and the logarithm is taken in the base used to express the value.
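As a check on the formula in the caption, the predicted probabilities can be written out for a general base $b$ (the symbols $b$ and $P_b$ are introduced here for illustration only). The probabilities sum to one because the product telescopes, and the end points for base 10 and base 16 follow directly.

```latex
\begin{align*}
P_b(d) &= \log_b\!\left(1 + \frac{1}{d}\right), \qquad d = 1, \dots, b - 1 \\
\sum_{d=1}^{b-1} P_b(d) &= \log_b \prod_{d=1}^{b-1} \frac{d+1}{d} = \log_b b = 1 \\
P_{10}(1) &= \log_{10} 2 \approx 0.301, \qquad P_{10}(9) = \log_{10} \tfrac{10}{9} \approx 0.046 \\
P_{16}(1) &= \log_{16} 2 = 0.25, \qquad\;\; P_{16}(15) = \log_{16} \tfrac{16}{15} \approx 0.023
\end{align*}
```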
Table 825.2: Occurrence of various integer-suffix sequences (as a percentage of all integer-constants). Based on the
visible form of the c and h files.
Suffix Character Sequence    c files    h files
L/l                          0.1378     0.2096
ULL/uLl/ulL/Ull              0.0128     0.0061
U/uL/ul                      0.1269     0.1625
LLU/lLu/LlU/llu              0.0000     0.0000
Table 825.3: Common token pairs involving integer-constants. Based on the visible form of the .c files.
Token Sequence    % Occurrence of First Token    % Occurrence of Second Token
ordering rules, which were (for the number pair x and y):
• x has to be smaller than y
• x or y has to be round (i.e., round numbers include the numbers 1 to 20 and the multiples of five)
• the difference between x and y has to be a favorite number (These include: $10^n \times (1, 2, 1/2, \text{or } 1/4)$ for any value of $n$.)
Description
826 An integer constant begins with a digit, but has no period or exponent part.
Commentary
A restatement of information given in the Syntax clause
Commentary
A suffix need not uniquely determine an integer constant’s type, only the lowest rank it may have. There is no
suffix for specifying the type int, or any integer type with rank less than int (although implementations
may provide these as an extension).
The base document did not specify any suffixes; they were introduced in C90.
is often used to denote what the standard calls a decimal constant, which corresponds to the common case.
When they occur in source, both octal and hexadecimal constants are usually referred to by these names,
respectively. The benefits of educating developers to use the terminology decimal constant instead of integer
constant are very unlikely to exceed the cost.
Commentary
A restatement of information given in the Syntax clause
Coding Guidelines
The constant 0 is, technically, an octal constant. Some guideline documents use the term decimal constant in
their wording, overlooking the fact that, technically, this excludes the value 0. The guidelines given in this
book do not fall into this trap, but anybody who creates a modified version of them needs to watch out for it.
Commentary
A restatement of information given in the Syntax clause. An octal constant is a natural representation to use
when the value held in a single byte needs to be displayed (or read in) and the number of output indicators (or
input keys) is limited (only eight possibilities are needed). For instance, a freestanding environment where
the output device can only represent digits. The users of such input/output devices tend to be technically
literate.
Other Languages
A few other languages (e.g., Java and Ada) support octal constants. Most do not.
Common Implementations
K&R C supported the use of the digits 8 and 9 in octal constants (support for this functionality was removed
during the early evolution of C[1199] although some implementations continue to support it[610, 1094]). They
represented the values 10 and 11, respectively.
Coding Guidelines
Octal constants are rarely used (approximately 0.1% of all integer-constants, not counting the value 0).
There seem to be a number of reasons why developers occasionally use octal constants:
• A long-standing practice that arguments to calls to some Unix library functions use octal constants to indicate various attributes (e.g., open(file, O_WRONLY, 0666)). The introduction, by POSIX in
1990, of identifiers representing these properties has not affected many developers’ coding habits. The value 0666, in this usage, could be said to be treated like a symbolic identifier.
• Cases where it is sometimes necessary to think of a bit pattern in terms of its numeric value. Bit patterns are invariably grouped into bytes, making hexadecimal an easier representation to manipulate (because its visual representation is easily divisible into bytes and half bytes). However, mental arithmetic involving octal digits is easier to perform than that with hexadecimal digits. (There are fewer items of information that need to be remembered and people have generally automated the processing of digits, but conscious effort is needed to map the alphabetic letters to their numeric equivalents.)
• The values are copied from an external source; for instance, tables of measurements printed in octal.
There are no obvious reasons for recommending the use of octal constants over decimal or hexadecimal constants (there is a potential advantage to be had from using octal constants).
Other Languages
Many languages support the representation of hexadecimal constants in source code. The prefix character $
(not available in C’s basic source character set) is used almost as often, if not more so, than the 0x form of
of two rather than powers of ten. In such cases a constant appearing in the source as a hexadecimal constant
is more easily appreciated (in terms of the sums of the powers of two involved and by which powers of two it differs from other constants) than if expressed as a decimal constant.
Measurements of constant use in source code show that usage patterns for hexadecimal constants are
different from decimal constants (see 825 integer constant usage). The probability of a particular digit being the first nonzero digit in a hexadecimal constant is roughly constant, while the probability distribution of this digit in a decimal constant decreases with increasing value (a chi-squared analysis gives a very low probability of it matching Benford’s law). Also the sequence of value digits in a hexadecimal-constant (see Table 830.1) almost always exactly corresponds to the number of nibbles in either a character type, short, int, or long.
A study by Logan and Klapp[876] used alphabet arithmetic (e.g., A + 2 = C) to investigate how extended practice and rote memorization affected automaticity (see 0 automatization). For inexperienced subjects who had not memorized any
addition table, the results showed that the time taken to perform the addition increased linearly with the value
of the digit being added. This is consistent with subjects counting through the letters of the alphabet to obtain the answer. With sufficient practice subjects’ performance not only improved but became digit-independent. This is consistent with subjects recalling the answer from memory; the task had become automatic.
The practice group of subjects were given a sum and had to produce the answer. The memorization group
of subjects were asked to memorize a table of sums (e.g., A + 3 = D). In both cases the results showed that performance was proportional to the number of times each question/answer pair had been encountered, not the total amount of time spent.
Arithmetic involving hexadecimal constants differs from that involving decimal constants in that developers
will have had much less experience in performing it. The results of the Logan and Klapp study show that the
only way for developers to achieve the same level of proficiency is to commit the hexadecimal addition table
to memory. Whether the cost of this time investment has a worthwhile benefit is unknown.
Table 830.1: Occurrence of hexadecimal-constants containing a given number of digits (as a percentage of all such constants).
Based on the visible form of the c files.
Digits    Occurrence
C supports the representation of constants in the base chosen by evolution on planet Earth.
Commentary
The C language requires the use of binary representation for the integer types (see 593 unsigned integer types object representation). The use of both base 8 and
base 16 visual representations of binary information has been found to be generally more efficient, for people,
than using a binary representation. Developers continue to debate the merits of one base over another. Both
experience with using one particular base and the kind of application domain affect preferences.
Commentary
The correct Latin prefix is sex, giving sexadecimal. It has been claimed that this term was considered too
racy by IBM, who adopted hexadecimal (hex is the equivalent Greek prefix, the Latin decimal being retained)
in the 1960s to replace it (the term was used in 1952 by Carl-Eric Froeberg in a set of conversion tables).
834 The lexically first digit is the most significant.
Commentary
The Arabic digits in a constant could be read in any order. In Arabic, words and digits are read/written
right-to-left (least significant to most significant in the case of numbers). The order in which Arabic numerals
are written was exactly copied by medieval scholars, except that they interpreted them using the left-to-right
order used in European languages.
type first in list
Commentary
This list only applies to those pp-numbers that are converted to integer-constant tokens as part of
translation phase 7 (see 136 translation phase 7). Integer constants in #if preprocessor directives always have type intmax_t, or uintmax_t
(in C90 they had type long or unsigned long).
Other Languages
In Java integer constants have type int unless they are suffixed with l, or L, in which case they have type
long. Many languages have a single integer type, which is also the type of all integer constants.
Coding Guidelines
The type of an integer constant may depend on the characteristics of the host on which the program executes and the form used to express its value. For instance, the integer constant 40000 may have type int or long int (depending on whether int is represented in more than 16 bits, or in just 16 bits). The hexadecimal constant 0x9C40 (40000 decimal) may have type int or unsigned int (depending on whether int is represented in more than 16 bits, or in just 16 bits).
For objects having an integer type there is a guideline recommending that a single integer type always
be used (the type int). However, integer constants never have a type whose rank is less than int and so
The possibility that the type of an integer constant can vary between implementations and platforms creates a portability cost. There is also the potential for incorrect developer assumptions about the type of an integer constant, leading to additional maintenance costs. The specification of a guideline recommendation
is complicated by the fact that C does not support a suffix that specifies the type int (or its corresponding unsigned version). This means it is not possible to specify that a constant, such as 40000, has type int and expect a diagnostic to appear when using a translator that gives it the type long.
An integer constant containing a suffix is generally taken as a statement of intent by the developer. A suffixed integer constant that is immediately converted to another type is suspicious.
those types
Is there anything to be gained from recommending that integer constants less than 32767 be suffixed rather than implicitly converted to another type? The original type of such an integer constant is obvious to the reader and a conversion to a type for which the standard provides a suffix will not change its value; the real issue is developer expectation. Expectation can become involved through the semantics of what the constant represents. For instance, a program that manipulates values associated with the ISO 10646 Standard may store these values in objects that always have type unsigned int. This usage can lead to developers learning (implicitly or explicitly) that objects manipulating these semantic quantities have type unsigned
left operand has an unsigned type. (If it has a signed type, setting the most significant bit will cause the result
to be negative.) If the identifier FOO_char is a macro whose body is a constant integer having a signed type,
developer expectations will not have been met.
In those cases where developers have expectations of an operand having a particular type, use of a suffix
can help ensure that this expectation is met. If the integer constant appears in the visible source at the point
its value is used, developers can immediately deduce its type. An integer constant in the body of a macro
definition or as an argument in a macro invocation are the two circumstances where type information is not
immediately apparent to readers of the source. (The integer constant is likely to be widely separated from its
point of use in an expression.)
The disadvantage of specifying a suffix on an integer constant because of the context in which it is used is
that the applicable type may change. The issues involved with implicit conversion versus explicit conversion
are discussed elsewhere (see 654 implicit conversion). An explicit cast, using a typedef name rather than a suffix, is more flexible in this
regard.
Use of a suffix not defined by the standard, but provided by the implementation, is making use of an
extension. Does this usage fall within the guideline recommendation dealing with use of extensions (see 95.1 extensions
cost/benefit), or is it
sufficiently useful that a deviation should be made for it? Suffixes are a means for the developer to specify
type information on integer constants. Any construct that enables the developer to provide more information
is usually to be encouraged. While there are advantages to this usage, at the time of this writing insufficient
experience is available on the use of suffixes to know whether the advantages outweigh the disadvantages. A
deviation against the guideline recommendation might be applicable in some cases.
Dev 95.1
Any integer constant suffix supported by an implementation may be used.
Table 835.1: Occurrence of integer-constants having a particular type (as a percentage of all such constants; with the type
denoted by any suffix taken into account) when using two possible representations of the type int (i.e., 16- and 32-bit). Based on
the visible form of the c and h files.
Type             16-bit int    32-bit int
unsigned int     3.493         0.414
unsigned long    0.557         0.138
other-types      0.029         0.059
836 integer constant possible types
Suffix                      Decimal Constant          Octal or Hexadecimal Constant
none                        int                       int
                            long int                  unsigned int
                            long long int             long int
                                                      unsigned long int
                                                      long long int
                                                      unsigned long long int
u or U                      unsigned int              unsigned int
                            unsigned long int         unsigned long int
                            unsigned long long int    unsigned long long int
l or L                      long int                  long int
                            long long int             unsigned long int
                                                      long long int
                                                      unsigned long long int
Both u or U and l or L      unsigned long int         unsigned long int
                            unsigned long long int    unsigned long long int
ll or LL                    long long int             long long int
                                                      unsigned long long int
Both u or U and ll or LL    unsigned long long int    unsigned long long int
Commentary
The lowest rank that an integer constant can have is type int. This list contains the standard integer types only, giving preference to these types. Any supported extended integer type is considered if an appropriate type is not found from this list.
C90
The type of an integer constant is the first of the corresponding list in which its value can be represented
the letter l or L: long int, unsigned long int; suffixed by both the letters u or U and l or L: unsigned long int.
Support for the type long long is new in C99.
In C90 a sufficiently large decimal constant that does not contain a u or U suffix is given the type unsigned long. In C99 a decimal constant that does not contain either
of these suffixes is never given an unsigned type.
Because of the behavior of C++, the sequencing of some types on this list has changed from C90. The following shows the entries for the C90 Standard that have changed.
Suffix    Decimal Constant
none      int, long int, unsigned long int
l or L    long int, unsigned long int
Under C99, the none suffix, and l or L suffix, cases no longer contain an unsigned type on their list.
A decimal constant, unless given a u or U suffix, is always treated as a signed type.
2.13.1p2
If it is decimal and has no suffix, it has the first of these types in which its value can be represented: int, long
and has no suffix, it has the first of these types in which its value can be represented: int, unsigned int, long
int, unsigned long int. If it is suffixed by u or U, its type is the first of these types in which its value can be
represented: unsigned int, unsigned long int. If it is suffixed by l or L, its type is the first of these types in
which its value can be represented: long int, unsigned long int. If it is suffixed by ul, lu, uL, Lu, Ul, lU,
UL, or LU, its type is unsigned long int.
The C++ Standard follows the C99 convention of maintaining a decimal constant as a signed and never an
unsigned type.
The type long long, and its unsigned partner, is not available in C++.
There is a difference between C90 and C++ in that C90 can give a sufficiently large decimal
literal that does not contain a u or U suffix the type unsigned long. Neither the C++ nor the C99 Standard will
give a decimal constant that does not contain either of these suffixes an unsigned type.
Other Languages
In Java hexadecimal and octal literals always have a signed type and denote a negative value if the high-order
bit, for their type, is set. The literal 0xcafebabe has decimal value -889275714 and type int in Java, and
decimal value 3405691582 and type unsigned int or unsigned long in C.
extended integer type can represent its value
Commentary
For an implementation to support an integer constant that is not representable by any standard integer type
requires that it support an extended integer type that can represent a greater range of values than the types
long long or unsigned long long.
A C translation unit that contains an integer constant that has an extended integer type may not be accepted
by a conforming C++ translator. But then it may not be accepted by another conforming C translator either.
Support for the construct is implementation-defined.
Other Languages
Very few languages explicitly specify potential implementation support for extended integer types
Common Implementations
In some implementations it is possible for an integer constant to have a type with lower rank than those given
syntax
Coding Guidelines
Source containing an integer constant, the value of which is not representable in one of the standard integer
types, is making use of an extension. The guideline recommendation dealing with use of extensions (95.1, extensions cost/benefit) is
applicable here. If it is necessary for a program to use an integer constant having an extended integer type,
the deviation for this guideline specifies how this usage should be handled. The issue of an integer constant
being within the range supported by a standard integer type on one implementation and not within range on
greater than 32767
Consider the token 100000000000000000000 in an implementation that supports a 64-bit two's complement
long long, and no extended integer types. The numeric value of this token is outside of the range of any integer type supported by the implementation and therefore it has no type.
This sentence was added by the response to DR #298.
decimal-floating-constant:
    fractional-constant exponent-part opt floating-suffix opt
    digit-sequence exponent-part floating-suffix opt
exponent-part:
    e sign opt digit-sequence
    E sign opt digit-sequence
sign: one of
    + -
digit-sequence:
    digit
    digit-sequence digit
hexadecimal-fractional-constant:
    hexadecimal-digit-sequence opt . hexadecimal-digit-sequence
    hexadecimal-digit-sequence .
binary-exponent-part:
    p sign opt digit-sequence
    P sign opt digit-sequence
hexadecimal-digit-sequence:
    hexadecimal-digit
    hexadecimal-digit-sequence hexadecimal-digit
floating-suffix: one of
    f l F L
Commentary
The majority of decimal-floating-constants do not have an exact binary representation. For instance, if
FLT_RADIX is 2 then only 4% of constants having two digits after the decimal point can be represented
exactly (i.e., those ending 00, 25, 50, and 75).
Unlike an integer-suffix, a floating-suffix specifies the actual type, not the lowest rank of a set
of types (not that floating-point types have rank).
Hexadecimal floating constants were introduced to remove the problems associated with translators
incorrectly mapping character sequences denoting decimal floating constants to the internal representation
of floating numbers used at execution time. The potential mapping problems only apply to the significand,
so a decimal representation can still be used for the exponent (requiring a hexadecimal representation for
the exponent would have made it harder for human readers to quickly gauge the magnitude of a constant
and created a lexical ambiguity, e.g., would the character sequence p0x1f be interpreted as ending in the
floating-suffix f or not).
The exponent is always required for the hexadecimal notation, unlike decimal floating constants; otherwise,
the translator would not be able to resolve the ambiguity that occurs when an f, or F, appears as the last
character of a preprocessing token. For instance, 0x1.f could mean 1.0f (the f interpreted as a suffix
indicating the type float) or 1.9375 (the f being interpreted as part of the significand value).
The hexadecimal-floating-constant 0x1.FFFFFEp128f does not represent the IEC 60559
single-format NaN. It overflows to an infinity in the single format.
C90
Support for hexadecimal-floating-constant is new in C99. The terminal decimal-floating-constant
is new in C99 and its right-hand side appeared on the right of floating-constant in the C90 Standard.
C++
The C++ syntax is identical to that given in the C90 Standard.
Support for hexadecimal-floating-constant is not available in C++.
Figure 842.1: Probability of a decimal-floating-constant (i.e., not hexadecimal) starting with a particular digit (axis label: first non-zero digit). Based on the visible form of the .c files. The dotted line is the probability predicted by Benford's law, i.e., $\log_{10}(1 + d^{-1})$, where $d$ is the numeric value of the digit ($\chi^2 = 1{,}680$ is a very poor fit).
Other Languages
Support for hexadecimal-floating-constant is unique to C. Fortran 90 supports the use of a KIND
specifier as part of the floating constant. Fortran also supports the use of the letter D, rather than E, in the exponent part to indicate that the constant has type double (rather than real, the single-precision default type). Java supports the optional suffixes f (type float) and d (type double, the default).
Coding Guidelines
Mapping to and from a hexadecimal floating constant, and its value as a floating-point literal, requires knowledge of the underlying representation. The purpose of supporting the hexadecimal floating constant notation is to allow developers to remove uncertainty over the accuracy of the mapping, of values expressed
in decimal, performed by translators. Developers are unlikely to want to express floating constants in hexadecimal notation for any other reason and the guideline recommendation dealing with use of representation information is not applicable.
A floating constant may be expressed using the hexadecimal floating-point notation.
The advantage of hexadecimal floating constants is that they guarantee an exact (when FLT_RADIX is a power
of two) floating value in the program image, provided the constant has the same or less precision than the type.
For the same rationale as integer constants, there is good reason why most floating constants should not
sole preprocessing token in the body of a macro definition
Table 842.1: Occurrence of various floating-suffixes (as a percentage of all such constants). Based on the visible form of the .c and .h files.
Suffix Character Sequence | .c files | .h files
Table 842.2: Common token pairs involving floating-constants. Based on the visible form of the .c files.
Token Sequence | % Occurrence of First Token | % Occurrence of Second Token
This defines the terms significand part and exponent part
whole-number part fraction part
Commentary
A restatement of information given in the Syntax clause. The character denoting the period, which may
appear when floating-point values are converted to strings, is locale dependent. However, the period character
that appears in C source is not locale dependent.
A leading zero does not indicate an octal floating-point value.
C++
2.13.3p1
The integer part, the optional decimal point and the optional fraction part form the significant part of the floating
literal.
The use of the term significant may be a typo. This term does not appear in the C++ Standard and it is only
used in this context in one paragraph.
Other Languages
This form of notation is common to all languages that support floating constants, although in some languages
the period (decimal point) in a floating constant is not optional.
Coding Guidelines
The term whole-number is sometimes used by developers. A more commonly used term is integer part (the
term used by the C++ Standard). The commonly used term for the period character in a floating constant is
decimal point.
A common mathematical convention is to have a single nonzero digit preceding the period. This is a
useful convention when reading source code since it enables a quick estimate of the magnitude of the value to
be made. There are also circumstances where more than one digit before the period, or leading zeros before and after the period, can improve readability when the floating constant is one of many entries in a table. In this case the relative position of the first nonzero digit may provide a useful guide to the relative value of a series of constants, which may be more important information than their magnitudes.
Your author knows of no research showing that any method of displaying floating constants minimizes the cognitive effort, or the error rate, in comprehending them. However, there does appear to be an advantage in having consistency of visual form between constants close to each other in the source. Comprehending the relationship between the various initializers appears to require less effort for g_1 and g_2 than it does for g_3.
poor character to binary conversion), but they do contain information. Trailing zeros can be interpreted as a
statement of accuracy; for instance, the measurement 7.60 inches is more accurate than 7.6 inches.
Leading zeros are sometimes used for padding and have no alternative interpretation. Adding trailing zeros to a fractional part for padding purposes is misleading. They could be interpreted as giving a floating constant a degree of accuracy that it does not possess. While such usage does not affect the behavior of a program, it can affect how developers interpret the accuracy of the results.
Floating constants shall not contain trailing zeros in their fractional part unless these zeros accurately represent the known value of the quantity being represented.
845
signed digit sequence
Figure 844.1: Number of floating-constants that do not contain an exponent part, containing a given number of digits before and after the decimal point (dp), and the total number of digits in a floating-constant (axis labels: characters before dp, characters after dp). Based on the visible form of the .c and .h files.
C++
Like C90, the C++ Standard does not support the use of p, or P.
Other Languages
The use of the notation e or E is common to most languages that support the same form of floating constants.
Fortran also supports the use of the letter D, rather than E, to indicate the exponent. In this case the constant
has type double (there is no type long double).
Coding Guidelines
Amongst a string of digits, the letter E can easily be mistaken for the digit 8. There is no such problem with
the letter e, which also adds a distinguishing feature to the visual appearance of a floating constant (a change
in the height of the characters denoting the constant). However, there is no evidence to suggest that this
choice of exponent letter is sufficiently important to warrant a guideline recommendation. At the time of this
writing there is little experience available for how developers view the exponent p and P. While the prefix
indicates that a hexadecimal constant is being denoted, a lowercase p offers an easily distinguished feature
that its uppercase equivalent does not.
When only one of these parts is present, the period character might easily be overlooked, especially when
floating constants occur adjacent to other punctuation tokens such as a comma. This problem can be overcome
by ensuring that a digit (zero can always be used) appears on either side of the period. However, such usage
is not, itself, free of problems. The period can be interpreted as a comma (if the source is being quickly
scanned), causing the digits on either side of the period to be treated as two separate constants. The issue of
white space between tokens is discussed elsewhere. In the case of digits after the decimal point, there is also
value with a constant that contains a period than one that only contains an exponent (which is likely to require conscious attention). However, given existing usage (see Figure 844.1) a guideline recommendation does not appear worthwhile.
Developers reading source often only need an approximate estimate of the value of floating constants. The first few digits and the power of ten (sometimes referred to as the order of magnitude or simply the magnitude) contain sufficient value information. The magnitude can be calculated by knowing the number
of nonzero digits before the decimal point and the value of the exponent. There are many ways in which these two quantities can be varied and yet always denote the same value. Is there a way of denoting a floating constant such that its visible appearance minimizes the cognitive effort needed to obtain an estimate of its value? The possible ways of varying the visible appearance of a floating constant include:
• Not using an exponent; the magnitude is obtained by counting the number of digits in the whole-number part
• Having a fixed number of digits in the whole-number part, usually one; the magnitude is obtained by looking at the value of the exponent
• Some combination of digits in the whole-number part and the exponent
There are a number of factors that suggest developers' effort will be minimized when small numbers are
written using only a few digits before the decimal point rather than using an exponent, including the following:
• Numbers occur frequently in everyday life and people are practiced at processing the range of values
they commonly encounter. The prices of many items in shops in the UK and USA tend to have only a
few digits before the decimal point, while in countries such as Japan and Italy they tend to have more
digits (because of the relative value of their currency).
• Subitizing is the name given to the ability most people have of instantly knowing the number of items
in a small set (containing up to five items, although some people can only manage three) without
explicitly counting them.
Your author does not know of any algorithm that optimizes the format (i.e., how many digits should appear
before a decimal point or selecting whether to use an exponent or not) in which floating-point constants
appear, such that reader effort in extracting a value from them is minimized.
Example
Your author is not aware of any studies investigating the effect that the characteristics of human information
processing (e.g., the Stroop effect) have on the probability of the value of a constant being misinterpreted.
While support for the hexadecimal representation of floating constants may not be defined in other language
standards, some implementations of these languages (e.g., Fortran) support it.
Writing the number in normalized form, we get:
$$1100.01011000010100011111_2 \times 2^{0} = 1.10001011000010100011111_2 \times 2^{3} \tag{848.3}$$
Representing the number in single-precision, the exponent bias is 127, giving an exponent of $127 + 3 = 130_{10} = 10000010_2$. The final bit pattern is (where | indicates the division of the 32-bit representation into sign bit, exponent, and significand):
What is the decimal representation of the hexadecimal floating-point constant 0x0.12345p0, assuming an IEC 60559 representation? For the significand we have:
$$.12345_{16} = .00010010001101000101_2 = 1.0010001101000101_2 \times 2^{-4} \tag{848.5}$$
For the exponent we have:
which gives a bit pattern of: