The New C Standard- P9


The standard does specify a minimum limit on the number of characters a translator must consider as significant. Implementations are free to ignore characters once this limit is reached. The ignored characters do not form part of another token. It is as if they did not appear in the source at all.

C90

The C90 Standard does not explicitly state this fact.

Other Languages

Few languages place limits on the maximum length of an identifier that can appear in a source file. Like C, some specify a lower limit on the number of characters that must be considered significant.

Coding Guidelines

Using a large number of characters in an identifier spelling has many potential benefits; for instance, it provides the opportunity to supply a lot of information to readers, or to reduce dependencies on existing reader knowledge by spelling words in full rather than using abbreviations. There are also potential costs; for instance, long identifiers can cause visual layout problems in the source (requiring new-lines within an expression in an attempt to keep the maximum line length within the bounds that can be viewed within a fixed-width window), or increase the cognitive effort needed to visually scan source containing them.

The length of an identifier is not itself directly a coding guideline issue. However, length is indirectly involved in many identifier memorability, confusability, and usability issues, which are discussed elsewhere.

Usage

The distribution of identifier lengths is given in Figure 792.7.

UCN

Commentary

Using other UCNs results in undefined behavior (in some cases even using these UCNs can be a constraint violation). These character encodings could be thought of as representing letters in the specified national

A collating sequence may not be defined for these universal character names. In practice a lack of a defined collating sequence is not an implementation problem, because a translator only ever needs to compare the spelling of one identifier for equality with another identifier, which involves a simple character-by-character comparison (the issue of the ordering of diacritics is handled by not allowing them to occur in an identifier).

Support for this functionality is new and the extent to which implementations are likely to check that UCN values fall within the list given in annex D is not known.


The intended purpose for supporting universal character names in identifiers is to reduce the developer effort needed to comprehend source. Identifiers spelled in the developer’s native tongue are more immediately recognizable (because of greater practice with those characters) and also have semantic associations that are more readily brought to mind.

The ISO 10646 Standard does not specify which languages contain the characters it specifies (although it does give names to some sets of characters that correspond to a language that contains them). The written forms of some human languages share common characters; for instance, the characters a through z (and their uppercase forms) appear in many European orthographies. The following discussion refers to using UCNs from more than one human language. This is to be taken to mean using UCNs that are not part of the written form of the native language of the developer (the case of developers having more than one native language is not considered). For instance, the character a is used in both Swedish and German; the character å is used in Swedish, but not German; the character ß is used in German but not Swedish. Both Swedish and German developers would be familiar with the character a, but the character ß would be considered foreign to a Swedish developer, and the character å foreign to the German.

Some coding guideline documents recommend against the use of UCNs. Their use within identifiers can increase the portability cost of the source. The use of UCNs is an economic issue; the potential cost of not permitting their use in identifiers needs to be compared against the potential portability benefits. (Alternatively, the benefits of using UCNs could be compared against the possible portability costs.)

Given the purpose of using UCNs, is there any rationale for identifiers to contain characters from more than one human language? As an English speaker, your author can imagine a developer wanting to use an English word, or its common abbreviation, as a prefix or suffix to an identifier name. Perhaps an Urdu speaker can imagine a similar usage with Urdu words. The issue is whether the use of characters in the same identifier from different human languages has meaning to the developers who write and maintain the source.

Identifiers very rarely occur in isolation. Should all the identifiers in the same function, or even source file, only contain UCNs that form the set of characters used by a single human language? Using characters from different human languages when it is possible to use only characters from a single language potentially increases the cost of maintenance. Future maintainers are either going to have to be familiar with the orthography and semantics of the two human languages used or spend additional time processing instances of identifiers containing characters they are not familiar with. However, in some cases it might not be possible to enforce a single human language rule. For instance, a third-party library may contain callable functions whose spellings use characters from a human language different from that used in the source code that contains calls to it.

Support for the use of UCNs in identifiers is new in C99 (and other computer languages) and at the time of this writing there is almost no practical experience available on the sort of mistakes that developers make with them.


Table 797.1: The Unicode digit encodings.

Encoding Range   Language                Encoding Range   Language
0030–0039        ISO Latin-1             0BE7–0BEF        Tamil (has no zero)
0660–0669        Arabic–Indic            0C66–0C6F        Telugu
06F0–06F9        Eastern Arabic–Indic    0CE6–0CEF        Kannada
0966–096F        Devanagari              0D66–0D6F        Malayalam
09E6–09EF        Bengali                 0E50–0E59        Thai
0A66–0A6F        Gurmukhi                0ED0–0ED9        Lao
0AE6–0AEF        Gujarati                FF10–FF19        Fullwidth digits
0B66–0B6F        Oriya

C++

This requirement is implied by the terminal non-name used in the C++ syntax. Annex E of the C++ Standard does not list any UCN digits in the list of supported UCN encodings.

Other Languages

Java has a similar requirement.

The extent to which different cultural conventions support the use of a digit as the first character in an identifier is not known to your author. At some future date the Committee may choose to support the writing of integer constants using UCNs. If this happens, any identifiers that start with a UCN designating a digit are liable to result in syntax violations. There does not appear to be a worthwhile benefit in a guideline recommendation dealing with the case of an identifier beginning with a UCN designating a digit.

in identifiers;

Commentary

Prior to C99 there was no standardized method of representing nonbasic source character set characters in the source code. Support for multibyte characters in string literals and constants was specified in C90; some implementations extended this usage to cover identifiers. They are now officially sanctioned to do this.

Support for the ISO 10646 Standard is new in C99. However, there are a number of existing implementations that use a multibyte encoding scheme and this usage is likely to continue for many years. The C committee recognized the importance of this usage and does not force developers to go down a UCN-only path.

The standard says nothing about the behavior of the __func__ reserved identifier in the case when a function name is spelled using wide characters.

implementation-defined mapping of the source file characters, and an implementation may choose to support multibyte characters in identifiers via this route.


Other Languages

While other language standards may not mention multibyte characters, the problem they address is faced by implementations of those languages. For this reason, it is to be expected that some implementations of other languages will contain some form of support for multibyte characters.

existing software is converted to use it

Common Implementations

It is common to find translators aimed at the Japanese market supporting JIS, shift-JIS, and EUC encodings (see Table 243.3). These encodings use different numeric values than those given in ISO 10646 to represent the same national character.

800

When preprocessing tokens are converted to tokens during translation phase 7, if a preprocessing token could be converted to either a keyword or an identifier, it is converted to a keyword.

Commentary

The Committee could have created a separate name space for keywords and allowed developers to define identifiers having the same spelling as a keyword. The complexity added to a translator by such a specification would be significant (based on implementation experience for languages that support this functionality), while a developer’s inability to define identifiers having these spellings was considered a relatively small inconvenience.

C90

This wording is a simplification of the convoluted logic needed in the C90 Standard to deduce from a constraint what C99 now says in semantics. The removal of this C90 constraint is not a change of behavior, since it was not possible to write a program that violated it.


Extended characters were not available in C90, so the suggestion in this footnote does not apply.

Other Languages

Issues involving third-party linkers are common to most language implementations that compile to machine code. Some languages, for instance Java, define the characteristics of an implementation at translation and execution time. The Java language specification goes to the extreme (compared to other languages) of specifying the format of the generated object code file.

There is a long-standing convention of prefixing externally visible identifier names with an underscore character when information on them is written out to an object file. There is little experience available on implementation issues involving UCNs, but many existing linkers do assume that identifiers are encoded using 8-bit characters.

The encoding of external identifiers only needs to be considered when interfacing to, or from, code written in another language. Cross-language interfacing is outside the scope of these coding guidelines.

universal character name

Commentary

Some linkers may not support an occurrence of the backslash (\) character in an identifier name. One solution to this problem is to create names that cannot be declared in the source code by the developer; for instance, by deleting the \ characters and prefixing the name with a digit character.

There are no standards for encoding of universal character names in object files. The requirement to support this form of encoding is too new for it to be possible to say anything about common encodings.

Commentary

Here the word long does not have any special meaning. It simply suggests an identifier containing many characters.

282 internal identifier significant characters

All characters are significant.20)

C identifiers that differ after the last significant character will cause a diagnostic to be generated by a C++ translator.

Annex B contains an informative list of possible implementation limits. However, “these quantities are only guidelines and do not determine compliance.”


Internal identifiers only need to be processed by the translator and the standard is in a strong position to

2.10p1 All characters are significant.20)

References to the same C identifier, which differs after the last significant character, will cause a diagnostic to be generated by a C++ translator.

There is also an informative annex which states:

Annex Bp2 Number of initial characters in an internal identifier or a macro name [1024]

Number of initial characters in an external identifier [1024]


While the C90 minimum limits for the number of significant characters in an identifier might be considered unacceptable by many developers, the C99 limits are sufficiently generous that few developers are likely to complain.

Automatically generated C source sometimes relies on a large number of significant characters in an identifier. This can occur because of the desire to simplify the implementation of the generator. Character sequences in different offsets within an identifier might be reserved for different purposes. A predefined default character sequence is used to pad the identifier spelling where necessary.

As the following example shows, it is possible for a program’s behavior to change, both when the number of significant characters is increased and when it is decreased.

/*
 * Yes, C99 does specify 64 significant characters in an internal
 * identifier. But to keep this example within the page width
 * we have taken some liberties.

 * If there are 34 significant characters, the following operand
 * will resolve to the locally declared object.
 *
 * will resolve to the globally declared object.

 * will resolve to the globally declared object.
 *
 * will resolve to the locally declared object.

_1 _2 _3 _bb++;
}
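The effect of the significant-character limit can also be sketched with a small helper. This is an illustration only, not the example from the original text: the function name same_identifier and the identifier spellings are hypothetical, and the model assumes a translator that simply ignores every character after the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether a translator that treats only the
 * first `significant` characters as significant would regard two
 * spellings as the same identifier.
 */
static int same_identifier(const char *a, const char *b, size_t significant)
{
    assert(significant > 0);
    /* strncmp stops at the first difference, a null character, or the limit. */
    return strncmp(a, b, significant) == 0;
}

/* The spellings differ only in their 13th character. */
static void significance_demo(void)
{
    assert(same_identifier("total_count_a", "total_count_b", 12));
    assert(!same_identifier("total_count_a", "total_count_b", 13));
}
```

With a limit of 12 the two spellings name the same object; raising the limit to 13 makes them distinct, which is the kind of change in name resolution described above.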

The following issues need to be addressed:

• All references to the same identifier should use the same character sequence; that is, all characters are intended to be significant. References to the same identifiers that differ in nonsignificant characters need to be treated as faults.

• Within how many significant characters should different identifiers differ? Should identifiers be required to differ within the minimum number of significant characters specified by the standard, or can a greater number of characters be considered significant?

Figure 806.1: Occurrence of unique identifiers whose significant characters match those of a different identifier (as a percentage of all unique identifiers in a program), for various numbers of significant characters. Based on the visible form of the .c files.

Readers do not always carefully check all characters in the spelling of an identifier. The contribution made by characters occurring in different parts of an identifier will depend on the pattern of eye movements employed by readers, which in turn may be affected by their reasons for reading the source, plus cultural factors (e.g., direction in which they read text in their native language, or the significance of word endings in their native language). Characters occurring at both ends of an identifier are used by readers (at least native English- and

Identifiers that differ in a single significant character may be considered to be

• different identifiers by a translator, but considered to be the same identifier by some readers of the source (because they fail to notice the difference)

• the same identifiers by a translator (because the difference occurs in a nonsignificant character), but considered to be different identifiers by some readers of the source (because they treat all characters as being significant)

• identifiers by both a translator and some readers of the source

The possible reasons for readers making mistakes are discussed elsewhere, as are the guideline developer


extern int e1;
extern long el;
extern int a_longer_more_meaningful_name;
extern int a_longer_more_meeningful_name;
extern int a_meaningful_more_longer_name;

Commentary

While the obvious implementation strategy is to ignore the nonsignificant characters, the standard does not require implementations to use this strategy. To speed up identifier lookup many implementations use a hashed symbol table; the hash value for each identifier is computed from the sequence of characters it contains. Computing this hash value as the characters are read in, to form an identifier, saves a second pass over those same characters later. If nonsignificant characters were included in the original computed hash value, a subsequent occurrence of that identifier in the source, differing in nonsignificant characters, would result in a different hash value being calculated and a strong likelihood that the hash table lookup would fail.

Developers generally expect implementations to ignore nonsignificant characters. An implementation that behaved differently because identifiers differed in nonsignificant characters might not be regarded as being very user friendly. Highlighting misspellings that occur in nonsignificant characters is not always seen in a positive light by some developers.
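The hashing strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name hash_identifier is hypothetical, and the FNV-1a style hash is simply one convenient choice, not something the text attributes to any particular translator. The point is only that characters beyond the significant limit never contribute to the hash value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hash at most `significant` leading characters of an identifier
 * spelling. The hash function itself (FNV-1a style constants) is an
 * assumption made for this sketch.
 */
static unsigned long hash_identifier(const char *spelling, size_t significant)
{
    unsigned long h = 2166136261UL;
    size_t i;

    assert(spelling != NULL);
    for (i = 0; i < significant && spelling[i] != '\0'; i++) {
        h ^= (unsigned char)spelling[i]; /* mix in one significant character */
        h *= 16777619UL;
    }
    return h; /* characters at index >= significant were never examined */
}
```

Two spellings that differ only after the limit, such as index_a and index_b with six significant characters, hash to the same bucket and so can be found by the same lookup.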

C++

In C++ all characters are significant, thus this statement does not apply in C++.

Other Languages

Some languages specify that nonsignificant characters are ignored and have no effect on the program, while others are silent on the subject.

Most implementations simply ignore nonsignificant characters. They play no part in identifier lookup in symbol tables.

The coding guideline issues relating to the number of characters in an identifier that should be considered

significant characters

809 Forward references: universal character names (6.4.3), macro replacement (6.10.3).

6.4.2.2 Predefined identifiers

Semantics

brace of each function definition, the declaration

static const char __func__[] = "function-name";

Commentary

Implicitly declaring __func__ immediately after the opening brace in a function definition means that the first, developer-written declaration within that function can access it. Giving __func__ static storage duration enables its address to be referred to outside the lifetime of the function that contains it (e.g., enabling a call history to be displayed at some later stage of program execution). This is not a storage overhead because space needs to be allocated for the string literal denoted by __func__. The const qualifier ensures that any attempts to modify the value cause undefined behavior. The identifier __func__ has an array type, and is not a string literal, so the string concatenation that occurs in translation phase 6 is not applicable.

is not necessary when that object is defined using the const qualifier. gcc also supports the built-in form __FUNCTION__.
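A short sketch of the two properties discussed above: __func__ has static storage duration, so a pointer to it remains valid after the function returns, and it is an array rather than a string literal, so it is passed as a printf argument instead of being concatenated. The function names report_status and func_demo are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns a pointer to the name of this function. The pointer stays
 * valid after the call returns because __func__ has static storage
 * duration.
 */
static const char *report_status(void)
{
    /* __func__ is an array, not a string literal, so it cannot be
     * joined to "..." by adjacent-literal concatenation. */
    printf("%s: status ok\n", __func__);
    return __func__;
}

static void func_demo(void)
{
    const char *name = report_status();
    assert(strcmp(name, "report_status") == 0);
}
```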

Example

Debugging code in functions can provide useful information. But when there are lots of functions, the quantity of useless information can be overwhelming. Controlling which functions are to output debugging information by using conditional compilation requires that code be edited and the program rebuilt.

The names of functions can be used to dynamically control which functions are to output debugging information. This control not only reduces the amount of information output, but can also reduce execution time by orders of magnitude (output can be a resource-intense operation).


 * Use the name of the function to control whether debugging is
 * switched on/off. lookup is only called the first time this code
 * is executed, thereafter the value f_l->enabled can be used.

#define D_func_trace(func_name, code) { \
   static func_list * f_l = NULL; \
   if (f_l ? f_l->enabled : lookup(&f_l, func_name)) \

 * A fixed list of functions and their debug mode.
 * We could be more clever and make this a list which
 * could be added to as a program executes.

 * Loop through lookup_table looking for a match against f_name.
 * If a match is found, add f_list to the traces_seen list and
 * return the value of enabled for that entry.

 * Loop through lookup_table looking for a match against f_name.
 * If a match is found, loop over its traces_seen list setting
 * the enabled flag to new_enabled.
 *
 * This function can switch on/off the debugging output from
 * any registered function.

}
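Since only fragments of the example survive here, the idea can be illustrated with a separate, much simplified sketch. Everything in it is an assumption made for illustration: the names trace_entry, trace_table, trace_enabled, set_trace, and TRACE are hypothetical, and unlike the approach outlined in the comments above, this version searches the table on every call rather than caching the result of the first lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed table of function names and their debug mode. */
struct trace_entry {
    const char *name;
    int enabled;
};

static struct trace_entry trace_table[] = {
    { "parse_input", 1 },
    { "write_output", 0 },
};

#define TRACE_COUNT (sizeof trace_table / sizeof trace_table[0])

/* Return the enabled flag for f_name, or 0 if the name is not listed. */
static int trace_enabled(const char *f_name)
{
    size_t i;

    assert(f_name != NULL);
    for (i = 0; i < TRACE_COUNT; i++)
        if (strcmp(trace_table[i].name, f_name) == 0)
            return trace_table[i].enabled;
    return 0;
}

/* Switch tracing on/off at run time; returns 1 if f_name was found. */
static int set_trace(const char *f_name, int new_enabled)
{
    size_t i;

    for (i = 0; i < TRACE_COUNT; i++)
        if (strcmp(trace_table[i].name, f_name) == 0) {
            trace_table[i].enabled = new_enabled;
            return 1;
        }
    return 0;
}

/* Emit a message only when tracing is enabled for the calling function. */
#define TRACE(msg) \
    do { \
        if (trace_enabled(__func__)) \
            fprintf(stderr, "%s: %s\n", __func__, (msg)); \
    } while (0)

/* Each function returns its own trace state so the effect can be observed. */
static int parse_input(void)  { TRACE("called"); return trace_enabled(__func__); }
static int write_output(void) { TRACE("called"); return trace_enabled(__func__); }
```

Calling set_trace("write_output", 1) while the program is running turns on output from write_output without editing or rebuilding any code, which is the benefit claimed for name-based control.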

translated into the execution character set as indicated in translation phase 5.

Commentary

Having the name appearing as if in translation phase 5 avoids any potential issues caused by macro names defined with the spelling of keywords or the name __func__. It also enables a translator to have an identifier name and type predefined internally, ready to be used when this reserved identifier is encountered. Translation phase 5 is also where characters get converted to their corresponding members in the execution character set, an essential requirement for spelling a function name. In many implementations the function name written to the object file, or program image, is different from the one appearing in the source. This translation phase 5 program

 * The implicit declaration does not appear until after preprocessing.
 * So there is no declaration ’static const char __func__[] = "f";’
 * visible to the preprocessor (which would result in __func__ being
 * mapped to CNUF and "f" rather than "g" being output).


Names beginning with __ are reserved for use by a C++ implementation. This leaves the way open for a C++ implementation to use this name for some purpose.

6.4.3 Universal character names

815

universal character name syntax

It is intended that this syntax notation not be visible to the developer, when reading or writing source code that contains instances of this construct. That is, a universal-character-name aware editor displays the ISO 10646 glyph representing the numeric value specified by the hex-quad sequence value. Without such editor support, the whole rationale for adding these characters to C, allowing developers to read and write identifiers in their own language, is voided.

It is difficult to imagine developers regularly using UCNs with an editor that does not display UCNs in some graphical form. A guideline recommending the use of such an editor would not be telling developers anything they did not already know.

A number of theories about how people recognize words have been proposed. One of the major issues yet to be resolved is the extent to which readers make use of whole word recognition versus mapping character sequences to sound (phonological coding). Support for UCNs increases the possibility that developers will encounter unfamiliar characters in source code. The issue of developer performance in handling unfamiliar


foo(\\u0123); /* Does contain a UCN */

0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.62)

in the basic source character set. The ranges D800 through DBFF and DC00 through DFFF are known as the surrogate ranges. The purpose of these ranges is to allow representation of rare characters in future versions of the Unicode standard.

This constraint means that source files cannot contain the UCN equivalent for any members of the basic source character set.

Rationale

UCNs are not permitted to designate characters from the basic source character set in order to permit fast compilation times for C programs. For some real world programs, compilers spend a significant amount of time merely scanning for the characters that end a quoted string, or end a comment, or end some other token. Although it is trivial for such loops in a compiler to be able to recognize UCNs, this can result in a surprising amount of overhead.

A UCN is constrained not to specify a character short identifier in the range 0000 through 0020 or 007F through 009F inclusive for the same reason: this avoids allowing a UCN to designate the newline character. Since different implementations use different control characters or sequences of control characters to represent newline, UCNs are prohibited from representing any control character.

C++

2.2p2 If the hexadecimal value for a universal character name is less than 0x20 or in the range 0x7F–0x9F (inclusive), or if the universal character name designates a character in the basic source character set, then the program is ill-formed.

The range of hexadecimal values that are not permitted in C++ is a subset of those that are not permitted in C. This means that source which has been accepted by a conforming C translator will also be accepted by a conforming C++ translator, but not the other way around.
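The C99 constraint on UCN values (no short identifier less than 00A0 other than 0024, 0040, or 0060, and none in the range D800 through DFFF) can be written as a small predicate. This is a sketch only: the function name ucn_value_permitted is hypothetical, and it checks nothing beyond that one constraint (for example, it does not check whether the value is permitted in an identifier by annex D).

```c
#include <assert.h>

/*
 * Hypothetical predicate: 1 if the C99 constraint on universal
 * character names allows this value, 0 otherwise.
 */
static int ucn_value_permitted(unsigned long code_point)
{
    /* The surrogate ranges D800-DFFF are never permitted. */
    if (code_point >= 0xD800UL && code_point <= 0xDFFFUL)
        return 0;
    /* Below 00A0 only $, @ and ` may be designated by a UCN. */
    if (code_point < 0x00A0UL)
        return code_point == 0x0024UL
            || code_point == 0x0040UL
            || code_point == 0x0060UL;
    return 1;
}

static void ucn_demo(void)
{
    assert(!ucn_value_permitted(0x0041UL)); /* A: basic source character */
    assert(!ucn_value_permitted(0x000AUL)); /* a control character */
    assert(ucn_value_permitted(0x00E9UL));  /* e with acute accent */
}
```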


817 Universal character names may be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set.

Commentary

UCNs may also appear in comments. However, comments do not have a lexical structure to them. Inside a comment, character sequences starting with \u are not treated as UCNs by a translator, although other tools may choose to do so in this context. The mapping of UCNs in character constants and string literals to the execution character set occurs in translation phase 5.

The constraint on the range of values that a UCN may take prevents them from being used to represent keywords.

C++

The C++ Standard also supports the use of universal character names in these contexts, but does not say in words what it specifies in the syntax (although 2.2p2 comes close for identifiers).

Other Languages

In Java, UnicodeInputCharacters can represent any character and is mapped in lexical translation step 1. It is possible for every character in the source to appear in this form. The mapping only occurs once, so \u005cu005a becomes \u005a, not Z (005c is the Unicode value for \ and 005a is the Unicode character for Z).

UCNs in character constants and string literals are used to represent characters that are output when a program is executed, or in identifiers to provide more readable source code. In the former case it is possible that UCNs from different natural languages will need to be represented. In the latter case it might be surprising if source code contained UCNs from different languages. This usage is a complex one involving issues outside of these coding guidelines (e.g., configuration management and customer requirements) and your author has insufficient experience to know whether any guideline recommendations might be worthwhile.

Some of the coding guideline issues relating to the use of characters outside of the basic execution

The standard specifies how UCNs are represented in source code. A development environment may choose to provide, to developers, a visible representation of the UCN that matches the glyph with the corresponding numeric value in ISO 10646. The ISO 10646 BNF syntax for short identifiers is:

{ U | u } [ {+}(xxxx | xxxxx | xxxxxx) | {-}xxxxxxxx ]

where x represents a hexadecimal digit.


is likely to result in more significant characters being retained in identifiers having external linkage.

819

nnnn (and whose eight-digit short identifier is 0000nnnn)

Commentary

It was possible to represent all of the characters specified by versions 1 and 2 of the Unicode-sponsored character set using four-digit short identifiers. Version 3 introduced characters whose representation value requires more than four digits.
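The equivalence between the four-digit form and the eight-digit form with four leading zeros can be checked directly. This is a small sketch, assuming a translator whose execution character set can represent the character (e.g., a UTF-8 execution character set); the array and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* The same character written with \unnnn and with \U0000nnnn. */
static const char short_form[] = "caf\u00E9";
static const char long_form[]  = "caf\U000000E9";

/* Returns 1 when both spellings produced the same execution-time string. */
static int same_spelling(void)
{
    return strcmp(short_form, long_form) == 0;
}

static void ucn_forms_demo(void)
{
    assert(same_spelling());
    assert(sizeof short_form == sizeof long_form);
}
```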

that existing tools (e.g., editors) continue to be able to process source files

The control characters may have special meaning for some tools that process source files (e.g., a communications program used for sending source down a serial link).

value does not change. What the C Standard calls a constant-expression developers often shorten to constant.


Footnote 21

21) The term “literal” generally designates, in this International Standard, those tokens that are called “constants”

The C++ Standard also includes string-literal and boolean-literal in the list of literals, but it does not include enumeration constants in the list of literals. However:

7.2p1

required

The C++ terminology more closely follows common developer terminology by using literal (a single token) and constant (a sequence of operators and literals whose value can be evaluated at translation time). The value of a literal is explicit in the sequence of characters making up its token. A constant may be made up of more than one token or be an identifier. The operands in a constant have to be evaluated by the translator to obtain its result value. C uses the more easily confused terminology of integer-constant (a single token) and constant-expression (a sequence of operators, integer-constant and floating-constant whose value can be evaluated at translation time).

Other Languages

Languages that support types not supported by C (e.g., sets) sometimes allow constants having these types to be specified (e.g., in Pascal [’a’, ’d’] represents a set containing two characters). Fortran supports complex literal constants (e.g., (1.0, 2.0) represents the complex number 1.0 + 2.0i).

Many languages do not support (e.g., Java until version 1.5) some form of enumeration-constant.

Constants are the mechanism by which numeric values are written into source code. The term constant is used because the numeric values do not change during program execution (and are known at translation time; although in some cases a person reading the source may only know that the value used will be one of a list of possible values because the definition of a macro may be conditional on the setting of some translation time

The use of constants in source code creates a number of possible maintenance issues, including:

• A constant value, representing some quantity, often needs to occur in multiple locations within source code. Searching for and replacing all occurrences of a particular numeric value in the code is an error prone process. It is not possible, for instance, to know that all 15s occurring in the source code have the same semantic association and some may need to remain unchanged. (Your author was once told by a developer, whose source contained lots of 15s, that the UK government would never change value-added tax from 15%; a few years later it changed to 17.5%.)

• On encountering a constant in the source, a reader usually needs to deduce its semantic association (either in the application domain or its internal algorithmic function). While its semantics may be very familiar to the author of the source, the association between value and semantics may not be so readily made by later readers.

• A cognitive switch may need to be made because of the representation used for the constant (e.g., floating point, hexadecimal integer, or character constant).

One solution to these problems is to use an identifier to give a symbolic name to the constant, and to use that symbolic name wherever the constant would have appeared in the source. Changes to the value of the constant can then be made by a single modification to the definition of the identifier and a well-chosen name can help readers make the appropriate semantic association. The creation of a symbolic name provides two pieces of information:


1. The property represented by that symbolic name. For instance, the maximum value of a particular type (INT_MAX), whether an implementation supports some feature (__STDC_IEC_559__), a means of specifying some operation (SEEK_SET), or a way to obtain information (FE_OVERFLOW).

2. A method of operating on the symbolic name to access the property it represents. For instance, arithmetic operations (INT_MAX), testing in a conditional preprocessing directive (__STDC_IEC_559__), passing as an argument to a library function (SEEK_SET); passing as an argument to a library function, possibly in combination with other symbolic names (FE_OVERFLOW).

Operating on symbolic names involves making use of representation information. (Assignment, or argument passing, is the only time that representation might not be an issue.) The extent to which the use of representation information will be considered acceptable will depend on the symbolic name. For instance, FE_OVERFLOW appearing as the operand of a bitwise operator is to be expected, but its appearance as the operand of an arithmetic operator would be suspicious.
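The expected use of representation information for FE_OVERFLOW can be sketched as below. The function names are hypothetical, and the sketch assumes an implementation that defines both FE_OVERFLOW and FE_DIVBYZERO in <fenv.h> (the standard only requires these macros when the corresponding exception is supported).

```c
#include <assert.h>
#include <fenv.h>

/* Combine exception flags with a bitwise operator, the expected usage. */
static int range_error_flags(void)
{
    return FE_OVERFLOW | FE_DIVBYZERO;
}

/* Test for one flag within a combined set, again using a bitwise operator. */
static int includes_overflow(int flags)
{
    return (flags & FE_OVERFLOW) != 0;
}

static void fenv_demo(void)
{
    assert(includes_overflow(range_error_flags()));
    assert(!includes_overflow(FE_DIVBYZERO));
}
```

By contrast, an expression such as FE_OVERFLOW + 1 would compile, but it depends on the numeric value chosen by the implementation and has no meaning in terms of the property the name represents.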

Developers rarely see the use of symbolic names as applying to all constants that occur in source code. In some cases the following are claimed:

• The constants are so well known that there is no need to give them a name.

• The number of occurrences of particular constants is not sufficient to warrant creating a name for them.

• Operations involving some constant values occur so frequently that their semantic associations are obvious to developers; for instance, assigning 0 or adding 1.

It is true that not all numeric values are meaningless to everybody. A few values are likely to be universally known (at least to Earth-based developers). For instance, there are 60 seconds in a minute, 60 minutes in an hour, and 24 hours in a day. The value 24 occurring in an expression involving time is likely to represent hours in a day. Many values will only be well known to developers working within a given application domain, such as atomic physics (e.g., the value 6.6261E-34). Between these extremes are other values; for instance, 3.14159 will be instantly recognized by developers with a mathematics background. However, developers without this background may need to think about what it represents. There is the possibility that developers who have grown up surrounded by other mathematically oriented people will be completely unaware that others do not recognize the obvious semantic association for this value.

A constant having a particular semantic association may only occur once in the source. However, the issue is not how many times a constant having a particular semantic association occurs, but how many times the particular constant value occurs. The same constant value can appear because of different semantic associations. A search for a sequence of digits (a constant value) will locate all occurrences, irrespective of semantic association.

While an argument can always be made for certain values being so well known that there is no benefit in replacing them by identifiers, the effort/time taken in discussions on what values are sufficiently well known to warrant standing on their own, instead of an identifier, is likely to be significantly greater than the sum total of all the extra one seconds, or so, taken to type the identifier.

The constant values 0 and 1 occur very frequently in source code (see Figure 825.1). Experience suggests that the semantic associations tend to be that of assigning an initial value in the case of 0 and accessing a preceding or following item in the case of 1. The coding guideline issues are discussed in the subsections that deal with the different kinds of constants (e.g., integer, or floating).

What form of definition should a symbolic name denoting a constant value have? Possibilities include the following:

• Macro names. These are seen by developers as being technically the same as constants in that they are replaced by the numeric value of the constant during translation (there can also be an unvoiced bias toward perceived efficiency here).


• Enumeration constants. The purpose of an enumerated type is to associate a list of constants with each other. This is not to say the definition of an enumerated type containing a single enumeration constant should not occur, but this usage would be unusual. Enumeration constants share the same unvoiced developer bias as macro names: perceived efficiency.

• Objects initialized with the constant. This approach is advocated by some coding guideline documents for C++. The extent to which this is because an object declared with the const qualifier really is constant and a translator need not allocate storage for it, or because use of the preprocessor (often called the C preprocessor, as if it were not also in C++) is frowned on in the C++ community, is left to the reader to decide.

The enumeration constant versus macro name issue is discussed in detail elsewhere.

What name to choose? The constant 6.6261E-34 illustrates another pitfall. Planck's constant is almost universally represented, within the physics community, using the letter h (a closely related constant is ħ, the reduced Planck constant). A developer might be tempted to make use of this idiom to name the value, perhaps even trying to find a way of using UCNs to obtain the appropriate h. The single letter h probably gives no more information than the value. The name PLANCK_CONSTANT is self-evident. The developer attitude that anybody who does not know what 6.6261E-34 represents has no business reading the source is not very productive or helpful.
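The three forms of definition listed above can be compared side by side. The following is a minimal sketch; apart from PLANCK_CONSTANT, every identifier is a hypothetical name chosen for illustration.

```c
/* Macro name: replaced by its body during preprocessing. */
#define SECONDS_PER_MINUTE 60

/* Enumeration constants: a list of related names, each of type int. */
enum time_unit_size { MINUTES_PER_HOUR = 60, HOURS_PER_DAY = 24 };

/* const-qualified object: has a type and an address, but in C it is not
   a constant expression. */
static const double PLANCK_CONSTANT = 6.6261E-34; /* joule seconds */

long seconds_per_day(void)
{
    return (long)SECONDS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY;
}

double photon_energy(double frequency_hz)
{
    return PLANCK_CONSTANT * frequency_hz;
}
```

Enumeration constants are limited to values representable in int, so a floating value such as Planck's constant has to use one of the other two forms.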

Table 822.1: Occurrence of different kinds of constants (as a percentage of all tokens). Based on the visible form of the .c and .h files.

This is something of a circular definition in that a constant's value is also used to determine its type. The lexical form of a constant is also a factor in determining which of a number of possible types it may take. An unsuffixed constant that is too large to be represented in the type long long, or a suffixed constant that is larger than the type with the greatest rank applicable to that suffix, violates this requirement (unless there is some extended integer type supported by the implementation into whose range the value falls).

It can be argued that all floating constants are in range if the implementation supports ±∞.

There is a similar constraint for enumeration constants.

C++

The C++ Standard has equivalent wording covering integer-literals (2.13.1p3), character-literals (2.13.2p3) and floating-literals (2.13.3p1). For enumeration-literals their type depends on the context in which the question is asked:

7.2p4

the closing brace, the type of each enumerator is the type of its initializing value

7.2p5


The underlying type of an enumeration is an integral type that can represent all the enumerator values defined in the enumeration.

float f_3 = 1e-99999999999999999999999999999999999999999999999; /* Approximately zero */

float f_4 = 0e-99999999999999999999999999999999999999999999999; /* Exact zero */

2.13.1p2 The type of an integer literal depends on its form, value, and suffix.

2.13.3p1 The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float,

There are no similar statements for the other kinds of literals, although C++ does support suffixes on the floating types. However, the syntactic form of string literals, character literals, and boolean literals determines their type.


The type of a constant, unlike object types, can vary between implementations. For instance, the integer constant 40000 can have either the type int or long int. The suffix on the integer constant 40000u only ensures that it has one of the listed unsigned integer types. The coding guideline issues associated with the possibility that the type of a constant can vary between implementations are discussed elsewhere.

6.4.4.1 Integer constants

825

integer-constant:
    decimal-constant integer-suffix opt
    octal-constant integer-suffix opt
    hexadecimal-constant integer-suffix opt

decimal-constant:
    nonzero-digit
    decimal-constant digit

octal-constant:
    0
    octal-constant octal-digit

hexadecimal-constant:
    hexadecimal-prefix hexadecimal-digit
    hexadecimal-constant hexadecimal-digit

hexadecimal-prefix: one of

Integer constants are created in translation phase 7 when the preprocessing tokens pp-number are converted into tokens denoting various forms of constant. Integer-constants always denote nonnegative values. The character sequence -1 consists of the two tokens {-} {1}, a constant expression.

An integer-suffix can be used to restrict the set of possible types the constant can have; it also specifies the lowest rank an integer constant may have (which for ll or LL leaves few further possibilities). The U, or u, suffix indicates that the integer constant is unsigned.

All translation time integer constants are nonnegative. The character sequence -1 consists of the token sequence unary minus followed by the decimal-constant 1. Support for translation time negative constants in the lexical grammar would create unjustified complexity by requiring lexers to disambiguate binary from unary operator uses in, for instance, X-1.
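The consequence of constants always being nonnegative tokens can be seen in a few lines of C. This is a small sketch; the macro X and the function names are hypothetical.

```c
#include <limits.h>

#define X 10

/* X-1 is three tokens {X} {-} {1}: a binary subtraction. */
int x_minus_one(void) { return X-1; }

/* -1 is two tokens {-} {1}: unary minus applied to the constant 1. */
int minus_one(void) { return -1; }

/* The operator is applied after the constant has been given its type:
   1u has type unsigned int, so negating it wraps around to UINT_MAX. */
int negated_unsigned_is_uint_max(void) { return -1u == UINT_MAX; }
```

The last function shows why it matters that the minus sign is not part of the constant: the suffix fixes the type first, and the negation then follows the rules for that type.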

C90

Support for long-long-suffix and the nonterminal hexadecimal-prefix is new in C99.

C++

The C++ syntax is identical to the C90 syntax.

Support for long-long-suffix and the nonterminal hexadecimal-prefix is not available in C++.

Some implementations specify that the prefix 0b (or 0B) denotes an integer constant expressed in binary notation. Over the years the C Committee received a number of requests for such a prefix to be added to the C Standard. The Committee did not see sufficient utility for this prefix to be included in C99. The C embedded systems TR specifies h and H to denote the types short frac or short accum, and one of k, K, r, and R to denote a fixed-point type.

The IBM ILE C compiler[627] supports a packed decimal data type. The suffix d or D may be used to specify that a literal has this type. Microsoft C supports the suffixes i8, i16, i32, and i64 denoting integer constants having the types byte (an extension), short, int, and __int64, respectively.

Other Languages

Although Ada supports integer constants having bases between 2 and 16 (e.g., 2#1101# is the binary representation for 10#13#), few other languages support the use of suffixes. Ada also supports the use of underscores within an integer-constant to make the value more readable.

Coding Guidelines

A study by Brysbaert[174] found that the time taken for a person to process an Arabic integer between 1 and 99 was a function of the logarithm of its magnitude, the frequency of the number (based on various estimates of its frequency of occurrence in everyday life; see Dorogovtsev et al[373] for measurements of numbers appearing in web pages), and sometimes the number of syllables in the spoken form of the value. Subject response times varied from approximately 300 ms for values close to zero, to approximately 550 ms for values in the nineties.

Experience shows that the long-suffix l is often visually confused with the nonzero-digit 1.

If a long-suffix is required, only the form L shall be used.

If a long-long-suffix is required, only the form LL shall be used.

As previously pointed out, constants appearing in the visible form of the source often signify some quantity with real world semantics attached to it. However, uses of the integer constants 0 and 1 in the visible source often have no special semantics associated with their usage. They also represent a significant percentage of the total number of integer constants in the source code (see Figure 825.1). The frequency of occurrence of these values (most RISC processors dedicate a single register to permanently hold the value zero) comes about through commonly seen program operations. These operations include code that counts the number of occurrences of entities, code that contains loops, and code that indexes the previous or next element of an array (not that 0 or 1 could not also have similar semantic meaning to other constant values).

A blanket requirement that all integer constants be represented in the visible source by symbolic names fails to take into account that a large percentage of the integer constants used in programs have no special meaning associated with them. In particular the integer constants 0 and 1 occur so often (see Figure 825.1) that having to justify why each of them need not be replaced by a symbolic name would have a high cost for an occasional benefit.

that has measured the visual similarity of digits with letters.

[Figure 825.1 plot not reproduced; axes: decimal-constant value, hexadecimal-constant value.]

Figure 825.1: Number of integer constants having the lexical form of a decimal-constant (the literal 0 is also included in this set) and hexadecimal-constant that have a given value. Based on the visible form of the .c and .h files.

No integer constant, other than 0 and 1, shall appear in the visible source code, other than as the sole preprocessing token in the body of a macro definition or in an enumeration definition.

Some developers are sloppy in the use of integer constants, using them where a floating constant was the appropriate type. The presence of a period makes it explicitly visible that a floating type is being used. The general issue of integer constant conversions is discussed elsewhere.
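A short sketch of what can go wrong when integer constants are used where floating constants were intended (the function names are hypothetical):

```c
/* Both operands are integer constants, so integer division is performed
   and the quotient 0 is then converted to double. */
double half_with_integer_constants(void) { return 1 / 2; }

/* The period makes both operands floating constants of type double. */
double half_with_floating_constants(void) { return 1.0 / 2.0; }
```

The first function returns 0.0 and the second returns 0.5; nothing in the first expression hints to a reader that a floating result was wanted.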

surprising because the significant digits of a set of values created by randomly sampling from a variety of different distributions converges to a logarithmic distribution (i.e., Benford's law).[583] While the results for decimal-constant (see Figure 825.2) may appear to be a reasonable fit, applying a chi-squared test shows the fit to be remarkably poor (χ² = 132,398). The first nonzero digit of hexadecimal-constants appears to be approximately evenly distributed.

Table 825.1: Occurrence of various kinds of integer-constants (as a percentage of all integer constants; note that zero is included in the decimal-constant count rather than the octal-constant count). Based on the visible form of the .c and .h files.

[Figure 825.2 plot not reproduced; x-axis: first non-zero digit (1 to 9 and A to F); series: decimal, hexadecimal.]

Figure 825.2: Probability of a decimal-constant or hexadecimal-constant starting with a particular digit; based on .c files. Dotted lines are the probabilities predicted by Benford's law (for values expressed in base 10 and base 16), i.e., $\log(1 + d^{-1})$, where $d$ is the numeric value of the digit.
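As a worked illustration of the prediction, assuming the logarithm is taken to the same base $b$ in which the constant is written, the first-digit probabilities for the smallest and largest digits in each base are:

```latex
\begin{align*}
P_b(d) &= \log_b\!\left(1 + \frac{1}{d}\right), \qquad d = 1, \dots, b-1 \\
P_{10}(1) &= \log_{10} 2 \approx 0.301, &
P_{10}(9) &= \log_{10}\tfrac{10}{9} \approx 0.046 \\
P_{16}(1) &= \log_{16} 2 = 0.25, &
P_{16}(\mathrm{F}) &= \log_{16}\tfrac{16}{15} \approx 0.023 \\
\sum_{d=1}^{b-1} P_b(d) &= \log_b \prod_{d=1}^{b-1} \frac{d+1}{d} = \log_b b = 1
\end{align*}
```

The last line confirms that the predicted probabilities sum to one in any base, because the product telescopes.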

Table 825.2: Occurrence of various integer-suffix sequences (as a percentage of all integer-constants). Based on the visible form of the .c and .h files.

Suffix Character Sequence    .c files    .h files
L/l                          0.1378      0.2096
U/uL/ul                      0.1269      0.1625
ULL/uLl/ulL/Ull              0.0128      0.0061
LLU/lLu/LlU/llu              0.0000      0.0000

Table 825.3: Common token pairs involving integer-constants. Based on the visible form of the .c files.

Token Sequence    % Occurrence of First Token    % Occurrence of Second Token

ordering rules, which were (for the number pair x and y):

• x has to be smaller than y

• x or y has to be round (i.e., round numbers include the numbers 1 to 20 and the multiples of five)

• the difference between x and y has to be a favorite number (these include $10^n \times (1, 2, \tfrac{1}{2}, \text{ or } \tfrac{1}{4})$ for any value of $n$)

Description


826 An integer constant begins with a digit, but has no period or exponent part.

Commentary

A restatement of information given in the Syntax clause.

Commentary

A suffix need not uniquely determine an integer constant's type, only the lowest rank it may have. There is no suffix for specifying the type int, or any integer type with rank less than int (although implementations may provide these as an extension).

The base document did not specify any suffixes; they were introduced in C90.

is often used to denote what the standard calls a decimal constant, which corresponds to the common case. When they occur in source, both octal and hexadecimal constants are usually referred to by these names, respectively. The benefits of educating developers to use the terminology decimal constant instead of integer constant are very unlikely to exceed the cost.

Commentary

A restatement of information given in the Syntax clause.

The constant 0 is, technically, an octal constant. Some guideline documents use the term decimal constant in their wording, overlooking the fact that, technically, this excludes the value 0. The guidelines given in this book do not fall into this trap, but anybody who creates a modified version of them needs to watch out for it.

Commentary

A restatement of information given in the Syntax clause. An octal constant is a natural representation to use when the value held in a single byte needs to be displayed (or read in) and the number of output indicators (or input keys) is limited (only eight possibilities are needed). An example is a freestanding environment where the output device can only represent digits. The users of such input/output devices tend to be technically literate.

Other Languages

A few other languages (e.g., Java and Ada) support octal constants. Most do not.

K&R C supported the use of the digits 8 and 9 in octal constants (support for this functionality was removed during the early evolution of C[1199] although some implementations continue to support it[610, 1094]). They represented the octal values 10 and 11, respectively.
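The effect of a leading zero in standard C can be sketched as follows (the function names are hypothetical; the K&R treatment of the digits 8 and 9 is not valid in standard C and is not shown):

```c
/* A leading 0 selects base 8: 010 is eight, not ten. */
int octal_010(void) { return 010; }

/* 0666 is 6*64 + 6*8 + 6 = 438 in decimal. */
int octal_0666(void) { return 0666; }

/* Each octal digit maps onto exactly three bits. */
int low_three_bits(int value) { return value & 07; }
```

A decimal value padded with leading zeros for alignment silently changes meaning, which is one reason the leading-zero form attracts guideline attention.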

Octal constants are rarely used (approximately 0.1% of all integer-constants, not counting the value 0). There seem to be a number of reasons why developers occasionally use octal constants:


• A long-standing practice that arguments to calls to some Unix library functions use octal constants to indicate various attributes (e.g., open(file, O_WRONLY, 0666)). The introduction, by POSIX in 1990, of identifiers representing these properties has not affected many developers' coding habits. The value 0666, in this usage, could be said to be treated like a symbolic identifier.

• Cases where it is sometimes necessary to think of a bit pattern in terms of its numeric value. Bit patterns are invariably grouped into bytes, making hexadecimal an easier representation to manipulate (because its visual representation is easily divisible into bytes and half bytes). However, mental arithmetic involving octal digits is easier to perform than that with hexadecimal digits. (There are fewer items of information that need to be remembered and people have generally automated the processing of digits, but conscious effort is needed to map the alphabetic letters to their numeric equivalents.)

• The values are copied from an external source; for instance, tables of measurements printed in octal.

There are no obvious reasons for recommending the use of octal constants over decimal or hexadecimal constants (there is a potential advantage to be had from using octal constants).

Other Languages

Many languages support the representation of hexadecimal constants in source code. The prefix character $ (not available in C's basic source character set) is used almost as often, if not more so, than the 0x form of

of two rather than powers of ten. In such cases a constant appearing in the source as a hexadecimal constant is more easily appreciated (in terms of the sums of the powers of two involved and by which powers of two it differs from other constants) than if expressed as a decimal constant.

Measurements of constant use in source code show that usage patterns for hexadecimal constants are different from decimal constants. The probability of a particular digit being the first nonzero digit in a hexadecimal constant is roughly constant, while the probability distribution of this digit in a decimal constant decreases with increasing value (a chi-squared analysis gives a very low probability of it matching Benford's law). Also the sequence of value digits in a hexadecimal-constant (see Table 830.1) almost always exactly corresponds to the number of nibbles in either a character type, short, int, or long.
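The correspondence between hexadecimal digits and nibbles is what makes masks readable in this notation. A brief sketch, with hypothetical names throughout:

```c
/* Each hexadecimal digit is one nibble, so byte boundaries are visible:
   0xFF00 selects the second-lowest byte. The same mask written as a
   decimal constant is 65280. */
unsigned second_byte(unsigned value) { return (value & 0xFF00u) >> 8; }

/* Adjacent single-bit flags differ by one power of two; as decimal
   constants these would read 16, 32, and 64. */
enum { FLAG_A = 0x10, FLAG_B = 0x20, FLAG_C = 0x40 };
```

Reading 0x70 as the union of the three flags takes little effort; recognizing the same fact from the decimal constant 112 requires arithmetic.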

A study by Logan and Klapp[876] used alphabet arithmetic (e.g., A + 2 = C) to investigate how extended practice and rote memorization affected automaticity. For inexperienced subjects who had not memorized any addition table, the results showed that the time taken to perform the addition increased linearly with the value of the digit being added. This is consistent with subjects counting through the letters of the alphabet to obtain the answer. With sufficient practice subjects' performance not only improved but became digit-independent. This is consistent with subjects recalling the answer from memory; the task had become automatic.

The practice group of subjects were given a sum and had to produce the answer. The memorization group of subjects were asked to memorize a table of sums (e.g., A + 3 = D). In both cases the results showed that performance was proportional to the number of times each question/answer pair had been encountered, not the total amount of time spent.


Arithmetic involving hexadecimal constants differs from that involving decimal constants in that developers will have had much less experience in performing it. The results of the Logan and Klapp study show that the only way for developers to achieve the same level of proficiency is to commit the hexadecimal addition table to memory. Whether the cost of this time investment has a worthwhile benefit is unknown.

Table 830.1: Occurrence of hexadecimal-constants containing a given number of digits (as a percentage of all such constants). Based on the visible form of the .c files.

Digits    Occurrence

C supports the representation of constants in the base chosen by evolution on planet Earth.

Commentary

The C language requires the use of binary representation for the integer types. The use of both base 8 and base 16 visual representations of binary information has been found to be generally more efficient, for people, than using a binary representation. Developers continue to debate the merits of one base over another. Both experience with using one particular base and the kind of application domain affect preferences.

Commentary

The correct Latin prefix is sex, giving sexadecimal. It has been claimed that this term was considered too racy by IBM, who adopted hexadecimal (hex is the equivalent Greek prefix, the Latin decimal being retained) in the 1960s to replace it (the term was used in 1952 by Carl-Eric Froeberg in a set of conversion tables).

834 The lexically first digit is the most significant.

Commentary

The Arabic digits in a constant could be read in any order. In Arabic, words and digits are read/written right-to-left (least significant to most significant in the case of numbers). The order in which Arabic numerals are written was exactly copied by medieval scholars, except that they interpreted them using the left-to-right order used in European languages.

type first in list

Commentary

This list only applies to those pp-numbers that are converted to integer-constant tokens as part of translation phase 7. Integer constants in #if preprocessor directives always have type intmax_t, or uintmax_t (in C90 they had type long or unsigned long).
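The difference between preprocessor arithmetic and translation phase 7 arithmetic can be made visible with a directive whose operands would not fit in a 32-bit int. A minimal sketch, assuming a C99 or later translator; the macro and function names are hypothetical.

```c
/* In a #if directive, C99 evaluates integer constants as if they had type
   intmax_t or uintmax_t, which are at least 64 bits wide. The sum below
   therefore cannot overflow, whatever the width of int. */
#if 2147483647 + 1 == 2147483648
#define PP_SUM_IS_EXACT 1
#else
#define PP_SUM_IS_EXACT 0
#endif

int pp_sum_is_exact(void) { return PP_SUM_IS_EXACT; }
```

The same expression written outside a directive, on an implementation with a 32-bit int, would add two operands of type int and overflow.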


Other Languages

In Java integer constants have type int unless they are suffixed with l, or L, in which case they have type long. Many languages have a single integer type, which is also the type of all integer constants.

The type of an integer constant may depend on the characteristics of the host on which the program executes and the form used to express its value. For instance, the integer constant 40000 may have type int or long int (depending on whether int is represented in more than 16 bits, or in just 16 bits). The hexadecimal constant 0x9C40 (40000 decimal) may have type int or unsigned int (depending on whether int is represented in more than 16 bits, or in just 16 bits).
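The type an implementation actually gives these two constants can be inspected. The sketch below relies on _Generic, which requires C11 or later and so goes beyond the C99 language discussed here; TYPE_NAME and the function names are hypothetical helpers.

```c
#include <limits.h>

/* Hypothetical helper mapping the type of an expression to its name. */
#define TYPE_NAME(e) _Generic((e),          \
    int: "int",                             \
    unsigned int: "unsigned int",           \
    long: "long int",                       \
    unsigned long: "unsigned long int",     \
    default: "other")

/* Decimal 40000: the first of int, long int, long long int that holds it. */
const char *type_of_decimal_40000(void) { return TYPE_NAME(40000); }

/* Hexadecimal 0x9C40: the unsigned types are also on its list. */
const char *type_of_hex_9C40(void) { return TYPE_NAME(0x9C40); }

/* What the rules predict for this implementation, derived from INT_MAX. */
const char *expected_decimal(void)
{
    return INT_MAX >= 40000 ? "int" : "long int";
}

const char *expected_hex(void)
{
    return INT_MAX >= 40000 ? "int" : "unsigned int";
}
```

On an implementation with a 16-bit int the two constants have different types even though they denote the same value.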

For objects having an integer type there is a guideline recommending that a single integer type always be used (the type int). However, integer constants never have a type whose rank is less than int and so

The possibility that the type of an integer constant can vary between implementations and platforms creates a portability cost. There is also the potential for incorrect developer assumptions about the type of an integer constant, leading to additional maintenance costs. The specification of a guideline recommendation is complicated by the fact that C does not support a suffix that specifies the type int (or its corresponding unsigned version). This means it is not possible to specify that a constant, such as 40000, has type int and expect a diagnostic to appear when using a translator that gives it the type long.

An integer constant containing a suffix is generally taken as a statement of intent by the developer. A suffixed integer constant that is immediately converted to another type is suspicious.

those types

Is there anything to be gained from recommending that integer constants less than 32767 be suffixed rather than implicitly converted to another type? The original type of such an integer constant is obvious to the reader and a conversion to a type for which the standard provides a suffix will not change its value; the real issue is developer expectation. Expectation can become involved through the semantics of what the constant represents. For instance, a program that manipulates values associated with the ISO 10646 Standard may store these values in objects that always have type unsigned int. This usage can lead to developers learning (implicitly or explicitly) that objects manipulating these semantic quantities have type unsigned


left operand has an unsigned type. (If it has a signed type, setting the most significant bit will cause the result to be negative.) If the identifier FOO_char is a macro whose body is a constant integer having a signed type, developer expectations will not have been met.

In those cases where developers have expectations of an operand having a particular type, use of a suffix can help ensure that this expectation is met. If the integer constant appears in the visible source at the point its value is used, developers can immediately deduce its type. The body of a macro definition and an argument in a macro invocation are the two circumstances where the type information of an integer constant is not immediately apparent to readers of the source. (The integer constant is likely to be widely separated from its point of use in an expression.)

The disadvantage of specifying a suffix on an integer constant because of the context in which it is used is that the applicable type may change. The issues involved with implicit conversion versus explicit conversion are discussed elsewhere. An explicit cast, using a typedef name rather than a suffix, is more flexible in this regard.

Use of a suffix not defined by the standard, but provided by the implementation, is making use of an extension. Does this usage fall within the guideline recommendation dealing with use of extensions, or is it sufficiently useful that a deviation should be made for it? Suffixes are a means for the developer to specify type information on integer constants. Any construct that enables the developer to provide more information is usually to be encouraged. While there are advantages to this usage, at the time of this writing insufficient experience is available on the use of suffixes to know whether the advantages outweigh the disadvantages. A deviation against the guideline recommendation might be applicable in some cases.

Dev 95.1

Any integer constant suffix supported by an implementation may be used.

Table 835.1: Occurrence of integer-constants having a particular type (as a percentage of all such constants; with the type denoted by any suffix taken into account) when using two possible representations of the type int (i.e., 16- and 32-bit). Based on the visible form of the .c and .h files.

Type             16-bit int    32-bit int
unsigned int     3.493         0.414
unsigned long    0.557         0.138
other-types      0.029         0.059

836

Suffix                  Decimal Constant          Octal or Hexadecimal Constant

none                    int                       int
                        long int                  unsigned int
                        long long int             long int
                                                  unsigned long int
                                                  long long int
                                                  unsigned long long int

u or U                  unsigned int              unsigned int
                        unsigned long int         unsigned long int
                        unsigned long long int    unsigned long long int

l or L                  long int                  long int
                        long long int             unsigned long int
                                                  long long int
                                                  unsigned long long int

Both u or U             unsigned long int         unsigned long int
and l or L              unsigned long long int    unsigned long long int

ll or LL                long long int             long long int
                                                  unsigned long long int

Both u or U             unsigned long long int    unsigned long long int
and ll or LL

Commentary

The lowest rank that an integer constant can have is that of type int. This list contains the standard integer types only, giving preference to these types. Any supported extended integer type is considered if an appropriate type is not found from this list.

C90

The type of an integer constant is the first of the corresponding list in which its value can be represented.

the letter l or L: long int, unsigned long int; suffixed by both the letters u or U and l or L: unsigned long int.

Support for the type long long is new in C99.

The C90 Standard will give a sufficiently large decimal constant that does not contain a u or U suffix the type unsigned long. The C99 Standard will never give a decimal constant that does not contain either of these suffixes an unsigned type.

Because of the behavior of C++, the sequencing of some types on this list has changed from C90. The following shows the entries for the C90 Standard that have changed.

Suffix     Decimal Constant
none       int, long int, unsigned long int
l or L     long int, unsigned long int

Under C99, the none suffix, and l or L suffix, cases no longer contain an unsigned type on their list.

A decimal constant, unless given a u or U suffix, is always treated as a signed type.
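The change from C90 is observable in a comparison. The following sketch assumes a C99 or later translator; the C90 behavior is described in the comment only, and the function name is hypothetical.

```c
/* 2147483648 does not fit in a 32-bit int or a 32-bit long.
   C90: its type is unsigned long, so -1 is converted to a large unsigned
        value and the comparison yields 0 when long is 32 bits wide.
   C99: its type is signed (int, long, or long long, whichever first
        holds the value), so the comparison yields 1 everywhere. */
int minus_one_is_smaller(void)
{
    return -1 < 2147483648;
}
```

Source that relied on the C90 result silently changes behavior when translated as C99, which is why some translators diagnose such constants.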


2.13.1p2

If it is decimal and has no suffix, it has the first of these types in which its value can be represented: int, long

and has no suffix, it has the first of these types in which its value can be represented: int, unsigned int, long int, unsigned long int. If it is suffixed by u or U, its type is the first of these types in which its value can be represented: unsigned int, unsigned long int. If it is suffixed by l or L, its type is the first of these types in which its value can be represented: long int, unsigned long int. If it is suffixed by ul, lu, uL, Lu, Ul, lU, UL, or LU, its type is unsigned long int.

The C++ Standard follows the C99 convention of maintaining a decimal constant as a signed and never an unsigned type.

The type long long, and its unsigned partner, is not available in C++.

There is a difference between C90 and C++ in that the C90 Standard can give a sufficiently large decimal literal that does not contain a u or U suffix the type unsigned long. Neither the C++ nor the C99 Standard will give a decimal constant that does not contain either of these suffixes an unsigned type.

Other Languages

In Java hexadecimal and octal literals always have a signed type and denote a negative value if the high-order bit, for their type, is set. The literal 0xcafebabe has decimal value -889275714 and type int in Java, and decimal value 3405691582 and type unsigned int or unsigned long in C.
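Both values quoted above can be reproduced in C. A small sketch with hypothetical function names:

```c
/* In C the hexadecimal constant is never negative: its value is
   3405691582, with type unsigned int when int is 32 bits wide. */
int c_value_is_positive(void) { return 0xcafebabe > 0; }

/* The Java int value is obtained by subtracting 2^32 in a wider
   signed type. */
long long java_int_value(void)
{
    return (long long)0xcafebabe - 4294967296LL;
}
```

Code ported between the two languages therefore needs attention wherever a hexadecimal literal has its high-order bit set.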

extended integer type can represent its value

Commentary

For an implementation to support an integer constant which is not representable by any standard integer type requires that it support an extended integer type that can represent a greater range of values than the types long long or unsigned long long.

A C translation unit that contains an integer constant that has an extended integer type may not be accepted by a conforming C++ translator. But then it may not be accepted by another conforming C translator either. Support for the construct is implementation-defined.

Other Languages

Very few languages explicitly specify potential implementation support for extended integer types.

In some implementations it is possible for an integer constant to have a type with lower rank than those given

Source containing an integer constant, the value of which is not representable in one of the standard integer types, is making use of an extension. The guideline recommendation dealing with use of extensions is applicable here. If it is necessary for a program to use an integer constant having an extended integer type, the deviation for this guideline specifies how this usage should be handled. The issue of an integer constant being within the range supported by a standard integer type on one implementation and not within range on

greater than 32767


Consider the token 100000000000000000000 in an implementation that supports a 64-bit two’s complement long long, and no extended integer types. The numeric value of this token is outside the range of any integer type supported by the implementation and therefore it has no type.

This sentence was added by the response to DR #298.

decimal-floating-constant:
    fractional-constant exponent-part opt floating-suffix opt
    digit-sequence exponent-part floating-suffix opt

exponent-part:
    e sign opt digit-sequence
    E sign opt digit-sequence

sign: one of
    + -

digit-sequence:
    digit
    digit-sequence digit

hexadecimal-fractional-constant:
    hexadecimal-digit-sequence opt . hexadecimal-digit-sequence
    hexadecimal-digit-sequence .

binary-exponent-part:
    p sign opt digit-sequence
    P sign opt digit-sequence

hexadecimal-digit-sequence:
    hexadecimal-digit
    hexadecimal-digit-sequence hexadecimal-digit

floating-suffix: one of
    f l F L

Commentary

The majority of decimal-floating-constants do not have an exact binary representation. For instance, if FLT_RADIX is 2 then only 4% of constants having two digits after the decimal point can be represented exactly (i.e., those ending 00, 25, 50, and 75).

Unlike an integer-suffix, a floating-suffix specifies the actual type, not the lowest rank of a set of types (not that floating-point types have rank).

Hexadecimal floating constants were introduced to remove the problems associated with translators incorrectly mapping character sequences denoting decimal floating constants to the internal representation of floating numbers used at execution time. The potential mapping problems only apply to the significand, so a decimal representation can still be used for the exponent (requiring a hexadecimal representation for the exponent would have made it harder for human readers to quickly gauge the magnitude of a constant and created a lexical ambiguity, e.g., would the character sequence p0x1f be interpreted as ending in the floating-suffix f or not).

The exponent is always required for the hexadecimal notation, unlike decimal floating constants; otherwise, the translator would not be able to resolve the ambiguity that occurs when an f, or F, appears as the last character of a preprocessing token. For instance, 0x1.f could mean 1.0f (the f interpreted as a suffix indicating the type float) or 1.9375 (the f being interpreted as part of the significand value).

The hexadecimal-floating-constant 0x1.FFFFFEp128f does not represent the IEC 60559 single-format NaN. It overflows to an infinity in the single format.

C90

Support for hexadecimal-floating-constant is new in C99. The terminal decimal-floating-constant is new in C99 and its right-hand side appeared on the right of floating-constant in the C90 Standard.

C++

The C++ syntax is identical to that given in the C90 Standard.

Support for hexadecimal-floating-constant is not available in C++.


Figure 842.1: Probability of a decimal-floating-constant (i.e., not hexadecimal) starting with a particular digit (horizontal axis: first non-zero digit). Based on the visible form of the .c files. Dotted line is the probability predicted by Benford’s law, i.e., $\log(1 + d^{-1})$, where $d$ is the numeric value of the digit ($\chi^2 = 1{,}680$ is a very poor fit).

Other Languages

Support for hexadecimal-floating-constant is unique to C. Fortran 90 supports the use of a KIND specifier as part of the floating constant. Fortran also supports the use of the letter D, rather than E, in the exponent part to indicate that the constant has type double (rather than real, the single-precision default type). Java supports the optional suffixes f (type float) and d (type double, the default).

Mapping to and from a hexadecimal floating constant, and its value as a floating-point literal, requires knowledge of the underlying representation. The purpose of supporting the hexadecimal floating constant notation is to allow developers to remove uncertainty over the accuracy of the mapping, of values expressed in decimal, performed by translators. Developers are unlikely to want to express floating constants in a hexadecimal notation for any other reason and the guideline recommendation dealing with use of representation information is not applicable.

Floating constants may be expressed using the hexadecimal floating-point notation.

The advantage of hexadecimal floating constants is that they guarantee an exact (when FLT_RADIX is a power of two) floating value in the program image, provided the constant has the same or less precision than the type.

For the same rationale as integer constants, there is good reason why most floating constants should not

sole preprocessing token in the body of a macro definition


Table 842.1: Occurrence of various floating-suffixes (as a percentage of all such constants). Based on the visible form of the .c and .h files. Columns: Suffix Character Sequence, .c files, .h files.

Table 842.2: Common token pairs involving floating-constants. Based on the visible form of the .c files.

This defines the terms significand part and exponent part.

whole-number part fraction part

Commentary

A restatement of information given in the Syntax clause. The character denoting the period, which may appear when floating-point values are converted to strings, is locale dependent. However, the period character that appears in C source is not locale dependent.

A leading zero does not indicate an octal floating-point value.

C++

2.13.3p1

The integer part, the optional decimal point and the optional fraction part form the significant part of the floating literal.

The use of the term significant may be a typo. This term does not appear in the C++ Standard and it is only used in this context in one paragraph.

Other Languages

This form of notation is common to all languages that support floating constants, although in some languages the period (decimal point) in a floating constant is not optional.

The term whole-number is sometimes used by developers. A more commonly used term is integer part (the term used by the C++ Standard). The commonly used term for the period character in a floating constant is decimal point.


A common mathematical convention is to have a single nonzero digit preceding the period. This is a useful convention when reading source code since it enables a quick estimate of the magnitude of the value to be made. There are also circumstances where more than one digit before the period, or leading zeros before and after the period, can improve readability when the floating constant is one of many entries in a table. In this case the relative position of the first nonzero digit may provide a useful guide to the relative value of a series of constants, which may be more important information than their magnitudes.

Your author knows of no research showing that any method of displaying floating constants minimizes the cognitive effort, or the error rate, in comprehending them. However, there does appear to be an advantage in having consistency of visual form between constants close to each other in the source. Comprehending the relationship between the various initializers appears to require less effort for g_1 and g_2 than it does for g_3.

poor character to binary conversion), but they do contain information. Trailing zeros can be interpreted as a statement of accuracy; for instance, the measurement 7.60 inches is more accurate than 7.6 inches.

Leading zeros are sometimes used for padding and have no alternative interpretation. Adding trailing zeros to a fractional part for padding purposes is misleading. They could be interpreted as giving a floating constant a degree of accuracy that it does not possess. While such usage does not affect the behavior of a program, it can affect how developers interpret the accuracy of the results.

Floating constants shall not contain trailing zeros in their fractional part unless these zeros accurately represent the known value of the quantity being represented.

845

signed digit sequence


Figure 844.1: Number of floating-constants, that do not contain an exponent part, containing a given number of digit sequences before and after the decimal point (dp), and the total number of digits in a floating-constant (panels: characters before dp, characters after dp). Based on the visible form of the .c and .h files.

C++

Like C90, the C++ Standard does not support the use of p, or P.

Other Languages

The use of the notation e or E is common to most languages that support the same form of floating constants. Fortran also supports the use of the letter D, rather than E, to indicate the exponent. In this case the constant has type double (there is no type long double).

Amongst a string of digits, the letter E can easily be mistaken for the digit 8. There is no such problem with the letter e, which also adds a distinguishing feature to the visual appearance of a floating constant (a change in the height of the characters denoting the constant). However, there is no evidence to suggest that this choice of exponent letter is sufficiently important to warrant a guideline recommendation. At the time of this writing there is little experience available for how developers view the exponent letters p and P. While the prefix indicates that a hexadecimal constant is being denoted, a lowercase p offers an easily distinguished feature that its uppercase equivalent does not.

When only one of these parts is present, the period character might easily be overlooked, especially when floating constants occur adjacent to other punctuation tokens such as a comma. This problem can be overcome by ensuring that a digit (zero can always be used) appears on either side of the period. However, such usage is not, itself, free of problems. The period can be interpreted as a comma (if the source is being quickly scanned), causing the digits on either side of the period to be treated as two separate constants. The issue of white space between tokens is discussed elsewhere. In the case of digits after the decimal point, there is also

value with a constant that contains a period than one that only contains an exponent (which is likely to require conscious attention). However, given existing usage (see Figure 844.1) a guideline recommendation does not appear worthwhile.

Developers reading source often only need an approximate estimate of the value of floating constants. The first few digits and the power of ten (sometimes referred to as the order of magnitude or simply the magnitude) contain sufficient value information. The magnitude can be calculated by knowing the number of nonzero digits before the decimal point and the value of the exponent. There are many ways in which these two quantities can be varied and yet always denote the same value. Is there a way of denoting a floating constant such that its visible appearance minimizes the cognitive effort needed to obtain an estimate of its value? The possible ways of varying the visible appearance of a floating constant include:

• Not using an exponent; the magnitude is obtained by counting the number of digits in the whole-number part.

• Having a fixed number of digits in the whole-number part, usually one; the magnitude is obtained by looking at the value of the exponent.

• Some combination of digits in the whole-number part and the exponent.


There are a number of factors that suggest developers’ effort will be minimized when small numbers are written using only a few digits before the decimal point rather than using an exponent, including the following:

• Numbers occur frequently in everyday life and people are practiced at processing the range of values they commonly encounter. The prices of many items in shops in the UK and USA tend to have only a few digits after the decimal point, while in countries such as Japan and Italy they tend to have more digits (because of the relative value of their currency).

• Subitizing is the name given to the ability most people have of instantly knowing the number of items in a small set (containing up to five items, although some people can only manage three) without explicitly counting them (see 1641, subitizing).

Your author does not know of any algorithm that optimizes the format (i.e., how many digits should appear before a decimal point or selecting whether to use an exponent or not) in which floating-point constants appear, such that reader effort in extracting a value from them is minimized.

Example

Your author is not aware of any studies investigating the effect that the characteristics of human information processing (e.g., the Stroop effect) have on the probability of the value of a constant being misinterpreted (see 1641, Stroop effect).

While support for the hexadecimal representation of floating constants may not be defined in other language standards, some implementations of these languages (e.g., Fortran) support it.


Writing the number in normalized form, we get:

$1100.01011000010100011111 \times 2^{0} = 1.10001011000010100011111 \times 2^{3}$ (848.3)

Representing the number in single-precision, the exponent bias is 127, giving an exponent of $127 + 3 = 130_{10} = 10000010_{2}$. The final bit pattern is (where | indicates the division of the 32-bit representation into sign bit, exponent, and significand):

What is the decimal representation of the hexadecimal floating-point constant 0x0.12345p0, assuming an IEC 60559 representation? For the significand we have:

$.12345_{16} = 00010010001101000101_{2} = 1.0010001101000101 \times 2^{-4}$ (848.5)

For the exponent we have:

which gives a bit pattern of:
