Graphics does not require this large a range, as only magnitudes up to 2^32 need to be representable, and for colors even 2^10 is enough. The precision of these fixed-point numbers is fixed (1/65536), whereas the precision of floats depends on the magnitude of the values. Values close to zero have a very high accuracy: two consecutive floats at around 1.0 have a precision of 1/16777216, and floats at around 250.0 have roughly the same precision as 16.16 fixed-point numbers, while larger numbers become more inaccurate (two consecutive floats around 17 million are more than 1.0 units apart). OpenGL requires an accuracy of only one part in 10^5, which is a little under 17 bits; single-precision floats have 24 bits of accuracy.
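This spacing is easy to verify in C; the following minimal check (using the standard nextafterf from math.h) prints the gap between consecutive floats near 1.0 and near 2^24:

#include <math.h>
#include <stdio.h>

int main( void )
{
    /* gap between consecutive floats near 1.0: 1/16777216 */
    printf( "%g\n", nextafterf( 1.0f, 2.0f ) - 1.0f );
    /* gap near 2^24 = 16777216: already 2.0 units */
    printf( "%g\n", nextafterf( 16777216.0f, 1e30f ) - 16777216.0f );
    return 0;
}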
Below are C macros for converting from float to fixed and vice versa:

#define float_to_fixed( a ) (int)((a) * (1<<16))
#define fixed_to_float( a ) (((float)(a)) / (1<<16))
These are “quick-and-dirty” versions of the conversions. float_to_fixed can overflow if the magnitude of the float value is too great, or underflow if it is too small. float_to_fixed can also be made slightly more accurate by rounding. For example, asymmetric arithmetic rounding works by adding 0.5 to the number before truncating it to an integer, e.g., (int)floor((a) * 65536.0f + 0.5f).
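Wrapped into a macro, the rounding conversion might look like this (the macro name is ours; floor comes from math.h):

#include <math.h>
#define float_to_fixed_round( a ) (int)floor( (a) * 65536.0f + 0.5f )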
Finally, note that some of these conversions are expensive on some processors and thus should not be used in performance-critical code such as inner loops.
Here are some 16.16 fixed-point numbers, expressed in hexadecimal, and the corresponding decimal numbers:
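0x00000001     0.0000152587890625 (1/65536)
0x00008000     0.5
0x00010000     1.0
0x00024000     2.25
0x00100000    16.0
0xFFFF0000    -1.0 (two's complement)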
Depending on the situation it may make sense to move the decimal point to some other location, although 16.16 is a good general choice. For example, if you are only interested in numbers between zero and one (but excluding one), you should move the decimal point all the way to the left; if you use 32 bits, denote that format u0.32 (here u stands for unsigned). In rasterization, the number of sub-pixel bits and the size of the screen in pixels determine the number of bits you should have on the right side of the decimal point. Signed 16.16 is a compromise that is relatively easy to use, and gives the same relative importance to numbers between zero and one as to values above one.
In the upcoming examples we also use other fixed-point formats. For example, a 32.32 fixed-point value would be stored using 64 bits and could be converted to a float by dividing it by 2^32, whereas 32.16 would take 48 bits and have 32 integer and 16 decimal bits, and 32.0 would denote a regular 32-bit signed integer. To distinguish between unsigned (such as u0.32) and signed two's complement fixed-point formats we prepend unsigned formats with u.
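As a sketch of what conversions for these formats could look like (macro names are illustrative; the 32.32 argument is assumed to be a 64-bit integer):

/* 32.32: stored in a 64-bit integer, 32 fractional bits */
#define fixed3232_to_double( a )  ((double)(a) / 4294967296.0)   /* divide by 2^32 */
/* u0.32: a 32-bit unsigned value, all bits fractional */
#define ufixed032_to_double( a )  ((double)(unsigned int)(a) / 4294967296.0)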
In this appendix, we first go through fixed-point processing in C. We then show what you can do using assembly language, and conclude with a section on fixed-point programming in Java.
A.1 FIXED-POINT METHODS IN C
In this section we first discuss the basic fixed-point operations, followed by the shared exponent approach for vector operations, and conclude with an example that precalculates trigonometric functions into a table.
A.1.1 BASIC OPERATIONS
The addition of two fixed-point numbers is usually very straightforward (and subtraction
is just a signed add):
#define add_fixed_fixed( a, b ) ((a)+(b))
We have to watch out, though; the operation may overflow. As opposed to floats, the overflow is totally silent: there is no warning about the result being wrong. Therefore, you should always insert debugging code into your fixed-point math, the main idea being that the results before and after clamping from 64-bit integers to 32-bit integers have to agree.1 Here is an example of how that can be done:
#if defined(DEBUG)
int add_fixed_fixed_chk( int a, int b )
{
int64 bigresult = ((int64)a) + ((int64)b);
int smallresult = a + b;
assert(smallresult == bigresult);
return smallresult;
}
#endif
In release builds the addition macro stays a plain add, while debug builds can route it through the checked function, for example like this (a sketch of the arrangement):

#if defined(DEBUG)
#define add_fixed_fixed( a, b ) add_fixed_fixed_chk( (a), (b) )
#else
#define add_fixed_fixed( a, b ) ((a)+(b))
#endif
1 Code examples are not directly portable. Minimally you have to select the correct 64-bit type for your platform. Examples: long long with gcc, __int64 with older Microsoft compilers.
Another point to note is that these fixed-point routines should always be macros or inlined functions, not regular function calls. The function call overhead would take away most of the speed benefits of fixed-point programming. For the debug versions, using regular functions is fine, though.
Multiplications are more complicated than additions. Let us analyze the case of multiplying two 16.16 numbers and storing the result into another 16.16 number. When we multiply two 16.16 numbers, the accurate result is a 32.32 number. We ignore the last 16 bits of the result simply by shifting right 16 steps, yielding a 32.16 number. If all the remaining bits are zero, either one or both of the operands were zero, or we underflowed, i.e., the magnitude of the result was too small to be represented in a 16.16 fixed-point number. Similarly, if the result is too large to fit in 16.16, we overflow. But if the result is representable as a 16.16 number, we can simply take the lowest 32 bits. Note that the intermediate result must be stored in a 64-bit integer, unless the magnitude of the result is known to be under 1.0 before multiplication. We are finally ready to define multiplication:
#define mul_fixed_fixed( a, b ) (int)(((int64)(a)*(int64)(b)) >> 16)
If one of the multiplicands is an int, then the inputs are 16.16 and 32.0, the result is 48.16,
and we can omit the shift operation:
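/* one possible version (macro name illustrative); the low 32 bits of the
   48.16 result are again a valid 16.16 number if no overflow occurred */
#define mul_fixed_int( a, b ) ((a)*(b))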
Multiplications overflow even more easily than additions. The following example shows how you can check for overflows in debug builds:
#if defined(DEBUG)
int mul_fixed_fixed_chk( int a, int b )
{
int64 bigresult = (((int64)a) * ((int64)b)) >> 16;
int64 sign = bigresult >> 31;   /* replicate bit 31 across the high bits */
/* high bits must be just sign bits (0's or 1's) */
assert( (sign == 0) || (sign == -1) );
return (int)bigresult;
}
#endif
Note also that multiplications by powers of two are typically faster when done with shifts instead of normal multiplication. For example:
assert((a << 4) == (a * 16));
Let us then see how division works. Dividing two 16.16 numbers gives you an integer, and loses precision in the process. However, as we want the result to be 16.16, we should shift the numerator left 16 steps and store it in an int64 before the division. This also avoids losing the fractional bits. Here are several versions of the division with different arguments (fixed or int), producing a 16.16 result:
#define div_fixed_fixed( a, b ) (int)( (((int64)(a))<<16) / (b) )
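/* further variants along the same lines (macro names illustrative) */
#define div_fixed_int( a, b )  ((a) / (b))                          /* 16.16 / 32.0  */
#define div_int_fixed( a, b )  (int)( (((int64)(a)) << 32) / (b) )  /* 32.0  / 16.16 */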
These simple versions do not check for overflows, nor do they trap the case b = 0. Division, however, is usually a much slower operation than multiplication. If the range of inputs is small enough, it may be possible to precalculate a table of reciprocals and perform a multiplication instead. With a wider range one can store a sparse table of reciprocals and interpolate between the nearest results.
For slightly more precision, we can incorporate rounding into the fixed-point operations. Rounding works much the same way as when converting a float to a fixed-point number: add 0.5 before truncating to an integer. Since we use integer division in the operations, we just have to add 0.5 before the division. For multiplication this is easy and fairly cheap: since our divider is the fixed value 1 << 16, we add one half of that, 1 << 15, before the shift:
#define mul_fixed_fixed_round( a, b ) \
    (int)( ((int64)(a) * (int64)(b) + (1<<15)) >> 16 )
Similarly, for correct rounding in division of a by b, we should add b/2 to a before dividing by b.
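As a sketch, assuming a positive b (the macro name is ours):

#define div_fixed_fixed_round( a, b ) \
    (int)( ( (((int64)(a)) << 16) + ((b) >> 1) ) / (b) )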
A.1.2 SHARED EXPONENTS
Sometimes the range that is required for calculations is too great to fit into 32-bit registers. In some of those cases you can still avoid the use of full floating point. For example, you can create your own floating-point operations that do not deal with the trickiest parts of the IEEE standard, e.g., the handling of infinities, NaNs (Not-a-Numbers), or floating-point exceptions.
However, with vector operations, which are often needed in 3D graphics, another possibility is to store the meaningful bits, the mantissas, separately into integers, perform integer calculations using them, and share the exponent across all terms. For example, if you need to calculate a dot product of a floating-point vector with a vector of integer or fixed-point numbers, you can normalize the floating-point vector to a common base exponent, perform the multiplications and additions in fixed point, and finally, if needed, adjust the base exponent depending on the result. Another name for this practice of shared exponents is block floating point.
Using a shared exponent may lead to underflow, truncating some of the terms to zero. In some cases such truncation may lead to a large error. Here is a somewhat contrived example of a worst-case error: [1.0e40, 1.0e8, 1.0e8, 1.0e8] · [0, 32768, 32768, 32768]. With a shared exponent the first vector becomes [1, 0, 0, 0] * 1e40, which, when dotted with the second vector, produces zero, a result that is very different from the true answer of roughly 9.8e12.
The resulting number sequence, mantissas together with the shared exponent, is really a vectorized floating-point number and needs to be treated as such in the subsequent calculations, until the point where the exponent can finally be eliminated. It may seem that since the exponent must be normalized in the end in any case, we are not saving much. Keep in mind, though, that the most expensive operations are only performed once for the full dot product. It may even be possible to perform the required multiplication and addition operations with efficient multiply-and-accumulate (MAC) operations in assembly if the processor supports them.
Conversion from floating-point vectors into vectorized floating point is only useful in situations where the cost of conversion can be amortized somehow. For example, if you run 50 dot products where the floating-point vector stays the same and the fixed-point vectors vary, this method can save a lot of computation. An example of where you might need this kind of functionality is a physics library. A software implementation of vertex array transformation by modelview and projection matrices is another example where this approach could be attempted: multiplication of a homogeneous vertex with a 4 × 4 matrix can be done with four dot products.
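The following sketch illustrates the idea in C (the function names, the struct, and the 2.30 mantissa format are our assumptions, not a definitive implementation):

#include <math.h>
#include <stdint.h>

typedef struct
{
    int32_t m[4];   /* mantissas, roughly 2.30 relative to 2^exp */
    int     exp;    /* shared base exponent */
} vec4_bfp;

/* Normalize a float vector to a shared exponent. */
static void vec4_to_bfp( const float v[4], vec4_bfp *out )
{
    int i, e, maxexp = -10000;
    for( i = 0; i < 4; i++ )
    {
        frexp( v[i], &e );                /* v[i] = f * 2^e, 0.5 <= |f| < 1 */
        if( v[i] != 0.0f && e > maxexp )
            maxexp = e;
    }
    out->exp = maxexp;
    for( i = 0; i < 4; i++ )              /* much smaller terms underflow to 0 */
        out->m[i] = (int32_t)ldexp( v[i], 30 - maxexp );
}

/* Dot product against a vector of 16.16 fixed-point numbers.   */
/* The true value of the result is acc * 2^(exp - 30) / 65536.  */
static int64_t bfp_dot( const vec4_bfp *a, const int32_t b[4] )
{
    int64_t acc = 0;
    int     i;
    for( i = 0; i < 4; i++ )
        acc += (int64_t)a->m[i] * b[i];
    return acc;
}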
Many processors support operations that can be used for normalizing the result. For example, ARM processors with the ARMv5 instruction set or later support the CLZ instruction that counts the number of leading zero bits in an integer. Even when the processor supports these operations, they are typically exposed only as compiler-specific intrinsic functions or through inline assembler. For example, a portable version of count-leading-zeros can be implemented as follows:
/* Table stores the CLZ value for a byte */
static unsigned char clz_table[256] = { 8, 7, 6, 6, /* ... CLZ of each byte value ... */ };
INLINE int clz_unsigned( unsigned int num )
{
    int res = 24;
    if (num >> 16)
    {
        num >>= 16;
        res -= 16;
    }
    if (num > 255)
    {
        num >>= 8;
        res -= 8;
    }
    return clz_table[num] + res;
}

The GCC compiler has a built-in function for CLZ that can be used like this:

INLINE int clz_unsigned( unsigned int num )
{
    /* __builtin_clz(0) is undefined, so zero is handled separately */
    return num ? __builtin_clz( num ) : 32;
}

The built-in gets compiled to the ARM CLZ opcode when compiling for an ARM target. The performance of this routine depends on the processor architecture, and for some processors it may be faster to calculate the result with arithmetic instructions instead of table lookups.
In comparison, the ARM inline assembly variant of the same thing is:

INLINE int clz_unsigned( unsigned int num )
{
    int result;
    asm { CLZ result, num }   /* ARMv5 and later */
    return result;
}
A.1.3 TRIGONOMETRIC OPERATIONS
The use of trigonometric functions such as sin, cos, or arctan can be expensive both in the floating-point and the fixed-point domains. But since these functions are repeating, symmetric, have a compact range [−1, 1], and can sometimes be expressed in terms of each other (e.g., sin(θ + 90°) = cos(θ)), you can precalculate them directly into tables and store the results in fixed point.

A case in point is sin (and from that cos), for which only a 90° segment needs to be tabulated; the rest can be obtained through the symmetry and continuity properties of sin. Since the table is indexed by an integer, the input parameter needs to be discretized as well. Quantizing 90° to 1024 steps usually gives a good trade-off between accuracy, table size, and ease of manipulation of angle values (since 1024 is a power of two). The following code precalculates such a table:
short sintable[1024];
int ang;
for( ang = 0; ang < 1024; ang++ )
{
    /* angle_in_radians = ang/1024 * pi/2; assumes <math.h> and a definition of PI */
    double rad_angle = (ang * PI) / (1024.0 * 2.0);
    sintable[ang] = (short)( -sin(rad_angle) * 32768.0 );
}
In the loop we first convert the table index into radians. Using that value we evaluate sin and scale the result to the chosen fixed-point range. The values of sin vary from 0.0 to 1.0 within the first quadrant. If we multiply the value 1.0 by 32768.0 and convert to short, the result overflows, since the maximum value of a signed 16-bit short is 32767. A solution is to negate the sin values in the table (−32768 does fit) and negate them back after the value is read from the table.
Here is an example function for extracting values of sin. Note that the return value is sin scaled by 32768.0; the quadrant selection below follows the symmetry properties described above (the full circle is 4096 angle units):

INLINE int fixed_sin( int angle )
{
    int phase  = angle & (1024 + 2048);   /* which 90-degree quadrant */
    int subang = angle & 1023;            /* offset within the quadrant */
    if ( phase == 0 )         return -(int)sintable[ subang ];
    else if ( phase == 1024 ) return -(int)sintable[ 1023 - subang ];
    else if ( phase == 2048 ) return  (int)sintable[ subang ];
    else                      return  (int)sintable[ 1023 - subang ];
}
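Since cos(θ) = sin(θ + 90°), a corresponding cosine function (name illustrative) is just a phase-shifted lookup:

INLINE int fixed_cos( int angle )
{
    return fixed_sin( angle + 1024 );   /* 1024 angle units = 90 degrees */
}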
A.2 FIXED-POINT METHODS IN ASSEMBLY LANGUAGE
Typically all processors have instructions that are helpful for fixed-point computations. For example, most processors support multiplication of two 32-bit values into a 64-bit result. However, it may be difficult for the compiler to find the optimal instruction sequence for the C code; direct assembly code is sometimes the only way to achieve good performance. Depending on the compiler and the processor, improvements of more than 2× can often be achieved with optimized assembly code.
Let us take the fixed-point multiplication covered earlier as an example. If you multiply two 32-bit integers, the result will also be a 32-bit integer, which may overflow before you have a chance to shift the result back into a safe range. Even if the target processor supports the wider multiplication, it may be impossible to get the compiler to generate such assembly instructions. To be safe, you have to promote at least one of the arguments to a 64-bit integer. There are two solutions to this dilemma. The first (easy) solution is to use a good optimizing compiler that detects the casts around the operands, and then performs a narrower and faster multiplication. You might even be able to study the machine code sequences that the compiler produces to learn how to express operations so that they lead to efficient machine code. The second solution is to use inlined assembly and explicitly use the narrowest multiply that you can get away with.
Here we show an example of how to do fixed-point operations using ARM assembly. The ARM processor is a RISC-type processor with sixteen 32-bit registers (r0-r15), of which r15 is reserved as the program counter (PC), r13 as the stack pointer (SP), and r14 is typically used as the link register (LR); the rest are available for arbitrary use.
All ARM opcodes can be prefixed with a conditional check based on which the operation is either executed or ignored. All data opcodes have three-register forms where a constant shift operation can be applied to the rightmost register operand at no performance cost. For example, C code along these lines

int INLINE foo( int a, int b )
{
    int t = a + (b >> 16);
    return (a << 7) - t;    /* second statement chosen for illustration */
}

executes in just two cycles when converted to ARM:

    ADD   r2, r0, r1, ASR #16   ; t = a + (b >> 16), the shift comes for free
    RSB   r0, r2, r0, LSL #7    ; (reverse subtract) r0 = (a << 7) - t

For more details about ARM assembler, see www.arm.com/documentation. Note that the following examples are not optimized for any particular ARM implementation. The pipelining rules for different ARM variants, as well as different implementations of each variant, can be different.
The following example code multiplies a u0.32 fixed-point number with another u0.32 fixed-point number and stores the resulting high 32 bits in register r0:

; assuming:
;   r2 = input value 0
;   r3 = input value 1
;   result is directly in r0 register, low bits in r1
    UMULL r1, r0, r2, r3    ; r0:r1 = r2 * r3
In the example above there is no need to actually shift the result by 32, as we can directly store the high bits of the result in the correct register. To fully utilize this increased control over operations and intermediate result ranges, you should combine primitive operations (add, sub, mul) into larger blocks. The following example shows how to compute a vec4 dot product between a normalized vector and a vertex or a normal vector represented in 16.16 fixed point. We want to make the code run as fast as possible and we have selected the fixed-point ranges accordingly. In the example we have chosen the range of the normalized vector of the transformation matrix to be 0.30, as we are going to accumulate the results of four multiplications together, and we need 2 bits of extra room for the accumulation:
; input:
;   r1-r4 = vec4 (assumed to be same over N input vectors) X,Y,Z,W, in 0.30
;   r5    = pointer to the 16.16 vertex data (register assignment illustrative)
;
; in the code:
;   64-bit output is in r8:r7,
;   we take the high 32 bits (r8 register) directly
;
; (an illustrative instruction sequence:)
    LDMIA r5!, {r9-r12}      ; load vertex X,Y,Z,W (16.16)
    SMULL r7, r8, r1, r9     ; r8:r7  = X * vx
    SMLAL r7, r8, r2, r10    ; r8:r7 += Y * vy
    SMLAL r7, r8, r3, r11    ; r8:r7 += Z * vz
    SMLAL r7, r8, r4, r12    ; r8:r7 += W * vw
As we implemented the whole operation as one vec4 · vec4 dot product instead of a collection of primitive fixed-point operations, we avoided intermediate shifts and thus improved the accuracy of the result. By using the 0.30 fixed-point format we reduced the accuracy of the input vector by 2 bits, but usually the effect is negligible: remember that even IEEE floats have only 24 significant bits. With careful selection of ranges, we avoided overflows altogether and eliminated a 64-bit shift operation that would have required several cycles. By using ARM-specific multiply-and-accumulate instructions that operate directly on 64 bits, we avoided doing 64-bit accumulations that usually require two assembly opcodes: ADD and ADC (add with carry).
In the previous example the multiplication was done in fixed point. If the input values, e.g., vertex positions, are small, some accuracy is lost in the final output because of the fixed position of the decimal point. For more accuracy, the exponents should be tracked as well. In the following example the input matrix is stored in a format where each matrix column has a common exponent and the scalar parts are normalized to that exponent. The code shows how one row is multiplied. Note that this particular variant assumes the availability of the ARMv5 instruction CLZ and will thus not run on ARMv4 devices.
; input:
;   r2-r6 = X,Y,Z,W,E (exponent)
;
; Code below does not do tight normalization (e.g., if
; we have the number 0x00000000 00000001, we don't return
; 0x40000000, but we subtract 32 from the exponent and return
; 0x00000001). This is because we only do highest-bit
; counting in the high 32 bits of the result. No accuracy
; is lost due to this at this stage.
;
; If tight normalization is required, it can be added with
; extra comparisons.
;
; The following opcode (eor) calculates the rough abs(r9)
; value. Positive values stay the same, but negative
; values are bit-inverted -> outcome of ~abs(-1) = 0 etc.
; This is enough for our range calculation. Note that we
; use an arithmetic shift that extends the sign bits.
;
; note2: an ARM register shift by zero returns the original value
;
; output in r9 (scalar) and r6 (exponent)
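A minimal sketch of the normalization step the comments above describe, assuming the dot product has already been accumulated into r8:r7 as in the previous example (register assignments are illustrative):

    EOR   r9, r8, r8, ASR #31   ; rough abs: invert negative values
    CLZ   r9, r9                ; count leading zeros of the high word
    SUB   r9, r9, #1            ; keep one bit for the sign
    RSB   r10, r9, #32          ; r10 = 32 - shift amount
    MOV   r8, r8, LSL r9        ; shift the high word left
    ORR   r8, r8, r7, LSR r10   ; pull in bits from the low word
    SUB   r6, r6, r9            ; adjust the exponent accordingly
    MOV   r9, r8                ; output scalar in r9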