3D Graphics with OpenGL ES and M3G, Part 12

…both depth fail and pass). A very advanced use case for stenciling is volumetric shadow casting [Hei91].

Depth test

Depth testing is used for hidden surface removal: the depth value of the incoming fragment is compared against the one already stored at the pixel, and if the comparison fails, the fragment is discarded. If the comparison function is LESS, only fragments with a smaller depth value than the one already in the depth buffer pass; other fragments are discarded. This can be seen in Figure 3.2, where the translucent object is clipped to the depth values written by the opaque object. The passed fragments continue along the pipeline and are eventually committed to the frame buffer.
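As a concrete illustration, here is a minimal sketch (OpenGL ES 1.x, not code from the book) of enabling the depth test with the LESS comparison:

    glEnable(GL_DEPTH_TEST);                              /* enable per-fragment depth comparison */
    glDepthFunc(GL_LESS);                                 /* pass only fragments closer than the stored depth */
    glClearDepthf(1.0f);                                  /* clear depth to the far-plane value */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);   /* clear color and depth before the frame */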

There are other ways of determining visibility. Conceptually the simplest approach is the painter's algorithm, which sorts the objects into a back-to-front order from the camera, and renders them so that a closer object always draws over the previous, farther objects. There are several drawbacks to this. The sorting may require significant extra time and space, particularly if there are a lot of objects in the scene. Moreover, sorting the primitives simply does not work when the primitives interpenetrate, that is, a triangle pokes through another. If you instead sort on a per-pixel basis using the depth buffer, visibility is always resolved correctly, the storage requirements are fixed, and the running time is proportional to the screen resolution rather than the number of objects.

With depth buffering it may make sense to have at least a partial front-to-back rendering order, the opposite of what is needed without a depth buffer. This way most fragments that are behind other objects will be discarded by the depth test, avoiding a lot of useless frame buffer updates. At least blending and writing to the frame buffer can be avoided, but some engines even perform texture mapping and fogging only after they detect that the fragment survives the depth test.

Depth offset

As already discussed in Section 2.5.1, the depth buffer has only a finite resolution. Determining the correct depth ordering for objects that are close to each other but not close to the near frustum plane may not always be easy, and may result in z-fighting, as shown in Figure 2.11. Let us examine why this happens.

Figure 3.22 shows a situation where two surfaces are close to each other, and how the distance between them along the viewing direction increases with the slope or slant of the surfaces.

Figure 3.22: The slope needs to be taken into account with polygon offset. The two lines are two surfaces close to each other, the arrow shows the viewing direction, and the coordinate axes illustrate the x and z axis orientations. On the left, the slope of the surfaces with respect to the viewing direction is zero. The slope grows to 1 in the middle, and to about 5 on the right. The distance between the surfaces along the viewing direction also grows as the slope increases.

Let us interpret the small squares as pixel extents (in the horizontal direction as one unit of screen x, in the vertical direction as one unit of depth buffer z), and study the image more carefully. On the left, no matter where on the pixel we sample the surfaces, the lower surface always has a higher depth value, but at this z-resolution and at this particular depth, both will have the same quantized depth value. In the middle image, if the lower surface is sampled at the left end of the pixel and the higher surface at the right end, they will have the same depth. In the rightmost image, the depth order might be inverted depending on where the surfaces are evaluated. In general, due to limited precision in the depth buffer and transformation arithmetic, if two surfaces are near each other, but have different vertex values and different transformations, it is almost random which surface appears in front at any given pixel.

The situation in Figure 2.11 is contrived, but z-fighting can easily occur in real applications, too. For example, in a shooter game, after you spray a wall with bullets, you may want to paint bullet marks on top of the wall. You would try to align the patches with the wall, but want to guarantee that the bullet marks will resolve to be on top. By adding a polygon offset, also known as depth offset, to the bullet marks, you can help the rendering engine to determine the correct order. The depth offset is computed as

offset = m · factor + r · units,

where m is the maximum depth slope of the polygon, computed by the rendering engine for each polygon, r is the smallest value guaranteed to produce a resolvable depth offset on the given implementation, and factor and units are user-given constants.
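In OpenGL ES 1.x this maps to glPolygonOffset; a hedged sketch of the bullet-mark case, where the offset values and the drawWall/drawBulletMarks helpers are illustrative assumptions rather than code from the book:

    drawWall();                            /* hypothetical helper drawing the wall geometry */

    glEnable(GL_POLYGON_OFFSET_FILL);
    glPolygonOffset(-1.0f, -2.0f);         /* negative offset pulls the decals toward the camera */
    drawBulletMarks();                     /* hypothetical helper drawing the decal quads */
    glDisable(GL_POLYGON_OFFSET_FILL);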

3.5.2 BLENDING

Blending takes the incoming fragment color (the source color) and the current value in the color buffer (the destination color) and mixes them. Typically the value in the alpha channel determines how the blending is done.

Some systems do not reserve storage for alpha in the color buffer, and therefore do not support a destination alpha. In such a case, all computations assume the destination alpha to be 1, allowing all operations to produce meaningful results. If destination alpha is supported, many advanced compositing effects become possible [PD84].


Two interpretations of alpha

The transparency, or really opacity (alpha = 1 typically means opaque, alpha = 0, transparent), described by alpha has two different interpretations, as illustrated in Figure 3.23. One interpretation is that the pixel is partially covered by the fragment, and the alpha denotes that coverage value. Both in the leftmost image and in the middle image, two triangles each cover about one-half of the pixel. On the left the triangle orientations are independent from each other, and we get the expected coverage value of 0.5 + 0.5 · 0.5 = 0.75, as the first fragment covers one-half, and the second is expected to cover also one-half of what was left uncovered. However, if the triangles are correlated, the total coverage can be anything between 0.5 (the two polygons overlap each other) and 1.0 (the two triangles abut, as in the middle image).

The other interpretation of alpha is that a pixel is fully covered by a transparent film that adds a factor of alpha of its own color and lets the rest (one minus alpha) of the existing color show through, as illustrated on the right of Figure 3.23. In this case, the total opacity is also 1 − 0.5 · 0.5 = 0.75.

These two interpretations can also be combined. For example, when drawing transparent, edge-antialiased lines, the alpha is less than one due to transparency, and may be further reduced by partial coverage of a pixel.

Blend equations and factors

The basic blend equation adds the source and destination colors using blending factors, producing C = C_s S + C_d D. The basic blending uses the factors (S, D) = (SRC_ALPHA, ONE_MINUS_SRC_ALPHA). That is, the alpha component of the incoming fragment determines how much of the new surface color is used, e.g., 0.25, and the remaining portion comes from the destination color already in the color buffer, e.g., 1.0 − 0.25 = 0.75. This kind of blending is used in the last image in Figure 3.2.

Figure 3.23: Left: Two opaque polygons each cover half of a pixel, and if their orientations are random, the chances are that 0.75 of the pixel will be covered. Center: If it is the same polygon drawn twice, only half of the pixel should be covered, whereas if the polygons abut as in the image, the whole pixel should be covered. Right: Two polygons with 50% opacity fully cover the pixel, creating a compound film with 75% opacity.

There are several additional blending factors that may be used. The simplest ones are ZERO and ONE, where all the color components are multiplied by 0 or 1, that is, either ignored or taken as is. One can use either the destination or source alpha, or one minus alpha, as the blending factor (SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA). Using the ONE_MINUS version flips the meaning of opacity to transparency and vice versa.

With all the factors described so far, the factors for each of the R, G, B, and A channels are the same, and they can be applied to both source and destination colors. However, it is also possible to use the complete 4-component color as the blending factor, so that each channel gets a unique factor. For example, using SRC_COLOR as the blending factor for the destination color produces (R_s R_d, G_s G_d, B_s B_d, A_s A_d). In OpenGL ES, SRC_COLOR and ONE_MINUS_SRC_COLOR are legal blending factors only for the destination color, while DST_COLOR and ONE_MINUS_DST_COLOR can only be used with the source color. Finally, SRC_ALPHA_SATURATE can be used with the source color, producing a blending factor (f, f, f, 1) where f = min(A_s, 1 − A_d).

Here are some examples of using the blending factors. The default rendering that does not use blending is equivalent to using (ONE, ZERO) as the (src, dst) blending factors. To add a layer with 75% transparency, use 0.25 as the source alpha and select the (SRC_ALPHA, ONE_MINUS_SRC_ALPHA) blending factors. To equally mix n layers, set the factors to (SRC_ALPHA, ONE) and render each layer with alpha = 1/n. To draw a colored filter on top of the frame, use (ZERO, SRC_COLOR).
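In OpenGL ES 1.x these choices correspond directly to glBlendFunc calls; a minimal sketch, where only one of the glBlendFunc lines would be active at a time:

    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);   /* standard "over" blending driven by source alpha */
    glBlendFunc(GL_SRC_ALPHA, GL_ONE);                   /* equal mix of n layers, each drawn with alpha = 1/n */
    glBlendFunc(GL_ZERO, GL_SRC_COLOR);                  /* colored filter: tint what is already in the buffer */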

A later addition to OpenGL, which is also available in some OpenGL ES implementations through the OES_blend_subtract extension, allows you to subtract C_s S from C_d D and vice versa. Another extension allows you to define separate blending factors for the color (RGB) and alpha components.

Rendering transparent objects

OpenGL renders primitives in the same order as they are sent to the engine. With depth buffering, one can use an arbitrary rendering order, as the closest surface will always remain visible. However, for correct results in the presence of transparent surfaces in the scene, the objects should be rendered in a back-to-front order. On the other hand, this is usually the slowest approach, since pixels that will be hidden by opaque objects are unnecessarily rendered. The best results, in terms of both performance and quality, are obtained if you sort the objects, render the opaque objects front-to-back with depth testing and depth writing turned on, then turn depth write off and enable blending, and finally draw the transparent objects in a back-to-front order.
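A hedged sketch of that order in OpenGL ES 1.x, where the two draw helpers are hypothetical and are assumed to traverse pre-sorted object lists:

    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_TRUE);                 /* depth writes on for opaque geometry */
    glDisable(GL_BLEND);
    drawOpaqueFrontToBack();              /* hypothetical helper */

    glDepthMask(GL_FALSE);                /* keep the depth test, stop writing depth */
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    drawTransparentBackToFront();         /* hypothetical helper */

    glDepthMask(GL_TRUE);                 /* restore state for the next frame */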

To see why transparent surfaces need to be sorted, think of a white object behind blue glass, both of which are behind red glass, both glass layers being 50% transparent. If you draw the blue glass first (as you should) and then the red glass, you end up with more red than blue: (0.75, 0.25, 0.5), whereas if you draw the layers in the opposite order you get more blue: (0.5, 0.25, 0.75).

As described earlier, if it is not feasible to separate transparent objects from opaque objects otherwise, you can use the alpha test to render them in two passes.

Multi-pass rendering

The uses of blending are not limited to rendering translucent objects and compositing images on top of the background. Multi-pass rendering refers to techniques where objects and materials are synthesized by combining multiple rendering passes, typically of the same geometry, to achieve the final appearance. Blending is a fundamental requirement for all hardware-accelerated multi-pass rendering approaches, though in some cases the blending machinery of the texture mapping units can be used instead of the later blending stage.

An historical example of multi-pass rendering is light mapping, discussed in Section 3.4.3: back in the days of old, when graphics hardware only used to have a single texture unit, light mapping could be implemented by rendering the color texture and light map texture as separate passes with (DST_COLOR, ZERO) or (ZERO, SRC_COLOR) blending in between. However, this is the exact same operation as combining the two using a MODULATE texture function, so you will normally just use that if you have multi-texturing capability.
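A hedged sketch of such two-pass light mapping on single-texture-unit hardware (OpenGL ES 1.x); drawMesh and the two texture handles are hypothetical, and the GL_EQUAL depth trick is a common practice for re-rasterizing the same pixels, not something taken from the text:

    /* Pass 1: base color texture, no blending. */
    glBindTexture(GL_TEXTURE_2D, colorTexture);
    glDisable(GL_BLEND);
    drawMesh();

    /* Pass 2: light map modulates what is already in the color buffer. */
    glBindTexture(GL_TEXTURE_2D, lightMapTexture);
    glEnable(GL_BLEND);
    glBlendFunc(GL_DST_COLOR, GL_ZERO);   /* destination * source = modulation */
    glDepthFunc(GL_EQUAL);                /* pass exactly the pixels written in pass 1 */
    drawMesh();
    glDepthFunc(GL_LESS);
    glDisable(GL_BLEND);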

While multi-texturing and multi-pass rendering can substitute for each other in simple cases, they are more powerful combined. Light mapping involves the single operation AB, which is equally doable with either multi-texturing or multi-pass rendering. Basically, any series of operations that can be evaluated in a straightforward left-to-right order, such as AB + C, can be decomposed into either texturing stages or rendering passes. More complex operations, requiring one or more intermediate results, can be decomposed into a combination of multi-texturing and multi-pass rendering: AB + CD can be satisfied with two multi-textured rendering passes, AB additively blended with CD.

While you can render an arbitrary number of passes, the number of texture units quickly becomes the limiting factor when proceeding toward more complex shading equations. This can be solved by storing intermediate results in textures, either by copying the frame buffer contents after rendering an intermediate result or by using a direct render-to-texture capability.

Multi-pass rendering, at least in theory, makes it possible to construct arbitrarily complex rendering equations from the set of basic blending and texturing operations. This has been demonstrated by systems that translate a high-level shading language into OpenGL rendering passes [POAU00, PMTH01]. In practice, the computation is limited by the numeric accuracy of the individual operations and the intermediate results: with 8 bits per channel in the frame buffer, rounding errors accumulate fast enough that great care is needed to maximize the number of useful bits in the result.

3.5.3 DITHERING, LOGICAL OPERATIONS, AND MASKING

Before the calculated color at a pixel is committed to the frame buffer, there are two more processing steps that can be taken: dithering and logical operations. Finally, writing to each of the different buffers can also be masked, that is, disabled.

Dithering

The human eye can adapt to great changes in illumination: the ratio of the light on a bright day to the light on a moonless overcast night can be a billion to one. With a fixed lighting situation, the eye can distinguish a much smaller range of contrast, perhaps 10000 : 1. However, in scenes that do not have very bright lights, 8 bits, or 256 levels, are sufficient to produce color transitions that appear continuous and seamless. Since 8 bits also matches the limits of current displays pretty well, and is a convenient unit of storage and computation on binary computers, using 8 bits per color channel is a typical choice on a desktop.

Some displays cannot even display all those 256 levels of intensity, and some frame buffers save in memory costs by storing fewer than 8 bits per channel. Having too few bits available can lead to banding. Let us say you calculate a color channel at 8 bits, where values range from 0 to 255, but can only store 4 bits, with a range from 0 to 15. Now all values between 64 and 80 (01000000 and 01010000 in binary) map to either 4 or 5 (0100 or 0101). If you simply quantize the values in an image where the colors vary smoothly, so that values from 56 to 71 map to 4 and values from 72 to 87 map to 5, the flat areas and the sudden jumps between them become obvious to the viewer. However, if you mix pixels of values 4 and 5 in roughly equal amounts where the original image values are around 71 or 72, the eye fuses them together and interprets them as a color between 4 and 5. This is called dithering, and is illustrated in Figure 3.24.

Figure 3.24: A smooth ramp (left) is quantized (middle), causing banding. Dithering (right) produces smoother transitions even though individual pixels are quantized.


OpenGL allows turning dithering on and off per drawing command. This way, internal computations can be calculated at a higher precision, but color ramps are dithered just after blending and before committing to the frame buffer.
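For example, in OpenGL ES 1.x, where dithering is enabled by default, the toggle is a simple capability bit; the draw helpers below are hypothetical:

    glEnable(GL_DITHER);        /* dither smooth color ramps, e.g., a sky gradient */
    drawBackgroundGradient();   /* hypothetical helper */

    glDisable(GL_DITHER);       /* skip dithering where it would only cost performance */
    drawUserInterface();        /* hypothetical helper */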

Another approach to dithering is to have the internal frame buffer at a higher resolution than the display color depth. In this case, dithering takes place only when the frame is complete and is sent to the display. This allows reasonable results even on displays that only have a single bit per pixel, such as the monochrome displays of some low-end mobile devices, or newspapers printed with only black ink. In such situations, dithering is absolutely required so that any impression of continuous intensity variations can be conveyed.

Logical operations

Logical operations, or logic ops for short, are the last processing stage of the OpenGL graphics pipeline. They are mutually exclusive with blending. With logic ops, the source and destination pixel data are considered bit patterns, rather than color values, and a logical operation such as AND, OR, XOR, etc., is applied between the source and the destination before the values are stored in the color buffer.

In the past, logical operations were used, for example, to draw a cursor without having to store the background behind the cursor. If one draws the cursor shape with XOR, then another XOR will erase it, reinstating the original background. OpenGL ES 1.0 and 1.1 support logical operations as they are fast to implement in software renderers and allow some special effects, but both M3G and OpenGL ES 2.0 omit this functionality.
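A hedged sketch of that XOR cursor trick in OpenGL ES 1.x, where drawCursor is a hypothetical helper taking the cursor position:

    glEnable(GL_COLOR_LOGIC_OP);
    glLogicOp(GL_XOR);
    drawCursor(x, y);           /* XOR the cursor shape into the color buffer */
    /* ... later ... */
    drawCursor(x, y);           /* a second XOR restores the original background */
    glDisable(GL_COLOR_LOGIC_OP);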

Masking

Before the fragment values are actually stored in the frame buffer, the different data fields can be masked. Writing into the color buffer can be turned off for each of the red, green, blue, or alpha channels. The same can be done for the depth channel. For the stencil buffer, even individual bits may be masked before writing to the buffer.
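These masks map to the following OpenGL ES 1.x calls; a minimal sketch with illustrative values:

    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_FALSE); /* write RGB but leave destination alpha untouched */
    glDepthMask(GL_FALSE);                            /* disable depth writes */
    glStencilMask(0x0F);                              /* allow writes only to the low four stencil bits */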

3.6 LIFE CYCLE OF A FRAME

Now that we have covered the whole low-level 3D graphics pipeline, let us take a look at the full life cycle of an application and a frame.

In the beginning of an application, resources have to be obtained. The most important resource is the frame buffer. This includes the color buffer, how many bits there are for each color channel, the existence and bit depth of the alpha channel, the depth buffer, the stencil buffer, and the multisample buffers. The geometry data and texture maps also require memory, but those resources can be allocated later.
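With OpenGL ES this negotiation is typically done through EGL; a hedged sketch in which the bit depths are illustrative and display is assumed to be an already-initialized EGLDisplay:

    EGLint attribs[] = {
        EGL_RED_SIZE,     5,
        EGL_GREEN_SIZE,   6,
        EGL_BLUE_SIZE,    5,
        EGL_ALPHA_SIZE,   0,    /* no destination alpha */
        EGL_DEPTH_SIZE,   16,
        EGL_STENCIL_SIZE, 0,
        EGL_NONE
    };
    EGLConfig config;
    EGLint numConfigs;
    eglChooseConfig(display, attribs, &config, 1, &numConfigs);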


The viewport transformation and projection matrices describe the type of camera that is being used, and are usually set up only once for the whole application. The modelview matrix, however, changes whenever something moves, whether it is an object in the scene or the camera viewing the scene.

After the resources have been obtained and the fixed parameters set up, new frames are rendered one after another. In the beginning of a new frame, the color, depth, and other buffers are usually cleared. We then render the objects one by one. Before rendering each object, we set up its rendering state, including the lights, texture maps, blending modes, and so on. Once the frame is complete, the system is told to display the image. If the rendering was quick, it may make sense to wait for a while before starting the next frame, instead of rendering as many frames as possible and using too much power. This cycle is repeated until the application is finished. It is also possible to read the contents of the frame buffer into user memory, for example to grab screen shots.
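A hedged sketch of that per-frame cycle in OpenGL ES 1.x with EGL; updateScene, setRenderState, drawObject, the object list, and the display and surface handles are hypothetical application code:

    for (;;) {
        updateScene();                                        /* animation, input, game logic */
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);   /* start the frame from a clean slate */
        for (int i = 0; i < numObjects; ++i) {
            setRenderState(&objects[i]);                      /* lights, textures, blending modes, ... */
            drawObject(&objects[i]);
        }
        eglSwapBuffers(display, surface);                     /* tell the system to display the image */
    }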

3.6.1 SINGLE VERSUS DOUBLE BUFFERING

In a simple graphics system there may be only a single color buffer, into which new graphics is drawn at the same time as the display is refreshed from it. This single buffering has the benefits of simplicity and lesser use of graphics memory. However, even if the graphics drawing happens very fast, the rendering and the display refresh are usually not synchronized with each other, which leads to annoying tearing and flickering.

Double buffering avoids tearing by rendering into a back buffer and notifying the system when the frame is completed. The system can then synchronize the copying of the rendered image to the display with the display refresh cycle. Double buffering is the recommended way of rendering to the screen, but single buffering is still useful for off-screen surfaces.

3.6.2 COMPLETE GRAPHICS SYSTEM

Figure 3.25 presents a conceptual high-level model of a graphics system. Applications run on a CPU, which is connected to a GPU with a first-in-first-out (FIFO) buffer. The GPU feeds pixels into various frame buffers of different APIs, from which the display subsystem composites the final displayed image, or which can be fed back to graphics processing through the texture-mapping unit. The Graphics Device Interface (GDI) block implements functionality that is typically present in the 2D graphics APIs of operating systems. The Compositor block handles the mixing of different types of content surfaces in the system, such as 3D rendering surfaces and native OS graphics.

Figure 3.25: A conceptual model of a graphics system.

Inside the GPU, a command processor processes the commands coming from the CPU to the 2D or 3D graphics subsystems, which may again be buffered. A typical 3D subsystem consists of two executing units: a vertex unit for transformations and lighting, and a fragment unit for the rear end of the 3D pipeline. Real systems may omit some of the components; for example, the CPU may do more (even all) of the graphics processing, some of the FIFO buffers may be direct unbuffered bus connections, or the compositor is not needed if the 3D subsystem executes in a full-screen mode. Nevertheless, looking at the 3D pipeline, we can separate roughly four main execution stages: the CPU, the vertex unit that handles transformations and lighting (also known as the geometry unit), the rasterization and fragment-processing unit (pixel pipeline), and the display composition unit.

Figure 3.26 shows an ideal case when all four units can work in parallel. While the CPU is processing a new frame, the vertex unit performs geometry processing for the previous frame, the rasterization unit works on the frame before that, and the display subunit displays a frame that was begun three frames earlier. If the system is completely balanced, and the FIFOs are large enough to mask temporary imbalances, this pipelined system can produce images four times faster than a fully sequential system such as the one in Figure 3.27. Here, one opportunity for parallelism vanishes from the lack of double buffering, and all the stages in general wait until the others have completed their frame before proceeding with the next frame.

Figure 3.26: Parallelism of asynchronous multibuffered rendering.

Figure 3.27: Nonparallel nature of single-buffered or synchronized rendering.

3.6.3 SYNCHRONIZATION POINTS

We call the situation where one unit of the graphics system has to wait for the input of a previous unit to complete, or even for the whole pipeline to flush, a synchronization point. Even if the graphics system has been designed to be able to execute fully in parallel, use of certain API features may create a synchronization point. For example, if the application asks to read back the current frame buffer contents, the CPU has to stall and wait until all the previous commands have fully executed and have been committed into the frame buffer. Only then can the contents be delivered to the application.
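In OpenGL ES such a read-back is typically a glReadPixels call; a minimal sketch, where WIDTH and HEIGHT are assumed constants matching the viewport:

    GLubyte pixels[WIDTH * HEIGHT * 4];
    /* Blocks until all queued rendering has completed and been written to the frame buffer. */
    glReadPixels(0, 0, WIDTH, HEIGHT, GL_RGBA, GL_UNSIGNED_BYTE, pixels);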

Another synchronization point is caused by binding the rendering output to a texture map. Also, creating a new texture map and using it for the first time may create a bottleneck for transferring the data from the CPU to the GPU and organizing it into a format that is native to the texturing unit. A similar synchronization point can result from the modification of an existing texture map.

In general, the best performance is obtained if each hardware unit in the system executes in parallel. The first rule of thumb is to keep most of the traffic flowing in the same direction, and to query as little data as possible back from the graphics subsystem. If you must read the results back, e.g., if you render into a texture map, delaying the use of that data until a few frames later may help the system avoid stalling. You should also use server-side objects wherever possible, as they allow the data to be cached on the GPU. For best performance, such cached data should not be changed after it has been loaded. Finally, you can try to increase parallelism, for example, by executing application-dependent CPU processing immediately after GPU-intensive calls such as clearing the buffers, drawing a large textured mesh, or swapping buffers. Another way to improve parallelism is to move non-graphics-related processing into another thread altogether.
