Figure 5.6: Some of the features that are available in 3ds Max for meshes (left), materials (middle), and animations (right). Only a fraction of these can be supported in real-time systems, particularly on mobile devices that have no programmable graphics hardware. (Images copyright © Autodesk.)
Many exciting authoring tool features are ruled out by technical limitations alone, especially when targeting mobile devices. For example, it is hardly feasible to animate a subdivision-surface model by free-form deformations and render it with refractions, displacement mapping, and soft shadows. Technical constraints notwithstanding, the mere effort to define and implement such a huge array of techniques is formidable. The definition effort becomes even more difficult if the features need to be standardized so that independent implementations will work the same way. Finally, including everything that is “nice to have” will lead to a bloated system with lots of little-used functionality that mostly just obscures the essential parts.
The M3G standardization group settled for relatively few built-in animation and rendering techniques. Beyond what is directly provided by OpenGL ES 1.0, the key features are hierarchical transformations, layered (multipass) rendering, two mesh modifiers (vertex morphing and skinning), and keyframe animation. These allow surprisingly complex animated scenes to be exported from authoring tools, and reproduced at runtime with very little application code. Many sophisticated mesh deformations, for example, can be exported as suitably keyframed morph targets. Of course, almost any technique can be written in Java, using M3G only for rasterization, but then performance might become an issue.
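To illustrate how little application code is needed, here is a minimal sketch of loading and playing back an authored scene. The resource name /scene.m3g, the MIDP Graphics object, and the time counter are assumptions for illustration only.

    import java.io.IOException;
    import javax.microedition.m3g.*;

    // A minimal sketch: load an exported scene and play back its animations.
    void drawFrame(javax.microedition.lcdui.Graphics g, int appTime)
            throws IOException {
        Object3D[] roots = Loader.load("/scene.m3g");  // normally done only once
        World world = (World) roots[0];

        world.animate(appTime);                // advance all keyframe animations
        Graphics3D g3d = Graphics3D.getInstance();
        g3d.bindTarget(g);                     // g is the MIDP Canvas Graphics
        try {
            g3d.render(world);                 // draw the whole scene graph
        } finally {
            g3d.releaseTarget();
        }
    }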
5.2.4 EXTENSIBILITY
Now that we have a fairly generic, rendering-oriented scene graph design, we need to decide whether to make it extensible, that is, to open up the rendering traversal and expose the underlying rendering engine so that the user can plug in completely new types of objects, write rendering methods for them, and have them blend in seamlessly and behave just like built-in objects. The M3G scene graph was not made extensible, for the reasons outlined below.
A key issue affecting the extensibility of a scene graph is whether the underlying rendering engine can be dictated at the time of design, or whether the implementations need to be able to use different low-level APIs. M3G is based on the latter approach. Although conceptually based on OpenGL ES 1.0, it does not expose the low-level rendering context that it uses internally. This design allows practical implementations to use later versions of OpenGL ES, proprietary extensions, customized software rasterizers, or perhaps even Direct3D Mobile. Similarly, emulators and development tools on the PC may well be based on desktop OpenGL.
For a scene graph to be considered extensible, it would also have to support user-defined callbacks. However, if user-defined callbacks are allowed to modify the scene graph right in the middle of the rendering traversal, it becomes an implementation nightmare to maintain the security and stability of the system. What happens if one of the callbacks removes a scene graph branch that the engine was just processing, for example? On the other hand, if the callbacks are not given write access to the scene graph, they become much less useful.
Even providing read-only access to the scene graph during callbacks may be problematic. For example, a callback should ideally have access to global data about light sources, bounding boxes, modelview matrices, nearby objects, and so on, but to arrange the internal operations and data structures of the engine so that this information is readily available may not be easy or cheap.
For M3G, the final straw that settled the extensibility issue was the environment that the engine is running on. Interrupting a relatively tight piece of code, such as the rendering traversal, is inefficient even in pure native code, let alone if it involves transitioning from native code to Java and vice versa. As a result, M3G was made a “black box” that never interrupts the execution of any API methods by calling back to user code.
5.2.5 CLASS HIERARCHY
Having nailed down the key features of our scene graph, the final step is to come up with an object-oriented class hierarchy to support those features in a logical and efficient way. We need to decide what kinds of nodes are available, what components and properties they have, which of those may be shared or inherited, and so on.
M3G has a very simple hierarchy of nodes: as shown in Figure 5.7, it only has eight concrete node types and an abstract base class. Although the node hierarchy in M3G is small, it is representative of scene graphs in general. In the following, we go through the M3G node hierarchy from top to bottom, discussing alternative designs along the way.
All nodes in a typical object-oriented scene graph are derived from an abstract base class, which in M3G is called Node. Attributes that are deemed applicable to any type of node are defined in the base class, along with corresponding functions that operate on those attributes. There are no hard-and-fast rules on what the attributes should be, but anything that needs to be inherited or accumulated in the scene graph is a good candidate.
In M3G, the most important thing that is present in every node is the node transformation. The node transformation specifies the position, orientation, and scale of a node relative to its parent, with an optional 3 × 4 matrix to cover the whole spectrum of affine transformations (see Section 2.3). Other properties of M3G nodes include various on/off toggles and masks.
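These components map directly to the setters that M3G nodes inherit from the Transformable base class. A brief sketch, assuming some Node instance named node; the values are arbitrary:

    // Position, orientation, and scale relative to the parent node.
    node.setTranslation(0.0f, 1.5f, -10.0f);
    node.setOrientation(45.0f, 0.0f, 1.0f, 0.0f); // 45 degrees about the y axis
    node.setScale(2.0f, 2.0f, 2.0f);

    // The optional generic matrix component, for affine transformations
    // that translation, rotation, and scale alone cannot express.
    Transform extra = new Transform();   // identity to begin with
    extra.postRotate(30.0f, 1.0f, 0.0f, 0.0f);
    node.setTransform(extra);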
Some scene graph systems also allow low-level rendering state, such as blending modes, to be inherited from parent nodes to their children. This capability is more trouble than it is worth, though, and so was left out of M3G. Resolving the complete rendering state for an object is slow and error-prone if each individual state bit is a function of arbitrarily many nodes encountered along the way from the root to the leaf. Also, it makes little sense for rendering attributes to be inheritable in a system that is optimized for spatial organization: objects should be grouped according to their physical proximity, not because of their texture map or shininess.

Figure 5.7: The class hierarchy of scene graph nodes in M3G: Node, Group, World, Mesh, SkinnedMesh, MorphingMesh, Sprite3D, Camera, and Light. The arrows denote inheritance: World is derived from Group, SkinnedMesh and MorphingMesh are derived from Mesh, and everything is ultimately derived from Node.
Group nodes are the basic building blocks of a scene graph, and they come in many flavors; some examples are shown in Figure 5.8. The basic Group node in M3G stores an unordered and unlimited set of child nodes. The only other type of group in M3G is the designated root node, World. Other scene graph designs may support groups that store an ordered set of nodes, groups that select only one of their children for rendering, groups that store a transformation, and so on.
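The selective Switch and LOD groups of Figure 5.8 have no direct counterpart in M3G, but their effect can be emulated by toggling the rendering enable flags of ordinary group children. A sketch, assuming a Group named lodGroup whose three children are detail levels of the same object, the active Camera named camera, and arbitrary distance thresholds:

    // Distance from the group's origin to the camera, in camera space.
    Transform toCamera = new Transform();
    lodGroup.getTransformTo(camera, toCamera);
    float[] origin = { 0.0f, 0.0f, 0.0f, 1.0f };
    toCamera.transform(origin);
    float d = (float) Math.sqrt(origin[0] * origin[0] +
                                origin[1] * origin[1] +
                                origin[2] * origin[2]);

    // Enable exactly one detail level, as in the LOD node of Figure 5.8.
    lodGroup.getChild(0).setRenderingEnable(d < 10.0f);                 // high
    lodGroup.getChild(1).setRenderingEnable(d >= 10.0f && d <= 100.0f); // medium
    lodGroup.getChild(2).setRenderingEnable(d > 100.0f);                // low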
The structure of the basic rigid-body Mesh of M3G was already described in Section 5.1; see Figure 5.1 for a quick recap. The MorphingMesh is otherwise the same, but includes multiple VertexBuffers (the morph targets) and a weighting factor for each. The SkinnedMesh is a hierarchical construct that forms an entire branch in the main scene graph; it is essentially a very specialized kind of group node. See Figure 12.5 for how a SkinnedMesh is structured. Note that regardless of the type of mesh, the vertex buffers and other mesh components in M3G can be shared between multiple meshes. This allows, for example, a variety of car objects to share a single base mesh while only the texture maps are different.
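For instance, a two-target MorphingMesh could be set up and animated as follows. This is only a sketch: the base vertex buffer, the two target buffers, the index buffer, and the appearance are assumed to exist already.

    // Two morph targets layered on top of a shared base VertexBuffer.
    VertexBuffer[] targets = { smileTarget, frownTarget };
    MorphingMesh face =
        new MorphingMesh(baseVertices, targets, triangles, appearance);

    // Blend 70% of the first target with 30% of the second.
    face.setWeights(new float[] { 0.7f, 0.3f });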
Figure 5.8: Different kinds of group nodes that have been used in earlier scene graphs: a basic unordered Group; an OrderedGroup that renders its children in a fixed order; a Switch that renders only the currently selected child; and a level-of-detail group (LOD) that picks a child based on the distance D to the viewer, for example one child for D < 10 and another for D > 100. M3G only supports the basic, unordered groups, but has other means to implement the OrderedGroup and Switch behaviors. There is no direct substitute for the level-of-detail node LOD; to get the same effect, you will need to manually enable and disable nodes based on their distance (D) from the camera.

Sprite3D is a screen-aligned quadrilateral having a position and optionally a size in 3D space. It can be used for billboards, text labels, UI widgets, and others. Sprite3D also
illustrates the notion of having different kinds of renderable objects in a scene graph, not only triangle meshes. Some scene graphs support a wide variety of renderables that are not ordinary triangle meshes, at least not from the user’s point of view. Such renderables include spheres, cylinders, terrains, particles, impostors, skyboxes, and so on.
The Camera node defines from where and how the scene is viewed. The camera node has a position and orientation in the scene, together constituting the transformation from world coordinates to eye coordinates. The camera also defines a projective transformation that maps the eye coordinates into clip coordinates. The projective transformation may be given explicitly in the form of a 4 × 4 matrix, or implicitly by defining the extents of the view frustum. There are often several camera nodes in the scene to facilitate easy switching from one viewpoint to another. For example, a racing game might feature the driver’s view, rear view, and a view from behind.
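In M3G code, setting up and selecting such a camera might look like this. It is a sketch with arbitrary parameter values, assuming an existing World named world:

    // A perspective camera; the view frustum is defined implicitly.
    Camera driverCam = new Camera();
    driverCam.setPerspective(60.0f,           // vertical field of view, degrees
                             640.0f / 480.0f, // aspect ratio
                             0.1f, 1000.0f);  // near and far clipping planes
    driverCam.setTranslation(0.0f, 1.2f, 0.0f);
    world.addChild(driverCam);

    // Switching viewpoints is just a matter of changing the active camera.
    world.setActiveCamera(driverCam);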
Finally, the Light node defines a light source. The types of lights supported by M3G include ambient, directional, point, and spot lights. They are modeled after the OpenGL lighting equation.
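A sketch of a point light plus an ambient fill light, again assuming an existing World named world:

    // A white point light ("omni" in M3G terms) above the scene.
    Light lamp = new Light();
    lamp.setMode(Light.OMNI);
    lamp.setColor(0xFFFFFF);            // RGB packed as 0x00RRGGBB
    lamp.setIntensity(1.0f);
    lamp.setTranslation(0.0f, 10.0f, 0.0f);
    world.addChild(lamp);

    // A dim ambient light so that unlit surfaces are not completely black.
    Light fill = new Light();
    fill.setMode(Light.AMBIENT);
    fill.setIntensity(0.2f);
    world.addChild(fill);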
5.3 RETAINED MODE RENDERING
Retained mode refers to a programming paradigm for 3D graphics where a persistent representation of graphical content is stored in memory and managed by a library layer. The persistent representation is often called a scene graph. Compared to immediate mode, where fine-grained rendering commands are submitted to the graphics API and immediately executed, the retained-mode programmer performs less low-level work in loading, managing, culling, and rendering the scene. Also, giving more control over the content to the graphics library gives the library an opportunity to optimize the data for the underlying hardware.
Early scene graphs, such as Performer by SGI [RH94], were designed to work around the performance problems of the original OpenGL, which had a very immediate-mode API indeed: several function calls had to be made to draw each triangle, yielding a lot of overhead. Also, vertices, indices, textures, and all other graphics resources were held in application memory and controlled by the application. This made it difficult for OpenGL to internally cache or optimize any of the source data. The only retained-mode concept available was display lists, i.e., compiled sequences of OpenGL function calls, but they turned out to be inflexible from the application point of view, and difficult to optimize from the OpenGL driver point of view.[1]
Later versions of OpenGL, and OpenGL ES even more so, have departed from their pure immediate-mode roots. Vertex arrays and texture objects were introduced first, followed by Vertex Buffer Objects (VBOs), and most recently Frame Buffer Objects (FBOs). This trend of moving more and more data into graphics memory (the “server side” in OpenGL parlance) is still ongoing with, e.g., Direct3D 10 adding State Objects [Bly06].

[1] As a result, display lists were not included in OpenGL ES.
M3G was designed to be a retained-mode system from the ground up. Although it does have a concept of immediate mode, all data are still held in Java objects that are fully managed by M3G. The difference is that in immediate mode those objects are rendered individually, as opposed to being collected into a complete scene graph. Retained-mode rendering in a typical M3G implementation is, at least on a conceptual level, done as shown in Figure 5.9. Note that this is all happening in native code, without having to fetch any data from the Java side. We will now describe each step in more detail.
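In application code the two modes differ as follows. This sketch shows them back to back only for comparison, and assumes the referenced objects already exist:

    Graphics3D g3d = Graphics3D.getInstance();
    g3d.bindTarget(g);
    try {
        // Retained mode: hand the entire scene graph to the engine.
        g3d.render(world);

        // Immediate mode: set the camera and lights explicitly, then
        // render individual nodes or raw vertex and index buffers.
        g3d.setCamera(camera, cameraToWorld);
        g3d.resetLights();
        g3d.addLight(light, lightToWorld);
        g3d.render(vertices, triangles, appearance, modelToWorld);
    } finally {
        g3d.releaseTarget();
    }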
5.3.1 SETTING UP THE CAMERA AND LIGHTS
The first step is to set up global parameters, such as the camera and lights. Finding the active camera is easy, as there is a direct link to it from the World. To find the light sources, we have to scan through the entire scene graph, but in practice this only needs to be done once. The set of lights in a scene is unlikely to change on a regular basis, so we can easily cache direct pointers to them for later use.
Once we have the lights collected into a list, we transform them into eye coordinates by multiplying the position and/or direction of each light by its modelview matrix. To compute the modelview matrices, we trace the scene graph path from each light node to the camera node, concatenating the node transformations along the way into a 3 × 4 matrix. Note that many of these paths will typically overlap, particularly the closer we get to the camera node, so it makes a lot of sense to cache the transformations in some form. A simple but effective scheme is to cache the world-to-camera transformation; this will be needed a lot as we go forward. Caching the local-to-world transformation for each node may be a good idea as well.

Figure 5.9: Scene graph rendering in a typical M3G implementation: set up the camera and lights, update the bounding volumes, collect the potentially visible objects, resolve the rendering state, sort, and render, then proceed to the next frame. No Java code is involved in this process, as the scene graph is retained in native data structures.
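A sketch of such a caching scheme, expressed in terms of hypothetical engine internals rather than the M3G API:

    // Hypothetical implementation internals: each node caches its
    // local-to-world matrix and recomputes it only when marked dirty.
    final class EngineNode {
        EngineNode parent;
        Transform local = new Transform();   // composite T R S M of the node
        Transform cachedLocalToWorld;        // null means "dirty"

        Transform localToWorld() {
            if (cachedLocalToWorld == null) {
                cachedLocalToWorld = new Transform();   // identity
                if (parent != null) {
                    cachedLocalToWorld.set(parent.localToWorld());
                }
                cachedLocalToWorld.postMultiply(local);
            }
            return cachedLocalToWorld;
        }
    }
    // The modelview matrix of any node is then the cached world-to-camera
    // matrix multiplied by the node's cached local-to-world matrix.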
5.3.2 RESOLVING RENDERING STATE
After setting up global state, we move on to individual objects. Traversing the scene graph, we first eliminate any nodes and their descendants that have the rendering enable flag turned off. For each mesh that remains, we check whether its scope mask (see Section 15.6.2) matches with that of the camera, culling the mesh if not. As the final quick check, we drop any submeshes that have no associated Appearance.
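The scope test itself amounts to a bitwise AND of the two masks:

    // A mesh passes the scope test if its scope mask and the camera's
    // scope mask share at least one set bit.
    boolean inScope = (mesh.getScope() & camera.getScope()) != 0;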
We then resolve the rendering state for each remaining object. The state includes numerous transformations, appearance components, vertex and index buffers, and so on. At this stage we also quickly validate each object, checking that its vertex coordinates are present, that triangle indices do not point beyond vertex array boundaries, that a SkinnedMesh has all the necessary bones in place, and so on.
To compute the modelview matrix for a mesh, we again trace the path from the mesh node upward in the scene graph until we hit the root node, compounding any node transformations along the way into one matrix. This matrix is then concatenated with the world-to-camera transformation, which we cached earlier, to obtain the final modelview matrix.
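Incidentally, M3G exposes the same path-tracing computation to applications through Node.getTransformTo, so the modelview matrix of a mesh can also be obtained directly:

    // Composite transformation from mesh coordinates to camera coordinates,
    // i.e., the modelview matrix of the mesh.
    Transform modelview = new Transform();
    mesh.getTransformTo(camera, modelview);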
Compared to ordinary meshes, skinned meshes (see Section 4.2.2) need some special treatment. For each bone in the skeleton, we need to compute a compound transformation to the coordinate system where the actual skinning is to be done. This may be the eye space, the world space, or the coordinate system of the SkinnedMesh node itself. In principle, the choice of coordinate system makes no difference to the end result, but in practice, the impact of low-precision arithmetic gets more severe the more transformations we compound into the bone matrices. Thus, using the SkinnedMesh coordinate system may be a good idea on an integer-only CPU.
Once we are done with the transformations, we associate each mesh with the lights that potentially affect it; this is again determined using the scope masks. If there are more lights associated with an object than the underlying rendering engine can handle, we simply select the N most relevant lights and ignore the rest.
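How relevance is rated is up to the implementation. A crude sketch, again in terms of hypothetical internals, might rank the in-scope lights by intensity alone; a production-quality heuristic would also weigh in distance and attenuation.

    import java.util.Vector;

    // Hypothetical internals: pick at most maxLights of the lights whose
    // scope matches the mesh, keeping those with the highest intensity.
    Vector pickLights(Vector sceneLights, Node mesh, int maxLights) {
        Vector picked = new Vector();
        for (int i = 0; i < sceneLights.size(); i++) {
            Light light = (Light) sceneLights.elementAt(i);
            if ((light.getScope() & mesh.getScope()) == 0) continue;
            int pos = 0;    // insertion sort by descending intensity
            while (pos < picked.size() &&
                   ((Light) picked.elementAt(pos)).getIntensity() >=
                       light.getIntensity()) pos++;
            picked.insertElementAt(light, pos);
            if (picked.size() > maxLights) picked.removeElementAt(maxLights);
        }
        return picked;
    }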
5.3.3 FINDING POTENTIALLY VISIBLE OBJECTS
The next stage in retained-mode rendering is to determine which objects are inside or intersecting the view frustum, and are therefore potentially visible. Note that any number of the potentially visible objects may be entirely occluded by other objects, but in the absence of occlusion culling, we need to render all of them anyway.
Before the actual view frustum culling, we need to update the bounding volumes that are stored in each node. In a bounding volume hierarchy (BVH), such as the one shown in Figure 5.4, the bounding volume of a group node encloses the bounding volumes of its children. We start updating the volumes from the meshes at the leaf nodes, proceeding upward in the tree until we reach the root node. Dirty flags, propagated upward in the hierarchy, may be used to speed up the traversal: only those branches need to be processed where some node transformations or vertex coordinates have changed since the last frame.
The bounding volume of a node may be a sphere, an axis-aligned bounding box (AABB), an oriented bounding box (OBB), or any arbitrary shape as long as it encloses all vertices contained in that node and its descendants. The most common types of bounding volumes are shown in Figure 5.10. Practical M3G implementations are likely to be using AABBs and bounding spheres only. The more complex volumes are too slow to generate automatically, and there is no way in the M3G API for the developer to provide the bounding volumes. Bounding spheres and AABBs are also the fastest to check against the view frustum for intersections.
Ideally, different kinds of bounding volumes would be used on different types of objects and scenes. For example, bounding spheres are not a good fit with architectural models, but may be the best choice for skinned meshes. Bounding spheres are the fastest type of bounding volume to update, which is an important property for deformable meshes, and they also provide a fairly tight fit to human-like characters (recall the famous “Vitruvian Man” by Leonardo da Vinci).
With the bounding volume hierarchy updated and the rendering state resolved, we traverse the scene graph one final time, this time to cull the objects that are outside the view frustum. Starting from the World, we check whether the bounding volume of the current node is inside, outside, or intersecting the view frustum. If it is inside, the objects in that branch are potentially visible, and are inserted into the list of objects that will ultimately be sent to the rendering engine. If the bounding volume is outside the view frustum, the branch is not visible and gets culled. If the bounding volume and view frustum intersect, we recurse into the children of that node, and repeat from the beginning.
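The traversal reads roughly as follows. This is a sketch in terms of hypothetical internals: classify() stands for the bounding volume versus view frustum test, and collectAll() and the EngineNode accessors are assumed helpers.

    import java.util.Vector;

    static final int OUTSIDE = 0, INTERSECTING = 1, INSIDE = 2;

    void cull(EngineNode node, Vector potentiallyVisible) {
        int c = classify(node);                   // frustum vs. bounding volume
        if (c == OUTSIDE) {
            return;                               // the whole branch is culled
        } else if (c == INSIDE) {
            collectAll(node, potentiallyVisible); // no further tests needed
        } else {                                  // INTERSECTING: recurse
            if (node.isRenderable()) potentiallyVisible.addElement(node);
            for (int i = 0; i < node.childCount(); i++) {
                cull(node.childAt(i), potentiallyVisible);
            }
        }
    }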
Figure 5.10: Different kinds of bounding volumes, illustrated in two dimensions for clarity. From the left: axis-aligned bounding box (AABB), oriented bounding box (OBB), bounding sphere, and convex polytope. The convex polytope in this example is constructed from an AABB, shown in dashed line, by beveling its horizontal and vertical edges at 45° angles.
5.3.4 SORTING AND RENDERING
As the final step before rendering, we sort the list of potentially visible objects by two or more criteria. The primary criterion is the rendering layer, which is a user-specified global ordering of submeshes; see Section 15.4. The secondary sorting key is transparency: opaque objects must be rendered first so that translucent objects can be properly blended with them. Ideally, the transparent objects would be further sorted into a back-to-front order (see Section 3.5.2), but this is not required by M3G due to the potential impact on performance.
Any further sorting keys exist merely to optimize the rendering order for the underlying rendering engine, and are thus specific to each implementation and device. A good rule of thumb is to sort into buckets by rendering state, then front-to-back within each bucket to minimize overdraw. See Chapter 6 for more information on rendering optimizations.
State sorting is made easier and faster by the fact that rendering state is grouped into appearance components to begin with. There are usually just a few different instances of each type of component in a scene, so they are easily enumerated or hashed into a fixed number of bits, and used as part of the sorting key. The sorting key can therefore be made very compact, for instance a 32-bit or 64-bit integer.
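For example, a 64-bit key can be laid out so that a single integer comparison sorts by all criteria at once. The field widths below are arbitrary assumptions; only the ordering of the fields matters:

    // Pack the sorting criteria into one long: higher bits dominate, so
    // comparing keys sorts by layer, then opacity, then state, then depth.
    long makeSortKey(int layer, boolean opaque, int stateBits, int depthBits) {
        long key = 0;
        key |= (long) (layer & 0xFF)        << 56;  // user-specified layer
        key |= (opaque ? 0L : 1L)           << 55;  // opaque objects first
        key |= (long) (stateBits & 0xFFFFF) << 35;  // hashed appearance state
        key |= (long) (depthBits & 0xFFFF)  << 19;  // front-to-back in bucket
        return key;
    }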
Finally, we iterate through the sorted queue of objects and dispatch them to the low-level rendering engine. To start off, we set the low-level rendering state to that of the first object in the queue. We render that object and any subsequent objects having the same state, and repeat from the start when we hit the first object with differing state.
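A sketch of that dispatch loop, once more in terms of hypothetical internals:

    // Hypothetical internals: the queue is already sorted, so low-level
    // state changes occur only at bucket boundaries.
    long currentState = -1;
    for (int i = 0; i < queue.size(); i++) {
        RenderItem item = (RenderItem) queue.elementAt(i);
        if (item.stateBits != currentState) {
            applyState(item);            // set the new low-level state
            currentState = item.stateBits;
        }
        draw(item);                      // submit with the current state
    }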
When all objects have been sent to the renderer, we return control back to the application, letting it draw some more graphics or just flush the frame buffer to the screen. The application is then expected to animate and otherwise update the scene graph in preparation for the next frame.
6 PERFORMANCE AND SCALABILITY
The fundamental challenge anyone programming for mobile phones faces is that to be successful in the marketplace, an application needs to be deployed on dozens of different phone models. Although the existence of programming standards such as OpenGL ES and M3G has reduced the fragmentation in the market, one still has to deal with a broad and diverse range of devices.

The performance characteristics, available memory, potential display configurations, programming tool chains, Java stack implementations, control devices, available libraries, operating systems, and underlying CPU architectures vary from one phone model to another. The problem of writing applications that port and scale to all these devices is such a hard and complex one that several industry-wide standardization efforts have emerged to tackle it, e.g., OpenKODE from the Khronos Group, and the Java platform and related JSR libraries defined by the Java Community Process.
For the purposes of this discussion, we will ignore most of the portability and scalability issues and concentrate on those that are related to 3D graphics. Even so, dealing with the variety in the devices out there is a formidable challenge. The performance difference in raw rendering power between a software and a hardware-based renderer can be hundredfold; whether this can be utilized and measured in real-life scenarios is an entirely different matter. The lowest-end devices with a 3D engine use 96 × 65 monochrome displays, and have a 20 MHz ARM7 processor. The high end at the time