In a vector form, the directions of the reflected and refracted rays can be determined; for the refracted ray,

\[ V_{\mathrm{refract}} = \frac{n_1}{n_2} L + \left( \frac{n_1}{n_2} \cos\theta_1 - \cos\theta_2 \right) N \tag{14} \]
As an optimization that may be made in the implementation of a raytracer, the medium in the absence of geometry may be assumed to have an index of refraction of one. As n2 is then one, the index of refraction stored for each piece of geometry becomes equal to the ratio of the index of refraction of the medium to that of its surroundings, simplifying (14) to
\[ V_{\mathrm{refract}} = n L + \left( n \cos\theta_1 - \cos\theta_2 \right) N \tag{15} \]
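As an illustration, the refraction direction of equation (15) might be computed as follows. This is a minimal sketch, assuming a simple Vec3 type, a unit-length incident direction L and normal N, and a per-geometry relative index n as described above; the function name and structure are illustrative, not taken from the chapter.

```cpp
#include <cmath>
#include <optional>

struct Vec3 {
    double x, y, z;
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
    Vec3 operator+(const Vec3 &v) const { return {x + v.x, y + v.y, z + v.z}; }
};

static double dot(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Refraction direction per equation (15): Vrefract = n*L + (n*cos(theta1) - cos(theta2))*N,
// where n is the relative index of refraction of the geometry (n1/n2, with the
// surrounding medium assumed to have index one). L is the unit incident direction
// and N the unit surface normal facing the incoming ray.
std::optional<Vec3> refract(const Vec3 &L, const Vec3 &N, double n) {
    double cos1 = -dot(L, N);                      // cos(theta1), angle of incidence
    double k = 1.0 - n * n * (1.0 - cos1 * cos1);  // cos^2(theta2) via Snell's law
    if (k < 0.0) return std::nullopt;              // no real solution: total internal reflection
    double cos2 = std::sqrt(k);
    return L * n + N * (n * cos1 - cos2);          // equation (15)
}
```

When no real solution for cos θ2 exists, the ray undergoes total internal reflection and only a reflected ray needs to be traced.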
Controlling Scene Complexity
Because each intersection of a ray with scene geometry may generate additional rays, the number of rays to be traced through the scene grows geometrically with the number of ray-geometry intersections, and therefore with scene complexity. In extreme cases, the number of rays traced, and therefore the number of intersection tests performed, can grow without bound. Fortunately, very few surfaces are perfectly reflective or transparent; most absorb some of the light energy falling on them. For this reason, the total light energy represented by the secondary rays diminishes with each intersection, and the effect of the secondary rays on the output image is less significant with each bounce. Note that once a ray escapes the volume enclosing the scene geometry, it can no longer generate any ray-geometry intersections and it will not contribute further to the output image.
To reduce the computational complexity, and therefore the calculation time required to generate the output image using the raytracing algorithm, various methods are usually employed to limit the number of recursions which may be generated by a single primary ray. In its simplest form, this can be achieved by placing an upper limit on the number of bounces allowed [Shirley 05]. In a more sophisticated system, the projected effect of the ray on the output image may be estimated after each intersection [Hall 83]. If the expected contribution of either a reflected or transmitted ray falls below a particular threshold, the new ray is not traced. By altering this threshold, a rough approximation to the final image may be generated quickly for preview purposes, or a more refined, final output may be produced.
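The two termination strategies can be combined in the main trace routine. The following is a minimal sketch under assumed stand-in types (Scene, Ray, Hit, Color) and helpers (shade, reflectRay), which take the place of a real raytracer's own definitions; kMaxDepth and kMinWeight are illustrative values, not ones given in the chapter.

```cpp
// Minimal stand-in types so the sketch is self-contained; a real raytracer
// would have its own Scene/Ray/Color/Hit definitions.
struct Color { double r, g, b; };
struct Ray   { /* origin, direction */ };
struct Hit   { double reflectance = 0.0; /* position, normal, material ... */ };
struct Scene {
    bool  intersect(const Ray &, Hit &) const { return false; } // stub
    Color background() const { return {0, 0, 0}; }
};
Color shade(const Scene &, const Hit &)    { return {0, 0, 0}; } // stub
Ray   reflectRay(const Ray &, const Hit &) { return {}; }        // stub
Color operator+(Color a, Color b)  { return {a.r + b.r, a.g + b.g, a.b + b.b}; }
Color operator*(Color a, double s) { return {a.r * s, a.g * s, a.b * s}; }

// Both termination strategies described above: a hard recursion limit
// [Shirley 05] and a per-ray contribution threshold [Hall 83].
Color trace(const Scene &scene, const Ray &ray, int depth, double weight) {
    const int    kMaxDepth  = 5;     // upper bound on the number of bounces
    const double kMinWeight = 0.01;  // raise this for quick preview renders
    if (depth > kMaxDepth || weight < kMinWeight)
        return {0, 0, 0};            // expected contribution is negligible

    Hit hit;
    if (!scene.intersect(ray, hit))
        return scene.background();   // the ray escaped the scene volume

    Color local = shade(scene, hit); // direct illumination at the hit point
    // The secondary ray carries a reduced weight, so long reflection
    // chains die out once their effect on the image becomes insignificant.
    Color reflected = trace(scene, reflectRay(ray, hit),
                            depth + 1, weight * hit.reflectance);
    return local + reflected * hit.reflectance;
}
```

Lowering kMinWeight refines the image at the cost of more secondary rays; raising it gives the quick preview behavior described above.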
Figure 3 shows examples of an output image generated using primary and shadow rays only, for recursion depths of zero to five. In Figure 3(a), no reflection is seen because reflection rays are not generated at the primary ray intersection point. This causes the spheres to appear flat and dull. Figure 3(b) depicts reflections after a single bounce.
Fig. 3 An image generated using primary and secondary rays for (a) zero, (b) one, (c) two, (d) three, (e) four, and (f) five bounces
Trang 3in Figure3(c) to Figure3(f), and the differences between each successive recursion
of the ray tracing algorithm becomes harder and harder to distinguish Namely, asshown in Figure4, very little additional information is added past the third bounce.Therefore, the raytracing process can be stopped in this case after three or fourbounces
Image Quality Issues
The primary image quality concern with ray traced images is aliasing artifacts [Cook 86]. As ray tracing is a point-sampling technique, it is subject to spatial aliasing when the sampled content reaches or exceeds the sampling limit. This limit is described by the Nyquist theorem [Nyquist 28], which states that the maximum frequency of a signal that may be represented by a point-sampled data set is half of the sampling frequency of that set. Any content of a signal beyond this limit will manifest as
an aliased signal at a lower frequency. It follows from this theorem that, in the two-dimensional output image, the sampling rate is determined by the spacing of the pixels, in other words, the image resolution. However, the input signal is essentially the scene geometry, which may contain very fine details. Aliasing can appear in a number of ways. The most obvious effect is jagged edges of objects in the output image. Another serious artifact is the disappearance of fine details, including very small objects. Because objects in a ray-traced scene may be represented analytically, it is possible for a small object to fall between primary rays. In this case, no intersection will be detected; the small object will not have any effect on the output image and will therefore not be visible. In an animation where the small object moves between frames, in some frames the object will generate an intersection and in others it will not. It will therefore seem to appear and disappear as it moves.
Figure 5 shows an image with a checkered texture generated procedurally; it has infinite detail, and the smallest squares seen in the distance cover an area significantly smaller than a pixel. As can be seen, the image rendered with no antialiasing (Figure 5(a)) has noticeably lower quality compared to the image rendered with a moderate level of antialiasing (Figure 5(b)).
To compensate for aliasing artifacts, various antialiasing techniques have been devised. Most methods for reducing aliasing artifacts are based on the concept of super-sampling, which refers to the computation of samples at a higher frequency than the output resolution [Dobkin 96]. A simple approach to super-sampling consists of taking multiple samples for each pixel in the output image and using the averaged value of these samples as the output value. This is equivalent to generating a higher resolution output image and then down-sampling it to the desired output resolution. Namely, each pixel is subdivided into several subpixels; for example,
Fig. 5 An infinite checkered plane rendered (a) without antialiasing and (b) with modest antialiasing
Fig. 6 Grid sampling: (a) regular pattern and (b) pseudorandom pattern
into a three by three grid, producing nine subpixels per output pixel. A primary ray is then generated passing through each of the subpixels. If the center of each subpixel is used as the target for the primary ray, then aliasing artifacts can still be seen, albeit at higher frequencies. This is because the sampling pattern is still regular, and there may still be scene content that would generate image data beyond the new, higher sampling frequency. To compensate for this, more sample points can be added, thus dividing each output pixel into more and more subpixels. Alternatively, an irregular sampling pattern can be created by moving the sample positions within the subpixels.
This is known as jittered grid super-sampling. Figure 6(a) shows the spacing of the three by three subpixels of the regular grid. The randomized positions used in a jittered grid are shown in Figure 6(b). The advantage of arranging the subsample positions on a pseudorandom grid is that not only are the effects of aliasing reduced, but straight lines at all angles are equally well represented, which is not the case when a regular grid pattern is used.
Figure 7 shows the same image rendered with a regular sampling grid in Figure 7(a) and with a jittered grid in Figure 7(b). Inset at the top of each image is a magnified section of the upper edge of each square. As can be seen, the appearance of jagged hard edges in the rendered image is reduced as the number of shades used to represent the edge increases.
There are many other methods of implementing antialiasing. Many of these methods are adaptive [Painter 89], [Mitchell 90], allowing higher levels of antialiasing to be applied at edges or other areas with high frequency content. However, an in-depth analysis is beyond the scope of this article and we will not discuss the topic further here.
In a typical raytracer, the majority of the computation time is spent performing intersection tests [Whitted 80]. As primary rays enter the scene, they must be tested for intersection with all elements of the scene geometry. The performance of a raytracer is thus heavily influenced by its ability to coarsely discard large parts of the scene from the set of geometry that must be tested for intersection against a given primary ray [Clark 76]. Furthermore, any secondary rays generated as a result of these intersections must also be tested against the scene geometry, potentially generating millions of additional intersection tests.
There has been a large amount of work performed in the area of acceleration and optimization of raytracing algorithms [Weghorst 84], [Kay 86]. Whilst optimizing the intersection test itself can produce reasonable results, it is important to try to minimize the number of tests that are performed. This is usually achieved by employing hierarchical methods for culling, mostly those based on bounding volumes and space partitioning tree structures [Arvo 89].
Bounding Volumes
When using a bounding hierarchy approach to coarsely cull geometry, a tree is constructed with each node containing the bounding volumes of finer and finer pieces of the scene geometry [Rusinkiewicz 00]. At each of the leaf nodes of the tree, individual geometry elements of the scene are contained in a form that is directly tested for intersection with the ray being traced. Two important considerations must be made when choosing a structure for the tree representing the bounding hierarchy. First, the geometric primitive representing the bounding volume must be one that is simple to test for the presence of an intersection with a line segment in three-dimensional space. All that is needed is a quick determination of whether there is an intersection with the bounding volume or not. If it can be determined that no intersection is made with the bounding volume, all of the geometry contained within that volume may be discarded from the potential set of geometry that may be intersected by the ray. As all that is desired is to know whether a ray may intersect geometry within the volume, the exact location of the intersection of the ray and the geometric primitive is not important; it is sufficient to know only whether there is an intersection with the primitive. Second, as the use of bounding volumes for culling geometry is an optimization strategy, the time taken to generate the bounding volumes should be less than the time saved by using them during the raytracing process. If it were not, the total time required to generate the bounding hierarchy and then use it to render the image would be greater than the time required to render the image without the acceleration structure.
For these reasons, spheres are often used as the geometric primitive representing bounding volumes [Thomas 01]. While finding the smallest enclosing sphere for a set of geometry is a rather complex task, finding a sphere that encloses the geometry reasonably tightly is not. Although such a bounding sphere does not necessarily represent the smallest possible bounding volume for a specific set of geometry, it is relatively trivial to test for the presence or absence of an intersection against it. The lack of an intersection between the ray and the bounding volume allows the volume to be skipped in the tree, but the presence of an intersection with a bounding volume does not necessarily indicate the existence of an intersection with the geometry contained within it. Therefore, if an intersection with part of the bounding hierarchy is detected, then the volumes and geometry within that bounding volume must also be tested. It is possible that a ray may intersect part of the bounding hierarchy yet not intersect any geometry within it.
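A sketch of such a boolean ray/bounding-sphere test follows; it never computes the intersection point, which is exactly the property described above. The vector type and helper are minimal stand-ins, not definitions from the chapter.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot3(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Boolean ray/bounding-sphere test: reports only whether the ray can hit the
// sphere, never where, which is all the culling step needs. The ray has
// origin o and unit direction d; the sphere has center c and radius r.
bool hitsBoundingSphere(const Vec3 &o, const Vec3 &d,
                        const Vec3 &c, double r) {
    Vec3 oc = {c.x - o.x, c.y - o.y, c.z - o.z};
    double t = dot3(oc, d);                 // closest approach along the ray
    double distSq = dot3(oc, oc) - t * t;   // squared distance from center to ray
    // Miss if the closest approach lies outside the sphere; a sphere entirely
    // behind the ray origin only counts as a hit if the origin is inside it.
    return distSq <= r * r && (t >= 0.0 || dot3(oc, oc) <= r * r);
}
```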
Space Partitioning Tree Structures
Popular examples of space partitioning data structures are binary space partitioning trees [Fuchs 80], octrees, and kd-trees [Bentley 75]. These tree structures recursively subdivide volumes into smaller and smaller parts.
The binary space partitioning tree was originally described by Henry Fuchs in 1980 [Fuchs 80]. In this method, a binary tree is constructed by recursively dividing the three-dimensional space of the scene with planes. At each node, a new plane is constructed, with the geometry that falls on one side of the plane placed in one child branch of the tree and the geometry falling on the other side of the plane placed in the other. Eventually, each child node of the binary tree will contain a minimal number of geometric primitives, or it will no longer be possible to further subdivide the space without intersecting scene geometry. In the latter case, a decision must be made as to whether to include the geometry in both children of the subdivision plane or to cease the subdivision process.
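A BSP node as described might be represented as follows; Plane and Triangle are illustrative stand-ins for a raytracer's own types, and the layout is a sketch rather than the structure used by [Fuchs 80].

```cpp
#include <memory>
#include <vector>

struct Plane    { double a, b, c, d; };  // plane ax + by + cz + d = 0
struct Triangle { /* three vertices */ };

// One node of a BSP tree: a splitting plane and two children. Geometry is
// pushed into the front or back child depending on which side of the plane
// it falls; primitives that straddle the plane (or leaves where subdivision
// has stopped) keep their geometry in this node's list.
struct BspNode {
    Plane splitPlane;
    std::unique_ptr<BspNode> front;   // subtree in front of the plane
    std::unique_ptr<BspNode> back;    // subtree behind the plane
    std::vector<Triangle> geometry;   // straddling primitives, or leaf contents
};
```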
The octree is a space partitioning tree structure that operates on volumes of three-dimensional space. The entirety of the scene geometry is enclosed in an axis-aligned bounding box. At each node, the space is subdivided in half in each axis, giving the node eight child nodes. Any geometry that straddles more than one of the children is placed within the current node. Geometry that is not cut by the subdivision is carried further down the recursion until it too is cut. The recursion stops either when no geometry is left to carry to the next smaller level, or after some predefined number of steps to limit the recursion depth. Each node maintains a list of the geometry that falls within it. As rays are cast through the scene, a list of octree nodes through which the ray will pass is generated, and only the geometry contained within those nodes need be tested for intersections. This has the potential to greatly accelerate the tracing process, especially in cases where the scene is made up of a large amount of very fine geometry.
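The traversal just described might look like the following sketch; Aabb, Primitive, Ray, and the ray/box test are assumed stand-ins for the raytracer's own definitions.

```cpp
#include <array>
#include <memory>
#include <vector>

struct Aabb      { /* axis-aligned min/max corners */ };
struct Primitive { /* triangle, sphere, ... */ };
struct Ray       { /* origin, direction */ };
bool rayIntersectsBox(const Ray &, const Aabb &) { return false; } // stub

struct OctreeNode {
    Aabb bounds;                                          // this node's volume
    std::vector<Primitive> geometry;                      // primitives cut by the split
    std::array<std::unique_ptr<OctreeNode>, 8> children;  // one per octant
};

// Collect the geometry lists of every node the ray passes through; only
// these primitives need full intersection tests, which is where the
// acceleration comes from.
void gatherCandidates(const OctreeNode &node, const Ray &ray,
                      std::vector<const Primitive *> &out) {
    if (!rayIntersectsBox(ray, node.bounds))
        return;                        // whole subtree culled in one test
    for (const Primitive &p : node.geometry)
        out.push_back(&p);             // geometry held at this node
    for (const auto &child : node.children)
        if (child)
            gatherCandidates(*child, ray, out);
}
```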
Hardware Accelerated Raytracing
Recently, interest has been shown in using hardware to accelerate ray tracing operations, particularly within the context of real-time raytracing. In some instances, standard PC hardware has been shown to have become fast enough to produce high quality raytraced images at interactive and real-time rates. For example, Intel has shown a demonstration of the popular video game Enemy Territory: Quake Wars rendered using raytracing [Phol 09]. In this project, the team was able to achieve frame rates of between 20 and 35 frames per second using four 2.66 GHz Intel Dunnington processors. While not technically hardware acceleration, this demonstrates what is currently possible using very high-end hardware. Furthermore, Intel's upcoming Larrabee product is expected to be well suited to raytracing [Seiler 08]. Moving further along the spectrum of hardware acceleration, the Graphics Processing Unit (GPU) has been used for raytracing. At first, it was necessary to map the raytracing problem to existing graphics application programming interfaces, such as the Open Graphics Library (OpenGL), as in [Purcell 02]. More recently, the GPU has been used as a more general purpose processor, and applications such as raytracing have become more efficient on such systems. For example, NVidia presented an implementation of a real-time raytracing algorithm using their Cuda platform [Luebke 08].
Dedicated hardware has also been investigated as a means to achieve high performance in ray tracing applications. An example of such work can be found in [Schmittler 04], where a dedicated raytracing engine was prototyped using a field-programmable gate array (FPGA). The same team has continued to work on the system [Woop 05] and has achieved impressive results using only modest hardware resources. The outcome has been a device designed to accelerate the scene traversal and ray-geometry intersection tests. Using a 66 MHz FPGA, the project has shown raytraced scenes at more than 20 frames per second. It can be expected that, should such a device scale to the multi-gigahertz range that is seen in modern CPU implementations, significantly higher performance could be achieved.
Summary
This chapter presented the fundamentals of raytracing, which is an advanced computer graphics method used to render an image by tracing the path of light through pixels in an image plane. Raytracing is not a new field; however, there is still currently a large amount of active research being conducted on the subject, as well as in the related field of photon mapping. Probably the main boosting factor behind this interest is that direct rendering techniques such as rasterization start to break down as the geometry used by artists becomes finer and more detailed, and polygons begin to cover screen areas smaller than a single pixel. Therefore, more sophisticated computer graphics solutions are called for. Microfacet-based techniques, such as the well-known Reyes algorithm, constitute such a modern solution. Although these techniques have been used for some time to render extremely fine geometry and implicit surfaces in offline systems such as those used for movies, raytracing is believed to become a computer graphics method for tomorrow. With the continued increase in computing power available to consumers, it is quite possible that interactive and real-time raytracing could become a commonly used technique in video games, digital content creation, computer-aided design, and other consumer applications.
[Perlin 01] Perlin, K. (2001), Improving Noise, in Computer Graphics, Vol. 35, No. 3.
[Jensen 96] Jensen, H. W. (1996), Global Illumination using Photon Maps, in Rendering Techniques '96, Springer Wien, X. Pueyo and P. Schröder, Eds., pp. 21–30.
[Phol 09] Phol, D. (2009), Light It Up! Quake Wars Gets Ray Traced, in Intel Visual Adrenalin Magazine, Issue 2, 2009.
[Purcell 02] Purcell, T., Buck, I., Mark, W. R., and Hanrahan, P. (2002), Raytracing on Programmable Graphics Hardware, in ACM Transactions on Graphics 21 (3), pp. 703–712 (Proceedings of SIGGRAPH 2002).
[Luebke 08] Luebke, D. and Parker, S. (2008), Interactive Raytracing with CUDA, Presentation, NVidia Sponsored Session, SIGGRAPH 2008.
[Seiler 08] Seiler, L. et al. (2008), Larrabee: A Many-Core x86 Architecture for Visual Computing, in ACM Transactions on Graphics 27 (3), Article 18.
[Schmittler 04] Schmittler, J., Woop, S., Wagner, D., Paul, W., and Slusallek, P. (2004), Realtime Raytracing of Dynamic Scenes on an FPGA Chip, in Proceedings of Graphics Hardware 2004, Grenoble, France, August 28th–29th, 2004.
[Woop 05] Woop, S., Schmittler, J., and Slusallek, P. (2005), RPU: A Programmable Ray Processing Unit for Realtime Raytracing, in ACM Transactions on Graphics 24 (3), pp. 434–444 (Proceedings of SIGGRAPH 2005).
[Jarosz 08] Jarosz, W., Jensen, H. W., and Donner, C. (2008), Advanced Global Illumination using Photon Mapping, ACM SIGGRAPH 2008 Classes.
[Nyquist 28] Nyquist, H. (1928), Certain Topics in Telegraph Transmission Theory, in Transactions of the AIEE, Volume 47, pp. 617–644 (Reprinted in Proceedings of the IEEE, Volume 90 (2), 2002).
[Fuchs 80] Fuchs, H., Kedem, Z. M., and Naylor, B. F. (1980), On Visible Surface Generation by A Priori Tree Structures, in Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, pp. 124–133.
[Cook 87] Cook, R. L., Carpenter, L., and Catmull, E. (1987), The Reyes Rendering Architecture, in Proceedings of SIGGRAPH '87, pp. 95–102.
[Foley 90] Foley, J. D., van Dam, A., Feiner, S. K., and Hughes, J. F. (1990), Computer Graphics: Principles and Practice, 2nd Ed.
[Hanrahan 93] Hanrahan, P. and Krueger, W. (1993), Reflection from Layered Surfaces due to Subsurface Scattering, in Proceedings of SIGGRAPH 1993, pp. 165–174.
[Weidlich 08] Weidlich, A. and Wilkie, A. (2008), Realistic Rendering of Birefringency in Uniaxial Crystals, in ACM Transactions on Graphics 27 (1), pp. 1–12.
[Henning 04] Henning, C. and Stephenson, P. (2004), Accelerating the Ray Tracing of Height Fields, in Proceedings of the 2nd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pp. 254–258.
[Houston 06] Houston, B., Nielsen, M. B., Batty, C., and Museth, K. (2006), Hierarchical RLE Level Set: A Compact and Versatile Deformable Surface Representation, in ACM Transactions on Graphics.
[Glassner 89] Glassner, A. S. (1989), An Introduction to Ray Tracing, Morgan Kaufmann, ISBN 978-0122861604.
[Blinn 77] Blinn, J. F. (1977), Models of Light Reflection for Computer Synthesized Models, in Proceedings of the 4th Annual Conference on Computer Graphics and Interactive Techniques, pp. 192–198.
[Cook 82] Cook, R. L. and Torrance, K. E. (1982), A Reflectance Model for Computer Graphics, in ACM Transactions on Graphics, Volume 1 (1), pp. 7–24.
[Oren 95] Oren, M. and Nayar, S. K. (1995), Generalization of the Lambertian Model and Implications for Computer Vision, International Journal of Computer Vision, Volume 14 (3).
[Peachy 85] Peachy, D. (1985), Solid Texturing of Complex Surfaces, in Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, pp. 279–286.
[Marschner 05] Marschner, S. R., Westin, S. H., Arbree, A., and Moon, J. T. (2005), Measuring and Modeling the Appearance of Finished Wood, in ACM Transactions on Graphics, Volume 24 (3).
[Clark 76] Clark, J. H. (1976), Hierarchical Geometric Models for Visible Surface Algorithms, in Communications of the ACM, Volume 19 (10), pp. 547–554.
[Weghorst 84] Weghorst, H., Hooper, G., and Greenberg, D. P. (1984), Improved Computational Methods for Ray Tracing, in ACM Transactions on Graphics, Volume 3 (1), pp. 52–69.
[Kay 86] Kay, T. L. and Kajiya, J. T. (1986), Ray Tracing Complex Scenes, in Computer Graphics, Volume 20 (4), pp. 269–278.
[Arvo 89] Arvo, J. and Kirk, D. (1989), A Survey of Ray Tracing Techniques, in An Introduction to Ray Tracing, Academic Press Ltd., ISBN 0-12-286160-4, pp. 201–262.
[Rusinkiewicz 00] Rusinkiewicz, S. and Levoy, M. (2000), QSplat: A Multiresolution Point Rendering System for Large Meshes, in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 343–352.
[Thomas 01] Thomas, F. and Torras, C. (2001), 3D Collision Detection: A Survey, in Computers and Graphics, Volume 25, pp. 269–285.
[Painter 89] Painter, J. and Sloan, K. (1989), Antialiased Ray Tracing by Adaptive Progressive Refinement, in Proceedings of the 1989 SIGGRAPH Conference, pp. 281–288.
[Mitchell 90] Mitchell, D. P. (1990), The Antialiasing Problem in Ray Tracing, SIGGRAPH 1990 Course Notes.
[Shirley 05] Shirley, P., Ashikhmin, M., Gleicher, M., Marschner, S., Reinhard, E., Sung, K., Thompson, W., and Willemsen, P. (2005), Fundamentals of Computer Graphics, 2nd Ed., pp. 201–237, A.K. Peters Ltd., ISBN 978-1568812694.
[Hall 83] Hall, R. A. and Greenberg, D. P. (1983), A Testbed for Realistic Image Synthesis, in IEEE Computer Graphics and Applications, Volume 3 (8), pp. 10–20.
Chapter 24
The 3D Human Motion Control Through
Refined Video Gesture Annotation
Yohan Jin, Myunghoon Suk, and B Prabhakaran
Introduction
In the beginning of the computer and video game industry, simple game controllers consisting of buttons and joysticks were employed, but recently game consoles have been replacing joystick buttons with novel interfaces, such as the remote controllers with motion sensing technology on the Nintendo Wii [1]. In particular, video-based human computer interaction (HCI) techniques have been applied to games; a representative game is 'Eyetoy' on the Sony PlayStation 2. Video-based HCI offers the great benefit of releasing players from intractable game controllers. Moreover, for communication between humans and computers, video-based HCI is very crucial since it is intuitive, easy to learn, and inexpensive.
On the one hand, extracting semantic low-level features from video human motion data is still a major challenge. The level of accuracy is heavily dependent on each subject's characteristics and environmental noise. Of late, people have been using 3D motion-capture data for visualizing real human motions in 3D space (e.g., 'Tiger Woods' in EA Sports games, 'Angelina Jolie' in the movie Beowulf) and for analyzing motions of specific performances (e.g., 'golf swing' and 'walking'). A 3D motion-capture system ('VICON') generates a matrix for each motion clip. Here, a column corresponds to a human sub-body part and a row represents a time frame of the data capture. Thus, we can extract a sub-body part's motion simply by selecting specific columns. Different from the low-level feature values of video human motion, the 3D human motion-capture data matrix does not contain pixel values, but is closer to the human level of semantics.
Y. Jin, M. Suk, and B. Prabhakaran
MySpace (Fox Interactive Media), 407 N. Maple Dr., Beverly Hills, CA 90210
Department of Computer Science, University of Texas at Dallas, TX, USA
e-mail: ychin@myspace.com; mhs071000@utdallas.edu; praba@utdallas.edu

The motivation of this paper starts from the following observations. Video based human motion data is essential for human computer interaction, but there is a semantic gap between human perceptions and the low-level feature values of video human motions. The gap might be partially covered by a good machine learning algorithm, but such an algorithm is quite susceptible to variations in subjects and environments. Thus, we need to refine video human motion's low-level features by using more semantically well represented data, such as motion-capture data in this case. We show how we can use 3D motion-capture data as a knowledge base for understanding video human motion classes. There is a barrier to achieving this goal. Two motion examples belonging to one class (e.g., a 'backhand' motion) of video and 3D motion-capture data look visually similar to a human. But in terms of the underlying data representation, video and 3D motion-capture motion data are heterogeneous in nature. Video low-level features are extracted from pixel intensity values, whereas the 3D motion-capture matrix contains translational and rotational DOF values of human motions.
To refine the video low-level feature data, we mix human video low-level features with semantic 3D motion-capture features and use them as the "hidden" states in a Hidden Markov Model (HMM). The HMM has already been used widely for speech recognition and for human gesture recognition as well. In this paper, we show that the HMM can combine the two heterogeneous data types and merge the 'knowledge-based' semantic 3D motion capture data as the hidden state. We show that this 3D motion capture assisted HMM model can significantly improve the video human motion recognition rate.
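To make the recognition step concrete, the following is a minimal sketch of the forward algorithm for a discrete HMM, under the assumption that the hidden states correspond to 3D motion-capture derived poses and the observations are quantized video feature vectors. The structure and names are illustrative, not the authors' implementation; the gesture class whose HMM yields the highest likelihood for an observed sequence would be chosen.

```cpp
#include <vector>

// A per-gesture-class discrete HMM: hidden states stand for motion-capture
// derived poses, observations for quantized video feature vectors.
struct Hmm {
    std::vector<double> pi;              // initial state distribution
    std::vector<std::vector<double>> A;  // A[i][j]: P(state j | state i)
    std::vector<std::vector<double>> B;  // B[s][o]: P(observation o | state s)
};

// Forward algorithm: likelihood of an observation sequence under the model.
double forwardLikelihood(const Hmm &hmm, const std::vector<int> &obs) {
    const size_t n = hmm.pi.size();      // number of hidden (pose) states
    std::vector<double> alpha(n);
    for (size_t s = 0; s < n; ++s)       // initialization, t = 0
        alpha[s] = hmm.pi[s] * hmm.B[s][obs[0]];
    for (size_t t = 1; t < obs.size(); ++t) {  // induction over the sequence
        std::vector<double> next(n, 0.0);
        for (size_t j = 0; j < n; ++j) {
            for (size_t i = 0; i < n; ++i)
                next[j] += alpha[i] * hmm.A[i][j];
            next[j] *= hmm.B[j][obs[t]];
        }
        alpha.swap(next);
    }
    double p = 0.0;                      // termination: sum over end states
    for (double a : alpha) p += a;
    return p;  // a log/scaled variant avoids underflow on long sequences
}
```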
The player’s motions on the 3D game are recognized as input through the camera,the video-base human motions are estimated from the HMM The 3D game can becontrolled with the estimated results To synthesize the human motion on the game,
we prepare several atomic motion clips For a video motion sequence, the samemotion clip as the motion estimated from the HMM is selected as output The gapbetween several different motion clips can be smoothly interpolated
Related Work
Video based human gesture problems have attracted great interest in the computer vision, machine learning, and vision-based computer graphics areas. The human action recognition problem can be separated into several different approaches. First, in a gait tracking based method, Yacoob et al. [22] tracked legs and obtained parameterized values for matching in the Eigenspace. Second, in a motion feature values based approach, Huang et al. [21] computed principal component analysis with feature values of the human silhouette. Using these reduced dimensional motion feature values, they tackled the gait-analysis problem. Masoud et al. [4] also used motion extracted directly from images. They represent motion features as a feature image. Subsequently, feature images were mapped to a lower-dimensional manifold that is matched with the most likely human action class. Third, some approaches require devices for capturing refined level human actions. Li et al. [23] used a data-glove for capturing American Sign Language and obtained feature values using SVD.
The Hidden Markov Model [20] is a really powerful and useful model for recognizing speech and video human gestures. Yamato et al. [15] first applied the HMM model for recognizing video human tennis gesture categories, and pointed out that when training and testing data come from different subjects, the accuracy drops. Yang et al. [16] showed how vector quantization can be applied with feature values in an HMM and applied it to isolated and continuous gesture recognition as well. Eickeler et al. [19] rejected unknown gestures with an HMM model by learning undefined action classes. They also demonstrated that the HMM can deal with recognizing very subtle gestures, such as "hand waving", "spin", and "head moving", and showed that the HMM can reduce error much better than Neural Networks [18]. There are approaches which try to merge some "knowledge" into the HMM model for performance enhancement. Yoshio et al. [3] combined the HMM with an automation process by using one to one gesture recognition results. It is a very local form of knowledge and delays the decision process. Recently, Neil et al. [1] used domain knowledge for improving the maximum likelihood estimates. Some "rules" defined by a domain expert can smoothen human tennis action commentary.
Vision-based human motion input is the most convenient and real-life applicable way of capturing human actions. Owing to this characteristic, there are many graphics applications which accept vision input and use its low-dimensional values as the control signals [5] [6] [7] [8]. Park et al. [5] synthesized 3D human motion which follows 2D video human actions' trajectories in soccer video clips. Jinxiang et al. generated new 3D character motions based on video human actors' poses [6] and synthesized facial expressions following vision face expressions through parameterized PCA comparison [7]. 2D video sequences give high-level information from sequences of images [8] to animation, and motion capture data embed the refined knowledge for high-quality expression [7]. On the other hand, 3D motion capture data can help the video based human motion recognition problem with its semantic characteristics. Hodgins et al. [10] segmented motion capture data into distinct behaviors in an unsupervised way very accurately, and this is much simpler than video segmentation [2]. Li et al. [23] achieved a very high performance ratio for segmenting and classifying human motion capture data in real-time. Tian et al. [24] classified hand signals using 3D motion capture data by minimizing the back projection error between each frame of video and the 3D motion sequence using the Dynamic Time Warping technique. It has a similar intuition to this paper. Here, we apply 3D motion-capture data examples for recognizing whole body human motion and demonstrate how we can mix the heterogeneous data (video and 3D motion-capture) and embed it into a hidden Markov model. Thus, we can show quantitatively how much the 3D motion capture assisted methodology helps compared with the traditional HMM model approach.
Proposed Approach
In this work, we propose a way of using 3D motion-capture data streams to recognize video human motion data. Here, we consider 3D motion capture data as the knowledge base, since human effort and knowledge are involved while taking 3D human motion capture data. A subject wears reflective markers [13] which correspond to body joint segments (e.g., 'lower back', 'upper back', and 'thorax', which belong to 'Torso'). Each joint's degree of freedom is a column vector value, and the rows are the time frames in the 3D motion capture matrix. Thus, we can select a specific body segment in which we are interested. The degree of freedom values are also close to a human semantic representation. First, to make a correspondence between video and 3D motion capture data, the 3D motion capture data must be down-sampled since its frame rate (120 fps) is 4 times faster than the video motion's frame rate (30 fps); see Figure 1. After down-sampling, we have the 3D motion capture frames (3df_1, 3df_5, 3df_9, ..., 3df_m) corresponding to the video frames (vf_1, vf_2, vf_3, ..., vf_n), where the down-sampling ratio gives m = 4(n - 1) + 1.
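As a small illustration of this correspondence, the motion capture frame index paired with a given video frame follows directly from the ratio above (1-based indices, assuming the 120 fps and 30 fps rates given; the function name is illustrative).

```cpp
// Video runs at 30 fps and motion capture at 120 fps, so every fourth motion
// capture frame lines up with a video frame: vf_n <-> 3df_m, m = 4(n - 1) + 1.
int mocapFrameForVideoFrame(int n) {
    return 4 * (n - 1) + 1;  // e.g., n = 1 -> 1, n = 2 -> 5, n = 3 -> 9
}
```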
Second, we extract the representative features and combine them into a single matrix. The feature vector of a video frame (vf_i) has 10-dimensional values
Fig. 1 Knowledge-based HMM learning for video gesture recognition using a 3D motion-capture data stream