Motion Segmentation Based On Joint Swings
Chia Wen Jie Alvin
Department of Computer Science School of Computing
National University of Singapore
2009
Abstract
Synthesizing new motion is a difficult problem. Synthesis through physical simulation produces the best results, but it suffers from the amount of computation time needed and is therefore not suitable for real-time use such as in a game. An approach that synthesizes from existing motion is better suited for real-time applications. However, creating new motion from existing motion is a challenging task because motion data generally lacks structure and intuitive interpretation.
We have developed a novel motion segmentation model based on the dynamics of the motion, which enables us to modify the intensity and timing of existing motion. For example, we could make a kick much more forceful, or change the duration of the kick. We believe our model could also be used for motion compression and could help in motion analysis in general, because it encodes temporal, spatial and intensity information.
Subject Descriptors:
I.3.6 [Computer Graphics]: Methodology and Techniques
I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism
Keywords:
Character Animation, Algorithms, Data Structures, Computer Graphics
Implemented Software and Hardware:
Microsoft Visual Studio 2008 C++, OpenGL, Microsoft Foundation Class (MFC)
Such animations are normally done in two ways: skilled animators manually hand-animate 3D characters in software packages like Maya and 3DS Max, or motion capture is used, where actors wearing special suits act out the required motion, which is then captured and stored. Manual animation requires skilled animators and is very time consuming, while motion capture is expensive and the resultant motion might not meet the requirements because of noise or simply because the timing of the actor is off.
This is where motion synthesis comes in: it allows the creation of new motions which, when done right, can satisfy the requirements of the application. Furthermore, it allows reuse of existing motion data, which would otherwise be wasted, since such data is normally used in one application and then discarded because it is very hard for another application to use it without editing or modifying it.
Motion synthesis is usually done by manipulating the motion data in its raw form. Figure 1 shows a plot of all the joint angles in a motion against time. It is hard to decipher what the motion is doing by looking only at the plots.
Figure 1: Plot of all the joint angles against time. Note how difficult it is to know what the motion is doing.
Because the raw motion data itself is unstructured, it is not trivial to extract information such as which joints are swinging and how fast they are swinging.
Therefore, one way to derive meaning from motion data would be to build a hierarchical model on top of it. A hierarchical model is a way to derive meaning from multimedia signals such as video, audio and, in our case, motion data. Taking audio, in particular speech, as an example, we can break speech down into phonemes, combine phonemes into syllables, and finally combine syllables into words. To break the audio signal down into these components, we need to perform segmentation on the signal so that we know where each phoneme starts and ends. As far as we know, this has not been done before for motion data. Similar to how it is done for video and audio clips, building a hierarchical model requires that segmentation be done first. To do this, we need a segmentation model, and this leads us to the objective of this thesis.
1.2 Thesis Objective and Contribution
The objective of this thesis is to develop a segmentation model for motion data that models the swings of the motion. For instance, when the arm is swinging forward during a run, the humerus (the upper-arm bone to which the biceps and triceps attach) swings forward in a rather geodesic manner. For example, in the picture below, the right upper arm swings forward and back as the character runs.
Figure 2: The right arm swings forward during a run.
This means the bone is rotating around a fairly constant axis of rotation when swinging forward. This applies to other bones, such as those of the legs, as well. Thus, if we can build this segmentation model, we will have taken a step forward in building a hierarchical model for motion data.
Such swings can be segmented out of the motion data; their analogue in video would be the shots found by shot detection.
Therefore, the main objective of this thesis is to develop a segmentation model that can segment out these swings. If we play back just these swings instead of the original motion data, we should get a good approximation of the original motion. This would demonstrate that the model does indeed work.
With these swings segmented out, we can show that they can be used in several applications, namely:
• A fairly basic form of motion compression by just storing these swings
• A way to index and search for motion in a motion database by using the segmentation result
• Simple motion editing by manipulating the properties of these swings
1.3 Thesis Organization
The rest of the thesis is organized as follows. We first cover some background knowledge on animating a skeleton with motion data, because this is a basic requirement for dealing with motion data. We then move on to Related Work, where we discuss relevant work in the literature. After that, we present our Motion Segmentation Model and how it works. Following that, we show how the segmentation model can be used, and we end with the conclusion.
Chapter 2
Background Knowledge
2.1 Animating a Skeleton using Motion Data
2.1.1 The Skeleton Structure
The skeleton is a structure similar to the human skeleton. It is made up of bones and joints. The bones are typically named after their biological counterparts; for example, the thigh bone is called the femur.
Figure 3: An example skeleton and the bone names. The labels give the medical name for each bone.
The skeleton structure is actually a tree with the root joint as the root of the tree. So, for example, the child of the lfemur is the ltibia. Each child stores the transformation from its parent to itself. The pose of the skeleton in Figure 3 is known as the bind-pose.
2.1.2 Animating the Skeleton
How do we animate the skeletal structure? We do so by specifying the rotation of each individual bone with respect to its parent bone's local coordinate frame. Different bones have varying numbers of degrees of freedom (DOFs). For example, the femur (thigh) can rotate freely about the x, y and z axes, while the radius (forearm) can only rotate about one axis. Typically, depending on the motion capture equipment used, we know the order in which to apply the x, y and z rotations for each bone/joint.
The root is the only bone with a translation component, and its rotation is specified with respect to the world coordinate frame. Therefore, it has 6 DOFs. Any translation of the skeleton in 3D is specified by the translation component of the root.
The structure of the bones, together with the sequence of angles for each bone, forms what we call the motion data.
A collection of the angles for each bone specifies a pose for the skeleton. We call such a collection a frame. A sequence of frames gives us the animation of the skeleton performing whatever motion was captured. Motion is usually captured at 120 frames per second and then downsampled to 60 frames per second; this is what is done with the motion capture data from the CMU Graphics Lab. Therefore, each frame represents 1/60 of a second, and from the number of frames in a motion we can work out its duration.
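To make the data layout concrete, the sketch below shows one possible way to represent a skeleton and its motion data in C++. The type and field names are our own illustration, not a standard file format or the implementation used in this thesis.

```cpp
#include <string>
#include <vector>

// One bone/joint in the skeleton tree (illustrative).
struct Bone {
    std::string name;      // e.g. "lfemur"
    int parent;            // index of the parent bone, -1 for the root
    int numDofs;           // e.g. 3 for the femur, 1 for the radius
};

// One frame: the root translation plus the joint angles for every DOF,
// stored in the rotation order given by the capture format.
struct Frame {
    float rootTranslation[3];
    std::vector<float> angles;
};

struct Motion {
    std::vector<Bone> skeleton;
    std::vector<Frame> frames;   // captured at 120 fps, downsampled to 60 fps

    // Duration in seconds, assuming 60 frames per second.
    double durationSeconds() const { return frames.size() / 60.0; }
};
```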
As mentioned, the pose in Figure 3 is known as the bind-pose, and it is usually specified by setting all the DOFs of each bone to 0.
2.1.3 Animating a 3D Mesh
To drive a 3D mesh of a human using the skeleton, we have to perform a process known as skinning. Given a 3D mesh and a skeleton, skinning assigns the vertices of the mesh to bones so that the vertices move with the bones. Before we can assign vertices to a bone, we must position the skeleton "inside" the mesh so that when a bone rotates, the vertices that follow it do so correctly. The image below shows the skeleton inside the mesh.
Figure 6: Skeleton positioned inside the mesh. Figure 7: Color coding of the vertex assignment.
Once skinning is done, the character is said to be "rigged" and is ready to be animated.
2.2 Representing Rotations
There are a number of ways to represent rotations in 3D. Some of them are listed below:
• Euler Angle [18]
• Rotation Matrix [19]
• Quaternion [20]
• Exponential Map [17]
The Euler Angle representation is the most straightforward: the rotation is represented by a 3x1 vector corresponding to rotations about the x, y and z axes respectively. It was developed by Leonhard Euler to describe the orientation of a rigid body (a body in which the relative positions of all its points are constant) in 3D Euclidean space. However, one well-known problem with the Euler Angle representation is that it is plagued by the gimbal lock problem.
Gimbal lock is the loss of one degree of freedom that occurs when one of the three rotation axes becomes aligned with one of the remaining two, so that rotations about the two aligned axes can no longer be distinguished. Refer to [21] for a more in-depth explanation.
We could use the rotation matrix representation, where each rotation is encoded by a 3x3 matrix. This does not suffer from gimbal lock. However, a 3D rotation has only 3 degrees of freedom, namely the angles to rotate about the principal axes, while the rotation matrix has 9 components. This makes it unsuitable for applications where memory constraints apply.
There is also the quaternion representation, where a rotation is represented compactly by a 4x1 vector. Rotations can be composed in quaternion space, and quaternions do not suffer from gimbal lock either. However, a rotation quaternion must be of unit length; otherwise the 4x1 vector does not represent a valid rotation. This makes quaternions less convenient for applications that require interpolating and making many small changes to rotations, because the quaternion has to be renormalized each time it is changed.
This is where the Exponential Map comes in. The Exponential Map represents a 3D rotation as a vector in R3, where the direction of the vector gives the axis of rotation and its magnitude gives the angle to rotate about it, following the right-hand rule. Such a mapping cannot be constructed without the possibility of gimbal lock. However, the gimbal lock in the Exponential Map is avoidable, and that makes it suitable as a replacement for quaternions.
We can convert from the Exponential Map to a quaternion as shown below. Let v be the Exponential Map vector and θ = |v|. The quaternion for a rotation of θ radians about the axis v/|v| is normally written as

q = [ cos(½θ), sin(½θ) · v/|v| ]

which we rewrite as

q = [ cos(½θ), (sin(½θ)/θ) · v ]

All we have done is reorganize the problematic term so that instead of computing v/|v| (i.e. v/θ), we compute sin(½θ)/θ. This is because sin(½θ)/θ = ½ sinc(½θ), and sinc is a function that is known to be computable and continuous at and around zero. Assured that the function is computable, we still need a formula for computing it, since sinc is not included in standard math libraries. Using the Taylor expansion of the sine function, we get:

sin(½θ)/θ = ½ − θ²/48 + θ⁴/3840 − θ⁶/645120 + …
From this we see that the term is well defined and that evaluating the entire infinite series would give us the exact value. As θ → 0, each successive term is smaller than the last, and the terms are alternately added and subtracted, so if we approximate the true value by the first n terms, the error will be no greater than the magnitude of the (n+1)th term.
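As a minimal C++ sketch (our own illustration with an assumed Quat type, not code from this thesis), the conversion could be written as follows, switching to the Taylor approximation when θ is small:

```cpp
#include <cmath>

struct Quat { float w, x, y, z; };   // illustrative quaternion type: [w, (x, y, z)]

// Convert an Exponential Map vector v (axis * angle) to a unit quaternion.
Quat expMapToQuat(const float v[3]) {
    float theta = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    float s;                                   // holds sin(theta/2) / theta
    if (theta < 1e-4f) {
        s = 0.5f - theta * theta / 48.0f;      // Taylor expansion near zero
    } else {
        s = std::sin(0.5f * theta) / theta;
    }
    return { std::cos(0.5f * theta), s * v[0], s * v[1], s * v[2] };
}
```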
The principal advantage of quaternions over Euler angles is their freedom from gimbal lock. We already know that the Exponential Map must suffer from gimbal lock too, so if it is to be useful, we must know how and where gimbal lock occurs and show how it can be avoided at a cost that is outweighed by the benefits.
The problem with the Exponential Map shows up on the spheres (in R3) of radius 2nπ (for n = 1, 2, 3, …). This makes sense, since a rotation of 2π about any axis is equivalent to no rotation at all: the entire shell of points 2π distant from the origin (and 4π, and so on) collapses to the identity in SO(3). So if we can restrict our parameterization to the inside of the ball of radius 2π, we will avoid gimbal lock. Fortunately, each member of SO(3) (except the rotation of zero radians) has two possible representations within this ball: as a rotation of θ radians about v, and as a rotation of 2π − θ radians about −v.
By moving through time in small steps (keeping each change to the rotation smaller than π), we can easily keep orientations inside the ball: at each time step, when the rotation is queried for its value, we examine |v|, and if it is close to π we replace v by (1 − 2π/|v|)v, which is an equivalent rotation. Such reparameterization could be done with Euler angles as well, but it is simpler when performed on the Exponential Map, since it involves just scaling a 3x1 vector (the magnitude of the vector is the angle of rotation). For Euler angles, a series of trigonometric operations would be required, which is obviously more computationally expensive than doing the same thing with the Exponential Map.
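A minimal C++ sketch of this reparameterization (our own illustration) is shown below:

```cpp
#include <cmath>

// Keep an Exponential Map vector inside the ball of radius pi by switching to the
// equivalent rotation of (2*pi - theta) about the opposite axis when theta grows large.
void reparameterize(float v[3]) {
    const float pi = 3.14159265358979f;
    float theta = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    if (theta > pi) {
        float scale = 1.0f - 2.0f * pi / theta;   // negative: flips the axis, shrinks the angle
        v[0] *= scale;  v[1] *= scale;  v[2] *= scale;
    }
}
```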
One disadvantage of the Exponential Map compared to quaternions is that there is no simple way to compose rotations. We have to convert the Exponential Maps to quaternions, perform the quaternion multiplication, and then transform the result back to an Exponential Map.
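For illustration only, such a round trip might look like the following self-contained sketch; the Quat type and expMapToQuat are repeated from the sketch above, and multiply, quatToExpMap and composeExpMaps are our own illustrative names:

```cpp
#include <cmath>

struct Quat { float w, x, y, z; };   // illustrative quaternion type: [w, (x, y, z)]

// Convert an Exponential Map vector to a unit quaternion (as sketched earlier).
Quat expMapToQuat(const float v[3]) {
    float theta = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    float s = (theta < 1e-4f) ? 0.5f - theta * theta / 48.0f
                              : std::sin(0.5f * theta) / theta;
    return { std::cos(0.5f * theta), s * v[0], s * v[1], s * v[2] };
}

// Hamilton product: the combined rotation applies b first, then a.
Quat multiply(const Quat& a, const Quat& b) {
    return { a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
             a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
             a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
             a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w };
}

// Convert a unit quaternion back to an Exponential Map vector.
void quatToExpMap(const Quat& q, float v[3]) {
    float sinHalf = std::sqrt(q.x*q.x + q.y*q.y + q.z*q.z);
    float theta   = 2.0f * std::atan2(sinHalf, q.w);
    float scale   = (sinHalf < 1e-6f) ? 2.0f : theta / sinHalf;  // theta/sin(theta/2) -> 2 near identity
    v[0] = scale * q.x;  v[1] = scale * q.y;  v[2] = scale * q.z;
}

// Compose two rotations given as Exponential Maps: convert, multiply, convert back.
void composeExpMaps(const float a[3], const float b[3], float result[3]) {
    quatToExpMap(multiply(expMapToQuat(a), expMapToQuat(b)), result);
}
```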
Therefore, in our segmentation model we use the Exponential Map to represent rotations, because we need to perform interpolation, averaging and smoothing on sequences of rotations, and these operations are easier to compute with the Exponential Map than with quaternions.
…of a motion. This method is tedious and error prone. As far as we know, no segmentation has been done to find where the swings of a motion are. Such low-level features have not been extracted before, and we believe that extracting such swings may be more useful than determining the start and end of a motion.
For video, there is shot detection, which detects when a shot starts and ends. A shot is defined as the interval from when the record button is pressed on the camera to when the stop button is pressed. There is a large body of literature on this, and when comparing motion data with video, this is the closest analogy.
In audio, and in particular speech processing, speech is modeled by phonemes and syllables before these are combined into words, phrases and so on. We believe that by extracting swings from motion data, we can do something similar to speech processing, by combining swings into actions such as kicks and punches.
3.2 Motion Synthesis
There are actually a number of ways to synthesize motion. One way uses physical simulation to simulate the physics of the required motion so as to produce new motion; note that physical simulation need not use any existing motion to generate new motions. The other way is to use existing motion; we term these Exemplar-based techniques.
We could store all the motions in a database and generate or synthesize new motions by finding suitable examples in the database. This is synthesis from multiple examples.
In both cases, synthesis is done through direct manipulation of the motion data itself.
Physical Simulation, as the name suggests, tries to generate motion by simulating the physics of what would happen given the required conditions. For example, we can specify that a character needs to kick a certain object in space, and given a physically correct model, we can run a simulation to produce the desired motion.
However, Physical Simulation often involves some form of optimization; it is therefore very slow and not well suited to real-time use such as in games or virtual environments. We will focus on Exemplar-based synthesis techniques instead, as mentioned in the first chapter of this thesis.
Exemplar-based techniques fall into several categories. Very broadly, these are:
• Concatenation approach
• Time Warping approach
• Signal Processing approach
Exemplar-based techniques can synthesize new motions from many examples or from a single example. The concatenation approach usually uses many examples to synthesize new motions, while the Time Warping and Signal Processing approaches mainly deal with a single motion. One could argue that the latter two approaches are more like Motion Editing than Motion Synthesis. However, we still label them as Motion Synthesis because the generated motion is "new": it was not produced by motion capture.
Note that the techniques above synthesize new motion by manipulating the motion data itself. However, we will also see one example of editing the skeleton structure and generating new motion data for the new structure.
Our approach falls only into the category of editing motion data; we do not modify the skeleton structure at all.
We must mention that for all Exemplar-based motion synthesis, some post-processing is almost always performed after the motion is generated. One common problem is foot skate, where the foot "slides" along the ground. Nevertheless, motion generated by Exemplar-based methods can serve as a starting point for animators working on computer games and computer generated movies; it is certainly faster than having the animator create an animation from scratch.
3.2.1 Concatenation Approach
In this approach, the main idea for synthesizing new motion is to take example motions and concatenate them together. The hard part is finding the right places to join different motion sequences so that the resultant motion looks correct.
One method of doing this is to use a Motion Graph, and a number of papers have been published on the topic: "Motion Graphs" (Kovar, Gleicher and Pighin, 2002) and "Interactive Motion Generation from Examples" (Arikan and Forsyth, 2002) both describe motion graphs.
The general idea of a motion graph is that the edges are motion clips while the nodes are transition points. Consider Figure 8 below, which shows a very simple motion graph made of two different motion clips.
Figure 8: An example of a motion graph.
The two horizontal lines on the left are two different motion clips. The motion starts on the left and plays to the right. On the right, the green dot and line represent a transition point. If we start at the top motion and play the animation, when we reach the frame at the green dot we can either continue playing the original motion or move to the bottom motion and continue the animation from there. Therefore, any walk through the motion graph yields a new sequence of motion.
The challenge, then, is to locate the transition points. To do this, there needs to be a way to compare two different poses and determine their similarity. Once we have this, we can compute a similarity image between two different motions, generated by comparing each frame in one motion with every frame in the other. The images below show examples of such similarity images. High similarity shows up as white in the image, while low similarity is darker.
Figure 9: Similarity image between two walk motions. Note the repeating patterns due to the cyclic nature of walking.
Figure 10: Similarity image of a motion against itself. Note the white diagonal, which comes from comparing the pose in a frame against itself.
Once the similarity images are generated, the transition points can be determined by finding pairs of frames where the similarity is high. However, because we can never have perfect matches of poses between two different motions, some form of blending must be performed during the transition from one motion to the next.
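As an illustration of how such a similarity image could be computed, the sketch below uses a simple sum-of-squared-differences over joint angles as the pose distance. This is our own simplification; the published motion graph papers use more elaborate metrics (for example, weighted point-cloud distances).

```cpp
#include <vector>

// A pose is the vector of joint angles for one frame (illustrative).
using Pose = std::vector<float>;

// Simple pose distance: sum of squared joint-angle differences.
float poseDistance(const Pose& a, const Pose& b) {
    float d = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Build the similarity image: one entry per pair of frames (i from motion A,
// j from motion B). Larger values mean more similar poses.
std::vector<std::vector<float>> similarityImage(const std::vector<Pose>& A,
                                                const std::vector<Pose>& B) {
    std::vector<std::vector<float>> sim(A.size(), std::vector<float>(B.size()));
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = 0; j < B.size(); ++j)
            sim[i][j] = 1.0f / (1.0f + poseDistance(A[i], B[j]));
    return sim;
}
```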
3.2.2 Time Warping Approach
Time warping is a technique that allows the user to adjust the timing of animated characters without affecting their poses. For example, we can adjust a punching motion so that the punch takes longer to execute.
The importance of timing in animation is highlighted in John Lasseter's "Principles of Traditional Animation Applied to 3D Computer Animation", where he notes how even the slightest timing difference can greatly affect the perception of an animation. However, time warping also requires great skill and patience in order to achieve good results.
Linear time warping is usually used because it is easier to perform. However, recent work has used non-linear time warping, which can produce better results.
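As a simple illustration (our own sketch, not taken from any of the cited papers), linear time warping can be implemented by uniformly resampling the frames and blending between neighbouring frames:

```cpp
#include <vector>

using Pose = std::vector<float>;   // joint angles of one frame (illustrative)

// Linearly time-warp a motion to a new length by uniform resampling,
// blending the joint angles of the two neighbouring frames.
// (A real implementation would interpolate the rotations properly.)
std::vector<Pose> linearTimeWarp(const std::vector<Pose>& in, size_t newLength) {
    std::vector<Pose> out(newLength);
    if (in.empty()) return out;
    for (size_t i = 0; i < newLength; ++i) {
        float t = (newLength > 1) ? float(i) * (in.size() - 1) / (newLength - 1) : 0.0f;
        size_t lo = size_t(t);
        size_t hi = (lo + 1 < in.size()) ? lo + 1 : lo;
        float  a  = t - lo;                        // blend factor between the two frames
        Pose p(in[lo].size());
        for (size_t k = 0; k < p.size(); ++k)
            p[k] = (1.0f - a) * in[lo][k] + a * in[hi][k];
        out[i] = p;
    }
    return out;
}
```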
In the paper "Guided Time Warping for Motion Editing", the authors change the timing of a motion by performing non-linear time warping.
Figure 11: Difference between linear and non-linear time warping. Image taken from the paper.
The method generates a retimed output motion from two motions: an input motion, which is to be retimed, and a reference motion, which controls how the input motion is retimed. The output is similar to the input while matching the "speed" of the reference.
3.2.3 Signal Processing Approach
In this approach, the motion data is treated as a set of motion signals: the sequence of angles for each DOF of each joint becomes a signal. The signal can then be converted to the frequency domain, and the motion can be edited by filtering out unwanted frequencies and converting the signal back to the time domain.
This is what the authors of the paper "Motion Signal Processing" did. They found that the main movements in a motion, such as the swinging of the thighs and arms during walking, were mainly composed of the lower frequency components. The high frequency components were either noise or details such as the waving of hands.
In the SIGGRAPH paper "Cartoon Animation Filter", the authors present a filter that can easily add anticipation, follow-through and squash-and-stretch to a motion.
Figure 12: The plot of the cartoon animation filter. Image taken from the paper.
The filter is actually a very simple one: it is just an inverted Laplacian of Gaussian (LoG). The new motion is obtained by adding a filtered version of the motion signal to the signal itself. The result is quite elegant in that one filter is all that is needed, and there is only one parameter for the user to control; the others can be determined automatically. Therefore, this method provides a quick and easy way to come up with new motions from existing ones.
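A rough sketch of the idea (our own illustration, not the paper's code) is shown below: each DOF signal x(t) is convolved with a Laplacian-of-Gaussian kernel and the scaled response is subtracted, i.e. x*(t) = x(t) − a·(x ⊗ LoG)(t), where a stands in for the single user parameter and sigma is assumed to control the kernel width.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Apply the cartoon-animation-filter idea to one DOF signal:
// out(t) = x(t) - a * (x convolved with a Laplacian-of-Gaussian kernel).
std::vector<float> cartoonFilter(const std::vector<float>& x, float a, float sigma) {
    // Build an (unnormalized) Laplacian-of-Gaussian kernel.
    int radius = int(std::ceil(3.0f * sigma));
    std::vector<float> kernel(2 * radius + 1);
    for (int k = -radius; k <= radius; ++k) {
        float s2 = sigma * sigma;
        kernel[k + radius] = (k * k / s2 - 1.0f) / s2 * std::exp(-0.5f * k * k / s2);
    }
    // Convolve and subtract the scaled response from the original signal.
    std::vector<float> out(x.size());
    for (int t = 0; t < int(x.size()); ++t) {
        float response = 0.0f;
        for (int k = -radius; k <= radius; ++k) {
            int idx = std::min(std::max(t + k, 0), int(x.size()) - 1);  // clamp at the ends
            response += kernel[k + radius] * x[idx];
        }
        out[t] = x[t] - a * response;
    }
    return out;
}
```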
3.2.4 Skeleton Structure Modifying Approach
Modifying the skeleton structure to synthesize new motion seems rather counter-intuitive at first. Why modify the skeleton? The skeleton is driven by the motion data, so we should focus on manipulating the motion data instead.
On closer examination, however, modifying the skeleton structure to generate physically impossible motion turns out to be a rather novel idea. The most prominent example is the paper "Rubber-like Exaggeration for Character Animation", which breaks each bone in the skeleton into smaller ones so as to simulate the "rubbery" effect seen in cartoons.
Figure 13: Example of rubber-like motion. Note the stretching of the limbs of the character. Image taken from the paper.
Each bone is broken down into several smaller bones covering the whole length of the original one, as shown in Figure 14. For squash-and-stretch effects, the bones can be lengthened during the stretching portion of the animation. Such motion is impossible for a human being to perform, because human bones cannot lengthen or shorten at will.
Figure 14: Breaking down of a bone into several smaller ones. Orange represents the original joint, while the green ones are the smaller joints used to represent the original one. Image taken from the paper.
Chapter 4
Motion Segmentation
4.1 The Motion Segmentation Model
In this section, we discuss the details of our segmentation model as well as some applications where it could be used.
Before that, let us recall that each bone/joint in a skeleton is driven by a series of rotations applied to it. Each rotation can be described by a rotation matrix, which has only 3 degrees of freedom: the angles of rotation about the x, y and z axes respectively. Therefore, for each bone/joint, the skeleton has a sequence of rotations describing its pose at each frame.
The main idea behind our segmentation model is that for highly dynamic motions, such as running, kicking or other sports motions, the quick and forceful swinging of the limbs can be parameterized and/or approximated by a rotation axis together with a start and end angle. The "rest" period between two consecutive swings is usually quite stationary: there may be some movement, but for the most part, during a highly dynamic movement the rest period is fairly still.
The reason we can do this is that during such forceful swings, the bone/joint follows a nearly geodesic path, rotating about a nearly fixed axis. We can visualize this by imagining the center of rotation of the bone as the center of a sphere whose radius equals the length of the bone. The tip of the bone then traces a path on the surface of the sphere as it rotates.
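To make the parameterization concrete, a swing could be stored roughly as in the sketch below (a hedged illustration with field names of our own choosing, not the exact data structure developed later in the thesis). Given a swing, the joint rotation at any frame within it can be rebuilt as an Exponential Map vector about the fixed axis; here we simply interpolate the angle linearly.

```cpp
#include <algorithm>

// One segmented swing of a single bone/joint (illustrative fields).
struct Swing {
    int   startFrame, endFrame;   // temporal extent of the swing
    float axis[3];                // (unit) rotation axis, assumed fixed over the swing
    float startAngle, endAngle;   // angle about the axis at the start and end, in radians
};

// Reconstruct the joint rotation at frame f as an Exponential Map vector.
void rotationAtFrame(const Swing& s, int f, float expMap[3]) {
    int   len = s.endFrame - s.startFrame;
    float t   = (len > 0) ? float(f - s.startFrame) / float(len) : 0.0f;
    t = std::min(std::max(t, 0.0f), 1.0f);                  // clamp to the swing
    float angle = (1.0f - t) * s.startAngle + t * s.endAngle;
    expMap[0] = angle * s.axis[0];
    expMap[1] = angle * s.axis[1];
    expMap[2] = angle * s.axis[2];
}
```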