Methodology for MR Image Analysis: Application to Cardiac and Renal Images
Dwarikanath Mahapatra
Department of Electrical and Computer Engineering
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
National University of Singapore, 2011
Abstract

Magnetic resonance imaging (MRI) has emerged as a reliable tool for functional analysis of internal organs like the kidney and the heart. Due to the considerable length of time taken to acquire MR images, they are affected by patient motion. Besides, MR images are characterized by low spatial resolution, noise and rapidly changing intensity. Rapid intensity change is the primary challenge that MR image registration methods need to address. In this thesis, first we investigate a saliency based method for rigid registration of renal perfusion images. A neurobiology based visual saliency model is used for this purpose. Saliency acts as a contrast invariant metric, and a mutual information framework is used.

The second part of our work deals with elastic registration of cardiac perfusion images. The saliency model is modified to reflect the local similarity property at every pixel. Markov random fields (MRFs) were used to integrate saliency and gradient information for elastic registration. Apart from being a contrast invariant metric, saliency also influences the smoothness of the registration field and speeds up registration by identifying pixels relevant for registration.

In the final part of our work we investigate a joint registration and segmentation (JRS) method for the perfusion images. JRS is particularly important for MR images in order to fully exploit the available temporal information from the image sequence. MRFs were used to combine the mutual dependency of registration and segmentation information by formulating the data penalty and smoothness cost as a function of registration and segmentation labels. The displacement vector and segmentation class of every pixel was obtained from multi-resolution graph cut optimization. This eliminates the need for a separate segmentation step and also increases computation speed. Experimental results on real patient datasets show a significant improvement in registration and segmentation accuracy over methods that solve each problem separately. Future work will involve making our proposed JRS method robust to different datasets, and investigating the possibility of using learning techniques to solve the registration and segmentation problem.
Acknowledgments

Over the course of my PhD there have been many people who have taught me the importance of perseverance and focus in research work. I want to thank my PhD supervisor, Dr Ying Sun, for her commitment. She had the patience to deal with my many mistakes and guided me very well during my research work. I also want to thank Dr Stefan Winkler, who was my PhD supervisor in my first year. His involvement was important for me in cultivating interest in various topics of computer vision. I have learnt a lot from my interactions with both of my supervisors, especially in the way to conduct research. I also express my deepest gratitude to the members of my thesis committee, Dr Ashraf Kassim and Dr Sim-Heng Ong, for their input at different stages of my PhD.

Completing a PhD is not possible without the support of family and friends. I want to thank my parents and brother for their unconditional moral support at all times during my graduate student life. All my friends were extremely generous with their encouragement and motivation, especially Dr Sujoy Roy. I enjoyed my numerous discussions with Chao Li, Ruchir Srivastava, Mukesh Kumar Saini and Ajay Kumar Mishra on miscellaneous topics, including my research. As officer-in-charge of the Embedded Video Lab, Mr Jack Ng made my stay there an enjoyable one. I also want to thank Badarinath Karri for being a good friend and example. Finally, I thank the many anonymous reviewers whose comments helped to improve my research work.
Journals

• D Mahapatra and Y Sun, "Integrating Segmentation Information for Improved Elastic Registration of Perfusion Images using an MRF Framework." Accepted in IEEE Trans. Image Processing with minor revisions.

• D Mahapatra and Y Sun, "MRF Based Intensity Invariant Elastic Registration of Cardiac Perfusion Images Using Saliency Information." IEEE Trans. Biomedical Engineering, 58(4), pp. 991-1000, 2011.

• D Mahapatra and Y Sun, "Rigid Registration of Renal Perfusion Images using a Neurobiology based Visual Saliency Model." EURASIP Journal on Image and Video Processing, 2010.
Conferences
• D Mahapatra and Y Sun, "Joint Registration and Segmentation of Dynamic Cardiac Perfusion Images using MRFs." In Proc. Medical Image Computing and Computer Assisted Intervention (MICCAI), Beijing, September 2010, pp. 493-501.

• D Mahapatra and Y Sun, "An MRF Framework for Joint Registration and Segmentation of Natural and Perfusion Images." In Proc. IEEE International Conference on Image Processing (ICIP), Hong Kong, September 2010, pp. 1709-1712.

• D Mahapatra and Y Sun, "Non-rigid Registration of Dynamic Renal MR Images using a Saliency based MRF Model." In Proc. MICCAI, New York City, September 2008, pp. 771-779.

• D Mahapatra and Y Sun, "Registration of Dynamic Renal MR Images using Neurobiological Model of Saliency." In Proc. IEEE International Symposium …

• … "… in office environments using neurobiology-saliency based particle filter." In Proc. IEEE International Conference on Multimedia and Expo (ICME), Hannover, June 2008, pp. 953-956.

• D Mahapatra and Y Sun, "Using Saliency Features for Graphcut Segmentation of Perfusion Kidney Images." In Proc. International Conference on Biomedical Engineering (ICBME), Singapore, December 2008, pp. 639-642.

• D Mahapatra, S Roy and Y Sun, "Retrieval of MR Kidney Images by Incorporating Spatial Information in Histogram of Low Level Features." In Proc. ICBME, Singapore, December 2008, pp. 661-664.

• D Mahapatra, S Winkler and S.C. Yen, "Motion saliency outweighs other low-level features while watching videos." In SPIE Human Vision and Electronic Imaging (HVEI) 2008, San Jose, CA.
List of Figures xi

1 Introduction 1
1.1 Motivation 1
1.1.1 Our Contribution 2
1.1.2 Thesis Overview 5
2 Background 7
2.1 Anatomy of Kidney and Heart 7
2.1.1 Heart Anatomy 7
2.1.2 Kidney Anatomy 9
2.1.3 Basics of Perfusion MR Imaging 9
2.2 Saliency 11
2.2.1 Itti-Koch Saliency Model 12
2.2.1.1 Extraction of Early Visual Features 14
2.2.1.2 The Saliency Map 14
2.2.1.3 Strengths and Limitations 16
2.2.2 Scale-Space Maps 17
2.3 Mutual Information 18
2.4 Markov Random Fields 19
2.4.1 Visual Labeling 20
2.4.2 Markov Random Fields 21
2.4.3 Gibbs Random Fields 21
2.4.4 Markov-Gibbs Equivalence 22
2.4.5 Bayes Labeling of MRFs 23
2.4.6 Regularization 24
2.4.7 Energy Function Optimization 25
2.4.7.1 Graph Cuts 26
2.4.7.2 α-β Swap 27
2.4.7.3 α-Expansion 28
2.5 Description of Datasets 29
2.5.1 Kidney Data 29
2.5.2 Cardiac Data 31
3 Rigid Registration 33
3.1 Introduction 33
3.2 Theory 36
3.2.1 Saliency Model 36
3.2.1.1 Saliency Map in 3D 37
3.2.2 Rigid Registration 37
3.2.2.1 Quantitative-qualitative Mutual Information 38
3.2.3 Saliency based Registration 39
3.2.4 Optimization 40
3.2.4.1 Derivative Based Optimizer 41
3.3 Experiments 45
3.3.1 Registration Procedure 45
3.4 Results 46
3.4.1 Saliency Maps for Pre- and Post-contrast Enhanced Images 47
3.4.2 Registration Functions 49
3.4.3 Robustness of Registration 53
3.4.4 Registration Accuracy for Real Patient Data 56
3.4.5 Computation Time 58
3.5 Discussion and Conclusion 58
4 Non-Rigid Registration 63
4.1 Introduction 63
4.1.1 Elastic Registration of Dynamic Contrast Enhanced Images 65
4.1.2 Saliency Based Registration 65
4.2 Modified Saliency Model 66
4.2.1 Saliency Maps for Cardiac MRI 68
4.2.2 Saliency Map in 3D 68
4.2.3 Limitations of Saliency 69
4.3 Method 69
4.3.1 Saliency Based Non-Rigid Registration 69
4.3.2 Markov Random Fields 70
4.3.2.1 Data Penalty Term 71
4.3.2.2 Pairwise Interaction Term 72
4.3.3 Optimization Using Modified Narrow Band Graph Cuts 74
4.3.4 Extension to 3D 75
4.3.5 Calculation of Registration Error 76
4.4 Experiments and Results 79
4.5 Conclusion 79
5 Joint Registration and Segmentation 81
5.1 Introduction 81
5.1.1 Our Contribution 85
5.2 Theory 85
5.2.1 Joint Registration and Segmentation 85
5.2.1.1 Overview of Method 88
5.2.2 Markov Random Fields 88
5.2.2.1 Data Penalty Term 89
5.2.2.2 Pairwise Interaction Term 91
5.2.2.3 Optimization using Graph Cuts 92
5.2.3 Extension to 3D 92
5.3 Experiments and Results 95
5.3.1 Synthetic Images 96
5.3.1.1 Interdependence of Registration and Segmentation 98
5.3.1.2 Accuracy of Segmentation 99
5.3.2 Cine Cardiac MRI 101
5.3.3 Natural Images 104
5.3.4 Computation Time 104
5.4 Discussion 105
5.5 Conclusion 108
6 Experimental Validations 111
6.1 Saliency Based Registration 111
6.1.1 Cardiac Perfusion MRI 111
6.1.1.1 Effect of Saliency Based Narrow Band Graph Cuts 115
6.1.2 3D Registration Results on Liver Datasets 116
6.2 Joint Registration and Segmentation 118
6.2.1 Cardiac Perfusion MRI 118
6.2.2 Kidney Perfusion Images 122
6.2.3 3D Registration Results on Liver Data 125
7 Conclusion and Future Work 139
7.1 Conclusion 139
7.2 Future Work 141
List of Figures

1.1 Diagram showing different stages for complete image analysis of perfusion datasets 3
2.1 Heart anatomy. (a) different parts of the heart and (b) the blood flow. In (b), blue components indicate de-oxygenated blood pathways and red components indicate oxygenated pathways. The images were taken from http://en.wikipedia.org/wiki/Heart. 8
2.2 Kidney anatomy. 1. Renal pyramid, 2. Interlobar artery, 3. Renal artery, 4. Renal vein, 5. Renal hilum, 6. Renal pelvis, 7. Ureter, 8. Minor calyx, 9. Renal capsule, 10. Inferior renal capsule, 11. Superior renal capsule, 12. Interlobar vein, 13. Nephron, 14. Minor calyx, 15. Major calyx, 16. Renal papilla, 17. Renal column. The image was taken from http://en.wikipedia.org/wiki/Kidney. 9
2.3 Figures showing graph structure for different approaches. (a) Optimal expansion move: example of graph Gα for a 1D image. The set of pixels in the image is P = {p, q, r, s} and the current partition is P = {P1, P2, Pα}, where P1 = {p}, P2 = {q, r}, and Pα = {s}. Two auxiliary nodes a = a{p,q}, b = a{r,s} are introduced between neighboring pixels separated in the current partition. Auxiliary nodes are added at the boundary of sets Pl. (b) An example of graph Gαβ for a 1D image. The set of pixels in the image is Pαβ = Pα ∪ Pβ, where Pα = {p, r, s} and Pβ = {q, · · · , w}. The images are taken from Boykov et al. (1). 30
3.1 Saliency maps of contrast enhanced image sequence. (a)-(d) show images from different stages of contrast enhancement with added noise. The variances of noise added were 0.02, 0.05, 0.08 and 0.1. (a) is the reference image to which all images are registered. (e)-(h) show the respective saliency maps; (i) colorbar for the saliency maps. The saliency maps are seen to be similar. Color images are for illustration purposes. In actual experiments gray scale images were used. 48
3.2 Saliency profiles of patches from different regions. The sizes of patches used are (a) 3 × 3; (b) 5 × 5; and (c) 7 × 7. Patches from the background, cortex and medulla are considered. 49
3.3 Plots showing variation of different similarity measures when registering pre- or post-contrast images. The first column is for NMI, the second column for QMI1 and the third column for QMI2. The first row shows the variation for rotation about the x-axis while the second row shows the variation for translation along the x-axis. The variance of added noise was 0.08. The x-axis of the plots shows the relative error between the actual and candidate transformation while the y-axis shows the value of the similarity measure. 52
3.4 Plots showing variation of different similarity measures when registering pre- and post-contrast images: (a) NMI; (b) QMI1; and (c) QMI2. The plots show results for Ty (translation along y-axis). The x-axis of the plots shows the relative error between the actual and candidate transformation while the y-axis shows the value of the similarity measure. 53
3.5 Synthetic image patch showing shortcomings of NMI. (a)-(b) pre-contrast intensity values and corresponding image patch; and (c)-(d) intensity values after contrast enhancement and corresponding patch. 53
3.6 Robustness performance for (a) rotation and (b) translation. The image pairs belong to the same stage of contrast enhancement. The x-axis shows the range of transformation parameters while the y-axis shows the number of correct matches. 56
3.7 Robustness performance when registering contrast enhanced images. Results for (a) rotation and (b) translation. Images belong to different contrast enhanced stages. The x-axis shows the range of transformation parameters while the y-axis shows the number of correct matches. 57
3.8 Difference images highlighting performance of our registration algorithm. Columns 1-3 show the target image, source image and difference image before registration. Columns 4-6 show difference images after registration using NMI, QMI1, and QMI2, respectively. Rows 1 and 2 show pairs of images belonging to different stages of contrast enhancement. Rows 3 and 4 show images where the source-target image pair was from either the pre- or post-contrast stage. 62
4.1 Saliency maps of contrast enhanced image sequence. Cardiac images from different stages of contrast enhancement are shown; (a) shows the target frame, (b)-(d) show images from different stages of contrast enhancement. (e)-(h) show the respective saliency maps from our modified saliency model. The saliency maps are seen to be similar; (i)-(l) show saliency maps obtained using the original model in (2); these saliency maps are sparse and exhibit a lot of variability. (m) colorbar for the saliency maps. Color images are for illustration purposes. 80
5.1 Registration results for synthetic images. (a) reference image; (b) floating image; registered images using (c) demons; (d) MRFs; and (e) JRS. The added noise is equivalent to σ = 0.08. The objects in the image were individually registered by defining a mask around each of them. 99
5.2 Segmentation results for a synthetic image using (a) JRS; (b) AC and (c) GC; (d) inaccurate initial masks shown in different colors and (e) superimposed outline of segmented objects using JRS from the masks in (d). 101
5.3 (a) Change in registration error (pixels) with increasing noise levels for the 3 registration methods; (b) change in DM values for JRS and graph cuts. The x-axis shows the variance of added noise and the y-axis shows (a) average registration error in pixels and (b) DM values. 102
5.4 Segmentation results for cine cardiac images. The outline of the segmented LV is shown in green using (a) AC; (b) GC; and (c) JRS. 104
5.5 Registration results for cine cardiac images. The boundary of the LV is deformed (in the reference image) using the obtained motion field (in red) and overlaid on the floating image. The first row shows results when using JRS, the second row shows results using MRFs and the third row shows results for the demons algorithm. The blue line shows the outline of the LV in the reference frame. This gives an idea of the degree of deformation recovered using JRS. 105
5.6 Registration and segmentation performance for natural images. (a) reference image; (b) floating image with mask outline; (c) difference image before registration; (d) difference image after registration using JRS; (e) segmented mask from floating image using only graph cuts; (f) segmented mask using JRS. 106
6.1 Results for registration of cardiac images. The boundary of the LV deformed using the obtained motion field (in red) is overlaid on the floating image. The blue contour is the outline of the LV in the reference image, shown in (a). The first row shows results for GSI, the second row shows results for GI, the third row shows results for Int, and the fourth row shows results for Sal. Columns (1)-(4) indicate floating images corresponding to different stages of contrast enhancement. Each column corresponds to the same floating image. (a)-(f) show difference images corresponding to the floating image in (3); (a) reference image; (b) difference image before registration; difference after registration using (c) GSI; (d) GI; (e) Int; and (f) Sal. For the superimposed contours and difference images, areas of misregistration using GI, Int and Sal are highlighted using yellow arrows. 127
6.2 Different performance measures for cardiac image registration. (a) average registration error in mm for all 12 cardiac datasets; (b) mutual information values before and after registration for all frames of a typical cardiac image sequence; (c) Woods criteria values before and after registration for a typical cardiac image sequence. Values reported in (a), (b) and (c) are using 4 different similarity measures: Int-only intensity information; Sal-only saliency information; GI-only gradient information; and GSI-combination of gradient and saliency information. MI-mutual information (no units); WC-Woods criteria value in (3) (no units); and Error-displacement error in mm as described in Section 4.3.5. 128
6.3 Results for registration of 3D liver perfusion volumes. Each row shows results for different slices. The first column is the reference image. The second and third columns show the superimposed deformed contours of the liver on the floating volume slice for GI and GSI. The fourth column shows the difference image before registration, followed by the difference image after registration using GI in the fifth column and the difference image after registration using GSI in the sixth column. Results are for slices 12, 16, 22. 129
6.4 Quantitative measures for liver registration. (a) Average registration error in mm; (b) average mutual information values; and (c) average Woods criteria values (from (3)). The values are averaged over all datasets and 4 similarity measures have been used: Int-only intensity information; Sal-only saliency information; GI-only gradient information; and GSI-combination of gradient and saliency information. 130
6.5 Registration results for cardiac perfusion images. The boundary of the LV is deformed (in the reference image) using the obtained motion field (in red) and overlaid on the floating image. The blue contour shows the outline of the LV in the reference image. The first row shows results when using JRS, the second row has results using MRFs and the third row shows results for the demons algorithm. 131
6.6 Difference images before and after cardiac image registration. (a) reference image; (b) floating image; (c) difference image before registration; difference image after registration using (d) demons; (e) MRFs; and (f) JRS. Areas of misregistration in (d) and (e) are highlighted by yellow arrows. 132
6.7 Segmentation results for cardiac perfusion images. The outline of the segmented LV is shown in green using (a) AC; (b) GC; and (c) JRS. 132
6.8 Registered images showing folding. (a) reference image; (b) floating image; (c) registered image from MRFs; and (d) registered image from JRS. The registered image from MRFs shows folding. The registered image from JRS shows no folding because segmentation information also influences the smoothness cost. 133
6.9 Segmentation results from different stages of contrast enhancement. The first row shows results using GC and the second row shows results using JRS. GC shows a tendency for over-segmentation and under-segmentation. 133
6.10 Registration results for perfusion kidney images. The first row shows the reference image. The second row shows the floating image and the third row shows the difference image before registration. The fourth and fifth rows, respectively, show the difference images after registration using MRFs and JRS. 134
6.11 Registration results from different slices of a kidney volume. The first row shows the reference images, the second row shows the floating image and the third row shows the difference image before registration. The fourth and fifth rows, respectively, show the difference images after registration using MRFs and JRS. 135
6.12 3D view for kidney segmentation. (a) original dataset; (b) superimposed contour of segmented regions. 136
6.13 Segmentation results from different slices of a kidney volume. The first row shows results using GC and the second row shows results using JRS. GC shows a tendency for over-segmentation and under-segmentation. 136
6.14 Results for registration of 3D liver perfusion images. Each row shows results for different slices. The first column is the reference image followed by the floating image in the second column. The third column shows the difference image before registration followed by the difference image after registration using JRS in the fourth column. 137
List of Tables

3.1 Average translation error and registration accuracy for different noise levels. The figures are for simulated motion studies on all volumes of the sequence. Translation errors are for values along the x, y, z axes. 55
3.2 Average translation errors for rigid registration. NMI is normalized mutual information. QMI1 is the measure in (4) using scale-space maps. QMI2 is our approach using the neurobiology based saliency model. All values are in units of mm. 59
3.3 Average rotation errors for rigid registration. NMI is normalized mutual information. QMI1 is the measure in (4) using scale-space maps. QMI2 is our approach using the neurobiology based saliency model. All values are in units of degrees. 59
5.1 Means and standard deviations of registration errors for synthetic image datasets at different noise levels. Values are in pixels. 99
5.2 Segmentation performance for synthetic images. Average DM values for JRS, GC and AC at different noise levels are shown. Values shown are in %. 101
5.3 Quantitative performance measures for cine cardiac image registration: NMI-normalized mutual information (no units); WC-Woods criteria value in (3) (no units); and Err-displacement error in mm. The values indicate the average measures over all datasets. 103
6.1 Quantitative performance measures for cardiac image registration using different similarity metrics: Int-only intensity information; Sal-only saliency information; GI-only gradient information; and GSI-combination of gradient and saliency information. NMI-normalized mutual information (no units); WC-Woods criteria value in (3) (no units); and Err-displacement error in mm as described in Section 4.3.5. The values indicate the average measures over all datasets. 115
6.2 Quantitative performance measures for registration of 3D kidney data: Int-only intensity information; Sal-only saliency information; GI-only gradient information; and GSI-combination of gradient and saliency information. NMI-normalized mutual information (no units); WC-Woods criteria value in (3) (no units); and Err-displacement error in mm. The values indicate the average measures over all datasets. 118
6.3 Registration performance of cardiac perfusion datasets. Means and standard deviations of registration errors for real patient cardiac datasets. Values are in mm. 120
6.4 Segmentation performance of cardiac perfusion datasets. The segmentation performance is evaluated in terms of DM values (in %) and RMS errors (in mm). 121
6.5 Reliability function values for cardiac segmentation results for different d. The higher the value of ℜ, the better the performance. Calculated values are over all datasets. 121
6.6 Registration performance of kidney perfusion datasets. Means and standard deviations of registration errors for real patient kidney datasets. Values are in mm. 124
6.7 Segmentation performance of kidney perfusion datasets. The segmentation performance for GC and JRS is given in terms of DM values (in %) and RMS errors (in mm). 125
6.8 Reliability function values for kidney segmentation results for different d. The higher the value of ℜ, the better the performance. Calculated values are over all datasets. 125
1 Introduction

1.1 Motivation

Statistics from the American Heart Association indicate that cardiovascular disease (CVD) is a major cause of death. The heart, being an internal and very delicate organ, requires utmost care in its monitoring and treatment. The method of choice is to use non-invasive diagnostic tools. Dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) has emerged as an important non-invasive technique for the reliable functional analysis of internal organs. DCE-MRI (or perfusion imaging) helps in the detection of abnormalities, leading to early diagnosis and treatment. In perfusion imaging a contrast agent (e.g., gadopentetate dimeglumine (Gd-DTPA)) is intravenously injected into a patient and a series of MR images is acquired over time. The flow of contrast agent changes the intensity of pixels in regions through which it flows (e.g., arteries, veins or muscles). By monitoring the flow of contrast agent (and hence the blood flow), a reliable analysis of cardiac function can be obtained.
Apart from the heart, renal (kidney) function analysis is also an important diagnostic measure because of the large number of people affected by renal diseases. Of the 50 million Americans estimated to have hypertension, approximately 1-5% have renovascular disease (RVD) as the underlying cause (5). Diagnosis of RVD is essential since hypertension and renal artery stenosis (RAS) are often found to coexist. DCE-MRI plays an important role in the diagnosis and treatment of RVD. It also helps in early detection of kidney rejection in the case of transplantation, which has emerged as the treatment of choice for patients with end-stage renal diseases. Monitoring the flow rate of contrast agent plays an important role in fulfilling these objectives.
Certain challenges have to be overcome before an accurate functional analysis of DCE-MRI can be carried out. Since MR image acquisition may take several minutes, the images are affected by patient breathing. It leads to displacement of the organs in the head-to-feet direction and also results in through-plane motion. Patient movement is also observed, resulting in further translation and rotation effects. For cardiac images, elastic deformations of different tissues are also observed. It is necessary to correct for any observed motion, which is achieved by registration. Thereafter, the organ of interest (OOI) is extracted to determine time intensity curves and other parameters for assessing its working. Extraction of the OOI is termed segmentation. Manual accomplishment of these tasks is a labor- and time-intensive process due to the large number of acquired images for each patient. This necessitates the development of automated registration and segmentation algorithms.
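Once the sequence is registered and the OOI segmented, computing a time intensity curve reduces to averaging intensities inside the organ mask frame by frame. The sketch below illustrates this; the function and variable names are hypothetical, not code from this thesis.

```python
import numpy as np

def time_intensity_curve(frames, mask):
    """Mean intensity inside the organ-of-interest mask for each frame
    of a registered perfusion sequence."""
    return np.array([frame[mask].mean() for frame in frames])

# Toy example: a 3-frame "sequence" whose masked region brightens over
# time, mimicking contrast agent wash-in.
frames = [np.full((8, 8), v) for v in (0.1, 0.5, 0.9)]
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
curve = time_intensity_curve(frames, mask)
```

In practice the resulting curve would be fitted with a pharmacokinetic model; the point here is only that the curve is meaningful solely when the mask tracks the same tissue across motion-corrected frames.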
This dissertation is aimed at the development of automated registration and segmentation algorithms for DCE-MRI of the kidney and heart. The algorithms are especially meant to overcome the challenges of perfusion images, i.e., low spatial resolution, noise, changing intensity and poor image contrast. However, they can also be applied to other types of medical and natural images with minor modifications. The developed algorithms were tested on cardiac, renal and liver DCE-MRI of human subjects.
1.1.1 Our Contribution
Figure 1.1: Diagram showing different stages for complete image analysis of perfusion datasets.

Figure 1.1 shows the general work flow for registration and segmentation of dynamic perfusion images. Before registering elastic deformations, translation and rotation motion is corrected. Depending upon patient movement, such motion can be very large for some datasets. Although many algorithms for correcting such movement exist, their effectiveness is limited by the nature of perfusion datasets. Changing intensity is a major challenge for all registration algorithms. We have used a neurobiology based visual saliency model that identifies similar regions in a pair of contrast enhanced images in spite of intensity change. The saliency model is a fairly accurate representation of the working of the human visual system (HVS) due to bottom-up cues alone, as is demonstrated by its close adherence to human eye-fixation patterns. The saliency maps assign similar values to the same region in a pair of images despite the presence of intensity change. Thus, saliency proves to be a contrast invariant metric assigning importance values to every pixel. The importance values, which are with respect to the task being accomplished (i.e., registration), are used in a quantitative-qualitative mutual information (QMI) framework for registering the perfusion images. This is a robust metric, outperforming conventional mutual information based approaches. Our method's novelty lies in the use of a neurobiology based saliency model and investigating its effectiveness for image registration tasks. Modifications were made to the original saliency model to make it suitable for perfusion images. We also perform a detailed analysis on the effectiveness of similar approaches and optimization schemes for registration.
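As a rough illustration of the idea, and not the thesis's exact formulation, a quantitative-qualitative mutual information can be sketched as an ordinary joint-histogram MI in which each pixel pair's contribution is weighted by a utility derived from the saliency maps. All names and the choice of utility (mean of the two saliency values) below are assumptions made for the sketch.

```python
import numpy as np

def qmi(fixed, moving, sal_fixed, sal_moving, bins=32):
    """Saliency-weighted mutual information between two images in [0, 1).

    Each pixel pair's joint-histogram contribution is weighted by the
    mean of the two saliency (utility) values at that pixel.
    """
    f = np.clip((fixed * bins).astype(int), 0, bins - 1)
    m = np.clip((moving * bins).astype(int), 0, bins - 1)
    w = 0.5 * (sal_fixed + sal_moving)      # per-pixel utility weight
    joint = np.zeros((bins, bins))
    np.add.at(joint, (f.ravel(), m.ravel()), w.ravel())
    joint /= joint.sum()
    pf = joint.sum(axis=1, keepdims=True)   # weighted marginal (fixed)
    pm = joint.sum(axis=0, keepdims=True)   # weighted marginal (moving)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pf @ pm)[nz])))

# A rigid registration loop would evaluate qmi() for candidate rotations
# and translations of the moving image and keep the maximizer.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
sal = np.ones((64, 64))                     # flat saliency reduces to plain MI
aligned = qmi(img, img, sal, sal)
misaligned = qmi(img, rng.random((64, 64)), sal, sal)
```

With flat saliency this reduces to standard mutual information; non-uniform saliency shifts the measure's sensitivity toward the pixels deemed important for the registration task, which is the role the QMI framework plays above.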
Once rigid registration is complete, elastic deformations have to be compensated, especially for cardiac datasets. Results for rigid registration show that saliency acts as a contrast invariant measure within a QMI framework. The saliency maps, although sparse, work well with QMI because rigid registration maximizes a global metric. An accurate one-to-one correspondence between pixels, although not necessary for rigid registration, is imperative for elastic registration. A similarity measure for elastic registration should be able to capture the relation between corresponding pixels in a deformed image pair. We modify the saliency model so that the saliency maps reflect the similarity of corresponding pixels in spite of displacement and intensity change. A Markov random field (MRF) framework was used which allows inclusion of saliency and context dependent information for robust registration. Saliency also helped identify important pixels and reduce the number of graph-nodes, thus increasing registration speed. Experimental results show that by using additional saliency information our method outperforms conventional methods using only edge or intensity information.

Image registration is generally followed by image segmentation. Previous approaches used the registered image sequence for segmentation. Provided the registration is accurate, this approach gave satisfactory results. Using separate registration and segmentation steps does not make full use of the available temporal information from contrast agent flow. Although the wash-in and wash-out of contrast agent poses challenges for registration, it gives important segmentation information which can be exploited to improve registration accuracy. Subsequently, improved registration leads to better segmentation results. We have formulated a joint registration and segmentation method using MRFs where the segmentation class and displacement vectors for every pixel are determined in one step, thus avoiding the need for multiple iterations. Results on real patient datasets show a significant improvement in registration and segmentation accuracy in comparison with previous methods which solved both problems separately.
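To make the registration-segmentation coupling concrete, here is a minimal sketch of an MRF energy over a combined labeling: each pixel carries a displacement label and a segmentation label, the data term penalizes both intensity mismatch after displacement and deviation of the displaced intensity from its class mean, and a Potts term enforces smoothness of both label fields. The cost functions and names are hypothetical simplifications, not the exact energy of this thesis, and graph cut optimization of the energy is not shown.

```python
import numpy as np

def jrs_energy(fixed, moving, disp_labels, seg_labels,
               displacements, class_means, lam=1.0):
    """Joint registration-segmentation MRF energy (toy version).

    Data term: squared intensity mismatch after applying each pixel's
    displacement label, plus squared deviation of the displaced intensity
    from its segmentation class mean. Smoothness: Potts penalty on label
    disagreements between 4-connected neighbours for both label fields.
    """
    H, W = fixed.shape
    data = 0.0
    for y in range(H):
        for x in range(W):
            dy, dx = displacements[disp_labels[y, x]]
            yy = min(max(y + dy, 0), H - 1)
            xx = min(max(x + dx, 0), W - 1)
            m = moving[yy, xx]
            data += (fixed[y, x] - m) ** 2                      # registration fit
            data += (m - class_means[seg_labels[y, x]]) ** 2    # class fit
    smooth = (np.sum(disp_labels[1:, :] != disp_labels[:-1, :])
              + np.sum(disp_labels[:, 1:] != disp_labels[:, :-1])
              + np.sum(seg_labels[1:, :] != seg_labels[:-1, :])
              + np.sum(seg_labels[:, 1:] != seg_labels[:, :-1]))
    return data + lam * smooth

# Toy example: identical images, correct segmentation, zero displacement.
fixed = np.zeros((4, 4)); fixed[1:3, 1:3] = 1.0
moving = fixed.copy()
disp0 = np.zeros((4, 4), dtype=int)
seg = (fixed > 0.5).astype(int)
e0 = jrs_energy(fixed, moving, disp0, seg, [(0, 0), (1, 0)], [0.0, 1.0])
disp1 = disp0.copy(); disp1[1, 1] = 1   # perturb one displacement label
e1 = jrs_energy(fixed, moving, disp1, seg, [(0, 0), (1, 0)], [0.0, 1.0])
```

The key design point mirrored from the text is that the same displaced intensity appears in both data terms, so a better displacement field lowers the segmentation cost and vice versa, which is what makes solving both problems in one optimization worthwhile.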
To summarize, this thesis makes the following contributions towards registration and segmentation of different dynamic perfusion MR images:

1. Investigating the effectiveness of a neurobiological model of visual saliency for rigid registration of dynamic perfusion images. The role of saliency and the effectiveness of different optimization schemes were thoroughly analyzed. Our method optimizes a QMI based cost function with saliency values acting as the utility measure.

2. Developing a framework for the combination of gradient and saliency information for elastic registration of cardiac and renal perfusion images. Saliency information improves registration accuracy and reduces computation time by identifying important pixels for registration.

3. Formulating an MRF framework for joint registration and segmentation of perfusion images. A pixel's displacement vector and segmentation class are determined in one step by combining registration and segmentation information. The proposed algorithm overcomes challenges of intensity change due to contrast enhancement, as well as inaccurate segmentation due to poor contrast between object and background.

1.1.2 Thesis Overview
The rest of the thesis is organized as follows. Chapter 2 describes kidney and cardiac anatomy and introduces some common terms that shall be used for the rest of the thesis. Further, we provide background on different theoretical frameworks used in our work, e.g., saliency, mutual information, MRFs and graph cuts. In Chapter 3 we describe our approach to rigid registration of perfusion images. The QMI framework and our novel optimization scheme are also detailed. Results are shown for kidney images because of their large rotation and translation motion. Chapter 4 gives the principles and challenges of elastic registration in perfusion images. The modified saliency map, which acts as a contrast invariant metric based on local image information, is explained. Chapter 5 describes the advantages of joint registration and segmentation over conventional methods that solve each problem separately. An explanation of the principle of joint registration and segmentation is provided, followed by our MRF based framework. The different cost functions are explained, providing justification for the combination of registration and segmentation information. Some related experiments are also shown. In Chapter 6 we present experimental results on perfusion images for elastic registration and joint registration and segmentation. Finally, in Chapter 7 we list our conclusions and outline ideas for future work.
• Left atrium: the upper left chamber of the heart that receives oxygenated blood from the lungs and pumps it down into the left ventricle, which further pumps it into the body.

• Right atrium: the upper right chamber of the heart that receives blood from the superior vena cava and transports it to the right ventricle.

• Left ventricle: the lower left chamber that pumps oxygenated blood into the body.

• Right ventricle: the lower right chamber of the heart that pumps deoxygenated blood into the lungs through the pulmonary arteries.
Figure 2.1 (a) shows the different parts of the heart, followed by an illustration of blood flow in Fig. 2.1 (b). The LV, being the largest and strongest chamber in the heart, pumps oxygenated blood out to distant tissues in the entire body.
Figure 2.1: Heart anatomy. (a) The different parts of the heart and (b) blood flow; red components indicate oxygenated pathways. The images were taken from http://en.wikipedia.org/wiki/Heart.
Therefore, its performance is carefully monitored by physicians. The LV and RV are separated by the septum, a wall made of muscle. The cardiac muscle tissue is called the myocardium, and its inner and outer layers are called the endocardium and epicardium. When the ventricles contract, the papillary muscles help to keep tension on the chordae tendineae and thus assist in the functioning of the valves.
The cardiovascular system is made up of the heart and the circulatory system (blood vessels). Blood is supplied to the heart by its own vascular system, called the coronary circulation. Heart problems caused by narrowed heart arteries are called coronary artery disease (CAD) or ischemic heart disease. CAD is the most common form of heart disease in the developed world and the leading cause of heart attacks.
Figure 2.2: Kidney anatomy. 1. Renal pyramid 2. Interlobar artery 3. Renal artery 4. Renal vein 5. Renal hilum 6. Renal pelvis 7. Ureter 8. Minor calyx 9. Renal capsule 10. Inferior renal capsule 11. Superior renal capsule 12. Interlobar vein 13. Nephron 14. Minor calyx 15. Major calyx 16. Renal papilla 17. Renal column. The image was taken from http://en.wikipedia.org/wiki/Kidney.

Renovascular disease refers to the vascular disorders of the kidneys. Such disorders cause renal dysfunction from reduced blood flow due to partial or complete occlusion of large, medium or small renal vessels.
2.1.3 Basics of Perfusion MR Imaging

Magnetic resonance imaging (MRI) is a non-invasive method to obtain images of internal organs, thus giving anatomical details of soft tissue, such as gray and white matter in the brain, as well as other organs like the heart, kidney and liver. In the case of perfusion MRI, the organ of interest (OOI) is scanned repeatedly by an MRI device following the bolus injection of contrast agent. The flow of the contrast agent is reflected in the changing intensity of different tissues and organs. The changing pixel intensity corresponding to the same tissue across the image sequence provides valuable functional information about the OOI. The time taken by the contrast agent to flow through the kidney and drain into the bladder is used to analyze kidney function, and the flow of contrast agent from the right ventricle to the left ventricle provides information on cardiac function.
Contrast agents are administered into the subject to facilitate observation of intensity change and hence contrast variation between tissues. The contrast changes by varying the relaxation times of tissues (7; 8). In most cases the addition of contrast agent improves the sensitivity and specificity of MR images. MRI contrast agents act predominantly on T1 relaxation, which results in signal enhancement and "positive" contrast, or on T2 relaxation, which results in signal reduction and "negative" contrast.

Positive contrast agents are typically small molecular weight compounds with their active elements being Gadolinium, Manganese or Iron, all of which have unpaired electron spins in their outer shells and long relaxivities. Gadolinium diethylenetriamine-pentaacetic acid (Gd-DTPA) is often used in cardiac perfusion MRI. Negative contrast agents are small particulate aggregates like superparamagnetic iron oxide (SPIO). After contrast injection, its concentration in the blood stream increases and then decreases as it is eliminated from the tissues. In general, a contrast enhancement is obtained by one tissue having a higher affinity or vascularity than another. Most tumors, for example, have greater Gd uptake than the surrounding tissues, causing a shorter T1 and a stronger signal.
Dynamic MRI refers to the acquisition of a series of MR images at a high frame rate. The acquired time-series MR images can be used to study dynamic processes such as tissue perfusion. Achieving high temporal resolution and high spatial resolution in dynamic MRI are generally mutually conflicting goals. Conventional dynamic MRI acquires a full data set to reconstruct each time frame separately. But for perfusion studies the temporal resolution cannot be compromised. Therefore, for fast acquisition of each time frame, perfusion MR image sequences have relatively low spatial resolution.

2.2 Saliency
Saliency defines how different a region is from its surroundings based on various features, thus attracting our attention. Visual attention models (or saliency models) refer to computational models that determine a saliency map (conspicuity map) based on the interaction of different features. Saliency models may consider two types of features, i.e., bottom-up, like intensity, colour, texture, edge orientation, etc., or top-down, like prior knowledge of the desired task. We shall review some models of bottom-up feature based visual attention in the current literature. Treisman and Gelade in (9) proposed a feature integration theory, one of the earliest hypotheses for attention. It suggests that attention must be directed serially to each stimulus in a display whenever conjunctions of more than one separable feature are needed to characterize or distinguish the presented objects. Another model was proposed by Mozer in (10), which modeled an object for recognition tasks. The work by Itti and Koch (2) proposes the popular neurobiological attention model based on saliency maps, which has been found to have a high correlation with human fixations (11). Soto and Blanco in (12) explored the role of space based and object based visual attention within a cueing paradigm. Participants had to discriminate the orientation of a line that appeared within one of four moving circles differing in colour. A cue appearing close to one of the four circles indicated the location or circle where the target stimulus was likely to appear. Results suggest that object based and space based attention interact, with selection by location dominating object-based selection. Logan in (13) proposed a theory integrating space based and object based approaches to visual attention.
From the above works we infer that saliency is defined by local image features at various scales. Salient regions are those where the feature strength is greater than in their neighbourhoods. This understanding goes beyond the concept of simple features like gradient magnitude or intensity difference. On closer examination of what makes strong edges grab our attention, the conclusion is that the difference in intensity between edge pixels and their neighbors is high, and the HVS is therefore strongly attracted to these points. We now look at some works that use local features for object detection or salient region detection. Scale is an important factor in these methods, leading to robust identification of salient regions. Kadir and Brady in (14) use entropy in a scale-space model to detect salient regions in an image. Entropy gives a measure of the information content in a neighborhood, and different scales are used for robust identification of salient regions. Lowe in (15) introduces a scale invariant feature descriptor that identifies salient points irrespective of rotation and the scale at which features are selected. This technique has been used in many object matching tasks. Serre et al. in (16) propose a biological model for object detection which is inspired by the working of the HVS. It uses a feature set combining position and scale tolerant edge detectors over neighboring locations and multiple orientations for an object detection task.
Apart from detecting salient regions in static images, many works have focussed on detecting salient regions in videos. Wixson in (17) integrates optical flow cues to determine objects that are motion salient. In (18; 19) the authors have used a Bayesian framework to predict surprising regions in video, while in (20) the problem of detecting salient regions in videos is posed as an inference process in a probabilistic graphical model. The computational model in (21) uses entropy for identifying salient regions in numerous short-duration video clips.
In contrast enhanced images we need a similarity metric that can match regions in the face of contrast enhancement. We use saliency as a contrast invariant measure because it has been found to have a high correlation with human fixations (11). Humans are good at recognizing and identifying objects even in low contrast. This inspired us to test whether a saliency based measure can be used to match contrast enhanced images. In this chapter, we shall give an overview of two works related to saliency: the neurobiology model of Itti and Koch (2) and the scale-space map method of (14).
2.2.1 Itti-Koch Saliency Model
Primates have a remarkable ability to interpret complex scenes in real time. It is believed that intermediate and higher visual processes select a subset of available sensory information for processing (22). This is most likely to reduce the complexity of scene analysis (23). The selection of visual information appears to be in the form of a spatially circumscribed region of the visual field, also called the focus of attention. It scans the scene both in a rapid, bottom-up, saliency-driven, task independent manner and in a slower, top-down, task dependent manner (23). Models of visual attention include "dynamic routing models", where information from a small region of the visual field can progress through the cortical visual hierarchy. The attended region is selected through dynamic modifications of cortical connectivity or by establishing specific temporal patterns of activity (22; 23; 24).

The saliency model by Itti and Koch builds on the biologically plausible architecture proposed in (25) and is at the basis of several other models (26; 27). The model is related to the "feature integration theory" that explains human visual search strategies (9). Visual input is first decomposed into a set of topographic feature maps, and different spatial locations compete for saliency within each map such that only locations that stand out from their surroundings persist. The feature maps serve as input to a saliency map that determines the conspicuity over the entire visual scene. It is believed that such a map is located in the posterior parietal cortex of primates (28). The model represents a complete account of bottom-up saliency and does not require any top-down guidance to shift attention. Such a framework allows for parallel processing for fast selection of a small number of interesting image locations.
From the input image, nine spatial scales are created using dyadic Gaussian pyramids (29). They progressively low pass filter and subsample the input image, yielding horizontal and vertical reduction factors ranging from 1:1 to 1:256 in eight octaves. Each feature is computed by a set of linear center-surround operations akin to visual receptive fields. Typical visual neurons are most sensitive in a small region of the visual space (the center). Stimuli presented in a broader, weaker antagonistic region concentric with the center (referred to as the surround) inhibit the neuronal response. Such an architecture is sensitive to local spatial discontinuities, and is particularly well suited to detecting locations that stand out from their surroundings. This is a general computational principle in the retina (30). Center-surround is implemented in the model as the difference between fine and coarse scales. The center is a pixel at scale c ∈ {2, 3, 4}, and the surround is the corresponding pixel at scale s = c + δ, δ ∈ {3, 4}. The across-scale difference between the two maps is obtained by interpolating the coarser map to the finer scale and performing point-by-point subtraction. Using several scales leads to multiscale feature extraction by including different size ratios between center and surround regions.
2.2.1.1 Extraction of Early Visual Features
Although the original model includes colour information, we do not include it in our description as it is not relevant for our data. Let r, g and b be the red, green and blue channels of the input image; an intensity image I is obtained as I = (r + g + b)/3. I is used to create a Gaussian pyramid I(σ), where σ ∈ [0..8] is the scale. The center-surround difference (denoted as ⊖) between a "center" fine scale c and a "surround" coarse scale s yields the feature maps. The first set of feature maps, for intensity contrast, is given by

I(c, s) = |I(c) ⊖ I(s)|. (2.1)
Local orientation information is obtained from I using oriented Gabor pyramids O(σ, θ), where σ ∈ [0..8] represents the scale and θ ∈ {0°, 45°, 90°, 135°} are the preferred orientations (29). Orientation feature maps are obtained as

O(c, s, θ) = |O(c, θ) ⊖ O(s, θ)|. (2.2)
In total, 30 feature maps are computed: 6 for intensity and 24 for orientation.
2.2.1.2 The Saliency Map

The purpose of the saliency map is to represent the conspicuity (or saliency) at every location in the visual field by a scalar quantity, and to guide the selection of attended locations based on the spatial distribution of saliency. A combination of the feature maps provides bottom-up input to the saliency map, modeled as a dynamic neural network. The different feature maps represent different modalities with different dynamic ranges and extraction mechanisms. When all 30 feature maps are combined, salient objects appearing strongly in a few maps may be masked by noise or less salient objects in other maps. Therefore a normalization operator N(.) is used to globally promote maps having a small number of strong peaks of activity (conspicuous locations), while globally suppressing maps containing numerous comparable peak responses. N(.) consists of the following steps:
1. Normalize the values in the map to a fixed range [0..M], in order to eliminate modality-dependent amplitude differences;

2. Find the map's global maximum M and compute the average m of all its other local maxima; and

3. Globally multiply the map by (M − m)².
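A minimal sketch of these three steps is given below. This is our own simplified implementation; in particular, the 3 × 3 neighbourhood used to detect local maxima is one of several possible choices and is not specified in the text:

```python
import numpy as np

def normalize_map(fmap, M=1.0):
    """N(.): scale the map to [0, M], then weight it by (M - m)^2, where m
    is the average of all local maxima other than the global maximum."""
    f = fmap - fmap.min()
    if f.max() > 0:
        f = M * f / f.max()
    # crude local-maxima detection on interior 3x3 neighbourhoods
    maxima = []
    for i in range(1, f.shape[0] - 1):
        for j in range(1, f.shape[1] - 1):
            patch = f[i - 1:i + 2, j - 1:j + 2]
            if f[i, j] == patch.max() and f[i, j] > 0:
                maxima.append(f[i, j])
    maxima = sorted(maxima)
    # drop the global maximum; average the rest
    m = float(np.mean(maxima[:-1])) if len(maxima) > 1 else 0.0
    return f * (M - m) ** 2
```

A map with one dominant peak keeps its strength ((M − m)² is large), while a map with many comparable peaks is driven towards zero, which is exactly the promotion/suppression behaviour described above.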
Comparing the maximum of the entire map to the average overall activation measures how different the most active location is from the average. When this difference is large, the most active location stands out and the map is strongly promoted. When the difference is small, the map contains nothing unique and is suppressed. The biological motivation behind the design of N(.) is that it coarsely replicates cortical lateral inhibition mechanisms, where neighboring similar features inhibit each other via specific anatomically defined connections (31). The feature maps are combined into two "conspicuity" maps, Ī for intensity and Ō for orientation, at the scale σ = 4 of the saliency map. The final saliency map, obtained as the combination of the two normalized conspicuity maps, is

S = (N(Ī) + N(Ō)) / 2. (2.3)
In a neuronally plausible implementation, the SM is modeled as a 2D layer of leaky integrate-and-fire neurons at scale σ = 4. These model neurons consist of a single capacitance that integrates the charge delivered by synaptic input, a leakage conductance and a voltage threshold. When the threshold is reached, a prototypical spike is generated and the capacitive charge is shunted to zero. The SM feeds into a biologically plausible 2D "winner-take-all" (WTA) neural network (22; 25) at scale σ = 4, where synaptic interactions among units ensure that only the most active location remains, while all other locations are suppressed.
The neurons receiving excitatory input from the SM are all independent. The potential of SM neurons at more salient locations increases faster, and each SM neuron excites its corresponding WTA neuron. All the WTA neurons also evolve independently of each other, until one (the winner) first reaches threshold and fires. This triggers three simultaneous mechanisms:

1. The FOA is shifted to the location of the winner neuron;

2. The global inhibition of the WTA is triggered and completely inhibits (resets) all WTA neurons;

3. Local inhibition is transiently activated in the SM, in an area with the size and new location of the FOA; this not only yields dynamical shifts of the FOA, by allowing the next most salient location to subsequently become the winner, but also prevents the FOA from immediately returning to a previously attended location.
Such an inhibition of return has been demonstrated in human visual psychophysics (32). As no top-down attentional component is modeled, the FOA is a simple disk with radius fixed to one-sixth of the smaller of the input image width and height. The time constants, conductances and firing thresholds of the simulated neurons were chosen so that the FOA jumps from one salient location to another in approximately 30-70 ms of simulated time, and the attended area is inhibited for approximately 500-900 ms. The difference in the relative magnitude of these delays is sufficient to ensure thorough scanning of the image and prevent cycling through a limited number of locations.
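Stripped of the spiking-neuron dynamics, the attention-shifting behaviour can be caricatured in a few lines: pick the global maximum of the saliency map (the WTA winner), then suppress a disc around it (inhibition of return) so that the next most salient location wins on the following iteration. The function below is a toy sketch under these simplifications; the name, shift count and disc radius are illustrative, not from the thesis:

```python
import numpy as np

def scan_focus_of_attention(saliency, n_shifts=3, radius=2):
    """Return the first n_shifts attended locations as (row, col) tuples."""
    s = saliency.astype(float).copy()
    visited = []
    for _ in range(n_shifts):
        # WTA: the most active location fires first
        winner = tuple(int(v) for v in np.unravel_index(np.argmax(s), s.shape))
        visited.append(winner)
        # inhibition of return: suppress a disc around the attended location
        yy, xx = np.ogrid[:s.shape[0], :s.shape[1]]
        disc = (yy - winner[0]) ** 2 + (xx - winner[1]) ** 2 <= radius ** 2
        s[disc] = -np.inf
    return visited
```

Run on a map with three well-separated peaks of decreasing strength, the scan visits them in order of saliency, mimicking the FOA trajectory described above.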
2.2.1.3 Strengths and Limitations
Despite its simple architecture and feed forward structure, the model is capable of strong performance with complex natural scenes. It can quickly detect salient points in different kinds of images (2). Another strength of the model is the parallel implementation of the computationally expensive early feature extraction stages and the attention focusing system. This allows for real time operation on dedicated hardware. A critical part of the model is the implementation of the normalization operator N(.), which provides a general mechanism for computing saliency. The resulting saliency measure is closer to human saliency as it implements spatial competition between salient locations. The feed-forward implementation of N(.) is faster and simpler than iterative schemes. The efficiency of the saliency model depends upon the features used, and it can be tailored to arbitrary tasks through the implementation of dedicated feature maps.
2.2.2 Scale Saliency Model
For each voxel x, the probability distribution of intensity i in a spherical region of radius s centered at x is denoted as p_i(s, x). The local entropy L(s, x) computed from p_i(s, x) is defined below:

L(s, x) = −∑_i p_i(s, x) log p_i(s, x). (2.4)

The best scale s_x for the region centered at voxel x is selected as the one that maximizes the local entropy L(s, x). Since larger scales and higher local differences are also preferred, the saliency value of voxel x, denoted A(s_x, x), is defined by the maximal local entropy value weighted by both the best scale s_x and a self-dissimilarity measure in the scale space.
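To make the scale-selection idea of Eq. (2.4) concrete, the sketch below computes the windowed histogram entropy at a single pixel over a range of radii and returns the entropy-maximizing scale. This is a 2D simplification with our own function names, bin count and scale range; the text above describes spherical 3D regions:

```python
import numpy as np

def local_entropy(img, x, y, radius, bins=16):
    """Entropy of the intensity histogram in a square window around (x, y);
    intensities are assumed to lie in [0, 1]."""
    patch = img[max(0, y - radius):y + radius + 1,
                max(0, x - radius):x + radius + 1]
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def best_scale(img, x, y, scales=(1, 2, 3, 4, 5)):
    """Return the scale s_x that maximizes L(s, x) at pixel (x, y)."""
    entropies = [local_entropy(img, x, y, s) for s in scales]
    return scales[int(np.argmax(entropies))]
```

A flat region gives zero entropy at every scale, while a textured region gives positive entropy, so the scale search picks out the window size at which the local intensity structure is most unpredictable.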
2.3 Mutual Information
Research work that eventually led to the introduction of mutual information as a registration measure dates back to the early 1990s. Woods et al. (3; 33) first introduced a registration measure for multimodal images based on the assumption that regions of similar tissue (and hence similar gray values) in one image would correspond to regions of similar gray values in another image. Ideally, the ratio of the gray values for all corresponding points in a certain region in either image varies little. Consequently, the average variance of this ratio over all regions is minimized to achieve registration. Hill et al. in (34) proposed an adaptation of Woods' measure. They construct a feature space, which is a two dimensional plot showing the combination of gray values in each of the two images for all corresponding points. The feature space (or joint histogram) changes as the alignment of the images changes. When the two images are correctly registered, corresponding anatomical structures overlap and the joint histogram shows certain clusters for the gray values of the structures. As the images become misaligned, structures start overlapping with those that are not their anatomical counterparts. As a result, the intensity of the clusters for corresponding anatomical structures decreases and new combinations of gray values emerge. This is reflected in the joint histogram by a dispersion of the clustering.
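The dispersion effect described above is easy to demonstrate numerically: the joint histogram of an image with itself is tightly clustered (low joint entropy), while the joint histogram of the image against a shifted copy is dispersed (higher joint entropy). The sketch below is ours; the shift amount and bin count are arbitrary choices:

```python
import numpy as np

def joint_entropy(a, b, bins=32):
    """Joint entropy H(A, B) computed from the joint gray-value histogram."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
img = rng.random((64, 64))
aligned = joint_entropy(img, img)                  # clustered joint histogram
shifted = joint_entropy(img[:, :-4], img[:, 4:])   # misaligned by 4 pixels
```

The `shifted` value exceeds the `aligned` value, which is precisely the misregistration-induced dispersion that entropy-based measures penalize.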
Entropy was proposed as a registration measure in (35; 36). Entropy measures the dispersion of a probability distribution. It is low when a distribution has a few sharply defined dominant peaks and is maximal when all outcomes have an equal chance of occurrence. Given events e1, ..., em occurring with probabilities p1, ..., pm, the Shannon entropy is given by

H = −∑_i p_i log p_i. (2.6)

The joint entropy of two images A and B is computed as

H(A, B) = −∑_{i,j} p(i, j) log p(i, j), (2.7)

where p(i, j) is the joint distribution obtained from the joint histogram. Mutual information (MI) is defined in terms of joint entropy as

I(A, B) = H(A) + H(B) − H(A, B). (2.8)
Maximizing mutual information is related to minimizing joint entropy. Previously, it was described how the joint histogram of two images' gray values disperses with misregistration; the joint entropy is a measure of this dispersion. The advantage of MI over joint entropy is that it includes the entropies of the individual images. MI and joint entropy are computed over the overlapping areas of the two images and are therefore sensitive to the size and contents of the overlap. One disadvantage of joint entropy is that low values can result from complete misregistrations. Thus MI is also defined in the form of the Kullback-Leibler distance, which is a measure of the distance between two distributions. The MI of images A and B is

I(A, B) = ∑_{a,b} p(a, b) log [p(a, b) / (p(a) p(b))]. (2.9)

The above equation measures the distance between the joint distribution p(a, b) and the distribution p(a)p(b) obtained in the case of independence. Increasing misregistration leads to a decrease in the MI value. MI is not an easy measure to understand; e.g., the underlying process of how misregistration influences the probability distribution, or the relation between the joint and marginal distributions, is difficult to envisage. But MI has been found to be a generally applicable measure for clinical applications, without the need for parameter tuning, preprocessing or user initialization. However, in MI based registration there is no way to integrate the correlation of neighboring pixels into the probability distribution, which should be an important consideration.
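As a concrete illustration, MI as in Eq. (2.8) can be computed directly from a joint histogram: the marginals follow by summing the joint distribution over rows and columns. The sketch below uses NumPy; the bin count is an arbitrary choice of ours:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a (possibly multidimensional) distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(img_a, img_b, bins=32):
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)  # marginal distributions
    # I(A, B) = H(A) + H(B) - H(A, B)
    return entropy(p_a) + entropy(p_b) - entropy(p_ab)
```

For identical images MI equals the marginal entropy (its maximum), while for two independent images it is close to zero, which is why MI peaks at correct alignment.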
2.4 Markov Random Fields

Contextual constraints are a necessity in the interpretation of visual information. A scene is understood in the spatial and visual context of the objects in it. Markov random field (MRF) theory provides a convenient and consistent way for modeling context dependent entities such as image pixels and correlated features. This is achieved by characterizing mutual influences among such entities using conditional MRF distributions. The practical use of MRF models is largely ascribed to a theorem stating the equivalence between MRFs and Gibbs distributions, which was established by Hammersley and Clifford in (37) and further developed by Besag in (38). This matters because the joint distribution is required in most applications, but deriving the joint distribution from conditional distributions is difficult for MRFs. The MRF-Gibbs equivalence theorem points out that the joint distribution of an MRF is a Gibbs distribution. Also, from the computational perspective, the local property of MRFs leads to algorithms which can be implemented in a local and massively parallel manner.
2.4.1 Visual Labeling
Most vision problems can be posed as labeling problems, in which the solution is a set of labels assigned to image pixels or features. A labeling problem is specified in terms of a set of sites and a set of labels. A site represents a point or region in Euclidean space, such as an image pixel or an image feature like a corner point, line segment or surface patch. A set of sites is categorized in terms of its regularity. The set of sites S for a 2D image of size n × n is denoted by

S = {(i, j) | 1 ≤ i, j ≤ n}. (2.10)

A label is an event that may happen to a site. The set of labels is represented by L and can be discrete or continuous. For example, in edge detection the label set is L = {edge, non-edge}. Another essential property of a label set is continuity. For an ordered label set, a quantitative measure of similarity between any two labels can be defined; for an unordered set, the similarity measure is symbolic, typically taking a value of "equal" or "non-equal". Ordering and similarity not only categorize labeling problems, but also affect the computational complexity and our choice of labeling algorithms. In the terminology of random fields, a labeling is also called a configuration. In computer vision, a configuration or labeling can correspond to an edge map, or an interpretation of image features in terms of object features.