A maximum likelihood method for detecting bad samples from Illumina BeadChips data
Nguyễn Hà Anh Tuấn
Trường Đại học Công nghệ (University of Engineering and Technology); Master's thesis, major: Computer Science; Code: 60 48 01
Supervisor: Dr. Lê Sỹ Vinh
Year of defense: 2012
Keywords: Information technology; Data
Content
Table of Contents
Overview
1 Introduction
1.1 Biological background
1.2 Some common types of mutation
1.3 SNP and SNP genotype
1.4 Microarray technology and Illumina BeadChips
1.5 Genotype callers
1.6 Quality control and quality assurance
1.6.1 Identify samples with discordant sex information
1.6.2 Identify samples that have high missing and heterozygosity rate
1.6.3 Identify duplicated or related samples
1.6.4 Identify samples that have different ancestries
2 Genotype callers
2.1 Illuminus
2.2 GenoSNP
2.3 GenCall
2.4 Comparing three callers
3 Maximum likelihood method for detecting bad samples
3.1 Create potential bad sample list
3.2 Estimate the fitness of data
3.3 Remove bad samples
4.1 Input file format
4.2 Experiment 1
4.3 Experiment 2
References
[APC+10] C. A. Anderson, F. H. Pettersson, G. M. Clarke, L. R. Cardon, A. P. Morris, and K. T. Zondervan. Data quality control in genetic case-control association studies. Nat Protoc, 5(9):1564-73, 2010.
[CBSI07] Benilton Carvalho, Henrik Bengtsson, Terence P. Speed, and Rafael A. Irizarry. Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics, 8(2):485-499, 2007.
[CM01] Francis S. Collins and Victor A. McKusick. Implications of the Human Genome Project for medical science. JAMA: The Journal of the American Medical Association, 285(5):540-544, 2001.
[GYC+08a] Eleni Giannoulatou, Christopher Yau, Stefano Colella, Jiannis Ragoussis, and Christopher C. Holmes. GenoSNP: a variational Bayes within-sample SNP genotyping algorithm that does not require a reference population. Bioinformatics, 24(19):2209-2214, 2008.
[GYC+08b] Eleni Giannoulatou, Christopher Yau, Stefano Colella, Jiannis Ragoussis, and Christopher C. Holmes. A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics, 24(19):2209-2214, 2008.
[Inc05] Illumina Inc. Illumina GenCall data analysis software. http: gencall_data_analysis_software.pdf, 2005.
[Inc06] Illumina Inc. Infinium II assay workflow. http://www.illumina.com/documents/products/workflows/workflow_infinium_ii.pdf, 2006.
[KF01] Larry J Kricka and Paolo Fortina Microarray technology and appli-