Page 1 © 2003 The MathWorks, Inc.
MATLAB Applications in Bioinformatics
Developing and Deploying Bioinformatics
Applications with MATLAB
The MathWorks, Inc.
Page 2
Presentation Layout
MATLAB applications in Bioinformatics
Customer success stories
MATLAB & The Bioinformatics Toolbox
Sequence analysis
Microarray analysis
Integrating MATLAB with other tools
MATLAB as computational engine for Excel
Questions/Answers & Wrap-up
Page 3
Bioinformatics Applications
• Sequence analysis
– Base calling algorithm design, sequence alignment, sequence building algorithms
• Microarray analysis
– Image processing, QA/QC, data normalization, data analysis
• Proteomics
– Mass spectrometry signal processing, protein marker identification and classification, peptide sequence identification, 2D-gel image analysis
• Systems Biology
– Interaction network identification, simulation of metabolic pathways, flux analysis
Page 4
Bioinformatics teams supporting multiple
constituencies with multiple tools.
• SPLUS, R, SAS, Mathematica
• Web based tools
• Custom one-off analyses
• Programs for biologists
Software Engineers
• C++, Java
• Work off MATLAB prototypes
Page 5
Using MATLAB, bioinformatics teams can support
multiple constituencies.
MATLAB GUIs, analyses
• Custom one-off analyses
• Programs for biologists
Applications
Page 6
User example: Genetic Sequence Base Calling
Complete draft of the human genome, accelerated by Applied Biosystems using MATLAB algorithms.
“Having one integrated package is a big advantage. Using MATLAB and the MATLAB Compiler reduced my development time by a factor of 4 or 5.”
“MATLAB has always been ideal as an algorithm prototyping tool,” Labrenz concludes, “but the MATLAB Compiler and C/C++ Math and Graphics Libraries add a whole new dimension, allowing rapid delivery of sophisticated solutions.”
Jim Labrenz, Applied Biosystems
Page 7
User example: Breast Cancer Prognosis
Rosetta Inpharmatics recently developed a tool that enables clinicians to determine a breast cancer patient’s prognosis based on the gene expression profile of the primary tumor.
“Since MATLAB and the Image Processing Toolbox are fully integrated and the MATLAB platform is very good for matrix calculation, we did not have to spend time writing the low-level image processing and the basic data analysis routines like vector and matrix calculations.”
“Our research scientists are happy with the quick feedback,” Dr. Dai says. “Using MathWorks tools, we can respond to their requests very fast, and it’s easy for the scientists to use these tools. Using the GUIs that we develop in MATLAB, they can access functions without having to remember the underlying code.”
Dr. Hongyue Dai, Rosetta Inpharmatics/Merck & Company
Page 9
More than 600 textbooks for education and professional use, in 19
– Natural Sciences – Environmental Sciences
Thousands of universities teach students using
MathWorks products.
Page 10
Industry Issues & Solutions
Issues:
• Integrating tools from various programming languages is difficult, closed-source tools are not customizable, and freeware is often not supported
• There is no standard biological data format
• Applications must be easily deployable within organizations
Solutions:
• MATLAB is a supported, open architecture, user-friendly environment for data analysis across applications, algorithm development, and deployment
• MATLAB and the Bioinformatics Toolbox provide file format support for common data sources (web-based, sequences, microarray, etc.)
• MATLAB’s deployment tools and user-interface design environment allow easy deployment of MATLAB-based applications
Page 11
The Bioinformatics Toolbox
Robert Henson, The MathWorks, Inc.
Developing and Deploying Bioinformatics
Applications with MATLAB
Page 12
The MathWorks Product Family
Code Generation
Blocksets
Integrated for:
technical computing, data analysis and visualization
system modeling and simulation
implementation of real-time embedded software
PC-based real-time systems
Page 13
• File formats: FASTA, PDB, SCF, GPR, GAL
• Databases: GenBank, EMBL, PIR, PDB
• Sequence alignment: Needleman-Wunsch, Smith-Waterman
• Sequence utilities: DNA/RNA/AA conversions, pattern searching
• Microarray normalization: Lowess, global mean, MAD (median absolute deviation)
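Two of the normalization statistics named in the last bullet are simple enough to sketch directly. The block below is a minimal illustration written in Python rather than MATLAB, and it is not the toolbox's implementation. The function names and intensity values are hypothetical, the MAD is computed without any scaling constant, and Lowess is omitted because it needs a local regression fit.

```python
from statistics import mean, median

def global_mean_normalize(values):
    """Divide every value by the overall mean so the result has mean 1.0."""
    m = mean(values)
    return [v / m for v in values]

def median_absolute_deviation(values):
    """Unscaled MAD: the median of |x - median(x)|."""
    center = median(values)
    return median([abs(v - center) for v in values])

# Hypothetical spot intensities from one array channel.
intensities = [120.0, 80.0, 100.0, 60.0, 140.0]

normalized = global_mean_normalize(intensities)
spread = median_absolute_deviation(intensities)
print(normalized)
print(spread)
```

Global mean scaling makes arrays comparable by giving them a common average level, while the MAD gives a spread estimate that is less sensitive to outlier spots than the standard deviation.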
Page 14
MATLAB Desktop Tools
Launchpad: Start other tools and demos
Workspace Browser: See your data
Command Window
Command History
Page 15
Sequence Alignment Tutorial Example
• Get human and mouse genes from GenBank
• Look for open reading frames (ORFs)
• Convert DNA sequences to amino acid sequences
• Create a dotplot of the two sequences
• Perform global alignment
• Perform local alignment
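Three of the steps above (ORF search, DNA to amino acid conversion, and global alignment) can be sketched in a few lines. This is a plain Python illustration of the underlying ideas, not the Bioinformatics Toolbox code used in the tutorial. The toy sequence, the function names, and the match/mismatch/gap scores are assumptions made for the example. GenBank retrieval, the dotplot, and local alignment are not shown, and real alignments would normally use a substitution matrix instead of a flat score.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) slices of ATG..stop ORFs in the three forward frames."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

def translate(dna):
    """Translate complete codons of a DNA string into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a flat scoring scheme; returns (score, aligned_a, aligned_b)."""
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical toy sequence containing one short ORF.
dna = "CCATGGCTTGGTAAGG"
orfs = find_orfs(dna)
start, end = orfs[0]
protein = translate(dna[start:end]).rstrip("*")
score, top, bottom = needleman_wunsch(protein, "MW")
print(orfs, protein, score, top, bottom)
```

Local (Smith-Waterman) alignment uses the same recurrence with scores floored at zero and the traceback started from the highest-scoring cell instead of the corner.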
Page 16
Microarray Data Analysis Tutorial Example
• Plot expression profiles for genes
• Filter genes based on information content of profile
• Perform hierarchical clustering
• Perform K-means clustering
• Perform Principal Component Analysis
Reference:
DeRisi, JL, Iyer, VR, Brown, PO "Exploring the metabolic and genetic control of gene expression on a genomic scale." Science 1997 Oct 24;278(5338):680-6.
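The filtering and K-means steps of the tutorial can be illustrated with a short sketch. It is written in Python, not MATLAB, and does not reproduce the toolbox functions or the DeRisi data. The gene names, expression values, and threshold are hypothetical. Variance is used here as a simple stand-in for the "information content" filter mentioned above, and the K-means routine is plain Lloyd's algorithm with a deterministic start. Hierarchical clustering and Principal Component Analysis are not shown.

```python
from statistics import pvariance

def variance_filter(profiles, threshold):
    """Keep genes whose expression profile varies more than the threshold."""
    return {gene: p for gene, p in profiles.items() if pvariance(p) > threshold}

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical log-ratio profiles over four time points.
profiles = {
    "geneA": [0.1, 0.9, 2.0, 3.1],   # rising
    "geneB": [3.0, 2.0, 1.1, 0.0],   # falling
    "geneC": [0.0, 1.1, 2.1, 2.9],   # rising
    "geneD": [2.9, 2.1, 0.9, 0.1],   # falling
    "geneE": [1.0, 1.0, 1.1, 1.0],   # flat, removed by the filter
}

kept = variance_filter(profiles, threshold=0.5)
labels, centroids = kmeans(list(kept.values()), k=2)
for gene, label in zip(kept, labels):
    print(gene, label)
```

Filtering first removes flat profiles that would otherwise add noise to the clustering; the rising and falling genes then separate into two groups.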
Page 17
Integrating and Deploying Bioinformatics Tools with
MATLAB
Robert Henson, The MathWorks, Inc.
Developing and Deploying Bioinformatics
Applications with MATLAB
Page 19
Deploying with MATLAB
C/C++
Web
Stand-alone
Excel COM
Page 20
Push Data into MATLAB
Page 21
Computational Engine for Excel
Spreadsheet Applications
• MATLAB Excel Link can be the
computational engine behind your
Excel applications
• Fast scalable solution
MLPutMatrix("data",B2:H43)
MLPutMatrix("Genes",A2:A43)
MLPutMatrix("TimeSteps",B1:H1)
MLEvalString("clustergram(data,'RowLabels',Genes,'ColLabels',TimeSteps)")
Page 22
What else could you do?
Image Processing
Signal Processing
Statistics
Bioinformatics
Page 23
Integrating and Deploying Bioinformatics Tools with
MATLAB
Robert Henson, The MathWorks, Inc.
Developing and Deploying Bioinformatics
Applications with MATLAB
Page 24
Industry Issues & Solutions
Issues:
• Integrating tools from various programming languages is difficult, closed-source tools are not customizable, and freeware is often not supported
• There is no standard biological data format
• Applications must be easily deployable within organizations
Solutions:
• MATLAB is a supported, open architecture, user-friendly environment for data analysis across applications, algorithm development, and deployment
• MATLAB and the Bioinformatics Toolbox provide file format support for common data sources (web-based, sequences, microarray, etc.)
• MATLAB’s deployment tools and user-interface design environment allow easy deployment of MATLAB-based applications
Page 25
Further Information
• Bioinformatics Toolbox Product page
– Demos, technical literature, trial information