Strategies for Clustering, Classifying, Integrating, Standardizing and Visualizing Microarray Gene Expression Data

Granda, Willy Valdivia

doi:10.1007/978-1-4419-8760-0_8


Abstract

Over the last century, investigation of the anatomical and morphological characteristics of a small number of organisms has played an important role in the understanding of numerous biological processes. The rediscovery of Mendel’s laws of heredity at the opening of the 20th century initiated a scientific quest to understand how genetic information is transmitted and what the biological consequences of genetic variation are. With the emergence of molecular biology in the last thirty years, classical genetic research has shifted from understanding how visible traits are transmitted to the study of genome structure at the molecular level. Innovations such as PCR, together with advances in robotics such as miniaturization and parallelization, have led to the rapid development of more accurate, sensitive and powerful devices for analyzing the molecular structure, function and interaction of gene products. With the development of ultrahigh-throughput screening tools and the move from 96-microwell plates to 384- and 1536-microwell plates, it is expected that the generation of whole sequences of different organisms will increase at a rate ~100 times higher than previously anticipated (Head-Gordon and Wooley, 2001; Helfrich, 2002; Benson et al., 2002). As powerful, automated biological sampling and analytical tools become available to more laboratories, an exponential and sometimes overwhelming accumulation of multi-format post-genomic datasets is being produced. Consequently, modern biology is becoming a data-driven, multidisciplinary science in which biologists, mathematicians, statisticians, physicists and computer scientists are developing tools to identify the coding regions of genes in silico, predict and model protein structural characteristics, define protein-protein interactions, construct biochemical networks and identify potential drug targets.


References

Aach J, Rindone W, Church GM (2000) Systematic management and analysis of yeast gene expression data. Genome Res 10:431–445.
Achard F, Vaysseix G, Barillot E (2001) XML, Bioinformatics and data integration. Bioinformatics (17)2:115–125.
Aggarwal CC (2002) Towards effective and interpretable data mining by visual interaction. SIGKDD explorations (3)2:11–34.
Akutsu T, Miyano S, Kuhara S (2000) Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics 16:727–734.
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Staudt LM et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503–511.
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis. Proc. Natl. Acad. Sci. USA (96)12:6745–6750.
Alter O, Brown P, Botstein D (2000) Singular value decomposition for genome-wide expression data processing and modelling. Proc. Natl. Acad. Sci. USA (97)18:10101–10106.
Ambroise C, McLachlan G (2002) Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Nat Acad Sci USA (99)10:6562–6566.
Anderson AB, Basilevsky A, Hum DPJ (1983) Missing Data: A review of the literature. (Rossi PH, Wright JD, Anderson AB Eds). Handbook in Survey Research (pp. 415–494). Academic Press.
Aronow BJ, Richardson B, Handwerger S (2001) Microarray analysis of trophoblast differentiation: gene expression reprogramming in key gene function categories. Physiol Genomics 6:105–116.
Azuaje F, Bolshakova N (2002) Clustering genomic expression data: Design and evaluation principles. In: Understanding and Using Microarray Techniques. A Practical Guide. (Bubitzky BD, Granzow M Eds) London: Springer Verlag.
Baldi P, Brunak S (2001) Bioinformatics: the Machine Learning Approach. Cambridge: MIT Press.
Baldi P, Long A (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics (17) 6:509–519.
Baldi P, Hatfield GW (2002) DNA microarrays and gene expression. From experiments to data analysis and modelling. Cambridge: Oxford UP.
Barash Y, Friedman N (2002) Context-specific Bayesian clustering for gene expression data. J Comput Biol 9(2):169–191.
Barillot E, Achard F (2000) XML: a lingua franca for science. TIBTECH 18:331–333.
Benson DA, Karsch-Mizrachi I, Lipman D, Ostell J, Rapp BA, Wheeler D (2002) GenBank. Nucleic Acids Res 30:17–20.
Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, Dougherty E, Wang E, Marincola F, Gooden C, Lueders J, Glatfelter A, Pollock P, Gillanders E, Leja D, Dietrich K, Berens M, Alberts D, Sondak V, Hayward N, Trent J (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406:536–540.
Bø TH, Jonassen I (2002) New feature selection procedure for classification of expression profiles. Genome Biology 3(4):research0017.1-0017.11.
Bolshakova N, Azuaje F (2003) Cluster validation for genome expression data. Technical Report TCD-CS-2002-33. Computer Science Department, Trinity College Dublin. http://www.cs.tcd.ie/publications/tech-reports/reports.02/TCD-CS-2002-33.pdf
Bower JM, Bolouri H (2001) Computational modelling of biochemical networks. Massachusetts: MIT Press.
Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FCP, Kim I, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29:365–371.
Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, Oezcimen A, Rocca-Serra P, Sansone SA (2003) ArrayExpress-a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 31(1):68–71.
Brazma A, Robinson A, Cameron G, Ashburner M (2002) One-stop shop for microarray data. Nature 403:699–700.
Brazma A, Vilo J (2002) Gene Expression Data Analysis. FEBS Lett (480)1:17–24.
Breiman L (1998) Bagging Predictors. Technical Report No. 421. Department of Statistics University of California Berkeley.
Brody JP, Williams BA, Wold BJ, Quake SR (2002) Significance and statistical errors in the analysis of DNA microarray data. Proc. Nat. Acad. Sci. USA (99)20:12975–12978.
Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 97:262–267.
Burges C (1998) A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery (2)2:1–43.
Butte AJ, Tamayo P, Slonim D, Golub T, Kohane I (2000) Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Nat. Acad. Sci. USA 97(22):12182–12186.
Celis JE, Kruhoffer M, Gromova I, Frederiksen C, Ostergaard M, Thykjaer T, Gromov P, Yu J, Palsdottir H, Magnusson N, Orntoft TF (2000) Gene expression profiling: monitoring transcription and translation products using DNA microarrays and proteomics. FEBS Lett 480(1):2–16.
Cheng Y, Church GM (2000) Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol 8:93–103.
Chilingaryan A, Gevorgyan N, Vardanyan D, Jones D, Szabo A (2002) Paper title. Mathematical Biosciences (176):59–72.
D’haeseleer P (2001) Beyond Co-Expression: Gene Network Inference. www.cs.unm.edu/~patrick/networks/diss.pdf
D’haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics (16)8:707–726.
Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology (3)7:research0036.1-0036.21.
Dudoit S, Fridlyand J, Speed TP (2000) Comparison of discrimination methods of tumors using gene expression data. Department of Statistics Technical Report 576. University of California, Berkeley.
Dubitzky W, Granzow M, Berrar D (2001) Data Mining and Machine Learning Methods for Microarray Analysis. In: Methods of Microarray Data Analysis (Lin SM, Johnson KF eds) (pp 5–22). Massachusetts: Springer Science+Business Media New York.
Dougherty E, Barrera J, Brun M, Kim S, Cesar RM, Chen Y, Bittner M, Trent M (2002) Inference from clustering with application to gene-expression microarrays. J Comp Biol (9)1:105–126.
Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res (30)1:207–210.
Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. New York: Chapman & Hall.
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95(25): 14863–14868.
Fellenberg K, Hauser NC, Brors B, Neutzner A, Hoheisel JD, Vingron M (2001) Correspondence analysis applied to microarray data. Proc Natl Acad Sci USA 98: 10781–10786.
Fix E, Hodges J (1951) Discriminatory analysis, nonparametric discrimination: consistency properties. Technical Report. Randolph Field, Texas: USAF School of Aviation Medicine.
Freund Y, Schapire RE (1997) A decision-theoretic generalization of online learning and an application to boosting. J Comp Syst Sci 55(1): 119–139.
Friedman N, Linial M, Nachman I, Pe’er D (2000) Using Bayesian Networks to Analyze Expression Data. J Comput Biol 7(3-4):601–620.
Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics (16)10:906–914.
Garofalakis M, Hyun D, Rastogi R, Shim K (2000) Efficient Algorithms for Constructing Decision Trees with Constraints. Proc. Sixth ACM SIGKDD. Paper 296.
Geschwind DH (2001) Sharing gene expression data: an array of options. Nature Rev Neuroscience. (2):435–438.
Getz G, Levine E, Domany E (2000) Coupled two-way clustering analysis of gene microarray data. Proc. Nat. Acad. Sci. USA (97)22:12079–12084.
Gilbert DR, Schroeder M, van Helden J (2000) Interactive visualization and exploration of relationships between biological objects. TIBTECH (18):487–494.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537.
Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D, Sherlock G (2003) The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res (1):94–96
Graves DJ (1999) Powerful tools for genetic analysis come of age. TIBTECH (17):127–134.
Guyon I, Weston J, Barnhill S, Vapnik V (2000) Gene selection for cancer discrimination using support vector machines. Machine Learning 46(1/3):389.
Halgren RG, Fielden MR, Fong CJ, Zacharewski TR (2001) Assessment of clone identity and sequence fidelity for 1189 IMAGE cDNA clones. Nucleic Acids Res. 29(2):582–8.
Han J, Kamber M (2001) Data mining. Concepts and applications. San Francisco: Morgan Kaufmann Press.
Hand DJ (1999) Statistics and Data Mining: Intersecting Disciplines. Proc. Fifth ACM SIGKDD (1)1:16–19.
Hand DJ, Mannila H, Smyth P (2001) Principles of data mining. Cambridge: MIT Press.
Harding J, Rocke DM (2002) Robust Model-Based Clustering of Genes in Microarray Data: Are there Gene Clusters? www.camda.duke.edu/CAMDAOO/Abstracts/Presentations/Poster_13.pdf
Harrison P, Kumar A, Lan N, Echols N, Snyder M, Gerstein M (2002) A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution. J Mol Biol 316:409–19.
Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan WC, Botstein D, Brown PO (2001a) Gene shaving as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology 1(2):research0003.1-0003.21.
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Data Mining Inference and prediction. Berlin: Springer-Verlag.
Hwang KB, Cho DY, Park S, Kim SD, Zhang BT (2002) Applying machine learning techniques to analysis of gene expression data: Cancer diagnostics. Methods of Microarray Data Analysis. (Lin SM, Johnson KF eds.) (pp 167–182). Massachusetts: Springer Science+Business Media New York.
Head-Gordon T, Wooley J (2001) Computational challenges in structural and functional genomics. IBM Systems Journal (40)2:265–296.
Helfrich JP (2002) Raw Data to Knowledge Warehouse in Proteomic-Based Drug Discovery: A Scientific Data Management Issue. Biotechniques Supp. on Comp Proteom 48–53.
Herrero J, Valencia A, Dopazo J (2001) A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics (17)2:126–136.
Heyer LJ, Kruglyak S, Yooseph S (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9:1106–1115.
Hilsenbeck SG, Friedrichs WE, Schiff R, O’Connell P, Hansen RK, Osborne CK, Fuqua SAW (1999) Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. J Nat Cancer Inst 91: 453–459.
Holter NS, Maritan A, Cieplak M, Fedoroff NV, Banavar JR (2002) Dynamic modelling of expression data. Proc Nat Acad Sci USA (98)4:1693–1698.
Hvidsten TR, Komorowski J, Sandvik AK, Legreid AL (2001) Predicting gene function from gene expressions and ontologies. In: Pacific Symposium on Biocomputing pp. 299–310 (Altman RB, Dunker AK, Hunter L, Lauderdale K, Klein TE eds) Mauna Lani, Hawaii: World Scientific Publishing Co.
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Computing Surveys 31(3):264–323.
Jamil HM, Modica GA, Teran MA (2001) Towards a Visual Query Interphase for Phylogenetic Databases. CIKM’ 01:57–64.
Kanehisa M, Goto S, Kawashima S, Nakaya A. (2002) The KEGG databases at GenomeNet. Nucleic Acids Res. 30(1):42–6.
Kauffman SA (1998) Investigations. New York: Oxford UP.
Kazic T (2000) Semiotes: a semantics for sharing. Bioinformatics 16(12): 1129–1144.
Keller DA, Schummer M, Hood L, Ruzzo WL (2000) Bayesian Classification of DNA Array Expression Data. Technical Report UW-CSE-2000-08-01.
Kerr MK, Churchill GA (2001) Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments. Proc. Nat. Acad. Sci. USA (98)16:8961–8965.
Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6): 658–659.
Kitano H (2002) Computational system biology. Nature 420:206–210.
Kitano H (2002a) Foundations of system biology. Massachusetts: MIT Press.
Kohonen T (1981) Automatic formation of topological maps of patterns in a self-organizing system. In Proc. Second Scandinavian Conf. on Image Analysis 214–220.
Kohonen T (1997) Self-organizing maps. Berlin: Springer–Verlag.
Kothapalli R, Yoder SJ, Mane S, Loughran TP Jr (2002) Microarray results: How accurate are they? BMC Bioinformatics 3:22.
Kuo WP, Jenssen T, Butte AJ, Ohno-Machado L, Kohane IS (2002) Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics (18):405–412.
Kuramochi M, Karypis G (2001) Gene Classification using expression profiles: A feasibility study. Department of Computer Science/Army HPC Research Center. Technical Report 01-029.
Landgrebe J, Wurst W, Welzl G (2002) Permutation-validated principal components analysis of microarray data. Genome Biol 3(4):research0019.
Lee MT, Kuo FFC, Whitmore GA, Sklar J (2000) Importance of replication in microarray gene expression studies: Statistical methods and evidence of repetitive cDNA hybridisations. Proc. Nat. Acad. Sci. USA (97)18:9834–9839.
Li L, Weinberg CR, Darden TA, Pedersen LA (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN algorithms. Bioinformatics 17(12):1131–1142.
Liang S, Fuhrman S, Somogyi R (1998) REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pac. Symp. Biocomputing 18–29.
Little RJA, Rubin DB (1987) Statistical analysis with missing data. New York: John Wiley & Sons.
Lockhart DJ, Winzeler EA (2001) Genomics, gene expression and DNA arrays. Nature (405):827–836.
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1:281–297.
Mendez MA, Hodar C, Vulpe C, Gonzalez M, Cambiazo V (2002) Discriminant analysis to evaluate clustering of gene expression data. FEBS Letts 522(1-3):24–28.
Model F, König T, Piepenbrock C, Adorjan P (2002) Statistical process control for large scale microarray experiments. Bioinformatics 155–163.
Moler EJ, Chow ML, Mian JS (2000) Analysis of molecular profile data using generative and discriminative methods. Physiol. Genomics 4:109–126.
Mukherjee S (2002) Classifying Microarray Data Using Support Vector Machines. Berrar DP, Dubitzky W, Granzow M (Eds). A Practical Approach to Microarray Data Analysis. Boston: Springer Science+Business Media New York.
Mukherjee S, Tamayo P, Mesirov JP, Slonim D, Verri A, Poggio T (199) Support Vector Machine Classification of Microarray Data. CBCL Paper 182/AI Memo-1676, Massachusetts Institute of Technology. Cambridge.
Mutch DM, Berger A, Mansourian R, Rytz A, Roberts MA (2002) The limit of the fold change: A practical approach for selecting differentially expressed genes from microarray data. BMC Bioinformatics 3:17
Nadon R, Shoemaker J (2002) Statistical issues with microarrays: processing and analysis. Trends in Genetics 18(5):265–271.
Pan K, Lih C, Cohen SN (2002) Analysis of DNA microarrays using algorithms that employ rule-based expert knowledge. Proc Nat Acad Sci USA 99(4):2118–2123.
Pavlidis P, Weston J, Cai J, Grundy WN (2001) Gene functional classification from heterogeneous data. RECOMB 2001: Proc Fifth Ann Int Conf Comp Biol 249-255.
Peterson LE (2003) Partitioning large-sample microarray-based gene expression profiles using principal components analysis. Comput Methods Programs Biomed 70(2): 107–119
Proudfoot N (1980) Pseudogenes. Nature 286(5776):840–841.
Qi H (2002) Feature Selection and kNN fusion in molecular classification of multiple tumor types. Proc. Intern. Conf. on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS’02) http://aicip.ece.utk.edu/publication/02metmbs.pdf
Quackenbush J (2001) Computational analysis of microarray data. Nat Rev Genet 2(6):418–427.
Ramoni M, Sebastiani P (1998) Bayesian methods for intelligent data analysis. KMi Technical Report KMi-TR-67. The Open University.
Ramoni M, Sebastiani P, Kohane IS (2002) From the cover: Cluster Analysis of Gene Expression Dynamics. Proc Nat Acad Sci USA 99(14):9121–9126.
Ravasz E, Somera L, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551–1555.
Raychaudhuri S, Stuart JM, Altman RB (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac Symp Biocomput 5:452–463. (Altman RB Dunker AK Hunter L Lauderdale K and Klein TE eds) Mauna Lani Hawaii World Scientific Publishing Co.
Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T, Gerald W, Loda M, Lander ES, Golub TR (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 98(26): 15149–15154
Raymond MR, Roberts DM (1987) A comparison of methods for treating incomplete data in selection research. Educational and Psychological Measurement 47:13–26.
Reed RD, Marks II RJ (1998) Neural smithing. Supervised learning in feedforward artificial neural networks. Cambridge: MIT Press.
Rifkin SA, Atteson K, Kim J (2000) Constraint structure analysis of gene expression. Funct Integr Genomics 1:174–185.
Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, Van de Rijn M, Waltham M, Pergamenschikov A, Lee JC, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24:227–235.
Rubin DB (1976) Inference and missing values. Biometrika 63:581–592.
Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN (2000) A gene expression database for the molecular pharmacology of cancer. Nat Genet 24:236–244.
Selaru FM, Xu Y, Yin J, Zou T, Liu TC, Mori Y, Abraham JM, Sato F, Wang S, Twigg C, Olaru A, Shustova V, Leytin A, Hytiroglou P, Shibata D, Harpaz N, Meltzer SJ (2002) Artificial neural networks distinguish among subtypes of neoplastic colorectal lesions. Gastroenterology 122(3):606–613.
Seo J, Shneiderman B (2002) Interactively exploring hierarchical clustering Results. IEEE Computer (35)7:80–86
Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, Eisen MB, Spellman PT, Brown PO, Botstein D, Cherry JM (2001) The Stanford Microarray Database. Nucleic Acids Res (1): 152-155.
Silvescu A, Honavar V (2001) Temporal Boolean Network Models of Genetic Networks and their inference from gene expression time series. Complex Syst (13)1:54–75.
Skurichina M, Duin RPW (1998) Bagging for linear classifiers. Pattern Recognition 31(7):909–930.
Skurichina M, Duin RPW (2002) Bagging, boosting and the random sample method for linear classifiers. Pattern Analysis & Appli (5): 121–135.
Sneath PHA, Sokal RR (1973) Numerical Taxonomy. San Francisco: Freeman & Co., Publishers.
Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Sci. Bull. University of Kansas 38:1409–1438.
Soukas A, Cohen P, Socci ND, Friedman JM (2000) Leptin-specific patterns of gene expression in white adipose tissue. Genes & Development 14:963–980.
Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R, Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ Jr, Brazma A (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biology 3(9):research0046.1-0046.9.
Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9:3273–3297.
Spruill SE, Lu J, Hardy S, Weir B (2002) Assessing sources of variability in gene expression data. Biotechniques 33:916–923.
Stoeckert CJ, Causton HC, Ball CA (2002) Microarray databases: standards and ontologies. Nat Genet. Suppl 2:469–73.
Strohman R (2002) Maneuvering in the complex path from genotype to phenotype. Science 296:701–702.
Su AI, Welsh JB, Sapinoso LM, Kern SG, Dimitrov P, Lapp H, Schultz PG, Powell SM, Moskaluk CA, Frierson HFJr, Hampton GM (2001) Molecular Classification of Human Carcinomas by Use of Gene Expression Signatures. Cancer Res 61:7388–7393
Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci. USA 96(6):2907–2912.
Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM (1999) Systematic determination of genetic network architecture. Nat Genet 22:281–285.
Thomas R (1991) Regulatory networks seen as asynchronous automata: A biological Description. J Theor Biol (153): 1–23.
Thomas RS, Rank DR, Penn SG, Zastrow GM, Hayes KR, Pande K, Glover E, Silander T, Craven MW, Reddy JK, Jovanovich SB, Bradfield CA. (2001) Identification of toxicologically predictive gene sets using cDNA microarrays. Mol. Pharmacol 60:1189–1194.
Törönen P, Kolehmainen M, Wong G, Castrén E (1999) Analysis of gene expression data using self-organizing maps. FEBS Lett 451(2): 142–146.
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525.
Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Nat. Acad. Sci. USA (98)9:5116–5121.
Valdivia-Granda WA, Deckard E, Perrizo W (2002) Peano Count Trees (P-Trees) and Rule Association Mining for Gene Expression Profiling of DNA Microarray Data. Proc. Inter Conf in Bioinformatics. Bangkok, Thailand OstraAna08.
Vapnik V (1995) The Nature of Statistical Learning Theory. Berlin: Springer-Verlag.
Wagner A (1998) The fate of duplicated genes: loss or new function? BioEssays 20:785–788.
Woolf PJ, Wang Y (2002) A fuzzy logic approach to analysing gene expression data. Physiol Genomics 3:9–15.
Yeung KY, Haynor DR, Ruzzo W (2001a) Validating clustering for gene expression data. Bioinformatics (17)4:309–318.
Yeung KY, Ruzzo W (2001) Principal component analysis for clustering for gene expression data. Bioinformatics (17)9:763–774.
Yue H, Eastman PS, Wang B, Minor J, Doctolero MH, Nuttal R, Stack R, Becker JW, Montgomery JR, Vainer M, Johnston R. (2001) An evaluation of the performance of cDNA microarrays for detecting changes in global mRNA expression. Nucleic Acids Res (29) 8:e41.
Zhang K, Zhao H (2000) Assessing reliability of gene clusters from gene expression data. Funct Integr Genomics 1(3):156–173.
Zhang Z, Harrison P, Gerstein M (2002) Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res 12(10):1466–1482.
Zhao LP, Prentice R, Breeden L (2001) Statistical modeling of large microarray datasets to identify stimulus-response profiles. Proc. Nat. Acad. Sci. USA (98)10:5631–5636.


Author information

Authors and Affiliations

Department of Plant Pathology, North Dakota State University Genomics and Bioinformatics Group, USA
Willy Valdivia Granda


Editor information

Editors and Affiliations

University of Kentucky Medical Center, USA
Eric M. Blalock


About this chapter

Cite this chapter

Granda, W.V. (2003). Strategies for Clustering, Classifying, Integrating, Standardizing and Visualizing Microarray Gene Expression Data. In: Blalock, E.M. (ed.) A Beginner’s Guide to Microarrays. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8760-0_8


DOI: https://doi.org/10.1007/978-1-4419-8760-0_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-4684-5
Online ISBN: 978-1-4419-8760-0
eBook Packages: Springer Book Archive
