
Strategies for Clustering, Classifying, Integrating, Standardizing and Visualizing Microarray Gene Expression Data

Chapter in: A Beginner’s Guide to Microarrays

Abstract

Over the last century, investigation of the anatomical and morphological characteristics of a small number of organisms has played an important role in our understanding of numerous biological processes. The rediscovery of Mendel’s laws of heredity at the beginning of the 20th century initiated a scientific quest to understand how genetic information is transmitted and what the biological consequences of genetic variation are. With the emergence of molecular biology over the last thirty years, classical genetic research has shifted from understanding how visible traits are transmitted to studying genome structure at the molecular level. Innovations such as PCR, together with advances in robotics such as miniaturization and parallelization, have led to the rapid development of more accurate, sensitive and powerful devices for analyzing the molecular structure, function and interactions of gene products. With the development of ultrahigh-throughput screening tools and the move from 96-well to 384- and 1536-well microplates, the generation of complete genome sequences for different organisms is expected to increase at a rate ~100 times higher than previously anticipated (Head-Gordon and Wooley, 2001; Helfrich, 2002; Benson et al., 2002). As powerful, automated biological sampling and analytical tools become available to more laboratories, they produce an exponential and sometimes overwhelming accumulation of multi-format post-genomic datasets. Consequently, modern biology is becoming a data-driven, multidisciplinary science in which biologists, mathematicians, statisticians, physicists and computer scientists develop tools to identify the coding regions of genes in silico, predict and model protein structural characteristics, define protein-protein interactions, construct biochemical networks and identify potential drug targets.


References

  1. Aach J, Rindone W, Church GM (2000) Systematic management and analysis of yeast gene expression data. Genome Res 10:431–445.

  2. Achard F, Vaysseix G, Barillot E (2001) XML, bioinformatics and data integration. Bioinformatics 17(2):115–125.

  3. Aggarwal CC (2002) Towards effective and interpretable data mining by visual interaction. SIGKDD Explorations 3(2):11–34.

  4. Akutsu T, Miyano S, Kuhara S (2000) Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics 16:727–734.

  5. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Staudt LM, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503–511.

  6. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis. Proc Natl Acad Sci USA 96(12):6745–6750.

  7. Alter O, Brown P, Botstein D (2000) Singular value decomposition for genome-wide expression data processing and modelling. Proc Natl Acad Sci USA 97(18):10101–10106.

  8. Ambroise C, McLachlan G (2002) Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci USA 99(10):6562–6566.

  9. Anderson AB, Basilevsky A, Hum DPJ (1983) Missing data: a review of the literature. In: Rossi PH, Wright JD, Anderson AB (eds) Handbook of Survey Research, pp 415–494. Academic Press.

  10. Aronow BJ, Richardson B, Handwerger S (2001) Microarray analysis of trophoblast differentiation: gene expression reprogramming in key gene function categories. Physiol Genomics 6:105–116.

  11. Azuaje F, Bolshakova N (2002) Clustering genomic expression data: design and evaluation principles. In: Berrar D, Dubitzky W, Granzow M (eds) Understanding and Using Microarray Techniques: A Practical Guide. London: Springer-Verlag.

  12. Baldi P, Brunak S (2001) Bioinformatics: The Machine Learning Approach. Cambridge, MA: MIT Press.

  13. Baldi P, Long A (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17(6):509–519.

  14. Baldi P, Hatfield GW (2002) DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modelling. Cambridge: Cambridge UP.

  15. Barash Y, Friedman N (2002) Context-specific Bayesian clustering for gene expression data. J Comput Biol 9(2):169–191.

  17. Barillot E, Achard F (2000) XML: a lingua franca for science. TIBTECH 18:331–333.

  18. Benson DA, Karsch-Mizrachi I, Lipman D, Ostell J, Rapp BA, Wheeler D (2002) GenBank. Nucleic Acids Res 30:17–20.

  19. Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, Dougherty E, Wang E, Marincola F, Gooden C, Lueders J, Glatfelter A, Pollock P, Gillanders E, Leja D, Dietrich K, Berens M, Alberts D, Sondak V, Hayward N, Trent J (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406:536–540.

  20. Bø TH, Jonassen I (2002) New feature selection procedure for classification of expression profiles. Genome Biology 3(4):research0017.1–0017.11.

  21. Bolshakova N, Azuaje F (2003) Cluster validation for genome expression data. Technical Report TCD-CS-2002-33, Computer Science Department, Trinity College Dublin. http://www.cs.tcd.ie/publications/tech-reports/reports.02/TCD-CS-2002-33.pdf

  22. Bower JM, Bolouri H (2001) Computational modelling of biochemical networks. Cambridge, MA: MIT Press.

  23. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FCP, Kim I, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M (2001) Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature Genet 29:365–371.

  24. Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, Oezcimen A, Rocca-Serra P, Sansone SA (2003) ArrayExpress: a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 31(1):68–71.

  25. Brazma A, Robinson A, Cameron G, Ashburner M (2000) One-stop shop for microarray data. Nature 403:699–700.

  26. Brazma A, Vilo J (2000) Gene expression data analysis. FEBS Lett 480(1):17–24.

  27. Breiman L (1998) Bagging predictors. Technical Report No. 421, Department of Statistics, University of California, Berkeley.

  28. Brody JP, Williams BA, Wold BJ, Quake SR (2002) Significance and statistical errors in the analysis of DNA microarray data. Proc Natl Acad Sci USA 99(20):12975–12978.

  29. Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 97:262–267.

  30. Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2):121–167.

  31. Butte AJ, Tamayo P, Slonim D, Golub T, Kohane I (2000) Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci USA 97(22):12182–12186.

  32. Celis JE, Kruhoffer M, Gromova I, Frederiksen C, Ostergaard M, Thykjaer T, Gromov P, Yu J, Palsdottir H, Magnusson N, Orntoft TF (2000) Gene expression profiling: monitoring transcription and translation products using DNA microarrays and proteomics. FEBS Lett 480(1):2–16.

  33. Cheng Y, Church GM (2000) Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol 8:93–103.

  34. Chilingaryan A, Gevorgyan N, Vardanyan D, Jones D, Szabo A (2002) Paper title. Mathematical Biosciences 176:59–72.

  35. D’haeseleer P (2001) Beyond co-expression: gene network inference. www.cs.unm.edu/~patrick/networks/diss.pdf

  36. D’haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16(8):707–726.

  37. Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology 3(7):research0036.1–0036.21.

  38. Dudoit S, Fridlyand J, Speed TP (2000) Comparison of discrimination methods for the classification of tumors using gene expression data. Technical Report 576, Department of Statistics, University of California, Berkeley.

  39. Dubitzky W, Granzow M, Berrar D (2001) Data mining and machine learning methods for microarray analysis. In: Lin SM, Johnson KF (eds) Methods of Microarray Data Analysis, pp 5–22. Massachusetts: Springer Science+Business Media New York.

  40. Dougherty E, Barrera J, Brun M, Kim S, Cesar RM, Chen Y, Bittner M, Trent M (2002) Inference from clustering with application to gene-expression microarrays. J Comput Biol 9(1):105–126.

  41. Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30(1):207–210.

  42. Efron B, Tibshirani RJ (1993) An Introduction to the Bootstrap. New York: Chapman & Hall.

  43. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95(25):14863–14868.

  44. Fellenberg K, Hauser NC, Brors B, Neutzner A, Hoheisel JD, Vingron M (2001) Correspondence analysis applied to microarray data. Proc Natl Acad Sci USA 98:10781–10786.

  45. Fix E, Hodges J (1951) Discriminatory analysis, nonparametric discrimination: consistency properties. Technical Report, USAF School of Aviation Medicine, Randolph Field, Texas.

  46. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139.

  47. Friedman N, Linial M, Nachman I, Pe’er D (2000) Using Bayesian networks to analyze expression data. J Comput Biol 7(3–4):601–620.

  48. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10):906–914.

  49. Garofalakis M, Hyun D, Rastogi R, Shim K (2000) Efficient algorithms for constructing decision trees with constraints. Proc Sixth ACM SIGKDD, Paper 296.

  50. Geschwind DH (2001) Sharing gene expression data: an array of options. Nature Rev Neurosci 2:435–438.

  51. Getz G, Levine E, Domany E (2000) Coupled two-way clustering analysis of gene microarray data. Proc Natl Acad Sci USA 97(22):12079–12084.

  52. Gilbert DR, Schroeder M, van Helden J (2000) Interactive visualization and exploration of relationships between biological objects. TIBTECH 18:487–494.

  53. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537.

  54. Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D, Sherlock G (2003) The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 31(1):94–96.

  55. Graves DJ (1999) Powerful tools for genetic analysis come of age. TIBTECH 17:127–134.

  56. Guyon I, Weston J, Barnhill S, Vapnik V (2000) Gene selection for cancer classification using support vector machines. Machine Learning 46(1–3):389–422.

  57. Halgren RG, Fielden MR, Fong CJ, Zacharewski TR (2001) Assessment of clone identity and sequence fidelity for 1189 IMAGE cDNA clones. Nucleic Acids Res 29(2):582–588.

  58. Han J, Kamber M (2001) Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann.

  59. Hand DJ (1999) Statistics and data mining: intersecting disciplines. Proc Fifth ACM SIGKDD 1(1):16–19.

  60. Hand DJ, Mannila H, Smyth P (2001) Principles of Data Mining. Cambridge, MA: MIT Press.

  61. Harding J, Rocke DM (2002) Robust model-based clustering of genes in microarray data: are there gene clusters? www.camda.duke.edu/CAMDAOO/Abstracts/Presentations/Poster_13.pdf

  62. Harrison P, Kumar A, Lan N, Echols N, Snyder M, Gerstein M (2002) A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution. J Mol Biol 316:409–419.

  63. Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan WC, Botstein D, Brown PO (2001a) Gene shaving as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology 1(2):research0003.1–0003.21.

  64. Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Berlin: Springer-Verlag.

  65. Hwang KB, Cho DY, Park S, Kim SD, Zhang BT (2002) Applying machine learning techniques to analysis of gene expression data: cancer diagnostics. In: Lin SM, Johnson KF (eds) Methods of Microarray Data Analysis, pp 167–182. Massachusetts: Springer Science+Business Media New York.

  66. Head-Gordon T, Wooley J (2001) Computational challenges in structural and functional genomics. IBM Systems Journal 40(2):265–296.

  67. Helfrich JP (2002) Raw data to knowledge warehouse in proteomic-based drug discovery: a scientific data management issue. BioTechniques Suppl on Comp Proteom:48–53.

  68. Herrero J, Valencia A, Dopazo J (2001) A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics 17(2):126–136.

  69. Heyer LJ, Kruglyak S, Yooseph S (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9:1106–1115.

  70. Hilsenbeck SG, Friedrichs WE, Schiff R, O’Connell P, Hansen RK, Osborne CK, Fuqua SAW (1999) Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. J Natl Cancer Inst 91:453–459.

  71. Holter NS, Maritan A, Cieplak M, Fedoroff NV, Banavar JR (2002) Dynamic modelling of expression data. Proc Natl Acad Sci USA 98(4):1693–1698.

  72. Hvidsten TR, Komorowski J, Sandvik AK, Legreid AL (2001) Predicting gene function from gene expressions and ontologies. In: Altman RB, Dunker AK, Hunter L, Lauderdale K, Klein TE (eds) Pacific Symposium on Biocomputing, pp 299–310. Mauna Lani, Hawaii: World Scientific Publishing Co.

  73. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Computing Surveys 31(3):264–323.

  74. Jamil HM, Modica GA, Teran MA (2001) Towards a visual query interface for phylogenetic databases. Proc CIKM ’01:57–64.

  75. Kanehisa M, Goto S, Kawashima S, Nakaya A (2002) The KEGG databases at GenomeNet. Nucleic Acids Res 30(1):42–46.

  76. Kauffman SA (1998) Investigations. New York: Oxford UP.

  77. Kazic T (2000) Semiotes: a semantics for sharing. Bioinformatics 16(12):1129–1144.

  78. Keller DA, Schummer M, Hood L, Ruzzo WL (2000) Bayesian classification of DNA array expression data. Technical Report UW-CSE-2000-08-01.

  79. Kerr MK, Churchill GA (2001) Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. Proc Natl Acad Sci USA 98(16):8961–8965.

  80. Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673–679.

  81. Kitano H (2002) Computational systems biology. Nature 420:206–210.

  82. Kitano H (2002a) Foundations of Systems Biology. Cambridge, MA: MIT Press.

  83. Kohonen T (1981) Automatic formation of topological maps of patterns in a self-organizing system. In: Proc Second Scandinavian Conf on Image Analysis, pp 214–220.

  84. Kohonen T (1997) Self-Organizing Maps. Berlin: Springer-Verlag.

  85. Kothapalli R, Yoder SJ, Mane S, Loughran TP Jr (2002) Microarray results: how accurate are they? BMC Bioinformatics 3:22.

  86. Kuo WP, Jenssen T, Butte AJ, Ohno-Machado L, Kohane IS (2002) Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics 18:405–412.

  87. Kuramochi M, Karypis G (2001) Gene classification using expression profiles: a feasibility study. Technical Report 01-029, Department of Computer Science/Army HPC Research Center.

  88. Landgrebe J, Wurst W, Welzl G (2002) Permutation-validated principal components analysis of microarray data. Genome Biol 3(4):research0019.

  89. Lee MT, Kuo FFC, Whitmore GA, Sklar J (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA 97(18):9834–9839.

  90. Li L, Weinberg CR, Darden TA, Pedersen LA (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN algorithms. Bioinformatics 17(12):1131–1142.

  91. Liang S, Fuhrman S, Somogyi R (1998) REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput 18–29.

  92. Little RJA, Rubin DB (1987) Statistical Analysis with Missing Data. New York: John Wiley & Sons.

  93. Lockhart DJ, Winzeler EA (2000) Genomics, gene expression and DNA arrays. Nature 405:827–836.

  94. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1:281–297.

  95. Mendez MA, Hodar C, Vulpe C, Gonzalez M, Cambiazo V (2002) Discriminant analysis to evaluate clustering of gene expression data. FEBS Lett 522(1–3):24–28.

  96. Model F, König T, Piepenbrock C, Adorjan P (2002) Statistical process control for large scale microarray experiments. Bioinformatics 155–163.

  97. Moler EJ, Chow ML, Mian IS (2000) Analysis of molecular profile data using generative and discriminative methods. Physiol Genomics 4:109–126.

  98. Mukherjee S (2002) Classifying microarray data using support vector machines. In: Berrar DP, Dubitzky W, Granzow M (eds) A Practical Approach to Microarray Data Analysis. Boston: Springer Science+Business Media New York.

  99. Mukherjee S, Tamayo P, Mesirov JP, Slonim D, Verri A, Poggio T (1999) Support vector machine classification of microarray data. CBCL Paper 182/AI Memo 1676, Massachusetts Institute of Technology, Cambridge, MA.

  100. Mutch DM, Berger A, Mansourian R, Rytz A, Roberts MA (2002) The limit of the fold change: a practical approach for selecting differentially expressed genes from microarray data. BMC Bioinformatics 3:17.

  101. Nadon R, Shoemaker J (2002) Statistical issues with microarrays: processing and analysis. Trends Genet 18(5):265–271.

  102. Pan K, Lih C, Cohen SN (2002) Analysis of DNA microarrays using algorithms that employ rule-based expert knowledge. Proc Natl Acad Sci USA 99(4):2118–2123.

  103. Pavlidis P, Weston J, Cai J, Grundy WN (2001) Gene functional classification from heterogeneous data. RECOMB 2001: Proc Fifth Annu Int Conf Comput Biol, pp 249–255.

  104. Peterson LE (2003) Partitioning large-sample microarray-based gene expression profiles using principal components analysis. Comput Methods Programs Biomed 70(2):107–119.

  105. Proudfoot N (1980) Pseudogenes. Nature 286(5776):840–841.

  106. Qi H (2002) Feature selection and kNN fusion in molecular classification of multiple tumor types. Proc Int Conf on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS’02). http://aicip.ece.utk.edu/publication/02metmbs.pdf

  107. Quackenbush J (2001) Computational analysis of microarray data. Nat Rev Genet 2(6):418–427.

  108. Ramoni M, Sebastiani P (1998) Bayesian methods for intelligent data analysis. KMi Technical Report KMi-TR-67, The Open University.

  109. Ramoni M, Sebastiani P, Kohane IS (2002) Cluster analysis of gene expression dynamics. Proc Natl Acad Sci USA 99(14):9121–9126.

  110. Ravasz E, Somera L, Mongru DA, Oltvai N, Barabási AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551–1555.

  111. Raychaudhuri S, Stuart JM, Altman RB (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. In: Altman RB, Dunker AK, Hunter L, Lauderdale K, Klein TE (eds) Pac Symp Biocomput 5:452–463. Mauna Lani, Hawaii: World Scientific Publishing Co.

  112. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T, Gerald W, Loda M, Lander ES, Golub TR (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 98(26):15149–15154.

  113. Raymond MR, Roberts DM (1987) A comparison of methods for treating incomplete data in selection research. Educational and Psychological Measurement 47:13–26.

  114. Reed RD, Marks RJ II (1998) Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. Cambridge, MA: MIT Press.

  115. Rifkin SA, Atteson K, Kim J (2000) Constraint structure analysis of gene expression. Funct Integr Genomics 1:174–185.

  116. Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, Van de Rijn M, Waltham M, Pergamenschikov A, Lee JC, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24:227–235.

  117. Rubin DB (1976) Inference and missing data. Biometrika 63:581–592.

  118. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN (2000) A gene expression database for the molecular pharmacology of cancer. Nat Genet 24:236–244.

  119. Selaru FM, Xu Y, Yin J, Zou T, Liu TC, Mori Y, Abraham JM, Sato F, Wang S, Twigg C, Olaru A, Shustova V, Leytin A, Hytiroglou P, Shibata D, Harpaz N, Meltzer SJ (2002) Artificial neural networks distinguish among subtypes of neoplastic colorectal lesions. Gastroenterology 122(3):606–613.

  120. Seo J, Shneiderman B (2002) Interactively exploring hierarchical clustering results. IEEE Computer 35(7):80–86.

  121. Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, Eisen MB, Spellman PT, Brown PO, Botstein D, Cherry JM (2001) The Stanford Microarray Database. Nucleic Acids Res 29(1):152–155.

  122. Silvescu A, Honavar V (2001) Temporal Boolean network models of genetic networks and their inference from gene expression time series. Complex Systems 13(1):54–75.

  123. Skurichina M, Duin RPW (1998) Bagging for linear classifiers. Pattern Recognition 31(7):909–930.

  124. Skurichina M, Duin RPW (2002) Bagging, boosting and the random subspace method for linear classifiers. Pattern Anal Appl 5:121–135.

  125. Sneath PHA, Sokal RR (1973) Numerical Taxonomy. San Francisco: Freeman & Co.

  126. Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kansas Sci Bull 38:1409–1438.

  127. Soukas A, Cohen P, Socci ND, Friedman JM (2000) Leptin-specific patterns of gene expression in white adipose tissue. Genes Dev 14:963–980.

  128. Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R, Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ Jr, Brazma A (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biology 3(9):research0046.1–0046.9.

  129. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9:3273–3297.

  130. Spruill SE, Lu J, Hardy S, Weir B (2002) Assessing sources of variability in gene expression data. Biotechniques 33:916–923.

  131. Stoeckert CJ, Causton HC, Ball CA (2002) Microarray databases: standards and ontologies. Nat Genet Suppl 2:469–473.

  132. Strohman R (2002) Maneuvering in the complex path from genotype to phenotype. Science 296:701–702.

  133. Su AI, Welsh JB, Sapinoso LM, Kern SG, Dimitrov P, Lapp H, Schultz PG, Powell SM, Moskaluk CA, Frierson HF Jr, Hampton GM (2001) Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res 61:7388–7393.

  134. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96(6):2907–2912.

  135. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM (1999) Systematic determination of genetic network architecture. Nat Genet 22:281–285.

  136. Thomas R (1991) Regulatory networks seen as asynchronous automata: a logical description. J Theor Biol 153:1–23.

  137. Thomas RS, Rank DR, Penn SG, Zastrow GM, Hayes KR, Pande K, Glover E, Silander T, Craven MW, Reddy JK, Jovanovich SB, Bradfield CA (2001) Identification of toxicologically predictive gene sets using cDNA microarrays. Mol Pharmacol 60:1189–1194.

  138. Törönen P, Kolehmainen M, Wong G, Castrén E (1999) Analysis of gene expression data using self-organizing maps. FEBS Lett 451(2):142–146.

  139. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525.

  140. Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98(9):5116–5121.

  141. Valdivia-Granda WA, Deckard E, Perrizo W (2002) Peano count trees (P-trees) and rule association mining for gene expression profiling of DNA microarray data. Proc Int Conf in Bioinformatics, Bangkok, Thailand. OstraAna08.

  142. Vapnik V (1995) The Nature of Statistical Learning Theory. Berlin: Springer-Verlag.

  143. Wagner A (1998) The fate of duplicated genes: loss or new function? BioEssays 20:785–788.

  144. Woolf PJ, Wang Y (2002) A fuzzy logic approach to analysing gene expression data. Physiol Genomics 3:9–15.

  145. Yeung KY, Haynor DR, Ruzzo W (2001a) Validating clustering for gene expression data. Bioinformatics 17(4):309–318.

  146. Yeung KY, Ruzzo W (2001) Principal component analysis for clustering gene expression data. Bioinformatics 17(9):763–774.

  147. Yue H, Eastman PS, Wang B, Minor J, Doctolero MH, Nuttal R, Stack R, Becker JW, Montgomery JR, Vainer M, Johnston R (2001) An evaluation of the performance of cDNA microarrays for detecting changes in global mRNA expression. Nucleic Acids Res 29(8):e41.

  148. Zhang K, Zhao H (2000) Assessing reliability of gene clusters from gene expression data. Funct Integr Genomics 1(3):156–173.

  149. Zhang Z, Harrison P, Gerstein M (2002) Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res 12(10):1466–1482.

  150. Zhao LP, Prentice R, Breeden L (2001) Statistical modeling of large microarray datasets to identify stimulus-response profiles. Proc Natl Acad Sci USA 98(10):5631–5636.



Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Granda, W.V. (2003). Strategies for Clustering, Classifying, Integrating, Standardizing and Visualizing Microarray Gene Expression Data. In: Blalock, E.M. (eds) A Beginner’s Guide to Microarrays. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8760-0_8


  • DOI: https://doi.org/10.1007/978-1-4419-8760-0_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-4684-5

  • Online ISBN: 978-1-4419-8760-0

  • eBook Packages: Springer Book Archive
