epiGeniusHD

In the epiGeniusHD project we investigate the role of chromatin and epigenetic markers in Huntington’s Disease, particularly in the deregulated gene expression patterns that characterize the disease.

The project was made possible by the Netherlands Bioinformatics Centre and the Human Genetics Department of the Leiden University Medical Center.

Abstract

Epigenetic phenomena such as DNA methylation, histone modifications, and chromatin structure influence phenotype and gene expression. Epigenetic changes can cause long-term effects on health and may have a pivotal role in disease etiology. We are only beginning to understand the molecular basis of these effects, for which we need to reconcile specific hypotheses about biomolecular mechanisms with the large amounts of data and knowledge that are available through a highly heterogeneous set of resources. A representative example is Huntington’s Disease (HD). HD is principally caused by a straightforward genetic aberration (a CAG repeat expansion in the HD gene), but the downstream mechanisms are still poorly understood and no cure is available yet. There is preliminary evidence, generated in our laboratories and others, that epigenetic mechanisms play an important role. New hypotheses need to be formulated that integrate knowledge beyond a single domain of expertise. We can support this process computationally by combining several technologies following an e-Science approach that helps to exploit multidisciplinary expertise. In a cycle that emphasizes continuous communication between biology and technology experts, we employ state-of-the-art text mining and data integration in the form of repeatable workflows whilst incrementally building a knowledge base of structured, machine-readable knowledge that links to other knowledge resources across the web. Semantic Web tools allow us to search for novel relations across experiments, our own and those of others. By composing the epiGenius ‘e-Laboratory’ we leverage our technological advancements for biological experts and stimulate collaboration between scientists with different backgrounds (computational scientists, bench biologists, medical doctors).
We anticipate that this approach will result in breakthroughs in the elucidation of epigenetic mechanisms in the context of HD and beyond, which in turn can lead to recommendations for medical research.

Background

Epigenetic phenomena are a key factor in orchestrating the gene expression patterns that determine cellular identity. Chemical modifications of nucleotides and DNA-binding proteins, especially histones, form an as yet poorly understood epigenetic ‘code’ that is heritable through mitosis and via germ cells. Aberrations in this code can have a substantial influence on health [1, 2]. External factors may cause these aberrations. For example, some nutrients have an effect on epigenetic gene control that appears to affect aging, brain development, obesity, forms of cancer [3-5], and, through exposure in utero, even susceptibility to adult disease [6, 7]. Epigenetic changes possibly underlie the aberrant patterns of transcriptional activity that are observed in many diseases. A good example is Huntington’s Disease (HD). The genetic cause of this disease, an expansion of CAG repeats in the gene for huntingtin, is clearly identified, but the downstream molecular mechanisms leading to the HD phenotype are still poorly understood. Changes in mRNA levels for various proteins and receptors are visible in early grade HD brains before any recognizable neuropathology [8-10] and are associated with neuronal dysfunction prior to neuronal death [11], while the pattern of transcriptional pathology in the different brain regions agrees with the pattern of neurodegeneration [12]. The relation between gene expression and HD pathology was confirmed in animal models [13, 14]. New hypotheses that take epigenetic mechanisms into account may explain these observations more comprehensively. For instance, mutant huntingtin may cause pathologic changes by interacting with transcription factors [15-17] and histones H3 and H4 [18] or by directly interfering with DNA; there is evidence that huntingtin affects DNA conformation and transcription factor binding by occupying gene promoters in vivo in a polyglutamine-dependent manner [19].
At the same time, ensuring that all potentially relevant facts in literature and databases are considered when conceiving these hypotheses has become extremely difficult. Looking at literature (PubMed) alone, we find that the number of publications that mention ‘epigenetics’ has grown since the term’s first mention in 1964 to over 30,000 [20], a conservative estimate considering that many key epigenetic components were previously studied in different contexts (e.g. ‘Histone deacetylase’ without ‘epigenetics’ adds another 6,000 publications). It is clear that new technologies for systematic data analysis, together with mechanisms that support multidisciplinary collaboration, are needed to bridge the gap between hypothesis and data. A number of developments in e-Science (see Box 1 for a glossary) aim to provide such support and are ready to be tested for challenging applications in life science.
(i) Biological assertions in text can be extracted automatically from the large volume of biomedical literature by matching terms to predefined terminologies [21, 22], or by machine learning techniques [20, 23, 24]. Additional relations can be predicted by statistically comparing ‘concept profiles’ (Box 1 & [25-27]). We recently extended the predictive power of this method by statistically incorporating data sources other than literature (Haagen et al., manuscript in preparation).
(ii) The Linked Data movement (http://linkeddata.org; [28]) and the Concept Web Alliance help to create a machine-readable semantic web of data, following the same principles that created the human-readable World Wide Web [29, 30]. This enhances our potential to investigate hypotheses [31-33].
(iii) Semantic models or ontologies stored using the Resource Description Framework (RDF; [34]) and the Web Ontology Language (OWL; [35]) represent knowledge in a form that can be used for the digital conservation of bioinformatics methods and results, and for machine inference across the web [36, 37]. RDF is the model of choice for Linked Data.
(iv) Workflow systems offer a platform for the design of computational experiments that run services created by diverse experts across the Internet [38] or a grid [39, 40]. A workflow can be seen as a digital analogue of a wet laboratory protocol. This approach has been used for data integration [41, 42], systematic analysis of micro-array data [43], and text mining supported by Semantic Web tools [24]. The workflow tool Taverna is being extended to produce Semantically Linked Data directly, among others for epiGeniusHD (matching activity).
(v) Community web sites such as myExperiment.org [44] and BioCatalogue.org [45] support the social aspect of scientific collaboration [46-48]. This is extended by ‘e-laboratories’ that leverage e-Science technologies for specific communities of domain users [46].
Apart from developing technology that supports collaboration, we have also gained experience in social mechanisms that support multidisciplinary collaboration. Inspired by the success of an open and agile approach, including the early involvement of ‘power-users’, epiGeniusHD also emphasizes communication and short feedback cycles between selected epigenetics/HD experts, computer scientists, and software engineers. We expect that the application of this approach and the aforementioned technologies will enhance the way we conceive hypotheses and lead to breakthroughs in our understanding of the role of epigenetics in HD.
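
To illustrate the comparison of ‘concept profiles’ mentioned under (i): Box 1 is not reproduced here, so the following minimal Python sketch assumes that a concept profile is a mapping from associated concepts to weights, and that two profiles are compared by cosine similarity. The function names, concept labels, and weights are hypothetical placeholders, and the scoring used in [25-27] may differ.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two concept profiles (dicts: concept -> weight)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


def rank_candidates(query_profile, candidate_profiles):
    """Rank candidate concepts by how closely their profiles match the query."""
    scored = [
        (name, cosine_similarity(query_profile, profile))
        for name, profile in candidate_profiles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up profiles; labels and weights are illustrative only.
    query = {"concept_1": 0.8, "concept_2": 0.5}
    candidates = {
        "candidate_A": {"concept_1": 0.7, "concept_3": 0.2},
        "candidate_B": {"concept_4": 0.9},
    }
    for name, score in rank_candidates(query, candidates):
        print(name, round(score, 3))
```

A high score between two concepts that never co-occur in a single publication is the kind of signal that could suggest a relation worth examining; whether it is meaningful remains a question for the domain experts.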

Objectives

The main aim of our research is to enhance the process of conceiving and testing hypotheses about the role of epigenetic mechanisms in HD pathology through an e-Science approach.

Key objectives:

  • to explore e-Science tools and a collaborative e-Science research cycle for a real-life application in life science that will benefit from bridging the gaps between biologists and technologists, and between hypothesis and data.
  • to extend our understanding of epigenetic mechanisms in HD by applying a combination of workflow and semantic technologies for (i) text mining and data integration experiments, (ii) ‘post-hoc’ analysis of semantically linked data across experiments, (iii) leveraging workflows and semantically linked data in an e-laboratory for the epigenetics/HD community.
  • to develop an approach for bridging the gap between specific mechanistic hypotheses and the results from data-driven analysis by placing hypotheses and mining results in one semantic framework.
  • to learn how to extract new information from linked data to find, for instance:
    • novel epigenetic factors that explain HD pathology
    • new evidence for hypotheses that suggest a role for epigenetics in HD
    • more specific drug targets than for instance HDAC inhibitors
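
The idea of extracting new information from linked data can be sketched with a toy example. The following Python sketch is not the project's implementation: it assumes that statements are stored as plain (subject, predicate, object) tuples that mimic RDF triples, and all identifiers (the `ex:` names, `match`, `two_step_paths`) are hypothetical. A real setup would more likely use an RDF store queried with SPARQL.

```python
def match(triples, s=None, p=None, o=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def two_step_paths(triples, start, end):
    """Find indirect links of the form start -p1-> mid -p2-> end."""
    paths = []
    for _, p1, mid in match(triples, s=start):
        for _, p2, _ in match(triples, s=mid, o=end):
            paths.append((start, p1, mid, p2, end))
    return paths


if __name__ == "__main__":
    # Made-up statements, imagined as coming from two separate experiments.
    experiment_1 = [("ex:proteinA", "ex:interactsWith", "ex:factorB")]
    experiment_2 = [("ex:factorB", "ex:regulates", "ex:geneC")]
    linked = experiment_1 + experiment_2
    for path in two_step_paths(linked, "ex:proteinA", "ex:geneC"):
        print(" -> ".join(path))
```

Neither toy experiment relates `ex:proteinA` to `ex:geneC` on its own; the indirect link only appears once the statements share identifiers and are queried together, which is the motivation for placing hypotheses and mining results in one semantic framework.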

References

1. Egger G, Liang G, Aparicio A, Jones PA: Epigenetics in human disease and prospects for epigenetic therapy. Nature 2004, 429:457-463.

2. Feinberg AP: Phenotypic plasticity and the epigenetics of human disease. Nature 2007, 447:433-440.

3. Garfinkel MD, Ruden DM: Chromatin effects in nutrition, cancer, and obesity. Nutrition 2004, 20:56-62.

4. Liu L, Wylie RC, Andrews LG, Tollefsbol TO: Aging, cancer and nutrition: the DNA methylation connection. Mech Ageing Dev 2003, 124:989-98.

5. Liu L, van Groen T, Kadish I, Tollefsbol TO: DNA methylation impacts on learning and memory in aging. Neurobiol Aging 2007.

6. Mathers JC: Early nutrition: impact on epigenetics. Forum Nutr 2007, 60:42-8.

7. Waterland RA, Jirtle RL: Early nutrition, epigenetic changes at transposons and imprinted genes, and enhanced susceptibility to adult chronic diseases. Nutrition 2004, 20:63-8.

8. Augood SJ, Faull RL, Love DR, Emson PC: Reduction in enkephalin and substance P messenger RNA in the striatum of early grade Huntington’s disease: a detailed cellular in situ hybridization study. Neuroscience 1996, 72:1023-36.

9. Augood SJ, Faull RL, Emson PC: Dopamine D1 and D2 receptor gene expression in the striatum in Huntington’s disease. Annals of neurology 1997, 42:215-21.

10. Norris PJ, Waldvogel HJ, Faull RL, Love DR, Emson PC: Decreased neuronal nitric oxide synthase messenger RNA and somatostatin messenger RNA in the striatum of Huntington’s disease. Neuroscience 1996, 72:1037-47.

11. Cha JH: Transcriptional signatures in Huntington’s disease. Progress in neurobiology 2007, 83:228-48.

12. Hodges A, Strand AD, Aragaki AK, Kuhn A, Sengstag T, Hughes G, Elliston LA, Hartog C, Goldstein DR, Thu D, Hollingsworth ZR, Collin F, Synek B, Holmans PA, Young AB, Wexler NS, Delorenzi M, Kooperberg C, Augood SJ, Faull RL, Olson JM, Jones L, Luthi-Carter R: Regional and cellular gene expression changes in human Huntington’s disease brain. Human molecular genetics 2006, 15:965-77.

13. Kuhn A, Goldstein DR, Hodges A, Strand AD, Sengstag T, Kooperberg C, Becanovic K, Pouladi MA, Sathasivam K, Cha JH, Hannan AJ, Hayden MR, Leavitt BR, Dunnett SB, Ferrante RJ, Albin R, Shelbourne P, Delorenzi M, Augood SJ, Faull RL, Olson JM, Bates GP, Jones L, Luthi-Carter R: Mutant huntingtin’s effects on striatal gene expression in mice recapitulate changes observed in human Huntington’s disease brain and do not differ with mutant huntingtin length or wild-type huntingtin dosage. Human molecular genetics 2007, 16:1845-61.

14. Luthi-Carter R, Strand AD, Hanson SA, Kooperberg C, Schilling G, La Spada AR, Merry DE, Young AB, Ross CA, Borchelt DR, Olson JM: Polyglutamine and transcription: gene expression changes shared by DRPLA and Huntington’s disease mouse models reveal context-independent effects. Human molecular genetics 2002, 11:1927-37.

15. van Roon-Mom WM, Reid SJ, Jones AL, MacDonald ME, Faull RL, Snell RG: Insoluble TATA-binding protein accumulation in Huntington’s disease cortex. Brain research 2002, 109:1-10.

16. Nucifora FC, Sasaki M, Peters MF, Huang H, Cooper JK, Yamada M, Takahashi H, Tsuji S, Troncoso J, Dawson VL, Dawson TM, Ross CA: Interference by huntingtin and atrophin-1 with cbp-mediated transcription leading to cellular toxicity. Science 2001, 291:2423-8.

17. Dunah AW, Jeong H, Griffin A, Kim YM, Standaert DG, Hersch SM, Mouradian MM, Young AB, Tanese N, Krainc D: Sp1 and TAFII130 transcriptional activity disrupted in early Huntington’s disease. Science 2002, 296:2238-43.

18. Hazeki N, Tsukamoto T, Yazawa I, Koyama M, Hattori S, Someki I, Iwatsubo T, Nakamura K, Goto J, Kanazawa I: Ultrastructure of nuclear aggregates formed by expressing an expanded polyglutamine. Biochemical and biophysical research communications 2002, 294:429-40.

19. Benn CL, Sun T, Sadri-Vakili G, McFarland KN, DiRocco DP, Yohrling GJ, Clark TW, Bouzou B, Cha JH: Huntingtin modulates transcription, occupies gene promoters in vivo, and binds directly to DNA in a polyglutamine-dependent manner. J Neurosci 2008, 28:10720-33.

20. Kolarik C, Klinger R, Hofmann-Apitius M: Identification of histone modifications in biomedical text for supporting epigenomic research. BMC bioinformatics 2009, 10 Suppl 1:S28.

21. Spasic I, Ananiadou S, McNaught J, Kumar A: Text mining and ontologies in biomedicine: making sense of raw text. Briefings in bioinformatics 2005, 6:239-51.

22. Kors JA, Schuemie MJ, Schijvenaars B, Weeber M, Mons B: Combination of Genetic Databases for Improving Identification of Genes and Proteins in Text. In BioLINK. Detroit, Michigan, USA: 2005.

23. Katrenko S, Adriaans P: Using Semi-Supervised Techniques to Detect Gene Mentions. In Second BioCreative Challenge Workshop. 2007.

24. Roos M, Marshall MS, Gibson AP, Schuemie M, Meij E, Katrenko S, van Hage WR, Krommydas K, Adriaans PW: Structuring and extracting knowledge for the support of hypothesis generation in molecular biology. BMC bioinformatics 2009, 10 Suppl 10:S9.

25. Jelier R, ‘t Hoen PA, Sterrenburg E, den Dunnen JT, van Ommen GJ, Kors JA, Mons B: Literature-aided meta-analysis of microarray data: a compendium study on muscle development and disease. BMC bioinformatics 2008, 9:291.

26. van Haagen HHHBM, ‘t Hoen PAC, Botelho Bovo A, de Morrée A, van Mulligen EM, Chichester C, Kors JA, den Dunnen JT, van Ommen GB, van der Maarel SM, Kern VM, Mons B, Schuemie MJ: Novel protein-protein interactions inferred from literature context. PLoS ONE 2009, 4:e7894.

27. Jelier R, Jenster G, Dorssers LC, Wouters BJ, Hendriksen PJ, Mons B, Delwel R, Kors JA: Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation. BMC bioinformatics 2007, 8:14.

28. Linked Data | Linked Data – Connect Distributed Data across the Web [http://linkeddata.org/].

29. Mons B, Velterop J: Nano-Publication in the e-science era. In Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse (SWASD 2009). Washington DC, USA: CEUR-WS; 2009, 523:14.

30. Mons B, Ashburner M, Chichester C, van Mulligen E, Weeber M, den Dunnen J, van Ommen GJ, Musen M, Cockerill M, Hermjakob H, Mons A, Packer A, Pacheco R, Lewis S, Berkeley A, Melton W, Barris N, Wales J, Meijssen G, Moeller E, Roes PJ, Borner K, Bairoch A: Calling on a million minds for community annotation in WikiProteins. Genome biology 2008, 9:R89.

31. Cheung KH, Yip KY, Smith A, Deknikker R, Masiar A, Gerstein M: YeastHub: a semantic web use case for integrating data in the life sciences domain. Bioinformatics 2005, 21 Suppl 1:i85-96.

32. Marshall M, Post L, Roos M, Breit T: Using Semantic Web Tools to Integrate Experimental Measurement Data on Our Own Terms. In On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops. 2006:679-688 [http://dx.doi.org/10.1007/11915034_92].

33. Post LJ, Roos M, Marshall MS, van Driel R, Breit TM: A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data. Bioinformatics 2007, 23:3080-7.

34. Resource Description Framework (RDF) Primer [http://www.w3.org/TR/rdf-primer/].

35. OWL Web Ontology Language [http://www.w3.org/TR/owl-features/].

36. Antoniou G: A semantic Web primer. Cambridge Mass.: MIT Press; 2004.

37. Neumann E, Miller E, Wilbanks J: What the semantic web could do for the life sciences. Drug Discovery Today: BIOSILICO 2004, 2:228-236.

38. Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T: Taverna: a tool for building and running workflows of services. Nucl. Acids Res. 2006, 34:W729-W732.

39. The VLAM-G Abstract Machine: A Data and Process Handling System on the Grid [http://dx.doi.org/10.1007/3-540-48228-8_9].

40. Inda MA, van Batenburg MF, Roos M, Belloum AS, Vasunin D, Wibisono A, van Kampen AH, Breit TM: SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences. BMC research notes 2008, 1:63.

41. Romano P: Automation of in-silico data analysis processes through workflow management systems. Brief. Bioinformatics 2008, 9:57-68.

42. Stein LD: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nature reviews 2008, 9:678-88.

43. Fisher P, Hedeler C, Wolstencroft K, Hulme H, Noyes H, Kemp S, Stevens R, Brass A: A systematic strategy for large-scale analysis of genotype phenotype correlations: identification of candidate genes involved in African trypanosomiasis. Nucleic Acids Res 2007, 35:5625-33.

44. Goble CA, Bhagat J, Aleksejevs S, Cruickshank D, Michaelides D, Newman D, Borkum M, Bechhofer S, Roos M, Li P, De Roure D: myExperiment: a repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Res 2010.

45. Bhagat J, Tanoh F, Nzuobontane E, Laurent T, Orlowski J, Roos M, Wolstencroft K, Aleksejevs S, Stevens R, Pettifer S, Lopez R, Goble CA: BioCatalogue: a universal catalogue of web services for the life sciences. Nucleic Acids Res 2010.

46. De Roure D, Goble C, Aleksejevs S, Bechhofer S, Bhagat J, Cruickshank D, Fisher P, Hull D, Michaelides D, Newman D, Proctor R, Poschen M: Towards Open Science: The myExperiment approach. Concurrency and Computation: Practice and Experience 2009 (in press).

47. De Roure D, Goble C, Bhagat J, Cruickshank D, Goderis A, Michaelides D, Newman D: myExperiment: Defining the Social Virtual Research Environment. In: 4th International Conference on Open Repositories, May 2009, Atlanta, Georgia, US.

48. Goble CA, De Roure DC: myExperiment: social networking for workflow-using e-scientists. In: Proceedings of the 2nd workshop on Workflows in support of large-scale science, 2007, Monterey, California, USA.

49. Cheung K, Frost HR, Marshall MS, Prud’hommeaux E, Samwald M, Zhao J, Paschke A: A journey to Semantic Web query federation in the life sciences. BMC Bioinformatics 2009, 10:S10.

50. Jelier R, Schuemie MJ, Roes P, van Mulligen EM, Kors JA: Literature-based concept profiles for gene annotation: the issue of weighting. Int J Med Inform 2008, 77:354-362.

51. Cong S, Pepers BA, Evert BO, Rubinsztein DC, Roos RAC, van Ommen GB, Dorsman JC: Mutant huntingtin represses CBP, but not p300, by binding and protein degradation. Mol. Cell. Neurosci 2005, 30:560-571.

52. Ramos YFM, Hestand MS, Verlaan M, Krabbendam E, Ariyurek Y, van Galen M, van Dam H, van Ommen GB, den Dunnen JT, Zantema A, ‘t Hoen PAC: Genome-wide assessment of differential roles for p300 and CBP in transcription regulation. Nucleic Acids Res 2010.

53. Ruttenberg A, Rees JA, Samwald M, Marshall MS: Life sciences on the Semantic Web: the Neurocommons and beyond. Briefings in bioinformatics 2009, 10:193-204.

54. Goodman N, McCormick K, Goldowitz D, Hockly E, Johnson C, Kristal B, MacDonald M, Truant R, van Beuzekom M: Plans for HDBase—a research community website for Huntington’s Disease. Clinical Neuroscience Research 2003, 3:21.

55. Goble C, Stevens R, Hull D, Wolstencroft K, Lopez R: Data curation + process curation=data integration + science. Brief. Bioinformatics 2008, 9:506-517.

56. Ludascher B, Altintas I, Berkley C, Higgins D, Jaeger E, Jones M, Lee E, Tao J, Zhao Y: Scientific Workflow Management and the Kepler System. 2005.

57. Rubin DL, Shah NH, Noy NF: Biomedical ontologies: a functional perspective. Brief. Bioinformatics 2008, 9:75-90.

58. Clark T, Kinoshita J: Alzforum and SWAN: the present and future of scientific web communities. Briefings in bioinformatics 2007, 8:163-71.

59. Rauwerda H, Roos M, Hertzberger BO, Breit TM: The promise of a virtual lab in drug discovery. Drug Discov. Today 2006, 11:228-236.

60. Assel M, van de Vijver D, Libin P, Theys K, Harezlak D, O Nualláin B, Nowakowski P, Bubak M, Vandamme A, Imbrechts S, Sangeda R, Jiang T, Frentz D, Sloot P: A collaborative environment allowing clinical investigations on integrated biomedical databases. Stud Health Technol Inform 2009, 147:51-61.

61. Versteeg R, van Schaik BD, van Batenburg MF, Roos M, Monajemi R, Caron H, Bussemaker HJ, van Kampen AH: The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res 2003, 13:1998-2004.

62. Goetze S, Mateos-Langerak J, Gierman HJ, de Leeuw W, Giromus O, Indemans MH, Koster J, Ondrej V, Versteeg R, van Driel R: The three-dimensional structure of human interphase chromosomes is related to the transcriptome map. Mol Cell Biol 2007, 27:4475-87.

63. Gierman HJ, Indemans MH, Koster J, Goetze S, Seppen J, Geerts D, van Driel R, Versteeg R: Domain-wide regulation of gene expression in the human genome. Genome Res 2007, 17:1286-95.

64. Marshall MS, Roos M, Meij E, Katrenko S, van Hage WR, Adriaans P: Semantic disclosure in an e-Science environment. In Semantic e-Science. In press. Springer Verlag; 2010.