EpiGenius is our nickname for projects where we apply e-Science methodology to deepen our understanding of the role of chromatin and epigenetics in how cells work. The e-Science aspects are multidisciplinary collaboration and the application of emerging information technologies.
We are currently running one such project, epiGeniusHD, in which we try to establish the role of typical epigenetic markers, such as CpG islands, DNA methylation, and histone modifications, in the substantially deregulated gene expression patterns of Huntington’s disease.
Phenomena such as DNA methylation, histone modifications, and chromatin condensation constitute a poorly understood heritable code that influences phenotype and gene expression. Uncovering the role of these epigenetic mechanisms in the development of disease is therefore important. Investigating epigenetic mechanisms in relation to disease and health requires bringing together data, tools, and knowledge, as well as the researchers who hold the relevant expertise. Enabling a multidisciplinary, collaborative approach to materialize and flourish in specific fields of science is an ambition of e-Science (enhanced Science). e-Science tools can be used to perform sophisticated computational experiments across data and knowledge resources, and e-Science meets the social web in its ambition to engage different communities of scientists. Typical e-Science tools, such as the workflow tool Taverna, make the expertise of scientists from different disciplines available to others and allow researchers to go beyond a single experiment in a single domain of expertise.
e-Science for Life Scientists
Many e-Scientists work on enhancing computational research for the life sciences. First, we enable bioinformaticians to design and execute better and more powerful computational experiments by using ‘workflows’. Workflows are like wet-laboratory protocols, but with a run button, and they can run across remote laboratories. A workflow connects computational components, such as components that give access to various databases or perform a statistical calculation. Most importantly, the components in a workflow are generally made by other scientists, which lets a workflow exploit expertise from multiple domains. A second area of rapidly growing interest is the development and exploitation of knowledge. We can represent knowledge in machine-readable form, thereby making it usable in experiments. We can connect knowledge across domains or locations, and also to data, so that data become less ‘low level’. Finally, e-Laboratory factories are being developed to facilitate the creation of biologist-friendly interfaces. epiGenius builds on these developments.
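To make the idea of a workflow as a chain of components concrete, here is a minimal sketch in plain Python. It is not Taverna and does not use any Taverna API; the component functions `fetch_records` and `mean` and the tiny in-memory "database" are hypothetical stand-ins for the remote database and statistics services a real workflow would call.

```python
from typing import Callable, List

# A "component" here is any function that takes the previous step's output
# and returns the next step's input. In a real workflow system the components
# would typically be remote services written by other scientists.
Component = Callable[[object], object]


def run_workflow(components: List[Component], data: object) -> object:
    """Pass data through each component in order, like steps in a protocol."""
    for component in components:
        data = component(data)
    return data


def fetch_records(query: str) -> list:
    # Hypothetical stand-in for a database-access component.
    database = {"chromatin": [3.0, 5.0, 4.0], "methylation": [1.0, 2.0]}
    return database.get(query, [])


def mean(values: list) -> float:
    # Hypothetical stand-in for a statistical component.
    return sum(values) / len(values) if values else 0.0


result = run_workflow([fetch_records, mean], "chromatin")
print(result)  # 4.0
```

The point of the design is that `run_workflow` knows nothing about what each component does, so components built by different experts can be swapped or recombined without changing the runner.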
An example scenario
A general problem for today’s life scientists is information overload: a simple query in PubMed can return enormous numbers of ‘hits’ (cancer: >2 million, epigenetics: >25000). It is no longer possible to go through all those papers and extract the elements relevant to your hypothesis without bias. Instead, you write narrower queries or make a ‘best’ selection from the list of results, which is not ideal. An alternative is text mining, a computational approach. Text mining can’t do everything, but it is reasonably good at extracting, for instance, protein names from large numbers of abstracts. If you apply it to abstracts that contain elements of your hypothesis, there is a good chance that the results are relevant to your hypothesis. However, if you obtain long lists of protein names, how do you deal with them? For instance, how do you record their relationship to your hypothesis? Hence, we need to structure the text-mining results, preferably within the context of our hypothesis. This can be done: the Web Ontology Language (OWL) lets us structure knowledge, and we can use it to structure and store the text-mining results. We capture not only the protein names, but also relationships between proteins and, for instance, their relationship to our hypothesis. A nice application is to see whether there are proteins that link two hypotheses; for instance, in a preliminary experiment we found that NF-kappaB links a model for food and a model for chromatin. The important point is that the text-mining results get a ‘second life’. Moreover, the ‘knowledge base’ of structured knowledge (not only from text mining) can be shared, which means that we can computationally investigate results and interpretations beyond a single experiment and beyond a single researcher.
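The scenario above (extract protein names, store them as structured statements tied to a hypothesis, then look for proteins shared by two hypotheses) can be sketched in a few lines of Python. This is an illustration under loud assumptions: simple dictionary matching replaces a real text-mining tool, a set of (subject, predicate, object) tuples replaces an OWL knowledge base, and the protein list, the `mentionedIn` predicate, the hypothesis labels `model_A`/`model_B`, and the abstracts are all made-up toy data, not results from our experiments.

```python
import re
from collections import defaultdict

# Hypothetical protein dictionary; real text-mining systems use far larger
# vocabularies and handle synonyms and spelling variants, which this does not.
PROTEIN_NAMES = {"NF-kappaB", "p53", "HDAC1", "DNMT1"}


def extract_proteins(abstract: str) -> set:
    """Return dictionary protein names that occur as whole words (case-insensitive)."""
    found = set()
    for name in PROTEIN_NAMES:
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, abstract, re.IGNORECASE):
            found.add(name)
    return found


def build_triples(abstracts_by_hypothesis: dict) -> set:
    """Store each text-mining hit as a (protein, 'mentionedIn', hypothesis) triple."""
    triples = set()
    for hypothesis, abstracts in abstracts_by_hypothesis.items():
        for abstract in abstracts:
            for protein in extract_proteins(abstract):
                triples.add((protein, "mentionedIn", hypothesis))
    return triples


def linking_proteins(triples: set, hyp_a: str, hyp_b: str) -> set:
    """Proteins related to both hypotheses, i.e. candidates that link them."""
    related = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "mentionedIn":
            related[obj].add(subject)
    return related[hyp_a] & related[hyp_b]


# Toy abstracts, invented purely for illustration.
abstracts = {
    "model_A": ["Dietary compounds modulate NF-kappaB activity.",
                "p53 responds to nutrient stress."],
    "model_B": ["NF-kappaB recruits HDAC1 to chromatin.",
                "DNMT1 maintains methylation."],
}

triples = build_triples(abstracts)
print(sorted(linking_proteins(triples, "model_A", "model_B")))  # ['NF-kappaB']
```

Because the hits are kept as explicit statements rather than a flat list, they can be queried again later, merged with other knowledge, or shared; in an OWL-based setup the same "shared protein" question would typically be posed through a query language rather than hand-written Python.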
This is not yet the essence of epiGenius. For that, the question ‘who can do what?’ is relevant. To implement the above, we created a ‘workflow’ that performs text mining in an insightful series of steps (like a wet-laboratory protocol, but with a run button). We also built the knowledge structures in OWL to structure the text-mining results, and we learned the query language with which such knowledge can be interrogated. We could do this because PhD students and other computational experts provided us with the necessary means. Very nice, but at this stage it takes someone like me, a bioinformatician trained in biology and computer science, to do this. What about the real epigenetics expert? This is what epiGenius is all about. It is very important that bioinformaticians can design and execute sophisticated epigenetics analyses, but there must also be an interface that involves the epigenetics expert. Generalising, these experts often know what they want but not how to achieve it, and sometimes, when new technology is well presented, they will even want new things.