Background and Concept

By providing a complete Homo sapiens ‘parts list’ (the gene sequences) and a powerful ‘toolkit’ (technologies), the Human Genome Project has revolutionised mankind’s ability to explore how genes cause disease and other phenotypes. Studies in this domain are proceeding at a rapid and ever-increasing pace, generating unprecedented amounts of raw and processed data. It is now imperative that the scientific community finds ways to effectively manage and exploit this flood of information for knowledge creation and practical benefit to society. This fundamental goal lies at the heart of the “Genotype-To-Phenotype Databases: A Holistic Solution (GEN2PHEN)” project.

Previous genetic studies have shown that inter-individual genome variation plays a major role in differences in normal development and in disease processes. However, the details of how these relationships work are far from clear, even for most Mendelian disorders, where single genetic alterations are fully penetrant (essentially causative, rather than risk-modifying). Background genetic effects (modifier genes), epistasis, somatic variation and environmental factors all complicate the picture. This is particularly true of complex, multi-factorial disorders (e.g., cancer, heart disease, diabetes, dementia) that will affect most of us at some stage in our lives. Strategies now exist to study the genetics of these disorders, however, and such investigations are a major focus of research throughout Europe and beyond. A common thread in these studies is the need to create ever-larger datasets and to integrate them more effectively.

Success in deciphering the mechanisms and pathways underpinning genotype-to-phenotype (G2P) relationships will bring radical new opportunities for predicting, preventing, diagnosing and treating all forms of illness. It will launch an era of truly effective personalised medicine. Extensive research is therefore being conducted worldwide to characterise genetic variation in normal and disease contexts. Unfortunately, the resulting flood of primary information is not yet being managed or used as effectively as it should be, simply because there is no sufficiently organised and mature database infrastructure through which discoveries can be gathered, stored, integrated and queried online as a composite whole. Furthermore, whilst new positive findings are handled sub-optimally, ‘negative’ observations are in most cases not reported at all, even though they are an essential part of any complete and accurate G2P depiction. This needs to change, and an international ‘Human Variome Project’ (HVP) has emerged to help make this case.

It is against this backdrop that the GEN2PHEN project aims to become the key European contribution to the challenges described above, harmonised with similar projects elsewhere and dovetailed into many related European programmes of work. It will provide an important and timely solution to a current research need highlighted by the European Strategy Forum on Research Infrastructures (ESFRI) under the priority area ‘Upgrade of European Bio-Informatics Infrastructure (Shared platform for data resources in the Life Sciences)’. In particular, it will give European G2P research and the biotech industry the database technologies and data integration systems they need. Only then can our societies fully benefit from the exponentially increasing rate at which genetic data are being generated in disease research and clinical settings.