Yesterday, a paper spearheaded by Rick Dewey and Euan Ashley on the analysis and interpretation of the genomes of a family of four was published in PLoS Genetics and featured in the Wall Street Journal. I was fortunate to be involved in this groundbreaking analysis, a logical next step after last year’s clinical interpretation of Steve Quake’s genome in the Lancet. Collaborating on this paper got me thinking about the analysis of family genomes in the age of GWAS (genome-wide association studies).
In the linkage studies of the past, researchers focused on families and the segregation patterns of alleles to identify genes significantly linked with disease. These studies worked great for rare diseases, as they could focus on a single linked region or gene at a time. For complex, multigenic diseases, however, the segregation patterns are far less clear, and the GWAS community has stepped in to tackle these problems on a larger scale. Yet GWAS has successfully mapped the genetic basis of only a few diseases, such as age-related macular degeneration (say, to the point where the identified factors explain more than 50% of the genetic variance), and most diseases and traits have come up short. For complex diseases, the difficulty is the same as before: with so many unaccounted-for variables, we are back to a needle-in-a-haystack problem. There is great potential in combining family data with GWAS-based methods. Analogous to Sarah Ng and Jay Shendure’s identification of rare-disease genes by exome sequencing, the ability to “subtract out” some of the noise (which may be family-specific) may yield more reliable results. Specifically, an unaffected family member may be used to down-weight the SNPs they share with an affected subject.
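As a toy illustration of that down-weighting idea, here is a minimal Python sketch. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2), that each candidate SNP already carries some association weight (for example, from a GWAS), and it applies a fixed, arbitrary `penalty` multiplier. The function name, SNP IDs and weights are hypothetical, and a real method would need a proper model of penetrance and relatedness.

```python
from typing import Dict

# Hypothetical encoding: genotype = number of alternate alleles (0, 1 or 2).


def downweight_shared_snps(
    candidate_weights: Dict[str, float],
    affected: Dict[str, int],
    unaffected: Dict[str, int],
    penalty: float = 0.1,
) -> Dict[str, float]:
    """Scale down candidate SNPs at which an unaffected relative has the same
    genotype as the affected subject.

    `penalty` is an arbitrary illustrative multiplier in (0, 1]. Shared SNPs are
    down-weighted rather than discarded because complex-disease variants can
    have incomplete penetrance.
    """
    adjusted = {}
    for snp, weight in candidate_weights.items():
        shared = snp in affected and unaffected.get(snp) == affected[snp]
        adjusted[snp] = weight * penalty if shared else weight
    return adjusted


if __name__ == "__main__":
    weights = {"rs1": 1.0, "rs2": 1.0, "rs3": 1.0}  # made-up SNP IDs and weights
    affected = {"rs1": 2, "rs2": 1, "rs3": 1}
    unaffected = {"rs1": 2, "rs2": 0, "rs3": 1}
    print(downweight_shared_snps(weights, affected, unaffected))
    # {'rs1': 0.1, 'rs2': 1.0, 'rs3': 0.1}
```

Whether to compare identical genotypes, shared haplotypes, or something richer is a modeling choice; this sketch simply takes the most basic option.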
Looking at the genomes of the whole family at once in a clinical assessment context (applying results from large studies to a small number of individuals) was crucial to this analysis. At the most basic level, sequencing multiple family members greatly aids in estimating error rates: because AG and GG parents are vanishingly unlikely to have an AA child, calls that violate Mendelian inheritance flag likely errors and give us a confidence level for the SNP calls we do make. Then, when it comes to assessing disease risk, analyzing multiple family members illustrates precisely the problem of complex diseases. Even if neither parent is at elevated risk for a disease, the particular combination of alleles passed down can confer a greater risk than the average of the parents. It is precisely here that genetic risk has the potential to trump family history in clinical analysis. At present, family history is a great predictor of clinical outcome, as it encapsulates much of the uncharacterized risk conferred by genetics. However, as our understanding of the genetic factors of disease increases, the genetic profile can incorporate something the family history cannot: the precise pattern of allele segregation. Finally, family analysis allows genomes to be phased, which can reveal “compound heterozygotes,” or cases where the two copies of a gene each carry a different SNP. While neither variant may be damaging on its own, since the other copy of the gene remains intact, together they may render both copies of the gene ineffective.
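The trio logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline used in the paper. It assumes biallelic SNPs with single-letter alleles, unphased genotypes written as two-character strings such as "AG", and a child with both parents sequenced. `mendelian_consistent` flags calls like an AA child of AG and GG parents, `phase_by_transmission` assigns the child’s alleles to the maternal or paternal copy when the parents’ genotypes make that unambiguous, and `is_compound_heterozygote` checks whether distinct heterozygous variants in one gene landed on different copies. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: a genotype is a 2-character string of single-letter
# alleles, e.g. "AG"; a child's alleles must come one from each parent.


def mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_error_rate(trios: List[Tuple[str, str, str]]) -> float:
    """Fraction of (child, mother, father) calls that violate Mendelian inheritance."""
    errors = sum(not mendelian_consistent(c, m, f) for c, m, f in trios)
    return errors / len(trios)


def phase_by_transmission(child: str, mother: str, father: str) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele), or None if ambiguous or inconsistent."""
    a, b = child
    options = {(x, y) for x, y in ((a, b), (b, a)) if x in mother and y in father}
    return options.pop() if len(options) == 1 else None


def is_compound_heterozygote(sites: List[Tuple[str, str, str, str]]) -> bool:
    """sites: (alt_allele, child, mother, father) for each variant in one gene.

    True if at least one heterozygous variant was inherited on the maternal copy
    and another on the paternal copy, i.e. both copies of the gene are hit.
    """
    maternal_hit = paternal_hit = False
    for alt, child, mother, father in sites:
        if child[0] == child[1]:
            continue  # only heterozygous sites count toward a compound heterozygote
        phase = phase_by_transmission(child, mother, father)
        if phase is None:
            continue  # cannot tell which copy carries this variant
        maternal, paternal = phase
        maternal_hit = maternal_hit or maternal == alt
        paternal_hit = paternal_hit or paternal == alt
    return maternal_hit and paternal_hit


if __name__ == "__main__":
    print(mendelian_consistent("AA", "AG", "GG"))  # False: the father has no A to pass on
    # Two variants in one gene: T inherited from the mother, A from the father.
    trans = [("T", "CT", "CT", "CC"), ("A", "AG", "GG", "AG")]
    # Same two variants, but both inherited from the mother (same copy).
    cis = [("T", "CT", "CT", "CC"), ("A", "AG", "AG", "GG")]
    print(is_compound_heterozygote(trans), is_compound_heterozygote(cis))  # True False
```

A Mendelian check catches only some errors (a miscall that still fits the parents’ genotypes slips through), and sites where the child and both parents are heterozygous cannot be phased by transmission alone; statistical or read-based phasing is typically used for those.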
As genome-wide methods rapidly became available, family-based analysis seemed to go out of fashion for a while. Of course, we will need sophisticated informatics methods to tease the signal out from the noise, and developing them will not be trivial. However, given current trends in the cost of genotyping and genome sequencing, a dataset of 100 families with a common disease is not out of reach. Then there is the clinical assessment of a family genome, another challenge to which this paper brings a novel perspective, and it will be fascinating to follow the further development of these methods.