Thursday, August 23, 2012

Causal genes hiding in the "p"-patch!

We've posted many times about the problems we face today in dealing with multifactorial causation. Put simply, we want to find causes that satisfy a statistical criterion of 'significance'. That usually means applying some test that yields a probability, p, of how unusual the result would be if there were no causal effect, which we can symbolically refer to as a p-value.

This applies to human genetics and the fashionable 'omics' approach, and to much else in biology. One thing we've talked about before, and again recently, is the hypothesis that rare variants cause human trait variation, in the sense of the difference between cases and controls. Some investigators have been arguing that rare variants with strong effect, rather than common variants, account for a substantial fraction of disease (the idea that combinations of variants, some of them rare, each with small effects, are responsible is another version of the rare-variant argument).

But rare variants present a problem, which is that you don't see them often enough for statistical significance to be achieved. Yet they may be causal.  We recently noted that finding the same rare variant in affected family members is one possible way to identify them where significance is less of an overwhelming requirement.  Our last couple of posts deal with this subject.

Two back-to-back papers in the August 10 American Journal of Human Genetics are of interest here, because of what they confirm about this problem.  These are two reports from David Goldstein's lab, both large-scale searches for genetic causation, one of idiopathic generalized epilepsy and the other of schizophrenia (both open access).  Goldstein has argued for some time that genomewide association studies (GWAS) aren't finding genes with large effects because most complex diseases are caused by rare variants, with small effects. They don't reach significance, though they're real causes (one thinks): we're caught in the p-patch!

Idiopathic generalized epilepsy
Idiopathic generalized epilepsy (IGE) is a complex disease that, like many such diseases, is highly heritable but has a genetic architecture that has been difficult to parse ('idiopathic' means the cause is not known).  According to the paper, rare copy number variants have been found to explain the disorder in only 3% of affected individuals.  So Goldstein's purpose was to test whether rare variants with moderate effect could be found to explain IGE.

The group compared the exomes -- all the exons, DNA coding regions --  of 118 people with IGE with those of 242 controls, and found no variants significantly associated with the disorder.  They then looked at almost 4000 variants that they considered to be candidates for epilepsy susceptibility and genotyped 878 cases and 1830 controls for these variants, with no statistically significant finding.

They report that close to half of these variants were found only in cases, which suggested to them that at least some of these must be genetic risk factors.  However, the high heterogeneity of epilepsy disorders means that any single variant will be difficult to find, and/or that single-nucleotide variants have small effects.  For example, they estimate that the variant they observed most frequently here accounts for 0.6% of the cases of IGE in this study, if it indeed turns out to be causal, and this is the ballpark figure for causal variants they've identified for other complex diseases.  And a recent study of epilepsy published in Cell by a group at Baylor compared cases to controls looking at all exons, and found potentially pathologic variants statistically as often in controls as in cases.

The current paper concludes that "moderately rare variants with intermediate effects ("goldilocks alleles") do not play a major role in the risk of IGE."  Current methods are not adequate for detecting variants with very small effects, even when they exist. The epilepsies are considered to be channelopathies, disorders in which an ion channel disruption plays a major part.  Thus, it has been assumed that mutations in ion channel genes would be found to be causal, but the list of candidate genes identified by these authors is not enriched for such genes, suggesting that "the pathophysiology governing epilepsy might be far more complex than simply a disorder of disrupted ion channels..."

Finally, the authors conclude that results from small studies must be treated with caution as they can't provide comprehensive lists of candidate variants.  But studies large enough to detect variants that are at a frequency of, say, 0.06%, as are some of the variants in this study, are essentially impossible.  Such variants, they say, "will probably only be securely implicated through gene-based association analyses in large sample sizes and, where available, cosegregation analyses within multiplex families."
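To get a feel for why such rare variants struggle to reach significance, here is a rough sketch of the arithmetic. It is our own illustration, not the method used in the paper. It assumes a one-sided Fisher exact test on a variant seen only in cases, uses the follow-up sample sizes quoted above (878 cases, 1830 controls), and treats 0.06% as the carrier frequency among cases. The function names and the idea of dividing 0.05 by roughly 4000 tested variants (a Bonferroni-style correction) are our assumptions.

```python
from math import comb


def case_only_p_value(carriers, n_cases, n_controls):
    """One-sided Fisher exact p-value when every carrier is a case.

    Under the null, the chance that all `carriers` individuals fall among
    the cases is C(n_cases, carriers) / C(n_cases + n_controls, carriers).
    """
    if carriers < 0 or carriers > n_cases:
        raise ValueError("carriers must be between 0 and n_cases")
    return comb(n_cases, carriers) / comb(n_cases + n_controls, carriers)


def min_case_only_carriers(n_cases, n_controls, alpha):
    """Smallest number of case-only carriers giving p < alpha (None if impossible)."""
    for k in range(1, n_cases + 1):
        if case_only_p_value(k, n_cases, n_controls) < alpha:
            return k
    return None


if __name__ == "__main__":
    n_cases, n_controls = 878, 1830   # follow-up sample sizes quoted in the post
    carrier_freq = 0.0006             # 0.06%, assumed here to be the frequency in cases
    alpha = 0.05 / 4000               # assumed Bonferroni-style threshold, ~4000 variants

    expected = n_cases * carrier_freq
    needed = min_case_only_carriers(n_cases, n_controls, alpha)
    print(f"expected carriers among cases: {expected:.2f}")
    print(f"case-only carriers needed for p < {alpha:.2e}: {needed}")
```

Under these assumptions a variant at 0.06% is expected to turn up in less than one of the 878 cases, while the number of case-only carriers needed to clear a multiple-testing threshold is far larger, which is the gap the authors are pointing to.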

Schizophrenia
Schizophrenia is another complex trait with high heritability, high phenotypic heterogeneity, and a low success rate with respect to identifying genetic risk factors.  As with most traits, GWAS have identified some genes with very low effect, but not always replicably.  Again, the question is whether the causal variants are moderately rare but identifiable in large studies, or so heterogeneous and rare as to remain hidden with current large-population based methods. 

In the study reported in the AJHG, Goldstein's group followed the same 2-step analysis as described above for IGE, ultimately assessing selected variants in 2,617 cases and 1800 controls.  No single variant was statistically significant, though, again, they identified case-specific variants, some of which may actually be causal.  They conclude that risk of schizophrenia is unlikely to be due to moderately rare variants with moderate effect, and that "multiple rarer genetic variants must contribute substantially to the predisposition to schizophrenia, suggesting that both very large sample sizes and gene-based association tests will be required for securely identifying genetic risk factors."

In essence, either this is polygenic control, in which each case is due to some combination of large numbers of individually weak, mainly rare, contributing variants, or individual strong variants exist but are so rare that we may struggle to get enough samples.  Follow-up or family studies that find many different variants in the same gene, where the gene's function seems plausible for the trait, could help.  But it could be that there aren't enough humans on earth to achieve significance in the statistical sense, and that in important ways means the variant or gene isn't 'significant' in the public health or clinical setting either. Approaches to aggregate causation may be needed as a way to escape from the p-patch.  We think so, at least, as we've said many times here before.
