Wednesday, May 9, 2018

Are common diseases generally caused by 'common' variants?

The funding mantra of genomewide mapping is that common variants cause common disease (CV-CD).  This was convenient for HapMap and other association-based attempts to find genetic causation, since the approach didn't require very dense genotyping or massive sample sizes.  Normally, based on Mendel's widely known experiments and so on, one would expect anything 'genetic' to run in families; however, because of notions like low penetrance--a low probability of having the trait even if you've inherited the causal variant--small nuclear families can't work, as a rule, and big enough families would be too costly or even impossible to ascertain.  In particular, for traits due to many genes, to environmental factors, and/or to individually weak causal variants, family studies are simply not practicable.

So, conveniently, when DNA sequencing on a genomewide scale became practicable, the idea was that sequence variants might not be wholly determinative, but their effects might be strong enough that we need only find them in the population as a whole, not in the smallish families that we have a hard enough time ascertaining.  People carrying such a variant would simply have a higher probability of showing the trait.

It was a convenient short-cut, but there is a legitimate evolutionary rationale behind this:  The same mutation will not recur very often, so if there are many copies of a causative allele (sequence variant) in a population, these are probably identical by descent (IBD), from a single ancestral mutational event.  In that sense, genomewide association studies (GWAS) are finding family members carrying the same allele, but without having to work through the actual (inaccessible, very large, multi-generation ancestral) pedigrees that connect them.  If the IBD assumption were not basically true, then different instances of the same nucleotide change would have different local genomic backgrounds, and the effects would often or even usually vary among the descendants of different mutations, affecting association tests--though the analysis rarely, if ever, attempts to detect or adjust for this.
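To make this concrete: at its core, a single-site GWAS comparison is just a 2x2 test of allele counts in cases versus controls.  Here is a minimal sketch, with invented counts (the function name and numbers are ours, purely for illustration):

```python
import math

def allele_chisq(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts
    in case vs control chromosomes."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    rows = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    cols = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # survival function of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2/2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# 2000 case chromosomes at 30% allele frequency vs 2000 control
# chromosomes at 25% (made-up numbers):
chi2, p = allele_chisq(600, 1400, 500, 1500)
print(chi2, p)  # chi2 ~ 12.5, p ~ 4e-4
```

Note that this nominally 'significant' result (p of about 4 in 10,000) would not come close to surviving a genome-wide multiple-testing correction--which is the point of the significance discussion below.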

In principle it can work well if a trait really is caused by alleles at a tractably small number of genes.  That's a very big 'if', but assuming it, which is similar to assuming the trait is a classical Mendelian trait, then one can find association of the allele with the trait among affected people, because, at that site at least, they are distant relatives.  To be detected, though, the effect of a given allele has to be strong enough, and its frequency in the sample high enough, to pass a statistical significance test.  This is a potentially major issue, since in very large samples searching countless sites across the genome, reaching significance requires correcting for all those tests, which in effect demands that an allele's frequency and/or individual impact be high.
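The multiple-testing arithmetic can be sketched in a few lines.  The numbers below (a million tested sites, a 'common' allele at 30% frequency with a 2-point case-control frequency difference) are assumptions for illustration, and the sample-size formula is only the crudest normal approximation (roughly 50% power, ignoring many refinements):

```python
from statistics import NormalDist

tests = 1_000_000          # sites examined genome-wide (assumed)
alpha = 0.05
per_test = alpha / tests   # Bonferroni: 5e-8, the usual 'genome-wide significance'
z_cut = NormalDist().inv_cdf(1 - per_test / 2)   # two-sided z threshold, ~5.45

# Rough sketch: to detect a case-control allele-frequency difference d
# at pooled frequency p with n chromosomes per group, the test statistic
# is roughly z = d / sqrt(2*p*(1-p)/n), so the n needed to reach z_cut is:
p_pooled, d = 0.30, 0.02   # invented values for a common, weak allele
n_needed = 2 * p_pooled * (1 - p_pooled) * (z_cut / d) ** 2
print(round(z_cut, 2), round(n_needed))  # ~5.45, ~31000 chromosomes per group
```

Halve the effect d and the required sample quadruples--which is why weak-effect alleles drive the push toward ever-bigger studies.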

In essence, this is the underpinning and implicit justification for the huge GWAS empire.  There are many details, but one important assertion by the leaders of the new EverBigger (and more costly) AllOfUs project is that common diseases are their target.  Rare diseases generally just won't show up often enough to find statistically reliable 'hits'.

Of course, 'common' is a subjective term, and if one searches millions of genome sites whose allele frequencies vary in the sample, tons of them might be 'common' by such a definition.  And they will also have to have strong enough effects to be detectable under suitably convincing significance criteria.  So we might expect CV-CD to be a proper description of such studies.  But there is a subtle difference: the implication (and once, 20 years ago, the de facto expectation) was that this meant that one or a few common variants cause the common disease.

Obviously, if that assumption of convenience were roughly true, then one could devise pharmaceutical or other preventive measures to target the causal variants in these genes in affected persons.  In fact, we have largely based the nearly 20-year GWAS effort on such a wedge rationale, starting with smaller-scale projects like HapMap.  Unfortunately, that was a huge success!

Why unfortunately?  Because, no matter how you define 'common', what we've clearly found, time and again, trait after trait, is that these common diseases are in each case due to the effects of a different set of 'common' alleles whose effects are individually weak.  In that sense, an individual allele per se is not very predictive, because many unaffected people also carry it.  Every case is genetically unique, so one Pharma does not fit all.  It is, I would assert, highly misrepresentative if not irresponsible to suggest otherwise, as the usual PR legerdemain does.

Instead, what we know very clearly is that in many or most 'common' disease instances, since each case is caused by a different set of alleles, not only is each case causally unique, but no one allele is, in itself, anywhere near necessary for the disease.  There isn't usually a single 'druggable' target of Pharma's dreams.  There was perhaps legitimate doubt about this 20 years ago when the adventure began, but no longer.

Indeed, it is generally rare for anything close to a majority of cases, compared to controls, to share any given allele, and even when that happens, the causation, as statistically estimated by comparing cases and controls, is usually only slightly attributable to that allele's effects.  Even then, most of the variation is typically not accounted for, as measured against the trait's estimated heritability, because it seems due to a plethora of alleles too weak or too rare to be detected in the sample, even if they are there and are, collectively, the greatest risk contributor. And, of course, we've not mentioned lifestyles and other environmental factors, nor the often largely non-overlapping results from different populations, nor various other factors.
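The point can be made with a toy simulation (every number here is invented): give everyone 100 common risk alleles of individually tiny effect under a simple liability-threshold model, call the top 10% of liability 'affected', and then look at how little any single locus separates cases from controls:

```python
import random

random.seed(1)
n_loci, n_people = 100, 20000
freqs = [random.uniform(0.1, 0.5) for _ in range(n_loci)]  # all 'common'
effect = 0.05   # each risk allele nudges liability only slightly

people = []
for _ in range(n_people):
    # genotype: number of risk alleles (0, 1, or 2) at each locus
    genotype = [int(random.random() < f) + int(random.random() < f) for f in freqs]
    liability = effect * sum(genotype) + random.gauss(0, 1)  # genes + 'environment'
    people.append((genotype, liability))

# the top ~10% of liability are the 'affected'
cutoff = sorted(l for _, l in people)[int(0.9 * n_people)]
cases = [g for g, l in people if l >= cutoff]
controls = [g for g, l in people if l < cutoff]

def allele_freq(group, locus):
    return sum(g[locus] for g in group) / (2 * len(group))

# per-locus case-control allele-frequency differences are all tiny,
# even though the alleles collectively drive the trait
diffs = [allele_freq(cases, i) - allele_freq(controls, i) for i in range(n_loci)]
print(max(abs(d) for d in diffs))
```

Under these assumptions, no single locus differs between cases and controls by more than a few percentage points in allele frequency--unaffected people carry nearly as many risk alleles as affected ones--even though the 100 loci together are the entire genetic cause.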

The non-Mendelian Mendelian reality of life
I think that as a community we were led into this causal cul de sac by taking Mendel too literally or too hopefully.  To be sure, some traits are qualitative--they appear in two or a few distinct states, like green vs yellow peas--and these are basically the kinds of traits Mendel studied, because they were tractable.  In such cases each gene transmits in families in a regular way, which in his honor we call 'Mendelian'.  And human genetics had great success identifying such traits and their causal genes (cystic fibrosis is one well-known example, but there have been many others).  However, common diseases are generally not caused by individual alleles at single genes.  Quantitative geneticists, such as agricultural breeders, have basically known about the complexity of most traits for a century, even if specific contributing genes couldn't be identified until methods like GWAS came along 15-20 years ago.

Since we know all this now, from countless studies, it is irresponsible to hijack huge funding for more and more of the same, based on a CV-CD promise that neither the public nor many investigators understand (or, if they do, dare acknowledge).  One might go farther and suggest that this makes 'CV-CD' a semantic shell-game that Congress and the public are still buying--bravely assuming that the administrators and scientists who are pushing this view actually understand the genomic (and environmental) landscape.

NIH Director Collins is busy and has to worry about his institute's budget.  He may or may not know the kinds of things we've mentioned here--but he should!  His staff and his advisors should!  We have not invented these points, whether or not we've explained them fully or precisely enough.  We have no vested interest in the viewpoint we're expressing.  But the evidence shows that research should now be capitalizing, so to speak, on what we've actually learned from the genomic mapping era, rather than just doing more of the same, no matter how safe that is for careers (a structural problem that society should remedy).

Instead of ever more wheel-spinning, what we really need is new thinking, different rather than just more of the same Big Data enumeration.  Until new ideas bubble up, neither we nor anyone else can specify what they should be.  Continuing to pay for ever bigger data serves several immediate interests very well: the academic enterprise whose lifeblood includes faculty salaries and overhead funding for research done in their institution, the media and equipment suppliers who thrive on ever-biggerness, and the administrators and scientists whose imagination is too impoverished to generate some actual ideas.  More is easier, more insightful is very much harder.

So, yes, common diseases are caused by common variants--tens or hundreds of them!  Enumerating them is becoming a stale, repetitive, costly business--and maybe 'business' is the right word.  The public is paying for more, but in a sense getting less.  Until some day, someone thinks differently.
