Selection leaves signatures in the DNA sequences of genes, and many test statistics have been devised to detect its action. While these statistics are frequently used to support hypotheses about the adaptive significance of particular genes, the effect these genes have on reproductive fitness is rarely quantified experimentally. Consequently, it is unclear how gene-level signatures of selection relate to empirical estimates of a gene's effect on fitness. Eukaryotic datasets that permit this comparison are very limited. Using the model plant Arabidopsis thaliana, for which these resources are available, we calculated seven gene-level substitution- and polymorphism-based statistics commonly used to infer selection (dN/dS, NI, DOS, Tajima's D, Fu and Li's D*, Fay and Wu's H, and Zeng's E) and compared them to gene-level estimates of effect on fitness obtained from knockout lines. We found that, consistent with expectations, essential genes were more likely to be classified as negatively selected. By contrast, using 379 Arabidopsis genes for which data were available, we found no evidence that genes predicted to be positively selected had a significantly different effect on fitness than genes evolving more neutrally. We discuss these results in the context of the analytic challenges posed by Arabidopsis, one of the few systems in which this study could be conducted, and advocate for examination in additional systems. These results are relevant to the evaluation of genome-wide studies across species where experimental fitness data are unavailable, and they highlight an increasing need for such data.
Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.
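
Two of the statistics named in the abstract, the neutrality index (NI) and the direction of selection (DOS), are simple functions of the four counts in a McDonald-Kreitman table. The sketch below is illustrative only and is not the pipeline used in the study: the function names and the example counts are hypothetical, and the standard textbook definitions NI = (Pn/Ps)/(Dn/Ds) and DOS = Dn/(Dn+Ds) - Pn/(Pn+Ps) are assumed, where Dn and Ds are nonsynonymous and synonymous substitutions and Pn and Ps are nonsynonymous and synonymous polymorphisms.

```python
from typing import Optional


def neutrality_index(dn: int, ds: int, pn: int, ps: int) -> Optional[float]:
    """NI = (Pn/Ps) / (Dn/Ds), computed as (Pn*Ds) / (Ps*Dn).

    Returns None when the ratio is undefined (Ps or Dn is zero).
    """
    denominator = ps * dn
    if denominator == 0:
        return None
    return (pn * ds) / denominator


def direction_of_selection(dn: int, ds: int, pn: int, ps: int) -> Optional[float]:
    """DOS = Dn/(Dn+Ds) - Pn/(Pn+Ps).

    Returns None when a gene has no substitutions or no polymorphisms.
    """
    if dn + ds == 0 or pn + ps == 0:
        return None
    return dn / (dn + ds) - pn / (pn + ps)


# Hypothetical counts for a single gene (not data from the study).
example = {"dn": 10, "ds": 10, "pn": 5, "ps": 20}
ni = neutrality_index(**example)
dos = direction_of_selection(**example)
print(f"NI = {ni:.2f}, DOS = {dos:.2f}")
```

Under these definitions, NI below 1 and DOS above 0 both indicate an excess of nonsynonymous divergence, the pattern usually read as evidence of positive selection, whereas NI above 1 and DOS below 0 point toward segregating, weakly deleterious variants. DOS remains defined when Ps or Dn is zero, which is one reason it is often reported alongside NI for genes with sparse counts.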