Population genomics and adaptive evolution of urban white-footed mice

Population genomics involves computational examination of large numbers of DNA sequences to identify the modes and tempo of evolution. Next-generation sequencing now generates millions of sequences cheaply and quickly, and focusing on protein-coding transcripts (i.e., the exome) improves the power to detect functional variants experiencing natural selection. A strength of these approaches is that they can identify variants under selection that would be difficult to predict a priori; one can then examine the adaptive dynamics of these candidate genes in the field. We are using these new approaches to examine adaptation to urban ecosystems among white-footed mice in New York City (NYC). In collaboration with Prof. Rachel O’Neill at UCONN, we are generating deep exome sequence data for urban and non-urban mice using massively parallel Roche 454 and ABI SOLiD sequencing.

Sequencing, assembly, & annotation of exomes for urban Peromyscus

We are currently assembling exomes for white-footed mice based on sequencing of messenger RNA transcripts extracted from liver, gonad, and brain samples. To date, we have run three 454 plates of multiplexed cDNA libraries from four urban populations and one rural population. From these data we have assembled over 18,000 contigs with an average length approaching 1,000 bp. We have established homology with over 11,000 known genes, and assigned ~75,000 gene ontology (GO) annotations to 9,700 of these genes. We are currently conducting additional 454 sequencing and collecting samples from additional rural populations. Later in 2012 we will use SOLiD 5500XL sequencing of individual cDNA libraries to improve the coverage and depth of our exomes. Individual RNA-Seq data will also facilitate more powerful tests for selection and investigation of gene expression.

Level 2 gene ontology (GO) terms identified for 31 candidate genes under selection with an elevated ratio of nonsynonymous to synonymous base pair changes.

Identification of candidate genes under selection in urban environments

Manhattan plot for one urban population. Dots above the red line represent potentially adaptive variants (single nucleotide changes within genes) that occurred significantly more often (P < 0.0000001) in the urban population than in the rural population.

The second phase of this research uses statistical outlier analyses to identify candidate genes under selection. Urban P. leucopus often achieve their highest population densities and reproductive rates in small forest fragments due to a lack of competitors and predators. Costs of reproduction, intraspecific competition, and disease likely all increase in urban populations. When population density is chronically high, genes involved in sperm competition among males and “faster” life histories among females should be favored. Recent studies have detected signatures of selection in taxa that have inhabited chronically polluted areas for only relatively short periods of time. We predict that the 14-member metallothionein gene family will exhibit adaptive SNPs and expression changes at polluted sites. Genes related to disease may also be under directional, or even relaxed, selection depending on the specific pathogen involved.

To date, we have identified a large number of SNPs and completed exploratory scans for outlier loci under selection in urban populations. These scans identified 31 outlier genes based on the ratio of nonsynonymous to synonymous substitutions (dN/dS) and 14 outlier genes based on significant differences in allele frequencies between urban and rural populations. These candidate genes are involved in heavy metal detoxification pathways, cellular proliferation and translational elongation, immune activity, lipid metabolism, and mitochondrial respiration. These results are preliminary and require further validation, but many of these candidates are consistent with our hypotheses for adaptive change in urban populations. Future genome scans will employ more powerful techniques based on individual genotypes.
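The two kinds of exploratory scan described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this project: the function names, the Jukes-Cantor correction, the choice of a Pearson chi-square test, and all counts are assumptions made for illustration. The only value taken from this page is the P < 0.0000001 cutoff shown in the Manhattan plot.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """dN/dS from precomputed difference and site counts (assumed inputs)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        # No synonymous change: the ratio is undefined or unbounded.
        return math.inf if dn > 0 else math.nan
    return dn / ds


def allele_freq_test(urban_alt, urban_ref, rural_alt, rural_ref):
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts."""
    table = [[urban_alt, urban_ref], [rural_alt, rural_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [urban_alt + rural_alt, urban_ref + rural_ref]
    total = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0  # empty group or monomorphic site: no evidence
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only (not data from this study).
genes = {"gene_a": (30, 300, 5, 100), "gene_b": (4, 300, 6, 100)}
elevated = [g for g, c in genes.items() if dn_ds(*c) > 1]

snps = {"snp_1": (90, 10, 10, 90), "snp_2": (48, 52, 50, 50)}
outliers = [s for s, c in snps.items() if allele_freq_test(*c)[1] < 1e-7]

print(elevated, outliers)
```

A gene with dN/dS above 1 is flagged as a candidate for positive selection, and a SNP is flagged when its urban and rural allele counts differ at the stringent cutoff. A real scan would also need to account for sequencing error, pooled sampling, and multiple testing.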

Is there an "adaptive urban landscape"? Do genetic variants selected for in NYC decline steadily along urban-to-rural gradients, or drop off quickly after a certain threshold?

Future work: gene expression & adaptive landscape genomics

Beyond changes at the nucleotide level, adaptation may also proceed through changes in gene expression.  We will use SOLiD RNA-Seq data from individual mice to examine changes in gene expression between urban and rural populations.  We will also conduct additional sequencing and genome scans to identify potential adaptive changes in regulatory regions that fall outside of protein-coding genes.
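As a rough illustration of what an urban-versus-rural expression comparison involves, the sketch below normalizes read counts to counts per million and computes a log2 fold change per gene. The function names, the pseudocount, and the counts are hypothetical; a real analysis would also need replicate-aware statistical testing, which this sketch omits.

```python
import math


def counts_per_million(counts):
    """Scale one library's raw read counts to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(urban_libs, rural_libs, pseudocount=1.0):
    """Log2 ratio of mean urban CPM to mean rural CPM for each gene.

    Assumes every library reports the same set of genes.
    """
    urban_cpm = [counts_per_million(lib) for lib in urban_libs]
    rural_cpm = [counts_per_million(lib) for lib in rural_libs]
    result = {}
    for gene in urban_cpm[0]:
        urban_mean = sum(lib[gene] for lib in urban_cpm) / len(urban_cpm)
        rural_mean = sum(lib[gene] for lib in rural_cpm) / len(rural_cpm)
        # The pseudocount avoids division by zero for unexpressed genes.
        result[gene] = math.log2(
            (urban_mean + pseudocount) / (rural_mean + pseudocount)
        )
    return result


# Hypothetical read counts for two genes in one urban and one rural mouse.
urban = [{"gene_a": 300, "gene_b": 700}]
rural = [{"gene_a": 100, "gene_b": 900}]
lfc = log2_fold_change(urban, rural)
print(lfc)
```

Positive values indicate higher expression in the urban sample and negative values indicate lower expression.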

Field studies that examine the relationships between genotype, phenotype, and ecological pressures driving natural selection are the gold standard of evidence for adaptations at the nucleotide level.  To initially establish the ecological importance of candidate genes identified above, we will examine correlations between landscape variables and adaptive variants genotyped from white-footed mice sampled along urban-to-rural transects from NYC’s Central Park to the Catskills in upstate NY.  We will use new spatial analysis methods to examine the associations between adaptive SNPs and environmental characteristics such as percent canopy / urbanization, soil heavy metals, habitat quality, and population density.
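The simplest form of such an association is a correlation between a landscape variable and the frequency of a candidate allele across transect sites. The sketch below computes a Pearson correlation under that assumption. The site values are hypothetical, and the calculation ignores spatial autocorrelation and shared ancestry among neighboring populations, which dedicated spatial methods are designed to handle.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical transect sites ordered from most to least urban.
sites = [
    {"pct_urban": 95, "allele_freq": 0.80},
    {"pct_urban": 80, "allele_freq": 0.72},
    {"pct_urban": 60, "allele_freq": 0.55},
    {"pct_urban": 35, "allele_freq": 0.30},
    {"pct_urban": 10, "allele_freq": 0.12},
]
r = pearson_r(
    [s["pct_urban"] for s in sites],
    [s["allele_freq"] for s in sites],
)
print(round(r, 3))
```

The same function could be applied to other site-level variables named above, such as soil heavy metal concentration or population density, one variable at a time.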