Currently, I am studying the rate and dynamics of meiotic gene conversion. This work is in preparation for publication and I plan to post it on bioRxiv by mid-April.
Slides from a recent talk I gave describing this research are available here. In summary, I estimated the rate at which a base in the genome is affected by meiotic gene conversion (~8×10^-6 per base pair per generation), found evidence for GC bias (~70% of gene conversions transmit G or C alleles rather than A or T), and found that females transmit more gene conversions than males (~1.5× more).
SIGMA Type 2 Diabetes Project
I led the analysis for the SIGMA Type 2 Diabetes Project, which examined more than 8,000 individuals of Mexican and other Latin American descent to identify type 2 diabetes susceptibility loci. We identified a novel locus that confers risk for type 2 diabetes and has high frequency in Mexicans and other Latin Americans. The paper describing this work is currently in press at Nature and will appear online before the end of 2013.
Inferring haplotype phase in large genotype datasets of unrelated individuals and trios/duos
A key insight underlying the methodology employed in HAPI-UR is that haplotype phase accuracy increases with sample size. HAPI-UR uses a computationally efficient method that is more than 18 times faster than other phasing methods. Thus HAPI-UR is efficient and effective at phasing very large datasets and will be especially applicable to the increasingly large datasets being generated and now available.
Inferring haplotype phase in family datasets
Other methods for inferring haplotype phase in family genotype data have runtime that scales exponentially in the number of individuals in the family. HAPI uses a novel state formulation that leverages the fact that real genetic data contain relatively few recombination events and, in so doing, obtains polynomial runtime on real genetic data. The problem of inferring haplotypes in family data has been shown to be NP-hard, but in practice the state formulation that HAPI uses enables it to merge an exponential number of states for realistic inputs.
When run on a dataset containing 103 nuclear families, HAPI was more than 300 times faster than other methods. When analyzing a family with 11 children, HAPI used an average of 4.2 states per marker, with a maximum of 48 states at any marker. In contrast, other methods use 2^(2c) states per marker, where c is the number of children in a nuclear family, and thus for an 11-child family, other methods build 4.2 million states per marker.
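The 4.2 million figure can be checked with a few lines of Python. This is only an arithmetic sketch, not code from HAPI: it reads the state count for other methods as 2^(2c), on the assumption that each child contributes two binary inheritance choices per marker (one for each parent's transmitted haplotype). The function name naive_states_per_marker is hypothetical.

```python
def naive_states_per_marker(num_children: int) -> int:
    """States per marker if every child contributes two binary choices.

    Assumption (not taken from HAPI's source): one bit per parent per child,
    giving 2 ** (2 * num_children) combinations at each marker.
    """
    return 2 ** (2 * num_children)


children = 11
naive = naive_states_per_marker(children)
hapi_average = 4.2  # average states per marker reported in the text above

print(f"{children} children: {naive:,} states per marker")  # 4,194,304
print(f"ratio to HAPI's reported average: {naive / hapi_average:,.0f}x")
```

For 11 children this gives 4,194,304 states, which matches the roughly 4.2 million quoted above. That the leading digits also match HAPI's reported average of 4.2 states is a coincidence.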
HAPI is currently only able to handle nuclear families, but I plan to extend it to apply to general pedigrees so that haplotype-based genetic analyses of family data will not be computationally limited.
Local ancestry inference in Latinos
I developed an extension to HapMix to enable it to infer local ancestry in multi-way admixed populations, including Latinos. This extension is described in the supplement to the 1000 Genomes Phase I paper. I aided in applying this method to identify a breast cancer risk locus in Latinas; the paper describing this work is available here.