INDIAN INSTITUTE OF AGRICULTURAL BIOTECHNOLOGY
INDIAN AGRICULTURAL RESEARCH INSTITUTE
Credit seminar on
GENOME WIDE ASSOCIATION
STUDIES (GWAS)
ARUN MAHESH CHANNAPUR
M.Sc. 1st YEAR
ROLL NO- 60158
DISCIPLINE- GENETICS AND PLANT BREEDING
COURSE-SEMINAR (GPB591)
CONTENTS
Introduction to gene mapping.
Linkage mapping and Association mapping.
Concept of linkage disequilibrium (LD).
Milestones of GWAS.
Population structure, kinship and statistical models.
Procedure of GWAS.
Major GWA studies in plants and a case study.
Gene mapping
1. LINKAGE MAPPING
2. ASSOCIATION MAPPING
   A. Candidate gene association studies
   B. Genome-wide association studies
[Figure: linkage mapping vs. association mapping; each panel shows a marker linked to the target gene/trait] (Zhu et al., 2008)
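Linkage disequilibrium (LD)
“Non-random association of alleles at different loci”
$D = p_{AB}\,p_{ab} - p_{Ab}\,p_{aB}$
where $p_{AB}$, $p_{ab}$, $p_{Ab}$ and $p_{aB}$ are the frequencies of the four two-locus haplotypes formed by alleles A/a and B/b.
(Marker assisted plant breeding: principles and practices - B.D. Singh, A.K. Singh)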
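The D formula above is usually reported together with two normalized versions, Lewontin's D′ (D divided by its maximum possible value given the allele frequencies) and r² (the squared correlation between alleles at the two loci). A minimal sketch, assuming the four haplotype frequencies are already known; the function name `ld_stats` is hypothetical.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r2) for two biallelic loci from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies at each locus
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    # Coefficient of LD, as defined above
    D = p_AB * p_ab - p_Ab * p_aB
    # Lewontin's D' = D / Dmax; Dmax depends on the sign of D
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = D / d_max if d_max > 0 else 0.0
    # r^2 = D^2 / (pA pa pB pb)
    denom = p_A * p_a * p_B * p_b
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2
```

For example, `ld_stats(0.4, 0.1, 0.1, 0.4)` gives D = 0.15, D′ = 0.6 and r² = 0.36, while equal haplotype frequencies (0.25 each) give D = 0 (linkage equilibrium).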
LD-AM
(Mackay and Powell, 2007)
(Marker assisted plant breeding: principles and practices - B.D. Singh, A.K. Singh)
Two ways to visualize or depict LD:
1. LD decay plots: e.g. an LD decay plot for a hypothetical locus (Flint-Garcia et al., 2003)
2. Disequilibrium matrices: e.g. heat maps for visualization of LD
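Both displays are built from the same pairwise r² values: a disequilibrium matrix (heat map) shows r² for every marker pair, while an LD decay plot summarizes r² against the physical distance between markers. The sketch below assumes unphased genotypes coded as allele dosages (0/1/2) and uses the squared correlation of dosages, a common proxy for haplotype r²; the function names and the distance-binning scheme are illustrative assumptions.

```python
from itertools import combinations


def r2(x, y):
    """Squared Pearson correlation between two marker dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: r2 undefined, reported as 0
    return sxy * sxy / (sxx * syy)


def ld_matrix(genotypes):
    """Disequilibrium matrix; genotypes[m] is marker m's dosage vector."""
    m = len(genotypes)
    mat = [[1.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        mat[i][j] = mat[j][i] = r2(genotypes[i], genotypes[j])
    return mat


def ld_decay(genotypes, positions, bin_size):
    """Mean r2 of marker pairs grouped into physical-distance bins (bin start -> mean r2)."""
    bins = {}
    for i, j in combinations(range(len(genotypes)), 2):
        b = abs(positions[i] - positions[j]) // bin_size
        bins.setdefault(b, []).append(r2(genotypes[i], genotypes[j]))
    return {b * bin_size: sum(v) / len(v) for b, v in sorted(bins.items())}
```

The matrix from `ld_matrix` is what a heat map colours; plotting the values from `ld_decay` against distance gives the decay curve.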
Genome-Wide Association Studies (GWAS)
Milestones:
1. Developed in the context of human disease genetics in the mid-1990s.
2. First GWAS publication in 2002 (Ozaki et al., 2002).
3. First GWAS publication in plants in 2005 (Aranzana et al., 2005).
(Xiao et al., 2017)
Objectives of GWAS
1. To identify genomic regions associated with a particular trait of interest.
2. To identify favourable haplotypes for selection.
3. To determine the genetic architecture of the trait.
4. To identify genes of interest.
5. To develop SNP markers.
6. To enable high-resolution mapping of the genome.
Procedure of GWAS (Alqudah et al., 2020)
Population structure
Relatedness among individuals (kinship): “unequal familial relationship” (Lipka et al., 2018)
Address relatedness among individuals.
Control for population structure:
1. STRUCTURE
2. PCA
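The PCA option can be sketched as follows: markers are standardized, the genotype matrix is decomposed by singular value decomposition, and the leading principal-component scores of each individual are used as covariates in place of STRUCTURE's Q matrix. This is a generic sketch assuming 0/1/2 dosage coding; the standardization shown is one common choice, not necessarily the exact one used by any particular package, and `genotype_pcs` is a hypothetical name.

```python
import numpy as np


def genotype_pcs(dosages, n_pcs=2):
    """Principal-component scores of individuals from a 0/1/2 genotype matrix.

    dosages: (n individuals x m markers). Each polymorphic marker is centred on
    2p and scaled by sqrt(2p(1-p)), p being its allele frequency; monomorphic
    markers are dropped. Returns (scores, fraction of variance per PC).
    """
    G = np.asarray(dosages, dtype=float)
    p = G.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    Z = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    # SVD: columns of U scaled by the singular values are the PC scores
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U[:, :n_pcs] * s[:n_pcs], explained[:n_pcs]
```

Individuals from different subpopulations separate along the first PCs, which is why a few PCs can stand in for the Q matrix in the association model.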
Control for kinship (Beckett et al., 2017):
1. Identity by descent (IBD)
2. Identity by state (IBS)
https://brainder.org/2015/07/29/understanding-the-kinship-matrix/
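An IBS-based kinship matrix can be computed directly from marker data. At one locus, two individuals with dosages g_i and g_j share 2 - |g_i - g_j| of their two alleles by state, so the per-locus similarity is 1 - |g_i - g_j|/2 and the matrix entry is its average over loci. A minimal sketch assuming complete 0/1/2 data (no missing genotypes); `ibs_kinship` is a hypothetical name, and IBD-based or centred (e.g. VanRaden-type) kinship matrices are computed differently.

```python
def ibs_kinship(dosages):
    """Pairwise identity-by-state similarity from 0/1/2 dosages (rows = individuals).

    Per locus, similarity = 1 - |gi - gj| / 2 (shared alleles / 2);
    each matrix entry is the mean over all loci.
    """
    n = len(dosages)
    m = len(dosages[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(1 - abs(a - b) / 2 for a, b in zip(dosages[i], dosages[j]))
            K[i][j] = K[j][i] = shared / m
    return K
```

Identical genotypes give 1.0; opposite homozygotes at every locus give 0.0.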
Schematic diagram of the different types of populations encountered in association mapping studies
Mixed linear model framework
Outcomes of testing a marker-trait association (null hypothesis H0: no association)
                   H0 true                              H0 false
H0 rejected        Type I error (false positive)        Correct decision (true positive)
                   Probability = α                      Probability = 1 - β
H0 not rejected    Correct decision (true negative)     Type II error (false negative)
                   Probability = 1 - α                  Probability = β
(Yu et al., 2006)
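The Q + K mixed linear model cited above is commonly written, following Yu et al. (2006), as shown below. The residual variance is written here as $\mathbf{I}\sigma_e^2$, a simplifying assumption of independent, equal-variance residuals.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha} + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\operatorname{Var}(\mathbf{u}) = 2\mathbf{K}\sigma_g^2,
\qquad
\operatorname{Var}(\mathbf{e}) = \mathbf{I}\sigma_e^2
```

Here $\mathbf{y}$ holds the phenotypes, $\boldsymbol{\beta}$ fixed effects other than the marker and population group, $\boldsymbol{\alpha}$ the marker effect, $\mathbf{v}$ the population effects with membership matrix $\mathbf{Q}$ (from STRUCTURE or PCA), $\mathbf{u}$ random polygenic background effects with kinship matrix $\mathbf{K}$, and $\mathbf{X}$, $\mathbf{S}$, $\mathbf{Z}$ are incidence matrices. Each marker is a test of $H_0\!:\boldsymbol{\alpha} = 0$, so the Type I and Type II error probabilities listed above apply to every one of the many marker tests in a genome scan.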
Association mapping workflow:
1. Germplasm (bi-parental, multi-parental, diversity panel, etc.)
2. Variation in genotype (SSR, SNP, etc.)
3. Variation in phenotype (agronomic, biomedical, molecular, etc.)
4. Association between genotype and phenotype tested with a statistical model (t-test, GLM, MLM, MLMM, etc.)
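The simplest of these models, a single-marker GLM, can be sketched as an ordinary least-squares regression of the phenotype on one marker at a time, optionally with population-structure covariates (Q). This is an illustrative sketch, not the implementation of TASSEL or any other package; the function name is hypothetical, and the p-value uses a normal approximation to the t distribution, which is adequate only for reasonably large panels.

```python
import math

import numpy as np


def glm_scan(y, dosages, Q=None):
    """Single-marker GLM scan: y = mu + Q v + g * alpha + e, one marker at a time.

    y: phenotypes (n,); dosages: (n individuals x m markers) coded 0/1/2;
    Q: optional (n x k) population-structure covariates. If Q holds membership
    proportions that sum to 1, pass only k - 1 columns to avoid collinearity
    with the intercept. Returns (alpha_hat, t, p) per marker.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(dosages, dtype=float)
    n = len(y)
    base = np.ones((n, 1))  # intercept column
    if Q is not None:
        base = np.hstack([base, np.asarray(Q, dtype=float)])
    results = []
    for j in range(G.shape[1]):
        g = G[:, [j]]
        if g.max() == g.min():  # monomorphic marker: effect not estimable
            results.append((0.0, 0.0, 1.0))
            continue
        X = np.hstack([base, g])
        beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = (resid @ resid) / (n - rank)  # residual variance
        se = math.sqrt(sigma2 * np.linalg.pinv(X.T @ X)[-1, -1])
        t = beta[-1] / se
        p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal tail
        results.append((float(beta[-1]), float(t), p))
    return results
```

Dropping `Q` gives a naive scan (comparable to a t-test/regression without structure correction); supplying it corresponds to the Q-only GLM.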
Statistical software packages generally used for association mapping in plants
Free packages:
1. TASSEL: LD statistic calculation and graphical visualization; sequence analysis; association mapping using logistic regression, GLM, MLM and some other models; structure and kinship analyses; analysis of insertions/deletions; diversity estimation, etc. (http://sourceforge.net/projects/tassel; http://www.maizegenetics.net)
2. EMMAX: fast computation for large association mapping (AM) studies; corrects for population structure and kinship (http://genetics.cs.ucla.edu/emmax/)
3. GenAMap: implements structured association mapping; employs various algorithms; good graphical presentation (http://sailing.cs.cmu.edu/genamap/)
4. STRUCTURE: population structure analysis; generates the Q matrix; computationally intensive (http://pritch.bsd.uchicago.edu/structure.html)
5. EIGENSTRAT: association analysis; uses PCA to generate a P matrix that is used in place of the Q matrix (http://genepath.med.harvard.edu/~reich/software.html)
Commercial packages:
1. ASReml: MLM analysis of animal breeding data; can also be used for plants (http://www.vsni.co.uk/products/asreml)
2. GenStat: implements GLM and MLM; corrects for population structure (http://www.vsni.co.uk/software/genstat)
3. JMP Genomics: computation of population structure and marker-based kinship coefficients (http://www.jmp.com/software/genomics/)
Source: Zhu et al. (2008) and Gupta et al. (2014)
Interpreting GWAS results
Manhattan plot
[Figure: Manhattan plot, y-axis −log10(p); x-axis chromosome (genome-wide panel) and chromosome 7 position in Mb (zoomed panel)] (Cortes et al., 2020)
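A Manhattan plot is usually read against a horizontal significance line on the −log10(p) axis. One common (conservative) choice is the Bonferroni correction, alpha divided by the number of markers tested; other studies use less strict thresholds or false-discovery-rate control. A minimal sketch with hypothetical function names, assuming results are given as (chromosome, position, p) tuples:

```python
import math


def bonferroni_line(n_tests, alpha=0.05):
    """Bonferroni per-test cutoff and its height on the -log10(p) axis."""
    p_cut = alpha / n_tests
    return p_cut, -math.log10(p_cut)


def significant_snps(results, n_tests=None, alpha=0.05):
    """results: iterable of (chromosome, position, p).

    Returns SNPs at or above the Bonferroni line as
    (chromosome, position, -log10 p), sorted in genome order.
    """
    results = list(results)
    _, line = bonferroni_line(n_tests or len(results), alpha)
    hits = [(c, pos, -math.log10(p)) for c, pos, p in results if -math.log10(p) >= line]
    return sorted(hits)
```

For instance, if all 541,575 SNPs of the maize case study were tested at alpha = 0.05, the per-test cutoff would be about 9.2 × 10⁻⁸, i.e. a line at −log10(p) ≈ 7.03.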
Q-Q plot (quantile-quantile plot)
(Alqudah et al., 2020) (Yang et al., 2020)
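A Q-Q plot compares the observed −log10(p) values, sorted, with those expected if no marker were associated with the trait (p-values uniform). Points that rise above the diagonal across the whole range suggest inflation from uncorrected population structure or kinship, often summarized by the genomic inflation factor lambda (median observed 1-df chi-square divided by its null median, about 0.455). A minimal standard-library sketch; the function names are hypothetical and p-values must lie in (0, 1].

```python
import math
from statistics import NormalDist, median


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs for a Q-Q plot against the uniform null."""
    n = len(pvalues)
    observed = sorted(pvalues)
    # Under the null, the i-th smallest of n p-values (i from 0) is expected near (i + 0.5) / n
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(observed)]


def genomic_inflation(pvalues):
    """Genomic inflation factor lambda: median 1-df chi-square / null median."""
    z = NormalDist()
    # A two-sided p maps to a 1-df chi-square statistic Phi^-1(p/2)^2
    chi2 = [z.inv_cdf(p / 2) ** 2 for p in pvalues]
    null_median = z.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455
    return median(chi2) / null_median
```

A lambda close to 1 indicates that the model (e.g. Q + K) has controlled most of the spurious inflation.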
Major GWA studies carried out in plants
(T. K. Sahu et al., 2023)
Plant material: 356 diverse inbred lines of maize (2, 168 and 186 from the USA, China and CIMMYT, respectively), falling into four subpopulations:
1. Non-stiff stalk
2. Stiff stalk
3. Tropical/subtropical group (TST)
4. Mixed group

Conditions under phenotyping:
1. P-sufficient
2. P-deficient

Traits under phenotyping (code: trait):
LRL: Longest root length
TRL: Total root length
TRSA: Total root surface area
TRT: Total number of root tips
TRV: Total root volume
RD: Root diameter
RF: Root forks
RDW: Root dry weight
RSR: Root-to-shoot ratio
SDW: Shoot dry weight
TDW: Total dry weight
NVL: Number of visible leaves
PH: Plant height

Genotyping (Illumina method): 541,575 informative SNPs.
In total, 12,000 high-quality SNPs were randomly selected to estimate the population structure matrix (Q) in STRUCTURE version 2.3.4 software.
All informative SNPs were used to calculate the kinship matrix (K) in the R package Genome Association and Prediction Integrated Tool (GAPIT).
Genome-wide association mapping for the 13 traits was performed with TASSEL version 5.0 software.

Two models were used to control false-positive associations:
1. The general linear model (GLM), which used Q to correct for population stratification.
2. The mixed linear model (MLM), for which Q + K was used to correct for population structure and relative kinship.
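The MLM step (Q + K) can be sketched as a generalized least-squares scan. To keep it short, the sketch assumes the variance ratio delta = σe²/σg² has already been estimated once under the null model, a shortcut in the spirit of EMMAX/P3D; the estimation step itself is not shown, and this is not the TASSEL implementation used in the study. Phenotypes and markers are "whitened" with the Cholesky factor of K + delta·I, after which an ordinary least-squares fit gives the GLS estimate. The function name `mlm_scan` is hypothetical.

```python
import math

import numpy as np


def mlm_scan(y, dosages, K, delta, Q=None):
    """Q + K marker scan by generalized least squares (GLS).

    Assumes Var(y) is proportional to K + delta * I, with delta =
    sigma_e^2 / sigma_g^2 estimated beforehand under the null model
    (EMMAX/P3D-style shortcut; not shown). Per marker:
    y = mu + Q v + g * alpha + u + e. Returns (alpha_hat, t, p) per marker,
    p from a normal approximation.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(dosages, dtype=float)
    n = len(y)
    V = np.asarray(K, dtype=float) + delta * np.eye(n)
    L = np.linalg.cholesky(V)  # V = L L^T
    y_w = np.linalg.solve(L, y)  # whitened phenotype
    base = np.ones((n, 1))
    if Q is not None:
        base = np.hstack([base, np.asarray(Q, dtype=float)])
    results = []
    for j in range(G.shape[1]):
        g = G[:, [j]]
        if g.max() == g.min():  # monomorphic marker: effect not estimable
            results.append((0.0, 0.0, 1.0))
            continue
        X_w = np.linalg.solve(L, np.hstack([base, g]))  # whitened design matrix
        beta, _, rank, _ = np.linalg.lstsq(X_w, y_w, rcond=None)
        resid = y_w - X_w @ beta
        scale = (resid @ resid) / (n - rank)  # estimates sigma_g^2
        se = math.sqrt(scale * np.linalg.pinv(X_w.T @ X_w)[-1, -1])
        t = beta[-1] / se
        results.append((float(beta[-1]), float(t), math.erfc(abs(t) / math.sqrt(2))))
    return results
```

With K equal to the identity matrix the whitening only rescales the data, so the scan reduces to the ordinary GLM; a realistic K down-weights information shared among close relatives.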
A total of 297 associated genes identified by the GWAS showed differential expression in response to low-P stress.
Twenty-three candidate genes with pleiotropic effects showed different expression levels in at least one of the three stages.
(A) Expression profile of all significant candidate genes. (B) Expression profile of the 23 pleiotropic candidate genes.
Key candidate genes and their annotation.
Identification of Favourable Haplotypes for Molecular Breeding of Maize
(A) Gene structure of GRMZM2G009544 (CDS, coding sequence).
(B) Haplotypes formed by the 12 significant polymorphic sites.
(C) Linkage disequilibrium (LD) plot of the 12 polymorphic sites.
RESULT
Haplotype analysis of the gene GRMZM2G009544: Hap5, harbouring 12 favourable SNPs, could promote a strong root system and P absorption under low-P stress.
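The tabulation step behind a haplotype analysis like this one can be sketched as follows: each line's alleles at the selected polymorphic sites are joined into a haplotype string, lines are grouped by haplotype, and the trait is averaged per group so that favourable haplotypes stand out. This is a minimal sketch; the function name, the input layout and the `min_count` filter are assumptions, and a real analysis would also test the differences between haplotypes statistically (e.g. with ANOVA or t-tests).

```python
from collections import defaultdict
from statistics import mean


def haplotype_means(alleles_by_line, phenotype_by_line, min_count=2):
    """Group lines by their allele string at the selected sites and average the trait.

    alleles_by_line: dict line -> sequence of alleles at the polymorphic sites
    phenotype_by_line: dict line -> trait value (e.g. a root trait under low P)
    Haplotypes carried by fewer than min_count lines are dropped.
    Returns [(haplotype, (n_lines, mean_trait)), ...], highest mean first.
    """
    groups = defaultdict(list)
    for line, alleles in alleles_by_line.items():
        if line in phenotype_by_line:
            groups["".join(alleles)].append(phenotype_by_line[line])
    table = {h: (len(v), mean(v)) for h, v in groups.items() if len(v) >= min_count}
    # Rank haplotypes from highest to lowest mean trait value
    return sorted(table.items(), key=lambda kv: kv[1][1], reverse=True)
```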
REFERENCES
Alqudah, A.M., Sallam, A., Baenziger, P.S. and Börner, A., 2020. GWAS: fast-forwarding gene
identification and characterization in temperate cereals: lessons from barley–a review. Journal of
Advanced Research, 22, 119-135.
Ardlie, K.G., Kruglyak, L. and Seielstad, M., 2002. Patterns of linkage disequilibrium in the human
genome. Nature Reviews Genetics, 3(4), 299-309.
Collard, B.C., Jahufer, M.Z.Z., Brouwer, J.B. and Pang, E.C.K., 2005. An introduction to markers,
quantitative trait loci (QTL) mapping and marker-assisted selection for crop improvement: the basic
concepts. Euphytica, 142, 169-196.
Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S. and Mitchell, S.E., 2011. A
robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE, 6(5),
p.e19379.
Eltaher, S., Baenziger, P.S., Belamkar, V., Emara, H.A., Nower, A.A., Salem, K.F., Alqudah, A.M. and
Sallam, A., 2021. GWAS revealed effect of genotype × environment interactions for grain yield of
Nebraska winter wheat. BMC Genomics, 22, 1-14.
Ibrahim, A.K., Zhang, L., Niyitanga, S., Afzal, M.Z., Xu, Y., Zhang, L., Zhang, L. and Qi, J., 2020. Principles
and approaches of association mapping in plant breeding. Tropical Plant Biology, 13, 212-224.
Singh, B.D. and Singh, A.K. Marker-assisted plant breeding: principles and practices.