Resources

All of Us Data Browser

Browse aggregate-level data contributed by All of Us research participants. Data are derived from multiple data sources. To protect participant privacy, we have removed personal identifiers, rounded aggregate data to counts of 20, and only included summary demographic information. Individual-level data are available for analysis in the Researcher Workbench.

BRAVO

This version of the BRAVO variant browser shows chromosome locations (on the GRCh38 human genome assembly), alleles, functional annotations, and allele frequencies for 705 million variants observed in 132,345 deeply sequenced (>38X) genomes from the TOPMed (Trans-Omics for Precision Medicine) data freeze 8.

CADD

Combined Annotation Dependent Depletion (CADD) is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome. It was developed by the University of Washington (UW), Brotman Baty Institute, HudsonAlpha, and the Berlin Institute of Health.

dbGaP

The Database of Genotypes and Phenotypes (dbGaP) is a repository of information produced by studies investigating the interaction of genotype and phenotype.

dbSNP

dbSNP is a single-nucleotide-polymorphism database - a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI).

Ensembl

Ensembl provides automatic gene annotation for Homo sapiens. For human and mouse, the GTF files are equivalent to the GENCODE gene set. Ensembl is developed by the European Bioinformatics Institute, part of the European Molecular Biology Laboratory.

Genebass

Genebass (“gene-based association summary statistics”) is a public resource of exome-based association statistics. The dataset encompasses 4,529 phenotypes with gene-based and single-variant testing across 394,841 individuals with exome sequence data from the UK Biobank. Genebass was developed with funding and guidance from AbbVie, Biogen, Pfizer, and the Broad Institute.

GEO

Gene Expression Omnibus (GEO) is a database for gene expression profiling and RNA methylation profiling managed by the National Center for Biotechnology Information (NCBI). Array- and sequence-based data are accepted.

gnomAD

gnomAD (the Genome Aggregation Database), originally launched in 2014 as the Exome Aggregation Consortium (ExAC), is a resource developed by an international coalition of investigators with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects and making summary data available to the wider scientific community. The Broad Institute contributes data storage, computing resources, and human effort.

VEP

The Variant Effect Predictor (VEP) is part of Ensembl and Ensembl Genomes and allows the user to explore and analyze the effect that the variants (SNPs, CNVs, indels or structural variations) have on a particular gene, sequence, protein, transcript or transcription factor.

Projects & Programs

1000 Genomes

  • Launched in January 2008; completed in 2015
  • 2,504 samples
  • Demographics difficult to summarize
  • Data provided through IGSR

All of Us

The All of Us Research Program is a historic effort to collect and study data from one million or more people living in the United States.

Demographics (n=409,420):

Population          n      %
white         222,660  54.38
black          77,080  18.83
hispanic       64,680  15.80
>1             16,280   3.98
asian          13,840   3.38
other           7,180   1.75
skip            5,220   1.27
decline         2,560   0.63
total         409,500

source
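As a quick consistency check on the All of Us demographics table, the following Python sketch recomputes each share from the listed counts. It assumes the percentages were taken against the stated n=409,420 rather than against the row sum; the variable names are illustrative only.

```python
# Counts copied from the All of Us demographics table.
counts = {
    "white": 222_660,
    "black": 77_080,
    "hispanic": 64_680,
    ">1": 16_280,
    "asian": 13_840,
    "other": 7_180,
    "skip": 5_220,
    "decline": 2_560,
}

# Assumption: shares are computed against the stated n, not the row sum.
STATED_N = 409_420

row_sum = sum(counts.values())
shares = {group: round(100 * n / STATED_N, 2) for group, n in counts.items()}

# Print one line per group, then the row sum for comparison with STATED_N.
for group, n in counts.items():
    print(f"{group:<10}{n:>9,}{shares[group]:>7.2f}")
print(f"{'row sum':<10}{row_sum:>9,}")
```

Under that assumption the recomputed shares agree with the table to two decimals. The rows sum to 409,500, which is 80 more than the stated n; rounding each count to a multiple of 20 may account for the gap, though the source does not say so.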

245,400 WGS samples

CCDG

The Centers for Common Disease Genomics (CCDG) are a collaborative large-scale genome sequencing effort comprehensively identifying rare risk and protective variants that contribute to multiple common disease phenotypes.

gnomAD

Demographics

Population                 Overall
African/African American    20,744
Amish                          456
Latino/Admixed American      7,647
Ashkenazi Jewish             1,736
East Asian                   2,604
European (Finnish)           5,316
Middle Eastern                 158
European (non-Finnish)      34,029
South Asian                  2,419
Other                        1,047
XX                          38,947
XY                          37,209
Total                       76,156

TOPMed

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed data are being made available to the scientific community as a series of “data freezes”: genotypes and phenotypes via dbGaP; read alignments via the Sequence Read Archive (SRA); and variant summary information via the Bravo variant server and dbSNP.

Population                    n   %
European                 72,300  40
African                  51,110  29
Hispanic                 34,060  19
Asian                    14,700   8
Other/Multiple/Unknown    7,240   4

source

Summary

Dataset              n  European       African       Hispanic      Asian        NA
1000 Genomes     2,504
All of Us      409,420  222,660 (54%)  77,080 (19%)  64,680 (16%)  13,840 (3%)  31,240 (8%)
Genebass/UKBB  394,841  367,300 (93%)  6,700 (2%)    0 (0%)        5,200 (1%)   15,800 (4%)
gnomAD          76,156  39,345 (52%)   20,744 (27%)  7,647 (10%)   5,023 (7%)   3,397 (4%)
MVP            104,923  72,939 (70%)   24,623 (23%)  5,554 (5%)    687 (1%)     1,120 (1%)
TOPMed         179,410  72,300 (40%)   51,110 (29%)  34,060 (19%)  14,700 (8%)  7,240 (4%)
  • Note: NA = Other, multiple, declined, skipped, etc.
  • Note: All of Us claimed 245,400 WGS samples (from a population of 409,420)
  • Note: UKBB ancestry statistics from NPR article:
      • 430,000 “white British ancestry”
      • 78,296 UKBB “non-white British”
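
The gnomAD row of the summary table can be reproduced by collapsing the gnomAD population labels into the five summary columns. The grouping in this Python sketch is inferred from the arithmetic (it reproduces the summary figures) and is not stated in the source, so treat it as an assumption; the names `GROUPS`, `summary`, and `shares` are illustrative.

```python
from collections import defaultdict

# Per-population counts copied from the gnomAD demographics table.
gnomad = {
    "African/African American": 20_744,
    "Amish": 456,
    "Latino/Admixed American": 7_647,
    "Ashkenazi Jewish": 1_736,
    "East Asian": 2_604,
    "European (Finnish)": 5_316,
    "Middle Eastern": 158,
    "European (non-Finnish)": 34_029,
    "South Asian": 2_419,
    "Other": 1_047,
}

# Assumed mapping from gnomAD labels to summary columns (inferred, not sourced).
GROUPS = {
    "African/African American": "African",
    "Amish": "NA",
    "Latino/Admixed American": "Hispanic",
    "Ashkenazi Jewish": "NA",
    "East Asian": "Asian",
    "European (Finnish)": "European",
    "Middle Eastern": "NA",
    "European (non-Finnish)": "European",
    "South Asian": "Asian",
    "Other": "NA",
}

# Sum the counts within each summary column.
summary = defaultdict(int)
for population, n in gnomad.items():
    summary[GROUPS[population]] += n

# Whole-number percentage of the gnomAD total for each column.
total = sum(gnomad.values())
shares = {group: round(100 * n / total) for group, n in summary.items()}

for group in ("European", "African", "Hispanic", "Asian", "NA"):
    print(f"{group:<9}{summary[group]:>7,} ({shares[group]}%)")
```

The same approach would apply to the All of Us row, where the NA column appears to combine the ">1", "other", "skip", and "decline" rows.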

Biobanks

UK Biobank

  • ~500k volunteers
  • Longitudinal study over a period of 30 years
  • Connects genotypes & phenotypes
  • Founded by Sir Rory Collins
  • Overseen by a board chaired by Lord Kakkar that is accountable to the Medical Research Council & Wellcome Trust (the constituent organizations of the company)
  • Funded by the UK Department of Health, the Medical Research Council, the Scottish Executive, and the Wellcome Trust medical research charity

All of Us

  • Mayo Clinic in Rochester, Minnesota

Organizations

  • Genome Reference Consortium (GRC)
  • (under NIH)
  • (under NIH)
  • Centers for Common Disease Genomics (CCDG) (under NHGRI)

Summary of organizations and resources

European Molecular Biology Laboratory (EMBL)
└── European Bioinformatics Institute (EBI)
    └── Ensembl
        ├── Annotations
        └── Variant effect predictor (VEP)
        
United Kingdom Research and Innovation (UKRI)
└── Medical Research Council (MRC)
    └── UK Biobank
        └── Genebass
        
Department of Health and Human Services (HHS)
└── National Institutes of Health (NIH)
    ├── National Human Genome Research Institute (NHGRI)
    │   ├── Centers for Common Disease Genomics (CCDG)
    │   └── Human Genome Project (HGP)
    ├── National Heart, Lung and Blood Institute (NHLBI)
    │   └── Trans-Omics for Precision Medicine (TOPMed) Consortium
    │       ├── Data on dbGaP (database of Genotypes and Phenotypes)
    │       └── Bravo variant browser
    └── National Library of Medicine (NLM)
        └── National Center for Biotechnology Information (NCBI)
            ├── GenBank sequence database
            ├── Single Nucleotide Polymorphism Database (dbSNP)
            ├── Online Mendelian Inheritance in Man (OMIM)
            └── PubMed
        
University of Washington
└── Combined Annotation Dependent Depletion (CADD)

Broad Institute
├── Genome Aggregation Database (gnomAD)
└── Hail