Resources

All of Us Data Browser

Browse aggregate-level data contributed by All of Us research participants. Data are derived from multiple data sources. To protect participant privacy, we have removed personal identifiers, rounded aggregate data to counts of 20, and only included summary demographic information. Individual-level data are available for analysis in the Researcher Workbench.

BRAVO

This version of BRAVO variant browser shows chromosome locations (on GRCh38 human genome assembly), alleles, functional annotations, and allele frequencies for 705 million variants observed in 132,345 deeply sequenced (>38X) genomes from the TOPMed (Trans-Omics for Precision Medicine) data freeze 8.

CADD

Combined Annotation Dependent Depletion (CADD) is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletions variants in the human genome. Developed by UW, Brotman-Baty, Hudson-Alpha, and Berlin Institute of Health.

dbGaP

Database of Genotypes and Phenotypes (dbGaP), a repository of information produced by studies investigating the interaction of genotype and phenotype.

dbSNP

dbSNP is a single-nucleotide-polymorphism database - a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI).

Ensembl

Ensembl provides an automatic gene annotation for Homo sapiens. In the case of human and mouse, the GTF files are equivalent to the GENCODE gene set. Developed by the European Bioinformatics Institute, under the European Molecular Biology Laboratory.

Genebass

Genebass (“gene-based association summary statistics”) is a resource of exome-based association statistics, made available to the public. The dataset encompasses 4,529 phenotypes with gene-based and single-variant testing across 394,841 individuals with exome sequence data from the UK Biobank. Genebass was developed by the following organizations which provided funding and guidance: AbbVie, Biogen, Pfizer, Broad Institute.

GEO

Gene Expression Omnibus (GEO) is a database for gene expression profiling and RNA methylation profiling managed by the National Center for Biotechnology Information (NCBI). Array- and sequence-based data are accepted.

gnomAD

gnomAD (the Genome Aggregation Database), originally launched in 2014 as the Exome Aggregation Consortium (ExAC), is a resource developed by an international coalition of investigators [organization], with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community. Broad Institute contributes data storage, computing resources, and human effort.

VEP

The Variant Effect Predictor (VEP) is part of Ensembl and Ensembl Genomes and allows the user to explore and analyze the effect that the variants (SNPs, CNVs, indels or structural variations) have on a particular gene, sequence, protein, transcript or transcription factor.

Projects & Programs

1000 Genomes

launched in January 2008; completed in 2015
2,504 samples
Demographics difficult to summarize
Data provided through IGSR

All of Us

The All of Us Research Program is a historic effort to collect and study data from one million or more people living in the United States.

Demographics (n=409,420):

Population	n	%
white	222,660	54.38
black	77,080	18.83
hispanic	64,680	15.8
>1	16,280	3.98
asian	13,840	3.38
other	7,180	1.75
skip	5,220	1.27
decline	2,560	0.63
total	409,500

source

245,400 WGS samples

CCDG

The Centers for Common Disease Genomics (CCDG) are a collaborative large-scale genome sequencing effort comprehensively identifying rare risk and protective variants that contribute to multiple common disease phenotypes.

gnomAD

Demographics

Population	overall
African/African American	20,744
Amish	456
Latino/Admixed American	7,647
Ashkenazi Jewish	1,736
East Asian	2,604
European (Finnish)	5,316
Middle Eastern	158
European (non-Finnish)	34,029
South Asian	2,419
Other	1,047
XX	38,947
XY	37,209
Total	76,156

TOPMed

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed data are being made available to the scientific community as a series of “data freezes”: genotypes and phenotypes via dbGaP; read alignments via the Sequence Read Archive (SRA); and variant summary information via the Bravo variant server and dbSNP.

population	n	%
European	72,300	40
African	51,110	29
Hispanic	34,060	19
Asian	14,700	8
Other/Multiple/Unknown	7,240	4

source

Summary

	n	European	African	Hispanic	Asian	NA
1000 Genomes	2,504
All of Us	409,420	222,660 (54%)	77,080 (19%)	64,680 (16%)	13,840 (3%)	31,240 (8%)
Genebass/UKBB	394,841	367,300 (93%)	6,700 (2%)	0 (0%)	5,200 (1%)	15,800 (4%)
gnomAD	76,156	39,345 (52%)	20,744 (27%)	7,647 (10%)	5,023 (7%)	3,397 (4%)
MVP	104,923	72,939 (70%)	24,623 (23%)	5,554 (5%)	687 (1%)	1,120 (1%)
TOPMed	179,410	72,300 (40%)	51,110 (29%)	34,060 (19%)	14,700 (8%)	7,240 (4%)

Note: NA = Other, multiple, declined, skipped, etc.
Note: All of Us claimed 245,400 WGS samples (from a population of 409,420)
Note: UKBB ancestry statistics from NPR article
430,000 “white British ancestry
78,296 UKBB “non-white British

Biobanks

UK Biobank - ~500k volunteers - longitudinal study over period of 30 years - connects genotypes & phenotypes - founded by Sir Rory Collins - overseen by a board chaired by Lord Kakkar that is accountable to the Medical Research Council & Wellcome Trust (the constituent organizations of the company) - funded by the UK Department of Health, the Medical Research Council, the Scottish Executive, and the Wellcome Trust medical research charity.

All of Us - Mayo Clinic in Rochester, Minnesota

Organizations

Genome Reference Consortium (GRC)
(under NIH)
(under NIH)
Centers for Common Disease Genomics (CCDG) (under NHGRI)

Summary of organizations and resources

European Molecular Biology Laboratory (EMBL)
└── European Bioinformatics Institute (EBI)
    └── Ensembl
        ├── Annotations
        └── Variant effect predictor (VEP)
        
United Kingdom Research and Innovation (UKRI)
└── Medical Research Council (MRC)
    └── UK Biobank
        └── Genebass
        
Department of Health and Human Services (HHS)
├── National Institutes of Health (NIH)
│   └── National Human Genome Research Institute (NHGRI)
│       ├── Centers for Common Disease Genomics (CCDG)
│       └── Human Genome Project (HGP)
├── National Heart, Lung and Blood Institute (NHLBI)
│   └── Trans-Omics for Precision Medicine (TOPMed) Consortium
│       ├── Data on dbGaP (database of Genotypes and Phenotypes)
│       └── Bravo variant browser
└── National Library of Medicine (NLM)
    └── National Center for Biotechnology Information (NCBI)
        ├── GenBank sequence database
        ├── Single Nucleotide Polymorphism Database (dbSNP)
        ├── Online Mendelian Inheritance in Man (OMIM)
        └── PubMed
        
University of Washington
└── Combined Annotation Dependent Depletion (CADD)

Broad Institute
├── Genome Aggregation Database (gnomaD)
└── Hail