The National Institutes of Health (NIH) Genomic Data Sharing Policy expects that genomic research
data from NIH-supported studies involving human specimens, as well as non-human
and model organisms, will be submitted to appropriate data repositories. The list below provides examples of relevant
databases.
NIH Data Repositories, NIH-Funded Databases, and NIH Database Collaborations
ArrayExpress: an NIH-funded database at the
European Molecular Biology Laboratory-European Bioinformatics Institute that
collects and disseminates microarray-based gene-expression data. Read more about ArrayExpress.
DNA Data Bank of Japan (DDBJ): a data bank organized by the National Institute of Genetics in Japan that collects sequence data. As a member of the International Nucleotide Sequence Database Collaboration, DDBJ exchanges data with GenBank at the NIH National Center for Biotechnology Information and the European Nucleotide Archive at the European Molecular Biology Laboratory-European Bioinformatics Institute.
Read more about DDBJ.
Database of
Genotypes and Phenotypes (dbGaP): an NIH
database at the National Center for Biotechnology Information
originally designed to archive and distribute coded
genotype, phenotype, exposure, and pedigree data from genome-wide
association studies. dbGaP now accepts additional types of data such as copy
number variants and large-scale sequencing.
Read more about dbGaP.
Database
of Short Genetic Variations (dbSNP): an NIH database at
the National Center for Biotechnology Information that includes single nucleotide variations, microsatellites, and small-scale insertions and deletions. dbSNP provides population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral variations and clinical mutations.
Read more about dbSNP.
Database of Genomic Structural Variation (dbVar): an NIH database at the National Center for Biotechnology Information for large-scale
structural genomic variations--such as insertions, deletions, translocations, and inversions--and associated phenotype information. dbVar accepts germline and somatic human structural
variant data as well as data from a diverse array of organisms, including agriculturally important plants and livestock. Read
more about dbVar.
European Nucleotide
Archive (ENA): a database at the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) that collects, maintains, and presents comprehensive sequencing
information--including raw sequencing data, sequence assembly information, and functional annotation--as part of the permanent public scientific record. As a member of the International Nucleotide Sequence Database Collaboration, EMBL-EBI exchanges data with GenBank at the NIH National Center for Biotechnology Information and the DNA Data Bank of Japan.
Read more about ENA.
FlyBase: an NIH-funded database for genetic and genomic information on the fruit fly Drosophila melanogaster and related fly
species. It includes referenced sequence genomes, phenotypic and gene expression data, chromosome maps, and additional resources. Read more about FlyBase.
GenBank:
an NIH genetic sequence database at the National Center for Biotechnology Information (NCBI) that provides an annotated collection of publicly available DNA sequences. As a member of the International Nucleotide Sequence Database Collaboration, NCBI exchanges GenBank data with the European Nucleotide Archive at the European Molecular Biology Laboratory-European Bioinformatics Institute and the DNA Data
Bank of Japan. Read more about GenBank.
Gene
Expression Omnibus (GEO): an NIH data repository that archives and distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data. Read more about GEO.
Influenza Research Database
(IRD): an NIH-funded database that provides genomic and proteomic data for influenza viruses as well as surveillance data and phenotypic characteristics of viruses isolated from extracts. Read more about IRD.
Mouse
Genome Informatics (MGI): an NIH-funded international database for the laboratory mouse Mus musculus
that provides data on gene characterization, allelic variants, gene expression, mouse tumor biology, strain-specific phenotypes and genotypes, and mammalian orthology. Read more about MGI.
Rat Genome Database (RGD):
an NIH-funded database that serves as a repository of genetic and genomic data from the laboratory rat Rattus norvegicus
and also provides curation of mapped positions for quantitative trait loci, known mutations, and other phenotypic data. Read more about RGD.
Sequence
Read Archive (SRA): NIH's primary archive of high-throughput sequencing data at the National Center for Biotechnology Information (NCBI). SRA stores raw sequencing data as well as alignment information in the form of read placements on a reference sequence. As a member of the International Nucleotide
Sequence Database Collaboration, NCBI exchanges SRA data with the European Nucleotide Archive at the European Molecular Biology Laboratory-European Bioinformatics Institute and the DNA Data Bank of Japan. Read more about SRA.
WormBase:
an NIH-funded international consortium that provides accurate, current, accessible information concerning the genetics, genomics, and biology of Caenorhabditis elegans and related nematodes. Read more about WormBase.
Xenbase:
an NIH-funded database that serves as a biology and genomics resource for research on the African frog species Xenopus
laevis and Xenopus tropicalis. Read more about Xenbase.
Zebrafish Information Network
(ZFIN): an NIH-funded database that collects, curates, and
disseminates genetic, genomic, phenotypic, and developmental data about the zebrafish Danio rerio. Data represented in ZFIN are derived from three primary sources: curation of zebrafish publications, individual research laboratories, and collaborations with bioinformatics organizations. Read more about ZFIN.
Data Repositories Established as NIH Trusted Partners
For genomic data derived from human specimens, NIH may
employ trusted third parties, or trusted partners, to meet infrastructure needs for data storage and/or to provide tools that are useful for genomic data analyses. A trusted partner is defined as a public or private, national or international organization that is able to meet core NIH standards for establishing data quality and data management service
protocols.
NIH Established Trusted Partners
Cancer Genomics Hub (CGHub): CGHub stores, catalogs, and facilitates research using cancer genome sequences, alignments, and mutation information from The Cancer Genome Atlas (TCGA) consortium and related projects.