Generic Model Organism Database

The Generic Model Organism Database (GMOD) project provides biological research communities with a toolkit of open-source software components for visualizing, annotating, managing, and storing biological data. The GMOD project is funded by the United States National Institutes of Health, National Science Foundation and the USDA Agricultural Research Service.

History

The GMOD project was started in the early 2000s as a collaboration between several model organism databases (MODs) that shared a need for similar software tools to process data from sequencing projects. MODs, or organism-specific databases, describe genomic and other information about important experimental organisms in the life sciences and capture the large volumes of data generated by modern biology. Rather than each group designing its own software, four major MODs (FlyBase, Saccharomyces Genome Database, Mouse Genome Database, and WormBase) worked together to create applications that provide functionality needed by all MODs, such as software to manage the data within a MOD and to help users access and query that data.

The GMOD project works to keep its software components interoperable. To this end, many of the tools share a common input/output file format or run against a database that uses the Chado schema.

Chado database schema

The Chado[1] schema aims to cover many of the classes of data frequently used by modern biologists, including genetic data, phylogenetic trees, publications, organisms, microarray data, identifiers, and RNA/protein expression. Chado makes extensive use of controlled vocabularies to type all entities in the database. For example, genes, transcripts, exons, transposable elements, and other sequence features are all stored in a single feature table, with each row's type provided by the Sequence Ontology. When a new type is added to the Sequence Ontology, the feature table requires no modification, only an update of the data in the database. The same is largely true of analysis data, which can also be stored in Chado.
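The typing approach described above can be illustrated with a minimal sketch. It is not the actual Chado schema: Chado targets PostgreSQL and its cv, cvterm, and feature tables carry many more columns, whereas this example uses an in-memory SQLite database with a heavily reduced set of columns. The helper functions (add_term, add_feature, features_of_type, build_demo), the feature identifiers, and the vocabulary name "sequence" are assumptions made for illustration only.

```python
import sqlite3

# Heavily simplified stand-ins for Chado's cv, cvterm, and feature tables.
# Assumption: only the columns needed to show ontology-based typing are kept.
SCHEMA = """
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
    name      TEXT NOT NULL,
    UNIQUE (cv_id, name)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
"""


def add_term(conn, term_name, cv_name="sequence"):
    """Register a term in a controlled vocabulary and return its id."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (cv_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO cvterm (cv_id, name) VALUES (?, ?)",
        (cv_id, term_name),
    )
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
        (cv_id, term_name),
    ).fetchone()[0]


def add_feature(conn, uniquename, type_name):
    """Store a feature whose type is an already-registered vocabulary term."""
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (type_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown feature type: {type_name}")
    conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, row[0]),
    )


def features_of_type(conn, type_name):
    """Return the unique names of all features typed with the given term."""
    rows = conn.execute(
        "SELECT f.uniquename FROM feature f "
        "JOIN cvterm t ON f.type_id = t.cvterm_id "
        "WHERE t.name = ? ORDER BY f.uniquename",
        (type_name,),
    )
    return [r[0] for r in rows]


def build_demo():
    """Build a small database with hypothetical features of several types."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for term in ("gene", "mRNA", "exon"):
        add_term(conn, term)
    add_feature(conn, "gene0001", "gene")
    add_feature(conn, "gene0001-RA", "mRNA")
    add_feature(conn, "gene0001-RA:exon1", "exon")
    # A new feature type arrives: one new cvterm row, no ALTER TABLE.
    add_term(conn, "transposable_element")
    add_feature(conn, "te0001", "transposable_element")
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print(features_of_type(demo, "transposable_element"))
```

In this sketch, supporting a new feature type means inserting one row into cvterm; the feature table's columns stay the same, which is the property the paragraph above describes.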

The existing core modules of Chado are:

sequence - for sequences/features
cv - for controlled vocabularies and ontologies
general - currently just database cross-references (dbxrefs)
organism - taxonomic data
pub - publication and references
companalysis - augments sequence module with computational analysis data
map - non-sequence maps
genetic - genetic and phenotypic data
expression - gene expression
natural diversity - population data

Software

The full list of GMOD software components is found on the GMOD Components page. These components include:

GMOD Core (Chado database and tools)
- Chado: the Chado schema and tools to install it.
- XORT: a tool for loading and dumping Chado XML
- GMODTools: extracts data from a Chado database into common genome bulk formats (GFF, FASTA, etc.)
MOD website
- Tripal: a web front end based on Drupal.
Genome Editing and Visualization
- Apollo: a Java application for viewing and editing genome annotations
- GBrowse: a CGI application for displaying genome annotations[2]
- JBrowse: a JavaScript application for displaying genome annotations
- Pathway Tools: a genome browser with a comparative mode
Comparative Genomics
- GBrowse_syn: a GBrowse-based synteny viewer
- CMap: a CGI application for displaying comparative maps
Literature curation
- Textpresso: a text mining system for scientific literature
Database querying tools
- BioMart: a query-oriented data management system
- InterMine: open source data warehouse system
Biological Pathways
- Pathway Tools: tools for metabolic pathway information, and analysis of high-throughput functional genomics data
Regulatory Networks
- Pathway Tools: supports definition of regulatory interactions and browsing of regulatory networks
Analysis
- Galaxy[3]
- MAKER

Participating databases

The following organism databases are contributing to and/or adopting GMOD components for model organism databases.

ANISEED	AntonosporaDB	Arabidopsis
Beebase	BeetleBase[4]	Bovine genome database (BGD)
BioHealthBase	Bovine QTL Viewer	Cattle EST Gene Family Database
CGD	CGL	ChromDB
Chromosome 7 Annotation Project	CSHLmpd	Database of Genomic Variants
DictyBase[5]	DroSpeGe	EcoCyc
FlyBase	Fungal Comparative Genomics	Fungal Telomere Browser
Gallus Genome Browser	GeneDB	GrainGenes
Gramene	HapMap	Human 2q33
Human Genome Segmental Duplication Database	IVDB	MAGI
Marine Biological Lab Organism Databases	Mouse Genome Informatics	Non-Human Segmental Duplication Database
OMAP	OryGenesDB	Oryza Chromosome 8
Pathway Tools	ParameciumDB[6]	PeanutMap
PlantsDB	PlasmoDB	PomBase
PseudoCAP	PossumBase	PUMAdb
Rat Genome Database	Saccharomyces Genome Database	SGD Lite
SmedDB	Sol Genomics Network	Soybase
Soybean Gbrowse Database	T1DBase	The Arabidopsis Information Resource
TGD	The Genome Institute	The Institute for Genomic Research
TIGR Rice Genome Browser	ToxoDB	TriAnnot BAC Viewer
VectorBase	wFleaBase[7]	WormBase
XanthusBase	Xenbase

Related projects

Bioperl, BioJava, Biopython, BioRuby, etc.
Ensembl
Gene Ontology
DAS
Genomics Unified Schema
Manatee: Manual Annotation Tool
Biocurator.org
Open Biomedical Ontologies
Sequence Ontology Project

References

[1] Mungall CJ; Emmert DB; The FlyBase Consortium (2007). "A Chado case study: an ontology-based modular schema for representing genome-associated biological information". Bioinformatics. 23 (13): i337–i346. doi:10.1093/bioinformatics/btm189. PMID 17646315.
[2] Stein LD; Mungall C; Shu S; Caudy M; Mangone M; Day A; Nickerson E; Stajich JE; Harris TW; Arva A; Lewis S. (2002). "The generic genome browser: a building block for a model organism system database". Genome Res. 12 (10): 1599–610. doi:10.1101/gr.403602. PMC 187535. PMID 12368253.
[3] Afgan E; Baker D; van den Beek M; Blankenberg D; Bouvier D; Čech M; Chilton J; Clements D; Coraor N; Eberhard C; Grüning B; Guerler A; Hillman-Jackson J; Von Kuster G; Rasche E; Soranzo N; Turaga N; Taylor J; Nekrutenko A; Goecks J. (2016). "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update". Nucleic Acids Res. 44 (W1): W3–W10. doi:10.1093/nar/gkw343. PMC 4987906. PMID 27137889.
[4] Wang L; Wang S; Li Y; Paradesi MS; Brown SJ. (2007). "BeetleBase: the model organism database for Tribolium castaneum". Nucleic Acids Res. 35 (Database issue): D476–9. doi:10.1093/nar/gkl776. PMC 1669707. PMID 17090595.
[5] Chisholm RL; Gaudet P; Just EM; Pilcher KE; Fey P; Merchant SN; Kibbe WA. (2006). "dictyBase, the model organism database for Dictyostelium discoideum". Nucleic Acids Res. 34 (Database issue): D423–7. doi:10.1093/nar/gkj090. PMC 1347453. PMID 16381903.
[6] Arnaiz O; Cain S; Cohen J; Sperling L. (2007). "ParameciumDB: a community resource that integrates the Paramecium tetraurelia genome sequence with genetic data". Nucleic Acids Res. 35 (Database issue): D439–44. doi:10.1093/nar/gkl777. PMC 1669747. PMID 17142227.
[7] Colbourne JK; Singan VR; Gilbert DG. (2005). "wFleaBase: the Daphnia genome database". BMC Bioinformatics. 6: 45. doi:10.1186/1471-2105-6-45. PMC 555599. PMID 15752432.

External links

GMOD website

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
