eTBLAST
eTBLAST is a now-defunct free text-similarity search engine originally developed by Alexander Pertsemlidis and Harold "Skip" Garner at The University of Texas Southwestern Medical Center. eTBLAST offered access to the MEDLINE database, the National Institutes of Health (NIH) CRISP database, the Institute of Physics (IOP) database, Wikipedia, arXiv, the NASA technical reports database, Virginia Tech class descriptions and a variety of databases of clinical interest. eTBLAST searched citation databases[1][2] and databases containing full text,[3] such as PUBMED, and compared a user's natural-text query to the target databases using a hybrid search algorithm: a low-sensitivity, weighted, keyword-based first pass followed by a novel sentence-alignment-based second pass. eTBLAST was later offered as a web-based service of The Innovation Laboratory at the Virginia Bioinformatics Institute.
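The two-pass idea described above can be illustrated with a minimal Python sketch. It is not eTBLAST's actual implementation: the inverse-document-frequency weighting, the use of the standard library's difflib.SequenceMatcher as a stand-in for sentence alignment, and the function names (tokenize, keyword_pass, alignment_pass, hybrid_search) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_pass(query, documents, top_k=2):
    """First pass (assumed weighting): score each document by the summed
    IDF-style weight of the query words it contains, keep the top_k."""
    doc_tokens = [set(tokenize(d)) for d in documents]
    n = len(documents)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in doc_tokens for word in tokens)
    query_words = set(tokenize(query))
    scores = []
    for idx, tokens in enumerate(doc_tokens):
        shared = query_words & tokens
        score = sum(math.log((n + 1) / (df[w] + 1)) + 1.0 for w in shared)
        scores.append((score, idx))
    # Highest score first; ties broken by document index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for score, idx in scores[:top_k] if score > 0]


def alignment_pass(query, documents, candidates):
    """Second pass (stand-in for sentence alignment): re-rank the
    candidates by word-sequence similarity to the query."""
    q = tokenize(query)
    ranked = []
    for idx in candidates:
        matcher = SequenceMatcher(None, q, tokenize(documents[idx]), autojunk=False)
        ranked.append((matcher.ratio(), idx))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


def hybrid_search(query, documents, top_k=2):
    """Cheap keyword filter first, costlier alignment on the survivors."""
    return alignment_pass(query, documents, keyword_pass(query, documents, top_k))
```

The design point is that the inexpensive keyword pass narrows a large collection to a handful of candidates, so the more expensive order-sensitive comparison only runs on those. Calling hybrid_search(query, documents) returns a list of (similarity, document index) pairs, best match first.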
As a text similarity engine, eTBLAST made possible a large study of duplicate publications and potential plagiarism in the biomedical literature. Thousands of randomly sampled MEDLINE abstracts were submitted to eTBLAST, and those with the highest similarity to other abstracts were studied and entered into an online database. This work revealed several trends, including an increasing rate of duplication in the biomedical literature, as reported in the journals Bioinformatics,[4][5] Anaesthesia and Intensive Care,[6] Clinical Chemistry,[7] Urologic Oncology,[8] Nature,[9] and Science.[10]
The system is now called HelioBLAST and is offered, still free of charge, by Harold "Skip" Garner through his company HelioText. It is continuously expanding with additional text-based databases.
See also
- BLAST (Basic Local Alignment Search Tool)
- Natural language processing
- Medical literature retrieval
References
- Lewis, J; Ossowski, S; Hicks, J; Errami, M; Garner, HR (2006). "Text similarity: An alternative way to search MEDLINE". Bioinformatics. 22 (18): 2298–304. doi:10.1093/bioinformatics/btl388. PMID 16926219.
- Pertsemlidis, A; Garner, HR (2004). "Text comparison based on dynamic programming". IEEE Engineering in Medicine and Biology Magazine. 23 (6): 66–71. doi:10.1109/MEMB.2004.1378640. PMID 15688594.
- Sun, Z; Errami, M; Long, T; Renard, C; Choradia, N; Garner, H (2010). Curioso, Walter H (ed.). "Systematic Characterizations of Text Similarity in Full Text Biomedical Publications". PLOS ONE. 5 (9): e12704. Bibcode:2010PLoSO...512704S. doi:10.1371/journal.pone.0012704. PMC 2939881. PMID 20856807.
- Errami, M; Hicks, JM; Fisher, W; Trusty, D; Wren, JD; Long, TC; Garner, HR (2007). "Déjà vu: A study of duplicate citations in Medline". Bioinformatics. 24 (2): 243–9. doi:10.1093/bioinformatics/btm574. PMID 18056062.
- Errami, M; Sun, Z; George, AC; Long, TC; Skinner, MA; Wren, JD; Garner, HR (2010). "Identifying duplicate content using statistically improbable phrases". Bioinformatics. 26 (11): 1453–7. doi:10.1093/bioinformatics/btq146. PMC 2872002. PMID 20472545.
- Loadsman, JA; Garner, HR; Drummond, GB (2008). "Towards the elimination of duplication in Anaesthesia and Intensive Care". Anaesthesia and Intensive Care. 36 (5): 643–5. doi:10.1177/0310057X0803600502. PMID 18853580.
- George, AC; Long, TC; Garner, HR (2010). "Quaere Verum". Clinical Chemistry. 56 (4): 673–4. doi:10.1373/clinchem.2009.130468. PMID 20093558.
- Garner, HR (2011). "Combating unethical publications with plagiarism detection services". Urologic Oncology. 29 (1): 95–9. doi:10.1016/j.urolonc.2010.09.016. PMC 3035174. PMID 21194644.
- Errami, M; Garner, H (2008). "A tale of two citations". Nature. 451 (7177): 397–9. Bibcode:2008Natur.451..397E. doi:10.1038/451397a. PMID 18216832. S2CID 4358525.
- Long, TC; Errami, M; George, AC; Sun, Z; Garner, HR (2009). "Responding to Possible Plagiarism". Science. 323 (5919): 1293–4. doi:10.1126/science.1167408. PMID 19265004. S2CID 28467385.