List of datasets for machine-learning research

These datasets are used for machine-learning research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively, the availability of high-quality training datasets.[1] High-quality labeled training datasets for supervised and semi-supervised machine learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality datasets for unsupervised learning can also be difficult and costly to produce.[2][3][4][5]

Image data

Datasets consisting primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.

Facial recognition

In computer vision, face images have been used extensively to develop facial recognition systems, face detection, and many other projects that use images of faces.

Dataset name	Brief description	Preprocessing	Instances	Format	Default task	Created (updated)	Reference	Creator
Aff-Wild	298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions (average = 640x360)	the detected faces, facial landmarks and valence-arousal annotations	~1,250,000 manually annotated images	video (visual + audio modalities)	affect recognition (valence-arousal estimation)	2017	CVPR[6] IJCV[7]	D. Kollias et al.
Aff-Wild2	558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of i) categorical affect (7 basic expressions: neutral, happiness, sadness, surprise, fear, disgust, anger); ii) dimensional affect (valence-arousal); iii) action units (AUs 1,2,4,6,12,15,20,25); in-the-wild setting; color database; various resolutions (average = 1030x630)	the detected faces, detected and aligned faces and annotations	~2,800,000 manually annotated images	video (visual + audio modalities)	affect recognition (valence-arousal estimation, basic expression classification, action unit detection)	2019	BMVC[8] FG[9]	D. Kollias et al.
FERET (facial recognition technology)	11338 images of 1199 individuals in different positions and at different times.	None.	11,338	Images	Classification, face recognition	2003	[10][11]	United States Department of Defense
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)	7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities.	Files labelled with expression. Perceptual validation ratings provided by 319 raters.	7,356	Video, sound files	Classification, face recognition, voice recognition	2018	[12][13]	S.R. Livingstone and F.A. Russo
SCFace	Color images of faces at various angles.	Location of facial features extracted. Coordinates of features given.	4,160	Images, text	Classification, face recognition	2011	[14][15]	M. Grgic et al.
Yale Face Database	Faces of 15 individuals in 11 different expressions.	Labels of expressions.	165	Images	Face recognition	1997	[16][17]	J. Yang et al.
Cohn-Kanade AU-Coded Expression Database	Large database of images with labels for expressions.	Tracking of certain facial features.	500+ sequences	Images, text	Facial expression analysis	2000	[18][19]	T. Kanade et al.
JAFFE Facial Expression Database	213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models.	Images are cropped to the facial region. Includes semantic ratings data on emotion labels.	213	Images, text	Facial expression cognition	1998	[20][21]	Lyons, Kamachi, Gyoba
FaceScrub	Images of public figures scrubbed from image searching.	Name and m/f annotation.	107,818	Images, text	Face recognition	2014	[22][23]	H. Ng et al.
BioID Face Database	Images of faces with eye positions marked.	Manually set eye positions.	1521	Images, text	Face recognition	2001	[24][25]	BioID
Skin Segmentation Dataset	Randomly sampled color values from face images.	B, G, R values extracted.	245,057	Text	Segmentation, classification	2012	[26][27]	R. Bhatt
Bosphorus	3D Face image database.	34 action units and 6 expressions labeled; 24 facial landmarks labeled.	4652	Images, text	Face recognition, classification	2008	[28][29]	A Savran et al.
UOY 3D-Face	neutral face, 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised.	labeling.	5250	Images, text	Face recognition, classification	2004	[30][31]	University of York
CASIA 3D Face Database	Expressions: Anger, smile, laugh, surprise, closed eyes.	None.	4624	Images, text	Face recognition, classification	2007	[32][33]	Institute of Automation, Chinese Academy of Sciences
CASIA NIR	Expressions: anger, disgust, fear, happiness, sadness, surprise.	None.	480	Annotated Visible Spectrum and Near Infrared Video captures at 25 frames per second	Face recognition, classification	2011	[34]	Zhao, G. et al.
BU-3DFE	neutral face, and 6 expressions: anger, happiness, sadness, surprise, disgust, fear (4 levels). 3D images extracted.	None.	2500	Images, text	Facial expression recognition, classification	2006	[35]	Binghamton University
Face Recognition Grand Challenge Dataset	Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D Data.	None.	4007	Images, text	Face recognition, classification	2004	[36][37]	National Institute of Standards and Technology
Gavabdb	Up to 61 samples for each subject. Expressions neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images.	None.	549	Images, text	Face recognition, classification	2008	[38][39]	King Juan Carlos University
3D-RMA	Up to 100 subjects, expressions mostly neutral. Several poses as well.	None.	9971	Images, text	Face recognition, classification	2004	[40][41]	Royal Military Academy (Belgium)
SoF	112 persons (66 males and 46 females) wearing glasses under different illumination conditions.	A set of synthetic filters (blur, occlusions, noise, and posterization) with different levels of difficulty.	42,592 (2,662 original images × 16 synthetic images)	Images, Mat file	Gender classification, face detection, face recognition, age estimation, and glasses detection	2017	[42][43]	Afifi, M. et al.
IMDB-WIKI	IMDB and Wikipedia face images with gender and age labels.	None	523,051	Images	Gender classification, face detection, face recognition, age estimation	2015	[44]	R. Rothe, R. Timofte, L. V. Gool

Action recognition

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
TV Human Interaction Dataset	Videos from 20 different TV shows for predicting social actions: handshake, high five, hug, kiss and none.	None.	6,766 video clips	video clips	Action prediction	2013	[45]	Patron-Perez, A. et al.
Berkeley Multimodal Human Action Database (MHAD)	Recordings of a single person performing 12 actions	MoCap pre-processing	660 action samples	8 PhaseSpace Motion Capture, 2 Stereo Cameras, 4 Quad Cameras, 6 accelerometers, 4 microphones	Action classification	2013	[46]	Ofli, F. et al.
THUMOS Dataset	Large video dataset for action classification.	Actions classified and labeled.	45M frames of video	Video, images, text	Classification, action detection	2013	[47][48]	Y. Jiang et al.
MEXAction2	Video dataset for action localization and spotting	Actions classified and labeled.	1000	Video	Action detection	2014	[49]	Stoian et al.

Object detection and recognition

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Visual Genome	Images and their description		108,000	images, text	Image captioning	2016	[50]	R. Krishna et al.
Berkeley 3-D Object Dataset	849 images taken in 75 different scenes. About 50 different object classes are labeled.	Object bounding boxes and labeling.	849	labeled images, text	Object recognition	2014	[51][52]	A. Janoch et al.
Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500)	500 natural images, explicitly separated into disjoint train, validation and test subsets + benchmarking code. Based on BSDS300.	Each image segmented by five different subjects on average.	500	Segmented images	Contour detection and hierarchical image segmentation	2011	[53]	University of California, Berkeley
Microsoft Common Objects in Context (COCO)	complex everyday scenes of common objects in their natural context.	Object highlighting, labeling, and classification into 91 object types.	2,500,000	Labeled images, text	Object recognition	2015	[54][55]	T. Lin et al.
SUN Database	Very large scene and object recognition database.	Places and objects are labeled. Objects are segmented.	131,067	Images, text	Object recognition, scene recognition	2014	[56][57]	J. Xiao et al.
ImageNet	Labeled object image database, used in the ImageNet Large Scale Visual Recognition Challenge	Labeled objects, bounding boxes, descriptive words, SIFT features	14,197,122	Images, text	Object recognition, scene recognition	2009 (2014)	[58][59][60]	J. Deng et al.
Open Images	A large set of images listed as having CC BY 2.0 license with image-level labels and bounding boxes spanning thousands of classes.	Image-level labels, Bounding boxes	9,178,275	Images, text	Classification, Object recognition	2017	[61]
TV News Channel Commercial Detection Dataset	TV commercials and news broadcasts.	Audio and video features extracted from still images.	129,685	Text	Clustering, classification	2015	[62][63]	P. Guha et al.
Statlog (Image Segmentation) Dataset	The instances were drawn randomly from a database of 7 outdoor images and hand-segmented to create a classification for every pixel.	Many features calculated.	2310	Text	Classification	1990	[64]	University of Massachusetts
Caltech 101	Pictures of objects.	Detailed object outlines marked.	9146	Images	Classification, object recognition.	2003	[65][66]	F. Li et al.
Caltech-256	Large dataset of images for object classification.	Images categorized and hand-sorted.	30,607	Images, Text	Classification, object detection	2007	[67][68]	G. Griffin et al.
SIFT10M Dataset	SIFT features of Caltech-256 dataset.	Extensive SIFT feature extraction.	11,164,866	Text	Classification, object detection	2016	[69]	X. Fu et al.
LabelMe	Annotated pictures of scenes.	Objects outlined.	187,240	Images, text	Classification, object detection	2005	[70]	MIT Computer Science and Artificial Intelligence Laboratory
Cityscapes Dataset	Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included.	Pixel-level segmentation and labeling	25,000	Images, text	Classification, object detection	2016	[71]	Daimler AG et al.
PASCAL VOC Dataset	Large number of images for classification tasks.	Labeling, bounding box included	500,000	Images, text	Classification, object detection	2010	[72][73]	M. Everingham et al.
CIFAR-10 Dataset	Many small, low-resolution images of 10 classes of objects.	Classes labelled, training set splits created.	60,000	Images	Classification	2009	[59][74]	A. Krizhevsky et al.
CIFAR-100 Dataset	Like CIFAR-10, above, but 100 classes of objects are given.	Classes labelled, training set splits created.	60,000	Images	Classification	2009	[59][74]	A. Krizhevsky et al.
CINIC-10 Dataset	A unified contribution of CIFAR-10 and ImageNet with 10 classes, and 3 splits. Larger than CIFAR-10.	Classes labelled, training, validation, test set splits created.	270,000	Images	Classification	2018	[75]	Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey
Fashion-MNIST	A MNIST-like fashion product database	Classes labelled, training set splits created.	60,000	Images	Classification	2017	[76]	Zalando SE
notMNIST	Some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A-J taken from different fonts.	Classes labelled, training set splits created.	500,000	Images	Classification	2011	[77]	Yaroslav Bulatov
German Traffic Sign Detection Benchmark Dataset	Images from vehicles of traffic signs on German roads. These signs comply with UN standards and therefore are the same as in other countries.	Signs manually labeled	900	Images	Classification	2013	[78][79]	S Houben et al.
KITTI Vision Benchmark Dataset	Autonomous vehicles driving through a mid-size city captured images of various areas using cameras and laser scanners.	Many benchmarks extracted from data.	>100 GB of data	Images, text	Classification, object detection	2012	[80][81]	A Geiger et al.
Linnaeus 5 dataset	Images of 5 classes of objects.	Classes labelled, training set splits created.	8000	Images	Classification	2017	[82]	Chaladze & Kalatozishvili
FieldSAFE	Multi-modal dataset for obstacle detection in agriculture including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization.	Classes labelled geographically.	>400 GB of data	Images and 3D point clouds	Classification, object detection, object localization	2017	[83]	M. Kragh et al.
11K Hands	11,076 hand images (1600 x 1200 pixels) of 190 subjects aged 18 to 75, for gender recognition and biometric identification.	None	11,076 hand images	Images and (.mat, .txt, and .csv) label files	Gender recognition and biometric identification	2017	[84]	M Afifi
CORe50	A collection of more than 500 videos (30 fps) of 50 domestic objects belonging to 10 categories, designed specifically for continual/lifelong learning and object recognition.	Classes labelled, training set splits created based on a 3-way, multi-runs benchmark.	164,866 RGB-D images	images (.png or .pkl) and (.pkl, .txt, .tsv) label files	Classification, Object recognition	2017	[85]	V. Lomonaco and D. Maltoni
OpenLORIS-Object	Lifelong/continual robotic vision dataset collected by real robots mounted with multiple high-resolution sensors. The first version contains 121 object instances (40 categories of daily-necessity objects under 20 scenes). The dataset considers four environmental factors under different scenes (illumination, occlusion, object pixel size, and clutter) and explicitly defines difficulty levels for each factor.	Classes labelled, training/validation/testing set splits created by benchmark scripts.	1,106,424 RGB-D images	images (.png and .pkl) and (.pkl) label files	Classification, Lifelong object recognition, Robotic Vision	2019	[86]	Q. She et al.
THz and thermal video data set	This multispectral data set includes terahertz, thermal, visual, near infrared, and three-dimensional videos of objects hidden under people's clothes.	3D lookup tables are provided that allow you to project images onto 3D point clouds.	More than 20 videos. The duration of each video is about 85 seconds (about 345 frames).	AP2J	Experiments with hidden object detection	2019	[87][88]	Alexei A. Morozov and Olga S. Sushkova

Handwriting and character recognition

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Artificial Characters Dataset	Artificially generated data describing the structure of 10 capital English letters.	Coordinates of lines drawn given as integers. Various other features.	6000	Text	Handwriting recognition, classification	1992	[89]	H. Guvenir et al.
Letter Dataset	Upper case printed letters.	17 features are extracted from all images.	20,000	Text	OCR, classification	1991	[90][91]	D. Slate et al.
CASIA-HWDB	Offline handwritten Chinese character database. 3755 classes in the GB 2312 character set.	Gray-scaled images with background pixels labeled as 255.	1,172,907	Images, Text	Handwriting recognition, classification	2009	[92]	CASIA
CASIA-OLHWDB	Online handwritten Chinese character database, collected using Anoto pen on paper. 3755 classes in the GB 2312 character set.	Provides the sequences of coordinates of strokes.	1,174,364	Images, Text	Handwriting recognition, classification	2009	[93][92]	CASIA
Character Trajectories Dataset	Labeled samples of pen tip trajectories for people writing simple characters.	3-dimensional pen tip velocity trajectory matrix for each sample	2858	Text	Handwriting recognition, classification	2008	[94][95]	B. Williams
Chars74K Dataset	Character recognition in natural images of symbols used in both English and Kannada		74,107		Character recognition, handwriting recognition, OCR, classification	2009	[96]	T. de Campos
UJI Pen Characters Dataset	Isolated handwritten characters	Coordinates of pen position as characters were written given.	11,640	Text	Handwriting recognition, classification	2009	[97][98]	F. Prat et al.
Gisette Dataset	Handwriting samples from the often-confused 4 and 9 characters.	Features extracted from images, split into train/test, handwriting images size-normalized.	13,500	Images, text	Handwriting recognition, classification	2003	[99]	Yann LeCun et al.
Omniglot dataset	1623 different handwritten characters from 50 different alphabets.	Hand-labeled.	38,300	Images, text, strokes	Classification, one-shot learning	2015	[100][101]	American Association for the Advancement of Science
MNIST database	Database of handwritten digits.	Hand-labeled.	60,000	Images, text	Classification	1998	[102][103]	National Institute of Standards and Technology
Optical Recognition of Handwritten Digits Dataset	Normalized bitmaps of handwritten data.	Size normalized and mapped to bitmaps.	5620	Images, text	Handwriting recognition, classification	1998	[104]	E. Alpaydin et al.
Pen-Based Recognition of Handwritten Digits Dataset	Handwritten digits on electronic pen-tablet.	Feature vectors extracted to be uniformly spaced.	10,992	Images, text	Handwriting recognition, classification	1998	[105][106]	E. Alpaydin et al.
Semeion Handwritten Digit Dataset	Handwritten digits from 80 people.	All handwritten digits have been normalized for size and mapped to the same grid.	1593	Images, text	Handwriting recognition, classification	2008	[107]	T. Srl
HASYv2	Handwritten mathematical symbols	All symbols are centered and of size 32px x 32px.	168233	Images, text	Classification	2017	[108]	Martin Thoma
Noisy Handwritten Bangla Dataset	Includes Handwritten Numeral Dataset (10 classes) and Basic Character Dataset (50 classes), each dataset has three types of noise: white gaussian, motion blur, and reduced contrast.	All images are centered and of size 32x32.	Numeral Dataset: 23330, Character Dataset: 76000	Images, text	Handwriting recognition, classification	2017	[109][110]	M. Karki et al.

Aerial images

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Aerial Image Segmentation Dataset	80 high-resolution aerial images with spatial resolution ranging from 0.3 to 1.0.	Images manually segmented.	80	Images	Aerial Classification, object detection	2013	[111][112]	J. Yuan et al.
KIT AIS Data Set	Multiple labeled training and evaluation datasets of aerial images of crowds.	Images manually labeled to show paths of individuals through crowds.	~ 150	Images with paths	People tracking, aerial tracking	2012	[113][114]	M. Butenuth et al.
Wilt Dataset	Remote sensing data of diseased trees and other land cover.	Various features extracted.	4899	Images	Classification, aerial object detection	2014	[115][116]	B. Johnson
MASATI dataset	Maritime scenes of optical aerial images from the visible spectrum. It contains color images in dynamic marine environments, each image may contain one or multiple targets in different weather and illumination conditions.	Object bounding boxes and labeling.	7389	Images	Classification, aerial object detection	2018	[117][118]	A.-J. Gallego et al.
Forest Type Mapping Dataset	Satellite imagery of forests in Japan.	Image wavelength bands extracted.	326	Text	Classification	2015	[119][120]	B. Johnson
Overhead Imagery Research Data Set	Annotated overhead imagery. Images with multiple objects.	Over 30 annotations and over 60 statistics that describe the target within the context of the image.	1000	Images, text	Classification	2009	[121][122]	F. Tanner et al.
SpaceNet	SpaceNet is a corpus of commercial satellite imagery and labeled training data.	GeoTiff and GeoJSON files containing building footprints.	>17533	Images	Classification, Object Identification	2017	[123][124][125]	DigitalGlobe, Inc.
UC Merced Land Use Dataset	These images were manually extracted from large images from the USGS National Map Urban Area Imagery collection for various urban areas around the US.	This is a 21 class land use image dataset meant for research purposes. There are 100 images for each class.	2,100	Image chips of 256x256, 30 cm (1 foot) GSD	Land cover classification	2010	[126]	Yi Yang and Shawn Newsam
SAT-4 Airborne Dataset	Images were extracted from the National Agriculture Imagery Program (NAIP) dataset.	SAT-4 has four broad land cover classes: barren land, trees, grassland, and a class that consists of all land cover classes other than the above three.	500,000	Images	Classification	2015	[127][128]	S. Basu et al.
SAT-6 Airborne Dataset	Images were extracted from the National Agriculture Imagery Program (NAIP) dataset.	SAT-6 has six broad land cover classes: barren land, trees, grassland, roads, buildings, and water bodies.	405,000	Images	Classification	2015	[127][128]	S. Basu et al.

Other images

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Density functional theory quantum simulations of graphene	Labelled images of raw input to a simulation of graphene	Raw data (in HDF5 format) and output labels from density functional theory quantum simulation	60,744 test and 501,473 training files	Labeled images	Regression	2019	[129]	K. Mills & I. Tamblyn
Quantum simulations of an electron in a two dimensional potential well	Labelled images of raw input to a simulation of 2d Quantum mechanics	Raw data (in HDF5 format) and output labels from quantum simulation	1.3 million images	Labeled images	Regression	2017	[130]	K. Mills, M.A. Spanner, & I. Tamblyn
MPII Cooking Activities Dataset	Videos and images of various cooking activities.	Activity paths and directions, labels, fine-grained motion labeling, activity class, still image extraction and labeling.	881,755 frames	Labeled video, images, text	Classification	2012	[131][132]	M. Rohrbach et al.
FAMOS Dataset	5,000 unique microstructures, all samples have been acquired 3 times with two different cameras.	Original PNG files, sorted per camera and then per acquisition. MATLAB datafiles with one 16384 times 5000 matrix per camera per acquisition.	30,000	Images and .mat files	Authentication	2012	[133]	S. Voloshynovskiy, et al.
PharmaPack Dataset	1,000 unique classes with 54 images per class.	Class labeling, many local descriptors, like SIFT and aKaZE, and local feature aggregators, like Fisher Vector (FV).	54,000	Images and .mat files	Fine-grain classification	2017	[134]	O. Taran and S. Rezaeifar, et al.
Stanford Dogs Dataset	Images of 120 breeds of dogs from around the world.	Train/test splits and ImageNet annotations provided.	20,580	Images, text	Fine-grain classification	2011	[135][136]	A. Khosla et al.
StanfordExtra Dataset	2D keypoints and segmentations for the Stanford Dogs Dataset.	2D keypoints and segmentations provided.	12,035	Labelled images	3D reconstruction/pose estimation	2020	[137]	B. Biggs et al.
The Oxford-IIIT Pet Dataset	37 categories of pets with roughly 200 images of each.	Breed labeled, tight bounding box, foreground-background segmentation.	~ 7,400	Images, text	Classification, object detection	2012	[136][138]	O. Parkhi et al.
Corel Image Features Data Set	Database of images with features extracted.	Many features including color histogram, co-occurrence texture, and color moments.	68,040	Text	Classification, object detection	1999	[139][140]	M. Ortega-Bindenberger et al.
Online Video Characteristics and Transcoding Time Dataset	Transcoding times for various different videos and video properties.	Video features given.	168,286	Text	Regression	2015	[141]	T. Deneke et al.
Microsoft Sequential Image Narrative Dataset (SIND)	Dataset for sequential vision-to-language	Descriptive caption and storytelling given for each photo, and photos are arranged in sequences	81,743	Images, text	Visual storytelling	2016	[142]	Microsoft Research
Caltech-UCSD Birds-200-2011 Dataset	Large dataset of images of birds.	Part locations for birds, bounding boxes, 312 binary attributes given	11,788	Images, text	Classification	2011	[143][144]	C. Wah et al.
YouTube-8M	Large and diverse labeled video dataset	YouTube video IDs and associated labels from a diverse vocabulary of 4800 visual entities	8 million	Video, text	Video classification	2016	[145][146]	S. Abu-El-Haija et al.
YFCC100M	Large and diverse labeled image and video dataset	Flickr Videos and Images and associated description, titles, tags, and other metadata (such as EXIF and geotags)	100 million	Video, Image, Text	Video and Image classification	2016	[147][148]	B. Thomee et al.
Discrete LIRIS-ACCEDE	Short videos annotated for valence and arousal.	Valence and arousal labels.	9800	Video	Video emotion elicitation detection	2015	[149]	Y. Baveye et al.
Continuous LIRIS-ACCEDE	Long videos annotated for valence and arousal while also collecting Galvanic Skin Response.	Valence and arousal labels.	30	Video	Video emotion elicitation detection	2015	[150]	Y. Baveye et al.
MediaEval LIRIS-ACCEDE	Extension of Discrete LIRIS-ACCEDE including annotations for violence levels of the films.	Violence, valence and arousal labels.	10900	Video	Video emotion elicitation detection	2015	[151]	Y. Baveye et al.
Leeds Sports Pose	Articulated human pose annotations in 2000 natural sports images from Flickr.	Rough crop around single person of interest with 14 joint labels	2000	Images plus .mat file labels	Human pose estimation	2010	[152]	S. Johnson and M. Everingham
Leeds Sports Pose Extended Training	Articulated human pose annotations in 10,000 natural sports images from Flickr.	14 joint labels via crowdsourcing	10000	Images plus .mat file labels	Human pose estimation	2011	[153]	S. Johnson and M. Everingham
MCQ Dataset	6 different real multiple choice-based exams (735 answer sheets and 33,540 answer boxes) to evaluate computer vision techniques and systems developed for multiple choice test assessment systems.	None	735 answer sheets and 33,540 answer boxes	Images and .mat file labels	Development of multiple choice test assessment systems	2017	[154][155]	Afifi, M. et al.
Surveillance Videos	Real surveillance videos covering a long surveillance period (7 days, 24 hours each).	None	19 surveillance videos (7 days with 24 hours each).	Videos	Data compression	2016	[156]	Taj-Eddin, I. A. T. F. et al.
LILA BC	Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science.	None	~10M images	Images	Classification	2019	[157]	LILA working group
Can We See Photosynthesis?	32 videos for eight live and eight dead leaves recorded under both DC and AC lighting conditions.	None	32 videos	Videos	Liveness detection of plants	2017	[158]	Taj-Eddin, I. A. T. F. et al.

Text data

Datasets consisting primarily of text for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis.

Reviews

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Amazon reviews	US product reviews from Amazon.com.	None.	~ 82M	Text	Classification, sentiment analysis	2015	[159]	McAuley et al.
OpinRank Review Dataset	Reviews of cars and hotels from Edmunds.com and TripAdvisor respectively.	None.	42,230 / ~259,000 respectively	Text	Sentiment analysis, clustering	2011	[160][161]	K. Ganesan et al.
MovieLens	22,000,000 ratings and 580,000 tags applied to 33,000 movies by 240,000 users.	None.	~ 22M	Text	Regression, clustering, classification	2016	[162]	GroupLens Research
Yahoo! Music User Ratings of Musical Artists	Over 10M ratings of artists by Yahoo users.	None described.	~ 10M	Text	Clustering, regression	2004	[163][164]	Yahoo!
Car Evaluation Data Set	Car properties and their overall acceptability.	Six categorical features given.	1728	Text	Classification	1997	[165][166]	M. Bohanec
YouTube Comedy Slam Preference Dataset	User vote data for pairs of videos shown on YouTube. Users voted on funnier videos.	Video metadata given.	1,138,562	Text	Classification	2012	[167][168]	Google
Skytrax User Reviews Dataset	User reviews of airlines, airports, seats, and lounges from Skytrax.	Ratings are fine-grain and include many aspects of airport experience.	41396	Text	Classification, regression	2015	[169]	Q. Nguyen
Teaching Assistant Evaluation Dataset	Teaching assistant reviews.	Features of each instance such as class, class size, and instructor are given.	151	Text	Classification	1997	[170][171]	W. Loh et al.
Vietnamese Students’ Feedback Corpus (UIT-VSFC)	Students’ Feedback.	Comments	16,000	Text	Classification	2018	[172]	Nguyen et al.
Vietnamese Social Media Emotion Corpus (UIT-VSMEC)	Users’ Facebook Comments.	Comments	6,927	Text	Classification	2019	[173]	Nguyen et al.

News articles

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
NYSK Dataset	English news articles about the case relating to allegations of sexual assault against the former IMF director Dominique Strauss-Kahn.	Filtered and presented in XML format.	10,421	XML, text	Sentiment analysis, topic extraction	2013	[174]	Dermouche, M. et al.
The Reuters Corpus Volume 1	Large corpus of Reuters news stories in English.	Fine-grain categorization and topic codes.	810,000	Text	Classification, clustering, summarization	2002	[175]	Reuters
The Reuters Corpus Volume 2	Large corpus of Reuters news stories in multiple languages.	Fine-grain categorization and topic codes.	487,000	Text	Classification, clustering, summarization	2005	[176]	Reuters
Thomson Reuters Text Research Collection	Large corpus of news stories.	Details not described.	1,800,370	Text	Classification, clustering, summarization	2009	[177]	T. Rose et al.
Saudi Newspapers Corpus	31,030 Arabic newspaper articles.	Metadata extracted.	31,030	JSON	Summarization, clustering	2015	[178]	M. Alhagri
RE3D (Relationship and Entity Extraction Evaluation Dataset)	Entity and Relation marked data from various news and government sources. Sponsored by Dstl	Filtered, categorisation using Baleen types	not known	JSON	Classification, Entity and Relation recognition	2017	[179]	Dstl
Examiner Spam Clickbait Catalogue	Clickbait, spam, crowd-sourced headlines from 2010 to 2015	Publish date and headlines	3,089,781	CSV	Clustering, Events, Sentiment	2016	[180]	R. Kulkarni
ABC Australia News Corpus	Entire news corpus of ABC Australia from 2003 to 2019	Publish date and headlines	1,186,018	CSV	Clustering, Events, Sentiment	2020	[181]	R. Kulkarni
Worldwide News - Aggregate of 20K Feeds	One week snapshot of all online headlines in 20+ languages	Publish time, URL and headlines	1,398,431	CSV	Clustering, Events, Language Detection	2018	[182]	R. Kulkarni
Reuters News Wire Headline	11 Years of timestamped events published on the news-wire	Publish time, Headline Text	16,121,310	CSV	NLP, Computational Linguistics, Events	2018	[183]	R. Kulkarni
The Irish Times Ireland News Corpus	24 Years of Ireland News from 1996 to 2019	Publish time, Headline Category and Text	1,484,340	CSV	NLP, Computational Linguistics, Events	2020	[184]	R. Kulkarni
News Headlines Dataset for Sarcasm Detection	High quality dataset with Sarcastic and Non-sarcastic news headlines.	Clean, normalized text	26,709	JSON	NLP, Classification, Linguistics	2018	[185]	Rishabh Misra

Messages

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Enron Email Dataset	Emails from employees at Enron organized into folders.	Attachments removed, invalid email addresses converted to user@enron.com or no_address@enron.com.	~ 500,000	Text	Network analysis, sentiment analysis	2004 (2015)	[186][187]	Klimt, B. and Y. Yang
Ling-Spam Dataset	Corpus containing both legitimate and spam emails.	Four version of the corpus involving whether or not a lemmatiser or stop-list was enabled.	2,412 Ham 481 Spam	Text	Classification	2000	[188][189]	Androutsopoulos, J. et al.
SMS Spam Collection Dataset	Collected SMS spam messages.	None.	5,574	Text	Classification	2011	[190][191]	T. Almeida et al.
Twenty Newsgroups Dataset	Messages from 20 different newsgroups.	None.	20,000	Text	Natural language processing	1999	[192]	T. Mitchell et al.
Spambase Dataset	Spam emails.	Many text features extracted.	4,601	Text	Spam detection, classification	1999	[193]	M. Hopkins et al.
ColBERT Dataset	Short jokes.	Outliers removed.	200,000	Text	Humor detection, classification	2020	[194]	I. Annamoradnejad

Twitter and tweets

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
MovieTweetings	Movie rating dataset based on public and well-structured tweets		~710,000	Text	Classification, regression	2018	[195]	S. Dooms
Twitter100k	Pairs of images and tweets		100,000	Text and Images	Cross-media retrieval	2017	[196][197]	Y. Hu, et al.
Sentiment140	Tweet data from 2009 including original text, time stamp, user and sentiment.	Classified using distant supervision from presence of emoticon in tweet.	1,578,627	Tweets, comma-separated values	Sentiment analysis	2009	[198][199]	A. Go et al.
ASU Twitter Dataset	Twitter network data, not actual tweets. Shows connections between a large number of users.	None.	11,316,811 users, 85,331,846 connections	Text	Clustering, graph analysis	2009	[200][201]	R. Zafarani et al.
SNAP Social Circles: Twitter Database	Large Twitter network data.	Node features, circles, and ego networks.	1,768,149	Text	Clustering, graph analysis	2012	[202][203]	J. McAuley et al.
Twitter Dataset for Arabic Sentiment Analysis	Arabic tweets.	Samples hand-labeled as positive or negative.	2000	Text	Classification	2014	[204][205]	N. Abdulla
Buzz in Social Media Dataset	Data from Twitter and Tom's Hardware. This dataset focuses on specific buzz topics being discussed on those sites.	Data is windowed so that the user can attempt to predict the events leading up to social media buzz.	140,000	Text	Regression, Classification	2013	[206][207]	F. Kawala et al.
Paraphrase and Semantic Similarity in Twitter (PIT)	This dataset focuses on whether tweets have (almost) the same meaning or information. Manually labeled.	Tokenization, part-of-speech and named entity tagging	18,762	Text	Regression, Classification	2015	[208][209]	Xu et al.
Geoparse Twitter benchmark dataset	This dataset contains tweets during different news events in different countries. Manually labeled location mentions.	location annotations added to JSON metadata	6,386	Tweets, JSON	Classification, Information Extraction	2014	[210][211]	S.E. Middleton et al.
Dutch Social media collection	This dataset contains Covid-19 tweets made by Dutch speakers or users from the Netherlands. The data has been machine-annotated.	Classified for sentiment; tweet text and user description translated to English. Industry mentions are extracted.	271,342	JSONL	Sentiment, multi-label classification, machine translation	2020	[212] [213] [214]	Aaaksh Gupta, CoronaWhy
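
The Sentiment140 entry in the table above describes labels assigned by distant supervision from the presence of emoticons. The idea can be sketched in Python as follows. This is only an illustration: the emoticon lists and the helper name label_tweet are assumptions made for this example, not the procedure used by the dataset's creators.

```python
# Illustrative emoticon sets. These lists are assumptions for the sketch,
# not the ones used to build Sentiment140.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")


def label_tweet(text):
    """Return (label, cleaned_text), or None if the tweet gives no usable signal.

    The emoticon acts as a noisy label and is stripped from the text so that
    a classifier trained on the result cannot simply memorize it.
    """
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        # Either no emoticon at all, or conflicting emoticons: skip the tweet.
        return None
    cleaned = text
    for emoticon in POSITIVE + NEGATIVE:
        cleaned = cleaned.replace(emoticon, "")
    cleaned = " ".join(cleaned.split())  # collapse leftover whitespace
    return ("positive" if has_pos else "negative", cleaned)
```

Tweets without an emoticon, or with both kinds, are dropped rather than guessed at, which trades dataset size for cleaner labels.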

Dialogues

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
NPS Chat Corpus	Posts from age-specific online chat rooms.	Hand privacy masked, tagged for part of speech and dialogue-act.	~ 500,000	XML	NLP, programming, linguistics	2007	[215]	Forsyth, E., Lin, J., & Martell, C.
Twitter Triple Corpus	A-B-A triples extracted from Twitter.		4,232	Text	NLP	2016	[216]	Sordoni, A. et al.
UseNet Corpus	UseNet forum postings.	Anonymized e-mails and URLs. Omitted documents with lengths <500 words or >500,000 words, or that were <90% English.	7 billion	Text		2011	[217]	Shaoul, C., & Westbury C.
NUS SMS Corpus	SMS messages collected between two users, with timing analysis.		~ 10,000	XML	NLP	2011	[218]	Kan, M.
Reddit All Comments Corpus	All Reddit comments (as of 2015).		~ 1.7 billion	JSON	NLP, research	2015	[219]	Stuck_In_the_Matrix
Ubuntu Dialogue Corpus	Dialogues extracted from Ubuntu chat stream on IRC.			CSV	Dialogue Systems Research	2015	[220]	Lowe, R. et al.

Other text

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Web of Science Dataset	Hierarchical Datasets for Text Classification	None.	46,985	Text	Classification, Categorization	2017	[221][222]	K. Kowsari et al.
Legal Case Reports	Federal Court of Australia cases from 2006 to 2009.	None.	4,000	Text	Summarization, citation analysis	2012	[223][224]	F. Galgani et al.
Blogger Authorship Corpus	Blog entries of 19,320 people from blogger.com.	Blogger self-provided gender, age, industry, and astrological sign.	681,288	Text	Sentiment analysis, summarization, classification	2006	[225][226]	J. Schler et al.
Social Structure of Facebook Networks	Large dataset of the social structure of Facebook.	None.	100 colleges covered	Text	Network analysis, clustering	2012	[227][228]	A. Traud et al.
Dataset for the Machine Comprehension of Text	Stories and associated questions for testing comprehension of text.	None.	660	Text	Natural language processing, machine comprehension	2013	[229][230]	M. Richardson et al.
The Penn Treebank Project	Naturally occurring text annotated for linguistic structure.	Text is parsed into semantic trees.	~ 1M words	Text	Natural language processing, summarization	1995	[231][232]	M. Marcus et al.
DEXTER Dataset	Task given is to determine, from features given, which articles are about corporate acquisitions.	Features extracted include word stems. Distractor features included.	2600	Text	Classification	2008	[233]	Reuters
Google Books N-grams	N-grams from a very large corpus of books	None.	2.2 TB of text	Text	Classification, clustering, regression	2011	[234][235]	Google
Personae Corpus	Collected for experiments in Authorship Attribution and Personality Prediction. Consists of 145 Dutch-language essays.	In addition to normal texts, syntactically annotated texts are given.	145	Text	Classification, regression	2008	[236][237]	K. Luyckx et al.
CNAE-9 Dataset	Categorization task for free text descriptions of Brazilian companies.	Word frequency has been extracted.	1080	Text	Classification	2012	[238][239]	P. Ciarelli et al.
Sentiment Labeled Sentences Dataset	3000 sentiment labeled sentences.	Sentiment of each sentence has been hand labeled as positive or negative.	3000	Text	Classification, sentiment analysis	2015	[240][241]	D. Kotzias
BlogFeedback Dataset	Dataset to predict the number of comments a post will receive based on features of that post.	Many features of each post extracted.	60,021	Text	Regression	2014	[242][243]	K. Buza
Stanford Natural Language Inference (SNLI) Corpus	Image captions matched with newly constructed sentences to form entailment, contradiction, or neutral pairs.	Entailment class labels, syntactic parsing by the Stanford PCFG parser	570,000	Text	Natural language inference/recognizing textual entailment	2015	[244]	S. Bowman et al.
DSL Corpus Collection (DSLCC)	A multilingual collection of short excerpts of journalistic texts in similar languages and dialects.	None	294,000 phrases	Text	Discriminating between similar languages	2017	[245]	Tan, Liling et al.
Urban Dictionary Dataset	Corpus of words, votes and definitions	User names anonymised	2,580,925	CSV	NLP, Machine comprehension	May 2016	[246]	Anonymous
T-REx	Wikipedia abstracts aligned with Wikidata entities	Alignment of Wikidata triples with Wikipedia abstracts	11M aligned triples	JSON and NIF	NLP, Relation Extraction	2018	[247]	H. Elsahar et al.
General Language Understanding Evaluation (GLUE)	Benchmark of nine tasks	Various	~1M sentences and sentence pairs		NLU	2018	[248][249]	Wang et al.
Atticus Open Contract Dataset (AOK)	Dataset of legal contracts with rich expert annotations		~3,000 labels	CSV and PDF	Natural language processing, QnA	2020		The Atticus Project
Vietnamese Image Captioning Dataset (UIT-ViIC)	Vietnamese Image Captioning Dataset		19,250 captions for 3,850 images	CSV and PDF	Natural language processing, Computer vision	2020	[250]	Lam et al.
Vietnamese Names annotated with Genders (UIT-ViNames)	Vietnamese Names annotated with Genders		26,850 Vietnamese full names annotated with genders	CSV	Natural language processing	2020	[251]	To et al.

Sound data

Datasets of sounds and sound features.

Speech

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Zero Resource Speech Challenge 2015	Spontaneous speech (English), read speech (Xitsonga).	Raw WAV	English: 5h, 12 speakers; Xitsonga: 2h30, 24 speakers	Sound	Unsupervised discovery of speech features/subword units/word units	2015	[252][253]	Versteegh et al.
Parkinson Speech Dataset	Multiple recordings of people with and without Parkinson's Disease.	Voice features extracted, disease scored by physician using unified Parkinson's disease rating scale	1,040	Text	Classification, regression	2013	[254][255]	B. E. Sakar et al.
Spoken Arabic Digits	Spoken Arabic digits from 44 male and 44 female speakers.	Time-series of mel-frequency cepstrum coefficients.	8,800	Text	Classification	2010	[256][257]	M. Bedda et al.
ISOLET Dataset	Spoken letter names.	Features extracted from sounds.	7797	Text	Classification	1994	[258][259]	R. Cole et al.
Japanese Vowels Dataset	Nine male speakers uttered two Japanese vowels successively.	12-degree linear prediction analysis applied to obtain a discrete-time series with 12 cepstrum coefficients.	640	Text	Classification	1999	[260][261]	M. Kudo et al.
Parkinson's Telemonitoring Dataset	Multiple recordings of people with and without Parkinson's Disease.	Sound features extracted.	5875	Text	Classification	2009	[262][263]	A. Tsanas et al.
TIMIT	Recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences.	Speech is lexically and phonemically transcribed.	6300	Text	Speech recognition, classification	1986	[264][265]	J. Garofolo et al.
Arabic Speech Corpus	A single-speaker, Modern Standard Arabic (MSA) speech corpus with phonetic and orthographic transcripts aligned to phoneme level	Speech is orthographically and phonetically transcribed with stress marks.	~1900	Text, WAV	Speech Synthesis, Speech Recognition, Corpus Alignment, Speech Therapy, Education	2016	[266]	N. Halabi
Common Voice	A public domain database of crowdsourced data across a wide range of dialects.	Validation by other users	English: 1,118 hours	MP3 with corresponding text files	Speech recognition	June 2017 (December 2019)	[267]	Mozilla

Music

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Geographic Origin of Music Data Set	Audio features of music samples from different locations.	Audio features extracted using MARSYAS software.	1,059	Text	Geographic classification, clustering	2014	[268][269]	F. Zhou et al.
Million Song Dataset	Audio features from one million different songs.	Audio features extracted.	1M	Text	Classification, clustering	2011	[270][271]	T. Bertin-Mahieux et al.
MUSDB18	Multi-track popular music recordings	Raw audio	150	MP4, WAV	Source Separation	2017	[272]	Z. Rafii et al.
Free Music Archive	Audio under Creative Commons from 100k songs (343 days, 1TiB) with a hierarchy of 161 genres, metadata, user data, free-form text.	Raw audio and audio features.	106,574	Text, MP3	Classification, recommendation	2017	[273]	M. Defferrard et al.
Bach Choral Harmony Dataset	Bach chorale chords.	Audio features extracted.	5665	Text	Classification	2014	[274][275]	D. Radicioni et al.

Other sounds

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
UrbanSound	Labeled sound recordings of sounds like air conditioners, car horns and children playing.	Sorted into folders by class of events as well as metadata in a JSON file and annotations in a CSV file.	1,059	Sound (WAV)	Classification	2014	[276][277]	J. Salamon et al.
AudioSet	10-second sound snippets from YouTube videos, and an ontology of over 500 labels.	128-d PCA'd VGG-ish features every 1 second.	2,084,320	Text (CSV) and TensorFlow Record files	Classification	2017	[278]	J. Gemmeke et al., Google
Bird Audio Detection challenge	Audio from environmental monitoring stations, plus crowdsourced recordings		17,000+		Classification	2016 (2018)	[279][280]	Queen Mary University and IEEE Signal Processing Society
WSJ0 Hipster Ambient Mixtures	Audio from WSJ0 mixed with noise recorded in the San Francisco Bay Area	Noise clips matched to WSJ0 clips	28,000	Sound (WAV)	Audio source separation	2019	[281]	Wichern, G., et al., Whisper and MERL
Clotho	4,981 audio samples, each 15 to 30 seconds long and with five different captions of eight to 20 words.		24,905	Sound (WAV) and text (CSV)	Automated audio captioning	2020	[282][283]	K. Drossos, S. Lipping, and T. Virtanen

Signal data

Datasets containing electrical signal information that requires signal processing for further analysis.

Electrical

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Witty Worm Dataset	Dataset detailing the spread of the Witty worm and the infected computers.	Split into a publicly available set and a restricted set containing more sensitive information like IP and UDP headers.	55,909 IP addresses	Text	Classification	2004	[284][285]	Center for Applied Internet Data Analysis
Cuff-Less Blood Pressure Estimation Dataset	Cleaned vital signals from human patients which can be used to estimate blood pressure.	125 Hz vital signs have been cleaned.	12,000	Text	Classification, regression	2015	[286][287]	M. Kachuee et al.
Gas Sensor Array Drift Dataset	Measurements from 16 chemical sensors utilized in simulations for drift compensation.	Extensive number of features given.	13,910	Text	Classification	2012	[288][289]	A. Vergara
Servo Dataset	Data covering the nonlinear relationships observed in a servo-amplifier circuit.	Levels of various components as a function of other components are given.	167	Text	Regression	1993	[290][291]	K. Ullrich
UJIIndoorLoc-Mag Dataset	Indoor localization database to test indoor positioning systems. Data is magnetic field based.	Train and test splits given.	40,000	Text	Classification, regression, clustering	2015	[292][293]	D. Rambla et al.
Sensorless Drive Diagnosis Dataset	Electrical signals from motors with defective components.	Statistical features extracted.	58,508	Text	Classification	2015	[294][295]	M. Bator

Motion-tracking

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Wearable Computing: Classification of Body Postures and Movements (PUC-Rio)	People performing five standard actions while wearing motion trackers.	None.	165,632	Text	Classification	2013	[296][297]	Pontifical Catholic University of Rio de Janeiro
Gesture Phase Segmentation Dataset	Features extracted from video of people doing various gestures.	Features extracted aim at studying gesture phase segmentation.	9900	Text	Classification, clustering	2014	[298][299]	R. Madeo et al.
Vicon Physical Action Data Set	10 normal and 10 aggressive physical actions that measure the human activity tracked by a 3D tracker.	Many parameters recorded by 3D tracker.	3000	Text	Classification	2011	[300][301]	T. Theodoridis
Daily and Sports Activities Dataset	Motor sensor data for 19 daily and sports activities.	Many sensors given, no preprocessing done on signals.	9120	Text	Classification	2013	[302][303]	B. Barshan et al.
Human Activity Recognition Using Smartphones Dataset	Gyroscope and accelerometer data from people wearing smartphones and performing normal actions.	Actions performed are labeled, all signals preprocessed for noise.	10,299	Text	Classification	2012	[304][305]	J. Reyes-Ortiz et al.
Australian Sign Language Signs	Australian sign language signs captured by motion-tracking gloves.	None.	2565	Text	Classification	2002	[306][307]	M. Kadous
Weight Lifting Exercises monitored with Inertial Measurement Units	Five variations of the biceps curl exercise monitored with IMUs.	Some statistics calculated from raw data.	39,242	Text	Classification	2013	[308][309]	W. Ugulino et al.
sEMG for Basic Hand movements Dataset	Two databases of surface electromyographic signals of 6 hand movements.	None.	3000	Text	Classification	2014	[310][311]	C. Sapsanis et al.
REALDISP Activity Recognition Dataset	Evaluate techniques dealing with the effects of sensor displacement in wearable activity recognition.	None.	1419	Text	Classification	2014	[311][312]	O. Banos et al.
Heterogeneity Activity Recognition Dataset	Data from multiple different smart devices for humans performing various activities.	None.	43,930,257	Text	Classification, clustering	2015	[313][314]	A. Stisen et al.
Indoor User Movement Prediction from RSS Data	Temporal wireless network data that can be used to track the movement of people in an office.	None.	13,197	Text	Classification	2016	[315][316]	D. Bacciu
PAMAP2 Physical Activity Monitoring Dataset	18 different types of physical activities performed by 9 subjects wearing 3 IMUs.	None.	3,850,505	Text	Classification	2012	[317]	A. Reiss
OPPORTUNITY Activity Recognition Dataset	Recordings from wearable, object, and ambient sensors, devised to benchmark human activity recognition algorithms.	None.	2551	Text	Classification	2012	[318][319]	D. Roggen et al.
Real World Activity Recognition Dataset	Human Activity Recognition from wearable devices. Distinguishes between seven on-body device positions and comprises six different kinds of sensors.	None.	3,150,000 (per sensor)	Text	Classification	2016	[320]	T. Sztyler et al.
Toronto Rehab Stroke Pose Dataset	3D human pose estimates (Kinect) of stroke patients and healthy participants performing a set of tasks using a stroke rehabilitation robot.	None.	10 healthy people and 9 stroke survivors (3500-6000 frames per person)	CSV	Classification	2017	[321][322][323]	E. Dolatabadi et al.
Corpus of Social Touch (CoST)	7805 gesture captures of 14 different social touch gestures performed by 31 subjects. The gestures were performed in three variations: gentle, normal and rough, on a pressure sensor grid wrapped around a mannequin arm.	Touch gestures performed are segmented and labeled.	7805 gesture captures	CSV	Classification	2016	[324][325]	M. Jung et al.

Other signals

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Wine Dataset	Chemical analysis of wines grown in the same region in Italy but derived from three different cultivars.	13 properties of each wine are given	178	Text	Classification, regression	1991	[326][327]	M. Forina et al.
Combined Cycle Power Plant Data Set	Data from various sensors within a power plant running for 6 years.	None	9568	Text	Regression	2014	[328][329]	P. Tufekci et al.

Physical data

Datasets from physical systems.

High-energy physics

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
HIGGS Dataset	Monte Carlo simulations of particle accelerator collisions.	28 features of each collision are given.	11M	Text	Classification	2014	[330][331][332]	D. Whiteson
HEPMASS Dataset	Monte Carlo simulations of particle accelerator collisions. Goal is to separate the signal from noise.	28 features of each collision are given.	10,500,000	Text	Classification	2016	[331][332][333]	D. Whiteson

Systems

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Yacht Hydrodynamics Dataset	Yacht performance based on dimensions.	Six features are given for each yacht.	308	Text	Regression	2013	[334][335]	R. Lopez
Robot Execution Failures Dataset	5 data sets that center around robotic failure to execute common tasks.	Integer valued features such as torque and other sensor measurements.	463	Text	Classification	1999	[336]	L. Seabra et al.
Pittsburgh Bridges Dataset	Design description is given in terms of several properties of various bridges.	Various bridge features are given.	108	Text	Classification	1990	[337][338]	Y. Reich et al.
Automobile Dataset	Data about automobiles, their insurance risk, and their normalized losses.	Car features extracted.	205	Text	Regression	1987	[339][340]	J. Schlimmer et al.
Auto MPG Dataset	MPG data for cars.	Eight features of each car given.	398	Text	Regression	1993	[341]	Carnegie Mellon University
Energy Efficiency Dataset	Heating and cooling requirements given as a function of building parameters.	Building parameters given.	768	Text	Classification, regression	2012	[342][343]	A. Xifara et al.
Airfoil Self-Noise Dataset	A series of aerodynamic and acoustic tests of two and three-dimensional airfoil blade sections.	Data about frequency, angle of attack, etc., are given.	1503	Text	Regression	2014	[344]	R. Lopez
Challenger USA Space Shuttle O-Ring Dataset	Attempt to predict O-ring problems given past Challenger data.	Several features of each flight, such as launch temperature, are given.	23	Text	Regression	1993	[345][346]	D. Draper et al.
Statlog (Shuttle) Dataset	NASA space shuttle datasets.	Nine features given.	58,000	Text	Classification	2002	[347]	NASA

Astronomy

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Volcanoes on Venus – JARtool experiment Dataset	Venus images returned by the Magellan spacecraft.	Images are labeled by humans.	not given	Images	Classification	1991	[348][349]	M. Burl
MAGIC Gamma Telescope Dataset	Monte Carlo generated high-energy gamma particle events.	Numerous features extracted from the simulations.	19,020	Text	Classification	2007	[349][350]	R. Bock
Solar Flare Dataset	Measurements of the number of certain types of solar flare events occurring in a 24-hour period.	Many solar flare-specific features are given.	1389	Text	Regression, classification	1989	[351]	G. Bradshaw

Earth science

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Volcanoes of the World	Volcanic eruption data for all known volcanic events on Earth.	Details such as region, subregion, tectonic setting, dominant rock type are given.	1535	Text	Regression, classification	2013	[352]	E. Venzke et al.
Seismic-bumps Dataset	Seismic activities from a coal mine.	Seismic activity was classified as hazardous or not.	2584	Text	Classification	2013	[353][354]	M. Sikora et al.

Other physical

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Concrete Compressive Strength Dataset	Dataset of concrete properties and compressive strength.	Nine features are given for each sample.	1030	Text	Regression	2007	[355][356]	I. Yeh
Concrete Slump Test Dataset	Concrete slump flow given in terms of properties.	Features of concrete given such as fly ash, water, etc.	103	Text	Regression	2009	[357][358]	I. Yeh
Musk Dataset	Predict if a molecule, given the features, will be a musk or a non-musk.	168 features given for each molecule.	6598	Text	Classification	1994	[359]	Arris Pharmaceutical Corp.
Steel Plates Faults Dataset	Steel plates of 7 different types.	27 features given for each sample.	1941	Text	Classification	2010	[360]	Semeion Research Center

Biological data

Datasets from biological systems.

Human

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
EEG Database	Study to examine EEG correlates of genetic predisposition to alcoholism.	Measurements from 64 electrodes placed on the scalp sampled at 256 Hz (3.9 ms epoch) for 1 second.	122	Text	Classification	1999	[361]	H. Begleiter
P300 Interface Dataset	Data from nine subjects collected using P300-based brain-computer interface for disabled subjects.	Split into four sessions for each subject. MATLAB code given.	1,224	Text	Classification	2008	[362][363]	U. Hoffman et al.
Heart Disease Data Set	Attributes of patients with and without heart disease.	75 attributes given for each patient with some missing values.	303	Text	Classification	1988	[364][365]	A. Janosi et al.
Breast Cancer Wisconsin (Diagnostic) Dataset	Dataset of features of breast masses. Diagnosis by physician is given.	10 features for each sample are given.	569	Text	Classification	1995	[366][367]	W. Wolberg et al.
National Survey on Drug Use and Health	Large scale survey on health and drug use in the United States.	None.	55,268	Text	Classification, regression	2012	[368]	United States Department of Health and Human Services
Lung Cancer Dataset	Lung cancer dataset without attribute definitions	56 features are given for each case	32	Text	Classification	1992	[369][370]	Z. Hong et al.
Arrhythmia Dataset	Data for a group of patients, of which some have cardiac arrhythmia.	276 features for each instance.	452	Text	Classification	1998	[371][372]	H. Altay et al.
Diabetes 130-US hospitals for years 1999–2008 Dataset	9 years of readmission data across 130 US hospitals for patients with diabetes.	Many features of each readmission are given.	100,000	Text	Classification, clustering	2014	[373][374]	J. Clore et al.
Diabetic Retinopathy Debrecen Dataset	Features extracted from images of eyes with and without diabetic retinopathy.	Features extracted and conditions diagnosed.	1151	Text	Classification	2014	[375][376]	B. Antal et al.
Diabetic Retinopathy Messidor Dataset	Methods to evaluate segmentation and indexing techniques in the field of retinal ophthalmology (MESSIDOR)	Features retinopathy grade and risk of macular edema	1200	Images, Text	Classification, Segmentation	2008	[377][378]	Messidor Project
Liver Disorders Dataset	Data for people with liver disorders.	Seven biological features given for each patient.	345	Text	Classification	1990	[379][380]	Bupa Medical Research Ltd.
Thyroid Disease Dataset	10 databases of thyroid disease patient data.	None.	7200	Text	Classification	1987	[381][382]	R. Quinlan
Mesothelioma Dataset	Mesothelioma patient data.	Large number of features, including asbestos exposure, are given.	324	Text	Classification	2016	[383][384]	A. Tanrikulu et al.
Parkinson's Vision-Based Pose Estimation Dataset	2D human pose estimates of Parkinson's patients performing a variety of tasks.	Camera shake has been removed from trajectories.	134	Text	Classification, regression	2017	[385][386][387]	M. Li et al.
KEGG Metabolic Reaction Network (Undirected) Dataset	Network of metabolic pathways. A reaction network and a relation network are given.	Detailed features for each network node and pathway are given.	65,554	Text	Classification, clustering, regression	2011	[388]	M. Naeem et al.
Modified Human Sperm Morphology Analysis Dataset (MHSMA)	Human sperm images from 235 patients with male factor infertility, labeled for normal or abnormal sperm acrosome, head, vacuole, and tail.	Cropped around single sperm head. Magnification normalized. Training, validation, and test set splits created.	1,540	.npy files	Classification	2019	[389][390]	S. Javadi and S.A. Mirroshandel

Animal

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Abalone Dataset	Physical measurements of Abalone. Weather patterns and location are also given.	None.	4177	Text	Regression	1995	[391]	Marine Research Laboratories – Taroona
Zoo Dataset	Artificial dataset covering 7 classes of animals.	Animals are classed into 7 categories and features are given for each.	101	Text	Classification	1990	[392]	R. Forsyth
Demospongiae Dataset	Data about marine sponges.	503 sponges in the Demosponge class are described by various features.	503	Text	Classification	2010	[393]	E. Armengol et al.
Splice-junction Gene Sequences Dataset	Primate splice-junction gene sequences (DNA) with associated imperfect domain theory.	None.	3190	Text	Classification	1992	[370]	G. Towell et al.
Mice Protein Expression Dataset	Expression levels of 77 proteins measured in the cerebral cortex of mice.	None.	1080	Text	Classification, Clustering	2015	[394][395]	C. Higuera et al.

Plant

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Forest Fires Dataset	Forest fires and their properties.	13 features of each fire are extracted.	517	Text	Regression	2008	[396][397]	P. Cortez et al.
Iris Dataset	Three types of iris plants are described by 4 different attributes.	None.	150	Text	Classification	1936	[398][399]	R. Fisher
Plant Species Leaves Dataset	Sixteen samples of leaf each of one-hundred plant species.	Shape descriptor, fine-scale margin, and texture histograms are given.	1600	Text	Classification	2012	[400][401]	J. Cope et al.
Mushroom Dataset	Mushroom attributes and classification.	Many properties of each mushroom are given.	8124	Text	Classification	1987	[402]	J. Schlimmer
Soybean Dataset	Database of diseased soybean plants.	35 features for each plant are given. Plants are classified into 19 categories.	307	Text	Classification	1988	[403]	R. Michalski et al.
Seeds Dataset	Measurements of geometrical properties of kernels belonging to three different varieties of wheat.	None.	210	Text	Classification, clustering	2012	[404][405]	Charytanowicz et al.
Covertype Dataset	Data for predicting forest cover type strictly from cartographic variables.	Many geographical features given.	581,012	Text	Classification	1998	[406][407]	J. Blackard et al.
Abscisic Acid Signaling Network Dataset	Data for a plant signaling network. Goal is to determine set of rules that governs the network.	None.	300	Text	Causal-discovery	2008	[408]	J. Jenkens et al.
Folio Dataset	20 photos of leaves for each of 32 species.	None.	637	Images, text	Classification, clustering	2015	[409][410]	T. Munisami et al.
Oxford Flower Dataset	17 category dataset of flowers.	Train/test splits, labeled images.	1360	Images, text	Classification	2006	[138][411]	M-E Nilsback et al.
Plant Seedlings Dataset	12 category dataset of plant seedlings.	Labelled images, segmented images.	5544	Images	Classification, detection	2017	[412]	Giselsson et al.
Fruits 360 dataset	Database with images of 120 fruits and vegetables.	100x100 pixels, white background.	82213	Images (jpg)	Classification	2017-2019	[413][414]	Mihai Oltean, Horea Muresan

Microbe

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Ecoli Dataset	Protein localization sites.	Various features of the protein localization sites are given.	336	Text	Classification	1996	[415][416]	K. Nakai et al.
MicroMass Dataset	Identification of microorganisms from mass-spectrometry data.	Various mass spectrometer features.	931	Text	Classification	2013	[417][418]	P. Mahe et al.
Yeast Dataset	Prediction of cellular localization sites of proteins.	Eight features given per instance.	1484	Text	Classification	1996	[419][420]	K. Nakai et al.

Drug Discovery

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Tox21 Dataset	Prediction of outcome of biological assays.	Chemical descriptors of molecules are given.	12707	Text	Classification	2016	[421]	A. Mayr et al.

Anomaly data

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Numenta Anomaly Benchmark (NAB)	Data are ordered, timestamped, single-valued metrics. All data files contain anomalies, unless otherwise noted.	None	50+ files	Comma separated values	Anomaly detection	2016 (continually updated)	[422]	Numenta
Skoltech Anomaly Benchmark (SKAB)	Each file represents a single experiment and contains a single anomaly. The dataset represents a multivariate time series collected from the sensors installed on the testbed.	There are two markups for Outlier detection (point anomalies) and Changepoint detection (collective anomalies) problems	30+ files (v0.9)	Comma separated values	Anomaly detection	2020 (continually updated)	[423] [424]	Iurii D. Katser and Vyacheslav O. Kozitsin
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study	Most data files are adapted from UCI Machine Learning Repository data, some are collected from the literature.	treated for missing values, numerical attributes only, different percentages of anomalies, labels	1000+ files	ARFF	Anomaly detection	2016 (possibly updated with new datasets and/or results)	[425]	Campos et al.
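
As a rough illustration of how time-series benchmarks of this kind are typically consumed, the following Python sketch flags point anomalies in an ordered, timestamped, single-valued series using a trailing-window z-score. The column names `timestamp` and `value`, the window size, the threshold, and the inline sample readings are all assumptions made for the example; this is a generic baseline, not the scoring method of NAB or SKAB.

```python
import csv
import io
import statistics

# Hypothetical sample shaped like the data described above: ordered,
# timestamped, single-valued readings. Column names are assumed.
SAMPLE_CSV = """timestamp,value
2014-04-01 00:00:00,20.1
2014-04-01 00:05:00,19.8
2014-04-01 00:10:00,20.3
2014-04-01 00:15:00,20.0
2014-04-01 00:20:00,19.9
2014-04-01 00:25:00,20.2
2014-04-01 00:30:00,35.0
2014-04-01 00:35:00,20.1
2014-04-01 00:40:00,19.7
"""


def load_series(text):
    """Parse CSV text into a list of (timestamp, value) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["timestamp"], float(row["value"])) for row in reader]


def flag_anomalies(series, window=5, threshold=3.0):
    """Return timestamps whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(series)):
        history = [v for _, v in series[i - window:i]]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            continue  # flat history: z-score is undefined, so skip
        timestamp, value = series[i]
        if abs(value - mean) / stdev > threshold:
            flagged.append(timestamp)
    return flagged


series = load_series(SAMPLE_CSV)
anomalies = flag_anomalies(series)
print(anomalies)
```

A trailing window is used so that each reading is judged only against earlier data, which matches the ordered nature of the series; real benchmark files would be read from disk instead of the inline string.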

Question Answering data

This section includes datasets that deal with structured data.

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
DBpedia Neural Question Answering (DBNQA) Dataset	A large collection of question-to-SPARQL pairs specially designed for open-domain neural question answering over the DBpedia knowledge base.	This dataset contains a large collection of Open Neural SPARQL Templates and instances for training Neural SPARQL Machines; it was pre-processed by semi-automatic annotation tools as well as by three SPARQL experts.	894,499	Question-query pairs	Question Answering	2018	[426][427]	Hartmann, Soru, and Marx et al.
Vietnamese Question Answering Dataset (UIT-ViQuAD)	A large collection of Vietnamese questions for evaluating MRC models.	This dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages of 174 Vietnamese articles from Wikipedia.	23,074	Question-answer pairs	Question Answering	2020	[428]	Nguyen et al.
Vietnamese Multiple-Choice Machine Reading Comprehension Corpus (ViMMRC)	A collection of Vietnamese multiple-choice questions for evaluating MRC models.	This corpus includes 2,783 Vietnamese multiple-choice questions.	2,783	Question-answer pairs	Question Answering/Machine Reading Comprehension	2020	[429]	Nguyen et al.

Multivariate data

Datasets consisting of rows of observations and columns of attributes characterizing those observations. Typically used for regression analysis or classification but other types of algorithms can also be used. This section includes datasets that do not fit in the above categories.
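
To make the rows-and-columns layout concrete, the sketch below builds a tiny table in which each row is an observation and each key is an attribute, then fits a one-variable least-squares line to it. The attribute names and values are invented for illustration and are not taken from any dataset listed here.

```python
# Each dict is one observation (row); each key is an attribute (column).
# Names and values are hypothetical.
rows = [
    {"rooms": 4, "age_years": 30, "price": 210.0},
    {"rooms": 5, "age_years": 12, "price": 260.0},
    {"rooms": 6, "age_years": 8, "price": 310.0},
    {"rooms": 7, "age_years": 3, "price": 360.0},
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Regress one attribute (price) on another (rooms).
slope, intercept = fit_line(
    [r["rooms"] for r in rows],
    [r["price"] for r in rows],
)
print(slope, intercept)
```

The same table could equally feed a classifier by treating one column as a class label instead of a numeric target.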

Financial

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Dow Jones Index	Weekly data of stocks from the first and second quarters of 2011.	Calculated values such as percentage change and lags are included.	750	Comma separated values	Classification, regression, Time series	2014	[430][431]	M. Brown et al.
Statlog (Australian Credit Approval)	Credit card applications either accepted or rejected and attributes about the application.	Attribute names are removed as well as identifying information. Factors have been relabeled.	690	Comma separated values	Classification	1987	[432][433]	R. Quinlan
eBay auction data	Auction data for various eBay.com objects over auctions of various lengths.	Contains all bids, bidderID, bid times, and opening prices.	~ 550	Text	Regression, classification	2012	[434][435]	G. Shmueli et al.
Statlog (German Credit Data)	Binary credit classification into "good" or "bad" with many features.	Various financial features of each person are given.	1,000	Text	Classification	1994	[436]	H. Hofmann
Bank Marketing Dataset	Data from a large marketing campaign carried out by a large bank.	Many attributes of the clients contacted are given. Whether the client subscribed to a term deposit is also given.	45,211	Text	Classification	2012	[437][438]	S. Moro et al.
Istanbul Stock Exchange Dataset	Several stock indexes tracked for almost two years.	None.	536	Text	Classification, regression	2013	[439][440]	O. Akbilgic
Default of Credit Card Clients	Credit default data for Taiwanese credit card clients.	Various features about each account are given.	30,000	Text	Classification	2016	[441][442]	I. Yeh

Weather

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Cloud DataSet	Data about 1024 different clouds.	Image features extracted.	1024	Text	Classification, clustering	1989	[443]	P. Collard
El Nino Dataset	Oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific.	12 weather attributes are measured at each buoy.	178080	Text	Regression	1999	[444]	Pacific Marine Environmental Laboratory
Greenhouse Gas Observing Network Dataset	Time-series of greenhouse gas concentrations at 2921 grid cells in California created using simulations of the weather.	None.	2921	Text	Regression	2015	[445]	D. Lucas
Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory	Continuous air samples in Hawaii, USA. 44 years of records.	None.	44 years	Text	Regression	2001	[446]	Mauna Loa Observatory
Ionosphere Dataset	Radar data from the ionosphere. Task is to classify into good and bad radar returns.	Many radar features given.	351	Text	Classification	1989	[382][447]	Johns Hopkins University
Ozone Level Detection Dataset	Two ground ozone level datasets.	Many features given, including weather conditions at time of measurement.	2536	Text	Classification	2008	[448][449]	K. Zhang et al.

Census

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Adult Dataset	Census data from 1994 containing demographic features of adults and their income.	Cleaned and anonymized.	48,842	Comma separated values	Classification	1996	[450]	United States Census Bureau
Census-Income (KDD)	Weighted census data from the 1994 and 1995 Current Population Surveys.	Split into training and test sets.	299,285	Comma separated values	Classification	2000	[451][452]	United States Census Bureau
IPUMS Census Database	Census data from the Los Angeles and Long Beach areas.	None	256,932	Text	Classification, regression	1999	[453]	IPUMS
US Census Data 1990	Partial data from 1990 US census.	Results randomized and useful attributes selected.	2,458,285	Text	Classification, regression	1990	[454]	United States Census Bureau

Transit

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Bike Sharing Dataset	Hourly and daily count of rental bikes in a large city.	Many features, including weather, length of trip, etc., are given.	17,389	Text	Regression	2013	[455][456]	H. Fanaee-T
New York City Taxi Trip Data	Trip data for yellow and green taxis in New York City.	Gives pick up and drop off locations, fares, and other details of trips.	6 years	Text	Classification, clustering	2015	[457]	New York City Taxi and Limousine Commission
Taxi Service Trajectory ECML PKDD	Trajectories of all taxis in a large city.	Many features given, including start and stop points.	1,710,671	Text	Clustering, causal-discovery	2015	[458][459]	M. Ferreira et al.
METR-LA	Speeds from loop detectors on the highways of Los Angeles County.	Average speed in 5-minute timesteps.	7,094,304 from 207 sensors and 34,272 timesteps	Comma separated values	Regression, Forecasting	2014	[460]	Jagadish et al.
PeMS	Speed, flow, occupancy and other metrics from loop detectors and other sensors on the freeways of the State of California, U.S.A.	Metrics usually aggregated by averaging into 5-minute timesteps.	39,000 individual detectors, each containing years of time series	Comma separated values	Regression, Forecasting, Nowcasting, Interpolation	(updated in real time)	[461]	California Department of Transportation
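
The METR-LA instance count in the table is consistent with one reading per sensor per timestep, and the 5-minute step implies the approximate time span covered. The short check below uses only the figures given in the table; treating every sensor as reporting at every timestep is an assumption.

```python
SENSORS = 207
TIMESTEPS = 34_272
STEP_MINUTES = 5

# One reading per sensor per timestep (assumed).
instances = SENSORS * TIMESTEPS

# Total duration covered by the timesteps, in days.
span_days = TIMESTEPS * STEP_MINUTES / (60 * 24)

print(instances)  # 7094304
print(span_days)  # 119.0
```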

Internet

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Webpages from Common Crawl 2012	Large collection of webpages and how they are connected via hyperlinks.	None.	3.5B	Text	Clustering, classification	2013	[462]	V. Granville
Internet Advertisements Dataset	Dataset for predicting if a given image is an advertisement or not.	Features encode geometry of ads and phrases occurring in the URL.	3279	Text	Classification	1998	[463][464]	N. Kushmerick
Internet Usage Dataset	General demographics of internet users.	None.	10,104	Text	Classification, clustering	1999	[465]	D. Cook
URL Dataset	120 days of URL data from a large conference.	Many features of each URL are given.	2,396,130	Text	Classification	2009	[466][467]	J. Ma
Phishing Websites Dataset	Dataset of phishing websites.	Many features of each site are given.	2456	Text	Classification	2015	[468]	R. Mustafa et al.
Online Retail Dataset	Online transactions for a UK online retailer.	Details of each transaction given.	541,909	Text	Classification, clustering	2015	[469]	D. Chen
Freebase Simple Topic Dump	Freebase is an online effort to structure all human knowledge.	Topics from Freebase have been extracted.	large	Text	Classification, clustering	2011	[470][471]	Freebase
Farm Ads Dataset	The text of farm ads from websites. Binary approval or disapproval by content owners is given.	SVMlight sparse vectors of text words in ads calculated.	4143	Text	Classification	2011	[472][473]	C. Mesterharm et al.

Games

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Poker Hand Dataset	5 card hands from a standard 52 card deck.	Attributes of each hand are given, including the Poker hands formed by the cards it contains.	1,025,010	Text	Regression, classification	2007	[474]	R. Cattral
Connect-4 Dataset	Contains all legal 8-ply positions in the game of connect-4 in which neither player has won yet, and in which the next move is not forced.	None.	67,557	Text	Classification	1995	[475]	J. Tromp
Chess (King-Rook vs. King) Dataset	Endgame Database for White King and Rook against Black King.	None.	28,056	Text	Classification	1994	[476][477]	M. Bain et al.
Chess (King-Rook vs. King-Pawn) Dataset	King+Rook versus King+Pawn on a7.	None.	3196	Text	Classification	1989	[478]	R. Holte
Tic-Tac-Toe Endgame Dataset	Binary classification for win conditions in tic-tac-toe.	None.	958	Text	Classification	1991	[479]	D. Aha

Other multivariate

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created (updated)	Reference	Creator
Housing Data Set	Median home values of Boston with associated home and neighborhood attributes.	None.	506	Text	Regression	1993	[480]	D. Harrison et al.
The Getty Vocabularies	Structured terminology for art and other material culture, archival materials, visual surrogates, and bibliographic materials.	None.	large	Text	Classification	2015	[481]	Getty Center
Yahoo! Front Page Today Module User Click Log	User click log for news articles displayed in the Featured Tab of the Today Module on Yahoo! Front Page.	Conjoint analysis with a bilinear model.	45,811,883 user visits	Text	Regression, clustering	2009	[482][483]	Chu et al.
British Oceanographic Data Centre	Biological, chemical, physical and geophysical data for oceans. 22K variables tracked.	Various.	22K variables, many instances	Text	Regression, clustering	2015	[484]	British Oceanographic Data Centre
Congressional Voting Records Dataset	Voting data for all USA representatives on 16 issues.	Beyond the raw voting data, various other features are provided.	435	Text	Classification	1987	[485]	J. Schlimmer
Entree Chicago Recommendation Dataset	Record of user interactions with Entree Chicago recommendation system.	Each user's usage of the app is recorded in detail.	50,672	Text	Regression, recommendation	2000	[486]	R. Burke
Insurance Company Benchmark (COIL 2000)	Information on customers of an insurance company.	Many features of each customer and the services they use.	9,000	Text	Regression, classification	2000	[487][488]	P. van der Putten
Nursery Dataset	Data from applicants to nursery schools.	Data about applicant's family and various other factors included.	12,960	Text	Classification	1997	[489][490]	V. Rajkovic et al.
University Dataset	Data describing attributes of a large number of universities.	None.	285	Text	Clustering, classification	1988	[491]	S. Souders et al.
Blood Transfusion Service Center Dataset	Data from a blood transfusion service center. Gives data on donors' return rate, frequency, etc.	None.	748	Text	Classification	2008	[492][493]	I. Yeh
Record Linkage Comparison Patterns Dataset	Large dataset of records. Task is to link relevant records together.	Blocking procedure applied to select only certain record pairs.	5,749,132	Text	Classification	2011	[494][495]	University of Mainz
Nomao Dataset	Nomao collects data about places from many different sources. Task is to detect items that describe the same place.	Duplicates labeled.	34,465	Text	Classification	2012	[496][497]	Nomao Labs
Movie Dataset	Data for 10,000 movies.	Several features for each movie are given.	10,000	Text	Clustering, classification	1999	[498]	G. Wiederhold
Open University Learning Analytics Dataset	Information about students and their interactions with a virtual learning environment.	None.	~ 30,000	Text	Classification, clustering, regression	2015	[499][500]	J. Kuzilek et al.
Mobile phone records	Telecommunications activity and interactions	Aggregation per geographical grid cells and every 15 minutes.	large	Text	Classification, Clustering, Regression	2015	[501]	G. Barlacchi et al.

Curated repositories of datasets

As datasets come in myriad formats and can sometimes be difficult to use, there has been considerable work put into curating and standardizing the format of datasets to make them easier to use for machine learning research.

OpenML:[502] Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms.
PMLB:[503] A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Provides classification and regression datasets in a standardized format that are accessible through a Python API.
Metatext NLP: https://metatext.io/datasets A community-maintained web repository containing nearly 1000 benchmark datasets and growing. Covers many tasks, from classification to question answering, and various languages, including English, Portuguese, and Arabic.
Appen: Off The Shelf and Open Source Datasets hosted and maintained by the company. These biological, image, physical, question answering, signal, sound, text, and video resources number over 250 and can be applied to over 25 different use cases.[504][505]

References

Wissner-Gross, A. "Datasets Over Algorithms". Edge.com. Retrieved 8 January 2016.
Weiss, G. M.; Provost, F. (1 September 2003). "Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction". Journal of Artificial Intelligence Research. AI Access Foundation. 19: 315–354. doi:10.1613/jair.1199. ISSN 1076-9757. S2CID 2344521.
Turney, Peter (2000). "Types of cost in inductive concept learning". arXiv:cs/0212034.
Abney, Steven (17 September 2007). Semisupervised Learning for Computational Linguistics. CRC Press. ISBN 978-1-4200-1080-0.
Žliobaitė, Indrė; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoff (2011). "Active Learning with Evolving Streaming Data". Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 597–612. doi:10.1007/978-3-642-23808-6_39. ISBN 978-3-642-23807-9. ISSN 0302-9743.
Zafeiriou, S.; Kollias, D.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Kotsia, I. (2017). "Aff-Wild: Valence and Arousal in-the-wild Challenge" (PDF). Computer Vision and Pattern Recognition Workshops (CVPRW), 2017: 1980–1987. doi:10.1109/CVPRW.2017.248. ISBN 978-1-5386-0733-6. S2CID 3107614.
Kollias, D.; Tzirakis, P.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Schuller, B.; Kotsia, I.; Zafeiriou, S. (2019). "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond". International Journal of Computer Vision (IJCV), 2019. 127 (6–7): 907–929. doi:10.1007/s11263-019-01158-4. S2CID 13679040.
Kollias, D.; Zafeiriou, S. (2019). "Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface" (PDF). British Machine Vision Conference (BMVC), 2019. arXiv:1910.04855.
Kollias, D.; Schulc, A.; Hajiyev, E.; Zafeiriou, S. (2020). "Analysing affective behavior in the first abaw 2020 competition". IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2020. arXiv:2001.11409.
Phillips, P. Jonathon; et al. (1998). "The FERET database and evaluation procedure for face-recognition algorithms". Image and Vision Computing. 16 (5): 295–306. doi:10.1016/s0262-8856(97)00070-x.
Wiskott, Laurenz; et al. (1997). "Face recognition by elastic bunch graph matching". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (7): 775–779. CiteSeerX 10.1.1.44.2321. doi:10.1109/34.598235.
Livingstone, Steven R.; Russo, Frank A. (2018). "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English". PLOS ONE. 13 (5): e0196391. Bibcode:2018PLoSO..1396391L. doi:10.1371/journal.pone.0196391. PMC 5955500. PMID 29768426.
Livingstone, Steven R.; Russo, Frank A. (2018). "Emotion". The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). doi:10.5281/zenodo.1188976.
Grgic, Mislav; Delac, Kresimir; Grgic, Sonja (2011). "SCface–surveillance cameras face database". Multimedia Tools and Applications. 51 (3): 863–879. doi:10.1007/s11042-009-0417-2. S2CID 207218990.
Wallace, Roy, et al. "Inter-session variability modelling and joint factor analysis for face authentication." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011.
Georghiades, A. "Yale face database". Center For Computational Vision And Control At Yale University, http://CVC.yale.edu/Projects/Yalefaces/Yalefa. 2: 1997.
Nguyen, Duy; et al. (2006). "Real-time face detection and lip feature extraction using field-programmable gate arrays". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 36 (4): 902–912. CiteSeerX 10.1.1.156.9848. doi:10.1109/tsmcb.2005.862728. PMID 16903373. S2CID 7334355.
Kanade, Takeo, Jeffrey F. Cohn, and Yingli Tian. "Comprehensive database for facial expression analysis." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.
Zeng, Zhihong; et al. (2009). "A survey of affect recognition methods: Audio, visual, and spontaneous expressions". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (1): 39–58. CiteSeerX 10.1.1.144.217. doi:10.1109/tpami.2008.52. PMID 19029545.
Lyons, Michael; Kamachi, Miyuki; Gyoba, Jiro (1998). "Facial expression images". The Japanese Female Facial Expression (JAFFE) Database. doi:10.5281/zenodo.3451524.
Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro "Coding facial expressions with Gabor wavelets." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998.
Ng, Hong-Wei, and Stefan Winkler. "A data-driven approach to cleaning large face datasets." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.
RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2015). "One-to-many face recognition with bilinear CNNs". arXiv:1506.01342 [cs.CV].
Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio-and video-based biometric person authentication. Springer Berlin Heidelberg, 2001.
Huang, Gary B., et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007.
Bhatt, Rajen B., et al. "Efficient skin region segmentation using low complexity fuzzy decision tree model." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009.
Lingala, Mounika; et al. (2014). "Fuzzy logic color detection: Blue areas in melanoma dermoscopy images". Computerized Medical Imaging and Graphics. 38 (5): 403–410. doi:10.1016/j.compmedimag.2014.03.007. PMC 4287461. PMID 24786720.
Maes, Chris, et al. "Feature detection on 3D face surfaces for pose normalisation and recognition." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010.
Savran, Arman, et al. "Bosphorus database for 3D face analysis." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56.
Heseltine, Thomas, Nick Pears, and Jim Austin. "Three-dimensional face recognition: An eigensurface approach." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004.
Ge, Yun; et al. (2011). "3D Novel Face Sample Modeling for Face Recognition". Journal of Multimedia. 6 (5): 467–475. CiteSeerX 10.1.1.461.9710. doi:10.4304/jmm.6.5.467-475.
Wang, Yueming; Liu, Jianzhuang; Tang, Xiaoou (2010). "Robust 3D face recognition by local shape difference boosting". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (10): 1858–1870. CiteSeerX 10.1.1.471.2424. doi:10.1109/tpami.2009.200. PMID 20724762. S2CID 15263913.
Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "Robust 3D face recognition using learned visual codebook." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.
Zhao, G.; Huang, X.; Taini, M.; Li, S. Z.; Pietikäinen, M. (2011). "Facial expression recognition from near-infrared videos" (PDF). Image and Vision Computing. 29 (9): 607–619. doi:10.1016/j.imavis.2011.07.002.
Soyel, Hamit, and Hasan Demirel. "Facial expression recognition using 3D facial feature distances." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838.
Bowyer, Kevin W.; Chang, Kyong; Flynn, Patrick (2006). "A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition". Computer Vision and Image Understanding. 101 (1): 1–15. CiteSeerX 10.1.1.134.8784. doi:10.1016/j.cviu.2005.05.005.
Tan, Xiaoyang; Triggs, Bill (2010). "Enhanced local texture feature sets for face recognition under difficult lighting conditions". IEEE Transactions on Image Processing. 19 (6): 1635–1650. Bibcode:2010ITIP...19.1635T. CiteSeerX 10.1.1.105.3355. doi:10.1109/tip.2010.2042645. PMID 20172829. S2CID 4943234.
Mousavi, Mir Hashem, Karim Faez, and Amin Asghari. "Three dimensional face recognition using SVM classifier." Computer and Information Science, 2008. ICIS 08. Seventh IEEE/ACIS International Conference on. IEEE, 2008.
Amberg, Brian, Reinhard Knothe, and Thomas Vetter. "Expression invariant 3D face recognition with a morphable model." Automatic Face & Gesture Recognition, 2008. FG'08. 8th IEEE International Conference on. IEEE, 2008.
İrfanoğlu, M. O., Berk Gökberk, and Lale Akarun. "3D shape-based face recognition using automatically registered facial surfaces." Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Vol. 4. IEEE, 2004.
Beumier, Charles; Acheroy, Marc (2001). "Face verification from 3D and grey level clues". Pattern Recognition Letters. 22 (12): 1321–1329. doi:10.1016/s0167-8655(01)00077-0.
Afifi, Mahmoud; Abdelhamed, Abdelrahman (13 June 2017). "AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces". arXiv:1706.04277 [cs.CV].
"SoF dataset". sites.google.com. Retrieved 18 November 2017.
"IMDB-WIKI". data.vision.ee.ethz.ch. Retrieved 13 March 2018.
Patron-Perez, A.; Marszalek, M.; Reid, I.; Zisserman, A. (2012). "Structured learning of human interactions in TV shows". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (12): 2441–2453. doi:10.1109/tpami.2012.24. PMID 23079467. S2CID 6060568.
Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). Berkeley MHAD: A comprehensive multimodal human action database. In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE.
Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013.
Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." Advances in Neural Information Processing Systems. 2014.
Stoian, Andrei; Ferecatu, Marin; Benois-Pineau, Jenny; Crucianu, Michel (2016). "Fast Action Localization in Large-Scale Video Archives". IEEE Transactions on Circuits and Systems for Video Technology. 26 (10): 1917–1930. doi:10.1109/TCSVT.2015.2475835. S2CID 31537462.
Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata, Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-Jia; Shamma, David A; Bernstein, Michael S; Fei-Fei, Li (2017). "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations". International Journal of Computer Vision. 123: 32–73. arXiv:1602.07332. doi:10.1007/s11263-016-0981-7. S2CID 4492210.
Karayev, S., et al. "A category-level 3-D object dataset: putting the Kinect to work." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2011.
Tighe, Joseph, and Svetlana Lazebnik. "Superparsing: scalable nonparametric image parsing with superpixels." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 352–365.
Arbelaez, P.; Maire, M; Fowlkes, C; Malik, J (May 2011). "Contour Detection and Hierarchical Image Segmentation" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 33 (5): 898–916. doi:10.1109/tpami.2010.161. PMID 20733228. S2CID 206764694. Retrieved 27 February 2016.
Lin, Tsung-Yi, et al. "Microsoft coco: Common objects in context." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 740–755.
Russakovsky, Olga; et al. (2015). "Imagenet large scale visual recognition challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547.
Xiao, Jianxiong, et al. "Sun database: Large-scale scene recognition from abbey to zoo." Computer vision and pattern recognition (CVPR), 2010 IEEE conference on. IEEE, 2010.
Donahue, Jeff; Jia, Yangqing; Vinyals, Oriol; Hoffman, Judy; Zhang, Ning; Tzeng, Eric; Darrell, Trevor (2013). "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition". arXiv:1310.1531 [cs.CV].
Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; et al. (11 April 2015). "ImageNet Large Scale Visual Recognition Challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547.
Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari, Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun, Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2017. Available from https://github.com/openimages."
Vyas, Apoorv, et al. "Commercial Block Detection in Broadcast News Videos." Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing. ACM, 2014.
Hauptmann, Alexander G., and Michael J. Witbrock. "Story segmentation and detection of commercials in broadcast news video." Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on. IEEE, 1998.
Tung, Anthony KH, Xin Xu, and Beng Chin Ooi. "Curler: finding and visualizing nonlinear correlation clusters." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005.
Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.
Griffin, G., A. Holub, and P. Perona. Caltech-256 object category dataset California Inst. Technol., Tech. Rep. 7694, 2007 [Online]. Available: http://authors.library.caltech.edu/7694, 2007.
Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.
Fu, Xiping, et al. "NOKMeans: Non-Orthogonal K-means Hashing." Computer Vision—ACCV 2014. Springer International Publishing, 2014. 162–177.
Heitz, Geremy; et al. (2009). "Shape-based object localization for descriptive classification". International Journal of Computer Vision. 84 (1): 40–62. CiteSeerX 10.1.1.142.280. doi:10.1007/s11263-009-0228-y. S2CID 646320.
M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset." In CVPR Workshop on The Future of Datasets in Vision, 2015.
Everingham, Mark; et al. (2010). "The pascal visual object classes (voc) challenge". International Journal of Computer Vision. 88 (2): 303–338. doi:10.1007/s11263-009-0275-4. S2CID 4246903.
Felzenszwalb, Pedro F.; et al. (2010). "Object detection with discriminatively trained part-based models". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (9): 1627–1645. CiteSeerX 10.1.1.153.2745. doi:10.1109/tpami.2009.167. PMID 20634557. S2CID 3198903.
Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
"CINIC-10 dataset". Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey (2018) CINIC-10 is not ImageNet or CIFAR-10. 9 October 2018. Retrieved 13 November 2018.
fashion-mnist: A MNIST-like fashion product database. Benchmark, Zalando Research, 7 October 2017, retrieved 7 October 2017
"notMNIST dataset". Machine Learning, etc. 8 September 2011. Retrieved 13 October 2017.
Houben, Sebastian, et al. "Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
Mathias, Mayeul, et al. "Traffic sign recognition—How far are we from the solution?." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? the kitti vision benchmark suite." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D SLAM systems." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012.
Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5 dataset. Chaladze.com. Retrieved 13 November 2017, from http://chaladze.com/l5/
Kragh, Mikkel F.; et al. (2017). "FieldSAFE – Dataset for Obstacle Detection in Agriculture". Sensors. 17 (11): 2579. arXiv:1709.03526. Bibcode:2017arXiv170903526F. doi:10.3390/s17112579. PMC 5713196. PMID 29120383.
Afifi, Mahmoud (12 November 2017). "Gender recognition and biometric identification using a large dataset of hand images". arXiv:1711.04322 [cs.CV].
Lomonaco, Vincenzo; Maltoni, Davide (18 October 2017). "CORe50: a New Dataset and Benchmark for Continuous Object Recognition". arXiv:1705.03550 [cs.CV].
She, Qi; Feng, Fan; Hao, Xinyue; Yang, Qihan; Lan, Chuanlin; Lomonaco, Vincenzo; Shi, Xuesong; Wang, Zhengwei; Guo, Yao; Zhang, Yimin; Qiao, Fei; Chan, Rosa H.M. (15 November 2019). "OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning". arXiv:1911.06487v2 [cs.CV].
Morozov, Alexei; Sushkova, Olga (13 June 2019). "THz and thermal video data set". Development of the multi-agent logic programming approach to a human behaviour analysis in a multi-channel video surveillance. Moscow: IRE RAS. Retrieved 19 July 2019.
Morozov, Alexei; Sushkova, Olga; Kershner, Ivan; Polupanov, Alexander (9 July 2019). "Development of a method of terahertz intelligent video surveillance based on the semantic fusion of terahertz and 3D video images" (PDF). CEUR. 2391: paper19. Retrieved 19 July 2019.
Botta, M., A. Giordana, and L. Saitta. "Learning fuzzy concept definitions." Fuzzy Systems, 1993., Second IEEE International Conference on. IEEE, 1993.
Frey, Peter W.; Slate, David J. (1991). "Letter recognition using Holland-style adaptive classifiers". Machine Learning. 6 (2): 161–182. doi:10.1007/bf00114162.
Peltonen, Jaakko; Klami, Arto; Kaski, Samuel (2004). "Improved learning of Riemannian metrics for exploratory analysis". Neural Networks. 17 (8): 1087–1100. CiteSeerX 10.1.1.59.4865. doi:10.1016/j.neunet.2004.06.008. PMID 15555853.
Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January 2013). "Online and offline handwritten Chinese character recognition: Benchmarking on new databases". Pattern Recognition. 46 (1): 155–162. doi:10.1016/j.patcog.2012.06.021.
Wang, D.; Liu, C.; Yu, J.; Zhou, X. (2009). "CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters". 2009 10th International Conference on Document Analysis and Recognition: 1206–1210. doi:10.1109/ICDAR.2009.163. ISBN 978-1-4244-4500-4. S2CID 5705532.
Williams, Ben H., Marc Toussaint, and Amos J. Storkey. Extracting motion primitives from natural handwriting data. Springer Berlin Heidelberg, 2006.
Meier, Franziska, et al. "Movement segmentation using a primitive library." Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011.
T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009
Llorens, David, et al. "The UJIpenchars Database: a Pen-Based Database of Isolated Handwritten Characters." LREC. 2008.
Calderara, Simone; Prati, Andrea; Cucchiara, Rita (2011). "Mixtures of von mises distributions for people trajectory shape analysis". IEEE Transactions on Circuits and Systems for Video Technology. 21 (4): 457–471. doi:10.1109/tcsvt.2011.2125550. S2CID 1427766.
Guyon, Isabelle, et al. "Result analysis of the nips 2003 feature selection challenge." Advances in neural information processing systems. 2004.
Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B. (11 December 2015). "Human-level concept learning through probabilistic program induction". Science. 350 (6266): 1332–1338. Bibcode:2015Sci...350.1332L. doi:10.1126/science.aab3050. ISSN 0036-8075. PMID 26659050.
Lake, Brenden (9 November 2019), Omniglot data set for one-shot learning, retrieved 10 November 2019
LeCun, Yann; et al. (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE. 86 (11): 2278–2324. CiteSeerX 10.1.1.32.9552. doi:10.1109/5.726791.
Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008.
Xu, Lei; Krzyżak, Adam; Suen, Ching Y. (1992). "Methods of combining multiple classifiers and their applications to handwriting recognition". IEEE Transactions on Systems, Man and Cybernetics. 22 (3): 418–435. doi:10.1109/21.155943. hdl:10338.dmlcz/135217.
Alimoglu, Fevzi, et al. "Combining multiple classifiers for pen-based handwritten digit recognition." (1996).
Tang, E. Ke; et al. (2005). "Linear dimensionality reduction using relevance weighted LDA". Pattern Recognition. 38 (4): 485–493. doi:10.1016/j.patcog.2004.09.005.
Hong, Yi, et al. "Learning a mixture of sparse distance metrics for classification and dimensionality reduction." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
Thoma, Martin (2017). "The HASYv2 dataset". arXiv:1701.08380 [cs.CV].
Karki, Manohar; Liu, Qun; DiBiano, Robert; Basu, Saikat; Mukhopadhyay, Supratik (20 June 2018). "Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters". arXiv:1806.08037 [cs.CV].
Liu, Qun; Collier, Edward; Mukhopadhyay, Supratik (2019), "PCGAN-CHAR: Progressively Trained Classifier Generative Adversarial Networks for Classification of Noisy Handwritten Bangla Characters", Digital Libraries at the Crossroads of Digital Information for the Future, Springer International Publishing, pp. 3–15, arXiv:1908.08987, doi:10.1007/978-3-030-34058-2_1, ISBN 978-3-030-34057-5, S2CID 201665955
Yuan, Jiangye; Gleason, Shaun S.; Cheriyadat, Anil M. (2013). "Systematic benchmarking of aerial image segmentation". IEEE Geoscience and Remote Sensing Letters. 10 (6): 1527–1531. Bibcode:2013IGRSL..10.1527Y. doi:10.1109/lgrs.2013.2261453. S2CID 629629.
Vatsavai, Ranga Raju. "Object based image classification: state of the art and computational challenges." Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 2013.
Butenuth, Matthias, et al. "Integrating pedestrian simulation, tracking and event detection for crowd analysis." Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011.
Fradi, Hajer, and Jean-Luc Dugelay. "Low level crowd analysis using frame-wise normalized feature for people counting." Information Forensics and Security (WIFS), 2012 IEEE International Workshop on. IEEE, 2012.
Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan. "A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees." International Journal of Remote Sensing 34.20 (2013): 6969–6982.
Mohd Pozi, Muhammad Syafiq; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran (2015). "A new classification model for a class imbalanced data set using genetic programming and support vector machines: Case study for wilt disease classification". Remote Sensing Letters. 6 (7): 568–577. doi:10.1080/2150704X.2015.1062159. S2CID 58788630.
Gallego, A.-J.; Pertusa, A.; Gil, P. "Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks." Remote Sensing. 2018; 10(4):511.
Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery dataset" [Online]. Available: https://www.iuii.ua.es/datasets/masati/, 2018.
Johnson, Brian; Tateishi, Ryutaro; Xie, Zhixiao (2012). "Using geographically weighted variables for image classification". Remote Sensing Letters. 3 (6): 491–499. doi:10.1080/01431161.2011.629637. S2CID 122543681.
Chatterjee, Sankhadeep, et al. "Forest Type Classification: A Hybrid NN-GA Model Based Approach." Information Systems Design and Intelligent Applications. Springer India, 2016. 227-236.
Diegert, Carl. "A combinatorial method for tracing objects using semantics of their shape." Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th. IEEE, 2010.
Razakarivony, Sebastien, and Frédéric Jurie. "Small target detection combining foreground and background manifolds." IAPR International Conference on Machine Vision Applications. 2013.
"SpaceNet". explore.digitalglobe.com. Retrieved 13 March 2018.
Etten, Adam Van (5 January 2017). "Getting Started With SpaceNet Data". The DownLinQ. Retrieved 13 March 2018.
Vakalopoulou, M.; Bus, N.; Karantzalosa, K.; Paragios, N. (July 2017). Integrating edge/boundary priors with classification scores for building detection in very high resolution data. 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). pp. 3309–3312. doi:10.1109/IGARSS.2017.8127705. ISBN 978-1-5090-4951-6. S2CID 8297433.
Yang, Yi; Newsam, Shawn (2010). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '10. New York, New York, USA: ACM Press. doi:10.1145/1869790.1869829. ISBN 9781450304283. S2CID 993769.
Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (3 November 2015). DeepSat: a learning framework for satellite imagery. ACM. p. 37. doi:10.1145/2820783.2820816. ISBN 9781450339674. S2CID 4387134.
Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (21 November 2019). "DeepSat V2: feature augmented convolutional neural nets for satellite image classification". Remote Sensing Letters. 11 (2): 156–165. arXiv:1911.07747. doi:10.1080/2150704x.2019.1693071. ISSN 2150-704X. S2CID 208138097.
Mills, Kyle; Tamblyn, Isaac (16 May 2018), Big graphene dataset, National Research Council of Canada, doi:10.4224/c8sc04578j.data
Mills, Kyle; Spanner, Michael; Tamblyn, Isaac (16 May 2018). "Quantum simulation". Quantum simulations of an electron in a two dimensional potential well. National Research Council of Canada. doi:10.4224/PhysRevA.96.042113.data.
Rohrbach, M.; Amin, S.; Andriluka, M.; Schiele, B. (2012). A database for fine grained activity detection of cooking activities. IEEE. doi:10.1109/cvpr.2012.6247801. ISBN 978-1-4673-1228-8.
Kuehne, Hilde, Ali Arslan, and Thomas Serre. "The language of actions: Recovering the syntax and semantics of goal-directed human activities." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
Sviatoslav, Voloshynovskiy, et al. "Towards Reproducible results in authentication based on physical non-cloneable functions: The Forensic Authentication Microstructure Optical Set (FAMOS)." Proceedings of IEEE International Workshop on Information Forensics and Security. 2012.
Olga, Taran and Shideh, Rezaeifar, et al. "PharmaPack: mobile fine-grained recognition of pharma packages." Proc. European Signal Processing Conference (EUSIPCO). 2017.
Khosla, Aditya, et al. "Novel dataset for fine-grained image categorization: Stanford dogs." Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011.
Parkhi, Omkar M., et al. "Cats and dogs." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
Biggs, Benjamin, et al. "Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop." Proc. ECCV. 2020.
Razavian, Ali, et al. "CNN features off-the-shelf: an astounding baseline for recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
Ortega, Michael; et al. (1998). "Supporting ranked boolean similarity queries in MARS". IEEE Transactions on Knowledge and Data Engineering. 10 (6): 905–925. CiteSeerX 10.1.1.36.6079. doi:10.1109/69.738357.
He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "Multiscale conditional random fields for image labeling." Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on. Vol. 2. IEEE, 2004.
Deneke, Tewodros, et al. "Video transcoding time prediction for proactive load balancing." Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014.
Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell (13 April 2016). "Visual Storytelling". arXiv:1604.03968 [cs.CL].
Wah, Catherine, et al. "The caltech-ucsd birds-200-2011 dataset." (2011).
Duan, Kun, et al. "Discovering localized attributes for fine-grained recognition." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
"YouTube-8M Dataset". research.google.com. Retrieved 1 October 2016.
Abu-El-Haija, Sami; Kothari, Nisarg; Lee, Joonseok; Natsev, Paul; Toderici, George; Varadarajan, Balakrishnan; Vijayanarasimhan, Sudheendra (27 September 2016). "YouTube-8M: A Large-Scale Video Classification Benchmark". arXiv:1609.08675 [cs.CV].
"YFCC100M Dataset". mmcommons.org. Yahoo-ICSI-LLNL. Retrieved 1 June 2017.
Bart Thomee; David A Shamma; Gerald Friedland; Benjamin Elizalde; Karl Ni; Douglas Poland; Damian Borth; Li-Jia Li (25 April 2016). "Yfcc100m: The new data in multimedia research". Communications of the ACM. 59 (2): 64–73. arXiv:1503.01817. doi:10.1145/2812802. S2CID 207230134.
Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "LIRIS-ACCEDE: A Video Database for Affective Content Analysis," in IEEE Transactions on Affective Computing, 2015.
Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos," in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.
M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "The mediaeval 2015 affective impact of movies task," in MediaEval 2015 Workshop, 2015.
S. Johnson and M. Everingham, "Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation", in Proceedings of the 21st British Machine Vision Conference (BMVC2010)
S. Johnson and M. Everingham, "Learning Effective Human Pose Estimation from Inaccurate Annotation", In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2011)
Afifi, Mahmoud; Hussain, Khaled F. (2 November 2017). "The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques". arXiv:1711.00972 [cs.CV].
"MCQ Dataset". sites.google.com. Retrieved 18 November 2017.
Taj-Eddin, I. A. T. F.; Afifi, M.; Korashy, M.; Hamdy, D.; Nasser, M.; Derbaz, S. (July 2016). A new compression technique for surveillance videos: Evaluation using new dataset. 2016 Sixth International Conference on Digital Information and Communication Technology and Its Applications (DICTAP). pp. 159–164. doi:10.1109/DICTAP.2016.7544020. ISBN 978-1-4673-9609-7. S2CID 8698850.
Tabak, Michael A.; Norouzzadeh, Mohammad S.; Wolfson, David W.; Sweeney, Steven J.; Vercauteren, Kurt C.; Snow, Nathan P.; Halseth, Joseph M.; Di Salvo, Paul A.; Lewis, Jesse S.; White, Michael D.; Teton, Ben; Beasley, James C.; Schlichting, Peter E.; Boughton, Raoul K.; Wight, Bethany; Newkirk, Eric S.; Ivan, Jacob S.; Odell, Eric A.; Brook, Ryan K.; Lukacs, Paul M.; Moeller, Anna K.; Mandeville, Elizabeth G.; Clune, Jeff; Miller, Ryan S.; Photopoulou, Theoni (2018). "Machine learning to classify animal species in camera trap images: Applications in ecology". Methods in Ecology and Evolution. 10 (4): 585–590. doi:10.1111/2041-210X.13120. ISSN 2041-210X.
Taj-Eddin, Islam A. T. F.; Afifi, Mahmoud; Korashy, Mostafa; Ahmed, Ali H.; Ng, Yoke Cheng; Hernandez, Evelyng; Abdel-Latif, Salma M. (November 2017). "Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification". Journal of Electronic Imaging. 26 (6): 060501. arXiv:1706.03867. Bibcode:2017JEI....26f0501T. doi:10.1117/1.jei.26.6.060501. ISSN 1017-9909. S2CID 12367169.
McAuley, Julian, et al. "Image-based recommendations on styles and substitutes." Proceedings of the 38th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2015
Ganesan, Kavita; Zhai, Chengxiang (2012). "Opinion-based entity ranking". Information Retrieval. 15 (2): 116–150. doi:10.1007/s10791-011-9174-8. hdl:2142/15252. S2CID 16258727.
Lv, Yuanhua, Dimitrios Lymberopoulos, and Qiang Wu. "An exploration of ranking heuristics in mobile local search." Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2012.
Harper, F. Maxwell; Konstan, Joseph A. (2015). "The MovieLens Datasets: History and Context". ACM Transactions on Interactive Intelligent Systems. 5 (4): 19. doi:10.1145/2827872. S2CID 16619709.
Koenigstein, Noam, Gideon Dror, and Yehuda Koren. "Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy." Proceedings of the fifth ACM conference on Recommender systems. ACM, 2011.
McFee, Brian, et al. "The million song dataset challenge." Proceedings of the 21st international conference companion on World Wide Web. ACM, 2012.
Bohanec, Marko, and Vladislav Rajkovic. "Knowledge acquisition and explanation for multi-attribute decision making." 8th Intl Workshop on Expert Systems and their Applications. 1988.
Tan, Peter J., and David L. Dowe. "MML inference of decision graphs with multi-way joins." Australian Joint Conference on Artificial Intelligence. 2002.
"Quantifying comedy on YouTube: why the number of o's in your LOL matter". Metatext NLP Database. Retrieved 26 October 2020.
Kim, Byung Joo (2012). "A Classifier for Big Data". Convergence and Hybrid Information Technology. Communications in Computer and Information Science. 310. pp. 505–512. doi:10.1007/978-3-642-32692-9_63. ISBN 978-3-642-32691-2.
Pérezgonzález, Jose D.; Gilbey, Andrew (2011). "Predicting Skytrax airport rankings from customer reviews". Journal of Airport Management. 5 (4): 335–339.
Loh, Wei-Yin, and Yu-Shan Shih. "Split selection methods for classification trees." Statistica Sinica (1997): 815–840.
Lim, Tjen-Sien; Loh, Wei-Yin; Shih, Yu-Shan (2000). "A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms". Machine Learning. 40 (3): 203–228. doi:10.1023/a:1007608224229. S2CID 17030953.
Kiet Van Nguyen, Vu Duc Nguyen, Phu X. V. Nguyen, Tham T. H. Truong, Ngan Luu-Thuy Nguyen. "UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis".
Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition for Vietnamese Social Media Text".
Dermouche, Mohamed; Velcin, Julien; Khouas, Leila; Loudcher, Sabine (2014). A Joint Model for Topic-Sentiment Evolution over Time. IEEE. doi:10.1109/icdm.2014.82. ISBN 978-1-4799-4302-9.
Rose, Tony; Stevenson, Mark; Whitehead, Miles (2002). "The Reuters Corpus Volume 1-from Yesterday's News to Tomorrow's Language Resources" (PDF). LREC. 2. S2CID 9239414.
Amini, Massih R.; Usunier, Nicolas; Goutte, Cyril (2009). "Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization". Advances in Neural Information Processing Systems: 28–36.
Liu, Ming; et al. (2015). "VRCA: a clustering algorithm for massive amount of texts". Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press.
Al-Harbi, S; Almuhareb, A; Al-Thubaity, A; Khorsheed, M. S.; Al-Rajeh, A (2008). "Automatic Arabic Text Classification". Proceedings of the 9th International Conference on the Statistical Analysis of Textual Data, Lyon, France.
"Relationship and Entity Extraction Evaluation Dataset: Dstl/re3d". 17 December 2018.
"The Examiner - SpamClickBait Catalogue".
"A Million News Headlines".
"One Week of Global News Feeds".
Kulkarni, Rohit (2018), Reuters News-Wire Archive, Harvard Dataverse, doi:10.7910/DVN/XDB74W
"IrishTimes - the Waxy-Wany News".
"News Headlines Dataset For Sarcasm Detection". kaggle.com. Retrieved 27 April 2019.
Klimt, Bryan, and Yiming Yang. "Introducing the Enron Corpus." CEAS. 2004.
Kossinets, Gueorgi, Jon Kleinberg, and Duncan Watts. "The structure of information pathways in a social communication network." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008.
Androutsopoulos, Ion; Koutsias, John; Chandrinos, Konstantinos V.; Paliouras, George; Spyropoulos, Constantine D. (2000). "An evaluation of Naive Bayesian anti-spam filtering". In Potamias, G.; Moustakis, V.; van Someren, M. (eds.). Proceedings of the Workshop on Machine Learning in the New Information Age. 11th European Conference on Machine Learning, Barcelona, Spain. 11. pp. 9–17. arXiv:cs/0006013. Bibcode:2000cs........6013A.
Bratko, Andrej; et al. (2006). "Spam filtering using statistical data compression models" (PDF). The Journal of Machine Learning Research. 7: 2673–2698.
Almeida, Tiago A., José María G. Hidalgo, and Akebo Yamakami. "Contributions to the study of SMS spam filtering: new collection and results." Proceedings of the 11th ACM symposium on Document engineering. ACM, 2011.
Delany, Sarah Jane; Buckley, Mark; Greene, Derek (2012). "SMS spam filtering: methods and data". Expert Systems with Applications. 39 (10): 9899–9908. doi:10.1016/j.eswa.2012.02.053.
Joachims, Thorsten. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. No. CMU-CS-96-118. Carnegie-mellon univ pittsburgh pa dept of computer science, 1996.
Dimitrakakis, Christos, and Samy Bengio. Online Policy Adaptation for Ensemble Algorithms. No. EPFL-REPORT-82788. IDIAP, 2002.
Annamoradnejad, Issa. arXiv:2004.12765, 2020.
Dooms, S. et al. "Movietweetings: a movie rating dataset collected from twitter, 2013. Available from https://github.com/sidooms/MovieTweetings."
RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2017). "Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval". arXiv:1703.06618 [cs.CV].
"huyt16/Twitter100k". GitHub. Retrieved 26 March 2018.
Go, Alec; Bhayani, Richa; Huang, Lei (2009). "Twitter sentiment classification using distant supervision". CS224N Project Report, Stanford. 1: 12.
Chikersal, Prerna, Soujanya Poria, and Erik Cambria. "SeNTU: sentiment analysis of tweets by combining a rule-based classifier with supervised learning." Proceedings of the International Workshop on Semantic Evaluation, SemEval. 2015.
Zafarani, Reza, and Huan Liu. "Social computing data repository at ASU." School of Computing, Informatics and Decision Systems Engineering, Arizona State University (2009).
Bisgin, Halil, Nitin Agarwal, and Xiaowei Xu. "Investigating homophily in online social networks." Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on. Vol. 1. IEEE, 2010.
McAuley, Julian J.; Leskovec, Jure. "Learning to Discover Social Circles in Ego Networks". NIPS. 2012: 2012.
Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko (2014). "Network-based statistical comparison of citation topology of bibliographic databases". Scientific Reports. 4 (6496): 6496. arXiv:1502.05061. Bibcode:2014NatSR...4E6496S. doi:10.1038/srep06496. PMC 4178292. PMID 25263231.
Abdulla, N., et al. "Arabic sentiment analysis: Corpus-based and lexicon-based." Proceedings of the IEEE conference on Applied Electrical Engineering and Computing Technologies (AEECT). 2013.
Abooraig, Raddad, et al. "On the automatic categorization of Arabic articles based on their political orientation." Third International Conference on Informatics Engineering and Information Science (ICIEIS2014). 2014.
Kawala, François, et al. "Prédictions d'activité dans les réseaux sociaux en ligne." 4ième conférence sur les modèles et l'analyse des réseaux: Approches mathématiques et informatiques. 2013.
Sabharwal, Ashish; Samulowitz, Horst; Tesauro, Gerald (2015). "Selecting Near-Optimal Learners via Incremental Data Allocation". arXiv:1601.00024 [cs.LG].
Xu et al. "SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)" Proceedings of the 9th International Workshop on Semantic Evaluation. 2015.
Xu et al. "Extracting Lexically Divergent Paraphrases from Twitter" Transactions of the Association for Computational Linguistics (TACL). 2014.
Middleton, Stuart E; Middleton, Lee; Modafferi, Stefano (2014). "Real-Time Crisis Mapping of Natural Disasters Using Social Media" (PDF). IEEE Intelligent Systems. 29 (2): 9–17. doi:10.1109/MIS.2013.126. S2CID 15139204.
"geoparsepy". 2016. Python PyPI library
Gupta, Aakash (5 December 2020). "Dutch social media collection". doi:10.5072/FK2/MTPTL7.
"Streamlit". huggingface.co. Retrieved 18 December 2020.
"Dutch Social media collection". kaggle.com. Retrieved 18 December 2020.
Forsyth, E., Lin, J., & Martell, C. (2008, June 25). The NPS Chat Corpus. Retrieved from http://faculty.nps.edu/cmartell/NPSChat.htm
Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan, A Neural Network Approach to Context-Sensitive Generation of Conversational Responses, Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL-HLT 2015), June 2015.
Shaoul, C. & Westbury C. (2013) A reduced redundancy USENET corpus (2005-2011) Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html)
KAN, M. (2011, January). NUS Short Message Service (SMS) Corpus. Retrieved from http://www.comp.nus.edu.sg/entrepreneurship/innovation/osr/corpus/
Stuck_In_the_Matrix. (2015, July 3). I have every publicly available Reddit comment for research. ~ 1.7 billion comments @ 250 GB compressed. Any interest in this? [Original post]. Message posted to https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
Ryan Lowe, Nissan Pow, Iulian V. Serban and Joelle Pineau, "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems", SIGDial 2015.
K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes, "HDLTex: Hierarchical Deep Learning for Text Classification", 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 364-371. doi: 10.1109/ICMLA.2017.0-134
K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes, "Web of Science Dataset", doi:10.17632/9rw3vkcfy4.6
Galgani, Filippo, Paul Compton, and Achim Hoffmann. "Combining different summarization techniques for legal text." Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data. Association for Computational Linguistics, 2012.
Nagwani, N. K. (2015). "Summarizing large text collection using topic modeling and clustering based on MapReduce framework". Journal of Big Data. 2 (1): 1–18. doi:10.1186/s40537-015-0020-5.
Schler, Jonathan; et al. (2006). "Effects of Age and Gender on Blogging" (PDF). AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs. 6.
Anand, Pranav, et al. "Believe Me-We Can Do This! Annotating Persuasive Acts in Blog Text." Computational Models of Natural Argument. 2011.
Traud, Amanda L., Peter J. Mucha, and Mason A. Porter. "Social structure of Facebook networks." Physica A: Statistical Mechanics and its Applications 391.16 (2012): 4165–4180.
Richard, Emile; Savalle, Pierre-Andre; Vayatis, Nicolas (2012). "Estimation of Simultaneously Sparse and Low Rank Matrices". arXiv:1206.6474 [cs.DS].
Richardson, Matthew; Burges, Christopher JC; Renshaw, Erin (2013). "MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text". EMNLP. 1.
Weston, Jason; Bordes, Antoine; Chopra, Sumit; Rush, Alexander M.; Bart van Merriënboer; Joulin, Armand; Mikolov, Tomas (2015). "Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks". arXiv:1502.05698 [cs.AI].
Marcus, Mitchell P.; Ann Marcinkiewicz, Mary; Santorini, Beatrice (1993). "Building a large annotated corpus of English: The Penn Treebank". Computational Linguistics. 19 (2): 313–330.
Collins, Michael (2003). "Head-driven statistical models for natural language parsing". Computational Linguistics. 29 (4): 589–637. doi:10.1162/089120103322753356.
Guyon, Isabelle, et al., eds. Feature extraction: foundations and applications. Vol. 207. Springer, 2008.
Lin, Yuri, et al. "Syntactic annotations for the google books ngram corpus." Proceedings of the ACL 2012 system demonstrations. Association for Computational Linguistics, 2012.
Krishnamoorthy, Niveda; et al. (2013). "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge". AAAI. 1.
Luyckx, Kim, and Walter Daelemans. "Personae: a Corpus for Author and Personality Prediction from Text." LREC. 2008.
Solorio, Thamar, Ragib Hasan, and Mainul Mizan. "A case study of sockpuppet detection in wikipedia." Workshop on Language Analysis in Social Media (LASM) at NAACL HLT. 2013.
Ciarelli, Patrick Marques, and Elias Oliveira. "Agglomeration and elimination of terms for dimensionality reduction." Intelligent Systems Design and Applications, 2009. ISDA'09. Ninth International Conference on. IEEE, 2009.
Zhou, Mingyuan, Oscar Hernan Madrid Padilla, and James G. Scott. "Priors for random count matrices derived from a family of negative binomial processes." Journal of the American Statistical Association just-accepted (2015): 00–00.
Kotzias, Dimitrios, et al. "From group to individual labels using deep features." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.
Ning, Yue; Muthiah, Sathappan; Rangwala, Huzefa; Ramakrishnan, Naren (2016). "Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning". arXiv:1602.08033 [cs.SI].
Buza, Krisztian. "Feedback prediction for blogs." Data analysis, machine learning and knowledge discovery. Springer International Publishing, 2014. 145–152.
Soysal, Ömer M (2015). "Association rule mining with mostly associated sequential patterns". Expert Systems with Applications. 42 (5): 2582–2592. doi:10.1016/j.eswa.2014.10.049.
Bowman, Samuel, et al. "A large annotated corpus for learning natural language inference." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 2015.
"DSL Corpus Collection". ttg.uni-saarland.de. Retrieved 22 September 2017.
"Urban Dictionary Words and Definitions".
H. Elsahar, P. Vougiouklis, A. Remaci, C. Gravier, J. Hare, F. Laforest, E. Simperl, "T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples", Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018).
Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.
"Computers Are Learning to Read—But They're Still Not So Smart". Wired. Retrieved 29 December 2019.
Quan, Hoang Lam; Quang, Duy Le; Van Kiet, Nguyen; Ngan, Luu-Thuy Nguyen. "UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning".
To, Quoc Huy; Nguyen, Van Kiet; Nguyen, Luu Thuy Ngan; Nguyen, Gia Tuan Anh. "Gender Prediction Based on Vietnamese Names with Machine Learning Techniques" (PDF).
M. Versteegh, R. Thiollière, T. Schatz, X.-N. Cao, X. Anguera, A. Jansen, and E. Dupoux (2015). "The Zero Resource Speech Challenge 2015," in INTERSPEECH-2015.
M. Versteegh, X. Anguera, A. Jansen, and E. Dupoux, (2016). "The Zero Resource Speech Challenge 2015: Proposed Approaches and Results," in SLTU-2016.
Sakar, Betul Erdogdu; et al. (2013). "Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings". IEEE Journal of Biomedical and Health Informatics. 17 (4): 828–834. doi:10.1109/jbhi.2013.2245674. PMID 25055311. S2CID 15491516.
Zhao, Shunan, et al. "Automatic detection of expressed emotion in Parkinson's disease." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
Used in: Hammami, Nacereddine, and Mouldi Bedda. "Improved tree model for Arabic speech recognition." Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on. Vol. 5. IEEE, 2010.
Maaten, Laurens. "Learning discriminative fisher kernels." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
Cole, Ronald, and Mark Fanty. "Spoken letter recognition." Proc. Third DARPA Speech and Natural Language Workshop. 1990.
Chapelle, Olivier; Sindhwani, Vikas; Keerthi, Sathiya S. (2008). "Optimization techniques for semi-supervised support vector machines" (PDF). The Journal of Machine Learning Research. 9: 203–233.
Kudo, Mineichi; Toyama, Jun; Shimbo, Masaru (1999). "Multidimensional curve classification using passing-through regions". Pattern Recognition Letters. 20 (11): 1103–1111. CiteSeerX 10.1.1.46.2515. doi:10.1016/s0167-8655(99)00077-x.
Jaeger, Herbert; et al. (2007). "Optimization and applications of echo state networks with leaky-integrator neurons". Neural Networks. 20 (3): 335–352. doi:10.1016/j.neunet.2007.04.016. PMID 17517495.
Tsanas, Athanasios; et al. (2010). "Accurate telemonitoring of Parkinson's disease progression by noninvasive speech tests". IEEE Transactions on Biomedical Engineering (Submitted manuscript). 57 (4): 884–893. doi:10.1109/tbme.2009.2036000. PMID 19932995. S2CID 7382779.
Clifford, Gari D.; Clifton, David (2012). "Wireless technology in disease management and medicine". Annual Review of Medicine. 63: 479–492. doi:10.1146/annurev-med-051210-114650. PMID 22053737.
Zue, Victor; Seneff, Stephanie; Glass, James (1990). "Speech database development at MIT: TIMIT and beyond". Speech Communication. 9 (4): 351–356. doi:10.1016/0167-6393(90)90010-7.
Kapadia, Sadik, Valtcho Valtchev, and S. J. Young. "MMI training for continuous phoneme recognition on the TIMIT database." Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on. Vol. 2. IEEE, 1993.
Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech Synthesis (PDF) (PhD Thesis). University of Southampton, School of Electronics and Computer Science.
Ardila, Rosana; Branson, Megan; Davis, Kelly; Henretty, Michael; Kohler, Michael; Meyer, Josh; Morais, Reuben; Saunders, Lindsay; Tyers, Francis M.; Weber, Gregor (13 December 2019). "Common Voice: A Massively-Multilingual Speech Corpus". arXiv:1912.06670v2 [cs.CL].
Zhou, Fang, Q. Claire, and Ross D. King. "Predicting the geographical origin of music." Data Mining (ICDM), 2014 IEEE International Conference on. IEEE, 2014.
Saccenti, Edoardo; Camacho, José (2015). "On the use of the observation‐wise k‐fold operation in PCA cross‐validation". Journal of Chemometrics. 29 (8): 467–478. doi:10.1002/cem.2726. hdl:10481/55302. S2CID 62248957.
Bertin-Mahieux, Thierry, et al. "The million song dataset." ISMIR 2011: Proceedings of the 12th International Society for Music Information Retrieval Conference, 24–28 October 2011, Miami, Florida. University of Miami, 2011.
Henaff, Mikael; et al. (2011). "Unsupervised learning of sparse features for scalable audio classification" (PDF). ISMIR. 11.
Rafii, Zafar (2017). "Music". MUSDB18 - a corpus for music separation. doi:10.5281/zenodo.1117372.
Defferrard, Michaël; Benzi, Kirell; Vandergheynst, Pierre; Bresson, Xavier (6 December 2016). "FMA: A Dataset For Music Analysis". arXiv:1612.01840 [cs.SD].
Esposito, Roberto; Radicioni, Daniele P. (2009). "Carpediem: Optimizing the viterbi algorithm and applications to supervised sequential learning" (PDF). The Journal of Machine Learning Research. 10: 1851–1880.
Sourati, Jamshid; et al. (2016). "Classification Active Learning Based on Mutual Information". Entropy. 18 (2): 51. Bibcode:2016Entrp..18...51S. doi:10.3390/e18020051.
Salamon, Justin; Jacoby, Christopher; Bello, Juan Pablo. "A dataset and taxonomy for urban sound research." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.
Lagrange, Mathieu; Lafay, Grégoire; Rossignol, Mathias; Benetos, Emmanouil; Roebel, Axel (2015). "An evaluation framework for event detection using a morphological model of acoustic scenes". arXiv:1502.00141 [stat.ML].
Gemmeke, Jort F., et al. "Audio Set: An ontology and human-labeled dataset for audio events." IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2017.
"Watch out, birders: Artificial intelligence has learned to spot birds from their songs". Science | AAAS. 18 July 2018. Retrieved 22 July 2018.
"Bird Audio Detection challenge". Machine Listening Lab at Queen Mary University. 3 May 2016. Retrieved 22 July 2018.
Wichern, G., et al. "WHAM!: Extending Speech Separation to Noisy Environments", Interspeech, 2019, https://arxiv.org/abs/1907.01160
Drossos, K., Lipping, S., and Virtanen, T. "Clotho: An Audio Captioning Dataset" IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2020.
Drossos, K., Lipping, S., and Virtanen, T. (2019). Clotho dataset (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3490684
The CAIDA UCSD Dataset on the Witty Worm – 19–24 March 2004, http://www.caida.org/data/passive/witty_worm_dataset.xml
Chen, Zesheng, and Chuanyi Ji. "Optimal worm-scanning method using vulnerable-host distributions." International Journal of Security and Networks 2.1–2 (2007): 71–80.
Kachuee, Mohamad, et al. "Cuff-less high-accuracy calibration-free blood pressure estimation using pulse transit time." Circuits and Systems (ISCAS), 2015 IEEE International Symposium on. IEEE, 2015.
PhysioBank, PhysioToolkit. "PhysioNet: components of a new research resource for complex physiologic signals." Circulation. v101 i23. e215-e220.
Vergara, Alexander; et al. (2012). "Chemical gas sensor drift compensation using classifier ensembles". Sensors and Actuators B: Chemical. 166: 320–329. doi:10.1016/j.snb.2012.01.074.
Korotcenkov, G.; Cho, B. K. (2014). "Engineering approaches to improvement of conductometric gas sensor parameters. Part 2: Decrease of dissipated (consumable) power and improvement stability and reliability". Sensors and Actuators B: Chemical. 198: 316–341. doi:10.1016/j.snb.2014.03.069.
Quinlan, John R (1992). "Learning with continuous classes" (PDF). 5th Australian Joint Conference on Artificial Intelligence. 92.
Merz, Christopher J.; Pazzani, Michael J. (1999). "A principal components approach to combining regression estimates". Machine Learning. 36 (1–2): 9–32. doi:10.1023/a:1007507221352.
Torres-Sospedra, Joaquin, et al. "UJIIndoorLoc-Mag: A new database for magnetic field-based localization problems." Indoor Positioning and Indoor Navigation (IPIN), 2015 International Conference on. IEEE, 2015.
Berkvens, Rafael, Maarten Weyn, and Herbert Peremans. "Mean Mutual Information of Probabilistic Wi-Fi Localization." Indoor Positioning and Indoor Navigation (IPIN), 2015 International Conference on. Banff, Canada: IPIN. 2015.
Paschke, Fabian, et al. "Sensorlose Zustandsüberwachung an Synchronmotoren." Proceedings. 23. Workshop Computational Intelligence, Dortmund, 5.-6. Dezember 2013. KIT Scientific Publishing, 2013.
Lessmeier, Christian, et al. "Data Acquisition and Signal Analysis from Measured Motor Currents for Defect Detection in Electromechanical Drive Systems."
Ugulino, Wallace, et al. "Wearable computing: Accelerometers’ data classification of body postures and movements." Advances in Artificial Intelligence-SBIA 2012. Springer Berlin Heidelberg, 2012. 52–61.
Schneider, Jan; et al. (2015). "Augmenting the senses: a review on sensor-based learning support". Sensors. 15 (2): 4097–4133. doi:10.3390/s150204097. PMC 4367401. PMID 25679313.
Madeo, Renata CB, Clodoaldo AM Lima, and Sarajane M. Peres. "Gesture unit segmentation using support vector machines: segmenting gestures from rest positions." Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM, 2013.
Lun, Roanna; Zhao, Wenbing (2015). "A survey of applications and human motion recognition with Microsoft Kinect". International Journal of Pattern Recognition and Artificial Intelligence. 29 (5): 1555008. doi:10.1142/s0218001415550083.
Theodoridis, Theodoros, and Huosheng Hu. "Action classification of 3d human models using dynamic ANNs for mobile robot surveillance." Robotics and Biomimetics, 2007. ROBIO 2007. IEEE International Conference on. IEEE, 2007.
Etemad, Seyed Ali, and Ali Arya. "3D human action recognition and style transformation using resilient backpropagation neural networks." Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on. Vol. 4. IEEE, 2009.
Altun, Kerem; Barshan, Billur; Tunçel, Orkun (2010). "Comparative study on classifying human activities with miniature inertial and magnetic sensors". Pattern Recognition. 43 (10): 3605–3620. doi:10.1016/j.patcog.2010.04.019. hdl:11693/11947.
Nathan, Ran; et al. (2012). "Using tri-axial acceleration data to identify behavioral modes of free-ranging animals: general concepts and tools illustrated for griffon vultures". The Journal of Experimental Biology. 215 (6): 986–996. doi:10.1242/jeb.058602. PMC 3284320. PMID 22357592.
Anguita, Davide, et al. "Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine." Ambient assisted living and home care. Springer Berlin Heidelberg, 2012. 216–223.
Su, Xing; Tong, Hanghang; Ji, Ping (2014). "Activity recognition with smartphone sensors". Tsinghua Science and Technology. 19 (3): 235–249. doi:10.1109/tst.2014.6838194.
Kadous, Mohammed Waleed. Temporal classification: Extending the classification paradigm to multivariate time series. Diss. The University of New South Wales, 2002.
Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd international conference on Machine learning. ACM, 2006.
Velloso, Eduardo, et al. "Qualitative activity recognition of weight lifting exercises." Proceedings of the 4th Augmented Human International Conference. ACM, 2013.
Mortazavi, Bobak Jack, et al. "Determining the single best axis for exercise repetition recognition and counting on smartwatches." Wearable and Implantable Body Sensor Networks (BSN), 2014 11th International Conference on. IEEE, 2014.
Sapsanis, Christos, et al. "Improving EMG based Classification of basic hand movements using EMD." Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE. IEEE, 2013.
Andrianesis, Konstantinos; Tzes, Anthony (2015). "Development and control of a multifunctional prosthetic hand with shape memory alloy actuators". Journal of Intelligent & Robotic Systems. 78 (2): 257–289. doi:10.1007/s10846-014-0061-6. S2CID 207174078.
Banos, Oresti; et al. (2014). "Dealing with the effects of sensor displacement in wearable activity recognition". Sensors. 14 (6): 9995–10023. doi:10.3390/s140609995. PMC 4118358. PMID 24915181.
Stisen, Allan, et al. "Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition." Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems. ACM, 2015.
Bhattacharya, Sourav, and Nicholas D. Lane. "From Smart to Deep: Robust Activity Recognition on Smartwatches using Deep Learning."
Bacciu, Davide; et al. (2014). "An experimental characterization of reservoir computing in ambient assisted living applications". Neural Computing and Applications. 24 (6): 1451–1464. doi:10.1007/s00521-013-1364-4. hdl:11568/237959. S2CID 14124013.
Palumbo, Filippo; Barsocchi, Paolo; Gallicchio, Claudio; Chessa, Stefano; Micheli, Alessio (2013). "Multisensor Data Fusion for Activity Recognition Based on Reservoir Computing". Evaluating AAL Systems Through Competitive Benchmarking. Communications in Computer and Information Science. 386. pp. 24–35. doi:10.1007/978-3-642-41043-7_3. ISBN 978-3-642-41042-0.
Reiss, Attila, and Didier Stricker. "Introducing a new benchmarked dataset for activity monitoring." Wearable Computers (ISWC), 2012 16th International Symposium on. IEEE, 2012.
Roggen, Daniel, et al. "OPPORTUNITY: Towards opportunistic activity and context recognition systems." World of Wireless, Mobile and Multimedia Networks & Workshops, 2009. WoWMoM 2009. IEEE International Symposium on a. IEEE, 2009.
Kurz, Marc, et al. "Dynamic quantification of activity recognition capabilities in opportunistic systems." Vehicular Technology Conference (VTC Spring), 2011 IEEE 73rd. IEEE, 2011.
Sztyler, Timo, and Heiner Stuckenschmidt. "On-body localization of wearable devices: an investigation of position-aware activity recognition." Pervasive Computing and Communications (PerCom), 2016 IEEE International Conference on. IEEE, 2016.
Zhi, Ying Xuan; Lukasik, Michelle; Li, Michael H.; Dolatabadi, Elham; Wang, Rosalie H.; Taati, Babak (2018). "Automatic Detection of Compensation During Robotic Stroke Rehabilitation Therapy". IEEE Journal of Translational Engineering in Health and Medicine. 6: 2100107. doi:10.1109/JTEHM.2017.2780836. ISSN 2168-2372. PMC 5788403. PMID 29404226.
Dolatabadi, Elham; Zhi, Ying Xuan; Ye, Bing; Coahran, Marge; Lupinacci, Giorgia; Mihailidis, Alex; Wang, Rosalie; Taati, Babak (23 May 2017). The toronto rehab stroke pose dataset to detect compensation during stroke rehabilitation therapy. ACM. pp. 375–381. doi:10.1145/3154862.3154925. ISBN 9781450363631. S2CID 24581930.
"Toronto Rehab Stroke Pose Dataset".
Jung, Merel M.; Poel, Mannes; Poppe, Ronald; Heylen, Dirk K. J. (1 March 2017). "Automatic recognition of touch gestures in the corpus of social touch". Journal on Multimodal User Interfaces. 11 (1): 81–96. doi:10.1007/s12193-016-0232-9. ISSN 1783-8738. S2CID 1802116.
Jung, M.M. (Merel) (1 June 2016). "Corpus of Social Touch (CoST)". University of Twente. doi:10.4121/uuid:5ef62345-3b3e-479c-8e1d-c922748c9b29.
Aeberhard, S., D. Coomans, and O. De Vel. "Comparison of classifiers in high dimensional settings." Dept. Math. Statist., James Cook Univ., North Queensland, Australia, Tech. Rep 92-02 (1992).
Basu, Sugato. "Semi-supervised clustering with limited background knowledge." AAAI. 2004.
Tüfekci, Pınar (2014). "Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods". International Journal of Electrical Power & Energy Systems. 60: 126–140. doi:10.1016/j.ijepes.2014.02.027.
Kaya, Heysem, Pınar Tüfekci, and Fikret S. Gürgen. "Local and global learning methods for predicting power of a combined gas & steam turbine." International conference on emerging trends in computer and electronics engineering (ICETCEE'2012), Dubai. 2012.
Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2014). "Searching for exotic particles in high-energy physics with deep learning". Nature Communications. 5: 4308. arXiv:1402.4735. Bibcode:2014NatCo...5.4308B. doi:10.1038/ncomms5308. PMID 24986233. S2CID 195953.
Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2015). "Enhanced Higgs Boson to τ+ τ− Search with Deep Learning". Physical Review Letters. 114 (11): 111801. arXiv:1410.3469. Bibcode:2015PhRvL.114k1801B. doi:10.1103/physrevlett.114.111801. PMID 25839260. S2CID 2339142.
Adam-Bourdarios, C.; Cowan, G.; Germain-Renaud, C.; Guyon, I.; Kégl, B.; Rousseau, D. (2015). "The Higgs Machine Learning Challenge". Journal of Physics Conference Series. 664 (7): 072015. Bibcode:2015JPhCS.664g2015A. doi:10.1088/1742-6596/664/7/072015.
Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, and Daniel Whiteson. 'Parameterized Machine Learning for High-Energy Physics.' In submission.
Ortigosa, I.; Lopez, R.; Garcia, J. "A neural networks approach to residuary resistance of sailing yachts prediction". Proceedings of the International Conference on Marine Engineering MARINE. 2007.
Gerritsma, J., R. Onnink, and A. Versluis. Geometry, resistance and stability of the delft systematic yacht hull series. Delft University of Technology, 1981.
Liu, Huan, and Hiroshi Motoda. Feature extraction, construction and selection: A data mining perspective. Springer Science & Business Media, 1998.
Reich, Yoram. Converging to Ideal Design Knowledge by Learning. [Carnegie Mellon University], Engineering Design Research Center, 1989.
Todorovski, Ljupčo; Džeroski, Sašo (1999). "Experiments in Meta-level Learning with ILP". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. 1704. pp. 98–106. doi:10.1007/978-3-540-48247-5_11. ISBN 978-3-540-66490-1.
Wang, Yong. A new approach to fitting linear models in high dimensional spaces. Diss. The University of Waikato, 2000.
Kibler, Dennis; Aha, David W.; Albert, Marc K. (1989). "Instance‐based prediction of real‐valued attributes". Computational Intelligence. 5 (2): 51–57. doi:10.1111/j.1467-8640.1989.tb00315.x. S2CID 40800413.
Palmer, Christopher R., and Christos Faloutsos. "Electricity based external similarity of categorical attributes." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2003. 486–500.
Tsanas, Athanasios; Xifara, Angeliki (2012). "Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools". Energy and Buildings. 49: 560–567. doi:10.1016/j.enbuild.2012.03.003.
De Wilde, Pieter (2014). "The gap between predicted and measured energy performance of buildings: A framework for investigation". Automation in Construction. 41: 40–49. doi:10.1016/j.autcon.2014.02.009.
Brooks, Thomas F., D. Stuart Pope, and Michael A. Marcolini. Airfoil self-noise and prediction. Vol. 1218. National Aeronautics and Space Administration, Office of Management, Scientific and Technical Information Division, 1989.
Draper, David. "Assessment and propagation of model uncertainty." Journal of the Royal Statistical Society, Series B (Methodological) (1995): 45–97.
Lavine, Michael (1991). "Problems in extrapolation illustrated with space shuttle O-ring data". Journal of the American Statistical Association. 86 (416): 919–921. doi:10.1080/01621459.1991.10475132.
Wang, Jun, Bei Yu, and Les Gasser. "Concept tree based clustering visualization with shaded similarity matrices." Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on. IEEE, 2002.
Pettengill, Gordon H., et al. "Magellan: Radar performance and data products." Science 252.5003 (1991): 260–265.
Aharonian, F.; et al. (2008). "Energy spectrum of cosmic-ray electrons at TeV energies". Physical Review Letters. 101 (26): 261104. arXiv:0811.3894. Bibcode:2008PhRvL.101z1104A. doi:10.1103/PhysRevLett.101.261104. hdl:2440/51450. PMID 19437632. S2CID 41850528.
Bock, R. K.; et al. (2004). "Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope". Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment. 516 (2): 511–528. Bibcode:2004NIMPA.516..511B. doi:10.1016/j.nima.2003.08.157.
Li, Jinyan; et al. (2004). "Deeps: A new instance-based lazy discovery and classification system". Machine Learning. 54 (2): 99–124. doi:10.1023/b:mach.0000011804.08528.7d.
Siebert, Lee, and Tom Simkin. "Volcanoes of the world: an illustrated catalog of Holocene volcanoes and their eruptions." (2014).
Sikora, Marek; Wróbel, Łukasz (2010). "Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines". Archives of Mining Sciences. 55 (1): 91–114.
Sikora, Marek, and Beata Sikora. "Rough natural hazards monitoring." Rough Sets: Selected Methods and Applications in Management and Engineering. Springer London, 2012. 163–179.
Yeh, I–C (1998). "Modeling of strength of high-performance concrete using artificial neural networks". Cement and Concrete Research. 28 (12): 1797–1808. doi:10.1016/s0008-8846(98)00165-3.
Zarandi, MH Fazel; et al. (2008). "Fuzzy polynomial neural networks for approximation of the compressive strength of concrete". Applied Soft Computing. 8 (1): 488–498. Bibcode:2008ApSoC...8...79S. doi:10.1016/j.asoc.2007.02.010.
Yeh, I. "Modeling slump of concrete with fly ash and superplasticizer." Computers and Concrete 5.6 (2008): 559–572.
Gencel, Osman; et al. (2011). "Comparison of artificial neural networks and general linear model approaches for the analysis of abrasive wear of concrete". Construction and Building Materials. 25 (8): 3486–3494. doi:10.1016/j.conbuildmat.2011.03.040.
Dietterich, Thomas G., et al. "A comparison of dynamic reposing and tangent distance for drug activity prediction." Advances in Neural Information Processing Systems (1994): 216–216.
Buscema, Massimo, William J. Tastle, and Stefano Terzi. "Meta net: A new meta-classifier family." Data Mining Applications Using Artificial Adaptive Systems. Springer New York, 2013. 141–182.
Ingber, Lester (1997). "Statistical mechanics of neocortical interactions: Canonical momenta indicators of electroencephalography". Physical Review E. 55 (4): 4578–4593. arXiv:physics/0001052. Bibcode:1997PhRvE..55.4578I. doi:10.1103/PhysRevE.55.4578. S2CID 6390999.
Hoffmann, Ulrich; Vesin, Jean-Marc; Ebrahimi, Touradj; Diserens, Karin (2008). "An efficient P300-based brain–computer interface for disabled subjects". Journal of Neuroscience Methods. 167 (1): 115–125. CiteSeerX 10.1.1.352.4630. doi:10.1016/j.jneumeth.2007.03.005. PMID 17445904. S2CID 9648828.
Donchin, Emanuel; Spencer, Kevin M.; Wijesinghe, Ranjith (2000). "The mental prosthesis: assessing the speed of a P300-based brain-computer interface". IEEE Transactions on Rehabilitation Engineering. 8 (2): 174–179. doi:10.1109/86.847808. PMID 10896179.
Detrano, Robert; et al. (1989). "International application of a new probability algorithm for the diagnosis of coronary artery disease". The American Journal of Cardiology. 64 (5): 304–310. doi:10.1016/0002-9149(89)90524-9. PMID 2756873.
Bradley, Andrew P (1997). "The use of the area under the ROC curve in the evaluation of machine learning algorithms" (PDF). Pattern Recognition. 30 (7): 1145–1159. doi:10.1016/s0031-3203(96)00142-2.
Street, W. N.; Wolberg, W. H.; Mangasarian, O. L. (1993). "Nuclear feature extraction for breast tumor diagnosis". In Acharya, Raj S; Goldgof, Dmitry B (eds.). Biomedical Image Processing and Biomedical Visualization. 1905. pp. 861–870. doi:10.1117/12.148698. S2CID 14922543.
Demir, Cigdem, and Bülent Yener. "Automated cancer diagnosis based on histopathological images: a systematic survey." Rensselaer Polytechnic Institute, Tech. Rep (2005).
Substance Abuse and Mental Health Services Administration. "Results from the 2010 National Survey on Drug Use and Health: Summary of National Findings, NSDUH Series H-41, HHS Publication No. (SMA) 11-4658." Rockville, MD: Substance Abuse and Mental Health Services Administration (2011).
Hong, Zi-Quan; Yang, Jing-Yu (1991). "Optimal discriminant plane for a small number of samples and design method of classifier on the plane". Pattern Recognition. 24 (4): 317–324. doi:10.1016/0031-3203(91)90074-f.
Li, Jinyan, and Limsoon Wong. "Using rules to analyse bio-medical data: a comparison between C4.5 and PCL." Advances in Web-Age Information Management. Springer Berlin Heidelberg, 2003. 254–265.
Güvenir, H. Altay, et al. "A supervised machine learning algorithm for arrhythmia analysis." Computers in Cardiology 1997. IEEE, 1997.
Lagus, Krista, et al. "Independent variable group analysis in learning compact representations for data." Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'05), T. Honkela, V. Könönen, M. Pöllä, and O. Simula, Eds., Espoo, Finland. 2005.
Strack, Beata, et al. "Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records." BioMed Research International 2014; 2014
Rubin, Daniel J (2015). "Hospital readmission of patients with diabetes". Current Diabetes Reports. 15 (4): 1–9. doi:10.1007/s11892-015-0584-7. PMID 25712258. S2CID 3908599.
Antal, Bálint; Hajdu, András (2014). "An ensemble-based system for automatic screening of diabetic retinopathy". Knowledge-Based Systems. 60 (2014): 20–27. arXiv:1410.8576. Bibcode:2014arXiv1410.8576A. doi:10.1016/j.knosys.2013.12.023. S2CID 13984326.
Haloi, Mrinal (2015). "Improved Microaneurysm Detection using Deep Neural Networks". arXiv:1505.04424 [cs.CV].
Patry, Guillaume; Gauthier, Gervais; Lay, Bruno; Roger, Julien; Elie, Damien. "ADCIS Download Third Party: Messidor Database". adcis.net. Retrieved 25 February 2018.
Decencière, Etienne; Zhang, Xiwei; Cazuguel, Guy; Lay, Bruno; Cochener, Béatrice; Trone, Caroline; Gain, Philippe; Ordonez, Richard; Massin, Pascale (26 August 2014). "Feedback on a Publicly Distributed Image Database: The Messidor Database". Image Analysis & Stereology. 33 (3): 231–234. doi:10.5566/ias.1155. ISSN 1854-5165.
Bagirov, A. M.; et al. (2003). "Unsupervised and supervised data classification via nonsmooth and global optimization". Top. 11 (1): 1–75. CiteSeerX 10.1.1.1.6429. doi:10.1007/bf02578945. S2CID 14165678.
Fung, Glenn, et al. "A fast iterative algorithm for fisher discriminant using heterogeneous kernels." Proceedings of the twenty-first international conference on Machine learning. ACM, 2004.
Quinlan, John Ross, et al. "Inductive knowledge acquisition: a case study." Proceedings of the Second Australian Conference on Applications of expert systems. Addison-Wesley Longman Publishing Co., Inc., 1987.
Zhou, Zhi-Hua; Jiang, Yuan (2004). "NeC4.5: neural ensemble based C4.5". IEEE Transactions on Knowledge and Data Engineering. 16 (6): 770–773. CiteSeerX 10.1.1.1.8430. doi:10.1109/tkde.2004.11. S2CID 1024861.
Er, Orhan; et al. (2012). "An approach based on probabilistic neural network for diagnosis of Mesothelioma's disease". Computers & Electrical Engineering. 38 (1): 75–81. doi:10.1016/j.compeleceng.2011.09.001.
Er, Orhan, A. Çetin Tanrikulu, and Abdurrahman Abakay. "Use of artificial intelligence techniques for diagnosis of malignant pleural mesothelioma." Dicle Tıp Dergisi 42.1 (2015).
Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (25 July 2017). "Vision-Based Assessment of Parkinsonism and Levodopa-Induced Dyskinesia with Deep Learning Pose Estimation". Journal of Neuroengineering and Rehabilitation. 15 (1): 97. arXiv:1707.09416. Bibcode:2017arXiv170709416L. doi:10.1186/s12984-018-0446-z. PMC 6219082. PMID 30400914.
Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (May 2018). "Automated assessment of levodopa-induced dyskinesia: Evaluating the responsiveness of video-based features". Parkinsonism & Related Disorders. 53: 42–45. doi:10.1016/j.parkreldis.2018.04.036. ISSN 1353-8020. PMID 29748112.
"Parkinson's Vision-Based Pose Estimation Dataset | Kaggle". kaggle.com. Retrieved 22 August 2018.
Shannon, Paul; et al. (2003). "Cytoscape: a software environment for integrated models of biomolecular interaction networks". Genome Research. 13 (11): 2498–2504. doi:10.1101/gr.1239303. PMC 403769. PMID 14597658.
Javadi, Soroush; Mirroshandel, Seyed Abolghasem (2019). "A novel deep learning method for automatic assessment of human sperm images". Computers in Biology and Medicine. 109: 182–194. doi:10.1016/j.compbiomed.2019.04.030. ISSN 0010-4825. PMID 31059902.
"soroushj/mhsma-dataset: MHSMA: The Modified Human Sperm Morphology Analysis Dataset". github.com. Retrieved 3 May 2019.
Clark, David, Zoltan Schreter, and Anthony Adams. "A quantitative comparison of dystal and backpropagation." Proceedings of 1996 Australian Conference on Neural Networks. 1996.
Jiang, Yuan, and Zhi-Hua Zhou. "Editing training data for kNN classifiers with neural network ensemble." Advances in Neural Networks–ISNN 2004. Springer Berlin Heidelberg, 2004. 356–361.
Ontañón, Santiago, and Enric Plaza. "On similarity measures based on a refinement lattice." Case-Based Reasoning Research and Development. Springer Berlin Heidelberg, 2009. 240–255.
Higuera, Clara; Gardiner, Katheleen J.; Cios, Krzysztof J. (2015). "Self-organizing feature maps identify proteins critical to learning in a mouse model of down syndrome". PLOS ONE. 10 (6): e0129126. Bibcode:2015PLoSO..1029126H. doi:10.1371/journal.pone.0129126. PMC 4482027. PMID 26111164.
Ahmed, Md Mahiuddin; et al. (2015). "Protein dynamics associated with failed and rescued learning in the Ts65Dn mouse model of Down syndrome". PLOS ONE. 10 (3): e0119491. Bibcode:2015PLoSO..1019491A. doi:10.1371/journal.pone.0119491. PMC 4368539. PMID 25793384.
Cortez, Paulo, and Aníbal de Jesus Raimundo Morais. "A data mining approach to predict forest fires using meteorological data." (2007).
Farquad, M. A. H.; Ravi, V.; Raju, S. Bapi (2010). "Support vector regression based hybrid rule extraction methods for forecasting". Expert Systems with Applications. 37 (8): 5577–5589. doi:10.1016/j.eswa.2010.02.055.
Fisher, Ronald A (1936). "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x. hdl:2440/15227.
Ghahramani, Zoubin, and Michael I. Jordan. "Supervised learning from incomplete data via an EM approach." Advances in neural information processing systems 6. 1994.
Mallah, Charles; Cope, James; Orwell, James (2013). "Plant leaf classification using probabilistic integration of shape, texture and margin features". Signal Processing, Pattern Recognition and Applications. 5: 1.
Yahiaoui, Itheri, Olfa Mzoughi, and Nozha Boujemaa. "Leaf shape descriptor for tree species identification." Multimedia and Expo (ICME), 2012 IEEE International Conference on. IEEE, 2012.
Langley, PAT (2014). "Trading off simplicity and coverage in incremental concept learning" (PDF). Machine Learning Proceedings. 1988: 73.
Tan, Ming, and Larry Eshelman. "Using weighted networks to represent classification knowledge in noisy domains." Proceedings of the Fifth International Conference on Machine Learning. 2014.
Charytanowicz, Małgorzata, et al. "Complete gradient clustering algorithm for features analysis of x-ray images." Information technologies in biomedicine. Springer Berlin Heidelberg, 2010. 15–24.
Sanchez, Mauricio A.; et al. (2014). "Fuzzy granular gravitational clustering algorithm for multivariate data". Information Sciences. 279: 498–511. doi:10.1016/j.ins.2014.04.005.
Blackard, Jock A.; Dean, Denis J. (1999). "Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables". Computers and Electronics in Agriculture. 24 (3): 131–151. CiteSeerX 10.1.1.128.2475. doi:10.1016/s0168-1699(99)00046-0.
Fürnkranz, Johannes. "Round robin rule learning." Proceedings of the 18th International Conference on Machine Learning (ICML-01): 146–153. 2001.
Li, Song; Assmann, Sarah M.; Albert, Réka (2006). "Predicting essential components of signal transduction networks: a dynamic model of guard cell abscisic acid signaling". PLOS Biol. 4 (10): e312. arXiv:q-bio/0610012. Bibcode:2006q.bio....10012L. doi:10.1371/journal.pbio.0040312. PMC 1564158. PMID 16968132.
Munisami, Trishen; et al. (2015). "Plant Leaf Recognition Using Shape Features and Colour Histogram with K-nearest Neighbour Classifiers". Procedia Computer Science. 58: 740–747. doi:10.1016/j.procs.2015.08.095.
Li, Bai (2016). "Atomic potential matching: An evolutionary target recognition approach based on edge features". Optik-International Journal for Light and Electron Optics. 127 (5): 3162–3168. Bibcode:2016Optik.127.3162L. doi:10.1016/j.ijleo.2015.11.186.
Nilsback, Maria-Elena, and Andrew Zisserman. "A visual vocabulary for flower classification." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.
Giselsson, Thomas M.; et al. (2017). "A Public Image Database for Benchmark of Plant Seedling Classification Algorithms". arXiv:1711.05458 [cs.CV].
Muresan, Horea; Oltean, Mihai (2018). "Fruit recognition from images using deep learning". Acta Univ. Sapientiae, Informatica. 10 (1): 26–42. doi:10.2478/ausi-2018-0002.
Oltean, Mihai; Muresan, Horea (2017). "A dataset with fruit images on Kaggle".
Nakai, Kenta; Kanehisa, Minoru (1991). "Expert system for predicting protein localization sites in gram‐negative bacteria". Proteins: Structure, Function, and Bioinformatics. 11 (2): 95–110. doi:10.1002/prot.340110203. PMID 1946347. S2CID 27606447.
Ling, Charles X., et al. "Decision trees with minimal costs." Proceedings of the twenty-first international conference on Machine learning. ACM, 2004.
Mahé, Pierre, et al. "Automatic identification of mixed bacterial species fingerprints in a MALDI-TOF mass-spectrum." Bioinformatics (2014): btu022.
Barbano, Duane; et al. (2015). "Rapid characterization of microalgae and microalgae mixtures using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS)". PLOS ONE. 10 (8): e0135337. Bibcode:2015PLoSO..1035337B. doi:10.1371/journal.pone.0135337. PMC 4536233. PMID 26271045.
Horton, Paul; Nakai, Kenta (1996). "A probabilistic classification system for predicting the cellular localization sites of proteins" (PDF). ISMB-96 Proceedings. 4: 109–15. PMID 8877510.
Allwein, Erin L.; Schapire, Robert E.; Singer, Yoram (2001). "Reducing multiclass to binary: A unifying approach for margin classifiers" (PDF). The Journal of Machine Learning Research. 1: 113–141.
Mayr, Andreas; Klambauer, Guenter; Unterthiner, Thomas; Hochreiter, Sepp (2016). "DeepTox: Toxicity Prediction Using Deep Learning". Frontiers in Environmental Science. 3: 80. doi:10.3389/fenvs.2015.00080.
Lavin, Alexander; Ahmad, Subutai (12 October 2015). Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark. p. 38. arXiv:1510.03336. doi:10.1109/ICMLA.2015.141. ISBN 978-1-5090-0287-0. S2CID 6842305.
Iurii D. Katser; Vyacheslav O. Kozitsin. "SKAB GitHub repository". Retrieved 12 January 2021.
Iurii D. Katser; Vyacheslav O. Kozitsin (2020). "Skoltech Anomaly Benchmark (SKAB)". Kaggle. doi:10.34740/KAGGLE/DSV/1693952. Retrieved 12 January 2021.
Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello, Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E. (2016). "On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study". Data Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-015-0444-8. ISSN 1384-5810. S2CID 1952214.
Ann-Kathrin Hartmann, Tommaso Soru, Edgard Marx. Generating a Large Dataset for Neural Question Answering over the DBpedia Knowledge Base. 2018.
Tommaso Soru, Edgard Marx, Diego Moussallem, Andre Valdestilhas, Diego Esteves, Ciro Baron. SPARQL as a Foreign Language. 2018.
Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen. A Vietnamese Dataset for Evaluating Machine Reading Comprehension. COLING 2020.
Kiet Van Nguyen, Khiem Vinh Tran, Son T. Luu, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen. Enhancing Lexical-Based Approach With External Knowledge for Vietnamese Multiple-Choice Machine Reading Comprehension. IEEE Access. 2020.
Brown, Michael Scott, Michael J. Pelosi, and Henry Dirska. "Dynamic-radius species-conserving genetic algorithm for the financial forecasting of Dow Jones index stocks." Machine Learning and Data Mining in Pattern Recognition. Springer Berlin Heidelberg, 2013. 27–41.
Shen, Kao-Yi; Tzeng, Gwo-Hshiung (2015). "Fuzzy Inference-Enhanced VC-DRSA Model for Technical Analysis: Investment Decision Aid". International Journal of Fuzzy Systems. 17 (3): 375–389. doi:10.1007/s40815-015-0058-8. S2CID 68241024.
Quinlan, J. Ross (1987). "Simplifying decision trees". International Journal of Man-machine Studies. 27 (3): 221–234. CiteSeerX 10.1.1.18.4267. doi:10.1016/s0020-7373(87)80053-6.
Hamers, Bart; Suykens, Johan AK; De Moor, Bart (2003). "Coupled transductive ensemble learning of kernel models" (PDF). Journal of Machine Learning Research. 1: 1–48.
Shmueli, Galit, Ralph P. Russo, and Wolfgang Jank. "The BARISTA: a model for bid arrivals in online auctions." The Annals of Applied Statistics (2007): 412–441.
Peng, Jie, and Hans-Georg Müller. "Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions." The Annals of Applied Statistics (2008): 1056–1077.
Eggermont, Jeroen, Joost N. Kok, and Walter A. Kosters. "Genetic programming for data classification: Partitioning the search space." Proceedings of the 2004 ACM symposium on Applied computing. ACM, 2004.
Moro, Sérgio; Cortez, Paulo; Rita, Paulo (2014). "A data-driven approach to predict the success of bank telemarketing". Decision Support Systems. 62: 22–31. doi:10.1016/j.dss.2014.03.001. hdl:10071/9499.
Payne, Richard D.; Mallick, Bani K. (2014). "Bayesian Big Data Classification: A Review with Complements". arXiv:1411.5653 [stat.ME].
Akbilgic, Oguz; Bozdogan, Hamparsum; Balaban, M. Erdal (2014). "A novel Hybrid RBF Neural Networks model as a forecaster". Statistics and Computing. 24 (3): 365–375. doi:10.1007/s11222-013-9375-7. S2CID 17764829.
Jabin, Suraiya. "Stock market prediction using feed-forward artificial neural network." Int. J. Comput. Appl. (IJCA) 99.9 (2014).
Yeh, I-Cheng; Che-hui, Lien (2009). "The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients". Expert Systems with Applications. 36 (2): 2473–2480. doi:10.1016/j.eswa.2007.12.020.
Lin, Shu Ling (2009). "A new two-stage hybrid approach of credit risk in banking industry". Expert Systems with Applications. 36 (4): 8333–8341. doi:10.1016/j.eswa.2008.10.015.
Pelckmans, Kristiaan; et al. (2005). "The differogram: Non-parametric noise variance estimation and its use for model selection". Neurocomputing. 69 (1): 100–122. doi:10.1016/j.neucom.2005.02.015.
Bay, Stephen D.; et al. (2000). "The UCI KDD archive of large data sets for data mining research and experimentation". ACM SIGKDD Explorations Newsletter. 2 (2): 81–85. CiteSeerX 10.1.1.15.9776. doi:10.1145/380995.381030. S2CID 534881.
Lucas, D. D.; et al. (2015). "Designing optimal greenhouse gas observing networks that consider performance and cost". Geoscientific Instrumentation, Methods and Data Systems. 4 (1): 121. Bibcode:2015GI......4..121L. doi:10.5194/gi-4-121-2015.
Pales, Jack C.; Keeling, Charles D. (1965). "The concentration of atmospheric carbon dioxide in Hawaii". Journal of Geophysical Research. 70 (24): 6053–6076. Bibcode:1965JGR....70.6053P. doi:10.1029/jz070i024p06053.
Sigillito, Vincent G., et al. "Classification of radar returns from the ionosphere using neural networks." Johns Hopkins APL Technical Digest 10.3 (1989): 262–266.
Zhang, Kun, and Wei Fan. "Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond." Knowledge and Information Systems 14.3 (2008): 299–326.
Reich, Brian J., Montserrat Fuentes, and David B. Dunson. "Bayesian spatial quantile regression." Journal of the American Statistical Association (2012).
Kohavi, Ron (1996). "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid". KDD. 96.
Oza, Nikunj C., and Stuart Russell. "Experimental comparisons of online and batch versions of bagging and boosting." Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001.
Bay, Stephen D (2001). "Multivariate discretization for set mining". Knowledge and Information Systems. 3 (4): 491–512. CiteSeerX 10.1.1.217.921. doi:10.1007/pl00011680. S2CID 10945544.
Ruggles, Steven (1995). "Sample designs and sampling errors". Historical Methods: A Journal of Quantitative and Interdisciplinary History. 28 (1): 40–46. doi:10.1080/01615440.1995.9955312.
Meek, Christopher, Bo Thiesson, and David Heckerman. "The Learning Curve Method Applied to Clustering." AISTATS. 2001.
Fanaee-T, Hadi; Gama, Joao (2013). "Event labeling combining ensemble detectors and background knowledge". Progress in Artificial Intelligence. 2 (2–3): 113–127. doi:10.1007/s13748-013-0040-3. S2CID 3345087.
Giot, Romain, and Raphaël Cherrier. "Predicting bikeshare system usage up to one day ahead." Computational intelligence in vehicles and transportation systems (CIVTS), 2014 IEEE symposium on. IEEE, 2014.
Zhan, Xianyuan; et al. (2013). "Urban link travel time estimation using large-scale taxi data with partial information". Transportation Research Part C: Emerging Technologies. 33: 37–49. doi:10.1016/j.trc.2013.04.001.
Moreira-Matias, Luis; et al. (2013). "Predicting taxi–passenger demand using streaming data". IEEE Transactions on Intelligent Transportation Systems. 14 (3): 1393–1402. doi:10.1109/tits.2013.2262376. S2CID 14764358.
Hwang, Ren-Hung; Hsueh, Yu-Ling; Chen, Yu-Ting (2015). "An effective taxi recommender system based on a spatio-temporal factor analysis model". Information Sciences. 314: 28–40. doi:10.1016/j.ins.2015.03.068.
H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. Big data and its technical challenges. Commun. ACM, 57(7):86–94, July 2014.
http://pems.dot.ca.gov/
Meusel, Robert, et al. "The Graph Structure in the Web—Analyzed on Different Aggregation Levels." The Journal of Web Science 1.1 (2015).
Kushmerick, Nicholas. "Learning to remove internet advertisements." Proceedings of the third annual conference on Autonomous Agents. ACM, 1999.
Fradkin, Dmitriy, and David Madigan. "Experiments with random projections for machine learning." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.
This data was used in the American Statistical Association Statistical Graphics and Computing Sections 1999 Data Exposition.
Ma, Justin, et al. "Identifying suspicious URLs: an application of large-scale online learning." Proceedings of the 26th annual international conference on machine learning. ACM, 2009.
Levchenko, Kirill, et al. "Click trajectories: End-to-end analysis of the spam value chain." Security and Privacy (SP), 2011 IEEE Symposium on. IEEE, 2011.
Mohammad, Rami M., Fadi Thabtah, and Lee McCluskey. "An assessment of features related to phishing websites using an automated technique." Internet Technology And Secured Transactions, 2012 International Conference for. IEEE, 2012.
Singh, Ashishkumar, et al. "Clustering Experiments on Big Transaction Data for Market Segmentation." Proceedings of the 2014 International Conference on Big Data Science and Computing. ACM, 2014.
Bollacker, Kurt, et al. "Freebase: a collaboratively created graph database for structuring human knowledge." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
Mintz, Mike, et al. "Distant supervision for relation extraction without labeled data." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics, 2009.
Mesterharm, Chris, and Michael J. Pazzani. "Active learning using on-line algorithms." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.
Wang, Shusen; Zhang, Zhihua (2013). "Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling" (PDF). The Journal of Machine Learning Research. 14 (1): 2729–2769. arXiv:1303.4207. Bibcode:2013arXiv1303.4207W.
Cattral, Robert; Oppacher, Franz; Deugo, Dwight (2002). "Evolutionary data mining with automatic rule generalization" (PDF). Recent Advances in Computers, Computing and Communications: 296–300. S2CID 18625415.
Burton, Ariel N.; Kelly, Paul H.J. (2006). "Performance prediction of paging workloads using lightweight tracing". Future Generation Computer Systems. Elsevier BV. 22 (7): 784–793. doi:10.1016/j.future.2006.02.003. ISSN 0167-739X.
Bain, Michael; Muggleton, Stephen (1994). "Learning optimal chess strategies". Machine Intelligence. Oxford University Press, Inc. 13.
Quinlan, J. R. (1983). "Learning efficient classification procedures and their application to chess end games". Machine Learning: An Artificial Intelligence Approach. 1: 463–482. doi:10.1007/978-3-662-12405-5_15. ISBN 978-3-662-12407-9.
Shapiro, Alen D. (1987). Structured induction in expert systems. Addison-Wesley Longman Publishing Co., Inc.
Matheus, Christopher J.; Rendell, Larry A. (1989). "Constructive Induction on Decision Trees" (PDF). IJCAI. 89.
Belsley, David A., Edwin Kuh, and Roy E. Welsch. Regression diagnostics: Identifying influential data and sources of collinearity. Vol. 571. John Wiley & Sons, 2005.
Ruotsalo, Tuukka; Aroyo, Lora; Schreiber, Guus (2009). "Knowledge-based linguistic annotation of digital cultural heritage collections" (PDF). IEEE Intelligent Systems. 24 (2): 64–75. doi:10.1109/MIS.2009.32. S2CID 6667472.
Li, Lihong, et al. "Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms." Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011.
Yeung, Kam Fung, and Yanyan Yang. "A proactive personalized mobile news recommendation system." Developments in E-systems Engineering (DESE), 2010. IEEE, 2010.
Gass, Susan E.; Roberts, J. Murray (2006). "The occurrence of the cold-water coral Lophelia pertusa (Scleractinia) on oil and gas platforms in the North Sea: colony growth, recruitment and environmental controls on distribution". Marine Pollution Bulletin. 52 (5): 549–559. doi:10.1016/j.marpolbul.2005.10.002. PMID 16300800.
Gionis, Aristides; Mannila, Heikki; Tsaparas, Panayiotis (2007). "Clustering aggregation". ACM Transactions on Knowledge Discovery from Data. 1 (1): 4. CiteSeerX 10.1.1.709.528. doi:10.1145/1217299.1217303. S2CID 433708.
Obradovic, Zoran, and Slobodan Vucetic. Challenges in Scientific Data Mining: Heterogeneous, Biased, and Large Samples. Technical Report, Center for Information Science and Technology, Temple University, 2004.
Van Der Putten, Peter; van Someren, Maarten (2000). "CoIL challenge 2000: The insurance company case". Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report. 9: 1–43.
Mao, K. Z. (2002). "RBF neural network center selection based on Fisher ratio class separability measure". IEEE Transactions on Neural Networks. 13 (5): 1211–1217. doi:10.1109/tnn.2002.1031953. PMID 18244518.
Olave, Manuel; Rajkovic, Vladislav; Bohanec, Marko (1989). "An application for admission in public school systems" (PDF). Expert Systems in Public Administration. 1: 145–160.
Lizotte, Daniel J., Omid Madani, and Russell Greiner. "Budgeted learning of naive-Bayes classifiers." Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2002.
Lebowitz, Michael (1986). Concept learning in a rich input domain: Generalization-based memory. Machine Learning: An Artificial Intelligence Approach. 2. pp. 193–214. ISBN 9780934613002.
Yeh, I-Cheng; Yang, King-Jang; Ting, Tao-Ming (2009). "Knowledge discovery on RFM model using Bernoulli sequence". Expert Systems with Applications. 36 (3): 5866–5871. doi:10.1016/j.eswa.2008.07.018.
Lee, Wen-Chen; Cheng, Bor-Wen (2011). "An intelligent system for improving performance of blood donation". Journal of Quality. 18 (2): 173.
Schmidtmann, Irene, et al. "Evaluation des Krebsregisters NRW Schwerpunkt Record Linkage." Abschlußbericht vom 11 (2009).
Sariyar, Murat; Borg, Andreas; Pommerening, Klaus (2011). "Controlling false match rates in record linkage using extreme value theory". Journal of Biomedical Informatics. 44 (4): 648–654. doi:10.1016/j.jbi.2011.02.008. PMID 21352952.
Candillier, Laurent, and Vincent Lemaire. "Design and Analysis of the Nomao challenge Active Learning in the Real-World." Proceedings of the ALRA: Active Learning in Real-world Applications, Workshop ECML-PKDD. 2012.
Marquez, Ivan Garrido. "A Domain Adaptation Method for Text Classification based on Self-adjusted Training Approach." (2013).
Nagesh, Harsha S., Sanjay Goil, and Alok N. Choudhary. "Adaptive Grids for Clustering Massive Data Sets." SDM. 2001.
Kuzilek, Jakub, et al. "OU Analyse: analysing at-risk students at The Open University." Learning Analytics Review (2015): 1–16.
Siemens, George, et al. Open Learning Analytics: an integrated & modularized platform. Diss. Open University Press, 2011.
Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno (2015). "A multi-source dataset of urban life in the city of Milan and the Province of Trentino". Scientific Data. 2: 150055. Bibcode:2015NatSD...250055B. doi:10.1038/sdata.2015.55. ISSN 2052-4463. PMC 4622222. PMID 26528394.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2013). "OpenML: networked science in machine learning". SIGKDD Explorations. 15 (2): 49–60. arXiv:1407.7722. doi:10.1145/2641190.2641198. S2CID 4977460.
Olson RS, La Cava W, Orzechowski P, Urbanowicz RJ, Moore JH (2017). "PMLB: a large benchmark suite for machine learning evaluation and comparison". BioData Mining. 10: 36. arXiv:1703.00512. Bibcode:2017arXiv170300512O. doi:10.1186/s13040-017-0154-4. PMC 5725843. PMID 29238404.
"Off The Shelf Datasets". appen.com. Appen. Retrieved 30 December 2020.
"Open Source Datasets". appen.com. Appen. Retrieved 30 December 2020.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] Wissner-Gross, A. "Datasets Over Algorithms". Edge.com. Retrieved 8 January 2016.

[2] Weiss, G. M.; Provost, F. (1 September 2003). "Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction". Journal of Artificial Intelligence Research. AI Access Foundation. 19: 315–354. doi:10.1613/jair.1199. ISSN 1076-9757. S2CID 2344521.

[3] Turney, Peter (2000). "Types of cost in inductive concept learning". arXiv:cs/0212034.

[4] Abney, Steven (17 September 2007). Semisupervised Learning for Computational Linguistics. CRC Press. ISBN 978-1-4200-1080-0.

[5] Žliobaitė, Indrė; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoff (2011). "Active Learning with Evolving Streaming Data". Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 597–612. doi:10.1007/978-3-642-23808-6_39. ISBN 978-3-642-23807-9. ISSN 0302-9743.

[6] Zafeiriou, S.; Kollias, D.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Kotsia, I. (2017). "Aff-Wild: Valence and Arousal in-the-wild Challenge" (PDF). Computer Vision and Pattern Recognition Workshops (CVPRW), 2017: 1980–1987. doi:10.1109/CVPRW.2017.248. ISBN 978-1-5386-0733-6. S2CID 3107614.

[7] Kollias, D.; Tzirakis, P.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Schuller, B.; Kotsia, I.; Zafeiriou, S. (2019). "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond". International Journal of Computer Vision (IJCV), 2019. 127 (6–7): 907–929. doi:10.1007/s11263-019-01158-4. S2CID 13679040.

[8] Kollias, D.; Zafeiriou, S. (2019). "Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface" (PDF). British Machine Vision Conference (BMVC), 2019. arXiv:1910.04855.

[9] Kollias, D.; Schulc, A.; Hajiyev, E.; Zafeiriou, S. (2020). "Analysing affective behavior in the first abaw 2020 competition". IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2020. arXiv:2001.11409.

[10] Phillips, P. Jonathon; et al. (1998). "The FERET database and evaluation procedure for face-recognition algorithms". Image and Vision Computing. 16 (5): 295–306. doi:10.1016/s0262-8856(97)00070-x.

[11] Wiskott, Laurenz; et al. (1997). "Face recognition by elastic bunch graph matching". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (7): 775–779. CiteSeerX 10.1.1.44.2321. doi:10.1109/34.598235.

[12] Livingstone, Steven R.; Russo, Frank A. (2018). "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English". PLOS ONE. 13 (5): e0196391. Bibcode:2018PLoSO..1396391L. doi:10.1371/journal.pone.0196391. PMC 5955500. PMID 29768426.

[13] Livingstone, Steven R.; Russo, Frank A. (2018). "Emotion". The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). doi:10.5281/zenodo.1188976.

[14] Grgic, Mislav; Delac, Kresimir; Grgic, Sonja (2011). "SCface–surveillance cameras face database". Multimedia Tools and Applications. 51 (3): 863–879. doi:10.1007/s11042-009-0417-2. S2CID 207218990.

[15] Wallace, Roy, et al. "Inter-session variability modelling and joint factor analysis for face authentication." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011.

[16] Georghiades, A. "Yale face database". Center For Computational Vision And Control At Yale University, http://CVC.yale.edu/Projects/Yalefaces/Yalefa. 2: 1997.

[17] Nguyen, Duy; et al. (2006). "Real-time face detection and lip feature extraction using field-programmable gate arrays". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 36 (4): 902–912. CiteSeerX 10.1.1.156.9848. doi:10.1109/tsmcb.2005.862728. PMID 16903373. S2CID 7334355.

[18] Kanade, Takeo, Jeffrey F. Cohn, and Yingli Tian. "Comprehensive database for facial expression analysis." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.

[19] Zeng, Zhihong; et al. (2009). "A survey of affect recognition methods: Audio, visual, and spontaneous expressions". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (1): 39–58. CiteSeerX 10.1.1.144.217. doi:10.1109/tpami.2008.52. PMID 19029545.

[20] Lyons, Michael; Kamachi, Miyuki; Gyoba, Jiro (1998). "Facial expression images". The Japanese Female Facial Expression (JAFFE) Database. doi:10.5281/zenodo.3451524.

[21] Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro. "Coding facial expressions with Gabor wavelets." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998.

[22] Ng, Hong-Wei, and Stefan Winkler. "A data-driven approach to cleaning large face datasets." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.

[23] RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2015). "One-to-many face recognition with bilinear CNNs". arXiv:1506.01342 [cs.CV].

[24] Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio-and video-based biometric person authentication. Springer Berlin Heidelberg, 2001.

[25] Huang, Gary B., et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007.

[26] Bhatt, Rajen B., et al. "Efficient skin region segmentation using low complexity fuzzy decision tree model." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009.

[27] Lingala, Mounika; et al. (2014). "Fuzzy logic color detection: Blue areas in melanoma dermoscopy images". Computerized Medical Imaging and Graphics. 38 (5): 403–410. doi:10.1016/j.compmedimag.2014.03.007. PMC 4287461. PMID 24786720.

[28] Maes, Chris, et al. "Feature detection on 3D face surfaces for pose normalisation and recognition." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010.

[29] Savran, Arman, et al. "Bosphorus database for 3D face analysis." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56.

[30] Heseltine, Thomas, Nick Pears, and Jim Austin. "Three-dimensional face recognition: An eigensurface approach." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004.

[31] Ge, Yun; et al. (2011). "3D Novel Face Sample Modeling for Face Recognition". Journal of Multimedia. 6 (5): 467–475. CiteSeerX 10.1.1.461.9710. doi:10.4304/jmm.6.5.467-475.

[32] Wang, Yueming; Liu, Jianzhuang; Tang, Xiaoou (2010). "Robust 3D face recognition by local shape difference boosting". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (10): 1858–1870. CiteSeerX 10.1.1.471.2424. doi:10.1109/tpami.2009.200. PMID 20724762. S2CID 15263913.

[33] Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "Robust 3D face recognition using learned visual codebook." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.

[34] Zhao, G.; Huang, X.; Taini, M.; Li, S. Z.; Pietikäinen, M. (2011). "Facial expression recognition from near-infrared videos" (PDF). Image and Vision Computing. 29 (9): 607–619. doi:10.1016/j.imavis.2011.07.002.

[35] Soyel, Hamit, and Hasan Demirel. "Facial expression recognition using 3D facial feature distances." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838.

[36] Bowyer, Kevin W.; Chang, Kyong; Flynn, Patrick (2006). "A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition". Computer Vision and Image Understanding. 101 (1): 1–15. CiteSeerX 10.1.1.134.8784. doi:10.1016/j.cviu.2005.05.005.

[37] Tan, Xiaoyang; Triggs, Bill (2010). "Enhanced local texture feature sets for face recognition under difficult lighting conditions". IEEE Transactions on Image Processing. 19 (6): 1635–1650. Bibcode:2010ITIP...19.1635T. CiteSeerX 10.1.1.105.3355. doi:10.1109/tip.2010.2042645. PMID 20172829. S2CID 4943234.

[38] Mousavi, Mir Hashem, Karim Faez, and Amin Asghari. "Three dimensional face recognition using SVM classifier." Computer and Information Science, 2008. ICIS 08. Seventh IEEE/ACIS International Conference on. IEEE, 2008.

[39] Amberg, Brian, Reinhard Knothe, and Thomas Vetter. "Expression invariant 3D face recognition with a morphable model." Automatic Face & Gesture Recognition, 2008. FG'08. 8th IEEE International Conference on. IEEE, 2008.

[40] İrfanoğlu, M. O., Berk Gökberk, and Lale Akarun. "3D shape-based face recognition using automatically registered facial surfaces." Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Vol. 4. IEEE, 2004.

[41] Beumier, Charles; Acheroy, Marc (2001). "Face verification from 3D and grey level clues". Pattern Recognition Letters. 22 (12): 1321–1329. doi:10.1016/s0167-8655(01)00077-0.

[42] Afifi, Mahmoud; Abdelhamed, Abdelrahman (13 June 2017). "AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces". arXiv:1706.04277 [cs.CV].

[43] "SoF dataset". sites.google.com. Retrieved 18 November 2017.

[44] "IMDB-WIKI". data.vision.ee.ethz.ch. Retrieved 13 March 2018.

[45] Patron-Perez, A.; Marszalek, M.; Reid, I.; Zisserman, A. (2012). "Structured learning of human interactions in TV shows". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (12): 2441–2453. doi:10.1109/tpami.2012.24. PMID 23079467. S2CID 6060568.

[46] Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). Berkeley MHAD: A comprehensive multimodal human action database. In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE.

[47] Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013.

[48] Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." Advances in Neural Information Processing Systems. 2014.

[49] Stoian, Andrei; Ferecatu, Marin; Benois-Pineau, Jenny; Crucianu, Michel (2016). "Fast Action Localization in Large-Scale Video Archives". IEEE Transactions on Circuits and Systems for Video Technology. 26 (10): 1917–1930. doi:10.1109/TCSVT.2015.2475835. S2CID 31537462.

[50] Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata, Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-Jia; Shamma, David A; Bernstein, Michael S; Fei-Fei, Li (2017). "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations". International Journal of Computer Vision. 123: 32–73. arXiv:1602.07332. doi:10.1007/s11263-016-0981-7. S2CID 4492210.

[51] Karayev, S., et al. "A category-level 3-D object dataset: putting the Kinect to work." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2011.

[52] Tighe, Joseph, and Svetlana Lazebnik. "Superparsing: scalable nonparametric image parsing with superpixels." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 352–365.

[53] Arbelaez, P.; Maire, M; Fowlkes, C; Malik, J (May 2011). "Contour Detection and Hierarchical Image Segmentation" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 33 (5): 898–916. doi:10.1109/tpami.2010.161. PMID 20733228. S2CID 206764694. Retrieved 27 February 2016.

[54] Lin, Tsung-Yi, et al. "Microsoft coco: Common objects in context." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 740–755.

[55] Russakovsky, Olga; et al. (2015). "Imagenet large scale visual recognition challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547.

[56] Xiao, Jianxiong, et al. "Sun database: Large-scale scene recognition from abbey to zoo." Computer vision and pattern recognition (CVPR), 2010 IEEE conference on. IEEE, 2010.

[57] Donahue, Jeff; Jia, Yangqing; Vinyals, Oriol; Hoffman, Judy; Zhang, Ning; Tzeng, Eric; Darrell, Trevor (2013). "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition". arXiv:1310.1531 [cs.CV].

[58] Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

[59] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

[60] Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; et al. (11 April 2015). "ImageNet Large Scale Visual Recognition Challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547.

[61] Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari, Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun, Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2017. Available from https://github.com/openimages."

[62] Vyas, Apoorv, et al. "Commercial Block Detection in Broadcast News Videos." Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing. ACM, 2014.

[63] Hauptmann, Alexander G., and Michael J. Witbrock. "Story segmentation and detection of commercials in broadcast news video." Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on. IEEE, 1998.

[64] Tung, Anthony KH, Xin Xu, and Beng Chin Ooi. "Curler: finding and visualizing nonlinear correlation clusters." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005.

[65] Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.

[66] Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.

[67] Griffin, G., A. Holub, and P. Perona. Caltech-256 object category dataset California Inst. Technol., Tech. Rep. 7694, 2007 [Online]. Available: http://authors.library.caltech.edu/7694, 2007.

[68] Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.

[69] Fu, Xiping, et al. "NOKMeans: Non-Orthogonal K-means Hashing." Computer Vision—ACCV 2014. Springer International Publishing, 2014. 162–177.

[70] Heitz, Geremy; et al. (2009). "Shape-based object localization for descriptive classification". International Journal of Computer Vision. 84 (1): 40–62. CiteSeerX 10.1.1.142.280. doi:10.1007/s11263-009-0228-y. S2CID 646320.

[71] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset." In CVPR Workshop on The Future of Datasets in Vision, 2015.

[72] Everingham, Mark; et al. (2010). "The pascal visual object classes (voc) challenge". International Journal of Computer Vision. 88 (2): 303–338. doi:10.1007/s11263-009-0275-4. S2CID 4246903.

[73] Felzenszwalb, Pedro F.; et al. (2010). "Object detection with discriminatively trained part-based models". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (9): 1627–1645. CiteSeerX 10.1.1.153.2745. doi:10.1109/tpami.2009.167. PMID 20634557. S2CID 3198903.

[74] Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.

[75] "CINIC-10 dataset". Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey (2018) CINIC-10 is not ImageNet or CIFAR-10. 9 October 2018. Retrieved 13 November 2018.

[76] fashion-mnist: A MNIST-like fashion product database. Benchmark, Zalando Research, 7 October 2017, retrieved 7 October 2017

[77] "notMNIST dataset". Machine Learning, etc. 8 September 2011. Retrieved 13 October 2017.

[78] Houben, Sebastian, et al. "Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.

[79] Mathias, Mayeul, et al. "Traffic sign recognition—How far are we from the solution?." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.

[80] Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? the kitti vision benchmark suite." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

[81] Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D SLAM systems." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012.

[82] Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5 dataset. Chaladze.com. Retrieved 13 November 2017, from http://chaladze.com/l5/

[83] Kragh, Mikkel F.; et al. (2017). "FieldSAFE – Dataset for Obstacle Detection in Agriculture". Sensors. 17 (11): 2579. arXiv:1709.03526. Bibcode:2017arXiv170903526F. doi:10.3390/s17112579. PMC 5713196. PMID 29120383.

[84] Afifi, Mahmoud (12 November 2017). "Gender recognition and biometric identification using a large dataset of hand images". arXiv:1711.04322 [cs.CV].

[85] Lomonaco, Vincenzo; Maltoni, Davide (18 October 2017). "CORe50: a New Dataset and Benchmark for Continuous Object Recognition". arXiv:1705.03550 [cs.CV].

[86] She, Qi; Feng, Fan; Hao, Xinyue; Yang, Qihan; Lan, Chuanlin; Lomonaco, Vincenzo; Shi, Xuesong; Wang, Zhengwei; Guo, Yao; Zhang, Yimin; Qiao, Fei; Chan, Rosa H.M. (15 November 2019). "OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning". arXiv:1911.06487v2 [cs.CV].

[87] Morozov, Alexei; Sushkova, Olga (13 June 2019). "THz and thermal video data set". Development of the multi-agent logic programming approach to a human behaviour analysis in a multi-channel video surveillance. Moscow: IRE RAS. Retrieved 19 July 2019.

[88] Morozov, Alexei; Sushkova, Olga; Kershner, Ivan; Polupanov, Alexander (9 July 2019). "Development of a method of terahertz intelligent video surveillance based on the semantic fusion of terahertz and 3D video images" (PDF). CEUR. 2391: paper19. Retrieved 19 July 2019.

[89] Botta, M., A. Giordana, and L. Saitta. "Learning fuzzy concept definitions." Fuzzy Systems, 1993., Second IEEE International Conference on. IEEE, 1993.

[90] Frey, Peter W.; Slate, David J. (1991). "Letter recognition using Holland-style adaptive classifiers". Machine Learning. 6 (2): 161–182. doi:10.1007/bf00114162.

[91] Peltonen, Jaakko; Klami, Arto; Kaski, Samuel (2004). "Improved learning of Riemannian metrics for exploratory analysis". Neural Networks. 17 (8): 1087–1100. CiteSeerX 10.1.1.59.4865. doi:10.1016/j.neunet.2004.06.008. PMID 15555853.

[92] Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January 2013). "Online and offline handwritten Chinese character recognition: Benchmarking on new databases". Pattern Recognition. 46 (1): 155–162. doi:10.1016/j.patcog.2012.06.021.

[93] Wang, D.; Liu, C.; Yu, J.; Zhou, X. (2009). "CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters". 2009 10th International Conference on Document Analysis and Recognition: 1206–1210. doi:10.1109/ICDAR.2009.163. ISBN 978-1-4244-4500-4. S2CID 5705532.

[94] Williams, Ben H., Marc Toussaint, and Amos J. Storkey. Extracting motion primitives from natural handwriting data. Springer Berlin Heidelberg, 2006.

[95] Meier, Franziska, et al. "Movement segmentation using a primitive library." Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011.

[96] T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009

[97] Llorens, David, et al. "The UJIpenchars Database: a Pen-Based Database of Isolated Handwritten Characters." LREC. 2008.

[98] Calderara, Simone; Prati, Andrea; Cucchiara, Rita (2011). "Mixtures of von mises distributions for people trajectory shape analysis". IEEE Transactions on Circuits and Systems for Video Technology. 21 (4): 457–471. doi:10.1109/tcsvt.2011.2125550. S2CID 1427766.

[99] Guyon, Isabelle, et al. "Result analysis of the nips 2003 feature selection challenge." Advances in neural information processing systems. 2004.

[100] Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B. (11 December 2015). "Human-level concept learning through probabilistic program induction". Science. 350 (6266): 1332–1338. Bibcode:2015Sci...350.1332L. doi:10.1126/science.aab3050. ISSN 0036-8075. PMID 26659050.

[101] Lake, Brenden (9 November 2019), Omniglot data set for one-shot learning, retrieved 10 November 2019

[102] LeCun, Yann; et al. (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE. 86 (11): 2278–2324. CiteSeerX 10.1.1.32.9552. doi:10.1109/5.726791.

[103] Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008.

[104] Xu, Lei; Krzyżak, Adam; Suen, Ching Y. (1992). "Methods of combining multiple classifiers and their applications to handwriting recognition". IEEE Transactions on Systems, Man and Cybernetics. 22 (3): 418–435. doi:10.1109/21.155943. hdl:10338.dmlcz/135217.

[105] Alimoglu, Fevzi, et al. "Combining multiple classifiers for pen-based handwritten digit recognition." (1996).

[106] Tang, E. Ke; et al. (2005). "Linear dimensionality reduction using relevance weighted LDA". Pattern Recognition. 38 (4): 485–493. doi:10.1016/j.patcog.2004.09.005.

[107] Hong, Yi, et al. "Learning a mixture of sparse distance metrics for classification and dimensionality reduction." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.

[108] Thoma, Martin (2017). "The HASYv2 dataset". arXiv:1701.08380 [cs.CV].

[109] Karki, Manohar; Liu, Qun; DiBiano, Robert; Basu, Saikat; Mukhopadhyay, Supratik (20 June 2018). "Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters". arXiv:1806.08037 [cs.CV].

[110] Liu, Qun; Collier, Edward; Mukhopadhyay, Supratik (2019), "PCGAN-CHAR: Progressively Trained Classifier Generative Adversarial Networks for Classification of Noisy Handwritten Bangla Characters", Digital Libraries at the Crossroads of Digital Information for the Future, Springer International Publishing, pp. 3–15, arXiv:1908.08987, doi:10.1007/978-3-030-34058-2_1, ISBN 978-3-030-34057-5, S2CID 201665955

[111] Yuan, Jiangye; Gleason, Shaun S.; Cheriyadat, Anil M. (2013). "Systematic benchmarking of aerial image segmentation". IEEE Geoscience and Remote Sensing Letters. 10 (6): 1527–1531. Bibcode:2013IGRSL..10.1527Y. doi:10.1109/lgrs.2013.2261453. S2CID 629629.

[112] Vatsavai, Ranga Raju. "Object based image classification: state of the art and computational challenges." Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 2013.

[113] Butenuth, Matthias, et al. "Integrating pedestrian simulation, tracking and event detection for crowd analysis." Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011.

[114] Fradi, Hajer, and Jean-Luc Dugelay. "Low level crowd analysis using frame-wise normalized feature for people counting." Information Forensics and Security (WIFS), 2012 IEEE International Workshop on. IEEE, 2012.

[115] Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan. "A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees." International Journal of Remote Sensing 34.20 (2013): 6969–6982.

[116] Mohd Pozi, Muhammad Syafiq; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran (2015). "A new classification model for a class imbalanced data set using genetic programming and support vector machines: Case study for wilt disease classification". Remote Sensing Letters. 6 (7): 568–577. doi:10.1080/2150704X.2015.1062159. S2CID 58788630.

[117] Gallego, A.-J.; Pertusa, A.; Gil, P. "Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks." Remote Sensing. 2018; 10(4):511.

[118] Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery dataset" [Online]. Available: https://www.iuii.ua.es/datasets/masati/, 2018.

[119] Johnson, Brian; Tateishi, Ryutaro; Xie, Zhixiao (2012). "Using geographically weighted variables for image classification". Remote Sensing Letters. 3 (6): 491–499. doi:10.1080/01431161.2011.629637. S2CID 122543681.

[120] Chatterjee, Sankhadeep, et al. "Forest Type Classification: A Hybrid NN-GA Model Based Approach." Information Systems Design and Intelligent Applications. Springer India, 2016. 227-236.

[121] Diegert, Carl. "A combinatorial method for tracing objects using semantics of their shape." Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th. IEEE, 2010.

[122] Razakarivony, Sebastien, and Frédéric Jurie. "Small target detection combining foreground and background manifolds." IAPR International Conference on Machine Vision Applications. 2013.

[123] "SpaceNet". explore.digitalglobe.com. Retrieved 13 March 2018.

[124] Etten, Adam Van (5 January 2017). "Getting Started With SpaceNet Data". The DownLinQ. Retrieved 13 March 2018.

[125] Vakalopoulou, M.; Bus, N.; Karantzalos, K.; Paragios, N. (July 2017). Integrating edge/boundary priors with classification scores for building detection in very high resolution data. 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). pp. 3309–3312. doi:10.1109/IGARSS.2017.8127705. ISBN 978-1-5090-4951-6. S2CID 8297433.

[126] Yang, Yi; Newsam, Shawn (2010). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '10. New York, New York, USA: ACM Press. doi:10.1145/1869790.1869829. ISBN 9781450304283. S2CID 993769.

[127] Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (3 November 2015). DeepSat: a learning framework for satellite imagery. ACM. p. 37. doi:10.1145/2820783.2820816. ISBN 9781450339674. S2CID 4387134.

[128] Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (21 November 2019). "DeepSat V2: feature augmented convolutional neural nets for satellite image classification". Remote Sensing Letters. 11 (2): 156–165. arXiv:1911.07747. doi:10.1080/2150704x.2019.1693071. ISSN 2150-704X. S2CID 208138097.

[129] Mills, Kyle; Tamblyn, Isaac (16 May 2018), Big graphene dataset, National Research Council of Canada, doi:10.4224/c8sc04578j.data

[130] Mills, Kyle; Spanner, Michael; Tamblyn, Isaac (16 May 2018). "Quantum simulation". Quantum simulations of an electron in a two dimensional potential well. National Research Council of Canada. doi:10.4224/PhysRevA.96.042113.data.

[131] Rohrbach, M.; Amin, S.; Andriluka, M.; Schiele, B. (2012). A database for fine grained activity detection of cooking activities. IEEE. doi:10.1109/cvpr.2012.6247801. ISBN 978-1-4673-1228-8.

[132] Kuehne, Hilde, Ali Arslan, and Thomas Serre. "The language of actions: Recovering the syntax and semantics of goal-directed human activities." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

[133] Voloshynovskiy, Sviatoslav, et al. "Towards Reproducible results in authentication based on physical non-cloneable functions: The Forensic Authentication Microstructure Optical Set (FAMOS)." Proceedings of IEEE International Workshop on Information Forensics and Security. 2012.

[134] Taran, Olga, Shideh Rezaeifar, et al. "PharmaPack: mobile fine-grained recognition of pharma packages." Proc. European Signal Processing Conference (EUSIPCO). 2017.

[135] Khosla, Aditya, et al. "Novel dataset for fine-grained image categorization: Stanford dogs." Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011.

[136] Parkhi, Omkar M., et al. "Cats and dogs." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

[137] Biggs, Benjamin, et al. "Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop." Proc. ECCV. 2020.

[138] Razavian, Ali, et al. "CNN features off-the-shelf: an astounding baseline for recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.

[139] Ortega, Michael; et al. (1998). "Supporting ranked boolean similarity queries in MARS". IEEE Transactions on Knowledge and Data Engineering. 10 (6): 905–925. CiteSeerX 10.1.1.36.6079. doi:10.1109/69.738357.

[140] He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "Multiscale conditional random fields for image labeling." Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on. Vol. 2. IEEE, 2004.

[141] Deneke, Tewodros, et al. "Video transcoding time prediction for proactive load balancing." Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014.

[142] Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell (13 April 2016). "Visual Storytelling". arXiv:1604.03968 [cs.CL].

[143] Wah, Catherine, et al. "The caltech-ucsd birds-200-2011 dataset." (2011).

[144] Duan, Kun, et al. "Discovering localized attributes for fine-grained recognition." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

[145] "YouTube-8M Dataset". research.google.com. Retrieved 1 October 2016.

[146] Abu-El-Haija, Sami; Kothari, Nisarg; Lee, Joonseok; Natsev, Paul; Toderici, George; Varadarajan, Balakrishnan; Vijayanarasimhan, Sudheendra (27 September 2016). "YouTube-8M: A Large-Scale Video Classification Benchmark". arXiv:1609.08675 [cs.CV].

[147] "YFCC100M Dataset". mmcommons.org. Yahoo-ICSI-LLNL. Retrieved 1 June 2017.

[148] Bart Thomee; David A Shamma; Gerald Friedland; Benjamin Elizalde; Karl Ni; Douglas Poland; Damian Borth; Li-Jia Li (25 April 2016). "Yfcc100m: The new data in multimedia research". Communications of the ACM. 59 (2): 64–73. arXiv:1503.01817. doi:10.1145/2812802. S2CID 207230134.

[149] Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "LIRIS-ACCEDE: A Video Database for Affective Content Analysis," in IEEE Transactions on Affective Computing, 2015.

[150] Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos," in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.

[151] M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "The mediaeval 2015 affective impact of movies task," in MediaEval 2015 Workshop, 2015.

[152] S. Johnson and M. Everingham, "Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation", in Proceedings of the 21st British Machine Vision Conference (BMVC2010)

[153] S. Johnson and M. Everingham, "Learning Effective Human Pose Estimation from Inaccurate Annotation", In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2011)

[154] Afifi, Mahmoud; Hussain, Khaled F. (2 November 2017). "The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques". arXiv:1711.00972 [cs.CV].

[155] "MCQ Dataset". sites.google.com. Retrieved 18 November 2017.

[156] Taj-Eddin, I. A. T. F.; Afifi, M.; Korashy, M.; Hamdy, D.; Nasser, M.; Derbaz, S. (July 2016). A new compression technique for surveillance videos: Evaluation using new dataset. 2016 Sixth International Conference on Digital Information and Communication Technology and Its Applications (DICTAP). pp. 159–164. doi:10.1109/DICTAP.2016.7544020. ISBN 978-1-4673-9609-7. S2CID 8698850.

[157] Tabak, Michael A.; Norouzzadeh, Mohammad S.; Wolfson, David W.; Sweeney, Steven J.; Vercauteren, Kurt C.; Snow, Nathan P.; Halseth, Joseph M.; Di Salvo, Paul A.; Lewis, Jesse S.; White, Michael D.; Teton, Ben; Beasley, James C.; Schlichting, Peter E.; Boughton, Raoul K.; Wight, Bethany; Newkirk, Eric S.; Ivan, Jacob S.; Odell, Eric A.; Brook, Ryan K.; Lukacs, Paul M.; Moeller, Anna K.; Mandeville, Elizabeth G.; Clune, Jeff; Miller, Ryan S.; Photopoulou, Theoni (2018). "Machine learning to classify animal species in camera trap images: Applications in ecology". Methods in Ecology and Evolution. 10 (4): 585–590. doi:10.1111/2041-210X.13120. ISSN 2041-210X.

[158] Taj-Eddin, Islam A. T. F.; Afifi, Mahmoud; Korashy, Mostafa; Ahmed, Ali H.; Ng, Yoke Cheng; Hernandez, Evelyng; Abdel-Latif, Salma M. (November 2017). "Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification". Journal of Electronic Imaging. 26 (6): 060501. arXiv:1706.03867. Bibcode:2017JEI....26f0501T. doi:10.1117/1.jei.26.6.060501. ISSN 1017-9909. S2CID 12367169.

[159] McAuley, Julian, et al. "Image-based recommendations on styles and substitutes." Proceedings of the 38th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2015

[160] Ganesan, Kavita; Zhai, Chengxiang (2012). "Opinion-based entity ranking". Information Retrieval. 15 (2): 116–150. doi:10.1007/s10791-011-9174-8. hdl:2142/15252. S2CID 16258727.

[161] Lv, Yuanhua, Dimitrios Lymberopoulos, and Qiang Wu. "An exploration of ranking heuristics in mobile local search." Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2012.

[162] Harper, F. Maxwell; Konstan, Joseph A. (2015). "The MovieLens Datasets: History and Context". ACM Transactions on Interactive Intelligent Systems. 5 (4): 19. doi:10.1145/2827872. S2CID 16619709.

[163] Koenigstein, Noam, Gideon Dror, and Yehuda Koren. "Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy." Proceedings of the fifth ACM conference on Recommender systems. ACM, 2011.

[164] McFee, Brian, et al. "The million song dataset challenge." Proceedings of the 21st international conference companion on World Wide Web. ACM, 2012.

[165] Bohanec, Marko, and Vladislav Rajkovic. "Knowledge acquisition and explanation for multi-attribute decision making." 8th Intl Workshop on Expert Systems and their Applications. 1988.

[166] Tan, Peter J., and David L. Dowe. "MML inference of decision graphs with multi-way joins." Australian Joint Conference on Artificial Intelligence. 2002.

[167] "Quantifying comedy on YouTube: why the number of o's in your LOL matter". Metatext NLP Database. Retrieved 26 October 2020.

[168] Kim, Byung Joo (2012). "A Classifier for Big Data". Convergence and Hybrid Information Technology. Communications in Computer and Information Science. 310. pp. 505–512. doi:10.1007/978-3-642-32692-9_63. ISBN 978-3-642-32691-2.

[169] Pérezgonzález, Jose D.; Gilbey, Andrew (2011). "Predicting Skytrax airport rankings from customer reviews". Journal of Airport Management. 5 (4): 335–339.

[170] Loh, Wei-Yin, and Yu-Shan Shih. "Split selection methods for classification trees." Statistica Sinica (1997): 815–840.

[171] Lim, Tjen-Sien; Loh, Wei-Yin; Shih, Yu-Shan (2000). "A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms". Machine Learning. 40 (3): 203–228. doi:10.1023/a:1007608224229. S2CID 17030953.

[172] Kiet Van Nguyen, Vu Duc Nguyen, Phu X. V. Nguyen, Tham T. H. Truong, Ngan Luu-Thuy Nguyen. "UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis".

[173] Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition for Vietnamese Social Media Text".

[174] Dermouche, Mohamed; Velcin, Julien; Khouas, Leila; Loudcher, Sabine (2014). A Joint Model for Topic-Sentiment Evolution over Time. IEEE. doi:10.1109/icdm.2014.82. ISBN 978-1-4799-4302-9.

[175] Rose, Tony; Stevenson, Mark; Whitehead, Miles (2002). "The Reuters Corpus Volume 1-from Yesterday's News to Tomorrow's Language Resources" (PDF). LREC. 2. S2CID 9239414.

[176] Amini, Massih R.; Usunier, Nicolas; Goutte, Cyril (2009). "Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization". Advances in Neural Information Processing Systems: 28–36.

[177] Liu, Ming; et al. (2015). "VRCA: a clustering algorithm for massive amount of texts". Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press.

[178] Al-Harbi, S; Almuhareb, A; Al-Thubaity, A; Khorsheed, M. S.; Al-Rajeh, A (2008). "Automatic Arabic Text Classification". Proceedings of the 9th International Conference on the Statistical Analysis of Textual Data, Lyon, France.

[179] "Relationship and Entity Extraction Evaluation Dataset: Dstl/re3d". 17 December 2018.

[180] "The Examiner - SpamClickBait Catalogue".

[181] "A Million News Headlines".

[182] "One Week of Global News Feeds".

[183] Kulkarni, Rohit (2018), Reuters News-Wire Archive, Harvard Dataverse, doi:10.7910/DVN/XDB74W

[184] "IrishTimes - the Waxy-Wany News".

[185] "News Headlines Dataset For Sarcasm Detection". kaggle.com. Retrieved 27 April 2019.

[186] Klimt, Bryan, and Yiming Yang. "Introducing the Enron Corpus." CEAS. 2004.

[187] Kossinets, Gueorgi, Jon Kleinberg, and Duncan Watts. "The structure of information pathways in a social communication network." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008.

[188] Androutsopoulos, Ion; Koutsias, John; Chandrinos, Konstantinos V.; Paliouras, George; Spyropoulos, Constantine D. (2000). "An evaluation of Naive Bayesian anti-spam filtering". In Potamias, G.; Moustakis, V.; van Someren, M. (eds.). Proceedings of the Workshop on Machine Learning in the New Information Age. 11th European Conference on Machine Learning, Barcelona, Spain. 11. pp. 9–17. arXiv:cs/0006013. Bibcode:2000cs........6013A.

[189] Bratko, Andrej; et al. (2006). "Spam filtering using statistical data compression models" (PDF). The Journal of Machine Learning Research. 7: 2673–2698.

[190] Almeida, Tiago A., José María G. Hidalgo, and Akebo Yamakami. "Contributions to the study of SMS spam filtering: new collection and results." Proceedings of the 11th ACM symposium on Document engineering. ACM, 2011.

[191] Delany, Sarah Jane; Buckley, Mark; Greene, Derek (2012). "SMS spam filtering: methods and data". Expert Systems with Applications. 39 (10): 9899–9908. doi:10.1016/j.eswa.2012.02.053.

[192] Joachims, Thorsten. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. No. CMU-CS-96-118. Carnegie-mellon univ pittsburgh pa dept of computer science, 1996.

[193] Dimitrakakis, Christos, and Samy Bengio. Online Policy Adaptation for Ensemble Algorithms. No. EPFL-REPORT-82788. IDIAP, 2002.

[194] Annamoradnejad, Issa. arXiv:2004.12765, 2020.

[195] Dooms, S. et al. "Movietweetings: a movie rating dataset collected from twitter." 2013. Available from https://github.com/sidooms/MovieTweetings.

[196] RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2017). "Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval". arXiv:1703.06618 [cs.CV].

[197] "huyt16/Twitter100k". GitHub. Retrieved 26 March 2018.

[198] Go, Alec; Bhayani, Richa; Huang, Lei (2009). "Twitter sentiment classification using distant supervision". CS224N Project Report, Stanford. 1: 12.

[199] Chikersal, Prerna, Soujanya Poria, and Erik Cambria. "SeNTU: sentiment analysis of tweets by combining a rule-based classifier with supervised learning." Proceedings of the International Workshop on Semantic Evaluation, SemEval. 2015.

[200] Zafarani, Reza, and Huan Liu. "Social computing data repository at ASU." School of Computing, Informatics and Decision Systems Engineering, Arizona State University (2009).

[201] Bisgin, Halil, Nitin Agarwal, and Xiaowei Xu. "Investigating homophily in online social networks." Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on. Vol. 1. IEEE, 2010.

[202] McAuley, Julian J.; Leskovec, Jure. "Learning to Discover Social Circles in Ego Networks". NIPS. 2012.

[203] Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko (2014). "Network-based statistical comparison of citation topology of bibliographic databases". Scientific Reports. 4 (6496): 6496. arXiv:1502.05061. Bibcode:2014NatSR...4E6496S. doi:10.1038/srep06496. PMC 4178292. PMID 25263231.

[204] Abdulla, N., et al. "Arabic sentiment analysis: Corpus-based and lexicon-based." Proceedings of the IEEE conference on Applied Electrical Engineering and Computing Technologies (AEECT). 2013.

[205] Abooraig, Raddad, et al. "On the automatic categorization of Arabic articles based on their political orientation." Third International Conference on Informatics Engineering and Information Science (ICIEIS2014). 2014.

[206] Kawala, François, et al. "Prédictions d'activité dans les réseaux sociaux en ligne." 4ième conférence sur les modèles et l'analyse des réseaux: Approches mathématiques et informatiques. 2013.

[207] Sabharwal, Ashish; Samulowitz, Horst; Tesauro, Gerald (2015). "Selecting Near-Optimal Learners via Incremental Data Allocation". arXiv:1601.00024 [cs.LG].

[208] Xu et al. "SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)" Proceedings of the 9th International Workshop on Semantic Evaluation. 2015.

[209] Xu et al. "Extracting Lexically Divergent Paraphrases from Twitter" Transactions of the Association for Computational Linguistics (TACL). 2014.

[210] Middleton, Stuart E; Middleton, Lee; Modafferi, Stefano (2014). "Real-Time Crisis Mapping of Natural Disasters Using Social Media" (PDF). IEEE Intelligent Systems. 29 (2): 9–17. doi:10.1109/MIS.2013.126. S2CID 15139204.

[211] "geoparsepy". 2016. Python PyPI library

[212] Gupta, Aakash (5 December 2020). "Dutch social media collection". doi:10.5072/FK2/MTPTL7.

[213] "Streamlit". huggingface.co. Retrieved 18 December 2020.

[214] "Dutch Social media collection". kaggle.com. Retrieved 18 December 2020.

[215] Forsyth, E., Lin, J., & Martell, C. (2008, June 25). The NPS Chat Corpus. Retrieved from http://faculty.nps.edu/cmartell/NPSChat.htm

[216] Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan, A Neural Network Approach to Context-Sensitive Generation of Conversational Responses, Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL-HLT 2015), June 2015.

[217] Shaoul, C. & Westbury C. (2013) A reduced redundancy USENET corpus (2005-2011) Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html)

[218] Kan, M. (2011, January). NUS Short Message Service (SMS) Corpus. Retrieved from http://www.comp.nus.edu.sg/entrepreneurship/innovation/osr/corpus/

[219] Stuck_In_the_Matrix. (2015, July 3). I have every publicly available Reddit comment for research. ~ 1.7 billion comments @ 250 GB compressed. Any interest in this? [Original post]. Message posted to https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/

[220] Ryan Lowe, Nissan Pow, Iulian V. Serban and Joelle Pineau, "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems", SIGDial 2015.

[221] K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes, "HDLTex: Hierarchical Deep Learning for Text Classification", 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 364-371. doi:10.1109/ICMLA.2017.0-134

[222] K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes, "Web of Science Dataset", doi:10.17632/9rw3vkcfy4.6

[223] Galgani, Filippo, Paul Compton, and Achim Hoffmann. "Combining different summarization techniques for legal text." Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data. Association for Computational Linguistics, 2012.

[224] Nagwani, N. K. (2015). "Summarizing large text collection using topic modeling and clustering based on MapReduce framework". Journal of Big Data. 2 (1): 1–18. doi:10.1186/s40537-015-0020-5.

[225] Schler, Jonathan; et al. (2006). "Effects of Age and Gender on Blogging" (PDF). AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs. 6.

[226] Anand, Pranav, et al. "Believe Me-We Can Do This! Annotating Persuasive Acts in Blog Text." Computational Models of Natural Argument. 2011.

[227] Traud, Amanda L., Peter J. Mucha, and Mason A. Porter. "Social structure of Facebook networks." Physica A: Statistical Mechanics and its Applications 391.16 (2012): 4165–4180.

[228] Richard, Emile; Savalle, Pierre-Andre; Vayatis, Nicolas (2012). "Estimation of Simultaneously Sparse and Low Rank Matrices". arXiv:1206.6474 [cs.DS].

[229] Richardson, Matthew; Burges, Christopher JC; Renshaw, Erin (2013). "MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text". EMNLP. 1.

[230] Weston, Jason; Bordes, Antoine; Chopra, Sumit; Rush, Alexander M.; Bart van Merriënboer; Joulin, Armand; Mikolov, Tomas (2015). "Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks". arXiv:1502.05698 [cs.AI].

[231] Marcus, Mitchell P.; Ann Marcinkiewicz, Mary; Santorini, Beatrice (1993). "Building a large annotated corpus of English: The Penn Treebank". Computational Linguistics. 19 (2): 313–330.

[232] Collins, Michael (2003). "Head-driven statistical models for natural language parsing". Computational Linguistics. 29 (4): 589–637. doi:10.1162/089120103322753356.

[233] Guyon, Isabelle, et al., eds. Feature extraction: foundations and applications. Vol. 207. Springer, 2008.

[234] Lin, Yuri, et al. "Syntactic annotations for the google books ngram corpus." Proceedings of the ACL 2012 system demonstrations. Association for Computational Linguistics, 2012.

[235] Krishnamoorthy, Niveda; et al. (2013). "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge". AAAI. 1.

[236] Luyckx, Kim, and Walter Daelemans. "Personae: a Corpus for Author and Personality Prediction from Text." LREC. 2008.

[237] Solorio, Thamar, Ragib Hasan, and Mainul Mizan. "A case study of sockpuppet detection in wikipedia." Workshop on Language Analysis in Social Media (LASM) at NAACL HLT. 2013.

[238] Ciarelli, Patrick Marques, and Elias Oliveira. "Agglomeration and elimination of terms for dimensionality reduction." Intelligent Systems Design and Applications, 2009. ISDA'09. Ninth International Conference on. IEEE, 2009.

[239] Zhou, Mingyuan, Oscar Hernan Madrid Padilla, and James G. Scott. "Priors for random count matrices derived from a family of negative binomial processes." Journal of the American Statistical Association just-accepted (2015): 00–00.

[240] Kotzias, Dimitrios, et al. "From group to individual labels using deep features." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.

[241] Ning, Yue; Muthiah, Sathappan; Rangwala, Huzefa; Ramakrishnan, Naren (2016). "Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning". arXiv:1602.08033 [cs.SI].

[242] Buza, Krisztian. "Feedback prediction for blogs." Data analysis, machine learning and knowledge discovery. Springer International Publishing, 2014. 145–152.

[243] Soysal, Ömer M (2015). "Association rule mining with mostly associated sequential patterns". Expert Systems with Applications. 42 (5): 2582–2592. doi:10.1016/j.eswa.2014.10.049.

[244] Bowman, Samuel, et al. "A large annotated corpus for learning natural language inference." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 2015.

[245] "DSL Corpus Collection". ttg.uni-saarland.de. Retrieved 22 September 2017.

[246] "Urban Dictionary Words and Definitions".

[247] H. Elsahar, P. Vougiouklis, A. Remaci, C. Gravier, J. Hare, F. Laforest, E. Simperl, "T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples", Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018).

[248] Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.

[249] "Computers Are Learning to Read—But They're Still Not So Smart". Wired. Retrieved 29 December 2019.

[250] Quan, Hoang Lam; Quang, Duy Le; Van Kiet, Nguyen; Ngan, Luu-Thuy Nguyen. "UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning".

[251] To, Quoc Huy; Nguyen, Van Kiet; Nguyen, Luu Thuy Ngan; Nguyen, Gia Tuan Anh. "Gender Prediction Based on Vietnamese Names with Machine Learning Techniques" (PDF).

[252] M. Versteegh, R. Thiollière, T. Schatz, X.-N. Cao, X. Anguera, A. Jansen, and E. Dupoux (2015). "The Zero Resource Speech Challenge 2015," in INTERSPEECH-2015.

[253] M. Versteegh, X. Anguera, A. Jansen, and E. Dupoux (2016). "The Zero Resource Speech Challenge 2015: Proposed Approaches and Results," in SLTU-2016.

[254] Sakar, Betul Erdogdu; et al. (2013). "Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings". IEEE Journal of Biomedical and Health Informatics. 17 (4): 828–834. doi:10.1109/jbhi.2013.2245674. PMID 25055311. S2CID 15491516.

[255] Zhao, Shunan, et al. "Automatic detection of expressed emotion in Parkinson's disease." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.

[256] Used in: Hammami, Nacereddine, and Mouldi Bedda. "Improved tree model for Arabic speech recognition." Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on. Vol. 5. IEEE, 2010.

[257] Maaten, Laurens. "Learning discriminative fisher kernels." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.

[258] Cole, Ronald, and Mark Fanty. "Spoken letter recognition." Proc. Third DARPA Speech and Natural Language Workshop. 1990.

[259] Chapelle, Olivier; Sindhwani, Vikas; Keerthi, Sathiya S. (2008). "Optimization techniques for semi-supervised support vector machines" (PDF). The Journal of Machine Learning Research. 9: 203–233.

[260] Kudo, Mineichi; Toyama, Jun; Shimbo, Masaru (1999). "Multidimensional curve classification using passing-through regions". Pattern Recognition Letters. 20 (11): 1103–1111. CiteSeerX 10.1.1.46.2515. doi:10.1016/s0167-8655(99)00077-x.

[261] Jaeger, Herbert; et al. (2007). "Optimization and applications of echo state networks with leaky-integrator neurons". Neural Networks. 20 (3): 335–352. doi:10.1016/j.neunet.2007.04.016. PMID 17517495.

[262] Tsanas, Athanasios; et al. (2010). "Accurate telemonitoring of Parkinson's disease progression by noninvasive speech tests". IEEE Transactions on Biomedical Engineering (Submitted manuscript). 57 (4): 884–893. doi:10.1109/tbme.2009.2036000. PMID 19932995. S2CID 7382779.

[263] Clifford, Gari D.; Clifton, David (2012). "Wireless technology in disease management and medicine". Annual Review of Medicine. 63: 479–492. doi:10.1146/annurev-med-051210-114650. PMID 22053737.

[264] Zue, Victor; Seneff, Stephanie; Glass, James (1990). "Speech database development at MIT: TIMIT and beyond". Speech Communication. 9 (4): 351–356. doi:10.1016/0167-6393(90)90010-7.

[265] Kapadia, Sadik, Valtcho Valtchev, and S. J. Young. "MMI training for continuous phoneme recognition on the TIMIT database." Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on. Vol. 2. IEEE, 1993.

[266] Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech Synthesis (PDF) (PhD Thesis). University of Southampton, School of Electronics and Computer Science.

[267] Ardila, Rosana; Branson, Megan; Davis, Kelly; Henretty, Michael; Kohler, Michael; Meyer, Josh; Morais, Reuben; Saunders, Lindsay; Tyers, Francis M.; Weber, Gregor (13 December 2019). "Common Voice: A Massively-Multilingual Speech Corpus". arXiv:1912.06670v2 [cs.CL].

[268] Zhou, Fang, Q. Claire, and Ross D. King. "Predicting the geographical origin of music." Data Mining (ICDM), 2014 IEEE International Conference on. IEEE, 2014.

[269] Saccenti, Edoardo; Camacho, José (2015). "On the use of the observation‐wise k‐fold operation in PCA cross‐validation". Journal of Chemometrics. 29 (8): 467–478. doi:10.1002/cem.2726. hdl:10481/55302. S2CID 62248957.

[270] Bertin-Mahieux, Thierry, et al. "The million song dataset." ISMIR 2011: Proceedings of the 12th International Society for Music Information Retrieval Conference, 24–28 October 2011, Miami, Florida. University of Miami, 2011.

[271] Henaff, Mikael; et al. (2011). "Unsupervised learning of sparse features for scalable audio classification" (PDF). ISMIR. 11.

[272] Rafii, Zafar (2017). "Music". MUSDB18 - a corpus for music separation. doi:10.5281/zenodo.1117372.

[273] Defferrard, Michaël; Benzi, Kirell; Vandergheynst, Pierre; Bresson, Xavier (6 December 2016). "FMA: A Dataset For Music Analysis". arXiv:1612.01840 [cs.SD].

[274] Esposito, Roberto; Radicioni, Daniele P. (2009). "Carpediem: Optimizing the viterbi algorithm and applications to supervised sequential learning" (PDF). The Journal of Machine Learning Research. 10: 1851–1880.

[275] Sourati, Jamshid; et al. (2016). "Classification Active Learning Based on Mutual Information". Entropy. 18 (2): 51. Bibcode:2016Entrp..18...51S. doi:10.3390/e18020051.

[276] Salamon, Justin; Jacoby, Christopher; Bello, Juan Pablo. "A dataset and taxonomy for urban sound research." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.

[277] Lagrange, Mathieu; Lafay, Grégoire; Rossignol, Mathias; Benetos, Emmanouil; Roebel, Axel (2015). "An evaluation framework for event detection using a morphological model of acoustic scenes". arXiv:1502.00141 [stat.ML].

[278] Gemmeke, Jort F., et al. "Audio Set: An ontology and human-labeled dataset for audio events." IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2017.

[279] "Watch out, birders: Artificial intelligence has learned to spot birds from their songs". Science | AAAS. 18 July 2018. Retrieved 22 July 2018.

[280] "Bird Audio Detection challenge". Machine Listening Lab at Queen Mary University. 3 May 2016. Retrieved 22 July 2018.

[281] Wichern, G., et al. "WHAM!: Extending Speech Separation to Noisy Environments", Interspeech, 2019, https://arxiv.org/abs/1907.01160

[282] Drossos, K., Lipping, S., and Virtanen, T. "Clotho: An Audio Captioning Dataset" IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2020.

[283] Drossos, K., Lipping, S., and Virtanen, T. (2019). Clotho dataset (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3490684

[284] The CAIDA UCSD Dataset on the Witty Worm – 19–24 March 2004, http://www.caida.org/data/passive/witty_worm_dataset.xml

[285] Chen, Zesheng, and Chuanyi Ji. "Optimal worm-scanning method using vulnerable-host distributions." International Journal of Security and Networks 2.1–2 (2007): 71–80.

[286] Kachuee, Mohamad, et al. "Cuff-less high-accuracy calibration-free blood pressure estimation using pulse transit time." Circuits and Systems (ISCAS), 2015 IEEE International Symposium on. IEEE, 2015.

[287] PhysioBank, PhysioToolkit. "PhysioNet: components of a new research resource for complex physiologic signals." Circulation. v101 i23. e215-e220.

[288] Vergara, Alexander; et al. (2012). "Chemical gas sensor drift compensation using classifier ensembles". Sensors and Actuators B: Chemical. 166: 320–329. doi:10.1016/j.snb.2012.01.074.

[289] Korotcenkov, G.; Cho, B. K. (2014). "Engineering approaches to improvement of conductometric gas sensor parameters. Part 2: Decrease of dissipated (consumable) power and improvement stability and reliability". Sensors and Actuators B: Chemical. 198: 316–341. doi:10.1016/j.snb.2014.03.069.

[290] Quinlan, John R (1992). "Learning with continuous classes" (PDF). 5th Australian Joint Conference on Artificial Intelligence. 92.

[291] Merz, Christopher J.; Pazzani, Michael J. (1999). "A principal components approach to combining regression estimates". Machine Learning. 36 (1–2): 9–32. doi:10.1023/a:1007507221352.

[292] Torres-Sospedra, Joaquin, et al. "UJIIndoorLoc-Mag: A new database for magnetic field-based localization problems." Indoor Positioning and Indoor Navigation (IPIN), 2015 International Conference on. IEEE, 2015.

[293] Berkvens, Rafael, Maarten Weyn, and Herbert Peremans. "Mean Mutual Information of Probabilistic Wi-Fi Localization." Indoor Positioning and Indoor Navigation (IPIN), 2015 International Conference on. Banff, Canada: IPIN. 2015.

[294] Paschke, Fabian, et al. "Sensorlose Zustandsüberwachung an Synchronmotoren." Proceedings. 23. Workshop Computational Intelligence, Dortmund, 5.-6. Dezember 2013. KIT Scientific Publishing, 2013.

[295] Lessmeier, Christian, et al. "Data Acquisition and Signal Analysis from Measured Motor Currents for Defect Detection in Electromechanical Drive Systems."

[296] Ugulino, Wallace, et al. "Wearable computing: Accelerometers’ data classification of body postures and movements." Advances in Artificial Intelligence-SBIA 2012. Springer Berlin Heidelberg, 2012. 52–61.

[297] Schneider, Jan; et al. (2015). "Augmenting the senses: a review on sensor-based learning support". Sensors. 15 (2): 4097–4133. doi:10.3390/s150204097. PMC 4367401. PMID 25679313.

[298] Madeo, Renata CB, Clodoaldo AM Lima, and Sarajane M. Peres. "Gesture unit segmentation using support vector machines: segmenting gestures from rest positions." Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM, 2013.

[299] Lun, Roanna; Zhao, Wenbing (2015). "A survey of applications and human motion recognition with Microsoft Kinect". International Journal of Pattern Recognition and Artificial Intelligence. 29 (5): 1555008. doi:10.1142/s0218001415550083.

[300] Theodoridis, Theodoros, and Huosheng Hu. "Action classification of 3d human models using dynamic ANNs for mobile robot surveillance." Robotics and Biomimetics, 2007. ROBIO 2007. IEEE International Conference on. IEEE, 2007.

[301] Etemad, Seyed Ali, and Ali Arya. "3D human action recognition and style transformation using resilient backpropagation neural networks." Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on. Vol. 4. IEEE, 2009.

[302] Altun, Kerem; Barshan, Billur; Tunçel, Orkun (2010). "Comparative study on classifying human activities with miniature inertial and magnetic sensors". Pattern Recognition. 43 (10): 3605–3620. doi:10.1016/j.patcog.2010.04.019. hdl:11693/11947.

[303] Nathan, Ran; et al. (2012). "Using tri-axial acceleration data to identify behavioral modes of free-ranging animals: general concepts and tools illustrated for griffon vultures". The Journal of Experimental Biology. 215 (6): 986–996. doi:10.1242/jeb.058602. PMC 3284320. PMID 22357592.

[304] Anguita, Davide, et al. "Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine." Ambient assisted living and home care. Springer Berlin Heidelberg, 2012. 216–223.

[305] Su, Xing; Tong, Hanghang; Ji, Ping (2014). "Activity recognition with smartphone sensors". Tsinghua Science and Technology. 19 (3): 235–249. doi:10.1109/tst.2014.6838194.

[306] Kadous, Mohammed Waleed. Temporal classification: Extending the classification paradigm to multivariate time series. Diss. The University of New South Wales, 2002.

[307] Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd international conference on Machine learning. ACM, 2006.

[308] Velloso, Eduardo, et al. "Qualitative activity recognition of weight lifting exercises." Proceedings of the 4th Augmented Human International Conference. ACM, 2013.

[309] Mortazavi, Bobak Jack, et al. "Determining the single best axis for exercise repetition recognition and counting on smartwatches." Wearable and Implantable Body Sensor Networks (BSN), 2014 11th International Conference on. IEEE, 2014.

[310] Sapsanis, Christos, et al. "Improving EMG based Classification of basic hand movements using EMD." Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE. IEEE, 2013.

[311] Andrianesis, Konstantinos; Tzes, Anthony (2015). "Development and control of a multifunctional prosthetic hand with shape memory alloy actuators". Journal of Intelligent & Robotic Systems. 78 (2): 257–289. doi:10.1007/s10846-014-0061-6. S2CID 207174078.

[312] Banos, Oresti; et al. (2014). "Dealing with the effects of sensor displacement in wearable activity recognition". Sensors. 14 (6): 9995–10023. doi:10.3390/s140609995. PMC 4118358. PMID 24915181.

[313] Stisen, Allan, et al. "Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition." Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems. ACM, 2015.

[314] Bhattacharya, Sourav, and Nicholas D. Lane. "From Smart to Deep: Robust Activity Recognition on Smartwatches using Deep Learning."

[315] Bacciu, Davide; et al. (2014). "An experimental characterization of reservoir computing in ambient assisted living applications". Neural Computing and Applications. 24 (6): 1451–1464. doi:10.1007/s00521-013-1364-4. hdl:11568/237959. S2CID 14124013.

[316] Palumbo, Filippo; Barsocchi, Paolo; Gallicchio, Claudio; Chessa, Stefano; Micheli, Alessio (2013). "Multisensor Data Fusion for Activity Recognition Based on Reservoir Computing". Evaluating AAL Systems Through Competitive Benchmarking. Communications in Computer and Information Science. 386. pp. 24–35. doi:10.1007/978-3-642-41043-7_3. ISBN 978-3-642-41042-0.

[317] Reiss, Attila, and Didier Stricker. "Introducing a new benchmarked dataset for activity monitoring." Wearable Computers (ISWC), 2012 16th International Symposium on. IEEE, 2012.

[318] Roggen, Daniel, et al. "OPPORTUNITY: Towards opportunistic activity and context recognition systems." World of Wireless, Mobile and Multimedia Networks & Workshops, 2009. WoWMoM 2009. IEEE International Symposium on a. IEEE, 2009.

[319] Kurz, Marc, et al. "Dynamic quantification of activity recognition capabilities in opportunistic systems." Vehicular Technology Conference (VTC Spring), 2011 IEEE 73rd. IEEE, 2011.

[320] Sztyler, Timo, and Heiner Stuckenschmidt. "On-body localization of wearable devices: an investigation of position-aware activity recognition." Pervasive Computing and Communications (PerCom), 2016 IEEE International Conference on. IEEE, 2016.

[321] Zhi, Ying Xuan; Lukasik, Michelle; Li, Michael H.; Dolatabadi, Elham; Wang, Rosalie H.; Taati, Babak (2018). "Automatic Detection of Compensation During Robotic Stroke Rehabilitation Therapy". IEEE Journal of Translational Engineering in Health and Medicine. 6: 2100107. doi:10.1109/JTEHM.2017.2780836. ISSN 2168-2372. PMC 5788403. PMID 29404226.

[322] Dolatabadi, Elham; Zhi, Ying Xuan; Ye, Bing; Coahran, Marge; Lupinacci, Giorgia; Mihailidis, Alex; Wang, Rosalie; Taati, Babak (23 May 2017). The toronto rehab stroke pose dataset to detect compensation during stroke rehabilitation therapy. ACM. pp. 375–381. doi:10.1145/3154862.3154925. ISBN 9781450363631. S2CID 24581930.

[323] "Toronto Rehab Stroke Pose Dataset".

[324] Jung, Merel M.; Poel, Mannes; Poppe, Ronald; Heylen, Dirk K. J. (1 March 2017). "Automatic recognition of touch gestures in the corpus of social touch". Journal on Multimodal User Interfaces. 11 (1): 81–96. doi:10.1007/s12193-016-0232-9. ISSN 1783-8738. S2CID 1802116.

[325] Jung, M.M. (Merel) (1 June 2016). "Corpus of Social Touch (CoST)". University of Twente. doi:10.4121/uuid:5ef62345-3b3e-479c-8e1d-c922748c9b29.

[326] Aeberhard, S., D. Coomans, and O. De Vel. "Comparison of classifiers in high dimensional settings." Dept. Math. Statist., James Cook Univ., North Queensland, Australia, Tech. Rep 92-02 (1992).

[327] Basu, Sugato. "Semi-supervised clustering with limited background knowledge." AAAI. 2004.

[328] Tüfekci, Pınar (2014). "Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods". International Journal of Electrical Power & Energy Systems. 60: 126–140. doi:10.1016/j.ijepes.2014.02.027.

[329] Kaya, Heysem, Pınar Tüfekci, and Fikret S. Gürgen. "Local and global learning methods for predicting power of a combined gas & steam turbine." International conference on emerging trends in computer and electronics engineering (ICETCEE'2012), Dubai. 2012.

[330] Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2014). "Searching for exotic particles in high-energy physics with deep learning". Nature Communications. 5: 2014. arXiv:1402.4735. Bibcode:2014NatCo...5.4308B. doi:10.1038/ncomms5308. PMID 24986233. S2CID 195953.

[331] Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2015). "Enhanced Higgs Boson to τ+ τ− Search with Deep Learning". Physical Review Letters. 114 (11): 111801. arXiv:1410.3469. Bibcode:2015PhRvL.114k1801B. doi:10.1103/physrevlett.114.111801. PMID 25839260. S2CID 2339142.

[332] Adam-Bourdarios, C.; Cowan, G.; Germain-Renaud, C.; Guyon, I.; Kégl, B.; Rousseau, D. (2015). "The Higgs Machine Learning Challenge". Journal of Physics Conference Series. 664 (7): 072015. Bibcode:2015JPhCS.664g2015A. doi:10.1088/1742-6596/664/7/072015.

[333] Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, and Daniel Whiteson. 'Parameterized Machine Learning for High-Energy Physics.' In submission.

[334] Ortigosa, I.; Lopez, R.; Garcia, J. "A neural networks approach to residuary resistance of sailing yachts prediction". Proceedings of the International Conference on Marine Engineering MARINE. 2007.

[335] Gerritsma, J., R. Onnink, and A. Versluis. Geometry, resistance and stability of the delft systematic yacht hull series. Delft University of Technology, 1981.

[336] Liu, Huan, and Hiroshi Motoda. Feature extraction, construction and selection: A data mining perspective. Springer Science & Business Media, 1998.

[337] Reich, Yoram. Converging to Ideal Design Knowledge by Learning. [Carnegie Mellon University], Engineering Design Research Center, 1989.

[338] Todorovski, Ljupčo; Džeroski, Sašo (1999). "Experiments in Meta-level Learning with ILP". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. 1704. pp. 98–106. doi:10.1007/978-3-540-48247-5_11. ISBN 978-3-540-66490-1.

[339] Wang, Yong. A new approach to fitting linear models in high dimensional spaces. Diss. The University of Waikato, 2000.

[340] Kibler, Dennis; Aha, David W.; Albert, Marc K. (1989). "Instance‐based prediction of real‐valued attributes". Computational Intelligence. 5 (2): 51–57. doi:10.1111/j.1467-8640.1989.tb00315.x. S2CID 40800413.

[341] Palmer, Christopher R., and Christos Faloutsos. "Electricity based external similarity of categorical attributes." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2003. 486–500.

[342] Tsanas, Athanasios; Xifara, Angeliki (2012). "Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools". Energy and Buildings. 49: 560–567. doi:10.1016/j.enbuild.2012.03.003.

[343] De Wilde, Pieter (2014). "The gap between predicted and measured energy performance of buildings: A framework for investigation". Automation in Construction. 41: 40–49. doi:10.1016/j.autcon.2014.02.009.

[344] Brooks, Thomas F., D. Stuart Pope, and Michael A. Marcolini. Airfoil self-noise and prediction. Vol. 1218. National Aeronautics and Space Administration, Office of Management, Scientific and Technical Information Division, 1989.

[345] Draper, David. "Assessment and propagation of model uncertainty." Journal of the Royal Statistical Society, Series B (Methodological) (1995): 45–97.

[346] Lavine, Michael (1991). "Problems in extrapolation illustrated with space shuttle O-ring data". Journal of the American Statistical Association. 86 (416): 919–921. doi:10.1080/01621459.1991.10475132.

[347] Wang, Jun, Bei Yu, and Les Gasser. "Concept tree based clustering visualization with shaded similarity matrices." Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on. IEEE, 2002.

[348] Pettengill, Gordon H., et al. "Magellan: Radar performance and data products." Science 252.5003 (1991): 260–265.

[349] Aharonian, F.; et al. (2008). "Energy spectrum of cosmic-ray electrons at TeV energies". Physical Review Letters. 101 (26): 261104. arXiv:0811.3894. Bibcode:2008PhRvL.101z1104A. doi:10.1103/PhysRevLett.101.261104. hdl:2440/51450. PMID 19437632. S2CID 41850528.

[350] Bock, R. K.; et al. (2004). "Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope". Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment. 516 (2): 511–528. Bibcode:2004NIMPA.516..511B. doi:10.1016/j.nima.2003.08.157.

[351] Li, Jinyan; et al. (2004). "Deeps: A new instance-based lazy discovery and classification system". Machine Learning. 54 (2): 99–124. doi:10.1023/b:mach.0000011804.08528.7d.

[352] Siebert, Lee, and Tom Simkin. "Volcanoes of the world: an illustrated catalog of Holocene volcanoes and their eruptions." (2014).

[353] Sikora, Marek; Wróbel, Łukasz (2010). "Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines". Archives of Mining Sciences. 55 (1): 91–114.

[354] Sikora, Marek, and Beata Sikora. "Rough natural hazards monitoring." Rough Sets: Selected Methods and Applications in Management and Engineering. Springer London, 2012. 163–179.

[355] Yeh, I–C (1998). "Modeling of strength of high-performance concrete using artificial neural networks". Cement and Concrete Research. 28 (12): 1797–1808. doi:10.1016/s0008-8846(98)00165-3.

[356] Zarandi, MH Fazel; et al. (2008). "Fuzzy polynomial neural networks for approximation of the compressive strength of concrete". Applied Soft Computing. 8 (1): 488–498. Bibcode:2008ApSoC...8...79S. doi:10.1016/j.asoc.2007.02.010.

[357] Yeh, I. "Modeling slump of concrete with fly ash and superplasticizer." Computers and Concrete 5.6 (2008): 559–572.

[358] Gencel, Osman; et al. (2011). "Comparison of artificial neural networks and general linear model approaches for the analysis of abrasive wear of concrete". Construction and Building Materials. 25 (8): 3486–3494. doi:10.1016/j.conbuildmat.2011.03.040.

[359] Dietterich, Thomas G., et al. "A comparison of dynamic reposing and tangent distance for drug activity prediction." Advances in Neural Information Processing Systems (1994): 216–216.

[360] Buscema, Massimo, William J. Tastle, and Stefano Terzi. "Meta net: A new meta-classifier family." Data Mining Applications Using Artificial Adaptive Systems. Springer New York, 2013. 141–182.

[361] Ingber, Lester (1997). "Statistical mechanics of neocortical interactions: Canonical momenta indicators of electroencephalography". Physical Review E. 55 (4): 4578–4593. arXiv:physics/0001052. Bibcode:1997PhRvE..55.4578I. doi:10.1103/PhysRevE.55.4578. S2CID 6390999.

[362] Hoffmann, Ulrich; Vesin, Jean-Marc; Ebrahimi, Touradj; Diserens, Karin (2008). "An efficient P300-based brain–computer interface for disabled subjects". Journal of Neuroscience Methods. 167 (1): 115–125. CiteSeerX 10.1.1.352.4630. doi:10.1016/j.jneumeth.2007.03.005. PMID 17445904. S2CID 9648828.

[363] Donchin, Emanuel; Spencer, Kevin M.; Wijesinghe, Ranjith (2000). "The mental prosthesis: assessing the speed of a P300-based brain-computer interface". IEEE Transactions on Rehabilitation Engineering. 8 (2): 174–179. doi:10.1109/86.847808. PMID 10896179.

[364] Detrano, Robert; et al. (1989). "International application of a new probability algorithm for the diagnosis of coronary artery disease". The American Journal of Cardiology. 64 (5): 304–310. doi:10.1016/0002-9149(89)90524-9. PMID 2756873.

[365] Bradley, Andrew P (1997). "The use of the area under the ROC curve in the evaluation of machine learning algorithms" (PDF). Pattern Recognition. 30 (7): 1145–1159. doi:10.1016/s0031-3203(96)00142-2.

[366] Street, W. N.; Wolberg, W. H.; Mangasarian, O. L. (1993). "Nuclear feature extraction for breast tumor diagnosis". In Acharya, Raj S; Goldgof, Dmitry B (eds.). Biomedical Image Processing and Biomedical Visualization. 1905. pp. 861–870. doi:10.1117/12.148698. S2CID 14922543.

[367] Demir, Cigdem, and Bülent Yener. "Automated cancer diagnosis based on histopathological images: a systematic survey." Rensselaer Polytechnic Institute, Tech. Rep (2005).

[368] Abuse, Substance. "Mental Health Services Administration, Results from the 2010 National Survey on Drug Use and Health: Summary of National Findings, NSDUH Series H-41, HHS Publication No.(SMA) 11-4658." Rockville, MD: Substance Abuse and Mental Health Services Administration 201 (2011).

[369] Hong, Zi-Quan; Yang, Jing-Yu (1991). "Optimal discriminant plane for a small number of samples and design method of classifier on the plane". Pattern Recognition. 24 (4): 317–324. doi:10.1016/0031-3203(91)90074-f.

[370] Li, Jinyan, and Limsoon Wong. "Using rules to analyse bio-medical data: a comparison between C4.5 and PCL." Advances in Web-Age Information Management. Springer Berlin Heidelberg, 2003. 254–265.

[371] Güvenir, H. Altay, et al. "A supervised machine learning algorithm for arrhythmia analysis." Computers in Cardiology 1997. IEEE, 1997.

[372] Lagus, Krista, et al. "Independent variable group analysis in learning compact representations for data." Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'05), T. Honkela, V. Könönen, M. Pöllä, and O. Simula, Eds., Espoo, Finland. 2005.

[373] Strack, Beata, et al. "Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records." BioMed Research International 2014; 2014

[374] Rubin, Daniel J (2015). "Hospital readmission of patients with diabetes". Current Diabetes Reports. 15 (4): 1–9. doi:10.1007/s11892-015-0584-7. PMID 25712258. S2CID 3908599.

[375] Antal, Bálint; Hajdu, András (2014). "An ensemble-based system for automatic screening of diabetic retinopathy". Knowledge-Based Systems. 60 (2014): 20–27. arXiv:1410.8576. Bibcode:2014arXiv1410.8576A. doi:10.1016/j.knosys.2013.12.023. S2CID 13984326.

[376] Haloi, Mrinal (2015). "Improved Microaneurysm Detection using Deep Neural Networks". arXiv:1505.04424 [cs.CV].

[377] Patry, Guillaume; Gauthier, Gervais; Lay, Bruno; Roger, Julien; Elie, Damien. "ADCIS Download Third Party: Messidor Database". adcis.net. Retrieved 25 February 2018.

[378] Decencière, Etienne; Zhang, Xiwei; Cazuguel, Guy; Lay, Bruno; Cochener, Béatrice; Trone, Caroline; Gain, Philippe; Ordonez, Richard; Massin, Pascale (26 August 2014). "Feedback on a Publicly Distributed Image Database: The Messidor Database". Image Analysis & Stereology. 33 (3): 231–234. doi:10.5566/ias.1155. ISSN 1854-5165.

[379] Bagirov, A. M.; et al. (2003). "Unsupervised and supervised data classification via nonsmooth and global optimization". Top. 11 (1): 1–75. CiteSeerX 10.1.1.1.6429. doi:10.1007/bf02578945. S2CID 14165678.

[380] Fung, Glenn, et al. "A fast iterative algorithm for fisher discriminant using heterogeneous kernels." Proceedings of the twenty-first international conference on Machine learning. ACM, 2004.

[381] Quinlan, John Ross, et al. "Inductive knowledge acquisition: a case study." Proceedings of the Second Australian Conference on Applications of expert systems. Addison-Wesley Longman Publishing Co., Inc., 1987.

[382] Zhou, Zhi-Hua; Jiang, Yuan (2004). "NeC4.5: neural ensemble based C4.5". IEEE Transactions on Knowledge and Data Engineering. 16 (6): 770–773. CiteSeerX 10.1.1.1.8430. doi:10.1109/tkde.2004.11. S2CID 1024861.

[383] Er, Orhan; et al. (2012). "An approach based on probabilistic neural network for diagnosis of Mesothelioma's disease". Computers & Electrical Engineering. 38 (1): 75–81. doi:10.1016/j.compeleceng.2011.09.001.

[384] Er, Orhan, A. Çetin Tanrikulu, and Abdurrahman Abakay. "Use of artificial intelligence techniques for diagnosis of malignant pleural mesothelioma." Dicle Tıp Dergisi 42.1 (2015).

[385] Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (25 July 2017). "Vision-Based Assessment of Parkinsonism and Levodopa-Induced Dyskinesia with Deep Learning Pose Estimation". Journal of Neuroengineering and Rehabilitation. 15 (1): 97. arXiv:1707.09416. Bibcode:2017arXiv170709416L. doi:10.1186/s12984-018-0446-z. PMC 6219082. PMID 30400914.

[386] Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (May 2018). "Automated assessment of levodopa-induced dyskinesia: Evaluating the responsiveness of video-based features". Parkinsonism & Related Disorders. 53: 42–45. doi:10.1016/j.parkreldis.2018.04.036. ISSN 1353-8020. PMID 29748112.

[387] "Parkinson's Vision-Based Pose Estimation Dataset | Kaggle". kaggle.com. Retrieved 22 August 2018.

[388] Shannon, Paul; et al. (2003). "Cytoscape: a software environment for integrated models of biomolecular interaction networks". Genome Research. 13 (11): 2498–2504. doi:10.1101/gr.1239303. PMC 403769. PMID 14597658.

[389] Javadi, Soroush; Mirroshandel, Seyed Abolghasem (2019). "A novel deep learning method for automatic assessment of human sperm images". Computers in Biology and Medicine. 109: 182–194. doi:10.1016/j.compbiomed.2019.04.030. ISSN 0010-4825. PMID 31059902.

[390] "soroushj/mhsma-dataset: MHSMA: The Modified Human Sperm Morphology Analysis Dataset". github.com. Retrieved 3 May 2019.

[391] Clark, David, Zoltan Schreter, and Anthony Adams. "A quantitative comparison of dystal and backpropagation." Proceedings of 1996 Australian Conference on Neural Networks. 1996.

[392] Jiang, Yuan, and Zhi-Hua Zhou. "Editing training data for kNN classifiers with neural network ensemble." Advances in Neural Networks–ISNN 2004. Springer Berlin Heidelberg, 2004. 356–361.

[393] Ontañón, Santiago, and Enric Plaza. "On similarity measures based on a refinement lattice." Case-Based Reasoning Research and Development. Springer Berlin Heidelberg, 2009. 240–255.

[394] Higuera, Clara; Gardiner, Katheleen J.; Cios, Krzysztof J. (2015). "Self-organizing feature maps identify proteins critical to learning in a mouse model of down syndrome". PLOS ONE. 10 (6): e0129126. Bibcode:2015PLoSO..1029126H. doi:10.1371/journal.pone.0129126. PMC 4482027. PMID 26111164.

[395] Ahmed, Md Mahiuddin; et al. (2015). "Protein dynamics associated with failed and rescued learning in the Ts65Dn mouse model of Down syndrome". PLOS ONE. 10 (3): e0119491. Bibcode:2015PLoSO..1019491A. doi:10.1371/journal.pone.0119491. PMC 4368539. PMID 25793384.

[396] Cortez, Paulo, and Aníbal de Jesus Raimundo Morais. "A data mining approach to predict forest fires using meteorological data." (2007).

[397] Farquad, M. A. H.; Ravi, V.; Raju, S. Bapi (2010). "Support vector regression based hybrid rule extraction methods for forecasting". Expert Systems with Applications. 37 (8): 5577–5589. doi:10.1016/j.eswa.2010.02.055.

[398] Fisher, Ronald A (1936). "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x. hdl:2440/15227.

[399] Ghahramani, Zoubin, and Michael I. Jordan. "Supervised learning from incomplete data via an EM approach." Advances in neural information processing systems 6. 1994.

[400] Mallah, Charles; Cope, James; Orwell, James (2013). "Plant leaf classification using probabilistic integration of shape, texture and margin features". Signal Processing, Pattern Recognition and Applications. 5: 1.

[401] Yahiaoui, Itheri, Olfa Mzoughi, and Nozha Boujemaa. "Leaf shape descriptor for tree species identification." Multimedia and Expo (ICME), 2012 IEEE International Conference on. IEEE, 2012.

[402] Langley, Pat (2014). "Trading off simplicity and coverage in incremental concept learning" (PDF). Machine Learning Proceedings. 1988: 73.

[403] Tan, Ming, and Larry Eshelman. "Using weighted networks to represent classification knowledge in noisy domains." Proceedings of the Fifth International Conference on Machine Learning. 2014.

[404] Charytanowicz, Małgorzata, et al. "Complete gradient clustering algorithm for features analysis of x-ray images." Information technologies in biomedicine. Springer Berlin Heidelberg, 2010. 15–24.

[405] Sanchez, Mauricio A.; et al. (2014). "Fuzzy granular gravitational clustering algorithm for multivariate data". Information Sciences. 279: 498–511. doi:10.1016/j.ins.2014.04.005.

[406] Blackard, Jock A.; Dean, Denis J. (1999). "Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables". Computers and Electronics in Agriculture. 24 (3): 131–151. CiteSeerX 10.1.1.128.2475. doi:10.1016/s0168-1699(99)00046-0.

[407] Fürnkranz, Johannes. "Round robin rule learning." Proceedings of the 18th International Conference on Machine Learning (ICML-01): 146–153. 2001.

[408] Li, Song; Assmann, Sarah M.; Albert, Réka (2006). "Predicting essential components of signal transduction networks: a dynamic model of guard cell abscisic acid signaling". PLOS Biol. 4 (10): e312. arXiv:q-bio/0610012. Bibcode:2006q.bio....10012L. doi:10.1371/journal.pbio.0040312. PMC 1564158. PMID 16968132.

[409] Munisami, Trishen; et al. (2015). "Plant Leaf Recognition Using Shape Features and Colour Histogram with K-nearest Neighbour Classifiers". Procedia Computer Science. 58: 740–747. doi:10.1016/j.procs.2015.08.095.

[410] Li, Bai (2016). "Atomic potential matching: An evolutionary target recognition approach based on edge features". Optik-International Journal for Light and Electron Optics. 127 (5): 3162–3168. Bibcode:2016Optik.127.3162L. doi:10.1016/j.ijleo.2015.11.186.

[411] Nilsback, Maria-Elena, and Andrew Zisserman. "A visual vocabulary for flower classification." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.

[412] Giselsson, Thomas M.; et al. (2017). "A Public Image Database for Benchmark of Plant Seedling Classification Algorithms". arXiv:1711.05458 [cs.CV].

[413] Muresan, Horea; Oltean, Mihai (2018). "Fruit recognition from images using deep learning". Acta Univ. Sapientiae, Informatica. 10 (1): 26–42. doi:10.2478/ausi-2018-0002.

[414] Oltean, Mihai; Muresan, Horea (2017). "A dataset with fruit images on Kaggle".

[415] Nakai, Kenta; Kanehisa, Minoru (1991). "Expert system for predicting protein localization sites in gram‐negative bacteria". Proteins: Structure, Function, and Bioinformatics. 11 (2): 95–110. doi:10.1002/prot.340110203. PMID 1946347. S2CID 27606447.

[416] Ling, Charles X., et al. "Decision trees with minimal costs." Proceedings of the twenty-first international conference on Machine learning. ACM, 2004.

[417] Mahé, Pierre, et al. "Automatic identification of mixed bacterial species fingerprints in a MALDI-TOF mass-spectrum." Bioinformatics (2014): btu022.

[418] Barbano, Duane; et al. (2015). "Rapid characterization of microalgae and microalgae mixtures using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS)". PLOS ONE. 10 (8): e0135337. Bibcode:2015PLoSO..1035337B. doi:10.1371/journal.pone.0135337. PMC 4536233. PMID 26271045.

[419] Horton, Paul; Nakai, Kenta (1996). "A probabilistic classification system for predicting the cellular localization sites of proteins" (PDF). ISMB-96 Proceedings. 4: 109–15. PMID 8877510.

[420] Allwein, Erin L.; Schapire, Robert E.; Singer, Yoram (2001). "Reducing multiclass to binary: A unifying approach for margin classifiers" (PDF). The Journal of Machine Learning Research. 1: 113–141.

[421] Mayr, Andreas; Klambauer, Guenter; Unterthiner, Thomas; Hochreiter, Sepp (2016). "DeepTox: Toxicity Prediction Using Deep Learning". Frontiers in Environmental Science. 3: 80. doi:10.3389/fenvs.2015.00080.

[422] Lavin, Alexander; Ahmad, Subutai (12 October 2015). Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark. p. 38. arXiv:1510.03336. doi:10.1109/ICMLA.2015.141. ISBN 978-1-5090-0287-0. S2CID 6842305.

[423] Iurii D. Katser; Vyacheslav O. Kozitsin. "SKAB GitHub repository". Retrieved 12 January 2021.

[424] Iurii D. Katser; Vyacheslav O. Kozitsin (2020). "Skoltech Anomaly Benchmark (SKAB)". Kaggle. doi:10.34740/KAGGLE/DSV/1693952. Retrieved 12 January 2021.

[425] Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello, Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E. (2016). "On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study". Data Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-015-0444-8. ISSN 1384-5810. S2CID 1952214.

[426] Ann-Kathrin Hartmann, Tommaso Soru, Edgard Marx. Generating a Large Dataset for Neural Question Answering over the DBpedia Knowledge Base. 2018.

[427] Tommaso Soru, Edgard Marx, Diego Moussallem, Andre Valdestilhas, Diego Esteves, Ciro Baron. SPARQL as a Foreign Language. 2018.

[428] Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen. A Vietnamese Dataset for Evaluating Machine Reading Comprehension. COLING 2020.

[429] Kiet Van Nguyen, Khiem Vinh Tran, Son T. Luu, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen. Enhancing Lexical-Based Approach With External Knowledge for Vietnamese Multiple-Choice Machine Reading Comprehension. IEEE Access. 2020.

[430] Brown, Michael Scott, Michael J. Pelosi, and Henry Dirska. "Dynamic-radius species-conserving genetic algorithm for the financial forecasting of Dow Jones index stocks." Machine Learning and Data Mining in Pattern Recognition. Springer Berlin Heidelberg, 2013. 27–41.

[431] Shen, Kao-Yi; Tzeng, Gwo-Hshiung (2015). "Fuzzy Inference-Enhanced VC-DRSA Model for Technical Analysis: Investment Decision Aid". International Journal of Fuzzy Systems. 17 (3): 375–389. doi:10.1007/s40815-015-0058-8. S2CID 68241024.

[432] Quinlan, J. Ross (1987). "Simplifying decision trees". International Journal of Man-machine Studies. 27 (3): 221–234. CiteSeerX 10.1.1.18.4267. doi:10.1016/s0020-7373(87)80053-6.

[433] Hamers, Bart; Suykens, Johan AK; De Moor, Bart (2003). "Coupled transductive ensemble learning of kernel models" (PDF). Journal of Machine Learning Research. 1: 1–48.

[434] Shmueli, Galit, Ralph P. Russo, and Wolfgang Jank. "The BARISTA: a model for bid arrivals in online auctions." The Annals of Applied Statistics (2007): 412–441.

[435] Peng, Jie, and Hans-Georg Müller. "Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions." The Annals of Applied Statistics (2008): 1056–1077.

[436] Eggermont, Jeroen, Joost N. Kok, and Walter A. Kosters. "Genetic programming for data classification: Partitioning the search space." Proceedings of the 2004 ACM symposium on Applied computing. ACM, 2004.

[437] Moro, Sérgio; Cortez, Paulo; Rita, Paulo (2014). "A data-driven approach to predict the success of bank telemarketing". Decision Support Systems. 62: 22–31. doi:10.1016/j.dss.2014.03.001. hdl:10071/9499.

[438] Payne, Richard D.; Mallick, Bani K. (2014). "Bayesian Big Data Classification: A Review with Complements". arXiv:1411.5653 [stat.ME].

[439] Akbilgic, Oguz; Bozdogan, Hamparsum; Balaban, M. Erdal (2014). "A novel Hybrid RBF Neural Networks model as a forecaster". Statistics and Computing. 24 (3): 365–375. doi:10.1007/s11222-013-9375-7. S2CID 17764829.

[440] Jabin, Suraiya. "Stock market prediction using feed-forward artificial neural network." Int. J. Comput. Appl. (IJCA) 99.9 (2014).

[441] Yeh, I-Cheng; Lien, Che-hui (2009). "The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients". Expert Systems with Applications. 36 (2): 2473–2480. doi:10.1016/j.eswa.2007.12.020.

[442] Lin, Shu Ling (2009). "A new two-stage hybrid approach of credit risk in banking industry". Expert Systems with Applications. 36 (4): 8333–8341. doi:10.1016/j.eswa.2008.10.015.

[443] Pelckmans, Kristiaan; et al. (2005). "The differogram: Non-parametric noise variance estimation and its use for model selection". Neurocomputing. 69 (1): 100–122. doi:10.1016/j.neucom.2005.02.015.

[444] Bay, Stephen D.; et al. (2000). "The UCI KDD archive of large data sets for data mining research and experimentation". ACM SIGKDD Explorations Newsletter. 2 (2): 81–85. CiteSeerX 10.1.1.15.9776. doi:10.1145/380995.381030. S2CID 534881.

[445] Lucas, D. D.; et al. (2015). "Designing optimal greenhouse gas observing networks that consider performance and cost". Geoscientific Instrumentation, Methods and Data Systems. 4 (1): 121. Bibcode:2015GI......4..121L. doi:10.5194/gi-4-121-2015.

[446] Pales, Jack C.; Keeling, Charles D. (1965). "The concentration of atmospheric carbon dioxide in Hawaii". Journal of Geophysical Research. 70 (24): 6053–6076. Bibcode:1965JGR....70.6053P. doi:10.1029/jz070i024p06053.

[447] Sigillito, Vincent G., et al. "Classification of radar returns from the ionosphere using neural networks." Johns Hopkins APL Technical Digest 10.3 (1989): 262–266.

[448] Zhang, Kun, and Wei Fan. "Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond." Knowledge and Information Systems 14.3 (2008): 299–326.

[449] Reich, Brian J., Montserrat Fuentes, and David B. Dunson. "Bayesian spatial quantile regression." Journal of the American Statistical Association (2012).

[450] Kohavi, Ron (1996). "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid". KDD. 96.

[451] Oza, Nikunj C., and Stuart Russell. "Experimental comparisons of online and batch versions of bagging and boosting." Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001.

[452] Bay, Stephen D (2001). "Multivariate discretization for set mining". Knowledge and Information Systems. 3 (4): 491–512. CiteSeerX 10.1.1.217.921. doi:10.1007/pl00011680. S2CID 10945544.

[453] Ruggles, Steven (1995). "Sample designs and sampling errors". Historical Methods: A Journal of Quantitative and Interdisciplinary History. 28 (1): 40–46. doi:10.1080/01615440.1995.9955312.

[454] Meek, Christopher, Bo Thiesson, and David Heckerman. "The Learning Curve Method Applied to Clustering." AISTATS. 2001.

[455] Fanaee-T, Hadi; Gama, Joao (2013). "Event labeling combining ensemble detectors and background knowledge". Progress in Artificial Intelligence. 2 (2–3): 113–127. doi:10.1007/s13748-013-0040-3. S2CID 3345087.

[456] Giot, Romain, and Raphaël Cherrier. "Predicting bikeshare system usage up to one day ahead." Computational intelligence in vehicles and transportation systems (CIVTS), 2014 IEEE symposium on. IEEE, 2014.

[457] Zhan, Xianyuan; et al. (2013). "Urban link travel time estimation using large-scale taxi data with partial information". Transportation Research Part C: Emerging Technologies. 33: 37–49. doi:10.1016/j.trc.2013.04.001.

[458] Moreira-Matias, Luis; et al. (2013). "Predicting taxi–passenger demand using streaming data". IEEE Transactions on Intelligent Transportation Systems. 14 (3): 1393–1402. doi:10.1109/tits.2013.2262376. S2CID 14764358.

[459] Hwang, Ren-Hung; Hsueh, Yu-Ling; Chen, Yu-Ting (2015). "An effective taxi recommender system based on a spatio-temporal factor analysis model". Information Sciences. 314: 28–40. doi:10.1016/j.ins.2015.03.068.

[460] H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. Big data and its technical challenges. Commun. ACM, 57(7):86–94, July 2014.

[461] http://pems.dot.ca.gov/

[462] Meusel, Robert, et al. "The Graph Structure in the Web—Analyzed on Different Aggregation Levels." The Journal of Web Science 1.1 (2015).

[463] Kushmerick, Nicholas. "Learning to remove internet advertisements." Proceedings of the third annual conference on Autonomous Agents. ACM, 1999.

[464] Fradkin, Dmitriy, and David Madigan. "Experiments with random projections for machine learning." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.

[465] This data was used in the American Statistical Association Statistical Graphics and Computing Sections 1999 Data Exposition.

[466] Ma, Justin, et al. "Identifying suspicious URLs: an application of large-scale online learning." Proceedings of the 26th annual international conference on machine learning. ACM, 2009.

[467] Levchenko, Kirill, et al. "Click trajectories: End-to-end analysis of the spam value chain." Security and Privacy (SP), 2011 IEEE Symposium on. IEEE, 2011.

[468] Mohammad, Rami M., Fadi Thabtah, and Lee McCluskey. "An assessment of features related to phishing websites using an automated technique." Internet Technology And Secured Transactions, 2012 International Conference for. IEEE, 2012.

[469] Singh, Ashishkumar, et al. "Clustering Experiments on Big Transaction Data for Market Segmentation." Proceedings of the 2014 International Conference on Big Data Science and Computing. ACM, 2014.

[470] Bollacker, Kurt, et al. "Freebase: a collaboratively created graph database for structuring human knowledge." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.

[471] Mintz, Mike, et al. "Distant supervision for relation extraction without labeled data." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics, 2009.

[472] Mesterharm, Chris, and Michael J. Pazzani. "Active learning using on-line algorithms." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.

[473] Wang, Shusen; Zhang, Zhihua (2013). "Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling" (PDF). The Journal of Machine Learning Research. 14 (1): 2729–2769. arXiv:1303.4207. Bibcode:2013arXiv1303.4207W.

[474] Cattral, Robert; Oppacher, Franz; Deugo, Dwight (2002). "Evolutionary data mining with automatic rule generalization" (PDF). Recent Advances in Computers, Computing and Communications: 296–300. S2CID 18625415.

[475] Burton, Ariel N.; Kelly, Paul H.J. (2006). "Performance prediction of paging workloads using lightweight tracing". Future Generation Computer Systems. Elsevier BV. 22 (7): 784–793. doi:10.1016/j.future.2006.02.003. ISSN 0167-739X.

[476] Bain, Michael; Muggleton, Stephen (1994). "Learning optimal chess strategies". Machine Intelligence. Oxford University Press, Inc. 13.

[477] Quinlan, J. R. (1983). "Learning efficient classification procedures and their application to chess end games". Machine Learning: An Artificial Intelligence Approach. 1: 463–482. doi:10.1007/978-3-662-12405-5_15. ISBN 978-3-662-12407-9.

[478] Shapiro, Alen D. (1987). Structured induction in expert systems. Addison-Wesley Longman Publishing Co., Inc.

[479] Matheus, Christopher J.; Rendell, Larry A. (1989). "Constructive Induction on Decision Trees" (PDF). IJCAI. 89.

[480] Belsley, David A., Edwin Kuh, and Roy E. Welsch. Regression diagnostics: Identifying influential data and sources of collinearity. Vol. 571. John Wiley & Sons, 2005.

[481] Ruotsalo, Tuukka; Aroyo, Lora; Schreiber, Guus (2009). "Knowledge-based linguistic annotation of digital cultural heritage collections" (PDF). IEEE Intelligent Systems. 24 (2): 64–75. doi:10.1109/MIS.2009.32. S2CID 6667472.

[482] Li, Lihong, et al. "Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms." Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011.

[483] Yeung, Kam Fung, and Yanyan Yang. "A proactive personalized mobile news recommendation system." Developments in E-systems Engineering (DESE), 2010. IEEE, 2010.

[484] Gass, Susan E.; Roberts, J. Murray (2006). "The occurrence of the cold-water coral Lophelia pertusa (Scleractinia) on oil and gas platforms in the North Sea: colony growth, recruitment and environmental controls on distribution". Marine Pollution Bulletin. 52 (5): 549–559. doi:10.1016/j.marpolbul.2005.10.002. PMID 16300800.

[485] Gionis, Aristides; Mannila, Heikki; Tsaparas, Panayiotis (2007). "Clustering aggregation". ACM Transactions on Knowledge Discovery from Data. 1 (1): 4. CiteSeerX 10.1.1.709.528. doi:10.1145/1217299.1217303. S2CID 433708.

[486] Obradovic, Zoran, and Slobodan Vucetic. Challenges in Scientific Data Mining: Heterogeneous, Biased, and Large Samples. Technical Report, Center for Information Science and Technology, Temple University, 2004.

[487] Van Der Putten, Peter; van Someren, Maarten (2000). "CoIL challenge 2000: The insurance company case". Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report. 9: 1–43.

[488] Mao, K. Z. (2002). "RBF neural network center selection based on Fisher ratio class separability measure". IEEE Transactions on Neural Networks. 13 (5): 1211–1217. doi:10.1109/tnn.2002.1031953. PMID 18244518.

[489] Olave, Manuel; Rajkovic, Vladislav; Bohanec, Marko (1989). "An application for admission in public school systems" (PDF). Expert Systems in Public Administration. 1: 145–160.

[490] Lizotte, Daniel J., Omid Madani, and Russell Greiner. "Budgeted learning of naive-Bayes classifiers." Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2002.

[491] Lebowitz, Michael (1986). Concept learning in a rich input domain: Generalization-based memory. Machine Learning: An Artificial Intelligence Approach. 2. pp. 193–214. ISBN 9780934613002.

[492] Yeh, I-Cheng; Yang, King-Jang; Ting, Tao-Ming (2009). "Knowledge discovery on RFM model using Bernoulli sequence". Expert Systems with Applications. 36 (3): 5866–5871. doi:10.1016/j.eswa.2008.07.018.

[493] Lee, Wen-Chen; Cheng, Bor-Wen (2011). "An intelligent system for improving performance of blood donation". Journal of Quality. 18 (2): 173.

[494] Schmidtmann, Irene, et al. "Evaluation des Krebsregisters NRW Schwerpunkt Record Linkage." Abschlußbericht vom 11 (2009).

[495] Sariyar, Murat; Borg, Andreas; Pommerening, Klaus (2011). "Controlling false match rates in record linkage using extreme value theory". Journal of Biomedical Informatics. 44 (4): 648–654. doi:10.1016/j.jbi.2011.02.008. PMID 21352952.

[496] Candillier, Laurent, and Vincent Lemaire. "Design and Analysis of the Nomao challenge Active Learning in the Real-World." Proceedings of the ALRA: Active Learning in Real-world Applications, Workshop ECML-PKDD. 2012.

[497] Marquez, Ivan Garrido. "A Domain Adaptation Method for Text Classification based on Self-adjusted Training Approach." (2013).

[498] Nagesh, Harsha S., Sanjay Goil, and Alok N. Choudhary. "Adaptive Grids for Clustering Massive Data Sets." SDM. 2001.

[499] Kuzilek, Jakub, et al. "OU Analyse: analysing at-risk students at The Open University." Learning Analytics Review (2015): 1–16.

[500] Siemens, George, et al. Open Learning Analytics: an integrated & modularized platform. Diss. Open University Press, 2011.