List of Datasets For Machine-Learning Research
These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of
machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively,
the availability of high-quality training datasets.[1] High-quality labeled training datasets for supervised and semi-supervised machine learning algorithms are
usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality
datasets for unsupervised learning can also be difficult and costly to produce.[2][3][4][5]
Many organizations, including governments, publish and share their datasets. The datasets are classified, based on their licenses, as open data or non-open data.
Datasets from various governmental bodies are presented in List of open government data sites. The datasets are hosted on open data portals, where they are made
available for searching, depositing and accessing through interfaces such as open APIs. Datasets are sorted into various types and subtypes:
Specific category | Finance, Economics, Commerce, Societal, Health, Academy, Sports, Food, Agriculture, Travel, Geospatial, Political, Consumer, Transport, Logistics, Environmental, Real-Estate, Legal, Entertainment, Energy, Hospitality
Scope | Supranational Union, National, Subnational, Municipality, Urban, Rural
Status (https://docs.openml.org/#dataset-status) | Verified, In-Preparation, Deactivated (or Deprecated)
Number of records | 100s, 1000s, 10000s, 100000s, Millions
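The attributes in the table above (category, scope, status, number of records) can be modelled as a small record type. The following is a minimal sketch under stated assumptions, not a format used by any particular portal: the `CatalogEntry` fields, the `select` helper, the three catalog entries, and the exact cut-offs between size labels are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Status values as listed in the table above."""
    VERIFIED = "Verified"
    IN_PREPARATION = "In-Preparation"
    DEACTIVATED = "Deactivated"


# (exclusive upper bound, label) pairs; the exact cut-offs are an assumption.
SIZE_BUCKETS = [
    (1_000, "100s"),
    (10_000, "1000s"),
    (100_000, "10000s"),
    (1_000_000, "100000s"),
]


def size_bucket(n_records: int) -> str:
    """Map a record count to one of the 'Number of records' labels."""
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    for upper, label in SIZE_BUCKETS:
        if n_records < upper:
            return label
    return "Millions"


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative."""
    name: str
    category: str
    scope: str
    status: Status
    n_records: int


def select(entries, *, category=None, status=None):
    """Return entries matching every filter that is not None."""
    return [
        e for e in entries
        if (category is None or e.category == category)
        and (status is None or e.status == status)
    ]


# Made-up entries, used only to exercise the helpers.
catalog = [
    CatalogEntry("city-bus-ridership", "Transport", "Municipality", Status.VERIFIED, 48_200),
    CatalogEntry("regional-crop-yields", "Agriculture", "Subnational", Status.IN_PREPARATION, 730),
    CatalogEntry("national-rail-delays", "Transport", "National", Status.DEACTIVATED, 2_400_000),
]

verified_transport = select(catalog, category="Transport", status=Status.VERIFIED)
print([(e.name, size_bucket(e.n_records)) for e in verified_transport])
```

Under these cut-offs, a dataset with 165 instances falls in the "100s" bucket and one with 14,197,122 instances falls in "Millions".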
Data portals are classified by their type of license. Portals built on open-source licensed software are known as open data portals and are used by many
government organizations and academic institutions.
Comprehensive Knowledge Archive Network (CKAN) | AGPL | https://ckan.github.io/ckan-instances/, https://github.com/sebneu/ckan_instances/blob/master/instances.csv | Data repository for government or non-profit organisations, Data Management Solution for Research Institutes
Dataverse | Apache | https://dataverse.org/installations, https://dataverse.org/metrics | Data Management Solution for Research Institutes
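CKAN-based portals expose an HTTP Action API, which is one way datasets on such a portal can be searched programmatically. The sketch below only builds a `package_search` request URL and reads titles out of an already-decoded response. The portal address `portal.example.org`, the helper names, and the sample payload are assumptions for illustration, and an individual portal may restrict or customise its API.

```python
from urllib.parse import urlencode, urljoin


def package_search_url(portal_url: str, query: str, rows: int = 10, start: int = 0) -> str:
    """Build a CKAN Action API `package_search` URL for the given portal."""
    base = portal_url.rstrip("/") + "/"
    endpoint = urljoin(base, "api/3/action/package_search")
    return endpoint + "?" + urlencode({"q": query, "rows": rows, "start": start})


def dataset_titles(payload: dict) -> list:
    """Pull dataset titles out of a decoded `package_search` response."""
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [pkg.get("title") or pkg["name"] for pkg in payload["result"]["results"]]


# Hypothetical portal address; substitute one from the installation lists above.
url = package_search_url("https://portal.example.org", "face recognition", rows=5)
print(url)

# Hand-written stand-in for a decoded JSON response, so no network call is needed.
sample = {
    "success": True,
    "result": {"count": 1, "results": [{"name": "faces-demo", "title": "Faces demo"}]},
}
print(dataset_titles(sample))
```

Keeping URL construction separate from response handling lets either part be checked without contacting a live portal.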
Datasetlist.com | https://www.datasetlist.com
Global Open Data Index – Open Knowledge Foundation | https://index.okfn.org/ Archived (https://web.archive.org/web/20200525213547/https://index.okfn.org/) 25 May 2020 at the Wayback Machine
Google Dataset Search | https://datasetsearch.research.google.com/
Kaggle | https://www.kaggle.com/datasets
OpenDOAR | https://v2.sherpa.ac.uk/opendoar/
OpenML | https://www.openml.org/search?type=data
Papers with Code | https://paperswithcode.com/datasets
Image data
These datasets consist primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.
Facial recognition
In computer vision, face images have been used extensively to develop facial recognition systems, face detection methods, and many other applications that rely on images of faces.
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) | 7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities. | Files labelled with expression. Perceptual validation ratings provided by 319 raters. | 7,356 | Video, sound files | Classification, face recognition, voice recognition | 2018 | [12][13] | S.R. Livingstone and F.A. Russo
SCFace | Color images of faces at various angles. | Location of facial features extracted. Coordinates of features given. | 4,160 | Images, text | Classification, face recognition | 2011 | [14][15] | M. Grgic et al.
Yale Face Database | Faces of 15 individuals in 11 different expressions. | Labels of expressions. | 165 | Images | Face recognition | 1997 | [16][17] | J. Yang et al.
Cohn-Kanade AU-Coded Expression Database | Large database of images with labels for expressions. | Tracking of certain facial features. | 500+ sequences | Images, text | Facial expression analysis | 2000 | [18][19] | T. Kanade et al.
BioID Face Database | Images of faces with eye positions marked. | Manually set eye positions. | 1521 | Images, text | Face recognition | 2001 | [24][25] | BioID
UOY 3D-Face | Neutral face, 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised. | Labeling. | 5250 | Images, text | Face recognition, classification | 2004 | [30][31] | University of York
CASIA 3D Face Database | Expressions: anger, smile, laugh, surprise, closed eyes. | None. | 4624 | Images, text | Face recognition, classification | 2007 | [32][33] | Institute of Automation, Chinese Academy of Sciences
CASIA NIR | Expressions: anger, disgust, fear, happiness, sadness, surprise. | None. | 480 | Annotated visible spectrum and near-infrared video captures at 25 frames per second | Face recognition, classification | 2011 | [34] | Zhao, G. et al.
Face Recognition Grand Challenge Dataset | Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D data. | None. | 4007 | Images, text | Face recognition, classification | 2004 | [36][37] | National Institute of Standards and Technology
Gavabdb | Up to 61 samples for each subject. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images. | None. | 549 | Images, text | Face recognition, classification | 2008 | [38][39] | King Juan Carlos University
SoF | 112 persons (66 males and 46 females) wearing glasses under different illumination conditions. | A set of synthetic filters (blur, occlusions, noise, and posterization) with different levels of difficulty. | 42,592 (2,662 original images × 16 synthetic images) | Images, Mat file | Gender classification, face detection, face recognition, age estimation, and glasses detection | 2017 | [42][43] | Afifi, M. et al.
IMDb-WIKI | IMDb and Wikipedia face images with gender and age labels. | None | 523,051 | Images | Gender classification, face detection, face recognition, age estimation | 2015 | [44] | R. Rothe, R. Timofte, L. V. Gool
Action recognition
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
THUMOS Dataset | Large video dataset for action classification. | Actions classified and labeled. | 45M frames of video | Video, images, text | Classification, action detection | 2013 | [47][48] | Y. Jiang et al.
Dataset Name Brief description Preprocessing Instances Format Default Task Created (updated) Reference
Berkeley 3-D     849 images taken        Object bounding boxes         849            labeled images, text             Object              2014                       [51][52]            A.
Object           in 75 different         and labeling.                                                                 recognition                                                        al.
Dataset          scenes. About 50
                 different object
                classes are
                labeled.
                Labeled object
                image database,
                                       Labeled objects,                                                    Object
                used in the
                                       bounding boxes,                                                     recognition,                    [59][60][61]
ImageNet        ImageNet Large                                        14,197,122    Images, text                             2009 (2014)                  J.
                                       descriptive words, SIFT                                             scene
                Scale Visual
                                       features                                                            recognition
                Recognition
                Challenge
                A Large set of
                images listed as
                having CC BY 2.0                                                                                             2017
                license with image-                                                                        Classification,
                                       Image-level labels,                                                                                 [62]
Open Images     level labels and                                      9,178,275     Images, text           Object
                bounding boxes
                                       Bounding boxes
                                                                                                           recognition       (V7 : 2022)
                spanning
                thousands of
                classes.
TV News
Channel         TV commercials         Audio and video features
                                                                                                           Clustering,                     [63][64]
Commercial      and news               extracted from still           129,685       Text                                     2015                         P.
                                                                                                           classification
Detection       broadcasts.            images.
Dataset
                                                                                                                                                          MI
                                                                                                           Classification,                                Sc
                Annotated pictures                                                                                                         [72]
LabelMe                                Objects outlined.              187,240       Images, text           object            2005                         Art
                of scenes.
                                                                                                           detection                                      Int
                                                                                                                                                          La
                Stereo video
                sequences
                recorded in street                                                                         Classification,
Cityscapes                             Pixel-level segmentation                                                                            [73]           Da
                scenes, with pixel-                                   25,000        Images, text           object            2016
Dataset                                and labeling                                                                                                       al.
                level annotations.                                                                         detection
                Metadata also
                included.
                Large number of
                                                                                                           Classification,
PASCAL VOC      images for             Labeling, bounding box                                                                              [74][75]       M.
                                                                      500,000       Images, text           object            2010
Dataset         classification         included                                                                                                           et
                                                                                                           detection
                tasks.
                Like CIFAR-10,
                                       Classes labelled,
CIFAR-100       above, but 100                                                                                                             [60][76]       A.
                                       training set splits            60,000        Images                 Classification    2009
Dataset         classes of objects                                                                                                                        et
                                       created.
                are given.
               A unified                                                                                                                                 Lu
               contribution of                                                                                                                           Ell
               CIFAR-10 and            Classes labelled,                                                                                                 Cro
CINIC-10                                                                                                                                  [77]
               Imagenet with 10        training, validation, test    270,000       Images                        Classification    2018                  An
Dataset
               classes, and 3          set splits created.                                                                                               An
               splits. Larger than                                                                                                                       Am
               CIFAR-10.                                                                                                                                 Sto
               A MNIST-like            Classes labelled,
Fashion-                                                                                                                                  [78]
               fashion product         training set splits           60,000        Images                        Classification    2017                  Za
MNIST
               database                created.
               Some publicly
               available fonts and
               extracted glyphs
               from them to make       Classes labelled,
                                                                                                                                          [79]           Ya
notMNIST       a dataset similar to    training set splits           500,000       Images                        Classification    2011
                                                                                                                                                         Bu
               MNIST. There are        created.
               10 classes, with
               letters A-J taken
               from different fonts.
               Images from
               vehicles of traffic
German         signs on German
Traffic Sign   roads. These signs
Detection      comply with UN          Signs manually labeled        900           Images                        Classification    2013   [80][81]       S
Benchmark      standards and
Dataset        therefore are the
               same as in other
               countries.
               Autonomous
               vehicles driving
               through a mid-size
KITTI Vision                                                                                                     Classification,
               city captured           Many benchmarks               >100 GB of                                                           [82][83][84]
Benchmark                                                                          Images, text                  object            2012                  AG
               images of various       extracted from data.          data
Dataset                                                                                                          detection
               areas using
               cameras and laser
               scanners.
                                       Classes labelled,
Linnaeus 5     Images of 5                                                                                                                [85]           Ch
                                       training set splits           8000          Images                        Classification    2017
dataset        classes of objects.                                                                                                                       Ka
                                       created.
               Multi-modal dataset
               for obstacle
               detection in
               agriculture
                                                                                                                 Classification,
               including stereo
                                                                                                                 object
               camera, thermal         Classes labelled              >400 GB of    Images and 3D point                                    [86]
FieldSAFE                                                                                                        detection,        2017                  M.
               camera, web             geographically.               data          clouds
                                                                                                                 object
               camera, 360-
                                                                                                                 localization
               degree camera,
               lidar, radar, and
               precise
               localization.
               11,076 hand
               images (1600 x
               1200 pixels) of 190
                                                                                                                 Gender
               subjects, of varying
                                                                     11,076 hand   Images and (.mat, .txt, and   recognition              [87]
11K Hands      ages between 18 –       None                                                                                        2017                  M
                                                                     images        .csv) label files             and biometric
               75 years old, for
                                                                                                                 identification
               gender recognition
               and biometric
               identification.
               Specifically
               designed for
               Continuous/Lifelong
               Learning and                                                        images (.png or .pkl)
                                       Classes labelled,
               Object Recognition,
                                       training set splits           164,866                                     Classification,
               is a collection of                                                  and (.pkl, .txt, .tsv)                                 [88]           V.
CORe50                                 created based on a 3-         RBG-D                                       Object            2017
               more than 500                                                                                                                             an
                                       way, multi-runs               images        label files                   recognition
               videos (30fps) of
                                       benchmark.
               50 domestic
               objects belonging
               to 10 different
               categories.
OpenLORIS-     Lifelong/Continual      Classes labelled,             1,106,424     images (.png and .pkl)        Classification,   2019   [89]           Q.
Object         Robotic Vision          training/validation/testing   RBG-D                                       Lifelong
               dataset                 set splits created by         images        and (.pkl) label files        object
               (OpenLORIS-             benchmark scripts.                                                        recognition,
               Object) collected                                                                                 Robotic
               by real robots                                                                                    Vision
               mounted with
               multiple high-
               resolution sensors,
               includes a
               collection of 121
               object instances
               (1st version of
               dataset, 40
               categories of daily
               necessity objects
               under 20 scenes).
               The dataset has
               rigorously
               considered 4
                 environment
                 factors under
                 different scenes,
                 including
                 illumination,
                 occlusion, object
                 pixel size and
                 clutter, and defines
                 the difficulty levels
                 of each factor
                 explicitly.
                 The Cambridge-
                                                                                                                                                       Ga
                 driving Labeled                                                                               Object
                                         The dataset is labeled                                                                                        Bro
                 Video Database                                       over 700                                 recognition              [95][96][97]
CamVid                                   with semantic labels for                    Images                                      2008                  Sh
                 (CamVid) is a                                        images                                   and
                                         32 semantic classes.                                                                                          Fa
                 collection of                                                                                 classification
                                                                                                                                                       Ro
                 videos.
                                                                                                                                                       Oli
                                                                                                                                                       Ma
                 RailSem19 is a                                                                                Object
                                                                                                                                                       Mu
                 dataset for                                                                                   recognition
                                         The dataset is labeled                                                                                        Ma
                 understanding                                                                                 and                      [98][99]
RailSem19                                semantically and box-        8500           Images                                      2019                  Ze
                 scenes for vision                                                                             classification,
                                         wise.                                                                                                         Da
                 systems on                                                                                    scene
                                                                                                                                                       Ste
                 railways.                                                                                     recognition
                                                                                                                                                       Sa
                                                                                                                                                       Cs
                                                                                                                                                       Ke
                                                                                                                                                       Bu
                 BOREAS is a
                                                                                                                                                       J.
                 multi-season
                                                                                                                                                       Yu
                 autonomous driving
                                                                                                                                                       An
                 dataset. It includes
                                                                                                               Object                                  Ha
                  data from
                                                                                                               recognition                             Sh
                 a Velodyne Alpha-
                                         The data is annotated with   350 km of      Images, Lidar and Radar   and                      [100][101]     Jin
BOREAS           Prime (128-beam)                                                                                                2023
                                         3D bounding boxes.           driving data   data                      classification,                         We
                 lidar, a FLIR
                                                                                                               scene                                   Ts
                 Blackfly S camera,
                                                                                                               recognition                             La
                 a Navtech CIR304-
                                                                                                                                                       Y.K
                 H radar, and an
                                                                                                                                                       An
                 Applanix POS LV
                                                                                                                                                       Sc
                 GNSS-INS.
                                                                                                                                                       Tim
                                                                                                                                                       Ba
                                                                      5000
                                                                      images for
                                         The labeling includes        training and                                                                     Ka
Bosch Small
                 It is a dataset of      bounding boxes of traffic    a video                                  Traffic light            [102][103]     Be
Traffic Lights                                                                       Images                                      2017
                 traffic lights.         lights together with their   sequence of                              recognition                             No
Dataset
                                         state (active light).        8334                                                                             Bo
                                                                      frames for
                                                                      evaluation
                                                                                                                                                       Je
                                                                                                                                                       Nic
                                         The labeling includes                                                                                         Ré
                 It is a dataset of      bounding boxes of                                                     Railway                                 Ra
                                                                      more than                                                         [104][105]
FRSign           French railway          railway signals together                    Images                    signal            2020                  Ch
                                                                      100000
                 signals.                with their state (active                                              recognition                             Gr
                                         light).                                                                                                       Ro
                                                                                                                                                       Po
                                                                                                                                                       Ha
                                         The labeling includes
                                                                                                                                                       Ph
                 It is a dataset of      bounding boxes of                                                     Railway
                                                                                                                                        [106][107]     Fa
GERALD           German railway          railway signals together     5000           Images                    signal            2023
                                                                                                                                                       Ch
                 signals.                with their state (active                                              recognition
                                                                                                                                                       Sc
                                         light).
Multi-cue        Multi-cue onboard       The dataset is labeled       1092 image     Images                    Object            2009   [108]          Ch
pedestrian       pedestrian              box-wise.                    pairs with                               recognition                             Wo
                 detection dataset is                                 1776 boxes                               and                                     Wa
                 a dataset for                                        for                                      classification                          Sc
                                                                      pedestrians
                 detection of
                 pedestrians.
                                                                                                                                                                                    Tu
                 RAWPED is a                                                                                                                                                        Bu
                                                                                                                    Object
                 dataset for                                                                                                                                                        Be
                                         The dataset is labeled                                                     recognition                                    [109][110]
RAWPED           detection of                                        26000            Images                                           2020                                         Bu
                                         box-wise.                                                                  and
                 pedestrians in the                                                                                                                                                 Cu
                                                                                                                    classification
                 context of railways.                                                                                                                                               Gu
                                                                                                                                                                                    Alp
                 OSDaR23 is a
                                                                                                                                                                                    DZ
                 multi-sensory                                                                                      Object
                                                                                                                                                                                    Sc
                 dataset for             The dataset is labeled      16874            Images, Lidar, Radar and      recognition                                    [111][112]
OSDaR23                                                                                                                                2023                                         De
                 detection of objects    box-wise.                   frames           Infrared                      and
                                                                                                                                                                                    an
                 in the context of                                                                                  classification
                                                                                                                                                                                    Fu
                 railways.
                                                                                                                                                                                    Arg
                 Argoverse is a                                                                                     Object
                                                                                                                                                                                    Ca
                 multi-sensory                                                                                      recognition
                                                                                                                                                                                    Me
                 dataset for             The dataset is annotated    320 hours        Data from 7 cameras and       and                                            [113][114]
Argoverse                                                                                                                              2022                                         Un
                 detection of objects    box-wise.                   of recording     LiDAR                         classification,
                                                                                                                                                                                    Ge
                 in the context of                                                                                  object
                                                                                                                                                                                    Ins
                 roads.                                                                                             tracking
                                                                                                                                                                                    Te
               Artificially
               generated        Coordinates of
               data             lines drawn
Artificial
               describing       given as                                                                         Handwriting recognition,                  [115]
Characters                                         6000                             Text                                                       1992
               the structure    integers.                                                                        classification
Dataset
               of 10 capital    Various other
               English          features.
               letters.
               Online
               handwritten
               Chinese
               character
               database,
                                Provides the
               collected
CASIA-                          sequences of                                                                     Handwriting recognition,                  [119][118]
               using Anoto                         1,174,364                        Images, Text                                               2009
OLHWDB                          coordinates of                                                                   classification
               pen on paper.
                                strokes.
               3755 classes
               in the GB
               2312
               character
               set.
               Labeled
               samples of
                                3-dimensional
               pen tip
Character                       pen tip velocity
               trajectories                                                                                      Handwriting recognition,                  [120][121]
Trajectories                    trajectory         2858                             Text                                                       2008
               for people                                                                                        classification
Dataset                         matrix for each
               writing
                                sample
               simple
               characters.
               Character
               recognition in
               natural
                                                                                                                 Character recognition,
Chars74K       images of                                                                                                                                   [122]
                                                   74,107                                                        handwriting recognition,      2009
Dataset        symbols
                                                                                                                 OCR, classification
               used in both
               English and
               Kannada
                                Derived from
                                NIST Special
                                Database 19.                                                                                                               EMNIST dataset[124]
               Handwritten      Converted to                                                                     character recognition,
EMNIST         characters       28x28 pixel        800,000                          Images                       classification, handwriting   2016
dataset        from 3600        images,                                                                          recognition                               Documentation[125]
               contributors     matching the
                                MNIST
                                dataset.[123]
UJI Pen        Isolated         Coordinates of     11,640                           Text                         Handwriting recognition,      2009        [126][127]
Characters     handwritten      pen position as                                                                  classification
Dataset        characters        characters
                                 were written.
               Handwriting       Features
               samples           extracted from
               from the          images, split
Gisette                                                                                                             Handwriting recognition,                        [128]
               often-            into train/test,   13,500                          Images, text                                                     2003
Dataset                                                                                                             classification
               confused 4        handwriting
               and 9             images size-
               characters.       normalized.
               1623
               different
               handwritten
Omniglot                                                                                                            Classification, one-shot                        [129][130]
               characters        Hand-labeled.      38,300                          Images, text, strokes                                            2015
dataset                                                                                                             learning
               from 50
               different
               alphabets.
               Database of
MNIST                                                                                                                                                               [131][132]
               handwritten       Hand-labeled.      60,000                          Images, text                    Classification                   1994
database
               digits.
Optical
Recognition    Normalized        Size
of             bitmaps of        normalized and                                                                     Handwriting recognition,                        [133]
                                                    5620                            Images, text                                                     1998
Handwritten    handwritten       mapped to                                                                          classification
Digits         data.             bitmaps.
Dataset
Pen-Based
                                 Feature
Recognition    Handwritten
                                 vectors
of             digits on                                                                                            Handwriting recognition,                        [134][135]
                                 extracted to be    10,992                          Images, text                                                     1998
Handwritten    electronic                                                                                           classification
                                 uniformly
Digits         pen-tablet.
                                 spaced.
Dataset
                                 All handwritten
                                 digits have
Semeion
               Handwritten       been
Handwritten                                                                                                         Handwriting recognition,                        [136]
               digits from       normalized for     1593                            Images, text                                                     2008
Digit                                                                                                               classification
               80 people.        size and
Dataset
                                 mapped to the
                                 same grid.
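
Several rows above describe pen-trajectory data that has been resampled so that points are uniformly spaced (for example, the Pen-Based Recognition of Handwritten Digits Dataset). As a rough illustration only, the following Python sketch resamples a stroke to a fixed number of points equally spaced along its arc length. The function name `resample_uniform` and the choice of linear interpolation are assumptions made for this example; they are not taken from any of the listed datasets' own preprocessing code.

```python
import math


def resample_uniform(points, n):
    """Resample a polyline to n points equally spaced along its arc length.

    Hypothetical helper for illustration; uses linear interpolation
    between the recorded pen positions.
    """
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two input points and n >= 2")

    # Cumulative arc length at each recorded point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0:
        # Degenerate stroke: the pen never moved.
        return [points[0]] * n

    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


if __name__ == "__main__":
    stroke = [(0, 0), (4, 0), (4, 4)]  # an L-shaped stroke of length 8
    print(resample_uniform(stroke, 5))
```

Fixing the number of points per stroke in this way gives every sample a feature vector of the same length, which is what allows such trajectories to be fed to standard classifiers.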
Aerial images
                                                                                                                             Created
  Dataset name              Brief description           Preprocessing          Instances      Format      Default Task                   Reference                    Creator
                                                                                                                            (updated)
                                                                                                                                                            Aditya Arora,
                                                                                                                                                            Akshita Gupta,
                                                    Precise instance-level
                                                    annotation carried out by                             Aerial
iSAID: Instance                                     professional                                          Classification,                                   Salman Khan,
                                                                               655,451
Segmentation in                                     annotators, cross-                        Images,     Object                        [140][141]
                                                                               (15                                          2019
Aerial Images                                       checked and validated
                                                                               classes)
                                                                                              jpg, json   Detection,                                        Guolei Sun,
Dataset                                             by expert annotators                                  Instance
                                                    complying with well-                                  Segmentation                                      Fahad Shahbaz Khan,
                                                    defined guidelines.
                                                                                                                                                            Fan Zhu,
Aerial Image          80 high-resolution            Images manually            80             Images      Aerial            2013        [142][143]          J. Yuan et al.
Segmentation          aerial images with            segmented.                                            Classification,
Dataset               spatial resolution                                                                  object
                                                                                                          detection
                    ranging from 0.3 to
                    1.0.
                    Multiple labeled           Images manually
                                                                                            Images       People
                    training and evaluation    labeled to show paths                                                                     [144][145]
KIT AIS Data Set                                                           ~ 150            with         tracking,           2012                            M. Butenuth et al.
                    datasets of aerial         of individuals through
                                                                                            paths        aerial tracking
                    images of crowds.          crowds.
                    Maritime scenes of
                    optical aerial images
                    from the visible
                    spectrum. It contains
                    color images in
                                                                                                         Classification,
                    dynamic marine             Object bounding boxes                                                                     [148][149]
MASATI dataset                                                             7389             Images       aerial object       2018                            A.-J. Gallego et al.
                    environments; each         and labeling.
                                                                                                         detection
                    image may contain
                    one or multiple targets
                    in different weather
                    and illumination
                    conditions.
Forest Type         Satellite imagery of       Image wavelength                                                                          [150][151]
                                                                           326              Text         Classification      2015                            B. Johnson
Mapping Dataset     forests in Japan.          bands extracted.
                                               Over 30 annotations
                    Annotated overhead         and over 60 statistics
Overhead Imagery                                                                            Images,                                      [152][153]
                    imagery. Images with       that describe the target    1000                          Classification      2009                            F. Tanner et al.
Research Data Set                                                                           text
                    multiple objects.          within the context of
                                               the image.
                    SpaceNet is a corpus
                                               GeoTiff and GeoJSON                                       Classification,
                    of commercial satellite                                                                                              [154][155][156]
SpaceNet                                       files containing building   >17533           Images       Object              2017                            DigitalGlobe, Inc.
                    imagery and labeled
                                               footprints.                                               Identification
                    training data.
Underwater images
                                                                                                                                          Created
    Dataset name             Brief description                 Preprocessing                Instances      Format         Default Task                     Reference            Creator
                                                                                                                                         (updated)
Other images
| Dataset name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| NRC-GAMMA | A novel benchmark gas meter image dataset | None | 28,883 | Image, Label | Classification | 2021 | [162][163] | A. Ebadi, P. Paul, S. Auer, & S. Tremblay |
| The SUPATLANTIQUE dataset | Images of scanned official and Wikipedia documents | None | 4908 | TIFF/pdf | Source device identification, forgery detection, classification, ... | 2020 | [164] | C. Ben Rabah et al. |
| StanfordExtra Dataset | 2D keypoints and segmentations for the Stanford Dogs Dataset. | 2D keypoints and segmentations provided. | 12,035 | Labelled images | 3D reconstruction/pose estimation | 2020 | [173] | B. Biggs et al. |
| The Oxford-IIIT Pet Dataset | 37 categories of pets with roughly 200 images of each. | Breed labeled, tight bounding box, foreground-background segmentation. | ~ 7,400 | Images, text | Classification, object detection | 2012 | [172][174] | O. Parkhi et al. |
| Online Video Characteristics and Transcoding Time Dataset. | Transcoding times for various different videos and video properties. | Video features given. | 168,286 | Text | Regression | 2015 | [177] | T. Deneke et al. |
| Microsoft Sequential Image Narrative Dataset (SIND) | Dataset for sequential vision-to-language | Descriptive caption and storytelling given for each photo, and photos are arranged in sequences | 81,743 | Images, text | Visual storytelling | 2016 | [178] | Microsoft Research |
| Discrete LIRIS-ACCEDE | Short videos annotated for valence and arousal. | Valence and arousal labels. | 9800 | Video | Video emotion elicitation detection | 2015 | [185] | Y. Baveye et al. |
| LILA BC | Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science. | None | ~10M images | Images | Classification | 2019 | [193] | LILA working group |
Text data
These datasets consist primarily of text for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis.
Reviews
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| Amazon reviews | US product reviews from Amazon.com. | None. | 233.1 million | Text | Classification, sentiment analysis | 2015 (2018) | [197][198] | McAuley et al. |
| Car Evaluation Data Set | Car properties and their overall acceptability. | Six categorical features given. | 1728 | Text | Classification | 1997 | [204][205] | M. Bohanec |
| YouTube Comedy Slam Preference Dataset | User vote data for pairs of videos shown on YouTube. Users voted on funnier videos. | Video metadata given. | 1,138,562 | Text | Classification | 2012 | [206][207] | Google |
| Vietnamese Social Media Emotion Corpus (UIT-VSMEC) | Users’ Facebook comments. | Comments | 6,927 | Text | Classification | 2019 | [212] | Nguyen et al. |
| Vietnamese Open-domain Complaint Detection dataset (ViOCD) | Customer product reviews | Comments | 5,485 | Text | Classification | 2021 | [213] | Nguyen et al. |
| ViHOS: Hate Speech Spans Detection for Vietnamese | Social Media Texts | Comments | Containing 26k spans on 11k comments | Text | Span Detection | 2021 | [214] | Hoang et al. |
News articles
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| The Irish Times Ireland News Corpus | 24 Years of Ireland News from 1996 to 2019 | Publish time, Headline Category and Text | 1,484,340 | CSV | NLP, Computational Linguistics, Events | 2020 | [225] | R. Kulkarni |
Messages
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| Enron Email Dataset | Emails from employees at Enron organized into folders. | Attachments removed, invalid email addresses converted to user@enron.com or no_address@enron.com. | ~ 500,000 | Text | Network analysis, sentiment analysis | 2004 (2015) | [227][228] | Klimt, B. and Y. Yang |
| Twenty Newsgroups Dataset | Messages from 20 different newsgroups. | None. | 20,000 | Text | Natural language processing | 1999 | [233] | T. Mitchell et al. |
| Spambase Dataset | Spam emails. | Many text features extracted. | 4,601 | Text | Spam detection, classification | 1999 | [234] | M. Hopkins et al. |
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| SNAP Social Circles: Twitter Database | Large Twitter network data. | Node features, circles, and ego networks. | 1,768,149 | Text | Clustering, graph analysis | 2012 | [242][243] | J. McAuley et al. |
| Twitter Dataset for Arabic Sentiment Analysis | Arabic tweets. | Samples hand-labeled as positive or negative. | 2000 | Text | Classification | 2014 | [244][245] | N. Abdulla |
| Dutch Social media collection | COVID-19 tweets made by Dutch speakers or by users from the Netherlands. The data has been machine labeled. | Classified for sentiment; tweet text and user description translated to English. Industry mentions are extracted. | 271,342 | JSONL | Sentiment, multi-label classification, machine translation | 2020 | [252][253][254] | Aaaksh Gupta, CoronaWhy |
Dialogues
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| NPS Chat Corpus | Posts from age-specific online chat rooms. | Hand privacy masked, tagged for part of speech and dialogue-act. | ~ 500,000 | XML | NLP, programming, linguistics | 2007 | [255] | Forsyth, E., Lin, J., & Martell, C. |
| Reddit All Comments Corpus | All Reddit comments (as of 2015). |  | ~ 1.7 billion | JSON | NLP, research | 2015 | [259] | Stuck_In_the_Matrix |
| Ubuntu Dialogue Corpus | Dialogues extracted from Ubuntu chat stream on IRC. |  | 930 thousand dialogues, 7.1 million utterances | CSV | Dialogue Systems Research | 2015 | [260] | Lowe, R. et al. |
| Dialog State Tracking Challenge | The Dialog State Tracking Challenges 2 & 3 (DSTC2&3) were research challenges focused on improving the state of the art in tracking the state of spoken dialog systems. | Transcription of spoken dialogs with labelling | DSTC2 contains ~3.2k calls; DSTC3 contains ~2.3k calls | JSON | Dialogue state tracking | 2014 | [261] | Henderson, Matthew and Thomson, Blaise and Williams, Jason D |
Legal
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
Other text
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| Web of Science Dataset | Hierarchical Datasets for Text Classification | None. | 46,985 | Text | Classification, Categorization | 2017 | [266][267] | K. Kowsari et al. |
| Legal Case Reports | Federal Court of Australia cases from 2006 to 2009. | None. | 4,000 | Text | Summarization, citation analysis | 2012 | [268][269] | F. Galgani et al. |
| Dataset for the Machine Comprehension of Text | Stories and associated questions for testing comprehension of text. | None. | 660 | Text | Natural language processing, machine comprehension | 2013 | [274][275] | M. Richardson et al. |
| The Penn Treebank Project | Naturally occurring text annotated for linguistic structure. | Text is parsed into syntactic trees. | ~ 1M words | Text | Natural language processing, summarization | 1995 | [276][277] | M. Marcus et al. |
| DEXTER Dataset | Task given is to determine, from features given, which articles are about corporate acquisitions. | Features extracted include word stems. Distractor features included. | 2600 | Text | Classification | 2008 | [278] | Reuters |
| Personae Corpus | Collected for experiments in Authorship Attribution and Personality Prediction. Consists of 145 Dutch-language essays. | In addition to normal texts, syntactically annotated texts are given. | 145 | Text | Classification, regression | 2008 | [281][282] | K. Luyckx et al. |
| PushShift | Archives of social media websites, including Reddit, Twitter, and Hacker News. | Text extracted and normalized from WARCs | ~100,000,000 posts | JSON | NLP, sentiment, linguistics | 2022 | [283][284] | J. Baumgartner |
| CNAE-9 Dataset | Categorization task for free text descriptions of Brazilian companies. | Word frequency has been extracted. | 1080 | Text | Classification | 2012 | [285][286] | P. Ciarelli et al. |
| Sentiment Labeled Sentences Dataset | 3000 sentiment labeled sentences. | Sentiment of each sentence has been hand labeled as positive or negative. | 3000 | Text | Classification, sentiment analysis | 2015 | [287][288] | D. Kotzias |
| BlogFeedback Dataset | Dataset to predict the number of comments a post will receive based on features of that post. | Many features of each post extracted. | 60,021 | Text | Regression | 2014 | [289][290] | K. Buza |
| Stanford Natural Language Inference (SNLI) Corpus | Image captions matched with newly constructed sentences to form entailment, contradiction, or neutral pairs. | Entailment class labels, syntactic parsing by the Stanford PCFG parser | 570,000 | Text | Natural language inference/recognizing textual entailment | 2015 | [291] | S. Bowman et al. |
| DSL Corpus Collection (DSLCC) | A multilingual collection of short excerpts of journalistic texts in similar languages and dialects. | None | 294,000 phrases | Text | Discriminating between similar languages | 2017 | [292] | Tan, Liling et al. |
| Urban Dictionary Dataset | Corpus of words, votes and definitions | User names anonymised | 2,580,925 | CSV | NLP, Machine comprehension | May 2016 | [293] | Anonymous |
                                                                                      JSON
                                                                                      and NIF
                     Wikipedia abstracts      Alignment of Wikidata                   [2] (http
                                                                       11M aligned                                                         [294]             H. Elsahar et
T-REx                aligned with Wikidata    triples with Wikipedia                  s://hady    NLP, Relation Extraction      2018
                                                                       triples                                                                               al.
                     entities                 abstracts                               elsahar.
                                                                                      github.i
                                                                                      o/t-rex/)
                                                                       ~1M
General Language
                     Benchmark of nine                                 sentences                                                           [295][296][297]
Understanding                                 Various                                             NLU                           2018                         Wang et al.
                     tasks                                             and sentence
Evaluation (GLUE)
                                                                       pairs
Contract
Understanding                                                                                                                                                The Atticus
Atticus Dataset      Dataset of legal                                                 CSV                                                                    Project (http
                                                                       ~13,000                    Natural language
(CUAD) (formerly     contracts with rich                                              and                                       2021                         s://www.atticu
                                                                       labels                     processing, QnA
known as Atticus     expert annotations                                               PDF                                                                    sprojectai.org/
Open Contract                                                                                                                                                cuad)
Dataset (AOK))
                                                                       26,850
Vietnamese
                     Vietnamese Names                                  Vietnamese
Names annotated                                                                                   Natural language                         [299]
                     annotated with                                    full names     CSV                                       2020                         To et al.
with Genders (UIT-                                                                                processing
                     Genders                                           annotated
ViNames)
                                                                       with genders
                                                                       10,000
                                                                       Vietnamese
Vietnamese
                     Vietnamese                                        users'
Constructive and
                     Constructive and                                  comments on                Natural language                         [300]
Toxic Speech                                                                          CSV                                       2021                         Nguyen et al.
                     Toxic Speech                                      online                     processing
Detection Dataset
                     Detection Dataset                                 newspapers
(UIT-ViCTSD)
                                                                       on 10
                                                                       domains
Sound data
These datasets consist of sounds and sound features used for tasks such as speech recognition and speech synthesis.
Speech
                                                                                                                                       Created
    Dataset Name           Brief description              Preprocessing             Instances          Format        Default Task                  Reference         Creator
                                                                                                                                      (updated)
                                                                                    English:
                                                                                                                   Unsupervised
                                                                                    5h, 12
 Zero Resource          Spontaneous speech                                                                         discovery of
                                                                                    speakers;      WAV (audio                                      [301][302]    Versteegh et
 Speech Challenge       (English), Read speech       None, raw WAV files.                                          speech             2015
                                                                                    Xitsonga:      only)                                                         al.
 2015                   (Xitsonga).                                                                                features/subword
                                                                                    2h30, 24
                                                                                                                   units/word units
                                                                                    speakers
                        Recordings of 630
                        speakers of eight major
                                                                                                                   Speech
                        dialects of American         Speech is lexically and                                                                       [313][314]    J. Garofolo et
 TIMIT                                                                              6300           Text            recognition,       1986
                        English, each reading ten    phonemically transcribed.                                                                                   al.
                                                                                                                   classification.
                        phonetically rich
                        sentences.
                                                                                                                   Speech
                        A single-speaker, Modern
                                                                                                                   Synthesis,
                        Standard Arabic (MSA)
                                                     Speech is                                                     Speech
                        speech corpus with
 Arabic Speech                                       orthographically and                                          Recognition,                    [315]
                        phonetic and                                                ~1900          Text, WAV                          2016                       N. Halabi
 Corpus                                              phonetically transcribed                                      Corpus
                        orthographic transcripts
                                                     with stress marks.                                            Alignment,
                        aligned to phoneme
                                                                                                                   Speech Therapy,
                        level.
                                                                                                                   Education.
                        A public domain
                        database of                                                 English:       MP3 with                           2017 June
                                                     Validation by other users.                                    Speech                          [316]
 Common Voice           crowdsourced data                                           1,118          corresponding                      (2019                      Mozilla
                                                                                                                   recognition
                        across a wide range of                                      hours          text files                         December)
                        dialects.
                        A single-speaker corpus
                        of English public-domain     Quality check,
                                                                                                                   Speech                          [317]         Keith Ito,
 LJSpeech               audiobook recordings,        normalized transcription       13,100         CSV, WAV                           2017
                                                                                                                   synthesis                                     Linda Johnson
                        split into short clips at    alongside the original.
                        punctuation marks.
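Several speech corpora in the table above ship transcripts next to audio clips (for example the "CSV, WAV" format with a normalized transcription alongside the original). As a rough illustration only, the sketch below reads such a metadata file. The pipe delimiter, the three-column layout (clip id, original text, normalized text), the `wavs` directory name, and the sample rows are all assumptions made for this example; consult each corpus's own documentation for its real layout.

```python
import csv
import io
from typing import NamedTuple


class Clip(NamedTuple):
    clip_id: str
    text: str
    normalized_text: str
    wav_path: str


def read_metadata(handle, wav_dir="wavs"):
    """Parse pipe-delimited rows of the form: id|original text|normalized text."""
    # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    clips = []
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed lines rather than guessing their meaning
        clip_id, text, normalized = row
        clips.append(Clip(clip_id, text, normalized, f"{wav_dir}/{clip_id}.wav"))
    return clips


# Hypothetical two-row sample, not real corpus content.
sample = io.StringIO(
    'clip-0001|He paid $5 in 1900.|He paid five dollars in nineteen hundred.\n'
    'clip-0002|She said "yes".|She said "yes".\n'
)
clips = read_metadata(sample)
```

Keeping both text fields lets a synthesis pipeline train on the normalized form while still displaying the original.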
Music
                                                                                                                                       Created
     Dataset Name             Brief description                Preprocessing               Instances      Format     Default Task                 Reference         Creator
                                                                                                                                      (updated)
                         Audio features of music                                                                   Geographic
 Geographic Origin of                                   Audio features extracted                                                                  [318][319]
                         samples from different                                            1,059          Text     classification,    2014                      F. Zhou et al.
 Music Data Set                                         using MARSYAS software.
                         locations.                                                                                clustering
Other sounds
                                                                                                                                             Created
   Dataset Name           Brief description           Preprocessing         Instances                 Format              Default Task                    Reference         Creator
                                                                                                                                            (updated)
                       10-second sound
                       snippets from             128-d PCA'd VGG-ish
                                                                                           Text (CSV) and TensorFlow                                      [328]          J. Gemmeke
 AudioSet              YouTube videos, and       features every 1           2,084,320                                    Classification     2017
                                                                                           Record files                                                                  et al., Google
                       an ontology of over       second.
                       500 labels.
                                                                                                                                                                         Queen Mary
                       Audio from
                                                                                                                                                                         University
 Bird Audio            environmental
                                                                                                                                            2016          [329][330]     and IEEE
 Detection             monitoring stations,                                 17,000+                                      Classification
                                                                                                                                            (2018)                       Signal
 challenge             plus crowdsourced
                                                                                                                                                                         Processing
                       recordings
                                                                                                                                                                         Society
Signal data
These datasets contain electrical signal data that requires signal processing before further analysis.
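One common form of such signal processing is to cut a raw signal into fixed-size windows and compute summary statistics per window, which then serve as inputs to a classifier or regressor. The following is a minimal sketch of that idea, not the preprocessing used by any dataset listed here; the window size, step, chosen statistics, and synthetic signal are illustrative assumptions.

```python
import math


def sliding_windows(samples, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step`."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def window_features(window):
    """Return (mean, standard deviation, peak-to-peak) for one window."""
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    return mean, math.sqrt(variance), max(window) - min(window)


# Synthetic signal for illustration only: a flat segment, then an oscillation.
signal = [0.0] * 4 + [1.0, -1.0, 1.0, -1.0]
features = [window_features(w) for w in sliding_windows(signal, size=4, step=4)]
```

Here the two windows share the same mean but differ in spread, which is the kind of contrast a downstream model can use to tell a resting segment from an active one.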
Electrical
                                                                                                                                           Created
     Dataset Name                Brief description                  Preprocessing            Instances     Format      Default Task                     Reference          Creator
                                                                                                                                          (updated)
                                                            Levels of various
                            Data covering the nonlinear
                                                            components as a function                                                                    [340][341]
 Servo Dataset              relationships observed in a                                      167          Text         Regression         1993                         K. Ullrich
                                                            of other components are
                            servo-amplifier circuit.
                                                            given.
Motion-tracking
                                                                                                                                          Created
     Dataset Name               Brief description                  Preprocessing           Instances     Format     Default Task                        Reference          Creator
                                                                                                                                         (updated)
                           10 normal and 10
                           aggressive physical
 Vicon Physical Action                                     Many parameters recorded                                                                   [350][351]
                           actions that measure the                                        3000          Text       Classification       2011                          T. Theodoridis
 Data Set                                                  by 3D tracker.
                           human activity tracked by
                           a 3D tracker.
                                                         Many sensors given, no
Daily and Sports         Motor sensor data for 19                                                                                               [352][353]        B. Barshan et
                                                         preprocessing done on         9120          Text       Classification     2013
Activities Dataset       daily and sports activities.                                                                                                             al.
                                                         signals.
                         Gyroscope and
Human Activity           accelerometer data from         Actions performed are
                                                                                                                                                [354][355]        J. Reyes-Ortiz
Recognition Using        people wearing                  labeled, all signals          10,299        Text       Classification     2012
                                                                                                                                                                  et al.
Smartphones Dataset      smartphones and                 preprocessed for noise.
                         performing normal actions.
Weight Lifting
                         Five variations of the
Exercises monitored                                      Some statistics calculated                                                             [358][359]        W. Ugulino et
                         biceps curl exercise                                          39,242        Text       Classification     2013
with Inertial                                            from raw data.                                                                                           al.
                         monitored with IMUs.
Measurement Units
                         Two databases of surface
sEMG for Basic Hand                                                                                                                             [360][361]        C. Sapsanis et
                         electromyographic signals       None.                         3000          Text       Classification     2014
movements Dataset                                                                                                                                                 al.
                         of 6 hand movements.
                         Evaluate techniques
                         dealing with the effects of
REALDISP Activity                                                                                                                               [361][362]
                         sensor displacement in          None.                         1419          Text       Classification     2014                           O. Banos et al.
Recognition Dataset
                         wearable activity
                         recognition.
                         18 different types of
PAMAP2 Physical
                         physical activities                                                                                                    [367]
Activity Monitoring                                      None.                         3,850,505     Text       Classification     2012                           A. Reiss
                         performed by 9 subjects
Dataset
                         wearing 3 IMUs.
                         Human Activity
                         Recognition from wearable,
OPPORTUNITY              object, and ambient
                                                                                                                                                [368][369]        D. Roggen et
Activity Recognition     sensors is a dataset            None.                         2551          Text       Classification     2012
                                                                                                                                                                  al.
Dataset                  devised to benchmark
                         human activity recognition
                         algorithms.
                         Human Activity
                         Recognition from wearable
                         devices. Distinguishes                                        3,150,000
Real World Activity                                                                                                                             [370]
                         between seven on-body           None.                         (per          Text       Classification     2016                           T. Sztyler et al.
Recognition Dataset
                         device positions and                                          sensor)
                         comprises six different
                         kinds of sensors.
                                                                                       10 healthy
                         3D human pose estimates                                       people and
                         (Kinect) of stroke patients                                   9 stroke
Toronto Rehab Stroke     and healthy participants                                      survivors                                                [371][372][373]   E. Dolatabadi
                                                         None.                                       CSV        Classification     2017
Pose Dataset             performing a set of tasks                                     (3500–                                                                     et al.
                         using a stroke rehabilitation                                 6000
                         robot.                                                        frames per
                                                                                       person)
Other signals
                                                                                                                                     Created
    Dataset Name               Brief description                 Preprocessing           Instances     Format     Default Task                   Reference            Creator
                                                                                                                                    (updated)
Physical data
Datasets from physical systems.
High-energy physics
                                                                                                                                       Created
     Dataset Name             Brief description                  Preprocessing             Instances     Format     Default Task                    Reference            Creator
                                                                                                                                      (updated)
                          Monte Carlo simulations of
                                                           28 features of each                                                                     [380][381][382]
 HIGGS Dataset            particle accelerator                                             11M           Text       Classification    2014                           D. Whiteson
                                                           collision are given.
                          collisions.
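With around 11 million text rows of 28 features each, a dataset of this size is usually read as a stream rather than loaded whole. The sketch below shows one way to do that, assuming a comma-separated layout with the class label in the first column followed by the 28 features; that column order and the synthetic rows are assumptions for illustration and should be checked against the dataset's documentation.

```python
N_FEATURES = 28  # feature count stated in the table above


def parse_row(line):
    """Split one comma-separated line into (label, features)."""
    values = [float(v) for v in line.strip().split(",")]
    if len(values) != N_FEATURES + 1:
        raise ValueError(f"expected {N_FEATURES + 1} columns, got {len(values)}")
    # Assumption: the class label is the first column.
    return int(values[0]), values[1:]


def stream_rows(lines):
    """Lazily parse rows so a multi-million-row file never sits in memory."""
    for line in lines:
        if line.strip():  # ignore blank lines
            yield parse_row(line)


# Synthetic rows for illustration only (not real simulation output).
fake_lines = [
    "1," + ",".join(["0.5"] * N_FEATURES),
    "0," + ",".join(["-1.25"] * N_FEATURES),
]
rows = list(stream_rows(fake_lines))
```

Because `stream_rows` accepts any iterable of lines, an open file handle can be passed in directly in place of the list.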
Systems
                                                                                                                                        Created
     Dataset Name              Brief description                  Preprocessing             Instances     Format     Default Task                   Reference            Creator
                                                                                                                                       (updated)
 Yacht Hydrodynamics      Yacht performance based           Six features are given for                                                              [384][385]
                                                                                            308          Text       Regression         2013                          R. Lopez
 Dataset                  on dimensions.                    each yacht.
                          A series of aerodynamic
 Airfoil Self-Noise       and acoustic tests of two-        Data about frequency, angle                                                             [394]
                                                                                            1503         Text       Regression         2014                          R. Lopez
 Dataset                  and three-dimensional airfoil     of attack, etc., are given.
                          blade sections.
Astronomy
                                                                                                                                        Created
     Dataset Name              Brief description                   Preprocessing             Instances     Format    Default Task                   Reference            Creator
                                                                                                                                       (updated)
 Volcanoes on Venus –
                          Venus images returned by          Images are labeled by                                                                   [398][399]
 JARtool experiment                                                                          not given    Images     Classification    1991                          M. Burl
                          the Magellan spacecraft.          humans.
 Dataset
                          Monte Carlo generated high-       Numerous features
 MAGIC Gamma                                                                                                                                        [399][400]
                          energy gamma particle             extracted from the               19,020       Text       Classification    2007                          R. Bock
 Telescope Dataset
                          events.                           simulations.
                          Measurements of the
                          number of certain types of        Many solar flare-specific                                Regression,                    [401]
 Solar Flare Dataset                                                                         1389         Text                         1989                          G. Bradshaw
                          solar flare events occurring      features are given.                                      classification
                          in a 24-hour period.
Earth science
                                                                                                                                        Created
     Dataset Name              Brief description                   Preprocessing             Instances     Format    Default Task                   Reference            Creator
                                                                                                                                       (updated)
 Volcanoes of the World   Volcanic eruption data for all    Details such as region,          1535         Text       Regression,       2013         [403]            E. Venzke et al.
                          known volcanic events on          subregion, tectonic setting,                             classification
                          Earth.                            and dominant rock type are
                                                            given.
                       Catchment hydrology
                       dataset with                                                              CSV,
                                                                                                                                          [408]        C. Alvarez-
CAMELS-Chile           hydrometeorological            see Reference                  516         Text,       Regression       2018
                                                                                                                                                       Garreton et al.
                       timeseries and various                                                    Shapefile
                       attributes
                       Catchment hydrology
                       dataset with                                                              CSV,
CAMELS-Brazil          hydrometeorological            see Reference                  897         Text,       Regression       2020        [409]        V. Chagas et al.
                       timeseries and various                                                    Shapefile
                       attributes
                       Catchment hydrology
                       dataset with                                                              CSV,
CAMELS-GB              hydrometeorological            see Reference                  671         Text,       Regression       2020        [410]        G. Coxon et al.
                       timeseries and various                                                    Shapefile
                       attributes
                       Catchment hydrology
                       dataset with                                                              CSV,
CAMELS-Australia       hydrometeorological            see Reference                  222         Text,       Regression       2021        [411]        K. Fowler et al.
                       timeseries and various                                                    Shapefile
                       attributes
                       Catchment hydrology
                       dataset with                                                              CSV,
LamaH-CE               hydrometeorological            see Reference                  859         Text,       Regression       2021        [412]        C. Klingler et al.
                       timeseries and various                                                    Shapefile
                       attributes
Other physical
                                                                                                                               Created
    Dataset Name            Brief description               Preprocessing            Instances   Format      Default Task                 Reference        Creator
                                                                                                                              (updated)
                       Dataset of concrete
Concrete Compressive                                  Nine features are given for                                                         [413][414]
                       properties and compressive                                    1030        Text        Regression       2007                     I. Yeh
Strength Dataset                                      each sample.
                       strength.
Concrete Slump Test    Concrete slump flow given      Features of concrete given                                                          [415][416]
                                                                                     103         Text        Regression       2009                     I. Yeh
Dataset                in terms of properties.        such as fly ash, water, etc.
                       Predict if a molecule, given                                                                                                    Arris
                                                      168 features given for each                                                         [417]
Musk Dataset           the features, will be a musk                                  6598        Text        Classification   1994                     Pharmaceutical
                                                      molecule.
                       or a non-musk.                                                                                                                  Corp.
                                                                                                                                                       Semeion
Steel Plates Faults    Steel plates of 7 different    27 features given for each                                                          [418]
                                                                                     1941        Text        Classification   2010                     Research
Dataset                types.                         sample.
                                                                                                                                                       Center
Biological data
Datasets from biological systems.
Human
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| Age Dataset | A structured general-purpose dataset on life, work, and death of 1.22 million distinguished people. Public domain. | A five-step method to infer birth and death years, gender, and occupation from community-submitted data to all language versions of the Wikipedia project. | 1,223,009 | Text | Regression, Classification | 2022 | Paper[419] Dataset[420] | Amoradnejad et al. |
| National Survey on Drug Use and Health | Large scale survey on health and drug use in the United States. | None. | 55,268 | Text | Classification, regression | 2012 | [430] | United States Department of Health and Human Services |
| Diabetes 130-US hospitals for years 1999–2008 Dataset | 9 years of readmission data across 130 US hospitals for patients with diabetes. | Many features of each readmission are given. | 100,000 | Text | Classification, clustering | 2014 | [435][436] | J. Clore et al. |
| Diabetic Retinopathy Debrecen Dataset | Features extracted from images of eyes with and without diabetic retinopathy. | Features extracted and conditions diagnosed. | 1151 | Text | Classification | 2014 | [437][438] | B. Antal et al. |
| Diabetic Retinopathy Messidor Dataset | Methods to evaluate segmentation and indexing techniques in the field of retinal ophthalmology (MESSIDOR) | Features retinopathy grade and risk of macular edema | 1200 | Images, Text | Classification, Segmentation | 2008 | [439][440] | Messidor Project |
| Liver Disorders Dataset | Data for people with liver disorders. | Seven biological features given for each patient. | 345 | Text | Classification | 1990 | [441][442] | Bupa Medical Research Ltd. |
| Thyroid Disease Dataset | 10 databases of thyroid disease patient data. | None. | 7200 | Text | Classification | 1987 | [443][444] | R. Quinlan |
| Mesothelioma Dataset | Mesothelioma patient data. | Large number of features, including asbestos exposure, are given. | 324 | Text | Classification | 2016 | [445][446] | A. Tanrikulu et al. |
| Parkinson's Vision-Based Pose Estimation Dataset | 2D human pose estimates of Parkinson's patients performing a variety of tasks. | Camera shake has been removed from trajectories. | 134 | Text | Classification, regression | 2017 | [447][448][449] | M. Li et al. |
| KEGG Metabolic Reaction Network (Undirected) Dataset | Network of metabolic pathways. A reaction network and a relation network are given. | Detailed features for each network node and pathway are given. | 65,554 | Text | Classification, clustering, regression | 2011 | [450] | M. Naeem et al. |
Animal
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| Abalone Dataset | Physical measurements of Abalone. Weather patterns and location are also given. | None. | 4177 | Text | Regression | 1995 | [453] | Marine Research Laboratories – Taroona |
| Splice-junction Gene Sequences Dataset | Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. | None. | 3190 | Text | Classification | 1992 | [432] | G. Towell et al. |
| Mice Protein Expression Dataset | Expression levels of 77 proteins measured in the cerebral cortex of mice. | None. | 1080 | Text | Classification, Clustering | 2015 | [457][458] | C. Higuera et al. |
Fungi
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
Plant
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| Forest Fires Dataset | Forest fires and their properties. | 13 features of each fire are extracted. | 517 | Text | Regression | 2008 | [462][463] | P. Cortez et al. |
| Seeds Dataset | Measurements of geometrical properties of kernels belonging to three different varieties of wheat. | None. | 210 | Text | Classification, clustering | 2012 | [469][470] | Charytanowicz et al. |
Microbe
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| Yeast Dataset | Predictions of cellular localization sites of proteins. | Eight features given per instance. | 1484 | Text | Classification | 1996 | [484][485] | K. Nakai et al. |
Drug discovery
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| Tox21 Dataset | Prediction of outcome of biological assays. | Chemical descriptors of molecules are given. | 12707 | Text | Classification | 2016 | [486] | A. Mayr et al. |
Anomaly data
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator |
| DBpedia Neural Question Answering (DBNQA) Dataset | A large collection of question-to-SPARQL pairs specially designed for open-domain neural question answering over the DBpedia knowledge base. | This dataset contains a large collection of Open Neural SPARQL Templates and instances for training Neural SPARQL Machines; it was pre-processed by semi-automatic annotation tools as well as by three SPARQL experts. | 894,499 | Question-query pairs | Question Answering | 2018 | [491][492] | Hartmann, Soru, and Marx et al. |
| Vietnamese Question Answering Dataset (UIT-ViQuAD) | A large collection of Vietnamese questions for evaluating MRC models. | This dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages of 174 Vietnamese articles from Wikipedia. | 23,074 | Question-answer pairs | Question Answering | 2020 | [493] | Nguyen et al. |
| Vietnamese Multiple-Choice Machine Reading Comprehension Corpus (ViMMRC) | A collection of Vietnamese multiple-choice questions for evaluating MRC models. | This corpus includes 2,783 Vietnamese multiple-choice questions. | 2,783 | Question-answer pairs | Question Answering/Machine Reading Comprehension | 2020 | [494] | Nguyen et al. |
| Taskmaster | "The Taskmaster corpus consists of THREE datasets, Taskmaster-1 (TM-1), Taskmaster-2 (TM-2), and Taskmaster-3 (TM-3), comprising over 55,000 spoken and written task-oriented dialogs in over a dozen domains."[497] | Taskmaster-1: goal-oriented conversational dataset. It includes 13,215 task-based dialogs comprising six domains. Taskmaster-2: 17,289 dialogs in the seven domains (restaurants, food ordering, movies, hotels, flights, music and sports). Taskmaster-3: 23,757 movie ticketing dialogs. | | Taskmaster-1 and Taskmaster-2: conversation id, utterances, instruction id. Taskmaster-3: conversation id, utterances, vertical, scenario, instructions. For further details check the project's GitHub repository (https://github.com/google-research-datasets/Taskmaster) or the Hugging Face dataset cards (taskmaster-1 (https://huggingface.co/datasets/taskmaster1), taskmaster-2 (https://huggingface.co/datasets/taskmaster2), taskmaster-3 (https://huggingface.co/datasets/taskmaster3)). | Dialog/Instruction prompted | 2019 | [498] | Byrne and Krishnamoorthi et al. |
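The Format column of the Taskmaster entry lists a conversation id, an instruction id, and a list of utterances for Taskmaster-1 and Taskmaster-2. A minimal Python sketch of reading records with that shape follows. The key names (`conversation_id`, `instruction_id`, `utterances`, `speaker`, `text`) and the sample dialog are assumptions made for illustration; they are not copied from the released files, so check the project's repository before relying on them.

```python
import json
from collections import Counter

# Hypothetical record shaped after the Format column (conversation id,
# utterances, instruction id). Key names and sample text are assumptions
# for illustration, not taken from the released Taskmaster files.
SAMPLE = """
[
  {
    "conversation_id": "dlg-0001",
    "instruction_id": "movie-1",
    "utterances": [
      {"index": 0, "speaker": "USER", "text": "Two tickets for tonight, please."},
      {"index": 1, "speaker": "ASSISTANT", "text": "Which theater do you prefer?"},
      {"index": 2, "speaker": "USER", "text": "The one downtown."}
    ]
  }
]
"""


def summarize(dialogs):
    """Return (conversation id, instruction id, turns per speaker) for each dialog."""
    rows = []
    for dialog in dialogs:
        # Count how many utterances each speaker contributed.
        turns = Counter(u["speaker"] for u in dialog.get("utterances", []))
        rows.append(
            (dialog["conversation_id"], dialog.get("instruction_id"), dict(turns))
        )
    return rows


if __name__ == "__main__":
    for row in summarize(json.loads(SAMPLE)):
        print(row)
```

The same function could be pointed at a real file by replacing `json.loads(SAMPLE)` with `json.load(open(path))`, provided the actual key names match the assumed ones.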
Cybersecurity
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference |
| CVE | CVE is a list of publicly disclosed cybersecurity vulnerabilities that is free to search, use, and incorporate into products and services. | | | Data can be downloaded from: Allitems (https://cve.mitre.org/data/downloads/allitems.csv) | | | [506] |
| CWE | Common Weakness Enumeration data. | | | Software Development (https://cwe.mitre.org/data/csv/699.csv.zip), Hardware Design (https://cwe.mitre.org/1194.csv.zip), Research Concepts (https://cwe.mitre.org/data/csv/1000.csv.zip) | | | [507] |
| | | | | 2009 (https://www.usenix.org/legacy/events/sec09/tech/), 2010 (https://www.usenix.org/legacy/events/sec10/tech/), 2011 (https://static.usenix.org/event/sec11/tech/), 2012 (https://www.usenix.org/conference/usenixsecurity12/technical-sessions), 2013 (https://www.usenix.org/conference/usenixsecurity13/technical-sessions), 2014 (https://www.usenix.org/conference/usenixsecurity14/technical-sessions), 2015 (https://www.usenix.org/conference/usenixsecurity15/technical-sessions), 2016 (https://www.usenix.org/conference/usenixsecurity16/technical-sessions), 2017 (https://www.usenix.org/conference/usenixsecurity17/technical-sessions), 2018 (https://www.usenix.org/conference/usenixsecurity18/technical-sessions), 2019 (https://www.usenix.org/conference/usenixsecurity19/technical-sessions), 2020 (https://www.usenix.org/conference/usenixsecurity20/technical-sessions), 2021 (https://www.usenix.org/conference/usenixsecurity21/technical-sessions), 2022 (https://www.usenix.org/conference/usenixsecurity22/technical-sessions). | | | |
| APTNotes | Collection of public documents, whitepapers and articles about APT campaigns. All the documents are publicly available data. | This data is not pre-processed. | | The GitHub repository (https://github.com/aptnotes/data) of the project contains a file with links to the data stored in Box. Data files can also be downloaded here (https://github.com/ameza13/APTNotesData/). | | | [510] |
| Security eBooks for free | Small collection of publicly available security eBooks and security presentations. | This data is not pre-processed. | | | | | [512][513][514][515][516][517][518][519][520][521][522][523] |
| National Cyber Security strategy repository | Repository of worldwide strategy documents about cybersecurity. | This data is not pre-processed. | | | | | [524] |
| Cyber Security Natural Language Processing | Data about cybersecurity strategies from more than 75 countries. | Tokenization, meaningless-frequent words removal. | | | | | [525] |
| APT Reports collection | Sample of APT reports, malware, technology, and intelligence collection | Raw and tokenized data available. | | All data is available in this GitHub (https://github.com/blackorbird/APT_REPORT) repository. | | | [526] |
| Databreaches news | | This data is not pre-processed. | | News (https://www.databreaches.net/news/), list of news from Aug 2022 to Feb 2023 (https://github.com/bee3202/cybersecurity-data-sources/blob/main/DATABREACHES.md) | | | [531] |
| Cybernews | | This data is not pre-processed. | | News (https://cybernews.com/news/), curated list of news (https://github.com/bee3202/cybersecurity-data-sources/blob/main/CYBERNEWS.md) | | | [532] |
| Hipaajournal | | This data is not pre-processed. | | News (https://www.hipaajournal.com/category/hipaa-compliance-news/) | | | [533] |
| Mitre Defend | Matrix of Defend artifacts | | | JSON files | | | [544] |
| Mitre Atlas | Mitre Atlas is a knowledge base of adversary tactics, techniques, and case studies for machine learning (ML) systems based on real-world observations. | This data is not pre-processed. | | | | | [545] |
| Mitre Engage | MITRE Engage is a framework for planning and discussing adversary engagement operations that empowers you to engage your adversaries and achieve your cybersecurity goals. | This data is not pre-processed. | | | | | [546] |
| Hacking Tutorials | | This data is not pre-processed. | | | | | [547] |
| CLIMATE-FEVER | A dataset adopting the FEVER methodology that consists of 1,535 real-world claims regarding climate change collected on the internet. | Each claim is accompanied by five manually annotated evidence sentences retrieved from the English Wikipedia that support, refute or do not give enough information to validate the claim, totalling 7,675 claim-evidence pairs.[553] | | Dataset HF card (https://huggingface.co/datasets/climate_fever), and project's GitHub repository (https://github.com/tdiggelm/climate-fever-dataset). | | | [554] | Diggelmann et al. |
| Climate News dataset | A dataset for NLP and climate change media researchers | The dataset is made up of a number of data artifacts (JSON, JSONL & CSV text files & SQLite database) | | Climate news DB (http://www.climate-news-db.com/), Project's GitHub repository (https://github.com/ADGEfficiency/climate-news-db) | | | [555] | ADGEfficiency |
                      Climatext is a
                                                                                     HF dataset (https://huggingf
                      dataset for
                                                                                     ace.co/datasets/mwong/cli                                [556]
Climatext             sentence-based                                                                                                                       University of Zurich
                                                                                     matetext-evidence-related-e
                      climate change
                                                                                     valuation/tree/main/data)
                      topic detection.
                      Website with
                      articles about          This data is not pre-                                                                           [560]
GreenBiz                                                                                                                                                   GreenBiz
                      climate and             processed
                      sustainability
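The CLIMATE-FEVER entry above implies a simple count: 1,535 claims with five evidence sentences each gives 7,675 claim-evidence pairs. A minimal Python sketch of that flattening follows. The field names (`claim`, `evidences`, `sentence`, `label`) and the toy records are hypothetical and are not taken from the dataset's actual schema.

```python
# Hypothetical record layout: one claim with a list of labelled evidence sentences.
# Field names and sample text are illustrative assumptions, not the dataset schema.

LABELS = ("supports", "refutes", "not enough info")


def to_pairs(claims):
    """Flatten claims into (claim, evidence sentence, label) tuples."""
    return [
        (c["claim"], ev["sentence"], ev["label"])
        for c in claims
        for ev in c["evidences"]
    ]


# Two toy claims, each with five evidence sentences.
claims = [
    {
        "claim": f"toy claim {i}",
        "evidences": [
            {"sentence": f"toy evidence {i}.{j}", "label": LABELS[j % 3]}
            for j in range(5)
        ],
    }
    for i in range(2)
]

pairs = to_pairs(claims)
print(len(pairs))  # 2 claims x 5 evidence sentences each
print(1535 * 5)    # the same arithmetic at the size reported for the dataset
```

Keeping one tuple per claim-evidence pair makes it straightforward to treat each pair as a separate classification example.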
Code data
Dataset Name        Brief description               Preprocessing                   Instances     Format      Default Task      Created (updated)   Reference    Creator
OKD                 The Community Distribution of   This data is not pre-processed  List of GitHub repositories of the project
                    Kubernetes that powers Red Hat                                  (https://github.com/orgs/okd-project/repositories)
                    OpenShift
OpenShift           The developer and operations                                    List of GitHub repositories of the project
                    friendly Kubernetes distro                                      (https://github.com/bee3202/open-shift-repos/blob/main/pages_openshift.md)
Kubernetes                                          This data is not pre-processed  List of GitHub repositories of the project
                                                                                    (https://github.com/bee3202/open-shift-repos/blob/main/pages_kubernetes.md)
Red Hat Developer   GitHub home of the Red Hat      This data is not pre-processed  List of GitHub repositories of the project
                    Developer program                                               (https://github.com/bee3202/open-shift-repos/blob/main/pages_redhat_developer.md)
Red Hat Workshops                                   This data is not pre-processed  List of GitHub repositories of the project
                                                                                    (https://github.com/bee3202/open-shift-repos/blob/main/pages_redhat_workshops.md)
Multivariate data
Financial
Dataset Name            Brief description             Preprocessing                 Instances     Format      Default Task      Created (updated)   Reference    Creator
Weather
Dataset Name            Brief description             Preprocessing                 Instances     Format      Default Task      Created (updated)   Reference    Creator
Census
Dataset Name            Brief description             Preprocessing                 Instances     Format      Default Task      Created (updated)   Reference    Creator
US Census Data 1990     Partial data from 1990 US     Results randomized and        2,458,285     Text        Classification,   1990                [593]        United States
                        census.                       useful attributes selected.                             regression                                         Census Bureau
Transit
Dataset Name            Brief description             Preprocessing                 Instances     Format      Default Task      Created (updated)   Reference    Creator
Bike Sharing Dataset    Hourly and daily count of     Many features, including      17,389        Text        Regression        2013                [594][595]   H. Fanaee-T
                        rental bikes in a large       weather, length of trip,
                        city.                         etc., are given.
PeMS                    Speed, flow, occupancy and    Metric usually aggregated     39,000        Comma       Regression,       (updated realtime)  [600]        California
                        other metrics from loop       via Average into 5-minute     individual    separated   Forecasting,                                       Department of
                        detectors and other sensors   timesteps.                    detectors,    values      Nowcasting,                                        Transportation
                        on the freeways of the State                                each                      Interpolation
                        of California, U.S.A.                                       containing
                                                                                    years of
                                                                                    timeseries
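The PeMS entry above notes that metrics are usually averaged into 5-minute timesteps. The following is a minimal sketch of that kind of aggregation using only the Python standard library. The function name `aggregate_5min`, the 30-second sampling interval, and the sample speed values are assumptions made for illustration; they do not describe the actual PeMS processing pipeline.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_5min(readings, step_minutes=5):
    """Average (timestamp, value) readings into fixed-width time bins."""
    step = timedelta(minutes=step_minutes)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        # Floor the timestamp to the start of its bin, counted from midnight.
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        bin_index = (ts - midnight) // step  # timedelta // timedelta gives an int
        bin_start = midnight + bin_index * step
        sums[bin_start] += value
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical 30-second speed readings from a single detector.
start = datetime(2024, 1, 1, 8, 0, 0)
readings = [(start + timedelta(seconds=30 * i), 60.0 + i) for i in range(20)]

averaged = aggregate_5min(readings)
for bin_start, mean_value in averaged.items():
    print(bin_start.isoformat(), mean_value)
```

Flooring each timestamp to its bin start keeps the bins aligned to the clock (08:00, 08:05, and so on) regardless of when the first reading arrives.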
Internet
Dataset Name            Brief description             Preprocessing                 Instances     Format      Default Task      Created (updated)   Reference    Creator
Webpages from Common    Large collection of webpages  None.                         3.5B          Text        clustering,       2013                [601]        V. Granville
Crawl 2012              and how they are connected                                                            classification
                        via hyperlinks
Internet Advertisements Dataset for predicting if a   Features encode geometry of   3279          Text        Classification    1998                [602][603]   N. Kushmerick
Dataset                 given image is an             ads and phrases occurring in
                        advertisement or not.         the URL.
Internet Usage Dataset  General demographics of       None.                         10,104        Text        Classification,   1999                [604]        D. Cook
                        internet users.                                                                       clustering
Freebase Simple Topic   Freebase is an online effort  Topics from Freebase have     large         Text        Classification,   2011                [609][610]   Freebase
Dump                    to structure all human        been extracted.                                         clustering
                        knowledge.
OpenWebText             An open-source recreation of  Extracted non-HTML content,   8,013,769     Text        Natural Language  2019                [616][617]   A. Gokaslan,
                        the WebText corpus. The text  deduplicated, and tokenized.  Documents,                Processing, Text                                   V. Cohen
                        is web content extracted                                    38GB                      Prediction
                        from URLs shared on Reddit
                        with at least three upvotes.
Games
Dataset Name            Brief description             Preprocessing                 Instances     Format      Default Task      Created (updated)   Reference    Creator
Poker Hand Dataset      5 card hands from a standard  Attributes of each hand are   1,025,010     Text        Regression,       2007                [618]        R. Cattral
                        52 card deck.                 given, including the Poker                              classification
                                                      hands formed by the cards it
                                                      contains.
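To illustrate what "the Poker hands formed by the cards" means for a single instance of the Poker Hand Dataset above, here is a short Python sketch that assigns a class to a five-card hand. The encoding is an assumption stated from memory of the dataset's public documentation (suits 1 to 4, ranks 1 to 13 with Ace = 1, classes 0 to 9); the function `classify_hand` is hypothetical and is not part of the dataset.

```python
from collections import Counter

# Assumed class encoding (0-9), from weakest to strongest hand.
LABELS = [
    "Nothing in hand", "One pair", "Two pairs", "Three of a kind", "Straight",
    "Flush", "Full house", "Four of a kind", "Straight flush", "Royal flush",
]


def classify_hand(cards):
    """cards: five (suit, rank) pairs; suit in 1..4, rank in 1..13 (Ace = 1)."""
    suits = [s for s, _ in cards]
    ranks = sorted(r for _, r in cards)
    counts = sorted(Counter(ranks).values(), reverse=True)
    flush = len(set(suits)) == 1
    ace_high_run = ranks == [1, 10, 11, 12, 13]
    # A straight is five distinct consecutive ranks, or the ace-high run.
    straight = ace_high_run or (len(set(ranks)) == 5 and ranks[4] - ranks[0] == 4)
    if flush and ace_high_run:
        return 9
    if flush and straight:
        return 8
    if counts[0] == 4:
        return 7
    if counts == [3, 2]:
        return 6
    if flush:
        return 5
    if straight:
        return 4
    if counts[0] == 3:
        return 3
    if counts == [2, 2, 1]:
        return 2
    if counts[0] == 2:
        return 1
    return 0


hand = [(1, 10), (1, 11), (1, 13), (1, 12), (1, 1)]
print(LABELS[classify_hand(hand)])
```

The checks run from the strongest hand to the weakest, so each hand receives the highest class it qualifies for.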
Other multivariate
Dataset Name            Brief description             Preprocessing                 Instances     Format      Default Task      Created (updated)   Reference    Creator
   OpenML:[646] Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating
   algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms.
   PMLB:[647] A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Provides classification
   and regression datasets in a standardized format that are accessible through a Python API.
   Metatext NLP: A community-maintained web repository (https://metatext.io/datasets) containing nearly 1,000 benchmark datasets and growing.
   It covers many tasks, from classification to QA, and various languages, from English and Portuguese to Arabic.
   Appen: Off The Shelf and Open Source Datasets hosted and maintained by the company. These biological, image, physical, question
   answering, signal, sound, text, and video resources number over 250 and can be applied to over 25 different use cases.[648][649]
See also
   Comparison of deep learning software
   List of manual image annotation tools
   List of biological databases
References
 1. Wissner-Gross, A. "Datasets Over Algorithms" (https://edge.org/response-detail/26587). Edge.com. Retrieved 8 January 2016.
 2. Weiss, G. M.; Provost, F. (1 September 2003). "Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction" (https://www.jair.org/index.php/jair/article/download/10346/24739). Journal of Artificial Intelligence Research. AI Access Foundation. 19: 315–354. doi:10.1613/jair.1199 (https://doi.org/10.1613%2Fjair.1199). ISSN 1076-9757 (https://www.worldcat.org/issn/1076-9757). S2CID 2344521 (https://api.semanticscholar.org/CorpusID:2344521).
 3. Turney, Peter (2000). "Types of cost in inductive concept learning". arXiv:cs/0212034 (https://arxiv.org/abs/cs/0212034).
 4. Abney, Steven (17 September 2007). Semisupervised Learning for Computational Linguistics (https://books.google.com/books?id=VCd67cGB_rAC&pg=PP1). CRC Press. ISBN 978-1-4200-1080-0.
 5. Žliobaitė, Indrė; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoff (2011). "Active Learning with Evolving Streaming Data". Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 597–612. doi:10.1007/978-3-642-23808-6_39 (https://doi.org/10.1007%2F978-3-642-23808-6_39). ISBN 978-3-642-23807-9. ISSN 0302-9743 (https://www.worldcat.org/issn/0302-9743).
 6. Zafeiriou, S.; Kollias, D.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Kotsia, I. (2017). "Aff-Wild: Valence and Arousal 'In-the-Wild' Challenge" (https://eprints.mdx.ac.uk/22045/1/aff_wild_kotsia.pdf) (PDF). 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Computer Vision and Pattern Recognition Workshops (CVPRW), 2017. pp. 1980–1987. doi:10.1109/CVPRW.2017.248 (https://doi.org/10.1109%2FCVPRW.2017.248). ISBN 978-1-5386-0733-6. S2CID 3107614 (https://api.semanticscholar.org/CorpusID:3107614).
 7. Kollias, D.; Tzirakis, P.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Schuller, B.; Kotsia, I.; Zafeiriou, S. (2019). "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond" (https://rdcu.be/bmGm2). International Journal of Computer Vision. 127 (6–7): 907–929. doi:10.1007/s11263-019-01158-4 (https://doi.org/10.1007%2Fs11263-019-01158-4). S2CID 13679040 (https://api.semanticscholar.org/CorpusID:13679040).
 8. Kollias, D.; Zafeiriou, S. (2019). "Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface" (https://bmvc2019.org/wp-content/uploads/papers/0399-paper.pdf) (PDF). British Machine Vision Conference (BMVC), 2019. arXiv:1910.04855 (https://arxiv.org/abs/1910.04855).
 9. Kollias, D.; Schulc, A.; Hajiyev, E.; Zafeiriou, S. (2020). "Analysing Affective Behavior in the First ABAW 2020 Competition" (https://www.computer.org/csdl/proceedings-article/fg/2020/307900a794/1kecIYu9wL6). 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020). pp. 637–643. arXiv:2001.11409 (https://arxiv.org/abs/2001.11409). doi:10.1109/FG47880.2020.00126 (https://doi.org/10.1109%2FFG47880.2020.00126). ISBN 978-1-7281-3079-8. S2CID 210966051 (https://api.semanticscholar.org/CorpusID:210966051).
10. Phillips, P. Jonathon; et al. (1998). "The FERET database and evaluation procedure for face-recognition algorithms". Image and Vision Computing. 16 (5): 295–306. doi:10.1016/s0262-8856(97)00070-x (https://doi.org/10.1016%2Fs0262-8856%2897%2900070-x).
11. Wiskott, Laurenz; et al. (1997). "Face recognition by elastic bunch graph matching". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (7): 775–779. CiteSeerX 10.1.1.44.2321 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.2321). doi:10.1109/34.598235 (https://doi.org/10.1109%2F34.598235). S2CID 30523165 (https://api.semanticscholar.org/CorpusID:30523165).
12. Livingstone, Steven R.; Russo, Frank A. (2018). "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5955500). PLOS ONE. 13 (5): e0196391. Bibcode:2018PLoSO..1396391L (https://ui.adsabs.harvard.edu/abs/2018PLoSO..1396391L). doi:10.1371/journal.pone.0196391 (https://doi.org/10.1371%2Fjournal.pone.0196391). PMC 5955500 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5955500). PMID 29768426 (https://pubmed.ncbi.nlm.nih.gov/29768426).
13. Livingstone, Steven R.; Russo, Frank A. (2018). "Emotion". The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). doi:10.5281/zenodo.1188976 (https://doi.org/10.5281%2Fzenodo.1188976).
14. Grgic, Mislav; Delac, Kresimir; Grgic, Sonja (2011). "SCface–surveillance cameras face database". Multimedia Tools and Applications. 51 (3): 863–879. doi:10.1007/s11042-009-0417-2 (https://doi.org/10.1007%2Fs11042-009-0417-2). S2CID 207218990 (https://api.semanticscholar.org/CorpusID:207218990).
15. Wallace, Roy, et al. "Inter-session variability modelling and joint factor analysis for face authentication (https://repository.ubn.ru.nl/bitstream/handle/2066/94489/94489.pdf)." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011.
16. Georghiades, A. "Yale face database". Center For Computational Vision And Control At Yale University, http://CVC.yale.edu/Projects/Yalefaces/Yalefa. 2: 1997.
17. Nguyen, Duy; et al. (2006). "Real-time face detection and lip feature extraction using field-programmable gate arrays". IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics. 36 (4): 902–912. CiteSeerX 10.1.1.156.9848 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.9848). doi:10.1109/tsmcb.2005.862728 (https://doi.org/10.1109%2Ftsmcb.2005.862728). PMID 16903373 (https://pubmed.ncbi.nlm.nih.gov/16903373). S2CID 7334355 (https://api.semanticscholar.org/CorpusID:7334355).
18. Kanade, Takeo, Jeffrey F. Cohn, and Yingli Tian. "Comprehensive database for facial expression analysis (http://www.ri.cmu.edu/pub_files/pub2/kanade_takeo_2000_1/kanade_takeo_2000_1.pdf)." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.
19. Zeng, Zhihong; et al. (2009). "A survey of affect recognition methods: Audio, visual, and spontaneous expressions". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (1): 39–58. CiteSeerX 10.1.1.144.217 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.144.217). doi:10.1109/tpami.2008.52 (https://doi.org/10.1109%2Ftpami.2008.52). PMID 19029545 (https://pubmed.ncbi.nlm.nih.gov/19029545).
20. Lyons, Michael; Kamachi, Miyuki; Gyoba, Jiro (1998). "Facial expression images". The Japanese Female Facial Expression (JAFFE) Database. doi:10.5281/zenodo.3451524 (https://doi.org/10.5281%2Fzenodo.3451524).
21. Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro "Coding facial expressions with Gabor wavelets (https://zenodo.org/record/3430156)." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998.
22. Ng, Hong-Wei, and Stefan Winkler. "A data-driven approach to cleaning large face datasets (http://vintage.winklerbros.net/Publications/icip2014a.pdf)." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.
23. RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2015). "One-to-many face recognition with bilinear CNNs". arXiv:1506.01342 (https://arxiv.org/abs/1506.01342) [cs.CV (https://arxiv.org/archive/cs.CV)].
24. Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio-and video-based biometric person authentication. Springer Berlin Heidelberg, 2001.
25. Huang, Gary B., et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments (https://hal.inria.fr/docs/00/32/19/23/PDF/Huang_long_eccv2008-lfw.pdf). Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007.
26. Bhatt, Rajen B., et al. "Efficient skin region segmentation using low complexity fuzzy decision tree model (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.708.9158&rep=rep1&type=pdf)." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009.
27. Lingala, Mounika; et al. (2014). "Fuzzy logic color detection: Blue areas in melanoma dermoscopy images" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287461). Computerized Medical Imaging and Graphics. 38 (5): 403–410. doi:10.1016/j.compmedimag.2014.03.007 (https://doi.org/10.1016%2Fj.compmedimag.2014.03.007). PMC 4287461 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287461). PMID 24786720 (https://pubmed.ncbi.nlm.nih.gov/24786720).
28. Maes, Chris, et al. "Feature detection on 3D face surfaces for pose normalisation and recognition (https://lirias.kuleuven.be/retrieve/135678)." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010.
29. Savran, Arman, et al. "Bosphorus database for 3D face analysis (https://web.archive.org/web/20190222192331/http://pdfs.semanticscholar.org/4254/fbba3846008f50671edc9cf70b99d7304543.pdf)." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56.
30. Heseltine, Thomas, Nick Pears, and Jim Austin. "Three-dimensional face recognition: An eigensurface approach (http://eprints.whiterose.ac.uk/1526/01/austinj4.pdf)." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004.
31. Ge, Yun; et al. (2011). "3D Novel Face Sample Modeling for Face Recognition". Journal of Multimedia. 6 (5): 467–475. CiteSeerX 10.1.1.461.9710 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.461.9710). doi:10.4304/jmm.6.5.467-475 (https://doi.org/10.4304%2Fjmm.6.5.467-475).
32. Wang, Yueming; Liu, Jianzhuang; Tang, Xiaoou (2010). "Robust 3D face recognition by local shape difference boosting". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (10): 1858–1870. CiteSeerX 10.1.1.471.2424 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.471.2424). doi:10.1109/tpami.2009.200 (https://doi.org/10.1109%2Ftpami.2009.200). PMID 20724762 (https://pubmed.ncbi.nlm.nih.gov/20724762). S2CID 15263913 (https://api.semanticscholar.org/CorpusID:15263913).
33. Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "Robust 3D face recognition using learned visual codebook (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.8534&rep=rep1&type=pdf)." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.
34. Zhao, G.; Huang, X.; Taini, M.; Li, S. Z.; Pietikäinen, M. (2011). "Facial expression recognition from near-infrared videos" (http://www.academia.edu/download/42229488/Image_and_Vision_Computing20160206-29020-1auzaon.pdf) (PDF). Image and Vision Computing. 29 (9): 607–619. doi:10.1016/j.imavis.2011.07.002 (https://doi.org/10.1016%2Fj.imavis.2011.07.002).
35. Soyel, Hamit, and Hasan Demirel. "Facial expression recognition using 3D facial feature distances (https://pdfs.semanticscholar.org/cf81/4b618fcbc9a556cdce225e74a8806867ba84.pdf)." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838.
43. "SoF dataset" (https://sites.google.com/view/sof-dataset). sites.google.com. Retrieved 18 November 2017.
44. "IMDb-WIKI" (https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/). data.vision.ee.ethz.ch. Retrieved 13 March 2018.
45. Patron-Perez, A.; Marszalek, M.; Reid, I.; Zisserman, A. (2012). "Structured learning of human interactions in TV shows". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (12): 2441–2453. doi:10.1109/tpami.2012.24 (https://doi.org/10.1109%2Ftpami.2012.24). PMID 23079467 (https://pubmed.ncbi.nlm.nih.gov/23079467). S2CID 6060568 (https://api.semanticscholar.org/CorpusID:6060568).
46. Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). Berkeley MHAD: A comprehensive multimodal human action database (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.5113&rep=rep1&type=pdf). In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE.
47. Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013.
48. Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos (https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf)." Advances in Neural Information Processing Systems. 2014.
36. Bowyer, Kevin W.; Chang, Kyong; Flynn, Patrick (2006). "A survey
                                                                             49. Stoian, Andrei; Ferecatu, Marin; Benois-Pineau, Jenny; Crucianu,
    of approaches and challenges in 3D and multi-modal 3D+ 2D face               Michel (2016). "Fast Action Localization in Large-Scale Video
    recognition". Computer Vision and Image Understanding. 101 (1):
                                                                                 Archives". IEEE Transactions on Circuits and Systems for Video
    1–15. CiteSeerX 10.1.1.134.8784 (https://citeseerx.ist.psu.edu/view
                                                                                 Technology. 26 (10): 1917–1930.
    doc/summary?doi=10.1.1.134.8784).                                            doi:10.1109/TCSVT.2015.2475835 (https://doi.org/10.1109%2FTC
    doi:10.1016/j.cviu.2005.05.005 (https://doi.org/10.1016%2Fj.cviu.20
                                                                                 SVT.2015.2475835). S2CID 31537462 (https://api.semanticscholar.
    05.05.005).                                                                  org/CorpusID:31537462).
37. Tan, Xiaoyang; Triggs, Bill (2010). "Enhanced local texture feature      50. Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata,
    sets for face recognition under difficult lighting conditions". IEEE         Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-
    Transactions on Image Processing. 19 (6): 1635–1650.                         Jia; Shamma, David A; Bernstein, Michael S; Fei-Fei, Li (2017).
    Bibcode:2010ITIP...19.1635T (https://ui.adsabs.harvard.edu/abs/20            "Visual Genome: Connecting Language and Vision Using
    10ITIP...19.1635T). CiteSeerX 10.1.1.105.3355 (https://citeseerx.ist.        Crowdsourced Dense Image Annotations". International Journal of
    psu.edu/viewdoc/summary?doi=10.1.1.105.3355).                                Computer Vision. 123: 32–73. arXiv:1602.07332 (https://arxiv.org/ab
    doi:10.1109/tip.2010.2042645 (https://doi.org/10.1109%2Ftip.2010.            s/1602.07332). doi:10.1007/s11263-016-0981-7 (https://doi.org/10.1
    2042645). PMID 20172829 (https://pubmed.ncbi.nlm.nih.gov/20172               007%2Fs11263-016-0981-7). S2CID 4492210 (https://api.semantic
    829). S2CID 4943234 (https://api.semanticscholar.org/CorpusID:49
                                                                                 scholar.org/CorpusID:4492210).
    43234).
                                                                             51. Karayev, S., et al. "A category-level 3-D object dataset: putting the
38. Mousavi, Mir Hashem; Faez, Karim; Asghari, Amin (2008). "Three               Kinect to work (http://alliejanoch.com/iccvw2011.pdf)." Proceedings
    Dimensional Face Recognition Using SVM Classifier" (https://ieeex            of the IEEE International Conference on Computer Vision
    plore.ieee.org/document/4529822). Seventh IEEE/ACIS                          Workshops. 2011.
    International Conference on Computer and Information Science
    (Icis 2008). pp. 208–213. doi:10.1109/ICIS.2008.77 (https://doi.org/1    52. Tighe, Joseph, and Svetlana Lazebnik. "Superparsing: scalable
    0.1109%2FICIS.2008.77). ISBN 978-0-7695-3131-1.                              nonparametric image parsing with superpixels (http://152.2.128.56/
    S2CID 2710422 (https://api.semanticscholar.org/CorpusID:271042               ~jtighe/Papers/ECCV10/eccv10-jtighe.pdf) Archived (https://web.arc
    2).                                                                          hive.org/web/20190806022752/http://152.2.128.56/~jtighe/Papers/E
                                                                                 CCV10/eccv10-jtighe.pdf) 6 August 2019 at the Wayback Machine."
39. Amberg, Brian; Knothe, Reinhard; Vetter, Thomas (2008).                      Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010.
    "Expression invariant 3D face recognition with a Morphable Model"
                                                                                 352–365.
    (https://gravis.dmi.unibas.ch/publications/2008/FG08_Amberg.pdf)
    (PDF). 2008 8th IEEE International Conference on Automatic Face          53. Arbelaez, P.; Maire, M; Fowlkes, C; Malik, J (May 2011). "Contour
    & Gesture Recognition. pp. 1–6. doi:10.1109/AFGR.2008.4813376                Detection and Hierarchical Image Segmentation" (http://www.eecs.
    (https://doi.org/10.1109%2FAFGR.2008.4813376). ISBN 978-1-                   berkeley.edu/Research/Projects/CS/vision/grouping/papers/amfm_
    4244-2154-1. S2CID 5651453 (https://api.semanticscholar.org/Corp             pami2010.pdf) (PDF). IEEE Transactions on Pattern Analysis and
    usID:5651453).                                                               Machine Intelligence. 33 (5): 898–916. doi:10.1109/tpami.2010.161
                                                                                 (https://doi.org/10.1109%2Ftpami.2010.161). PMID 20733228 (http
40. Irfanoglu, M.O.; Gokberk, B.; Akarun, L. (2004). "3D shape-based             s://pubmed.ncbi.nlm.nih.gov/20733228). S2CID 206764694 (https://
    face recognition using automatically registered facial surfaces" (http       api.semanticscholar.org/CorpusID:206764694). Retrieved
    s://www.researchgate.net/publication/4090704). Proceedings of the            27 February 2016.
    17th International Conference on Pattern Recognition, 2004. ICPR
    2004. pp. 183–186 Vol.4. doi:10.1109/ICPR.2004.1333734 (https://d        54. Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Bourdev, Lubomir;
    oi.org/10.1109%2FICPR.2004.1333734). ISBN 0-7695-2128-2.                     Girshick, Ross; Hays, James; Perona, Pietro; Ramanan, Deva;
    S2CID 10987293 (https://api.semanticscholar.org/CorpusID:109872              Lawrence Zitnick, C.; Dollár, Piotr (2014). "Microsoft COCO:
    93).                                                                         Common Objects in Context". arXiv:1405.0312 (https://arxiv.org/ab
                                                                                 s/1405.0312) [cs.CV (https://arxiv.org/archive/cs.CV)].
41. Beumier, Charles; Acheroy, Marc (2001). "Face verification from 3D
    and grey level clues". Pattern Recognition Letters. 22 (12): 1321–       55. Russakovsky, Olga; et al. (2015). "Imagenet large scale visual
    1329. Bibcode:2001PaReL..22.1321B (https://ui.adsabs.harvard.ed              recognition challenge". International Journal of Computer Vision.
    u/abs/2001PaReL..22.1321B). doi:10.1016/s0167-8655(01)00077-                 115 (3): 211–252. arXiv:1409.0575 (https://arxiv.org/abs/1409.057
    0 (https://doi.org/10.1016%2Fs0167-8655%2801%2900077-0).                     5). doi:10.1007/s11263-015-0816-y (https://doi.org/10.1007%2Fs11
                                                                                 263-015-0816-y). hdl:1721.1/104944 (https://hdl.handle.net/1721.
42. Afifi, Mahmoud; Abdelhamed, Abdelrahman (13 June 2017).                      1%2F104944). S2CID 2930547 (https://api.semanticscholar.org/Cor
    "AFIF4: Deep Gender Classification based on AdaBoost-based                   pusID:2930547).
    Fusion of Isolated Facial Features and Foggy Faces".
    arXiv:1706.04277 (https://arxiv.org/abs/1706.04277) [cs.CV (https://     56. "COCO – Common Objects in Context" (https://cocodataset.org/).
    arxiv.org/archive/cs.CV)].                                                   cocodataset.org.
57. Xiao, Jianxiong, et al. "Sun database: Large-scale scene                 73. M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R.
    recognition from abbey to zoo." Computer vision and pattern                  Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes
    recognition (CVPR), 2010 IEEE conference on. IEEE, 2010.                     Dataset (https://www.cityscapes-dataset.com/wordpress/wp-conten
58. Donahue, Jeff; Jia, Yangqing; Vinyals, Oriol; Hoffman, Judy; Zhang,          t/papercite-data/pdf/cordts2015cvprw.pdf)." In CVPR Workshop on
    Ning; Tzeng, Eric; Darrell, Trevor (2013). "DeCAF: A Deep                    The Future of Datasets in Vision, 2015.
    Convolutional Activation Feature for Generic Visual Recognition".        74. Everingham, Mark; et al. (2010). "The pascal visual object classes
    arXiv:1310.1531 (https://arxiv.org/abs/1310.1531) [cs.CV (https://arx        (voc) challenge" (https://www.research.ed.ac.uk/portal/en/publicatio
    iv.org/archive/cs.CV)].                                                      ns/the-pascal-visual-object-classes-voc-challenge(88a29de3-6220-
59. Deng, Jia, et al. "Imagenet: A large-scale hierarchical image                442b-ab2d-284210cf72d6).html). International Journal of Computer
    database (https://www.researchgate.net/profile/Li_Jia_Li/publicatio          Vision. 88 (2): 303–338. doi:10.1007/s11263-009-0275-4 (https://do
    n/221361415_ImageNet_a_Large-Scale_Hierarchical_Image_Data                   i.org/10.1007%2Fs11263-009-0275-4).
    base/links/00b495388120dbc339000000/ImageNet-a-Large-Scale-                  hdl:20.500.11820/88a29de3-6220-442b-ab2d-284210cf72d6 (http
    Hierarchical-Image-Database.pdf)." Computer Vision and Pattern               s://hdl.handle.net/20.500.11820%2F88a29de3-6220-442b-ab2d-28
    Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.                4210cf72d6). S2CID 4246903 (https://api.semanticscholar.org/Corp
60. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet          usID:4246903).
    classification with deep convolutional neural networks (http://paper     75. Felzenszwalb, Pedro F.; et al. (2010). "Object detection with
    s.nips.cc/paper/4824-imagenet-classification-with-deep-convolution           discriminatively trained part-based models". IEEE Transactions on
    al-neural-networks.pdf)." Advances in neural information                     Pattern Analysis and Machine Intelligence. 32 (9): 1627–1645.
    processing systems. 2012.                                                    CiteSeerX 10.1.1.153.2745 (https://citeseerx.ist.psu.edu/viewdoc/su
                                                                                 mmary?doi=10.1.1.153.2745). doi:10.1109/tpami.2009.167 (https://d
61. Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan;
    Satheesh, Sanjeev; et al. (11 April 2015). "ImageNet Large Scale             oi.org/10.1109%2Ftpami.2009.167). PMID 20634557 (https://pubme
                                                                                 d.ncbi.nlm.nih.gov/20634557). S2CID 3198903 (https://api.semantic
    Visual Recognition Challenge". International Journal of Computer
    Vision. 115 (3): 211–252. arXiv:1409.0575 (https://arxiv.org/abs/140         scholar.org/CorpusID:3198903).
    9.0575). doi:10.1007/s11263-015-0816-y (https://doi.org/10.1007%2        76. Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A
    Fs11263-015-0816-y). hdl:1721.1/104944 (https://hdl.handle.net/17            procrustean approach to learning binary codes." Computer Vision
    21.1%2F104944). S2CID 2930547 (https://api.semanticscholar.org/              and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE,
    CorpusID:2930547).                                                           2011.
62. Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-        77. "CINIC-10 dataset" (http://www.bayeswatch.com/2018/10/09/CINI
    Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari,             C/). Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J.
    Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun,                   Storkey (2018) CINIC-10 is not ImageNet or CIFAR-10. 9 October
    Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for                 2018. Retrieved 13 November 2018.
    large-scale multi-label and multi-class image classification, 2017.      78. fashion-mnist: A MNIST-like fashion product database. Benchmark
    Available from https://github.com/openimages."                               (https://github.com/zalandoresearch/fashion-mnist),
63. Vyas, Apoorv, et al. "Commercial Block Detection in Broadcast                Zalando Research, 7 October 2017, retrieved 7 October 2017
    News Videos (https://dl.acm.org/citation.cfm?id=2683546)."               79. "notMNIST dataset" (http://yaroslavvb.blogspot.com/2011/09/notmni
    Proceedings of the 2014 Indian Conference on Computer Vision                 st-dataset.html). Machine Learning, etc. 8 September 2011.
    Graphics and Image Processing. ACM, 2014.                                    Retrieved 13 October 2017.
64. Hauptmann, Alexander G., and Michael J. Witbrock. "Story                 80. Houben, Sebastian, et al. "Detection of traffic signs in real-world
    segmentation and detection of commercials in broadcast news                  images: The German Traffic Sign Detection Benchmark (https://ww
    video (https://pdfs.semanticscholar.org/5c21/6db7892fa3f515d816f8            w.researchgate.net/profile/Sebastian_Houben/publication/2423466
    4893bfab1137f0b2.pdf)." Research and Technology Advances in                  25_Detection_of_Traffic_Signs_in_Real-World_Images_The_Germ
    Digital Libraries, 1998. ADL 98. Proceedings. IEEE International             an_Traffic_Sign_Detection_Benchmark/links/0046352a03ec384e9
    Forum on. IEEE, 1998.                                                        7000000/Detection-of-Traffic-Signs-in-Real-World-Images-The-Ger
65. Tung, Anthony KH, Xin Xu, and Beng Chin Ooi. "Curler: finding and            man-Traffic-Sign-Detection-Benchmark.pdf)." Neural Networks
    visualizing nonlinear correlation clusters (https://www.researchgate.        (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
    net/profile/Anthony_Tung/publication/221214229_CURLER_Findin             81. Mathias, Mayeul, et al. "Traffic sign recognition—How far are we
    g_and_Visualizing_Nonlinear_Correlated_Clusters/links/55b8691a               from the solution? (http://www.varcity.eu/paper/ijcnn2013_mathias_t
    08aed621de05cd92.pdf)." Proceedings of the 2005 ACM SIGMOD                   rafficsign.pdf)." Neural Networks (IJCNN), The 2013 International
    international conference on Management of data. ACM, 2005.                   Joint Conference on. IEEE, 2013.
66. Jarrett, Kevin, et al. "What is the best multi-stage architecture for    82. Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready
    object recognition? (https://ieeexplore.ieee.org/abstract/document/5         for autonomous driving? the kitti vision benchmark suite (https://ww
    459469/)." Computer Vision, 2009 IEEE 12th International                     w.cvlibs.net/publications/Geiger2012CVPR.pdf)." Computer Vision
    Conference on. IEEE, 2009.                                                   and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE,
67. Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond                 2012.
    bags of features: Spatial pyramid matching for recognizing natural       83. Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D
    scene categories (https://hal.inria.fr/inria-00548585/documen                SLAM systems (http://jsturm.de/publications/data/sturm12iros.pdf)."
    t)." Computer Vision and Pattern Recognition, 2006 IEEE Computer             Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ
    Society Conference on. Vol. 2. IEEE, 2006.                                   International Conference on. IEEE, 2012.
68. Griffin, G., A. Holub, and P. Perona. Caltech-256 object category        84. The KITTI Vision Benchmark Suite (https://www.youtube.com/watc
    dataset California Inst. Technol., Tech. Rep. 7694, 2007. Available:         h?v=KXpZ6B1YB_k) on YouTube
    http://authors.library.caltech.edu/7694, 2007.                           85. Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5
69. Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern                      dataset. Chaladze.com. Retrieved 13 November 2017, from
    information retrieval. Vol. 463. New York: ACM press, 1999.                  http://chaladze.com/l5/
70. COYO-700M: Image-Text Pair Dataset (https://github.com/kakao             86. Kragh, Mikkel F.; et al. (2017). "FieldSAFE – Dataset for Obstacle
    brain/coyo-dataset), Kakao Brain, 3 November 2022, retrieved                 Detection in Agriculture" (https://vision.eng.au.dk/fieldsafe).
    3 November 2022                                                              Sensors. 17 (11): 2579. arXiv:1709.03526 (https://arxiv.org/abs/170
71. Fu, Xiping, et al. "NOKMeans: Non-Orthogonal K-means Hashing                 9.03526). Bibcode:2017Senso..17.2579K (https://ui.adsabs.harvar
    (https://pdfs.semanticscholar.org/9da2/abae3072fd9fcff0e13b8f00fc            d.edu/abs/2017Senso..17.2579K). doi:10.3390/s17112579 (https://d
    21f22d0085.pdf)." Computer Vision—ACCV 2014. Springer                        oi.org/10.3390%2Fs17112579). PMC 5713196 (https://www.ncbi.nl
    International Publishing, 2014. 162–177.                                     m.nih.gov/pmc/articles/PMC5713196). PMID 29120383 (https://pub
                                                                                 med.ncbi.nlm.nih.gov/29120383).
72. Heitz, Geremy; et al. (2009). "Shape-based object localization for
    descriptive classification". International Journal of Computer Vision.   87. Afifi, Mahmoud (12 November 2017). "Gender recognition and
    84 (1): 40–62. CiteSeerX 10.1.1.142.280 (https://citeseerx.ist.psu.ed        biometric identification using a large dataset of hand images".
    u/viewdoc/summary?doi=10.1.1.142.280). doi:10.1007/s11263-009-               arXiv:1711.04322 (https://arxiv.org/abs/1711.04322) [cs.CV (https://
    0228-y (https://doi.org/10.1007%2Fs11263-009-0228-y).                        arxiv.org/archive/cs.CV)].
    S2CID 646320 (https://api.semanticscholar.org/CorpusID:646320).
 88. Lomonaco, Vincenzo; Maltoni, Davide (18 October 2017).                   103. Behrendt, Karsten; Novak, Libor; Botros, Rami (May 2017). "A deep
     "CORe50: a New Dataset and Benchmark for Continuous Object                    learning approach to traffic lights: Detection, tracking, and
     Recognition". arXiv:1705.03550 (https://arxiv.org/abs/1705.03550)             classification" (https://ieeexplore.ieee.org/document/7989163).
     [cs.CV (https://arxiv.org/archive/cs.CV)].                                    2017 IEEE International Conference on Robotics and Automation
 89. She, Qi; Feng, Fan; Hao, Xinyue; Yang, Qihan; Lan, Chuanlin;                  (ICRA). pp. 1370–1377. doi:10.1109/ICRA.2017.7989163 (https://d
     Lomonaco, Vincenzo; Shi, Xuesong; Wang, Zhengwei; Guo, Yao;                   oi.org/10.1109%2FICRA.2017.7989163). ISBN 978-1-5090-4633-1.
     Zhang, Yimin; Qiao, Fei; Chan, Rosa H.M. (15 November 2019).                  S2CID 6257133 (https://api.semanticscholar.org/CorpusID:625713
     "OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for                 3).
     Lifelong Deep Learning". arXiv:1911.06487v2 (https://arxiv.org/abs/      104. "FRSign Dataset" (https://frsign.irt-systemx.fr/). frsign.irt-systemx.fr.
     1911.06487v2) [cs.CV (https://arxiv.org/archive/cs.CV)].                      Retrieved 5 May 2023.
 90. Morozov, Alexei; Sushkova, Olga (13 June 2019). "THz and thermal         105. Harb, Jeanine; Rébéna, Nicolas; Chosidow, Raphaël; Roblin,
     video data set" (http://www.fullvision.ru/monitoring/description_eng.         Grégoire; Potarusov, Roman; Hajri, Hatem (5 February 2020).
     php). Development of the multi-agent logic programming approach               "FRSign: A Large-Scale Traffic Light Dataset for Autonomous
     to a human behaviour analysis in a multi-channel video                        Trains". arXiv:2002.05665 (https://arxiv.org/abs/2002.05665) [cs.CY
     surveillance. Moscow: IRE RAS. Retrieved 19 July 2019.                        (https://arxiv.org/archive/cs.CY)].
 91. Morozov, Alexei; Sushkova, Olga; Kershner, Ivan; Polupanov,              106. "ifs-rwth-aachen/GERALD" (https://github.com/ifs-rwth-aachen/GER
     Alexander (9 July 2019). "Development of a method of terahertz                ALD). Chair and Institute for Rail Vehicles and Transport Systems.
     intelligent video surveillance based on the semantic fusion of                30 April 2023. Retrieved 5 May 2023.
     terahertz and 3D video images" (http://ceur-ws.org/Vol-2391/paper1       107. Leibner, Philipp; Hampel, Fabian; Schindler, Christian (3 April
     9.pdf) (PDF). CEUR. 2391: paper19. Retrieved 19 July 2019.                    2023). "GERALD: A novel dataset for the detection of German
 92. "Papers with Code - Daimler Monocular Pedestrian Detection                    mainline railway signals" (https://journals.sagepub.com/doi/abs/10.
     Dataset" (https://paperswithcode.com/dataset/daimler-monocular-p              1177/09544097231166472). Proceedings of the Institution of
     edestrian-detection). paperswithcode.com. Retrieved 5 May 2023.               Mechanical Engineers, Part F: Journal of Rail and Rapid Transit:
 93. Enzweiler, Markus; Gavrila, Dariu M. (December 2009). "Monocular              095440972311664. doi:10.1177/09544097231166472 (https://doi.o
     Pedestrian Detection: Survey and Experiments" (https://ieeexplore.i           rg/10.1177%2F09544097231166472). ISSN 0954-4097 (https://ww
     eee.org/document/4657363). IEEE Transactions on Pattern                       w.worldcat.org/issn/0954-4097). S2CID 257939937 (https://api.sem
     Analysis and Machine Intelligence. 31 (12): 2179–2195.                        anticscholar.org/CorpusID:257939937).
     doi:10.1109/TPAMI.2008.260 (https://doi.org/10.1109%2FTPAMI.20           108. Wojek, Christian; Walk, Stefan; Schiele, Bernt (June 2009). "Multi-
     08.260). ISSN 1939-3539 (https://www.worldcat.org/issn/1939-353               cue onboard pedestrian detection" (https://ieeexplore.ieee.org/docu
     9). PMID 19834140 (https://pubmed.ncbi.nlm.nih.gov/19834140).                 ment/5206638). 2009 IEEE Conference on Computer Vision and
     S2CID 1192198 (https://api.semanticscholar.org/CorpusID:119219                Pattern Recognition. pp. 794–801.
     8).                                                                           doi:10.1109/CVPR.2009.5206638 (https://doi.org/10.1109%2FCVP
 94. Yin, Guojun; Liu, Bin; Zhu, Huihui; Gong, Tao; Yu, Nenghai (28 July           R.2009.5206638). ISBN 978-1-4244-3992-8. S2CID 18000078 (http
     2020). "A Large Scale Urban Surveillance Video Dataset for                    s://api.semanticscholar.org/CorpusID:18000078).
     Multiple-Object Tracking and Behavior Analysis". arXiv:1904.11784        109. Toprak, Tuğçe; Aydın, Burak; Belenlioğlu, Burak; Güzeliş, Cüneyt;
     (https://arxiv.org/abs/1904.11784) [cs.CV (https://arxiv.org/archive/c        Selver, M. Alper (5 April 2020). "Railway Pedestrian Dataset
     s.CV)].                                                                       (RAWPED)" (https://zenodo.org/record/3741742).
 95. "Object Recognition in Video Dataset" (https://mi.eng.cam.ac.uk/res           doi:10.1109/TVT.2020.2983825 (https://doi.org/10.1109%2FTVT.20
     earch/projects/VideoRec/CamVid/). mi.eng.cam.ac.uk. Retrieved                 20.2983825). S2CID 216510283 (https://api.semanticscholar.org/C
     5 May 2023.                                                                   orpusID:216510283). Retrieved 5 May 2023.
 96. Brostow, Gabriel J.; Shotton, Jamie; Fauqueur, Julien; Cipolla,          110. Toprak, Tugce; Belenlioglu, Burak; Aydın, Burak; Guzelis, Cuneyt;
     Roberto (2008). "Segmentation and Recognition Using Structure                 Selver, M. Alper (May 2020). "Conditional Weighted Ensemble of
     from Motion Point Clouds" (https://link.springer.com/chapter/10.100           Transferred Models for Camera Based Onboard Pedestrian
     7/978-3-540-88682-2_5). Computer Vision – ECCV 2008. Lecture                  Detection in Railway Driver Support Systems" (https://ieeexplore.ie
     Notes in Computer Science. Springer. 5302: 44–57.                             ee.org/document/9050835). IEEE Transactions on Vehicular
     doi:10.1007/978-3-540-88682-2_5 (https://doi.org/10.1007%2F978-               Technology. 69 (5): 5041–5054. doi:10.1109/TVT.2020.2983825 (ht
     3-540-88682-2_5). ISBN 978-3-540-88681-5.                                     tps://doi.org/10.1109%2FTVT.2020.2983825). ISSN 1939-9359 (htt
 97. Brostow, Gabriel J.; Fauqueur, Julien; Cipolla, Roberto (15 January           ps://www.worldcat.org/issn/1939-9359). S2CID 216510283 (https://
     2009). "Semantic object classes in video: A high-definition ground            api.semanticscholar.org/CorpusID:216510283).
     truth database" (https://www.sciencedirect.com/science/article/abs/p     111. Tilly, Roman; Neumaier, Philipp; Schwalbe, Karsten; Klasek, Pavel;
     ii/S0167865508001220). Pattern Recognition Letters. 30 (2): 88–               Tagiew, Rustam; Denzler, Patrick; Klockau, Tobias; Boekhoff,
     97. Bibcode:2009PaReL..30...88B (https://ui.adsabs.harvard.edu/ab             Martin; Köppel, Martin (2023). "Open Sensor Data for Rail 2023" (in
     s/2009PaReL..30...88B). doi:10.1016/j.patrec.2008.04.005 (https://d           German). doi:10.57806/9mv146r0 (https://doi.org/10.57806%2F9mv
     oi.org/10.1016%2Fj.patrec.2008.04.005). ISSN 0167-8655 (https://              146r0).
     www.worldcat.org/issn/0167-8655).                                        112. Tagiew, Rustam; Köppel, Martin; Schwalbe, Karsten; Denzler,
 98. "WildDash 2 Benchmark" (https://wilddash.cc/railsem19).                       Patrick; Neumaier, Philipp; Klockau, Tobias; Boekhoff, Martin;
     wilddash.cc. Retrieved 5 May 2023.                                            Klasek, Pavel; Tilly, Roman (4 May 2023). "OSDaR23: Open
 99. Zendel, Oliver; Murschitz, Markus; Zeilinger, Marcel; Steininger,             Sensor Data for Rail 2023". arXiv:2305.03001 (https://arxiv.org/abs/
     Daniel; Abbasi, Sara; Beleznai, Csaba (June 2019). "RailSem19: A              2305.03001) [cs.CV (https://arxiv.org/archive/cs.CV)].
     Dataset for Semantic Rail Scene Understanding" (https://ieeexplor        113. "Home" (https://www.argoverse.org/). Argoverse. Retrieved 5 May
     e.ieee.org/document/9025646). 2019 IEEE/CVF Conference on                     2023.
     Computer Vision and Pattern Recognition Workshops (CVPRW).               114. Chang, Ming-Fang; Lambert, John; Sangkloy, Patsorn; Singh,
     pp. 1221–1229. doi:10.1109/CVPRW.2019.00161 (https://doi.org/1                Jagjeet; Bak, Slawomir; Hartnett, Andrew; Wang, De; Carr, Peter;
     0.1109%2FCVPRW.2019.00161). ISBN 978-1-7281-2506-0.                           Lucey, Simon; Ramanan, Deva; Hays, James (6 November 2019).
     S2CID 198166233 (https://api.semanticscholar.org/CorpusID:19816               "Argoverse: 3D Tracking and Forecasting with Rich Maps".
     6233).                                                                        arXiv:1911.02620 (https://arxiv.org/abs/1911.02620) [cs.CV (https://
100. "The Boreas Dataset" (https://www.boreas.utias.utoronto.ca/#/).               arxiv.org/archive/cs.CV)].
     www.boreas.utias.utoronto.ca. Retrieved 5 May 2023.                      115. Botta, M., A. Giordana, and L. Saitta. "Learning fuzzy concept
101. Burnett, Keenan; Yoon, David J.; Wu, Yuchen; Li, Andrew Zou;                  definitions (https://pdfs.semanticscholar.org/9f0e/1349d1422f1b455
     Zhang, Haowei; Lu, Shichen; Qian, Jingxing; Tseng, Wei-Kang;                  b8ccc26ebf7b114b8db20.pdf)." Fuzzy Systems, 1993., Second
     Lambert, Andrew; Leung, Keith Y. K.; Schoellig, Angela P.; Barfoot,           IEEE International Conference on. IEEE, 1993.
     Timothy D. (26 January 2023). "Boreas: A Multi-Season                    116. Frey, Peter W.; Slate, David J. (1991). "Letter recognition using
     Autonomous Driving Dataset". arXiv:2203.10168 (https://arxiv.org/a            Holland-style adaptive classifiers" (https://doi.org/10.1007%2Fbf00
     bs/2203.10168) [cs.RO (https://arxiv.org/archive/cs.RO)].                     114162). Machine Learning. 6 (2): 161–182.
102. "Bosch Small Traffic Lights Dataset" (https://hci.iwr.uni-heidelberg.d        doi:10.1007/bf00114162 (https://doi.org/10.1007%2Fbf00114162).
     e/content/bosch-small-traffic-lights-dataset). hci.iwr.uni-
     heidelberg.de. 1 March 2017. Retrieved 5 May 2023.
117. Peltonen, Jaakko; Klami, Arto; Kaski, Samuel (2004). "Improved
     learning of Riemannian metrics for exploratory analysis". Neural
     Networks. 17 (8): 1087–1100. CiteSeerX 10.1.1.59.4865 (https://cite
     seerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.4865).
     doi:10.1016/j.neunet.2004.06.008 (https://doi.org/10.1016%2Fj.neu
     net.2004.06.008). PMID 15555853 (https://pubmed.ncbi.nlm.nih.go
     v/15555853).
118. Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January
     2013). "Online and offline handwritten Chinese character
     recognition: Benchmarking on new databases". Pattern
     Recognition. 46 (1): 155–162. Bibcode:2013PatRe..46..155L (http
     s://ui.adsabs.harvard.edu/abs/2013PatRe..46..155L).
     doi:10.1016/j.patcog.2012.06.021 (https://doi.org/10.1016%2Fj.patc
     og.2012.06.021).
119. Wang, D.; Liu, C.; Yu, J.; Zhou, X. (2009). "CASIA-OLHWDB1: A
     Database of Online Handwritten Chinese Characters". 2009 10th
     International Conference on Document Analysis and Recognition.
     pp. 1206–1210. doi:10.1109/ICDAR.2009.163 (https://doi.org/10.11
     09%2FICDAR.2009.163). ISBN 978-1-4244-4500-4.
     S2CID 5705532 (https://api.semanticscholar.org/CorpusID:570553
     2).
120. Williams, Ben H., Marc Toussaint, and Amos J. Storkey. Extracting
     motion primitives from natural handwriting data (https://www.era.lib.
     ed.ac.uk/bitstream/handle/1842/3221/BH%20Williams%20PhD%20
     thesis%2009.pdf?sequence=1). Springer Berlin Heidelberg, 2006.
121. Meier, Franziska, et al. "Movement segmentation using a primitive
     library (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.3
     95.8598&rep=rep1&type=pdf)." Intelligent Robots and Systems
     (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011.
122. T. E. de Campos, B. R. Babu and M. Varma. Character recognition
     in natural images (http://personal.ee.surrey.ac.uk/Personal/T.Decam
     pos/papers/decampos_etal_visapp2009.pdf). In Proceedings of the
     International Conference on Computer Vision Theory and
     Applications (VISAPP), Lisbon, Portugal, February 2009
123. Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; André van
     Schaik (2017). "EMNIST: An extension of MNIST to handwritten
     letters". arXiv:1702.05373v1 (https://arxiv.org/abs/1702.05373v1)
     [cs.CV (https://arxiv.org/archive/cs.CV)].
124. "The EMNIST Dataset" (https://www.nist.gov/itl/products-and-servic
     es/emnist-dataset). NIST. 4 April 2017.
125. Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; André van
     Schaik (2017). "EMNIST: An extension of MNIST to handwritten
     letters". arXiv:1702.05373 (https://arxiv.org/abs/1702.05373) [cs.CV
     (https://arxiv.org/archive/cs.CV)].
126. Llorens, David, et al. "The UJIpenchars Database: a Pen-Based
     Database of Isolated Handwritten Characters (https://web.archive.or
     g/web/20190806015012/https://pdfs.semanticscholar.org/24cf/ef150
     94c59322560377bbf8e4185245c654f.pdf)." LREC. 2008.
127. Calderara, Simone; Prati, Andrea; Cucchiara, Rita (2011). "Mixtures
     of von mises distributions for people trajectory shape analysis".
     IEEE Transactions on Circuits and Systems for Video Technology.
     21 (4): 457–471. doi:10.1109/tcsvt.2011.2125550 (https://doi.org/10.
     1109%2Ftcsvt.2011.2125550). S2CID 1427766 (https://api.semanti
     cscholar.org/CorpusID:1427766).
128. Guyon, Isabelle, et al. "Result analysis of the nips 2003 feature
     selection challenge (http://papers.nips.cc/paper/2728-result-analysi
     s-of-the-nips-2003-feature-selection-challenge.pdf)." Advances in
     neural information processing systems. 2004.
129. Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B. (11 December
     2015). "Human-level concept learning through probabilistic
     program induction" (https://doi.org/10.1126%2Fscience.aab3050).
     Science. 350 (6266): 1332–1338. Bibcode:2015Sci...350.1332L (htt
     ps://ui.adsabs.harvard.edu/abs/2015Sci...350.1332L).
     doi:10.1126/science.aab3050 (https://doi.org/10.1126%2Fscience.a
     ab3050). ISSN 0036-8075 (https://www.worldcat.org/issn/0036-807
     5). PMID 26659050 (https://pubmed.ncbi.nlm.nih.gov/26659050).
130. Lake, Brenden (9 November 2019), Omniglot data set for one-shot
     learning (https://github.com/brendenlake/omniglot), retrieved
     10 November 2019
131. LeCun, Yann; et al. (1998). "Gradient-based learning applied to
     document recognition". Proceedings of the IEEE. 86 (11): 2278–
     2324. CiteSeerX 10.1.1.32.9552 (https://citeseerx.ist.psu.edu/viewd
     oc/summary?doi=10.1.1.32.9552). doi:10.1109/5.726791 (https://do
     i.org/10.1109%2F5.726791). S2CID 14542261 (https://api.semantic
     scholar.org/CorpusID:14542261).
132. Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of
     handwritten digit recognition tested on MNIST database". Image
     and Vision Computing. 22 (12): 971–981.
     doi:10.1016/j.imavis.2004.03.008 (https://doi.org/10.1016%2Fj.imav
     is.2004.03.008).
133. Xu, Lei; Krzyżak, Adam; Suen, Ching Y. (1992). "Methods of
     combining multiple classifiers and their applications to handwriting
     recognition". IEEE Transactions on Systems, Man and Cybernetics.
     22 (3): 418–435. doi:10.1109/21.155943 (https://doi.org/10.1109%2
     F21.155943). hdl:10338.dmlcz/135217 (https://hdl.handle.net/1033
     8.dmlcz%2F135217).
134. Alimoglu, Fevzi, et al. "Combining multiple classifiers for pen-based
     handwritten digit recognition (http://citeseerx.ist.psu.edu/viewdoc/su
     mmary?doi=10.1.1.25.6299)." (1996).
135. Tang, E. Ke; et al. (2005). "Linear dimensionality reduction using
     relevance weighted LDA". Pattern Recognition. 38 (4): 485–493.
     Bibcode:2005PatRe..38..485T (https://ui.adsabs.harvard.edu/abs/2
     005PatRe..38..485T). doi:10.1016/j.patcog.2004.09.005 (https://doi.
     org/10.1016%2Fj.patcog.2004.09.005). S2CID 10580110 (https://ap
     i.semanticscholar.org/CorpusID:10580110).
136. Hong, Yi, et al. "Learning a mixture of sparse distance metrics for
     classification and dimensionality reduction (https://pages.ucsd.edu/
     ~ztu/publication/iccv11_sparsemetric.pdf)." Computer Vision
     (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
137. Thoma, Martin (2017). "The HASYv2 dataset". arXiv:1701.08380 (ht
     tps://arxiv.org/abs/1701.08380) [cs.CV (https://arxiv.org/archive/cs.C
     V)].
138. Karki, Manohar; Liu, Qun; DiBiano, Robert; Basu, Saikat;
     Mukhopadhyay, Supratik (20 June 2018). "Pixel-level
     Reconstruction and Classification for Noisy Handwritten Bangla
     Characters". arXiv:1806.08037 (https://arxiv.org/abs/1806.08037)
     [cs.CV (https://arxiv.org/archive/cs.CV)].
139. Liu, Qun; Collier, Edward; Mukhopadhyay, Supratik (2019),
     "PCGAN-CHAR: Progressively Trained Classifier Generative
     Adversarial Networks for Classification of Noisy Handwritten
     Bangla Characters", Digital Libraries at the Crossroads of Digital
     Information for the Future, Springer International Publishing, pp. 3–
     15, arXiv:1908.08987 (https://arxiv.org/abs/1908.08987),
     doi:10.1007/978-3-030-34058-2_1 (https://doi.org/10.1007%2F978-
     3-030-34058-2_1), ISBN 978-3-030-34057-5, S2CID 201665955 (ht
     tps://api.semanticscholar.org/CorpusID:201665955)
140. "iSAID" (https://captain-whu.github.io/iSAID/index.html). captain-
     whu.github.io. Retrieved 30 November 2021.
141. Zamir, Syed & Arora, Aditya & Gupta, Akshita & Khan, Salman &
     Sun, Guolei & Khan, Fahad & Zhu, Fan & Shao, Ling & Xia, Gui-
     Song & Bai, Xiang. (2019). iSAID: A Large-scale Dataset for
     Instance Segmentation in Aerial Images. website (https://captain-wh
     u.github.io/iSAID/index.html)
142. Yuan, Jiangye; Gleason, Shaun S.; Cheriyadat, Anil M. (2013).
     "Systematic benchmarking of aerial image segmentation". IEEE
     Geoscience and Remote Sensing Letters. 10 (6): 1527–1531.
     Bibcode:2013IGRSL..10.1527Y (https://ui.adsabs.harvard.edu/abs/
     2013IGRSL..10.1527Y). doi:10.1109/lgrs.2013.2261453 (https://doi.
     org/10.1109%2Flgrs.2013.2261453). S2CID 629629 (https://api.se
     manticscholar.org/CorpusID:629629).
143. Vatsavai, Ranga Raju. "Object based image classification: state of
     the art and computational challenges (https://dl.acm.org/citation.cf
     m?id=2534927)." Proceedings of the 2nd ACM SIGSPATIAL
     International Workshop on Analytics for Big Geospatial Data. ACM,
     2013.
144. Butenuth, Matthias, et al. "Integrating pedestrian simulation,
     tracking and event detection for crowd analysis (http://www.hartman
     n-alberts.de/dirk/pub/proceedings2011e.pdf)." Computer Vision
     Workshops (ICCV Workshops), 2011 IEEE International
     Conference on. IEEE, 2011.
145. Fradi, Hajer, and Jean-Luc Dugelay. "Low level crowd analysis
     using frame-wise normalized feature for people counting (http://ww
     w.eurecom.fr/fr/publication/3841/download/mm-publi-3841.pdf)."
     Information Forensics and Security (WIFS), 2012 IEEE International
     Workshop on. IEEE, 2012.
146. Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan.
     "A hybrid pansharpening approach and multiscale object-based
     image analysis for mapping diseased pine and oak trees (http://cite
     seerx.ist.psu.edu/viewdoc/download?doi=10.1.1.826.9200&rep=rep
     1&type=pdf)." International journal of remote sensing 34.20 (2013):
     6969–6982.
147. Mohd Pozi, Muhammad Syafiq; Sulaiman, Md Nasir; Mustapha,
     Norwati; Perumal, Thinagaran (2015). "A new classification model
     for a class imbalanced data set using genetic programming and
     support vector machines: Case study for wilt disease classification"
     (https://www.tandfonline.com/doi/abs/10.1080/2150704X.2015.106
     2159). Remote Sensing Letters. 6 (7): 568–577.
     doi:10.1080/2150704X.2015.1062159 (https://doi.org/10.1080%2F2
     150704X.2015.1062159). S2CID 58788630 (https://api.semanticsc
     holar.org/CorpusID:58788630).
148. Gallego, A.-J.; Pertusa, A.; Gil, P. "Automatic Ship Classification
     from Optical Aerial Images with Convolutional Neural Networks (htt
     ps://www.mdpi.com/2072-4292/10/4/511)." Remote Sensing. 2018;
     10(4):511.
149. Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery
     dataset". Available: https://www.iuii.ua.es/datasets/masati/, 2018.
150. Johnson, Brian; Tateishi, Ryutaro; Xie, Zhixiao (2012). "Using
     geographically weighted variables for image classification".
     Remote Sensing Letters. 3 (6): 491–499.
     doi:10.1080/01431161.2011.629637 (https://doi.org/10.1080%2F01
     431161.2011.629637). S2CID 122543681 (https://api.semanticscho
     lar.org/CorpusID:122543681).
151. Chatterjee, Sankhadeep, et al. "Forest Type Classification: A Hybrid
     NN-GA Model Based Approach (https://www.researchgate.net/profil
     e/Sankhadeep_Chatterjee/publication/282605325_Forest_Type_Cl
     assification_A_Hybrid_NN-GA_Model_Based_Approach/links/574
     93cb308ae5c51e29e6f1b/Forest-Type-Classification-A-Hybrid-NN-
     GA-Model-Based-Approach.pdf)." Information Systems Design and
     Intelligent Applications. Springer India, 2016. 227–236.
152. Diegert, Carl. "A combinatorial method for tracing objects using
     semantics of their shape (https://www.osti.gov/servlets/purl/127883
     7)." Applied Imagery Pattern Recognition Workshop (AIPR), 2010
     IEEE 39th. IEEE, 2010.
153. Razakarivony, Sebastien, and Frédéric Jurie. "Small target
     detection combining foreground and background manifolds (https://
     hal.archives-ouvertes.fr/hal-00943444/file/13_mva-detection.pdf)."
     IAPR International Conference on Machine Vision Applications.
     2013.
154. "SpaceNet" (http://explore.digitalglobe.com/spacenet).
     explore.digitalglobe.com. Retrieved 13 March 2018.
155. Etten, Adam Van (5 January 2017). "Getting Started With SpaceNet
     Data" (https://medium.com/the-downlinq/getting-started-with-spacen
     et-data-827fd2ec9f53). The DownLinQ. Retrieved 13 March 2018.
156. Vakalopoulou, M.; Bus, N.; Karantzalosa, K.; Paragios, N. (July
     2017). "Integrating edge/Boundary priors with classification scores
     for building detection in very high resolution data". 2017 IEEE
     International Geoscience and Remote Sensing Symposium
     (IGARSS). pp. 3309–3312. doi:10.1109/IGARSS.2017.8127705 (htt
     ps://doi.org/10.1109%2FIGARSS.2017.8127705). ISBN 978-1-
     5090-4951-6. S2CID 8297433 (https://api.semanticscholar.org/Corp
     usID:8297433).
157. Yang, Yi; Newsam, Shawn (2010). Bag-of-visual-words and spatial
     extensions for land-use classification. Proceedings of the 18th
     SIGSPATIAL International Conference on Advances in Geographic
     Information Systems – GIS '10. New York, New York, USA: ACM
     Press. doi:10.1145/1869790.1869829 (https://doi.org/10.1145%2F1
     869790.1869829). ISBN 9781450304283. S2CID 993769 (https://a
     pi.semanticscholar.org/CorpusID:993769).
158. Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik;
     DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (3
     November 2015). DeepSat: a learning framework for satellite
     imagery. ACM. p. 37. doi:10.1145/2820783.2820816 (https://doi.org/
     10.1145%2F2820783.2820816). ISBN 9781450339674.
     S2CID 4387134 (https://api.semanticscholar.org/CorpusID:438713
     4).
159. Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay,
     Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna
     (21 November 2019). "DeepSat V2: feature augmented
     convolutional neural nets for satellite image classification". Remote
     Sensing Letters. 11 (2): 156–165. arXiv:1911.07747 (https://arxiv.or
     g/abs/1911.07747). doi:10.1080/2150704x.2019.1693071 (https://d
     oi.org/10.1080%2F2150704x.2019.1693071). ISSN 2150-704X (htt
     ps://www.worldcat.org/issn/2150-704X). S2CID 208138097 (https://
     api.semanticscholar.org/CorpusID:208138097).
160. Md Jahidul Islam, et al. "Semantic Segmentation of Underwater
     Imagery: Dataset and Benchmark (https://ieeexplore.ieee.org/abstra
     ct/document/9340821)." 2020 IEEE/RSJ International Conference
     on Intelligent Robots and Systems (IROS). IEEE, 2020.
161. Waszak et al. "Semantic Segmentation in Underwater Ship
     Inspections: Benchmark and Data Set (https://ieeexplore.ieee.org/d
     ocument/9998080)." IEEE Journal of Oceanic Engineering. IEEE,
     2022.
162. Ebadi, Ashkan; Paul, Patrick; Auer, Sofia; Tremblay, Stéphane (12
     November 2021). "NRC-GAMMA: Introducing a Novel Large Gas
     Meter Image Dataset". arXiv:2111.06827 (https://arxiv.org/abs/2111.
     06827) [cs.CV (https://arxiv.org/archive/cs.CV)].
163. Canada, Government of Canada National Research Council
     (2021). "The gas meter image dataset (NRC-GAMMA) - NRC
     Digital Repository" (https://nrc-digital-repository.canada.ca/eng/vie
     w/object/?id=ba1fc493-e65f-4c0a-ab31-ecbcdf00bfa4). nrc-digital-
     repository.canada.ca. doi:10.4224/3c8s-z290 (https://doi.org/10.422
     4%2F3c8s-z290). Retrieved 2 December 2021.
164. Rabah, Chaima Ben; Coatrieux, Gouenou; Abdelfattah, Riadh
     (October 2020). "The Supatlantique Scanned Documents Database
     for Digital Image Forensics Purposes" (https://dx.doi.org/10.1109/ici
     p40778.2020.9190665). 2020 IEEE International Conference on
     Image Processing (ICIP). IEEE. pp. 2096–2100.
     doi:10.1109/icip40778.2020.9190665 (https://doi.org/10.1109%2Fic
     ip40778.2020.9190665). ISBN 978-1-7281-6395-6.
     S2CID 224881147 (https://api.semanticscholar.org/CorpusID:22488
     1147).
165. Mills, Kyle; Tamblyn, Isaac (16 May 2018), Big graphene dataset,
     National Research Council of Canada,
     doi:10.4224/c8sc04578j.data (https://doi.org/10.4224%2Fc8sc0457
     8j.data)
166. Mills, Kyle; Spanner, Michael; Tamblyn, Isaac (16 May 2018).
     "Quantum simulation". Quantum simulations of an electron in a two
     dimensional potential well. National Research Council of Canada.
     doi:10.4224/PhysRevA.96.042113.data (https://doi.org/10.4224%2F
     PhysRevA.96.042113.data).
167. Rohrbach, M.; Amin, S.; Andriluka, M.; Schiele, B. (2012). "A
     database for fine grained activity detection of cooking activities".
     2012 IEEE Conference on Computer Vision and Pattern
     Recognition. IEEE. pp. 1194–1201. doi:10.1109/cvpr.2012.6247801
     (https://doi.org/10.1109%2Fcvpr.2012.6247801). ISBN 978-1-4673-
     1228-8.
168. Kuehne, Hilde, Ali Arslan, and Thomas Serre. "The language of
     actions: Recovering the syntax and semantics of goal-directed
     human activities (https://www.cv-foundation.org/openaccess/content
     _cvpr_2014/papers/Kuehne_The_Language_of_2014_CVPR_pap
     er.pdf)." Proceedings of the IEEE Conference on Computer Vision
     and Pattern Recognition. 2014.
169. Sviatoslav, Voloshynovskiy, et al. "Towards Reproducible results in
     authentication based on physical non-cloneable functions: The
     Forensic Authentication Microstructure Optical Set (FAMOS). (http://
     vision.unige.ch/publications/postscript/2012/2012.WIFS.database.p
     df)" Proceedings of IEEE International Workshop on
     Information Forensics and Security. 2012.
170. Olga, Taran and Shideh, Rezaeifar, et al. "PharmaPack: mobile
     fine-grained recognition of pharma packages (https://archive-ouvert
     e.unige.ch/unige:97444/ATTACHMENT01)." Proc. European Signal
     Processing Conference (EUSIPCO). 2017.
171. Khosla, Aditya, et al. "Novel dataset for fine-grained image
     categorization: Stanford dogs (https://people.csail.mit.edu/khosla/pa
     pers/fgvc2011.pdf)." Proc. CVPR Workshop on Fine-Grained Visual
     Categorization (FGVC). 2011.
172. Parkhi, Omkar M., et al. "Cats and dogs (http://www.robots.ox.ac.uk:
     5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf)." Computer
     Vision and Pattern Recognition (CVPR), 2012 IEEE Conference
     on. IEEE, 2012.
173. Biggs, Benjamin; Boyne, Oliver; Charles, James; Fitzgibbon,
     Andrew; Cipolla, Roberto (2020). Computer Vision – ECCV 2020.
     Lecture Notes in Computer Science. Vol. 12356. arXiv:2007.11110
     (https://arxiv.org/abs/2007.11110). doi:10.1007/978-3-030-58621-8
     (https://doi.org/10.1007%2F978-3-030-58621-8). ISBN 978-3-030-
     58620-1. S2CID 227173931 (https://api.semanticscholar.org/Corpu
     sID:227173931).
174. Razavian, Ali, et al. "CNN features off-the-shelf: an astounding
     baseline for recognition (https://www.cv-foundation.org/openaccess/
     content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Feat
     ures_Off-the-Shelf_2014_CVPR_paper.pdf)." Proceedings of the
     IEEE Conference on Computer Vision and Pattern Recognition
     Workshops. 2014.
175. Ortega, Michael; et al. (1998). "Supporting ranked boolean similarity
     queries in MARS". IEEE Transactions on Knowledge and Data
     Engineering. 10 (6): 905–925. CiteSeerX 10.1.1.36.6079 (https://cit
     eseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.6079).
     doi:10.1109/69.738357 (https://doi.org/10.1109%2F69.738357).
176. He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán.
     "Multiscale conditional random fields for image labeling (ftp://www-v
     host.cs.toronto.edu/public_html/public_html/dist/zemel/Papers/cvpr
     04.pdf)." Computer vision and pattern recognition, 2004. CVPR
     2004. Proceedings of the 2004 IEEE computer society conference
     on. Vol. 2. IEEE, 2004.
177. Deneke, Tewodros, et al. "Video transcoding time prediction for
     proactive load balancing (https://ieeexplore.ieee.org/abstract/docum
     ent/6890256/)." Multimedia and Expo (ICME), 2014 IEEE
     International Conference on. IEEE, 2014.
178. Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh,
     Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick,
     Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick,
     Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell
     (13 April 2016). "Visual Storytelling". arXiv:1604.03968 (https://arxi
     v.org/abs/1604.03968) [cs.CL (https://arxiv.org/archive/cs.CL)].
179. Wah, Catherine, et al. "The caltech-ucsd birds-200-2011 dataset (htt
     ps://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf)."
     (2011).
180. Duan, Kun, et al. "Discovering localized attributes for fine-grained
     recognition (http://vision.soic.indiana.edu/papers/attributes2012cvp
     r.pdf)." Computer Vision and Pattern Recognition (CVPR), 2012
     IEEE Conference on. IEEE, 2012.
181. "YouTube-8M Dataset" (https://research.google.com/youtube8m/).
     research.google.com. Retrieved 1 October 2016.
182. Abu-El-Haija, Sami; Kothari, Nisarg; Lee, Joonseok; Natsev, Paul;
     Toderici, George; Varadarajan, Balakrishnan; Vijayanarasimhan,
     Sudheendra (27 September 2016). "YouTube-8M: A Large-Scale
     Video Classification Benchmark". arXiv:1609.08675 (https://arxiv.or
     g/abs/1609.08675) [cs.CV (https://arxiv.org/archive/cs.CV)].
183. "YFCC100M Dataset" (http://mmcommons.org). mmcommons.org.
     Yahoo-ICSI-LLNL. Retrieved 1 June 2017.
184. Bart Thomee; David A Shamma; Gerald Friedland; Benjamin
     Elizalde; Karl Ni; Douglas Poland; Damian Borth; Li-Jia Li (25 April
     2016). "Yfcc100m: The new data in multimedia research".
     Communications of the ACM. 59 (2): 64–73. arXiv:1503.01817 (http
     s://arxiv.org/abs/1503.01817). doi:10.1145/2812802 (https://doi.org/
     10.1145%2F2812802). S2CID 207230134 (https://api.semanticsch
     olar.org/CorpusID:207230134).
185. Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "LIRIS-
     ACCEDE: A Video Database for Affective Content Analysis (https://
     hal.archives-ouvertes.fr/hal-01375518/document)," in IEEE
     Transactions on Affective Computing, 2015.
186. Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "Deep
     Learning vs. Kernel Methods: Performance for Emotion Prediction
     in Videos (https://hal.archives-ouvertes.fr/hal-01193144/documen
     t)," in 2015 Humaine Association Conference on Affective
     Computing and Intelligent Interaction (ACII), 2015.
187. M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E.
     Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "The
     mediaeval 2015 affective impact of movies task (https://www.resear
     chgate.net/profile/Hanli_Wang2/publication/309704559_The_Medi
     aEval_2015_Affective_Impact_of_Movies_Task/links/581dada308a
     e12715af33bc8/The-MediaEval-2015-Affective-Impact-of-Movies-T
     ask.pdf)," in MediaEval 2015 Workshop, 2015.
188. S. Johnson and M. Everingham, "Clustered Pose and Nonlinear
     Appearance Models for Human Pose Estimation (http://sam.johnso
     n.io/research/publications/johnson10bmvc.pdf)", in Proceedings of
     the 21st British Machine Vision Conference (BMVC2010)
189. S. Johnson and M. Everingham, "Learning Effective Human Pose
     Estimation from Inaccurate Annotation (http://sam.johnson.io/resear
     ch/publications/johnson11cvpr.pdf)", In Proceedings of IEEE
     Conference on Computer Vision and Pattern Recognition
     (CVPR2011)
190. Afifi, Mahmoud; Hussain, Khaled F. (2 November 2017). "The
     Achievement of Higher Flexibility in Multiple Choice-based Tests
     Using Image Classification Techniques". arXiv:1711.00972 (https://
     arxiv.org/abs/1711.00972) [cs.CV (https://arxiv.org/archive/cs.CV)].
191. "MCQ Dataset" (https://sites.google.com/view/mcq-dataset/mcqe-da
     taset). sites.google.com. Retrieved 18 November 2017.
192. Taj-Eddin, I. A. T. F.; Afifi, M.; Korashy, M.; Hamdy, D.; Nasser, M.;
     Derbaz, S. (July 2016). "A new compression technique for
     surveillance videos: Evaluation using new dataset". 2016 Sixth
     International Conference on Digital Information and Communication
     Technology and its Applications (DICTAP). pp. 159–164.
     doi:10.1109/DICTAP.2016.7544020 (https://doi.org/10.1109%2FDI
     CTAP.2016.7544020). ISBN 978-1-4673-9609-7. S2CID 8698850
     (https://api.semanticscholar.org/CorpusID:8698850).
193. Tabak, Michael A.; Norouzzadeh, Mohammad S.; Wolfson, David
     W.; Sweeney, Steven J.; Vercauteren, Kurt C.; Snow, Nathan P.;
     Halseth, Joseph M.; Di Salvo, Paul A.; Lewis, Jesse S.; White,
     Michael D.; Teton, Ben; Beasley, James C.; Schlichting, Peter E.;
     Boughton, Raoul K.; Wight, Bethany; Newkirk, Eric S.; Ivan, Jacob
     S.; Odell, Eric A.; Brook, Ryan K.; Lukacs, Paul M.; Moeller, Anna
     K.; Mandeville, Elizabeth G.; Clune, Jeff; Miller, Ryan S.;
     Photopoulou, Theoni (2018). "Machine learning to classify animal
     species in camera trap images: Applications in ecology" (https://doi.
     org/10.1111%2F2041-210X.13120). Methods in Ecology and
     Evolution. 10 (4): 585–590. doi:10.1111/2041-210X.13120 (https://d
     oi.org/10.1111%2F2041-210X.13120). ISSN 2041-210X (https://ww
     w.worldcat.org/issn/2041-210X).
194. Taj-Eddin, Islam A. T. F.; Afifi, Mahmoud; Korashy, Mostafa; Ahmed,
     Ali H.; Ng, Yoke Cheng; Hernandez, Evelyng; Abdel-Latif, Salma M.
     (November 2017). "Can we see photosynthesis? Magnifying the
     tiny color changes of plant green leaves using Eulerian video
     magnification". Journal of Electronic Imaging. 26 (6): 060501.
     arXiv:1706.03867 (https://arxiv.org/abs/1706.03867).
     Bibcode:2017JEI....26f0501T (https://ui.adsabs.harvard.edu/abs/20
     17JEI....26f0501T). doi:10.1117/1.jei.26.6.060501 (https://doi.org/1
     0.1117%2F1.jei.26.6.060501). ISSN 1017-9909 (https://www.worldc
     at.org/issn/1017-9909). S2CID 12367169 (https://api.semanticschol
     ar.org/CorpusID:12367169).
195. "Mathematical Mathematics Memes" (https://www.kaggle.com/abdel
     ghanibelgaid/mathematical-mathematics-memes).
196. Karras, Tero; Laine, Samuli; Aila, Timo (June 2019). "A Style-Based
     Generator Architecture for Generative Adversarial Networks" (http
     s://dx.doi.org/10.1109/cvpr.2019.00453). 2019 IEEE/CVF
     Conference on Computer Vision and Pattern Recognition (CVPR).
     IEEE. pp. 4396–4405. arXiv:1812.04948 (https://arxiv.org/abs/1812.
     04948). doi:10.1109/cvpr.2019.00453 (https://doi.org/10.1109%2Fc
     vpr.2019.00453). ISBN 978-1-7281-3293-8. S2CID 54482423 (http
     s://api.semanticscholar.org/CorpusID:54482423).
197. McAuley, Julian; Targett, Christopher; Shi, Qinfeng; Anton van den
     Hengel (2015). "Image-based Recommendations on Styles and
     Substitutes". arXiv:1506.04757 (https://arxiv.org/abs/1506.04757)
     [cs.CV (https://arxiv.org/archive/cs.CV)].
198. "Amazon review data" (https://nijianmo.github.io/amazon/index.htm
     l). nijianmo.github.io. Retrieved 8 October 2021.
199. Ganesan, Kavita; Zhai, Chengxiang (2012). "Opinion-based entity
     ranking". Information Retrieval. 15 (2): 116–150.
     doi:10.1007/s10791-011-9174-8 (https://doi.org/10.1007%2Fs1079
     1-011-9174-8). hdl:2142/15252 (https://hdl.handle.net/2142%2F152
     52). S2CID 16258727 (https://api.semanticscholar.org/CorpusID:16
     258727).
200. Lv, Yuanhua, Dimitrios Lymberopoulos, and Qiang Wu. "An
     exploration of ranking heuristics in mobile local search (http://citese
     erx.ist.psu.edu/viewdoc/download?doi=10.1.1.599.1442&rep=rep1
     &type=pdf)." Proceedings of the 35th international ACM SIGIR
     conference on Research and development in information retrieval.
     ACM, 2012.
201. Harper, F. Maxwell; Konstan, Joseph A. (2015). "The MovieLens
     Datasets: History and Context". ACM Transactions on Interactive
     Intelligent Systems. 5 (4): 19. doi:10.1145/2827872 (https://doi.org/1
     0.1145%2F2827872). S2CID 16619709 (https://api.semanticschola
     r.org/CorpusID:16619709).
202. Koenigstein, Noam, Gideon Dror, and Yehuda Koren. "Yahoo!
     music recommendations: modeling music ratings with temporal
     dynamics and item taxonomy (https://www.researchgate.net/profile/
     Noam_Koenigstein/publication/221141054_Yahoo_music_recomm
     endations_Modeling_music_ratings_with_temporal_dynamics_and
     _item_taxonomy/links/5404184a0cf2c48563b03c68/Yahoo-music-r
     ecommendations-Modeling-music-ratings-with-temporal-dynamics-
     and-item-taxonomy.pdf)." Proceedings of the fifth ACM conference
     on Recommender systems. ACM, 2011.
203. McFee, Brian, et al. "The million song dataset challenge (https://bmcfee.github.io/papers/msdchallenge.pdf)." Proceedings of the 21st international conference companion on World Wide Web. ACM, 2012.
204. Bohanec, Marko, and Vladislav Rajkovic. "Knowledge acquisition and explanation for multi-attribute decision making (https://www.researchgate.net/profile/Marko_Bohanec/publication/246614940_KNOWLEDGE_ACQUISITION_AND_EXPLANATION_FOR_MULTI-ATTRIBUTE_DECISION_MAKING/links/02e7e532152f452d87000000.pdf)." 8th Intl Workshop on Expert Systems and their Applications. 1988.
205. Tan, Peter J., and David L. Dowe. "MML inference of decision graphs with multi-way joins (http://www.csse.monash.edu.au/~dld/Publications/2002/Tan+Dowe2002_MMLDecisionGraphs.ps)." Australian Joint Conference on Artificial Intelligence. 2002.
206. "Quantifying comedy on YouTube: why the number of o's in your LOL matter" (https://metatext.io/datasets). Metatext NLP Database. Retrieved 26 October 2020.
207. Kim, Byung Joo (2012). "A Classifier for Big Data" (https://link.springer.com/chapter/10.1007/978-3-642-32692-9_63). Convergence and Hybrid Information Technology. Communications in Computer and Information Science. Vol. 310. pp. 505–512. doi:10.1007/978-3-642-32692-9_63 (https://doi.org/10.1007%2F978-3-642-32692-9_63). ISBN 978-3-642-32691-2.
208. Pérezgonzález, Jose D.; Gilbey, Andrew (2011). "Predicting Skytrax airport rankings from customer reviews" (https://www.ingentaconnect.com/content/hsp/cam/2011/00000005/00000004/art00007). Journal of Airport Management. 5 (4): 335–339.
209. Loh, Wei-Yin, and Yu-Shan Shih. "Split selection methods for classification trees (http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n41.pdf)." Statistica Sinica (1997): 815–840.
210. Lim, Tjen-Sien; Loh, Wei-Yin; Shih, Yu-Shan (2000). "A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms". Machine Learning. 40 (3): 203–228. doi:10.1023/a:1007608224229 (https://doi.org/10.1023%2Fa%3A1007608224229). S2CID 17030953 (https://api.semanticscholar.org/CorpusID:17030953).
211. Kiet Van Nguyen, Vu Duc Nguyen, Phu X. V. Nguyen, Tham T. H. Truong, Ngan Luu-Thuy Nguyen. "UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis (https://ieeexplore.ieee.org/document/8573337)."
212. Ho, Vong Anh; Nguyen, Duong Huynh-Cong; Nguyen, Danh Hoang; Pham, Linh Thi-Van; Nguyen, Duc-Vu; Nguyen, Kiet Van; Nguyen, Ngan Luu-Thuy (2020). "Emotion Recognition for Vietnamese Social Media Text" (https://link.springer.com/chapter/10.1007/978-981-15-6168-9_27). Computational Linguistics. Communications in Computer and Information Science. Vol. 1215. pp. 319–333. arXiv:1911.09339 (https://arxiv.org/abs/1911.09339). doi:10.1007/978-981-15-6168-9_27 (https://doi.org/10.1007%2F978-981-15-6168-9_27). ISBN 978-981-15-6167-2. S2CID 208202333 (https://api.semanticscholar.org/CorpusID:208202333).
213. Nhung Thi-Hong Nguyen, Phuong Ha-Dieu Phan, Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen (24 April 2021). "Vietnamese Open-domain Complaint Detection in E-Commerce Websites". arXiv:2104.11969 (https://arxiv.org/abs/2104.11969) [cs.CL (https://arxiv.org/archive/cs.CL)].
214. Phu Gia Hoang, Canh Duc Luu, Khanh Quoc Tran, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen (26 January 2023). "ViHOS: Hate Speech Spans Detection for Vietnamese". arXiv:2301.10186 (https://arxiv.org/abs/2301.10186) [cs.CL (https://arxiv.org/archive/cs.CL)].
215. Dermouche, Mohamed; Velcin, Julien; Khouas, Leila; Loudcher, Sabine (2014). "A Joint Model for Topic-Sentiment Evolution over Time". 2014 IEEE International Conference on Data Mining. IEEE. pp. 773–778. doi:10.1109/icdm.2014.82 (https://doi.org/10.1109%2Ficdm.2014.82). ISBN 978-1-4799-4302-9.
216. Rose, Tony; Stevenson, Mark; Whitehead, Miles (2002). "The Reuters Corpus Volume 1-from Yesterday's News to Tomorrow's Language Resources" (https://web.archive.org/web/20190806015015/https://pdfs.semanticscholar.org/3e4b/dc7f8904c58f8fce199389299ec1ed8e1226.pdf) (PDF). LREC. 2. S2CID 9239414 (https://api.semanticscholar.org/CorpusID:9239414). Archived from the original (https://pdfs.semanticscholar.org/3e4b/dc7f8904c58f8fce199389299ec1ed8e1226.pdf) (PDF) on 6 August 2019.
217. Amini, Massih R.; Usunier, Nicolas; Goutte, Cyril (2009). "Learning from Multiple Partially Observed Views – an Application to Multilingual Text Categorization" (http://papers.nips.cc/paper/3690-learning-from-multiple-partially-observed-views-an-application-to-multilingual-text-categorization). Advances in Neural Information Processing Systems. 22: 28–36.
218. Liu, Ming; et al. (2015). "VRCA: a clustering algorithm for massive amount of texts" (https://www.aaai.org/ocs/index.php/IJCAI/IJCAI15/paper/download/10903/10990). Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press.
219. Al-Harbi, S; Almuhareb, A; Al-Thubaity, A; Khorsheed, M. S.; Al-Rajeh, A (2008). "Automatic Arabic Text Classification". Proceedings of the 9th International Conference on the Statistical Analysis of Textual Data, Lyon, France.
220. "Relationship and Entity Extraction Evaluation Dataset: Dstl/re3d" (https://github.com/dstl/re3d). GitHub. 17 December 2018.
221. "The Examiner – SpamClickBait Catalogue" (https://www.kaggle.com/therohk/examine-the-examiner).
222. "A Million News Headlines" (https://www.kaggle.com/therohk/million-headlines).
223. "One Week of Global News Feeds" (https://www.kaggle.com/therohk/global-news-week).
224. Kulkarni, Rohit (2018), Reuters News-Wire Archive, Harvard Dataverse, doi:10.7910/DVN/XDB74W (https://doi.org/10.7910%2FDVN%2FXDB74W)
225. "IrishTimes – the Waxy-Wany News" (https://www.kaggle.com/therohk/ireland-historical-news).
226. "News Headlines Dataset For Sarcasm Detection" (https://kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection). kaggle.com. Retrieved 27 April 2019.
227. Klimt, Bryan, and Yiming Yang. "Introducing the Enron Corpus (https://bklimt.com/papers/2004_klimt_ceas.pdf)." CEAS. 2004.
228. Kossinets, Gueorgi; Kleinberg, Jon; Watts, Duncan (2008). "The Structure of Information Pathways in a Social Communication Network". arXiv:0806.3201 (https://arxiv.org/abs/0806.3201) [physics.soc-ph (https://arxiv.org/archive/physics.soc-ph)].
229. Androutsopoulos, Ion; Koutsias, John; Chandrinos, Konstantinos V.; Paliouras, George; Spyropoulos, Constantine D. (2000). "An evaluation of Naive Bayesian anti-spam filtering". In Potamias, G.; Moustakis, V.; van Someren, M. (eds.). Proceedings of the Workshop on Machine Learning in the New Information Age. 11th European Conference on Machine Learning, Barcelona, Spain. Vol. 11. pp. 9–17. arXiv:cs/0006013 (https://arxiv.org/abs/cs/0006013). Bibcode:2000cs........6013A (https://ui.adsabs.harvard.edu/abs/2000cs........6013A).
230. Bratko, Andrej; et al. (2006). "Spam filtering using statistical data compression models" (http://www.jmlr.org/papers/volume7/bratko06a/bratko06a.pdf) (PDF). The Journal of Machine Learning Research. 7: 2673–2698.
231. Almeida, Tiago A., José María G. Hidalgo, and Akebo Yamakami. "Contributions to the study of SMS spam filtering: new collection and results (http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/doceng11.pdf)." Proceedings of the 11th ACM symposium on Document engineering. ACM, 2011.
232. Delany, Sarah Jane; Buckley, Mark; Greene, Derek (2012). "SMS spam filtering: methods and data" (https://arrow.dit.ie/cgi/viewcontent.cgi?article=1022&context=scschcomart). Expert Systems with Applications. 39 (10): 9899–9908. doi:10.1016/j.eswa.2012.02.053 (https://doi.org/10.1016%2Fj.eswa.2012.02.053). S2CID 15546924 (https://api.semanticscholar.org/CorpusID:15546924).
233. Joachims, Thorsten. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization (https://apps.dtic.mil/dtic/tr/fulltext/u2/a307731.pdf). No. CMU-CS-96-118. Carnegie Mellon University, Pittsburgh, PA, Department of Computer Science, 1996.
234. Dimitrakakis, Christos, and Samy Bengio. Online Policy Adaptation for Ensemble Algorithms (https://infoscience.epfl.ch/record/82788/files/rr02-28.pdf). No. EPFL-REPORT-82788. IDIAP, 2002.
235. Dooms, S. et al. "Movietweetings: a movie rating dataset collected from twitter, 2013. Available from https://github.com/sidooms/MovieTweetings."
236. RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2017). "Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval". arXiv:1703.06618 (https://arxiv.org/abs/1703.06618) [cs.CV (https://arxiv.org/archive/cs.CV)].
237. "huyt16/Twitter100k" (https://github.com/huyt16/Twitter100k). GitHub. Retrieved 26 March 2018.
238. Go, Alec; Bhayani, Richa; Huang, Lei (2009). "Twitter sentiment classification using distant supervision". CS224N Project Report, Stanford. 1: 12.
239. Chikersal, Prerna, Soujanya Poria, and Erik Cambria. "SeNTU: sentiment analysis of tweets by combining a rule-based classifier with supervised learning (https://www.aclweb.org/anthology/S15-2108)." Proceedings of the International Workshop on Semantic Evaluation, SemEval. 2015.
240. Zafarani, Reza, and Huan Liu. "Social computing data repository at ASU." School of Computing, Informatics and Decision Systems Engineering, Arizona State University (2009).
241. Bisgin, Halil, Nitin Agarwal, and Xiaowei Xu. "Investigating homophily in online social networks (http://www.academia.edu/download/3746109/4191a533.pdf)." Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on. Vol. 1. IEEE, 2010.
242. McAuley, Julian J.; Leskovec, Jure. "Learning to Discover Social Circles in Ego Networks". NIPS. 2012: 2012.
243. Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko (2014). "Network-based statistical comparison of citation topology of bibliographic databases" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4178292). Scientific Reports. 4 (6496): 6496. arXiv:1502.05061 (https://arxiv.org/abs/1502.05061). Bibcode:2014NatSR...4E6496S (https://ui.adsabs.harvard.edu/abs/2014NatSR...4E6496S). doi:10.1038/srep06496 (https://doi.org/10.1038%2Fsrep06496). PMC 4178292 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4178292). PMID 25263231 (https://pubmed.ncbi.nlm.nih.gov/25263231).
244. Abdulla, N., et al. "Arabic sentiment analysis: Corpus-based and lexicon-based." Proceedings of the IEEE conference on Applied Electrical Engineering and Computing Technologies (AEECT). 2013.
245. Abooraig, Raddad, et al. "On the automatic categorization of Arabic articles based on their political orientation (https://www.researchgate.net/profile/Shadi_Alzubi/publication/324487844_Automatic_categorization_of_Arabic_articles_based_on_their_political_orientation/links/5c1201c9299bf139c7549e1a/Automatic-categorization-of-Arabic-articles-based-on-their-political-orientation.pdf)." Third International Conference on Informatics Engineering and Information Science (ICIEIS2014). 2014.
246. Kawala, François, et al. "Prédictions d'activité dans les réseaux sociaux en ligne (https://hal.archives-ouvertes.fr/hal-00881395/document)." 4ième conférence sur les modèles et l'analyse des réseaux: Approches mathématiques et informatiques. 2013.
247. Sabharwal, Ashish; Samulowitz, Horst; Tesauro, Gerald (2015). "Selecting Near-Optimal Learners via Incremental Data Allocation". arXiv:1601.00024 (https://arxiv.org/abs/1601.00024) [cs.LG (https://arxiv.org/archive/cs.LG)].
248. Xu et al. "SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT) (https://www.aclweb.org/anthology/S15-2001)" Proceedings of the 9th International Workshop on Semantic Evaluation. 2015.
249. Xu et al. "Extracting Lexically Divergent Paraphrases from Twitter (https://transacl.org/ojs/index.php/tacl/article/viewFile/498/64)" Transactions of the Association for Computational Linguistics (TACL). 2014.
250. Middleton, Stuart E; Middleton, Lee; Modafferi, Stefano (2014). "Real-Time Crisis Mapping of Natural Disasters Using Social Media" (https://eprints.soton.ac.uk/370581/1/ieee-is2014.pdf) (PDF). IEEE Intelligent Systems. 29 (2): 9–17. doi:10.1109/MIS.2013.126 (https://doi.org/10.1109%2FMIS.2013.126). S2CID 15139204 (https://api.semanticscholar.org/CorpusID:15139204).
251. "geoparsepy" (https://pypi.org/project/geoparsepy). 2016. Python PyPI library
252. Gupta, Aakash (5 December 2020). "Dutch social media collection" (http://localhost:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/MTPTL7). doi:10.5072/FK2/MTPTL7 (https://doi.org/10.5072%2FFK2%2FMTPTL7).
253. "Streamlit" (https://huggingface.co/datasets/viewer/?dataset=dutch_social). huggingface.co. Retrieved 18 December 2020.
254. "Dutch Social media collection" (https://kaggle.com/skylord/dutch-tweets). kaggle.com. Retrieved 18 December 2020.
255. Forsyth, E., Lin, J., & Martell, C. (2008, June 25). The NPS Chat Corpus. Retrieved from http://faculty.nps.edu/cmartell/NPSChat.htm
256. Sordoni, Alessandro; Galley, Michel; Auli, Michael; Brockett, Chris; Ji, Yangfeng; Mitchell, Margaret; Nie, Jian-Yun; Gao, Jianfeng; Dolan, Bill (2015). "A Neural Network Approach to Context-Sensitive Generation of Conversational Responses". arXiv:1506.06714 (https://arxiv.org/abs/1506.06714) [cs.CL (https://arxiv.org/archive/cs.CL)].
257. Shaoul, C. & Westbury C. (2013) A reduced redundancy USENET corpus (2005–2011) Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.d
258. KAN, M. (2011, January). NUS Short Message Service (SMS) Corpus. Retrieved from http://www.comp.nus.edu.sg/entrepreneurship/innovation/osr/corpus/ Archived (https://web.archive.org/web/20180629055042/http://www.comp.nus.edu.sg/entrepreneurship/innovation/osr/corpus/) 29 June 2018 at the Wayback Machine
259. Stuck_In_the_Matrix. (2015, July 3). I have every publicly available Reddit comment for research. ~ 1.7 billion comments @ 250 GB compressed. Any interest in this? [Original post]. Message posted to https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_pu
260. Lowe, Ryan; Pow, Nissan; Serban, Iulian; Pineau, Joelle (2015). "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems". arXiv:1506.08909 (https://arxiv.org/abs/1506.08909) [cs.CL (https://arxiv.org/archive/cs.CL)].
261. Jason Williams, Antoine Raux, Matthew Henderson, "The Dialog State Tracking Challenge Series: A Review (https://www.microsoft.com/en-us/research/publication/the-dialog-state-tracking-challenge-series-a-review/)", Dialogue & Discourse, April 2016.
262. Hoppe, Travis (16 December 2021), The-Pile-FreeLaw (https://github.com/thoppe/The-Pile-FreeLaw), retrieved 11 January 2023
263. Zheng, Lucia; Guha, Neel; Anderson, Brandon R.; Henderson, Peter; Ho, Daniel E. (21 June 2021). "When does pretraining help?" (https://dx.doi.org/10.1145/3462757.3466088). Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law. New York, NY, USA: ACM: 159–168. doi:10.1145/3462757.3466088 (https://doi.org/10.1145%2F3462757.3466088). ISBN 9781450385268. S2CID 233296302 (https://api.semanticscholar.org/CorpusID:233296302).
264. "pile-of-law/pile-of-law · Datasets at Hugging Face" (https://huggingface.co/datasets/pile-of-law/pile-of-law). huggingface.co. 4 July 2022. Retrieved 11 January 2023.
265. "About | Caselaw Access Project" (https://case.law/about/). case.law. Retrieved 11 January 2023.
266. K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes, "HDLTex: Hierarchical Deep Learning for Text Classification", 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 364–371. doi:10.1109/ICMLA.2017.0-134 (https://doi.org/10.1109/ICMLA.2017.0-134)
267. K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes, "Web of Science Dataset", doi:10.17632/9rw3vkcfy4.6 (https://doi.org/10.17632%2F9rw3vkcfy4.6)
268. Galgani, Filippo, Paul Compton, and Achim Hoffmann. "Combining different summarization techniques for legal text (https://www.aclweb.org/anthology/W12-0515)." Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data. Association for Computational Linguistics, 2012.
269. Nagwani, N. K. (2015). "Summarizing large text collection using topic modeling and clustering based on MapReduce framework" (https://doi.org/10.1186%2Fs40537-015-0020-5). Journal of Big Data. 2 (1): 1–18. doi:10.1186/s40537-015-0020-5 (https://doi.org/10.1186%2Fs40537-015-0020-5).
270. Schler, Jonathan; et al. (2006). "Effects of Age and Gender on Blogging" (https://www.aaai.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03-039.pdf) (PDF). AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs. 6.
271. Anand, Pranav, et al. "Believe Me-We Can Do This! Annotating Persuasive Acts in Blog Text." Computational Models of Natural Argument. 2011.
272. Traud, Amanda L., Peter J. Mucha, and Mason A. Porter. "Social structure of Facebook networks." Physica A: Statistical Mechanics and its Applications 391.16 (2012): 4165–4180.
273. Richard, Emile; Savalle, Pierre-Andre; Vayatis, Nicolas (2012). "Estimation of Simultaneously Sparse and Low Rank Matrices". arXiv:1206.6474 (https://arxiv.org/abs/1206.6474) [cs.DS (https://arxiv.org/archive/cs.DS)].
274. Richardson, Matthew; Burges, Christopher JC; Renshaw, Erin (2013). "MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text" (https://www.aclweb.org/anthology/D13-1020). EMNLP. 1.
275. Weston, Jason; Bordes, Antoine; Chopra, Sumit; Rush, Alexander M.; Bart van Merriënboer; Joulin, Armand; Mikolov, Tomas (2015). "Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks". arXiv:1502.05698 (https://arxiv.org/abs/1502.05698) [cs.AI (https://arxiv.org/archive/cs.AI)].
276. Marcus, Mitchell P.; Ann Marcinkiewicz, Mary; Santorini, Beatrice (1993). "Building a large annotated corpus of English: The Penn Treebank" (http://repository.upenn.edu/cgi/viewcontent.cgi?article=1246&context=cis_reports). Computational Linguistics. 19 (2): 313–330.
277. Collins, Michael (2003). "Head-driven statistical models for natural language parsing" (https://doi.org/10.1162%2F089120103322753356). Computational Linguistics. 29 (4): 589–637. doi:10.1162/089120103322753356 (https://doi.org/10.1162%2F089120103322753356).
278. Guyon, Isabelle, et al., eds. Feature extraction: foundations and applications (https://books.google.com/books?id=FOTzBwAAQBAJ&q=DEXTER). Vol. 207. Springer, 2008.
279. Lin, Yuri, et al. "Syntactic annotations for the google books ngram corpus (https://www.aclweb.org/anthology/P/P12/P12-3029.pdf)." Proceedings of the ACL 2012 system demonstrations. Association for Computational Linguistics, 2012.
280. Krishnamoorthy, Niveda; et al. (2013). "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge" (https://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/download/6454/7204). AAAI. 1.
281. Luyckx, Kim, and Walter Daelemans. "Personae: a Corpus for Author and Personality Prediction from Text (http://www.academia.edu/download/30766398/759.pdf)." LREC. 2008.
282. Solorio, Thamar, Ragib Hasan, and Mainul Mizan. "A case study of sockpuppet detection in wikipedia (https://www.aclweb.org/anthology/W13-1107)." Workshop on Language Analysis in Social Media (LASM) at NAACL HLT. 2013.
283. "Pushshift Files" (https://files.pushshift.io/). files.pushshift.io. Retrieved 12 January 2023.
284. Baumgartner, Jason; Zannettou, Savvas; Keegan, Brian; Squire, Megan; Blackburn, Jeremy (23 January 2020). "The Pushshift Reddit Dataset". arXiv:2001.08435 (https://arxiv.org/abs/2001.08435) [cs.SI (https://arxiv.org/archive/cs.SI)].
285. Ciarelli, Patrick Marques, and Elias Oliveira. "Agglomeration and elimination of terms for dimensionality reduction (https://ieeexplore.ieee.org/abstract/document/5364970/)." Intelligent Systems Design and Applications, 2009. ISDA'09. Ninth International Conference on. IEEE, 2009.
286. Zhou, Mingyuan, Oscar Hernan Madrid Padilla, and James G. Scott. "Priors for random count matrices derived from a family of negative binomial processes." Journal of the American Statistical Association just-accepted (2015): 00–00.
287. Kotzias, Dimitrios, et al. "From group to individual labels using deep features (http://datalab.ics.uci.edu/papers/kdd2015_dimitris.pdf)." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.
288. Ning, Yue; Muthiah, Sathappan; Rangwala, Huzefa; Ramakrishnan, Naren (2016). "Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning". arXiv:1602.08033 (https://arxiv.org/abs/1602.08033) [cs.SI (https://arxiv.org/archive/cs.SI)].
289. Buza, Krisztian. "Feedback prediction for blogs (http://www.cs.bme.hu/~buza/pdfs/gfkl2012_blogs.pdf)." Data analysis, machine learning and knowledge discovery. Springer International Publishing, 2014. 145–152.
290. Soysal, Ömer M (2015). "Association rule mining with mostly associated sequential patterns". Expert Systems with Applications. 42 (5): 2582–2592. doi:10.1016/j.eswa.2014.10.049 (https://doi.org/10.1016%2Fj.eswa.2014.10.049).
291. Bowman, Samuel R.; Angeli, Gabor; Potts, Christopher; Manning, Christopher D. (2015). "A large annotated corpus for learning natural language inference". arXiv:1508.05326 (https://arxiv.org/abs/1508.05326) [cs.CL (https://arxiv.org/archive/cs.CL)].
292. "DSL Corpus Collection" (http://ttg.uni-saarland.de/resources/DSLCC/). ttg.uni-saarland.de. Retrieved 22 September 2017.
293. "Urban Dictionary Words and Definitions" (https://www.kaggle.com/therohk/urban-dictionary-words-dataset).
294. H. Elsahar, P. Vougiouklis, A. Remaci, C. Gravier, J. Hare, F. Laforest, E. Simperl, "T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples (https://www.aclweb.org/anthology/L18-1544)", Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018).
295. Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omer; Bowman, Samuel R. (2018). "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". arXiv:1804.07461 (https://arxiv.org/abs/1804.07461) [cs.CL (https://arxiv.org/archive/cs.CL)].
296. "Computers Are Learning to Read—But They're Still Not So Smart" (https://www.wired.com/story/computers-are-learning-to-read-but-theyre-still-not-so-smart/). Wired. Retrieved 29 December 2019.
297. "GLUE Benchmark" (https://gluebenchmark.com/). gluebenchmark.com. Retrieved 25 February 2019.
298. Quan, Hoang Lam; Quang, Duy Le; Van Kiet, Nguyen; Ngan, Luu-Thuy Nguyen. "UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning" (https://www.springerprofessional.de/uit-viic-a-dataset-for-the-first-evaluation-on-vietnamese-image-/18612672).
299. To, Quoc Huy; Nguyen, Van Kiet; Nguyen, Luu Thuy Ngan; Nguyen, Gia Tuan Anh (2020). "Gender Prediction Based on Vietnamese Names with Machine Learning Techniques". Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval. pp. 55–60. arXiv:2010.10852 (https://arxiv.org/abs/2010.10852). doi:10.1145/3443279.3443309 (https://doi.org/10.1145%2F3443279.3443309). ISBN 9781450377607. S2CID 224814110 (https://api.semanticscholar.org/CorpusID:224814110).
300. Nguyen, Luan Thanh; Van Nguyen, Kiet; Nguyen, Ngan Luu-Thuy (18 March 2021). "Constructive and Toxic Speech Detection for Open-Domain Social Media Comments in Vietnamese". Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. Lecture Notes in Computer Science. Vol. 12798. pp. 572–583. arXiv:2103.10069 (https://arxiv.org/abs/2103.10069). doi:10.1007/978-3-030-79457-6_49 (https://doi.org/10.1007%2F978-3-030-79457-6_49). ISBN 978-3-030-79456-9. S2CID 232269671 (https://api.semanticscholar.org/CorpusID:232269671).
301. M. Versteegh, R. Thiollière, T. Schatz, X.-N. Cao, X. Anguera, A. Jansen, and E. Dupoux (2015). "The Zero Resource Speech Challenge 2015," in INTERSPEECH-2015.
302. M. Versteegh, X. Anguera, A. Jansen, and E. Dupoux, (2016). "The Zero Resource Speech Challenge 2015: Proposed Approaches and Results (https://core.ac.uk/download/pdf/82574050.pdf)," in SLTU-2016.
303. Sakar, Betul Erdogdu; et al. (2013). "Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings". IEEE Journal of Biomedical and Health Informatics. 17 (4): 828–834. doi:10.1109/jbhi.2013.2245674 (https://doi.org/10.1109%2Fjbhi.2013.2245674). PMID 25055311 (https://pubmed.ncbi.nlm.nih.gov/25055311). S2CID 15491516 (https://api.semanticscholar.org/CorpusID:15491516).
304. Zhao, Shunan, et al. "Automatic detection of expressed emotion in Parkinson's disease (https://www.researchgate.net/profile/Steven_Livingstone2/publication/267623907_Automatic_detection_of_expressed_emotion_in_Parkinson%27s_Disease/links/5453af1d0cf26d5090a54cfe/Automatic-detection-of-expressed-emotion-in-Parkinsons-Disease.pdf)." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
305. Used in: Hammami, Nacereddine, and Mouldi Bedda. "Improved tree model for Arabic speech recognition." Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on. Vol. 5. IEEE, 2010.
306. Maaten, Laurens. "Learning discriminative fisher kernels (https://lvdmaaten.github.io/publications/papers/ICML_2011.pdf)." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
307. Cole, Ronald, and Mark Fanty. "Spoken letter recognition (https://www.aclweb.org/anthology/H90-1075)." Proc. Third DARPA Speech and Natural Language Workshop. 1990.
308. Chapelle, Olivier; Sindhwani, Vikas; Keerthi, Sathiya S. (2008).
     "Optimization techniques for semi-supervised support vector
     machines" (http://www.jmlr.org/papers/volume9/chapelle08a/chapel
     le08a.pdf) (PDF). The Journal of Machine Learning Research. 9:
     203–233.
309. Kudo, Mineichi; Toyama, Jun; Shimbo, Masaru (1999).
     "Multidimensional curve classification using passing-through
     regions". Pattern Recognition Letters. 20 (11): 1103–1111.
     Bibcode:1999PaReL..20.1103K (https://ui.adsabs.harvard.edu/abs/
     1999PaReL..20.1103K). CiteSeerX 10.1.1.46.2515 (https://citeseer
     x.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2515).
     doi:10.1016/s0167-8655(99)00077-x (https://doi.org/10.1016%2Fs0
     167-8655%2899%2900077-x).
310. Jaeger, Herbert; et al. (2007). "Optimization and applications of
     echo state networks with leaky-integrator neurons". Neural
     Networks. 20 (3): 335–352. doi:10.1016/j.neunet.2007.04.016 (http
     s://doi.org/10.1016%2Fj.neunet.2007.04.016). PMID 17517495 (http
     s://pubmed.ncbi.nlm.nih.gov/17517495).
311. Tsanas, Athanasios; et al. (2010). "Accurate telemonitoring of
     Parkinson's disease progression by noninvasive speech tests" (htt
     p://precedings.nature.com/documents/3920/version/1). IEEE
     Transactions on Biomedical Engineering (Submitted manuscript).
     57 (4): 884–893. doi:10.1109/tbme.2009.2036000 (https://doi.org/1
     0.1109%2Ftbme.2009.2036000). PMID 19932995 (https://pubmed.
     ncbi.nlm.nih.gov/19932995). S2CID 7382779 (https://api.semantics
     cholar.org/CorpusID:7382779).
312. Clifford, Gari D.; Clifton, David (2012). "Wireless technology in
     disease management and medicine". Annual Review of Medicine.
     63: 479–492. doi:10.1146/annurev-med-051210-114650 (https://doi.
     org/10.1146%2Fannurev-med-051210-114650). PMID 22053737 (h
     ttps://pubmed.ncbi.nlm.nih.gov/22053737).
313. Zue, Victor; Seneff, Stephanie; Glass, James (1990). "Speech
     database development at MIT: TIMIT and beyond". Speech
     Communication. 9 (4): 351–356. doi:10.1016/0167-6393(90)90010-
     7 (https://doi.org/10.1016%2F0167-6393%2890%2990010-7).
314. Kapadia, Sadik, Valtcho Valtchev, and S. J. Young. "MMI training for
     continuous phoneme recognition on the TIMIT database."
     Acoustics, Speech, and Signal Processing, 1993. ICASSP-93.,
     1993 IEEE International Conference on. Vol. 2. IEEE, 1993.
315. Halabi, Nawar (2016). Modern Standard Arabic Phonetics for
     Speech Synthesis (http://en.arabicspeechcorpus.com/Nawar%20H
     alabi%20PhD%20Thesis%20Revised.pdf) (PDF) (PhD Thesis).
     University of Southampton, School of Electronics and Computer
     Science.
316. Ardila, Rosana; Branson, Megan; Davis, Kelly; Henretty, Michael;
     Kohler, Michael; Meyer, Josh; Morais, Reuben; Saunders, Lindsay;
     Tyers, Francis M.; Weber, Gregor (13 December 2019). "Common
     Voice: A Massively-Multilingual Speech Corpus".
     arXiv:1912.06670v2 (https://arxiv.org/abs/1912.06670v2) [cs.CL (htt
     ps://arxiv.org/archive/cs.CL)].
317. "The LJ Speech Dataset" (https://keithito.com/LJ-Speech-Dataset).
     keithito.com. Retrieved 13 April 2022.
318. Zhou, Fang, Q. Claire, and Ross D. King. "Predicting the
     geographical origin of music (https://ieeexplore.ieee.org/abstract/do
     cument/7023456/)." Data Mining (ICDM), 2014 IEEE International
     Conference on. IEEE, 2014.
319. Saccenti, Edoardo; Camacho, José (2015). "On the use of the
     observation‐wise k‐fold operation in PCA cross‐validation". Journal
     of Chemometrics. 29 (8): 467–478. doi:10.1002/cem.2726 (https://d
     oi.org/10.1002%2Fcem.2726). hdl:10481/55302 (https://hdl.handle.
     net/10481%2F55302). S2CID 62248957 (https://api.semanticschola
     r.org/CorpusID:62248957).
320. Bertin-Mahieux, Thierry, et al. "The million song dataset." ISMIR
     2011: Proceedings of the 12th International Society for Music
     Information Retrieval Conference, 24–28 October 2011, Miami,
     Florida. University of Miami, 2011.
321. Henaff, Mikael; et al. (2011). "Unsupervised learning of sparse
     features for scalable audio classification" (https://archives.ismir.net/i
     smir2011/paper/000128.pdf) (PDF). ISMIR. 11.
322. Rafii, Zafar (2017). "Music". MUSDB18 – a corpus for music
     separation. doi:10.5281/zenodo.1117372 (https://doi.org/10.5281%
     2Fzenodo.1117372).
323. Defferrard, Michaël; Benzi, Kirell; Vandergheynst, Pierre; Bresson,
     Xavier (6 December 2016). "FMA: A Dataset For Music Analysis".
     arXiv:1612.01840 (https://arxiv.org/abs/1612.01840) [cs.SD (https://
     arxiv.org/archive/cs.SD)].
324. Esposito, Roberto; Radicioni, Daniele P. (2009). "Carpediem:
     Optimizing the viterbi algorithm and applications to supervised
     sequential learning" (http://www.jmlr.org/papers/volume10/esposito
     09a/esposito09a.pdf) (PDF). The Journal of Machine Learning
     Research. 10: 1851–1880.
325. Sourati, Jamshid; et al. (2016). "Classification Active Learning
     Based on Mutual Information" (https://doi.org/10.3390%2Fe180200
     51). Entropy. 18 (2): 51. Bibcode:2016Entrp..18...51S (https://ui.ads
     abs.harvard.edu/abs/2016Entrp..18...51S). doi:10.3390/e18020051
     (https://doi.org/10.3390%2Fe18020051).
326. Salamon, Justin; Jacoby, Christopher; Bello, Juan Pablo. "A dataset
     and taxonomy for urban sound research (https://www.researchgate.
     net/profile/Justin_Salamon/publication/267269056_A_Dataset_and
     _Taxonomy_for_Urban_Sound_Research/links/544936af0cf2f6388
     0810a84/A-Dataset-and-Taxonomy-for-Urban-Sound-Research.pd
     f)." Proceedings of the ACM International Conference on
     Multimedia. ACM, 2014.
327. Lagrange, Mathieu; Lafay, Grégoire; Rossignol, Mathias; Benetos,
     Emmanouil; Roebel, Axel (2015). "An evaluation framework for
     event detection using a morphological model of acoustic scenes".
     arXiv:1502.00141 (https://arxiv.org/abs/1502.00141) [stat.ML (https://
     arxiv.org/archive/stat.ML)].
328. Gemmeke, Jort F., et al. "Audio Set: An ontology and human-
     labeled dataset for audio events." IEEE International Conference on
     Acoustics, Speech, and Signal Processing (ICASSP). 2017.
329. "Watch out, birders: Artificial intelligence has learned to spot birds
     from their songs" (https://www.science.org/content/article/watch-out-
     birders-artificial-intelligence-has-learned-spot-birds-their-songs).
     Science | AAAS. 18 July 2018. Retrieved 22 July 2018.
330. "Bird Audio Detection challenge" (http://machine-listening.eecs.qmu
     l.ac.uk/bird-audio-detection-challenge/). Machine Listening Lab at
     Queen Mary University. 3 May 2016. Retrieved 22 July 2018.
331. Wichern, Gordon; Antognini, Joe; Flynn, Michael; Licheng Richard
     Zhu; McQuinn, Emmett; Crow, Dwight; Manilow, Ethan; Jonathan Le
     Roux (2019). "WHAM!: Extending Speech Separation to Noisy
     Environments". arXiv:1907.01160 (https://arxiv.org/abs/1907.01160)
     [cs.SD (https://arxiv.org/archive/cs.SD)].
332. Drossos, K., Lipping, S., and Virtanen, T. "Clotho: An Audio
     Captioning Dataset" IEEE International Conference on Acoustics,
     Speech, and Signal Processing (ICASSP). 2020.
333. Drossos, K., Lipping, S., and Virtanen, T. (2019). Clotho dataset
     (Version 1.0) [Data set]. Zenodo.
     http://doi.org/10.5281/zenodo.3490684 (https://doi.org/10.5281/zeno
     do.3490684)
334. The CAIDA UCSD Dataset on the Witty Worm – 19–24 March 2004,
     http://www.caida.org/data/passive/witty_worm_dataset.xml
335. Chen, Zesheng, and Chuanyi Ji. "Optimal worm-scanning method
     using vulnerable-host distributions (https://web.archive.org/web/201
     90806022753/https://pdfs.semanticscholar.org/672e/7be9499fef9a7
     ff6b131b650a4de7614aae8.pdf)." International Journal of Security
     and Networks 2.1–2 (2007): 71–80.
336. Kachuee, Mohamad, et al. "Cuff-less high-accuracy calibration-free
     blood pressure estimation using pulse transit time (http://download.
     xuebalib.com/533elteIDEwk.pdf)." Circuits and Systems (ISCAS),
     2015 IEEE International Symposium on. IEEE, 2015.
337. PhysioBank, PhysioToolkit. "PhysioNet: components of a new
     research resource for complex physiologic signals." Circulation.
     v101 i23. e215-e220.
338. Vergara, Alexander; et al. (2012). "Chemical gas sensor drift
     compensation using classifier ensembles". Sensors and Actuators
     B: Chemical. 166: 320–329. doi:10.1016/j.snb.2012.01.074 (https://
     doi.org/10.1016%2Fj.snb.2012.01.074).
339. Korotcenkov, G.; Cho, B. K. (2014). "Engineering approaches to
     improvement of conductometric gas sensor parameters. Part 2:
     Decrease of dissipated (consumable) power and improvement
     stability and reliability". Sensors and Actuators B: Chemical. 198:
     316–341. doi:10.1016/j.snb.2014.03.069 (https://doi.org/10.1016%2
     Fj.snb.2014.03.069).
340. Quinlan, John R (1992). "Learning with continuous classes" (https://
     sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Quinlan-AI.pdf)
     (PDF). 5th Australian Joint Conference on Artificial Intelligence. 92.
341. Merz, Christopher J.; Pazzani, Michael J. (1999). "A principal
     components approach to combining regression estimates" (https://d
     oi.org/10.1023%2Fa%3A1007507221352). Machine Learning. 36
     (1–2): 9–32. doi:10.1023/a:1007507221352 (https://doi.org/10.102
     3%2Fa%3A1007507221352).
342. Torres-Sospedra, Joaquin, et al. "UJIIndoorLoc-Mag: A new
     database for magnetic field-based localization problems." Indoor
     Positioning and Indoor Navigation (IPIN), 2015 International
     Conference on. IEEE, 2015.
343. Berkvens, Rafael, Maarten Weyn, and Herbert Peremans. "Mean
     Mutual Information of Probabilistic Wi-Fi Localization (https://www.r
     esearchgate.net/profile/Raf_Berkvens/publication/284154212_Mea
     n_Mutual_Information_of_Probabilistic_Wi-Fi_Localization/links/56
     4c6b7508aeab8ed5e92fcb.pdf)." Indoor Positioning and Indoor
     Navigation (IPIN), 2015 International Conference on. Banff,
     Canada: IPIN. 2015.
344. Paschke, Fabian, et al. "Sensorlose Zustandsüberwachung an
     Synchronmotoren." Proceedings. 23. Workshop Computational
     Intelligence, Dortmund, 5.-6. Dezember 2013. KIT Scientific
     Publishing, 2013.
345. Lessmeier, Christian, et al. "Data Acquisition and Signal Analysis
     from Measured Motor Currents for Defect Detection in
     Electromechanical Drive Systems (https://www.researchgate.net/pr
     ofile/Olaf_Enge-Rosenblatt/publication/264441239_Data_Acquisiti
     on_and_Signal_Analysis_from_Measured_Motor_Currents_for_De
     fect_Detection_in_Electromechanical_Drive_Systems/links/53df97
     e90cf2a768e49bb3b9.pdf)."
346. Ugulino, Wallace, et al. "Wearable computing: Accelerometers’ data
     classification of body postures and movements (http://groupware.se
     condlab.inf.puc-rio.br/public/papers/2012.Ugulino.WearableComput
     ing.HAR.Classifier.RIBBON.pdf) Archived (https://web.archive.org/w
     eb/20200925222906/http://groupware.secondlab.inf.puc-rio.br/publi
     c/papers/2012.Ugulino.WearableComputing.HAR.Classifier.RIBBO
     N.pdf) 25 September 2020 at the Wayback Machine." Advances in
     Artificial Intelligence-SBIA 2012. Springer Berlin Heidelberg, 2012.
     52–61.
347. Schneider, Jan; et al. (2015). "Augmenting the senses: a review on
     sensor-based learning support" (https://www.ncbi.nlm.nih.gov/pmc/
     articles/PMC4367401). Sensors. 15 (2): 4097–4133.
     Bibcode:2015Senso..15.4097S (https://ui.adsabs.harvard.edu/abs/2
     015Senso..15.4097S). doi:10.3390/s150204097 (https://doi.org/10.
     3390%2Fs150204097). PMC 4367401 (https://www.ncbi.nlm.nih.go
     v/pmc/articles/PMC4367401). PMID 25679313 (https://pubmed.ncb
     i.nlm.nih.gov/25679313).
348. Madeo, Renata CB, Clodoaldo AM Lima, and Sarajane M. Peres.
     "Gesture unit segmentation using support vector machines:
     segmenting gestures from rest positions (https://tarjomefa.com/wp-c
     ontent/uploads/2016/11/5781-English.pdf)." Proceedings of the
     28th Annual ACM Symposium on Applied Computing. ACM, 2013.
349. Lun, Roanna; Zhao, Wenbing (2015). "A survey of applications and
     human motion recognition with Microsoft Kinect" (https://engagedsc
     holarship.csuohio.edu/cgi/viewcontent.cgi?article=1417&context=e
     nece_facpub). International Journal of Pattern Recognition and
     Artificial Intelligence. 29 (5): 1555008.
     doi:10.1142/s0218001415550083 (https://doi.org/10.1142%2Fs021
     8001415550083).
350. Theodoridis, Theodoros, and Huosheng Hu. "Action classification
     of 3d human models using dynamic ANNs for mobile robot
     surveillance (https://cswww.sx.ac.uk/staff/hhu/Papers/ROBIO07-66.
     pdf) Archived (https://web.archive.org/web/20190806015015/https://
     cswww.sx.ac.uk/staff/hhu/Papers/ROBIO07-66.pdf) 6 August 2019
     at the Wayback Machine." Robotics and Biomimetics, 2007. ROBIO
     2007. IEEE International Conference on. IEEE, 2007.
351. Etemad, Seyed Ali, and Ali Arya. "3D human action recognition and
     style transformation using resilient backpropagation neural
     networks." Intelligent Computing and Intelligent Systems, 2009.
     ICIS 2009. IEEE International Conference on. Vol. 4. IEEE, 2009. (h
     ttps://ieeexplore.ieee.org/abstract/document/5357690/)
352. Altun, Kerem; Barshan, Billur; Tunçel, Orkun (2010). "Comparative
     study on classifying human activities with miniature inertial and
     magnetic sensors". Pattern Recognition. 43 (10): 3605–3620.
     Bibcode:2010PatRe..43.3605A (https://ui.adsabs.harvard.edu/abs/2
     010PatRe..43.3605A). doi:10.1016/j.patcog.2010.04.019 (https://do
     i.org/10.1016%2Fj.patcog.2010.04.019). hdl:11693/11947 (https://h
     dl.handle.net/11693%2F11947).
353. Nathan, Ran; et al. (2012). "Using tri-axial acceleration data to
     identify behavioral modes of free-ranging animals: general
     concepts and tools illustrated for griffon vultures" (https://www.ncbi.
     nlm.nih.gov/pmc/articles/PMC3284320). The Journal of
     Experimental Biology. 215 (6): 986–996. doi:10.1242/jeb.058602 (h
     ttps://doi.org/10.1242%2Fjeb.058602). PMC 3284320 (https://www.
     ncbi.nlm.nih.gov/pmc/articles/PMC3284320). PMID 22357592 (http
     s://pubmed.ncbi.nlm.nih.gov/22357592).
354. Anguita, Davide, et al. "Human activity recognition on smartphones
     using a multiclass hardware-friendly support vector machine (http
     s://upcommons.upc.edu/bitstream/handle/2117/101769/IWAAL201
     2.pdf)." Ambient assisted living and home care. Springer Berlin
     Heidelberg, 2012. 216–223.
355. Su, Xing; Tong, Hanghang; Ji, Ping (2014). "Activity recognition
     with smartphone sensors". Tsinghua Science and Technology. 19
     (3): 235–249. doi:10.1109/tst.2014.6838194 (https://doi.org/10.110
     9%2Ftst.2014.6838194). S2CID 62751498 (https://api.semanticsch
     olar.org/CorpusID:62751498).
356. Kadous, Mohammed Waleed. Temporal classification: Extending
     the classification paradigm to multivariate time series (https://pdfs.s
     emanticscholar.org/4bad/c3f0ad169ed9ec7d073375e9b168fa9f6c8
     f.pdf). Diss. The University of New South Wales, 2002.
357. Graves, Alex, et al. "Connectionist temporal classification: labelling
     unsegmented sequence data with recurrent neural networks (https://
     mediatum.ub.tum.de/doc/1292048/file.pdf)." Proceedings of the
     23rd international conference on Machine learning. ACM, 2006.
358. Velloso, Eduardo, et al. "Qualitative activity recognition of weight
     lifting exercises (https://www.perceptualui.org/publications/velloso1
     3_ah.pdf)." Proceedings of the 4th Augmented Human International
     Conference. ACM, 2013.
359. Mortazavi, Bobak Jack, et al. "Determining the single best axis for
     exercise repetition recognition and counting on smartwatches (htt
     p://www.thehabitslab.com/assets/papers/28.pdf) Archived (https://w
     eb.archive.org/web/20211104043511/https://www.thehabitslab.co
     m/assets/papers/28.pdf) 4 November 2021 at the Wayback
     Machine." Wearable and Implantable Body Sensor Networks
     (BSN), 2014 11th International Conference on. IEEE, 2014.
360. Sapsanis, Christos, et al. "Improving EMG based Classification of
     basic hand movements using EMD (https://www.researchgate.net/pr
     ofile/Christos_Sapsanis/publication/257602303_Improving_EMG_b
     ased_classification_of_basic_hand_movements_using_EMD/links/
     56dfb7fd08ae979addef64a2/Improving-EMG-based-classification-o
     f-basic-hand-movements-using-EMD.pdf)." Engineering in Medicine
     and Biology Society (EMBC), 2013 35th Annual International
     Conference of the IEEE. IEEE, 2013.
361. Andrianesis, Konstantinos; Tzes, Anthony (2015). "Development
     and control of a multifunctional prosthetic hand with shape memory
     alloy actuators". Journal of Intelligent & Robotic Systems. 78 (2):
     257–289. doi:10.1007/s10846-014-0061-6 (https://doi.org/10.100
     7%2Fs10846-014-0061-6). S2CID 207174078 (https://api.semantic
     scholar.org/CorpusID:207174078).
362. Banos, Oresti; et al. (2014). "Dealing with the effects of sensor
     displacement in wearable activity recognition" (https://www.ncbi.nl
     m.nih.gov/pmc/articles/PMC4118358). Sensors. 14 (6): 9995–
     10023. Bibcode:2014Senso..14.9995B (https://ui.adsabs.harvard.e
     du/abs/2014Senso..14.9995B). doi:10.3390/s140609995 (https://do
     i.org/10.3390%2Fs140609995). PMC 4118358 (https://www.ncbi.nl
     m.nih.gov/pmc/articles/PMC4118358). PMID 24915181 (https://pub
     med.ncbi.nlm.nih.gov/24915181).
363. Stisen, Allan, et al. "Smart Devices are Different: Assessing and
     Mitigating Mobile Sensing Heterogeneities for Activity Recognition
     (https://www.researchgate.net/profile/Henrik_Blunck/publication/30
     1464144_Smart_Devices_are_Different_Assessing_and_Mitigatin
     gMobile_Sensing_Heterogeneities_for_Activity_Recognition/links/
     585a4c4908ae3852d256f186.pdf)." Proceedings of the 13th ACM
     Conference on Embedded Networked Sensor Systems. ACM,
     2015.
364. Bhattacharya, Sourav, and Nicholas D. Lane. "From Smart to Deep:
     Robust Activity Recognition on Smartwatches using Deep Learning
     (http://discovery.ucl.ac.uk/1503672/1/deepwatch_wristsense.pdf)."
365. Bacciu, Davide; et al. (2014). "An experimental characterization of
     reservoir computing in ambient assisted living applications". Neural
     Computing and Applications. 24 (6): 1451–1464.
     doi:10.1007/s00521-013-1364-4 (https://doi.org/10.1007%2Fs0052
     1-013-1364-4). hdl:11568/237959 (https://hdl.handle.net/11568%2F
     237959). S2CID 14124013 (https://api.semanticscholar.org/CorpusI
     D:14124013).
366. Palumbo, Filippo; Barsocchi, Paolo; Gallicchio, Claudio; Chessa,
     Stefano; Micheli, Alessio (2013). "Multisensor Data Fusion for
     Activity Recognition Based on Reservoir Computing" (https://link.sp
     ringer.com/chapter/10.1007/978-3-642-41043-7_3). Evaluating AAL
     Systems Through Competitive Benchmarking. Communications in
     Computer and Information Science. Vol. 386. pp. 24–35.
     doi:10.1007/978-3-642-41043-7_3 (https://doi.org/10.1007%2F978-
     3-642-41043-7_3). ISBN 978-3-642-41042-0.
367. Reiss, Attila, and Didier Stricker. "Introducing a new benchmarked
     dataset for activity monitoring (https://www.researchgate.net/profile/
     Attila_Reiss/publication/235348485_Introducing_a_New_Benchma
     rked_Dataset_for_Activity_Monitoring/links/00b7d5309d19ca43460
     00000/Introducing-a-New-Benchmarked-Dataset-for-Activity-Monito
     ring.pdf)." Wearable Computers (ISWC), 2012 16th International
     Symposium on. IEEE, 2012.
368. Roggen, Daniel, et al. "OPPORTUNITY: Towards opportunistic
     activity and context recognition systems (https://infoscience.epfl.ch/r
     ecord/138648/files/RoggenFoCaHoFaTrLuPiBaKuFeHoRiChMi09.
     pdf)." World of Wireless, Mobile and Multimedia Networks &
     Workshops, 2009. WoWMoM 2009. IEEE International Symposium
     on a. IEEE, 2009.
369. Kurz, Marc, et al. "Dynamic quantification of activity recognition
     capabilities in opportunistic systems (https://www.researchgate.net/
     profile/Marc_Kurz/publication/220271166_Dynamic_Quantification
     _of_Activity_Recognition_Capabilities_in_Opportunistic_Systems/li
     nks/09e4150f66b480c97a000000/Dynamic-Quantification-of-Activit
     y-Recognition-Capabilities-in-Opportunistic-Systems.pdf)."
     Vehicular Technology Conference (VTC Spring), 2011 IEEE 73rd.
     IEEE, 2011.
370. Sztyler, Timo, and Heiner Stuckenschmidt. "On-body localization of
     wearable devices: an investigation of position-aware activity
     recognition (https://sensor.informatik.uni-mannheim.de/publications/
     presentation/percom2016.pdf)." Pervasive Computing and
     Communications (PerCom), 2016 IEEE International Conference
     on. IEEE, 2016.
371. Zhi, Ying Xuan; Lukasik, Michelle; Li, Michael H.; Dolatabadi,
     Elham; Wang, Rosalie H.; Taati, Babak (2018). "Automatic
     Detection of Compensation During Robotic Stroke Rehabilitation
     Therapy" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788403).
     IEEE Journal of Translational Engineering in Health and Medicine.
     6: 2100107. doi:10.1109/JTEHM.2017.2780836 (https://doi.org/10.1
     109%2FJTEHM.2017.2780836). ISSN 2168-2372 (https://www.worl
     dcat.org/issn/2168-2372). PMC 5788403 (https://www.ncbi.nlm.nih.
     gov/pmc/articles/PMC5788403). PMID 29404226 (https://pubmed.n
     cbi.nlm.nih.gov/29404226).
372. Dolatabadi, Elham; Zhi, Ying Xuan; Ye, Bing; Coahran, Marge;
     Lupinacci, Giorgia; Mihailidis, Alex; Wang, Rosalie; Taati, Babak
     (23 May 2017). The toronto rehab stroke pose dataset to detect
     compensation during stroke rehabilitation therapy. ACM. pp. 375–
     381. doi:10.1145/3154862.3154925 (https://doi.org/10.1145%2F31
     54862.3154925). ISBN 9781450363631. S2CID 24581930 (https://
     api.semanticscholar.org/CorpusID:24581930).
373. "Toronto Rehab Stroke Pose Dataset" (https://www.kaggle.com/der
     ekdb/toronto-robot-stroke-posture-dataset).
374. Jung, Merel M.; Poel, Mannes; Poppe, Ronald; Heylen, Dirk K. J. (1
     March 2017). "Automatic recognition of touch gestures in the corpus
     of social touch". Journal on Multimodal User Interfaces. 11 (1): 81–
     96. doi:10.1007/s12193-016-0232-9 (https://doi.org/10.1007%2Fs1
     2193-016-0232-9). ISSN 1783-8738 (https://www.worldcat.org/issn/
     1783-8738). S2CID 1802116 (https://api.semanticscholar.org/Corpu
     sID:1802116).
375. Jung, M.M. (Merel) (1 June 2016). "Corpus of Social Touch (CoST)"
     (https://data.4tu.nl/articles/dataset/Corpus_of_Social_Touch_CoST
     _/12696869). University of Twente. doi:10.4121/uuid:5ef62345-
     3b3e-479c-8e1d-c922748c9b29 (https://doi.org/10.4121%2Fuuid%
     3A5ef62345-3b3e-479c-8e1d-c922748c9b29).
376. Aeberhard, S., D. Coomans, and O. De Vel. "Comparison of
     classifiers in high dimensional settings." Dept. Math. Statist., James
     Cook Univ., North Queensland, Australia, Tech. Rep 92-02 (1992).
377. Basu, Sugato. "Semi-supervised clustering with limited background
     knowledge (http://www.aaai.org/Papers/AAAI/2004/AAAI04-138.pd
     f)." AAAI. 2004.
378. Tüfekci, Pınar (2014). "Prediction of full load electrical power output
     of a base load operated combined cycle power plant using machine
     learning methods". International Journal of Electrical Power &
     Energy Systems. 60: 126–140. doi:10.1016/j.ijepes.2014.02.027 (ht
     tps://doi.org/10.1016%2Fj.ijepes.2014.02.027).
379. Kaya, Heysem, Pınar Tüfekci, and Fikret S. Gürgen. "Local and
     global learning methods for predicting power of a combined gas &
     steam turbine." International conference on emerging trends in
     computer and electronics engineering (ICETCEE'2012), Dubai.
     2012.
380. Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2014).
     "Searching for exotic particles in high-energy physics with deep
     learning". Nature Communications. 5: 2014. arXiv:1402.4735 (http
     s://arxiv.org/abs/1402.4735). Bibcode:2014NatCo...5.4308B (https://
     ui.adsabs.harvard.edu/abs/2014NatCo...5.4308B).
     doi:10.1038/ncomms5308 (https://doi.org/10.1038%2Fncomms530
     8). PMID 24986233 (https://pubmed.ncbi.nlm.nih.gov/24986233).
     S2CID 195953 (https://api.semanticscholar.org/CorpusID:195953).
381. Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2015).
     "Enhanced Higgs Boson to τ+ τ− Search with Deep Learning".
     Physical Review Letters. 114 (11): 111801. arXiv:1410.3469 (http
     s://arxiv.org/abs/1410.3469). Bibcode:2015PhRvL.114k1801B (http
     s://ui.adsabs.harvard.edu/abs/2015PhRvL.114k1801B).
     doi:10.1103/physrevlett.114.111801 (https://doi.org/10.1103%2Fph
     ysrevlett.114.111801). PMID 25839260 (https://pubmed.ncbi.nlm.ni
     h.gov/25839260). S2CID 2339142 (https://api.semanticscholar.org/
     CorpusID:2339142).
382. Adam-Bourdarios, C.; Cowan, G.; Germain-Renaud, C.; Guyon, I.;
     Kégl, B.; Rousseau, D. (2015). "The Higgs Machine Learning
     Challenge" (https://higgsml.lal.in2p3.fr/). Journal of Physics:
     Conference Series. 664 (7): 072015.
     Bibcode:2015JPhCS.664g2015A (https://ui.adsabs.harvard.edu/ab
     s/2015JPhCS.664g2015A). doi:10.1088/1742-6596/664/7/072015
     (https://doi.org/10.1088%2F1742-6596%2F664%2F7%2F072015).
383. Baldi, Pierre; Cranmer, Kyle; Faucett, Taylor; Sadowski, Peter;
     Whiteson, Daniel (2016). "Parameterized neural networks for high-
     energy physics". The European Physical Journal C. 76 (5): 235.
     arXiv:1601.07913 (https://arxiv.org/abs/1601.07913).
     Bibcode:2016EPJC...76..235B (https://ui.adsabs.harvard.edu/abs/2
     016EPJC...76..235B). doi:10.1140/epjc/s10052-016-4099-4 (https://
     doi.org/10.1140%2Fepjc%2Fs10052-016-4099-4).
     S2CID 254108545 (https://api.semanticscholar.org/CorpusID:25410
     8545).
384. Ortigosa, I.; Lopez, R.; Garcia, J. "A neural networks approach to
     residuary resistance of sailing yachts prediction". Proceedings of
     the International Conference on Marine Engineering MARINE.
     2007.
385. Gerritsma, J., R. Onnink, and A. Versluis. Geometry, resistance and
     stability of the delft systematic yacht hull series. Delft University of
     Technology, 1981.
386. Liu, Huan, and Hiroshi Motoda. Feature extraction, construction and
     selection: A data mining perspective (https://books.google.com/book
     s?id=zi_0EdWW5fYC). Springer Science & Business Media, 1998.
387. Reich, Yoram. Converging to Ideal Design Knowledge by Learning.
     [Carnegie Mellon University], Engineering Design Research
     Center, 1989.
388. Todorovski, Ljupčo; Džeroski, Sašo (1999). "Experiments in Meta-
     level Learning with ILP" (https://link.springer.com/chapter/10.1007/9
     78-3-540-48247-5_11). Principles of Data Mining and Knowledge
     Discovery. Lecture Notes in Computer Science. Vol. 1704. pp. 98–
     106. doi:10.1007/978-3-540-48247-5_11 (https://doi.org/10.1007%2
     F978-3-540-48247-5_11). ISBN 978-3-540-66490-1.
     S2CID 39382993 (https://api.semanticscholar.org/CorpusID:393829
     93).
389. Wang, Yong. A new approach to fitting linear models in high
     dimensional spaces (http://www.cs.waikato.ac.nz/~ml/publications/2
     000/thesis.pdf). Diss. The University of Waikato, 2000.
390. Kibler, Dennis; Aha, David W.; Albert, Marc K. (1989). "Instance‐
     based prediction of real‐valued attributes" (https://escholarship.org/
     uc/item/68f860zb). Computational Intelligence. 5 (2): 51–57.
     doi:10.1111/j.1467-8640.1989.tb00315.x (https://doi.org/10.1111%2
     Fj.1467-8640.1989.tb00315.x). S2CID 40800413 (https://api.seman
     ticscholar.org/CorpusID:40800413).
391. Palmer, Christopher R., and Christos Faloutsos. "Electricity based
     external similarity of categorical attributes (http://citeseerx.ist.psu.ed
     u/viewdoc/download?doi=10.1.1.469.989&rep=rep1&type=pdf)."
     Advances in Knowledge Discovery and Data Mining. Springer
     Berlin Heidelberg, 2003. 486–500.
392. Tsanas, Athanasios; Xifara, Angeliki (2012). "Accurate quantitative      404. Sikora, Marek; Wróbel, Łukasz (2010). "Application of rule induction
     estimation of energy performance of residential buildings using               algorithms for analysis of data collected by seismic hazard
     statistical machine learning tools". Energy and Buildings. 49: 560–           monitoring systems in coal mines" (https://www.infona.pl/resource/b
     567. doi:10.1016/j.enbuild.2012.03.003 (https://doi.org/10.1016%2F            wmeta1.element.baztech-article-BPZ5-0008-0008). Archives of
     j.enbuild.2012.03.003).                                                       Mining Sciences. 55 (1): 91–114.
393. De Wilde, Pieter (2014). "The gap between predicted and                  405. Sikora, Marek, and Beata Sikora. "Rough natural hazards
     measured energy performance of buildings: A framework for                     monitoring." Rough Sets: Selected Methods and Applications in
     investigation". Automation in Construction. 41: 40–49.                        Management and Engineering. Springer London, 2012. 163–179.
     doi:10.1016/j.autcon.2014.02.009 (https://doi.org/10.1016%2Fj.autc       406. Addor, Nans; Newman, Andrew J.; Mizukami, Naoki; Clark, Martyn
     on.2014.02.009).                                                              P. (20 October 2017). "The CAMELS data set: catchment attributes
394. Brooks, Thomas F., D. Stuart Pope, and Michael A. Marcolini. Airfoil          and meteorology for large-sample studies" (https://hess.copernicus.
     self-noise and prediction (https://ntrs.nasa.gov/archive/nasa/casi.ntr        org/articles/21/5293/2017/). Hydrology and Earth System Sciences.
     s.nasa.gov/19890016302.pdf). Vol. 1218. National Aeronautics and              21 (10): 5293–5313. Bibcode:2017HESS...21.5293A (https://ui.adsa
     Space Administration, Office of Management, Scientific and                    bs.harvard.edu/abs/2017HESS...21.5293A). doi:10.5194/hess-21-
     Technical Information Division, 1989.                                         5293-2017 (https://doi.org/10.5194%2Fhess-21-5293-2017).
395. Draper, David. "Assessment and propagation of model uncertainty               ISSN 1607-7938 (https://www.worldcat.org/issn/1607-7938).
     (http://www2.denizyuret.com/ref/draper/assessment-and-propagatio         407. Newman, A. J.; Clark, M. P.; Sampson, K.; Wood, A.; Hay, L. E.;
     n.pdf)." Journal of the Royal Statistical Society, Series B                   Bock, A.; Viger, R. J.; Blodgett, D.; Brekke, L.; Arnold, J. R.; Hopson,
     (Methodological) (1995): 45–97.                                               T. (14 January 2015). "Development of a large-sample watershed-
396. Lavine, Michael (1991). "Problems in extrapolation illustrated with           scale hydrometeorological data set for the contiguous USA: data
     space shuttle O-ring data". Journal of the American Statistical               set characteristics and assessment of regional variability in
     Association. 86 (416): 919–921.                                               hydrologic model performance" (https://hess.copernicus.org/articles/
     doi:10.1080/01621459.1991.10475132 (https://doi.org/10.1080%2F                19/209/2015/). Hydrology and Earth System Sciences. 19 (1): 209–
     01621459.1991.10475132).                                                      223. Bibcode:2015HESS...19..209N (https://ui.adsabs.harvard.edu/
397. Wang, Jun, Bei Yu, and Les Gasser. "Concept tree based clustering             abs/2015HESS...19..209N). doi:10.5194/hess-19-209-2015 (https://
                                                                                   doi.org/10.5194%2Fhess-19-209-2015). ISSN 1607-7938 (https://w
     visualization with shaded similarity matrices (https://www.researchg
     ate.net/profile/Bei_Yu2/publication/228407462_Concept_Tree_Bas                ww.worldcat.org/issn/1607-7938).
     ed_Ordering_for_Shaded_Similarity_Matrix/links/00b7d5175607b6            408. Alvarez-Garreton, Camila; Mendoza, Pablo A.; Boisier, Juan Pablo;
     1d2e000000.pdf)." Data Mining, 2002. ICDM 2003. Proceedings.                  Addor, Nans; Galleguillos, Mauricio; Zambrano-Bigiarini, Mauricio;
     2002 IEEE International Conference on. IEEE, 2002.                            Lara, Antonio; Puelma, Cristóbal; Cortes, Gonzalo; Garreaud, Rene;
                                                                                   McPhee, James (13 November 2018). "The CAMELS-CL dataset:
398. Pettengill, Gordon H.; Ford, Peter G.; Johnson, William T. K.; Raney,
                                                                                   catchment attributes and meteorology for large sample studies –
     R. Keith; Soderblom, Laurence A. (1991). "Magellan: Radar
     Performance and Data Products" (https://www.science.org/doi/abs/1             Chile dataset" (https://hess.copernicus.org/articles/22/5817/2018/).
                                                                                   Hydrology and Earth System Sciences. 22 (11): 5817–5846.
     0.1126/science.252.5003.260). Science. 252 (5003): 260–265.
                                                                                   Bibcode:2018HESS...22.5817A (https://ui.adsabs.harvard.edu/abs/
     Bibcode:1991Sci...252..260P (https://ui.adsabs.harvard.edu/abs/19
     91Sci...252..260P). doi:10.1126/science.252.5003.260 (https://doi.o           2018HESS...22.5817A). doi:10.5194/hess-22-5817-2018 (https://do
     rg/10.1126%2Fscience.252.5003.260). PMID 17769272 (https://pub                i.org/10.5194%2Fhess-22-5817-2018). ISSN 1607-7938 (https://ww
                                                                                   w.worldcat.org/issn/1607-7938). S2CID 133955609 (https://api.sem
     med.ncbi.nlm.nih.gov/17769272). S2CID 43398343 (https://api.sem
     anticscholar.org/CorpusID:43398343).                                          anticscholar.org/CorpusID:133955609).
                                                                              409. Chagas, Vinícius B. P.; Chaffe, Pedro L. B.; Addor, Nans; Fan,
399. Aharonian, F.; et al. (2008). "Energy spectrum of cosmic-ray
     electrons at TeV energies". Physical Review Letters. 101 (26):                Fernando M.; Fleischmann, Ayan S.; Paiva, Rodrigo C. D.;
     261104. arXiv:0811.3894 (https://arxiv.org/abs/0811.3894).                    Siqueira, Vinícius A. (8 September 2020). "CAMELS-BR:
                                                                                   hydrometeorological time series and landscape attributes for 897
     Bibcode:2008PhRvL.101z1104A (https://ui.adsabs.harvard.edu/ab
                                                                                   catchments in Brazil" (https://essd.copernicus.org/articles/12/2075/2
     s/2008PhRvL.101z1104A). doi:10.1103/PhysRevLett.101.261104
     (https://doi.org/10.1103%2FPhysRevLett.101.261104).                           020/). Earth System Science Data. 12 (3): 2075–2096.
                                                                                   Bibcode:2020ESSD...12.2075C (https://ui.adsabs.harvard.edu/abs/
     hdl:2440/51450 (https://hdl.handle.net/2440%2F51450).
                                                                                   2020ESSD...12.2075C). doi:10.5194/essd-12-2075-2020 (https://do
     PMID 19437632 (https://pubmed.ncbi.nlm.nih.gov/19437632).
     S2CID 41850528 (https://api.semanticscholar.org/CorpusID:418505               i.org/10.5194%2Fessd-12-2075-2020). ISSN 1866-3516 (https://ww
                                                                                   w.worldcat.org/issn/1866-3516). S2CID 234737197 (https://api.sem
     28).
                                                                                   anticscholar.org/CorpusID:234737197).
400. Bock, R. K.; et al. (2004). "Methods for multidimensional event
     classification: a case study using images from a Cherenkov               410. Coxon, Gemma; Addor, Nans; Bloomfield, John P.; Freer, Jim; Fry,
                                                                                   Matt; Hannaford, Jamie; Howden, Nicholas J. K.; Lane, Rosanna;
     gamma-ray telescope". Nuclear Instruments and Methods in
     Physics Research Section A: Accelerators, Spectrometers,                      Lewis, Melinda; Robinson, Emma L.; Wagener, Thorsten (12
     Detectors and Associated Equipment. 516 (2): 511–528.                         October 2020). "CAMELS-GB: hydrometeorological time series and
                                                                                   landscape attributes for 671 catchments in Great Britain" (https://ess
     Bibcode:2004NIMPA.516..511B (https://ui.adsabs.harvard.edu/abs/
     2004NIMPA.516..511B). doi:10.1016/j.nima.2003.08.157 (https://do              d.copernicus.org/articles/12/2459/2020/). Earth System Science
     i.org/10.1016%2Fj.nima.2003.08.157).                                          Data. 12 (4): 2459–2483. Bibcode:2020ESSD...12.2459C (https://ui.
                                                                                   adsabs.harvard.edu/abs/2020ESSD...12.2459C). doi:10.5194/essd-
401. Li, Jinyan; et al. (2004). "Deeps: A new instance-based lazy                  12-2459-2020 (https://doi.org/10.5194%2Fessd-12-2459-2020).
     discovery and classification system" (https://doi.org/10.1023%2Fb%            ISSN 1866-3516 (https://www.worldcat.org/issn/1866-3516).
     3Amach.0000011804.08528.7d). Machine Learning. 54 (2): 99–                    S2CID 226192657 (https://api.semanticscholar.org/CorpusID:22619
     124. doi:10.1023/b:mach.0000011804.08528.7d (https://doi.org/10.              2657).
     1023%2Fb%3Amach.0000011804.08528.7d).
                                                                              411. Fowler, Keirnan J. A.; Acharya, Suwash Chandra; Addor, Nans;
402. Villaescusa-Navarro, Francisco; et al. (2022). "The CAMELS                    Chou, Chihchung; Peel, Murray C. (6 August 2021). "CAMELS-
     Multifield Data Set: Learning the Universe's Fundamental                      AUS: hydrometeorological time series and landscape attributes for
     Parameters with Artificial Intelligence". The Astrophysical Journal           222 catchments in Australia" (https://essd.copernicus.org/articles/1
     Supplement Series. 259 (2): 61. arXiv:2109.10915 (https://arxiv.org/          3/3847/2021/). Earth System Science Data. 13 (8): 3847–3867.
     abs/2109.10915). Bibcode:2022ApJS..259...61V (https://ui.adsabs.              Bibcode:2021ESSD...13.3847F (https://ui.adsabs.harvard.edu/abs/
     harvard.edu/abs/2022ApJS..259...61V). doi:10.3847/1538-                       2021ESSD...13.3847F). doi:10.5194/essd-13-3847-2021 (https://do
     4365/ac5ab0 (https://doi.org/10.3847%2F1538-4365%2Fac5ab0).                   i.org/10.5194%2Fessd-13-3847-2021). ISSN 1866-3516 (https://ww
     S2CID 237604997 (https://api.semanticscholar.org/CorpusID:23760               w.worldcat.org/issn/1866-3516). S2CID 238796784 (https://api.sem
     4997).                                                                        anticscholar.org/CorpusID:238796784).
403. Siebert, Lee, and Tom Simkin. "Volcanoes of the world: an
     illustrated catalog of Holocene volcanoes and their eruptions."
     (2014).
412. Klingler, Christoph; Schulz, Karsten; Herrnegger, Mathew (16              425. Donchin, Emanuel; Spencer, Kevin M.; Wijesinghe, Ranjith (2000).
     September 2021). "LamaH-CE: LArge-SaMple DAta for Hydrology                    "The mental prosthesis: assessing the speed of a P300-based
     and Environmental Sciences for Central Europe" (https://essd.coper             brain-computer interface". IEEE Transactions on Rehabilitation
     nicus.org/articles/13/4529/2021/). Earth System Science Data. 13               Engineering. 8 (2): 174–179. doi:10.1109/86.847808 (https://doi.org/
     (9): 4529–4565. Bibcode:2021ESSD...13.4529K (https://ui.adsabs.h               10.1109%2F86.847808). PMID 10896179 (https://pubmed.ncbi.nlm.
     arvard.edu/abs/2021ESSD...13.4529K). doi:10.5194/essd-13-4529-                 nih.gov/10896179).
     2021 (https://doi.org/10.5194%2Fessd-13-4529-2021). ISSN 1866-            426. Detrano, Robert; et al. (1989). "International application of a new
     3516 (https://www.worldcat.org/issn/1866-3516). S2CID 240533508                probability algorithm for the diagnosis of coronary artery disease".
     (https://api.semanticscholar.org/CorpusID:240533508).                          The American Journal of Cardiology. 64 (5): 304–310.
413. Yeh, I–C (1998). "Modeling of strength of high-performance                     doi:10.1016/0002-9149(89)90524-9 (https://doi.org/10.1016%2F000
     concrete using artificial neural networks". Cement and Concrete                2-9149%2889%2990524-9). PMID 2756873 (https://pubmed.ncbi.nl
     Research. 28 (12): 1797–1808. doi:10.1016/s0008-8846(98)00165-                 m.nih.gov/2756873).
     3 (https://doi.org/10.1016%2Fs0008-8846%2898%2900165-3).                  427. Bradley, Andrew P (1997). "The use of the area under the ROC
414. Zarandi, MH Fazel; et al. (2008). "Fuzzy polynomial neural                     curve in the evaluation of machine learning algorithms" (http://espac
     networks for approximation of the compressive strength of                      e.library.uq.edu.au/view/UQ:8925/pr-t.pdf) (PDF). Pattern
     concrete". Applied Soft Computing. 8 (1): 488–498.                             Recognition. 30 (7): 1145–1159. Bibcode:1997PatRe..30.1145B (ht
     Bibcode:2008ApSoC...8...79S (https://ui.adsabs.harvard.edu/abs/20              tps://ui.adsabs.harvard.edu/abs/1997PatRe..30.1145B).
     08ApSoC...8...79S). doi:10.1016/j.asoc.2007.02.010 (https://doi.org/           doi:10.1016/s0031-3203(96)00142-2 (https://doi.org/10.1016%2Fs0
     10.1016%2Fj.asoc.2007.02.010).                                                 031-3203%2896%2900142-2). S2CID 13806304 (https://api.seman
415. Yeh, I. "Modeling slump of concrete with fly ash and                           ticscholar.org/CorpusID:13806304).
     superplasticizer." Computers and Concrete 5.6 (2008): 559–572.            428. Street, W. N.; Wolberg, W. H.; Mangasarian, O. L. (1993). "Nuclear
416. Gencel, Osman; et al. (2011). "Comparison of artificial neural                 feature extraction for breast tumor diagnosis" (https://www.spiedigita
     networks and general linear model approaches for the analysis of               llibrary.org/conference-proceedings-of-spie/1905/0000/Nuclear-feat
     abrasive wear of concrete". Construction and Building Materials. 25            ure-extraction-for-breast-tumor-diagnosis/10.1117/12.148698.short).
     (8): 3486–3494. doi:10.1016/j.conbuildmat.2011.03.040 (https://doi.            In Acharya, Raj S; Goldgof, Dmitry B (eds.). Biomedical Image
     org/10.1016%2Fj.conbuildmat.2011.03.040).                                      Processing and Biomedical Visualization (http://digital.library.wisc.e
                                                                                    du/1793/59692). Vol. 1905. pp. 861–870. doi:10.1117/12.148698 (ht
417. Dietterich, Thomas G., et al. "A comparison of dynamic reposing
     and tangent distance for drug activity prediction (http://papers.nips.c        tps://doi.org/10.1117%2F12.148698). S2CID 14922543 (https://api.
                                                                                    semanticscholar.org/CorpusID:14922543).
     c/paper/781-a-comparison-of-dynamic-reposing-and-tangent-distan
     ce-for-drug-activity-prediction.pdf)." Advances in Neural Information     429. Demir, Cigdem, and Bülent Yener. "Automated cancer diagnosis
     Processing Systems (1994): 216–216.                                            based on histopathological images: a systematic survey (http://cites
                                                                                    eerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.1199&rep=rep1
418. Buscema, Massimo, William J. Tastle, and Stefano Terzi. "Meta net:
                                                                                    &type=pdf)." Rensselaer Polytechnic Institute, Tech. Rep (2005).
     A new meta-classifier family (https://www.researchgate.net/profile/M
     assimo_Buscema/publication/13731626_MetaNet_The_Theory_of                 430. Substance Abuse and Mental Health Services Administration. "Results
     _Independent_Judges/links/0deec52baf2937fc8e000000.pd                          from the 2010 National Survey on Drug Use and Health: Summary
     f)." Data Mining Applications Using Artificial Adaptive Systems.               of National Findings, NSDUH Series H-41, HHS Publication No.
     Springer New York, 2013. 141–182.                                              (SMA) 11-4658." Rockville, MD: Substance Abuse and Mental
                                                                                    Health Services Administration 201 (2011).
419. Amoradnejad, Issa; Amoradnejad, Rahimberdi; et al. (2022). "Age
     dataset: A structured general-purpose dataset on life, work, and          431. Hong, Zi-Quan; Yang, Jing-Yu (1991). "Optimal discriminant plane
     death of 1.22 million distinguished people" (http://workshop-procee            for a small number of samples and design method of classifier on
     dings.icwsm.org/abstract?id=2022_82). Workshop Proceedings of                  the plane". Pattern Recognition. 24 (4): 317–324.
     the 16th International AAAI Conference on Web and Social Media                 Bibcode:1991PatRe..24..317H (https://ui.adsabs.harvard.edu/abs/1
     (ICWSM). 3: 1–4. doi:10.36190/2022.82 (https://doi.org/10.36190%2              991PatRe..24..317H). doi:10.1016/0031-3203(91)90074-f (https://do
     F2022.82). S2CID 249668669 (https://api.semanticscholar.org/Corp               i.org/10.1016%2F0031-3203%2891%2990074-f).
     usID:249668669).                                                          432. Li, Jinyan, and Limsoon Wong. "Using rules to analyse bio-medical
420. "Age Dataset" (https://github.com/Moradnejad/AgeDataset). GitHub.              data: a comparison between C4.5 and PCL." Advances in Web-
     7 June 2022.                                                                   Age Information Management. Springer Berlin Heidelberg, 2003.
421. "Synthetic Fundus Dataset" (https://web.archive.org/web/20211129               254–265.
     155047/http://math.unipa.it/cvalenti/fundus/). Archived from the          433. Güvenir, H. Altay, et al. "A supervised machine learning algorithm
     original (http://math.unipa.it/cvalenti/fundus/) on 29 November 2021.          for arrhythmia analysis (http://repository.bilkent.edu.tr/bitstream/han
     Retrieved 22 February 2023.                                                    dle/11693/27699/bilkent-research-paper.pdf?sequence=
422. Lo Castro, Dario; et al. (2020). "A visual framework to create                 1)." Computers in Cardiology 1997. IEEE, 1997.
     photorealistic retinal vessels for diagnosis purposes". Journal of        434. Lagus, Krista, et al. "Independent variable group analysis in
     Biomedical Informatics. 108: 103490. doi:10.1016/j.jbi.2020.103490             learning compact representations for data (http://users.ics.aalto.fi/ah
     (https://doi.org/10.1016%2Fj.jbi.2020.103490). PMID 32640292 (htt              onkela/papers/Lagus05akrr.pdf)." Proceedings of the International
     ps://pubmed.ncbi.nlm.nih.gov/32640292). S2CID 220429697 (http                  and Interdisciplinary Conference on Adaptive Knowledge
     s://api.semanticscholar.org/CorpusID:220429697).                               Representation and Reasoning (AKRR'05), T. Honkela, V.
423. Ingber, Lester (1997). "Statistical mechanics of neocortical                   Könönen, M. Pöllä, and O. Simula, Eds., Espoo, Finland. 2005.
     interactions: Canonical momenta indicators of                             435. Strack, Beata, et al. "Impact of HbA1c measurement on hospital
     electroencephalography". Physical Review E. 55 (4): 4578–4593.                 readmission rates: analysis of 70,000 clinical database patient
     arXiv:physics/0001052 (https://arxiv.org/abs/physics/0001052).                 records (http://downloads.hindawi.com/journals/bmri/2014/781670.
     Bibcode:1997PhRvE..55.4578I (https://ui.adsabs.harvard.edu/abs/1               pdf)." BioMed Research International 2014; 2014
     997PhRvE..55.4578I). doi:10.1103/PhysRevE.55.4578 (https://doi.o          436. Rubin, Daniel J (2015). "Hospital readmission of patients with
     rg/10.1103%2FPhysRevE.55.4578). S2CID 6390999 (https://api.se                  diabetes". Current Diabetes Reports. 15 (4): 1–9.
     manticscholar.org/CorpusID:6390999).                                           doi:10.1007/s11892-015-0584-7 (https://doi.org/10.1007%2Fs1189
424. Hoffmann, Ulrich; Vesin, Jean-Marc; Ebrahimi, Touradj; Diserens,               2-015-0584-7). PMID 25712258 (https://pubmed.ncbi.nlm.nih.gov/2
     Karin (2008). "An efficient P300-based brain–computer interface for            5712258). S2CID 3908599 (https://api.semanticscholar.org/CorpusI
     disabled subjects". Journal of Neuroscience Methods. 167 (1): 115–             D:3908599).
     125. CiteSeerX 10.1.1.352.4630 (https://citeseerx.ist.psu.edu/viewd       437. Antal, Bálint; Hajdu, András (2014). "An ensemble-based system for
     oc/summary?doi=10.1.1.352.4630).                                               automatic screening of diabetic retinopathy". Knowledge-Based
     doi:10.1016/j.jneumeth.2007.03.005 (https://doi.org/10.1016%2Fj.jn             Systems. 60 (2014): 20–27. arXiv:1410.8576 (https://arxiv.org/abs/1
     eumeth.2007.03.005). PMID 17445904 (https://pubmed.ncbi.nlm.ni                 410.8576). Bibcode:2014arXiv1410.8576A (https://ui.adsabs.harvar
     h.gov/17445904). S2CID 9648828 (https://api.semanticscholar.org/               d.edu/abs/2014arXiv1410.8576A).
     CorpusID:9648828).                                                             doi:10.1016/j.knosys.2013.12.023 (https://doi.org/10.1016%2Fj.kno
                                                                                    sys.2013.12.023). S2CID 13984326 (https://api.semanticscholar.or
                                                                                    g/CorpusID:13984326).
438. Haloi, Mrinal (2015). "Improved Microaneurysm Detection using             451. Javadi, Soroush; Mirroshandel, Seyed Abolghasem (2019). "A
     Deep Neural Networks". arXiv:1505.04424 (https://arxiv.org/abs/150             novel deep learning method for automatic assessment of human
     5.04424) [cs.CV (https://arxiv.org/archive/cs.CV)].                            sperm images". Computers in Biology and Medicine. 109: 182–194.
439. Patry, Guillaume; Gauthier, Gervais; Lay, Bruno; Roger, Julien;                doi:10.1016/j.compbiomed.2019.04.030 (https://doi.org/10.1016%2
     Elie, Damien. "ADCIS Download Third Party: Messidor                            Fj.compbiomed.2019.04.030). ISSN 0010-4825 (https://www.worldc
     Database" (http://www.adcis.net/en/Download-Third-Party/Messido                at.org/issn/0010-4825). PMID 31059902 (https://pubmed.ncbi.nlm.ni
     r.htmldownload.php). adcis.net. Retrieved 25 February 2018.                    h.gov/31059902). S2CID 146809768 (https://api.semanticscholar.or
                                                                                    g/CorpusID:146809768).
440. Decencière, Etienne; Zhang, Xiwei; Cazuguel, Guy; Lay, Bruno;
     Cochener, Béatrice; Trone, Caroline; Gain, Philippe; Ordonez,             452. "soroushj/mhsma-dataset: MHSMA: The Modified Human Sperm
     Richard; Massin, Pascale (26 August 2014). "Feedback on a                      Morphology Analysis Dataset" (https://github.com/soroushj/mhsma-
     Publicly Distributed Image Database: The Messidor Database" (http              dataset). github.com. Retrieved 3 May 2019.
     s://doi.org/10.5566%2Fias.1155). Image Analysis & Stereology. 33          453. Clark, David, Zoltan Schreter, and Anthony Adams. "A quantitative
     (3): 231–234. doi:10.5566/ias.1155 (https://doi.org/10.5566%2Fias.             comparison of dystal and backpropagation." Proceedings of 1996
     1155). ISSN 1854-5165 (https://www.worldcat.org/issn/1854-5165).               Australian Conference on Neural Networks. 1996.
441. Bagirov, A. M.; et al. (2003). "Unsupervised and supervised data          454. Jiang, Yuan, and Zhi-Hua Zhou. "Editing training data for kNN
     classification via nonsmooth and global optimization". Top. 11 (1):            classifiers with neural network ensemble (https://cs.nju.edu.cn/zhou
     1–75. CiteSeerX 10.1.1.1.6429 (https://citeseerx.ist.psu.edu/viewdo            zh/zhouzh.files/publication/isnn04a.pdf)." Advances in Neural
     c/summary?doi=10.1.1.1.6429). doi:10.1007/bf02578945 (https://do               Networks–ISNN 2004. Springer Berlin Heidelberg, 2004. 356–361.
     i.org/10.1007%2Fbf02578945). S2CID 14165678 (https://api.seman            455. Ontañón, Santiago, and Enric Plaza. "On similarity measures based
     ticscholar.org/CorpusID:14165678).                                             on a refinement lattice." Case-Based Reasoning Research and
442. Fung, Glenn, et al. "A fast iterative algorithm for fisher discriminant        Development. Springer Berlin Heidelberg, 2009. 240–255.
     using heterogeneous kernels (https://jinbo-bi.uconn.edu/wp-conten         456. "PLF data inventory" (https://github.com/Animal-Data-Inventory/PLF
     t/uploads/sites/2638/2018/12/icml04_kernel.pdf)." Proceedings of               DataInventory). GitHub. 5 November 2021.
     the twenty-first international conference on Machine learning. ACM,
                                                                               457. Higuera, Clara; Gardiner, Katheleen J.; Cios, Krzysztof J. (2015).
     2004.                                                                          "Self-organizing feature maps identify proteins critical to learning in
443. Quinlan, John Ross, et al. "Inductive knowledge acquisition: a case            a mouse model of down syndrome" (https://www.ncbi.nlm.nih.gov/p
     study." Proceedings of the Second Australian Conference on                     mc/articles/PMC4482027). PLOS ONE. 10 (6): e0129126.
     Applications of expert systems. Addison-Wesley Longman                         Bibcode:2015PLoSO..1029126H (https://ui.adsabs.harvard.edu/ab
     Publishing Co., Inc., 1987.                                                    s/2015PLoSO..1029126H). doi:10.1371/journal.pone.0129126 (http
444. Zhou, Zhi-Hua; Jiang, Yuan (2004). "NeC4.5: neural ensemble                    s://doi.org/10.1371%2Fjournal.pone.0129126). PMC 4482027 (http
     based C4.5". IEEE Transactions on Knowledge and Data                           s://www.ncbi.nlm.nih.gov/pmc/articles/PMC4482027).
     Engineering. 16 (6): 770–773. CiteSeerX 10.1.1.1.8430 (https://cites           PMID 26111164 (https://pubmed.ncbi.nlm.nih.gov/26111164).
     eerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.8430).                      458. Ahmed, Md Mahiuddin; et al. (2015). "Protein dynamics associated
     doi:10.1109/tkde.2004.11 (https://doi.org/10.1109%2Ftkde.2004.1                with failed and rescued learning in the Ts65Dn mouse model of
     1). S2CID 1024861 (https://api.semanticscholar.org/CorpusID:1024               Down syndrome" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4
     861).                                                                          368539). PLOS ONE. 10 (3): e0119491.
445. Er, Orhan; et al. (2012). "An approach based on probabilistic neural           Bibcode:2015PLoSO..1019491A (https://ui.adsabs.harvard.edu/ab
     network for diagnosis of Mesothelioma's disease". Computers &                  s/2015PLoSO..1019491A). doi:10.1371/journal.pone.0119491 (http
     Electrical Engineering. 38 (1): 75–81.                                         s://doi.org/10.1371%2Fjournal.pone.0119491). PMC 4368539 (http
     doi:10.1016/j.compeleceng.2011.09.001 (https://doi.org/10.1016%2               s://www.ncbi.nlm.nih.gov/pmc/articles/PMC4368539).
     Fj.compeleceng.2011.09.001).                                                   PMID 25793384 (https://pubmed.ncbi.nlm.nih.gov/25793384).
446. Er, Orhan, A. Çetin Tanrikulu, and Abdurrahman Abakay. "Use of            459. Langley, Pat (2014). "Trading off simplicity and coverage in
     artificial intelligence techniques for diagnosis of malignant pleural          incremental concept learning" (https://web.archive.org/web/201908
     mesothelioma (https://dergipark.org.tr/download/article-file/5452              06184005/https://www.westmont.edu/~iba/pubs/hillary-paper.pdf)
     1)." Dicle Tıp Dergisi 42.1 (2015).                                            (PDF). Machine Learning Proceedings. 1988: 73. Archived from the
447. Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (25              original (https://www.westmont.edu/~iba/pubs/hillary-paper.pdf)
     July 2017). "Vision-Based Assessment of Parkinsonism and                       (PDF) on 6 August 2019. Retrieved 6 August 2019.
     Levodopa-Induced Dyskinesia with Deep Learning Pose                       460. "Mushroom Data Set 2020" (https://mushroom.mathematik.uni-marb
     Estimation" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC621908               urg.de/). mushroom.mathematik.uni-marburg.de. Retrieved 6 April
     2). Journal of Neuroengineering and Rehabilitation. 15 (1): 97.                2021.
     arXiv:1707.09416 (https://arxiv.org/abs/1707.09416).                      461. Wagner, Dennis; Heider, Dominik; Hattab, Georges (14 April 2021).
     Bibcode:2017arXiv170709416L (https://ui.adsabs.harvard.edu/abs/                "Mushroom data creation, curation, and simulation to support
     2017arXiv170709416L). doi:10.1186/s12984-018-0446-z (https://do                classification tasks" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC
     i.org/10.1186%2Fs12984-018-0446-z). PMC 6219082 (https://www.                  8046754). Scientific Reports. 11 (1): 8134.
     ncbi.nlm.nih.gov/pmc/articles/PMC6219082). PMID 30400914 (http                 Bibcode:2021NatSR..11.8134W (https://ui.adsabs.harvard.edu/abs/
     s://pubmed.ncbi.nlm.nih.gov/30400914).                                         2021NatSR..11.8134W). doi:10.1038/s41598-021-87602-3 (https://
448. Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (May             doi.org/10.1038%2Fs41598-021-87602-3). ISSN 2045-2322 (http
     2018). "Automated assessment of levodopa-induced dyskinesia:                   s://www.worldcat.org/issn/2045-2322). PMC 8046754 (https://www.
     Evaluating the responsiveness of video-based features".                        ncbi.nlm.nih.gov/pmc/articles/PMC8046754). PMID 33854157 (http
     Parkinsonism & Related Disorders. 53: 42–45.                                   s://pubmed.ncbi.nlm.nih.gov/33854157).
     doi:10.1016/j.parkreldis.2018.04.036 (https://doi.org/10.1016%2Fj.p       462. Cortez, Paulo, and Aníbal de Jesus Raimundo Morais. "A data
     arkreldis.2018.04.036). ISSN 1353-8020 (https://www.worldcat.org/i             mining approach to predict forest fires using meteorological data."
     ssn/1353-8020). PMID 29748112 (https://pubmed.ncbi.nlm.nih.gov/                (2007).
     29748112). S2CID 13666294 (https://api.semanticscholar.org/Corp           463. Farquad, M. A. H.; Ravi, V.; Raju, S. Bapi (2010). "Support vector
     usID:13666294).                                                                regression based hybrid rule extraction methods for forecasting".
449. "Parkinson's Vision-Based Pose Estimation Dataset | Kaggle" (http              Expert Systems with Applications. 37 (8): 5577–5589.
     s://www.kaggle.com/limi44/parkinsons-visionbased-pose-estimatio                doi:10.1016/j.eswa.2010.02.055 (https://doi.org/10.1016%2Fj.eswa.
     n-dataset/home). kaggle.com. Retrieved 22 August 2018.                         2010.02.055).
450. Shannon, Paul; et al. (2003). "Cytoscape: a software environment          464. Fisher, Ronald A (1936). "The use of multiple measurements in
     for integrated models of biomolecular interaction networks" (https://          taxonomic problems". Annals of Eugenics. 7 (2): 179–188.
     www.ncbi.nlm.nih.gov/pmc/articles/PMC403769). Genome                           doi:10.1111/j.1469-1809.1936.tb02137.x (https://doi.org/10.1111%2
     Research. 13 (11): 2498–2504. doi:10.1101/gr.1239303 (https://doi.             Fj.1469-1809.1936.tb02137.x). hdl:2440/15227 (https://hdl.handle.n
     org/10.1101%2Fgr.1239303). PMC 403769 (https://www.ncbi.nlm.ni                 et/2440%2F15227).
     h.gov/pmc/articles/PMC403769). PMID 14597658 (https://pubmed.n
     cbi.nlm.nih.gov/14597658).
465. Ghahramani, Zoubin, and Michael I. Jordan. "Supervised learning           478. Muresan, Horea; Oltean, Mihai (2018). "Fruit recognition from
     from incomplete data via an EM approach (http://papers.nips.cc/pap             images using deep learning" (https://www.researchgate.net/publicat
     er/767-supervised-learning-from-incomplete-data-via-an-em-approa               ion/321475443). Acta Univ. Sapientiae, Informatica. 10 (1): 26–42.
     ch.pdf)." Advances in neural information processing systems 6.                 doi:10.2478/ausi-2018-0002 (https://doi.org/10.2478%2Fausi-2018-
     1994.                                                                          0002).
466. Mallah, Charles; Cope, James; Orwell, James (2013). "Plant leaf           479. Oltean, Mihai; Muresan, Horea (2017). "A dataset with fruit images
     classification using probabilistic integration of shape, texture and           on Kaggle" (https://www.kaggle.com/moltean/fruits).
     margin features" (https://www.researchgate.net/publication/2666323        480. Nakai, Kenta; Kanehisa, Minoru (1991). "Expert system for
     57). Signal Processing, Pattern Recognition and Applications. 5: 1.            predicting protein localization sites in gram‐negative bacteria".
467. Yahiaoui, Itheri, Olfa Mzoughi, and Nozha Boujemaa. "Leaf shape                Proteins: Structure, Function, and Bioinformatics. 11 (2): 95–110.
     descriptor for tree species identification (http://www.cmlab.csie.ntu.e        doi:10.1002/prot.340110203 (https://doi.org/10.1002%2Fprot.34011
     du.tw/~zenic/Data/Download/ICME2012/Conference/data/4711a25                    0203). PMID 1946347 (https://pubmed.ncbi.nlm.nih.gov/1946347).
     4.pdf) Archived (https://web.archive.org/web/20190806184006/htt                S2CID 27606447 (https://api.semanticscholar.org/CorpusID:276064
     p://www.cmlab.csie.ntu.edu.tw/~zenic/Data/Download/ICME2012/C                  47).
     onference/data/4711a254.pdf) 6 August 2019 at the Wayback                 481. Ling, Charles X., et al. "Decision trees with minimal costs (https://cli
     Machine." Multimedia and Expo (ICME), 2012 IEEE International                  ng.csd.uwo.ca/cs860/ICML04-Ling.pdf)." Proceedings of the twenty-
     Conference on. IEEE, 2012.                                                     first international conference on Machine learning. ACM, 2004.
468. Tan, Ming, and Larry Eshelman. "Using weighted networks to                482. Mahé, Pierre, et al. "Automatic identification of mixed bacterial
     represent classification knowledge in noisy domains (https://www.s             species fingerprints in a MALDI-TOF mass-spectrum (https://acade
     ciencedirect.com/science/article/pii/B9780934613644500189)."                   mic.oup.com/bioinformatics/article/30/9/1280/237488)."
     Proceedings of the Fifth International Conference on Machine                   Bioinformatics (2014): btu022.
     Learning. 2014.
                                                                               483. Barbano, Duane; et al. (2015). "Rapid characterization of
469. Charytanowicz, Małgorzata, et al. "Complete gradient clustering                microalgae and microalgae mixtures using matrix-assisted laser
     algorithm for features analysis of x-ray images (http://home.agh.edu.          desorption ionization time-of-flight mass spectrometry (MALDI-TOF
     pl/~kulpi/publ/Charytanowicz_Niewczas_Kulczycki_Kowalski_Luk                   MS)" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4536233).
     asik_Zak_-_Information_Technologies_in_Biomedicine_-_2010.pd                   PLOS ONE. 10 (8): e0135337. Bibcode:2015PLoSO..1035337B (htt
     f)." Information technologies in biomedicine. Springer Berlin                  ps://ui.adsabs.harvard.edu/abs/2015PLoSO..1035337B).
     Heidelberg, 2010. 15–24.                                                       doi:10.1371/journal.pone.0135337 (https://doi.org/10.1371%2Fjour
470. Sanchez, Mauricio A.; et al. (2014). "Fuzzy granular gravitational             nal.pone.0135337). PMC 4536233 (https://www.ncbi.nlm.nih.gov/p
     clustering algorithm for multivariate data". Information Sciences.             mc/articles/PMC4536233). PMID 26271045 (https://pubmed.ncbi.nl
     279: 498–511. doi:10.1016/j.ins.2014.04.005 (https://doi.org/10.101            m.nih.gov/26271045).
     6%2Fj.ins.2014.04.005).                                                   484. Horton, Paul; Nakai, Kenta (1996). "A probabilistic classification
471. Blackard, Jock A.; Dean, Denis J. (1999). "Comparative accuracies              system for predicting the cellular localization sites of proteins" (http
     of artificial neural networks and discriminant analysis in predicting          s://www.aaai.org/Papers/ISMB/1996/ISMB96-012.pdf) (PDF). ISMB-
     forest cover types from cartographic variables". Computers and                 96 Proceedings. 4: 109–15. PMID 8877510 (https://pubmed.ncbi.nl
     Electronics in Agriculture. 24 (3): 131–151.                                   m.nih.gov/8877510).
     CiteSeerX 10.1.1.128.2475 (https://citeseerx.ist.psu.edu/viewdoc/su       485. Allwein, Erin L.; Schapire, Robert E.; Singer, Yoram (2001).
     mmary?doi=10.1.1.128.2475). doi:10.1016/s0168-1699(99)00046-0                  "Reducing multiclass to binary: A unifying approach for margin
     (https://doi.org/10.1016%2Fs0168-1699%2899%2900046-0).                         classifiers" (http://www.jmlr.org/papers/volume1/allwein00a/allwein
     S2CID 13985407 (https://api.semanticscholar.org/CorpusID:139854                00a.pdf) (PDF). The Journal of Machine Learning Research. 1:
     07).                                                                           113–141.
472. Fürnkranz, Johannes. "Round robin rule learning (http://citeseerx.is      486. Mayr, Andreas; Klambauer, Guenter; Unterthiner, Thomas;
     t.psu.edu/viewdoc/summary?doi=10.1.1.20.9520)."Proceedings of                  Hochreiter, Sepp (2016). "DeepTox: Toxicity Prediction Using Deep
     the 18th International Conference on Machine Learning (ICML-01):               Learning" (http://bioinf.jku.at/research/DeepTox/tox21.html).
     146—153. 2001.                                                                 Frontiers in Environmental Science. 3: 80.
473. Li, Song; Assmann, Sarah M.; Albert, Réka (2006). "Predicting                  doi:10.3389/fenvs.2015.00080 (https://doi.org/10.3389%2Ffenvs.20
     essential components of signal transduction networks: a dynamic                15.00080).
     model of guard cell abscisic acid signaling" (https://www.ncbi.nlm.ni     487. Lavin, Alexander; Ahmad, Subutai (12 October 2015). "Evaluating
     h.gov/pmc/articles/PMC1564158). PLOS Biol. 4 (10): e312. arXiv:q-              Real-Time Anomaly Detection Algorithms -- the Numenta Anomaly
     bio/0610012 (https://arxiv.org/abs/q-bio/0610012).                             Benchmark". 2015 IEEE 14th International Conference on Machine
     Bibcode:2006q.bio....10012L (https://ui.adsabs.harvard.edu/abs/200             Learning and Applications (ICMLA). pp. 38–44. arXiv:1510.03336
     6q.bio....10012L). doi:10.1371/journal.pbio.0040312 (https://doi.org/          (https://arxiv.org/abs/1510.03336). doi:10.1109/ICMLA.2015.141 (htt
     10.1371%2Fjournal.pbio.0040312). PMC 1564158 (https://www.ncb                  ps://doi.org/10.1109%2FICMLA.2015.141). ISBN 978-1-5090-0287-
     i.nlm.nih.gov/pmc/articles/PMC1564158). PMID 16968132 (https://p               0. S2CID 6842305 (https://api.semanticscholar.org/CorpusID:68423
     ubmed.ncbi.nlm.nih.gov/16968132).                                              05).
474. Munisami, Trishen; et al. (2015). "Plant Leaf Recognition Using           488. Iurii D. Katser; Vyacheslav O. Kozitsin. "SKAB GitHub repository" (h
     Shape Features and Colour Histogram with K-nearest Neighbour                   ttps://github.com/waico/skab). GitHub. Retrieved 12 January 2021.
     Classifiers" (https://doi.org/10.1016%2Fj.procs.2015.08.095).
                                                                               489. Iurii D. Katser; Vyacheslav O. Kozitsin (2020). "Skoltech Anomaly
     Procedia Computer Science. 58: 740–747.
                                                                                    Benchmark (SKAB)" (https://www.kaggle.com/yuriykatser/skoltech-
     doi:10.1016/j.procs.2015.08.095 (https://doi.org/10.1016%2Fj.procs.            anomaly-benchmark-skab). Kaggle.
     2015.08.095).
                                                                                    doi:10.34740/KAGGLE/DSV/1693952 (https://doi.org/10.34740%2F
475. Li, Bai (2016). "Atomic potential matching: An evolutionary target             KAGGLE%2FDSV%2F1693952). Retrieved 12 January 2021.
     recognition approach based on edge features". Optik. 127 (5):             490. Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello,
     3162–3168. Bibcode:2016Optik.127.3162L (https://ui.adsabs.harvar
                                                                                    Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira;
     d.edu/abs/2016Optik.127.3162L). doi:10.1016/j.ijleo.2015.11.186 (h             Houle, Michael E. (2016). "On the evaluation of unsupervised
     ttps://doi.org/10.1016%2Fj.ijleo.2015.11.186).                                 outlier detection: measures, datasets, and an empirical study". Data
476. Nilsback, Maria-Elena, and Andrew Zisserman. "A visual                         Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-
     vocabulary for flower classification (http://www.robots.ox.ac.uk/~me           015-0444-8 (https://doi.org/10.1007%2Fs10618-015-0444-8).
     n/papers/nilsback_cvpr06.pdf)."Computer Vision and Pattern                     ISSN 1384-5810 (https://www.worldcat.org/issn/1384-5810).
     Recognition, 2006 IEEE Computer Society Conference on. Vol. 2.                 S2CID 1952214 (https://api.semanticscholar.org/CorpusID:195221
     IEEE, 2006.                                                                    4).
477. Giselsson, Thomas M.; et al. (2017). "A Public Image Database for         491. Ann-Kathrin Hartmann, Tommaso Soru, Edgard Marx. Generating a
     Benchmark of Plant Seedling Classification Algorithms".                        Large Dataset for Neural Question Answering over the DBpedia
     arXiv:1711.05458 (https://arxiv.org/abs/1711.05458) [cs.CV (https://           Knowledge Base (https://www.researchgate.net/publication/324482
     arxiv.org/archive/cs.CV)].                                                     598_Generating_a_Large_Dataset_for_Neural_Question_Answeri
                                                                                    ng_over_the_DBpedia_Knowledge_Base). 2018.
492. Tommaso Soru, Edgard Marx. Diego Moussallem, Andre                      505. "CAPEC - Common Attack Pattern Enumeration and Classification
     Valdestilhas, Diego Esteves, Ciro Baron. SPARQL as a Foreign                 (CAPEC™)" (https://capec.mitre.org/). capec.mitre.org. Retrieved
     Language. 2018.                                                              14 January 2023.
493. Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan               506. "CVE - Home" (https://cve.mitre.org/cve/). cve.mitre.org. Retrieved
     Luu-Thuy Nguyen. A Vietnamese Dataset for Evaluating Machine                 14 January 2023.
     Reading Comprehension (https://www.aclweb.org/anthology/2020.c          507. "CWE - Common Weakness Enumeration" (https://cwe.mitre.org/ind
     oling-main.233.pdf). COLING 2020.                                            ex.html). cwe.mitre.org. Retrieved 14 January 2023.
494. Kiet Van Nguyen, Khiem Vinh Tran, Son T. Luu, Anh Gia-Tuan              508. Lim, Swee Kiat; Muis, Aldrian Obaja; Lu, Wei; Ong, Chen Hui (July
     Nguyen, Ngan Luu-Thuy Nguyen. Enhancing Lexical-Based                        2017). "MalwareTextDB: A Database for Annotated Malware
     Approach With External Knowledge for Vietnamese Multiple-                    Articles" (https://aclanthology.org/P17-1143). Proceedings of the
     Choice Machine Reading Comprehension (https://ieeexplore.ieee.o              55th Annual Meeting of the Association for Computational
     rg/document/9247161). IEEE Access. 2020.                                     Linguistics (Volume 1: Long Papers). Vancouver, Canada:
495. Anantha, Raviteja; Vakulenko, Svitlana; Tu, Zhucheng; Longpre,               Association for Computational Linguistics: 1557–1567.
     Shayne; Pulman, Stephen; Chappidi, Srinivas (2020). "Open-                   doi:10.18653/v1/P17-1143 (https://doi.org/10.18653%2Fv1%2FP17
     Domain Question Answering Goes Conversational via Question                   -1143). S2CID 7816596 (https://api.semanticscholar.org/CorpusID:7
     Rewriting". arXiv:2010.04898 (https://arxiv.org/abs/2010.04898)              816596).
     [cs.IR (https://arxiv.org/archive/cs.IR)].                              509. "USENIX" (https://www.usenix.org/). USENIX. Retrieved
496. Khashabi, Daniel; Min, Sewon; Khot, Tushar; Sabharwal, Ashish;               19 January 2023.
     Tafjord, Oyvind; Clark, Peter; Hajishirzi, Hannaneh (November           510. "APTnotes | Read the Docs" (https://readthedocs.org/projects/aptno
     2020). "UNIFIEDQA: Crossing Format Boundaries with a Single QA               tes/). readthedocs.org. Retrieved 19 January 2023.
     System" (https://aclanthology.org/2020.findings-emnlp.171).
                                                                             511. "Cryptography and Security authors/titles recent submissions" (http
     Findings of the Association for Computational Linguistics: EMNLP
                                                                                  s://arxiv.org/list/cs.CR/recent). arxiv.org. Retrieved 19 January 2023.
     2020. Online: Association for Computational Linguistics: 1896–
     1907. arXiv:2005.00700 (https://arxiv.org/abs/2005.00700).              512. "Holistic Info-Sec for Web Developers - Fascicle 0" (https://f0.holisti
     doi:10.18653/v1/2020.findings-emnlp.171 (https://doi.org/10.1865             cinfosecforwebdevelopers.com/).
     3%2Fv1%2F2020.findings-emnlp.171). S2CID 218487109 (https://a                f0.holisticinfosecforwebdevelopers.com. Retrieved 20 January
     pi.semanticscholar.org/CorpusID:218487109).                                  2023.
497. Taskmaster (https://github.com/google-research-datasets/Taskmast        513. "Holistic Info-Sec for Web Developers - Fascicle 1" (https://f1.holisti
     er), Google Research Datasets, 17 December 2022, retrieved                   cinfosecforwebdevelopers.com/).
     7 January 2023                                                               f1.holisticinfosecforwebdevelopers.com. Retrieved 20 January
                                                                                  2023.
498. Byrne, Bill; Krishnamoorthi, Karthik; Sankar, Chinnadhurai;
     Neelakantan, Arvind; Duckworth, Daniel; Yavuz, Semih; Goodrich,         514. Vincent, Adam. "Web Services Web Services Hacking and
     Ben; Dubey, Amit; Cedilnik, Andy; Kim, Kyu-Young (1 September                Hardening" (https://owasp.org/www-pdf-archive/Web_Services_Ha
     2019). "Taskmaster-1: Toward a Realistic and Diverse Dialog                  cking_and_Hardening.pdf) (PDF). owasp.org.
     Dataset". arXiv:1909.05358 (https://arxiv.org/abs/1909.05358)           515. McCray, Joe. "Advanced SQL Injection" (https://defcon.org/images/d
     [cs.CL (https://arxiv.org/archive/cs.CL)].                                   efcon-17/dc-17-presentations/defcon-17-joseph_mccray-adv_sql_in
499. Yasunaga, Michihiro; Liang, Percy (21 November 2020). "Graph-                jection.pdf) (PDF). defcon.org.
     based, Self-Supervised Program Repair from Diagnostic                   516. Shah, Shreeraj. "Blind SQL injection discovery & exploitation
     Feedback" (https://proceedings.mlr.press/v119/yasunaga20a.html).             technique" (https://blueinfy.com/wp/blindsql.pdf) (PDF).
     International Conference on Machine Learning. PMLR: 10799–                   blueinfy.com.
     10808. arXiv:2005.10636 (https://arxiv.org/abs/2005.10636).             517. Palcer, C. C. "Ethical hacking" (https://blueinfy.com/wp/blindsql.pdf)
500. Wang, Yizhong; Mishra, Swaroop; Alipoormolabashi, Pegah; Kordi,              (PDF). textfiles.
     Yeganeh; Mirzaei, Amirreza; Arunkumar, Anjana; Ashok, Arjun;            518. "Hacking Secrets Revealed - Information and Instructional Guide"
     Dhanasekaran, Arut Selvan; Naik, Atharva; Stap, David; Pathak,               (https://www.onlinepot.org/security/HackersSecrets.pdf) (PDF).
     Eshaan; Karamanolakis, Giannis; Lai, Haizhi Gary; Purohit, Ishan;       519. Park, Alexis. "Hack any website" (https://defcon.org/images/defcon-
     Mondal, Ishani (24 October 2022). "Super-NaturalInstructions:                11/dc-11-presentations/dc-11-Gentil/dc-11-gentil.pdf) (PDF).
     Generalization via Declarative Instructions on 1600+ NLP Tasks".
                                                                             520. Cerrudo, Cesar; Martinez Fayo, Esteban. "Hacking Databases for
     arXiv:2204.07705 (https://arxiv.org/abs/2204.07705) [cs.CL (https://a
                                                                                  Owning your Data" (https://www.blackhat.com/presentations/bh-eur
     rxiv.org/archive/cs.CL)].
                                                                                  ope-07/Cerrudo/Whitepaper/bh-eu-07-cerrudo-WP-up.pdf) (PDF).
501. Paperno, Denis; Kruszewski, Germán; Lazaridou, Angeliki; Pham,               blackhat.
     Quan Ngoc; Bernardi, Raffaella; Pezzelle, Sandro; Baroni, Marco;
                                                                             521. O'Connor, Tj. "Violent Python-A Cookbook for Hackers, Forensic
     Boleda, Gemma; Fernández, Raquel (7 August 2016), The
                                                                                  Analysts, Penetration Testers and Security Engineers" (https://githu
     LAMBADA dataset (https://zenodo.org/record/2630551),
                                                                                  b.com/reconSF/python/blob/master/Syngress.Violent.Python.a.Coo
     doi:10.5281/zenodo.2630551 (https://doi.org/10.5281%2Fzenodo.2
                                                                                  kbook.for.Hackers.2013.pdf) (PDF). Github.
     630551), retrieved 7 January 2023
                                                                             522. Grand, Joe. "Hardware Reverse Engineering: Access, Analyze, &
502. Paperno, Denis; Kruszewski, Germán; Lazaridou, Angeliki; Pham,
                                                                                  Defeat" (https://media.blackhat.com/bh-dc-11/Grand/BlackHat_DC_
     Ngoc Quan; Bernardi, Raffaella; Pezzelle, Sandro; Baroni, Marco;
                                                                                  2011_Grand-Workshop.pdf) (PDF). blackhat.
     Boleda, Gemma; Fernández, Raquel (August 2016). "The
     LAMBADA dataset: Word prediction requiring a broad discourse            523. Chang, Jason V. "Computer Hacking: Making the Case for National
     context" (https://aclanthology.org/P16-1144). Proceedings of the             Reporting Requirement" (https://cyber.harvard.edu/sites/cyber.law.h
     54th Annual Meeting of the Association for Computational                     arvard.edu/files/ComputerHacking.pdf) (PDF). cyber.harvard.edu.
     Linguistics (Volume 1: Long Papers). Berlin, Germany: Association       524. "National Cybersecurity Strategies Repository" (https://www.itu.int:4
     for Computational Linguistics: 1525–1534. doi:10.18653/v1/P16-               43/en/ITU-D/Cybersecurity/Pages/National-Strategies-repository.as
     1144 (https://doi.org/10.18653%2Fv1%2FP16-1144).                             px). ITU. Retrieved 20 January 2023.
     hdl:10230/32702 (https://hdl.handle.net/10230%2F32702).                 525. Chen, Yanlin (31 August 2022), Cyber Security Natural Language
     S2CID 2381275 (https://api.semanticscholar.org/CorpusID:238127               Processing (https://github.com/Ychen463/Cyber), retrieved
     5).                                                                          20 January 2023
503. Wei, Jason; Bosma, Maarten; Zhao, Vincent; Guu, Kelvin; Yu,             526. "https://twitter.com/blackorbird" (https://twitter.com/blackorbird).
     Adams Wei; Lester, Brian; Du, Nan; Dai, Andrew M.; Le, Quoc V.               Twitter. Retrieved 20 January 2023. {{cite web}}: External link
     (10 February 2022). "Finetuned Language Models are Zero-Shot                 in |title= (help)
     Learners" (https://openreview.net/forum?id=gEZrGCozdqR).
                                                                             527. Zampieri, Marcos; Malmasi, Shervin; Nakov, Preslav; Rosenthal,
     arXiv:2109.01652 (https://arxiv.org/abs/2109.01652).                         Sara; Farra, Noura; Kumar, Ritesh (16 April 2019). "Predicting the
504. "Working with ATT&CK | MITRE ATT&CK®" (https://attack.mitre.or               Type and Target of Offensive Posts in Social Media".
     g/resources/working-with-attack/). attack.mitre.org. Retrieved               arXiv:1902.09666 (https://arxiv.org/abs/1902.09666) [cs.CL (https://a
     14 January 2023.                                                             rxiv.org/archive/cs.CL)].
                                                                             528. "Threat reports" (https://www.ncsc.gov.uk/section/keep-up-to-date/th
                                                                                  reat-reports). www.ncsc.gov.uk. Retrieved 20 January 2023.
529. "Category: APT reports | Securelist" (https://securelist.com/category/    554. Diggelmann, Thomas; Boyd-Graber, Jordan; Bulian, Jannis;
     apt-reports/). securelist.com. Retrieved 23 January 2023.                      Ciaramita, Massimiliano; Leippold, Markus (2 January 2021).
530. "Your Cybersecurity News Connection - Cyber News | CyberWire"                  "CLIMATE-FEVER: A Dataset for Verification of Real-World Climate
     (https://thecyberwire.com/). The CyberWire. Retrieved 23 January               Claims". arXiv:2012.00614 (https://arxiv.org/abs/2012.00614) [cs.CL
     2023.                                                                          (https://arxiv.org/archive/cs.CL)].
531. "News" (https://www.databreaches.net/news/). Retrieved                    555. "climate-news-db" (http://www.climate-news-db.com/). www.climate-
     23 January 2023.                                                               news-db.com. Retrieved 3 February 2023.
532. "Cybernews" (https://cybernews.com/). Cybernews.                          556. "Climatext" (http://www.sustainablefinance.uzh.ch/en/research/clim
533. "HIPAA Journal" (https://www.hipaajournal.com/). HIPAA Journal.                ate-fever/climatext.html). www.sustainablefinance.uzh.ch. Retrieved
     Retrieved 23 January 2023.                                                     19 February 2023.
                                                                               557. "Greenbiz" (https://www.greenbiz.com/). www.greenbiz.com.
534. "BleepingComputer" (https://www.bleepingcomputer.com/).
     BleepingComputer. Retrieved 23 January 2023.                                   Retrieved 2 March 2023.
                                                                               558. "Explore the @Reuters Hot List of 1,000 top climate scientists" (http
535. "Homepage" (https://therecord.media/). The Record from Recorded
                                                                                    s://www.reuters.com/investigates/special-report/climate-change-sci
     Future News. Retrieved 23 January 2023.
                                                                                    entists-list/). Reuters. Retrieved 22 March 2023.
536. "HackRead | Latest Cyber Crime - InfoSec- Tech - Hacking News"
                                                                               559. "Blogs | Alliance for Research on Corporate Sustainability" (https://c
     (https://www.hackread.com/). 8 January 2022. Retrieved 23 January
     2023.                                                                          orporate-sustainability.org/blogs/). corporate-sustainability.org.
                                                                                    Retrieved 27 March 2023.
537. "Securelist | Kaspersky's threat research and reports" (https://secure
                                                                               560. "Greenbiz" (https://www.greenbiz.com/). www.greenbiz.com.
     list.com/). securelist.com. Retrieved 31 January 2023.
                                                                                    Retrieved 29 March 2023.
538. Harshaw, Christopher R.; Bridges, Robert A.; Iannacone, Michael
                                                                               561. "CSR News" (https://www.csrwire.com/press_releases).
     D.; Reed, Joel W.; Goodall, John R. (5 April 2016). "GraphPrints:
     Towards a Graph Analytic Method for Network Anomaly Detection"                 www.csrwire.com. Retrieved 29 March 2023.
     (https://doi.org/10.1145/2897795.2897806). Proceedings of the 11th        562. "CDP Homepage" (https://www.cdp.net/en). www.cdp.net.
     Annual Cyber and Information Security Research Conference.                     Retrieved 29 March 2023.
     CISRC '16. New York, NY, USA: Association for Computing                   563. "Hybrid cloud blog" (https://content.cloud.redhat.com/blog).
     Machinery: 1–4. doi:10.1145/2897795.2897806 (https://doi.org/10.1              content.cloud.redhat.com. Retrieved 9 April 2023.
     145%2F2897795.2897806). ISBN 978-1-4503-3752-6.                           564. "Production-Grade Container Orchestration" (https://kubernetes.io/).
539. "Farsight Security, cyber security intelligence solutions" (https://ww         Kubernetes. Retrieved 9 April 2023.
     w.farsightsecurity.com/). Farsight Security. Retrieved 13 February        565. "Home | Official Red Hat OpenShift Documentation" (https://docs.op
     2023.                                                                          enshift.com/). docs.openshift.com. Retrieved 9 April 2023.
540. "Schneier on Security" (https://www.schneier.com/).                       566. "Cloud Native Computing Foundation" (https://www.cncf.io/). Cloud
     www.schneier.com. Retrieved 13 February 2023.                                  Native Computing Foundation. Retrieved 9 April 2023.
541. "#1 in Cloud Security & Endpoint Cybersecurity" (https://www.trend        567. CNCF Community Presentations (https://github.com/cncf/presentati
     micro.com/en_us/business.html). Trend Micro. Retrieved                         ons/blob/2ff57e4d78f6d70bb1fd5daf81e76f04a54c8520/kubernete
     13 February 2023.                                                              s/README.md), Cloud Native Computing Foundation (CNCF), 11
542. "The Hacker News | #1 Trusted Cybersecurity News Site" (https://th             April 2023, retrieved 11 April 2023
     ehackernews.com/). The Hacker News. Retrieved 13 February                 568. "Red Hat - We make open source technologies for the enterprise"
     2023.                                                                          (https://www.redhat.com/en). www.redhat.com. Retrieved 1 May
543. "Krebs on Security – In-depth security news and investigation" (http           2023.
     s://krebsonsecurity.com/). Retrieved 25 February 2023.
544. "MITRE D3FEND Knowledge Graph" (https://d3fend.mitre.org/). d3fend.mitre.org. Retrieved 31 March 2023.
545. "MITRE | ATLAS™" (https://atlas.mitre.org/). atlas.mitre.org. Retrieved 31 March 2023.
546. "MITRE Engage™ | An Adversary Engagement Framework from MITRE" (https://engage.mitre.org/). Retrieved 1 April 2023.
547. "Hacking Tutorials - The best Step-by-Step Hacking Tutorials" (https://www.hackingtutorials.org/). Hacking Tutorials. Retrieved 1 April 2023.
548. "TCFD Knowledge Hub" (https://www.tcfdhub.org/). TCFD Knowledge Hub. Retrieved 3 February 2023.
549. "ResponsibilityReports.com" (https://www.responsibilityreports.com/). www.responsibilityreports.com. Retrieved 3 February 2023.
550. "About — IPCC" (https://www.ipcc.ch/about/). Retrieved 20 February 2023.
551. "Alliance for Research on Corporate Sustainability | ARCS serves as a vehicle for advancing rigorous academic research on corporate sustainability issues" (https://corporate-sustainability.org/). corporate-sustainability.org. Retrieved 2 March 2023.
552. Mehra, Srishti; Louka, Robert; Zhang, Yixun (26 March 2022). "ESGBERT: Language Model to Help with Classification Tasks Related to Companies Environmental, Social, and Governance Practices". Embedded Systems and Applications: 183–190. arXiv:2203.16788 (https://arxiv.org/abs/2203.16788). doi:10.5121/csit.2022.120616 (https://doi.org/10.5121%2Fcsit.2022.120616). ISBN 9781925953657. S2CID 247825524 (https://api.semanticscholar.org/CorpusID:247825524).
553. This article incorporates text (https://www.tensorflow.org/datasets/community_catalog/huggingface/climate_fever) available under the CC BY 4.0 license.
569. Brown, Michael Scott, Michael J. Pelosi, and Henry Dirska. "Dynamic-radius species-conserving genetic algorithm for the financial forecasting of Dow Jones index stocks (http://www.academia.edu/download/46729605/BrownPelosiDirska79880027.pdf)." Machine Learning and Data Mining in Pattern Recognition. Springer Berlin Heidelberg, 2013. 27–41.
570. Shen, Kao-Yi; Tzeng, Gwo-Hshiung (2015). "Fuzzy Inference-Enhanced VC-DRSA Model for Technical Analysis: Investment Decision Aid". International Journal of Fuzzy Systems. 17 (3): 375–389. doi:10.1007/s40815-015-0058-8 (https://doi.org/10.1007%2Fs40815-015-0058-8). S2CID 68241024 (https://api.semanticscholar.org/CorpusID:68241024).
571. Quinlan, J. Ross (1987). "Simplifying decision trees". International Journal of Man-Machine Studies. 27 (3): 221–234. CiteSeerX 10.1.1.18.4267 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.4267). doi:10.1016/s0020-7373(87)80053-6 (https://doi.org/10.1016%2Fs0020-7373%2887%2980053-6).
572. Hamers, Bart; Suykens, Johan AK; De Moor, Bart (2003). "Coupled transductive ensemble learning of kernel models" (http://ftp.esat.kuleuven.be/pub/SISTA/hamers/BH_clm.pdf) (PDF). Journal of Machine Learning Research. 1: 1–48.
573. Shmueli, Galit, Ralph P. Russo, and Wolfgang Jank. "The BARISTA: a model for bid arrivals in online auctions (https://projecteuclid.org/download/pdfview_1/euclid.aoas/1196438025)." The Annals of Applied Statistics (2007): 412–441.
574. Peng, Jie, and Hans-Georg Müller. "Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions (https://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908052)." The Annals of Applied Statistics (2008): 1056–1077.
575. Eggermont, Jeroen, Joost N. Kok, and Walter A. Kosters. "Genetic programming for data classification: Partitioning the search space (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.9.8725&rep=rep1&type=pdf)." Proceedings of the 2004 ACM symposium on Applied computing. ACM, 2004.
576. Moro, Sérgio; Cortez, Paulo; Rita, Paulo (2014). "A data-driven approach to predict the success of bank telemarketing". Decision Support Systems. 62: 22–31. doi:10.1016/j.dss.2014.03.001 (https://doi.org/10.1016%2Fj.dss.2014.03.001). hdl:10071/9499 (https://hdl.handle.net/10071%2F9499). S2CID 14181100 (https://api.semanticscholar.org/CorpusID:14181100).
577. Payne, Richard D.; Mallick, Bani K. (2014). "Bayesian Big Data Classification: A Review with Complements". arXiv:1411.5653 (https://arxiv.org/abs/1411.5653) [stat.ME (https://arxiv.org/archive/stat.ME)].
578. Akbilgic, Oguz; Bozdogan, Hamparsum; Balaban, M. Erdal (2014). "A novel Hybrid RBF Neural Networks model as a forecaster". Statistics and Computing. 24 (3): 365–375. doi:10.1007/s11222-013-9375-7 (https://doi.org/10.1007%2Fs11222-013-9375-7). S2CID 17764829 (https://api.semanticscholar.org/CorpusID:17764829).
579. Jabin, Suraiya. "Stock market prediction using feed-forward artificial neural network (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.677.8985&rep=rep1&type=pdf)." Int. J. Comput. Appl. (IJCA) 99.9 (2014).
580. Yeh, I-Cheng; Che-hui, Lien (2009). "The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients". Expert Systems with Applications. 36 (2): 2473–2480. doi:10.1016/j.eswa.2007.12.020 (https://doi.org/10.1016%2Fj.eswa.2007.12.020). S2CID 15696161 (https://api.semanticscholar.org/CorpusID:15696161).
581. Lin, Shu Ling (2009). "A new two-stage hybrid approach of credit risk in banking industry". Expert Systems with Applications. 36 (4): 8333–8341. doi:10.1016/j.eswa.2008.10.015 (https://doi.org/10.1016%2Fj.eswa.2008.10.015).
582. Pelckmans, Kristiaan; et al. (2005). "The differogram: Non-parametric noise variance estimation and its use for model selection". Neurocomputing. 69 (1): 100–122. doi:10.1016/j.neucom.2005.02.015 (https://doi.org/10.1016%2Fj.neucom.2005.02.015).
583. Bay, Stephen D.; et al. (2000). "The UCI KDD archive of large data sets for data mining research and experimentation". ACM SIGKDD Explorations Newsletter. 2 (2): 81–85. CiteSeerX 10.1.1.15.9776 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.9776). doi:10.1145/380995.381030 (https://doi.org/10.1145%2F380995.381030). S2CID 534881 (https://api.semanticscholar.org/CorpusID:534881).
584. Lucas, D. D.; et al. (2015). "Designing optimal greenhouse gas observing networks that consider performance and cost" (https://doi.org/10.5194%2Fgi-4-121-2015). Geoscientific Instrumentation, Methods and Data Systems. 4 (1): 121. Bibcode:2015GI......4..121L (https://ui.adsabs.harvard.edu/abs/2015GI......4..121L). doi:10.5194/gi-4-121-2015 (https://doi.org/10.5194%2Fgi-4-121-2015).
585. Pales, Jack C.; Keeling, Charles D. (1965). "The concentration of atmospheric carbon dioxide in Hawaii". Journal of Geophysical Research. 70 (24): 6053–6076. Bibcode:1965JGR....70.6053P (https://ui.adsabs.harvard.edu/abs/1965JGR....70.6053P). doi:10.1029/jz070i024p06053 (https://doi.org/10.1029%2Fjz070i024p06053).
586. Sigillito, Vincent G., et al. "Classification of radar returns from the ionosphere using neural networks." Johns Hopkins APL Technical Digest 10.3 (1989): 262–266.
587. Zhang, Kun, and Wei Fan. "Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.218.9860&rep=rep1&type=pdf)." Knowledge and Information Systems 14.3 (2008): 299–326.
588. Reich, Brian J., Montserrat Fuentes, and David B. Dunson. "Bayesian spatial quantile regression (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3583387/)." Journal of the American Statistical Association (2012).
589. Kohavi, Ron (1996). "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid". KDD. 96.
590. Oza, Nikunj C., and Stuart Russell. "Experimental comparisons of online and batch versions of bagging and boosting." Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001.
591. Bay, Stephen D (2001). "Multivariate discretization for set mining". Knowledge and Information Systems. 3 (4): 491–512. CiteSeerX 10.1.1.217.921 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.217.921). doi:10.1007/pl00011680 (https://doi.org/10.1007%2Fpl00011680). S2CID 10945544 (https://api.semanticscholar.org/CorpusID:10945544).
592. Ruggles, Steven (1995). "Sample designs and sampling errors". Historical Methods. 28 (1): 40–46. doi:10.1080/01615440.1995.9955312 (https://doi.org/10.1080%2F01615440.1995.9955312).
593. Meek, Christopher, Bo Thiesson, and David Heckerman. "The Learning Curve Method Applied to Clustering (https://www.microsoft.com/en-us/research/wp-content/uploads/2001/01/lc-aistats.pdf)." AISTATS. 2001.
594. Fanaee-T, Hadi; Gama, Joao (2013). "Event labeling combining ensemble detectors and background knowledge" (http://repositorio.inesctec.pt/handle/123456789/3506). Progress in Artificial Intelligence. 2 (2–3): 113–127. doi:10.1007/s13748-013-0040-3 (https://doi.org/10.1007%2Fs13748-013-0040-3). S2CID 3345087 (https://api.semanticscholar.org/CorpusID:3345087).
595. Giot, Romain, and Raphaël Cherrier. "Predicting bikeshare system usage up to one day ahead (https://hal.archives-ouvertes.fr/docs/01/06/59/83/PDF/paper_final.pdf)." Computational intelligence in vehicles and transportation systems (CIVTS), 2014 IEEE symposium on. IEEE, 2014.
596. Zhan, Xianyuan; et al. (2013). "Urban link travel time estimation using large-scale taxi data with partial information". Transportation Research Part C: Emerging Technologies. 33: 37–49. doi:10.1016/j.trc.2013.04.001 (https://doi.org/10.1016%2Fj.trc.2013.04.001).
597. Moreira-Matias, Luis; et al. (2013). "Predicting taxi–passenger demand using streaming data" (http://repositorio.inesctec.pt/handle/123456789/5356). IEEE Transactions on Intelligent Transportation Systems. 14 (3): 1393–1402. doi:10.1109/tits.2013.2262376 (https://doi.org/10.1109%2Ftits.2013.2262376). S2CID 14764358 (https://api.semanticscholar.org/CorpusID:14764358).
598. Hwang, Ren-Hung; Hsueh, Yu-Ling; Chen, Yu-Ting (2015). "An effective taxi recommender system based on a spatio-temporal factor analysis model". Information Sciences. 314: 28–40. doi:10.1016/j.ins.2015.03.068 (https://doi.org/10.1016%2Fj.ins.2015.03.068).
599. H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. Big data and its technical challenges. Commun. ACM, 57(7):86–94, July 2014.
600. Caltrans PeMS (http://pems.dot.ca.gov/)
601. Meusel, Robert, et al. "The Graph Structure in the Web—Analyzed on Different Aggregation Levels (https://www.nowpublishers.com/article/OpenAccessDownload/JWS-0003)." The Journal of Web Science 1.1 (2015).
602. Kushmerick, Nicholas. "Learning to remove internet advertisements (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.35.5686&rep=rep1&type=pdf)." Proceedings of the third annual conference on Autonomous Agents. ACM, 1999.
603. Fradkin, Dmitriy, and David Madigan. "Experiments with random projections for machine learning (https://www.researchgate.net/profile/Dmitriy_Fradkin/publication/2573186_Experiments_with_Random_Projections_for_Machine_Learning/links/0fcfd50b6230aaf309000000.pdf)." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.
604. This data was used in the American Statistical Association Statistical Graphics and Computing Sections 1999 Data Exposition.
605. Ma, Justin, et al. "Identifying suspicious URLs: an application of large-scale online learning (https://cseweb.ucsd.edu/~voelker/pubs/mal-url-icml09.pdf)." Proceedings of the 26th annual international conference on machine learning. ACM, 2009.
606. Levchenko, Kirill, et al. "Click trajectories: End-to-end analysis of the spam value chain (http://www.icir.org/christian/publications/2011-oakland-trajectory.pdf)." Security and Privacy (SP), 2011 IEEE Symposium on. IEEE, 2011.
607. Mohammad, Rami M., Fadi Thabtah, and Lee McCluskey. "An assessment of features related to phishing websites using an automated technique (http://eprints.hud.ac.uk/16229/1/The_7th_ICITST_2012_Conference_-An_Assessment_of_Features_Related_to_Phishing_Websites_using_an_Automated_Technique.pdf)." Internet Technology And Secured Transactions, 2012 International Conference for. IEEE, 2012.
608. Singh, Ashishkumar, et al. "Clustering Experiments on Big Transaction Data for Market Segmentation (https://dl.acm.org/citation.cfm?id=2644161)." Proceedings of the 2014 International Conference on Big Data Science and Computing. ACM, 2014.
609. Bollacker, Kurt, et al. "Freebase: a collaboratively created graph database for structuring human knowledge (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.538.7139&rep=rep1&type=pdf)." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
610. Mintz, Mike, et al. "Distant supervision for relation extraction without labeled data (https://www.aclweb.org/anthology/P09-1113)." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics, 2009.
611. Mesterharm, Chris, and Michael J. Pazzani. "Active learning using on-line algorithms (http://research.cs.rutgers.edu/~mesterha/active-online.pdf) Archived (https://web.archive.org/web/20170922013803/http://research.cs.rutgers.edu/~mesterha/active-online.pdf) 22 September 2017 at the Wayback Machine." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.
612. Wang, Shusen; Zhang, Zhihua (2013). "Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling" (http://www.jmlr.org/papers/volume14/wang13c/wang13c.pdf) (PDF). The Journal of Machine Learning Research. 14 (1): 2729–2769. arXiv:1303.4207 (https://arxiv.org/abs/1303.4207). Bibcode:2013arXiv1303.4207W (https://ui.adsabs.harvard.edu/abs/2013arXiv1303.4207W).
613. "The Pile" (https://pile.eleuther.ai/). pile.eleuther.ai. Retrieved 14 April 2022.
614. "JSON Lines" (https://jsonlines.org/). jsonlines.org. Retrieved 14 April 2022.
615. Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027 (https://arxiv.org/abs/2101.00027) [cs.CL (https://arxiv.org/archive/cs.CL)].
616. Cohen, Vanya. "OpenWebTextCorpus" (https://skylion007.github.io/OpenWebTextCorpus/). OpenWebTextCorpus. Retrieved 9 January 2023.
617. "openwebtext · Datasets at Hugging Face" (https://huggingface.co/datasets/openwebtext). huggingface.co. 16 November 2022. Retrieved 9 January 2023.
618. Cattral, Robert; Oppacher, Franz; Deugo, Dwight (2002). "Evolutionary data mining with automatic rule generalization" (https://web.archive.org/web/20190806015013/https://pdfs.semanticscholar.org/c068/ea7807367573f4b5f98c0681fca665e9ef74.pdf) (PDF). Recent Advances in Computers, Computing and Communications: 296–300. S2CID 18625415 (https://api.semanticscholar.org/CorpusID:18625415). Archived from the original (https://pdfs.semanticscholar.org/c068/ea7807367573f4b5f98c0681fca665e9ef74.pdf) (PDF) on 6 August 2019.
619. Burton, Ariel N.; Kelly, Paul H.J. (2006). "Performance prediction of paging workloads using lightweight tracing". Future Generation Computer Systems. Elsevier BV. 22 (7): 784–793. doi:10.1016/j.future.2006.02.003 (https://doi.org/10.1016%2Fj.future.2006.02.003). ISSN 0167-739X (https://www.worldcat.org/issn/0167-739X).
620. Bain, Michael; Muggleton, Stephen (1994). "Learning optimal chess strategies". Machine Intelligence. Oxford University Press, Inc. 13.
621. Quinlan, J. R. (1983). "Learning efficient classification procedures and their application to chess end games". Machine Learning: An Artificial Intelligence Approach. 1: 463–482. doi:10.1007/978-3-662-12405-5_15 (https://doi.org/10.1007%2F978-3-662-12405-5_15). ISBN 978-3-662-12407-9.
622. Shapiro, Alen D. (1987). Structured induction in expert systems. Addison-Wesley Longman Publishing Co., Inc.
623. Matheus, Christopher J.; Rendell, Larry A. (1989). "Constructive Induction on Decision Trees" (http://www.academia.edu/download/40413240/Constructive_Induction_On_Decision_Trees20151126-4470-tjt71n.pdf) (PDF). IJCAI. 89.
624. Belsley, David A., Edwin Kuh, and Roy E. Welsch. Regression diagnostics: Identifying influential data and sources of collinearity. Vol. 571. John Wiley & Sons, 2005.
625. Ruotsalo, Tuukka; Aroyo, Lora; Schreiber, Guus (2009). "Knowledge-based linguistic annotation of digital cultural heritage collections" (http://dare.ubvu.vu.nl/bitstream/handle/1871/24407/243319.pdf?sequence=3) (PDF). IEEE Intelligent Systems. 24 (2): 64–75. doi:10.1109/MIS.2009.32 (https://doi.org/10.1109%2FMIS.2009.32). hdl:1871.1/9f6091aa-9596-46a9-9251-f11edeeb28b7 (https://hdl.handle.net/1871.1%2F9f6091aa-9596-46a9-9251-f11edeeb28b7). S2CID 6667472 (https://api.semanticscholar.org/CorpusID:6667472).
626. Li, Lihong; Chu, Wei; Langford, John; Wang, Xuanhui (2011). "Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms". Proceedings of the fourth ACM international conference on Web search and data mining. pp. 297–306. arXiv:1003.5956 (https://arxiv.org/abs/1003.5956). doi:10.1145/1935826.1935878 (https://doi.org/10.1145%2F1935826.1935878). ISBN 9781450304931. S2CID 744200 (https://api.semanticscholar.org/CorpusID:744200).
627. Yeung, Kam Fung, and Yanyan Yang. "A proactive personalized mobile news recommendation system (https://ieeexplore.ieee.org/abstract/document/5633837/)." Developments in E-systems Engineering (DESE), 2010. IEEE, 2010.
628. Gass, Susan E.; Roberts, J. Murray (2006). "The occurrence of the cold-water coral Lophelia pertusa (Scleractinia) on oil and gas platforms in the North Sea: colony growth, recruitment and environmental controls on distribution". Marine Pollution Bulletin. 52 (5): 549–559. Bibcode:2006MarPB..52..549G (https://ui.adsabs.harvard.edu/abs/2006MarPB..52..549G). doi:10.1016/j.marpolbul.2005.10.002 (https://doi.org/10.1016%2Fj.marpolbul.2005.10.002). PMID 16300800 (https://pubmed.ncbi.nlm.nih.gov/16300800).
629. Gionis, Aristides; Mannila, Heikki; Tsaparas, Panayiotis (2007). "Clustering aggregation". ACM Transactions on Knowledge Discovery from Data. 1 (1): 4. CiteSeerX 10.1.1.709.528 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.709.528). doi:10.1145/1217299.1217303 (https://doi.org/10.1145%2F1217299.1217303). S2CID 433708 (https://api.semanticscholar.org/CorpusID:433708).
630. Obradovic, Zoran, and Slobodan Vucetic. Challenges in Scientific Data Mining: Heterogeneous, Biased, and Large Samples. Technical Report, Center for Information Science and Technology Temple University, 2004.
631. Van Der Putten, Peter; van Someren, Maarten (2000). "CoIL challenge 2000: The insurance company case". Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report. 9: 1–43.
632. Mao, K. Z. (2002). "RBF neural network center selection based on Fisher ratio class separability measure". IEEE Transactions on Neural Networks. 13 (5): 1211–1217. doi:10.1109/tnn.2002.1031953 (https://doi.org/10.1109%2Ftnn.2002.1031953). PMID 18244518 (https://pubmed.ncbi.nlm.nih.gov/18244518).
633. Olave, Manuel; Rajkovic, Vladislav; Bohanec, Marko (1989). "An application for admission in public school systems" (http://kt.ijs.si/MarkoBohanec/pub/Nursery89.pdf) (PDF). Expert Systems in Public Administration. 1: 145–160.
634. Lizotte, Daniel J.; Madani, Omid; Greiner, Russell (2012). "Budgeted Learning of Naive-Bayes Classifiers". arXiv:1212.2472 (https://arxiv.org/abs/1212.2472) [cs.LG (https://arxiv.org/archive/cs.LG)].
635. Lebowitz, Michael (1986). Concept learning in a rich input domain: Generalization-based memory (https://books.google.com/books?id=f9RylgKpHZsC&q=%22Concept+learning+in+a+rich+input+domain:+Generalization-based+memory%22&pg=PA193). Machine Learning: An Artificial Intelligence Approach. Vol. 2. pp. 193–214. ISBN 9780934613002.
636. Yeh, I-Cheng; Yang, King-Jang; Ting, Tao-Ming (2009). "Knowledge discovery on RFM model using Bernoulli sequence". Expert Systems with Applications. 36 (3): 5866–5871. doi:10.1016/j.eswa.2008.07.018 (https://doi.org/10.1016%2Fj.eswa.2008.07.018).
637. Lee, Wen-Chen; Cheng, Bor-Wen (2011). "An intelligent system for improving performance of blood donation" (http://www.airitilibrary.com/Publication/alDetailedMesh?docid=10220690-201104-201105050019-201105050019-173-185). Journal of Quality Vol. 18 (2): 173.
638. Schmidtmann, Irene, et al. "Evaluation des Krebsregisters NRW Schwerpunkt Record Linkage (http://www.krebsregister-nrw.de/fileadmin/user_upload/dokumente/Evaluation/EKR_NRW_Evaluation_Abschlussbericht_2009-06-11.pdf)." Abschlußbericht vom 11 (2009).
639. Sariyar, Murat; Borg, Andreas; Pommerening, Klaus (2011). "Controlling false match rates in record linkage using extreme value theory". Journal of Biomedical Informatics. 44 (4): 648–654. doi:10.1016/j.jbi.2011.02.008 (https://doi.org/10.1016%2Fj.jbi.2011.02.008). PMID 21352952 (https://pubmed.ncbi.nlm.nih.gov/21352952).
640. Candillier, Laurent, and Vincent Lemaire. "Design and Analysis of the Nomao challenge Active Learning in the Real-World (https://web.archive.org/web/20181206102406/https://pdfs.semanticscholar.org/1647/fc91cfe3e68ef3c41d727b7292ce20482b11.pdf)." Proceedings of the ALRA: Active Learning in Real-world Applications, Workshop ECML-PKDD. 2012.
641. Marquez, Ivan Garrido. "A Domain Adaptation Method for Text Classification based on Self-adjusted Training Approach (http://ccc.inaoep.mx/~mmontesg/tesis%20estudiantes/TesisMaestria-IvanGarrido.pdf)." (2013).
642. Nagesh, Harsha S., Sanjay Goil, and Alok N. Choudhary. "Adaptive Grids for Clustering Massive Data Sets." SDM. 2001.
643. Kuzilek, Jakub, et al. "OU Analyse: analysing at-risk students at The Open University (http://oro.open.ac.uk/42529/1/__userdata_documents4_ctb44_Desktop_analysing-at-risk-students-at-open-university.pdf)." Learning Analytics Review (2015): 1–16.
644. Siemens, George, et al. Open Learning Analytics: an integrated & modularized platform (http://search.ror.unisa.edu.au/record/UNISA_ALMA11143300720001831/media/digital/open/9915909179101831/12143300710001831/13143328550001831/pdf). Diss. Open University Press, 2011.
645. Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno (2015). "A multi-source dataset of urban life in the city of Milan and the Province of Trentino" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4622222). Scientific Data. 2: 150055. Bibcode:2015NatSD...250055B (https://ui.adsabs.harvard.edu/abs/2015NatSD...250055B). doi:10.1038/sdata.2015.55 (https://doi.org/10.1038%2Fsdata.2015.55). ISSN 2052-4463 (https://www.worldcat.org/issn/2052-4463). PMC 4622222 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4622222). PMID 26528394 (https://pubmed.ncbi.nlm.nih.gov/26528394).
646. Vanschoren J, van Rijn JN, Bischl B, Torgo L (2013). "OpenML: networked science in machine learning". SIGKDD Explorations. 15 (2): 49–60. arXiv:1407.7722 (https://arxiv.org/abs/1407.7722). doi:10.1145/2641190.2641198 (https://doi.org/10.1145%2F2641190.2641198). S2CID 4977460 (https://api.semanticscholar.org/CorpusID:4977460).
647. Olson RS, La Cava W, Orzechowski P, Urbanowicz RJ, Moore JH (2017). "PMLB: a large benchmark suite for machine learning evaluation and comparison" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5725843). BioData Mining. 10: 36. arXiv:1703.00512 (https://arxiv.org/abs/1703.00512). Bibcode:2017arXiv170300512O (https://ui.adsabs.harvard.edu/abs/2017arXiv170300512O). doi:10.1186/s13040-017-0154-4 (https://doi.org/10.1186%2Fs13040-017-0154-4). PMC 5725843 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5725843). PMID 29238404 (https://pubmed.ncbi.nlm.nih.gov/29238404).
648. "Off The Shelf Datasets" (https://appen.com/off-the-shelf-datasets/). appen.com. Appen. Retrieved 30 December 2020.
649. "Open Source Datasets" (https://appen.com/resources/datasets/). appen.com. Appen. Retrieved 30 December 2020.