ARTICLE INFORMATION
Article title
A Novel Dataset of Date Fruit for Inspection and Classification
Authors
Abdul Khalique Maitlo*, Riaz Ahmed Shaikh, Rafaqat Hussain Arain
Affiliations
Shah Abdul Latif University
Keywords
Machine Learning, Deep Learning, Classification, Agriculture Science.
Abstract
Date fruit grading and inspection is a challenging and crucial process in the industry. Grading
requires skilled and experienced labour; unfortunately, labour turnover in date processing industries
is increasing daily, and the quality of date fruit is compromised as a result. This has led to increased
fruit wastage and unstable fruit prices. Deep learning algorithms have recently attracted the attention
of agricultural researchers, and the classification and sorting of date fruit has become a prominent
research problem. Classifying and grading date fruit requires a clean, well-organized dataset. This
article presents a novel dataset of 3,004 images covering four date fruit varieties. The images are
graded by size (large, medium, and small) and by external quality (Grade-1, Grade-2, and Grade-3),
and are stored in eighteen folders organized according to the industry's requirements. This dataset
will contribute to developing intelligent systems for date fruit grading and inspection, adding value to
the sustainable economic growth of fruit processing industries and farmers locally and internationally.
SPECIFICATIONS TABLE
 Subject                Computer Science, Agricultural Science, Machine Learning
 Specific subject area  Computer vision, image processing, embedded systems
 Data format            Raw images in JPG format
 Type of data           Image (JPG)
 Data collection        The dataset was collected in a controlled environment using a ring light
                        with a 26 cm outer diameter and a colour temperature of 3200K-5600K. A
                        Huawei 6ys camera with 13 megapixels was used to acquire images of
                        different date fruit varieties. Date fruit samples were collected from the
                        local industry of Khairpur Mir's and segregated with the help of industry
                        experts and local farmers.
 Data source location   Date fruits were collected from the local industry of Khairpur Mirs and
                        from farmers of Village Pir Bux Maitlo, Sindh, Pakistan.
 Data accessibility     Repository name: Mendeley Data
                        Data identification number: 10.17632/s5zfvsw5kv.1
                        Direct URL to data: https://data.mendeley.com/datasets/s5zfvsw5kv/1
VALUE OF THE DATA
 •   Date fruit processing is one of the most challenging and time-consuming jobs in the industry.
     Machine learning-based models trained on this dataset can help sort and classify date fruit.
 •   Automated fruit processing systems can reduce food waste, support sustainable growth, and
     increase quality production, which helps stabilize farmers economically.
 •   The dataset can help researchers develop novel systems for date fruit processing. Researchers
     can also combine it with other date fruit datasets, such as the datasets of other varieties
     published by other authors [1][2], to improve variety classification.
DATA DESCRIPTION
The dataset contains images of four date fruit varieties and has been organized according to the
international market standard [3]. The Date fruit folder comprises four variety-based sub-folders
(Aseel, Fasli Toto, Gajar, and Kupro). Each variety folder is divided into size-based sub-folders
(Large, Medium, and Small), except for the Kupro variety, which contains only large-size data.
Moreover, each size-based category is further divided by grade (Grade-1, Grade-2, and Grade-3).
The Aseel and Fasli Toto folders contain Large, Medium, and Small sub-folders, each of which holds a
Grade-1 folder. The Gajar folder contains three size sub-folders (Large, Medium, and Small), each of
which contains three grade folders (Grade-1, Grade-2, and Grade-3). Samples of the collected dataset
are shown in Fig. 1, Fig. 2, Fig. 3, and Fig. 4.
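The folder hierarchy described above can be enumerated with a short Python sketch. The root folder name `Date fruit` and the folder spellings are taken from the text and may differ slightly from the archive on Mendeley Data; `LAYOUT` and `leaf_folders` are hypothetical names used only for illustration. The sketch shows that the variety/size/grade combinations yield the eighteen leaf folders mentioned in the abstract.

```python
from pathlib import PurePosixPath

# Size and grade sub-folders present for each variety, as described in the text.
# Folder names are assumed from the text; check them against the downloaded archive.
LAYOUT = {
    "Aseel": (["Large", "Medium", "Small"], ["Grade-1"]),
    "Fasli Toto": (["Large", "Medium", "Small"], ["Grade-1"]),
    "Gajar": (["Large", "Medium", "Small"], ["Grade-1", "Grade-2", "Grade-3"]),
    "Kupro": (["Large"], ["Grade-1", "Grade-2", "Grade-3"]),
}


def leaf_folders(root="Date fruit"):
    """Return every variety/size/grade leaf folder under the (assumed) root."""
    return [
        PurePosixPath(root, variety, size, grade)
        for variety, (sizes, grades) in LAYOUT.items()
        for size in sizes
        for grade in grades
    ]


if __name__ == "__main__":
    folders = leaf_folders()
    for folder in folders:
        print(folder)
    print(len(folders), "leaf folders")
```

Running the script prints the 18 expected paths; on a local copy of the dataset, the same list could be checked with `pathlib.Path(...).exists()` to spot missing or renamed folders.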
Fig. 1 Sample of Aseel Date Fruit (image grid: Grade-1 in Large, Medium, and Small sizes)
Fig. 2 Sample of Fasli Toto Date Fruit (image grid: Grade-1 in Large, Medium, and Small sizes)
Fig. 3 Sample of Gajar Date Fruit (image grid: Grade-1, Grade-2, and Grade-3 in Large, Medium, and Small sizes)
Fig. 4 Sample of Kupro Date Fruit (image grid: Large size in Grade-1, Grade-2, and Grade-3)
EXPERIMENTAL DESIGN, MATERIALS AND METHODS
The dataset was acquired in a predefined environment using a white ring light with a 26 cm outer
diameter and a colour temperature of 3200K-5600K. A Huawei smartphone with a 13-megapixel camera
was used for image acquisition. After capture, the images of the different date fruit varieties were
stored in separate folders to meet the requirements of the international export market standard [3].
The date fruit samples were collected from the local industry of Khairpur Mir's and from date fruit
farmers called Bekher. The sample size of each variety varied according to availability: 500 Aseel,
550 Fasli Toto, 700 Gajar, and 500 Kupro date fruits. Experts from the local industry and farmers
examined the collected samples and sorted the date fruits according to standard parameters such as
size, colour, texture, and shape.
Once the date fruits had been shortlisted for dataset preparation, images of the selected varieties
were acquired from a top view, with a distance of 25.4 cm between the camera and the object. The
captured images were verified manually, and blurred and poor-quality images were removed.
The filtered images were then preprocessed: images were cropped, and the object was extracted with a
Python script using the OpenCV library. The Canny edge detector was applied with thresholds of 10 and
250 to identify the object's edges. A 7 × 7 morphological kernel was created with a structuring
element, and the detected edges and the kernel were passed to a morphological opening operation to
extract the object's shape. Once the object had been extracted from the raw image, a new image was
written with OpenCV's imwrite function and stored in the respective folder. Overall, 3,004 images were
stored in the dataset. Image width and height vary because of the size-based grading. Table 1,
Table 2, and Table 3 give the number of images in each category.
Table 1
Grade-1 date fruit detail (number of images).
 Date fruit variety    Large    Medium    Small    Subtotal
 Aseel                 155      168       171      494
 Fasli Toto            153      153       84       390
 Gajar                 175      201       200      576
 Kupro                 324      0         0        324
                                          Total    1784
Table 2
Grade-2 date fruit detail (number of images).
 Date fruit variety    Large    Medium    Small    Subtotal
 Gajar                 120      62        141      323
 Kupro                 370      0         0        370
                                          Total    693
Table 3
Grade-3 date fruit detail (number of images).
 Date fruit variety    Large    Medium    Small    Subtotal
 Gajar                 122      80        209      411
 Kupro                 116      0         0        116
                                          Total    527
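The table totals can be cross-checked with a small Python sketch that stores the counts from Tables 1 to 3 in a nested dictionary. `COUNTS` and the helper functions are hypothetical names; the zero-valued Kupro Medium and Small cells are omitted because no such folders exist. The sketch recomputes the per-grade totals, the per-variety totals, the grand total of 3,004 images, and the number of non-empty variety/size/grade cells.

```python
# Image counts per grade, variety, and size, copied from Tables 1 to 3.
# Zero cells (Kupro Medium and Small) are left out because those folders do not exist.
COUNTS = {
    "Grade-1": {
        "Aseel": {"Large": 155, "Medium": 168, "Small": 171},
        "Fasli Toto": {"Large": 153, "Medium": 153, "Small": 84},
        "Gajar": {"Large": 175, "Medium": 201, "Small": 200},
        "Kupro": {"Large": 324},
    },
    "Grade-2": {
        "Gajar": {"Large": 120, "Medium": 62, "Small": 141},
        "Kupro": {"Large": 370},
    },
    "Grade-3": {
        "Gajar": {"Large": 122, "Medium": 80, "Small": 209},
        "Kupro": {"Large": 116},
    },
}


def grade_totals(counts):
    """Sum the images in each grade (the 'Total' row of each table)."""
    return {
        grade: sum(sum(sizes.values()) for sizes in varieties.values())
        for grade, varieties in counts.items()
    }


def variety_totals(counts):
    """Sum the images of each variety across all grades and sizes."""
    totals = {}
    for varieties in counts.values():
        for variety, sizes in varieties.items():
            totals[variety] = totals.get(variety, 0) + sum(sizes.values())
    return totals


def non_empty_cells(counts):
    """Count the (grade, variety, size) cells that hold at least one image."""
    return sum(
        1
        for varieties in counts.values()
        for sizes in varieties.values()
        for n in sizes.values()
        if n > 0
    )


if __name__ == "__main__":
    print(grade_totals(COUNTS))  # {'Grade-1': 1784, 'Grade-2': 693, 'Grade-3': 527}
    print(variety_totals(COUNTS))  # {'Aseel': 494, 'Fasli Toto': 390, 'Gajar': 1310, 'Kupro': 810}
    print(sum(grade_totals(COUNTS).values()), non_empty_cells(COUNTS))  # 3004 18
```

The eighteen non-empty cells match the eighteen leaf folders, so each table cell corresponds to exactly one folder of images.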
LIMITATIONS
Not applicable.
ETHICS STATEMENT
No funding was received from any agency or organization; therefore, there is no conflict of interest.
This work did not involve any experiments on humans or animals.
CRediT AUTHOR STATEMENT
Abdul Khalique: Methodology, Data curation, Writing, Investigation, Validation. Riaz Ahmed Shaikh:
Formal analysis, Project administration. Rafaqat Hussain Arain: Drafting, Data collection.
ACKNOWLEDGEMENTS
This research received no specific grant from funding agencies in the public, commercial, or not-for-
profit sectors.
DECLARATION OF COMPETING INTERESTS
The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.
REFERENCES
[1]       D. B. Pérez-Pérez, R. Salomón-Torres, and J. P. García-Vázquez, “Dataset for localization and
          classification of Medjool dates in digital images,” Data Br., vol. 36, 2021, doi:
          10.1016/j.dib.2021.107116.
[2]       H. Altaheri, M. Alsulaiman, G. Muhammad, S. U. Amin, M. Bencherif, and M. Mekhtiche,
          “Date fruit dataset for intelligent harvesting,” Data Br., vol. 26, p. 104514, 2019, doi:
          10.1016/j.dib.2019.104514.
[3]       Codex Alimentarius Commission, “Standard for Dates, CXS 143-1985,” adopted 1985, amended 2019.