BigData 1
net/publication/378373891
3 authors:
Saad Ahmed
Riphah International University
All content following this page was uploaded by Syeda Alishba Fatima on 22 February 2024.
Abstract—This comprehensive research investigates using distributed and parallel computing for big data analytics in agriculture to improve farming operations' sustainability, efficiency, and innovation. The paper emphasizes how big data analytics, cloud computing, and parallel distributed processing can revolutionize the agricultural industry. The research objectives include investigating the benefits and limitations of big data analytics in precision farming and crop monitoring, identifying the constraints of integrating big data analytics in agriculture, and investigating the role of frameworks such as Hadoop and Spark in processing and analyzing agricultural data for informed decision-making and optimized farming operations. The methodology used in the paper is a literature review, which draws on various sources to provide insights into the subject matter. The findings indicate that big data analytics can considerably improve precision farming and crop monitoring; nevertheless, there are hurdles to incorporating big data analytics in agriculture, such as data privacy and security concerns. According to the study, frameworks such as Hadoop and Spark are crucial in processing and analyzing agricultural data for informed decision-making and better farming operations. Overall, this study offers useful insights into the possibilities of big data analytics and distributed and parallel computing in revolutionizing the agriculture industry.
Index Terms— Big data, analytics, parallelization.
I. INTRODUCTION
THIS paper provides a strong investigation of the transformative possibilities of distributed and parallel computing in agriculture big data analytics [1-4]. It emphasises the importance of harnessing sophisticated technologies to improve farming operations' sustainability, efficiency, and innovation [5-10]. This study intends to address the issues and opportunities in the agriculture sector by diving into the application of big data analytics, cloud computing, and parallel distributed processing [11-16]. It focuses on the critical role of these technologies in precision farming, crop monitoring, yield prediction, and forecasting, as well as the impact of frameworks like Hadoop and Spark in processing and analysing agricultural data for informed decision-making and optimised farming operations [17-22].
A. Volume
Sensors, satellite imagery, weather stations, and other sources generate massive amounts of data, and big data analytics is utilised in agriculture to manage this data [23-26]. This includes data on crop condition, weather patterns, soil condition, and other topics. Volume emphasises the massive amount of data required for analytics. This feature emphasises the importance of handling huge data quantities that are beyond the capacity of typical systems.
B. Velocity
In the agriculture industry, the rate at which data is created and processed is crucial. The second V in the field of big data analytics is the rate at which information is generated, disseminated, and analysed. It is critical to obtain information from multiple sources in a timely manner, but it is also critical to put that information to use as soon as feasible. This speed is essential for quick decision-making because delayed processing may prevent firms from keeping up with the volume of data. Farmers can make timely irrigation, pest management, and harvesting decisions by using real-time data from IoT devices, weather sensors, and machinery [27-29].
C. Variety
Agricultural data comes in a variety of formats, including semi-structured data from Internet of Things devices, unstructured data from satellite photos, and structured data from databases. By examining this diversity of data, it is possible to anticipate crop yields, detect disease outbreaks, and enhance planting schedules. Variety emphasises the various forms and sources from which big data can be collected. In contrast
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
to typical databases, which only contain orderly tables, big data can contain a variety of unstructured formats, such as information from social media, sensor data, and other sources. This is troublesome since arranging the data for processing may be difficult [30-32].
D. Veracity
Data quality is critical in agriculture to ensure reliable decision-making. Addressing issues with data accuracy, consistency, and completeness is required to ensure the veracity of agricultural data. Farmers can gain more reliable insights from higher-quality data analysis. The reliability and correctness of vast amounts of data are referred to as data veracity. When resolving problems, it is critical to ensure that the data is reliable so that judgments may be made with confidence. Because of the numerous duplicates, irregularities, and inconsistencies observed in big data, the data must be examined and processed before use to improve the accuracy of the analysis [33].
E. Value
The ultimate goal of big data analytics in agriculture is to produce relevant insights that benefit farmers. This includes reducing environmental impact, increasing crop yields, and allocating resources as efficiently as feasible. Social media and the internet have had an impact on the modern age, resulting in a large increase in digital data, or "big data." This data is generated by several kinds of platforms, such as wearable trackers, mobile devices, and sensors. The magnitude of big data poses significant challenges to the capability of existing storage, processing, and analysis technologies. To overcome these difficulties, it is vital to constantly develop new models, languages, systems, and algorithms for efficiently gathering, categorising, analysing, and learning from huge data [34].
II. LITERATURE REVIEW
In the application of big data in agriculture, five major phases can be distinguished: data collection, data transformation, data storage, data analysis, and data marketing. The availability of free Big Data analysis tools encourages the growth of smart farming research. The new agricultural applications promise to enhance food production and security by empowering farmers to employ efficient agricultural methods with appropriate recommendations that are environmentally friendly. When Precision Agriculture (PA) is implemented, the agriculture industry generates massive amounts of diverse data. Data collection includes soil parameters, seeding rates, and crop yields, in addition to historical records on weather patterns, terrain, and crop performance [35].
The Internet of Things and social media platforms have generated an unprecedented amount of digital data, which is expanding at an unprecedented rate in the modern period [3]. Big Data refers to information that comes from a variety of sources, including security cameras, wearable trackers, mobile devices, and sensors. The sheer volume of Big Data, on the other hand, poses enormous difficulties to conventional processing, storage, and analytical capabilities. To properly obtain, preserve, assess, and learn from Big Data, new models, languages, systems, and procedures must be generated on a regular basis.
A. Big Data's Present Status in Agriculture
Big Data in agriculture is gaining traction as more people grasp how data-driven technology can alter farming processes, improve decision-making, and allow farmers to produce more food on less land. Big data is currently employed in a variety of areas, including business, healthcare, and agriculture, and it is rapidly expanding across all of these markets. Because of people's reliance on big data to remain ahead of the curve and keep their products and services up to date with developing trends, the big data market is also rapidly expanding. The volume of data also has a major impact on the conclusions drawn from the data and on their validity [4].
B. Big Data's Accuracy in the Agricultural Sector
To assure the accuracy of big data analysis in agriculture, an emphasis on data quality, integration, precision agriculture practices, technology infrastructure, and the dependability of decision support systems is required. Taking care of these challenges can increase the precision and efficacy of big data use in agriculture. Low data volumes will diminish the model's accuracy, implying that predictions will be less accurate and may result in incorrect assumptions. The three basic Big Data dimensions (Velocity, Volume, and Variety) are covered in this section [4]. In the diagram below, we explore how the accuracy of the data is affected when any of the three variables is altered.
In agriculture, parallelization is required to manage data scalability, allow for real-time review, process data effectively, optimise resource utilisation, and improve decision-making for lucrative and sustainable farming operations. Parallelization is critical in data analytics for increasing productivity and lowering costs. This method involves breaking the problem down into smaller pieces and running them concurrently on numerous cores, threads, or processors. This reduces calculation time and allows for the processing of larger datasets. This strategy works effectively when dealing with time-sensitive systems or real-time data.
C. Advantages of Parallelization for Agricultural Matters: Time Reduction and Cost Effectiveness
Parallelization allows computational workloads to be divided into smaller, parallelizable subtasks. This significantly reduces the amount of time required to complete complex computational activities such as running simulations, analysing enormous datasets, and optimising crop management strategies. Parallelization enhances resource utilisation, which reduces agricultural costs. By speeding up computational procedures, farmers and researchers can make better use of computing resources, reduce operational costs, and boost overall output. Parallelization has the potential to substantially reduce the amount of time necessary to manage datasets by allowing smaller actions to be completed in parallel. This accelerates processing and is valuable in instances where quick insights from real-time data are required.
Employing cloud computing to analyse big data concerning
agriculture [24]
Cloud computing provides powerful data storage and management capabilities, advanced analytics and services, and scalable, accessible, and low-cost computational resources, making big data analytics easier to deploy in agriculture. Agricultural stakeholders may benefit from this link by better utilising big data to develop innovative farming practices, manage resources more effectively, and make better decisions. Security, processing, storing, and analysing big data present a variety of challenges. Cloud computing, which offers scalable and cost-effective solutions, can help to tackle these issues and aids in the management of massive amounts of data [5]. Cloud computing enables quick access to a shared resource pool, which includes storage and applications [6]. It enables multitasking systems such as MapReduce, Hadoop, and Bigtable for big data applications. Cloud computing successfully manages and analyses data by utilising cutting-edge techniques such as virtualization, indexing, and mining. Furthermore, authentication and encryption are used to preserve data privacy and security. The link between big data and cloud computing can significantly cut infrastructure costs and simplify data processing, allowing for faster analysis.
D. Big Data's Uncertainty in Agriculture
Big data applications in agriculture are fraught with unknowns and problems that must be overcome before they can be implemented successfully. Data quality and dependability, security and privacy concerns, data integration and interoperability, technology infrastructure and accessibility, ethical and social implications, and regulatory and legislative frameworks are among the challenges surrounding big data in agriculture. A comprehensive approach that considers data governance, technology innovation, stakeholder participation, and ethical considerations is required to address these challenges. By proactively removing these impediments, the agriculture industry may be able to successfully leverage the benefits of big data while reducing associated uncertainties.
Uncertainty pervades every step of the process due to a variety of issues such as idea variance, limited data gathering, and the complexity introduced by multimodal data. Particular situations of uncertainty were highlighted, such as a large number of missing links in social networks and missing values for temporal data elements. According to their forecast, 80 percent of the world's data would be uncertain by 2015. When biases, incompleteness, or erroneous sampling pollute training data, uncertainty's negative influence on analytical outputs is increased. To solve these issues, a range of approaches such as probability theory, fuzziness, fuzzy logic, Bayesian theory, and belief function theory are presented. Each of the aforementioned approaches provides a specific strategy for dealing with uncertainty, which reflects the state of the art [7].
E. Big Data's Implications for Agriculture
Big data applications in agriculture offer a wide range of use cases that draw on massive volumes of heterogeneous data to increase sustainability, efficiency, and innovation in agricultural operations. Precision farming, crop monitoring, yield prediction and forecasting, supply chain optimization, soil condition and nutrient management, market price intelligence, disease and pest management, and other applications are some of the key uses of big data in agriculture. These examples demonstrate how big data is altering agriculture in a variety of ways, allowing stakeholders to make informed decisions, allocate resources wisely, and boost productivity while fostering resilience and sustainability in the agricultural business.
F. Intelligent Transportation System
Intelligent Transportation Systems (ITS) are critical in agriculture for boosting operational effectiveness, upgrading transportation and logistics systems, and fostering sustainable farming methods. By leveraging cutting-edge technologies and data-driven insights, ITS help to optimise transportation operations, improve supply chain management, and connect transportation with precision agriculture practices. Big data techniques have radically changed the transportation business by allowing massive amounts of data produced by a number of sources, such as GPS, sensors, and monitoring devices, to be collected and analysed. The development of these algorithms seeks to improve the efficiency of the intelligent transportation system and provide data for traffic control. Big data in ITS has driven the development of increasingly complex models, in step with the rise of deep learning algorithms [8].
III. METHODOLOGY
The following research aims are highlighted in this comprehensive review paper:
1. To investigate the use of distributed and parallel computing for big data analytics in agriculture.
2. To investigate the potential benefits and drawbacks of using big data analytics in agriculture.
3. To investigate the role of several frameworks, such as Hadoop and Spark, in the processing and analysis of agricultural data.
4. To emphasise the importance of big data analytics, cloud computing, and parallel distributed processing in transforming agriculture and improving operational results and decision-making.
A. Key Features of the Parallel Architecture
The core of the design is its simplicity and independence from other file systems. It does not use static partitioning like standard configurations and instead partitions data dynamically during runtime. Each processing node works separately, handling a distinct data partition. This independence encourages balanced parallel processing across nodes, which is critical for speed optimization.
B. Node Interaction and Master Processing
The design takes a decentralised approach, with numerous processing nodes operating concurrently. Each node processes its allocated data partition separately, promoting parallelism in data computation. The partial findings from each node are then pooled at a master processing node. This master node is critical in aggregating scattered calculations and finalising overall
results. This technique guarantees good coordination and synchronisation of parallel activities, which contributes to the architecture's overall efficiency.
C. Hadoop and Spark Frameworks for Agricultural Data Processing
The Hadoop framework, an open-source behemoth, and the adaptable Spark engine serve as the foundation of agricultural data processing. HDFS, YARN, and MapReduce are key components that contribute to scalable and effective data processing.
J. Agricultural Data Processing Using YARN and Spark
Integrating Spark with YARN for parallelism provides a powerful approach for improving agricultural data processing and analysis. This section looks at how to use YARN's resource management with Spark's cluster computing technologies for parallel and distributed processing, allowing for complex machine learning algorithms and real-time decision assistance. By deploying Spark in agriculture, organisations can use Spark's distributed computing capabilities to better understand agricultural data, allocate resources more efficiently, and make data-driven decisions that will increase production and sustainability in the industry.
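As a rough single-machine analogue of the parallel and distributed processing that Spark on YARN performs across a cluster, the following Python sketch splits a set of readings into partitions, summarises each partition independently, and combines the partial results in one final step. It is a minimal illustration under stated assumptions, not the architecture of any cited system: the function names and the soil-moisture values are hypothetical, and threads stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into up to n_parts slices at run time (round-robin)."""
    parts = [records[i::n_parts] for i in range(n_parts)]
    return [p for p in parts if p]  # drop empty slices


def partial_stats(part):
    """Worker step: summarise one partition without seeing the others."""
    return sum(part), len(part)


def mean_by_partition(records, n_workers=4):
    """Combine step: pool the partial (sum, count) pairs into one mean."""
    if not records:
        raise ValueError("no records to process")
    parts = partition(records, n_workers)
    # Threads are an assumption made for portability; on a real cluster
    # each partition would be handled by a separate node or executor.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical soil-moisture readings (percent) from six field sensors.
readings = [21.5, 19.0, 23.5, 20.0, 22.0, 18.0]
print(round(mean_by_partition(readings), 2))  # 20.67
```

Because each worker returns only a (sum, count) pair, the final step stays cheap however large the partitions are, which is the reason for pooling partial results rather than raw data.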
D. The Impact of Hadoop on Agricultural Innovation
This section delves into Hadoop's critical role in modern agriculture, examining how the architecture supplies critical computational resources. Key benefits for agricultural stakeholders include improved data analysis, real-time decision support, and the incorporation of cutting-edge technology.
K. Spark Executor Configuration and Dynamic Allocation
This section describes the deployment of Spark applications, with a focus on the creation of Application Masters and Spark executors. It looks into configuration settings such as the number of executors, the number of cores per executor, and the amount of memory per executor, stressing appropriate parallelism and dynamic allocation for better resource usage.
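The executor settings mentioned in this part of the paper correspond to standard `spark-submit` options and Spark configuration properties. The sketch below, which assumes a YARN cluster in cluster deploy mode, only assembles the command line so that the two alternatives (a fixed number of executors versus dynamic allocation) can be compared side by side. The application file name `yield_forecast.py` and all sizes are placeholders chosen for illustration, not recommendations from the paper.

```python
def spark_submit_command(app, executor_cores, executor_memory,
                         num_executors=None, dynamic_allocation=False,
                         min_executors=1, max_executors=10):
    """Build a spark-submit argument list for a YARN cluster."""
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--executor-cores", str(executor_cores),   # cores per executor
        "--executor-memory", executor_memory,      # memory per executor
    ]
    if dynamic_allocation:
        # Let Spark grow and shrink the executor pool with the workload.
        conf = {
            "spark.dynamicAllocation.enabled": "true",
            "spark.dynamicAllocation.minExecutors": str(min_executors),
            "spark.dynamicAllocation.maxExecutors": str(max_executors),
            "spark.shuffle.service.enabled": "true",
        }
        for key, value in conf.items():
            cmd += ["--conf", f"{key}={value}"]
    else:
        if num_executors is None:
            raise ValueError("num_executors is required without dynamic allocation")
        cmd += ["--num-executors", str(num_executors)]  # fixed pool size
    cmd.append(app)
    return cmd


# Placeholder application and sizes, for illustration only.
fixed = spark_submit_command("yield_forecast.py", 2, "4g", num_executors=4)
elastic = spark_submit_command("yield_forecast.py", 2, "4g",
                               dynamic_allocation=True, max_executors=20)
print(" ".join(fixed))
print(" ".join(elastic))
```

A fixed pool is simpler to reason about for a scheduled batch job, whereas dynamic allocation may suit workloads whose volume varies, such as seasonal sensor traffic.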
E. Uncovering Hadoop's Potential: Storage and Processing
This section delves into Hadoop's dual functioning as a storage and processing solution. We deconstruct how Hadoop successfully handles and analyses huge data in the agricultural setting, from the distributed structure of the Hadoop file system to its scalability.
F. MapReduce: The Foundation of Agricultural Data Processing
This section investigates the popularity of MapReduce in parallel programming, focusing on its scalability, simplicity, throughput, and fault tolerance. MapReduce remains a preferred open-source framework for large-scale data processing in agriculture in both cloud computing settings and commodity clusters.
G. Online MapReduce Real-Time Analysis
This section focuses on the capabilities of MapReduce Online for online agricultural data analysis, utilising a customised version of Hadoop MapReduce. Its adaptability and aptitude for dynamic data processing, which distinguish it from the original model, are critical to its importance in the agricultural industry.
H. YARN for Agriculture: Cluster Resource Management
This section examines how YARN, as a key component of Hadoop's MapReduce 2.0 framework, alters cluster design. The elimination of bottlenecks and the flexibility to scale beyond 4000 nodes make YARN critical for providing computational capability to the agriculture business.
I. YARN in Action: Resource Management Optimization
This section discusses how YARN maintains optimal use by delving into its core role of efficiently allocating resources such as CPU and memory across diverse applications. The relationship between Application Master and Resource Manager, as well as the role of NodeManagers, is critical for improved agricultural resource management.
L. MapReduce Optimization Techniques for Agricultural Data
This section discusses optimization techniques in MapReduce, including data locality optimization, partitioning, combiner functions, and job parallelism. It stresses the efficiency attained by MapReduce Online, which allows for continuous and interactive queries on streaming data for better resource utilisation in agriculture.
M. Pig: Using High-Level Scripting to Simplify Agricultural Data Analysis
This section examines the merits of Pig, a high-level scripting language built on Hadoop MapReduce, for agricultural data processing. Pig's simplified scripting language makes it easier to express complex data transformations and analytics procedures, making it a vital tool for the agriculture business.
N. Agricultural Data Processing Map-Reduce Execution Model
This section delves into the Map-Reduce execution model, explaining how Pig Latin is translated into MapReduce tasks that are run on a Hadoop cluster. It highlights MapReduce's parallel and distributed processing capabilities, making it a viable method for managing enormous agricultural datasets and generating significant insights.
O. Agricultural Data Processing Scalability and Efficiency
This section emphasises how the discussed technologies, by addressing scalability and efficiency, jointly contribute to boosting resource efficiency, sustainability, and production in farming. The combination of YARN and Spark, optimization techniques in MapReduce, and the use of Pig pave the way for productive and insightful data analysis in the agriculture business.
Moreover, a wide range of tools and algorithms for creating prediction models, clustering analysis, and recommendation
systems are available in Spark's machine learning library (MLlib). These tools and algorithms can be used for agricultural use cases like disease detection, crop yield prediction, and resource optimization. Additionally, Spark's GraphX library's graph processing capabilities make it possible to analyse intricate agricultural networks, including supply chains, logistics for transportation, and social connections among farming communities. Organisations can use
Apache Storm
Precision agriculture, agricultural yield prediction, and weather forecasting can all benefit from Apache Storm's distributed real-time data processing capabilities. Storm's real-time handling of high-velocity data streams enables agricultural enterprises to monitor and respond to changing field conditions. Because of its support for machine learning libraries and complex event processing, predictive algorithms for agricultural applications can be constructed. Storm's fault-tolerant architecture, which provides continuous data processing and analysis, enables increased availability and reliability for agricultural data systems. By applying Apache Storm to the agriculture industry, organisations can improve efficiency and long-term profitability. This enables the study of real-time agricultural data, the efficient allocation of resources, and the formulation of data-driven decisions.
P. Real-time Stream Processing Paradigm
Apache Storm is specifically designed for processing real-time, streaming data [20]. It enables the processing of continuous data streams with low-latency requirements, making it suitable for applications that demand real-time analytics.
Q. Scalability
Storm is horizontally scalable, which means that more nodes may be added to the cluster to manage increasing demands. Because of its scalability, it is well-suited to handling massive amounts of data and addressing the demands of Big Data applications. In the event of a node loss, Apache Storm provides built-in fault tolerance by shifting processing jobs to other nodes. This ensures that the calculation remains uninterrupted and contributes to the system's reliability. While Apache Spark is commonly used for batch and micro-batch processing, Apache Storm is a better solution for applications that require low-latency and continuous streaming data processing. The decision between Spark and Storm is frequently determined by the use case's specific requirements and the nature of the data processing burden.
R. Comparison of Apache Storm and Spark in Agricultural Domain:
Parameter | Apache Storm | Apache Spark
Processing Model | Real-time stream processing | Batch and real-time processing
Use Case | Ideal for real-time data processing in streams | Suitable for both batch and real-time processing
Latency | Low latency, suitable for time-sensitive tasks | Higher latency than Storm for real-time processing (micro-batch model)
Ease of Use | More complex, designed for stream processing | Simpler to use with a unified batch and stream processing model
Programming Language Support | Primarily Java | Supports multiple languages (Java, Scala, Python)
Fault Tolerance | Strong fault tolerance through message acking | Fault tolerance through lineage information and data replication
Scalability | Scales well for high-throughput streams | Highly scalable, suitable for large-scale data processing
Data Processing | Tuple-based processing model | Resilient Distributed Datasets (RDDs) and DataFrames
Integration with Ecosystem | Integration with Hadoop and various data sources | Tight integration with Hadoop, extensive ecosystem support
Batch Processing | Limited support, not the primary focus | Strong support for batch processing
Ease of Deployment | Requires more manual setup and configuration | Easier deployment with built-in cluster management (via Spark Standalone, YARN, or Mesos)
Community Support | Active community support | Large and active open-source community support
Use in Agriculture | Suitable for monitoring and reacting to real-time events in precision agriculture, weather monitoring, and sensor data | Well-suited for analysing large datasets in agriculture, including crop yield prediction, soil analysis, and farm management
S. NoSQL Databases in Agriculture: Trade-off Analysis
NoSQL databases such as MongoDB, CouchDB, and HBase often favour availability and partition tolerance over strict consistency, addressing issues associated with maintaining varied agricultural data sets. Despite not having the data integrity of relational databases, they allow for excellent pre-
processing and analysis, allowing machine learning techniques to be used for smart agricultural decision-making.
T. Knowledge of the CAP Theorem in Distributed Systems
The CAP theorem describes the trade-offs in distributed systems between consistency, availability, and partition tolerance. Exploring these characteristics clarifies the trade-offs involved in developing solutions for effective data management in agriculture.
U. Using NoSQL Databases to Accelerate Data Processing
Because of their distributed and horizontally scalable nature, NoSQL databases expedite data processing. Their ability to efficiently handle enormous volumes of data, particularly in real-time circumstances, distinguishes them as valuable instruments for fast data analysis in agricultural applications.
The Role of Kafka in Agricultural Data Flow Management
Kafka, a versatile platform, improves agricultural data flow efficiency across precision farming, smart agriculture, and agri-tech applications. Its fault-tolerant, scalable, and high-throughput qualities make it well-suited for a variety of activities such as sensor data collection, equipment telemetry, supply chain management, and real-time environmental monitoring.
Parallelism and Efficiency in Apache Kafka Data Pipelines
Kafka's distributed streaming infrastructure excels at parallelism by partitioning pipelines for high throughput and fault tolerance. This section looks at how Kafka keeps message order within partitions, which helps with both speed and sequence preservation in data processing.
V. Kafka Fault Tolerance: Ensuring Uninterrupted Processing
Parallelism in Kafka contributes to its fault-tolerance characteristics. This section describes how Kafka automatically reassigns partitions in the event of consumer failures within a group, ensuring continuous data processing and avoiding single points of failure.
W. Kafka Streams: Taking Advantage of Parallelism in Stream Processing
Kafka Streams is a high-level stream processing toolkit that extends Kafka's partitioning mechanism to allow for parallel processing across several instances. This section describes how Kafka Streams helps agricultural stream processing applications be robust and scalable.
IV. CONCLUSION
Our review study provides a thorough examination of the use of distributed and parallel computing for big data analytics in the agriculture industry. It investigated the role of different frameworks, such as Hadoop and Spark, in processing and analysing agricultural data in order to promote informed decision-making and optimise farming operations. However, the paper acknowledges the limitations and obstacles connected with the integration of big data analytics in agriculture, such as data quality, infrastructure, and privacy concerns. Overall, our paper has shown how big data analytics, cloud computing, and parallel distributed processing have the potential to alter the agriculture industry and improve operational performance and decision-making.
A. Limitations
Despite the potential benefits of using big data analytics in agriculture, there are significant constraints and problems involved with its implementation. One drawback is a lack of uniformity in data collection and management, which can result in discrepancies and mistakes in agricultural data analysis. Another constraint is the high cost of establishing and maintaining big data analytics infrastructure, which may be too expensive for small-scale farms. Concerns have also been raised concerning data privacy and security, as well as the possibility of unexpected outcomes such as greater environmental impact. These constraints and problems must be addressed in order to fully realise the potential of big data analytics in agriculture.
B. Future Work
There are various areas for future research in the application of distributed and parallel computing for big data analytics in agriculture. One area of future research will be to solve the issues of data quality, privacy, and security in agricultural data. Another area of future work will be to investigate new prospects for innovation and sustainability in farming systems utilising big data analytics. Furthermore, additional study is required to optimise the performance of various frameworks such as Hadoop and Spark in processing and analysing agricultural data.
REFERENCES
[1] M. Y. Sokiyna, M. J. Aqel, and O. A. Naqshbandi, “Cloud computing technology algorithms capabilities in managing and processing big data in business organizations: Mapreduce, hadoop, parallel programming,” Journal of Information Technology Management, vol. 12, no. 3, pp. 100–113, 2020.
[2] T. Hussain, A. Sanga, and S. Mongia, “Big data hadoop tools and technologies: A review,” in Proceedings of International Conference on Advancements in Computing & Management (ICACM), 2019.
[3] Y. Zhang, J. Ren, J. Liu, C. Xu, H. Guo, and Y. Liu, “A survey on emerging computing paradigms for big data,” Chinese Journal of Electronics, vol. 26, no. 1, pp. 1–12, 2017.
[4] H. Su, “How Accurate are Predictions Made Using Big Data?,” in 2022 7th International Conference on Social Sciences and Economic Development (ICSSED 2022), Atlantis Press, 2022, pp. 806–810.
[5] H. Elazhary, “Cloud computing for big data,” MAGNT Res Rep, vol. 2, no. 4, pp. 135–144, 2014.
[6] D. Agrawal, S. Das, and A. El Abbadi, “Big data and cloud computing: current state and future opportunities,” in Proceedings of the 14th international conference on extending database technology, 2011, pp. 530–533.
[7] R. H. Hariri, E. M. Fredericks, and K. M. Bowers, “Uncertainty in big data analytics: survey, opportunities, and challenges,” J Big Data, vol. 6, no. 1, pp. 1–16, 2019.
[8] S. Kaffash, A. T. Nguyen, and J. Zhu, “Big data algorithms and applications in intelligent transportation system: A review and bibliometric analysis,” Int J Prod Econ, vol. 231, p. 107868, 2021.
[9] Z. H. Munim, M. Dushenko, V. J. Jimenez, M. H. Shakil, and M. Imset, “Big data and artificial intelligence in the maritime industry: a bibliometric review and future research directions,” Maritime Policy & Management, vol. 47, no. 5, pp. 577–597, 2020.
[10] C. Ordonez, S. T. Al-Amin, and X. Zhou, “A simple low cost parallel architecture for big data analytics,” in 2020 IEEE International Conference on Big Data (Big Data), IEEE, 2020, pp. 2827–2832.
[11] L. Belcastro, R. Cantini, F. Marozzo, A. Orsino, D. Talia, and P. Trunfio, [24] D. Waga and K. Rabah, "Environmental Conditions’ Big Data
“Programming big data analysis: principles and solutions,” J Big Data, Management and Cloud Computing Analytics for Sustainable
vol. 9, no. 1, pp. 1–50, 2022. Agriculture," *World Journal of Computer Application and Technology*,
[12] S. Zeebaree, H. Shukur, L. Haji, R. Zebari, K. Jacksi, and S. Abass, vol. 3, no. 2, pp. 73-81, 2014.
“Characteristics and Analysis of Hadoop Distributed Systems,” [25] Abawajy, J. (2015). Comprehensive analysis of big data variety
Technology Reports of Kansai University, vol. 62, pp. 1555–1564, Apr. landscape. International journal of parallel, emergent and distributed
2020. systems, 30(1), 5-14
[13] P. Natesan, V. E. Sathishkumar, S. K. Mathivanan, M. Venkatasen, P. [26] Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J. M.,
Jayagopal, and S. M. Allayear, “A Distributed Framework for Predictive Kulkarni, S., ... & Murthy, R. (2014). Storm@ twitter. In Proceedings of
Analytics Using Big Data and MapReduce Parallel Programming,” Math the 2014 ACM SIGMOD international conference on Management of
Probl Eng, vol. 2023, 2023. data (pp. 147-156). - Taylor, J. R., & Kumar, L. (2016). Big data analytics
[14] R. K. Verma, S. Singh, and Y. Mohan, “Importance Of Big Data And in agriculture: A review. Computers and Electronics in Agriculture, 123,
Cloud Computing Techniques In Modern Scenario,” J Algebr Stat, vol. 389-398
13, no. 2, pp. 1024–1043, 2022. [27] Shankarnarayan, V.K. and Ramakrishna, H. "Paradigm change in Indian
[15] J.-Y. Kim and J. Kim, “Optimized data processing analysis using big data agricultural practices using Big Data: Challenges and opportunities from
cloud platform,” Journal of Knowledge Information Technology and field to plate." Information Processing in Agriculture, 7(3), pp.355-368.
Systems (JKITS), vol. 16, no. 1, pp. 1–7, 2021. [28] Yadav, R., Rathod, J. and Nair, V. "Big data meets small sensors in
[16] A. P. D. R. Hassan and J. N. Hasoon, “Big Data Techniques: A Survey,” precision agriculture." International Journal of Computer Applications,
Iraqi Journal of Information Technology Vol, vol. 9, no. 4, p. 2018, 2019. 975, p.8887.
[17] S. C. #1 and Z. Ansari, “Apache Pig-A Data Flow Framework Based on [29] Anjanamma, C. and Rao, N.S. "An internet of things (IoT) system
Hadoop Map Reduce,” International Journal of Engineering Trends and development and implementation of data analytics in agriculture
Technology, vol. 50, 2017, Accessed: Dec. 21, 2023. [Online]. Available: production safety enhancement." Materials Today: Proceedings.
http://www.ijettjournal.org [30] Kaur, R., Garg, R. and Aggarwal, H. "Big data analytics framework to
[18] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, “Pig latin: identify crop disease and recommendation a solution." 2016 International
a not-so-foreign language for data processing,” in Proceedings of the 2008 Conference on Inventive Computation Technologies (ICICT) (Vol. 2, pp.
ACM SIGMOD international conference on Management of data, 2008, 1-5). IEEE.
pp. 1099–1110. [31] Rajeswari, S., Suthendran, K. and Rajakumar, K. "A smart agricultural
[19] N. Deshai, S. Venkataramana, B. Sekhar, K. Srinivas, and G. P. Saradhi model by integrating IoT, mobile and cloud-based big data analytics."
Varma, “A Study on Big Data Processing Frameworks: Spark and Storm,” 2017 international conference on intelligent computing and control (I2C2)
in Smart Intelligent Computing and Applications: Proceedings of the (pp. 1-5). IEEE.
Third International Conference on Smart Computing and Informatics, [32] Priya, R., Ramesh, D. and Khosla, E. "Crop prediction on the region belts
Volume 2, Springer, 2020, pp. 415–424. of India: a Naïve Bayes MapReduce precision agricultural model." 2018
[20] R. Myung, H. Yu, and D. Lee, “Optimizing parallelism of big data international conference on advances in computing, communications and
analytics at distributed computing system,” Int J Adv Sci Eng Inf Technol, informatics (ICACCI) (pp. 99-104). IEEE.
vol. 7, no. 5, pp. 1716–1721, 2017. [33] Bhosale, S.V., Thombare, R.A., Dhemey, P.G. and Chaudhari, A.N.
[21] A. Ali, S. Naeem, S. Anam, and M. M. Ahmed, “A state of art survey for "Crop yield prediction using data analytics and hybrid approach." 2018
Big Data processing and NoSQL database architecture,” International Fourth International Conference on Computing Communication Control
Journal of Computing and Digital Systems, vol. 14, no. 1, p. 1, 2023. and Automation (ICCUBEA) (pp. 1-5). IEEE. ,[object Object],
[22] B. Leang, S. Ean, G.-A. Ryu, and K.-H. Yoo, “Improvement of Kafka [34] Ip, R.H., Ang, L.M., Seng, K.P., Broster, J.C. and Pratley, J.E. "Big data
streaming using partition and multi-threading in big data environment,” and machine learning for crop protection." Computers and Electronics in
Sensors, vol. 19, no. 1, p. 134, 2019. Agriculture, 151, pp.376-383.
[23] Á. B. Hernández, M. S. Perez, S. Gupta, and V. Muntés-Mulero, “Using [35] "International Journal of Scientific Research in Engineering and
machine learning to optimize parallelism in big data applications,” Future Management (IJSREM) Volume: 06 Issue: 04 | April - 2022.
Generation Computer Systems, vol. 86, pp. 1076–1092, 2018, doi:
https://doi.org/10.1016/j.future.2017.07.003.