Big Data Analytics For SCM
                                         Benny Tjahjono
              Supply Chain Research Centre, School of Management, Cranfield University
                Cranfield, Bedford MK43 0AL, UK, E-mail: b.tjahjono@cranfield.ac.uk
                                              ABSTRACT
       Big Data Analytics offers vast prospects in today’s business transformation. Whilst big data
       have captured the attention of both practitioners and researchers, especially in the
       financial services and marketing sectors, there are many reasons to believe that big data
       analytics can play an even more crucial role in Supply Chain Management (SCM). This paper
       therefore explores these premises. The investigation ranges from the fundamentals of big
       data analytics, its taxonomy and the level of maturity of big data analytics solutions in
       each of its categories, to implementation issues and best practices. Finally, some examples
       of advanced analytics applications are presented as a way of unveiling some of the
       relatively unexplored territories in big data analytics research.
1. INTRODUCTION
        Major business players who embrace Big Data as a new paradigm are seemingly offered
endless promises of business transformation and operational efficiency improvements. In Supply
Chain Management (SCM) in particular, some examples have captured the attention of both
practitioners and researchers and have hit the headlines in recent news. Amazon uses Big Data to
monitor, track and secure 1.5 billion items in its inventory, which lie in around 200 fulfilment
centres around the world, and then relies on predictive analytics for its ‘anticipatory shipping’ to
predict when a customer will purchase a product and pre-ship it to a depot close to the final
destination (Ritson, 2014). Wal-Mart handles more than a million customer transactions each hour
(Sanders, 2014), imports information into databases that contain more than 2.5 petabytes, and has
asked its suppliers to tag shipments with radio frequency identification (RFID) systems (Feng et al.,
2014), which can generate 100 to 1000 times the data of conventional bar code systems. UPS’s
deployment of telematics in its freight segment helped in the global redesign of its logistics
networks (Davenport and Patil, 2012).
        SCM organisations are inundated with data, so much so that McAfee and Brynjolfsson (2012)
reported that “business collect more data than they know what to do with”. This appears to hold
for firms that are considered benchmarks in warehouse data management, marketing or
transportation. Nonetheless, these cases are not just anecdotes of success; they are the face of a
change in which failure to adapt could mean irrelevance. Hopkins et al. (2010) reported from a
Sloan Management Review survey that top analytics performers outperform their industry peers by
up to three times.
                        6th International Conference on Operations and Supply Chain Management, Bali, 2014
        While most organisations have high expectations of Big Data Analytics (BDA) in their
supply chain, actual use is limited and many firms struggle to unveil its business value
(Pearson et al., 2014). In order to change that situation and help SCM practice capitalise on
BDA, this research aims to close the knowledge gap between data science and the Supply Chain
Management domain, linking data, technology and functional knowledge in BDA applications across
procurement, transportation, warehouse operations and marketing. Specifically, this paper will
(1) redefine, on the basis of previous scientific work, what BDA means in the context of Supply
Chain Management, and how it differs from and has evolved out of previous analytics technologies;
(2) develop a taxonomy of Big Data within SCM that identifies and classifies the different sources
and types of data arising in modern supply chains; and (3) suggest some applications of BDA and
show the potential high value this technology offers in solving complex SCM challenges.
3. RESEARCH METHODOLOGY
        Gimenez (2005) argued that conducting research in SCM through multiple methods ensures
that variances are trait-related rather than method-related, and that each method is better
suited to a particular stage of the research. In order to build a definition of BDA and its
associated list of themes, the first part of the research sought to understand BDA on its own
terms. As in most areas close to Big Data, the meaning of BDA is mainly what people have made of
it. The systematic literature review transformed a broad spectrum of documentation first into a
delimited set of themes, and then into synthesised extracted data. Analysis of the theme structure
resulted in a reasonably comprehensive description of the features of BDA, specifically in the SCM
context, and produced a solid base of knowledge and substantive justification on which to build
subsequent phases of the research.
        Case studies were included in this work to keep practicality at the core. They
investigated contemporary BDA examples, typically in emerging practices, and were thus a
successful way of capturing the latest trends detected in industry. Business cases from the
literature, as well as those reported through semi-structured interviews with consultants at a
major consulting company in the UK, were used. Together, the systematic literature review and the
case studies were used to create a toolset grounded in both academic sources and practical
experience, which proved useful in practice.
      Decision trees, CART and Random Forests, which use a hierarchical sequential structure;
      kernel methods such as Support Vector Machines (SVM, LS-SVM); and neural networks/
      multi-layer perceptrons.
    • Clustering, the most widely used unsupervised learning technique, which includes
      hierarchical, k-means and density-based models.
    • Dimensionality reduction, such as t-distributed stochastic neighbour embedding (t-SNE).
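As a minimal illustration of the clustering family listed above, the following sketch implements plain k-means (Lloyd's algorithm) with the Python standard library only. The function name `kmeans`, the iteration cap and the deterministic seeding from the first k points are assumptions made for brevity; practical work would normally use a library implementation with randomised or k-means++ seeding.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic seeding from the first k points (a simplifying assumption).
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither assignments nor centroids changed
        labels, centroids = new_labels, new_centroids
    return centroids, labels
```

Calling `kmeans(points, k=2)` on two well-separated groups of 2-D points returns their two centroids and a cluster label per point; the result depends on the seeding, which is why randomised restarts are common.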
        Manyika et al. (2011) framed their vision of Big Data as “the next frontier for innovation,
competition, and productivity”. Their definition of Big Data is associated with high computing
power requirements: “Big data refers to datasets whose size is beyond the ability of typical
database software tools to capture, store, manage, and analyse”. The application of advanced
analytics in SCM gave rise to Supply Chain Analytics, a subset of the technologies of the extended
supply chain and the precursor of what is considered BDA in SCM today. Early Supply Chain
Analytics resembled OLAP tools that support multidimensional analysis of data from transactional
databases, allowing summarisation, consolidation and multi-perspective views of data, and enabling
users to measure, monitor, forecast and manage data on SCM business processes (Smith, 2000).
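The OLAP-style summarisation and consolidation mentioned above can be illustrated with a small roll-up over transactional records. This is a sketch only: the record fields (`region`, `product`, `month`, `units`), the sample records and the `roll_up` helper are hypothetical stand-ins for a real OLAP cube, and serve to show how aggregating along different subsets of dimensions yields different perspectives on the same data.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Tuple

# Hypothetical transactional records, as might be extracted from an ERP system.
SALES = [
    {"region": "North", "product": "A", "month": "2014-01", "units": 120},
    {"region": "North", "product": "B", "month": "2014-01", "units": 80},
    {"region": "South", "product": "A", "month": "2014-01", "units": 60},
    {"region": "South", "product": "A", "month": "2014-02", "units": 90},
    {"region": "North", "product": "A", "month": "2014-02", "units": 100},
]


def roll_up(records: Iterable[Mapping[str, Any]], dims: Tuple[str, ...], measure: str) -> Dict[Tuple, float]:
    """Sum `measure` over every combination of values of the chosen dimensions."""
    cube: Dict[Tuple, float] = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dims)  # coordinates of the cell this record falls into
        cube[key] += rec[measure]
    return dict(cube)


# Two perspectives on the same data: by region, and by product and month.
by_region = roll_up(SALES, ("region",), "units")
by_product_month = roll_up(SALES, ("product", "month"), "units")
```

Passing an empty tuple of dimensions consolidates everything into a single grand total, the top of the roll-up hierarchy.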
        The focus on better business processes led some authors, such as Grimes (2000), to
identify Supply Chain Analytics as an enabler of business process reengineering. Marabotti (2003)
added that analytics information must be extracted and presented in a way that supports the end
user. The evolution of Business Intelligence (BI) enabled wider possibilities for data
integration, and Supply Chain Analytics came to target enhanced visibility across the whole supply
chain (Sahay and Ranjan, 2008). Higher processing velocity also made intelligent data mining
methods for extracting more complex patterns much more accessible, and allowed information to be
updated in real time, so that the patterns reflected current as well as past business situations.
         Pearson (2011b) shifted the definition, arguing that the purpose of the analysis should
be “forward-looking” and should assess the impact on “prospective” decisions. O’Dwyer and Renner
(2011) synthesised this shift under the emerging term Advanced Supply Chain Analytics, describing
a new paradigm in which models have to be proactive rather than reactive to data. Waller and
Fawcett (2013) reaffirmed the need to include domain knowledge in the use of analytics. Sanders
(2014) offered a generic definition of BDA without specifically tailoring it to SCM. The evolution
of definitions of Analytics in SCM is summarised in Table 1.
        Until very recently, the concept of Supply Chain Analytics did not appear to cover the
interaction with Big Data technologies. We identify this as a lag between the emergence of new BDA
technologies and their accepted use in SCM. BDA is the natural evolution of data analysis in SCM.
The lack of previous attempts to conceptualise this phenomenon has led us to propose the following
definition, which brings together the general concepts above and answers the research question of
the systematic review.
Finding 1: SCM Big Data Analytics is the process of applying advanced analytics techniques in
           combination with SCM theory to datasets whose volume, velocity or variety require
           information technology tools from the Big Data technology stack, equipping supply
           chain professionals with the ability to continually sense and respond to SCM-relevant
           problems through accurate and timely business insights.
        Each of the three shaded areas contains data sources classified, respectively, as core
transactional data, internal systems data or other data. The frontier of all three areas extends
much further along the variety of formats (horizontally) than along the other two dimensions
(vertically). If model E above is a linear regression, all parameters in vector β are strictly
positive. In practical terms, this indicates a positive correlation between unstructured formats
and larger volumes and higher velocity of information. This proposition is supported by many
practitioners and academics and, although no conclusive quantitative analysis is yet available,
it is commonly considered a rule of thumb that 80% of usable business information is unstructured
(Roberts, 2010).
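Model E is not reproduced in this section, so the following is only a sketch under an explicit assumption: that E regresses a data source's variety score on its volume and velocity scores, y = β0 + β1·x1 + β2·x2. The code fits these coefficients by ordinary least squares (the closed-form normal equations for two regressors on centred data, solved by Cramer's rule) and checks the property stated above, that the slope parameters in β are strictly positive. The function names and the scoring scheme are hypothetical.

```python
from typing import Sequence, Tuple


def fit_two_regressors(x1: Sequence[float], x2: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of y = b0 + b1*x1 + b2*x2 via the normal equations on centred data."""
    n = len(y)
    if len(x1) != n or len(x2) != n or n < 3:
        raise ValueError("need at least three observations of equal length")
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centred sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("regressors are perfectly collinear")
    # Solve [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y] by Cramer's rule.
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2  # the intercept follows from the means
    return b0, b1, b2


def all_slopes_positive(x1: Sequence[float], x2: Sequence[float], y: Sequence[float]) -> bool:
    """True if both slope coefficients (the non-intercept part of beta) are > 0."""
    _, b1, b2 = fit_two_regressors(x1, x2, y)
    return b1 > 0 and b2 > 0
```

With real scores for the 52 sources, a full regression package would also report standard errors, which matter before reading much into the signs.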
        Validation of that trend in SCM has clear implications for the approach to data management
for BDA. Although transactional data in relational databases from systems such as ERP, CRM or
SRM remain the core of internal information and have relatively high volumes, they account for a
relatively small fraction of the total data sources available (8 out of 52 in the taxonomy).
        The high concentration of points at the top right reflects the fact that most customer
interface data platforms lie in this high-volume, unstructured region: social media, online
surveys or mobile location devices. Email data is another example. Although email is now
massively employed as the primary communication and information tool, it is rarely used for
analysis, even though it certainly provides unstructured feedback about experiences with clients
or suppliers (Ordenes et al., 2014).
Finding 2: SCM Big Data sources are commonly generated in unstructured formats that are
           difficult to analyse with traditional IT tools. Whilst data management has focused on
           expanding velocity and volume capabilities for transactional data, the number of core
           transactional data sources is relatively small. There is an asymmetry in SCM data
           sources between the relatively small variation in volume and velocity and the much
           larger variation in data variety, and a positive correlation between unstructured
           formats and high volume/velocity.
the data sources not only for the spend data management process, but also for the entire
procurement function.
        Warehouse management (particularly inventory management) has been radically changed
by modern identification systems following the successful introduction of RFID. Within this
group, the largest clusters of data relate to automated sensing capability, especially as the
Internet of Things has extended sensors, connectivity and intelligence to material handling and
packaging systems applications. Position sensors for on-shelf availability sit alongside
traditional data such as SKU levels and BOMs.
        Operational Research models have been widely applied in transportation analysis for
location, network design and vehicle routing, using origin and destination (OND) data, logistics
network topology or transportation costs as “static” data, as described by Crainic and Laporte
(1997). Newer approaches to real-time management and coordination using operational data rely on
Big Data sources such as mobile and direct sensing of shipments integrated into in-transit
inventory, lead times estimated from traffic conditions, weather variables, real-time marginal
costs for different channels, intelligent transportation systems and crowd-based delivery
networks. A detailed analysis of the 3 Vs showed transportation to be the lever with
proportionally higher data velocity.
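To illustrate the contrast between “static” network data and real-time operational data, the sketch below computes shortest travel times on a small delivery network with Dijkstra's algorithm, then recomputes them after one link's travel time is inflated by a traffic factor. The node names, travel times, the `apply_traffic` helper and its single multiplier are hypothetical; real-time routing in practice would re-estimate many links continuously from telematics and traffic feeds.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, travel minutes)]


def shortest_times(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: minimum travel time from `source` to each reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry: a shorter path was already found
        for nxt, minutes in graph.get(node, []):
            nd = d + minutes
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return dist


def apply_traffic(graph: Graph, edge: Tuple[str, str], traffic_factor: float) -> Graph:
    """Return a copy of the graph with one directed link's travel time scaled."""
    return {
        node: [(nbr, t * traffic_factor if (node, nbr) == edge else t) for nbr, t in links]
        for node, links in graph.items()
    }


# Static network (hypothetical): depot D, hubs H1 and H2, customer C.
network: Graph = {
    "D": [("H1", 10.0), ("H2", 15.0)],
    "H1": [("C", 10.0)],
    "H2": [("C", 8.0)],
}
static = shortest_times(network, "D")  # best route D->H1->C takes 20 minutes
live = shortest_times(apply_traffic(network, ("H1", "C"), 2.5), "D")  # reroutes via H2: 23 minutes
```

The design point is that the routing algorithm is unchanged; only the input data become dynamic, which is where the Big Data sources listed above come in.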
Figure 2. Kamada-Kawai Network of the identified Big Data sources across SCM
        Most data sources appear in the periphery with a high level of symmetry, which suggests
large datasets that bear on only one of the four SCM functions, or at least are much more strongly
associated with one of the four, rather than being utilised across the whole SCM enterprise. Some
data sources can be grouped together, e.g. location information (Clusters 55, 42, 35, 24) between
marketing and transportation, or data from shipment orders (Clusters 17, 30, 16) between
procurement and transportation; but most sources are hosted by a single domain.
        Even in a more favourable scenario, not all sources would need the same degree of
incidence across the whole SCM, as every kind of data has a particular area where it is most
useful. However, the prevalence of datasets anchored to only one area (52%), the fact that in most
of these cases that area is the one where the data was first generated, and our belief that a
large fraction of them would add value in other areas lead us to suspect systemic data silos.
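The share of sources anchored to a single SCM function can be computed directly from a source-to-function mapping of the kind that underlies the network in Figure 2. The sketch below uses a handful of sources and cluster numbers mentioned in the text; the function sets for raw material pricing (44), in-transit inventory (28) and inventory costs (29) are our assumptions, and the subset is illustrative only, so its output should not be read as reproducing the 52% figure for the full taxonomy.

```python
from typing import Dict, FrozenSet, List

FUNCTIONS = frozenset({"marketing", "procurement", "warehouse operations", "transportation"})


def single_function_share(source_functions: Dict[str, FrozenSet[str]]) -> float:
    """Fraction of data sources linked to exactly one of the four SCM functions."""
    if not source_functions:
        raise ValueError("no data sources given")
    for name, funcs in source_functions.items():
        if not funcs or funcs - FUNCTIONS:
            raise ValueError(f"{name}: invalid function set {sorted(funcs)}")
    single = sum(1 for funcs in source_functions.values() if len(funcs) == 1)
    return single / len(source_functions)


def candidate_silos(source_functions: Dict[str, FrozenSet[str]]) -> List[str]:
    """Sources used by only one function: candidates for sharing across the enterprise."""
    return sorted(name for name, funcs in source_functions.items() if len(funcs) == 1)


# Illustrative subset only; the last three function sets are assumptions.
example = {
    "location information (55, 42, 35, 24)": frozenset({"marketing", "transportation"}),
    "shipment orders (17, 30, 16)": frozenset({"procurement", "transportation"}),
    "raw material pricing (44)": frozenset({"procurement"}),
    "in-transit inventory (28)": frozenset({"warehouse operations"}),
    "inventory costs (29)": frozenset({"warehouse operations"}),
}
```

Here `single_function_share(example)` gives 0.6 (three of five sources), and `candidate_silos(example)` lists those three sources.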
        This creates a barrier to BDA implementation, given the importance of aggregating layers of
data from multiple sources in order to enhance the predictive capabilities of such models. As an
example, consider a procurement department that has managerial incentives to apply a BDA model
that monitors raw material pricing (44) in order to predict the best moment to buy at a market
low. If the model does not include in-transit inventory (28) or inventory costs (29) associated
with the final products using the raw materials (information hosted at the warehouse operations
lever in our model, not at procurement), then buying more raw materials when there are already
enough final products in stock, even at low prices, could be suboptimal for the company, creating
higher inventories and pushing costs downstream.
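The procurement example can be made concrete with a deliberately simplified decision rule. The sketch compares a price-only rule, which sees only raw material pricing (44), with one that also uses warehouse-side data on stock cover (including in-transit inventory, 28) and holding costs (29). All class and function names, the linear holding-cost model and the numbers are hypothetical; the point is only that the integrated rule can reverse the price-only decision.

```python
from dataclasses import dataclass


@dataclass
class MarketSignal:
    current_price: float   # raw material price now (procurement data, source 44)
    expected_price: float  # forecast price at the next regular buying window


@dataclass
class WarehouseState:
    weeks_of_cover: float              # stock plus in-transit inventory (28), in weeks of demand
    holding_cost_per_unit_week: float  # inventory holding cost (29)


def buy_now_price_only(m: MarketSignal) -> bool:
    """Buy whenever today's price is below the forecast: procurement data only."""
    return m.current_price < m.expected_price


def buy_now_integrated(m: MarketSignal, w: WarehouseState, quantity: float) -> bool:
    """Buy early only if the price saving exceeds the extra holding cost.

    Simplifying assumption: material bought now waits `weeks_of_cover` weeks
    longer than material bought at the regular buying window.
    """
    saving = (m.expected_price - m.current_price) * quantity
    extra_holding = w.holding_cost_per_unit_week * quantity * w.weeks_of_cover
    return saving > extra_holding


market = MarketSignal(current_price=90.0, expected_price=100.0)
full_stock = WarehouseState(weeks_of_cover=8.0, holding_cost_per_unit_week=1.5)
print(buy_now_price_only(market))                      # True: the price is at a low
print(buy_now_integrated(market, full_stock, 1000.0))  # False: 10,000 saving < 12,000 holding cost
```

With only four weeks of cover the extra holding cost falls to 6,000 and the integrated rule would also recommend buying, so the warehouse data change the decision rather than merely refine it.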
        Research by Dell’Anno and Dukatz (2014) found that leveraging many different data
sources unlocks value by fostering data connections and gaining actionable insights quickly.
Addressing key challenges in the “movement, processing and interactivity” of data would help
organisations achieve the modern data supply chain. Daugherty et al. (2014) reported that only 1
in 5 organisations integrate their data across the enterprise. To improve this situation, they
presented a model of intelligent data transportation throughout the organisation that could help
break down data silos, usually built and owned by a single department, and enable data to flow
freely for the benefit of the whole organisation.
Finding 3: SCM Big Data are made up of large information silos distributed among business
           functions and external sources, which are largely not interconnected and therefore do
           not provide end-to-end visibility of SCM. As a basis for BDA models that generate
           accurate insights valuable to the organisation as a whole, and not only to single
           processes or sub-functions, most organisations must strive to make disparate data
           sources accessible by aggregating their data into a single point of access.
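A minimal sketch of what aggregating disparate sources into a single point of access can mean at the record level: per-function extracts keyed on a shared identifier are merged into one view, with each field tagged by the silo that supplied it. The SKU key, silo names, field names and the `unify` helper are hypothetical; a real implementation would sit on a data warehouse or data lake rather than in-memory dictionaries.

```python
from typing import Any, Dict, Mapping

Record = Dict[str, Any]


def unify(silos: Mapping[str, Mapping[str, Record]]) -> Dict[str, Record]:
    """Merge per-function extracts, keyed by SKU, into one record per SKU.

    Each field is namespaced by its source silo ("silo.field") so that
    provenance is preserved and identically named fields do not collide.
    """
    unified: Dict[str, Record] = {}
    for silo_name, records in silos.items():
        for sku, fields in records.items():
            merged = unified.setdefault(sku, {"sku": sku})
            for field, value in fields.items():
                merged[f"{silo_name}.{field}"] = value
    return unified


# Hypothetical extracts, each held separately by one function.
silos = {
    "procurement": {"SKU-1": {"raw_material_price": 90.0}},
    "warehouse": {"SKU-1": {"in_transit_units": 400, "holding_cost": 1.5}},
    "transportation": {"SKU-1": {"eta_days": 3}, "SKU-2": {"eta_days": 5}},
}
view = unify(silos)  # view["SKU-1"] now combines fields from all three functions
```

Namespacing by silo is one design choice among several; a shared data dictionary that reconciles field meanings across functions would be the more thorough alternative.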
¹ A fuller list can be provided upon request.
SCM lever: Marketing
   Functional problem: Sentiment analysis of new demand trends
   Type of data: Blogs and news feeds, ratings and reputation from third parties, web logs,
      loyalty programs, call centre records, customer surveys
   BDA proposed solution:
      1. Create lexicons from training datasets that identify key terms relating to the demand
         for a product.
      2. Integrate all data sources that relate to a product into a unified text corpus.
      3. Use supervised learning algorithms to predict sentiment scores from the corpus’
         term-document matrix, based on the training datasets.
   BDA techniques: Natural language processing; text mining with the R tm package (Corpus,
      term-document matrix); logistic regression, random forests, CART, Naïve Bayes, k-NN

SCM lever: Procurement
   Functional problem: Informing supplier negotiations
   Type of data: SRM transaction data, supplier current capacity and top customers, supplier
      financial performance information
   BDA proposed solution:
      1. Capture performance requirements for procurement contracts (SLA or other quality
         measures).
      2. Require, or publicly capture, data on the supplier’s previous transactions with other
         third parties with similar characteristics (delivery locations, lead times).
   BDA techniques: Suitable supervised learning algorithms, expert systems modelling

SCM lever: Warehouse Operations
   Functional problem: Warranty analytics
   Type of data: Internet of Things sensing, user demographics, historical asset usage data
   BDA proposed solution:
      1. Aggregate multiple real-time sensing sources with reports on monitored assets,
         together with user demographics.
      2. Aggregate patterns in user and usage clusters in order to generate multidimensional
         segmentations.
   BDA techniques: t-distributed stochastic neighbour embedding (t-SNE)

SCM lever: Transportation
   Functional problem: Real-time route optimisation
   Type of data: Traffic density, weather conditions, transport system constraints, intelligent
      transport systems, GPS-enabled Big Data telematics
   BDA proposed solution:
      1. To address time variability of deliveries in predefined networks, model the delivery
         network and update it with the current position of delivery units.
      2. When new delivery requirements are entered in the system, a spatial regression, taking
         into account all network availability factors, predicts for each delivery unit the
         time/cost of serving a delivery to another point of the network.
   BDA techniques: Spatial regression modelling
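The marketing row of the table above can be sketched end to end in a few lines. The example builds bag-of-words term counts (the per-document rows of a term-document matrix, aggregated per class) and trains a multinomial Naïve Bayes classifier with Laplace smoothing, one of the supervised techniques listed. It is written in Python with the standard library rather than with the R tm package named in the table, the tokeniser is deliberately naive, it predicts a sentiment label rather than a continuous score, and the training texts are hypothetical placeholders for the labelled datasets described in step 1.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; a stand-in for fuller NLP preprocessing."""
    return re.findall(r"[a-z']+", text.lower())


def train_nb(docs: List[Tuple[str, str]]):
    """Fit multinomial Naive Bayes from (text, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    term_counts: Dict[str, Counter] = defaultdict(Counter)  # label -> term frequencies
    for text, label in docs:
        term_counts[label].update(tokenize(text))  # add this document's term counts to its class
    vocab = {t for counts in term_counts.values() for t in counts}
    return label_counts, term_counts, vocab


def predict(model, text: str) -> str:
    """Return the label with the highest log-posterior (Laplace smoothing, alpha = 1)."""
    label_counts, term_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        denom = sum(term_counts[label].values()) + len(vocab)
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(text):
            if tok in vocab:  # ignore terms never seen in training
                score += math.log((term_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [  # hypothetical labelled examples
    ("love this product, great quality", "positive"),
    ("fast delivery and great price", "positive"),
    ("terrible quality, broke after a week", "negative"),
    ("late delivery and poor support", "negative"),
]
model = train_nb(training)
```

For instance, `predict(model, "great quality")` returns `"positive"`, while `predict(model, "poor quality and late")` returns `"negative"`. Swapping in logistic regression or random forests would change only the model-fitting step, not the corpus and term-matrix preparation.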
 7. CONCLUDING REMARKS
          We concur with Waller and Fawcett (2013), who argue, in essence, that previous
 research had not yet properly closed the gap between supply chain functional knowledge, supply
 chain data and BDA techniques. This is why we presented this paper bottom-up, inferring the
 strategic benefits of BDA from an understanding of the data sources present in the supply chain
 and from the application of BDA models to specific problems in SCM. Some of the proposed
 practical applications imply a disruptive shift in certain SCM activities that requires a
 holistic change in strategy. In other cases, however, BDA offers substantial efficiency
 improvements to existing processes with minor modifications, requiring little beyond an
 understanding of problems both functionally, in SCM terms, and analytically, in BDA terms.
          We argue that in order to succeed with Big Data, we need to consider data no longer as an
 information asset but as a strategic asset. By doing so, organisations in SCM could realise the
 economic value inherent in the data and the potential to capitalise on it, when combined with
 BDA, through revenue-generating activities. The evidence presented here indicates that BDA is
 still in its early stages in the supply chain, but the next steps will show the potential of BDA
 through more specific applications in SCM.
8. REFERENCES²
Antai, I. and Olson, H. (2013). Interaction: a new focus for supply chain vs supply chain
     competition, International Journal of Physical Distribution & Logistics Management 43 (7),
     pp. 511-528.
Barratt, M. and Oke, A. (2007). Antecedents of supply chain visibility in retail supply chains: A
     resource-based theory perspective, Journal of Operations Management 25 (6), pp. 1217-1233.
Chae, B., Sheu, C., Yang, C. and Olson, D. (2014). The impact of advanced analytics and data
     accuracy on operational performance: A contingent resource based theory (RBT)
     perspective, Decision Support Systems 59 (1), pp. 119-126.
Christopher, M. (2011). Logistics & Supply Chain Management, 4th Ed., FT Prentice Hall, NY.
Edwards, P., Peters, M. and Sharman, G. (2001). The Effectiveness of Information Systems in
     Supporting the Extended Supply Chain, Journal of Business Logistics 22 (1), pp. 1-27.
Grimes, S. (2000). Here today, gone tomorrow, Intelligent Enterprise 3 (9), pp. 42-48.
Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity and Variety,
     Gartner.
Lustig, I., Dietrich, B., Johnson, C. and Dziekan, C. (2010). The Analytics Journey, Institute for
     Operations Research and the Management Sciences.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C. and Byers, A. H. (2011).
     Big data: The next frontier for innovation, competition, and productivity, McKinsey Global
     Institute, pp. 1-143.
Marabotti, D. (2003). Build supplier metrics, build better product, Quality 42 (2), pp. 40-43.
O’Dwyer, J. and Renner, R. (2011). The Promise of Advanced Supply Chain Analytics, Supply
     Chain Management Review 15 (1), pp. 32-37.
Pearson, M. (2011b). Predictive Analytics: Looking forward to better supply chain decisions,
     Logistics Management 50 (9), p. 22.
Sahay, B. S. and Ranjan, J. (2008). Real time business intelligence in supply chain analytics,
     Information Management & Computer Security 16 (1), pp. 28-48.
Sanders, N. R. (2014). Big Data Driven Supply Chain Management: A Framework for
     Implementing Analytics and Turning Information into Intelligence, 1st Ed., Pearson, NJ.
Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die,
     Wiley Publishing.
Smith, M. (2000). The visible supply chain, Intelligent Enterprise 3 (16), pp. 44-50.
Smith, G. E., Watson, K. J., Baker, W. H. and Pokorski, J. A. (2007). A critical balance:
     collaboration and security in the IT-enabled supply chain, International Journal of
     Production Research 45 (11), pp. 2595-2613.
Waller, M. A. and Fawcett, S. E. (2013). Data Science, Predictive Analytics, and Big Data: A
     Revolution That Will Transform Supply Chain Design and Management, Journal of
     Business Logistics 34 (2), pp. 77-84.
Ward, J. S. and Barker, A. (2013). Undefined By Data: A Survey of Big Data Definitions, School
     of Computer Science, University of St Andrews, UK.
Watson, H. J. (2013a). The Business Case for Analytics, BizEd 12 (3), pp. 49-54.
Zeng, X., Lin, D. and Xu, Q. (2011). Query Performance Tuning in Supply Chain Analytics, 4th
     Int. Conf. on Computational Sciences and Optimization (CSO), p. 327.
² Due to space limitations, other reference items can be provided upon request.