Comminution and Flotation Models

Minerals Engineering

Keywords: Automated Machine Learning; Simulation and Modelling; Comminution; Flotation; Neural Networks

Abstract

Deep learning (DL), a subset of machine learning (ML), has been a popular research interest after obtaining remarkable achievements on various tasks like image classification, object detection, language processing, and artificial intelligence. However, the successes of these algorithms have been highly dependent on human expertise for hyperparameter optimisation and data preparation. As a result, widespread application of DL systems in minerals processing is still absent despite the increasing ability to collect process information (PI) and assay data. Automated Machine Learning (AutoML) is an emerging area of research which aims to automate the development of ready-to-use end-to-end ML models with little to no user ML knowledge. However, existing commercially available AutoML algorithms are not well designed for minerals processing data.
In this study, we develop an AutoML algorithm that builds steady-state minerals processing models suitable for mine scheduling and process optimisation. The algorithm consists of data filtering, temporal resolution alignment, feature selection, neural network architecture search, and model development. The AutoML algorithm was tested
on three case studies of different processes and ore types. These case studies cover the range of difficulties of
possible datasets encountered in the mining and processing industry from clean simulated data to noisy data with
poor correlation. The algorithm successfully developed neural network models within hours from hourly raw PI
and/or daily assay data with no human intervention. These models derived from process data have minimal
errors as low as < 3 % for major valuables like Ni and Cu, 6–7 % for by-products like Au, 8–10 % for deleterious
minerals like MgO, and 5–8 % for gangue.
The algorithm was also designed so that expert minerals processing knowledge can influence the pipeline to
improve the quality of models. As a result, the AutoML algorithm becomes a powerful tool for mining and
mineral processing experts to apply their domain knowledge of the process to develop models of equipment or
processing circuits.
1. Introduction

One of the uses of mathematical modelling of units in minerals processing is for mine scheduling or plant life optimisation to maximise efficiency over the life of mine. These unit models can be divided into three categories:

i) Physics-based models. These are purely derived from first principles and require complete understanding of the phenomena occurring. If every possible characteristic of the unit operation can be measured and characterised, then physics-based models are feasible.
ii) Phenomenological models. These contain some physical bounds (like conservation of mass or energy) where understood, but phenomena which are not understood or difficult to measure are calibrated through empirical constants.
iii) Data-driven models. The input–output relationships of these models are derived purely based on their correlation and distribution.

Wherever possible, i) and ii) are preferred as they are bound by physics and their predictions are interpretable. However, physics-based models are uncommon in industrial use because not every physical characteristic of unit operations can be constantly measured. Therefore, phenomenological models are the typical choice of model, and empirical constants are used in place of physical properties which are not easily measurable. Despite this, calibration of these models requires extensive experimental design and pilot studies to fully understand the effect and response of each parameter. On the other hand, some prefer data-driven models because they are quicker to develop from readily available historical data. Data-driven models can be accurate when predicting within the training data bounds but have poor extrapolation capabilities. As a result, data-driven models are only useful for brownfield operations, whereas physics-based and phenomenological models are more suitable for greenfield operations or when investigating operating conditions outside of available historical data.

For models to be suitable for process optimisation and mine scheduling, these models must not contain input variables that are only measurable upon processing or are directly dependent on the outputs. For example, mill power draw cannot be used as an input to predict throughput because the mill power draw would not be measurable until the ore is processed by the mill. Sadat et al. (2018) proposed a neural network model to predict SAG power from industrial data but included mill load cell mass as an input. This model would not be useful for mine scheduling because mill load cell mass is only measurable after the ore has been processed. A more useful approach by Amini et al. (2021) utilised typical operating conditions and ore characteristics as inputs to predict mill load and consequently power draw.

Machine learning (ML) models developed from historical sensor and assay data from operations are an emerging form of data-driven models in minerals processing. Deep Learning (DL) is a subset of ML which specifically uses deep neural networks with more than one hidden layer. Examples of ML regression models in minerals processing include thin-section microscopy (Koh et al. 2021b), flotation (Nakhaei and Irannajad, 2013), comminution (Baek and Choi, 2019), and hydrocyclone performance (Mohanty et al. 2019), among many others (McCoy and Auret, 2019; Aldrich et al. 2010). Most published research applying DL in minerals processing utilises relatively small neural networks due to data limitations, as they are developed from laboratory data (typically <100 samples) (McCoy and Auret, 2019). In comparison, historical Process Information (PI) data and assay data are collected in modern plants in the order of thousands of samples. Datasets of this size are adequate for deeper models that can offer increased accuracy over traditional statistical models.

Development of reliable and accurate ML models in minerals processing requires two types of expertise: 1) knowledge of ML and computer science to develop and deploy ML models, and 2) a comprehensive understanding of the underlying process and the data collected. It is a rare occurrence for engineers in the mining industry to have both. Without the proper precautions, the flexibility of ML models can fit the training data well but yield poor performance on unseen test data. This phenomenon is known as overfitting and is common in ML models that are not developed correctly. Recently, Automated Machine Learning (AutoML) algorithms have been proposed to automate ML model selection and optimisation through search algorithms, removing the requirement for expert knowledge in ML and computer science (He et al., 2021). However, these off-the-shelf AutoML algorithms still lack suitable data preparation steps for minerals processing, like temporal resolution alignment, and do not allow expert process knowledge to influence the results. Considering the wide availability of historical data collected in plants, there is demand for the development of an AutoML algorithm to help minerals processing engineers predict the behaviour of the process plant.

This study aims to develop an Automated Machine Learning (AutoML) algorithm that provides the capability of producing end-to-end neural network regression models with steps designed specifically for minerals processing data collected from operations. The scope of this AutoML pipeline is to focus on steady-state simulation of the mine value-chain process for:

i) mine scheduling (selecting the order of blocks processed by the chain or blending stockpiles) (Amini et al., 2020),
ii) set-point optimisation of processing for a block model, changing target p80 or blasting conditions for increased throughput,
iii) adding units to the value chain, e.g., adding a new high pressure grinding rolls unit.

The AutoML algorithm was designed to be used together with the Integrated Extraction Simulator (IES) (Stange et al., 2014), which is a cloud-based flowsheet simulator of the mine value chain from blasting to flotation. For example, most sulphide concentrators have well-calibrated comminution circuits and blast models but require development of an empirical model for their flotation circuit from historical data to optimise the entire value chain. The simulation capabilities of IES are further compounded with:

i) multi-component modelling,
ii) the ability to develop and connect multiple phenomenological, physics-based, and empirical models together in the same flowsheet,
iii) an in-line non-linear optimiser, KNITRO (Byrd et al., 2006), which incorporates constraints along the value chain, and
iv) mass simulation capabilities by exploiting cloud-based computing to evaluate many simulations (Amini et al., 2020).

For example, Amini et al. (2020) combined a Semi-Autogenous Grinding (SAG) Mill Power ML model with the M435 Variable Rates SAG Mill model by JKTech in IES to perform 14 million simulations with actual value chain constraints for mine scheduling. Other commercial packages like GE Digital's Proficy CSense (GE Digital, 2022) also advertise developing AutoML from process data, but there is no published literature on the underlying algorithm. AutoML is a broad research area and there are many different possible steps taken to develop a variety of ML models (He et al., 2021). In contrast, this paper outlines all the steps in the AutoML algorithm based on literature recommendations. Although the AutoML algorithm was developed for use with IES, the software uses open-source libraries and can be implemented into any program.

The algorithm expands on Koh et al. (2021a), which utilised an evolutionary algorithm to search for a surrogate comminution model. In this study, additional components like data filtering, temporal resolution, and feature selection were added to the AutoML algorithm. The application of the AutoML algorithm is presented through three case studies of different minerals processing sites with different ore types. The AutoML algorithm can produce data-driven models from PI and assay data without ML and computer science expertise, just with knowledge of the process itself. By developing an AutoML algorithm for minerals processing, we will achieve the following objectives:

i) Provide a tool for plant metallurgists to develop accurate neural network regression models without in-depth programming and ML knowledge,
ii) Reduce model development time compared to physics-based and even manually developed ML models, typically within hours from data to model,
iii) Enable operations to create predictive models of unit operations or circuits and simulate the entire minerals processing plant along with existing phenomenological models.

Although the authors understand there are benefits of incorporating dynamic models for long-term planning, one of the objectives is to use the developed ML models with existing phenomenological models for simulation, and the latter are typically steady-state models. Dealing with dynamic process and time-series data has its own challenges, requires different data curation rules and model types, and would be investigated in a future study.

This section first introduces the neural network hyperparameters specifically used in regression modelling and the precautions needed to design the optimal model without overfitting. Then, the components in a typical AutoML algorithm are briefly introduced with a focus on their context for minerals processing data. Finally, algorithms like feature selection and evolutionary algorithms which minimise overfitting in the AutoML literature are presented.

1.1. Neural networks for regression

The simplest form of a DL model for steady-state regression is the fully connected feedforward neural network. There are several hyperparameter groups that are considered when optimising this type of neural network:

i) Number of nodes in each hidden layer and number of hidden layers. These influence the flexibility and predictive capability of the neural network. The optimum size depends on the complexity of the underlying input–output correlations that need to be captured given the number of observations available. This is typically optimised through trial and error using the validation data set (the validation set is a pseudo-test set used during training to optimise hyperparameters).
ii) Activation function(s). The non-linear activation functions are the key to the neural network's ability to approximate any non-linear input–output mapping (Hornik, 1989). The most used activation functions for regression are the sigmoid and the hyperbolic tangent.
iii) Choice of input features in the input layer. Inclusion of insignificant inputs/features can introduce noise or overfit the training set. Determining the optimal set of features is known as feature selection.
iv) Learning rate and optimizer choice. The optimizer and learning rate determine how the neural network weights are updated from the backpropagated error gradients during training. These affect the speed of convergence and model accuracy.
v) Regularization methods. Regularization is a collection of methods used during training to penalize unnecessary complexity to minimise overfitting.

The hyperparameter choices are interdependent, and the vast search space can be overwhelming for an inexperienced machine learning user trying to develop an optimal neural network model. Therefore, AutoML research aims to utilise algorithms to automate the selection of these hyperparameters according to the dataset size and to minimise overfitting.
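To make these hyperparameter groups concrete, the sketch below assembles one candidate regression network with Keras (the library used in Section 2.1). The layer widths, learning rate, and early-stopping patience are illustrative placeholders, not the values searched by the pipeline.

```python
from tensorflow import keras

def build_candidate(n_inputs, hidden_layers=(12, 8), learning_rate=0.03):
    """One fully connected feedforward regression candidate.

    Each argument maps to a hyperparameter group from Section 1.1:
    hidden_layers -> i) depth/width, tanh activation -> ii),
    learning_rate/optimizer -> iv).
    """
    layers = [keras.layers.Dense(hidden_layers[0], activation="tanh",
                                 input_shape=(n_inputs,))]
    layers += [keras.layers.Dense(w, activation="tanh") for w in hidden_layers[1:]]
    layers.append(keras.layers.Dense(1))  # linear output node for regression
    model = keras.Sequential(layers)
    # SGD with Nesterov momentum, matching the training strategy in Section 2.2.5
    opt = keras.optimizers.SGD(learning_rate=learning_rate,
                               momentum=0.9, nesterov=True)
    model.compile(optimizer=opt, loss="mse")
    return model

# v) regularisation via early-stopping on the validation set
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=50,
                                           restore_best_weights=True)
```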
1.2. AutoML steps for minerals processing

An overview of the steps in AutoML algorithms can be found in the literature review by He et al. (2021). In this study, the AutoML algorithm is focused on developing neural networks with a dense feed-forward architecture for minerals processing steady-state regression models. Therefore, some steps can be omitted, and the algorithm does not need to search for model types beyond this. A brief explanation of each relevant step is provided here along with its significance to minerals processing data, but more information can be found in the original review paper.

Data Collection. One of the main difficulties of deep learning is obtaining a high-quality labelled dataset. In minerals processing, this has limited the use of deep neural networks, as most neural networks developed in the literature are shallow and built for pilot-scale or laboratory experiments which have <100 experimental conditions (McCoy and Auret, 2019). Most modern plants collect process data through sensors into process historians at various sampling periods between seconds (1 s, 10 s, 60 s, …) and hours. These can then be aggregated to a single time resolution depending on the algorithm and data. In this study, the process data from the process historian is aggregated and provided at an hourly resolution. The appropriate selection among the various algorithms available to aggregate the multiple time periods is outside the scope of this study. Next, there is also process assay data at various sampling points in the process, collected per shift or daily. In this study, the assay data is provided at a daily resolution.

Data Cleaning. Sensors can degrade over time and lose accuracy due to poor maintenance. Routine events like process start-up or shut-down for maintenance or failures can also cause the process to run sub-optimally, yielding outliers in the dataset. Expert knowledge about the process and knowing typical operating conditions can benefit in filtering the dataset.

Data Augmentation. In minerals processing, there is a temporal resolution step to combine process data and assay data to the same resolution for modelling. In this pipeline, the algorithm aggregates the hourly process data and daily assay data to a daily resolution whilst incorporating filtering rules. In general, for multiple datasets to be used together, all datasets must be resolved to the dataset with the lowest temporal resolution.

Feature Selection. As mentioned in Section 1.1, optimal selection of input features is important to reduce overfitting. Redundant features that are highly correlated can introduce noise and yield poor predictions on unseen data. Including features which have no impact on the phenomena also renders the model unusable when one or more features are not available. This is relevant in minerals processing PI data, where sensors are prone to failure and can be highly correlated or redundant. For example, in a flotation circuit, an acid addition flowrate could be contradicting a pH sensor reading. Although the acid addition flowrate would provide insight to the process, ultimately the pH is the determining factor for flotation performance. Therefore, feature selection can be done:

i) manually through knowledge of the process by eliminating redundant features,
ii) with linear filter algorithms like principal component analysis (PCA) or a correlation matrix, or
iii) with wrapper algorithms like Backwards Elimination or Forward Selection.

In this study, the Backwards Elimination wrapper method is used. This is described in detail in Section 1.3.

Feature Construction. This refers to the method of building new input variables based on prior knowledge to improve machine learning model performance. For example, dosages of flotation reagents can be normalised by the throughput, which is more meaningful for flotation modelling. Although neural networks inherently perform feature construction with each hidden layer operation, these learned feature constructions are not interpretable compared to hand-crafted features. By constructing features using domain knowledge, it is possible to reduce the network size and improve generalisation performance. Expert minerals processing users can use their domain knowledge to construct features prior to feeding the data into the AutoML algorithm. This is not explicitly done in the AutoML algorithm, as searching through basic mathematical combinations (addition, subtraction, multiplication, and division) between pairs of variables does not scale well.
Neural Architecture Search. In this study, the chosen algorithm for neural architecture search is an evolutionary algorithm. This algorithm searches for the optimal hyperparameters and architecture through model evaluation on the validation set. The validation set is a pseudo-test set used during training to evaluate the model performance on unseen data. However, since the validation set is used during training, it is not a true test set. Therefore, the dataset is typically divided into a training, validation, and test set. Proper ML practice, including regularisation methods to prevent overfitting during model development, is key to any AutoML algorithm. In this pipeline, early-stopping is used (Goodfellow et al. 2016; Bishop, 2006).

1.3. Backward elimination feature selection with wrapper

The two main methods used for feature selection are the wrapper and filter methods (Kohavi and John, 1997). The filter method analyses the correlation or similarities of input features through the correlation matrix or entropies and removes highly correlated features. In the wrapper method, the score of each combination of features is obtained by training and evaluating the model performance on the validation set (Alpaydin, 2014). Wrapper methods tend to give better results at the cost of increased computation (Sánchez-Maroño, 2007). In this study, the wrapper method is used for two main reasons. Firstly, there is a possibility of non-linear correlations between features which are not captured by the correlation matrix. Secondly, Guyon and Elisseeff (2003) show with a simple example that features can have high correlation with each other but can still be influential towards output prediction. Also, the increased computational complexity is negligible with the recent improvements in hardware.

The two most used feature selection wrapper methods are backward elimination and forward selection (Guyon and Elisseeff, 2003). The backward elimination algorithm is chosen over forward selection because feature interactions are accounted for when sequentially removing features from a full set, compared to sequentially adding from an empty set (Chowdhury and Turin, 2020). A sketch of this procedure follows.
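The sketch below is a compact backward-elimination wrapper, assuming a generic `score(features)` function that trains a model and returns the validation MSE; the paper's actual pseudocode is in its Appendix 8.1.

```python
def backward_elimination(main_inputs, free_inputs, score):
    """Greedy backward elimination over the free inputs.

    `score(features)` trains a model on the given feature list and
    returns its validation MSE (lower is better). Main inputs are
    never removed, mirroring the variable labels in Section 2.2.1.
    """
    selected = list(free_inputs)
    best_mse = score(main_inputs + selected)  # benchmark with the full set
    improved = True
    while improved and selected:
        improved = False
        # Try removing each remaining free input in turn
        trials = {f: score(main_inputs + [g for g in selected if g != f])
                  for f in selected}
        candidate, trial_mse = min(trials.items(), key=lambda kv: kv[1])
        if trial_mse < best_mse:  # removal helped: drop it and continue
            selected.remove(candidate)
            best_mse = trial_mse
            improved = True
    return main_inputs + selected, best_mse
```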
1.4. Evolutionary algorithm for hyperparameter optimisation

The Evolutionary Algorithm (EA) is a common metaheuristic optimization algorithm inspired by biological evolution. It is suitable for non-continuous and non-differentiable objective functions. As a result, it is a popular choice of algorithm for optimising hyperparameters in a neural architecture search (He et al., 2021). There are other hyperparameter optimization algorithms for deep neural networks, like Bayesian Optimisation and Grid Search, but a study by Alibrahim & Ludwig (2021) showed that the Evolutionary Algorithm always outperformed the others. The algorithm used in this study is largely influenced by the study published by Real et al. (2017), as it was successfully demonstrated to have comprehensive search capabilities even for deep neural network architectures.

2. Methods

This section summarises the methodology used in this study. Firstly, the software and hardware used in this study are detailed for reproducibility. Then, details of the algorithms in the AutoML pipeline developed in this study are explained. Finally, a summary of the data collected from the three minerals processing plants is provided.

2.1. Software and hardware

The software in this study makes use of Python 3.7.9 64-bit, Keras 2.3.1 (Chollet, 2015), and the Tensorflow 2.0.0 Central Processing Unit (CPU) backend (Abadi et al. 2016). The CPU backend is used because the networks in this study do not benefit from Graphical Processing Unit (GPU) hardware acceleration. GPUs are typically beneficial for convolutional operations and matrix inputs with much deeper networks (greater than 20 layers). The hardware used for this study was a computer with an Intel i7-9700 CPU.
2.2. AutoML pipeline

The AutoML pipeline was developed for use with IES, which is a cloud-based minerals processing flowsheet simulator (Stange et al., 2014). Therefore, the steps in the algorithm are explained in the context of the simulator and cloud-based implementations for multi-tenant end-user benefits. However, the underlying methods can also be implemented in on-premise software for single-tenant users. The major components in the developed AutoML pipeline are summarised in Fig. 1.

The process and/or assay data is provided by the site personnel in a cloud storage system in the Data Upload step. For the scope of this study, we assume that the process data is already aggregated into an hourly resolution through the process historian and the assay data is provided at a daily resolution. Although the proposed algorithm can execute with datasets as small as 30 days, a suitable dataset size depends on the use case of the model and whether the dataset covers the operating conditions/ore variability required for prediction. The expert minerals processing user provides the Configuration to the AutoML pipeline. Then, the hourly PI data are resolved to the same resolution as the assay data in the Temporal Resolution step. Next, the data are filtered according to the limits set by the user in the Data Filtering step. Following this step, a data cleaning report is generated and provided to the user, who decides whether the pipeline proceeds with the filtered data or returns to modify the data configuration. If approved, the pipeline splits the filtered data into training, validation, and test datasets through stratified sampling in the Data Sampling step. The appropriate input features are trialled and selected in the Feature Selection step. Once the features are chosen, the optimal neural network architecture for each output is searched in the Evolutionary Algorithm step. After the hyperparameters are finalised, the model is trained and evaluated in the Model Evaluation step. In this step, the model is also compared against a simple partial least squares regression to ensure the pipeline is executed correctly. With the report, the user decides whether to accept or reject the developed model. If accepted, the built model is translated to fundamental mathematical operations for deployment in the Model Deployment step and published as an available Model for Use on IES, which can be used alongside other equipment models. If rejected, the user returns to the beginning of the pipeline to modify the configuration. The following section provides greater detail of each of these steps.
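The step sequence above can be summarised as a thin orchestration loop. The sketch below is illustrative only; the function names mirror the step names in the text and are hypothetical stand-ins, not the actual IES interfaces.

```python
def run_automl(hourly_pi, daily_assay, outputs, steps):
    """Illustrative skeleton of the pipeline in Fig. 1.

    `steps` is any object bundling the stage implementations;
    the attribute names are placeholders for this sketch.
    """
    daily = steps.temporal_resolution(hourly_pi, daily_assay)      # Section 2.2.2
    daily, report = steps.data_filtering(daily)                    # Section 2.2.3
    if not steps.user_approves(report):
        return None  # the user revises the configuration and restarts
    train, val, test = steps.data_sampling(daily)                  # Section 2.2.4
    features = steps.feature_selection(train, val)                 # Section 2.2.5
    models = [steps.evolutionary_search(train, val, features, y)   # Section 2.2.6
              for y in outputs]                                    # one model per output
    evaluation = steps.model_evaluation(models, train, val, test)  # Section 2.2.7
    return steps.deploy(models) if steps.user_approves(evaluation) else None  # 2.2.8
```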
2.2.1. Configuration
With most commercially available AutoML algorithms, the user has very little interaction with the algorithm besides uploading the data and selecting the machine learning task. In this AutoML algorithm, the expert user can provide information to influence the data filtering, feature selection, and data sampling steps. The user is required to label each variable in the uploaded data as one of the following:

i) Main input. The user is certain that this variable directly affects the output, e.g., the throughput of a comminution circuit to predict power draw. Therefore, data rows where this variable is not available will be removed. This variable cannot be removed during feature selection. The feature selection algorithm does not yield a unique set of inputs, especially when there are highly correlated inputs. By allowing the expert user to label variables, they can preserve inputs that have relevance to the process and are critical for model understanding.
ii) Free input. This variable may or may not correlate with the output. These input variables will be evaluated during feature selection.
iii) Redundant input. This variable is not relevant for modelling the output or is highly correlated with another variable. For example, if there are two comminution circuits and both throughputs and the total throughput are available, only the total throughput is used to predict flotation performance. These variables are immediately removed in the filtering step.
iv) Output. Data rows where this variable is not available will be removed.

The user can also provide appropriate limits for each variable for data filtering. Depending on their label, values outside these limits are either deleted (if they are main inputs or outputs) or replaced with means (free inputs). By providing variable limits, the user chooses whether to include or exclude process anomalies like start-up/shut-down periods. If none are provided, the limits are calculated as 3 standard deviations from the mean. However, this methodology assumes a normal distribution, which may not hold. Finally, the user can provide the variable to use for stratified sampling, which is explained in Section 2.2.4.

These configurations can greatly affect the quality of models developed by the AutoML pipeline. It is also crucial that the user has knowledge about the process and potential changes in the plant that can cause process drift. For example, liner wear can greatly affect SAG mill performance (Toor et al., 2015), and simply including the entire historical dataset might not be optimal for predicting SAG mill performance. This can be addressed by breaking the dataset into maintenance cycles or simply including a new variable "number of days since maintenance". A brief feedback report about the data is also generated after the Data Filtering step so that the user can investigate the data filtering results and decide whether to proceed or not.

2.2.2. Temporal resolution
The temporal resolution step is necessary if data at multiple temporal resolutions are present. The discussion for the rest of this study assumes the assay data have daily resolution if provided. Firstly, the algorithm checks whether each date is present in both the daily and hourly datasets, as sometimes the provided daily and hourly datasets have date ranges with no overlap. To be considered a valid date, there must be 21 valid hourly sensor readings out of the 24-h period, which excludes invalid data like:

i) main input or output readings outside of the limits,
ii) no reading available, or negative or non-numerical readings (for numerical variables).

If these criteria are met, the mean is taken over only the valid readings for each hourly variable and used as the daily equivalent resolution to be combined with the daily assay data. If these criteria are not met, these dates are flagged as invalid when counting the total available readings in Section 2.2.3.

2.2.3. Data filtering
By this step, the dataset would only be in one temporal resolution, whether daily (if both or only daily data are present) or hourly (if only hourly data is present). The filtering algorithm checks each variable for invalid values as above. Days with invalid readings for any main input variables or output variables are removed, as these cannot be modelled. Then, for free inputs, if there are <5 % invalid values, the invalid values are replaced with the mean of the valid values corresponding to that variable. This helps with restoring invalid PI sensor data when there are malfunctions or sensor drift. Otherwise, if there are more than 5 % invalid values, the free variable is removed from the analysis.
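A minimal pandas sketch of the two steps above, assuming an hourly DataFrame indexed by timestamp in which invalid readings have already been replaced by NaN; the thresholds follow the text (21 valid hours per day, 5 % invalid tolerance for free inputs).

```python
import pandas as pd

def to_daily(hourly: pd.DataFrame, min_valid_hours: int = 21) -> pd.DataFrame:
    """Aggregate hourly PI data to daily means over valid readings only.

    Days with fewer than `min_valid_hours` valid readings in any column
    are dropped, mirroring the 21-of-24 rule in Section 2.2.2.
    """
    counts = hourly.notna().groupby(hourly.index.date).sum()
    daily = hourly.groupby(hourly.index.date).mean()  # NaNs excluded from the mean
    return daily[(counts >= min_valid_hours).all(axis=1)]

def filter_daily(daily, main_and_outputs, free_inputs, max_invalid=0.05):
    """Apply the Section 2.2.3 rules to the single-resolution dataset."""
    # Rows with any invalid main input or output cannot be modelled
    daily = daily.dropna(subset=main_and_outputs)
    for col in list(free_inputs):
        if daily[col].isna().mean() > max_invalid:
            daily = daily.drop(columns=col)      # too unreliable: remove variable
        else:
            daily[col] = daily[col].fillna(daily[col].mean())  # mean-impute
    return daily
```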
2.2.4. Data sampling
One of the recommended practices for training generalised deep learning models is the 3-way holdout method (Raschka, 2018): the dataset is divided into the training, validation, and test sets. Except for the Los Bronces comminution case study, the training/validation/test split is 80 %/10 %/10 %. In absolute terms, 10 % of the smallest dataset (686 samples) is roughly 60 samples, or two months of data, which was considered adequate for each of the test and validation sets. The Los Bronces case study is an exception, where the test set is 1000 times the training/validation set. In classification ML problems, there is consensus that stratified sampling benefits model generalisation. This is because the dataset is divided such that each subset contains representative data from the original population, minimising bias in training and model evaluation. For regression, stratification can also be done by binning the continuous data. In this AutoML pipeline, the user can specify which variable to perform stratified sampling on, but it defaults to the target variable if not specified.

In this process, each row/day is sorted according to the specified column. Hence, given a sorted dataset λ with n samples, the split dataset μ, which is a fraction p of the original dataset, will contain every (1/p)-th sample from the original dataset:

$$\mu = \{\lambda_i\}, \quad \text{where } i = \left\{0, \frac{1}{p}, \frac{2}{p}, \frac{3}{p}, \cdots, n\right\}, \text{ and } 0 < p < 1. \quad (1)$$

Note that elements of i are rounded up when they are decimals, and zero-based indexing is used. Currently, p is set to 0.1 for a 10 % split for each of the validation and test data. The stratified sampling algorithm above is also deterministic and yields the same split when the input dataset is the same. This allows the user to attribute the effects of different feature combinations or feature engineering on the model errors, rather than effects of random sampling. The data are also min–max normalised between 0 and 1, which makes the mean squared error (MSE) function during neural network training smoother and allows for quicker convergence (Hastie et al., 2008a).
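A sketch of the deterministic split in Eq. (1), assuming NumPy arrays: rows are sorted by the stratification column, and every (1/p)-th row (rounded up) is assigned to the holdout subset.

```python
import numpy as np

def stratified_split(X: np.ndarray, sort_col: int, p: float = 0.1):
    """Deterministic stratified holdout per Eq. (1).

    Sorts rows by `sort_col`, then takes indices {0, 1/p, 2/p, ...}
    (rounded up) as the holdout subset mu; the rest remain for training.
    """
    order = np.argsort(X[:, sort_col])  # sorted dataset lambda
    n = len(order)
    idx = np.unique(np.ceil(np.arange(0, n, 1.0 / p)).astype(int))
    idx = idx[idx < n]
    holdout = X[order[idx]]
    remainder = X[order[np.setdiff1d(np.arange(n), idx)]]
    return remainder, holdout

# Calling this twice, once for the test set and once more on the
# remainder for the validation set, reproduces the 80/10/10 split.
```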
2.2.5. Feature selection
The pseudocode for the algorithm used in this study is provided in Appendix 8.1. In the sequential backward elimination method, the feature set starts with the entire set (main inputs and free inputs) and the validation MSE is evaluated. Each free input is sequentially removed, and the validation MSE is evaluated. If the validation MSE is lower than the benchmark, then the current feature set is passed onto the next iteration of the feature selection algorithm. This continues until there are no free inputs left or the validation MSE stops improving. The validation MSE of a neural network, e, can be expressed as a function of the training strategy θ, the architecture A, the feature set F, and the dataset σ:

$$e = f(\theta, A, F, \sigma). \quad (2)$$

The training strategy, which is Stochastic Gradient Descent (SGD), is kept constant with Nesterov-accelerated gradients (Sutskever et al., 2013), a learning rate of 0.03, momentum = 0.9, early-stopping, and the MSE loss. If the full feature set is ∅, the ideal AutoML algorithm should minimise f by changing A and F simultaneously (σ is constant), subject to F being a subset of ∅:

$$\underset{A,\,F}{\text{minimize}} \; f(A, F), \quad \text{subject to } F \subset \emptyset. \quad (3)$$

In this study, F and A are searched separately in the FS and EA algorithms, respectively. In this approach, we assume F is independent of A to determine F in the FS algorithm. To achieve this, we assumed that a single-layer neural network with a constant number of hidden nodes, A0, was an adequate proxy for the optimal generalisation performance of the feature set,

$$f(A, F) \approx f(A = A_0, F). \quad (4)$$

This reduces (3) to

$$\underset{F}{\text{minimize}} \; f(A = A_0, F), \quad \text{subject to } F \subset \emptyset. \quad (5)$$

Ideally, the EA would be done for each feature set F to determine the true e in (2), as they are not truly independent. In that case, if there were k features to be tested, the EA algorithm would need to run k, k − 1, k − 2, … times until termination (an arithmetic sum of integers). In comparison, with assumption (4) the EA algorithm only needs to run once. This is a significant time saving, and the assumption is analysed in the results section. Due to the stochastic nature of neural network models, the pipeline performs 10 random initialisations for each feature set and the MSE distribution is shown. The mean of the minimum MSEs is taken to identify the best possible performance for a particular feature set.

2.2.6. Evolutionary algorithm
Once the optimal set of features Foptim is determined, the pipeline begins optimising the neural network architecture and optimiser learning rate for that set of features:

$$\underset{\theta,\,A}{\text{minimize}} \; f(\theta, A, F = F_{optim}). \quad (6)$$

From our experience, multiple-output neural network models require larger architectures for good generalisation unless the output variables are correlated with each other. Therefore, we split the models, i.e., if there were n output variables, there will be n single-output models. The downside to splitting the models is that the outputs can no longer be dependent. For example, predicting 5 particle sizes in a distribution would be better in a single model, since the outputs are dependent on each other, i.e., particle size cumulative passing must strictly increase, or discrete particle size distributions must sum to 100 %. Thus, the evolutionary algorithm is executed n times.

The original EA algorithm can be found in Real et al. (2017), which demonstrated that their modifications can yield optimal solutions for very deep convolutional neural networks without the crossover operation, relying only on intuitive mutations. In our previous study, we successfully adopted the algorithm to only include mutations suitable for fully connected feedforward neural networks for regression models (Koh et al., 2021a). For this AutoML pipeline, we make further modifications (a sketch follows the list), namely:

i) The neural network candidates are limited to a bounded number of parameters: at least 30 % and at most 70 % of the number of training samples. This ensures that the proposed architectures do not overfit or underfit and that computational resources are not wasted evaluating unnecessary candidates.
ii) The initialisation of population 0 is changed to cover a variety of 1-layer and 2-layer neural networks based on the number of training samples (condition i) still applies). This change ensures that good potential neural network candidates are already present in the first solution, reducing the number of generations required for convergence. Previously, the algorithm always initialised with 1 hidden layer and 1 node to demonstrate the capabilities of the search algorithm to evolve from the most basic structure, but this is no longer necessary in pursuit of shorter pipeline times.
iii) Activation functions are no longer searched but set to hyperbolic tangent (tanh), considering the focus is only to develop regression models. Linear activations do not offer good approximations, while ReLU is typically used for classification. The choice of hyperbolic tangent over the logistic sigmoid is due to more efficient convergence during backpropagation (LeCun et al. 1998).
iv) The optimizer of choice is now Stochastic Gradient Descent (SGD) with Nesterov accelerated gradients instead of Adamax (Kingma and Ba, 2014), because SGD produces better generalised solutions than Adamax (Keskar & Socher, 2017; Zhou et al., 2020), although the latter converges much quicker. Generalised solutions are more applicable for this pipeline, considering its scope is focused on industrial data where noise is always present.
v) When a mutation event is triggered, it is possible to have up to 3 mutations at once from the pool of available mutations. This provides a larger variety of possible solutions and allows the population to escape local minima rather than converging to a small pool of solutions.
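An illustrative sketch of modifications i) and v): a candidate is a list of hidden-layer widths, mutations perturb the architecture, and candidates outside the parameter budget are rejected. The specific mutation pool here is a simplification for illustration, not a verbatim reproduction of the one in Koh et al. (2021a).

```python
import random

def n_params(widths, n_inputs, n_outputs=1):
    """Parameter count of a fully connected network (weights + biases)."""
    sizes = [n_inputs] + list(widths) + [n_outputs]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

def mutate(widths, n_inputs, n_train):
    """Apply 1-3 random mutations (modification v), then enforce the
    30-70 % parameter budget of modification i)."""
    pool = [
        lambda w: w + [random.randint(1, 8)],                 # add a layer
        lambda w: w[:-1] if len(w) > 1 else w,                # remove a layer
        lambda w: [max(1, x + random.choice([-2, -1, 1, 2]))  # resize layers
                   for x in w],
    ]
    for _ in range(random.randint(1, 3)):
        widths = random.choice(pool)(widths)
    low, high = 0.3 * n_train, 0.7 * n_train
    if not (low <= n_params(widths, n_inputs) <= high):
        return None  # reject: outside the parameter budget
    return widths
```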
Thus, the hyperparameters optimised in the EA algorithm are the optimiser learning rate, the number of layers, and the number of nodes in each layer. In the EA convention, the fitness function is defined such that higher is better when comparing candidates. Therefore, the fitness function here is simply the inverse of the validation MSE. The results are presented in terms of the average fitness of the population and the candidate with the highest fitness per generation. This shows where the best candidate was found and how the overall average fitness improves with each generation.

Table 1
Summary of the results of the major AutoML algorithm steps for each case study. *The number of samples quoted includes train and validation after Data Filtering, as these are used for the mentioned steps.

| Method | Los Bronces (14,284 samples*; 14 main inputs, 1 output) | Nickel Concentrator (686 samples*; 6 main inputs, 53 free inputs, 14 redundant, 4 outputs) | CdA (949 samples*; 6 main inputs, 2 free inputs, 4 outputs) |
| --- | --- | --- | --- |
| AutoML Feature Selection (hours) | N/A | 2.04 | 0.40 |
| Evolutionary Algorithm (hours) | 1.38 | 0.37 | 0.39 |
| Model Evaluation (hours) | 5.00 | 0.20 | 0.30 |
| Total Time (hours) | 6.38 | 2.61 | 1.09 |
| PLS Total Time (seconds) | 43.0 | 4.00 | 3.00 |

2.2.7. Model Evaluation
Once the optimal neural network architectures and hyperparameters are determined from the evolutionary algorithm, each output model is trained 10 times with different random re-initialisations. For each initialisation, the quality of the model is evaluated by calculating the mean absolute percentage error (MAPE) for each output on each dataset:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{j=1}^{n} \left| \frac{\hat{y}_j - y_j}{y_j} \right| \quad (7)$$

where n is the number of data points in that dataset, ŷj is the predicted value of the output variable, and yj is the actual value of the output variable. The final model is selected based on the lowest average of the training and validation MAPE, and the training/validation MAPE must be within 1 % of each other to avoid:

i) models with low training error but high validation error,
ii) models with low validation error but high training error.

Note that i) typically does not occur, considering regularization is done during the training procedure to minimise validation MSE. However, from experience, ii) sometimes occurs when the early-stopping converges to a low validation MSE during training by chance. Therefore, these additional criteria are employed because the end goal is to have a model with good generalisation, i.e., with consistently low MAPE across all datasets. As the training MAPE is not directly fed into the training process but used at the model selection stage at the end, after training with regularization, it is unlikely that this would cause overfitting.

If there is not a good correlation between inputs and outputs, the model will not converge to a good solution (<10 % MAPE) across the train, validation, and test data. In these instances, the AutoML pipeline will raise warnings to the user to ensure that these specific outputs are not used.
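A sketch of the selection rule above, assuming each trained re-initialisation is summarised by its training and validation MAPE in percent, and interpreting the 1 % criterion as percentage points (an assumption, as the text does not specify relative or absolute):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, Eq. (7), in percent."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def select_final_model(candidates, tolerance=1.0):
    """Pick the re-initialisation with the lowest mean train/validation MAPE,
    requiring the two errors to be within `tolerance` points of each other."""
    admissible = [c for c in candidates
                  if abs(c["train_mape"] - c["val_mape"]) <= tolerance]
    if not admissible:
        return None  # the pipeline raises a warning instead of returning a model
    return min(admissible, key=lambda c: (c["train_mape"] + c["val_mape"]) / 2)
```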
2.2.8. Model deployment
The final neural network model is deconstructed into its basic mathematical operations of summations, matrix multiplications, and activation function transformations in JavaScript so that it can be deployed onto IES. However, it can equally be deconstructed into any other programming language for deployment to any platform. This removes the need for dependencies and third-party libraries. The model is made available on the list of models for the user on IES and can be combined with phenomenological models in a flowsheet for complete simulation.
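The deconstruction amounts to replaying the network as plain matrix algebra. A sketch in Python (the paper emits JavaScript, but the operations are identical): the weights are exported once, and prediction then needs only loops and tanh.

```python
import math

def predict(x, layers):
    """Forward pass of a fully connected tanh regression network.

    `layers` is a list of (weights, biases) pairs exported from the
    trained model, with weights[i][j] mapping input i to node j; the
    final layer is linear, matching the deployed models in this study.
    """
    for k, (W, b) in enumerate(layers):
        z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
             for j in range(len(b))]
        last = (k == len(layers) - 1)
        x = z if last else [math.tanh(v) for v in z]  # linear output layer
    return x
```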
Fig. 2. Flowsheet summary of the Los Bronces Confluencia Comminution Circuit on IES. Taken from Koh et al. (2021a).
2.3. Partial least squares regression comparison

Partial Least Squares (PLS) Regression is a type of statistical linear regression model which utilises feature extraction by linearly projecting the initial input features X into a subset of latent features which maximises the variance direction in the output dimension Y (Hastie et al., 2008b). PLS is suitable when the input dimensions are highly correlated and can benefit from dimensionality reduction. In the case where the number of latent features is equal to the number of initial input features, the PLS regression reverts to the ordinary linear least squares regression. The number of latent features is chosen as the number which yields the minimum validation MSE.

For each case study, PLS was done after the Feature Selection step of the AutoML pipeline for a fair comparison between the neural network models and the PLS models. At this step, both algorithms had access to the same number of samples, input, and output variables within the training, validation, and test sets. The non-linear mapping of neural networks should perform at least equal to the PLS models (in the case of high collinearity between input and output variables) or better (in the case of high non-linear correlation between input and output variables) in the test sets. If the output(s) have poor correlation with the inputs, then both the AutoML model and the PLS model will have high MAPE and high bias (predicting the mean).
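A sketch of this baseline using scikit-learn (the paper does not state which PLS implementation was used, so the library choice here is an assumption), selecting the number of latent components by validation MSE:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def fit_pls_baseline(X_train, y_train, X_val, y_val):
    """Fit PLS models with 1..n_features latent components and keep the
    one with the lowest validation MSE, as described in Section 2.3."""
    best_model, best_mse = None, np.inf
    for n_comp in range(1, X_train.shape[1] + 1):
        pls = PLSRegression(n_components=n_comp).fit(X_train, y_train)
        mse = np.mean((pls.predict(X_val).ravel() - np.ravel(y_val)) ** 2)
        if mse < best_mse:
            best_model, best_mse = pls, mse
    return best_model, best_mse
```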
3. Case studies

Considering the AutoML pipeline was designed for steady-state simulation, we demonstrate the effectiveness of the developed AutoML algorithm on three case studies of different minerals processing plants with different ore types that represent the range of "difficulties" of possible datasets encountered in industry:

i) Los Bronces copper sulfide comminution circuit dataset, simulated from the previous study (Koh et al. 2021a). This dataset is the easiest, as the data are generated from calibrated phenomenological models with no noise or sampling issues.
ii) Nickel Concentrator nickel sulfide flotation circuit dataset. This dataset represents industrial data with nickel assay data that have relatively higher grades, from which it is consequently easier to obtain a representative sample for detection than for precious metals. Therefore, the errors in sampling will not affect the data significantly.
iii) Carmen de Andacollo copper–gold flotation circuit dataset. This dataset represents difficult industrial data with copper, gold, and mercury assay data. The sampling and detection methods of low-grade elements are highly variable, and the noise can significantly affect the model correlations.

Table 1 summarises the time taken for each of the major steps in the AutoML algorithm in comparison with PLS for the three case studies. The Feature Selection step time is largely dependent on the number of free inputs and outputs. Meanwhile, the EA time is dependent on the number of samples and the number of outputs. The Model Evaluation step depends on the quality of data available, as the Los Bronces noise-free dataset allowed the models to train longer during the Model Evaluation step. A realistic estimate for the full AutoML pipeline using industrial data with both daily and hourly datasets would be closer to 2.48 h, like the Nickel Concentrator case study. A simpler model like the CdA flotation dataset, which only utilised 3 years of daily assay data, takes only 1.09 h. In comparison, the PLS models are much quicker to compute, with most requiring only seconds. The PLS models are a form of linear regression and will always be computationally simpler than any non-linear models.

Table 2
Summary of the data provided for the Los Bronces Confluencia Las Tortolas Comminution Circuit case study.

| Resolution | Number of Data | Variable Name | Variable Type |
| --- | --- | --- | --- |
| Daily | Train (14,284), Test (14,150,279); 14 main inputs, 1 output | 7 ore characteristics: Density, SAG Power Index (Spi), Bond Work Index (Bwi), Crusher Index (Ci), Drop Weight Test b (stratified column), Drop Weight Test A, Drop Weight Test ta | Main input |
| | | 7 flowsheet conditions: Target P80, SAG Feed Size Dfeed, Grade Engineering Throughput MGE, Grade Engineering Extra Fines Fraction GEF, Grade Engineering Fines Fraction GF, Grade Engineering Medium Fraction GM, Grade Engineering Screen Size SSGE | Main input |
| | | Throughput | Output |
Table 3
A comparison of the results from the AutoML algorithm and the previously published results from Koh et al. (2021a) on the same Los Bronces dataset. Columns: Output, Evolutionary Algorithm, Model Evaluation*, Final Model Hyperparameters. *Dashed blue lines indicate the epoch where the model weights were taken.
Fig. 3. The parity charts for the best model produced for the Los Bronces Case Study.
3.1. Los Bronces comminution circuit

… was itself innovative, there was an interest to accelerate this speed for future projects or if the block model is recalibrated. The block model can get recalibrated or updated as more blocks are mined and data is collected. Thus, it is desirable to reduce the simulation time and cost whenever this occurs. Hence, the objective of the previous study by Koh et al. (2021a) was to utilise a deep neural network to approximate the phenomenological models of the calibrated comminution circuit throughput for faster simulations using:

i) the block model inputs (density, SAG Power Index, Bond Work Index, Crusher Index, and three JK Drop Weight Test parameters b, A and ta), and
ii) the seven flowsheet condition variables (target grind size, feed size, and pre-concentration conditions) which represent the operating strategy.

In the previous study, it was demonstrated that a deep neural network requires only 1 in 1000 sampled data points out of the block model to predict the entire block model for a given blasting scenario, at a rate 3363 times faster. In this study, we reproduce the results with the 14,284 training blocks sampled from the 14,164,563 block simulations (14,150,279 test) with the proposed AutoML pipeline. By doing this, we demonstrate the improvements from the changes made in the EA algorithm, and that the proposed AutoML pipeline is capable of scaling to a large dataset.

3.1.1. Los Bronces comminution circuit data
Since the exact inputs of the phenomenological model are known, there were no free inputs, resulting in no feature selection step in that study. All the data were simulations at daily resolution, so no temporal resolution nor data filtering was required. These were simulations from phenomenological models which were calibrated to daily data collected from site. A summary of the Los Bronces comminution data can be found in Table 2.

3.1.2. Los Bronces comminution circuit results and discussion
Considering the Los Bronces dataset is simulated data, it represents the most ideal dataset in minerals processing and provides a sanity check that the AutoML pipeline works. Table 3 compares the results from the evolutionary algorithm and model evaluation steps in this study and from Koh et al. (2021a).

For the same number of epochs, the fitness (inverse of validation MSE) is much higher for the previous study (8920 compared to 3490). This is because Adamax converges much quicker than SGD (Kingma & Ba, 2014), which is also seen in the model evaluation plots (lower MSE at the same epoch). Recall from equation (2) that e is dependent on the training strategy θ, architecture A, feature set F, and dataset σ. Therefore, e is only a measure of A if all others are equal. The fitness of candidates can only be compared within the same study with the same θ, but not across studies utilising different optimisers. The proposed changes to the EA algorithm resulted in convergence to the optimal solution 3 generations earlier (7 compared to 10). This could be a combination of the better initialisation, discarding poor candidates prior to evaluation, reducing the search space by limiting the activation functions, and allowing multiple mutations per mutation instance. The resulting parameter count is much smaller (5,918 vs 9,136) compared to the previous study. This is very likely due to tanh neural networks having better function approximation capabilities than deeper ReLU networks (De Ryck et al., 2021).

The SGD optimiser is still capable of converging to a similar MSE to Adamax in the model evaluation step but requires many more epochs. This is acceptable, as the final solution from SGD typically has better generalisation capabilities for noisy data (Zhou et al., 2020; Keskar & Socher, 2017). Nevertheless, the resulting model still reproduces the same test set MAPE (0.397 %) despite being a much smaller network (Fig. 3). The 0.397 % MAPE was deemed acceptable because the optimiser accuracy in IES was set to 20 t/h steps when evaluating convergence. This was a trade-off between time and accuracy (as finer throughput resolutions require more steps during search). Hence, it is impossible for the neural network to achieve better than 10 t/h error (half the finest resolution), as it is trained on the optimiser solutions. We evaluated that 0.4 % MAPE is roughly equal to 10 t/h for throughput values of 2000–4800 t/h.

Refer to Appendix 8.3.1 for the parity plots of the PLS regression on the same dataset. The AutoML solution has lower MAPE for all datasets compared to PLS, especially for the test dataset (Table 4). This indicates that the underlying correlations between the inputs and outputs are non-linear. Therefore, the PLS would not be able to approximate the phenomenological models.

Table 5
Summary of the data provided for the Nickel Concentrator Flotation Circuit case study. *Refer to Section 8.2 for the full list of hourly sensors.

| Resolution | Number of Data | Variable Name | Variable Type |
| --- | --- | --- | --- |
| Daily | 1000 (6 main inputs, 1 free input, 4 outputs, 2 redundant) | 4 final concentrate assays: Total Nickel Recovery (stratified column), Final Con Nickel Grade, Final Con MgO Grade, Gangue Recovery | Output |
| | | 5 flotation feed assays: Nickel Feed Grade, Mg Feed Grade, Sulfur Feed Grade, Talc Feed Grade, Flotation Feed P80 | Main input |
| | | Module 1 Throughput, Module 2 Throughput | Redundant |
| | | Total Throughput | Main input |
| | | Operational Hour | Free input |
| Hourly* | 23,980 (52 free inputs, 12 redundant) | 12 pH Sensors, 5 Frother Flowrates, 19 Guar Flowrates, 16 Xanthate Flowrates | Free input |
| | | 12 Acid Flowrates | Redundant |

Table 6
Results of the Evolutionary Algorithm and Model Evaluation of the final models for the Nickel Concentrator flotation data. Columns: Output, Evolutionary Algorithm, Model Evaluation*, Final Model Hyperparameters. *Dashed blue lines indicate the epoch where the model weights were taken.

Fig. 5. Feature Selection results for the Nickel Concentrator flotation data. The boxes represent the 1st and 3rd quartiles, while the whiskers represent the ranges of the model MSE.

3.2. Nickel Concentrator flotation circuit

The Nickel Concentrator processes nickel sulphide ore. In this case study, the project was to utilise IES to optimise the throughput and circuit conditions to maximise metal production. Hence, a flotation circuit model of the plant needed to be developed to reflect the upstream comminution changes on the flotation recoveries. This is because the p80 and throughput of the comminution circuit are related, i.e., throughput can be increased by increasing the p80 of the comminution circuit. However, there is a threshold p80 for flotation beyond which further increasing the size drastically reduces flotation recovery. Ultimately, the plant needs to maximise metal recovery at a given grade and not just comminution circuit throughput. The models would be used to make operational decisions on how to operate the comminution circuit and target p80 for mine scheduling, rather than optimising the flotation circuit for now. The Nickel Concentrator has two parallel modules for grinding and desliming prior to flotation. A simplified flowsheet of the Nickel Concentrator flotation circuit is shown in Fig. 4.
Fig. 6. The parity charts of the best models for the 4 outputs of the Nickel Concentrator flotation data.
3.2.1. Nickel Concentrator flotation circuit data included in modelling, the throughput and recoveries would not be at its
In this case study, the AutoML pipeline was used to develop a neural typical conditions. The filtering limit of 21–24 h to remove these tran
network model of the flotation circuit to predict four outputs: total sient conditions. As we are uncertain if the operational hour has a cor
nickel recovery, final nickel concentrate grade, final MgO concentrate relation with the flotation performance, it was set as free.
grade, and gangue recovery. A summary of the industrial data used in
the Nickel Concentrator flotation case study can be found in Table 5. 3.2.2. Nickel Concentrator flotation circuit results and discussion
The Temporal Resolution step is required as there were both daily The pipeline begins by removing the 18 redundant columns specified
and hourly variables. Only the total throughput is labelled as main since in the data. Then, in the Temporal Resolution step, 96 days were
there is only one flotation circuit that concentrates material from both removed as they did not contain more than 21 operational hours or had
modules. Hence, Module 1 Throughput and Module 2 Throughput are too many outliers in the PI data. The Data Filtering step further removed
labelled redundant as the total throughput is a summation of both. The 141 days due to outliers and 25 free columns for containing too many
acid flowrates were labelled as redundant because the pH values were invalid readings. Refer to Appendix 8.2 for a list of columns removed. At
already provided. The pH limits were set from 2 to 14. All other PI sensor this step, the dataset now has 762 days with 34 input and 4 output
variables were tagged as free and the operational hour for the feature variables. The data are sampled according to the Nickel Recovery values,
selection to determine the optimal combinations. From investigating the yielding 617 training, 69 validation, and 76 test data for the Feature
data, the operational hour variable could be used to remove transient Selection algorithm (Fig. 5).
conditions like start-up and shut-down periods. If these periods were The benchmark with all the 34 input variables yielded a mean
12
E.J.Y. Koh et al. Minerals Engineering 189 (2022) 107886
The benchmark with all the 34 input variables yielded a mean minimum MSE of 6.185 × 10⁻³. These were the results from removing columns in the Feature Selection algorithm:

i) Removing the Primary Recleaner Bank 3 Guar Flowrate resulted in a decrease to 6.116 × 10⁻³.
ii) Removing Cleaner Scavenger 3 Guar Flowrate further decreased it to 5.947 × 10⁻³.
iii) Removing Operational Hour further decreased it to 5.938 × 10⁻³.
iv) Removing Fine Rougher pH further decreased it to 5.920 × 10⁻³.

This feature set after removing 4 columns yielded the lowest mean minimum MSE and is selected over the full feature set. Fig. 7 shows that removing these 4 columns does not significantly change the validation MSE, i.e., these 4 columns are not influential in model fitting. Now, the dataset has 30 input and 4 output variables prior to the evolutionary algorithm. The results from the Evolutionary Algorithm and Model Evaluation steps can be found in Table 6.

The modification in the Evolutionary Algorithm initialisation resulted in fewer generations for convergence compared to the original initialisation in Los Bronces. The MSE curves for the Nickel Concentrator flotation data show stopping points with validation and training MSE in the order of magnitude of 10⁻³ (for min–max normalised data), which indicates a good model. This is also reflected by the low MAPE in the parity charts across the train/validation/test datasets for all outputs (Fig. 6).
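As a concrete illustration of the error magnitudes quoted above, a sketch of min–max normalisation and the resulting normalised MSE, assuming numpy; the arrays are synthetic stand-ins for the plant data:

```python
import numpy as np

# Synthetic stand-ins for a measured and a predicted output column.
y_true = 85 + 5 * np.random.rand(500)          # e.g. Ni recovery in %
y_pred = y_true + 0.25 * np.random.randn(500)  # model prediction with noise

# Min-max normalise with the observed range, as done before model fitting.
lo, hi = y_true.min(), y_true.max()
norm = lambda y: (y - lo) / (hi - lo)

mse = np.mean((norm(y_true) - norm(y_pred)) ** 2)
print(f"normalised MSE ~ {mse:.1e}")  # values around 1e-3 indicate a good fit
```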
The MAPE is very likely from the noise of the dataset and cannot be reduced further. The error in the parity charts of the Nickel Concentrator flotation data appears to be normally distributed, which indicates that the correlations are well modelled besides the noise in the data. There is no skew in the parity charts. Assuming there is no bias in the assaying process, this likely indicates that the assay data collection methods onsite are consistent. The low Ni recovery MAPE also indicates that the assay data are accurate. This also indicates that the assay samples were taken to be representative of the process performance. If the assay data were not accurate and representative of the performance, then there would be no correlation with the other variables, as they would be noise. Considering Ni has considerably higher grades compared to Au or Hg, which occur in ppm, it is easier to obtain a representative sample.

Considering the aim of the project was to have an accurate model to predict metal recoveries to reflect upstream comminution circuit changes, a MAPE around 5 % for gangue recovery and Ni concentrate grade and around 2 % for Ni recovery is more than acceptable, as the phenomenological models have MAPE around 10 %. Furthermore, the data-driven approach is feasible for predicting recoveries at varying reagent additions (Guar, Xanthate, and frother) and pH as inputs. Having these inputs to the model can also allow operators to understand the effect of reagent additions to optimise the dosages during operation. Comparatively, previous phenomenological modelling approaches for flotation required parameter measurements within an industrial cell, which is not feasible during operation. Details about such phenomenological models can be found in (Nguyen & Schulze, 2003). In this case study, the AutoML pipeline can provide the capabilities for minerals processing experts to develop their own machine learning regression models for missing phenomenological models to optimise the entire value chain on a flowsheet simulator.

The results of the PLS regression can be found in Appendix 8.3.2 and the comparison with the AutoML models can be found in Table 7.

Table 7
A comparison of the AutoML and PLS models for the Nickel Concentrator case study. Lower MAPE is better.

Output                  Algorithm   Test MAPE (%)
Ni Recovery             AutoML      2.78
                        PLS         3.27
Ni Concentrate Grade    AutoML      5.92
                        PLS         6.10
MgO Concentrate Grade   AutoML      10.3
                        PLS         11.4
Gangue Recovery         AutoML      5.89
                        PLS         6.03

Table 8
Summary of the data provided for the Carmen de Andacollo Flotation Circuit case study. All variables are daily, with 1,247 rows of data (6 main inputs, 2 free inputs, 4 outputs):
• 2 Geometallurgy Recoveries (free): Gold Geometallurgical Recovery, Copper Geometallurgical Recovery.
• 6 Flotation Feed Assay (main): Flotation Throughput, Flotation P80, Total Cu Feed Grade, Cu Oxide Feed Grade, Au Feed Grade, Hg Feed Grade.
• 4 Flotation Recoveries (output): Total Cu Recovery (stratified column), Total Au Recovery, Total Gangue Recovery, Total Hg Recovery.
Fig. 8. Feature Selection results for the CdA flotation data. The boxes represent the 1st and 3rd quartiles, while the whiskers represent the ranges of the model MSE.
Although both the parity charts of the PLS and AutoML models are spread along the 45-degree line, indicating good quality models, the AutoML models have a tighter spread with lower MAPE across all datasets. This demonstrates that PLS regression can yield acceptable linear models, but the AutoML models can provide additional accuracy and significantly lower MAPE where possible. This further emphasises the use case of the AutoML pipeline, as the AutoML models were consistently better than simple statistical models.

3.2.3. Analysis of the assumption of feature selection methodology

From the results, we can comment on the validity of the assumption made in Section 2.2.5. To compare the single-layer validation MSE proxy and the true validation MSE, we compare e at the end of FS to the e in EA respectively:

i) Ni Recovery (FS: 3.55 × 10⁻³, EA: 3.12 × 10⁻³) (1.138× higher)
ii) Ni Concentrate Grade (FS: 7.55 × 10⁻³, EA: 6.95 × 10⁻³) (1.086× higher)
iii) MgO Concentrate Grade (FS: 7.85 × 10⁻³, EA: 6.93 × 10⁻³) (1.133× higher)
iv) Gangue Recovery (FS: 4.78 × 10⁻³, EA: 4.58 × 10⁻³) (1.044× higher)

As expected, the true e (EA) is always lower than or equal to the proxy e (FS). This is because the FS uses a single-layer neural network with a constant number of hidden nodes, which is not the optimal architecture for that feature set. The proxy appears to be 4–14 % higher than the optimal architecture. If the proxy is always higher than the EA results within these ranges, then the assumption of using a single-layer network to evaluate the performance of the feature set should hold. More tests in the future should be done to evaluate this assumption.
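A sketch of the single-layer proxy used to score a feature subset, assuming TensorFlow/Keras; the layer width, optimiser, and epoch budget are illustrative placeholders rather than the study's tuned settings:

```python
import numpy as np
import tensorflow as tf

def proxy_val_mse(X_tr, y_tr, X_val, y_val, hidden=32, epochs=50):
    """Validation MSE of a fixed single-hidden-layer network, used as a
    cheap proxy for the MSE of the EA-optimised architecture."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(X_tr.shape[1],)),
        tf.keras.layers.Dense(hidden, activation="relu"),
        tf.keras.layers.Dense(y_tr.shape[1]),  # one unit per output variable
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X_tr, y_tr, epochs=epochs, verbose=0)
    return model.evaluate(X_val, y_val, verbose=0)

# Section 3.2.3's check is then simply the ratio proxy/true,
# e.g. 3.55e-3 / 3.12e-3 ~ 1.14 for Ni recovery.
```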
(higher MSE). Therefore, Copper geometallurgical Recovery was not
used in the remainder of the pipeline (7 inputs). The results from the
3.3. Carmen de Andacollo flotation circuit Evolutionary Algorithm and Model Evaluation are summarised in
Table 9. Like the Nickel Concentrator case study, the smarter initiali
Carmen de Andacollo (CdA) is a copper and gold mine owned by sation of the evolutionary algorithm used allowed for convergence in
Teck. A simplified flowsheet of the flotation circuit is shown in Fig. 7. In less generations than the Los Bronces case study.
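The sampling step can be approximated by binning the stratification output (here Cu recovery) into quantiles and splitting each bin proportionally; a sketch assuming scikit-learn, with 80/10/10 proportions chosen to roughly mirror the 845/104/105 split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: X = daily inputs, y = the stratification output.
X = np.random.rand(1054, 8)
y = 70 + 20 * np.random.rand(1054)

# Bin the output into quartiles so each split covers the full recovery range.
bins = np.digitize(y, np.quantile(y, [0.25, 0.5, 0.75]))

X_rest, X_test, y_rest, y_test, b_rest, _ = train_test_split(
    X, y, bins, test_size=0.1, stratify=bins, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=1 / 9, stratify=b_rest, random_state=0)
```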
The benchmark with all 8 inputs yielded an MSE of 0.01824; removing Copper Geometallurgical Recovery yielded 0.01817 (lower MSE), while removing Gold Geometallurgical Recovery yielded 0.01872 (higher MSE). Therefore, Copper Geometallurgical Recovery was not used in the remainder of the pipeline (7 inputs). The results from the Evolutionary Algorithm and Model Evaluation are summarised in Table 9. Like the Nickel Concentrator case study, the smarter initialisation of the evolutionary algorithm allowed for convergence in fewer generations than the Los Bronces case study.

The global minimum validation MSE for Cu Recovery and Gangue Recovery occurs in the order of magnitude of 10⁻³, which is reflected by the good parity charts in Fig. 9. The residuals appear normally distributed as well, so the errors are likely to be from noise in the data.
Table 9
Results of the Evolutionary Algorithm and Model Evaluation of the final models for the CdA flotation data (columns: Output, Evolutionary Algorithm, Model Evaluation*, Final Model Hyperparameters). *Dashed blue lines indicate the epoch where the model weights were taken.
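The asterisked note implies that the final weights are taken from the best validation epoch rather than the last one. In Keras-style training this is commonly expressed with a callback; a sketch under that assumption (the architecture, data, and patience value are placeholders):

```python
import numpy as np
import tensorflow as tf

# Toy model and data standing in for a final AutoML candidate.
X = np.random.rand(200, 7)
y = np.random.rand(200, 1)
model = tf.keras.Sequential([tf.keras.Input(shape=(7,)),
                             tf.keras.layers.Dense(16, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Keep the weights from the epoch with the lowest validation loss,
# rather than whatever the final epoch happens to produce.
stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20,
                                        restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=200, callbacks=[stop], verbose=0)
```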
However, the global minimum validation MSE for the Au and Hg recoveries occurs at a normalised MSE of the order of magnitude of 10⁻², which is much higher than the other outputs in the other case studies. This indicates that the neural network cannot find a generalised solution with low MSE between the training and validation.

The results from the AutoML pipeline on the CdA flotation dataset show the importance of having clean, representative, and accurate data for developing machine learning models. The Cu and gangue recoveries had better correlations with the inputs, likely because of the higher grade at the sampling point and the accuracy of the assaying method. Therefore, the noise in the data did not corrupt the correlation and the neural network was able to converge to a generalised solution. Comparatively, the Au and Hg recoveries likely had poor correlation with the inputs due to having a much lower grade at the sampling point. This indicates that the accuracy and reproducibility of assay data are heavily dependent on the sampling method and sample size to ensure representativeness. Hg grades are also much harder to accurately assay compared to Au grades, which can be accurately determined through fire assay. It is also possible that the sampling point/period is not representative of the process at that given point/period. It is recommended that the site reconsiders the sample size taken to estimate the assay grades given typical variances for these commodities.

Despite the relatively poor correlation between the inputs and Au and Hg, the models were implemented into IES and are used by the client, as the overall outcome exceeded expectations with the available data. The MAPE for the Cu and Au recovery models developed in this study were still lower (more accurate) compared to any empirical or phenomenological models developed previously. Previously, CdA was using the dispersed laboratory geometallurgical recoveries as a predictor for actual plant flotation recoveries. When comparing the MAPE from these models, the test MAPE was 3.18 % (AutoML) vs 3.79 % (geometallurgical) for Cu and 6.79 % (AutoML) vs 8.06 % (geometallurgical) for Au. The flotation model was used with the comminution circuit phenomenological models to provide suggestions to the operation on levers to maximise throughput while one of the main crushers was down for four months.

To ensure that the AutoML modelling approach was done correctly, the PLS models benchmark was done on the same dataset and the results are summarised in Table 10 and Appendix 8.3.3. The AutoML models have lower test MAPE than the PLS models.
Fig. 9. The parity charts for the training, validation, and test datasets for the best models for the 4 outputs of the CdA flotation data.
iv) dividing the models for each output variable, with the feature sets being equal for each output variable.
v) comparing other methods of model selection and validation, like k-fold cross-validation, and whether the improvement in accuracy is justified by repeating the algorithm k times (Sanjay & Shukla, 2016).

Further steps can be added to the existing pipeline, such as:

i) the ability for the model to flag samples that are different from training conditions when evaluating them,
ii) extending the AutoML algorithm to handle various data resolutions instead of just hourly process and daily assay data,
iii) allocating the most recent data as the test set as a more accurate way of testing the generalisation behaviour of the models produced and deployed as the plant is operating,
iv) including model interpretation tools such as SHAP (Lundberg & Lee, 2017) for interpreting the developed neural network models in the model evaluation step (a possible usage is sketched after this list),
v) extending the AutoML algorithm capabilities to develop dynamic models, i.e., time-dependent process models.
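As a pointer for item iv, a minimal SHAP sketch; the model and data here are illustrative stand-ins, not the study's flotation models:

```python
import numpy as np
import shap  # model interpretation library (Lundberg & Lee, 2017)
from sklearn.neural_network import MLPRegressor

# Illustrative stand-ins for a trained flotation model and its inputs.
X_train = np.random.rand(300, 7)
y_train = X_train @ np.random.rand(7) + 0.1 * np.random.rand(300)
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500).fit(X_train, y_train)

# KernelExplainer is model-agnostic: it needs only a predict function and a
# small background sample to estimate per-feature attributions.
explainer = shap.KernelExplainer(model.predict, shap.sample(X_train, 50))
shap_values = explainer.shap_values(X_train[:20])
shap.summary_plot(shap_values, X_train[:20])
```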
This study investigated the possibility of developing an Automated Machine Learning (AutoML) pipeline specifically for developing steady-state mineral processing models. The AutoML pipeline consists of the following major steps: data collection, temporal resolution, data filtering, data sampling, feature selection, evolutionary algorithm, model evaluation, and model deployment. The pipeline also requires user approval after the data filtering step (to ensure the required data quality) and the model evaluation step (to ensure a generalised model appropriate for use). Although general AutoML solutions are available commercially, this AutoML pipeline allows expert minerals processing users to influence the feature selection, data filtering, and temporal resolution steps based on process knowledge.

Three case studies were chosen and presented in this article which cover a range of dataset qualities encountered in industrial data and example use cases for machine learning models. Firstly, there are the Los Bronces comminution circuit data, which are simulated data from a previously published study using calibrated phenomenological models with minimal noise. The model was used to determine the ideal pre-concentration, blasting, and comminution conditions for mine scheduling. In this case study, the AutoML developed a neural network capable of generalising to 14 million samples whilst being trained only on 14,284 training samples. Next, the Nickel Concentrator flotation circuit data contained both hourly PI data and daily assay data from an industrial concentrator. These data were used to develop a flotation model to identify the best comminution circuit operating conditions to maximise metal production. Also, the Nickel Concentrator data had nickel assay data with relatively little noise, yielding good correlations in the final model. Finally, there are the Carmen de Andacollo flotation data, which contained rare element assay data, Au and Hg. The data were used to develop a flotation model to investigate the effect of a primary crusher being out of operation for 4 months and how the other comminution units could be maximised to maintain the metal production. The difficulty in sampling and assaying low-grade elements yielded high noise in the data and corrupted the input/output correlations. This dataset demonstrated the limitations of machine learning models when the dataset contains poorly correlated inputs and outputs. It is important that an adequate sample size is taken to be representative of the assay grade at the sampling point, especially for low-grade commodities. For each of these case studies, we compared the results with Partial Least Squares (PLS) regression as a benchmark. In all the case studies, the AutoML models were always better than the PLS models but required longer development time.

Across the three case studies, the AutoML tool developed in this study was shown to be a feasible approach towards deploying machine learning models in industry. Industrial plants typically have an abundance of historical data but do not have the expertise to develop end-to-end machine learning models. The AutoML pipeline developed in this study bridges this gap and allows expert minerals processing users to develop machine learning models within hours for use with existing calibrated phenomenological models.

CRediT authorship contribution statement

Edwin J. Y. Koh: Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Visualization. Eiman Amini: Conceptualization, Methodology, Validation, Writing – review & editing, Visualization, Supervision. Shruti Gaur: Software, Data curation. Miguel Becerra Maquieira: Validation, Resources. Christian Jara Heck: Validation, Resources. Geoffrey J. McLachlan: Writing – review & editing, Supervision. Nick Beaton: Conceptualization, Resources, Funding acquisition, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Acknowledgements

This research would not have been possible without the financial support from Orica. The authors would also like to thank AngloAmerican and Teck for the effort in collecting and sharing the data for analysis.
Appendix

Fig. 10 shows the pseudocode for the sequential backward elimination feature selection algorithm used in this AutoML study.
Fig. 10. Pseudocode for the sequential backward elimination feature selection algorithm.
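Since the pseudocode itself lives in the figure, a minimal Python reconstruction of sequential backward elimination is sketched here from the description in the text; the `score` callback stands in for the single-layer proxy network evaluation and is an assumption, not the authors' exact implementation:

```python
def backward_elimination(features, score):
    """Greedy sequential backward elimination: each round, drop the single
    feature whose removal lowers the validation error the most; stop when
    no removal improves on the current feature set."""
    subset = list(features)
    best_err = score(subset)
    while len(subset) > 1:
        # Score every one-feature-removed candidate subset.
        trials = [(score([g for g in subset if g != f]), f) for f in subset]
        err, worst = min(trials)
        if err >= best_err:  # no removal helps any more
            break
        best_err = err
        subset = [g for g in subset if g != worst]
    return subset, best_err

# Usage: subset, mse = backward_elimination(column_names, proxy_val_mse_fn)
# where proxy_val_mse_fn trains a fixed single-hidden-layer network on the
# given columns and returns its validation MSE (lower is better).
```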
• 12 pH Sensors were Fine Rougher, Fine Rougher Feed, Fine Scavenger Feed, Fine Cleaner Scavenger 1, Cleaner Scavenger 2A, Cleaner Scavenger 3, Coarse Middlings, Coarse Rougher, Coarse Scavenger 1, Coarse Scavenger 2A, Coarse Sands, and Deslime Conditioning Tank.
• 5 Frother Flowrate Sensors were Module 1 Conditioning Tank, Module 1 Settling Column, Module 2 Conditioning Tank, Module 2 Settling Column, and Flash Flotation.
• 19 Guar Flowrate Sensors were Coarse Conditioning Tank, Coarse Scavenger 1, Coarse Scavenger 2A, Coarse Sands, Coarse Rougher, Cleaner Scavenger Rougher, Cleaner Scavenger 2A Feed, Cleaner Scavenger 3, Module 1 Settling Column, Module 1 Fine Rougher Feed, Module 1 Fine Scavenger Feed, Module 2 Settling Column, Module 2 Conditioning Tank, Primary Cleaner Bank 1, Primary Cleaner Bank 2, Primary Recleaner Bank 1, Primary Recleaner Bank 3, Secondary Cleaner, and Secondary Recleaner.
• 16 Xanthate Flowrate Sensors were Coarse Rougher, Coarse Middlings, Coarse Scavenger 1, Coarse Scavenger 2A Feed, Coarse Sands, Coarse Conditioning Tank, Module 1 Conditioning Tank, Module 1 Fine Rougher Feed, Module 1 Fine Scavenger Feed, Module 2 Conditioning Tank, Module 2 Settling Tank, Cleaner Scavenger Rougher, Cleaner Scavenger Middlings, Cleaner Scavenger 1, Cleaner Scavenger 2A, and Cleaner Scavenger 3.
• 12 Acid Flowrate Sensors were Fine Rougher, Fine Rougher Feed, Fine Scavenger Feed, Cleaner Scavenger 1, Cleaner Scavenger 2A, Cleaner Scavenger 2B, Coarse Rougher, Coarse Middlings, Coarse Sands, Coarse Scavenger 2A, Coarse Scavenger, and Deslime Conditioning Tank.
References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Yu, Y., 2016. TensorFlow: A System for Large-Scale Machine Learning. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, USA.
Aldrich, C., Marais, C., Shean, B.J., Cilliers, J.J., 2010. Online monitoring and control of froth flotation systems with machine vision: A review. Int. J. Miner. Process. 96 (1–4), 1–13.
Alibrahim, H., Ludwig, S.A., 2021. Hyperparameter Optimization: Comparing Genetic Algorithm against Grid Search and Bayesian Optimization. In: 2021 IEEE Congress on Evolutionary Computation (CEC), pp. 1551–1559.
Alpaydin, E., 2014. Dimensionality Reduction. In: Introduction to Machine Learning. The MIT Press, pp. 115–160.
Amini, E., Becerra, M., Bachmann, T., Beaton, N., Shapland, G., 2021. Development and Reconciliation of a Mine Operation Value Chain Flowsheet in IES to Enable Grade Engineering and Process Mass Simulations for Scale-up and Strategic Planning Analysis. Mining, Metallurgy & Exploration 38, 721–730. https://doi.org/10.1007/s42461-020-00303-9.
Baek, J., Choi, Y., 2019. Deep Neural Network for Ore Production and Crusher Utilization Prediction of Truck Haulage System in Underground Mine. Applied Sciences 9 (19). https://doi.org/10.3390/app9194180.
Bishop, C., 2006. Neural Networks. In: Pattern Recognition and Machine Learning. Springer, pp. 225–284.
Byrd, R.H., Nocedal, J., Waltz, R.A., 2006. KNITRO: An Integrated Package for Nonlinear Optimization. Springer, Boston, MA.
Chollet, F., 2015. Keras. GitHub repository. https://github.com/fchollet/keras.
Chowdhury, M.Z.I., Turin, T.C., 2020. Variable selection strategies and its importance in clinical prediction modelling. Family Medicine and Community Health 8 (1), e000262.
De Ryck, T., Lanthaler, S., Mishra, S., 2021. On the approximation of functions by tanh neural networks. Neural Networks 143, 732–750. https://doi.org/10.1016/j.neunet.2021.08.015.
GE Digital, 2022. Proficy Software Family. https://www.ge.com/digital/applications/proficy-software-family (accessed 30 June 2022).
Goodfellow, I., Bengio, Y., Courville, A., 2016. Regularization for Deep Learning. In: Deep Learning. MIT Press, pp. 224–270.
Guyon, I., Elisseeff, A., 2003. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3, 1157–1182.
Hastie, T., Tibshirani, R., Friedman, J., 2008. Chapter 3: Linear Methods for Regression. In: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, p. 80.
Hastie, T., Tibshirani, R., Friedman, J., 2008a. Chapter 14: Unsupervised Learning. In: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, p. 536.
He, X., Zhao, K., Chu, X., 2021. AutoML: A survey of the state-of-the-art. Knowl.-Based Syst. 212, 106622.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer Feedforward Networks are Universal Approximators. Neural Networks 2 (5), 359–366.
Keskar, N.S., Socher, R., 2017. Improving Generalization Performance by Switching from Adam to SGD. arXiv preprint arXiv:1712.07628.
Kingma, D.P., Ba, J., 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
Koh, E.J., Amini, E., McLachlan, G.J., Beaton, N., 2021a. Utilising a deep neural network as a surrogate model to approximate phenomenological models of a comminution circuit for faster simulations. Miner. Eng. 170, 107026. https://doi.org/10.1016/j.mineng.2021.107026.
Koh, E.J.Y., Amini, E., McLachlan, G.J., Beaton, N., 2021b. Utilising convolutional neural networks to perform fast automated modal mineralogy analysis for thin-section optical microscopy. Miner. Eng. 173, 107230.
Kohavi, R., John, G.H., 1997. Wrappers for feature subset selection. Artif. Intell. 97 (1–2), 273–324.
LeCun, Y., Bottou, L., Orr, G.B., Müller, K.-R., 1998. Efficient BackProp. In: Neural Networks: Tricks of the Trade. Springer, pp. 9–48.
Lundberg, S., Lee, S.-I., 2017. A Unified Approach to Interpreting Model Predictions. In: Advances in Neural Information Processing Systems.
McCoy, J.T., Auret, L., 2019. Machine learning applications in minerals processing: A review. Miner. Eng. 132, 95–109.
Mohanty, S., Das, S.K., Majumder, A.K., 2019. Artificial neural network modelling and experimental investigation to characterize the dewatering performance of a hydrocyclone. Mineral Processing and Extractive Metallurgy, 1–13. https://doi.org/10.1080/25726641.2019.1680177.
Nakhaei, F., Irannajad, M., 2013. Comparison between Neural Networks and Multiple Regression Methods in Metallurgical Performance Modelling of Flotation Column. Physicochemical Problems of Mineral Processing 49 (1), 255–266.
Nguyen, A., Schulze, H.-J., 2003. Colloidal Science of Flotation. CRC Press.
Raschka, S., 2018. Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning. arXiv preprint arXiv:1811.12808.
Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y.L., Tan, J., Kurakin, A., 2017. Large-Scale Evolution of Image Classifiers. In: International Conference on Machine Learning 2017, Sydney, Australia.
Sadat, H.F., Aliakbar, A., Bahram, R., 2018. Semi-autogenous mill power prediction by a hybrid neural genetic algorithm. Journal of Central South University 25, 151–158. https://doi.org/10.1007/s11771-018-3725-8.
Sánchez-Maroño, N., Alonso-Betanzos, A., Tombilla-Sanromán, M., 2007. Filter methods for feature selection – a comparative study. In: International Conference on Intelligent Data Engineering and Automated Learning, Berlin, Germany.
Sanjay, Y., Shukla, S., 2016. Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. In: IEEE 6th International Conference on Advanced Computing. IEEE, pp. 78–83.
Stange, W., Bye, A., Beaton, N., Groutsch, J., Manlapig, E., 2014. A roadmap for simulation. In: XXVII International Mineral Processing Congress – IMPC 2014. Gecamin Digital Publications, Santiago, Chile, pp. 127–137.
Sutskever, I., Martens, J., Dahl, G., Hinton, G., 2013. On the importance of initialization and momentum in deep learning. In: International Conference on Machine Learning. PMLR, pp. 1139–1147.
Toor, P., Powell, M.S., Hilden, M., Weerasekara, N.S., 2015. Understanding the effects of liner wear on SAG mill performance. The Australasian Institute of Mining and Metallurgy 2015.
Zhou, P., Feng, J., Ma, C., Xiong, C., Hoi, S., Weinan, E., 2020. Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning. arXiv preprint arXiv:2010.05627.