
Client Profiling for an Anti-Money Laundering System

Claudio Alexandre
Faculdade de Ciências da Universidade de Lisboa
BioISI-MAS
Lisbon, Portugal
Email: calexandre@di.fc.ul.pt

João Balsa
Faculdade de Ciências da Universidade de Lisboa
BioISI-MAS
Lisbon, Portugal
Email: jbalsa@ciencias.ulisboa.pt

arXiv:1510.00878v2 [cs.LG] 11 Jan 2016

Abstract: We present a data mining approach for profiling bank clients in order to support the detection of money laundering operations. We first present the overall system architecture, and then focus on the component relevant to this paper. We detail the experiments performed on real-world data from a financial institution, which allowed us to group clients into clusters and then generate a set of classification rules. We discuss the relevance of the client profiles found and of the generated classification rules. According to the defined overall agent-based architecture, these rules will be incorporated in the knowledge base of the intelligent agents responsible for signaling suspicious transactions.

Index Terms: anti-money laundering; data mining; classification; customer clustering; multiagent systems; suspicious transactions

I. INTRODUCTION

Preventing and fighting money laundering (ML) crimes is a priority for almost every government in the world, at the same level as the most relevant global issues. Money laundering is a crime that typically consists in turning an illegal financial gain into an apparently legal one. According to the United Nations Office on Drugs and Crime (UNODC), the annual global estimate of laundered money is about 2% to 5% of the Gross World Product, or US$800 billion to US$2 trillion [1]. As if the financial volume were not enough, another reason for governments to focus on this crime is that it is clearly connected to other types of crime such as illegal drug trade, fraud, corruption, kidnapping, terrorism, and arms smuggling, among others.

Most countries' financial authorities, usually central banks, are responsible for controlling and defining anti-money laundering (AML) regulations, demanding from financial institutions the implementation of procedures that apply the defined norms. However, the constant increase in the number of financial transactions, along with the high frequency of publication of new national and international rules, reduces the efficiency and timeliness of the fight and prevention activity.

Institutions already use semi-automated processes to flag suspicious ML transactions, based on medians and predetermined standard irregularities. Still, due to the sophistication of this criminal activity, the most critical part of the process is still performed by human analysts. The main goal of this doctoral project, as detailed in [2], is to make the identification and analysis of suspect transactions more agile by using data mining techniques and intelligent agents, reducing the need for human intervention.

In this paper, we focus on the results related to one of the system's components, which has to do with the identification of the behavior patterns of a financial institution's customers, using the entire database of checking account transactions made over a year. In section II, we provide some context for our work, presenting the system where the present research is used. In section III, we present some related work on the use of data mining techniques in financial applications.

II. AML MULTIAGENT SYSTEM PROPOSED

A. Problem Definition

Money laundering is characterized by a set of commercial or financial operations that aim to incorporate into each country's economy, in a transitory or permanent way, illicitly obtained resources, goods or values.

The role of financial institutions is to find ways to identify, among the huge number of operations that occur every day, those suspicious transactions and then investigate them in more detail.

The motivation for using an intelligent agent based solution results from the analysis of the problem, as described in [2], and, mainly, from the observation that some of the tasks that we want to automate (at least partially) match perfectly the principles behind the definition of a multi-agent system [3]. We need a set of entities (agents) with autonomy to accomplish specific tasks and that keep contact with other agents in order to reach a common objective. Every agent must have its own knowledge and be able to ponder and come to an intelligent decision. Besides, they need to be scalable and flexible [4].
B. System Architecture

We consider two groups of agents, according to their role in the process. The first one is a group responsible for capturing suspicious transactions (CST), while the second is responsible for analyzing suspicious transactions (AST) identified by agents from the first group. Fig. 1 shows a schematic view and the global architecture flow of the proposed system.

For the CST to work, it is necessary to know customers' behavior patterns so that we can establish controlling rules. Other rules resulting from the defined norms made by the financial system regulator entities will be included in these controlling rules. Fig. 1 also shows, in highlight, the step called learning that incorporates the method we describe in this paper.

Fig. 1. General Architecture of the Proposed System

III. RELATED WORK

From the work that has been done in the application of data mining techniques to the finance domain, we select here some of the most relevant for our research.

Considering the data transformation process, which was especially critical in our work, in [5], a discretization process was also applied to the data set in order to find a more adequate set of clusters. Resources are mapped to an (n+2)-dimensional Euclidean space, n being the number of customer attributes, with one dimension for time and another for transactions. Customers' transactions are projected according to time, accumulating transactions and their frequency to create a histogram. Clusters are created based on segments of the histogram. Analyses of local and global correlations are then applied to detect suspicious patterns. A good way to analyze individual behavior and/or group behavior is to examine their operations to detect suspicious behaviors related to abnormal peaks in the histogram. However, when it is necessary to analyze a large number of clients and transactions over a long period of time, it may become difficult to detect suspicious cases, since there might be few peaks or none at all in the histogram.

The implementation proposal in [6] is an SVM extension and a matrix with massive dimensionality was created. In theory, a positive factor of this approach is that it can handle heterogeneous data sets; however, the matrix dimensionality left many doubts about the performance [7].

In [8], the authors propose an extension of the support vector machine (SVM) [9] to detect customers' abnormal behavior. A combination of an RBF kernel (radial basis function) is presented with improvements [10] as a definition for distinct distances [11] and supervised and unsupervised SVM algorithms. A one-class SVM [9] is an unsupervised learning method used to detect anomalous values in a group of data without classes. The advantage of this approach is that it can deal with heterogeneous datasets.

A combination of clustering and MLP (multilayer perceptron) was proposed by [12]. A simple center-based clustering technique is used to detect suspicious cases of money laundering. This technique is based on two main characteristics, which are then used as input to the MLP creation process. The preliminary results show that this approach is efficient. However, the number of characteristics and training patterns is too small and that could affect precision.

In [7], the authors present a case study corresponding to the application of a knowledge-based solution that combines data mining techniques, clustering, neural networks and genetic algorithms to detect money-laundering patterns.

IV. DATA MINING PROCESS

Basically, what differentiates the types of learning is the existence or not of a class attribute labeling the records of the data set used. When this class exists, the learning process is supervised; when only part of the examples have this attribute, it is semi-supervised; and when these labels don't exist at all, the learning process is unsupervised [13]. Fig. 2 illustrates this hierarchy and highlights the path chosen in this paper.

Fig. 2. Inductive Learning Hierarchy

A. Choice of tools and environment definition

The data used in this paper do not have a class attribute (unsupervised). The initial goal is to discover patterns from
any regular characteristic in the dataset (clustering), trying to form groups of clients with similar characteristics that are mutually exclusive (partitional).

Despite the classic problem associated with the K-means algorithm [14], the necessity of defining in advance the number of clusters to be used, it is one of the most used methods, perhaps for its simplicity, efficiency and for being present in almost every platform that implements and automates the data mining process.

Another point that needs to be looked at when talking about K-means is that we obtain better results when dealing with continuous numerical attributes than with nominal attributes. The reason is that, originally, it uses the squared Euclidean distance to calculate proximity, and this calculation cannot be made for nominal attributes.

The ease of use of the WEKA tool (Waikato Environment for Knowledge Analysis), a product of the University of Waikato (New Zealand), along with the support offered in every step of the data mining process, a good graphical interface, and native implementations of many clustering algorithms, including SimpleKMeans [15], led to its choice as the platform where we conducted our experiments.

WEKA's K-means version, for either Euclidean or Manhattan distance, uses the closest neighbor technique to try to diminish problems with nominal attributes. The general rule is that for two values X and Y of a numerical attribute, the result of X-Y is used in the distance calculation. When the attribute is nominal, the value 0 is assigned when X and Y are the same and 1 when they differ.

B. Preprocessing and Clustering – Phase 1

In a first phase, the dataset used in this paper comes from a financial institution and represents the account movements over a period of three months. The most relevant tables in this dataset model are the transaction and register ones, with 14.5 million and 4.5 million rows, respectively. In the preprocessing step, data was grouped by customer with numerical attributes that indicated the monthly average of services used, transactions made, debit transactions made, and credit transactions made. Besides, we included the average monetary value of these transactions. For each one of these attributes the standard deviation is also used, since there is a major variation between the minimum and maximum values. This table, called customer profiles, resulted in 1.6 million rows.

Despite the already mentioned restrictions when using nominal attributes in cluster development with K-means, we decided to separate the data and test both scenarios, numerical and nominal attributes. The motivation for this decision was the big variation between the minimum and maximum attribute values mentioned above, where the biggest variation stood between a minimum of 0.01 and a maximum of 536,852,446.89.

As evaluation measures to define the initial number of clusters, we used the Silhouette Coefficient and the SSE (sum of squared errors), which indicated five and seven clusters. Fig. 3 and Fig. 4 show the results.

Fig. 3. Clusters evaluation (numeric attributes) - Phase 1

Fig. 4. Clusters evaluation (nominal attributes) - Phase 1

In this case, the clusters created with nominal attributes showed more coherent groups of customers, for instance: old customers with a high use of services, high monetary values involved, and credited values rapidly withdrawn (Cluster 1); customers whose accounts are less than 4 years old, with high use of services and a high quantity of entries, but low monetary values and credited values rapidly withdrawn (Cluster 3).

C. Rules Generation and Evaluation – Phase 1

In the subsequent step of the process, rule generation, the PART algorithm was used, which is also included in the WEKA environment. It builds a partial C4.5 decision tree in each iteration and turns the best leaf into a rule [16]. In the executions all the default values suggested by WEKA were used.

Fig. 5 shows that, in this case, the metrics had an inverse result, meaning that the clusters based on numerical attributes had better results. Nevertheless, the rules generated, in both cases, proved inappropriate for the research goal.

Fig. 6 shows a few examples of rules of little use for the proposed study, either because they are too simple to help in taking a decision, or because they are complex but cover few examples, which also makes them disposable.

D. Preprocessing and Clustering – Phase 2

Considering that the obtained results were unsatisfactory, a new strategy was adopted. This consisted of creating a new profile table; using numerical attributes, but more directed towards the research goal; using a larger database; and experimenting with other rule-generating algorithms.
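To make the idea of a customer profile table concrete, the sketch below aggregates raw transactions into per-customer monthly averages and standard deviations, in the spirit of the attributes listed for Phase 1. It is a minimal illustration in plain Python and not the authors' implementation: the record layout `(customer_id, month, kind, amount)`, the function name `build_profiles` and the toy records are assumptions made for this example, the services-used attribute is omitted, and months without any transaction are simply not counted.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_profiles(transactions):
    """Aggregate (customer_id, month, kind, amount) records into profiles.

    The record layout is hypothetical; kind is "credit" or "debit".
    """
    # customer -> month -> list of (kind, amount)
    by_month = defaultdict(lambda: defaultdict(list))
    for customer, month, kind, amount in transactions:
        by_month[customer][month].append((kind, amount))

    profiles = {}
    for customer, months in by_month.items():
        # One count per active month, then average over months.
        totals = [len(rows) for rows in months.values()]
        credits = [sum(1 for kind, _ in rows if kind == "credit")
                   for rows in months.values()]
        debits = [sum(1 for kind, _ in rows if kind == "debit")
                  for rows in months.values()]
        amounts = [amount for rows in months.values() for _, amount in rows]
        profiles[customer] = {
            "tx_mean": mean(totals), "tx_std": pstdev(totals),
            "credit_mean": mean(credits), "credit_std": pstdev(credits),
            "debit_mean": mean(debits), "debit_std": pstdev(debits),
            "amount_mean": mean(amounts), "amount_std": pstdev(amounts),
        }
    return profiles


# Toy records, invented for the example.
sample = [
    ("c1", 1, "credit", 100.0),
    ("c1", 1, "debit", 40.0),
    ("c1", 2, "credit", 60.0),
    ("c2", 1, "debit", 10.0),
]
profiles = build_profiles(sample)
for customer, profile in sorted(profiles.items()):
    print(customer, profile)
```

Keeping the standard deviation next to each average mirrors the reasoning in the text: with attribute values spanning several orders of magnitude, the mean alone says little about how regular a customer's behavior is.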

Information on the monetary value involved in the transaction, financial transfers between banks, and the timing of incoming and outgoing financial resources became part of the new customer profile table. The new checking accounts database incorporated transactions of the whole year of 2014, and the main tables, transactions and register, grew to 90.6 million and 5.1 million rows, respectively.

Fig. 6. Rules Generated by Algorithm PART – Phase 1

The compliance analysts from the financial institution that provided the database, besides confirming the importance of the defined attributes, identified the transactions that have no connection with money laundering, for example, charges made by the bank. With this definition, the number of rows in the table is reduced and the clusters generated are more specialized in transactions that might actually correspond to money-laundering operations.

The new customer profile table was left with 2.4 million rows after the grouping and removal of the insignificant transactions. In the search for the adequate number of clusters, SimpleKMeans was executed 10 times, and the values found for the Silhouette Coefficient [17], SSE, VRC (variance ratio criterion) [18], Van Dongen and Rand [19] metrics were analyzed. Fig. 7 shows these results.

Fig. 7. Clusters evaluation (numeric attributes) - Phase 2

The Silhouette, SSE and VRC metrics indicate six clusters as the ideal number, while Van Dongen and Rand indicate seven clusters. Both Silhouette and SSE show a stability line starting from number six, which can corroborate number seven, identified by the other metrics. However, the standard procedure is to make the choice at the curve "elbow" or at the highest value, depending on the metric.
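The selection procedure described above, running K-means for several candidate numbers of clusters and comparing SSE and the Silhouette Coefficient, can be sketched in a few lines. This is a hedged, self-contained illustration in plain Python rather than WEKA's SimpleKMeans: the deterministic farthest-point seeding, the helper names and the twelve toy points are assumptions made for this example, and the other metrics mentioned in the text (VRC, Van Dongen, Rand) are left out.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain K-means with farthest-point seeding (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


def sse(points, labels, centers):
    """Sum of squared distances from each point to its cluster center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Three well-separated toy groups of four points each (invented data).
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx in (0, 1) for dy in (0, 1)]

results = {}
for k in (2, 3, 4):
    labels, centers = kmeans(points, k)
    results[k] = (sse(points, labels, centers), silhouette(points, labels))
    print(k, results[k])

# Silhouette is read at its highest value; SSE is read at the curve "elbow".
best_k = max(results, key=lambda k: results[k][1])
```

SSE keeps falling as clusters are added, which is why it is read at the elbow, whereas the silhouette peaks at the number of clusters that best separates the groups.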

Fig. 5. Metrics of Rules Generation with Algorithm PART – Phase 1

All the algorithm executions were made using the database Split function in the proportion of 66% for training and 34% for testing. The six clusters generated show excellent customer grouping, allowing identifications such as:

Cluster #3 – Standard Customer: the biggest group of customers, with high use of services and transaction values indicating intermediate customers. The money flows into the account and is withdrawn in the following days;

Cluster #4 – Group of Risk: high quantity of transactions with low use of services; low financial values; money flows in and, on the same day or within a short time, is transferred to another financial institution.

The small difference between six and seven for the suggested number of clusters indicates the need to verify the result with seven clusters. The redistribution of instances for the creation of the seventh cluster didn't affect the basic characteristics of the first six clusters. In proportional terms, Cluster #3 was the one that contributed the most elements to the creation of Cluster #7, whose characteristics may be defined as follows:
Cluster #7 – Group of Risk 2: older accounts profile with great use of services and great volume of transactions. Financial values concentrated on areas called "legal limits". A bigger percentage of outgoing financial resources, although with a low rate of transfers to other institutions. High rate of transfers between accounts of the same institution.

Because of the characteristics presented by Cluster #7, its maintenance is important in the system configuration; thus, we started working with the creation of seven clusters. Table I shows that this choice doesn't change the quality of the result.

TABLE I
DISTRIBUTION OF INSTANCES BY CLUSTER

Executing the algorithm with the generated cluster evaluation function, we obtained a confusion matrix with a level of accuracy above 99%, considering the incorrect classification rates of 0.0683% and 0.0596% for the training base and testing base, respectively, as shown in Fig. 8 and Fig. 9.

Fig. 8. Clusters Assessment Report (training instances)

Fig. 9. Clusters Assessment Report (test instances)

For rule generation, experiments were made with the PART, J48 [20] and JRip [21] algorithms, using the WEKA tool default parameters and restricting the rule coverage to 100 and 1000 instances. For each of these options, the PART and J48 algorithms were also executed with the "reducedErrorPruning" option activated; in the JRip algorithm this option is already part of the implementation.

This configuration (15 experiments) was executed twice: one using database Split in the proportion of 66% for training and 34% for testing; and another with cross-validation (10 folds). Table II, given in the Appendix, shows the result of the 30 executions. Fig. 10 presents the rules generated, which also don't have the expected quality, with strange repetitions of attributes or attribute conflicts.

Fig. 10. Rules Generated by Algorithm PART – Phase 2

Following these results we decided to separate the profile values into groups, now with the class attribute, and perform the tests again. The profiles were separated into three value groups with an equivalent percentage of instances, where possible. Some attributes, because of a concentration of occurrences of one particular value, were divided into only two groups. The experiments were performed executing the same algorithms and using the same parameters as the previous experiment. Table III, given in the Appendix, presents these results.

The J48 and PART algorithms present the best results in the group of experiments using Split and cross-validation, respectively. In both cases the number 1,000 limited the
minimum instances per rule for PART and the minimum instances per leaf of the J48 tree. Although the difference is not significant, the indicators presented better results compared with the previous experiments.

Fig. 12. Algorithm PART (ROC Area & Kappa Statistic)
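The separation of profile values into three groups with roughly equal shares of instances, falling back to two groups when one value dominates, resembles equal-frequency binning. The text does not state how the cut points were computed, so the sketch below is only one plausible reading: the function names `equal_frequency_cuts` and `discretize`, the rule for dropping repeated cut points, and the sample values are all assumptions made for this example.

```python
def equal_frequency_cuts(values, bins=3):
    """Return cut points that split values into up to `bins` groups of similar size.

    A cut point that repeats the minimum or a previous cut is dropped, so an
    attribute dominated by one value ends up with fewer groups.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, bins):
        cut = ordered[(i * n) // bins]
        if cut > ordered[0] and (not cuts or cut > cuts[-1]):
            cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Map a numeric value to a group index: the number of cut points it reaches."""
    return sum(value >= c for c in cuts)


# Evenly spread values give three groups of equal size.
even = list(range(1, 10))
even_cuts = equal_frequency_cuts(even)
even_groups = [discretize(v, even_cuts) for v in even]

# A concentration of zeros collapses the split into two groups.
skewed = [0, 0, 0, 0, 0, 5, 6, 7, 9]
skewed_cuts = equal_frequency_cuts(skewed)
skewed_groups = [discretize(v, skewed_cuts) for v in skewed]

print(even_cuts, even_groups)
print(skewed_cuts, skewed_groups)
```

Each group index can then be treated as a nominal value, which is the form rule learners such as PART and J48 handle most directly.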

To extend the experiment and verify the behavior of these indicators, the PART and J48 algorithms were executed 22 times each, varying the minimum number of instances per rule and per leaf of the tree. The minimal number is each algorithm's default and the maximal number is the size of the smallest cluster generated (2 – 40,000).
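Two of the indicators tracked in these runs, the percentage of correctly classified instances and the Kappa statistic (both named in the captions of Figs. 11 to 14), can be derived from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa. The function name `accuracy_and_kappa` and the 2x2 matrix are assumptions made for this example, and the numbers are toy values, not results from the paper.

```python
def accuracy_and_kappa(matrix):
    """Return (percent correct, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts instances of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return 100 * observed, kappa


# Toy confusion matrix, invented for the example.
percent_correct, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(percent_correct, kappa)
```

Kappa corrects the raw percentage for agreement that would occur by chance, which matters when clusters differ greatly in size and a rule set could score well by favoring the largest one.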

Figures 11 to 14 show the experiment results and demonstrate that the measure of rule quality is inversely related to the increase of the minimum quantity of instances for the coverage of rules or for the leaves of the tree.

Fig. 11. Algorithm PART (Number of Rules & Percent Correct)

Fig. 13. Algorithm J48 (Number of Rules & Percent Correct)

Fig. 14. Algorithm J48 (ROC Area & Kappa Statistic)

With this experiment we understood that the results shown in Table III can be used for generating and analyzing rules. An analysis of the rules generated by the J48 and PART algorithms, using the parameters of the best result, makes it possible to notice an improvement in quality: the rules are more complete and free of redundancies. Fig. 15 shows an example of the generated rules.

V. CONCLUSION

In this work we presented a learning component of an anti-money laundering system. The ultimate goal of our work is to have a tool that can assist financial institutions in the prevention and fight of money-laundering activities. We detailed the activities related to account movement databases in order to build client profiles, clusters, and the subsequent generation of rules that will be part of the knowledge bases of the intelligent agents responsible for the identification of suspicious transactions.

The quest for general client profiles, allied to the use of a database that covered a small time span (3 months), produced good results for both the cluster evaluation metrics and generated rules. Nevertheless, precision was not as good as we expected.
The definition of client profiles that are more tailored to the system's goal, with a database of a greater time span (1 year) and a more thorough exploration of the types of attributes available for the used algorithms, produced better results. It was possible to build clusters that represent risk groups and rules both more discriminating and with a wider coverage, regarding the profiles' attributes. This improvement was validated by a human specialist from the financial institution that provided the data.

Fig. 15. Rules Generated by Algorithms PART and J48 – Phase 2

REFERENCES

[1] United Nations Office on Drugs and Crime, "UNODC annual report 2014," Online, 2014, accessed on Jul. 10, 2015. [Online]. Available: https://www.unodc.org/documents/AnnualReport2014/Annual Report 2014 WEB.pdf
[2] C. Alexandre and J. Balsa, "A multiagent based approach to money laundering detection and prevention," in Proceedings of the International Conference on Agents and Artificial Intelligence, S. Loiseau, J. Filipe, B. Duval, and H. J. van den Herik, Eds., vol. 1. Lisbon: SciTePress, 2015, pp. 230–235. [Online]. Available: http://www.scitepress.org/portal/PublicationsDetail.aspx?ID=pJRstwtoDBg=&t=1
[3] M. Wooldridge, An Introduction to Multiagent Systems, 2nd ed. Chichester, UK: Wiley Publishing, 2009.
[4] Y. Demazeau, "From interactions to collective behaviour in Agent-Based systems," in Proceedings of the 1st European Conference on Cognitive Science. Saint-Malo, 1995, pp. 117–132.
[5] Z. M. Zhang, J. J. Salerno, and P. S. Yu, "Applying data mining in investigating money laundering crimes," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '03. New York, NY, USA: ACM, 2003, pp. 747–752.
[6] J. Kingdon, "AI fights money laundering," IEEE Intelligent Systems, vol. 19, no. 3, pp. 87–89, May 2004.
[7] N. A. Le Khac and M.-T. Kechadi, "Application of data mining for anti-money laundering detection: A case study," in Proceedings of the 2010 IEEE International Conference on Data Mining Workshops. Washington, DC, USA: IEEE Computer Society, 2010, pp. 577–584.
[8] J. Tang and J. Yin, "Developing an intelligent data discriminating system of anti-money laundering based on SVM," in Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on, vol. 6, Aug 2005, pp. 3453–3457.
[9] B. Scholkopf, "A short tutorial on kernels," Microsoft Research, Tech. Rep. MSR-TR-2000-6t, 2000.
[10] B. Scholkopf, J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443–1471, Jul 2001.
[11] D. R. Wilson and T. R. Martinez, "Improved heterogeneous distance functions," J. Artif. Int. Res., vol. 6, no. 1, pp. 1–34, Jan 1997.
[12] N.-A. Le-Khac, S. Markos, and M. T. Kechadi, "Towards a new data mining-based approach for anti-money laundering in an international investment bank," in Digital Forensics and Cyber Crime - First International ICST Conference (ICDF2C). Albany, NY, USA: Springer, 2009, pp. 77–84.
[13] J. Metz, "Interpretação de clusters gerados por algoritmos de clustering hierárquico," Dissertação de Mestrado em Ciências de Computação e Matemática Computacional, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, ago 2006. [Online]. Available: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-14092006-090701
[14] G. Hamerly and C. Elkan, "Alternatives to the k-means algorithm that find better clusterings," in Proceedings of the Eleventh International Conference on Information and Knowledge Management. New York, NY, USA: ACM, 2002, pp. 600–607.
[15] D. Arthur and S. Vassilvitskii, "K-means++: The advantages of careful seeding," in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2007, pp. 1027–1035.
[16] E. Frank and I. H. Witten, "Generating accurate rule sets without global optimization," in Proceedings of the Fifteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998, pp. 144–151.
[17] P. Rousseeuw, "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis," J. Comput. Appl. Math., vol. 20, no. 1, pp. 53–65, Nov 1987.
[18] T. Calinski and J. Harabasz, "A dendrite method for cluster analysis," Communications in Statistics-Simulation and Computation, vol. 3, no. 1, pp. 1–27, 1974.
[19] S. Wagner and D. Wagner, "Comparing clusterings – an overview," Universität Karlsruhe (TH), Tech. Rep. 2006-04, 2007. [Online]. Available: http://digbib.ubka.uni-karlsruhe.de/volltexte/1000011477
[20] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.
[21] W. W. Cohen, "Fast effective rule induction," in Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann, 1995, pp. 115–123.
APPENDIX

TABLE II
EXPERIMENT FOR GENERATION OF RULES (NUMERICAL ATTRIBUTES)

TABLE III
EXPERIMENT FOR GENERATION OF RULES (NOMINAL ATTRIBUTES)
