MSc Intelligent Systems
Fuzzy Logic Assignment 1: Practical
Tutors: Prof. Francisco Chiclana Parrilla, Dr Jenny Carter
Samuel Keays
1-1-2014
Contents
Background
Input Variables
Name delta
Address delta
NI Number edit distance
Home phone number edit distance
Geographical location
Gender
Output Variable
Rules:
Defuzzification
Experiments and Tweaking:
Conclusion
Bibliography
Background
The aim of this report is to present a fuzzy logic system that produces reasonable estimates of how closely two customer records match, by defining rules that allow matching to take place.
Master Data Management (MDM) (Haselden, 2006) is a type of software solution that has developed rapidly and has attracted investment from many large corporations and government agencies. In essence it is a process whereby multiple database sources (often from widely different systems such as CRMs and product catalogues) are mapped together to form a single view of a customer or a product. This can take the form either of a registry system, where the master database merely points to the records it is mastering (and sends messages to inform them of the need to update or change), or of a physical (repository) model, where an actual database is constructed and used as the underlying data model for all other systems once the outlying systems have been processed into it (Wolter, 2007). Data can be loaded either as a single batch file (the initial load) or as deltas, which are files of the differences between two points in time (an especially important use case in registry-style MDM).
As an example of how this works, consider an individual called Joseph Ethelbert Bloggs. At a particular bank he has a current account and a loan. On his current account he may be called Joe E Bloggs; on his loan he is simply Joseph Bloggs. The task of the system is to match such entries to inform the bank that they refer to one and the same person. Naturally, many more data points are taken into account when matching; National Insurance numbers, for example, may be particularly relevant. Names are often standardised using tables that map common nicknames to their full names. Auxiliary tasks include linking households of people with the same address and so forth. Most systems also involve a degree of human interaction: so-called data stewards take the entries whose match the system is uncertain about and make a decision. Any candidate match therefore falls into one of three tiers: automatic match, review match and no match.
The matching itself involves deciding which records will be merged into one source. There are two main types of matching engine. One uses a rules-based system, as in Informatica's MDM solution (Lira, 2013). Others, such as IBM's Initiate, use probabilistic matching, which typically uses the mutual information of the data in the system to conclude how important an overlap between two records is (Whei-Jen Chen, 2014, pp. 167-185). My concern here is primarily with rules-based matching, because this is where, to my mind, there is some overlap with fuzzy logic. This is potentially a huge project with many research possibilities. For this report I want to demonstrate that simple fuzzy logic rules can be used to match potential duplicates with different degrees of certainty, based on six different correlations between the data in two entries, such as an edit distance or the number of (say) address fields that match. Note that this is different from fuzzy string matching, which is already used in such systems. Instead, it will be necessary to create a rule set that defines how close two names are by virtue of their edit distances, among other things. The system will largely consist of rules based on these metrics, which feed a fuzzy output set giving a score of how closely the two entries match.
An interesting question arises: is fuzzy logic necessary? The key benefit of fuzzy logic is that it allows uncertain concepts to be defined in a mathematical model. In this case, the linguistic term "is close" is by its very nature uncertain. Existing systems use complex rule bases with detailed rules about edit distances and various compound conditions. This report is therefore an attempt to see whether a small subset of that rule base (the names) can be reduced to a fuzzy system whose membership functions replicate these rules. If it can, then these very rules could potentially be captured by some kind of Computing with Words methodology. Instead of a user specifying that a name must be within an edit distance of 2, say, the user could simply state that the two names should be very close, with an appropriate relation being formed as a consequence by the fuzzy system. This could potentially reduce the time it takes to develop the rule set.
Input Variables
I have chosen 6 separate input variables:
Name delta
Address delta
NI Number edit distance
Home phone number edit distance
Geographical location
Gender
Note that the inputs are assumed to have already been standardised, for example nicknames mapped to full names. Where possible I tried to keep to the principle that the membership grades at each point should sum to one, so as not to produce strange results in the output where parameters are needlessly truncated (Lilly, 2010, pp. 15-16). For the output variable itself, however, this rule had to be broken in order to produce membership functions that gave useful centroids for the final results (that is, narrow Gaussians at the edges that give scores close to 0 and 1).
Name delta
For this input I specify the number of differing components between two aggregate names, made up of first name, middle name and last name. For example, the name Samuel John Keays would produce a value of 1 with Samuel Keays, a value of 2 with Joseph Peter Keays and a value of 3 with Elizabeth Mary Cook.
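As an illustration, the name delta could be computed along the following lines. This is a minimal Python sketch, not the code used for this report: it assumes names have already been standardised, compares first, middle and last components by position, and treats a missing middle name as a difference. `name_delta` is a hypothetical helper.

```python
def name_delta(name_a: str, name_b: str) -> int:
    """Count how many of the first, middle and last name components differ.

    Assumes names look like 'First Middle Last' or 'First Last'; a missing
    middle name is stored as an empty string, so it counts as a difference
    against a record that has one.
    """
    def components(name: str) -> tuple:
        parts = name.upper().split()
        if len(parts) == 1:
            return parts[0], "", ""
        if len(parts) == 2:
            return parts[0], "", parts[1]
        first, *middle, last = parts
        return first, " ".join(middle), last

    return sum(a != b for a, b in zip(components(name_a), components(name_b)))


print(name_delta("Samuel John Keays", "Samuel Keays"))
print(name_delta("Samuel John Keays", "Joseph Peter Keays"))
print(name_delta("Samuel John Keays", "Elizabeth Mary Cook"))
```

These three calls print 1, 2 and 3. Under this positional reading, a three-part name can differ in at most three components, which matches the [0 3] range used for this input.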
The membership functions are a pair of sigmoids with slope magnitude a = 3.75 and crossover point c = 1.75. A single matching name (a name delta of 2) is not particularly informative, so in this case it produces only a weak match grade: a lot of people share first names, middle names and last names. When two names match there is a good possibility of a match (especially if other inputs fire). Three names matching is a definite name match. The sigmoidal function maps this as shown in Appendix A. It took some experimenting to get a slope with the desired degree of curvature. The match function falls as the delta grows, and the second membership function, no match, is its mirror image, rising about the same crossover point c = 1.75.
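The pair of name-delta sets can be evaluated directly. The sketch below is a Python reimplementation of the sigmoid used by MATLAB's `sigmf`, 1/(1 + exp(-a(x - c))). It prints the match and no-match grades at each integer delta. The helper names are hypothetical. The two sets are mirror images about the same crossover point, so their grades always sum to one, in line with the principle stated under Input Variables.

```python
import math


def sigmf(x: float, a: float, c: float) -> float:
    """MATLAB-style sigmoid 1 / (1 + exp(-a (x - c))), arranged to avoid overflow."""
    z = -a * (x - c)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


A, C = 3.75, 1.75  # slope magnitude and crossover point for the name delta
for delta in range(4):
    nomatch = sigmf(delta, A, C)   # rises as more components differ
    match = sigmf(delta, -A, C)    # mirror image about the same crossover
    print(f"delta={delta}  match={match:.3f}  nomatch={nomatch:.3f}  sum={match + nomatch:.3f}")
```

Run as is, this gives a match grade of about 0.94 at a delta of 1 (two names in common) but only about 0.28 at a delta of 2 (one name in common).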
Address delta
For this input I specify the number of differing fields between two aggregate addresses, made up of address line 1, address line 2, address line 3, city and postcode.
The initial membership functions are sigmoidal, with slope a = 1 and crossover point c = 2.5, chosen to get the curves to sit flush with 1. Again, experimentation was needed to produce the appropriate curve, and the no-match membership function was again symmetrical, this time around 3, near the middle of the range. The curve has been made much gentler than for the name delta because even one matching field between different addresses is likely to suggest a reasonable chance of a match, although that field may only be the town. In a real system there would be more input variables for specific parts of the address, with additional rules: a matching postcode would give a high chance of matching, whereas a matching town would give a low chance. For the purposes of this report, only the address delta is considered.
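The address delta itself is a field-by-field comparison. This is a minimal sketch. It assumes each address is held as a dictionary with the hypothetical keys below, and it compares fields after trimming whitespace and ignoring case. A real MDM system would apply its own standardisation first. The addresses in the example are made up.

```python
ADDRESS_FIELDS = ("line1", "line2", "line3", "city", "postcode")  # hypothetical keys


def _norm(record: dict, field: str) -> str:
    """Upper-case a field and collapse runs of whitespace; missing fields become ''."""
    return " ".join(record.get(field, "").split()).upper()


def address_delta(a: dict, b: dict) -> int:
    """Number of the five address fields that differ (0 = identical, 5 = all differ)."""
    return sum(_norm(a, f) != _norm(b, f) for f in ADDRESS_FIELDS)


home = {"line1": "12 High Street", "city": "Leicester", "postcode": "LE1 9BH"}
same = {"line1": "12 high  street", "city": "LEICESTER", "postcode": "le1 9bh"}
moved = {"line1": "4 Mill Lane", "city": "Leicester", "postcode": "LE2 7DR"}
print(address_delta(home, same), address_delta(home, moved))
```

Here the reformatted copy of the same address has a delta of 0. The move within the same city has a delta of 2: address line 1 and the postcode differ.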
NI Number edit distance
NI stands for National Insurance. The NI number is often stored in employee or government records and is a nine-character mixture of alphabetic and numeric characters. Edit distance here means the Levenshtein distance, which, in a nutshell, counts the substitutions, insertions and deletions (each with a cost of 1) needed to turn one string into another (Black, 2013). The maximum NI edit distance is 9, as this is the length of the string (assuming all changes are substitutions). However, it is generally assumed that beyond 2 the chances of matching are very low, and NI numbers with an edit distance greater than 2 almost certainly refer to different people. I have therefore defined a very sharp set over the range 0 to 9. The no-match function is a steep sigmoid whose grade is effectively zero at distances 0 and 1, 0.5 at a distance of 2 and effectively 1 from 3 upwards. The match function is again its mirror image. The appropriate sigmoid had an a parameter of 20 and a c parameter of 2.
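The edit distance described above can be computed with the standard dynamic-programming recurrence. This is a minimal Python sketch of the Levenshtein distance as defined by Black (2013), with unit costs for insertion, deletion and substitution. It is not necessarily the implementation used in the MDM system. The NI numbers in the example are made up, in the usual two letters, six digits, one letter layout.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions (each costing 1) needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if the characters agree)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("QQ123456C", "QQ123456A"))  # one substitution
print(levenshtein("QQ123456C", "QQ123465C"))  # two digits transposed
```

A transposition of two adjacent digits costs 2 under plain Levenshtein, since it needs two substitutions. It therefore lands exactly on the crossover point of the sharp sigmoid.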
Home phone number edit distance
A phone number typically has 11 digits, so a maximum edit distance of 11 is expected. The membership functions use the same parameters as for the NI number.
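How sharp the shared a = 20, c = 2 sigmoid is can be seen by evaluating it at small edit distances. This is a hypothetical Python stand-in for MATLAB's `sigmf`, written to avoid overflow for large slopes.

```python
import math


def sigmf(x: float, a: float, c: float) -> float:
    """MATLAB-style sigmoid 1 / (1 + exp(-a (x - c))), arranged to avoid overflow."""
    z = -a * (x - c)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# a = 20, c = 2, as used for both NI number and home phone number edit distances.
for d in range(5):
    nomatch = sigmf(d, 20.0, 2.0)   # rises abruptly at d = 2
    match = sigmf(d, -20.0, 2.0)    # mirror image
    print(f"d={d}  match={match:.6f}  nomatch={nomatch:.6f}")
```

Distances of 0 and 1 behave as a full match, 2 is the 50/50 crossover, and 3 or more is effectively no match. In practice this makes the set close to a crisp threshold.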
Geographical location
Many systems hold geolocation data, and the great-circle distance between two addresses can be used to help match people. There are three membership functions:
- Identical is a sigmoid covering roughly 0 km to 3 km, to allow for measurement error.
- Moved is based on the assumption that people typically move within about 25 km, so it is a Gaussian membership function with a standard deviation of 20 km.
- Different location is a sigmoid running from the edge of identical to the end of the range.

Given that I am assuming UK data, a maximum distance of 1000 km seems reasonable.
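The text does not say how the great-circle distance is obtained. The haversine formula below is one standard way to compute it, assuming a spherical Earth with a mean radius of 6371 km. The three membership functions use the Appendix B parameters, including its `differntlocation` label spelling. All function names are hypothetical, and the coordinates are approximate city centres.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def sigmf(x, a, c):
    """MATLAB-style sigmoid 1 / (1 + exp(-a (x - c))), arranged to avoid overflow."""
    z = -a * (x - c)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def gaussmf(x, sigma, c):
    """Gaussian membership exp(-(x - c)^2 / (2 sigma^2)), as in MATLAB's gaussmf."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))


def geo_memberships(km: float) -> dict:
    """Grades for the three location labels, using the Appendix B parameters."""
    return {
        "identical": sigmf(km, -3.0, 1.5),
        "moved": gaussmf(km, 20.0, 25.0),
        "differntlocation": sigmf(km, 3.0, 1.5),
    }


d = great_circle_km(51.5074, -0.1278, 53.4808, -2.2426)  # central London to central Manchester
print(round(d), geo_memberships(d))
```

Evaluating the sets this way makes overlaps easy to inspect. For example, with these parameters `differntlocation` is already close to 1 at 25 km, where `moved` peaks.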
Gender
This is categorical data that takes the value -1 (no match) or 1 (match), with 0 meaning uncertain where the data is unknown. Each label is a steep sigmoid centred on 0. At 0 (unknown), both match and no match have a membership grade of 0.5. At 1 or -1, the corresponding label has a grade of effectively 1.
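Before it reaches the inference system, the pair of recorded genders has to be collapsed into this single -1/0/1 value. This is a minimal sketch. It assumes genders are stored as short codes such as "M" and "F", and that an unknown gender is stored as a missing or empty value. Both the codes and `gender_input` are hypothetical.

```python
from typing import Optional


def gender_input(g1: Optional[str], g2: Optional[str]) -> int:
    """Encode two recorded genders as the FIS input value:
    1 = both known and equal, -1 = both known and different,
    0 = at least one value missing (unknown)."""
    a = (g1 or "").strip().upper()
    b = (g2 or "").strip().upper()
    if not a or not b:
        return 0
    return 1 if a == b else -1
```

Mapping unknowns to 0 places them exactly on the crossover of the two sigmoids. There they contribute a grade of 0.5 to both match and no match rather than forcing either outcome.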
Output Variable
The output is a match decision: no match, refer to a human data steward, or match. Initially these are set as three evenly spaced Gaussians. The assumption is that applying the rules should produce evenly spaced output sets that defuzzify in a relatively straightforward manner because of their balanced nature. This should hold especially under centroid defuzzification, where the height of each Gaussian should shift the defuzzified value roughly linearly.
Rules:
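The number of potential rules is

$$R = \prod_{i=1}^{n} l_i$$

where n is the number of inputs and l_i is the number of linguistic labels for input i. When every input has the same number of labels l, this reduces to $R = l^n$ (Ross, 2004, p. 275). This produces:
2 × 2 × 2 × 2 × 3 × 2 = 96 rules, which, while tractable, can be reduced.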
To reduce the number of rules, I instead rely on expert judgement about when matches take place. In other words, the specific scenarios that generate the three types of outcome are deduced from my domain knowledge, translated into logical rules and processed as such. It should then be checked that every combination has some kind of effect, even if this is deliberately to ignore inputs that are not useful except in conjunction with another input.
Because each input has only a small number of membership functions (mainly two), the number of combinations is low. Rule-reduction methods such as the Combs method or SVD-based reduction are therefore unnecessary, and in any case they typically apply to Sugeno-type inference systems. However, once the two main scenarios have been defined, the OR operator helps reduce the number of conditions describing failure to match, which would otherwise take up the bulk of the rule base. Primarily, the rules have been minimised by considering the conditions under which expert opinion would categorise certain data, and then excluding the negative cases afterwards.
The following scenarios are regarded as appropriate for an automatic match, based on my own experience of configuring MDM systems. There are four situations in which automatic matching should take place:
1. When name delta is match and address delta is match and NI edit
distance is match then Matching is Automatch
When name, address and NI number all match then it is almost certainly the
same person.
2. When name delta is match and address delta is match and
Geographical distance is identical then Matching is Automatch
When name, address and geolocation data all match it is almost certainly the same person; the geolocation data corroborates a likely match.
3. When name delta is match and address delta is match and Gender
is match then Matching is Automatch
When name, address and gender all match it is almost certainly the same person; the gender data corroborates a likely match.
4. When name delta is match and NI edit distance is match and
Gender is match then Matching is Automatch
Name, NI number and gender together suggest a match, because this combination picks up the same individual after a change of address, using the NI number as a unique identifier.
There are also five situations where it should be sent to the data steward:
5. When name delta is match then matching is data steward
Any names that match fully should be investigated as they are highly
likely to be duplicates.
6. If address delta is match and Home Phone Number is match then
matching is data steward
7. If address delta is match and Geographical Distance is not
different location then matching is data steward
8. If Home Phone Number and Geographical distance match then
matching is data steward
These are all cases of someone else possibly living at the same address. A human data steward needs to investigate to determine whether they are different people and that the match is not due to data errors.
9. If NI number is match then matching is data steward
The NI number is a unique identifier, so any match should be investigated by a human user even if nothing else matches.
Note that gender or geographical information on its own is not enough to cause a data steward referral, and neither is the home phone number. This was a difficult call to make and may need to be revised when more information comes in. But intuitively, as a data steward, you would suspect that a matching home phone number not backed up by a matching location or address is a mistake.
To cover the remaining circumstances, I use the following rule to detect non-matching states:
10. If name delta is no match or address delta is no match or NI edit distance is no match or gender is no match then Matching is no match.
However, a no-match state for the home telephone number does not suggest very much on its own. The home phone number therefore only contributes to a no-match outcome in combination with other inputs:
11. If address delta is no match and home phone number is no match then Matching is no match
12. If geographical distance is not identical and home phone number is no match then Matching is no match
13. If NI edit distance is no match and home phone number is no match then Matching is no match
These all demand that the person in question is identified uniquely or has the
same address before a penalty is incurred for the home phone number.
14. If address delta is no match and geographical distance is not identical then Matching is no match
Here a non-identical geographical distance incurs a penalty only if the address also differs. Otherwise no rule fires, which acts as a filter against otherwise useless information.
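To make the connectives concrete, the sketch below evaluates four of the rules listed above (1, 7, 10 and 14) on one pair of records. It uses the usual Mamdani operators: AND as minimum, OR as maximum and NOT as the complement 1 - μ. Rules that share a consequent are combined with max. The membership grades are made up for illustration, and the helper names are hypothetical.

```python
def f_and(*grades: float) -> float:
    return min(grades)          # Mamdani AND: minimum


def f_or(*grades: float) -> float:
    return max(grades)          # Mamdani OR: maximum


def f_not(grade: float) -> float:
    return 1.0 - grade          # standard complement


# Hypothetical membership grades for one pair of records.
mu = {
    "name.match": 0.94, "name.nomatch": 0.06,
    "address.match": 0.90, "address.nomatch": 0.10,
    "ni.match": 1.00, "ni.nomatch": 0.00,
    "gender.nomatch": 0.00,
    "geo.identical": 0.99, "geo.differntlocation": 0.01,
}

# (output label, firing strength) for rules 1, 7, 10 and 14 as listed above.
fired = [
    ("automatch", f_and(mu["name.match"], mu["address.match"], mu["ni.match"])),
    ("datasteward", f_and(mu["address.match"], f_not(mu["geo.differntlocation"]))),
    ("nomatch", f_or(mu["name.nomatch"], mu["address.nomatch"],
                     mu["ni.nomatch"], mu["gender.nomatch"])),
    ("nomatch", f_and(mu["address.nomatch"], f_not(mu["geo.identical"]))),
]

# Rules sharing a consequent are combined with max, as in Mamdani aggregation.
strength = {}
for label, s in fired:
    strength[label] = max(strength.get(label, 0.0), s)
print(strength)
```

With these made-up grades, the automatch and data steward rules fire with equal strength (0.9). This is the kind of conflict between rules that the experiments below had to resolve.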
Defuzzification
I initially went with a centroid method, as ultimately I want the various peaks of
the Gaussians to be picked up and utilised.
Experiments and Tweaking:
To perform experiments, I took a list of 20 matching pairs from anonymised data in a current Informatica MDM system, together with 20 records that do not match. I calculated the input variables for these and used them as the baseline for my testing, to see whether the numbers matched intuition. See Appendix F for a few examples of the testing process. The important thing is that the rules in the existing system agreed with the outputs of the fuzzy inference system. As expected, the non-matching entities produced low scores; however, the outlying Gaussians had too large a tail. I changed the standard deviation of the outer Gaussians to 0.05. This produced much more satisfactory results, with non-matching entities scoring 0.04-0.1 on average.
The next major issue was that the data steward Gaussian tended to blur out the automatch Gaussian at the top of the range. The chief cause was that the data steward and match rules conflicted with each other and fired at the same time. To remedy this I changed the data steward rules to the following:
1. If name delta is no match and address delta is match and NI edit distance is match and gender is match then matching is data steward
2. If address delta is match and home phone number is match and geographical location is not identical then matching is data steward
3. If name delta is no match and address delta is match and geographical distance is not different location then matching is data steward
4. If name delta is no match and home phone number and geographical distance match then matching is data steward
5. If NI number is match and gender is match then matching is data steward
6. If name delta is not match and address delta is match and NI number is match and geographical distance is identical then matching is data steward
7. If name delta is match and address delta is match and NI number is no match and geographical distance is identical then matching is data steward
8. If name delta is match and gender is no match then matching is data steward
9. If name delta is match and address delta is no match and geographical distance is moved then matching is data steward
These no longer fire when all the automatch conditions are met. The last four were added to capture a couple of extra conditions that I noticed were needed to make the results more plausible. They were derived from pairs of entries in the anonymised data set that I would have matched but that were not being flagged for potential matching because of the no-match rules.
As expected, gender has a very sharp effect on the outcome: it can instantly turn a full match into a review. The other inputs have more gradual effects. The score (out of 1) starts at about 0.95 and decreases gradually and roughly linearly as the other values worsen. It continues down to a cut-off point where one can be certain that the difference is large enough to prevent matching, which usually pushes the score into non-matching territory. I would suggest 0.75 as a good threshold for automatching and 0.3 for non-matching. As in real MDM, though, the appropriate threshold levels would depend heavily on the client's risk assessment and a detailed review of examples.
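Applying the suggested thresholds to a defuzzified score is then a simple three-way split. This is a minimal sketch. The treatment of scores exactly on a threshold (inclusive on both sides) is an assumption, and `match_tier` is a hypothetical name.

```python
def match_tier(score: float, auto: float = 0.75, reject: float = 0.3) -> str:
    """Map a defuzzified score in [0, 1] to one of the three MDM tiers.

    Thresholds default to the values suggested above; in practice they would
    be set per client after reviewing examples.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= auto:
        return "automatch"
    if score <= reject:
        return "nomatch"
    return "datasteward"
```

For example, a score of 0.95 is matched automatically, 0.45 goes to a data steward and 0.08 is rejected.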
I also added a gender requirement to all automatch rules, so that a gender mismatch can give nothing better than a data steward referral. With this I got a good balance of curves for the various options. The majority of results from my test data received the right score, and the other results received intermediate scores, which meant that an ordering by closeness was possible. For example, an entry whose address differed in two fields would score below one whose address differed in only one. This is a desirable feature. A traditional rules-based system cannot do this easily but probabilistic systems can, so it fills a gap between the two. Please see Appendix E for the m-file listing of these rules.
I experimented with triangular and trapezoidal membership functions for the output. This had no significant effect on the outcome, but it was harder to construct triangular curves that produced the spiked output I wanted for closely matching entities. For the inputs I replaced the sigmoids with triangular and trapezoidal functions, which produced similar effects. With these it was easier to choose parameters that did not leave a function short of the maximum value of 1 through insufficient curvature. This was an advantage when building the functions and understanding the effect of each parameter on the shape of the membership function, since there was no need to place parameters out of bounds to bring the function as close as possible to 1. For example, for the name delta it was necessary to use a sigmoid with a crossover point c of 1.75 to reproduce the desired curve. With simple trapezoidal and triangular functions one simply states the points at which the transitions take place, making the definition much less opaque. The outputs were very similar and did not really affect defuzzification: the end results for, say, a data steward match were typically around 0.4-0.5, as in the previous system (depending on the variables). The simpler functions did, however, have the advantage of simplicity. Appendix D contains the m-file listing for this.
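The transparency argument is easy to see in code. The sketch below contains Python stand-ins for MATLAB's `trimf` and `trapmf` (hypothetical reimplementations, not the toolbox code), evaluated with the Appendix D name-delta parameters. Each breakpoint is read directly from the parameter vector.

```python
def trapmf(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoid rising from a to b, flat at 1 from b to c, falling from c to d."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangle with feet at a and c and peak at b (a trapezoid with b == c)."""
    return trapmf(x, a, b, b, c)


# Name-delta sets from Appendix D: 'match' = trimf [0 0 2], 'nomatch' = trapmf [1 1.5 3 3].
for delta in (0, 0.5, 1, 1.5, 2, 3):
    print(delta, trimf(delta, 0, 0, 2), trapmf(delta, 1, 1.5, 3, 3))
```

Unlike the sigmoid, these shapes reach exactly 1 and exactly 0 at the stated points, so no out-of-range parameters are needed to force them there.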
Testing the defuzzification was relatively easy in the end. My membership functions reach their maximum at the beginning and end of the range, with a balancing middle function that rarely reaches its maximum. I found that smallest of maximum, mean of maximum and largest of maximum produced sharp shifts between 0 and 1 as one or other of the edge membership functions jumped to 1. This rendered the defuzzification so coarse as to be meaningless. With my triangular or Gaussian curves there was little to choose between centroid and bisector. I used the centroid for its simplicity, and because the bisector negates some of what I was trying to achieve with the narrow Gaussians by ignoring the moment contribution of their tips.
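The comparison above can be reproduced in outline. The sketch below discretises the output range [0, 1] and builds the aggregated output from the Appendix B Gaussians, using the usual Mamdani choices of min implication (clipping) and max aggregation. It then computes Python stand-ins for MATLAB's 'centroid', 'bisector' and 'mom' options. The rule strengths are made up, and all function names are hypothetical.

```python
import math


def gaussmf(x: float, sigma: float, c: float) -> float:
    """Gaussian membership exp(-(x - c)^2 / (2 sigma^2)), as in MATLAB's gaussmf."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))


# Output sets from Appendix B as (sigma, centre).
OUTPUT_SETS = {"nomatch": (0.05, 0.0), "datasteward": (0.25, 0.5), "automatch": (0.05, 1.0)}
XS = [i / 1000 for i in range(1001)]  # discretised output universe [0, 1]


def aggregate(strengths: dict) -> list:
    """Clip each output set at its rule strength (min) and combine the sets with max."""
    return [max(min(strengths.get(name, 0.0), gaussmf(x, s, c))
                for name, (s, c) in OUTPUT_SETS.items())
            for x in XS]


def centroid(mu: list) -> float:
    total = sum(mu)
    if total == 0:
        raise ValueError("no rule fired")
    return sum(x * m for x, m in zip(XS, mu)) / total


def bisector(mu: list) -> float:
    """First x at which the running area reaches half of the total area."""
    half, running = sum(mu) / 2, 0.0
    for x, m in zip(XS, mu):
        running += m
        if running >= half:
            return x
    return XS[-1]


def mean_of_maximum(mu: list) -> float:
    peak = max(mu)
    tops = [x for x, m in zip(XS, mu) if m >= peak - 1e-9]
    return sum(tops) / len(tops)


mu = aggregate({"automatch": 0.9, "datasteward": 0.2, "nomatch": 0.1})
print(centroid(mu), bisector(mu), mean_of_maximum(mu))
```

With these strengths, mean of maximum lands close to 1, on the clipped automatch plateau. The centroid is pulled towards the middle by the broad data steward set. This illustrates why maximum-based methods jump between the edges while the centroid moves more gradually.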
Conclusion
This was designed as a proof of concept on a small data set. I feel it has been sufficiently useful that it could form the basis for further development. It offers a useful means of scoring matches in rules-based MDM systems without having to resort to complex and computationally expensive probabilistic calculations. Whether two items match is essentially a linguistic variable that displays a good deal of fuzzy uncertainty between people, and so seems ideally suited to fuzzy modelling.
To implement this in practice, it would be critical to have a way of mapping from user requests or rules to fuzzy rules. This would not be a simple task, but basic models could be mapped into membership functions and simple user requests into fuzzy rules. The membership functions could also be enhanced with linguistic hedges, giving the user finer-grained control over the matching rules than they have now. As I found, calibrating the rules themselves is difficult, and this would be a major challenge in automating the system. Nonetheless, it is at least conceivable. There are Java libraries such as jFuzzyLogic, whose basic features are not so different from MATLAB's, that could be integrated into the necessary software's API and used as a plug-in.
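As an illustration of the hedges mentioned above, the sketch below uses Zadeh's classic definitions: "very A" as concentration (μ squared) and "somewhat A" (also called "more or less A") as dilation (the square root of μ). It applies them to the Appendix B name-delta match set, which is how a rule such as "names are very close" might be expressed. The function names are hypothetical.

```python
import math


def name_match(delta: float) -> float:
    """'match' set for the name delta (Appendix B): sigmoid with a = -3.75, c = 1.75."""
    return 1.0 / (1.0 + math.exp(3.75 * (delta - 1.75)))


def very(grade: float) -> float:
    """Concentration hedge: 'very A' has membership mu_A squared."""
    return grade ** 2


def somewhat(grade: float) -> float:
    """Dilation hedge: 'somewhat A' has membership the square root of mu_A."""
    return math.sqrt(grade)


for delta in range(4):
    m = name_match(delta)
    print(f"delta={delta}  match={m:.3f}  very match={very(m):.3f}  somewhat match={somewhat(m):.3f}")
```

"Very" makes a rule stricter by pushing intermediate grades down. "Somewhat" relaxes it by pulling them up. Neither changes the grades at the extremes of 0 and 1.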
Bibliography
Black, P. E. (2013). Levenshtein distance. In Dictionary of Algorithms and Data Structures. U.S. National Institute of Standards and Technology.
Haselden, R. W. (2006, November). The What, Why, and How of Master Data Management. Retrieved from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/bb190163.aspx
Lilly, J. H. (2010). Fuzzy Control and Identification. Wiley.
Lira, J. (2013). External Match. Informatica University.
Ross, T. J. (2004). Fuzzy Logic with Engineering Applications (2nd ed.). Wiley.
Whei-Jen Chen, B. A. (2014). Building 360-Degree Information Applications. IBM Redbooks.
Wolter, R. (2007, April). Master Data Management (MDM) Hub Architecture. Retrieved from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/bb410798.aspx
Appendix A: Membership Sets first trial:
[Figure: membership function plots for the first trial. Panels: name delta (match, nomatch), address delta (match, nomatch), NI edit distance (match, nomatch), home phone number edit distance (match, nomatch), geographical distance (identical, differntlocation), gender (match, nomatch) and match decision (nomatch, datasteward, automatch). The vertical axis of each panel is the degree of membership.]
Appendix B: Sigmoidal/Gaussian m file:
% Mamdani FIS with sigmoidal inputs and Gaussian outputs.
fuzzymdmfis=newfis('fuzzymdmfis');
% Input 1: name delta (number of differing name components, 0-3)
fuzzymdmfis=addvar(fuzzymdmfis,'input','name delta',[0 3]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',1,'nomatch','sigmf',[3.75 1.75]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',1,'match','sigmf',[-3.75 1.75]);
% Input 2: address delta (number of differing address fields, 0-5)
fuzzymdmfis=addvar(fuzzymdmfis,'input','address delta',[0 5]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',2,'nomatch','sigmf',[2.5 3.0]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',2,'match','sigmf',[-2.5 3.0]);
% Input 3: NI edit distance; 'nomatch' rises and 'match' falls sharply at 2
fuzzymdmfis=addvar(fuzzymdmfis,'input','NI Edit distance',[0 9]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',3,'nomatch','sigmf',[20.0 2.0]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',3,'match','sigmf',[-20.0 2.0]);
% Input 4: home phone number edit distance (same shape as the NI number)
fuzzymdmfis=addvar(fuzzymdmfis,'input','Home Phone Number Edit distance',[0 11]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',4,'nomatch','sigmf',[20.0 2.0]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',4,'match','sigmf',[-20.0 2.0]);
% Input 5: geographical distance in km
fuzzymdmfis=addvar(fuzzymdmfis,'input','Geographical distance',[0 1000]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',5,'differntlocation','sigmf',[3.0 1.5]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',5,'moved','gaussmf',[20.0 25.0]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',5,'identical','sigmf',[-3 1.5]);
% Input 6: gender (-1 no match, 0 unknown, 1 match)
fuzzymdmfis=addvar(fuzzymdmfis,'input','Gender',[-1 1]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',6,'match','sigmf',[20 0]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',6,'nomatch','sigmf',[-20 0]);
% Output: match decision score
fuzzymdmfis=addvar(fuzzymdmfis,'output','Matching',[0 1]);
fuzzymdmfis=addmf(fuzzymdmfis,'output',1,'nomatch','gaussmf',[0.05 0]);
fuzzymdmfis=addmf(fuzzymdmfis,'output',1,'datasteward','gaussmf',[0.25 0.5]);
fuzzymdmfis=addmf(fuzzymdmfis,'output',1,'automatch','gaussmf',[0.05 1]);
Appendix C: Triangular/Trapezoidal Membership Functions
[Figure: triangular/trapezoidal membership function plots. Panels: name delta (match, nomatch), address delta (match, nomatch), NI edit distance, home phone number edit distance (match, nomatch), geographical distance (identical, moved, differntlocation), gender (match, nomatch) and match decision (nomatch, datasteward, automatch). The vertical axis of each panel is the degree of membership.]
Appendix D: Triangular/Trapezoidal m file:
% Mamdani FIS with triangular/trapezoidal inputs and outputs.
fuzzymdmfis=newfis('fuzzymdmfis');
% Input 1: name delta
fuzzymdmfis=addvar(fuzzymdmfis,'input','name delta',[0 3]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',1,'nomatch','trapmf', [1 1.5 3 3]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',1,'match','trimf', [0 0 2]);
% Input 2: address delta
fuzzymdmfis=addvar(fuzzymdmfis,'input','address delta',[0 5]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',2,'nomatch','trapmf', [1.4 3.5 5 5]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',2,'match','trimf', [0 0 3.5]);
% Input 3: NI edit distance ('nomatch' high for large distances, 'match' for small)
fuzzymdmfis=addvar(fuzzymdmfis,'input','NI Edit distance',[0 9]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',3,'nomatch','trapmf', [1.35 4 9 9]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',3,'match','trimf', [0 0 5]);
% Input 4: home phone number edit distance
fuzzymdmfis=addvar(fuzzymdmfis,'input','Home Phone Number Edit distance',[0 11]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',4,'nomatch','trapmf', [1 3 11 11]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',4,'match','trimf', [0 0 3.9]);
% Input 5: geographical distance in km
fuzzymdmfis=addvar(fuzzymdmfis,'input','Geographical distance',[0 1000]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',5,'differntlocation','trapmf', [25 25 1000 1000]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',5,'moved','trimf', [0 25 75]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',5,'identical','trimf', [0 0 50]);
% Input 6: gender (-1 no match, 0 unknown, 1 match)
fuzzymdmfis=addvar(fuzzymdmfis,'input','Gender',[-1 1]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',6,'match','trapmf', [-0.2 0.2 1 1]);
fuzzymdmfis=addmf(fuzzymdmfis,'input',6,'nomatch','trapmf', [-1 -1 0 0.2]);
% Output: match decision score
fuzzymdmfis=addvar(fuzzymdmfis,'output','Matching',[0 1]);
fuzzymdmfis=addmf(fuzzymdmfis,'output',1,'nomatch','trimf', [0 0 0.5]);
fuzzymdmfis=addmf(fuzzymdmfis,'output',1,'datasteward','trimf', [0 0.5 1]);
fuzzymdmfis=addmf(fuzzymdmfis,'output',1,'automatch','trimf', [0.5 1 1]);
Appendix E: Rule System
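This MATLAB code adds the necessary rules to both systems:
% Each row: membership function indices for name delta, address delta,
% NI edit distance, home phone number edit distance, geographical distance
% and gender (0 = input not used, negative = NOT), then the output index,
% the rule weight and the connective (1 = AND, 2 = OR).
rules = [
2 2 2 0 0 1 3 1 1
2 2 0 0 3 1 3 1 1
2 2 0 0 0 1 3 1 1
2 0 2 0 0 1 3 1 1
2 1 1 0 0 0 2 1 1
1 0 2 0 0 1 2 1 1
1 2 0 2 0 1 2 1 1
1 2 0 0 3 0 2 1 1
0 1 2 0 0 1 2 1 1
0 1 0 2 -1 0 2 1 1
1 1 1 0 0 2 1 1 2
0 1 0 1 0 0 1 1 1
0 0 0 1 -3 2 2 1 1
0 0 1 1 0 2 1 1 1
0 1 0 0 -3 0 1 1 2
1 2 2 0 3 0 2 1 1
2 2 1 0 3 0 2 1 1
2 0 0 0 0 2 2 1 1
2 1 0 0 2 1 2 1 1];
fuzzymdmfis = addrule(fuzzymdmfis, rules);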
Appendix F: Testing

Sample 1: James Frederick Abrhams / James Frederick Abrhams
Matching fields: name (3 of 3 components), NI number, address, gender
Result in system: Automatch
Matched with fuzzy inference? Yes
Successful? No. This example led to the rejigging of the membership function to peak at the edges.

Sample 2: James Madrigalson Fairbuck / James Modrigalson Fairbuck
Matching fields: name (2 of 3 components), address, gender
Result in system: Data steward referral
Matched with fuzzy inference? Yes
Successful? Yes

Sample 3: Elizabeth Podrigal Ginzer / James Toons Ginzer
Matching fields: name (1 of 3 components)
Result in system: Fail
Matched with fuzzy inference? No
Successful? Yes. Although it did trigger one of the rules for data stewardship, the membership value when only one name matches is too low to trigger a referral, which is the desired result.