Multidimensional scaling
From Wikipedia, the free encyclopedia Jump to: navigation, search Multidimensional scaling (MDS) is a set of related statistical techniques often used in information visualization for exploring similarities or dissimilarities in data. MDS is a special case of ordination. An MDS algorithm starts with a matrix of itemitem similarities, then assigns a location to each item in N-dimensional space, where N is specified a priori. For sufficiently small N, the resulting locations may be displayed in a graph or 3D visualisation.
Contents
1 Types 2 Details 3 Procedure 4 Applications o 4.1 Marketing o 4.2 Comparison and advantages 5 Implementations 6 See also 7 Bibliography 8 External links
Types
MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix: Classical multidimensional scaling Also known as Principal Coordinates Analysis, Torgerson Scaling or Torgerson Gower scaling. Takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain.[1] Metric multidimensional scaling A superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization. Non-metric multidimensional scaling In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. The relationship is typically found using isotonic regression.
Louis Guttman's smallest space analysis (SSA) is an example of a non-metric MDS procedure.
Generalized multidimensional scaling An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth non-Euclidean space. In case when the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another.[2]
Details
The data to be analyzed is a collection of objects (colors, faces, stocks, . . .) on which a distance function is defined, i,j := distance between i th and j th objects. These distances are the entries of the dissimilarity matrix
The goal of MDS is, given , to find vectors for all ,
such that
where is a vector norm. In classical MDS, this norm is the Euclidean distance, but, in a broader sense, it may be a metric or arbitrary distance function.[3] In other words, MDS attempts to find an embedding from the objects into RN such that distances are preserved. If the dimension N is chosen to be 2 or 3, we may plot the vectors xi to obtain a visualization of the similarities between the objects. Note that the vectors xi are not unique: With the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances .
There are various approaches to determining the vectors xi. Usually, MDS is formulated as an optimization problem, where for example, is found as a minimizer of some cost function,
A solution may then be found by numerical optimization techniques. For some particularly chosen cost functions, minimizers can be stated analytically in terms of matrix eigendecompositions.[citation needed]
Procedure
This article may be confusing or unclear to readers. Please help clarify the article; suggestions may be found on the talk page. (November 2009) There are several steps in conducting MDS research: 1. Formulating the problem What variables do you want to compare? How many variables do you want to compare? More than 20 is often considered cumbersome.[citation needed] Fewer than 8 (4 pairs) will not give valid results.[citation needed] What purpose is the study to be used for? 2. Obtaining input data Respondents are asked a series of questions. For each product pair, they are asked to rate similarity (usually on a 7 point Likert scale from very similar to very dissimilar). The first question could be for Coke/Pepsi for example, the next for Coke/Hires rootbeer, the next for Pepsi/Dr Pepper, the next for Dr Pepper/Hires rootbeer, etc. The number of questions is a function of the number of brands and can be calculated as where Q is the number of questions and N is the number of brands. This approach is referred to as the Perception data : direct approach. There are two other approaches. There is the Perception data : derived approach in which products are decomposed into attributes that are rated on a semantic differential scale. The other is the Preference data approach in which respondents are asked their preference rather than similarity. Running the MDS statistical program Software for running the procedure is available in many software for statistics. Often there is a choice between Metric MDS (which deals with interval or ratio level data), and Nonmetric MDS (which deals with ordinal data). Decide number of dimensions The researcher must decide on the number of dimensions they want the computer to create. The more dimensions, the better the statistical fit, but the more difficult it is to interpret the results. Mapping the results and defining the dimensions The statistical program (or a related module) will map the results. The map will plot each product (usually in two dimensional space). The proximity of products to each other indicate either how similar they are or how preferred they are, depending on which approach was used. The dimensions must be labelled by the researcher. This requires subjective judgment and is often very challenging.[vague] The results must be interpreted (see perceptual mapping).[vague] Test the results for reliability and validity Compute R-squared to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-square of 0.6 is considered the minimum acceptable level.[citation needed] An Rsquare of 0.8 is considered good for metric scaling and .9 is considered good for nonmetric scaling. Other possible tests are Kruskals Stress, split data tests, data stability tests (i.e., eliminating one brand), and test-retest reliability. Report the results comprehensively Along with the mapping, at least distance measure (e.g., Sorenson index, Jaccard index) and reliability (e.g., stress value) should be given. It is also very advisable to give the algorithm (e.g., Kruskal, Mather), which is often defined by the program used (sometimes replacing the algorithm report), if you have given a start configuration or had a random choice, the number of runs, the assessment of dimensionality, the Monte Carlo method results, the number of iterations, the assessment of stability, and the proportional variance of each axis (rsquare).
3.
4.
5.
6.
7.
Applications
Applications include scientific visualisation and data mining in fields such as cognitive science, information science, psychophysics, psychometrics, marketing and ecology. New applications arise in the scope of autonomous wireless nodes that populate a space or an area. MDS may apply as a real time enhanced approach to monitoring and managing such populations. Furthermore, MDS has been used extensively in geostatistics for modeling the spatial variability of the patterns of an image, by representing them as points in a lower-dimensional space.[4]
Marketing
In marketing, MDS is a statistical technique for taking the preferences and perceptions of respondents and representing them on a visual grid, called perceptual maps.
Comparison and advantages
Potential customers are asked to compare pairs of products and make judgments about their similarity. Whereas other techniques (such as factor analysis, discriminant analysis, and conjoint analysis) obtain underlying dimensions from responses to product attributes identified by the researcher, MDS obtains the underlying dimensions from respondents judgments about the similarity of products. This is an important advantage.[citation needed] It does not depend on researchers judgments. It does not require a list of attributes to be shown to the respondents. The underlying dimensions come from respondents judgments about pairs of products. Because of these advantages, MDS is the most common technique used in perceptual mapping.
MULTIDIMENSIONAL SCALING
Forrest W. Young, University of North Carolina This paper originally appeared in Kotz-Johnson (Ed.) Encyclopedia of Statistical Sciences, Volume 5, Copyright (c) 1985 by John Wiley & Sons, Inc. This HTML document derives from the original as follows: The original was scanned and converted to text by optical-character-recognition software, which was then edited using MS-WORD. This in turn was converted to HTML automatically. The figures also derive from scanning the originals. I apologize for poor quality and mistakes in processing the text. The software and hardware for doing this is far from perfect.
ABSTRACT In this entry we summarize the major types of multidimensional scaling (MDS), the distance models used by MDS, the similarity data analyzed by MDS, and the computer programs that implement MDS. We also present three brief examples. We do not discuss experimental design, interpretation, or the mathematics of the algorithms. The entry should be helpful to those who are curious about what MDS is and to those who wish to know more about the types of data and models relevant to MDS. It should help the researcher. the statistical consultant, or the data analyst who needs to decide if MDS is appropriate for a particular set of data and what computer program should be used. For a more complete, but still brief. introduction to MDS, the, reader should turn to Kruskal and Wish [6]. A complete discussion of the topics covered here as well as of experimental design. data analysis, and interpretive procedures can be found in Schiffman et al. [141. An intermediate-level mathematical treatment of some MDS algorithms is given in Davison (1983). An advanced treatment of the theory of MDS, illustrated with innovative applications, is presented by Young and Hamer [211]. Reviews of the current state of the art are presented by Young (1984a; 1984b). Multidimensional scaling is related to principal component analysis, factor analysis, cluster analysis, and numerical taxonomy: the reader is referred to the appropriate entries in this encyclopedia, along with the SCALING and PROXIMITY DATA entries.
1. OVERVIEW OF MULTIDIMENSIONAL SCALING Multidimensional scaling (MDS) is a set of data analysis techniques that display the structure of distance-like data as a geometrical picture. It is an extension of the procedure discussed in SCALING. MDS has its origins in psychometrics. where it was proposed to help understand people's judgments of the similarity of members of a set of objects. Torgerson [18] proposed the first MDS method and coined the term. his work evolving from that of Richardson [11]. MDS has now become a general data analysis technique used in a wide variety of fields [14]. For example, the book on theory and applications of MDS by Young and Hamer, [21] presents applications of MDS in such diverse fields as marketing'. sociology, physics, political science', and biology. However, we limit our examples here to the field with which the author is most familiar, psychology. MDS pictures the structure of a set of objects from data that approximate the distances between pairs of the objects. The data, which are called similarities. dissimilarities, distances, or proximities, must reflect the amount of (dissimilarity between pairs of the. In this article we use the term similarity generically to refer to both similarities (where large numbers refer to great similarity) and to dissimilarities (where large numbers refer to great dissimilarity).
In addition to the traditional human similarity judgment, the data can be an "objective" similarity measure (the driving time between pairs of cities) or an index calculated from multivariate data (the proportion of agreement in the votes cast by pairs of senators). However, the data must always I represent the degree of similarity of pairs of objects (or events). Each object or event is represented by a point in a multidimensional space. The points are arranged in this space so that the distances between pairs of points have the strongest possible relation to the similarities among the pairs of objects. That is, two similar objects are represented by two points that are close together, and two dissimilar objects are represented by two points that are far apart. The space is usually a two- or three-dimensional Euclidean space, but may be non-Euclidean and may have more dimensions. MDS is a generic term that includes many different specific types. These types can be classified according to whether the similarities data are qualitative (called nonmetric MDS) or quantitative (metric MDS). The number of similarity matrices and the nature of the MDS model can also classify MDS types. This classification yields classical MDS (one matrix, unweighted model), replicated MDS (several matrices, unweighted model), and weighted MDS (several matrices, weighted model). We discuss the nonmetric-metric and the classical-replicated-weighted classifications in the following subsections. CLASSICAL MDS The identifying aspect of classical MDS (CMDS) is that there is only one similarity matrix. Table I is a matrix of similarity data suitable for CMDS. It contains the flying mileages between 10 American cities. The cities are the "objects." and the mileages are the "similarities." An MDS of these data gives the picture in Fig. 1, a map of the relative locations of these 10 cities in the United States. This map has 10 points, one for each of the 10 cities. Cities that are similar (have short flying mileages) are represented by points that are close together, and cities that are dissimilar (have large mileages) by points far apart.
Generally, CMDS employs Euclidean distance to model dissimilarity. That is, the distance dij between points i and j is defined as
Where xi. Specifies the position (coordinate) of point i on dimension a. The distance can also be defined according to the Minkowski model:
Where the value of p (<1) is set by the investigator. For either definition of distance there are n points, one for each of the n objects. There are also r dimensions, where the value of r is determined by the investigator. The coordinates xia are contained in the nXr matrix X. Using matrix algebra, the Euclidean model can be defined as
where xi is the ith row of X and contains the r coordinates of the ith point on all r dimensions. A simple matrix expression for the Minkowski model is not possible. For both models, the distances dij are contained in the n X n symmetric matrix D. Finally. The similarity sij are contained in the matrix S, also n X n. METRIC CMDS. The first major CMDS proposal [18] was metric (i.e.. the similarities had to be quantitative). Torgerson's development required the data to be at the ratio level of measurement, although this was soon generalized to the interval level [8]. While the data could contain random error, this early type of MDS required that the data be dissimilarities (not similarities), complete (no missing values), and symmetric (the dissimilarity of objects I and J had to equal that of objects J and 1). These CMDS proposals also required the distance model to be Euclidean. The flying mileage example is metric CMDS because flying mileages are at the ratio level of measurement. For metric CMDS the distances D are determined so that they are as much like the dissimilarities S as possible. There are a variety of ways in which "like" is strictly defined, but a common one is a least-squares definition. In this case we define 1 {S} = D + E. Where I {S} is read "a linear transformation of the similarities." If the measurement level is ratio, and then the linear transformation has a zero intercept, but can be nonzero when the level is interval. If the data are similarities. The slope of the transformation is negative: if dissimilarities, it is positive. In the preceding equation, E is a matrix of errors (residuals) that in the leastsquares optimization situation, we wish to minimize. Since the distances D are a function of the coordinates X, the goal of CMDS is to calculate the coordinates X so that the sum of squares of E is minimized, subject to suitable normalization of X. We also need to calculate the best linear transformation l{S}. Torgerson's method does not actually minimize the sum of squares of E, nor do ALSCAL or MULTISCALE. The KYST, MINTSSA, and SMACOF programs do. These programs are all discussed in the Computer Programs section.
NONMETRIC CMDS. The second major CMDS proposal [5, 15] was nonmetric. That is. The data could be at the ordinal level of measurement (see ORDINAL DATA). In addition. The data S could be either complete or incomplete. Symmetric or asymmetric, and similarities or dissimilarities. These nonmetric CMDS proposals extended the distance model to the Minkowski case and generalized the relation between similarities and distances. They enable defining m {S} = D + E, Where m {S} is read "a monotonic transformation of the similarities." If S is actually dissimilarities then m {S} preserves order, whereas if S is similarities, it reverses order. Thus, for nonmetric CMDS, we need to solve for the monotonic (order-preserving) transformation m{S} and the coordinates X. which together minimize the sum of squares of the errors E (after normalization of X). This exact problem is solved by the MINISSA, KYST, and SMACOF programs (discussed in the final section) while ALSCAL and MULTISCALE solve closely related problems. The nonmetric optimization represents a much more difficult problem to solve than the metric problem and is an important breakthrough in multidimensional scaling. In fact, nonmetric CMDS is the first example of using quantitative models to describe qualitative data that belongs to the approach discussed by Young [19] (see QUALITATIVE DATA. STATISTICAL ANALYSIS). It is reassuring to know that when we degrade the flying mileages (Table 1) into ranks of flying mileages, and then submit the ranks to nonmetric CMDS, the map that results is indistinguishable from that shown in Fig. 1. REPLICATED MDS The next major development, replicated MDS (RMDS), permitted the analysis of several matrices of similarity data simultaneously [7]. There are m matrices S, one for each subject k, k = 1, . . . , m. RMDS uses the same distance models as CMDS, but uses them to describe several similarity matrices rather than one. With RMDS, the matrix of distances D is determined so that it is simultaneously like all the similarity matrices Sk. For metric RMDS, the least-squares definition of "like" is 1k {S k} = D + E k, Where 1k {Sk} is the linear transformation of the kth similarity matrix Sk which best fits the distances D. The data may be similarities or dissimilarities and may be at the ratio or interval levels, just as in metric CMDS. The analysis minimizes the sum of the squared elements in ail error matrices Ek. subject to normalization of X.
For nonmetric RMDS we minimize the several Ek in mk {Sk} = D + Ek , Where mk {Sk} is the monotonic transformation of the similarity matrix Sk that is a least-squares fit to the distances in matrix D. The data may be similarities of dissimilarities. Just as in CMDS. Note that for RMDS each linear or monotonic transformation /k. or mk is subscripted, letting each data matrix Sk have a unique linear or monotonic relation to the distances D. Since k ranges up to m. there are m separate linear or monotonic transformations, one for each of the m dissimilarity matrices Sk. This implies that RMDS treats all the matrices of data as being related to each other (through D) by a systematic linear or monotonic transformation (except for a random error component). The KYST and SMACOF programs minimize the sum of squares of Ek, while ALSCAL and MULTISCALE solve other closely related problems. In psychological terms, RMDS accounts for differences in the ways subjects use the response scale (i.e., differences in response bias). Jacobowitz [4] used RMDS to study the way language develops as children grow to adulthood. In his experiment he asked children and adults to judge the similarity of all pairs of 15 parts of the human body. The judges were five-, seven-. And nine-year-olds, and adults. There were 15 judges at each age. Four separate RMDS analyses were done. One for each age group.
The RMDS results for the five-year-olds are shown in Fig. 2a, and for the adults in Fig. 2b. The analysis located the points in the space, but did not draw the lines. The lines were drawn by Jacobowitz to interpret the
psycholinguistic structure that people have for body-part words. Jacobowitz theorized that the structure would be hierarchical. We can see that it is. He further theorized that the structure would become more complex as the children become adults. This theory is also supported, since the adults' hierarchy also involves a classification of corresponding arm and leg terms. (In Fig. 2b the corresponding terms are linked by dashed lines. the implied classification terms are shown in parentheses, and the word sole, which was not a stimulus, is shown in the position that we would predict it to be in if the study were repeated.) WEIGHTED MDS The next major MDS development, weighted MDS (WMDS), generalized the distance model so that several similarity matrices Sk could be assumed to differ from each other in systematically nonlinear or nonmonotonic ways. Whereas RMDS only accounts for individual differences in response bias, WMDS incorporates a model to account for individual differences in the fundamental perceptual or cognitive processes that generate the responses. For this reason, WMDS is often called individual differences scaling (INDSCAL) and is often regarded as the second major breakthrough in multidimensional scaling. WMDS invokes the following definition of weighted Euclidean distance:
Which, in matrix algebra is
Where Wk is a r x r diagonal matrix. The diagonal values, which must not be negative, are weights for subject k on each of the r dimensions. WMDS is appropriate for the same type of data as RMDS. However, RMDS generates a single distance matrix D, while WMDS generates m unique distance matrices Dk, one for each data matrix Sk. The distances Dk are calculated so that they are all as much like their corresponding data matrices Sk as possible. For metric WMDS, the least-squares problem is I, {Sk} = Dk + Dk , And for nonmetric WMDS, the problem is mk {Sk} = Dk + Ek..
Thus, for WMDS, we need to solve for the matrix of coordinates X, the m diagonal matrices of weights Wk, and the m transformations Mk or 1. We wish to do this so that the sum of squared elements in all error matrices, E, is minimal subject to normalization constraints on X and Wk. Neither of the two most commonly used computer programs solves either of the problems defined. (These programs and others are discussed in the last section.) The INDSCAL program, by Carroll and Chang [1], provided the first metric WMDS solution. However, it optimizes the fit of scalar products to a transformation of the data. The ALSCAL program, by Takane et al. [17] (see also Young and Lewyckyj [22] and Young et al. [24], provided the first and still the only algorithm to incorporate both nonmetric and metric solution to WMDS and optimize the fit of squared distances to the data. In fact, ALSCAL is still the only algorithm to provide the user with nonmetric and metric solutions to the CMDS, RMDS, and WMDS situations discussed, and it is regarded as the third major breakthrough in multidimensional scaling. The MULTISCALE algorithm by Ramsay (101 provided the first metric WMDS solution to optimize the preceding index (it fits distances to the data). Finally, the SMACOF algorithm [2, 3] and its associated program [16], which is still under development, will more than likely be the first program to be able to fit distances to the data so that the sum of squares of E is strictly minimized, where the distances may be CMDS, RMDS, or WMDS distances, and where the transformation may be metric or nonmetric. While WMDS incorporates the RMDS notion of individual differences in response bias (via mk and lk), the important aspect of WMDS is that it provides specific parameters for individual variation in cognitive or perceptual processes. These parameters are the weights. The weights are interpreted as the importance, relevance. or salience of each dimension to each individual. A large weight means that the dimension is important to the individual, a small weight means the dimension is unimportant. If the similarity matrices correspond to experimental conditions, say, rather than to people, the interpretation is that the weights reflect the importance of each dimension in the various experimental conditions. The Jacobowitz data already discussed provide a nice example of WMDS. An analysis of the 15 five-year-olds together with the 15 adults provided the results displayed in Fig. 3. In Fig. 3a, we see that the stimulus structure is the anticipated hierarchy. In Fig. 3b, which is the weight space. we see that the children and adults occupy different parts of the space, showing that the children and adults have different cognitive structures for parts of the body.
2. COMPUTER PROGRAMS Several computer programs have become a significant part of the MDS discipline. These programs and several of their characteristics are listed in Table 2. A complete reference for each program is given in the bibliography.
The first four rows of Table 2 refer to the type of data each program can analyze. The information includes whether each program can analyze similarity data in addition to dissimilarity data; asymmetric data in addition to symmetric data; data with missing elements in addition to data without; and two-way in addition to three-way data. The next two rows of Table 2 refer to the types of analyses each program can provide. The Measurement row refers to whether the program can provide only nonmetric analysis (N), only metric analyses (M), or both (MN). The Model row refers to whether the program can provide analyses that are classical (C), replicated (R), weighted (W), or other types (0). The next three rows of Table 2 refer to several aspects of the iterative algorithm employed by each program. The Fit row refers to the aspect of the model that is fit to (a transformation of) the data (D indicates distances. P scalar products, S squared distances, and L log distances). The Algorithm row indicates whether the program is a leastsquares program (L) or a maximum likelihood program (M). The Converge row shows whether the algorithm is convergent (each iteration must improve the fit index being optimized) or not. The final four rows specify the maximum size problem that can be analyzed by each program. Some programs place specific limits on the number of stimuli. matrices, dimensions, or total number of data elements. These limits are indicated by a number. Other programs are dynamic and place no limit. These are indicated as "dyn." The ALSCAL-84, MULTISCALE-2, and SMACOF-IB programs are the current state-of-the-art programs. Of the programs listed in Table 2, ALSCAL-84 [23] is the most flexible, fitting the widest range of models to the widest range of data. The ALSCAL algorithm is convergent (which is desirable) and is faster
than MULTISCALE but slower than SMACOF. ALSCAL is the only MDS program currently available in major statistical systems (SAS and SPSS) and is the easiest program to use. However, the algorithm optimizes the fit of squared distances to the dissimilarities. which is not the most desirable optimization criterion. ALSCAL is descriptive, having no inferential aspects. MULTISCALE-2 [10] has the unique feature that it is based on the maximum likelihood principle. Of the programs listed in Table 2. it is the only one that enables statistically based significance tests and that can be used for inferential purposes. MULTISCALE provides the user with a selection of models that is smaller than that provided by ALSCAL, but larger than that provided by SMACOF. However, of the three programs, MULTISCALE is the least flexible in the types of data that can be analyzed, and the slowest. Also it has a nonconvergent algorithm. SMACOF-IB [16], clearly the fastest of these three programs. optimizes the fit of distances to dissimilarities by a convergent algorithm. The algorithm [2, 3] is the simplest and most elegant of any program listed in Table 2. It fits the CMDS. RMDS, and WMDS models and is as flexible as ALSCAL in the types of data it can analyze. However, SMACOF is currently under active development: it is difficult to use and is not available in any statistical package. When fully mature, SMACOF will be the program of choice. REFERENCES