Ye Yang - Pattern Analysis of Functional MRI Data
Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
Approved
Dr. Ranadip Pal
Chairperson of the Committee
August, 2010
ACKNOWLEDGMENTS
First, I wish to express my sincere gratitude to Dr. Ranadip Pal, my thesis chairperson. His inspiration and guidance made this thesis possible, and his generous support and professional advice enabled me to develop and understand the research. I also want to thank Dr. Sunanda Mitra for her encouragement, guidance and support from the initial to the final stage. In addition, I would like to thank my lab mates, who provided valuable information for my thesis along with constant suggestions and assistance; without their help, I would have faced many difficulties while writing this thesis, and I had a great time working with them. Finally, I want to give my deepest appreciation to my parents and friends for always being there for me, and for their understanding and support in completing this thesis. Whenever I thought of them, I gained the courage to continue my education, even when I was discouraged.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF FIGURES
I. INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Outline
II. DATA
2.1 Experiments and Equipment
2.2 Data Preprocessing Tools
2.3 Data Preprocessing Process
2.3.1 Step 1
2.3.2 Step 2
2.3.3 Step 3
III. METHODS
3.1 Feature Selection and Sequential Floating Forward Search
3.2 Spatial SFFS
3.3 Classification and Classifier Error Estimation
IV. RESULTS
V. CONCLUSIONS AND FUTURE WORK
REFERENCES
ABSTRACT
A fundamental goal of the analysis of fMRI data is to locate areas of brain activation that can differentiate various cognitive tasks. Traditionally, researchers have approached fMRI analysis by characterizing the relationship between cognitive variables and individual brain voxels. In recent years, multivariate approaches, which analyze more than one voxel at a time, have gained importance in fMRI data analysis. However, in the majority of these multivariate approaches the voxels used for classification are selected based on prior biological knowledge or on the discriminating power of individual voxels. We used the sequential floating forward search (SFFS) feature selection approach for selecting the voxels and applied it to distinguish the cognitive states of whether a subject is performing a reasoning or a counting task. We obtained superior classifier performance by using the sequential approach to feature selection as compared to selecting the features with the best individual classifier performance. We analyzed the problem of over-fitting in this extremely high dimensional feature space with limited training samples. For estimating the accuracy of the classifier, we employed various estimation methods and discussed their importance in this small sample scenario. We also modified the feature selection algorithm by adding spatial information, incorporating the biological constraint that spatially nearby voxels tend to represent similar information. For our cognitive task, the spatial SFFS approach picked features with increased classifier performance for multi-subject data.
LIST OF FIGURES
1.1 Temporal Cortex
1.2 Visual Cortex
1.3 Basic steps involved in classification of cognitive states based on fMRI data
2.1 Reasoning Task Sample
2.2 Counting Task Sample
2.3 BET Brain Extraction Tool in FSL
2.4 FLIRT Tool Cost Function Setting
2.5 FLIRT Tool Interpolation Setting
2.6 Data Time Series
2.7 Data Preprocess
3.1 LDA
4.1 Classification accuracy by NRSFFS approach, Leave-one-out
4.2 Classification accuracy by NRSFFS approach, Re-substitution
4.3 Classification accuracy by NRSFFS approach, 6-fold cross-validation
4.4 Classification accuracy by NRSFFS approach, Bootstrap
4.5 Classification accuracy for subject 3 using NRSFFS approach for number of features varying from 1 to 10
4.6 Classification accuracy for subject 4 using NRSFFS approach for number of features varying from 1 to 10
4.7 Classification accuracy using RSFFS approach for the eight subjects, Leave-one-out
4.8 Classification accuracy using RSFFS approach for the eight subjects, Re-substitution
4.9 Classification accuracy using RSFFS approach for the eight subjects, K-fold
4.10 Classification accuracy for five subjects based on the spatial SFFS algorithm, Leave-one-out
4.11 Classification accuracy for five subjects based on the spatial SFFS algorithm, Re-substitution
4.12 Classification accuracy for five subjects based on the spatial SFFS algorithm, Bootstrap
4.13 Testing results based on SFFS with leave-one-out cross-validation
4.14 Testing results based on spatial SFFS with leave-one-out cross-validation
4.15 Brodmann Areas 18 and 19
4.16 Brodmann Area 20
4.17 Brodmann Areas 21 and 22
4.18 Brodmann Area 36
4.19 Brodmann Area 37
4.20 Brodmann Area 39
Chapter I INTRODUCTION
1.1 Background
Functional Magnetic Resonance Imaging (fMRI) [1] is one of the most widely used non-invasive techniques for investigating human brain activity [2]. A primary goal of the analysis of fMRI data is to locate areas of brain activation that can differentiate various cognitive tasks. Pattern recognition from fMRI data has applications in developing Brain-Computer Interfaces (BCI) [3], studying neural activation in normal and diseased brains [4], and lie detection [5]. An fMRI-based BCI has to interpret the fMRI data from subjects performing different tasks and perform an action based on the predicted mental state of the participant.
Traditionally, fMRI analysis methods have focused on individual brain voxels: individual voxel time series were analyzed to assess their power in differentiating various cognitive states [6], and this individual-voxel approach has proven quite productive. For example, the General Linear Model (GLM) is a well-known method for analyzing fMRI data voxel by voxel; it decides whether a voxel is activated or not by a statistical analysis of its time series against the cognitive task. However, the limitation of individual voxel analysis is also obvious: since each voxel is analyzed in isolation, the connections between voxels are ignored. Multi-voxel patterns of activity are therefore studied to gain further insight. In this case, the activity of many voxels is considered at the same time and treated as a pattern for each cognitive state. The analysis of fMRI data can then be viewed as a pattern recognition/classification problem, which enables us to use a number of methods from the pattern recognition field to solve analysis problems in fMRI. A classification problem in fMRI is the problem of separating the fMRI data into different classes, providing a criterion for determining whether the BOLD response of the subject at a particular time during the experiment corresponds to a specific cognitive state or not.
There are also other benefits of the multi-voxel approach [7]:
1. More sensitive detection of cognitive states. Multi-Voxel Pattern Analysis (MVPA) uses pattern classification techniques to extract the signal that is present in the pattern of responses across multiple voxels, even if the individual voxels are not significantly responsive to the conditions of interest.
2. Relating brain activity to behavior on a trial-by-trial basis. MVPA can correlate classifier estimates of a subject's cognitive state in a single experiment with behavioral measures across trials.
3. Characterizing the structure of the neural code. MVPA methods can be used to characterize how cognitive states are represented in the brain [7].
In recent years, a variety of approaches have been proposed for the multivariate analysis of fMRI data, such as those based on machine learning [7], [8], artificial neural networks [9], [10], clustering [11]-[15] or independent component analysis (ICA) [16], [17]. Multivariate approaches based on machine learning treat fMRI analysis as a classification problem with the BOLD signals in the voxels as input features, and they associate a class label with each input image [18]. The huge dimensionality of fMRI images makes it computationally prohibitive to use all voxels as features for classifier design. Furthermore, the number of samples (fMRI images collected corresponding to a cognitive state) is significantly lower than the total number of voxels per image, which will result in over-fitting if all the voxels are used as features [19]. This is known as the peaking phenomenon [20]: in small sample scenarios, classification accuracy decreases when a large number of features is used. Thus, there is a need for feature selection in fMRI analysis.
1.2 Problem Statement
The application of machine learning techniques to the multivariate analysis of fMRI data has recently been gaining momentum. There are four basic steps in MVPA [7]: Step 1: Feature Selection; Step 2: Pattern Assembly; Step 3: Classifier Training; and Step 4: Generalization Testing [7].
In Step 1, feature selection approaches are mainly based on filter or wrapper approaches: the former do not interact with the final designed classifier, whereas the latter use classifier design in the search itself [19], [21]. Reducing the dimensionality of the fMRI data has mostly been achieved by region of interest (ROI) based feature reduction, or by principal component analysis (PCA) or independent component analysis (ICA) type approaches [7], [8], [16], [22], [23]. The other widely used approach is based on filter feature selection techniques, where the voxels that, considered individually, do the best job of discriminating between the conditions of interest are selected [8], [23], [24].
First, in the case of picking an ROI in the brain based on prior biological knowledge, the problem is that this a priori choice limits the analysis to the specified regions. For example, with Haxby's data on representations of faces and objects, the analysis is constrained to the ventral temporal cortex (Figure 1.1) [25]; with the orientation dataset, the researchers concentrate only on the visual cortex (Figure 1.2) [26]. These choices make sense when the fMRI acquisition experiments are extremely simple, so that we can suppose that a specific region of the brain is the one working, or most activated, during the experiments. Even if that assumption is correct, we cannot say that other parts of the brain do not contribute to the task, which means we cannot discard the rest of the brain from the analysis that easily, especially when the experiments are not simple, such as the dataset used in this thesis; we explain the data and experiments in detail in Chapter II.
Figure 1.2 Visual Cortex [26]
Second, the problem with univariate feature selection is that it might miss features that do not perform well individually but, in combination with other features, can produce a very accurate classifier. In this thesis, we use the wrapper approach of sequential floating forward search (SFFS) to select feature voxels which, in combination, are expected to produce higher classification rates. It should be noted that an exhaustive enumeration would be able to pick the best feature-set combination, but the extremely high dimensionality of the data prohibits this approach. We obtained superior classifier performance in distinguishing the cognitive states of whether a subject is doing a reasoning or a counting task by using the sequential approach to feature selection, as compared to selecting the features with the best individual classifier performance. We also propose a novel spatial SFFS approach that produced higher classification accuracy for multi-subject data than regular SFFS. We utilized a number of approaches (leave-one-out, k-fold cross-validation, re-substitution and bootstrap) to estimate the error of the designed classifiers and discuss their importance in the small sample scenario. When the number of samples is huge, all the error estimation methods will produce similar error estimates, but with a small number of samples they can show significant variations with different biases.
1.3 Outline
The basic approach to multivariate analysis of fMRI data in this thesis can be described as the series of steps shown in Figure 1.3. Due to the extremely high dimensionality (voxels/features) of fMRI datasets, the limited number of training samples and the presence of noise, it is imperative that careful consideration be given to the design of each step, or else the analysis can be flawed.
Figure 1.3 Basic steps involved in classification of cognitive states based on fMRI data: (A) cognitive experiments conducted to acquire fMRI and MRI data; (B) pre-processing for registering to structural MRI and mapping to a standard normal space; (C) selection of a small set of voxels relevant to differentiating the cognitive tasks at hand; (D) classification based on a specific classifier; (E) estimation of the accuracy of the classifier (validation). Note that steps D and E are also integrated into step C when we use wrapper feature selection algorithms such as SFFS.
This thesis consists of five chapters. After this introduction, Chapter II describes the fMRI experiments, data acquisition and data preprocessing (parts A and B in Figure 1.3). Chapter III discusses the feature selection, classifier design and error estimation methods (parts C, D and E in Figure 1.3). Chapter IV presents the results. Chapter V summarizes the conclusions of this thesis and points out future work.
Chapter II DATA
2.1 Experiments and Equipment
Eight participants (students at Texas Tech University) were asked to perform (i) a nonverbal abstract reasoning task based on the Raven's Progressive Matrices test [27] and (ii) a counting task. Raven's Progressive Matrices (often referred to simply as Raven's Matrices) are multiple-choice tests of abstract reasoning, originally developed by Dr. John C. Raven in 1936 [28]. In each test item, a candidate is asked to identify the missing segment required to complete a larger pattern. Many items are presented in the form of a 3x3 or 2x2 matrix, giving the test its name [29]. The analytic reasoning task was designed such that participants viewed a set of patterned squares presented as elements of a matrix, with one square of the matrix left blank. The participant chooses which of four patterned squares best completes the relationship depicted by the elements and responds by fiber-optic button press (see Figure 2.1). This task is thought to engage a variety of frontal cortical areas on both sides of the brain that are associated with analytic reasoning, decision making and other aspects of executive functioning. It was also anticipated that the parietal lobes would be actively engaged, especially those of the right hemisphere, as they are thought to be involved in the formation and manipulation of mental images, which may also be employed in deriving the solution of such problems.
In the counting task, a number of objects are presented in a similar matrix-type display, and the subjects are asked to count the number of objects shown inside the square (see Figure 2.2). This task is designed to engage the same primary visual processing areas of the brain (i.e., occipital cortex) as the reasoning task, because participants view the same items in the visual display. The counting task also engages the same motor cortex as the reasoning task, given that participants press the same four buttons in response to each type of stimulus.
Figure 2.2 Counting Task Sample
Eighteen reasoning stimuli (and 18 counting/control stimuli) were presented in the fMRI scanner via a projector in a block presentation design, with nine seconds of stimulus on-time interspersed with three seconds of rest. A block consisted of three counting problems followed by three analytic reasoning problems, until six such blocks had been completed. Eighteen counting problems and eighteen reasoning problems were thus completed, for a total of 36 stimulus presentations. Functional MRI signals were acquired with a GE Signa 1.5 T MRI scanner. The functional image acquisition settings were: TR (repetition time) 3000 ms, TE (time to echo) 14.4 ms, flip angle 30 degrees, and slice thickness 4.5 mm (no gap). Single-shot GR-EPI imaging was employed.
2.2 Data Preprocessing Tools
The FMRIB Software Library (FSL) [30] is used in this thesis for the major data preprocessing. It is a software library containing image analysis and statistical tools for
functional, structural and diffusion MRI brain imaging data. FSL is written mainly by members of the Analysis Group, FMRIB, Oxford, UK. The following FSL tools were used in this thesis:
I. BET (Brain Extraction Tool) deletes non-brain tissue from an image of the whole head. It can also estimate the inner and outer skull surfaces and the outer scalp surface, if good quality T1 and T2 input images are available [30].
II. FLIRT (FMRIB's Linear Image Registration Tool) is a fully automated, robust and accurate tool for linear (affine) intra- and inter-modal brain image registration [30]. The MNI152 brain template is applied to ensure that all eight different brain images conform to the same space.
2.3 Data Preprocessing Process
As mentioned above, eight participants (students at TTU) were asked to perform (i) a nonverbal abstract reasoning task based on the Raven's Progressive Matrices test and (ii) a counting task. Each subject performed 18 reasoning tasks and 18 counting tasks of 9 seconds duration each, with a rest time of 3 seconds following every task. The data was collected using a 1.5 Tesla GE Signa (Excite 11) MRI scanner, with the whole head volume scanned every 3 seconds. Each head volume consisted of 64 × 64 × 30 = 122,880 voxels. Given the length of the whole experiment (36 stimuli, each with 9 s of task and 3 s of rest, sampled at TR = 3 s), the time series contains 36 × 4 = 144 volumes. So the 4D data obtained from the scanner has dimensions 64 × 64 × 30 × 144. The raw data is in DICOM (Digital Imaging and Communications in Medicine) format, which is a standard for handling, storing, printing, and transmitting information in medical imaging [31]. For convenience of processing, the DICOM format is usually converted into NIfTI (Neuroimaging Informatics Technology Initiative) format using FSL, SPM5, MRIcron or many other brain imaging tools. The following steps were then applied.
2.3.1 Step 1
To avoid irrelevant noisy data influencing the classifier performance, the fMRI data outside the brain area was set to zero. We used BET to remove the skull and set the values outside the brain to 0 [32], [33]. The BET tool can work with the 4D data directly.
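For concreteness, a minimal command-line sketch of this step is given below; it assumes FSL's bet executable is on the system path and uses illustrative file names rather than the exact scripts used for this work.

% Skull-strip the 4D functional data with FSL's BET (illustrative filenames).
% -f is the fractional intensity threshold; 0.5 is the FSL default.
status = system('bet subject1_4d.nii.gz subject1_4d_brain.nii.gz -f 0.5');
if status ~= 0
    error('BET failed; check that FSL is installed and on the system path.');
end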
2.3.2 Step 2
We used FLIRT [34] to register each individual brain to a common space/template, the MNI152 template; the standard template size is 109 × 91 × 91 voxels. Because of the complexity of the registration, FLIRT can only work with 3D data, so we divided the 4D data into 144 3D volumes, applied the same registration process to each of them, and then combined them back into a 4D dataset.
Figures 2.4 and 2.5 show the settings of the FLIRT tool. We used Matlab with the Tools for NIfTI and ANALYZE image [35] to divide and combine the data in NIfTI format.
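A scripted equivalent of this split-register-merge procedure is sketched below, driven from Matlab through system calls. It assumes the FSL utilities fslsplit, flirt and fslmerge are on the path; the file names, the 2 mm MNI152 template and the -cost/-dof/-interp values are illustrative placeholders (the actual cost function and interpolation settings are the ones shown in Figures 2.4 and 2.5).

% Split the skull-stripped 4D data into 144 3D volumes, register each volume
% to the MNI152 template with FLIRT, and merge the results back into a 4D file.
system('fslsplit subject1_4d_brain.nii.gz vol_ -t');   % produces vol_0000 ... vol_0143
template = 'MNI152_T1_2mm_brain.nii.gz';               % assumed copy of the FSL standard template
for t = 0:143
    src = sprintf('vol_%04d.nii.gz', t);
    dst = sprintf('vol_%04d_mni.nii.gz', t);
    cmd = sprintf(['flirt -in %s -ref %s -out %s ', ...
                   '-cost corratio -dof 12 -interp trilinear'], src, template, dst);
    system(cmd);
end
system('fslmerge -t subject1_4d_mni.nii.gz vol_*_mni.nii.gz');   % recombine into 4D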
2.3.3 Step 3
Figure 2.6 shows the time series of the data. Since for each reasoning or counting task 3 head volumes were scanned, irrespective of when the subject finished the task, we averaged the data over the 3 head volumes. We then have one Counting, one Resting, one Reasoning, one Resting, one Counting again, and so on. The total number of volumes becomes 72, including 36 Resting, 18 Counting and 18 Reasoning volumes. Furthermore, to have a reasonable number of overall features to start with, we used the average BOLD response over blocks of 6 × 6 × 6 voxels as one feature. Following this processing, we had 18 × 15 × 15 = 4050 features, and 18 samples for reasoning and 18 samples for counting for each of the 8 subjects.
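A minimal Matlab sketch of this reduction is given below. It assumes the registered data has already been loaded into a 109 × 91 × 91 × 144 array named data (for example with load_nii from the toolbox in [35]), that the scan order repeats as three task volumes followed by one rest volume, and that only the 36 averaged task volumes are kept; these details are stated assumptions for illustration, not the exact script used in this work.

% 'data' is assumed to be a 109 x 91 x 91 x 144 array of registered BOLD volumes.
[nx, ny, nz, nt] = size(data);

% Temporal reduction: assume each stimulus contributes 3 task scans followed
% by 1 rest scan, so every consecutive group of 4 scans yields one task volume.
grp = reshape(1:nt, 4, [])';                        % one row per stimulus: [s1 s2 s3 rest]
vols = zeros(nx, ny, nz, size(grp, 1));
for k = 1:size(grp, 1)
    vols(:, :, :, k) = mean(data(:, :, :, grp(k, 1:3)), 4);
end

% Spatial reduction: mean BOLD over non-overlapping 6 x 6 x 6 voxel blocks,
% giving floor(109/6) x floor(91/6) x floor(91/6) = 18 x 15 x 15 = 4050 features.
bs = 6;
nbx = floor(nx/bs); nby = floor(ny/bs); nbz = floor(nz/bs);
features = zeros(size(vols, 4), nbx*nby*nbz);       % one row of 4050 features per sample
for k = 1:size(vols, 4)
    f = 0;
    for i = 1:nbx
        for j = 1:nby
            for m = 1:nbz
                blk = vols((i-1)*bs+1:i*bs, (j-1)*bs+1:j*bs, (m-1)*bs+1:m*bs, k);
                f = f + 1;
                features(k, f) = mean(blk(:));
            end
        end
    end
end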
Figure 2.6 Data Time Series
Figure 2.7 shows the flow chart of the whole data preprocessing.
Chapter III METHODS
3.1 Feature Selection and Sequential Floating Forward Search
The cost of constraining the classification rule to a family C, Δ_C = ε_C − ε_d (where ε_C is the lowest error achievable within C and ε_d is the error of the optimal Bayes classifier), known as the approximation error, provides a measure of how well the classification rule can approximate the optimal (unconstrained) Bayes classifier. Let us denote the classifier designed from the data of n samples by ψ_{n,C}, with error rate ε_{n,C} [36], [37]; the design error Δ_{n,C} = ε_{n,C} − ε_C measures how far this classifier is from the best classifier in C that we could design from the data. We would like both Δ_{n,C} and Δ_C to be low. A complex classification rule will have a low Δ_C, and when it is made even more complex so as to eventually include the optimal Bayes classifier, then Δ_C = 0. A simple classification rule will have a low Δ_{n,C}, as we make less error in learning a simpler classifier from data. When the number of samples is large, even a complex classification rule will have a low Δ_{n,C}, but with small samples, as in our case (36 samples per subject), the variance of Δ_{n,C} for a complex classification rule can be large. Henceforth, we used the simple Linear Discriminant Analysis (LDA) as our classifier for the results reported in this thesis. A theorem by Vapnik and Chervonenkis [37] provides a bound on the expected design error:

E[\Delta_{n,C}] \le 8 \sqrt{\frac{V_C \log n + 4}{2n}}    (1)

where V_C is the VC (Vapnik-Chervonenkis) [37] dimension of C. We notice that the bound will be tight when the number of samples n used for classifier design is large or when V_C is small. A simple classification rule has a low VC dimension; for instance, a linear classifier in d dimensions has VC dimension d + 1.
The VC dimension of a classifier generally increases when the number of features used for classification increases. Thus, for small sample scenarios such as ours, it is safer to use few dimensions (features) of the data for classification; use of a large number of features can result in over-fitting and an increase in Δ_{n,C}.
Feature selection methods search for a subset of voxels that show a strong association with the classification task. Feature selection methods are broadly classified into filter and wrapper methods; the former do not interact with the final designed classifier, whereas the latter use classifier design in the search itself [19], [21]. In the filter approach, the features are rated based on general characteristics such as interclass distance, statistical independence, or the correlation of individual features with the class groups. Wrapper techniques, on the other hand, evaluate feature subsets based on their predictive accuracy with a particular classifier. Filters are faster, but they tend to introduce bias and sometimes miss the multivariate relationships among features: a feature may not perform well individually but, in combination with other features, can produce a very accurate classifier. Wrapper methods, even though slower to run, tend to capture the feature combinations with the highest classifier accuracy. In wrapper methods, the goodness of a particular feature subset S_m is evaluated using an objective function J(S_m), which in our case is the estimated accuracy of the designed classifier.
We used the wrapper method of Sequential Floating Forward Search (SFFS) [38] with a minor modification. Since in a small sample scenario the monotonicity condition is not necessarily valid for the estimate of the classifier accuracy J(S_m) (the estimation approaches are described in Section 3.3), we started with an empty feature set and sequentially added the feature that produced the best value of the objective function J among all possible additions, rather than requiring an increase in J over the previous set (as in the usual SFFS approach). At every step, the already selected set of features is re-evaluated and features with low performance are removed. For comparison, we also approached feature selection by selecting the top 10 features that individually produce the best classifier performance; we term this the Best Discriminating Features (BDF) approach. This alternative BDF technique is similar to approaches that have been used
in a number of cognitive studies [23], [24], [25], where the discriminating power of individual voxels is computed and the top m voxels are selected.
SFFS: The SFFS method can be described algorithmically in the following way [38]:
Input: Y = {y_j | j = 1, ..., D} (the available measurements)
Output: X_k = {x_j | j = 1, ..., k, x_j ∈ Y}, k = 0, 1, ..., D
Initialization: X_0 := ∅; k := 0
Termination: stop when k equals the number of features required
Step 1 (Inclusion):
x+ := arg max_{x ∈ Y − X_k} J(X_k + x) (the most significant feature with respect to X_k)
X_{k+1} := X_k + x+; k := k + 1
Step 2 (Conditional Exclusion):
x− := arg max_{x ∈ X_k} J(X_k − x) (the least significant feature in X_k)
if J(X_k − x−) > J(X_{k−1}) then
    X_{k−1} := X_k − x−; k := k − 1
    go to Step 2
else
    go to Step 1
We implemented the SFFS algorithm on the individual subject sample sets for both the registered data set and the non-registered data sets; we denote these by RSFFS and NRSFFS, respectively. We also applied the approach to the combination of the eight subjects' registered data sets; we denote this approach by CSFFS.
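A compact Matlab sketch of this wrapper search is given below. It assumes features is an n × D matrix of samples and labels an n × 1 vector of numeric class labels, and it uses leave-one-out LDA accuracy (through the classify function) as one possible choice of the objective J; the actual implementation used for this thesis may differ in its details.

function selected = sffs(features, labels, numRequired)
% Sequential floating forward search (wrapper feature selection).
% features : n x D data matrix, labels : n x 1 numeric class labels.
D = size(features, 2);
selected = [];                           % X_0 = empty set
bestJ = -inf(1, numRequired);            % best J recorded for each subset size
while numel(selected) < numRequired
    % Step 1 (inclusion): add the most significant feature with respect to X_k
    remaining = setdiff(1:D, selected);
    Jadd = zeros(1, numel(remaining));
    for i = 1:numel(remaining)
        Jadd(i) = estimateAccuracy(features(:, [selected remaining(i)]), labels);
    end
    [J, ibest] = max(Jadd);
    selected = [selected remaining(ibest)];
    bestJ(numel(selected)) = max(bestJ(numel(selected)), J);
    % Step 2 (conditional exclusion): drop the least significant feature while
    % the reduced set beats the best subset of that size seen so far
    while numel(selected) > 2
        Jdrop = zeros(1, numel(selected));
        for i = 1:numel(selected)
            sub = selected; sub(i) = [];
            Jdrop(i) = estimateAccuracy(features(:, sub), labels);
        end
        [J, iworst] = max(Jdrop);
        if J > bestJ(numel(selected) - 1)
            selected(iworst) = [];
            bestJ(numel(selected)) = J;
        else
            break;
        end
    end
end
end

function acc = estimateAccuracy(X, y)
% Leave-one-out accuracy of an LDA classifier (one possible choice of J).
n = size(X, 1); correct = 0;
for i = 1:n
    train = true(n, 1); train(i) = false;
    pred = classify(X(i, :), X(train, :), y(train));
    correct = correct + (pred == y(i));
end
acc = correct / n;
end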
3.2 Spatial SFFS
The above approaches do not consider spatial location while selecting the features. It is generally assumed that a cognitive task will produce a few spatially distinct areas of activation, where the voxels within each individual area are spatially connected. To include this biological constraint, we modified the algorithm by adding spatial information. The first feature is selected to be the voxel with the highest classification accuracy. When adding further features, the voxels that are spatially connected to the current set of features are given more weight than the voxels that are not. In this algorithm, different weights are applied at different locations during the calculation of the classification accuracy. In detail, the voxels around a voxel chosen in a former step receive 10% more weight: if we consider the 3 × 3 × 3 neighborhood of a previously picked voxel, the accuracies of the other 26 voxels in that neighborhood are multiplied by 1.1. We perform the same process around all selected voxels and then compare all the (weighted) classification accuracies to decide which feature to pick next. Note that when recording the final accuracy, the program records the true classification accuracy rather than the weighted value. We denote this approach by spatial SFFS, and we can then compare the spatial SFFS results with those of regular SFFS.
3.3 Classification and Classifier Error Estimation
In this thesis, the classifier between the two tasks is produced by the Matlab function classify, which is based on Linear Discriminant Analysis (LDA) [39]. LDA finds the linear combination of features that best separates the two classes. The resulting combination can be used as a linear classifier or, more commonly, for dimensionality reduction before later classification [39]. Suppose we have n d-dimensional samples x_1, x_2, ..., x_n belonging to two classes ω_1 and ω_2. We want to find the direction w such that the projections y = w^T x of the two classes are well separated; LDA finds the best such w.
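For the two-class case, the optimal projection direction has a closed form. The following restates Fisher's criterion from [39] for completeness; the symbols m_1, m_2 (class means) and S_W, S_B (within-class and between-class scatter matrices) are introduced here and are not part of the original text:

J(\mathbf{w}) \;=\; \frac{\mathbf{w}^{T} S_B \mathbf{w}}{\mathbf{w}^{T} S_W \mathbf{w}}, \qquad
S_W = \sum_{i=1}^{2} \sum_{\mathbf{x} \in \omega_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^{T}, \qquad
S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^{T}

The criterion is maximized by w ∝ S_W^{-1}(m_1 − m_2).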
Figure 3.1 LDA: the left panel shows the two classes in the original coordinates; the right panel shows the projections of the two classes onto the new line.
Once the classifier is designed, we also need cross-validation [40], [41] methods to estimate the classification error. Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a classifier will perform in practice. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set) and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds [42]. We used four different methods to estimate the classifier performance (J = accuracy = 1 - error):
(1) Re-substitution: the same samples are used for training and testing the classifier.
(2) Leave-one-out: one sample at a time is left out for testing and the rest are used for training; this is repeated n times (n being the total number of feature vectors) and the average accuracy is taken.
(3) k-fold cross-validation: the data is randomly divided into k folds; (k - 1) folds are used for training and the remaining fold is used for testing (in our reported results, k = 6 and we repeated this procedure 100 times).
(4) Bootstrap: n samples are selected from the data with replacement and the samples not selected are used for testing; this procedure is repeated 100 times.
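The following Matlab sketch illustrates how these four estimates can be computed with the classify function. Here X is an n × d matrix of the selected features and y an n × 1 vector of numeric class labels (e.g., 1 = reasoning, 2 = counting); the random partitioning details are assumptions rather than the exact procedure used for the reported results.

% X : n x d matrix of selected features, y : n x 1 numeric class labels.
n = size(X, 1);

% (1) Re-substitution: train and test on the same samples.
accResub = mean(classify(X, X, y) == y);

% (2) Leave-one-out.
correct = 0;
for i = 1:n
    keep = true(n, 1); keep(i) = false;
    correct = correct + (classify(X(i, :), X(keep, :), y(keep)) == y(i));
end
accLoo = correct / n;

% (3) 6-fold cross-validation, repeated 100 times with random partitions.
k = 6; reps = 100; accKfold = zeros(reps, 1);
for r = 1:reps
    fold = mod(randperm(n)', k) + 1;          % random, nearly balanced fold labels
    acc = zeros(k, 1);
    for f = 1:k
        test = (fold == f);
        acc(f) = mean(classify(X(test, :), X(~test, :), y(~test)) == y(test));
    end
    accKfold(r) = mean(acc);
end
accKfold = mean(accKfold);

% (4) Bootstrap: train on n samples drawn with replacement, test on the rest.
accBoot = zeros(reps, 1);
for r = 1:reps
    trainIdx = randi(n, n, 1);                % n training samples with replacement
    isTest = true(n, 1); isTest(trainIdx) = false;
    accBoot(r) = mean(classify(X(isTest, :), X(trainIdx, :), y(trainIdx)) == y(isTest));
end
accBoot = mean(accBoot);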
Chapter IV RESULTS
Figures 4.1-4.4 show the classification accuracy of the NRSFFS approach in differentiating the cognitive states corresponding to the reasoning and counting tasks for five different subjects (S1, S2, S3, S4, S5), using four different error estimators. For each subject, the classification accuracy of the classifier using 10 features selected by the SFFS algorithm (dark grey bars, SFFS) and of the classifier using the top 10 features that individually best classify the two tasks (light grey bars, BDF) are shown. We notice that the performance of the classifier designed using the SFFS approach to feature selection (as measured by the four different estimators) is highly superior to the performance of a classifier designed with the same number of features selected based on individual discriminating power. The classification accuracy as measured by leave-one-out or re-substitution is around 90% for all the subjects. For the bootstrap or 6-fold cross-validation estimators, it is around or above 80% for all the subjects.
Figure 4.4 Classification accuracy by NRSFFS approach, Bootstrap
Figures 4.5 and 4.6 show the classification accuracy for subjects 3 and 4 using the NRSFFS approach with the number of features varying from 1 to 10. The graphs show that the classifier accuracy stabilizes after a few features, especially when the re-substitution or leave-one-out estimators are used. The graphs also show that monotonicity (better performance with more features) does not hold for the bootstrap and k-fold cross-validation estimates. We should, however, remember that these feature selection approaches are still sub-optimal, and an exhaustive search (computationally prohibitive in our case) would have produced better results.
Figure 4.5 Classification accuracy for subject 3 using NRSFFS approach for number of features varying from 1 to 10
Figure 4.6 Classification accuracy for subject 4 using NRSFFS approach for number of features varying from 1 to 10
Figures 4.7-4.9 show the classification accuracy using the RSFFS approach for the eight subjects (S1-S8), using three different error estimators. Comparison of Figures 4.1 and 4.7 shows that the classification accuracy for the registered data set is similar to that for the non-registered data set for individual-subject cognitive classification.
Figure 4.7 Classification accuracy using RSFFS approach for the eight subjects, Leave-one-out
Figure 4.8 Classification accuracy using RSFFS approach for the eight subjects, Re-substitution
Figure 4.9 Classification accuracy using RSFFS approach for the eight subjects, K-fold
Figures 4.10-4.12 show the classification accuracy for five subjects (S1, S2, S3, S4, S5) based on the spatial SFFS algorithm. Comparison of Figures 4.7 and 4.10 illustrates that the classification accuracy of spatial SFFS is lower than that of regular SFFS, due to the additional spatial constraint.
Figure 4.10 Classification accuracy for five subjects based on the spatial SFFS algorithm, Leave-one-out
Figure 4.11 Classification accuracy for five subjects based on the spatial SFFS algorithm, Re-substitution
Figure 4.12 Classification accuracy for five subjects based on the spatial SFFS algorithm, Bootstrap
Figure 4.13 shows the classification results based on standard SFFS with the leave-one-out error estimation method for the combined data of all eight subjects. The accuracy reaches around 75% when the number of features approaches 50.
Figure 4.13 Testing results based on SFFS with leave-one-out cross-validation
Meanwhile, Figure 4.14 shows the classification results based on spatial SFFS with the leave-one-out error estimation method for the combined data of all eight subjects. The accuracy reaches around 80% when the number of features approaches 50. From the final accuracies of the two figures (when the number of features is 50), we can see that spatial SFFS works better than standard SFFS in the combined, multi-subject case.
Figure 4.14 Testing results based on spatial SFFS with leave-one-out cross-validation
These results can be considered universal in the sense that the design of the classifier is based on multiple-subject data rather than the data of an individual subject. The resulting classification accuracy is lower than the accuracy obtained with individual sample sets because different subjects may not share the same regions or voxels that optimally differentiate the cognitive tasks. These results point towards the fact that a few spatially connected regions, as compared to a large number of spatially distinct voxels, produce better classification results for a dataset consisting of multiple subjects. As mentioned previously, the reverse is true for results obtained from individual subjects. The results reported in this thesis suggest a general heuristic for feature selection in fMRI analysis: a regular SFFS approach for differentiating cognitive tasks in individual subjects, and a spatial SFFS type approach for generating a universal classifier for multiple subjects.
Once we have the feature selection results, we can look into the biological meaning of the selected features, especially for the spatial SFFS results. We analyzed the spatial locations of the features selected using the spatial SFFS approach and
found them to be located in the following Brodmann areas: 18, 19, 20, 21, 22, 36, 37, and 39. Brodmann (BM) areas 18 and 19 are part of the occipital cortex and comprise the extrastriate cortex. This is considered to be a visual association area with feature-extracting, shape-recognition and multimodal integration functions [43]. For differentiating the reasoning and counting tasks, we would expect these areas to be important, as both cognitive tasks require shape recognition and feature extraction capabilities.
Figure 4.15 Brodmann Areas 18 and 19
Similarly, Brodmann area 20 is related to recognition memory and high-level visual processing, which are required in the reasoning and counting tasks.
BM areas 21 and 22 are related to language and auditory processing, which might be involved when participants are counting objects by silently saying the numerals.
Figure 4.17 Brodmann Areas 21 and 22
BM area 36 is involved in the processing of sensory input, including the visual senses.
Figure 4.18 Brodmann Area 36
BM area 37 is most likely involved in the processing of semantics in speech and vision.
BM area 39 is assumed to be involved in a number of processes related to language, mathematics and cognition, which might be required in the cognitive tasks of reasoning and counting.
REFERENCES
[1] Huettel, S. A., Song, A. W., and McCarthy, G. Functional Magnetic Resonance Imaging. Sinauer Associates, 2004.
[2] Di Bono, M. G. and Zorzi, M. "Decoding Cognitive States from fMRI Data using Support Vector Regression." PsychNology Journal 6.2 (2008): 189-201.
[3] Weiskopf, N., Mathiak, K., Bock, S. W., Scharnowski, F., Veit, R., Grodd, W., Goebel, R., and Birbaumer, N. "Principles of a Brain-Computer Interface (BCI) based on Real-Time Functional Magnetic Resonance Imaging (fMRI)." IEEE Transactions on Biomedical Engineering 51.6 (2004): 966-970.
[4] Greicius, M., Srivastava, G., Reiss, A., and Menon, V. "Default-mode network activity distinguishes Alzheimer's disease from healthy aging: evidence from functional MRI." Proc Natl Acad Sci 101.13 (2004): 4637-4642.
[5] Kozel, F., Johnson, K., Grenesko, E., Laken, S., Kose, S., Lu, X., Pollina, D., Ryan, A., and George, M. "Functional MRI detection of deception after committing a mock sabotage crime." J Forensic Sci 54.1 (2009): 220-231.
[6] Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. P., Frith, C. D., and Frackowiak, R. S. J. "Statistical parametric maps in functional imaging: A general linear approach." Human Brain Mapping 2.4 (1995): 189-210.
[7] Norman, K., Polyn, S., Detre, G., and Haxby, J. "Beyond mind-reading: multi-voxel pattern analysis of fMRI data." Trends in Cognitive Sciences 10 (September 2006): 424-430.
[8] Mitchell, T., Hutchinson, R., Niculescu, R. S., Pereira, F., Wang, X., Just, M., and Newman, S. "Learning to decode cognitive states from brain images." Machine Learning 57 (2004): 145-175.
[9] Chuang, K. H., Chiu, M. J., Lin, C. C., and Chen, J. H. "Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy c-means." IEEE Trans Med Imaging 18.12 (1999): 1117-1128.
[10] Voultsidou, M., Dodel, S., and Herrmann, J. M. "Neural networks approach to clustering of activity in fMRI data." IEEE Transactions on Medical Imaging (2005): 987-996.
[11] Meyer, F. G. and Chinrungrueng, J. "Analysis of event-related fMRI data using best clustering bases." IEEE Transactions on Medical Imaging 22 (2003): 933-939.
[12] Liao, T. W. "Clustering of time series data - a survey." Pattern Recognition 38.11 (2005): 1857-1874.
[13] Heller, R., Stanley, D., Yekutieli, D., Rubin, N., and Benjamini, Y. "Cluster-based analysis of fMRI data." NeuroImage 33 (2006): 599-608.
[14] Chen, S., Bouman, C. A., and Lowe, M. J. "Clustered components analysis for functional MRI." IEEE Transactions on Medical Imaging 23 (2004): 85-98.
[15] Chen, H., Yuan, H., Yao, D., Chen, L., and Chen, W. "An integrated neighbourhood correlation and hierarchical clustering approach of functional MRI." IEEE Transactions on Biomedical Engineering 53 (2006): 452-458.
[16] Hu, D., Yan, L., Liu, Y., Zhou, Z., Friston, K. J., Tan, C., and Wu, D. "Unified SPM-ICA for fMRI analysis." NeuroImage 25 (2005): 746-755.
[17] Meyer-Baese, A., Wismueller, A., and Lange, O. "Comparison of two exploratory data analysis methods for fMRI: unsupervised clustering versus independent component analysis." IEEE Transactions on Information Technology in Biomedicine 8 (2004): 387-398.
[18] Jin, B., Strasburger, A., Laken, S. J., Kozel, F. A., Johnson, K. A., George, M. S., and Lu, X. "Feature selection for fMRI-based deception detection." Translational Bioinformatics (2009).
[19] Jain, A. and Zongker, D. "Feature selection: evaluation, application, and small sample performance." IEEE Trans. Pattern Anal. Machine Intell. 19 (1997): 153-158.
[20] Hughes, G. F. "On the mean accuracy of statistical pattern recognizers." IEEE Trans. Information Theory 14 (1968): 55-63.
[21] Kohavi, R. and John, G. "Wrappers for feature subset selection." Artificial Intelligence 97 (1997): 273-324.
[22] Mourão-Miranda, J., Friston, K. J., and Brammer, M. "Dynamic discrimination analysis: a spatial-temporal SVM." NeuroImage 36.1 (2007): 88-99.
[23] Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., and Pietrini, P. "Distributed and overlapping representations of faces and objects in ventral temporal cortex." Science 293 (September 2001): 2425-2430.
[24] Polyn, S. M., Natu, V. S., Cohen, J. D., and Norman, K. A. "Category-specific cortical activity precedes retrieval during memory search." Science 310 (December 2005): 1963-1966.
[25] Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., and Pietrini, P. "Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex." Science 293 (2001): 2425-2430.
[26] Kamitani, Y. and Tong, F. "Decoding the Visual and Subjective Contents of the Human Brain." Nature Neuroscience 8 (2005): 679-685.
[27] Raven, J. and Raven, J. Raven Progressive Matrices. New York: Kluwer Academic/Plenum Publishers, 2003. 223-240.
[28] Raven, J. C. Mental tests used in genetic studies: The performance of related individuals on tests mainly educative and mainly reproductive. MSc Thesis, University of London, 1936.
[29] Wikipedia. "Raven's Progressive Matrices." January 21, 2010. (http://en.wikipedia.org/wiki/Ravens_Progressive_Matrices)
[30] Analysis Group, FMRIB, Oxford, UK. The FMRIB Software Library, FSL 4.1. August 2008. (http://www.fmrib.ox.ac.uk/fsl/index.html)
[31] Medical Imaging & Technology Alliance. Digital Imaging and Communications in Medicine. February 14, 2010. (http://dicom.nema.org/)
[32] Smith, S. M. "Fast robust automated brain extraction." Human Brain Mapping 17.3 (November 2002): 143-155.
[33] Jenkinson, M., Pechaud, M., and Smith, S. "BET2: MR-based estimation of brain, skull and scalp surfaces." Eleventh Annual Meeting of the Organization for Human Brain Mapping, 2005.
[34] Jenkinson, M. and Smith, S. M. "A global optimization method for robust affine registration of brain images." Medical Image Analysis 5.2 (2001): 143-156.
[35] Shen, J. "Tools for NIfTI and ANALYZE image." MATLAB Central File Exchange, 23 Oct 2005, accessed 13 Apr 2010. (http://www.mathworks.com/matlabcentral/fileexchange/8797)
[36] Braga-Neto, U. "Fads and fallacies in the name of small-sample microarray classification." IEEE Signal Processing Magazine 24.1 (2007): 91-99.
[37] Devroye, L., Gyorfi, L., and Lugosi, G. A Probabilistic Theory of Pattern Recognition. New York: Springer-Verlag, 1996.
[38] Pudil, P., Novovicova, J., and Kittler, J. "Floating search methods in feature selection." Pattern Recog. Lett. 15 (1994): 1119-1125.
[39] Fisher, R. A. "The Use of Multiple Measurements in Taxonomic Problems." Annals of Eugenics 7 (1936): 179-188.
[40] Kohavi, R. "A study of cross-validation and bootstrap for accuracy estimation and model selection." Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2.12 (1995): 1137-1143.
[41] Devijver, P. A. and Kittler, J. Pattern Recognition: A Statistical Approach. London: Prentice-Hall, 1982.
[42] Kohavi, R. "A study of cross-validation and bootstrap for accuracy estimation and model selection." Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2.12 (1995): 1137-1143.
[43] Wikipedia. "Brodmann area." March 18, 2010. (http://en.wikipedia.org/wiki/Brodmann_area)