FML Unit 1 2 Book | PDF
Uploaded by

Keval Kotadia
Introduction to Machine Learning

Syllabus
Overview of Human Learning and Machine Learning, Types of Machine Learning, Applications of Machine Learning, Tools and Technology for Machine Learning

Contents
1.1 Overview of Human Learning
1.2 Overview of Machine Learning
1.3 Types of Machine Learning
1.4 Applications of Machine Learning
1.5 Tools and Technology for Machine Learning
1.6 Fill in the Blanks
1.7 Multiple Choice Questions

1.1 Overview of Human Learning
* Learning is the process of acquiring new understanding, knowledge, behaviours, skills, values, attitudes and preferences. Learning happens when you observe a phenomenon and recognize a pattern.
* Learning is both a phenomenon and a process, and it manifests itself in various ways. The learning process includes gaining new symbolic knowledge and developing cognitive skills through instruction and practice. It is also the discovery of new facts and theories through observation and experiment.
* All human learning consists of observing something, identifying a pattern, building a theory (model) to explain this pattern and testing this theory to check whether it fits most or all observations. Fig. 1.1.1 shows human learning.
Fig. 1.1.1 Human learning
* Both human and machine learning generate knowledge, one residing in the brain and the other residing in the machine.
* The human learning process varies from person to person. Once a learning process is set in people's minds, it is difficult to change it. Fig. 1.1.2 shows the relation between human and machine learning.
Fig. 1.1.2 Relation between human and machine learning
TECHNICAL PUBLICATIONS® - an up-thrust for knowledge

Types of human learning
* Human learning takes place in the following ways:
1. Self-learning: Humans learn by trying many times; some of the attempts are unsuccessful.
2. Knowledge gained from an expert: We build our own notion indirectly, based on what we have learnt from the expert in the past.
3. Learning directly from an expert: Somebody who is an expert in the subject teaches us directly.
* Humans acquire knowledge through experience, either directly or shared by others. Humans begin learning by memorizing. After a few years, a person realizes that the mere capability to memorize is not intelligence.
* In humans, learning speed depends on the individual; in machines, learning speed depends on the algorithm selected and the volume of examples exposed to it.

Difference between Human and Machine Learning
Human learning | Machine learning
Humans acquire knowledge through experience, either directly or shared by others. | Machines acquire knowledge through experience shared in the form of past data.
Model-free and model-based mechanisms can be found in human learning. | Knowledge-based learning is used in machine learning.
Observation => Learning => Skill | Data => Machine Learning => Skill

1.2 Overview of Machine Learning
* Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) which is concerned with developing computational theories of learning and building learning machines.
* Machine Learning Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
* Machine learning is programming computers to optimize a performance criterion using example data or past experience.
* Application of machine learning methods to large databases is called data mining.
* It is very hard to write programs that solve problems like recognizing a human face. We do not know what program to write because we do not know how our brain does it. Instead of writing a program by hand, it is possible to collect lots of examples that specify the correct output for a given input.
* A machine learning algorithm takes these examples and produces a program that does the job. The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers. If we do it right, the program works for new cases as well as the ones we trained it on.
* The main goal of machine learning is to devise learning algorithms that do the learning automatically, without human intervention or assistance. The machine learning paradigm can be viewed as "programming by example". Another goal is to develop computational models of the human learning process and perform computer simulations.
* The goal of machine learning is to build computer systems that can adapt and learn from their experience.
* An algorithm is used to solve a problem on a computer. An algorithm is a sequence of instructions that is carried out to transform the input into the output. For example, the addition of four numbers is carried out by giving four numbers as input to the algorithm, and the output is the sum of all four numbers. For the same task, there may be various algorithms. We are interested in finding the most efficient one, requiring the least number of instructions or memory or both.
* For some tasks, however, we do not have an algorithm.

Why is Machine Learning Important?
* Machine learning algorithms can figure out how to perform important tasks by generalizing from examples.
* Machine learning provides business insight and intelligence. Decision makers are provided with greater insights into their organizations.
This adaptive technology is being used by global enterprises to gain a competitive edge.
* Machine learning algorithms discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system.
* Following are some of the reasons:
1. Some tasks cannot be defined well, except by examples. For example: recognizing people.
2. Relationships and correlations can be hidden within large amounts of data. Machine learning and data mining may be able to find these relationships.
3. Human designers often produce machines that do not work as well as desired in the environments in which they are used.
4. The amount of knowledge available about certain tasks might be too large for explicit encoding by humans.
5. Environments change from time to time.
6. New knowledge about tasks is constantly being discovered by humans.
* Machine learning also helps us find solutions to many problems in computer vision, speech recognition and robotics. Machine learning uses the theory of statistics in building mathematical models, because the core task is making inference from a sample.

How Machines Learn
* Machine learning typically follows three phases:
1. Training: A training set of examples of correct behavior is analyzed and some representation of the newly learnt knowledge is stored, usually in the form of rules.
2. Validation: The rules are checked and, if necessary, additional training is given. Sometimes additional test data are used; alternatively, a human expert may validate the rules, or some other automatic knowledge-based component may be used. The role of the tester is often called the opponent.
3. Application: The rules are used in responding to some new situation.
Fig. 1.2.1

How do Machines Learn?
* The machine learning process is divided into three parts: data input, abstraction and generalization. Fig. 1.2.2 shows the machine learning process.
* Data input: Information is used for future decision making.
* Abstraction: Input data is represented in a broader way through the underlying algorithm.
* Generalization: It forms a framework for making decisions.
* Machine learning is a form of Artificial Intelligence (AI) that teaches computers to think in a similar way to how humans do: learning and improving upon past experiences. It works by exploring data and identifying patterns, and involves minimal human intervention.
Fig. 1.2.2 Machine learning process

Abstraction
* During the machine learning process, knowledge is fed in the form of input data. Collected data is raw data. It cannot be used directly for processing.
* In the machine learning paradigm, the model is a summarized knowledge representation of the raw data. The model may be in any one of the following forms:
1. Mathematical equations.
2. Specific data structures like trees.
3. Logical grouping of similar observations.
4. Computational blocks.
* The choice of the model used to solve a specific learning problem is a human task. Some of the factors that guide this choice are as follows:
a) Type of problem to be solved.
b) Nature of the input data.
c) Problem domain.
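As a minimal sketch of the data input, abstraction and generalization steps, the following Python snippet summarizes raw labelled observations into a model and then uses that model to decide about a new value. The choice of model (one mean value per label), the sample numbers and the function names `abstract` and `generalize` are assumptions made only for illustration; they are not taken from the text.

```python
from statistics import mean

# Data input: raw (value, label) observations. The numbers are made up.
raw_data = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
            (4.9, "high"), (5.1, "high"), (5.0, "high")]

def abstract(observations):
    """Abstraction: summarize the raw data as one mean value per label."""
    groups = {}
    for value, label in observations:
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}

def generalize(model, new_value):
    """Generalization: decide about unseen input using the summarized model."""
    return min(model, key=lambda label: abs(model[label] - new_value))

model = abstract(raw_data)
print(model)                    # the summarized representation of the raw data
print(generalize(model, 4.2))   # decision for a value that was never observed
```

Here the model is a logical grouping of similar observations reduced to a few numbers; a different problem or data type could call for an equation or a tree instead.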
Well Posed Learning Problem
* Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
* A (machine learning) problem is well-posed if a solution to it exists, if that solution is unique, and if that solution depends on the data / experience but is not sensitive to (reasonably small) changes in the data / experience.
* To formulate a well-posed learning problem, identify these three features:
1. Class of tasks
2. Measure of performance to be improved
3. Source of experience
* What are T, P, E? How do we formulate a machine learning problem?
* A Robot Driving Learning Problem
1. Task T: Driving on a public, 4-lane highway using vision sensors.
2. Performance measure P: Average distance traveled before an error (as judged by a human overseer).
3. Training experience E: A sequence of images and steering commands recorded while observing a human driver.
* A Handwriting Recognition Learning Problem
1. Task T: Recognizing and classifying handwritten words within images.
2. Performance measure P: Percent of words correctly classified.
3. Training experience E: A database of handwritten words with given classifications.
* Text Categorization Problem
1. Task T: Assign a document to its content category.
2. Performance measure P: Precision and recall.
3. Training experience E: Example pre-classified documents.

1.3 Types of Machine Learning
* Learning is constructing or modifying a representation of what is being experienced. To learn means to get knowledge of something by study, experience or being taught.
* Machine learning is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as sensor data or databases.
* Machine learning is usually divided into three types: supervised, unsupervised and reinforcement learning.
* Why do machine learning?
1. To understand and improve the efficiency of human learning.
2. To discover new things or structure that is unknown to humans.
3. To fill in skeletal or incomplete specifications about a domain.
Fig. 1.3.1 Machine learning: supervised learning (classification, regression), unsupervised learning (clustering, association analysis) and reinforcement learning

Supervised Learning
* Supervised learning is the machine learning task of inferring a function from supervised training data. The training data consist of a set of training examples. The task of the supervised learner is to predict the output behavior of a system for any set of input values, after an initial training phase.
* In supervised learning, the network is trained by providing it with input and matching output patterns. These input-output pairs are usually provided by an external teacher.
* Human learning is based on past experiences. A computer does not have experiences. A computer system learns from data, which represent some "past experiences" of an application domain.
* The goal is to learn a target function that can be used to predict the values of a discrete class attribute, e.g., approved or not approved, and high-risk or low-risk. The task is commonly called supervised learning, classification or inductive learning.
* Training data includes both the input and the desired results. For some examples the correct results (targets) are known and are given as input to the model during the learning process. The construction of a proper training, validation and test set is crucial. These methods are usually fast and accurate.
* The model has to be able to generalize: it should give the correct results when new data are given as input, without knowing the target a priori.
* In supervised learning, each training example is a pair consisting of an input object and a desired output value.
* A supervised learning algorithm analyzes the training data and produces an inferred function, which is called a classifier or a regression function. Fig. 1.3.2 shows the supervised learning process.
Fig. 1.3.2 Supervised learning process (training and testing)
* The learned model helps the system to perform the task better than it would with no learning.
* Each input vector requires a corresponding target vector.
Training Pair = (Input Vector, Target Vector)
Fig. 1.3.3
* Supervised learning denotes a method in which some input vectors are collected and presented to the network. The output computed by the network is observed and the deviation from the expected answer is measured. The weights are corrected according to the magnitude of the error in the way defined by the learning algorithm.
* Supervised learning is further divided into methods which use reinforcement or error correction. The perceptron learning algorithm is an example of supervised learning with reinforcement.
* In order to solve a given problem of supervised learning, the following steps are performed:
1. Find out the type of training examples.
2. Collect a training set.
3. Determine the input feature representation of the learned function.
4. Determine the structure of the learned function and the corresponding learning algorithm.
5. Complete the design and then run the learning algorithm on the collected training set.
6. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set.
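The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the data points are invented, the input representation is a pair of numbers, and the learned function is assumed to be a 1-nearest-neighbour rule, which is one simple choice among many and is not prescribed by the text.

```python
import math

# Steps 1-2: a small labelled data set of (input vector, target) pairs.
# The points and labels are invented for illustration.
examples = [
    ((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.8, 1.0), "A"), ((1.1, 1.3), "A"),
    ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.9, 3.0), "B"), ((3.3, 3.1), "B"),
]

# Keep a test set that is separate from the training set.
training_set = examples[0:3] + examples[4:7]
test_set = [examples[3], examples[7]]

def predict(training_pairs, x):
    """Steps 3-5: the learned function here is a 1-nearest-neighbour rule."""
    nearest = min(training_pairs, key=lambda pair: math.dist(pair[0], x))
    return nearest[1]

def accuracy(training_pairs, test_pairs):
    """Step 6: fraction of test inputs whose predicted label matches the target."""
    correct = sum(predict(training_pairs, x) == target for x, target in test_pairs)
    return correct / len(test_pairs)

print(accuracy(training_set, test_set))
```

Measuring accuracy only on `test_set` reflects step 6: the test pairs are never used when predicting from `training_set`.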
Classification
* Classification predicts categorical labels (classes), whereas prediction models continuous-valued functions. Classification is considered to be supervised learning.
* Classification builds a model from the training set and the values in a classifying attribute and uses it to classify new data. Prediction models continuous-valued functions, i.e., it predicts unknown or missing values.
* Preprocessing of the data in preparation for classification and prediction can involve data cleaning to reduce noise or handle missing values, relevance analysis to remove irrelevant or redundant attributes, and data transformation, such as generalizing the data to higher-level concepts or normalizing the data.
* Fig. 1.3.4 shows classification.
Aim: To predict categorical class labels for new samples.
Input: Training set of samples, each with a class label.
Output: A classifier based on the training set and the class labels.
Fig. 1.3.4 Classification
* Prediction is similar to classification. It constructs a model and uses the model to predict an unknown or missing value.
* Classification is the process of finding a model that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data.
* Classification and prediction may need to be preceded by relevance analysis, which attempts to identify attributes that do not contribute to the classification or prediction process.
* Numeric prediction is the task of predicting continuous values for a given input. For example, we may wish to predict the salary of a college employee with 15 years of work experience, or the potential sales of a new product given its price.
Some of the classification methods, like back-propagation, support vector machines and k-nearest-neighbor classifiers, can also be used for prediction.

Regression
* For an input x, if the output is continuous, this is called a regression problem. For example, based on historical information on the demand for toothpaste in your supermarket, you are asked to predict the demand for the next month.
* Regression is concerned with the prediction of continuous quantities. Linear regression is one of the oldest and most widely used predictive models in the field of machine learning. The goal is to fit a straight line to a set of data points by minimizing the sum of the squared errors.
* For regression tasks, the typical accuracy metrics are Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). These metrics measure the distance between the predicted numeric target and the actual numeric answer.

Regression Line
* Least squares: The least squares regression line is the line that makes the sum of squared residuals as small as possible. Linear means "straight line".
* The regression line is the line which gives the best estimate of one variable from the value of any other given variable.
* The regression line gives the average relationship between the two variables in mathematical form.
* For two variables X and Y, there are always two lines of regression.
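A minimal Python sketch of the least-squares fit and of the RMSE and MAPE metrics is given here. The demand figures are hypothetical, and the function names `fit_line`, `rmse` and `mape` are illustrative. The slope is computed with the standard closed-form least-squares formula, the sum of products of deviations divided by the sum of squared deviations of the independent variable. Swapping the roles of the two variables yields the second of the two lines of regression.

```python
import math

# Hypothetical monthly demand figures: x = month index, y = units sold.
xs = [1, 2, 3, 4, 5]
ys = [12, 14, 17, 19, 23]

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns the pair (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b

def rmse(actual, predicted):
    """Root Mean Square Error between actual and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(actual, predicted)) / len(actual)

a_yx, b_yx = fit_line(xs, ys)   # regression line of Y on X
a_xy, b_xy = fit_line(ys, xs)   # regression line of X on Y
predicted = [a_yx + b_yx * x for x in xs]

print("Y on X:", a_yx, b_yx)
print("X on Y:", a_xy, b_xy)
print("RMSE:", rmse(ys, predicted), "MAPE (%):", mape(ys, predicted))
print("Forecast for month 6:", a_yx + b_yx * 6)
```

Note that the metrics here are computed on the same points used for fitting; for an honest estimate they would be measured on separate test data.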
* Regression line of X on Y: Gives the best estimate for the value of X for any specific given value of Y:
$X = a + bY$
where
a = X-intercept
b = Slope of the line
X = Dependent variable
Y = Independent variable
* Regression line of Y on X: Gives the best estimate for the value of Y for any specific given value of X:
$Y = a + bX$
where
a = Y-intercept
b = Slope of the line
Y = Dependent variable
X = Independent variable
* By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a straight line) we are able to construct a best-fitting straight line to the scatter diagram points and then formulate a regression equation in the form of:
$Y = a + bX$
where a is the Y-intercept and b is the slope, i.e., the change in Y per unit change in X.
Fig. 1.3.5
* Regression analysis is the art and science of fitting straight lines to patterns of data. In a linear regression model, the variable of interest (the "dependent" variable) is predicted from k other variables (the "independent" variables) using a linear equation. If $Y$ denotes the dependent variable and $X_1, \dots, X_k$ are the independent variables, then the assumption is that the value of $Y$ at time $t$ in the data sample is determined by the linear equation:
$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_k X_{kt} + \varepsilon_t$
where the betas are constants and the epsilons are independent and identically distributed normal random variables with mean zero.
Fig. 1.3.6
* In a regression tree the idea is this: since the target variable does not have classes, we fit a regression model to the target variable using each of the independent variables. Then, for each independent variable, the data is split at several split points.
* At each split point, the "error" between the predicted value and the actual values is squared to get a "Sum of Squared Errors (SSE)".
The split point errors across the variables are compared and the variable/point yielding the lowest SSE is chosen as the root node/split point. This process is continued recursively.
* The error function measures how much our predictions deviate from the desired answers. Mean-squared error:
$J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$
where $y_i$ is the desired answer and $f(x_i)$ is the prediction for the i-th of the n examples.
* Multiple linear regression is an extension of linear regression, which allows a response variable, y, to be modeled as a linear function of two or more predictor variables.

Evaluating a Regression Model
* Assume we want to predict a car's price using some features such as dimensions, horsepower, engine specification, mileage etc. This is a typical regression problem, where the target variable (price) is a continuous numeric value.
* We can fit a simple linear regression model that, given the feature values of a certain car, can predict the price of that car. This regression model can be used to score the same dataset we trained on. Once we have the predicted prices for all of the cars, we can evaluate the performance of the model by looking at how much the predictions deviate from the actual prices on average.
Advantages:
a. Training a linear regression model is usually much faster than training methods such as neural networks.
b. Linear regression models are simple and require minimum memory to implement.
c. By examining the magnitude and sign of the regression coefficients, you can infer how predictor variables affect the target outcome.

Assessing Performance of Regression - Error Measures
* The training error is the mean error over the training sample. The test error is the expected prediction error over an independent test sample.
* Fig. 1.3.7 shows the relationship between training set and test set.
* Unlike decision trees, regression trees and model trees are used for prediction.
In regression trees, each leaf stores a continuous-valued prediction. In model trees, each leaf holds a regression model.

Unsupervised Learning
* The model is not provided with the correct results during the training. It can be used to cluster the input data into classes on the basis of their statistical properties only. The significance of the resulting clusters and their labels must then be established.
* The labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes. All similar input patterns are grouped together as clusters.
* If a matching pattern is not found, a new cluster is formed. There is no error feedback.
* An external teacher is not used and learning is based only upon local information. It is also referred to as self-organization.
* These methods are called unsupervised because they do not need a teacher or supervisor to label a set of training examples. Only the original data is required to start the analysis.
* In contrast to supervised learning, unsupervised or self-organized learning does not require an external teacher. During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories.
* Unsupervised learning algorithms aim to learn rapidly and can be used in real time. Unsupervised learning is frequently employed for data clustering, feature extraction etc.
* Another mode of learning, called recording learning by Zurada, is typically employed for associative memory networks. An associative memory network is designed by recording several ideal patterns into the network's stable states.

Clustering
* Clustering of data is a method by which large sets of data are grouped into clusters of smaller sets of similar data. Clustering can be considered the most important unsupervised learning problem.
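The grouping rule described above for unsupervised learning (similar inputs join an existing cluster; if no matching pattern is found, a new cluster is formed; there is no error feedback) can be sketched as follows. The distance threshold, the use of the first member of each cluster as its reference pattern, the sample points and the function name `group_patterns` are all assumptions made for illustration, not details given in the text.

```python
import math

def group_patterns(patterns, threshold):
    """Assign each input to the first cluster whose reference pattern lies
    within `threshold`; otherwise start a new cluster with that input."""
    clusters = []  # each cluster is a list; its first element is the reference
    for p in patterns:
        for cluster in clusters:
            if math.dist(cluster[0], p) <= threshold:
                cluster.append(p)   # matching pattern found: join this cluster
                break
        else:
            clusters.append([p])    # no matching pattern found: form a new cluster
    return clusters

# Unlabelled input patterns (invented for illustration).
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (0.1, 0.3), (9.0, 0.0)]
clusters = group_patterns(points, threshold=1.0)
for c in clusters:
    print(c)
```

No labels are used anywhere: the grouping depends only on how close each input is to the patterns already seen, so the result can change with the threshold and with the order of the inputs.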
* A cluster is therefore a collection of objects which are "similar" to each other and "dissimilar" to the objects belonging to other clusters. Fig. 1.3.8 shows clusters.
Fig. 1.3.8 Cluster
* In this case we easily identify the 4 clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance (in this case geometrical distance). This is called distance-based clustering.
* Clustering means grouping of data or dividing a large data set into smaller data sets of some similarity.
* A clustering algorithm attempts to find natural groups of components or data based on some similarity. The clustering algorithm also finds the centroid of each group of data.
* To determine cluster membership, most algorithms evaluate the distance between a point and the cluster centroids. The output from a clustering algorithm is basically a statistical description of the cluster centroids with the number of components in each cluster.
* Cluster centroid: The centroid of a cluster is a point whose parameter values are the mean of the parameter values of all the points in the cluster. Each cluster has a well-defined centroid.
* Distance: The distance between two points is taken as a common metric to assess the similarity among the components of a population. The commonly used distance measure is the Euclidean metric, which defines the distance between two points $p = (p_1, p_2, \dots)$ and $q = (q_1, q_2, \dots)$ as:
$d = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
where n is the number of parameters of each point.
* The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how do we decide what constitutes a good clustering?
It can be shown that there is no absolute "best" criterion which would be independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs.
* Clustering analysis helps construct a meaningful partitioning of a large set of objects. Cluster analysis has been widely used in numerous applications, including pattern recognition, data analysis, image processing, etc.
* Clustering algorithms may be classified as listed below:
1. Exclusive clustering
2. Overlapping clustering
3. Hierarchical clustering
4. Probabilistic clustering
* A good clustering method will produce high quality clusters with high intra-class similarity and low inter-class similarity. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.

Examples of Clustering Applications
1. Marketing: Help marketers discover distinct groups in their customer bases and then use this knowledge to develop targeted marketing programs.
2. Land use: Identification of areas of similar land use in an earth observation database.
3. Insurance: Identifying groups of motor insurance policy holders with a high average claim cost.
4. Urban planning: Identifying groups of houses according to their house type, value and geographical location.
5. Seismology: Observed earthquake epicenters should be clustered along continent faults.

Reinforcement Learning
* The user gets immediate feedback in supervised learning and no feedback in unsupervised learning. In reinforcement learning, the feedback is a delayed scalar value.
* Reinforcement learning is learning what to do and how to map situations to actions. The learner is not told which actions to take.
* Fig. 1.3.9 shows the concept of reinforcement learning. Reinforcement learning deals with agents that must sense and act upon their environment. It combines classical Artificial Intelligence and machine learning techniques.
* It allows machines and software agents to automatically determine the ideal behavior within a specific context, in order to maximize performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.
* The two most important distinguishing features of reinforcement learning are trial-and-error and delayed reward.
* With reinforcement learning algorithms an agent can improve its performance by using the feedback it gets from the environment. This environmental feedback is called the reward signal.
* Based on accumulated experience, the agent needs to learn which action to take in a given situation in order to obtain a desired long term goal. Essentially, actions that lead to long term rewards need to be reinforced. Reinforcement learning has connections with control theory, Markov decision processes and game theory.
* Example of Reinforcement Learning: A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on how quickly and easily it has been able to find the recharger in the past.
Fig. 1.3.9 Reinforcement learning

Elements of Reinforcement Learning
* Reinforcement learning elements are as follows:
1. Policy
2. Reward function
3. Value function
4. Model of the environment
* Fig. 1.3.10 shows the elements of reinforcement learning.
* Policy: The policy defines the learning agent's behavior for a given time period. It is a mapping from perceived states of the environment to actions to be taken when in those states.
* Reward function: The reward function is used to define a goal in a reinforcement learning problem.
It also maps each perceived state of the environment to a single number.
* Value function: Value functions specify what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
* Model of the environment: Models are used for planning.
Fig. 1.3.10 Elements of reinforcement learning
* Credit assignment problem: Reinforcement learning algorithms learn to generate an internal value for the intermediate states, indicating how good they are in leading to the goal.
* The learning decision maker is called the agent. The agent interacts with the environment, which includes everything outside the agent.
* The agent has sensors to decide on its state in the environment and takes an action that modifies its state.
* The reinforcement learning problem model is an agent continuously interacting with an environment. The agent and the environment interact in a sequence of time steps. At each time step t, the agent receives the state of the environment and a scalar numerical reward for the previous action, and then the agent selects an action.
* Reinforcement learning is a technique for solving Markov decision problems.
* Reinforcement learning uses a formal framework defining the interaction between a learning agent and its environment in terms of states, actions and rewards. This framework is intended to be a simple way of representing essential features of the artificial intelligence problem.

Difference between Supervised, Unsupervised and Reinforcement Learning
Supervised learning | Unsupervised learning | Reinforcement learning
Supervised learning requires that the target variable is well defined and that a sufficient number of its values are given. | For unsupervised learning, typically either the target variable is unknown or it has only been recorded for too small a number of cases. | Reinforcement learning is learning what to do and how to map situations to actions. The learner is not told which actions to take.
Supervised learning deals with two main tasks: regression and classification. | Unsupervised learning deals with clustering and association rule mining problems. | Reinforcement learning deals with exploitation or exploration, Markov decision processes, policy learning, deep learning and value learning.
The input data in supervised learning is labelled data. | Unsupervised learning uses unlabelled data. | The data is not predefined in reinforcement learning.
Learns by using labelled data. | Trained using unlabelled data without any guidance. | Works by interacting with the environment.
Maps the labelled inputs to the known outputs. | Understands patterns and discovers the output. | Follows the trial and error method.

1.4 Applications of Machine Learning
* Examples of successful applications of machine learning:
1. Learning to recognize spoken words.
2. Learning to drive an autonomous vehicle.
3. Learning to classify new astronomical structures.
4. Learning to play world-class backgammon.
5. Spoken language understanding: within the context of a limited domain, determine the meaning of something uttered by a speaker to the extent that it can be classified into one of a fixed set of categories.

Face Recognition
* We perform the face recognition task effortlessly; every day we recognize our friends, relatives and family members. We also recognize them by looking at photographs, in which they appear in different poses, hair styles and background light, with and without makeup.
* We do it subconsciously and cannot explain how we do it. Because we cannot explain how we do it, we cannot write an algorithm for it.
* A face has some structure. It is not a random collection of pixels. It is a symmetric structure. It contains predefined components like the nose, mouth, eyes and ears.
Every person's face is a pattern composed of a particular combination of these features. By analyzing sample face images of a person, a learning program captures the pattern specific to that person and uses it to recognize whether a new real face or new image belongs to this specific person or not.
* A machine learning algorithm creates an optimized model of the concept being learned, based on data or past experience.

Healthcare :
* With the advent of wearable sensors and devices that use data to assess the health of a patient in real time, ML is becoming a fast-growing trend in healthcare.
* Sensors in wearables provide real-time patient information, such as overall health condition, heartbeat, blood pressure and other vital parameters.
* Doctors and medical experts can use this information to analyse the health condition of an individual, draw a pattern from the patient history and predict the occurrence of any ailments in the future.
* The technology also empowers medical experts to analyze data to identify trends that facilitate better diagnoses and treatment.

Financial services :
* Companies in the financial sector are able to identify key insights in financial data, as well as prevent occurrences of financial fraud, with the help of machine learning technology.
* The technology is also used to identify opportunities for investments and trade.
* Cyber surveillance helps in identifying individuals or institutions that are prone to financial risk, so that necessary actions can be taken in time to prevent fraud.

1.5 Tools and Technology for Machine Learning

Python
* Python is a high-level scripting language which can be used for a wide variety of text processing, system administration and internet-related tasks.
* Python is a true object-oriented language and is available on a wide variety of platforms.
* Python was developed in the early 1990s by Guido van Rossum, then at CWI in Amsterdam and later at CNRI in Virginia.
* Python 3.0 was released in 2008.
* Python statements do not need to end with a special character.
* Python relies on modules, that is, self-contained programs which define a variety of functions and data types.
* A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
* Within a module, the module's name (as a string) is available as the value of the global variable __name__.
* If a module is executed directly, however, the value of the global variable __name__ will be "__main__".
* Modules can contain executable statements aside from definitions. These are executed only the first time the module name is encountered in an import statement, and also when the file is executed as a script.
* An Integrated Development Environment (IDE) is the basic interpreter and editor environment that you can use along with Python. It typically includes an editor for creating and modifying programs, a translator for executing programs and a program debugger. A debugger provides a means of taking control of the execution of a program to aid in finding program errors.
* Python is most commonly translated by use of an interpreter. It provides the very useful ability to execute in interactive mode. The window that provides this interaction is referred to as the Python shell.
* Python supports two basic modes : Normal mode and interactive mode.
* Normal mode : Normal mode is the mode in which scripted and finished .py files are run in the Python interpreter. This mode is also called script mode.
* Interactive mode : Interactive mode is a command line shell which gives immediate feedback for each statement, while keeping previously entered statements in active memory.
* Start the Python interactive interpreter by typing python with no arguments at the command line.
* To access the Python shell, open the terminal of your operating system and type "python". Press the Enter key and the Python shell will appear.

    C:\Windows\system32> python
    Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

* The >>> prompt indicates that the Python shell is ready to accept your commands and send them to the Python interpreter. The result is displayed in the Python shell as soon as the interpreter has interpreted the command.
* For example, to print the text "Hello World", we can type the following :

    >>> print("Hello World")
    Hello World
    >>>

* In script mode, a file must be created and saved before executing the code to get results. In interactive mode, the result is returned immediately after pressing the Enter key.
* In script mode, you are provided with a direct way of editing your code. This is not possible in interactive mode.
* A variable is a way of referring to a memory location used by a computer program.
* A variable is a symbolic name for this physical location. The memory location contains values, like numbers, text or more complicated types.
* A variable is a name that refers to a value. The equal (=) operator is used to assign a value to a variable.
* Python's data types include : Numbers, strings, lists, dictionaries, tuples and files.
* Python has no additional command to declare a variable. As soon as a value is assigned to it, the variable is declared.
* Rules for variables are as follows :
a. Special characters are not allowed.
b. Variables are case sensitive.
c. A variable name can only contain alpha-numeric characters and underscores.
d. A variable name always starts with a letter or an underscore, never with a digit.

Features of Python programming
1. Python is a high-level, interpreted, interactive and object-oriented scripting language.
2. It is simple and easy to learn.
3. It is portable.
4. Python is a free and open source programming language.
5. Python can perform complex tasks using a few lines of code.
6. Python runs equally well on different platforms such as Windows, Linux, UNIX and Macintosh.
7. It provides a vast range of libraries for various fields such as machine learning, web development and scripting.

Advantages of Python
* Ease of programming.
* Minimizes the time to develop and maintain code.
* Modular and object-oriented.
* Large community of users.
* A large standard and user-contributed library.

Disadvantages of Python
* Interpreted and therefore slower than compiled languages.
* Decentralized with packages.

R Programming Language
* R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
* R is often used for statistical computing and graphical presentation to analyse and visualize data.
* To use a function in a package, the package needs to be loaded in memory. The command for this is library( ), for example : library(affy).
* R is case sensitive, so take care when typing in the commands.
* Multiple commands can be written on the same line, separated by semicolons.
* A command can have many arguments. These are always given inside the brackets.
* Numeric (1, 2, 3...) or logical (T/F) values and names of existing objects are given as arguments without quotes, but string values, such as file names, are always put inside quotes.
* For example : mas5(dat3, normalize = T, analysis = "absolute").
* Vectors and matrices in R are two ways to work with a collection of objects. Lists provide a third method. Unlike a vector or a matrix, a list can hold different kinds of objects. One entry in a list may be a number, while the next is a matrix and a third is a character string.
* Statistical functions of R usually return the result in the form of lists.
So we must know how to unpack a list using the $ symbol.

MATLAB
* MATLAB is a programming language developed by MathWorks. It started out as a matrix programming language in which linear algebra programming was simple. It can be run both in interactive sessions and as a batch job.
* MATLAB is a high-performance language for technical computing. It integrates computation, visualization and a programming environment.
* MATLAB is an interactive system whose basic data element is an array that does not require dimensioning.
* The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects, which together represented the state of the art in software for matrix computation.
* The MATLAB system consists of five main parts :
1. The MATLAB language. This is a high-level matrix/array language with control flow statements, functions, data structures, input/output and object-oriented programming features.
2. The MATLAB working environment. This is the set of tools and facilities that you work with as the MATLAB user or programmer. It includes facilities for managing the variables in your workspace and for importing and exporting data.
3. Handle Graphics. This is the MATLAB graphics system. It includes high-level commands for two-dimensional and three-dimensional data visualization, image processing, animation and presentation graphics.
4. The MATLAB mathematical function library. This is a vast collection of computational algorithms ranging from elementary functions like sum, sine, cosine and complex arithmetic, to more sophisticated functions like matrix inverse, matrix eigenvalues, Bessel functions and fast Fourier transforms.
5. The MATLAB Application Program Interface (API). This is a library that allows you to write C and Fortran programs that interact with MATLAB.
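Looking back at the Python points made earlier in this section (variables are created on assignment, names are case sensitive, and a module can tell whether it is being run as a script through `__name__`), the short sketch below may help. The file name `greet.py` and the helper names `describe` and `run_mode` are hypothetical and chosen only for illustration.

```python
"""greet.py : a hypothetical module name, used only for illustration."""


def describe(value):
    """Return the name of the value's type, e.g. 'int' or 'str'."""
    return type(value).__name__


def run_mode():
    """Report whether this file is running as a script or was imported."""
    return "script" if __name__ == "__main__" else "imported"


# No declaration is needed: the variable exists as soon as a value is assigned.
count = 10        # refers to an int
count = "ten"     # the same name can later refer to a str
Count = 3.5       # names are case sensitive, so Count is a different variable

if __name__ == "__main__":
    # This block runs only in script (normal) mode, not on import.
    print(describe(count), describe(Count), run_mode())
```

Saving this as a file and running it from the command line exercises script mode; typing the same statements one by one at the `>>>` prompt exercises interactive mode.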
1.6 Fill in the Blanks
Q.1 Machine learning is a sub-field of ______ which is concerned with developing computational theories of learning and building learning machines.
Q.2 ______ learning is learning in which the network is trained by providing it with input and matching output patterns.
Q.3 Both human as well as machine learning generate knowledge, one residing in the ______ and the other residing in the ______.
Q.4 Humans acquire ______ through experience, either directly or shared by others.
Q.5 Supervised learning and unsupervised learning are the types of ______.
Q.6 Python is a true ______ language and is available on a wide variety of platforms.
Q.7 MATLAB is a programming language developed by ______.
Q.8 Vectors and matrices in R are two ways to work with a collection of ______.
Q.9 Machine learning algorithms discover the relationships between the variables of a system from direct ______ of the system.
Q.10 Human learning is based on the past ______.
Q.11 A ______ learning algorithm analyses the training data and produces an inferred function, which is called a classifier or a regression function.
Q.12 Supervised learning deals with two main tasks : ______ and ______.
Q.13 Unsupervised learning uses ______ data.
Q.14 CART stands for ______.
Q.15 ______ can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
Q.16 ______ learning deals with agents that must sense and act upon their environment. It combines classical artificial intelligence and machine learning techniques.
Q.17 With reinforcement learning algorithms an agent can improve its performance by using the feedback it gets from the environment. This environmental feedback is called the ______.
Q.18 Supervised learning is also called ______ learning.
Q.19 Unsupervised learning is also called ______ learning.
Q.20 When we are trying to predict a categorical or nominal variable, the problem is known as a ______ problem.
Q.21 When we are trying to predict a real-valued variable, the problem falls under the category of ______.

1.7 Multiple Choice Questions
Q.1 A computer program is said to learn from ______ with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
a) training b) experience c) testing d) algorithm
Q.2 Jarvis Patrick clustering algorithm is a ______ clustering technique.
a) grid based b) graph based c) density based d) all of these
Q.3 Which of the following is a hierarchical clustering method ?
a) Agglomerative b) Divisive clustering c) PAM d) A and B
Q.4 The k-means algorithm is sensitive to ______ because an object with an extremely large value may substantially distort the distribution of data.
a) outliers b) text data c) boosting d) cluster
Q.5 ______ hierarchical clustering method works by grouping data objects into a tree of clusters.
a) PAM b) Density-based method c) Hierarchical d) Grid-based method
Q.6 In DIANA, all of the objects are used to form ______ initial cluster.
a) one c) four
Q.7 If the clustering process is terminated when the distance between nearest clusters exceeds an arbitrary threshold, it is called a ______.
a) dendrogram b) nearest-neighbor clustering algorithm c) minimal spanning tree algorithm d) single-linkage algorithm
Q.8 Which of the following is NOT a type of cluster ?
a) Well-separated clusters b) Prototype-based clusters c) Contiguity-based clusters d) DBSCAN clusters
Q.9 Shared nearest neighbors is a ______ clustering.
a) density-based b) well-separated c) contiguity based d) graph based
Q.10 Unsupervised learning deals with ______ and ______ mining problems.
a) classification, regression b) clustering, classification c) clustering, associative rule d) label, unlabelled data
Q.11 ______ learning deals with two main tasks : regression and classification.
a) Reinforcement b) Deep c) Unsupervised d) Supervised
Q.12 The individual tuples making up the training set are referred to as ______ and are selected from the database under analysis.
a) learning tuples b) training tuples c) samples d) database
Q.13 Machine learning is inherently a ______ field.
a) Inter disciplinary b) Multi disciplinary c) Single d) None
Q.14 ______ methods have been used to train computer-controlled vehicles to steer correctly when driving on a variety of road types.
a) Machine learning b) Data mining c) Neural networks d) Robotics
Q.15 The individual tuples making up the training set are referred to as ______ and are selected from the database under analysis.
a) learning tuples b) training tuples c) samples d) database
Q.16 Training a perceptron is based on ______.
a) supervised learning technique b) unsupervised learning c) reinforced learning d) stochastic learning
Q.17 List the elements of reinforcement learning.
a) Policy
b) Reward function c) Value function d) All of these

Answer Keys for Fill in the Blanks
Q.1 artificial intelligence
Q.2 Supervised
Q.3 brain, machine
Q.4 knowledge
Q.5 machine learning
Q.6 object-oriented
Q.7 MathWorks
Q.8 objects
Q.9 samples
Q.10 experiences
Q.11 supervised
Q.12 Regression, Classification
Q.13 unlabelled
Q.14 Classification and Regression Tree
Q.15 Concept learning
Q.16 Reinforcement
Q.17 reward signal
Q.18 predictive
Q.19 descriptive
Q.20 classification
Q.21 regression

Answer Keys for Multiple Choice Questions

Preparing to Model

Syllabus
Machine Learning activities, Types of data in Machine Learning, Structures of data, Data quality and remediation, Data Pre-Processing : Dimensionality reduction, Feature subset selection.

Contents
2.1 Machine Learning Activities
2.2 Types of Data in Machine Learning
2.3 Structures of Data
2.4 Data Quality and Remediation
2.5 Data Pre-Processing
2.6 Fill in the Blanks
2.7 Multiple Choice Questions

2.1 Machine Learning Activities
* Following are the typical preparation activities for a model :
a) Understand the types of input data
b) Find potential issues in the data
c) Identify the nature and quality of data
d) Find out the relationship between data
e) Apply pre-processing
* Input data is divided into two parts : Training data and testing data.
* Machine learning is about learning some properties of a data set and applying them to new data. This is why a common practice in machine learning, when evaluating an algorithm, is to split the data at hand into two sets : one that we call a training set, on which we learn data properties, and one that we call a testing set, on which we test these properties.
* In training data, the labels are assigned. In test data, the labels are withheld from the learner.
* The training data consist of a set of training examples.
* The real aim of supervised learning is to do well on test data that is not known during learning. Choosing the values for the parameters that minimize the loss function on the training data is not necessarily the best policy.
* The training error is the mean error over the training sample. The test error is the expected prediction error over an independent test sample.
* The problem is that training error is not a good estimator of test error. Training error can be reduced by making the hypothesis more sensitive to training data, but this may lead to overfitting and poor generalization.
* Training set : A set of examples used for learning, where the target value is known.
* Test set : It is used only to assess the performance of a classifier. It is never used during the training process, so that the error on the test set provides an unbiased estimate of the generalization error.
* Training data is the knowledge about the data source which we use to construct the classifier.
* Fig. 2.1.1 shows the four step process of machine learning.

Fig. 2.1.1 Four step process of machine learning
Step 1 : Understand the types of input data; find potential issues in the data; identify the nature and quality of data; find out the relationship between data; apply pre-processing.
Step 2 : Data partitioning; model selection; cross-validation.
Step 3 (Performance evaluation) : Examine model performance; visualize performance.
Step 4 (Performance improvement) : Tuning the model; ensembling.

2.2 Types of Data in Machine Learning
* A data set is a collection of related records or information. The information may be on some entity or some subject area.
* It is a collection of data objects and their attributes. Attributes capture the basic characteristics of an object.
* Each row of a data set is called a record. Each data set also has multiple attributes, each of which gives information on a specific characteristic.
* Following is an example of a data set.
[Table : Emp data set with the attributes Emp-ID, Name, Department and Age]

* For example, in the data set on Emp, there are four attributes, namely Emp-ID, Name, Department and Age, each of which is a specific characteristic of the employee entity.
* Attributes can also be termed features, variables, dimensions or fields. A row or record represents a point in the four-dimensional data space, as each row has specific values for each of the four attributes or features.

Qualitative and Quantitative Data
* Data can broadly be divided into the following two types :
1. Qualitative data
2. Quantitative data

Fig. 2.2.1 Types of data : Qualitative/Categorical (Nominal, Ordinal) and Quantitative/Numeric (Interval, Ratio)

Qualitative data :
* Qualitative data provides information about the quality of an object, or information which cannot be measured. Qualitative data cannot be expressed as a number. Data that represent nominal scales, such as gender, economic status or religious preference, are usually considered to be qualitative data.
* Qualitative data is data concerned with descriptions, which can be observed but cannot be computed. Qualitative data is also called categorical data.
* Qualitative data can be further subdivided into two types as follows :
1. Nominal data
2. Ordinal data

Nominal data
* Nominal data is the first level of measurement scale, in which the numbers serve as "tags" or "labels" to classify or identify the objects.
* Nominal data usually deals with non-numeric variables or with numbers that do not have any value. While developing statistical models, nominal data are usually transformed before building the model.
* Nominal data are also known as categorical variables.
* Characteristics of nominal data :
1. A nominal data variable is classified into two or more categories. In this measurement mechanism, the answer should fall into one of the classes.
2. It is qualitative. The numbers are used here to identify the objects.
3. The numbers do not define the object characteristics. The only permissible operation on numbers in the nominal scale is "counting".
* Example :
1. Gender : Male, Female, Other.
2. Hair color : Brown, Black, Blonde, Red, Other.

Ordinal data
* Ordinal data is a variable in which the value of the data is captured from an ordered set, which is recorded in the order of magnitude.
* Ordinal represents the "order". Ordinal data is known as qualitative data or categorical data. It can be grouped, named and also ranked.
* Characteristics of ordinal data :
a) Ordinal data shows the relative ranking of the variables.
b) It identifies and describes the magnitude of a variable.
c) Along with the information provided by the nominal scale, ordinal scales give the rankings of those variables.
d) The interval properties are not known.
e) Surveyors can quickly analyze the degree of agreement concerning the identified order of variables.
* Examples :
a) University ranking : 1st, 2nd, 3rd, ...
b) Socioeconomic status : Poor, middle class, rich.
c) Level of agreement : Yes, maybe, no.
d) Time of day : Dawn, morning, noon, afternoon, evening, night.

Quantitative data
* Quantitative data focuses on numbers and mathematical calculations; it can be calculated and computed.
* Quantitative data are anything that can be expressed as a number, or quantified. Examples of quantitative data are scores on achievement tests, number of hours of study, or weight of a subject. These data may be represented by ordinal, interval or ratio scales and lend themselves to most statistical manipulation.
* There are two types of quantitative data : Interval data and Ratio data.

Interval data :
* Interval data corresponds to a variable in which the value is chosen from an interval set.
* It is defined as a quantitative measurement scale in which the difference between two values is meaningful. In other words, the variables are measured in an exact manner rather than a relative one, and the position of zero is arbitrary.
* Characteristics of interval data :
a) Interval data is quantitative, as it can quantify the difference between the values.
b) It allows calculating the mean and median of the variables.
c) To understand the difference between the variables, you can subtract one value from another.
d) The interval scale is the preferred scale in statistics, as it helps to assign numerical values to arbitrary assessments such as feelings, calendar types, etc.
* Examples :
1. Celsius temperature.
2. Fahrenheit temperature.
3. Time on a clock with hands.

Ratio data :
* Any variable for which ratios can be computed and are meaningful is called ratio data.
* It is a type of variable measurement scale. It allows researchers to compare the differences or intervals. The ratio scale has a unique feature : It possesses a true origin or zero point.
* Characteristics of ratio data :
a) The ratio scale has the feature of absolute zero.
b) It does not have negative numbers, because of its zero-point feature.
c) It affords unique opportunities for statistical analysis. The variables can be meaningfully added, subtracted, multiplied and divided. Mean, median and mode can be calculated using the ratio scale.
d) Ratio data has unique and useful properties. One such feature is that it allows unit conversions like kilogram - calories, gram - calories, etc.
* Examples : Age, Weight, Height, Ruler measurements, Number of children.

Difference between Qualitative and Quantitative Data

| Qualitative data | Quantitative data |
|---|---|
| Qualitative data provides information about the quality of an object, or information which cannot be measured. | Quantitative data relates to information about the quantity of an object; hence it can be measured. |
| Types : Nominal data and Ordinal data. | Types : Interval data and Ratio data. |
| Narratives often make use of adjectives and other descriptive words to refer to data on appearance, color, texture and other qualities. | Measures quantities such as length, size, amount, price and even duration. |
| They are descriptive rather than numerical in nature. | Expressed in numerical form. |
| For example : The team is well prepared. The leaf feels waxy. The river is peaceful. | For example : The team has 7 players. The leaf weighs 2 ounces. The river is 25 miles long. |

2.3 Structures of Data
* A data dictionary is a centralized repository of metadata. Metadata is data about data.
* A data dictionary is a repository of names, definitions and attributes that provides contextual information about data. A data dictionary traditionally refers to a database dictionary, metadata repository or business glossary. It primarily focuses on the meaning or definition of all columns in a data table.
* In case the data dictionary is not available, we need to use the standard library functions of the machine learning tool that we are using to get these details.
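When no data dictionary is available, a few lines of code can recover the basic column details. The sketch below is one possible approach, assuming the data arrives as CSV text and using only Python's standard `csv` module; the sample rows and the helper names `infer_type` and `describe_columns` are hypothetical and are not taken from the book's Emp table.

```python
import csv
import io

# Hypothetical sample in the spirit of the Emp data set (values are made up).
RAW = """Emp-ID,Name,Department,Age
101,Asha,Sales,34
102,Ravi,HR,41
103,Meena,Sales,29
"""


def infer_type(values):
    """Guess a coarse column type from the column's string values."""
    try:
        for v in values:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in values:
            float(v)
        return "float"
    except ValueError:
        return "text"


def describe_columns(text):
    """Return {column: {"type": ..., "distinct": ...}} for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        summary[name] = {
            "type": infer_type(values),
            "distinct": len(set(values)),  # few distinct values hints at a categorical column
        }
    return summary


if __name__ == "__main__":
    for column, info in describe_columns(RAW).items():
        print(column, info)
```

A numeric type does not by itself make a column quantitative: an identifier such as Emp-ID parses as an integer but behaves as nominal data, so the inferred types are only a starting point for the analyst's own judgement.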
Exploring Numerical Data
* The two most effective mathematical plots to explore numerical data are the box plot and the histogram.

1) Understanding central tendency :
* Central tendency is a descriptive summary of a data set through a single value that reflects the center of the data distribution.
* To understand the nature of numeric variables, we can apply the measures of central tendency of data, i.e. mean and median.
* Mean : Let x_1, x_2, x_3, ..., x_n be the set of n values of the variate. Then the arithmetic mean, or mean, is given as

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \]

* Median : Let the values of the variable be arranged in ascending order of magnitude. Then the median is the middle item if the number of values is odd, and the mean of the two middle items if the number of values is even.
* The median is the mid-value that divides the total frequency into two equal parts.
* Example : The pizza prices in two cities are given below. Find the mean and median for both cities.

| City | Pizza prices |
|---|---|
| New Delhi | 1, 2, 3, 3, 4, 5, 6, 7, 9, 11, 66 |
| Lucknow | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |

Solution :
Mean of New Delhi pizza price = (1 + 2 + 3 + 3 + 4 + 5 + 6 + 7 + 9 + 11 + 66) / 11 = 117 / 11 ≈ 10.636
Mean of Lucknow pizza price = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10) / 10 = 55 / 10 = 5.5
Median of New Delhi pizza price = ((N + 1) / 2)th observation = (12 / 2)th observation = 6th observation. Here the 6th observation is 5, so Median = 5.
Median of Lucknow pizza price = mean of the (N / 2)th and ((N / 2) + 1)th observations = (5 + 6) / 2 = 5.5, so Median = 5.5.

* The mean has one main disadvantage : It is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or large in numerical value. For example, consider the wages of staff at a factory below :

| Staff | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Salary | 15K | 18K | 16K | 14K | 15K | 15K | 12K | 17K | 90K | 95K |

* The mean salary for these ten staff is $30.7 K.
However, inspecting the raw data suggests that this mean value might not be the best way to reflect the typical salary of a worker, as most workers have salaries in the $12 K to $18 K range.
* The mean is being skewed by the two large salaries. Therefore, in this situation, we would like to have a better measure of central tendency. As we will find out later, taking the median would be a better measure of central tendency in this situation.

2) Understanding data spread :
* Definition : Dispersion is the scatteredness or spread of data about an average value. It gives an idea about how individual values differ from the central value, i.e. whether they are closely packed around the central value or widely scattered away from it.
* The magnitude of the variation is called dispersion.
* Fig. 2.3.1 shows the measures of dispersion.

Fig. 2.3.1 Measures of dispersion : Range, Variance, Standard deviation, Coefficient of variation

* Variance : The second central moment is called the variance. It is given as

\[ \sigma^2 = \mathrm{Var}(X) = E\big[(X - m)^2\big] = \int_{-\infty}^{\infty} (x - m)^2 f_X(x)\,dx = E[X^2] - m^2 \]

where m = E[X] is the mean.
* For grouped data, variance can also be given as

\[ \sigma^2 = h^2 \left[ \frac{\sum f u^2}{N} - \left( \frac{\sum f u}{N} \right)^2 \right] \]

Here N = \sum f. Let A be the assumed mean, h be the width of the class interval and u = (x - A)/h. Then

\[ \text{mean} = A + h\,\frac{\sum f u}{N} \]

* Standard deviation : It is the measure of spread of the values of X relative to the mean value. Standard deviation of data is measured as follows :

\[ \text{Standard deviation}(x) = \sqrt{\text{Variance}(x)} \]

* A larger value of variance or standard deviation indicates more dispersion in the data, and vice versa.

Example : Consider the following data values of an attribute : 44, 46, 48, 45, 47. Calculate the variance.
Solution :

\[ \sigma^2 = \frac{44^2 + 46^2 + 48^2 + 45^2 + 47^2}{5} - \left( \frac{44 + 46 + 48 + 45 + 47}{5} \right)^2 = \frac{1936 + 2116 + 2304 + 2025 + 2209}{5} - 46^2 = 2118 - 2116 = 2 \]

Difference between standard deviation and variance

| Standard deviation | Variance |
|---|---|
| Standard deviation is a measure of dispersion of the values of a data set from their mean. | It is the statistical measure of how far the numbers in a data set are spread from their average. |
| It is a common term in statistical theory, used to describe the spread of data around a measure of central tendency. | Variance is primarily used with statistical probability distributions to measure volatility from the mean. |
| It measures the absolute variability of the dispersion. | It helps determine the size of the data spread. |
| It is calculated by taking the square root of the variance. | It is calculated by taking the average of the squared deviation of each value in the data set from the mean. |
| The standard deviation is symbolized by the lower case Greek letter sigma, \sigma. | The notation for the variance of a variable is sigma squared, \sigma^2. |
| \sigma = \sqrt{\sum (x - M)^2 / n}, where M = Mean, x = Each value in the data set and n = Number of values. | \sigma^2 = \sum (x - M)^2 / n, where M = Mean, x = Each value in the data set and n = Number of values in the data set. |
| Used in the finance sector as a measure of market volatility. | Used in asset allocation. |

Plotting and Exploring Numerical Data

1. Box plots
* The box plot is a useful graphical display for describing the behaviour of the data in the middle as well as at the ends of the distribution. The box plot uses the median and the lower and upper quartiles. If the lower quartile is Q_1 and the upper quartile is Q_3, then the difference (Q_3 - Q_1) is called the interquartile range or IQR.
* A box plot is also called a whisker plot. It shows data using the middle value of the data and the quartiles, or 25 % divisions of the data.
* A box plot shows the five-number summary of a set of data : Minimum, lower quartile, median, upper quartile and maximum.

Fig. 2.3.2 Box plot (whiskers, lower quartile Q_1, median, upper quartile Q_3 and interquartile range)

Example : Construct a box plot for the following data : 12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25
Solution :
Step 1 : Arrange the data in ascending order.
5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53
Step 2 : Find the median, lower quartile and upper quartile.
Median (middle value) = 22
Lower quartile (middle value of the lower half) = 12
Upper quartile (middle value of the upper half) = 36
Step 3 : Draw a number line that will include the smallest and the largest data values.
Step 4 : Draw three vertical lines at the lower quartile (12), median (22) and upper quartile (36), just above the number line.
Step 5 : Join the lines for the lower quartile and the upper quartile to form a box.
Step 6 : Draw a line from the smallest value (5) to the left side of the box and draw a line from the right side of the box to the biggest value (53).

2. Histogram :
* In a histogram, the data are grouped into ranges (e.g. 10 - 19, 20 - 29) and then plotted as connected bars. Each bar represents a range of data.
* The width of each bar is proportional to the width of each category, and the height is proportional to the frequency or percentage of that category.
* Fig. 2.3.3 shows the distributions of a histogram.

Fig. 2.3.3 (a) Normal distribution, (b) Bimodal distribution, (c) Right-skewed distribution, (d) Left-skewed distribution, (e) Random distribution

1. A normal distribution : In a normal distribution, points on one side of the average are as likely to occur as on the other side of the average.
2. A bimodal distribution : In a bimodal distribution, there are two peaks.
In a bimodal distribution, the data should be separated and analyzed as separate normal distributions.
A right-skewed distribution : A right-skewed distribution is also called a positively skewed distribution. In a right-skewed distribution, a large number of data values occur on the left side with fewer data values on the right side. A right-skewed distribution usually occurs when the data has a range boundary on the left-hand side of the histogram, for example a boundary of 0.
A left-skewed distribution : A left-skewed distribution is also called a negatively skewed distribution. In a left-skewed distribution, a large number of data values occur on the right side with fewer data values on the left side. A left-skewed distribution usually occurs when the data has a range boundary on the right-hand side of the histogram, for example a boundary such as 100.
A random distribution : A random distribution lacks an apparent pattern and has several peaks. In a random distribution histogram, it can be the case that different data properties were combined. Therefore, the data should be separated and analyzed separately.

Exploring Relationship between Variables

Scatter plot :
* It displays the collection of all the points for a set of data limited to two variables. It is also called a scatter diagram or X-Y graph.
* While working with statistical data it is often observed that there are connections between sets of data. For example, the mass and height of persons are related : the taller the person, the greater his/her mass.
* To find out whether or not two sets of data are connected, scatter diagrams can be used. Fig. 2.3.4 shows a scatter diagram.

[Fig. 2.3.4 Scatter diagram of children's age against height]

* The scatter diagram shows the relationship between children's age and height. A scatter diagram is a tool for analyzing the relationship between two variables.
One variable is plotted on the horizontal axis and the other is plotted on the vertical axis.
* The pattern of their intersecting points can graphically show relationship patterns. Commonly a scatter diagram is used to investigate possible cause-and-effect relationships.
* While a scatter diagram shows relationships, it does not by itself prove that one variable causes the other. In addition to showing possible cause-and-effect relationships, a scatter diagram can show that two variables stem from a common cause that is unknown, or that one variable can be used as a surrogate for the other.

Two-way cross-tabulations
* A two-way cross-tabulation is also called a cross-tab or contingency table. It is used to understand the relationship of two categorical attributes in a concise way.

Data Quality and Remediation
* Data remediation is the process of cleansing, organizing and migrating data so that it is properly protected and best serves its intended purpose.

Data Quality
* Data of the right quality helps to achieve better prediction accuracy in the case of supervised learning.
* Data quality problems are :
1. Certain data elements without a value, or data with a missing value.
2. Data elements having a value surprisingly different from the other elements, which we term outliers.
* There are multiple factors which lead to these data quality issues :
a) Incorrect sample set selection
b) Errors in data collection
* Measuring data quality levels can help organizations identify data errors that need to be resolved and assess whether the data in their IT systems is fit to serve its intended purpose.

Data Remediation
* Outliers are data elements with an abnormally high or low value which may impact prediction accuracy, especially in regression models.
* An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.
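One common way to make "abnormal distance" concrete is to measure it in units of the interquartile range. The sketch below is an illustration only: it assumes the conventional 1.5 x IQR fences (this text does not name a specific rule), computes the quartiles as the medians of the lower and upper halves as in the box plot example earlier in the chapter, and uses hypothetical function names.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves; the middle
    value is left out of both halves when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    k = 1.5 is an assumed convention, not something taken from the text.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Data from the box plot example.
data = [12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25]
print(five_number_summary(data))   # (5, 12, 22, 36, 53)
print(iqr_outliers(data))          # []
print(iqr_outliers(data + [120]))  # [120]
```

With the box plot data the fences fall at -24 and 72, so no value is flagged; appending a value of 120 places it beyond the upper fence.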
* First quartile (Q1) : The first quartile is the value where 25 % of the values are smaller than Q1 and 75 % are larger.
* Third quartile (Q3) : The third quartile is the value where 75 % of the values are smaller than Q3 and 25 % are larger.
* Outlier detection is the process of detecting and subsequently excluding outliers from a given set of data. Fig. 2.4.1 shows outlier detection. Here O1 and O2 seem to be outliers from the rest.

[Fig. 2.4.1 Outlier detection]

* An outlier may be defined as a piece of data or observation that deviates drastically from the given norm or average of the data set. An outlier may be caused simply by chance, but it may also indicate measurement error or that the given data set has a heavy-tailed distribution.

Handling missing values
* In a data set, one or more data elements may have missing values in multiple records.
* Such dirty data affects the mining procedure and leads to unreliable and poor output. Therefore it is important to apply some data cleaning routines.

How to handle missing values in data mining ?
* Following methods are used for handling missing values :
1. Ignore the tuple : Usually done when the class label is missing. This method is not good unless the tuple contains several attributes with missing values.
2. Fill in the missing value manually : It is time-consuming and not suitable for a large data set with many missing values.
3. Use a global constant to fill in the missing value : Replace all missing attribute values by the same constant.
4. Use the attribute mean to fill in the missing value : For example, suppose that the average salary of staff is Rs 65000/-. Use this value to replace the missing value for salary.
5. Use the attribute mean for all samples belonging to the same class as the given tuple.
6. Use the most probable value to fill in the missing value.

Data Pre-Processing
* Data pre-processing is a data mining technique that involves transforming raw data into an understandable format. It aims to reduce the data size, find the relations between data and normalize them.
* Data pre-processing prepares raw data for further processing. Data captured from various sources is not pure. It contains some noise and is called dirty data. Dirty data can be :
* Incomplete data, which lacks attribute values, lacks certain attributes of interest, or contains only aggregate data. For example : occupation=""
* Noisy data, which contains errors or outliers. For example : Salary="-10"
* Inconsistent data, which contains discrepancies in codes or names. For example : Age="51" Birthday="03/08/1998"
* Incomplete, noisy and inconsistent data are commonplace properties of large real-world databases and data warehouses. Incomplete data can occur for a number of reasons. Data pre-processing is a proven method of resolving such issues.

Dimensionality Reduction
* Most machine learning and data mining techniques may not be effective for high-dimensional data. Query accuracy and efficiency degrade rapidly as the dimension increases. The "dimensionality" simply refers to the number of features (i.e. input variables) in your dataset.
* When the number of features is very large relative to the number of observations in your dataset, certain algorithms struggle to train effective models. This is called the "Curse of Dimensionality", and it is especially relevant for clustering algorithms that rely on distance calculations.
* Dimensionality reduction is the process of reducing the number of random variables under consideration, by obtaining a set of principal variables.
Dimensionality reduction can be divided into feature selection and feature extraction. It reduces the time and storage space required. Removal of multi-collinearity improves the interpretation of the parameters of the machine learning model.
* There are many methods to perform dimension reduction.
1. Missing values : While exploring data, if we encounter missing values, what do we do ? Our first step should be to identify the reason, then impute the missing values or drop the variables using appropriate methods. But what if we have too many missing values ? When a variable has too many missing values, dropping it is usually the more practical choice.
2. Low variance : Let's think of a scenario where we have a constant variable in our data set. Such a variable has zero variance and cannot improve the model, so variables with very low variance can be dropped.
3. Decision trees : It can be used as an ultimate solution to tackle multiple challenges like missing values, outliers and identifying significant variables.
4. Random forest : Random forest is similar to the decision tree and can be used in the same way.
5. High correlation : Dimensions exhibiting higher correlation can lower the performance of the model. Moreover, it is not good to have multiple variables carrying similar information or variation, which is also known as "multicollinearity".

Advantages of dimensionality reduction
* It helps in data compression, and hence reduces storage space.
* It reduces computation time.
* It also helps remove redundant features, if any.

Disadvantages of dimensionality reduction
* It may lead to some amount of data loss.
* PCA tends to find linear correlations between variables, which is sometimes undesirable.
* PCA fails in cases where mean and covariance are not enough to define datasets.
* We may not know how many principal components to keep; in practice, some thumb rules are applied.

Principal Component Analysis
* If the original data can be reconstructed from the compressed data without any loss of information, the data reduction is called lossless. If, instead, we can reconstruct only an approximation of the original data, then the data reduction is called lossy.
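The difference between exact and approximate reconstruction can be seen in a small PCA sketch. The code below is a minimal illustration restricted to two features, where the direction of largest variance has a closed form; the function names are hypothetical, and a library routine would normally be used for real data sets.

```python
import math

def first_principal_component(points):
    """Return (mean, unit direction, scores) for a list of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2 x 2 covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Closed-form angle of the direction of largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # One number per point: its coordinate along that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return (mean_x, mean_y), direction, scores

def reconstruct(mean, direction, scores):
    """Map the one-dimensional scores back into the original 2-D space."""
    return [(mean[0] + s * direction[0], mean[1] + s * direction[1])
            for s in scores]

def max_error(points):
    """Largest distance between a point and its reconstruction."""
    mean, direction, scores = first_principal_component(points)
    approx = reconstruct(mean, direction, scores)
    return max(math.dist(p, q) for p, q in zip(points, approx))

on_a_line = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
scattered = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(max_error(on_a_line))  # zero up to rounding error
print(max_error(scattered))  # clearly greater than zero
```

When the points lie exactly on a line, one component reproduces them up to rounding, so nothing is lost. For the scattered points the second direction is discarded and only an approximation can be recovered, which is the lossy case.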
* Lossy dimensionality reduction methods are Principal Component Analysis (PCA) and wavelet transforms.
* Principal Component Analysis (PCA) reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables, that retains most of the sample's information and is useful for the compression and classification of data.
* In PCA, it is assumed that the information is carried in the variance of the features, that is, the higher the variation in a feature, the more information that feature carries.
* Hence, PCA employs a linear transformation that is based on preserving the most variance in the data using the least number of dimensions.
* The most common approach for dimensionality reduction is PCA. PCA is a statistical technique to convert a set of correlated variables into a set of transformed, uncorrelated variables called principal components. The principal components are linear combinations of the original variables.
* A Discrete Wavelet Transform (DWT) is a transform that decomposes a given signal into a number of sets, where each set is a time series of coefficients describing the time evolution of the signal in the corresponding frequency band.
* Another commonly used technique for dimensionality reduction is Singular Value Decomposition (SVD).

Feature Subset Selection
* A good feature representation is central to achieving high performance in any machine learning task.
* Consider an example of text categorization. Assume that we need to train a model for classifying a given document as spam or not spam. If we represent a document as a bag of words, the feature space consists of a vocabulary of all unique words present in all the documents in the training set.
* For a collection of 100,000 to 1,000,000 documents, we can easily expect hundreds of thousands of features.
If we further extend this document model to include all possible bigrams and trigrams, we could easily get over a million features.
* A feature tree is a tree such that each internal node is labelled with a feature, and each edge emanating from an internal node is labelled with a literal. The set of literals at a node is called a split.
* Each leaf of the tree represents a logical expression, which is the conjunction of the literals encountered on the path from the root of the tree to the leaf. The extension of that conjunction is called the instance space segment associated with the leaf.
* Two features are redundant if they are highly correlated, regardless of whether they are correlated with the task or not.

Feature construction and transformation
* Feature construction involves transforming a given set of input features to generate a new set of more powerful features which can then be used for prediction.
* Feature construction methods may be applied to pursue two distinct goals : Reducing data dimensionality and improving prediction performance.
* Steps :
1. Start with an initial feature space F0.
2. Transform F0 to construct a new feature space FN.
3. Select a subset of features Fi from FN.
4. If the terminating criterion is not yet achieved, go back to step 3; otherwise set FT = Fi.
5. FT is the newly constructed feature space.
* The initial feature space F0 consists of manually constructed features that often encode some basic domain knowledge.
* The task of constructing appropriate features is often highly application specific and labour intensive. Thus, building automated feature construction methods that require minimal user effort is challenging. In particular we want methods that :
1. Generate a set of features that help improve prediction accuracy.
2. Are computationally efficient.
3. Are generalizable to different classifiers.
4. Allow for easy addition of domain knowledge.
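The construction steps above can be sketched in a few lines. This is only one possible reading, under stated assumptions: the transformation is taken to be the pairwise products of the initial features, the selection criterion is the absolute correlation with the label, and a single pass is made instead of iterating until a terminating criterion is met. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def construct_features(f0):
    """Step 2: build FN from F0 by adding the product of every feature pair."""
    fn = dict(f0)
    for (name_a, col_a), (name_b, col_b) in combinations(f0.items(), 2):
        fn[f"{name_a}*{name_b}"] = [a * b for a, b in zip(col_a, col_b)]
    return fn

def select_features(fn, label, k):
    """Step 3: keep the k features most correlated (in absolute value) with the label."""
    ranked = sorted(fn, key=lambda name: abs(correlation(fn[name], label)),
                    reverse=True)
    return {name: fn[name] for name in ranked[:k]}

# Step 1: initial feature space F0. The label depends only on the product x1 * x2.
f0 = {"x1": [1, 1, -1, -1], "x2": [1, -1, 1, -1]}
label = [1, -1, -1, 1]

fn = construct_features(f0)
ft = select_features(fn, label, k=1)
print(list(ft))  # ['x1*x2']
```

Neither original feature is correlated with the label on its own, while the constructed product feature matches it exactly, which is the kind of gain in predictive power that feature construction aims for.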
* Genetic programming is an evolutionary algorithm-based technique that starts with a population of individuals, evaluates them based on some fitness function and constructs a new population by applying a set of mutation and crossover operators on high scoring individuals and eliminating the low scoring ones.
* In the feature construction paradigm, genetic programming is used to derive a new feature set from the original one. Individuals are often tree-like representations of features, and the fitness function is usually based on the prediction performance of the classifier trained on these features, while the operators can be application specific.
* The method essentially performs a search in the new feature space and helps generate a high performing subset of features. The newly generated features may often be more comprehensible and intuitive than the original feature set, which makes GP-related methods well-suited for such tasks.
* In decision trees, the model explicitly selects features that are highly correlated with the label. In particular, by limiting the depth of the decision tree, one can at least hope that the model will be able to throw away irrelevant features.
* In the case of K-nearest neighbours, the situation is perhaps worse. Since KNN weighs each feature just as much as any other feature, the introduction of irrelevant features can completely mess up KNN prediction.
* Feature extraction is a process that extracts a set of new features from the original features through some functional mapping.
* Transformation studies ways of mapping original attributes to new features. Different mappings can be employed to extract features.
* In general, the mappings can be categorized into linear or nonlinear transformations. One could categorize transformations along two dimensions : linear and labeled, linear and non-labeled, nonlinear and labeled, nonlinear and non-labeled.
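The effect of irrelevant features on K-nearest neighbours noted above can be illustrated with a one-neighbour classifier. The data values and the function name below are made up for the illustration; the second feature is deliberately unrelated to the class and has a much larger range than the first.

```python
import math

def nearest_label(train, query):
    """Return the label of the training point closest to the query (1-NN)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

# One relevant feature: class A has small values, class B has large values.
train_relevant = [((1.0,), "A"), ((2.0,), "A"), ((8.0,), "B"), ((9.0,), "B")]
print(nearest_label(train_relevant, (2.5,)))       # A

# Same points with an extra, irrelevant feature on a much larger scale.
train_noisy = [((1.0, 50.0), "A"), ((2.0, 90.0), "A"),
               ((8.0, 12.0), "B"), ((9.0, 70.0), "B")]
print(nearest_label(train_noisy, (2.5, 10.0)))     # B
```

Because Euclidean distance treats both coordinates equally, the irrelevant coordinate decides which neighbour is nearest and the prediction flips from A to B. Dropping or down-weighting such a feature is one motivation for feature selection.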
Feature selection
* Feature selection is a process that chooses a subset of features from the original features so that the feature space is optimally reduced according to a certain criterion.
* Feature selection is a critical step in the feature construction process. In text categorization problems, some words simply do not appear very often. Perhaps the word "groovy" appears in exactly one training document, which is positive. Is it really worth keeping this word around as a feature ? It is a dangerous endeavour, because it is hard to tell with just one training example whether the word is really correlated with the positive class or is just noise. You could hope that your learning algorithm is smart enough to figure it out, or you could just remove it.
* There are three general classes of feature selection algorithms : Filter methods, wrapper methods and embedded methods.
* The role of feature selection is as follows :
1. To reduce the dimensionality of feature space.
2. To speed up a learning algorithm.
3. To improve the predictive accuracy of a classification algorithm.
4. To improve the comprehensibility of the learning results.
* Feature selection algorithms are as follows :
1. Instance based approaches : There is no explicit procedure for feature subset generation. Many small data samples are drawn from the data. Features are weighted according to their roles in differentiating instances of different classes for a data sample. Features with higher weights can be selected.
2. Nondeterministic approaches : Genetic algorithms and simulated annealing are also used in feature selection.
3. Exhaustive complete approaches : Branch and bound evaluates estimated accuracy, and ABB checks an inconsistency measure that is monotonic. Both start with the full feature set and remove features until the pre-set bound can no longer be maintained.

Fill in the Blanks
Q.1 ______ set is collection of related records or information.
Q.2 Each row of a data set is called a ______.
Q.3 Qualitative data is also called ______ data.
Q.4 ______ data provides information about the quality of an object or information which cannot be measured.
Q.5 Dimensionality reduction helps in reducing irrelevance and ______ in features.
Q.6 Dimensionality reduction refers to the techniques of reducing the dimensionality of a data set by creating new attributes by combining the original ______.
Q.7 Lossy dimensionality reduction methods are ______ and wavelet transforms.
Q.8 An ______ is an observation that lies an abnormal distance from other values in a random sample from a population.
Q.9 Exploration of numerical data can be best done using ______ and ______.
Q.10 Data can be broadly divided into ______ data and ______ data.

Multiple Choice Questions
Q.1 Data can be broadly divided into ______.
a. qualitative data
b. quantitative data
c. qualitative and quantitative data
d. ratio data
Q.2 Feature selection tries to eliminate features which are ______.
a. rich
b. redundant
c. irrelevant
d. relevant
Q.3 Principal component analysis is used for ______.
a. dimensionality enhancement
b. LU decomposition
c. QR decomposition
d. dimensionality reduction
Q.4 Which of the following are methods to perform dimension reduction ?
a. Missing values
b. Decision tree
c. Random forest
d. All of these

Answer Keys for Fill in the Blanks
Q.1 Data
