MU -- MIT
MACHINE LEARNING - INDIVIDUAL ASSIGNMENT
NAME SOLOMON ABRHA
ID - MIT/UR/122/12
1. What are input and output variables in ML? Which is the dependent and which is the
independent variable?
Input Variables
       Also known as features or predictors; these are the independent variables.
       These are the variables or attributes used to predict the output variable.
       They are the input to the model and represent the information or data you provide to the
        algorithm for analysis or prediction.
       Example: In a house price prediction problem, input variables could be:
           o Number of bedrooms
           o Square footage
           o Location of the house
Output Variable
       Also known as the target variable or label; this is the dependent variable.
       This is the variable the model is trying to predict or explain.
       It is dependent on the input variables since its value is determined by them.
       Example: In the same house price prediction problem, the output variable is:
             o Price of the house
       Input variables are the independent variables: they are supplied to the model as given
        values and are not predicted by it (although input features can still be correlated with
        one another).
       The output variable is the dependent variable: it depends on the independent variables. In
        ML, the goal is to find a relationship or function that maps the independent variables to
        the dependent variable.
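A minimal Python sketch of the house price example: each record is split into the input variables (features, X) and the output variable (target, y). The records, column names, and the helper `split_inputs_outputs` are hypothetical and the values are invented for illustration only.

```python
# Hypothetical house records: the values are invented for illustration only.
houses = [
    {"bedrooms": 3, "square_feet": 1500, "location": "suburb", "price": 250000},
    {"bedrooms": 2, "square_feet": 900, "location": "city", "price": 310000},
    {"bedrooms": 4, "square_feet": 2200, "location": "rural", "price": 200000},
]

FEATURES = ["bedrooms", "square_feet", "location"]  # input / independent variables
TARGET = "price"                                    # output / dependent variable

def split_inputs_outputs(records, features, target):
    """Return (X, y): one list of feature values per record, plus the target values."""
    X = [[row[name] for name in features] for row in records]
    y = [row[target] for row in records]
    return X, y

X, y = split_inputs_outputs(houses, FEATURES, TARGET)
print(X[0])  # [3, 1500, 'suburb']
print(y)     # [250000, 310000, 200000]
```

A model is then trained to learn the mapping from each row of X to the matching value in y.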
2. Write the difference between a model and an algorithm.
Model
       A model is a mathematical representation of a real-world process. It is the result of applying an
        algorithm to data and is used to make predictions or decisions based on new data.
           A model can make predictions on unseen data based on the patterns it has learned.
           A model is the final output of an ML algorithm after it has been trained on data. It represents
            the learned patterns, relationships, or rules that the algorithm discovered in the training data.
           Examples include linear regression models, decision trees, neural networks, etc.
Algorithm
              An algorithm is a set of rules or instructions for solving a problem or performing a
               task. In the context of machine learning, it refers to the procedure used to learn the
               model from data.
             An algorithm is a set of mathematical instructions or a procedure that is used to train a model
              by finding patterns in data.
              It is used to process data, optimize the model parameters, and derive the model.
             Examples include gradient descent, random forest, support vector machines, etc.
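To make the distinction concrete, here is a minimal Python sketch under simple assumptions: the algorithm is ordinary least squares for a single input variable (a closed-form stand-in for an iterative method such as gradient descent), and the model is the prediction function built from the parameters the algorithm learns. The function names and the toy data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Algorithm: ordinary least squares for one input variable.
    Returns the learned parameters (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def make_model(slope, intercept):
    """Model: the learned parameters packaged as a prediction function."""
    def predict(x):
        return slope * x + intercept
    return predict

# Invented training data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = least_squares_fit(xs, ys)  # run the algorithm on the data
model = make_model(slope, intercept)          # its output is the model
print(slope, intercept)  # 2.0 1.0
print(model(10))         # 21.0 (prediction for an unseen input)
```

The algorithm runs once during training; the model is what is kept afterwards and reused for predictions on new inputs.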
3. Write the difference between parametric and nonparametric ML algorithms.
Parametric ML Algorithms
           A parametric algorithm has a fixed number of parameters.
           Parametric methods make strong assumptions about the mapping of the input variables to the
            output variable.
           Parametric machine learning algorithms simplify the mapping to a known functional form.
           Model Structure: Defined by a fixed number of parameters (e.g., the coefficients in
            linear regression).
           Training Speed: Generally faster to train because only a limited number of parameters
            must be estimated.
Non-Parametric ML Algorithms
           These algorithms do not make strong assumptions about the data and have no fixed
            number of parameters. The complexity of the model can grow with the amount of data.
           Non-parametric algorithms use a flexible number of parameters, and the number of
            parameters often grows as they learn from more data.
           Model Structure: The model complexity can grow with the amount of data (e.g., the
            number of training samples).
           Training Speed: Typically slower and more memory-intensive, especially with large datasets,
            since they may need to store the entire training dataset (or a large portion of it) and
            consult it when making predictions (e.g., k-nearest neighbors).
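A minimal Python sketch of the difference, using linear regression as the parametric example and k-nearest-neighbors regression as the non-parametric one (an illustrative choice; the function names and data are hypothetical). The linear model always keeps two numbers, while k-NN keeps every training example, so its size grows with the data.

```python
def fit_linear(xs, ys):
    """Parametric: the data is summarized by exactly 2 parameters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return {"slope": slope, "intercept": my - slope * mx}

def fit_knn(xs, ys):
    """Non-parametric: 'training' simply stores every (x, y) example."""
    return {"points": list(zip(xs, ys))}

def predict_knn(model, x, k=3):
    """Average the targets of the k stored points closest to x."""
    nearest = sorted(model["points"], key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

for n in (10, 100, 1000):
    xs = list(range(n))
    ys = [2 * x + 1 for x in xs]
    linear = fit_linear(xs, ys)
    knn = fit_knn(xs, ys)
    n_linear_params = len(linear)          # slope and intercept
    n_knn_values = 2 * len(knn["points"])  # every x and y value is kept
    print(n, n_linear_params, n_knn_values)
# prints: 10 2 20, then 100 2 200, then 1000 2 2000
```

The fixed-size model is cheap to store and fast to query; the k-NN model must search its stored points on every prediction, which is where the extra cost of many non-parametric methods shows up.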
4. Write the difference between overfitting and underfitting. Explain the causes of
overfitting and underfitting.
Overfitting
      Overfitting refers to a model that learns the training data too well but does not generalize
       well to new data.
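      High accuracy on the training dataset but poor performance on the validation/test dataset.
      Cause: Overly Complex Models. The model has too many parameters relative to the amount of
       training data.
      Cause: Noisy or Insufficient Training Data. With too few examples, or data containing many
       errors, the model can fit the noise instead of the true pattern.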
       It reflects a situation where the model memorizes the training data instead of learning
       general patterns.
        Overfitting occurs when a model learns the details and noise in the training data to the
       extent that it negatively impacts its performance on new data. In essence, the model
       becomes too complex and captures patterns that do not generalize.
Underfitting
      Underfitting refers to a model that can neither fit the training data well nor generalize to new
       data. It fails to learn the problem from the training data sufficiently.
      Cause: Too Simple Models. Using overly simplistic models (e.g., a linear model
       for a non-linear relationship) that cannot capture the data's complexity.
      Cause: Insufficient Training. Not training the model long enough for it to learn
       from the training data effectively.
      Cause: Excessive Feature Reduction. Removing too many features can lead to the
       loss of important information needed for making accurate predictions.
      Underfitting occurs when a model is too simple to capture the underlying structure of the data.
       It fails to learn the relationships in the data, leading to poor performance on both the training
       and test datasets.
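The contrast can be sketched in Python with invented data that follows y = 2x plus alternating noise of +1/-1. Three hypothetical models are compared: a constant model that always predicts the training mean (too simple, so it underfits), a least-squares straight line, and a 1-nearest-neighbor "memorizer" that reproduces every training target, noise included (used here as an illustrative overfitting model).

```python
def mse(model, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Invented data: y = 2x plus alternating noise (+1/-1 for training, -1/+1 for testing).
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]
test_x = [x + 0.3 for x in range(10)]
test_y = [2 * x + (-1 if i % 2 == 0 else 1) for i, x in enumerate(test_x)]

# Underfitting: too simple, always predicts the training mean.
mean_y = sum(train_y) / len(train_y)
def constant_model(x):
    return mean_y

# Reasonable fit: least-squares straight line.
mx = sum(train_x) / len(train_x)
slope = (sum((x - mx) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mx) ** 2 for x in train_x))
intercept = mean_y - slope * mx
def linear_model(x):
    return slope * x + intercept

# Overfitting: memorizes the training set (1-nearest neighbor), noise included.
def memorizer(x):
    nearest = min(train_x, key=lambda t: abs(t - x))
    return train_y[train_x.index(nearest)]

for name, model in [("underfit", constant_model),
                    ("linear", linear_model),
                    ("overfit", memorizer)]:
    print(f"{name:9s} train MSE={mse(model, train_x, train_y):6.2f}  "
          f"test MSE={mse(model, test_x, test_y):6.2f}")
```

In this toy setup the constant model has high error on both sets, the memorizer has zero training error but a clearly higher test error than the straight line, and the straight line has moderate error on both, which is the pattern described above.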
5. Explain how data is used in ML. Describe attributes or features.
      Using data effectively in machine learning (ML) is crucial for building models that generalize well
       to unseen data. The process involves several steps:
1. Problem Understanding and Data Collection:
      Gather data from various sources, which can include databases, external APIs, web
       scraping, or existing datasets.
      Ensure that the data is relevant to the problem you're trying to solve.
2. Data Understanding:
      Explore and analyze the data to understand its structure and characteristics.
3. Data Cleaning:
      Handle missing values, duplicates, and outliers.
      Correct inconsistencies and format the data properly to ensure quality inputs for the
       model.
4. Feature Selection and Engineering:
       Select relevant features that contribute to the prediction of the target variable.
       Create new features from existing ones (feature engineering) to improve model
        performance. This might involve combining, transforming, or encoding variables.
5. Data Splitting:
       Divide the dataset into training, validation, and test sets. A common split is 70% for
        training, 15% for validation, and 15% for testing.
6. Model Training:
       Choose an appropriate algorithm and train the model using the training data.
       Adjust the model's parameters to minimize prediction errors.
7. Model Evaluation:
       Test the trained model using the validation/test dataset.
       Use performance metrics (like accuracy, precision, recall, F1-score, etc.) to evaluate how
        well the model predicts on unseen data.
8. Model Tuning:
       Fine-tune the model's hyperparameters, structure, or features based on evaluation results.
       This may involve techniques such as cross-validation.
9. Deployment:
       Once the model is trained and evaluated, it can be deployed to make predictions on new
        data in a production environment.
10. Monitoring and Maintenance:
       Continuously monitor the model's performance over time.
       Update the model and data as necessary to ensure it remains relevant and accurate.
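Steps 3 and 5 above can be sketched in plain Python. This is a minimal sketch under stated assumptions: rows are tuples, missing values are represented as `None`, the simplest cleaning strategy (dropping incomplete rows and exact duplicates) is used, and the split is 70/15/15 after a seeded shuffle. The helper names and the data are hypothetical.

```python
import random

def clean(rows):
    """Step 3 (sketch): drop rows with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        if any(v is None for v in row):
            continue  # missing value: the simplest strategy is to drop the row
        if row in seen:
            continue  # exact duplicate of a row already kept
        seen.add(row)
        cleaned.append(row)
    return cleaned

def train_val_test_split(rows, train=0.70, val=0.15, seed=42):
    """Step 5 (sketch): shuffle, then cut into training/validation/test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed makes the split reproducible
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])    # the remainder (about 15%) is the test set

# Invented rows of (bedrooms, square_feet, price); tuples so they are hashable.
data = [(b % 5 + 1, 800 + 10 * b, 100000 + 1000 * b) for b in range(100)]
data += [data[0], (3, None, 150000)]  # one duplicate and one row with a missing value

rows = clean(data)
train_rows, val_rows, test_rows = train_val_test_split(rows)
print(len(rows), len(train_rows), len(val_rows), len(test_rows))  # 100 70 15 15
```

Shuffling before splitting avoids putting, for example, all the oldest records in the training set; in practice, missing values are often filled in (imputed) rather than dropped.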
Attributes or features are individual measurable properties or characteristics of the data being used in
the machine learning model. They are the input variables that the model uses to make predictions.
Types of Features:
       Numerical Features: Quantitative values (e.g., age, temperature, salary) that
        can be further categorized into:
            o Continuous: Values can take on any real number (e.g., height, price).
            o Discrete: Countable values (e.g., number of children, number of cars).
       Categorical Features: Represent discrete categories or groups (e.g., gender, color, city).
        They can be further divided into:
            o Nominal: No inherent order (e.g., red, blue, green).
            o Ordinal: There is an order or ranking (e.g., ratings from 1 to 5).
       Binary Features: A specific type of categorical feature that has only two values (e.g.,
        yes/no, true/false).
       Other types also exist, such as text and date/time features.
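Before most models can use these feature types, they have to be turned into numbers. A minimal Python sketch using two common conventions: one-hot encoding for nominal features (no order implied) and an integer mapping for ordinal features (order preserved). The field names and the `SIZE_ORDER` ranking are hypothetical.

```python
# Hypothetical ranking for an ordinal feature (an assumption for illustration).
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}

def one_hot(value, categories):
    """Nominal feature: one 0/1 column per category, no order implied."""
    return [1 if value == c else 0 for c in categories]

def encode(record, colors):
    """Turn one record into a flat list of numbers, one block per feature type."""
    return (
        [record["age"]]                     # numerical, continuous
        + [record["children"]]              # numerical, discrete
        + one_hot(record["color"], colors)  # categorical, nominal
        + [SIZE_ORDER[record["size"]]]      # categorical, ordinal
        + [1 if record["owns_car"] else 0]  # binary
    )

colors = ["red", "blue", "green"]
row = {"age": 34.5, "children": 2, "color": "blue", "size": "large", "owns_car": True}
print(encode(row, colors))  # [34.5, 2, 0, 1, 0, 2, 1]
```

Mapping a nominal feature such as color to 0, 1, 2 would suggest an order that does not exist, which is why one-hot encoding is used for it instead.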
6. What is feature engineering?
Traditional ML algorithms require carefully handcrafted features, and the process of creating them is called
feature engineering. It often relies on separate feature extraction methods, and the extracted features depend on the methods used.
Feature Engineering is a crucial step in the machine learning (ML) process that involves creating,
selecting, and transforming features (attributes) from raw data to improve the performance of machine
learning models. The goal of feature engineering is to provide the models with the most informative and
relevant data, enabling them to make better predictions or classifications.
Feature engineering is an iterative and creative process that requires domain knowledge, analytical
skills, and a deep understanding of the data. It plays an essential role in building effective machine
learning models and is often what distinguishes successful models from those that fail to perform well.
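Continuing the house price example from question 1, here is a minimal Python sketch of feature engineering: new features are created by transforming, combining, and decomposing existing columns. The field names and the derived features are hypothetical choices for illustration, not a fixed recipe.

```python
from datetime import date

def engineer_features(house):
    """Create new features from raw columns (field names are hypothetical)."""
    features = dict(house)  # keep the original columns
    # Transform: age of the house at the time it was sold.
    features["age_at_sale"] = house["sale_date"].year - house["year_built"]
    # Combine: total rooms from two existing counts.
    features["total_rooms"] = house["bedrooms"] + house["bathrooms"]
    # Ratio of existing features: average area per room.
    features["sqft_per_room"] = house["square_feet"] / features["total_rooms"]
    # Decompose a date into a part a model can use directly (e.g., seasonality).
    features["sale_month"] = house["sale_date"].month
    del features["sale_date"]  # the raw date object is replaced by derived numbers
    return features

raw = {"bedrooms": 3, "bathrooms": 2, "square_feet": 1500,
       "year_built": 1990, "sale_date": date(2023, 6, 15)}
engineered = engineer_features(raw)
print(engineered["age_at_sale"], engineered["total_rooms"],
      engineered["sqft_per_room"], engineered["sale_month"])  # 33 5 300.0 6
```

Which derived features actually help depends on the problem and is usually checked by comparing model performance with and without them.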
TO INSTRUCTOR SIMON H.
DUE DATE DECEMBER 20