KEMBAR78
Data | PDF | Standard Deviation | Outlier
0% found this document useful (0 votes)
42 views126 pages

Data

Uploaded by

6whoisoz9
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views126 pages

Data

Uploaded by

6whoisoz9
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 126

MATLAB®

Data Analysis

R2023b
How to Contact MathWorks

Latest news: www.mathworks.com

Sales and services: www.mathworks.com/sales_and_services

User community: www.mathworks.com/matlabcentral

Technical support: www.mathworks.com/support/contact_us

Phone: 508-647-7000

The MathWorks, Inc.


1 Apple Hill Drive
Natick, MA 01760-2098
MATLAB® Data Analysis
© COPYRIGHT 2005–2023 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used or copied
only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form
without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through
the federal government of the United States. By accepting delivery of the Program or Documentation, the government
hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer
software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014.
Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain
to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and
Documentation by the federal government (or other entity acquiring for or through the federal government) and shall
supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is
inconsistent in any respect with federal procurement law, the government agrees to return the Program and
Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be
trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for
more information.
Revision History
September 2005 Online only New for MATLAB 7.1 (Release 14SP3)
March 2006 Online only Revised for MATLAB 7.2 (Release 2006a)
September 2006 Online only Revised for MATLAB 7.3 (Release 2006b)
March 2007 Online only Revised for MATLAB 7.4 (Release 2007a)
September 2007 Online only Revised for MATLAB 7.5 (Release 2007b)
March 2008 Online only Revised for MATLAB 7.6 (Release 2008a)
October 2008 Online only Revised for MATLAB 7.7 (Release 2008b)
March 2009 Online only Revised for MATLAB 7.8 (Release 2009a)
September 2009 Online only Revised for MATLAB 7.9 (Release 2009b)
March 2010 Online only Revised for MATLAB 7.10 (Release 2010a)
September 2010 Online only Revised for MATLAB 7.11 (Release 2010b)
April 2011 Online only Revised for MATLAB 7.12 (Release 2011a)
September 2011 Online only Revised for MATLAB 7.13 (Release 2011b)
March 2012 Online only Revised for MATLAB 7.14 (Release 2012a)
September 2012 Online only Revised for MATLAB 8.0 (Release 2012b)
March 2013 Online only Revised for MATLAB 8.1 (Release 2013a)
September 2013 Online only Revised for MATLAB 8.2 (Release 2013b)
March 2014 Online only Revised for MATLAB 8.3 (Release 2014a)
October 2014 Online only Revised for MATLAB 8.4 (Release 2014b)
March 2015 Online only Revised for MATLAB 8.5 (Release 2015a)
September 2015 Online only Revised for MATLAB 8.6 (Release 2015b)
March 2016 Online only Revised for MATLAB 9.0 (Release 2016a)
September 2016 Online only Revised for MATLAB 9.1 (Release 2016b)
March 2017 Online only Revised for MATLAB 9.2 (Release 2017a)
September 2017 Online only Revised for MATLAB 9.3 (Release 2017b)
March 2018 Online only Revised for MATLAB 9.4 (Release 2018a)
September 2018 Online only Revised for MATLAB 9.5 (Release 2018b)
March 2019 Online only Revised for MATLAB 9.6 (Release 2019a)
September 2019 Online only Revised for MATLAB 9.7 (Release 2019b)
March 2020 Online only Revised for MATLAB 9.8 (Release 2020a)
September 2020 Online only Revised for MATLAB 9.9 (Release 2020b)
March 2021 Online only Revised for MATLAB 9.10 (Release 2021a)
September 2021 Online only Revised for MATLAB 9.11 (Release 2021b)
March 2022 Online only Revised for MATLAB 9.12 (Release 2022a)
September 2022 Online only Revised for MATLAB 9.13 (Release 2022b)
March 2023 Online only Revised for MATLAB 9.14 (Release 2023a)
September 2023 Online only Revised for Version 23.2 (R2023b)
Contents

Data Processing
1
Importing and Exporting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Importing Data into the Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Exporting Data from the Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2

Plotting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
Load and Plot Data from Text File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3

Remove Linear Trends from Timetable Data . . . . . . . . . . . . . . . . . . . . . . . . 1-5

Missing Data in MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9

Data Smoothing and Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-13

Clean Messy Data and Locate Extrema Using Live Editor Tasks . . . . . . . 1-26

Filter Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33


Filter Difference Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33
Moving-Average Filter of Traffic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33
Modify Amplitude of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-34

Smooth Data with Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-37

Computing with Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-41


Functions for Calculating Descriptive Statistics . . . . . . . . . . . . . . . . . . . 1-41
Example: Using MATLAB Data Statistics . . . . . . . . . . . . . . . . . . . . . . . . . 1-42
Data Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-43

Regression Analysis
2
Linear Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Correlation Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3

Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Residuals and Goodness of Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9

v
Fitting Data with Curve Fitting Toolbox Functions . . . . . . . . . . . . . . . . . 2-11

Interactive Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13


Basic Fitting UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Preparing for Basic Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Opening the Basic Fitting UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Example: Using Basic Fitting UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14

Programmatic Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26


MATLAB Functions for Polynomial Models . . . . . . . . . . . . . . . . . . . . . . . 2-26
Linear Model with Nonpolynomial Terms . . . . . . . . . . . . . . . . . . . . . . . . 2-26
Multiple Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
Programmatic Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28

Time Series Analysis


3
Time Series Objects and Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2

Manage Experiments
4
Keyboard Shortcuts for Experiment Manager . . . . . . . . . . . . . . . . . . . . . . . 4-2
Shortcuts for General Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Shortcuts for Experiment Browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Shortcuts for Results Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3

Run Experiments in Parallel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4


Run Multiple Simultaneous Trials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Run Single Trial on Multiple Workers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Set Up Parallel Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5

Offload Experiments as Batch Jobs to a Cluster . . . . . . . . . . . . . . . . . . . . . 4-6


Create Batch Job on Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
Track Progress of Batch Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
Cancel Batch Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
Delete Batch Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8

Debug General-Purpose Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9


Start Debugging Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Debug Experiment Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Verify Your Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10

Experiment with Predator-Prey Equations . . . . . . . . . . . . . . . . . . . . . . . . 4-12

Compare Air Resistance Models for Projectile Motion . . . . . . . . . . . . . . 4-20

vi Contents
1

Data Processing

• “Importing and Exporting Data” on page 1-2


• “Plotting Data” on page 1-3
• “Remove Linear Trends from Timetable Data” on page 1-5
• “Missing Data in MATLAB” on page 1-9
• “Data Smoothing and Outlier Detection” on page 1-13
• “Clean Messy Data and Locate Extrema Using Live Editor Tasks” on page 1-26
• “Filter Data” on page 1-33
• “Smooth Data with Convolution” on page 1-37
• “Computing with Descriptive Statistics” on page 1-41
1 Data Processing

Importing and Exporting Data


In this section...
“Importing Data into the Workspace” on page 1-2
“Exporting Data from the Workspace” on page 1-2

Importing Data into the Workspace


The first step in analyzing data is to import it into the MATLAB workspace. See “Supported File
Formats for Import and Export” for information about importing data from specific file formats.

Exporting Data from the Workspace


When you analyze your data, you might create new variables or modify imported variables. You can
export variables from the MATLAB workspace to various file formats, both character-based and
binary. You can, for example, create HDF and Microsoft® Excel® files containing your data. For
details, see the documentation on “Supported File Formats for Import and Export”.

1-2
Plotting Data

Plotting Data
In this section...
“Introduction” on page 1-3
“Load and Plot Data from Text File” on page 1-3

Introduction
After you import data into the MATLAB workspace, it is a good idea to plot the data so that you can
explore its features. An exploratory plot of your data enables you to identify discontinuities and
potential outliers, as well as the regions of interest.

The MATLAB figure window displays plots. See “Types of MATLAB Plots” for a full description of the
figure window. It also discusses the various interactive tools available for editing and customizing
MATLAB graphics.

Load and Plot Data from Text File

This example uses sample data in count.dat, a space-delimited text file. The file consists of three
sets of hourly traffic counts, recorded at three different town intersections over a 24-hour period.
Each data column in the file represents data for one intersection.

Load the count.dat Data

Import data into the workspace using the load function.

load count.dat

Loading this data creates a 24-by-3 matrix called count in the MATLAB workspace.

Get the size of the data matrix.

[n,p] = size(count)

n = 24

p = 3

n represents the number of rows, and p represents the number of columns.

Plot the count.dat Data

Create a time vector, t, containing integers from 1 to n.

t = 1:n;

Plot the data as a function of time, and annotate the plot.

plot(t,count),
legend('Location 1','Location 2','Location 3','Location','NorthWest')
xlabel('Time'), ylabel('Vehicle Count')
title('Traffic Counts at Three Intersections')

1-3
1 Data Processing

See Also
load | plot | legend | xlabel | ylabel | title | size

More About
• “Types of MATLAB Plots”

1-4
Remove Linear Trends from Timetable Data

Remove Linear Trends from Timetable Data

This example shows how to remove a linear trend from daily closing stock prices in a timetable. If the
data does have a trend, detrending forces the mean of the detrended data to zero and reduces overall
variation.

Create Stock Data

Create a sample timetable containing daily closing stock prices. Use randomly sampled numbers from
a normal distribution.

x = 0:300;
Time = days(x)';
dailyFluct = gallery("normaldata",size(x),2);
closing = cumsum(dailyFluct) + 20 + x/100;
StockPrice = closing';
TT = timetable(Time,StockPrice)

TT=301×1 timetable
Time StockPrice
_______ __________

0 days 21.749
1 day 21.892
2 days 22.227
3 days 21.443
4 days 21.768
5 days 21.251
6 days 22.193
7 days 23.368
8 days 21.332
9 days 20.698
10 days 22.449
11 days 22.946
12 days 24.004
13 days 25.503
14 days 26.783
15 days 24.937

Plot and label the stock price data.

plot(TT,"Time","StockPrice");
xlabel("Time (days)");
ylabel("Stock Price (dollars)");

1-5
1 Data Processing

Remove Trend

Apply detrend, which performs a linear fit to the stock prices and removes the trend. Specify to
append the detrended data to the input timetable.
TT = detrend(TT,ReplaceValues=false);

Compute the trend line by subtracting the detrended data from the input data.
trend = TT.StockPrice - TT.StockPrice_detrended;
TT = addvars(TT,trend,NewVariableNames="Trend")

TT=301×3 timetable
Time StockPrice StockPrice_detrended Trend
_______ __________ ____________________ ______

0 days 21.749 -3.7893 25.538


1 day 21.892 -3.7397 25.631
2 days 22.227 -3.4975 25.724
3 days 21.443 -4.3742 25.817
4 days 21.768 -4.1423 25.91
5 days 21.251 -4.7525 26.003
6 days 22.193 -3.9033 26.096
7 days 23.368 -2.8216 26.189
8 days 21.332 -4.9502 26.282
9 days 20.698 -5.6776 26.375
10 days 22.449 -4.0195 26.468
11 days 22.946 -3.6157 26.561

1-6
Remove Linear Trends from Timetable Data

12 days 24.004 -2.6498 26.654


13 days 25.503 -1.2442 26.747
14 days 26.783 -0.056718 26.84
15 days 24.937 -1.9958 26.933

Find the average closing price of the detrended data.

average_detrended = mean(TT.StockPrice_detrended)

average_detrended = 9.0647e-15

As expected, the detrended data has a mean very close to 0.

Visualize Detrended Data

Display the results by plotting the original daily closing stock prices, the trend line, the detrended
data, and its mean.

plot(TT,"StockPrice")
hold on
plot(TT,"Trend")
plot(TT,"StockPrice_detrended")
yline(average_detrended)
legend("Input Data","Trend","Detrended Data", ...
"Mean of Detrended Data","Location","northwest")
xlabel("Time (days)");
ylabel("Stock Price (dollars)");

1-7
1 Data Processing

See Also
Live Editor Tasks
Find and Remove Trends

Functions
detrend | gallery | plot | cumsum

1-8
Missing Data in MATLAB

Missing Data in MATLAB

Working with missing data is a common task in data preprocessing. Although sometimes missing
values signify a meaningful event in the data, they often represent unreliable or unusable data points.
In either case, MATLAB® has many options for handling missing data.

Create and Organize Missing Data

The form that missing values take in MATLAB depends on the data type. For example, numeric data
types such as double use NaN (not a number) to represent missing values.

x = [NaN 1 2 3 4];

You can also use the missing value to represent missing numeric data or data of other types, such as
datetime, string, and categorical. MATLAB automatically converts the missing value to the
data's native type.

xDouble = [missing 1 2 3 4]

xDouble = 1×5

NaN 1 2 3 4

xDatetime = [missing datetime(2014,1:4,1)]

xDatetime = 1x5 datetime


NaT 01-Jan-2014 01-Feb-2014 01-Mar-2014 01-Apr-2014

xString = [missing "a" "b" "c" "d"]

xString = 1x5 string


<missing> "a" "b" "c" "d"

xCategorical = [missing categorical({'cat1' 'cat2' 'cat3' 'cat4'})]

xCategorical = 1x5 categorical


<undefined> cat1 cat2 cat3 cat4

A data set might contain values that you want to treat as missing data, but are not standard MATLAB
missing values in MATLAB such as NaN. You can use the standardizeMissing function to convert
those values to the standard missing value for that data type. For example, treat 4 as a missing
double value in addition to NaN.

xStandard = standardizeMissing(xDouble,[4 NaN])

xStandard = 1×5

NaN 1 2 3 NaN

Suppose you want to keep missing values as part of your data set but segregate them from the rest of
the data. Several MATLAB functions enable you to control the placement of missing values before

1-9
1 Data Processing

further processing. For example, use the 'MissingPlacement' option with the sort function to
move NaNs to the end of the data.

xSort = sort(xStandard,'MissingPlacement','last')

xSort = 1×5

1 2 3 NaN NaN

Find, Replace, and Ignore Missing Data

Even if you do not explicitly create missing values in MATLAB, they can appear when importing
existing data or computing with the data. If you are not aware of missing values in your data,
subsequent computation or analysis can be misleading.

For example, if you unknowingly plot a vector containing a NaN value, the NaN does not appear
because the plot function ignores it and plots the remaining points normally.

nanData = [1:9 NaN];


plot(1:10,nanData)

However, if you compute the average of the data, the result is NaN. In this case, it is more helpful to
know in advance that the data contains a NaN, and then choose to ignore or remove it before
computing the average.

meanData = mean(nanData)

1-10
Missing Data in MATLAB

meanData = NaN

One way to find NaNs in data is by using the isnan function, which returns a logical array indicating
the location of any NaN value.
TF = isnan(nanData)

TF = 1x10 logical array

0 0 0 0 0 0 0 0 0 1

Similarly, the ismissing function returns the location of missing values in data for multiple data
types.
TFdouble = ismissing(xDouble)

TFdouble = 1x5 logical array

1 0 0 0 0

TFdatetime = ismissing(xDatetime)

TFdatetime = 1x5 logical array

1 0 0 0 0

Suppose you are working with a table or timetable made up of variables with multiple data types. You
can find all of the missing values with one call to ismissing, regardless of their type.
xTable = table(xDouble',xDatetime',xString',xCategorical')

xTable=5×4 table
Var1 Var2 Var3 Var4
____ ___________ _________ ___________

NaN NaT <missing> <undefined>


1 01-Jan-2014 "a" cat1
2 01-Feb-2014 "b" cat2
3 01-Mar-2014 "c" cat3
4 01-Apr-2014 "d" cat4

TF = ismissing(xTable)

TF = 5x4 logical array

1 1 1 1
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0

Missing values can represent unusable data for processing or analysis. Use fillmissing to replace
missing values with another value, or use rmmissing to remove missing values altogether.
xFill = fillmissing(xStandard,'constant',0)

1-11
1 Data Processing

xFill = 1×5

0 1 2 3 0

xRemove = rmmissing(xStandard)

xRemove = 1×3

1 2 3

Many MATLAB functions enable you to ignore missing values, without having to explicitly locate, fill,
or remove them first. For example, if you compute the sum of a vector containing NaN values, the
result is NaN. However, you can directly ignore NaNs in the sum by using the 'omitnan' option with
the sum function.

sumNan = sum(xDouble)

sumNan = NaN

sumOmitnan = sum(xDouble,'omitnan')

sumOmitnan = 10

See Also
ismissing | fillmissing | standardizeMissing | missing

Related Examples
• Clean Messy Data and Locate Extrema Using Live Editor Tasks on page 1-26
• “Clean Messy and Missing Data in Tables”

1-12
Data Smoothing and Outlier Detection

Data Smoothing and Outlier Detection

Data smoothing refers to techniques for eliminating unwanted noise or behaviors in data, while
outlier detection identifies data points that are significantly different from the rest of the data.

Moving Window Methods

Moving window methods are ways to process data in smaller batches at a time, typically in order to
statistically represent a neighborhood of points in the data. The moving average is a common data
smoothing technique that slides a window along the data, computing the mean of the points inside of
each window. This can help to eliminate insignificant variations from one data point to the next.

For example, consider wind speed measurements taken every minute for about 3 hours. Use the
movmean function with a window size of 5 minutes to smooth out high-speed wind gusts.

load windData.mat
mins = 1:length(speed);
window = 5;
meanspeed = movmean(speed,window);
plot(mins,speed,mins,meanspeed)
axis tight
legend("Measured Wind Speed","Average Wind Speed over 5 min Window")
xlabel("Time")
ylabel("Speed")

1-13
1 Data Processing

Similarly, you can compute the median wind speed over a sliding window using the movmedian
function.

medianspeed = movmedian(speed,window);
plot(mins,speed,mins,medianspeed)
axis tight
legend("Measured Wind Speed","Median Wind Speed over 5 min Window")
xlabel("Time")
ylabel("Speed")

Not all data is suitable for smoothing with a moving window method. For example, create a sinusoidal
signal with injected random noise.

t = 1:0.2:15;
A = sin(2*pi*t) + cos(2*pi*0.5*t);
Anoise = A + 0.5*rand(1,length(t));
plot(t,A,t,Anoise)
axis tight
legend("Original Data","Noisy Data")

1-14
Data Smoothing and Outlier Detection

Use a moving mean with a window size of 3 to smooth the noisy data.

window = 3;
Amean = movmean(Anoise,window);
plot(t,A,t,Amean)
axis tight
legend("Original Data","Moving Mean - Window Size 3")

1-15
1 Data Processing

The moving mean achieves the general shape of the data, but doesn't capture the valleys (local
minima) very accurately. Since the valley points are surrounded by two larger neighbors in each
window, the mean is not a very good approximation to those points. If you make the window size
larger, the mean eliminates the shorter peaks altogether. For this type of data, you might consider
alternative smoothing techniques.

Amean = movmean(Anoise,5);
plot(t,A,t,Amean)
axis tight
legend("Original Data","Moving Mean - Window Size 5")

1-16
Data Smoothing and Outlier Detection

Common Smoothing Methods

The smoothdata function provides several smoothing options such as the Savitzky-Golay method,
which is a popular smoothing technique used in signal processing. By default, smoothdata chooses a
best-guess window size for the method depending on the data.

Use the Savitzky-Golay method to smooth the noisy signal Anoise, and output the window size that it
uses. This method provides a better valley approximation compared to movmean.

[Asgolay,window] = smoothdata(Anoise,"sgolay");
plot(t,A,t,Asgolay)
axis tight
legend("Original Data","Savitzky-Golay","location","best")

1-17
1 Data Processing

window

window = 3

The robust Lowess method is another smoothing method that is particularly helpful when outliers are
present in the data in addition to noise. Inject an outlier into the noisy data, and use robust Lowess to
smooth the data, which eliminates the outlier.

Anoise(36) = 20;
Arlowess = smoothdata(Anoise,"rlowess",5);
plot(t,Anoise,t,Arlowess)
axis tight
legend("Noisy Data","Robust Lowess")

1-18
Data Smoothing and Outlier Detection

Detecting Outliers

Outliers in data can significantly skew data processing results and other computed quantities. For
example, if you try to smooth data containing outliers with a moving median, you can get misleading
peaks or valleys.

Amedian = smoothdata(Anoise,"movmedian");
plot(t,Anoise,t,Amedian)
axis tight
legend("Noisy Data","Moving Median")

1-19
1 Data Processing

The isoutlier function returns a logical 1 when an outlier is detected. Verify the index and value of
the outlier in Anoise.

TF = isoutlier(Anoise);
ind = find(TF)

ind = 36

Aoutlier = Anoise(ind)

Aoutlier = 20

You can replace outliers in your data by using the filloutliers function and specifying a fill
method. For example, fill the outlier in Anoise with the value of its neighbor immediately to the
right.

Afill = filloutliers(Anoise,"next");
plot(t,Anoise,t,Afill,"o-")
axis tight
legend("Noisy Data with Outlier","Noisy Data with Filled Outlier")

1-20
Data Smoothing and Outlier Detection

Alternatively, you can remove outliers from your data by using the rmoutliers function. For
example, remove the outlier in Anoise.

Aremove = rmoutliers(Anoise);
plot(t,Anoise,t(~TF),Aremove,"o-")
axis tight
legend("Noisy Data with Outlier","Noisy Data with Outlier Removed")

1-21
1 Data Processing

Nonuniform Data

Not all data consists of equally spaced points, which can affect methods for data processing. Create a
datetime vector that contains irregular sampling times for the data in Airreg. The time vector
represents samples taken every minute for the first 30 minutes, then hourly over two days.

t0 = datetime(2014,1,1,1,1,1);
timeminutes = sort(t0 + minutes(1:30));
timehours = t0 + hours(1:48);
time = [timeminutes timehours];
Airreg = rand(1,length(time));
plot(time,Airreg)
axis tight

1-22
Data Smoothing and Outlier Detection

By default, smoothdata smooths with respect to equally spaced integers, in this case, 1,2,...,78.
Since integer time stamps do not coordinate with the sampling of the points in Airreg, the first half
hour of data still appears noisy after smoothing.

Adefault = smoothdata(Airreg,"movmean",3);
plot(time,Airreg,time,Adefault)
axis tight
legend("Original Data","Smoothed Data with Default Sample Points")

1-23
1 Data Processing

Many data processing functions in MATLAB®, including smoothdata, movmean, and


filloutliers, allow you to provide sample points, ensuring that data is processed relative to its
sampling units and frequencies. To remove the high-frequency variation in the first half hour of data
in Airreg, use the SamplePoints name-value argument with the time stamps in time.

Asamplepoints = smoothdata(Airreg,"movmean", ...


hours(3),"SamplePoints",time);
plot(time,Airreg,time,Asamplepoints)
axis tight
legend("Original Data","Smoothed Data with Sample Points")

1-24
Data Smoothing and Outlier Detection

See Also
Functions
smoothdata | isoutlier | filloutliers | rmoutliers | movmean | movmedian

Live Editor Tasks


Smooth Data | Clean Outlier Data

Apps
Data Cleaner

Related Examples
• Clean Messy Data and Locate Extrema Using Live Editor Tasks on page 1-26
• “Filter Data” on page 1-33

1-25
1 Data Processing

Clean Messy Data and Locate Extrema Using Live Editor Tasks

You can interactively preprocess data using sequences of Live Editor tasks, visualizing the data at
each step. This example uses five tasks to clean noisy data with missing values and outliers in order
to identify local minima and maxima. For more information on Live Editor tasks, see “Add Interactive
Tasks to a Live Script”.

First, create and plot a vector of messy data, which contains four NaN values and five outliers.

x = 1:100;
data = cos(2*pi*0.05*x+2*pi*rand) + 0.5*randn(1,100);
data(20:20:80) = NaN;
data(10:20:90) = [-50 40 30 -45 35];

To plot the messy data, open the Create Plot task. Start by typing the keyword plot in a code block,
and then click Create Plot when it appears in the menu. Select the plot type and input data to plot
the data.

To see the code that this task generates, expand the task display by clicking at the bottom of the
task parameter area.

1-26
Clean Messy Data and Locate Extrema Using Live Editor Tasks

Fill Missing Data

To replace NaN values in the data and visualize the results, open the Clean Missing Data task. Start
by typing the keyword missing in a code block, and then click Clean Missing Data when it
appears in the menu. Select the input data and the cleaning method to plot the filled data
automatically.

To see the code that this task generates, expand the task display by clicking at the bottom of the
task parameter area.

1-27
1 Data Processing

Fill Outliers

You can now remove the outliers from the cleaned data in the previous task by using the Clean
Outlier Data task. Type the keyword outliers in a new code block and click Clean Outlier
Data to open the task. Select cleanedData as the input data. You can customize the methods for
cleaning and detecting outliers and adjust the threshold to find more or fewer outliers.

To see the code that this task generates, expand the task display by clicking at the bottom of the
task parameter area.

1-28
Clean Messy Data and Locate Extrema Using Live Editor Tasks

Smooth Data

Next, smooth the cleaned data from the previous task by using the Smooth Data task. Type the
keyword smooth and click the task when it appears. Select cleanedData2, the output from the
previous task, as the input data. Select a smoothing method, and adjust the smoothing factor for
more or less smoothing.

1-29
1 Data Processing

To see the code that this task generates, expand the task display by clicking at the bottom of the
task parameter area.

Locate Extrema

Finally, start typing the keyword extrema and click Find Local Extrema. Use smoothedData as
the input data and change the extrema type to find both the local maxima and local minima of the
cleaned, smoothed data. You can adjust the local extrema parameters to find more or fewer maxima
and minima.

1-30
Clean Messy Data and Locate Extrema Using Live Editor Tasks

To see the code that this task generates, expand the task display by clicking at the bottom of the
task parameter area.

See Also
Live Editor Tasks
Clean Missing Data | Clean Outlier Data | Find Change Points | Find Local Extrema | Smooth
Data | Find and Remove Trends

1-31
1 Data Processing

Functions
ismissing | rmmissing | fillmissing | isoutlier | filloutliers | rmoutliers | ischange |
islocalmin | islocalmax | smoothdata

Related Examples
• “Add Interactive Tasks to a Live Script”
• “Data Smoothing and Outlier Detection” on page 1-13
• “Missing Data in MATLAB” on page 1-9

1-32
Filter Data

Filter Data

Filter Difference Equation


Filters are data processing techniques that can smooth out high-frequency fluctuations in data or
remove periodic trends of a specific frequency from data. In MATLAB, the filter function filters a
vector of data x according to the following difference equation, which describes a tapped delay-line
filter.

a(1)y(n) = b(1)x(n) + b(2)x(n − 1) + … + b(Nb)x(n − Nb + 1)


−a(2)y(n − 1) − … − a(Na)y(n − Na + 1)

In this equation, a and b are vectors of coefficients of the filter, Na is the feedback filter order, and Nb
is the feedforward filter order. n is the index of the current element of x. The output y(n) is a linear
combination of the current and previous elements of x and y.

The filter function uses specified coefficient vectors a and b to filter the input data x. For more
information on difference equations describing filters, see [1].

Moving-Average Filter of Traffic Data

The filter function is one way to implement a moving-average filter, which is a common data
smoothing technique.

The following difference equation describes a filter that averages time-dependent data with respect to
the current hour and the three previous hours of data.

1 1 1 1
y(n) = x(n) + x(n − 1) + x(n − 2) + x(n − 3)
4 4 4 4

Import data that describes traffic flow over time, and assign the first column of vehicle counts to the
vector x.

load count.dat
x = count(:,1);

Create the filter coefficient vectors.

a = 1;
b = [1/4 1/4 1/4 1/4];

Compute the 4-hour moving average of the data, and plot both the original data and the filtered data.

y = filter(b,a,x);

t = 1:length(x);
plot(t,x,'--',t,y,'-')
legend('Original Data','Filtered Data')

1-33
1 Data Processing

Modify Amplitude of Data

This example shows how to modify the amplitude of a vector of data by applying a transfer function.

In digital signal processing, filters are often represented by a transfer function. The Z-transform of
the difference equation

a(1)y(n) = b(1)x(n) + b(2)x(n − 1) + . . . + b(Nb)x(n − Nb + 1)


−a(2)y(n − 1) − . . . − a(Na)y(n − Na + 1)

is the following transfer function.

−Nb + 1
b(1) + b(2)z−1 + . . . + b(Nb)z
Y(z) = H(z−1)X(z) = −Na + 1
X(z)
a(1) + a(2)z−1 + . . . + a(Na)z

Use the transfer function

b(z−1) 2 + 3z−1
H(z−1) = =
a(z−1) 1 + 0 . 2z−1

to modify the amplitude of the data in count.dat.

1-34
Filter Data

Load the data and assign the first column to the vector x.

load count.dat
x = count(:,1);

Create the filter coefficient vectors according to the transfer function H(z−1).

a = [1 0.2];
b = [2 3];

Compute the filtered data, and plot both the original data and the filtered data. This filter primarily
modifies the amplitude of the original data.

y = filter(b,a,x);

t = 1:length(x);
plot(t,x,'--',t,y,'-')
legend('Original Data','Filtered Data')

References
[1] Oppenheim, Alan V., Ronald W. Schafer, and John R. Buck. Discrete-Time Signal Processing. Upper
Saddle River, NJ: Prentice-Hall, 1999.

1-35
1 Data Processing

See Also
filter | conv | filter2 | smoothdata | movmean

Related Examples
• “Smooth Data with Convolution” on page 1-37

1-36
Smooth Data with Convolution

Smooth Data with Convolution

You can use convolution to smooth 2-D data that contains high-frequency components.

Create 2-D data using the peaks function, and plot the data at various contour levels.

Z = peaks(100);
levels = -7:1:10;
contour(Z,levels)

Inject random noise into the data and plot the noisy contours.

Znoise = Z + rand(100) - 0.5;


contour(Znoise,levels)

1-37
1 Data Processing

The conv2 function in MATLAB® convolves 2-D data with a specified kernel whose elements define
how to remove or enhance features of the original data. Kernels do not have to be the same size as
the input data. Small-sized kernels can be sufficient to smooth data containing only a few frequency
components. Larger sized kernels can provide more precision for tuning frequency response,
resulting in smoother output.

Define a 3-by-3 kernel K and use conv2 to smooth the noisy data in Znoise. Plot the smoothed
contours. The 'same' option in conv2 makes the output the same size as the input.

K = (1/9)*ones(3);
Zsmooth1 = conv2(Znoise,K,'same');
contour(Zsmooth1, levels)

1-38
Smooth Data with Convolution

Smooth the noisy data with a 5-by-5 kernel, and plot the new contours.

K = (1/25)*ones(5);
Zsmooth2 = conv2(Znoise,K,'same');
contour(Zsmooth2,levels)

1-39
1 Data Processing

See Also
conv2 | conv | filter | smoothdata

Related Examples
• “Filter Data” on page 1-33

1-40
Computing with Descriptive Statistics

Computing with Descriptive Statistics


In this section...
“Functions for Calculating Descriptive Statistics” on page 1-41
“Example: Using MATLAB Data Statistics” on page 1-42
“Data Statistics” on page 1-43

If you need more advanced statistics features, you might want to use the Statistics and Machine
Learning Toolbox™ software.

Functions for Calculating Descriptive Statistics


Use the following MATLAB functions to calculate the descriptive statistics for your data.

Note For matrix data, descriptive statistics for each column are calculated independently.

Statistics Function Summary

Function Description
max Maximum value
mean Average or mean value
median Median value
min Smallest value
mode Most frequent value
std Standard deviation
var Variance, which measures the spread or dispersion of the values

The following examples apply MATLAB functions to calculate descriptive statistics:

• “Example 1 — Calculating Maximum, Mean, and Standard Deviation” on page 1-41


• “Example 2 — Subtracting the Mean” on page 1-42

Example 1 — Calculating Maximum, Mean, and Standard Deviation

This example shows how to use MATLAB functions to calculate the maximum, mean, and standard
deviation values for a 24-by-3 matrix called count. MATLAB computes these statistics independently
for each column in the matrix.
% Load the sample data
load count.dat
% Find the maximum value in each column
mx = max(count)
% Calculate the mean of each column
mu = mean(count)
% Calculate the standard deviation of each column
sigma = std(count)

The results are

1-41
1 Data Processing

mx =
114 145 257

mu =
32.0000 46.5417 65.5833

sigma =
25.3703 41.4057 68.0281

To get the row numbers where the maximum data values occur in each data column, specify a second
output parameter indx to return the row index. For example:
[mx,indx] = max(count)

These results are


mx =
114 145 257

indx =
20 20 20

Here, the variable mx is a row vector that contains the maximum value in each of the three data
columns. The variable indx contains the row indices in each column that correspond to the maximum
values.

To find the minimum value in the entire count matrix, 24-by-3 matrix into a 72-by-1 column vector by
using the syntax count(:). Then, to find the minimum value in the single column, use the following
syntax:
min(count(:))

ans =
7

Example 2 — Subtracting the Mean

Subtract the mean from each column of the matrix by using the following syntax:
% Get the size of the count matrix
[n,p] = size(count)
% Compute the mean of each column
mu = mean(count)
% Create a matrix of mean values by
% replicating the mu vector for n rows
MeanMat = repmat(mu,n,1)
% Subtract the column mean from each element
% in that column
x = count - MeanMat

Note Subtracting the mean from the data is also called detrending. For more information about
removing the mean or the best-fit line from the data, see “Remove Linear Trends from Timetable
Data” on page 1-5.

Example: Using MATLAB Data Statistics

1-42
Computing with Descriptive Statistics

Data Statistics
The Data Statistics dialog box helps you calculate and plot descriptive statistics with the data. This
example shows how to use MATLAB Data Statistics to calculate and plot statistics for a 24-by-3
matrix, called count. The data represents how many vehicles passed by traffic counting stations on
three streets.

This section contains the following topics:

• “Calculating and Plotting Descriptive Statistics” on page 1-43


• “Formatting Data Statistics on Plots” on page 1-46
• “Saving Statistics to the MATLAB Workspace” on page 1-47
• “Generating Code Files” on page 1-48

Note MATLAB Data Statistics is available for 2-D plots only.

Calculating and Plotting Descriptive Statistics

1 Load and plot the data:

load count.dat
[n,p] = size(count);

% Define the x-values


t = 1:n;

% Plot the data and annotate the graph


plot(t,count)
legend('Station 1','Station 2','Station 3','Location','northwest')
xlabel('Time')
ylabel('Vehicle Count')

1-43
1 Data Processing

Note The legend contains the name of each data set, as specified by the legend function:
Station 1, Station 2, and Station 3. A data set refers to each column of data in the array
you plotted. If you do not name the data sets, default names are assigned: data1, data2, and so
on.
2 In the Figure window, select Tools > Data Statistics.

The Data Statistics dialog box opens and displays descriptive statistics for the X- and Y-data of
the Station 1 data set.

Note The Data Statistics dialog box displays a range, which is the difference between the
minimum and maximum values in the selected data set. The dialog box does not display the
range on the plot.
3 Select a different data set in the Data Statistics for list: Station 2.

This displays the statistics for the X and Y data of the Station 2 data set.
4 Select the check box for each statistic you want to display on the plot, and then click Save to
Workspace.

1-44
Computing with Descriptive Statistics

For example, to plot the mean of Station 2, select the mean check box in the Y column.

This plots a horizontal line to represent the mean of Station 2 and updates the legend to
include this statistic.

1-45
1 Data Processing

Formatting Data Statistics on Plots

The Data Statistics dialog box uses colors and line styles to distinguish statistics from the data on the
plot. This portion of the example shows how to customize the display of descriptive statistics on a
plot, such as the color, line width, line style, or marker.

Note Do not edit display properties of statistics until you finish plotting all the statistics with the
data. If you add or remove statistics after editing plot properties, the changes to plot properties are
lost.

To modify the display of data statistics on a plot:

1
In the MATLAB Figure window, click the (Edit Plot) button in the toolbar.

This step enables plot editing.


2 Double-click the statistic on the plot for which you want to edit display properties. For example,
double-click the horizontal line representing the mean of Station 2.

1-46
Computing with Descriptive Statistics

This step opens the Property Inspector, where you can modify the appearance of the line used to
represent this statistic.

3 In the Property Inspector window, specify the line and marker styles, sizes, and colors.

Tip Alternatively, right-click the statistic on the plot, and select an option from the shortcut
menu.

Saving Statistics to the MATLAB Workspace

Perform these steps to save the statistics to the MATLAB workspace.

Note When your plot contains multiple data sets, save statistics for each data set individually. To
display statistics for a different data set, select it from the Data Statistics for list in the Data
Statistics dialog box.

1 In the Data Statistics dialog box, click the Save to Workspace button.
2 In the Save Statistics to Workspace dialog box, select options to save statistics for either X data,
Y data, or both. Then, enter the corresponding variable names.

1-47
1 Data Processing

In this example, save only the Y data. Enter the variable name as Loc2countstats.

3 Click OK.

This step saves the descriptive statistics to a structure. The new variable is added to the MATLAB
workspace.

To view the new structure variable, type the variable name at the MATLAB prompt:

Loc2countstats

Loc2countstats =

struct with fields:

min: 9
max: 145
mean: 46.5417
median: 36
mode: 9
std: 41.4057
range: 136

Generating Code Files

This portion of the example shows how to generate a file containing MATLAB code that reproduces
the format of the plot and the plotted statistics with new data. Generating a code file is not available
in MATLAB Online™.

1 In the Figure window, select File > Generate Code.

This step creates a function code file and displays it in the MATLAB Editor.
2 Change the name of the function on the first line of the file from createfigure to something
more specific, like countplot. Save the file to your current folder with the file name
countplot.m.
3 Generate some new, random count data:

rng('default')
randcount = 300*rand(24,3);
4 Reproduce the plot with the new data and the recomputed statistics:

countplot(t,randcount)

1-48
Computing with Descriptive Statistics

1-49
2

Regression Analysis

• “Linear Correlation” on page 2-2


• “Linear Regression” on page 2-5
• “Interactive Fitting” on page 2-13
• “Programmatic Fitting” on page 2-26
2 Regression Analysis

Linear Correlation
In this section...
“Introduction” on page 2-2
“Covariance” on page 2-2
“Correlation Coefficients” on page 2-3

Introduction
Correlation quantifies the strength of a linear relationship between two variables. When there is no
correlation between two variables, then there is no tendency for the values of the variables to
increase or decrease in tandem. Two variables that are uncorrelated are not necessarily independent,
however, because they might have a nonlinear relationship.

You can use linear correlation to investigate whether a linear relationship exists between variables
without having to assume or fit a specific model to your data. Two variables that have a small or no
linear correlation might have a strong nonlinear relationship. However, calculating linear correlation
before fitting a model is a useful way to identify variables that have a simple relationship. Another
way to explore how variables are related is to make scatter plots of your data.

Covariance quantifies the strength of a linear relationship between two variables in units relative to
their variances. Correlations are standardized covariances, giving a dimensionless quantity that
measures the degree of a linear relationship, separate from the scale of either variable.

The following MATLAB functions compute sample correlation coefficients and covariance. These
sample coefficients are estimates of the true covariance and correlation coefficients of the population
from which the data sample is drawn.

Function Description
corrcoef Correlation coefficient matrix
cov Covariance matrix
xcorr Cross-correlation sequence of a random process (includes autocorrelation)

Covariance
Use the MATLAB cov function to calculate the sample covariance matrix for a data matrix (where
each column represents a separate quantity).

The sample covariance matrix has the following properties:

• cov(X) is symmetric.
• diag(cov(X)) is a vector of variances for each data column. The variances represent a measure
of the spread or dispersion of data in the corresponding column. (The var function calculates
variance.)
• sqrt(diag(cov(X))) is a vector of standard deviations. (The std function calculates standard
deviation.)
• The off-diagonal elements of the covariance matrix represent the covariances between the
individual data columns.

2-2
Linear Correlation

Here, X can be a vector or a matrix. For an m-by-n matrix, the covariance matrix is n-by-n.

For an example of calculating the covariance, load the sample data in count.dat that contains a 24-
by-3 matrix:

load count.dat

Calculate the covariance matrix for this data:

cov(count)

MATLAB responds with the following result:

ans =
1.0e+003 *
0.6437 0.9802 1.6567
0.9802 1.7144 2.6908
1.6567 2.6908 4.6278

The covariance matrix for this data has the following form:

s211 s212 s213


s221 s222 s223
s231 s232 s233
s2i j = s2 ji

Here, s2ij is the sample covariance between column i and column j of the data. Because the count
matrix contains three columns, the covariance matrix is 3-by-3.

Note In the special case when a vector is the argument of cov, the function returns the variance.

Correlation Coefficients
The function corrcoef produces a matrix of sample correlation coefficients for a data matrix (where
each column represents a separate quantity). The correlation coefficients range from -1 to 1, where

• Values close to 1 indicate that there is a positive linear relationship between the data columns.
• Values close to -1 indicate that one column of data has a negative linear relationship to another
column of data (anticorrelation).
• Values close to or equal to 0 suggest there is no linear relationship between the data columns.

For an m-by-n matrix, the correlation-coefficient matrix is n-by-n. The arrangement of the elements in
the correlation coefficient matrix corresponds to the location of the elements in the covariance
matrix, as described in “Covariance” on page 2-2.

For an example of calculating correlation coefficients, load the sample data in count.dat that
contains a 24-by-3 matrix:

load count.dat

Type the following syntax to calculate the correlation coefficients:

2-3
2 Regression Analysis

corrcoef(count)

This results in the following 3-by-3 matrix of correlation coefficients:

ans =
1.0000 0.9331 0.9599
0.9331 1.0000 0.9553
0.9599 0.9553 1.0000

Because all correlation coefficients are close to 1, there is a strong positive correlation between each
pair of data columns in the count matrix.

2-4
Linear Regression

Linear Regression
In this section...
“Introduction” on page 2-5
“Simple Linear Regression” on page 2-5
“Residuals and Goodness of Fit” on page 2-9
“Fitting Data with Curve Fitting Toolbox Functions” on page 2-11

Introduction
A data model explicitly describes a relationship between predictor and response variables. Linear
regression fits a data model that is linear in the model coefficients. The most common type of linear
regression is a least-squares fit, which can fit both lines and polynomials, among other linear models.

Before you model the relationship between pairs of quantities, it is a good idea to perform correlation
analysis to establish if a linear relationship exists between these quantities. Be aware that variables
can have nonlinear relationships, which correlation analysis cannot detect. For more information, see
“Linear Correlation” on page 2-2.

The MATLAB Basic Fitting UI helps you to fit your data, so you can calculate model coefficients and
plot the model on top of the data. For an example, see “Example: Using Basic Fitting UI” on page 2-
14. You also can use the MATLAB polyfit and polyval functions to fit your data to a model that is
linear in the coefficients. For an example, see “Programmatic Fitting” on page 2-28.

If you need to fit data with a nonlinear model, transform the variables to make the relationship linear.
Alternatively, try to fit a nonlinear function directly using either the Statistics and Machine Learning
Toolbox nlinfit function, the Optimization Toolbox™ lsqcurvefit function, or by applying
functions in the Curve Fitting Toolbox™.

This topic explains how to:

• Perform simple linear regression using the \ operator.


• Use correlation analysis to determine whether two quantities are related to justify fitting the data.
• Fit a linear model to the data.
• Evaluate the goodness of fit by plotting residuals and looking for patterns.
• Calculate measures of goodness of fit R2 and adjusted R2

Simple Linear Regression

This example shows how to perform simple linear regression using the accidents dataset. The
example also shows you how to calculate the coefficient of determination R2 to evaluate the
regressions. The accidents dataset contains data for fatal traffic accidents in U.S. states.

Linear regression models the relation between a dependent, or response, variable y and one or more
independent, or predictor, variables x1, . . . , xn. Simple linear regression considers only one
independent variable using the relation

y = β0 + β1x + ϵ,

2-5
2 Regression Analysis

where β0 is the y-intercept, β1 is the slope (or regression coefficient), and ϵ is the error term.

Start with a set of n observed values of x and y given by (x1, y1), (x2, y2), ..., (xn, yn). Using the simple
linear regression relation, these values form a system of linear equations. Represent these equations
in matrix form as

y1 1 x1
y2 1 x2 β0
= .
⋮ ⋮ ⋮ β1
yn 1 xn

Let

y1 1 x1
y2 1 x2 β0
Y= ,X = ,B = .
⋮ ⋮ ⋮ β1
yn 1 xn

The relation is now Y = XB.

In MATLAB, you can find B using the mldivide operator as B = X\Y.

From the dataset accidents, load accident data in y and state population data in x. Find the linear
regression relation y = β1x between the accidents in a state and the population of a state using the \
operator. The \ operator performs a least-squares regression.

load accidents
x = hwydata(:,14); %Population of states
y = hwydata(:,4); %Accidents per state
format long
b1 = x\y

b1 =
1.372716735564871e-04

b1 is the slope or regression coefficient. The linear relation is y = β1x = 0 . 0001372x.

Calculate the accidents per state yCalc from x using the relation. Visualize the regression by
plotting the actual values y and the calculated values yCalc.

yCalc1 = b1*x;
scatter(x,y)
hold on
plot(x,yCalc1)
xlabel('Population of state')
ylabel('Fatal traffic accidents per state')
title('Linear Regression Relation Between Accidents & Population')
grid on

2-6
Linear Regression

Improve the fit by including a y-intercept β0 in your model as y = β0 + β1x. Calculate β0 by padding x
with a column of ones and using the \ operator.

X = [ones(length(x),1) x];
b = X\y

b = 2×1
102 ×

1.427120171726537
0.000001256394274

This result represents the relation y = β0 + β1x = 142 . 7120 + 0 . 0001256x.

Visualize the relation by plotting it on the same figure.

yCalc2 = X*b;
plot(x,yCalc2,'--')
legend('Data','Slope','Slope & Intercept','Location','best');

2-7
2 Regression Analysis

From the figure, the two fits look similar. One method to find the better fit is to calculate the
coefficient of determination, R2. R2 is one measure of how well a model can predict the data, and falls
between 0 and 1. The higher the value of R2, the better the model is at predicting the data.

Where y represents the calculated values of y and y‾ is the mean of y, R2 is defined as

2
∑in= 1 yi − yi
R2 = 1 − 2
.
∑in= 1 yi − y‾

Find the better fit of the two fits by comparing values of R2. As the R2 values show, the second fit that
includes a y-intercept is better.

Rsq1 = 1 - sum((y - yCalc1).^2)/sum((y - mean(y)).^2)

Rsq1 =
0.822235650485566

Rsq2 = 1 - sum((y - yCalc2).^2)/sum((y - mean(y)).^2)

Rsq2 =
0.838210531103428

2-8
Linear Regression

Residuals and Goodness of Fit


Residuals are the difference between the observed values of the response (dependent) variable and
the values that a model predicts. When you fit a model that is appropriate for your data, the residuals
approximate independent random errors. That is, the distribution of residuals ought not to exhibit a
discernible pattern.

Producing a fit using a linear model requires minimizing the sum of the squares of the residuals. This
minimization yields what is called a least-squares fit. You can gain insight into the “goodness” of a fit
by visually examining a plot of the residuals. If the residual plot has a pattern (that is, residual data
points do not appear to have a random scatter), the randomness indicates that the model does not
properly fit the data.

Evaluate each fit you make in the context of your data. For example, if your goal of fitting the data is
to extract coefficients that have physical meaning, then it is important that your model reflect the
physics of the data. Understanding what your data represents, how it was measured, and how it is
modeled is important when evaluating the goodness of fit.

One measure of goodness of fit is the coefficient of determination, or R2 (pronounced r-square). This
statistic indicates how closely values you obtain from fitting a model match the dependent variable
the model is intended to predict. Statisticians often define R2 using the residual variance from a fitted
model:

R2 = 1 – SSresid / SStotal

SSresid is the sum of the squared residuals from the regression. SStotal is the sum of the squared
differences from the mean of the dependent variable (total sum of squares). Both are positive scalars.

To learn how to compute R2 when you use the Basic Fitting tool, see “R2, the Coefficient of
Determination” on page 2-19. To learn more about calculating the R2 statistic and its multivariate
generalization, continue reading here.

Example: Computing R2 from Polynomial Fits

You can derive R2 from the coefficients of a polynomial regression to determine how much variance in
y a linear model explains, as the following example describes:

1 Create two variables, x and y, from the first two columns of the count variable in the data file
count.dat:
load count.dat
x = count(:,1);
y = count(:,2);
2 Use polyfit to compute a linear regression that predicts y from x:
p = polyfit(x,y,1)

p =
1.5229 -2.1911

p(1) is the slope and p(2) is the intercept of the linear predictor. You can also obtain regression
coefficients using the Basic Fitting UI on page 2-13.
3 Call polyval to use p to predict y, calling the result yfit:
yfit = polyval(p,x);

2-9
2 Regression Analysis

Using polyval saves you from typing the fit equation yourself, which in this case looks like:

yfit = p(1) * x + p(2);


4 Compute the residual values as a vector of signed numbers:

yresid = y - yfit;
5 Square the residuals and total them to obtain the residual sum of squares:

SSresid = sum(yresid.^2);
6 Compute the total sum of squares of y by multiplying the variance of y by the number of
observations minus 1:

SStotal = (length(y)-1) * var(y);


7 Compute R2 using the formula given in the introduction of this topic:

rsq = 1 - SSresid/SStotal

rsq =
0.8707

This demonstrates that the linear equation 1.5229 * x -2.1911 predicts 87% of the variance
in the variable y.

Computing Adjusted R2 for Polynomial Regressions

You can usually reduce the residuals in a model by fitting a higher degree polynomial. When you add
more terms, you increase the coefficient of determination, R2. You get a closer fit to the data, but at
the expense of a more complex model, for which R2 cannot account. However, a refinement of this
statistic, adjusted R2, does include a penalty for the number of terms in a model. Adjusted R2,
therefore, is more appropriate for comparing how different models fit to the same data. The adjusted
R2 is defined as:

R2adjusted = 1 - (SSresid / SStotal)*((n-1)/(n-d-1))

where n is the number of observations in your data, and d is the degree of the polynomial. (A linear fit
has a degree of 1, a quadratic fit 2, a cubic fit 3, and so on.)

The following example repeats the steps of the previous example, “Example: Computing R2 from
Polynomial Fits” on page 2-9, but performs a cubic (degree 3) fit instead of a linear (degree 1) fit.
From the cubic fit, you compute both simple and adjusted R2 values to evaluate whether the extra
terms improve predictive power:

1 Create two variables, x and y, from the first two columns of the count variable in the data file
count.dat:

load count.dat
x = count(:,1);
y = count(:,2);
2 Call polyfit to generate a cubic fit to predict y from x:

p = polyfit(x,y,3)

p =
-0.0003 0.0390 0.2233 6.2779

2-10
Linear Regression

p(4) is the intercept of the cubic predictor. You can also obtain regression coefficients using the
Basic Fitting UI on page 2-13.
3 Call polyval to use the coefficients in p to predict y, naming the result yfit:

yfit = polyval(p,x);

polyval evaluates the explicit equation you could manually enter as:

yfit = p(1) * x.^3 + p(2) * x.^2 + p(3) * x + p(4);


4 Compute the residual values as a vector of signed numbers:

yresid = y - yfit;
5 Square the residuals and total them to obtain the residual sum of squares:

SSresid = sum(yresid.^2);
6 Compute the total sum of squares of y by multiplying the variance of y by the number of
observations minus 1:

SStotal = (length(y)-1) * var(y);


7 Compute simple R2 for the cubic fit using the formula given in the introduction of this topic:

rsq = 1 - SSresid/SStotal

rsq =
0.9083
8 Finally, compute adjusted R2 to account for degrees of freedom:
rsq_adj = 1 - SSresid/SStotal * (length(y)-1)/(length(y)-length(p))

rsq_adj =
0.8945

The adjusted R2, 0.8945, is smaller than simple R2, .9083. It provides a more reliable estimate of
the power of your polynomial model to predict.

In many polynomial regression models, adding terms to the equation increases both R2 and adjusted
R2. In the preceding example, using a cubic fit increased both statistics compared to a linear fit. (You
can compute adjusted R2 for the linear fit for yourself to demonstrate that it has a lower value.)
However, it is not always true that a linear fit is worse than a higher-order fit: a more complicated fit
can have a lower adjusted R2 than a simpler fit, indicating that the increased complexity is not
justified. Also, while R2 always varies between 0 and 1 for the polynomial regression models that the
Basic Fitting tool generates, adjusted R2 for some models can be negative, indicating that a model
that has too many terms.

Correlation does not imply causality. Always interpret coefficients of correlation and determination
cautiously. The coefficients only quantify how much variance in a dependent variable a fitted model
removes. Such measures do not describe how appropriate your model—or the independent variables
you select—are for explaining the behavior of the variable the model predicts.

Fitting Data with Curve Fitting Toolbox Functions


The Curve Fitting Toolbox software extends core MATLAB functionality by enabling the following
data-fitting capabilities:

2-11
2 Regression Analysis

• Linear and nonlinear parametric fitting, including standard linear least squares, nonlinear least
squares, weighted least squares, constrained least squares, and robust fitting procedures
• Nonparametric fitting
• Statistics for determining the goodness of fit
• Extrapolation, differentiation, and integration
• Dialog box that facilitates data sectioning and smoothing
• Saving fit results in various formats, including MATLAB code files, MAT-files, and workspace
variables

For more information, see the Curve Fitting Toolbox documentation.

2-12
Interactive Fitting

Interactive Fitting
In this section...
“Basic Fitting UI” on page 2-13
“Preparing for Basic Fitting” on page 2-13
“Opening the Basic Fitting UI” on page 2-13
“Example: Using Basic Fitting UI” on page 2-14

Basic Fitting UI
The MATLAB Basic Fitting UI allows you to interactively:

• Model data using a spline interpolant, a shape-preserving interpolant, or a polynomial up to the


tenth degree
• Plot one or more fits together with data
• Plot the residuals of the fits
• Compute model coefficients
• Compute the norm of the residuals (a statistic you can use to analyze how well a model fits your
data)
• Use the model to interpolate or extrapolate outside of the data
• Save coefficients and computed values to the MATLAB workspace for use outside of the dialog box
• Generate MATLAB code to recompute fits and reproduce plots with new data

Note The Basic Fitting UI is only available for 2-D plots. For more advanced fitting and regression
analysis, see the Curve Fitting Toolbox documentation and the Statistics and Machine Learning
Toolbox documentation.

Preparing for Basic Fitting


The Basic Fitting UI sorts your data in ascending order before fitting. If your data set is large and the
values are not sorted in ascending order, it will take longer for the Basic Fitting UI to preprocess your
data before fitting.

You can speed up the Basic Fitting UI by first sorting your data. To create sorted vectors x_sorted
and y_sorted from data vectors x and y, use the MATLAB sort function:
[x_sorted, i] = sort(x);
y_sorted = y(i);

Opening the Basic Fitting UI

To use the Basic Fitting UI, you must first plot your data in a figure window, using any MATLAB
plotting command that produces (only) x and y data.

To open the Basic Fitting UI, select Tools > Basic Fitting from the menus at the top of the figure
window.

2-13
2 Regression Analysis

Example: Using Basic Fitting UI

This example shows how to use the Basic Fitting UI to fit, visualize, analyze, save, and generate code
for polynomial regressions.

Load and Plot Census Data

The file census.mat contains U.S. population data for the years 1790 through 1990 at 10 year
intervals.

To load and plot the data, type the following commands at the MATLAB prompt:

load census
plot(cdate,pop,'ro')

The load command adds the following variables to the MATLAB workspace:

• cdate — A column vector containing the years from 1790 to 1990 in increments of 10. It is the
predictor variable.
• pop — A column vector with U.S. population for each year in cdate. It is the response variable.

The data vectors are sorted in ascending order, by year. The plot shows the population as a function
of year.

Now you are ready to fit an equation the data to model population growth over time.

Predict the Census Data with a Cubic Polynomial Fit

1 Open the Basic Fitting dialog box by selecting Tools > Basic Fitting in the Figure window.
2 In the TYPES OF FIT area of the Basic Fitting dialog box, select the Cubic check box to fit a
cubic polynomial to the data.

2-14
Interactive Fitting

MATLAB uses your selection to fit the data, and adds the cubic regression line to the graph as
follows.

2-15
2 Regression Analysis

In computing the fit, MATLAB encounters problems and issues the following warning:

This warning indicates that the computed coefficients for the model are sensitive to random
errors in the response (the measured population). It also suggests some things you can do to get
a better fit.
3 Continue to use a cubic fit. As you cannot add new observations to the census data, improve the
fit by transforming the values you have to z-scores before recomputing a fit. Select the Center
and scale x-axis data check box in the top right of the dialog box to make the Basic Fitting tool
perform the transformation.

To learn how centering and scaling data works, see “Learn How the Basic Fitting Tool Computes
Fits” on page 2-23.

2-16
Interactive Fitting

4 Under ERROR ESTIMATION (RESIDUALS), select the Norm of residuals check box. Select
Bar as the Plot Style.

Selecting these options creates a subplot of residuals as a bar graph.

2-17
2 Regression Analysis

The cubic fit is a poor predictor before the year 1790, where it indicates a decreasing population. The
model seems to approximate the data reasonably well after 1790. However, a pattern in the residuals
shows that the model does not meet the assumption of normal error, which is a basis for the least-
squares fitting. The data 1 line identified in the legend are the observed x (cdate) and y (pop) data
values. The Cubic regression line presents the fit after centering and scaling data values. Notice that
the figure shows the original data units, even though the tool computes the fit using transformed z-
scores.

For comparison, try fitting another polynomial equation to the census data by selecting it in the
TYPES OF FIT area.

View and Save the Cubic Fit Parameters

In the Basic Fitting dialog box, click the Expand Results button to display the estimated
coefficients and the norm of residuals.

2-18
Interactive Fitting

Save the fit data to the MATLAB workspace by clicking the Export to Workspace button on the
Numerical results panel. The Save Fit to Workspace dialog box opens.

With all check boxes selected, click OK to save the fit parameters as a MATLAB structure fit:

fit

fit =

struct with fields:

type: 'polynomial degree 3'


coeff: [0.9210 25.1834 73.8598 61.7444]

Now, you can use the fit results in MATLAB programming, outside of the Basic Fitting UI.

R2, the Coefficient of Determination

You can get an indication of how well a polynomial regression predicts your observed data by
computing the coefficient of determination, or R-square (written as R2). The R2 statistic, which ranges
from 0 to 1, measures how useful the independent variable is in predicting values of the dependent
variable:

• An R2 value near 0 indicates that the fit is not much better than the model y = constant.
• An R2 value near 1 indicates that the independent variable explains most of the variability in the
dependent variable.

R2 is computed from the residuals, the signed differences between an observed dependent value and
the value your fit predicts for it.

residuals = yobserved - yfitted (2-1)

The R2 number for the cubic fit in this example, 0.9988, is located under FIT RESULTS in the Basic
Fitting dialog.

2-19
2 Regression Analysis

To compare the R2 number for the cubic fit to a linear least-squares fit, select Linear under TYPES
OF FIT and obtain the R2 number, 0.921. This result indicates that a linear least-squares fit of the
population data explains 92.1% of its variance. As the cubic fit of this data explains 99.9% of that
variance, the latter seems to be a better predictor. However, because a cubic fit predicts using three
variables (x, x2, and x3), a basic R2 value does not fully reflect how robust the fit is. A more
appropriate measure for evaluating the goodness of multivariate fits is adjusted R2. For information
about computing and using adjusted R2, see “Residuals and Goodness of Fit” on page 2-9.

Interpolate and Extrapolate Population Values

Suppose you want to use the cubic model to interpolate the U.S. population in 1965 (a date not
provided in the original data).

In the Basic Fitting dialog box, under INTERPOLATE / EXTRAPOLATE DATA, enter the X value
1965 and check the Plot evaluated data box.

Note Use unscaled and uncentered X values. You do not need to center and scale first, even though
you selected to scale X values to obtain the coefficients in “Predict the Census Data with a Cubic
Polynomial Fit” on page 2-14. The Basic Fitting tool makes the necessary adjustments behind the
scenes.

2-20
Interactive Fitting

The X values and the corresponding values for f(X) are computed from the fit and plotted as follows:

2-21
2 Regression Analysis

Generate a Code File to Reproduce the Result

After completing a Basic Fitting session, you can generate MATLAB code that recomputes fits and
reproduces plots with new data.

1 In the Figure window, select File > Generate Code.

This creates a function and displays it in the MATLAB Editor. The code shows you how to
programmatically reproduce what you did interactively with the Basic Fitting dialog box.
2 Change the name of the function on the first line from createfigure to something more
specific, like censusplot. Save the code file to your current folder with the file name
censusplot.m The function begins with:

function censusplot(X1, Y1, valuesToEvaluate1)


3 Generate some new, randomly perturbed census data:

rng('default')
randpop = pop + 10*randn(size(pop));
4 Reproduce the plot with the new data and recompute the fit:

censusplot(cdate,randpop,1965)

2-22
Interactive Fitting

You need three input arguments: x,y values (data 1) plotted in the original graph, plus an x-
value for a marker.

The following figure displays the plot that the generated code produces. The new plot matches
the appearance of the figure from which you generated code except for the y data values, the
equation for the cubic fit, and the residual values in the bar graph, as expected.

Learn How the Basic Fitting Tool Computes Fits

The Basic Fitting tool calls the polyfit function to compute polynomial fits. It calls the polyval
function to evaluate the fits. polyfit analyzes its inputs to determine if the data is well conditioned
for the requested degree of fit.

When it finds badly conditioned data, polyfit computes a regression as well as it can, but it also
returns a warning that the fit could be improved. The Basic Fitting example section “Predict the
Census Data with a Cubic Polynomial Fit” on page 2-14 displays this warning.

One way to improve model reliability is to add data points. However, adding observations to a data set
is not always feasible. An alternative strategy is to transform the predictor variable to normalize its
center and scale. (In the example, the predictor is the vector of census dates.)

The polyfit function normalizes by computing z-scores:

2-23
2 Regression Analysis

x−μ
z=
σ

where x is the predictor data, μ is the mean of x, and σ is the standard deviation of x. The z-scores
give the data a mean of 0 and a standard deviation of 1. In the Basic Fitting UI, you transform the
predictor data to z-scores by selecting the Center and scale x-axis data check box.

After centering and scaling, model coefficients are computed for the y data as a function of z. These
are different (and more robust) than the coefficients computed for y as a function of x. The form of
the model and the norm of the residuals do not change. The Basic Fitting UI automatically rescales
the z-scores so that the fit plots on the same scale as the original x data.

To understand the way in which the centered and scaled data is used as an intermediary to create the
final plot, run the following code in the Command Window:

close
load census
x = cdate;
y = pop;
z = (x-mean(x))/std(x); % Compute z-scores of x data

plot(x,y,'ro') % Plot data as red markers


hold on % Prepare axes to accept new graph on top

zfit = linspace(z(1),z(end),100);
pz = polyfit(z,y,3); % Compute conditioned fit
yfit = polyval(pz,zfit);

xfit = linspace(x(1),x(end),100);
plot(xfit,yfit,'b-') % Plot conditioned fit vs. x data

The centered and scaled cubic polynomial plots as a blue line, as shown here:

2-24
Interactive Fitting

In the code, computation of z illustrates how to normalize data. The polyfit function performs the
transformation itself if you provide three return arguments when calling it:

[p,S,mu] = polyfit(x,y,n)

The returned regression parameters, p, now are based on normalized x. The returned vector, mu,
contains the mean and standard deviation of x. For more information, see the polyfit reference
page.

2-25
2 Regression Analysis

Programmatic Fitting
In this section...
“MATLAB Functions for Polynomial Models” on page 2-26
“Linear Model with Nonpolynomial Terms” on page 2-26
“Multiple Regression” on page 2-27
“Programmatic Fitting” on page 2-28

MATLAB Functions for Polynomial Models


Two MATLAB functions can model your data with a polynomial.

Polynomial Fit Functions

Function Description
polyfit polyfit(x,y,n) finds the coefficients of a polynomial p(x) of degree n
that fits the y data by minimizing the sum of the squares of the deviations
of the data from the model (least-squares fit).
polyval polyval(p,x) returns the value of a polynomial of degree n that was
determined by polyfit, evaluated at x.

If you are trying to model a physical situation, it is always important to consider whether a model of a
specific order is meaningful in your situation.

Linear Model with Nonpolynomial Terms

This example shows how to fit data with a linear model containing nonpolynomial terms.

When a polynomial function does not produce a satisfactory model of your data, you can try using a
linear model with nonpolynomial terms. For example, consider the following function that is linear in
the parameters a0, a1, and a2, but nonlinear in the t data:

y = a0 + a1e−t + a2te−t .

You can compute the unknown coefficients a0, a1, and a2 by constructing and solving a set of
simultaneous equations and solving for the parameters. The following syntax accomplishes this by
forming a design matrix, where each column represents a variable used to predict the response (a
term in the model) and each row corresponds to one observation of those variables.

Enter t and y as column vectors.

t = [0 0.3 0.8 1.1 1.6 2.3]';


y = [0.6 0.67 1.01 1.35 1.47 1.25]';

Form the design matrix.

X = [ones(size(t)) exp(-t) t.*exp(-t)];

Calculate model coefficients.

2-26
Programmatic Fitting

a = X\y

a = 3×1

1.3983
-0.8860
0.3085

Therefore, the model of the data is given by

y = 1 . 3983 − 0 . 8860e−t + 0 . 3085te−t .

Now evaluate the model at regularly spaced points and plot the model with the original data.

T = (0:0.1:2.5)';
Y = [ones(size(T)) exp(-T) T.*exp(-T)]*a;
plot(T,Y,'-',t,y,'o'), grid on
title('Plot of Model and Original Data')

Multiple Regression

This example shows how to use multiple regression to model data that is a function of more than one
predictor variable.

2-27
2 Regression Analysis

When y is a function of more than one predictor variable, the matrix equations that express the
relationships among the variables must be expanded to accommodate the additional data. This is
called multiple regression.

Measure a quantity y for several values of x1 and x2. Store these values in vectors x1, x2, and y,
respectively.

x1 = [.2 .5 .6 .8 1.0 1.1]';


x2 = [.1 .3 .4 .9 1.1 1.4]';
y = [.17 .26 .28 .23 .27 .24]';

A model of this data is of the form

y = a0 + a1x1 + a2x2 .

Multiple regression solves for unknown coefficients a0, a1, and a2 by minimizing the sum of the
squares of the deviations of the data from the model (least-squares fit).

Construct and solve the set of simultaneous equations by forming a design matrix, X.

X = [ones(size(x1)) x1 x2];

Solve for the parameters by using the backslash operator.

a = X\y

a = 3×1

0.1018
0.4844
-0.2847

The least-squares fit model of the data is

y = 0 . 1018 + 0 . 4844x1 − 0 . 2847x2 .

To validate the model, find the maximum of the absolute value of the deviation of the data from the
model.

Y = X*a;
MaxErr = max(abs(Y - y))

MaxErr = 0.0038

This value is much smaller than any of the data values, indicating that this model accurately follows
the data.

Programmatic Fitting

This example shows how to use MATLAB functions to:

• “Calculate Correlation Coefficients” on page 2-29

2-28
Programmatic Fitting

• “Fit a Polynomial to the Data” on page 2-30


• “Plot and Calculate Confidence Bounds” on page 2-32

Load sample census data from census.mat, which contains U.S. population data from the years 1790
to 1990.
load census

This adds the following two variables to the MATLAB workspace.

• cdate is a column vector containing the years 1790 to 1990 in increments of 10.
• pop is a column vector with the U.S. population numbers corresponding to each year in cdate.

Plot the data.


plot(cdate,pop,'ro')
title('U.S. Population from 1790 to 1990')

The plot shows a strong pattern, which indicates a high correlation between the variables.

Calculate Correlation Coefficients

In this portion of the example, you determine the statistical correlation between the variables cdate
and pop to justify modeling the data. For more information about correlation coefficients, see “Linear
Correlation” on page 2-2.

Calculate the correlation-coefficient matrix.

2-29
2 Regression Analysis

corrcoef(cdate,pop)

ans = 2×2

1.0000 0.9597
0.9597 1.0000

The diagonal matrix elements represent the perfect correlation of each variable with itself and are
equal to 1. The off-diagonal elements are very close to 1, indicating that there is a strong statistical
correlation between the variables cdate and pop.

Fit a Polynomial to the Data

This portion of the example applies the polyfit and polyval MATLAB functions to model the data.

Calculate fit parameters.

[p,ErrorEst] = polyfit(cdate,pop,2);

Evaluate the fit.

pop_fit = polyval(p,cdate,ErrorEst);

Plot the data and the fit.

plot(cdate,pop_fit,'-',cdate,pop,'+');
title('U.S. Population from 1790 to 1990')
legend('Polynomial Model','Data','Location','NorthWest');
xlabel('Census Year');
ylabel('Population (millions)');

2-30
Programmatic Fitting

The plot shows that the quadratic-polynomial fit provides a good approximation to the data.

Calculate the residuals for this fit.

res = pop - pop_fit;


figure, plot(cdate,res,'+')
title('Residuals for the Quadratic Polynomial Model')

2-31
2 Regression Analysis

Notice that the plot of the residuals exhibits a pattern, which indicates that a second-degree
polynomial might not be appropriate for modeling this data.

Plot and Calculate Confidence Bounds

Confidence bounds are confidence intervals for a predicted response. The width of the interval
indicates the degree of certainty of the fit.

This portion of the example applies polyfit and polyval to the census sample data to produce
confidence bounds for a second-order polynomial model.

The following code uses an interval of ±2Δ, which corresponds to a 95% confidence interval for large
samples.

Evaluate the fit and the prediction error estimate (delta).


[pop_fit,delta] = polyval(p,cdate,ErrorEst);

Plot the data, the fit, and the confidence bounds.


plot(cdate,pop,'+',...
cdate,pop_fit,'g-',...
cdate,pop_fit+2*delta,'r:',...
cdate,pop_fit-2*delta,'r:');
xlabel('Census Year');
ylabel('Population (millions)');
title('Quadratic Polynomial Fit with Confidence Bounds')
grid on

2-32
Programmatic Fitting

The 95% interval indicates that you have a 95% chance that a new observation will fall within the
bounds.

2-33
3

Time Series Analysis


3 Time Series Analysis

Time Series Objects and Collections

The recommended way to store time series data is with timetable, which has a broad set of
supporting functions for preprocessing, restructuring, and analysis. To get started with timetables,
see “Create Timetables”.

Some existing code uses these objects, described in this topic:

• timeseries — Stores numeric data and time values, as well as the metadata information that
includes units, events, data quality, and interpolation method.
• tscollection — Stores a collection of timeseries objects that share a common time vector,
convenient for performing operations on synchronized time series with different units.

Data Samples in timeseries

Consider data that consists of three sensor signals: two signals represent the position of an object in
meters, and the third represents its velocity in meters/second. NaN represents a missing data value.

x = [-0.2 -0.3 13;


-0.1 -0.4 15;
NaN 2.8 17;
0.5 0.3 NaN;
-0.3 -0.1 15];

The first two columns of x contain quantities with the same units, and you can create a multivariate
timeseries object to store these two time series.

ts_pos = timeseries(x(:,1:2),1:5,"name","Position")

timeseries

Common Properties:
Name: 'Position'
Time: [5x1 double]
TimeInfo: tsdata.timemetadata
Data: [5x2 double]
DataInfo: tsdata.datametadata

A data sample consists of one or more values associated with a specific time in the timeseries
object. The number of data samples in a time series is the same as the length of the time vector,
which is 5 in this example. To find the size of the data sample, use getdatasamplesize.

getdatasamplesize(ts_pos)

ans = 1×2

1 2

You can create a second timeseries object to store the velocity data.

ts_vel = timeseries(x(:,3),1:5,"name","Velocity");

If you want to perform operations on ts_pos and ts_vel while keeping them synchronized, group
them in a collection. For more information, see Time Series Collections on page 3-4.

3-2
Time Series Objects and Collections

Creating Time Series Objects

The sample data in count.dat has 24 rows and three columns. Each column represents hourly
vehicle counts at each of three town intersections.

Load the data and create three timeseries objects to store the data collected at each intersection.

load count.dat
count1 = timeseries(count(:,1),1:24,"name","Intersection1");
count2 = timeseries(count(:,2),1:24,"name","Intersection2");
count3 = timeseries(count(:,3),1:24,"name","Intersection3");

Alternatively, when all time series have the same data units and you want to keep them synchronized
during calculations, create a single object.

count_ts = timeseries(count,1:24,"name","traffic_counts");

Modifying Units and Interpolation Method

By default, a time series has a time vector with units of seconds and a start time of 0 sec, and the
series uses linear interpolation.

Modify the time units to be hours for the three time series.

count1.TimeInfo.Units = "hours";
count2.TimeInfo.Units = "hours";
count3.TimeInfo.Units = "hours";

Change the data units for count1 to cars.

count1.DataInfo.Units = "cars";

Set the interpolation method for count1 to zero-order hold. The other time series use the default
method, linear interpolation.

count1.DataInfo.Interpolation = tsdata.interpolation("zoh");

View the modified data properties.

count1.DataInfo

tsdata.datametadata
Package: tsdata

Common Properties:
Units: 'cars'
Interpolation: zoh (tsdata.interpolation)

Defining Events

Events mark the data at specific times. Events also provide a convenient way to synchronize multiple
time series.

Add two events to each series that mark the times of the AM commute and PM commute.

e1 = tsdata.event("AMCommute",8);
e1.Units = "hours";
count1 = addevent(count1,e1);

3-3
3 Time Series Analysis

count2 = addevent(count2,e1);
count3 = addevent(count3,e1);

e2 = tsdata.event("PMCommute",18);
e2.Units = "hours";
count1 = addevent(count1,e2);
count2 = addevent(count2,e2);
count3 = addevent(count3,e2);

Plot the first time series. Red circle markers indicate the events.
plot(count1)

Time Series Collections

A collection is a group of synchronized time series. The time vectors of the timeseries objects in a
collection must match. Each individual time series in a collection is called a member. Typically, you
use collections for time series that have different data units. In this simple example, all time series
have the same units.
tsc = tscollection({count1,count2,count3},"name","count_coll")

Time Series Collection Object: count_coll

Time vector characteristics

Start time 1 hours


End time 24 hours

3-4
Time Series Objects and Collections

Member Time Series Objects:

Intersection1
Intersection2
Intersection3

Resampling a Collection

A resampling operation is used to either select existing data at specific time values, or to interpolate
data at finer intervals. If the new time vector contains time values that did not exist in the previous
time vector, the new data values are calculated using the interpolation method associated with each
time series.

Resample the time series to include data values every two hours instead of every hour and save it as a
new tscollection object.

tsc1 = resample(tsc,1:2:24);

In some cases, you might need a finer sampling of information than you currently have and it is
reasonable to obtain it by interpolating data values. For instance, interpolate values at each half-hour
mark.

tsc1 = resample(tsc,1:0.5:24);

The new data points in Intersection1 are calculated by using the zero-order hold interpolation
method, which holds the value of the previous sample constant. Plot the members of tsc1 with
markers to see the results of interpolating.

plot(tsc1.Intersection1,"-x")

3-5
3 Time Series Analysis

The new data points in Intersection2 use linear interpolation, the default method.

plot(tsc1.Intersection2,"-o")

3-6
Time Series Objects and Collections

Adding a Data Sample to a Collection

Add a data sample to the first collection member at 3.25 hours.

tsc1 = addsampletocollection(tsc1,"time",3.25,"Intersection1",5);

The time series includes values for every half hour, so the new value is the sixth element.

tsc1.Intersection1.Data

ans = 48×1

11
11
7
7
14
5
14
11
11
43

Because you did not specify the data values for Intersection2 and Intersection3 in the new
sample, the missing values are represented by NaN for these members.

3-7
3 Time Series Analysis

tsc1.Intersection2.Data

ans = 48×1

11
12
13
15
17
NaN
15
13
32
51

Handling Missing Data

The Intersection2 and Intersection3 members in the tsc1 collection currently contain missing
values at 3.25 hours, represented by NaN. Before analyzing this data, you can either remove the
missing values or use interpolation to replace them.

For instance, find and remove the data samples that contain NaN values. For each missing value in
Intersection2, the data at that time is removed from all members of the collection.
tsc2 = delsamplefromcollection(tsc1,"index",...
find(isnan(tsc1.Intersection2.Data)));

Alternatively, replace the NaN values in Intersection2 and Intersection3 by resampling using
interpolation. The default interpolation method for these time series is linear interpolation.
tsc1 = resample(tsc1,tsc1.Time);
tsc1.Intersection2.Data

ans = 48×1

11
12
13
15
17
16
15
13
32
51

Plotting Collection Members

To plot data in a time series collection, plot its members one at a time.

Optionally, to display the time vectors as formatted dates and times, specify the start date. In this
case, the time units are hours, so you can specify a display format that shows hours and minutes.
tsc1.TimeInfo.StartDate = "25-DEC-2022 00:00:00";
tsc1.TimeInfo.Format = "HH:MM";

3-8
Time Series Objects and Collections

When you plot a single member of a collection, its time units display on the x-axis and its data units
display on the y-axis. The plot title is displayed as Time Series Plot:<member name>.

If you specify hold on before plotting multiple members of the collection, no annotations display.
The time series plot method does not attempt to show labels and titles on held figures because the
descriptors for the series can be different. To preserve date formatting, hold after plotting the first
member. Update the title and labels to reflect the entire collection.

plot(tsc1.Intersection1,"Displayname","Intersection 1")
hold on
plot(tsc1.Intersection2,"Displayname","Intersection 2")

legend("Location","NorthWest")
title("Intersections 1 and 2")
xlabel("Time (hours)")
ylabel("Number of cars")
hold off

See Also
timeseries | tscollection | tsdata.event

3-9
4

Manage Experiments

• “Keyboard Shortcuts for Experiment Manager” on page 4-2


• “Run Experiments in Parallel” on page 4-4
• “Offload Experiments as Batch Jobs to a Cluster” on page 4-6
• “Debug General-Purpose Experiments” on page 4-9
• “Experiment with Predator-Prey Equations” on page 4-12
• “Compare Air Resistance Models for Projectile Motion” on page 4-20
4 Manage Experiments

Keyboard Shortcuts for Experiment Manager


Use these keyboard shortcuts when working with Experiment Manager.

Note When you use the keyboard shortcuts on macOS systems:

• Press the Command (⌘) key instead of the Ctrl key.


• Press the Option key instead of the Alt key.

Shortcuts for General Navigation


Action Shortcut
Show the access keys for the toolstrip Alt+E
Move forward through the different areas of the Ctrl+F6
Experiment Manager app, including the toolstrip,
the Experiment Browser pane, and the
experiment definition and results tabs
Move backward through the different areas of the Ctrl+Shift+F6
Experiment Manager app, including the toolstrip,
the Experiment Browser pane, and the
experiment definition and results tabs
Move forward through the different elements in Tab
the start page, the toolstrip, or the experiment
definition and results tabs
Move backward through the different elements in Shift+Tab
the start page, the toolstrip, or the experiment
definition and results tabs
Select the current option in the start page or a Enter
menu
Cancel the current action, for example, hide the Esc
access keys for the toolstrip or close a menu

Shortcuts for Experiment Browser


Use Ctrl+F6 or Ctrl+Shift+F6 to navigate to the Experiment Browser. Then press Tab twice to
focus the currently selected project, experiments, or results. The Experiment Browser supports the
selection of multiple experiments and results.

Action Shortcut
Show the experiments in the project or the Right arrow
results for an experiment
If the contents of the project or experiment are
already visible, pressing the right arrow selects
the first experiment in the project or the first
result for the experiment.

4-2
Keyboard Shortcuts for Experiment Manager

Action Shortcut
Hide the experiments in the project or the results Left arrow
for an experiment
If the contents of an experiment are already
hidden, pressing the left arrow selects the project
that contains the experiment.
Move down and select the next item in the Down arrow
browser
Move up and select the previous item in the Up arrow
browser
Select multiple items in the browser Navigate to the next item in the browser that you
want to select by pressing Ctrl+up arrow or Ctrl
+down arrow. Then press the space bar to select.
Select a range of contiguous items in the browser Shift+down arrow or Shift+up arrow
Select all experiments Ctrl+A
Rename an experiment or result F2 or Enter
Delete selected experiments or results Delete

• Your selection must contain only experiments


or only results.
• If you delete an experiment, Experiment
Manager also deletes the results contained in
the experiment.

Shortcuts for Results Table


Use Ctrl+F6 or Ctrl+Shift+F6 to navigate to the experiment results tab. Then press Tab to navigate
to the results table.

Action Shortcut
Move to the next trial in the table Down arrow
Move to the previous trial in the table Up arrow
Move to the next column in the table Right arrow
Move to the previous column in the table Left arrow

See Also
Apps
Experiment Manager

More About
• “Run Experiments in Parallel” on page 4-4
• “Offload Experiments as Batch Jobs to a Cluster” on page 4-6
• “Debug General-Purpose Experiments” on page 4-9

4-3
4 Manage Experiments

Run Experiments in Parallel


By default, Experiment Manager runs one trial of your experiment at a time on a single CPU. If you
have Parallel Computing Toolbox™, you can configure your experiment to run multiple trials at the
same time or to run a single trial at a time on multiple GPUs, on a cluster, or in the cloud.

Tip For more information on running deep learning experiments in parallel, see “Use Experiment
Manager to Train Networks in Parallel” (Deep Learning Toolbox).

Run Multiple Simultaneous Trials


To run multiple trials of your experiment at the same time using one parallel worker for each trial:

1 Set up your parallel environment as described in “Set Up Parallel Environment” on page 4-5.
2 On the Experiment Manager toolstrip, set Mode to Simultaneous.

Alternatively, to offload the experiment as a batch job, set Mode to Batch Simultaneous and
specify your cluster and pool size. For more information, see “Offload Experiments as Batch Jobs
to a Cluster” on page 4-6.
3
Click Run .

Experiment Manager runs as many simultaneous trials as there are workers in your parallel pool. All
other trials in your experiment are queued for later evaluation.

Tip Load data for your experiment from a location that is accessible to all your parallel workers. For
example, store your data outside the project and access the data by using an absolute path.
Alternatively, create a datastore object that can access the data on another machine by setting up the
AlternateFileSystemRoots property of the datastore. For more information, see “Set Up
Datastore for Processing on Different Machines or Clusters”.

Run Single Trial on Multiple Workers


To run a single trial of your experiment at a time on multiple parallel workers:

1 In your experiment function, set up your parallel environment as described in “Set Up Parallel
Environment” on page 4-5. Then, use spmd, parfor, or parfeval to define the parallel
algorithm for your experiment. For more information, see “Choose Between spmd, parfor, and
parfeval” (Parallel Computing Toolbox).
2 On the Experiment Manager toolstrip, set Mode to Sequential.

Alternatively, to offload the experiment as a batch job, set Mode to Batch Sequential and
specify your cluster and pool size. For more information, see “Offload Experiments as Batch Jobs
to a Cluster” on page 4-6.
3
Click Run .

4-4
Run Experiments in Parallel

Set Up Parallel Environment


Run on Multiple GPUs

If you have multiple GPUs, parallel execution typically increases the speed of your experiment. Using
a GPU requires Parallel Computing Toolbox and a supported GPU device. For more information, see
“GPU Computing Requirements” (Parallel Computing Toolbox). To determine whether a usable GPU is
available, call the canUseGPU function.

For best results, before you run your experiment, create a parallel pool with as many workers as
GPUs. Otherwise, multiple workers share the same GPU, so you do not get the desired computational
speed-up and you increase the chance that the GPUs run out of memory. You can check the number of
available GPUs by using the gpuDeviceCount function.

numGPUs = gpuDeviceCount("available");
parpool(numGPUs)

Run on Cluster or in Cloud

If your experiments take a long time to run on your local machine, you can improve performance by
using a computer cluster on your onsite network or by renting high-performance GPUs in the cloud.
After you complete the initial setup, you can run your experiments with minimal changes to your
code. Working on a cluster or in the cloud requires MATLAB Parallel Server™. For more information,
see “Scale Up from Desktop to Cluster” (Parallel Computing Toolbox).

See Also
Apps
Experiment Manager

Functions
spmd | parfor | parfeval | gpuDeviceCount | parpool

Related Examples
• “Choose Between spmd, parfor, and parfeval” (Parallel Computing Toolbox)
• “Use Experiment Manager to Train Networks in Parallel” (Deep Learning Toolbox)
• “Offload Experiments as Batch Jobs to a Cluster” on page 4-6

4-5
4 Manage Experiments

Offload Experiments as Batch Jobs to a Cluster


By default, Experiment Manager runs your experiments interactively, so you can monitor the progress
of each trial by inspecting the results table. However, running an experiment interactively limits your
access to MATLAB functionality. For example, when an experiment is running, you cannot close the
project that contains the experiment or run other experiments.

If you have Parallel Computing Toolbox and MATLAB Parallel Server, you can send your experiment
as a batch job to a remote cluster. While the experiment is running in the cluster, you can:

• Run another experiment interactively or start another batch job using the same experiment, a
different experiment in the same project, or an experiment in a different project.
• Close the Experiment Manager app and continue using MATLAB.
• Close your MATLAB session.

If you only have Parallel Computing Toolbox, you can use a local cluster profile to develop and test
your experiments on your client machine instead of running them on a network cluster. If you close
your MATLAB session, any batch jobs using the local cluster profile also stop immediately.

Create Batch Job on Cluster


To start a batch job for your experiment:

1 Configure your experiment as described in “Configure General-Purpose Experiment”.

Tip Load data for your experiment from a location that is accessible to all your parallel workers.
For example, store your data outside the project and access the data by using an absolute path.
Alternatively, create a datastore object that can access the data on another machine by setting up
the AlternateFileSystemRoots property of the datastore. For more information, see “Set Up
Datastore for Processing on Different Machines or Clusters”.
2 In the Experiment Manager toolstrip, under Execution, use the Mode list to specify an
execution mode:

• To run one trial of the experiment at a time, select Batch Sequential.


• To run multiple trials at the same time, select Batch Simultaneous.
3 Use the Cluster list to select a cluster profile to use for your batch job. To create and manage
cluster profiles, open the Cluster Profile Manager. For more information, see “Discover Clusters
and Use Cluster Profiles” (Parallel Computing Toolbox).
4 In the Pool Size field, enter the number of workers for your batch job.

• If Mode is Batch Sequential, use this field to configure the number of parallel workers
that collaborate on each trial of the experiment. If you set the pool size to 0, the experiment
runs on a single worker.
• If Mode is Batch Simultaneous, use this field to specify the number of trials that the
cluster runs at the same time.

Because Experiment Manager uses an additional worker to run the batch job, the cluster must
have at least one more worker available than the number you specify in the Pool Size field. For
example, if you specify a pool size of 2, the cluster must have at least three workers available:

4-6
Offload Experiments as Batch Jobs to a Cluster

two workers for the experiment and an additional worker to run the batch job. For more
information, see “Run a Batch Job with a Parallel Pool” (Parallel Computing Toolbox).
5
Click Run . Experiment Manager uses the batch function to run the experiment in the
specified cluster.

While the batch job runs your experiment, you can close Experiment Manager and recover the results
later.

Track Progress of Batch Job


When you run a batch job for an experiment, Experiment Manager does not continually communicate
with the cluster to update the values in the results table and save the visualizations for your
experiment. Instead, you retrieve this information from the cluster by clicking the Refresh button
above the results table.

To monitor batch jobs without opening the Experiment Manager app, use the Job Monitor. The Job
Monitor tells you whether your batch job is queued, running, or finished.

Note Using the Job Monitor to cancel or delete jobs that you create with Experiment Manager can
lead to unexpected behavior. Instead, cancel and delete these batch jobs by using Experiment
Manager.

4-7
4 Manage Experiments

Cancel Batch Job

To cancel a batch job running an experiment, in the Experiment Manager toolstrip, click Cancel .
Experiment Manager marks any running and queued trials as Canceled and discards their results.

Batch execution does not support stopping, canceling, or restarting individual trials of an experiment.

Delete Batch Job


To avoid consuming resources unnecessarily, delete the job from the cluster by clicking the Clean up
button above the results table.

Note Before you delete the batch job for a deep learning experiment, first download your training
results from the cluster, as described in “Download Training Results” (Deep Learning Toolbox).
Deleting the batch job permanently discards any results that you have not downloaded from the
cluster.

See Also
Apps
Experiment Manager | Job Monitor

Functions
batch

Related Examples
• “Run Batch Parallel Jobs” (Parallel Computing Toolbox)
• “Discover Clusters and Use Cluster Profiles” (Parallel Computing Toolbox)
• “Use Parallel Computing Toolbox with Cloud Center Cluster in MATLAB Online” (Parallel
Computing Toolbox)

4-8
Debug General-Purpose Experiments

Debug General-Purpose Experiments


In Experiment Manager, you create a MATLAB function to specify the procedure that your experiment
uses. You can diagnose problems in your experiment function by stepping through your code line-by-
line and examining the values of your variables.

Tip For more information on how to diagnose problems with your deep learning experiments, see
“Debug Deep Learning Experiments” (Deep Learning Toolbox).

Start Debugging Session


You can debug your code before or after you run the experiment.

To debug your code before you run the experiment:

1 Open the experiment.


2
In the Experiment Manager toolstrip, select Run > Debug .
3 In dialog box, specify the parameter values for your experiment.
4 Click Start.

To debug your code after you run the experiment:

1 Open the results for the experiment.


2 In the results table, select a trial to debug. To ensure reproducibility, Experiment Manager reuses
the parameter values and the random seed saved for this trial.
3
Right-click the trial and select Debug .

Experiment Manager opens the experiment function in MATLAB Editor, places a breakpoint in the
first line of code, and runs the function.

4-9
4 Manage Experiments

Debug Experiment Function


When you debug your experiment function, MATLAB pauses at each line of code that has a
breakpoint. Then you can add other breakpoints to your function, view the values of your variables,
step through the code line-by-line, or continue to the next breakpoint. For more information, see
“Debug MATLAB Code Files”.

Verify Your Results


After your function runs to completion, verify your results by examining the parameters and output
values stored in the workspace variables

• functionName_params — A structure with fields from the Experiment Manager parameter table
• functionName_output — A cell array that contains the output values returned by the
experiment function

where functionName is the name of the experiment function.

See Also
Apps
Experiment Manager

4-10
Debug General-Purpose Experiments

More About
• “Set Breakpoints”
• “Debug MATLAB Code Files”

4-11
4 Manage Experiments

Experiment with Predator-Prey Equations

This example shows how to use Experiment Manager to explore how different coefficient values and
initial values affect the solution of a system of differential equations.

The Lotka-Volterra predator-prey model is a system of first-order ordinary differential equations that
describes the relationship between two competing populations. For example, these equations
describe the populations of rabbits and foxes with respect to time:

In these equations, is the number of rabbits and is the number of foxes. The coefficients and
describe the predation of rabbits by foxes. The population of rabbits increases when and
decreases when . The population of foxes increases when and decreases when .
For more information, see [1].

In this experiment, you observe the effect on the predicted populations of rabbits and foxes when you
use different values for the coefficients and and the initial values of and .

Open Experiment

First, open the example. Experiment Manager loads a project with a preconfigured experiment that
you can inspect and run. To open the experiment, in the Experiment Browser pane, double-click
LotkaVolterraExperiment.

4-12
Experiment with Predator-Prey Equations

The experiment consists of a description, a table of parameters, and an experiment function. For
more information, see “Configure General-Purpose Experiment”.

The Description field contains a textual description of the experiment. For this example, the
description is:

Find maximum and minimum population values in Lotka-Volterra predator-prey model:

Rabbits' = (1 - Foxes/Alpha) * Rabbits


Foxes' = -(1 - Rabbits/Beta) * Foxes

The Parameters section specifies the parameter values to use for the experiment. Experiment
Manager runs multiple trials of your experiment using a different combination of parameters for each
trial. This example uses the parameters Alpha and Beta to specify the values of the coefficients of
the Lotka-Volterra equations and the parameters RabbitsInitial and FoxesInitial to specify
the initial population of rabbits and foxes.

The Experiment Function section specifies the function LotkaVolterraFunction, which defines
the procedure for the experiment. To open this function in MATLAB® Editor, click Edit. The code for
the function also appears in Experiment Function. The input to the experiment function is a structure
called params with fields from the parameter table. The function uses dot notation to extract the

4-13
4 Manage Experiments

parameter values from this structure and to set up the initial conditions and differential equations for
the predator-prey problem.

yInitial = [params.RabbitsInitial params.FoxesInitial]';


tRange = [0 20];

signs = [1 -1]';
mu = [params.Alpha params.Beta]';
dydt = @(t,y) signs.*(1-flipud(y)./mu).*y;

Next, the experiment function calls ode45 to solve the differential equations and to compute the
maximum and minimum number of predicted rabbits and foxes.

[t,y] = ode45(dydt,tRange,yInitial);

RabbitsMin = min(y(:,1));
RabbitsMax = max(y(:,1));
FoxesMin = min(y(:,2));
FoxesMax = max(y(:,2));

Finally, the experiment function plots the populations of rabbits and foxes against time and against
each other. When you run the experiment, Experiment Manager adds two buttons to the Review
Results gallery in the toolstrip so you can display these figures. The Name property of each figure
specifies the name of the corresponding button in the toolstrip.

figure(Name="Population Size");
plot(t,y)
title("Population v. Time")
xlabel("Time")
ylabel("Population")
legend("Rabbits","Foxes")

figure(Name="Phase Plane");
hold on
plot(y(:,1),y(:,2))
plot([mu(2),mu(2)],[FoxesMin,FoxesMax],":r");
plot([RabbitsMin,RabbitsMax],[mu(1),mu(1)],":r");
title("Foxes v. Rabbits")
xlabel("Rabbits")
ylabel("Foxes")
hold off

Run Experiment

On the Experiment Manager toolstrip, click Run. Experiment Manager runs the experiment function
nine times, each time using a different combination of parameter values. A table of results displays
the output values for each trial.

4-14
Experiment with Predator-Prey Equations

Evaluate Results

To investigate how changing a parameter value affects the predicted populations of rabbits and foxes,
you can sort the results table using one of the output values. For example, to find the trial with the
largest rabbit population:

1 Point to the header of the RabbitsMax column.


2 Click the triangle icon.
3 Select Sort in Descending Order.

You can also select a trial in the results table and inspect the visualizations created by the experiment
function. To see the predicted populations plotted as functions of time, on the Experiment Manager
toolstrip, under Review Results, click Population Size. This plot shows the two populations
oscillating, with the maximum and minimum values in the rabbit population preceding the maximum
and minimum values in the fox population.

4-15
4 Manage Experiments

To see the populations plotted against each other, click Phase Plane. The phase plane plot shows the
cyclic relationship between the populations.

4-16
Experiment with Predator-Prey Equations

To perform additional computations on your results, you can export the results table to the MATLAB
workspace as a nested table array. For example, to compare the peak-to-peak amplitudes of the
populations over all the trials in your experiment:

1 On the Experiment Manager toolstrip, click Export.


2 In the dialog window, enter the name of a workspace variable for the exported table. The default
name is resultsTable.
3 In the MATLAB Command Window, use the exported table as the input to the function
populationAmplitudes:

populationAmplitudes(resultsTable)

To view the code for this function, see Compute Population Amplitudes. The function displays a
summary of the maximum, minimum, and mean peak-to-peak population amplitudes for rabbits and
foxes over all the trials in your experiment.

******************************************

Population of rabbits:
Maximum amplitude: 575.5 (Trial 9)
Minimum amplitude: 284.1 (Trial 7)
Mean amplitude: 416.9

Population of foxes:
Maximum amplitude: 473.9 (Trial 3)
Minimum amplitude: 121.2 (Trial 7)
Mean amplitude: 293.6

******************************************

To record observations about the results of your experiment, add an annotation:

1 In the results table, right-click the RabbitsMax cell for trial 9.


2 Select Add Annotation.
3 In the Annotations pane, enter your observations in the text box.

Close Experiment

In the Experiment Browser pane, right-click PredatorPreyProject and select Close Project.
Experiment Manager closes the experiment and results contained in the project.

4-17
4 Manage Experiments

Experiment Function

This function solves the Lotka-Volterra equations, plots the predicted populations of rabbits and foxes
over time and against each other, and returns the maximum and minimum number of rabbits and
foxes. The input argument params is a structure with fields from the parameter table.

function [RabbitsMin,RabbitsMax,FoxesMin,FoxesMax] = LotkaVolterraFunction(params)

yInitial = [params.RabbitsInitial params.FoxesInitial]';


tRange = [0 20];

signs = [1 -1]';
mu = [params.Alpha params.Beta]';
dydt = @(t,y) signs.*(1-flipud(y)./mu).*y;

[t,y] = ode45(dydt,tRange,yInitial);

RabbitsMin = min(y(:,1));
RabbitsMax = max(y(:,1));
FoxesMin = min(y(:,2));
FoxesMax = max(y(:,2));

figure(Name="Population Size");
plot(t,y)
title("Population v. Time")
xlabel("Time")
ylabel("Population")
legend("Rabbits","Foxes")

figure(Name="Phase Plane");
hold on
plot(y(:,1),y(:,2))
plot([mu(2),mu(2)],[FoxesMin,FoxesMax],":r");
plot([RabbitsMin,RabbitsMax],[mu(1),mu(1)],":r");
title("Foxes v. Rabbits")
xlabel("Rabbits")
ylabel("Foxes")
hold off

end

Compute Population Amplitudes

This function extracts the maximum and minimum number of rabbits and foxes for each trial from the
exported results table results. Then, the function computes the maximum, minimum, and mean
peak-to-peak population amplitudes over all trials.

function populationAmplitudes(results)

results = splitvars(results);
RabbitsAmplitude = results.RabbitsMax - results.RabbitsMin;
FoxesAmplitude = results.FoxesMax - results.FoxesMin;

[maxRabbitsAmplitude,maxRabbitsAmplitudeTrial] = max(RabbitsAmplitude);
[minRabbitsAmplitude,minRabbitsAmplitudgeTrial] = min(RabbitsAmplitude);

4-18
Experiment with Predator-Prey Equations

meanRabbitsAmplitude = mean(RabbitsAmplitude);

[maxFoxesAmplitude,maxFoxesAmplitudeTrial] = max(FoxesAmplitude);
[minFoxesAmplitude,minFoxesAmplitudeTrial] = min(FoxesAmplitude);
meanFoxesAmplitude = mean(FoxesAmplitude);

fprintf("\n******************************************\n\n");
fprintf("Population of rabbits:\n");
fprintf("Maximum amplitude: %.1f (Trial %d)\n", ...
maxRabbitsAmplitude,maxRabbitsAmplitudeTrial);
fprintf("Minimum amplitude: %.1f (Trial %d)\n", ...
minRabbitsAmplitude,minRabbitsAmplitudgeTrial);
fprintf("Mean amplitude: %.1f \n\n",meanRabbitsAmplitude);
fprintf("Population of foxes:\n");
fprintf("Maximum amplitude: %.1f (Trial %d)\n", ...
maxFoxesAmplitude,maxFoxesAmplitudeTrial);
fprintf("Minimum amplitude: %.1f (Trial %d)\n", ...
minFoxesAmplitude,minFoxesAmplitudeTrial);
fprintf("Mean amplitude: %.1f \n",meanFoxesAmplitude);
fprintf("\n******************************************\n\n");

end

References
[1] Moler, Cleve. "Predator-Prey Model." In Experiments with MATLAB. Natick, MA: The
MathWorks®, 2011. mathworks.com/moler/exm.html.

See Also
Apps
Experiment Manager

Functions
ode45 | table | splitvars

Related Examples
• “Solve Predator-Prey Equations”
• “Compare Air Resistance Models for Projectile Motion” on page 4-20

4-19
4 Manage Experiments

Compare Air Resistance Models for Projectile Motion

This example shows how to use Experiment Manager to compare the effects of air resistance on the
trajectory of a projectile assuming one of these models:

• No air resistance — The motion of the projectile depends only on gravity.


• Stokes drag — Air resistance is proportional to velocity.
• Newton drag — Air resistance is proportional to the square of velocity.

In this experiment, you observe the differences in trajectory shape, height, and range, and determine
the launch angle that produces the longest projectile range for each air resistance model.

Open Experiment

First, open the example. Experiment Manager loads a project with a preconfigured experiment that
you can inspect and run. To open the experiment, in the Experiment Browser pane, double-click
AirResistanceExperiment.

The experiment consists of a description, a table of parameters, and an experiment function. For
more information, see “Configure General-Purpose Experiment”.

The Description field contains a textual description of the experiment. For this example, the
description is:

4-20
Compare Air Resistance Models for Projectile Motion

Simulate projectile motion defined by angle Theta, coefficient of friction Mu, and one of these m
* None - no air resistance
* Stokes - air resistance is proportional to velocity
* Newton - air resistance is proportional to the square of velocity

The Parameters section specifies the parameter values to use for the experiment. Experiment
Manager runs multiple trials of your experiment using a different combination of parameters for each
trial. In this example, the parameters Model, Theta, and Mu specify the air resistance model, the
launch angle, and the coefficient of friction, respectively. Model is a string with the values "None",
"Stokes", and "Newton", Theta takes five values between 15 and 75 degrees, and Mu has a
constant value of 0.02.

The Experiment Function section specifies the function AirResistanceFunction, which defines
the procedure for the experiment. To open this function in MATLAB® Editor, click Edit. The code for
the function also appears in Experiment Function. The input to the experiment function is a structure
called params with fields from the parameter table. The function uses dot notation to extract the
parameter values from this structure and to set up the initial conditions and differential equations for
the projectile motion problem.

theta = params.Theta;
mu = params.Mu;
g = 9.81;
vInitial = 300;

tInitial = 0;
tFinal = 2*vInitial*sind(theta)/g + 1;

yInitial = [0; 0; vInitial*cosd(theta); vInitial*sind(theta)];

switch params.Model
case "None"
dydt = @(t,y) [y(3); y(4); 0; -g];
case "Stokes"
dydt = @(t,y) [y(3); y(4); -mu*y(3); -g-mu*y(4)];
case "Newton"
dydt = @(t,y) [y(3); y(4); -mu*y(3)*sqrt(y(3)^2+y(4)^2); ...
-g-mu*y(4)*sqrt(y(3)^2+y(4)^2)];
otherwise
error("Invalid air resistance model")
end

Next, the experiment function calls ode45 to solve the differential equations and to compute the
maximum height and range reached by the projectile, as well as the time it takes the projectile to
reach these points in the trajectory. The event functions endOfAscent and endOfDescent stop the
integration when the projectile reaches the highest point in the trajectory and when it returns to a
height of zero.

options = odeset('Events',@endOfAscent);
[~,yout,te,ye,~] = ode45(dydt,[tInitial tFinal],yInitial,options);
tMaxHeight = te;
maxHeight = ye(2);

tInitial = te;
yInitial = ye';

4-21
4 Manage Experiments

options = odeset('Events',@endOfDescent);
[~,y,te,ye,~] = ode45(dydt,[tInitial tFinal],yInitial,options);
yout = [yout; y(2:end,:)];
tMaxRange = te;
maxRange = ye(1);

Finally, the experiment function plots the predicted trajectory of the projectile and a parabolic path
with the same height and range. When you run the experiment, Experiment Manager adds a button to
the Review Results gallery in the toolstrip so you can display the figure. The Name property of the
figure specifies the name of the button in the toolstrip.

figure(Name="Projectile Trajectory")
hold on
plot(yout(:,1),yout(:,2),LineWidth=2)
X = maxRange*(0:0.05:1);
Y = 4*maxHeight*X.*(maxRange-X)/maxRange^2;
plot(X,Y,"-.")
title("Comparison of Trajectory and Parabolic Path")
xlabel("Horizontal Distance")
ylabel("Vertical Distance")
legend("Trajectory","Parabolic Path")
axis tight
hold off

Run Experiment

On the Experiment Manager toolstrip, click Run. Experiment Manager runs the experiment function
15 times, each time using a different combination of parameter values. A table of results displays the
output values for each trial.

Evaluate Results

The results table shows that the height and range of the predicted trajectory decrease as air
resistance increases. To visualize the effects of air resistance on the shape of the trajectory, select a
trial. Then, on the Experiment Manager toolstrip, under Review Results, click Projectile
Trajectory. When there is no air resistance, the trajectory of the projectile is a parabola.

4-22
Compare Air Resistance Models for Projectile Motion

The Stokes drag model offsets the trajectory slightly to the right of a parabolic path with the same
height and range.

In contrast, the trajectory under the Newton drag model is shifted significantly to the right of a
parabolic path.

To investigate how the launch angle affects the maximum range of the projectile under each model,
sort the results table by maximum range and by model:

1 Point to the header of the maxRange column.


2 Click the triangle icon.
3 Select Sort in Descending Order.
4 Repeat the previous steps for the Model column, but select Sort in Ascending Order.

4-23
4 Manage Experiments

When there is no air resistance, the maximum range occurs at a launch angle of 45 degrees. For the
Newton drag model, the maximum range occurs between 15 and 30 degrees, and for the Stokes drag
model, the maximum range occurs between 30 and 45 degrees.

Rerun Experiment

To identify the launch angle that maximizes the range of the projectile with greater precision, change
the parameter values and rerun the experiment:

1 Return to the experiment definition tab.


2 In the parameter table, change the value of the parameter Model to "Newton".
3 Change the value of the parameter Theta to 15:30.
4 Run the experiment using the new parameter values.

Experiment Manager runs the experiment function 16 times, each time using the Newton drag model
and a different launch angle between 15 and 30 degrees.

The results show that the maximum range for the Newton drag model occurs at approximately 24
degrees. A similar approach shows that the maximum range for the Stokes drag model occurs at
approximately 39 degrees.

4-24
Compare Air Resistance Models for Projectile Motion

Close Experiment

In the Experiment Browser pane, right-click ProjectileMotionProject and select Close Project.
Experiment Manager closes the experiment and results contained in the project.

Experiment Function

This function extracts the values in the parameter table and sets up the initial conditions and
differential equations for the projectile motion problem. The function calls ode45 to solve the
differential equations and to compute the maximum height and range reached by the projectile, as
well as the time it takes the projectile to reach these points in the trajectory. Then, the function plots
the trajectory of the projectile and a parabolic path with the same height and range.

function [maxHeight,maxRange,tMaxHeight,tMaxRange] = AirResistanceFunction(params)

theta = params.Theta;
mu = params.Mu;
g = 9.81;
vInitial = 300;

tInitial = 0;
tFinal = 2*vInitial*sind(theta)/g + 1;

yInitial = [0; 0; vInitial*cosd(theta); vInitial*sind(theta)];

switch params.Model
case "None"
dydt = @(t,y) [y(3); y(4); 0; -g];
case "Stokes"
dydt = @(t,y) [y(3); y(4); -mu*y(3); -g-mu*y(4)];
case "Newton"
dydt = @(t,y) [y(3); y(4); -mu*y(3)*sqrt(y(3)^2+y(4)^2); ...
-g-mu*y(4)*sqrt(y(3)^2+y(4)^2)];
otherwise
error("Invalid air resistance model")
end

options = odeset('Events',@endOfAscent);
[~,yout,te,ye,~] = ode45(dydt,[tInitial tFinal],yInitial,options);
tMaxHeight = te;
maxHeight = ye(2);

tInitial = te;
yInitial = ye';
options = odeset('Events',@endOfDescent);
[~,y,te,ye,~] = ode45(dydt,[tInitial tFinal],yInitial,options);
yout = [yout; y(2:end,:)];
tMaxRange = te;
maxRange = ye(1);

figure(Name="Projectile Trajectory")
hold on
plot(yout(:,1),yout(:,2),LineWidth=2)
X = maxRange*(0:0.05:1);
Y = 4*maxHeight*X.*(maxRange-X)/maxRange^2;
plot(X,Y,"-.")

4-25
4 Manage Experiments

title("Comparison of Trajectory and Parabolic Path")


xlabel("Horizontal Distance")
ylabel("Vertical Distance")
legend("Trajectory","Parabolic Path")
axis tight
hold off

end

Helper Functions

This event function stops the integration when the projectile reaches the highest point in the
trajectory.

function [value,isterminal,direction] = endOfAscent(~,y)


value = y(4);
isterminal = 1;
direction = 0;
end

This event function stops the integration when the projectile returns to a height of zero.

function [value,isterminal,direction] = endOfDescent(~,y)


value = y(2);
isterminal = 1;
direction = 0;
end

See Also
Apps
Experiment Manager

Functions
ode45 | odeset

Related Examples
• “Experiment with Predator-Prey Equations” on page 4-12
• “ODE Event Location”

4-26

You might also like