DSBA
Codebook
Preface
Data Science is the art and science of solving real-world problems and making data-driven decisions. It is an
amalgamation of three aspects, and a good data scientist has expertise in all three of them. These are:
1) Mathematical/Statistical understanding
2) Coding/Technology understanding
3) Domain knowledge
A current lack of expertise should not become an impediment in your Data Science journey. With consistent effort, you
can become fairly proficient in coding over a period of time. This Codebook is intended to help you become
comfortable with the finer nuances of Python and can be used as a handy reference for data science code throughout
the program journey and beyond.
In this document we have followed this structure:
- A brief description of the topic
- Followed by a code example
Please keep in mind that there is no single right way to write code to achieve an intended outcome. There can be multiple
ways of doing things in Python. The examples presented in this document use just one of the possible approaches to perform
the analysis. Please explore different ways of doing the same thing on your own.
Contents
PREFACE
TEXT MINING
Important Libraries
TIME SERIES FORECASTING
Text Mining
Text Analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of symbols, cannot
be fed directly to the algorithms themselves, as most of them expect numerical features with a fixed size rather than raw text
documents of variable length.
Source: scikit-learn
Most of the data in the real world is unstructured text data, and the process of mining or pre-processing this unstructured text
data to get useful insights is called Text Mining Analytics.
A few terminologies used in Text Mining:
1. Bag of Words: A simplified representation of text as the collection of words it contains, disregarding grammar and word order.
2. Corpus: A large collection of text documents.
3. Stop Words: Common words which are not useful for deriving meaningful insights, e.g. articles or prepositions.
4. Stemming: Reducing different variations of a word to its root word, e.g. "chopped" and "chopping" are reduced to "chop".
5. Term Document Matrix (TDM): A matrix which contains the number of occurrences of each term in each document.
6. Document Term Matrix (DTM): The transpose of the TDM.
7. Term Frequency (TF): The normalized count of a term occurring in each document.
8. Inverse Document Frequency (IDF): Put very simply, IDF penalizes terms that occur in almost every document, e.g. "a", "an", "the".
9. Lexicon: A list of words.
10. Bigrams: Pairs of adjacent words taken two at a time (see the sketch after this list).
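As a quick, minimal sketch of the Bigrams idea above (the sentence used here is purely illustrative and not from any dataset), nltk can generate word pairs directly from a list of tokens:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

sentence = "text mining turns unstructured text into useful insights"  # illustrative sentence
tokens = word_tokenize(sentence)
bigrams = list(nltk.bigrams(tokens))  # collection of words taken two at a time
print(bigrams)
[('text', 'mining'), ('mining', 'turns'), ('turns', 'unstructured'), ('unstructured', 'text'), ('text', 'into'), ('into', 'useful'), ('useful', 'insights')]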
To analyse the text data, we can start by removing the stop words and then stem the remaining words. Removing
punctuation is also usually a good idea.
from nltk.corpus import stopwords
# 'words' is the list of tokens extracted from your text data
filtered_words = [word for word in words if word not in stopwords.words('english')]
For stemming purposes, we can use various stemmers that are available in Python. The following is an example of the Porter Stemmer.
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
ps = PorterStemmer()
sentence = "Programers program with programing languages"
words = word_tokenize(sentence)
for w in words:
    print(w, " : ", ps.stem(w))
Programers : program
program : program
with : with
programing : program
languages : language
You will notice that for stemming we have tokenized the data. A tokenizer is simply a function that breaks a string into a list of
tokens (words). In the following code snippet, we remove the punctuation marks from a piece of text while tokenizing it.
import nltk
nltk.download('punkt')
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'\w+')
result = tokenizer.tokenize('hey! how are you ? buddy')
print(result)
['hey', 'how', 'are', 'you', 'buddy']
To convert a collection of text documents into vectors of token counts, we can use the CountVectorizer function built into sklearn. Do refer
to the sklearn documentation to understand more about this function.
Source: scikit-learn
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = [
    'This is the first document.',
    'This is the second document.',
    'And the third one.',
    'Is this the first document?',
]
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names())  # in sklearn 1.2 and later, use get_feature_names_out() instead
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
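As a small follow-up to the example above, the fitted vectorizer also gives us the Document Term Matrix itself; each row below corresponds to one document of the corpus and each column to one of the features listed above.
print(X.toarray())
[[0 1 1 1 0 0 1 0 1]
 [0 1 0 1 0 1 1 0 1]
 [1 0 0 0 1 0 1 1 0]
 [0 1 1 1 0 0 1 0 1]]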
Important Libraries
Now, let us try to understand the functionality of TF-IDF.
The product TF * IDF gives us a score for each word in a document which tells us how significant that word is for that document relative to the rest of the corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(max_features=2500, min_df=7, max_df=0.8, stop_words=stopwords.words('english'))
processed_features = vectorizer.fit_transform(processed_features).toarray()
Two things to keep in mind:
(i) The variable processed_features (the list of pre-processed text documents) must be created before running this code.
(ii) Each entry of processed_features is treated as one sample (document) by the vectorizer.
Do look up the documentation of the TfidfVectorizer function in sklearn to learn more about the parameters that can be passed to
this function.
Source: scikit-learn
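To make the idea concrete, here is a minimal, self-contained sketch of TfidfVectorizer on the toy corpus used earlier. The default parameters are used here because restrictive settings such as min_df=7 would filter out every term in such a tiny corpus; this is purely an illustrative assumption.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    'This is the first document.',
    'This is the second document.',
    'And the third one.',
    'Is this the first document?',
]
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)
print(X_tfidf.shape)         # (4, 9): 4 documents and 9 terms
print(X_tfidf.toarray()[0])  # TF-IDF scores of the 9 terms for the first document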
After building the TF-IDF matrix, you have successfully converted the unstructured text data into structured numeric data, and this
data can now be used for various Unsupervised or Supervised Learning problems.
Along with nltk (Natural Language Toolkit) and sklearn, we can also use Python's regular expression library (re) to clean and get
meaning out of our text data. Following are just a few examples of regular expressions:
processed_features = []
for sentence in range(0, len(features)):  # here the unstructured text data has been saved in the variable 'features'
    # Remove all the special characters
    processed_feature = re.sub(r'\W', ' ', str(features[sentence]))
    # Remove all single characters
    processed_feature = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_feature)
    # Remove single characters from the start
    processed_feature = re.sub(r'^[a-zA-Z]\s+', ' ', processed_feature)
    # Substituting multiple spaces with single space
    processed_feature = re.sub(r'\s+', ' ', processed_feature, flags=re.I)
    # Removing prefixed 'b'
    processed_feature = re.sub(r'^b\s+', '', processed_feature)
    # Converting to Lowercase
    processed_feature = processed_feature.lower()
    processed_features.append(processed_feature)
Before executing these code snippets, you have to import the regular expression library by running import re.
To plot a Word Cloud, refer to the following code snippet. The biggest words in the Word Cloud are the words that occur most
often. Do remember to remove the stop words and stem the words before building the word cloud, as that will give you a more
accurate picture of which words occur most frequently. Sometimes, word clouds are also plotted without stemming the words.
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

stop_words = set(stopwords.words('english'))  # initialise stopwords from the English language
filtered_sentence = []  # empty list
for i in processed_features:  # iterating through each sentence in processed_features
    word_tokens = word_tokenize(i)  # converting each sentence to tokens
    for w in word_tokens:  # removing English stopwords from each list of tokens
        if w not in stop_words:
            filtered_sentence.append(w)  # appending non-stopwords to the filtered_sentence list

comment_words = ' '  # empty string
stop_words = set(STOPWORDS)  # stopwords from the wordcloud library
for words in filtered_sentence:
    comment_words = comment_words + words + ' '  # converting the word list back to a single string

wordcloud = WordCloud(width = 1000, height = 1000,  # wordcloud image creation
                      background_color = 'white',
                      stopwords = stop_words,
                      min_font_size = 10).generate(comment_words)

# plot the WordCloud image
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()
The above code snippets show just some of the ways to preprocess text data. They are by no means the only ways to process
unstructured text data.
Time Series Forecasting
In this particular course, we deal with data which has time stamps associated with it. We are going to see various tests and
techniques for predicting future values based on past data.
First, let us see the syntax of Exponential Smoothing and understand how to code it in Python:
1. Simple Exponential Smoothing (SES)
from statsmodels.tsa import holtwinters as hw
build = hw.SimpleExpSmoothing('name of the time series').fit()  # for building the model
predict = build.forecast(steps='for how long do you want to predict using this model')  # for predicting using the model built
Note: You can also select the value of 'alpha' manually while invoking the '.fit()' function (its smoothing_level parameter).
2. Holt’s Exponential Smoothing
from statsmodels.tsa.holtwinters import Holt
build = Holt('name of the time series').fit()  # for building the model
predict = build.forecast(steps='for how long do you want to predict using this model')  # for predicting using the model built
Note: You can also select the values of 'alpha' and 'beta' manually while invoking the '.fit()' function (the smoothing_level and smoothing_trend parameters in recent statsmodels versions).
3. Holt-Winters Exponential Smoothing
from statsmodels.tsa.holtwinters import ExponentialSmoothing
build = ExponentialSmoothing('name of the time series', trend='additive', seasonal='additive').fit()
# for building the model. Here you have to specify the trend and seasonal components (additive or multiplicative)
# as appropriate for the data at hand. We have chosen additive here.
predict = build.forecast(steps='for how long do you want to predict using this model')  # for predicting using the model built
Note: You can also select the values of 'alpha', 'beta' and 'gamma' manually while invoking the '.fit()' function (the smoothing_level, smoothing_trend and smoothing_seasonal parameters in recent statsmodels versions).
Statsmodels link for Exponential Smoothing-
https://www.statsmodels.org/stable/examples/notebooks/generated/exponential_smoothing.html
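The following is a minimal sketch that fits all three smoothers on a small synthetic monthly series; the series itself, the manually chosen alpha of 0.2 and the 12-step forecast horizon are purely illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt, ExponentialSmoothing

# Illustrative synthetic monthly series with an upward trend and yearly seasonality
idx = pd.date_range('2015-01-01', periods=60, freq='MS')
y = pd.Series(50 + 0.5 * np.arange(60) + 10 * np.sin(2 * np.pi * np.arange(60) / 12), index=idx)

ses_fit = SimpleExpSmoothing(y).fit(smoothing_level=0.2)  # alpha chosen manually
holt_fit = Holt(y).fit()  # alpha and beta optimised by statsmodels
hw_fit = ExponentialSmoothing(y, trend='additive', seasonal='additive', seasonal_periods=12).fit()

print(hw_fit.forecast(steps=12))  # forecast the next 12 months with the Holt-Winters model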
Now let us check how to code up the ARIMA family of models in Python with the appropriate syntax:
4. To plot Auto Correlation Function (ACF) and Partial Auto Correlation Functions (PACF):
a. ACF Plot
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt
plot_acf('name of the time series', ax=plt.gca())
plt.show()
b. PACF Plot
from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf('name of the time series', ax=plt.gca())
plt.show()
Note: You can also pass different parameters to the ACF and PACF plotting functions to get the specific output you need.
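As a concrete illustration, the following sketch plots the ACF and PACF side by side for the synthetic series y built in the exponential smoothing sketch above (reusing that series here is an assumption for illustration only).
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(y, lags=24, ax=axes[0])   # autocorrelations up to 24 lags
plot_pacf(y, lags=24, ax=axes[1])  # partial autocorrelations up to 24 lags
plt.show()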
5. To calculate the Moving Average (MA):
'name of the time series'.rolling(window='order of the moving average').mean()
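For example, a 12-period moving average of the illustrative series y from above (the window length of 12 is an assumption that matches monthly data with yearly seasonality):
rolling_mean = y.rolling(window=12).mean()  # 12-month moving average
print(rolling_mean.tail())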
6. To check for the stationarity of the series:
a. Augmented Dickey Fuller (ADF) Test
from statsmodels.tsa.stattools import adfuller
adfuller('name of the time series')
Statsmodels link for Augmented Dickey-Fuller Test:
https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html
b. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
from statsmodels.tsa.stattools import kpss
kpss('name of the time series')
Statsmodels link for KPSS Test: https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html
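As a quick sketch of how these tests are usually read (again reusing the illustrative series y): the ADF test takes non-stationarity as its null hypothesis, whereas KPSS takes stationarity as its null hypothesis, so the two p-values are interpreted in opposite directions.
from statsmodels.tsa.stattools import adfuller, kpss

adf_stat, adf_pvalue = adfuller(y)[:2]
kpss_stat, kpss_pvalue = kpss(y, nlags='auto')[:2]

print('ADF p-value :', adf_pvalue)   # a small p-value suggests rejecting non-stationarity
print('KPSS p-value:', kpss_pvalue)  # a small p-value suggests rejecting stationarity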
7. To fit a Seasonal Auto Regressive Integrated Moving Average (SARIMA) model using an auto-ARIMA style grid search,
which returns the model with the minimum Akaike Information Criterion (AIC) on the training data:
import itertools
# Define the p, d and q parameters to take the values 0 or 1
p = d = q = range(0, 2)
# Generate all different combinations of p, d and q triplets
pdq = list(itertools.product(p, d, q))
# Generate all different combinations of seasonal p, d and q triplets
seasonal_pdq = [(x[0], x[1], x[2], 'value of the seasonality in the SARIMA model') for x in list(itertools.product(p, d, q))]
#Initializing the looping parameters
import numpy as np
best_aic = np.inf
best_pdq = None
best_seasonal_pdq = None
temp_model = None
import statsmodels.api as sm

# Loop to perform the auto arima style grid search over the (p, d, q) and seasonal combinations
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            temp_model = sm.tsa.statespace.SARIMAX('name of the time series',
                                                   order = param,
                                                   seasonal_order = param_seasonal,
                                                   enforce_stationarity=True)
            results = temp_model.fit()
            if results.aic < best_aic:
                best_aic = results.aic
                best_pdq = param
                best_seasonal_pdq = param_seasonal
        except:
            # print("Unexpected error:", sys.exc_info()[0])
            continue

print("Best SARIMA{}{} model - AIC:{}".format(best_pdq, best_seasonal_pdq, best_aic))
Now that we have the best regular and seasonal parameters for the SARIMA, let us build the SARIMA model.
import statsmodels.api as sm
best_model = sm.tsa.statespace.SARIMAX('name of the time series',
                                       order=best_pdq,  # the (p, d, q) values obtained from the loop above
                                       seasonal_order=best_seasonal_pdq)  # the (P, D, Q, m) values obtained from the loop above, where m is the seasonal period
best_results = best_model.fit()  # building the model
best_results.forecast()  # predicting using the model built
#To check the diagnostics of the model built
best_results.plot_diagnostics(lags='desired lags to be specified')
plt.show()
Note: You can also use different parameters in the SARIMAX function to obtain ARIMA, SARIMA or SARIMAX models as
desired. Do refer to the SARIMAX documentation in the statespace submodule of the statsmodels library.
https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html
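For instance, a plain (non-seasonal) ARIMA can be obtained from the same SARIMAX function simply by leaving the seasonal order at its default. The sketch below reuses the illustrative series y from the exponential smoothing example, and the (1, 1, 1) order is an assumption for illustration, not a recommendation.
import statsmodels.api as sm

# Plain ARIMA(1, 1, 1): no seasonal_order supplied, so no seasonal component is fitted
arima_model = sm.tsa.statespace.SARIMAX(y, order=(1, 1, 1))
arima_results = arima_model.fit()
print(arima_results.summary())
print(arima_results.forecast(steps=12))  # forecast the next 12 periods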