MSE204 Lecture Questions

Lecture wise questions for MSE204

Uploaded by

Ahmed Adnan

MSE204: Computational Methods in Materials Science and Engineering

Lecture wise plan and assignment problems (August 2023)

Note: Use Python to solve problems marked as (py)

Lecture 1

1. Why do we need to study probability: Some famous probability distributions

Boltzmann distribution
Maxwell-Boltzmann distribution (molecule in an ideal gas)

Exercise:

1. Starting from the Boltzmann distribution, derive the temperature dependence of susceptibility of
a paramagnet (Curie law).

2. Plot Maxwell-Boltzmann distribution.

3. Ideal gas considering gravity: find P (z), i.e., the probability of finding a molecule at height z.

4. Starting from the Boltzmann distribution, derive expression for specific heat of a paramagnet.

Lecture 2

1. General concepts of discrete random variable and discrete probability distribution

Probability mass function (PMF)

Relative frequency: “Experimental probability”
Calculate expectation values: Mean, variance
Cumulative distribution function (CDF)

2. Binomial distribution

Bernoulli trial: a trial with only two possible outcomes, “success” (probability p) and
“failure” (probability q = 1 − p)
Binomial random variable x: number of “success” among n trials
Two parameters of binomial distribution: n and p
PMF and CDF of binomial distribution
Mean and variance of Binomial distribution

3. Python: Use of SciPy and Matplotlib library to solve simple problems, plot PMF and CDF

stats.binom.pmf(x,n,p)
stats.binom.cdf(x,n,p)
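
As a minimal sketch of the two calls listed above (assuming NumPy and SciPy are installed; the values n = 3 and p = 0.5 are illustrative choices, not part of the lecture plan), the PMF and CDF of a binomial variable can be tabulated as follows:

```python
import numpy as np
from scipy import stats

n, p = 3, 0.5                      # illustrative values: three fair coin tosses
x = np.arange(n + 1)               # possible numbers of successes: 0, 1, ..., n

pmf = stats.binom.pmf(x, n, p)     # P(X = x)
cdf = stats.binom.cdf(x, n, p)     # P(X <= x)
mean, var = stats.binom.stats(n, p, moments="mv")  # should equal np and npq

for xi, f, F in zip(x, pmf, cdf):
    print(f"x = {xi}: PMF = {f:.3f}, CDF = {F:.3f}")
print("mean =", float(mean), "variance =", float(var))
```

The arrays `pmf` and `cdf` can be passed directly to Matplotlib (for example `plt.bar(x, pmf)` and `plt.step(x, cdf, where="post")`) to produce the plots asked for in the exercises.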

Exercise:

1. Various concepts: Let the experiment be single throw of a die and the random variable be the
outcome of a throw.

What is the sample space?

Plot the PMF. Do you see why this is a uniform discrete probability distribution?
How would you calculate the relative frequency and find the PMF experimentally?

Find the mean and variance. In this case, mean is just the simple arithmetic average of all
possible outcomes. However, it is NOT TRUE in general.
Convince yourself that mean is a WEIGHTED AVERAGE of the outcomes. For example,
assume an unfair die, with probability of getting 1 and 2 are 1/4 and rest are 1/8. What
would be the mean? Try to understand the concept of weighted average in terms of relative
frequency.
Another name of mean is EXPECTATION VALUE. Does it mean that if we do the experiment
once, the most likely outcome is equal to the expectation value?
Convince yourself that it is NOT necessary that the mean of a discrete probability distribution
is one of the outcomes. If it happens, then it is just a coincidence.
What is the difference between simple ARITHMETIC AVERAGE and WEIGHTED AVERAGE?
When are they the same?

2. Random variable and sample space:

Give an example when the random variable is same as the sample space of the experiment.
Give an example when the random variable is not same as the sample space of the experiment.
Give an example when the sample space is same but the random variables are different.

3. Let the random variable be the outcome of sum of two dice thrown simultaneously.

Plot the probability mass function. Is this a uniform probability distribution?

Find the mean and variance.

4. You have ordered 2000 boxes of certain product, each box containing 5 items. Let the random
variable X be the number of defective items in each box. Given that P (X = 0) = 0.8, P (X =
1) = 0.08, P (X = 2) = 0.05, P (X = 3) = 0.04, P (X = 4) = 0.02, P (X = 5) = 0.01.

Find the mean and standard deviation of defective samples present in a box.
Is the mean same as one of the possible outcomes (X)? Explain the meaning of mean.
What would be the total number of defective items?

5. Let the random variable be the outcome of a single throw of a die.

Find the CDF and plot it.

Starting from the CDF, can you find the PMF?

6. Let the random variable be the outcome of sum of two dice thrown simultaneously.

Find the CDF and plot it.

Starting from the CDF, can you find the PMF?

7. Let the random variable X be the number of heads when three coins are tossed. Make a table of
X and the probability mass function f (xi ) = pi . Plot the PMF and CDF.

8. Sampling without replacement: Suppose 20 balls are kept in a box and 5 of them are red in color,
rest of them are blue. Two balls are picked randomly without replacement from the box. Let the
random variable X be the number of red balls picked in a trial. Find the CDF of X.

9. Sampling with replacement: 100 samples are kept in a box and 10 of them are defective. We
randomly pick a sample from the box. If the sample is not defective, we put it back to the box
and randomly pick another sample from the box. What is the probability that we find a defective
sample at xth trial? The probability distribution is known as the geometric distribution.

Calculate mean and variance of geometric distribution.
Find the CDF for the geometric distribution.
(py) The PMF of the geometric distribution is $f(x) = (1 - p)^{x-1} p$, where p is a parameter.
Plot the PMF and CDF for different values of p.

10. Scaling a random variable:

Prove that $E(aX + b) = aE(X) + b$ and $V(aX + b) = a^2 V(X)$, where X is a random variable.
Choosing suitable values of a, b, can you scale the random variable X and define a new
random variable having mean 0 and variance 1? Scaling is frequently used in data science.

11. Based on the examples discussed so far, try to grasp the following theorem: Given a symmetric
distribution with respect to xi = c, i.e., f (c − xi ) = f (c + xi ); mean of the distribution is µ = c.

Verify whether the theorem works for experiments like single throw of a die, throwing two dice
simultaneously etc. Upshot: you can avoid the algebra, provided the probability distribution
is symmetric.

12. Derivation of PMF and CDF: A coin is tossed n = 3 times and the random variable x is the total
number of heads, such that X = 0, 1, 2, 3. Probability of getting a head is p = 1/2. Derive the
PMF and CDF of Binomial distribution.

13. There are 3 MCQs. Each question has 4 options and only one is correct. The person answering
the questions has no clue about the correct option and is guessing. Possible outcomes are
RRR, RRW etc., where R stands for right and W stands for wrong.

Find the probability of three right (all right), two right, one right and zero right answers.
Verify that P (X = 3), P (X = 2), P (X = 1) and P (X = 0) follow binomial probability
distribution.
Compare with the previous problem and comment.

14. Mean, variance:

Verify that PMF of binomial distribution is normalized, i.e., it satisfies $\sum_{x=0}^{n} f(x) = 1$.
Derive mean µ = E(X) = np.
Derive $E(X^2) = \sum_{x=0}^{n} {}^{n}C_x \, p^x q^{n-x} x^2 = \mu^2 + npq$.

For a given n, what value of p would give you the maximum variance?

15. (py) Symmetry of binomial distribution: Write a code to make bar plots of binomial probability
mass functions:

n = 20, p = 0.1
n = 20, p = 0.3
n = 20, p = 0.5
n = 20, p = 0.7
n = 20, p = 0.9

16. Using the theorem on symmetric probability distribution (discussed previously), prove that bino-
mial distribution is not symmetric in general. Find out, under what condition binomial distribu-
tion becomes symmetric.

17. (py) Sharpness of distribution:

Write a code to plot the PMF and CDF of a binomial distribution for fixed p = 0.5 and
three different values of n = 10, 100, 1000.
Do you see that the PMF becomes very sharply peaked as n increases? What happens to
the corresponding CDF?

18. The ratio σ/µ is a good measure of relative width of a probability distribution.

Prove that the relative width of a binomial distribution is $\frac{1}{\sqrt{n}}\sqrt{\frac{q}{p}}$.
What happens when n becomes very large? Does it explain what you have observed in the
previous problem?

19. Let X be a binomial random variable. Define a new random variable Y , which is equal to the
difference between “successful” and “unsuccessful” trials.

What are the possible values of Y ?

Calculate the values of $\mu_Y$ and $\sigma_Y^2$.

This problem can be connected to 1D random walk and paramagnet, where the difference is equal
to the net displacement and net magnetic moment.

20. (py) Paramagnet: Total number of spins n = n↑ + n↓ and m = n↑ − n↓ .

Considering no external field, probability of ↑ spin is p = 0.5.¹ Plot the probability
distribution function f(m) assuming n = 20. From the plot, find the ratio of f(4)/f(0), which
is significant. That means, there is a considerable probability of getting net magnetic
moment in the absence of an external magnetic field in a paramagnet. But this is not observed
experimentally.
Plot the probability distribution function f (m) assuming n = 200. From the plot, find the
ratio of f (40)/f (0), which can be directly compared with the value obtained in the previous
problem for n = 20. What do you observe?
Imagine what would happen in case of a solid where $n = 10^{23}$.
Do you see why properties of nano-materials generally have larger error-bars compared
to bulk materials?
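
The comparison asked for in this problem can be sketched with the standard library alone. This is only one possible approach: the helper name `f` mirrors the notation f(m) used above, and it relies on the relation n↑ = (n + m)/2, which follows from the two definitions n = n↑ + n↓ and m = n↑ − n↓.

```python
from math import comb

def f(m, n, p=0.5):
    """PMF of the net moment m = n_up - n_down for n independent spins."""
    if (n + m) % 2 or abs(m) > n:
        return 0.0                      # m must share the parity of n, |m| <= n
    n_up = (n + m) // 2                 # number of up spins that gives net moment m
    return comb(n, n_up) * p**n_up * (1 - p)**(n - n_up)

ratio_20 = f(4, 20) / f(0, 20)          # same relative moment m/n = 0.2 ...
ratio_200 = f(40, 200) / f(0, 200)      # ... for a system ten times larger

print(f"n = 20 : f(4)/f(0)  = {ratio_20:.4f}")
print(f"n = 200: f(40)/f(0) = {ratio_200:.4f}")
```

Comparing the two printed ratios shows how quickly a fixed relative magnetization m/n becomes improbable as n grows.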

21. (py) 30 samples are prepared every day in a workshop. The probability that a sample is defective
is 0.1. Clearly, number of defective samples X = 0, 1, 2, ....., 30 follows a binomial distribution.
Using a python code, find the following.

Probability of exactly 5 defective samples.

Probability of at most 5 defective samples.
Probability of at least 5 defective samples.
Plot PMF and CDF.
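
A hedged sketch of the three probabilities, assuming SciPy is available. The variable names are illustrative; `sf` is SciPy's survival function, P(X > k), so "at least 5" is written as `sf(4)`.

```python
from scipy import stats

n, p = 30, 0.1                               # samples per day, defect probability

p_exactly_5 = stats.binom.pmf(5, n, p)       # P(X = 5)
p_at_most_5 = stats.binom.cdf(5, n, p)       # P(X <= 5)
p_at_least_5 = stats.binom.sf(4, n, p)       # P(X >= 5) = P(X > 4) = 1 - cdf(4)

print(f"P(X = 5)  = {p_exactly_5:.4f}")
print(f"P(X <= 5) = {p_at_most_5:.4f}")
print(f"P(X >= 5) = {p_at_least_5:.4f}")
```

A quick consistency check is that P(X ≤ 5) + P(X ≥ 5) − P(X = 5) = 1, since the outcome X = 5 is counted in both tails.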

Lecture 3-4

1. Continuous random variable and continuous probability distribution

Probability density function (PDF)

Calculate expectation values: Mean, variance
Cumulative distribution function (CDF)
Median and mode
¹ In the absence of an external magnetic field, ↑ and ↓ spins are equally probable.

2. Normal distribution

Two Parameters of normal distribution: µ and σ

PDF of normal distribution
Standard normal variable: PDF and CDF
Introduction to standard normal cumulative probability table
Standardization

3. Python:

Use of SciPy and Matplotlib library to plot PDF and CDF

Numpy library: arrays, matrices and mathematical function
Pandas library: Python data analysis
Seaborn library: Statistical data visualization
Python: Solving problems using SciPy
– stats.norm.pdf(x,µ, σ)
– stats.norm.cdf(x,µ, σ)
– stats.norm.ppf(q,µ, σ), where q is a cumulative probability (ppf is the inverse of cdf)
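
The three calls can be tried out as in the sketch below, assuming SciPy is installed. The values µ = 10 and σ = 2 are borrowed from the coating-thickness exercise later in this lecture and are used here only for illustration.

```python
from scipy import stats

mu, sigma = 10.0, 2.0                           # illustrative mean and standard deviation

density_at_mean = stats.norm.pdf(mu, mu, sigma) # height of the PDF at x = mu
p_below_12 = stats.norm.cdf(12.0, mu, sigma)    # P(X <= 12), i.e. Phi(1)
x_98 = stats.norm.ppf(0.98, mu, sigma)          # x such that P(X <= x) = 0.98

print(f"f(mu)      = {density_at_mean:.4f}")
print(f"P(X <= 12) = {p_below_12:.4f}")
print(f"98th percentile = {x_98:.3f}")
```

Note that `ppf` takes a probability and returns a value of x, whereas `pdf` and `cdf` take a value of x.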

Exercise:

1. Let a continuous random variable X denote the temperature. A vaccine is supposed to be stored
at 0 °C. If the temperature goes beyond 3 °C, it cannot be used. Historical data show that the
temperature fluctuation can be modeled by a PDF $f(x) = e^{-x}$ for x ≥ 0 and f(x) = 0 for x < 0.
Estimate the fraction of vaccine doses wasted.

2. Exponential distribution: The example discussed above is known as an exponential probability
distribution. It has only one parameter λ. The PDF of exponential distribution has a general form
of $f(x) = \lambda e^{-\lambda x}$. Prove that the exponential distribution has a mean equal to $1/\lambda$ and a
variance equal to $1/\lambda^2$.

3. Repeat the first problem with different values.

The vaccine cannot be used if the temperature goes beyond 1 °C. Estimate the fraction of
vaccine doses wasted.
The vaccine cannot be used if the temperature goes beyond 2 °C. Estimate the fraction of
vaccine doses wasted.
The vaccine cannot be used if the temperature goes beyond 3 °C. Estimate the fraction of
vaccine doses wasted.

Convince yourself that if the tolerance limit is too close to the mean, the wastage is very high. It is
natural to estimate the distance between tolerance limit and mean in terms of standard deviation.
For example, the tolerance limit of 3 °C is 2 standard deviations away from the mean.

4. How to get CDF from PDF and PDF from CDF?

5. Continuous uniform distribution: A continuous random variable X has a PDF f (x) = 1/(b − a)
for a ≤ x ≤ b.

Prove that,

$$\mu = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}.$$

Verify that uniform distribution is symmetric about (a + b)/2.
Find the cumulative distribution function F (x) and plot it.
Starting from F (x), try to get f (x).

6. Theorem: Given a symmetric distribution with respect to x = c, i.e., f(c − x) = f(c + x);
mean of the distribution is µ = c.

(py) Gaussian or normal distribution is given by: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}$. Plot normal
distribution for different values of µ and σ. Can we find the mean of the normal distribution
using the above theorem?
(py) Log-normal distribution is given by: $f(x) = \frac{1}{x\omega\sqrt{2\pi}} e^{-[\ln(x)-\theta]^2/2\omega^2}$ for 0 < x < ∞. Plot
log-normal distributions for θ = 0 and ω² = 0.5, 1, 2. Can we find the mean of the log-normal
distribution using the above theorem?
(py) Exponential distribution is given by: $f(x) = \lambda e^{-\lambda x}$ for x ≥ 0 and λ > 0. Plot exponential
distribution for λ = 0.5, 1, 2. Can we find the mean of the exponential distribution using the
above theorem?
(py) Beta distribution is given by: $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}$ for 0 ≤ x ≤ 1 and α, β > 0.
– Plot for α = β = 2, α = 2, β = 5 and α = 5, β = 2. Can you calculate mean using the
above theorem?

7. (py) Generating random data: in the last problem, you plotted probability distribution func-
tions of various continuous distributions. Often it is useful to generate a data set randomly,
which mimics certain probability distribution. Generate random data for normal, log-normal and
exponential distribution.
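
One way to generate such data is NumPy's random generator, sketched below. The parameter values and the seed are arbitrary illustrative choices. Note the parameterizations: `lognormal` takes the mean θ and standard deviation ω of ln(x), and `exponential` takes the scale 1/λ rather than λ itself.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # seeded so the run is reproducible
size = 100_000                         # illustrative sample size

mu, sigma = 0.0, 1.0                   # normal parameters
theta, omega = 0.0, 1.0                # log-normal parameters (mean and std of ln x)
lam = 2.0                              # exponential rate

normal_data = rng.normal(loc=mu, scale=sigma, size=size)
lognormal_data = rng.lognormal(mean=theta, sigma=omega, size=size)
exponential_data = rng.exponential(scale=1.0 / lam, size=size)

print("normal mean      :", normal_data.mean())
print("log-normal mean  :", lognormal_data.mean(), "(theory:", np.exp(theta + omega**2 / 2), ")")
print("exponential mean :", exponential_data.mean(), "(theory:", 1.0 / lam, ")")
```

A histogram of each array (for example with `seaborn.histplot`) can then be compared against the corresponding PDF from the previous problem.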

8. Mean, median and mode of symmetric, left-skewed and right-skewed distribution: it is known
that mean of beta distribution is $\mu = \frac{\alpha}{\alpha+\beta}$ and mode of beta distribution is $\frac{\alpha-1}{\alpha+\beta-2}$. You may use
this information to verify the answers of the following problems.
To solve the following problems, you have to generate a very large data set randomly for beta
distribution and plot, as shown in the previous problem.

(py) Verify that Beta distribution is symmetric about x = 0.5, if α = β. For a symmetric
beta distribution, mean, mode and median values are equal to each other.² You have to
verify this from the plots and the formula given above. Mode and median can be obtained
from the probability distribution and cumulative distribution, respectively.
(py) Verify that Beta distribution is right-skewed if α < β. For a right-skewed beta
distribution, mode < median < mean.³ You have to verify this from the plots and the formula given
above. Mode and median can be obtained from the probability distribution and cumulative
distribution, respectively.
(py) Verify that Beta distribution is left-skewed if α > β. For a left-skewed beta distribution,
mode > median > mean.⁴ You have to verify this from the plots and the formula given
above. Mode and median can be obtained from the probability distribution and cumulative
distribution, respectively.
9. PDF of normal distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}$

Verify normalization.
Verify that mean is µ.
Verify that variance is σ².
² This is true for any symmetric unimodal distribution, for example the normal distribution.
³ This is typically true for unimodal right-skewed distributions.
⁴ This is typically true for unimodal left-skewed distributions.

10. What is the CDF of normal distribution?

11. What are the values of P (µ − σ < X < µ + σ), P (µ − 2σ < X < µ + 2σ) and P (µ − 3σ < X <
µ + 3σ)?

12. What is a standard normal variable and what are the corresponding PDF and CDF?

13. Understand the standard normal cumulative probability table and find the values of:

P (Z ≤ −1.54)
P (Z > −1.54)
P (Z ≤ 1.02)
P (Z > 1.02)
P (Z ≤ 0.28)
P (Z > 0.28)
P (0.28 < Z < 1.02)
Find z such that P (Z ≤ z) = 0.770350
Find z such that P (Z > z) = 0.0968

14. (py) Solve the last problem using python.

15. One more definition: D(z) = Φ(z) − Φ(−z). Answer the following.

D(1) = P (−1 < Z < 1) =?

D(2) = P (−2 < Z < 2) =?
D(3) = P (−3 < Z < 3) =?
Find z such that P (−z < Z < z) = 0.990120

16. (py) Solve the last problem using python.

17. Standardization: how do we apply the above method for any µ and σ?

18. A coating is applied on a glass plate. Coating thickness follows a normal distribution with a mean
of 10 mm and variance of 4 mm².

What is the probability that the coating thickness exceeds 12 mm?

What is the probability that the coating thickness is between 9 and 11 mm?
Convince yourself that 50% of the coatings have thickness ≤ 10 mm.
Fill in the blank: 98% of the coatings have thickness ≤ mm.

19. (py) Solve the last problem using python.

20. Say some detector is detecting some signal. Background noise follows a normal distribution with
mean 0 and standard deviation 2. The detector records a signal if the value is ≥ 4.

A false signal is recorded when the noise level is 4 or higher. What is the probability of
detecting a false signal?
Find the symmetric bound about the mean that include 95% of all noise readings.
Find the symmetric bound about the mean that include 97% of all noise readings.
Find the symmetric bound about the mean that include 99% of all noise readings.

21. A coating is applied on a glass plate. Coating thickness follows a normal distribution with a mean
of 10 mm and variance of 4 mm². The accepted dimension is 9 ± 2 mm. Find the fraction of
coated glass plates wasted. What would you do to minimize the wastage?

22. Normal approximation to binomial distribution: An exam has 50 MCQs. Each question has 4
choices and only one of them is correct. Assume that a candidate is guessing all the questions.
Let X be the random variable representing the number of correct answers. Using binomial
distribution,

Find P (X = 2), which is equal to the probability that exactly 2 answers are correct.
Find P (X ≤ 2), which is equal to the probability that at most 2 answers are correct.
Find P (1 ≤ X ≤ 3), which is equal to the probability that 1-3 answers are correct.

A standard normal variable can be defined using the mean and variance of binomial distribution:

$$Z = \frac{X - np}{\sqrt{np(1-p)}}.$$

Since we are converting from a discrete to a continuous random variable, we have to apply a
continuity correction,

$$P(X = x) = P(x - 0.5 \le X \le x + 0.5) = P\left(\frac{x - 0.5 - np}{\sqrt{np(1-p)}} \le \underbrace{\frac{X - np}{\sqrt{np(1-p)}}}_{Z} \le \frac{x + 0.5 - np}{\sqrt{np(1-p)}}\right).$$

Similarly,

$$P(a \le X \le b) = P\left(\frac{a - 0.5 - np}{\sqrt{np(1-p)}} \le Z \le \frac{b + 0.5 - np}{\sqrt{np(1-p)}}\right).$$

Find P (X = 2) using normal approximation of binomial distribution.

Find P (X ≤ 2) using normal approximation of binomial distribution.
Find P (1 ≤ X ≤ 3) using normal approximation of binomial distribution.
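
The continuity-corrected formula above can be compared against the exact binomial probabilities with a short script. This is a sketch assuming SciPy is installed; the helper names `exact` and `approx` are illustrative, and P(X ≤ 2) is treated as P(0 ≤ X ≤ 2).

```python
from math import sqrt
from scipy import stats

n, p = 50, 0.25                          # 50 questions, 1 correct option out of 4
mu, sigma = n * p, sqrt(n * p * (1 - p)) # mean and standard deviation of X

def exact(a, b):
    """Exact binomial probability P(a <= X <= b)."""
    return stats.binom.cdf(b, n, p) - stats.binom.cdf(a - 1, n, p)

def approx(a, b):
    """Normal approximation of P(a <= X <= b) with continuity correction."""
    z_low = (a - 0.5 - mu) / sigma
    z_high = (b + 0.5 - mu) / sigma
    return stats.norm.cdf(z_high) - stats.norm.cdf(z_low)

for label, (a, b) in {"P(X = 2)": (2, 2), "P(X <= 2)": (0, 2), "P(1 <= X <= 3)": (1, 3)}.items():
    print(f"{label:15s} exact = {exact(a, b):.6f}   normal approx = {approx(a, b):.6f}")
```

Because these events lie far in the lower tail of the distribution (np = 12.5), the absolute errors are small but the relative errors can be large, which is worth keeping in mind when interpreting the output.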

23. This problem is designed to show the limits of normal approximation for a given binomial dis-
tribution. This will help you to understand when normal approximation of binomial distribution
may not work very well.

(py) Create bar-plots of binomial distribution with different parameters like (a) n = 10, p =
0.5, (b) n = 10, p = 0.1, (c) n = 10, p = 0.9. Plot probability density function of normal
approximation for each of the cases.
(py) Create bar-plots of binomial distribution with different parameters like (a) n = 50, p =
0.5, (b) n = 50, p = 0.1, (c) n = 50, p = 0.9. Plot probability density function of normal
approximation for each of the cases.

Comment: Normal approximation of binomial distribution is good for np > 5 and n(1 − p) > 5.

24. Assume that 100000 pages are printed per day in a press. Probability that a page is rejected due
to some error can be modeled by a binomial distribution and such a probability is $p = 10^{-3}$.

Find the probability that exactly 100 pages are rejected.

Find the probability that not more than 100 pages are rejected.

Lecture 5

1. Basic data analysis and visualization

Concept of population and sample

Numerical summary – mean, median, variance, quartiles

2. Python:

Numerical summary: Dataframe using pandas library - df.describe( )

Data visualization using seaborn library - histogram, box and whisker plot
Testing normality - normal probability plot

Exercise:

1. What is sample mean and sample variance?

2. (py) Using the data file given, get the data summary.

3. (py) Using the data file given, plot a histogram. What are the advantages of histogram plot?

4. (py) Using the data file given, plot a box and whisker diagram. What are the advantages of box
and whisker plot?

5. What is interquartile range? Where do we draw the box and whiskers?

6. What are known as outliers? Why do we care about them?

7. (py) Using the data file given, draw a normal probability plot. What is the purpose of using
normal probability plot?

8. (py) Compare normal probability plots of normal distribution, heavy-tailed (compared to normal)
distribution and skewed distribution.

Lecture 6-7

1. Point estimation of parameters

Concept of random sample, sampling distribution and central limit theorem

2. Interval estimation of parameters:

Confidence interval (CI) of µ for normal distribution

– σ² known: use Z-distribution
– σ² unknown: use T-distribution
Confidence interval (CI) of µ for non-normal distribution

3. Python:

CI using Z-distribution:
stats.norm.interval(alpha, loc = X_mean, scale = X_std/np.sqrt(X_count))
CI using T-distribution:
stats.t.interval(alpha, X_count − 1, loc = X_mean, scale = X_std/np.sqrt(X_count))
Meaning of different terms: alpha = confidence level (0.9, 0.95, 0.99 etc.; recent SciPy
releases name this first argument confidence), X_count = sample size, X_mean =
sample mean, X_std = population standard deviation σ for Z-distribution and X_std =
sample standard deviation for T-distribution
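
A minimal sketch of both calls, assuming NumPy and SciPy are installed. The data are the ten alloy-strength measurements from the exercises below, with σ = 2 MPa taken as known for the Z-interval. The confidence level is passed positionally so that the call does not depend on whether the installed SciPy names that argument `alpha` or `confidence`.

```python
import numpy as np
from scipy import stats

x = np.array([11.39, 7.46, 13.78, 11.98, 8.06, 11.69, 9.86, 12.02, 9.12, 11.73])
x_count = len(x)
x_mean = x.mean()

sigma = 2.0                              # known population standard deviation (MPa)
s = np.std(x, ddof=1)                    # sample standard deviation (n - 1 in the denominator)

# 95% CI with sigma known (Z-distribution)
z_ci = stats.norm.interval(0.95, loc=x_mean, scale=sigma / np.sqrt(x_count))

# 95% CI with sigma unknown (T-distribution with n - 1 degrees of freedom)
t_ci = stats.t.interval(0.95, x_count - 1, loc=x_mean, scale=s / np.sqrt(x_count))

print("sample mean:", x_mean)
print("Z interval :", z_ci)
print("T interval :", t_ci)
```

Both functions return a (lower, upper) pair, so the interval width is simply the difference of the two entries.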

Exercise:

1. What are the point estimates of mean and variance of normal distribution?

2. (py) Verify that point estimates of parameters of a normal distribution are random variables and
they are not same as population mean and population variance.

3. (py) Central limit theorem: Let us do random sampling from a normal distribution with mean µ
and variance σ². Using a code, verify that the sample average follows a normal distribution with
mean µ and variance σ²/n. We have to define $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ and confirm that Z has mean 0 and
variance 1.

4. Central limit theorem holds good, even when we do random sampling from a non-normal dis-
tribution! Let us test this for a random variable X, having a continuous uniform probability
distribution, f (x) = 1/10 for 0 ≤ x ≤ 10 and f (x) = 0 otherwise.

If we take a random sample of size n = 30, what would be the probability distribution of
sample mean X̄, according to the central limit theorem?
Draw (by hand) the probability distribution of X and X̄.
(py) Using python, draw histogram and normal probability plot of X̄.
(py) Using python, standardize X̄ and draw histogram and normal probability plot.
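
The numerical part of this problem can be sketched as below, assuming NumPy is installed. The number of repeated samples (20,000) and the seed are arbitrary choices; plotting is left out so that the sketch only checks the mean and variance predicted by the central limit theorem.

```python
import numpy as np

rng = np.random.default_rng(seed=1)        # seeded for reproducibility
a, b, n = 0.0, 10.0, 30                    # uniform on [0, 10], sample size 30
n_samples = 20_000                         # number of repeated random samples

# Each row is one random sample of size n; take the mean of every row.
sample_means = rng.uniform(a, b, size=(n_samples, n)).mean(axis=1)

mu = (a + b) / 2                           # population mean of the uniform distribution
sigma = (b - a) / np.sqrt(12)              # population standard deviation
z = (sample_means - mu) / (sigma / np.sqrt(n))   # standardized sample means

print("mean of sample means:", sample_means.mean(), "(theory:", mu, ")")
print("var of sample means :", sample_means.var(), "(theory:", sigma**2 / n, ")")
print("mean and var of Z   :", z.mean(), z.var())
```

The arrays `sample_means` and `z` can then be passed to a histogram and to `scipy.stats.probplot` for the normal probability plot.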

5. Given X is a random normal variable, with mean µ and variance σ².

Draw probability distribution of X and X̄.

Draw probability distribution of X̄ for two different sample sizes, n = 10 and n = 20.
(py) Repeat the last plot using python, assuming some reasonable values of µ and σ².

6. Validity of point estimation: Machine parts are manufactured having a mean length of 100 mm
and standard deviation of 10 mm.

What is the probability that a random sample of size 25 has a sample average greater than
105 mm?
What is the probability that a random sample of size 10 has a sample average greater than
105 mm?
What do you conclude?

7. Strength of an alloy is a normal random variable distributed with σ = 2 MPa. Ten measurements
are as follows: 11.39, 7.46, 13.78, 11.98, 8.06, 11.69, 9.86, 12.02, 9.12, 11.73.

Find a 95% CI on µ.
Find a 99% CI on µ.

8. (py) Repeat the previous problem using python.

9. Consider the previous problem.

What is the precision level (interval width) and absolute error in estimating µ?
What would be the interval width for 95% and 99% confidence level?
What would be the sample size if we want to reduce absolute error to 0.715 for 95% confidence
level?

10. The length of a rod is a normal random variable distributed with σ = 1 cm. Ten measurements
are as follows: 50.7, 50.5, 50.6, 50.5, 50.3, 50.6, 50.8, 50.2, 50.3, 50.1.

Find a 95% CI on µ, the mean length.
Repeat the same problem, this time for 99% CI.
From the above two problems, verify that the precision level for 99% CI is lower than that
of 95% CI.
What would you do to improve the precision level of 99% CI to the level of 95% CI?

11. (py) Verify that CI is a random variable: Assuming a normal population with µ = 10, σ = 2,
generate several random samples of size 10.

Find CI for each random sample, assuming 95% confidence level.

Verify whether the interval contains actual µ for each random sample.
What is the meaning of 95% CI?
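
A possible simulation for this problem is sketched below, assuming NumPy and SciPy are installed and that σ is treated as known (Z-interval). The number of repetitions and the seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)      # seeded for reproducibility
mu, sigma, n = 10.0, 2.0, 10             # population parameters and sample size
n_repeats = 5000                         # number of independent random samples

z_crit = stats.norm.ppf(0.975)           # two-sided 95% critical value
half_width = z_crit * sigma / np.sqrt(n) # same for every sample because sigma is known

samples = rng.normal(mu, sigma, size=(n_repeats, n))
means = samples.mean(axis=1)             # one sample mean per random sample

# An interval "covers" if the true mean lies between its lower and upper limits.
covered = (means - half_width <= mu) & (mu <= means + half_width)
coverage = covered.mean()

print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The interval limits change from sample to sample, while µ stays fixed; the printed fraction is the long-run proportion of intervals that contain µ.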

12. Understanding absolute error:

Plot E/σ (absolute error measured in the unit of σ) as a function of sample size n for 95%
and 99% CI.
(py) Plot $E_s/E_\sigma$ as a function of sample size n for 95% and 99% CI, assuming equal values
of s and σ.
Based on the plots, what would you conclude?

13. Number of days after which a component needs to be replaced is as follows: 7.5, 12.7, 16.7, 11.9,
15.4, 11.9, 15.8, 11.4, 14.9, 7.9, 17.6, 13.6, 10.1, 18.5, 14.1, 8.8, 19.8, 15.4, 11.4, 19.5, 15.4. Find
the sample average and sample variance using python and then solve the rest of the problem by
hand.

Find a 90% CI on µ.
Find a 95% CI on µ.
Find a 98% CI on µ.

Lecture 8-10

1. Hypothesis testing and decision making

Concept of null and alternate hypothesis

One sided and two sided test
Acceptance region, rejection/critical region, critical value
Errors: type I error, type II error, significance level, power of test
One-sample Z-test on the mean: test statistic $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

– Fixed significance level hypothesis test

– P-value hypothesis test
One-sample T-test on the mean: test statistic $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
Two-sample test on the mean: test statistic $Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$. T-statistic used when $\sigma_1^2$
and $\sigma_2^2$ unknown.
Paired-sample test on the mean: test statistic $T = \frac{\bar{D} - \Delta}{S_D/\sqrt{n}}$
Goodness of fit test: test statistic $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

2. Python

One-sample T-test on the mean:
stats.ttest_1samp(df,popmean,alternative='greater')
where df is a data frame, containing the sample values; popmean is the value of null
hypothesis (assumed value of population mean); alternative='greater' (right-sided or
upper-tailed test) or alternative='less' (left-sided or lower-tailed test) or
alternative='two-sided' (two-sided test).
Two-sample T-test on the mean:
stats.ttest_ind(df['Population 1'],df['Population 2'],alternative='greater')
where df is a data frame, containing two data columns Population 1 and Population 2;
alternative='greater' (right-sided or upper-tailed test) or alternative='less' (left-sided
or lower-tailed test) or alternative='two-sided' (two-sided test).
Paired-sample T-test on the mean:
stats.ttest_rel(df['After'],df['Before'],alternative='greater')
where df is a data frame, containing two data columns After and Before;
alternative='greater' (right-sided or upper-tailed test) or alternative='less' (left-sided or
lower-tailed test) or alternative='two-sided' (two-sided test).
Exercise:
1. General ideas about the significance level and type II error.
Consider a normal distribution with σ² = 9 and the null hypothesis H0 : µ = 90. In a
two-sided test, the acceptance region is 90 ± 1.5, such that a type I error occurs if X̄ > 91.5
or X̄ < 88.5. Find the significance level, taking n = 9.
If we widen the acceptance region to 90 ± 2.5, what would be the significance level?
If we shorten the acceptance region to 90 ± 0.5, what would be the significance level?
Effect of sample size on the significance level: Repeat the previous problem, taking n = 64.
Given the null hypothesis H0 : µ = 90 and alternative hypothesis H1 : µ = 94, find the type
II error, given that acceptance range is 90 ± 1.5, σ² = 9, n = 9.
Type II error increases as null and alternative hypotheses get closer. Find the type II error
given the null hypothesis H0 : µ = 90 and alternative hypothesis H1 : µ = 92. Other parameters
are same as the previous problem.
Effect of sample size on type II error: show that type II error decreases if we take n = 64.
Other parameters are same as the previous problem.
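
The first few parts of this problem can be checked with the standard library, as in the sketch below. The helper name `acceptance_probability` is illustrative. It uses the fact that the sample mean X̄ is normal with standard deviation σ/√n, so the type I error is the probability of leaving the acceptance region when µ = 90, and the type II error is the probability of staying inside it when the alternative is true.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 3.0, 9                 # sigma^2 = 9, sample size 9
lower, upper = 88.5, 91.5         # acceptance region 90 +/- 1.5
se = sigma / sqrt(n)              # standard deviation of the sample mean

def acceptance_probability(true_mean):
    """P(lower <= sample mean <= upper) when the population mean is true_mean."""
    dist = NormalDist(mu=true_mean, sigma=se)
    return dist.cdf(upper) - dist.cdf(lower)

alpha = 1 - acceptance_probability(90.0)   # type I error (significance level)
beta_94 = acceptance_probability(94.0)     # type II error if the true mean is 94
beta_92 = acceptance_probability(92.0)     # type II error if the true mean is 92

print(f"alpha          = {alpha:.4f}")
print(f"beta (mu = 94) = {beta_94:.4f}")
print(f"beta (mu = 92) = {beta_92:.4f}")
```

Changing `n`, `lower` and `upper` reproduces the remaining parts (wider or narrower acceptance region, n = 64).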
2. Fixed significance level hypothesis test:
(py) Right-sided test: Assume X to be a normal random variable with variance σ² = 20.
Using a sample size of 10, test the hypothesis H0 : µ = 100 against the alternative H1 : µ >
100 and plot the power function. Assume a fixed significance level α = 0.05.
(py) Left-sided test: Assume X to be a normal random variable with variance σ² = 20. Using
a sample size of 10, test the hypothesis H0 : µ = 100 against the alternative H1 : µ < 100
and plot the power function. Assume a fixed significance level α = 0.05.
(py) Two-sided test: Assume X to be a normal random variable with variance σ² = 20. Using
a sample size of 10, test the hypothesis H0 : µ = 100 against the alternative H1 : µ ≠ 100
and plot the power function. Assume a fixed significance level α = 0.05.
3. P-value hypothesis test: Say a random normal sample is
{103.5, 104, 102, 103, 101, 99.5, 100.5, 103.5, 102.5},
and population variance is σ² = 9. Perform a two-sided test of the null hypothesis H0 : µ = 100
against the alternative H1 : µ ≠ 100.

4. Say we are testing a null hypothesis H0 : µ = 100, against an alternate hypothesis H1 : µ > 100.
Do the following exercise and verify that your decision depends on the sample size! How do you
explain this?

Given sample size n = 9, sample average x̄ = 101.4 and σ² = 9, find the P-value.
Given sample size n = 16, sample average x̄ = 101.4 and σ² = 9, find the P-value.
Given sample size n = 25, sample average x̄ = 101.4 and σ² = 9, find the P-value.

5. (py) Sensitivity of P-value hypothesis test: Show that, if the null hypothesis is significantly wrong
compared to the true population mean, the P-value test should be able to detect that. Do the test
with sample size 10 and sample size 20 or even higher. What would you conclude?

6. One-sample T-test on the mean:

Say sample mean is x̄ = 0.83724 and sample standard deviation is s = 0.024557 for a sample
size n = 15. Test null hypothesis H0 : µ = 0.82 against alternative hypothesis H1 : µ > 0.82.
(py) Do the above exercise using python.

7. Two-sample test on the mean:

Z-test: A manufacturer claims to have improved the battery life. From the new population,
a random sample of size n = 10 has a sample average of x̄1 = 121 days of battery life. On the
other hand, from the old population, a random sample of size n = 10 has a sample average
of x̄2 = 112 days of battery life. For both the populations, standard deviation σ = 8 days.
What conclusion can we draw about the claim of improvement of the battery life?
T-test: A company is charging a higher price for one material (population 1) than for another
(population 2), because the former has higher thermal conductivity. Two random samples
(thermal conductivity in W/mK) are as follows:

Population 1    Population 2
118             108
127             123
117             119
117             119
126             124
126             116
123             114
130             124
120             115
113             114

Indeed, population 1 has higher sample average (121.7 W/mK) than that of population 2
(117.6 W/mK). Determine whether the difference is statistically significant.
(py) Repeat the above exercise using python.
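
For the T-test part, one possible sketch (assuming NumPy and a SciPy version that supports the `alternative` keyword) is given below. It uses plain arrays instead of a data frame, and keeps SciPy's default `equal_var=True`, i.e. a pooled-variance test; whether equal variances are a reasonable assumption here is left to the reader.

```python
import numpy as np
from scipy import stats

pop1 = np.array([118, 127, 117, 117, 126, 126, 123, 130, 120, 113], dtype=float)
pop2 = np.array([108, 123, 119, 119, 124, 116, 114, 124, 115, 114], dtype=float)

# H0: mu1 = mu2 against H1: mu1 > mu2 (right-sided test, pooled variance)
result = stats.ttest_ind(pop1, pop2, alternative="greater")

print("sample means:", pop1.mean(), pop2.mean())
print("t statistic :", result.statistic)
print("P-value     :", result.pvalue)
```

The null hypothesis is rejected at significance level α if the printed P-value is smaller than α.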

8. Paired-sample test on the mean:

Values are given (in GPa) for a high strength steel after and before the heat treatment. We
want to test whether the process of heat treatment has really led to strength enhancement.

After    Before   Difference
1.186    1.161     0.025
1.151    1.210    −0.059
1.322    1.263     0.059
1.339    1.262     0.077
1.200    1.265    −0.065
1.402    1.278     0.124
1.365    1.337     0.028
1.537    1.486     0.051
1.559    1.452     0.107

The sample average before heat treatment is 1.30 GPa and after heat treatment is 1.34 GPa.
Looking at the sample averages, it appears that the heat treatment process has marginally
increased the strength. But is this statistically significant?
(py) Repeat the above exercise using python.
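
A hedged sketch of the paired test, assuming NumPy and a SciPy version that supports the `alternative` keyword. Plain arrays are used instead of a data frame, and the right-sided alternative encodes the claim that strength after treatment is higher than before.

```python
import numpy as np
from scipy import stats

after = np.array([1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559])
before = np.array([1.161, 1.210, 1.263, 1.262, 1.265, 1.278, 1.337, 1.486, 1.452])

d = after - before                        # paired differences, one per specimen

# H0: mean difference = 0 against H1: mean difference > 0
result = stats.ttest_rel(after, before, alternative="greater")

print("mean difference:", d.mean())
print("t statistic    :", result.statistic)
print("P-value        :", result.pvalue)
```

The statistic reported by `ttest_rel` is the same as the one-sample T statistic computed on the differences `d`, which gives a convenient hand check.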

9. Goodness of fit – χ2 -test:

The manager of a workshop claims that, out of the total products manufactured in a day, 35% fall
in the excellent category, 40% fall in the good category, 20% fall in the acceptable category
and 5% fall in the rejected category. On a particular day, 500 products are randomly sampled
and it is found that 190 of them belong to the excellent, 185 to the good, 90 to the acceptable
and 35 to the rejected category. Based on this, verify the claim of the workshop manager.
(py) Repeat the above exercise using python.
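
A possible starting point for the (py) parts of items 6 to 9 is sketched below with SciPy. The variable names are illustrative, the two-sample T-test assumes equal population variances (pooled estimate), and all alternatives are taken as one-sided "greater than", as suggested by the problem statements. These choices are assumptions of this sketch and should be adapted if the course intends something different.

```python
import math
from scipy import stats

# Item 6: one-sample t-test from summary statistics, H1: mu > 0.82
xbar, s, n, mu0 = 0.83724, 0.024557, 15, 0.82
t_one = (xbar - mu0) / (s / math.sqrt(n))
p_one = stats.t.sf(t_one, df=n - 1)  # upper-tail area

# Item 7, Z-test: sigma = 8 known for both populations, H1: mu1 > mu2
z_two = (121 - 112) / math.sqrt(8**2 / 10 + 8**2 / 10)
p_z = stats.norm.sf(z_two)

# Item 7, T-test: pooled two-sample test (equal variances assumed), H1: mu1 > mu2
pop1 = [118, 127, 117, 117, 126, 126, 123, 130, 120, 113]
pop2 = [108, 123, 119, 119, 124, 116, 114, 124, 115, 114]
t_two, p_two = stats.ttest_ind(pop1, pop2, equal_var=True, alternative="greater")

# Item 8: paired test on (after - before), H1: mean difference > 0
after = [1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559]
before = [1.161, 1.210, 1.263, 1.262, 1.265, 1.278, 1.337, 1.486, 1.452]
t_pair, p_pair = stats.ttest_rel(after, before, alternative="greater")

# Item 9: chi-square goodness of fit against the claimed proportions
observed = [190, 185, 90, 35]
claimed = [0.35, 0.40, 0.20, 0.05]
expected = [p * sum(observed) for p in claimed]
chi2, p_chi2 = stats.chisquare(observed, f_exp=expected)

for name, stat, p in [("one-sample t", t_one, p_one), ("two-sample z", z_two, p_z),
                      ("two-sample t", t_two, p_two), ("paired t", t_pair, p_pair),
                      ("chi-square", chi2, p_chi2)]:
    print(f"{name}: statistic = {stat:.4f}, Pvalue = {p:.4f}")
```

Each Pvalue can then be compared with α = 0.05, and the statistics can be checked against the hand calculation.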

Lecture 11-14

1. Simple linear regression

Method of least squares

Residual analysis
Analysis of variance (ANOVA): total sum of squares (SST), regression sum of squares (SSR),
error sum of squares (SSE), mean squared error (MSE), coefficient of determination (R²)
Hypothesis test on regression coefficients
Confidence intervals on regression coefficients
Confidence intervals about the regression line

2. Binary logistic regression

Odds and logit function

Hypothesis test on regression coefficients
Confidence intervals on regression coefficients

3. Python

Building a linear and logistic regression model

– Use statsmodels, sklearn
Residual analysis and influence plot
ANOVA, hypothesis test, confidence interval

xi     yi      (xi − x̄)(yi − ȳ)   (xi − x̄)²   ŷi   (yi − ȳ)²   (ŷi − ȳ)²   (ŷi − yi)²
10.2   89.05
12.9   93.74
13.6   94.45
14.6   96.73
14.0   93.65
11.5   92.52
10.1   89.45
9.5    87.33
x̄ =?   ȳ =?    Sxy =?             Sxx =?            SST =?      SSR =?      SSE =?
MSR = SSR      MSE = σ̂² = SSE/(n − 2) =?
F = MSR/MSE =?
Hypothesis test on coeffs                      95% CI
β̂0 =?   se(β̂0) =?   t =?   Pvalue =?          [=?, =?]
β̂1 =?   se(β̂1) =?   t =?   Pvalue =?          [=?, =?]

Table 1: Find the linear regression coefficients, β̂1 = Sxy/Sxx, β̂0 = ȳ − β̂1 x̄, and complete the table.
Verify that (a) SST = SSR + SSE, (b) SSE = SST − β̂1 Sxy, (c) SSR = β̂1 Sxy. Compare the numbers
obtained so far with the output of the statsmodels command summary2().
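
The columns and summary rows of Table 1 could be filled in along the following lines. This is a minimal pure-Python sketch with illustrative variable names. The formula used for se(β̂0), √(σ̂²(1/n + x̄²/Sxx)), is the standard one for simple linear regression and is not stated in the table itself.

```python
# Data from Table 1
x = [10.2, 12.9, 13.6, 14.6, 14.0, 11.5, 10.1, 9.5]
y = [89.05, 93.74, 94.45, 96.73, 93.65, 92.52, 89.45, 87.33]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Sxx = sum((xi - xbar) ** 2 for xi in x)

# Least-squares coefficients and fitted values
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

# ANOVA quantities
SST = sum((yi - ybar) ** 2 for yi in y)
SSR = sum((yh - ybar) ** 2 for yh in yhat)
SSE = sum((yh - yi) ** 2 for yh, yi in zip(yhat, y))
MSR = SSR / 1          # one regressor
MSE = SSE / (n - 2)    # estimate of sigma^2
F = MSR / MSE

# Standard errors and t statistics of the coefficients
se_b1 = (MSE / Sxx) ** 0.5
se_b0 = (MSE * (1 / n + xbar**2 / Sxx)) ** 0.5
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SST = {SST:.4f}, SSR = {SSR:.4f}, SSE = {SSE:.4f}")
print(f"F = {F:.2f}, t(b0) = {t_b0:.2f}, t(b1) = {t_b1:.2f}")
```

The Pvalues and 95% CIs then follow from the T distribution with n − 2 degrees of freedom.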

Exercise:
Simple linear regression

1. Using the method of least squares, derive β̂0 = ȳ − β̂1 x̄ and β̂1 = Sxy/Sxx.

2. Residuals are defined as εi = yi − ŷi. What are the assumptions of the linear regression model
about the residuals?

3. Write the ANOVA identity.

4. What are the degrees of freedom of SSE, SSR, SST?

5. Define MSE, MSR, MST.

6. Write the null and alternate hypothesis for testing the regression coefficients.

7. Write the test statistic for hypothesis testing of regression coefficients.

8. Write the formula of 95% CI for regression coefficients.

9. Test the significance of regression using the following: H0: β1 = 0, H1: β1 ≠ 0, β̂1 = 14.86,
σ̂² = 0.96, Sxx = 0.92, n = 30 and significance level α = 0.05.

10. Find a 95% CI on the slope of the regression line using the following: β̂1 = 14.86, σ̂² = 0.96,
Sxx = 0.92, n = 30.

11. (py) Complete Table 1 using some spreadsheet software. Solve it separately using python and
compare the results.

12. (py) Using the data given in this link, build a linear regression model and perform some model
diagnosis like

ANOVA, R²
Residual analysis to verify residuals are normal random variables with constant variance
Influence plot

Hypothesis test and confidence intervals on regression coefficients

Binary logistic regression

1. When do we need to use logistic regression?

2. Define odds and logit function.

3. A logistic regression model is ŷ = 1/(1 + e^(−(15.4394 − 0.2373x))).

By how much does the log of odds reduce per unit increase of x?

By how much do the odds reduce per unit increase of x?
Find the value of the odds for x = 60 and x = 70.

4. Given β̂1 = −0.24 and se(β̂1 ) = 0.09. Using hypothesis test, verify the significance of regression.
Find the Pvalue , reference value of zα for α = 0.05 and 95% CI.
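
The last two exercises can be checked numerically with the standard library alone. The sketch below assumes the usual definitions, odds = ŷ/(1 − ŷ) and logit = ln(odds), so that the logit of the model above is 15.4394 − 0.2373x. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Exercise 3: logit(x) = b0 + b1 * x for the given model
b0, b1 = 15.4394, -0.2373

def odds(x):
    """Odds y/(1 - y) of the logistic model at x."""
    return math.exp(b0 + b1 * x)

log_odds_change = b1          # change in log of odds per unit increase of x
odds_ratio = math.exp(b1)     # factor by which the odds change per unit increase of x
odds_60, odds_70 = odds(60), odds(70)

# Exercise 4: Z-test on the slope and 95% CI
b1_hat, se_b1 = -0.24, 0.09
z = b1_hat / se_b1
p_value = 2 * NormalDist().cdf(-abs(z))      # two-sided Pvalue
z_ref = NormalDist().inv_cdf(0.975)          # reference value for alpha = 0.05
ci = (b1_hat - z_ref * se_b1, b1_hat + z_ref * se_b1)

print(f"odds ratio per unit x = {odds_ratio:.4f}")
print(f"odds(60) = {odds_60:.4f}, odds(70) = {odds_70:.4f}")
print(f"z = {z:.3f}, Pvalue = {p_value:.4f}, CI = [{ci[0]:.4f}, {ci[1]:.4f}]")
```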

CI cheat sheet: We want to construct a CI on some population parameter. Define a new random
variable (RV)
RV = (θ̂ − θ)/se(θ̂),
where θ is a population parameter, θ̂ is the corresponding point estimate and se(θ̂) is the standard
error/estimated standard error of the estimate. For example, if we want a CI on the population mean µ,
the point estimate is the sample average X̄. The following is the list of all cases covered in this course.
CI on µ, σ known
    −Zα ≤ (X̄ − µ)/se(X̄) ≤ Zα; Zα = 1.96 for 95% CI
    X̄ − Zα · se(X̄) ≤ µ ≤ X̄ + Zα · se(X̄)
    Sample average X̄, size n, population standard deviation σ, standard error se(X̄) = σ/√n

CI on µ, σ unknown
    −Tα ≤ (X̄ − µ)/se(X̄) ≤ Tα; Tα = T0.025,n−1 for 95% CI
    X̄ − Tα · se(X̄) ≤ µ ≤ X̄ + Tα · se(X̄)
    Sample average X̄, size n, sample standard deviation S, estimated standard error se(X̄) = S/√n

CI on β1 (linear)
    −Tα ≤ (β̂1 − β1)/se(β̂1) ≤ Tα; Tα = T0.025,n−2 for 95% CI
    β̂1 − Tα · se(β̂1) ≤ β1 ≤ β̂1 + Tα · se(β̂1)
    Linear regression slope β̂1, size n, estimated standard error se(β̂1) = σ̂/√Sxx

CI on β1 (logistic)
    −Zα ≤ (β̂1 − β1)/se(β̂1) ≤ Zα; Zα = 1.96 for 95% CI
    β̂1 − Zα · se(β̂1) ≤ β1 ≤ β̂1 + Zα · se(β̂1)
    Logistic regression slope β̂1, estimated standard error se(β̂1)

Example: Given X̄ = 10, σ² = 9, n = 9. Find the 95% CI.

Answer: For 95% CI, Zα = 1.96 and the CI is [8.04, 11.96].

Example: Given X̄ = 10, S² = 9, n = 9. Find the 95% CI.

Answer: For 95% CI, Tα = T0.025,8 = 2.306 and the CI is [7.694, 12.306]. Note that the CI is
wider than in the previous case, although the rest of the parameters are the same. Instead of the
population variance, we are using the sample variance. Thus, we have additional randomness and the
extra width accounts for it.

Example: In a linear regression problem, you found the slope to be β̂1 = 10, SSE = 7, Sxx = 1, n = 9.
Find the 95% CI on the slope.
Answer: Mean square error MSE = σ̂² = SSE/(n − 2) = 1 and standard error se(β̂1) = σ̂/√Sxx = 1. For 95%
CI, T0.025,7 = 2.365. Thus, the 95% CI on the slope is [7.635, 12.365]. Since the CI on the slope does not
contain 0, there is strong evidence that the slope is not zero. If the interval contained 0, we could
not reject β1 = 0, i.e., the data would give no significant evidence of a linear relationship.

Example: In a logistic regression problem, you find β̂1 = −0.24 and se(β̂1) = 0.09. Find the 95% CI
on the slope.
Answer: The reference value of Z for 95% CI is z0.025 = 1.96. Thus, the 95% CI on the slope is
[−0.4164, −0.0636]. Since the CI on the slope does not contain 0, there is strong evidence that the slope is
not zero. If the interval contained 0, we could not reject β1 = 0, i.e., the regression would not be
significant.
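
All four examples above share the pattern estimate ± reference value × standard error. The sketch below reproduces them with a small helper whose name, confidence_interval, is purely illustrative. The reference values 1.96, 2.306 and 2.365 are the ones quoted in the answers.

```python
import math

def confidence_interval(estimate, std_error, ref_value):
    """Return (lower, upper) = estimate -/+ ref_value * std_error."""
    half_width = ref_value * std_error
    return (estimate - half_width, estimate + half_width)

# CI on mu, sigma known: X = 10, sigma^2 = 9, n = 9
ci_z = confidence_interval(10, math.sqrt(9) / math.sqrt(9), 1.96)

# CI on mu, sigma unknown: X = 10, S^2 = 9, n = 9, T(0.025, 8) = 2.306
ci_t = confidence_interval(10, math.sqrt(9) / math.sqrt(9), 2.306)

# CI on the linear regression slope: SSE = 7, Sxx = 1, n = 9, T(0.025, 7) = 2.365
mse = 7 / (9 - 2)
ci_slope = confidence_interval(10, math.sqrt(mse / 1), 2.365)

# CI on the logistic regression slope
ci_logit = confidence_interval(-0.24, 0.09, 1.96)

for label, ci in [("mu, sigma known", ci_z), ("mu, sigma unknown", ci_t),
                  ("linear slope", ci_slope), ("logistic slope", ci_logit)]:
    print(f"{label}: [{ci[0]:.4f}, {ci[1]:.4f}]")
```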

Hypothesis test cheat sheet: We want to test the null hypothesis H0: θ = θ0. The test statistic
is
Statistic = (θ̂ − θ0)/se(θ̂),
where θ0 is the null hypothesis (guess value of the population parameter), θ̂ is the corresponding point
estimate and se(θ̂) is the standard error/estimated standard error of the estimate. For example, if we
want to test the population mean µ, the point estimate is the sample average X̄. The following is the
list of all cases covered in this course.

H0: µ = µ0; σ known
    Z = (X̄ − µ0)/se(X̄); N(0, 1) distribution
    Sample average X̄, sample size n, population standard deviation σ, standard error se(X̄) = σ/√n

H0: µ = µ0; σ unknown
    T = (X̄ − µ0)/se(X̄); T distribution with n − 1 DOF
    Sample average X̄, sample size n, sample standard deviation S, estimated standard error se(X̄) = S/√n

H0: µ1 − µ2 = ∆; σ1, σ2 known (two-sample)
    Z = (X̄1 − X̄2 − ∆)/se(X̄1 − X̄2); N(0, 1) distribution
    Sample averages X̄1, X̄2, sample sizes n1, n2, standard error se(X̄1 − X̄2) = √(σ1²/n1 + σ2²/n2)

H0: µD = ∆ (paired-sample)
    T = (D̄ − ∆)/se(D̄); T distribution with n − 1 DOF
    Paired-sample average D̄, sample size n, estimated standard error se(D̄) = SD/√n

H0: β1 = 0 (linear)
    T = (β̂1 − 0)/se(β̂1); T distribution with n − 2 DOF
    Linear regression slope β̂1, sample size n, estimated standard error se(β̂1) = σ̂/√Sxx

H0: β1 = 0 (logistic)
    Z = (β̂1 − 0)/se(β̂1); N(0, 1) distribution
    Logistic regression slope β̂1, estimated standard error se(β̂1)

Hypothesis test: Z-statistic

Test                              Alternative      Condition for accepting H0 using Z-statistic
Fixed significance level test (α = 0.05)
  Right sided or upper tailed     H1: µ > µ0       Z < Z0.95
  Left sided or lower tailed      H1: µ < µ0       Z > Z0.05
  Two sided test                  H1: µ ≠ µ0       Z0.025 < Z < Z0.975
Pvalue test
  Right sided or upper tailed     H1: µ > µ0       Pvalue > 0.05
  Left sided or lower tailed      H1: µ < µ0       Pvalue > 0.05
  Two sided test                  H1: µ ≠ µ0       Pvalue > 0.05

Hypothesis test: T-statistic

Test                              Alternative      Condition for accepting H0 using T-statistic
Fixed significance level test (α = 0.05)
  Right sided or upper tailed     H1: µ > µ0       T < T0.95,n−1
  Left sided or lower tailed      H1: µ < µ0       T > T0.05,n−1
  Two sided test                  H1: µ ≠ µ0       T0.025,n−1 < T < T0.975,n−1
Pvalue test
  Right sided or upper tailed     H1: µ > µ0       Pvalue > 0.05
  Left sided or lower tailed      H1: µ < µ0       Pvalue > 0.05
  Two sided test                  H1: µ ≠ µ0       Pvalue > 0.05

After calculating the statistic (Z-statistic/ T-statistic), we either do fixed significance level test or
Pvalue test. Generally acceptable value of significance level is α = 0.05.
Reference Z-values for one-sided test (α = 0.05): Z0.95 = 1.65, Z0.05 = −1.65
Reference Z-values for two-sided test (α = 0.05): Z0.975 = 1.96, Z0.025 = −1.96

Example: Given X̄ = 11.5, σ² = 9, n = 9, test the hypothesis H0: µ = 10 against the alternative
H1: µ > 10. Assume fixed significance level α = 0.05.
Answer: It is an upper tailed test. Reference value is Z0.95 = 1.65. Test statistic is
Z = (11.5 − 10)/(3/√9) = 1.5. Since Z < 1.65, we accept the null hypothesis.
The same problem can be solved using the Pvalue test. Test statistic is Z = (11.5 − 10)/(3/√9) = 1.5.
Thus, Pvalue = 1 − Φ(1.5) = 1 − 0.93 = 0.07. Since Pvalue > 0.05, we accept the null hypothesis. Note
that tests of hypothesis using the Z-score and the Pvalue are equivalent.

Example: Given X̄ = 8, σ² = 9, n = 9, test the hypothesis H0: µ = 10 against the alternative
H1: µ < 10. Assume fixed significance level α = 0.05.
Answer: It is a lower tailed test. Reference value is Z0.05 = −1.65. Test statistic is
Z = (8 − 10)/(3/√9) = −2. Since Z < −1.65, we reject the null hypothesis.
The same problem can be solved using the Pvalue test. Test statistic is Z = (8 − 10)/(3/√9) = −2.
Thus, Pvalue = Φ(−2) = 0.02. Since Pvalue < 0.05, we reject the null hypothesis. Note that tests of
hypothesis using the Z-score and the Pvalue are equivalent.

Example: Given X̄ = 11, σ² = 9, n = 9, test the hypothesis H0: µ = 10 against the alternative
H1: µ ≠ 10. Assume fixed significance level α = 0.05.
Answer: It is a two-sided test. Reference values are Z0.025 = −1.96 and Z0.975 = 1.96. Test statistic
is Z = (11 − 10)/(3/√9) = 1. Since −1.96 < Z < 1.96, we accept the null hypothesis.
The same problem can be solved using the Pvalue test. Test statistic is Z = (11 − 10)/(3/√9) = 1.
Thus, Pvalue = 1 − [Φ(1) − Φ(−1)] = 2 × Φ(−1) = 0.32. Since Pvalue > 0.05, we accept the null
hypothesis. Note that tests of hypothesis using the Z-score and the Pvalue are equivalent.

Example: Given X̄ = 11.5, S² = 9, n = 9, test the hypothesis H0: µ = 10 against the alternative
H1: µ > 10. Assume fixed significance level α = 0.05.
Answer: It is an upper tailed test. Reference value is T0.95,8 = 1.86. Test statistic is
T = (11.5 − 10)/(3/√9) = 1.5. Since T < 1.86, we accept the null hypothesis. Note that, for the same
set of data, the reference Z-value would be 1.65. The reference T value is higher, which takes care of
the additional randomness.
The same problem can be solved using the Pvalue test. Test statistic is T = (11.5 − 10)/(3/√9) = 1.5.
For DOF=8, this falls between T=1.860 (α = 0.05) and T=1.397 (α = 0.10). Thus, 0.05 < Pvalue < 0.10.
Since Pvalue > 0.05, we accept the null hypothesis.

Example: Given X̄ = 8, S² = 9, n = 9, test the hypothesis H0: µ = 10 against the alternative
H1: µ < 10. Assume fixed significance level α = 0.05.
Answer: It is a lower tailed test. Reference value is T0.05,8 = −1.86. Test statistic is
T = (8 − 10)/(3/√9) = −2. Since T < −1.86, we reject the null hypothesis.
The same problem can be solved using the Pvalue test. Test statistic is T = (8 − 10)/(3/√9) = −2.
For DOF=8, this falls between T=−1.860 (α = 0.05) and T=−2.306 (α = 0.025). Thus,
0.025 < Pvalue < 0.05. Since Pvalue < 0.05, we reject the null hypothesis.

Example: Given X̄ = 11, S² = 9, n = 9, test the hypothesis H0: µ = 10 against the alternative
H1: µ ≠ 10. Assume fixed significance level α = 0.05.
Answer: It is a two-sided test. Reference values are T0.025,8 = −2.306 and T0.975,8 = 2.306. Test
statistic is T = (11 − 10)/(3/√9) = 1. Since −2.306 < T < 2.306, we accept the null hypothesis.
The same problem can be solved using the Pvalue test. Test statistic is T = (11 − 10)/(3/√9) = 1.
For DOF=8, this falls between T=0.706 (α = 0.25) and T=1.397 (α = 0.10), so the two-sided Pvalue lies
between 2 × 0.10 and 2 × 0.25. Since Pvalue > 2 × 0.10, we accept the null hypothesis.
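
The six worked examples above can be reproduced with SciPy instead of the tables. In the sketch below the helper name mean_test and its argument names are illustrative. It returns the statistic and the Pvalue for a test on the mean, using N(0, 1) when σ is known and the T distribution with n − 1 DOF otherwise.

```python
import math
from scipy import stats

def mean_test(xbar, sd, n, mu0, alternative, sigma_known):
    """Return (statistic, Pvalue) for H0: mu = mu0.

    alternative is "greater", "less" or "two-sided"; sd is sigma if
    sigma_known is True, otherwise the sample standard deviation S.
    """
    stat = (xbar - mu0) / (sd / math.sqrt(n))
    dist = stats.norm if sigma_known else stats.t(df=n - 1)
    if alternative == "greater":
        p = dist.sf(stat)             # upper-tail area
    elif alternative == "less":
        p = dist.cdf(stat)            # lower-tail area
    else:
        p = 2 * dist.sf(abs(stat))    # both tails
    return stat, p

cases = [(11.5, "greater"), (8, "less"), (11, "two-sided")]
results = {}
for sigma_known in (True, False):
    for xbar, alt in cases:
        stat, p = mean_test(xbar, 3, 9, 10, alt, sigma_known)
        results[(sigma_known, alt)] = (stat, p)
        label = "Z" if sigma_known else "T"
        verdict = "accept H0" if p > 0.05 else "reject H0"
        print(f"{label}, {alt}: statistic = {stat:.2f}, Pvalue = {p:.4f}, {verdict}")
```

The fixed significance level test gives the same decisions when the statistic is compared with the reference values stats.norm.ppf(...) or stats.t.ppf(..., df=8).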

Lecture 15-18

1. Multiple linear regression

Method of least squares

Matrix representation
ANOVA, Coefficient of determination
Hypothesis testing and confidence interval

2. Classification problem

Odds and logit function

Hypothesis test on regression coefficients
Confidence intervals on regression coefficients
Confusion matrix and related parameters

3. Python

Building and analyzing a linear regression model

– Use statsmodels, sklearn
– Feature selection: correlation or heat map
– Model analysis: residual analysis and influence plot
– Model analysis: ANOVA, hypothesis test, confidence interval
Building and analyzing a regression model for classification
– Use statsmodels, sklearn
– Feature selection
– Model analysis: hypothesis test, confidence interval
– Model analysis: confusion matrix

Exercise:
Multiple linear regression

Obs. no.   yi   xi1   xi2   ŷi   εi² = (yi − ŷi)²   (ŷi − ȳ)²
1          24   8     110
2          32   11    120
3          35   10    550
4          25   8     295
5          45   15    250
6          24   9     100
7          27   8     300
8          37   11    400
9          42   12    500
10         35   10    540
ȳ =?       SSE = εᵀε =?                  SSR = (ŷ − ȳ)ᵀ(ŷ − ȳ) =?
           MSE = SSE/(n − k − 1) =?      MSR = SSR/k =?
β̂0 =?   β̂1 =?   β̂2 =?                 F = MSR/MSE =?

Table 2: Data given for multiple linear regression problem.
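
A minimal NumPy sketch of the matrix method for Table 2 is given below. It assumes the usual design matrix X with a leading column of ones and solves the normal equations (XᵀX)β̂ = Xᵀy. Variable names are illustrative.

```python
import numpy as np

# Data from Table 2
y = np.array([24, 32, 35, 25, 45, 24, 27, 37, 42, 35], dtype=float)
x1 = np.array([8, 11, 10, 8, 15, 9, 8, 11, 12, 10], dtype=float)
x2 = np.array([110, 120, 550, 295, 250, 100, 300, 400, 500, 540], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])   # design matrix [1, x1, x2]

# Least-squares estimate from the normal equations (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta
resid = y - y_hat

SSE = resid @ resid
SSR = (y_hat - y.mean()) @ (y_hat - y.mean())
MSE = SSE / (n - k - 1)
MSR = SSR / k
F = MSR / MSE
R2 = SSR / (SSR + SSE)

print("beta =", np.round(beta, 4))
print(f"SSE = {SSE:.2f}, SSR = {SSR:.2f}, F = {F:.2f}, R2 = {R2:.4f}")
```

For larger or ill-conditioned problems, np.linalg.lstsq(X, y, rcond=None) is a more robust alternative to forming XᵀX explicitly.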

1. (py) Complete Table 2 using matrix method.

2. (py) Use the data given in Table 2 and fit the regression model using sklearn and statsmodels.
Compare the regression coefficients.

3. Given SSE = 9.02, SSR = 501.38, total number of data points n = 10 and total number of
features k = 2. Find the F-statistic and coefficient of determination R2 .

4. In a multiple linear regression problem, there are k = 2 feature variables and n = 10 observa-
tions. Estimated regression coefficients and standard errors are β̂1 = 2.9, β̂2 = 0.02, se(β̂1 ) =
0.17, se(β̂2 ) = 0.002. Verify the significance of regression.

5. (py) Using the data given in this link, do the following

Build a linear regression model after properly selecting the important features using a cor-
relation or heat map
ANOVA, R²
Residual analysis to verify residuals are normal random variables with constant variance
Influence plot
Hypothesis test and confidence intervals on regression coefficients

Classification
1. (py) Using the data given in this link, do the following

Build a regression model after properly selecting the important features

Hypothesis test and confidence intervals on regression coefficients
Confusion matrix

2. From the confusion matrix obtained above, calculate precision, recall, F1-score and accuracy.
What is your overall conclusion about the regression model?

Lecture 19-20 Two lectures are kept for case studies and any other general discussion.

Lecture 21-25
1. Taylor series expansion

2. Solution of non-linear equations

Bisection method
Relaxation method
Newton-Raphson method
Order of an iterative method

3. Solution of linear equations

Non-iterative method (Gaussian elimination)

Iterative methods (Jacobi and Gauss-Seidel)
Systematic formulation of iterative solution of Ax = b
Convergence criteria for iterative methods

4. Python

Codes for bisection, relaxation, and Newton-Raphson methods

Codes for Jacobi and Gauss-Seidel method
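
The three root-finding methods listed in this plan could be coded roughly as follows. This is a sketch, not a reference implementation: the function names, tolerances, starting points and bracketing intervals are assumptions chosen for illustration, and the test equations x³ − 15x − 4 = 0, x³ + x − 1 = 0 and x − 2 sin x = 0 are taken from the exercises of this lecture block.

```python
import math

def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Root of f in [a, b]; f(a) and f(b) must have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == 0 or 0.5 * (b - a) < tol:
            return mid
        if fa * fm < 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)

def relaxation(g, x0, tol=1e-10, max_iter=500):
    """Iterate x = g(x); return (x, converged)."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, True
        x = x_new
    return x, False

def newton_raphson(f, dfdx, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until the step is below tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Bisection: x^3 - 15x - 4 = 0, with hand-picked brackets around each sign change
cubic = lambda x: x**3 - 15 * x - 4
roots = [bisection(cubic, a, b) for a, b in [(-5, -3), (-1, 0), (3, 5)]]

# Relaxation: two rearrangements of x^3 + x - 1 = 0
x_good, ok_good = relaxation(lambda x: 1 / (1 + x * x), 0.5)
x_bad, ok_bad = relaxation(lambda x: 1 - x**3, 0.5)

# Newton-Raphson: non-trivial root of x - 2 sin x = 0, starting from x0 = 2
root_newton = newton_raphson(lambda x: x - 2 * math.sin(x),
                             lambda x: 1 - 2 * math.cos(x), 2.0)

print("bisection roots:", [round(r, 6) for r in roots])
print("relaxation:", round(x_good, 6), ok_good, "|", ok_bad)
print("Newton-Raphson:", round(root_newton, 6))
```

Whether a relaxation form converges depends on |g′(x)| near the root, which is why the two rearrangements behave differently.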

Exercise:

1. Do the Taylor series expansion about an extremum point.

2. Find the Maclaurin series expansion of 1/(1 − x).

3. Find the Maclaurin series expansion of e^x, cos x, sin x.

4. (py) Write a python code to find all the roots of x³ − 15x − 4 = 0, using bisection method.

5. (py) Write a python code to find the root of x = e^(−x²), using relaxation method. Verify first few
steps by hand (using a scientific calculator).

6. Find both the roots of f(x) = x² − 3x + 1 = 0 using the relaxation method.

7. (py) Write a code to find the roots of f(x) = x² − 3x + 1 = 0 using the relaxation method.

8. Convince yourself that f(x) = x³ + x − 1 = 0 has a root between 0 and 1. Try to find the root
by relaxation method using x = 1/(1 + x²) and using x = 1 − x³. Which one works and why?

9. (py) Write a python code to solve the above problem.

10. Solve f(x) = x − 2 sin x = 0 using Newton-Raphson method.

11. (py) Write a python code to solve the above problem.

12. Solve f(x) = x³ − 15x − 4 = 0 using Newton-Raphson method.

13. (py) Write a python code to solve the above problem.

14. Prove that Newton-Raphson method is at least a second order method.

15. Solve the following by Gaussian elimination and back substitution.

3x0 + 3x1 + x2 = 12, 2x0 + x1 + 2x2 = 10, x0 + 2x1 + 3x2 = 14.

16. Solve the following by Jacobi and Gauss-Seidel method

4x0 + 2x1 + 3x2 = 8, 3x0 − 5x1 + 2x2 = −14, −2x0 + 3x1 + 8x2 = 27.

17. Consider the iteration x^(m+1) = P x^(m) + q. Derive the form of P and q for the Jacobi and
Gauss-Seidel methods in terms of D, U and L.
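
For the Jacobi and Gauss-Seidel exercise, a component-wise implementation might look like the sketch below. The function names, the zero starting vector and the stopping rule (largest change in any component below tol) are assumptions of this sketch. The system is the one given in exercise 16.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=1000):
    """Jacobi iteration: every component is updated from the previous iterate."""
    n = len(b)
    x = list(x0)
    for it in range(1, max_iter + 1):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, it
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")

def gauss_seidel(A, b, x0, tol=1e-10, max_iter=1000):
    """Gauss-Seidel iteration: updated components are used immediately."""
    n = len(b)
    x = list(x0)
    for it in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / A[i][i]
            change = max(change, abs(new_xi - x[i]))
            x[i] = new_xi
        if change < tol:
            return x, it
    raise RuntimeError("Gauss-Seidel iteration did not converge")

A = [[4, 2, 3],
     [3, -5, 2],
     [-2, 3, 8]]
b = [8, -14, 27]

x_jacobi, n_jacobi = jacobi(A, b, [0.0, 0.0, 0.0])
x_gs, n_gs = gauss_seidel(A, b, [0.0, 0.0, 0.0])
print("Jacobi:      ", [round(v, 6) for v in x_jacobi], "iterations:", n_jacobi)
print("Gauss-Seidel:", [round(v, 6) for v in x_gs], "iterations:", n_gs)
```

Comparing the two iteration counts gives a practical feel for the convergence criteria discussed in the lectures.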

Lecture 26-30

1. Numerical integration

Rectangle method
Trapezoidal method
Gauss integration method
Error analysis

2. Numerical differentiation

First derivative: forward, backward and central derivative

Second and higher derivatives
Error analysis

3. Python

Codes for calculating integrals and derivatives
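
The integration rules in this plan can be sketched in a few lines of pure Python. The function names are illustrative, and the rectangle rule is taken here as the left-endpoint version, which is an assumption since the plan does not say which variant is meant. The two-point Gauss rule uses the standard nodes ±1/√3 with unit weights, mapped from [−1, 1] to [a, b].

```python
import math

def rectangle_left(f, a, b, n):
    """Left-endpoint rectangle rule with n equal slices."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def trapezoid(f, a, b, n):
    """Trapezoidal rule with n equal slices."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def gauss_legendre_2(f, a, b):
    """Two-point Gauss-Legendre rule on [a, b]."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    t = 1 / math.sqrt(3)
    return half * (f(mid - half * t) + f(mid + half * t))

exact_02 = math.exp(2) - 1              # integral of e^x from 0 to 2
exact_11 = math.exp(1) - math.exp(-1)   # integral of e^x from -1 to 1

I_rect = rectangle_left(math.exp, 0, 2, 4)
I_trap = trapezoid(math.exp, 0, 2, 4)
I_gauss = gauss_legendre_2(math.exp, -1, 1)

print(f"rectangle: {I_rect:.6f}, error = {abs(I_rect - exact_02):.6f}")
print(f"trapezoid: {I_trap:.6f}, error = {abs(I_trap - exact_02):.6f}")
print(f"Gauss n=2: {I_gauss:.6f}, error = {abs(I_gauss - exact_11):.6f}")
```

Repeating the calculation with N doubled shows how quickly each error shrinks, which connects to the error analysis item above.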

Exercise:

1. Derive the upper bound of total error for rectangle method.

2. Evaluate ∫_0^2 e^x dx using the rectangle method, with N = 4. Find the upper bound of the total
error. How would you ensure that the error does not exceed 0.001?

3. Evaluate ∫_0^2 e^x dx using the trapezoidal method, with N = 4. Find the upper bound of the total
error. How would you ensure that the error does not exceed 0.001?

4. Evaluate ∫_{−1}^{1} e^x dx using the Gauss integration method, with n = 2.

5. Evaluate ∫_0^2 e^x dx using the Gauss integration method, with n = 5.

6. Employing a Gaussian integration scheme with n = 3, evaluate ∫_{−1}^{1} cos x dx. Calculate the error.
If you have to use the rectangular method to calculate the integral numerically, what would be the
required number of rectangles to get a similar error?

7. (py) Write a code to evaluate I = ∫_0^2 e^(−x²) dx using trapezoidal rule. Start with h = 0.5 and keep
halving the interval, i.e., h = 0.25, h = 0.125 etc. Plot the value of I_{h/2} − I_h, as you change the
value of h.

8. Using the Taylor series expansion, derive the forward, backward and central difference formula to
calculate the first derivative. Identify the first and second order method(s).

9. Derive a fourth order formula for calculating first derivative.

10. Derive formula for second and higher derivatives.

11. (py) Write codes to calculate the first derivative of x³/3 using central difference and forward
difference in the range [−1, 1]. Plot the quadratic and linear dependence of error on h in central
and forward difference.

12. Calculate numerical derivative of x³/3 at x = 1, using h = 0.1. Calculate the error and compare
with what you get from the code.

13. (py) Write a code to calculate the second derivative of x³/3 in the range [−1, 1]. Plot the dependence
of error on h.
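
The difference formulas used in the last three exercises could be coded as below. The function names are illustrative, and the three-point formula for the second derivative is the standard central one, assumed here because the exercise does not name a specific scheme.

```python
def forward_diff(f, x, h):
    """First derivative, forward difference (first order in h)."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """First derivative, central difference (second order in h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_diff(f, x, h):
    """Second derivative, three-point central difference."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

f = lambda x: x**3 / 3      # exact derivatives: f'(x) = x^2, f''(x) = 2x
x0 = 1.0

for h in (0.1, 0.05, 0.025):
    err_fwd = abs(forward_diff(f, x0, h) - x0**2)
    err_cen = abs(central_diff(f, x0, h) - x0**2)
    err_2nd = abs(second_diff(f, x0, h) - 2 * x0)
    print(f"h = {h:<6} forward: {err_fwd:.6f}  central: {err_cen:.6f}  second: {err_2nd:.2e}")
```

Halving h should roughly halve the forward-difference error and quarter the central-difference error. For this cubic the three-point second derivative is exact up to rounding, so a different test function may be more informative for the error plot.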

Lecture 31-33

1. Ordinary differential equation

Explicit method
Implicit method

2. Python

Python codes for explicit and implicit method

Exercise:

1. Solve ẋ = ax by Euler forward and Euler backward method. Do stability analysis and comment.

2. (py) Write codes to solve ẋ = ax by Euler forward and Euler backward method. Take a = −10.

3. Solve ẋ = x² − 100x using x(0) = 10 and ∆t = 0.001 using forward Euler method.

4. (py) Write a code to solve the above problem.

5. Solve ẋ = x² − 100x using x(0) = 10 and ∆t = 0.02 using backward Euler method.

6. (py) Write a code to solve the above problem.

7. What is the advantage of Euler backward method over Euler forward method?
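
For the linear test problem ẋ = ax, both Euler schemes reduce to a multiplication per step, so their stability can be explored with a very short sketch. The function names and the initial value x(0) = 1 are assumptions made for illustration. The backward update follows from solving x_{n+1} = x_n + ∆t · a · x_{n+1} for x_{n+1}.

```python
import math

def euler_forward(a, x0, dt, n_steps):
    """Explicit Euler for x' = a x: x_{n+1} = (1 + a dt) x_n."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * a * xs[-1])
    return xs

def euler_backward(a, x0, dt, n_steps):
    """Implicit Euler for x' = a x: x_{n+1} = x_n / (1 - a dt)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] / (1 - a * dt))
    return xs

a, x0 = -10.0, 1.0

# Small step: both schemes follow the exact solution exp(a t)
fwd_small = euler_forward(a, x0, 0.001, 1000)
bwd_small = euler_backward(a, x0, 0.001, 1000)
print("t = 1:", fwd_small[-1], bwd_small[-1], math.exp(a * 1.0))

# Large step: |1 + a dt| = 1.5 > 1, so the explicit scheme grows without bound
fwd_large = euler_forward(a, x0, 0.25, 20)
bwd_large = euler_backward(a, x0, 0.25, 20)
print("dt = 0.25:", fwd_large[-1], bwd_large[-1])
```

For a nonlinear right-hand side such as x² − 100x, the backward step requires solving an algebraic equation for x_{n+1} at every step, for example with Newton-Raphson.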

Lecture 34-40

1. Partial differential equation

Classification - parabolic, hyperbolic, elliptic equations

Some famous equations - Laplace’s equation, diffusion or heat equation, wave equation
Diffusion or heat equation
– Analytical solution via separation of variables
– Numerical solution - explicit or FTCS method
– Numerical solution - Crank-Nicolson method

2. Python

Python code for FTCS method

Exercise:

1. Using separation of variables, solve diffusion or heat equation with initial condition f (x, t = 0) =
50 and boundary conditions f (x = 0, t) = f (x = l, t) = 0.

2. (py) Using a python code, plot the solution obtained in the previous problem for different values
of t.

3. Consider a metal bar of length 1 and α² = 1 in the heat equation. Both the ends of the bar are kept
at temperature 0 °C. At time t = 0, the temperature distribution in the bar is f(x, t = 0) = sin πx.
Applying the FTCS method with h = 0.2 and r = 1/2, find the temperature f(x, t) in the bar
when t > 0.

4. Consider a metal bar of length 1 and α² = 1 in the heat equation. Both the ends of the bar are kept
at temperature 0 °C. At time t = 0, the temperature distribution in the bar is f(x, t = 0) = sin πx.
Applying the CN method with h = 0.2 and r = 1, find the temperature f(x, t) in the bar when
t > 0.

5. (py) Write a FTCS code for solving the diffusion equation.
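
A minimal FTCS sketch for the metal-bar problem (h = 0.2, r = 1/2, f(x, 0) = sin πx) is shown below. It assumes the usual definition r = α²∆t/h², so that ∆t = r h²/α² = 0.02 here, and the update f_i^{n+1} = f_i^n + r (f_{i+1}^n − 2 f_i^n + f_{i−1}^n) with both ends held at 0. The function and variable names are illustrative.

```python
import math

def ftcs_heat(f0, r, n_steps):
    """Advance the heat equation with FTCS; end points are held at 0.

    f0 is the initial profile on a uniform grid, r = alpha^2 dt / h^2.
    Returns the list of profiles, one per time level.
    """
    u = list(f0)
    history = [u]
    for _ in range(n_steps):
        new = [0.0] * len(u)
        for i in range(1, len(u) - 1):
            new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
        u = new
        history.append(u)
    return history

h, r, alpha2 = 0.2, 0.5, 1.0
dt = r * h**2 / alpha2
x = [i * h for i in range(6)]               # grid on [0, 1]
f0 = [math.sin(math.pi * xi) for xi in x]
f0[0] = f0[-1] = 0.0                        # enforce the boundary values exactly

history = ftcs_heat(f0, r, n_steps=10)
for n in (0, 1, 5, 10):
    t = n * dt
    exact = [math.exp(-math.pi**2 * alpha2 * t) * math.sin(math.pi * xi) for xi in x]
    print(f"t = {t:.2f}  FTCS: {[round(v, 4) for v in history[n]]}")
    print(f"          exact: {[round(v, 4) for v in exact]}")
```

The printed comparison uses the separation-of-variables solution e^(−π²t) sin πx for this initial condition. Trying r > 1/2 shows the instability of the explicit scheme.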

Midterm Exam Review Guide
No ratings yet
Midterm Exam Review Guide
7 pages
Discrete Random Variables and Probability Distributions
No ratings yet
Discrete Random Variables and Probability Distributions
4 pages
1.017/1.010 Class 7 Random Variables and Probability Distributions
No ratings yet
1.017/1.010 Class 7 Random Variables and Probability Distributions
3 pages
Discrete Probability Distributions Guide
No ratings yet
Discrete Probability Distributions Guide
23 pages
Statistical Techniques-II - Complete Notes With Solved Examples
No ratings yet
Statistical Techniques-II - Complete Notes With Solved Examples
11 pages
3200chap3 wk6
No ratings yet
3200chap3 wk6
28 pages
QT I (Probability Dist)
No ratings yet
QT I (Probability Dist)
22 pages
Random Variables & Distributions Guide
No ratings yet
Random Variables & Distributions Guide
5 pages
Output
No ratings yet
Output
6 pages
Chapter 2
No ratings yet
Chapter 2
8 pages
Chapter 3-2857
No ratings yet
Chapter 3-2857
8 pages
Random Variables Distributions
No ratings yet
Random Variables Distributions
36 pages
MA1201 Probability Notes
No ratings yet
MA1201 Probability Notes
30 pages
PTST Group3 WR
No ratings yet
PTST Group3 WR
18 pages
Chap2 Discrete Distributions
No ratings yet
Chap2 Discrete Distributions
22 pages
0 Deep Learning Fundamentals of Probability Theory
No ratings yet
0 Deep Learning Fundamentals of Probability Theory
31 pages
F (X) Is Reviewed
No ratings yet
F (X) Is Reviewed
18 pages
Lecture 3 - Probability - BMSLec02
No ratings yet
Lecture 3 - Probability - BMSLec02
16 pages
3 Discrete Random Variables and Probability Distributions
No ratings yet
3 Discrete Random Variables and Probability Distributions
22 pages
Print
No ratings yet
Print
12 pages
SlidesCourse 14 Oct
No ratings yet
SlidesCourse 14 Oct
10 pages
RM2
No ratings yet
RM2
102 pages
S2 Vol2 Jointcontsdistributions
No ratings yet
S2 Vol2 Jointcontsdistributions
81 pages
Exam P Review Sheet
No ratings yet
Exam P Review Sheet
12 pages
Random Variables & Probability Distributions
No ratings yet
Random Variables & Probability Distributions
7 pages
Chapter 2 - Lec 3-4
No ratings yet
Chapter 2 - Lec 3-4
57 pages
Section06 Solutions
No ratings yet
Section06 Solutions
11 pages
Discrete Random Vaiable
No ratings yet
Discrete Random Vaiable
13 pages
Random Variables
No ratings yet
Random Variables
26 pages
0.1. Probability Review
No ratings yet
0.1. Probability Review
6 pages
CHP 5
No ratings yet
CHP 5
63 pages
Math2101Stat 4
No ratings yet
Math2101Stat 4
15 pages
3 Probability Mass Function
No ratings yet
3 Probability Mass Function
51 pages
Probability FoundationalMathofAI S24
No ratings yet
Probability FoundationalMathofAI S24
7 pages
Random Variables
No ratings yet
Random Variables
9 pages
Stat 130n Answers To The LAs in Lessons 3.1-3.3
No ratings yet
Stat 130n Answers To The LAs in Lessons 3.1-3.3
18 pages
ENENDA30 - Module 3
No ratings yet
ENENDA30 - Module 3
48 pages
Chapter 3
100% (1)
Chapter 3
9 pages
Lecture 3 - Adv. Probability - Discrete Random Variables
No ratings yet
Lecture 3 - Adv. Probability - Discrete Random Variables
51 pages
Sta 2110 Lectures Notes
No ratings yet
Sta 2110 Lectures Notes
21 pages
Chapter 3
No ratings yet
Chapter 3
26 pages
STA 120-All Lectures
No ratings yet
STA 120-All Lectures
64 pages
Lecture 3 - CSE38900 - Rev
No ratings yet
Lecture 3 - CSE38900 - Rev
88 pages
Chapter 3 Discrete Probability Distributions - 2
No ratings yet
Chapter 3 Discrete Probability Distributions - 2
15 pages
Notes Dvi
No ratings yet
Notes Dvi
34 pages
Lesson 4 Chapter 3: Random Variables and Their Distributions
No ratings yet
Lesson 4 Chapter 3: Random Variables and Their Distributions
80 pages
CH 3 3502
No ratings yet
CH 3 3502
9 pages
Random Variables
No ratings yet
Random Variables
14 pages
Topic 2 - Discrete Random Variables and Probability Distributions
No ratings yet
Topic 2 - Discrete Random Variables and Probability Distributions
25 pages
Discrete Random Variables: 4.1 Definition, Mean and Variance
No ratings yet
Discrete Random Variables: 4.1 Definition, Mean and Variance
15 pages
Statistics Master
No ratings yet
Statistics Master
12 pages
MFCS Notes
No ratings yet
MFCS Notes
88 pages
3and4 Main
No ratings yet
3and4 Main
10 pages
Binomial Distribution Notes
No ratings yet
Binomial Distribution Notes
3 pages
EEE 6542 - Lecture 3 Notes - Complete - F2024
No ratings yet
EEE 6542 - Lecture 3 Notes - Complete - F2024
53 pages
Probability Exam Formula Sheet
No ratings yet
Probability Exam Formula Sheet
6 pages
ETS Ntegrals ONT Ounting Echniques: S I C - C T
No ratings yet
ETS Ntegrals ONT Ounting Echniques: S I C - C T
6 pages
Introduction To Probability and Statistics: Slides 3 - Chapter 3
No ratings yet
Introduction To Probability and Statistics: Slides 3 - Chapter 3
38 pages
Chapter 3 Discrete Probability Distributions - Final 3
No ratings yet
Chapter 3 Discrete Probability Distributions - Final 3
27 pages
Artificial Intelligence With Python (Machine Learning Foundations, Methodologies, and Applications) (Teik Toe Teoh, Zheng Rong)
94% (16)
Artificial Intelligence With Python (Machine Learning Foundations, Methodologies, and Applications) (Teik Toe Teoh, Zheng Rong)
334 pages
Python Programming Lecture Notes
89% (9)
Python Programming Lecture Notes
116 pages
Statistical Reasoning: 8.1 Probability and Bayes' Theorem
100% (1)
Statistical Reasoning: 8.1 Probability and Bayes' Theorem
8 pages
Machine Learning Notes
91% (11)
Machine Learning Notes
19 pages
Anna University Engineering Question Bank
No ratings yet
Anna University Engineering Question Bank
7 pages
CP 4152 Database Practices I Previous Question Paper
86% (7)
CP 4152 Database Practices I Previous Question Paper
3 pages
Cloud Computing Notes
78% (9)
Cloud Computing Notes
35 pages
Computer Networking MCQs
81% (16)
Computer Networking MCQs
19 pages
Devops Full Notes
100% (6)
Devops Full Notes
230 pages
IOT Unit 3 To 5 Material
100% (2)
IOT Unit 3 To 5 Material
118 pages
Full Course of Machine Learning
100% (16)
Full Course of Machine Learning
660 pages
React JS
0% (1)
React JS
93 pages
IT6013-Software Quality Assurance
No ratings yet
IT6013-Software Quality Assurance
9 pages
Generative AI for Business Leaders
100% (18)
Generative AI for Business Leaders
80 pages
Data Structures Full Notes
92% (12)
Data Structures Full Notes
90 pages
Artificial Intelligence Handwritten Notes by Riya
100% (1)
Artificial Intelligence Handwritten Notes by Riya
118 pages
Object Oriented Analysis & Design QBank
No ratings yet
Object Oriented Analysis & Design QBank
10 pages
Machine Learning?
100% (5)
Machine Learning?
114 pages
C Language Full Notes
83% (12)
C Language Full Notes
179 pages
Software Engineering - Notes
80% (5)
Software Engineering - Notes
109 pages
Python for Absolute Beginners
92% (13)
Python for Absolute Beginners
161 pages
MCQ in Computer Science
No ratings yet
MCQ in Computer Science
686 pages
AI
100% (10)
AI
36 pages
NLP Question Bank Template
100% (1)
NLP Question Bank Template
11 pages
mc4205 Unit III Cyber Security Notes
No ratings yet
mc4205 Unit III Cyber Security Notes
42 pages
Software Quality Assurance Question Bank
No ratings yet
Software Quality Assurance Question Bank
4 pages
Unit 4
No ratings yet
Unit 4
54 pages
Practical Projects
100% (30)
Practical Projects
478 pages
Temporal Probabilistic Models
No ratings yet
Temporal Probabilistic Models
26 pages
Let Us Python by Yashavant Kanetkar
89% (27)
Let Us Python by Yashavant Kanetkar
429 pages
CIE A-Level Biology Data Analysis
No ratings yet
CIE A-Level Biology Data Analysis
45 pages
Qbus2810 Notes PDF
100% (1)
Qbus2810 Notes PDF
58 pages
Intermediate STATS
No ratings yet
Intermediate STATS
23 pages
Quant Exercices 221124 Solutions
No ratings yet
Quant Exercices 221124 Solutions
6 pages
Jurnal Personality and Individual
No ratings yet
Jurnal Personality and Individual
6 pages
Practical 1 Data Analysis Descriptive Statistics
No ratings yet
Practical 1 Data Analysis Descriptive Statistics
12 pages
Pine Needle Length Comparisons in Conifers
0% (1)
Pine Needle Length Comparisons in Conifers
6 pages
MB0040-Statistics For Management-Answer Keys
78% (9)
MB0040-Statistics For Management-Answer Keys
34 pages
Package Caret': R Topics Documented
No ratings yet
Package Caret': R Topics Documented
136 pages
Review of Hypothesis Testing and Basic Tests 1. 2-2 2. 2-15: 2-1 © 2006 A. Karpinski
No ratings yet
Review of Hypothesis Testing and Basic Tests 1. 2-2 2. 2-15: 2-1 © 2006 A. Karpinski
70 pages
Statistics Help Card Full
No ratings yet
Statistics Help Card Full
6 pages
Sample Data For Item Analysis - LAC
No ratings yet

MSE204: Computational Methods in Materials Science and Engineering

Lecture wise plan and assignment problems (August 2023)

1. Why do we need to study probability: Some famous probability distributions

2. Plot Maxwell-Boltzmann distribution.
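A minimal sketch for the plotting exercise above, assuming the intended quantity is the Maxwell-Boltzmann speed distribution f(v) = 4π (m / 2πk_BT)^(3/2) v² exp(−mv² / 2k_BT). The function names, the choice of N2 as the example gas, and the plotting ranges are illustrative assumptions, not part of the assignment.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def maxwell_boltzmann_pdf(v, mass, temperature):
    """Speed PDF f(v) for one ideal-gas molecule (v in m/s, mass in kg, T in K)."""
    a = mass / (2.0 * K_B * temperature)
    return 4.0 * math.pi * (a / math.pi) ** 1.5 * v**2 * math.exp(-a * v**2)


def most_probable_speed(mass, temperature):
    """Speed at which f(v) peaks: v_p = sqrt(2 k_B T / m)."""
    return math.sqrt(2.0 * K_B * temperature / mass)


def plot_maxwell_boltzmann(mass, temperatures, v_max=2500.0, n_points=500):
    """Plot f(v) for several temperatures (Matplotlib is imported only here)."""
    import matplotlib.pyplot as plt

    speeds = [v_max * i / (n_points - 1) for i in range(n_points)]
    for temperature in temperatures:
        density = [maxwell_boltzmann_pdf(v, mass, temperature) for v in speeds]
        plt.plot(speeds, density, label=f"T = {temperature} K")
    plt.xlabel("speed v (m/s)")
    plt.ylabel("f(v) (s/m)")
    plt.legend()
    plt.show()


# Assumed example gas: molecular nitrogen (molar mass 28.0134 g/mol).
M_N2 = 28.0134e-3 / 6.02214076e23  # mass of one N2 molecule in kg
```

Calling `plot_maxwell_boltzmann(M_N2, [100, 300, 1000])` draws three curves; the peak should shift to higher speeds and flatten as the temperature rises.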

1. General concepts of discrete random variable and discrete probability distribution

• Probability mass function (PMF)

• What is the sample space?

2. Random variable and sample space:

• Plot the probability mass function. Is this a uniform probability distribution?

5. Let the random variable be the outcome of a single throw of a die.

• Find the CDF and plot it.
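For the single-die problem, the PMF and CDF can be tabulated exactly before plotting. The sketch below assumes a fair six-sided die (the source does not state fairness explicitly) and uses exact fractions so that the mean and variance come out in closed form; the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: a fair die, so each face has probability 1/6.
faces = range(1, 7)
pmf = {x: Fraction(1, 6) for x in faces}

# CDF F(x) = P(X <= x), built as a running sum of the PMF.
cdf = {}
running_total = Fraction(0)
for x in faces:
    running_total += pmf[x]
    cdf[x] = running_total

# Expectation values taken directly from the PMF.
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

for x in faces:
    print(x, pmf[x], cdf[x])
print("mean =", mean, "variance =", variance)
```

To plot, a bar or stem chart suits the PMF and a step plot suits the CDF, since the CDF of a discrete variable jumps at each face value and is flat in between.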

• Find the CDF and plot it.

10. Scaling a random variable:

14. Mean, variance:

17. (py) Sharpness of distribution:

• What are the possible values of Y?

20. (py) Paramagnet: Total number of spins n = n↑ + n↓ and m = n↑ − n↓.

• Probability of exactly 5 defective samples.

1. Continuous random variable and continuous probability distribution

• Probability density function (PDF)

• Two parameters of the normal distribution: µ and σ

• Use of the SciPy and Matplotlib libraries to plot PDF and CDF

2. Exponential distribution: The example discussed above is known as an exponential probability

3. Repeat the first problem with different values.

4. How to get CDF from PDF and PDF from CDF?

6. Theorem: Given a symmetric distribution with respect to x = c, i.e., f(c − x) = f(c + x);

14. (py) Solve the last problem using Python.

• D(1) = P(−1 < Z < 1) = ?
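A quick way to check D(1) without tables is to write the standard normal CDF in terms of the error function, Φ(z) = ½[1 + erf(z/√2)], so that D(k) = Φ(k) − Φ(−k). The helper names below are illustrative; `scipy.stats.norm.cdf` returns the same Φ(z).

```python
import math


def standard_normal_cdf(z):
    """Phi(z) = P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def central_probability(k):
    """D(k) = P(-k < Z < k) = Phi(k) - Phi(-k)."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)


d1 = central_probability(1.0)
print(f"D(1) = {d1:.4f}")
```

The same function gives D(2) and D(3), the usual two-sigma and three-sigma probabilities.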

16. (py) Solve the last problem using Python.

• What is the probability that the coating thickness exceeds 12 mm?

19. (py) Solve the last problem using Python.

• Find P(X = 2) using the normal approximation of the binomial distribution.

• Find the probability that exactly 100 pages are rejected.

1. Basic data analysis and visualization

• Concept of population and sample

• Numerical summary: DataFrame using the pandas library - df.describe()

1. What are the sample mean and sample variance?

5. What is the interquartile range? Where do we draw the box and whiskers?

6. What are outliers? Why do we care about them?

1. Point estimation of parameters

• Concept of random sample, sampling distribution and central limit theorem

2. Interval estimation of parameters:

• Confidence interval (CI) of µ for normal distribution

5. Given that X is a normal random variable with mean µ and variance σ².

• Draw the probability distributions of X and X̄.

8. (py) Repeat the previous problem using Python.

9. Consider the previous problem.

• Find the CI for each random sample, assuming a 95% confidence level.

12. Understanding absolute error:

1. Hypothesis testing and decision making

• Concept of null and alternative hypothesis

– Fixed significance level hypothesis test

6. One-sample t-test on the mean:

7. Two-sample test on the mean:

8. Paired-sample test on the mean:

After Before Difference

9. Goodness of fit – χ²-test:

1. Simple linear regression

• Method of least squares

2. Binary logistic regression

• Odds and logit function

• Building a linear and logistic regression model

3. Write the ANOVA identity.

4. What are the degrees of freedom of SSE, SSR, SST?

7. Write the test statistic for hypothesis testing of regression coefficients.

8. Write the formula of 95% CI for regression coefficients.

9. Test the significance of regression using the following: H0: β1 = 0, H1: β1 ≠ 0, β̂1 = 14.86, σ̂² =

Binary logistic regression

1. When do we need to use logistic regression?

2. Define odds and logit function.
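As a small numerical illustration of the two definitions asked for above: the odds of an event with probability p are p / (1 − p), and the logit is the natural log of the odds. The function names in this sketch are illustrative; the inverse of the logit is the logistic (sigmoid) function used in binary logistic regression.

```python
import math


def odds(p):
    """Odds in favour of an event with probability p (0 <= p < 1)."""
    return p / (1.0 - p)


def logit(p):
    """Log-odds: logit(p) = ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(odds(p))


def inverse_logit(t):
    """Logistic function mapping a log-odds value t back to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


for p in (0.1, 0.5, 0.8):
    print(f"p = {p}: odds = {odds(p):.3f}, logit = {logit(p):.3f}")
```

In a logistic model the logit is taken to be linear in x, so a coefficient β1 is the change in log-odds per unit increase of x.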

• Log of odds reduces by how much per unit increase of x?

Example: Given X̄ = 10, σ² = 9, n = 9. Find 95% CI.

Example: Given X̄ = 10, S² = 9, n = 9. Find 95% CI.
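The two CI examples above differ only in the critical value: with known σ² the interval is X̄ ± z₀.₀₂₅ σ/√n, and with the sample variance S² it is X̄ ± t₀.₀₂₅,ₙ₋₁ S/√n. The sketch below assumes that the "X" in the examples is the sample mean X̄, and it takes the t critical value for 8 degrees of freedom from a standard table (`scipy.stats.t.ppf(0.975, 8)` returns the same number). Function and variable names are illustrative.

```python
import math
from statistics import NormalDist


def confidence_interval(sample_mean, std_dev, n, critical_value):
    """Two-sided CI: sample_mean +/- critical_value * std_dev / sqrt(n)."""
    half_width = critical_value * std_dev / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Known variance (sigma^2 = 9): use the standard normal quantile z_{0.025}.
z_crit = NormalDist().inv_cdf(0.975)
ci_known_sigma = confidence_interval(10.0, math.sqrt(9.0), 9, z_crit)

# Unknown variance (S^2 = 9): use Student's t with n - 1 = 8 degrees of freedom.
T_CRIT_8 = 2.306  # t_{0.025, 8}, taken from a t-table
ci_unknown_sigma = confidence_interval(10.0, math.sqrt(9.0), 9, T_CRIT_8)

print("sigma known:  ", ci_known_sigma)
print("sigma unknown:", ci_unknown_sigma)
```

The t-based interval is wider because estimating the variance from only nine observations adds uncertainty.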

Hypothesis test: Z-statistic

Example: Given X̄ = 11.5, σ² = 9, n = 9, test the hypothesis H0: µ = 10 against the alternative

Example: Given X̄ = 8, σ² = 9, n = 9, test the hypothesis H0: µ = 10 against the alternative
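For both examples the test statistic is Z = (X̄ − µ0) / (σ/√n). The alternative hypotheses are cut off in the text above, so this sketch reports the p-value for each of the three possible alternatives instead of guessing which one was intended. It assumes "X" denotes the sample mean; the function names are illustrative.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample_mean - mu0) / (sigma / sqrt(n)) for known sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def p_values(z):
    """p-values of an observed z under the three possible alternatives."""
    phi = NormalDist().cdf(z)
    return {
        "two-sided (mu != mu0)": 2.0 * min(phi, 1.0 - phi),
        "upper-tailed (mu > mu0)": 1.0 - phi,
        "lower-tailed (mu < mu0)": phi,
    }


z_first = z_statistic(11.5, 10.0, 3.0, 9)   # first example
z_second = z_statistic(8.0, 10.0, 3.0, 9)   # second example

for label, z in (("first", z_first), ("second", z_second)):
    print(label, "z =", z, p_values(z))
```

At a fixed significance level α, H0 is rejected when the p-value for the chosen alternative falls below α.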

Building and analyzing a linear regression model

Build a regression model after properly selecting the important features

Non-iterative method (Gaussian elimination)

Codes for bisection, relaxation, and Newton-Raphson methods
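A compact sketch of the three root-finding methods named above, under the usual assumptions: bisection needs a sign change on [a, b], relaxation (fixed-point iteration on x = g(x)) needs |g′(x)| < 1 near the solution, and Newton-Raphson needs the derivative. The function names, tolerances, and test equations are illustrative choices, not taken from the course material.

```python
import math


def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Root of f in [a, b]; requires f(a) and f(b) to have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == 0.0 or 0.5 * (b - a) < tol:
            return mid
        if fa * fm < 0:
            b = mid            # root lies in the left half
        else:
            a, fa = mid, fm    # root lies in the right half
    return 0.5 * (a + b)


def relaxation(g, x0, tol=1e-12, max_iter=1000):
    """Solve x = g(x) by repeated substitution."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("relaxation did not converge")


def newton_raphson(f, dfdx, x0, tol=1e-12, max_iter=50):
    """Root of f using the update x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


root_bisect = bisection(lambda x: x * x - 2.0, 0.0, 2.0)
root_newton = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
root_relax = relaxation(math.cos, 1.0)   # solves x = cos(x)
print(root_bisect, root_newton, root_relax)
```

Bisection is slow but cannot fail once a sign change is bracketed; Newton-Raphson converges much faster near the root but can diverge from a poor starting guess.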

First derivative: forward, backward and central derivative
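The three first-derivative formulas can be compared side by side. Forward and backward differences have truncation error of order h, while the central difference has error of order h². The step size and the test function sin(x) below are illustrative choices.

```python
import math


def forward_difference(f, x, h=1e-5):
    """f'(x) ~ [f(x + h) - f(x)] / h, error O(h)."""
    return (f(x + h) - f(x)) / h


def backward_difference(f, x, h=1e-5):
    """f'(x) ~ [f(x) - f(x - h)] / h, error O(h)."""
    return (f(x) - f(x - h)) / h


def central_difference(f, x, h=1e-5):
    """f'(x) ~ [f(x + h) - f(x - h)] / (2h), error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0 = 1.0
exact = math.cos(x0)  # exact derivative of sin(x) at x0
for name, rule in (("forward", forward_difference),
                   ("backward", backward_difference),
                   ("central", central_difference)):
    estimate = rule(math.sin, x0)
    print(f"{name:8s} {estimate:.10f}  error = {abs(estimate - exact):.2e}")
```

Making h much smaller does not keep improving the estimate, because floating-point round-off in the subtraction grows as h shrinks.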

Codes for calculating integrals and derivatives

Python codes for explicit and implicit method

Classification - parabolic, hyperbolic, elliptic equations

Python code for FTCS method
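The outline does not say which equation the FTCS (forward-time, centred-space) code is applied to. The sketch below assumes the 1D diffusion equation ∂u/∂t = D ∂²u/∂x² with fixed boundary values, for which the update is u_i ← u_i + r (u_{i+1} − 2u_i + u_{i−1}) with r = DΔt/Δx², stable for r ≤ 1/2. Grid sizes, names, and the sine initial condition are illustrative assumptions.

```python
import math


def ftcs_step(u, r):
    """One FTCS time step; the two end points are held fixed (Dirichlet)."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    return new


def ftcs_solve(u0, diffusivity, dx, dt, n_steps):
    """Advance the profile u0 by n_steps explicit FTCS steps."""
    r = diffusivity * dt / dx**2
    if r > 0.5:
        raise ValueError("FTCS is unstable for D*dt/dx^2 > 1/2")
    u = list(u0)
    for _ in range(n_steps):
        u = ftcs_step(u, r)
    return u


# Assumed test case: u(x, 0) = sin(pi x) on [0, 1] with u = 0 at both ends,
# whose exact solution is u(x, t) = exp(-pi^2 D t) sin(pi x).
n_points, dx, dt, D = 21, 0.05, 0.001, 1.0
u_initial = [math.sin(math.pi * i * dx) for i in range(n_points)]
u_initial[0] = u_initial[-1] = 0.0
u_final = ftcs_solve(u_initial, D, dx, dt, n_steps=100)  # t = 0.1

exact_mid = math.exp(-math.pi**2 * D * 0.1)
print("FTCS:", u_final[10], "exact:", exact_mid)
```

Here r = 0.4, inside the stability limit; raising dt so that r exceeds 0.5 makes the explicit scheme blow up, which is the usual motivation for implicit methods.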