Selected Topics in Computer Science (CoSc4181)
Lecture 01: Data Science through the Python Programming Language
Department of Computer Science
Dilla University
By: Tsegalem G/hiwot
2022 G.C
Part I: Contents of Python programming
• Introduction to Python
• Why Python in Data Science
• Python programming basic structure
• Python Variables, data types
• Python Operators
• Python Conditional Statements
• Python For & While Loops
• Functions
• Modules
Introduction to Python
Python is a high-level, interpreted, interactive, and object-oriented
scripting language.
Python is designed to be highly readable.
It frequently uses English keywords where other languages use
punctuation, and it has fewer syntactical constructions than many
other languages.
Why Python in Data Science
Very simple and readable
Powerful libraries
Free and open source
Amazing community
An Integrated Development Environment (IDE) brings the
programmer's tools together in one convenient place.
Examples
o Integrated Development and Learning Environment (IDLE)
o PyCharm
o Thonny
Python Hello World — Create Your First Python Program
print("Hello World")
Note:
Python Identifiers: A Python identifier is a name used to identify a variable,
function, class, module or other object.
Reserved Words: Keywords such as if, for, and def are reserved, and you
cannot use them as constant, variable, or any other identifier names.
Lines and Indentation: In Python, blocks of code for class and function
definitions or flow control are denoted by line indentation, which is
rigidly enforced.
Quotation in Python: Python accepts single ('), double (") and triple (''' or """)
quotes to denote string literals, as long as the same type of quote starts and
ends the string.
Comments in Python: Comments can be used to explain the code, to improve
readability of the code and to prevent execution when testing code.
• Single-line comment: starts with #
• Multi-line comment: a triple-quoted string (""" ... """) that is not
assigned to any name
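The notes above can be tried out with a short sketch like the following. The variable names are made up for illustration; keyword.iskeyword() comes from Python's standard keyword module.

```python
import keyword

# Reserved words cannot be used as identifiers; ordinary names can.
is_reserved = keyword.iskeyword("for")
is_free = not keyword.iskeyword("counter")

# Single, double, and triple quotes all create string literals.
word = 'word'
sentence = "This is a sentence."
paragraph = """This is a paragraph that
spans two lines."""

"""
This triple-quoted string is not assigned to a name,
so it acts as a multi-line comment.
"""

if is_reserved and is_free:
    # The indented lines form the body of the if block.
    print(word, sentence, paragraph, sep="\n")
```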
Python Variables, data types
Variables are used for holding data values so that they can be
utilized in various computations in a program.
Variables can store data of different types, and different types
can do different things.
• Text Type: str
• Numeric Types: int, float, complex
• Sequence Types: list, tuple, range
• Mapping Type: dict
• Set Types: set, frozenset
• Boolean Type: bool
Example:
counter=100      # An integer assignment
miles=1000.0     # A floating point
name="John"      # A string
print(counter)
print(miles)
print(name)
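As a small illustration of the types listed above, the built-in type() function reports the type of the value a variable currently holds. The names is_enrolled and point are invented for this sketch.

```python
counter = 100          # int
miles = 1000.0         # float
name = "John"          # str
is_enrolled = True     # bool
point = 3 + 4j         # complex

values = (counter, miles, name, is_enrolled, point)
type_names = [type(value).__name__ for value in values]
print(type_names)

# A variable is only a name: it can be rebound to a value of another type.
counter = "one hundred"
print(type(counter).__name__)
```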
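Collecting User Input
The input() function shows a prompt and returns what the user typed as
a string.
text=input("Enter your input: ")
print("Received input is : ", text)
Example 1:
message = input("Tell me something, and I will repeat it back to you: ")
print(message)
Example 2: input() always returns a string, so age holds text here.
age = input("How old are you? ")
print("age")     # prints the word age
print(age)       # prints the value the user typed
Example 3: Wrap input() in int() to get a number.
age = int(input("How old are you? "))
print("age")
print(age)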
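The difference between Example 2 and Example 3 can be seen without typing anything by using a fixed string in place of input(). This is a sketch only; parse_age is a hypothetical helper, not part of Python.

```python
def parse_age(text):
    """Return the text as an int, or None if it is not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


typed = "21"                     # stands in for the string returned by input()
as_text = typed + "1"            # string concatenation
as_number = parse_age(typed) + 1 # integer addition
invalid = parse_age("abc")       # not a number

print(as_text, as_number, invalid)
```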
Operators in Python
Operator in python are used to perform operations between
variables
Constructs or special symbols that are used to manipulate the
value of the operand.
The values used in an operation are called operand
Operand Operator Operand
Result
10
Types of operators in Python
1. Arithmetic Operators
Are used to perform arithmetic operations between variables.
Addition (+)
Subtraction (-)
Multiplication (*)
Division (/)
Modulus (%)
Exponentiation (**)
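A quick sketch of the arithmetic operators in action, with arbitrary sample values. Floor division (//) is not in the list above; it is included here only for comparison with /.

```python
a = 7
b = 2

results = {
    "a + b": a + b,     # addition
    "a - b": a - b,     # subtraction
    "a * b": a * b,     # multiplication
    "a / b": a / b,     # division always returns a float
    "a // b": a // b,   # floor division drops the fractional part
    "a % b": a % b,     # modulus: remainder of the division
    "a ** b": a ** b,   # exponentiation
}

for expression, value in results.items():
    print(expression, "=", value)
```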
Cont…
2. Assignment Operators
Are used to assign the value of the right operand to the left operand.
Operator    Example    Equivalent to
=           x=10       x=10
+=          x+=10      x=x+10
-=          x-=10      x=x-10
*=          x*=10      x=x*10
/=          x/=10      x=x/10
**=         x**=10     x=x**10
%=          x%=10      x=x%10
Cont…
3. Comparison Operators
Are used to compare two values.
1. Equals (==): returns True if the operands are equal, otherwise
returns False.
Example: a=3, b=4; print(a==b) prints False.
2. Not equals (!=): returns True if the operands are not equal,
otherwise returns False.
3. Greater than (>)
4. Greater than or equal to (>=)
5. Less than (<)
6. Less than or equal to (<=)
Cont…
4. Logical Operators
Are used to combine conditional statements.
1. Logical and: returns True if both conditions are true, otherwise
returns False.
E.g. a=3, b=4, c=5, d=6
print((a<b) and (c<d)) prints True.
2. Logical or: returns True if at least one condition is true; returns
False only if both conditions are false.
E.g. a=3, b=4, c=5, d=6
print((a>b) or (c>d)) prints False.
3. Logical not: returns the negation. E.g. a=4>5; print(not a) prints True.
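The comparison and logical operators can be checked with the same values used in the examples above. This is only a sketch; the result names are chosen for illustration.

```python
a, b, c, d = 3, 4, 5, 6

equal = a == b                    # 3 == 4
not_equal = a != b                # 3 != 4

both_true = (a < b) and (c < d)   # both conditions hold
none_true = (a > b) or (c > d)    # neither condition holds
one_true = (a < b) or (c > d)     # only the first condition holds
negated = not (4 > 5)             # negation of a false comparison

print(equal, not_equal, both_true, none_true, one_true, negated)
```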
Cont…
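5. Identity Operators
Identity operators are used to compare objects: they test whether two
names refer to the same object, not merely whether the values are equal.
Everything in Python is an object (numbers, strings, lists, functions,
and so on).
1. is: returns True if both variables refer to the same object.
Example: a=15, b=15; print(a is b) prints True (CPython reuses one
object for small integers; use == to compare values)
a=15, b=5; print(a is b) prints False
2. is not: returns True if both variables do not refer to the same object.
Example: a=15, b=15; print(a is not b) prints False
a=15, b=5; print(a is not b) prints True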
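The difference between identity (is) and equality (==) is easiest to see with lists, where two separate objects can hold equal values. A minimal sketch with made-up names:

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with the same contents
alias = first        # a second name for the same list object

equal_values = first == second   # compares contents
same_object = first is second    # compares identity
alias_same = alias is first

# Changing the list through one name is visible through the other.
alias.append(4)
print(first, second)
```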
Cont…
6. Membership Operators
Are used to check whether a value is present in a sequence (such as a
list, tuple, or string).
1. in: returns True if the specified value is present in the sequence.
Example: list1=[1,2,3,5,6]
list2=[1,2,3]
print(3 in list1) prints True, because 3 is an element of list1.
print(list2 in list1) prints False, because in looks for list2 as a
single element of list1; it does not test whether every element of
list2 appears in list1.
2. not in: returns True if the specified value is not present in the
sequence. Based on the example above,
print(list1 not in list2) prints True.
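A short sketch of how in behaves with the two lists from the example. The all() expression is one possible way to ask whether every element of one list appears in another; it is an addition for illustration, not part of the membership operators themselves.

```python
list1 = [1, 2, 3, 5, 6]
list2 = [1, 2, 3]

element_found = 3 in list1               # 3 is an element of list1
list_as_element = list2 in list1         # list1 has no element equal to [1, 2, 3]
nested_found = list2 in [[1, 2, 3], 4]   # here the list itself is an element
all_present = all(item in list1 for item in list2)   # element-by-element check
missing = 4 not in list1
substring_found = "nan" in "banana"      # for strings, in tests substrings

print(element_found, list_as_element, nested_found, all_present)
```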
Cont…
7. Bitwise Operators: are used to operate on the binary representation
of integers, bit by bit.
1. Bitwise AND (&): sets each bit to 1 if both bits are 1.
E.g. 1010 (10) & 1000 (8) = 1000 (8)
2. Bitwise OR (|): sets each bit to 1 if at least one of the bits is 1.
E.g. 1010 (10) | 1000 (8) = 1010 (10)
3. Bitwise XOR (^): compares two input bits and generates 1 if the
bits are different, and 0 if the bits are the same.
x    y    x^y
0    0    0
0    1    1
1    0    1
1    1    0
Cont…
4. Bitwise NOT (~): inverts all bits (changes each 0 to 1 and vice
versa). In Python, ~x equals -x-1, e.g. ~10 = -11.
5. Left shift (<<): shifts bits left by pushing in zeros from the right.
Python integers have no fixed width, so no bits fall off the left end.
Example: 10 in binary = 1010 and 10<<2 = 101000 (40)
6. Right shift (>>): shifts bits right, pushing in zeros from the left
(for non-negative numbers), and lets the rightmost bits fall off.
Example: 10 in binary = 1010 and 10>>2 = 0010 (2)
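The bitwise operators can be verified directly in Python using binary literals and the built-in bin() function. A minimal sketch with the values 10 and 8 from the examples:

```python
x = 0b1010   # 10
y = 0b1000   # 8

and_result = x & y     # bits set in both
or_result = x | y      # bits set in either
xor_result = x ^ y     # bits set in exactly one
not_result = ~x        # equals -x - 1
left_shift = x << 2    # same as multiplying by 4
right_shift = x >> 2   # same as floor division by 4

print(bin(and_result), bin(or_result), bin(xor_result))
print(not_result, bin(left_shift), bin(right_shift))
```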
Python Collections
There are four collection data types in the Python programming
language:
– List is a collection that is ordered and changeable. Allows
duplicate members.
– Tuple is a collection that is ordered and unchangeable. Allows
duplicate members.
– Set is a collection that is unordered and unindexed. No
duplicate members.
– Dictionary is a collection of key-value pairs that is changeable and
indexed by key. Since Python 3.7 it keeps insertion order. No
duplicate keys.
List
A list is a data structure used to store multiple items in one
variable. It is created using square brackets.
Example:
mylist=["apple", "banana", "cherry", "orange"]
print(mylist)        # the whole list
print(mylist[1])     # second item: banana
print(mylist[-1])    # last item: orange
print(mylist[1:3])   # items at index 1 and 2: ['banana', 'cherry']
Some methods used with lists are: append(), clear(), copy(),
count(), extend(), index(), insert(), pop(), remove(),
reverse(), sort(), etc.
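A few of the list methods named above, applied step by step to a sample list. The list contents are arbitrary.

```python
fruits = ["apple", "banana", "cherry"]

fruits.append("orange")      # add an item at the end
fruits.insert(1, "mango")    # insert an item at index 1
fruits.remove("banana")      # remove the first matching item
last = fruits.pop()          # remove and return the last item
fruits.sort()                # sort the list in place

print(fruits, last, len(fruits))
```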
Tuple
A tuple is like a list, except that you can't change its values once
it is defined. It is created using parentheses.
Access Tuple Items: You can access tuple items by referring to
the index number, inside square brackets:
Example:
mytuple=("apple", "banana", "cherry", "orange")
print(mytuple)
print(mytuple[1])
print(mytuple[-1])
print(mytuple[1:3])
Tuples have two methods, count() and index(). Some built-in functions
used with tuples are: len(tuple), max(tuple), min(tuple), tuple(seq),
etc. (cmp() was removed in Python 3; compare tuples with ==, < and >.)
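The following sketch shows what "unchangeable" means in practice: assigning to a tuple item raises a TypeError, while reading, counting, and building a new tuple all work.

```python
mytuple = ("apple", "banana", "cherry", "orange")

try:
    mytuple[0] = "kiwi"          # tuples do not support item assignment
    changed = True
except TypeError as error:
    changed = False
    print("Cannot change a tuple:", error)

apple_count = mytuple.count("apple")     # how many times a value occurs
cherry_index = mytuple.index("cherry")   # position of the first match
longer = mytuple + ("mango",)            # builds a new tuple; the original is untouched

print(len(mytuple), len(longer))
```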
Dictionary
A dictionary is a collection of key-value pairs that is changeable and
indexed by key (and, since Python 3.7, keeps insertion order). In
Python, dictionaries are written with curly brackets.
Accessing Items: You can access the items of a dictionary by
referring to its key name, inside square brackets:
mydictionary={1:"apple", 2:"banana", 3:"cherry", 4:"orange"}
print(mydictionary)
print(mydictionary[2])
Some methods used with dictionaries are: dict.clear(), dict.copy(),
dict.fromkeys(), dict.get(key, default=None), dict.items(),
dict.keys(), dict.setdefault(key, default=None), dict.update(dict2),
dict.values(), etc. (dict.has_key(key) was removed in Python 3; use
key in dict instead.)
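A brief sketch of common dictionary operations. The stock dictionary and its contents are invented for the example.

```python
stock = {"apple": 3, "banana": 5}

stock["cherry"] = 7                  # add a new key-value pair
stock.update({"apple": 4})           # change the value of an existing key
mango_count = stock.get("mango", 0)  # get() returns the default for a missing key
has_banana = "banana" in stock       # replaces the old has_key() method
removed = stock.pop("banana")        # remove a key and return its value

for fruit, quantity in stock.items():
    print(fruit, quantity)
```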
Set
A set is a collection that is unordered and unindexed. In Python,
sets are written with curly brackets.
• Access Items: You cannot access items in a set by referring to an
index; since sets are unordered, the items have no index.
• But you can loop through the set items using a for loop, or ask if a
specified value is present in a set, by using the in keyword.
myset={"apple", "banana", "cherry", "orange"}
print(myset)
Some methods used with sets are: add(), clear(), copy(), difference(),
difference_update(), discard(), intersection(), intersection_update(),
isdisjoint(), issubset(), issuperset(), pop(), remove(),
symmetric_difference(), symmetric_difference_update(), union(),
update(), etc.
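A small sketch of how sets ignore duplicates and support union, intersection, and difference. The set contents are sample values only.

```python
a = {"apple", "banana", "cherry"}
b = {"cherry", "orange"}

a.add("apple")              # adding a duplicate has no effect
size_a = len(a)

combined = a.union(b)       # items in either set
common = a.intersection(b)  # items in both sets
only_a = a.difference(b)    # items in a but not in b
has_orange = "orange" in a  # membership test

print(sorted(combined), common, sorted(only_a))
```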
Python Conditional Statements
The order in which statements are executed is called flow
control (or control flow).
By default, the statements of a Python program run sequentially, from
top to bottom.
Flow control determines what is executed during a run and what
is not, therefore affecting the overall outcome of the program.
The control flow of a Python program is regulated by conditional
statements and loops.
If statements
An "if statement" is written by using the if keyword.
Syntax for the if statement:
if expression1:
    statement(s)
Example 1:
a=2
b=3
if b>a:
    print("b is greater than a")
Example 2:
a=int(input("Enter any num"))
if a%2==0:
    print("a is Even")
• Note: Python relies on indentation (whitespace at the beginning of a
line) to define scope in the code. Other programming languages
often use curly-brackets for this purpose.
If-else
The if-else statement lets you specify two alternatives: one that is
executed if a condition is satisfied and one that is executed if the
condition is not satisfied.
Syntax for the if-else statement:
if expression1:
    statement(s1)
else:
    statement(s2)
Example 1:
a=2
b=3
if a>b:
    print("a is greater")
else:
    print("b is greater")
Example 2:
a=int(input("Enter any num"))
if a%2==0:
    print("a is Even")
else:
    print("a is Odd")
elif
The elif keyword says "if the condition in expression1 is not
satisfied, then try the condition in expression2".
Syntax for the elif statement:
if expression1:
    statement(s1)
elif expression2:
    statement(s2)
elif expression3:
    statement(s3)
...
else:
    statement(sn)
Example 1:
a=3
b=3
if b > a:
    print("b is greater than a")
elif a==b:
    print("a and b are equal")
Cont…
Example:
a=20
b=33
if b>a:
    print("b is greater than a")
elif a==b:
    print("a and b are equal")
else:
    print("a is greater than b")
Cont…
and
The and keyword is a logical operator, and is used to combine
conditional statements:
Example: Test if a is greater than b, AND if c is greater than a:
a=200
b=33
c=500
if a > b and c > a:
    print("Both conditions are True")
cont…
or
The or keyword is a logical operator, and is used to combine
conditional statements:
Example: Test if a is greater than b, OR if a is greater than c:
a=200
b=33
c=500
if a>b or a>c:
    print("At least one of the conditions is True")
cont…
Nested If
You can have if statements inside if statements; these are
called nested if statements.
Example
x=41
if x>10:
    print("Above ten,")
    if x>20:
        print("and also above 20!")
    else:
        print("but not above 20.")
Loops
A loop statement allows us to execute a statement or group of
statements multiple times.
Python has two primitive loop commands:
– while loops
– for loops
The while Loop
With the while loop we can execute a set of statements as long
as a condition is true.
Syntax for the while loop statement:
while expression:
    statement(s)
• Example: Print i as long as i is less than 6:
i=1
while i<6:
    print(i)
    i+=1
The break Statement
• With the break statement we can stop the loop even if the
while condition is true:
• Example: Exit the loop when i is 3:
i=1
while i<6:
    print(i)
    if i==3:
        break
    i+=1
The continue Statement
• With the continue statement we can stop the current
iteration, and continue with the next:
• Example: Continue to the next iteration if i is 3:
i=0
while i<6:
    i+=1
    if i==3:
        continue
    print(i)
Python for loops
A for loop is used for iterating over a sequence (that is, a
list, a tuple, a dictionary, a set, or a string).
Example: Print each fruit in a fruit list:
fruits=["apple", "banana", "cherry"]
for x in fruits:
    print(x)
Looping through a String
Strings are iterable objects too; they contain a sequence of
characters:
• Example: Loop through the letters in the word "banana":
for x in "banana":
    print(x)
break Statement in Python for loops
With the break statement we can stop the loop before it has
looped through all the items:
Example: Exit the loop when x is "banana":
fruits=["apple", "banana", "cherry"]
for x in fruits:
    print(x)
    if x=="banana":
        break
Cont…
Example: Exit the loop when x is "banana", but this time the
break comes before the print:
fruits=["apple", "banana", "cherry"]
for x in fruits:
    if x=="banana":
        break
    print(x)
The continue Statement in Python for Loop
With the continue statement we can stop the current
iteration of the loop, and continue with the next:
Example: Do not print banana:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        continue
    print(x)
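To compare break and continue side by side, the sketch below collects the items that would have been printed into lists instead of printing them directly. The list names are chosen for illustration.

```python
fruits = ["apple", "banana", "cherry"]

# break leaves the loop entirely when "banana" is reached.
seen_with_break = []
for x in fruits:
    if x == "banana":
        break
    seen_with_break.append(x)

# continue skips only the current item and carries on with the next.
seen_with_continue = []
for x in fruits:
    if x == "banana":
        continue
    seen_with_continue.append(x)

print(seen_with_break)
print(seen_with_continue)
```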
The range() Function
To loop through a set of code a specified number of times, we can
use the range() function.
The range() function returns a sequence of numbers, starting
from 0 by default, incrementing by 1 (by default), and ending
before a specified number.
Example: Using the range() function:
for x in range(6):
    print(x)
Example: Using the start parameter:
for x in range(2, 6):
    print(x)
Example: Increment the sequence with 3:
for x in range(2, 30, 3):
    print(x)
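Converting each range to a list makes the start, stop, and step rules visible at a glance. The countdown with a negative step is an extra case added here for illustration.

```python
default_start = list(range(6))          # starts at 0, stops before 6
with_start = list(range(2, 6))          # starts at 2, stops before 6
with_step = list(range(2, 30, 3))       # starts at 2, steps by 3, stops before 30
countdown = list(range(5, 0, -1))       # a negative step counts down

print(default_start)
print(with_start)
print(with_step)
print(countdown)
```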
Nested Loops
A nested loop is a loop inside a loop.
The "inner loop" will be executed one time for each iteration of
the "outer loop":
Example: Print each adjective for every fruit:
adj=["red", "big", "tasty"]
fruits=["apple", "banana", "cherry"]
for x in adj:
    for y in fruits:
        print(x, y)
Python Functions
A function is a block of code which only runs when it is called.
Functions make programs easier to write, read, test, and fix.
You can pass data, known as parameters, into a function. A
function can return a result.
Cont…
Creating function
• In Python a function is defined using the def keyword:
Example:
def my_function():
    print("Hello from a function")
Calling a Function
• To call a function, use the function name followed by
parentheses.
Example:
def my_function():
    print("Hello from a function")
my_function()
Cont…
Function Arguments
• Information can be passed into functions as arguments.
• Arguments are specified after the function name, inside the
parentheses. You can add as many arguments as you want,
just separate them with a comma.
Example
def my_function(fname):
    print(fname + " is CS program student")
my_function("Abebe")
my_function("Hana")
my_function("Azeb")
Cont…
Return values
• To let a function return a value, we use the return statement.
Example
def my_function(x):
    return 5*x
print(my_function(3))
print(my_function(5))
print(my_function(10))
Output
15
25
50
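Arguments and return values can be combined in one function. The sketch below uses two hypothetical functions, rectangle_area and min_and_max, and also shows a default argument value and returning more than one result, which go slightly beyond the example above.

```python
def rectangle_area(width, height=1):
    """Return the area; height falls back to 1 when it is not given."""
    return width * height


def min_and_max(values):
    """Return two results at once as a tuple."""
    return min(values), max(values)


area = rectangle_area(3, 4)                     # positional arguments
strip = rectangle_area(5)                       # uses the default height
by_keyword = rectangle_area(height=2, width=6)  # keyword arguments, any order
lowest, highest = min_and_max([4, 9, 1])        # unpack the returned tuple

print(area, strip, by_keyword, lowest, highest)
```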
Scope of Variables
In Python, variables are containers for storing data values.
• The region of a program where a variable can be found and
accessed is called the scope of the variable.
• The scope of a variable can be either local or global.
1. Global variables
• Variables that are defined outside any function.
• They can be used in any part of the program.
Cont…
Example of global variables
x=3
y=1
def function1():
    print(x+y)
function1()
• In the above program x and y are global variables because
they are defined outside of the function named function1.
Cont…
2.Local variables
A variable created inside a function belongs to the local
scope of that function, and can only be used inside that
function.
Example,
A variable created inside a function is available inside that
function:
def myfunc():
    x = 300
    print(x)
myfunc()
Cont…
Naming Variables
If you operate with the same variable name inside and outside of a
function, Python will treat them as two separate variables, one
available in the global scope (outside the function) and one available
in the local scope (inside the function).
Example
x=20
def fun():
    x=10
    print(x)
fun()
print(x)
Output
10
20
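The sketch below repeats the idea with a variable named counter and adds the global keyword, which is not covered in the text above: it tells Python that an assignment inside the function should change the global variable instead of creating a local one. The function names are hypothetical.

```python
counter = 0


def shadow():
    counter = 10      # creates a separate local variable
    return counter


def increment():
    global counter    # refer to the global variable instead
    counter += 1
    return counter


local_value = shadow()
after_shadow = counter       # the global is unchanged by shadow()
increment()
after_increment = counter    # the global was changed by increment()

print(local_value, after_shadow, after_increment)
```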
Python Modules
A module is a file containing Python statements and
definitions.
A file containing Python code, for example: myfile.py , is called
a module, and its module name would be myfile.
We use modules to break down large programs into small
manageable and organized files.
Cont…
To create a module, just save the code you want in a file with
the file extension .py
Example: Save this code in a file named mymodule.py
def func(name):
    print("Hello, " + name)
Now we can use the module we just created, by using the import
statement:
import mymodule
mymodule.func("Abebe")
Import From Module: You can choose to import only parts from
a module, by using the from keyword.
Cont…
Example: The module named mymodule has one function
and one dictionary:
def func(name):
    print("Hello, " + name)
person1 = {
    "name": "John",
    "age": 36,
    "country": "Norway"
}
Example: Import only the person1 dictionary from the
module:
from mymodule import person1
print(person1["age"])
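The same import forms work with modules from Python's standard library, which makes them easy to try without creating a file first. This sketch uses the math and statistics modules in place of mymodule; the alias stats is an arbitrary choice.

```python
import math                 # import the whole module
from math import sqrt       # import only one name from a module
import statistics as stats  # import a module under a shorter alias

circle_area = math.pi * 2 ** 2     # access a name through the module
root = sqrt(16)                    # use the imported name directly
average = stats.mean([1, 2, 3, 4]) # access a name through the alias

print(circle_area, root, average)
```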
Part II: Contents of Data Science
• Describe what data science is and the role of data scientists.
• Differentiate data and information.
• Describe the data processing life cycle.
• Understand different data types from diverse perspectives.
• Describe the data science life cycle in the era of big data.
An Overview of Data Science
Data science is the extraction of knowledge from large
volumes of structured or unstructured data.
Data science is the process of using data to find solutions or to
predict outcomes for a problem statement.
Data science is a multi-disciplinary field that uses scientific
methods, processes, algorithms, and systems to extract
knowledge and insights from data.
It's much more than simply analyzing data.
Cont.…
Today, data professionals understand that they must advance
beyond the traditional skills of analyzing large amounts of data,
data mining, and programming.
A data scientist is a professional who processes and transforms
raw data into useful insights to support better business
decisions.
Data scientists must master the full data science life cycle
and keep a level of flexibility and understanding to
maximize returns at each phase of the process.
Cont…
Data scientists need to be result-oriented, with specialized
knowledge and communication skills that allow them to
explain highly practical results.
They have strong quantitative experience in
• Statistics,
• Linear algebra,
• Programming,
• Data warehousing, and
• Mining and modeling to build and analyze algorithms.
Typical jobs for data scientists
Collecting large amounts of data and transforming it into a more
usable format.
Solving business-related problems using data-focused
techniques.
Working with a variety of programming tools such as Weka, R, and
Python.
Understanding statistics, including statistical tests
and distributions.
Keeping on top of analytical techniques such as machine
learning, deep learning, and text analytics.
Cont…
Communicating and collaborating with both IT and business.
Looking for order and patterns in data, as well as recognizing
trends that can help a business's bottom line.
Who are data scientists?
Computer science: 20%
Statistics and mathematics: 19%
Economics and social science: 19%
Data science and analysis: 13%
Natural science (biology, chemistry, physics): 11%
Engineering: 9%
Others: 9%
What are data and information?
Data: can be defined as a representation of facts, concepts,
or instructions in a formalized manner, suitable for
communication, interpretation, or processing by humans or
electronic machines.
It can be described as unprocessed facts and figures.
It is represented with the help of characters such as
letters (A-Z, a-z), digits (0-9), or special characters (+, -,
/, *, <, >, =, etc.).
For example: 12122013
Cont.…
Information: is the processed data on which decisions and
actions are based.
It is data that has been processed into a form that is meaningful
to the receiver and is of real or perceived value in the receiver's
current or future actions or decisions.
Furthermore, information is interpreted data, created from
organized, structured, and processed data in a particular
context.
For example: 12/12/2013 (the digits 12122013 interpreted as a date)
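The step from data to information can be sketched in code: the raw digits only become meaningful once they are interpreted in a context, here as a date. The day-month-year format is an assumption made for this illustration; the slides do not say which format is intended.

```python
from datetime import datetime

raw = "12122013"   # data: unprocessed digits

# Interpreting the digits as day, month, year turns them into information.
as_date = datetime.strptime(raw, "%d%m%Y").date()
formatted = as_date.strftime("%d/%m/%Y")

print(formatted, as_date.year)
```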
Data Processing Cycle
Data processing is the re-structuring or re-ordering of data
by people or machines to increase its usefulness and add
value for a particular purpose.
It consists of the following basic steps:
• Input,
• Processing, and
• Output.
These three steps constitute the data processing cycle.
Cont.…
Input: in this step, data is prepared in some convenient form for
processing.
The form depends on the processing machine.
• For example, when electronic computers are used, the input
data can be recorded on any one of several types of storage
medium, such as a hard disk, CD, flash disk, and so on.
Processing: in this step, the input data is changed to produce
data in a more useful form.
• For example, interest can be calculated on a deposit to a bank, or
a summary of sales for the month can be calculated from the
sales orders.
Cont.…
Output: at this stage, the result of the processing step is
collected.
The particular form of the output data depends on the use of
the data.
• For example, output data may be a payroll for employees.
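The three steps of the cycle can be sketched with the bank interest example mentioned above. The account records, the 5% rate, and the function name compute_interest are all invented for illustration.

```python
# Input: data prepared in a convenient form (hypothetical deposit records).
deposits = [
    {"account": "A-1", "balance": 1000.0},
    {"account": "A-2", "balance": 2500.0},
]
ANNUAL_RATE = 0.05   # assumed interest rate, for illustration only


# Processing: change the input into a more useful form.
def compute_interest(records, rate):
    return [
        {"account": record["account"],
         "interest": round(record["balance"] * rate, 2)}
        for record in records
    ]


# Output: collect and present the result.
report = compute_interest(deposits, ANNUAL_RATE)
for row in report:
    print(row["account"], row["interest"])
```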
Data types and their representation
Data types can be described from diverse perspectives.
In computer science and computer programming, a data type is
simply an attribute of data that tells the compiler or interpreter
how the programmer intends to use the data.
Data types from Computer programming perspective
Integers (int): used to store whole numbers, mathematically known
as integers. Example: 1 and 3645
Booleans (bool): used to represent values restricted to one of two
values. Example: true or false
Characters (char): used to store a single character.
Example: 'a', '7', etc.
Floating-point numbers (float): used to store real numbers.
Example: 1.354
Alphanumeric strings (string): used to store a combination of
characters and numbers. Example: "a234", etc.
Data types from Data Analytics perspective
From a data analytics point of view, there are three common
data types or structures: structured, semi-structured, and
unstructured.
Structured Data
Structured data is data that adheres to a pre-defined data
model and is therefore straightforward to analyze.
It conforms to a tabular format with a relationship between the
different rows and columns.
• Examples of structured data are Excel files or SQL
databases.
Each of these has structured rows and columns that can be
sorted.
Semi-structured Data
Data that falls between structured and unstructured.
Data with no rigid structure.
Also known as a self-describing structure.
• Examples of semi-structured data include email, JSON, and
XML.
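JSON shows why semi-structured data is called self-describing: each record carries its own field names, and records need not share the same fields. The sketch below parses a small made-up JSON text with Python's standard json module.

```python
import json

# Two hypothetical records; the second has no "age" but adds "email".
text = """
[
  {"name": "Abebe", "age": 21},
  {"name": "Hana", "email": "hana@example.com"}
]
"""

records = json.loads(text)

# The field names come from the data itself, not from a fixed table layout.
fields = sorted({key for record in records for key in record})
ages = [record.get("age") for record in records]   # None where the field is missing

print(fields)
print(ages)
```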
Unstructured Data
It is information that is not organized in a pre-defined
manner.
It is typically text-heavy but may contain data such as dates,
numbers, and facts as well.
This results in irregularities and ambiguities that make it
difficult to understand using traditional programs as
compared to data stored in structured databases.
• Examples of unstructured data include audio, video files or
NoSQL databases.
Metadata – Data about Data
The last category of data type is metadata.
From a technical point of view, this is not a separate data
structure, but it is one of the most important elements for Big
Data analysis and big data solutions.
Metadata is data about data.
It provides additional information about a specific set of data.
• For example, in a set of photographs, metadata could
describe when and where the photos were taken.
Data science life cycle
1. Business Requirements:
Understanding the problems of business.
Understanding general objectives.
Understanding variables that need to be predicted.
Data scientists need to work with business people and those
with expertise in understanding the data.
2. Data collection
The process of gathering data before it is put in a data warehouse
or other storage on which data analysis can be carried out.
Specifying what data is needed for the problem.
Understanding data sources (like Facebook, LinkedIn, Google,
etc.).
Selecting an efficient way to store the data and access all of it.
It is one of the major big data challenges in terms of
infrastructure requirements, storage, and management.
3. Data cleaning
Sometimes unnecessary data is collected, which only
increases the complexity of the problem.
Transforming data into the desired format.
Integration is a challenge.
Data cleaning deals with:
• Missing values and duplicate values
• Corrupted data
• Unnecessary data
• Misspelled values
• Inconsistent data
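The cleaning tasks listed above can be sketched in plain Python. The rows, the CITY_FIXES spelling table, and the clean function are all invented for this illustration; in practice a library such as Pandas is commonly used for this kind of work.

```python
# Hypothetical raw rows with a duplicate, a missing age,
# inconsistent capitalization, and a misspelled city.
raw_rows = [
    {"name": "Abebe", "city": "Dilla", "age": "21"},
    {"name": "Abebe", "city": "Dilla", "age": "21"},
    {"name": "Hana", "city": "dilla ", "age": ""},
    {"name": "Azeb", "city": "Dila", "age": "23"},
]
CITY_FIXES = {"dila": "Dilla"}   # assumed table of known misspellings


def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        city = row["city"].strip().lower()            # fix inconsistent formatting
        city = CITY_FIXES.get(city, city.title())     # fix known misspellings
        age_text = row["age"].strip()
        age = int(age_text) if age_text.isdigit() else None   # mark missing values
        record = (row["name"].strip(), city, age)
        if record in seen:                            # drop duplicates
            continue
        seen.add(record)
        cleaned.append({"name": record[0], "city": city, "age": age})
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```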
4. Data exploration and analysis
Making raw data usable for decision-making as well as
domain-specific usage.
Understanding patterns in data.
Retrieving useful insights and forming hypotheses.
Data analysis involves exploring data with the goal of
identifying the important, relevant data, and creating and
extracting useful hidden information with high potential from a
business point of view.
Related areas include data mining and machine learning.
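A first look at a dataset often starts with simple summary statistics and counts. The sketch below uses Python's standard statistics and collections modules on made-up sales figures; the numbers and region names are purely illustrative.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical daily sales and the region of each sale.
sales = [120, 135, 150, 110, 400, 125]
regions = ["south", "north", "south", "east", "south", "north"]

average = mean(sales)
middle = median(sales)

# A mean far above the median hints at an unusually large value (here 400).
has_outlier_hint = average > middle * 1.2

per_region = Counter(regions)
top_region, top_count = per_region.most_common(1)[0]

print(round(average, 2), middle, has_outlier_hint, top_region, top_count)
```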
5. Data modeling
Create a model that predicts the target most accurately.
Evaluate and test the efficiency of the model.
Identify the model that best fits the business requirement.
It is often supported by visualization (e.g. a histogram).
6. Deployment and optimization
In the last stage, the model is deployed to users, and feedback is
collected and maintained.
Deploy the model in a test environment.
Monitor the performance.
Data science process life cycle
Needs of data science
Solving business problems.
Better decisions (A or B?).
Predictive analysis (what will happen next?).
Pattern discovery (is there any hidden information in the data?).
Deep understanding of your customers.
Additional layer of evidence for stakeholders.
Optimizing your resource use.
Seminar on Python Libraries for Data Science
Some Python Libraries for Data Science
• Pandas
• NumPy
• SciPy
• Matplotlib
• SciKit-Learn
• TensorFlow
• Keras