Lab 02 - Data Analysis and Visualization
I/O Processing and Regular Expressions
                                                                    Trần Lương Quốc Đại
1. Input and Output
Python provides powerful tools for handling input/output (I/O) operations and working
with regular expressions. This exercise covers the essential topics for file operations,
including reading and writing to files, various file operations, and working with text and
binary data. Additionally, it explores the “re” module, which allows pattern matching, text
searching, and string manipulation using regular expressions. By the end of this tutorial,
you'll understand how to process files efficiently and leverage regular expressions for text
processing in Python.
1.1. Output
The print() function in Python is used to display output on the console. The function takes
any number of objects to print, and none of its arguments is required: print() with no
arguments outputs an empty line. The keyword parameters are optional and control how the
output is formatted.
print(*objects, sep=' ', end='\n', file=sys.stdout)
The parameters of the above function:
objects: These are the values to be printed. Before being displayed, they are automatically
converted to strings.
sep: Specifies how to separate multiple objects in a single print() statement. By default, it
is a space (sep=' ').
end: Defines what is printed at the end of the output. By default, end='\n' adds
a newline after each print() statement.
file: Determines where the output is displayed. By default, it is sys.stdout (the screen).
You can print text (strings) directly or display the value of variables:
print("Hello, World!")
name = "Alice"
print("Hello,", name)
By default, print() separates multiple values with a space. You can change this behavior
using the sep parameter:
print("Data Analysis and Visualization", "is", "fun", sep="-")
By default, print() adds a newline (\n) at the end of output. You can modify this using
the end parameter:
print("Hello", end=" ")
print("World!")
Python supports formatted output using f-strings for cleaner and more readable print
statements:
name = "Alice"
age = 33
print(f"My name is {name} and I am {age} years old.")
You can direct print() output to a file using the file parameter:
with open("sample.txt", "w") as file:
    print("This is saved in a file", file=file)
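The sep, end, and file parameters can be combined in one call. The following is a minimal sketch, assuming an in-memory io.StringIO buffer as a stand-in for a real file (a choice made here only so the written text can be inspected afterwards):

```python
import io

# io.StringIO behaves like a text file kept in memory (an assumption of
# this sketch; a file opened with open() would work the same way).
buffer = io.StringIO()

# sep is placed between the objects, end replaces the trailing newline.
print("a", "b", "c", sep="-", end=" | ", file=buffer)
print("done", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```

Printing repr(captured) makes the separator and the final newline visible.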
1.2. Taking Input from User
We can use the input() function to receive data from the user. This function captures user
input as a string, regardless of the type of data entered.
name = input("Enter your name: ")
print("Hello " + name)
If it is required that the input data be of a particular data type, we must explicitly convert
the received string to the appropriate type using conversion functions such as int(), float(),
or complex().
For example, to accept integer input, you can use int(input("Enter number: ")), which
converts the entered text to an integer and raises a ValueError if the text is not a valid
integer. This conversion is essential before performing mathematical operations or numeric
comparisons on input data.
age = int(input("Enter your age: "))
print("Hello! " + name + ". You are currently " + str(age) + " years old.")
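Because int() raises a ValueError for text that is not a valid integer, the conversion is often wrapped in try/except. The following is a minimal sketch; parse_int is a hypothetical helper name, and fixed strings stand in for what input() would return so the example runs without a keyboard:

```python
def parse_int(text):
    """Return text converted to int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

# Simulated user input (hypothetical values) instead of calling input().
for raw in ["21", " 42 ", "abc", "3.5"]:
    print(repr(raw), "->", parse_int(raw))
```

In an interactive program, the same helper could be called in a loop that asks again until it returns a value other than None.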
1.3. Taking Multiple Inputs from User
In Python, you can take multiple inputs from a user in a single line using
the input() function combined with the split() method. The split() method breaks the
input string into a list of values based on spaces or a specified delimiter. This technique is
particularly useful when handling multiple pieces of data efficiently without requiring
separate input statements.
For example, if you want to take multiple space-separated numbers as input, you can use:
numbers = input("Enter numbers separated by spaces: ").split()
print(numbers)
By default, split() divides the input string wherever whitespace is found. If numeric values
are required, you should convert them using appropriate data types like int() or float(). To
convert each string to an integer before storing it in a list, we use the map(int, ...) function
and apply the int() function to each input value.
numbers = list(map(int, input("Enter numbers separated by spaces: ").split()))
print(numbers)
You can also specify a different delimiter, such as a comma, by passing it as an argument
to split():
words = input("Enter words separated by commas: ").split(",")
print(words)
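When a delimiter is passed explicitly, split() keeps any spaces around each item. The following is a minimal sketch of cleaning those up with strip(); a fixed string stands in for the value returned by input() (an assumption made so the example runs without user interaction):

```python
# Hypothetical user entry, used instead of calling input().
raw = "apple, banana ,cherry"

# split(",") alone keeps the spaces around each item.
words = raw.split(",")

# strip() on each piece removes the leading and trailing whitespace.
clean_words = [word.strip() for word in words]

print(words)
print(clean_words)
```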
1.4. Opening and Closing Files
Python provides a built-in open() function that is used to open a file stored on a computer
hard disk. Here’s its syntax:
file_object = open(file_name [, access_mode][, buffering])
                      Mode        Description
                      r           Read mode (default)
                      w           Write mode (overwrites file)
                      a           Append mode (adds to the file)
                      r+          Read and write mode
                      w+          Write and read mode (overwrites)
                      a+          Append and read mode
                      rb, wb, ab  Binary modes
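The difference between the w, a, and r modes in the table can be observed directly. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name so that no existing file is touched:

```python
import os
import tempfile

# Work in a temporary directory; "modes_demo.txt" is a hypothetical name.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes_demo.txt")

    with open(path, "w") as f:      # "w" creates the file (or empties it)
        f.write("first\n")

    with open(path, "a") as f:      # "a" adds to the end of the file
        f.write("second\n")

    with open(path, "r") as f:      # "r" reads the existing content
        after_append = f.read()

    with open(path, "w") as f:      # "w" again discards the old content
        f.write("replaced\n")

    with open(path, "r") as f:
        after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
```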
A file object exposes several attributes that describe the open file.
              Attribute     Description
              file.closed   True if the file is closed; False otherwise
              file.mode     Access mode with which the file was opened
              file.name     Name of the file
Open a file and find its attributes:
fSample = open("sample.txt", "r")
print("Name of the file: ", fSample.name)
print("Closed or not : ", fSample.closed)
print("Opening mode : ", fSample.mode)
You can close an open file using the close() method, which writes any buffered data to
disk and releases the system resources tied to the file.
fSample = open("sample.txt", "r")
print("Closed or not : ", fSample.closed)
fSample.close()
print("Closed or not : ", fSample.closed)
1.5. Reading and Writing Files
File objects provide two basic methods: write() and read(). The file.write() method stores
data in a file while the file.read() method reads data from an open file. These methods are
commonly used to handle file import and export operations in Python.
Writing to a file:
file = open("example.txt", "w") # Open file in write mode
file.write("Hello, this is a test file!")
file.close()
Read entire content:
file = open("example.txt", "r") # Open file in read mode
content = file.read() # Read entire content
print(content)
file.close() # Always close the file after use
Using “with” ensures the file is properly closed after operations:
with open("example.txt", "r") as file:
    content = file.read()
    print(content)
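When a text file contains non-ASCII characters, the default text encoding can depend on the platform, so passing encoding="utf-8" to open() explicitly is a common precaution. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name:

```python
import os
import tempfile

text = "tôi đi"   # sample text with non-ASCII characters

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "utf8_demo.txt")  # hypothetical file name

    # Write and read with the same explicit encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

# True when the text survived the round trip unchanged.
print(content == text)
```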
Reading file line by line:
with open("example.txt", "r") as file:
    for line in file:
        print(line.strip()) # strip() removes surrounding whitespace, including the newline
Appending to a file:
with open("example.txt", "a") as file:
    file.write("\nAdding a new line.")
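The binary modes (rb, wb, ab) work with bytes objects instead of strings. The following is a minimal sketch of a binary round trip, assuming a temporary directory, a hypothetical file name, and arbitrary sample bytes:

```python
import os
import tempfile

data = bytes([0, 1, 2, 253, 254, 255])   # arbitrary sample bytes

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "binary_demo.bin")  # hypothetical file name

    with open(path, "wb") as f:    # "wb": write bytes, no text encoding
        f.write(data)

    with open(path, "rb") as f:    # "rb": read() returns a bytes object
        loaded = f.read()

print(loaded)
print(list(loaded))
```

Converting the bytes object to a list shows each byte as an integer from 0 to 255.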
2. Regular Expressions
A regular expression is a sequence of characters that defines a search pattern used to
identify and match specific patterns within strings. It is a powerful tool for searching,
extracting, and manipulating text based on defined patterns.
The “re” module provides support for working with regular expressions in Python.
import re
                Pattern  Description
                    ^    Matches the beginning of the string (of each line with re.MULTILINE)
                    $    Matches the end of the string (of each line with re.MULTILINE)
                    .    Matches any single character except a newline
                    *    Zero or more repetitions of the preceding element
                    +    One or more repetitions of the preceding element
                    ?    Zero or one repetition of the preceding element
                  […]    Matches any single character in brackets
                 [^…]    Matches any single character not in brackets
                   \w    Matches word characters (letters, digits, and underscore)
                   \W    Matches nonword characters
                   \d    Matches digits. Equivalent to [0-9] for ASCII text
                   \D    Matches nondigits
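Several of the patterns in the table can be tried with re.findall(). The following is a minimal sketch; the sample strings are made up for illustration:

```python
import re

text = "Order 66 shipped to Box_7 on day 3"   # made-up sample text

starts_with_order = re.findall(r"^Order", text)       # ^ anchors at the start
ends_with_digit = re.findall(r"\d$", text)            # $ anchors at the end
numbers = re.findall(r"\d+", text)                    # one or more digits
words = re.findall(r"\w+", text)                      # runs of word characters
vowels = re.findall(r"[aeiou]", "regex")              # any character in the set
non_vowels = re.findall(r"[^aeiou]", "regex")         # any character not in the set
optional_u = re.findall(r"colou?r", "color colour")   # ? makes the "u" optional

print(numbers)
print(words)
print(optional_u)
```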
Matching a pattern
pattern = r"hello"
text = "hello world"
match = re.search(pattern, text)
if match:
    print("Match found!")
Finding all matches
pattern = r"\d+" # Matches one or more digits
text = "There are 3 cats, 4 dogs, and 5 birds."
matches = re.findall(pattern, text)
print(matches)
Using “re.sub()” for replacing text
text = "My phone number is 123-456-7890."
pattern = r"\d{3}-\d{3}-\d{4}"
new_text = re.sub(pattern, "XXX-XXX-XXXX", text)
print(new_text) # Output: My phone number is XXX-XXX-XXXX.
Extracting groups with “re.search()”
pattern = r"(\d{3})-(\d{3})-(\d{4})"
text = "Call me at 123-456-7890."
match = re.search(pattern, text)
if match:
    print("Area Code:", match.group(1)) # Output: 123
    print("Main Number:", match.group(2) + match.group(3)) # Output: 4567890
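Besides group(n), a match object also offers group(0) for the whole matched text and groups() for all captured groups at once. The following is a minimal sketch reusing the phone number pattern; the variable names area, prefix, and line are hypothetical:

```python
import re

pattern = r"(\d{3})-(\d{3})-(\d{4})"
text = "Call me at 123-456-7890."

match = re.search(pattern, text)
if match:
    whole = match.group(0)       # the entire matched text
    parts = match.groups()       # all captured groups as a tuple
    area, prefix, line = parts   # unpack the three groups
    print(whole)
    print(parts)
```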
In Python, a regular expression pattern like “^F.+:” matches a string that starts with “F”,
continues through one or more characters, and ends with a colon. Because the + quantifier is
greedy, it consumes as many characters as possible, so the match extends to the last colon
on the line. This is why it keeps retrieving characters even after encountering the first
colon.
text = 'From: Using the : character'
print(re.findall('^F.+:', text)) # Output: ['From: Using the :']
In contrast, re.findall('^F.+?:', text) uses a non-greedy quantifier (+?), instructing Python to
match the shortest possible sequence between "F" and a colon. This ensures the
match stops at the first occurrence of the colon rather than continuing to the last colon on
the line.
text = 'From: Using the : character'
print(re.findall('^F.+?:', text)) # Output: ['From:']
3. Exercise
  1. Study and give specific examples of special characters in the Regular Expression
     Syntax section on the link:
     https://docs.python.org/3/library/re.html#regular-expression-syntax
  2. Create a file named text.txt containing at least 3,000 words. Then, write a Python
     program that reads the content of text.txt, counts the occurrences of each unique
     word, and generates an output file named output.txt. The first line
     of output.txt should display the total number of unique words. The following lines
     should list each word along with the number of times it appears, following the
     format <word> <number of occurrences>.
     Example:
            5
            đi    3
            tôi   6
            tôn   1
            thắng 1
            là    3
  3. Write a Python script that uses regular expressions to extract the course number,
     course code, and course name from the following text:
     Courses = ["100 MAT Mathematics",
                 "101 PHY Physics",
                 "102 CHE Chemistry",
                 "103 BIO Biology",
                 "104 ENG English",
                 "105 CSC Computer Science",
                 "106 HIS History",
                 "107 ECO Economics",
                 "108 PSY Psychology",
                 "109 ART Art & Design"]