G3 Computing Textbook Chapter 05

CHAPTER

05 Input Validation

5.1 Why Validation is Needed

For a problem to be solved, its inputs, outputs and processes need to be defined clearly. While the
programmer often has control over the processes used and outputs produced by the code, the supplied
inputs can come from many possible sources, and the programmer often has no control over what
is supplied. This means that the supplied inputs may not actually meet the requirements for valid or
acceptable input data as defined by the problem.

For example, consider the following program to find the average value in a list of numbers.

Input
• values: list to calculate average from (already provided)

Output
• average: average value in values; None if values is empty

Table 5.1 Input and output requirements for the problem of calculating the average
value in a list of numbers

Figure 5.1 Finding the average value in a list of numbers

However, what would happen if the values were initialised as a list of strs instead?

Figure 5.2 Attempting to find the average value in a list of strs

Here, Python outputs an error message that the addition operator (+) used by the sum() function is unable
to work with values that are of different data types (i.e., int and str). This is because values should
contain only numeric values. While this error is due to invalid input provided by the user, we can prevent
the program from behaving in an unexpected manner by performing data validation before the data is
processed. This ensures that the input data is sensible, complete and within acceptable boundaries.
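
The code shown in Figures 5.1 and 5.2 is not reproduced here. As a minimal sketch of the same situation, the example below uses two hypothetical helper functions, average() and all_numeric(), which are assumptions made for illustration. It shows the error raised by sum() for a list of strs and one possible way of validating the list before it is processed.

```python
def average(values):
    # Return the average of a list of numbers, or None if the list is empty.
    if len(values) == 0:
        return None
    return sum(values) / len(values)


def all_numeric(values):
    # Validation: every item in the list must be an int or a float.
    for value in values:
        if type(value) not in [int, float]:
            return False
    return True


print(average([1, 2, 3, 4]))

bad_values = ["1", "2", "3"]
try:
    average(bad_values)
except TypeError as error:
    # sum() starts from the int 0 and cannot add a str to it.
    print("Error:", error)

# Validating first avoids the error altogether.
if all_numeric(bad_values):
    print(average(bad_values))
else:
    print("Data validation failed!")
```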

5.2 Recovering from Invalid Input

LEARNING OUTCOMES

2.4.6 Justify the use of data validation and identify the appropriate action to take when invalid
data is encountered: asking for input again (for interactive input) or exiting the program
(for non-interactive input).

When performing data validation, there are two basic choices on what to do if the data is invalid.

5.2.1 Asking for Input Again

If the data was entered by the user, we can ask for the data to be re-entered. This option makes the most sense if the data might change by trying again.

For instance, suppose there is an is_valid() function that returns a bool for whether the provided input is valid. The following example demonstrates how to keep asking for input again if the input is invalid:

Figure 5.3 Example that asks for input again if input is invalid
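
The original figure is not available in this text, so the following is only a sketch of the idea. The rule inside is_valid() (input must not be blank), the function name ask_until_valid() and its read_input parameter are all assumptions made for illustration.

```python
def is_valid(text):
    # Hypothetical rule for illustration: input must not be blank.
    return text != ""


def ask_until_valid(read_input=input):
    # Keep asking until is_valid() accepts the input.
    while True:
        data = read_input("Enter data: ")
        if is_valid(data):
            return data
        print("Data validation failed!")
```

Calling ask_until_valid() with no arguments reads from the keyboard. The read_input parameter exists only so that the loop can be tried out with a scripted sequence of inputs instead.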

5.2.2 Exiting the Program

If the input data is invalid and was not entered by the user (e.g., it was read from a file), we can skip the rest of the program and exit.

For instance, suppose the input is in a file named input.txt and there is an is_valid() function that returns a bool for whether the provided input is valid. The following example demonstrates how to exit the program if the input is invalid:

Figure 5.4 Example that exits the program if input is invalid
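
As a rough sketch of this approach (not the code from the original figure), the example below skips the rest of the processing when the file contents are invalid. The digits-only rule in is_valid(), the run() function and the doubling step are assumptions made for illustration. A temporary sample file stands in for input.txt so that the sketch can run on its own.

```python
import os
import tempfile


def is_valid(text):
    # Hypothetical rule for illustration: the file must contain only digits.
    return text.strip().isdigit()


def run(filename):
    with open(filename) as f:
        data = f.read()
    if not is_valid(data):
        print("Data validation failed!")
        return None  # Skip the rest of the program.
    # The rest of the program runs only if the input is valid.
    return int(data) * 2


# Create a sample input file in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "input.txt")
with open(path, "w") as f:
    f.write("abc\n")

print(run(path))
```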

DID YOU KNOW?
In Python, one way of ending or exiting a program immediately is by importing the sys (short for system) module and calling sys.exit().

For instance, suppose the input is in a file named input.txt and there is an is_valid() function that returns a bool for whether the provided input is valid. The following example demonstrates using sys.exit() to exit the program immediately if the input is invalid:

Figure 5.5 Template that uses sys.exit() to exit the program immediately if input is invalid
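
The template itself is not reproduced in this text. A possible sketch is given below, in which the digits-only rule in is_valid() and the function name load_or_exit() are assumptions made for illustration. A temporary sample file with valid contents stands in for input.txt so that the sketch runs to the end.

```python
import os
import sys
import tempfile


def is_valid(text):
    # Hypothetical rule for illustration: the file must contain only digits.
    return text.strip().isdigit()


def load_or_exit(filename):
    with open(filename) as f:
        data = f.read()
    if not is_valid(data):
        print("Data validation failed!")
        sys.exit()  # Raises SystemExit, which ends the program immediately.
    return data.strip()


# Create a sample input file with valid contents.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "input.txt")
with open(path, "w") as f:
    f.write("2024\n")

data = load_or_exit(path)
print("Valid input:", data)
```

Unlike the previous approach, no else branch or early return is needed: any code placed after the call to load_or_exit() is reached only if the input is valid.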

5.3 Common Validation Checks

LEARNING OUTCOMES

2.4.7 Validate input data for acceptance by performing:


• length check,
• range check,
• presence check,
• format check,
• existence check (i.e., checking for whether input data is already in the system), and/or
• calculation of a check digit

To validate input data, a number of common validation checks may be performed. Note that these validation
checks are not mutually exclusive and may have some overlaps. For instance, a length check (see section
5.3.1) may also be performed as part of a format check (see section 5.3.4).

5.3.1 Length Checks

Recall that the number of items in a list or number of characters in a str is called its length. The length of the inputs for a problem often needs to meet certain requirements in order to be valid. This is known as a length check.

For example, consider the input to a word-guessing game where all guesses must have a length of 3 characters.

Input
• guess: str with a length of 3 characters (required and provided as many times as needed until program ends)

Output
• “Correct” or “Wrong”, depending on whether the guess matches a secret str of length 3 characters (program ends after correct guess)

Table 5.2 Example of problem requiring a length check

For the input to be valid, every guess must have a length of 3 characters exactly. The program below
performs this length check using the len() function on line 16.

Figure 5.6 Example of program that performs a length check

Note that this program uses two infinite while loops. To make the player keep guessing until a correct
guess is made, the program enters an outer while loop on line 11 that keeps repeating until the correct
word is guessed and a break statement is used to exit the outer loop on line 24. To make the player re-
enter guess until it passes input validation, the program enters an inner while loop on line 14 that keeps
repeating until the length check on line 16 is passed and a break statement is used to exit the loop on line
17.

In general, length checks make use of the len() function and are needed to ensure that list or str input
data is not too short or too long.
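
The two-loop structure described above can be sketched as follows. This is not the program from Figure 5.6, so the line numbers mentioned in the text do not apply to it. The secret word, the function names and the read_input parameter are assumptions made for illustration.

```python
SECRET = "cat"  # Hypothetical secret word with a length of 3 characters.


def passes_length_check(guess):
    # Length check: the guess must be exactly 3 characters long.
    return len(guess) == 3


def play(read_input=input):
    # Outer loop: repeat until the correct word is guessed.
    while True:
        # Inner loop: repeat until the guess passes the length check.
        while True:
            guess = read_input("Enter guess: ")
            if passes_length_check(guess):
                break
            print("Data validation failed!")
            print("Guess should be 3 characters long")
        if guess == SECRET:
            print("Correct")
            break
        print("Wrong")
```

Calling play() with no arguments starts the game using keyboard input.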

5.3.2 Range Checks

Another common requirement is to limit the input to a particular range of values. This is known as a range
check. For example, consider the problem of converting rounded percentage scores to academic grades
based on Table 5.3.

Rounded percentage score    Grade    Grade points
75–100                      A1       1
70–74                       A2       2
65–69                       B3       3
60–64                       B4       4
55–59                       C5       5
50–54                       C6       6
45–49                       D7       7
40–44                       E8       8
Less than 40                F9       9

Input: Rounded percentage score
Output: Grade

Table 5.3 Converting rounded percentage scores to grade points

The inputs and outputs for this problem are specified below:

Input
• score: rounded percentage score, should be a whole number between 0 and 100 inclusive

Output
• Correct academic grade for given rounded percentage score

Table 5.4 Example of problem requiring a range check

In this problem, the input is a rounded percentage and must be a whole number between 0 and 100. The
following program shows how this range check might be performed:

By using None to represent an invalid score, this loop keeps repeating as long as score is invalid.

If input validation fails, we set score to None so the program tries again by repeating the loop.

Once the program exits the loop and reaches this point, we can be certain that score is valid.

Figure 5.7 Example of program that performs a range check

Like the previous example, this program keeps trying if data validation fails by asking the user to re-enter
the score. In general, range checks make use of these operators:
• less than (<)
• less than or equal to (<=)
• greater than (>)
• greater than or equal to (>=)

They are needed to ensure that int or float input data is within the required range of values.
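
The code in Figure 5.7 is not reproduced in this text. The sketch below follows the approach described in its annotations, using None to represent an invalid score. The function name ask_for_score(), the read_input parameter and the use of str.isdigit() to confirm that the input is a whole number are assumptions made for illustration.

```python
def ask_for_score(read_input=input):
    score = None  # None represents an invalid score.
    while score is None:
        text = read_input("Enter score: ")
        # Range check: a whole number between 0 and 100 inclusive.
        if text.isdigit() and int(text) >= 0 and int(text) <= 100:
            score = int(text)
        else:
            print("Data validation failed!")
            print("Score should be a whole number between 0 and 100")
            score = None  # Try again by repeating the loop.
    # At this point, score is certain to be valid.
    return score
```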

5.3.3 Presence Checks

For some problems, portions of the input are required (or mandatory) while other portions may be optional.
If it is possible to leave out any optional inputs, the program should perform a presence check to ensure
that all the required inputs are provided.

For instance, let us look again at the word-guessing game where all guesses must have a length of 3
characters.

Input
• guess: str with a length of 3 characters (required and provided as many times as needed until program ends)

Output
• “Correct” or “Wrong”, depending on whether the guess matches a secret str of length 3 characters (program ends after correct guess)

Table 5.5 Example of problem requiring a presence check

In this example, the input guess is required and must not be left blank.

The program in Figure 5.8 shows how a presence check for guess might be performed by checking whether
the given input for guess is blank on line 16.

In general, presence checks use special functions or compare inputs to values such as None or an empty
string ("") to ensure that all required inputs are provided.

Figure 5.8 Example of program that performs a presence check
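
Since the figure's code is not reproduced in this text, here is a minimal sketch of a presence check that compares the input to an empty string. The function name ask_for_required() and its parameters are assumptions made for illustration.

```python
def ask_for_required(prompt, read_input=input):
    # Keep asking until the required input is provided.
    while True:
        text = read_input(prompt)
        if text != "":  # Presence check: input must not be blank.
            return text
        print("Data validation failed!")
        print("Input must not be blank")
```

For example, ask_for_required("Enter guess: ") keeps asking via the keyboard until something other than a blank line is entered.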

5.3.4 Format Checks

Sometimes a problem requires the input to satisfy additional complex requirements such as following a
particular pattern. This is known as a format check.

For instance, consider a program that prints an appropriate greeting for the given time of the day.

Input
• time: time based on the 24-hour clock in HH:MM format

Output
• An appropriate greeting for the given time; “Good Morning” from 05:00 to 11:59, “Good Afternoon” from 12:00 to 17:59, “Good Evening” from 18:00 to 21:59, and “Good Night” from 22:00 to 04:59

Table 5.6 Example of problem requiring format check

In this problem, the input must follow the HH:MM format. This means that the user must key in exactly two
digits followed by a colon and another two digits.

The following program shows how this format check might be performed.

Note: the str.isdigit() method returns True if every character in the str is a digit (and the str has at least one character).

Figure 5.9 Example of program that performs a format check

Depending on the problem to be solved, more complex format checks, such as requiring that the input
string uses a particular data format, may also be needed.
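
One way the HH:MM format check might be written is sketched below. It is not the program from Figure 5.9, and the function name is_hh_mm() is an assumption made for illustration. Note how a length check forms part of the format check.

```python
def is_hh_mm(time):
    # A length check performed as part of the format check.
    if len(time) != 5:
        return False
    # The third character must be a colon.
    if time[2] != ":":
        return False
    # The remaining characters must all be digits.
    return time[:2].isdigit() and time[3:].isdigit()


for sample in ["09:30", "9:30", "0930", "ab:cd", "25:99"]:
    print(sample, is_hh_mm(sample))
```

The last sample, "25:99", passes this format check even though it is not a real time. A range check on the hours (00 to 23) and minutes (00 to 59) would still be needed to reject it.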

5.3.5 Existence Checks

For some problems, input is valid only if it is (or alternatively, is not) in an existing collection or repository of data. This is known as an existence check.

Suppose we have an email lookup program that allows users to enter a username and look up the user’s email address.

Input
• username: a str (must already exist in the system)

Output
• Email address for the given username

Table 5.7 Example of problem requiring existence check

In this problem, an existence check is needed to ensure that the entered username already exists in the
system.

The following program shows how this check may be performed on line 10 if usernames and email
addresses are stored in a dict named addresses:

Input: username
Output: email address

Figure 5.10 Example of program that performs an existence check
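
The figure's code is not reproduced in this text. A minimal sketch of the same idea is shown below, in which the contents of addresses, the function name look_up() and the wording of the messages are assumptions made for illustration.

```python
# Hypothetical usernames and email addresses for illustration.
addresses = {
    "siti": "siti@example.com",
    "bala": "bala@example.com",
}


def look_up(username):
    # Existence check: the username must already be a key in the dict.
    if username not in addresses:
        print("Data validation failed!")
        print("Username should exist in system")
        return None
    return addresses[username]


print(look_up("siti"))
print(look_up("alex"))
```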

Suppose now we want users to register unique usernames in a system.

Input Output

• username: a unique str (provided as many • The list of all


times as needed until a blank username is unique usernames
entered; must not have been entered before) entered

Table 5.8 Example of problem requiring an existence check

This time, an existence check is needed to ensure that the entered username does not already exist in the
system.

The following program shows how this check may be performed on line 10 if usernames are stored in a
list named usernames:

Enter username: siti
Enter username: bala
Enter username: bala
Data validation failed!
Username should not exist in system
Enter username: alex
Enter username:
['siti', 'bala', 'alex']

Figure 5.11 Example of program that performs an existence check

In general, existence checks use the in and not in membership operators to check whether input already
exists in the system.
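
The username registration problem could be approached along the following lines. This is a sketch rather than the program from Figure 5.11; the function name register() and the read_input parameter are assumptions made for illustration, while the messages follow the sample run shown above.

```python
def register(read_input=input):
    usernames = []
    while True:
        username = read_input("Enter username: ")
        if username == "":
            break  # A blank username ends the input.
        # Existence check: the username must not have been entered before.
        if username in usernames:
            print("Data validation failed!")
            print("Username should not exist in system")
            continue
        usernames += [username]
    return usernames
```

Calling print(register()) and entering siti, bala, bala, alex and a blank line would reproduce the sample run.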

5.3.6 Check Digits

A check digit is usually an additional digit or letter added to the end of a sequence of digits that is intended
to be read by or entered into a computer manually. The check digit is mathematically related to the original
sequence of digits so that simple input errors, such as accidentally swapping two digits or wrongly entering
a digit, would break this mathematical relationship and hence be detected. If the check digit is a letter,
it would usually need to be converted to a number so that it can be used in the algorithm to check the
sequence.

After the digits have been manually entered, the computer will calculate whether the expected mathematical
relationship is true. If the relationship is still true, it is likely that the numbers were entered correctly.
Otherwise, the computer can ask for the numbers to be re-entered and checked again.

For instance, most products sold in shops are labelled with a barcode that also contains a number. This number is also known as a product code, which is used to identify the product being purchased. A product code is obtained when the barcode is decoded by a barcode reader or when the number is entered manually at the cashier. The last digit of this product code is actually a check digit that helps to prevent errors in the decoding or entry process.

One popular standard for check digits that are used in product barcodes is the Universal Product Code (UPC-A) standard. Figure 5.12 shows an example of a 12-digit UPC-A product code. The 12th digit is the check digit.

Figure 5.12 Example of a UPC-A barcode (the last digit here is the check digit)

To calculate the check digit, the first 11 digits need to be processed using the following algorithm:

1. Add the digits in the odd-numbered positions (i.e., 1st, 3rd … 11th).
2. Multiply the result by three.
3. Add the digits in the even-numbered positions (i.e., 2nd, 4th … 10th) to the result.
4. If the last digit of the result is 0, the check digit should be 0. Otherwise, subtract the last digit of the result from 10. The check digit should be the same as the resulting answer.

The program in Figure 5.13 demonstrates how to calculate the check digit from the first 11 digits of a UPC-A
code. There are also many other possible check digit algorithms.

Figure 5.13 Finding the check digit of a Universal Product Code
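
The program in the figure is not reproduced in this text. The four steps above can be sketched as follows, where the function names upc_check_digit() and is_valid_upc() are assumptions made for illustration and the sample code 036000291452 is simply a commonly used example.

```python
def upc_check_digit(first_11):
    # first_11 is a str containing the first 11 digits of a UPC-A code.
    odd_sum = 0
    even_sum = 0
    for index in range(11):
        digit = int(first_11[index])
        if index % 2 == 0:
            # Index 0 is the 1st position, index 2 is the 3rd, and so on.
            odd_sum += digit
        else:
            even_sum += digit
    result = odd_sum * 3 + even_sum  # Steps 1 to 3.
    last_digit = result % 10
    if last_digit == 0:  # Step 4.
        return 0
    return 10 - last_digit


def is_valid_upc(code):
    # A full code is valid if it has 12 digits and the 12th digit
    # matches the check digit calculated from the first 11.
    if len(code) != 12 or not code.isdigit():
        return False
    return upc_check_digit(code[:11]) == int(code[11])


print(upc_check_digit("03600029145"))  # 2
print(is_valid_upc("036000291452"))    # True
print(is_valid_upc("036000291425"))    # False (last two digits swapped)
```

For the sample code, the odd-numbered positions add up to 14, which gives 42 when multiplied by three. Adding the even-numbered positions (16) gives 58, and subtracting the last digit 8 from 10 gives a check digit of 2.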

QUICK CHECK 5.3
1. For each of the following extracts of input entry and validation code, identify the most appropriate description
of whether the program is performing a length check, range check, presence check, format check or existence
check.
a) Extract 1:
while True:
    s = input("Enter s: ")
    if len(s) == 2 and s[0] in "ABCDEF" and s[1].isdigit():
        break
    print("Data validation failed!")
...

Figure 5.14 Extract 1

b) Extract 2:
while True:
    first = input("Enter first name: ")
    middle = input("Enter middle name (optional): ")
    last = input("Enter last name: ")
    if first == "" or last == "":
        print("Data validation failed!")
    else:
        break
...

Figure 5.15 Extract 2

c) Extract 3:
with open('serials.txt', 'r') as f:
    serials = f.read().split('\n')
while True:
    serial = input("Enter serial: ")
    if serial not in serials:
        print("Data validation failed!")
        continue
    break
...

Figure 5.16 Extract 3

d) Extract 4:
p = float(input("Enter p:"))
while p < 0.0 or p > 1.0:
    print("Data validation failed!")
    p = float(input("Enter p:"))
...

Figure 5.17 Extract 4

e) Extract 5:
n = int(input("Enter n:"))
while True:
    integers = []
    while True:
        input_text = input("Enter integer, blank to end: ")
        if input_text == "":
            break
        integers += [int(input_text)]
    if len(integers) == n:
        break
    print("Data validation failed!")
...

Figure 5.18 Extract 5

REVIEW QUESTIONS
1. A program requires the user to enter their age. Suggest two common validation checks that may be performed
and explain why they may be needed.

2. In Singapore, each citizen and permanent resident has a unique National Registration Identity Card (NRIC)
number that includes a check digit.

A program is written that requires the user’s own NRIC number and performs input validation using its check
digit.

Explain whether this is sufficient to prevent the user from entering another person’s NRIC number.

3. Programs may use str values in the format 'YYYY-MM-DD' to represent day DD, month MM and year YYYY. If we
do not need to distinguish between leap years and non-leap years, a valid date obeys these rules:

• YYYY must be a 4-digit number between 1900 and 9999 inclusive
• MM must be a 2-digit number between 01 and 12 inclusive
• DD must be a 2-digit number
• If MM is 02, DD must be between 01 and 29 inclusive
• If MM is 01, 03, 05, 07, 08, 10 or 12, DD must be between 01 and 31 inclusive
• Otherwise, DD must be between 01 and 30 inclusive

Write the input entry and validation code for a program that does not distinguish between leap years and non-
leap years and needs to accept a valid date. If the input entered via the keyboard is invalid, your input validation
code should keep trying by asking for the input to be entered again. The specification for the problem’s input is
provided below. (The specification for the output is not needed to solve this problem.)

Input
• date: a valid date str in the format “YYYY-MM-DD” (there is no need to distinguish between leap years and non-leap years)

Output
(not needed)

ANSWER

Pg. 232-Quick Check 5.3


1. a) Format check
b) Presence check
c) Existence check
d) Range check
e) Length check

Pg. 233-Review Question


1. Any two of:
• Range check
Explanation: Needed to ensure that the input is non-negative.
• Presence check
Explanation: Needed to ensure that the input is provided.
• Format check
Explanation: Needed to ensure that the input is made up of digits.

2. No, performing input validation with a check digit is not sufficient to prevent a user from entering another
person’s NRIC number.

A check digit can only check if there is an input error such as typing the wrong digit or having swapped digits. It
cannot detect if a valid NRIC number belongs to the person who is using the program.

3. A possible program is as follows:

# Program: date_validation.ipynb
# Input and input validation
def is_valid(date):
    # Check length and whether dashes are in correct place.
    if len(date) != 10:
        return False
    if date[4] != '-' or date[7] != '-':
        return False

    # Check if YYYY, MM and DD are digits.
    year = date[:4]
    month = date[5:7]
    day = date[8:]
    if not (year.isdigit() and month.isdigit() and day.isdigit()):
        return False

    # Perform range checks on YYYY, MM and DD.
    year = int(year)
    month = int(month)
    day = int(day)
    if year < 1900:
        # Upper limit of 9999 does not need explicit check.
        return False
    if month < 1 or month > 12:
        return False
    if day < 1:
        return False
    if month == 2:
        return day <= 29  # February requires day <= 29.
    elif month in [1, 3, 5, 7, 8, 10, 12]:
        return day <= 31  # "Long" months require day <= 31.
    else:
        return day <= 30  # "Short" months require day <= 30.

while True:
    date = input("Enter date: ")
    if is_valid(date):
        break
    print("Data validation failed!")
