Data Processing
DEFINITION OF TERMS
Data
Data is a collection of raw facts (which may include numbers, letters, symbols, sounds
or images) that convey little meaning by themselves and cannot be used for
decision making.
Information
This is data that has undergone processing and is meaningful to the user and can
be used for decision making.
Data processing
Refers to the process of transforming data into information, either manually or
electronically.
DATA PROCESSING CYCLE
Definition
It refers to the input – process – output stages that data goes through to be transformed into
information.
[Figure 1: Data processing cycle, showing data collection, data input, data processing and output, with output retained for future use]
Makari Page 1 of 13
STAGES OF DATA PROCESSING CYCLE
1. Data collection / Data gathering / Fact finding
This is the process of getting data from its point of origin for processing.
Stages of data collection
i. Data creation
This is the process of putting together facts in an organized format. This could
be in the form of manually prepared documents, or data captured from the source
using a data capture device such as a bar code reader.
ii. Data transmission
This refers to the transfer of the data from the source to the point where
processing will be done. Transmission can be electronic, from computer to
computer, or physical, for example through the post office.
iii. Data preparation
This is the transcription / conversion of data from the source document to
machine readable form. Data collected using devices that capture data in
digital form does not require transcription / conversion.
iv. Media conversion
Data may need to be converted from one medium to another e.g. from a
compact disk to a hard disk for faster input.
v. Input validation
Data entered into the computer is subject to validity and verification
checks before processing.
Verification
It involves checking that what is on the input document is exactly the same
as what is entered into the computer.
Validation
This is the identification and removal of errors by the computer through the
control of the program.
vi. Sorting
The source documents are arranged in a particular order for easy and
faster entry.
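The difference between verification and validation can be sketched in Python. This is only an illustration: the function names `verify` and `validate_marks`, and the rule that marks are whole numbers from 0 to 100, are assumptions made for the example.

```python
def verify(source_value: str, entered_value: str) -> bool:
    """Verification: the entered value must match the source document exactly."""
    return source_value == entered_value


def validate_marks(entered_value: str, low: int = 0, high: int = 100) -> bool:
    """Validation: the program rejects values that break its rules.

    Assumed rule for this example: marks are whole numbers from low to high.
    """
    if not entered_value.isdecimal():          # type check: digits only
        return False
    return low <= int(entered_value) <= high   # range check


print(verify("542", "524"))     # False: what was typed differs from the source
print(validate_marks("524"))    # False: outside the assumed 0 to 100 range
print(validate_marks("85"))     # True
```

Note that a value can pass validation and still be wrong: "58" typed instead of "85" is within range, so only verification against the source document would catch it.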
Methods of data collection
i. Interview
It’s a process of obtaining information by means of conversation. It helps the
analyst to overcome the interviewee’s resistance, and so to obtain facts and
verify them.
Types of interviews
a. Structured interviews
Here the questions are predetermined and the interview is carried out in
a fixed format.
b. Unstructured interviews
Here the questions are not predetermined and are asked
as the conversation flows. In this case the interviewer
may lose focus and forget to cover all the areas that are necessary
to gather information.
ii. Questionnaire
It is a written set of questions to which the subjects respond in writing.
Questionnaires solicit information on the background of the subjects, their
knowledge of the research problem, their attitudes and views of the problem.
Advantages
They are easy to analyze
They are easy to administer
The subjects can respond at their own time
They allow the use of a big sample
The subjects respond in a free environment and so can be sincere
They are economical in terms of time and money
Disadvantages
They are difficult to construct
Require literate subjects
The rate of return of the questionnaire is usually very low
Subjects may misunderstand certain questions or deliberately give socially
acceptable responses
Limited information is collected
iii. Observation
This method involves watching an operation for a period of time to see for
oneself exactly what happens. This method is particularly good for tracing
bottlenecks and checking facts that have already been noted.
Points to consider during observation
Not too many behaviors should be observed
Observe a single subject
The behavior to be observed should be defined in sufficient detail
Where more than one observer is used, training is necessary
Advantages
It provides first hand information
In some situations it may be the only method of getting information
It supplements interviews and questionnaires
Disadvantages
It cannot be used where events have passed
It is time consuming
Can cover only a limited area
There is a danger of the observer being biased
While recording, the observer may fail to observe something else
The subject may behave differently when being observed.
iv. Record inspection
This is going through the existing records in order to get information
Advantages
First hand information
It is very accurate
Disadvantages
Time consuming
Some records may not be complete
Faulty documents may be presented
2. Data input
This is the process where the collected data is converted from the human readable form to
machine readable form
3. Processing
This is the transformation of input data by the central processing unit to a more
meaningful output
4. Output
This is the final activity of data processing cycle which produces the desired output
(information). The information is then distributed to the target group.
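As a rough illustration of the cycle, the sketch below takes collected raw marks (data), converts them to machine readable form, processes them and outputs a summary (information). The marks and the function names `data_input`, `process` and `output` are made up for the example.

```python
def data_input(raw_lines):
    """Input: convert human readable text into machine usable numbers."""
    return [int(line.strip()) for line in raw_lines]


def process(marks):
    """Processing: transform the raw marks into something meaningful."""
    return {"count": len(marks), "mean": sum(marks) / len(marks), "highest": max(marks)}


def output(summary):
    """Output: present the information to the target group."""
    return (f"{summary['count']} students, mean {summary['mean']:.1f}, "
            f"highest {summary['highest']}")


collected = ["56", "72 ", " 81"]      # data collection (hypothetical marks)
print(output(process(data_input(collected))))
```

The three raw marks say little on their own; the summary line is what a teacher could actually use for decision making.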
DESCRIPTION OF ERRORS IN DATA PROCESSING
The accuracy of the data entered in the computer determines the accuracy of the information
given out
Types of data processing errors
i. Transcription errors
These errors occur during data entry and they include:
a. Misreading errors
They are brought about by incorrect reading of the source document by the user,
who then enters wrong values. This may be caused by bad handwriting or
confusion on the source document e.g. the number ‘5’ can be typed as the letter ‘S’, or
the letter ‘O’ typed as zero (‘0’).
b. Transposition errors
This error occurs as a result of incorrect arrangement of characters, i.e. putting
characters in the wrong order e.g. the user may enter 524 instead of 542.
NB: Transcription errors can be minimized by the use of data capturing devices.
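The two kinds of transcription error can be told apart by comparing the entered value with the source value. The helper below is a hypothetical sketch (the name `is_transposition` is not from any library) that recognises the simplest case, one adjacent pair of characters swapped.

```python
def is_transposition(source: str, entered: str) -> bool:
    """True if `entered` is `source` with one adjacent pair of characters swapped."""
    if len(source) != len(entered) or source == entered:
        return False
    # Positions at which the two values disagree.
    diffs = [i for i in range(len(source)) if source[i] != entered[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and source[diffs[0]] == entered[diffs[1]]
        and source[diffs[1]] == entered[diffs[0]]
    )


print(is_transposition("542", "524"))   # True: 4 and 2 were swapped
print(is_transposition("542", "S42"))   # False: a misreading error, not a swap
```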
ii. Computation errors
They occur when an arithmetic operation does not produce the expected results. They
include:
a. Overflow errors
Occurs if the result of a calculation is too large to be stored in the allocated
memory location / space e.g. if the memory space can accommodate an 8 bit
number and the result of the arithmetic calculation is a 9 bit number.
b. Truncation errors
These result from real numbers with a long fractional part which
cannot fit in the allocated memory space, so the computer cuts off the extra digits
from the fractional part e.g. a number like 0.856948 can be truncated to three
decimal places as 0.856.
c. Rounding errors
These result from raising or lowering a digit in a real number to the required
rounded number e.g. a number like 0.589 can be rounded to two decimal places as 0.59.
iii. Algorithm / logical errors
These errors occur as a result of wrong algorithm design.
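The three computation errors above can be reproduced with the numbers used in the examples. This is a sketch under stated assumptions: the 8 bit location is taken to be unsigned, and the lost ninth bit is simply dropped (real hardware may instead raise an error flag).

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Overflow: an unsigned 8 bit location holds 0..255 only.
result = 200 + 100                       # 300 needs 9 bits
fits_in_8_bits = result.bit_length() <= 8
stored = result & 0xFF                   # what remains if the 9th bit is lost

# Truncation: extra fractional digits are cut off, not rounded.
truncated = Decimal("0.856948").quantize(Decimal("0.001"), rounding=ROUND_DOWN)

# Rounding: the last digit kept is raised or lowered.
rounded = Decimal("0.589").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(fits_in_8_bits, stored, truncated, rounded)   # False 44 0.856 0.59
```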
DATA INTEGRITY
Data integrity refers to the accuracy and completeness of the data entered in a computer or
received from the information system
Factors that determine data integrity
i. Accuracy
Accuracy refers to how close an approximation is to an actual value. As long as
the correct data / instruction / information are entered, the computer will produce
accurate results efficiently.
ii. Timeliness
This refers to whether the information is available when it is needed, and whether it is
outdated when it is received or when it is used. If the information is not
available when needed, or is outdated by the time it is used, then it has little or no
value in decision making.
iii. Relevance
Data entered must be pertinent to the processing needs at hand and must meet the
requirements of the processing cycle.
iv. Auditability
It’s also referred to as verifiability. This is the ability of the user to check the
accuracy and completeness of information.
Factors that may lead to loss of data integrity
i. Accidental insertion /alteration, modification, destruction
ii. Deliberate insertion /alteration, modification, destruction
iii. Transfer error
iv. Lack of input validation
Threats to data integrity
Threats to data integrity can be minimized through the following:
i. Using error detection and correction software when transmitting data
ii. Designing user interfaces that minimize chances of invalid data entry
iii. Using devices that capture data directly from the source e.g. scanner, OMR
iv. Controlling access to data by enforcing security measures
v. Backing up data preferably on external storage media
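Point i can be illustrated with a checksum. The sketch below uses CRC-32 from Python's standard `zlib` module; it only detects a transfer error (it does not correct it), and the `send` and `received_intact` helpers and the sample payload are assumptions for the example.

```python
import zlib


def send(payload: bytes):
    """Sender attaches a CRC-32 checksum to the data."""
    return payload, zlib.crc32(payload)


def received_intact(payload: bytes, checksum: int) -> bool:
    """Receiver recomputes the checksum and compares it with the one sent."""
    return zlib.crc32(payload) == checksum


data, crc = send(b"ADM1234,Jane,542")
print(received_intact(data, crc))                   # True
print(received_intact(b"ADM1234,Jane,524", crc))    # False: data changed in transit
```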
DATA PROCESSING METHODS
There are three methods of data processing
i. Manual / conversion
ii. Mechanical
iii. Electronic
1. Manual / conversion data processing
In this method, most tasks are done manually with pen and paper, using simple tools
like tables and rulers. The staff in the organization use laid down procedures and rules to
collect data, process it and distribute information to the relevant departments for use.
2. Mechanical data processing
In this method, the staff use various mechanical machines like calculators, cash
registers, typewriters, duplicating machines etc. to perform their operations. Thus it is the
mechanisation of manual tasks.
3. Electronic data processing
This method involves the use of microprocessors, where tasks are performed by
electronic machines that simulate human intelligence during data processing. This
method is faster and more accurate than the manual and mechanical methods, and large
volumes of data can be processed per unit time. It involves the use of computers, mobile
phones, digital TVs, robots etc.
Factors determining the method of data processing
i. Size and type of business
ii. Timing aspect i.e. how quickly the information is needed for decision making
iii. Links between applications
COMPUTER FILES
A file is a collection of related records that give a complete set of information about a certain item
or entity. A file can be stored manually in file cabinets or electronically in computer storage
devices
Advantages of computerized filing system
i. Information takes less space than in a manual system
ii. It is easy to update and modify information
iii. It offers faster access and retrieval of data and information
iv. It enhances data integrity and reduces duplication
v. Can be accessed within a network environment making it very flexible
Disadvantages of computerized filing system
i. Can be affected by viruses
ii. Requires computer literate manpower to operate
iii. Files can be hacked
Elements of a computer file
It’s made up of three elements
i. Character
ii. Field
iii. Record
Character
This is the smallest element in a computer file and refers to a letter, number or symbol
that can be entered, stored and output by a computer.
Field
This is a single character or a collection of characters that represents a single piece of
data e.g. in a student record, the student admission number is an example of a
field.
Record
This is a collection of related fields that represents a single entity e.g. details of a
student in a row such as admission number, name, total marks etc.
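The relationship between character, field, record and file can be pictured with a small Python structure. The class name `StudentRecord` and the sample students are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:          # a record: related fields describing one entity
    admission_number: str     # a field
    name: str                 # a field
    total_marks: int          # a field


# A file: a collection of related records.
student_file = [
    StudentRecord("ADM001", "Amina", 412),
    StudentRecord("ADM002", "Brian", 389),
]

first = student_file[0]
characters = list(first.admission_number)   # a field broken into its characters
print(characters)                           # ['A', 'D', 'M', '0', '0', '1']
```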
LOGICAL AND PHYSICAL FILES
Computer files are classified as either logical or physical
Logical file
This is a file viewed in terms of what data items it contains and what
processing operations may be performed on those data items.
Physical File
This is a file viewed in terms of how data is stored on the storage media and how the
processing operations are made possible. Physical files have implementation details such
as characters per field and data type for each field.
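One way to see the two views side by side is with Python's standard `struct` module. The field names are the logical view; the byte layout is the physical view. The particular layout chosen here (6 characters, 20 characters, a 2 byte unsigned integer) is an assumption for the example.

```python
import struct

# Logical view: which data items the file holds.
LOGICAL_FIELDS = ("admission_number", "name", "total_marks")

# Physical view (assumed layout): 6 byte text, 20 byte text, 2 byte unsigned integer.
PHYSICAL_LAYOUT = struct.Struct("<6s20sH")


def to_bytes(admission_number: str, name: str, total_marks: int) -> bytes:
    """Store one record exactly as it would sit on the storage medium."""
    return PHYSICAL_LAYOUT.pack(
        admission_number.encode("ascii"), name.encode("ascii"), total_marks
    )


def from_bytes(raw: bytes) -> dict:
    """Rebuild the logical record from its stored bytes."""
    adm, name, marks = PHYSICAL_LAYOUT.unpack(raw)
    values = (
        adm.rstrip(b"\x00").decode("ascii"),    # strip the padding added by pack
        name.rstrip(b"\x00").decode("ascii"),
        marks,
    )
    return dict(zip(LOGICAL_FIELDS, values))


raw = to_bytes("ADM001", "Amina", 412)
print(len(raw), from_bytes(raw))
```

Every record occupies the same 28 bytes regardless of how long the name is, which is the kind of implementation detail that belongs to the physical file only.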
TYPES OF COMPUTER PROCESSING FILES
i. Master file
This is the main file that contains relatively permanent records about particular items or
entities e.g. a customer file may contain details such as customer ID, name, address etc.
ii. Transaction file / movement file
It holds temporary input data during transaction processing. This file is used to update
dynamic data on the master files
iii. Report file
They contain a set of records extracted from the data in the master file. They are used to
prepare reports that can be printed at a later date e.g. overtime report, late arrival report
etc.
iv. Sort file
They are created from existing transaction or master files. They are used where data is to
be processed sequentially. Data or records are first sorted in a particular order e.g.
ascending or descending order.
v. Backup file
This is used to hold copies (backups) of data or information from the computer's fixed storage
(hard disk). Since a file held on the hard disk may be corrupted, lost or changed accidentally,
it is necessary to keep copies of the recently updated files. In case of hard disk failure, a
backup file can be used to reconstruct the original file.
vi. Reference file
They are used for reference or look up purpose. Look-up information is that information
which is stored in a separate file but is required during processing. They may contain
permanent or semi-permanent records e.g. price list
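How a transaction file updates a master file, and how a report is then extracted from it, can be sketched as follows. The customer data, the `update_master` and `report` functions and the use of an in-memory dictionary in place of a real file are all assumptions for the example.

```python
# Master file: relatively permanent records, keyed by customer ID (sample data).
master = {
    "C001": {"name": "Amina", "balance": 1000},
    "C002": {"name": "Brian", "balance": 250},
}

# Transaction file: temporary input collected during the period.
transactions = [("C001", -200), ("C002", 500), ("C001", 50)]


def update_master(master, transactions):
    """Apply each transaction to the dynamic data (balance) on the master file."""
    for customer_id, amount in transactions:
        master[customer_id]["balance"] += amount
    return master


def report(master):
    """Report file: records extracted from the master file, sorted for printing."""
    return sorted((rec["name"], rec["balance"]) for rec in master.values())


update_master(master, transactions)
print(report(master))    # [('Amina', 850), ('Brian', 750)]
```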
FILE ORGANIZATION METHODS
File organization is the arrangement of the records within a particular file. There are four major
methods of file organisation: sequential, direct / random, serial and indexed-sequential
organisation.
1. Sequential File Organisation
Records are stored, sorted and accessed in sequential order by a particular key field e.g.
if a file contains employee data, then the data field likely to be chosen as the key field
is the Employee Number.
Sequential files are typically stored on magnetic tape. Data which is stored sequentially can only
be accessed sequentially, from the beginning of the file proceeding towards the end of the file until
the required record is reached. For this reason, it is called a sequential access method (SAM) file.
Advantages
i. Simple to understand the approach
ii. Easy to organize and maintain
iii. Reading a record requires only the key field
iv. Require cheap input / output media and devices
Disadvantages
i. Entire file must be accessed even when the activity rate is very low
ii. Random enquiries are impossible
iii. Data redundancy is very high
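Sequential access can be sketched as a scan from the first record onwards. The function name `sequential_read` and the employee numbers are made up; a Python list stands in for the tape.

```python
def sequential_read(records, wanted_key):
    """Scan from the beginning of the file until the wanted record is reached.

    Returns the record (or None) and the number of records read on the way.
    """
    reads = 0
    for record in records:
        reads += 1
        if record["employee_number"] == wanted_key:
            return record, reads
        if record["employee_number"] > wanted_key:
            break        # file is sorted by key, so the record cannot appear later
    return None, reads


employee_file = [
    {"employee_number": n, "name": f"Employee {n}"} for n in (101, 104, 109, 115)
]
print(sequential_read(employee_file, 109))
```

Reaching employee 109 means reading 101 and 104 first, which is why random enquiries are impractical with this organisation.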
2. Direct / Random File Organisation
In this method records are stored randomly but accessed directly. To access a record
stored randomly, a record key is used to locate that record.
To get to a particular record, the record key field is given as input to an addressing program,
and the output from this program is the record's storage address. This organisation is only
possible on direct access media such as optical and magnetic disks, and not on tapes.
Advantages
i. It is quick to access a record
ii. It is easy to update a record
iii. It does not make use of indexes
Disadvantages
i. Expensive hardware and software resources are required
ii. Data may be accidentally erased or overwritten unless special precautions are
taken
iii. Systems designed around it are complex and costly
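The idea of turning a record key into a storage address can be sketched with a simple remainder calculation. The notes do not say which addressing method is used, so the division-remainder rule, the seven slot storage area and the function names below are all assumptions.

```python
NUM_SLOTS = 7        # assumed size of the storage area


def address_of(record_key: int) -> int:
    """Hypothetical addressing program: record key in, storage address out."""
    return record_key % NUM_SLOTS


# Each slot holds a list so that two keys with the same address can coexist.
storage = [[] for _ in range(NUM_SLOTS)]


def store(record_key, data):
    storage[address_of(record_key)].append((record_key, data))


def fetch(record_key):
    """Go straight to the computed address; no other slot is read."""
    for key, data in storage[address_of(record_key)]:
        if key == record_key:
            return data
    return None


for key, name in [(101, "Amina"), (104, "Brian"), (108, "Chege")]:
    store(key, name)
print(address_of(108), fetch(108))    # 3 Chege
```

Keys 101 and 108 both map to address 3 here. Handling such collisions is one of the "special precautions" needed to stop one record overwriting another.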
3. Indexed-Sequential File Organisation
It is a combination of the two earlier methods, except that an index is used to enable the
computer to locate individual records on the storage media e.g. on a magnetic drum,
records are stored sequentially on the track; however, each record is assigned an index that
can be used to access it directly.
Advantages
i. Records can be accessed both sequentially and randomly
ii. Records are not duplicated
iii. It takes less time to access a record
Disadvantages
i. The storage media is rather expensive
ii. Accessing a record sequentially is time consuming
iii. Processing records sequentially may introduce redundancy
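A minimal sketch of the indexed-sequential idea: records are kept in key order, and a separate index maps each key to the record's position. A Python list and dictionary stand in for the storage medium and the index; the names and data are invented.

```python
# Records stored in key order (the sequential part).
records = [(101, "Amina"), (104, "Brian"), (109, "Chege"), (115, "Dama")]

# Index: record key -> position of the record in the file.
index = {key: position for position, (key, _) in enumerate(records)}


def read_direct(key):
    """Random access: look the key up in the index, then jump to that position."""
    position = index.get(key)
    return None if position is None else records[position]


def read_sequential():
    """Sequential access: walk the file in key order, ignoring the index."""
    return [name for _, name in records]


print(read_direct(109), read_sequential())
```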
4. Serial file organization
In this organization records in a file are stored and accessed one after another. The records are
not sorted in any way. This method of file organization is used on magnetic tapes (see the diagram
on page 42 of Book 3 by Longhorn).
ELECTRONIC DATA PROCESSING MODES
The basic techniques that are used in data processing are:
1. Online processing
Data is processed as soon as it is received. The computer is connected directly to the
data input unit via a communication link.
Being on-line means being accessible to, and under the control of, the processor. Terminals, wherever
they are physically located, are said to be on-line when directly linked to the processor.
Application
Interactive processing / transaction processing
a. On-line order processing
b. On-line payroll processing
c. On-line point of sale
d. On-line check out system
Random enquiries
a. On-line credit enquiry
b. On-line product availability enquiry
c. On-line account enquiry
Advantages
i. Files are maintained up to date
ii. Information is readily available for decision making
iii. It is possible to do some file enquiries through the terminals (work stations)
Disadvantages
i. They are expensive to run and maintain
ii. Initial capital to purchase the hardware and the software is high
iii. If the central processor fails the terminals will not get any feedback
2. Distributed processing
It means that a processing facility is made available at a number of sites instead of a
single computer at a central point
Distributed processing allows a business to choose the level of processing autonomy
given to each department and the level of data security.
Application
Companies with different branches may opt for distributed processing but all submit their
reports to the head office
Advantages
i. Less risk of system break down
ii. The central processor carries a lighter work load
iii. In case of data loss only a small portion of the data is lost
Disadvantages
i. Expensive due to extra communication cost
ii. More sophisticated software is required to maintain data integrity
iii. It may take time to make a decision since information is processed at different sites
3. Time sharing
Many terminals connected to a central computer are given access to the central
processing unit apparently at the same time. However, in the actual sense each user is
allocated a time slice of the CPU in sequence
The amount of time allocated to each user is controlled by a multi user operating system.
If the time slice allocated to a particular user is not enough, the user is allocated another
time slice later in a round robin manner.
Advantages
i. Where the work load is not heavy the users are served efficiently
Disadvantages
i. Requires a powerful processor, which is expensive
ii. The user has no control of the central computer
iii. If tasks and users are many it may take a long time to complete a single task
iv. It may not be reliable in terms of security
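The round robin allocation of time slices can be simulated in a few lines. This is a simplified sketch, not how any particular operating system schedules work: the function name `round_robin`, the users and their CPU needs are made up.

```python
from collections import deque


def round_robin(jobs, time_slice):
    """Hand out time slices in turn.

    jobs: dict of user -> CPU time units needed (made-up figures).
    Returns the order in which users received the CPU.
    """
    queue = deque(jobs.items())
    order = []
    while queue:
        user, remaining = queue.popleft()
        order.append(user)
        remaining -= time_slice
        if remaining > 0:                  # slice was not enough: rejoin the queue
            queue.append((user, remaining))
    return order


print(round_robin({"A": 3, "B": 1, "C": 2}, time_slice=1))
# ['A', 'B', 'C', 'A', 'C', 'A']
```

User A needs three slices and so finishes last; with many users the wait between A's slices grows, which is disadvantage iii above.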
4. Batch processing
This is where transactions are accumulated to a computer file over a period of time
(batch) and then processed at a pre-specified period of time to produce a batch of output.
Batch runs can be made on any basis (daily, weekly, monthly or as requested)
Application
Payroll processing
Advantages
i. Simple to develop
ii. The initial cost of processing is low
iii. Timing of reports is not critical
Disadvantages
i. It is not convenient where decisions have to be made immediately
ii. It is difficult to provide priority scheduling
iii. It takes time before information is generated and distributed
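Batch processing of a payroll can be sketched as two separate steps: transactions are accumulated as they arrive, and nothing is calculated until the batch run. The class name `BatchPayroll`, the flat hourly rate and the figures are assumptions for the example.

```python
class BatchPayroll:
    """Accumulate transactions over a period, then process them in one run."""

    def __init__(self, hourly_rate):
        self.hourly_rate = hourly_rate     # assumed flat rate for the example
        self.pending = []                  # the batch (transaction file)

    def record_hours(self, employee, hours):
        """Add a transaction to the batch; nothing is processed yet."""
        self.pending.append((employee, hours))

    def run_batch(self):
        """Process the whole batch and return a batch of output."""
        pay = {}
        for employee, hours in self.pending:
            pay[employee] = pay.get(employee, 0) + hours * self.hourly_rate
        self.pending = []                  # batch consumed
        return pay


payroll = BatchPayroll(hourly_rate=200)
payroll.record_hours("Amina", 8)
payroll.record_hours("Brian", 6)
payroll.record_hours("Amina", 7)
print(payroll.run_batch())    # {'Amina': 3000, 'Brian': 1200}
```

Until `run_batch` is called no pay figure exists at all, which illustrates why batch processing does not suit decisions that must be made immediately.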
5. Multi-processing
In this processing, more than one task is executed at the same time on different
processors of the same computer. In this case the different but independent processors
must be coordinated to work together. This coordination is made possible by a
multiprocessing operating system that enables different processors to operate together and
share the same memory.
6. Multi-programming / multi-tasking
In this processing, more than one program is executed, apparently at the same time, by a single
processing unit. The OS allocates each program a time slice and decides in what order they
will be executed.
Advantages
i. Increases the productivity of the computer by reducing the CPU idle time
ii. Reduces the incidence of peripheral bound operations
Disadvantages
i. Requires an expensive CPU
7. Interactive processing
In this processing there is continuous dialogue between the user and the computer. As the
program executes, it keeps on prompting the user to provide input or respond to prompts
displayed on the screen
8. Real-time systems
These systems process data so fast that the results are available to influence the activities
currently taking place e.g. in a chemical processing plant, the system must react
immediately to any change in the reaction.
Application
a. Airline reservation system
b. Online warehouse stock control
c. Online hotel accommodation system
d. Online banking system
Advantages
i. Real-time systems are fast and very reliable.
ii. Information is readily available for accurate decision making
iii. They provide immediate control
Disadvantages
i. They require complex and expensive operating systems
ii. They require front end processors (FEP) to relieve the central processor of the heavy
work load.
iii. They are expensive to develop
On-line systems do not need to be real-time, but a real-time system must be an on-line system.