KEMBAR78
NBDM Training 2016 - Module 2 - Data Management Using Stata | PDF | Microsoft Excel | Comma Separated Values
0% found this document useful (0 votes)
37 views63 pages

NBDM Training 2016 - Module 2 - Data Management Using Stata

The document describes a National Beneficiary Data Training that will take place from April 28 to May 7, 2016 at the Cloud 9 Hotel in Antipolo, Rizal. The training will cover getting started with STATA, data profiling, analysis and reporting using STATA. At the end of the training, participants will be able to perform data management and analysis in STATA, produce graphs and tables for reporting, and use STATA for data profiling and approved updates.

Uploaded by

Sonny Asis
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
37 views63 pages

NBDM Training 2016 - Module 2 - Data Management Using Stata

The document describes a National Beneficiary Data Training that will take place from April 28 to May 7, 2016 at the Cloud 9 Hotel in Antipolo, Rizal. The training will cover getting started with STATA, data profiling, analysis and reporting using STATA. At the end of the training, participants will be able to perform data management and analysis in STATA, produce graphs and tables for reporting, and use STATA for data profiling and approved updates.

Uploaded by

Sonny Asis
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 63

NATIONAL BENEFICIARY DATA TRAINING FOR 2016

April 28-May 7, 2016


The Cloud 9 Hotel, Antipolo, Rizal
At the end of the module, participants
will:
 Appreciate the power of STATA as
another method to undertake data
management
 Be able to perform data profiling and
data analysis in STATA
 Be able to produce graphs and tables
for reporting using STATA
 Be able to use STATA for data
management
 Getting started with STATA
 Data profiling using STATA
 Data analysis using STATA
 Data reporting using STATA
 Workshop on data profiling,
analysis and reporting using
STATA and P1 2016 approved
updates
STATA
A statistical software used in
data management and
analysis What is STATA?

The name STATA is a syllabic


abbreviation of the words
“statistics” and “data”.
STATA
Initial release: 1985 by
StataCorp
What is STATA?

Command-line interface and


graphical user interface (menus
and dialog boxes)

Case sensitive (all commands


are in lower cases)
 Results window: The big window. Results of
all Stata commands appear here (except
graphs which are shown in their own
windows).
 Command window: Below the results window. The STATA
Commands are entered here. Windows
 Variables window: Shows a record of all
variables in the dataset that is currently
being used
 Review window: Records all Stata commands
that have been entered. A previous command
can be repeated by double-clicking the
command in the Review window (or by using
Page Up).
Properties Window

Browser Window
Other STATA
Windows
Editor Window

Do-file Window
The three most important menus
 Data - for organizing and managing the data
 Graphics - for visual exploration &
presentation
Stata Menus and
 Statistics - for analysis
Toolbar

Data, Graphics, Statistics


By default:
c:\Program Files\stata 12
Change working directory:
cd c:\ foldername Define Working
Directory

Example:
cd "C:\Users\HP-
PC\Documents\Module 2 DM"
 Open STATA file (.dta)
−by selecting File  Open or by typing:
use <filename>
Example:
Opening Data
use “sample roster”
 If the file name contains blanks, the
address must be enclosed in quotation
marks.
 Save STATA file (.dta)
save <filename>
Example :
save “sample roster1” (1st time save)
Saving Data
save “sample roster1”, replace
(2nd time – be careful with overwriting)

 Close file (save first!)


clear
 Import excel (.xls)
- by selecting File  Import  Excel
spreadsheet or by typing
import excel “<filename>”, firstrow Importing Data

- Example:
import excel "update type 5 sample.xls“,
firstrow
 Import text file (.csv, .txt)
- by selecting File  Import  Text data
created by a spreadsheet or by typing
insheet using <filename> Importing Data
- Example:
insheet using "update type 5 sample.txt",
clear
 Export excel (.xls)
- Store excel data format (.xls)
- by selecting File  Export  Excel
spreadsheet or by typing
Exporting Data
export excel “<filename>”, firstrow(variable)
- Example:
export excel "update type 5 sample1.xls",
firstrow(variable)
 Import text file (.csv, .txt)
- by selecting File  Export  Comma-
or tab-separated data or by typing
outsheet using “<filename>” Exporting Data
- Store data created by a spreadsheet (.csv, .txt)
- Example:
outsheet using "update type 5 sample1.txt“
• Utilizes different kinds of
descriptive statistics such as:
• Frequency
• Minimum It is used as a
starting point for
• Maximum the data quality
assessment which
• Mean/Median/Mode will determine the
• Total/Sum future strategy with
regards to data
• Missing validation, error
correction
• Describes the data type of the
variables (whether string or
numerical)
 count
– counts the number of observation

 distinct [varlist]
- report number(s) of distinct observations or
values
- Example: distinct household_id

 describe or describe [varlist]


 displays the number of observations, variable
names, types and labels
 Example: describe household_id

Dataset: sample roster


codebook [varlist]
 displays the variable type, number of
missing values and sample values
 Example: codebook
codebook member_status
 summarize
– useful for numerical variables
- summary of # observation, average, min, max
 Example: summarize age
 browse or browse [varlist]
 displays the dataset
 Example: browse
browse gender preg_status

 edit or edit [varlist]


 displays the dataset and at the same time, the user may
edit the data
 Example: edit
edit gender preg_status
 sort [varlist]
 to arrange the data in ascending/alphabetical order using a
specific variable
 Example: sort last_name first_name mid_name

 rename [var1] [var2]


 to change the name of a particular variable
 Example: rename household_id hhid
generate newvar = exp
Create new variable
Example:
generate attending_school = "NO" if
schoolname=="NO SCHOOL" & attend_school !=1
replace oldvar = exp1 [if exp2]
may be used to change the contents of an existing
variable
Example:
replace attending_school = “YES" if
attending_school==“”
drop [varlist]
 to remove/delete a specific variable/data
Example: drop hh_set set_group

keep [varlist]
 to retain specific variables
 Example: keep hhid entry_id
One-way tabulation
tab [varname]
produces one-way table of
frequency counts and percentages
Example: tab region, m

tab1 [varlist]
produces results of the indicated
variables
Example: tab1 fieldupdated
newvalue, m
 Dataset: Update 5 sample
 append using filename [, options]
 to consolidate different datasets

A
+ B
=

B
 append

 Example:
- to consolidate approved update type 5 in
Antipolo City and Morong
 insheet using “update type 5 antipolo.txt“,
clear
 save “update type 5 antipolo.dta”
 insheet using "update type 5 morong.txt“,
clear
 save “update type 5 morong.dta”
 append using “update type 5 antipolo”
 Merge
- combine datasets horizontally
- to match/merge datasets

+ =
A B A B

• Type:
merge 1:1 varlist using filename
merge m:1 varlist using filename
merge 1:m varlist using filename

•Dataset in memory is called “master dataset”.


•Dataset filename is called “using dataset”.
Example:
- to check if there are newly selected child benes with
newly approved updates on education
 insheet using "update type 11 sample.txt", clear
 save "update type 11 sample"
 merge m:m entryid using "update type 5 sample“
 keep if _merge==3
 save “update type 11 with type 5.dta”

“There are 159 children selected as


beneficiaries for education last P1
2016 who also have approved
updates on education info in the
same period.
Comparing and contrasting two or more
variables/data
Duplicity checking
where STATA commands are saved in order to
keep a record of the commands used to
produce the result;

allows the user to run a long series of


commands several times;
One-way tabulation
tab [varname] if [exp]
produces one-way table of
frequency counts and percentages
according to the condition specified
Example:
- to show the grade level of the
child bene for education
tab ed_attnmnt if childbene
!="", m
Dataset: sample roster
 Two-way tabulation
 tab [varname1] [varname2]
 produces two-way table of frequency counts
 Example:
- to identify the child beneficiaries who are target for updating on school
facility
-tab attending_school childbene, m
Dataset: sample roster
 Two-way tabulation
 tab [varname1] [varname2] if [exp]
 produces two-way table of frequency counts according to the
condition specified
 Example:
tab attending_school childbene if member_status=="1 -
Active", m
Dataset: sample roster
 Import and save the sample grantee list

insheet using “sample grantee list.txt”, clear


save “sample grantee list.dta"
 Example 1: To identify possible duplicate households in terms of
HHID
 Example 1: To identify possible duplicate households in terms of
HHID
 Example 1: To identify possible duplicate households in terms of
HHID
 Example 1: To identify possible duplicate households in terms of
HHID
 Type:
duplicates tag [varlist], generate(newvar)
duplicates tag household_id, generate(dup_ID)
 Example 1: To identify possible duplicate households in
terms of HHID
 tab dup_ID

Since the value of


dup_ID is only zero
(0), then we are sure
that there are no
duplicates in terms of
the household ID.
 Example 2: To identify possible duplicate households in
terms of grantee names
duplicates tag lastname first_name middle_name, generate(dup_name)
 Example 2: To identify possible duplicate households in
terms of grantee names
 tab dup_name
 keep if dup_name==1
 save “possible duplicate HHs.dta”
Since the value of
dup_NAME consists
of one (1), then we
are sure that there
are indeed possible
duplicates in terms
of names.
 Example 2: To identify possible duplicate households in terms of grantee
names
 browse if dup_name==1
1. Merge list of identified possible duplicates
(dup_name = 1) with sample roster
use “possible duplicate HHs.dta”, clear
merge 1:m household_id using “sample roster”
 keep if _merge==3 (to retain only the roster of the possible
duplicate households)
 sort lastname first_name middle_name
3. Export the roster of possible duplicate HHs
3. Export the roster of possible duplicate HHs
export excel using "roster possible duplicate hhs.xls",
firstrow(variables)
4. Check the exported file in your folder and
conduct (manually) the roster analysis by analyzing
the composition of the possible duplicate
households
Textual Form
Can be presented using sentences and paragraphs
Involves enumerating important characteristics of the
data (count, minimum, maximum, average, mode
etc.)
Tabular Form
 Clear organization of data into rows and columns
1) Table Number
Table 1. Number of Child Beneficiaries per Age and Grade Level and Table Title
Age Category 2) Column
Level of Education Grand Total
3-5 YO 6-14 YO 15 -18 YO Header
No Grade Reported 36,448 77,969 27,099 141,516
Day Care 103,861 50,582 949 155,392
Kinder 197,272 160,854 1,791 359,917
Kinder / Day Care 1,109 62,520 1,613 65,242
Grade 1 121,643 314,199 1,502 437,344
Grade 2 6,692 683,029 4,194 693,915
Grade 3 1,028 833,623 8,495 843,146
Grade 4 589 1,063,993 20,544 1,085,126
3) Row
Grade 5 308 1,029,544 38,114 1,067,966 4) Body
Classifier
Grade 6 450 1,105,209 195,563 1,301,222
Grade 7 97 613,181 77,412 690,690
Grade 8 69 522,538 181,210 703,817
Grade 9 79 302,485 300,276 602,840

Grade 10 / 4th Year HS 62 42,561 433,370 475,993

Grade 11 - 79 423 502


Grade 12 2 32 90 124
Grand Total 469,709 6,862,398 1,292,645 8,624,752

Source: Pantawawid Pamilya Information System as of March 31, 2016 5) Source Note
 Example:
- to tabulate the grade level of the child bene for
education
 tab ed_attnmnt if childbene !="", m

Tabular Form
To copy the table from Stata to another application, say Excel,
highlight the table in the Stata Results window, and go to
the Edit menu, and select Copy, Copy Table, or Copy Table as
HTML. After you have copied the table, you can paste the table into
another program.

Tabular Form
Bar graph
- used to compare things between different
groups or to track changes over time.
- graph bar (count) entry_id , over(region_nick)
Source: Update Type 11 Graphical Form
To give a visual
effect
Line graph
- used to compare changes over the same
period of time for more than one group.

Graphical Form
To give a visual
effect
Pie Chart
- best to use when you are trying to compare
parts of a whole. It doesn’t show changes over time.
#delimit ;
graph pie entryid, over(relationshiptohead)
plabel(_all name, size(*1.5) color(white))
legend(off)
Graphical Form
plotregion(lstyle(none))
title("Relationship to HH Head of Child-beneficiaries P1 2016"); To give a visual
#delimit cr
effect
1. Using the list of approved updates
 Import and save all types of updates into STATA
 Tabulate
 Count of data set records with values or null (blank)
- E.g. – distinct count of households, count per field updated, count of new values with blanks, etc.
 Find 3 data inconsistencies
- E.g. – old value equals new value, age versus grade level, child bene but not child/grandchild, etc.
2. Using the grantee list and HH roster
 Graph the number of households by client status and province
 Summary of households with count of eligible by province.
 Compare the grade level and age of the child beneficiaries for education using a two way table.
3. Using the list of Code 21 households
 Check possible duplicates using the duplicates command in STATA and conduct sample of roster analysis if possible (minimum of 1
pair)

You might also like