NBDM Training 2016 - Module 2 - Data Management Using Stata

The document describes a National Beneficiary Data Training that will take place from April 28 to May 7, 2016 at the Cloud 9 Hotel in Antipolo, Rizal. The training will cover getting started with STATA, data profiling, analysis and reporting using STATA. At the end of the training, participants will be able to perform data management and analysis in STATA, produce graphs and tables for reporting, and use STATA for data profiling and approved updates.

Uploaded by

Sonny Asis

NATIONAL BENEFICIARY DATA TRAINING FOR 2016

April 28-May 7, 2016

The Cloud 9 Hotel, Antipolo, Rizal
At the end of the module, participants will:
• Appreciate the power of STATA as another method to undertake data management
• Be able to perform data profiling and data analysis in STATA
• Be able to produce graphs and tables for reporting using STATA
• Be able to use STATA for data management

Module outline:
• Getting started with STATA
• Data profiling using STATA
• Data analysis using STATA
• Data reporting using STATA
• Workshop on data profiling, analysis and reporting using STATA and P1 2016 approved updates
What is STATA?
• A statistical software package used in data management and analysis
• The name STATA is a syllabic abbreviation of the words "statistics" and "data".
• Initial release: 1985 by StataCorp
• Command-line interface and graphical user interface (menus and dialog boxes)
• Case sensitive (all commands are in lowercase)
The STATA Windows
• Results window: The big window. Results of all Stata commands appear here (except graphs, which are shown in their own windows).
• Command window: Below the Results window. Commands are entered here.
• Variables window: Shows a record of all variables in the dataset that is currently being used.
• Review window: Records all Stata commands that have been entered. A previous command can be repeated by double-clicking the command in the Review window (or by using Page Up).
Other STATA Windows
• Properties Window
• Browser Window
• Editor Window
• Do-file Window
Stata Menus and Toolbar
The three most important menus (Data, Graphics, Statistics):
• Data - for organizing and managing the data
• Graphics - for visual exploration & presentation
• Statistics - for analysis

Define Working Directory
By default:
    c:\Program Files\stata 12
Change working directory:
    cd c:\foldername
Example:
    cd "C:\Users\HP-PC\Documents\Module 2 DM"
Opening Data
• Open STATA file (.dta)
- by selecting File > Open or by typing:
    use <filename>
- Example:
    use "sample roster"
• If the file name contains blanks, the file name (or its full path) must be enclosed in quotation marks.
Saving Data
• Save STATA file (.dta)
    save <filename>
- Example (1st time save):
    save "sample roster1"
- Example (2nd time; be careful, because replace overwrites the existing file):
    save "sample roster1", replace
• Close file (save first!)
    clear
Importing Data
• Import excel (.xls)
- by selecting File > Import > Excel spreadsheet or by typing:
    import excel "<filename>", firstrow
- Example:
    import excel "update type 5 sample.xls", firstrow
• Import text file (.csv, .txt)
- by selecting File > Import > Text data created by a spreadsheet or by typing:
    insheet using <filename>
- Example:
    insheet using "update type 5 sample.txt", clear
Exporting Data
• Export excel (.xls)
- stores the data in Excel format (.xls)
- by selecting File > Export > Excel spreadsheet or by typing:
    export excel using "<filename>", firstrow(variables)
- Example:
    export excel using "update type 5 sample1.xls", firstrow(variables)
• Export text file (.csv, .txt)
- stores the data in a format that can be read by a spreadsheet (.csv, .txt)
- by selecting File > Export > Comma- or tab-separated data or by typing:
    outsheet using "<filename>"
- Example:
    outsheet using "update type 5 sample1.txt"
Data Profiling
• Utilizes different kinds of descriptive statistics such as:
- Frequency
- Minimum
- Maximum
- Mean/Median/Mode
- Total/Sum
- Missing
• Describes the data type of the variables (whether string or numerical)
• It is used as a starting point for the data quality assessment, which will determine the future strategy with regard to data validation and error correction.
• count
- counts the number of observations

• distinct [varlist]
- reports the number of distinct observations or values
- distinct is a user-written command; if it is not yet installed, type: ssc install distinct
- Example: distinct household_id

• describe or describe [varlist]
- displays the number of observations, variable names, types and labels
- Example: describe household_id

Dataset: sample roster

• codebook [varlist]
- displays the variable type, number of missing values and sample values
- Example: codebook
  codebook member_status
• summarize [varlist]
- useful for numerical variables
- gives the number of observations, mean (average), standard deviation, minimum and maximum
- Example: summarize age
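
Plain summarize does not report the median, total or number of missing values mentioned under data profiling. The lines below are a minimal sketch of how those could be obtained, assuming the sample roster is in the working directory and that age is a numeric variable, as in the example above.

```stata
* Assumes "sample roster.dta" is in the working directory and age is numeric
use "sample roster", clear

* detail adds percentiles (the 50th percentile is the median), variance and skewness
summarize age, detail

* pick the statistics to display: count, mean, median, minimum, maximum, total
tabstat age, statistics(n mean median min max sum)

* number of observations with a missing age
count if missing(age)
```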
• browse or browse [varlist]
- displays the dataset
- Example: browse
  browse gender preg_status

• edit or edit [varlist]
- displays the dataset and, at the same time, allows the user to edit the data
- Example: edit
  edit gender preg_status
• sort [varlist]
- to arrange the data in ascending/alphabetical order using one or more specific variables
- Example: sort last_name first_name mid_name

• rename [var1] [var2]
- to change the name of a particular variable
- Example: rename household_id hhid
• generate newvar = exp
- creates a new variable
- Example:
    generate attending_school = "NO" if schoolname=="NO SCHOOL" & attend_school !=1
• replace oldvar = exp1 [if exp2]
- may be used to change the contents of an existing variable
- Example:
    replace attending_school = "YES" if attending_school==""
• drop [varlist]
- to remove/delete specific variables
- Example: drop hh_set set_group

• keep [varlist]
- to retain specific variables
- Example: keep hhid entry_id
One-way tabulation
• tab [varname]
- produces a one-way table of frequency counts and percentages
- Example: tab region, m
  (the m option, short for missing, includes missing values in the table)

• tab1 [varlist]
- produces a one-way table for each of the indicated variables
- Example: tab1 fieldupdated newvalue, m
Dataset: Update 5 sample
• append using filename [, options]
- to consolidate different datasets (A + B = A stacked on top of B)
- Example: to consolidate approved update type 5 in Antipolo City and Morong
    insheet using "update type 5 antipolo.txt", clear
    save "update type 5 antipolo.dta"
    insheet using "update type 5 morong.txt", clear
    save "update type 5 morong.dta"
    append using "update type 5 antipolo"
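
After appending, it is often useful to know which file each record came from. The sketch below illustrates this with the generate() option of append, using a few made-up records rather than the training files; the variable name from_using and the toy values are assumptions for illustration only. It is meant to be run as a do-file because it relies on a temporary file.

```stata
* Toy data, made up for illustration (not the training files)
clear
input entryid str10 city
1 "ANTIPOLO"
2 "ANTIPOLO"
end
tempfile antipolo
save `antipolo'

clear
input entryid str10 city
3 "MORONG"
4 "MORONG"
end

* generate() adds a marker: 0 = already in memory (master), 1 = appended (using)
append using `antipolo', generate(from_using)
count
tabulate from_using
```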
• merge
- combines datasets horizontally (A + B = A and B side by side)
- to match/merge datasets

• Type:
    merge 1:1 varlist using filename
    merge m:1 varlist using filename
    merge 1:m varlist using filename

• The dataset in memory is called the "master dataset".
• The dataset filename is called the "using dataset".
Example:
- to check if there are newly selected child beneficiaries with newly approved updates on education
    insheet using "update type 11 sample.txt", clear
    save "update type 11 sample"
    merge m:m entryid using "update type 5 sample"
    keep if _merge==3
    save "update type 11 with type 5.dta"

"There are 159 children selected as beneficiaries for education last P1 2016 who also have approved updates on education info in the same period."
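
The example keeps records with _merge==3 without saying what the other values mean. As a small sketch on made-up data (the variable names grade and newvalue and all values are hypothetical), merge creates _merge with 1 for records found only in the master dataset, 2 for records found only in the using dataset, and 3 for records matched in both. Run it as a do-file, since it uses a temporary file.

```stata
* Toy data, made up for illustration (not the training files)
clear
input entryid str8 newvalue
2 "GRADE 3"
3 "GRADE 4"
4 "GRADE 5"
end
tempfile updates
save `updates'

clear
input entryid str5 grade
1 "G1"
2 "G2"
3 "G3"
end

* entryid is unique in both toy datasets, so a 1:1 merge applies here
merge 1:1 entryid using `updates'

* 1 = master only (entryid 1), 2 = using only (entryid 4), 3 = matched (entryid 2 and 3)
tabulate _merge

* retain only the matched records
keep if _merge == 3
list
```

With these toy records, two observations remain after the keep.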
Comparing and contrasting two or more
variables/data
Duplicate checking
Do-file
• a file where STATA commands are saved in order to keep a record of the commands used to produce the result;
• allows the user to run a long series of commands several times.
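
As an illustration, a short do-file might look like the sketch below. The file names profile_roster.do and profile_roster.log are hypothetical; the folder, dataset and variable names are the ones used in the earlier examples and are assumed to exist on your machine.

```stata
* profile_roster.do (hypothetical file name)
* Lines starting with * are comments and are not executed
clear all
set more off

* folder and dataset taken from the earlier examples; adjust to your own machine
cd "C:\Users\HP-PC\Documents\Module 2 DM"

* keep a text log of the commands and their results
log using "profile_roster.log", replace text

use "sample roster", clear
count
describe
tab member_status, m

log close
```

The file can be run from the Do-file Editor or by typing do profile_roster in the Command window, and rerun whenever the data are updated.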
One-way tabulation
• tab [varname] if [exp]
- produces a one-way table of frequency counts and percentages according to the condition specified
- Example: to show the grade level of the child beneficiaries for education
    tab ed_attnmnt if childbene !="", m
Dataset: sample roster
Two-way tabulation
• tab [varname1] [varname2]
- produces a two-way table of frequency counts
- Example: to identify the child beneficiaries who are targets for updating on school facility
    tab attending_school childbene, m
Dataset: sample roster

• tab [varname1] [varname2] if [exp]
- produces a two-way table of frequency counts according to the condition specified
- Example:
    tab attending_school childbene if member_status=="1 - Active", m
Dataset: sample roster
• Import and save the sample grantee list
    insheet using "sample grantee list.txt", clear
    save "sample grantee list.dta"
• Example 1: To identify possible duplicate households in terms of HHID
• Type:
    duplicates tag [varlist], generate(newvar)
    duplicates tag household_id, generate(dup_ID)
    tab dup_ID
Since the only value of dup_ID is zero (0), we are sure that there are no duplicates in terms of the household ID.
• Example 2: To identify possible duplicate households in terms of grantee names
    duplicates tag lastname first_name middle_name, generate(dup_name)
    tab dup_name
    keep if dup_name==1
    save "possible duplicate HHs.dta"
Since dup_name takes the value one (1) for some records, there are indeed possible duplicates in terms of names. A value of 1 means that exactly one other record has the same name.
• To view the possible duplicates:
    browse if dup_name==1
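
To see how the tag values are assigned, the sketch below runs duplicates report and duplicates tag on three made-up names (toy data, not the grantee list). The tag holds the number of other records with the same values, so 0 means the record is unique.

```stata
* Toy data, made up for illustration (not the grantee list)
clear
input str10 lastname str10 first_name
"CRUZ"  "JUAN"
"CRUZ"  "JUAN"
"REYES" "ANA"
end

* summary of how many records appear once, twice, and so on
duplicates report lastname first_name

* dup_name = number of OTHER records with the same name (0 = unique)
duplicates tag lastname first_name, generate(dup_name)
list, sepby(lastname)
```

Here the two CRUZ records get dup_name equal to 1 and the REYES record gets 0.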
1. Merge list of identified possible duplicates (dup_name = 1) with sample roster
    use "possible duplicate HHs.dta", clear
    merge 1:m household_id using "sample roster"
- Retain only the roster of the possible duplicate households, then sort by name:
    keep if _merge==3
    sort lastname first_name middle_name
3. Export the roster of possible duplicate HHs
    export excel using "roster possible duplicate hhs.xls", firstrow(variables)
4. Check the exported file in your folder and manually conduct the roster analysis by analyzing the composition of the possible duplicate households
Textual Form
• Can be presented using sentences and paragraphs
• Involves enumerating important characteristics of the data (count, minimum, maximum, average, mode, etc.)
Tabular Form
• Clear organization of data into rows and columns
• Parts of a table: 1) Table Number and Table Title, 2) Column Header, 3) Row Classifier, 4) Body, 5) Source Note

Table 1. Number of Child Beneficiaries per Age and Grade Level

                                   Age Category
Level of Education         3-5 YO    6-14 YO   15-18 YO  Grand Total
No Grade Reported          36,448     77,969     27,099      141,516
Day Care                  103,861     50,582        949      155,392
Kinder                    197,272    160,854      1,791      359,917
Kinder / Day Care           1,109     62,520      1,613       65,242
Grade 1                   121,643    314,199      1,502      437,344
Grade 2                     6,692    683,029      4,194      693,915
Grade 3                     1,028    833,623      8,495      843,146
Grade 4                       589  1,063,993     20,544    1,085,126
Grade 5                       308  1,029,544     38,114    1,067,966
Grade 6                       450  1,105,209    195,563    1,301,222
Grade 7                        97    613,181     77,412      690,690
Grade 8                        69    522,538    181,210      703,817
Grade 9                        79    302,485    300,276      602,840
Grade 10 / 4th Year HS         62     42,561    433,370      475,993
Grade 11                        -         79        423          502
Grade 12                        2         32         90          124
Grand Total               469,709  6,862,398  1,292,645    8,624,752

Source: Pantawid Pamilya Information System as of March 31, 2016
• Example: to tabulate the grade level of the child beneficiaries for education
    tab ed_attnmnt if childbene !="", m

To copy the table from Stata to another application, say Excel, highlight the table in the Stata Results window, go to the Edit menu, and select Copy, Copy Table, or Copy Table as HTML. After you have copied the table, you can paste it into another program.

Graphical Form
To give a visual effect
Bar graph
- used to compare things between different groups or to track changes over time.
    graph bar (count) entry_id, over(region_nick)
Source: Update Type 11
Line graph
- used to compare changes over the same period of time for more than one group.
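
No command is given here for the line graph. As a sketch only, the lines below draw one line per group over the same years using uslifeexp, an example dataset that ships with Stata; it is not part of the training data, and the title and legend text are illustrative choices. The /// line continuation works in a do-file, not when typed in the Command window.

```stata
* Example dataset shipped with Stata (US life expectancy by year); not the training data
sysuse uslifeexp, clear

* one line per group (males and females) over the same period
twoway line le_male le_female year, ///
    title("Life expectancy at birth, United States") ///
    ytitle("Years") legend(order(1 "Male" 2 "Female"))
```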
Pie chart
- best to use when you are trying to compare parts of a whole. It doesn't show changes over time.
    #delimit ;
    graph pie entryid, over(relationshiptohead)
        plabel(_all name, size(*1.5) color(white))
        legend(off)
        plotregion(lstyle(none))
        title("Relationship to HH Head of Child-beneficiaries P1 2016");
    #delimit cr
1. Using the list of approved updates
• Import and save all types of updates into STATA
• Tabulate
• Count of data set records with values or null (blank)
- E.g. distinct count of households, count per field updated, count of new values with blanks, etc.
• Find 3 data inconsistencies
- E.g. old value equals new value, age versus grade level, child bene but not child/grandchild, etc.
2. Using the grantee list and HH roster
• Graph the number of households by client status and province
• Summary of households with count of eligible by province
• Compare the grade level and age of the child beneficiaries for education using a two-way table
3. Using the list of Code 21 households
• Check possible duplicates using the duplicates command in STATA and conduct a sample roster analysis if possible (minimum of 1 pair)
