Biostat Module1-3

Uploaded by

kiskie
MODULE 1: BIOSTATISTICS AND EPIDEMIOLOGY

WHAT IS BIOSTATISTICS?

BIO – means life

STATISTICS – the science that deals with the collection,
organization, analysis and interpretation of numerical
data.

• Singular – body of methods
• Plural – set of data

2 Branches of Statistics:

• Descriptive Statistics – refers to the different
methods applied in order to summarize and
present data in a form which will make them
easier to analyze and interpret.
• Inferential Statistics – refers to the methods
involved in order to make generalizations and
conclusions about a target population based on
results from a sample.

IMPORTANCE OF BIOSTATISTICS:

• A tool in decision making
• Better clinical decisions
• Improved patient care and outcomes

VARIABLES AND DATA:

• A variable is a phenomenon whose values and
categories cannot be predicted
• Can be any characteristic that differs from
person to person (Dicker et al., 2012)
• All the information regarding all the variables in
the study is called DATA.

TYPES OF VARIABLES:

Dependent
- Known as the outcome variable
- Usually measured
- Relies on other factors (but usually on the
independent variable)

Independent
- Known as experimental or predictor variable
- The cause or reason of any situation which can
be manipulated
- Determines the outcome of the dependent
variable
- Usually occurs earlier in time than the
dependent variable (happens first)

TYPES OF VARIABLES:

Qualitative (Categorical)
- Non-numerical
- Unmeasured variables

Quantitative (Numerical)
- Data is based on a unit; measurable
• Discrete: no in-between values (exact values)
- distinct either/or results
• Continuous: values that have increments
in between
- can be averaged

SCALES OF MEASUREMENT:

NOMINAL (CATEGORICAL): no order
- Examples: car color, names, gender, courses
ORDINAL (ORDERED): connotes ranking or inequalities
- Examples: shirt sizes (S, M, L), cancer stages, pain scale (1-10)
INTERVAL: scale of equally spaced values but without a true zero point
- Examples: date of birth, rating based on a criteria, temperature
RATIO: an interval variable with a true zero point
- Examples: length, volume, seconds, money

DEMOGRAPHICS
- The science of population

3 ELEMENTS:
1. SIZE
2. COMPOSITION/STRUCTURE
3. DISTRIBUTION
ANALYZING DATA:

• Descriptive Statistics – procedures that help us
organize and describe data collected from either
a sample or a population.
• Inferential Statistics – the logic and procedures
concerned with making predictions or inferences
about a population from observations and
analyses of a sample.

POPULATION VS SAMPLE

• Population – total set of individuals, objects,
groups or events in which the researcher is
interested.
• Sample – a relatively small subset
selected from a population.

SOURCES OF DEMOGRAPHIC DATA?

DEMOGRAPHIC TOOLS:

• Ratio – a relative magnitude of 2 quantities or a
comparison of any two values.
• Proportion – comparison of a part to the whole;
the numerator is part of the denominator.
• Rate – how often an event occurs in a defined
population over a specified period of time

MODULE 2: PRESENTATION OF DATA

DATA ARE PRESENTED IN 3 BASIC FORMS:

Textual/Narrative Method
- The data is simply narrated
- Simplest form of presenting data
- Data are explained in paragraph or sentence
form.

Tabular Method
- Done by presenting data in tables
- A systematic way of arranging them in rows and
columns
- Useful for demonstrating patterns, exceptions,
differences and other relationships
- Usually serves as basis for preparing additional
visual displays of data
- TABLES DESIGNED TO PRESENT DATA TO OTHERS
SHOULD BE AS SIMPLE AS POSSIBLE

Graphical Method
- Displays numeric data in visual form
- Can display patterns, trends, aberrations,
similarities and differences in the data not
evident in tables
- Can be an essential tool for analyzing and trying
to make sense of data
- Designed and constructed so as to attract and
hold attention

PARTS OF TABLES:
Table Number
- must be numbered consecutively as they appear
in the article or report
- place the table number on the first line of the
title on a single line
- the word “TABLE” is typed flush with the left-hand
edge of the table followed by the number, a space
and the first line of the title

Title
- should give complete information as to:
• What
• Who
• Where
• When

Headnote: a secondary caption that serves to clarify
items in the main title or body of the table

Column Heading
- Indicate the basis of classification of the
columns
- The heading should be centered on the columns
where they belong
- Should follow the style adopted for the title

Row Heading/Stubs
- Indicate the basis of classification of the
rows
- The heading should be centered on the
rows where they belong
- Should follow the style adopted for the title

Body
- The intersection of a row and column is called
a cell.
• Figures should be aligned by the decimal points
Ex. 4.5, 7, .25, 3.53 → 4.50, 7.00, 0.25, 3.53

Footnotes
- Small letters rather than numbers should be
used to designate footnotes
- All footnotes should be placed immediately
below the bottom rule of the table

Source of Data
- Given if the data presented are not original
- Placed after the footnote or after the bottom
rule

TABLE CONSTRUCTION

Simplicity
- Clean, professional and uniform look
- Devoid of unnecessary markings and frills
- Break up tables if necessary

Clarity
- Should agree with the textual discussion
- Does not appear out of place
- Clear, concise headings, footnotes, etc.

Directness
- Only necessary information should be included

Class Intervals
- Should be mutually exclusive and exhaustive
- When dealing with fractions or decimals, follow
conventional rounding rules.
- Use principles of biologic plausibility
Ex. Neonatal (0-12 mos)
Adolescents (5-14 yrs)
- Always consider a category of “unknown” or
“not stated”

Strategies:

• Strategy 1: Divide the data into groups of similar
size
• Strategy 2: Base intervals on mean and standard
deviation
• Strategy 3: Divide the range into equal class
intervals
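
The three strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `ages` data and the choice of four classes are hypothetical, and the "similar size" strategy is interpreted here as cutting at quantiles.

```python
import statistics

def quantile_breaks(values, k):
    """Strategy 1: cut points that put a similar number of observations in each class."""
    cuts = statistics.quantiles(values, n=k, method="inclusive")
    return [min(values)] + cuts + [max(values)]

def mean_sd_breaks(values, multiples=(-2, -1, 0, 1, 2)):
    """Strategy 2: cut points at the mean plus or minus whole standard deviations."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [m + j * s for j in multiples]

def equal_width_breaks(values, k):
    """Strategy 3: split the range into k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]

# Hypothetical ages, used only to show the three sets of cut points.
ages = [2, 5, 7, 11, 14, 18, 21, 25, 33, 40, 52, 60]

print("similar-size groups:", quantile_breaks(ages, 4))
print("mean and SD based:  ", mean_sd_breaks(ages))
print("equal width:        ", equal_width_breaks(ages, 4))
```

Whichever strategy is used, the resulting classes still need to be checked against the other rules listed above (mutually exclusive, exhaustive, biologically plausible).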

CONSTRUCTING A FREQUENCY TABLE

1. Arrange data in ascending order
2. Determine the range
- The difference between the lowest and highest
value
3. Determine the number of classes to be used
(usually between 5-15)
4. Determine the width of each class
- Range divided by the number of classes decided
in step 3
5. List the classes by specifying the lower and
upper limits of the class
6. Count the number of the objects falling within
each class
7. Various statistics may now be derived
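
The seven steps can be traced in a short Python sketch. The data set, the function name `frequency_table` and the choice of five classes are assumptions made for illustration. Classes are treated as half-open intervals (lower limit included, upper limit excluded) except the last one, which also includes the maximum. The sketch assumes the values are not all identical, otherwise the class width would be zero.

```python
def frequency_table(values, n_classes):
    data = sorted(values)                        # Step 1: ascending order
    lowest, highest = data[0], data[-1]
    value_range = highest - lowest               # Step 2: range
    width = value_range / n_classes              # Steps 3-4: class width
    counts = [0] * n_classes
    for v in data:                               # Step 6: count per class
        idx = min(int((v - lowest) / width), n_classes - 1)
        counts[idx] += 1
    rows, running = [], 0
    for i, freq in enumerate(counts):            # Steps 5 and 7
        lower = lowest + i * width
        upper = lower + width
        running += freq                          # running total of frequencies
        rows.append({
            "lower": lower,
            "upper": upper,
            "midpoint": (lower + upper) / 2,
            "frequency": freq,
            "relative_pct": 100 * freq / len(data),
            "cumulative": running,
        })
    return rows

# Hypothetical ages of 20 patients.
ages = [12, 15, 18, 21, 22, 25, 27, 28, 30, 31,
        33, 35, 36, 38, 40, 41, 44, 47, 50, 52]

table = frequency_table(ages, 5)
for row in table:
    print(row)
```

The midpoint, relative frequency and cumulative frequency columns are examples of the "various statistics" mentioned in step 7.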

Class Midpoint
- The value which divides the class into two equal parts
- Equal to the sum of the class limits divided by 2

Cumulative Frequency
- The sum of the frequency of a given class and all
class frequencies preceding it

Relative Frequency
- = (frequency / total number of observations) x 100

Cumulative Relative Frequency
- The sum of the relative frequency of a given class
and all relative frequencies preceding it

Graphical Method
- Displays numeric data in visual form
- Can display patterns, trends, aberrations,
similarities and differences in the data not
evident in tables
- Can be an essential tool for analyzing and trying
to make sense of data
- Designed and constructed so as to attract and
hold attention

MOST GRAPHS HAVE TWO SCALES OR AXES THAT
INTERSECT AT A RIGHT ANGLE

X-axis – generally shows values of the independent
variable (horizontal plane)
Y-axis – shows the dependent variable

Each axis should be labelled properly
Ex. Unit, variable name etc.
GRAPH TYPES:

Pie Chart/Circle graph
- Simple, easily understood chart
- The “slices” show the proportional contribution
of each part
- Useful for showing the proportions of a
single variable’s frequency distribution
- The total of the whole circle is 100%

Bar graph
- The caption is placed below the chart
representing the figure, figure number and title
heading
- Used to portray categories of a qualitative
variable (horizontal) or a discrete quantitative
variable (vertical)

Advantages:
• Gives a clearer picture or illustration of one
variable relative to another
• Gives a more readily understandable presentation
of the data
• The height of each variable is easy to interpret

Disadvantages:
• Shows a confusing picture when plotting
variables which overlap with each other

Simple bar chart
- A one-dimensional diagram in which the bar
represents the whole of the magnitude

Multiple bar chart
- The component figures are shown as separate
bars adjoining each other
- Depicts distributional pattern of more than one
variable

Component bar/subdivided diagram
- Bars are sub-divided into component parts of
the figure
- These sorts of diagrams are constructed when
each total is built up from two or more
component figures.

Histogram
- Graphical representation of the frequency
distribution of a continuous quantitative
variable
- A rectangle or bar is used to depict the counts
of each class or grouping

Line graph
- Done by plotting data with dots and connecting
the plotted points by a straight line
- Primarily intended to portray trends
- Advantageous when plotting 2 or more variables

Arithmetic-scale line graph
- Shows patterns or trends over some variable
- In epidemiology, it is used to show long series of
data and compare several series
- Primarily portrays an overall trend over time
SUMMARY:

WHY USE CHARTS AND GRAPHS?

What do you lose?
• Ability to examine numeric detail offered by a
table
• Ability to see additional relationships within
data
• Potentially time

What do you gain?
• Ability to direct readers’ attention to one aspect
of the evidence
• Ability to reach readers who might otherwise be
intimidated by the same data in a tabular
format
• Ability to focus on the bigger picture rather than
perhaps minor technical details.

IBM SPSS STATISTICS

SPSS – stands for STATISTICAL PACKAGE FOR THE SOCIAL
SCIENCES
- Used to analyze data collected from surveys,
tests, observations, etc.

Among its features for statistical data analysis are:

1. Descriptive Statistics: Frequencies, central
tendency, plots, charts, etc.
2. Inferential and multivariate statistical
procedures: ANOVA, factor analysis, cluster
analysis, etc.

DATA EDITOR WINDOW:

1. Data view: for data input
• When SPSS Statistics is launched, the data editor
window opens in Data View, which looks similar
to a Microsoft Excel worksheet (a matrix
consisting of rows and columns). The difference
is that the rows and columns in Data View are
referred to as cases and variables.
2. Variable view: for adding variables and defining
variable properties
• Variable View is where variables are defined by
assigning variable names and specifying the
attributes such as data type
(string, date, numeric, etc.), value labels and
measurement scales (nominal, ordinal, or
scale). You can think of Variable View as the
backbone structure for Data View; data cannot
be entered or viewed without first defining
variables in Variable View.
CREATING A DATA FILE

1. DEFINING THE VARIABLES
2. ENTERING THE DATA

• Defining the variables involves multiple
processes and requires careful planning.
• Once the variables have been defined, the data
can then be added

DEFINING VARIABLES

First, assign variable names based on your research
questionnaire. If variable names are not assigned, SPSS
Statistics provides default names that may not be
recognizable. Second, each variable’s Type attribute
should be specified. If necessary, assign labels to values
to help all users of the file better understand the data.

To define variables (example)

1. Click the variable view tab in the lower-left
corner of the data editor window
2. Type Name in the first cell under the name
column, and then press the enter key.
3. Under the type column, click numeric and then
click the ellipsis button that appears in the cell
4. In the Variable type dialog box, select the string
option button and then click the OK button.
5. Type sex in row two under the name column,
and then press the enter key
6. Click the cell in row two under the decimals
column, and then change the entry to 0 using
the spin box
7. Type What is your sex? in row two under the
label column and then press the enter key
8. Click none in row two under the values column
and then click the ellipsis button
9. In the value labels dialog box, type 1 in the
value box, type female in the label box and then
click the add button
10. Repeat step 9 using a value of 2 and a label of
male
11. Click the OK button
12. Type GPA in row three under the name column
and then press the enter key
13. Type age in row four under the name column
and then press the enter key
14. Click the cell in row four under the decimals
column, and then change the entry to 0 using
the spin box
15. Type What is your age? in row four under the
label column and then press the enter key
16. Click none in row four under the values column,
and then click the ellipsis button
17. In the value labels dialog box, type 1 in the
value box, type 19 or younger in the label box
and then click the add button
18. Repeat step 17 for values 2 through 5 and label
them as shown in table 3. See figure 7 for the
results
19. Click the OK button

VARIABLE NAMES:
> The following rules apply to variable names:

• Each variable name must be unique; duplication
is not allowed
• Variable names ending with a period should be
avoided, since the period may be interpreted as
a command terminator
• The period, the underscore and the characters
#, $ and @ can be used within variable
names. For example, A._$@#1 is a valid variable
name
• Variable names cannot contain spaces
• Variable names can be up to 64 bytes long and
the first character must be a letter or one of the
characters @, # or $. Subsequent characters can
be any combination of letters, numbers,
nonpunctuation characters and a period (.)
• Variable names ending in underscores should be
avoided, since such names may conflict with
names of variables automatically created by
commands and procedures.
• Reserved keywords cannot be used as variable
names. Reserved keywords are
ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, and
WITH.
• When long variable names need to wrap onto
multiple lines in output, lines are broken at
underscores, periods and points where content
changes from lower case to upper case
• Variable names can be defined with any mixture
of uppercase and lowercase characters and
case is preserved for display purposes.

VARIABLE MEASUREMENT LEVEL

> You can specify the level of measurement as scale
(numeric data on an interval or ratio scale), ordinal or
nominal. Nominal and ordinal data can be either string
(alphanumeric) or numeric.

Nominal
- A variable can be treated as nominal when its
values represent categories with no intrinsic
ranking (for example, the department of the
company in which an employee
works). Examples of nominal variables include
region, postal code and religious affiliation.

Ordinal
- A variable can be treated as ordinal when its
values represent categories with some intrinsic
ranking (for example, levels of service
satisfaction from highly dissatisfied to highly
satisfied). Examples of ordinal variables include
attitude scores representing degree of
satisfaction or confidence and preference
rating scores.

Scale
- A variable can be treated as scale (continuous)
when its values represent ordered categories
with a meaningful metric, so that distance
comparisons between values are
appropriate. Examples of scale variables include
age in years and income in thousands of dollars.

Note: For ordinal string variables, the alphabetic order
of string values is assumed to reflect the true order of
the categories. For example, for a string variable with the
values of low, medium, high, the order of the categories
is interpreted as high, low, medium, which is not the
correct order. In general, it is more reliable to use
numeric codes to represent ordinal data.

VARIABLE TYPE:

• Numeric – a variable whose values are
numbers. Values are displayed in standard
numeric format. The data editor accepts numeric
values in standard format or in scientific
notation.
• Comma – a numeric variable whose values are
displayed with commas delimiting every three
places and displayed with the period as a
decimal delimiter. The data editor accepts
numeric values for comma variables with or
without commas or in scientific notation. Values
cannot contain commas to the right of the
decimal indicator.
• Dot – a numeric variable whose values are
displayed with periods delimiting every three
places and with the comma as a decimal
delimiter. The data editor accepts numeric values
for dot variables with or without periods or in
scientific notation. Values cannot contain
periods to the right of the decimal indicator.
• Scientific notation – a numeric variable whose
values are displayed with an embedded E and a
signed power-of-10 exponent. The data editor
accepts numeric values for such variables with or
without an exponent. The exponent can be
preceded by E or D with an optional sign or by
the sign alone – for example, 123, 1.23E2,
1.23D2, 1.23E+2 and 1.23+2
• Custom currency – a numeric variable whose
values are displayed in one of the custom
currency formats that you have defined on the
currency tab of the options dialog box. Defined
custom currency characters cannot be used in
data entry but are displayed in the data editor.
• String – a variable whose values are not numeric
and therefore are not used in calculations. The
values can contain any characters up to the
defined length. Uppercase and lowercase letters
are considered distinct. This type is also known
as an alphanumeric variable.
• Restricted numeric – a variable whose values
are restricted to non-negative integers. Values
are displayed with leading zeros padded to the
maximum width of the variable. Values can be
entered in scientific notation.

MODULE 3: SUMMARY MEASURES

MEASURES OF CENTRAL TENDENCY

• Convey information regarding the average value
of a set of values
• The clustering/grouping at a particular value or
central point of a frequency distribution
• Thus, for any particular set of data, a single
typical value can be used to describe the entire
data
• Three measures of central location are
commonly used in epidemiology:
arithmetic mean, median and mode

The choice of an appropriate measure of central
tendency for representing a distribution depends on
three factors:

• The way the variables are measured (their level
of measurement)
• The shape of the distribution
• The purpose of the research

Data can be either symmetric or skewed

• SYMMETRIC – if the data can be divided into
pieces that are very similar to each other
• SKEWED – if one tail of a unimodal distribution
is longer than the other tail, meaning that the
data is not spread evenly.

Data can be either right skewed (positively skewed) or
left skewed (negatively skewed)

• If data is skewed to the right, it will rise quickly
to a peak and have a long tail on the right.
• The opposite is true for data that is skewed to
the left.

REMEMBER: Skewness refers to the tail, not the
hump. So a distribution that is skewed to the left has a
long left tail.

Mean

The mean is simply the arithmetic average of the data
and is calculated by taking the sum of all values in the
number set and dividing the total by the number of
values in the dataset. The mean is the most commonly
used measure of central tendency.

• Properties of the mean
1. Uniqueness. For a given set of data there is
one and only one arithmetic mean.
2. Simplicity. The arithmetic mean is easily
understood and easy to compute.
3. Since each and every value in a set of data
enters into the computation of the mean, it
is affected by each value. Extreme values,
therefore, have an influence on the mean
and, in some cases, can so distort it that it
becomes undesirable as a measure of
central tendency.
4. The mean is not the measure of choice for
data that are severely skewed or have
extreme values in one direction or another
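
The three measures of central location named above can be compared on a small example. The sketch below uses Python's standard `statistics` module; the two data sets and the helper `modes` are hypothetical and exist only to show how a single extreme value pulls the mean while leaving the median where it was.

```python
import statistics
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []          # no value appears more than once: no mode
    return [v for v, c in counts.items() if c == top]

# Hypothetical observations; the second set replaces one value with an extreme one.
baseline = [4, 23, 28, 31, 31]
with_outlier = [4, 23, 28, 31, 131]

for label, data in (("baseline", baseline), ("with outlier", with_outlier)):
    print(label,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "mode(s) =", modes(data))
```

In this sketch the mean changes noticeably once the extreme value is introduced, while the median stays at the middle observation.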
Median

The median is the 50th percentile of the values in a
dataset and represents the literal middle of the data.

a. If the number of observations (n) is odd, the
middle position falls on a single observation.
b. If the number of observations is even, the
middle position falls between two observations.

Properties and uses of the median

• The median is a good descriptive measure,
particularly for data that are skewed, because it
is the central point of the distribution
• The median is relatively easy to identify. It is
equal to either a single observed value (if odd
number of observations) or the average of two
observed values (if even number of
observations)
• The median, like the mode, is not generally
affected by one or two extreme values
(outliers). For example, if the values in the
earlier example had been 4, 23, 28, 31 and 131
(instead of 31), the median would still be 28.
• The median has less-than-ideal statistical
properties. Therefore, it is not often used in
statistical manipulations and analyses.

Mode

• The category or score with the largest
frequency or percentage in the distribution
• It can be determined simply by tallying the
number of times each value occurs.

Properties and uses of the mode

The mode is the easiest measure of central location
to understand and explain. It is also the easiest to
identify and requires no calculations.

• The mode is the preferred measure of central
location for addressing which value is the most
popular or the most common. For example, the
mode is used to describe which day of the week
people most prefer to come to the influenza
vaccination clinic, or the “typical” number of
doses of DPT the children in a particular
community have received by their second
birthday.
• As demonstrated, a distribution can have a
single mode. However, a distribution has more
than one mode if two or more values tie as the
most frequent values. It has no mode if no value
appears more than once
• The mode is used almost exclusively as a
“descriptive” measure. It is almost never used in
statistical manipulations or analyses.
• The mode is not typically affected by one or
two extreme values.

MEASURES OF DISPERSION

• Spread, or dispersion, is the second important
feature of frequency distributions.
• A way to describe the spread of the data, or
how far each data point is from the center.
• If all the values are the same, there is no
dispersion; if they are not all the same,
dispersion is present in the data.
• The amount of dispersion may be small when
the values, though different, are close together
• If the values are widely scattered, the dispersion
is greater.
• Measures of spread include the range,
interquartile range, variance and standard
deviation.

Range
- The range is the difference between the largest
and smallest value in a set of observations.
Range = maximum – minimum
- In the statistical world, the range is reported as
a single number and is the result of subtracting
the minimum from the maximum value. In the
epidemiologic community, the range is usually
reported as “from (the minimum) to (the
maximum),” that is, as two numbers rather than
one.

Interquartile Range
- The interquartile range is the difference
between the 25th percentile (1st quartile) and
the 75th percentile (3rd quartile) in a set of
data.
- In other words, the interquartile range includes
the second and third quartiles of a distribution
- This measurement gives an idea of the middle
50 percent of the observations and is, therefore,
less likely to be influenced by outliers or
extreme values.

Variance
- The variance represents the amount of spread
or variability around the mean of a set of data.
- The variance can be described as the average
squared deviation of individual values from the
mean of that set

Standard Deviation
- The standard deviation of a set of data is the
square root of the variance
- Standard deviation is usually calculated only
when the data are more-or-less “normally
distributed”
- The variance and the standard deviation are two
closely related measures of variation that
increase or decrease based on how closely the
scores cluster around the mean

Standard Error (SE)
- The standard error is the standard deviation of
the sampling distribution of the means, rather
than the observations themselves.
- The smaller the standard error, the closer any
given sample mean is likely to be to the true
population mean
- The primary practical use of the standard error
of the mean is in calculating confidence
intervals around the arithmetic mean.

Measure of Position

They do not measure a central tendency or a spread
(dispersion), but instead measure location in a data set.

• Quartiles: Each quartile includes 25% of the
data. First quartile is the 25th percentile. Second
quartile is the 50th percentile (median). Third
quartile is the 75th percentile. Fourth quartile is
the 100th percentile (maximum)
• Percentiles: Divide the data in a distribution into
100 equal parts. The Pth percentile (P ranging
from 0 to 100) is the value that has P percent of
the observations falling at or below it. In other
words, the 90th percentile has 90% of the
observations at or below it. The median, the
halfway point of the distribution, is the 50th
percentile. The maximum value is the 100th
percentile, because all values fall at or below
the maximum.

> A general rule to follow is that if the data is skewed
either to the left or to the right, the median represents
the data better than the mean.

> If a sample is normally distributed, the mean and
median will be nearly the same. With symmetrical data,
the mode will be similar as well.

> The arithmetic mean is the best descriptive measure
for data that are normally distributed
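
The measures of dispersion and position described in this module can be computed with Python's standard library. What follows is a sketch under stated assumptions: the `scores` data and both function names are hypothetical; the variance and standard deviation are the sample versions (dividing by n - 1, whereas a plain average of squared deviations divides by n); the standard error is taken as the sample standard deviation divided by the square root of n; and quartile cut points follow the default method of `statistics.quantiles`, which can differ slightly from other software.

```python
import math
import statistics

def percentile_at_or_below(values, p):
    """Smallest observation with at least p percent of the data at or below it."""
    data = sorted(values)
    rank = max(1, math.ceil(p * len(data) / 100))
    return data[rank - 1]

def spread_summary(values):
    q1, _q2, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    sd = statistics.stdev(values)                      # sample standard deviation
    return {
        "range": max(values) - min(values),            # maximum - minimum
        "iqr": q3 - q1,                                # 75th minus 25th percentile
        "variance": statistics.variance(values),       # sample variance
        "sd": sd,
        "se": sd / math.sqrt(len(values)),             # standard error of the mean
    }

# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 5, 7, 9, 11]

summary = spread_summary(scores)
for name, value in summary.items():
    print(name, "=", value)

for p in (25, 50, 90, 100):
    print(f"{p}th percentile =", percentile_at_or_below(scores, p))
```

The percentile helper applies the "at or below" definition literally, so the 100th percentile is always the maximum. With an even number of observations its 50th percentile is the lower of the two middle values, not their average, so it can differ from the median as defined earlier.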
