Correlation
PROC CORR DATA=dataset <options>;
     VAR variable(s);
     WITH variable(s);
RUN;
PROC CORR DATA=sample PLOTS=SCATTER(NVAR=all);
   VAR weight height;
RUN;
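The syntax template above lists a WITH statement that the example does not use. As a minimal sketch, assuming the same sample dataset and variables: when WITH is present, PROC CORR correlates each WITH variable (shown down the side of the table) with each VAR variable (shown across the top), instead of producing the full matrix of VAR variables.

```sas
/* Correlate height (row) with weight (column) only */
PROC CORR DATA=sample;
     VAR weight;
     WITH height;
RUN;
```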
DATA WineRanking;
     INPUT company $ type $ score 3. date MMDDYY10.;
     FORMAT date MMDDYY8.;
     DATALINES;
Helmes Pinot 56 09/14/2012
Helmes Reisling 38 09/14/2012
Vacca Merlot 91 09/15/2012
Sterling Pinot 65 06/30/2012
Sterling Prosecco 72 06/30/2012
;
RUN;
Type           Informat Name          What it Does
Character      $w.                    Reads in character data of width w.
Numeric        w.d                    Reads in numeric data of width w with d decimal places
Date           MMDDYYw.               Reads in dates written as month, day, year (e.g. 10-01-81 or 10/01/1981)
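The three informat types in the table can be tried together in one small DATA step. This is a sketch only: the dataset name informat_demo, the variable names, and the data values are made up for illustration. The colon modifier tells SAS to read up to the next blank while still applying the informat.

```sas
DATA informat_demo;   /* hypothetical dataset name */
     /* character ($w.), numeric (w.d), and date (MMDDYYw.) informats */
     INPUT name :$10. price :6.2 sold :MMDDYY10.;
     FORMAT price 6.2 sold MMDDYY10.;
     DATALINES;
Merlot 12.50 10/01/1981
Pinot 975 10-01-81
;
RUN;

PROC PRINT DATA=informat_demo;
RUN;
```

Note that the d in w.d only matters when the raw value has no decimal point: 975 read with 6.2 is stored as 9.75, while 12.50 is read as written. How the two-digit year 81 is interpreted depends on the YEARCUTOFF system option.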
PROC FORMAT;
     VALUE GENDERCODE
       0 = 'Male'
       1 = 'Female';
     VALUE ATHLETECODE
       0 = 'Non-athlete'
       1 = 'Athlete';
     VALUE SMOKINGCODE
       0 = 'Nonsmoker'
       1 = 'Past smoker'
       2 = 'Current smoker';
RUN;
DATA sample_formatted2;
     SET sample;
       FORMAT gender GENDERCODE. athlete ATHLETECODE. smoking SMOKINGCODE.;
RUN;
PROC PRINT DATA=sample LABEL; /* LABEL option: print variable labels instead of names */
     VAR bday;
     LABEL bday = "Date of Birth";
RUN;
                                          DATE
DATDIF(start_date, end_date, basis);
DATA sample;
     SET sample;
     date = DATDIF(DOB, Admdate, 'ACT/ACT');
RUN;
Here the DATDIF function returns the difference between two date variables
(DOB and Admdate) in number of days and saves it in the new numeric variable date.
DATA sample;
     SET sample;
     years = YRDIF(DOB, Admdate, 'ACT/ACT');
RUN;
Here the YRDIF function gives the difference between two dates (DOB and Admdate) in
number of years and saves it in the new numeric variable years.
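To try DATDIF and YRDIF without an existing dataset, the sketch below uses two hypothetical date literals and writes the results to the log. It also shows the 'AGE' basis of YRDIF, which is often used with FLOOR to get age in whole years; that basis is an addition here and is not used in the examples above.

```sas
DATA _NULL_;
     dob     = '20SEP1980'D;   /* hypothetical date of birth */
     admdate = '31JAN2012'D;   /* hypothetical admission date */
     days  = DATDIF(dob, admdate, 'ACT/ACT');       /* difference in days */
     years = YRDIF(dob, admdate, 'ACT/ACT');        /* difference in (fractional) years */
     age   = FLOOR(YRDIF(dob, admdate, 'AGE'));     /* age in completed years */
     PUT days= years= age=;                         /* write the values to the log */
RUN;
```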
DATA sample;
     SET sample;
     date = MDY(mn, days, yr);
     FORMAT date MMDDYY10.;
RUN;
Here a new variable date will be created by combining the values in the
variables mn, days, and yr using the MDY function. The (optional) MMDDYY10. format
tells SAS to display the date values in the form MM/DD/YYYY.
DATA sample;
     SET sample;
     wkday = WEEKDAY(DOB);
RUN;
Here the WEEKDAY function extracts the day of the week (1 = Sunday through 7 = Saturday)
from the date variable DOB and saves it in the new numeric variable wkday.
DATA sample;
     SET sample;
     days = DAY(DOB);
RUN;
Here the DAY function extracts the day value from the date variable DOB and saves it in
the new numeric variable days.
DATA sample;
     SET sample;
     mn = MONTH(DOB);
RUN;
Here the MONTH function extracts the month value from the date variable DOB and
saves it in the new numeric variable mn.
DATA sample;
     SET sample;
     yr = YEAR(DOB);
RUN;
Here the YEAR function extracts the year value from the date variable DOB and
saves it in the new numeric variable yr.
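The extraction functions and MDY are inverses of each other, which can be checked with a short round trip. This is a sketch using a hypothetical date literal; the variables rebuilt and same are made up for illustration.

```sas
DATA _NULL_;
     dob   = '20SEP1980'D;          /* hypothetical date value */
     mn    = MONTH(dob);            /* month number, 1-12 */
     days  = DAY(dob);              /* day of the month */
     yr    = YEAR(dob);             /* four-digit year */
     wkday = WEEKDAY(dob);          /* 1 = Sunday, ..., 7 = Saturday */
     rebuilt = MDY(mn, days, yr);   /* reassemble the date from its parts */
     same = (rebuilt = dob);        /* 1 if the round trip gives back the original date */
     PUT mn= days= yr= wkday= rebuilt= MMDDYY10. same=;
RUN;
```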
                                          SORTING
PROC SORT data=sample;
     BY gender descending bday;
RUN;
The data is sorted first by gender. Within each gender, the data is then sorted in descending
order by birth date.
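By default PROC SORT overwrites the input dataset with the sorted version. If the original order should be kept, the OUT= option writes the sorted data to a separate dataset. A minimal sketch, where the name sample_sorted is hypothetical:

```sas
/* Same sort as above, but sample itself is left unchanged */
PROC SORT DATA=sample OUT=sample_sorted;
     BY gender DESCENDING bday;
RUN;
```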
                                            MERGE
DATA New-Dataset-Name (OPTIONS);
     MERGE Dataset-Name-1 (OPTIONS) Dataset-Name-2 (OPTIONS);
     BY Variable(s);
RUN;
The BY statement names the variable(s) used to match observations: an observation in the first
dataset is paired with the observation in the second dataset that represents the same subject.
Both datasets must be sorted by the BY variable(s) before they are merged.
DATA patients;
INPUT Subject_ID DOB Gender $;
INFORMAT DOB MMDDYY10.;
FORMAT DOB MMDDYY10.;
DATALINES;
1 9/20/1980 Female
2 6/12/1954 Male
3 4/2/2001 Male
4 8/29/1978 Female
5 2/28/1986 Female
;
RUN;
DATA initial_appointments;
INPUT Subject_ID Visit_Date Doctor $;
INFORMAT Visit_Date MMDDYY10.;
FORMAT Visit_Date MMDDYY10.;
DATALINES;
1 1/31/2012 Walker
2 2/2/2012 Jones
3 1/15/2012 Jones
5 1/29/2012 Smith
;
RUN;
PROC SORT DATA=patients;
         BY Subject_ID;
RUN;
PROC SORT DATA=initial_appointments;
        BY Subject_ID;
RUN;
DATA one_to_one_match;
         /* One-to-one matching assumes that each subject appears exactly
            once in each of the datasets being merged. */
         MERGE patients initial_appointments;
         BY Subject_ID;
RUN;
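In the sample data, subject 4 appears in patients but has no row in initial_appointments, so the merge above keeps that subject with missing values for Visit_Date and Doctor. If only subjects present in both datasets are wanted, the IN= dataset option can be used to filter. A sketch, where matched_only, in_patients, and in_appts are hypothetical names:

```sas
DATA matched_only;
     /* IN= creates temporary 0/1 flags marking which dataset contributed */
     MERGE patients (IN=in_patients)
           initial_appointments (IN=in_appts);
     BY Subject_ID;
     IF in_patients AND in_appts;   /* keep only subjects found in both datasets */
RUN;
```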
                                      ONE-TO-MANY MATCH
One-to-many matching uses the same MERGE and BY syntax as one-to-one matching.
It assumes that each subject appears exactly once in one dataset,
but can have multiple matching records in the other dataset.
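As a sketch of a one-to-many merge, assume the patients dataset created earlier (already sorted by Subject_ID) and a hypothetical all_visits dataset in which a subject can have several rows. The visit data below are invented for illustration.

```sas
DATA all_visits;   /* hypothetical dataset: several visits per subject */
     INPUT Subject_ID Visit_Date :MMDDYY10. Doctor $;
     FORMAT Visit_Date MMDDYY10.;
     DATALINES;
1 1/31/2012 Walker
1 3/15/2012 Walker
2 2/2/2012 Jones
2 4/20/2012 Smith
2 6/1/2012 Jones
;
RUN;

PROC SORT DATA=all_visits;
     BY Subject_ID Visit_Date;
RUN;

DATA one_to_many_match;
     MERGE patients all_visits;   /* one row per patient, many rows per visit */
     BY Subject_ID;
RUN;
```

Each visit row receives the DOB and Gender of the matching patient. Patients with no visits in all_visits still appear once, with missing visit values.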
                           PRINTING OBSERVATIONS BY GROUPS
PROC SORT DATA=sample;
     BY Gender;
RUN;
PROC PRINT DATA=sample LABEL;
     BY Gender;
     ID ids;
     VAR Gender Height Weight;
     FORMAT Height Weight 3.0;
RUN;
Because we want to print observations by gender, we must first sort the data by Gender using PROC SORT.
The BY statement specifies that we want to group the printed output by the levels of
variable Gender. The ID statement specifies that variable ids should be printed instead of
the observation number. Because we are mainly interested in the height and weight of each
student, these variables are listed in the VAR statement along with Gender. (Note, however, that the variable
given in the ID statement will automatically print, regardless of whether or not it is listed in the
VAR statement.) Finally, a FORMAT statement specifies that height and weight should print with
no decimal places. (Specifically, the 3.0 format says that the values should be no wider than three characters
and should have no decimal places.)
                              THE FREQ PROCEDURE
PROC FREQ DATA=dataset;
     TABLES variable(s) / <options>;
RUN;
The TABLES statement is where you put the names of the variables you want to produce a
frequency table for. You can list as many variables as you want, with each variable separated by
a space.
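For instance, assuming the sample_formatted2 dataset created in the PROC FORMAT example, several variables can be listed in one TABLES statement. Each variable gets its own frequency table, and because formats are attached, the rows show the labels (such as 'Male' or 'Nonsmoker') rather than the numeric codes.

```sas
PROC FREQ DATA=sample_formatted2;
     TABLES gender athlete smoking;   /* one frequency table per variable */
RUN;
```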
                      Descending Order and Missing Values
PROC FREQ DATA=sample ORDER=freq;
     TABLE State Rank / MISSING;
RUN;
The ORDER=freq option in the first line of the syntax tells SAS to order the values in the table by
descending frequency, so the most common value appears first. The MISSING option appearing after the
slash (/) in the TABLE statement tells SAS to include the missing values as a row in the table.