
SAS Interview QA


Uploaded by hpradeep
Copyright © Attribution Non-Commercial (BY-NC)

1. What are the phases of clinical trials?

Ans:- Clinical trials are conducted in the following four phases:


Phase I: A new drug or treatment is tested in a small group of people (20-80) to evaluate its safety.

Phase II: The experimental drug or treatment is given to a larger group of people (100-300) to see whether it is effective for that indication.

Phase III: The experimental drug or treatment is given to a large group of people (1000-3000) to evaluate its effectiveness, monitor side effects and compare it to commonly used treatments.

Phase IV: Post-marketing studies that gather further information on the drug's risks, benefits, etc.

 
          
     
     

2. Describe the validation procedure. How would you perform the validation for TLGs and for analysis datasets?

Ans:- Validation checks the output of a SAS program written by the source programmer. The validator writes their own program and generates the output. If this output is the same as the source programmer's output, the program is considered valid. For TLGs (tables, listings and graphs) the comparison can be done by checking the output manually; for analysis datasets it can be done with PROC COMPARE.
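The following Python sketch (not SAS) shows what the comparison step does. It assumes both datasets are lists of records matched on a subject ID. SUBJID and the sample values are illustrative only. It approximates the idea behind PROC COMPARE with an ID statement. It does not reproduce PROC COMPARE's algorithm or report layout.

```python
from typing import Any

Record = dict[str, Any]


def compare_datasets(base: list[Record], compare: list[Record], key: str) -> list[str]:
    """List differences between two datasets whose observations are matched on `key`."""
    base_by_key = {row[key]: row for row in base}
    comp_by_key = {row[key]: row for row in compare}
    report: list[str] = []

    # Observations that exist in only one of the two datasets.
    for k in sorted(base_by_key.keys() - comp_by_key.keys()):
        report.append(f"{key}={k}: only in BASE")
    for k in sorted(comp_by_key.keys() - base_by_key.keys()):
        report.append(f"{key}={k}: only in COMPARE")

    # Variable-by-variable differences for observations present in both.
    for k in sorted(base_by_key.keys() & comp_by_key.keys()):
        b, c = base_by_key[k], comp_by_key[k]
        for var in sorted(b.keys() | c.keys()):
            if b.get(var) != c.get(var):
                report.append(f"{key}={k}: {var} base={b.get(var)!r} compare={c.get(var)!r}")
    return report


# Source programmer's output vs. validator's output (illustrative data).
source = [{"SUBJID": "001", "AGE": 34, "SEX": "F"}, {"SUBJID": "002", "AGE": 51, "SEX": "M"}]
qc = [{"SUBJID": "001", "AGE": 34, "SEX": "F"}, {"SUBJID": "002", "AGE": 50, "SEX": "M"}]
for line in compare_datasets(source, qc, key="SUBJID"):
    print(line)  # SUBJID=002: AGE base=51 compare=50
```

An empty report means the two outputs match, which is the condition for calling the program valid.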

      


    
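3. How would you perform the validation for a listing that has 400 pages?

Ans:- It is not practical to validate a 400-page listing manually. Instead, we bring the listing data into a SAS dataset and compare it with PROC COMPARE. SAS has no PROC RTF. In practice, one of two things is done. Either the dataset that feeds the listing is compared with the validator's dataset, or the listing output file (for example, one produced with ODS RTF) is read back into a dataset with a DATA step.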

4. Can you use PROC COMPARE to validate listings? Why?

Ans:- Yes. When a listing has many entries (pages), it is not practical to check it manually. In that situation we use PROC COMPARE to validate the listing.

5. How would you generate tables, listings and graphs?

Ans:- We can generate listings with PROC REPORT. Similarly, we can create tables with PROC FREQ, PROC MEANS, PROC TRANSPOSE and PROC REPORT. We can generate graphs with PROC GPLOT, etc.
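The following Python sketch illustrates what a PROC MEANS-style summary table computes. It reports n, mean, min and max of one variable for each treatment group. The names TRT and AGE are hypothetical. Missing values (None) are excluded, as PROC MEANS excludes them.

```python
from statistics import mean


def summarize_by_group(rows, group_var, analysis_var):
    """Return n, mean, min and max of `analysis_var` for each level of `group_var`.

    Missing values (None) are left out of the statistics, as PROC MEANS does.
    """
    groups = {}
    for row in rows:
        value = row[analysis_var]
        if value is not None:
            groups.setdefault(row[group_var], []).append(value)
    return {
        level: {"n": len(vals), "mean": mean(vals), "min": min(vals), "max": max(vals)}
        for level, vals in sorted(groups.items())
    }


adsl = [
    {"TRT": "Placebo", "AGE": 40},
    {"TRT": "Placebo", "AGE": 50},
    {"TRT": "Active", "AGE": 30},
    {"TRT": "Active", "AGE": None},
]
for trt, stats in summarize_by_group(adsl, "TRT", "AGE").items():
    print(trt, stats)
```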

6. How many tables can you create in a day?

Ans:- It depends on the complexity of the tables. If the tables are of the same type, we can create one to three tables a day.
7. What PROCs have you used?

Ans:- I have used many procedures, such as PROC REPORT, PROC SORT and PROC FORMAT. I used PROC REPORT to generate listing reports. In that procedure I defined SUBJID as an ORDER variable and TRT_GRP, SBD and DBD as DISPLAY variables.

8. What datasets have you worked with?

Ans:- I have worked with demographic, adverse event, laboratory, analysis and other datasets.

9. How would you submit documents to the FDA? Who submits them?

Ans:- Documents are submitted to the FDA electronically (e-submission), in define.pdf or define.xml format. These documents contain documentation about the macros and programs, as well as the electronic records. The statistician or the project manager submits them to the FDA.

10. What documents do you submit to the FDA?

Ans:- We submit the ISS and ISE documents to the FDA.

11. What version of CDISC SDTM have you used?

Ans: I have used version 3.1.1 of the CDISC SDTM.

12. What is a SAP (Statistical Analysis Plan)?

Ans:- This document contains detailed information about the study objectives and statistical methods. It aids in producing the Clinical Study Report (CSR), including the summary tables, figures and subject data listings for the protocol. The document also records the program variables and algorithms that will be used to generate the summary statistics and statistical analyses.

13. Tell me about your project group. Whom do you report to, and whom would you contact with questions?

My project group consisted of six members: a project manager, two statisticians, a lead programmer and two programmers.

I usually report to the lead programmer. If I have a programming problem, I contact the lead programmer.

If I have any doubt about the values of variables in a raw dataset, I contact the statistician. For example, consider a dataset related to menopause symptoms in women in which the variable SEX has the values F and M. I would consider the M values wrong, and in that type of situation I would contact the statistician.

14. What is SAS documentation?

SAS documentation includes the program header, comments, titles, footnotes, etc. Whatever we write in the program to make it easy to read and understand is called SAS documentation.

15. How would you know whether a program has been modified?

I would know whether a program has been modified by checking the modification history in the program header.

16. What is a project status meeting?

It is a plenary meeting of all the project managers. They discuss the present status of the project in hand, along with new ideas and options for improving the way it is currently being performed.

17. What are Clintrial and Oracle Clinical?

Clintrial is the market's leading Clinical Data Management System (CDMS). Oracle Clinical (OC) is a database management system designed by Oracle to provide data management, data entry and data validation functionality for the clinical trials process.

18. Tell me about MedDRA and what version of MedDRA did you use in your project?

MedDRA: Medical Dictionary for Regulatory Activities. Version 10.

19. What is SDTM?

CDISC's Study Data Tabulation Model (SDTM) was developed to standardize what is submitted to the FDA.

20. What is a CRT?

CRT stands for Case Report Tabulation. Whenever a pharmaceutical company submits an NDA, the company has to send the CRTs to the FDA.

21. What is an annotated CRF?

An annotated CRF is a CRF (Case Report Form) in which the variable names are written next to the spaces provided to the investigator. The annotated CRF serves as a link between the raw data and the questions on the CRF. It is a valuable tool for programmers and statisticians.

22. What is 21 CFR Part 11?

Title 21 CFR Part 11 of the Code of Federal Regulations deals with the FDA guidelines on
electronic records and electronic signatures in the United States. Part 11, as it is commonly
called, defines the criteria under which electronic records and electronic signatures are
considered to be trustworthy, reliable and equivalent to paper records.

23. What are the variables in adverse event datasets?

The adverse event dataset contains SUBJID, the body system of the event, the preferred term for the event and the event severity. The AE dataset summarizes the adverse events for all patients in the treatment arms to aid in the inferential safety analysis of the drug.

24. What are the variables in lab datasets, and what is their purpose?

The lab dataset contains SUBJID, the week number, the lab test category, standard units, and the low and high normal range of the values. The purpose of the lab dataset is to obtain the change in the values of key variables after administration of the drug.
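The difference after administration of the drug is usually expressed as change from baseline. The Python sketch below computes it. It assumes a hypothetical layout with one record per SUBJID, TEST and WEEK, where week 0 is the baseline. The names AVAL, BASE and CHG follow ADaM naming conventions but are used here only for illustration.

```python
def change_from_baseline(records, baseline_week=0):
    """Add BASE and CHG (= AVAL - BASE) to each record, per subject and lab test.

    Hypothetical layout: one dict per SUBJID/TEST/WEEK with the value in AVAL;
    the record at WEEK == baseline_week is the baseline. If there is no
    baseline (or the value is missing), CHG is None.
    """
    baseline = {
        (r["SUBJID"], r["TEST"]): r["AVAL"]
        for r in records
        if r["WEEK"] == baseline_week
    }
    out = []
    for r in records:
        base = baseline.get((r["SUBJID"], r["TEST"]))
        chg = None if base is None or r["AVAL"] is None else r["AVAL"] - base
        out.append({**r, "BASE": base, "CHG": chg})
    return out


labs = [
    {"SUBJID": "001", "TEST": "ALT", "WEEK": 0, "AVAL": 30},
    {"SUBJID": "001", "TEST": "ALT", "WEEK": 4, "AVAL": 42},
    {"SUBJID": "002", "TEST": "ALT", "WEEK": 4, "AVAL": 25},  # no baseline record
]
for row in change_from_baseline(labs):
    print(row["SUBJID"], row["WEEK"], row["CHG"])
```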

25. How did you find discrepancies in the data?

I used PROC FREQ and PROC UNIVARIATE to find discrepancies in the data, which I reported to my manager.

26. Have you created CRTs?

Yes, I have created patient profile tabulations at the request of my manager and the statistician. I used PROC CONTENTS and PROC SQL to create a simple patient listing with all the information for a particular patient, including age, sex, race, etc.

27. Have you created transport files?

Yes, I have created SAS XPORT transport files for FDA submissions using PROC COPY and the DATA step. These are version 5 files. We use the XPORT LIBNAME engine and PROC COPY, with one dataset in each transport file. The version 5 limits are: labels no longer than 40 bytes, variable names no longer than 8 bytes, and character variables no wider than 200 bytes. If these limits are violated, the copy may terminate with errors, because the SAS XPORT format is compatible with SAS version 5 datasets.

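libname sdtm "c:\sdtm_data";    /* library holding the SDTM datasets */
libname dm xport "c:\dm.xpt";   /* XPORT engine: one transport file  */

proc copy in=sdtm out=dm memtype=data;   /* IN= and OUT= are PROC COPY options */
   select dm;                            /* copy only the DM dataset           */
run;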
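Checking the version 5 limits before running PROC COPY can save a failed run. The Python sketch below assumes the variable metadata has already been exported (for example, from PROC CONTENTS) into a hypothetical list of dictionaries. It checks only the three limits named in the answer above.

```python
def check_xport_v5(variables):
    """Return messages for variables that break the version 5 transport limits.

    `variables` is a hypothetical list of dicts with keys "name", "label",
    "type" ("char" or "num") and "length" in bytes.
    """
    problems = []
    for var in variables:
        name = var["name"]
        if len(name) > 8:
            problems.append(f"{name}: name longer than 8 characters")
        if len(var.get("label", "").encode("utf-8")) > 40:
            problems.append(f"{name}: label longer than 40 bytes")
        if var["type"] == "char" and var["length"] > 200:
            problems.append(f"{name}: character length above 200 bytes")
    return problems


metadata = [
    {"name": "USUBJID", "label": "Unique Subject Identifier", "type": "char", "length": 20},
    {"name": "AEDECODTERM", "label": "Dictionary-Derived Term", "type": "char", "length": 100},
    {"name": "AETERM", "label": "Reported Term for the Adverse Event, Verbatim Text",
     "type": "char", "length": 250},
]
for message in check_xport_v5(metadata):
    print(message)
```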


29. What is CDISC, and what are its data models?

CDISC: Clinical Data Interchange Standards Consortium. It has different data models, which define clinical data standards for the pharmaceutical industry.

SDTM: defines the data tabulation datasets that are to be sent to the FDA for regulatory submissions.
ADaM (Analysis Data Model): provides dataset definition guidance for creating analysis datasets.

ODM: an XML-based data model that allows the transfer of XML-based data.

Define.xml: the machine-readable version of the data definition file (define.pdf).

ICH E3: Guideline, Structure and Content of Clinical Study Reports

ICH E6: Guideline, Good Clinical Practice

ICH E9: Guideline, Statistical Principles for Clinical Trials

Title 21 CFR Part 312: Investigational New Drug Application (section 312.32 covers IND safety reporting)

30. Have you done edit check programs?

Yes, I have written edit check programs (data validation).

1. Data validation: PROC MEANS, PROC UNIVARIATE, PROC FREQ. Data cleaning: finding errors.

2. Checking for invalid character values:

   proc freq data=patients;
      tables gender dx ae / nocum nopercent;
   run;

   This gives frequency counts of the unique character values.

3. PROC PRINT with a WHERE statement to list values outside the valid ranges (e.g., systolic blood pressure 80 to 200, diastolic blood pressure 60 to 120).

4. PROC MEANS, PROC UNIVARIATE and PROC TABULATE to look for outliers. PROC MEANS: N, MIN, MAX and MEAN. PROC UNIVARIATE: the five highest and five lowest values (plus stem-and-leaf plots and box plots).

5. PROC FORMAT: range checking.

6. Data analysis: SET, MERGE, UPDATE, KEEP and DROP in the DATA step.

7. Creating datasets: PROC IMPORT and the DATA step, from flat files.

8. Extracting data: LIBNAME.

9. SAS/STAT: PROC ANOVA, PROC REG.

10. Duplicate data: PROC SORT with NODUPKEY or NODUPLICATES. NODUPKEY checks only the BY variables for duplicates. NODUPLICATES compares each entire observation with the previous one written out, so all variables must match. To get the duplicate observations themselves, sort BY the key with NODUPKEY and DUPOUT=, which writes the removed observations to a separate dataset. Then merge that dataset back to the original by the key and keep only the records found in both.

11. To create analysis datasets from the raw datasets, I used PROC FORMAT together with RENAME and LENGTH statements to make the changes and finally build the analysis dataset.
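The following Python sketch illustrates the difference between NODUPKEY and NODUPLICATES in item 10. It mimics the behavior described above on lists of records and is not how PROC SORT is implemented. In `nodupkey`, the `removed` list plays the role of a DUPOUT= dataset.

```python
def _key(row, by):
    return tuple(row[v] for v in by)


def nodupkey(rows, by):
    """Keep the first observation for each BY-value combination (like NODUPKEY).

    Returns (kept, removed); `removed` plays the role of a DUPOUT= dataset.
    Python's sort is stable, so ties keep their original order.
    """
    seen, kept, removed = set(), [], []
    for row in sorted(rows, key=lambda r: _key(r, by)):
        key = _key(row, by)
        (removed if key in seen else kept).append(row)
        seen.add(key)
    return kept, removed


def noduplicates(rows, by):
    """Drop an observation only if it equals the previous one kept on ALL variables
    after sorting by `by` (like NODUPLICATES/NODUPRECS)."""
    kept = []
    for row in sorted(rows, key=lambda r: _key(r, by)):
        if not kept or kept[-1] != row:
            kept.append(row)
    return kept


vitals = [
    {"ID": 1, "VISIT": 1, "HR": 70},
    {"ID": 1, "VISIT": 1, "HR": 70},  # exact duplicate
    {"ID": 1, "VISIT": 1, "HR": 72},  # same key, different value
    {"ID": 2, "VISIT": 1, "HR": 80},
]
kept, removed = nodupkey(vitals, ["ID", "VISIT"])
print(len(kept), len(removed))                      # 2 2
print(len(noduplicates(vitals, ["ID", "VISIT"])))  # 3
```

The record with HR=72 shows the practical difference. NODUPKEY discards it because its key repeats. NODUPLICATES keeps it because the whole observation differs.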

31. How did you verify the final tables?

The purpose of verification is to ensure the accuracy of the final tables and the quality of the SAS programs that generated them. Following the instructions in the SOP and the SAP, I selected a subset of the final summary tables for verification, e.g., the adverse event table and the baseline and demographic characteristics table. The verification results were checked against the original final tables, and any discrepancies found were documented.

32. How did you validate programs?

It is the same as macro validation, except that here we validate programs. According to the SOP, I first had to determine what each program is supposed to do and check whether it works as intended. I then created a validation document stating whether the program works properly, with a status of pass or fail. To test a program, pass the input parameters to it and check the log for errors.

33. What are ISS and ISE?

ISS (Integrated Summary of Safety): integrates safety information from all sources (animal, clinical pharmacology, controlled and uncontrolled studies, epidemiologic data). "ISS is, in part, simply a summation of data from individual studies and, in part, a new analysis that goes beyond what can be done with individual studies."

ISE: Integrated Summary of Efficacy.

The ISS and ISE are critical components of the safety and effectiveness submission. They are expected to be submitted in the application in accordance with regulations. The FDA's guidance Format and Content of the Clinical and Statistical Sections of an Application gives advice on how to construct these summaries. Note that, despite the name, these are integrated analyses of all relevant data, not summaries.

34. How did you perform data validation and data cleaning?

I performed data validation and data cleaning to check whether the data values are correct and conform to a standard set of rules. A very simple way to identify invalid character values in a file is to use PROC FREQ to list all the unique values of these variables. This gives the total number of invalid observations. After identifying the invalid data, we have to locate the observations so that we can report the particular patient numbers to the manager. Invalid data can be located with DATA _NULL_ programming.

For example:

DATA _NULL_;
   INFILE "C:\PATIENTS.TXT" PAD;
   FILE PRINT;                  ***SEND OUTPUT TO THE OUTPUT WINDOW;
   TITLE "LISTING OF INVALID DATA";
   ***NOTE: WE WILL ONLY INPUT THOSE VARIABLES OF INTEREST;
   INPUT @1  PATNO  $3.
         @4  GENDER $1.
         @24 DX     $3.
         @27 AE     $1.;
   ***CHECK GENDER;
   IF GENDER NOT IN ('F','M',' ') THEN PUT PATNO= GENDER=;
   ***CHECK DX;
   IF VERIFY(DX,' 0123456789') NE 0 THEN PUT PATNO= DX=;
   ***CHECK AE;
   IF AE NOT IN ('0','1',' ') THEN PUT PATNO= AE=;
RUN;
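For readers who want to see the logic outside SAS, the following Python sketch expresses the same three checks. It assumes the fixed-column records have already been read into dictionaries with the keys PATNO, GENDER, DX and AE. Trailing blanks are ignored, as in SAS character comparisons.

```python
def list_invalid(patients):
    """Return 'PATNO=... VAR=...' messages for miscoded values, like the PUT statements above."""
    messages = []
    for p in patients:
        # SAS ignores trailing blanks when comparing character values.
        gender, dx, ae = p["GENDER"].rstrip(), p["DX"], p["AE"].rstrip()
        if gender not in ("F", "M", ""):
            messages.append(f"PATNO={p['PATNO']} GENDER={gender}")
        # VERIFY(DX, ' 0123456789') NE 0: DX holds something other than blanks and digits.
        if any(ch not in " 0123456789" for ch in dx):
            messages.append(f"PATNO={p['PATNO']} DX={dx}")
        if ae not in ("0", "1", ""):
            messages.append(f"PATNO={p['PATNO']} AE={ae}")
    return messages


patients = [
    {"PATNO": "001", "GENDER": "M", "DX": "004", "AE": "0"},
    {"PATNO": "002", "GENDER": "X", "DX": "12A", "AE": "1"},
    {"PATNO": "003", "GENDER": " ", "DX": "   ", "AE": "A"},
]
print("\n".join(list_invalid(patients)))
```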

To validate numeric values, for example to find out-of-range values, I used PROC PRINT with a WHERE statement. Missing values are excluded from this check by the IS NOT MISSING conditions.

PROC PRINT DATA=CLEAN.PATIENTS;
   WHERE (HR  NOT BETWEEN 40 AND 100 AND HR  IS NOT MISSING) OR
         (SBP NOT BETWEEN 80 AND 200 AND SBP IS NOT MISSING) OR
         (DBP NOT BETWEEN 60 AND 120 AND DBP IS NOT MISSING);
   TITLE "OUT-OF-RANGE VALUES FOR NUMERIC VARIABLES";
   ID PATNO;
   VAR HR SBP DBP;
RUN;

If the valid values form a range, such as '001' to '999', we can first define a user-defined format and then use PROC FREQ with that format to find the invalid values.

PROC FORMAT;
   VALUE $GENDER 'F','M' = 'VALID' ' ' = 'MISSING' OTHER = 'MISCODED';
   VALUE $DX '001' - '999' = 'VALID' ' ' = 'MISSING' OTHER = 'MISCODED';
   VALUE $AE '0','1' = 'VALID' ' ' = 'MISSING' OTHER = 'MISCODED';
RUN;

PROC FREQ DATA=PATIENTS;
   TABLES GENDER DX AE / NOCUM NOPERCENT;
   FORMAT GENDER $GENDER. DX $DX. AE $AE.;   /* apply the formats */
RUN;
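The following Python sketch shows the format-and-count approach. Each hypothetical predicate plays the role of one VALUE statement, and `Counter` stands in for PROC FREQ. Like a character range in PROC FORMAT, the DX check compares strings lexically, so it is not a true numeric check.

```python
from collections import Counter

# One predicate per VALUE statement; a value that passes is VALID.
FORMATS = {
    "GENDER": lambda v: v in ("F", "M"),
    "DX": lambda v: "001" <= v <= "999",  # lexical string comparison, like a character range
    "AE": lambda v: v in ("0", "1"),
}


def apply_format(var, value):
    """Map one raw character value to 'VALID', 'MISSING' or 'MISCODED'."""
    value = value.rstrip()  # trailing blanks are ignored, as in SAS comparisons
    if value == "":
        return "MISSING"
    return "VALID" if FORMATS[var](value) else "MISCODED"


def freq_formatted(rows, var):
    """Count formatted values, roughly what PROC FREQ reports when the format is applied."""
    return Counter(apply_format(var, row[var]) for row in rows)


rows = [{"GENDER": g} for g in ["F", "M", " ", "X", "f"]]
print(freq_formatted(rows, "GENDER"))
```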

One of the simplest ways to check for invalid numeric values is to run PROC MEANS or PROC UNIVARIATE. We can use the N and NMISS options in PROC MEANS to check for missing and invalid data. By default, PROC MEANS reports N, MEAN, STD, MIN and MAX. PROC UNIVARIATE reports N, mean, standard deviation, skewness, kurtosis and more by default. Its main advantage is that we also get the extreme values, i.e., the five lowest and five highest values, which we can inspect for data errors. To see the patient IDs for these observations, add an ID PATNO statement to the UNIVARIATE procedure.

35. What are the responsibilities in preparing the ISS and ISE?

Develop programming for the report formats (ISS and ISE shells) required by the regulatory authorities. Update the ISS/ISE shells when required.

Provide information on safety and efficacy findings when required. Provide updates on safety and efficacy findings for periodic reporting.

Draft the ISS and ISE shells. Update the shells when appropriate. Analyze and report data in the approved format to meet periodic reporting requirements.
36. What are single-blind, double-blind and triple-blind studies?

Single-blind study: the patients are not aware of which treatment they receive.
Double-blind study: neither the patients nor the investigator know the treatment group assigned.
Triple-blind study: the patients, the investigator and the project team are all unaware of the treatments administered.

37. What datasets are typically collected in a clinical study?

Demog
Adverse Events
Vitals
ECG
Labs
Medical History
Physical Exam, etc.

38. What variables do these datasets contain?

Demog: USUBJID, Patient ID, Age, Sex, Race, Screening Weight, Screening Height, BMI, etc.

AE: Protocol no, Investigator no, Patient ID, Preferred Term, Investigator Term (Abdominal dis, frequent urination, headache, dizziness, hand-foot syndrome, rash, leukopenia, neutropenia), Severity, Seriousness (y/n), Seriousness Type (death, life-threatening, permanently disabling), Visit number, Start time, Stop time, Related to study drug?

Vitals: Subject number, Study date, Procedure time, Sitting blood pressure, Sitting cardiac rate, Visit number, Change from baseline, Dose of treatment at time of vital sign, Abnormal (yes/no), BMI, Systolic blood pressure, Diastolic blood pressure.

ECG: Subject no, Study date, Study time, Visit no, PR interval (msec), QRS duration (msec), QT interval (msec), QTc interval (msec), Ventricular rate (bpm), Change from baseline, Abnormal.

Labs: Subject no, Study day, Lab parameter (LPARM), Lab units, ULN (upper limit of normal), LLN (lower limit of normal), Visit number, Change from baseline, Greater than ULN (yes/no), Lab-related serious adverse event (yes/no).

Medical History: Medical condition, Date of diagnosis (yes/no), Years of onset or occurrence, Past condition (yes/no), Current condition (yes/no).

Physical Exam: Subject no, Exam date, Exam time, Visit number, Reason for exam, Body system, Abnormal (yes/no), Findings, Change from baseline (improvement, worsening, no change), Comments.

39. Give examples of edit checks for each dataset.

Demog:
Weight is outside the expected range.
Body mass index is below expected (check weight and height).
Age is not within the expected range.
DOB is later than the visit date.
Gender value is invalid, etc.

AE:
Stop date is before the start date, or visit start is before the birth date.
Study medicine was discontinued due to an adverse event, but completion is indicated (COMPLETE = 1).

Labs:
Result is within the normal range, but the abnormal flag is not blank or 'N'.
Result is outside the normal range, but the abnormal flag is blank.

Vitals:
Diastolic BP > Systolic BP.

Physical Exam:
Visit date is prior to the screening date.
Physical exam is normal, but a comment is included.
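The following Python sketch implements several of these cross-field checks. The record layouts and key names (AESTDT, AEENDT, SYSBP, DIABP, LBORRES, LBNRLO, LBNRHI, ABNFL) are hypothetical and chosen only for illustration. In practice each check would be written against the study's own datasets.

```python
from datetime import date


def edit_checks(ae_records, vital_records, lab_records):
    """Run a few cross-field edit checks and return query messages.

    Key names are hypothetical: AESTDT/AEENDT (AE start/stop dates), SYSBP/DIABP
    (blood pressure), LBORRES with LBNRLO/LBNRHI (result and normal range) and
    ABNFL (abnormal flag: '' blank, 'N' normal, anything else abnormal).
    """
    queries = []
    for r in ae_records:
        # AE: stop date before start date (checked only when both are present).
        if r["AESTDT"] and r["AEENDT"] and r["AEENDT"] < r["AESTDT"]:
            queries.append(f"{r['SUBJID']}: AE stop date before start date")
    for r in vital_records:
        # Vitals: diastolic BP greater than systolic BP.
        if r["DIABP"] is not None and r["SYSBP"] is not None and r["DIABP"] > r["SYSBP"]:
            queries.append(f"{r['SUBJID']}: diastolic BP > systolic BP")
    for r in lab_records:
        in_range = r["LBNRLO"] <= r["LBORRES"] <= r["LBNRHI"]
        # Labs: abnormal flag inconsistent with the normal range.
        if in_range and r["ABNFL"] not in ("", "N"):
            queries.append(f"{r['SUBJID']}: result in range but abnormal flag set")
        if not in_range and r["ABNFL"] == "":
            queries.append(f"{r['SUBJID']}: result out of range but abnormal flag blank")
    return queries


ae = [{"SUBJID": "001", "AESTDT": date(2020, 1, 10), "AEENDT": date(2020, 1, 5)}]
vs = [{"SUBJID": "003", "SYSBP": 80, "DIABP": 120}]
lb = [{"SUBJID": "006", "LBORRES": 7.0, "LBNRLO": 3.5, "LBNRHI": 5.5, "ABNFL": ""}]
for query in edit_checks(ae, vs, lb):
    print(query)
```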

40. What are the advantages of using a SAS-based system for clinical data management?

ADVANTAGES OF USING A SAS®-BASED SYSTEM

A typical SAS®-based system can use a standard file server to store its databases. It does not require one or more dedicated servers to handle the application load. PC SAS® can easily handle processing, while data access is left to the file server. Additionally, it is possible to use the SAS® product SAS®/Share to provide a dedicated server to handle data transactions.
Systems that use complicated database software often require hiring one or more DBAs (Database Administrators). DBAs make sure the database software is running, make changes to the structure of the database, etc. These individuals often need special training or background experience in the particular database application being used, typically Oracle. Additionally, consultants are often required to set up the system and/or studies, since dedicated servers and specific expertise requirements often complicate the process.

Users with even casual SAS® experience can set up studies. Novice programmers can build the structure of the database and design screens. Organizations involved in data management almost always have at least one SAS® programmer on staff already. SAS® programmers understand how the system actually works. This allows them to extend its functionality by accessing SAS® data directly from outside the system.

Speed of setup is dramatically improved. Studies are kept on a local file server, and the database and screen design processes are simple and intuitive. As a result, setup time is reduced from weeks to days.

All phases of the data management process become homogeneous. From entry to analysis, data reside in SAS® datasets, often the end goal of every data management group. Additionally, SAS® users are involved in each step, instead of specialists from different areas handing off pieces of studies during the project life cycle.

No data conversion is required. Since the data reside natively in SAS® datasets, no conversion programs need to be written.

Data review can happen during the data entry process, on the master database. As long as records are marked as double-keyed, data review personnel can run edit check programs and build queries on some patients while others are still being entered.

Tables and listings can be generated on live data. This speeds up the development of table and listing programs. It also saves programmers from making continual copies or extracts of the data during testing.

43. Have you ever had to follow SOPs or programming guidelines?

An SOP describes the process that ensures standard coding activities are conducted in accordance with industry standards and are appropriately documented. These activities produce tables, listings and graphs, functions and/or edit checks. An SOP is normally used whenever new programs are required or existing programs require modification during the set-up, conduct and/or reporting of clinical trial data.

44. Describe the types of SAS programming tasks that you performed: Tables? Listings? Graphics? Ad hoc reports? Other?

Prepared the programs required for the ISS and ISE analysis reports. Developed and validated programs for preparing ad hoc statistical reports for the clinical study report. Wrote analysis programs in line with the specifications defined by the study statistician. Base SAS procedures (MEANS, FREQ, SUMMARY, TABULATE, REPORT, UNIVARIATE, etc.) and SAS/STAT procedures (REG, GLM, ANOVA, etc.) were used for summarization, cross-tabulations and statistical analysis. Created statistical reports using PROC REPORT, DATA _NULL_ and SAS macros. Created, derived, merged and pooled datasets, listings and summary tables for Phase I and Phase II clinical trials.

45. Have you been involved in editing the data or writing data queries?

If your interviewer asks this question, you should ask what is meant by editing the data and data queries.
52. What is hardcoding, and why should it be avoided?

Programmers sometimes hardcode values when they need to produce a report urgently. However, it is always better to avoid hardcoding, because it overrides the database controls in clinical data management. Data often change over the course of a trial, and a hardcode written today may not be valid in the future. Unfortunately, a hardcode may be forgotten and left in the SAS program, which can lead to an incorrect database change.

          

53. What do you need before writing a test plan?

Before writing a test plan, you have to look at the functional specifications. The functional specifications themselves depend on the requirements. You therefore need a clear understanding of both the requirements and the functional specifications to write a test plan.

54. What is the difference between verification and validation?

Verification and validation are close in meaning. "Verification" has more of a sense of testing the truth or accuracy of a statement by examining evidence or conducting experiments. "Validation" has more of a sense of declaring a statement to be true and marking it with an indication of official sanction.

55. How do you debug SAS programs?

Conditional statements (IF-THEN/ELSE).
PUT statement.
DEBUG option of the DATA statement.

56. What is PROC CDISC?

It is a SAS procedure that is available as a hotfix for SAS 8.2 and comes as part of SAS 9.1.3. PROC CDISC allows us to import (and export) XML files that are compliant with the CDISC ODM version 1.2 schema. For more details, refer to the book SAS Programming in the Pharmaceutical Industry.
57. What is LOCF?

Pharmaceutical companies conduct longitudinal studies on human subjects that often span several months. It is unrealistic to expect patients to keep every scheduled visit over such a long period. Despite every effort, patient data are not collected at some time points, and these eventually become missing values in the SAS dataset. For reporting purposes, the most recent previously available value is substituted for each missing visit. This is called Last Observation Carried Forward (LOCF). LOCF does not mean that the last observation in the SAS dataset is carried forward. It means that the last non-missing value is carried forward. The values of the individual measures are the "observations" in this case. If you have multiple variables containing these values, they are carried forward independently.
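The following Python sketch implements LOCF as described above. Values are carried forward per subject and per variable, independently, and never across subjects. It assumes the records are already sorted by subject and visit, with None marking a missing value. The variable names are illustrative.

```python
def locf(visits, variables):
    """Last Observation Carried Forward, per subject and per variable.

    `visits` must already be sorted by SUBJID and visit; None marks a missing value.
    Each variable is carried forward independently, and values never cross subjects.
    """
    result = []
    last = {}
    current_subject = None
    for row in visits:
        if row["SUBJID"] != current_subject:
            current_subject = row["SUBJID"]
            last = {}  # start fresh for each subject
        filled = dict(row)
        for var in variables:
            if filled[var] is None:
                filled[var] = last.get(var)  # stays None if nothing to carry forward
            else:
                last[var] = filled[var]
        result.append(filled)
    return result


bp = [
    {"SUBJID": "001", "VISIT": 1, "SBP": 120, "DBP": 80},
    {"SUBJID": "001", "VISIT": 2, "SBP": None, "DBP": 78},
    {"SUBJID": "001", "VISIT": 3, "SBP": None, "DBP": None},
    {"SUBJID": "002", "VISIT": 1, "SBP": None, "DBP": 85},
]
for row in locf(bp, ["SBP", "DBP"]):
    print(row)
```

Subject 002 keeps a missing SBP at visit 1 because there is no earlier value of its own to carry forward.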

58. What is ETL?

ETL stands for Extract, Transform and Load.

Extract:

The first part of an ETL process is to extract the data from the source systems. Most data warehousing projects consolidate data from different source systems.

Each separate system may also use a different data organization or format. Common data source formats are relational databases and flat files, but sources may also include non-relational database structures such as IMS or other data structures such as VSAM or ISAM.

Extraction converts the data into a format for transformation processing. An intrinsic part of extraction is parsing the extracted data, which checks whether the data meets an expected pattern.

Transform: The transform stage applies a series of rules or functions to the extracted data from the source to derive the data to be loaded into the end target. Some data sources require very little or even no manipulation of data. In other cases, one or more of the following transformation types may be required to meet the business and technical needs of the end target:

· Selecting only certain columns to load (or selecting null columns not to load)
· Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female); this is called automated data cleansing, and no manual cleansing occurs during ETL
· Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to M)
· Joining together data from multiple sources (e.g., lookup, merge, etc.)
· Generating surrogate key values
· Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
· Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string in one column as individual values in different columns)
· Applying any form of simple or complex data validation; depending on the rule design and exception handling, a failed check leads to full, partial or no rejection of the data, so that none, part or all of the data is handed over to the next step

Most of the above transformations might themselves result in an exception, e.g., when a code translation parses an unknown code in the extracted data.

Load: The load phase loads the data into the end target, usually the data warehouse (DW).
Depending on the requirements of the organization, this process varies widely. Some data warehouses might overwrite existing information weekly with cumulative, updated data. Other DWs (or even other parts of the same DW) might add new data in a historized form, e.g., hourly. The timing and scope to replace or append are strategic design choices dependent on the time available and the business needs. More complex systems can maintain a history and audit trail of all changes to the data loaded in the DW.

As the load phase interacts with a database, the constraints defined in the database schema, as well as in triggers activated upon data load, apply (e.g., uniqueness, referential integrity, mandatory fields). These constraints also contribute to the overall data quality performance of the ETL process.
Source: Wikipedia
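The following toy ETL sketch in Python ties the three stages together. It extracts rows from CSV text and translates the source sex codes 1/2 into M/F (the example from the transform list). Rows with unknown codes are rejected. The result is loaded into a dictionary that enforces a uniqueness constraint on a hypothetical patient_id key. It is an illustration only, not a data warehouse design.

```python
import csv
import io

# Source code -> warehouse code, as in the "translating coded values" example.
SEX_CODES = {"1": "M", "2": "F"}


def extract(text):
    """Extract: parse flat-file (CSV) text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: translate sex codes; rows with an unknown code are rejected."""
    accepted, rejected = [], []
    for row in rows:
        code = row["sex"].strip()
        if code in SEX_CODES:
            accepted.append({"patient_id": row["patient_id"].strip(), "sex": SEX_CODES[code]})
        else:
            rejected.append(row)  # the code translation met an unknown code
    return accepted, rejected


def load(target, rows):
    """Load: add rows to the target, enforcing uniqueness of patient_id."""
    for row in rows:
        if row["patient_id"] in target:
            raise ValueError(f"duplicate key: {row['patient_id']}")
        target[row["patient_id"]] = row
    return target


source_text = "patient_id,sex\n001,1\n002,2\n003,9\n"
accepted, rejected = transform(extract(source_text))
warehouse = load({}, accepted)
print(warehouse)
print("rejected:", rejected)
```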
