Slide1: Access to Confidential Data for Statistical Analysis Kenneth Harris, Director of Research Data Center
National Center for Health Statistics (NCHS): National Center for Health Statistics (NCHS) Despite the wide dissemination of its data through publications, CD-ROMs, etc., the inability to release files with, for instance, lower levels of geography, severely limits the utility of some data for research, policy, and programmatic purposes and sets a boundary on one of the Center’s goals to increase its capacity to provide state and local area estimates.
NCHS (cont.): NCHS (cont.) In pursuit of this goal and in response to the research community’s interest in restricted data, NCHS established the Research Data Center (RDC), a mechanism whereby researchers can access detailed data files in a secure environment, without jeopardizing the confidentiality of the respondents.
Research Data Center: Research Data Center The NCHS Research Data Center, established in 1998, is a facility at the NCHS headquarters in Hyattsville, Maryland, where researchers are granted access to restricted data files needed to complete approved projects. Restricted data files may contain information, such as lower levels of geography, but do not contain direct identifiers (e.g., name or social security number).
Data Restrictions: Data Restrictions Section 308 (d) of the Public Health Service Act and the NCHS Staff Confidentiality Manual do not permit the release of data that are either identified or identifiable to persons outside of NCHS.
Data Restrictions (cont.): Data Restrictions (cont.) Identifiable data include not only direct identifiers such as name, social security number, etc., but also data that can serve to allow inferential identification of either individual or institutional respondents by a number of means.
Data Restrictions (cont.): Data Restrictions (cont.) Research indicates that identifiability is greatly enhanced if geographic identifiers for state, county, census tract, block-group or block are released on public use files.
Key Issues for Research Data Availability: Key Issues for Research Data Availability CONFIDENTIALITY
The dissemination of data in a manner that would allow public identification of the respondent or would in any way be harmful to him/her is prohibited and the data are immune from legal process.
Key Issues for Research Data Availability (cont.): Key Issues for Research Data Availability (cont.) DISCLOSURE
Disclosure relates to inappropriate attribution of information to a data subject, whether an individual or an organization. Disclosure occurs when a data subject is identified from a released file (identity disclosure), sensitive information about a data subject is revealed through the released file (attribute disclosure), or the released data make it possible to determine the value of some characteristic of an individual more accurately than otherwise would have been possible (inferential disclosure).
Appendix I – Rules for the Release of Micro Data Files: Appendix I – Rules for the Release of Micro Data Files The data file must not contain any detailed information about the subject that could facilitate identification and that is not essential for research purposes (e.g., exact date of the subject’s birth).
Geographic places that have fewer than 100,000 people are not to be identified on the data file.
Characteristics of an area are not to appear on the data file if they would uniquely identify an area of less than 100,000 people.
Appendix I – Rules for the Release of Micro Data Files (cont.): Appendix I – Rules for the Release of Micro Data Files (cont.) Information on the drawing of the sample which might assist in identifying a data subject must not be released outside the Center. Thus, the identities of primary sampling units are not to be made available outside the Center.
Before any new or revised micro data files are published, they, together with their full documentation, must be approved for publication by the NCHS Director or Deputy Director.
A micro data file containing confidential data on unidentified individuals or facilities may not be released to any person or organization outside NCHS until that person, or a responsible representative of that organization, has first signed the statement on the Order Form, whereby he gives assurance that the data provided will be used only for statistical reporting or research purposes.
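As a rough illustration of the population rule above, the following Python sketch blanks out place codes for areas below 100,000 people. Only the 100,000 threshold comes from the rules; the record layout, the `place` field, the `populations` lookup, and the function name `mask_small_places` are hypothetical.

```python
MIN_POPULATION = 100_000  # threshold stated in the release rules


def mask_small_places(records, populations, min_population=MIN_POPULATION):
    """Return copies of `records` with `place` blanked for small places.

    `records` is a list of dicts with a hypothetical `place` key.
    `populations` maps a place code to its population (hypothetical lookup).
    """
    masked = []
    for rec in records:
        out = dict(rec)  # copy, so the input records are left untouched
        # Places missing from the lookup are treated as small (conservative assumption).
        if populations.get(out.get("place"), 0) < min_population:
            out["place"] = None
        masked.append(out)
    return masked
```

The population figure is kept in a separate lookup rather than on the record, since an area characteristic that uniquely identifies a small area is itself not to appear on the file.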
Why NCHS Does Not Release Files With Lower Levels of Geography: Why NCHS Does Not Release Files With Lower Levels of Geography Research suggests that in the case of personal surveys nine commonly collected variables result in the table below.
Why NCHS Does Not Release Files With Lower Levels of Geography (cont.): Why NCHS Does Not Release Files With Lower Levels of Geography (cont.)
Notes: A geopolitical area may be a county, city, town, or other place with well-defined boundaries.
In this case, identification refers to certainty identification.
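One common reading of certainty identification is that no other record in the area shares the same combination of values. A minimal Python sketch under that assumption counts such unique records; the field names and the function `count_unique_records` are hypothetical and not taken from any NCHS file.

```python
from collections import Counter


def count_unique_records(records, keys):
    """Count records whose combination of values for `keys` occurs exactly once."""
    combos = Counter(tuple(rec[k] for k in keys) for rec in records)
    return sum(1 for n in combos.values() if n == 1)
```

Adding variables to `keys` can only keep or increase the number of unique records, which is why a handful of commonly collected variables combined with fine geography raises identifiability.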
How Does RDC Operate?: How Does RDC Operate?
On-Site Access
Remote Access
Staff Assisted Analytical Session
User Procedures: User Procedures To gain access to NCHS restricted data through any access method, a user must:
Submit a research proposal.
An advisory and proposal review committee receives, reviews, and approves researcher proposals.
Proposals are evaluated primarily on the confidentiality disclosure risk.
Scientific merit is not an evaluation criterion.
Sign an affidavit of confidentiality and promise not to use any method to attempt to identify respondents.
User Procedures (cont.): User Procedures (cont.) Not take any materials or equipment into RDC unless approved by RDC staff.
Submit data files to be merged onto NCHS data ahead of time – all merging is done by RDC staff.
Subject all output and/or materials removed from the RDC to a disclosure limitation review.
Not remove any NCHS restricted data files or linked data files.
Researcher Affidavit of Confidentiality: Researcher Affidavit of Confidentiality I certify that no confidential data or information viewed or otherwise obtained while I am a researcher in the National Center for Health Statistics (NCHS), Research Data Center (RDC) will be removed from NCHS. Further, I understand that NCHS will perform a disclosure review and must provide approval to me before I remove any data from the RDC, whether it be in electronic or paper form. I acknowledge NCHS Confidentiality Statute, 308(d) of the Public Health Service Act stated below and fully understand my legal obligations to NCHS to protect all confidential data. Further I understand any violation I may perform is punishable under 18 United States Code (USC), 1001 which carries a fine of up to $10,000 or up to 5 years in prison.
Researcher Affidavit of Confidentiality (cont.): Researcher Affidavit of Confidentiality (cont.) NCHS 308(d) Confidentiality Statute - No information, if an establishment or person supplying the information or described in it is identified, obtained in the course of activities undertaken or supported under section 304, 305, 306, 307, or 309 may be used for any purpose other than the purpose for which it was supplied unless such establishment or person has consented to its use for such other purpose and in the case of information obtained in the course of health statistical or epidemiological activities under section 304 or 306, such information may not be published or released in other form if the particular establishment or person supplying the information or described in it is identifiable unless such establishment or person has consented to its publication or release in other form.
Researcher Affidavit of Confidentiality (cont.): Researcher Affidavit of Confidentiality (cont.)
18 United States Code, 1001 - Deliberately making a false statement in any matter within the jurisdiction of any Department or Agency of the Federal Government violates 18 USC 1001 and is punishable by a fine of up to $10,000 or up to 5 years in prison.
____________________ _______________ Researcher’s Signature Date
____________________ _______________
NCHS Witness Date
Can Researcher Merge his/her Data with NCHS?: Can Researcher Merge his/her Data with NCHS? Researchers must interact with RDC staff to ensure that their data can be merged with the NCHS data.
User-supplied data will be merged with NCHS data by RDC staff only.
The NCHS RDC policy states that merged and user-supplied data will not be made available for analysis to anyone without the written consent of the user.
The Cost per Project: The Cost per Project On Site
$200 per day (2 day minimum)
Remote Access
NSFG-CDF = $500/ year
NHIS-polio = $500/ year
NHIS Linked Mort. File = $250/Month
NHANES Linked Mort. File = $250/Month
The Cost per Project (cont.): The Cost per Project (cont.) Files 130k records = $1000 per month
Staff Assisted Variable
File Construction and Setup
For Mortality Files = $250 per day
For all Other Files = $500 per day
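The per-day charges above reduce to simple arithmetic. The Python sketch below uses the on-site and file-setup rates from the slides; the function names are hypothetical, and how the individual charges combine into a single project invoice is not stated in the slides, so each charge is computed separately.

```python
ON_SITE_RATE = 200      # dollars per day (from the slide)
ON_SITE_MIN_DAYS = 2    # two-day minimum (from the slide)
SETUP_RATE = {"mortality": 250, "other": 500}  # dollars per day of file construction and setup


def on_site_cost(days):
    """On-site charge, applying the two-day minimum."""
    return ON_SITE_RATE * max(days, ON_SITE_MIN_DAYS)


def setup_cost(days, file_type="other"):
    """File construction and setup charge for `days` of staff work."""
    return SETUP_RATE[file_type] * days
```

For example, a single on-site day is billed as two days, so `on_site_cost(1)` gives 400.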
Do Doctors perform “defensive Cesareans”?: Do Doctors perform “defensive Cesareans”? Overview: This project re-examined the issue of “defensive medicine” and the effect of state reforms designed to limit malpractice risk on the use of cesarean section delivery.
NCHS Data Used: National Hospital Discharge Survey (NHDS)
Years of Data Used: 1980 through 1992, inclusive.
User’s Data Merged with NCHS? Yes
Method of Access to NCHS Data: Remote and On-site Access
Statistical Software Used: SAS
Economic Model to Explain the Incidence of Sexual Activity, Contraceptive Use, STD, and Pregnancy Among Teenage Girls.: Economic Model to Explain the Incidence of Sexual Activity, Contraceptive Use, STD, and Pregnancy Among Teenage Girls. Overview: National Survey of Family Growth Data provide extensive socio-demographic information and reports of the sexual histories of these women. Researcher focused on the effects of a number of policies measured at the state-level. These included:
Parental notification or consent laws.
Medicaid funding of abortions.
Welfare generosity.
NCHS Data Used: National Survey of Family Growth (NSFG)
User’s Data Merged with NCHS? Yes
Method of Access to NCHS Data: Remote Access
Statistical Software Used: SAS
Nursing Home Admission and Payment Source?: Nursing Home Admission and Payment Source? Overview: This project tested if patients with Medicare were being discriminated against because their reimbursement rate was significantly below the private pay rate for nursing homes.
NCHS Data Used: National Nursing Home Survey (NNHS)
Years of Data Used: 1985, 1995, and 1997
User’s Data Merged with NCHS? No
Method of Access to NCHS Data: Remote Access
Statistical Software Used: SAS
Hardware and Software: Hardware and Software All RDC hardware and software are standard.
Hardware
Pentium IV computers with Windows 2000
Software
SAS (only language on ANDRE)
Sudaan
Fortran
HLM
Stata
Limdep
text editors/viewers
Onsite workstations do NOT have email or internet access
Only access to printer is through RDC staff
Record Linkage for Epidemiologic Research: Accessing Linked data at the NCHS Research Data Center: Record Linkage for Epidemiologic Research: Accessing Linked data at the NCHS Research Data Center Christine S. Cox
NCHS Data Users Conference
July 12, 2006
Slide28: What is Record Linkage? [Diagram: NCHS Surveys and Administrative records are combined into a Linked Data File]
NCHS Linked Data: Major Activities: NCHS Linked Data: Major Activities Mortality
National Death Index
Health Care Utilization and Costs
Medicare Data
Retirement and Disability
Social Security Data
NCHS Linked Data: Mortality: NCHS Linked Data: Mortality
Eligibility status
Assigned vital status
Date of death
Age at death
Underlying and multiple causes of death
Adjusted sample weights
Research Potential of Linked Mortality Data: Research Potential of Linked Mortality Data
Living and Dying in the USA: Behavioral, Health, and Social Differentials of Adult Mortality. RG Rogers, CB Nam, RA Hummer.
A Semiparametric Analysis of the Body Mass Index’s Relationship to Mortality. JT Gronniger.
The Income-Associated Burden of Disease in the United States. P Muennig, P Franks, H Jia, E Lubetkin and MR Gold.
Excess Deaths Associated with Underweight, Overweight, and Obesity. KM Flegal, BI Graubard, DF Williamson, MH Gail. JAMA. 2005;293:1861-1867.
NCHS Linked Data: Medicare: NCHS Linked Data: Medicare Medicare entitlement and health care utilization and payment data for 1991-2000
Denominator file
MEDPAR Inpatient hospitalization
MEDPAR Skilled nursing facility
Hospital outpatient
Home Health Care
Hospice
Carrier (physician/supplier Part B file)
Durable Medical Equipment
Research Potential of Linked Medicare Data: Research Potential of Linked Medicare Data Examine risk factors for health conditions
Examine reliability of survey data
Compare survey reports of disability with program participation eligibility criteria
Compare survey reported health conditions to claims records
Examine disparities in Medicare service utilization
NCHS Linked Data: Retirement/Disability: NCHS Linked Data: Retirement/Disability Social Security data from Retirement, Survivors, and Disability Insurance (RSDI) and Supplemental Security Income (SSI) programs
Master Beneficiary Record (MBR), 1962-2003
Payment History Update System (PHUS), 1984-2003
Supplemental Security Record (SSR), 1974-2003
Research Potential of Linked Social Security Data: Research Potential of Linked Social Security Data Examine reliability of survey information for SSA program participation and benefits
Compare the health characteristics of those who take early (age 62) Social Security benefits to those who postpone benefits
Policy analysis using validated survey data
Predicting the number of people who will become disabled based upon survey reported health conditions
Determining whether current disability entitlement funding levels will be adequate as the population ages
Summary NCHS Data Linkage: Summary NCHS Data Linkage
Slide37: www.cdc.gov/nchs/r&d/nchs_datalinkage/data_linkage_activities.htm
Why can’t you just give me the data?: Why can’t you just give me the data? NCHS does not “own” the linked administrative data
NCHS data confidentiality rules prohibit the release of potentially identifiable data – special considerations concerning the protection of linked data
The RDC is the only option for access for now….
Overview: Data Access Procedures: Overview: Data Access Procedures Proposal Requirements
Access Methods
Helpful Tips
Where to get help?
Proposal Requirements: Proposal Requirements Proposal is evaluated by review committee
Review criteria
Scientific and technical feasibility
Availability of RDC resources
Disclosure risk for restricted information
The extent to which project is in accordance with the mission of NCHS
Special note: NCHS does not try to determine if proposals are duplicative
Proposal Requirements: Proposal Requirements Cover letter
Project title
Abstract (maximum 300 words summarizing project)
Full contact information
Institutional affiliation
Mail address, phone, email
Dates of proposed time at RDC (or indication of using remote access)
Source of funding for proposed research
Proposal Requirements: Proposal Requirements Study background
Key study questions or hypotheses
Public health benefits
Methods
Analytic approach and statistical methods
Statistical software requirements
Description of intended output for nondisclosure review, e.g.
Table shells
Model equations
Test statistics that researcher plans to remove from RDC
Proposal Requirements: Proposal Requirements Explanation of why restricted data are needed, e.g. describe why publicly available data are insufficient
Summary of data requirements to be included in analytic file
Identification of sample
Identification of variables
Description of additional data to be supplied by researcher to be merged with NCHS or other data source (must clearly identify source of other data)
Proposal Requirements: Appendices: Proposal Requirements: Appendices Current Curriculum Vitae or resume for each investigator
Data dictionary – complete listing of the specific data requested and their source(s), indicating whether each variable is public use or restricted access
specific files and years
sample
variables (dependent, independent, matching/linking)
Proposal Requirements: Appendices: Proposal Requirements: Appendices For remote-access applicants
Description of the computer and email system to be used to receive output
Security provisions for the computer and email systems
For students
Letter from department chair or academic advisor stating that student is working under the direction of the department
Overview: RDC Data Access Procedures: Overview: RDC Data Access Procedures Proposal Requirements
Access Methods
Helpful Tips
Where to get help?
Access Methods: Access Methods Once approved, three methods to access restricted data
on-site - use local computing resources in the NCHS RDC, Hyattsville, MD
remote – submit programs electronically to be executed in the RDC with output returned by email
staff assisted – RDC staff provide on-site programming for off-site approved researchers
For all methods of access, restricted data files remain in RDC and output is inspected for disclosure violations
On-Site Access: On-Site Access RDC staff constructs necessary data files, including merged user data
Most statistical packages available with sufficient lead time
Output subject to disclosure review
Open only during normal working hours
Remote Access Method: Remote Access Method RDC staff constructs necessary data files, including merged user data
SAS programs only (certain procedures and functions not allowed) – additional software options expected
Both submitted programs and output undergo a programmed disclosure limitation review
RDC Staff-assisted Programming Method: RDC Staff-assisted Programming Method Subcontract with the RDC staff to perform programming tasks
Useful for those planning to use statistical software not available for the remote system and who are not able to travel to the RDC facility
Cost is estimated for each research project
Overview: RDC Data Access Procedures: Overview: RDC Data Access Procedures Proposal Requirements
Access Methods
Helpful Tips
Where to get help?
RDC Helpful Tips: RDC Helpful Tips Be clear about research and data requirements (helps to determine feasibility of project)
Clearly identify the sample to be included in the analytic file
Provide data dictionaries for both
Public use data
Restricted data
Provide examples of expected output
Overview: RDC Data Access Procedures: Overview: RDC Data Access Procedures Proposal Requirements
Access Methods
Helpful Tips
Where to get help?
Slide54: Visit the RDC at:
www.cdc.gov/nchs/r&d/rdc.htm or email: rdca@cdc.gov
Slide55: LINKED DATA, CONTEXTUAL DATA, and GEO-CODING ON-SITE and STAFF-ASSISTED DATA ACCESS Christopher Rogers
Research Data Center
cor2@cdc.gov
Why Link Data Sets?: Why Link Data Sets? Improve modeling and make use of existing data.
Compensate for increased difficulties taking surveys.
Open your mind.
Common Example:
Economic variables versus Ethnic variables
Historical Trends: Historical Trends More linking of scientific data sets between government agencies. Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA).
Confused political and social situation in US.
Quality NCHS Resources: Quality NCHS Resources Linked Birth and Infant Death Data with Fetal Death Data.
Geo-coded NHIS 1986-2003 (2004-2005).
Geo-coded NHANES III.
Cycles 4, 5, and 6 NSFG Contextual Data.
Linked Data Sets described earlier.
Linked Birth and Infant Death: Linked Birth and Infant Death Designed to study factors in infant death.
Links birth and death certificates for deaths under one year of age. Includes fetal deaths for 1995-1997
Years: 1983-1991 and 1995-1997
Numerator File (for deceased children): Parental information and behavior, prenatal care, infant health variables, demographics, cause of death.
Denominator File (for control group): Parental information and behavior, prenatal health, infant health, demographics.
Fetal Death Data: 1995-1997
Restricted Data: County/City of mother’s residence or County of child’s birth or death when the population is under 250,000 (under 100,000 starting in 1989).
Data Example: Data Example From the Division of Vital Statistics. Proposals or questions can go either to the RDC or the DVS.
Fetal Death Data portion. Given 1989-1999.
Linked to county level contextual data.
Goal to model fetal death with emphasis on ground water quality. Estimates death rates for each county.
Geo-Coded NHIS: Geo-Coded NHIS National Health Interview Survey. RDC has access to files from 1963 to present. Previously geo-coded households for 1986-1994. Recently geo-coded by RDC from 1995-2003. 2004-2005 coding in progress.
State (2 digits), County (3 digits), Tract (6 digits), Block Group (1 digit), and Block (3-4 digits) levels. Households coded to 1990 and 2000 Censuses.
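To show how the geographic levels nest, here is a small Python sketch that splits a geocode into the components listed above. It assumes the fields are stored as one concatenated string in the order state, county, tract, block group, block; that storage layout and the function name `split_geocode` are assumptions for illustration, not a description of the RDC files.

```python
# Fixed-width fields in the order given on the slide; the block (3-4 characters) comes last.
FIELD_WIDTHS = [("state", 2), ("county", 3), ("tract", 6), ("block_group", 1)]
FIXED_LENGTH = sum(width for _, width in FIELD_WIDTHS)  # 12 characters


def split_geocode(code):
    """Split a concatenated geocode string into its named components."""
    block_length = len(code) - FIXED_LENGTH
    if block_length not in (3, 4):
        raise ValueError(f"unexpected geocode length: {len(code)}")
    parts = {}
    pos = 0
    for name, width in FIELD_WIDTHS:
        parts[name] = code[pos:pos + width]
        pos += width
    parts["block"] = code[pos:]  # remaining 3 or 4 characters
    return parts
```

Keeping the components as strings preserves leading zeros, which matter when codes are later matched against census files.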
Geo-Coded NHANES III: Geo-Coded NHANES III NHANES III is also linked to NDI Mortality data.
NHANES III has been geo-coded twice. The RDC has done it at the same level of detail as NHIS.
Continuous NHANES has not been geo-coded yet.
Example: Large project with neighborhood, economic, ethnic, and individual medical and behavioral variables. Multi-level models.
NSFG Contextual Data: NSFG Contextual Data Contextual variables available with Cycles 4, 5, and 6. Supplied for each individual in sample.
Cycle 6: 1054 contextual variables at the state, county, tract, and block group levels. For respondent addresses in 2000 and 2002.
Contextual data include both economic and demographic characteristics of locations. Easily merged by case ID to individual characteristics, behaviors, and histories.
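A merge by case ID amounts to a keyed left join. The Python sketch below illustrates the idea with plain dictionaries; the key name `case_id`, the example fields, and the function `merge_by_case_id` are hypothetical.

```python
def merge_by_case_id(individuals, contextual, key="case_id"):
    """Left-join contextual variables onto individual records by case ID."""
    by_id = {row[key]: row for row in contextual}
    merged = []
    for person in individuals:
        combined = dict(person)  # copy the individual record
        combined.update(by_id.get(person[key], {}))  # add contextual fields if present
        merged.append(combined)
    return merged
```

Individuals without a contextual record are kept unchanged. Within the RDC, merges involving user-supplied data are carried out by RDC staff.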
Simple NSFG Example: Simple NSFG Example A simple example relating economics on state level, ethnicity, and behavior, but not using contextual variables.
Treatment States given waiver to offer more family planning services (FPS).
Questions:
FPS effects on behavior
FPS effect on pregnancy rates
Differential impacts across demographic subgroups?
Change of Topic: Accessing Data: Change of Topic: Accessing Data On-site access to data at the RDC in Hyattsville.
Staff-assisted remote access to data via e-mail.
Researchers often use both types of access.
Potential Designated Agent status. (CIPSEA)
The RDC has put many resources into automated remote access.
On-Site Access: On-Site Access Rules in 24 page file GuidelinesRDC11-8-05.pdf available on-line.
The RDC and NCHS surveys have knowledgeable professional staffs that review proposals carefully. Clients can only remove what has been approved. Checked by staff.
Exploratory Data Analysis. If needed, ask. Recent example: Checking general shapes of variables for model validity. OKed by survey.
Modeling needs. Recent example: Nested randomized geo-codes.
Estimation problems. Example: Single PSU in a Stratum.
Staff-Assisted Remote Access: Staff-Assisted Remote Access Analysis done through a particular staff member. Usually efficient, but could be very busy.
Staff member determines costs based on time.
Staff usually not asked to do much programming.
Staff creates data, runs e-mailed programs, checks, and returns output to researcher.
Staff can do exploratory analysis, if needed.
Staff can help check modeling problems.
Commonly done after on-site visit.
Our Mission: Our Mission The RDC has a professional staff dedicated to helping researchers uncover knowledge and advance understanding.
Slide69: Remote Access System Vijay Gambhir
Remote Access System: Remote Access System Envisioned as an integral Part of RDC
Pre – onsite usage
Post – onsite usage
Super store/ Convenience store
Basics of Remote Access System: Basics of Remote Access System Object oriented, event driven system based upon the principles of distributed computing
About two years of development efforts
Set of applications called in service by resident component
Advanced pattern recognition techniques
Analytic Data Research by Email (ANDRE): Analytic Data Research by Email (ANDRE) NCHS has been providing remote data access to researchers through ANDRE since April 1998.
In the past five years, ANDRE has served 45 different data analysts and executed over 9,500 SAS programs for their research programs.
Main Features of ANDRE: Main Features of ANDRE Completely automated system
Operates round the clock without any human intervention
Registered subscribers only
Proposals already reviewed and approved
Have an agreement with NCHS/RDC
Unlimited Access during the subscription period
Data Requests: Data Requests Registered user can submit data requests by email from anywhere and at any time.
Results of the data request released to a specified email address that has been certified as secure by the subscriber and approved by NCHS/RDC.
Authentication: Authentication Multi-levels of system security:
Submission syntax
User id
Password
Email/code word
Package
Path info
Data Request Analysis: Data Request Analysis Compliance with the disclosure limitation constraints of NCHS
Integrity of the system
Resource constraints (CPU time & Storage requirements)
Protection of ANDRE’s work environment
Prevention of Direct Disclosure: Prevention of Direct Disclosure Cleaning up of the Log File
Categorization of SAS commands/words
Forbidden Commands
Modifications to the Commands
Output suppression
Sample: Original Log: Sample: Original Log
1 options nocenter;
2 Data one;
3 Infile 'd:\nchs\respnd95.dat' lrecl=13064;
4 Input
5 TODAYSPG 6847-6847
6 CONSTAT1 11934-11935
7 CONSTAT2 11936-11937
8 CONSTAT3 11938-11939
9 CONSTAT4 11940-11941
10 SEX1MTHD 11945-11946
11 POST_WT 12350-12359;
12 if constat1 = 'ab' then vjvar=1; else vjvar = 2;
13 WGT1000=POST_WT/1000;
14 title 'NSFG cycle 1995';
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
12:15
NOTE: The infile 'd:\nchs\respnd95.dat' is:
File Name=d:\nchs\respnd95.dat,
RECFM=V,LRECL=13064
NOTE: Invalid numeric data, 'ab' , at line 12 column 15.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 1000000111260837511521 1 1050 12 106921124112411189
101 2
201 19211059110611197
……
Sample: Original Log (cont.): Sample: Original Log (cont.)
……
12901 11232521101 05267213103033921811931011103 01030000000321120000392702210611511200403 1344 1316
13001 622501001006034
TODAYSPG=1 CONSTAT1=5 CONSTAT2=88 CONSTAT3=88 CONSTAT4=88 SEX1MTHD=1 POST_WT=2545.7569 vjvar=2 WGT1000=2.5457569 _ERROR_=1
_N_=20
NOTE: 10847 records were read from the infile 'd:\nchs\respnd95.dat'.
The minimum record length was 13064.
The maximum record length was 13064.
NOTE: The data set WORK.ONE has 10847 observations and 9 variables.
NOTE: DATA statement used:
real time 39.88 seconds
cpu time 12.10 seconds
15 proc freq;
16 tables CONSTAT1 vjvar;
17 run;
NOTE: There were 10847 observations read from the data set WORK.ONE.
NOTE: PROCEDURE FREQ used:
real time 0.49 seconds
cpu time 0.04 seconds
Sample: Cleaned Log: Sample: Cleaned Log
1 options nocenter;
2 Data one;
3 Infile 'd:\nchs\respnd95.dat' lrecl=13064;
4 Input
5 TODAYSPG 6847-6847
6 CONSTAT1 11934-11935
7 CONSTAT2 11936-11937
8 CONSTAT3 11938-11939
9 CONSTAT4 11940-11941
10 SEX1MTHD 11945-11946
11 POST_WT 12350-12359;
12 if constat1 = 'ab' then vjvar=1; else vjvar = 2;
13 WGT1000=POST_WT/1000;
14 title 'NSFG cycle 1995';
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
12:15
NOTE: The infile 'd:\nchs\respnd95.dat' is:
File Name=d:\nchs\respnd95.dat,
RECFM=V,LRECL=13064
NOTE: Invalid numeric data, 'ab' , at line 12 column 15.
Sample: Cleaned Log (cont.): Sample: Cleaned Log (cont.)
NOTE: 10847 records were read from the infile 'd:\nchs\respnd95.dat'.
The minimum record length was 13064.
The maximum record length was 13064.
NOTE: The data set WORK.ONE has 10847 observations and 9 variables.
NOTE: DATA statement used:
real time 39.88 seconds
cpu time 12.10 seconds
15 proc freq;
16 tables CONSTAT1 vjvar;
17 run;
NOTE: There were 10847 observations read from the data set WORK.ONE.
NOTE: PROCEDURE FREQ used:
real time 0.49 seconds
cpu time 0.04 seconds
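Comparing the two logs, the cleaned version drops everything from the `RULE:` line through the variable dump ending in `_N_=20`, because those lines echo raw record contents. The Python sketch below reproduces that effect with a simple heuristic. It is not ANDRE's actual algorithm, which the slides do not describe, and the function name `clean_sas_log` is hypothetical.

```python
def clean_sas_log(log_text):
    """Drop raw-record dumps from a SAS log.

    Heuristic (an assumption based on the sample above): skip every line from
    one starting with "RULE:" up to, but not including, the next line starting
    with "NOTE:".
    """
    kept = []
    skipping = False
    for line in log_text.splitlines():
        if line.startswith("RULE:"):
            skipping = True       # start of an echoed data dump
        elif line.startswith("NOTE:"):
            skipping = False      # ordinary log messages resume
        if not skipping:
            kept.append(line)
    return "\n".join(kept)
```

A production filter would need to handle dumps that are not introduced by a `RULE:` line; this sketch covers only the pattern shown in the sample.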
Forbidden Commands: Forbidden Commands Commands That Pose Unacceptable Disclosure Risks
OR
Disallowed to Protect Integrity/Internal Environment of ANDRE
Add firstobs report iml
Print first. Pctn nofreq
Obs last. Pctsum nocum
Firstobs nocol tabulate editor
Browse summary list put
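A keyword screen of this kind can be sketched in a few lines of Python. The keyword set is transcribed from the slide above; the tokenizing rule and the function name `find_forbidden` are assumptions, since the slides do not say how ANDRE matches commands.

```python
import re

# Keywords transcribed from the slide (lower-cased).
FORBIDDEN = {
    "add", "firstobs", "report", "iml", "print", "first.", "pctn", "nofreq",
    "obs", "last.", "pctsum", "nocum", "nocol", "tabulate", "editor",
    "browse", "summary", "list", "put",
}

# A word, optionally followed by a period so that "first." and "last." are caught.
_TOKEN = re.compile(r"[a-z_][a-z0-9_]*\.?")


def find_forbidden(program):
    """Return the sorted forbidden keywords that appear as whole words in a SAS program."""
    found = set()
    for token in _TOKEN.findall(program.lower()):
        if token in FORBIDDEN:
            found.add(token)
        elif token.rstrip(".") in FORBIDDEN:
            found.add(token.rstrip("."))
    return sorted(found)
```

Matching whole tokens means `input` is not mistaken for `put`. The sketch does not strip comments or quoted strings, so it may flag words that appear only inside a title.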
Commands Modification: Commands Modification Modify user’s program to enforce restrictions on options allowed with certain SAS procedures to prevent objectionable info appearing in the output
PROC MEANS n mean std;
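One way to picture this modification is a rewrite that replaces whatever options follow `PROC MEANS` with the permitted statistics `n mean std`. The Python sketch below does that with regular expressions; keeping a simple `data=` option is an assumption, and the function `restrict_proc_means` is hypothetical rather than ANDRE's own code.

```python
import re

ALLOWED_STATS = "n mean std"  # statistics shown on the slide

_PROC_MEANS = re.compile(r"proc\s+means\b([^;]*);", re.IGNORECASE)
_DATA_OPTION = re.compile(r"\bdata\s*=\s*[\w.]+", re.IGNORECASE)


def restrict_proc_means(program):
    """Rewrite every PROC MEANS statement to request only the allowed statistics."""
    def rewrite(match):
        parts = ["proc means"]
        data_option = _DATA_OPTION.search(match.group(1))
        if data_option:
            # Keep the input data set, normalising spaces around "=".
            parts.append(re.sub(r"\s+", "", data_option.group(0)))
        parts.append(ALLOWED_STATS)
        return " ".join(parts) + ";"
    return _PROC_MEANS.sub(rewrite, program)
```

Statements after the `PROC MEANS` line, such as `var`, are left untouched. Data set options in parentheses are dropped by this sketch, which a real system would need to handle.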
Output Suppression: Output Suppression Wiping out of extreme values from the output of Proc Univariate.
Suppressing complete output line (Procs Means, corr, Univariate, etc) where sample size less than the minimum acceptable value.
Proc Means Suppression: Proc Means Suppression The MEANS Procedure
Variable Label N Mean Std Dev
--------------------------------------------------------------------------------------------
EXPEND_R Current expend/pupil in public schl/1000 5424 5.0830820 1.3958710
*** Values Suppressed ***
RPUB87 exp. for contr. serv. and supplies 1997$ 5424 23472052.60 18806802.86
RPUB92 exp. for contr. serv. and supplies 1997$ 5424 34800922.98 30481634.59
PRGPRO Coordinated Pregnancy Prevention Program 1708 0.0679157 0.2516749
HIVED HIV/AIDS Education 1708 3.5146370 0.8044378
*** Values Suppressed ***
PRGPRO87 Coordinated Pregnancy Prevention Program 5424 0.0540192 0.2260764
HIVED87 HIV/AIDS Education 5424 3.4968658 0.8008324
WT_PER15 % Wt females aged 15-19/total 15-19 5424 0.7279681 0.1265796
BK_PER15 % Bk females aged 15-19/total 15-19 5424 0.1409869 0.0932332
HS_PER15 % Hs females aged 15-19/total 15-19 5424 0.0962413 0.1055191
TEENMMC2 Teenmom by cohort (1,2,3r) 1201 1.7119067 0.7715351
C18_2_1S R in C2 (vs 1) at 18-19 endpt (1,2) 1770 1.5248588 0.4995228
TM2_1S18 R tnmm in Coh 2 (vs 1)-age 18 @ ext 358 1.4804469 0.5003168
AGE_12 Date R = 12 in century months 6450 979.5613953 69.3124265
STRTST IA5 Date R started living in current sta 3870 1132.55 753.2066507
BDAYCENM R date of birth 6450 835.5613953 69.3124265
RAVPAY95 real av. an. pay 95 dollars 5424 26933.93 2826.80
PERCAFDC percent of households receiving AFDC 5424 0.0422254 0.0127307
SALARY teacher salaries real 96-97$$$ 5424 35338.66 5729.11
--------------------------------------------------------------------------------------------
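The line-level suppression shown in this output can be sketched as a filter on the sample size of each row. In the Python below, the row layout and the function name `suppress_small_n` are hypothetical, and `min_n` is a parameter because the slides do not state the minimum acceptable sample size.

```python
SUPPRESSED = "*** Values Suppressed ***"


def suppress_small_n(rows, min_n):
    """Format (variable, n, mean, std) rows, replacing rows with n < min_n.

    `min_n` is a placeholder; the actual minimum is not given in the slides.
    """
    lines = []
    for name, n, mean, std in rows:
        if n < min_n:
            lines.append(SUPPRESSED)  # the whole line is withheld, not just one value
        else:
            lines.append(f"{name} {n} {mean} {std}")
    return lines
```

Replacing the entire line, including the variable name and N, matches the sample output, where nothing about the suppressed variable remains visible.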
Proc Univariate Output Unsuppressed: Proc Univariate Output Unsuppressed The SAS System 9
14:09 Sunday, October 24, 1999
Univariate Procedure
Variable=AVHRATET
Moments Quantiles(Def=5)
N 2283 Sum Wgts 2283 100% Max -0.25314 99% -1.62008
Mean -4.66219 Sum -10643.8 75% Q3 -3.56179 95% -2.37588
Std Dev 1.892017 Variance 3.57973 50% Med -4.50491 90% -2.79152
Skewness -2.11919 Kurtosis 6.892929 25% Q1 -5.30374 10% -6.07639
USS 57792.36 CSS 8168.944 0% Min -13.5463 5% -7.19645
CV -40.5821 Std Mean 0.039598 1% -12.7402
T:Mean=0 -117.738 Pr>|T| 0.0001 Range 13.29321
Num ^= 0 2283 Num > 0 0 Q3-Q1 1.741949
M(Sign) -1141.5 Pr>=|M| 0.0001 Mode -13.5463
Sgn Rank -1303593 Pr>=|S| 0.0001
Extremes
Lowest Obs Highest Obs
-13.5463( 1547) -0.90519( 649)
-13.5397( 1836) -0.81756( 1094)
-13.4637( 2084) -0.76928( 1739)
-13.4413( 1127) -0.5907( 21)
-13.4402( 1088) -0.25314( 400)
Proc Univariate Output Suppressed: Proc Univariate Output Suppressed The SAS System 9
14:09 Sunday, October 24, 1999
Univariate Procedure
Variable=AVHRATET
Moments Quantiles(Def=5)
N 2283 Sum Wgts 2283 100% Max -0.25314 99% -1.62008
Mean -4.66219 Sum -10643.8 75% Q3 -3.56179 95% -2.37588
Std Dev 1.892017 Variance 3.57973 50% Med -4.50491 90% -2.79152
Skewness -2.11919 Kurtosis 6.892929 25% Q1 -5.30374 10% -6.07639
USS 57792.36 CSS 8168.944 0% Min -13.5463 5% -7.19645
CV -40.5821 Std Mean 0.039598 1% -12.7402
T:Mean=0 -117.738 Pr>|T| 0.0001 Range 13.29321
Num ^= 0 2283 Num > 0 0 Q3-Q1 1.741949
M(Sign) -1141.5 Pr>=|M| 0.0001 Mode -13.5463
Sgn Rank -1303593 Pr>=|S| 0.0001
Proc Univariate Output Suppressed (sample size = 1): Proc Univariate Output Suppressed (sample size = 1)
Univariate Procedure
Variable=FREQ (sum) freq
Moments Quantiles(Def=5)
Serious Disclosure limitation Violations
Values too low to release
Output of Proc Univariate withheld
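The two suppressed listings above can be read as a simple release gate: withhold everything when the sample is too small, otherwise release the summary without the Extremes block that lists individual observations. The following is only a sketch of that idea, not the RDC's actual implementation. The function name, the dictionary layout, and the threshold `min_n=5` are assumptions made for illustration; the slides show only that a sample size of 1 is withheld.

```python
# Hypothetical sketch of a release gate for univariate output.
# The threshold (min_n) and the data layout are illustrative assumptions.

WITHHELD_MESSAGE = "Values too low to release; output of Proc Univariate withheld"


def release_univariate(n, stats, min_n=5):
    """Return the releasable part of a univariate summary, or a withheld notice.

    n     -- number of observations behind the summary
    stats -- dict of summary blocks, e.g. {"mean": ..., "extremes": [...]}
    min_n -- assumed minimum sample size for release (not stated in the slides)
    """
    if n < min_n:
        # Too few observations: release nothing but the notice.
        return {"released": False, "message": WITHHELD_MESSAGE}
    # Drop the block that lists individual observations and their row numbers.
    safe = {key: value for key, value in stats.items() if key != "extremes"}
    return {"released": True, "stats": safe}


if __name__ == "__main__":
    summary = {"mean": -4.66219, "extremes": [(-13.5463, 1547)]}
    print(release_univariate(2283, summary))
    print(release_univariate(1, summary))
```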
Proc Freq Suppression (One-Way Tables): Proc Freq Suppression (One-Way Tables) Suppress at least two consecutive rows to prevent derivation of suppressed values from cumulative totals.
Disallow single row output.
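The reason for the two-row rule is arithmetic: if a single row is hidden while its neighbours are shown, its frequency equals the next row's cumulative frequency minus the next row's frequency minus the previous row's cumulative frequency. Hiding two consecutive rows leaves only their sum recoverable. A minimal sketch of such a rule follows; the function name, the threshold `min_count=5`, and the choice to extend an isolated suppression to the following row are assumptions for illustration, not the documented RDC algorithm.

```python
# Hypothetical sketch of one-way table suppression.
# min_count and the "extend to the next row" choice are illustrative assumptions.

def suppress_one_way(freqs, min_count=5):
    """Return a list of booleans, True where a row must be suppressed."""
    n = len(freqs)
    if n <= 1:
        # Single-row output is disallowed, so nothing is released.
        return [True] * n

    # Primary suppression: rows whose frequency is below the minimum.
    mask = [f < min_count for f in freqs]

    i = 0
    while i < n:
        if not mask[i]:
            i += 1
            continue
        # Find the end of the current run of suppressed rows.
        j = i
        while j + 1 < n and mask[j + 1]:
            j += 1
        if j == i:
            # Isolated suppressed row: hide a neighbour as well.
            if i + 1 < n:
                mask[i + 1] = True
                j = i + 1
            else:
                mask[i - 1] = True
        i = j + 1
    return mask


if __name__ == "__main__":
    print(suppress_one_way([10, 2, 8, 7]))
```

In this sketch a run of suppressed rows is never shorter than two, so no hidden frequency can be recovered from the cumulative column alone.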
One-Way Freq Table Suppressed: One-Way Freq Table Suppressed
                                   Cumulative  Cumulative
LOGRNTOPAT     Frequency  Percent   Frequency     Percent
---------------------------------------------------------
0.2277839309       ?????    ?????       ?????       ?????
0.2277839309       ?????    ?????       ?????       ?????
0.2305236586           5     0.08        6429       97.99
0.231111721            5     0.08        6434       98.06
0.232058915        ?????    ?????       ?????       ?????
0.232058915        ?????    ?????       ?????       ?????
0.2436220827       ?????    ?????       ?????       ?????
0.2436220827       ?????    ?????       ?????       ?????
0.2498117984           6     0.09        6456       98.40
0.2504106777           6     0.09        6462       98.49
0.2513144283          18     0.27        6480       98.77
0.2595111955           6     0.09        6486       98.86
0.2670627852       ?????    ?????       ?????       ?????
0.2670627852       ?????    ?????       ?????       ?????
0.2736958305           5     0.08        6500       99.07
0.2814124594           5     0.08        6505       99.15
0.3022808719           6     0.09        6511       99.24
0.3364722366          10     0.15        6521       99.39
One-Way Freq Table Suppressed (cont.): One-Way Freq Table Suppressed (cont.)
                                   Cumulative  Cumulative
LOGRNTOPAT     Frequency  Percent   Frequency     Percent
---------------------------------------------------------
0.3403258059       ?????    ?????       ?????       ?????
0.3403258059       ?????    ?????       ?????       ?????
0.3715635564           6     0.09        6537       99.63
0.3856624808       ?????    ?????       ?????       ?????
0.3856624808       ?????    ?????       ?????       ?????
0.6931471806           6     0.09        6550       99.83
1.2527629685       ?????    ?????       ?????       ?????
1.2527629685       ?????    ?????       ?????       ?????
1.2527629685       ?????    ?????       ?????       ?????
Proc Freq Suppression (Two-way Tables): Proc Freq Suppression (Two-way Tables) Row and column totals are preserved.
Cells with values below the acceptable minimum are suppressed.
Additional cells are suppressed so that no row or column contains a single suppressed cell.
Logical stitching of horizontal and vertical splits.
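Because row and column totals stay visible, a lone suppressed cell in a row or column could be recovered by subtraction, which is why complementary suppressions are added. The sketch below illustrates one simple way to do this: suppress small cells first, then repeatedly hide the smallest visible cell in any row or column that has exactly one suppressed cell. The function name, the threshold `min_count=5`, and the smallest-cell heuristic are assumptions for illustration; the slides do not describe the actual algorithm, and a production system would also need to audit what remains derivable from the totals.

```python
# Hypothetical sketch of complementary suppression in a two-way table.
# min_count and the "smallest visible cell" heuristic are illustrative assumptions.

def suppress_two_way(table, min_count=5):
    """Return a boolean mask (list of lists), True where a cell is suppressed."""
    n_rows, n_cols = len(table), len(table[0])

    # Primary suppression: cells below the acceptable minimum.
    mask = [[cell < min_count for cell in row] for row in table]

    changed = True
    while changed:
        changed = False
        # Rows with exactly one suppressed cell get a second suppression.
        for r in range(n_rows):
            if sum(mask[r]) == 1:
                visible = [(table[r][c], c) for c in range(n_cols) if not mask[r][c]]
                if visible:
                    _, c = min(visible)
                    mask[r][c] = True
                    changed = True
        # Columns with exactly one suppressed cell get a second suppression.
        for c in range(n_cols):
            if sum(mask[r][c] for r in range(n_rows)) == 1:
                visible = [(table[r][c], r) for r in range(n_rows) if not mask[r][c]]
                if visible:
                    _, r = min(visible)
                    mask[r][c] = True
                    changed = True
    return mask


if __name__ == "__main__":
    counts = [[50, 60, 70],
              [3, 20, 30],
              [40, 8, 90]]
    for row in suppress_two_way(counts):
        print(row)
```

In the example, the single small cell (3) ends up inside a 2-by-2 block of suppressed cells, so neither its row total nor its column total reveals it directly.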
Proc Freq: Two-way Tables Suppression: Proc Freq: Two-way Tables Suppression
TABLE OF FAMREL BY FAMSIZER
FAMREL     FAMSIZER
Frequency|
Percent  |
Row Pct  |
Col Pct  |       2|       3|       4|       5|  Total
---------+--------+--------+--------+--------+
       3 |     94 |    388 |    792 |    533 |   2206
         |   3.97 |  16.40 |  33.47 |  22.53 |  93.24
         |   4.26 |  17.59 |  35.90 |  24.16 |
         |  98.95 |  96.28 |  96.12 |  94.34 |
---------+--------+--------+--------+--------+
       4 | ?????? |      9 |     22 |     27 |    104
         | ?????? |   0.38 |   0.93 |   1.14 |   4.40
         | ?????? |   8.65 |  21.15 |  25.96 |
         | ?????? |   2.23 |   2.67 |   4.78 |
---------+--------+--------+--------+--------+
       6 | ?????? |      6 |     10 |      5 |     56
         | ?????? |   0.25 |   0.42 |   0.21 |   2.37
         | ?????? |  10.71 |  17.86 |   8.93 |
         | ?????? |   1.49 |   1.21 |   0.88 |
---------+--------+--------+--------+--------+
Total          95      403      824      565     2366
             4.02    17.03    34.83    23.88   100.00
(Continued)
Proc Freq: Two-way Tables Suppression (Cont.): Proc Freq: Two-way Tables Suppression (Cont.) checking frequencies 4
12:01 Thursday, May 6, 1999
TABLE OF FAMREL BY FAMSIZER
FAMREL     FAMSIZER
Frequency|
Percent  |
Row Pct  |
Col Pct  |       6|       7|       8|       9|  Total
---------+--------+--------+--------+--------+
       3 |    209 |     98 |     19 |     73 |   2206
         |   8.83 |   4.14 |   0.80 |   3.09 |  93.24
         |   9.47 |   4.44 |   0.86 |   3.31 |
         |  90.48 |  83.05 |  59.38 |  74.49 |
---------+--------+--------+--------+--------+
       4 |     13 |     10 | ?????? |     12 |    104
         |   0.55 |   0.42 | ?????? |   0.51 |   4.40
         |  12.50 |   9.62 | ?????? |  11.54 |
         |   5.63 |   8.47 | ?????? |  12.24 |
---------+--------+--------+--------+--------+
       6 |      9 |     10 | ?????? |     13 |     56
         |   0.38 |   0.42 | ?????? |   0.55 |   2.37
         |  16.07 |  17.86 | ?????? |  23.21 |
         |   3.90 |   8.47 | ?????? |  13.27 |
---------+--------+--------+--------+--------+
Total         231      118       32       98     2366
             9.76     4.99     1.35     4.14   100.00
Fully Automated and Expert System?: Fully Automated and Expert System? Fully automated?
Requires reboots to deal with memory leaks.
Confidentiality expert? How reliable?
Only as good as the underlying algorithms. Needs constant monitoring.
What is new?: What is new? Improved and expanded hardware platform
Two machines dedicated to heavy remote access usage
Three additional machines dedicated to general remote access usage
What is New?: What is New? SUDAAN now available to remote access users
Proc Crosstab
Proc Rlogist
Proc Regress
Proc Multilog
Proc Survival
What is new: What is new Proc Descript
Other new SUDAAN procedures will be made available shortly
Plans to make Stata available through remote access
What is new: What is new Web Component of ANDRE under construction.
On-line scanning of users’ code
Valuable research tools and information readily available to the users.
Contact Information: Contact Information For general Questions/Comments
Email: rdca@cdc.gov Phone: (301) 458-4732
For On-site Info:
Email: Neb9@cdc.gov Phone: (301) 458-4097
For Remote Access Info:
Email: vgambhir@cdc.gov Phone: (301) 458-4226