Data_Cleaning process
dramitbhatt, added February 18, 2010

Data Cleaning

Cleaning Data
Data cleaning, or validation, is a collection of activities used to ensure the validity and accuracy of data. Logical and statistical checks detect impossible values, data entry and coding errors, and inconsistent data.

Who Cleans the Database?
The Data Management Plan, through SOPs, clearly defines the tasks and responsibilities involved in database cleaning. See the SOPs for who is responsible for cleaning.
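The difference between logical and statistical checks can be illustrated with a minimal Python sketch. The field names (`subject_id`, `birth_date`, `visit_date`, `weight_kg`) and the sample values are hypothetical. The statistical check uses Tukey's IQR fences, one common outlier rule chosen here for illustration; the presentation does not say which statistical method is used.

```python
from datetime import date
from statistics import quantiles


def logical_checks(record: dict) -> list[str]:
    """Logical checks: flag values that are impossible by definition."""
    issues = []
    sid = record["subject_id"]  # hypothetical field names throughout
    if record["weight_kg"] <= 0:
        issues.append(f"{sid}: weight_kg must be positive")
    if record["visit_date"] < record["birth_date"]:
        issues.append(f"{sid}: visit_date precedes birth_date")
    return issues


def statistical_checks(values: dict[str, float], k: float = 1.5) -> list[str]:
    """Statistical checks: flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values.values(), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [
        f"{sid}: {v} outside [{low:.2f}, {high:.2f}]"
        for sid, v in values.items()
        if not low <= v <= high
    ]


record = {
    "subject_id": "S01",
    "birth_date": date(1980, 3, 1),
    "visit_date": date(1979, 5, 15),  # impossible: visit before birth
    "weight_kg": -70.0,               # impossible: negative weight
}
print(logical_checks(record))
# ['S01: weight_kg must be positive', 'S01: visit_date precedes birth_date']

weights = {"S01": 70.0, "S02": 72.0, "S03": 68.0, "S04": 75.0, "S05": 71.0, "S06": 700.0}
print(statistical_checks(weights))
# ['S06: 700.0 outside [64.25, 80.25]']
```

A logical check needs no other data to fail a value, whereas a statistical check only says a value is unusual relative to the rest of the population, so its output is a candidate for a query rather than an automatic correction.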
Cleaning Data
Activities include:
- Manual review of data
- Computer checks/validations designed to identify inaccurate or invalid data using:
  - ranges
  - completeness
  - protocol violations
  - consistency checks
- Aggregate descriptive statistics to detect strange patterns in the data

Definition of Data Cleaning
- Making sure that raw data were accurately entered into a computer-readable file
- Checking that character variables contain only valid values
- Checking that numeric values are within predetermined ranges
- Checking for and eliminating duplicate data entries
- Checking for missing values in variables where complete data are necessary
- Checking for uniqueness of certain values, such as subject IDs
- Checking for invalid data values and invalid date sequences
- Verifying that complex multi-file (or cross-panel) rules have been followed. For example, if an adverse event (AE) of type X occurs, other data such as concomitant medications or procedures might be expected

Clean Data Checklist
- A list of checks to be performed by data management while cleaning the database
- Developed and customized according to client specifications
- Lists the checks to be performed both on an ongoing/periodic basis and towards the end of the study
- Strict adherence to the checklist ensures that no critical activity is missed

Data Cleaning
- Point-by-point checks
- Textual data
- Continuing events
- Query generation
- Query integration
- Missing data
- Duplicate data
- Protocol violation
- Coding
- SAE reconciliation
- Data consistency
- Ranges
- External data
- Visit sequence

Point-by-Point Checks
- Cross-checking between the case report form (CRF) and the database for every data point
- Constitutes a "second check" in addition to data entry
- Incorrect entries, and entries missed by Data Entry, are corrected during cleaning
- Special emphasis is given to:
  - dates
  - numerical values
  - header information (including indexing)

Missing Data Checks
- Missing responses are queried unless the investigator has indicated the item as:
  - not done
  - not available
  - not applicable
- Validations are programmed to flag missing-field discrepancies

Missing Page Checks
- Expected pages are identified during study setup
- Tracking reports of missing pages are maintained to identify:
  - CRFs misrouted in-house
  - CRFs never sent from the investigator's site

Protocol Violation Checks
- Protocol adherence is reviewed, and any violations are queried
- Primary safety and efficacy endpoints are reviewed to ensure protocol compliance

Key Protocol Violations
- Inclusion and exclusion criteria adherence:
  - age
  - concomitant medications/antibiotics
  - medical condition
- Study drug dosing regimen adherence
- Study or drug termination specifications
- Switches in medications

Continuity of Data Checks
- Checking the continuity of events that occur across the study and across visits, including:
  - adverse events
  - medications
  - treatments/procedures
- Overlapping start/stop dates and outcomes are checked across visits

Continuity of Data Checks
Overlapping dates across visits.
Scenario: per protocol, AEs are recorded at Visits 1, 2 and 3. "Headache" is recorded as follows: [example table not included in transcript]

Consistency Checks
- Designed to identify potential data errors by checking:
  - the sequential order of dates
  - corresponding events
  - missing data (indicated as existing elsewhere)
- Involves cross-checking between data points:
  - across CRFs
  - within the same CRF

Consistency Checks
Cross-check across different CRFs:
- An AE is reported with action "concomitant medication" (AE Record)
- Ensure the corresponding concomitant medication is reported in the appropriate timeframe (Concomitant Medication Record)

Consistency Checks
Cross-check within the same CRF:
- 1st DCM: report doses of antibiotics taken before the first dose of study drug
- 2nd DCM: report doses of antibiotics taken after the first dose of study drug
NOTE: The first dose of study drug is taken on 15-May-2001.

Coding Checks
- Textual or free-text data collected and reported (AEs, medications) must be coded before they can be aggregated and used in summary analysis
- Coding consists of matching the text collected on the CRF to terms in a standard dictionary
- Some items cannot be matched or coded without clarification from the site. Ulcers, for example, require a location (gastric, duodenal, mouth, foot, etc.) to be coded

Range Checks
- Designed to identify statistical outliers:
  - values that are physiologically impossible
  - values that are outside the normal variation of the population under study
- Ensure that the appropriate range values are applied; for example, WBC ranges can be applied either as percentage or as absolute values
- Ensure that the appropriate ranges are applied depending on whether the lab used is primary or secondary

Range Checks
Cross-check between the Hematology record and the AE record: [example table not included in transcript]

External Data Checks
- Ensure receipt of all required external data from centralized vendors:
  - laboratory data
  - device data (ECG, bioimages)
- Missing e-data records are tracked and requested from the vendor periodically
- Missing data are noted, and the corresponding values are re-loaded by the vendor

External Data Checks
Examples of missing data/values:
- Missing collection time of blood sample
- Missing date of ECG
- Missing location of chest radiograph
- Missing systolic blood pressure
- Missing microbiological culture transmittal ID

External Data Checks
Examples of invalid data/values:
- Incorrect loading of visit number
- Incorrect loading of subject number
- Incorrect loading of date/time of collection

Duplicate Data Checks
- Refers to duplicate entries:
  - within a single CRF
  - across similar CRFs
- Duplicate entries and duplicate records are deleted per guideline specifications
Examples:
- Treatment "physiotherapy" on 30-Aug-2001 is reported twice, either on the same Treatment Record or across two different Treatment Records

Duplicate Data Checks
Examples:
- Both the Visit 4 and Visit 10 Blood Chemistry CRFs (with different collection dates) are updated with the same values for all tests performed
- Both the "primary" and "additional" Medical History CRFs at Screening report the same details of abnormalities
Which one to retain…?

Textual Data Checks
- All textual data are proofread and checked for spelling errors
- Obvious misspellings are corrected per Internal Correction (as specified by the guidelines)
- Common examples of textual data:
  - abnormalities/pre-existing conditions in the Medical History record
  - adverse events
  - medications/antibiotics
  - project- and study-specific data

Visit Sequence Checks
- The sequence of visits is reviewed and, if out of sequence, is either:
  - queried, or
  - corrected per Internal Correction (as per the guidelines)
- Either a single CRF or a group of CRFs could be out of sequence with a particular visit

SAE Reconciliation Checks
- All serious adverse events (SAEs) reported on CRFs are reconciled with those reported on SAE Reports, and vice versa
- Communication is maintained with the Sponsor Clinical Scientist

Documents to be Followed
- Protocol
- Guidelines (general and project-specific)
- SOPs
- Subject flowcharts
- Clean patient checklists
- Tracking spreadsheets
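The continuity check (the "Headache" overlap scenario) and the within-CRF consistency check (antibiotic doses split across two DCMs around the 15-May-2001 first dose) can be sketched in Python as follows. The record layout (`visit`, `term`, `start`, `stop`, `dcm`, `dose_date`) and all sample dates are hypothetical, and treating a dose on the first-dose date itself as belonging to the "after" DCM is an assumption, not something the slides specify.

```python
from datetime import date

FIRST_DOSE = date(2001, 5, 15)  # first dose of study drug, from the slide's NOTE


def overlapping_events(events: list[dict]) -> list[tuple[int, int]]:
    """Continuity check: flag pairs of same-term records from different visits
    whose start/stop ranges overlap. A stop of None means 'continuing'."""
    flags = []
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            if a["visit"] == b["visit"] or a["term"].lower() != b["term"].lower():
                continue
            a_stop = a["stop"] or date.max
            b_stop = b["stop"] or date.max
            if a["start"] <= b_stop and b["start"] <= a_stop:
                flags.append((a["visit"], b["visit"]))
    return flags


def misplaced_doses(doses: list[dict], first_dose: date = FIRST_DOSE) -> list[dict]:
    """Consistency check within one CRF: doses on the 'before' DCM must predate
    the first study-drug dose; doses on the 'after' DCM must not.
    Assumption: a dose on the first-dose date itself belongs on the 'after' DCM."""
    return [d for d in doses if (d["dcm"] == "before") != (d["dose_date"] < first_dose)]


headache = [
    {"visit": 1, "term": "Headache", "start": date(2001, 5, 1), "stop": date(2001, 5, 10)},
    {"visit": 2, "term": "Headache", "start": date(2001, 5, 8), "stop": None},
    {"visit": 3, "term": "Headache", "start": date(2001, 6, 2), "stop": date(2001, 6, 4)},
]
print(overlapping_events(headache))  # [(1, 2), (2, 3)]

doses = [
    {"dcm": "before", "dose_date": date(2001, 5, 12)},
    {"dcm": "before", "dose_date": date(2001, 5, 16)},  # belongs on the 'after' DCM
    {"dcm": "after", "dose_date": date(2001, 5, 14)},   # belongs on the 'before' DCM
    {"dcm": "after", "dose_date": date(2001, 5, 20)},
]
print([d["dose_date"].isoformat() for d in misplaced_doses(doses)])
# ['2001-05-16', '2001-05-14']
```

Each flagged pair or dose is a discrepancy to be queried; the sketch does not decide which of the conflicting entries is wrong.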
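The coding and textual data checks both come down to matching verbatim CRF text against a dictionary. The sketch below uses a tiny hypothetical dictionary in place of a real standard coding dictionary, and uses Python's `difflib.get_close_matches` as one possible way to surface probable misspellings; the 0.85 similarity cutoff is an arbitrary choice for illustration.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical mini-dictionary standing in for a standard coding dictionary.
TERMS = {
    "headache": "HEADACHE",
    "nausea": "NAUSEA",
    "gastric ulcer": "GASTRIC ULCER",
    "duodenal ulcer": "DUODENAL ULCER",
    "mouth ulcer": "MOUTH ULCER",
}


def code_term(verbatim: str, cutoff: float = 0.85) -> tuple[Optional[str], str]:
    """Match a verbatim CRF entry to the dictionary.

    Returns (coded_term, action): an exact match is coded; a near match is
    flagged as a probable misspelling for review; anything else is queried."""
    text = " ".join(verbatim.lower().split())  # normalise case and spacing
    if text in TERMS:
        return TERMS[text], "coded"
    near = get_close_matches(text, TERMS, n=1, cutoff=cutoff)
    if near:
        return TERMS[near[0]], f"probable misspelling of '{near[0]}', review"
    return None, "query site for clarification"


for entry in ["Headache", "Headahce", "Ulcer"]:
    print(entry, "->", code_term(entry))
# Headache -> ('HEADACHE', 'coded')
# Headahce -> ('HEADACHE', "probable misspelling of 'headache', review")
# Ulcer -> (None, 'query site for clarification')
```

"Ulcer" on its own is not close enough to any located ulcer term, so it falls through to a site query, mirroring the slide's point that ulcers need a location before they can be coded.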
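The duplicate data and visit sequence checks can be sketched in Python using the slides' own scenarios: "physiotherapy" on 30-Aug-2001 entered twice, and Visit 4 and Visit 10 blood chemistry panels carrying identical results. The record fields, lab test names and values are hypothetical, and which fields make up a duplicate key would in practice come from the guideline specifications.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def duplicate_entries(records: list[dict], key_fields: tuple[str, ...]) -> list[tuple]:
    """Return key tuples occurring more than once. The source record (page) is
    not part of the key, so duplicates within one CRF and across similar CRFs
    are both found."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]


def identical_panels(panels: dict[int, dict[str, float]]) -> list[tuple[int, int]]:
    """Return visit pairs whose lab panels report identical values for every test."""
    return [(a, b) for a, b in combinations(sorted(panels), 2) if panels[a] == panels[b]]


def out_of_sequence(visit_dates: dict[int, date]) -> list[int]:
    """Return visit numbers dated on or before the preceding visit."""
    ordered = sorted(visit_dates.items())
    return [v for (_, prev), (v, d) in zip(ordered, ordered[1:]) if d <= prev]


treatments = [
    {"record": 1, "treatment": "physiotherapy", "date": date(2001, 8, 30)},
    {"record": 2, "treatment": "physiotherapy", "date": date(2001, 8, 30)},
    {"record": 2, "treatment": "hydrotherapy", "date": date(2001, 9, 6)},
]
print(duplicate_entries(treatments, ("treatment", "date")))
# [('physiotherapy', datetime.date(2001, 8, 30))]

panels = {
    4: {"glucose": 5.2, "creatinine": 80.0},
    7: {"glucose": 5.6, "creatinine": 78.0},
    10: {"glucose": 5.2, "creatinine": 80.0},
}
print(identical_panels(panels))  # [(4, 10)]

visits = {1: date(2001, 5, 1), 2: date(2001, 5, 15), 3: date(2001, 5, 10), 4: date(2001, 6, 1)}
print(out_of_sequence(visits))  # [3]
```

These functions only find candidates. Deciding which duplicate to retain, or whether an out-of-sequence visit is queried or corrected per Internal Correction, remains a guideline-driven decision.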
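SAE reconciliation is a two-way comparison: every SAE on the CRFs should appear on an SAE Report, and vice versa. A minimal Python sketch follows, assuming that subject ID, event term and onset date together identify an SAE; that key, the `SAE` type and the sample events are hypothetical, and a real reconciliation would typically compare further attributes as well.

```python
from datetime import date
from typing import NamedTuple


class SAE(NamedTuple):
    subject_id: str
    term: str
    onset: date


def _key(sae: SAE) -> tuple:
    # Compare terms case- and whitespace-insensitively so trivial formatting
    # differences between the two sources do not raise false discrepancies.
    return (sae.subject_id, " ".join(sae.term.lower().split()), sae.onset)


def reconcile_saes(crf_saes: list[SAE], report_saes: list[SAE]) -> dict[str, list[tuple]]:
    """Two-way reconciliation: SAEs on CRFs but missing from SAE Reports, and vice versa."""
    crf = {_key(s) for s in crf_saes}
    reports = {_key(s) for s in report_saes}
    return {
        "on_crf_only": sorted(crf - reports),
        "on_report_only": sorted(reports - crf),
    }


crf_saes = [
    SAE("S01", "Pneumonia", date(2001, 6, 3)),
    SAE("S02", "Myocardial infarction", date(2001, 7, 9)),
]
report_saes = [
    SAE("S01", "pneumonia ", date(2001, 6, 3)),
    SAE("S03", "Stroke", date(2001, 8, 1)),
]
result = reconcile_saes(crf_saes, report_saes)
print(result["on_crf_only"])     # [('S02', 'myocardial infarction', datetime.date(2001, 7, 9))]
print(result["on_report_only"])  # [('S03', 'stroke', datetime.date(2001, 8, 1))]
```

Set differences make the "and vice versa" direction explicit: each list is a set of discrepancies to raise with the Sponsor Clinical Scientist.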