Analysis of Surveillance Data
Source: Denis Coulombier, WHO
Arnold Bosman, 2006
Steps in Surveillance Analysis:
Data quality
Descriptive analysis
Time
Place
Persons
Generate hypothesis
Test hypothesis
Data Quality Issues:
Missing values
Attraction to round figures
Data entry errors
Bias related to lack of representativity
Cases more severe
Urban > rural
Source not represented (private sector, GPs)
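Two of the issues listed above, missing values and attraction to round figures, can be screened for with very little code. The following is a minimal sketch, not a standard procedure: the record layout, the field names `age` and `onset`, and the sample line list are all hypothetical, and the 20% reference for terminal digits 0 and 5 assumes terminal digits would otherwise be evenly spread.

```python
from collections import Counter


def missing_fraction(records, field):
    """Share of records where `field` is absent, None, or an empty string."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return missing / len(records)


def terminal_digit_share(values, digits=(0, 5)):
    """Share of non-missing integer values whose last digit is in `digits`.

    If terminal digits were evenly spread, about 20% of values would end
    in 0 or 5; a much larger share suggests attraction to round figures.
    """
    values = [v for v in values if v is not None]
    if not values:
        return 0.0
    counts = Counter(abs(int(v)) % 10 for v in values)
    return sum(counts[d] for d in digits) / len(values)


# Hypothetical line list, for illustration only.
cases = [
    {"age": 30, "onset": "2006-01-03"},
    {"age": 45, "onset": ""},
    {"age": 40, "onset": "2006-01-05"},
    {"age": 37, "onset": "2006-01-09"},
    {"age": None, "onset": "2006-01-10"},
]

onset_missing = missing_fraction(cases, "onset")
age_missing = missing_fraction(cases, "age")
round_share = terminal_digit_share([c["age"] for c in cases])
```

On a real data set, the same two functions could be run per reporting source to see whether the problems concentrate in particular sites.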
Figure: Notifications of All Notifiable Diseases by Date of Onset, USA, 1989
Figure: Measles and ARI by Month, Haiti, 1992-1993, 38 Sentinel Sites (x-axis: month, 1992-1993; y-axis: cases x 1000)
Analysis of time characteristics
Descriptive Analysis of Time:
Graphical analysis
Requires aggregation on appropriate time unit
Choice of the time variable
Date of onset
Date of notification
To describe trend, seasonality, and residuals
Use of rates when denominator changes over time
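The aggregation and rate steps above can be sketched as follows. This is an illustration under stated assumptions: ISO weeks are used as the time unit, date of onset is the time variable, and the onset dates and population figure are made up for the example.

```python
from collections import Counter
from datetime import date


def weekly_counts(onset_dates):
    """Count cases per ISO (year, week) of onset, sorted chronologically."""
    counts = Counter()
    for d in onset_dates:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return dict(sorted(counts.items()))


def rate_per_100k(cases, population):
    """Cases per 100,000 population for one time unit."""
    return cases / population * 100_000


# Hypothetical dates of onset, for illustration only.
onsets = [
    date(2006, 1, 2),
    date(2006, 1, 4),
    date(2006, 1, 9),
    date(2006, 1, 15),
    date(2006, 1, 16),
]
counts = weekly_counts(onsets)

# Hypothetical denominator; in practice it would be updated per period.
week1_rate = rate_per_100k(counts[(2006, 1)], population=50_000)
```

Switching the time variable to date of notification only requires passing a different list of dates; the choice affects how delays in reporting show up in the curve.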
Descriptive Analysis of Time: Graphical analysis
Descriptive Analysis of Time: Component of Surveillance Data
Descriptive Analysis of Time: Smoothing Techniques
Figure: Notification of giardiasis in Delaware, 03/1991-03/1995
Effect of the Moving Average Window Size: Weekly Notifications of Salmonellosis, Georgia, 1993-1994
Cases of Gonorrhea in Michigan: Week 10 of 1994 and 208 Previous Weeks
Descriptive Analysis of Time: Size of the Moving Average Window
Showing seasonality: smooth residuals
Empirical approach
Window increases with variance
5 to 15 weeks
Showing trend: smooth residuals and seasonality
52 weeks
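A moving average of the kind discussed above can be sketched in a few lines. The function below is an illustrative implementation, not the one used to produce the slides: it centres the window on each point, leaves the edges as `None` where the window does not fit, and for even window sizes (such as 52) places the extra point after the current one. The sample series is made up.

```python
def moving_average(series, window):
    """Centred moving average over `window` consecutive points.

    Positions where the full window does not fit are returned as None.
    For an even window the extra point is taken after the current one,
    so the result is only approximately centred.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    before = (window - 1) // 2
    after = window // 2
    smoothed = [None] * len(series)
    for i in range(before, len(series) - after):
        chunk = series[i - before : i + after + 1]
        smoothed[i] = sum(chunk) / window
    return smoothed


# Hypothetical weekly counts, for illustration only.
weekly = [1, 2, 3, 4, 5]
short_window = moving_average(weekly, 3)
wide_window = moving_average(weekly, 5)
```

Following the guidance above, a window of 5 to 15 weeks would be tried to smooth residuals while keeping seasonality visible, and a 52-week window to smooth out seasonality as well and leave the trend.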
Figure: Malaria by Year, United States, 1930-1992. Two panels, semi-log scale and arithmetic scale (x-axis: year, 1931-1991; y-axis: cases/100,000 population). Annotations: Relapse - overseas cases; Relapses, Korean veterans; Vietnam veterans; Immigration.
Testing for Time Hypothesis:
Remove confounding (rates)
Removing time dependency
Trend and seasons
By restriction or modelling
Test for detection of outbreaks
More cases than expected?
Test for changes in trend
Departure from historical trend?
Accounting for Time Dependency: Airline Passengers in the US, Monthly Data, 1949-1960 (figure). Is the red dot consistent with the data?
Tests not accounting for time dependency: Mean + 1.96 Standard Deviations (figure: randomly ordered data, with mean and 95% CI). Yes.
Tests accounting for time dependency: chronologically ordered data (figure: residuals by month after removing trend and seasonality, with mean and 95% CI).
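The contrast between the two approaches can be illustrated with a small sketch. It is not the method behind the figures: it assumes a linear trend fitted by ordinary least squares and an additive seasonal effect estimated as the mean of the detrended values at each position in the cycle, and it runs on a synthetic series.

```python
from statistics import mean, stdev


def upper_threshold(values, z=1.96):
    """Mean + z standard deviations, ignoring the order of the values."""
    return mean(values) + z * stdev(values)


def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...: (intercept, slope)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def residuals(values, period=12):
    """Remove a linear trend, then the mean of each position in the cycle."""
    intercept, slope = linear_trend(values)
    detrended = [y - (intercept + slope * x) for x, y in enumerate(values)]
    seasonal = [mean(detrended[p::period]) for p in range(period)]
    return [d - seasonal[x % period] for x, d in enumerate(detrended)]


# Synthetic monthly series (illustrative only): trend plus a yearly peak.
series = [100 + 2 * t + (15 if t % 12 == 6 else 0) for t in range(48)]

raw_threshold = upper_threshold(series)      # time dependency ignored
resid = residuals(series, period=12)
resid_threshold = upper_threshold(resid)     # applied to the residuals
```

On the raw series the threshold is inflated by the trend and the seasonal peaks, so an unusual value late in the series can still fall below it; on the residuals the threshold is much tighter.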
Statistical Tests for Time Series:
For time series with no trend and seasonality: random series
Tests not accounting for time dependency
Chi square, Poisson
For time series with seasonality and no trend
Tests accounting for TD by restriction
Similar historical period mean/median
For all time series
Tests accounting for TD by modeling
Linear regression corrected for season or
Fourier analysis and SARIMA models
Olympic Games Surveillance, Athens 2004: Septic Shocks, Syndromic Surveillance
Poisson test
Count of cases compared with the average of the previous 7 days (λ)
P-value bands: between 1-4%; <1%
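A Poisson test of this kind can be sketched as below. This is an assumption-laden illustration rather than the Athens 2004 implementation: it takes the mean of the previous 7 daily counts as the Poisson parameter and reports the one-sided probability of observing today's count or more. The counts are made up.

```python
from math import exp


def poisson_upper_tail(observed, lam):
    """P(X >= observed) for X ~ Poisson(lam)."""
    if observed <= 0:
        return 1.0
    term = exp(-lam)          # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= lam / k       # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical daily counts, for illustration only.
previous_7_days = [1, 0, 2, 1, 1, 0, 2]
today = 5

lam = sum(previous_7_days) / len(previous_7_days)
p_value = poisson_upper_tail(today, lam)
```

The resulting p-value can then be compared with the bands used on the slide (between 1-4%, and <1%).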
MMWR Figure 1: Accidental variations?
Mean and standard deviation test
Can be used with median and percentiles, which better reduce the effect of past epidemics
Thresholds Based on Median and Percentiles: Diarrhoea in Madaba district, Jordan, 2000-2001
Accounting for TD by restriction
5 weeks centred around current week, past 5 years (25 weeks)
5th and 95th percentile threshold
Figure legend: current week; 5-week historical periods x 5; historical period; 52-week forecast with 5th and 95th percentiles
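The restriction approach above (5 weeks centred on the current week, over the past 5 years) can be sketched as follows. The data layout is an assumption: `history` maps each past year to a list of 52 weekly counts, and weeks near the start or end of a year wrap around within the same year's list, which is a simplification. The percentile uses linear interpolation between ordered values, and the counts are synthetic, not the Madaba data.

```python
from statistics import median


def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    rank = p / 100 * (len(ordered) - 1)
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    frac = rank - low
    return ordered[low] + frac * (ordered[high] - ordered[low])


def historical_window(history, week, half_width=2):
    """Counts for weeks week-half_width..week+half_width in each past year.

    `week` is 1-based; indices wrap within each year's list (simplification).
    """
    values = []
    for year_counts in history.values():
        n = len(year_counts)
        for offset in range(-half_width, half_width + 1):
            values.append(year_counts[(week - 1 + offset) % n])
    return values


def thresholds(history, week):
    """Return (5th percentile, median, 95th percentile) for the window."""
    window = historical_window(history, week)
    return percentile(window, 5), median(window), percentile(window, 95)


# Synthetic history for 5 past years, for illustration only.
history = {year: [20 + (w % 5) for w in range(52)] for year in range(1995, 2000)}
low, mid, high = thresholds(history, week=10)
```

A current-week count above `high` would be flagged for attention; repeating the calculation for each of the next 52 weeks gives a forecast band like the one in the figure.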
Figure: Comparison expressed in SD between notifications of weeks 31/97 to 34/97 and previous 5 years, same period, France. Diseases shown: Botulism, Brucellosis, Typhoid-parat. fever, Legionellosis, Meningococcemia, AIDS, Foodborne outbreaks, Tuberculosis, Tetanus. X-axis: Z-score, -5.5 to 5. Alert area: Z-score of 1.65. Legend: probability of observing such a departure from historical data (> 10%; 5 & < 10%).
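The comparison in SD units amounts to a z-score of the current count against the same period in previous years. A minimal sketch, assuming the historical mean and sample standard deviation are taken over the 5 previous years and using made-up counts:

```python
from statistics import mean, stdev

ALERT_Z = 1.65  # alert area boundary shown in the figure


def z_score(current, historical):
    """Departure of `current` from the historical mean, in SD units.

    Returns None when the historical counts have no variation.
    """
    sd = stdev(historical)
    if sd == 0:
        return None
    return (current - mean(historical)) / sd


# Hypothetical counts for the same 4-week period in 5 previous years.
previous_years = [10, 12, 8, 11, 9]
current_period = 14

z = z_score(current_period, previous_years)
in_alert_area = z is not None and z > ALERT_Z
```

Expressing every disease on the same z-score axis is what allows them to be compared in a single chart regardless of how common each one is.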
Figure: Notification of Foodborne Outbreaks in France, 1995-1998
Interpreting the results:
Role of chance
Role of bias
True disease pattern
Conclusions:
Analysis to draw attention
Validation by investigation