
Comparison of Different Methods of Computing Yearly Growth Rates for Petroleum Supply, 1995-2005: 

Comparison of Different Methods of Computing Yearly Growth Rates for Petroleum Supply, 1995-2005. Carol Joyce Blumberg, Mathematical Statistician, Petroleum Statistical Methodology Team, Petroleum Division, Office of Oil and Gas

Disclaimer: 

Disclaimer This is a working document prepared by the Energy Information Administration (EIA) in order to solicit advice and comment on statistical matters from the American Statistical Association Committee on Energy Statistics. This topic will be discussed at EIA's spring 2007 meeting with the Committee to be held April 19 and 20, 2007.

Background: 

Background The Petroleum Division of the Office of Oil and Gas collects data on volumes of Product Supplied for, among other products, Finished Motor Gasoline, Distillate Fuel Oil, and Total Products. These data are collected weekly and monthly, with final numbers published annually.

Background (cont’d): 

Background (cont’d) Product Supplied = (Field Production + Refinery and Blender Net Production + Imports + Adjustments) – (Stock Change + Refinery and Blender Net Inputs + Exports)
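The balance above can be written as a small function. This is only a sketch of the arithmetic on the slide; the function and parameter names are illustrative, and the numbers in the example are made up rather than taken from EIA data.

```python
def product_supplied(field_production, refinery_blender_net_production,
                     imports, adjustments, stock_change,
                     refinery_blender_net_inputs, exports):
    """Product Supplied per the balance on the slide (thousand barrels per day)."""
    supply = (field_production + refinery_blender_net_production
              + imports + adjustments)
    disposition = stock_change + refinery_blender_net_inputs + exports
    return supply - disposition


# Made-up illustrative values, not EIA data. A negative stock change
# (a stock draw) increases Product Supplied.
example = product_supplied(
    field_production=100,
    refinery_blender_net_production=9000,
    imports=500,
    adjustments=50,
    stock_change=-20,
    refinery_blender_net_inputs=300,
    exports=150,
)
```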

Motivation for this Research: 

Motivation for this Research Our clients (especially market analysts and financial analysts) like to compute year-to-year growth rates. For example, what was the percent increase in volume of Product Supplied in March 2004 as compared to March 2003?

Motivation (cont’d): 

Motivation (cont’d) But different analysts have used different methods of computing these yearly growth rates. This confuses the consumers of their growth rates (for example, the mass media and other financial analysts).

Motivation (cont'd): 

Motivation (cont'd) So, the Petroleum Division decided to do an empirical study to determine which method(s) of computing yearly growth rates for Product Supplied are best.

Data Location: 

Data Location All of the data used in this research are available on EIA Petroleum Navigator. Petroleum Navigator is the main website for petroleum data (one-stop shopping). The main URL is http://tonto.eia.doe.gov/dnav/pet/pet_pub_publist.asp.

Units of Measurement: 

Units of Measurement All volumes are in thousands of BARRELS PER DAY. Using barrels per day means that no adjustment is needed for the different number of days in February in different years.
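A short sketch of why the per-day unit matters for February comparisons. The volumes are hypothetical, and the helper name `barrels_per_day` is an assumption made for illustration.

```python
import calendar


def barrels_per_day(monthly_total, year, month):
    """Convert a monthly total volume into an average daily rate."""
    days = calendar.monthrange(year, month)[1]
    return monthly_total / days


# Hypothetical: the same daily rate (9,000 thousand barrels per day)
# in February 2003 (28 days) and February 2004 (29 days).
feb_2003_total = 9000 * 28
feb_2004_total = 9000 * 29

# Comparing monthly totals shows spurious growth from the extra day.
total_growth = 100 * (feb_2004_total / feb_2003_total - 1)

# Comparing daily rates shows no growth, as it should.
rate_growth = 100 * (
    barrels_per_day(feb_2004_total, 2004, 2)
    / barrels_per_day(feb_2003_total, 2003, 2)
    - 1
)
```

With totals, the leap day alone produces an apparent growth of about 3.6 percent; with daily rates the growth is zero.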

Relevant Data: 

Relevant Data Weekly Volumes: These are estimated using data from five weekly surveys of samples from the appropriate populations. They are reported in the Weekly Petroleum Status Report (WPSR).

Relevant Data (cont'd): 

Relevant Data (cont'd) Monthly (Preliminary) Data: The monthly data are collected using seven surveys, which are essentially censuses of the appropriate populations. These data can be thought of as preliminary estimates. They are published about 60 days after the end of the month in Petroleum Supply Monthly and will be abbreviated as PSM.

Relevant Data (cont'd): 

Relevant Data (cont'd) Final Monthly Values: These are the monthly values with corrections for late submissions and resubmissions of data. They are published in Petroleum Supply Annual (abbreviated as PSA), which appears about 6 months after the end of the calendar year.

Yearly Growth Rates: 

Yearly Growth Rates Ideally, a yearly growth rate here is defined by Formula A (which we call the “Gold Standard”), computed from the PSA values for the same month in year t and year t-1, where Month = Jan., Feb., Mar., Apr., ...; t = number of years since 1994; and PSA is the monthly amount reported in Petroleum Supply Annual.
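The formula itself was shown as an image on the slide and does not survive in this transcript. Based on the description given later in the talk (a percent change with the year t value in the numerator and the year t-1 value in the denominator), one plausible reading of Formula A is the following. The symbol A with subscripts is an assumption introduced here for illustration, and the exact form on the slide may differ.

```latex
\[
A_{\text{Month},\,t}
  \;=\; 100 \times
  \frac{\mathrm{PSA}_{\text{Month},\,t} - \mathrm{PSA}_{\text{Month},\,t-1}}
       {\mathrm{PSA}_{\text{Month},\,t-1}}
\]
```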

Yearly Growth Rates (cont'd): 

Yearly Growth Rates (cont'd) But there is an issue here. Analysts want to compute yearly growth rates as soon as possible after the end of a month. They do not want to wait the 6 to 18 months until the PSA (final) data are available. They want a good estimate of the Gold Standard in a timely manner.

Main Research Question: 

Main Research Question How can we use PSM and Weekly data for year t and combine them with PSA, PSM and/or Weekly data from the SAME MONTH in the previous year (year t-1) to best estimate the Gold Standard of Formula A?

Methodology: 

Methodology In consultation with John Cook, Douglas MacIntyre, Carol French, Paula Weir, and Bin Zhang of the Petroleum Division, it was decided to investigate 14 different possible estimation formulas.

Methodology (cont'd): 

Methodology (cont'd) These 14 formulas are on the handout and are actual methods used by analysts (including EIA) or slight variations on them. These formulas were then compared to the Gold Standard of Formula A using various criteria.

Using Weekly Data to Make Monthly Estimates: 

Using Weekly Data to Make Monthly Estimates But, before explaining these formulas and giving the evaluation criteria, I would like to define 3 ways of getting monthly estimates from weekly data: MFW, 4wa, and 4wb. Note: A reporting week basically goes from one Friday through the next Thursday.

MFW (Monthly-from-Weekly): 

MFW (Monthly-from-Weekly) The MFW estimates are weighted averages of the weeks that contain the days of a certain month. For example, for July 2006, the weighted average was {6*(data reported for 7/7/06) + 7*(data reported for 7/14/06) + 7*(data reported for 7/21/06) + 7*(data reported for 7/28/06) + 4*(data reported for 8/4/06)}/31.
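The weights in the July 2006 example can be reproduced with a short sketch. It assumes that the week reported on a given Friday covers the seven days before that Friday (the previous Friday through Thursday), which is consistent with the note on reporting weeks and with the weights 6, 7, 7, 7, 4 above. The function names are illustrative, not EIA's.

```python
from datetime import date, timedelta
import calendar


def mfw_weights(year, month):
    """Map each Friday report date to the number of days of the month
    that fall in its reporting week.

    Assumption: the week reported on Friday D covers D-7 through D-1.
    """
    n_days = calendar.monthrange(year, month)[1]
    weights = {}
    for day in range(1, n_days + 1):
        d = date(year, month, day)
        # Days until the first Friday strictly after d (Mon=0 ... Fri=4).
        days_ahead = (4 - d.weekday()) % 7 or 7
        report = d + timedelta(days=days_ahead)
        weights[report] = weights.get(report, 0) + 1
    return weights


def mfw_estimate(weekly, year, month):
    """Weighted average of weekly values, weighted by days in the month."""
    weights = mfw_weights(year, month)
    total_days = sum(weights.values())
    return sum(n * weekly[report] for report, n in weights.items()) / total_days


july_2006 = mfw_weights(2006, 7)
```

Here `weekly` is a dictionary from report date to the weekly value in thousand barrels per day; `mfw_estimate` raises a `KeyError` if a needed week is missing.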

Calendar for July, 2006 for MFW: 

Calendar for July, 2006 for MFW

4wa: 

4wa The quantities 4wa are defined as the four-week averages computed using the four-week period before the report date of the last Friday in the month.

4wb: 

4wb The quantities 4wb are a “compromise” between the MFW and 4wa estimates. They are defined as the four-week averages using the 4 weeks that cover the most days in the particular month under consideration, where all weeks have a report date of a Friday.

4wa versus 4wb: 

4wa versus 4wb For example, for August 2006 the 4wa estimate would have a cutoff date of Friday, August 25, and thus include 24 days of August and 4 days of July.

4wa versus 4wb (cont'd): 

4wa versus 4wb (cont'd) However, using the four weeks that cover the most days in August, the 4wb estimate would have a cutoff date of September 1 for the data and thus contain 28 days of August.
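The two cutoff rules can be sketched as follows, again assuming that a Friday report date covers the seven days before it. The function names are illustrative, and the tie-breaking rule in `cutoff_4wb` (earliest Friday wins) is an assumption, since the slides do not say how ties are resolved.

```python
from datetime import date, timedelta
import calendar


def days_in_month_covered(report, year, month):
    """Days of the month inside the 28 days ending the day before `report`."""
    window = [report - timedelta(days=k) for k in range(1, 29)]
    return sum(1 for d in window if (d.year, d.month) == (year, month))


def cutoff_4wa(year, month):
    """Report date of the last Friday in the month."""
    d = date(year, month, calendar.monthrange(year, month)[1])
    while d.weekday() != 4:  # Friday
        d -= timedelta(days=1)
    return d


def cutoff_4wb(year, month):
    """Friday report date whose preceding four weeks cover the most
    days of the month (earliest such Friday if there is a tie)."""
    first = date(year, month, 1)
    candidates = [first + timedelta(days=k) for k in range(45)]
    fridays = [d for d in candidates if d.weekday() == 4]
    return max(fridays, key=lambda r: days_in_month_covered(r, year, month))


aug_4wa = cutoff_4wa(2006, 8)
aug_4wb = cutoff_4wb(2006, 8)
```

For August 2006 this gives August 25 (24 August days) for 4wa and September 1 (28 August days) for 4wb, matching the example.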

Calendar for August, 2006 for 4wa: 

Calendar for August, 2006 for 4wa

Calendar for August 2006 for 4wb: 

Calendar for August 2006 for 4wb

Reminder about Growth Rates: 

Reminder about Growth Rates A yearly growth rate is the percent change in volume of Product Supplied from year t-1 (denominator) to year t (numerator). What we did was to vary the data source used for the numerator and for the denominator.
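A minimal sketch of this idea: one growth-rate function, with different data sources plugged in as numerator and denominator. The percent-change form is taken from the description above; the volumes are hypothetical, and only the two pairings named later in the talk (PSM/PSA as Formula 1 and PSM/PSM as Formula 2) are shown.

```python
def yearly_growth_rate(current, previous):
    """Percent change from the year t-1 value (denominator)
    to the year t value (numerator)."""
    return 100.0 * (current - previous) / previous


# Hypothetical values for one month, in thousand barrels per day.
psm_t = 9180.0      # preliminary monthly value, year t
psm_prev = 9000.0   # preliminary monthly value, year t-1
psa_prev = 9050.0   # final monthly value, year t-1

formula_1_style = yearly_growth_rate(psm_t, psa_prev)  # PSM / PSA
formula_2_style = yearly_growth_rate(psm_t, psm_prev)  # PSM / PSM
```

In this made-up case the final (PSA) value for the previous year is higher than the preliminary one, so dividing by PSA gives a lower growth rate than dividing by PSM.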

Relationships between the Formulas: 

Relationships between the Formulas

Criteria for Comparison: 

Criteria for Comparison
1. Closeness of means and standard deviations to Formula A
2. Differences of means (including t-tests)
3. Mean Square Error of Differences
4. Correlations with Formula A
5. Percent of Time within ±1% and within ±2% of Formula A
6. Percent of Time in Same Direction (that is, both positive or both negative)
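The criteria above could be computed along the following lines for one formula's series of monthly growth rates against Formula A's. This is a sketch under several assumptions: the ±1% and ±2% bands are read as absolute differences in percentage points, the t statistic is a paired one (no p-value is computed), and the Mean Square Error divides by n. The function names and the sample values are illustrative only.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_to_gold(estimates, gold):
    """Summarize how a formula's growth rates (percent) track Formula A's."""
    diffs = [e - g for e, g in zip(estimates, gold)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (divisor n - 1)
    same_dir = sum(
        1 for e, g in zip(estimates, gold)
        if (e > 0 and g > 0) or (e < 0 and g < 0)
    )
    return {
        "mean": statistics.mean(estimates),
        "sd": statistics.stdev(estimates),
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "t_stat": mean_diff / (sd_diff / math.sqrt(n)),
        "mse": sum(d * d for d in diffs) / n,
        "corr": pearson(estimates, gold),
        "pct_within_1": 100 * sum(abs(d) <= 1 for d in diffs) / n,
        "pct_within_2": 100 * sum(abs(d) <= 2 for d in diffs) / n,
        "pct_same_direction": 100 * same_dir / n,
    }


# Made-up growth rates in percent, for illustration only.
gold = [1.0, -0.5, 2.0, 0.2]
est = [1.5, 0.3, 1.0, -0.1]
summary = compare_to_gold(est, gold)
```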

Closeness of Means and Standard Deviations & Mean Square Error: 

Closeness of Means and Standard Deviations & Mean Square Error Here I am using the Means, Standard Deviations, Difference in Mean from A, Standard Deviation of the Differences, p-values for the Differences (based on t-tests), and Mean Square Error from the summary Tables 1, 2, and 3 on the handout. After that I will discuss the rest of the entries on those tables.

Closeness of Means & Standard Deviations: 

Closeness of Means & Standard Deviations More detailed tables are in the written paper. When interpreting the tables (in the handout or paper) you can either look at the means or the differences in means. It is really your choice.

Closeness of Means & Standard Deviations Finished Motor Gasoline (See Table 1): 

Closeness of Means & Standard Deviations Finished Motor Gasoline (See Table 1) Clearly PSM is a better numerator than any of the weekly based measurements. Dividing by PSA gives biased estimates that are underestimates for all numerators. Dividing by PSM does not. But, dividing by PSA gives a smaller standard deviation of the differences than dividing by PSM.

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd): 

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd) So, we need to look at Mean Square Error and statistical significance also. We find that PSM/PSM (Formula 2) has slightly bigger Mean Square Error than PSM/PSA (Formula 1). But, Formula 1 is statistically significantly different from Formula A, while Formula 2 is not.
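The trade-off between bias and spread can be made explicit with a standard identity. Writing d_i for the difference between a formula's growth rate and Formula A's in month i, over n months, the Mean Square Error splits into squared bias plus the variance of the differences. The notation is introduced here for illustration, and the identity is exact when the variance uses divisor n; whether the handout's standard deviations use n or n-1 is not stated.

```latex
\[
\mathrm{MSE}
  \;=\; \frac{1}{n}\sum_{i=1}^{n} d_i^{2}
  \;=\; \bar{d}^{\,2} \;+\; \frac{1}{n}\sum_{i=1}^{n}\bigl(d_i - \bar{d}\bigr)^{2},
\qquad
\bar{d} \;=\; \frac{1}{n}\sum_{i=1}^{n} d_i .
\]
```

So a formula with a larger bias but a smaller standard deviation of the differences can end up with a Mean Square Error similar to, or even smaller than, that of a less biased formula.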

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd): 

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd) Note: Having a statistically significant difference is not good here. We do not want to reject the null hypothesis. So, even though it is a bit of a mixed bag, I prefer Formula 2 (PSM/PSM) to Formula 1 (PSM/PSA) because it has less bias.

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd): 

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd) But, sometimes people do not want to wait the 60 days after the end of a month to figure out a growth rate. So, they then must use monthly estimates based on weekly data, which are computable within a few days of the end of the month.

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd): 

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd) So, if we use a weekly based measurement (either MFW, 4wa, or 4wb or some other linear combination of the weeks) as a numerator, what is the best denominator?

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd): 

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd) Answer: 1. There are hardly any differences between the three numerators of MFW, 4wa, and 4wb for any of the denominators. 2. None of them works really well as compared to having PSM as a numerator.

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd): 

Closeness of Means & Standard Deviations Finished Motor Gasoline (cont'd) 3. Further, dividing a weekly-based numerator by PSM is best because: (a) division by PSM (or division by itself) gives only a small bias (difference in means), while division by PSA gives a larger bias; and (b) division by PSM (or division by PSA) gives a smaller Mean Square Error than division of a weekly-based measure by itself.

Closeness of Means & Standard Deviations Distillate Fuel Oil & Total Products: 

Closeness of Means & Standard Deviations Distillate Fuel Oil & Total Products Results are the same as for Finished Motor Gasoline, except that the Mean Square Errors are (as compared to Finished Motor Gasoline) slightly bigger for Total Products and much bigger for Distillate Fuel Oil.

Closeness of Means & Standard Deviations (final slide of this section): 

Closeness of Means & Standard Deviations (final slide of this section) Also, as expected, the results for Formulas 12 to 14 are the averages of the results for the formulas that make up their first 6 months and last 6 months. So, even though I discussed them further in the paper, I will not discuss them in this presentation.

Correlations: 

Correlations For all three products, the correlations of Formula 1 (PSM/PSA) and Formula 2 (PSM/PSM) with the values given by Formula A (the “Gold Standard”) are much higher than the correlations using any of the other formulas.

Correlations (cont'd): 

Correlations (cont'd) In addition, if a weekly-based measurement is used in the numerator, then the correlations are the lowest when a weekly-based measure is divided by itself (Formulas 3, 4, and 5).

Percent of the Time Within ±1% and ±2% of Formula A: 

Percent of the Time Within ±1% and ±2% of Formula A These are important criteria since the users of our data want estimates that are close most of the time. These criteria are just as important as the usual statistical ones already discussed.

Percent of the Time Within ±1% and ±2% of Formula A (cont'd): 

Percent of the Time Within ±1% and ±2% of Formula A (cont'd) The story here is similar to correlations. Formulas 1 and 2 do the best by quite a lot for all products; Formulas 3 to 5 (those with the same weekly-based measure as the numerator and denominator) do the worst.

Percent of the Time Within ±1% and ±2% of Formula A (cont'd): 

Percent of the Time Within ±1% and ±2% of Formula A (cont'd) But, all of the formulas that use a weekly-based measure as the numerator have problems.

Percent of Time Each Formula in the Same Direction as Formula A: 

Percent of Time Each Formula in the Same Direction as Formula A Since we are dealing with yearly growth rates, there are many times when Formula A and all of the other formulas give growth rate estimates of near 0%.

Percent of Time Each Formula in the Same Direction as Formula A (cont'd): 

Percent of Time Each Formula in the Same Direction as Formula A (cont'd) So, sometimes we get the unfortunate situation where Formula A gives a positive rate of growth and another formula gives a negative rate of growth (or vice versa).

Percent of Time Each Formula in the Same Direction as Formula A (cont'd): 

Percent of Time Each Formula in the Same Direction as Formula A (cont'd) That is, Formula A and another formula may give estimates that are very close, yet one indicates positive growth and the other indicates negative growth. This is a problem “psychologically” since people tend to look more at the direction (up or down) than at the magnitude of a growth rate estimate.
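A tiny numeric illustration of this situation, using made-up growth rates. The helper name `same_direction` is an assumption, and it follows the "both positive or both negative" wording, so a rate of exactly zero counts as neither.

```python
def same_direction(estimate, gold):
    """True when both growth rates are positive or both are negative."""
    return (estimate > 0 and gold > 0) or (estimate < 0 and gold < 0)


# Hypothetical growth rates in percent, both near zero.
gold_rate = 0.2
estimate_rate = -0.1

# The estimate is well within 1 percentage point of the Gold Standard ...
close = abs(estimate_rate - gold_rate) <= 1.0

# ... yet it points the other way.
agrees = same_direction(estimate_rate, gold_rate)
```

Here `close` is true while `agrees` is false: the estimate would pass the ±1% criterion but fail the same-direction criterion.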

Percent of Time Each Formula in the Same Direction as Formula A (cont'd): 

Percent of Time Each Formula in the Same Direction as Formula A (cont'd)

Overall Recommendations: 

Overall Recommendations • Estimating a year-to-year growth rate by dividing the PSM value for a certain month for year t by the PSA value for the same month of the previous year (Formula 1) should be avoided.

Overall Recommendations (cont'd): 

Overall Recommendations (cont'd) • Dividing the PSM value for a certain month for year t by the PSM value for the same month of the previous year (Formula 2) is preferable to using any of the monthly averages (that is, MFW, 4wa, or 4wb) derived from weekly data divided by any available denominator (Formulas 3 to 14).

Overall Recommendations (cont'd): 

Overall Recommendations (cont'd) • No recommendation can be made for those situations where it is necessary to use a monthly average in the numerator that is derived from weekly data, since it is a “mixed bag.” See paper for details.

Questions for the Committee: 

Questions for the Committee
1. Any suggestions for new ways to form the numerator and/or denominator when weekly data are used?
2. Other useful criteria for comparing the different formulas?
3. Ideas for investigating seasonality?
4. Any suggestions for looking at concavity (rate of change of the yearly growth rates), remembering that these are discrete (monthly) measurements?

Thank you: 

Thank you Carol Joyce Blumberg Petroleum Division (EI-42) Office of Oil and Gas Carol.Blumberg@eia.doe.gov (202) 586-6565 Fax:(202) 586-4913 (after about May 5)