Sunday, August 3, 2014

2014 - colder and slightly wetter than average US growing conditions

Futures prices for corn have been falling sharply - the December 2014 contract, for example, has lost more than 25% between the beginning of May and the beginning of August (chart from CME), indicating that the market anticipates either increased supply or a drop in demand, likely the former.  Maybe good weather will give the US, which produces 40% of the world's corn, a bumper crop.

Below are the weather updates for 2014 through the end of July.  As in earlier posts, I graphed the cumulative degree days above 29C, which have been shown to be very detrimental to corn growth. This measure counts by how much temperatures exceed 29C and for how long: for example, spending half a day at 33C results in 2 degree days above 29C (0.5 days x 4C). The graphed series is the weighted average over all counties in the United States, where the weight is proportional to expected production (expected yield according to trend times last reported growing area), so areas with higher yields and larger growing areas are weighted more heavily.
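For readers who want the arithmetic spelled out, here is a minimal sketch in Python (the function names and numbers are hypothetical, not the actual code behind the graphs):

```python
import numpy as np

def degree_days_above(temps_c, threshold=29.0):
    """Degree days above `threshold` for one day of evenly spaced
    temperature readings (e.g., hourly): the exceedance of each
    reading, averaged over the day."""
    exceedance = np.maximum(np.asarray(temps_c, float) - threshold, 0.0)
    return exceedance.mean()

# Half a day at 33C and half a day at 25C: 0.5 days x 4C = 2 degree days.
print(degree_days_above([33.0] * 12 + [25.0] * 12))  # 2.0

# Production-weighted US aggregate over counties (weights proportional
# to expected production); county totals here are made up:
county_dd = np.array([10.0, 4.0, 7.0])
expected_production = np.array([2.0, 1.0, 1.0])
print(np.average(county_dd, weights=expected_production))  # 7.75
```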

Grey lines show the historic distribution from 1950-2011, while the last three years are shown in color.  The red line shows how hot July 2012 was - notice its sharp increase in July.  By comparison, 2014 (green line) has the second-lowest cumulative total by the end of July that we have observed in the last 65 years.  This should be great for corn yields, as there hasn't been much damaging heat.

At the same time, it has also been slightly wetter than average.  Since both too little and too much rain are harmful, close-to-average precipitation is good for crops as well.

Taken together, these two graphs suggest that 2014 will be a very plentiful harvest.

Saturday, April 12, 2014

Daily weather data: original vs knock-off

Any study that focuses on nonlinear temperature effects requires precise estimates of the exact temperature distribution.  Unfortunately, most gridded weather data sets only give monthly estimates (e.g., CRU, University of Delaware, and until recently PRISM).  Monthly averages can hide extremes - both hot and cold - as monthly means don't capture how often and by how much temperatures pass a certain threshold.

At the time Michael Roberts and I wrote our article on nonlinear temperature effects in agriculture, the PRISM climate group only made its monthly aggregates publicly available for download, but not the underlying daily data.  We hence reverse-engineered the PRISM interpolation algorithm: we regressed monthly averages at each PRISM grid on monthly averages at the (7 or 10, depending on the version) closest publicly available weather stations.  Once we had the regression estimates linking monthly PRISM averages to weather stations, we bravely applied them to the daily weather data at the stations to get daily data at the PRISM cells (for more detail, see the paper).  Cross-validation suggested we weren't that far off, but then again, we could only do cross-validation tests in areas that have weather stations.
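For the curious, here is a stylized sketch of that two-step idea for a single PRISM grid cell (illustrative only - the function names and the plain OLS setup are assumptions, not our actual replication code):

```python
import numpy as np

def fit_monthly_link(prism_monthly, station_monthly):
    """Step 1: regress monthly averages at one PRISM grid cell
    (n_months,) on monthly averages at the k closest public stations
    (n_months, k). Returns OLS coefficients including an intercept."""
    X = np.column_stack([np.ones(len(prism_monthly)), station_monthly])
    beta, *_ = np.linalg.lstsq(X, prism_monthly, rcond=None)
    return beta

def impute_daily(beta, station_daily):
    """Step 2: apply the monthly coefficients to the stations' daily
    records (n_days, k) to impute daily values at the grid cell."""
    X = np.column_stack([np.ones(len(station_daily)), station_daily])
    return X @ beta
```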

Recently, the PRISM climate group made its daily data available from the 1980s onwards.  I finally got a chance to download them and compare them to the daily data we had previously constructed from monthly averages.  This was quite a nerve-wracking exercise: how far were we off, and does it change the results - or in the worst case, did I screw up the code and get garbage for our previous paper?

Below is a table that summarizes PRISM's daily data for the growing season (April-September) in all counties east of the 100 degree meridian, except Florida, that grow either corn or soybeans - basically the set of counties we used in our study (one small change: our study used 1980-2005, but since PRISM's daily data is only available from 1981 onwards, the tables below use 1981-2012). The summary statistics are:

First sigh of relief! The numbers are rather close (strangely enough, the biggest deviations seem to be for precipitation, even though we used PRISM's monthly aggregates to derive season totals and did not rely on any interpolation - so the new daily PRISM data is a bit different from the old monthly PRISM data). Also, recall from a recent post on the NARR data that degree days above 29C can differ a lot between data sets, as small differences in the daily maximum temperature give vastly different results.

Next, I plugged both data sets into a panel of corn and soybean yields to see which one explains those yields better (i) in sample and (ii) out of sample.  I estimated models using only temperature variables (columns a and b) as well as models using the same four weather variables we used before (columns c and d). PRISM's daily data is used in columns a and c; our re-engineered data are in columns b and d:

Second sigh of relief: the results are rather close again. In all four comparisons, (1b) to (1a), (1d) to (1c), (2b) to (2a), and (2d) to (2c), our reconstruction for some strange reason has a larger in-sample R-square.  The reduction in RMSE is given in the second row of the footer: it is the reduction in out-of-sample prediction error compared to a model with no weather variables. I draw 1000 random samples of 80% of the data as estimation sample and derive the prediction error for the remaining 20%; the reported number is the average over the 1000 draws. For RMSE reductions, the picture is mixed: for the corn models that only include the two degree days variables, the daily PRISM data does slightly better, while the reverse is true for soybeans.  In models that also include precipitation, season-total precipitation seems to do better when I add up the monthly PRISM totals (columns d) rather than the new daily PRISM precipitation totals (columns c).
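For reference, here is a stylized version of how such an RMSE-reduction number can be computed (an illustrative sketch with a plain OLS model; the actual specifications include fixed effects and trends):

```python
import numpy as np

def rmse_reduction(X_weather, X_base, y, n_draws=1000, frac=0.8, seed=0):
    """Average out-of-sample RMSE reduction (in percent) of a model that
    adds weather variables (X_weather) to a baseline (X_base), over
    repeated random 80/20 splits. X_weather and X_base are 2-d arrays."""
    rng = np.random.default_rng(seed)
    n = len(y)

    def oos_rmse(X, train, test):
        Xtr = np.column_stack([np.ones(len(train)), X[train]])
        Xte = np.column_stack([np.ones(len(test)), X[test]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        return np.sqrt(np.mean((y[test] - Xte @ beta) ** 2))

    reductions = []
    for _ in range(n_draws):
        idx = rng.permutation(n)
        train, test = idx[:int(frac * n)], idx[int(frac * n):]
        full = np.column_stack([X_base, X_weather])
        reductions.append(100 * (1 - oos_rmse(full, train, test)
                                 / oos_rmse(X_base, train, test)))
    return np.mean(reductions)
```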

Finally, since the data we constructed is a knock-off, how can it do better than the original in some cases?  My wild guess (and this is really only speculation) is that we took great care in filling in missing data for weather stations to get a balanced panel.  That way we ensured that year-to-year fluctuations are not due to the fact that one averages over a different set of stations.  I am not aware of how exactly PRISM deals with missing weather station data.

Thursday, January 2, 2014

Massetti et al. - Part 3 of 3: Comparison of Degree Day Measures

Yesterday's blog entry outlined the differences between Massetti et al.'s derivation of degree days and our own.  To quickly recap: our measure shows much less variation within a county over the years, i.e., the standard deviation of fluctuations around the mean outcome in a county is about a third of theirs. One possibility is that our measure over-smoothes the year-to-year fluctuations; alternatively, Massetti et al.'s fluctuations might include measurement error, which would result in attenuation bias (paper).

Below are tests comparing various degree day measures in a panel of log corn and soybean yields. It seems preferable to test the predictive power in a panel setting, as one does not have to worry about omitted variable bias (as mentioned before, Massetti et al. did not share their data with us, so we can't match the same controls in a cross-sectional regression of farmland values). We use the optimal degree days bounds from the earlier literature.

The following two tables regress log corn and soybean yields, respectively, for all counties east of the 100 degree meridian (except Florida) in 1979-2011 on four weather variables, state-specific restricted cubic splines with 3 knots, and county fixed effects. Column definitions are the same as in yesterday's post: columns (1a)-(3b) use the NARR data to derive degree days, while column (4b) uses our 2008 procedure. Columns (a) use the approach of Massetti et al. and derive the climate in a county as the inverse-distance weighted average of the four NARR grids surrounding the county centroid.  Columns (b) calculate degree days for each 2.5x2.5 mile PRISM grid within a county (squared inverse-distance weighted average of all NARR grids over the US) and derive the county aggregate as the weighted average of all grids, where the weight is proportional to the cropland area in each grid.

Columns (0a)-(0b) are added as a baseline using a quadratic in growing season average temperature. Columns (1a)-(1b) follow Massetti et al. and first derive average daily temperatures, then calculate degree days from these daily averages, i.e., degree days are only positive if the daily average exceeds the threshold. Columns (2a)-(2b) calculate degree days from each 3-hour reading; degree days will be positive if part of the temperature distribution is above the threshold, even if the daily average is not.  Columns (3a)-(3b) approximate the temperature distribution within a day by linearly interpolating between the 3-hour readings.  Column (4b) uses a sinusoidal approximation between the daily minimum and maximum to approximate the temperature distribution within a day.
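To see why methods (1) and (2) can differ, here is a minimal sketch contrasting the two (the readings are hypothetical; this is not the actual estimation code):

```python
import numpy as np

def dd_from_daily_mean(readings_3h, threshold):
    """Columns (1a)-(1b): average the eight 3-hour readings first, then
    take the exceedance of the daily mean."""
    return max(np.mean(readings_3h) - threshold, 0.0)

def dd_from_3h_readings(readings_3h, threshold):
    """Columns (2a)-(2b): exceedance of each 3-hour reading, each
    reading representing 1/8 of a day."""
    ex = np.maximum(np.asarray(readings_3h, float) - threshold, 0.0)
    return ex.mean()

# Hypothetical 3-hourly readings (deg C) for a day straddling the threshold:
day = [22, 21, 23, 27, 31, 33, 30, 25]  # daily mean = 26.5
print(dd_from_daily_mean(day, 29))   # 0.0: the mean never exceeds 29C
print(dd_from_3h_readings(day, 29))  # 0.875 = (2 + 4 + 1) / 8
```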

Explaining log corn yields 1979-2011.

Explaining log soybean yields 1979-2011.

The R-square is lowest for the regressions using a quadratic in average temperature (0.37 for corn and 0.33 for soybeans).  It is slightly higher when we use degree days based on the NARR data in columns (1a)-(3b), ranging from 0.39-0.41 for corn and 0.35-0.36 for soybeans.  It is much higher when our degree days measure is used in column (4b): 0.51 for corn and 0.48 for soybeans.

The second row of the footer lists the percent reduction in root mean squared error (RMSE) compared to a model with no weather controls (just county fixed effects and state-specific time trends). Weather variables that add nothing would show 0%, while weather measures that explain all remaining variation would reduce the RMSE by 100%.  Column (4b) reduces the RMSE by twice as much as the measures derived from NARR. Massetti et al.'s claim that they introduce "accurate measures of degree days" seems very odd given that their measure performs half as well as previously published measures that we shared with them.

The NARR data set likely includes more measurement error than our previous data set. Papers making comparisons between degree days and average temperature should use the best available degree days construction in order not to bias the test against the degree days model.

Correction (January 30th): An earlier version had a mistake in the code that calculated the RMSE over both the in-sample and out-of-sample observations. The corrected version calculates the RMSE out-of-sample only.  While the reduction in RMSE increased for all columns, the relative comparison between models is not affected.

Wednesday, January 1, 2014

Massetti et al. - Part 2 of 3: Calculation of Degree Days

Following up on yesterday's post, let's look at the differences in how degree days are calculated. Recall that degree days simply count the number of degrees above a threshold, summed over the growing season.  Massetti et al. argue in their abstract that "The paper shows that [...] hypotheses of the degree day literature fail when accurate measures of degree days are used." This claim rests on the premise that Massetti et al. use better data and hence get more accurate readings of degree days; however, no empirical evidence is provided. They use data from the North American Regional Reanalysis (NARR), which provides temperatures at 3-hour intervals. The authors first calculate average temperatures for each day from the eight readings per day, and then calculate degree days as the difference between the average temperature and the threshold.

Before we compare their method of calculating degree days to ours, a few words on the NARR data. Reanalysis data combine observational data with differential equations from physical models to interpolate data. For example, they utilize mass and energy balance, i.e., a certain amount of moisture can only fall once as precipitation: if it comes down in one grid, it can't also come down in a neighboring grid.  On the plus side, the physical models construct an entire series of variables (solar radiation, dew point, fluxes, etc.) that normal weather stations do not measure.  On the downside, the imposed differential equations that relate all weather measures imply that the interpolated data do not always match actual observations.

So how do the degree days in Massetti et al. compare to ours? Here's a little detour on degree days - this is a bit technical and dry, so please be patient.  The first statistical study my coauthors and I published using degree days in 2006 used monthly temperature data, since we did not have daily temperature data at the time.  Since degree days depend on how often a temperature threshold is passed, monthly averages can be a challenge, as a temporal average will hide how many times a threshold is crossed.  The literature has gotten around this problem by estimating an empirical link between the standard deviation of daily and monthly temperatures, called Thom's formula. We used this formula to derive fluctuations in average daily temperatures, and from those, degree days.

Interpolating the temperature distribution when only monthly averages are known is certainly not ideal, and we hence went to great lengths to better approximate the temperature distribution. All of my subsequent work with various coauthors therefore not only looked at the distribution of daily average temperatures within a month, but went one step further by looking at the temperature distribution within a day.  The rationale is that even if the average daily temperature does not cross a threshold, the daily maximum might.  We interpolated daily maximum and minimum temperatures and fit a sinusoidal curve between the two to approximate the distribution within a day (see Snyder). This is again an interpolation and might have its own pitfalls, but one can empirically test whether it improves predictive power, which we did and will do in part 3 of this series.
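For concreteness, here is a discretized sketch of that sinusoidal interpolation (Snyder's method has a closed-form solution; this numerical version is just for illustration, with made-up numbers):

```python
import numpy as np

def dd_sinusoidal(tmin, tmax, threshold, steps=48):
    """Degree days for one day, approximating the within-day temperature
    path by a sine curve between the daily minimum and maximum."""
    t = np.linspace(0.0, 2.0 * np.pi, steps, endpoint=False)
    temps = (tmax + tmin) / 2.0 + (tmax - tmin) / 2.0 * np.sin(t)
    return np.maximum(temps - threshold, 0.0).mean()

# A day averaging 26C never crosses 29C in daily-mean terms, but a
# 20-32C sine curve spends part of the day above the threshold:
print(dd_sinusoidal(20.0, 32.0, 29.0))  # positive despite a 26C mean
```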

Here is my beef with Massetti et al.: our subsequent work in 2008 showed that calculating degree days using the within-day distribution of temperatures is much better.  We even emphasized that in a panel setting average temperatures perform better than degree days derived using Thom's formula (but not in the cross-section, as Thom's approximation is much better at getting the average number of degree days correct than the year-to-year fluctuations around the mean). What I find disingenuous in Massetti et al. is that the paper makes a general statement comparing degree days to average temperature, yet only considers the inferior approach of calculating degree days using Thom's formula.  What makes things worse is that we shared our "better" degree days data that uses the within-day distribution with them (which they acknowledge).
 
Unfortunately, Massetti et al. decided not to share their data with us, so the analysis below uses our construction of their variables.  We downloaded surface temperature from NARR.  The reanalysis data provides temperature readings at several altitude levels above ground; in general, the higher the reading above the ground, the lower the temperatures, which will result in lower degree day numbers.

The following table constructs degree days for counties east of the 100 degree meridian in various ways.  Columns (1a)-(3b) use the NARR data, while column (4b) uses our 2008 procedure. Columns (a) use the approach of Massetti et al. and derive the climate in a county as the inverse-distance weighted average of the four NARR grids surrounding the county centroid.  Columns (b) calculate degree days for each 2.5x2.5 mile PRISM grid within a county (squared inverse-distance weighted average of all NARR grids over the US) and derive the county aggregate as the weighted average of all grids, where the weight is proportional to the cropland area in each grid. Results don't differ much between (a) and (b); a sketch of the two weighting schemes follows below.
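Here is a stylized sketch of the two aggregation schemes (function names and the example inputs are hypothetical):

```python
import numpy as np

def idw_average(values, distances, power=1):
    """Inverse-distance weighted average. power=1 over the four NARR
    grids surrounding the centroid mimics the columns (a) approach;
    power=2 over all NARR grids is the interpolation to PRISM grids
    used in columns (b)."""
    w = 1.0 / np.asarray(distances, float) ** power
    return np.average(np.asarray(values, float), weights=w)

def county_aggregate(grid_values, cropland_area):
    """Columns (b): county value as the cropland-weighted average over
    the PRISM grids lying inside the county."""
    return np.average(np.asarray(grid_values, float),
                      weights=np.asarray(cropland_area, float))

# Hypothetical example: four surrounding NARR grids, distances in km.
print(idw_average([10.2, 11.0, 9.8, 10.5], [12.0, 20.0, 15.0, 30.0]))
```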

Columns (1a)-(1b) follow Massetti et al. and first derive average daily temperatures, then calculate degree days from these daily averages, i.e., degree days are only positive if the daily average exceeds the threshold. Columns (2a)-(2b) calculate degree days from each 3-hour reading; degree days will be positive if part of the temperature distribution is above the threshold, even if the daily average is not.  Columns (3a)-(3b) approximate the temperature distribution within a day by linearly interpolating between the 3-hour readings.  Column (4b) uses a sinusoidal approximation between the daily minimum and maximum to approximate the temperature distribution within a day.
Average temperature and average season-total degree days 8-32C in 1979-2011 are fairly consistent across all columns.  We give the mean outcome in a county as well as two standard deviations: the between standard deviation (in round brackets) is the standard deviation of the average outcome between counties, while the within standard deviation [in square brackets] is the average standard deviation of the year-to-year fluctuations around a county mean. The between standard deviation is fairly consistent across columns, but the within-county standard deviation is much lower for our interpolation in column (4b).
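For clarity, here is how the two standard deviations can be computed from a county-by-year panel (an illustrative sketch with made-up numbers):

```python
import numpy as np

def between_within_sd(panel):
    """panel: dict mapping county -> array of yearly outcomes.
    Returns (between sd, within sd): the sd of county means, and the
    average sd of year-to-year fluctuations around each county mean."""
    means = np.array([np.mean(v) for v in panel.values()])
    within = np.mean([np.std(v, ddof=1) for v in panel.values()])
    return np.std(means, ddof=1), within

# Hypothetical two-county panel:
panel = {"A": np.array([10.0, 12.0, 11.0]),
         "B": np.array([20.0, 19.0, 21.0])}
print(between_within_sd(panel))
```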

As a result of the lower within-county variation, fluctuations are smaller and the threshold is passed less often in column (4b).  Extreme heat as measured by degree days above 29C or 34C is hence lower when the within-day distribution is used in column (4b) compared to columns (2a)-(3b). There are two possible interpretations: either our data is over-smoothing and hence under-predicting the variance, or NARR has measurement error, which will lead to attenuation bias.  We will test both possibilities in part 3 tomorrow.