
|
|
publications > water resources investigations > report 99-4094 > estimation of nutrient loads and water-quality analyses > model development
Model DevelopmentAn ordinary least-squares regression technique was used to develop predictive equations for the purpose of estimating total nitrogen and total phosphorus loads discharged from the east coast canals to Biscayne Bay. The predictive equations can be used to estimate the value of a dependent variable from observations on a related or independent variable. In this study, load was used as the dependent or response variable and discharge as the independent or explanatory variable. Because discharge is used in the computation of load, linearity and the best fit equation can be established or developed more easily by relating load to discharge rather than concentration to discharge. Using more than one independent variable, such as stage, rainfall, gate opening, and discharge, multiple linear regression was attempted to improve the predictive equations. However, because discharge is computed from stage and gate openings and is based upon rainfall, collinearity between independent variables precluded this approach. A smoothing procedure, called locally weighted scatterplot smooth (LOWESS), was used by plotting load as a function of discharge to determine the degree of linearity between the two variables. When it was deemed necessary to improve the linear relation between load and discharge, transformations of the independent variable were made based upon the relation of the curve to the Mosteller and Tukey bulging rule (Helsel and Hirsch, 1992, p. 229). Improvement in the models was based on increases in the adjusted coefficient of determination (R2 ), which explains the amount of variation in the load determined by discharge and reduction in the predicted error sums of squares (PRESS) statistic. The models selected were those having the highest adjusted R2 and the lowest PRESS statistic as well as the lowest root mean square error or standard error of the regression. The null hypothesis of no linear relation between load and discharge was rejected at the 0.05 significance level ( An important part of model development is residuals analysis. Two assumptions of an ordinary least-squares regression are: (1) variance of the residuals is constant (homoscedastic) over the range of values, and (2) residuals are independent. Plots of predicted values against residuals were examined, and where nonconstant variance (heteroscedasticity) over the range of values was observed, log transformations of the response variables were made. Because of errors in comparing log space with real space, no comparisons were made with this analogy between transformed and nontransformed models based on the adjusted R2 or PRESS statistics. A key element in model development is regression diagnostics. Basing model adequacy solely on the adjusted R2 may prove to be inadequate because there may be no indication as to whether the data have been well fitted. Examination of data points for leverage, influence, or outliers was required to verify model adequacy. Outliers in the x direction were determined to be significant if they exceeded 3 p/n where p is the number of coefficients and n equals the number of samples (Helsel and Hirsch, 1992, p. 247). Studentized residuals were used to examine outliers in the y direction, and Cooks D was used to determine influence from outliers. Observations were considered to have high influence if Cooks D exceeded the value for the F distribution for p +1, n - p at The ESTIMATOR ProgramBecause continuous discharge data are currently being computed and long-term water-quality data are available at site S-26 along Miami Canal, a software program, called ESTIMATOR, was used in the development of nitrogen and phosphorus load models and for the estimation of loads at the site. The ESTIMATOR program develops models and computes loads based on mean daily discharge and water-quality data files. This program was used at site S-26 (formerly part of the NASQAN network as previously discussed) because of the continuous discharge data available and the requirement that about 50 water samples should have been collected over at least a 2-year period. The ESTIMATOR program consists of a seven-parameter log/linear model employing the following regressors: a constant, a quadratic fit to the natural logarithm of discharge, a quadratic fit to time, and a sinusoidal first-order Fourier function to compensate for seasonality. The model is written as: lnL = where ln is the natural logarithm, L is load, The ESTIMATOR program employs a minimum variance unbiased estimator for the elimination of bias in the transformation of log space to real space. Regression models for total nitrogen and total phosphorus loads and coefficients statistically significant at the 95 percent confidence intervals for site S-26 are presented in table 4. In addition to the constant, both the logarithms of discharge and discharge squared were significant for the total nitrogen load model, whereas only the logarithm of discharge was significant for the total phosphorus load model. The program estimates monthly and annual mean daily loads. Maximum and minimum monthly mean daily and annual daily loads computed by the ESTIMATOR program at site S-26 for water years 1987 through 1996 are summarized in table 5.
Nutrient Loads Estimation ModelsMeasured nutrient loads were used to develop equations through simple linear regression analyses in order to estimate computed loads based on discharge. The measured data were mainly based on data collected from depth-integrated samples. The nutrient loads and/or discharge were occasionally transformed before simple linear regression techniques were employed to develop the "best fit" models used to estimate the loads. Results of the regression analyses relating load to discharge at the east coast canal sites are presented in table 6. The coefficients of determination (R2 ) in table 4 were adjusted to conform with the number of degrees of freedom in the model.
The adjusted R2 for the total nitrogen load models ranged from 0.69 to 0.99 (table 4), indicating that from 69 to 99 percent of the variation in total nitrogen load is explained by discharge. The maximum adjusted R2 of 0.99 was determined for sites G-58 (table 6) and S-26 (table 4), both in urban areas, and the minimum adjusted R2 of 0.69 was determined for site S-20 (table 6) in a forested/ wetland area. The average adjusted R2 for all the total nitrogen load models was 0.87. Models for five sites (S-20F, S-20G, S-21A, S-25, and S-29) required transformation of the independent variable (discharge) to improve linearity, and models for one site (S-22) required log transformation of the dependent variable (total nitrogen load) due to nonconstant variance of the residuals (heteroscedasticity). All of the total nitrogen load models had p-values less than 0.05, indicating they were statistically significant at an alpha level of 0.05. Plots showing total nitrogen load as a function of discharge at the east coast canal sites are shown in figure 17.
The adjusted R2 for the total phosphorus load models ranged from 0.23 to 0.99 (table 6), indicating from 23 to 99 percent of the variation in total phosphorus load is explained by discharge. The maximum adjusted R2 of 0.99 was determined for sites S-123 (table 6) and S-26 (table 4) located in urban areas, and the minimum adjusted R2 of 0.23 was determined for site S-21A in an agricultural area (table 6). The average adjusted R2 for all the total phosphorus load models was 0.76, which is lower than that for the total nitrogen load models. Models for five sites (G-58, S-20, S-21, S-21A, and S-25) required transformation of the independent variable (discharge) to improve linearity, and models for six sites (G-93, S-21A, S-22, S-25, S-28, and S-29) required log transformation of the dependent variable (total phosphorus load) due to heteroscedasticity. Most of the total phosphorus load models had p-values less than 0.05; however, the predictive model for site G-58 had a p-value of 0.08, indicating that it is statistically significant only at an alpha level of 0.1. Thus, upper and lower limits of the slope coefficient for site G-58 represents the 90 percent confidence intervals, indicating less predictive power for the model. The total phosphorus load model for site S-21A, indicating an inverse relation between total phosphorus concentration and discharge was not statistically significant at either an alpha level of 0.05 or 0.10 and should not be used for estimating loads. Plots showing total phosphorus load as a function of discharge at the east coast canal sites are presented in figure 18.
Variation in constituent concentration relative to discharge generally is attributed to two physical processes: dilution or washoff (or a combination of both). Dilution usually occurs as constituents enter a stream or canal from a point source or because of ground-water seepage during discharge. This is reflected as an inverse relation between constituent concentration and discharge, and is common among the major ions. Washoff generally occurs as nutrients enter a stream because of surface runoff from urban or agricultural areas, and in this case, reflects an increase in constituent concentrations with discharge. Variation in phosphorus concentrations, however, is commonly the result of both the dilution and washoff processes. The low adjusted R2 for some total phosphorus load models could be attributed to dilution and washoff occurring simultaneously, resulting in the poor linear relations between load and discharge. Model Usage and Limitations
Figure 1. Location of the east coast canal sites and subregions in Miami-Dade County [larger image].
The upper and lower limits of the 95 percent confidence intervals of the slope coefficient for the total nitrogen and total phosphorus load models are presented in table 6. The significance of the 95 percent confidence intervals and the relation to the predictive power of the models is demonstrated by the following example. If the mean daily discharge at site S-28 along Biscayne Canal were 300 ft3/s, the model-predicted total nitrogen load would range from 3.20 to 4.76 tons/d (tons per day) based on the lower and upper limits of the 95 percent confidence interval for the slope coefficient. A problem can arise with the use of log-transformed models in estimating the mean load for several time periods. If the log-transformed values are summed for a period and the mean is then transformed back into the original units, the estimates tend to be biased low and more closely resemble the median and not the mean. This can be overcome by use of the "smearing estimator" described in Helsel and Hirsch (1992) by the following equation: where Yi is the response variable; f -1 is the inverse of selected transformation; b0 is the intercept; bi is the slope coefficient; Xi is the value of x for which y is to be estimated, e i represents residuals, in original units; and n is the number of samples. Use of this equation requires that the residuals be independent and heteroscedastic. As previously mentioned, the ESTIMATOR program (used at site S-26) employs a minimum variance unbiased estimator for the elimination of bias in transforming from log space to real space.
|
||||||||||||||||||||||||||||||||||||||||
U.S. Department of the Interior, U.S. Geological Survey
This page is: http://sofia.usgs.gov/publications/wri/99-4094/modeldev.html
Comments and suggestions? Contact: Heather Henkel - Webmaster
Last updated: 19 January, 2005 @ 02:49 PM (KP)