USGS
South Florida Information Access
SOFIA home
Help
Projects
by Title
by Investigator
by Region
by Topic
by Program
Results
Publications
Meetings
South Florida Restoration Science Forum
Synthesis
Information
Personnel
About SOFIA
USGS Science Strategy
DOI Greater Everglades Science Plan
Education
Upcoming Events
Data
Data Exchange
Metadata
publications > water resources investigations > report 99-4094 > estimation of nutrient loads and water-quality analyses > model development

Home
Introduction
Study Area
Methods & Procedures
Estimation of Nutrient Loads & Water-Quality Analyses
- Distribution
- Cross-Section Surveys
- Statistical Comparisons
- Model Development
Summary & Conclusions
References
PDF Version

Model Development

An ordinary least-squares regression technique was used to develop predictive equations for the purpose of estimating total nitrogen and total phosphorus loads discharged from the east coast canals to Biscayne Bay. The predictive equations can be used to estimate the value of a dependent variable from observations on a related or independent variable. In this study, load was used as the dependent or response variable and discharge as the independent or explanatory variable. Because discharge is used in the computation of load, linearity and the best fit equation can be established or developed more easily by relating load to discharge rather than concentration to discharge.

Using more than one independent variable, such as stage, rainfall, gate opening, and discharge, multiple linear regression was attempted to improve the predictive equations. However, because discharge is computed from stage and gate openings and is based upon rainfall, collinearity between independent variables precluded this approach. A smoothing procedure, called locally weighted scatterplot smooth (LOWESS), was used by plotting load as a function of discharge to determine the degree of linearity between the two variables. When it was deemed necessary to improve the linear relation between load and discharge, transformations of the independent variable were made based upon the relation of the curve to the Mosteller and Tukey bulging rule (Helsel and Hirsch, 1992, p. 229).

Improvement in the models was based on increases in the adjusted coefficient of determination (R2 ), which explains the amount of variation in the load determined by discharge and reduction in the predicted error sums of squares (PRESS) statistic. The models selected were those having the highest adjusted R2 and the lowest PRESS statistic as well as the lowest root mean square error or standard error of the regression. The null hypothesis of no linear relation between load and discharge was rejected at the 0.05 significance level (alpha symbol level) when the attained significance level (p-value) was less than 0.05.

An important part of model development is residuals analysis. Two assumptions of an ordinary least-squares regression are: (1) variance of the residuals is constant (homoscedastic) over the range of values, and (2) residuals are independent. Plots of predicted values against residuals were examined, and where nonconstant variance (heteroscedasticity) over the range of values was observed, log transformations of the response variables were made. Because of errors in comparing log space with real space, no comparisons were made with this analogy between transformed and nontransformed models based on the adjusted R2 or PRESS statistics.

A key element in model development is regression diagnostics. Basing model adequacy solely on the adjusted R2 may prove to be inadequate because there may be no indication as to whether the data have been well fitted. Examination of data points for leverage, influence, or outliers was required to verify model adequacy. Outliers in the x direction were determined to be significant if they exceeded 3 p/n where p is the number of coefficients and n equals the number of samples (Helsel and Hirsch, 1992, p. 247). Studentized residuals were used to examine outliers in the y direction, and Cook’s D was used to determine influence from outliers. Observations were considered to have high influence if Cook’s D exceeded the value for the F distribution for p +1, n - p at alpha symbol = 0.1 (Helsel and Hirsch, 1992, p. 249). Numerous data values demonstrated high leverage, but only a few data values showed both high leverage and high influence.

The ESTIMATOR Program

Because continuous discharge data are currently being computed and long-term water-quality data are available at site S-26 along Miami Canal, a software program, called ESTIMATOR, was used in the development of nitrogen and phosphorus load models and for the estimation of loads at the site. The ESTIMATOR program develops models and computes loads based on mean daily discharge and water-quality data files. This program was used at site S-26 (formerly part of the NASQAN network as previously discussed) because of the continuous discharge data available and the requirement that about 50 water samples should have been collected over at least a 2-year period. The ESTIMATOR program consists of a seven-parameter log/linear model employing the following regressors: a constant, a quadratic fit to the natural logarithm of discharge, a quadratic fit to time, and a sinusoidal first-order Fourier function to compensate for seasonality. The model is written as:

lnL = beta sub-zero symbol + beta sub-one symbollnQ + beta sub-two symbolln2Q + beta sub-three symbolT + beta sub-four symbolT2 + beta sub-five symbolsin2pi symbolT + beta sub-six symbolcos2pi symbolT , (9) D

where ln is the natural logarithm, L is load, beta sub-zero symbol is the constant, beta sub-one symbolto beta sub-six symbol are the coefficients, Q is discharge, T is decimal time, T2 is decimal time squared, and sin2pi symbolT + cos2pi symbolT are periodic functions.

The ESTIMATOR program employs a minimum variance unbiased estimator for the elimination of bias in the transformation of log space to real space. Regression models for total nitrogen and total phosphorus loads and coefficients statistically significant at the 95 percent confidence intervals for site S-26 are presented in table 4. In addition to the constant, both the logarithms of discharge and discharge squared were significant for the total nitrogen load model, whereas only the logarithm of discharge was significant for the total phosphorus load model. The program estimates monthly and annual mean daily loads. Maximum and minimum monthly mean daily and annual daily loads computed by the ESTIMATOR program at site S-26 for water years 1987 through 1996 are summarized in table 5.

Table 4. Regression models for the estimation of total nitrogen and total phosphorus loads at site S-26 developed from the ESTIMATOR program

[click here to go to Table 4]

Table 5. Summary statistics for the estimation of total nitrogen and total phosphorus loads at site S-26 computed by the ESTIMATOR program for water years 1987-96

[click here to go to Table 5]

Nutrient Loads Estimation Models

Measured nutrient loads were used to develop equations through simple linear regression analyses in order to estimate computed loads based on discharge. The measured data were mainly based on data collected from depth-integrated samples. The nutrient loads and/or discharge were occasionally transformed before simple linear regression techniques were employed to develop the "best fit" models used to estimate the loads. Results of the regression analyses relating load to discharge at the east coast canal sites are presented in table 6. The coefficients of determination (R2 ) in table 4 were adjusted to conform with the number of degrees of freedom in the model.

Table 6. Results of regression analyses relating load to discharge at the east coast canal sites in Miami-Dade County

[click here to go to Table 6]

The adjusted R2 for the total nitrogen load models ranged from 0.69 to 0.99 (table 4), indicating that from 69 to 99 percent of the variation in total nitrogen load is explained by discharge. The maximum adjusted R2 of 0.99 was determined for sites G-58 (table 6) and S-26 (table 4), both in urban areas, and the minimum adjusted R2 of 0.69 was determined for site S-20 (table 6) in a forested/ wetland area. The average adjusted R2 for all the total nitrogen load models was 0.87. Models for five sites (S-20F, S-20G, S-21A, S-25, and S-29) required transformation of the independent variable (discharge) to improve linearity, and models for one site (S-22) required log transformation of the dependent variable (total nitrogen load) due to nonconstant variance of the residuals (heteroscedasticity). All of the total nitrogen load models had p-values less than 0.05, indicating they were statistically significant at an alpha level of 0.05. Plots showing total nitrogen load as a function of discharge at the east coast canal sites are shown in figure 17.

graph of total nitrogen load as a function of discharge at site G-58 graph of total nitrogen load as a function of discharge at site S-21
graph of total nitrogen load as a function of discharge at site S-26 graph of total nitrogen load as a function of discharge at site G-93
graph of total nitrogen load as a function of discharge at site S-21A graph of total nitrogen load as a function of discharge at site S-27
graph of total nitrogen load as a function of discharge at site S-20 graph of total nitrogen load as a function of discharge at site S-22
graph of total nitrogen load as a function of discharge at site S-28 graph of total nitrogen load as a function of discharge at site S-20F
graph of total nitrogen load as a function of discharge at site S-25 graph of total nitrogen load as a function of discharge at site S-29
graph of total nitrogen load as a function of discharge at site S-20G graph of total nitrogen load as a function of discharge at site S-25B
graph of total nitrogen load as a function of discharge at site S-123
Figure 17. Total nitrogen load as a function of discharge at the east coast canal sites in Miami-Dade County [click on each graph to view a larger version].

The adjusted R2 for the total phosphorus load models ranged from 0.23 to 0.99 (table 6), indicating from 23 to 99 percent of the variation in total phosphorus load is explained by discharge. The maximum adjusted R2 of 0.99 was determined for sites S-123 (table 6) and S-26 (table 4) located in urban areas, and the minimum adjusted R2 of 0.23 was determined for site S-21A in an agricultural area (table 6). The average adjusted R2 for all the total phosphorus load models was 0.76, which is lower than that for the total nitrogen load models. Models for five sites (G-58, S-20, S-21, S-21A, and S-25) required transformation of the independent variable (discharge) to improve linearity, and models for six sites (G-93, S-21A, S-22, S-25, S-28, and S-29) required log transformation of the dependent variable (total phosphorus load) due to heteroscedasticity. Most of the total phosphorus load models had p-values less than 0.05; however, the predictive model for site G-58 had a p-value of 0.08, indicating that it is statistically significant only at an alpha level of 0.1. Thus, upper and lower limits of the slope coefficient for site G-58 represents the 90 percent confidence intervals, indicating less predictive power for the model. The total phosphorus load model for site S-21A, indicating an inverse relation between total phosphorus concentration and discharge was not statistically significant at either an alpha level of 0.05 or 0.10 and should not be used for estimating loads. Plots showing total phosphorus load as a function of discharge at the east coast canal sites are presented in figure 18.

graph of total phosphorus load as a function of discharge at site G-58 graph of total phosphorus load as a function of discharge at site S-21
graph of total phosphorus load as a function of discharge at site S-26 graph of total phosphorus load as a function of discharge at site G-93
graph of total phosphorus load as a function of discharge at site S-21A graph of total phosphorus load as a function of discharge at site S-27
graph of total phosphorus load as a function of discharge at site S-20 graph of total phosphorus load as a function of discharge at site S-22
graph of total phosphorus load as a function of discharge at site S-28 graph of total phosphorus load as a function of discharge at site S-20F
graph of total phosphorus load as a function of discharge at site S-25 graph of total phosphorus load as a function of discharge at site S-29
graph of total phosphorus load as a function of discharge at site S-20G graph of total phosphorus load as a function of discharge at site S-25B
graph of total phosphorus load as a function of discharge at site S-123
Figure 18. Total phosphorus load as a function of discharge at the east coast canal sites in Miami-Dade County [click on each graph to view a larger version].

Variation in constituent concentration relative to discharge generally is attributed to two physical processes: dilution or washoff (or a combination of both). Dilution usually occurs as constituents enter a stream or canal from a point source or because of ground-water seepage during discharge. This is reflected as an inverse relation between constituent concentration and discharge, and is common among the major ions. Washoff generally occurs as nutrients enter a stream because of surface runoff from urban or agricultural areas, and in this case, reflects an increase in constituent concentrations with discharge. Variation in phosphorus concentrations, however, is commonly the result of both the dilution and washoff processes. The low adjusted R2 for some total phosphorus load models could be attributed to dilution and washoff occurring simultaneously, resulting in the poor linear relations between load and discharge.

Model Usage and Limitations

map of site locations

Figure 1. Location of the east coast canal sites and subregions in Miami-Dade County [larger image].


The models with the greatest predictive power are those with the highest adjusted R
2 and the lowest root mean square error. Because of project time constraints, data collected during this study span only 2 water years. The predictive power of the models might be enhanced if data were collected for a longer time period, such as 5 or 10 years, because a more representative range of hydrologic conditions is likely to occur over a longer time period. The small number of samples that were collected at some sites reflects the fact that the gates at the control structures rarely open, except during extreme rainfall events. Additionally, loads estimated by the models represent loads computed at the gated control structures on the canals. At some sites, such as S-25, S-25B, and S-26, the canal reaches (fig. 1) consist of several miles of ungaged sections to which both point and nonpoint source nutrient loads could be contributed before the terminus of the reaches at Biscayne Bay. This is particularly true of Miami River, downstream of S-26 along Miami Canal, which drains the urban and commercial areas of downtown Miami. Other processes, such as ground-water seepage, direct surface drainage, and precipitation, can also contribute nutrient loads to Biscayne Bay.

The upper and lower limits of the 95 percent confidence intervals of the slope coefficient for the total nitrogen and total phosphorus load models are presented in table 6. The significance of the 95 percent confidence intervals and the relation to the predictive power of the models is demonstrated by the following example. If the mean daily discharge at site S-28 along Biscayne Canal were 300 ft3/s, the model-predicted total nitrogen load would range from 3.20 to 4.76 tons/d (tons per day) based on the lower and upper limits of the 95 percent confidence interval for the slope coefficient.

A problem can arise with the use of log-transformed models in estimating the mean load for several time periods. If the log-transformed values are summed for a period and the mean is then transformed back into the original units, the estimates tend to be biased low and more closely resemble the median and not the mean. This can be overcome by use of the "smearing estimator" described in Helsel and Hirsch (1992) by the following equation:

equation 10, see long description D

where Yi is the response variable; f -1 is the inverse of selected transformation; b0 is the intercept; bi is the slope coefficient; Xi is the value of x for which y is to be estimated, e i represents residuals, in original units; and n is the number of samples. Use of this equation requires that the residuals be independent and heteroscedastic. As previously mentioned, the ESTIMATOR program (used at site S-26) employs a minimum variance unbiased estimator for the elimination of bias in transforming from log space to real space.

Go back to Statistical Comparisons | Go ahead to Summary and Conclusions



| Disclaimer | Privacy Statement | Accessibility |

U.S. Department of the Interior, U.S. Geological Survey
This page is: http://sofia.usgs.gov/publications/wri/99-4094/modeldev.html
Comments and suggestions? Contact: Heather Henkel - Webmaster
Last updated: 19 January, 2005 @ 02:49 PM (KP)