Dealing with Data Realities - Automation of Evaluation of Data Quality and Estimation of Missing Data for the Everglades Depth Estimation Network (EDEN)

Poster presented July 2009 at the 3rd National Conference on Ecosystem Restoration (NCER)

Paul A. Conrads¹, Matthew D. Petkewich¹, Ruby Daamen², and Edwin A. Roehl, Jr.²

¹USGS South Carolina Water Science Center, Columbia, S.C.
²Advanced Data Mining LLC, Greenville, S.C.

Background

Figure 1. Location of water-level gages in Everglades Depth Estimation Network (EDEN). Water Conservation Areas (WCA) 2 and 3 are subdivided by canals. WCA3A is further subdivided into a northern (WCA3AN) and a southern (WCA3AS) region (from Pearlstine and others, 2007).

The Everglades Depth Estimation Network (EDEN) is an integrated network of 253 real-time water-level gaging stations, ground-elevation models, and water-surface models designed to provide scientists, engineers, and water-resource managers with current (2000-present) water-depth information for the entire freshwater portion of the greater Everglades (fig. 1). A spatially-continuous interpolated water surface across the greater Everglades is generated from daily median water-level values (fig. 2). Missing or erroneous data compromise the quality of the modeled water-surface elevations. To increase the accuracy of the daily water-surface model, two applications were developed to (1) evaluate the data quality at each station and (2) estimate water levels to fill data gaps.

Approach

Two applications were developed to facilitate (1) evaluation of the data and (2) estimation of missing water-level data. The EDEN Data Evaluation Program (EDEN DEP) uses a series of tunable filters, developed from the historical database, to evaluate the data for each site based on historical behavior and on comparison with other sites of similar behavior. Filters include time derivatives, which evaluate various rates of change, and differences with time series from other sites. The EDEN Data Gap Estimation Program (EDEN GAP) uses linear regression models to estimate missing data for each site in the network (Conrads and Petkewich, 2009). So that an estimate can still be made when data from an input site are also missing, three or four regression equations were developed for each site using different input sites.
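
The comparison with other sites can be illustrated with a minimal Python sketch. It is not the EDEN DEP implementation: the function name, the idea of a single "typical offset" between two sites, and the tolerance value are all assumptions made for illustration. The sketch flags readings at a target site whose difference from a reference site departs from the historical offset by more than a tunable tolerance.

```python
def flag_by_site_difference(target, reference, typical_offset, tolerance):
    """Return indices where (target - reference) strays from the typical offset.

    typical_offset and tolerance stand in for tunable filter settings that,
    in this sketch, are assumed to have been derived from historical data.
    """
    flagged = []
    for i, (t, r) in enumerate(zip(target, reference)):
        # Skip time steps where either site has no reading.
        if t is None or r is None:
            continue
        if abs((t - r) - typical_offset) > tolerance:
            flagged.append(i)
    return flagged


# Hypothetical water levels (same units, same datum) at two similar sites.
target = [2.10, 2.12, 2.11, 3.40, 2.13]
reference = [1.60, 1.61, 1.62, 1.62, 1.63]
flagged = flag_by_site_difference(target, reference,
                                  typical_offset=0.50, tolerance=0.25)
print(flagged)
```

Only the fourth reading is flagged in this example, because it is the only one whose difference from the reference site moves away from the assumed offset.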

Figure 2. Example of Everglades Depth Estimation Network (EDEN) water-surface map for a (A) wet season day and (B) dry season day (from Pearlstine and others, 2007). Vertical datum is North American Vertical Datum of 1988.

EDEN Data Gap Estimation Program

To increase the accuracy of the daily water-surface elevation model, linear regression equations were developed to estimate missing data for each gaging station in EDEN (Conrads and Petkewich, 2009). To reduce the cases in which no estimate can be made because data for an input station are missing, a minimum of three linear regression equations was developed for each station using different input stations. For each site, an order was established in which the regression equations are used to fill a data gap.
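
The ordered fallback described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the EDEN GAP code: the station names, the coefficients, and the choice of one input station per equation are hypothetical. The sketch tries the equations for a station in their established order and uses the first one whose input station has data.

```python
# Hypothetical equations for one station, listed in priority order.
# Each entry is (input_station, slope, intercept); all values are made up.
EQUATIONS = {
    "SITE_A": [
        ("SITE_B", 0.98, 0.12),
        ("SITE_C", 1.05, -0.30),
        ("SITE_D", 0.90, 0.45),
    ],
}


def estimate_missing(station, observed):
    """Estimate a missing water level from the first usable equation.

    observed maps station names to water levels, with None for missing data.
    Returns (estimate, input_station), or None if every input is missing.
    """
    for input_station, slope, intercept in EQUATIONS[station]:
        level = observed.get(input_station)
        if level is not None:
            return slope * level + intercept, input_station
    return None


# SITE_B is missing on this day, so the second equation is used.
result = estimate_missing("SITE_A", {"SITE_B": None, "SITE_C": 2.0, "SITE_D": 1.9})
print(result)
```

Recording which input station produced the estimate, as the return value does here, makes it possible to trace each filled value back to the equation behind it.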

The 726 equations were incorporated into a database application (EDEN GAP) that automatically estimates missing record. The performance statistics computed for each equation document the "goodness-of-fit" of the equations (table 1). Although the majority of the equations provide satisfactory estimates of water levels, the performance statistics also help prioritize the stations where improved equations are needed to provide more satisfactory water-level estimates.

Table 1. Minimum, median, and maximum values of the summary statistics for the 726 estimation equations.

[R2, coefficient of determination; RMSE, root mean square error; %, percent]

Statistic              Minimum   Median   Maximum
R2                        0.01     0.94      1.00
Mean error               -0.19     0.00      0.25
RMSE                      0.02     0.17      1.24
Standard error            0.02     0.16      1.04
Nash-Sutcliffe            0.01     0.94      1.00
Percent model error       0.4%     4.7%     21.1%
Percent model bias      -38.3%     0.0%     32.3%
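
Several of the statistics in table 1 can be computed with a short Python sketch. The definitions used here are common textbook forms and are assumptions, since the poster does not give formulas: mean error is taken as estimated minus observed, R2 as the squared Pearson correlation, and Nash-Sutcliffe as one minus the ratio of the error sum of squares to the variance sum of squares of the observations. The percent model error and percent model bias are omitted because their definitions are not stated here.

```python
from math import sqrt


def fit_statistics(observed, estimated):
    """Return mean error, RMSE, R2, and Nash-Sutcliffe for paired series."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    sum_sq_err = sum(err ** 2 for err in errors)

    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    cov = sum((o - mean_obs) * (e - mean_est)
              for o, e in zip(observed, estimated))

    return {
        "mean_error": sum(errors) / n,
        "rmse": sqrt(sum_sq_err / n),
        "r2": cov ** 2 / (ss_obs * ss_est),       # squared Pearson correlation
        "nash_sutcliffe": 1 - sum_sq_err / ss_obs,
    }


# Hypothetical paired values: the estimates run a constant 0.1 high.
stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.1, 4.1])
print(stats)
```

In this example the estimates track the observations perfectly apart from a constant offset, so R2 is 1 while the mean error, RMSE, and Nash-Sutcliffe value all reflect the bias. This is one reason to report several statistics together.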

EDEN Data Evaluation Program

Figure 3. Flow diagram showing the transmission, storage, processing, inferential sensing, and access of data from the water-level data collected in the field to website access by users (modified from Telis, 2006).

Another challenge for EDEN is the need to efficiently detect erroneous data caused by sensor, communication, and other types of hardware failures. Detecting these failures can be time consuming and problematic, especially when data problems, such as a drifting sensor, are not obvious by visual inspection.

The development of the EDEN Data Evaluation Program (EDEN DEP) is the first phase of a project to address these data validation issues by developing an intelligent software application to automate the validation and correction of the data. As shown in figure 3, the software, hereafter referred to as the Inferential Sensor, will reside between the National Water Information System (NWIS) server and EDEN web applications.

As part of Phase I of the development of the Inferential Sensor software, the EDEN DEP was developed to pre-process raw data to prepare it for automated analysis. A series of tunable filters was developed from the EDEN database and is used to evaluate the data for each site based on historical behaviors. Filters include thresholds based on historical values and user specifications, and time derivatives that measure rates of change over specified periods.

Other elements of EDEN DEP are:

  • Import raw data
  • Pre-process raw data to prepare it for automated analyses
  • Validate raw data using a univariate filter analysis
  • Provide for user review of the validated data
  • Save validated data to a Microsoft Access database for use by third-party programs
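
A univariate filter of the kind described above can be sketched in Python. This is a minimal illustration, not the EDEN DEP code: the flag names, the parameter names (low, high, max_step), and the choice to compare each reading with the last accepted value are assumptions. The sketch applies a range threshold and a rate-of-change limit to one site's time series.

```python
def univariate_filter(levels, low, high, max_step):
    """Flag each reading as ok, missing, out_of_range, or rate_of_change.

    low and high are range thresholds and max_step is the largest change
    allowed between consecutive accepted readings. All three are tunable
    settings assumed to come from historical values or user specifications.
    """
    flags = []
    previous = None  # last reading that passed every check
    for value in levels:
        if value is None:
            flags.append("missing")
        elif not (low <= value <= high):
            flags.append("out_of_range")
        elif previous is not None and abs(value - previous) > max_step:
            flags.append("rate_of_change")
        else:
            flags.append("ok")
            previous = value
    return flags


# Hypothetical series with a spike, a gap, and a sudden jump.
levels = [2.00, 2.02, 9.99, 2.03, None, 2.60, 2.05]
flags = univariate_filter(levels, low=0.0, high=5.0, max_step=0.3)
print(flags)
```

Comparing against the last accepted reading, rather than the immediately preceding one, keeps a single bad value from also causing the next good value to be flagged.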

Summary

Data-quality evaluation and estimation of missing data can be a time-consuming process, especially for a network as large as EDEN with 253 gaging stations. To increase the accuracy of the daily water-surface elevation model, two applications were developed to address data-quality issues from the network. One program, EDEN GAP, estimates water levels to fill data gaps. The other program, EDEN DEP, uses a series of tunable filters to validate the data. The two programs effectively and efficiently address data-quality issues by automating many of the processes for data estimation and data validation and will improve the consistency and utility of the EDEN data.

References

Conrads, P.A., and Petkewich, M.D., 2009, Estimation of missing water-level data for the Everglades Depth Estimation Network (EDEN): U.S. Geological Survey Open-File Report 2009-1120, 53 p.

Pearlstine, L., Higer, A., Palaseanu, M., Fujisaki, I., and Mazzotti, F., 2007, Spatially continuous interpolation of water stage and water depths using the Everglades Depth Estimation Network (EDEN): Gainesville, Fla., Institute of Food and Agricultural Sciences, University of Florida, CIR 1521, 18 p., 2 apps.

Telis, P.A., 2006, The Everglades Depth Estimation Network (EDEN) for support of ecological and biological assessments: U.S. Geological Survey Fact Sheet 2006-3087, 4 p.


