USGS - science for a changing world

South Florida Information Access (SOFIA)


Project Work Plan

U.S. Geological Survey, Greater Everglades Priority Ecosystems Science (GE PES)

Fiscal Year 2005 Study Work Plan

Study Title: Hydrology Monitoring Network: Data Mining and Modeling to Separate Human and Natural Hydrologic Dynamics
Study Start Date: 10/01/2004 Study End Date: 9/30/2007
Web Sites:
Location (Subregions, Counties, Park or Refuge): Total System
Funding Source: USGS Greater Everglades Priority Ecosystems Science (GE PES)
Other Complementary Funding Source(s): none
Principal Investigator(s): Paul Conrads
Study Personnel: Paul Conrads, Ed Roehl, Whitney Stringfield
Supporting Organizations:
Associated / Linked Studies: Freshwater Inflows to Northeastern Florida Bay (Hittle, PI), Estimation of Critical Parameters in Conjunction with Monitoring of the Florida Snail Kite Population (Wiley Kitchens, PI)

Overview & Objective(s):

New technologies in environmental monitoring have made it cost effective to acquire tremendous amounts of hydrologic and water-quality data. Although these data are a valuable resource for understanding environmental systems, they are seldom analyzed thoroughly. The monitoring networks supported by the Comprehensive Everglades Restoration Plan (CERP) record tremendous amounts of data each day, and the data base contains millions of data points describing the response of the system to changing conditions. To enhance the evaluation of the CERP data base, there is an immediate need to apply new methodologies that systematically analyze the data set to answer critical questions, such as the relative impacts of controlled freshwater releases, tidal dynamics, and meteorological forcing on streamflow, water level, and salinity. There also is a need to integrate longer-term hydrologic data with shorter-term hydrologic data collected for biological resource studies. This study will be undertaken as a series of pilot studies to demonstrate the efficacy of data mining techniques for evaluating CERP data and addressing hydrologic issues important to DOI's efforts in South Florida. In addition, a preliminary assessment of the complete set of hydrologic data networks will be conducted to identify data for further integration and analysis using data mining techniques.

The objectives of the study include: (1) integration of hydrologic analysis and synthesis with biological studies; (2) separation of water-level, streamflow, and salinity time series into natural (tidal, climatic) and anthropogenic components; and (3) identification of additional areas where the application of data mining techniques can address DOI science needs in South Florida.

Specific Relevance to Major Unanswered Questions and Information Needs Identified: (Page numbers below refer to the DOI Science Plan.)

This study addresses research needs described in the Science Plan in Support of Ecosystem Restoration, Preservation, and Protection in South Florida (May 2004). Specifically, the study supports the Water Conservation Area 3 Decompartmentalization and Sheetflow Enhancement Project (DECOMP) by addressing the science needed for “...additional research to understand the effects of different hydrologic regimes and ecological processes on restoring and maintaining ecosystem function…” (p. 64), and it supports ecological studies of the impacts of hydrologic change on Everglade snail kite habitat. The study also supports the Combined Structural and Operational Plan project (CSOP) by addressing the science needed for “…refinement of hydrologic targets and operating protocols” (p. 63).


New study; initial year of funding

Recent Products:


Planned Products:

Major products include (1) data bases of the measured and derived hydrologic data that will be used for integration with the Everglades snail kite study and analysis of freshwater inflows; (2) artificial neural network (ANN) models used to hindcast long-term water level response at 20 sites in WCA 3a; (3) ANN models used to analyze freshwater inflows for natural and anthropogenic components; and (4) a summary document describing the assessment of data networks for further integration and analysis using data mining techniques.


Title of Task 1: Integration of Long-term Hydrologic Data with Snail Kite Study
Task Funding: USGS Greater Everglades Priority Ecosystems Science (GE PES)
Task Leaders: Paul Conrads
Phone: (803) 750-6140
FAX: (803) 750-6181
Task Status (proposed or active):
Task priority: high
Task Personnel: Paul Conrads, Ed Roehl, and Whitney Stringfield

Task Summary and Objectives:

The monitoring network for the snail kite study consists of an array of 20 continuous water-level monitors established to characterize differences in hydrology across the study area. To maximize the information content of these data, empirical models based on data mining techniques will be developed to (1) predict the response of water levels at the three long-term water-level stations to changing hydrologic inputs, and (2) predict the water levels at the 20 short-term monitoring stations. Once these models are complete, the periods of record of the short-term monitoring stations can be extended to be concurrent with those of the three long-term stations. This hydrologic record extension will allow researchers to analyze water depths and hydroperiods over a wide range of hydrologic conditions and to integrate long-term ecological data with the extended hydrologic data.

Work to be undertaken during the proposal year and a description of the methods and procedures:

To simulate the water level response at the 20 short-term snail kite monitoring sites, ANN models will be developed using long-term water-level data in the study area. The ANNs will then be used to extend the period of record of the short-term monitoring sites. The steps to be taken are described below.
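To make the idea concrete, the sketch below shows a minimal feed-forward ANN of the general kind described: one hidden layer of tanh units with a linear output, trained by stochastic gradient descent on squared error. It is a hypothetical illustration, not the study's model; the class `TinyMLP`, the helper `mse`, the network size, learning rate, and epoch count are all assumptions, and real inputs would be scaled water levels and other explanatory variables.

```python
import math
import random


class TinyMLP:
    """Hypothetical one-hidden-layer network: inputs -> tanh hidden units -> linear output."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)  # fixed seed so training is reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        # Hidden activations, then a weighted sum for the single output.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def predict(self, x):
        return self._forward(x)[1]

    def train(self, inputs, targets, lr=0.05, epochs=500):
        # Plain stochastic gradient descent on 0.5 * (prediction - target)^2.
        for _ in range(epochs):
            for x, t in zip(inputs, targets):
                h, y = self._forward(x)
                err = y - t
                for j, hj in enumerate(h):
                    # Back-propagate through tanh using the output weight before it is updated.
                    grad_hidden = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_hidden * xi
                    self.b1[j] -= lr * grad_hidden
                self.b2 -= lr * err


def mse(model, inputs, targets):
    """Mean squared error of the model's predictions."""
    return sum((model.predict(x) - t) ** 2 for x, t in zip(inputs, targets)) / len(targets)
```

In use, scaled long-term station levels would be the inputs and a short-term site's levels the targets over their overlapping period; a real application would also hold out data to check the model on days it was not trained on.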

Step 1. Data Compilation and Merging: Hydrologic, meteorological, and operational data from the USGS, the National Weather Service, and other databases will be merged and time synchronized. Variables of interest include river flows, freshwater releases, water levels, specific conductance, wind direction and speed, and rainfall.
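A minimal sketch of the merge-and-synchronize step, assuming each source delivers (local timestamp, value) pairs at a known, fixed UTC offset. The function names `to_utc` and `merge_sources` are hypothetical; the outer join keeps every timestamp and marks absent values as `None` rather than guessing them.

```python
from datetime import datetime, timedelta, timezone


def to_utc(records, utc_offset_hours):
    """Convert (naive local timestamp, value) pairs to {UTC timestamp: value}.

    Assumes the source reports every timestamp at one fixed UTC offset.
    """
    local = timezone(timedelta(hours=utc_offset_hours))
    return {ts.replace(tzinfo=local).astimezone(timezone.utc): value for ts, value in records}


def merge_sources(sources):
    """Outer-join {variable name: {UTC timestamp: value}} into time-ordered rows.

    Each row is (timestamp, {variable name: value or None}); gaps stay None so that
    later steps can interpolate or synthesize them explicitly.
    """
    all_times = sorted(set().union(*(series.keys() for series in sources.values())))
    return [(t, {name: series.get(t) for name, series in sources.items()}) for t in all_times]


if __name__ == "__main__":
    # Hypothetical example: a stage record kept at UTC-5 and a rain gage reported in UTC.
    stage = to_utc([(datetime(2004, 10, 1, 0, 0), 2.1), (datetime(2004, 10, 1, 1, 0), 2.2)], -5)
    rain = to_utc([(datetime(2004, 10, 1, 5, 0), 0.3)], 0)
    for row in merge_sources({"stage": stage, "rain": rain}):
        print(row)
```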

Step 2. Data Preparation: Methods that maximize the information content of the raw data, while diminishing the influence of poor or missing measurements, will be applied. Signal (time-series) processing methods include clustering, filtering, spectral decomposition, estimation of data characteristics and time delays, and synthesis of missing data. Signal processing transforms the “raw” data into “pre-processed” data for analysis and modeling. The data collected from the agencies have different sampling frequencies, ranging from every 15 minutes to once per month, so the variables must be “time-merged” either by interpolating between less frequent measurements or by averaging frequent samples to obtain fewer values.
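The two time-merging strategies named above can be sketched as follows: averaging frequent (for example, 15-minute) readings into daily means, and linearly interpolating infrequent (for example, monthly) samples onto daily time steps. This is an illustrative sketch only; the daily target step, linear interpolation, and the function names `daily_means` and `interpolate_daily` are assumptions rather than the study's documented procedure.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_means(samples):
    """Average frequent (timestamp, value) samples into one value per calendar day."""
    buckets = defaultdict(list)
    for ts, value in samples:
        if value is not None:  # skip missing readings instead of averaging them in
            buckets[ts.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


def interpolate_daily(sparse, days):
    """Linearly interpolate infrequent {date: value} samples onto the requested days.

    Days outside the span of the sparse record return None (no extrapolation).
    """
    known = sorted(sparse.items())
    out = {}
    for day in days:
        out[day] = sparse.get(day)
        if out[day] is not None:
            continue
        for (d0, v0), (d1, v1) in zip(known, known[1:]):
            if d0 < day < d1:
                frac = (day - d0).days / (d1 - d0).days
                out[day] = v0 + frac * (v1 - v0)
                break
    return out


if __name__ == "__main__":
    readings = [(datetime(2005, 1, 1, 0, m), 1.0 + m / 60) for m in (0, 15, 30, 45)]
    print(daily_means(readings))
    print(interpolate_daily({date(2005, 1, 1): 10.0, date(2005, 1, 31): 40.0},
                            [date(2005, 1, 11)]))
```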

Another signal processing task is “signal decomposition.” The complex behavior of variables in a natural system results from interactions among multiple physical forces. Signal decomposition uses digital filtering to split a signal into sub-signals, called “components,” that are independently attributable to different physical forces. Some components are periodic and some are chaotic. The filtering method of choice is frequency-domain filtering, which is applied to a signal after it has been converted into a frequency distribution by a Fast Fourier Transform. This allows a signal component that lies within a window of frequencies to be excised, analyzed, and modeled independently of other components; for example, the 12.4-hour tidal cycle lies within the window of periods from 12.0 to 13.0 hours. Digital filtering also can diminish the effect of noise in a signal, increasing the amount of useful information the signal contains. Working from filtered signals makes the modeling process more efficient, precise, and accurate.
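The frequency-domain filtering described above can be sketched in a few lines. To stay dependency-free, this hypothetical example uses a direct discrete Fourier transform, which yields the same spectrum as an FFT but more slowly; a real application would use an FFT library. It keeps only the frequency bins whose periods fall inside a chosen window and transforms back, assuming an evenly sampled, gap-free, real-valued series. The function names are assumptions.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (same result as an FFT, O(n^2) for clarity)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_pass_by_period(x, dt_hours, min_period_hours, max_period_hours):
    """Keep only frequency bins whose period lies within [min, max] hours; zero the rest."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        cycles = min(k, n - k)  # bins k and n-k share the same physical frequency
        period = math.inf if cycles == 0 else n * dt_hours / cycles
        if not (min_period_hours <= period <= max_period_hours):
            spectrum[k] = 0
    return [value.real for value in idft(spectrum)]
```

For an hourly series of 124 values, bin 10 corresponds to a period of 124/10 = 12.4 hours, so `band_pass_by_period(series, 1.0, 12.0, 13.0)` would retain that tidal component and discard the mean and components at other periods. Real records rarely line up exactly with a bin, so some tidal energy leaks into neighboring bins and the window has to be chosen with that in mind.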

Step 3. Correlation Analysis and Sensitivity Estimation: Correlation analysis quantifies the relationships among many variables and provides a deeper understanding of the data. The computer systematically correlates parameters of interest, such as water level, with combinations of the controlled and uncontrolled variables that influence them, such as river flows, controlled releases, and meteorological conditions. Correlation methods based on statistics and machine learning are applied in combination, and the results found by the computer are validated by comparing them with known patterns of behavior. Correlation analysis identifies:

  1. Relative impact - For example, “What are the relative impacts of operational releases and tidal forcing on water levels?”
  2. Relationships between controlled (river flows) and uncontrolled variables (meteorology and tidal forcing).
  3. Quantifiable answers to complex questions - For example, “What is the critical temporal relationship between the freshwater releases and water levels at the long-term stations and short-term stations?”
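As a hedged illustration of the kind of statistic behind question 3, the sketch below computes the Pearson correlation and searches for the time lag at which a response series (such as water level at a station) best correlates with a driver (such as freshwater releases). The function names and the simple exhaustive lag search are assumptions; the plan combines statistical and machine-learning methods that are not specified here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lag(driver, response, max_lag):
    """Return (lag, r) maximizing correlation when `response` trails `driver` by `lag` steps."""
    best = (0, pearson(driver, response))
    for lag in range(1, max_lag + 1):
        # Pair driver[t] with response[t + lag].
        r = pearson(driver[:-lag], response[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


if __name__ == "__main__":
    # Synthetic example: the response repeats the driver three time steps later.
    rng = random.Random(1)
    releases = [rng.gauss(0.0, 1.0) for _ in range(200)]
    levels = [0.0, 0.0, 0.0] + releases[:-3]
    print(best_lag(releases, levels, max_lag=10))
```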

Step 4. Predictive Modeling: Using machine learning, a predictive model is developed directly from the data and correlations determined in Steps 2 and 3. To maximize accuracy, the model is constructed from sub-models that independently correlate the periodic and chaotic components. Their outputs are combined to obtain an overall prediction that reflects all of the forcing functions, represented by input variables, that affect the output variables.
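One way to picture the sub-model structure described above is a model whose prediction is the sum of independently fitted component models. In this hypothetical sketch, ordinary least-squares lines stand in for the ANN sub-models, each sub-model uses one named input, and the output is assumed to have already been split into components (for example, by frequency-domain filtering as in Step 2). The names `fit_line` and `CombinedModel` are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ a*x + b; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    return lambda v: a * v + b


class CombinedModel:
    """Sum the outputs of sub-models, each fitted to one component of the output signal."""

    def __init__(self):
        self.parts = []  # list of (input name, fitted sub-model)

    def add(self, input_name, inputs, component):
        # Fit one sub-model relating a single input to its share of the output.
        self.parts.append((input_name, fit_line(inputs, component)))

    def predict(self, row):
        # The overall prediction reflects every forcing represented by a sub-model.
        return sum(model(row[name]) for name, model in self.parts)


if __name__ == "__main__":
    model = CombinedModel()
    model.add("tide", [0, 1, 2, 3, 4], [0, 2, 4, 6, 8])                    # periodic part
    model.add("release", [1, 3, 2, 5, 4], [1.5, 2.5, 2.0, 3.5, 3.0])       # chaotic part
    print(model.predict({"tide": 2, "release": 3}))
```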

Step 5. Develop Long-Term Water-Level Data: Many of the water-level gages needed to relate hydrologic response to the snail kite were operated for only a limited number of years. Long-term hindcasts of water level covering the period of record of the long-term monitors will be produced using the data and correlations from Steps 2 and 3 and the predictive models from Step 4.
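A minimal sketch of record extension, assuming daily water levels keyed by date at one long-term station and one short-term site. A least-squares line fitted on the overlapping days stands in for the ANN model; measured short-term values are kept, and only the remaining long-term days receive hindcast estimates, each flagged so users can tell measured from estimated data. The names `fit_line` and `extend_record` are hypothetical.

```python
from datetime import date


def fit_line(x, y):
    """Ordinary least-squares fit y ~ a*x + b; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return lambda v: a * v + (my - a * mx)


def extend_record(long_term, short_term):
    """Hindcast a short-term site over the long-term station's period of record.

    Returns {day: (value, is_measured)} for every day in the long-term record.
    """
    overlap = sorted(set(long_term) & set(short_term))
    if len(overlap) < 2:
        raise ValueError("need at least two overlapping days to fit a model")
    model = fit_line([long_term[d] for d in overlap], [short_term[d] for d in overlap])
    extended = {}
    for day in sorted(long_term):
        if day in short_term:
            extended[day] = (short_term[day], True)  # keep the measurement
        else:
            extended[day] = (model(long_term[day]), False)  # model estimate
    return extended


if __name__ == "__main__":
    lt = {date(2000, 1, d): 2.0 + 0.1 * d for d in range(1, 11)}
    st = {d: 0.8 * v - 1.0 for d, v in lt.items() if d.day >= 6}
    for day, (value, measured) in extend_record(lt, st).items():
        print(day, round(value, 3), "measured" if measured else "hindcast")
```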

Step 6. Identify Appropriate Applications for Dissemination of Results: The results from the data mining applications can be disseminated to end users in many ways. The ANN models can be embedded into an Excel application and coupled with an optimization routine to analyze the effects of operational practices on water levels and freshwater inflows. For other analyses, a response surface viewer can be distributed that allows scientists to interrogate the ANN models and analyze the interactions of various explanatory variables on a response variable. Other advanced visualization techniques, including 2-D and 3-D animations, can be used to view model output. The appropriate application for disseminating the results will be discussed with the PIs of the Everglades snail kite studies for development during Year 2 (2006).
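The response-surface idea can be illustrated with a small, hypothetical helper that evaluates a trained model over a grid of two explanatory variables while holding the others fixed, plus a naive grid search standing in for an optimization routine. The model is assumed to be any callable that maps a dict of named inputs to a prediction; the function names, the example model, and the grid-search approach are assumptions, not the tools planned for the study.

```python
def response_surface(model, x_name, x_values, y_name, y_values, fixed):
    """Evaluate `model` on a grid of two inputs, holding all other inputs fixed.

    surface[i][j] is the prediction with y_name = y_values[i] and x_name = x_values[j].
    """
    surface = []
    for y in y_values:
        row = []
        for x in x_values:
            inputs = dict(fixed)
            inputs[x_name] = x
            inputs[y_name] = y
            row.append(model(inputs))
        surface.append(row)
    return surface


def best_setting(model, name, candidates, fixed, target):
    """Grid search: the candidate value of one controllable input whose prediction is closest to target."""
    return min(candidates, key=lambda c: abs(model({**fixed, name: c}) - target))


if __name__ == "__main__":
    # Hypothetical stand-in for a trained ANN: water level from release, rain, and a base level.
    level = lambda v: 0.01 * v["release"] + 0.5 * v["rain"] + v["base"]
    for row in response_surface(level, "release", [0, 100, 200], "rain", [0, 1], {"base": 1.0}):
        print(row)
    print(best_setting(level, "release", [0, 50, 100, 150, 200], {"rain": 0, "base": 1.0}, 2.4))
```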

Step 7. Present Results at Appropriate Technical Conferences: The results of the project will be documented in manuscripts for proceedings papers or journal articles and presented at appropriate regional and national environmental or neural-network conferences.

Specific Task Product(s):

  1. Data base of long-term and short-term hydrologic data and derived variables (April 2005).
  2. ANN models used to hindcast the 20 short-term monitoring sites to the period of record of the long-term sites (July 2005).
  3. Manuscript summarizing development and application of ANN models (August 2005).

Title of Task 2: Analysis of Water Level, Streamflow, and Salinity Signals
Task Funding: USGS Greater Everglades Priority Ecosystems Science (GE PES)
Task Leaders: Paul Conrads
Phone: (803) 750-6140
FAX: (803) 750-6181
Task Status: Active (first year)
Task priority: High
Task Personnel: Paul Conrads, Ed Roehl, and Whitney Stringfield

Task Summary and Objectives:

The freshwater inflow dynamics into Florida Bay and the rivers of Everglades National Park constantly integrate various changing conditions, such as low-gradient streamflows, semidiurnal tides, and meteorological forcing. Only some of these forces are controlled by operational practices. By using data mining techniques to systematically decompose the time series and decorrelate variables, high-fidelity empirical models will be developed to help analyze the relative contributions of the major forces to the streamflow and salinity dynamics of the system.
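Decorrelating explanatory variables can be done in several ways; one simple approach, assumed here for illustration, replaces a variable with the residuals of its least-squares regression on another variable (for example, a water level that partly reflects upstream streamflow). The residuals carry only the information not already contained in the second variable, so both can enter a model without duplicating each other. The function name `decorrelate` is hypothetical.

```python
def decorrelate(x, z):
    """Return the part of x not linearly explained by z (least-squares residuals).

    The residuals have zero mean and zero sample covariance with z.
    """
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    slope = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sum((b - mz) ** 2 for b in z)
    # Residual = x minus its fitted value mx + slope * (z - mz).
    return [(a - mx) - slope * (b - mz) for a, b in zip(x, z)]


if __name__ == "__main__":
    flow = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    stage = [2.1, 3.8, 6.05, 8.3, 9.9, 12.0]
    print(decorrelate(stage, flow))
```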

Work to be undertaken during the proposal year and a description of the methods and procedures:

To understand the relationships among streamflows, tidal forcing, water levels, meteorological forces, controlled freshwater releases, and salinity, artificial neural network models will be developed to predict the parameters of interest at various locations. The steps will be very similar to those for Task 1, except that in Step 5 the predictive models developed in Step 4 will be used to analyze the relative contributions of controlled releases and natural conditions to freshwater inflows.
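A hedged sketch of how a trained model might be used to apportion predicted inflow between controlled releases and natural conditions: run the model with observed inputs and again with the release input set to a baseline scenario, and take the difference at each time step. The function name `release_contribution`, the dict-of-inputs model interface, and the zero-release baseline are assumptions.

```python
def release_contribution(model, inputs, release_name, baseline=0.0):
    """Estimate how much of each prediction is attributable to controlled releases.

    For every time step, compares the prediction from observed inputs with the
    prediction from the same inputs but with the release set to `baseline`.
    """
    contributions = []
    for row in inputs:
        actual = model(row)
        scenario = model({**row, release_name: baseline})
        contributions.append(actual - scenario)
    return contributions


if __name__ == "__main__":
    # Hypothetical stand-in for a trained ANN predicting inflow.
    inflow = lambda v: 0.002 * v["release"] + 0.3 * v["tide"] + 0.1 * v["wind"]
    rows = [{"release": 500, "tide": 1.0, "wind": 2.0},
            {"release": 0, "tide": -1.0, "wind": 5.0}]
    print(release_contribution(inflow, rows, "release"))
```

With a nonlinear model such as an ANN, the attributed share depends on the chosen baseline and on interactions with the other inputs, so results would need to be reported together with the scenario definition.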

Specific Task Product(s):

  1. Data base of water levels, flow, salinity, and controlled release time series and derived variables used for analysis (June 2005).
  2. ANN models used to analyze freshwater inflows for natural and anthropogenic components (August 2005).
  3. Manuscript summarizing development and application of ANN models (September 2005).

Title of Task 3: Assessment of Hydrologic Data Networks for Further Analysis and Integration
Task Funding: USGS Greater Everglades Priority Ecosystems Science (GE PES)
Task Leaders: Paul Conrads
Phone: (803) 750-6140
FAX: (803) 750-6181
Task Status: Active (first year)
Task priority: High
Task Personnel: Paul Conrads and Ed Roehl

Task Summary and Objectives:

For a system as complex as the Everglades, with its long-range restoration issues, there are many areas where the application of data mining can benefit DOI's support of ecosystem restoration in South Florida. Of particular interest, as indicated by the pilot projects in Tasks 1 and 2, are areas where data mining can provide the necessary spatial and/or temporal integration of hydrologic and biological research using existing data sets.

Work to be undertaken during the proposal year and a description of the methods and procedures:

During Year One of the Data Mining and Modeling study, the PIs will meet with other researchers and managers from Federal and State agencies to assess the hydrologic data network, discuss research needs, and identify potential data mining applications for Years 2 and 3 of the project. After priority issues are identified, the availability of existing data, the modeling approach, and appropriate deliverable products will be determined.

Specific Task Product(s):

A summary document of the assessment of the data network for addressing specific restoration, preservation, and protection issues will be presented to the Coordinator of Greater Everglades Priority Ecosystems Science and his staff. The document will also identify potential areas for study in Years 2 and 3.


U.S. Department of the Interior, U.S. Geological Survey
Comments and suggestions? Contact: Heather Henkel - Webmaster
Last updated: 04 September, 2013 @ 02:08 PM(KP)