Bacti: Early Warning System for Prediction of Bacterial Concentration Exceedance in Tidal Waters

Bacti is a project to develop a tool (or suite of complementary tools) for rapid forecasting of bacterial concentration exceedances in tidal waters, where these arise from trigger events such as rainfall, wind direction and Combined Sewer Overflow (CSO) operation. It aims to help meet the requirements of the European Commission revised Bathing Water Directive (2006/7/EC) (rBWD). The project focuses on machine-learning modelling tools that deliver acceptable levels of accuracy and that transfer easily, so that they can be used widely at different bathing waters and shellfish waters.

The modelling tool has potential applications in forecasting water quality at bathing waters and shellfish waters, to assist with water management actions and active incident management. It is also intended to inform retrospective investigations into water quality non-compliance (in particular, source apportionment at different bathing beaches).

Beaches

To make the results from Bacti applicable in practice, the work is being carried out at several 'case study' beaches, selected for their diverse bathing water quality histories and catchment profiles. The results from each beach should be adaptable to the other case study beaches in Bacti, and then hopefully on a much wider scale. The case study beaches in Bacti are:

  • Burnham Jetty, Somerset
  • Combe Martin, Devon
  • East Looe, Cornwall
  • Ilfracombe (Capstone), Devon
  • Lyme Regis (Church Cliff), Dorset
  • Mothecombe, Devon
  • Par, Cornwall
  • Porthluney, Cornwall
  • Readymoney, Cornwall
  • Rock, Cornwall
  • Seaton, Cornwall

Exeter's involvement

Specific Objectives

The objectives for this project are:

  • Project steering group: The Centre for Water Systems (CWS) is lead partner.
  • Delivery to be mainly CWS, but may be complemented by further Environment Agency of England and Wales (EA) in-house work.
  • Review current simple modelling tools and techniques, including the existing developed EA Decision Trees (DT) Model for bathing water quality forecasting.
  • Review historic data and knowledge of designated bathing waters and shellfish waters, and identify suitable trial locations.  Consider both ‘dirty’ and ‘clean’ data sets.
  • Consider suite of potential ‘trigger’ events and model input parameters, e.g.: Current and antecedent rainfall (and dry periods); tide state and height (including tide-locking of fluvial and drainage systems); wind direction, speed, and duration; CSO operation; Surface water overflow / flooding; river flow; catchment wetness; hours of sunshine etc.
  • Undertake extensive data mining and analysis of existing historic data sets to isolate data clusters associated with particular trigger events.
  • Consider the need for additional water quality data, e.g. daily sampling, or intensive temporal or spatial surveys.  If a need is identified, design an appropriate sampling programme at agreed bathing water and / or shellfish water locations.  Implementation can be EA led or outsourced.
  • Develop machine-learning modelling techniques to forecast bacteria concentrations based on trigger events.
  • Validate model predictions against real sample data.
  • Review model performance and consider validity for future live field trials.

Location and nature of sampling / other data sources

Sampling has been carried out manually at the case study beaches from 2000-2 to the present. Samples are gathered weekly during the bathing season (May-Sept), so on average 20 samples per year have been taken. During the 2012 bathing season, daily sampling was carried out at a small selection of case study beaches. Where rivers are present in the area above the beach, samples are also taken there. Water quality is assessed in the lab for faecal coliform (FC) and faecal streptococci (FS) bacteria colony counts. Salinity readings are also taken. Antecedent rainfall is assessed from the nearest available raingauge to each beach. Sample times are recorded; tidal heights are taken from standard tables. CSO spill and river flow data are available where applicable. Source apportionment data relating to catchments have been provided by the EA. Other weather data are potentially available via the BADC website.

Programme of work

  • Task 1 – Literature survey

Research, read and review existing studies and tools relating to bathing water quality assessment and early warning systems, including, but not limited to, the existing EA Decision Tree system and the South West Water (SWW) Beach Live system.

Party/parties responsible for delivery

University of Exeter (UoE).  EA to provide any reports they consider relevant, or which they would wish to see included in the review.

  • Task 2 – Data source identification

Identify and review data relevant to the development and / or operation of forecasting tools.  Data may include (for each catchment and beach location): all available FC and FS sample counts (bathing season); antecedent rainfall (various periods, Wiski raingauge and/or Nimrod radar); tide data; salinity data; river flow data; soil wetness data (NAPI); Geographical Information System (GIS) data relating to each catchment and the location of sample sites; and (where available) CSO spill data, hours of sunshine and other data.
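As an illustration of the "antecedent rainfall (various periods)" input listed above, the following minimal sketch sums raingauge readings over several windows before a sample time. It is written in Python purely for illustration (the project specifies MATLAB for its own code); the function name, the reading format, the window lengths and the rainfall values are all hypothetical and are not taken from the Wiski or Nimrod data.

```python
from datetime import datetime, timedelta


def antecedent_rainfall(readings, sample_time, window_hours):
    """Sum rainfall (mm) recorded in the window_hours before sample_time."""
    start = sample_time - timedelta(hours=window_hours)
    return sum(mm for t, mm in readings if start <= t < sample_time)


# Hypothetical raingauge readings: (timestamp, rainfall in mm).
readings = [
    (datetime(2012, 6, 1, 6), 2.0),
    (datetime(2012, 6, 2, 3), 5.5),
    (datetime(2012, 6, 2, 8), 4.0),
    (datetime(2012, 6, 2, 10), 1.0),
]

# Hypothetical time at which the water sample was taken.
sample_time = datetime(2012, 6, 2, 11)

# One feature per antecedent period; the periods are illustrative only.
features = {
    f"rain_{h}h": antecedent_rainfall(readings, sample_time, h)
    for h in (6, 24, 72)
}
print(features)
```

Each window length yields one candidate model input, so the set of periods can be widened or narrowed as the data review suggests.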

Party/parties responsible for delivery

Environment Agency to provide rainfall, river flow, and water quality data for the agreed study sites.  UoE to provide results data, produced from running machine-learning models and comparisons with EA decision tree model results.

  • Task 3 – Review of Machine Learning (ML) tools

Options for using a variety of machine learning classifier models in a Receiver-Operating-Characteristic (ROC) scenario will be investigated. These may include Artificial Neural Networks (ANN) and/or Bayesian Belief Networks. The potential for pre- and post-processing of data, including waveform decomposition techniques, may be investigated. Techniques are to be compared against two baselines: the existing EA decision tree-based model, and a simple threshold of 10 mm of rainfall in the antecedent 24 hours.
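The simple rainfall baseline can be sketched as a one-line rule scored by its true-positive and false-positive rates, which gives a single point in ROC space against which other classifiers can be compared. This is a hedged Python illustration, not the project's MATLAB code; the data are invented, and treating "at least 10 mm" (rather than "more than 10 mm") as a predicted exceedance is an assumption.

```python
def rainfall_rule(rain_24h_mm, threshold_mm=10.0):
    """Baseline: predict an exceedance when antecedent 24 h rainfall
    reaches the threshold (the >= comparison is an assumption)."""
    return rain_24h_mm >= threshold_mm


def rates(predicted, observed):
    """Return (true-positive rate, false-positive rate)."""
    tp = sum(p and o for p, o in zip(predicted, observed))
    fn = sum((not p) and o for p, o in zip(predicted, observed))
    fp = sum(p and (not o) for p, o in zip(predicted, observed))
    tn = sum((not p) and (not o) for p, o in zip(predicted, observed))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Invented example: 24 h rainfall (mm) and whether an exceedance was observed.
rain_24h = [0.0, 12.5, 3.2, 15.0, 9.9, 22.1, 0.4, 11.0]
observed = [False, True, False, True, True, True, False, False]

predicted = [rainfall_rule(r) for r in rain_24h]
tpr, fpr = rates(predicted, observed)
print(f"baseline TPR={tpr:.2f}, FPR={fpr:.2f}")
```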

Party/parties responsible for delivery

UoE to include findings in final report.

  • Task 4 – Historic data analysis

The data provided covers the period 1/5/2000 to 31/10/2012. Analysis of this data will provide statistics relating to bathing sites and trends over the period of analysis.

Party/parties responsible for delivery

UoE to provide results of analysis needed to assess model skill at each site.

  • Task 5 – Tidal waters data sampling

Intensive water quality sampling at study sites to provide model calibration and verification data should greatly enhance model accuracy.  Having identified suitable sites (i.e. where exceedances are relatively likely to occur), arrange an intensive sampling schedule over a tidal cycle, and / or daily or twice daily during normal working hours over a bathing season or yearly period.  Target start date in April 2012. Sampling should be for FC and FS as per statutes.

Party/parties responsible for delivery

Environment Agency to bid for resources to undertake surveys during the period April 2012 to March 2013.  University of Exeter may bid for additional project funding, part of which may be used to undertake / fund all, part, or other relevant survey work.

  • Task 6 – Analysis of the EA decision tree tool

Consider the modelling tools, data sources and modelling approach used in the existing EA decision tree model.  Identify opportunities to apply this historic work to this project.  Modify the EA’s SPSS Version 15 code to run on Version 19 (available at the University of Exeter).  Confirm that the results are identical to those obtained at the EA.  Research standard best practice for Decision Tree models.  Compare the methodology and structure of the EA decision tree model with standard best practice for DT models, and report.

Party/parties responsible for delivery

Joint Environment Agency / University of Exeter meeting.  Environment Agency to provide copies of models and data to the University of Exeter.  UoE to provide report.

  • Task 7 – Evaluation of EA decision tree tool

For each site, using Leave-One-Out Cross-Validation (LOOCV), where the data blocks are the respective years of sample data, run the EA DT model with a variety of input data.  Vary parameters in an ROC scenario to establish the optimum operating point and input data set of the EA DT model.  Report.
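The year-blocked form of LOOCV described above can be sketched as follows: each year of samples is held out in turn, and all remaining years form the training set. This is a minimal Python illustration under stated assumptions; the record layout (a dict with a "year" key) and the function name are hypothetical, and the example values are invented.

```python
from collections import defaultdict


def leave_one_year_out(samples):
    """Yield (held_out_year, train, test), holding out one year per fold."""
    by_year = defaultdict(list)
    for s in samples:
        by_year[s["year"]].append(s)
    for year in sorted(by_year):
        test = by_year[year]
        train = [s for y, block in by_year.items() if y != year for s in block]
        yield year, train, test


# Invented sample records; only the "year" field drives the split.
samples = [
    {"year": 2010, "rain_24h": 0.0, "exceeded": False},
    {"year": 2010, "rain_24h": 14.2, "exceeded": True},
    {"year": 2011, "rain_24h": 3.1, "exceeded": False},
    {"year": 2012, "rain_24h": 18.0, "exceeded": True},
]

folds = list(leave_one_year_out(samples))
for year, train, test in folds:
    print(year, len(train), len(test))
```

Blocking by year, rather than by individual sample, keeps all samples from one bathing season on the same side of the split.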

Party/parties responsible for delivery

Joint Environment Agency / University of Exeter meeting.  Environment Agency to provide copies of models and data to the University of Exeter.  UoE to provide report.

  • Task 8 – Development of Machine Learning prototype tool(s)

Code to be written in MATLAB.

Use a data-driven approach to model construction. Target data are to be extracted from exceedances of statutory thresholds in the FC and FS observation data. Input data are to be organised so that any parameter can be included in or excluded from any given trial. Training and validation data sets are to be selected either at random or using LOOCV, where each data block is one year’s worth of observation history. Develop code for a variety of neural network classifier models (time-independent for weekly sample intervals and time-dependent for the rapid sampling interval trial), with a variety of inputs, so that the exceedance classifier threshold on the output can be varied, in an ROC scenario, to establish the optimum operating point for each model. The code is to run automatically from a configuration setup file and to write its output automatically to MS Excel 2007 format workbooks.
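Two of the steps above, deriving a binary target from threshold exceedances and switching inputs on or off per trial, can be sketched as below. The sketch is in Python for illustration only, whereas the task specifies MATLAB. The threshold values are placeholders and must not be read as the statutory limits; the parameter names, the configuration layout and the sample values are likewise hypothetical.

```python
# Placeholder limits (counts per 100 ml); NOT the statutory values.
THRESHOLDS = {"fc": 2000.0, "fs": 100.0}

# Hypothetical trial configuration: True includes a parameter as an input.
TRIAL_CONFIG = {
    "rain_24h": True,
    "tide_height": True,
    "salinity": False,
    "river_flow": True,
}


def exceedance_target(sample, thresholds):
    """Return True if any measured count exceeds its threshold."""
    return any(sample[name] > limit for name, limit in thresholds.items())


def build_inputs(sample, config):
    """Return the input vector for the parameters enabled in config."""
    return [sample[name] for name, include in config.items() if include]


# Invented sample record.
sample = {
    "fc": 350.0,
    "fs": 180.0,
    "rain_24h": 12.5,
    "tide_height": 3.4,
    "salinity": 31.0,
    "river_flow": 0.8,
}

target = exceedance_target(sample, THRESHOLDS)
inputs = build_inputs(sample, TRIAL_CONFIG)
print(target, inputs)
```

Keeping the include/exclude switches in one configuration structure means a trial can be changed without touching the model code.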

Party/parties responsible for delivery

Joint Environment Agency / University of Exeter meeting.  Environment Agency to provide copies of data to the University of Exeter.  UoE to provide models, and description in report.

  • Task 9 – Evaluation of Machine Learning prototypes

For each site, using a variety of neural network classifier models with a variety of inputs, vary the exceedance classifier threshold on the output, in an ROC scenario, to establish the optimum operating point for each model. Compare the results with each other and with those obtained from the EA decision tree model. Establish the overall “best model” and its optimum operating point. Consider generalising the model to predict at multiple sites. Use the 2012-3 rapid sample observations in a separate (time-dependent) trial to assess the ability of the model to predict exceedances and/or FC and FS levels at the next few sampling intervals, and hence to make rapid real-time predictions based on real-time monitoring of input parameters only (not FC/FS samples). FC and FS samples would still be needed for periodic model re-calibration.
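Varying the classifier threshold to find an operating point can be sketched as a sweep over candidate thresholds, recording the true-positive and false-positive rates at each. The task does not say how "optimum" is defined, so the sketch assumes Youden's J statistic (TPR minus FPR) as one common choice; the scores and observations are invented, and Python is used for illustration in place of the project's MATLAB.

```python
def roc_points(scores, observed, thresholds):
    """Return (threshold, TPR, FPR) for each candidate threshold.

    Assumes observed contains at least one True and one False.
    """
    positives = sum(observed)
    negatives = len(observed) - positives
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(p and o for p, o in zip(predicted, observed))
        fp = sum(p and (not o) for p, o in zip(predicted, observed))
        points.append((t, tp / positives, fp / negatives))
    return points


def best_operating_point(points):
    """Pick the point maximising TPR - FPR (Youden's J, an assumed criterion)."""
    return max(points, key=lambda p: p[1] - p[2])


# Invented classifier outputs and observed exceedances.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.55]
observed = [False, True, False, True, True, False, True, False]

points = roc_points(scores, observed, thresholds=[0.2, 0.3, 0.4, 0.7])
best = best_operating_point(points)
print(best)
```

A different criterion, for example one that penalises missed exceedances more heavily than false alarms, would only require changing the key function.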

Party/parties responsible for delivery

Joint Environment Agency / University of Exeter meeting.  Environment Agency to provide copies of data to the University of Exeter.  UoE to provide models, results and evaluation in report.

  • Task 10 – Thesis writing

Write up research, results and analysis – based on all of above.

Party/parties responsible for delivery

University of Exeter.

Partners