Information Overload in Water Systems

PM: What does not get measured does not get managed. This is a principle I subscribe to. But you need a second principle to make it work: avoid information overload. Here is an example of how a company figured out how to analyze large amounts of data and extract useful information that can be acted upon.

Pipe Dreams: To plug leaks from the water supply, you first have to find them.
An effective way of detecting leaks [in municipal water systems], both accidental and deliberate, would be welcome.
TaKaDu, a firm based near Tel Aviv, thinks it has one. The problem, in the view of its founder, Amir Peleg, is not a lack of data per se, but a lack of analysis. If anything, water companies—at least, those in the rich world—have too much information. A typical firm’s network may have hundreds, or even thousands, of sensors. The actual difficulty faced by water companies, Dr Peleg believes, is interpreting the signals those sensors are sending. It is impossible for people to handle all the incoming signals, and surprisingly hard for a computer, too.

TaKaDu’s engineers have therefore developed a monitoring system called a statistical anomaly detection engine that is intended to identify clues in the data which might otherwise be missed. It applies a range of statistical tests (linear-regression analysis is one of the more familiar) to the data stream, and thus works out when the incoming signals are deviating significantly from normal behaviour. Sometimes such deviations are caused by faulty meters. Sometimes they are caused by leaks. Either way, that is valuable knowledge.
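As a rough illustration of the kind of test described above (not TaKaDu's actual engine; the function names, threshold and data are illustrative assumptions), the sketch below fits a linear-regression baseline on readings assumed to be normal and flags new readings that deviate from that baseline by more than a few standard deviations.

import numpy as np

def fit_baseline(timestamps, flow):
    """Fit a linear-regression baseline on readings assumed to be normal."""
    slope, intercept = np.polyfit(timestamps, flow, deg=1)
    residuals = flow - (slope * timestamps + intercept)
    return slope, intercept, residuals.std(ddof=1)

def flag_anomalies(timestamps, flow, baseline, z_threshold=3.0):
    """Flag readings whose deviation from the baseline exceeds z_threshold sigmas."""
    slope, intercept, sigma = baseline
    z = (flow - (slope * timestamps + intercept)) / sigma
    return np.abs(z) > z_threshold

# Toy usage: 48 hours of "normal" flow to build the baseline ...
rng = np.random.default_rng(0)
t_train = np.arange(48.0)
train_flow = 100 + 0.2 * t_train + rng.normal(0, 0.5, 48)
baseline = fit_baseline(t_train, train_flow)

# ... then new readings with an abrupt, persistent increase (a possible leak).
t_new = np.arange(48.0, 72.0)
new_flow = 100 + 0.2 * t_new + rng.normal(0, 0.5, 24)
new_flow[12:] += 5.0
print(np.where(flag_anomalies(t_new, new_flow, baseline))[0])  # should flag roughly indices 12..23

A fixed threshold like the one above is exactly the blunt instrument the article goes on to criticise; the point of the example is only to show what "deviating significantly from normal behaviour" means in statistical terms.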

To know what is unusual you have, of course, to know what is normal. Even a 1% change in flow rate can be significant if it is persistent, but such a small change is not always meaningful. Existing leak-detection systems therefore have thresholds built into them to avoid false alarms. The price of this is that small leaks may go undetected, and thus unrepaired, which can lead to larger leaks later. The detection engine instead attempts to work out what is important through continuous modelling that defines normality. This identifies both obvious patterns, such as daily, weekly and annual flow-rate cycles, and subtle ones, such as correlations between the behaviours of widely separated parts of the system, brought about by things like similarities in network layout or in the behaviour of local customers.
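One way to picture the cross-sensor correlations mentioned above is a continuously re-fitted relationship between a pair of zones: if two parts of the network normally move together, a reading in one that the other cannot explain is suspicious. The sketch below is an assumption-laden toy, not TaKaDu's model; the window length, threshold and sensor names are invented for illustration.

import numpy as np

def rolling_pair_anomalies(flow_a, flow_b, window=168, z_threshold=3.0):
    """
    Continuously model sensor A's 'normal' flow as a linear function of a
    correlated sensor B over a rolling window of past readings, and flag
    hours where A departs from what B predicts.
    """
    flags = np.zeros(len(flow_a), dtype=bool)
    for i in range(window, len(flow_a)):
        a_hist = flow_a[i - window:i]
        b_hist = flow_b[i - window:i]
        slope, intercept = np.polyfit(b_hist, a_hist, deg=1)
        resid = a_hist - (slope * b_hist + intercept)
        predicted = slope * flow_b[i] + intercept
        z = (flow_a[i] - predicted) / resid.std(ddof=1)
        flags[i] = abs(z) > z_threshold
    return flags

# Toy usage: two zones share a daily demand cycle; a small but persistent
# extra flow appears in zone A that zone B's behaviour cannot explain.
rng = np.random.default_rng(1)
hours = np.arange(24 * 14)                       # two weeks of hourly data
daily = 10 * np.sin(2 * np.pi * hours / 24)      # shared daily pattern
flow_b = 200 + daily + rng.normal(0, 1, hours.size)
flow_a = 150 + 0.8 * daily + rng.normal(0, 1, hours.size)
flow_a[250:] += 6.0                              # a few per cent extra flow, persistent
print(np.where(rolling_pair_anomalies(flow_a, flow_b))[0][:5])  # flags cluster from hour 250

Because the baseline is re-estimated continuously, the model can follow daily and weekly cycles without hard-coded thresholds, which is the behaviour the paragraph above attributes to the detection engine.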


Full Story on Economist.com