DFG Research Training Group 2153: "Energy Status Data - Informatics Methods for its Collection, Analysis and Exploitation"

M.Sc. Edouard Fouché

  • Chair
    Prof. K. Böhm

    Karlsruher Institut für Technologie
    Am Fasanengarten 5
    76131 Karlsruhe

Research Abstract

The analysis of energy status data is challenging. First, the data are often available as a stream. Data streams are infinite by nature; they evolve over time and can be aggregated at multiple time scales. Most existing methods for data analysis apply only to static data and cannot accommodate data streams. Second, the information produced by energy systems is diverse and high-dimensional. The effects of high dimensionality, summarized as the “curse of dimensionality”, degrade the performance of traditional statistical approaches. Third, energy status data are typically unlabeled, making supervised learning algorithms impractical.

A common way to cope with high dimensionality is to detect the most relevant features or feature subsets in the system. However, the number of candidate subsets grows exponentially with dimensionality, so an exhaustive search quickly becomes intractable. Also, to understand the data, one is interested in finding different representations, for example by deriving new or hidden variables or by reducing the data space to its most prominent components. The space of possible feature transformations is virtually unbounded. Moreover, because data streams evolve, the relevance of features and feature transformations is likely to change over time.

The focus of this PhD project is the development of unsupervised learning algorithms that dynamically estimate the relevance of features and feature subsets in data streams. This requires designing time-dependent quality estimates for feature subsets, together with heuristics that perform a time-dependent search over those subsets. Such results can be combined with feature transformation methods to generate relevant feature representations automatically. This is challenging due to the evolving and infinite nature of data streams. In addition, most real-life scenarios require results in a timely manner, so the algorithms must be very efficient.
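As a purely illustrative sketch of what a time-dependent quality estimate for a feature subset could look like, the following Python snippet scores a subset by the mean absolute pairwise Pearson correlation over a sliding window of the most recent observations. The scoring function, the class name `WindowedSubsetQuality`, and the `window` parameter are assumptions made for this example; they are not the estimators developed in the project.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


class WindowedSubsetQuality:
    """Hypothetical helper: scores a feature subset on the last `window` points."""

    def __init__(self, subset, window=100):
        self.subset = tuple(subset)         # indices of the features to score
        self.buffer = deque(maxlen=window)  # old points fall out automatically

    def update(self, point):
        """Add one observation and return the current score in [0, 1]."""
        self.buffer.append(tuple(point[i] for i in self.subset))
        if len(self.buffer) < 2:
            return 0.0
        columns = list(zip(*self.buffer))
        pairs = list(combinations(range(len(columns)), 2))
        if not pairs:  # a single feature has no pairwise dependency
            return 0.0
        total = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs)
        return total / len(pairs)


# Toy stream: feature 1 follows feature 0 for 20 steps, then decouples.
quality = WindowedSubsetQuality(subset=(0, 1), window=10)
scores = []
for t in range(40):
    second = 2 * t if t < 20 else (-1) ** t
    scores.append(quality.update((t, second)))
```

Because the buffer forgets old points, the score is close to 1 while the two features move together and drops once the relationship disappears. This is the kind of change in relevance over time that the paragraph above refers to. Recomputing the correlations from scratch at every step is the simplest option, not the most efficient one.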

The goal of the developed algorithms and models is to make a significant contribution to the field of predictive maintenance for energy systems. Our industry partner, Bioliq, will benefit from this work in the monitoring of their production plant, for example through the timely discovery of anomalies and patterns in sensor streams.