Area 2: Data Analysis, Machine Learning and HCI

Area 2 addresses data, which increasingly drives innovation in the information age.

Data Analysis

This subtopic looks at fundamental questions of data analysis.

Tools for Data Analysis: (PIs: Böhm, Bläsius) Algorithms and methods for data science have matured, but this has only made the gap between them and usable tools more obvious. The problem is not insurmountable, however: database systems are a shining example of useful technology for processing data, and the abstraction of their interfaces has proven just right for many purposes. Here we propose far-reaching and fundamental research that generalizes their principle to data science. Data-analysis tasks shall become primitives that the system can flexibly arrange and fit into the evaluation of more abstract information needs. We will initially focus on three examples: (1) Complex event processing where atomic events are changes in the correlation of time series or data streams. One use case is monitoring and predictive maintenance of scientific facilities like bioliq. (2) Query languages for moving objects shall be generalized to movements in dynamic attributed graphs. One use case is dislocations in structural materials science. (3) Synthetic data generation where characteristics of the data are specified declaratively. Realistic synthetic data is useful for benchmarking algorithms and software.
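To illustrate example (1): a change in the correlation between two data streams can itself be treated as an atomic event. The following minimal sketch (function and parameter names are hypothetical, not an interface of the proposed system) reports the moments at which the windowed Pearson correlation of two streams crosses a threshold:

```python
from collections import deque
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant window carries no correlation signal
    return cov / sqrt(vx * vy)

def correlation_change_events(stream_a, stream_b, window=5, threshold=0.8):
    """Yield (index, old_r, new_r) whenever the windowed correlation
    crosses the threshold in either direction."""
    buf_a, buf_b = deque(maxlen=window), deque(maxlen=window)
    prev_r = None
    for i, (a, b) in enumerate(zip(stream_a, stream_b)):
        buf_a.append(a)
        buf_b.append(b)
        if len(buf_a) == window:
            r = pearson(buf_a, buf_b)
            if prev_r is not None and (abs(prev_r) >= threshold) != (abs(r) >= threshold):
                yield (i, prev_r, r)
            prev_r = r
```

In a full system, such a detector would be one primitive among many that the engine arranges into larger event-processing plans over live data streams.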

Machine Learning

Although a subarea of data analysis, machine learning has recently gained so much traction that we single it out as a separate subtopic.

Efficient Learning from Heterogeneous Training Signals: (PIs: Friederich, Neumann, Niehues, Stiefelhagen, Stühmer, Waibel) Machine Learning (ML) methods form the core of modern AI, yet the unspoken truth is that ML, and Deep Learning in particular, still requires huge amounts of data for effective learning. This is neither practical nor biologically plausible. To tackle this challenge, we investigate new learning paradigms that go beyond traditional supervised learning with large amounts of homogeneously labeled data. In particular, we focus on developing methods that enable learning from different levels of supervision (learning from weakly labeled data, self-supervised learning), sharing knowledge between different tasks (representation and transfer learning, meta-learning and modular learning), and learning in applications that require continuous adaptation (active and incremental learning). To reduce the dependency on big labeled data sets, we also need to explore application-relevant inductive biases for these paradigms, e.g., graph neural networks, transformers, or latent variable models. Apart from accuracy gains on real-world data sets, our research will also enable better interpretability of models and their predictions and open opportunities in new application areas. The ability to deal with small data sets and to use small modular network architectures will also feed directly into the focus activity “Memory Centric Computing” (Tahoori). Further, we will investigate the application of our methods in the subtopic “Algorithm Engineering”, e.g., for developing problem-specific heuristics for NP-hard problems (Sanders). Success along these lines will have considerable impact on autonomous learning in a changing world, allow systems to respond to more individual needs and requirements, and lower the dependency on large data aggregators. It will boost adoption in a wide range of application scenarios, not only in well-established AI areas (e.g., natural language processing, computer vision, and robotics), but particularly also in relevant Helmholtz research fields, including chemistry, materials science, and medicine.
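As a concrete instance of learning from weakly labeled data: several noisy labeling sources can be combined before any model is trained. The sketch below (hypothetical names; a deliberately simple majority-vote stand-in for more sophisticated aggregation methods) resolves conflicting votes while allowing sources to abstain:

```python
from collections import Counter

def aggregate_weak_labels(votes, min_support=1):
    """Combine noisy labels from several weak sources by majority vote.

    votes: list of per-example vote lists; None means the source abstains.
    Returns a list of (label, confidence) pairs, or None for examples
    where fewer than `min_support` sources voted."""
    result = []
    for example_votes in votes:
        cast = [v for v in example_votes if v is not None]
        if len(cast) < min_support:
            result.append(None)  # not enough signal for this example
            continue
        label, n = Counter(cast).most_common(1)[0]
        result.append((label, n / len(cast)))
    return result
```

The resulting (label, confidence) pairs could then serve as soft training targets, so that downstream learning weights reliable examples more heavily than contested ones.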

Human Computer Interaction and Visual Computing

While data analysis is concerned with incoming data, this subtopic looks at outgoing data, which is at the same time an important aspect of the final stage of data analysis.

Human Computer Interaction (HCI): (PI: Beigl) This is an important and central topic in computer science; one indication is that the two largest ACM conferences, CHI and Ubicomp/ISWC, are entirely or partially dedicated to it. In recent years, however, research has shifted from exploring fundamental questions of HCI toward a variety of novel applications and technologies. This research project aims to bring the interesting results from over 100 specific topics (CHI 2022) back into a more general form. We do this by extracting general interaction patterns, general presentation expressions and methods, and general psychological explanations from the specific works. The proposed work is therefore designed as ongoing fundamental research, in which the corpus of patterns, methods, and explanations is sharpened and expanded whenever new technologies or application domains become available. The methodology for finding such a common basic expression will be both AI-inspired and grounded in human psychology: e.g., using natural language processing to find structure in the many research papers, autoencoders to search for minimal and condensed expressions and patterns, and human psychology to explain them.

Advanced Visualization: (PI: Dachsbacher) Knowledge acquisition from large, complex data requires not only automatic analysis techniques, but also interactive visualization to take advantage of the high bandwidth of human perception and to enable exploration and understanding of large amounts of information. One challenge for large-scale simulations is to reconcile online processing with post-hoc visual exploration; we will develop novel distributed visualization and data compression techniques (in cooperation with PI Sanders). The basis for such perceptually effective visualization is rendering using (physically-based) Monte Carlo image synthesis. Dachsbacher’s group is a leader in this field, which has become standard and a driving force across various industries leveraging photo-realistic computer graphics. It also offers tremendous potential beyond merely computing plain images, e.g., for multispectral, time-of-flight, or differentiable rendering (optimization-by-synthesis). Many applications, ranging from computer vision and the generation of AI training data to the planning of future cities and solar farms, benefit from more efficient, accurate, and robust image synthesis methods, which are a focus of this research activity.
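Monte Carlo image synthesis ultimately rests on Monte Carlo integration: each pixel value is an integral over light contributions, estimated by averaging randomly sampled evaluations. The following minimal sketch (a generic one-dimensional estimator with hypothetical names, not production rendering code) shows the principle:

```python
import random
from math import pi, sin

def mc_estimate(f, a, b, n, rng=None):
    """Monte Carlo estimate of the integral of f over [a, b]:
    (b - a) times the mean of f at n uniform random sample points."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    width = b - a
    total = sum(f(a + rng.random() * width) for _ in range(n))
    return width * total / n

# Example: the integral of sin over [0, pi] is exactly 2;
# the estimate converges to it at the usual O(1/sqrt(n)) rate.
estimate = mc_estimate(sin, 0.0, pi, 20_000)
```

In rendering, the integrand is far more complex (light transport over paths rather than an interval), and much of the research effort goes into variance reduction, e.g., importance sampling, so that fewer samples yield an image of the same quality.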