Chair of Data Science and Data Engineering

Prof. Dr. Emmanuel Müller
Chair of Data Science and Data Engineering
Department of Computer Science
Technical University of Dortmund

Principal Investigator LAMARR Institute
for Machine Learning and Artificial Intelligence

Founding Director RC-Trust.AI
Research Center Trustworthy Data Science and Security

E-Mail:
emmanuel.mueller(at)cs.tu-dortmund.de

Otto-Hahn-Straße 14
Campus Nord
D-44227 Dortmund

The new chair of Computer Science 9 looks forward to welcoming you in person at our TU Dortmund offices!
In the meantime, please feel free to check out our online lecture for a quick introduction or have a look at our recent machine learning research.

Open PhD Positions: Methods and Algorithms in Machine Learning

We are continuously hiring PhD students in the research areas of Representation Learning on Complex Data, Explainable and Trustworthy Anomaly Detection, and Verifiable Predictive Analytics. Candidates should have a strong background in statistics, linear algebra, scalable algorithms, and formal methods. If you are interested in these research areas, apply for our open PhD positions.

Research and Teaching Overview

Our research covers data mining, machine learning, scalable algorithms, and interactive exploration for high-dimensional data, complex graphs, time series, and data streams. The chair leads and contributes to several open-source initiatives that enable repeatability and comparability for the research community. We have organized several tutorials and workshops at major data mining, database, and machine learning conferences, and edited a special issue of the Machine Learning Journal. In the past few years, we have initiated and coordinated various education programmes for “Data Science” and “Data Engineering”: one at the level of university education (M.Sc. programme), two graduate schools (PhD programmes), and multiple executive education programmes for industry.

Research Activities

Knowledge Discovery and Data Mining

Knowledge discovery and data mining, as part of many scientific and industrial applications, does not end with the execution of algorithms. Data mining algorithms discover unknown, novel, and unexpected patterns, and these patterns should assist humans in their daily decision making. On the one hand, we investigate efficient algorithms that scale with the size and complexity of the data. On the other hand, our algorithms generate verifiable and explainable knowledge for human users.

PostDoc positions within our chair are open to complement our research on representation learning, anomaly detection, predictive analytics and explainable machine learning. We are looking for candidates with a strong background in machine learning, scalable algorithms, or statistical methods.

Data Mining Topics

Algorithms for Big Data Analytics

Our research addresses theoretical challenges in correlation analysis, representation learning, (un-)supervised feature selection, and cluster and outlier detection, as well as practical challenges in the efficient computation of these models on large and complex datasets. The development of novel techniques for complex data spaces (e.g. graph structures, time series, data streams, or high-dimensional data) is a particular challenge in this area. We overcome the information loss and scalability challenges of traditional data mining techniques that assume homogeneous data, and we enable big data analytics on heterogeneous datasets. Our chair investigates algorithms for the selection of relevant attributes in high-dimensional data, correlations in time series data, change in multivariate data streams, and similarity structures in graph data.

Verifiable Knowledge Discovery for Human Users

Our research aims at an easy-to-understand presentation of data analytics results. We represent intrinsic dependencies between different information sources for human users. Our research includes the automatic extraction of dependencies and pattern descriptions, which is a significant contribution for the many applications where patterns have to be verified by users. Human users require such descriptions of potential reasons for each of the detected patterns. Hence, we have proposed verifiable descriptions for learned representations, unexpected patterns, user-driven data exploration, and explainable data profiling.

Teaching Activities

Data Science Education

Big Data Analytics

In our lectures, we cover fundamental concepts in the field of Big Data Analytics for students in the B.Sc./M.Sc. Computer Science and B.Sc./M.Sc. Data Science programmes. Techniques for the analysis of large and complex datasets have a significant impact on many industrial and scientific applications. Science, industry, and society in general need to understand complex data by extracting valuable patterns from a multitude of datasets. In our courses, we introduce the systematic processing of large data volumes as a precondition for both human data understanding and automatic data analysis. We teach fundamental data analytics techniques applicable to different domains in science and industry.

In the Winter Term we offer the introductory lecture "Big Data Analytics", a seminar "Machine Learning for Sequential Data and Graph Data", and an advanced lecture "Data Science Challenges in Practice". The introductory lecture is open to students without prior experience in machine learning, while the advanced lecture, the seminar, and thesis projects require fundamental knowledge of machine learning, scalable algorithms, or statistical methods.

In the Summer Term we offer the advanced lecture "Machine Learning Paradigms for Complex Data", a project group on "Verifiable and Explainable Machine Learning", and an introductory lab course "Big Data Analytics Lab". The introductory lab course is open to students who are interested in practical experience with machine learning, while the advanced lecture, the project group, and thesis projects are more theory- and research-oriented.

Additionally, we offer "Studienarbeiten" following the INTEGER teaching concept within our chair in the research areas of representation learning, anomaly detection, and verifiable learning. For example, we offer “Explainable Anomaly Detection” as one open thesis project. Students interested in such machine learning topics can apply at any time!

Lectures, Labs, and Seminars

We provide basic lectures, lab courses, and practice-oriented projects as introductory courses:

Big Data Analytics (every winter term)
Fundamentals in Statistics and Linear Algebra
Fundamentals in Data Structures and Scalable Algorithms
Big Data Analytics Lab (every summer term) (incl. annual Data-Mining-Cup competition)
Projects on selected machine learning topics (e.g. “Predictive Diagnostics”, “Graph Exploration”, …)

We provide a selected set of advanced lectures and research seminars for specialization in data science and engineering:

Machine Learning for Sequential Data and Graph Data
Machine Learning Paradigms for Complex Data
Graph Mining and Exploration
Indexing Structures for Efficient Database Access
Data Science Challenges in Practice
Smart Representations for Big Data Analytics
Data Science Research Labs (e.g. “Exploration of Complex Networks”, “Representation Learning for Predictive Maintenance”, …)

INTEGER Teaching Concept

In our Data Science and Data Engineering Labs, we supervise students on open research challenges. Courses reflect our research focus on formal problem settings and scalable algorithmic solutions. As a result of these lab courses, we aim at the publication and presentation of results at international conferences, i.e. students participate in the entire research process! We have named this course style “INTEGER”. INTEGER gives students the opportunity to participate in research. As part of lab courses, we supervise Bachelor and Master students on open research challenges, the development of novel solutions, and the publication of results, and we let students present their work at international conferences. With INTEGER, students have successfully participated in the entire research process and gained enthusiasm for research.

Selected Publications

All of our publications are listed online [DBLP Bibliography] – [ACM Digital Library] – [Google Scholar]

Magdalena Wischnewski, Nicole Krämer, Emmanuel Müller:
Measuring and Understanding Trust Calibrations for Automated Systems: A Survey of the State-Of-The-Art and Future Directions
Proc. ACM Conference on Human Factors in Computing Systems (CHI 2023) [Full Text PDF][Talk at CHI 2023]
Benedikt Böing, Simon Klüttermann, Emmanuel Müller:
Post-Robustifying Deep Anomaly Detection Ensembles by Model Selection
Proc. IEEE 22nd International Conference on Data Mining (ICDM 2022) [Full Text PDF]
Chiara Balestra, Florian Huber, Andreas Mayr, Emmanuel Müller:
Unsupervised Features Ranking via Coalitional Game Theory for Categorical Data
(Best Paper Award) Proc. 24th International Conference on Big Data Analytics and Knowledge Discovery [Full Text PDF]
Carina Newen, Emmanuel Müller:
Unsupervised DeepView: Global Explainability of Uncertainties for High Dimensional Data
Proc. IEEE International Conference on Knowledge Graph (ICKG 2022) [Full Text PDF]
Erik Scharwächter, Jonathan Lennartz, Emmanuel Müller:
Differentiable Segmentation of Sequences
Proc. International Conference on Learning Representations (ICLR 2021) [Full Text PDF]
Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Ivan Oseledets, Emmanuel Müller:
FREDE: Anytime Graph Embeddings
Proc. International Conference on Very Large Data Bases (VLDB 2021) [Full Text PDF]
Benedikt Böing, Rajarshi Roy, Emmanuel Müller, Daniel Neider:
Quality Guarantees for Autoencoders via Unsupervised Adversarial Attacks
Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020) [Full Text PDF]
Erik Scharwächter, Emmanuel Müller:
Two-Sample Testing for Event Impacts in Time Series
Proc. SIAM International Conference on Data Mining (SDM 2020) [Full Text PDF]
Lukas Ruff, Robert A. Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, Marius Kloft:
Deep Semi-Supervised Anomaly Detection
Proc. International Conference on Learning Representations (ICLR 2020) [Full Text PDF]
Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, Emmanuel Müller:
The Shape of Data: Intrinsic Distance for Data Distributions
Proc. International Conference on Learning Representations (ICLR 2020) [Full Text PDF]
Nikita Klyuchnikov, Davide Mottin, Georgia Koutrika, Emmanuel Müller, Panagiotis Karras:
Figuring out the User in a Few Steps: Bayesian Multifidelity Active Search with Cokriging
Proc. 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2019) [Full Text PDF]
Tara Safavi, Caleb Belth, Lukas Faber, Davide Mottin, Emmanuel Müller, Danai Koutra:
Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket
Proc. IEEE International Conference on Data Mining (ICDM 2019) [Full Text PDF]
Anton Tsitsulin, Davide Mottin, Panagiotis Karras, Alex Bronstein, Emmanuel Müller:
NetLSD: Hearing the Shape of a Graph
Proc. 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2018) [Full Text PDF]
Lukas Ruff, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Robert Vandermeulen, Alexander Binder, Emmanuel Müller, Marius Kloft:
Deep One-Class Classification
Proc. 35th International Conference on Machine Learning (ICML 2018) [Full Text PDF]
Erik Scharwächter, Fabian Geier, Lukas Faber, Emmanuel Müller:
Low redundancy estimation of correlation matrices for time series using triangular bounds
Proc. 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018) [Full Text PDF]
Anton Tsitsulin, Davide Mottin, Panagiotis Karras, Emmanuel Müller:
VERSE: Versatile Graph Embeddings from Similarity Measures
Proc. 27th International Conference on World Wide Web (WWW 2018) [Full Text PDF]
Arvind Shekar Kumar, Tom Bocklisch, Patricia Iglesias Sanchez, Christoph Strähle, Emmanuel Müller:
Multi-Feature Interactions and Redundancy for Feature Ranking in Mixed Data
Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2017) [Full Text PDF]
Davide Mottin, Emmanuel Müller:
Graph Exploration: From Users to Large Graphs
Proc. ACM International Conference on Management of Data (SIGMOD 2017) [Full Text PDF]
Erik Scharwächter, Emmanuel Müller, Jonathan Donges, Marwan Hassani, Thomas Seidl:
Detecting Change Processes in Dynamic Networks by Frequent Graph Evolution Rule Mining
Proc. IEEE International Conference on Data Mining (ICDM 2016) [Full Text PDF]
Fabian Keller, Emmanuel Müller, Klemens Böhm:
Estimating mutual information on data streams
(Best Paper Award) Proc. 27th International Conference on Scientific and Statistical Database Management (SSDBM 2015) [Full Text PDF]
Thibault Sellam, Emmanuel Müller, Martin L. Kersten:
Semi-Automated Exploration of Data Warehouses
Proc. 24th ACM Conference on Information and Knowledge Management (CIKM 2015) [Full Text PDF]
Bryan Perozzi, Leman Akoglu, Patricia Iglesias Sánchez, Emmanuel Müller:
Focused clustering and outlier detection in large attributed graphs
Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2014) [Full Text PDF]
Hoang Vu Nguyen, Emmanuel Müller, Klemens Böhm:
A Near-Linear Time Subspace Search Scheme for Unsupervised Selection of Correlated Features
In Big Data Research Journal (1) 2014 [Full Text PDF]
Hoang Vu Nguyen, Emmanuel Müller, Jilles Vreeken, Pavel Efros, Klemens Böhm:
Multivariate Maximal Correlation Analysis
Proc. 31st International Conference on Machine Learning (ICML 2014) [Full Text PDF]
Patricia Iglesias Sánchez, Emmanuel Müller, Fabian Laforet, Fabian Keller, Klemens Böhm:
Statistical Selection of Congruent Subspaces for Mining Attributed Graphs
Proc. IEEE International Conference on Data Mining (ICDM 2013) [Full Text PDF]
Fabian Keller, Emmanuel Müller, Klemens Böhm:
HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
Proc. IEEE 28th International Conference on Data Engineering (ICDE 2012) [Full Text PDF]
Emmanuel Müller, Matthias Schiffer, Thomas Seidl:
Statistical selection of relevant subspace projections for outlier ranking
Proc. IEEE 27th International Conference on Data Engineering (ICDE 2011) [Full Text PDF]
Emmanuel Müller, Stephan Günnemann, Ira Assent, Thomas Seidl:
Evaluating Clustering in Subspace Projections of High Dimensional Data
Proc. 35th International Conference on Very Large Data Bases (VLDB 2009) [Full Text PDF]