WOG Seminar 2008 Participants

This is the list of participants at the WOG event.

Invited Speakers

David Mitchell
Title: Modelling Languages, Model Expansion and MXG
Michael Benedikt
Title: Moving Quickly From Queries to Streams
Abstract: Many applications of XML require querying to be done in a streaming fashion. That is, one wants processing that is one-pass, with minimal buffering and tractable space usage. We first consider processing models that have these desirable properties, both in the setting of shallow (fixed-depth) documents and in general. There are, of course, many models to look at, depending on (for example) whether the states and transitions of the processor are represented explicitly or symbolically. We then look at two questions on query languages: which queries can be translated into stream processors, and which queries can be efficiently translated into stream processors? We give answers for various stream models. This talk includes joint work with Alan Jeffrey of Bell Labs.

Bernhard Thalheim
Title: Foundations for and Specification of Large Information-Intensive Websites
Floris Geerts
Title: Constraint-based data cleaning
Abstract: Data dependencies have been well studied for the relational model and a variety of dependency languages have been proposed to specify the semantics of relational data. Driven by increasing demand for data quality technology, there is renewed interest in the study of dependencies for effectively capturing inconsistencies in real-life data. Classical dependencies such as functional (FD) or inclusion dependencies (IND), however, fail to capture certain errors and inconsistencies in data.

In the first part of this talk, I describe extensions of functional and inclusion dependencies that circumvent these limitations. These extensions, called conditional functional and inclusion dependencies (CFDs and CINDs, respectively), while being more effective in detecting inconsistencies than traditional FDs and INDs, still inherit many of the nice theoretical properties known from classical dependency theory. I describe some of the properties of CFDs and CINDs and highlight both the differences and similarities with FDs and INDs.

In the second part of the talk, I focus on CFDs and address the problem of how CFDs can be used to repair inconsistent data. That is, given data that does not satisfy a set of CFDs, how can we minimally change the data (e.g., by value modifications) such that the resulting data is consistent, i.e., satisfies the set of CFDs? I describe algorithms that automatically repair any inconsistent data set and conclude the talk with some open problems.
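To make the notion concrete, the following sketch (our own toy example, not the speaker's implementation; the table and attribute names are hypothetical) detects violations of a CFD stating that, for UK addresses, zip code functionally determines city:

```python
from collections import defaultdict

def cfd_violations(rows, condition, lhs, rhs):
    """Group tuples that satisfy `condition` by their `lhs` values and
    report groups whose `rhs` values disagree -- these violate the CFD."""
    groups = defaultdict(set)
    for r in rows:
        if all(r[a] == v for a, v in condition.items()):
            groups[tuple(r[a] for a in lhs)].add(r[rhs])
    return {key: vals for key, vals in groups.items() if len(vals) > 1}

rows = [
    {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4", "city": "London"},     # violates the CFD
    {"country": "NL", "zip": "EH4", "city": "Eindhoven"},  # pattern does not apply
]
# CFD: for tuples with country = "UK", zip -> city must hold.
bad = cfd_violations(rows, {"country": "UK"}, ["zip"], "city")
```

A repair algorithm of the kind described would then minimally modify values (e.g. change one of the two conflicting cities) until `cfd_violations` returns no groups.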

Regular Talks

Olga De Troyer and Sven Casteleyn
Title: Developing Web and Virtual Reality Applications using High-Level Intuitive Models
Abstract: The main theme of the research group WISE (VUB) is conceptual modeling and design methods. This theme is applied to different domains, the most important ones being the Web and Virtual Reality.

In the context of Virtual Reality, this has resulted in VR-WISE. VR-WISE (“Virtual Reality – With Intuitive Specifications Enabled”) is an innovative development method for Virtual Reality applications, specially developed to be usable by non-technical people. It uses high-level intuitive modeling concepts and allows specifying a Virtual Environment by means of models, expressed in terms of the vocabulary of the application domain.

In the context of the Web, this has resulted in WSDM. WSDM (“Web Semantics Design Method”) was one of the first Web design methods (1998), and has evolved from a method targeting small kiosk-style Websites to a complete Semantic Web design method, supporting modern Web applications with a variety of additional design issues (e.g. semantic annotations, localization, accessibility, adaptivity, …). It uses a layered set of conceptual models, each providing the modeling primitives for a specific Web design concern, i.e. data & functionality, navigation and presentation.

In this presentation, we present both the VR-WISE and the WSDM approach. Furthermore, for VR-WISE, we show how F-logic has been used to formally define the semantics of some of the modeling concepts. For WSDM, we elaborate on the internal use of Semantic Web technology, and show how it has been exploited to (automatically) generate semantically annotated Websites.

Dieter Van De Craen
Title: Efficiently Querying Scientific Databases
Abstract: Scientific data in the Life Sciences is distributed over various autonomous sources with inherent and often intricate relationships. In this setting, scientists are routinely required to write distributed queries. Being non-experts in computer science, they are faced with the challenge of expressing such queries. This is a non-trivial task, even if we assume scientists to be familiar with query languages like SQL, since queries become arbitrarily complex in the presence of many sources.

As a user-friendly method for querying complex networks of sources, we propose exploratory queries. Exploratory queries are loosely-structured, hence requiring only minimal user knowledge of the source network. We attack the optimization problem for exploratory queries by proposing several multi-query optimization algorithms that compute a global evaluation plan while minimizing the total communication cost, a key bottleneck in distributed settings. The proposed algorithms are necessarily heuristics, as computing an optimal global evaluation plan is shown to be NP-hard. We present BioScout, a distributed monitoring system for biological data that uses our algorithms to compute a global plan and in which scientists can graphically draw their queries. Finally, we present our experimental results that illustrate the potential of our algorithms not only for the optimization of exploratory queries, but also for the multi-query optimization of large batches of standard queries.

Jan Hidders
Title: Towards a calculus for collection-oriented scientific workflows
Abstract: In many sciences, such as the life sciences, there has been a rapidly growing availability and use of computational resources such as software packages and databases, both in the form of locally installable software and externally accessible web services. These often have to be combined in various complex ways in order to compute the desired result. This has led to a need for specialized workflow systems that allow scientists to define and manage workflows that integrate such resources in a user-friendly way and without having to program. These workflow systems tend to differ from business-oriented workflow systems in that they typically deal with large collections of data and often have to iterate over the elements of such collections. Although such workflow systems already exist, e.g., Taverna and Kepler, their formal semantics is often lacking, too complex for formal analysis, or describes either data-flow or control-flow but not both. In this work we propose a formal framework in which the complete semantics of such systems can be described. Moreover, we present a calculus that is closely related to the graph-based notation for workflows but has a semantics that can be described in a structural way.
Luc De Raedt
Title: Probabilistic Relational Learning


Universiteit Antwerpen

Calin Garboni
Title: Itemset Mining: from Condensed Representation to Fault Tolerance
Abstract: The Frequent Itemset Mining problem is by now well known and forms the core of many data mining algorithms. Condensed representations only store a non-redundant cover of all frequent itemsets, which drastically reduces the number of patterns compared to the complete collection of frequent itemsets. One such representation is the Non-Derivable Itemsets (NDI) representation, which relies on the Inclusion-Exclusion Principle. We propose methods to efficiently discover the NDIs as well as the generalized itemsets. Moreover, we investigate the use of such techniques for fault-tolerant itemset mining, i.e., finding groups of similar transactions that share most items. This allows us to find interesting patterns in datasets with missing values, as well as longer approximate patterns which can be more meaningful than the short, exactly matched frequent itemsets.
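A minimal sketch of the inclusion-exclusion bounds behind non-derivable itemsets (our own illustration; function names are not from the original work): each proper subset X of an itemset I yields, via inclusion-exclusion over the sets between X and I, a lower or upper bound on supp(I), and I is derivable exactly when the bounds coincide.

```python
from itertools import combinations

def support(transactions, itemset):
    """Number of transactions containing all items of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def ndi_bounds(transactions, itemset):
    """Inclusion-exclusion bounds (lo, hi) on supp(itemset), computed
    from the supports of its proper subsets only."""
    I = frozenset(itemset)
    lows, ups = [0], [len(transactions)]
    for r in range(len(I)):                     # all proper subsets X of I
        for X in map(frozenset, combinations(sorted(I), r)):
            rest = sorted(I - X)
            delta = 0
            for k in range(len(rest)):          # all J with X <= J, J a proper subset of I
                for extra in combinations(rest, k):
                    J = X | frozenset(extra)
                    delta += (-1) ** (len(I - J) + 1) * support(transactions, J)
            (ups if len(I - X) % 2 else lows).append(delta)
    return max(lows), min(ups)

transactions = [frozenset("ab"), frozenset("a"), frozenset("b"), frozenset("abc")]
lo, hi = ndi_bounds(transactions, "ab")
```

Here supp({a,b}) = 2 lies strictly inside the bounds (2, 3), so {a,b} is not derivable from its subsets and must be stored; when lo equals hi, the itemset's support is implied and it can be dropped from the condensed representation.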
Bart Goethals
Title: Pattern mining and database integration
Abstract: I am and have been working mostly on the foundations of itemset mining and related pattern mining problems. Also, I am working on the integration of data mining into database systems.
Jan Hidders
Title: Towards a calculus for collection-oriented scientific workflows
Abstract: My research interests are XML databases, especially query and transformation languages for XML data. I have done research on the expressive power of such languages, and on query optimization techniques for these languages. Another interest is the relationship between data modeling and process modeling, especially for processes whose structure is largely determined by the involved data structures. Finally, I also do research on possible formalizations for collection-oriented scientific workflows.
Wim Le Page
Title: Mining Rules of Simple Conjunctive Queries
Abstract: The discovery of recurring patterns in databases is one of the main topics in data mining and many efficient solutions have been developed for relatively simple classes of patterns and data collections. Indeed, most frequent pattern mining or association rule mining algorithms work on so-called transaction databases. Not only for itemsets, but also for more complex patterns such as trees, graphs, or arbitrary relational structures, databases consisting of a set of transactions are used. For all these pattern classes, specialized algorithms exist to discover them efficiently. The motivation for these works is the potentially high business value of the discovered patterns. Unfortunately, many relational databases are not suited to be converted into a transactional format, and even if this were possible, a lot of information implicitly encoded in the relational model would be lost after conversion. We consider association rule mining on arbitrary relational databases by combining pairs of queries that reveal interesting properties in the database. Intuitively, we pose two queries on the database such that the second query is more specific than the first query. Then, if the number of tuples in the output of both queries is almost the same, this could reveal a potentially interesting discovery. Mining these rules is realised by efficient algorithms and database-oriented implementations in SQL.
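The core measure can be sketched as follows (a simplified illustration of the idea, not the authors' SQL implementation; the relation and predicates are hypothetical): pose a general and a more specific query, and compare their output sizes.

```python
def rule_confidence(db, general, specific):
    """Confidence of the rule `general => specific`: the fraction of
    tuples satisfying the general query that also satisfy the specific one."""
    g = [t for t in db if general(t)]
    s = [t for t in g if specific(t)]
    return len(s) / len(g) if g else 0.0

# Hypothetical relation of (person, species) pet-ownership facts.
owns = [("ann", "dog"), ("bob", "dog"), ("bob", "cat"), ("eve", "dog")]
# "Owns a pet" vs. the more specific "owns a dog": 3 of 4 tuples remain.
conf = rule_confidence(owns, lambda t: True, lambda t: t[1] == "dog")
```

When the confidence is close to 1, the pair of queries constitutes a potentially interesting rule in the sense of the abstract.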
Michael Mampaey
Title: Database Summarization
Abstract: Current data mining techniques tend to produce a wealth of potentially interesting patterns, usually in quantities too large for a single user to manage or understand. In our research we aim to find ways of globally summarizing a database by using the local patterns occurring within it. Employing state-of-the-art data mining techniques, the goal is to retrieve a small, concise subset of all patterns that characterizes the whole database as well as possible.
Adriana Prado
Title: Mining Views: Database Views for Data Mining
Abstract: We present a system towards the integration of data mining into relational databases. To this end, a relational database model is proposed, based on the so called virtual mining views. We show that several types of patterns and models over the data, such as itemsets, association rules and decision trees, can be represented and queried using a unifying framework.
Celine Robardet
Koen Smets
Title: Automatic Vandalism Detection in Wikipedia: Towards a Machine Learning Approach
Abstract: Since the end of 2006 several autonomous bots are, or have been, running on Wikipedia to keep the encyclopedia free from vandalism and other damaging edits. These expert systems, however, are far from optimal and should be improved to relieve the human editors from the burden of manually reverting such edits. We investigate the possibility of using machine learning techniques to build an autonomous system capable of distinguishing vandalism from legitimate edits. We highlight the results of a small but important step in this direction by applying commonly known machine learning algorithms using a straightforward feature representation. This study demonstrates that elementary features, which are also used by the current approaches to fight vandalism, are not sufficient to build such a system. They will need to be accompanied by additional information which, among other things, incorporates the semantics of a revision.
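A toy sketch of the kind of straightforward feature representation referred to above (the concrete features and code are our illustrative assumptions, not those of the study):

```python
import re

def edit_features(old_text, new_text):
    """Elementary per-revision features of the kind simple anti-vandalism
    heuristics use: size change, upper-case ratio, longest character run."""
    upper = sum(c.isupper() for c in new_text) / max(len(new_text), 1)
    longest = max((len(m.group()) for m in re.finditer(r"(.)\1*", new_text)),
                  default=0)
    return {
        "size_delta": len(new_text) - len(old_text),
        "upper_ratio": round(upper, 2),
        "longest_run": longest,
    }

feats = edit_features("The Battle of Hastings took place in 1066.",
                      "HAHAHAHA WIKIPEDIA IS WROOOOONG")
```

The study's conclusion is visible even in this toy: such surface features can flag blatant vandalism, but they carry nothing of the semantics of a revision.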

Vrije Universiteit Brussel

Sven Casteleyn
Abstract: Sven Casteleyn received his Master degree in Computer Science from the Vrije Universiteit Brussel in 1999, and his Ph.D. degree from the same university in 2005. His research interests lie primarily within the field of Web Engineering and the Semantic Web, more specifically web site design methods, adaptation & personalization, semantic annotations & technology, ontology evolution, aspect orientation & the semantic web, conceptual modeling, etc. Currently, Sven works as a post-doctoral researcher at the Web and Information System Engineering lab of the Vrije Universiteit Brussel.


Olga De Troyer
Title: Conceptual Modeling of Web Systems and Virtual Reality
Abstract: The major theme of my research is design methods and more in particular conceptual modeling techniques. Over the years, the focus has moved from database over Web systems towards Virtual Reality. Within the context of conceptual modeling, special attention is given to semantics, adaptivity and personalization, localization and globalization, evolution, accessibility, and usability. Also reasoning on designs and tool support is part of the research. Semantic Web technology, Web 2.0 technology, ontologies, and design patterns are important enabling technologies for the research.
Frederic Kleinermann
Abstract: My research is about Conceptual Modeling for the domain of Virtual Reality (VR). VR applications are becoming more feasible due to better and faster hardware. As network connections are also getting faster, VR applications start to appear on the Internet. But the development of such applications is still a specialized, time-consuming and expensive process. By introducing a Conceptual Modeling phase into the development process of VR applications, a number of the obstacles preventing a quick spread of this type of applications can be removed. However, existing Conceptual Modeling techniques are too limited for modeling a VR application in an appropriate way. For this reason, my research has focused on the development of a Conceptual Modeling approach aiming at making VR more accessible to non-VR-specialists. As more and more VR applications are becoming available through the Internet and are mixed with other types of media, my research is also focusing on how to embed Conceptual Modeling in a semantic framework in order to provide the basis for semantically rich VR applications. This may be essential for the success of VR in the future over the web and also its use in the context of the Semantic Web.
Abdalghani Mushtaha
Title: Towards localisation methodology support for website localisation
Abstract: My research is situated in the context of E-Learning and cultural differences. The proposed research is about evaluating the influence of the users’ cultural background on content and interface understanding in the context of e-learning, and developing a design methodology for learning environments that takes into account social, religious and cultural factors. My contribution in the direction of localisation for e-learning will be to evaluate the influence of users’ cultural background on content and interface understanding; to define and characterise the cultural dimensions which are shown to be essential to consider for e-learning; and to work towards a methodology for localising or globalising e-learning websites. Using the results of the research, it will be possible to develop e-learning environments that take into account relevant social and cultural factors.
Bram Pellens
Abstract: Bram Pellens is a Researcher in the Department of Computer Science of the Vrije Universiteit Brussel (Belgium). His research topic is in the area of conceptual modeling of Virtual Reality (VR) applications and in particular the aspect of describing behavior (i.e. the dynamic part) within these Virtual Environments (VEs). In this context, he has worked on a behavior modeling approach and an associated graphical behavior modeling language. Currently, this work is being extended by incorporating the use of so-called “behavior patterns” (similar to design patterns known from Software Engineering) in the graphical behavior modeling language. This is currently being applied in the domain of Computer Games and Interactive Media applications. The main aim of this work is to further improve the development process, and more precisely the aspect of modeling the behavior, of Computer Games or Virtual Environments. Recently, he has also been performing research in the area of Web Engineering and the Semantic Web. Within this field, the main interest is in adaptive web applications, personalization, interoperability between web applications, semantic web technology and languages. A lot of his time is currently spent in the area of personalized and adaptive technology-enhanced learning (TEL) environments supporting life-long learning. For this particular research, the focus is on interoperability issues in the context of (distributed) user modeling.
William Van Woensel
Title: Design and development of Web presences for physical entities
Abstract: By creating a virtual presence for people, places and things, a seamless integration between the physical and the virtual world can be achieved. More specifically, by exploiting the coupling between physical objects and their virtual representation, one can easily access information or services related to these entities. This bridging of the physical and virtual world means one can navigate the virtual Web while walking around; in the same way a user can select a link on the WWW to travel to another Website, one can move in the physical world and access the Web presences of nearby objects. By employing Web technology to host and access such virtual presences, much existing Web content and many existing services can be reused. However, design and development of these Web presences poses not only challenges for adaptation to the capabilities of mobile devices, but also for exploiting context more extensively. On the one hand, this means taking into account user properties and preferences to filter out interesting and relevant Web presences; on the other hand, natural relations between entities based on e.g. location, conceptual relationships, etc. can be taken into account. For example, if information on such relations of a "user" entity is available to a Web presence, it can be used to provide more personalized content or services to the user. Moreover, in the reverse direction, these relations can provide the user with a powerful means of exploring nearby (or otherwise related) entities. In this setting, authentication, security and privacy become major issues; access to information on (relations of) an entity will have to be restricted.
Lamia Abo Zaid
Title: An Ontology representation of Variability and Feature Models
Abstract: Many of today’s software applications are very large in size and very similar in functionality, which has called for the creation of variable software, where variation points are defined to capture the variability introduced in the software product. One of the most common methods to capture and represent variability is via feature models. The goal of the feature model is to give a visual representation of the relations between the features. We argue that feature models have two main problems: 1) lack of scalability; 2) lack of expressivity, which means they provide an incomplete model of the variability. In our work we address these problems by providing an ontology representation for feature models. Our goal is to create an OWL ontology that is expressive enough to represent feature relations and constraints in a formal way. This makes it possible to scale the feature model to the size of thousands of features while also checking the consistency of the model. An ontology representation for feature models also allows integrating and sharing them between different systems and components. This is an important advantage due to the distributed nature of the software development process. Furthermore, representing feature models in OWL enables query support for the underlying models, providing easy access to information in the feature model.

Universiteit Hasselt

Peter Boyen
Title: Database support for the calculation of graph similarity in biological networks
Abstract: Bioinformatics research has evolved from the genome to the proteome (from genes to proteins), and more recently the interactome (interactions between proteins). Proteins are essential parts of organisms and perform important tasks in every process within cells. Interactions between proteins can be modelled as a graph or a network. Other examples of biological networks are pathways, which from the viewpoint of theoretical computer science can be considered automata responsible for specific metabolic processes. More and more biological networks are gathered in online databases (e.g. DIP, KEGG, WikiPathways and BioCyc). The search capabilities in these databases are limited to searching for keywords in metadata fields or searching for linear networks. In particular, there is no functionality that provides a search for networks similar to a given network. The most important reason for these limitations is that the researched similarity measures for biological networks don't scale to larger collections.

In the scientific literature, almost exclusively similarity measures in terms of graph alignment are researched. This is a topology-conserving similarity measure based on homeomorphisms. Homeomorphisms are topology conserving in the sense that insertions and deletions of nodes are only allowed along paths. There are two types of graph alignment studied in the literature. Local alignment looks for occurrences with high similarity of a given small graph in a given large graph, while global alignment, as the name indicates, calculates the global similarity between two graphs. Since both problems are NP-hard, local alignment research has primarily focused on finding classes of graphs for which alignment is possible in polynomial time and on probabilistic algorithms that don't necessarily return the optimal result. Research into global alignment is young and mostly limited to expanding local alignments. In this research project we want to study a broader class of similarity measures than only those based on graph alignment, and classify these according to precision (biological relevance) and scalability to larger collections of networks.
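As one example of a measure outside the alignment family, a scalable baseline could simply compare labelled edge sets (our own illustration, not a measure proposed in the project; the protein names are arbitrary):

```python
def edge_jaccard(edges1, edges2):
    """Jaccard similarity of two graphs given as sets of (node, node)
    edges -- computable in linear time, unlike alignment-based measures."""
    e1, e2 = set(edges1), set(edges2)
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 1.0

# Two tiny protein-interaction networks sharing one interaction.
g1 = {("rad51", "brca2"), ("brca2", "palb2")}
g2 = {("rad51", "brca2"), ("rad51", "rad54")}
sim = edge_jaccard(g1, g2)   # 1 shared edge out of 3 distinct edges
```

Such a measure scales to large network collections but is topology-blind, which illustrates the precision/scalability trade-off the project wants to classify.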

Wouter Gelade
Title: Foundations of XML and Formal Languages
Abstract: My main area of research is that of database theory, more specifically:
  • Foundations of XML (In particular, XML schema languages)
  • Formal languages (In particular, regular expressions and their succinctness and complexity)
  • Dynamic complexity theory (see also the entry of Marcel Marquardt)
  • Web services
Goele Hollanders
Title: On Phase Transitions in Learning Sparse Networks
Abstract: The capacity to adapt is key when an artificial or biological system has to maintain itself in a changing environment. Natural evolution of living beings is a good example of this, but so is the chemical organization in a cell. The concentrations of proteins and other chemical substances in the cell change continuously as a function of internal and external factors. For a good understanding of the functioning of a cell, it is important to develop a model that describes this chemical composition as a dynamic system. The aim of my research is to infer such a model.
Natalia Kwasnikowska
Title: A formal model for dataflows and dataflow repositories
Abstract: Modern scientific research is characterized by extensive computerized data processing of lab results and other scientific data. Such processes are often complex, consisting of several data manipulating steps. We refer to such processes as dataflows, to distinguish them from more general workflows. General workflows also emphasize the control flow aspect of a process, whereas our focus is mainly on data manipulation and data management.

Important data management aspects of dataflows include, among others, 1) support for complex data structures; 2) iterating operations over all members of a collection; 3) support for hierarchical design of dataflows; 4) support for the use of external resources and services, like GenBank; and 5) a dataflow can be run several times, often a large number of times, on different inputs. All data of these different runs must be retained, including input parameters, intermediate results (e.g., from external services), output data, and meta-data (e.g., dates).

The last item is of particular importance and leads to the notion of a dataflow repository: a database system that stores different dataflows together with their different runs. A dataflow repository can provide effective management of all experimental and workflow data kept in a large laboratory or enterprise setting, and facilitate verification of results and tracking of the provenance (origin) of dataflow results.

We propose a formal model for dataflows and dataflow repositories. Our model includes careful formalisations of such features as complex data manipulation, external service calls, subdataflows, and the provenance of output values.
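The repository idea can be sketched with a minimal run record that retains inputs, intermediate values and their provenance (a toy data structure of our own, not the proposed formal model; the step names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """One run of a dataflow: every produced value is stored together
    with the step and the inputs it was derived from (its provenance)."""
    dataflow: str
    inputs: dict
    steps: list = field(default_factory=list)  # (step, inputs_used, output)

    def record(self, step, inputs_used, output):
        self.steps.append((step, dict(inputs_used), output))
        return output

run = Run("normalize", {"x": [1, 2, 4]})
total = run.record("sum", {"x": run.inputs["x"]}, sum(run.inputs["x"]))
normed = run.record("scale", {"x": run.inputs["x"], "total": total},
                    [v / total for v in run.inputs["x"]])
```

Keeping every run's record makes provenance queries ("which step and which inputs produced this value?") a matter of walking the stored steps.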

Bart Moelans
Title: Qualitative polyline similarity testing with applications to query-by-sketch, indexing and classification
Abstract: Area of research: spatial databases, spatio-temporal databases, GIS and data mining

Working on two projects: a European FET-IST project called GeoPKDD (Geographic Privacy-aware Knowledge Discovery and Delivery, http://www.geopkdd.eu/) and a Flemish FWO project with topic "Knowledge representation and database problems for spatio-temporal data: a calculus approach to representing knowledge about trajectories".

Frank Neven
Title: Databases: theories and systems
Abstract: My main research area is that of database theory and database systems.
  • Foundations of XML:

- static analysis of query and pattern languages (including typechecking and query containment)

- XML schema languages: structural properties, expressiveness, algorithms, and automatic inference of schemas (schema learning)

  • Databases and bioinformatics

- monitoring of life science data/query optimization

- computing graph similarity for biological networks

  • Theoretical Computer Science

- regular languages

- finite model theory

Dieter Van De Craen
Title: Efficiently Querying Scientific Databases
Jan Van den Bussche
Title: Learning first-order queries: beyond unions of conjunctive queries
Dries Van Dyck
Title: Algorithmic Graph Theory and Graph Mining
Abstract: Graphs are ubiquitous in computer science: primarily as an extremely powerful data structure and secondarily as a combinatorial object with beautiful mathematical properties to be exploited algorithmically. As most computational problems in which graphs arise are NP-hard, heuristic approximation algorithms run the show in solving real-life instances. My main research interests are those areas where computer science meets algorithmic graph theory, such as:
  • Graph mining, searching for frequently occurring patterns in a collection of graphs or in one large single graph
  • Similarity and structure of biological networks
  • Heuristics for hard graph problems
  • Algorithmic aspects of graph grammars/expressions
Stijn Vansummeren
Title: Integration of programming languages and query languages
Abstract: I am interested in databases, programming languages, and in the interaction of these two fields. My recent research has focused on:
  • Foundations of XML:

- Intrinsic complexity: adding structural recursion to XQuery as a candidate replacement for arbitrary (possibly non-terminating) recursion, with a focus on taming its complexity

- static analysis of query and pattern languages (including typechecking and type inference)

- automatic inference of schemas (schema learning)

  • Foundations of Scientific and Curated databases:

- provenance in database query languages

Katholieke Universiteit Leuven

Mai Ajspur
Title: Probabilistic Logic Learning
Laura Antanas
Title: Relational action plans for robots
Abstract: Lately, robotics research has started to focus on solving real-world tasks. The goal is to build robots that can operate in the same world as we do, infer human activities and provide assistance, if needed. In this context symbolic representations and uncertainties of the environment play an important role. Relational representations have been studied for a long time, but only recently combined with probabilities. My future research aims at learning relational action plans for robots using probabilistic relational representations. The goal is to learn these models from perceptions, such as video and audio information, RFID tags, etc.
Hendrik Blockeel
Title: Machine learning and data mining
Abstract: Hendrik Blockeel's research interests cover a variety of topics in data mining and machine learning, with a particular focus on:
  • probabilistic logics: how to represent uncertain knowledge and knowledge about uncertainty, preferably in a way that allows for combining first order logic inference with probabilistic inference
  • graph mining: learning from data that are represented as (annotated) graphs, or as links or nodes in such a graph. A particular recent research interest is induction of graph grammars.
  • predicting structured outputs: learning predictive functions that do not only take structured values as inputs, but also predict structured values as outputs (where the structure of the output value might be fixed, variable but given, or to be predicted, and parameters of the structure are always to be predicted)
  • relational data mining: mining data stored in a relational database, inferring hypotheses that may include, for instance, aggregate functions
  • inductive databases: finding a single approach for knowledge processing that integrates machine learning, data mining, statistics, and database querying
  • applications of machine learning in bioinformatics
Stephen Bond
Title: Automated Reasoning Techniques for FO(ID)
Abstract: FO(ID) is an extension of first-order logic with inductive definitions. My research concerns the use of various techniques, such as tableau calculus, constraint solving and SAT modulo theories, to enable various kinds of automated reasoning in FO(ID) (e.g. query answering, theorem proving, abduction) and to increase the speed and efficiency of model generation.

Bjoern Bringmann Bjoern.Bringmann.jpg
Maurice Bruynooghe Maurice.Bruynooghe.jpg
Head of Declarative Languages and Artificial Intelligence
Abstract: Research interests:
  • Machine learning and data mining
  • Knowledge representation and reasoning
  • Design, analysis and implementation of declarative programming languages
Álvaro Cortés-Calabuig Alvaro.Cortes-Calabuig.jpg
Abstract: Standard databases adopt Reiter's closed-world assumption: an atom not in the database is false. This assumption is relaxed in locally complete databases, which are sound but only partially complete about their domain. One of the consequences of weakening the closed-world assumption is that query answering in locally closed databases is not tractable. In our research we develop efficient approximate methods for query answering, based on fixpoint computations. We aim at presenting results for a broad class of locally closed databases in which our method produces complete answers to queries.
Fabrizio Costa Fabrizio.Costa.jpg
Title: Machine Learning for Structured Input and Output
Abstract: My main research area is machine learning over structured data (e.g. sequences, trees or graphs) where the prediction task is not a single value (as in the various flavors of classification or regression) but rather a structure in itself (e.g. sequences to sequences as in the Named Entity Recognition task, sequences to trees as in Natural Language parsing tasks, sequences to graphs as in the Protein Folding task).
Christophe Costa Florencio Christophe.Costa.Florencio.jpg
Abstract: I am currently working on (predictive) graph mining within an inductive logic programming framework. The idea is to use graph logic and graph grammar formalisms to express patterns found in graph data. I am especially interested in the more theoretical aspects, i.e., learnability, complexity, expressive power.
Kurt De Grave Kurt.De.Grave.jpg
Title: Active learning for Drug Lead Discovery
Abstract: I study the problem of resource-constrained automated scientific discovery in life sciences. The resource constraint calls for concepts and techniques from both active learning and optimization. The automation of scientific discovery requires a closed-loop learner. Applications being worked on are nanofiltration membrane optimization, the Robot Scientist, and drug discovery.
Leslie De Koninck Leslie.De.Koninck.jpg
Title: Execution Control for CHR
Abstract: Constraint Handling Rules (CHR) is a rule-based language designed for the implementation of application-specific constraint solvers. While CHR enables a high-level specification of the constraint solver logic, it lacks facilities for flexible execution control. Our work intends to remedy this problem by extending CHR with user-definable rule priorities (CHR-rp). Rule priorities allow for a concise and declarative specification of the control aspect of a CHR program. Moreover, by means of optimized compilation, we have shown that the increased flexibility need not result in performance penalties. We combined CHR-rp with search into a new framework that supports the specification of both the propagation and the search strategy. Finally, we have proposed a strong meta-complexity result for CHR-rp, inspired by the Logical Algorithms framework of Ganzinger and McAllester.
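The rule-priority idea can be sketched outside CHR itself. The toy Python engine below is my own illustration (not the CHR-rp semantics or its compiler), loosely modelled on the classic CHR gcd program: among the applicable rules, the higher-priority one always fires first.

```python
# Toy illustration of rule priorities, loosely modelled on the CHR gcd program:
#   priority 1:  gcd(0) <=> true.
#   priority 2:  gcd(N) \ gcd(M) <=> M >= N | gcd(M - N).
# NOT the actual CHR-rp semantics or implementation.

def gcd_with_priorities(store):
    """Exhaustively reduce a multiset of gcd/1 constraints."""
    while True:
        # Priority 1 fires whenever applicable: remove a gcd(0) constraint.
        if 0 in store:
            store.remove(0)
            continue
        # Priority 2 fires only when priority 1 cannot: pick two distinct
        # constraints gcd(N), gcd(M) with M >= N, replace gcd(M) by gcd(M-N).
        pair = next(((i, j) for i in range(len(store))
                     for j in range(len(store))
                     if i != j and store[j] >= store[i]), None)
        if pair is None:
            return store
        i, j = pair
        n, m = store[i], store[j]
        del store[j]
        store.append(m - n)

print(gcd_with_priorities([12, 8]))  # -> [4]
```

The priority ordering is what makes control explicit here: if the subtraction rule were allowed to fire on a gcd(0) constraint, it would rewrite gcd(M) to gcd(M - 0) forever.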

Luc De Raedt Luc.De.Raedt.jpg
Bart Demoen Bart.Demoen.jpg
Title: CHR, Prolog related stuff and some theory
Abstract: With my students, I work on CHR: analysis, compilation, optimization, new language constructs, complexity .... Actually, they do most of the work and I try to hang in. With friends in Melbourne, I work on symmetry detection for and dynamic symmetry breaking in constraint solvers. A friend in Angers helps me improve Prolog emulators and find better ways to integrate constraint solvers in a Prolog implementation. I am interested in introducing types into Prolog, garbage collection, game theory (especially its complexity issues), the implementation of Probabilistic Logic Learning in the WAM and dealing with large data sets in Inductive Learning.
Marc Denecker nobody.jpg
Anton Dries Anton.Dries.jpg
Title: Detecting Concept Drift using Computational Learning Theory
Abstract: My general research topic is learning from relational data streams. The main goal is to investigate data stream related problems and develop techniques that are applicable to streams of relational data. My current focus is on the problem of concept drift and how to detect it using computational learning theory applied to a k-CNF learning algorithm.
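A minimal, hypothetical sketch of the drift-detection problem (a window-based heuristic of my own, not the k-CNF-based method the abstract refers to): flag drift when a learner's recent error rate rises well above its long-run average.

```python
from collections import deque

def drift_detector(errors, window=30, threshold=0.2):
    """Flag time points where an online learner's recent error rate exceeds
    its long-run error rate by more than `threshold`. `errors` is a 0/1
    stream (1 = misclassified example). A toy sliding-window heuristic,
    not the k-CNF-based detection method itself."""
    recent = deque(maxlen=window)
    total_errors, seen, alarms = 0, 0, []
    for t, e in enumerate(errors):
        recent.append(e)
        total_errors += e
        seen += 1
        if (len(recent) == window
                and sum(recent) / window - total_errors / seen > threshold):
            alarms.append(t)
    return alarms

# Concept drifts at t = 100: the target flips and the error signal jumps.
print(drift_detector([0] * 100 + [1] * 50)[:1])  # first alarm, shortly after t = 100
```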
Kurt Driessens Kurt.Driessens.jpg
Title: Relational Reinforcement Learning
Abstract: The bulk of my past research is based in the field of Relational Reinforcement Learning. Relational reinforcement learning combines the ideas of reinforcement learning with the expressive power of relational representations. Research during my PhD focussed on the development of different relational regression algorithms that can be used for Q-learning in relational reinforcement learning problems, and built the first applicable RRL system. Currently I am interested in relational, incremental and probabilistic regression and the integration and automatic generation of domain knowledge and transfer learning in a reinforcement learning setting.
Daan Fierens Daan.Fierens.jpg
Title: Learning Directed Probabilistic Logical Models from Relational Data
Abstract: Two kinds of models that are often used in the machine learning community are probabilistic models (models that define a probability distribution on the data) and logical models (models that use elements of logic programming or first-order logic). The advantage of the former is the ability to model stochastic or noisy data, the advantage of the latter is the ability to handle relational data. There is a growing interest in combining both advantages by using so-called probabilistic logical models. My research is about directed probabilistic logical models and how to learn such models from relational data.
Elisa Fromont Elisa.Fromont.jpg
Title: Inductive databases
Abstract: The knowledge discovery process often involves relatively complex steps of data preprocessing and model construction. Current approaches to model construction are limited to applying fixed algorithms or tools to a given dataset. The Inductive Databases vision is that users should be able to query for patterns or models in the database in the same way they would query for "usual" data, in order to support the discovery process. Challenges involve, in particular, the design of generic query languages, the storage of patterns in databases and the creation of new, efficient constraint-based mining algorithms able to answer a large range of user queries.
Robby Goetschalckx Robby.Goetschalckx.jpg
Title: Cost-sensitive Sparse Linear Regression
Abstract: We consider the problem of linear regression where some features might only be observable at a certain cost. The learning task becomes a search for the features that contain enough information to warrant their cost.
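The error/cost trade-off can be sketched as greedy forward selection in which a feature is added only when its squared-error reduction outweighs its observation cost. This is a toy formulation of mine, not the algorithm behind the title; the function name and the demo data are illustrative.

```python
import numpy as np

def cost_sensitive_selection(X, y, costs, trade_off=1.0):
    """Greedy forward selection: repeatedly add the feature whose reduction
    in squared error, minus trade_off * its cost, is largest and positive.
    A toy illustration of the error/cost trade-off, not the authors' method."""
    def sse(feats):
        if not feats:
            return float(y @ y)
        beta, *_ = np.linalg.lstsq(X[:, feats], y, rcond=None)
        r = y - X[:, feats] @ beta
        return float(r @ r)
    selected = []
    current = sse(selected)
    while True:
        best, best_gain = None, 0.0
        for j in range(X.shape[1]):
            if j in selected:
                continue
            gain = current - sse(selected + [j]) - trade_off * costs[j]
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            return selected
        selected.append(best)
        current = sse(selected)

# Feature 0 fully determines y; features 1 and 2 are irrelevant noise,
# so their error reduction never warrants their cost.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0]
print(cost_sensitive_selection(X, y, costs=[1.0, 1.0, 1.0]))  # -> [0]
```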
Fabian Guiza Grandas Fabian.Guiza.Grandas.jpg
Title: Data Mining in Intensive Care
Abstract: In this work we present a combination of Time-Series Models with a Gaussian Process Classifier for predicting the moment when an intensive care patient becomes clinically stable enough to allow weaning from mechanical ventilation. This is a relevant prediction task, since the removal of assisted ventilation is indicative of the recovery of the patient's health state. Weaning from mechanical ventilation is also a prerequisite for discharge from intensive care, and is therefore valuable information for planning the use of operating rooms and the availability of beds for post-surgery patients.

Time-Series Models are used in this study as a representation of the dynamical behavior of the patient's first 4 hours of intensive care stay. The parameters of these models are used as inputs for binary Gaussian Process classifiers, which determine the probability of removal from mechanical ventilation within a given time interval. Initial results indicate that the dynamical components of the physiological signals studied appear to be more predictive when clinical stability occurs several hours after ICU admission, while the static information component appears more relevant for determining removal of assisted ventilation within the first 8 hours of ICU stay.

Tias Guns Tias.Guns.jpg
Title: Constraint Programming for Itemset Mining
Abstract: In data mining, a lot of effort is put into constraint-based mining. The goal is to allow the user to define several constraints on the data and the model, and to let the system push those constraints as deep as possible into the mining process. This is typically done by adapting the base algorithm, which spurs the existence of many algorithmic variations. Constraint programming, on the other hand, is a general framework for solving constrained problems. The goal of my research is to enhance constraint-based mining using techniques from constraint programming. As a first step, the itemset mining problem and many of its constraints have been translated into a constraint programming problem. This offers the genericity and modelling flexibility of CP to itemset mining.
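The declarative view can be sketched as follows: an itemset is a solution exactly when it satisfies a conjunction of constraints, here a minimum-frequency (coverage) constraint plus an arbitrary user-supplied predicate. The naive Python enumerator below stands in for a real CP solver and is my own illustration, not the translation developed in this research.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, constraint=lambda s: True):
    """Enumerate every itemset satisfying a conjunction of constraints:
    coverage (support >= min_support) plus a user-supplied predicate.
    A naive generate-and-test enumerator standing in for a CP solver."""
    items = sorted({i for t in transactions for i in t})
    solutions = []
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = set(cand)
            support = sum(1 for t in transactions if s <= t)
            if support >= min_support and constraint(s):
                solutions.append((frozenset(s), support))
    return solutions

data = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
print(frequent_itemsets(data, min_support=3))             # the 3 frequent singletons
print(frequent_itemsets(data, 2, lambda s: len(s) >= 2))  # the 3 frequent pairs
```

Swapping the user predicate changes the mining task without touching the search procedure, which is exactly the genericity the CP formulation is after.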

Ping Hou Ping.Hou.jpg
Title: Deductive system for FO(ID)
Abstract: FO(ID) is an extension of first-order logic (FO) with inductive definitions (IDs). My research concerns developing deductive inference methods for this logic. My recent work has focussed on investigating a sequent calculus (Gentzen-style deductive system) for FO(ID), proving the soundness and completeness results of this deductive system and showing some model theoretic facts about FO(ID).
Gerda Janssens Gerda.Janssens.jpg
Abstract: My research covers a variety of topics related to logic programming:
  • program analysis and abstract interpretation
  • memory reuse for Mercury, region-based memory management for Mercury
  • performant ILP Data Mining Systems
  • implementations of logic programs (currently hipP)
  • verification of functional equivalence of C programs
Angelika Kimmig Angelika.Kimmig.jpg
Abstract: I am working in the area of probabilistic logic learning, more specifically on a probabilistic version of Prolog called ProbLog. ProbLog has been motivated by the need to analyze networks of biological knowledge extracted from collections of large databases. Within ProbLog, we study probabilistic variants of various learning settings, including e.g. theory revision and explanation based learning.
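The semantics ProbLog builds on (the distribution semantics) can be illustrated by brute force: each probabilistic fact is independently in or out of a possible world, and the probability of a query is the total probability of the worlds in which it succeeds. The sketch below, with an invented example graph, enumerates worlds explicitly; the real ProbLog engine avoids this exponential blow-up.

```python
from itertools import product

def success_probability(prob_facts, query_holds):
    """P(query) under the distribution semantics: sum the probabilities of
    all total choices ('worlds') of the probabilistic facts in which the
    query succeeds. Exponential enumeration -- a semantic illustration only,
    not ProbLog's efficient inference."""
    facts = list(prob_facts)
    total = 0.0
    for choices in product([True, False], repeat=len(facts)):
        world = {f for f, c in zip(facts, choices) if c}
        p = 1.0
        for f, c in zip(facts, choices):
            p *= prob_facts[f] if c else 1.0 - prob_facts[f]
        if query_holds(world):
            total += p
    return total

# Hypothetical probabilistic graph: each directed edge exists independently.
edges = {('a', 'b'): 0.8, ('b', 'c'): 0.6, ('a', 'c'): 0.5}

def reachable(world, src='a', dst='c'):
    """Does the query path(a, c) succeed in this world?"""
    seen, frontier = {src}, [src]
    while frontier:
        u = frontier.pop()
        for (x, y) in world:
            if x == u and y not in seen:
                seen.add(y)
                frontier.append(y)
    return dst in seen

# P(path) = P(ac) + P(no ac) * P(ab) * P(bc) = 0.5 + 0.5 * 0.48 = 0.74
print(success_probability(edges, reachable))
```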
Theo Mantadelis Theo.Mantadelis.jpg
Title: Efficient algorithms for the integration of probabilistic logic in Prolog based Machine Learning
Abstract: The Machine Learning group of DTAI (Computer Science department) started working on a new GOA project on Probabilistic Logics for Machine Learning. The group uses the in-house Prolog system hipP. This system needs to be enhanced with features that make learning of probabilistic theories possible, and new algorithms need to be developed for this. A framework in which different semantics can be explored and exploited must be designed, so that alternatives can be compared. One goal is to define a core language in which all, or most, competing probabilistic logics can be expressed and executed.
Maarten Marien Maarten.Marien.jpg
Title: Model generation for FO(ID)
Abstract: FO(ID) is an extension of first-order logic (FO) with inductive definitions (IDs). It is an expressive language for knowledge representation, with many practical applications in declarative problem solving. My research concerns supporting this language computationally, and more specifically: doing model generation for FO(ID). As such, I have three main areas of interest:
  • extending SAT solvers with computational support for IDs;
  • comparing FO(ID) with a related language, namely ASP;
  • investigating the methodology of declarative problem solving using FO(ID).
Wannes Meert Wannes.Meert.jpg
Title: PLL: Learning for CP-Logic
Abstract: The field of Probabilistic-Logical Learning (PLL) tries to combine probability theory with first-order logic and data mining. One of the formalisms in this field is CP-Logic which has as core components: Causality, Probabilities and (First-Order) Logic. Our research is targeted towards learning CP-theories from data.
ManhThang Nguyen ManhThang.Nguyen.jpg
Title: Automated Termination Analysis of Logic Programs: Breaking the paradigm barriers
Abstract: The question whether a given program terminates w.r.t. a specific input is one of the fundamental problems in program verification.

In the last 20 years, this problem has been researched quite thoroughly, with the emphasis on two specific paradigms: Logic Programming and Term Rewriting Systems. Although in both areas, the work has been extensive and successful, with many techniques and automated tools developed, termination analysis research has evolved very independently for these paradigms.

The aim of this research is to improve the technologies within the field of Logic Programming by porting termination techniques from TRS to Logic Programming and developing a new automated termination analysis tool based on these techniques.

Siegfried Nijssen Siegfried.Nijssen.jpg
Abstract: Siegfried's main research interest is data mining and machine learning under constraints, and data mining in graph-structured data. Research questions include how constraints can help to find models and patterns more efficiently, how constraints can help to find more usable models and patterns, and how we can develop general algorithms that can deal with many types of constraints at the same time.
Quan Phan Quan.Phan.jpg
Title: Static memory management for logic programming languages
Abstract: Memory management is a classic problem in programming language implementation. In modern programming languages, runtime garbage collectors are used to free language users from thinking, or even knowing, about memory usage. However, those collectors have their own disadvantages. The most prominent one is that they need time to detect no-longer-used memory cells (garbage) while the main program is running, affecting its performance. My research uses program analyses to reason about a program's garbage at compile time instead of at runtime, and to augment the program to reuse that garbage at certain points. I am working with Mercury, a logic and functional programming language, and have been focusing on region-based memory management techniques. My research interests are compilers, program analyses, memory management techniques, and in general things related to programming languages.
Beau Piccart Beau.Piccart.jpg
Abstract: My current research focus is on multi-target models (e.g. multi-target predictive clustering trees), more specifically on inductive transfer between the different targets in a multi-target model and how to exploit this transfer.
Paolo Pilozzi Paolo.Pilozzi.jpg
Title: Automated Analysis and Verification of Constraint Handling Rules
Abstract: My research investigates automated analysis and verification of Constraint Handling Rules (CHR), with an emphasis on termination analysis of CHR. The question whether a given program terminates w.r.t. a specific input remains a fundamental problem in program verification. In this domain, CHR confronts us with new challenges due to its multi-headed rules and its multi-set semantics under a fire-once policy for propagation.

The aim of my research is to establish new technologies within the field of CHR by adapting existing techniques and by developing new techniques for language-specific problems. As a second objective, I aim to develop an automated verification and analysis program for the CHR language.

Stefan Raeymaekers Stefan.Raeymaekers.jpg
Abstract: I am working on information extraction from web pages, using tree grammar induction to learn the wrappers to extract the data. I also researched finite state tree automata as an efficient representation for these tree grammars.
Leander Schietgat Leander.Schietgat.jpg
Abstract: My research interests include biological applications of machine learning and data mining. I am particularly interested in techniques that use graphs, either as input or output. One area I am working in is functional genomics, where the task is to predict the functions of genes. Since these functions are organized in a hierarchy, the output of the learning method is a directed acyclic graph. Another application that keeps me busy is the classification of molecules. By representing molecules as graphs, it is possible to exploit some of their properties to develop efficient algorithms.
Tom Schrijvers Tom.Schrijvers.jpg
Title: Declarative programming languages
Abstract: My research is situated in the area of declarative programming languages: functional, logic and constraint-based languages (e.g. Haskell, Prolog, Mercury, Constraint Handling Rules, rewrite systems). Particular topics of interest are:
  • type systems (type checking and type inference)
  • automated program analysis
  • optimized compilation
  • program transformation and refactoring
  • complexity properties
Jan Struyf Jan.Struyf.jpg
Abstract: Jan Struyf is a postdoctoral researcher and member of the machine learning subgroup of DTAI. His research interests include relational data mining, inductive logic programming, inductive databases, probabilistic logics, and predictive clustering.
Ingo Thon Ingo.Thon.jpg
Title: Analysis of Sequences of Relational Interpretations
Abstract: Surprisingly few SRL approaches have been developed for modeling dynamic domains, i.e. domains with temporal and/or sequential aspects. One reason might be that time is a special type of relation. Sequentiality also leads to large amounts of data. The algorithmic complexity of general-purpose, dynamic SRL approaches easily explodes and becomes intractable unless quite strong assumptions are made. I'm interested in developing a relational representation of sequences of interpretations allowing efficient inference and learning.
Pedro Toledo Pedro.Toledo.jpg
Title: Using Frequent Patterns from Graph Mining for Workflow Optimization
Abstract: My research interests focus on applying frequent pattern mining techniques to find near-optimal-cost subgraphs in constrained graphs, when the only information about the cost is obtained from previous non-optimal solutions. This problem has been proposed due to its applications in developing computer-assisted environments in which a human expert is helped at each decision point to design the best model to complete a set of tasks.
Werner Uwents Werner.Uwents.jpg
Title: Relational neural networks
Abstract: My research focuses on neural networks for relational data domains. This includes data represented as bags or graphs and data from relational databases.
Anneleen Van Assche Anneleen.Van.Assche.jpg
Abstract: I am a postdoctoral researcher and member of the machine learning subgroup of DTAI. My PhD research dealt with improving the applicability of ensemble methods in data mining. In particular, we investigated several techniques which remedy certain drawbacks of ensembles, such as their application and efficiency in relational domains and the interpretability of the classification models. In general, my research interests include relational data mining, inductive logic programming, inductive databases and ensemble methods.
Peter Van Weert Peter.Van.Weert.jpg
Title: Extension and Implementation of CHR
Abstract: My research interests are the design and implementation of (declarative) programming languages, Constraint Handling Rules (CHR) in particular. CHR is a high-level, rule-based programming language. The goal of my research is to improve the usability of the CHR language, by extending the language with expressive, declarative language features, and by developing, implementing, and evaluating new and existing program analyses and compilation techniques for (extended) CHR programs. We extended the CHR language with negation as absence and user-definable aggregates. I developed a state-of-the-art CHR compiler and runtime system for Java, called K.U.Leuven JCHR, which aims at a tight integration with the object-oriented host language. JCHR outperforms other CHR systems and other forward-chaining rule engines by up to several orders of magnitude.
Joost Vennekens nobody.jpg
Celine Vens Celine.Vens.jpg
Title: Developing data mining methods for bioinformatics
Abstract: In my research, I develop data mining techniques for applications in bioinformatics, such as functional genomics, phylogenetic analysis, and document classification. Recent work has focussed on using decision trees (and ensembles thereof) for predicting the functions of genes. This task is an instance of hierarchical multi-label classification, because a gene can have multiple functions, and the functions are organized in a hierarchy.
Sven Verdoolaege Sven.Verdoolaege.jpg
Abstract: My research interests include analysis and transformation of C programs and (weighted) counting of integer points in (integer projections of) polytopes. My current focus is on verifying the equivalence of two versions of a program.
Hanne Vlaeminck Hanne.Vlaeminck.jpg
Title: The use of Knowledge Representation in Software Engineering
Abstract: Typically in Software Engineering (SE), no distinction is made between knowledge about the domain and knowledge about what the program should do. I investigate how we can express the domain knowledge and behavior of the application in appropriate formal knowledge representation (KR) languages. We can then store this in a knowledge base and use inference algorithms to perform certain tasks. My goal is to develop a methodology for SE based on KR.
Dean Voets Dean.Voets.jpg
Title: Automated Termination Analysis of Programming Languages
Abstract: My research investigates automated termination analysis of programming languages. The question whether a given program terminates w.r.t. a specific input remains a fundamental problem in program verification. At the moment I investigate termination of Java and CHR. The aim is to establish new technologies within these fields by adapting existing techniques and by developing new techniques for language-specific problems.
Johan Wittocx Johan.Wittocx.jpg
Title: Approximate reasoning for FO(ID)
Abstract: FO(ID) is an extension of classical first-order logic with inductive definitions. Given an FO(ID) theory, one can specify an approximation of a model of that theory by specifying what is known to be certainly true (false) in the model. I investigate techniques to refine such approximations. As a particular application of this research, I am interested in creating compact groundings in the context of model expansion for FO(ID).
Pieter Wuille Pieter.Wuille.jpg
Title: Automatic transformation of data structures in functional languages
Abstract: As computer programs become more and more complex, programmers tend to focus more on functionality and less on efficiency. Often little attention goes to the choice of data structures, even though the time and space complexity of a program largely depends on this choice. This choice, however, should not always be made by the programmer: in many cases the optimal choice changes throughout a program's development, or can even only be made at runtime. My research investigates automatic analysis of the usage of data structures, and the translation of programs to change the data structures they use.
Bernard Zenko Bernard.Zenko.jpg
Abstract: Bernard Ženko is a postdoctoral researcher at the machine learning subgroup of DTAI, KU Leuven, and Jožef Stefan Institute, Ljubljana, Slovenia. His research interests include predictive clustering, rule induction, ensembles of classifiers, and applications of machine learning techniques in environmental and life sciences.
Albrecht Zimmermann Albrecht.Zimmermann.jpg

Partner Universities

Marcel Marquardt Marcel.Marquardt.jpg
Title: Dynamic Complexity Theory
Abstract: To explore the computational complexity of dynamic problems, where one has to answer queries on locally changing input structures, several complexity classes have been introduced in recent years.

I am interested in the class DynFO and some of its subclasses, introduced by Patnaik, Immerman and Hesse. Here the updates to the query answer and to the auxiliary information are computed by first-order formulae. The dynamic setting allows one to express properties of the input structure that are not expressible in FO: for example, every context-free language is in DynFO, as is the reachability problem on undirected graphs. The class DynFO equals the class of first-order incremental evaluation systems (FOIES).

One major open problem is whether the reachability problem on directed graphs can be maintained by a DynFO algorithm. Hesse has recently shown that it can be maintained in Dyn(FO+Counting).
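The flavour of incremental maintenance has a familiar algorithmic analogue: for undirected reachability under edge insertions, a union-find structure keeps the auxiliary "same component" relation correct after each local change. (DynFO additionally requires the updates to be expressed by first-order formulae, which union-find is not; the Python sketch below only illustrates the incremental-maintenance idea, not a DynFO construction.)

```python
class IncrementalReachability:
    """Maintain reachability in an undirected graph under edge insertions:
    a union-find structure updates the auxiliary 'same component' relation
    after each local change, so queries need no graph re-traversal."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def insert_edge(self, u, v):
        """The local update: merge the components of u and v."""
        self.parent[self.find(u)] = self.find(v)

    def reachable(self, u, v):
        """The maintained query."""
        return self.find(u) == self.find(v)

g = IncrementalReachability(5)
g.insert_edge(0, 1)
g.insert_edge(1, 2)
print(g.reachable(0, 2), g.reachable(0, 3))  # -> True False
```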