Francisco N. F. Q. Simoes  

Utrecht University – Department of Information and Computing Sciences – Intelligent Systems group

I am a PhD candidate in the Department of Information and Computing Sciences at Utrecht University, working on causal inference. My supervisors are Thijs van Ommen and Mehdi Dastani. My PhD project is part of a collaboration between ProRail and Utrecht University.

My current research focuses on constructing useful abstractions from data using the interventionist framework of causal inference. This has led me to study how properties of causal relationships can be modeled with information-theoretic quantities. The goal is to develop algorithms that, by optimizing appropriate information-theoretic metrics, learn representations with the desired properties.

Beyond causal representation learning and the information-theoretic modeling of causal properties, I am also investigating causal discovery with background knowledge, and methods for selecting interventions when a structural causal model (SCM) is available. My Master's students have been working on these topics.

On the applied side, my PhD aims to apply the methods developed to train delay data provided by ProRail, in order to understand how causal inference can be used to improve train traffic control.

  • PhD in Artificial Intelligence, Utrecht University

    2021-Now

    Causal Discovery, Intervention Selection, Representation Learning, Information Theory

  • ML engineer & Data Scientist, Orbisk (Utrecht)

    2020-2021

    Computer Vision, API development

  • Research internship at UMC's Brain Center

    2020

    Dimensionality reduction methods in a Genome-Wide Association Study (GWAS) of ALS.

  • MSc in Theoretical Physics, Utrecht University.

    2017-2019

    Mathematical emphasis. Lie Algebras, Differential Geometry, Representation Theory, ...

  • A la carte Mathematics units, University of Porto.

    2016-2017

    Topology, Group Theory, Logic, Functional Analysis, Manifolds, ...

  • BSc in Physics, University of Porto.

    2013-2016

Research & Publications

Optimal Causal Representations and the Causal Information Bottleneck

Francisco N. F. Q. Simoes, Mehdi Dastani, Thijs van Ommen

Submitted to the 13th International Conference on Learning Representations (ICLR 2025).

We propose the Causal Information Bottleneck (CIB), a causal extension of the Information Bottleneck (IB), which compresses a set of chosen variables while maintaining causal control over a target variable. The resulting representations are causally interpretable and can be used when reasoning about interventions.
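For readers unfamiliar with the IB, the classical (non-causal) objective of Tishby, Pereira and Bialek that the CIB builds on is shown below: an encoder p(t | x) compresses X into a representation T (small I(X;T)) while keeping T informative about a target Y (large I(T;Y)), with β setting the trade-off. The causal counterpart used by the CIB is defined in the paper and is not reproduced here.

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta \, I(T;Y),
\qquad \beta \ge 0, \qquad T \leftrightarrow X \leftrightarrow Y
```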

Fundamental Properties of Causal Entropy and Information Gain

Francisco N. F. Q. Simoes, Mehdi Dastani, Thijs van Ommen

In Proceedings of the 3rd Conference on Causal Learning and Reasoning (CLeaR 2024). Accepted for both poster and oral presentation.

This work contributes to the formal understanding of causal entropy and causal information gain by establishing and analyzing fundamental properties of these quantities, including bounds and chain rules. We also clarify the relationship between causal entropy and stochastic interventions, and propose definitions of causal conditional entropy and causal conditional information gain.
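One way to see why causal entropy connects naturally to stochastic interventions is sketched below, under the assumption that causal entropy is the average entropy of Y after interventions do(X = x), with x drawn from a chosen distribution q (the notation is ours and may differ from the paper's). Under the stochastic intervention do(X ∼ q), X is replaced by an independent draw from q, so for every x with q(x) > 0 the conditional distribution of Y given X = x in the intervened model equals P(Y | do(X = x)). The causal entropy is therefore an ordinary conditional entropy, computed in the intervened model:

```latex
H_c\big(Y \mid do(X \sim q)\big)
  := \sum_{x} q(x) \, H\big(Y \mid do(X = x)\big)
   = \sum_{x} q(x) \, H_{do(X \sim q)}\big(Y \mid X = x\big)
   = H_{do(X \sim q)}(Y \mid X)
```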

Causal Entropy and Information Gain for Measuring Causal Control

Francisco N. F. Q. Simoes, Mehdi Dastani, Thijs van Ommen

In Proceedings of the European Conference on Artificial Intelligence (ECAI) 2023. Work presented at the third XI-ML workshop of ECAI 2023.

We introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to measure how much control a feature provides over a target variable. These quantities capture how the entropy of a variable changes when other variables are intervened on.
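To make these definitions concrete, here is a small Python sketch for discrete variables. It assumes that causal entropy is the expected entropy of Y after do(X = x), with x drawn from a user-chosen intervention distribution (`protocol`), and that causal information gain is the observational entropy H(Y) minus that quantity. The toy SCM (a binary confounder Z of X and Y) and all its probabilities are hypothetical; P(Y | do(X = x)) is computed by back-door adjustment over Z.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a dict {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical toy SCM over binary variables: Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}
P_X_given_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(X=x | Z=z), indexed [z][x]
P_Y1_given_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Y=1 | X=x, Z=z)

def p_y_do_x(x):
    """P(Y | do(X=x)) via back-door adjustment over the confounder Z."""
    p1 = sum(P_Z[z] * P_Y1_given_XZ[(x, z)] for z in P_Z)
    return {0: 1 - p1, 1: p1}

def p_y_observational():
    """Observational P(Y), marginalizing over Z and X."""
    p1 = sum(P_Z[z] * P_X_given_Z[z][x] * P_Y1_given_XZ[(x, z)]
             for z in P_Z for x in (0, 1))
    return {0: 1 - p1, 1: p1}

def causal_entropy(protocol):
    """Expected entropy of Y after do(X=x), with x drawn from `protocol`."""
    return sum(q * entropy(p_y_do_x(x)) for x, q in protocol.items())

def causal_information_gain(protocol):
    """Observational H(Y) minus the causal entropy of Y under `protocol`."""
    return entropy(p_y_observational()) - causal_entropy(protocol)

uniform = {0: 0.5, 1: 0.5}  # hypothetical protocol: intervene on each value equally often
print(causal_entropy(uniform), causal_information_gain(uniform))
```

In this toy model P(Y = 1 | do(X = 0)) = 0.3 and P(Y = 1 | do(X = 1)) = 0.75, while the observational P(Y = 1) = 0.485.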

Jaccard Kernel PCA in genotype and gene-burden data for ALS

Francisco N. F. Q. Simoes (supervisor: Kevin Kenna)

Report for my internship at the UMC Utrecht Brain Center, part of the University Medical Center Utrecht (2020).

The project studied whether population stratification control in large-scale genetic studies of amyotrophic lateral sclerosis (ALS) could be improved by using Jaccard principal component analysis (jPCA) as an alternative to standard methods such as PCA, which are ineffective for rare genetic variants. To that end, I developed a set of R scripts that run jPCA on very large datasets through parallelization, and another that supports arbitrary positive-integer gene-burden values.
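A minimal Python sketch of Jaccard kernel PCA (not the original R scripts, which also parallelized the computation): build a Jaccard similarity matrix between samples, double-centre it, and project onto its leading eigenvectors. To cover non-negative integer burden counts, it assumes the generalized (min/max) Jaccard similarity, which reduces to the ordinary Jaccard index on 0/1 data; whether the original scripts used exactly this generalization is an assumption. The pairwise computation uses O(n²·m) memory, so it only suits small illustrative inputs.

```python
import numpy as np

def jaccard_kernel(X):
    """Generalized (min/max) Jaccard similarity between rows of a non-negative matrix.

    For 0/1 genotype indicators this is |A ∩ B| / |A ∪ B|;
    for non-negative integer burden counts it is sum(min) / sum(max).
    """
    X = np.asarray(X, dtype=float)
    mins = np.minimum(X[:, None, :], X[None, :, :]).sum(axis=2)
    maxs = np.maximum(X[:, None, :], X[None, :, :]).sum(axis=2)
    # Two all-zero rows are identical, so give them similarity 1 instead of 0/0.
    return np.divide(mins, maxs, out=np.ones_like(mins), where=maxs > 0)

def kernel_pca(K, n_components=2):
    """Project samples onto the top principal components of a precomputed kernel."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # double-centre the kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)       # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[idx], 0.0, None)     # guard against tiny negative round-off
    return eigvecs[:, idx] * np.sqrt(vals)      # per-sample component scores

rng = np.random.default_rng(0)
genotypes = rng.binomial(1, 0.1, size=(50, 200))  # synthetic rare-variant carrier matrix
scores = kernel_pca(jaccard_kernel(genotypes), n_components=3)
print(scores.shape)  # (50, 3)
```

On 0/1 data, `jaccard_kernel([[1, 1, 0], [1, 0, 0]])` gives an off-diagonal similarity of 0.5 (one shared variant out of two present).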

The Monoidal Category of D-branes in a Kazama-Suzuki Model

Francisco N. F. Q. Simoes (supervisors: Stefan Vandoren, Ana Ros Camacho)

My Master's thesis, written in completion of the MSc program in Theoretical Physics (2019).

This thesis applies category theory to string theory. Concretely, we show that the D-branes of the most prolific Kazama-Suzuki model (an N = 2 superconformal field theory with central charge c = 9) form a category with finitely many objects. We furthermore prove that this category admits a tensor product, and hence a monoidal structure. I believe this thesis may be useful to anyone trying to understand Kac-Moody algebras in the context of string theory, and the representation theory of the Virasoro algebra more generally.

Posters and Talks

Supervision & Teaching

[In Progress. Putative title: Graphical Rules for Optimal Intervention Selection Search]

2023-Present

Supervisee: Sem Yedema. Co-supervisor: Thijs van Ommen.

This Master's thesis was incorporated into the student's internship at ProRail.

The goal of this project is to discover rules that help us minimize the number of interventions that need to be tested in order to find the optimal intervention for a given target. More details to come.
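The flavour of such graphical rules can be sketched in Python on a hypothetical DAG. The sketch applies two known pruning criteria, not the rules being developed in this thesis: intervening on a non-ancestor of the target cannot change the target's distribution, and a set is only worth testing if every intervened node is still an ancestor of the target once the edges into the set are cut (the minimal intervention set criterion from the structural causal bandit literature, e.g. Lee and Bareinboim, 2018). The graph and variable names are illustrative.

```python
from itertools import combinations

# Hypothetical causal DAG, given as node -> list of parents:
# W -> Z -> X -> Y, Z -> Y, and X -> V (V is not an ancestor of Y).
PARENTS = {"W": [], "Z": ["W"], "X": ["Z"], "V": ["X"], "Y": ["X", "Z"]}

def ancestors(parents, target, cut=frozenset()):
    """Ancestors of `target` (including itself) after removing all edges into `cut`."""
    seen, stack = {target}, [target]
    while stack:
        node = stack.pop()
        if node in cut:
            continue  # do(node) severs every edge pointing into node
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def all_subsets(nodes):
    """All subsets of `nodes`, as frozensets, including the empty set."""
    nodes = sorted(nodes)
    return [frozenset(c) for r in range(len(nodes) + 1) for c in combinations(nodes, r)]

target = "Y"
candidates = all_subsets(set(PARENTS) - {target})

# Rule 1: intervening on a non-ancestor of Y cannot change the distribution of Y.
an_y = ancestors(PARENTS, target)
ancestral = [s for s in candidates if s <= an_y]

# Rule 2 (minimal intervention sets): each intervened node must remain an
# ancestor of Y after the edges into the intervened set are cut.
minimal = [s for s in ancestral if s <= ancestors(PARENTS, target, cut=s)]

print(len(candidates), len(ancestral), len(minimal))
```

On this graph the 16 candidate sets shrink to 8 after the ancestor rule and to 6 minimal sets; for example, {W, Z} is dropped because fixing Z blocks every path from W to Y.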

Causal discovery from train network data with background knowledge

2022-2023

Supervisee: Vera Schoonderwoerd. Co-supervisor: Thijs van Ommen.

This Master's thesis was incorporated into the student's internship at ProRail.

The goal of this project was to learn an SCM describing train delay data provided by ProRail. The student applied a modified version of the FCI algorithm that takes into account background knowledge about causal relationships between train delays, which made it feasible to learn the causal graph despite the large number of variables. Each structural equation was then estimated by fine-tuning a neural network that had been pre-trained on the entire dataset and that also incorporated exogenous variables external to the delay data.
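As an illustration of how background knowledge can shrink the search, the Python sketch below uses the time ordering of delay events, one common kind of background knowledge, to prune and partially orient candidate adjacencies before any conditional-independence testing. It is not the student's modified FCI: the event names, times and the `window` locality rule are hypothetical. In FCI terms, knowing that the later event cannot cause the earlier one fixes an arrowhead at the later end of an edge, while the mark at the earlier end may remain undetermined.

```python
from itertools import combinations
from typing import NamedTuple

class DelayEvent(NamedTuple):
    name: str
    minute: int  # hypothetical scheduled time of the delay event

def candidate_adjacencies(events, window):
    """Use time-based background knowledge to prune the pairs a discovery algorithm must test.

    - Pairs more than `window` minutes apart are treated as non-adjacent
      (a hypothetical locality assumption).
    - For pairs at different times, the later event cannot cause the earlier one,
      so the pair is returned as (earlier, later) with an arrowhead known at `later`.
    - Pairs at the same time are returned unoriented.
    """
    forward, unoriented = [], []
    for a, b in combinations(events, 2):
        gap = abs(a.minute - b.minute)
        if gap > window:
            continue
        if gap == 0:
            unoriented.append((a.name, b.name))
        elif a.minute < b.minute:
            forward.append((a.name, b.name))
        else:
            forward.append((b.name, a.name))
    return forward, unoriented

events = [DelayEvent("A", 0), DelayEvent("B", 0), DelayEvent("C", 5), DelayEvent("D", 30)]
forward, unoriented = candidate_adjacencies(events, window=10)
print(forward, unoriented)
```

Of the six possible pairs here, only three survive: A-C and B-C with an arrowhead at C, and A-B left unoriented.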

Teaching

Advanced Machine Learning - MSc Computer Science

2024

Tutorial coordinator; Guest lecturer; Teaching Assistant

Machine Learning - BSc Computer Science

2023

Tutorial coordinator; Teaching Assistant

Advanced Machine Learning - MSc Computer Science

2023

Tutorial coordinator; Guest lecturer; Teaching Assistant

Machine Learning - BSc Computer Science

2022

Tutorial coordinator; Teaching Assistant

Calculus and Linear Algebra - University College Utrecht

2019

Teaching Assistant