Past MLG events (imported from old website)

[MLG EVENT] Numerical methods for gradient algorithms on manifolds (Miguel Atencia)

-- Posted by Pierre Dupont on Sun 26 April 2015 at 05:10 pm --

Event date: Mon 04 May 2015 at 11:00 am
Place: Otlet Seminar Room (Reaumur, 3rd floor)

Numerical methods for gradient algorithms on manifolds

Dr. Miguel Atencia (Universidad de Málaga, Spain)

This work is motivated by continuous-time Hopfield networks, which are dynamical systems that perform optimization on a bounded set. The key to their optimization ability is the existence of a Lyapunov function that decreases along trajectories, thus they can be formulated as gradient systems. When implementing such algorithms on a digital computer, it is imperative to consider numerical methods that preserve the gradient descent property. In this regard, discrete gradient methods are presented, highlighting their favourable analytical and computational properties. From this standpoint, we take the leap to gradient systems defined on Riemannian manifolds, with a threefold perspective: dealing with spurious trajectories of discretized Hopfield networks that escape their domain of definition; implementing more general algorithms, such as those defined on matrix manifolds or including constraints; and proposing numerical methods for simulating real systems from physics. We construct numerical methods for gradient dynamical systems on manifolds by applying a discrete gradient method to the system written in local coordinates. Encouraged by promising preliminary results, we will explore the design challenges as well as the possible applications of such methods for numerically integrating systems arising from machine learning, optimization or physics.
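As a minimal illustration of the discrete gradient idea mentioned above, the sketch below applies the Itoh-Abe discrete gradient (one standard choice; the toy energy, step size and iteration counts are our own illustrative assumptions, not the speaker's) to a simple gradient flow and checks that the energy decreases at every step:

```python
import numpy as np

def itoh_abe_step(V, x, h, inner_iters=30):
    """One Itoh-Abe discrete-gradient step for the gradient flow x' = -grad V(x).
    Coordinates are updated one at a time; each scalar update is solved by
    fixed-point iteration. At the fixed point, V(new) - V(old) = -h * dg**2,
    so the energy V can never increase, for any step size h > 0."""
    x = x.astype(float).copy()
    for i in range(x.size):
        xi = x[i]
        Vi = V(x)                         # energy before updating coordinate i
        y = xi - 1e-3                     # initial guess slightly off xi
        for _ in range(inner_iters):
            x[i] = y
            if abs(y - xi) < 1e-12:
                break
            dg = (V(x) - Vi) / (y - xi)   # discrete partial derivative
            y = xi - h * dg
        x[i] = y
    return x

# Toy convex "energy" on R^2, standing in for a Hopfield-style Lyapunov function.
V = lambda z: z[0] ** 2 + 2 * z[1] ** 2
x = np.array([1.0, 1.0])
energies = [V(x)]
for _ in range(20):
    x = itoh_abe_step(V, x, h=0.2)
    energies.append(V(x))
```

The monotone decrease of `energies` is exactly the gradient-descent property the talk argues a numerical method should preserve.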

[MLG EVENT] Learning Deep Neural Networks (Romain Hérault)

-- Posted by Pierre Dupont on Fri 17 April 2015 at 06:12 pm --

Event date: Wed 29 April 2015 at 10:45 am
Place: Shannon seminar room (Maxwell, first floor, A-105)

Joint ISP group and MLG seminar.

In this talk, we will address the problem of learning Deep Neural Networks (DNNs) through the use of smart initializations or regularizations. Moreover, we will look at recent applications of DNNs to structured output problems (such as image labeling or facial landmark detection).

1) Introduction to supervised learning: why use regularization? why look for sparsity?
2) Introduction to perceptron, multilayer perceptron and back-propagation
3) Deep Neural Network and the vanishing gradient problem
4) Smart initializations and topologies (stacked autoencoders, convolutional neural networks)
5) Regularizing (denoising and contractive AE, dropout, multi-obj)
6) Deep architecture for high dimensional output or structured output
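To make points 4 and 5 concrete, here is a hypothetical numpy-only sketch of a one-layer denoising autoencoder with tied weights; the toy data, layer sizes, corruption level and learning rate are illustrative assumptions, not anything from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    """One-layer denoising autoencoder with tied weights, trained by
    plain SGD on the squared error of reconstructing the CLEAN input
    from a corrupted copy."""
    def __init__(self, n_in, n_hidden, noise=0.3, lr=0.5):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b = np.zeros(n_hidden)   # hidden bias
        self.c = np.zeros(n_in)       # reconstruction bias
        self.noise, self.lr = noise, lr

    def step(self, x):
        x_tilde = x * (rng.random(x.size) > self.noise)   # corrupt the input
        h = sigmoid(x_tilde @ self.W + self.b)            # encode
        r = sigmoid(h @ self.W.T + self.c)                # decode (tied weights)
        err = r - x
        dr = err * r * (1 - r)                            # backprop, decoder layer
        dh = (dr @ self.W) * h * (1 - h)                  # backprop, encoder layer
        self.W -= self.lr * (np.outer(x_tilde, dh) + np.outer(dr, h))
        self.b -= self.lr * dh
        self.c -= self.lr * dr
        return float((err ** 2).mean())

# Toy data: 8-dimensional one-hot patterns.
X = np.eye(8)
dae = DenoisingAutoencoder(n_in=8, n_hidden=4)
losses = [np.mean([dae.step(x) for x in X]) for _ in range(200)]
```

In stacked pretraining, the learned hidden codes `h` would become the inputs of the next autoencoder in the stack.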


Y. Bengio, A. Courville, and P. Vincent, "Representation Learning: A Review and New Perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, Aug. 2013.

G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving Neural Networks by Preventing Co-adaptation of Feature Detectors," arXiv:1207.0580, 2012.

Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional Networks and Applications in Vision," in Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 253-256, IEEE, 2010.

J. Lerouge, R. Herault, C. Chatelain, F. Jardin, and R. Modzelewski, "IODA: An Input/Output Deep Architecture for Image Labeling," Pattern Recognition, ISSN 0031-3203, available online 27 March 2015.

[MLG EVENT] Visual Data Analysis - On the hunt for unknown unknowns, and opening very black boxes (Prof Jan Aerts)

-- Posted by Roberto D'Ambrosio on Tue 17 March 2015 at 09:37 am --

Event date: Mon 23 March 2015 at 02:00 pm
Place: Otlet seminar room, Reaumur building 3rd floor

Data analysis has been automated more and more in the last few decades. Advances in machine learning and statistics make it possible to gain a lot of information from large datasets. But are we starting to rely too much on those algorithms alone? In this talk, Jan Aerts will highlight some issues with automated data analysis in general, as well as applied to biology specifically. He will elaborate on different aspects in the field of data visualization (the data, the human, and the combination of the two) and how it can help us to bring data analysis to the next level.

[MLG EVENT] Interpretable models for interdisciplinary data exploration. From using to assisting the expert: one example. (Kerstin Bunte)

-- Posted by Roberto D'Ambrosio on Wed 21 January 2015 at 10:14 am --

Event date: Mon 26 January 2015 at 02:00 pm
Place: Otlet seminar room, Reaumur building 3rd floor

Due to improved sensor technology, dedicated data formats and rapidly increasing digitalization capabilities, the amount of electronic data produced by scientific, educational, medical, industrial and other applications has been growing dramatically for decades. As a consequence, manual inspection of digital data sets often becomes infeasible. Automatic methods are required that identify structural characteristics or task-relevant information and help humans to quickly scan through massive amounts of data. Many powerful machine learning techniques have therefore been developed for problems ranging from dimensionality reduction and feature extraction to regression, classification and information retrieval. Systems which not only enhance performance, but also provide intuitive and interpretable insights into the data and the actual task at hand, are highly desirable. In this presentation I will focus on introducing the ideas of distance learning, dimensionality reduction and visualization by means of an interpretable prototype-based method for biomedical problems.

[MLG EVENT] Metric Learning for Temporal Sequence Alignment (Rémi Lajugie)

-- Posted by Roberto D'Ambrosio on Mon 12 January 2015 at 11:07 am --

Event date: Mon 19 January 2015 at 02:00 pm
Place: Otlet seminar room, Reaumur building 3rd floor

In this talk, we propose to learn a Mahalanobis distance to perform alignment of multivariate time series. The learning examples for this task are time series for which the true alignment is known. We cast the alignment problem as a structured prediction task, and propose realistic losses between alignments for which the optimization is tractable. We provide experiments on real data in the audio-to-audio context, where we show that the learning of a similarity measure leads to improvements in the performance of the alignment task. We also propose to use this metric learning framework to perform feature selection and, from basic audio features, build a combination of these with better alignment performance.
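A minimal sketch of the kind of alignment being learned: dynamic time warping between multivariate sequences under a Mahalanobis metric parameterized by a PSD matrix M. Here M is hand-chosen for illustration; in the talk, M is the quantity being learned from ground-truth alignments:

```python
import numpy as np

def mahalanobis_dtw(A, B, M):
    """DTW alignment cost between two multivariate sequences A (n x d)
    and B (m x d) under d(a, b) = (a - b)^T M (a - b), M PSD."""
    n, m = len(A), len(B)
    diff = A[:, None, :] - B[None, :, :]
    C = np.einsum('ijd,de,ije->ij', diff, M, diff)   # pairwise Mahalanobis costs
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

A = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
B = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [2.0, 0.0]])
cost_euclidean = mahalanobis_dtw(A, B, np.eye(2))          # M = I: squared-Euclidean DTW
cost_masked = mahalanobis_dtw(A, B, np.diag([0.0, 1.0]))   # ignore the first feature
```

Choosing M differently changes which features drive the alignment, which is what makes learning M a form of feature selection.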


[MLG EVENT] Thesis defense: Incremental organ segmentation with machine learning techniques – Application to radiotherapy (Guillaume Bernard)

-- Posted by Guillaume Bernard on Mon 17 November 2014 at 10:20 am --

Event date: Fri 21 November 2014 at 04:00 pm
Place: ISV - Salle de Séminaire B059, Bâtiment Carnoy, Croix du Sud 5

Radiotherapy is a cancer treatment modality that can be considered as a ballistic problem where the tumour must be irradiated while sparing the surrounding healthy organs. Before the beginning of the treatment, the radiation oncologist draws the contours of the organs at risk, tumour and nodal areas on an image of the patient. The accuracy of these contours is crucial to deliver an optimal treatment. In practice, manual drawing is a quite lengthy and repetitive operation, which is moreover subject to certain variability.

This thesis aims at providing new automatic tools to facilitate the delineation of organs at risk. For this purpose, an automatic method of organ segmentation using machine learning techniques is proposed. By automating the most repetitive part of organ drawing, the radiation oncologist can concentrate on the more essential steps of treatment planning.

We propose a method that first over-segments the images with the watershed algorithm, in order to determine homogeneous areas and to avoid processing each pixel independently. We assume that neighbouring pixels with similar intensity belong to the same organ, so each organ is composed of one or several such areas. To determine which organ each area belongs to, we propose to use classification algorithms, such as SVMs and random forests, in an incremental way: the organs are identified one after the other, and organs which have already been identified contribute to identifying the next ones.
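A toy sketch of the incremental idea, with entirely hypothetical data and a nearest-centroid classifier standing in for the SVM/random-forest learners: two "organs" share the same intensity and only become separable once a previously identified organ supplies a context feature:

```python
import numpy as np

# Toy "image": each region has (mean intensity, x position). Organ A is
# bright; organs B and C share the same intensity and are only
# distinguishable by their position relative to A.
regions = np.array([
    [1.0, 0.0], [1.0, 0.5],      # organ A
    [0.4, 1.0], [0.4, 1.5],      # organ B (close to A)
    [0.4, 5.0], [0.4, 5.5],      # organ C (far from A)
])
labels = np.array(['A', 'A', 'B', 'B', 'C', 'C'])

def centroid_classifier(X, y):
    classes = np.unique(y)
    cents = np.array([X[y == c].mean(axis=0) for c in classes])
    def predict(Xq):
        d = ((Xq[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
        return classes[d.argmin(axis=1)]
    return predict

# Stage 1: identify organ A from intensity alone.
clf_a = centroid_classifier(regions[:, :1], np.where(labels == 'A', 'A', 'other'))
stage1 = clf_a(regions[:, :1])

# Stage 2: regions already labelled A contribute a context feature --
# the distance from each region to the nearest identified A region.
a_pos = regions[stage1 == 'A', 1]
ctx = np.abs(regions[:, 1:2] - a_pos[None, :]).min(axis=1, keepdims=True)
rest = stage1 != 'A'
feats = np.hstack([regions[:, :1], ctx])
clf_bc = centroid_classifier(feats[rest], labels[rest])
stage2 = clf_bc(feats[rest])
```

Without the context feature, B and C regions are identical in feature space; the incremental stage makes them separable.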

We show that this incremental method is effective and improves the organ classification accuracy. However, in order to get optimal results, a good sequence of identification is necessary. Two techniques are presented in this thesis to determine such a good sequence of organ classification, starting from a few annotated images. We show that these techniques lead to high identification performance. We also show that a human operator can easily and quickly correct errors made by the incremental approach. Finally, since our method is generic, it can be adapted to all regions of the human body.

[MLG EVENT] Extreme Learning Machines: a Kernel Point of View (Benoît Frenay, Assistant Professor, Namur University, Belgium)

-- Posted by Pierre Dupont on Thu 13 November 2014 at 03:09 pm --

Event date: Mon 17 November 2014 at 02:00 pm
Place: Otlet seminar room, Reaumur building 3rd floor

Extreme learning machines (ELMs) are models which are fast to learn, yet they provide good results in terms of accuracy. These neural networks use a predefined, randomised set of hidden neurons and only their output weights are optimised. In this talk, I will provide an introduction to extreme learning.
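A minimal ELM sketch of the scheme described above (the toy regression task and hyper-parameters are illustrative assumptions): the hidden layer is random and fixed, and only the output weights are solved, in closed form:

```python
import numpy as np

rng = np.random.default_rng(42)

def elm_fit(X, y, n_hidden=50, reg=1e-6):
    """Extreme learning machine: random hidden layer, output weights
    obtained by regularized least squares -- no iterative training."""
    d = X.shape[1]
    W = rng.normal(size=(d, n_hidden))   # random input weights, never trained
    b = rng.normal(size=n_hidden)        # random biases, never trained
    H = np.tanh(X @ W + b)               # hidden activations
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: learn y = sin(x) on [-3, 3].
X = np.linspace(-3, 3, 200)[:, None]
y = np.sin(X[:, 0])
W, b, beta = elm_fit(X, y)
mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
```

Since only a linear system is solved, training is essentially instantaneous compared with backpropagation.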

[MLG EVENT] Computationally Efficient Test for Gene Set Dysregulation (Adrien Dessy, Ph.D. student, ICTEAM/INGI)

-- Posted by Adrien Dessy on Mon 13 October 2014 at 04:20 pm --

Event date: Mon 20 October 2014 at 02:00 pm
Place: Shannon room (a.105), Maxwell building 1st floor

Differential analysis of gene regulatory networks has attracted growing interest lately. The aim of this kind of analysis is to assess whether the interactions or associations of genes differ between two or more biological conditions (typically, a disease status). The identification of genetic regulatory pathways whose structure is condition-dependent brings insights into how organisms and diseases function at the molecular level.

In this presentation, we introduce a computationally efficient test to assess from gene expression data if a given group of genes is differentially regulated between two conditions.

The method yields promising results in terms of precision and recall on real datasets. Indeed, the statistical evaluation shows that the proposed test significantly outperforms a baseline test based on univariate analysis of gene expression data. Moreover, the new test offers performances equivalent to those of the standard permutation test usually implemented in this context, while being much more efficient from a computational point of view.

[MLG EVENT] Handling Imbalanced Datasets by Reconstruction Rules in Decomposition Schemes (Dr. Roberto d'Ambrosio, Post-doc researcher, ICTEAM/INGI)

-- Posted by Pierre Dupont on Mon 06 October 2014 at 02:44 pm --

Event date: Mon 13 October 2014 at 02:00 pm
Place: Otlet seminar room, Reaumur building 3rd floor

Disproportion among class priors is encountered in a large number of domains, making conventional learning algorithms less effective at predicting samples belonging to the minority classes. Most of the proposals reported in the literature deal with imbalanced classification problems composed of two classes, usually named dichotomies, whereas little effort has been directed towards multiclass recognition tasks, referred to as polychotomies. In the latter case, existing approaches can be roughly divided into two groups: methods addressing the multiclass task directly, and methods using decomposition schemes. A decomposition scheme divides the polychotomy into several binary classification tasks, and a reconstruction rule then combines the predictions of the binary learners to estimate the final decision.

When a skewed classification task is addressed by a decomposition approach, existing methods propose specialized classification algorithms that work at the level of the single binary learners, while little effort has been directed towards developing a reconstruction rule specifically tailored for class-imbalance learning. On this motivation, this talk presents two reconstruction rules suited to multiclass skewed data. These rules use information coming from the classification reliability, i.e. a measure of the goodness of each classification act. This quantity takes into account phenomena like noise, borderline samples, etc., and conveys useful information on the classification process. The first rule, Reconstruction Rule by Selection, is an empirical rule that addresses the imbalanced problem in the One-per-Class decomposition framework. It uses classifier reliabilities, crisp labels and the a-priori sample distribution to compute the final decision. The second rule is a statistical reconstruction rule in the Error Correcting Output Code (ECOC) decomposition framework, which applies softmax regression to estimate the final classification. Both rules show better performance at solving imbalanced problems than traditional methods.

[MLG EVENT] Using Machine Learning and Bayesian methods to analyze Large Supernovae Datasets (Prof. Melvin Varughese, University of Cape Town, South Africa)

-- Posted by Pierre Dupont on Mon 11 November 2013 at 11:17 am --

Event date: Mon 02 December 2013 at 03:00 pm
Place: Nyquist Room, Maxwell Building (1st floor)

Type Ia Supernovae are one of the most widely used standard candles in astronomy. That is, they can be used to calculate distances to astronomical objects since their peak brightness is very well-determined. In addition to distance, one may use a spectroscopic reading of a type Ia supernova (or its host galaxy) to determine how fast it is receding from us. Consequently, by examining how the speed of recession varies with the distance to the supernova, it is possible to recreate the expansion history of the universe. This history enables us to estimate cosmological parameters for the early universe.

A serious impediment to utilizing type Ia supernovae to learn about the early universe is the probable contamination of the dataset with other supernova types. Any supernova dataset will consist of multiple types of supernovae, where (with the exception of type Ia supernovae) we are able to determine neither the peak brightness of the supernova nor its distance. It is important to account for this contamination, as misclassifying a supernova as a type Ia will severely bias our estimates of the cosmological parameters.

This seminar studies how to determine the probability that a transient object is a type Ia supernova, given a set of flux measurements for the object through time. In the process, we characterize the flux measurements with a parameterized function and feed the resulting fits into a classification algorithm. The algorithms are trained on 1,200 spectroscopically observed supernovae and are subsequently used to predict the classes of 20,000 test supernovae. These probabilities may be used within a Bayesian framework to obtain an unbiased estimate of the cosmological parameters.

[MLG EVENT] PhD Confirmation (Jérôme Paul)

-- Posted by Jérôme Paul on Mon 18 November 2013 at 09:55 am --

Event date: Wed 20 November 2013 at 02:00 pm
Place: Otlet Meeting Room (3rd floor, Reaumur, LLN)

Dear all,

It is my pleasure to invite you to my PhD Confirmation. It will take place this Wednesday (November 20) at 2pm in the Otlet meeting room.

I will summarize my work about feature selection from heterogeneous biomedical data.
I'll mainly focus on three aspects:
- an analysis of random forest's stability
- a statistically interpretable feature importance measure for random forest
- heterogeneous feature selection with kernels


[MLG EVENT] Bayesian regression and classification with multivariate sparsifying priors (Tom Heskes, Radboud University Nijmegen, Editor-in-Chief of Neurocomputing)

-- Posted by Jérôme Paul on Tue 18 June 2013 at 10:12 am --

Event date: Tue 25 June 2013 at 02:00 pm
Place: Shannon seminar room, Maxwell a.105

Many regression and classification problems in neuroimaging and bioinformatics belong to the "large p, small n" class: many variables, just a few data points. Popular methods for handling such problems include L1-regularization and spike-and-slab variable selection. These methods are univariate when it comes to determining which variables are selected. In this talk I will present multivariate extensions that allow for the incorporation of (spatio-temporal) constraints and lead to smooth importance maps. I will discuss how to arrive at efficient algorithms for (approximate) inference and will illustrate the methods on fMRI analysis and EEG source localization.

[MLG EVENT] Stochastic gradient methods for machine learning (Roman Zakharov)

-- Posted by Jérôme Paul on Fri 15 March 2013 at 01:50 pm --

Event date: Fri 29 March 2013 at 02:00 pm
Place: Auditoire BARB21, Sainte Barbe building (Place Sainte Barbe, Louvain-la-Neuve)

This talk is inspired by the talk "Large-scale convex optimisation for machine learning", given by Francis Bach (INRIA - Ecole Normale Superieure) at the "Nonsmooth optimisation in machine learning" workshop and based on his presentation.

Many machine learning problems can be cast as convex optimisation problems. The common issue in solving these problems is the size of the data: either there are many observations ("large n"), or each observation is represented by many variables ("large p"), or both. In this setting, online algorithms, which pass over the data only once, may be preferred over batch algorithms, which require multiple passes over the data. This talk gives an overview of stochastic gradient methods and shows that an appropriate combination of batch and online algorithms leads to interesting results, such as a linear convergence rate with an iteration cost similar to stochastic gradient descent.
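A plain stochastic gradient descent sketch for the "large n" least-squares setting discussed above (the synthetic data, step size and epoch count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem with many observations ("large n").
n, d = 5000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def sgd(X, y, lr=0.01, epochs=5):
    """Plain SGD on the objective 0.5 * (x_i^T w - y_i)^2: one randomly
    drawn observation per update, so each iteration costs O(d) instead
    of the O(n d) cost of a full batch gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

w_hat = sgd(X, y)
err = np.linalg.norm(w_hat - w_true)
```

Hybrid batch/online schemes of the kind discussed in the talk improve on this plain recipe by combining the cheap per-iteration cost with a faster (linear) convergence rate.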

[MLG EVENT] Supervised Metric Learning with Generalization Guarantees (Aurélien Bellet)

-- Posted by Pierre Dupont on Tue 23 October 2012 at 08:28 pm --

Event date: Mon 12 November 2012 at 02:00 pm
Place: Otlet room - Reaumur (3rd floor)


In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions, using knowledge from training data to make them suitable for the problem at hand. This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to some local constraints over the training sample. These approaches have two important limitations. First, they can improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored. In this talk, I present theoretical and algorithmic contributions that address these limitations. More precisely, I describe new approaches (inspired by the recent theory of good similarity functions) to learn similarities for numerical and structured data through convex optimization. Our methods can be seen as minimizing a bound on the generalization error of a linear classifier built from the learned similarity. Experiments on various datasets highlight the usefulness of these approaches, in particular the notable sparsity of the resulting classifiers, allowing fast predictions even in high dimension. I also briefly describe a simple adaptation of the notion of algorithmic robustness that allows the derivation of generalization guarantees for many existing metric learning methods.

[MLG EVENT] Hybrid Semantic Similarity Measures (Alexander Panchenko, Université catholique de Louvain and Moscow State Technical University)

-- Posted by Jérôme Paul on Tue 02 October 2012 at 09:50 am --

Event date: Mon 05 November 2012 at 02:00 pm
Place: Otlet seminar room (Reaumur, 3rd Floor)

This talk is devoted to an application of machine learning to natural language processing. We present several novel hybrid semantic similarity measures between terms. These meta-measures use as features 16 baseline measures based on WordNet, the web as a corpus, text corpora, dictionaries, and encyclopedias. The hybrid measures rely on 8 combination methods and 3 simple feature selection techniques. In particular, we present a supervised measure trained on a set of known semantic relations between terms. The results are evaluated on three tasks: correlation with human judgements, semantic relation ranking, and semantic relation extraction. Our results show that hybrid measures outperform single baseline measures by a wide margin, achieving a correlation of up to 0.890 with human judgements and Precision(20) up to 0.987 on the semantic relation ranking task. We also discuss possible applications of the developed measures to text classification, query expansion, and term/text clustering.


[MLG EVENT] Semi-Supervised Learning with Kernel Induction: a subject for research collaboration (Antonio Braga (UFMG university, Belo Horizonte, Brazil))

-- Posted by Jérôme Paul on Mon 22 October 2012 at 03:18 pm --

Event date: Thu 18 October 2012 at 11:30 am
Place: Maxwell seminar room

The format of the seminar will be to raise questions for further collaboration more than presenting solutions to current problems.

[MLG EVENT] Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization (Jaakko Peltonen (Aalto University, Finland))

-- Posted by Jérôme Paul on Thu 07 June 2012 at 01:55 pm --

Event date: Fri 08 June 2012 at 02:00 pm
Place: Otlet seminar room (Reaumur, 3rd Floor)

Nonlinear dimensionality reduction methods are often used to visualize high-dimensional data, although the existing methods have been designed for other related tasks such as manifold learning. It has been difficult to assess the quality of visualizations since the task has not been well-defined. We give a rigorous definition for a specific visualization task, resulting in quantifiable goodness measures and new visualization methods. The task is information retrieval given the visualization: to find similar data based on the similarities shown on the display. The fundamental tradeoff between precision and recall of information retrieval can then be quantified in visualizations as well. The user needs to give the relative cost of missing similar points vs. retrieving dissimilar points, after which the total cost can be measured. We then introduce a new method NeRV (neighbor retrieval visualizer) which produces an optimal visualization by minimizing the cost. We further derive a variant for supervised visualization; class information is taken rigorously into account when computing the similarity relationships. We show empirically that the unsupervised version outperforms existing unsupervised dimensionality reduction methods in the visualization task, and the supervised version outperforms existing supervised methods.
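The precision/recall notion used above can be sketched directly (the data and neighbourhood sizes are hypothetical): retrieve each point's nearest neighbours on the display and score them against the truly similar points in the original space:

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each point's k nearest neighbours (squared Euclidean)."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def retrieval_precision_recall(X_high, X_low, k_rel=5, k_ret=5):
    """Mean precision/recall of 'retrieve the k_ret nearest display
    neighbours' against 'the k_rel truly similar points' in the
    original space -- the tradeoff NeRV-style methods optimize."""
    rel = knn_indices(X_high, k_rel)
    ret = knn_indices(X_low, k_ret)
    overlap = [len(set(a) & set(b)) for a, b in zip(rel, ret)]
    precision = np.mean([m / k_ret for m in overlap])
    recall = np.mean([m / k_rel for m in overlap])
    return precision, recall

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
p_perfect, r_perfect = retrieval_precision_recall(X, X.copy())      # identity "display"
p_rand, r_rand = retrieval_precision_recall(X, rng.normal(size=(50, 2)))
```

A display that preserves neighbourhoods perfectly scores 1.0 on both measures; an unrelated 2-D embedding scores strictly lower.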

[MLG EVENT] Including prior knowledge in machine learning for genomic data (Jean-Philippe Vert (Mines ParisTech - Institut Curie))

-- Posted by Pierre Dupont on Wed 26 October 2011 at 04:59 pm --

Event date: Mon 14 November 2011 at 11:00 am
Place: Euler seminar room (A002, ground floor)

Estimating predictive models from high-dimensional and structured genomic data measured on a small number of samples is one of the most challenging statistical problems raised by current needs in computational biology. Popular tools in statistics and machine learning to address this issue are so-called shrinkage estimators, which minimize an empirical risk regularized by a penalty term, and which include for example support vector machines or the LASSO. In this talk we will discuss new penalty functions for shrinkage estimators, including generalizations of the LASSO which lead to particular sparsity patterns, and which can be seen as a way to include problem-specific prior information in the estimator. We will illustrate the approach by several examples such as the classification of gene expression data using gene networks as prior knowledge, or the classification and detection of frequent breakpoints in DNA copy number profiles.
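As a concrete instance of a shrinkage estimator, here is a proximal-gradient (ISTA) sketch of the plain LASSO on synthetic data; this is the baseline penalty, not the structured generalizations of the talk, and the data and regularization level are illustrative assumptions:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """LASSO by proximal gradient (ISTA): a gradient step on the squared
    loss 0.5*||Xw - y||^2, then soft-thresholding, which is the proximal
    map of the penalty lam*||w||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(1)
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]            # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=n)
w_hat = lasso_ista(X, y, lam=1.0)
```

The L1 penalty drives the irrelevant coefficients exactly to zero, which is the sparsity pattern the structured penalties of the talk generalize.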

To attend this event, please register on the CIL website (link below).


[MLG EVENT] Multi-Attribute Networks and the Impact of Partial Information on Inference and Characterization (Eric Kolaczyk)

-- Posted by Pierre Dupont on Mon 17 October 2011 at 12:52 pm --

Event date: Mon 07 November 2011 at 02:00 pm
Place: UCL, Euler room, bâtiment Euler, avenue Georges Lemaître, 4, Louvain-la-Neuve

Association networks represent systems of interacting elements, where a link between two different elements indicates a sufficient level of similarity between element attributes. While in reality relational ties between elements can be expected to be based on similarity across multiple attributes, the vast majority of work to date on association networks involves ties defined with respect to only a single attribute. We propose an approach for the inference of multi-attribute association networks from measurements on continuous attribute variables, using canonical correlation and a hypothesis-testing strategy. Within this context, we then study the impact of partial information on multi-attribute network inference and characterization, when only a subset of attributes is available. We examine through a combination of analytical and numerical techniques the implications of the choice and number of node attributes on the ability to detect network links and, more generally, to estimate higher-level network summary statistics, such as node degree, clustering coefficients, and measures of centrality. We consider in detail the case of two attributes and discuss generalization of our findings to more than two attributes. Our work is motivated by and illustrated within the context of gene/protein regulatory networks in human cancer cells.
Joint work with Natallia Katenka.

[MLG EVENT] Ensemble Logistic Regression for Feature Selection (Roman Zakharov)

-- Posted by Jérôme Paul on Fri 07 October 2011 at 06:35 pm --

Event date: Wed 26 October 2011 at 02:00 pm
Place: Otlet seminar room (Reaumur, 3rd floor)

A novel feature selection algorithm embedded into logistic regression is
proposed in this talk. It specifically addresses high dimensional data
with few observations, which are commonly found in the biomedical domain
such as microarray data. The overall objective is to optimize the
predictive performance of a classifier while favoring also sparse and
stable models.

Feature relevance is first estimated according to a simple t-test
ranking. This initial feature relevance is treated as a feature sampling
probability and a multivariate logistic regression is iteratively
reestimated on subsets of randomly and non-uniformly sampled features.
At each iteration, the feature sampling probability is adapted according
to the predictive performance and the weights of the logistic
regression. Globally, the proposed selection method can be seen as an
ensemble of logistic regression models voting jointly for the final
relevance of features.
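A simplified sketch of the scheme described above, with a plain gradient-descent logistic regression and illustrative hyper-parameters standing in for the actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def logreg_weights(X, y, lr=0.1, iters=200):
    """Plain logistic regression (no intercept) fit by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def ensemble_logreg_selection(X, y, n_rounds=100, k=5):
    n, d = X.shape
    # Initial relevance: absolute two-sample t-statistic per feature.
    m1, m0 = X[y == 1].mean(0), X[y == 0].mean(0)
    s = np.sqrt(X[y == 1].var(0) / (y == 1).sum() + X[y == 0].var(0) / (y == 0).sum())
    prob = np.abs((m1 - m0) / (s + 1e-12))
    prob /= prob.sum()
    relevance = np.zeros(d)
    for _ in range(n_rounds):
        idx = rng.choice(d, size=k, replace=False, p=prob)  # non-uniform sampling
        w = logreg_weights(X[:, idx], y)
        relevance[idx] += np.abs(w)      # ensemble "vote" on feature relevance
        prob[idx] += 0.01 * np.abs(w)    # adapt sampling probabilities
        prob /= prob.sum()
    return relevance

# Toy data: only the first two of ten features carry signal.
n, d = 200, 10
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
rel = ensemble_logreg_selection(X, y)
```

The accumulated relevance scores concentrate on the informative features, mimicking the joint vote of the ensemble described above.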

Practical experiments reported on several microarray datasets show that
the proposed method offers comparable or better stability and
significantly better predictive performance than logistic regression
regularized with the Elastic Net. It also outperforms a selection based
on Random Forests, another popular embedded feature selection method
relying on an ensemble of classifiers.

[MLG EVENT] Using weighted automata as a probabilistic model (Raphaël Bailly)

-- Posted by Pierre Dupont on Wed 11 May 2011 at 11:48 am --

Event date: Wed 08 June 2011 at 11:00 am

Raphaël Bailly, from the Laboratoire d'Informatique Fondamentale of the University of Marseille (France),
will give a talk on Wednesday June 08, 2011, from 11:00 to 12:00, in the Euler 002 seminar room.
This is a joint seminar from the Machine Learning Group and the Large Graphs and Networks Group.


The framework of this seminar is the problem of density estimation on
structured data (finite strings or trees). The most widely used model for
this kind of task is the Hidden Markov Model, or HMM, whose training is
done by the Baum-Welch algorithm, which maximizes the likelihood of a
training sample. It is then necessary to limit the expressiveness of the
considered HMM family by setting a maximum number of hidden states, in
order to avoid overfitting. This is usually done using a priori
knowledge, or some heuristics.

Weighted automata are a generalization of HMMs. Their main interest
lies in the fact that there is a simple, convergent way to estimate the
number of states and the parameters of such models. The main problem
comes from the use of these models as probabilistic models, in the
sense that weighted automata obtained by these methods do not define,
in general, a probability distribution.

We will present some properties of these models related to their use
as probabilistic models. We then give convergence results for the
parameter estimation method. Finally, we introduce some variants of
weighted automata which solve many of the problems mentioned here.
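One concrete reason state estimation is simple for weighted automata is the Hankel-matrix viewpoint: for a function computed by a k-state automaton, any prefix-by-suffix Hankel block has rank at most k, so the numerical rank of an empirical block estimates the number of states. A toy sketch (the example functions are our own illustrative assumptions):

```python
import numpy as np

def hankel_block(f, prefixes, suffixes):
    """Hankel block H[p, s] = f(p + s) over chosen prefixes and suffixes."""
    return np.array([[f(p + s) for s in suffixes] for p in prefixes])

# f1: "stopping probability" of a one-state automaton that emits 'a'
# with probability 0.5 and stops with probability 0.5.
f1 = lambda w: 0.5 ** (len(w) + 1)
# f2: mixture of two such geometric series -- a two-state function.
f2 = lambda w: 0.5 ** (len(w) + 1) + 0.25 * 0.3 ** len(w)

strings = ['', 'a', 'aa', 'aaa']
rank1 = np.linalg.matrix_rank(hankel_block(f1, strings, strings), tol=1e-10)
rank2 = np.linalg.matrix_rank(hankel_block(f2, strings, strings), tol=1e-10)
```

Spectral methods build on exactly this: an SVD of the Hankel block yields both the state count and the automaton parameters.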

[MLG EVENT] Statistical Emulation of the Rapid Response of a Climate Model to Astronomical Forcing Variations (Nabila Bounceur, UCL)

-- Posted by Pierre Dupont on Mon 07 March 2011 at 02:14 pm --

Event date: Wed 11 May 2011 at 02:00 pm

Nabila Bounceur from the Georges Lemaître Centre for Earth and Climate Research (UCL) will give a seminar on Wednesday May 11 at 14.00. This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI).


We are interested in the response of a general circulation model of the atmosphere and the ocean to variations of the astronomical forcing during the Pleistocene. However, the demand on computing resources would be far too high for long simulations. Our aim is therefore to formulate a reduced-order model for this response, by constructing a statistical model called an emulator, calibrated on the available runs.

Insolation over long time scales is influenced by the eccentricity, the longitude of the perihelion, and the obliquity. To deal with this astronomical theory of paleoclimate, input data are expressed in a suitable basis derived from those parameters. The choice of a small number of experiments to develop the emulator is crucial. The experimental points were distributed following two designs, for comparison: a full factorial design and a Latin hypercube design, the latter maximizing the minimum distance between design points. 27 experiments per design were then run.

Here, we have designed and developed an emulator of a three-dimensional Earth system model of intermediate complexity (LOVECLIM, Goosse et al., 2010), considering the principal components of its response (surface temperature) obtained with a weighted principal component analysis. The first three principal components account for 99% of the response variance.

For the statistical emulation, we first used a linear regression and then a nonlinear one, combining a Gaussian process with a linear regressor. The latter quantifies the uncertainty of evaluating the emulator from a limited number of input data. The response surfaces obtained for each emulator parameter allow us, accounting for these uncertainties, to study the influence of the variation of the orbital parameters on the surface temperature and to detect nonlinearities.
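A minimal Gaussian-process emulator sketch (zero prior mean only, omitting the linear-regression component of the talk; the "simulator", design points and kernel length scale are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(A, B, length=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_emulator(X_train, y_train, X_test, length=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel,
    used as a cheap statistical surrogate for an expensive simulator."""
    K = rbf_kernel(X_train, X_train, length) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train, length)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# Pretend this is an expensive climate-model response: we run it on a
# small design of inputs and emulate it everywhere else.
simulator = lambda x: np.sin(3 * x[:, 0]) + x[:, 0] ** 2
X_design = np.linspace(0, 1, 9)[:, None]
y_design = simulator(X_design)
X_new = np.array([[0.33], [0.71]])
mean, var = gp_emulator(X_design, y_design, X_new, length=0.3)
```

The posterior variance is what gives the emulator its uncertainty quantification: it is small near design points and grows away from them.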

[MLG EVENT] Electrocardiogram Analysis (Simon Dablemont)

-- Posted by Simon Dablemont on Wed 23 February 2011 at 08:26 am --

Event date: Wed 13 April 2011 at 02:00 pm

The electrocardiogram (ECG) is used, at the clinical stage, when drugs are tested in human volunteers during a series of clinical trials.

ECG is a non-invasive signal that measures the electrical activity of the heart.

Drug-induced changes in the ECG intervals are now the gold standard for assessing the potential of a drug to cause sudden cardiac death.

The aim of researchers is to develop algorithms that automatically compute ECG interval measurements.

At present, hidden Markov models are in use. Some researchers use a non-linear Kalman filter, but this filter seems very difficult to stabilize.

We suggest a new functional approach that aligns measurements to find a target curve and normalizes the experts' annotations by non-linear warping functions.

[MLG EVENT] Ph.D. Defense - Automated modeling and processing of long-term electrocardiogram signals. (G. de Lannoy)

-- Posted by Gael de Lannoy on Wed 02 March 2011 at 12:05 pm --

Event date: Fri 08 April 2011 at 02:30 pm

With the growing complexity of clinical technology, the medical community is now facing large amounts of data. These data originate from such various sources as radiology imaging, microarray experiments, spectroscopy and physiological signals, among others. Physiological signals are recordings of the electrical activity generated by the human body, for example in the muscles or in the brain. The analysis of these signals holds great potential for such various tasks as brain-computer interfaces and the monitoring of body functions, including the diagnosis of disease conditions. In particular, the electrocardiogram (ECG) is a physiological signal representing the electrical activity produced by the heart. In the large majority of situations involving the recording of an ECG signal, long-term monitoring is required so as not to miss any transient pattern. This thesis focuses on the design and the assessment of machine learning algorithms to automatically process such ECG recordings. In particular, four crucial topics are investigated.

(1) The first topic concerns the accurate and automatic segmentation of the ECG characteristic points, corresponding to the onset and ending of the P, QRS and T waves using sparse conditional random fields and the continuous wavelet transform. Experiments are conducted on real human ECG recordings.

(2) The second topic concerns the analysis of fluctuations in autonomic activity by heart rate variability metrics. A critical review of heart variability metrics is provided and the possible use of heart rhythm variations as a marker of epileptic seizures is investigated. Experiments are conducted on real ECG recordings from epileptic patients.

(3) The third topic focuses on the supervised classification of heart beats, i.e. the labeling of beats in a recorded ECG signal as either a normal beat or a pathological beat. The main difficulties are the strong unbalance in the number of beats of each type and the extraction of discriminative features from the heart beat time-series. Specific weighted and sparse classifiers able to handle these difficulties are designed. In particular, a weighted variant of the SVM and of the (L1-regularized) CRF classifiers are investigated. Experiments are conducted on real pathological ECG signals.

(4) The fourth topic concerns the filtering of ECG artifacts in other physiological recordings using ad-hoc filtering and semi-blind source separation techniques, such as periodic component analysis. Experiments are conducted on invasive recordings from the vagus nerve in rats.

[MLG EVENT] DFASAT: Learning Finite State Software Models By Satisfiability (Sicco Verwer)

-- Posted by Pierre Dupont on Wed 23 March 2011 at 06:08 pm --

Event date: Mon 04 April 2011 at 02:00 pm

Sicco Verwer received a PhD from Delft University and is currently a post-doctoral researcher at KULeuven.
He will give a seminar about the winning algorithm of the STAMINA competition.

This talk will take place on Monday April 04, at 14.00 in the Euler seminar room.


In this talk, we will give an overview of the techniques we used to solve difficult problem instances in the STAMINA DFA learning competition. First, we describe the base algorithm, which is a standard evidence-driven state-merging type algorithm but uses a SAT solver in order to search for an optimal solution. Second, we will explain the modifications we made to the base algorithm in order to learn models for software systems. Especially using a different evidence value proved to be vital for winning the STAMINA competition. This essentially changes the traditional Occam learning bias to one that is more suited to software models. Even with this improved bias, searching for the optimal answer remains very useful.

This talk will be followed by an open discussion about the design of such a competition.

[MLG EVENT] Low-rank matrix completion for recommender systems: optimization on manifolds at work. (Nicolas Boumal, UCL/ICTEAM/INMA)

-- Posted by Jérôme Paul on Tue 08 February 2011 at 10:04 am --

Event date: Fri 25 March 2011 at 02:00 pm

This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI).


We consider moderately large matrices (millions of entries) of low rank. We address the problem of recovering, i.e., completing such matrices when most of the entries are unknown, and allow for the observed entries to be noisy.

Matrix completion is directly linked to collaborative filtering and recommender systems. In this machine learning setting, the rows of the incomplete matrix correspond to items and the columns correspond to users, or vice versa. The known entries of the matrix are the recorded ratings that some users gave to some items. The aim is to predict which items should be recommended to which users, i.e., predict the unobserved ratings.

The matrix completion problem is commonly stated in a constrained optimization framework. Our approach, though not the only one of its kind, exploits the geometry of the low-rank constraint to recast the problem as an unconstrained optimization problem on the Grassmann manifold. We then apply a second-order Riemannian trust-region method to find a good local minimizer of the associated objective function. By doing so, we improve on key aspects of existing methods such as Admira, FPCA, OptSpace, SET and others.

We demonstrate the performance of our algorithm in terms of accuracy and speed on synthetic and real data, and compare it against existing algorithms (which we are still investigating). We will also discuss fundamental issues inherent in extending our algorithm to very large scale problems (matrices with billions of entries such as, e.g., the Netflix data).
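The talk's method is a Riemannian trust-region scheme on the Grassmann manifold; as a much simpler baseline for the same completion problem, a regularized alternating least squares sketch (our own illustration, not the speaker's algorithm) could look like this:

```python
import numpy as np

def als_complete(M, mask, rank, iters=50, reg=1e-3, seed=0):
    """Fit M ≈ U V^T on the observed entries only (mask is boolean),
    alternating least-squares updates of the factor rows."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    for _ in range(iters):
        for i in range(m):                 # update row i of U
            idx = mask[i]
            Vi = V[idx]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg * np.eye(rank),
                                   Vi.T @ M[i, idx])
        for j in range(n):                 # update row j of V
            idx = mask[:, j]
            Uj = U[idx]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg * np.eye(rank),
                                   Uj.T @ M[idx, j])
    return U @ V.T
```

In the recommender-system reading, rows are items, columns are users, and the unobserved entries of `U @ V.T` are the predicted ratings.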


[MLG EVENT] An Asymmetric Laplacian and Commute Times for a Directed Graph (Daniel Boley)

-- Posted by Gael de Lannoy on Tue 08 February 2011 at 10:26 am --

Event date: Fri 18 March 2011 at 11:00 am

On March 18, 2011, 11:00-12:00, Prof. Daniel Boley, University of Minnesota (department of Computer Science and Engineering), USA, will make a presentation of his recent work in the Euler room (Euler Building, Ground floor, UCL, Louvain-la-Neuve).

This event is jointly organised by the "Machine Learning Group" and "Large Graphs and Networks", UCL.

Here are the title and the abstract.

An Asymmetric Laplacian and Commute Times for a Directed Graph.

For undirected graphs, it is well known how to obtain the expected inter-vertex commute times from the graph Laplacian matrix. We show the same formulas hold in the case of strongly connected directed graphs. Our result is obtained by deriving a close relation between the Laplacian with a particular scaling and the so-called fundamental matrix for an associated random walk over the graph. We find that the commute times still form a metric and give bounds in terms of the stationary probabilities for the random walk. We compare these commute times with those obtained from a previously proposed symmetrized Laplacian derived from a related weighted undirected graph.
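For the undirected case the abstract starts from, the commute times follow from the Moore-Penrose pseudoinverse L+ of the Laplacian via C(i, j) = vol(G) * (L+[i,i] + L+[j,j] - 2 L+[i,j]). A small sketch of that well-known formula (the talk's directed-graph extension is not reproduced here):

```python
import numpy as np

def commute_times(W):
    """Expected commute times of a random walk on an undirected graph with
    symmetric weight matrix W, from the pseudoinverse of the Laplacian:
        C(i, j) = vol(G) * (L+[i, i] + L+[j, j] - 2 * L+[i, j])."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    Lp = np.linalg.pinv(L)          # Moore-Penrose pseudoinverse
    vol = d.sum()
    diag = np.diag(Lp)
    return vol * (diag[:, None] + diag[None, :] - 2 * Lp)
```

On a path graph 0-1-2 with unit weights, for instance, this gives commute times 4 between adjacent vertices and 8 between the endpoints, matching the electrical-network interpretation.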

[MLG EVENT] Transcriptional Network Inference from Functional Similarity and Expression Data: A Supervised Approach. (Jerome Ambroise)

-- Posted by Thibault Helleputte on Thu 18 November 2010 at 03:34 pm --

Event date: Thu 16 December 2010 at 02:00 pm

In this seminar, I will present several algorithms for reconstructing gene regulatory networks from gene expression data. A gene regulatory network focuses on interactions between transcription factors and their target genes.

Unsupervised methods will be presented first, including relevance networks and Gaussian graphical models. When the number p of variables (genes) is much larger than the number n of microarray experiments, standard approaches to compute Gaussian graphical models are inappropriate. Suitable alternatives, based either on regularized estimation of the inverse covariance matrix or on regularized high-dimensional regression, will be briefly introduced. Then, I will present the supervised algorithm recently proposed by F. Mordelet and J.-P. Vert. This supervised approach leads to improved predictive performance, but the method cannot be used to predict interactions involving an 'orphan' transcription factor. Finally, I will present the 'TNIFSED' method, which infers a gene regulatory network by integrating correlation and partial correlation coefficients with gene functional similarity through a supervised classifier. Compared to the supervised SIRENE algorithm, TNIFSED performed slightly worse when transcription factors are associated with a wide range of already identified target genes. However, unlike SIRENE, the predictive performance of TNIFSED does not decrease with the number of target genes, a feature which makes TNIFSED suitable for discovering target genes associated with 'orphan' transcription factors.
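As a minimal illustration of the first family of methods, a relevance network simply connects gene pairs whose absolute correlation across experiments exceeds a threshold. The threshold value and function name below are our own choices, not those of the talk:

```python
import numpy as np

def relevance_network(expr, threshold=0.8):
    """Relevance network: an edge between two genes whenever the absolute
    Pearson correlation of their expression profiles exceeds `threshold`.
    `expr` has one row per gene, one column per experiment."""
    corr = np.corrcoef(expr)           # genes x genes correlation matrix
    adj = np.abs(corr) > threshold
    np.fill_diagonal(adj, False)       # no self-loops
    return adj
```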

This talk will take place in the Euler Building Seminar Room.

[MLG EVENT] Presentation of the UCL-Geomatics research activities (Sophie Bontemps (Earth and Life Institute))

-- Posted by Thibault Helleputte on Thu 25 November 2010 at 03:25 pm --

Event date: Thu 09 December 2010 at 02:00 pm

The UCL-Geomatics research group is part of the Earth and Life Institute of the Université catholique de Louvain. Shaped by end-user needs through close collaborations with international agencies and private companies, its research strategy is centred on developing methods to use spatial and temporal data (acquired from field measurements and/or satellite acquisitions) for environmental management.

In remote sensing, the team's expertise includes canopy biophysical variable retrieval using radiative transfer modelling, comprehensive field experiments and advanced image processing techniques (such as object-based approaches, classification algorithms and multivariate statistical methods applied to space-time analysis). Applications focus on the estimation of biophysical variables (vegetation and soil) and on monitoring land cover dynamics for the agriculture, forestry and climate modelling sectors. Through several challenging international projects (such as JRC-GLC2000 and ESA-GLOBCOVER), the team has acquired unique expertise in mass processing of very large volumes of data, enabling it to handle all earth observation scales, from very high resolution imagery to global daily time series.

The group also develops geographical information system applications supporting multidisciplinary modelling and participatory dynamics for natural resources management in both tropical and temperate regions.

Finally, the team is also specialized in the spatial and temporal modelling and analysis of data, from both the theoretical and practical viewpoints, with special emphasis on (i) accounting simultaneously for data sources exhibiting different spatial/temporal quality and resolution and (ii) resolving conflicting information through data fusion procedures. This expertise covers the development of advanced mathematical models as well as of user-friendly numerical analysis packages. The methods used cover Bayesian analysis, spatial statistics, advanced geostatistics and time series analysis.

This seminar aims at presenting some of the research projects in which the UCL-Geomatics group is involved. The methodologies which are used will be introduced as well as the scientific and societal issues they address.

This talk will take place in the Euler Building Seminar Room.

[MLG EVENT] How to Validate Merging of Cancer Microarray Data Sets: An Extensive Comparison (Jonatan Taminau (VUB))

-- Posted by Thibault Helleputte on Sun 31 October 2010 at 10:14 am --

Event date: Thu 02 December 2010 at 02:00 pm

This talk will cover two distinct topics and will take place in the Euler Building Seminar Room.

Part 1. How to Validate Merging of Cancer Microarray Data Sets: An Extensive Comparison

Motivation: There is a vast amount of gene expression data that has been gathered in microarray studies all over the world. Many of these studies use different experimentation plans, different platforms, different methodologies, etc. Because there clearly is a need to create combined data sets which will allow more statistically relevant analysis, merging information of different studies is an important part of current research in bio-informatics and several algorithms have been proposed recently. In this article, we concisely describe several of those microarray data merging techniques and apply them on different cancer microarray data sets.
Results: We study three cases of increasing complexity and test all methods by using a number of popular validation criteria. Furthermore, we test the compatibility of the transformed data sets by performing cross-study classification. This setting, with different types of cancer, different microarray data merging techniques and different validation methods, shows us a lack of consistency between the obtained results and opens the perspective of new, more general, approaches.

Part 2. Applying Subgroup Discovery for the Analysis of String Quartet Movements

Descriptive and predictive analyses of symbolic music data assist in understanding the properties that characterize specific genres, movements and composers. Subgroup Discovery, a machine learning technique lying on the intersection between these types of analysis, is applied on a dataset of string quartet movements composed by either Haydn or Mozart. The resulting rules describe subgroups of movements for each composer, which are examined manually, and we investigate whether these subgroups correlate with metadata such as type of movement or period. In addition to this descriptive analysis, the obtained rules are used for the predictive task of composer classification; results are compared with previous results on this corpus.

[MLG EVENT] Expectation Propagation for Bayesian Multi-task Feature Selection (Daniel Hernandez-Lobato (ICTM/INGI))

-- Posted by Thibault Helleputte on Sun 31 October 2010 at 10:23 am --

Event date: Thu 25 November 2010 at 02:00 pm

Daniel Hernandez-Lobato (ICTM/INGI) will give a talk about a work presented at ECML 2010. The talk will take place in the Euler Building Seminar Room.

Abstract: In this talk we propose a Bayesian model for multi-task feature selection. This model is based on a generalized spike and slab sparse prior distribution that enforces the selection of a common subset of features across several tasks. Since exact Bayesian inference in this model is intractable, approximate inference is performed through expectation propagation (EP). EP approximates the posterior distribution of the model using a parametric probability distribution. This posterior approximation is particularly useful to identify relevant features for prediction. We focus on problems for which the number of features d is significantly larger than the number of instances for each task. We propose an efficient parametrization of the EP algorithm that offers a computational complexity linear in d. Experiments on several multi-task datasets show that the proposed model outperforms baseline approaches for single-task learning or data pooling across all tasks, as well as two state-of-the-art multi-task learning approaches. Additional experiments confirm the stability of the proposed feature selection method with respect to various sub-samplings of the training data.

[MLG EVENT] How to use machine learning to crack online captchas. (Samuel Branders (ICTM/INGI))

-- Posted by Thibault Helleputte on Tue 02 November 2010 at 05:29 pm --

Event date: Thu 18 November 2010 at 02:00 pm

A captcha is a challenge-response test used to ensure that a user is human. By nature, captchas prevent the automation of some tasks on the internet. In this talk, several machine learning based approaches are proposed to overcome a particular captcha challenge (Megaupload). Megaupload's captchas pose some specific difficulties, such as tilted letters, varying image sizes, overlap between letters, and more. Nearest neighbour and linear SVM based solutions are compared. They lead to a practical application that passes the captcha challenge automatically in only one, or occasionally two, trials.

Exceptionally, this talk will take place in Pierre Currie Building, room PCUR03.

[MLG EVENT] Non-linear Models and Learning for Near Infrared Spectra (Catherine Krier)

-- Posted by Pierre Dupont on Mon 03 May 2010 at 11:00 am --

Event date: Wed 12 May 2010 at 11:30 am

PhD Public defense

Near Infrared (NIR) spectrometry is a non-destructive and relatively cheap technology which enables automated controls in various domains such as the food industry or pharmaceutics. Yet, while the quality of the predictions obtained from NIR spectra is important, identifying the chemical components responsible for the prediction is also an essential issue, often neglected by traditional methods.
Generally speaking, NIR spectra may be considered as high-dimensional vectors, with an important degree of redundancy between components. These properties lead to numerical issues and make the models difficult to interpret. A dimensionality reduction step is consequently required. Besides, factors such as experimental conditions induce non-linearities in the relationship between the spectral variables and the parameter of interest, which are ignored by the models traditionally met in this context.
The main goal of this work is therefore to propose a methodology that takes the non-linearities into account and leads to an easier interpretation in terms of wavelength bands. This methodology relies on three aspects: spectra and variable normalizations, dimensionality reduction steps and non-linear modelling. In particular, the dimensionality issue is addressed by filters based on the mutual information concept, and by functional methods such as B-spline representations or variable clustering.
A study over six databases reveals that non-linear models globally outperform linear models. In addition, the proposed methodology identifies a reduced number of wavelength ranges which correspond mostly to spectral regions considered meaningful by the specialists.

[MLG EVENT] Research discussion. Topic: Machine learning challenges. (Damien François and Pierre Dupont)

-- Posted by Damien Francois on Tue 06 April 2010 at 06:11 pm --

Event date: Wed 21 April 2010 at 11:00 am

Dear colleagues,

You might have noticed that several machine learning challenges have started recently (e.g., the STAMINA challenge). Some of us might be interested in participating in those challenges.

I will therefore animate a Research Discussion whose objective is to
- list active challenges
- summarize challenge goals, constraints, and dates
- group interested people into teams to participate
- brainstorm for paths to solutions

The Research discussion will take place in the Euler seminar room (ground floor, room 002)

See you then


[MLG EVENT] Kernel-based modeling for spectral clustering (Carlos Alzate from the KUL Department of Electrical Engineering)

-- Posted by Arnaud de Decker on Fri 19 February 2010 at 04:32 pm --

Event date: Wed 24 March 2010 at 11:00 am

This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI)

Classical spectral clustering methods arise as relaxations of graph partitioning problems that are NP-hard. These relaxations take the form of eigenvalue problems involving a Laplacian matrix which represents the similarities between each pair of data points. Spectral clustering is known to perform well in cases where standard clustering methods such as k-means fail.

In this work, we present a different approach to spectral clustering. The clustering model is cast in a constrained optimization setting where the primal problem is expressed in terms of high-dimensional feature maps typical of support vector machine formulations. The dual problem corresponds to an eigenvalue decomposition of a modified kernel matrix that can be regarded as a similarity matrix. The clustering information is contained in the projections onto the eigenvector solutions. One of the main advantages of this primal-dual setting is the possibility to extend the clustering model to out-of-sample data without making use of approximations. A model selection criterion designed to find the parameters of the model is also proposed. This criterion exploits the structure of the projections when the clusters are well formed, and can be used in a learning scheme with training, validation and test stages, which is important for good generalization performance. Another advantage lies in the enhanced extensibility of the clustering model: additional constraints can be added to the core primal problem in order to achieve desirable properties. The core clustering model has been extended to incorporate prior knowledge in the form of pairwise constraints on the cluster assignments. For handling large data sets, sparse and highly sparse models have been proposed. These formulations aim at approximating the solutions by solving a reduced eigenvalue problem and expressing the projections in terms of a very reduced set of points.

Application examples include image segmentation, time series analysis, text mining and power grid network analysis.
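For reference, the classical relaxation mentioned at the start of the abstract (not the primal-dual kernel model of the talk) can be sketched as a normalized-Laplacian embedding followed by a small k-means:

```python
import numpy as np

def spectral_clusters(W, k):
    """Classical spectral clustering: embed the points with the bottom k
    eigenvectors of the normalized Laplacian, then run a tiny k-means."""
    d = W.sum(axis=1)
    Dis = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(W)) - Dis @ W @ Dis
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    X = vecs[:, :k]                          # spectral embedding
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(100):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

On a similarity graph made of two dense blocks joined by a weak edge, this recovers the two blocks, the kind of structure where plain k-means on raw coordinates can fail.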

[MLG EVENT] Research Discussion: Evaluation methods for regression/classification (Everyone)

-- Posted by Thibault Helleputte on Wed 24 February 2010 at 09:53 am --

Event date: Wed 03 March 2010 at 11:00 am


What are the motivations for evaluating our models? Does the way we should do it depend on the application? What are the links between an evaluation metric, the protocol used to compute it, and the statistical tests used to assess the significance of differences in model performance? What are the pathological biases that are likely to leak into our evaluations? Does the way we evaluate regression models naturally apply to classification, and conversely? How should outliers be handled with respect to performance evaluation? Is MSE always the best metric for regression, and accuracy the best for classification?

These are just a small subset of the questions that we not only could, but should be able to answer as machine learning practitioners. We will discuss those, and many more on the same topic during the next Machine Learning Group Research Discussion.

For those who are not familiar with the MLG Research Discussions, please note that, unlike seminars, Research Discussions are intended to be as active and interactive as possible, without official "speaker". A few slides will just launch the discussion. Damien François, Thibault Helleputte, Michel Verleysen and Pierre Dupont will animate the debate.

LOCATION: Due to Industry Days in Reaumur building, this MLG Event will take place in the Euler building, Room 002.

[MLG EVENT] Research Discussion: Evaluation methods for regression/classification (everyone)

-- Posted by Thibault Helleputte on Wed 03 February 2010 at 03:37 pm --

Event date: Wed 10 February 2010 at 11:00 am

What are the motivations for evaluating our models? Does the way we should do it depend on the application? What are the links between an evaluation metric, the protocol used to compute it, and the statistical tests used to assess the significance of differences in model performance? What are the pathological biases that are likely to leak into our evaluations? Does the way we evaluate regression models naturally apply to classification, and conversely? How should outliers be handled with respect to performance evaluation? Is MSE always the best metric for regression, and accuracy the best for classification?

These are just a small subset of the questions that we not only could, but should be able to answer as machine learning practitioners. We will discuss those, and many more on the same topic during the next Machine Learning Group Research Discussion.

For those who are not familiar with the MLG Research Discussions, please note that, unlike seminars, Research Discussions are intended to be as active and interactive as possible, without official "speaker". A few slides will just launch the discussion. Damien François, Thibault Helleputte, Michel Verleysen and Pierre Dupont will animate the debate.

[MLG EVENT] Bayesian Models for Supervised Classification and the Expectation Propagation Algorithm (Daniel Hernández Lobato)

-- Posted by Pierre Dupont on Thu 28 January 2010 at 11:08 am --

Event date: Wed 03 February 2010 at 11:00 am

TITLE: Bayesian Models for Supervised Classification and the Expectation Propagation Algorithm

Daniel Hernández Lobato from the UCL Machine Learning Group will give
a seminar on Wednesday February 03, 2010 at 11.00 am.
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI)


This talk presents the second part of the PhD thesis of Daniel Hernández Lobato.
It proposes novel applications of Bayesian techniques with a focus on computational efficiency.
Specifically, the expectation propagation (EP) algorithm is used as an alternative to
more computationally expensive methods such as Markov chain Monte
Carlo or type-II maximum likelihood estimation. In this part of the
thesis we introduce the Bayes machine for binary classification. In
this Bayesian classifier the posterior distribution of a parameter
that quantifies the level of noise in the class labels is inferred
from the data. This posterior distribution can be efficiently
approximated using the EP algorithm. When EP is used to compute the
approximation, the Bayes machine does not require any re-training to
estimate this parameter. The cost of training the Bayes machine can be
further reduced using a sparse representation. This representation is
found by a greedy algorithm whose performance is improved by
considering additional refining iterations. Finally, we show that EP
can be used to approximate the posterior distribution of a Bayesian
model for the classification of microarray data. The EP algorithm
significantly reduces the training cost of this model and is useful to
identify relevant genes for subsequent analysis.

[MLG EVENT] Prediction Based on Averages over Automatically Induced Learners: Ensemble Methods and Bayesian Techniques (Daniel Hernández Lobato)

-- Posted by Pierre Dupont on Wed 23 December 2009 at 08:23 pm --

Event date: Fri 15 January 2010 at 10:30 am

Daniel Hernández Lobato from the UCL Machine Learning Group will give
a seminar on January 15, 2010 at 10.30 AM.
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI)


Prediction Based on Averages over Automatically Induced Learners: Ensemble Methods and Bayesian Techniques


Ensemble methods and Bayesian techniques are two learning paradigms
that can be useful to alleviate the difficulties associated
with automatic induction from a limited amount of data in the presence of noise.
Instead of considering a single hypothesis for prediction, these methods
take into account the outputs of a collection of hypotheses compatible with the
observed data. Averaging the predictions of different learners
provides a mechanism
to produce more accurate and robust decisions. However, the practical
use of ensembles
and Bayesian techniques in machine learning presents some
complications. Specifically,
ensemble methods have large storage requirements. The predictors of
the ensemble need to be kept in memory
so that they can be readily accessed. Furthermore, computing the final
ensemble decision
requires querying every predictor in the ensemble. Thus, the prediction cost
increases linearly with the ensemble size. In general, it is also difficult to
estimate an appropriate value for the size of the ensemble. On the
other hand, Bayesian approaches
require the evaluation of multi-dimensional integrals or summations
with an exponentially large number of terms that are often intractable.
In practice, these calculations are made using approximate algorithms
that can be
computationally expensive. This thesis addresses some of these shortcomings
and proposes novel applications of ensemble methods and Bayesian techniques
in supervised learning tasks of practical interest.

In the first part of this thesis we analyze different pruning methods
that reduce
the memory requirements and prediction times of ensembles.
These methods replace the original ensemble by a subensemble with good
generalization properties.
We show that identifying the subensemble that is optimal in terms of
the training error
is possible only in regression ensembles of intermediate size. For
larger ensembles
two approximate methods are analyzed: ordered aggregation and SDP-pruning.
Both SDP-pruning and ordered aggregation select subensembles that outperform
the original ensemble. In classification ensembles it is possible to
make inference about the final ensemble prediction by querying only a fraction
of the total classifiers in the ensemble. This is the basis of a novel
ensemble pruning method: instance-based (IB)
pruning. IB-pruning produces a large speed-up of the classification
process without significantly
deteriorating the generalization performance of the ensemble.
This part of the thesis also describes a statistical procedure for
determining an
adequate size for the ensemble. The probabilistic framework introduced
in IB-pruning can be used
to infer the size of a classification ensemble so that the resulting
ensemble predicts the same
class label as an ensemble of infinite size with a specified confidence level.

The second part of this thesis proposes novel applications of Bayesian techniques
with a focus on computational efficiency. Specifically, the expectation
propagation (EP) algorithm is used as an alternative to more
computationally expensive methods
such as Markov chain Monte Carlo or type-II maximum likelihood estimation.
In this part of the thesis we introduce the Bayes machine for binary classification.
In this Bayesian classifier the posterior distribution of a parameter
that quantifies the
level of noise in the class labels is inferred from the data.
This posterior distribution can be efficiently approximated using the
EP algorithm.
When EP is used to compute the approximation, the Bayes machine does
not require any re-training
to estimate this parameter. The cost of training the Bayes machine can
be further reduced using
a sparse representation. This representation is found by a greedy
algorithm whose performance is
improved by considering additional refining iterations. Finally, we
show that EP can be used
to approximate the posterior distribution of a Bayesian model for the
classification of microarray data.
The EP algorithm significantly reduces the training cost of this model
and is useful to
identify relevant genes for subsequent analysis.

[MLG EVENT] CIL course on Boosting: theoretical foundations and algorithms (Marc Sebban)

-- Posted by Pierre Dupont on Thu 22 October 2009 at 11:01 pm --

Event date: Fri 18 December 2009 at 11:00 am

Prof. Marc Sebban from the Hubert Curien Laboratory, University Jean Monnet, Saint-Etienne, France will give a CIL doctoral course, entitled "Boosting: theoretical foundations and algorithms".

The course will take place on Friday Dec 18, 2009 from 11.00 to 13.00 and 14.30 to 17:00, in the Otlet meeting room, 3rd floor of the Reaumur Building, INGI Department.


Attendance is free but registration is *requested*. Please consult:

Boosting: theoretical foundations and algorithms

Boosting is a general method for improving the accuracy of any given learning algorithm. It refers to a general and provably effective method
of producing a very accurate prediction rule by combining many weak base classifiers. After an introduction to learning theory, this one day
course will explain the underlying theory of boosting by focusing especially on the well-known AdaBoost algorithm.

The following topics will be covered:
- Introduction to learning theory
- Ensemble methods
- AdaBoost
- Learning and generalization: theoretical bounds
- Boosting and SVM
- Boosting and game theory
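As a concrete companion to the topics above, a minimal AdaBoost with decision stumps can be sketched as follows (an illustrative numpy implementation, not course material; all names are ours):

```python
import numpy as np

def adaboost_stumps(X, y, rounds=20):
    """Minimal AdaBoost with decision stumps (threshold on a single
    feature); y must take values in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # instance weights
    model = []
    for _ in range(rounds):
        best = None
        for j in range(d):                       # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] <= thr, 1, -1)
                    err = w[pred != y].sum()     # weighted training error
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-12), 1 - 1e-12)    # keep alpha finite
        alpha = 0.5 * np.log((1 - err) / err)    # weak learner's vote weight
        w *= np.exp(-alpha * y * pred)           # boost the misclassified points
        w /= w.sum()
        model.append((j, thr, pol, alpha))
    return model

def adaboost_predict(model, X):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(a * p * np.where(X[:, j] <= t, 1, -1) for j, t, p, a in model)
    return np.where(score >= 0, 1, -1)
```

The reweighting line is the heart of the algorithm: misclassified instances gain weight, so the next stump concentrates on them.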

[MLG EVENT] Financial Market Crashes (Simon Dablemont)

-- Posted by Michel Verleysen on Tue 20 October 2009 at 06:41 pm --

Event date: Tue 24 November 2009 at 02:00 pm

Simon Dablemont will give a seminar/research discussion entitled "Financial Market Crashes" on November 24, 2009 at 2.00 PM.
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI).

Warning: this seminar will be given in French.


Stock market crashes are momentous events that are fascinating to practitioners.
For traders, the fear of a crash is a perpetual source of stress, and the onset of this event always ruins the lives of many people.
Modeling such events with Extreme Value Theory or heavy-tailed GARCH models does not work.
We present a new approach adapted from collective systems in theoretical physics, with imitation processes such as the alignment of atomic spins that creates magnetization, and renormalization group theory (RGT), to build models that are self-similar across scales.

Questions for the audience:

Which kind of network can represent a collective system with different scales?

We have models with linear parameters “slaved” to nonlinear parameters.
In order to obtain the global solution, it may be useful to employ “tabu search” to determine an “elite list” of solutions as the initial solutions of the ensuing line search procedure, in conjunction with a quasi-Newton method.
Who knows these procedures?

[MLG EVENT] An Introduction to Compressed Sensing: Combining Sparsity and Sampling (Laurent Jacques)

-- Posted by Pierre Dupont on Wed 30 September 2009 at 06:43 pm --

Event date: Tue 17 November 2009 at 02:00 pm

Laurent Jacques will give a seminar entitled "An Introduction to Compressed Sensing: Combining Sparsity and Sampling" on November 17, 2009 at 2.00 PM.
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI).


In this seminar, I will briefly introduce the concepts surrounding a recent revolution in sampling theory called Compressed Sensing (CS), due to the seminal works of E. Candès, T. Tao, J. Romberg and D. Donoho in 2006. In short, this theory shows that, without loss of information, a signal can be sampled as a function of its intrinsic dimension rather than according to its cutoff frequency. The reason for this seemingly magical statement comes from (i) a generalization of sampling to any linear measurement process, (ii) prior knowledge of the signal structure, i.e. its assumed sparsity (or compressibility) in a certain basis (e.g. wavelets, DCT, curvelets, ...), and (iii) the use of non-linear reconstruction techniques (e.g. Basis Pursuit, Lasso, greedy methods, ...).
Compressed Sensing theory actually lies at the junction of many scientific fields, such as signal processing, approximation theory, concentration of measure, convex optimization, and polytope geometry. It is however possible to quickly explain why the reconstruction of a signal from "less information than usual" is possible, and how this theory remains stable when noise is added to the measurement process. To conclude, I will illustrate this talk with some recent applications of CS in the field of compressive imaging and signal reconstruction under non-Gaussian measurement noise.
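As a small illustration of point (iii), the greedy family of reconstruction techniques can be sketched with Orthogonal Matching Pursuit applied to a random Gaussian sensing matrix (an illustrative numpy sketch; the dimensions and names are ours):

```python
import numpy as np

def omp(A, y, steps):
    """Orthogonal Matching Pursuit: greedily pick the column most
    correlated with the current residual, then refit the selected
    columns by least squares."""
    residual, support = y.copy(), []
    for _ in range(steps):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# A 4-sparse signal in dimension 200, observed through only 60
# random Gaussian linear measurements (far fewer than 200 samples).
rng = np.random.default_rng(0)
n, m, k = 200, 60, 4
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = np.array([1.5, -2.0, 1.0, -1.2])
A = rng.standard_normal((m, n)) / np.sqrt(m)   # sensing matrix
y = A @ x_true                                 # compressed measurements
x_hat = omp(A, y, steps=2 * k)                 # a few spare greedy steps
```

Despite sampling well below the ambient dimension, the sparse signal is recovered, which is exactly the "intrinsic dimension" phenomenon described above.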

[MLG EVENT] Image and character recognition with extended edit distances for all sequence alignments (Silvia Garcia Diez)

-- Posted by Pierre Dupont on Wed 07 October 2009 at 05:33 pm --

Event date: Tue 10 November 2009 at 02:00 pm

Silvia Garcia Diez from the UCL MLG will give a seminar on November 10, 2009 at 2PM
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI).


Sequence comparison is a common tool used over a wide range of domains
such as bioinformatics, speech and image recognition, error control, and
text mining. Finding similarities among sequences may provide us with
information about new, unknown data. Typical approaches try to find an
optimal alignment between two sequences or their partial substrings.
However, sub-optimal paths may also contain relevant information about the
similarity between two sequences.

We propose a novel approach based on Akamatsu's model, which takes into
account not only the optimal alignment between two
sequences, but also all the sub-optimal ones that link them. To
achieve this, we sum all the paths, weighted by their cost, to obtain a
distance measure. We will show how this method can be efficiently
implemented through forward/backward variables for an extension of
the Levenshtein edit distance, as well as the longest common subsequence.
Competitive results for image and character recognition tasks are presented.
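The contrast between the optimal-alignment view and the all-paths view can be sketched as follows (an illustrative numpy implementation; the soft distance below is a simplified stand-in for Akamatsu's model, with unit edit costs and an inverse-temperature parameter beta of our choosing):

```python
import numpy as np

def levenshtein(s, t):
    """Classical DP: cost of the single optimal alignment."""
    D = np.zeros((len(s) + 1, len(t) + 1))
    D[:, 0] = np.arange(len(s) + 1)
    D[0, :] = np.arange(len(t) + 1)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            D[i, j] = min(D[i - 1, j] + 1,        # deletion
                          D[i, j - 1] + 1,        # insertion
                          D[i - 1, j - 1] + sub)  # (mis)match
    return D[-1, -1]

def all_paths_distance(s, t, beta=1.0):
    """Forward variable summing exp(-beta * cost) over *all* alignment
    paths; -log(F)/beta acts as a soft edit distance that also credits
    sub-optimal alignments.  As beta grows, it tends to the optimal cost."""
    F = np.zeros((len(s) + 1, len(t) + 1))
    F[0, 0] = 1.0
    for i in range(len(s) + 1):
        for j in range(len(t) + 1):
            if i > 0:
                F[i, j] += F[i - 1, j] * np.exp(-beta)          # deletion
            if j > 0:
                F[i, j] += F[i, j - 1] * np.exp(-beta)          # insertion
            if i > 0 and j > 0:
                sub = 0.0 if s[i - 1] == t[j - 1] else 1.0
                F[i, j] += F[i - 1, j - 1] * np.exp(-beta * sub)
    return -np.log(F[-1, -1]) / beta
```

The soft distance is never larger than the optimal cost, since every extra path only increases the forward sum.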

[MLG EVENT] EEG feature extraction: General description and new approach using time-frequency distributions (Carlos Guerrero)

-- Posted by Michel Verleysen on Wed 14 October 2009 at 04:53 pm --

Event date: Tue 03 November 2009 at 02:00 pm

Carlos Guerrero will give a seminar entitled "EEG feature extraction: General description and new approach using time-frequency distributions" on November 3, 2009 at 2.00 PM.
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI).


The neural activity of the human brain starts during prenatal development, and this electrical information is a good indicator of abnormality in the central nervous system. A rapidly developing, low-cost alternative with high temporal resolution is the electroencephalogram (EEG). An abnormal EEG is a dynamic signal which exhibits non-stationary behavior with focal or multifocal activity, spikes, sharp waves and focal mono-rhythmic discharges.
This seminar briefly introduces a variety of approaches to EEG feature extraction and describes a new method to identify seizures in EEG signals using features extracted from time-frequency distributions (TFDs). In particular, the method extracts features from the Smoothed Pseudo Wigner-Ville distribution using track estimation based on the McAulay-Quatieri sinusoidal model. The proposed technique relies on the length of a track which, combined with energy and frequency features, makes it possible to isolate a continuous energy trace from other oscillations when an epileptic seizure is beginning. The feature evaluation consists of two steps: first, the feature values are calculated on 6 randomly chosen EEG records, and second, 12 EEG records are used to test these values. Results in terms of sensitivity and specificity show that our extraction method is a suitable approach for automatic seizure detection, and open the possibility of formulating new criteria to detect, classify or analyze abnormal EEGs.
We conclude this talk by presenting future work on feature selection from a practical viewpoint.

[MLG EVENT] Dimensionality reduction: from PCA to recent nonlinear techniques (John Lee)

-- Posted by Pierre Dupont on Thu 24 September 2009 at 08:30 pm --

Event date: Tue 27 October 2009 at 02:00 pm

John Lee, from the Unité d'imagerie moléculaire et radiothérapie expérimentale, UCL, will give a talk on dimensionality reduction on Tuesday 27 Oct, at 2PM.

This is a joint CESAME-MLG seminar and it will thus take place in the EULER room.


Dimensionality reduction is an old yet unsolved problem, with many applications in data visualization, knowledge discovery, and machine learning in general. Our aim in this talk will be to review several developments in the field of dimensionality reduction, with a particular focus on nonlinear methods. As an introduction, we will point out some counter-intuitive properties of high-dimensional spaces, which will motivate the use of dimensionality reduction. Next, we will go back in time and start our review with a short reminder about well-known techniques such as principal component analysis and multidimensional scaling. Our journey through time will also bring us to visit Sammon mapping and other methods based on distance preservation. Next, we will come across self-organizing maps and auto-encoders with bottleneck neural networks. Some spectral methods such as Isomap and locally linear embedding will be reviewed as well. A glance at recent methods based on similarity preservation, such as stochastic neighbor embedding, will close the survey. Finally, we will try to identify the relationships between the different approaches, and say a few words about quality criteria for dimensionality reduction techniques.

[MLG EVENT] Network Inference based on Mutual Information Applied to Microarray Data (Patrick Meyer)

-- Posted by Pierre Dupont on Wed 23 September 2009 at 11:13 am --

Event date: Tue 06 October 2009 at 02:00 pm

Patrick Meyer from the Université Libre de Bruxelles will give
a seminar on October 06, 2009 at 2.00 PM.
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI).


An important issue in computational biology is the extent to which it is possible to learn transcriptional interactions from measured expression data. The reverse engineering of transcriptional regulatory networks from expression data alone is challenging because of the combinatorial nature of the problem and of the limited amount of (noisy) samples available in expression datasets.
This talk will focus on information-theoretic approaches to network inference, which typically rely on the estimation of mutual information and/or conditional mutual information from data in order to measure the statistical dependence between gene expression levels. The adoption of mutual information in network inference can be traced back to Chow and Liu's tree algorithm. Nowadays, two main categories of information-theoretic network inference methods hold the attention of the bioinformatics community: i) methods based on bivariate mutual information, which infer undirected networks of up to thousands of genes thanks to their low algorithmic complexity, and ii) methods based on conditional mutual information, which are able to infer a larger set of relationships between genes but at the price of a higher algorithmic complexity. The strengths and weaknesses of these information-theoretic methods for inferring transcriptional networks will be detailed in this talk.
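Category i) can be sketched in a few lines: estimate bivariate mutual information with a plug-in histogram estimator, and connect the gene pairs whose MI exceeds a threshold, in the spirit of relevance networks (an illustrative sketch; the estimator and the threshold convention are our choices, not the speaker's):

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in MI estimate (in nats) from a 2-D histogram of two
    expression profiles."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                           # joint distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)  # marginals
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def relevance_network(expr, threshold):
    """Undirected network: connect gene pairs whose MI exceeds a
    threshold.  `expr` has one row per gene, one column per sample."""
    g = expr.shape[0]
    adj = np.zeros((g, g), dtype=bool)
    for i in range(g):
        for j in range(i + 1, g):
            if mutual_information(expr[i], expr[j]) > threshold:
                adj[i, j] = adj[j, i] = True
    return adj
```

The O(g^2) pairwise loop is what keeps this first category cheap enough for networks of thousands of genes; conditional-MI methods (category ii) pay more to remove indirect edges.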

[MLG EVENT] Biomarker selection from microarray data: a transfer learning approach (Pierre Dupont)

-- Posted by Pierre Dupont on Tue 22 September 2009 at 10:57 am --

Event date: Fri 02 October 2009 at 04:00 pm

In the context of a Biostatistics seminar organized by the UCL Institute of Statistics (Room C115), Pierre Dupont will give a talk on biomarker selection from microarray data.

The full seminar starts at 2:30PM with a first talk by Josée Dupuis from Boston University School of Public Health, USA. Follow the link for the full program.


Classification of microarray data is a challenging problem as it typically relies on a few tens of samples but several thousand dimensions (genes).
Feature selection techniques are commonly used in this context, both to increase the interpretability of the predictive model and possibly to reduce its cost.
Feature selection aims at finding a small subset of the original covariates that best predicts the outcome.
In the case of clinical studies, the selected genes are considered to be biomarkers forming a signature of a patient's status or expected response to a treatment.
A good signature is also ideally stable with respect to sampling variation, under the assumption that the biological process modeled is (mostly) common across patients.

We focus here on embedded methods for which a multivariate feature selection is performed jointly with the classifier estimation.
We study in particular regularized (or penalized) linear models, such as extensions to linear support vector machines (SVM) or variants of the LASSO,
since they offer state of the art predictive performances for high dimensional and sparse data.
In this context, we describe two original contributions.

Firstly, some prior knowledge may be available to bias the selection towards some genes a priori assumed to be more relevant.
We present a novel optimization algorithm to make use of such a partial supervision as a soft constraint.
A practical approximation of this technique reduces to standard SVM learning with iterative rescaling of the inputs.
The scaling factors depend on the prior knowledge but the final selection may depart from it if necessary to optimize the classification objective.
Secondly, we show how to adapt the above algorithm in a transfer learning setting: a preliminary selection is performed on
one or several source dataset(s) and is subsequently used to bias the selection on a target dataset.
This is particularly relevant for microarray data for which each individual dataset is typically very small
but a fast-growing collection of related datasets is produced and made publicly available.
Experimental results illustrate that both approaches improve the stability and classification performance of the resulting models.
We conclude this talk by sketching some open issues, both from a theoretical and a practical viewpoint.

Attachment: DUPONT021009_16h00_2009.pdf


[MLG EVENT] L1-regularized classifiers for microarray data (Roman Zakharov)

-- Posted by Pierre Dupont on Mon 21 September 2009 at 09:30 pm --

Event date: Tue 29 September 2009 at 04:30 pm

Roman Zakharov from the UCL MLG will give
a seminar on September 29, 2009 at 4.30 PM.
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI).


We consider supervised learning problems with several thousand input features but only a few tens of samples, such as those produced from microarray experiments. In such settings it is easy to find a perfect classifier on the training data. However such a model is likely to generalize poorly. A standard way to control overfitting is to restrict the class of functions to (generalized) linear models. Our additional objective is to produce predictive models on a small subset of features. Feature selection improves the interpretability of the model and, in some cases, classification performance when some of the original features are unrelated with the outcome to be predicted.

In this context, the L1 norm is a common regularizer to enforce sparsity while estimating
a predictive model. We argue in favor of L1-regularized logistic regression, which is a straightforward adaptation of the LASSO to classification problems. Estimating such a model requires solving a convex optimization problem. We present here an optimization algorithm that uses coordinate descent while always updating the dimension with the largest gradient.
We also discuss an alternative optimization algorithm that relies on LARS to solve an L1-regularized squared-error regression. A third alternative is known as the Elastic Net, a regularizer mixing the L1 and L2 norms. Both LARS and the Elastic Net include an embedded procedure to automatically set the regularization parameter. Our experiments show that, even without such a procedure, L1-regularized logistic regression produces very similar solutions for a wide range of regularizer values. The resulting estimation algorithm is thus preferable for its simplicity and computational efficiency.

Practical experiments were conducted on several microarray datasets. We report competitive
classification results with L1-regularized logistic regression with an automatic selection
of the most relevant features for a given regularization parameter.
We conclude our talk by stressing the additional issue of optimizing as well the stability
of the selected features with respect to data sampling.
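For illustration, an L1-regularized logistic regression can be estimated with a simple proximal-gradient (ISTA) loop. Note this is a stand-in of our own, not the greedy coordinate-descent scheme discussed in the talk; both target the same L1-penalized optimum:

```python
import numpy as np

def l1_logistic(X, y, lam=0.1, lr=0.1, iters=2000):
    """Proximal-gradient (ISTA) estimate of L1-regularized logistic
    regression; y takes values in {0, 1}.  Soft-thresholding after
    each gradient step is what drives irrelevant weights to exactly 0."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # predicted probabilities
        grad = X.T @ (p - y) / n             # logistic-loss gradient
        w -= lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w
```

On microarray-like data with few relevant covariates, the returned weight vector is sparse, so the nonzero coordinates directly name the selected features.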

[MLG EVENT] Subset Selection (Professor Ilse C.F. Ipsen)

-- Posted by Catherine Krier on Fri 21 August 2009 at 02:02 pm --

Event date: Thu 27 August 2009 at 10:30 am

Dear all,

The UCL Machine Learning Group is pleased to announce the following seminar:

Title: Subset Selection

Orator: Prof. Ilse C.F. Ipsen
Department of Mathematics
North Carolina State University
Raleigh, NC, USA

Abstract: Subset selection methods try to identify those columns of a matrix that are "most" linearly independent. We discuss deterministic and randomized methods for subset selection, as well as the application of subset selection to nonlinear least squares problems in parameter estimation.
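A deterministic flavour of subset selection can be sketched with the greedy column-pivoting rule of the pivoted QR factorization (an illustrative numpy sketch of ours, not the speaker's code):

```python
import numpy as np

def greedy_subset_selection(A, k):
    """Pick k 'most linearly independent' columns of A by greedy
    pivoting: repeatedly take the column with the largest residual
    norm, then project that direction out of the remaining columns
    (the column-pivoting rule of pivoted QR)."""
    R = A.astype(float).copy()
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[selected] = -1.0               # never re-select a column
        j = int(np.argmax(norms))
        selected.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)              # remove the chosen direction
    return selected
```

Because each step removes the span of the chosen column, near-duplicate columns get small residual norms and are skipped, which is exactly the "most linearly independent" behaviour sought.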

The seminar will be held in the Euler seminar room, on Thursday 27 August 2009 at 10:30.

Hope to see you there,

Catherine Krier

[MLG EVENT] Expectation Propagation for Microarray Data Classification (Daniel Hernández Lobato)

-- Posted by Pierre Dupont on Wed 12 August 2009 at 06:10 pm --

Event date: Mon 24 August 2009 at 11:00 am

Daniel Hernández Lobato from the Universidad Autónoma de Madrid will give
an MLG seminar on August 24, 2009 at 11 AM.
This talk will take place in the Otlet meeting room (3rd floor, Reaumur Building, INGI)


Microarray experiments are a very promising tool for disease treatment and early
diagnosis. However, the datasets obtained in these experiments typically have a
rather small number of instances and a large number of covariates, most of which
are irrelevant for discrimination. These features usually lead to
instabilities in microarray classification algorithms.

Bayesian methods can be useful to overcome this problem because they compute
probability distributions for the model coefficients rather than point estimates.
However, exact Bayesian inference is often infeasible
and hence, some form of approximation has to be made. In this talk we propose
a Bayesian model for microarray data classification based on a prior
distribution that enforces sparsity in the model coefficients. Expectation
Propagation (EP) is then used to perform approximate inference as an alternative to more
computationally demanding methods like Markov Chain Monte Carlo (MCMC) sampling.
The proposed model is evaluated on fifteen microarray datasets and compared with
other popular classification algorithms. These experiments show that the model
trained with EP performs well on the datasets investigated and is also
useful for identifying relevant genes for subsequent analysis.

[MLG EVENT] Coordinate structure analysis using sequence alignment and perceptron training. (Dr Masashi Shimbo)

-- Posted by Thibault Helleputte on Mon 15 June 2009 at 03:36 am --

Event date: Wed 24 June 2009 at 04:00 pm

Dr Masashi Shimbo, from the Nara Institute of Science and Technology, Graduate School of Information Science (Japan), will give an MLG presentation on the 24th of June at the IAG building (UCL), Place des Doyens 1, room b.017 (ground floor), from 16h00 to 17h30.

The talk is related to data mining and computational linguistics/text mining.

Abstract: Coordination is one of the major sources of syntactic ambiguity in natural language. We formulate coordination disambiguation as a problem of inverse sequence alignment, and use perceptron training to model the similarity of conjuncts.

There will be a discussion after the presentation with a drink.

If you want to join us for lunch on Wednesday, please, send me an email.

Best Regards,

Marco Saerens

[MLG EVENT] Nonlinear dimensionality reduction (Michel Verleysen)

-- Posted by Thibault Helleputte on Fri 05 June 2009 at 02:14 pm --

Event date: Fri 05 June 2009 at 11:00 am

Seminar organized by the Large Graph & Networks team of the UCL.

Methods of dimensionality reduction provide a way to understand and visualize the structure of complex data sets. Traditional methods like principal component analysis and classical metric multidimensional scaling suffer from being based on linear models. Since the late nineties, many new methods have been developed, and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. Advances that account for this rapid growth
include, e.g., the use of graphs to represent the manifold topology, and the use of new metrics like the geodesic distance. This seminar will review (some of the) existing techniques for nonlinear dimensionality reduction, based either on the nonlinear optimization of well-designed objective criteria, or on the algebraic solution to simplified ones.
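The graph-plus-geodesic-distance idea mentioned above can be sketched in the Isomap style: build a k-nearest-neighbour graph with Euclidean edge weights, then take shortest-path distances on it (an illustrative numpy sketch using Floyd-Warshall, practical only for small data sets):

```python
import numpy as np

def geodesic_distances(X, k=5):
    """Approximate geodesic distances: Euclidean distances restricted
    to a k-nearest-neighbour graph, then all-pairs shortest paths by
    Floyd-Warshall relaxation."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:   # connect i to its k neighbours
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):                        # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G
```

For points on a half-circle, the geodesic distance between the endpoints approaches the arc length (about pi), whereas the straight-line Euclidean distance is only 2, which is precisely why such graph metrics help unfold curved manifolds.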