Abstracts: 25th anniversary of ISBA

Maïlis Amico, KUL and UCL

The single-index/Cox mixture cure model

In survival analysis it often happens that a certain fraction of the subjects under study never experience the event of interest, i.e. they are considered 'cured'. In the presence of covariates, a common model for this type of data is the mixture cure model. It assumes that the population consists of two subpopulations, namely the cured and the non-cured ones, and writes the survival function of the whole population, given a set of covariates, as a mixture of the survival function of the cured subjects (which equals one) and the survival function of the non-cured ones. In the literature one usually assumes that the mixing probabilities follow a logistic model. This is, however, a strong modeling assumption, which is often not met in practice. Therefore, in order to obtain a flexible model that at the same time does not suffer from the curse of dimensionality, we propose a single-index model for the mixing probabilities. For the survival function of the non-cured subjects we assume a Cox proportional hazards model, and we estimate the resulting model using a maximum likelihood approach. We also carry out a simulation study, in which we compare the estimators under the single-index model and under the logistic model for various model settings, and we apply the new model and estimation method to two data sets.
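In generic mixture cure notation (the symbols here are illustrative, not taken from the talk), the decomposition described above can be written, with p(x) the probability of being uncured given covariates x, as:

```latex
% Mixture cure model: the cured fraction survives forever, so its survival
% function is identically one; S_u is the survival function of the uncured.
S(t \mid x, z) = 1 - p(x) + p(x)\, S_u(t \mid z),
\qquad
S_u(t \mid z) = \exp\!\big\{-\Lambda_0(t)\, e^{\beta^\top z}\big\}.
% Logistic specification:      p(x) = 1/(1 + e^{-\gamma^\top x})
% Single-index alternative:    p(x) = g(\gamma^\top x), with g left unspecified.
```

The single-index formulation replaces the fixed logistic link by an unknown function g of one linear combination of the covariates, which keeps the flexibility of a nonparametric link while avoiding the curse of dimensionality.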


Ray Carroll, Texas A&M University

A Semiparametric Single-Index Risk Score Across Populations

We consider a problem motivated by issues in nutritional epidemiology, across diseases and populations. In this area, it is becoming increasingly common for diseases to be modeled by a single diet score, such as the Healthy Eating Index, the Mediterranean Diet Score, etc. For each disease and for each population, a partially linear single-index model is fit. The partially linear part of the model is allowed to differ across populations and diseases. However, and crucially, the single index itself, having to do with the diet score, is common to all diseases and populations, and the nonparametrically estimated functions of the single index are the same up to a scale parameter. Using B-splines with an increasing number of knots, we develop a method to solve the problem and derive its asymptotic theory. An application to the NIH-AARP Study of Diet and Health is described, where we show the advantages of using multiple diseases and populations simultaneously, rather than one at a time, in understanding the effect of increased milk consumption. Simulations illustrate the properties of the methods.
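One way to write down the structure described above (generic notation, assumed for illustration rather than taken from the paper) indexes the population/disease pairs by s:

```latex
% s = 1, ..., S: population/disease pairs; X_s: diet components; Z_s: other covariates.
% The index direction \alpha (the diet score) and the curve \theta are shared across
% all pairs; only \beta_s and the scale c_s vary (c_1 = 1 for identifiability).
Y_{s} = Z_{s}^\top \beta_{s} + c_{s}\, \theta\!\big(\alpha^\top X_{s}\big) + \epsilon_{s},
```

with the common function θ estimated by B-splines whose number of knots grows with the sample size; pooling the pairs through the shared α and θ is what allows several diseases and populations to be analyzed simultaneously.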


Joris Chau, UCL

Statistical data depth for Hermitian positive-definite matrices

In multivariate time series analysis, the non-degenerate autocovariance matrices or the non-degenerate spectral density matrix of a second-order stationary multivariate time series are necessarily Hermitian positive-definite matrices. In this talk, we introduce the concept of a statistical data depth for data observations in the non-Euclidean space of Hermitian positive-definite matrices, with the application to collections of observed covariance or spectral density matrices in mind. Data depth is an important tool in multivariate data analysis that provides a center-to-outward ordering of the data observations. This allows one to characterize central points or regions of the data and to detect outlying observations, and it also provides, for instance, a practical framework for rank-based hypothesis testing. First, we list the properties a data depth function acting on the space of Hermitian positive-definite matrices should ideally satisfy. Second, we introduce two computationally efficient data depth functions satisfying each of these requirements. The usefulness of the new data depth concepts, both as a powerful exploratory data analysis tool and as a means to generalize many traditional univariate rank-based hypothesis tests, is illustrated on several multivariate brain signal time series datasets.
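For orientation, here is one generic distance-based construction of a depth on this space (an illustration of the idea, not necessarily either of the two functions proposed in the talk): equip the manifold of Hermitian positive-definite matrices with a metric δ, e.g. the affine-invariant Riemannian distance, and rank points by their expected distance to the data.

```latex
% delta: affine-invariant Riemannian distance between HPD matrices A and B;
% X: a random HPD matrix distributed as the data. Deeper Y = smaller E delta(Y, X).
\delta(A, B) = \big\| \operatorname{Log}\!\big(A^{-1/2} B A^{-1/2}\big) \big\|_F,
\qquad
D(Y) = \big(1 + \mathbb{E}\,\delta(Y, X)\big)^{-1}.
```

Any such D induces the center-to-outward ordering mentioned above: the deepest observation plays the role of a multivariate median, and ranks derived from D feed into rank-based tests.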


Luc Devroye, McGill University, Montreal

Cellular tree classifiers

Suppose that binary classification is done by a tree method in which the leaves of a tree correspond to a partition of d-space. Within each set of the partition, a majority vote is used. Suppose furthermore that this tree must be constructed recursively by implementing just two functions, so that the construction can be carried out in parallel by using "cells": first, given input data, a cell must decide whether it will become a leaf or an internal node in the tree; second, if it decides on an internal node, it must decide how to partition the space linearly. The data are then split into two parts and sent downstream to two new independent cells. We discuss the design and properties of such classifiers.
This is joint work with Gérard Biau.
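The two-function cell interface described above can be sketched as follows. This is a minimal illustrative implementation with placeholder rules (the stopping criterion, the axis-aligned splits as a special case of linear splits, and all names are assumptions, not the authors' design):

```python
import numpy as np

def cell(X, y, depth, max_depth=5, min_leaf=10):
    """A 'cell': first decide leaf vs. internal node; if internal, pick a
    linear split and hand each half to a new, independent cell."""
    # Decision 1: become a leaf? (placeholder rule: too deep, too small, or pure)
    if depth >= max_depth or len(y) <= min_leaf or len(set(y)) == 1:
        # Leaf: majority vote over the labels that reached this cell.
        values, counts = np.unique(y, return_counts=True)
        return ("leaf", values[np.argmax(counts)])
    # Decision 2: choose a linear split (here: axis-aligned, rotating coordinate,
    # median threshold -- purely illustrative choices).
    j = depth % X.shape[1]
    t = np.median(X[:, j])
    left = X[:, j] <= t
    if left.all() or (~left).all():  # degenerate split: fall back to a leaf
        values, counts = np.unique(y, return_counts=True)
        return ("leaf", values[np.argmax(counts)])
    # The two halves are sent downstream to two new cells (parallelizable).
    return ("node", j, t,
            cell(X[left], y[left], depth + 1, max_depth, min_leaf),
            cell(X[~left], y[~left], depth + 1, max_depth, min_leaf))

def predict(tree, x):
    """Route a point down the constructed tree to its leaf's majority label."""
    while tree[0] == "node":
        _, j, t, lo, hi = tree
        tree = lo if x[j] <= t else hi
    return tree[1]
```

Note that each recursive call uses only the data routed to it, which is what makes the construction distributable: a cell never needs global information about the tree being built around it.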


Paul Embrechts, ETH Zürich

Hawkes Graphs

In this talk the Hawkes skeleton and the Hawkes graph are introduced. These objects summarize the branching structure of a multivariate Hawkes point process in a compact, yet meaningful way. I demonstrate how graph-theoretic vocabulary is very convenient for the discussion of multivariate Hawkes processes. I also show how the graph view may be used for the specification and estimation of Hawkes models from large, multitype event streams. Special attention is paid to computational issues in the implementation, which makes the results applicable to data with dozens of event streams and thousands of events per component. A simulation study confirms that the presented procedure works as desired. The talk finishes with an application to the modeling of limit order book data in the context of high-frequency finance.
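For orientation, a d-variate Hawkes process is conventionally specified through its conditional intensities (standard notation, not specific to this talk):

```latex
% N_j: counting process of type-j events; eta_i: baseline intensity;
% h_{ij} >= 0: excitement kernel giving the push of past type-j events on type i.
\lambda_i(t) = \eta_i + \sum_{j=1}^{d} \int_{0}^{t} h_{ij}(t - s)\, \mathrm{d}N_j(s),
\qquad i = 1, \dots, d.
```

In this notation, a graph summarizing the branching structure would naturally place an edge from component j to component i whenever the kernel h_{ij} is nonzero, i.e. whenever events of type j can trigger events of type i.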
The results presented are based on joint work with Matthias Kirchner, RiskLab, ETH Zürich.


Donatien Hainaut, UCL

A switching self-exciting jump diffusion process for stock prices

In this talk, we propose a new Markov switching process with clustering effects. In this approach, the parameters of a self-exciting jump process, combined with a geometric Brownian motion, are modulated by a hidden Markov chain with a finite number of states. Each regime corresponds to a particular economic cycle that determines the Brownian volatility and the frequency of clustered jumps. After studying the theoretical properties of this process, we propose a sequential Monte Carlo method to filter the hidden state variables. This approach then serves to develop a Markov chain Monte Carlo procedure to fit the model to the S&P 500. We also show that our model may be used to price European-style options.
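In generic notation (the symbols and the exponential-decay kernel below are assumptions for illustration, not taken from the talk), a regime-switching self-exciting jump diffusion for a stock price S_t can be sketched as:

```latex
% X_t: hidden Markov chain (the economic cycle); W_t: Brownian motion;
% N_t: jump counter with self-exciting intensity lambda_t; J_k: jump sizes.
\frac{\mathrm{d}S_t}{S_{t^-}}
  = \mu(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}W_t
    + \mathrm{d}\Big(\textstyle\sum_{k=1}^{N_t}\big(e^{J_k} - 1\big)\Big),
\qquad
\lambda_t = \theta(X_t) + \int_0^t \kappa\, e^{-\rho\,(t - s)}\,\mathrm{d}N_s.
```

The chain X_t modulates both the diffusion volatility σ and the baseline jump frequency θ, while the integral term makes each jump temporarily raise the intensity, producing the jump clustering within a regime.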


In memory of Peter Hall (DHC ISBA 1997) by Ingrid Van Keilegom, KUL and UCL