AG Stochastik (Winter Semester 2022/23)
- Instructors: Prof. Dr. Nicole Bäuerle, Prof. Dr. Vicky Fasen-Hartmann, Prof. Dr. Tilmann Gneiting, Prof. Dr. Daniel Hug, Prof. Dr. Günter Last, Prof. Dr. Mathias Trabs
- Course type: Seminar (0127200)
- Weekly hours per semester (SWS): 2
Students and guests are always welcome. Unless explicitly stated otherwise below, all talks take place in person in room 2.059. To be added to the e-mail distribution list for invitations, please contact Tatjana Dominic (tatjana.dominic@kit.edu).
Schedule | | |
---|---|---|
Seminar: | Tuesday 15:45-17:15 | 20.30 SR 2.59 |
Instructors | | |
---|---|---|
Seminar lead | Prof. Dr. Nicole Bäuerle | Office hours: by appointment. Room 2.016, Kollegiengebäude Mathematik (20.30). Email: nicole.baeuerle@kit.edu |
Seminar lead | Prof. Dr. Vicky Fasen-Hartmann | Office hours: by appointment. Room 2.053, Kollegiengebäude Mathematik (20.30). Email: vicky.fasen@kit.edu |
Seminar lead | Prof. Dr. Tilmann Gneiting | Office hours: by appointment. Room 2.019, Kollegiengebäude Mathematik (20.30). Email: tilmann.gneiting@kit.edu |
Seminar lead | Prof. Dr. Daniel Hug | Office hours: by appointment. Room 2.051, Kollegiengebäude Mathematik (20.30). Email: daniel.hug@kit.edu |
Seminar lead | Prof. Dr. Günter Last | Office hours: by appointment. Room 2.001, secretary's office 2.056, Kollegiengebäude Mathematik (20.30). Email: guenter.last@kit.edu |
Seminar lead | Prof. Dr. Mathias Trabs | Office hours: by appointment. Room 2.020, Kollegiengebäude Mathematik (20.30). Email: trabs@kit.edu |
Tuesday, 31 January 2023, 15:45
Botond Tibor Szabo (Università Bocconi)
Optimal distributed testing under communication constraints in high-dimensional and nonparametric Gaussian white noise model
Abstract: We study the problem of signal detection in Gaussian noise in a distributed setting, both for high-dimensional and nonparametric signals. We consider both the public and private coin protocols, i.e. when the machines have and do not have access to a shared source of randomness, respectively. We derive lower bounds on the size that the signal needs to have in order to be detectable. We also derive matching upper bounds based on constructive algorithms. We distinguish different regimes based on the dimension of the model (or the smoothness of the signal in the nonparametric setting), the number of machines and the number of transmitted bits between the machines. We show that in certain regimes under the more flexible public coin protocol one can achieve lower detection boundaries than using private coins, while in other regimes the two types of protocols result in the same testing limitations and guarantees. Finally, in the nonparametric framework we derive both lower and upper bounds for adaptation.
Tuesday, 24 January 2023, 15:45
Günter Last (Institut für Stochastik, KIT)
Testing Hyperuniformity
Abstract: Hyperuniform structures can be both isotropic like a liquid and homogeneous like a crystal. In that sense, they represent a new state of matter, and they have attracted rapidly growing attention in physics, biology and materials science. In mathematical terms, hyperuniformity of a stationary point process in Euclidean space can be described as an anomalous suppression of large-scale density fluctuations. This means that the variance of the number of points in a large ball grows more slowly than its volume. By now many different point processes with these exciting properties have been discovered, among them perturbed lattices, quasi-crystals, dependent thinnings and the Ginibre process.
We devise the first rigorous significance test for hyperuniformity with sensitive results, even for a single sample. Our test is based on the asymptotic behavior of the so-called scattering intensity, which is the squared norm of the empirical Fourier transform of the (localized) point process, suitably normalized. Theoretical results as well as simulations show that this behavior applies to a wide range of stationary point processes. We can then use the likelihood ratio principle to test for hyperuniformity. Remarkably, the asymptotic distribution of the resulting test statistic is universal under the null hypothesis of hyperuniformity. We obtain its explicit form from simulations with very high accuracy. The novel test precisely keeps a nominal significance level for hyperuniform models, and it rejects non-hyperuniform examples with high power even in borderline cases. Moreover, it does so given only a single sample with a practically relevant system size.
This talk is based on joint work with Michael Klatt and Norbert Henze.
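As a rough illustration of the scattering intensity mentioned in the abstract, the following sketch computes the squared modulus of the empirical Fourier transform of a point pattern, divided by the number of points. This is not the test from the talk: the normalization, the choice of wave vectors on the reciprocal grid of a square window, and the helper names `scattering_intensity` and `allowed_wave_vectors` are assumptions made for this example only.

```python
import cmath
import math
import random


def scattering_intensity(points, k):
    """S(k) = |sum_j exp(-i k.x_j)|^2 / N for a single wave vector k.

    The normalization by N is an assumption for this sketch.
    """
    amplitude = sum(
        cmath.exp(-1j * sum(ki * xi for ki, xi in zip(k, x))) for x in points
    )
    return abs(amplitude) ** 2 / len(points)


def allowed_wave_vectors(side, max_index):
    """Nonzero wave vectors 2*pi*(a, b)/side with |a|, |b| <= max_index."""
    idx = range(-max_index, max_index + 1)
    return [
        (2.0 * math.pi * a / side, 2.0 * math.pi * b / side)
        for a in idx
        for b in idx
        if (a, b) != (0, 0)
    ]


side = 10

# A square lattice serves as a simple hyperuniform example.
lattice = [(float(i), float(j)) for i in range(side) for j in range(side)]

# Independent uniform points (a binomial process) as a non-hyperuniform example.
rng = random.Random(0)
uniform_points = [
    (rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(side * side)
]

wave_vectors = allowed_wave_vectors(side, max_index=2)
s_lattice = [scattering_intensity(lattice, k) for k in wave_vectors]
s_uniform = [scattering_intensity(uniform_points, k) for k in wave_vectors]
```

At these small wave vectors the lattice values vanish up to rounding error, whereas for independent uniform points each value has expectation 1. Suppression of the scattering intensity near the origin is the signature of hyperuniformity that a test of this kind would look for.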
Tuesday, 17 January 2023, 15:45
Johannes Resin (Heidelberger Institut für Theoretische Studien (HITS))
Elicitability of Probabilistic Top List Functionals
Abstract: In the face of uncertainty, the need for probabilistic assessments has long been recognized in the literature on forecasting. In classification, however, comparative evaluation of classifiers often focuses on predictions specifying a single class through the use of simple accuracy measures, which disregard any probabilistic uncertainty quantification. I propose probabilistic top lists as a novel type of prediction in classification, which bridges the gap between single-class predictions and predictive distributions. The probabilistic top list functional is elicitable through the use of strictly consistent evaluation metrics. The proposed evaluation metrics are based on symmetric proper scoring rules and admit comparison of various types of predictions ranging from single-class point predictions to fully specified predictive distributions. The Brier score yields a metric that is particularly well suited for this kind of comparison.
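To make the idea of comparing different prediction types under one score concrete, here is a minimal sketch using the multiclass Brier score. The padding step, which spreads the unassigned probability mass evenly over the unlisted classes, is an assumption for illustration and may differ from the construction presented in the talk; the function names and class labels are hypothetical.

```python
def brier_score(probabilities, outcome):
    """Multiclass Brier score: sum over classes of (p_c - 1{c == outcome})^2."""
    return sum(
        (p - (1.0 if c == outcome else 0.0)) ** 2
        for c, p in probabilities.items()
    )


def pad_top_list(top_list, classes):
    """Turn a top list into a full distribution (hypothetical padding rule).

    The leftover mass is spread evenly over the classes not in the list.
    """
    rest = [c for c in classes if c not in top_list]
    leftover = 1.0 - sum(top_list.values())
    padded = dict(top_list)
    if rest:
        for c in rest:
            padded[c] = leftover / len(rest)
    return padded


classes = ["cat", "dog", "fox", "owl"]

# Three prediction types for the same instance.
full_distribution = {"cat": 0.6, "dog": 0.3, "fox": 0.05, "owl": 0.05}
top_two = pad_top_list({"cat": 0.6, "dog": 0.3}, classes)
single_class = pad_top_list({"cat": 1.0}, classes)

scores_if_cat = {
    "full": brier_score(full_distribution, "cat"),
    "top_two": brier_score(top_two, "cat"),
    "single": brier_score(single_class, "cat"),
}
scores_if_dog = {
    "full": brier_score(full_distribution, "dog"),
    "top_two": brier_score(top_two, "dog"),
    "single": brier_score(single_class, "dog"),
}
```

Lower scores are better. A confident single-class prediction scores 0 when it is right and 2 when it is wrong, while the hedged predictions are penalized less severely for a miss.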
Tuesday, 10 January 2023, 15:45
Adrian Fischer (Université libre de Bruxelles)
Normal approximation for the posterior in exponential families
Abstract: In this talk we obtain quantitative Bernstein-von Mises type bounds on the normal approximation of the posterior distribution in exponential family models when centering either around the posterior mode or around the maximum likelihood estimator. Our bounds, obtained through a version of Stein's method, are non-asymptotic and data dependent; they are of the correct order both in the total variation and Wasserstein distances, as well as for approximations for expectations of smooth functions of the posterior. All our results are valid for univariate and multivariate posteriors alike, and do not require a conjugate prior setting. We illustrate our findings on a variety of exponential family distributions, including Poisson, multinomial and normal distribution with unknown mean and variance. The resulting bounds have an explicit dependence on the prior distribution and on sufficient statistics of the data from the sample, and thus provide insight into how these factors may affect the quality of the normal approximation. The performance of the bounds is also assessed with simulations.
Wednesday, 14 December 2022, 14:00 in SR 1.067
Nicoleta Serban (Georgia Institute of Technology)
From Data Analytics to Making an Impact
Abstract: In this seminar, I will expand on my recent research on health analytics for decision making with a focus on methodologies using massive databases, highlighting both challenges and opportunities. The research presented in this seminar has been motivated by one of my research programs, which aims to bring rigor to the measurement of and inference on healthcare access. I will use this framework to motivate the access model, a classic assignment optimization problem with many important computational challenges, including spatial coupling, complex system constraints, a large-scale decision space and data uncertainty. Methodological contributions on statistical inference for optimization models will be introduced, specifically batching solutions in linear programs using parametric programming and distributed optimization for solving large-scale optimization problems.
Tuesday, 15 November 2022, 15:45
Bojana Milošević (University of Belgrade)
Some characterization-based goodness-of-fit tests: journey through complete data and randomly censored data cases
Abstract: The use of different distributional characterization theorems has proven rather popular for the construction of goodness-of-fit tests. Such tests are attractive because they reflect some intrinsic properties of probability distributions connected with the given characterization, and therefore can be more efficient or more robust than others. The growth of the number of characterization theorems especially contributed to the development of this direction. One of the most fruitful types of characterizations is of the equidistribution type. Therefore most of the talk will be dedicated to this class of characterizations and associated tests. In practice, we often face incomplete data sets. For example, a common problem in clinical trials is the missing data caused by patients who do not complete the study on a full schedule and drop out of the study without further measurements. This usually results in randomly right-censored data, and providing adequate goodness-of-fit tests for such data is of importance for the wider scientific community. Here we present an overview of characterization-based tests for complete data and their adaptations for randomly censored data. We show their asymptotic and small sample properties. We also introduce alternative approaches to the adaptation of goodness-of-fit tests for randomly censored data and explore their properties. The final part of the talk will be dedicated to several potential directions for future research.
Tuesday, 25 October 2022, 15:45
Bikramjit Das (Singapore University of Technology and Design)
Active labelling for rare event classification: an importance-sampling based approach
Abstract: In this work, we investigate active learning techniques in an imbalanced classification problem where labelling is expensive. Classification tasks are known to be difficult when the data set is imbalanced, i.e., when one class has far fewer samples (is rarer) than the other classes. Furthermore, in many applications such as medical diagnosis or document/image classification, even though we have a huge unlabelled data set, we have access to only a limited number of labelled points since labelling is either difficult, expensive or time-consuming. Active learning is a tool used to sequentially choose points to label in such circumstances. Under the assumption that the rare class appears due to some extreme phenomena, we propose an importance-sampling based algorithm to sequentially query labels while obtaining the classifier by minimizing a cost-sensitive loss function accounting for the class imbalance. We show that our algorithm approximates a zero-variance importance sampler, and in experiments we show that our approach achieves target (weighted) accuracy and/or F1-score using a relatively small number of sample queries.
The talk is based on (ongoing) joint work with Karthyek Murthy and Xiangyu Liu.