Statistical inference techniques
In the current information age, new types of data are emerging. The challenging problems associated with analysing these data call for new statistical ideas and developments. Among these new types are the (crisp or fuzzy) set-valued data in finite-dimensional Euclidean spaces, which are generically referred to as imprecise data.
The work developed for this dissertation deals with the robust analysis of the location or central tendency of the random mechanism generating such imprecise data (i.e., a random set or random fuzzy set).
The best-known location measure of such a mechanism is the associated Aumann mean (for random sets) or Aumann-type mean (for random fuzzy sets). Both means preserve the main valuable properties of real- and vector-valued means, but they also share a negative feature: high sensitivity to outliers and to data changes.
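As a small illustration of this sensitivity, consider the simplest imprecise data: compact intervals on the real line. For such data the sample Aumann mean reduces to the interval whose endpoints are the averages of the lower and upper endpoints. The sketch below is only illustrative: the function name `aumann_mean` and the sample values are assumptions made for this example and are not taken from the dissertation.

```python
from statistics import fmean

def aumann_mean(intervals):
    """Sample Aumann mean of compact intervals given as (lower, upper) pairs.

    For intervals, averaging in the Minkowski sense amounts to
    averaging the lower endpoints and the upper endpoints separately.
    """
    lowers = [lo for lo, _ in intervals]
    uppers = [up for _, up in intervals]
    return (fmean(lowers), fmean(uppers))

# Hypothetical sample: four "regular" intervals plus one outlying interval.
clean = [(1.0, 2.0), (1.5, 2.5), (0.5, 2.0), (1.0, 3.0)]
contaminated = clean + [(100.0, 200.0)]

mean_clean = aumann_mean(clean)
mean_contaminated = aumann_mean(contaminated)

print("Aumann mean without the outlier:", mean_clean)
print("Aumann mean with the outlier:   ", mean_contaminated)
```

A single outlying interval out of five is enough to move both endpoints of the mean far away from the bulk of the data, which is the behaviour that motivates the search for more robust location measures.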
In looking for a more robust location measure, it seems convenient to follow some of the most successful approaches for real- and vector-valued data: trimmed means and M-estimators. To develop or adapt these approaches, two methodologies will be considered, namely:
- On the one hand, imprecise data can be represented as functional data, that is, as values of random elements taking values in (a convex cone within) a certain Hilbert space. Consequently, one may particularize results and methods from Functional Data Analysis. This can be done properly whenever one guarantees that the particularization does not leave the cone of imprecise data.
- On the other hand, when the preceding methodology either fails or can be improved, one may develop ad hoc concepts and methods by combining notions and results from both (Fuzzy) Set-Valued Analysis and Large Sample/Resampling Statistics.
With this general goal, the work for this dissertation is structured as follows:
Chapter 1 gathers the main supporting tools that will be used in developing the two approaches mentioned above. A substantial part of the chapter concerns notions and results which have been expressly introduced during the course of this thesis. The types of data to be dealt with, the usual arithmetic and metrics between them, the random mechanisms generating these data and the associated Aumann/Aumann-type means are also presented. To motivate the main body of the work, simulation studies are carried out to corroborate empirically that the means are highly sensitive to outliers and data changes in the imprecise-valued case.
Chapter 2 is devoted to the extension of trimmed means to imprecise-valued data. Trimmed means for functional data, which were introduced by other authors, exist and are unique under ideal conditions. There is also an algorithm in the literature to compute an approximation of their empirical version. In this chapter, a new algorithm to compute the empirical trimmed mean exactly is introduced. Consistency and robustness properties are established, and simulations are carried out to compare it with other algorithms and trimmed means for functional data. The new ideas are finally particularized to imprecise data.
Chapter 3 deals with the extension of M-estimates of location to imprecise-valued data. Some recent studies have first been adapted to Hilbert space-valued random elements in such a way that the necessary and sufficient conditions for the existence of M-estimates also ensure that they belong to the conical parameter space. Furthermore, an iterative algorithm extending the re-weighted least squares algorithm is established. Since some interesting loss functions do not fulfill these conditions, additional ad hoc procedures have been developed to deal with fuzzy and interval-valued data.
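To give a flavour of a re-weighted scheme of this kind, the following minimal sketch computes a location M-estimate for interval-valued data. It is not the algorithm of the dissertation: the (mid-point, spread) representation of intervals, the weighted Euclidean distance with parameter `theta`, the Huber loss with tuning constant `a`, and all function names and sample values are assumptions chosen for illustration. Each iteration replaces the current estimate by a weighted average of the observations with positive weights, so the estimated spread stays non-negative and the estimate remains a valid interval.

```python
import math

def dist(x, y, theta=1.0):
    """Distance between two intervals given as (mid, spread) pairs (assumed metric)."""
    return math.sqrt((x[0] - y[0]) ** 2 + theta * (x[1] - y[1]) ** 2)

def huber_weight(d, a):
    """Weight psi(d)/d for the Huber loss with tuning constant a."""
    return 1.0 if d <= a else a / d

def m_estimate(data, a=1.0, theta=1.0, tol=1e-9, max_iter=200):
    """Iteratively re-weighted averaging, started at the sample mean."""
    n = len(data)
    est = (sum(m for m, _ in data) / n, sum(s for _, s in data) / n)
    for _ in range(max_iter):
        weights = [huber_weight(dist(x, est, theta), a) for x in data]
        total = sum(weights)  # strictly positive: every weight is > 0
        new = (
            sum(w * x[0] for w, x in zip(weights, data)) / total,
            sum(w * x[1] for w, x in zip(weights, data)) / total,
        )
        if dist(new, est, theta) < tol:
            return new
        est = new
    return est

# Hypothetical sample in (mid, spread) form: four regular intervals and one outlier.
sample = [(1.5, 0.5), (2.0, 0.5), (1.25, 0.75), (2.0, 1.0), (150.0, 50.0)]
mean_mid = sum(m for m, _ in sample) / len(sample)
est_mid, est_spread = m_estimate(sample)

print("mid-point of the mean:      ", mean_mid)
print("mid-point of the M-estimate:", est_mid)
print("spread of the M-estimate:   ", est_spread)
```

In this sketch the outlying interval receives a weight that shrinks with its distance from the current estimate, so its influence on the result is bounded, whereas it enters the mean with full weight.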
Chapter 4 aims to compare, from an empirical point of view, the robust behaviour of the different location measures introduced in this work. While the simulations in Chapters 2 and 3 contrast the robustness of the new location measures with the sensitivity of the Aumann/Aumann-type means, the comparison in this chapter is made among all the approaches suggested in the course of this work. A summarized discussion follows the simulations.
Each chapter ends with two common types of remarks, namely,
- those clearly highlighting the main contributions of the chapter;
- those pointing out the ideas and results which, having been developed for the chapter, have already been published, accepted or submitted for publication.