Viewpoint invariant features and robust monocular Camera pose estimation

Author

Ferraz Colomina, Luis

Director

Binefa Valls, Xavier

Tutor

Toledo Morales, Ricardo

Date of defense

2016-02-01

ISBN

9788449058851

Pages

174 p.

Department/Institute

Universitat Autònoma de Barcelona. Departament de Ciències de la Computació

Abstract

The pose of a camera with respect to a real-world scene determines the perspective projection of the scene onto the image plane. The analysis of the deformations between pairs of images caused by perspective and camera pose has led many Computer Vision researchers to address problems such as detecting and matching the same local features in different images, or recovering the original camera pose for each image. The difference between the two problems lies in the locality of the image information: for local features we look for local invariance, whereas for camera pose we look for more global sources of information, such as sets of local features.

Local feature detection is a cornerstone of a wide range of Computer Vision applications, since it makes it possible to match and localize specific image regions. In the first part of this work, feature invariance is tackled by proposing algorithms that improve robustness to image perturbations and perspective changes, as well as discriminative power, from two points of view: (i) accurate detection of non-redundant corner and blob image structures based on their movement across different scales, and (ii) learning robust descriptors. Concretely, we propose three scale-invariant detectors, one of which detects corners and blobs simultaneously with negligible computational overhead. We also propose an affine-invariant blob detector. For descriptors, we propose learning them using Convolutional Neural Networks and large datasets of image regions annotated under different imaging conditions.

Despite having been researched for decades, camera pose estimation is still an open challenge. The goal of the Perspective-n-Point (PnP) problem is to estimate the location and orientation of a calibrated camera from n known 3D-to-2D point correspondences between a previously known 3D model of a real scene and 2D features obtained from a single image. In the second part of this thesis, camera pose estimation is addressed with novel PnP approaches, which drastically reduce the computational cost and allow real-time applications independently of the number of correspondences. In addition, we provide an integrated outlier rejection mechanism with negligible computational overhead and a novel method that increases accuracy by modelling the reprojection error of each correspondence. Finally, in complex and very large scenarios, with possibly hundreds of thousands of features, finding correct 3D-to-2D correspondences is difficult and computationally expensive. For this case, a robust and accurate top-down approach for camera pose estimation is proposed. Our approach takes advantage of high-level classifiers, which estimate a rough camera pose, to constrain the 3D-to-2D correspondences used by our accurate, outlier-robust PnP method.
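
As a rough illustration of the reprojection error mentioned in the abstract, the following Python sketch projects 3D points with a pinhole camera and measures the pixel distance to the observed 2D features. It is not the thesis's method: the names `project` and `reprojection_errors`, the intrinsics `K`, and the pose `R`, `t` are hypothetical values chosen only for this example.

```python
import numpy as np

def project(K, R, t, X):
    """Project N x 3 world points X to pixels with a pinhole camera K [R | t]."""
    Xc = X @ R.T + t                   # world -> camera coordinates
    uvw = Xc @ K.T                     # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division

def reprojection_errors(K, R, t, X, x):
    """Euclidean pixel distance between projected 3D points and observed 2D points."""
    return np.linalg.norm(project(K, R, t, X) - x, axis=1)

# Hypothetical calibrated camera: 800 px focal length, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # assumed rotation (identity)
t = np.array([0.0, 0.0, 5.0])          # assumed translation: scene 5 units in front

# Four made-up 3D model points and their exact 2D projections.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
x = project(K, R, t, X)

# With the true pose and noise-free observations, every error is zero.
errors = reprojection_errors(K, R, t, X, x)
```

A PnP solver looks for the `R` and `t` that make these errors small over all correspondences; correspondences whose error stays large under a good pose are candidates for rejection as outliers.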

Keywords

PnP; Punts característics; Puntos característicos; Interest points; Aprenentatge de descriptors; Aprendizaje de descriptores; Learning of descriptors

Subjects

004 - Computer science and technology. Computing. Data processing

Knowledge Area

Technologies

Documents

lfc1de1.pdf

5.535Mb

Rights

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-nd/3.0/es/

This item appears in the following Collection(s)

Departament de Ciències de la Computació [93]
