Specific-purpose processing architectures for dynamic artificial vision systems

BARRANCO EXPÓSITO, FRANCISCO

Supervised by:

Defending university: Universidad de Granada

Date of defense: 9 October 2012

Examination committee:

Sergio Cuenca Asensi, Chair
Samuel F. Romero García, Secretary
Silvio Paolo Sabatini, Member
Ignacio Bravo Muñoz, Member
Rafael Muñoz Salinas, Member

Type: Thesis

Teseo: 329459

Abstract

A visual system operating in dynamic environments must cope with an amount of information that quickly becomes unmanageable. Dynamic vision systems consist of several complex elements: a knowledge database about the objects in the scene, their behavior and capabilities, their inter-relationships, and the environmental conditions; a list of planned steps to accomplish a task, which also includes feedback for reconsidering the plan; components that synchronize the multimodal information the system manages; and active vision. This last component is one of the most complex. In dynamic environments, static vision systems that merely observe the scene are unfit. Active vision includes the control of gaze (the control of eye movements) and visual attention; adaptation arises as a consequence of the feedback in several components of the dynamic vision system. The human eye perceives high-quality information from the region that surrounds the center of gaze (the fovea). The active control of gaze is, first, a task-dependent process and, second, a way of reducing the processing resources by selecting a small window of the scene. In this work, we present a system designed to control vergence for the "iCub" robotic head using depth cues. First, we give the system the capacity to emulate the neural control mechanism called "fixation", which helps the system stay still to explore a static target in detail. Second, an object moving at variable speed in 3D space is quite difficult to track; our work also addresses this process, which is called "smooth pursuit" in the literature. On the other hand, the biological way of reducing the huge visual bandwidth, and thus optimizing the available resources, is to select the most interesting areas and focus the system's computation on the data that come from them: this process is called visual attention.
Our aim is to design a model and an implementation of a visual attention system that selects the most important areas in a scene in order to process them more accurately. This implementation is valuable for systems such as advanced driving assistance, integration in smart cameras for video surveillance, industrial inspection applications in uncontrolled scenarios, or even aid devices for low-vision patients. This dissertation presents two different approaches to implementing the visual attention model for dynamic environments. First, we implement a strongly bio-inspired system that uses the responses generated by various kinds of cells in an artificial retina. These sparse cues are then used as inputs for a system that computes optical flow. This implementation is complemented with an attentional top-down mechanism that selects the most reliable cues. The attentional modulation stream is deployed using color, texture, or speed information of the elements in the scene in order to bias the cues of objects of interest depending on the task that is taking place; in our case, it biases the computation of optical flow to focus it on our target objects. We also develop a second alternative that combines two forms of attention: an inherent bottom-up form that depends on the contrast of some features of the objects in the scene, and a task-dependent top-down form that biases the competition among the most salient locations according to the requirements of the task at hand. The first part of this alternative consists of designing and implementing a low-level vision system capable of dealing with real-world scenarios in real time on an embedded device. These strong requirements lead us to implement our system on an FPGA, a reconfigurable specific-purpose device. The implementation is developed incrementally, first setting up the basic architecture for the computation of multiscale optical flow.
This first architecture is then extended to also include the computation of disparity, local energy, and orientation on a single chip. Various alternatives are also tested with the purpose of reducing the resource cost and improving the estimation precision. Hence, the inclusion of color cues is benchmarked for the implementation of optical flow and disparity, presenting a large set of results for the estimations with diverse color representations and a thorough study of its impact on the total resource cost. We also present an alternative to the multiscale-with-warping scheme implemented for the optical flow computation that allows a significant reduction in resource cost at the expense of an affordable loss in the density of estimations. The choice among these alternative schemes has to be made carefully, studying the specific application requirements in terms of accuracy, density, and resource utilization. After implementing this layer of low-level visual processing engines, we design an architecture for a visual attention system that coordinates a bottom-up saliency stream, which uses the energy, orientation, motion, and a new color-opponency engine, with a top-down modulation that uses the disparity and motion engines. The system is tested in the framework of advanced driving assistance systems.
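
As an illustration only, the combination of a bottom-up saliency stream with a top-down modulation can be sketched in a few lines of Python. This is a minimal software sketch under stated assumptions, not the FPGA architecture of the thesis: the equal-weight averaging of feature maps, the disparity-window gating rule, and all function and parameter names (`normalize`, `bottom_up_saliency`, `top_down_gain`, `attend`, `target`, `tolerance`) are hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

Map = List[List[float]]  # a small 2D feature map stored as nested lists


def normalize(feature: Map) -> Map:
    """Rescale a feature map to [0, 1]; a constant map becomes all zeros."""
    values = [v for row in feature for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in feature]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature]


def bottom_up_saliency(features: Dict[str, Map]) -> Map:
    """Average the normalized feature maps (equal weights are an assumption)."""
    maps = [normalize(m) for m in features.values()]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
            for r in range(rows)]


def top_down_gain(disparity: Map, target: float, tolerance: float) -> Map:
    """Hypothetical task rule: keep locations whose disparity is near the target."""
    return [[1.0 if abs(d - target) <= tolerance else 0.0 for d in row]
            for row in disparity]


def attend(features: Dict[str, Map], disparity: Map,
           target: float, tolerance: float) -> Tuple[Tuple[int, int], Map]:
    """Modulate bottom-up saliency by the top-down gain and pick the winner."""
    saliency = bottom_up_saliency(features)
    gain = top_down_gain(disparity, target, tolerance)
    modulated = [[s * g for s, g in zip(s_row, g_row)]
                 for s_row, g_row in zip(saliency, gain)]
    positions = [(r, c) for r in range(len(modulated))
                 for c in range(len(modulated[0]))]
    winner = max(positions, key=lambda rc: modulated[rc[0]][rc[1]])
    return winner, modulated


if __name__ == "__main__":
    features = {"energy": [[0.0, 1.0], [2.0, 4.0]],
                "motion": [[4.0, 0.0], [0.0, 0.0]]}
    disparity = [[5.0, 0.0], [0.0, 0.5]]
    winner, _ = attend(features, disparity, target=0.0, tolerance=1.0)
    print("attended location (row, col):", winner)
```

In this toy input, the top-left location is salient because of motion but lies outside the disparity window of interest, so the top-down gain suppresses it and the attended location moves to a position that is salient in energy and also satisfies the task constraint.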