Dependable systems over synchronous ethernet

  1. Gutiérrez Rivas, José Luis
Supervised by:
  1. Javier Díaz Alonso Supervisor
  2. Eduardo Ros Vidal Supervisor

University of defense: Universidad de Granada

Date of defense: 4 December 2018

Examination committee:
  1. Mancia Anguita López Chair
  2. Manuel Rodríguez Álvarez Secretary
  3. Antonio Martínez Álvarez Member
  4. Héctor Esteban Pinillos Member
  5. Elizabeth Laier English Member

Type: Thesis

Abstract

This dissertation presents our work on critical distributed applications in industrial network infrastructures. The work focuses on providing all elements on the grid with redundancy features that increase fault tolerance at both the local and the distributed level, with particular emphasis on timing. The dissertation is structured in four parts.

In the first part, we review the state of the art, paying special attention to the elements that make up a critical distributed system. The review starts with the evolution of safety-critical (SC) systems in recent years and their adaptation from single-core to multi-core architectures. It then covers the progressive evolution of power grid technologies into Smart Grid systems, their relationship with critical applications, and their event synchronization needs. Different timing technologies are detailed, with particular emphasis on the main one used in this thesis, White Rabbit (WR), which is capable of providing sub-nanosecond accuracy over Ethernet-based networks. Finally, a brief market survey compares our contribution with other technologies available on the market.

In the second part, we review methods to increase reliability in mixed-criticality end-systems that use multi-core architectures. We focus on the methods developed to isolate the non-critical and critical parts, in both hardware and software, without increasing the certification costs of the system. The deployment is based on an industrial use case, used as a proof of concept, that describes the emergency stop of an industrial motor controller. This part ends with an analysis of the fault tolerance features that the system gains from redundant hardware components, safe communication channels and redundant software architectures.

In the third part, we move from inter-core communication problems to inter-processor communication networks. We review methods to increase reliability, scalability and compatibility in industrial networks, focusing on data and time distribution. We first introduce the development of different clocks for the WR technology to increase scalability and industrial compatibility. We then describe the methods developed to provide WR timing networks with fault tolerance and to avoid single points of failure in ring topologies. This requires switchover mechanisms to change from a primary to a backup time reference, which are also described in this part. Similarly, for data transmission, we describe the redundancy mechanisms developed to guarantee data distribution and reception, thus increasing service availability and reducing network latency. Finally, we analyze the bandwidth and reliability of data distribution.

The fourth part integrates all the previous concepts, redundant implementations and compatibility methods into a real mission-critical distributed control system over a synchronous network. The system is composed of network devices with redundancy capabilities, acquisition modules and Remote Terminal Units (RTUs) interconnected in a ring topology. The deployment disseminates redundant timing references using WR in the core of the network, with the best accuracy possible (below 1 ns). Other industrial timing solutions, such as the Precision Time Protocol (PTP) and IRIG-B, are used for the acquisition modules and RTUs. Data is also exchanged through reliable communication channels. Finally, a safety tool is used to evaluate all the elements of the system in terms of their criticality and integrity levels.