Recuperación de Información para la Web de las Cosas

  1. Manta Caro, Héctor Cristyan
Supervised by:
  1. Juan Manuel Fernández Luna Director

Defence university: Universidad de Granada

Defence date: 24 April 2023

Committee:
  1. Paolo Rosso Chair
  2. Juan Francisco Huete Guadix Secretary
  3. Lidia Fuentes Fernández Committee member

Type: Thesis

Abstract

The Internet and Web technologies have evolved remarkably since their conception, changing our lives and society in many ways. The creation of the World Wide Web (WWW) and the Network of Networks has also provided a crucial technological base for the progress of digital society and the construction of smart cities. Today, new paradigms are emerging that frame a new era, that of the Internet and the Web of Things. This new era arises from the possibility of connecting to the Internet not only traditional devices such as smartphones, laptops, and mobile and ubiquitous computing devices, but also any object in the real world. Hyperconnecting animate or inanimate real-world objects also makes it possible to publish Web-type services that give end users highly dynamic content and real-time data about the status of those objects and the actions they allow. At the same time, these things (also referred to as objects or entities) are endowed with a certain intelligence. Currently, the Internet of Things encompasses around 12 billion interconnected devices, with an expected 16.5 billion by 2025 and more than 25 billion things by 2030. The Internet of Things (IoT) describes technologies and research disciplines that enable the Internet to reach into the real world of physical objects, which become seamlessly connected and take on some intelligence. Through IoT, information on the properties, states and characteristics of things is obtained in order to activate their functionalities remotely. The Web of Things (WoT) proposes abstracting real-world entities into a kind of virtual Web “avatar” or intelligent Web agent that acquires, processes and presents real-time information about the entity, and through which things in the real world can be connected to and controlled. A new generation of services can emerge from these complementary paradigms, many of them supported by the new infrastructure of 5G and 6G networks.
One of these services is crucial for our daily interaction with this new intelligent cyber world: Information Retrieval (IR), mainly in the form of search engines, which can also evolve into much more powerful tools. These systems will provide the ability to find relevant information about things in the real world through their abstractions. On this basis, a new architecture for this type of IR service must be defined. It must take into account the synergy between IoT and WoT and the challenges they impose due to their colossal size and unprecedented dynamism. Simulators are essential for researching and developing new systems, architectures or protocols. Historically, simulators have played a key role in driving the development of the Internet, the Web and their components. This thesis takes the view that simulation-based research will continue to contribute to the evolution of the paradigms associated with the Internet and the Web of Things and of information retrieval systems. The appearance of such paradigms requires current IR systems to be redefined and rebuilt to face new challenges. Therefore, it is essential to develop abstract models of Web representation through simulation to establish new approaches in Information Retrieval for the Web of Things. Moreover, simulation provides mechanisms for experimenting with and validating these approaches on dynamic synthetic collections that mimic WoT behaviour. A latent challenge is the heterogeneity of the WoT modelling proposals that have emerged in the conception of the WoT architecture, at levels ranging from the Web representation mechanism and the thing description language to the semantic enrichment around WoT. All these heterogeneous levels directly affect the construction of IR systems in the form of search engines for WoT.
Different directions could be taken to solve the problems of adaptability to WoT for a particular field of application: i) specialization of the IR system or ii) generalization of the IR system. As for IR principles and architecture, there is no evidence of an in-depth study of the suitability and adaptability of IR techniques and strategies to the dynamic characteristics of the WoT, considering the requirements it imposes and the expected challenges. This doctoral work describes:
  1. A proposal to model the Web of Things based on a structured XML representation. This model has been designed with flexibility and modularity to allow the representation of multiple scenarios, and serves as the conceptual source for the future development of IR systems.
  2. A discrete event simulator named SIM.WoT, whose ultimate goal is to encapsulate the expected dynamics of the WoT for the development of IR systems. The simulator generates a synthetic collection of XML documents in real time containing spatiotemporal contexts and textual information with highly dynamic dimensions. The simulator is characterized by its flexibility and versatility to represent real-world scenarios and offers a unique perspective for IR.
  3. An IR proposal for the WoT, called IR.WoT, that covers the critical stages of indexing, scoring and presentation. The thesis describes its design considerations, cloud implementation and experimentation based on a collection of synthetic XML documents obtained from simulation.
  4. A study of the adaptability of conventional IR paradigms and concepts to the WoT context, in the form of a Systematic Literature Review (SLR) and an update of the state of the art to 2022.
  5. The construction of an open dataset which, as a result of the SLR, contains the data and analysis of search engines and IR mechanisms for the Internet and the Web of Things in the scientific literature.
  6. A report of experimentation on the indexing and retrieval stages in an IR.WoT search engine proposal, along with an evaluation proposal.
The thesis closes with an analysis of results, conclusions and future work.
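As an illustration only, the idea of a discrete event simulator that feeds an IR index with synthetic XML documents can be sketched in a few lines of Python. This is not SIM.WoT or IR.WoT: the element names (`thing`, `time`, `location`, `state`), the example things and the scheduling rule are all assumptions made for this sketch, and the index is a plain term-to-document inverted index with no scoring.

```python
import heapq
import random
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical things: (id, latitude, longitude, possible textual states).
THINGS = [
    ("lamp-1", 37.18, -3.60, ["lamp on", "lamp off"]),
    ("door-7", 37.19, -3.61, ["door open", "door closed"]),
]


def simulate(until, seed=0):
    """Run a discrete-event loop; return one XML string per state change."""
    rng = random.Random(seed)
    # Event queue ordered by simulated time; the thing index breaks ties.
    queue = [(rng.uniform(0, 5), i) for i in range(len(THINGS))]
    heapq.heapify(queue)
    docs = []
    while queue:
        now, i = heapq.heappop(queue)
        if now > until:
            break  # every remaining event is later than the horizon
        thing_id, lat, lon, states = THINGS[i]
        root = ET.Element("thing", id=thing_id)
        ET.SubElement(root, "time").text = f"{now:.2f}"
        ET.SubElement(root, "location", lat=str(lat), lon=str(lon))
        ET.SubElement(root, "state").text = rng.choice(states)
        docs.append(ET.tostring(root, encoding="unicode"))
        # Schedule this thing's next state change (assumed exponential gaps).
        heapq.heappush(queue, (now + rng.expovariate(0.5), i))
    return docs


def build_index(docs):
    """Map each term of the <state> text to the set of document numbers."""
    index = defaultdict(set)
    for doc_no, xml_text in enumerate(docs):
        state = ET.fromstring(xml_text).findtext("state")
        for term in state.split():
            index[term].add(doc_no)
    return index


docs = simulate(until=20.0)
index = build_index(docs)
print(len(docs), "documents;", sorted(index))
```

Each document carries a timestamp, a location and a short text, mirroring the spatiotemporal and textual dimensions mentioned above. A realistic system would also have to handle documents that become stale as things change state.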