Evaluation of Distributional Semantic Models for the Extraction of Semantic Relations Activated by Named Rivers from a Small Specialized Corpus

  1. Rojas Garcia, Juan
  2. Faber, Pamela
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2019

Issue: 63

Pages: 51-58

Type: Article


Abstract

EcoLexicon (http://ecolexicon.ugr.es) is a terminological knowledge base on environmental science whose design permits the geographic contextualization of data. For the geographic contextualization of landform concepts such as named rivers (e.g., Nile River), distributional semantic models (DSMs) were applied to a small specialized corpus to extract the terms related to each named river mentioned in it, together with their semantic relations. Since the construction of DSMs is highly parameterized and their evaluation on small specialized corpora has received little attention, this paper identifies DSM parameter combinations suitable for extracting the semantic relations takes_place_in, affects, and located_at, which named rivers frequently hold in the corpus. The models were evaluated against three gold standard datasets. The results showed that, for a small corpus, count-based models using the log-likelihood association measure outperformed prediction-based ones, and that the detection of a specific relation depended largely on the context window size.
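
The two parameters highlighted in the abstract, the context window size and the log-likelihood association measure, can be illustrated with a minimal count-based sketch. This is not the authors' implementation: the function names, the symmetric window, the toy tokens, and the use of Dunning's G² statistic as the log-likelihood measure are all assumptions made for illustration.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count (target, context) pairs within +/- `window` tokens (symmetric window)."""
    pair_counts = Counter()
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[(target, tokens[j])] += 1
    return pair_counts


def _term(observed, expected):
    # Contribution of one contingency-table cell; a zero count contributes nothing.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood(pair_counts, target, context):
    """Dunning's G^2 for the 2x2 table of (target, context) co-occurrence."""
    n = sum(pair_counts.values())
    if n == 0:
        return 0.0
    k11 = pair_counts[(target, context)]
    row = sum(c for (t, _), c in pair_counts.items() if t == target)
    col = sum(c for (_, x), c in pair_counts.items() if x == context)
    k12, k21 = row - k11, col - k11
    k22 = n - row - col + k11
    # Expected counts under independence of target and context.
    e11 = row * col / n
    e12 = row * (n - col) / n
    e21 = (n - row) * col / n
    e22 = (n - row) * (n - col) / n
    return 2.0 * (_term(k11, e11) + _term(k12, e12)
                  + _term(k21, e21) + _term(k22, e22))


# Toy example (hypothetical data): a wider window captures more distant contexts.
tokens = ["nile", "river", "floods", "delta"]
narrow = cooccurrence_counts(tokens, window=1)
wide = cooccurrence_counts(tokens, window=2)
score = log_likelihood(narrow, "nile", "river")
```

In a full count-based model, each named river would be represented by a vector of such association scores over all context terms, and the window size would be varied to see which relation types it favors.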

Funding information

This research was carried out as part of project FFI2017-89127-P, Translation-Oriented Terminology Tools for Environmental Texts (TOTEM), funded by the Spanish Ministry of Economy and Competitiveness. Funding was also provided by an FPU grant given by the Spanish Ministry of Education to the first author.
