Review of spoken dialogue systems

Ramón López-Cózar Delgado ¹; Zoraida Callejas ¹; David Griol ²; José Francisco Quesada Moreno ³

doi:10.3989/LOQUENS.2014.012

¹ Universidad de Granada, Granada, Spain. ROR: https://ror.org/04njjy449

² Universidad Carlos III de Madrid, Madrid, Spain. ROR: https://ror.org/03ths8210

³ Universidad de Sevilla, Seville, Spain. ROR: https://ror.org/03yxnpp24

Journal: Loquens : revista española de ciencias del habla

ISSN: 2386-2637

Year of publication: 2014

Issue: 1

Pages: 3

Type: Article

DOI: 10.3989/LOQUENS.2014.012 (open access via the publisher)

Abstract

Spoken dialogue systems are computer programs developed to interact with users through speech in order to provide them with automated services. The interaction takes place in dialogue turns, and in many studies in the literature researchers try to make this dialogue resemble real human-to-human conversation as closely as possible in terms of naturalness, intelligence, and affective content. In this article we describe the fundamentals of this technology, including the core technologies used to implement systems of this kind. We also present how the technology has evolved and discuss some current applications. In addition, we describe interaction paradigms, including scripting languages and the development of conversational interfaces for mobile applications. A key aspect of this technology is modelling the user correctly. For this reason, we discuss several affective, personality, and contextual models. Finally, we discuss some current lines of research related to verbal communication, multimodal interaction, and dialogue management.


Data source: Dialnet
