Review of spoken dialogue systems

  1. Ramón López-Cózar Delgado ¹
  2. Zoraida Callejas ¹
  3. David Griol ²
  4. José Francisco Quesada Moreno ³

  ¹ Universidad de Granada, Granada, Spain. ROR: https://ror.org/04njjy449
  ² Universidad Carlos III de Madrid, Madrid, Spain. ROR: https://ror.org/03ths8210
  ³ Universidad de Sevilla, Sevilla, Spain. ROR: https://ror.org/03yxnpp24

Journal:
Loquens: revista española de ciencias del habla

ISSN: 2386-2637

Year of publication: 2014

Issue: 1

Pages: 3

Type: Article

DOI: 10.3989/LOQUENS.2014.012 (open access)

Abstract

Dialogue systems are computer programs developed to interact with users through speech in order to provide them with automated services. The interaction proceeds in turns of a dialogue which, in many studies in the literature, researchers try to make as close as possible to real dialogue between people in terms of naturalness, intelligence, and affective content. In this article we describe the fundamentals of this technology, including the core technologies used to implement this type of system. We also present the evolution of the technology and discuss some current applications. Likewise, we describe interaction paradigms, including scripting languages and the development of conversational interfaces for mobile applications. A key aspect of this technology is proper user modelling; for this reason, we discuss several affective, personality, and contextual models. Finally, we comment on some current research directions related to verbal communication, multimodal interaction, and dialogue management.
