
Ymir: A Mind Model for Communicative Creatures and Humanoids

Kristinn R. Thórisson


Ymir is a broad, generative model of psychosocial dialogue skills that bridges multimodal perception, decision and multimodal action in a coherent framework. It represents a distributed, modular approach that can be used to create autonomous characters capable of full-duplex [1] multimodal perception and action generation (Thórisson, 1998, 1996). Features from three A.I. approaches have been adopted in Ymir: blackboard systems (Adler, 1992; Nii, 1989; Engelmore & Morgan, 1988; Selfridge, 1959), schema theory (Arbib, 1992) and behavior-based systems (Maes, 1990). However, Ymir goes beyond any one of these in the number of communication modalities and performance criteria it addresses. The goals behind the architecture, all of which have been successfully addressed in the model, can be summarized as:

Given the complexities of integrating numerous multimodal capabilities in a single system, two practical features that make the architecture more useful as a research tool are:

A character created in this architecture should have the following abilities:

Two critical abilities that affect the way the whole system is designed, and allow us to meet the requirements in the last two bullets:


The six main types of elements in Ymir are:

[Figure: The Ymir architecture]

Multimodal information streams into the processing layers from the user (big arrows on left) and is processed at three different levels, using blackboards (yellow planes) for communicating intermediate and final results. An action scheduler (cylinder) composes particular motor morphologies and sends them to the agent's animation module (see ToonFace). The current implementation of the Ymir/Gandalf system comprises about 13,000 lines of custom-written LISP code and a few hundred lines of C code (excluding third-party code such as speech recognition and synthesis, space-tracking drivers and realtime vision-based pupil tracking).
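The flow described above (input events, levels of processing that communicate through blackboards, and an action scheduler that composes motor commands) can be illustrated with a minimal sketch. This is not the Ymir/Gandalf implementation, which is written in LISP and C and runs its modules concurrently; it is a sequential toy in Python. Every name in it (`Blackboard`, `ActionScheduler`, `level_one`, `level_two`, `level_three`, `run_once`, the message kinds, and the example percepts and motor commands) is a hypothetical choice made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Shared store where modules post intermediate and final results."""
    name: str
    messages: list = field(default_factory=list)

    def post(self, kind, content):
        self.messages.append((kind, content))

    def read(self, kind):
        return [content for k, content in self.messages if k == kind]


class ActionScheduler:
    """Expands a behavior request into motor commands (hypothetical table)."""
    MORPHOLOGIES = {"greet": ["raise-eyebrows", "smile", "say:hello"]}

    def __init__(self):
        self.sent = []  # stands in for the animation module's input queue

    def schedule(self, request):
        commands = self.MORPHOLOGIES.get(request, [])
        self.sent.extend(commands)
        return commands


def level_one(raw_events, board):
    # Lowest level: turn raw multimodal events into simple percepts.
    for modality, value in raw_events:
        board.post("percept", f"{modality}:{value}")


def level_two(board_in, board_out):
    # Middle level: combine percepts across modalities into a decision.
    percepts = set(board_in.read("percept"))
    if {"speech:hello", "gaze:at-agent"} <= percepts:
        board_out.post("decision", "user-is-greeting")


def level_three(board_in, scheduler):
    # Highest level: choose a behavior per decision and schedule it.
    commands = []
    for decision in board_in.read("decision"):
        if decision == "user-is-greeting":
            commands.extend(scheduler.schedule("greet"))
    return commands


def run_once(raw_events):
    perception_board = Blackboard("perception")
    decision_board = Blackboard("decision")
    scheduler = ActionScheduler()
    level_one(raw_events, perception_board)
    level_two(perception_board, decision_board)
    return level_three(decision_board, scheduler)


if __name__ == "__main__":
    print(run_once([("speech", "hello"), ("gaze", "at-agent")]))
```

In this sketch the levels never call one another directly; each reads from and posts to a blackboard, so a level can be replaced or extended without changing the others.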

Details on the inner workings of Ymir are given in my thesis, chapters 7, 8 and 9; proof that it really works is given in chapter 10.


Papers on Ymir

Thórisson, K. R. (1999). A Mind Model for Multimodal Communicative Creatures and Humanoids. International Journal of Applied Artificial Intelligence, 13 (4-5), 449-486. This is the main Ymir paper. It provides an overview of the Ymir architecture and gives examples of implementation and performance of the first Ymir instantiation, Ymir Alpha.

Thórisson, K. R. (2002). Machine Perception of Multimodal Natural Dialogue. In P. McKevitt (Ed.), Language, vision & music. Amsterdam: John Benjamins. Focuses on the perceptual mechanisms of Ymir, and their implementation in the communicative humanoid Gandalf, the first agent created in the Ymir architecture.

Thórisson, K. R. (2002). Natural Turn-Taking Needs No Manual: A Computational Theory and Model, from Perception to Action. In B. Granström (Ed.), Multimodality in Language and Speech Systems. Heidelberg: Springer-Verlag. In-depth details on the full-duplex turn-taking system implemented for Gandalf in the Ymir architecture.

More recent papers that build on the Ymir foundation:

Ng-Thow-Hing, V., T. List, K. Thórisson, J. Lim, J. Wormer (2007). Design and Evaluation of Communication Middleware in a Humanoid Robot Architecture. IROS 2007 Workshop on Measures and Procedures for the Evaluation of Robot Architectures and Middleware, Oct. 29, 2007, San Diego, CA.

Thórisson, K. R. (2007). Integrated A.I. Systems. Minds & Machines, 17: 11-25. Invited paper at The Dartmouth Artificial Intelligence Conference: The Next 50 Years - Commemorating the 1956 Founding of AI as a Research Discipline, July 13-15, 2006, Dartmouth, New Hampshire, U.S.A.

Thórisson, K. R., T. List, C. Pennock, J. DiPirro (2005). Whiteboards: Scheduling Blackboards for Semantic Routing of Messages & Streams. In K. R. Thórisson, H. Vilhjalmsson, S. Marsella (eds.), AAAI-05 Workshop on Modular Construction of Human-Like Intelligence, Pittsburgh, Pennsylvania, July 10, 8-15. Menlo Park, CA: American Association for Artificial Intelligence.

List, T., J. Bins, R. B. Fisher, D. Tweed, K. R. Thórisson (2005). Two Approaches to a Plug-and-Play Vision Architecture - CAVIAR and Psyclone. In K. R. Thórisson, H. Vilhjalmsson, S. Marsella (eds.), AAAI-05 Workshop on Modular Construction of Human-Like Intelligence, Pittsburgh, Pennsylvania, July 10, 16-23. Menlo Park, CA: American Association for Artificial Intelligence.

Thórisson, K. R., H. Benko, A. Arnold, D. Abramov, S. Maskey, A. Vaseekaran (2004). Constructionist Design Methodology for Interactive Intelligences. A.I. Magazine, 25(4): 77-90. Menlo Park, CA: American Association for Artificial Intelligence.


Footnotes

[1] Full-duplex in this context means that the interaction is open-loop, i.e. the exchange of information is not lock-step.

[2] Even in highly automated tasks performed by humans, such as touch typing, people can cancel motor sequences within 90 ms of a perceptual cue (Kosslyn & Koenig, 1992). This means that the fastest perception-action loop in our model should be no longer than 90 ms, quite a strict requirement for a complex system like multimodal dialogue.


For further information, see Thórisson's selected publications and thesis.




Copyright 1998 K.R.Th. All rights reserved.