Ymir: A Mind Model

for Communicative Creatures and Humanoids

Kristinn R. Thórisson

Ymir architecture overview


Ymir is a broad, generative model of psychosocial dialogue skills that bridges multimodal perception, decision and multimodal action in a coherent framework. It represents a distributed, modular approach that can be used to create autonomous characters capable of full-duplex multimodal perception and action generation (Thórisson, 1998, 1996). Full-duplex here means that the interaction is open-loop: the exchange of information is not step-locked. Features from three A.I. approaches have been adopted in Ymir: Blackboard systems (Adler, 1992; Nii, 1989; Engelmore & Morgan, 1988; Selfridge, 1959), Schema Theory (Arbib, 1992) and behavior-based systems (Maes, 1990). However, Ymir goes beyond any one of these in the number of communication modalities and performance criteria it addresses. The goals behind the architecture, all of which have been successfully addressed in the model, can be summarized as:

Given the complexities of integrating numerous multimodal capabilities in a single system, two practical features that make the architecture more useful as a research tool are:

A character created in this architecture should have the following abilities:

Two critical abilities that affect the way the whole system is designed, and allow us to meet the requirements in the last two bullets:

The six main types of elements in Ymir are (see image, above):

Multimodal information streams into the processing layers from the user (big arrows on left) and is processed at three different levels, using blackboards (yellow planes) to communicate intermediate and final results. An action scheduler (cylinder) composes particular motor morphologies and sends them to the agent's animation module (see ToonFace). The current implementation of the Ymir/Gandalf system comprises about 13,000 lines of custom-written LISP code and a few hundred lines of C code (excluding third-party code such as speech recognition and synthesis, space-tracking drivers and realtime vision-based pupil tracking).
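The flow described above (perception posting results to a blackboard, decision modules reading them, and an action scheduler composing motor commands) can be illustrated with a minimal Python sketch. This is not the Ymir/Gandalf code, which is written in LISP; every name here (`Blackboard`, `ActionScheduler`, `perceive`, `decide`, the message types and the behaviour-to-morphology table) is a hypothetical stand-in chosen for illustration, and the single blackboard and two processing steps simplify the three levels mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Shared store where modules post and read typed messages (hypothetical)."""
    messages: list = field(default_factory=list)

    def post(self, msg_type, content):
        self.messages.append((msg_type, content))

    def read(self, msg_type):
        return [content for t, content in self.messages if t == msg_type]


class ActionScheduler:
    """Collects behaviour requests and expands them into motor commands."""

    # Assumed mapping from abstract behaviours to motor morphologies.
    MORPHOLOGIES = {
        "look-at-user": ["turn-head", "move-eyes"],
        "acknowledge": ["nod"],
    }

    def __init__(self):
        self.requests = []

    def request(self, behaviour):
        self.requests.append(behaviour)

    def compose(self):
        commands = []
        for behaviour in self.requests:
            commands.extend(self.MORPHOLOGIES.get(behaviour, []))
        self.requests.clear()
        return commands


def perceive(raw_input, board):
    """Perception step: turn raw multimodal input into posted percepts."""
    if raw_input.get("gaze") == "agent":
        board.post("user-looking-at-agent", True)
    if raw_input.get("audio_level", 0.0) > 0.5:  # threshold is an assumption
        board.post("user-speaking", True)


def decide(board, scheduler):
    """Decision step: read percepts from the blackboard, request behaviours."""
    if board.read("user-looking-at-agent"):
        scheduler.request("look-at-user")
    if board.read("user-speaking"):
        scheduler.request("acknowledge")


def step(raw_input):
    """Run one pass from input to the commands sent to the animation module."""
    board = Blackboard()
    scheduler = ActionScheduler()
    perceive(raw_input, board)
    decide(board, scheduler)
    return scheduler.compose()


print(step({"gaze": "agent", "audio_level": 0.8}))
```

The point of the sketch is the decoupling: `perceive` and `decide` never call each other directly, so modules can be added or replaced without touching the rest, which is the property the blackboard approach is usually chosen for.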

Details on the inner workings of Ymir are given in my thesis, chapters 7, 8 & 9; proof that it really works is given in Chapter 10.


Recent papers building on the Ymir foundation

Ng-Thow-Hing, V., K. R. Thórisson, R. K. Sarvadevabhatla, J. Wormer and Thor List (2009). Cognitive Map Architecture: Facilitation of Human-Robot Interaction in Humanoid Robot. IEEE Robotics & Automation Magazine, March, 16(1):55-66. [PDF]

Ng-Thow-Hing, V., T. List, K. Thórisson, J. Lim, J. Wormer (2007). Design and Evaluation of Communication Middleware in a Humanoid Robot Architecture. IROS 2007 Workshop on Measures and Procedures for the Evaluation of Robot Architectures and Middleware, Oct. 29, 2007, San Diego, CA. [PDF]

Thórisson, K. R. (2007). Integrated A.I. Systems. Minds & Machines, 17:11-25, 2007. Invited paper at The Dartmouth Artificial Intelligence Conference: The Next 50 Years — Commemorating the 1956 Founding of AI as a Research Discipline, July 13-15, 2006, Dartmouth, New Hampshire, U.S.A. [PDF]

Thórisson, K.R., T. List, C. Pennock, J. DiPirro (2005). Whiteboards: Scheduling Blackboards for Semantic Routing of Messages & Streams. In K. R. Thórisson, H. Vilhjalmsson, S. Marsella (eds.), AAAI-05 Workshop on Modular Construction of Human-Like Intelligence, Pittsburgh, Pennsylvania, July 10, 8-15. Menlo Park, CA: American Association for Artificial Intelligence. [PDF]

List, T., J. Bins, R. B. Fisher, D. Tweed, K. R. Thórisson (2005). Two Approaches to a Plug-and-Play Vision Architecture - CAVIAR and Psyclone. In K. R. Thórisson, H. Vilhjalmsson, S. Marsella (eds.), AAAI-05 Workshop on Modular Construction of Human-Like Intelligence, Pittsburgh, Pennsylvania, July 10, 16-23. Menlo Park, CA: American Association for Artificial Intelligence. [PDF]

Thórisson, K. R., H. Benko, A. Arnold, D. Abramov, S. Maskey, A. Vaseekaran (2004). Constructionist Design Methodology for Interactive Intelligences. A.I. Magazine, 25(4): 77-90. Menlo Park, CA: American Association for Artificial Intelligence. [PDF]


Older papers on Ymir

Thórisson, K. R. (1999). A Mind Model for Multimodal Communicative Creatures and Humanoids. International Journal of Applied Artificial Intelligence, 13 (4-5), 449-486. This is the main Ymir paper. It provides an overview of the Ymir architecture and gives examples of implementation and performance of the first Ymir instantiation, Ymir Alpha. [PDF]

Thórisson, K. R. (2002). Machine Perception of Multimodal Natural Dialogue. In P. McKevitt (Ed.), Language, vision & music. Amsterdam: John Benjamins. Focuses on the perceptual mechanisms of Ymir, and their implementation in the communicative humanoid Gandalf, the first agent created in the Ymir architecture. [PDF]

Thórisson, K. R. (2002). Natural Turn-Taking Needs No Manual: A Computational Theory and Model, from Perception to Action. In B. Granström (Ed.), Multimodality in Language and Speech Systems. Heidelberg: Springer-Verlag. In-depth details on the full-duplex turn-taking system implemented for Gandalf in the Ymir architecture. [PDF]




[1] Even in highly automated tasks performed by humans, such as touch typing, people can cancel motor sequences within 90 ms of a perceptual cue (Kosslyn & Koenig, 1992). This means that the fastest perception-action loop in our model should be no longer than 90 ms, quite a strict requirement for a complex system like multimodal dialogue.
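The 90 ms requirement in the note above can be read as a latency budget shared by every stage of the fastest perception-action loop. The following Python sketch shows that arithmetic only; the stage names and the per-stage timings are invented for illustration and are not measurements from Ymir or Gandalf.

```python
# Budget taken from the 90 ms figure cited in the note above.
BUDGET_MS = 90.0


def loop_latency_ms(stage_latencies_ms):
    """Total latency of one perception-action loop, summed over its stages."""
    return sum(stage_latencies_ms.values())


def meets_budget(stage_latencies_ms, budget_ms=BUDGET_MS):
    """True if the whole loop completes within the budget."""
    return loop_latency_ms(stage_latencies_ms) <= budget_ms


# Hypothetical per-stage timings in milliseconds (assumptions, not data).
fast_loop = {"perceive": 30.0, "decide": 20.0, "act": 25.0}
slow_loop = {"perceive": 40.0, "decide": 35.0, "act": 30.0}

print(loop_latency_ms(fast_loop), meets_budget(fast_loop))  # 75.0 True
print(loop_latency_ms(slow_loop), meets_budget(slow_loop))  # 105.0 False
```

Because the budget applies to the sum, a single slow stage can push the whole loop over 90 ms even when the other stages are fast.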

For further information, see Thórisson's selected publications and thesis.


Copyright 2010 K.R.Th. All rights reserved.