Paul Mc Kevitt
Center for PersonKommunikation (CPK)
Institute of Electronic Systems (IES)
Aalborg University
Fredrik Bajers Vej 7-A5, DK-9220 Aalborg Ø, DENMARK
pmck@kom.auc.dk
1 Introduction
A major force is driving the Humanities and the Sciences/Engineering
towards each other in the area of integrated language and vision
processing by machines: SuperinformationhighwayS. This force is the ability
to have information in text, voice, sound, graphic and video forms
available within minutes at local and remote sites through interfaces
like Netscape (Note 1) and search engines like AltaVista (Note 2). People will be able
to pose queries about, say, stocks and shares,
good restaurants in a city, or their bank account by speaking the query
to the machine. In turn, they will be able to direct the machine's
graphical display of the information it presents in response. Visual
information comes in many formats, from diagrams to videos, as does language
information, both natural and formal. The Sciences/Engineering are more
concerned with methods for transmitting, processing, representing and retrieving
information across networks, while the Humanities are more concerned
with the information itself. Slethei (1998) also makes this point
about bridging the gap between the two cultures, especially with respect
to spoken dialogue systems (http://www.hd.uib.no/AcoHum/abs/Slethei.htm).
The area of MultiMedia is growing rapidly internationally, and it clearly has different meanings from different points of view. MultiMedia can be separated into at least two areas: (1) traditional MultiMedia and (2) Intelligent MultiMedia (IntelliMedia). The former is what people traditionally think of as MultiMedia, encompassing the presentation of text, voice, sound and video/graphics, possibly with touch and virtual reality linked in. However, the computer has little or no understanding of the meaning of what it is presenting. IntelliMedia, which involves the computer processing and understanding perceptual input from speech, text and visual images and reacting to it, is much more complex. It involves technologies from the Engineering side in terms of spoken language processing, natural language processing, image processing, Computer Science and Artificial Intelligence, and from the Humanities side in terms of Linguistics, Cognitive Science, Psychology and studies of the mind (see Mc Kevitt 1994, 1995/1996, 1997). This is the newest area of MultiMedia research, which has seen an upsurge over the last two years, and one where most universities internationally do not have all the necessary expertise locally. Traditional and Intelligent MultiMedia education and research are found in the Science/Engineering and Humanities/Humanistic Computing Departments at Aalborg University, Denmark.
2 IntelliMedia 2000+
The Institute for Electronic Systems at Aalborg University, Denmark has
expertise in the area of IntelliMedia and has already established an initiative
called IntelliMedia 2000+ funded by the Faculty of Science and Technology
(FaST). IntelliMedia 2000+ coordinates research, through the production of a
number of real-time demonstrators exhibiting IntelliMedia
applications, and education, in the form of a new Master's degree in
IntelliMedia. An important emphasis is the integration of research
and education in IntelliMedia. IntelliMedia 2000+ is coordinated
from the Center for PersonKommunikation (CPK), which has a wealth
of experience and expertise in spoken language processing, one of the central
components of IntelliMedia, and also in radio communications, which would
be useful for mobile applications (CPK Annual Report, 1998).
More details on IntelliMedia 2000+ can be found on WWW: http://www.kom.auc.dk/CPK/MMUI/.
IntelliMedia 2000+ involves four research groups from three Departments
within the Institute for Electronic Systems: Computer Science (CS),
Medical Informatics (MI), Laboratory of Image Analysis (LIA) and Center
for PersonKommunikation (CPK), focusing on platforms for integration and
learning, expert systems and decision taking, image/vision processing,
and spoken language processing/sound localisation respectively. The first
two groups provide a strong basis for methods of integrating
semantics and conducting learning and decision taking while the latter
groups focus on the two main input/output components of IntelliMedia, vision
and speech/sound.
3 Education
Teaching is a large part of IntelliMedia 2000+ and two new courses have
been initiated: (1) MultiModal Human Computer Interaction, and (2)
Readings in Advanced Intelligent MultiMedia. MultiModal HCI, which includes
traditional HCI, teaches methods for developing
optimal interfaces through the layout of buttons, menus and
form filling for screens, and also covers advanced interfaces
using spoken dialogue and gesture. The course on Readings in Advanced Intelligent
MultiMedia is innovative and includes active learning, where student
groups present state-of-the-art research papers and invited guest lecturers
present their research from IntelliMedia 2000+. A new Master's Degree (M.Eng./M.Sc.)
has been established and incorporates the courses just mentioned as core
modules of a one-and-a-half-year course on IntelliMedia. Each
semester has a theme associated with it and involves both project work
and courses. Semester I focusses on Basic methods, Semester II on Advanced
methods and Semester III on a Master's Thesis in Intelligent MultiMedia. The final
semester has no courses. The Master's course is open to Danish and non-Danish
students. All courses are given in English and the thesis can be
written in English or Danish. Each student is graded according to internationally
recognised grading schemes. More details can be found on WWW: http://www.kom.auc.dk/ESN/masters.
The emphasis on group-organised and project-oriented education at Aalborg University (Kjaersdam and Enemark 1994) is an excellent framework in which IntelliMedia, an inherently interdisciplinary subject, can be taught. Most courses involve students working on projects in groups in the unique Aalborg style. Each semester the students work together in groups of three to four on self-chosen projects, and this has proven to give students better opportunities after their education. Approximately 50% of the courses have individual examinations and all courses can be examined as part of an oral examination based on the prepared project report. Groups can even design and implement a smaller part of a system which has been agreed upon between a number of groups. It is intended that there be a tight link between the education and research aspects of IntelliMedia 2000+, so that students can avail of the software demonstrators and platforms developed and can also become involved in developing them. The Master's course is now in its second year with over 20 students, half of whom are from abroad, and a number of student projects related to IntelliMedia 2000+ have already been completed (Bakman et al. 1997a, 1997b, Nielsen 1997, Tuns and Nielsen 1997). Currently five student groups are enrolled in the Master's course, conducting projects on multimodal interfaces, a pool-game trainer, a virtual steering wheel, audio-visual speech recognition, and face recognition. Occasionally, a Lifelong Learning course is given for returning students of Aalborg University who wish to continue their education. This course is a compressed version of the core IntelliMedia courses.
4 CHAMELEON
Until now, the results from the four research groups of IntelliMedia 2000+ have
largely been developed within the groups themselves. Our goal was
to establish collaboration among the groups in order to integrate
their results into IntelliMedia demonstrator systems and applications.
Some results would be integrated in the short term,
as some of the technology modules are already available, and others
in the longer term as new results become available. The demonstrator
would be a single platform called CHAMELEON, with a general architecture
of communicating agent modules processing inputs and outputs from different
modalities, each of which could be tailored to a number of application
domains. The first prototype of the CHAMELEON
software and hardware platform has now been developed. It demonstrates
that existing software modules for (1) distributed processing and learning,
(2) decision taking, (3) image processing, and (4) spoken dialogue processing
can be interfaced to a single platform and act as communicating agent modules
within it.
CHAMELEON is independent of any particular application domain and the various modules can be distributed over different machines. Most of the modules are programmed in C++ and C. CHAMELEON demonstrates that (1) it is possible for agent modules to receive inputs, particularly in the form of images and spoken dialogue, and respond with the required outputs, (2) individual agent modules can produce output in the form of semantic representations, (3) the semantic representations can be used for effective communication of information between different modules, and (4) various means of synchronising the communication between modules can be tested to produce optimal results. More details on CHAMELEON are found in Broendsted et al. (1998) and Mc Kevitt (1998) (http://www.hd.uib.no/AcoHum/abs/McKevitt-demo.htm).
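The idea of agent modules exchanging semantic representations can be illustrated with a minimal sketch. It is not the CHAMELEON implementation (whose modules are written mostly in C++ and C); it is a toy in Python under stated assumptions. The names Frame, Hub, speech_module, make_domain_module and run_demo, the frame fields, and the routing-by-intention scheme are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Frame:
    """Hypothetical semantic representation passed between modules."""
    sender: str                      # name of the module that produced it
    intention: str                   # e.g. "query" or "present"
    content: Dict[str, str] = field(default_factory=dict)


Handler = Callable[[Frame], List[Frame]]


class Hub:
    """Hypothetical hub that routes frames to modules by intention."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self.log: List[Frame] = []   # every frame seen, in delivery order

    def subscribe(self, intention: str, handler: Handler) -> None:
        self._handlers.setdefault(intention, []).append(handler)

    def post(self, frame: Frame) -> None:
        # Deliver frames first-in, first-out; a handler may emit new frames.
        queue = [frame]
        while queue:
            current = queue.pop(0)
            self.log.append(current)
            for handler in self._handlers.get(current.intention, []):
                queue.extend(handler(current))


def speech_module(utterance: str) -> Frame:
    # Stub standing in for a speech recogniser and parser (assumption):
    # the last word of the utterance is treated as the query topic.
    words = utterance.lower().rstrip("?").split()
    return Frame("speech", "query", {"topic": words[-1]})


def make_domain_module(knowledge: Dict[str, str]) -> Handler:
    # Stub decision module: looks the topic up in a small dictionary.
    def handle(frame: Frame) -> List[Frame]:
        topic = frame.content["topic"]
        answer = knowledge.get(topic, "unknown")
        return [Frame("domain", "present", {"topic": topic, "answer": answer})]
    return handle


def run_demo() -> Tuple[List[Frame], List[str]]:
    hub = Hub()
    shown: List[str] = []            # what a display module would present

    def display_module(frame: Frame) -> List[Frame]:
        shown.append(frame.content["answer"])
        return []                    # terminal module: emits nothing further

    hub.subscribe("query", make_domain_module({"restaurants": "three listed nearby"}))
    hub.subscribe("present", display_module)
    hub.post(speech_module("Show me good restaurants"))
    return hub.log, shown
```

Calling run_demo() passes one query frame from the speech stub to the domain stub, whose reply frame is then consumed by the display stub. Because modules only see frames, any one of them could in principle be replaced by a process on another machine, which is the property the text attributes to the platform.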
5 Conclusion
SuperinformationhighwayS are forcing the merging of the Humanities and
Sciences/Engineering in terms of processing, integrating, representing
and accessing information in multiple modalities including at least
text, voice, sounds and images/videos (Intelligent MultiMedia).
Information from many cultures will be input in the form of natural and
formal speech and language, with images ranging from simple diagrams
right up to videos. The Humanities will be concerned more with the
content of the information being passed, while the Sciences/Engineering
will be more concerned with processing, representation and transmission.
As Horgan (1996) points out, much of the future of science for 2000+
will lie in the integration and engineering of existing theories, models
and systems, and in their convergence. Aalborg University is well equipped in terms
of research expertise and education to contribute to IntelliMedia
2000+, which will be important for the future of international computing
and media development. An important emphasis is the integration of research
and education in IntelliMedia. We believe IntelliMedia will also throw
light on the numerous developments in Computer and Cognitive Science (CS)
(O Nuallain 1995 and O Nuallain et al. 1997). IntelliMedia 2000+ (http://www.kom.auc.dk/CPK/MMUI/)
will ensure the position of Denmark and Europe in the construction of the
future of SuperinformationhighwayS.
Acknowledgements
We take this opportunity to acknowledge support from the
Faculty of Science and Technology, Aalborg University, Denmark and from
the European Union (EU) under the ESPRIT (OPEN-LTR) Project 24 493. Paul
Mc Kevitt would also like to acknowledge the British Engineering and Physical
Sciences Research Council (EPSRC) for its generous support
under grant B/94/AF/1833 for the Integration of Natural Language,
Speech and Vision Processing (Advanced Fellow), and LIMSI-CNRS,
Orsay, France, where he was a Visiting Professor whilst completing this
abstract.
Notes:
1 Netscape is a trademark of Netscape Communications Corporation.
2 AltaVista is a trademark of Digital Equipment Corporation.
3 Paul Mc Kevitt is also a British Engineering and Physical
Sciences Research Council (EPSRC) Advanced Fellow at the Department of
Computer Science, University of Sheffield, for five years under grant B/94/AF/1833
for the Integration of Natural Language, Speech and Vision Processing.
7 References
Broendsted, T., P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund, K.G. Olesen (1998) A platform for developing Intelligent MultiMedia applications. Technical Report R-98-1004, Center for PersonKommunikation (CPK), Institute for Electronic Systems (IES), Aalborg University, Denmark, May.
Bakman, Lau, Mads Blidegn, Thomas Dorf Nielsen, and Susana Carrasco Gonzalez (1997a) NIVICO - Natural Interface for VIdeo COnferencing. Project Report (8th Semester), Department of Communication Technology, Institute 8, Aalborg University, Denmark.
Bakman, Lau, Mads Blidegn, and Martin Wittrup (1997b) Improving human computer interaction by adding speech, gaze, tracking and agents to a WIMP based environment. Project Report (9th/10th Semester), Department of Communication Technology, Institute 8, Aalborg University, Denmark.
Baekgaard, Anders (1996) Dialogue management in a Generic Dialogue System. Proceedings of the Eleventh Twente Workshop on Language Technology (TWLT), Dialogue Management in Natural Language Systems, 123-132. Twente, The Netherlands.
Dalsgaard, Paul and A. Baekgaard (1994) Spoken language dialogue systems. In Prospects and Perspectives in Speech Technology: Proceedings in Artificial Intelligence, Chr. Freksa (Ed.), 178-191, September. Muenchen, Germany: Infix.
Horgan, John (1996) The end of science: facing the limits of knowledge in the twilight of the scientific age. Reading, Mass.: Addison-Wesley (Helix Books).
Mc Kevitt, Paul (1994) Visions for language. Proceedings of the Workshop on Integration of Natural Language and Vision processing. Twelfth American National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, August, 47-57.
Mc Kevitt, Paul (Ed.) (1995/1996) Integration of Natural Language and Vision Processing (Vols. I-IV). Dordrecht, The Netherlands: Kluwer-Academic Publishers.
Mc Kevitt, Paul (1997) SuperinformationhighwayS. In "Sprog og Multimedier" (Speech and Multimedia), Tom Broendsted and Inger Lytje (Eds.), 166-183, April 1997. Aalborg, Denmark: Aalborg Universitetsforlag (Aalborg University Press).
Mc Kevitt, Paul (1998) CHAMELEON and the IntelliMedia WorkBench: integrating research from the humanities, science and engineering. In WWW and printed Proceedings of the International Conference on The Future of the Humanities in the Digital Age: problems and perspectives for humanities education and research. University of Bergen, Bergen, Norway, September (http://www.hd.uib.no/AcoHum/abs/McKevitt.htm).
Nielsen, Joergen (1997) Distributed applications communication system applied on IntelliMedia WorkBench. Project Report (8th Semester), Department of Medical Informatics and Image Analysis (MIBA), Institute 8, Aalborg University, Denmark.
O Nuallain, Sean (1995) The search for mind: a new foundation for cognitive science. Norwood, New Jersey: Ablex Publishing Corporation.
O Nuallain, Sean, Paul Mc Kevitt and Eoghan Mac Aogain (Eds.) (1997) Two sciences of mind: readings in cognitive science and consciousness. "Advances in Consciousness Research" (AiCR 9). USA: John Benjamins.
Slethei, Kolbjørn (1998) Can education bridge the gap between the two cultures? In WWW and printed Proceedings of the International Conference on The Future of the Humanities in the Digital Age: problems and perspectives for humanities education and research. University of Bergen, Bergen, Norway, September (http://www.hd.uib.no/AcoHum/abs/Slethei.htm).
Tuns, Nicolae G. and Thomas Dorf Nielsen (1998) Experimenting with phase web as AI support in the CHAMELEON system. Project Report (9th Semester), Department of Computer Science, Institute 8, Aalborg University, Denmark.