Minutes of the 3rd working group meeting


Bergen, Sep. 28, 1998.

Present at the meeting: Bill Black, Tony McEnery, Paul Mc Kevitt, Koenraad de Smedt (chair), Andy Way.

Publication

The plan for a book on advanced computing in the humanities, and in particular a chapter in it on computational linguistics, is discussed. The publication should include both an analysis part and a part with proposals and recommendations. One option is to publish these parts as separate volumes. However, since the time scale is limited, a single integrated volume seems more feasible. Each chapter should therefore cover both analysis and proposals/recommendations. The analysis part in particular, which is intended to describe the situation today, is a moving target. It should be regarded as a snapshot and should also be made available on the web in an easily updatable form.

An editor for the chapter on computational linguistics needs to be appointed. Fact finding is needed. Some facts from the Euromap study could be incorporated. A questionnaire should be directed to higher education institutions teaching computational linguistics. It seems most practical to do this by web questionnaire with as many predefined answers as possible. Care should be taken to approach only one person per site, and it is important to approach the responsible persons directly. The questionnaire could ask for the following:

  1. subjects offered (shown in checklist; wide area including corpus linguistics, empirical approaches, speech processing, psycholinguistic modelling)
  2. teaching methods (including web)
  3. proportion theory/practice
  4. where do graduates go to?
  5. programming languages and platforms used
  6. which grammars, formalisms, statistical approaches
  7. textbooks
  8. resources used in teaching (e.g. BNC)
  9. multilinguality (languages taught; languages used in teaching; languages processed)
  10. student mobility
  11. traineeships and sandwiching
  12. does accreditation of foreign languages function satisfactorily?
  13. what are prerequisites for the program?
  14. position of respondent

EACL '99

The next EACL conference will be in Bergen on June 8-12, 1999, with workshops on June 12. It is proposed to try to arrange a workshop on educational matters, possibly in cooperation with the TNP on Speech Communication Sciences. At the same time, the annual meeting of the EACL board could discuss (a) professional profiles and (b) its standpoint on the creation of a European masters degree.

Web teaching test cases

The ELSNET-funded projects are under way. Some participants at this meeting are involved in a project on developing courseware for linguistic representation on the web and a demonstrator for parsing. ACO*HUM is playing a role in coordination and dissemination of this project. Bill Black informs the meeting of progress in this project.

Web teaching of natural language processing can benefit from special presentation software on the web. Many special linguistic representations have to be displayed in courses, including trees, feature structures, ATN networks, etc. Sending these over the net in the form of predefined graphics files (GIF) not only wastes bandwidth, but also makes them difficult to generate on the fly, placing a burden on the server side. The solution currently being tried is to develop a platform-independent Java client which receives a structure specified in a symbolic notation and displays it in the browser. On the server side, existing Lisp or Prolog parsers can be used to generate symbolic structures and send them to the client in an appropriate way.
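The idea of sending a symbolic structure rather than a graphics file can be illustrated with a minimal sketch. It is written in Python for brevity, not in the Java used for the actual client, and the bracketed notation, the function names and the indented text rendering are all assumptions made for illustration; the minutes do not specify the notation used in the project.

```python
def tokenize(text):
    """Split a bracketed string such as "(S (NP John))" into tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(tokens):
    """Consume tokens and return a leaf (str) or a (label, children) pair.

    Assumes well-formed input; a real client would validate it.
    """
    token = tokens.pop(0)
    if token != "(":
        return token  # a leaf word
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        children.append(parse_tree(tokens))
    tokens.pop(0)  # drop the closing ")"
    return (label, children)


def render(node, depth=0):
    """Return the tree as a list of indented text lines, one per node."""
    indent = "  " * depth
    if isinstance(node, str):
        return [indent + node]
    label, children = node
    lines = [indent + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


# The string below stands in for what a server might send (hypothetical).
tree = parse_tree(tokenize("(S (NP John) (VP (V sleeps)))"))
print("\n".join(render(tree)))
```

The structure travels as a short string and all layout work is done on the receiving side, which is the point of the approach; a graphical client would replace render with drawing code.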

For teaching parsing, there are existing alternative tools, which do not operate over the web. The LFG workbench is a heavy-duty, theory-specific and very sophisticated tool for parsing. However, it is platform-specific, operating in Unix environments and incompatible with the X window system, and is therefore rapidly becoming obsolete. Linguistic Instruments provides easily accessible tools compatible with four formalisms. They are also platform-specific since they operate on the Mac only. With respect to textbooks, the books by Gazdar & Mellish are good, treating a wide range of automata and parsers, using CFG, PATR and DCG formalisms and showing both Lisp and Prolog techniques. These books are however sold out and miss some newer approaches. Most importantly, being printed books, they lack interactive visuals.

At UMIST, Bill Black and Simon Hill have a project in which they have developed a first version of a parser usable via a web interface (see current demo). The server is based on expect and a TCL script which drives a running Lisp parser and converts the parse output to a format which is then sent to the client. The client is an applet to be run in a web browser. It receives the display request from the server and displays it as a parse tree in a window in the browser. The current version of the applet is written in Java but could also be programmed in TCL/Tk, in which case the TCL/Tk plugin is needed.
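The server arrangement described above, in which a script drives a parser process that stays running, can be sketched as follows. This is an illustration in Python only, not the expect/TCL script of the project: the child process is a stand-in for the Lisp parser, and its one-line-in, one-line-out protocol, the label W and the class name ParserDriver are assumptions.

```python
import subprocess
import sys

# Stand-in for the running Lisp parser (an assumption for illustration):
# it reads one sentence per line and writes one bracketed structure per line.
CHILD = r'''
import sys
for line in sys.stdin:
    words = line.split()
    print("(S " + " ".join("(W " + w + ")" for w in words) + ")", flush=True)
'''


class ParserDriver:
    """Keeps one parser process alive and talks to it over pipes."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", CHILD],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def parse(self, sentence):
        """Send one sentence and return the single line the parser answers."""
        self.proc.stdin.write(sentence + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self):
        """Close the pipes so the parser process ends, then wait for it."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


driver = ParserDriver()
first = driver.parse("John sleeps")
second = driver.parse("Mary reads")
driver.close()
print(first)
print(second)
```

Both requests go to the same process, so the parser is started only once, but nothing in this arrangement tells one client's requests from another's.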

The current version has certain limitations. The Lisp parser has to be running all the time. Being a single process, it does not distinguish between clients, save the client's data, or allow clients to use their own rules. Alternatives might be (1) invoking a parser by a CGI script, starting a Lisp or Prolog saved state each time, (2) writing the server completely in Lisp, using the CL-http server, or (3) doing it all in Java on the client side.
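What distinguishing between clients and letting them use their own rules would require on the server side can be sketched as a small per-client store. This is a Python illustration under stated assumptions: the client identifiers, the rule format and the class SessionStore are hypothetical and not part of the project.

```python
import copy

# Hypothetical starter grammar shared by all new clients:
# each left-hand side maps to a list of right-hand sides.
DEFAULT_RULES = {"S": [["NP", "VP"]]}


class SessionStore:
    """Keeps a separate, editable copy of the grammar for every client."""

    def __init__(self, default_rules):
        self._default = default_rules
        self._sessions = {}

    def rules_for(self, client_id):
        """Return this client's rules, creating them from the defaults on first use."""
        if client_id not in self._sessions:
            self._sessions[client_id] = copy.deepcopy(self._default)
        return self._sessions[client_id]

    def add_rule(self, client_id, lhs, rhs):
        """Add a rule that is visible to this client only."""
        self.rules_for(client_id).setdefault(lhs, []).append(list(rhs))


store = SessionStore(DEFAULT_RULES)
store.add_rule("client-a", "NP", ["Det", "N"])
store.add_rule("client-b", "NP", ["N"])
print(store.rules_for("client-a"))
print(store.rules_for("client-b"))
```

Each client starts from a deep copy of the defaults, so one student's experiments do not alter the grammar seen by others. Any of the three alternatives listed above would still need some way of identifying the client.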

Ergonomic issues include the desirability of a user-adjustable look and the desirability of viewing multiple parses, either in different windows or superimposed in different colours or otherwise. The possibility to display other kinds of structures needs to be explored, as well.

Teaching and learning issues need to be considered. Courses using a web parser may do so for a variety of reasons. The focus may be on descriptive linguistics, grammar development, parsing strategies, etc. As a pedagogical tool, the current approach may have certain limitations. It may not be suitable for scaling up and it is questionable whether large-scale resources should be used. Students may be too restricted in the ways they can experiment with parsing.

The ELSNET-sponsored project will continue. The meeting concludes that an effort must be made to draw out and disseminate the lessons learned.


Minutes written by Koenraad de Smedt, October 26, 1998.