An Electronic Edition of John Wilkins' Alphabetical and Conceptual Dictionary

Final Report on a Marie Curie host fellowship research stay at BATMULT, April - July 2003

Natascia Leonardi


Project description:

The project I developed at the HIT-Centre is a digital version of John Wilkins' Essay Towards a Real Character and a Philosophical Language (London, 1668). This version was planned as a partial reproduction of the text, designed to test whether the model could be applied to the complete work. The result is a pilot version for the complete edition of the Essay, which I intend to produce after my PhD. The project is an integral component of my PhD thesis, since it supports my theoretical analysis of Wilkins' work, but it is also a standalone product.

My thesis (An Essay Towards a Real Character and a Philosophical Language and An Alphabetical Dictionary. Implications of a Conceptual and Alphabetic Arrangement in Defining Procedures [provisional title]) analyses the different types of defining procedures the author adopts in two independent but interrelated sections of the Essay: the classificatory Tables of the Universal Philosophy and the Alphabetical Dictionary. The latter was written by Wilkins in collaboration with William Lloyd; it is bound together with the Essay but has its own title page.

Describing the structure of these two sections is necessary to illustrate the author's defining procedures and, consequently, to explain how those procedures have been implemented in the electronic version.

The Tables form a hierarchical structure based on three main classificatory levels (which Wilkins calls `Genus', `Difference', and `Species'), complemented by further intermediate defining nodes. Each node normally contains two elements, although some nodes contain only one element and others up to four. When a node contains more than one element, the first is the basic (unmarked) one, while the others are characterised as either `Affinis' or `Opposed' to it. Opposition in the Tables can be simple or, when a node contains three or four items, interpreted as gradual. As a consequence, each item in a node is always singled out, and its position (and definition) in the Tables cannot be confused with that of the other items. Wilkins calls the elements of the classificatory scheme `Radicals'; they can be interpreted as conceptual-semantic units.
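To make the node structure concrete, the following Python sketch models a Table node as a small data structure. The class names (`Level`, `Relation`, `Radical`, `Node`) and the `validate` and `opposition_is_gradual` methods are hypothetical illustrations, not part of the encoding used in the project; they only check the constraints described above: at most four elements per node, the first unmarked, the others `Affinis' or `Opposed'.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Level(Enum):
    """The three main classificatory levels, plus intermediate defining nodes."""
    GENUS = "Genus"
    DIFFERENCE = "Difference"
    SPECIES = "Species"
    INTERMEDIATE = "intermediate"


class Relation(Enum):
    """Relation of a Radical to the first (unmarked) Radical of its node."""
    UNMARKED = "unmarked"
    AFFINIS = "Affinis"
    OPPOSED = "Opposed"


@dataclass
class Radical:
    name: str
    relation: Relation = Relation.UNMARKED
    related_words: List[str] = field(default_factory=list)


@dataclass
class Node:
    level: Level
    feature: str                                   # characterising feature of the node
    radicals: List[Radical] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the node breaks the constraints described in the text."""
        if not self.radicals:
            return  # purely structural node holding no Radicals
        if len(self.radicals) > 4:
            raise ValueError(f"{self.feature}: at most four elements per node")
        if self.radicals[0].relation is not Relation.UNMARKED:
            raise ValueError(f"{self.feature}: the first element must be unmarked")
        if any(r.relation is Relation.UNMARKED for r in self.radicals[1:]):
            raise ValueError(f"{self.feature}: only the first element is unmarked")

    def opposition_is_gradual(self) -> bool:
        """Opposition is read as gradual when three or four items share a node."""
        return len(self.radicals) >= 3 and any(
            r.relation is Relation.OPPOSED for r in self.radicals
        )


# Hypothetical placeholder Radicals "A" and "B"; not taken from the Essay.
pair = Node(Level.SPECIES, "species feature",
            [Radical("A"), Radical("B", Relation.OPPOSED)])
pair.validate()
print(pair.opposition_is_gradual())  # False
```

Keeping the relation on each `Radical` rather than on the node mirrors the text's point that every item in a node is individually placed relative to the unmarked one.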

The hierarchical structure devised by Wilkins serves to define each of the elements contained in the nodes. Definitions are provided by the characterising features attached to each relational node; in addition, on the three main classificatory levels a prose definition (and occasionally a longer comment) is given. The reader can thus obtain the definition of a conceptual-semantic item by “composing” the partial information given on each defining node, proceeding down the hierarchy from top to bottom.
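The top-to-bottom composition of a definition can be sketched as a path search followed by concatenation. In this minimal Python sketch the Table is an invented three-level tree with placeholder features; `path_to` and `compose_definition` are hypothetical names, and joining the features with semicolons stands in for the prose that Wilkins actually supplies on each node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    feature: str                                   # partial information carried by this node
    radicals: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, radical: str, trail: Tuple[Node, ...] = ()) -> Optional[List[Node]]:
    """Depth-first search returning the nodes from the root down to the one holding `radical`."""
    trail = trail + (node,)
    if radical in node.radicals:
        return list(trail)
    for child in node.children:
        found = path_to(child, radical, trail)
        if found is not None:
            return found
    return None


def compose_definition(root: Node, radical: str) -> str:
    """Concatenate the features met on the way down, top to bottom."""
    path = path_to(root, radical)
    if path is None:
        raise KeyError(radical)
    return f"{radical}: " + "; ".join(n.feature for n in path)


# Invented Genus > Difference > Species chain with placeholder Radicals.
table = Node("genus feature", children=[
    Node("difference feature", children=[
        Node("species feature", radicals=["A", "B"]),
    ]),
])
print(compose_definition(table, "B"))
# B: genus feature; difference feature; species feature
```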

A list of English words is attached to most of the `Radicals', but the Tables give no further semantic information about these words; such information is available only in the Alphabetical Dictionary. The Alphabetical Dictionary lists alphabetically all the words contained in the classificatory Tables of the Universal Philosophy (i.e. the `Radicals' and their associated words), and each entry contains a fairly detailed semantic analysis of the lemma. Wilkins pays particular attention to polysemy and synonymy in the English language. The Alphabetical Dictionary is thus partly autonomous, since it provides lexicographic definitions of the words it contains, but it also depends on the Tables: some of its definitions consist simply of a direct reference to the node in the Tables where a particular semantic content is placed.

The mutual dependence of the two sections yields more information on the conceptual and lexical units classified in the Tables and in the Dictionary than either section provides on its own. At the same time, however, this structure can hamper the accessibility of the text: the reader needs a good knowledge of the Essay to exploit it fully.

A digital version of the text can, first of all, fully represent the complementary defining procedures Wilkins implemented in the Essay. Compared with the printed format, it can also strengthen the connection between the two sections under analysis and make the original work more accessible. An electronic Essay allows more “user-friendly” searching of the text, and would also make it possible to integrate this 17th-century monolingual dictionary into existing corpora of comparable texts.

The work on the Essay I carried out in Bergen builds on a first digital version I developed at the University of Macerata with the technical support of Dott. Marco Marziali. This first version encodes two of the forty Tables of the Essay: the Table of `Discourse' and that of `Natural Power'. Dott. Marziali designed a tailor-made markup language for identifying the three main classificatory levels, isolating the `Radicals', and identifying the words associated with them. He also wrote a program (in Visual C++) that parses the tagged text so as to make its original structure explicit. The program outputs an HTML page that reproduces the hierarchical format of the source. Moreover, the lexical definitions of the `Radicals' can be retrieved automatically in their original sequence in the Tables, starting from the top of the hierarchy.

These definitions can be viewed in HTML format, where graphic devices visually distinguish the three main defining nodes (Genus, Difference, and Species): `Radicals' are shown in bold, and parenthesised phrases contain the relational definitions Wilkins added to the bare hierarchy. In the first version of the project, the relations of Affinity and Opposition that characterise the items in each node were labelled but not processed by the software. This version also required a formal refinement of the definitions, so that they formed coherent English sentences that still matched the original wording of the text. When I began working at the HIT-Centre, the procedures for encoding the Dictionary section of the Essay were still at an initial stage.

Activities during the Fellowship:

XML, a standard language, has been used to encode the two Tables of the Essay considered (`Discourse' and `Natural Power'). In my original project I had considered adopting the TEI (Text Encoding Initiative) Guidelines for the XML encoding of the classificatory Tables and of the Alphabetical Dictionary. I chose instead to develop a custom DTD, because the text I am working with requires a large number of special tags. I consider this a sound choice: XML combines the advantages of a standard language with the flexibility needed for applications based on a structurally complex source such as Wilkins' Essay.

The resulting digital edition implements the planned connection between the Alphabetical Dictionary and the Tables. The Dictionary provides the “lexicographic” type of definition typical of traditional alphabetical dictionaries; when it refers to the conceptual definition in the Tables, a link retrieves the necessary information from that section of the text. The edition is based on an XML encoding of two of the forty classificatory Tables of the Essay (`Discourse' and `Natural Power') and of the Dictionary entries corresponding to the `Radicals' in these Tables; the entries for the related words listed after the `Radicals' are being completed. Since the main aim of the final version is an integrated edition of Wilkins' lexical-conceptual definitions, the encoding design focuses on the structure of the hierarchical Tables and of the Dictionary rather than on the style of the text. Formal (stylistic) components are encoded only when they also play a functional role in the defining procedure. The encoded parts of the text can be viewed with a web browser, and Vemund Olstad (Special Consultant, HIT-Centre) has written an XSL stylesheet that displays the encoded Tables in HTML format.

The first phase of my work consisted in identifying the structural features of the classificatory Tables and of the Alphabetical Dictionary that are relevant to an XML-encoded digital edition of Wilkins' work. The work done before my Fellowship at the HIT-Centre was useful mainly for the theoretical side of the project, since the standard encoding language adopted required different procedural decisions. Because the Tables ultimately consist of relational nodes, the relations that define each node (and its content) were reproduced in the markup. Tailor-made tags have been defined for both the “vertical” relations of embedding between hypernymic and hyponymic nodes and the “horizontal” relations between sibling nodes. All levels of the hierarchy have been tagged, and the three main classificatory levels have been singled out. This last part of the encoding is particularly important, since the parsing process that reconstructs the definitions depends on it. The `Radicals' on each node have been encoded, together with their relation to the unmarked `Radical' where applicable. The words related to each `Radical' have been separated from the lists in which they usually appear, so that each word in the Tables can be linked to its entry in the Dictionary.
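As an illustration of this kind of encoding, the sketch below uses a hypothetical XML fragment: the element names (`node`, `feature`, `radical`, `name`, `word`) and the `level`, `id`, and `relation` attributes are assumptions, not the project's DTD. Nesting expresses the vertical relations and siblings the horizontal ones; because each related word is tagged separately, Python's standard `xml.etree.ElementTree` can index every word by the node and `Radical' it belongs to, which is what a word-to-entry link needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding; tag and attribute names are illustrative only.
SAMPLE = """
<table name="Discourse">
  <node level="genus" id="G">
    <feature>genus feature</feature>
    <node level="difference" id="G.I">
      <feature>difference feature</feature>
      <node level="intermediate" id="G.I.a">
        <feature>intermediate feature</feature>
        <node level="species" id="G.I.1">
          <feature>species feature</feature>
          <radical relation="unmarked"><name>A</name><word>a1</word><word>a2</word></radical>
          <radical relation="opposed"><name>B</name><word>b1</word></radical>
        </node>
      </node>
    </node>
  </node>
</table>
"""

MAIN_LEVELS = {"genus", "difference", "species"}


def word_index(root: ET.Element) -> dict:
    """Map each related word to (node id, Radical name, relation) for linking."""
    index = {}
    for node in root.iter("node"):                 # every node, at any depth
        for radical in node.findall("radical"):    # only this node's own Radicals
            name = radical.findtext("name")
            for word in radical.findall("word"):
                index[word.text] = (node.get("id"), name, radical.get("relation"))
    return index


def main_level_ids(root: ET.Element) -> list:
    """Ids of the nodes on the three main classificatory levels, in document order."""
    return [n.get("id") for n in root.iter("node") if n.get("level") in MAIN_LEVELS]


root = ET.fromstring(SAMPLE.strip())
print(word_index(root)["b1"])   # ('G.I.1', 'B', 'opposed')
print(main_level_ids(root))     # ['G', 'G.I', 'G.I.1']
```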

The Tables encoded for this project were chosen both for the semantic fields they cover (language and human faculties, respectively) and because they represent the two structural types on which the classificatory Tables are built. The hierarchy of the Table of `Natural Power' can be represented as a directed acyclic graph, while the Table of `Discourse' has the structure of a regular tree. The encoding was tested on the Table of `Natural Power' because its structure is more complex. Encoding this Table required a special device to indicate which path through the graph the XSL transformation should reconstruct. Two distinct paths through the graph have been identified; they are encoded at the different levels of the hierarchy wherever the architecture of the source text has this structure.
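One way to picture the path device is a graph in which a node may have a different parent on each path. The Python sketch below is hypothetical: the node names, the path labels `p1` and `p2`, and the `*` convention for a parent shared by both paths are assumptions, not the project's markup. It walks upwards along one chosen path and returns the chain from the top of the hierarchy, which is the information a transformation needs to reconstruct one definition rather than a mixture of two.

```python
from typing import Dict, List

# Hypothetical graph: for each node, its parent on each path.
# "*" marks a parent shared by both paths (the ordinary, tree-like case).
PARENTS: Dict[str, Dict[str, str]] = {
    "D1": {"*": "G"},
    "D2": {"*": "G"},
    "S": {"p1": "D1", "p2": "D2"},   # reachable from two different Differences
}


def ancestors(node: str, path: str,
              parents: Dict[str, Dict[str, str]] = PARENTS) -> List[str]:
    """Return the chain from the top node down to `node`, following `path`."""
    chain = [node]
    while node in parents:                         # the top node has no entry
        options = parents[node]
        nxt = options.get(path, options.get("*"))
        if nxt is None:
            raise KeyError(f"{node!r} has no parent on path {path!r}")
        node = nxt
        chain.append(node)
    chain.reverse()
    return chain


print(ancestors("S", "p1"))  # ['G', 'D1', 'S']
print(ancestors("S", "p2"))  # ['G', 'D2', 'S']
```

In a regular tree such as the one described for `Discourse', every entry would use `*`, and the path argument would make no difference.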

A stylesheet (written by Vemund Olstad) transforms the encoded text of the Table into an HTML representation. The transformation retrieves the components of the hierarchy that contribute to the definition of each item in a relational node (`Radical'). In this way, the specific definitions that the classificatory Tables give for the classified concepts are reconstructed.
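The stylesheet itself is XSL; as a language-neutral illustration of what such a transformation does, the following Python sketch walks from a `Radical' up through its ancestor nodes, a step that XSLT would typically express with the XPath `ancestor` axis. The XML fragment, the element names, and the HTML output format are assumptions for illustration. Since ElementTree elements carry no parent pointers, the sketch first builds a child-to-parent map.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical encoded Table; element names are illustrative only.
SAMPLE = """<table>
  <node level="genus"><feature>genus feature</feature>
    <node level="difference"><feature>difference feature</feature>
      <node level="species"><feature>species feature</feature>
        <radical>A</radical>
        <radical relation="opposed">B</radical>
      </node>
    </node>
  </node>
</table>"""


def reconstruct(root: ET.Element, radical_name: str) -> str:
    """Collect the features of all ancestor nodes, top first, and render them as HTML."""
    parent = {child: p for p in root.iter() for child in p}
    target = next((r for r in root.iter("radical") if r.text == radical_name), None)
    if target is None:
        raise KeyError(radical_name)
    features = []
    node = parent[target]
    while node is not root:                        # stop at the <table> element
        features.append(node.findtext("feature"))
        node = parent[node]
    features.reverse()                             # Genus first, Species last
    rel = target.get("relation")
    suffix = f" ({html.escape(rel)})" if rel else ""
    body = html.escape("; ".join(features))
    return f"<p><b>{html.escape(radical_name)}</b>{suffix}: {body}</p>"


root = ET.fromstring(SAMPLE)
print(reconstruct(root, "B"))
# <p><b>B</b> (opposed): genus feature; difference feature; species feature</p>
```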

Encoding the Alphabetical Dictionary required particular attention: although it can rightly be considered a genuine monolingual English dictionary, the structure of its entries and its defining procedures have no direct parallel either in modern reference works or in older alphabetical dictionaries with a conventional structure. As a consequence, the tags for encoding this section are almost entirely tailor-made. During my stay at the HIT-Centre I was able to draw on the expertise of Paul Meurer (Researcher, HIT-Centre) for a final check of my encoding.

Before digitising the Dictionary, I recorded all the components of its entries in a Microsoft Excel document. I have considered developing a database from Wilkins' and Lloyd's Alphabetical Dictionary as a by-product of my project; such a database could also serve future implementations of the same material. Øystein Reigem (Special Consultant, HIT-Centre) helped me define the organisation of the document and the fields relevant to the eventual creation of a database.
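As a hedged sketch of how such a spreadsheet could later become a database, the following Python code loads a CSV export into SQLite using only the standard library. The column names (`headword`, `item`, `role`), the role values, and the sample rows are invented for illustration; the fields actually defined for the Excel document are not reproduced here.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of the spreadsheet; columns and rows are invented.
CSV_EXPORT = """headword,item,role
Word,Word,entry-word
Word,Word of honour,sub-entry
Word,D.I.1,code-reference
"""


def load_components(conn: sqlite3.Connection, text: str) -> None:
    """Create a table of entry components and fill it from CSV text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS component (headword TEXT, item TEXT, role TEXT)"
    )
    rows = csv.DictReader(io.StringIO(text))       # one dict per CSV row
    conn.executemany(
        "INSERT INTO component (headword, item, role) VALUES (:headword, :item, :role)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_components(conn, CSV_EXPORT)
refs = conn.execute(
    "SELECT item FROM component WHERE headword = ? AND role = ?",
    ("Word", "code-reference"),
).fetchall()
print(refs)  # [('D.I.1',)]
```

Recording the role of each component as a separate column is what makes queries such as "all code-references" or "all sub-entries" straightforward.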

A different stylesheet (also developed by Vemund Olstad) is applied to the encoded section of the Alphabetical Dictionary. It supports searching for the words contained in the entries and displays the entries extracted from the Dictionary in HTML. The stylesheet for the Dictionary contains JavaScript code, written by Sindre Sørensen (Special Consultant, HIT-Centre), that links the Dictionary to the Tables.

Through a search interface accessible with a Windows web browser (described below), the digital edition gives access to the Alphabetical Dictionary and, through it, to the definitions in the two encoded Tables. The indentation that characterises the format of the Dictionary entries is reproduced visually in the HTML output, because it is not merely a stylistic feature of the text: it reflects the organisation of the different senses identified for a word-form and of their respective sub-senses. Sub-entries, moreover, can be indented either directly under the entry-word or under one of the senses given for the word.
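The sketch below shows, in Python, one way the nesting of senses, sub-senses, and sub-entries could be turned into indented HTML, with each level of depth adding a fixed left margin. The `Item` class, the `render` function, the 2em step, and the sample entry are hypothetical and do not reproduce the project's stylesheet.

```python
import html
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    text: str
    children: List["Item"] = field(default_factory=list)   # sub-senses or sub-entries


def render(items: List[Item], depth: int = 0, step_em: int = 2) -> List[str]:
    """Emit one <div> per item, indented in proportion to its depth in the entry."""
    lines = []
    for item in items:
        lines.append(
            f'<div style="margin-left:{depth * step_em}em">{html.escape(item.text)}</div>'
        )
        lines.extend(render(item.children, depth + 1, step_em))
    return lines


# Invented entry: one sense with a sub-sense, plus a sub-entry under the entry-word.
entry = Item("HEADWORD", [
    Item("first sense", [Item("sub-sense of the first sense")]),
    Item("sub-entry placed directly under the entry-word"),
])
print("\n".join(render([entry])))
```

Because depth is computed from the nesting, a sub-entry subsumed under a sense is automatically indented one step further than one placed directly under the entry-word.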

The display of the entries includes the code-references to the classificatory Tables, where the text gives them. Code-references have a double function. They can be the only information given for a lexical unit, in which case they replace the definition itself; or they can be part of a definition, standing in for a word whose meaning is defined in the nodes of the Tables that they indicate. In the electronic version of the Dictionary the code-references are hyperlinks, which the user can click to retrieve the definition of the word indicated by the reference (described below).
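A minimal Python sketch of turning code-references into hyperlinks is given below. The reference pattern (a genus abbreviation, a Roman-numeral Difference, and an Arabic-numeral Species, e.g. `D.I.1`) and the target URL format (`tables.html?node=` followed by a node identifier) are assumptions for illustration only, not the exact notation of the Dictionary or the project's JavaScript. The entry text is HTML-escaped before links are inserted, which works because the escaping leaves letters, digits, and dots untouched.

```python
import html
import re

# Hypothetical reference notation: GENUS.DIFFERENCE.SPECIES, optionally spaced.
CODE_REF = re.compile(r"\b([A-Z]{1,2})\.\s?([IVXL]+)\.\s?(\d+)\b")


def link_code_refs(entry_text: str, base: str = "tables.html") -> str:
    """Escape the entry text and wrap every code-reference in a link to its node."""
    def to_link(m: re.Match) -> str:
        node_id = f"{m.group(1)}.{m.group(2)}.{m.group(3)}"   # normalised id
        return f'<a href="{base}?node={node_id}">{m.group(0)}</a>'
    return CODE_REF.sub(to_link, html.escape(entry_text))


# Reference standing alone (replacing the definition) and inside a definition.
print(link_code_refs("D.I.1"))
# <a href="tables.html?node=D.I.1">D.I.1</a>
print(link_code_refs("the act of NP.II.3 & more"))
# the act of <a href="tables.html?node=NP.II.3">NP.II.3</a> &amp; more
```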

With this implementation, the user is offered two kinds of defining material: through the search interface it is possible to access both the definition pertinent to the search word and the wider context in which that meaning is placed in the Tables. The reader can therefore consult both the lexical definition provided by the Alphabetical Dictionary and the conceptual definition, based on a relational structure, that is displayed in the classificatory Tables.

This implementation has allowed me to accomplish the aim of my project: a digital edition of the defining sections of the Essay that reproduces Wilkins' twofold defining method. The edition is intended above all to be faithful to the basic structure of the source text. While providing a user-friendly interface to the original Essay, the digital version also brings out the potential of the text.

During my stay at the HIT-Centre of the University of Bergen (2 April to 28 July 2003), the goals of the original project that I submitted to the HIT-Centre (BATMULT project) when applying for the Marie Curie Pre-Doctoral Fellowship were achieved.

Results:

  1. XML-encoded versions of the General Scheme that precedes the forty classificatory Tables in the Essay and of the Tables of `Discourse' and `Natural Power'. The principal tags in the encoded version are those that identify the three main classificatory levels (`Genus', `Difference', and `Species') and the particular path of the classificatory graph on which the `Radicals' are placed in the original text. Tags also mark the `Radicals' and the individual related words listed after each of them.
     1.1 An XSL stylesheet (by V. Olstad) that displays the encoded Tables in HTML format.
     1.2 An XSL stylesheet (by V. Olstad) that transforms the relevant tagged sections of the encoded Tables in order to reconstruct the definition of each `Radical'. This stylesheet uses the tags listed in point 1 to carry out the reconstruction, and takes several parameters that determine which node is brought into focus in the HTML rendering of the Tables retrieved from a Dictionary search.
  2. A Microsoft Excel document containing the Alphabetical Dictionary entries for the words defined in the two Tables, specifying the role of each recorded item within the Dictionary.
  3. An XML-encoded version of the Alphabetical Dictionary, in which tags specify the nature of the different components of the entries. These tags also serve to reproduce the format the entries have in the original source.
     3.1 An XSL stylesheet (by V. Olstad) for the Dictionary, which enables searching for the words contained in the entries (both the lemmas and the words in the definitions) and produces the HTML display of the entries extracted from the Dictionary. It contains:
         3.1.1 A JavaScript script (by S. Sørensen) that connects the Dictionary to the Tables, using the code-references that appear in the Dictionary entries to link them to the corresponding words in the Tables.
     3.2 A search interface (by P. Meurer), built with XSL and JavaScript and accessible with a Windows web browser (Internet Explorer 6.0 or higher), that allows the user to search for the words encoded in the Dictionary. The search is case-sensitive, and its results show every occurrence of the search word or phrase, whether it is encoded as a headword or appears in the body of an entry. The search therefore returns all the entries in which the search word appears, whether as entry-word, as sub-entry, or as part of a definition.
     3.3 The HTML output of the Dictionary, generated by the stylesheet (cf. point 3.1), which displays the code-references to the Tables given in the source for most entries. These are hyperlinks that retrieve the definition of the word indicated by the reference: the JavaScript link opens a window containing the reconstructed definition of the meaning searched for. A further link next to the code-reference is intended to let the user directly access the partial Table in which the word is defined in the classificatory section of the Essay.
  4. During my stay at the HIT-Centre I also developed the theoretical description of this pilot electronic version of the Essay, which forms part of my PhD thesis.
  5. I gave a presentation of my work in a seminar series organised by Prof. Helge Dyvik at the Department of Linguistics of the University of Bergen (23 May).
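The search behaviour described above for the search interface (case-sensitive, matching a word or phrase as entry-word, as sub-entry, or within a definition) can be sketched in Python as follows. The `Entry` class, the `search` function, and the sample entries are invented; restricting matches to whole words with `\b` is an added assumption (the interface is described as returning any occurrence), and the actual interface works on the XML-encoded Dictionary through XSL and JavaScript rather than on Python objects.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entry:
    headword: str
    sub_entries: List[str] = field(default_factory=list)
    definitions: List[str] = field(default_factory=list)


def search(entries: List[Entry], query: str) -> List[Tuple[str, str]]:
    """Case-sensitive whole-word search; reports where in each entry the query occurs."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b")   # no IGNORECASE flag
    hits = []
    for e in entries:
        if pattern.search(e.headword):
            hits.append((e.headword, "entry-word"))
        if any(pattern.search(s) for s in e.sub_entries):
            hits.append((e.headword, "sub-entry"))
        if any(pattern.search(d) for d in e.definitions):
            hits.append((e.headword, "definition"))
    return hits


# Invented toy entries, not taken from Wilkins' Dictionary.
entries = [
    Entry("Speech", ["Speechless"], ["the act of uttering words"]),
    Entry("Word", ["Word of honour"], ["a part of Speech"]),
]
print(search(entries, "Speech"))  # [('Speech', 'entry-word'), ('Word', 'definition')]
print(search(entries, "speech"))  # []
```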