> Particularly, we wonder about the 'c="0000037 002"' component. For
> instance, are we correct in assuming that 002 refers to the first sentence
> in a turn? If so, how are the following sentences within the same turn
> numbered? And what about the '0000037' part?
This looks like the reference number that CLAWS inserts at the start of each
line in its vertical output format, where every word gets its own reference
number. The first part, '0000037', is the line number in the untagged input
file; the second part increments by 1 for punctuation and by 10 for other
items.
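
As a rough illustration, here is a minimal Python sketch of how such a
reference could be split into its two parts and how the second part would
advance under the rule above. The function names are hypothetical, and the
pattern assumes the attribute always looks like the c="0000037 002" example
in your message; it is not taken from CLAWS itself.

```python
import re

# Assumed layout, based only on the quoted example: c="<line> <position>"
REF_PATTERN = re.compile(r'c="(\d+)\s+(\d+)"')


def parse_ref(text):
    """Return (line_number, position) from the first c="..." attribute, or None."""
    match = REF_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def next_position(position, is_punctuation):
    """Advance the second part: +1 for punctuation, +10 for any other item."""
    return position + (1 if is_punctuation else 10)
```

Comparing consecutive reference numbers this way should show whether a step
of 1 or 10 separates two items on the same input line.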
> We also wonder about the numbers in the overlap tags (<ptr t= >). As we
> understand it, the example above is an illustration of correct enumeration
> (the reason we are asking this, is that we have seen instances where the
> numbering is different).
I assume these follow the same rules as in the encoding scheme for the British
National Corpus.
> We have tried to get this information both from Lancaster and the internet,
Who did you contact at Lancaster?
Regards,
Paul.