Re: [Corpora-List] Sorting upper-ASCII chars in Unix

From: Vlado Keselj (vlado@cs.dal.ca)
Date: Mon Nov 24 2003 - 21:27:41 MET

    On Mon, 24 Nov 2003, William Fletcher wrote:

    > A recent query elicited numerous responses from Unix gurus. Perhaps
    > one of them can help me with a question that has our Unix people
    > stumped.
    >
    > I have been trying to use the Unix sort function to sort files which
    > contain upper-ASCII characters (i.e. ASCII code > 127) on a machine with
    > locale, language and charset set to US English. Lower-ASCII characters
    > and some upper-ASCII characters sort fine, but some upper-ASCII
    > characters (specifically some non-alphanumeric ones) are left in
    > semi-random order.
    >
    > How should the relevant environment variables be set to permit sorting
    > files in straight ASCII order?

    This can cause a lot of frustration, indeed.

    The following variables may affect sorting:
    LANG, LANGUAGE, NLSPATH, LOCPATH, LC_ALL, LC_MESSAGES

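    I believe that setting LC_ALL=POSIX solves the problem: LC_ALL
    overrides LANG and every individual LC_* category, including
    LC_COLLATE, which is the category that controls sort order. In the
    POSIX locale, sort compares lines byte by byte.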
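
    As a minimal sketch of how this could be checked (the helper names
    sample and bytesort are made up for illustration, and the sample
    bytes are arbitrary values above 127, not taken from the files in
    question), one might set the variable for the sort command only:

```sh
# Hypothetical sample input: three ASCII letters plus two bytes above 127.
# Octal escapes keep this script itself pure ASCII.
sample() {
    printf 'b\n\351\na\n\243\nB\n'
}

# Run sort with LC_ALL=POSIX set for this one command only, so lines are
# compared by raw byte value whatever LANG or LC_COLLATE say in the session.
bytesort() {
    LC_ALL=POSIX sort "$@"
}

# Dump the sorted result with od so the high bytes are visible on any terminal.
sample | bytesort | od -An -c
```

    Setting the variable on the command line this way leaves the rest of
    the session's locale untouched; "export LC_ALL=POSIX" would apply it
    to every later command as well.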

    Vlado

    >
    > Thanks in advance,
    > Bill Fletcher
    >



    This archive was generated by hypermail 2b29 : Mon Nov 24 2003 - 21:28:21 MET