Extracting a List of Sense Disambiguated Synonyms from an Online Dictionary




The feasibility of automating the extraction of a list of sense-disambiguated synonyms from a machine-readable dictionary (Funk and Wagnall's) is investigated. In a preprocessing phase, the text of each entry is parsed to bracket the fields (part of speech, etymology, etc.) and to label each definition as descriptive, illustrative, or synonym. This is a complex process because dictionaries in general have a rather loose structure and because the typographical clues that flag most fields have not been preserved in the on-line version of Funk and Wagnall's. Only a small percentage of the synonyms are flagged explicitly in the dictionary; the majority are extracted by identifying definitions without 'differentiae' (i.e., definitions consisting only of a 'genus'). Since most words are polysemous (and many are homographs), phase three performs sense and lemma number disambiguation by establishing direct or indirect circularity, in order to avoid relating words on the basis of orthographic instead of semantic similarity. A planned fourth phase will establish a network of related words with distance weights based on distance travelled, type of link, and degree of alignment.
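
The circularity test can be illustrated with a minimal Python sketch. It is not the system described above: the toy entries, the `SYNONYMS` table, and the functions `senses_of`, `reaches`, and `circular_links` are all hypothetical, and the sketch assumes that a candidate sense of a listed synonym is accepted when following its own synonym links leads back to the original headword within a bounded number of steps (one step for direct circularity, more for indirect circularity).

```python
from typing import Dict, List, Tuple

Sense = Tuple[str, int]  # (headword, sense number)

# Hypothetical toy data: each sense maps to the (not yet disambiguated)
# words given by its synonym-type definition.
SYNONYMS: Dict[Sense, List[str]] = {
    ("bank", 1): ["shore"],
    ("bank", 2): ["depository"],
    ("shore", 1): ["bank"],
    ("shore", 2): ["prop"],
    ("prop", 1): ["shore"],
    ("depository", 1): ["storehouse"],
    ("storehouse", 1): ["bank"],
}


def senses_of(word: str) -> List[Sense]:
    """All senses recorded for a headword, in sense-number order."""
    return sorted(s for s in SYNONYMS if s[0] == word)


def reaches(sense: Sense, goal_word: str, hops: int) -> bool:
    """True if goal_word is reachable from sense in at most `hops` links."""
    if hops == 0:
        return False
    for word in SYNONYMS.get(sense, []):
        if word == goal_word:
            return True
        if any(reaches(t, goal_word, hops - 1) for t in senses_of(word)):
            return True
    return False


def circular_links(source: Sense, max_hops: int = 1) -> List[Sense]:
    """Senses of source's listed synonyms that lead back to its headword.

    max_hops=1 checks direct circularity only; larger values also
    admit indirect circularity through intermediate words.
    """
    headword = source[0]
    return [
        candidate
        for target in SYNONYMS.get(source, [])
        for candidate in senses_of(target)
        if reaches(candidate, headword, max_hops)
    ]


if __name__ == "__main__":
    print(circular_links(("bank", 1)))              # direct circularity
    print(circular_links(("bank", 2), max_hops=2))  # indirect circularity
```

In this toy data, sense 1 of "bank" is linked to sense 1 of "shore" rather than sense 2, because only sense 1 of "shore" points back to "bank". The sketch checks only that a path returns to the original headword, not to the same sense of it, so a fuller treatment would need a stricter acceptance test.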