WordNet–Wikipedia–Wiktionary: Construction of a three-way alignment

Tristan Miller¹, Iryna Gurevych¹,²
¹ Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt
² Ubiquitous Knowledge Processing Lab (UKP-DIPF), German Institute for International Educational Research, Frankfurt am Main
http://www.ukp.tu-darmstadt.de/

Abstract

The coverage and quality of conceptual information contained in lexical semantic resources is crucial for many tasks in natural language processing. Automatic alignment of complementary resources is one way of improving this coverage and quality; however, past attempts have always been between pairs of specific resources. In this paper we establish some set-theoretic conventions for describing concepts and their alignments, and use them to describe a method for automatically constructing n-way alignments from arbitrary pairwise alignments. We apply this technique to the production of a three-way alignment from previously published WordNet–Wikipedia and WordNet–Wiktionary alignments. We then present a quantitative and informal qualitative analysis of the aligned resource. The three-way alignment was found to have greater coverage, an enriched sense representation, and coarser sense granularity than both the original resources and their pairwise alignments, though this came at the cost of accuracy. An evaluation of the induced word sense clusters in a word sense disambiguation task showed that they were no better than random clusters of equivalent granularity. However, use of the alignments to enrich a sense inventory with additional sense glosses did significantly improve the performance of a baseline knowledge-based WSD algorithm.

Keywords: lexical semantic resources, sense alignment, word sense disambiguation

1. Introduction

Lexical semantic resources (LSRs) are used in a wide variety of natural language processing tasks, including machine translation, question answering, automatic summarization, and word sense disambiguation. Their coverage of concepts and lexical items, and the quality of the information they provide, are crucial for the success of these tasks, which has motivated the manual construction of full-fledged electronic LSRs. However, the effort required to produce and maintain such expert-built resources is phenomenal (Briscoe, 1991). Early attempts at resolving this knowledge acquisition bottleneck focused on methods for automatically acquiring structured knowledge from unstructured knowledge sources (Hearst, 1998). More recent contributions treat the question of automatically connecting or merging existing LSRs which encode heterogeneous information for the same lexical and semantic entities, or which encode the same sort of information but for different sets of lexical and semantic entities. These approaches have until now focused on pairwise linking of resources, and in most cases are applicable only to the particular resources they align.

In this research we address the novel task of automatically aligning arbitrary numbers and types of LSRs through the combination of existing pairwise alignments, which in theory reduces the number of specialized algorithms required to find concept pairs in any n resources from as many as C(n, 2) = n! / (2(n − 2)!) to as few as n − 1. The remainder of this paper is structured as follows: In the next section, we give some background on LSRs and alignments, both in general and for the specific ones we will be working with. Section 3 describes a technique for constructing n-way alignments and applies it to the production of a three-way alignment of concepts from WordNet, Wikipedia, and Wiktionary using existing WordNet–Wikipedia and WordNet–Wiktionary alignments.
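The claimed reduction is easy to verify numerically; the following sketch (our own illustration, not from the paper) compares the two counts for small n:

```python
from math import comb

# Aligning every pair of n resources requires C(n, 2) = n!/(2(n-2)!)
# specialized pairwise algorithms; chaining pairwise alignments and
# merging them transitively requires only n - 1.
for n in range(2, 7):
    print(f"n={n}: all pairs={comb(n, 2)}, chain={n - 1}")
```

For n = 3, the case studied here, two pairwise alignments suffice instead of three.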
Though the technique is straightforward, this is, to our knowledge, the first time anyone has actually used it to align more than two heterogeneous LSRs at the concept level. Section 4 presents various statistical and qualitative analyses of the aligned resource. Our paper concludes with a discussion of possible applications and further evaluations of the aligned resource.

2. Background

2.1. Lexical semantic resources

The oldest type of lexical semantic resource is the dictionary. In its simplest form, a dictionary is a collection of lexical items (words, multiword expressions, etc.) for which the various senses (or concepts) are enumerated and explained through brief prose definitions. Many dictionaries provide additional information at the lexical or sense level, such as etymologies, pronunciations, example sentences, and usage notes. A wordnet, like a dictionary, enumerates the senses of its lexical items, and may even provide some of the same sense-related information, such as definitions and example sentences. What distinguishes a wordnet, however, is that the senses and lexical items are organized into a network by means of conceptual-semantic and lexical relations. Encyclopædias are similar to dictionaries, except that their concept descriptions are much longer and more detailed.

WordNet. WordNet (Fellbaum, 1998) is an expert-built English-language wordnet which has seen myriad applications. For each sense (in WordNet parlance, a synset) WordNet provides a list of synonymous lexical items, a definition, and zero or more example sentences showing use of the lexical items.[1] Within each version of WordNet, synsets can be uniquely identified with a label, the synset offset, which encodes the synset's part of speech and its position within an index file.
Synsets and lexical items are connected to each other by various semantic and lexical relations, respectively, in a clear-cut subsumption hierarchy. The latest version of WordNet, 3.0, contains 117659 synsets and 206941 lexical items.

Wiktionary. Wiktionary[2] is an online, free content dictionary collaboratively written and edited by volunteers. It includes a wide variety of lexical and semantic information such as definitions, pronunciations, translations, inflected forms, pictures, example sentences, and etymologies, though not all lexical items and senses have all of this information. The online edition does not provide a convenient and consistent means of directly addressing individual lexical items or their associated senses; however, the third-party API JWKTL (Zesch et al., 2008) can assign unique identifiers for these in snapshot editions downloaded for offline use. A snapshot of the English edition from 3 April 2010 contains 421847 senses for 335748 English lexical items.

Wikipedia. Wikipedia[3] is an online free content encyclopædia; like Wiktionary, it is produced by a community of volunteers. Wikipedia is organized into millions of uniquely named articles, each of which presents detailed, semi-structured knowledge about a specific concept. Among LSRs, encyclopædias do not have the same established history of use in NLP as dictionaries and wordnets, but Wikipedia has a number of features—particularly its network of internal hyperlinks and its comprehensive article categorization scheme—which make it a particularly attractive source of knowledge for NLP tasks (Zesch et al., 2007; Gurevych and Wolf, 2010).

2.2. Pairwise alignments

Each of the aforementioned resources has different coverage (primarily in terms of domain, part of speech, and sense granularity) and encodes different types of lexical and semantic information.
There is a considerable body of prior work on connecting or combining them at the concept level in order to maximize the coverage and quality of the data; this has ranged from largely manual alignments of selected senses (Meyer and Gurevych, 2010; Dandala et al., 2012) to minimally supervised or even fully automatic alignment of entire resource pairs (Ruiz-Casado et al., 2005; de Melo and Weikum, 2009; Niemann and Gurevych, 2011; Navigli and Ponzetto, 2012; Meyer and Gurevych, 2011; Matuschek and Gurevych, 2013; Hartmann and Gurevych, 2013).[4] In our work, we use the alignments from Meyer and Gurevych (2011) and Matuschek and Gurevych (2013), which were among the few that were publicly available in a transparent, documented format at the time of our study.

[1] In this paper we use the term sense in a general way to refer to the concepts or meanings described by an LSR. This is in contrast to the WordNet documentation, where it refers to the pairing of a lexical item with a synset.
[2] https://www.wiktionary.org/
[3] https://www.wikipedia.org/
[4] A different approach with some of the same benefits is to provide a unified interface for accessing multiple LSRs in the same application (Garoufi et al., 2008; Gurevych et al., 2012).

Meyer and Gurevych (2011) describe a text similarity–based technique for automatically aligning English Wiktionary senses with WordNet synsets. The versions of WordNet and Wiktionary they use contain 117659 and 421847 senses, respectively, for 206941 and 335748 lexical items, respectively. Their published alignment file consists of 56952 aligned pairs, but as the same Wiktionary sense is sometimes paired with multiple WordNet synsets, the set of aligned pairs can be reduced mathematically (see §3) to 50518 n:1 sense mappings, where 1 ≤ n ≤ 7. Alignments for a well-balanced sample of 320 WordNet synsets (Niemann and Gurevych, 2011) were compared with human judgments, and were found to greatly outperform the random and MFS baselines, with F1 = 0.66.

Dijkstra-WSA (Matuschek and Gurevych, 2013) is a state-of-the-art graph-based technique which was applied to align WordNet with a snapshot of the English edition of Wikipedia containing 3348245 articles, resulting in 42314 aligned pairs. Here, too, the set of aligned pairs can be mathematically reduced to 30857 n:1 mappings, where 1 ≤ n ≤ 20. The alignment achieved F1 = 0.67 on the aforementioned well-balanced reference dataset.

3. Construction of the three-way alignment

Since synonymy is reflexive, symmetric, and transitive (Edmundson, 1967), we can define an equivalence relation ∼ on a set of arbitrary sense identifiers T = {t1, t2, ...} such that ti ∼ tj if ti and tj are synonyms (i.e., if the senses they refer to are equivalent in meaning). The synonym set of an identifier t ∈ T, denoted [t]T, is the equivalence class of t under ∼: [t]T = {u ∈ T | u ∼ t}. The set of all such equivalence classes is the quotient set of T by ∼: T/∼ = {[t]T | t ∈ T}. For any pair of disjoint sets U and V such that T = U ∪ V and there exist some u ∈ U and some v ∈ V for which u ∼ v, we say that u and v are an aligned pair and that Af({U, V}) = T/∼ is a full alignment of the sources {U, V}. More generally, for any set of disjoint sets W = {W1, W2, ...} where T = ⋃W and there exist distinct Wi, Wj ∈ W : ∃u ∈ Wi, v ∈ Wj : u ∼ v, we say that Af(W) = T/∼ is a full alignment of W.

Full alignments may include synonym sets which do not contain at least one identifier from each of their sources. The conjoint alignment which excludes these synonym sets is defined as Ac(W) = {[t]T | t ∈ T, ∀Wi ∈ W : ∃u ∈ Wi ∩ [t]T}.

The cardinality of a full or conjoint alignment is a count of its synonym sets.
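These definitions can be made concrete with a minimal sketch; the identifiers and helper names below are our own invention for illustration, not from the paper's implementation:

```python
# Build synonym sets (equivalence classes under ~) from aligned pairs
# using union-find, then filter to the conjoint alignment, which keeps
# only synonym sets that touch every source.

def synonym_sets(identifiers, aligned_pairs):
    parent = {t: t for t in identifiers}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for u, v in aligned_pairs:
        parent[find(u)] = find(v)

    classes = {}
    for t in identifiers:
        classes.setdefault(find(t), set()).add(t)
    return list(classes.values())  # the quotient set T/~

def conjoint(alignment, sources):
    # Keep only synonym sets with at least one identifier per source.
    return [s for s in alignment if all(s & src for src in sources)]

# Toy identifiers: wn:1 ~ wp:1 and wn:1 ~ wkt:1, so transitively all three
# fall into one synonym set; wn:2 and wkt:2 remain singletons.
WN, WP, WKT = {"wn:1", "wn:2"}, {"wp:1"}, {"wkt:1", "wkt:2"}
pairs = [("wn:1", "wp:1"), ("wn:1", "wkt:1")]

full = synonym_sets(WN | WP | WKT, pairs)  # A_f: 3 synonym sets
conj = conjoint(full, [WN, WP, WKT])       # A_c: 1 synonym set
```

The full alignment here contains three synonym sets, of which only {wn:1, wp:1, wkt:1} survives into the conjoint alignment.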
The number of individual identifiers referenced in an alignment A(W) can also be computed: ‖A(W)‖ = |⋃A(W)|. If ‖A(W)‖ = |T| then A(W) must be a full alignment.

Given a set of identifiers and a set of aligned pairs, finding all the synonym sets is analogous to computing the connected components in a graph. Hopcroft and Tarjan (1973) describe an algorithm for this which requires time and space proportional to the greater of the number of identifiers or the number of aligned pairs.

Let WKT, WN, and WP be disjoint sets of unique sense identifiers from Wiktionary, WordNet, and Wikipedia, respectively; the combined set of all their identifiers is T = WKT ∪ WN ∪ WP. The Dijkstra-WSA data corresponds to a set of ordered pairs (n, p) ∈ WN × WP where n ∼ p. This data was sufficient for us to employ the connected component algorithm to compute Ac({WN, WP}), the conjoint alignment between WordNet and Wikipedia. We reconstructed the full alignment, Af({WN, WP}), by adding the unaligned identifiers from the original Wikipedia and WordNet databases. Similarly, the Meyer and Gurevych (2011) data contains a set of pairs (n, k) ∈ WN × WKT such that n ∼ k, but it also contains a list of unaligned singletons from both WN and WKT. We therefore directly computed both Af({WN, WKT}) and Ac({WN, WKT}) using the connected component algorithm.

4. Analysis

The conjoint three-way alignment of WordNet, Wiktionary, and Wikipedia is a set of 15953 synonym sets relating 63771 distinct sense identifiers (27324 WordNet synsets, 19916 Wiktionary senses, and 16531 Wikipedia articles). Of the synonym sets, 9987 (63%) contain exactly one identifier from each source; Table 1 gives further details on synonym set sizes. Since our WordNet–Wikipedia alignment is for nouns only, the synonym sets in the conjoint three-way alignment consist entirely of nouns.
The full three-way alignment groups all 3887751 identifiers from the original sources into 3789065 synonym sets: 69259 of these are described by adjectives, 3613514 by nouns, 12415 by adverbs, 76992 by verbs, and 16885 by other parts of speech.

Coverage of lexical items is not as straightforward to analyze owing to how Wikipedia treats them. Concepts in Wikipedia are canonically identified by an article title, which is typically a lexical item optionally followed by a parenthetical description which serves to disambiguate the concept from others which would otherwise share the same title. Lexical synonyms for the concept, however, are not explicitly and consistently encoded as they are in WordNet synsets. These synonyms are sometimes given in the unstructured text of the article, though identifying these requires sophisticated natural language processing. Many redirect page titles[5] and incoming hyperlink texts—which are much easier to compile—are also synonyms, but others are anaphora or circumlocutions, and Wikipedia does not distinguish between them.

[5] In Wikipedia parlance, a redirect page is an empty pseudo-article which simply refers the visitor to a different article. They are analogous to "see" cross-references in indices (Booth, 2001).

If we make no attempt to identify lexical synonyms from Wikipedia other than the article title, we find that the three-way conjoint alignment covers at least 44803 unique lexical items, 42165 of which are found in WordNet, 17939 in Wiktionary, and 16365 in Wikipedia. Moreover, 20609 of these lexical items are unique to WordNet and 2638 to Wikipedia. (There are no lexical items unique to Wiktionary.) We can also calculate the word sense distribution d(k) of the conjoint alignment—that is, the percentage of lexical items which have a given number of senses k. Table 2 shows this distribution for WordNet, Wiktionary, and the conjoint three-way alignment; and also the average
(ω̄) and maximum (ω̂) number of senses per lexical item. We observe that while the distributions for the unaligned resources are similar, the conjoint alignment demonstrates a marked shift towards monosemy. Though Zipf's law of meaning (Zipf, 1949) suggests that this might be the result of poor coverage of very high frequency lexical items, we found that the conjoint alignment actually covers 97 of the 100 most common (and 934 of the 1000 most common) nouns occurring in the British National Corpus.

Informal spot checks of synonym sets show them to be generally plausible, which is to be expected given the accuracy of the source alignments. However, the incidence of questionable or obviously incorrect mappings seems disproportionately higher in larger synonym sets. For example, one synonym set of cardinality 21 reasonably groups together various equivalent or closely related senses of the noun "hand", but also includes senses for "Palm OS" and "left-wing politics", since in the two-way alignments they had been mistakenly aligned with the anatomical senses for "palm" and "left hand", respectively. It appears that such errors are not only propagated but actually exaggerated by our algorithm, resulting in noisy data.

5. Evaluation

There are several different ways in which sense alignments can be formally evaluated. The conceptually simplest is comparison with human judgments, as Meyer and Gurevych (2011) and Matuschek and Gurevych (2013) did with their pairwise alignments. However, there are many reasons why this sort of evaluation is not appropriate for an alignment of more than two resources: First, it disregards the transitive nature of synonymy. That is, if the two-way alignment contains the pairs (n1, k) and (n2, k), then those two pairs are considered in the evaluation, but not the implied pair (n1, n2).
This was perhaps more acceptable for the two-way alignments, where only a small minority of the mappings are not 1:1, but our three-way alignments rely more heavily on the transitive property; indeed, in the conjoint alignment 100% of the synonym sets were produced by exploiting it. Second, even if we modify the evaluation setup such that the implied pairs are also considered, since the number of identifiers per synonym set is much higher in the three-way alignment, there is a combinatorial explosion in the number of candidate pairs for the judges to consider. Finally, considering sense pairs in isolation may not be the most appropriate way of evaluating what are essentially clusters of ostensibly synonymous sense descriptions.

We could therefore reduce the problem to one of evaluating clusters of senses from a single resource—that is, for every synonym set in the full alignment, we remove sense identifiers from all but one resource, and treat the remainder as a coarse-grained clustering of senses. Established intrinsic or extrinsic sense cluster evaluation techniques can then be applied. An example of the former would be computing the entropy and purity of the clusters with respect to a human-produced gold standard (Zhao and Karypis, 2003). However, while such gold standards have been constructed for early versions of WordNet (Agirre and Lopez de Lacalle, 2003; Navigli, 2006), they have not, to our knowledge, been produced for the more recent version used in our alignment.
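Purity and entropy in the sense of Zhao and Karypis (2003) can be sketched as follows; the toy clusters and gold classes here are invented for illustration:

```python
from math import log

def purity(clusters, gold):
    # Fraction of items whose cluster's majority gold class covers them.
    n = sum(len(c) for c in clusters)
    return sum(max(len(c & g) for g in gold) for c in clusters) / n

def entropy(clusters, gold):
    # Size-weighted average of per-cluster entropy over the gold classes,
    # with log base q (the number of gold classes), so the result is in [0, 1].
    n = sum(len(c) for c in clusters)
    q = len(gold)
    total = 0.0
    for c in clusters:
        h = 0.0
        for g in gold:
            p = len(c & g) / len(c)
            if p > 0:
                h -= p * log(p, q)
        total += len(c) / n * h
    return total

# Toy example: four senses whose gold standard groups {s1, s2} and {s3, s4}.
gold = [{"s1", "s2"}, {"s3", "s4"}]
perfect = [{"s1", "s2"}, {"s3", "s4"}]
mixed = [{"s1", "s3"}, {"s2", "s4"}]
print(purity(perfect, gold), entropy(perfect, gold))  # 1.0 0.0
print(purity(mixed, gold), entropy(mixed, gold))      # 0.5 1.0
```

Lower entropy and higher purity indicate clusters that agree better with the gold-standard grouping.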
alignment            2      3     4     5    6    7    8    9  ≥10  total
Ac({WN, WP})     23737   4801  1355   492  234  112   54   28   44  30857
Ac({WN, WKT})    45111   4601   656    99   35   12    4    0    0  50518
Ac({WN, WP, WKT})    0   9987  2431  1666  654  441  209  164  401  15953

Table 1: Distribution of synonym sets by cardinality in the two- and three-way conjoint alignments

resource           d(1)   d(2)  d(3)  d(4)  d(≥5)    ω̄   ω̂
WN                83.4%  10.4%  3.1%  1.3%   1.8%  1.32  59
WKT               85.2%   9.4%  2.8%  1.1%   1.3%  1.26  58
Ac({WN, WP, WKT}) 91.0%   6.4%  1.6%  0.6%   0.5%  1.14  16

Table 2: Word sense distribution in WordNet, Wiktionary, and the three-way conjoint alignment

A possible extrinsic cluster evaluation would be to take the sense assignments of a state-of-the-art word sense disambiguation (WSD) system and rescore them on clustered versions of the gold standard (Navigli, 2006; Snow et al., 2007). That is, the system is considered to disambiguate a term correctly not only if it chooses the gold-standard sense, but also if it chooses any other sense in that sense's cluster. The improvement for using a given sense clustering is measured relative to a computed random clustering of equivalent granularity.

Cluster evaluations are appropriate if constructing the alignment is simply a means of decreasing the granularity of a single sense inventory. However, they do not measure the utility of the alignment as an LSR in its own right, which calls for extrinsic evaluations in scenarios where unaligned LSRs are normally used. One previous study (Ponzetto and Navigli, 2010) demonstrated marked improvements in accuracy of two different knowledge-based WSD algorithms when they had access to additional definition texts or semantic relations from a WordNet–Wikipedia alignment. Conventional wisdom in WSD is that for knowledge-based approaches, more data is always better, so a three-way alignment which provides information from Wiktionary as well could boost performance even further.
A complication with this approach is that our alignment method produces coarse-grained synonym sets containing multiple senses from the same resource, and so without additional processing a WSD algorithm would not distinguish between them. For use with existing fine-grained datasets, such synonym sets could either be removed from the alignment, or else the WSD algorithm would need to be written in such a way that if it selects such a synonym set as the correct one, it performs an additional, finer-grained disambiguation within it.

In this study we performed two of the aforementioned types of WSD-based evaluations. The first evaluation is a cluster-based one where we rescore the results of existing WSD systems using the clusters induced by our three-way alignment; we describe this and present the results in §5.1. In our second evaluation, we use our three-way alignment to enrich WordNet glosses with those from aligned senses in the other two resources, and then use our enriched sense inventory with a knowledge-based WSD algorithm; this is covered in §5.2. For both evaluations we use the freely available DKPro WSD framework (Miller et al., 2013).

5.1. Clustering of WSD results

In this evaluation, we follow the approach of Snow et al. (2007). Specifically, we take the raw sense assignments made by existing word sense disambiguation systems on a standard data set and then rescore them according to a given clustering. A system is considered to have correctly disambiguated a term not only if it chose the correct sense specified by the data set's answer key, but also if it chose any other sense in the same cluster as the correct one. Of course, any clustering whatsoever is likely to increase accuracy, simply by virtue of there being fewer senses for systems to choose among. To account for this, we measure the accuracy obtained with each clustering relative to that of a random clustering of equivalent granularity. Like Snow et al.
(2007), we use the raw sense assignments of the three top-performing systems in the Senseval-3 English all-words WSD task (Snyder and Palmer, 2004): GAMBL (Decadt et al., 2004), SenseLearner (Mihalcea and Faruque, 2004), and the Koç University system (Yuret, 2004). While other datasets would be equally applicable, we use this one as it ensures comparability to the previous work. The scores for our random clusterings are determined computationally: For a given clustering, let C be the set of clusters over the N senses of a given term. Then the expectation that the correct sense and an incorrectly chosen sense will have been clustered together is

    ∑_{c ∈ C} |c|(|c| − 1) / (N(N − 1)),

where |c| is the number of senses in the cluster c. Note that all of the Senseval systems we rescore attempt to disambiguate every item in the data set, so coverage is always 100%. This means that in this evaluation, recall, precision, and F-score are always equivalent; we refer to these collectively simply as "accuracy" and report them as percentages.

Whereas our alignment uses WordNet 3.0, the Senseval-3 data set uses WordNet 1.7.1, so we use the WN-Map mappings (Daudé et al., 2003) to convert the WordNet 1.7.1 synset offsets to WordNet 3.0 synset offsets.
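The expectation above is straightforward to compute directly; a small sketch with invented cluster sizes:

```python
def collision_probability(cluster_sizes):
    # Probability that the gold sense and an incorrectly chosen sense of
    # the same term fall into the same cluster:
    #   sum over clusters of |c|(|c| - 1) / (N(N - 1)).
    n = sum(cluster_sizes)
    return sum(c * (c - 1) for c in cluster_sizes) / (n * (n - 1))

# A term with 6 senses, clustered coarsely as sizes {4, 1, 1}
# versus a finer, more even clustering {2, 2, 2}:
print(collision_probability([4, 1, 1]))  # 12/30 = 0.4
print(collision_probability([2, 2, 2]))  # 6/30 = 0.2
```

Coarser, more skewed clusterings yield a higher chance of an accidental "correct" rescoring, which is why system accuracy must be compared against a random clustering of the same granularity.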
Furthermore, because some of the WordNet synset clusters induced by our alignment contain no one common lexical item, we "purify" these clusters by splitting them into smaller ones such that each synset in the cluster shares at least one lexical item with all the others. We tested two cluster purification approaches: in the first, we create a new cluster by taking from the original cluster all synsets containing its most common lexical item, and repeat this until the original cluster is empty. We refer to this technique as most-frequent first, or MFF. The second approach (least-frequent first, or LFF) works similarly, except that new clusters are constructed according to the least common lexical item.

system        base    MFF  random      ∆
GAMBL        65.21  69.13   68.88  +0.25
SenseLearner 64.72  68.10   68.47  −0.37
Koç          64.23  67.76   67.54  +0.22
average      64.72  68.33   68.30  +0.03

Table 3: Senseval-3 WSD accuracy using our MFF-purified clusters and random clustering of equivalent granularity

system        base    LFF  random      ∆
GAMBL        65.21  68.99   68.70  +0.28
SenseLearner 64.72  67.96   68.22  −0.26
Koç          64.23  67.71   67.43  +0.28
average      64.72  68.22   68.12  +0.10

Table 4: Senseval-3 WSD accuracy using our LFF-purified clusters and random clustering of equivalent granularity

The results of this evaluation using MFF and LFF clusters are shown in Tables 3 and 4, respectively. The table columns show, in order, the systems' original accuracy scores,[6] the accuracies rescored according to the WordNet clustering induced by our full three-way alignment, the accuracies rescored according to a random clustering of equivalent granularity, and the improvement of our clustering relative to the random one. As can be seen, the effect of our clusters on system performance is practically indistinguishable from using the random clusterings. By comparison, Snow et al. (2007) report a modest but presumably significant average improvement of 3.55 percentage points.
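The MFF purification step can be sketched as follows; representing each synset by its set of lexical items is our own simplification, not the paper's data structure:

```python
from collections import Counter

def purify_mff(cluster):
    # Split a cluster of synsets (each a frozenset of lexical items) into
    # sub-clusters whose members all share a lexical item, repeatedly
    # peeling off the synsets containing the most frequent item (MFF).
    remaining = list(cluster)
    result = []
    while remaining:
        counts = Counter(w for synset in remaining for w in synset)
        top = counts.most_common(1)[0][0]
        result.append([s for s in remaining if top in s])
        remaining = [s for s in remaining if top not in s]
    return result

# Toy cluster: three synsets for "hand" plus one mistakenly aligned synset.
cluster = [frozenset({"hand", "manus"}),
           frozenset({"hand", "deal"}),
           frozenset({"hand"}),
           frozenset({"Palm OS"})]
print([len(c) for c in purify_mff(cluster)])  # [3, 1]
```

LFF differs only in selecting the least common lexical item at each step; every sub-cluster produced this way shares the selected item among all its synsets.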
5.2. Enriched sense inventory for knowledge-based WSD

In this evaluation we attempted to measure the contribution of additional sense information from aligned senses to knowledge-based word sense disambiguation. First, we enriched the glosses of WordNet senses with those from their aligned Wiktionary and Wikipedia senses. (In the case of Wikipedia, we used the first paragraph of the article.) We then ran a popular knowledge-based WSD baseline, the simplified Lesk algorithm (Kilgarriff and Rosenzweig, 2000), on the aforementioned Senseval-3 data set. This algorithm selects a sense for the target word solely on the basis of how many words the sense gloss and target word context have in common, so additional, accurate gloss information should help close the lexical gap and therefore increase both coverage and accuracy.

[6] The slight difference in scores with respect to those reported in Snyder and Palmer (2004) is an artifact of the conversion from WordNet 1.7.1 to WordNet 3.0.

glosses   coverage  precision  recall     F1
standard     26.85      69.23   18.59  29.30
enriched     29.17      67.26   19.62  30.38

Table 5: Senseval-3 WSD accuracy using simplified Lesk, with and without alignment-enriched sense glosses

glosses   coverage  precision  recall     F1
standard     98.61      53.46   52.71  53.08
enriched     98.76      51.07   50.44  50.75

Table 6: Senseval-3 WSD accuracy using simplified extended Lesk with 30 lexical expansions, with and without alignment-enriched sense glosses

The results of this evaluation are shown in Table 5. As predicted, coverage increased somewhat. The overall increase in recall was modest but statistically significant (corrected McNemar's χ² = 6.22, df = 1, critical value 3.84 at α = 0.05).

The fact that our enriched sense representations boosted the accuracy of this simple baseline motivated us to repeat the experiment with a state-of-the-art knowledge-based WSD system. For this we used the system described in Miller et al.
(2012), a variant of the simplified extended Lesk algorithm (Banerjee and Pedersen, 2002) which enriches the context and glosses with lexical items from a distributional thesaurus. However, as can be seen in Table 6, recall decreased by 2.27 percentage points; this difference was also statistically significant (corrected McNemar's χ² = 6.51, df = 1, critical value 3.84 at α = 0.05). It seems, therefore, that the additional gloss information derived from our alignment is not compatible with the lexical expansion technique.

To gain some insight as to why, or at least when, this is the case, we compared the instances incorrectly disambiguated when using the standard glosses but not when using the enriched glosses against the instances incorrectly disambiguated when using the enriched glosses but not when using the standard glosses. Both sets had about the same POS distribution. However, the words represented in the latter set were much rarer (an average of 178 occurrences in SemCor (Miller et al., 1994), versus 302 for the former set) and more polysemous (7.8 senses on average versus 6.5). The correct disambiguations in the latter set were also more likely to be the most frequent sense (MFS) for the given word, as tabulated in SemCor (71.6% MFS versus 63.3%). Using the enriched sense glosses seems to be slightly worse for shorter contexts—the corresponding second set of misclassified instances had an average sentence length of 123 tokens compared to the other's 127. (By comparison, the average sentence lengths where both methods correctly or incorrectly disambiguated the target word were 135 and 131, respectively.)

6. Conclusion and future work

In this paper we described a straightforward technique for producing an n-way alignment of LSRs from arbitrary pairwise alignments.