Uncertainty of polysemous word senses in the light of discrimination learning
Conference object (Published version)
Abstract
This research presents an attempt to simulate the processing effects of polysemy with a model based on discrimination learning (Baayen et al., 2011), thereby showing that lexical-semantic processing can be described in terms of the principles of error-driven learning.
Polysemous words denote multiple related senses (e.g., scientific paper, daily paper, wrapping paper; Eddington & Tokowicz, 2015). The starting point of this research is the finding that two factors affect the processing of polysemous words: the number of senses and the balance of sense probabilities. Processing latencies for polysemous words have been shown to decrease as the number of senses increases. They also decrease as the redundancy of the sense probability distribution decreases, i.e., as the sense probabilities become more balanced (Filipović Đurđević, 2007; Filipović Đurđević & Kostić, under review).
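The abstract does not give a formula for redundancy. The following sketch assumes the standard information-theoretic definition: one minus the ratio of the entropy of the sense probability distribution to its maximum possible value, which is log₂ of the number of senses. The sense probabilities are invented for illustration. The sketch shows that a perfectly balanced distribution has zero redundancy.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy, in bits, of a sense probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy(probs: list[float]) -> float:
    """Redundancy T = 1 - H / log2(k) for a word with k senses (assumed definition).

    T = 0 when all senses are equally probable (maximal balance) and grows
    toward 1 as a single sense dominates.
    """
    k = len(probs)
    if k < 2:
        raise ValueError("redundancy is undefined for fewer than two senses")
    return 1 - entropy(probs) / math.log2(k)


# Hypothetical sense probabilities for two four-sense words.
balanced = [0.25, 0.25, 0.25, 0.25]
dominant = [0.85, 0.05, 0.05, 0.05]
print(redundancy(balanced))             # 0.0
print(round(redundancy(dominant), 3))   # 0.576
```

Under this definition, lower redundancy means a more balanced distribution. That is the direction in which processing latencies were reported to decrease.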
Here, 150 polysemous words, for which processing latencies had previously been collected, were split into bigrams. These bigrams served as the input to the model, i.e., the cues. At the output level, the set of bigrams making up each word form was linked to two kinds of outcome: the corresponding lemma and the co-occurring context words. The co-occurring context words were those that appeared within a ±3-word window around the target word form. The context words were preselected from the Frequency Dictionary of the Contemporary Serbian Language (Kostić, 1999). We started with the 3,000 most frequent nouns, adjectives, and verbs (1,000 of each) and, after excluding homographs, arrived at 2,383 context words. We then built first-order co-occurrence vectors (Schütze, 1998) for the 150 polysemous words presented in the experiment.
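The input coding and the context window described above can be sketched in Python. This is a minimal illustration, not the author's code. The following are all assumptions made for the example:

- the `#` word-boundary symbol;
- the function names;
- the English example sentence.

The study itself used Serbian word forms and a preselected list of 2,383 context words.

```python
def bigram_cues(word: str, boundary: str = "#") -> list[str]:
    """Split a word form into letter bigrams (the cues).

    Padding with a boundary symbol is an assumption; it lets the first and
    last letters form cues of their own.
    """
    padded = f"{boundary}{word}{boundary}"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def window_context(tokens: list[str], index: int, vocabulary: set[str],
                   window: int = 3) -> list[str]:
    """Return the preselected context words within +/-window tokens of tokens[index]."""
    start = max(0, index - window)
    stop = min(len(tokens), index + window + 1)
    return [
        tok
        for pos, tok in enumerate(tokens[start:stop], start=start)
        if pos != index and tok in vocabulary  # skip the target itself
    ]


# Hypothetical example sentence and context vocabulary.
tokens = ["she", "read", "the", "daily", "paper", "every", "morning"]
vocabulary = {"read", "daily", "morning"}
print(bigram_cues("paper"))                   # ['#p', 'pa', 'ap', 'pe', 'er', 'r#']
print(window_context(tokens, 4, vocabulary))  # ['read', 'daily', 'morning']
```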
A separate vector was built for each occurrence of a word. The vector contained a one (1) for every context word that co-occurred with the target word within the seven-word window (the target plus three words on either side) and a zero (0) for every context word that did not. These vectors were then used to specify the outcomes for each occurrence: the lemma, followed by the co-occurring context words.
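The per-occurrence binary vectors, and the outcome lists derived from them, might look as follows. This is a hypothetical sketch with three illustrative choices:

- a five-word ordered vocabulary stands in for the 2,383 context words;
- the English lemma "paper" serves as the example;
- the function names are invented.

```python
def cooccurrence_vector(context: list[str], vocabulary: list[str]) -> list[int]:
    """Binary first-order co-occurrence vector for one occurrence of a word.

    Position j is 1 if vocabulary[j] appeared in the window, else 0.
    """
    present = set(context)
    return [1 if word in present else 0 for word in vocabulary]


def outcomes_for_occurrence(lemma: str, vector: list[int],
                            vocabulary: list[str]) -> list[str]:
    """Outcomes for one learning event: the lemma followed by the co-occurring context words."""
    return [lemma] + [word for word, bit in zip(vocabulary, vector) if bit == 1]


vocabulary = ["read", "daily", "wrapping", "scientific", "morning"]  # hypothetical, ordered
vector = cooccurrence_vector(["read", "daily", "morning"], vocabulary)
print(vector)                                                # [1, 1, 0, 0, 1]
print(outcomes_for_occurrence("paper", vector, vocabulary))  # ['paper', 'read', 'daily', 'morning']
```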
The simulation was run in R (R Core Team, 2017) using the ndl package (Arppe et al., 2015), following the procedure described in Baayen et al. (2011). The activation of each outcome was calculated by summing the association strengths from all the bigrams present in the target word to that outcome. The activations of the lemma and of the co-occurring context words were then summed. This sum was taken as an indicator of how strongly the cues present in the input supported the given outcome, i.e., the lemma together with its co-occurring context words.
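The learning and activation steps can be sketched in Python, with one substitution. The ndl package estimates the equilibrium weights of the Rescorla-Wagner model directly. This sketch instead uses the iterative, trial-by-trial Rescorla-Wagner update, which only approximates those end-state weights. The following are assumptions:

- the learning rate;
- λ = 1;
- the toy events;
- the treatment of a repeated bigram as a single cue.

```python
from collections import defaultdict


def rescorla_wagner(events, rate=0.01, lam=1.0, epochs=1):
    """Trial-by-trial Rescorla-Wagner learning of cue-to-outcome weights.

    events: list of (cues, outcomes) pairs. Returns weights[cue][outcome].
    rate stands in for the product of the cue and outcome salience parameters.
    """
    weights = defaultdict(lambda: defaultdict(float))
    all_outcomes = sorted({o for _, outs in events for o in outs})
    for _ in range(epochs):
        for cues, outcomes in events:
            cue_set = set(cues)  # assumption: a repeated bigram counts once
            present = set(outcomes)
            for outcome in all_outcomes:
                # Current support for this outcome from the cues in the input.
                total = sum(weights[c][outcome] for c in cue_set)
                target = lam if outcome in present else 0.0
                delta = rate * (target - total)
                # Only cues present on this trial are updated.
                for c in cue_set:
                    weights[c][outcome] += delta
    return weights


def activation(weights, cues, outcome):
    """Activation of one outcome: summed weights from all cues present in the input."""
    return sum(weights.get(c, {}).get(outcome, 0.0) for c in set(cues))


def word_activation(weights, cues, outcomes):
    """Summed activation of the lemma and its co-occurring context words."""
    return sum(activation(weights, cues, o) for o in outcomes)


# Hypothetical toy events: (bigram cues, [lemma, context words...]).
events = [
    (["#p", "pa"], ["paper", "daily"]),
    (["#p", "pa"], ["paper", "wrapping"]),
    (["#c", "ca"], ["cat"]),
]
w = rescorla_wagner(events, rate=0.1, epochs=50)
print(round(activation(w, ["#p", "pa"], "paper"), 3))  # 1.0
print(word_activation(w, ["#p", "pa"], ["paper", "daily"]))
```

In the toy events, the lemma is present on every trial with its cues, so its activation approaches λ. A context word that appears on only some of the lemma's occurrences ends up with partial support.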
The calculated activations were significantly correlated both with the processing latencies observed in the experiment and with descriptors of lexical ambiguity. The correlations with activation were as follows:
- processing latency: negative (r = -.42, t(144) = 5.639, p < .001);
- number of senses: positive (r = .18, t(144) = 2.229, p = .027);
- redundancy of the sense probability distribution: negative (r = -.23, t(144), p = .004).
However, we also performed a multiple linear regression that predicted activation from the number of senses, redundancy, and several other lexical variables. In that model, only redundancy accounted for variance in activation over and above the contribution of familiarity, concreteness, and orthographic neighborhood size.
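For a Pearson correlation tested against zero, t and r are related by t = r·√(df / (1 − r²)), where df = n − 2 for n paired observations. The helper functions below (hypothetical names) make that relationship explicit. Because the reported r values are rounded to two decimals, t values recomputed from them differ slightly from the reported ones. Note also that the sign of t follows the sign of r.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """t statistic for H0: rho = 0, with df = n - 2 for n paired observations."""
    return r * math.sqrt(df / (1 - r ** 2))


def r_from_t(t: float, df: int) -> float:
    """Inverse conversion: the correlation implied by a t statistic and its df."""
    return t / math.sqrt(t ** 2 + df)


# Recompute t from the rounded correlations with df = 144.
for r in (-0.42, 0.18):
    print(r, round(t_from_r(r, 144), 3))
```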
This finding provides evidence that the effect of the balance of sense probabilities can be simulated in a model based on the principles of discrimination learning. In other words, it demonstrates that semantic ambiguity effects can arise through error-driven learning.
Keywords:
discrimination learning / polysemy / semantic ambiguity
Source:
Book of Abstracts, International Conference on Error-Driven Learning in Language (EDLL 2021), March 10 - 12, University of Tübingen, 2021, 9
Funding / projects:
- Ministry of Science, Technological Development and Innovation of the Republic of Serbia, institutional funding - 200163 (University of Belgrade, Faculty of Philosophy) (RS-MESTD-inst-2020-200163)
Institution/Community
Psihologija / Psychology
Filipović Đurđević, D. (2021). Uncertainty of polysemous word senses in the light of discrimination learning. In Book of Abstracts, International Conference on Error-Driven Learning in Language (EDLL 2021), March 10 - 12, University of Tübingen, 9. https://hdl.handle.net/21.15107/rcub_reff_5145