Open database of polysemous senses of 308 Serbian polysemous nouns, verbs, and adjectives
Authors
Mišić, Ksenija
Anđelić, Sara
Ilić, Lenka
Osmani, Dajana
Manojlović, Milica
Filipović Đurđević, Dušica
Conference contribution (Published version)
Abstract
The majority of words can denote multiple related objects or phenomena, i.e. they have multiple
related senses and are so-called polysemes. Understanding this linguistic phenomenon is therefore of
high importance both in terms of linguistic inquiries and in terms of psychological studies of
cognitive mechanisms. Previous research demonstrated that, in addition to the number of
senses, processing is also influenced by the balance of sense probabilities (Filipović Đurđević &
Kostić, 2021). However, the resources for the study of lexical ambiguity are very sparse (e.g. a
database of 150 polysemous Serbian nouns; Filipović Đurđević & Kostić, 2017). Additionally,
most of these effects were demonstrated either within a single part-of-speech category
(typically nouns) or for ambiguous words with senses that span several parts of speech
(e.g. a record / to record; as pointed out by Eddington & Tokowicz, 2015). Therefore, the goal of
this paper is to present a new open database containing raw and categorized native speakers’
semantic intuitions for 308 Serbian polysemous words (100 nouns, 100 verbs, and 108 adjectives), along with
multiple indices quantifying their level of ambiguity.
For each of the polysemous words, we collected semantic intuitions of native speakers by using
the total meaning metric (Azuma, 1997). We then categorized the collected descriptions by
using three strategies: a) relying solely on semantic intuition, b) relying solely on dictionary
descriptions, and c) combining semantic intuitions and dictionary descriptions. Within each
strategy, we also monitored and investigated the effect of the coder (the researcher performing
the categorization) in order to explore the robustness of each approach. We then generated
the sense probability distributions for each word by counting the response frequencies across
created categories. In order to quantify the level of ambiguity, we calculated the number of
senses, redundancy, and entropy of the obtained sense probability distributions (Shannon,
1948; Filipović Đurđević & Kostić, 2017). Each measure, within each approach, was also
corrected for the effects of idiosyncratic senses, reflexive verbs, etc. This database will be
openly available and will provide a useful resource for ambiguity research. In the future, this
database should be expanded with measures derived from word embeddings (e.g. BERT; Wiedemann et
al., 2019) that separate different word senses. This would allow the level of
ambiguity to be quantified on large-scale text samples, which may yield more precise estimates of sense
numbers and sense probabilities and would make it possible to abandon the counting-of-senses
approach (as suggested by Filipović Đurđević et al., 2009). Comparing such embedding-based
measures with the existing ones may also provide another validation
point for the measures derived from human participants.
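
The quantification step described in the abstract (counting response frequencies per sense category, then computing the number of senses, entropy, and redundancy) can be illustrated with a minimal sketch. This is not the authors' code. The function names and the example labels are hypothetical, and redundancy is assumed to follow the standard information-theoretic definition, 1 - H / log2(N), where H is the Shannon entropy of the sense distribution and N is the number of senses. The corrections for idiosyncratic senses and reflexive verbs mentioned in the abstract are not modelled here.

```python
import math
from collections import Counter


def sense_distribution(labels):
    """Convert coded responses (one sense label per response) into probabilities."""
    if not labels:
        raise ValueError("at least one coded response is required")
    counts = Counter(labels)
    total = sum(counts.values())
    return {sense: n / total for sense, n in counts.items()}


def ambiguity_indices(labels):
    """Return number of senses, entropy (bits), and redundancy for one word.

    Redundancy is assumed to be 1 - H / log2(N). With a single sense the
    maximum entropy is 0, so redundancy is undefined and returned as NaN
    (a convention chosen for this sketch only).
    """
    probs = sense_distribution(labels)
    n_senses = len(probs)
    entropy = sum(-p * math.log2(p) for p in probs.values())
    if n_senses > 1:
        redundancy = 1 - entropy / math.log2(n_senses)
    else:
        redundancy = math.nan
    return {"n_senses": n_senses, "entropy": entropy, "redundancy": redundancy}


# Hypothetical coded responses for one word: sense A named twice, B and C once each.
indices = ambiguity_indices(["A", "A", "B", "C"])
print(indices)
```

For the hypothetical input above, the sense probabilities are 0.5, 0.25, and 0.25, which gives three senses and an entropy of 1.5 bits. A perfectly balanced distribution would have redundancy 0, and redundancy grows as one sense comes to dominate.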
Keywords:
open database / polysemous nouns / polysemous verbs / polysemous adjectives
Source:
Book of abstracts, 10th Novi Sad workshop on Psycholinguistic, neurolinguistic, and clinical linguistic research, April 22, Faculty of Philosophy, University of Novi Sad, 2023, 27-
Publisher:
- Faculty of Philosophy in Novi Sad
Funding / projects:
- Ministry of Science, Technological Development and Innovation of the Republic of Serbia, institutional funding - 200163 (University of Belgrade, Faculty of Philosophy) (RS-MESTD-inst-2020-200163)
Institution/group
Psihologija / Psychology
TY  - CONF
AU  - Mišić, Ksenija
AU  - Anđelić, Sara
AU  - Ilić, Lenka
AU  - Osmani, Dajana
AU  - Manojlović, Milica
AU  - Filipović Đurđević, Dušica
PY  - 2023
UR  - http://reff.f.bg.ac.rs/handle/123456789/5127
PB  - Faculty of Philosophy in Novi Sad
C3  - Book of abstracts, 10th Novi Sad workshop on Psycholinguistic, neurolinguistic, and clinical linguistic research, April 22, Faculty of Philosophy, University of Novi Sad
T1  - Open database of polysemous senses of 308 Serbian polysemous nouns, verbs, and adjectives
SP  - 27
UR  - https://hdl.handle.net/21.15107/rcub_reff_5127
ER  -
@conference{
author = "Mišić, Ksenija and Anđelić, Sara and Ilić, Lenka and Osmani, Dajana and Manojlović, Milica and Filipović Đurđević, Dušica",
year = "2023",
publisher = "Faculty of Philosophy in Novi Sad",
journal = "Book of abstracts, 10th Novi Sad workshop on Psycholinguistic, neurolinguistic, and clinical linguistic research, April 22, Faculty of Philosophy, University of Novi Sad",
title = "Open database of polysemous senses of 308 Serbian polysemous nouns, verbs, and adjectives",
pages = "27",
url = "https://hdl.handle.net/21.15107/rcub_reff_5127"
}
Mišić, K., Anđelić, S., Ilić, L., Osmani, D., Manojlović, M., & Filipović Đurđević, D. (2023). Open database of polysemous senses of 308 Serbian polysemous nouns, verbs, and adjectives. In Book of abstracts, 10th Novi Sad workshop on Psycholinguistic, neurolinguistic, and clinical linguistic research, April 22, Faculty of Philosophy, University of Novi Sad. Faculty of Philosophy in Novi Sad, 27. https://hdl.handle.net/21.15107/rcub_reff_5127
Mišić K, Anđelić S, Ilić L, Osmani D, Manojlović M, Filipović Đurđević D. Open database of polysemous senses of 308 Serbian polysemous nouns, verbs, and adjectives. In Book of abstracts, 10th Novi Sad workshop on Psycholinguistic, neurolinguistic, and clinical linguistic research, April 22, Faculty of Philosophy, University of Novi Sad. 2023:27. https://hdl.handle.net/21.15107/rcub_reff_5127
Mišić, Ksenija, Anđelić, Sara, Ilić, Lenka, Osmani, Dajana, Manojlović, Milica, Filipović Đurđević, Dušica, "Open database of polysemous senses of 308 Serbian polysemous nouns, verbs, and adjectives" in Book of abstracts, 10th Novi Sad workshop on Psycholinguistic, neurolinguistic, and clinical linguistic research, April 22, Faculty of Philosophy, University of Novi Sad (2023):27, https://hdl.handle.net/21.15107/rcub_reff_5127
Related items
Showing items related by title, author, creator and subject.
- Number, Relative Frequency, Entropy, Redundancy, Familiarity, and Concreteness of Word Senses: Ratings for 150 Serbian Polysemous Nouns
  Filipović Đurđević, Dušica; Kostić, Aleksandar (UNIVERSITY OF NOVI SAD, FACULTY OF PHILOSOPHY, 2017)
- Can a naive discrimination learning model classify inflected forms of polysemous nouns?
  Mišić, Ksenija; Filipović Đurđević, Dušica (Faculty of Philosophy in Novi Sad, 2021)
- Temporal dynamics of polysemous verb processing
  Mišić, Ksenija; Filipović Đurđević, Dušica (Institute for Psychology and Laboratory for Experimental Psychology, Faculty of Philosophy in Belgrade, 2020)