AnlamVer Dataset

Word Similarity and Relatedness Dataset for Turkish.
See the paper "AnlamVer: Semantic Model Evaluation Dataset for Turkish - Similarity and Relatedness" for details.


Download Final annotated dataset: anlamver-final.cvs

Download Individual scores of each annotator: anlamver-participants.cvs

This dataset was annotated using the open-source software WSQuest.

Column Names

| Column Abbr. | Column Name | Note |
|---|---|---|
| QID | QuestionID | |
| W1 | Word1 | |
| W2 | Word2 | |
| Sim | Similarity | Participants' average |
| Rel | Relatedness | Participants' average |
| S | Similar | Is in (similar) sub-space in Sim-Rel vector space. |
| D | Dissimilar | "" |
| R | Related | "" |
| U | Unrelated | "" |
| SR | SimilarRelated | "" |
| DR | DissimilarRelated | "" |
| SU | SimilarUnrelated | "" |
| DU | DissimilarUnrelated | "" |
| AVG-C | Average concreteness | Individual concreteness values from TKN dataset |
| W1F | Word1 frequency | Frequency values based on Boun Corpus |
| W2F | Word2 frequency | Frequency values based on Boun Corpus |
| AnyOOV | Any out-of-vocabulary (OOV) word exists | OOV values are based on Boun Corpus |
| Two | Are both words OOV | OOV values are based on Boun Corpus |
| EstSyn | EstimatedSynonym | Word pair estimated as synonym relation type before the annotation |
| EstAny | EstimatedAntonym | "" |
| EstRHigh | EstimatedHighRelatedness | "" |
| EstRMed | EstimatedMediumRelatedness | "" |
| EstRLow | EstimatedLowRelatedness | "" |
| EstHyp | EstimatedHyponym | "" |
| EstMer | EstimatedMeronym | "" |
| W1-RWG | RareWord (RW) group of word1 | See paper for RW groups. RW groups are assigned by word frequency values. |
| W2-RWG | RareWord (RW) group of word2 | "" |
| RWMin | Minimum group of two words in the word pair | "" |
| W1-DG | Derivational group of word1 | Value represents how many derivations the word has |
| W2-DG | Derivational group of word2 | -- |
| DGMax | Max of derivational groups | Max(W1-DG,W2-DG) |
| W1-IG | Inflectional group of word1 | Value represents how many inflections the word has |
| W2-IG | Inflectional group of word2 | -- |
| IGMax | Max of inflectional groups | Max(W1-IG,W2-IG) |
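
As a rough illustration of how these columns might be used, the sketch below reads a few rows with Python's standard csv module, keeps only pairs without OOV words, and checks that DGMax equals Max(W1-DG,W2-DG). Everything about the sample is an assumption made for the example: the comma delimiter, the header row, the 0/1 encoding of AnyOOV, and the two rows themselves, whose words and scores are invented and not taken from the dataset.

```python
import csv
import io

# Hypothetical two-row sample using a subset of the documented columns.
# The words and values are invented for illustration; the comma delimiter,
# the header row, and the 0/1 encoding of AnyOOV are assumptions.
SAMPLE = """QID,W1,W2,Sim,Rel,AnyOOV,W1-DG,W2-DG,DGMax
1,araba,otomobil,9.2,9.5,0,0,0,0
2,kitap,kitaplık,4.1,8.7,1,0,1,1
"""


def load_pairs(handle):
    """Read word pairs and convert the numeric columns used below."""
    pairs = []
    for row in csv.DictReader(handle):
        row["Sim"] = float(row["Sim"])
        row["Rel"] = float(row["Rel"])
        for name in ("AnyOOV", "W1-DG", "W2-DG", "DGMax"):
            row[name] = int(row[name])
        pairs.append(row)
    return pairs


pairs = load_pairs(io.StringIO(SAMPLE))

# Keep only pairs in which neither word is out-of-vocabulary.
in_vocab = [p for p in pairs if p["AnyOOV"] == 0]

# DGMax is documented as Max(W1-DG,W2-DG); verify that on the sample rows.
dgmax_ok = all(p["DGMax"] == max(p["W1-DG"], p["W2-DG"]) for p in pairs)

print([(p["W1"], p["W2"], p["Sim"], p["Rel"]) for p in in_vocab])
print(dgmax_ok)
```

To try this on the downloaded file, replace the in-memory sample with an opened file handle and adjust the delimiter and value parsing to whatever the file actually contains.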


If you use this resource in your research, please cite the following paper: