4 Results

The distribution of property types is analyzed both class-independently and within each class (separately for German and Italian), and an unsupervised clustering analysis based on property types is conducted.

Distributional Analysis

We first look at the issue of how comparable the German and Italian data are, starting with a check of the overlap at the level of specific properties.

There are 226 concept–property pairs that were produced by at least 10 German subjects; 260 pairs were produced by at least 10 Italian subjects. Among these frequent pairs, 156 (i. e., 69% of the German pairs and 60% of the Italian pairs) are shared across the two languages. This suggests that the two sets are quite similar, especially since the overlap of specific pairs is easily reduced by small differences in normalization (e. g., has a fur, has fur and is hairy count as completely different properties).
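The overlap figures above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `overlap_shares` and the toy pairs are assumptions introduced here, not part of the original analysis; only the counts 226, 260 and 156 come from the text.

```python
def overlap_shares(pairs_a, pairs_b):
    """Return (number of shared pairs, share of set A, share of set B).

    Each argument is a set of (concept, property) tuples.
    """
    shared = pairs_a & pairs_b
    return len(shared), len(shared) / len(pairs_a), len(shared) / len(pairs_b)


# Counts reported in the text.
n_german, n_italian, n_shared = 226, 260, 156
print(round(100 * n_shared / n_german))   # 69
print(round(100 * n_shared / n_italian))  # 60

# Toy illustration (invented pairs): a small normalization difference
# ("has fur" vs. "has a fur") already removes a pair from the overlap.
german = {("dog", "has fur"), ("dog", "barks")}
italian = {("dog", "has a fur"), ("dog", "barks")}
print(overlap_shares(german, italian))    # (1, 0.5, 0.5)
```

Because matching is done on exact strings, the measured overlap is a conservative estimate of how similar the two sets really are.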

Of greater interest to us is to check to what extent property types vary across languages and across concept classes. In order to focus on the main patterns emerging from the data, we limit our analysis to the 6 most common property types in the whole data set (that are also the top 6 types in the two languages separately), accounting for 69% of the overall responses. These types are:

• (external) part (WB code: ece; “dog has 4 legs”)

• (external) quality (WB code: ese; “apple is green”)

• behaviour (WB code: eb; “dog barks”)

• function (WB code: sf; “broom is for sweeping”)

• location (WB code: sl; “skyscraper is found in cities”)

• (superordinate) category

Figure 1 compares the distribution of property types in the two languages via a mosaic plot (Meyer et al., 2006), where rectangles have areas proportional to observed frequencies in the corresponding cells. The overall distribution is very similar. The only significant differences pertain to category and location types: Both differences are significant at the level p < 0.0001, according to a Pearson residual test (Zeileis et al., 2005).

For the difference in location, no clear pattern emerges from a qualitative analysis of German and Italian location properties. Regarding the difference in (superordinate) categories, we find, interestingly, a small set of more or less abstract hypernyms that are frequently produced by Italians, but never by Germans: construction (72), object (36), structure (16). In these cases, the Italian translations have subtle shades of meaning that make them more likely to be used than their German counterparts. For example, the Italian word oggetto (“object”) is used somewhat more concretely than the extremely abstract German word Objekt (or English “object”, for that matter) – in Italian, the word might carry more of an “artifact, man-made item” meaning. At the same time, oggetto is less colloquial than German Sache, and thus more amenable to being entered in a written definition. In addition, among others, the category vehicle was more frequent in the Italian than in the German data set (one reason for this could be the difference between the German and Italian equivalents, which was discussed in section 3.3). Differences of this sort remind us that property elicitation is first and foremost a verbal task, and as such it is constrained by language-specific usages. It is left to future research to test to what extent linguistic constraints also affect deeper conceptual representations (would Italians be faster than Germans at recognizing superordinate properties of concepts when they are expressed non-verbally?).

Despite the differences we just discussed, the main trend emerging is one of essential agreement between the two languages, and indicates that, with some caveats, salient property types may be cross-linguistically robust. We thus turn to the issue of how such types are distributed across concepts of different classes. This question is answered visually by the association plots in figure 2.

Each plot illustrates, through rectangle heights, how much each cell deviates from the value expected given the overall contingency tables (in our case, the reference contingency tables are the language-specific distributions). The sign of the deviation is coded by direction with respect to the baseline. For example, the first row of the left plot tells us, among other things, that in German behaviour properties are strongly over-represented in mammals, whereas function properties are under-represented within this class.
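The deviations described above can be illustrated with a short sketch of Pearson residuals for a contingency table. It assumes the standard independence model, in which the expected count of a cell is row total times column total divided by the grand total; the function name and the counts are invented for illustration and are not taken from our data or from the software used to draw the plots.

```python
import math


def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected).

    `table` is a list of rows of observed counts. Expected counts assume
    independence of rows and columns (an assumption of this sketch).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals


# Hypothetical counts: rows = concept classes (mammal, tool),
# columns = property types (behaviour, function).
counts = [[30, 10],
          [10, 30]]
for label, row in zip(["mammal", "tool"], pearson_residuals(counts)):
    print(label, [round(r, 2) for r in row])
```

A positive residual corresponds to a rectangle above the baseline (over-representation), a negative one to a rectangle below it (under-representation).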

The first observation we can make about figure 2 is that, for both languages, a large proportion of cells show a significant departure from the overall distribution. This confirms what has already been observed and reported in the literature on English norms – see, in particular, Vinson et al. (2008): property types are highly distinctive characteristics of concept classes.

The class-specific distributions are extremely similar in German and Italian. There is not a single case in which the same cell deviates significantly but in opposite directions in the two languages; and the most common pattern by far is the one in which the two languages show the same deviation profile across cells, often with very similar effect sizes (compare, e. g., the behaviour and function columns). These results suggest that property types are not much affected by linguistic factors, an intrinsically interesting finding that also supports our idea of structuring relation-based navigation in a multi-lingual dictionary using concept-class–specific property types.

The type patterns associated with specific concept classes are not particularly surprising, and they have already been observed in previous studies (Vinson and Vigliocco, 2008; Baroni and Lenci, 2008). In particular, living things (animals and plants) are characterised by a paucity of functional features, which instead characterise all man-made concepts. Within the living things, animals are characterised by typical behaviours (they bark, fly, etc.) and, to a lesser extent, parts (they have legs, wings, etc.), whereas plants are characterised by a wealth of qualities (they are sweet, yellow, etc.).

Differences are less pronounced within man-made objects, but we can observe parts as typical of tool and furniture descriptions. Finally, location is a more typical definitional characteristic of buildings (for clothing, nothing stands out, except, perhaps, the pronounced lack of association with typical locations). Body parts, interestingly, have a type profile that is very similar to that of (manipulable) tools – manipulable objects are, after all, extensions of our bodies.

Clustering by Property Types

The distributional analysis presented in the previous section confirmed our main hypotheses – that property types are salient properties of concepts that differ from one concept class to another, but are robust across languages. However, we did not take skewing effects associated with specific concepts into account (e. g., it could be that, say, the property profile we observe for body parts in figure 2 is really a deceptive average of completely opposite patterns associated with, say, heads and hands).

Moreover, our analysis already assumed a division into classes – but the type patterns, e. g., of mammals and birds are very similar, suggesting that a higher-level “animal” class would be more appropriate when structuring concepts in terms of type profiles. We tackled both issues in an unsupervised clustering analysis of our 50 target concepts based on their property types. If the postulated classes are not internally coherent, they will not form coherent clusters. If some classes should be merged, they will cluster together.

Concepts were represented as 6-dimensional vectors, with each dimension corresponding to one of the 6 common types discussed above, and the value on a dimension given by the number of times that concept triggered a response of the relevant type. We used the CLUTO toolkit, selecting the rbr method and setting all other clustering parameters to their default values. We explored partitions into 2 to 10 clusters, manually evaluating the output of each solution.
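To make the vector representation concrete, the following sketch builds 6-dimensional type-count vectors for three concepts and compares them with cosine similarity. It does not reproduce CLUTO's rbr algorithm; the use of cosine similarity is an assumption made for this illustration, and all counts are invented rather than taken from our data.

```python
import math

# Order of the dimensions: the six common property types.
TYPES = ["category", "part", "quality", "behaviour", "function", "location"]


def cosine(u, v):
    """Cosine similarity between two count vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical response counts per type (invented for illustration).
profiles = {
    "dog":    [20, 15, 5, 30, 2, 3],
    "eagle":  [18, 14, 6, 28, 0, 6],
    "hammer": [15, 12, 4, 0, 35, 2],
}

print(round(cosine(profiles["dog"], profiles["eagle"]), 2))
print(round(cosine(profiles["dog"], profiles["hammer"]), 2))
```

With counts of this shape, the two animals have nearly parallel vectors, whereas the tool, dominated by function responses, points in a different direction. A clustering method that groups concepts by such similarities would therefore tend to separate animals from tools.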

Both in Italian and in German, the best results were obtained with a 3-way partition, neatly corresponding to the division into animals (mammals and birds), plants (vegetables and fruits) and objects plus body parts (which, as we observed above, have a distribution of types very similar to that of tools). The 2-way solution resulted in merging two of the classes, animals and plants, both in German and in Italian. The 4-way solution led to an arbitrary partition among objects and body parts (and not, as one could have expected, to a separation of objects from body parts). Similarly, the 5- to 10-way solutions involve increasingly granular but still arbitrary partitions within the objects/body parts class. However, one notable aspect is that in most cases almost all concepts of mammals and birds, and of vegetables and fruits, are clustered together (both in German and Italian), expressing their strong similarity in terms of property types as compared to the other classes as defined here.

Looking at the 3-way solution in more detail, in Italian, the concept horse is in the same cluster as objects and body parts (as opposed to German, where the solution is perfect). The misclassification results mainly from the fact that many functional properties (a feature of objects) were obtained for horse, but none for the other animals in the Italian data.

In German, some functional properties were assigned to both horse and dog, which might explain why horse was not misclassified there.

To conclude, the type profiles associated with animals, vegetables and objects/body parts have enough internal coherence that they robustly identify these macro-classes in both languages. Interestingly, a 3-way distinction of this sort – excluding body parts – is seen as fundamental on the basis of neuro-cognitive data by Caramazza and Shelton (1998). On the other hand, we did not find evidence that more granular distinctions could be made based on the few (6) and very general types we used. We plan to explore the distribution across the remaining types in the future (preliminary clustering experiments show that much more nuanced discriminations, even among all 10 categories, can be made if we use all types). However, for our applied purposes, it is sensible to focus on relatively coarse but well-defined classes, and on just a few common relation types (alternatively, we plan to combine types into superordinate ones, e. g. external and internal quality). This should simplify both the automatic harvesting of corpus-based properties of the target types and the structuring of the dictionary relational interface.

Finally, the peculiar object-like behaviour of body parts on the one hand, and the special nature of horse, on the other, should remind us that concept classification is not a trivial task, once we try to go beyond the most obvious categories typically studied by cognitive scientists – animals, plants, manipulable tools. From a lexicographic perspective, this problem cannot be avoided, and, indeed, the proposed approach will have to scale to even trickier domains, such as those of actions or emotions.


Conclusion

This research is part of a project that aims to investigate the cognitive salience of semantic relations for (pedagogical) lexicographic purposes. The resulting most salient relations are to be used for revising and adding to the word field entries of a multilingual electronic dictionary in a language learning environment.

We presented a multi-lingual concept description experiment. Participants produced different semantic relation type patterns across concept classes. Moreover, these patterns were robust across the two native languages studied in the experiment – even though a closer look at the data suggested that linguistic constraints might affect (verbalisations of) conceptual representations (and thus, to a certain extent, which properties are produced). This is a promising result to be used for automatically harvesting semantically related words for a given lexical entry of a concept class.

However, the granularity of concept classes has to be defined. In addition, to yield a larger amount of usable data for the analysis, a re-mapping of the rare semantic relation types occurring in the current data set should be conducted. Moreover, the stimulus set will have to be expanded to include, e. g., abstract concepts – although we hope to mine some abstract concept classes on the basis of the properties of our concept set (colors, for example, could be characterized by the concrete objects of which they are typical).

To complement the production experiment results, we aim to conduct an experiment which investigates the perceptual salience of the produced semantic relations (and possibly additional ones), in order to detect inconsistencies between generation and retrieval of salient properties. If, as we hope, we will find that essentially the same properties are salient for each class across languages and both in production and perception, we will then have a pretty strong argument to suggest that these are the relations one should focus on when populating multi-lingual dictionaries.

Of course, the ultimate test of our approach will come from empirical evidence of the usefulness of our relation links to the language learner. This is, however, beyond the scope of the current project.


Information about the work «Cognitive aspects of lexicon in the light of the language picture of the world»