|
Zipf's Law: |
Language has a lot of rare phenomena: New words in English text |
|---|---|
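As a rough illustration of the Zipf's Law point above, the sketch below ranks words by frequency and measures how many word types occur only once. Zipf's Law says that a word's frequency is roughly inversely proportional to its frequency rank, so most types are rare. The function names `rank_frequency` and `hapax_fraction`, the regex tokenizer, and the toy sentence are all assumptions made for illustration; they are not from the lecture.

```python
from collections import Counter
import re


def tokenize(text):
    """Very crude tokenizer (an assumption): lowercase runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokenize(text))
    # most_common() sorts by descending count; ties keep first-seen order
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]


def hapax_fraction(text):
    """Fraction of distinct word types that occur exactly once."""
    counts = Counter(tokenize(text))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Toy input, far too small to show a real Zipfian curve
sample = "the cat sat on the mat and the dog sat on the log"
table = rank_frequency(sample)
for rank, word, count in table:
    print(rank, word, count)
print("hapax fraction:", hapax_fraction(sample))
```

On a real corpus, plotting log(count) against log(rank) should give a roughly straight line. The long tail of words seen only once is what makes new words so common in fresh text.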
| Negative Data Issues: |
Probabilistic models work with pruning, because the spaces searched are so large. Hypotheses that receive little support (those predicting low-frequency phenomena) can be eliminated. Consider the problem of explaining exceptions to productive English valence patterns:
Goldberg (94) uses this kind of argumentation, relativizing the counting to a theory of markedness of contexts:
|
|
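The pruning idea described under Negative Data Issues can be sketched as a simple beam: drop any hypothesis whose probability falls below a threshold, then keep only the best few. This is a minimal sketch under stated assumptions, not any particular system's method. The function `prune`, its parameters `beam_width` and `min_prob`, and the hypothesis labels and scores are hypothetical.

```python
def prune(hypotheses, beam_width=3, min_prob=0.01):
    """Keep at most beam_width hypotheses with probability >= min_prob.

    hypotheses: dict mapping a hypothesis label to its probability.
    Returns a dict ordered from most to least probable.
    """
    # Step 1: discard hypotheses with too little support
    survivors = [(h, p) for h, p in hypotheses.items() if p >= min_prob]
    # Step 2: keep only the best beam_width of what remains
    survivors.sort(key=lambda hp: hp[1], reverse=True)
    return dict(survivors[:beam_width])


# Hypothetical scores for four competing analyses
scores = {
    "frame_a": 0.60,
    "frame_b": 0.30,
    "frame_c": 0.095,
    "frame_d": 0.005,  # low-frequency pattern: little support in the data
}
kept = prune(scores, beam_width=3, min_prob=0.01)
print(kept)
```

Here `frame_d` is eliminated because it received little support, without anyone having observed explicit evidence against it.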
Corpus-based Linguistics |
J. R. Firth: "You shall know a word by the company it keeps." Lumpers versus splitters: once you start looking at corpora, the splitters have a point. The very gross distributional categories of generative grammar (e.g., Noun[num=sing, count=mass, relational=yes]) don't determine the collocational company a word keeps:
|
|
Methodological Qualms |
Are grammaticality intuitions really that important? Examples in the textbook, p. 10, from van Riemsdijk and Williams (1986):
|
| BUT |
It's folly to believe that corpus-based work will relieve us of the need to make theoretical assumptions, at least if we work on meaning. Geoffrey Sampson, rabid anti-Chomskian, distributes a corpus, available here, called Susanne, which is a balanced Brown-like corpus, with TREES. Where did all those trees come from?
|
| Non-categorical phenomena | |
| Computational issues |
A language system must be able to disambiguate:
|