Monday, March 5, 2007

syntax

In linguistics, syntax is the study of the rules, or "patterned relations", that govern the way words combine to form phrases and phrases combine to form sentences. The word originates from the Greek words συν (syn), meaning "co-" or "together", and τάξις (táxis), meaning "sequence, order, or arrangement". The combinatory behavior of words is governed to a first approximation by their part of speech (noun, adjective, verb, etc., a categorization that goes back in the Western tradition to the Greek grammarian Dionysius Thrax). Modern research into natural language syntax attempts to systematize descriptive grammar and, for many practitioners, to find general laws that govern the syntax of all languages. It is unconcerned with prescriptive grammar (see prescription and description).

Theories of syntax differ in their object of study. While formal grammars (especially in the generative grammar tradition) have focused on the mental process of language production (i-language), empirical grammars have focused on linguistic function, explaining the language in use (corpus linguistics). The latter often encode frequency data in addition to production rules, and provide mechanisms for learning the grammar (or at least its probabilities) from usage data. One way of considering the space of grammars is to distinguish those that do not encode rule frequency (the majority) from those that do (probabilistic grammars).

From a biological and neurobiological perspective, syntax has recently come to play a crucial role. On the one hand, it has been argued that syntax (in that it involves recursive rules) is a characteristic of all and only human languages; on the other, neuroimaging experiments have shown that a dedicated network in the human brain (crucially involving Broca's area, a portion of the left inferior frontal gyrus) is selectively activated by languages that meet the Universal Grammar requirements characterizing all and only human languages, as shown by generative grammar in the pioneering work of Noam Chomsky.

History of syntax

Syntax, literally "composition", is an Ancient Greek word, whereas the names of other domains of linguistics, such as semantics and morphology, are recent (19th century). The history of the field is rather complicated. Two landmarks are the first complete Greek grammar, written by Dionysius Thrax in the 1st century BC (a model for Roman grammarians, whose work led to the medieval and Renaissance vernacular grammars), and the Grammaire of Port-Royal (a Cistercian convent in the Vallée de Chevreuse southwest of Paris that launched a number of culturally important institutions). The central role of syntax within theoretical linguistics became clear only in the last century, which could reasonably be called the "century of syntactic theory" as far as linguistics is concerned. For a detailed and critical survey of the history of syntax in the last two centuries, see the monumental work by Graffi (2001).[1]

Formal syntax

There are many theories of formal syntax, theories that have risen and fallen in influence over time. Most theories of syntax share at least two commonalities. First, they hierarchically group subunits into constituent units (phrases). Second, they provide some system of rules to explain patterns of acceptability/grammaticality and unacceptability/ungrammaticality. Most formal theories of syntax also offer explanations of the systematic relationship between syntactic form and semantic meaning. The earliest framework of semiotics was established by Charles W. Morris in his 1938 book Foundations of the Theory of Signs. Within the study of signs, syntax is the first of three subfields: the study of the interrelation of signs. The second, semantics, is the study of the relation between signs and the objects to which they apply. The third, pragmatics, studies the relationship between the sign system and its users.

In the framework of transformational-generative grammar (of which government and binding theory and minimalism are recent developments), the structure of a sentence is represented by phrase structure trees, otherwise known as phrase markers or tree diagrams. Such trees provide information about the sentences they represent by showing the hierarchical relations between their component parts.
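
As a concrete illustration, a phrase marker can be modelled as a nested structure in which each node carries a category label and its immediate constituents. The sketch below is a minimal, hypothetical encoding in Python; the sentence, the category labels, and the helper functions are illustrative assumptions, not part of any particular theory.

```python
# A phrase marker as nested (label, children) tuples; a toy sketch only.
# The sentence and the category labels (S, NP, VP, Det, N, V) are
# illustrative assumptions, not a claim about any particular theory.

def tree(label, *children):
    """Build a node: a category label plus its immediate constituents."""
    return (label, list(children))

# "The dog chased the cat":  S -> NP VP;  NP -> Det N;  VP -> V NP
sentence = tree("S",
    tree("NP", tree("Det", "the"), tree("N", "dog")),
    tree("VP",
        tree("V", "chased"),
        tree("NP", tree("Det", "the"), tree("N", "cat"))))

def show(node, depth=0):
    """Print the hierarchy, one constituent (or word) per line."""
    if isinstance(node, tuple):
        label, children = node
        print("  " * depth + label)
        for child in children:
            show(child, depth + 1)
    else:
        print("  " * depth + node)  # a leaf: the word itself

show(sentence)
```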

There are various theories for designing the best grammars, such that by systematic application of the rules one can arrive at every phrase marker in a language, and hence at every sentence in the language. The most common are phrase structure grammars, preferred by Noam Chomsky's MIT school of linguistics, and ID/LP grammars, which some argue have an explanatory advantage (especially those in opposition to the MIT school, such as Ivan Sag and Geoffrey Pullum). Dependency grammar is a class of syntactic theories, separate from generative grammar, in which structure is determined by the relation between a word (a head) and its dependents. One difference from phrase structure grammar is that dependency grammar has no phrasal categories, as the sketch below illustrates. Algebraic syntax is a type of dependency grammar.
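
To make that difference concrete, here is a hypothetical dependency analysis of the same toy sentence used above: only words and head-dependent links, with no phrasal nodes. The relation labels and the analysis itself are illustrative assumptions.

```python
# "The dog chased the cat" as head-dependent relations: no phrasal
# categories, only words and the links between them. The relation
# labels and this particular analysis are invented for illustration.
words = ["the", "dog", "chased", "the", "cat"]   # positions 1..5
heads = {1: 2, 2: 3, 3: 0, 4: 5, 5: 3}           # word -> its head (0 = root)
labels = {1: "det", 2: "subj", 3: "root", 4: "det", 5: "obj"}

for i, word in enumerate(words, start=1):
    head = "ROOT" if heads[i] == 0 else words[heads[i] - 1]
    print(f"{word} --{labels[i]}--> {head}")
```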

A modern approach to combining accurate descriptions of the grammatical patterns of language with their function in context is that of systemic functional grammar, an approach originally developed by Michael A.K. Halliday in the 1960s and now pursued actively on all continents. Systemic-functional grammar is related both to feature-based approaches such as Head-driven phrase structure grammar and to the older functional traditions of European schools of linguistics such as British Contextualism and the Prague School.

Tree adjoining grammar is a grammar formalism with interesting mathematical properties that has sometimes been used as the basis for the syntactic description of natural language. In monotonic and monostratal frameworks, variants of unification grammar are often the preferred formalisms.
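
The core operation behind unification grammars is merging two partial feature structures into one consistent structure, failing when they conflict. The following is a minimal sketch, assuming feature structures are plain nested dictionaries; real formalisms add re-entrancy, typing, and much more.

```python
# Minimal feature-structure unification over nested dicts; a sketch only.
# Real unification grammars add re-entrancy (shared substructures),
# typed features, and underspecification.

def unify(a, b):
    """Merge two feature structures; return None if they conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None          # conflicting values: unification fails
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return a if a == b else None         # atomic values must match exactly

# A verb demanding a third-person-singular subject unifies with one
# candidate subject (the feature values are invented) ...
verb = {"subj": {"num": "sg", "pers": "3"}}
np   = {"subj": {"num": "sg", "cat": "NP"}}
print(unify(verb, np))   # {'subj': {'num': 'sg', 'pers': '3', 'cat': 'NP'}}

# ... but fails against a plural one.
print(unify(verb, {"subj": {"num": "pl"}}))  # None
```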

With the publication of Gold's theorem[2] in 1967, it was claimed that grammars for natural languages governed by deterministic rules could not be learned from positive instances alone. This became part of the argument from the poverty of the stimulus, first presented in 1980.[3] It led to the nativist view that a form of grammar (including, in certain versions, a complete conceptual lexicon) is hardwired from birth.

Empirical approaches to syntax

A grammar is a description of the syntax of a language. Theoretical models rarely consider the language in use, as revealed by corpus linguistics, but focus on a mental language, or i-language, as their "proper" object of study. In contrast, the "empirically responsible"[4] approach to syntax seeks to construct grammars that explain language in use. A key class of grammars in the latter tradition is that of the stochastic context-free grammars.

A problem faced by any formal syntax is that often more than one production rule may apply to a structure, resulting in a conflict. The greater the coverage, the more frequent such conflicts become, and all grammarians (starting with Pāṇini) have spent considerable effort devising prioritizations for the rules, which usually turn out to be defeasible. Another difficulty is overgeneration, where unlicensed structures are also generated. Probabilistic grammars circumvent both problems by using the frequency of the various productions to order them, resulting in a "most likely" (winner-take-all) interpretation, which is by definition defeasible given additional data; the sketch below illustrates the idea. As usage patterns shift diachronically, these probabilistic rules can be re-learned, thus updating the grammar.
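
As a hypothetical sketch of winner-take-all disambiguation: score each competing analysis by the product of the probabilities of the rules it uses, and keep the most likely one. The grammar fragment, the two analyses of the classic prepositional-phrase attachment ambiguity, and all the numbers below are invented for illustration.

```python
import math

# Two competing analyses of "saw the man with the telescope".
# Rules and probabilities are invented; lexical rules are omitted.
rule_prob = {
    "VP -> V NP":    0.5,
    "VP -> V NP PP": 0.2,
    "NP -> Det N":   0.6,
    "NP -> NP PP":   0.2,
    "PP -> P NP":    1.0,
}

# Each parse is represented simply as the multiset of rules it uses.
parse_high = ["VP -> V NP PP", "NP -> Det N", "PP -> P NP",
              "NP -> Det N"]                      # PP modifies the verb
parse_low  = ["VP -> V NP", "NP -> NP PP", "NP -> Det N",
              "PP -> P NP", "NP -> Det N"]        # PP modifies the noun

def score(parse):
    """Log-probability of a parse: sum of log rule probabilities."""
    return sum(math.log(rule_prob[r]) for r in parse)

best = max([parse_high, parse_low], key=score)
print("winner-take-all choice:",
      "high attachment" if best is parse_high else "low attachment")
```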

One may construct a probabilistic grammar from a traditional formal syntax by assigning each production a probability, so that the productions expanding a given non-terminal form a probability distribution, to be eventually estimated from usage data. On most broad samples of language, probabilistic grammars that tune these probabilities from data typically outperform hand-crafted grammars (although some rule-based grammars are now approaching the accuracy of PCFGs).
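
One standard way to estimate those probabilities is relative-frequency (maximum likelihood) estimation: divide the count of each production by the count of its left-hand side, so the probabilities for each non-terminal sum to one. The treebank counts below are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented production counts, as if read off a small treebank.
counts = Counter({
    ("NP", ("Det", "N")): 700,
    ("NP", ("NP", "PP")): 200,
    ("NP", ("Pronoun",)): 100,
})

# Relative-frequency (maximum likelihood) estimate:
#   P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
lhs_totals = defaultdict(int)
for (lhs, rhs), n in counts.items():
    lhs_totals[lhs] += n

prob = {(lhs, rhs): n / lhs_totals[lhs] for (lhs, rhs), n in counts.items()}

for (lhs, rhs), p in sorted(prob.items(), key=lambda kv: -kv[1]):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")   # 0.70, 0.20, 0.10
```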

Recently, probabilistic grammars appear to have gained some cognitive plausibility. It is well known that there are degrees of difficulty in accessing different syntactic structures (e.g., the Accessibility Hierarchy for relative clauses). Probabilistic versions of minimalist grammars have been used to compute information-theoretic entropy values that appear to correlate well with psycholinguistic data on comprehension and production difficulty.[5]
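
The entropy in question is the standard information-theoretic quantity H = -Σ p log₂ p. As a hypothetical illustration, one can compare the entropy of a parser's distribution over live analyses at two points in a sentence; the distributions below are invented, and the link to processing difficulty is the correlation reported above, not something this sketch demonstrates.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented distributions over possible analyses at two points in a sentence.
# Higher entropy means more live analyses, i.e. more uncertainty.
early = [0.4, 0.3, 0.2, 0.1]   # many analyses still open
late  = [0.9, 0.1]             # nearly disambiguated
print(f"early: {entropy(early):.2f} bits, late: {entropy(late):.2f} bits")
```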

Statistical grammars are not subject to Gold's theorem, since their learning is incremental: they need only converge probabilistically on the target grammar rather than identify it exactly from positive instances alone.
