Language and Computers


Parsing

This is top-down parsing, in the simplest form.
Phrase-
structure
Rules
 

  • S -> NP (M) VP | VP
  • VP -> V NP
  • VP -> V PP
  • NP -> (Art) (Adj) (N) N | ProperName
  • S -> VP (M) VP
  • VP -> is/are Adj
  • PP -> P NP

Good things:

  1. One set of rules for NP, whether in subject position or not.
  2. Lots of sentences we haven't seen before now "generated"
Grammar
Modification
 

Adverb data:

  • Certainly/*completely the team can rely on my support.
  • The team can certainly/completely rely on my support.
  • The team certainly/*completely can rely on my support.
  • The team can rely *certainly/completely on my support.
  • The team can rely on my support certainly/completely.

How can we modify our little set of phrase-structure rules to account for these facts?

    Modifying the grammar:
  1. New Categories:
    1. S-adverb (SAdv)
    2. VP-adverb (VPAdv)
  2. Add or modify existing PS rules to account for data:
    • S -> (SAdv) NP (SAdv) (M) (SAdv) VP (SAdv)
    • VP -> (VPAdv) VP V (VPAdv) PP (VPAdv)
Lexicon  

For our purposes now, we need rules that introduce words

  1. ProperName -> John | Mary | Richard M. Nixon
  2. Noun(N) -> boy | team | support | flies | rice | planes
  3. Adjective(Adj) -> tall | flying | dangerous
  4. Verb(V) -> likes | hates | hate | like | rely | relies | fly | flies
  5. Preposition(P) -> on
  6. Article(Art) -> a | the | my
  7. Modals(M) -> can
Another term for word is terminal, because we have finished specifying that portion of a tree when we apply such a rule.
Derivation
(not parsing)
  Some trees obey the rules of our grammar, some do not. A derivation is a little step by step proof or demonstration that a particular tree obeys the rules of the grammar.

That portion of the derivation is completed or terminated when a word is used; hence terminal node.

Context Free grammars are an important part of our Theory of Computation, in the sense of Marr. Derivations are just a procedure for veriyfing that something is in grammar or out; they are NOT an algorithm or progarm for parsing.

In particular, we don't know how to assign a permissible structure to a new sentence. That's parsing .

Finite-
State Grammars
 

Restrict rules so that non-terminals must occur at the beginning or at the end of a rule.

Then we have a Finite-State Grammar. Finite-state grammars can be parsed using a particularly simple kind of computational model called finite-state automaton.

These are more restricted than cointext-free grammars, hence a proper subset of tehm. They are easier to process and to parse. We'll come back to them later.

Top-Down
Parsing
 

Think of this as an algorithm that takes one particular view of how to do a derivation. Suppose we are trying to parse the senternce Richard M. Nixon likes John:

  1. S -> NP VP
  2. NP -> ProperName
  3. ProperName -> John [terminal]
  4. VP -> V NP
  5. V -> likes
  6. NP -> ProperName
  7. ProperName -> Richard M. Nixon.
Bottum-up
PArsing
 
  1. ProperName -> John [terminal]
  2. NP -> ProperName
  3. V -> likes
  4. ProperName -> Richard M. Nixon.
  5. NP -> ProperName
  6. VP -> V NP
  7. S -> NP VP
Parser states  

First we add positions to our representation of the input.

    1 John 2 likes 3 Richard 4 M. 5 Nixon 6
The word "John" starts at position 1, ends at position 2; the word "Richard M. Nixon" starts at position 4 and ends at position 6.

A parser state is the current goal of the computation. It consists of two pieces of information:

  1. What we're looking for: A category
  2. Where we're looking for it: a position

We start in the following state:

  1. S
  2. 1
We're looking for an S that starts at position 1.
Step Curr. State Backup
States
Rewrite/
Found
1. ((S) 1)   initial position
2. ((NP VP) 1) ((VP) 1) S -> NP VP
3. ((Art N VP) 1) ((Art Adj N VP) 1)
((ProperName VP) 1)
((VP) 1)
NP -> Art N (current state)
NP -> Art Adj N (backup state)
NP -> ProperName (backup state)
4. ((Art Adj N VP) 1) ((ProperName VP) 1)
((VP) 1)
Backing up!!
"John" is not an Art
5. ((ProperName VP) 1) ((VP) 1) Backing up!!
"John" is not an Art
6. ((VP) 2) ((VP) 1) Found: (John ProperName 1 2)
7. ((V NP) 2) ((V PP) 2)
((VP) 1)
VP -> V NP
VP -> V PP
8. ((NP) 3) ((V PP) 2)
((VP) 1)
Found: (likes V 2 3)
9. ((Art N) 3) ((Art Adj N VP) 3)
((ProperName VP) 3)
((V PP) 2)
((VP) 1)
NP -> Art N (current state)
NP -> Art Adj N (backup state)
NP -> ProperName (backup state)
10. ((Art Adj N VP) 3) ((ProperName VP) 3)
((V PP) 2)
((VP) 1)
Backing up!!
"Richard" is not an Art
11. ((ProperName VP) 3) ((V PP) 2)
((VP) 1)
Backing up!!
"Richard" is not an Art
12. ((ProperName VP) 3) ((V PP) 2)
((VP) 1)
Backing up!!
"Richard" is not an Art
13. (6) ((V PP) 2)
((VP) 1)
Found: ("Richard M. Nixon" ProperName 3 6)
Done!!
14. ((V PP) 2) ((VP) 1) Backing up!!
Keep looking for more !!
15. ((PP) 3) ((VP) 1) Found: (V likes 2 3)
15. ((P NP) 3) ((VP) 1) PP -> P NP
16. (VP 1)   Backing up !!