Methods in Statistical NLP

Earley Parsing


Complete Edges and Incomplete Edges

In class we identified two problems for naive depth first topdown parsers:

  1. Left recursion led to infinite loops.
  2. Sometimes work had to be done over because we kept no record of where we had found constituents.

We solve both problems with the idea of the chart.

  1. We attach to our set of words a set of indices, which we will use to let us know where constituents start and stop.
  2. We keep track of what we have built with what we call complete edges in the chart, noting where a contituent stops and where it starts. For the lexical complete edges in the above we want:
  3. We keep track of what we are looking for with incomplete edges, saying what we are looking for and where it starts (we don't know where it ends yet). We start out looking for an S at 0.
  4. Matters are complicated slightly in the Earley algorithm. We keep incomplete and incomplete edges in a single structure (called a parser state in your text). This records what rule we're using to find a category, and how far we've gotten. As long as we haven't gotten all the way through the rule, the edge is incomplete; when we've found the last constituent in the rule, it's complete. Some Earley edges (parser states):
    1. S => • NP VP [0,0]: Incomplete. We're trying to build an S that starts at 0 using the NP VP rule. We've found nothing yet (the dot is in first position), we're currently looking for an NP that starts at 0.
    2. S => NP • VP [0,1]: Incomplete. We're trying to build an S that starts at 0 using the NP VP rule. We've found an NP (the dot is in second position), we're currently looking for an VP that starts at 0.
    3. S => NP VP • [0,3]: Complete. We've succeeded in building an S that starts at 0 using the NP VP rule. It ends at 3 (the dot is in the last position), we're currently looking for an VP that starts at 0.

The Earley Algorithm

The essential ideas.

  1. Initialize. To look for an S at 0, we add the following incomplete edge at 0, which uses a dummy category gamma,
  2. Predictor: For each edge of the form
      X => alpha • B beta [k, j],
    where B is NOT a lexical category, we add incomplete edges expanding B at [j,j]. For example, given and our grammar (the grammar of Fig 10.2), we add:
    1. S => • NP VP [0,0].
    2. S => • Aux NP VP [0,0].
    3. S => • VP [0,0].
    One of these,
      S => • NP VP [0,0].
    immediately leads to:
    1. NP => • Det Nominal [0,0].
    2. NP => • Proper-Noun [0,0].
  3. Scanner: For each edge of the form
      X => alpha • B beta [k,j],
    where B is a lexical category, we check the input, and if it is of the right category, we add a complete edge for B at [j,j+1]. For example, given and our lexicon and the example sentence (Book that flight), we add:
      det => that • [1,2].
  4. Completer: Whenever we build a complete edge of category B beginning at index i, we check for incomplete edges looking for a B beginning at i. For example, in our example when we create
      NP => Det Nominal • [1,3],
    we find the following:
      VP => Verb • NP [0,1].
    And so we add a new edge which advances the dot by an NP. For our example we add:
      VP => Verb NP • [0,3]
    In this case we've added another complete edge.
  5. The algorithm
  6. Earley used as a recognizer: The sequence of states for Book that flight
  7. Earley used as a parser: The sequence of states for Book that flight
  8. A fragment of the Earley parser chart

Homework assignment

Problem 4: Parsing

A. Explain what gave rise to S24 and S25 for the parser chart in Figure 10.18. Note that analagous edges arise for the recognizer chart in Figure 10.17, so that the answer doesn't have anything to do with using the algorithm for parsing rather than recognition. Also note that the answer isn't just: The predicter put S24 and S25 there. Why does the predictor put S24 and S25 in the chart? Don't be a scared to look at the algorithm in Figure 10.16 to get your answer.

B. For this next part, use the grammar in Fig 10.2, but add the word landed as a Verb. This means the Verb rule in the second column is now:

Be an Earley parser. Parse the sentence:

Show the Earley parsing chart in the same format as is used in Figure 10.18. Assign names to all the edges and show them. Organize the edges, as in Figure 10.18, according to what index they end at. But don't try to give them in the order they're created.

C. Extra for programmer types. Exercise 10.4, p. 394.