Placeholder Image

Subtitles section Play video

  • Last time you [Sean] did a really good animation about top-down parsing. We had

  • these sentences in this totally artificial grammar, which has only got

  • about 30 words in it, all in all. But we started here with is

  • and, in the top-down approach, you basically say: "OK, I'll push those on

  • the stack in reverse order: " And, starting at the top, of the

  • stack, at the left, you say: "Well, what can a be? So, you do these things on

  • your stack, top down, one at a time, and you start off at the root of the tree

  • and you develop the component parts, left to right, one at a time. It is a lot

  • easier, totally by hand, to write a top-down parser rather than a bottom-up one.

  • But what I shall be getting to, later on and in fact I'm starting right now, is to

  • say: "Well what happens if instead of starting up at the root and developing

  • the leaves of the tree, as it were, you start right down at the bottom, with the

  • text string that you know is correct, and try and work upwards you know can you

  • work back upwards, from the leaves to the root?"

  • So, in fact let's write that test sentence below here: the robot stroked two furry ..."

  • And actually, since layout doesn't matter too much at the moment I'll squeeze the

  • word 'dice' in at the right there. Now, be clear, we're talking about

  • bottom-up parsing now. In bottom-up parsing you start with the string that

  • you think is correct and you say: "Starting with a string can I look into the

  • rules and see how to work up the tree, not down the tree?" And, yes, so therefore

  • you're looking at possible matches on the right-hand side [of the rules] for components of

  • this string, reading from left to right. OK 'the'. How many ways are there you

  • can match the string 'the' against one of these classifications here? Well the

  • text - the string 'the' - is the right-hand side possibility of an icle. That's

  • one way to do it. Oh! and then look up here, right at the top of the grammar, if

  • you have an icle followed by a , like 'the' followed by something else,

  • that could be a ect, and that's looking good because right up at the top of the

  • grammar we want to end up with ect . Looking just at "the robot",

  • and looking at the grammar right-hand sides, I could do it by saying: "Well it's

  • the subject of the sentence, it's at the left-hand side, and if I go icle

  • I get 'the' and I get 'robot'. But what I've done, just to act as a talking point

  • and it illustrates a lot of things here, I've given you a shortcut. If you want to,

  • you can just do "the robot", with no further interior analysis at all. It's an

  • allowed phrase; it's the ect. Now I have to say that, as we develop this

  • story, we will get into bottom-up parsing. Because one of the tools were going to

  • use called 'yacc' basically produces bottom-up parsers for you, not top-down ones.

  • And it's a yacc behaviour symptom that it

  • loves, when you're trying to match text strings,

  • it likes to match the longest one that it can [legally] manage. So it is going to seize on

  • "the robot", all as one phrase, as being a wonderful long solution. Why doesn't it

  • use 'the' and then wait patiently for ? That's not the bottom up way.

  • If you can see a longer ... Oh! and this thing by the way, that you're looking at, is

  • built upon a stack, of course, it's called the 'handle'. I get a longer handle by

  • going for this option here and getting "the robot", all in one go. OK "the robot"

  • then, and you've got to get used to reading from right to left now,

  • in bottom-up parsing, "the robot" all as one phrase is an example of a ect. So, we

  • can now say: "OK, 'the robot' is my ect". Now that act - of picking up a substring

  • from your sentence and going upwards, and making it more abstract if you like - it's

  • called 'reducing', in bottom-up parsing. So, looking for a longer and longer and

  • longer string, to get your handle, that's called 'shifting', because you're shifting

  • characters, one after another, and making the string longer and longer and saying

  • "... can I go any further?" That's shifting but when you say: "Ooh! that's a nice long

  • string - and it matches" and then you go up and say: "Oh! that's my subject", that is

  • called 'reduction' because you're going to something simpler further up the tree.

  • So, you can tick that off as being done bottom-up. [The] next thing is, you see this

  • string of characters called 'stroked' and, once again, it's right-hand-side driven.

  • What is there, on the right hand side, and which rule is it, that could possibly

  • match 'stroked'? You see in here, against , 'bit', 'kicked' or 'stroked'. Those three

  • strings are your possibilities. So, that's fine. Going right to left you say 'stroked'

  • is an example of a . So we've got our there. Now,again,

  • I've cheated but it's wonderful fun! I have not analyzed "two furry dice" into

  • adjectives and nouns or anything like that. I've just put it in as a interesting

  • short-cut to have there. And it is an example of what I would call an object-phrase

  • Some of you, who are really good English linguists, may want to go on about

  • my lack of understanding about what a direct and indirect object are - not to

  • mention 'predicates' and so on. But please forgive me. I regard it as being a phrase

  • in an object position. So, I'm saying there's a quick match here and bottom-up

  • I love this: "two furry dice" is a great long handle. Oh! and if I match it

  • there, what's the left-hand side it corresponds

  • to? [Answer is] . OK then, we've won! We have worked bottom-up to having ect ect

  • on our stack starting with the string. And, what's more, we've exhausted the

  • [input] string now. It's the end of it. There's a sort of full stop after that.

  • There we are then. We've got top-down which tends to be

  • more - how shall we say? Eager?- you know a top-down parse would very probably leap on the

  • word 'the', and not bother to go any further because it's found a quick match for it,

  • whereas bottom-up is the other way round. It's basically saying: "I want the longest

  • possible handle". Even at this stage in the late 50s and early 60s there

  • was a sneaking suspicion, coming around, that actually bottom-up parsing was a

  • little bit more powerful than top-down. I'm going to put out a set of notes for

  • this so that you can look up for yourself. Just examples of why it [i.e. bottom up] is more

  • powerful. But roughly speaking I think you can sense that because you've not

  • only got something you're looking for but you've [also] got a handle that you've

  • already accumulated, it's like gathering more contextual information - going

  • bottom-up. But, on the other hand, handling the stack and working out what's

  • happening is a darn sight more complicated - if you do it by hand - coming bottom up.

  • Rather than doing it all by hand, why not me and you lot [together]. It's a good way

  • to learn 'lex' and 'yacc' In other words don't write the C directly yourself. Get

  • a software tool, like these two, to do it for you. So, that's exactly what

  • we're going to do. I've got the program 'putty' that does 'ssh' connected here.

  • I'm talking to my other Linux machine in the other room, where I have got set up a

  • parser - complete parser: front-end lexical analyzer [then] syntax analysis - for this

  • 'furry grammar' and all legal sentences in it. And I know, first of all, you will want me

  • to call up the program that implements this and the test sentence first of all is:

  • [In unison] "the robot stroked two furry dice". Here we go!

  • So, "furry". It's hanging there, it's waiting for us to give a correct furry sentence [reads from screen] "... dice ... return"

  • Look at that! It's happy with it! I'm giving it in subject-verb-object

  • order and I have numbered those rules, in the grammar as I showed you earlier, and

  • I now have, as it were, a map of how it has parsed it. Rule 3?

  • Now that is the one that effectively says I can do "the robot" all as one

  • phrase. It has chosen not to go for 'the' and 'robot' as icle and ,

  • as separate entities. It might well have done that, had I gone top-down, but because this yacc-confected

  • parser system goes bottom-up it's gone for the longest possible

  • handle at that stage and it's matched it. Rule 4: the middle piece is matched

  • 'stroked' as the verb and finally it has spotted, right at the very end, that I put

  • in another sneaky short-cut to "two furry dice" is Rule 6. And that is my parse.

  • So, should we try one more just to make sure? Go on Sean, tell me which one to try next.

  • >> Sean: Try "the woman bit the dog" >> DFB: Yep, "the woman bit the dog"

  • There you are look - Rule 2 for "the woman" now. Rule 2 not rule 3. if it has followed Rule 2 it's

  • gone down the icle route, which means it knows that's the only way to

  • match "the woman". There is no shortcut way. OK?

  • Rule 4 - a rule again; it chose 'bit' Rule 5: now this time, again, there is no

  • short-cut to "two furry dice" at Rule 6 You've got to go the long way around and

  • following Rule 5, you break it down into icle again: 'the' and 'dog'

  • So, there we are. You could say well you've written a compiler for the

  • 'furry' language, with the help of lex and yacc. We could go into details of that

  • later, if need be, but not now. It's fair enough but it's not doing anything really is it?

  • What more shall we do with this, now we've written it this far then Sean?

  • You tell me ?! >> Sean: Well I think next time we need to come up with an action, it needs to do something ....

  • >> DFB: We need to transform that grammar in some way.

  • Those of you who, in the previous video, actually bothered to look at the EXTRA BITS

  • may have had a sneak preview as to what we're going to do, as our much more

  • interesting actions now we have recognized the innate structure.

  • So, remember - always watch the EXTRA BITS !

Last time you [Sean] did a really good animation about top-down parsing. We had

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it