
  • The goal of the first three units in this course is to build a Web crawler that will collect data from the Web for our search engine, and to learn about big ideas in computing by doing that.

  • In Unit 1, we'll get started by extracting the first link on a web page.

  • A Web crawler finds web pages for our search engine by starting from a "seed" page and following the links on that page to find other pages.

  • Each of those links leads to a new web page, which itself could have links that lead to other pages.

  • As we follow those links, we'll find more and more web pages, building a collection of data that we'll use for our search engine.

  • A web page is really just a chunk of text that comes from the Internet into your Web browser.

  • We'll talk more about how that works in Unit 4.

  • But for now, the important thing to understand is that a link is really just a special kind of text in that web page.

  • When you click on a link in your browser, it takes you to a new page.

  • As a human, you can keep following those links from page to page.

  • What we'll do in this unit is write a program to extract that first link from the web page.

  • In later units, we'll figure out how to extract all the links and build up the collection of pages for our search engine.
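Because a link is just text inside the page, extracting the first one comes down to searching a string. The sketch below is one minimal way to do it in Python, not the course's own solution. It assumes links are written as `<a href="...">` with double quotes around the URL; single-quoted or unquoted attributes, other spacing, and uppercase tags are not handled. The function name `extract_first_link` is hypothetical.

```python
def extract_first_link(page):
    """Return the URL of the first <a href="..."> link in page, or None.

    Hypothetical helper; assumes the URL is wrapped in double quotes.
    """
    # str.find returns the index of the first match, or -1 if there is none.
    start_link = page.find('<a href=')
    if start_link == -1:
        return None
    # The URL lies between the next two double quotes after the tag start.
    start_quote = page.find('"', start_link)
    if start_quote == -1:
        return None
    end_quote = page.find('"', start_quote + 1)
    if end_quote == -1:
        return None
    return page[start_quote + 1:end_quote]


page = '<p>Hello</p><a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>'
print(extract_first_link(page))
```

Only the first matching tag is examined, so the second link in the sample page is ignored.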
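The crawling process described above (start from a seed page, follow its links, and keep going) can be sketched as a loop over a list of pages still to visit. Fetching real pages from the Internet is left for later, so this sketch assumes a hypothetical in-memory "web": a dictionary mapping each URL to its page text. The names `extract_all_links` and `crawl` are illustrative, and links are again assumed to look like `<a href="...">`.

```python
def extract_all_links(page):
    """Return every URL found in <a href="..."> tags, in order (assumes double quotes)."""
    links = []
    pos = 0
    while True:
        start_link = page.find('<a href="', pos)
        if start_link == -1:
            return links
        # Index of the opening quote: just past the 8 characters of '<a href='.
        start_quote = start_link + len('<a href=')
        end_quote = page.find('"', start_quote + 1)
        if end_quote == -1:
            return links
        links.append(page[start_quote + 1:end_quote])
        # Continue searching after this link.
        pos = end_quote + 1


def crawl(seed, web):
    """Visit every page reachable from seed; web maps URL -> page text (hypothetical)."""
    to_crawl = [seed]   # pages found but not yet visited
    crawled = []        # pages visited, in visiting order
    seen = set()        # fast membership check for visited pages
    while to_crawl:
        url = to_crawl.pop()
        if url in seen:
            continue
        seen.add(url)
        crawled.append(url)
        # Pages missing from the dictionary are treated as empty.
        for link in extract_all_links(web.get(url, "")):
            if link not in seen:
                to_crawl.append(link)
    return crawled


web = {
    "A": '<a href="B">b</a> <a href="C">c</a>',
    "B": '<a href="A">a</a>',
    "C": '<a href="D">d</a>',
}
print(crawl("A", web))
```

Keeping a record of visited pages matters because links can form cycles (here "B" links back to "A"); without it, the loop would never finish.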
