Wikipedia 1b Team
Search engine for Wikipedia - Internet Information project at University of Amsterdam, 2006.
Team members:
- Aspasia Beneti
- Jiří Iša
- Markos Mylonakis
- Günther Starnberger
- Jeroen Steggink
- Lars Wortel
Abstract
Wikipedia, the free encyclopedia, is rapidly increasing its influence on the Internet. More and more people link to it for term definitions, descriptions of historical events, and even the news. At the same time, the content of the encyclopedia is growing at a rapid rate. As the amount of available information increases, it is becoming difficult to actually find the content of interest. The internal search engine is down very often, and it then suggests that users try Google or Altavista instead. And they do. And they do it next time as well, because the results "feel better". Is there a way to build a search engine that uses specific features of Wikipedia to improve users' satisfaction?
Summary
- Categories in Wikipedia are highly incoherent. Some contain hundreds of articles, some only one or two. We show how to use incoming and outgoing links together with automatic definition extraction from the article to filter out irrelevant categories.
- We show how to exploit specific features of Wikipedia, mainly pointing the user to the relevant subsection of an article and handling alternative titles (redirects).
- We compare the performance of our Lucene-based search engine with the Wikipedia search engine as of May 2006 (which we outperform), Wikii (a larger UvA project) and Google.
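To make the second point more concrete, here is a minimal sketch of what redirect handling and subsection pointing could look like. It is not the team's implementation (that was built on Lucene and is described in the final report). The function names, the dictionary-based data layout and the simple term-overlap scoring are all assumptions made for illustration.

```python
import re


def resolve_redirect(title, redirects):
    """Follow redirect entries until a title with no further redirect is reached.

    `redirects` is a hypothetical mapping from alternative title to target title.
    The `seen` set guards against redirect loops.
    """
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_section(query, sections):
    """Return the heading of the section sharing the most terms with the query.

    `sections` is a hypothetical mapping from section heading to section text.
    Returns None if no section shares any term with the query.
    """
    terms = set(tokenize(query))
    best, best_score = None, 0
    for heading, body in sections.items():
        score = len(terms & set(tokenize(heading + " " + body)))
        if score > best_score:
            best, best_score = heading, score
    return best
```

A search front end could first call `resolve_redirect` on the query to find the canonical article, then use `best_section` to link straight to the most relevant part of that article instead of its top.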
The final report of the Wikipedia 1b team is available for download.
Preview
Because the search engine itself is no longer available, you can check the screenshot here: