15 December 2015

Yahoo's web crawler is open source: Anthelion now available under Apache license

With Anthelion, Yahoo has released its web crawler for structured data under an open source license. The software works as a plug-in for Apache Nutch.

Open source web crawler: Yahoo Anthelion searches the semantic web



Semantic annotations, for example in RDFa, make web content understandable to machines. With Anthelion, Yahoo has now released a web crawler designed to read exactly this data. The software works as a plug-in for Apache Nutch and has been released under the free Apache 2.0 license.
Anthelion is a crawler for the semantic web. (Graphic: Yahoo)
Anyone interested in how the crawler works should read the paper "Focused Crawling for Structured Data" by employees of Yahoo Labs and Robert Meusel of the University of Mannheim. The source code is available on the Anthelion project page on GitHub.
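
To give an idea of what machine-readable annotations look like in practice, here is a minimal sketch in Python that pulls RDFa `property` values out of a small HTML snippet. It uses only the standard library; the sample markup, the class name `RDFaPropertyCollector` and the function `extract_properties` are made up for illustration and are not taken from Anthelion. It is also far from a complete RDFa processor, since it ignores vocabularies, prefixes, `content` attributes and nesting.

```python
from html.parser import HTMLParser


class RDFaPropertyCollector(HTMLParser):
    """Collects (property, text) pairs from elements with an RDFa `property` attribute.

    Hypothetical helper for illustration only; not a full RDFa parser.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property name of the element currently being read

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and text:
            self.pairs.append((self._current, text))
            self._current = None

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


# Invented sample markup using the schema.org vocabulary.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Movie">
  <span property="name">Blade Runner</span>
  <span property="director">Ridley Scott</span>
</div>
"""


def extract_properties(html):
    """Return the list of (property, text) pairs found in `html`."""
    collector = RDFaPropertyCollector()
    collector.feed(html)
    return collector.pairs
```

Calling `extract_properties(SAMPLE_HTML)` yields the name and director as property/value pairs, which is the kind of structured data a crawler like Anthelion is looking for.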

Anthelion: Applications for the Web crawler

Anthelion was designed to search the web for matching data as efficiently as possible. As one possible application, Yahoo cites the search for web pages that provide information about movies. The highlight: thanks to online learning algorithms, Anthelion is said to be extremely effective at finding further related websites.
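
The general idea of a focused crawler that learns while it crawls can be sketched in a few lines of Python. This is not Anthelion's actual algorithm, just a simplified illustration under several assumptions: URLs are scored by a small online logistic-regression model over URL tokens, the frontier is a priority queue, and `fetch` is a hypothetical callback that reports whether a page contained structured data and which links it contained. All names (`OnlineUrlScorer`, `crawl`, `fetch`) are invented for this example.

```python
import heapq
import math
import re
from collections import defaultdict


class OnlineUrlScorer:
    """Tiny online logistic-regression model over URL tokens (illustrative only)."""

    def __init__(self, learning_rate=0.5):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    @staticmethod
    def features(url):
        # Split the URL into lowercase alphanumeric tokens.
        return {t for t in re.split(r"[^a-z0-9]+", url.lower()) if t}

    def score(self, url):
        z = sum(self.weights[t] for t in self.features(url))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, url, has_structured_data):
        # One gradient step after each fetched page: this is the "online" part.
        error = (1.0 if has_structured_data else 0.0) - self.score(url)
        for token in self.features(url):
            self.weights[token] += self.learning_rate * error


def crawl(seeds, fetch, budget):
    """Visit up to `budget` pages, preferring URLs the model scores highly.

    `fetch(url)` is a hypothetical callback returning
    (has_structured_data, outlinks).
    """
    scorer = OnlineUrlScorer()
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    frontier = [(-scorer.score(url), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    hits = []
    fetched = 0
    while frontier and fetched < budget:
        _, url = heapq.heappop(frontier)
        has_data, outlinks = fetch(url)
        fetched += 1
        scorer.update(url, has_data)
        if has_data:
            hits.append(url)
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # Scores are fixed at push time; a real crawler might rescore.
                heapq.heappush(frontier, (-scorer.score(link), link))
    return hits
```

In a real setup, `fetch` would download and parse the page; for experimenting, a dictionary that maps URLs to `(has_structured_data, outlinks)` tuples is enough to watch the model shift its preference toward URL patterns that have paid off before.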
