Beyond the Web Graph: Mining the Information Architecture of the WWW with Navigation Structure Graphs
Authors: M. Keller, M. Nussbaumer
Source: Emerging Intelligent Data and Web Technologies (EIDWT), pp. 99-106, Tirana, Albania, September 2011
Large Web sites contain a plethora of different menus and navigation aids, which implement systems of content organization as hierarchies, linear structures, or matrices. Humans are able to decode this fine-grained content organization because they are aware of the different access methods provided by navigation systems and understand the higher-level information architecture. In contrast, current methods of link analysis cannot extract such a detailed model of the information architecture and are not able to recognize site boundaries and content hierarchies the way humans do. In this paper, we present a new approach to mining navigation systems that increases the precision of Web structure mining. Instead of analyzing the complete Web graph spanned by pages and hyperlinks, we analyze subgraphs called Navigation Structure Graphs (NSGs). An NSG represents the hyperlinks belonging to a certain navigation system. We demonstrate the capabilities of NSGs for analyzing the organization of Web sites and present our research on mining NSGs.
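
To make the notion of an NSG as a subgraph of the Web graph concrete, the following is a minimal Python sketch, not the authors' method. It assumes each hyperlink already carries a label naming the navigation system it belongs to. The `Link` type, the `nav_system` field, the `extract_nsg` function, and the example URLs and labels are all hypothetical. How such labels would be mined from real pages is the subject of the paper and is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Link(NamedTuple):
    """One hyperlink in the Web graph (hypothetical representation)."""
    source: str
    target: str
    # Assumed label of the navigation system the link belongs to;
    # None marks a link that is not part of any navigation system.
    nav_system: Optional[str]


def extract_nsg(links: Iterable[Link], nav_system: str) -> Dict[str, Set[str]]:
    """Return the subgraph (as adjacency sets) formed by the links
    that belong to the given navigation system."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for link in links:
        if link.nav_system == nav_system:
            graph[link.source].add(link.target)
    return dict(graph)


# A tiny, made-up Web graph with two navigation systems and one content link.
links = [
    Link("/", "/products", "main_menu"),
    Link("/", "/about", "main_menu"),
    Link("/products", "/", "main_menu"),
    Link("/products", "/products/a", "sub_menu"),
    Link("/products/a", "/blog/post-1", None),
]

main_menu_nsg = extract_nsg(links, "main_menu")
sub_menu_nsg = extract_nsg(links, "sub_menu")
```

In this sketch, each NSG keeps only the edges of one navigation system, so the main menu and the submenu can be analyzed separately instead of being mixed with every other hyperlink on the site.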