Assuming Accurate Layout Information for Web Documents is Available, What Now?
Hassan Alam, Rachmat Hartono, Aman Kumar, Fuad Rahman, Yuliya Tarnikova and Che Wilcox
Human Computer Interaction Group, BCL Technologies Inc., Santa Clara, CA
[email protected]
Overview of the talk
Web pages vs. document layout
Why do we need layout information?
Web page summarization for handheld devices
The future: Marrying Ontology with XML
Conclusion and Future Work
Slide 3: Related Work
Slide 4: Web Page Summarization for Handheld Devices
Web Page Data Structure
Content Analysis
Content Processing for Re-authoring
Node Merging
Representing the Complete Web page
Slide 5: Web Page Summarization for Handheld Devices
The Future: Marrying Ontology with XML
We assume that we have layout information for a web page
What do we do then?
How do we use this information?
How does that information help us get better re-authoring solutions?
We then define an ontology for that domain!
We define an XML format to encode that information
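As an illustration of encoding layout information in XML, here is a minimal Python sketch. The talk does not specify a schema, so the element and attribute names (`page`, `block`, `role`, `x`, `y`, `width`, `height`) and the sample blocks are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout blocks: (role, x, y, width, height, text).
# Roles, coordinates and text are illustrative, not taken from the talk.
blocks = [
    ("navigation", 0, 0, 800, 40, "Home | News | Sports"),
    ("headline", 0, 50, 600, 60, "Local team wins final"),
    ("story", 0, 120, 600, 400, "The match ended 2-1 after extra time."),
]


def layout_to_xml(url, blocks):
    """Encode layout blocks as an XML tree (element names are assumptions)."""
    page = ET.Element("page", {"url": url})
    for role, x, y, width, height in [b[:5] for b in blocks]:
        ET.SubElement(page, "block", {
            "role": role,
            "x": str(x),
            "y": str(y),
            "width": str(width),
            "height": str(height),
        })
    # Attach the text of each block to the element created for it.
    for element, source in zip(page, blocks):
        element.text = source[5]
    return page


page = layout_to_xml("http://example.com/", blocks)
xml_text = ET.tostring(page, encoding="unicode")
print(xml_text)
```

Keeping the role and the geometry as attributes means a later stage can select blocks by role without re-analyzing the page.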
Slide 7: What is Ontology and How do We Define it?
Ontology establishes a joint terminology between members of a community of interest. These members can be human or automated agents.
To define an ontology for the domain of web pages, we need:
A list of elements
Concept hierarchy
Concept association
Rules or axioms
Slide 8: A List of Elements in the Web Domain
Slide 9: Concept Hierarchy and so on…
Slide 10: Concept Association and so on…
Slide 11: Rules or Axioms and so on…
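The four ingredients named above (a list of elements, a concept hierarchy, concept associations, and rules or axioms) can be sketched as a small data structure. This is only an illustration in Python: the concept names, the relations `introduces` and `describes`, and the `keep_together` rule are hypothetical and are not the elements shown on the original slides.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    elements: set = field(default_factory=set)      # list of elements
    parent: dict = field(default_factory=dict)      # concept hierarchy: child -> parent
    associations: set = field(default_factory=set)  # (subject, relation, object) triples
    rules: list = field(default_factory=list)       # rules or axioms, as callables

    def add_concept(self, name, parent=None):
        self.elements.add(name)
        if parent is not None:
            self.elements.add(parent)
            self.parent[name] = parent

    def is_a(self, name, ancestor):
        """Walk up the hierarchy (assumed acyclic) looking for `ancestor`."""
        while name is not None:
            if name == ancestor:
                return True
            name = self.parent.get(name)
        return False

    def associate(self, subject, relation, obj):
        self.associations.add((subject, relation, obj))

    def related(self, subject, relation):
        return {o for s, r, o in self.associations if s == subject and r == relation}


# Hypothetical web-page concepts.
web = Ontology()
web.add_concept("content")
web.add_concept("headline", parent="content")
web.add_concept("story", parent="content")
web.add_concept("image", parent="content")
web.add_concept("caption", parent="content")
web.add_concept("navigation")
web.associate("headline", "introduces", "story")
web.associate("caption", "describes", "image")


def keep_together(ontology, a, b):
    """Hypothetical rule: keep a block next to what it introduces or describes."""
    return any(b in ontology.related(a, rel) for rel in ("introduces", "describes"))


web.rules.append(keep_together)
```

A re-authoring step could consult `is_a` to decide what counts as content and apply each entry in `rules` to decide which blocks must not be separated.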
Slide 12: Web Page Summarization for Handheld Devices using Ontology
Web Page Data Structure
Content Analysis
Content Processing for Re-authoring
Node Merging
Representing the Complete Web page
Use Ontology to re-format the web page
XML Structure Derived
Device Specific Display
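The last stages of this pipeline, using the ontology to re-format the page for a specific device, might look roughly like the Python sketch below. It is not the authors' algorithm: the `IS_A` mapping, the role names, and the character budget standing in for a device's display size are assumptions chosen to keep the example small.

```python
# Hypothetical ontology fragment: block role -> parent concept.
IS_A = {
    "headline": "content",
    "story": "content",
    "caption": "content",
    "navigation": "chrome",
    "advertisement": "chrome",
}


def reformat_for_device(blocks, max_chars):
    """Keep content blocks in page order, trimmed to a character budget.

    `blocks` is a list of (role, text) pairs; `max_chars` is an assumed
    stand-in for how much text the target device can show.
    """
    selected = []
    remaining = max_chars
    for role, text in blocks:
        if IS_A.get(role) != "content":
            continue  # drop blocks the ontology does not classify as content
        if remaining <= 0:
            break
        snippet = text[:remaining]
        selected.append((role, snippet))
        remaining -= len(snippet)
    return selected


page_blocks = [
    ("navigation", "Home | News | Sports"),
    ("headline", "Local team wins final"),
    ("advertisement", "Buy now"),
    ("story", "The match ended 2-1 after extra time."),
]

small_screen = reformat_for_device(page_blocks, max_chars=30)
for role, text in small_screen:
    print(f"{role}: {text}")
```

The same block list can be passed with a larger budget for a larger display, which is the sense in which the output is device specific.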
Slide 13: What is the Advantage of using Ontology?
It improves the quality of the output in many ways.
It becomes possible to capture the contextual relationship among various components within the document.
It leads to better understanding of the information contained within the document.
This additional information can be used in other processes, such as document categorization and contextual search.
Future Work
It is assumed that the future of mobile browsing lies in the adoption of semantic web technology.
Until that is realized, the proposed approach offers a workable compromise for generating high-fidelity re-authored web pages.
This is an exploratory paper offering a specific pathway to the future of web page re-authoring provided accurate layout information is available.
Currently, it is beyond the capability of any algorithm to achieve this level of accuracy. However, approximations to that accuracy are attainable and even practical. It will be interesting to discuss other possibilities in this space.
Conclusions
This talk presented some ideas on how to produce better web page re-authoring solutions by using linguistic knowledge and ontology, assuming accurate layout information for web pages is available.
It is shown that such an approach will produce high-quality, intelligent summaries of web pages, allowing fast and efficient web browsing on small-display handheld devices.