Presentation Transcript
Issues in Migrating to XML based Content Management for Academic Web Portal: Lorenzo Sommaruga
DTI-SUPSI (University of Applied Sciences of Southern Switzerland), Manno Switzerland
lorenzo.sommaruga@supsi.ch
Content: Content Values
Why?
What?
Web (R)evolution
Rationales for the migration from HTML to XML
Analysis of the HTML web site
How?
Issues involved in the migration process from HTML to XML
Conclusion
Content Values: This year's conference theme
“Documenting the Future”
Content:
Past = storing, history
Present = dynamic
Future = valuable assets
Why? The Problem: Initial situation (400 employees)
HTML web site:
~2000 static html pages
280 folders
developed over 4 or 5 years by
various authors (> 10)
not consistent
a collection of sites created and managed by different departments and institutes
Why? “No craftsman thinks for himself!”: Sample proverbs
“I figli del ciabattino camminano scalzi” =
The shoemaker's children have no shoes
“Il fabbro ha lo spiedo di legno” =
The blacksmith has a wooden spit
Why? “No computer scientist thinks for himself!”: Computer science university departments are expected to benefit from applying the concepts, design models, architectures, and technologies usually taught in their courses
Their value, capabilities, and competences can be shown through a well-designed web site
"In the network economy, the web site becomes a company's primary interface to the customer"
What?: To reorganize the old web site
Project
a medium-large size university web portal
using XML technology
for content management
What? Paper Objective: Analyse the issues involved in the migration process from the HTML to the XML web site
Web (R)evolution: Current state of the web
What is happening to the web?
A real
evolution and
revolution
Evolution – Where we are – Past+Present: Current Web
net of information nodes
connected to each other by
hypertextual links
Initial idea built on the HTML language
Tim Berners-Lee invents the World Wide Web in 1990
Evolution – Where we go – Present+Future: Towards new frontiers for
Representing and using hypermedia contents
Efficient management of the huge mass of information
rapidly generated
continuously changing, dynamic
easy to retrieve
Evolution – Roadmap: XML emerges (eXtensible Markup Language, 1998 http://www.w3c.org/XML/)
as a standard content mark-up language
Web services emerge
for developing interoperable applications on the net with standard languages for communication (SOAP) and service description (WSDL)
… Semantic Web, RDF
Evolution – Expectations: The effects of this evolution translate into benefits in:
information organization
its search and retrieval
Companies: $$
Revolution: Innovation at the foundations of the web, in its raw material
Contents to be used in a revolutionary, different, and deeper way
maturity of market interests, technologies, and infrastructures
the need for machines that are able to understand and semantically process contents
the need for the semantic web
Revolution – Machine oriented web: Until now the web was oriented to humans
The new web is oriented towards machines and programs
very fast in making computations and logic deductions
However, information is difficult to retrieve on the web, and
information quality and accuracy are very low
Revolution – Separation of content from presentation: Content managed and manipulated in web servers in order to be
aggregated, filtered, elaborated, etc. and then
presented in an always up-to-date way and optimized for the web consumer
Focus on the concept of content:
“content” is what there is inside a container
container = the web page
independently from how it is presented within the page, its style, colour, page position
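The idea of handling content before choosing a presentation can be sketched in a few lines of Python. This is only an illustration under assumed names: the `news` records, their dates, and both renderers are hypothetical and are not taken from the SUPSI site.

```python
from datetime import date
from html import escape

# Hypothetical content records: pure data, no layout information.
news = [
    {"title": "Open day", "date": date(2004, 3, 12), "kind": "event"},
    {"title": "New XML course", "date": date(2004, 2, 1), "kind": "news"},
    {"title": "Seminar on RDF", "date": date(2004, 4, 20), "kind": "event"},
]

def select(items, kind):
    """Filter by kind and sort by date: a content-level operation."""
    return sorted((i for i in items if i["kind"] == kind), key=lambda i: i["date"])

def as_html(items):
    """One presentation of the selected content: an HTML list."""
    rows = "".join(
        f"<li>{i['date'].isoformat()}: {escape(i['title'])}</li>" for i in items
    )
    return f"<ul>{rows}</ul>"

def as_text(items):
    """Another presentation of the same content: plain text lines."""
    return "\n".join(f"{i['date'].isoformat()}  {i['title']}" for i in items)

events = select(news, "event")
print(as_html(events))
print(as_text(events))
```

The filtering step never mentions style, colour, or page position; those decisions live only in the two rendering functions.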
Rationales for the migration from HTML to XML:
1. A change
2. Advantages
Rationales - 1. A Change: from simple information page editing (traditional web site)
to a more complex process of content management
taking into consideration the real user needs:
navigating
publishing
checking information in a fast and easy way
security and access control: who is allowed to see and publish contents
how the information is presented on the web site
what workflow functionality is available to the intranet user
Rationales - 2. Advantages: XML based web content management:
the separation of content from style
data interoperability with portable and re-usable contents
the semantic markup
explicit meta information about the contents
makes possible a more elaborate, more "intelligent" treatment of information, laying both structural and content foundations for a new "semantic" web where the
main goal is to "Transform all information into valuable assets."
Rationales - The Site Usefulness: Reduced by some factors:
content and presentation inconsistency
low homogeneity (various sub-sites)
navigation disorientation
not all departments have the
competences
resources
time
to keep content updated
Rationales - Solution Adopted: A decentralized model
to maintain the different sites in order to guarantee an appropriate autonomy level
A centralized control for unifying the style
A content management system
supporting XML
Analysis of the HTML web site: Goal:
identifying the main components, and
evaluating the content updating issue
Academic Web Site Modules: From our analysis:
Teaching (didattica)
Research
Services
General information about the school ("SUPSI briefly")
University life (“SUPSI live”)
Frequency of Content Changes – Static: A small part of the content can be considered static
information in the module "SUPSI in breve" (SUPSI briefly) about:
history
mission
location
etc.
Frequency of Content Changes – Occasionally: A part changes occasionally:
the organization chart
lecturers' details
etc.
Frequency of Content Changes – Regularly: A part changes regularly:
courses
course timetable
calendar
some templates and administration documents (yearly or per term)
Frequency of Content Changes – Frequently: Another part is newly generated on a weekly basis
information in the module "SUPSI live":
news
events
seminar announcements
etc.
In the existing SUPSI web site this dynamic part is the central part of the home page
The current Web Site
The current XML Web Site
How? Development Platform and Implementation: Apache Cocoon (v.2.1.3): open source XML web publishing tool
Separation of concerns model
Content delivery in
multiple formats: e.g. HTML, WML, PDF, SVG, RTF, …
How? Architecture Scheme: [Figure: three-layer architecture (Data Layer, Logic Layer, Presentation Layer). Web server with HTTP gateway interface; application server (Servlet, ASP, JSP; Tomcat) running Cocoon with XML parser and XML processor; clients: browsers (IE, NS, …), mobile (WML), printer (PS, PDF), …; data: XML schemas, DTDs, templates, C.I. pages, XML+XSL structures, (un-, semi-) structured content, RDF resource descriptions, metadata; content processing; search and query: XML search (Lucene), other.]
How? Strategic Project Development: Taking into account
the XML potential
the size of the school information system
priority to modules
with more dynamic content (> benefits)
the school is more interested in
the teaching module has been selected for starting the migration process
The Teaching Module: A course management module in the new XML web site
Added functionalities:
PROTECTED access
INSERTION of a new course
DEFINITION of CURRICULA
UPDATING of courses and curricula
VISUALIZATION of courses
The Teaching Module – Protected Access: access restricted to authorized people such as teachers or staff
The Teaching Module – Insertion/Updating: Insertion of a new course or updating of an existing one
Definition of metadata about a course following the Bologna model
Entered by means of a validated form
Stored via XML and Cocoon in the central DB of the school (Oracle)
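A minimal sketch of the "validated form, then XML" step, written in Python rather than with Cocoon. The field names (`code`, `title`, `credits`), the element names, and the sample values other than the course code DTI-PROG are assumptions made for illustration; the talk does not list the actual form fields, and the step that stores the record in the Oracle database is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; the real form's fields are not listed in the talk.
REQUIRED = ("code", "title", "credits")

def validate(form):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = [
        f"missing field: {name}"
        for name in REQUIRED
        if not form.get(name, "").strip()
    ]
    credits = form.get("credits", "").strip()
    if credits and not credits.isdigit():
        errors.append("credits must be a whole number")
    return errors

def to_xml(form):
    """Serialize a valid form into a <course> element (element names are assumptions)."""
    errors = validate(form)
    if errors:
        raise ValueError("; ".join(errors))
    course = ET.Element("course", code=form["code"].strip())
    ET.SubElement(course, "title").text = form["title"].strip()
    ET.SubElement(course, "credits").text = form["credits"].strip()
    return ET.tostring(course, encoding="unicode")

print(to_xml({"code": "DTI-PROG", "title": "Programming", "credits": "6"}))
```

Rejecting invalid input before serialization means that only well-formed, complete course records ever reach the storage layer.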
The Teaching Module – Definition of Curricula: Definition of curricula
by
associating existing courses
to semesters
to an academic year, and
to a curriculum
E.g.
Curric. Informatica
Course DTI-PROG
Sem. 1 and 2
Acad. Year 2003/04
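The association described above can be modelled with a small data structure. The following Python sketch is an assumption about how such a model might look; the class and method names are hypothetical, and only the example values (Informatica, DTI-PROG, semesters 1 and 2, 2003/04) come from the slide.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Offering:
    """One course placed in a curriculum for given semesters of an academic year."""
    course: str
    semesters: tuple
    academic_year: str

@dataclass
class Curriculum:
    name: str
    offerings: list = field(default_factory=list)

    def add(self, course, semesters, academic_year):
        """Associate an existing course with semesters and an academic year."""
        offering = Offering(course, tuple(semesters), academic_year)
        self.offerings.append(offering)
        return offering

    def courses_in(self, semester, academic_year):
        """List the course codes offered in one semester of one academic year."""
        return [
            o.course
            for o in self.offerings
            if semester in o.semesters and o.academic_year == academic_year
        ]

informatica = Curriculum("Informatica")
informatica.add("DTI-PROG", [1, 2], "2003/04")
print(informatica.courses_in(1, "2003/04"))  # ['DTI-PROG']
```

Keeping the course itself separate from its placement lets the same course appear in several curricula or academic years without being redefined.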
The Teaching Module – Visualization of Courses - HTML: Visualization of a course in a dynamic HTML page
Note the PDF icon which links to the corresponding PDF format
The Teaching Module – Visualization of Courses - PDF: Visualization of a course in a dynamically created PDF page
The Teaching Module – Visualization of Curricula - HTML: Visualization of a curriculum in HTML format
The Teaching Module – Visualization of Curricula - PDF: Visualization of a curriculum in PDF format
Versatility of contents
Original data are the same as for the HTML page
taken from the DB into XML and transformed into different output formats via Cocoon
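The "one source, several outputs" idea can be sketched in Python. This is not how Cocoon does it: Cocoon uses XSLT-based pipelines, whereas here plain Python functions stand in for the stylesheets, and plain text stands in for PDF because the standard library has no PDF writer. The XML record, its element names, and the title "Programming" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical course record, standing in for XML generated from the database.
COURSE_XML = """<course code="DTI-PROG">
  <title>Programming</title>
  <year>2003/04</year>
</course>"""

def render_html(course):
    """Render the course element as a small HTML fragment."""
    div = ET.Element("div", {"class": "course"})
    ET.SubElement(div, "h1").text = f"{course.get('code')}: {course.findtext('title')}"
    ET.SubElement(div, "p").text = f"Academic year {course.findtext('year')}"
    return ET.tostring(div, encoding="unicode", method="html")

def render_text(course):
    """Render the same course element as plain text."""
    return (
        f"{course.get('code')}: {course.findtext('title')}\n"
        f"Academic year {course.findtext('year')}"
    )

# One renderer per output format; adding a format does not touch the data.
RENDERERS = {"html": render_html, "text": render_text}

def publish(xml_source, fmt):
    """Parse the XML once and hand it to the renderer for the requested format."""
    return RENDERERS[fmt](ET.fromstring(xml_source))

print(publish(COURSE_XML, "html"))
print(publish(COURSE_XML, "text"))
```

Both outputs are derived from the same XML string, so a change to the course data shows up in every format without editing any page by hand.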
Issues involved in the migration process from HTML to XML: Basic issues involved in the
creation and
management
of an XML based medium-large size university web site
Issues – 1. Structure: Structure of an academic web portal
Often a collection of various sites
Created and managed by different departments
Each with its own content type and layout
Content and presentation inconsistency
Low homogeneity and disorientation
Issues - 2. Resources: Different availability of
competences
resources and
time
in different departments to keep content updated
considerably reducing the global university site quality
and usefulness
Issues – 3. Content Management: The use of a CMS (content management system) to solve the previous issues implies
changes in the content management process,
web master role replaced by
new distributed roles
more focused on the content rather than on its formatting
different authoring process facilitated by masks or supported by user-friendly tools
need for training people to adopt new tools
Publication process more straightforward
Issues – 4. Project Management: making project managers aware of the XML potential
and of the global management costs
A CMS implies an initial rise in costs for:
re-organizing the information corpora
training people to use new procedures and technologies
initial costs are repaid by the long-term benefits of easy maintenance and reuse
The difficulty is understanding the real benefits and long-term effects
Project rejections, low investments, etc.
Issues – 5. Designing: Designing homogeneous interfaces
Corporate Identity
XML: separates content (.xml) from presentation (.xsl, .css)
XSLT: filtering and transforming
XSLT: multiple output formats
Issues – 6. Retrieving: Providing appropriate end-user facilities for searching and navigating
XML: added value of meta information
describes single pieces of content
and their semantics
Solid and durable contents
Markup COST
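To illustrate why explicit meta information helps retrieval, the sketch below searches a small XML catalogue by its metadata attributes instead of by free text. It is a Python illustration under assumptions, not the Lucene-based search of the actual site: the attribute names and the courses DTI-XML and DTI-CIRC are hypothetical, while DTI-PROG and the curriculum Informatica come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; each course carries its metadata as attributes.
CATALOGUE = """<courses>
  <course code="DTI-PROG" curriculum="Informatica" semester="1">
    <title>Programming</title>
  </course>
  <course code="DTI-XML" curriculum="Informatica" semester="3">
    <title>XML Technologies</title>
  </course>
  <course code="DTI-CIRC" curriculum="Elettronica" semester="1">
    <title>Circuits</title>
  </course>
</courses>"""

def find_courses(xml_source, **criteria):
    """Return the codes of courses whose attributes match every given criterion."""
    root = ET.fromstring(xml_source)
    return [
        course.get("code")
        for course in root.iter("course")
        if all(course.get(name) == value for name, value in criteria.items())
    ]

print(find_courses(CATALOGUE, curriculum="Informatica"))
print(find_courses(CATALOGUE, curriculum="Informatica", semester="1"))
```

A query such as "first-semester courses of one curriculum" is exact here because the markup states what each value means; the price is the markup cost noted above, since someone has to supply those attributes for every piece of content.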
Conclusions: Lessons learned from the migration process from the
HTML to the XML
SUPSI web site
Lesson 1: Migrating to XML is not only translating HTML pages into XML pages
It implies using a CMS which exploits the XML format
Particularly useful to manage dynamic content
The case of an academic web site: most of the information changes occasionally or regularly
Lesson 2: Exploiting the XML format means:
Reusable and portable contents
Different views or presentations of the same content
to different users or
in different contexts
Lesson 3: Main Issues Summary
difficulty in understanding the benefits and effects
initial high costs
need for new professional roles for web content management
need for specific training
Delay in the development
Questions