Schema Wrangling for Fun and Profit

Doug Tidwell
IBM developerWorks
dtidwell@us.ibm.com

Added: June 17, 2007

Where we started

- We have too many DTDs – we need to move to schemas
- No real validation (too much legacy HTML)
- Lack of good tools for authors and editors
- Lack of good documentation
- Difficult to separate data and metadata for authors and editors

Where we're going

- Everything is schema-based
- All documents are validated before being stored in our databases
- Editing tools automate as much of the authoring process as possible (and are schema-based!)
- Documentation is generated from the schema
- Authors and editors see different views of the documents

DTDs

XML's original attempt at defining valid document structures was the Document Type Definition. This is a holdover from the old SGML days; XML DTDs are very similar to SGML DTDs.

Document structure

DTDs define:

- The tags that can or must appear
- How tags can be nested
- How often tags can appear
- Attributes of the tags
- Default values of attributes
- All valid values of certain attributes

So what's wrong with DTDs?

- DTD syntax is different from XML syntax
- DTDs can't express certain constraints easily (if at all):
    - Element b can occur from 4 to 17 times
    - A zip code is a five-digit number, optionally followed by a hyphen and a four-digit number

Schemas to the rescue!

Schema basics

The schema specification is in three parts at w3.org:

- Part 0 – Primer: a tutorial edited by IBM's David Fallside, a really good introduction to schemas
- Part 1 – Structures: rules for defining XML document structures
- Part 2 – Datatypes: rules for defining datatypes

Our earlier examples

Element b can occur between 4 and 17 times:

    <xsd:element name='b' type='xsd:string'
                 minOccurs='4' maxOccurs='17'/>

A zip code is a five-digit number, optionally followed by a hyphen and a four-digit number:

    <xsd:simpleType name='postcode'>
      <xsd:restriction base='xsd:string'>
        <xsd:pattern value='[0-9]{5}(-[0-9]{4})?'/>
      </xsd:restriction>
    </xsd:simpleType>

Documents and schemas

To associate a schema with a document, you can use attributes on the root element:

    <?xml version='1.0' ?>
    <address xmlns:xsi='http://...'
             xsi:noNamespaceSchemaLocation='doug.xsd'>

If your root element is from a particular namespace, you use a different attribute:

    <?xml version='1.0' ?>
    <x:address xmlns:x='http://...'
               xmlns:xsi='http://...'
               xsi:schemaLocation='http://...'>

If you're writing an application that validates the document, you can set the schema externally to the document:

    parser.setProperty(
        "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
        "memory.xsd");

A list of elements

An x, followed by a y, followed by a z:

    <xsd:sequence>
      <xsd:element name='x'/>
      <xsd:element name='y'/>
      <xsd:element name='z'/>
    </xsd:sequence>

A choice of elements

An x, or a y, or a z:

    <xsd:choice>
      <xsd:element name='x'/>
      <xsd:element name='y'/>
      <xsd:element name='z'/>
    </xsd:choice>

Combining techniques

Either an x, or a y followed by a z:

    <xsd:choice>
      <xsd:element name='x'/>
      <xsd:sequence>
        <xsd:element name='y'/>
        <xsd:element name='z'/>
      </xsd:sequence>
    </xsd:choice>

Enumerations

Here's a data type that can contain the values red, blue, or green:

    <xsd:simpleType name='color'>
      <xsd:restriction base='xsd:string'>
        <xsd:enumeration value='red'/>
        <xsd:enumeration value='blue'/>
        <xsd:enumeration value='green'/>
      </xsd:restriction>
    </xsd:simpleType>

A side note on enumerations

To make life simpler, we wanted case-insensitive enumerations: we want to accept blue, BLUE, or bLuE in the previous example. This isn't supported by XML schemas, but there's a workaround. See ibm.com/developerworks/library/x-case for all the details.

Groups

With XML Schema, you can define groups of elements or attributes. These are very useful for defining elements or attributes that are repeated throughout your schema.
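
The pattern and enumeration facets shown above can be tried out against a few sample values. The following is a minimal sketch, not part of the original slides: it assumes the JDK's built-in javax.xml.validation API rather than the Xerces-specific calls used elsewhere in this talk, and the class name FacetCheck and the wrapper elements zip and color are hypothetical, added only so the postcode and color types have something to validate.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Checks sample values against the postcode and color types from the slides. */
class FacetCheck {
    // Inline schema. The <zip> and <color> elements are hypothetical wrappers;
    // the two simple types are copied from the slides.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:simpleType name='postcode'>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:pattern value='[0-9]{5}(-[0-9]{4})?'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "<xsd:simpleType name='color'>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:enumeration value='red'/>"
      + "<xsd:enumeration value='blue'/>"
      + "<xsd:enumeration value='green'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "<xsd:element name='zip' type='postcode'/>"
      + "<xsd:element name='color' type='color'/>"
      + "</xsd:schema>";

    /** Returns true if the XML text validates against the inline schema. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<zip>27709</zip>"));       // five digits
        System.out.println(isValid("<zip>27709-1234</zip>"));  // ZIP+4
        System.out.println(isValid("<zip>2770</zip>"));        // too short
        System.out.println(isValid("<color>blue</color>"));    // listed value
        System.out.println(isValid("<color>BLUE</color>"));    // wrong case
    }
}
```

The last call illustrates the side note on enumerations: because enumeration values are compared exactly, BLUE is rejected unless a workaround is applied.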

An attribute group that bundles day, month, and year attributes:

    <xsd:attributeGroup name='dmy'>
      <xsd:attribute name='day' type='day'/>
      <xsd:attribute name='month' type='month'/>
      <xsd:attribute name='year' type='year'/>
    </xsd:attributeGroup>

Annotations

At least 50% of the text of our XML schema is in <xsd:annotation> and <xsd:documentation> elements. We use these to document the constraints and intentions of our schema. We also have a stylesheet that processes these in a specialized way.

Our stylesheet uses the various annotations and the structure of the schema to build complete documentation. We have another stylesheet that removes all of the annotations; we don't want to have to read all of them at runtime.

    <xsd:element name='p'>
      <xsd:annotation>
        <xsd:documentation xml:lang='en'>
          <title>Define a paragraph</title>
          <desc>Defines a paragraph of text.</desc>
          ...

Cool things about annotations

Notice that we use an xml:lang attribute on the <xsd:documentation> element. That means we can create documentation for as many languages as we want, then generate separate documents for each language. The <xsd:documentation> element can contain anything.

Generating the documentation

Some of the documentation is in separate files; we use special tags in the schema to import those when we generate the documentation.

Other things are generated automatically. Given an element, what are its potential parents and children? What are its attributes? We can determine these automatically from the structure of the XML schema. Our stylesheet isn't complete, but it works for us.
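
Stripping annotations for runtime use, as mentioned earlier, can be sketched with an identity transform plus one empty template. This is not the stylesheet used at developerWorks, only a minimal illustration of the idea; it assumes the XSLT 1.0 processor bundled with the JDK (javax.xml.transform), and the class name AnnotationStripper is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Removes every xsd:annotation element from a schema document. */
class AnnotationStripper {
    // Identity transform: copy everything as-is. The second, more specific
    // template matches xsd:annotation and emits nothing, which drops it
    // together with its xsd:documentation children.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='xsd:annotation'/>"
      + "</xsl:stylesheet>";

    /** Returns the schema text with all annotations removed. */
    static String strip(String schemaXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(schemaXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String schema =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "<xsd:element name='p'>"
          + "<xsd:annotation>"
          + "<xsd:documentation xml:lang='en'>Defines a paragraph of text."
          + "</xsd:documentation>"
          + "</xsd:annotation>"
          + "</xsd:element>"
          + "</xsd:schema>";
        System.out.println(strip(schema));
    }
}
```

The same pattern, an identity template overridden by a few specific ones, is a common starting point for stylesheets that go the other way and pull the documentation out.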

Validating with schemas

It's relatively simple (heh):

    import org.apache.xerces.parsers.SAXParser;  // the Xerces-Java 2 parser
    import org.xml.sax.XMLReader;

    XMLReader parser = new SAXParser();
    parser.setFeature("http://xml.org/sax/features/validation", true);
    parser.setFeature("http://xml.org/sax/features/namespaces", true);
    parser.setFeature("http://apache.org/xml/features/validation/schema", true);
    parser.setErrorHandler(this);  // "this" must implement org.xml.sax.ErrorHandler
    parser.parse(uri);

Always keep in mind that schema validation is expensive. Our approach is to validate everything as it goes into our database; once it's in there, we don't validate it again. Your favorite database vendor likely has functions to enforce this at the SQL level…

Character entities

One major annoyance of XML schemas is that you can't use character entities. Well, not easily, anyway. There are a couple of ways around this.

One approach is to paste a bunch of entities at the top of an XML file:

    <!DOCTYPE dw:tutorial [
      <!ENTITY tilde '&#126;'>
      <!ENTITY florin '&#131;'>
      <!ENTITY elip '&#133;'>

As you might guess, this is tedious…

Another approach is to use a notation:

    <xsd:notation id='xhtml-lat1'
                  name='XHTMLLatin1'
                  public='-//W3C/ENTITIES…'
                  system='http://www.w3.org/…' />

This is the way things are done in the 'Modularization of XHTML in XML Schema' document. See www.w3.org/TR/xhtml-m12n-schema for more information.
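
The first approach above, declaring entities in an internal DTD subset, can be sketched as follows. This is a minimal illustration rather than code from the talk: it assumes the JDK's default DOM parser with its default settings, and the class name EntityDemo and the note element are hypothetical. The parser reads the entity declaration from the DOCTYPE and expands the reference while building the tree, independently of any schema validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Shows an entity declared in the internal DTD subset being expanded. */
class EntityDemo {
    /** Parses the XML text and returns the root element's text content. */
    static String textOf(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The tilde entity is declared inline, as on the slide; <note> is a
        // hypothetical root element used only for this demonstration.
        String xml =
            "<!DOCTYPE note [ <!ENTITY tilde '&#126;'> ]>"
          + "<note>a&tilde;b</note>";
        System.out.println(textOf(xml));
    }
}
```

Every document that uses the entity has to carry the declaration itself, which is what makes this approach tedious at scale.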

The actual file of entities (xhtml-lat1.ent) looks like this:

    <!-- non-breaking space -->
    <!ENTITY nbsp '&#160;'>
    <!-- inverted exclamation mark -->
    <!ENTITY iexcl '&#161;'>
    <!-- cent sign -->
    <!ENTITY cent '&#162;'>

Future work

Stuff we haven't gotten around to yet…

Here are some ideas we think are cool; we just haven't had time to work on them yet:

- Use Cocoon to publish XML schemas. The published version would give you an HTML page with links to the raw source and to the generated documentation.
- Use XSLT to convert the schemas into HTML forms.

More to-dos:

- Use XSLT to create a Swing-based, state-aware editor based on the schema. The editor would be guaranteed to create a valid XML document.
- Generate SQL CREATE TABLE statements based on the schema. This might be useful, but we don't anticipate using it very often.

Resources

developerWorks XML zone

ibm.com/developerWorks/xml has tons of XML resources:

- Tutorials
- Articles
- Sample code

And it's all FREE!!!

developerWorks tutorials

- Validating XML: a good overview of techniques for validating XML documents, including DTDs and schemas.
- XML Schema Validation in Xerces-Java 2: a more advanced tutorial on how to use the Xerces parser to validate XML documents with XML schemas.

The Apache XML project

xml.apache.org has lots of free tools, all of which are open source and standards compliant:

- Xerces XML parser
- Xalan XSLT engine
- FOP rendering tool
- Cocoon XML publishing engine

Buy this book!

ISBN 0596002521. Order your copy at amazon.com today! The definitive reference, IMHO.

Another good resource…

ISBN 0201740958. Order your copy at amazon.com today! A good desk reference, it covers XML, XPath, XSLT, schemas, SOAP, and more.

Shameless self-promotion

ISBN 0596000537. Order your copy at amazon.com today! Makes a great gift! Show your loved ones how much you care by showing them the power of XSLT!

Shameless – Part Two

"For anyone who already knows XML and wants to solve a problem quickly, or is still looking for a problem, XSLT is just the thing. Doug Tidwell addresses XML-experienced developers who want to dive quickly into the complex spheres of XSL and are looking for answers to their own problems." – Norbert Hartl

See www.oreilly.de/catalog/xsltger.

Shameless – Part 3

ISBN 0596000952. Also available at amazon.com. Co-written with James Snell and Paul Kulchenko. German and Italian translations available!

Thanks!

Doug Tidwell
dtidwell@us.ibm.com
Slides and samples at ibm.com/developerWorks/speakers/dtidwell/

To-Do

- Look at James Clark's tools for DTD-to-XML conversion
- Look at DocBook's list of entities
- Set up an XML document that pushes the notation concept