The SAX2 API for XML parsers was originally developed for
Java. Please be aware that there is no standard SAX2 API for
C++, and that use of the Xerces-C++ SAX2 API does not
guarantee client code compatibility with other C++ XML
parsers.
The SAX2 API presents a callback-based interface to the parser. An
application that uses SAX2 provides an instance of a handler
class to the parser. When the parser detects XML constructs,
it calls the methods of the handler class, passing them
information about the construct that was detected. The most
commonly used handler classes are ContentHandler, which is
called when XML constructs are recognized, and ErrorHandler,
which is called when an error occurs. The header files for the
various SAX2 handler classes are in
'<xerces-c1_6_0>/include/sax2'.
As a convenience, Xerces-C++ provides the class
DefaultHandler, a single class that is publicly derived
from all the handler classes. DefaultHandler's default
implementation of the handler callback methods is to do
nothing. A convenient way to get started with Xerces-C++ is
to derive your own handler class from DefaultHandler and override
just those methods in DefaultHandler which you are interested in
customizing. This simple example shows how to create a handler
which will print element names and fatal error
messages. The source code for the sample applications shows
additional examples of how to write handler classes.
This is the header file MySAX2Handler.hpp:
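#include <sax2/DefaultHandler.hpp>

class MySAX2Handler : public DefaultHandler {
public:
    // Declared here because MySAX2Handler.cpp defines it.
    MySAX2Handler();

    // Called by the parser at the start of every element.
    void startElement(
        const XMLCh* const uri,
        const XMLCh* const localname,
        const XMLCh* const qname,
        const Attributes& attrs
    );

    // Called by the parser when a fatal error is encountered.
    void fatalError(const SAXParseException&);
};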
This is the implementation file MySAX2Handler.cpp:
#include "MySAX2Handler.hpp"
#include <iostream>

MySAX2Handler::MySAX2Handler()
{
}

void MySAX2Handler::startElement(const XMLCh* const uri,
                                 const XMLCh* const localname,
                                 const XMLCh* const qname,
                                 const Attributes& attrs)
{
    // transcode() is a user-defined function which converts
    // Unicode strings to ordinary 'char *' strings. Look at
    // the sample program SAX2Count for an example implementation.
    std::cout << "I saw element: " << transcode(qname) << std::endl;
}

void MySAX2Handler::fatalError(const SAXParseException& exception)
{
    std::cout << "Fatal Error: " << transcode(exception.getMessage())
              << " at line: " << exception.getLineNumber()
              << std::endl;
}
The XMLCh and Attributes types are supplied by
Xerces-C++ and are documented in the include
files. Examples of their usage appear in the source code to
the sample applications.
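The handler example relies on a user-supplied transcode() helper. As a minimal sketch only, the function below narrows a null-terminated string of 16-bit code units to a std::string, replacing anything outside 7-bit ASCII with '?'. The names Utf16Unit and transcodeAscii are hypothetical (Utf16Unit stands in for XMLCh and assumes 16-bit code units), and this is not the conversion used by SAX2Count. A real application would more likely rely on the transcoding services that ship with Xerces-C++, such as XMLString::transcode.

```cpp
#include <string>

// Hypothetical stand-in for XMLCh; assumes 16-bit code units.
typedef char16_t Utf16Unit;

// Narrow a null-terminated string of 16-bit code units to ASCII.
// Code units above 0x7F are replaced with '?'.
std::string transcodeAscii(const Utf16Unit* text)
{
    std::string out;
    if (text == nullptr) {
        return out;  // treat a null pointer as an empty string
    }
    for (; *text != 0; ++text) {
        out.push_back(*text < 0x80 ? static_cast<char>(*text) : '?');
    }
    return out;
}
```

Returning a std::string rather than a raw 'char *' sidesteps the question of who frees the converted buffer, at the cost of one extra copy.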
Xerces SAX2 Supported Features
The behavior of the SAX2XMLReader depends on the values of the following features.
All of the features below can be set using the function SAX2XMLReader::setFeature(const XMLCh* const, const bool)
and queried using the function bool SAX2XMLReader::getFeature(const XMLCh* const).
None of these features can be modified in the middle of a parse; attempting to do so throws an exception.
http://xml.org/sax/features/namespaces
    true:  Perform Namespace processing (default).
    false: Optionally do not perform Namespace processing.
http://xml.org/sax/features/namespace-prefixes
    true:  Report the original prefixed names and attributes used for Namespace declarations (default).
    false: Do not report attributes used for Namespace declarations, and optionally do not report original prefixed names.
http://xml.org/sax/features/validation
    true:  Report all validation errors (default).
    false: Do not report validation errors.
http://apache.org/xml/features/validation/dynamic
    true:  The parser will validate the document only if a grammar is specified (http://xml.org/sax/features/validation must be true).
    false: Validation is determined by the state of the http://xml.org/sax/features/validation feature (default).
http://apache.org/xml/features/validation/schema
    true:  Enable the parser's schema support (default).
    false: Disable the parser's schema support.
http://apache.org/xml/features/validation/schema-full-checking
    true:  Enable full schema constraint checking, including checking
           which may be time-consuming or memory intensive. Currently, particle unique
           attribution constraint checking and particle derivation restriction checking
           are controlled by this option.
    false: Disable full schema constraint checking (default).
http://apache.org/xml/features/validation/reuse-grammar
    true:  The parser will reuse grammar information from previous parses in subsequent parses.
    false: The parser will not reuse any grammar information (default).
http://apache.org/xml/features/validation/reuse-validator (deprecated)
    Please use http://apache.org/xml/features/validation/reuse-grammar instead.
    true:  The parser will reuse grammar information from previous parses in subsequent parses.
    false: The parser will not reuse any grammar information (default).
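The interaction between the validation feature and the validation/dynamic feature can be read as a small decision table. The function below is only a sketch of that reading of the descriptions, not parser code. The name shouldValidate and its parameters are hypothetical, and hasGrammar stands for "a grammar is specified for the document".

```cpp
// Sketch of how the two validation features combine, as described above.
//   validation : value of http://xml.org/sax/features/validation
//   dynamic    : value of http://apache.org/xml/features/validation/dynamic
//   hasGrammar : whether a grammar is specified for the document
bool shouldValidate(bool validation, bool dynamic, bool hasGrammar)
{
    if (!validation) {
        return false;       // validation errors are not reported at all
    }
    if (dynamic) {
        return hasGrammar;  // validate only when a grammar is specified
    }
    return true;            // plain validation: always validate
}
```

Under this reading, a document without a grammar is validated when dynamic validation is off and skipped when it is on.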
Xerces SAX2 Supported Properties
The behavior of the SAX2XMLReader depends on the values of the following properties.
All of the properties below can be set using the function SAX2XMLReader::setProperty(const XMLCh* const, void*).
It takes a void pointer as the property value. The application is required to initialize this void
pointer to the correct type. Check the "Value Type" entry for each property below
to learn exactly what type of value that property expects.
Passing a void pointer that was initialized with the wrong type will lead to unexpected results.
If the same property is set more than once, the last value takes effect.
Property values can be queried using the function void* SAX2XMLReader::getProperty(const XMLCh* const).
The parser owns the returned pointer, and the memory allocated for it will
be destroyed when the parser is deleted. To keep the returned information accessible after
the parser is deleted, callers need to copy it and store the copy elsewhere.
Since the returned pointer is a generic void pointer, check the "Value Type" entry below to learn
exactly what type of object each property returns before copying it.
None of these properties can be modified in the middle of a parse; attempting to do so throws an exception.
http://apache.org/xml/properties/schema/external-schemaLocation
    Description: The XML Schema Recommendation explicitly states that
        the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the
        instance document is only a hint; it does not mandate that these attributes
        must be used to locate schemas. The same applies to the <import>
        element in schema documents. This property allows the user to specify a list
        of schemas to use. If the targetNamespace of a schema specified using this
        method matches the targetNamespace of a schema occurring in the instance
        document in a schemaLocation attribute, or
        if the targetNamespace matches the namespace attribute of an <import>
        element, the schema specified by the user using this property will
        be used (i.e., the schemaLocation attribute in the instance document
        or on the <import> element will be effectively ignored).
    Value: The syntax is the same as for schemaLocation attributes
        in instance documents, e.g., "http://www.example.com file_name.xsd".
        The user can specify more than one XML Schema in the list.
    Value Type: XMLCh*
http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation
    Description: The XML Schema Recommendation explicitly states that
        the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the
        instance document is only a hint; it does not mandate that these attributes
        must be used to locate schemas. This property allows the user to specify the
        no target namespace XML Schema location externally. If specified, the instance
        document's noNamespaceSchemaLocation attribute will be effectively ignored.
    Value: The syntax is the same as for the noNamespaceSchemaLocation
        attribute that may occur in an instance document, e.g., "file_name.xsd".
    Value Type: XMLCh*
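The external-schemaLocation value is a whitespace-separated list of namespace and location pairs. As an illustration, the hypothetical helper below (buildSchemaLocation is not part of Xerces-C++) joins such pairs into one narrow string. It is a sketch under the assumption that the pairs contain no embedded whitespace, and the result would still have to be converted to XMLCh* before it could be passed to setProperty.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Join (namespace, location) pairs into the "ns1 loc1 ns2 loc2" form
// used by schemaLocation attributes.
std::string buildSchemaLocation(
    const std::vector<std::pair<std::string, std::string> >& pairs)
{
    std::string value;
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        if (i != 0) {
            value += ' ';  // separate one pair from the next
        }
        value += pairs[i].first;
        value += ' ';
        value += pairs[i].second;
    }
    return value;
}
```

Building the list from pairs keeps each namespace next to its schema file, which is easy to get wrong when the string is typed by hand.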
Objects and Memory Management
The C++ DOM implementation uses automatic memory management,
implemented using reference counting. As a result, the C++
code for most DOM operations is very similar to the equivalent
Java code, right down to the use of factory methods in the DOM
document class for nearly all object creation, and the lack of
any explicit object deletion.
Consider the following code snippets:
// This is C++
DOM_Node aNode;
aNode = someDocument.createElement("ElementName");
DOM_Node docRootNode = someDocument.getDocumentElement();
docRootNode.appendChild(aNode);
// This is Java
Node aNode;
aNode = someDocument.createElement("ElementName");
Node docRootNode = someDocument.getDocumentElement();
docRootNode.appendChild(aNode);
The Java and C++ snippets are identical on the surface, except for
the class names, and this similarity remains true for most DOM
code.
However, Java and C++ handle objects in somewhat different
ways, making it important to understand a little bit of what
is going on beneath the surface.
In Java, the variable aNode is an object reference,
essentially a pointer. It is initially null, and references
an object only after the assignment statement in the second
line of the code.
In C++ the variable aNode is, from the C++ language's
perspective, an actual live object. It is constructed when the
first line of the code executes, and DOM_Node::operator=()
executes at the second line. The C++ class DOM_Node is
essentially a form of smart pointer; it implements much of
the behavior of a Java object reference variable, and
delegates the DOM behaviors to an implementation class that
lives behind the scenes.
Key points to remember when using the C++ DOM classes:
- Create them as local variables, or as member variables of
some other class. Never "new" a DOM object into the heap or
make an ordinary C pointer variable to one, as this will
greatly confuse the automatic memory management.
- The "real" DOM objects (nodes, attributes, CData
sections, and so on) do live on the heap and are created with the
create... methods on class DOM_Document. DOM_Node and the
other DOM classes serve as reference variables to the
underlying heap objects.
- The visible DOM classes may be freely copied (assigned),
passed as parameters to functions, or returned by value from
functions.
- Memory management of the underlying DOM heap objects is
automatic, implemented by means of reference counting. So long
as some part of a document can be reached, directly or
indirectly, via reference variables that are still alive in
the application program, the corresponding document data will
stay alive in the heap. When all possible paths of access have
been closed off (all of the application's DOM objects have
gone out of scope) the heap data itself will be automatically
deleted.
- There are restrictions on the ability to subclass the DOM
classes.
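The handle-and-heap-object arrangement described in these points can be modeled in a few lines. This is a toy model that uses std::shared_ptr as a stand-in for the DOM's reference counting. NodeImpl, NodeHandle and liveNodesWhileCopied are hypothetical names, not Xerces-C++ classes or functions.

```cpp
#include <memory>
#include <string>

// Hypothetical heap object standing in for a "real" DOM node.
struct NodeImpl {
    static int liveNodes;  // number of NodeImpl objects currently alive
    std::string name;
    explicit NodeImpl(const std::string& n) : name(n) { ++liveNodes; }
    ~NodeImpl() { --liveNodes; }
};
int NodeImpl::liveNodes = 0;

// Hypothetical value-type handle, playing the role that DOM_Node plays.
class NodeHandle {
public:
    NodeHandle() {}  // behaves like a null reference
    explicit NodeHandle(const std::string& name)
        : impl_(std::make_shared<NodeImpl>(name)) {}
    bool isNull() const { return !impl_; }
private:
    std::shared_ptr<NodeImpl> impl_;  // shared by every copy of the handle
};

// Copies a handle inside a scope and reports how many heap nodes
// exist while both handles are alive.
int liveNodesWhileCopied()
{
    NodeHandle a("ElementName");
    NodeHandle b = a;  // copies the handle, not the underlying node
    (void)b;
    return NodeImpl::liveNodes;
}
```

Copying the handle leaves a single heap node, and that node is destroyed once the last handle goes out of scope, which is why the visible classes should live as locals or members rather than on the heap.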
Class DOMString provides the mechanism for passing string
data to and from the DOM API. DOMString is not intended to be
a completely general string class, but rather to meet the
specific needs of the DOM API.
The design derives from two primary sources: the DOM's
CharacterData interface and the class java.lang.String.
Main features are:
- It stores Unicode text.
- Automatic memory management, using reference counting.
- DOMStrings are mutable - characters can be inserted,
deleted or appended.
When a string is passed into a method of the DOM, when
setting the value of a Node, for example, the string is cloned
so that any subsequent alteration or reuse of the string by
the application will not alter the document contents.
Similarly, when strings from the document are returned to an
application via the DOM API, the string is cloned so that the
document can not be inadvertently altered by subsequent edits
to the string.
The ICU classes are a more general solution to Unicode
character handling for C++ applications. ICU is an Open
Source Unicode library, available at the IBM
DeveloperWorks website.
The C++ DOM classes, DOM_Node, DOM_Attr, DOM_Document, etc.,
are not designed to be subclassed by an application
program.
As an alternative, the DOM_Node class provides a User Data
field for use by applications as a hook for extending nodes by
referencing additional data or objects. See the API
description for DOM_Node for details.
Experimental IDOM Programming Guide
The experimental IDOM API is a new design of the C++ DOM API.
Please note that this experimental IDOM API is only a prototype
and is subject to change.
The C++ IDOM implementation no longer uses reference counting for
automatic memory management. Instead, it uses an independent storage
allocator per document. The storage for a DOM document is
associated with the document node object.
The advantage is that allocation requires no synchronization
in most cases (based on the same threading model that we
have now: one thread active per document, but any number of
documents running in parallel in separate threads).
The allocator does not support a delete operation at all. All
allocated memory persists for the life of the document, and
then the larger blocks are returned to the system without separately
deleting all of the individual nodes and strings within the document.
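The per-document allocator described here follows the general arena pattern. The class below is a minimal sketch of that pattern under stated assumptions, not the IDOM allocator itself. DocumentArena, its 4096-byte default block size and its blockCount() accessor are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-document arena: hands out memory from large blocks
// and never frees an individual allocation.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t blockSize = 4096)
        : blockSize_(blockSize), capacity_(0), used_(0) {}

    // Allocate 'size' bytes. There is deliberately no matching free().
    void* allocate(std::size_t size)
    {
        // Round up so that successive allocations stay aligned.
        const std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) / align * align;
        if (blocks_.empty() || size > capacity_ - used_) {
            // Start a new block, enlarged if the request is oversized.
            capacity_ = size > blockSize_ ? size : blockSize_;
            blocks_.push_back(std::unique_ptr<char[]>(new char[capacity_]));
            used_ = 0;
        }
        void* p = blocks_.back().get() + used_;
        used_ += size;
        return p;
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    std::size_t blockSize_;  // preferred size of each large block
    std::size_t capacity_;   // size of the current block
    std::size_t used_;       // bytes handed out from the current block
    // Every block is released together when the arena is destroyed.
    std::vector<std::unique_ptr<char[]> > blocks_;
};
```

Because nothing is freed individually, allocation is a pointer bump with no bookkeeping per node, and destroying the arena releases a handful of large blocks rather than walking the whole tree.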
The C++ DOM and IDOM are similar in the use of factory methods in the
document class for all object creation. They differ in the object deletion
mechanism.
In C++ DOM, there is no explicit object deletion. The deallocation of
memory is automatically taken care of by the reference counting.
In C++ IDOM, there is both implicit and explicit object deletion.
If a user manually builds a DOM tree in memory using the document factory methods,
then the user needs to explicitly delete the document object to free all the allocated memory.
This normally falls under the following 3 scenarios:
- If a user manually creates a DOM document using the document implementation
factory method, IDOM_DOMImplementation::getImplementation()->createDocument,
then the user needs to explicitly delete the document object to free all
allocated memory.
- If a user creates a DocumentType object using the document implementation factory
method, IDOM_DOMImplementation::getImplementation()->createDocumentType, then
the user also needs to explicitly delete the document type object to free the
allocated memory.
- Special case: If a user creates a DocumentType using the document
implementation factory method, and clones the node WITHOUT first assigning an owner
document to that DocumentType object, then the cloned node also needs to be explicitly
deleted.
Consider the following code snippets:
// C++ IDOM - explicit deletion
// use the document implementation factory method to create a document type and a document
IDOM_DocumentType* myDocType;
IDOM_Document* myDocument;
IDOM_Node* root;
IDOM_Node* aNode;
myDocType = IDOM_DOMImplementation::getImplementation()->createDocumentType(name, 0, 0);
myDocument = IDOM_DOMImplementation::getImplementation()->createDocument(0, name, myDocType);
root = myDocument->getDocumentElement();
aNode = myDocument->createElement(anElementname);
root->appendChild(aNode);
// need to delete both myDocType and myDocument which are created through DOM Implementation
delete myDocType;
delete myDocument;
// C++ IDOM - explicit deletion
// use the document implementation factory method to create a document
IDOM_DocumentType* myDocType;
IDOM_Document* myDocument;
IDOM_Node* root;
IDOM_Node* aNode;
myDocument = IDOM_DOMImplementation::getImplementation()->createDocument();
myDocType = myDocument->createDocumentType(name);
root = myDocument->createElement(name);
aNode = myDocument->createElement(anElementname);
myDocument->appendChild(myDocType);
myDocument->appendChild(root);
root->appendChild(aNode);
// the myDocType is created through myDocument, not through Document Implementation
// thus no need to delete myDocType
delete myDocument;
// C++ IDOM - explicit deletion
// manually build a DOM document
// clone the document type object which does not have an owner yet
IDOM_DocumentType* myDocType1;
IDOM_DocumentType* myDocType;
IDOM_Document* myDocument;
IDOM_Node* root;
IDOM_Node* aNode;
myDocType = IDOM_DOMImplementation::getImplementation()->createDocumentType(name, 0, 0);
myDocType1 = (IDOM_DocumentType*) myDocType->cloneNode(false);
myDocument = IDOM_DOMImplementation::getImplementation()->createDocument(0, name, myDocType);
root = myDocument->getDocumentElement();
aNode = myDocument->createElement(anElementname);
root->appendChild(aNode);
// myDocType does not have an owner yet when myDocType1 was cloned.
// thus need to explicitly delete myDocType1
delete myDocType1;
delete myDocType;
delete myDocument;
// C++ IDOM - explicit deletion
// manually build a DOM document
// clone the document type object that has an owner already
// thus no need to delete the cloned object
IDOM_DocumentType* myDocType1;
IDOM_DocumentType* myDocType;
IDOM_Document* myDocument;
IDOM_Node* root;
IDOM_Node* aNode;
myDocType = IDOM_DOMImplementation::getImplementation()->createDocumentType(name, 0, 0);
myDocument = IDOM_DOMImplementation::getImplementation()->createDocument(0, name, myDocType);
myDocType1 = (IDOM_DocumentType*) myDocType->cloneNode(false);
root = myDocument->getDocumentElement();
aNode = myDocument->createElement(anElementname);
root->appendChild(aNode);
// myDocType already has myDocument as the owner when myDocType1 was cloned
// thus NO need to explicitly delete myDocType1
delete myDocType;
delete myDocument;
Key points to remember when using the C++ IDOM classes:
- The DOM objects are accessed via C++ pointers.
- The DOM objects - nodes, attributes, CData
sections, etc., are created with the factory methods
(create...) in the document class.
- If you are manually building a DOM tree in memory, you
need to explicitly delete the document object.
Memory management will be automatically taken care of by
the IDOM parser when parsing an instance document.