Using Local Copies of XML Schema Documents with JDOM / Xerces

26.7.2005

XML schemas are powerful but also rather complicated to use within Java.
I used them a lot in my PhD thesis. One problem that can arise is that validating an XML document against a schema normally needs network access, because the schema locations are typically specified as URLs. Here I show how to configure the SAX parser to use schemas stored in the jar instead.


What we use





LocalSAXBuilder.java



To enable schema validation, we use


SAXBuilder b = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
// turn on validation in general ...
b.setFeature("http://xml.org/sax/features/validation", true);
b.setValidation(true);

// ... and XML Schema validation in particular
b.setFeature("http://apache.org/xml/features/validation/schema", true);
b.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);


Then we locate the schemas within the classpath. We do not know the location of the jar file before executing.
All our schema namespace names begin with http://diuf.unifr.ch/tns/projects/verinec/, so we can build each namespace name from the schema's file name.
If you have different names, you could also build a table of file names and namespaces.
The external-schemaLocation property is a whitespace-separated list of namespace/URL pairs in the format

{namespace url }*

(Thus, spaces in the path name would break your application; you must encode them as %20.
Because of Java Bug 6274990, many non-English letters like 'ä' break the path even if they are correctly encoded.
This is quite annoying, as you cannot know where your jar will be located when executed...)
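If your namespace names do not follow a common pattern, the table mentioned above could look roughly like this. This is only a sketch: the class name, the namespaces and the jar paths are made up for illustration, and the only assumption taken from the text is the "{namespace url }*" format with spaces encoded as %20.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaLocationTable {

    /** Joins namespace/URL pairs into the "{namespace url }*" format. */
    static String toSchemaLocation(Map<String, String> urlByNamespace) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : urlByNamespace.entrySet()) {
            // a raw space inside a URL would be read as a separator, so encode it
            String url = e.getValue().replace(" ", "%20");
            sb.append(e.getKey()).append(' ').append(url).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // hypothetical namespaces and locations; LinkedHashMap keeps the insertion order
        Map<String, String> table = new LinkedHashMap<>();
        table.put("http://example.org/ns/orders",
                  "jar:file:/C:/My Apps/app.jar!/res/schemas/orders.xsd");
        table.put("http://example.org/ns/customers",
                  "jar:file:/C:/My Apps/app.jar!/res/schemas/cust_v2.xsd");
        System.out.println(toSchemaLocation(table));
    }
}
```

The resulting string can be passed as the value of the external-schemaLocation property.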


Iterator iter;
StringBuilder schemas = new StringBuilder();
String BASE_ADDRESS = "http://diuf.unifr.ch/tns/projects/verinec/";
try {
    // ResourceUtility is our own helper class (see Downloads)
    iter = ResourceUtility.getSystemResources("/res/schemas/", ".xsd").iterator();
} catch (IOException e) {
    throw new Exception("Could not read schemas in classpath", e);
}
if (!iter.hasNext())
    logger.warning("Did not find any schemas. Will only work with network access.");

while (iter.hasNext()) {
    // spaces would be read as separators in the property value, so encode them
    String url = ((URL) iter.next()).toString().replace(" ", "%20");
    // strip the path and the ".xsd" extension; it is the forward slash even on Windows
    String schemaname = url.substring(url.lastIndexOf("/") + 1, url.length() - 4);
    String namespace = BASE_ADDRESS + schemaname;
    schemas.append(namespace).append(" ").append(url).append(" ");
}

System.out.println("Using the following local schemas: " + schemas); // debug

// b is the SAXBuilder configured above
b.setProperty("http://apache.org/xml/properties/schema/external-schemaLocation", schemas.toString());

// If we had an XML document without a namespace that uses a schema, we could set the
// following property to the location of that one schema (a single URL, not namespace/URL pairs).
// b.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", schemaUrl);
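
To make the string handling in the loop concrete, here is a small stand-alone sketch of what one iteration produces. The jar path is a hypothetical example (chosen to contain a space), and the class and method names are only for illustration; the base address and the gui schema name are the ones used in this article.

```java
class NamespaceFromUrl {
    static final String BASE_ADDRESS = "http://diuf.unifr.ch/tns/projects/verinec/";

    /** Returns the file name without directory and without the ".xsd" extension. */
    static String schemaName(String url) {
        return url.substring(url.lastIndexOf('/') + 1, url.length() - ".xsd".length());
    }

    /** Builds one "namespace url" pair for the external-schemaLocation property. */
    static String pair(String rawUrl) {
        String url = rawUrl.replace(" ", "%20"); // spaces would split the pair
        return BASE_ADDRESS + schemaName(url) + " " + url;
    }

    public static void main(String[] args) {
        // hypothetical install location containing a space
        System.out.println(pair("jar:file:/C:/Program Files/verinec/verinec.jar!/res/schemas/gui.xsd"));
    }
}
```

For the sample URL, the pair consists of the namespace http://diuf.unifr.ch/tns/projects/verinec/gui followed by the same jar URL with the space replaced by %20.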


The complete code can be found in LocalSAXBuilder. You will also need ResourceUtility to search for the schemas within the classpath.
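
If you want to try the idea without JDOM and without our helper classes, the following sketch does the same thing with the SAX parser that ships with the JDK (which is derived from Xerces and, as far as I know, understands the same feature and property names). Everything specific in it is an assumption for the demo: the namespace, the tiny schema, the class name, and the use of a temporary file instead of a jar entry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LocalSchemaDemo {
    static final String NS = "http://example.org/ns/note"; // hypothetical namespace

    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='" + NS + "'"
      + " elementFormDefault='qualified'>"
      + "<xs:element name='note'><xs:complexType><xs:sequence>"
      + "<xs:element name='text' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Parses xml against the local schema copy and returns the validation messages. */
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            // stands in for the schema stored in the jar
            Path schemaFile = Files.createTempFile("note", ".xsd");
            Files.write(schemaFile, XSD.getBytes(StandardCharsets.UTF_8));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setFeature("http://apache.org/xml/features/validation/schema", true);
            // one "namespace url" pair; toUri() already encodes spaces as %20
            reader.setProperty(
                "http://apache.org/xml/properties/schema/external-schemaLocation",
                NS + " " + schemaFile.toUri());
            reader.setErrorHandler(new DefaultHandler() {
                @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            Files.delete(schemaFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("<note xmlns='" + NS + "'><text>hi</text></note>"));
        System.out.println(validate("<note xmlns='" + NS + "'><oops/></note>"));
    }
}
```

The documents do not mention a schema location themselves; the parser finds the schema only through the property, which is the point of the whole exercise.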





Referencing other schemas



There is one more problem to solve. If a schema uses other schemas, they should not be loaded from the internet either.
xs:import statements are no problem. You can just write:


<xs:import namespace="http://diuf.unifr.ch/tns/projects/verinec/gui" schemaLocation="http://diuf.unifr.ch/tns/projects/verinec/gui.xsd" />

(The schemaLocation is used only if no location for that namespace is set in the external-schemaLocation property.)


However, we also use xs:include to share some type definitions among all schemas, so that each schema gets them in its own namespace.
For xs:include, you specify the file name to include.
Many examples use an absolute URL beginning with http, but to make things work without network access, you should use a relative path.
As that path is resolved relative to the location of the including schema, you can just put the included schema in the same directory:


<xs:include schemaLocation="base_types.xsd"/>


This will work both when the schemas are loaded from the jar and when they are loaded from the web. Just make sure the included schema is in the same directory as the including schema.
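
The following sketch illustrates why the relative path works in both cases. It is not the parser's own code, just the standard relative-reference resolution done with java.net.URI; the jar path and the class and method names are hypothetical.

```java
import java.net.URI;

class IncludeResolution {

    /** Resolves a relative schemaLocation against the URL of the including schema. */
    static String resolveInclude(String includingSchemaUrl, String schemaLocation) {
        int bang = includingSchemaUrl.indexOf("!/");
        if (includingSchemaUrl.startsWith("jar:") && bang >= 0) {
            // java.net.URI treats jar: URIs as opaque, so resolve only the entry path after '!'
            String jarPart = includingSchemaUrl.substring(0, bang + 1);
            String entry = includingSchemaUrl.substring(bang + 1);
            return jarPart + URI.create(entry).resolve(schemaLocation);
        }
        return URI.create(includingSchemaUrl).resolve(schemaLocation).toString();
    }

    public static void main(String[] args) {
        // web access: the included schema sits next to gui.xsd on the server
        System.out.println(resolveInclude(
            "http://diuf.unifr.ch/tns/projects/verinec/gui.xsd", "base_types.xsd"));
        // jar access (hypothetical jar location): it sits next to gui.xsd inside the jar
        System.out.println(resolveInclude(
            "jar:file:/opt/verinec.jar!/res/schemas/gui.xsd", "base_types.xsd"));
    }
}
```

In both cases only the last path segment is replaced, so the included file is looked up in the same directory as the including schema.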



Acknowledgements



My fellow PhD student at unifr, Dominik Jungo, wrote parts of the LocalSAXBuilder.



Downloads

LocalSAXBuilder.java
ResourceUtility.java

xml java