Chapter 8 Working with XML Documents

Working with text

This section describes how to access and work with the text contained in XML documents.

Text occurs in the following places:

Inside elements. This is where most text in XML documents resides:
```
<Street>2800 Park Avenue</Street>
```
As attribute values. The following is an example:
```
<Region Type="Province">
```

Simple cases

When an element has no child nodes apart from a single text node, you can read its text content from the nodeValue of that child. For an attribute, the text is the nodeValue of the attribute node itself.

If an element contains no child elements or other non-text content, its only child is a text node. The nodeValue of this text node is a string. The following example writes out the text of an element:

document.writeln( elemnode.firstChild.nodeValue )

For an attribute node, the following example writes out the value of the node:

document.writeln( attnode.nodeValue )
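
The two cases above can be folded into one helper that checks the shape of a node before reading nodeValue. The following is a minimal sketch, not part of Dynamo: the name simpleText is hypothetical, the node type codes 1 (element) and 2 (attribute) are assumed to follow the standard DOM numbering, and the plain objects at the end only stand in for nodes that a parsed document would supply.

```javascript
// Hypothetical helper (not a Dynamo built-in): returns the text of a
// "simple" element or attribute, or null when the node is not simple.
function simpleText( node ) {
  if( node.nodeType == 2 ) {            // 2 is an attribute node (assumed)
    return node.nodeValue;
  }
  if( node.nodeType == 1 &&             // 1 is an element node (assumed)
      node.childNodes.length == 1 &&
      node.firstChild.nodeType == 3 ) { // 3 is a text node
    return node.firstChild.nodeValue;
  }
  return null;                          // empty or mixed content
}

// Stand-in objects that mimic <Street>2800 Park Avenue</Street>
// and the Type="Province" attribute.
var streetText = { nodeType: 3, nodeValue: "2800 Park Avenue" };
var street = { nodeType: 1, firstChild: streetText,
               childNodes: { length: 1 } };
var typeAttr = { nodeType: 2, nodeValue: "Province" };

var streetValue = simpleText( street );   // "2800 Park Avenue"
var typeValue = simpleText( typeAttr );   // "Province"
```

In a Dynamo script, the returned string could be passed to document.writeln, after checking for null.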

More complicated cases

Some elements contain several text nodes among their children. Consider the following cases:

Text mixed with entities. The following Names element has three child nodes:

<Names>Sammy &amp; Rosie</Names>

The first child is a text node with value "Sammy ". The second child is an entity node representing the ampersand character. The third child is a text node with value " Rosie".

Text mixed with elements. The following Warning element has three child nodes:
```
<Warning>Do <emphasis>not</emphasis> walk on the grass</Warning>
```
The first child is a text node with value "Do ". The second is an element node. The third child is a text node with value " walk on the grass".

For more information, see "Working with entities".
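
To make the three-child structure of the Warning element concrete, the following sketch walks the children of an element and summarizes each one. It is an illustration only: describeChildren and nodeList are hypothetical names, the nodeName property and the type code 1 for elements are assumed from the standard DOM, and the stand-in objects replace the nodes that a parsed document would supply.

```javascript
// Stand-in node list with the length/item() shape used in this chapter.
function nodeList( nodes ) {
  return { length: nodes.length,
           item: function( i ) { return nodes[i]; } };
}

// Hypothetical helper: returns one summary string per child of elem.
function describeChildren( elem ) {
  var result = [], child, i;
  for( i = 0 ; i < elem.childNodes.length ; i ++ ) {
    child = elem.childNodes.item(i);
    if( child.nodeType == 3 ) {          // 3 is a text node
      result.push( "text: \"" + child.nodeValue + "\"" );
    } else if( child.nodeType == 1 ) {   // 1 is an element node (assumed)
      result.push( "element: " + child.nodeName );
    } else {
      result.push( "other node type: " + child.nodeType );
    }
  }
  return result;
}

// Mimics <Warning>Do <emphasis>not</emphasis> walk on the grass</Warning>
var emphasis = { nodeType: 1, nodeName: "emphasis",
  childNodes: nodeList( [ { nodeType: 3, nodeValue: "not" } ] ) };
var warning = { nodeType: 1, nodeName: "Warning",
  childNodes: nodeList( [
    { nodeType: 3, nodeValue: "Do " },
    emphasis,
    { nodeType: 3, nodeValue: " walk on the grass" } ] ) };

var summary = describeChildren( warning );
// summary[0] is: text: "Do "
// summary[1] is: element: emphasis
// summary[2] is: text: " walk on the grass"
```

Note that the word "not" does not appear in the summary: it belongs to the text node inside the emphasis element, one level further down.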

Obtaining all the text inside an element

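To obtain the text inside all the elements below a given element, you can call the getElementsByTagName method of the root element with the special value "*", which matches all tags. The following function writes out every text node that is a direct child of one of the returned elements. In the standard DOM, getElementsByTagName returns descendants only, so text that is a direct child of rootelement itself is not written out.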

function listTextOfAllElements(rootelement){
  var elemlist, elem, child, i, j ;
  elemlist = rootelement.getElementsByTagName( "*" );
  for( i = 0 ; i < elemlist.length ; i ++ ){
    elem = elemlist.item(i);
    for( j = 0 ; j < elem.childNodes.length ; j ++ ){
      child = elem.childNodes.item(j);
      if( child.nodeType == 3) { // 3 is a text node
        document.writeln( child.nodeValue );
      }
    }
  }
}
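
An alternative to getElementsByTagName is a recursive walk, which keeps the text in document order and also picks up text that sits directly under the starting element. The following is a minimal sketch under stated assumptions: collectText and nodeList are hypothetical names, the type code 1 for elements is assumed from the standard DOM, and the stand-in objects replace real parsed nodes.

```javascript
// Stand-in node list with the length/item() shape used in this chapter.
function nodeList( nodes ) {
  return { length: nodes.length,
           item: function( i ) { return nodes[i]; } };
}

// Hypothetical helper: concatenates the text nodes under node in
// document order, descending into child elements.
function collectText( node ) {
  var text = "", child, i;
  for( i = 0 ; i < node.childNodes.length ; i ++ ) {
    child = node.childNodes.item(i);
    if( child.nodeType == 3 ) {          // 3 is a text node
      text += child.nodeValue;
    } else if( child.nodeType == 1 ) {   // 1 is an element node (assumed)
      text += collectText( child );
    }
  }
  return text;
}

// Mimics <Warning>Do <emphasis>not</emphasis> walk on the grass</Warning>
var warning = { nodeType: 1, childNodes: nodeList( [
  { nodeType: 3, nodeValue: "Do " },
  { nodeType: 1, childNodes: nodeList( [
      { nodeType: 3, nodeValue: "not" } ] ) },
  { nodeType: 3, nodeValue: " walk on the grass" } ] ) };

var allText = collectText( warning );   // "Do not walk on the grass"
```

The sketch returns a string rather than writing it, so the caller decides whether to pass the result to document.writeln or process it further.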

Working with CDATA sections

CDATA sections provide a way of including blocks of text in XML documents even if the text contains characters that would otherwise be recognized as markup. CDATA sections start with <![CDATA[ . All characters inside a CDATA section, including angle brackets and ampersands, are treated as text until the marker for the end of the section, which is ]]> , is reached.

Here are some examples of CDATA sections:

Text containing an ampersand can be included:
```
<![CDATA[Jane & John Doe]]>
```
In the above example, the ampersand is treated as text rather than as the start of an entity reference.
Text containing tags can be included:
```
<![CDATA[<title>Working with CDATA sections</title>]]>
```
In the above example, <title> and </title> are treated as text rather than as tags.

The text content of a CDATA section is the nodeValue of the CDATA section node. For example, the following fragment writes out the content of a node if it is a CDATA section node:

if( child.nodeType == 4 ) { // 4 is a CDATA section node
    document.writeln( child.nodeValue ) ;
}
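
Because text nodes and CDATA section nodes both expose their content through nodeValue, one loop can gather the character data of an element from both kinds of child. The following is a sketch only: characterData and nodeList are hypothetical names, and the stand-in objects replace the nodes that a parsed document would supply.

```javascript
// Stand-in node list with the length/item() shape used in this chapter.
function nodeList( nodes ) {
  return { length: nodes.length,
           item: function( i ) { return nodes[i]; } };
}

// Hypothetical helper: concatenates the direct text and CDATA children
// of elem, in order. Child elements are skipped.
function characterData( elem ) {
  var text = "", child, i;
  for( i = 0 ; i < elem.childNodes.length ; i ++ ) {
    child = elem.childNodes.item(i);
    // 3 is a text node, 4 is a CDATA section node
    if( child.nodeType == 3 || child.nodeType == 4 ) {
      text += child.nodeValue;
    }
  }
  return text;
}

// Mimics <Authors>Authors: <![CDATA[Jane & John Doe]]></Authors>
var authors = { nodeType: 1, childNodes: nodeList( [
  { nodeType: 3, nodeValue: "Authors: " },
  { nodeType: 4, nodeValue: "Jane & John Doe" } ] ) };

var authorsText = characterData( authors );   // "Authors: Jane & John Doe"
```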

Escaping text with xmlEscape

Dynamo includes the xmlEscape function to assist with preparing text for use in XML documents. The prototype is as follows:

string xmlEscape( input_string [, use_CDATA ] )

This function encodes '&', '<', and '>' characters in a string and returns the encoded string. The input_string parameter is the string to be encoded. The optional use_CDATA parameter specifies whether a CDATA section is used to encode the characters; if omitted, it defaults to false. If use_CDATA is false, each of these characters is replaced with its entity reference (&amp;, &lt;, or &gt;). If use_CDATA is true, the string is wrapped in a CDATA section and the characters are left unchanged.

Examples

The following script:

document.writeln( xmlEscape( "<MyTag>Hello!</MyTag>" ));

produces the following output:

&lt;MyTag&gt;Hello!&lt;/MyTag&gt;

The following script:

document.writeln( xmlEscape( "Calvin & Hobbs", true ) );

produces the following output:

<![CDATA[Calvin & Hobbs]]>
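
For readers who want to see what the two encoding modes amount to, the following sketch reproduces the outputs shown above in plain script. It is an illustrative re-implementation, not Dynamo's own code: the name xmlEscapeSketch is hypothetical, and the handling of a literal "]]>" inside the input is an assumption made here, since this chapter does not say how xmlEscape treats that case.

```javascript
// Illustrative only: mimics the documented results of xmlEscape.
// Dynamo's built-in function may differ in details not shown here.
function xmlEscapeSketch( inputString, useCDATA ) {
  if( useCDATA ) {
    // Assumption: a literal "]]>" in the input is split across two
    // CDATA sections so that it cannot end the section early.
    return "<![CDATA[" +
           inputString.split( "]]>" ).join( "]]]]><![CDATA[>" ) +
           "]]>";
  }
  // Replace '&' first so that the ampersands introduced by the
  // other replacements are not encoded a second time.
  return inputString.replace( /&/g, "&amp;" )
                    .replace( /</g, "&lt;" )
                    .replace( />/g, "&gt;" );
}

var escapedTag = xmlEscapeSketch( "<MyTag>Hello!</MyTag>" );
// escapedTag is: &lt;MyTag&gt;Hello!&lt;/MyTag&gt;

var wrapped = xmlEscapeSketch( "Calvin & Hobbs", true );
// wrapped is: <![CDATA[Calvin & Hobbs]]>
```

The order of the replacements matters: encoding '<' before '&' would turn "&lt;" into "&amp;lt;".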