pkg://PyXML-0.8.4-3.src.rpm:744112/PyXML-0.8.4.tar.gz

PyXML-0.8.4/
PyXML-0.8.4/demo/
PyXML-0.8.4/demo/dom/
PyXML-0.8.4/demo/dom/4tidy.py
import sys, cStringIO
from xml.dom.ext.reader import HtmlLib
from xml.dom.ext import XHtmlPrint

def Tidy(doc):
    #stream = cStringIO.StringIO()
    #XHtmlPrint(doc, stream=stream)
    #text = stream.getvalue()

    XHtmlPrint(doc)
    return


if __name__ == "__main__":
    html_reader = HtmlLib.Reader()
    if len(sys.argv) == 3:
        uri = sys.argv[1]
        encoding = sys.argv[2]
    elif len(sys.argv) == 2:
        uri = sys.argv[1]
        encoding = ''
    else:
        print "%s requires one or two arguments: the first is a URL or file name to be tidied.  The optional second is the encoding to assume for the input."%sys.argv[0]
        sys.exit(-1)

    html_doc = html_reader.fromUri(uri, charset=encoding)
    Tidy(html_doc)
PyXML-0.8.4/demo/dom/README
Example Programs and Demos for 4DOM.
====================================

Sample data files which can be used to exercise the various demos:

* addr_book1.xml
* addr_book2.xml
* book_catalog1.xml
* addr_book.dtd
* employee_table.html


Demos:
------

* dom_from_html_file.py

Demonstrates reading HTML from a file, and pretty-printing.

Example: "python dom_from_html_file.py employee_table.html"


* dom_from_xml_file.py

Demonstrates reading XML from a file, and pretty-printing.  Try changing FromXml to have "validate=1".

Example: "python dom_from_xml_file.py addr_book1.xml"
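
For readers without PyXML installed, the same read-and-print idea can be sketched with the standard library's xml.dom.minidom under Python 3.  This is a stand-in, not 4DOM: the inline sample string and the helper name pretty_xml are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample data, loosely modelled on addr_book2.xml.
SAMPLE = '<ADDRBOOK><ENTRY ID="gn"><NAME>Gegbefuna Nwannem</NAME></ENTRY></ADDRBOOK>'


def pretty_xml(text):
    """Parse an XML string into a DOM tree and return an indented rendering."""
    doc = minidom.parseString(text)
    try:
        return doc.toprettyxml(indent="  ")
    finally:
        # Break the tree's internal references once we are done with it,
        # much as releaseNode() does in the 4DOM demo.
        doc.unlink()


if __name__ == "__main__":
    print(pretty_xml(SAMPLE))
```

To read from a file instead, minidom.parse(file_name) returns the same kind of document object.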


* generate_html1.py

Demonstrates putting together a simple HTML page (a form in this case)
with the standard DOM factory interface.

Just execute with "python generate_html1.py"

You can re-direct the output to file and view the result with a browser.  Try adding in more sophisticated form elements.


* generate_xml1.py

Demonstrates putting together a simple XML document with the standard DOM
factory interface.

Just execute with "python generate_xml1.py"
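
The factory pattern used by this demo (ask the implementation for a document, then ask the document for every other node) can be sketched in Python 3 with the standard library's xml.dom.minidom.  This is an approximation of generate_xml1.py, not the 4DOM code itself; the function name build_document is hypothetical.

```python
from xml.dom import minidom


def build_document():
    """Build <mydoc><spam eggs="sunnysideup">some text here...</spam></mydoc>."""
    impl = minidom.getDOMImplementation()

    # createDocument(namespaceURI, qualifiedName, doctype) also creates
    # the document element, which is the single element child of the document.
    doc = impl.createDocument(None, "mydoc", None)
    doc_elem = doc.documentElement

    # The Document instance acts as the factory for all other nodes.
    new_elem = doc.createElement("spam")
    new_elem.setAttribute("eggs", "sunnysideup")
    new_elem.appendChild(doc.createTextNode("some text here..."))

    doc_elem.appendChild(new_elem)
    return doc


if __name__ == "__main__":
    print(build_document().toxml())
```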


* 4tidy.py

Demonstrates the XHTML support in 4DOM.  It takes a URL or file name on the
command line and reads the HTML source.  It then prints XHTML based on the HTML
source to standard output.  An optional second argument names the encoding to
assume for the input.

Try "python 4tidy.py http://fourthought.com"


* iterator1.py

Demonstrates the DOM standard Node Iterator interface.  It iterates over each node in the read-in file, and prints out its node type and name.  Then it iterates again, using the NodeFilter interface to restrict it to nodes of type Element.

Example: "python iterator1.py addr_book1.xml"
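
The standard library's xml.dom.minidom does not provide createNodeIterator, so the sketch below (Python 3) imitates the two passes of iterator1.py with a small pre-order generator.  The name iter_nodes and its node_type parameter are assumptions made for this illustration; they are not part of 4DOM or of the DOM specification.

```python
from xml.dom import minidom, Node


def iter_nodes(root, node_type=None):
    """Yield root and its descendants in document (pre-order) order.

    If node_type is given, only nodes of that type are yielded,
    roughly like NodeFilter.SHOW_ELEMENT in the 4DOM demo.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        if node_type is None or node.nodeType == node_type:
            yield node
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(node.childNodes))


if __name__ == "__main__":
    doc = minidom.parseString("<a><b>text</b><c/></a>")
    print("Printing all nodes:")
    for node in iter_nodes(doc):
        print("%s node %s" % (node.nodeType, node.nodeName))
    print("Printing only element nodes:")
    for node in iter_nodes(doc, Node.ELEMENT_NODE):
        print("%s node %s" % (node.nodeType, node.nodeName))
```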


* visitor1.py

Demonstrates 4DOM's proprietary Walker/Visitor interface.  If you only need to iterate over a tree in pre-order, you are advised to use the standard NodeIterator instead (see iterator1.py and xll_replace.py for examples).  dom.ext.Visitor is best for defining other iteration orders and rules.

This sample actually just runs through a pre-order walk, for simplicity.  The output should be identical to that of the first part of iterator1.py.

Example: "python visitor1.py addr_book1.xml"


* trace_ns.py

A demo of 4DOM's namespace extensions.  Given an XML file-name on the command line, it will walk through the elements in document order (using NodeIterator) and print out the default namespace in effect as well as those of the element and its attributes.

Example: "python trace_ns.py book_catalog1.xml"

For the Namespace spec, see

http://www.w3.org/TR/REC-xml-names/

For James Clark's excellent introduction to and clarification of namespaces, see

http://www.jclark.com/xml/xmlns.htm
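
As a rough Python 3 illustration of what trace_ns.py reports, the sketch below uses the standard library's xml.dom.minidom (not 4DOM) to list each element's namespace URI together with any namespaced attributes.  The function name trace_namespaces and the shortened sample document are assumptions made for this example.

```python
from xml.dom import minidom

# Namespace that xmlns declarations themselves belong to; skipped below.
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Shortened, hypothetical variant of book_catalog1.xml.
SAMPLE = """<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'
      xmlns:fcat='http://FourThought.com/catalog'>
  <isbn:number>1568491379</isbn:number>
  <notes fcat:ref='ITEM54321'/>
</book>"""


def trace_namespaces(text):
    """Return (element name, element namespace, namespaced attributes) tuples
    for every element, in document order."""
    doc = minidom.parseString(text)
    report = []
    for elem in doc.getElementsByTagName("*"):
        attrs = []
        for i in range(elem.attributes.length):
            attr = elem.attributes.item(i)
            if attr.namespaceURI and attr.namespaceURI != XMLNS_NS:
                attrs.append((attr.name, attr.namespaceURI))
        report.append((elem.tagName, elem.namespaceURI, attrs))
    doc.unlink()
    return report


if __name__ == "__main__":
    for name, ns, attrs in trace_namespaces(SAMPLE):
        print("Current Element", name, ns)
        for attr_name, attr_ns in attrs:
            print("\t", attr_name, attr_ns)
```

Note that the unprefixed notes element inherits the default namespace declared on book, while the unprefixed attributes would have no namespace at all.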


* link_title_invert.py

Demonstrates node manipulations.  It takes a sample document with anchors
embedded in header tags, and flips them so that the header tags are instead
embedded in the anchors.

Just execute with "python link_title_invert.py"
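
The inversion itself needs only four standard DOM calls (removeChild, appendChild, replaceChild and firstChild).  A minimal Python 3 sketch with the standard library's xml.dom.minidom follows; it is not the 4DOM demo, ignores namespaces for brevity, and the function name invert_headings is hypothetical.

```python
from xml.dom import minidom, Node


def invert_headings(doc):
    """Turn <h2><a>text</a></h2> into <a><h2>text</h2></a> throughout doc."""
    for h2 in list(doc.getElementsByTagName("h2")):
        anchors = [n for n in h2.childNodes
                   if n.nodeType == Node.ELEMENT_NODE and n.nodeName == "a"]
        if not anchors:
            continue
        a = anchors[0]
        h2.removeChild(a)
        # appendChild detaches the node from its old parent first,
        # so this loop empties a while filling h2.
        while a.firstChild:
            h2.appendChild(a.firstChild)
        # Put the anchor where the header was, then nest the header in it.
        h2.parentNode.replaceChild(a, h2)
        a.appendChild(h2)
    return doc


if __name__ == "__main__":
    sample = "<body><h2><a name='A'>Agathas</a></h2></body>"
    print(invert_headings(minidom.parseString(sample)).documentElement.toxml())
```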

* xll_replace.py

A rather more involved demo.  This program reads in an XML file, and looks for XLL-type hyperlinks (see http://www.oasis-open.org/cover/xll.html for information on this remarkably powerful spec).

Warning: This script uses a very obsolete version of XLink.

When it finds such a link, it looks for the target XML document and parses it into a DOM node.  It doesn't support XPointer for document fragments yet, but with a decent XPointer processor, such as xptr (see below), you can add this yourself.  It then replaces the node that contained the link with the entire contents of the target document of that link.

For a good example, look at addr_book1.xml and then addr_book2.xml.  The former contains the following line:

 <ENTRY-LINK xlink:link="simple" xlink:href="addr_book2.xml"/>

If you run

"python xll_replace.py addr_book1.xml"

it will read in the addr_book2.xml file into a node, and replace the ENTRY-LINK node with the new one.  It will then print out the result, which should be self-explanatory.
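
The replacement step can be sketched in Python 3 with the standard library's xml.dom.minidom.  This is an illustration under stated assumptions rather than the 4DOM code: the function name replace_links and its load callback (which maps an href to XML text, so the example needs no files on disk) are invented here, and the file name b.xml is a placeholder.

```python
from xml.dom import minidom

# Namespace URI that addr_book1.xml binds to the xlink prefix.
XLINK_NS = "http://www.w3.org/XML/XLink/0.9"


def replace_links(doc, load):
    """Replace each empty element that carries xlink:link and xlink:href
    with the root element of the document whose text load(href) returns."""
    for elem in list(doc.getElementsByTagName("*")):
        # Only empty elements are treated as links, as in the demo.
        if elem.hasChildNodes():
            continue
        if not elem.getAttributeNS(XLINK_NS, "link"):
            continue
        href = elem.getAttributeNS(XLINK_NS, "href")
        if not href:
            continue
        target = minidom.parseString(load(href))
        # importNode makes a deep copy owned by the destination document.
        new_root = doc.importNode(target.documentElement, True)
        elem.parentNode.replaceChild(new_root, elem)
        target.unlink()
    return doc


if __name__ == "__main__":
    files = {"b.xml": '<ADDRBOOK><ENTRY ID="gn"/></ADDRBOOK>'}
    main = ('<ADDRBOOK xmlns:xlink="http://www.w3.org/XML/XLink/0.9">'
            '<ENTRY-LINK xlink:link="simple" xlink:href="b.xml"/></ADDRBOOK>')
    doc = replace_links(minidom.parseString(main), files.__getitem__)
    print(doc.documentElement.toxml())
```

As in the original demo, the whole root element of the target document is spliced in, so the result contains an ADDRBOOK nested inside an ADDRBOOK.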


If you need help with the demos, or any other help working with 4DOM,
please don't hesitate to ask on the mailing list: 4Suite@lists.fourthought.com.


PyXML-0.8.4/demo/dom/__init__.py
########################################################################
#
# File Name:            __init__.py
#
#
PyXML-0.8.4/demo/dom/addr_book.dtd
<!ELEMENT ADDRBOOK ((ENTRY | ENTRY-LINK)*)>
<!ELEMENT ENTRY (NAME, ADDRESS, PHONENUM*, EMAIL)>
<!ATTLIST ENTRY
    ID ID #REQUIRED
>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT ADDRESS (#PCDATA)>
<!ELEMENT PHONENUM (#PCDATA)>
<!ATTLIST PHONENUM
    DESC CDATA #REQUIRED
>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT ENTRY-LINK EMPTY>
<!ATTLIST ENTRY-LINK
    xml:link (simple|extended|group) #REQUIRED
    href CDATA #REQUIRED
>
PyXML-0.8.4/demo/dom/addr_book1.xml
<?xml version = "1.0"?>
<!DOCTYPE ADDRBOOK SYSTEM "addr_book.dtd">
<ADDRBOOK xmlns:xlink="http://www.w3.org/XML/XLink/0.9">
	<ENTRY ID="pa">
		<NAME>Pieter Aaron</NAME>
		<ADDRESS>404 Error Way</ADDRESS>
		<PHONENUM DESC="Work">404-555-1234</PHONENUM>
		<PHONENUM DESC="Fax">404-555-4321</PHONENUM>
		<PHONENUM DESC="Pager">404-555-5555</PHONENUM>
		<EMAIL>pieter.aaron@inter.net</EMAIL>
	</ENTRY>
	<ENTRY-LINK xlink:link="simple" xlink:href="addr_book2.xml"/>
	<ENTRY ID="en">
		<NAME>Emeka Ndubuisi</NAME>
		<ADDRESS>42 Spam Blvd</ADDRESS>
		<PHONENUM DESC="Work">767-555-7676</PHONENUM>
		<PHONENUM DESC="Fax">767-555-7642</PHONENUM>
		<PHONENUM DESC="Pager">800-SKY-PAGEx767676</PHONENUM>
		<EMAIL>endubuisi@spamtron.com</EMAIL>
	</ENTRY>
	<ENTRY ID="vz">
		<NAME>Vasia Zhugenev</NAME>
		<ADDRESS>2000 Disaster Plaza</ADDRESS>
		<PHONENUM DESC="Work">000-987-6543</PHONENUM>
		<PHONENUM DESC="Cell">000-000-0000</PHONENUM>
		<EMAIL>vxz@magog.ru</EMAIL>
	</ENTRY>
</ADDRBOOK>
PyXML-0.8.4/demo/dom/addr_book2.xml
<?xml version = "1.0"?>
<!DOCTYPE ADDRBOOK SYSTEM "addr_book.dtd">
<ADDRBOOK>
	<ENTRY ID="gn">
		<NAME>Gegbefuna Nwannem</NAME>
		<ADDRESS>666 Murtala Mohammed Blvd.</ADDRESS>
		<PHONENUM DESC="Home">999-101-1001</PHONENUM>
		<EMAIL>nwanneg@naija.ng</EMAIL>
	</ENTRY>
</ADDRBOOK>
PyXML-0.8.4/demo/dom/benchmark.py
# A DOM benchmark

import sys, time

from xml.dom import core, utils

def main():
    global L, doc
    if len(sys.argv) == 1:
        print 'Usage: benchmark.py <xml file>'
        sys.exit()

    filename = sys.argv[1]

    file = open(filename, 'r')
    size = len(file.read())
    file.close()

    print 'File %s is %iK in size' % (filename, size / 1024)

    start_time = time.time()
    doc = utils.FileReader( filename ).document
    end_time = time.time()
    print 'Building DOM tree:', end_time - start_time, 'sec'

    # Convert DOM tree back to XML
    start_time = time.time()
    xml = doc.toxml()
    end_time = time.time()
    print 'Serializing back to XML:', end_time - start_time, 'sec'

    # Time a complete getElementsByTagName()
    start_time = time.time()
    L = doc.getElementsByTagName("*")
    end_time = time.time()
    print 'getElementsByTagName("*"):', end_time - start_time, 'sec'
    print L[0].nodeName

if __name__ == '__main__': main()
PyXML-0.8.4/demo/dom/book_catalog1.xml
<?xml version="1.0"?>
 <!-- initially, the default namespace is "books" -->
 <book xmlns='urn:loc.gov:books'
       xmlns:isbn='urn:ISBN:0-395-36341-6'
       xmlns:fcat='http://FourThought.com/catalog'>
     <title>Cheaper by the Dozen</title>
     <isbn:number>1568491379</isbn:number>
     <notes fcat:ref='ITEM54321'>
       <!-- make HTML the default namespace for some commentary -->
       <p xmlns='urn:w3-org-ns:HTML'>
           This is a <i>funny</i> book!
       </p>
     </notes>
 </book>
PyXML-0.8.4/demo/dom/building.py
# This demo converts a few nested objects into an XML representation,
# and provides a simple example of using the Builder class.

from xml.dom import core
from xml.dom.builder import Builder

import types, time

def object_convert(builder, obj):

    # Put the entire object inside an element with the same name as
    # the class.
    builder.startElement( obj.__class__.__name__ )
    L = obj.__dict__.keys()
    L.sort()

    # Walk the attributes in sorted order so the output is deterministic
    for attr in L:

        # Skip internal attributes (ones that begin with a '_')
        if attr[0] == '_': continue

        value = getattr(obj, attr)
        if type(value) == types.InstanceType:
            # Recursively process subobjects
            object_convert( builder, value)

        else:
            # Convert anything else to a string and put it in an element
            builder.startElement(attr)
            builder.text( str(value) )
            builder.endElement(attr)

    builder.endElement( obj.__class__.__name__ )

if __name__ == '__main__':
    class Folder: pass
    class Bookmark: pass

    f=Folder()
    f.title = "Folder Title"
    f.createdTime = time.asctime( time.localtime( time.time() ) )
    f.bookmark = b = Bookmark()
    b.url, b.title = "http://www.python.org", "Python Home Page"

    builder = Builder()
    object_convert(builder, f)
    print "Output from two nested objects:"
    print builder.document.toxml()
PyXML-0.8.4/demo/dom/dom_from_html_file.py
"""Reads in an HTML file from the command line and pretty-prints it."""

from xml.dom.ext.reader import HtmlLib
from xml.dom import ext

def read_html_from_file(fileName):
    #build a DOM tree from the file
    reader = HtmlLib.Reader()
    dom_object = reader.fromUri(fileName)

    #strip any ignorable white-space in preparation for pretty-printing
    ext.StripHtml(dom_object)

    #pretty-print the node
    ext.PrettyPrint(dom_object)

    #reclaim the object
    reader.releaseNode(dom_object)


if __name__ == '__main__':
    import sys
    read_html_from_file(sys.argv[1])
PyXML-0.8.4/demo/dom/dom_from_xml_file.py
from xml.dom import ext
from xml.dom.ext.reader import PyExpat

def read_xml_from_file(fileName):
    #build a DOM tree from the file
    reader = PyExpat.Reader()
    xml_dom_object = reader.fromUri(fileName)

    ext.Print(xml_dom_object)

    #reclaim the object
    reader.releaseNode(xml_dom_object)

if __name__ == '__main__':
    import sys
    read_xml_from_file(sys.argv[1])
PyXML-0.8.4/demo/dom/domconv.py
# A simple library to convert DOM object structures to SGML or XML output,
# usually for xml2html conversion.

import sys,types,string,StringIO

SKIP=1       # Ignore the element and its contents
STRIP=2      # Ignore the element, but process its contents
ID=3         # Identity transform
MAP=4        # Arg: (elem,hash). Map element to elem, map attrs using hash.

def escape_markup(str):
    """Takes a string and escapes all '<'s and quotes in it with character
    entity references."""
    str=string.replace(str,"<","&#60;")
    return string.replace(str,'"',"&#34;")

def convert(rootnode,spec,writer=sys.stdout):
    """Takes a DOM node, a conversion specification and a file-like object
    to write the converted data to, and performs the actual conversion.
    The spec hashtable must map element names to (action,arg) tuples, where
    action must be one of the constants at the top of this file. arg is only
    used for MAP, where it must be a tuple (elementname,maphash) where the
    elementname is the name of the element to substitute for the original
    one, and maphash is a hashtable that maps attribute names to either the
    attribute name to substitute or a function that takes the attribute value
    and returns the string to replace the entire attr='val' sequence with.
    """

    try:
        (action,arg)=spec[rootnode.GI]
    except KeyError:
        action=STRIP

    if action==SKIP:
        return
    elif action==STRIP:
        pass
    elif action==ID:
        writer.write("<" + rootnode.GI)
        for (name,val) in rootnode.attributes.items():
            writer.write(" %s='%s'" % (name,escape_markup(val)))
        writer.write(">")
    elif action==MAP:
        writer.write("<" + arg[0])
        for (name,val) in rootnode.attributes.items():
            if arg[1].has_key(name):
                map=arg[1][name]
                if type(map)==types.StringType:
                    writer.write(" %s=\"%s\"" % (map,escape_markup(val)))
                else:
                    writer.write(map(escape_markup(val)))

        writer.write(">")

    for child in rootnode.getChildren():
        if child.GI=="#PCDATA":
            writer.write(escape_markup(child.data))
        else:
            convert(child,spec,writer)

    if action==ID:
        writer.write("</%s>" % rootnode.GI)
    elif action==MAP:
        writer.write("</%s>" % arg[0])

def convert_str(rootnode,spec):
    obj=StringIO.StringIO()
    convert(rootnode,spec,obj)
    return obj.getvalue()
PyXML-0.8.4/demo/dom/employee_table.html
<HTML>
  <HEAD>
    <TITLE>
      FourThought Employee List
    </TITLE>
  </HEAD>
  <BODY>
    <TABLE BORDER='1'>
      <TBODY>
        <TR>
	  <TH>
	    Last Name, First Name
	  </TH>
	  <TH>
	    Email address
	  </TH>
	  <TH>
	    Extension
	  </TH>
	  <TH>
	    Department
	  </TH>
        </TR>
        <TR>
	  <TD>
	    Butte, Brian
	  </TD>
	  <TD>
	    <A HREF='mailto:Brian.Butte@fourthought.com'>Brian.Butte@fourthought.com</A>
	  </TD>
	  <TD>
	    x1111
	  </TD>
	  <TD>
	    1028
	  </TD>
        </TR>
        <TR>
	  <TD>
	    Ogbuji, Uche
	  </TD>
	  <TD>
	    <A HREF='mailto:Uche.Ogbuji@fourthought.com'>Uche.Ogbuji@fourthought.com</A>
	  </TD>
	  <TD>
	    x1112
	  </TD>
	  <TD>
	    1029
	  </TD>
        </TR>
        <TR>
	  <TD>
	    <A HREF='/~molson'>Olson, Mike</A>
	  </TD>
	  <TD>
	    <A HREF='mailto:Mike.Olson@fourthought.com'>Mike.Olson@fourthought.com</A>
	  </TD>
	  <TD>
	    x1113
	  </TD>
	  <TD>
	    1028
	  </TD>
        </TR>
        <TR>
	  <TD>
	    Roberts, Rich
	  </TD>
	  <TD>
	    <A HREF='mailto:Rich.Roberts@fourthought.com'>Rich.Roberts@fourthought.com</A>
	  </TD>
	  <TD>
	    x1114
	  </TD>
	  <TD>
	    1029
	  </TD>
        </TR>
      </TBODY>
    </TABLE>
  </BODY>
</HTML>
PyXML-0.8.4/demo/dom/generate_html1.py
"""
A basic example of using the DOM to create an HTML document from scratch.
Also demonstrates creation of HTML forms
"""

from xml.dom import ext
from xml.dom import implementation

if __name__ == '__main__':

    #create a concrete HTMLDocument instance.
    doc = implementation.createHTMLDocument('A Basic HTML Document')

    #add in body
    doc.body = doc.createElement('Body')

    #Create a form
    form = doc.createElement('Form')

    #Create some text.  Note: every character is represented in some
    #DOM object.  All text (even between tags) is in a text node
    t = doc.createTextNode('Employee Name:')

    #Create an input tag
    i = doc.createElement('Input')

    #All elements can have attributes directly set
    i.setAttribute('TYPE','TEXT')

    #Some have helper functions defined.
    #This one sets the SIZE attribute to 20
    #Note that the argument must be a string.  4DOM closely
    #follows the DOM spec for the type of the arguments, even
    #when the spec is inconsistent or counter-intuitive
    i.size = '20'

    #This sets the NAME attribute
    i.name = 'EmployeeName'

    #Set the form's ACTION attribute
    form.action = '/cgi-local/test.py'

    #this inserts i as the last child in the form
    form.appendChild(i)

    #Insert t before i in form's child list
    form.insertBefore(t,i)

    #add the form to the document's body.  Note that you can't
    #add child elements directly to the document.
    doc.body.appendChild(form)

    #This prints out the text representation of the HTML document
    ext.PrettyPrint(doc)
PyXML-0.8.4/demo/dom/generate_xml1.py
"""
A basic example of using the DOM to create an XML document from scratch.
"""


from xml.dom import ext
from xml.dom import implementation

if __name__ == '__main__':

    #Create a doctype using document type name, sysid and pubid
    dt = implementation.createDocumentType('mydoc', '', '')

    #Create a document using document element namespace URI, doc element
    #name and doctype.  This automatically creates a document element
    #which is the single element child of the document
    doc = implementation.createDocument('', 'mydoc', dt)

    #Get the document element
    doc_elem = doc.documentElement

    #Create an element: the Document instance acts as a factory
    new_elem = doc.createElementNS('', 'spam')

    #Create an attribute on the new element
    new_elem.setAttributeNS('', 'eggs', 'sunnysideup')

    #Create a text node
    new_text = doc.createTextNode('some text here...')

    #Add the new text node to the new element
    new_elem.appendChild(new_text)

    #Add the new element to the document element
    doc_elem.appendChild(new_elem)

    #Print out the resulting document
    #(ext is xml.dom.ext, imported at the top of this file)
    ext.Print(doc)
PyXML-0.8.4/demo/dom/html2html
#!/usr/bin/python
#
# This example program converts a chunk of HTML to a DOM tree.
# It then prints the tree as HTML, as XML, and it prints a list of all
# the hyperlinks in the document by using getElementsByTagName() to
# retrieve all the A elements.

from xml.dom.html_builder import HtmlBuilder
from xml.dom.writer import HtmlWriter
from xml.dom import core

HTML_DATA = """<HTML>
<HEAD><TITLE>Les HOWTO Linux</TITLE></HEAD>
<BODY>
<HR> <H1>Les HOWTO Linux</H1>
<P>Les Howto que vous trouverez ci-dessous sont en fran&ccedil;ais. 
Ils peuvent etre trouv&eacute;s dans les formats suivants
sur le site 
<A HREF="ftp://ftp.lip6.fr/pub/linux/french/docs/HOWTO">ftp.lip6.fr</a> 
dans le r&eacute;pertoire /pub/linux/french/docs/HOWTO :
<UL>
<LI><A HREF="Access-HOWTO.html">Access-HOWTO</A> (Version <A
HREF="Access-HOWTO.ps">Postscript</A>)</LI>
</UL></BODY></HTML>
"""

# Construct an HtmlBuilder object and feed the data to it
b = HtmlBuilder()
b.feed(HTML_DATA)

# Get the newly-constructed document object 
doc = b.document

# Output it as HTML
print "============"
print "HTML version"
w = HtmlWriter()
w.write(b.document)

# Output it as XML
print "\n==========="
print "XML version"
print doc.toxml()

print "\n==========="
print "Links in the document"

# Retrieve all the link objects
links = doc.getElementsByTagName('A')
for node in links:
    # Collect any children of the A element that are Text nodes
    # (Note that this won't work on invalid HTML, like
    # <a href="xxx"><b>Text</b></a>.  You could fix this by actually
    # traversing all the child nodes of the A element.)

    linktext = ""
    for child in node.childNodes:
        if child.nodeType == core.TEXT_NODE:
            linktext = linktext + child.value
    
    # Get the HREF attribute, if present
    url = node.getAttribute('HREF')
    if  url != "":
        print "HREF=", url, linktext
            
print links

PyXML-0.8.4/demo/dom/iterator1.py
"""Demonstrates basic walking using DOM level 2 iterators"""

from xml.dom.ext.reader import PyExpat
from xml.dom.NodeFilter import NodeFilter

def Iterate(xml_dom_object):
    print "Printing all nodes:"
    nit = xml_dom_object.ownerDocument.createNodeIterator(xml_dom_object, NodeFilter.SHOW_ALL, None, 0)

    curr_node =  nit.nextNode()
    while curr_node:
        print "%s node %s\n"%(curr_node.nodeType, curr_node.nodeName)
        curr_node =  nit.nextNode()

    print "\n\n\nPrinting only element nodes:"
    snit = xml_dom_object.ownerDocument.createNodeIterator(xml_dom_object, NodeFilter.SHOW_ELEMENT, None, 0)

    curr_node =  snit.nextNode()
    while curr_node:
        print "%s node %s\n"%(curr_node.nodeType, curr_node.nodeName)
        curr_node = snit.nextNode()


if __name__ == '__main__':
    import sys
    reader = PyExpat.Reader()
    xml_dom_object = reader.fromUri(sys.argv[1])
    Iterate(xml_dom_object)
    reader.releaseNode(xml_dom_object)
PyXML-0.8.4/demo/dom/link_title_invert.py
from xml.dom import Node, ext
from xml.dom.ext.reader import PyExpat

test_doc = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>LADIES</title></head>
<body>
<h1>LADIES</h1>
<h2><a name="A">Agathas</a></h2>
Four and forty lovers had Agathas in the old days,...
<h2><a name="B">Young Lady</a></h2>
I have fed your lar with poppies,...
<h2><a name="C">Lesbia Illa</a></h2>
Memnon, Memnon, that lady...
</body>
</html>
"""

def link_title_invert():
    #build a DOM tree from the file
    reader = PyExpat.Reader()
    doc = reader.fromString(test_doc)

    h2_elements = doc.getElementsByTagNameNS('http://www.w3.org/1999/xhtml', 'h2')
    for e in h2_elements:
        parent = e.parentNode
        a_list = filter(lambda x: (x.nodeType == Node.ELEMENT_NODE) and (x.localName == 'a'), e.childNodes)
        a = a_list[0]
        e.removeChild(a)
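        #Move a's children into e; appendChild also removes each
        #child from a, so loop until a is empty rather than iterating
        #over a list that shrinks underneath us
        while a.firstChild:
            e.appendChild(a.firstChild)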
        parent.replaceChild(a, e)
        a.appendChild(e)

    ext.Print(doc)

    #reclaim the object; not necessary with Python 2.0
    reader.releaseNode(doc)

if __name__ == '__main__':
    import sys
    link_title_invert()
PyXML-0.8.4/demo/dom/trace_ns.py
'''
Walk through a namespace-compliant XML file and print out the
namespaces of all elements and attributes in document order
'''

from xml.dom.ext.reader import PyExpat
from xml.dom.NodeFilter import NodeFilter

def TraceNs(doc):
    snit = doc.createNodeIterator(doc, NodeFilter.SHOW_ELEMENT, None, 0)
    curr_elem = snit.nextNode()
    while curr_elem:
        print "Current Element", curr_elem.nodeName
        #FIXME: put a GetDefaultNs method into Ext
        #ns = Namespace.GetDefaultNs(curr_elem)
        #print "\tDefault NS\t", ns
        print "\t"+curr_elem.nodeName+"\t\t", curr_elem.namespaceURI

        header_printed = 0

        for k in curr_elem.attributes.keys():
            if curr_elem.attributes[k].namespaceURI:
                if not header_printed:
                    header_printed = 1
                    print "\tAttributes"
                print "\t\t"+curr_elem.attributes[k].nodeName+"\t", curr_elem.attributes[k].namespaceURI

        print
        curr_elem = snit.nextNode()


if __name__ == "__main__":
    import sys
    reader = PyExpat.Reader()
    doc = reader.fromUri(sys.argv[1])
    TraceNs(doc)
    reader.releaseNode(doc)
PyXML-0.8.4/demo/dom/visitor1.py
"""Demonstrates basic, pre-order DOM walking using the default, bare-bones visitor"""

from xml.dom.ext.reader import PyExpat
from xml.dom import Node
from xml.dom.ext import Visitor
from xml.dom.ext.reader import Sax2
from xml.dom.ext import ReleaseNode

class NsVisitor(Visitor.Visitor):
    def visit(self, node):
        print "Node %s namespaceURI: '%s' qualified name: '%s' localName: '%s' prefix: '%s'\n"%(str(node), node.namespaceURI, node.nodeName, node.localName, node.prefix)
        if node.nodeType == Node.ELEMENT_NODE:
            for k in node.attributes.keys():
                print "Node %s namespaceURI: '%s' qualified name: '%s' localName: '%s' prefix: '%s'\n"%(str(node.attributes[k]), node.attributes[k].namespaceURI, node.attributes[k].nodeName, node.attributes[k].localName, node.attributes[k].prefix)
        return None


def Walk(xml_dom_object):
    visitor = Visitor.Visitor()
    walker = Visitor.Walker(visitor, xml_dom_object)
    walker.run()

    visitor = NsVisitor()
    walker = Visitor.Walker(visitor, xml_dom_object)
    walker.run()


if __name__ == '__main__':
    import sys
    reader = PyExpat.Reader()
    xml_dom_object = reader.fromUri(sys.argv[1])
    Walk(xml_dom_object)
    reader.releaseNode(xml_dom_object)
PyXML-0.8.4/demo/dom/xll_replace.py
"""
Demonstrates some advanced DOM manipulation.
This function looks for simple XLinks and replaces the node containing
such links with the contents of the referenced document.
"""

from xml.dom import Node
from xml.dom.NodeFilter import NodeFilter
from xml.dom import ext
from xml.dom.ext.reader import PyExpat

def XllReplace(start_node):
    reader = PyExpat.Reader()
    owner_doc = start_node.ownerDocument
    snit = owner_doc.createNodeIterator(start_node, NodeFilter.SHOW_ELEMENT, None, 0)
    curr_node = snit.nextNode()
    while curr_node:
        #Only empty nodes are allowed to have Links
        if not curr_node.childNodes.length and curr_node.attributes:
            is_link = 0
            href = None
            for k in curr_node.attributes.keys():
                if (curr_node.attributes[k].localName, curr_node.attributes[k].namespaceURI) == ("link", "http://www.w3.org/XML/XLink/0.9"):
                    is_link = 1
                elif (curr_node.attributes[k].localName, curr_node.attributes[k].namespaceURI) == ("href", "http://www.w3.org/XML/XLink/0.9"):
                    href = curr_node.attributes[k].value
            if is_link and href:
                #Then make a tree of the new file and insert it
                f = open(href, "r")
                st = f.read()
                f.close()
                new_df = reader.fromString(st, ownerDoc=start_node.ownerDocument)

                #Get the first element node and assume it's the document node
                for a_node in new_df.childNodes:
                    if a_node.nodeType == Node.ELEMENT_NODE:
                        doc_root = a_node
                        break
                curr_node.parentNode.replaceChild(doc_root, curr_node)
        curr_node = snit.nextNode()

    return start_node

if __name__ == "__main__":
    import sys
    reader = PyExpat.Reader()
    xml_dom_tree = reader.fromUri(sys.argv[1])
    XllReplace(xml_dom_tree)
    ext.PrettyPrint(xml_dom_tree)
    reader.releaseNode(xml_dom_tree)
PyXML-0.8.4/demo/dom/xpointer_query.py
"""Demonstrates using the xptr.py tool to query DOM Nodes using the XPointer spec"""

from xml.dom import ext
from xml.dom.ext.reader import Sax2
from xml.sax import SAXException, SAXParseException
import xptr


if __name__ == '__main__':
    import sys

    xpointer_expr = sys.argv[1]

    try:
        xml_dom_object = Sax2.FromXmlUrl(sys.argv[2], validate=0)
    except SAXParseException, msg:
        # Caught first because it is a subclass of SAXException
        print "SAXParseException caught:", msg
        sys.exit(1)
    except SAXException, msg:
        print "SAXException caught:", msg
        sys.exit(1)

    result_node = xptr.LocateNode(xml_dom_object, xpointer_expr)
    ext.StripXml(result_node)
    ext.PrettyPrint(result_node)
    ext.ReleaseNode(result_node)
PyXML-0.8.4/demo/dom/xptr.py
"""
This is an experimental implementation of the XPointer locator language.

Version 0.20 - 23.Aug.98
   Lars Marius Garshol - larsga@ifi.uio.no
   http://www.stud.ifi.uio.no/~larsga/download/python/xml/xptr.html

Changes since version 0.10:
 - 'id' locator term implemented
 - 'attr' locator term implemented
 - node type qualifiers implemented
 - 'origin' locator term implemented

Modified by Uche Ogbuji 25.Jan.99 to work with 4DOM.
Modified by Uche Ogbuji 18.Nov.99 to work with the emerging Python/DOM binding 4DOM.  Distributed with permission.
"""

import re,string,sys
from xml.dom import Node
from xml.dom import ext

# Spec deviations:
# - html keyword not supported
# - negative instance numbers not supported
# - #cdata node type selector not supported
# - * for attribute values/names not supported
# - preceding keyword not supported
# - span keyword unsupported
# - support 'string' location terms

# Spec questions
# - what if locator fails?
# - what to do with "span(...).child(1)"?
# - how to continue from a set of selected nodes?
# - attr: error if does not use element as source?
# - should distinguish between semantic errors and failures?
# - can string terms locate inside attr vals?
# - are the string loc semantics a bit extreme? perhaps restrict to one node?
# - how to represent span and string results in terms of the DOM?

# Global variables

version="0.20"
specver="WD-xptr-19980303"

# Useful regular expressions

reg_sym=re.compile("[a-z]+|\\(|\\)|\\.|[-+]?[1-9][0-9]*|[A-Za-z_:][\-A-Za-z_:.0-9]*|,|#[a-z]+|\\*|\"[^\"]*\"|'[^']*'")
reg_sym_param=re.compile(",|\)|\"|'")
reg_name=re.compile("[A-Za-z_:][\-A-Za-z_:.0-9]*")

# Some exceptions

class XPointerException(Exception):
    "Means something went wrong when attempting to follow an XPointer."
    pass

class XPointerParseException(XPointerException):
    "Means the XPointer was syntactically invalid."

    def __init__(self,msg,pos):
        self.__msg=msg
        self.__pos=pos

    def get_pos(self):
        return self.__pos

    def __str__(self):
        return self.__msg % self.__pos

class XPointerFailedException(XPointerException):
    "Means the XPointer was logically invalid."
    pass

class XPointerUnsupportedException(XPointerException):
    "Means the XPointer used unsupported constructs."
    pass

# Simple XPointer lexical analyzer

class SymbolGenerator:
    "Chops XPointers up into distinct symbols."

    def __init__(self,xpointer):
        self.__data=xpointer
        self.__pos=0
        self.__last_was_param=0
        self.__next_is=""

    def get_pos(self):
        "Returns the current position in the string."
        return self.__pos

    def more_symbols(self):
        "True if there are more symbols in the XPointer."
        return self.__pos<len(self.__data) or self.__next_is!=""

    def next_symbol(self):
        "Returns the next XPointer symbol."
        if self.__next_is!="":
            tmp=self.__next_is
            self.__next_is=""
            return tmp

        if self.__last_was_param:
            self.__last_was_param=0
            sym=""
            count=0

            while self.more_symbols():
                n=self.next_symbol()
                if n=='"' or n=="'":
                    pos=string.find(self.__data,n,self.__pos)
                    if pos==-1:
                        raise XPointerParseException("Unmatched "+n+
                                                     " at %d",self.__pos)
                    sym=self.__data[self.__pos-1:pos+1]
                    self.__pos=pos+1
                elif n=="(":
                    count=count+1
                elif n==")":
                    count=count-1
                    if count<0:
                        if sym=="":
                            return ")"
                        else:
                            self.__next_is=")"
                            return sym
                elif n=="," and count==0:
                    self.__last_was_param=1
                    self.__next_is=","
                    return sym

                sym=sym+n

        mo=reg_sym.match(self.__data,self.__pos)
        if mo==None:
            raise XPointerParseException("Invalid symbol at position %d",
                                         self.__pos)

        self.__pos=self.__pos+len(mo.group(0))

        self.__last_was_param= mo.group(0)=="("
        return mo.group(0)

# Simple XPointer parser

class XPointerParser:
    """Simple XPointer parser that walks an XPointer and fires an event
    (a call to one of the handle_* methods) for each location term and
    its parameters."""

    def __init__(self,xpointer):
        self.__sgen=SymbolGenerator(xpointer)
        self.__first_term=1
        self.__prev=None

    def __skip_over(self,symbol):
        if self.__sgen.next_symbol()!=symbol:
            raise XPointerParseException("Expected '"+symbol+"' at %s",
                                         self.__sgen.get_pos())

    def __is_valid(self,symbol,regexp):
        mo=regexp.match(symbol)
        return mo!=None and len(mo.group(0))==len(symbol)

    def __parse_instance_or_all(self,iora):
        if iora!="all":
            try:
                return int(iora)
            except ValueError,e:
                raise XPointerParseException("Expected number or 'all' at %s",
                                             self.__sgen.get_pos())
        else:
            return "all"

    def parse(self):
        "Runs through the entire XPointer, firing events."
        sym="."
        while sym==".":
            name=self.__sgen.next_symbol()

            if name=="(":
                name=""   # Names can be defaulted
            else:
                self.__skip_over("(")

            sym=self.__sgen.next_symbol()
            if sym!=")":
                params=[sym]
                sym=self.__sgen.next_symbol()
            else:
                params=[]

            while sym==",":
                params.append(self.__sgen.next_symbol())
                sym=self.__sgen.next_symbol()

            if sym!=")":
                raise XPointerParseException("Expected ')' at %s",
                                             self.__sgen.get_pos())

            self.dispatch_term(name,params)

            if self.__sgen.more_symbols():
                sym=self.__sgen.next_symbol()
            else:
                return

        # If the XPointer ends correctly, we'll return from the if above
        raise XPointerParseException("Expected '.' at %s",
                                     self.__sgen.get_pos())

    def dispatch_term(self,name,params):
        """Called when a term is encountered to analyze it and fire more
        detailed events."""
        if self.__first_term:
            if name=="root" or name=="origin" or name=="id" or name=="html":
                if name=="root" or name=="origin":
                    if len(params)!=0:
                        raise XPointerParseException(name+" terms have no "
                                                     "parameters (at %s)",
                                                     self.__sgen.get_pos())
                    else:
                        param=None
                elif name=="id" or name=="html":
                    if len(params)!=1:
                        raise XPointerParseException(name+" terms require one "
                                                     "parameter (at %s)",
                                                     self.__sgen.get_pos())
                    else:
                        param=params[0]
                        # XXX Validate parameter

                self.__first_term=0
                self.handle_abs_term(name,param)
                return
            else:
                self.handle_abs_term("root",None)
        else:
            if name=="" and self.__prev!=None:
                name=self.__prev

        if name=="child" or name=="ancestor" or name=="psibling" or \
           name=="fsibling" or name=="descendant" or name=="following" or \
           name=="preceding":
            self.parse_rel_term(name,params)
        elif name=="span":
            self.parse_span_term(params)
        elif name=="attr":
            self.parse_attr_term(params)
        elif name=="string":
            self.parse_string_term(params)
        else:
            raise XPointerParseException("Illegal term type "+name+\
                                         " at %s",self.__sgen.get_pos())

        self.__prev=name

    def parse_rel_term(self,name,params):
        "Parses the arguments of relative location terms and fires the event."
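        if len(params)==0:
            raise XPointerParseException(name+" terms require an instance "
                                         "number or 'all' (at %s)",
                                         self.__sgen.get_pos())
        no=self.__parse_instance_or_all(params[0])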

        if len(params)>1:
            type=params[1]
            if not (type=="#element" or type=="#pi" or type=="#comment" or \
                    type=="#text" or type=="#cdata" or type=="#all" or \
                    self.__is_valid(type,reg_name)):
                raise XPointerParseException("Invalid type at %s",
                                             self.__sgen.get_pos())
        else:
            type="#element"

        attrs=[]
        ix=2
        while ix+1<len(params):
            if not self.__is_valid(params[ix],reg_name):
                raise XPointerParseException("Not a valid name at %s",
                                             self.__sgen.get_pos())

            attrs.append((params[ix],params[ix+1]))
            ix=ix+2

        self.handle_rel_term(name,no,type,attrs)

    def parse_span_term(self,params):
        "Parses the arguments of the span term and fires the event."
        raise XPointerUnsupportedException("'span' keyword unsupported.")

    def parse_attr_term(self,params):
        "Parses the argument of the attr term and fires the event."
        if len(params)!=1:
            raise XPointerParseException("'attr' location terms must have "
                                         "exactly one parameter (at %s)",
                                         self.__sgen.get_pos())

        if not self.__is_valid(params[0],reg_name):
            raise XPointerParseException("'"+params[0]+"' is not a valid "
                                         "attribute name at %s",
                                         self.__sgen.get_pos())

        self.handle_attr_term(params[0])

    def parse_string_term(self,params):
        "Parses the argument of the string term and fires the event."
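        if len(params)==0:
            raise XPointerParseException("'string' terms require an instance "
                                         "number or 'all' (at %s)",
                                         self.__sgen.get_pos())
        no=self.__parse_instance_or_all(params[0])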

        if len(params)>1:
            skiplit=params[1]
        else:
            skiplit=None

        if len(params)>2:
            if params[2]=="end":
                pos="end"
            else:
                try:
                    pos=int(params[2])
                except ValueError,e:
                    raise XPointerParseException("Expected number at %s",
                                                 self.__sgen.get_pos())

                if pos==0:
                    raise XPointerParseException("0 is not an acceptable "
                                                 "value at %s",
                                                 self.__sgen.get_pos())
        else:
            pos=None

        if len(params)>3:
            try:
                length=int(params[3])
            except ValueError,e:
                raise XPointerParseException("Expected number at %s",
                                             self.__sgen.get_pos())
        else:
            length=0

        self.handle_string_term(no,skiplit,pos,length)

    # Event methods to be overridden

    def handle_abs_term(self,name,param):
        "Called to handle absolute location terms."
        pass

    def handle_rel_term(self,name,no,type,attrs):
        "Called to handle relative location terms."
        pass

    def handle_attr_term(self,attr_name):
        "Called to handle 'attr' location terms."
        pass

    def handle_span_term(self,frm,to):
        "Called to handle 'span' location terms."
        pass

    def handle_string_term(self,no,skiplit,pos,length):
        "Called to handle 'string' location terms."
        pass

# ----- XPointer implementation that navigates a DOM tree

# Iterator classes

class DescendantIterator:

    def __init__(self):
        self.stack=[]

    def __call__(self,node):
        next=node.firstChild
        if next==None:
            next=node.nextSibling

        while next==None:
            if self.stack==[]:
                raise XPointerFailedException("No matching node")
            next=self.stack[-1].nextSibling
            del self.stack[-1]

        self.stack.append(next)
        return next

class FollowingIterator:

    def __init__(self):
        self.seen_hash={}
        self.skip_child=0

    def __call__(self,node):
        if not self.skip_child:
            next=node.firstChild
        else:
            self.skip_child=0
            next=None

        if next==None:
            next=node.nextSibling
        if next==None:
            next=node.parentNode
            self.skip_child=1   # Don't go down, we've been there :-)

            # Stop when we climb out of the tree or reach the document node
            if next==None or next.nodeType==Node.DOCUMENT_NODE:
                raise XPointerFailedException("No matching node")

            # Nodes are tracked by object identity
            if self.seen_hash.has_key(id(next)):
                next=node.nextSibling
                prev=node

                while next==None:
                    next=prev.parentNode
                    self.skip_child=1   # Don't go down, we've been there :-)
                    prev=next
                    if next==None or next.nodeType==Node.DOCUMENT_NODE:
                        raise XPointerFailedException("No matching node")
                    if self.seen_hash.has_key(id(next)):
                        next=prev.nextSibling
                        if next!=None:
                            self.skip_child=0
            else:
                # We're above all the nodes we've looked at. Throw out the
                # hashed objects.
                self.seen_hash.clear()

        self.seen_hash[id(next)]=1
        return next

# The implementation itself

class XDOMLocator(XPointerParser):
    def __init__(self, xpointer, document):
        XPointerParser.__init__(self, xpointer)
        self.__node=document
        self.__first=1
        self.__prev=None

    def __node_matches(self,node,type,attrs):
        "Checks whether a DOM node matches a foo(2,SECTION,ID,I5) selector."
        if type==node.nodeName or \
           (type=="#element" and node.nodeType == Node.ELEMENT_NODE) or \
           (type=="#pi"      and node.nodeType == Node.PROCESSING_INSTRUCTION_NODE) or \
           (type=="#comment" and node.nodeType == Node.COMMENT_NODE) or \
           (type=="#text"    and node.nodeType == Node.TEXT_NODE) or \
           (type=="#cdata"   and node.nodeType == Node.CDATA_SECTION_NODE) or \
           type=="#all":
            if attrs!=None:
                for (a,v) in attrs:
                    try:
                        if v!=node.getAttribute(a):
                            return 0
                    except KeyError,e:
                        return 0

            return 1
        else:
            return 0

    def __get_node(self,no,type,attrs,iterator):
        """General method that iterates through the tree calling the iterator
        on the current node for each step to get the next node."""
        count=0
        current=iterator(self.__node)

        while current!=None:
            if self.__node_matches(current,type,attrs):
                count=count+1
                if count==no:
                    return current

            current=iterator(current)

        raise XPointerFailedException("No matching node")

    def __get_child(self,no,type,attrs):
        if type==None:
            candidates = self.__node.childNodes
        else:
            candidates = []

            for obj in self.__node.childNodes:
                if self.__node_matches(obj,type,attrs):
                    candidates.append(obj)
        try:
            return candidates[no-1]
        except IndexError,e:
            raise XPointerFailedException("No matching node")

    def get_node(self):
        "Returns the located node."
        return self.__node

    def handle_abs_term(self,name,param):
        "Called to handle absolute location terms."
        if name=="root":
            if self.__node.nodeType != Node.DOCUMENT_NODE:
                raise XPointerFailedException("Expected document node")
            self.__node=self.__node.documentElement
        elif name=="origin":
            pass # Just work from current node
        elif name=="id":
            self.__node=ext.GetElementById(self.__node, param)
        elif name=="html":
            raise XPointerUnsupportedException("Term type 'html' unsupported.")

    def handle_rel_term(self,name,no,type,attrs):
        "Called to handle relative location terms."
        if name=="child":
            next=self.__get_child(no,type,attrs)
        elif name=="ancestor":
            next=self.__get_node(no,type,attrs,DOM.Node._get_parentNode)
        elif name=="psibling":
            next=self.__get_node(no,type,attrs,DOM.Node._get_previousSibling)
        elif name=="fsibling":
            next=self.__get_node(no,type,attrs,DOM.Node._get_nextSibling)
        elif name=="descendant":
            next=self.__get_node(no,type,attrs,DescendantIterator())
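        elif name=="following":
            next=self.__get_node(no,type,attrs,FollowingIterator())
        else:
            # The parser also accepts 'preceding', which is not implemented
            raise XPointerUnsupportedException("'"+name+"' location terms "
                                               "not supported")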

        self.__node=next
        self.__prev=name

    def handle_attr_term(self, attr_name):
        if self.__node.nodeType != Node.ELEMENT_NODE:
            raise XPointerFailedException("'attr' location term used from "
                                          "non-element node")

        if not self.__node.attributes.has_key(attr_name):
            raise XPointerFailedException("Non-existent attribute '%s' located"
                                          " by 'attr' term" % attr_name)

        self.__node=self.__node.attributes.getNamedItem(attr_name)

    def handle_string_term(self,no,skiplit,pos,length):
        raise XPointerUnsupportedException("'string' location terms not "
                                           "supported")


def LocateNode(node, xpointer):
    try:
        xp=XDOMLocator(xpointer, node)
        xp.parse()
        return xp.get_node()
    except XPointerParseException,e:
        print "ERROR: "+str(e)
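
The term-by-term structure that XPointerParser.parse() walks through can be sketched on its own. The helper below is hypothetical and is not the module's SymbolGenerator: it assumes a current Python 3, ignores quoted strings and nested parentheses, and only shows how a dotted XPointer breaks into (name, parameters) pairs, including a defaulted (empty) term name.

```python
import re

# Hypothetical helper, not part of the module above.
# One term looks like "name(params)"; the name may be empty (defaulted).
_TERM = re.compile(r"([a-z]*)\(([^()]*)\)")


def split_terms(xpointer):
    """Split a dotted XPointer into a list of (name, [params]) pairs.

    Simplification: quoted strings and nested parentheses are not handled.
    """
    terms = []
    pos = 0
    while True:
        mo = _TERM.match(xpointer, pos)
        if mo is None:
            raise ValueError("Invalid term at position %d" % pos)
        name, raw = mo.group(1), mo.group(2)
        # An empty parameter list yields [], not [""]
        params = [p.strip() for p in raw.split(",")] if raw.strip() else []
        terms.append((name, params))
        pos = mo.end()
        if pos == len(xpointer):
            return terms
        # Terms must be separated by a single dot
        if xpointer[pos] != ".":
            raise ValueError("Expected '.' at position %d" % pos)
        pos += 1


terms = split_terms("root().child(2,SECTION).(1,#text)")
```

Here the third term has an empty name, which the parser above would resolve to the previous term's name ("child").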
PyXML-0.8.4/demo/genxml/README

This example demonstrates how to generate XML from non-XML data
sources.  This example is based directly on an example presented by
Tom Gavin and Joseph E. Hughes at the August 1999 Washington DC
SGML/XML User's Group meeting.  PowerPoint slides containing the
original DOM-based solution in Java are available at
http://www.eccnet.com/sgmlug/.

Since the specifics of reading other data formats vary greatly, this
example will use a simple comma-separated-value format similar to that
found as an "export" format for many applications which work with
tabular data.  A sample data file is contained in data.txt.
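
As a rough illustration of the input side, the sketch below reads records in the data.txt layout and labels each one, treating a person whose employee number equals their manager number as a manager (the rule loaddata.py applies). It assumes a current Python 3; read_records and SAMPLE are names made up for this sketch, and skipping blank lines is an addition the original script does not make.

```python
import io

# Same layout as data.txt: last name, first name, employee id, manager id
SAMPLE = """lname,fname,emp,manager
Jones,Tom,1111,1111
Smith,John,2222,1111
Doe,Jane,3333,1111
"""


def read_records(fp):
    """Return a list of (lname, fname, kind) tuples from a CSV-like file."""
    fp.readline()  # ignore the line of field names
    records = []
    for line in fp:
        if not line.strip():
            continue  # tolerate blank lines (an assumption of this sketch)
        lname, fname, eid, mid = [part.strip() for part in line.split(",")]
        kind = "manager" if eid == mid else "employee"
        records.append((lname, fname, kind))
    return records


records = read_records(io.StringIO(SAMPLE))
```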

The loaddata.py script demonstrates three different approaches to XML
generation: DOM-based, SAX-based, and <file>.write()-based.  The first 
two approaches are specific to generating XML, while the third could
be used to generate any format.  It is interesting to note the
differences in code size to get roughly the same output using each of
the three approaches.
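
To give a feel for the difference in code size, here is a minimal sketch of the DOM-based and write()-based approaches side by side. It is not the code from loaddata.py: it assumes a current Python 3 and substitutes the standard library's xml.dom.minidom and xml.sax.saxutils.escape for the PyXML modules, and it omits the pretty-printing whitespace.

```python
from xml.dom.minidom import getDOMImplementation
from xml.sax.saxutils import escape

RECORDS = [("Jones", "Tom", "manager"), ("Smith", "John", "employee")]


def build_with_dom(records):
    """Build the whole tree in memory, then serialize it in one step."""
    doc = getDOMImplementation().createDocument(None, "personnel", None)
    root = doc.documentElement
    for lname, fname, kind in records:
        emp = doc.createElement("employee")
        emp.setAttribute("type", kind)
        for tag, value in (("lname", lname), ("fname", fname)):
            child = doc.createElement(tag)
            child.appendChild(doc.createTextNode(value))
            emp.appendChild(child)
        root.appendChild(emp)
    return root.toxml()


def build_with_write(records):
    """Format the markup by hand; only the text content is escaped."""
    parts = ["<personnel>"]
    for lname, fname, kind in records:
        parts.append('<employee type="%s">' % kind)
        parts.append("<lname>%s</lname>" % escape(lname))
        parts.append("<fname>%s</fname>" % escape(fname))
        parts.append("</employee>")
    parts.append("</personnel>")
    return "".join(parts)


dom_xml = build_with_dom(RECORDS)
write_xml = build_with_write(RECORDS)
```

The hand-written version is shorter, but it is only correct as long as every piece of text passes through escape() and every tag is closed by hand; the DOM version gets both for free.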

The script's main() function does little but parse the command line,
selecting the processing class appropriately.  Processing consists of
instantiating the processing class and calling its run() method.
Concrete subclasses of the abstract processing class determine the
actual machinery used to create the XML output.
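
The division of labour between the abstract processing class and its concrete subclasses can be sketched as follows. This is a simplified stand-in, not the script itself: it assumes a current Python 3, takes already-parsed records instead of an input file, and uses the standard library's xml.sax.saxutils.XMLGenerator in place of PyXML's xml.sax.writer. The convert() helper is a name invented for the sketch.

```python
import io
from xml.sax.saxutils import XMLGenerator


class BaseProcess(object):
    """Drives the conversion; subclasses supply the three output hooks."""

    def __init__(self, records, outfp):
        self.records = records
        self.outfp = outfp

    def run(self):
        # Call the subclass-provided methods in the right order
        self.initOutput()
        for lname, fname, kind in self.records:
            self.addRecord(lname, fname, kind)
        self.finishOutput()


class SAXProcess(BaseProcess):
    """Emits SAX events to a generator that writes markup as it goes."""

    def initOutput(self):
        self.gen = XMLGenerator(self.outfp, "utf-8")
        self.gen.startElement("personnel", {})

    def addRecord(self, lname, fname, kind):
        gen = self.gen
        gen.startElement("employee", {"type": kind})
        for tag, value in (("lname", lname), ("fname", fname)):
            gen.startElement(tag, {})
            gen.characters(value)
            gen.endElement(tag)
        gen.endElement("employee")

    def finishOutput(self):
        self.gen.endElement("personnel")


def convert(records, processor_class=SAXProcess):
    """Instantiate the chosen processing class and call its run() method."""
    out = io.StringIO()
    processor_class(records, out).run()
    return out.getvalue()


result = convert([("Jones", "Tom", "manager")])
```

Swapping in another approach only means passing a different class to convert(); run() itself never changes.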
PyXML-0.8.4/demo/genxml/data.txt

lname,fname,emp,manager
Jones,Tom,1111,1111
Smith,John,2222,1111
Doe,Jane,3333,1111
PyXML-0.8.4/demo/genxml/loaddata.py

#! /usr/bin/env python
"""
%(program)s -- example script to convert comma-separated value file to
               XML using the Document Object Model (DOM), the Simple
               API for XML (SAX), or the 'write' model (a bunch of calls
               to <file>.write()).

Usage:  %(program)s [--dom|--sax|--write] [infile [outfile]]
"""
__version__ = '$Revision: 1.3 $'

import getopt
import os
import string
import sys

# Note that we only need one of these for any given version of the
# processing class.
#
from xml.dom.DOMImplementation import implementation
import xml.dom.ext                      # needed for PrettyPrint() below
import xml.sax.writer
import xml.utils


def main():
    """Process command line parameters and run the conversion."""
    inpath = "-"
    outpath = "-"
    args = sys.argv[1:]
    processor_class = DOMProcess
    try:
        opts, args = getopt.getopt(args, "dhsw",
                                   ["dom", "help", "sax", "write"])
    except getopt.error, e:
        usage(err=e, rc=2)
    for opt, arg in opts:
        if opt in ("-d", "--dom"):
            processor_class = DOMProcess
        elif opt in ("-h", "--help"):
            usage()
        elif opt in ("-s", "--sax"):
            processor_class = SAXProcess
        elif opt in ("-w", "--write"):
            processor_class = WriteProcess
    if len(args) == 2:
        inpath, outpath = args
    elif len(args) == 1:
        inpath = args[0]
    elif len(args) == 0:
        pass
    else:
        usage(err="too many command-line arguments", rc=2)

    infp = get_input(inpath)
    outfp = get_output(outpath)

    processor = processor_class(infp, outfp)
    processor.run()

    infp.close()
    outfp.close()


class BaseProcess:
    """Base class for the conversion processors.  Each concrete subclass
    must provide the following methods:

    initOutput()
        Initialize the output stream and any internal data structures
        that the conversion process needs.

    addRecord(lname, fname, type)
        Add one record to the output stream (or the internal structures)
        where lname is the last name, fname is the first name, and type
        is either 'manager' or 'employee'.

    finishOutput()
        Finish all output generation.  If all work has been on internal
        data structures, this is where they should be converted to text
        and written out.
    """
    def __init__(self, infp, outfp):
        """Store the input and output streams for later use."""
        self.infp = infp
        self.outfp = outfp

    def run(self):
        """Perform the complete conversion process.

        This method is responsible for parsing the input and calling the
        subclass-provided methods in the right order.
        """
        self.initOutput()
        self.infp.readline()            # ignore field names
        rec = self.getNextRecord()
        while rec:
            lname, fname, type = rec
            self.addRecord(lname, fname, type)
            rec = self.getNextRecord()
        self.finishOutput()

    def getNextRecord(self):
        """Read and return the next input record, or return None."""
        line = self.infp.readline()
        if line:
            parts = map(string.strip, string.split(line, ','))
            lname, fname, eid, mid = parts
            type = ("employee", "manager")[eid == mid]
            return lname, fname, type
        else:
            return None


class DOMProcess(BaseProcess):
    """Concrete conversion process which uses a DOM structure as an
    internal data structure.

    Content is added to the DOM tree for each input record, and the
    entire tree is serialized and written to the output stream in the
    finishOutput() method.
    """
    def initOutput(self):
        # Create a new document with no namespace uri, qualified name,
        # or document type
        self.document = implementation.createDocument(None,None,None)
        self.personnel = self.document.createElement("personnel")
        self.document.appendChild(self.personnel)

    def addRecord(self, lname, fname, type):
        doc = self.document
        self.personnel.appendChild(doc.createTextNode("\n  "))
        emp = doc.createElement("employee")
        emp.setAttribute("type", type)
        self.personnel.appendChild(emp)
        emp.appendChild(doc.createTextNode("\n    "))
        ln = doc.createElement("lname")
        ln.appendChild(doc.createTextNode(lname))
        emp.appendChild(ln)
        emp.appendChild(doc.createTextNode("\n    "))
        fn = doc.createElement("fname")
        fn.appendChild(doc.createTextNode(fname))
        emp.appendChild(fn)
        emp.appendChild(doc.createTextNode("\n  "))

    def finishOutput(self):
        t = self.document.createTextNode("\n")
        self.personnel.appendChild(t)
        # XXX toxml not supported by 4DOM
        # self.outfp.write(self.document.toxml())
        xml.dom.ext.PrettyPrint(self.document, self.outfp)
        self.outfp.write("\n")


class SAXProcess(BaseProcess):
    """Concrete conversion process that uses a SAX implementation that
    writes output to a file.

    XML is generated by calling the SAX methods that would be called
    when the resulting document instance is parsed.  Data is written to
    the output stream incrementally with this approach, and no real
    internal state is maintained.
    """
    def initOutput(self):
        info = xml.sax.writer.XMLDoctypeInfo()
        info.add_element_container("personnel")
        info.add_element_container("employee")
        saxout = self.saxout = xml.sax.writer.PrettyPrinter(
            self.outfp, dtdinfo=info)
        saxout.startDocument()
        saxout.startElement("personnel", {})

    def addRecord(self, lname, fname, type):
        saxout = self.saxout
        saxout.startElement("employee", {"type": type})
        saxout.startElement("lname", {})
        saxout.characters(lname, 0, len(lname))
        saxout.endElement("lname")
        saxout.startElement("fname", {})
        saxout.characters(fname, 0, len(fname))
        saxout.endElement("fname")
        saxout.endElement("employee")

    def finishOutput(self):
        self.saxout.endElement("personnel")
        self.saxout.endDocument()


class WriteProcess(BaseProcess):
    """Concrete conversion process that simply formats the XML
    directly and uses the write() method of a file to write it out.

    The only helper function used to generate the XML is the
    xml.utils.escape() function; the methods of this class are
    solely responsible for proper formatting of the markup.
    """
    #
    # Note the simplicity of using a bunch of write() calls; using print
    # statements would also be reasonable in many contexts.
    #
    def initOutput(self):
        self.outfp.write('<?xml version="1.0" encoding="iso-8859-1"?>\n')
        self.outfp.write("<personnel>\n")

    def addRecord(self, lname, fname, type):
        self.outfp.write('  <employee type="%s">\n' % type)
        self.outfp.write("    <lname>%s</lname>\n" % xml.utils.escape(lname))
        self.outfp.write("    <fname>%s</fname>\n" % xml.utils.escape(fname))
        self.outfp.write("  </employee>\n")

    def finishOutput(self):
        self.outfp.write("</personnel>\n")


def get_input(path):
    """Get input file from path; '-' indicates stdin."""
    if path == "-":
        return sys.stdin
    else:
        return open(path)


def get_output(path):
    """Get output file from path; '-' indicates stdout."""
    if path == "-":
        return sys.stdout
    else:
        return open(path, "w")


def usage(err=None, rc=0):
    """Write out a usage message, possibly to stderr.

    If err or rc are true, the message is written to stderr instead of
    stdout.  The script docstring is used as the source of help text.
    Exits with result code rc.
    """
    if err or rc:
        sys.stdout = sys.stderr
    program = os.path.basename(sys.argv[0])
    if err:
        print "%s: %s" % (program, str(err))
    vars = {"program": program}
    print __doc__ % vars
    sys.exit(rc)


if __name__ == "__main__":
    main()
PyXML-0.8.4/demo/quotes/README

The files in this directory demonstrate maintaining a quotation
collection in XML.  The still-unnamed markup language contains
'quotation' elements, which contain the text of the quotation and
optional 'author' and 'source' elements.  For the quotation text,
there are some simple semantic markups such as 'em', 'cite', and
'foreign'.
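
As a rough sketch of how such markup maps to plain text, the example below renders one quotation with the conventions qtfmt.py uses for text output: 'em' becomes *...*, 'cite' and 'foreign' become _..._, and the author and source follow on a "  -- " line. The sample document is made up for illustration (the DTD may require more structure than shown), and the code assumes a current Python 3 and uses the standard library's ElementTree rather than the SAX handler in qtfmt.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; see sample.xml for the real thing
SAMPLE = ("<quotation>Reading <cite>Walden</cite> is <em>slow</em> work."
          "<author>A. Reader</author></quotation>")

# Plain-text markers for the inline semantic elements
MARKERS = {"em": "*", "cite": "_", "foreign": "_"}


def render(quotation):
    """Return a plain-text rendering of one 'quotation' element."""
    parts = [quotation.text or ""]
    attribution = []
    for child in quotation:
        if child.tag in ("author", "source"):
            attribution.append((child.text or "").strip())
        else:
            mark = MARKERS.get(child.tag, "")
            parts.append(mark + (child.text or "") + mark)
        parts.append(child.tail or "")
    # Collapse runs of whitespace, as simplify() does in qtfmt.py
    text = " ".join("".join(parts).split())
    if attribution:
        text += "\n  -- " + ", ".join(attribution)
    return text


rendered = render(ET.fromstring(SAMPLE))
```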

quotations.dtd          DTD for the markup language.
sample.xml              A sample quotation file.
qtfmt.py                Program to read a file marked up using the language
                        specified in quotations.dtd, and output the
                        list in HTML, text, or fortune format.

The qtfmt.py script requires Python 2.0, since it assumes UTF-8 output
and uses the codecs module to convert its output to Latin-1.
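
The Latin-1 conversion step can be illustrated in isolation. This is a minimal sketch assuming a current Python 3, not the code from qtfmt.py; to_latin1 is a name invented here. It wraps a byte stream in a codecs writer so that Unicode text is encoded as it is written.

```python
import codecs
import io


def to_latin1(text, errors="strict"):
    """Encode Unicode text to Latin-1 bytes through a codecs stream writer."""
    raw = io.BytesIO()
    writer = codecs.getwriter("latin-1")(raw, errors)
    writer.write(text)
    return raw.getvalue()


encoded = to_latin1(u"caf\u00e9")
# Characters outside Latin-1 need an explicit error policy
replaced = to_latin1(u"a\u2014b", errors="replace")
```

With the default "strict" policy, a character that has no Latin-1 equivalent raises UnicodeEncodeError, so a script that must always produce output would pick "replace" or a similar policy.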

Contact amk1@bigfoot.com if you have questions or comments about the
contents of this directory.  For the author's complete quotation
collections, please go to http://starship.python.net/crew/amk/quotations/


PyXML-0.8.4/demo/quotes/qtfmt.py

#!/usr/bin/env python
#
# qtfmt.py v1.10
# v1.10 : Updated to use Python 2.0 Unicode type.
#
# Read a document in the quotation DTD, converting it to a list of Quotation
# objects.  The list can then be output in several formats.

__doc__ = """Usage: qtfmt.py [options] file1.xml file2.xml ...
If no filenames are provided, standard input will be read.
Available options:
  -f or --fortune   Produce output for the fortune(1) program
  -h or --html      Produce HTML output
  -t or --text      Produce plain text output
  -m N or --max N   Suppress quotations longer than N lines;
                    defaults to 0, which suppresses no quotations at all.
"""

import string, re, cgi, types
import codecs

from xml.sax import saxlib, saxexts

def simplify(t, indent="", width=79):
    """Strip out redundant spaces, and insert newlines to
    wrap the text at the given width."""
    t = string.strip(t)
    t = re.sub('\s+', " ", t)
    if t=="": return t
    t = indent + t
    t2 = ""
    while len(t) > width:
        index = string.rfind(t, ' ', 0, width)
        if index == -1: t2 = t2 + t[:width] ; t = t[width:]
        else: t2 = t2 + t[:index] ; t = t[index+1:]
        t2 = t2 + '\n'
    return t2 + t

class Quotation:
    """Encapsulates a single quotation.
    Attributes:
    stack -- used during construction and then deleted
    text -- A list of Text() instances, or subclasses of Text(),
            containing the text of the quotation.
    source -- A list of Text() instances, or subclasses of Text(),
            containing the source of the quotation.  (Optional)
    author -- A list of Text() instances, or subclasses of Text(),
            containing the author of the quotation.  (Optional)

    Methods:
    as_fortune() -- return the quotation formatted for fortune
    as_html() -- return an HTML version of the quotation
    as_text() -- return a plain text version of the quotation
    """
    def __init__(self):
        self.stack = [ Text() ]
        self.text = []

    def as_text(self):
        "Convert instance into a pure text form"
        output = ""

        def flatten(textobj):
            "Flatten a list of subclasses of Text into a list of paragraphs"
            if type(textobj) != types.ListType: textlist=[textobj]
            else: textlist = textobj

            paragraph = "" ; paralist = []
            for t in textlist:
                if (isinstance(t, PreformattedText) or
                    isinstance(t, CodeFormattedText) ):
                    paralist.append(paragraph)
                    paragraph = ""
                    paralist.append(t)
                elif isinstance(t, Break):
                    paragraph = paragraph + t.as_text()
                    paralist.append(paragraph)
                    paragraph = ""
                else:
                    paragraph = paragraph + t.as_text()
            paralist.append(paragraph)
            return paralist

        # Flatten the list of instances into a list of paragraphs
        paralist = flatten(self.text)
        if len(paralist) > 1:
            indent = 2*" "
        else:
            indent = ""

        for para in paralist:
            if isinstance(para, PreformattedText) or isinstance(para, CodeFormattedText):
                output = output + para.as_text()
            else:
                output = output + simplify(para, indent) + '\n'
        attr = ""
        for i in ['author', 'source']:
            if hasattr(self, i):
                paralist = flatten(getattr(self, i))
                text = string.join(paralist)
                if attr:
                    attr = attr + ', '
                    text = string.lower(text[:1]) + text[1:]
                attr = attr + text
        attr=simplify(attr, width = 79 - 4 - 3)
        if attr: output = output + '  -- '+re.sub('\n', '\n   ', attr)
        return output + '\n'

    def as_fortune(self):
        return self.as_text() + '%'

    def as_html(self):
        output = "<P>"
        def flatten(textobj):
            if type(textobj) != types.ListType: textlist = [textobj]
            else: textlist = textobj

            paragraph = "" ; paralist = []
            for t in textlist:
                paragraph = paragraph + t.as_html()
                if isinstance(t, Break):
                    paralist.append(paragraph)
                    paragraph = ""
            paralist.append(paragraph)
            return paralist

        paralist = flatten(self.text)
        for para in paralist: output = output + string.strip(para) + '\n'
        attr = ""
        for i in ['author', 'source']:
            if hasattr(self, i):
                paralist = flatten(getattr(self, i))
                text = string.join(paralist)
                attr=attr + ('<P CLASS=%s>' % i) + string.strip(text)
        return output + attr

# Text and its subclasses are used to hold chunks of text; instances
# know how to display themselves as plain text or as HTML.

class Text:
    "Plain text"
    def __init__(self, text=""):
        self.text = text

    # We need to allow adding a string to Text instances.
    def __add__(self, val):
        newtext = self.text + str(val)
        # __class__ must be used so subclasses create instances of themselves.
        return self.__class__(newtext)

    def __str__(self): return self.text
    def __repr__(self):
        s = string.strip(self.text)
        if len(s) > 15: s = s[0:15] + '...'
        return '<%s: "%s">' % (self.__class__.__name__, s)

    def as_text(self): return self.text
    def as_html(self): return cgi.escape(self.text)

class PreformattedText(Text):
    "Text inside <pre>...</pre>"
    def as_text(self):
        return str(self.text)
    def as_html(self):
        return '<pre>' + cgi.escape(str(self.text)) + '</pre>'

class CodeFormattedText(Text):
    "Text inside <code>...</code>"
    def as_text(self):
        return str(self.text)
    def as_html(self):
        return '<code>' + cgi.escape(str(self.text)) + '</code>'

class CitedText(Text):
    "Text inside <cite>...</cite>"
    def as_text(self):
        return '_' + simplify(str(self.text)) + '_'
    def as_html(self):
        return '<cite>' + string.strip(cgi.escape(str(self.text))) + '</cite>'

class ForeignText(Text):
    "Foreign words, from Latin or French or whatever."
    def as_text(self):
        return '_' + simplify(str(self.text)) + '_'
    def as_html(self):
        return '<i>' + string.strip(cgi.escape(str(self.text))) + '</i>'

class EmphasizedText(Text):
    "Text inside <em>...</em>"
    def as_text(self):
        return '*' + simplify(str(self.text)) + '*'
    def as_html(self):
        return '<em>' + string.strip(cgi.escape(str(self.text))) + '</em>'

class Break(Text):
    def as_text(self): return ""
    def as_html(self): return "<P>"

# The QuotationDocHandler class is a SAX handler class that will
# convert a marked-up document using the quotations DTD into a list of
# quotation objects.

class QuotationDocHandler(saxlib.HandlerBase):
    def __init__(self, process_func):
        self.process_func = process_func
        self.newqt = None

    # Errors should be signaled, so we'll output a message and exit
    # with a nonzero status to stop processing.
    def fatalError(self, exception):
        sys.stderr.write('ERROR: '+ str(exception)+'\n')
        sys.exit(1)
    error = fatalError
    warning = fatalError

    def characters(self, ch, start, length):
        if self.newqt != None:
            s = ch[start:start+length]

            # Undo the UTF-8 encoding, converting to ISO Latin1, which
            # is the default character set used for HTML.
            latin1_encode = codecs.lookup('iso-8859-1') [0]
            unicode_str = s
            s, consumed = latin1_encode( unicode_str )
            assert consumed == len( unicode_str )

            self.newqt.stack[-1] = self.newqt.stack[-1] + s

    def startDocument(self):
        self.quote_list = []

    def startElement(self, name, attrs):
        methname = 'start_'+str(name)
        if hasattr(self, methname):
            method = getattr(self, methname)
            method(attrs)
        else:
            sys.stderr.write('unknown start tag: <' + name + ' ')
            for name, value in attrs.items():
                sys.stderr.write(name + '=' + '"' + value + '" ')
            sys.stderr.write('>\n')

    def endElement(self, name):
        methname = 'end_'+str(name)
        if hasattr(self, methname):
            method = getattr(self, methname)
            method()
        else:
            sys.stderr.write('unknown end tag: </' + name + '>\n')

    # There's nothing to be done for the <quotations> tag
    def start_quotations(self, attrs):
        pass
    def end_quotations(self):
        pass

    def start_quotation(self, attrs):
        if self.newqt == None: self.newqt = Quotation()

    def end_quotation(self):
        st = self.newqt.stack
        for i in range(len(st)):
            if type(st[i]) == types.StringType:
                st[i] = Text(st[i])
        self.newqt.text=self.newqt.text + st
        del self.newqt.stack
        if self.process_func: self.process_func(self.newqt)
        else:
            print "Completed quotation\n ", self.newqt.__dict__
        self.newqt=Quotation()

    # Attributes of a quotation: <author>...</author> and <source>...</source>
    def start_author(self, data):
        # Add the current contents of the stack to the text of the quotation
        self.newqt.text = self.newqt.text + self.newqt.stack
        # Reset the stack
        self.newqt.stack = [ Text() ]
    def end_author(self):
        # Set the author attribute to contents of the stack; you can't
        # have more than one <author> tag per quotation.
        self.newqt.author = self.newqt.stack
        # Reset the stack for more text.
        self.newqt.stack = [ Text() ]

    # The code for the <source> tag is exactly parallel to that for <author>
    def start_source(self, data):
        self.newqt.text = self.newqt.text + self.newqt.stack
        self.newqt.stack = [ Text() ]
    def end_source(self):
        self.newqt.source = self.newqt.stack
        self.newqt.stack = [ Text() ]

    # Text markups: <br/> for breaks, <pre>...</pre> for preformatted
    # text, <code>...</code> for code, <em>...</em> for emphasis,
    # <cite>...</cite> for citations, <foreign>...</foreign> for foreign words.

    def start_br(self, data):
        # Add a Break instance, and a new Text instance.
        self.newqt.stack.append(Break())
        self.newqt.stack.append( Text() )
    def end_br(self): pass

    def start_pre(self, data):
        self.newqt.stack.append( Text() )
    def end_pre(self):
        self.newqt.stack[-1] = PreformattedText(self.newqt.stack[-1])
        self.newqt.stack.append( Text() )

    def start_code(self, data):
        self.newqt.stack.append( Text() )
    def end_code(self):
        self.newqt.stack[-1] = CodeFormattedText(self.newqt.stack[-1])
        self.newqt.stack.append( Text() )

    def start_em(self, data):
        self.newqt.stack.append( Text() )
    def end_em(self):
        self.newqt.stack[-1] = EmphasizedText(self.newqt.stack[-1])
        self.newqt.stack.append( Text() )

    def start_cite(self, data):
        self.newqt.stack.append( Text() )
    def end_cite(self):
        self.newqt.stack[-1] = CitedText(self.newqt.stack[-1])
        self.newqt.stack.append( Text() )

    def start_foreign(self, data):
        self.newqt.stack.append( Text() )
    def end_foreign(self):
        self.newqt.stack[-1] = ForeignText(self.newqt.stack[-1])
        self.newqt.stack.append( Text() )

if __name__ == '__main__':
    import sys, getopt

    # Process the command-line arguments
    opts, args = getopt.getopt(sys.argv[1:], 'fthm:r',
                               ['fortune', 'text', 'html', 'max=', 'help',
                                'randomize'] )
    # Set defaults
    maxlength = 0 ; method = 'as_fortune'
    randomize = 0

    # Process arguments
    for opt, arg in opts:
        if opt in ['-f', '--fortune']:
            method='as_fortune'
        elif opt in ['-t', '--text']:
            method = 'as_text'
        elif opt in ['-h', '--html']:
            method = 'as_html'
        elif opt in ['-m', '--max']:
            maxlength = string.atoi(arg)
        elif opt in ['-r', '--randomize']:
            randomize = 1
        elif opt == '--help':
            print __doc__ ; sys.exit(0)

    # This function will simply output each quotation by calling the
    # desired method, as long as it's not suppressed by a setting of
    # --max.
    qtlist = []
    def process_func(qt, qtlist=qtlist, maxlength=maxlength, method=method):
        func = getattr(qt, method)
        output = func()
        length = string.count(output, '\n')
        if maxlength!=0 and length > maxlength: return
        qtlist.append(output)

    # Loop over the input files; use sys.stdin if no files are specified
    if len(args) == 0: args = [sys.stdin]
    for file in args:
        if type(file) == types.StringType: input = open(file, 'r')
        else: input = file

        # Enforce the use of the Expat parser, because the code needs to be
        # sure that the output will be UTF-8 encoded.
        p=saxexts.XMLParserFactory.make_parser(["xml.sax.drivers.drv_pyexpat"])
        dh = QuotationDocHandler(process_func)
        p.setDocumentHandler(dh)
        p.setErrorHandler(dh)
        p.parseFile(input)

        if type(file) == types.StringType: input.close()
        p.close()

    # Randomize the order of the quotations
    if randomize:
        import whrandom
        q2 = []
        for i in range(len(qtlist)):
            qt = whrandom.randint(0,len(qtlist)-1 )
            q2.append( qtlist[qt] )
            qtlist[qt:qt+1] = []
        assert len(qtlist) == 0
        qtlist = q2

    for quote in qtlist:
        print quote

    # We're done!
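
# The start_<tag>/end_<tag> dispatch used by QuotationDocHandler.startElement
# and endElement above can be sketched as follows. This is a minimal
# illustration, assuming Python 3's standard xml.sax package rather than
# PyXML's saxlib; DispatchHandler and trace are hypothetical names that are
# not part of this demo.

```python
import xml.sax


class DispatchHandler(xml.sax.ContentHandler):
    """Route each element to a start_<name>/end_<name> method if one exists."""

    def __init__(self):
        super().__init__()
        self.events = []    # calls that reached a tag-specific method
        self.unknown = []   # start tags that had no matching method

    def startElement(self, name, attrs):
        method = getattr(self, "start_" + name, None)
        if method is not None:
            method(attrs)
        else:
            self.unknown.append(name)

    def endElement(self, name):
        method = getattr(self, "end_" + name, None)
        if method is not None:
            method()

    # Only <em> is handled in this sketch.
    def start_em(self, attrs):
        self.events.append("start em")

    def end_em(self):
        self.events.append("end em")


def trace(xml_bytes):
    """Parse xml_bytes and return the handler that recorded the events."""
    handler = DispatchHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler


handler = trace(b"<quotation>Python <em>chooses</em> <b>x</b></quotation>")
```

# Looking the method up with getattr() keeps the handler open-ended: a new
# tag is supported by adding two methods, with no change to the dispatcher.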
PyXML-0.8.4/demo/quotes/quotations.dtd0000644001241000117560000000176606772561171020375 0ustar  loewishpifb600000000000000
<!-- 
     A DTD for storing simple quotations.  This DTD doesn't provide
     sophisticated cross-referencing or anything like that; if you're
     working on the next edition of Bartlett's Familiar Quotations,
     you'll need a fancier DTD with more features.  

     Version 1.0 : Sep 5 1998
     A.M. Kuchling (amk1@bigfoot.com)
-->

<!ELEMENT quotations (quotation)*>

<!ELEMENT quotation (#PCDATA | em | foreign | cite | br | pre | code |
                     author | source)* >
<!ELEMENT author (#PCDATA)>
<!ELEMENT source (#PCDATA|cite)*>

<!-- Different forms of emphasis for phrases -->

<!ELEMENT cite (#PCDATA) >
<!ELEMENT code (#PCDATA) >
<!ELEMENT em (#PCDATA) >
<!ELEMENT foreign (#PCDATA) >
<!ELEMENT pre (#PCDATA) >
<!ATTLIST pre xml:space (default|preserve) 'preserve'>

<!-- Break element -->
<!ELEMENT br EMPTY>
 
<!-- Various accents -->

<!ENTITY acirc "&#226;">
<!ENTITY ccedil "&#231;">
<!ENTITY eacute "&#233;">
<!ENTITY iuml "&#239;">
<!ENTITY oacute "&#243;">
<!ENTITY ouml "&#246;">

PyXML-0.8.4/demo/quotes/sample.xml0000644001241000117560000000451207175443715017466 0ustar  loewishpifb600000000000000<?xml version="1.0"?>
<!DOCTYPE quotations SYSTEM "quotations.dtd">

<quotations>

<quotation>
We will perhaps eventually be writing only small modules which are
identified by name as they are used to build larger ones, so that
devices like indentation, rather than delimiters, might become
feasible for expressing local structure in the source language.

<source>Donald E. Knuth, "Structured Programming with goto
Statements", Computing Surveys, Vol 6 No 4, Dec. 1974</source>
</quotation>

<quotation>
I don't know a lot about this artificial life stuff
-- but I'm suspicious of anything Newsweek gets goofy about
-- and I suspect its primary use is as another money extraction tool
to be applied by ai labs to the department of defense
(and more power to 'em).
<br/>
Nevertheless in wondering why free software is so good these days
it occurred to me that the propagation of free software is one gigantic
artificial life evolution experiment, but the metaphor isn't perfect.
<br/>
Programs are thrown out into the harsh environment, and the bad ones
die. The good ones adapt rapidly and become very robust in short
order.
<br/>
The only problem with the metaphor is that the process isn't random
at all. Python <em>chooses</em> to include tk's genes; Linux decides
to make itself more suitable for symbiosis with X, etcetera. 
<br/>
Free software is artificial life, but better.
<source>Aaron Watters, 29 Sep 1994</source>
</quotation>

<quotation>
It has also been referred to as the "Don Beaudry <em>hack</em>," but
that's a misnomer.  There's nothing hackish about it -- in fact,
it is rather elegant and deep, even though there's something dark
to it.
<source>Guido van Rossum, <cite>Metaclass Programming in Python 1.5</cite></source>
</quotation>

<quotation>
This is not a technical issue so much as a human issue; we 
are limited and so is our time.  (Is this a bug or a feature of time?
Careful; trick question!)
<source>Fred Drake on the Documentation SIG, 9 Sep 1998</source> 
</quotation>

<quotation>
Counting is the most simple and primitive of narratives -- 1 2 3 4 5 6
7 8 9 10 --  a tale with a beginning, a middle and an end and a sense
of progression -- arriving at a finish of two digits -- a goal
attained, a denouement reached.
<author>Peter Greenaway</author>
<source><cite>Fear of Drowning By Numbers</cite> (1988)</source>
</quotation>

</quotations>
PyXML-0.8.4/demo/sax/0000755001241000117560000000000010152625721014720 5ustar  loewishpifb600000000000000PyXML-0.8.4/demo/sax/README0000644001241000117560000000205707165434556015622 0ustar  loewishpifb600000000000000These examples demonstrate the Python SAX API, version 1. In all examples,
the SAX driver can be specified by setting the PY_SAX_PARSER environment
variable. Valid settings are
- xml.sax.drivers.drv_pyexpat
- xml.sax.drivers.drv_xmlproc
- xml.sax.drivers.drv_sgmlop
as well as any other driver listed in the xml/sax/drivers directory.
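
A minimal sketch of choosing a driver from code. It assumes Python 3's
standard xml.sax package rather than PyXML, so that it runs without PyXML
installed; there, make_parser() accepts a list of driver module names to try
in order, and xml.sax.expatreader is the bundled Expat driver.
make_preferred_parser is a hypothetical helper name.

```python
import xml.sax
import xml.sax.xmlreader


def make_preferred_parser(preferred=("xml.sax.expatreader",)):
    """Return a parser from the first driver module that can be loaded."""
    # make_parser() tries each module name in order and then falls back
    # to the interpreter's default driver list.
    return xml.sax.make_parser(list(preferred))


parser = make_preferred_parser()
```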

sax2obj.py     A general XML element -> Python object converter based on SAX.
saxdemo.py     Parses an XML file, and prints it in canonical form.
               Invoke as 'python saxdemo.py filename.xml'.
               The standard driver will be pyexpat. 
               Alternative drivers can be specified with the -d option 
               of saxdemo.py; the prefix 'xml.sax.drivers.drv_' is 
               automatically added to the driver.
saxhack.py     Compares the parsing speed of sgmlop and xmllib; appears to be
               broken.
saxstats.py    Prints statistics about an XML file.
saxtimer.py    Times parsing a document; arguments are the parser name
               (the prefix 'xml.sax.drivers.drv_' is automatically added)
               and the document name.
saxtrace.py    parses a document using xmlproc, and prints all SAX events.
PyXML-0.8.4/demo/sax/sax2obj.py0000644001241000117560000001023307413602734016646 0ustar  loewishpifb600000000000000"""
A general XML element -> Python object converter based on SAX.
"""

from xml.sax import saxexts,saxlib,saxutils
import re,string

reg_ws=re.compile("[%s]+" % string.whitespace)

class ConvSpec:
    """Contains the information needed to convert SAX events to Python
    objects."""

    def __init__(self):
        pass

class SAXObject:

    def __init__(self):
        self._fields={}

    def has_field(self,field):
        return self._fields.has_key(field)

    def get_fields(self):
        return self._fields.keys()

    def get_field(self,field):
        return self._fields[field]

    def set_field(self,field,value):
        self._fields[field]=value

    def display(self):
        for field in self._fields.keys():
            print "%s=%s" % (field,self._fields[field])

    def __getattr__(self,attr):
        try:
            return self._fields[attr]
        except KeyError,e:
            raise AttributeError(str(e))

    def __cmp__(self,obj):
        if id(obj)==id(self):
            return 0
        else:
            return 1

class DocHandler(saxlib.DocumentHandler):

    def __init__(self,target_elem,list_elems,ign_elems,rep_field):
        self.target_elem=target_elem
        self.list_elems=list_elems
        self.ign_elems=ign_elems
        self.rep_field=rep_field

        self.ignoring=0
        self.objects=[]
        self.current=None
        self.cur_data=""
        self.stack=[]

    def startElement(self,name,attrs):
        if self.ignoring:
            return

        if name==self.target_elem:
            self.current=SAXObject()
            for attr in attrs:
                self.current.set_field(attr,attrs[attr])
        elif self.list_elems.has_key(name):
            if not self.current.has_field(name):
                self.current.set_field(name,[])

            self.stack.append(self.current)
            self.current=SAXObject()
        elif self.rep_field.has_key(name) and not self.current.has_field(name):
            self.current.set_field(name,[])
        else:
            if self.ign_elems.has_key(name):
                self.ignoring=self.ignoring+1

        self.cur_data=""

    def characters(self,data,start,length):
        if self.ignoring or self.current==None:
            return

        data=data[start:start+length]
        mo=reg_ws.match(data)
        if mo!=None and mo.end(0)==len(data):
            return

        self.cur_data=self.cur_data+data

    def endElement(self,name):
        if self.ign_elems.has_key(name):
            self.ignoring=self.ignoring-1
            return

        if self.ignoring or self.current==None:
            return

        if name==self.target_elem:
            self.objects.append(self.current)
            self.current=None
        elif self.list_elems.has_key(name):
            obj=self.current
            self.current=self.stack[-1]
            del self.stack[-1]
            self.current.get_field(name).append(obj)
        elif self.rep_field.has_key(name):
            self.current.get_field(name).append(self.cur_data)
        else:
            self.current.set_field(name,self.cur_data)

    def get_objects(self):
        return self.objects

def make_objects(url,element,list_elems={},ign_elems={},rep_field={}):
    dh=DocHandler(element,list_elems,ign_elems,rep_field)
    eh=saxutils.ErrorPrinter()

    parser=saxexts.make_parser()
    parser.setDocumentHandler(dh)
    parser.setErrorHandler(eh)
    parser.parse(url)

    return dh.get_objects()

def make_xml(filename,root_elem,trgt_elem,list):
    out=open(filename,"w")
    out.write("<%s>\n" % root_elem)

    for obj in list:
        out.write("  <%s>\n" % trgt_elem)
        for field in obj.get_fields():
            out.write("    <%s>%s</%s>\n" % \
                      (field,escape_markup(obj.get_field(field)),field))
        out.write("  </%s>\n" % trgt_elem)

    out.write("\n</%s>" % root_elem)
    out.close()

def list2hash(lst,key_field):
    hash={}

    for obj in lst:
        hash[obj.get_field(key_field)]=obj

    return hash

def escape_markup(str):
    out=""

    for ch in str:
        if ch=="&":
            out=out+"&amp;"
        elif ch=="<":
            out=out+"&lt;"
        elif ch==">":
            out=out+"&gt;"
        else:
            out=out+ch

    return out
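
# A minimal sketch of the same escaping step, assuming Python 3's standard
# library instead of the hand-written loop above: xml.sax.saxutils.escape()
# replaces "&", "<" and ">" in one call. The wrapper only reuses the name
# escape_markup for illustration.

```python
from xml.sax.saxutils import escape


def escape_markup(text):
    """Escape the characters that are special in XML character data."""
    # escape() handles "&" first, so entities it creates are not re-escaped.
    return escape(text)


escaped = escape_markup("a < b & c > d")
```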
PyXML-0.8.4/demo/sax/saxdemo.py0000644001241000117560000000314507526150521016737 0ustar  loewishpifb600000000000000# A demo SAX application: using SAX to parse XML documents into ESIS
# or canonical XML.

from xml.sax import saxexts, saxlib, saxutils

import sys,urllib2,getopt

### Interpreting arguments (rather crudely)

try:
    (args,trail)=getopt.getopt(sys.argv[1:],"sed:")
    assert trail, "No argument provided"
except Exception,e:
    print "ERROR: %s" % e
    print
    print "Usage: python saxdemo.py [-e] [-s] [-d drv] filename [outfilename]"
    print
    print " -e: Output ESIS instead of normalized XML."
    print " -s: Silent (no messages except error messages)"
    print " -d: Use driver 'drv', where 'drv' is a module name."
    print " outfilename: Write to this file."
    sys.exit(1)

driver=None
esis=0
silent=0
in_sysID=trail[0]

if len(trail)==2:
    out_sysID=trail[1]
else:
    out_sysID=""

for (arg,val) in args:
    if arg=="-d":
        driver="xml.sax.drivers.drv_" + val
    elif arg=="-e":
        esis=1
    elif arg=="-s":
        silent=1

p=saxexts.make_parser(driver)
p.setErrorHandler(saxutils.ErrorPrinter())

if out_sysID=="":
    out=sys.stdout
else:
    try:
        # Open the output for writing; urllib2.urlopen() only opens for reading.
        out=open(out_sysID,"w")
    except IOError,e:
        print out_sysID+": "+str(e)
        sys.exit(1)

if esis:
    dh=saxutils.ESISDocHandler(out)
else:
    dh=saxutils.Canonizer(out)

### Ready. Let's go!

if not silent:
    print "Parser: %s (%s, %s)" % (p.get_parser_name(),p.get_parser_version(),
                                   p.get_driver_version())
    print

try:
    p.setDocumentHandler(dh)
    p.parse(in_sysID)
except IOError,e:
    print in_sysID+": "+str(e)
except saxlib.SAXException,e:
    print str(e)

### Cleaning up.

out.close()
PyXML-0.8.4/demo/sax/saxhack.py0000644001241000117560000000676107413602734016733 0ustar  loewishpifb600000000000000#
#
# $Id: saxhack.py,v 1.5 2001/12/30 12:17:32 loewis Exp $
#
# illustrate how a saxlib parser can interface directly to sgmlop
#
# history:
# 98-05-23 fl   created (derived from the coreXML parser)
#
# Copyright (c) 1998 by Secret Labs AB
#
# info@pythonware.com
# http://www.pythonware.com
#

from xml.sax.saxlib import HandlerBase
class DocumentHandler:#(HandlerBase):

    # SAX interface

    def startElement(self, tag, attrs):
        pass # print "start", tag

    def endElement(self, tag):
        pass # print "end", tag

    def characters(self, text, start, len):
        pass # print "data", text[start:start+len]

# --------------------------------------------------------------------
# sgmlop-based parser

from xml.parsers import sgmlop

class Parser:

    def setDocumentHandler(self, dh):

        self.parser = sgmlop.XMLParser()
        self.parser.register(dh, 1)

    def parseFile(self, file):

        parser = self.parser

        while 1:
            data = file.read(16384)
            if not data:
                break
            parser.feed(data)

        parser.close()

# --------------------------------------------------------------------
# xmllib-based parser

from xml.parsers import xmllib

class xmllibParser(xmllib.XMLParser):

    def setDocumentHandler(self, dh):

        self.characters = dh.characters
        self.unknown_starttag = dh.startElement
        self.unknown_endtag = dh.endElement

    def handle_data(self, data):
        self.characters(data, 0, len(data))

    def parseFile(self, file):

        while 1:
            data = file.read(16384)
            if not data:
                break
            self.feed(data)

        self.close()

# --------------------------------------------------------------------
# original xmllib-based parser

class slowParser(xmllib.SlowXMLParser):

    def setDocumentHandler(self, dh):

        self.characters = dh.characters
        self.unknown_starttag = dh.startElement
        self.unknown_endtag = dh.endElement

    def handle_data(self, data):
        self.characters(data, 0, len(data))

    def parseFile(self, file):

        while 1:
            data = file.read(16384)
            if not data:
                break
            self.feed(data)

        self.close()

# ====================================================================
# test stuff

import time, os, sys

if len(sys.argv) == 1:
    print 'Usage: saxhack.py <xml filename>'
    sys.exit(1)

FILE = sys.argv[1]

size = os.stat(FILE)[6]

p  = Parser()
dh = DocumentHandler()
p.setDocumentHandler(dh)

f = open(FILE)
t = time.clock()
p.parseFile(f) # dry run
t_direct = time.clock() - t
f.close()

#import sys ; sys.exit(0)

print t_direct
if t_direct == 0:
    print 'Measured time was too small; use a larger XML file'
    sys.exit(1)

print "sgmlop:", int(size / t_direct), "bytes per second"

p = xmllibParser()
#p=slowParser()
dh = DocumentHandler()
p.setDocumentHandler(dh)

f = open(FILE)
t = time.clock()
p.parseFile(f) # dry run
t_fast = time.clock() - t
f.close()

print "xmllib:", int(size / t_fast), "bytes per second"

p = slowParser()
dh = DocumentHandler()
p.setDocumentHandler(dh)

f = open(FILE)
t = time.clock()
p.parseFile(f) # dry run
t_slow = time.clock() - t
f.close()

print "slow xmllib:", int(size / t_slow), "bytes per second"

print
print "normalized timings:"
print "slow xmllib", 1.0
print "fast xmllib", round(t_fast / t_slow, 2), "(%sx)" % round(t_slow / t_fast, 1)
print "sgmlop     ", round(t_direct / t_slow, 2), "(%sx)" % round(t_slow / t_direct, 1)
print
PyXML-0.8.4/demo/sax/saxstats.py0000644001241000117560000000220407413602734017147 0ustar  loewishpifb600000000000000# A simple SAX application that counts the number of elements, attributes and
# processing instructions in a document.

from xml.sax import saxexts
from xml.sax import saxlib
import sys

class CounterHandler(saxlib.DocumentHandler):

    def __init__(self):
        self.elems=0
        self.attrs=0
        self.pis=0

    def startElement(self,name,attrs):
        self.elems=self.elems+1
        self.attrs=self.attrs+len(attrs)

    def processingInstruction(self,target,data):
        self.pis=self.pis+1

# --- Main prog

if len(sys.argv)<2:
    print "Usage: python saxstats.py <document>"
    print
    print " <document>: file name of the document to parse"
    sys.exit(1)

# Load parser and driver

print "\nLoading parser..."

p=saxexts.make_parser()
ch=CounterHandler()
p.setDocumentHandler(ch)

# Ready, set, go!

print "Starting parse..."

OK=0
try:
    p.parse(sys.argv[1])
    OK=1
except IOError,e:
    print "\nERROR: "+sys.argv[1]+": "+str(e)
except saxlib.SAXException,e:
    print "\nERROR: "+str(e)

if OK:
    print "Parse complete:"
    print "  Elements:    %d" % ch.elems
    print "  Attributes:  %d" % ch.attrs
    print "  Proc instrs: %d" % ch.pis
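
# The counting handler above can be sketched against Python 3's standard
# xml.sax package (an assumption; the demo itself targets PyXML's saxlib).
# The count() helper and the inline sample document are hypothetical and
# used only for illustration.

```python
import xml.sax


class CounterHandler(xml.sax.ContentHandler):
    """Count elements, attributes and processing instructions."""

    def __init__(self):
        super().__init__()
        self.elems = 0
        self.attrs = 0
        self.pis = 0

    def startElement(self, name, attrs):
        self.elems += 1
        self.attrs += len(attrs)

    def processingInstruction(self, target, data):
        self.pis += 1


def count(xml_bytes):
    """Parse xml_bytes and return (elements, attributes, PIs)."""
    handler = CounterHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.elems, handler.attrs, handler.pis


counts = count(b'<a x="1"><?note hello?><b y="2" z="3"/></a>')
```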
PyXML-0.8.4/demo/sax/saxtimer.py0000644001241000117560000000206407413602734017135 0ustar  loewishpifb600000000000000# A simple SAX application that measures the time spent parsing a
# document with an empty document handler.

from xml.sax import saxexts
from xml.sax import saxlib
import sys,time

if len(sys.argv)<3:
    print "Usage: python saxtimer.py <parser> <document>"
    print
    print " <parser>:   driver name (the prefix 'xml.sax.drivers.drv_' is added)"
    print " <document>: file name of the document to parse"
    sys.exit(1)

# Load parser and driver

print "\nLoading parser..."

try:
    p=saxexts.make_parser("xml.sax.drivers.drv_" + sys.argv[1])
except saxlib.SAXException,e:
    print "ERROR: Parser not available"
    sys.exit(1)

# Ready, set, go!

sum=0
print "Starting parse..."
for ix in range(3):
    start=time.clock()

    OK=0
    pt=0
    try:
        p.parse(sys.argv[2])
        pt=time.clock()-start
        OK=1
    except IOError,e:
        print "\nERROR: "+sys.argv[2]+": "+str(e)
    except saxlib.SAXException,e:
        print "\nERROR: "+str(e)

    if OK:
        print "Parse time: "+`pt`
    else:
        print "Error occurred, parse aborted."

    sum=sum+pt

print "Average: %f" % (sum/3.0)
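
# A minimal sketch of the same timing loop, assuming Python 3's standard
# library: time.perf_counter() takes the place of time.clock(), and the
# average is taken over the runs actually timed. time_parse is a
# hypothetical helper name, and the document is passed in memory to keep
# the example self-contained.

```python
import io
import time
import xml.sax


def time_parse(xml_bytes, runs=3):
    """Return the mean wall-clock time of `runs` parses with an empty handler."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # The base ContentHandler ignores every event it receives.
        xml.sax.parse(io.BytesIO(xml_bytes), xml.sax.ContentHandler())
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


average = time_parse(b"<a><b/><c>text</c></a>")
```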
PyXML-0.8.4/demo/sax/saxtrace.py0000644001241000117560000000351407413602734017114 0ustar  loewishpifb600000000000000"""
A minimal SAX application that just prints out the document-handler events
it receives.
"""

import sys
from xml.sax import saxexts

# --- SAXtracer

class SAXtracer:

    def __init__(self,objname):
        self.objname=objname
        self.met_name=""

    def __getattr__(self,name):
        self.met_name=name # UGLY! :)
        return self.trace

    def error(self,exception):
        print "err_handler.error(%s)" % str(exception)

    def fatalError(self,exception):
        print "err_handler.fatalError(%s)" % str(exception)

    def warning(self,exception):
        print "err_handler.warning(%s)" % str(exception)

    def characters(self,data,start,length):
        print "doc_handler.characters(%s,%d,%d)" % (`data[start:start+length]`,
                                                    start,length)

    def ignorableWhitespace(self,data,start,length):
        print "doc_handler.ignorableWhitespace(%s,%d,%d)" % \
              (`data[start:start+length]`,start,length)

    def startElement(self, name, attrs):
        attr_str="{"
        for attr in attrs:
            attr_str="%s '%s':'%s'," % (attr_str,attr,attrs[attr])

        if attr_str=="{":
            attr_str="{}"
        else:
            attr_str=attr_str[:-1]+" }"

        print "doc_handler.startElement('%s',%s)" % (name,attr_str)

    def trace(self,*rest):
        str="%s.%s(" % (self.objname,self.met_name)

        for param in rest[:-1]:
            str=str+`param`+", "

        if len(rest)>0:
            print str+`rest[-1]`+")"
        else:
            print str+")"

# --- Main prog

pf=saxexts.ParserFactory()
p=pf.make_parser("xml.sax.drivers.drv_xmlproc")

p.setDocumentHandler(SAXtracer("doc_handler"))
p.setDTDHandler(SAXtracer("dtd_handler"))
p.setErrorHandler(SAXtracer("err_handler"))
p.setEntityResolver(SAXtracer("ent_handler"))
p.parse(sys.argv[1])
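
# The __getattr__ trick used by SAXtracer can be sketched without the shared
# met_name attribute by returning a closure that remembers the method name.
# This is a Python 3 illustration only; Tracer is a hypothetical class and
# records the calls in a list instead of printing them.

```python
class Tracer:
    """Record every call to a method that the class does not define."""

    def __init__(self, objname):
        self.objname = objname
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that normal lookup did not find.
        def traced(*args):
            rendered = ", ".join(repr(arg) for arg in args)
            self.calls.append("%s.%s(%s)" % (self.objname, name, rendered))
        return traced


tracer = Tracer("doc_handler")
tracer.startDocument()
tracer.endElement("a")
```

# Because each closure carries its own name, two looked-up methods can be
# held at once without one overwriting the other's name.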
PyXML-0.8.4/demo/sgmlop/0000755001241000117560000000000010152625721015426 5ustar  loewishpifb600000000000000PyXML-0.8.4/demo/sgmlop/benchsgml.py0000644001241000117560000000400607413602734017747 0ustar  loewishpifb600000000000000# benchmark

import time

from xml.parsers import sgmlop

import sgmllib

SIZE = 16384
FILE = "test2.htm"

bytes = len(open(FILE).read())

def t1():
    fp = open(FILE)
    parser = sgmllib.SlowSGMLParser()
    while 1:
        data = fp.read(SIZE)
        if not data:
            break
        parser.feed(data)
    parser.close()
    fp.close()

def t2():
    fp = open(FILE)
    parser = sgmllib.FastSGMLParser()
    while 1:
        data = fp.read(SIZE)
        if not data:
            break
        parser.feed(data)
    parser.close()
    fp.close()

def t3():
    fp = open(FILE)
    parser = sgmlop.SGMLParser()
    while 1:
        data = fp.read(SIZE)
        if not data:
            break
        parser.feed(data)
    parser.close()
    fp.close()

class Dummy:
    def finish_starttag(self, tag, data):
        pass
    def finish_endtag(self, tag):
        pass
    def handle_entityref(self, data):
        pass
    def handle_data(self, data):
        pass

def t4():