pkg://PyXML-0.8.4-3.src.rpm:744112/PyXML-0.8.4.tar.gz
PyXML-0.8.4/ 0000755 0012410 0011756 00000000000 10152625727 013207 5 ustar loewis hpifb6 0000000 0000000 PyXML-0.8.4/demo/ 0000755 0012410 0011756 00000000000 10152625721 014125 5 ustar loewis hpifb6 0000000 0000000 PyXML-0.8.4/demo/dom/ 0000755 0012410 0011756 00000000000 10152625721 014704 5 ustar loewis hpifb6 0000000 0000000 PyXML-0.8.4/demo/dom/4tidy.py 0000644 0012410 0011756 00000001355 07413602734 016324 0 ustar loewis hpifb6 0000000 0000000 import sys, cStringIO
from xml.dom.ext.reader import HtmlLib
from xml.dom.ext import XHtmlPrint
def Tidy(doc):
    #stream = cStringIO.StringIO()
    #XHtmlPrint(doc, stream=stream)
    #text = stream.getvalue()
    XHtmlPrint(doc)
    return

if __name__ == "__main__":
    html_reader = HtmlLib.Reader()
    if len(sys.argv) == 3:
        uri = sys.argv[1]
        encoding = sys.argv[2]
    elif len(sys.argv) == 2:
        uri = sys.argv[1]
        encoding = ''
    else:
        print "%s requires one or two arguments: the first is a URL or file name to be tidied. The optional second is the encoding to assume for the input."%sys.argv[0]
        sys.exit(-1)
    html_doc = html_reader.fromUri(uri, charset=encoding)
    Tidy(html_doc)
PyXML-0.8.4/demo/dom/README 0000644 0012410 0011756 00000007670 07244341241 015576 0 ustar loewis hpifb6 0000000 0000000 Example Programs and Demos for 4DOM.
====================================
Sample data files which can be used to exercise the various demos:
* addr_book1.xml
* addr_book2.xml
* book_catalog1.xml
* addr_book.dtd
* employee_table.html
Demos:
------
* dom_from_html_file.py
Demonstrates reading HTML from a file, and pretty-printing.
Example: "python dom_from_html_file.py employee_table.html"
* dom_from_xml_file.py
Demonstrates reading XML from a file, and pretty-printing. Try changing FromXml to have "validate=1".
Example: "python dom_from_xml_file.py addr_book1.xml"
* generate_html1.py
Demonstrates putting together a simple HTML page (a form in this case)
with the standard DOM factory interface.
Just execute with "python generate_html1.py"
You can re-direct the output to file and view the result with a browser. Try adding in more sophisticated form elements.
* generate_xml1.py
Demonstrates putting together a simple XML document with the standard DOM
factory interface.
Just execute with "python generate_xml1.py"
* 4tidy.py
Demonstrates the XHTML support in 4DOM. It takes a URL or file name on the
command line and reads the HTML source. It then prints xhtml based on the HTML
source to standard output.
try "python 4tidy.py http://fourthought.com"
* iterator1.py
Demonstrates the DOM standard Node Iterator interface. It iterates over each node in the read-in file, and prints out its node type and name. Then it iterates again, using the NodeFilter interface to restrict it to nodes of type Element.
Example: "python iterator1.py addr_book1.xml"
* visitor1.py
Demonstrates 4DOM's proprietary Walker/Visitor interface. If you only need to iterate over a tree in pre-order, you are advised to use the standard NodeIterator instead (see iterator1.py and xll_replace.py for examples). dom.ext.Visitor is best for defining other iteration orders and rules.
This sample actually just runs through a pre-order walk, for simplicity. The output should be identical to that of the first part of iterator1.py.
Example: "python visitor1.py addr_book1.xml"
* trace_ns.py
A demo of 4DOM's namespace extensions. Given an XML file-name on the command line, it will walk through the elements in document order (using NodeIterator) and print out the default namespace in effect as well as those of the element and its attributes.
Example: "python trace_ns.py book_catalog1.xml"
For the Namespace spec, see
http://www.w3.org/TR/REC-xml-names/
For James Clark's excellent introduction to and clarification of namespaces, see
http://www.jclark.com/xml/xmlns.htm
* link_title_invert.py
Demonstrates node manipulations. It takes a sample document with anchors
embedded in header tags, and flips them so that the header tags are instead
embedded in the anchors.
just "python link_title_invert.py"
* xll_replace.py
A rather more involved demo. This program reads in an XML file, and looks for XLL-type hyperlinks (see http://www.oasis-open.org/cover/xll.html for information on this remarkably powerful spec).
Warning: This script uses a very obsolete version of XLink.
When it finds such a link, it looks for the target XML document and parses it into a DOM node. It doesn't support XPointer for document fragments yet, but with a decent XPointer processor, such as xptr (see below), you can add this yourself. It then replaces the node that contained the link with the entire contents of the target document of that link.
For a good example, look at addr_book1.xml and then addr_book2.xml. The former contains the following line:
<ENTRY-LINK xlink:link="simple" xlink:href="addr_book2.xml"/>
if you run
"python xll_replace.py addr_book1.xml"
it will read in the addr_book2.xml file into a node, and replace the ENTRY-LINK node with the new one. It will then print out the result, which should be self-explanatory.
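The link-replacement idea described for xll_replace.py can be sketched without 4DOM. The block below is a minimal illustration, not the demo's own code: it assumes Python 3 and the standard library's xml.dom.minidom in place of 4DOM, and the names replace_links, load, MAIN and LINKED are hypothetical. The documents are trimmed, in-memory versions of addr_book1.xml and addr_book2.xml.

```python
from xml.dom import minidom

# Namespace URI declared for the xlink prefix in addr_book1.xml
XLINK_NS = "http://www.w3.org/XML/XLink/0.9"


def replace_links(doc, load):
    """Replace every empty link element with the root of its target document.

    `load` is a hypothetical callable that maps an href string to XML text.
    """
    # Work on a snapshot of the element list, since the tree is modified below.
    for elem in list(doc.getElementsByTagName("*")):
        # As in the demo, only empty elements are treated as links.
        if elem.hasChildNodes():
            continue
        if not elem.hasAttributeNS(XLINK_NS, "link"):
            continue
        href = elem.getAttributeNS(XLINK_NS, "href")
        if not href:
            continue
        target = minidom.parseString(load(href))
        # Nodes belong to one document, so import a deep copy before inserting.
        new_root = doc.importNode(target.documentElement, True)
        elem.parentNode.replaceChild(new_root, elem)
    return doc


# Trimmed stand-ins for addr_book1.xml and addr_book2.xml
MAIN = (
    '<ADDRBOOK xmlns:xlink="%s">'
    '<ENTRY ID="pa"><NAME>Pieter Aaron</NAME></ENTRY>'
    '<ENTRY-LINK xlink:link="simple" xlink:href="addr_book2.xml"/>'
    '</ADDRBOOK>' % XLINK_NS
)
LINKED = {
    "addr_book2.xml":
        '<ADDRBOOK><ENTRY ID="gn"><NAME>Gegbefuna Nwannem</NAME></ENTRY></ADDRBOOK>',
}

doc = replace_links(minidom.parseString(MAIN), LINKED.__getitem__)
print(doc.documentElement.toxml())
```

Passing the loader in as a callable keeps the sketch independent of the file system; a real script would read the href target from disk or from a URL instead.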
If you need help with the demos, or any other help working with 4DOM,
please don't hesitate to ask on the mailing list: 4Suite@lists.fourthought.com.
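For readers who do not have PyXML installed, the factory-style document building that generate_xml1.py demonstrates can be approximated as follows. This is only a sketch under stated assumptions: it uses Python 3 and the standard library's xml.dom.minidom rather than 4DOM, and it reuses the element and attribute names from the demo.

```python
from xml.dom import minidom

# Obtain a DOM implementation object (minidom's, not 4DOM's).
impl = minidom.getDOMImplementation()

# createDocumentType takes the qualified name, public ID and system ID.
doctype = impl.createDocumentType("mydoc", None, None)

# createDocument takes the namespace URI, the document element name and the
# doctype; it creates the document element automatically.
doc = impl.createDocument(None, "mydoc", doctype)
root = doc.documentElement

# The Document instance acts as the factory for all other nodes.
spam = doc.createElement("spam")
spam.setAttribute("eggs", "sunnysideup")
spam.appendChild(doc.createTextNode("some text here..."))
root.appendChild(spam)

xml_text = root.toxml()
print(xml_text)
```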
PyXML-0.8.4/demo/dom/__init__.py 0000644 0012410 0011756 00000000163 07633365761 017034 0 ustar loewis hpifb6 0000000 0000000 ########################################################################
#
# File Name: __init__.py
#
#
PyXML-0.8.4/demo/dom/addr_book.dtd 0000644 0012410 0011756 00000000636 07117052117 017332 0 ustar loewis hpifb6 0000000 0000000 <!ELEMENT ADDRBOOK ((ENTRY | ENTRY-LINK)*)>
<!ELEMENT ENTRY (NAME, ADDRESS, PHONENUM*, EMAIL)>
<!ATTLIST ENTRY
ID ID #REQUIRED
>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT ADDRESS (#PCDATA)>
<!ELEMENT PHONENUM (#PCDATA)>
<!ATTLIST PHONENUM
DESC CDATA #REQUIRED
>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT ENTRY-LINK EMPTY>
<!ATTLIST ENTRY-LINK
xml:link (simple|extended|group) #REQUIRED
href CDATA #REQUIRED
>
PyXML-0.8.4/demo/dom/addr_book1.xml 0000644 0012410 0011756 00000001723 07117052117 017436 0 ustar loewis hpifb6 0000000 0000000 <?xml version = "1.0"?>
<!DOCTYPE ADDRBOOK SYSTEM "addr_book.dtd">
<ADDRBOOK xmlns:xlink="http://www.w3.org/XML/XLink/0.9">
<ENTRY ID="pa">
<NAME>Pieter Aaron</NAME>
<ADDRESS>404 Error Way</ADDRESS>
<PHONENUM DESC="Work">404-555-1234</PHONENUM>
<PHONENUM DESC="Fax">404-555-4321</PHONENUM>
<PHONENUM DESC="Pager">404-555-5555</PHONENUM>
<EMAIL>pieter.aaron@inter.net</EMAIL>
</ENTRY>
<ENTRY-LINK xlink:link="simple" xlink:href="addr_book2.xml"/>
<ENTRY ID="en">
<NAME>Emeka Ndubuisi</NAME>
<ADDRESS>42 Spam Blvd</ADDRESS>
<PHONENUM DESC="Work">767-555-7676</PHONENUM>
<PHONENUM DESC="Fax">767-555-7642</PHONENUM>
<PHONENUM DESC="Pager">800-SKY-PAGEx767676</PHONENUM>
<EMAIL>endubuisi@spamtron.com</EMAIL>
</ENTRY>
<ENTRY ID="vz">
<NAME>Vasia Zhugenev</NAME>
<ADDRESS>2000 Disaster Plaza</ADDRESS>
<PHONENUM DESC="Work">000-987-6543</PHONENUM>
<PHONENUM DESC="Cell">000-000-0000</PHONENUM>
<EMAIL>vxz@magog.ru</EMAIL>
</ENTRY>
</ADDRBOOK>
PyXML-0.8.4/demo/dom/addr_book2.xml 0000644 0012410 0011756 00000000430 07117052117 017431 0 ustar loewis hpifb6 0000000 0000000 <?xml version = "1.0"?>
<!DOCTYPE ADDRBOOK SYSTEM "addr_book.dtd">
<ADDRBOOK>
<ENTRY ID="gn">
<NAME>Gegbefuna Nwannem</NAME>
<ADDRESS>666 Murtala Mohammed Blvd.</ADDRESS>
<PHONENUM DESC="Home">999-101-1001</PHONENUM>
<EMAIL>nwanneg@naija.ng</EMAIL>
</ENTRY>
</ADDRBOOK>
PyXML-0.8.4/demo/dom/benchmark.py 0000644 0012410 0011756 00000001717 07413602734 017223 0 ustar loewis hpifb6 0000000 0000000 # A DOM benchmark
import sys, time
from xml.dom import core, utils
def main():
    global L, doc
    if len(sys.argv) == 1:
        print 'Usage: benchmark.py <xml file>'
        sys.exit()
    filename = sys.argv[1]
    file = open(filename, 'r')
    size = len(file.read())
    file.close()
    print 'File %s is %iK in size' % (filename, size / 1024)

    start_time = time.time()
    doc = utils.FileReader( filename ).document
    end_time = time.time()
    print 'Building DOM tree:', end_time - start_time, 'sec'

    # Convert DOM tree back to XML
    start_time = time.time()
    xml = doc.toxml()
    end_time = time.time()
    print 'Serializing back to XML:', end_time - start_time, 'sec'

    # Time a complete getElementsByTagName()
    start_time = time.time()
    L = doc.getElementsByTagName("*")
    end_time = time.time()
    print 'getElementsByTagName("*"):', end_time - start_time, 'sec'
    print L[0].nodeName

if __name__ == '__main__': main()
PyXML-0.8.4/demo/dom/book_catalog1.xml 0000644 0012410 0011756 00000000767 07117052117 020145 0 ustar loewis hpifb6 0000000 0000000 <?xml version="1.0"?>
<!-- initially, the default namespace is "books" -->
<book xmlns='urn:loc.gov:books'
xmlns:isbn='urn:ISBN:0-395-36341-6'
xmlns:fcat='http://FourThought.com/catalog'>
<title>Cheaper by the Dozen</title>
<isbn:number>1568491379</isbn:number>
<notes fcat:ref='ITEM54321'>
<!-- make HTML the default namespace for some commentary -->
<p xmlns='urn:w3-org-ns:HTML'>
This is a <i>funny</i> book!
</p>
</notes>
</book>
PyXML-0.8.4/demo/dom/building.py 0000644 0012410 0011756 00000002577 07413602734 017073 0 ustar loewis hpifb6 0000000 0000000 # This demo converts a few nested objects into an XML representation,
# and provides a simple example of using the Builder class.
from xml.dom import core
from xml.dom.builder import Builder
import types, time
def object_convert(builder, obj):
    # Put the entire object inside an element with the same name as
    # the class.
    builder.startElement( obj.__class__.__name__ )
    L = obj.__dict__.keys()
    L.sort()
    # Iterate over the sorted list so the output order is deterministic
    for attr in L:
        # Skip internal attributes (ones that begin with a '_')
        if attr[0] == '_': continue
        value = getattr(obj, attr)
        if type(value) == types.InstanceType:
            # Recursively process subobjects
            object_convert( builder, value)
        else:
            # Convert anything else to a string and put it in an element
            builder.startElement(attr)
            builder.text( str(value) )
            builder.endElement(attr)
    builder.endElement( obj.__class__.__name__ )

if __name__ == '__main__':
    class Folder: pass
    class Bookmark: pass

    f=Folder()
    f.title = "Folder Title"
    f.createdTime = time.asctime( time.localtime( time.time() ) )
    f.bookmark = b = Bookmark()
    b.url, b.title = "http://www.python.org", "Python Home Page"

    builder = Builder()
    object_convert(builder, f)
    print "Output from two nested objects:"
    print builder.document.toxml()
PyXML-0.8.4/demo/dom/dom_from_html_file.py 0000644 0012410 0011756 00000001111 07413602734 021102 0 ustar loewis hpifb6 0000000 0000000 """Reads in an HTML file from the command line and pretty-prints it."""
from xml.dom.ext.reader import HtmlLib
from xml.dom import ext
def read_html_from_file(fileName):
    #build a DOM tree from the file
    reader = HtmlLib.Reader()
    dom_object = reader.fromUri(fileName)
    #strip any ignorable white-space in preparation for pretty-printing
    ext.StripHtml(dom_object)
    #pretty-print the node
    ext.PrettyPrint(dom_object)
    #reclaim the object
    reader.releaseNode(dom_object)

if __name__ == '__main__':
    import sys
    read_html_from_file(sys.argv[1])
PyXML-0.8.4/demo/dom/dom_from_xml_file.py 0000644 0012410 0011756 00000000600 07244341241 020733 0 ustar loewis hpifb6 0000000 0000000 from xml.dom import ext
from xml.dom.ext.reader import PyExpat
def read_xml_from_file(fileName):
    #build a DOM tree from the file
    reader = PyExpat.Reader()
    xml_dom_object = reader.fromUri(fileName)
    ext.Print(xml_dom_object)
    #reclaim the object
    reader.releaseNode(xml_dom_object)

if __name__ == '__main__':
    import sys
    read_xml_from_file(sys.argv[1])
PyXML-0.8.4/demo/dom/domconv.py 0000644 0012410 0011756 00000004721 07413602734 016734 0 ustar loewis hpifb6 0000000 0000000 # A simple library to convert DOM object structures to SGML or XML output,
# usually for xml2html conversion.
import sys,types,string,StringIO
SKIP=1 # Ignore the element and its contents
STRIP=2 # Ignore the element, but process its contents
ID=3 # Identity transform
MAP=4 # Arg: (elem,hash). Map element to elem, map attrs using hash.
def escape_markup(str):
    """Takes a string and escapes all '<'s and quotes in it with character
    entity references."""
    str=string.replace(str,"<","&lt;")
    return string.replace(str,'"',"&quot;")

def convert(rootnode,spec,writer=sys.stdout):
    """Takes a DOM node, a conversion specification and a file-like object
    to write the converted data to, and performs the actual conversion.
    The spec hashtable must map element names to (action,arg) tuples, where
    action must be one of the constants at the top of this file. arg is only
    used for MAP, where it must be a tuple (elementname,maphash) where the
    elementname is the name of the element to substitute for the original
    one, and maphash is a hashtable that maps attribute names to either the
    attribute name to substitute or a function that takes the attribute value
    and returns the string to replace the entire attr='val' sequence with.
    """
    try:
        (action,arg)=spec[rootnode.GI]
    except KeyError:
        # Elements missing from the spec are stripped by default
        action=STRIP

    if action==SKIP:
        return
    elif action==STRIP:
        pass
    elif action==ID:
        writer.write("<" + rootnode.GI)
        for (name,val) in rootnode.attributes.items():
            # Double quotes, because escape_markup only escapes '"'
            writer.write(" %s=\"%s\"" % (name,escape_markup(val)))
        writer.write(">")
    elif action==MAP:
        writer.write("<" + arg[0])
        for (name,val) in rootnode.attributes.items():
            if arg[1].has_key(name):
                map=arg[1][name]
                if type(map)==types.StringType:
                    writer.write(" %s=\"%s\"" % (map,escape_markup(val)))
                else:
                    writer.write(map(escape_markup(val)))
        writer.write(">")

    for child in rootnode.getChildren():
        if child.GI=="#PCDATA":
            writer.write(escape_markup(child.data))
        else:
            convert(child,spec,writer)

    if action==ID:
        writer.write("</%s>" % rootnode.GI)
    elif action==MAP:
        writer.write("</%s>" % arg[0])

def convert_str(rootnode,spec):
    obj=StringIO.StringIO()
    convert(rootnode,spec,obj)
    return obj.getvalue()
PyXML-0.8.4/demo/dom/employee_table.html 0000644 0012410 0011756 00000002322 07244341241 020557 0 ustar loewis hpifb6 0000000 0000000 <HTML>
<HEAD>
<TITLE>
FourThought Employee List
</TITLE>
</HEAD>
<BODY>
<TABLE BORDER='1'>
<TBODY>
<TR>
<TH>
Last Name, First Name
</TH>
<TH>
Email address
</TH>
<TH>
Extension
</TH>
<TH>
Department
</TH>
</TR>
<TR>
<TD>
Butte, Brian
</TD>
<TD>
<A HREF='mailto:Brian.Butte@fourthought.com'>Brian.Butte@fourthought.com</A>
</TD>
<TD>
x1111
</TD>
<TD>
1028
</TD>
</TR>
<TR>
<TD>
Ogbuji, Uche
</TD>
<TD>
<A HREF='mailto:Uche.Ogbuji@fourthought.com'>Uche.Ogbuji@fourthought.com</A>
</TD>
<TD>
x1112
</TD>
<TD>
1029
</TD>
</TR>
<TR>
<TD>
<A HREF='/~molson'>Olson, Mike</A>
</TD>
<TD>
<A HREF='mailto:Mike.Olson@fourthought.com'>Mike.Olson@fourthought.com</A>
</TD>
<TD>
x1113
</TD>
<TD>
1028
</TD>
</TR>
<TR>
<TD>
Roberts, Rich
</TD>
<TD>
<A HREF='mailto:Rich.Roberts@fourthought.com'>Rich.Roberts@fourthought.com</A>
</TD>
<TD>
x1114
</TD>
<TD>
1029
</TD>
</TR>
</TBODY>
</TABLE>
</BODY>
</HTML>
PyXML-0.8.4/demo/dom/generate_html1.py 0000644 0012410 0011756 00000003050 07413602734 020160 0 ustar loewis hpifb6 0000000 0000000 """
A basic example of using the DOM to create an HTML document from scratch.
Also demonstrates creation of HTML forms
"""
from xml.dom import ext
from xml.dom import implementation
if __name__ == '__main__':
    #create a concrete HTMLDocument instance.
    doc = implementation.createHTMLDocument('A Basic HTML Document')
    #add in body
    doc.body = doc.createElement('Body')
    #Create a form
    form = doc.createElement('Form')
    #Create some text. Note: every character is represented in some
    #DOM object. All text (even between tags) is in a text node
    t = doc.createTextNode('Employee Name:')
    #Create an input tag
    i = doc.createElement('Input')
    #All elements can have attributes directly set
    i.setAttribute('TYPE','TEXT')
    #Some have helper functions defined.
    #This one sets the SIZE attribute to 20
    #Note that the argument must be a string. 4DOM closely
    #follows the DOM spec for the type of the arguments, even
    #when the spec is inconsistent or counter-intuitive
    i.size = '20'
    #This sets the NAME attribute
    i.name = 'EmployeeName'
    #Set the form's ACTION attribute
    form.action = '/cgi-local/test.py'
    #this inserts i as the last child in the form
    form.appendChild(i)
    #Insert t before i in form's child list
    form.insertBefore(t,i)
    #add the form to the document's body. Note that you can't
    #add child elements directly to the document.
    doc.body.appendChild(form)
    #This prints out the text representation of the HTML document
    ext.PrettyPrint(doc)
PyXML-0.8.4/demo/dom/generate_xml1.py 0000644 0012410 0011756 00000002222 07413602734 020014 0 ustar loewis hpifb6 0000000 0000000 """
A basic example of using the DOM to create an XML document from scratch.
"""
from xml.dom import ext
from xml.dom import implementation
if __name__ == '__main__':
    #Create a doctype using document type name, public ID and system ID
    dt = implementation.createDocumentType('mydoc', '', '')
    #Create a document using document element namespace URI, doc element
    #name and doctype. This automatically creates a document element
    #which is the single element child of the document
    doc = implementation.createDocument('', 'mydoc', dt)
    #Get the document element
    doc_elem = doc.documentElement
    #Create an element: the Document instance acts as a factory
    new_elem = doc.createElementNS('', 'spam')
    #Create an attribute on the new element
    new_elem.setAttributeNS('', 'eggs', 'sunnysideup')
    #Create a text node
    new_text = doc.createTextNode('some text here...')
    #Add the new text node to the new element
    new_elem.appendChild(new_text)
    #Add the new element to the document element
    doc_elem.appendChild(new_elem)
    #Print out the resulting document, using the ext module imported above
    ext.Print(doc)
PyXML-0.8.4/demo/dom/html2html 0000755 0012410 0011756 00000003557 06624412225 016561 0 ustar loewis hpifb6 0000000 0000000 #!/usr/bin/python
#
# This example program converts a chunk of HTML to a DOM tree.
# It then prints the tree as HTML, as XML, and it prints a list of all
# the hyperlinks in the document by using getElementsByTagName() to
# retrieve all the A elements.
from xml.dom.html_builder import HtmlBuilder
from xml.dom.writer import HtmlWriter
from xml.dom import core
HTML_DATA = """<HTML>
<HEAD><TITLE>Les HOWTO Linux</TITLE></HEAD>
<BODY>
<HR> <H1>Les HOWTO Linux</H1>
<P>Les Howto que vous trouverez ci-dessous sont en français.
Ils peuvent etre trouvés dans les formats suivants
sur le site
<A HREF="ftp://ftp.lip6.fr/pub/linux/french/docs/HOWTO">ftp.lip6.fr</a>
dans le répertoire /pub/linux/french/docs/HOWTO :
<UL>
<LI><A HREF="Access-HOWTO.html">Access-HOWTO</A> (Version <A
HREF="Access-HOWTO.ps">Postscript</A>)</LI>
</UL></BODY></HTML>
"""
# Construct an HtmlBuilder object and feed the data to it
b = HtmlBuilder()
b.feed(HTML_DATA)
# Get the newly-constructed document object
doc = b.document
# Output it as HTML
print "============"
print "HTML version"
w = HtmlWriter()
w.write(b.document)
# Output it as XML
print "\n==========="
print "XML version"
print doc.toxml()
print "\n==========="
print "Links in the document"
# Retrieve all the link objects
links = doc.getElementsByTagName('A')
for node in links:
    # Collect any children of the A element that are Text nodes
    # (Note that this misses link text nested inside other elements, as in
    # <a href="xxx"><b>Text</b></a>. You could fix this by actually
    # traversing all the child nodes of the A element.)
    linktext = ""
    for child in node.childNodes:
        if child.nodeType == core.TEXT_NODE:
            linktext = linktext + child.value
    # Get the HREF attribute, if present
    url = node.getAttribute('HREF')
    if url != "":
        print "HREF=", url, linktext
print links
PyXML-0.8.4/demo/dom/iterator1.py 0000644 0012410 0011756 00000001721 07413602734 017176 0 ustar loewis hpifb6 0000000 0000000 """Demonstrates basic walking using DOM level 2 iterators"""
from xml.dom.ext.reader import PyExpat
from xml.dom.NodeFilter import NodeFilter
def Iterate(xml_dom_object):
    print "Printing all nodes:"
    nit = xml_dom_object.ownerDocument.createNodeIterator(xml_dom_object, NodeFilter.SHOW_ALL, None, 0)
    curr_node = nit.nextNode()
    while curr_node:
        print "%s node %s\n"%(curr_node.nodeType, curr_node.nodeName)
        curr_node = nit.nextNode()

    print "\n\n\nPrinting only element nodes:"
    snit = xml_dom_object.ownerDocument.createNodeIterator(xml_dom_object, NodeFilter.SHOW_ELEMENT, None, 0)
    curr_node = snit.nextNode()
    while curr_node:
        print "%s node %s\n"%(curr_node.nodeType, curr_node.nodeName)
        curr_node = snit.nextNode()

if __name__ == '__main__':
    import sys
    reader = PyExpat.Reader()
    xml_dom_object = reader.fromUri(sys.argv[1])
    Iterate(xml_dom_object)
    reader.releaseNode(xml_dom_object)
PyXML-0.8.4/demo/dom/link_title_invert.py 0000644 0012410 0011756 00000002272 07413602734 021013 0 ustar loewis hpifb6 0000000 0000000 from xml.dom import Node, ext
from xml.dom.ext.reader import PyExpat
test_doc = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>LADIES</title></head>
<body>
<h1>LADIES</h1>
<h2><a name="A">Agathas</a></h2>
Four and forty lovers had Agathas in the old days,...
<h2><a name="B">Young Lady</a></h2>
I have fed your lar with poppies,...
<h2><a name="C">Lesbia Illa</a></h2>
Memnon, Memnon, that lady...
</body>
</html>
"""
def link_title_invert():
    #build a DOM tree from the test string
    reader = PyExpat.Reader()
    doc = reader.fromString(test_doc)
    h2_elements = doc.getElementsByTagNameNS('http://www.w3.org/1999/xhtml', 'h2')
    for e in h2_elements:
        parent = e.parentNode
        a_list = filter(lambda x: (x.nodeType == Node.ELEMENT_NODE) and (x.localName == 'a'), e.childNodes)
        a = a_list[0]
        e.removeChild(a)
        #Iterate over a copy: appendChild removes each node from a,
        #which would otherwise change the list while it is being walked
        for node in list(a.childNodes):
            e.appendChild(node)
        parent.replaceChild(a, e)
        a.appendChild(e)
    ext.Print(doc)
    #reclaim the object; not necessary with Python 2.0
    reader.releaseNode(doc)

if __name__ == '__main__':
    link_title_invert()
PyXML-0.8.4/demo/dom/trace_ns.py 0000644 0012410 0011756 00000002231 07413602734 017057 0 ustar loewis hpifb6 0000000 0000000 '''
Walk through a namespace-compliant XML file and print out
the namespaces of all elements and attributes in document order
'''
from xml.dom.ext.reader import PyExpat
from xml.dom.NodeFilter import NodeFilter
def TraceNs(doc):
    snit = doc.createNodeIterator(doc, NodeFilter.SHOW_ELEMENT, None, 0)
    curr_elem = snit.nextNode()
    while curr_elem:
        print "Current Element", curr_elem.nodeName
        #FIXME: put a GetDefaultNs method into Ext
        #ns = Namespace.GetDefaultNs(curr_elem)
        #print "\tDefault NS\t", ns
        print "\t"+curr_elem.nodeName+"\t\t", curr_elem.namespaceURI
        header_printed = 0
        for k in curr_elem.attributes.keys():
            if curr_elem.attributes[k].namespaceURI:
                if not header_printed:
                    header_printed = 1
                    print "\tAttributes"
                print "\t\t"+curr_elem.attributes[k].nodeName+"\t", curr_elem.attributes[k].namespaceURI
        print
        curr_elem = snit.nextNode()

if __name__ == "__main__":
    import sys
    reader = PyExpat.Reader()
    doc = reader.fromUri(sys.argv[1])
    TraceNs(doc)
    reader.releaseNode(doc)
PyXML-0.8.4/demo/dom/visitor1.py 0000644 0012410 0011756 00000002355 07413602734 017050 0 ustar loewis hpifb6 0000000 0000000 """Demonstrates basic, pre-order DOM walking using the default, bare-bones visitor"""
from xml.dom.ext.reader import PyExpat
from xml.dom import Node
from xml.dom.ext import Visitor
from xml.dom.ext.reader import Sax2
from xml.dom.ext import ReleaseNode
class NsVisitor(Visitor.Visitor):
    def visit(self, node):
        print "Node %s namespaceURI: '%s' qualified name: '%s' localName: '%s' prefix: '%s'\n"%(str(node), node.namespaceURI, node.nodeName, node.localName, node.prefix)
        if node.nodeType == Node.ELEMENT_NODE:
            for k in node.attributes.keys():
                print "Node %s namespaceURI: '%s' qualified name: '%s' localName: '%s' prefix: '%s'\n"%(str(node.attributes[k]), node.attributes[k].namespaceURI, node.attributes[k].nodeName, node.attributes[k].localName, node.attributes[k].prefix)
        return None

def Walk(xml_dom_object):
    #First pass: the default, bare-bones visitor
    visitor = Visitor.Visitor()
    walker = Visitor.Walker(visitor, xml_dom_object)
    walker.run()
    #Second pass: the namespace-reporting visitor defined above
    visitor = NsVisitor()
    walker = Visitor.Walker(visitor, xml_dom_object)
    walker.run()

if __name__ == '__main__':
    import sys
    reader = PyExpat.Reader()
    xml_dom_object = reader.fromUri(sys.argv[1])
    Walk(xml_dom_object)
    reader.releaseNode(xml_dom_object)
PyXML-0.8.4/demo/dom/xll_replace.py 0000644 0012410 0011756 00000003716 07413602734 017564 0 ustar loewis hpifb6 0000000 0000000 """
Demonstrates some advanced DOM manipulation.
This function looks for simple XLinks and replaces the node containing
such links with the contents of the referenced document.
"""
from xml.dom import Node
from xml.dom.NodeFilter import NodeFilter
from xml.dom import ext
from xml.dom.ext.reader import PyExpat
def XllReplace(start_node):
    reader = PyExpat.Reader()
    owner_doc = start_node.ownerDocument
    snit = owner_doc.createNodeIterator(start_node, NodeFilter.SHOW_ELEMENT, None, 0)
    curr_node = snit.nextNode()
    while curr_node:
        #Only empty nodes are allowed to have Links
        if not curr_node.childNodes.length and curr_node.attributes:
            is_link = 0
            href = None
            for k in curr_node.attributes.keys():
                if (curr_node.attributes[k].localName, curr_node.attributes[k].namespaceURI) == ("link", "http://www.w3.org/XML/XLink/0.9"):
                    is_link = 1
                elif (curr_node.attributes[k].localName, curr_node.attributes[k].namespaceURI) == ("href", "http://www.w3.org/XML/XLink/0.9"):
                    href = curr_node.attributes[k].value
            if is_link and href:
                #Then make a tree of the new file and insert it
                f = open(href, "r")
                st = f.read()
                f.close()
                new_df = reader.fromString(st, ownerDoc=start_node.ownerDocument)
                #Get the first element node and assume it's the document element
                for a_node in new_df.childNodes:
                    if a_node.nodeType == Node.ELEMENT_NODE:
                        doc_root = a_node
                        break
                curr_node.parentNode.replaceChild(doc_root, curr_node)
        curr_node = snit.nextNode()
    return start_node

if __name__ == "__main__":
    import sys
    reader = PyExpat.Reader()
    xml_dom_tree = reader.fromUri(sys.argv[1])
    XllReplace(xml_dom_tree)
    ext.PrettyPrint(xml_dom_tree)
    reader.releaseNode(xml_dom_tree)
PyXML-0.8.4/demo/dom/xpointer_query.py 0000644 0012410 0011756 00000001211 07413602734 020353 0 ustar loewis hpifb6 0000000 0000000 """Demonstrates using the xptr.py tool to query DOM Nodes using the XPointer spec"""
from xml.dom import ext
from xml.dom.ext.reader import Sax2
from xml.sax import SAXException, SAXParseException
import xptr

if __name__ == '__main__':
    import sys
    xpointer_expr = sys.argv[1]
    try:
        xml_dom_object = Sax2.FromXmlUrl(sys.argv[2], validate=0)
    except SAXParseException, msg:
        #SAXParseException subclasses SAXException, so it is caught first
        print "SAXParseException caught:", msg
        sys.exit(1)
    except SAXException, msg:
        print "SAXException caught:", msg
        sys.exit(1)
    result_node = xptr.LocateNode(xml_dom_object, xpointer_expr)
    ext.StripXml(result_node)
    ext.PrettyPrint(result_node)
    ext.ReleaseNode(result_node)
PyXML-0.8.4/demo/dom/xptr.py 0000644 0012410 0011756 00000043763 07413602734 016275 0 ustar loewis hpifb6 0000000 0000000 """
This is an experimental implementation of the XPointer locator language.
Version 0.20 - 23.Aug.98
Lars Marius Garshol - larsga@ifi.uio.no
http://www.stud.ifi.uio.no/~larsga/download/python/xml/xptr.html
Changes since version 0.10:
- 'id' locator term implemented
- 'attr' locator term implemented
- node type qualifiers implemented
- 'origin' locator term implemented
Modified by Uche Ogbuji 25.Jan.99 to work with 4DOM.
Modified by Uche Ogbuji 18.Nov.99 to work with the emerging Python/DOM binding 4DOM. Distributed with permission.
"""
import re,string,sys
from xml.dom import Node
from xml.dom import ext
# Spec deviations:
# - html keyword not supported
# - negative instance numbers not supported
# - #cdata node type selector not supported
# - * for attribute values/names not supported
# - preceding keyword not supported
# - span keyword unsupported
# - support 'string' location terms
# Spec questions
# - what if locator fails?
# - what to do with "span(...).child(1)"?
# - how to continue from a set of selected nodes?
# - attr: error if does not use element as source?
# - should distinguish between semantic errors and failures?
# - can string terms locate inside attr vals?
# - are the string loc semantics a bit extreme? perhaps restrict to one node?
# - how to represent span and string results in terms of the DOM?
# Global variables
version="0.20"
specver="WD-xptr-19980303"
# Useful regular expressions
reg_sym=re.compile("[a-z]+|\\(|\\)|\\.|[-+]?[1-9][0-9]*|[A-Za-z_:][\-A-Za-z_:.0-9]*|,|#[a-z]+|\\*|\"[^\"]*\"|'[^']*'")
reg_sym_param=re.compile(",|\)|\"|'")
reg_name=re.compile("[A-Za-z_:][\-A-Za-z_:.0-9]*")
# Some exceptions
class XPointerException(Exception):
"Means something went wrong when attempting to follow an XPointer."
pass
class XPointerParseException(XPointerException):
"Means the XPointer was syntactically invalid."
def __init__(self,msg,pos):
self.__msg=msg
self.__pos=pos
def get_pos(self):
return self.__pos
def __str__(self):
return self.__msg % self.__pos
class XPointerFailedException(XPointerException):
"Means the XPointer was logically invalid."
pass
class XPointerUnsupportedException(XPointerException):
"Means the XPointer used unsupported constructs."
pass
# Simple XPointer lexical analyzer
class SymbolGenerator:
"Chops XPointers up into distinct symbols."
def __init__(self,xpointer):
self.__data=xpointer
self.__pos=0
self.__last_was_param=0
self.__next_is=""
def get_pos(self):
"Returns the current position in the string."
return self.__pos
def more_symbols(self):
"True if there are more symbols in the XPointer."
return self.__pos<len(self.__data) or self.__next_is!=""
def next_symbol(self):
"Returns the next XPointer symbol."
if self.__next_is!="":
tmp=self.__next_is
self.__next_is=""
return tmp
if self.__last_was_param:
self.__last_was_param=0
sym=""
count=0
while self.more_symbols():
n=self.next_symbol()
if n=='"' or n=="'":
pos=string.find(self.__data,n,self.__pos)
if pos==-1:
raise XPointerParseException("Unmatched "+n+" at %d",
self.__pos)
sym=self.__data[self.__pos-1:pos+1]
self.__pos=pos+1
elif n=="(":
count=count+1
elif n==")":
count=count-1
if count<0:
if sym=="":
return ")"
else:
self.__next_is=")"
return sym
elif n=="," and count==0:
self.__last_was_param=1
self.__next_is=","
return sym
sym=sym+n
mo=reg_sym.match(self.__data,self.__pos)
if mo==None:
raise XPointerParseException("Invalid symbol at position %d",
self.__pos)
self.__pos=self.__pos+len(mo.group(0))
self.__last_was_param= mo.group(0)=="("
return mo.group(0)
# Simple XPointer parser
class XPointerParser:
"""Simple XPointer parser that parses XPointers firing events that receive
terms and parameters."""
def __init__(self,xpointer):
self.__sgen=SymbolGenerator(xpointer)
self.__first_term=1
self.__prev=None
def __skip_over(self,symbol):
if self.__sgen.next_symbol()!=symbol:
raise XPointerParseException("Expected '"+symbol+"' at %s",
self.__sgen.get_pos())
def __is_valid(self,symbol,regexp):
mo=regexp.match(symbol)
return mo!=None and len(mo.group(0))==len(symbol)
def __parse_instance_or_all(self,iora):
if iora!="all":
try:
return int(iora)
except ValueError,e:
raise XPointerParseException("Expected number or 'all' at %s",
self.__sgen.get_pos())
else:
return "all"
def parse(self):
"Runs through the entire XPointer, firing events."
sym="."
while sym==".":
name=self.__sgen.next_symbol()
if name=="(":
name="" # Names can be defaulted
else:
self.__skip_over("(")
sym=self.__sgen.next_symbol()
if sym!=")":
params=[sym]
sym=self.__sgen.next_symbol()
else:
params=[]
while sym==",":
params.append(self.__sgen.next_symbol())
sym=self.__sgen.next_symbol()
if sym!=")":
raise XPointerParseException("Expected ')' at %s",
self.__sgen.get_pos())
self.dispatch_term(name,params)
if self.__sgen.more_symbols():
sym=self.__sgen.next_symbol()
else:
return
# If the XPointer ends correctly, we'll return from the if above
raise XPointerParseException("Expected '.' at %s",
self.__sgen.get_pos())
def dispatch_term(self,name,params):
"""Called when a term is encountered to analyze it and fire more
detailed events."""
if self.__first_term:
if name=="root" or name=="origin" or name=="id" or name=="html":
if name=="root" or name=="origin":
if len(params)!=0:
raise XPointerParseException(name+" terms have no "
"parameters (at %s)",
self.__sgen.get_pos())
else:
param=None
elif name=="id" or name=="html":
if len(params)!=1:
raise XPointerParseException(name+" terms require one "
"parameter (at %s)",
self.__sgen.get_pos())
else:
param=params[0]
# XXX Validate parameter
self.__first_term=0
self.handle_abs_term(name,param)
return
else:
self.handle_abs_term("root",None)
else:
if name=="" and self.__prev!=None:
name=self.__prev
if name=="child" or name=="ancestor" or name=="psibling" or \
name=="fsibling" or name=="descendant" or name=="following" or \
name=="preceding":
self.parse_rel_term(name,params)
elif name=="span":
self.parse_span_term(params)
elif name=="attr":
self.parse_attr_term(params)
elif name=="string":
self.parse_string_term(params)
else:
raise XPointerParseException("Illegal term type "+name+\
" at %s",self.__sgen.get_pos())
self.__prev=name
def parse_rel_term(self,name,params):
"Parses the arguments of relative location terms and fires the event."
no=self.__parse_instance_or_all(params[0])
if len(params)>1:
type=params[1]
if not (type=="#element" or type=="#pi" or type=="#comment" or \
type=="#text" or type=="#cdata" or type=="#all" or \
self.__is_valid(type,reg_name)):
raise XPointerParseException("Invalid type at %s",
self.__sgen.get_pos())
else:
type="#element"
attrs=[]
ix=2
while ix+1<len(params):
if not self.__is_valid(params[ix],reg_name):
raise XPointerParseException("Not a valid name at %s",
self.__sgen.get_pos())
attrs.append((params[ix],params[ix+1]))
ix=ix+2
self.handle_rel_term(name,no,type,attrs)
def parse_span_term(self,params):
"Parses the arguments of the span term and fires the event."
raise XPointerUnsupportedException("'span' keyword unsupported.")
def parse_attr_term(self,params):
"Parses the argument of the attr term and fires the event."
if len(params)!=1:
raise XPointerParseException("'attr' location terms must have "
"exactly one parameter (at %s)",
self.__sgen.get_pos())
if not self.__is_valid(params[0],reg_name):
raise XPointerParseException("'%s' is not a valid attribute "
"name at %%s" % params[0],
self.__sgen.get_pos())
self.handle_attr_term(params[0])
def parse_string_term(self,params):
"Parses the argument of the string term and fires the event."
no=self.__parse_instance_or_all(params[0])
if len(params)>1:
skiplit=params[1]
else:
skiplit=None
if len(params)>2:
if params[2]=="end":
pos="end"
else:
try:
pos=int(params[2])
except ValueError,e:
raise XPointerParseException("Expected number at %s",
self.__sgen.get_pos())
if pos==0:
raise XPointerParseException("0 is not an acceptable "
"value at %s",
self.__sgen.get_pos())
else:
pos=None
if len(params)>3:
try:
length=int(params[3])
except ValueError,e:
raise XPointerParseException("Expected number at %s",
self.__sgen.get_pos())
else:
length=0
self.handle_string_term(no,skiplit,pos,length)
# Event methods to be overridden
def handle_abs_term(self,name,param):
"Called to handle absolute location terms."
pass
def handle_rel_term(self,name,no,type,attrs):
"Called to handle relative location terms."
pass
def handle_attr_term(self,attr_name):
"Called to handle 'attr' location terms."
pass
def handle_span_term(self,frm,to):
"Called to handle 'span' location terms."
pass
def handle_string_term(self,no,skiplit,pos,length):
"Called to handle 'string' location terms."
pass
# ----- XPointer implementation that navigates a DOM tree
# Iterator classes
class DescendantIterator:
def __init__(self):
self.stack=[]
def __call__(self,node):
next=node.firstChild
if next==None:
next=node.nextSibling
while next==None:
if self.stack==[]:
raise XPointerFailedException("No matching node")
next=self.stack[-1].nextSibling
del self.stack[-1]
self.stack.append(next)
return next
class FollowingIterator:
def __init__(self):
self.seen_hash={}
self.skip_child=0
def __call__(self,node):
if not self.skip_child:
next=node.firstChild
else:
self.skip_child=0
next=None
if next==None:
next=node.nextSibling
if next==None:
next=node.parentNode
self.skip_child=1 # Don't go down, we've been there :-)
if next.nodeName=="#DOCUMENT":
raise XPointerFailedException("No matching node")
if self.seen_hash.has_key(next.id()):
next=node.nextSibling
prev=node
while next==None:
next=prev.parentNode
self.skip_child=1 # Don't go down, we've been there :-)
prev=next
if next.nodeName=="#DOCUMENT":
raise XPointerFailedException("No matching node")
if self.seen_hash.has_key(next.id()):
next=prev.nextSibling
if next!=None:
self.skip_child=0
else:
# We're above all the nodes we've looked at. Throw out the
# hashed objects.
self.seen_hash.clear()
self.seen_hash[next.id()]=1
return next
# The implementation itself
class XDOMLocator(XPointerParser):
def __init__(self, xpointer, document):
XPointerParser.__init__(self, xpointer)
self.__node=document
self.__first=1
self.__prev=None
def __node_matches(self,node,type,attrs):
"Checks whether a DOM node matches a foo(2,SECTION,ID,I5) selector."
if type==node.nodeName or \
(type=="#element" and node.nodeType == Node.ELEMENT_NODE) or \
(type=="#pi" and node.nodeType == Node.PROCESSING_INSTRUCTION_NODE) or \
(type=="#comment" and node.nodeType == Node.COMMENT_NODE) or \
(type=="#text" and node.nodeType == Node.TEXT_NODE) or \
(type=="#cdata" and node.nodeType == Node.CDATA_SECTION_NODE) or \
type=="#all":
if attrs!=None:
for (a,v) in attrs:
try:
if v!=node.getAttribute(a):
return 0
except KeyError,e:
return 0
return 1
else:
return 0
def __get_node(self,no,type,attrs,iterator):
"""General method that iterates through the tree calling the iterator
on the current node for each step to get the next node."""
count=0
current=iterator(self.__node)
while current!=None:
if self.__node_matches(current,type,attrs):
count=count+1
if count==no:
return current
current=iterator(current)
raise XPointerFailedException("No matching node")
def __get_child(self,no,type,attrs):
if type==None:
candidates = self.__node.childNodes
else:
candidates = []
for obj in self.__node.childNodes:
if self.__node_matches(obj,type,attrs):
candidates.append(obj)
try:
return candidates[no-1]
except IndexError,e:
raise XPointerFailedException("No matching node")
def get_node(self):
"Returns the located node."
return self.__node
def handle_abs_term(self,name,param):
"Called to handle absolute location terms."
if name=="root":
if self.__node.nodeType != Node.DOCUMENT_NODE:
raise XPointerFailedException("Expected document node")
self.__node=self.__node.documentElement
elif name=="origin":
pass # Just work from current node
elif name=="id":
self.__node=ext.GetElementById(self.__node, param)
elif name=="html":
raise XPointerUnsupportedException("Term type 'html' unsupported.")
def handle_rel_term(self,name,no,type,attrs):
"Called to handle relative location terms."
if name=="child":
next=self.__get_child(no,type,attrs)
elif name=="ancestor":
next=self.__get_node(no,type,attrs,DOM.Node._get_parentNode)
elif name=="psibling":
next=self.__get_node(no,type,attrs,DOM.Node._get_previousSibling)
elif name=="fsibling":
next=self.__get_node(no,type,attrs,DOM.Node._get_nextSibling)
elif name=="descendant":
next=self.__get_node(no,type,attrs,DescendantIterator())
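elif name=="following":
next=self.__get_node(no,type,attrs,FollowingIterator())
else:
# 'preceding' is accepted by the parser but has no iterator here
raise XPointerUnsupportedException("'%s' location terms not "
"supported" % name)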
self.__node=next
self.__prev=name
def handle_attr_term(self, attr_name):
if self.__node.nodeType != Node.ELEMENT_NODE:
raise XPointerFailedException("'attr' location term used from "
"non-element node")
if not self.__node.attributes.has_key(attr_name):
raise XPointerFailedException("Non-existent attribute '%s' located"
" by 'attr' term" % attr_name)
self.__node=self.__node.attributes.getNamedItem(attr_name)
def handle_string_term(self,no,skiplit,pos,length):
raise XPointerUnsupportedException("'string' location terms not "
"supported")
def LocateNode(node, xpointer):
try:
xp=XDOMLocator(xpointer, node)
xp.parse()
return xp.get_node()
except XPointerParseException,e:
print "ERROR: "+str(e)
PyXML-0.8.4/demo/genxml/README
This example demonstrates how to generate XML from non-XML data
sources. This example is based directly on an example presented by
Tom Gavin and Joseph E. Hughes at the August 1999 Washington DC
SGML/XML User's Group meeting. PowerPoint slides containing the
original DOM-based solution in Java are available at
http://www.eccnet.com/sgmlug/.
Since the specifics of reading other data formats vary greatly, this
example will use a simple comma-separated-value format similar to that
found as an "export" format for many applications which work with
tabular data. A sample data file is contained in data.txt.
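
As a rough illustration of the input side, the sketch below reads records in this format with the csv module from the Python 3 standard library. That choice is an assumption made for the example; loaddata.py itself splits each line by hand. The helper name read_records is hypothetical. A record is treated as a manager when its employee number equals its manager number, which is the rule loaddata.py applies.

```python
import csv
import io

# Same layout as data.txt: a header line, then one record per line.
SAMPLE = """lname,fname,emp,manager
Jones,Tom,1111,1111
Smith,John,2222,1111
Doe,Jane,3333,1111
"""

def read_records(fp):
    """Yield (lname, fname, kind) tuples from a CSV stream with a header row.

    read_records is a hypothetical helper, not part of loaddata.py.
    """
    reader = csv.reader(fp)
    next(reader, None)  # skip the field-name line
    for row in reader:
        if not row:
            continue  # ignore blank lines
        lname, fname, emp, manager = [field.strip() for field in row]
        kind = "manager" if emp == manager else "employee"
        yield lname, fname, kind

records = list(read_records(io.StringIO(SAMPLE)))
```

Passing an open file instead of the StringIO object should work the same way.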
The loaddata.py script demonstrates three different approaches to XML
generation: DOM-based, SAX-based, and <file>.write()-based. The first
two approaches are specific to generating XML, while the third could
be used to generate any format. It is interesting to note the
differences in code size to get roughly the same output using each of
the three approaches.
The script's main() function does little but parse the command line,
selecting the processing class appropriately. Processing consists of
instantiating the processing class and calling its run() method.
Concrete subclasses of the abstract processing class determine the
actual machinery used to create the XML output.
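
The class structure described above can be sketched as follows. This is a minimal Python 3 illustration under stated assumptions, not the code in loaddata.py: the records come from a plain list rather than a file, and only a write()-based subclass is shown. The method names initOutput, addRecord, finishOutput and run follow loaddata.py; escape and quoteattr from xml.sax.saxutils are standard-library helpers used here in place of xml.utils.escape.

```python
import io
from xml.sax.saxutils import escape, quoteattr

class BaseProcess:
    """Abstract driver: run() calls the subclass hooks in a fixed order."""
    def __init__(self, records, outfp):
        self.records = records  # assumption: an iterable of 3-tuples
        self.outfp = outfp

    def run(self):
        self.initOutput()
        for lname, fname, kind in self.records:
            self.addRecord(lname, fname, kind)
        self.finishOutput()

class WriteProcess(BaseProcess):
    """Concrete subclass that formats the markup by hand."""
    def initOutput(self):
        self.outfp.write("<personnel>\n")

    def addRecord(self, lname, fname, kind):
        # quoteattr() adds the surrounding quotes and escapes the value.
        self.outfp.write("  <employee type=%s>\n" % quoteattr(kind))
        self.outfp.write("    <lname>%s</lname>\n" % escape(lname))
        self.outfp.write("    <fname>%s</fname>\n" % escape(fname))
        self.outfp.write("  </employee>\n")

    def finishOutput(self):
        self.outfp.write("</personnel>\n")

out = io.StringIO()
sample = [("Jones", "Tom", "manager"), ("O'Neil & Co", "Ann", "employee")]
WriteProcess(sample, out).run()
result = out.getvalue()
```

A DOM-based or SAX-based subclass would override the same three hooks, leaving run() untouched.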
PyXML-0.8.4/demo/genxml/data.txt
lname,fname,emp,manager
Jones,Tom,1111,1111
Smith,John,2222,1111
Doe,Jane,3333,1111
PyXML-0.8.4/demo/genxml/loaddata.py
#! /usr/bin/env python
"""
%(program)s -- example script to convert comma-separated value file to
XML using the Document Object Model (DOM), the Simple
API for XML (SAX), or the 'write' model (a bunch of calls
to <file>.write()).
Usage: %(program)s [--dom|--sax|--write] [infile [outfile]]
"""
__version__ = '$Revision: 1.3 $'
import getopt
import os
import string
import sys
# Note that we only need one of these for any given version of the
# processing class.
#
from xml.dom.DOMImplementation import implementation
import xml.dom.ext # needed for PrettyPrint() in DOMProcess.finishOutput()
import xml.sax.writer
import xml.utils
def main():
"""Process command line parameters and run the conversion."""
inpath = "-"
outpath = "-"
args = sys.argv[1:]
processor_class = DOMProcess
try:
opts, args = getopt.getopt(args, "dhsw",
["dom", "help", "sax", "write"])
except getopt.error, e:
usage(err=e, rc=2)
for opt, arg in opts:
if opt in ("-d", "--dom"):
processor_class = DOMProcess
elif opt in ("-h", "--help"):
usage()
elif opt in ("-s", "--sax"):
processor_class = SAXProcess
elif opt in ("-w", "--write"):
processor_class = WriteProcess
if len(args) == 2:
inpath, outpath = args
elif len(args) == 1:
inpath = args[0]
elif len(args) == 0:
pass
else:
usage(err="too many command-line arguments", rc=2)
infp = get_input(inpath)
outfp = get_output(outpath)
processor = processor_class(infp, outfp)
processor.run()
infp.close()
outfp.close()
class BaseProcess:
"""Base class for the conversion processors. Each concrete subclass
must provide the following methods:
initOutput()
Initialize the output stream and any internal data structures
that the conversion process needs.
addRecord(lname, fname, type)
Add one record to the output stream (or the internal structures)
where lname is the last name, fname is the first name, and type
is either 'manager' or 'employee'.
finishOutput()
Finish all output generation. If all work has been on internal
data structures, this is where they should be converted to text
and written out.
"""
def __init__(self, infp, outfp):
"""Store the input and output streams for later use."""
self.infp = infp
self.outfp = outfp
def run(self):
"""Perform the complete conversion process.
This method is responsible for parsing the input and calling the
subclass-provided methods in the right order.
"""
self.initOutput()
self.infp.readline() # ignore field names
rec = self.getNextRecord()
while rec:
lname, fname, type = rec
self.addRecord(lname, fname, type)
rec = self.getNextRecord()
self.finishOutput()
def getNextRecord(self):
"""Read and return the next input record, or return None."""
line = self.infp.readline()
if line:
parts = map(string.strip, string.split(line, ','))
lname, fname, eid, mid = parts
type = ("employee", "manager")[eid == mid]
return lname, fname, type
else:
return None
class DOMProcess(BaseProcess):
"""Concrete conversion process which uses a DOM structure as an
internal data structure.
Content is added to the DOM tree for each input record, and the
entire tree is serialized and written to the output stream in the
finishOutput() method.
"""
def initOutput(self):
# Create a new document with no namespace uri, qualified name,
# or document type
self.document = implementation.createDocument(None,None,None)
self.personnel = self.document.createElement("personnel")
self.document.appendChild(self.personnel)
def addRecord(self, lname, fname, type):
doc = self.document
self.personnel.appendChild(doc.createTextNode("\n "))
emp = doc.createElement("employee")
emp.setAttribute("type", type)
self.personnel.appendChild(emp)
emp.appendChild(doc.createTextNode("\n "))
ln = doc.createElement("lname")
ln.appendChild(doc.createTextNode(lname))
emp.appendChild(ln)
emp.appendChild(doc.createTextNode("\n "))
fn = doc.createElement("fname")
fn.appendChild(doc.createTextNode(fname))
emp.appendChild(fn)
emp.appendChild(doc.createTextNode("\n "))
def finishOutput(self):
t = self.document.createTextNode("\n")
self.personnel.appendChild(t)
# XXX toxml not supported by 4DOM
# self.outfp.write(self.document.toxml())
xml.dom.ext.PrettyPrint(self.document, self.outfp)
self.outfp.write("\n")
class SAXProcess(BaseProcess):
"""Concrete conversion process that uses a SAX implementation that
writes output to a file.
XML is generated by calling the SAX methods that would be called
when the resulting document instance is parsed. Data is written to
the output stream incrementally with this approach, and no real
internal state is maintained.
"""
def initOutput(self):
info = xml.sax.writer.XMLDoctypeInfo()
info.add_element_container("personnel")
info.add_element_container("employee")
saxout = self.saxout = xml.sax.writer.PrettyPrinter(
self.outfp, dtdinfo=info)
saxout.startDocument()
saxout.startElement("personnel", {})
def addRecord(self, lname, fname, type):
saxout = self.saxout
saxout.startElement("employee", {"type": type})
saxout.startElement("lname", {})
saxout.characters(lname, 0, len(lname))
saxout.endElement("lname")
saxout.startElement("fname", {})
saxout.characters(fname, 0, len(fname))
saxout.endElement("fname")
saxout.endElement("employee")
def finishOutput(self):
self.saxout.endElement("personnel")
self.saxout.endDocument()
class WriteProcess(BaseProcess):
"""Concrete conversion process that simply formats the XML
directly and uses the write() method of a file to write it out.
The only helper function used to generate the XML is the
xml.utils.escape() function; the methods of this class are
solely responsible for proper formatting of the markup.
"""
#
# Note the simplicity of using a bunch of write() calls; using print
# statements would also be reasonable in many contexts.
#
def initOutput(self):
self.outfp.write('<?xml version="1.0" encoding="iso-8859-1"?>\n')
self.outfp.write("<personnel>\n")
def addRecord(self, lname, fname, type):
self.outfp.write(' <employee type="%s">\n' % type)
self.outfp.write(" <lname>%s</lname>\n" % xml.utils.escape(lname))
self.outfp.write(" <fname>%s</fname>\n" % xml.utils.escape(fname))
self.outfp.write(" </employee>\n")
def finishOutput(self):
self.outfp.write("</personnel>\n")
def get_input(path):
"""Get input file from path; '-' indicates stdin."""
if path == "-":
return sys.stdin
else:
return open(path)
def get_output(path):
"""Get output file from path; '-' indicates stdout."""
if path == "-":
return sys.stdout
else:
return open(path, "w")
def usage(err=None, rc=0):
"""Write out a usage message, possibly to stderr.
If err or rc are true, the message is written to stderr instead of
stdout. The script docstring is used as the source of help text.
Exits with result code rc.
"""
if err or rc:
sys.stdout = sys.stderr
program = os.path.basename(sys.argv[0])
if err:
print "%s: %s" % (program, str(err))
vars = {"program": program}
print __doc__ % vars
sys.exit(rc)
if __name__ == "__main__":
main()
PyXML-0.8.4/demo/quotes/README
The files in this directory demonstrate maintaining a quotation
collection in XML. The still-unnamed markup language contains
'quotation' elements, which contain the text of the quotation and
optional 'author' and 'source' elements. For the quotation text,
there are some simple semantic markups such as 'em', 'cite', and
'foreign'.
quotations.dtd   DTD for the markup language.
sample.xml       A sample quotation file.
qtfmt.py         Program to read a file marked up using the language
                 specified in quotations.dtd, and output the
                 list in HTML, text, or fortune format.
The qtfmt.py script requires Python 2.0, since it assumes UTF-8 output
and uses the codecs module to convert its output to Latin-1.
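
For readers without PyXML, the general shape of the SAX processing in qtfmt.py can be approximated with the standard library. The sketch below assumes Python 3 and the SAX2 xml.sax.ContentHandler interface, whereas qtfmt.py uses the older SAX1 saxlib.HandlerBase. The class name QuoteCollector is hypothetical, only 'source' is tracked, and inline markup such as 'em' is flattened to plain text.

```python
import xml.sax

class QuoteCollector(xml.sax.ContentHandler):
    """Collect (text, source) pairs from 'quotation' elements.

    Hypothetical helper for illustration; not part of qtfmt.py.
    """
    def __init__(self):
        super().__init__()
        self.quotes = []
        self._text = None             # parts of the current quotation body
        self._source = None           # parts of the open 'source' element
        self._finished_source = None  # normalized source once it is closed

    def startElement(self, name, attrs):
        if name == "quotation":
            self._text = []
        elif name == "source":
            self._source = []

    def characters(self, content):
        if self._source is not None:
            self._source.append(content)
        elif self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "source":
            # Collapse runs of whitespace, as simplify() does in qtfmt.py.
            self._finished_source = " ".join("".join(self._source).split())
            self._source = None
        elif name == "quotation":
            text = " ".join("".join(self._text).split())
            self.quotes.append((text, self._finished_source))
            self._text = None
            self._finished_source = None

DOC = b"""<quotations>
<quotation>Free software is artificial life, but <em>better</em>.
<source>Aaron Watters, 29 Sep 1994</source>
</quotation>
</quotations>"""

handler = QuoteCollector()
xml.sax.parseString(DOC, handler)
```

After parsing, handler.quotes holds one tuple per quotation; a quotation without a 'source' element gets None in the second position.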
Contact amk1@bigfoot.com if you have questions or comments about the
contents of this directory. For the author's complete quotation
collections, please go to http://starship.python.net/crew/amk/quotations/
PyXML-0.8.4/demo/quotes/qtfmt.py
#!/usr/bin/env python
#
# qtfmt.py v1.10
# v1.10 : Updated to use Python 2.0 Unicode type.
#
# Read a document in the quotation DTD, converting it to a list of Quotation
# objects. The list can then be output in several formats.
__doc__ = """Usage: qtfmt.py [options] file1.xml file2.xml ...
If no filenames are provided, standard input will be read.
Available options:
-f or --fortune      Produce output for the fortune(1) program (the default)
-h or --html         Produce HTML output
-t or --text         Produce plain text output
-m N or --max N      Suppress quotations longer than N lines;
                     defaults to 0, which suppresses no quotations at all.
-r or --randomize    Output the quotations in random order
--help               Print this message and exit
"""
import string, re, cgi, types
import sys # used by QuotationDocHandler.fatalError()
import codecs
from xml.sax import saxlib, saxexts
def simplify(t, indent="", width=79):
"""Strip out redundant spaces, and insert newlines to
wrap the text at the given width."""
t = string.strip(t)
t = re.sub('\s+', " ", t)
if t=="": return t
t = indent + t
t2 = ""
while len(t) > width:
index = string.rfind(t, ' ', 0, width)
if index == -1: t2 = t2 + t[:width] ; t = t[width:]
else: t2 = t2 + t[:index] ; t = t[index+1:]
t2 = t2 + '\n'
return t2 + t
class Quotation:
"""Encapsulates a single quotation.
Attributes:
stack -- used during construction and then deleted
text -- A list of Text() instances, or subclasses of Text(),
containing the text of the quotation.
source -- A list of Text() instances, or subclasses of Text(),
containing the source of the quotation. (Optional)
author -- A list of Text() instances, or subclasses of Text(),
containing the author of the quotation. (Optional)
Methods:
as_fortune() -- return the quotation formatted for fortune
as_html() -- return an HTML version of the quotation
as_text() -- return a plain text version of the quotation
"""
def __init__(self):
self.stack = [ Text() ]
self.text = []
def as_text(self):
"Convert instance into a pure text form"
output = ""
def flatten(textobj):
"Flatten a list of subclasses of Text into a list of paragraphs"
if type(textobj) != types.ListType: textlist=[textobj]
else: textlist = textobj
paragraph = "" ; paralist = []
for t in textlist:
if (isinstance(t, PreformattedText) or
isinstance(t, CodeFormattedText) ):
paralist.append(paragraph)
paragraph = ""
paralist.append(t)
elif isinstance(t, Break):
paragraph = paragraph + t.as_text()
paralist.append(paragraph)
paragraph = ""
else:
paragraph = paragraph + t.as_text()
paralist.append(paragraph)
return paralist
# Flatten the list of instances into a list of paragraphs
paralist = flatten(self.text)
if len(paralist) > 1:
indent = 2*" "
else:
indent = ""
for para in paralist:
if isinstance(para, PreformattedText) or isinstance(para, CodeFormattedText):
output = output + para.as_text()
else:
output = output + simplify(para, indent) + '\n'
attr = ""
for i in ['author', 'source']:
if hasattr(self, i):
paralist = flatten(getattr(self, i))
text = string.join(paralist)
if attr:
attr = attr + ', '
text = string.lower(text[:1]) + text[1:]
attr = attr + text
attr=simplify(attr, width = 79 - 4 - 3)
if attr: output = output + ' -- '+re.sub('\n', '\n ', attr)
return output + '\n'
def as_fortune(self):
return self.as_text() + '%'
def as_html(self):
output = "<P>"
def flatten(textobj):
if type(textobj) != types.ListType: textlist = [textobj]
else: textlist = textobj
paragraph = "" ; paralist = []
for t in textlist:
paragraph = paragraph + t.as_html()
if isinstance(t, Break):
paralist.append(paragraph)
paragraph = ""
paralist.append(paragraph)
return paralist
paralist = flatten(self.text)
for para in paralist: output = output + string.strip(para) + '\n'
attr = ""
for i in ['author', 'source']:
if hasattr(self, i):
paralist = flatten(getattr(self, i))
text = string.join(paralist)
attr=attr + ('<P CLASS=%s>' % i) + string.strip(text)
return output + attr
# Text and its subclasses are used to hold chunks of text; instances
# know how to display themselves as plain text or as HTML.
class Text:
"Plain text"
def __init__(self, text=""):
self.text = text
# We need to allow adding a string to Text instances.
def __add__(self, val):
newtext = self.text + str(val)
# __class__ must be used so subclasses create instances of themselves.
return self.__class__(newtext)
def __str__(self): return self.text
def __repr__(self):
s = string.strip(self.text)
if len(s) > 15: s = s[0:15] + '...'
return '<%s: "%s">' % (self.__class__.__name__, s)
def as_text(self): return self.text
def as_html(self): return cgi.escape(self.text)
class PreformattedText(Text):
"Text inside <pre>...</pre>"
def as_text(self):
return str(self.text)
def as_html(self):
return '<pre>' + cgi.escape(str(self.text)) + '</pre>'
class CodeFormattedText(Text):
"Text inside <code>...</code>"
def as_text(self):
return str(self.text)
def as_html(self):
return '<code>' + cgi.escape(str(self.text)) + '</code>'
class CitedText(Text):
"Text inside <cite>...</cite>"
def as_text(self):
return '_' + simplify(str(self.text)) + '_'
def as_html(self):
return '<cite>' + string.strip(cgi.escape(str(self.text))) + '</cite>'
class ForeignText(Text):
"Foreign words, from Latin or French or whatever."
def as_text(self):
return '_' + simplify(str(self.text)) + '_'
def as_html(self):
return '<i>' + string.strip(cgi.escape(str(self.text))) + '</i>'
class EmphasizedText(Text):
"Text inside <em>...</em>"
def as_text(self):
return '*' + simplify(str(self.text)) + '*'
def as_html(self):
return '<em>' + string.strip(cgi.escape(str(self.text))) + '</em>'
class Break(Text):
def as_text(self): return ""
def as_html(self): return "<P>"
# The QuotationDocHandler class is a SAX handler class that will
# convert a marked-up document using the quotations DTD into a list of
# quotation objects.
class QuotationDocHandler(saxlib.HandlerBase):
def __init__(self, process_func):
self.process_func = process_func
self.newqt = None
# Errors should be signaled, so we'll output a message and exit
# to stop processing
def fatalError(self, exception):
sys.stderr.write('ERROR: '+ str(exception)+'\n')
sys.exit(1)
error = fatalError
warning = fatalError
def characters(self, ch, start, length):
if self.newqt != None:
s = ch[start:start+length]
# Undo the UTF-8 encoding, converting to ISO Latin1, which
# is the default character set used for HTML.
latin1_encode = codecs.lookup('iso-8859-1') [0]
unicode_str = s
s, consumed = latin1_encode( unicode_str )
assert consumed == len( unicode_str )
self.newqt.stack[-1] = self.newqt.stack[-1] + s
def startDocument(self):
self.quote_list = []
def startElement(self, name, attrs):
methname = 'start_'+str(name)
if hasattr(self, methname):
method = getattr(self, methname)
method(attrs)
else:
sys.stderr.write('unknown start tag: <' + name + ' ')
for name, value in attrs.items():
sys.stderr.write(name + '=' + '"' + value + '" ')
sys.stderr.write('>\n')
def endElement(self, name):
methname = 'end_'+str(name)
if hasattr(self, methname):
method = getattr(self, methname)
method()
else:
sys.stderr.write('unknown end tag: </' + name + '>\n')
# There's nothing to be done for the <quotations> tag
def start_quotations(self, attrs):
pass
def end_quotations(self):
pass
def start_quotation(self, attrs):
if self.newqt == None: self.newqt = Quotation()
def end_quotation(self):
st = self.newqt.stack
for i in range(len(st)):
if type(st[i]) == types.StringType:
st[i] = Text(st[i])
self.newqt.text=self.newqt.text + st
del self.newqt.stack
if self.process_func: self.process_func(self.newqt)
else:
print "Completed quotation\n ", self.newqt.__dict__
self.newqt=Quotation()
# Attributes of a quotation: <author>...</author> and <source>...</source>
def start_author(self, data):
# Add the current contents of the stack to the text of the quotation
self.newqt.text = self.newqt.text + self.newqt.stack
# Reset the stack
self.newqt.stack = [ Text() ]
def end_author(self):
# Set the author attribute to contents of the stack; you can't
# have more than one <author> tag per quotation.
self.newqt.author = self.newqt.stack
# Reset the stack for more text.
self.newqt.stack = [ Text() ]
# The code for the <source> tag is exactly parallel to that for <author>
def start_source(self, data):
self.newqt.text = self.newqt.text + self.newqt.stack
self.newqt.stack = [ Text() ]
def end_source(self):
self.newqt.source = self.newqt.stack
self.newqt.stack = [ Text() ]
# Text markups: <br/> for breaks, <pre>...</pre> for preformatted
# text, <em>...</em> for emphasis, <cite>...</cite> for citations.
def start_br(self, data):
# Add a Break instance, and a new Text instance.
self.newqt.stack.append(Break())
self.newqt.stack.append( Text() )
def end_br(self): pass
def start_pre(self, data):
self.newqt.stack.append( Text() )
def end_pre(self):
self.newqt.stack[-1] = PreformattedText(self.newqt.stack[-1])
self.newqt.stack.append( Text() )
def start_code(self, data):
self.newqt.stack.append( Text() )
def end_code(self):
self.newqt.stack[-1] = CodeFormattedText(self.newqt.stack[-1])
self.newqt.stack.append( Text() )
def start_em(self, data):
self.newqt.stack.append( Text() )
def end_em(self):
self.newqt.stack[-1] = EmphasizedText(self.newqt.stack[-1])
self.newqt.stack.append( Text() )
def start_cite(self, data):
self.newqt.stack.append( Text() )
def end_cite(self):
self.newqt.stack[-1] = CitedText(self.newqt.stack[-1])
self.newqt.stack.append( Text() )
def start_foreign(self, data):
self.newqt.stack.append( Text() )
def end_foreign(self):
self.newqt.stack[-1] = ForeignText(self.newqt.stack[-1])
self.newqt.stack.append( Text() )
if __name__ == '__main__':
import sys, getopt
# Process the command-line arguments
opts, args = getopt.getopt(sys.argv[1:], 'fthm:r',
['fortune', 'text', 'html', 'max=', 'help',
'randomize'] )
# Set defaults
maxlength = 0 ; method = 'as_fortune'
randomize = 0
# Process arguments
for opt, arg in opts:
if opt in ['-f', '--fortune']:
method='as_fortune'
elif opt in ['-t', '--text']:
method = 'as_text'
elif opt in ['-h', '--html']:
method = 'as_html'
elif opt in ['-m', '--max']:
maxlength = string.atoi(arg)
elif opt in ['-r', '--randomize']:
randomize = 1
elif opt == '--help':
print __doc__ ; sys.exit(0)
# This function will simply output each quotation by calling the
# desired method, as long as it's not suppressed by a setting of
# --max.
qtlist = []
def process_func(qt, qtlist=qtlist, maxlength=maxlength, method=method):
func = getattr(qt, method)
output = func()
length = string.count(output, '\n')
if maxlength!=0 and length > maxlength: return
qtlist.append(output)
# Loop over the input files; use sys.stdin if no files are specified
if len(args) == 0: args = [sys.stdin]
for file in args:
if type(file) == types.StringType: input = open(file, 'r')
else: input = file
# Enforce the use of the Expat parser, because the code needs to be
# sure that the output will be UTF-8 encoded.
p=saxexts.XMLParserFactory.make_parser(["xml.sax.drivers.drv_pyexpat"])
dh = QuotationDocHandler(process_func)
p.setDocumentHandler(dh)
p.setErrorHandler(dh)
p.parseFile(input)
if type(file) == types.StringType: input.close()
p.close()
# Randomize the order of the quotations
if randomize:
import whrandom
q2 = []
for i in range(len(qtlist)):
qt = whrandom.randint(0,len(qtlist)-1 )
q2.append( qtlist[qt] )
qtlist[qt:qt+1] = []
assert len(qtlist) == 0
qtlist = q2
for quote in qtlist:
print quote
# We're done!
PyXML-0.8.4/demo/quotes/quotations.dtd
<!--
A DTD for storing simple quotations. This DTD doesn't provide
sophisticated cross-referencing or anything like that; if you're
working on the next edition of Bartlett's Familiar Quotations,
you'll need a fancier DTD with more features.
Version 1.0 : Sep 5 1998
A.M. Kuchling (amk1@bigfoot.com)
-->
<!ELEMENT quotations (quotation)*>
<!ELEMENT quotation (#PCDATA | em | foreign | cite | br | pre | code |
author | source)* >
<!ELEMENT author (#PCDATA)>
<!ELEMENT source (#PCDATA|cite)*>
<!-- Different forms of emphasis for phrases -->
<!ELEMENT cite (#PCDATA) >
<!ELEMENT code (#PCDATA) >
<!ELEMENT em (#PCDATA) >
<!ELEMENT foreign (#PCDATA) >
<!ELEMENT pre (#PCDATA) >
<!ATTLIST pre xml:space (default|preserve) 'preserve'>
<!-- Break element -->
<!ELEMENT br EMPTY>
<!-- Various accents -->
<!ENTITY acirc "â">
<!ENTITY ccedil "ç">
<!ENTITY eacute "é">
<!ENTITY iuml "ï">
<!ENTITY oacute "ó">
<!ENTITY ouml "ö">
PyXML-0.8.4/demo/quotes/sample.xml
<?xml version="1.0"?>
<!DOCTYPE quotations SYSTEM "quotations.dtd">
<quotations>
<quotation>
We will perhaps eventually be writing only small modules which are
identified by name as they are used to build larger ones, so that
devices like indentation, rather than delimiters, might become
feasible for expressing local structure in the source language.
<source>Donald E. Knuth, "Structured Programming with goto
Statements", Computing Surveys, Vol 6 No 4, Dec. 1974</source>
</quotation>
<quotation>
I don't know a lot about this artificial life stuff
-- but I'm suspicious of anything Newsweek gets goofy about
-- and I suspect its primary use is as another money extraction tool
to be applied by ai labs to the department of defense
(and more power to 'em).
<br/>
Nevertheless in wondering why free software is so good these days
it occured to me that the propagation of free software is one gigantic
artificial life evolution experiment, but the metaphor isn't perfect.
<br/>
Programs are thrown out into the harsh environment, and the bad ones
die. The good ones adapt rapidly and become very robust in short
order.
<br/>
The only problem with the metaphor is that the process isn't random
at all. Python <em>chooses</em> to include tk's genes; Linux decides
to make itself more suitable for symbiosis with X, etcetera.
<br/>
Free software is artificial life, but better.
<source>Aaron Watters, 29 Sep 1994</source>
</quotation>
<quotation>
It has also been referred to as the "Don Beaudry <em>hack</em>," but
that's a misnomer. There's nothing hackish about it -- in fact,
it is rather elegant and deep, even though there's something dark
to it.
<source>Guido van Rossum, <cite>Metaclass Programming in Python 1.5</cite></source>
</quotation>
<quotation>
This is not a technical issue so much as a human issue; we
are limited and so is our time. (Is this a bug or a feature of time?
Careful; trick question!)
<source>Fred Drake on the Documentation SIG, 9 Sep 1998</source>
</quotation>
<quotation>
Counting is the most simple and primitive of narratives -- 1 2 3 4 5 6
7 8 9 10 -- a tale with a beginning, a middle and an end and a sense
of progression -- arriving at a finish of two digits -- a goal
attained, a denouement reached.
<author>Peter Greenaway</author>
<source><cite>Fear of Drowning By Numbers</cite> (1988)</source>
</quotation>
</quotations>
PyXML-0.8.4/demo/sax/ 0000755 0012410 0011756 00000000000 10152625721 014720 5 ustar loewis hpifb6 0000000 0000000 PyXML-0.8.4/demo/sax/README 0000644 0012410 0011756 00000002057 07165434556 015622 0 ustar loewis hpifb6 0000000 0000000 These examples demonstrate the Python SAX API, version 1. In all examples,
the sax driver can be specified by setting the PY_SAX_PARSER environment
variable. Valid settings are
- xml.sax.drivers.drv_pyexpat
- xml.sax.drivers.drv_xmlproc
- xml.sax.drivers.drv_sgmlop
as well as any other driver listed in the xml/sax/drivers directory.
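As a rough illustration of the driver selection described above, the
sketch below picks a driver module name from PY_SAX_PARSER and falls back
to pyexpat when the variable is unset or empty. The names choose_driver
and DEFAULT_DRIVER are hypothetical and not part of the PyXML API, and the
fallback is an assumption based on the note in this README that the
standard driver is pyexpat. The sketch only resolves a name; it does not
load a parser.

```python
import os

# Assumed fallback: this README states that the standard driver is pyexpat.
DEFAULT_DRIVER = "xml.sax.drivers.drv_pyexpat"


def choose_driver(environ=None):
    """Return the driver module name named by PY_SAX_PARSER, or the default.

    `environ` is a mapping such as os.environ; passing a plain dict makes
    the function easy to try out without touching the real environment.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("PY_SAX_PARSER", "").strip()
    return value or DEFAULT_DRIVER


if __name__ == "__main__":
    print(choose_driver({"PY_SAX_PARSER": "xml.sax.drivers.drv_xmlproc"}))
    print(choose_driver({}))
```

Taking the environment as a parameter is only a convenience for
experimenting; a real driver lookup would still have to import the named
module and handle the case where it is not installed.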
sax2obj.py A general XML element -> Python object converter based on SAX.
saxdemo.py Parses an XML file, and prints it in canonical form.
Invoke as 'python saxdemo.py filename.xml'.
The standard driver will be pyexpat.
Alternative drivers can be specified with the -d option
of saxdemo.py; the prefix 'xml.sax.drivers.drv_' is
automatically added to the driver.
saxhack.py appears to be broken
saxstats.py Prints statistics about an xml file.
saxtimer.py Times parsing a document; arguments are the parser name
(the prefix 'xml.sax.drivers.drv_' is automatically added)
and the document name.
saxtrace.py parses a document using xmlproc, and prints all SAX events. PyXML-0.8.4/demo/sax/sax2obj.py 0000644 0012410 0011756 00000010233 07413602734 016646 0 ustar loewis hpifb6 0000000 0000000 """
A general XML element -> Python object converter based on SAX.
"""
from xml.sax import saxexts,saxlib,saxutils
import re,string

reg_ws=re.compile("[%s]+" % string.whitespace)

class ConvSpec:
    """Contains the information needed to convert SAX events to Python
    objects."""

    def __init__(self):
        pass

class SAXObject:

    def __init__(self):
        self._fields={}

    def has_field(self,field):
        return self._fields.has_key(field)

    def get_fields(self):
        return self._fields.keys()

    def get_field(self,field):
        return self._fields[field]

    def set_field(self,field,value):
        self._fields[field]=value

    def display(self):
        for field in self._fields.keys():
            print "%s=%s" % (field,self._fields[field])

    def __getattr__(self,attr):
        try:
            return self._fields[attr]
        except KeyError,e:
            raise AttributeError(str(e))

    def __cmp__(self,obj):
        if id(obj)==id(self):
            return 0
        else:
            return 1
class DocHandler(saxlib.DocumentHandler):

    def __init__(self,target_elem,list_elems,ign_elems,rep_field):
        self.target_elem=target_elem
        self.list_elems=list_elems
        self.ign_elems=ign_elems
        self.rep_field=rep_field

        self.ignoring=0
        self.objects=[]
        self.current=None
        self.cur_data=""
        self.stack=[]

    def startElement(self,name,attrs):
        if self.ignoring:
            return

        if name==self.target_elem:
            self.current=SAXObject()
            for attr in attrs:
                self.current.set_field(attr,attrs[attr])
        elif self.list_elems.has_key(name):
            if not self.current.has_field(name):
                self.current.set_field(name,[])

            self.stack.append(self.current)
            self.current=SAXObject()
        elif self.rep_field.has_key(name) and not self.current.has_field(name):
            self.current.set_field(name,[])
        else:
            if self.ign_elems.has_key(name):
                self.ignoring=self.ignoring+1

        self.cur_data=""

    def characters(self,data,start,length):
        if self.ignoring or self.current==None:
            return

        data=data[start:start+length]
        mo=reg_ws.match(data)
        if mo!=None and mo.end(0)==len(data):
            return

        self.cur_data=self.cur_data+data

    def endElement(self,name):
        if self.ign_elems.has_key(name):
            self.ignoring=self.ignoring-1
            return

        if self.ignoring or self.current==None:
            return

        if name==self.target_elem:
            self.objects.append(self.current)
            self.current=None
        elif self.list_elems.has_key(name):
            obj=self.current
            self.current=self.stack[-1]
            del self.stack[-1]
            self.current.get_field(name).append(obj)
        elif self.rep_field.has_key(name):
            self.current.get_field(name).append(self.cur_data)
        else:
            self.current.set_field(name,self.cur_data)

    def get_objects(self):
        return self.objects
def make_objects(url,element,list_elems={},ign_elems={},rep_field={}):
    dh=DocHandler(element,list_elems,ign_elems,rep_field)
    eh=saxutils.ErrorPrinter()

    parser=saxexts.make_parser()
    parser.setDocumentHandler(dh)
    parser.setErrorHandler(eh)
    parser.parse(url)

    return dh.get_objects()

def make_xml(filename,root_elem,trgt_elem,list):
    out=open(filename,"w")
    out.write("<%s>\n" % root_elem)

    for obj in list:
        out.write(" <%s>\n" % trgt_elem)
        for field in obj.get_fields():
            out.write(" <%s>%s</%s>\n" % \
                      (field,escape_markup(obj.get_field(field)),field))
        out.write(" </%s>\n" % trgt_elem)

    out.write("\n</%s>" % root_elem)
    out.close()

def list2hash(lst,key_field):
    hash={}
    for obj in lst:
        hash[obj.get_field(key_field)]=obj

    return hash
def escape_markup(str):
    out=""
    for ch in str:
        # Replace markup characters with their entity references.
        if ch=="<":
            out=out+"&lt;"
        elif ch==">":
            out=out+"&gt;"
        else:
            out=out+ch

    return out
PyXML-0.8.4/demo/sax/saxdemo.py 0000644 0012410 0011756 00000003145 07526150521 016737 0 ustar loewis hpifb6 0000000 0000000 # A demo SAX application: using SAX to parse XML documents into ESIS
# or canonical XML.
from xml.sax import saxexts, saxlib, saxutils
import sys,getopt

### Interpreting arguments (rather crudely)

try:
    (args,trail)=getopt.getopt(sys.argv[1:],"sed:")
    assert trail, "No argument provided"
except Exception,e:
    print "ERROR: %s" % e
    print
    print "Usage: python saxdemo.py [-e] [-s] [-d drv] filename [outfilename]"
    print
    print " -e: Output ESIS instead of normalized XML."
    print " -s: Silent (no messages except error messages)"
    print " -d: Use driver 'drv', where 'drv' is a module name."
    print " outfilename: Write to this file."
    sys.exit(1)

driver=None
esis=0
silent=0
in_sysID=trail[0]

if len(trail)==2:
    out_sysID=trail[1]
else:
    out_sysID=""

for (arg,val) in args:
    if arg=="-d":
        driver="xml.sax.drivers.drv_" + val
    elif arg=="-e":
        esis=1
    elif arg=="-s":
        silent=1

p=saxexts.make_parser(driver)
p.setErrorHandler(saxutils.ErrorPrinter())

if out_sysID=="":
    out=sys.stdout
else:
    try:
        # The handlers write to 'out', so open the output file for writing.
        out=open(out_sysID,"w")
    except IOError,e:
        print out_sysID+": "+str(e)
        sys.exit(1)

if esis:
    dh=saxutils.ESISDocHandler(out)
else:
    dh=saxutils.Canonizer(out)

### Ready. Let's go!

if not silent:
    print "Parser: %s (%s, %s)" % (p.get_parser_name(),p.get_parser_version(),
                                   p.get_driver_version())
    print

try:
    p.setDocumentHandler(dh)
    p.parse(in_sysID)
except IOError,e:
    print in_sysID+": "+str(e)
except saxlib.SAXException,e:
    print str(e)

### Cleaning up.

out.close()
PyXML-0.8.4/demo/sax/saxhack.py 0000644 0012410 0011756 00000006761 07413602734 016733 0 ustar loewis hpifb6 0000000 0000000 #
#
# $Id: saxhack.py,v 1.5 2001/12/30 12:17:32 loewis Exp $
#
# illustrate how a saxlib parser can interface directly to sgmlop
#
# history:
# 98-05-23 fl created (derived from the coreXML parser)
#
# Copyright (c) 1998 by Secret Labs AB
#
# info@pythonware.com
# http://www.pythonware.com
#
from xml.sax.saxlib import HandlerBase
class DocumentHandler:#(HandlerBase):

    # SAX interface

    def startElement(self, tag, attrs):
        pass # print "start", tag

    def endElement(self, tag):
        pass # print "end", tag

    def characters(self, text, start, len):
        pass # print "data", text[start:start+len]

# --------------------------------------------------------------------
# sgmlop-based parser

from xml.parsers import sgmlop

class Parser:

    def setDocumentHandler(self, dh):
        self.parser = sgmlop.XMLParser()
        self.parser.register(dh, 1)

    def parseFile(self, file):
        parser = self.parser
        while 1:
            data = file.read(16384)
            if not data:
                break
            parser.feed(data)
        parser.close()

# --------------------------------------------------------------------
# xmllib-based parser

from xml.parsers import xmllib

class xmllibParser(xmllib.XMLParser):

    def setDocumentHandler(self, dh):
        self.characters = dh.characters
        self.unknown_starttag = dh.startElement
        self.unknown_endtag = dh.endElement

    def handle_data(self, data):
        self.characters(data, 0, len(data))

    def parseFile(self, file):
        while 1:
            data = file.read(16384)
            if not data:
                break
            self.feed(data)
        self.close()
# --------------------------------------------------------------------
# original xmllib-based parser
class slowParser(xmllib.SlowXMLParser):

    def setDocumentHandler(self, dh):
        self.characters = dh.characters
        self.unknown_starttag = dh.startElement
        self.unknown_endtag = dh.endElement

    def handle_data(self, data):
        self.characters(data, 0, len(data))

    def parseFile(self, file):
        while 1:
            data = file.read(16384)
            if not data:
                break
            self.feed(data)
        # Close the parser, as xmllibParser does; the caller closes the file.
        self.close()
# ====================================================================
# test stuff
import time, os, sys
if len(sys.argv) == 1:
    print 'Usage: saxhack.py <xml filename>'
    sys.exit(1)
FILE = sys.argv[1]
size = os.stat(FILE)[6]
p = Parser()
dh = DocumentHandler()
p.setDocumentHandler(dh)
f = open(FILE)
t = time.clock()
p.parseFile(f) # dry run
t_direct = time.clock() - t
f.close()
#import sys ; sys.exit(0)
print t_direct
if t_direct == 0:
    print 'Measured time was too small; use a larger XML file'
    sys.exit(1)
print "sgmlop:", int(size / t_direct), "bytes per second"
p = xmllibParser()
#p=slowParser()
dh = DocumentHandler()
p.setDocumentHandler(dh)
f = open(FILE)
t = time.clock()
p.parseFile(f) # dry run
t_fast = time.clock() - t
f.close()
print "xmllib:", int(size / t_fast), "bytes per second"
p = slowParser()
dh = DocumentHandler()
p.setDocumentHandler(dh)
f = open(FILE)
t = time.clock()
p.parseFile(f) # dry run
t_slow = time.clock() - t
f.close()
print "slow xmllib:", int(size / t_slow), "bytes per second"
print
print "normalized timings:"
print "slow xmllib", 1.0
print "fast xmllib", round(t_fast / t_slow, 2), "(%sx)" % round(t_slow / t_fast, 1)
print "sgmlop ", round(t_direct / t_slow, 2), "(%sx)" % round(t_slow / t_direct, 1)
print
PyXML-0.8.4/demo/sax/saxstats.py 0000644 0012410 0011756 00000002204 07413602734 017147 0 ustar loewis hpifb6 0000000 0000000 # A simple SAX application that counts the number of elements, attributes and
# processing instructions in a document.
from xml.sax import saxexts
from xml.sax import saxlib
import sys
class CounterHandler(saxlib.DocumentHandler):

    def __init__(self):
        self.elems=0
        self.attrs=0
        self.pis=0

    def startElement(self,name,attrs):
        self.elems=self.elems+1
        self.attrs=self.attrs+len(attrs)

    def processingInstruction(self,target,data):
        self.pis=self.pis+1

# --- Main prog

if len(sys.argv)<2:
    print "Usage: python saxstats.py <document>"
    print
    print " <document>: file name of the document to parse"
    sys.exit(1)

# Load parser and driver

print "\nLoading parser..."
p=saxexts.make_parser()
ch=CounterHandler()
p.setDocumentHandler(ch)

# Ready, set, go!

print "Starting parse..."
OK=0
try:
    p.parse(sys.argv[1])
    OK=1
except IOError,e:
    print "\nERROR: "+sys.argv[1]+": "+str(e)
except saxlib.SAXException,e:
    print "\nERROR: "+str(e)

print "Parse complete:"
print " Elements: %d" % ch.elems
print " Attributes: %d" % ch.attrs
print " Proc instrs: %d" % ch.pis
PyXML-0.8.4/demo/sax/saxtimer.py 0000644 0012410 0011756 00000002064 07413602734 017135 0 ustar loewis hpifb6 0000000 0000000 # A simple SAX application that measures the time spent parsing a
# document with an empty document handler.
from xml.sax import saxexts
from xml.sax import saxlib
import sys,time
if len(sys.argv)<3:
    print "Usage: python saxtimer.py <parser> <document>"
    print
    print " <document>: file name of the document to parse"
    print " <parser>: driver package name"
    sys.exit(1)

# Load parser and driver

print "\nLoading parser..."

try:
    p=saxexts.make_parser("xml.sax.drivers.drv_" + sys.argv[1])
except saxlib.SAXException,e:
    print "ERROR: Parser not available"
    sys.exit(1)

# Ready, set, go!

sum=0
print "Starting parse..."

for ix in range(3):
    start=time.clock()
    OK=0
    pt=0
    try:
        p.parse(sys.argv[2])
        pt=time.clock()-start
        OK=1
    except IOError,e:
        print "\nERROR: "+sys.argv[2]+": "+str(e)
    except saxlib.SAXException,e:
        print "\nERROR: "+str(e)

    if OK:
        print "Parse time: "+`pt`
    else:
        print "Error occurred, parse aborted."

    sum=sum+pt

print "Average: %f" % (sum/3.0)
PyXML-0.8.4/demo/sax/saxtrace.py 0000644 0012410 0011756 00000003514 07413602734 017114 0 ustar loewis hpifb6 0000000 0000000 """
A minimal SAX application that just prints out the document-handler events
it receives.
"""
import sys
from xml.sax import saxexts
# --- SAXtracer
class SAXtracer:

    def __init__(self,objname):
        self.objname=objname
        self.met_name=""

    def __getattr__(self,name):
        self.met_name=name # UGLY! :)
        return self.trace

    def error(self,exception):
        print "err_handler.error(%s)" % str(exception)

    def fatalError(self,exception):
        print "err_handler.fatalError(%s)" % str(exception)

    def warning(self,exception):
        print "err_handler.warning(%s)" % str(exception)

    def characters(self,data,start,length):
        print "doc_handler.characters(%s,%d,%d)" % (`data[start:start+length]`,
                                                    start,length)

    def ignorableWhitespace(self,data,start,length):
        print "doc_handler.ignorableWhitespace(%s,%d,%d)" % \
              (`data[start:start+length]`,start,length)

    def startElement(self, name, attrs):
        attr_str="{"
        for attr in attrs:
            attr_str="%s '%s':'%s'," % (attr_str,attr,attrs[attr])

        if attr_str=="{":
            attr_str="{}"
        else:
            attr_str=attr_str[:-1]+" }"

        print "doc_handler.startElement('%s',%s)" % (name,attr_str)

    def trace(self,*rest):
        str="%s.%s(" % (self.objname,self.met_name)

        for param in rest[:-1]:
            str=str+`param`+", "

        if len(rest)>0:
            print str+`rest[-1]`+")"
        else:
            print str+")"
# --- Main prog
pf=saxexts.ParserFactory()
p=pf.make_parser("xml.sax.drivers.drv_xmlproc")
p.setDocumentHandler(SAXtracer("doc_handler"))
p.setDTDHandler(SAXtracer("dtd_handler"))
p.setErrorHandler(SAXtracer("err_handler"))
p.setEntityResolver(SAXtracer("ent_handler"))
p.parse(sys.argv[1])
PyXML-0.8.4/demo/sgmlop/ 0000755 0012410 0011756 00000000000 10152625721 015426 5 ustar loewis hpifb6 0000000 0000000 PyXML-0.8.4/demo/sgmlop/benchsgml.py 0000644 0012410 0011756 00000004006 07413602734 017747 0 ustar loewis hpifb6 0000000 0000000 # benchmark
import time
from xml.parsers import sgmlop
import sgmllib
SIZE = 16384
FILE = "test2.htm"
bytes = len(open(FILE).read())
def t1():
    fp = open(FILE)
    parser = sgmllib.SlowSGMLParser()
    while 1:
        data = fp.read(SIZE)
        if not data:
            break
        parser.feed(data)
    parser.close()
    fp.close()

def t2():
    fp = open(FILE)
    parser = sgmllib.FastSGMLParser()
    while 1:
        data = fp.read(SIZE)
        if not data:
            break
        parser.feed(data)
    parser.close()
    fp.close()

def t3():
    fp = open(FILE)
    parser = sgmlop.SGMLParser()
    while 1:
        data = fp.read(SIZE)
        if not data:
            break
        parser.feed(data)
    parser.close()
    fp.close()

class Dummy:
    def finish_starttag(self, tag, data):
        pass
    def finish_endtag(self, tag):
        pass
    def handle_entityref(self, data):
        pass
    def handle_data(self, data):
        pass

def t4():