* STL-like navigation of the DOM tree, using the excellent tree.hh library from Kasper Peeters
* It is possible to reproduce exactly, character by character, the original document from the parse tree
* Bundled CSS parser
* Optional parsing of attributes
* C++ code that looks like C++ (not so true anymore)
* Offsets of tags/elements in the original document are stored in the nodes of the DOM tree
The parsing policy of htmlcxx was designed to mimic the behavior of Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees similar to those created by Firefox. Unlike Firefox, however, htmlcxx does not insert elements that are absent from your HTML. Therefore, serializing the DOM tree yields exactly the same bytes as the original HTML document.
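The byte-exact round trip described above follows from two properties: the parser records where each node sits in the source, and it never adds or drops bytes. The sketch below illustrates that idea with the standard library only. It is not htmlcxx's code or API: `Span`, `tokenize`, and `serialize` are hypothetical names introduced for this example, and the tokenizer is deliberately naive (it only splits on `<` and `>`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for this sketch (not htmlcxx's node class).
struct Span {
    std::size_t offset;  // byte offset of the span in the source document
    std::size_t length;  // number of bytes the span covers
    bool isTag;          // true for "<...>" markup, false for text
};

// Split the input into tag and text spans that together cover every byte.
// Nothing is inserted or normalized, so malformed input is kept verbatim.
std::vector<Span> tokenize(const std::string& html) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t end = html.size();
        bool isTag = false;
        if (html[pos] == '<') {
            std::size_t close = html.find('>', pos);
            if (close != std::string::npos) {
                isTag = true;
                end = close + 1;
            }
            // An unterminated '<' falls through: the rest is kept as text.
        } else {
            std::size_t next = html.find('<', pos);
            if (next != std::string::npos) {
                end = next;
            }
        }
        spans.push_back({pos, end - pos, isTag});
        pos = end;
    }
    return spans;
}

// Rebuild the document by copying each span's bytes from the source.
std::string serialize(const std::string& html, const std::vector<Span>& spans) {
    std::string out;
    std::size_t expected = 0;
    for (const Span& s : spans) {
        assert(s.offset == expected);  // spans must be contiguous, with no gaps
        out.append(html, s.offset, s.length);
        expected = s.offset + s.length;
    }
    return out;
}
```

Because every byte belongs to exactly one span, `serialize(doc, tokenize(doc))` returns `doc` unchanged even for broken markup such as an unclosed tag. A parser that repaired the markup while parsing could not offer that guarantee.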
This package contains files required for developing software that makes use of htmlcxx.
htmlcxx - html and css APIs for C++
---------------------------------------------

Description
===========

htmlcxx is a simple non-validating css1 and html parser for C++. Although there are several other html parsers available, htmlcxx has some characteristics that make it unique:

- STL-like navigation of the DOM tree, using the excellent tree.hh library from Kasper Peeters
- It is possible to ...
The source code for the binary /usr/bin/htmlcxx is available as examples/htmlcxx.cc, and serves as an example of usage of the htmlcxx library.
HTMLCXX(1)    htmlcxx Man Page    HTMLCXX(1)

NAME
    htmlcxx - simple HTML and CSS parser

SYNOPSIS
    htmlcxx [-C] file.html [file.css]
    htmlcxx [-h | -V]

DESCRIPTION
    This manual page documents briefly the htmlcxx command. This manual page was written for the Debian distribution because the original program does not have a manual pa...
htmlcxx (0.84-1) unstable; urgency=low

  * New upstream release.
  * Update copyright file accordin...

2008-10-12 18:41  davi

  * Applied patch by Luca Bruno fixing gcc 4.3 compilation problems.

2007-08...

Format-Specification: http://svn.debian.org/wsvn/dep/web/deps/dep5.mdwn?op=file&rev=135
Name: htmlcx...