pkg://nolce-1.7-2.src.rpm:29920/nolce-1.7-2.tar.gz
info downloads
nolce-1.7-2/ 40755 0 0 0 6450270113 10600 5 ustar root root nolce-1.7-2/docs/ 40755 0 0 0 6450264357 11545 5 ustar root root nolce-1.7-2/docs/CHANGES.html 100644 0 0 10261 6450264146 13614 0 ustar root root <html>
<head>
<title>CHANGES</title>
</head>
<body bgcolor=#ffffff>
<h3>Changes of nolce version 1.7-2 over nolce 1.7</h3>
<ul>
<li>Bug of url_is_valid() fixed (thanks to Patrick Asty).
</ul>
<h3>Changes of nolce version 1.7 over nolce 1.6</h3>
<ul>
<li>Default destination dir is now <code>$HOME/cached</code>.
<li>Incorrect Html is supported better.
<li>Bug regarding displaying of informations about main and related html files fixed.
<li>Better protection against crashes in strange stituations.
<li>Better management of user (Ctrl-C) interruptions.
<li>Better support for using by a non-root user.
<li>Some small bugs or malfunction fixing.
<li>Source files now include short descriptions of what each function does.
</ul>
<h3>Changes of nolce version 1.6 over nolce 1.5</h3>
<ul>
<li>Bug regarding http links contained in ftp documents fixed.
<li>Now the program makes distinction between various command line errors.
<li>Options, unless n_hours, may now be given grouped (i.e. -pt or pt rather than -p -t)
<li>Fixing of a bug in the summary file occurring with names different from index.html.
<li>More precise management of different documents with the same url.
<li>If the program is launched without options, the user is asked if he really
wants to proceed.
<li>Url-type prefixes (http, ftp) are all converted to lower case.
<li>Changes to documentation.
</ul>
<h3>Changes of nolce version 1.5 over netcache 1.4</h3>
<ul>
<li>Changing of the name.
<li>Summary file format has changed drastically: now it's based on the division into
domains of retrieved pages.
<li>Now the program can process cache directories created by Netscape for
Windows present in non Linux-native partitions. Adding of the
<code>-p</code> option.
<li>Now <code>n_hours</code> condition is checked basing on
informations contained in <code>index.db</code>, not on files'
timestamps. This way is faster and better.
<li>Temporary HTML files are now created in <code>/tmp</code> directory.
<li>Adding of the <code>-W</code> option.
<li>Now the program supports the situation in which there are in the cache
multiple (different) files referring to the same url.
<li>Adding of the script <code>install.sh</code>, for installing precompiled version
(make install recompiles).
<li>Various minor bugs fixing.
<li>File organization in distribution changed.
<li>Various changes to sources to remove MAX_PATH limitation on command line
arguments.
</ul>
<h3>Changes of netcache version 1.4 over the 1.3</h3>
<ul>
<li>More complete management of problems connected to the use of the
<code>n_hours</code> parameter. Adding of the <code>-f</code> option.
<li>Adding of the option <code>-g</code>, which permits to process only HTML
files whose URL contains a specified string. Also adding <code>-G</code>, its opposite.
<li>Bug related to <code>-t</code> option fixed.
<li>Some links to local documents and images didn't function due to a bug
now fixed.
<li>The same problem as above was due to another bug also fixed.
<li>Now netcache works even if Netscape Navigator isn't installed, that is
even if the directory <code>$HOME/.netscape</code> doesn't exist.
<li>Progress meter when reading <code>index.db</code> file.
<li>Added support for disk-full situation.
<li>Added support for multiple versions of the same document in the cache.
<li>Heavy changes to documentation.
</ul>
<h3>Changes of netcache version 1.3 over the 1.1</h3>
<ul>
<li>Adding of the option <code>-t</code>, which permits to see downloading date of each
document in the <code>summary_file</code> .
<li>When interrupted by the user, the program now deletes current temporary
file in <code>$HOME/.netscape/cache</code> .
<li>Some changes to documentation.
</ul>
<h3>Changes of netcache version 1.1 over the 1.0</h3>
<ul>
<li>The behavior of the <code>-m</code> switch has changed. Now, by default, missing
images are totally deleted, and <code>-m</code> is used to not delete them.
<li>If no documents are retrieved, no summary file is written and the user is
informed.
<li>Messages during execution are cleaner and more compact. Informations now
include total number of files to be processed.
<li>Verifying of the <code>n_hours</code> condition is made before in the program, saving
execution time.
</ul>
</body>
</html>
nolce-1.7-2/docs/README.html 100644 0 0 643 6450264314 13441 0 ustar root root <html>
<head>
<title> README for nolce</title>
</head>
<frameset rows="*, 74" border=0 frameborder="no">
<frame src="frame_docs.html#start" scrolling=auto name="docs">
<frame src="frame_toc.html" scrolling=auto name="toc">
<noframes>
Sorry, this document needs a frame-capable browser.
<br>
If such a browser isn't available, read the <a href="frame_docs.html">frame_docs.html</a> file.
</noframes>
</frameset>
</html>
nolce-1.7-2/docs/frame_docs.html 100644 0 0 56113 6450264341 14651 0 ustar root root <html>
<head>
<title>README for nolce</title>
<base>
</head>
<body bgcolor="#ffffff">
<table width="100%" border=0 cellspacing=0 cellpadding=3>
<tr><td bgcolor="#e5fff5"><a
HREF="frame_docs.html#prologue"><center>What's new</center></a>
</td><td bgcolor="#e2f1fe">
<a HREF="frame_docs.html#usage"><center>Usage and what the program does</center></a>
</td><td bgcolor="#e5fff5">
<a HREF="frame_docs.html#install"><center>Installation</center></a>
</td></tr>
<tr><td bgcolor="#e2f1fe">
<a HREF="frame_docs.html#comp"><center>Compatibility</center></a>
</td><td bgcolor="#e5fff5"><table width="100%" height="100%" border=0 cellspacing=0 cellpadding=0 ><tr><td>
<a HREF="frame_docs.html#work"><center>How it works</center></a>
</td><td>
<a HREF="frame_docs.html#author"><center>Contacting the author</center></a>
</td></tr></table></td><td bgcolor="#e2f1fe">
<a HREF="LICENCE"><center>Licence</center></a>
</td></tr></table>
<hr>
<a name=start>
<h2>
<basefont size=3>
README for nolce
</h2>
(C) 1997 G. Trovato.
<p>
Nolce (<b>N</b>etscape's <b>O</b>ff <b>L</b>ine <b>C</b>ache <b>E</b>xplorer)
is a Linux program which allows an off-line navigation of Netscape Navigator cache files
adjusting their names and links.
<p>
<a name=prologue>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>Introduction</b>
</font>
</td></tr></table><p>
</a>
Everyone that uses <b>Netscape Navigator</b>, on every platform, probably knows
that it saves all files downloaded from the Internet in the local hard disk,
unless this option has been disabled by the user. Every html file, every image,
and every downloaded document is normally stored under the directory
<code>$HOME/.netscape/cache</code>.
<p>
One could like to view those downloaded documents also off-line, read them
with calm, and possibly save them with related images.
But this isn't possible, because stored files in the cache have their names
changed, i.e. an original main.html may become a
<code>cache33BAD64001B0829.html</code>.
Besides they are stored under the cache directory in subdirs like <code>00,
01, ...</code>
without any respect of the relative positions of files.
<br>So even if you could guess what cached file corresponds to your desired
document, you see it without any image and with all links not working.
<p>
Saving a document from Netscape after receiving it doesn't save the related
images and links, so you can see only the textual part of the document when
you are off-line. <br>One can think that in this situation Netscape could retrieve lacking images from the cache, but it
isn't so because before using a cached file, it tries to connect to the
original site to check if the remote file is more recent than the local. As if
you're off-line this check isn't possible, the local file isn't used.
<p>
Using the gold version of the Navigator, it's possible to save a document and the related images entering the editor and
saving from it. But this isn't a good solution, because this operation can
require too much time and work, as you must repeat it for every page you want
to save.
<p>
<a name=usage>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>Usage and what the program does</b>
</font>
</td></tr></table><p>
</a>
The file <code>index.db</code> under the Netscape cache directory contains the informations
necessary to associate cached files with their original names, sizes, creation
date, file type and so on. It is created by Netscape when first documents
are cached.
<br>
Nolce must not run when Netscape is in execution, because the file
<code>index.db</code>
may be damaged if two programs open it at the same time. To avoid problems,
nolce uses and recognizes the same lock file of Netscape, so when one of
the two programs runs, the other knows that it can't use the cache.
<br>Lock file is a symbolic link called <code>lock</code> created in the directory
<code>$HOME/.netscape</code> .
<br>
<br>With those informations nolce can copy those files in a new directory
structure under <code>dest_dir</code> (default is <code>$HOME/cached</code>) which
reflects the directory structure of the original site of the file,
restoring obviously their real names.
<p>
For example if <code>00/cache33BAD64001B0829.html</code> corresponds to an URL like
<code>http://www.rai.it/raiuno/aree.html</code>, the program creates the directory
<code>www.rai.it</code>, then under it the direcory <code>raiuno</code> and finally copies
<code>cache33BAD64001B0829.html</code> into <code>aree.html</code> under it.
<br>Nolce creates also symbolic links for images and other documents, for
example for an image like
<code>http://www.rai.it/raiuno/images/backgr.gif</code>, a link
called <code>backgr.gif</code> is created in the directory
<code>images</code> under <code>raiuno</code>.
<p>A <u>summary file</u> is created as an html file, so after that the program finishes
one can easily know what html documents it retrieved and can easily browse
them.
<br>
When viewing retrieved documents, <font color=#ff0000><i><u><b>links which are in italics are
links to other cached files</b></u></i></font>, so you can view them off-line too.
<br>
Note that some fixed fonts may render italics as bold.
<p>
Copied html files are slightly modified when necessary, but
we'll talk of this in the section <a HREF=#work>HOW IT WORKS</a>.
<br>However, it's important to underline that <b>nolce doesn't change in any way
the original Netscape cache</b>, which continues to work normally.
<p>
From version 1.5, nolce can also process caches generated by Netscape
for Windows, with the option <code>-p</code>.
<p>
Let's now talk about how using nolce.
First of all you can obtain a small help launching it with <code>--help</code>
and this is what you get:
<hr>
<xmp>
Usage: nolce [n_hours] [OPTIONS]...
Reads Netscape Navigator (ver. 2 and above) cache files created in
the last n_hours hours and copies them in a new directory adjusting
file names and links to permit an off-line navigation of them.
If n_hours isn't supplied, all cached files are processed.
-c cache_dir directory where cache is, default $HOME/.netscape/cache
-d dest_dir directory where files are copied, default $HOME/cached
-g sub_string process only URLs containing sub_string
-G sub_string process only files whose URL doesn't contain sub_string
-i summary_file file name of summary, it will be created in dest_dir,
default is index.html
-w show pages in another window
-W show pages in the list frame
-s execute silently
-m don't eliminate missing images
-t put downloading date of documents in summary file
-f don't process links not satisfying initial conditions
-p cache is generated by Netscape for Windows
--help shows this help
</xmp>
<hr>
<h4>Some considerations</h4>
<ol>
<li> Giving the <code>n_hours</code> parameter is very useful when you want to process only
the files downloaded during the last connection.
<li> <code>dest_dir</code> is the direcory under which will be created the direcory structures.
The program will distinguish between <code>http://</code> and
<code>ftp://</code> documents putting the
first ones under a subdir <code>http</code> of <code>dest_dir</code> and
the second ones under <code>ftp</code>.
<li><code>summary_file</code> will be always created in dest_dir, even if you supply an
absolute path. If summary file exist, it is not overwritten, but new entries
are appended to it.
<li><code>summary_file</code> contains an entry for every HTML file processed.
<br>To avoid confusion, if a page contains frames, single frames are not
reported in <code>summary_file</code>.
<li>By default, missing images are totally eliminated from the HTML file, so one
doesn't see the Netscape icon indicating them. With the <code>-m</code> option, missing
images are kept.
<li><code>sub_string</code> (options <code>-g</code> or <code>-G</code>) is case sensitive.
<li>Option <code>-p</code> must be used if the cache to be processed
is generated by Netscape for Windows. In this case the name of index
file is assumed to be <code>fat.db</code> and file names are all converted to
lower case, as are Dos files viewed from Linux.
<li>Starting from version 1.7, command line switches, unless n_hours, may be grouped, that is
<code>nolce -smc /cache</code> is the same of <code>nolce -s -m -c /cache</code>
or <code>nolce smc/cache</code>.
</ol>
<a name="important">
<h4>Important notes</h4>
<p>
<font color=#0000ff>i.</font>
<br>
Using previous versions of this program I've noticed that Netscape
doesn't save in the cache HTML files whose it couldn't know modification
time, even if related images are saved. Sometime the percentage of
such files is low, but sometimes it's about the 50% of total files, so this
may be a serious trouble, which, however, can be overridden with a small trick.
<br>In fact Netscape in a first moment saves those files and registers them
in the cache index, but when it exits, checks if there are HTML files
whose it doesn't know modification time and deletes them. So the one
way to maintain these files is to kill brutally Netscape when one
finishes navigation.
<br>When we close Netscape with Ctr-C from the shell, or, worse,
choosing `Exit' from its menu, the browser has all the time for doing
the cache's cleaning we want to avoid, but if we kill it with the SIGKILL
signal its execution ends immediately, because there is no way to
catch and to handle that signal.
<br>The command to give is:
<pre> kill -s 9 `pidof netscape`
</pre>
where <code>`pidof netscape`</code> is a manner to obtain process
identifier of Netscape (see also the command <code>ps</code>).
<br>If there is more than a copy of Netscape running, the above
command will close all of them, so it's better to use:
<pre> kill -s 9 PID
</pre>
where <code>PID</code> is the process ID of <u>your</u> Netscape.
<p>
Killing the browser with SIGKILL, it can't delete lock file, so it's necessary
doing a
<pre> rm $HOME/.netscape/lock
</pre>
A simple shell script can automate this procedure. For
example, for a single user environment, create, somewhere in your path, a
file called (for example) <code>nk</code> with this content:
<pre>
#!/bin/sh
kill -s 9 `pidof netscape`
rm $HOME/.netscape/lock
</pre>
then execute <code>chmod +x</code> on it and you're o.k.
<p>
Note that if you kill Netscape to retrieve at-risk documents, nolce
must to be launched before next Netscape's execution, at the end of which the
browser will do the cache's cleaning it couldn't do in the previous
execution.
<p>
<font color=#0000ff>ii.</font>
<br>
You may not find everything you expect in the cache. It may happen that documents and
images not completely downloaded aren't saved.
<br>In any case, it's better to press the STOP button before going away from
a page not completely loaded.
<br>Some images, typically counters provided at run-time by cgi-bin servers, aren't
even saved.
<h4>About parameters n_hours and -f</h4>
When giving this option, only HTML files which are downloaded after n_hours
ago are processed. Starting from version 1.5 time check is made using
informations of <code>index.db</code> rather than modification time of
the file. This way is faster and better, because if an already
existing cache document is re-visited, the new date is registered
in the index, while the file timestamp isn't changed.
<br>Time check is made only on HTML documents. Everything other, that is
images, zip files... are always valid.
<br>But what happens when a document that satisfies the n_hours condition has a
link to another which is in the cache, but was downloaded before of n_hours
ago?
<br>Starting from version 1.4, nolce processes (that is copies under <code>
dest_dir</code> and adjusts their links) these files also, even they won't
appear in the summary file. If one doesn't want this, the option
<code>-f</code> may be used.
This option is useful also in conjunction with
<code>-g</code> and <code>-G</code>.
<br>Regarding to messages shown during nolce execution, files in order with
the n_hours condition are called <i>main HTML files</i>, others <i>related
HTML files</i>.
<a name=instr>
<h4>About the summary file</h4>
Starting from version 1.5, the format of summary file changed. Now it's a
document divided into three areas (frames). The strip on the top is the
<u>status</u> frame, the area on the left is the <u>domains</u> frame, and the other is
the <u>list</u> frame.
<br>Domains windows contains all different domains encountered during retrieval
of pages. Clicking on a domain name, available documents, related to that
domain are displayed in the list frame.
<br>To view a retrieved document, click on its icon, while clicking on the URL
the page is downloaded from the Internet.
<p>If neither <code>-w</code> or <code>-W</code> option is given, pages will be displayed in the same
window of the summary, but taking the entire space, that is also that of
other two frames. With <code>-W</code> the document is viewed only in the list frame,
allowing an easy selections of other domains and other documents.
Finall, with <code>-w</code>, another browser window is created for viewing documents.
Normally the other window is created once, then, if the user doesn't close it,
it is used every time a document is selected.
<p>Selecting <u>Lists & domains</u> or <u>Simple List</u> from the status frame,
one can return immediately to the index of processed pages, but in the first
case the default layout (domains + list) is used, while in the second the list area
takes all the space below the status frame.
<h4>Disk Usage</h4>
Nolce requires a certain quantity of available space on disk. Every processed HTML file
needs to be copied under <code>dest_dir</code>, because it must be
modified to make its link to point to local files, and we want to leave
files in the cache untouched, in order to permit Netscape to continue using
them.
<br>Non HTML files aren't changed, so for them a symbolic link is created.
<br>Basing on my experience, nolce requires approximately 1/4 or 1/5 of
space occupied by Netscape cache if one wants to retrieve all documents in
the cache.
<h4>Cache generated by Netscape for Windows</h4>
In this case one must use the <code>-p</code> option.
<br>As said above, images under <code>dest_dir</code> are only symbolic links to
files in the cache, so to correctly view retrieved pages, the (dos) partition
containing the cache must be currently mounted under the same dir of when
nolce was executed.
<br>It's better to mount the dos partition with type <code>msdos</code>
rather than <code>vfat</code> because in the first case access is faster and
file names aren't case sensitive.
<p>
<a name=install>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>Installation</b>
</font>
</td></tr></table><p>
</a>
This software is available in a package containing both source and binary versions.
<br>It can be obtained at
<br><a href="ftp://sunsite.unc.edu/pub/Linux/apps/www/plugins">ftp://sunsite.unc.edu/pub/Linux/apps/www/plugins</a>, at
<br><A HREF="http://www.aspide.it/freeweb/giustrov/nolce.html">http://www.aspide.it/freeweb/giustrov/nolce.html</a>
and at
<br><a href="ftp://194.243.202.167/giustrov">ftp://194.243.202.167/giustrov</a>
<p>For using this program, you must have installed the DB library.
It's necessary to read records
stored in the <code>index.db</code> file.
<br>
In practice you need <code>libdb.so</code> to run the compiled version, and also db include
files to compile the program.
<br>
For Linux, with Slackware and Redhat distributions, the library should be
present by default.
<br>
For the include files, with Redhat you must install a package called
<code>db-devel</code> or similar. For Slackware, they are in
<code>libc.tgz</code>, so they aren't a problem.
<br>
<p>
For compiling, cd to <code>src</code> subdir and do <code>make</code>.
<br>
Do <code>make install</code> to compile and copy the executable in
<code>/usr/bin</code>, the man
page in <code>/usr/man/man1</code> and the documentation in
<code>/usr/doc/nolce</code>.
<p>
If you haven't the compiler installed, or if you want to use precompiled
version, launch the <code>install.sh</code> script, from top nolce dir.
<p>
If standard destinations don't fit your taste, modify them in the Makefile
or in install.sh .
<p>
<a name=comp>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>Compatibility</b>
</font>
</td></tr></table><p>
</a>
I have tested the program under Linux only, and with Netscape Navigator 3.01
and 4.0b5.
<br>
Probably it works with version 2.0 also, since the present format of the cache
was introduced with this release.
<br>
It should work also with other Unix, if their Netscape indexes its cache in the
same way of the linux version, that is with a DB hash file named
<code>index.db</code> under <code>$HOME/.netscape/cache</code>.
<br>
If the name is different, it's easy to
change the value of CACHE_FILE, in the defines section of the source file.
<br>
From the point of view of the language, I use code conforming to ANSI C or
POSIX standards only, so if your system supports them, there must be no
problems.
<p>
As I know, the following circumstances may cause problems or errors in
compiling <code>nolce</code>:
<ol>
<li> Makefile assumes that your <code>make</code> correctly defines the
variable <code>CC</code> as your
site compiler name (i.e cc or gcc), and the variable <code>LEX</code> as your lex program.
This must be ensured by every <code>make</code>, but if not, define them by hand.
<li> The behavior of the lex program may change. Apart from program options,
it often requires linking with some libraries. The Linux standard lex, that
is GNU flex, requires the <code>-lfl</code> library, and it's provided in the variable
<code>LDFLAGS</code> of the Makefile.
<br>If your site uses a different lex, read its documentation and change the
Makefile accordingly.<br>Possible options needed by the program may be given
in the <code>LFLAGS</code> variable.
<li> The program interfaces with the lexical analyzer through the usual
<code>yylex()</code>
function, called in the <code>process_html_file</code> of <code>main.c</code>.
Input and output files are supplied to yylex with the extern variables
<code>yyin</code> and <code>yyout</code>. Probably this is not conforming with original AT&T lex,
but, as I know, it conforms to POSIX specification for lex, and, above all,
it's almost the only way one can use with flex.
<li> Flex defines <code>yytext</code> as a char pointer, while other lex may define it as a
char array. If this is your case, you must compile <code>main.c</code> with
the <code>-Darray</code>
option, which can be done by setting the variable <code>DEFINES</code> of the Makefile.
</ol>
If problems persist, send me an e-mail, describing, besides the problem, what
system you're using, what lex and so on. But I haven't access to other systems
further my Linux machine, so, don't expect a certain solution.
<p>
Some persons have encountered problems with nolce which disappeared using the
precompiled version.
<p>
If you discovery a bug, i.e. an abnormal exit of the program with a Segmentation Fault error, please let me know. You should send me an e-mail with a brief
description of the circumstances under which the error happened, command line
options, and above all the core file generated by the program.
<br>Shells permit to decide if one wants to obtain a core dump after an abnormal
termination of a program. With <code>bash</code> see the command <code>ulimit
</code>.
<br>For being the core file useful to me, it must be generated by a program
compiled with debug info: add the option <code>-g3</code> to <code>CGLAGS</code> in the Makefile. If you have <code>libg</code> installed, add also <code>-lg</code> to <code>LDFLAGS</code>.
<p>
<a name=work>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>How it works</b>
</font>
</td></tr></table><p>
</a>
i. <font color=#0000ff>INDEX.HTML</font>
<p>
A lot of urls, i.e. <code>http://home.netscape.com</code>, don't contain an HTML file name.
<br>In this situation the server provides a default HTML file, usually
<code>index.html</code>,
and nolce appends the same name to these urls.
<br>It could happen that an HTML file contains a link to such an url with the file
name explicited. If this name is different from <code>index.html</code>, the link doesn't
work.
<p>
ii. <font color=#0000ff>LINKS</font>
<p>
The main work nolce does is changing links in HTML files to point to local
files.
<br>
There are various types of links (imagine you're browsing the document
<code>http://www.aaaa.com/bbb/index.html</code>):
<ul>
<li><font color=#058805>Relative</font> links, i.e. <code>HREF="ccc/image.gif"</code>. In this case the browser loads
the file <code>image.gif</code> from the directory <code>ccc</code> at the same
level of current document directory, that is <code>bbb</code>.
<li><font color=#058805>Absolute</font> links, i.e.
<code>HREF="http://www.aaaa.com/ccc/image.gif"</code>. In this
case Netscape will always try to obtain the document from the net, so
nolce transforms the link in something like <code>"../ccc/image.gif"</code>.
<li><font color=#058805>Base-related</font> links, i.e <code>HREF="/ccc/image.gif"</code>. These links must be
interpreted as <code>http://www.aaaa.com/ccc/image.gif</code>, not regarding of the
directory in which the HTML files is.
</ul>
<p><u>
If a link points to a document present in the cache, it is changed to a
relative link, otherwise it's turned in an absolute link.
</u><p>
iii. <font color=#0000ff>LEX</font>
<p>
If your lex program is GNU flex, the flag <code>-Cf</code> may be given to it (put in
the variable <code>LFLAGS</code> of the Makefile). This makes the program bigger, but
execution speeds up of 10-15%.
<p>
iv. <font color=#0000ff>MISCELLANEOUS</font>
<p>
<ul>
<li> In the file <code>nolce.h</code> there are some defines which can be customized.
<li> Links pointing to documents which are present in the cache are in italics.
Obviously the HTML document can contain links which are in italics of
origin, and in this case they may point to non-local files.
<br>Besides, if a link is presented as a formatted text, i.e <code><h3>Link</h3></code>, the italics isn't shown.
<li>If two or more versions of a document are present in the cache, the more
recent is taken.
<li>Netscape seems to have problems to follow links to local files which
contain characters like <code>`?'</code>. Mainly for this reason, when creating
directories, strange characters like <code>`?', `=', `('</code> and so on are substituted
with an underscore.
</ul>
<p>
<a name=author>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<h2>
<font color=#eeffff>
Contacting the author
</font>
</td></tr></table><p>
</>
For any question, bug report or comment, email to <a href="mailto:g.trovato@usa.net">g.trovato@usa.net</a>
<br>
My home page is<br>
<a href="http://members.tripod.com/~giustrov">http://members.tripod.com/~giustrov</a>
<p>
Nolce web page is:<br>
<a href="http://www.aspide.it/freeweb/giustrov/nolce.html">
http://www.aspide.it/freeweb/giustrov/nolce.html</a>
<a href="LICENCE">
<h2>
<font color=#ff0000>
LICENCE
<hr noshade size=2>
</h2>
</font>
</a>
</body>
</html>
nolce-1.7-2/docs/frame_toc.html 100644 0 0 2615 6426222742 14466 0 ustar root root
<html>
<head>
<title>Menu</title>
<base target="docs">
</head>
<body bgcolor=#b5b2b5>
<font size=3>
<form>
<table width="100%" border=0 cellspacing=0 cellpadding=0>
<tr align=center>
<td>
<b>
<input type="button" value="Introduction" onclick="parent.docs.location.href='frame_docs.html#start'">
</b>
</td><td>
<b>
<input type="button" value="Usage and what the program does" onclick="parent.docs.location.href='frame_docs.html#usage'">
</b>
</td><td>
<b>
<input type="button" value="Installation" onclick="parent.docs.location.href='frame_docs.html#install'">
</b>
</td>
<td>
<b><font color=#0000ff>
<input type="button" value="Changes"
onclick="parent.docs.location.href='CHANGES.html'">
</b></font>
</td>
</tr>
<tr align=center><td>
<b>
<input type="button" value="Compatibility" onclick="parent.docs.location.href='frame_docs.html#comp'">
</b>
</td><td>
<b>
<input type="button" value="How it works" onclick="parent.docs.location.href='frame_docs.html#work'">
<input type="button" value="Contacting the author" onclick="parent.docs.location.href='frame_docs.html#author'">
</b>
</td><td>
<b><font color=#0000ff>
<input type="button" value="Licence"
onclick="parent.docs.location.href='LICENCE'">
</b></font>
</td><td>
<b><font color=#ff0000>
<input type="button" value="Important!" onclick=" parent.docs.location.href='frame_docs.html#important';">
</b></font>
</td>
</tr>
</table>
</form>
</font>
</body>
</html>
nolce-1.7-2/docs/nolce.1 100644 0 0 6622 6450264357 13032 0 ustar root root .TH nolce 1 "9 November 1997" "Nolce version 1.7-2" \" -*- nroff -*-
.SH NAME
nolce - allow off-line Netscape Navigator cache browsing
.SH USAGE
.B nolce
[n_hours] [-c cache_dir] [-d dest_dir] [-g|-G sub_string] [-i summary_file] [-w|-W] [-s] [-m] [-t] [-f] [--help]
.SH DESCRIPTION
Nolce copies Netscape Navigator (ver. 2 and above) cache files
in a new directory, adjusting file names and links to permit an off-line
navigation of them.
.SH OPTIONS
.TP
.I "n_hours"
Process only files created, that is downloaded, in the last n_hours hours.
If this option isn't given, nolce processes all the files of the cache.
.TP
.I "-c cache_dir"
Process cache files under the cache_dir directory. Default is
.BR $HOME/.netscape/cache .
.TP
.I "-d dest_dir"
Store processed files under dest_dir. Default is
.BR $HOME/cached .
.TP
.I "-g sub_string"
Process only HTML files whose URL contains the specified sub_string
.TP
.I "-G sub_string"
Process only HTML files whose URL doesn't contain the specified sub_string
.TP
.I "-i summary_file"
Put summary about documents retrieved in the html file summary_file. It will
contain titles of documents, and links both to local copy and to
original site of them. Even if it's given an absolute path, the file will
be always created in dest_dir. Default is
.BR index.html .
If summary file exists, it is not overwritten, but new entries are appended to
it.
.TP
.I "-w, -W"
By default pages are displayed in the summary window. With
.BR -w
another browser window is created for pages, while with
.BR -W
they are displayed in the list frame.
.TP
.I "-s"
Programs executes silently.
.TP
.I "-m"
By default, missing images are totally eliminated from the HTML file, so one doesn't see
the Netscape icon indicating them. With this option, missing
images are kept.
.TP
.I "-t"
Put downloading date of each document in summary file.
.TP
.I "-f"
Don't process links not satisfying initial conditions.
When one of the options
.BR n_hours ,
.BR -g
or
.BR -G
is given, if a document has a link to another one which doesn't satisfy those
conditions, the link is processed also, unless the -f option is
given. However, even when processed, these links doesn't appear in the summary
file.
.TP
.I "--help"
Shows help message and exit.
.SH OVERVIEW
Netscape normally stores all types of downloaded documents (html files, images and so
on) under the directory $HOME/.netscape/cache, but it changes their names and
their relative position, so it isn't possible to browse them off-line.
.PP
Nolce basically does three things: copies cached files in a new directory
structure which reflects the original site structure, restores their original
file names, and adjusts links in html files to point to local files rather than
to Internet ones.
.I The original Netscape cache is left intact.
After running the program, user should open the summary
file in Netscape Navigator, and from it view desired documents.
Alternatively one can explore the directories and files created under
dest_dir.
.PP
When viewing processed documents, links in
.I italics
refer to local documents,
while normal links to Internet ones.
.PP
For further informations see the next section.
.SH FILES
This software comes with a LICENCE file and a README file in the HTML format.
Refer to them for further informations, especially to
the section `HOW IT WORKS' for knowing how the program acts and what it does
and doesn't.
.SH AUTHOR
Giuseppe Trovato (g.trovato@usa.net)
nolce-1.7-2/docs/LICENCE 100644 0 0 5776 6426222742 12641 0 ustar root root Nolce is Copyright (C) 1997 Giuseppe Trovato (g.trovato@usa.net)
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTY IS DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES.
Redistribution in source and binary forms, are permitted provided
that the following conditions are met:
1. Redistributions of this software must include all the files, left
intact, present in this package. You may omit only the sources files,
that is only main.c, utils.c, skeletons.c, scanner.lex, Makefile and
nolce.h, or, alternatively the binary executable nolce.
2. Without the written permission of the author, nobody can obtain money
for this software, or any software which include portions of it, with
the exception of that money necessary for the physical support and the
copying operation.
Vice versa, this software, or portions of it, even modified, can be
included in a free software, provided that it contains a copyright note
like this: "This software contains parts which are copyright (C) 1997
Giuseppe Trovato".
-*---**---*-
This software is cardware for non-commercial (i.e personal or educational)
use. That is you must send me a postal card of your town if using this program.
If this is a too big effort for you, I'll be satisfied with an e-mail
message :-)
Even if you do the right thing, sending a postal card, an e-mail message is
useful if you want to be notified of new versions, enhancements, bug fixes...
Commercial users must register themselves. Commercial use means using this
program in making your work or business.
Registration fee is 15$ (or Lit. 25000) for a single-user machine and 25$ (or
Lit. 42000) for a multi-user one.
A Licence is required for every machine on which the program is installed.
Payment must be made as check payable to me or, less recommended, as cash.
Registered users will receive a Licence document on paper and will be notified
via e-mail about new versions of this software.
So, a letter requesting registration must include postal address, e-mail,
specification of the type of Licence (single or multi user), and, obviously,
money :-)
Further the postal mail, send me an e-mail informing about your request.
In no event I will be responsible of problems concerning postal delivery of
the request of registration.
Obviously, registration or sending postal card is required once, not every time
you download a new version.
-*---**---*-
If nolce is distributed in a physical support, I would like to receive a
copy of it, while if you make available it in a new site, please let me know
about it.
My postal address is:
Giuseppe Trovato
Via F. Ferruccio 36
91011 Alcamo (TP)
ITALY
Email: g.trovato@usa.net
nolce-1.7-2/LICENCE 120777 0 0 0 6450267757 13542 2docs/LICENCE ustar root root nolce-1.7-2/README 100644 0 0 663 6426222742 11552 0 ustar root root To install precompiled version launch install.sh .
To rebuild, go to src subdir and do make, or make install.
For nolce documentation, refer to the file README.html in subdir docs.
It requires a frame and JavaScript capable browser (like Netscape 2.0+).
If such a browser isn't available, read frame_docs.html .
Look at the file LICENCE for terms about using this program.
Read also CHANGES.html for changes over previous versions.
nolce-1.7-2/src/ 40755 0 0 0 6450267223 11377 5 ustar root root nolce-1.7-2/src/main.c 100644 0 0 56543 6446031416 12617 0 ustar root root
/*----------------------------------------------------------------------------*
* main.c, utils.c, nolce.h : Copyright (C) 1997 Giuseppe Trovato *
* (g.trovato@usa.net) *
* *
* Read LICENCE file before using the program. *
*----------------------------------------------------------------------------*/
/**************************** INCLUDES *************************************/
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <signal.h>
#include <db.h>
#include <ctype.h>
#include "nolce.h"
/************************ EXTERN VARIABLES *********************************/
char *netscape_dir, *lock_file, *t_path, *curr_dir;
char g_or_G = '\0';
size_t len_cache_dir;
NODE *root = NULL;
long total = 0;
int status = 0;
bool yet_to_process = 0;
T_OPT opt = {0, 0, NULL, NULL, NULL, NULL};
enum ops
{ real_copy, symbolic_link };
enum vto
{ process_main, process_related, summarize };
enum pr
{ with_frames, is_image, normal };
/*************************** SIGNAL HANDLER **********************************/
void
sig_handler (int sig_number)
{
if (sig_number == SIGINT)
{
char c;
printf ("\n---- Do you really want to quit (y/n)? ");
c = getchar ();
while (getchar () != '\n');
if ((c != 'y') && (c != 'Y'))
{
signal (SIGINT, sig_handler);
if ((status == 1) && !(opt.var & SILENT))
printf ("Processing main HTML files: ");
if ((status == 2) && !(opt.var & SILENT))
printf ("Processing related HTML files: ");
fflush (stdout);
return;
}
}
if (sig_number == SIGSEGV)
printf ("\nI feel unwell... it's better to die!\n");
else
printf ("\nBye!\n");
unlink (lock_file);
unlink (t_path);
fflush (NULL);
signal (sig_number, SIG_DFL);
raise (sig_number);
}
/****************************** MAIN ***************************************/
int
main (int argc, char **argv)
{
int k, j;
char temp[64];
time_t n_hours = 0, t, cut_time;
char *home, *err, *ls;
int len_home;
size_t currdir_size = 0;
/* Obtains home dir and Netscape's dir. */
home = getenv ("HOME");
if (home)
len_home = strlen (home);
else
gt_exit ("\nnolce:\nCould not obtain your home dir from the enviroment, exiting...")
do
{
currdir_size += 30;
if (currdir_size > 10000)
gt_exit ("\nnolce:\nCould not read current dir, exiting...")
curr_dir = (char *) gt_realloc (curr_dir, currdir_size);
}
while (!getcwd (curr_dir, currdir_size));
netscape_dir = (char *) gt_malloc (len_home + 1 + 9 + 1);
sprintf (netscape_dir, "%s/.netscape", home);
lock_file = (char *) gt_malloc (len_home + 1 + 9 + 1 + 4 + 1);
sprintf (lock_file, "%s/lock", netscape_dir);
/* Command line scanning. */
if (argc == 1)
{
char ans;
printf ("Ok to process all files under %s/.netscape/cache ([y]es/[n]o/[h]elp) ? ", home);
ans = getchar ();
while (getchar () != '\n');
if (ans == 'n')
exit (EXIT_SUCCESS);
else if (ans == 'h')
help ();
}
for (k = 1; k < argc; k++)
{
if (!strncmp (argv[k], "--h", 3))
help ();
t = strtoul (argv[k], &err, 10);
if (err && *err) /* Arg isn't n_hours */
{
char sw;
char **ptr;
size_t len = strlen (argv[k]);
for (j = 0; j < len; j++)
{
sw = argv[k][j];
ptr = NULL;
switch (sw)
{
case '-': break;
case 'w': opt.view_window = 1; break;
case 'W': opt.view_window = -1; break;
case 'm': opt.var = opt.var | MISSING_IMAGES; break;
case 's': opt.var = opt.var | SILENT; break;
case 't': opt.var = opt.var | REPORT_TIME; break;
case 'f': opt.var = opt.var | NO_LINKS; break;
case 'p': opt.var = opt.var | WIN_CACHE; break;
case 'c': ptr = &opt.cache_dir; break;
case 'd': ptr = &opt.dest_dir; break;
case 'i': ptr = &opt.summary_file; break;
case 'g':
case 'G':
g_or_G = sw;
ptr = &opt.str_tbc;
break;
default:
print_error (sw, argv[0], illegal);
}
if (ptr)
{
bool oper = 1;
if (strchr ("igG", sw))
oper = 0;
if (*ptr)
print_error (sw, argv[0], already_supplied);
if (j == len - 1)
{
if (k < argc - 1)
process_arg (argv[++k], ptr, oper);
else
print_error (sw, argv[0], need_arg);
}
else
{
process_arg ((argv[k] + j + 1), ptr, oper);
break;
}
}
}
}
else
/* Arg is n_hours */
{
if (n_hours)
print_error ('\0', argv[0], already_supplied);
else
n_hours = t;
}
}
/* End of command line scanning. */
/* If some parameters aren't given, the defaults are used. */
if (!opt.cache_dir)
{
opt.cache_dir = (char *) gt_malloc (len_home + 1 + 9 + 1 + 5 + 1);
sprintf (opt.cache_dir, "%s/cache", netscape_dir);
}
len_cache_dir = strlen (opt.cache_dir);
if (!opt.dest_dir)
{
opt.dest_dir = (char *) gt_malloc (len_home + 1 + strlen (DEST_DIR) + 1);
sprintf (opt.dest_dir, "%s/%s", home, DEST_DIR);
}
if (!opt.summary_file)
{
opt.summary_file = (char *) gt_malloc (strlen (SUMMARY_FILE) + 1);
strcpy (opt.summary_file, SUMMARY_FILE);
}
else
/* Takes only the final file name of a */
/* a path which may contain also the direcory */
if ((ls = strrchr (opt.summary_file, '/')))
{
gt_strshift (opt.summary_file, 1 + ls - opt.summary_file);
printf ("summary file is: %s/%s\n", opt.dest_dir, opt.summary_file);
}
/* Turns signal handler on. */
signal (SIGINT, sig_handler);
signal (SIGQUIT, sig_handler);
signal (SIGTERM, sig_handler);
signal (SIGSEGV, sig_handler);
/* Creates lock file. */
if (!chdir (netscape_dir))
{
sprintf (temp, "1.0.0.127:%d", getpid ());
if (symlink (temp, lock_file))
{
fprintf (stderr, "\nnolce:\nThe program can't get exclusive access to the cache."
"\nIf Netscape or another copy of nolce is running, exit from it and"
"\nretry, otherwise delete the file $HOME/.netscape/lock\n");
exit (EXIT_FAILURE);
}
}
else
gt_exit ("\nnolce:\nCould not access $HOME/.netscape dir, exiting...")
/* n_hours stuff. */
if (n_hours != 0)
{
if (n_hours < 0)
n_hours = -n_hours;
if (n_hours > 175200)
n_hours = 175200; /* Max 20 years, for avoiding problems */
/* with data overflow. */
cut_time = time (NULL) - n_hours * 3600L;
}
else
cut_time = 0;
/* Creates directory /tmp if it doesn't exist, */
mkdir (P_tmpdir, S_IRWXU | S_IRWXG | S_IRWXO);
/* Reads the cache index. */
root = get_data (root, cut_time, opt.str_tbc);
if (root)
{
if (!(opt.var & SILENT))
printf (" (done)");
fflush (stdout);
/* Creates dest_dir. */
mkdir (opt.dest_dir, S_IRWXU | S_IRWXG | S_IRWXO);
chdir (opt.dest_dir);
mkdir ("nolce_files", S_IRWXU | S_IRWXG | S_IRWXO);
start_summary_files (CONST, NULL);
if (!(opt.var & SILENT))
printf ("\nProcessing main HTML files: ");
status = 1;
/* Reads the tree to process html files. */
visit_tree (root, process_main);
if (yet_to_process && !(opt.var & SILENT))
printf ("\nProcessing related HTML files: ");
status = 2;
do
{
yet_to_process = 0;
visit_tree (root, process_related);
}
while (yet_to_process);
if (!(opt.var & SILENT))
{
printf ("\nWriting summary file `%s/%s'", opt.dest_dir, opt.summary_file);
fflush (stdout);
}
status = 3;
/* Creates the summary file. */
visit_tree (root, summarize);
put_in_summary_files (NULL, 1);
if (!(opt.var & SILENT))
printf (" (all done)\n");
}
else if (total == 0)
printf ("\nNo documents were found matching requested conditions.\n");
/* Removes lock file. */
unlink (lock_file);
fflush (NULL);
return 0;
}
/********************** Function get_data *************************************
* Reads the cache index file (index.db) and creates the binary tree with all *
* the informations about cached files. The root of the tree is "start", *
* "cut_time" is the correspondent of n_hours, "sub_str" is the string *
* supplied with -g or -G. *
******************************************************************************/
NODE *
get_data (NODE * start, time_t cut_time, char *sub_str)
{ /* Creates b-tree. */
DB *index;
DBT key, data;
int type;
size_t len;
time_t d_time;
int fd;
bool tc;
struct stat attr;
char *url, *file, *c_url, *path, *cont;
if (!(opt.var & WIN_CACHE))
{
path = (char *) gt_malloc (len_cache_dir + 1 + strlen (CACHE_FILE) + 1);
sprintf (path, "%s/" CACHE_FILE, opt.cache_dir);
}
else
{
path = (char *) gt_malloc (len_cache_dir + 1 + strlen (WIN_CACHE_FILE) + 1);
sprintf (path, "%s/" WIN_CACHE_FILE, opt.cache_dir);
}
chdir (opt.cache_dir);
index = dbopen (path, O_RDONLY, 0, DB_HASH, NULL);
if (index == NULL)
{
fprintf (stderr, "The supplied cache dir (%s) isn't valid.\n", opt.cache_dir);
total = -1;
return NULL;
}
else if (!(opt.var & SILENT))
{
printf ("Processing cache information from `%s'", path);
fflush (stdout);
}
fd = index->fd (index);
while (!index->seq (index, &key, &data, R_NEXT))
{
if (key.size == *(int *) (key.data))
{
url = (char *) (key.data + 8);
file = (char *) (data.data + 33);
d_time = *((time_t *) (data.data + 12));
cont = (char *) (data.data + 71 + strlen (file));
if (strstr (cont, "html") || strstr (cont, "x-www"))
type = 1;
else
type = 0;
if (!url_is_valid (url))
continue;
if (!strncmp (url, "wysiwyg", 7))
continue;
if ((opt.var & WIN_CACHE))
{
int k, len;
len = strlen (file);
for (k = 0; k < len; k++)
file[k] = tolower (file[k]);
}
if (type)
{
if (stat (file, &attr))
continue;
len = strlen (url) + 1;
c_url = (char *) gt_malloc (len);
strcpy (c_url, url);
}
standardize (url);
if (type)
{
tc = d_time > cut_time;
if (tc && g_or_G)
{
if (g_or_G == 'g')
tc = (strstr (c_url, sub_str) != NULL);
else
tc = (strstr (c_url, sub_str) == NULL);
}
if (check_html_url (url, 1))
start = add_url (start, c_url, url, (tc) ? (IS_HTML | NEED_INDEX | PR_MAIN | SHOW) : IS_HTML | NEED_INDEX, file, d_time);
else
start = add_url (start, c_url, url, (tc) ? (IS_HTML | PR_MAIN | SHOW) : IS_HTML, file, d_time);
free (c_url);
}
else
start = add_url (start, "", url, 0, file, d_time);
}
}
close (fd);
free (path);
if (total > 0)
return start;
else
return NULL;
}
/********************** Function add_url **************************************
* Adds informations for a file to the binary tree. First searches recursively *
* starting from "node" the right place for the new node, then creates it. *
* Informations stored are: "or_url", the original url of the file; "url", the *
* url modified by standardize(); "type", a bitmapped variable indicating the *
* type (image, html...) of the file; "file_name", the name under the cache *
* dir; "mtime", the time when the file was created/modified. *
******************************************************************************/
NODE *
add_url (NODE * node, char *or_url, char *url, int type,
char *file_name, time_t mtime)
{
int cmp = 1;
if (node != NULL)
cmp = strcmp (url, node->url);
if (node == NULL)
{
size_t len;
node = (NODE *) gt_malloc (sizeof (NODE));
len = strlen (url);
node->url = (char *) gt_malloc (len + 1);
strcpy (node->url, url);
node->type = type;
node->mod_time = mtime;
node->dup = 1;
node->file_name = (char *) gt_malloc (strlen (file_name) + 1);
strcpy (node->file_name, file_name);
if (type & IS_HTML)
{
node->or_url = (char *) gt_malloc (strlen (or_url) + 1);
strcpy (node->or_url, or_url);
node->title = (char *) gt_malloc (9);
strcpy (node->title, "Untitled");
if (type & NEED_INDEX)
{
node->url_w_index = (char *) gt_malloc (len + 1 + strlen (DEFAULT_HTML) + 1);
strcpy (node->url_w_index, url);
check_html_url (node->url_w_index, 0);
node->r_url = node->url_w_index;
}
else
node->r_url = node->url;
}
node->sx = NULL;
node->dx = NULL;
if (type & PR_MAIN)
total++;
}
else if (cmp == 0)
{
if (strcmp (node->file_name, file_name))
{
if (type & IS_HTML)
{
char *temp;
temp = (char *) gt_malloc (strlen (url) + 8 + 1);
sprintf (temp, "%s-%u", url, ++(node->dup));
if (mtime > node->mod_time)
{
node = add_url (node, node->or_url, temp, node->type, node->file_name, node->mod_time);
if ((type & PR_MAIN) && !(node->type & PR_MAIN))
total++;
node->type = type;
node->mod_time = mtime;
node->file_name = (char *) gt_realloc (node->file_name, strlen (file_name) + 1);
strcpy (node->file_name, file_name);
}
else
node = add_url (node, or_url, temp, type, file_name, mtime);
free (temp);
}
else if (mtime > node->mod_time)
{
node->mod_time = mtime;
node->file_name = (char *) gt_realloc (node->file_name, strlen (file_name) + 1);
strcpy (node->file_name, file_name);
}
}
}
else if (cmp < 0)
node->sx = add_url (node->sx, or_url, url, type, file_name, mtime);
else if (cmp > 0)
node->dx = add_url (node->dx, or_url, url, type, file_name, mtime);
return node;
}
/********************** Function visit_tree ***********************************
* Visits (reads) the binary tree in pre-order, and, depending of "operation", *
* calls process_html_file() or put_in_summary_files(); *
******************************************************************************/
void
visit_tree (NODE * node, int operation)
{
static long c = 0;
if (node == NULL)
return;
else
{
visit_tree (node->sx, operation);
if ((operation == process_main) && (node->type & PR_MAIN))
{
if (!(opt.var & SILENT))
{
printf ("\r\t\t\t\t%ld/%ld", ++c, total);
fflush (stdout);
}
process_html_file (node);
}
if ((operation == process_related) && (node->type & PR_REL)
&& !(node->type & PROCESSED))
{
if (!(opt.var & SILENT))
{
printf ("\r\t\t\t\t%ld", ++c - total);
fflush (stdout);
}
node->type = node->type | PROCESSED;
process_html_file (node);
}
if ((operation == summarize) && (node->type & SHOW))
put_in_summary_files (node, 0);
visit_tree (node->dx, operation);
}
}
/********************** Function find_url *************************************
* Finds the node in the tree whose root is "node" containing informations *
* about the url "url". *
******************************************************************************/
NODE *
find_url (NODE * node, char *url)
{
int comp;
if (node != NULL)
{
comp = strcmp (url, node->url);
if (comp == 0)
return node;
else if (comp < 0)
return (find_url (node->sx, url));
else
return (find_url (node->dx, url));
}
else
return NULL;
}
/********************** Function process_html_file ****************************
* Scans, through yylex(), the html file referred by "node", writes it with *
* the adjustments provided by process_reference() to make links working to a *
* temporary file which is then copied, by copy_file(), in the appropriate *
* directory under dest_dir. *
******************************************************************************/
extern int yylex (void);
extern FILE *yyin, *yyout;
#ifdef array
extern char yytext[];
#else
extern char *yytext;
#endif
extern unsigned char i_a_tag, i_frame, found;
extern char title[], img_other[];
extern char *img_src;
void
process_html_file (NODE * node)
{
char *http_base, *mod_base, *fname, *dup_or_url, *path;
FILE *orig, *dest;
size_t len = strlen (node->or_url) + strlen (DEFAULT_HTML) + 2;
http_base = (char *) gt_malloc (len);
dup_or_url = (char *) gt_malloc (len);
mod_base = (char *) gt_malloc (strlen (node->r_url) + 1);
fname = cut_path (node->r_url, mod_base);
strcpy (dup_or_url, node->or_url);
check_html_url (dup_or_url, 0);
cut_path (dup_or_url, http_base);
path = (char *) gt_malloc (len_cache_dir + 1 + strlen (node->file_name) + 1);
sprintf (path, "%s/%s", opt.cache_dir, node->file_name);
orig = fopen (path, "r");
if (orig != NULL)
{
enum rets
{
TITLE = 1, BASE, REF, IMG
};
int what;
t_path = tmpnam (NULL);
if (t_path == NULL)
gt_exit ("\nnolce:\nProblems generating temporary files:"
" try to clean " P_tmpdir " directory, exiting...")
dest = fopen (t_path, "w");
if (dest == NULL)
gt_exit ("\nnolce:\nProblems opening temporary files:"
" disk is full or I haven't permission to write in " P_tmpdir ", exiting...")
yyin = orig;
yyout = dest;
i_a_tag = 0;
found = 0;
while ((what = yylex ()))
switch (what)
{
case BASE:
http_base = (char *) gt_realloc (http_base, strlen (yytext) + strlen (DEFAULT_HTML) + 2);
if (!url_is_valid (yytext))
break;
strcpy (http_base, yytext);
check_html_url (http_base, 0);
*(strrchr (http_base, '/') + 1) = '\0';
break;
case TITLE:
if (title[0] != '\0')
{
node->title = (char *) gt_realloc (node->title, strlen (title) + 1);
strcpy (node->title, title);
}
break;
case REF:
if (i_frame)
process_reference (yytext, http_base, mod_base, fname, yyout, with_frames);
else if (i_a_tag)
found = process_reference (yytext, http_base, mod_base, fname, yyout, normal);
else
process_reference (yytext, http_base, mod_base, fname, yyout, normal);
break;
case IMG:
process_reference (img_src, http_base, mod_base, fname, yyout, is_image);
break;
}
fclose (orig);
fclose (dest);
copy_file (node->r_url, t_path, real_copy);
unlink (t_path); /* Removes temporary file. */
free (http_base);
free (mod_base);
}
else
node->type = node->type & (~SHOW);
free (path);
}
/********************** Function process_reference ****************************
* Adjusts links. First transforms the link to an absolute url, with *
* check_path(), then checks if that url corresponds to a file in the cache, *
* with find_url(). If yes, the url is retransformed to a a relative path, *
* with relative_position, then, if it doesn't refer to an html document, *
* copy_file() is called to make the link under dest_dir. *
* "ref" is the link to process, "base" is the BASE url of the document to *
* whom the link belongs, "mod_base" is like the previous, but in the format *
* of standardize(), "file_name" is the filename of the document, "out" the *
* temporary file with the adjusted version of the document, "operation" is *
* used in case that link refers to an image. *
******************************************************************************/
bool
process_reference (char *ref, char *base, char *mod_base,
char *file_name, FILE *out, int operation)
{
NODE *point = NULL;
bool ret_value = 1;
char *cleaned_url, *abs_url, *a, *anchor = NULL;
char *topic, *eff;
size_t len = strlen (ref), lbase = strlen (base), ldh = 1 + strlen (DEFAULT_HTML),
lname = strlen (file_name);
if (ref && (eff = strtok (ref, "\"' ")))
{
topic = (char *) gt_malloc (lbase * 2 + len + ldh + lname + 1);
strcpy (topic, eff);
}
else
return 0;
if (!strstr (topic, "internal-gopher"))
{
cleaned_url = (char *) gt_malloc (lbase + len + ldh + lname + 1);
abs_url = (char *) gt_malloc (lbase + len + ldh + lname + 1);
if ((a = strrchr (topic, '#')))
{
anchor = (char *) gt_malloc (strlen (a) + 1);
strcpy (anchor, a); /* If we have an anchor, we must */
if (a == topic) /* eliminate it when checking if */
strcpy (topic, file_name); /* the file is in the cache. */
else
*a = '\0';
}
check_path (cleaned_url, base, topic);
if (!url_is_valid (cleaned_url))
return 0;
strcpy (abs_url, cleaned_url);
standardize (cleaned_url);
point = find_url (root, cleaned_url);
if (!point)
{
char *sl;
sl = strrchr (cleaned_url, '/');
if (!strcmp (sl + 1, DEFAULT_HTML))
{
*sl = '\0';
point = find_url (root, cleaned_url);
if (!point)
*sl = '/';
}
}
if (point)
{
if (point->type & NEED_INDEX)
strcpy (cleaned_url, point->url_w_index);
relative_position (cleaned_url, mod_base, topic);
if (!(point->type & IS_HTML))
ret_value = copy_file (point->url, point->file_name, symbolic_link);
else if (operation == with_frames)
point->type = point->type & (~SHOW);
/* When a link points to a document present in the cache, but not */
/* processed because it doesn't satisfy the n_hour condition, */
/* this action is taken. */
if ((point->type & IS_HTML) && !(point->type & PR_MAIN))
{
if (!(opt.var & NO_LINKS))
{
point->type = point->type | PR_REL;
yet_to_process = 1;
}
else
ret_value = 0;
}
/* Copy link only if it isn't an HTML file */
if (anchor)
{
strcat (topic, anchor);
free (anchor);
}
}
else
ret_value = 0;
if (!ret_value)
strcpy (topic, abs_url);
free (cleaned_url);
free (abs_url);
}
if (operation == is_image)
{
if ((opt.var & MISSING_IMAGES) || (!(opt.var & MISSING_IMAGES) && ret_value))
fprintf (out, "<IMG SRC=\"%s\" %s>", topic, img_other);
}
else
fprintf (out, "\"%s\"", topic);
free (topic);
return ret_value;
}
/********************** Function copy_file ************************************
* If it's called by process_html_file, copies a temporary file ("file") under *
* dest_dir, creating the appropriate directory structure, basing on "url". If *
* the caller is process_reference, makes only a symbolic link to the non-html *
* file in the cache. *
******************************************************************************/
bool
copy_file (char *url, char *file, int operation)
{
char *base;
char *dest_file, *dir;
char *orig_file;
int ret = 1;
struct stat attr;
base = (char *) gt_malloc (strlen (url) + 1);
dest_file = cut_path (url, base);
/* Creation of directory structure. */
chdir (opt.dest_dir);
dir = strtok (base, "/"); /* base contains at least a '/' */
do
{
mkdir (dir, S_IRWXU | S_IRWXG | S_IRWXO);
chdir (dir);
}
while ((dir = strtok (NULL, "/")) != NULL);
if (operation == symbolic_link)
{
orig_file = (char *) gt_malloc (len_cache_dir + 1 + strlen (file) + 1);
sprintf (orig_file, "%s/%s", opt.cache_dir, file);
if (!stat (orig_file, &attr))
{
unlink (dest_file);
ret = symlink (orig_file, dest_file);
}
else
ret = -1;
free (orig_file);
}
else
{
unlink (dest_file);
ret = link (file, dest_file);
}
if (ret == 1)
gt_exit ("\nnolce:\nProblems with symbolic links:"
" probably file system doesn't support them,\nexiting...");
chdir (curr_dir);
free (base);
if (ret == -1)
return 0;
else
return 1;
}
nolce-1.7-2/src/skeletons.c 100644 0 0 6331 6450264437 13656 0 ustar root root
/* SEE COPYRIGHT NOTE IN THE FILE main.c */
/* Skeletons for files related to summary. */
char *index[9] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<html>\n<head>\n<title> Nolce summary</title>\n</head>",
"\n<frameset rows=\"26, *\" border=0 frameborder=\"no\" bordercolor=#ffffff>",
"\n<frame scrolling=auto name=\"banner\" ",
">\n<frame scrolling=auto name=\"user\" ",
">\n<noframes>",
"\nIf a frame capable browser isn't avaible, use this ",
"\n<a href=\"nolce_files/full_index.html\">full index</a>.",
"\n</noframes>\n</frameset>\n</html>"
};
char *banner[23] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<html>\n<body bgcolor=#008877 link=#ffffff vlink=#ffffff alink=#bbbbbb>",
"\n<table width=\"100%\" border=0 cellpadding=0 cellspacing=0>",
"\n<tr><td align=center>",
"\n<font size=2><a target=\"user\" ",
" onMouseOver=\"window.status='Shows index and domains window.';return true\">",
"\nList & domains</a></td><td align=center>",
"\n<font size=2><a target=\"user\"",
" onMouseOver=\"window.status='Shows index without domains window.';return true\">",
"\nSimple list</a></td><td align=center>",
"\n<font size=2><a target =\"user\" href=\"/usr/doc/nolce-1.7-2/frame_docs.html#instr\"",
"onMouseOver=\"window.status='About using the summary.';return true\">",
"\nInstructions</a></td><td align=center>",
"\n<font size=2><a target=\"user\" href=\"/usr/doc/nolce-1.7-2/README.html\"",
"onMouseOver=\"window.status='Help on using nolce.';return true\">",
"\nHelp</a></td><td align=center>",
"\n<font size=2><a target=\"user\" href=\"http://www.aspide.it/freeweb/giustrov/nolce.html\" ",
"onMouseOver=\"window.status='Check for news regarding nolce!';return true\">",
"\nNolce web page</a></td><td align=center>",
"\n<font size=2><a href=\"mailto:g.trovato@usa.net\" onMouseOver=\"window.status=",
"'For questions, bug reports, suggestions...';return true\">",
"\nContact the author</a></td></tr></table>",
"\n</body>\n</html>"
};
char *wd_index[8] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<html>\n<frameset cols=\"180, *\" border=1 frameborder=\"yes\">",
"\n<frame scrolling=auto name=\"domains\" ",
">\n<frame scrolling=auto name=\"list\" ",
">\n<noframes>\nSorry, this document needs a frame-capable browser.\n<br>",
"\nIf such a browser isn't avaible, read the ",
"<a href=\"full_index.html\">full_index.html</a> file.",
"\n</noframes>\n</frameset>\n</html>"
};
char *dom_index[4] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<html>\n<body bgcolor=#ffffff>",
"\n<table bgcolor=#ff6060 width=\"100%\" cellpadding=3 cellspacing=0 border=0>",
"\n<tr><td>\n<font size=4 color=#ffeeee><b>Domains</b></font>\n</td></tr></table>\n<p>"
};
char *full_index[6] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<HTML>\n<HEAD>\n<TITLE>nolce summary</TITLE>\n</HEAD>\n<BODY BGCOLOR=#ffffff>",
"\n<table bgcolor=#1188ff width=\"100%\" cellpadding=3 cellspacing=0 border=0>",
"\n<tr><td>\n<font size=4 color=#eeffff><b>Pages</b></font>\n</td></tr></table>",
"\n<p>\n<font size=2>Pages can be viewed off-line, clicking on the icon. Clicking ",
"\non the url, the document will be downloaded from the Internet.\n</font>\n<p><a name=\"1\">"
};
nolce-1.7-2/src/utils.c 100644 0 0 37117 6450265106 13027 0 ustar root root
/* SEE COPYRIGHT NOTE IN THE FILE main.c */
/**************************** INCLUDES *************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>
#include "nolce.h"
/************************ EXTERN VARIABLES *********************************/
extern char *curr_dir, *lock_file;
extern T_OPT opt;
extern NODE *root;
/********************** Function gt_malloc ************************************
* Calls malloc, checks if we are out of memory, and returns a pointer to a *
* block of a certain size ("size") of allocated memory. *
******************************************************************************/
void *
gt_malloc (size_t size)
{
void *ptr;
ptr = malloc (size);
if (ptr == NULL)
gt_exit ("\nnolce:\nThe program can't get necessary memory, exiting...");
return ptr;
}
/********************** Function gt_realloc ***********************************
* Calls realloc, checks if we are out of memory, and returns a pointer to a *
* block of a certain size ("size") of allocated memory. *
******************************************************************************/
void *
gt_realloc (void *aap, size_t size)
{
void *ptr;
ptr = realloc (aap, size);
if ((ptr == NULL) && (size != 0))
gt_exit ("\nnolce:\nThe program can't get necessary memory, exiting...");
return ptr;
}
/********************** Function help *****************************************
* Prints the help message and exits successfully. *
******************************************************************************/
void
help ()
{
printf ("nolce " VERSION ", (C) 1997 Giuseppe Trovato. See LICENCE file for terms of use."
"\n\nUsage: nolce [n_hours] [OPTIONS]..."
"\nReads Netscape Navigator (ver. 2 and above) cache files created in"
"\nthe last n_hours hours and copies them in a new directory adjusting"
"\nfile names and links to permit an off-line navigation of them."
"\nIf n_hours isn't supplied, all cached files are processed."
"\nOptions:"
"\n\n -c cache_dir\t directory where cache is, default $HOME/.netscape/cache"
"\n -d dest_dir\t directory where files are copied, default $HOME/" DEST_DIR
"\n -g sub_string\t process only files whose URL contains sub_string"
"\n -G sub_string\t process only files whose URL doesn't contain sub_string"
"\n -i summary_file file name of summary, it will be created in dest_dir,"
"\n\t\t default is " SUMMARY_FILE
"\n -w\t\t show pages in another window"
"\n -W\t\t show pages in the list frame"
"\n -s\t\t execute silently"
"\n -m\t\t don't eliminate missing images"
"\n -t\t\t put downloading date of documents in summary file"
"\n -f\t\t don't process links not satisfying initial conditions"
"\n -p\t\t cache is generated by Netscape for Windows"
"\n --help\t shows this help\n");
exit (EXIT_SUCCESS);
}
/********************** Function print_error **********************************
* Prints various messages for various types of errors in command line. *
******************************************************************************/
void
print_error (char option, char *prg_name, int what_error)
{
if (what_error == illegal)
printf ("%s: illegal option -- %c", prg_name, option);
else if (what_error == need_arg)
printf ("%s: option requires an argument -- %c", prg_name, option);
else if (what_error == already_supplied)
{
if (!option)
printf ("%s: already supplied option -- n_hours", prg_name);
else
printf ("%s: already supplied option -- %c", prg_name, option);
}
printf ("\nTry `%s --help' for more information.\n", prg_name);
exit (EXIT_FAILURE);
}
/********************** Function cut_path *************************************
* From a complete path ("path"), that is directories + filename, returns *
* these two components separately (directories in "base" and the filename as *
* the returned value of the function). *
******************************************************************************/
char *
cut_path (char *path, char *base)
{
char *pos;
strcpy (base, path); /* base needs to be allocated from the */
pos = strrchr (base, '/'); /* caller, at least strlen(path) + 1. */
*(pos + 1) = '\0';
return 1 + strrchr (path, '/');
}
/********************** Function check_path ***********************************
* From the base url ("base") of a document and a link ("entry"), returns the *
* absolute url ("cleaned"). *
******************************************************************************/
char *
check_path (char *cleaned, char *base, char *entry)
{
char *l_entry; /* Local copy of the entry parameter. */
char *p_l_entry, *pasf;
if (strstr (entry, "://"))
return strcpy (cleaned, entry);
p_l_entry = l_entry = (char *) gt_malloc (strlen (entry) + 1);
strcpy (l_entry, entry);
strcpy (cleaned, base);
if (l_entry[0] == '/')
{
char *a, *b;
int k;
b = cleaned;
for (k = 1; *b != '\0' && k <= 3; k++)
{
a = strchr (b, '/');
if (k == 3)
if (a)
*a = '\0';
b = a + 1;
}
}
else
while (strstr (l_entry, "../"))
{ /* For every "../" in the ref., cuts a directory in the base path. */
pasf = strrchr (cleaned, '/');
if (pasf)
*pasf = '\0';
pasf = strrchr (cleaned, '/');
if (pasf)
*(pasf + 1) = '\0';
l_entry += 3;
}
strcat (cleaned, l_entry);
free (p_l_entry);
return cleaned;
}
/********************** Function check_html_url *******************************
* Checks if the url ("str") contains the specification of an html file. If *
* no, the DEFAULT_HTML suffix is added. *
******************************************************************************/
bool
check_html_url (char *str, bool only_test)
{
char *ext;
ext = strrchr (str, '.');
if (ext == NULL || (strcmp (ext, ".html") && strcmp (ext, ".htm") && strcmp (ext, ".HTML") && strcmp (ext, ".HTM")))
{
if (!only_test && (str[0] != '\0'))
{
size_t len;
len = strlen (str);
if (str[len - 1] != '/')
strcat (str, "/");
strcat (str, DEFAULT_HTML);
}
return 1;
}
return 0;
}
/********************** Function process_arg **********************************
* Stores a command line parameter ("str") in the appropriate variable *
* ("store") and, when necessary, turns a relative path into an absolute one. *
******************************************************************************/
void
process_arg (char *str, char **store, bool need_abs_path)
{
if (need_abs_path && (str[0] != '/'))
{
*store = (char *) gt_malloc (strlen (curr_dir) + 1 + strlen (str) + 1);
sprintf (*store, "%s/%s", curr_dir, str);
}
else
{
*store = (char *) gt_malloc (strlen (str) + 1);
strcpy (*store, str);
}
}
/********************** Function standardize **********************************
* Deletes possible final '/' or double '/' from an url ("str"). *
******************************************************************************/
char *
standardize (char *str)
{
char *snew, *c, *sp, *token;
snew = (char *) gt_malloc (strlen (str) + 1);
sp = strstr (str, "://");
strncpy (snew, str, sp - str);
snew[sp - str] = '\0';
c = snew - 1;
while (*++c != '\0')
*c = tolower (*c);
sp += 3;
token = strtok (sp, "/");
strcat (snew, "/");
strcat (snew, token);
while ((token = strtok (NULL, "/")) != NULL)
{
c = token - 1;
while (*++c != '\0')
{
if (!isalnum (*c))
if (!strchr ("._-+", *c))
*c = '_';
if (c - token > 128)
{
*c = '\0';
break;
}
}
if (strcmp (token, "."))
strcat (strcat (snew, "/"), token);
}
strcpy (str, snew);
free (snew);
return str;
}
/********************** Function gt_strshift **********************************
* Shifts to left the characters of a string ("str") of a given number ("pos") *
* of positions. *
******************************************************************************/
char *
gt_strshift (char *name, int pos)
{
register char *str = name - 1;
while ((*++str = *(str + pos)));
return name;
}
/********************** Function relative_position ****************************
* Transforms an absolute url ("url") to a relative one ("ref"), basing on a *
* given document's base ("base"). *
******************************************************************************/
char *
relative_position (char *url, char *base, char *ref)
{
char *b = base - 1;
ptrdiff_t off = url - base;
char *suff, *tok, *last_slash = base;
size_t dirs = 0;
char c;
int k;
do
{
c = *++b;
if (c == '/')
last_slash = b + 1;
if (c != b[off])
{
char *l_base;
suff = last_slash + off;
l_base = (char *) gt_malloc (strlen (last_slash) + 1);
strcpy (l_base, last_slash);
tok = strtok (l_base, "/");
while (tok)
{
dirs++;
tok = strtok (NULL, "/");
}
break;
}
}
while (c != '\0');
ref[0] = '\0';
for (k = 1; k <= dirs; k++)
strcat (ref, "../");
strcat (ref, suff);
return ref;
}
/********************** Function start_summary_files **************************
* Creates the various index-related files under nolce_files. *
******************************************************************************/
FILE *
start_summary_files (int what, long *nd)
{
#include "skeletons.c"
int k;
FILE *f;
char line[80];
char *temp;
if (what == CONST)
{
temp = (char *) gt_malloc (strlen (opt.dest_dir) + strlen (opt.summary_file) + 40);
sprintf (temp, "%s/%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "w");
check_file_stream (f, temp);
for (k = 0; k < 4; k++)
fputs (index[k], f);
fprintf (f, "src=\"nolce_files/banner_%s\"", opt.summary_file);
fputs (index[4], f);
fprintf (f, "src=\"nolce_files/wd_%s\"", opt.summary_file);
for (k = 5; k < 9; k++)
fputs (index[k], f);
fclose (f);
sprintf (temp, "%s/nolce_files/banner_%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "w");
check_file_stream (f, temp);
for (k = 0; k < 5; k++)
fputs (banner[k], f);
fprintf (f, "href=\"wd_%s\"", opt.summary_file);
for (k = 5; k < 8; k++)
fputs (banner[k], f);
fprintf (f, "href=\"full_%s\"", opt.summary_file);
for (k = 8; k < 23; k++)
fputs (banner[k], f);
fclose (f);
sprintf (temp, "%s/nolce_files/wd_%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "w");
check_file_stream (f, temp);
for (k = 0; k < 3; k++)
fputs (wd_index[k], f);
fprintf (f, "src=\"dom_%s\"", opt.summary_file);
fputs (wd_index[3], f);
fprintf (f, "src=\"full_%s\"", opt.summary_file);
for (k = 4; k < 8; k++)
fputs (wd_index[k], f);
fclose (f);
}
if (what == FULL)
{
temp = (char *) gt_malloc (strlen (opt.dest_dir) + strlen (opt.summary_file) + 40);
sprintf (temp, "%s/nolce_files/full_%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "r+");
if (f && !strcmp (fgets (line, 80, f), "<!-- File generated by nolce. Do not edit! -->\n"))
fseek (f, -15L, SEEK_END);
else
{
fclose (f);
f = fopen (temp, "w");
for (k = 0; k < 6; k++)
fputs (full_index[k], f);
}
}
if (what == DOM)
{
*nd = 0;
temp = (char *) gt_malloc (strlen (opt.dest_dir) + strlen (opt.summary_file) + 40);
sprintf (temp, "%s/nolce_files/dom_%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "r+");
if (f && !strcmp (fgets (line, 80, f), "<!-- File generated by nolce. Do not edit! -->\n"))
{
while (fgets (line, 80, f) != NULL)
if (!strncmp (line, "<a target=", 10))
(*nd)++;
fseek (f, -15L, SEEK_CUR);
}
else
{
fclose (f);
f = fopen (temp, "w");
for (k = 0; k < 4; k++)
fputs (dom_index[k], f);
}
}
free (temp);
if (what != CONST)
return f;
else
return NULL;
}
/********************** Function put_in_summary_files *************************
* Creates an index entry for an html document ("data"). If the document *
* is the last, closes all index files. *
******************************************************************************/
void
put_in_summary_files (NODE * data, bool close_all)
{
static char *old_domain;
static size_t times = 0;
static FILE *f_full, *f_dom;
static long label;
static char target[16];
char red_url[74];
int k, j;
if (close_all)
{
fputs ("\n</pre>\n<hr size=1 noshade>\n</body>\n</html>", f_dom);
fputs ("\n</body>\n</html>", f_full);
fclose (f_dom);
fclose (f_full);
return;
}
for (k = 0, j = 0; j < 3; k++)
if (data->or_url[k] == '/')
j++;
if (times++ == 0)
{
time_t curr_time;
old_domain = (char *) gt_malloc (1);
old_domain[0] = '\0';
f_dom = start_summary_files (DOM, &label);
f_full = start_summary_files (FULL, NULL);
curr_time = time (NULL);
fprintf (f_dom, "\n<font size=2>Retrieved on %s</font><pre>",
ctime (&curr_time));
if (opt.view_window == 0)
strcpy (target, "target=\"_top\"");
else if (opt.view_window == 1)
strcpy (target, "target=\"_view\"");
else if (opt.view_window == -1)
*target = '\0';
}
if ((strlen (old_domain) != (k - 1)) || strncmp (old_domain, data->or_url, k - 1))
{
old_domain = (char *) gt_realloc (old_domain, k);
strncpy (old_domain, data->or_url, k - 1);
old_domain[k - 1] = '\0';
fprintf (f_dom, "\n<a target=list HREF=\"full_%s#%ld\">"
"<img src=\"internal-gopher-menu\" hspace=2 "
"align=bottom border=0>%s</a>",
opt.summary_file, ++label, strchr (old_domain, '/') + 2);
if (label > 1)
fprintf (f_full, "<p>\n<font size=1>New domain</font>"
"\n<hr align=left width=\"150\" size=1 noshade>\n<a name=\"%ld\">", label);
}
strncpy (red_url, data->or_url, 70);
red_url[70] = red_url[71] = red_url[72] = '.';
red_url[73] = '\0';
fprintf (f_full, "<p><table border=0><tr><td align=left><a %s href=\"../%s\"><img src=\"internal-gopher-text \""
"align=left border=0></a></td><td nowrap><b><font color=#ff0000>Title: </font></b>"
"%s\n<br><b>Url: </b>\n<a target=\"_top\" href=\"%s\">%s</a></td></tr></table>",
target, data->r_url, data->title, data->or_url, red_url);
if ((opt.var & REPORT_TIME))
fprintf (f_full, "\n<font color=\"#000055\" size=\"-1\">Downloaded on %s</font>", ctime (&(data->mod_time)));
}
/********************** Function url_is_valid *********************************
* Checks if the url ("str") is valid. *
******************************************************************************/
bool
url_is_valid (char *str)
{
if (str)
{
char *w_prfx;
w_prfx = strstr (str, "://");
if (!w_prfx)
return 0;
w_prfx += 3;
if (!isalpha (*w_prfx) && !isdigit (*w_prfx))
return 0;
return 1;
}
else
return 0;
}
/********************** Function check_file_stream ****************************
* Checks if a file ("path") was opened successfully. *
******************************************************************************/
void
check_file_stream (FILE *f, char *path)
{
if (f == NULL)
{
if (errno == EACCES)
fprintf (stderr, "\n\nnolce:\nCould not create index file `%s', haven't permission.", path);
else
fprintf (stderr, "\n\nnolce:\nCould not create index file `%s'", path);
gt_exit ("Exiting...");
}
}
nolce-1.7-2/src/nolce.h 100644 0 0 5423 6450264412 12746 0 ustar root root
/* SEE COPYRIGHT NOTE IN THE FILE main.c */
/***************************** DEFINES *************************************/
#define CACHE_FILE "index.db"
#define WIN_CACHE_FILE "fat.db"
#define DEST_DIR "cached" /* Goes under $HOME */
#define SUMMARY_FILE "index.html" /* Under DEST_DIR */
#define DEFAULT_HTML "index.html" /* Under site's directory tree */
#define gt_exit(str) {printf ("\n" str "\n"); unlink (lock_file); exit (EXIT_FAILURE);}
#define VERSION "1.7-2"
/***************************** TYPEDEFS ************************************/
typedef unsigned char bool; /* Boolean True (1) or False (0) */
typedef struct node_of_btree
{
char *or_url;
char *url;
char *url_w_index;
char *r_url; /* Link to url or url_w_index */
int type;
char *file_name;
char *title;
size_t dup;
time_t mod_time;
struct node_of_btree *sx;
struct node_of_btree *dx;
}
NODE;
typedef struct cmd_options
{
unsigned int var; /* Bit-mapped */
int view_window;
char *cache_dir, *dest_dir, *summary_file, *str_tbc;
}
T_OPT;
/************************ FUNCTION DECLARATIONS ****************************/
void *gt_malloc (size_t size);
void *gt_realloc (void *aap, size_t size);
NODE *get_data (NODE * start, time_t cut_time, char *sub_str);
NODE *add_url (NODE * node, char *or_url, char *url,
int type, char *file_name, time_t mtime);
NODE *find_url (NODE * node, char *url);
bool check_html_url (char *str, bool only_test);
bool process_reference (char *ref, char *base, char *mod_base,
char *file_name, FILE * out, int operation);
bool copy_file (char *url, char *file, int operation);
bool url_is_valid (char *str);
char *cut_path (char *path, char *base);
char *check_path (char *cleaned, char *base, char *entry);
char *standardize (char *str);
char *relative_position (char *url, char *base, char *ref);
char *gt_strshift (char *name, int pos);
void process_html_file (NODE * node);
void process_arg (char *str, char **store, bool need_abs_path);
void sig_handler (int sig_number);
void help ();
void check_file_stream (FILE *f, char *path);
void print_error (char option, char *prg_name, int what_error);
void visit_tree (NODE * node, int operation);
void put_in_summary_files (NODE * data, bool close_all);
FILE *start_summary_files (int what, long *nd);
/***************************** OTHERS **************************************/
enum errors
{
illegal = 1, need_arg, already_supplied
};
enum opt_bits /* For T_OPT */
{
SILENT = 1, MISSING_IMAGES = 2, REPORT_TIME = 4,
NO_LINKS = 8, WIN_CACHE = 16
};
enum type_bits /* For NODE.type */
{
IS_HTML = 1, NEED_INDEX = 2, PR_MAIN = 4,
PR_REL = 8, SHOW = 16, PROCESSED = 32
};
enum what_index
{
DOM, FULL, CONST
};
nolce-1.7-2/src/scanner.lex 100644 0 0 6541 6445505401 13642 0 ustar root root %{
#include <stdio.h>
enum rets {TITLE=1, BASE, REF, IMG};
enum stats {STD = 0, I_BASE=1, N_I_R};
enum rs {generic, image};
char title[256], img_other[512];
char *img_src;
int i_k = 0, t_k = 0, ref;
unsigned char i_a_tag, found, i_frame;
%}
%s STD I_BASE I_XMP I_TAG N_I_R I_TITLE I_IMG
SP [ \t\r\n]
a [Aa]
b [Bb]
c [Cc]
d [Dd]
e [Ee]
f [Ff]
g [Gg]
h [Hh]
i [Ii]
j [Jj]
k [Kk]
l [Ll]
m [Mm]
n [Nn]
o [Oo]
p [Pp]
q [Qq]
r [Rr]
s [Ss]
t [Tt]
u [Uu]
v [Vv]
w [Ww]
x [Xx]
y [Yy]
z [Zz]
%%
<STD>"<"{b}{a}{s}{e}{SP}* BEGIN I_BASE;
<I_BASE>{t}{a}{r}{g}{e}{t}{SP}*={SP}*[^ >]+ {
fprintf (yyout, "<base %s>", yytext);
}
<I_BASE>[^<>= '\"\t\r\n]+ return BASE;
<I_BASE>">" BEGIN STD;
<I_BASE>[<>= '\"\t\r\n] |
<I_BASE>{h}{r}{e}{f}{SP}*={SP}* { /* Discard */ }
<STD>"<"{t}{i}{t}{l}{e}{SP}*">" { ECHO;
BEGIN I_TITLE;
}
<I_TITLE>.|\n { ECHO;
if (t_k < 255 )
title[t_k++] = *yytext;
}
<I_TITLE>"<""/"{b}{o}{d}{y}{SP}*">" {
ECHO;
title[0] = '\0';
t_k = 0;
BEGIN STD;
return TITLE;
}
<I_TITLE>"<""/"{t}{i}{t}{l}{e}{SP}*">" { ECHO;
title[t_k] = '\0';
t_k = 0;
BEGIN STD;
return TITLE;
}
<N_I_R>[^> \t\r\n]* { if (ref == generic)
{
BEGIN I_TAG;
return REF;
}
else
{
img_src = (char *) realloc (img_src, yyleng + 1);
strcpy (img_src, yytext);
BEGIN I_IMG;
}
}
<N_I_R>">" {
yyless (0);
BEGIN I_TAG;
}
<STD>"<""/"{a}{SP}*">" { if (found)
{ found = 0;
fputs ("</I>", yyout); }
ECHO;
}
<STD>"<"{f}{r}{a}{m}{e}{SP}+ { ECHO;
yyless(6);
i_frame = 1;
BEGIN I_TAG;
}
<STD>"<"{a}{SP}+ { ECHO;
yyless(2);
i_a_tag = 1;
BEGIN I_TAG;
}
<STD>"<"[a-zA-Z]+ { ECHO;
BEGIN I_TAG;
}
<STD>"<"{i}{m}{g}{SP}+ {
yyless (4);
BEGIN I_IMG;
}
<I_IMG>{SP}+{s}{r}{c}{SP}*= {
ref = image;
BEGIN N_I_R;
}
<I_IMG>{SP}+{l}{o}{w}{s}{r}{c}{SP}*={SP}*[^ \n\t\r>]+ {}
<I_IMG>[^>] if (i_k < 511) img_other[i_k++] = *yytext;
<I_IMG>">" {
img_other[i_k] = '\0';
i_k = 0;
BEGIN STD;
return IMG;
}
<I_TAG>{SP}+{h}{r}{e}{f}[{SP}="]+{m}{a}{i}{l}{t}{o}:[^>]*">" |
<I_TAG>{SP}+{h}{r}{e}{f}[{SP}="]+{n}{e}{w}{s}:[^>]*">" {
ECHO;
i_a_tag = 0;
BEGIN STD;
}
<I_TAG>{SP}+{h}{r}{e}{f}{SP}*={SP}* |
<I_TAG>{SP}+{s}{r}{c}{SP}*={SP}* |
<I_TAG>{SP}+{b}{a}{c}{k}{g}{r}{o}{u}{n}{d}{SP}*={SP}* {
ECHO;
ref = generic;
BEGIN N_I_R;
}
<I_TAG>\"([^"]|(\\\"))+\" ECHO;
<I_TAG>">" { ECHO;
if (i_frame)
i_frame = 0;
if (i_a_tag)
{ i_a_tag = 0;
if (found)
fputs ("<I>", yyout);}
BEGIN STD;
}
<STD>"<""/"[a-zA-Z]+{SP}*">" ECHO;
<STD>"<"{x}{m}{p}{SP}*">" { ECHO;
BEGIN I_XMP;
}
<I_XMP>"<""/"{x}{m}{p}{SP}*">" { ECHO;
BEGIN STD;
}
<STD,I_TAG,I_XMP>[^<>= \t\r\n]* ECHO;
<STD,N_I_R,I_TAG,I_XMP>.|\n ECHO;
.|\n { /* This is used to enter STD mode
every time yylex() is called. */
yyless (0);
BEGIN STD;
}
nolce-1.7-2/src/Makefile 100644 0 0 2406 6450263722 13136 0 ustar root root # Makefile for nolce 1.7-2
# (C) 1997 G. Trovato (g.trovato@usa.net)
########### Variables ############
DOCS_DIR=/usr/doc/nolce-1.7-2
BIN_DIR=/usr/bin
MAN_DIR=/usr/man/man1
LDFLAGS = -ldb -lfl # -lfl is needed from flex.
CFLAGS = -Wall # Compiler flags.
LFLAGS = # Possible lex command flags. If you're using
# flex, -Cf is strongly advised.
DEFINES= # Use -Darray if your lex defines yytext as a
# char array rather than char pointer.
########### End of user variables ############
OBJS=main.o utils.o lex.yy.o
SHELL = /bin/sh
all: nolce
nolce: $(OBJS)
$(CC) -o nolce $(OBJS) $(LDFLAGS)
main.o: main.c nolce.h
$(CC) -o main.o -c main.c $(CFLAGS) $(DEFINES)
utils.o: utils.c nolce.h skeletons.c
$(CC) -o utils.o -c utils.c $(CFLAGS)
lex.yy.o: lex.yy.c
$(CC) -o lex.yy.o -c lex.yy.c
lex.yy.c: scanner.lex
$(LEX) $(LFLAGS) scanner.lex
clean:
-rm -f main.o
-rm -f utils.o
-rm -f lex.yy.c
-rm -f lex.yy.o
-rm -f nolce
DOCS = ../docs/LICENCE ../docs/README.html ../docs/frame_docs.html ../docs/frame_toc.html ../docs/CHANGES.html
install: nolce
cp nolce $(BIN_DIR)
-mkdir -p $(DOCS_DIR)
cp ../docs/nolce.1 $(MAN_DIR)
cp $(DOCS) $(DOCS_DIR)
uninstall:
-rm -f $(BIN_DIR)/nolce
-rm -f $(MAN_DIR)/nolce.1
-rm -rf $(DOCS_DIR)
nolce-1.7-2/IMPORTANT_NOTE 100644 0 0 302 6431320063 12774 0 ustar root root Read in the documentation the paragraph "Important notes" under the section
"Usage and what the program does" for info regarding how to avoid that some
visited pages aren't stored in the cache.
nolce-1.7-2/.Nolce_was_known_as_netcache 100755 0 0 442 6431364426 16336 0 ustar root root #!/bin/sh
more +6 .Nolce_was_known_as_netcache
exit
##### End of commands.
##### Message:
Yes, netcache-1.4 of ftp://sunsite.unc.edu/pub/Linux/www/plugins was one of the previous versions of this program.
The name was changed because I discovered that netcache is trade mark.
G. Trovato