pkg://nolce-1.9-2.src.rpm:30912/nolce-1.9.2.tar.gz
info downloads
nolce-1.9.2/ 40755 0 0 0 6506751176 10622 5 ustar root root nolce-1.9.2/docs/ 40755 0 0 0 6506755102 11543 5 ustar root root nolce-1.9.2/docs/CHANGES.html 100644 0 0 13424 6506755111 13622 0 ustar root root <html>
<head>
<title>CHANGES</title>
</head>
<body bgcolor=#ffffff>
<h3>Changes of nolce version 1.9.2 over nolce 1.9-1</h3>
<ul>
<li>Changes to documentation's directory.
<li>Changes to package's structure.
</ul>
<h3>Changes of nolce version 1.9-1 over nolce 1.8-1</h3>
<ul>
<li>Now images and other non html documents are copied under <code>dest_dir</code>.
If one still wants to have symbolic links for those files, the new switch <code>-k</code> can
be used.
<li>Small changes to source to compile and work on Redhat 5.0
<li>Fixing of a bad bug in <code>scanner.lex</code>
<li>Changes to compile properly on Irix 5.3 (thanks to Andrei Mircea).
<li>Some other small changes and enhancements.
</ul>
<h3>Changes of nolce version 1.8-1 over nolce 1.7-2</h3>
<ul>
<li>The behavior of the <code>-w</code> switch has changed. Now, by default,
pages are showed in another window, while with <code>-w</code> the same index
window is used. <code>-W</code> is unchanged.
<li>Html files are now created directly to the proper place under <code>dest_dir</code>.
There is no longer need for temporary files and for write permission on <code>/tmp</code>.
And there is no longer the problem of lacking html files when <code>dest_dir</code> and <code>/tmp</code> were on different partitions.
<li>New option <code>-l</code> for ignoring lock file in <code>$HOME/.netscape</code>
when one is sure that the cache isn't used by <b>that</b> Netscape.
<li>Now, the Makefile, by default uses flex. This is to avoid problems on
some Slackware systems where <code>lex</code> is a call to <code>flex -l</code>.
<li>Better diagnostics.
<li>Various small corrections to documentation.
<li>Some changes to sources.
</ul>
<h3>Changes of nolce version 1.7-2 over nolce 1.7</h3>
<ul>
<li>Bug of url_is_valid() fixed (thanks to Patrick Asty).
</ul>
<h3>Changes of nolce version 1.7 over nolce 1.6</h3>
<ul>
<li>Default destination dir is now <code>$HOME/cached</code>.
<li>Incorrect Html is supported better.
<li>Bug regarding displaying of informations about main and related html files fixed.
<li>Better protection against crashes in strange stituations.
<li>Better management of user (Ctrl-C) interruptions.
<li>Better support for using by a non-root user.
<li>Some small bugs or malfunction fixing.
<li>Source files now include short descriptions of what each function does.
</ul>
<h3>Changes of nolce version 1.6 over nolce 1.5</h3>
<ul>
<li>Bug regarding http links contained in ftp documents fixed.
<li>Now the program makes distinction between various command line errors.
<li>Options, unless n_hours, may now be given grouped (i.e. -pt or pt rather than -p -t)
<li>Fixing of a bug in the summary file occurring with names different from index.html.
<li>More precise management of different documents with the same url.
<li>If the program is launched without options, the user is asked if he really
wants to proceed.
<li>Url-type prefixes (http, ftp) are all converted to lower case.
<li>Changes to documentation.
</ul>
<h3>Changes of nolce version 1.5 over netcache 1.4</h3>
<ul>
<li>Changing of the name.
<li>Summary file format has changed drastically: now it's based on the division into
domains of retrieved pages.
<li>Now the program can process cache directories created by Netscape for
Windows present in non Linux-native partitions. Adding of the
<code>-p</code> option.
<li>Now <code>n_hours</code> condition is checked basing on
informations contained in <code>index.db</code>, not on files'
timestamps. This way is faster and better.
<li>Temporary HTML files are now created in <code>/tmp</code> directory.
<li>Adding of the <code>-W</code> option.
<li>Now the program supports the situation in which there are in the cache
multiple (different) files referring to the same url.
<li>Adding of the script <code>install.sh</code>, for installing precompiled version
(make install recompiles).
<li>Various minor bugs fixing.
<li>File organization in distribution changed.
<li>Various changes to sources to remove MAX_PATH limitation on command line
arguments.
</ul>
<h3>Changes of netcache version 1.4 over the 1.3</h3>
<ul>
<li>More complete management of problems connected to the use of the
<code>n_hours</code> parameter. Adding of the <code>-f</code> option.
<li>Adding of the option <code>-g</code>, which permits to process only HTML
files whose URL contains a specified string. Also adding <code>-G</code>, its opposite.
<li>Bug related to <code>-t</code> option fixed.
<li>Some links to local documents and images didn't function due to a bug
now fixed.
<li>The same problem as above was due to another bug also fixed.
<li>Now netcache works even if Netscape Navigator isn't installed, that is
even if the directory <code>$HOME/.netscape</code> doesn't exist.
<li>Progress meter when reading <code>index.db</code> file.
<li>Added support for disk-full situation.
<li>Added support for multiple versions of the same document in the cache.
<li>Heavy changes to documentation.
</ul>
<h3>Changes of netcache version 1.3 over the 1.1</h3>
<ul>
<li>Adding of the option <code>-t</code>, which permits to see downloading date of each
document in the <code>summary_file</code> .
<li>When interrupted by the user, the program now deletes current temporary
file in <code>$HOME/.netscape/cache</code> .
<li>Some changes to documentation.
</ul>
<h3>Changes of netcache version 1.1 over the 1.0</h3>
<ul>
<li>The behavior of the <code>-m</code> switch has changed. Now, by default, missing
images are totally deleted, and <code>-m</code> is used to not delete them.
<li>If no documents are retrieved, no summary file is written and the user is
informed.
<li>Messages during execution are cleaner and more compact. Informations now
include total number of files to be processed.
<li>Verifying of the <code>n_hours</code> condition is made before in the program, saving
execution time.
</ul>
</body>
</html>
nolce-1.9.2/docs/README.html 100644 0 0 643 6506751175 13454 0 ustar root root <html>
<head>
<title> README for nolce</title>
</head>
<frameset rows="*, 74" border=0 frameborder="no">
<frame src="frame_docs.html#start" scrolling=auto name="docs">
<frame src="frame_toc.html" scrolling=auto name="toc">
<noframes>
Sorry, this document needs a frame-capable browser.
<br>
If such a browser isn't available, read the <a href="frame_docs.html">frame_docs.html</a> file.
</noframes>
</frameset>
</html>
nolce-1.9.2/docs/frame_docs.html 100644 0 0 55251 6506751504 14662 0 ustar root root <html>
<head>
<title>README for nolce</title>
<base>
</head>
<body bgcolor="#ffffff">
<table width="100%" border=0 cellspacing=0 cellpadding=3>
<tr><td bgcolor="#e5fff5"><a
HREF="frame_docs.html#prologue"><center>What's new</center></a>
</td><td bgcolor="#e2f1fe">
<a HREF="frame_docs.html#usage"><center>Usage and what the program does</center></a>
</td><td bgcolor="#e5fff5">
<a HREF="frame_docs.html#install"><center>Installation</center></a>
</td></tr>
<tr><td bgcolor="#e2f1fe">
<a HREF="frame_docs.html#comp"><center>Compatibility</center></a>
</td><td bgcolor="#e5fff5"><table width="100%" height="100%" border=0 cellspacing=0 cellpadding=0 ><tr><td>
<a HREF="frame_docs.html#work"><center>How it works</center></a>
</td><td>
<a HREF="frame_docs.html#author"><center>Contacting the author</center></a>
</td></tr></table></td><td bgcolor="#e2f1fe">
<a HREF="LICENCE"><center>Licence</center></a>
</td></tr></table>
<hr>
<a name=start>
<h2>
<basefont size=3>
README for nolce
</h2>
(C) 1997-98 G. Trovato.
<p>
Nolce (<b>N</b>etscape's <b>O</b>ff <b>L</b>ine <b>C</b>ache <b>E</b>xplorer)
is a Linux program which allows an off-line navigation of Netscape Navigator cache files
adjusting their names and links.
<p>
<a name=prologue>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>Introduction</b>
</font>
</td></tr></table><p>
</a>
Every <b>Netscape Navigator</b> user probably knows
that it saves almost all files downloaded from the Internet in the local hard disk,
unless this option has been disabled by the user. Html files, images,
and downloaded documents are normally stored under the directory
<code>$HOME/.netscape/cache</code>.
<p>
One could like to view those downloaded documents also off-line, read them
with calm, and possibly save them with related images.
But this isn't immediately possible, because stored files in the cache have their names
changed, i.e. an original main.html may become a
<code>cache33BAD64001B0829.html</code>.
Besides they are stored under the cache directory in subdirs like <code>00,
01, ...</code>
without any respect of the relative positions of files.
<br>So even if you could guess what cached file corresponds to your desired
document, you see it without any image and with all links not working.
<p>
Saving a document from Netscape after receiving it doesn't save the related
images and links, so you can see only the textual part of the document when
you are off-line. <br>One can think that in this situation Netscape could retrieve lacking images from the cache, but it
isn't so because before using a cached file, it tries to connect to the
original site to check if the remote file is more recent than the local. As if
you're off-line this check isn't possible, the local file isn't used.
<p>
<a name=usage>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>Usage and what the program does</b>
</font>
</td></tr></table><p>
</a>
The file <code>index.db</code> under the Netscape cache directory contains the informations
necessary to associate cached files with their original names, sizes, creation
date, file type and so on. It is created by Netscape when first documents
are cached.
<br>
Nolce must not run when Netscape is in execution, because the file
<code>index.db</code>
may be damaged if two programs open it at the same time. To avoid problems,
nolce uses and recognizes the same lock file of Netscape, so when one of
the two programs runs, the other knows that it can't use the cache.
<br>Lock file is a symbolic link called <code>lock</code> created in the directory
<code>$HOME/.netscape</code> .
<br>
(However, from version 1.8-1 the option <code>-l</code> may be used to
ignore lock file, see below)
<p>With those informations nolce can copy those files in a new directory
structure under <code>dest_dir</code> (default is <code>$HOME/cached</code>) which
reflects the directory structure of the original site of the file,
restoring obviously their real names.
<p>
For example if <code>00/cache33BAD64001B0829.html</code> corresponds to an URL like
<code>http://www.rai.it/raiuno/aree.html</code>, the program creates the directory
<code>www.rai.it</code>, then under it the direcory <code>raiuno</code> and finally copies
<code>cache33BAD64001B0829.html</code> into <code>aree.html</code> under it.
<p>A <u>summary file</u> is created as an html file, so after that the program finishes
one can easily know what html documents it retrieved and can easily browse
them.
<br>
When viewing retrieved documents, <font color=#ff0000><i><u><b>links which are in italics are
links to other cached files</b></u></i></font>, so you can view them off-line too.
<br>
Note that some fixed fonts may render italics as bold.
<p>
Copied html files are slightly modified when necessary, but
we'll talk of this in the section <a HREF=#work>HOW IT WORKS</a>.
<p>Nolce <b>doesn't change in any way
the original Netscape cache</b>, which continues to work normally.
<p>
From version 1.5, nolce can also process caches generated by Netscape
for Windows, with the option <code>-p</code>.
<p>
Let's now talk about how using nolce.
First of all you can obtain a small help launching it with <code>--help</code>
and this is what you get:
<hr>
<xmp>
Usage: nolce [n_hours] [OPTIONS]...
Reads Netscape Navigator (ver. 2 and above) cache files created in
the last n_hours hours and copies them in a new directory adjusting
file names and links to permit an off-line navigation of them.
If n_hours isn't supplied, all cached files are processed.
Options:
-c cache_dir directory where cache is, default $HOME/.netscape/cache
-d dest_dir directory where files are copied, default $HOME/cached
-g sub_string process only files whose URL contains sub_string
-G sub_string process only files whose URL doesn't contain sub_string
-i summary_file file name of summary, it will be created in dest_dir,
default is index.html
-w show pages in the same index window
-W show pages in the list frame
-s execute silently
-m don't eliminate missing images
-t put downloading date of documents in summary file
-f don't process links not satisfying initial conditions
-p cache is generated by Netscape for Windows
-l ignore lock files (use with attention: see docs)
-k make symbolic links for non html files
--help shows this help
</xmp>
<hr>
<h4>Some considerations</h4>
<ol>
<li> Giving the <code>n_hours</code> parameter is very useful when you want to process only
the files downloaded during the last connection.
<li> <code>dest_dir</code> is the direcory under which will be created the direcory structures.
The program will distinguish between <code>http://</code> and
<code>ftp://</code> documents putting the
first ones under a subdir <code>http</code> of <code>dest_dir</code> and
the second ones under <code>ftp</code>.
<li><code>summary_file</code> will be always created in dest_dir, even if you supply an
absolute path. If summary file exist, it is not overwritten, but new entries
are appended to it.
<li><code>summary_file</code> contains an entry for every HTML file processed.
<br>To avoid confusion, if a page contains frames, single frames are not
reported in <code>summary_file</code>.
<li>By default, missing images are totally eliminated from the HTML file, so one
doesn't see the Netscape icon indicating them. With the <code>-m</code> option, missing
images are kept.
<li><code>sub_string</code> (options <code>-g</code> or <code>-G</code>) is case sensitive.
<li>Option <code>-p</code> must be used if the cache to be processed
is generated by Netscape for Windows. In this case the name of index
file is assumed to be <code>fat.db</code> and file names are all converted to
lower case, as are Dos files viewed from Linux.
<li>With <code>-l</code>, the cache is processed even if there is a lock
file in <code>$HOME/.netscape</code> . It's useful when the cache specified
with <code>-c</code> isn't the one the Netscape in execution uses, or when
Netscape isn't installed. Use with care, and don't launch more copies of nolce on
the same directory.
<li>Starting from version 1.7, command line switches, unless n_hours, may be grouped, that is
<code>nolce -smc /cache</code> is the same of <code>nolce -s -m -c /cache</code>
or <code>nolce smc/cache</code>.
</ol>
<a name="important">
<table><h4>Important notes</h4>
<p>
<font color=#0000ff>i.</font>
<br>
Seems that Netscape
doesn't save in the cache HTML files whose it couldn't know modification
time, even if related images are saved. Sometimes the percentage of
such files is low, but sometimes it's about the 50% of total files, so this
may be a serious trouble, which, however, can be overridden with a small trick.
<br>In fact Netscape in a first moment saves those files and registers them
in the cache index, but when it exits, checks if there are HTML files
whose it doesn't know modification time and deletes them. So the one
way to maintain these files is to kill brutally Netscape when one
finishes navigation.
<br>When we close Netscape with Ctr-C from the shell, or, worse,
choosing `Exit' from its menu, the browser has all the time for doing
the cache's cleaning we want to avoid, but if we kill it with the SIGKILL
signal its execution ends immediately, because there is no way to
catch and to handle that signal.
<br>The command to give is:
<pre> kill -s 9 `pidof netscape`
</pre>
where <code>`pidof netscape`</code> is a manner to obtain process
identifier of Netscape (see also the command <code>ps</code>).
<br>If there is more than a copy of Netscape running, the above
command will close all of them, so it's better to use:
<pre> kill -s 9 PID
</pre>
where <code>PID</code> is the process ID of <u>your</u> Netscape.
<p>
Killing the browser with SIGKILL, it can't delete lock file, so it's necessary
doing a
<pre> rm $HOME/.netscape/lock
</pre>
A simple shell script can automate this procedure. For
example, for a single user environment, create, somewhere in your path, a
file called (for example) <code>nk</code> with this content:
<pre>
#!/bin/sh
kill -s 9 `pidof netscape`
rm $HOME/.netscape/lock
</pre>
then execute <code>chmod +x</code> on it and you're o.k.
<p>
Note that if you kill Netscape to retrieve at-risk documents, nolce
must to be launched before next Netscape's execution, at the end of which the
browser will do the cache's cleaning it couldn't do in the previous
execution.
<br><u>For this reason, it's not advisable to use the <code>-k</code> switch, that is using
symbolic links for non html documents</u>.
<p>
<font color=#0000ff>ii.</font>
<br>
You may not find everything you expect in the cache, even using the previous tip. It happens that certain documents or
images aren't saved, without any apparent reason.
<br>In any case, it's better to press the STOP button before going away from
a page not completely loaded.
<br>Some images, typically counters provided at run-time by cgi-bin servers, aren't
even saved.
<h4>About parameters n_hours, -g and -f</h4>
When giving the n_hours option, only HTML files which are downloaded after n_hours
ago are processed. Starting from version 1.5 time check is made using
informations of <code>index.db</code> rather than modification time of
the file. This way is faster and better.
<br>Time check is made only on HTML documents. Everything other, that is
images, zip files... are always valid.
<br>If a document that satisfies the n_hours condition has a
link to another which is in the cache, but was downloaded before of n_hours,
nolce processes (that is copies under <code>
dest_dir</code> and adjusts their links) this file also, even if won't
appear in the summary file. If one doesn't want this, the option
<code>-f</code> may be used.
This option has the same meaning also in conjunction with
<code>-g</code> and <code>-G</code>.
<br>Regarding to messages shown during nolce execution, files in order with
the n_hours (or -g, or -G) condition are called <i>main HTML files</i>, the others <i>related
HTML files</i>.
<a name=instr>
<h4>About the summary file</h4>
Starting from version 1.5, the format of summary file changed. Now it's a
document divided into three areas (frames). The strip on the top is the
<u>status</u> frame, the area on the left is the <u>domains</u> frame, and the other is
the <u>list</u> frame.
<br>Domains windows contains all different domains encountered during retrieval
of pages. Clicking on a domain name, available documents, related to that
domain are displayed in the list frame.
<br>To view a retrieved document, click on its icon, while clicking on the URL
the page is downloaded from the Internet (if the connection is active).
<p>If neither <code>-w</code> or <code>-W</code> option is given, pages will be
displayed in another browser window. Normally the other window is created once, then,
if the user doesn't close it, it's used every time a document is selected.
With <code>-W</code> the document is viewed in the list frame,
allowing an easy selections of other domains and other documents.
Finall, with <code>-w</code>, the entire index window is used for viewing pages.
<p>Selecting <u>Lists & domains</u> or <u>Simple List</u> from the status frame,
one can return immediately to the index of processed pages, but in the first
case the default layout (domains + list) is used, while in the second the list area
takes all the space below the status frame.
<h4>Cache generated by Netscape for Windows</h4>
In this case one must use the <code>-p</code> option.
<br>It's better to mount the dos partition with type <code>msdos</code>
rather than <code>vfat</code> because in the first case access is faster and
file names aren't case sensitive.
<p>
<a name=install>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>Installation</b>
</font>
</td></tr></table><p>
</a>
This software is available in a package containing both source and binary versions.
<br>It can be obtained at
<br><a href="ftp://sunsite.unc.edu/pub/Linux/apps/www/plugins">ftp://sunsite.unc.edu/pub/Linux/apps/www/plugins</a> and at
<br><A HREF="http://members.tripod.com/~giustrov/download.html">http://members.tripod.com/~giustrov/download.html</a>
<p>For using this program, you must have installed the DB library.
It's necessary to read records
stored in the <code>index.db</code> file.
<br>
In practice you need <code>libdb.so</code> to run the compiled version, and also db include
files to compile the program.
<br>
For Linux, with Slackware and Redhat distributions, the library should be
present by default.
<br>
For the include files, with Redhat you must install a package called
<code>db-devel</code> or similar. For Slackware, they are in
<code>libc.tgz</code>, so they aren't a problem.
<br>
<p>
For compiling, cd to <code>src</code> subdir and do <code>make</code>.
<br>
Do <code>make install</code> to compile and copy the executable in
<code>/usr/bin</code>, the man
page in <code>/usr/man/man1</code> and the documentation in
<code>/usr/doc/nolce-VERSION</code>.
<br>
If standard destinations don't fit your taste, modify them in the Makefile.
<p>
<a name=comp>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>Compatibility</b>
</font>
</td></tr></table><p>
</a>
I have tested the program under Linux only, and with Netscape Navigator 3.01,
4.0b5, and 4.03 .
<br>
Probably it works with version 2.0 also, since the present format of the cache
was introduced with this release.
<br>
It should work also with other Unix, if their Netscape indexes its cache in the
same way of the linux version, that is with a DB hash file named
<code>index.db</code> under <code>$HOME/.netscape/cache</code>.
<br>
If the name is different, it's easy to
change the value of CACHE_FILE, in the defines section of the source file.
<br>
From the point of view of the language, I use code conforming to ANSI C or
POSIX standards only, so if your system supports them, there must be no
problems.
<p>
As I know, the following circumstances may cause problems or errors in
compiling <code>nolce</code>:
<ol>
<li> Makefile assumes that your <code>make</code> correctly defines the
variable <code>CC</code> as your
site compiler name (i.e cc or gcc).
This must be ensured by every <code>make</code>, but if not, define them by hand.
<li>The flex program used must be a real flex, that is not an emulation of the
original lex, as it happens using <code>flex -l</code>. This is what happens on some Slackware systems, where flex calls the real
program flex.slk with the -l option. The result is a segmentation fault error
when nolce is executed.
<br>In this situation, adding <code>-Darray</code> to <code>DEFINES</code> in the Makefile (see below),
solves the problem.
<br>A line <code>LEX=flex</code> is present in the Makefile. On non Linux systems, this probably should be changed.
<li> The behavior of the lex program may change. Apart from program options,
it often requires linking with some libraries. The Linux standard lex, that
is GNU flex, requires the <code>-lfl</code> library, and it's provided in the variable
<code>LDFLAGS</code> of the Makefile.
<br>If your site uses a different lex, read its documentation and change the
Makefile accordingly.<br>Possible options needed by the program may be given
in the <code>LFLAGS</code> variable.
<li> The program interfaces with the lexical analyzer through the usual
<code>yylex()</code>
function, called in the <code>process_html_file</code> of <code>main.c</code>.
Input and output files are supplied to yylex with the extern variables
<code>yyin</code> and <code>yyout</code>. Probably this is not conforming with original AT&T lex,
but, as I know, it conforms to POSIX specification for lex, and, above all,
it's almost the only way one can use with flex.
<li> Flex defines <code>yytext</code> as a char pointer, while other lex may define it as a
char array. If this is your case, you must compile <code>main.c</code> with
the <code>-Darray</code>
option, which can be done by setting the variable <code>DEFINES</code> of the Makefile.
</ol>
If problems persist, send me an e-mail, describing, besides the problem, what
system you're using, what lex and so on. But I haven't access to other systems
further my Linux machine, so, don't expect a certain solution.
<p>
If you discovery a bug, i.e. an abnormal exit of the program with a Segmentation Fault error, please let me know. You should send me an e-mail with a brief
description of the circumstances under which the error happened, command line
options, and above all the core file generated by the program (compress it to avoid mail messages too heavy).
<br>Shells permit to decide if one wants to obtain a core dump after an abnormal
termination of a program. With <code>bash</code> see the command <code>ulimit
</code>.
<br>For being the core file useful to me, it must be generated by a program
compiled with debug info: add the option <code>-g3</code> to <code>CFLAGS</code> in the Makefile. If you have <code>libg</code> installed, add also <code>-lg</code> to <code>LDFLAGS</code>.
<p>
However, before sending the core file, it could be useful the simple output of
<code>gdb</code>. In case of problems, compile nolce with debug infos, launch it from
the debugger, and when the execution stops with the error, inside <code>gdb</code>,
give the command <code>bt</code> and send to me the informations displayed.
<p>
<a name=work>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<font size="+2" color=#eeffff>
<b>How it works</b>
</font>
</td></tr></table><p>
</a>
i. <font color=#0000ff>INDEX.HTML</font>
<p>
A lot of urls, i.e. <code>http://home.netscape.com</code>, don't contain an HTML file name.
<br>In this situation the server provides a default HTML file, usually
<code>index.html</code>,
and nolce appends this same name to these urls.
<br>It could happen that an HTML file contains a link to such an url with the file
name explicited. If this name is different from <code>index.html</code>, the link doesn't
work.
<p>
ii. <font color=#0000ff>LINKS</font>
<p>
The main work nolce does is changing links in HTML files to point to local
files.
<br>
There are various types of links (imagine you're browsing the document
<code>http://www.aaaa.com/bbb/index.html</code>):
<ul>
<li><font color=#058805>Relative</font> links, i.e. <code>HREF="ccc/image.gif"</code>. In this case the browser loads
the file <code>image.gif</code> from the directory <code>ccc</code> under
<code>bbb</code>.
<li><font color=#058805>Absolute</font> links, i.e.
<code>HREF="http://www.aaaa.com/ccc/image.gif"</code>. In this
case Netscape will always try to obtain the document from the net, so
nolce transforms the link in something like <code>"../ccc/image.gif"</code>.
<li><font color=#058805>Base-related</font> links, i.e <code>HREF="/ccc/image.gif"</code>. These links must be
interpreted as <code>http://www.aaaa.com/ccc/image.gif</code>, not regarding of the
directory in which the HTML files is.
</ul>
<p><u>
If a link points to a document present in the cache, it is changed to a
relative link, otherwise it's turned in an absolute link.
</u><p>
iii. <font color=#0000ff>LEX</font>
<p>
If your lex program is GNU flex, the flag <code>-Cf</code> may be given to it (put in
the variable <code>LFLAGS</code> of the Makefile). This makes the program bigger, but
execution speeds up of 10-15%.
<p>
iv. <font color=#0000ff>MISCELLANEOUS</font>
<p>
<ul>
<li> In the file <code>nolce.h</code> there are some defines which can be customized.
<li> Links pointing to documents which are present in the cache are in italics.
Obviously the HTML document can contain links which are in italics of
origin, and in this case they may point to non-local files.
<br>Besides, if a link is presented as a formatted text, i.e <code><h3>Link</h3></code>, the italics isn't shown.
<li>If two or more versions of a document are present in the cache, the more
recent is taken with its original name; for the others a progressive number
is appended to the url.
<li>Netscape seems to have problems to follow links to local files which
contain characters like <code>`?'</code>. Mainly for this reason, when creating
directories, strange characters like <code>`?', `=', `('</code> and so on are substituted
with an underscore.
</ul>
<p>
<a name=author>
<table cellpadding=3 width="100%" border=0 bgcolor=#1188ff><tr><td>
<h2>
<font color=#eeffff>
Contacting the author
</font>
</td></tr></table><p>
</>
For any question, bug report or comment, email to <a href="mailto:g.trovato@usa.net">g.trovato@usa.net</a>
<br>
My home page is<br>
<a href="http://members.tripod.com/~giustrov">http://members.tripod.com/~giustrov</a>
<p>
Nolce web page is:<br>
<a href="http://members.tripod.com/~giustrov/nolce.html">
http://members.tripod.com/~giustrov/nolce.html</a>
<a href="LICENCE">
<h2>
<font color=#ff0000>
LICENCE
<hr noshade size=2>
</h2>
</font>
</a>
</body>
</html>
nolce-1.9.2/docs/frame_toc.html 100644 0 0 2615 6506751175 14477 0 ustar root root
<html>
<head>
<title>Menu</title>
<base target="docs">
</head>
<body bgcolor=#b5b2b5>
<font size=3>
<form>
<table width="100%" border=0 cellspacing=0 cellpadding=0>
<tr align=center>
<td>
<b>
<input type="button" value="Introduction" onclick="parent.docs.location.href='frame_docs.html#start'">
</b>
</td><td>
<b>
<input type="button" value="Usage and what the program does" onclick="parent.docs.location.href='frame_docs.html#usage'">
</b>
</td><td>
<b>
<input type="button" value="Installation" onclick="parent.docs.location.href='frame_docs.html#install'">
</b>
</td>
<td>
<b><font color=#0000ff>
<input type="button" value="Changes"
onclick="parent.docs.location.href='CHANGES.html'">
</b></font>
</td>
</tr>
<tr align=center><td>
<b>
<input type="button" value="Compatibility" onclick="parent.docs.location.href='frame_docs.html#comp'">
</b>
</td><td>
<b>
<input type="button" value="How it works" onclick="parent.docs.location.href='frame_docs.html#work'">
<input type="button" value="Contacting the author" onclick="parent.docs.location.href='frame_docs.html#author'">
</b>
</td><td>
<b><font color=#0000ff>
<input type="button" value="Licence"
onclick="parent.docs.location.href='LICENCE'">
</b></font>
</td><td>
<b><font color=#ff0000>
<input type="button" value="Important!" onclick=" parent.docs.location.href='frame_docs.html#important';">
</b></font>
</td>
</tr>
</table>
</form>
</font>
</body>
</html>
nolce-1.9.2/docs/nolce.1 100644 0 0 7653 6506755102 13035 0 ustar root root .TH nolce 1 "26 March 1998" "Nolce version 1.9.2" \" -*- nroff -*-
.SH NAME
nolce - allow off-line Netscape Navigator cache browsing
.SH USAGE
.B nolce
[n_hours] [-c cache_dir] [-d dest_dir] [-g|-G sub_string] [-i summary_file] [-w|-W] [-s] [-m] [-t] [-f] [-l] [-k] [--help]
.SH DESCRIPTION
Nolce copies Netscape Navigator (ver. 2 and above) cache files
in a new directory, adjusting file names and links to permit an off-line
navigation of them.
.SH OPTIONS
.TP
.I "n_hours"
Process only files created, that is downloaded, in the last n_hours hours.
If this option isn't given, nolce processes all the files of the cache.
.TP
.I "-c cache_dir"
Process cache files under the cache_dir directory. Default is
.BR $HOME/.netscape/cache .
.TP
.I "-d dest_dir"
Store processed files under dest_dir. Default is
.BR $HOME/cached .
.TP
.I "-g sub_string"
Process only HTML files whose URL contains the specified sub_string
.TP
.I "-G sub_string"
Process only HTML files whose URL doesn't contain the specified sub_string
.TP
.I "-i summary_file"
Put summary about documents retrieved in the html file summary_file. It will
contain titles of documents, and links both to local copy and to
original site of them. Even if it's given an absolute path, the file will
be always created in dest_dir. Default is
.BR index.html .
If summary file exists, it is not overwritten, but new entries are appended to
it.
.TP
.I "-w, -W"
By default, retrieved pages are displayed in another window. With
.BR -w
the same index window is used, while with
.BR -W
they are displayed in the list frame.
.TP
.I "-s"
Programs executes silently.
.TP
.I "-m"
By default, missing images are totally eliminated from the HTML file, so one doesn't see
the Netscape icon indicating them. With this option, missing
images are kept.
.TP
.I "-t"
Put downloading date of each document in summary file.
.TP
.I "-f"
Don't process links not satisfying initial conditions.
When one of the options
.BR n_hours ,
.BR -g
or
.BR -G
is given, if a document has a link to another one which doesn't satisfy those
conditions, the link is processed also, unless the -f option is
given. However, even when processed, these links doesn't appear in the summary
file.
.TP
.I "-l"
Ignore lock file. With this option, the cache is processed even if there is a lock
file in $HOME/.netscape . It's useful when the cache specified with
.BR -c
isn't the one the Netscape in execution uses, or when Netscape isn't installed. Use with care, and don't launch
more copies of nolce on the same directory.
.TP
.I "-k"
Make symbolic links for images and other non html files. This permits to reduce
disk usage, but pages depend from files in the Netscape's cache, which sooner or later are deleted by the browser.
.TP
.I "--help"
Shows help message and exit.
.SH OVERVIEW
Netscape normally stores all types of downloaded documents (html files, images and so
on) under the directory $HOME/.netscape/cache, but it changes their names and
their relative position, so it isn't possible to browse them off-line.
.PP
Nolce basically does three things: copies cached files in a new directory
structure which reflects the original site structure, restores their original
file names, and adjusts links in html files to point to local files rather than
to Internet ones.
.I The original Netscape cache is left intact.
After running the program, user should open the summary
file in Netscape Navigator, and from it view desired documents.
Alternatively one can explore the directories and files created under
dest_dir.
.PP
When viewing processed documents, links in
.I italics
refer to local documents,
while normal links to Internet ones.
.PP
For further informations see the next section.
.SH FILES
This software comes with a LICENCE file and a README file in the HTML format.
Refer to them for further informations, especially to
the section `HOW IT WORKS' for knowing how the program acts and what it does
and doesn't.
.SH AUTHOR
Giuseppe Trovato (g.trovato@usa.net)
nolce-1.9.2/docs/LICENCE 100644 0 0 6001 6506751175 12630 0 ustar root root Nolce is Copyright (C) 1997-98 Giuseppe Trovato (g.trovato@usa.net)
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTY IS DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES.
Redistribution in source and binary forms, are permitted provided
that the following conditions are met:
1. Redistributions of this software must include all the files, left
intact, present in this package. You may omit only the sources files,
that is only main.c, utils.c, skeletons.c, scanner.lex, Makefile and
nolce.h, or, alternatively the binary executable nolce.
2. Without the written permission of the author, nobody can obtain money
for this software, or any software which include portions of it, with
the exception of that money necessary for the physical support and the
copying operation.
Vice versa, this software, or portions of it, even modified, can be
included in a free software, provided that it contains a copyright note
like this: "This software contains parts which are copyright (C) 1997
Giuseppe Trovato".
-*---**---*-
This software is cardware for non-commercial (i.e personal or educational)
use. That is you must send me a postal card of your town if using this program.
If this is a too big effort for you, I'll be satisfied with an e-mail
message :-)
Even if you do the right thing, sending a postal card, an e-mail message is
useful if you want to be notified of new versions, enhancements, bug fixes...
Commercial users must register themselves. Commercial use means using this
program in making your work or business.
Registration fee is 15$ (or Lit. 25000) for a single-user machine and 25$ (or
Lit. 42000) for a multi-user one.
A Licence is required for every machine on which the program is installed.
Payment must be made as check payable to me or, less recommended, as cash.
Registered users will receive a Licence document on paper and will be notified
via e-mail about new versions of this software.
So, a letter requesting registration must include postal address, e-mail,
specification of the type of Licence (single or multi user), and, obviously,
money :-)
Further the postal mail, send me an e-mail informing about your request.
In no event I will be responsible of problems concerning postal delivery of
the request of registration.
Obviously, registration or sending postal card is required once, not every time
you download a new version.
-*---**---*-
If nolce is distributed in a physical support, I would like to receive a
copy of it, while if you make available it in a new site, please let me know
about it.
My postal address is:
Giuseppe Trovato
Via F. Ferruccio 36
91011 Alcamo (TP)
ITALY
Email: g.trovato@usa.net
nolce-1.9.2/LICENCE 120777 0 0 0 6506751175 13537 2docs/LICENCE ustar root root nolce-1.9.2/src/ 40755 0 0 0 6506755637 11417 5 ustar root root nolce-1.9.2/src/main.c 100644 0 0 60271 6506754745 12631 0 ustar root root
/*
* main.c, utils.c, nolce.h : Copyright (C) 1997-98 Giuseppe Trovato
* (g.trovato@usa.net)
* Version 1.9.2
* Read LICENCE file before using the program.
*/
/**************************** INCLUDES *************************************/
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <signal.h>
#include <db.h>
#include <ctype.h>
#include "nolce.h"
/************************ EXTERN VARIABLES *********************************/
char *netscape_dir, *lock_file, *curr_dir;
char g_or_G = '\0';
size_t len_cache_dir;
NODE *root = NULL;
long total = 0;
int status = 0;
bool yet_to_process = 0;
T_OPT opt = {0, 0, NULL, NULL, NULL, NULL};
enum vto
{ process_main, process_related, summarize };
enum pr
{ with_frames, is_image, normal };
/*************************** SIGNAL HANDLER **********************************/
void
sig_handler (int sig_number)
{
if (sig_number == SIGINT)
{
char c;
printf ("\n---- Do you really want to quit (y/n)? ");
c = getchar ();
while (getchar () != '\n');
if ((c != 'y') && (c != 'Y'))
{
signal (SIGINT, sig_handler);
if ((status == 1) && !(opt.var & SILENT))
printf ("Processing main HTML files: ");
if ((status == 2) && !(opt.var & SILENT))
printf ("Processing related HTML files: ");
fflush (stdout);
return;
}
}
if (sig_number == SIGSEGV)
printf ("\nI feel unwell... it's better to die!\n");
else
printf ("\nBye!\n");
unlink (lock_file);
fflush (NULL);
signal (sig_number, SIG_DFL);
raise (sig_number);
}
/****************************** MAIN ***************************************/
int
main (int argc, char **argv)
{
int k, j;
char temp[64];
time_t n_hours = 0, t, cut_time;
char *home, *err, *ls;
int len_home;
size_t currdir_size = 0;
/*
* Obtains home dir and Netscape's dir.
*/
home = getenv ("HOME");
if (home)
len_home = strlen (home);
else
perror_exit ("\nnolce:\nCould not obtain your home dir from the enviroment, exiting...")
do
{
currdir_size += 30;
if (currdir_size > 10000)
perror_exit ("\nnolce:\nCould not read current dir")
curr_dir = (char *) gt_realloc (curr_dir, currdir_size);
}
while (!getcwd (curr_dir, currdir_size));
netscape_dir = (char *) gt_malloc (len_home + 1 + 9 + 1);
sprintf (netscape_dir, "%s/.netscape", home);
lock_file = (char *) gt_malloc (len_home + 1 + 9 + 1 + 4 + 1);
sprintf (lock_file, "%s/lock", netscape_dir);
/*
* Command line scanning.
*/
if (argc == 1)
{
char ans;
printf ("Ok to process all files under %s/.netscape/cache ([y]es/[n]o/[h]elp) ? ", home);
ans = getchar ();
while (getchar () != '\n');
if (ans == 'n')
exit (EXIT_SUCCESS);
else if (ans == 'h')
help ();
}
for (k = 1; k < argc; k++)
{
if (!strncmp (argv[k], "--h", 3))
help ();
t = strtoul (argv[k], &err, 10);
if (err && *err) /* Arg isn't n_hours */
{
char sw;
char **ptr;
size_t len = strlen (argv[k]);
for (j = 0; j < len; j++)
{
sw = argv[k][j];
ptr = NULL;
switch (sw)
{
case '-': break;
case 'w': opt.view_window = 1; break;
case 'W': opt.view_window = -1; break;
case 'm': opt.var |= MISSING_IMAGES; break;
case 's': opt.var |= SILENT; break;
case 't': opt.var |= REPORT_TIME; break;
case 'f': opt.var |= NO_LINKS; break;
case 'p': opt.var |= WIN_CACHE; break;
case 'l': opt.var |= IGNORE_LOCK; break;
case 'k': opt.var |= LINK_FOR_IMAGES; break;
case 'c': ptr = &opt.cache_dir; break;
case 'd': ptr = &opt.dest_dir; break;
case 'i': ptr = &opt.summary_file; break;
case 'g':
case 'G':
g_or_G = sw;
ptr = &opt.str_tbc;
break;
default:
print_error (sw, argv[0], illegal);
}
if (ptr)
{
bool oper = 1;
if (strchr ("igG", sw))
oper = 0;
if (*ptr)
print_error (sw, argv[0], already_supplied);
if (j == len - 1)
{
if (k < argc - 1)
process_arg (argv[++k], ptr, oper);
else
print_error (sw, argv[0], need_arg);
}
else
{
process_arg ((argv[k] + j + 1), ptr, oper);
break;
}
}
}
}
else
/*
* Arg is n_hours
*/
{
if (n_hours)
print_error ('\0', argv[0], already_supplied);
else
n_hours = t;
}
}
/*
* End of command line scanning.
*/
/*
* If some parameters aren't given, the defaults are used.
*/
if (!opt.cache_dir)
{
opt.cache_dir = (char *) gt_malloc (len_home + 1 + 9 + 1 + 5 + 1);
sprintf (opt.cache_dir, "%s/cache", netscape_dir);
}
len_cache_dir = strlen (opt.cache_dir);
if (!opt.dest_dir)
{
opt.dest_dir = (char *) gt_malloc (len_home + 1 + strlen (DEST_DIR) + 1);
sprintf (opt.dest_dir, "%s/%s", home, DEST_DIR);
}
if (!opt.summary_file)
{
opt.summary_file = (char *) gt_malloc (strlen (SUMMARY_FILE) + 1);
strcpy (opt.summary_file, SUMMARY_FILE);
}
else
/*
* Takes only the final file name of a path
* which may contain also the direcory.
*/
if ((ls = strrchr (opt.summary_file, '/')))
{
gt_strshift (opt.summary_file, 1 + ls - opt.summary_file);
printf ("summary file is: %s/%s\n", opt.dest_dir, opt.summary_file);
}
/*
* Turns signal handler on.
*/
signal (SIGINT, sig_handler);
signal (SIGQUIT, sig_handler);
signal (SIGTERM, sig_handler);
signal (SIGSEGV, sig_handler);
if (! (opt.var & IGNORE_LOCK))
{
/*
* Creates lock file.
*/
if (!chdir (netscape_dir))
{
sprintf (temp, "1.0.0.127:%ld", (long) getpid ());
if (symlink (temp, lock_file))
{
fprintf (stderr, "\nnolce:\nThe program can't get exclusive access to the cache."
"\nIf Netscape or another copy of nolce is running, exit from it and"
"\nretry, otherwise delete the file $HOME/.netscape/lock\n");
exit (EXIT_FAILURE);
}
}
else
perror_exit ("\nnolce:\nCould not access $HOME/.netscape dir")
}
/*
* n_hours stuff.
*/
if (n_hours != 0)
{
if (n_hours < 0)
n_hours = -n_hours;
if (n_hours > 175200)
n_hours = 175200; /*
* Max 20 years, for avoiding problems
* with data overflow.
*/
cut_time = time (NULL) - n_hours * 3600L;
}
else
cut_time = 0;
/*
* Reads the cache index.
*/
root = get_data (root, cut_time, opt.str_tbc);
if (root)
{
if (!(opt.var & SILENT))
printf (" (done)");
fflush (stdout);
/*
* Creates dest_dir.
*/
mkdir (opt.dest_dir, S_IRWXU | S_IRWXG | S_IRWXO);
chdir (opt.dest_dir);
mkdir ("nolce_files", S_IRWXU | S_IRWXG | S_IRWXO);
start_summary_files (CONST, NULL);
if (!(opt.var & SILENT))
printf ("\nProcessing main HTML files: ");
status = 1;
/*
* Reads the tree to process html files.
*/
visit_tree (root, process_main);
if (yet_to_process && !(opt.var & SILENT))
printf ("\nProcessing related HTML files: ");
status = 2;
do
{
yet_to_process = 0;
visit_tree (root, process_related);
}
while (yet_to_process);
if (!(opt.var & SILENT))
{
printf ("\nWriting summary file `%s/%s'", opt.dest_dir, opt.summary_file);
fflush (stdout);
}
status = 3;
/*
* Creates the summary file.
*/
visit_tree (root, summarize);
put_in_summary_files (NULL, 1);
if (!(opt.var & SILENT))
printf (" (all done)\n");
}
else if (total == 0)
printf ("\nNo documents were found, or could be accessed, matching requested conditions.\n");
/*
* Removes lock file.
*/
if (! (opt.var & IGNORE_LOCK))
unlink (lock_file);
fflush (NULL);
return 0;
}
/********************** Function get_data *************************************
* Reads the cache index file (index.db) and creates the binary tree with all *
* the informations about cached files. The root of the tree is "start", *
* "cut_time" is the correspondent of n_hours, "sub_str" is the string *
* supplied with -g or -G. *
******************************************************************************/
NODE *
get_data (NODE * start, time_t cut_time, char *sub_str)
{ /* Creates b-tree. */
DB *index;
DBT key, data;
int type;
size_t len;
time_t d_time;
bool tc;
struct stat attr;
char *url, *file, *c_url, *path, *cont;
if (!(opt.var & WIN_CACHE))
{
path = (char *) gt_malloc (len_cache_dir + 1 + strlen (CACHE_FILE) + 1);
sprintf (path, "%s/" CACHE_FILE, opt.cache_dir);
}
else
{
path = (char *) gt_malloc (len_cache_dir + 1 + strlen (WIN_CACHE_FILE) + 1);
sprintf (path, "%s/" WIN_CACHE_FILE, opt.cache_dir);
}
chdir (opt.cache_dir);
index = dbopen (path, O_RDONLY, 0, DB_HASH, NULL);
if (index == NULL)
{
fprintf (stderr, "\n\nnolce:\nthe supplied cache dir (%s) isn't valid, "
"or I can't read the index file.\n", opt.cache_dir);
total = -1;
return NULL;
}
if (!(opt.var & SILENT))
{
printf ("Processing cache information from `%s'", path);
fflush (stdout);
}
while (!index->seq (index, &key, &data, R_NEXT))
{
url = (char *) (key.data);
if (url)
{
url += 8;
if (!url_is_valid (url))
continue;
file = (char *) (data.data);
cont = (char *) (data.data);
file += 33;
d_time = *((time_t*) (cont + 12));
cont += (71 + strlen (file));
if (strstr (cont, "html") || strstr (cont, "x-www"))
type = 1;
else
type = 0;
if (!strncmp (url, "wysiwyg", 7))
continue;
if ((opt.var & WIN_CACHE))
{
int k, len;
len = strlen (file);
for (k = 0; k < len; k++)
file[k] = tolower (file[k]);
}
if (type)
{
if (stat (file, &attr))
continue;
len = strlen (url) + 1;
c_url = (char *) gt_malloc (len);
strcpy (c_url, url);
}
standardize (url);
if (type)
{
tc = d_time > cut_time;
if (tc && g_or_G)
{
if (g_or_G == 'g')
tc = (strstr (c_url, sub_str) != NULL);
else
tc = (strstr (c_url, sub_str) == NULL);
}
if (check_html_url (url, 1))
start = add_url (start, c_url, url, (tc) ? (IS_HTML | NEED_INDEX | PR_MAIN | SHOW) : IS_HTML | NEED_INDEX, file, d_time);
else
start = add_url (start, c_url, url, (tc) ? (IS_HTML | PR_MAIN | SHOW) : IS_HTML, file, d_time);
free (c_url);
}
else
start = add_url (start, "", url, 0, file, d_time);
}
}
index->close (index);
free (path);
if (total > 0)
return start;
else
return NULL;
}
/********************** Function add_url **************************************
* Adds informations for a file to the binary tree. First searches recursively *
* starting from "node" the right place for the new node, then creates it. *
* Informations stored are: "or_url", the original url of the file; "url", the *
* url modified by standardize(); "type", a bitmapped variable indicating the *
* type (image, html...) of the file; "file_name", the name under the cache *
* dir; "mtime", the time when the file was created/modified. *
******************************************************************************/
NODE *
add_url (NODE * node, char *or_url, char *url, int type,
char *file_name, time_t mtime)
{
int cmp = 1;
if (node != NULL)
cmp = strcmp (url, node->url);
if (node == NULL)
{
size_t len;
node = (NODE *) gt_malloc (sizeof (NODE));
len = strlen (url);
node->url = (char *) gt_malloc (len + 1);
strcpy (node->url, url);
node->type = type;
node->mod_time = mtime;
node->dup = 1;
node->file_name = (char *) gt_malloc (strlen (file_name) + 1);
strcpy (node->file_name, file_name);
if (type & IS_HTML)
{
node->or_url = (char *) gt_malloc (strlen (or_url) + 1);
strcpy (node->or_url, or_url);
node->title = (char *) gt_malloc (9);
strcpy (node->title, "Untitled");
if (type & NEED_INDEX)
{
node->url_w_index = (char *) gt_malloc (len + 1 + strlen (DEFAULT_HTML) + 1);
strcpy (node->url_w_index, url);
check_html_url (node->url_w_index, 0);
node->r_url = node->url_w_index;
}
else
node->r_url = node->url;
}
node->sx = NULL;
node->dx = NULL;
if (type & PR_MAIN)
total++;
}
else if (cmp == 0)
{
if (strcmp (node->file_name, file_name))
{
if (type & IS_HTML)
{
char *temp;
temp = (char *) gt_malloc (strlen (url) + 8 + 1);
sprintf (temp, "%s-%u", url, ++(node->dup));
if (mtime > node->mod_time)
{
node = add_url (node, node->or_url, temp, node->type, node->file_name, node->mod_time);
if ((type & PR_MAIN) && !(node->type & PR_MAIN))
total++;
node->type = type;
node->mod_time = mtime;
node->file_name = (char *) gt_realloc (node->file_name, strlen (file_name) + 1);
strcpy (node->file_name, file_name);
}
else
node = add_url (node, or_url, temp, type, file_name, mtime);
free (temp);
}
else if (mtime > node->mod_time)
{
node->mod_time = mtime;
node->file_name = (char *) gt_realloc (node->file_name, strlen (file_name) + 1);
strcpy (node->file_name, file_name);
}
}
}
else if (cmp < 0)
node->sx = add_url (node->sx, or_url, url, type, file_name, mtime);
else if (cmp > 0)
node->dx = add_url (node->dx, or_url, url, type, file_name, mtime);
return node;
}
/********************** Function visit_tree ***********************************
* Visits (reads) the binary tree in pre-order, and, depending of "operation", *
* calls process_html_file() or put_in_summary_files(); *
******************************************************************************/
void
visit_tree (NODE * node, int operation)
{
static long c = 0;
if (node == NULL)
return;
else
{
visit_tree (node->sx, operation);
if ((operation == process_main) && (node->type & PR_MAIN))
{
if (!(opt.var & SILENT))
{
printf ("\r\t\t\t\t%ld/%ld", ++c, total);
fflush (stdout);
}
process_html_file (node);
}
if ((operation == process_related) && (node->type & PR_REL)
&& !(node->type & PROCESSED))
{
if (!(opt.var & SILENT))
{
printf ("\r\t\t\t\t%ld", ++c - total);
fflush (stdout);
}
node->type |= PROCESSED;
process_html_file (node);
}
if ((operation == summarize) && (node->type & SHOW))
put_in_summary_files (node, 0);
visit_tree (node->dx, operation);
}
}
/********************** Function find_url *************************************
* Finds the node in the tree whose root is "node" containing informations *
* about the url "url". *
******************************************************************************/
NODE *
find_url (NODE * node, char *url)
{
int comp;
if (node != NULL)
{
comp = strcmp (url, node->url);
if (comp == 0)
return node;
else if (comp < 0)
return (find_url (node->sx, url));
else
return (find_url (node->dx, url));
}
else
return NULL;
}
/********************** Function process_html_file ****************************
* Scans, through yylex(), the html file referred by "node" and writes it, *
* with the adjustments provided by process_reference() to make links working, *
* the correct directory under dest_dir. *
******************************************************************************/
extern int yylex (void);
extern FILE *yyin, *yyout;
#ifdef array
extern char yytext[];
#else
extern char *yytext;
#endif
extern unsigned char i_a_tag, i_frame, found;
extern char title[], img_other[];
extern char *img_src;
void
process_html_file (NODE * node)
{
char *http_base, *mod_base, *fname, *dup_or_url, *path;
FILE *orig, *dest;
size_t len = strlen (node->or_url) + strlen (DEFAULT_HTML) + 2;
http_base = (char *) gt_malloc (len);
dup_or_url = (char *) gt_malloc (len);
mod_base = (char *) gt_malloc (strlen (node->r_url) + 1);
cut_path (node->r_url, mod_base);
strcpy (dup_or_url, node->or_url);
check_html_url (dup_or_url, 0);
cut_path (dup_or_url, http_base);
path = (char *) gt_malloc (len_cache_dir + 1 + strlen (node->file_name) + 1);
sprintf (path, "%s/%s", opt.cache_dir, node->file_name);
orig = fopen (path, "r");
if (orig != NULL)
{
enum rets
{
TITLE = 1, BASE, REF, IMG
};
int what;
fname = dirs_and_name (node->r_url);
dest = fopen (fname, "w");
if (dest == NULL)
perror_exit ("\nnolce:\nCouldn't open html file for writing")
yyin = orig;
yyout = dest;
i_a_tag = 0;
found = 0;
while ((what = yylex ()))
switch (what)
{
case BASE:
http_base = (char *) gt_realloc (http_base, strlen (yytext) + strlen (DEFAULT_HTML) + 2);
if (!url_is_valid (yytext))
break;
strcpy (http_base, yytext);
check_html_url (http_base, 0);
*(strrchr (http_base, '/') + 1) = '\0';
break;
case TITLE:
if (title[0] != '\0')
{
node->title = (char *) gt_realloc (node->title, strlen (title) + 1);
strcpy (node->title, title);
}
break;
case REF:
if (i_frame)
process_reference (yytext, http_base, mod_base, fname, yyout, with_frames);
else if (i_a_tag)
found = !process_reference (yytext, http_base, mod_base, fname, yyout, normal);
else
process_reference (yytext, http_base, mod_base, fname, yyout, normal);
break;
case IMG:
process_reference (img_src, http_base, mod_base, fname, yyout, is_image);
break;
}
fclose (orig);
fclose (dest);
free (http_base);
free (mod_base);
}
else
{
node->type &= (~SHOW);
perror ("\n\nnolce:\nWARNING: Couldn't open cache file for reading");
}
free (path);
}
/********************** Function process_reference ****************************
* Adjusts links. First transforms the link to an absolute url, with *
* check_path(), then checks if that url corresponds to a file in the cache, *
* with find_url(). If yes, the url is retransformed to a a relative path, *
* with relative_position, then, if it doesn't refer to an html document, *
* make_link() or copy_file() are called. *
* "ref" is the link to process, "base" is the BASE url of the document to *
* whom the link belongs, "mod_base" is like the previous, but in the format *
* of standardize(), "file_name" is the filename of the document, "out" the *
* file with the adjusted version of the document, "operation" is used in *
* in case that link refers to an image. *
******************************************************************************/
bool
process_reference (char *ref, char *base, char *mod_base,
char *file_name, FILE *out, int operation)
{
NODE *point = NULL;
char *cleaned_url, *abs_url, *a, *anchor = NULL;
bool ret_value = 0;
char *topic, *eff;
size_t len = strlen (ref), lbase = strlen (base), ldh = 1 + strlen (DEFAULT_HTML),
lname = strlen (file_name);
if (!strcmp (ref, " photo.html"))
printf ("\n*****************");
if (ref && (eff = strtok (ref, "' \"")))
{
topic = (char *) gt_malloc (lbase * 2 + len + ldh + lname + 1);
strcpy (topic, eff);
}
else
return 1;
if (!strstr (topic, "internal-gopher"))
{
cleaned_url = (char *) gt_malloc (lbase + len + ldh + lname + 1);
abs_url = (char *) gt_malloc (lbase + len + ldh + lname + 1);
if ((a = strrchr (topic, '#')))
{
anchor = (char *) gt_malloc (strlen (a) + 1);
strcpy (anchor, a); /* If we have an anchor, we must */
if (a == topic) /* eliminate it when checking if */
strcpy (topic, file_name); /* the file is in the cache. */
else
*a = '\0';
}
check_path (cleaned_url, base, topic);
if (!url_is_valid (cleaned_url))
return 1;
strcpy (abs_url, cleaned_url);
standardize (cleaned_url);
point = find_url (root, cleaned_url);
if (!point)
{
char *sl;
sl = strrchr (cleaned_url, '/');
if (!strcmp (sl + 1, DEFAULT_HTML))
{
*sl = '\0';
point = find_url (root, cleaned_url);
if (!point)
*sl = '/';
}
}
if (point)
{
if (point->type & NEED_INDEX)
strcpy (cleaned_url, point->url_w_index);
relative_position (cleaned_url, mod_base, topic);
if (!(point->type & IS_HTML))
{
if (opt.var & LINK_FOR_IMAGES)
ret_value = make_link (point->url, point->file_name);
else
ret_value = copy_file (point->url, point->file_name);
}
else
if (operation == with_frames)
point->type &= (~SHOW);
/*
* When a link points to a document present in the cache, but not
* processed because it doesn't satisfy the n_hour condition,
* this action is taken.
*/
if ((point->type & IS_HTML) && !(point->type & PR_MAIN))
{
if (!(opt.var & NO_LINKS))
{
point->type |= PR_REL;
yet_to_process = 1;
}
else
ret_value = 1;
}
if (anchor)
{
strcat (topic, anchor);
free (anchor);
}
}
else
ret_value = 1;
if (ret_value)
strcpy (topic, abs_url);
free (cleaned_url);
free (abs_url);
}
if (operation == is_image)
{
if ((opt.var & MISSING_IMAGES) || (!(opt.var & MISSING_IMAGES) && !ret_value))
fprintf (out, "<IMG SRC=\"%s\" %s>", topic, img_other);
}
else
fprintf (out, "\"%s\"", topic);
free (topic);
return ret_value;
}
/********************** Function make_link ************************************
* Makes a symbolic link to a non-html file in the cache. *
******************************************************************************/
bool
make_link (char *url, char *file)
{
int ret;
char *link_name, *orig_file;
struct stat attr;
link_name = dirs_and_name (url);
orig_file = (char *) gt_malloc (len_cache_dir + 1 + strlen (file) + 1);
sprintf (orig_file, "%s/%s", opt.cache_dir, file);
ret = stat (orig_file, &attr);
if (!ret)
{
unlink (link_name);
ret = symlink (orig_file, link_name);
}
else
{
free (orig_file);
return 1;
}
free (orig_file);
if (ret)
perror_exit ("\nnolce:\nProblems with symbolic links:"
" probably file system doesn't support them,\nexiting...");
return 0;
}
/********************** Function copy_file ************************************
* Copies a non html file from the cache to dest_dir. *
******************************************************************************/
bool
copy_file (char *url, char *file)
{
FILE *to, *from;
char *dest_file, *orig_file;
char buf[512];
size_t char_size, n_read, n_write;
dest_file = dirs_and_name (url);
unlink (dest_file);
to = fopen(dest_file, "w");
orig_file = (char *) gt_malloc (len_cache_dir + 1 + strlen (file) + 1);
sprintf (orig_file, "%s/%s", opt.cache_dir, file);
from = fopen (orig_file, "r");
if (!to || !from)
return 1;
free (orig_file);
char_size = sizeof (char);
while (1)
{
n_read = fread (buf, char_size, 512, from);
if (!n_read)
break;
n_write = fwrite (buf, char_size, n_read, to);
if (n_read != n_write)
perror_exit ("\nnolce:\nCould not copy images.");
}
fclose(from);
fclose(to);
return 0;
}
/********************** Function dirs_and_name ********************************
* Creates the directory structure under dest_dir which reflects the given *
* url, and returns the file name. The current directory remains the last *
* created. *
******************************************************************************/
char *
dirs_and_name (char *url)
{
char *base;
char *dest_file, *dir;
base = (char *) gt_malloc (strlen (url) + 1);
dest_file = cut_path (url, base);
/*
* Creation of directories structure.
*/
chdir (opt.dest_dir);
dir = strtok (base, "/"); /* base contains at least a '/' */
do
{
mkdir (dir, S_IRWXU | S_IRWXG | S_IRWXO);
chdir (dir);
}
while ((dir = strtok (NULL, "/")) != NULL);
free (base);
return dest_file;
}
nolce-1.9.2/src/skeletons.c 100644 0 0 6311 6506754776 13673 0 ustar root root
/*
* SEE COPYRIGHT NOTE IN THE FILE main.c
* Version 1.9.2
*/
/*
* Skeletons for files related to summary.
*/
char *index[9] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<html>\n<head>\n<title> Nolce summary</title>\n</head>",
"\n<frameset rows=\"26, *\" border=0 frameborder=\"no\" bordercolor=#ffffff>",
"\n<frame scrolling=auto name=\"banner\" ",
">\n<frame scrolling=auto name=\"user\" ",
">\n<noframes>",
"\nIf a frame capable browser isn't avaible, use this ",
"\n<a href=\"nolce_files/full_index.html\">full index</a>.",
"\n</noframes>\n</frameset>\n</html>"
};
char *banner[23] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<html>\n<body bgcolor=#008877 link=#ffffff vlink=#ffffff alink=#bbbbbb>",
"\n<table width=\"100%\" border=0 cellpadding=0 cellspacing=0>",
"\n<tr><td align=center>",
"\n<font size=2><a target=\"user\" ",
" onMouseOver=\"window.status='Shows index and domains window.';return true\">",
"\nList & domains</a></td><td align=center>",
"\n<font size=2><a target=\"user\"",
" onMouseOver=\"window.status='Shows index without domains window.';return true\">",
"\nSimple list</a></td><td align=center>",
"\n<font size=2><a target =\"user\" href=\"/usr/doc/nolce-1.9/frame_docs.html#instr\"",
"onMouseOver=\"window.status='About using the summary.';return true\">",
"\nInstructions</a></td><td align=center>",
"\n<font size=2><a target=\"user\" href=\"/usr/doc/nolce-1.9/README.html\"",
"onMouseOver=\"window.status='Help on using nolce.';return true\">",
"\nHelp</a></td><td align=center>",
"\n<font size=2><a target=\"user\" href=\"http://members.tripod.com/~giustrov/nolce.html\" ",
"onMouseOver=\"window.status='Check for news regarding nolce!';return true\">",
"\nNolce web page</a></td><td align=center>",
"\n<font size=2><a href=\"mailto:g.trovato@usa.net\" onMouseOver=\"window.status=",
"'For questions, bug reports, suggestions...';return true\">",
"\nContact the author</a></td></tr></table>",
"\n</body>\n</html>"
};
char *wd_index[8] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<html>\n<frameset cols=\"180, *\" border=1 frameborder=\"yes\">",
"\n<frame scrolling=auto name=\"domains\" ",
">\n<frame scrolling=auto name=\"list\" ",
">\n<noframes>\nSorry, this document needs a frame-capable browser.\n<br>",
"\nIf such a browser isn't avaible, read the ",
"<a href=\"full_index.html\">full_index.html</a> file.",
"\n</noframes>\n</frameset>\n</html>"
};
char *dom_index[4] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<html>\n<body bgcolor=#ffffff>",
"\n<table bgcolor=#ff6060 width=\"100%\" cellpadding=3 cellspacing=0 border=0>",
"\n<tr><td>\n<font size=4 color=#ffeeee><b>Domains</b></font>\n</td></tr></table>\n<p>"
};
char *full_index[6] =
{"<!-- File generated by nolce. Do not edit! -->\n",
"\n<HTML>\n<HEAD>\n<TITLE>nolce summary</TITLE>\n</HEAD>\n<BODY BGCOLOR=#ffffff>",
"\n<table bgcolor=#1188ff width=\"100%\" cellpadding=3 cellspacing=0 border=0>",
"\n<tr><td>\n<font size=4 color=#eeffff><b>Pages</b></font>\n</td></tr></table>",
"\n<p>\n<font size=2>Pages can be viewed off-line, clicking on the icon. Clicking ",
"\non the url, the document will be downloaded from the Internet.\n</font>\n<p><a name=\"1\">"
};
nolce-1.9.2/src/utils.c 100644 0 0 40116 6506754762 13040 0 ustar root root
/*
* SEE COPYRIGHT NOTE IN THE FILE main.c
* Version 1.9.2
*/
/**************************** INCLUDES *************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>
#include <stddef.h> /* Needed on Irix */
#include <errno.h>
#include "nolce.h"
/************************ EXTERN VARIABLES *********************************/
extern char *curr_dir, *lock_file;
extern T_OPT opt;
extern NODE *root;
/********************** Function gt_malloc ************************************
* Calls malloc, checks if we are out of memory, and returns a pointer to a *
* block of a certain size ("size") of allocated memory. *
******************************************************************************/
void *
gt_malloc (size_t size)
{
void *ptr;
ptr = malloc (size);
if (ptr == NULL)
perror_exit ("\nnolce:\nThe program can't get necessary memory")
return ptr;
}
/********************** Function gt_realloc ***********************************
* Calls realloc, checks if we are out of memory, and returns a pointer to a *
* block of a certain size ("size") of allocated memory. *
******************************************************************************/
void *
gt_realloc (void *aap, size_t size)
{
void *ptr;
ptr = realloc (aap, size);
if ((ptr == NULL) && (size != 0))
perror_exit ("\nnolce:\nThe program can't get necessary memory")
return ptr;
}
/********************** Function help *****************************************
* Prints the help message and exits successfully. *
******************************************************************************/
void
help ()
{
printf ("nolce " VERSION ", (C) 1997-98 Giuseppe Trovato. See LICENCE file for terms of use."
"\n\nUsage: nolce [n_hours] [OPTIONS]..."
"\nReads Netscape Navigator (ver. 2 and above) cache files created in"
"\nthe last n_hours hours and copies them in a new directory adjusting"
"\nfile names and links to permit an off-line navigation of them."
"\nIf n_hours isn't supplied, all cached files are processed."
"\nOptions:"
"\n\n -c cache_dir\t directory where cache is, default $HOME/.netscape/cache"
"\n -d dest_dir\t directory where files are copied, default $HOME/" DEST_DIR
"\n -g sub_string\t process only files whose URL contains sub_string"
"\n -G sub_string\t process only files whose URL doesn't contain sub_string"
"\n -i summary_file file name of summary, it will be created in dest_dir,"
"\n\t\t default is " SUMMARY_FILE
"\n -w\t\t show pages in the same index window"
"\n -W\t\t show pages in the list frame"
"\n -s\t\t execute silently"
"\n -m\t\t don't eliminate missing images"
"\n -t\t\t put downloading date of documents in summary file"
"\n -f\t\t don't process links not satisfying initial conditions"
"\n -p\t\t cache is generated by Netscape for Windows"
"\n -l\t\t ignore lock files (use with attention: see docs)"
"\n -k\t\t make symbolic links for non html files"
"\n --help\t shows this help\n");
exit (EXIT_SUCCESS);
}
/********************** Function print_error **********************************
* Prints various messages for various types of errors in command line. *
******************************************************************************/
void
print_error (char option, char *prg_name, int what_error)
{
if (what_error == illegal)
printf ("%s: illegal option -- %c", prg_name, option);
else if (what_error == need_arg)
printf ("%s: option requires an argument -- %c", prg_name, option);
else if (what_error == already_supplied)
{
if (!option)
printf ("%s: already supplied option -- n_hours", prg_name);
else
printf ("%s: already supplied option -- %c", prg_name, option);
}
printf ("\nTry `%s --help' for more information.\n", prg_name);
exit (EXIT_FAILURE);
}
/********************** Function cut_path *************************************
* From a complete path ("path"), that is directories + filename, returns *
* these two components separately (directories in "base" and the filename as *
* the returned value of the function). *
******************************************************************************/
char *
cut_path (char *path, char *base)
{
char *pos;
strcpy (base, path); /* base needs to be allocated from the */
pos = strrchr (base, '/'); /* caller, at least strlen(path) + 1. */
*(pos + 1) = '\0';
return 1 + strrchr (path, '/');
}
/********************** Function check_path ***********************************
* From the base url ("base") of a document and a link ("entry"), returns the *
* absolute url ("cleaned"). *
******************************************************************************/
char *
check_path (char *cleaned, char *base, char *entry)
{
char *l_entry; /* Local copy of the entry parameter. */
char *p_l_entry, *pasf;
if (strstr (entry, "://"))
return strcpy (cleaned, entry);
p_l_entry = l_entry = (char *) gt_malloc (strlen (entry) + 1);
strcpy (l_entry, entry);
strcpy (cleaned, base);
if (l_entry[0] == '/')
{
char *a, *b;
int k;
b = cleaned;
for (k = 1; *b != '\0' && k <= 3; k++)
{
a = strchr (b, '/');
if (k == 3)
if (a)
*a = '\0';
b = a + 1;
}
}
else
while (strstr (l_entry, "../"))
{
/*
* For every "../" in the ref., cuts a directory in the base path.
*/
pasf = strrchr (cleaned, '/');
if (pasf)
*pasf = '\0';
pasf = strrchr (cleaned, '/');
if (pasf)
*(pasf + 1) = '\0';
l_entry += 3;
}
strcat (cleaned, l_entry);
free (p_l_entry);
return cleaned;
}
/********************** Function check_html_url *******************************
* Checks if the url ("str") contains the specification of an html file. If *
* no, the DEFAULT_HTML suffix is added. *
******************************************************************************/
bool
check_html_url (char *str, bool only_test)
{
char *ext;
ext = strrchr (str, '.');
if (ext == NULL || (strcmp (ext, ".html") && strcmp (ext, ".htm") && strcmp (ext, ".HTML") && strcmp (ext, ".HTM")))
{
if (!only_test && (str[0] != '\0'))
{
size_t len;
len = strlen (str);
if (str[len - 1] != '/')
strcat (str, "/");
strcat (str, DEFAULT_HTML);
}
return 1;
}
return 0;
}
/********************** Function process_arg **********************************
* Stores a command line parameter ("str") in the appropriate variable *
* ("store") and, when necessary, turns a relative path into an absolute one. *
******************************************************************************/
void
process_arg (char *str, char **store, bool need_abs_path)
{
if (need_abs_path && (str[0] != '/'))
{
*store = (char *) gt_malloc (strlen (curr_dir) + 1 + strlen (str) + 1);
sprintf (*store, "%s/%s", curr_dir, str);
}
else
{
*store = (char *) gt_malloc (strlen (str) + 1);
strcpy (*store, str);
}
}
/********************** Function standardize **********************************
* Deletes possible final '/' or double '/' from an url ("str"). *
* Also substitutes strange characters with '_', truncates too long urls and *
* transforms the "://" in "/". *
******************************************************************************/
char *
standardize (char *str)
{
char *snew, *c, *sp, *token;
snew = (char *) gt_malloc (strlen (str) + 1);
sp = strstr (str, "://");
strncpy (snew, str, sp - str);
snew[sp - str] = '\0';
c = snew - 1;
while (*++c != '\0')
*c = tolower (*c);
sp += 3;
token = strtok (sp, "/");
strcat (snew, "/");
strcat (snew, token);
while ((token = strtok (NULL, "/")) != NULL)
{
c = token - 1;
while (*++c != '\0')
{
if (!isalnum (*c))
if (!strchr ("._-+", *c))
*c = '_';
if (c - token > 128)
{
*c = '\0';
break;
}
}
if (strcmp (token, "."))
strcat (strcat (snew, "/"), token);
}
strcpy (str, snew);
free (snew);
return str;
}
/********************** Function gt_strshift **********************************
* Shifts to left the characters of a string ("str") of a given number ("pos") *
* of positions. *
******************************************************************************/
char *
gt_strshift (char *name, int pos)
{
register char *str = name - 1;
while ((*++str = *(str + pos)));
return name;
}
/********************** Function relative_position ****************************
* Transforms an absolute url ("url") to a relative one ("ref"), basing on a *
* given document's base ("base"). *
******************************************************************************/
char *
relative_position (char *url, char *base, char *ref)
{
char *b = base - 1;
ptrdiff_t off = url - base;
char *suff, *tok, *last_slash = base;
size_t dirs = 0;
char c;
int k;
do
{
c = *++b;
if (c == '/')
last_slash = b + 1;
if (c != b[off])
{
char *l_base;
suff = last_slash + off;
l_base = (char *) gt_malloc (strlen (last_slash) + 1);
strcpy (l_base, last_slash);
tok = strtok (l_base, "/");
while (tok)
{
dirs++;
tok = strtok (NULL, "/");
}
break;
}
}
while (c != '\0');
ref[0] = '\0';
for (k = 1; k <= dirs; k++)
strcat (ref, "../");
strcat (ref, suff);
return ref;
}
/********************** Function start_summary_files **************************
* Creates the various index-related files under nolce_files. *
******************************************************************************/
FILE *
start_summary_files (int what, long *nd)
{
#include "skeletons.c"
int k;
FILE *f;
char line[80];
char *temp;
if (what == CONST)
{
temp = (char *) gt_malloc (strlen (opt.dest_dir) + strlen (opt.summary_file) + 40);
sprintf (temp, "%s/%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "w");
check_file_stream (f, temp);
for (k = 0; k < 4; k++)
fputs (index[k], f);
fprintf (f, "src=\"nolce_files/banner_%s\"", opt.summary_file);
fputs (index[4], f);
fprintf (f, "src=\"nolce_files/wd_%s\"", opt.summary_file);
for (k = 5; k < 9; k++)
fputs (index[k], f);
fclose (f);
sprintf (temp, "%s/nolce_files/banner_%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "w");
check_file_stream (f, temp);
for (k = 0; k < 5; k++)
fputs (banner[k], f);
fprintf (f, "href=\"wd_%s\"", opt.summary_file);
for (k = 5; k < 8; k++)
fputs (banner[k], f);
fprintf (f, "href=\"full_%s\"", opt.summary_file);
for (k = 8; k < 23; k++)
fputs (banner[k], f);
fclose (f);
sprintf (temp, "%s/nolce_files/wd_%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "w");
check_file_stream (f, temp);
for (k = 0; k < 3; k++)
fputs (wd_index[k], f);
fprintf (f, "src=\"dom_%s\"", opt.summary_file);
fputs (wd_index[3], f);
fprintf (f, "src=\"full_%s\"", opt.summary_file);
for (k = 4; k < 8; k++)
fputs (wd_index[k], f);
fclose (f);
}
if (what == FULL)
{
temp = (char *) gt_malloc (strlen (opt.dest_dir) + strlen (opt.summary_file) + 40);
sprintf (temp, "%s/nolce_files/full_%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "r+");
if (f && !strcmp (fgets (line, 80, f), "<!-- File generated by nolce. Do not edit! -->\n"))
fseek (f, -15L, SEEK_END);
else
{
if (f)
fclose (f);
f = fopen (temp, "w");
check_file_stream (f, temp);
for (k = 0; k < 6; k++)
fputs (full_index[k], f);
}
}
if (what == DOM)
{
*nd = 0;
temp = (char *) gt_malloc (strlen (opt.dest_dir) + strlen (opt.summary_file) + 40);
sprintf (temp, "%s/nolce_files/dom_%s", opt.dest_dir, opt.summary_file);
f = fopen (temp, "r+");
if (f && !strcmp (fgets (line, 80, f), "<!-- File generated by nolce. Do not edit! -->\n"))
{
while (fgets (line, 80, f) != NULL)
if (!strncmp (line, "<a target=", 10))
(*nd)++;
fseek (f, -15L, SEEK_CUR);
}
else
{
if (f)
fclose (f);
f = fopen (temp, "w");
check_file_stream (f, temp);
for (k = 0; k < 4; k++)
fputs (dom_index[k], f);
}
}
free (temp);
if (what != CONST)
return f;
else
return NULL;
}
/********************** Function put_in_summary_files *************************
* Creates an index entry for an html document ("data"). If the document *
* is the last, closes all index files. *
******************************************************************************/
void
put_in_summary_files (NODE * data, bool close_all)
{
static char *old_domain;
static size_t times = 0;
static FILE *f_full, *f_dom;
static long label;
static char target[16];
char red_url[74];
int k, j;
if (close_all)
{
fputs ("\n</pre>\n<hr size=1 noshade>\n</body>\n</html>", f_dom);
fputs ("\n</body>\n</html>", f_full);
fclose (f_dom);
fclose (f_full);
if (times == 0)
perror_exit ("\nnolce:\nCouldn't process any html file, summary won't work, exiting...")
return;
}
for (k = 0, j = 0; j < 3; k++)
if (data->or_url[k] == '/')
j++;
if (times++ == 0)
{
time_t curr_time;
old_domain = (char *) gt_malloc (1);
old_domain[0] = '\0';
f_dom = start_summary_files (DOM, &label);
f_full = start_summary_files (FULL, NULL);
curr_time = time (NULL);
fprintf (f_dom, "\n<font size=2>Retrieved on %s</font><pre>",
ctime (&curr_time));
if (opt.view_window == 0)
strcpy (target, "target=\"_view\"");
else if (opt.view_window == 1)
strcpy (target, "target=\"_top\"");
else if (opt.view_window == -1)
*target = '\0';
}
if ((strlen (old_domain) != (k - 1)) || strncmp (old_domain, data->or_url, k - 1))
{
old_domain = (char *) gt_realloc (old_domain, k);
strncpy (old_domain, data->or_url, k - 1);
old_domain[k - 1] = '\0';
fprintf (f_dom, "\n<a target=list HREF=\"full_%s#%ld\">"
"<img src=\"internal-gopher-menu\" hspace=2 "
"align=bottom border=0>%s</a>",
opt.summary_file, ++label, strchr (old_domain, '/') + 2);
if (label > 1)
fprintf (f_full, "<p>\n<font size=1>New domain</font>"
"\n<hr align=left width=\"150\" size=1 noshade>\n<a name=\"%ld\">", label);
}
strncpy (red_url, data->or_url, 70);
red_url[70] = red_url[71] = red_url[72] = '.';
red_url[73] = '\0';
fprintf (f_full, "<p><table border=0><tr><td align=left><a %s href=\"../%s\"><img src=\"internal-gopher-text \""
"align=left border=0></a></td><td nowrap><b><font color=#ff0000>Title: </font></b>"
"%s\n<br><b>Url: </b>\n<a target=\"_top\" href=\"%s\">%s</a></td></tr></table>",
target, data->r_url, data->title, data->or_url, red_url);
if ((opt.var & REPORT_TIME))
fprintf (f_full, "\n<font color=\"#000055\" size=\"-1\">Downloaded on %s</font>", ctime (&(data->mod_time)));
}
/********************** Function url_is_valid *********************************
* Checks if the url ("str") is valid. *
******************************************************************************/
bool
url_is_valid (char *str)
{
if (str)
{
char *w_prfx;
w_prfx = strstr (str, "://");
if (!w_prfx)
return 0;
w_prfx += 3;
if (!isalpha (*w_prfx) && !isdigit (*w_prfx))
return 0;
return 1;
}
else
return 0;
}
/********************** Function check_file_stream ****************************
* Checks if a file ("path") was opened successfully. *
******************************************************************************/
void
check_file_stream (FILE *f, char *path)
{
if (f == NULL)
{
if (errno == EACCES)
fprintf (stderr, "\n\nnolce:\nCould not create index file `%s', haven't permission.", path);
else
fprintf (stderr, "\n\nnolce:\nCould not create index file `%s'", path);
perror_exit ("Exiting...");
}
}
nolce-1.9.2/src/nolce.h 100644 0 0 5550 6506755017 12762 0 ustar root root
/*
* SEE COPYRIGHT NOTE IN THE FILE main.c
* Version 1.9.2
*/
/***************************** DEFINES *************************************/
#define CACHE_FILE "index.db"
#define WIN_CACHE_FILE "fat.db"
#define DEST_DIR "cached" /* Goes under $HOME */
#define SUMMARY_FILE "index.html" /* Under DEST_DIR */
#define DEFAULT_HTML "index.html" /* Under site's directory tree */
#define perror_exit(str) {perror ("\n" str); unlink (lock_file); exit (EXIT_FAILURE);}
#define VERSION "1.9.2"
/***************************** TYPEDEFS ************************************/
typedef unsigned char bool; /* Boolean True (1) or False (0) */
typedef struct node_of_btree
{
char *or_url;
char *url;
char *url_w_index;
char *r_url; /* Link to url or url_w_index */
int type;
char *file_name;
char *title;
size_t dup;
time_t mod_time;
struct node_of_btree *sx;
struct node_of_btree *dx;
}
NODE;
typedef struct cmd_options
{
unsigned int var; /* Bit-mapped */
int view_window;
char *cache_dir, *dest_dir, *summary_file, *str_tbc;
}
T_OPT;
/************************ FUNCTION DECLARATIONS ****************************/
void *gt_malloc (size_t size);
void *gt_realloc (void *aap, size_t size);
NODE *get_data (NODE * start, time_t cut_time, char *sub_str);
NODE *add_url (NODE * node, char *or_url, char *url,
int type, char *file_name, time_t mtime);
NODE *find_url (NODE * node, char *url);
bool check_html_url (char *str, bool only_test);
bool copy_file (char *url, char *file);
bool make_link (char *url, char *file);
bool url_is_valid (char *str);
bool process_reference (char *ref, char *base, char *mod_base,
char *file_name, FILE * out, int operation);
char *dirs_and_name (char *url);
char *cut_path (char *path, char *base);
char *check_path (char *cleaned, char *base, char *entry);
char *standardize (char *str);
char *relative_position (char *url, char *base, char *ref);
char *gt_strshift (char *name, int pos);
void process_html_file (NODE * node);
void process_arg (char *str, char **store, bool need_abs_path);
void sig_handler (int sig_number);
void help ();
void check_file_stream (FILE *f, char *path);
void print_error (char option, char *prg_name, int what_error);
void visit_tree (NODE * node, int operation);
void put_in_summary_files (NODE * data, bool close_all);
FILE *start_summary_files (int what, long *nd);
/***************************** OTHERS **************************************/
enum errors
{
illegal = 1, need_arg, already_supplied
};
enum opt_bits /* For T_OPT */
{
SILENT = 1, MISSING_IMAGES = 2, REPORT_TIME = 4,
NO_LINKS = 8, WIN_CACHE = 16
};
enum type_bits /* For NODE.type */
{
IS_HTML = 1, NEED_INDEX = 2, PR_MAIN = 4,
PR_REL = 8, SHOW = 16, PROCESSED = 32,
IGNORE_LOCK = 64, LINK_FOR_IMAGES = 128
};
enum what_index
{
DOM, FULL, CONST
};
nolce-1.9.2/src/scanner.lex 100644 0 0 6561 6506751175 13660 0 ustar root root %{
#include <stdio.h>
enum rets {TITLE=1, BASE, REF, IMG};
enum stats {STD = 0, I_BASE=1, N_I_R};
enum rs {generic, image};
char title[256], img_other[512];
char *img_src;
int i_k = 0, t_k = 0, ref;
unsigned char i_a_tag, found, i_frame;
%}
%s STD I_BASE I_XMP I_TAG N_I_R I_TITLE I_IMG
SP [ \t\r\n]
a [Aa]
b [Bb]
c [Cc]
d [Dd]
e [Ee]
f [Ff]
g [Gg]
h [Hh]
i [Ii]
j [Jj]
k [Kk]
l [Ll]
m [Mm]
n [Nn]
o [Oo]
p [Pp]
q [Qq]
r [Rr]
s [Ss]
t [Tt]
u [Uu]
v [Vv]
w [Ww]
x [Xx]
y [Yy]
z [Zz]
%%
<STD>"<"{b}{a}{s}{e}{SP}* BEGIN I_BASE;
<I_BASE>{t}{a}{r}{g}{e}{t}{SP}*={SP}*[^ >]+ {
fprintf (yyout, "<base %s>", yytext);
}
<I_BASE>[^<>= '\"\t\r\n]+ return BASE;
<I_BASE>">" BEGIN STD;
<I_BASE>[<>= '\"\t\r\n] |
<I_BASE>{h}{r}{e}{f}{SP}*={SP}* { /* Discard */ }
<STD>"<"{t}{i}{t}{l}{e}{SP}*">" { ECHO;
BEGIN I_TITLE;
}
<I_TITLE>.|\n { ECHO;
if (t_k < 255 )
title[t_k++] = *yytext;
}
<I_TITLE>"<""/"{b}{o}{d}{y}{SP}*">" {
ECHO;
title[0] = '\0';
t_k = 0;
BEGIN STD;
return TITLE;
}
<I_TITLE>"<""/"{t}{i}{t}{l}{e}{SP}*">" { ECHO;
title[t_k] = '\0';
t_k = 0;
BEGIN STD;
return TITLE;
}
<N_I_R>\"[^">]*\" |
<N_I_R>[^> \t\r\n]* { if (ref == generic)
{
BEGIN I_TAG;
return REF;
}
else
{
img_src = (char *) realloc (img_src, yyleng + 1);
strcpy (img_src, yytext);
BEGIN I_IMG;
}
}
<N_I_R>">" {
yyless (0);
BEGIN I_TAG;
}
<STD>"<""/"{a}{SP}*">" { if (found)
{ found = 0;
fputs ("</I>", yyout); }
ECHO;
}
<STD>"<"{f}{r}{a}{m}{e}{SP}+ { ECHO;
yyless(6);
i_frame = 1;
BEGIN I_TAG;
}
<STD>"<"{a}{SP}+ { ECHO;
yyless(2);
i_a_tag = 1;
BEGIN I_TAG;
}
<STD>"<"[a-zA-Z]+ { ECHO;
BEGIN I_TAG;
}
<STD>"<"{i}{m}{g}{SP}+ {
yyless (4);
BEGIN I_IMG;
}
<I_IMG>{SP}+{s}{r}{c}{SP}*= {
ref = image;
BEGIN N_I_R;
}
<I_IMG>{SP}+{l}{o}{w}{s}{r}{c}{SP}*={SP}*[^ \n\t\r>]+ {}
<I_IMG>[^>] if (i_k < 511) img_other[i_k++] = *yytext;
<I_IMG>">" {
img_other[i_k] = '\0';
i_k = 0;
BEGIN STD;
return IMG;
}
<I_TAG>{SP}+{h}{r}{e}{f}[{SP}="]+{m}{a}{i}{l}{t}{o}:[^>]*">" |
<I_TAG>{SP}+{h}{r}{e}{f}[{SP}="]+{n}{e}{w}{s}:[^>]*">" {
ECHO;
i_a_tag = 0;
BEGIN STD;
}
<I_TAG>{SP}+{h}{r}{e}{f}{SP}*={SP}* |
<I_TAG>{SP}+{s}{r}{c}{SP}*={SP}* |
<I_TAG>{SP}+{b}{a}{c}{k}{g}{r}{o}{u}{n}{d}{SP}*={SP}* {
ECHO;
ref = generic;
BEGIN N_I_R;
}
<I_TAG>\"([^"]|(\\\"))+\" ECHO;
<I_TAG>">" { ECHO;
if (i_frame)
i_frame = 0;
if (i_a_tag)
{ i_a_tag = 0;
if (found)
fputs ("<I>", yyout);}
BEGIN STD;
}
<STD>"<""/"[a-zA-Z]+{SP}*">" ECHO;
<STD>"<"{x}{m}{p}{SP}*">" { ECHO;
BEGIN I_XMP;
}
<I_XMP>"<""/"{x}{m}{p}{SP}*">" { ECHO;
BEGIN STD;
}
<STD,I_TAG,I_XMP>[^<>= \t\r\n]* ECHO;
<STD,N_I_R,I_TAG,I_XMP>.|\n ECHO;
.|\n { /*
* This is used to enter STD mode
* every time yylex() is called.
*/
yyless (0);
BEGIN STD;
}
nolce-1.9.2/src/Makefile 100644 0 0 2617 6506754700 13150 0 ustar root root # Makefile for nolce 1.9.2
# (C) 1997-98 G. Trovato (g.trovato@usa.net)
########### Variables ############
DOCS_DIR=/usr/doc/nolce-1.9
BIN_DIR=/usr/bin
MAN_DIR=/usr/man/man1
LEX=flex # On some non-Linux systems, this must be
# deleted or commented.
LDFLAGS = -ldb -lfl # -lfl is needed bt flex.
CFLAGS = -Wall # Compiler flags.
LFLAGS = # Possible lex command flags. If you're using
# flex, -Cf is advised.
DEFINES= # Use -Darray if your lex defines yytext as a
# char array rather than char pointer
# (like on some Slackware installatios)
########### End of user variables ############
OBJS=main.o utils.o lex.yy.o
SHELL = /bin/sh
all: nolce
nolce: $(OBJS)
$(CC) -o nolce $(OBJS) $(LDFLAGS)
main.o: main.c nolce.h
$(CC) -o main.o -c main.c $(CFLAGS) $(DEFINES)
utils.o: utils.c nolce.h skeletons.c
$(CC) -o utils.o -c utils.c $(CFLAGS)
lex.yy.o: lex.yy.c
$(CC) -o lex.yy.o -c lex.yy.c
lex.yy.c: scanner.lex
$(LEX) $(LFLAGS) scanner.lex
clean:
-rm -f main.o
-rm -f utils.o
-rm -f lex.yy.c
-rm -f lex.yy.o
-rm -f nolce
DOCS = ../docs/LICENCE ../docs/README.html ../docs/frame_docs.html ../docs/frame_toc.html ../docs/CHANGES.html ../README.1st
install: nolce
cp nolce $(BIN_DIR)
-mkdir -p $(DOCS_DIR)
cp ../docs/nolce.1 $(MAN_DIR)
cp $(DOCS) $(DOCS_DIR)
uninstall:
-rm -f $(BIN_DIR)/nolce
-rm -f $(MAN_DIR)/nolce.1
-rm -rf $(DOCS_DIR)
nolce-1.9.2/.Nolce_was_known_as_netcache 100755 0 0 442 6506751176 16346 0 ustar root root #!/bin/sh
more +6 .Nolce_was_known_as_netcache
exit
##### End of commands.
##### Message:
Yes, netcache-1.4 of ftp://sunsite.unc.edu/pub/Linux/www/plugins was one of the previous versions of this program.
The name was changed because I discovered that netcache is trade mark.
G. Trovato
nolce-1.9.2/README.1st 100644 0 0 1455 6506751176 12312 0 ustar root root To build, go to src subdir and do make, or make install.
For nolce documentation, refer to the file README.html in the subdir docs.
It requires a frame and JavaScript capable browser (like Netscape 2.0+).
If such a browser isn't available, read frame_docs.html .
Look at the file LICENCE for terms about using this program.
Read also CHANGES.html for changes over previous versions.
IMPORTANT
=========
Read in the documentation the paragraph "Important notes" under the section
"Usage and what the program does" for info regarding how to avoid that some
visited pages aren't stored in the cache.
If you're using Slackware and experience a Segmentation fault error on nolce's
execution, it could be necessary to add -Darray to DEFINES, in the Makefile.
See the "Compatibility" section in the documentation.