README



(1)  To become familiar with this module, first run the script:

             retrieve_with_VSM.pl

     This script carries out basic VSM-based model construction and
     retrieval.

     When you run the above script, you will be asking the module to
     retrieve the Java files in the 'corpus' directory that match the
     un-commented query shown at the top of the script.  Try running the
     script with one of the other queries by uncommenting the appropriate
     query line.
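
     The idea behind VSM-based retrieval can be sketched as follows.  This
     is only an illustration, not the module's code: this README does not
     say how documents are scored, so the sketch assumes the textbook
     choice of cosine similarity between raw term-frequency vectors.  The
     function names and the tiny in-memory corpus are hypothetical
     stand-ins for the files in the 'corpus' directory.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: lowercase word -> occurrence count."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two term-frequency vectors."""
    # Counter returns 0 for words that are missing from vec_b.
    dot = sum(count * vec_b[word] for word, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(corpus, query):
    """Return (name, score) pairs, best match first, dropping zero scores."""
    query_vec = term_vector(query)
    scores = {
        name: cosine_similarity(term_vector(text), query_vec)
        for name, text in corpus.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0]


# Hypothetical stand-in for the 'corpus' directory (assumption, not real files).
corpus = {
    "StringUtil.java": "string buffer append string",
    "FileReader.java": "file reader read buffer",
    "Sorter.java": "array sort compare",
}

ranking = rank_documents(corpus, "string buffer")
for name, score in ranking:
    print(f"{name}  {score:.4f}")
```

     Changing the query string here plays the same role as uncommenting a
     different query line in the script.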



(2)  Next run the script:

             retrieve_with_LSA.pl

     This script carries out basic LSA-based model construction and
     retrieval.  As with the previous script, you will be asking the
     module to retrieve the Java files in the 'corpus' directory that
     match the un-commented query shown at the top of the script.  Try
     running the script with one of the other queries by uncommenting the
     appropriate query line.



(3)  Both of the scripts mentioned above store the VSM and LSA models they
     create in disk-based database files.  To retrieve from the disk-based
     VSM model created by the 'retrieve_with_VSM.pl' script, run the script

             retrieve_with_disk_based_VSM.pl

     Try running this script with different queries.  You can uncomment one
     of the other queries that are currently commented out or you can 
     create fresh queries of your own.



(4)  To retrieve using the disk-based LSA model created by the previous 
     execution of the script 'retrieve_with_LSA.pl', now execute the script

             retrieve_with_disk_based_LSA.pl

     Try running this script with different queries.  You can uncomment one
     of the other queries that are currently commented out or you can 
     create fresh queries of your own.



(5)  For your first experiments with measuring the accuracy of retrieval  
     performance, execute the script

             calculate_precision_and_recall_for_VSM.pl
   
     This script first tries to estimate the relevancies of the corpus
     files to each of the queries in the file 'test_queries.txt'.  The
     module then calculates the two measures Precision@rank and
     Recall@rank.  The area under the Precision vs. Recall curve for each
     query is the accuracy of retrieval for that query (its Average
     Precision).  Averaging this result over all the queries yields the
     more global metric MAP (Mean Average Precision).
     
     As mentioned elsewhere in the module documentation, estimating
     relevancies in the manner the module does is not safe.  Relevancies
     are supposed to be supplied by humans.  All that a computer can do to
     estimate relevancies is to count the number of query words in a
     document.  But measuring relevancies in this manner creates a
     circular dependency between the retrieval algorithm and the
     estimated relevancies.
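
     The word-counting estimate described above can be sketched as
     follows.  This is an illustration under stated assumptions, not the
     module's implementation: the function name, the 'threshold'
     parameter, and the in-memory corpus are all hypothetical.

```python
def estimate_relevant_docs(corpus, query_words, threshold=2):
    """Treat a document as relevant when it contains at least
    `threshold` occurrences of the query words in total.

    `threshold` is a hypothetical parameter chosen for this sketch.
    """
    relevant = set()
    for name, text in corpus.items():
        words = text.lower().split()
        hits = sum(words.count(q.lower()) for q in query_words)
        if hits >= threshold:
            relevant.add(name)
    return relevant


# Hypothetical stand-in for the 'corpus' directory (assumption, not real files).
corpus = {
    "StringUtil.java": "string buffer append string",
    "FileReader.java": "file reader read buffer",
    "Sorter.java": "array sort compare",
}

estimated = estimate_relevant_docs(corpus, ["string", "buffer"])
print(sorted(estimated))
```

     The circularity is visible here: the same query-word counts that
     decide which documents are "relevant" also drive a word-matching
     retrieval score, so the retrieval is graded against its own
     evidence.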


(6)  Do the same as in the previous step, but this time for LSA, by
     executing the script

             calculate_precision_and_recall_for_LSA.pl


(7)  As mentioned above in the note for step (5), measuring retrieval
     accuracy requires human-supplied relevancy judgments.  Assuming
     that such judgments are made available to the module in the file
     named by the constructor parameter 'relevancy_file', you can run
     the script

          calculate_precision_and_recall_from_file_based_relevancies_for_VSM.pl

     for the case of VSM.  This script will print out the average
     precisions for the different test queries and calculate the MAP metric
     of retrieval accuracy.
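
     The measures named in steps (5) and (7) can be sketched as follows.
     This is a minimal illustration, not the module's code.  It assumes
     the common discrete form of Average Precision (the mean of the
     precision values at the ranks where relevant documents appear),
     which approximates the area under the Precision vs. Recall curve;
     the module's exact computation may differ.  The format of the
     'relevancy_file' is not described in this README, so the judgments
     are given here as a hypothetical in-memory dictionary.

```python
def precision_recall_at_ranks(ranked_docs, relevant_docs):
    """Return a (Precision@rank, Recall@rank) pair for every rank."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
        precision = hits / rank
        recall = hits / len(relevant_docs) if relevant_docs else 0.0
        points.append((precision, recall))
    return points


def average_precision(ranked_docs, relevant_docs):
    """Mean of Precision@rank over the ranks holding relevant documents."""
    if not relevant_docs:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
            total += hits / rank
    return total / len(relevant_docs)


def mean_average_precision(rankings, relevancies):
    """Average the per-query Average Precision over all queries."""
    scores = [
        average_precision(ranked, relevancies.get(query, set()))
        for query, ranked in rankings.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical retrieval results and relevancy judgments (assumptions).
rankings = {
    "q1": ["A.java", "B.java", "C.java"],
    "q2": ["B.java", "C.java", "A.java"],
}
relevancies = {
    "q1": {"A.java", "C.java"},
    "q2": {"C.java"},
}

for query, ranked in rankings.items():
    print(query, round(average_precision(ranked, relevancies[query]), 4))
print("MAP", round(mean_average_precision(rankings, relevancies), 4))
```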

(8)  To do the same as above but for the case of LSA, run the script

          calculate_precision_and_recall_from_file_based_relevancies_for_LSA.pl


