README

These scripts are designed to make use of the Enron email dataset (see http://www.cs.cmu.edu/~enron/). They try to retrieve contact information from the messages and create an address book (VCARD format) from them.

To start the whole process, just run './run.sh'. It needs wget, tar, gzip, cat, sort, uniq and PHP.

The scripts are:
- extract_contacts.php
  Reads the mail files, analyzes their addressee information (names, company names, email addresses) and outputs it.

  Takes about 4 minutes on a Pentium 4 2.4 GHz to read 128326 mail files.

- create_vcards.php
  Reads the output from extract_contacts.php and creates a VCARD address book from it. Additional information, such as telephone numbers, birthdays and photographs, is randomly generated and added to the VCARDs.

  Takes about 20 seconds on a Pentium 4 2.4 GHz to create 32072 VCARDs.

- add_attachments.php
  Randomly adds fake attachments to the email messages, in order to create a more realistic dataset.
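
The first two steps above (collect addressees, then emit VCARDs with random extras) can be illustrated roughly as follows. This is a sketch in Python, not the PHP code shipped here: the function names extract_contacts and to_vcard, the choice of the From/To/Cc headers, the vCard 3.0 layout and the phone number pattern are all assumptions made for the example. Photographs are left out.

```python
import random
from datetime import date, timedelta
from email import message_from_string
from email.utils import getaddresses


def extract_contacts(raw_messages):
    """Collect unique (name, address) pairs from From/To/Cc headers."""
    contacts = {}  # lowercased address -> display name ("" if none seen yet)
    for raw in raw_messages:
        msg = message_from_string(raw)
        headers = []
        for field in ("From", "To", "Cc"):
            headers.extend(msg.get_all(field, []))
        for name, addr in getaddresses(headers):
            addr = addr.strip().lower()
            if not addr:
                continue
            # Keep the first non-empty display name seen for an address.
            if not contacts.get(addr):
                contacts[addr] = name.strip()
    return sorted((name, addr) for addr, name in contacts.items())


def _escape(text):
    # Escape the characters that are special inside vCard text values.
    return text.replace("\\", "\\\\").replace(";", "\\;").replace(",", "\\,")


def to_vcard(name, addr, rng):
    """Build one vCard 3.0 entry; phone and birthday are random filler."""
    display = name or addr
    # Assumption: the last word of the display name is the family name.
    given, _, family = display.rpartition(" ")
    phone = "+1-555-%04d" % rng.randint(0, 9999)
    birthday = date(1950, 1, 1) + timedelta(days=rng.randrange(365 * 40))
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        "N:%s;%s;;;" % (_escape(family), _escape(given)),
        "FN:%s" % _escape(display),
        "EMAIL;TYPE=INTERNET:%s" % addr,
        "TEL;TYPE=WORK:%s" % phone,
        "BDAY:%s" % birthday.isoformat(),
        "END:VCARD",
    ]
    return "\r\n".join(lines) + "\r\n"
```

Passing a seeded random.Random instance as rng makes the generated filler reproducible between runs, which helps when the address book is used as test data.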

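Adding fake attachments to a message could look roughly like the sketch below. Again this is Python for illustration only, not the PHP in add_attachments.php: the function name add_fake_attachment, the probability and size parameters, the application/octet-stream type and the file name pattern are all hypothetical.

```python
import random
from email import message_from_string, policy


def add_fake_attachment(raw, rng, probability=0.3, max_bytes=2048):
    """Parse a raw message and, with the given probability, attach random bytes."""
    msg = message_from_string(raw, policy=policy.default)
    if rng.random() >= probability:
        return msg  # leave this message untouched
    # Converting to multipart only keeps the existing body if the message
    # carries a Content-* header, so make sure there is one.
    if "Content-Type" not in msg:
        msg["Content-Type"] = "text/plain"
    size = rng.randint(1, max_bytes)
    payload = bytes(rng.getrandbits(8) for _ in range(size))
    msg.add_attachment(
        payload,
        maintype="application",
        subtype="octet-stream",
        filename="attachment_%04d.bin" % rng.randint(0, 9999),  # made-up name
    )
    return msg
```

A caller would write msg.as_bytes() back to the mail file; a probability of 0 leaves every message unchanged, a probability of 1 attaches to every message.
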
(c) 2007 Robert Zwerus <arzie@dds.nl>