Welcome:
Thank you for taking the time to download PD, despite its awful name. PD is an easy-to-use web-content filtering utility. It crawls the web, starting wherever you specify, pulls all cross-linked web pages down to your hard drive, and then filters the files based upon the criteria you specify. Below is a full account of the program's functionality.
Please look over our unambiguous privacy guarantees.
Prelude:
When you first unleash PD on the Internet (clicking the crawl button), PD creates two folders inside the directory where you have PD located.
- One folder, "raw", is for the unfiltered files and images PD has downloaded from the Internet. Most of these files are "stubs" which tell PD not to revisit these URLs. If you delete these files, PD will revisit the URL. This can be a good thing and a bad thing. If you want to re-crawl a web site, you should delete the "html" & "htm" files. If you do not want to re-crawl, don’t delete anything in this folder.
- The other folder is named "ready". This is where PD puts the html files which point to the filtered images it has downloaded for you. The new "ready" files appear after PD has quit, so be patient when you click "Quit".
PD will download every file to your hard drive, and will follow every link unless you tell it not to. PD comes with, or will make, two files named "exclude-ending.txt" & "exclude-middle.txt".
- PD will follow every link which does not end with any of the strings you type into the "exclude-ending.txt" file. For example: if you've typed ".exe" in the "exclude-ending.txt" file, no URLs ending in ".exe" will be downloaded.
- PD will follow every link which does not contain any of the strings you type into the "exclude-middle.txt" file. For example: if you've typed "copyright" in the "exclude-middle.txt" file, no URLs containing the word "copyright" will be downloaded.
This is one of PD's most powerful features. You can use the "exclude-ending.txt" and "exclude-middle.txt" files to fine-tune your crawling output.
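To make the two exclusion rules concrete, here is a minimal Python sketch of how such a filter could work. It is not PD's actual code. It assumes one string per line in each file and case-sensitive matching, neither of which is confirmed above, and the function names are made up for illustration.

```python
def load_patterns(text):
    """Return the non-empty lines of an exclude file's contents.

    Assumption: one pattern per line (PD's real file format is not documented here).
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def should_follow(url, exclude_endings, exclude_middles):
    """Return True if the URL passes both exclusion lists."""
    # Rule from "exclude-ending.txt": skip URLs that end with a listed string.
    if any(url.endswith(ending) for ending in exclude_endings):
        return False
    # Rule from "exclude-middle.txt": skip URLs that contain a listed string.
    if any(middle in url for middle in exclude_middles):
        return False
    return True


# Stand-ins for the contents of the two files.
endings = load_patterns(".exe\n.zip\n")
middles = load_patterns("copyright\n")

for url in (
    "http://www.example.com/setup.exe",
    "http://www.example.com/copyright/index.html",
    "http://www.example.com/gallery/page1.html",
):
    print(url, should_follow(url, endings, middles))
```

Only the last of the three example URLs passes both lists, so it is the only one a crawler using this filter would follow.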
PD has three windows:
Main Window:
There are three buttons here & two boxes where PD will display text.
Button One: Crawl
Click this to start downloading from the Internet. If you are not already connected, PD will try to connect you to the Internet. If you cancel your dialer program, or otherwise disallow PD access to the Internet, it will not be able to do its job. (Sorry, there’s really no way around this fact.)
If you click this button before you set up the PD options, PD will open the Options window for you to choose your starting point and other options (See Options Window).
(During PD crawling, this button is disabled, for your protection.)
Button Two: Options
Click this button to open the Options Window (See Options Window).
(During PD crawling, this button is disabled, for your protection.)
Button Three: Quit
Click this button to close PD. Simple.
The only catch with quitting is that PD is a networking application, so it may be busy waiting for a web server to respond and may not quit instantly. (Sorry about that.) If you click this and PD seems to have crashed, wait sixty seconds. PD should close; it’s a very well-behaved application.
Also, once you click this button, PD will move all the "ready" html files from deep inside the "raw" directory, to the "ready" directory. If you "End Task" on PD, these "ready" html files will not be transferred, and will not have "Go Next Page" links added to them. (It is best to give PD the time it needs to shut down.)
The Top Text Box:
This is where PD tells you its current status. The three lines PD draws tell you:
- The number of bytes downloaded in the file it is working on.
- The number of files PD has downloaded for you. (This is a grand total.)
- The number of megabytes PD has downloaded for you. (This is a grand total.)
The Bottom Text Box:
This is how PD tells you what it is doing. Typical messages will include:
- "Downloading Page"
- "Downloading Images"
- "Quitting" (PD understands you want to quit, but it is still cleaning up.)
- "All done" (PD is done working and is waiting for you to click the "Quit" button.)
Options Window:
"Start URL":
This is the URL where PD will start crawling. It will grab every image from this page, and follow every link on this page. All of this data will be stored on your hard drive, in a folder named "raw" (which will be created by PD) in the folder where PD is located. PD will download every image from the "followed links" and will follow all links on that page, until the "depth" option tells PD to stop crawling away from the start URL.
"This site requires a user name & password":
Have you subscribed to a site, and want to use PD on this site? Does this site allow you to create URLs like this:
http://username:password@www.website.com/memberarea/
If so, then you can use PD on this site too. Check this box & enter your user name & password in the two text areas. But if you already knew about creating URLs like the above, then you probably will not use this feature, will you?
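For readers who have not built such a URL by hand, the following Python sketch shows one way to place a user name and password in front of the host name. It is only an illustration of the URL form shown above, not PD's own code. The function name is hypothetical, and percent-encoding special characters in the credentials is an assumption about what a careful client would do.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def add_credentials(url, username, password):
    """Return the URL with "username:password@" placed before the host."""
    parts = urlsplit(url)
    # Percent-encode characters such as "@" or ":" so they cannot be
    # mistaken for URL delimiters (an assumption, not documented PD behavior).
    userinfo = quote(username, safe="") + ":" + quote(password, safe="")
    netloc = userinfo + "@" + parts.netloc
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(add_credentials("http://www.website.com/memberarea/", "username", "password"))
# prints: http://username:password@www.website.com/memberarea/
```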
"Minimum Image size" (Height, Width):
You can limit which images appear on the "ready" pages. If an image’s size is below the size (height and width, in pixels) which you have specified, the image will not appear on the "ready" pages.
If you enter "0", then PD will not eliminate any image based upon its dimensions.
If you don’t know what a pixel is, or how big 100 pixels is... try something between 100 & 150, until you get familiar with this concept.
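The size rule can be sketched in a few lines of Python. This is a guess at the logic, not PD's implementation. In particular, it assumes an image is rejected when either dimension falls below its minimum, and that "0" switches off the check for that dimension. The text above does not say whether PD requires one or both dimensions to be too small.

```python
def passes_size_filter(width, height, min_width, min_height):
    """Return True if an image is large enough to appear on a "ready" page.

    Assumptions: a minimum of 0 disables that check, and an image fails
    if EITHER dimension is below its minimum.
    """
    if min_width and width < min_width:
        return False
    if min_height and height < min_height:
        return False
    return True


# With a 100 x 100 pixel minimum:
print(passes_size_filter(200, 150, 100, 100))  # large photo
print(passes_size_filter(80, 150, 100, 100))   # narrow banner or button
print(passes_size_filter(10, 10, 0, 0))        # "0" means no filtering
```

Under these assumptions the large photo passes, the narrow banner is dropped, and with both minimums set to "0" even a tiny image is kept.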
"Session maximum":
(See "Crawl depth") PD could potentially download the entire Internet onto your hard drive, if you’re not careful. (Theoretically. That was not a warranty!) Session Maximum is how you keep at least a few megabytes left for other computing uses. If you can afford 300 megabytes of "files" to be downloaded, enter the number 300 here. Every time you click the "Crawl" button, this counter resets, so the total number of megabytes used by PD "raw" files will eventually exceed this single session maximum. There is no way to configure PD’s grand total maximum. (Sorry.)
If you enter "0", PD will not stop until every link available has been downloaded, or your hard drive is full.
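Here is a small Python sketch of how a per-session download budget like this might be tracked. It is an illustration only. The class and method names are invented, and it borrows the "one million bytes in a megabyte" convention used elsewhere in this document.

```python
BYTES_PER_MEGABYTE = 1_000_000  # convention used in this document


class SessionBudget:
    """Tracks bytes downloaded in one crawl session (hypothetical helper)."""

    def __init__(self, max_megabytes):
        # 0 means "no session maximum", as described above.
        self.max_bytes = max_megabytes * BYTES_PER_MEGABYTE
        self.downloaded = 0  # starts at zero for every new session

    def record(self, num_bytes):
        """Add the size of a finished download to the session total."""
        self.downloaded += num_bytes

    def exhausted(self):
        """Return True once the session maximum has been reached."""
        return self.max_bytes > 0 and self.downloaded >= self.max_bytes


budget = SessionBudget(300)
budget.record(299_000_000)
print(budget.exhausted())
budget.record(2_000_000)
print(budget.exhausted())
```

Because a fresh budget is created for each session, two 300 megabyte sessions can leave up to roughly 600 megabytes in the "raw" folder, which matches the warning above.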
"Maximum file time":
PD does not understand the concept "reasonable". It will download any file it comes across, no matter how long it takes, unless you specify your concept of an unreasonably long time to spend on one file. Tell PD the maximum amount of time it should waste on any single file, in minutes. (For fast connections five minutes may be excessive, but for 33.6 modems five minutes may be totally reasonable. "You Make The Call".)
If you enter "0", PD will complain. One is the smallest value for this option, but no maximum is enforced.
"Maximum file size":
(Again,) PD does not understand the concept "reasonable". It will download any file it comes across, no matter how large, unless you specify your concept of an unreasonably large file. Tell PD the maximum amount of hard drive space it should waste on any single file, in bytes. (One million bytes in a megabyte, folks, but if you’re only downloading jpegs, a full megabyte is a huge file. Stick to something under 500,000.)
If you enter "0", PD will not limit the files downloaded based upon their size.
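The two per-file limits ("Maximum file time" and "Maximum file size") can be pictured with the Python sketch below. It is a simplified model, not PD's code. The function name is hypothetical, the download is represented as a list of byte chunks, and abandoning the file as soon as a limit is crossed is an assumption about how the limits are enforced.

```python
import time


def read_with_limits(chunks, max_minutes, max_bytes, clock=time.monotonic):
    """Collect chunks of one file; return None if a per-file limit is exceeded.

    max_minutes must be at least 1 (0 is rejected, as described above).
    max_bytes == 0 means no size limit.
    """
    if max_minutes < 1:
        raise ValueError("maximum file time must be at least 1 minute")
    deadline = clock() + max_minutes * 60  # seconds
    data = bytearray()
    for chunk in chunks:
        data.extend(chunk)
        if max_bytes and len(data) > max_bytes:
            return None  # file is too large: give up on it
        if clock() > deadline:
            return None  # file is taking too long: give up on it
    return bytes(data)


print(read_with_limits([b"abc", b"def"], max_minutes=5, max_bytes=500_000))
print(read_with_limits([b"abc", b"def"], max_minutes=5, max_bytes=4))
```

The first call returns the whole six-byte "file"; the second gives up because the file grows past the four-byte limit.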
"Crawl depth":
(See "Start URL") PD follows every link on your start page (this is depth one), and follows every link on those pages (this is depth two), and then follows every link on every one of those pages (this is depth three), then (if you can believe this) PD follows every link on every one of those pages (this is depth four). You can guess what the other depths are, right?
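The idea of depth can be sketched as a breadth-first walk over links. The Python below is a toy model, not PD's crawler. The `get_links` function is a stand-in for "download this page and extract its links", and numbering the start page as depth zero is an assumption chosen to match the description above, where pages reached from the start page are depth one.

```python
from collections import deque


def crawl(start_url, get_links, max_depth):
    """Visit pages breadth-first, stopping max_depth links away from the start."""
    visited = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth >= max_depth:
            continue  # do not follow links on pages at the maximum depth
        for link in get_links(url):
            if link not in visited:  # never revisit a page
                visited.add(link)
                queue.append((link, depth + 1))
    return order


# A tiny made-up "web": each page maps to the links found on it.
web = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}

print(crawl("a", lambda url: web.get(url, []), max_depth=2))
# prints: [('a', 0), ('b', 1), ('c', 1), ('d', 2)]
```

Page "e" is three links from the start, so a crawl depth of two never reaches it. Each extra level of depth can multiply the number of pages, which is why the session maximum matters.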
"Page Color":
When PD finishes a session, it copies html files into the "ready" directory. This feature allows you to choose what the background color for those pages will be.
"Keep log file":
PD can write a log file so you can see what PD was doing, but unless you’re debugging, this is not a useful feature.
"Hang up modem":
If you check this box, PD will close your dialup connection when it is through. If you do not have a dialup connection, uncheck this option.
"Register your copy" Window:
There is one place to enter your registration code, and one place to see your serial number.
(The serial numbers are machine-specific, so place PD on the computer where you intend to run PD and email that serial number to p_depot@hotmail.com.)
The first time you run PD, go ahead and click "No". This identifies you as a "demonstration version" user.
In this mode, PD will:
- Distort the jpegs it downloads.
- Write all over the jpegs it downloads.
- Limit the amount of data it will download.
- Limit the amount of time it will run.
Even with all these limitations, we believe you will see the value PD has to offer you. When you send five dollars to p_depot@hotmail.com, we will email you a registration code which will remove all the above-mentioned limitations. Five dollars, one time, and PD will run on the registered computer.
If you are concerned about your privacy, please read our privacy statement.
If you have any other questions, comments, or concerns, please email us at p_depot@hotmail.com. We're real people. We will listen to you and respond.
Copyright 2001 p_depot@hotmail.com