WebMaven FAQ

FAQ Sheet (in progress and subject to change)

What are the functional differences between the Windows 95/98, Windows NT, and OS/2 versions of WebMaven?

There are no functional differences using WebMaven on the different platforms.

Is a WebMaven registration key good for all of the operating system platforms supported by WebMaven or is a different key needed to register each variant?

A single WebMaven key can be used across all operating system platforms. Also, if both a personal registration key and an enterprise key exist, the enterprise key takes precedence.

After purchasing a personal (non-enterprise) key, is there an upgrade path to purchasing an enterprise key?

Yes, 70% of the actual purchase price of the personal registration key can be applied to the suggested retail price of the enterprise key.

Can I run multiple copies of WebMaven concurrently?

Yes, subject to the resources available to your computer (line speed, available disk space, etc.), there is no limit to the number of instances of WebMaven which can run concurrently.

WebMaven can be installed on a server and run concurrently from any number of clients. The only restriction regarding multiple copies of WebMaven running concurrently is that only one copy may reference a specific local path at a time.

What happens if my Internet dial-up connection breaks while WebMaven is retrieving files?

So long as the dialer you are using to connect with your ISP (Internet service provider) automatically re-dials on a broken connection and completes a new connection within 75 seconds, WebMaven will continue as if the interruption had not occurred. If the interruption lasts longer than 75 seconds, WebMaven will, of necessity, ask you whether to continue after a re-connection has been made.

In the event that a new connection is not made within the allotted time, you are asked whether WebMaven should terminate. When WebMaven is ended before all of the appropriate files are retrieved, a WebMaven.CHK file is created. The .CHK file allows WebMaven to be restarted from where it left off.

Why does the WebMaven main task appear to stall periodically when retrieving certain sites?

WebMaven has to determine the IP address from the domain name server you have specified in your configuration. Though WebMaven only checks each domain name once, it is still dependent on responses from the domain name server. Therefore, a page with numerous different domain names will have an excessive amount of wait time.

This is the same reason why pages with a lot of active links (non-clickable references like images) load into your browser slowly.

WebMaven reports this time in the Domain Name Server Lookup Time Report which is available with the enterprise version of WebMaven.
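
The "look up each domain name once" behavior described above can be sketched as a small cache. This is only an illustration, not WebMaven's code: the class name LookupCache and the injectable resolver argument are assumptions made so the sketch can be run without a network connection.

```python
import socket
import time


class LookupCache:
    """Resolve each host name at most once and total the time spent waiting."""

    def __init__(self, resolver=socket.gethostbyname):
        self.resolver = resolver   # hypothetical hook; defaults to a real DNS lookup
        self.addresses = {}        # host name -> IP address (None if unresolved)
        self.wait_seconds = 0.0    # accumulated name server wait time

    def lookup(self, host):
        host = host.lower()        # domain names are not case sensitive
        if host not in self.addresses:
            started = time.perf_counter()
            try:
                self.addresses[host] = self.resolver(host)
            except OSError:        # socket.gaierror is a subclass of OSError
                self.addresses[host] = None
            self.wait_seconds += time.perf_counter() - started
        return self.addresses[host]
```

A page that references many different domains still pays one lookup per domain, which is where the wait time accumulates.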

Can I specify multiple remote paths on the same domain to be retrieved together?

Yes, with an enterprise registration key you can specify any number of paths on the same, or different, domains to be retrieved together. Continuity of links between the specified paths is preserved. Domain names, with and without leading www. and ftp. are considered synonymous so long as the names resolve to the same IP address.
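
As a rough sketch of the "synonymous domain names" rule, the comparison might look like the following. The function names and the resolve argument (any callable that maps a host name to an IP address) are hypothetical and are not part of WebMaven.

```python
def strip_prefix(host):
    """Drop one leading www. or ftp. from a host name."""
    host = host.lower()
    for prefix in ("www.", "ftp."):
        if host.startswith(prefix):
            return host[len(prefix):]
    return host


def same_site(host_a, host_b, resolve):
    """True when the names differ only by a leading www./ftp. and share an IP address."""
    if strip_prefix(host_a) != strip_prefix(host_b):
        return False
    return resolve(host_a) == resolve(host_b)
```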

How does WebMaven handle non-HTML related protocols - specifically mailto:?

In general, all non-HTML protocol tags result in the creation of out of tree pages. Mailto: links are left intact in the localized files. Therefore, mail messages can be created off line if your mail program supports deferred transmission of messages.

How does WebMaven handle forms on a retrieved page?

It depends on the METHOD= attribute in the form. If METHOD=POST, the form submission can only be handled by the server; the URI specified in the ACTION= attribute is dead on the local page.

If METHOD=GET then the URI specified in the ACTION= attribute is handled like any other link.
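
To illustrate the distinction, here is a minimal sketch that scans a page for forms and marks which ACTION= URIs could be followed like ordinary links. The class name FormScanner is an assumption for this example; the default of GET for a form with no METHOD= attribute follows the HTML specification.

```python
from html.parser import HTMLParser


class FormScanner(HTMLParser):
    """Collect (method, action, followable) for every form on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag != "form":          # the parser lower-cases tag and attribute names
            return
        attributes = dict(attrs)
        method = (attributes.get("method") or "GET").upper()
        action = attributes.get("action") or ""
        # Only a GET action can be treated like any other link.
        self.forms.append((method, action, method == "GET"))
```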

Why do links that are specified as options in a form selection list (e.g. ) function properly when the remote page is accessed via a browser but WebMaven reports that the files could not be retrieved (HTTP error 403 or 404)?

More than likely the form processing script on the server compensates for the path value specified in the VALUE clause whereas WebMaven references the remote file exactly as it is stated.

This is the result of poor Web design.

Why is there sometimes a difference between where a link takes me with a page that WebMaven has localized vs. the same link when I look at the on-line page?

If a page contains erroneous HTML (e.g. http:/www.abc.com - note the omitted slash) WebMaven, as well as different browsers, may interpret the incorrect HTML differently.
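
The example above can be reproduced with a standard URL parser. The sketch below uses Python's urllib, which is just one possible interpretation; the helper name describe is made up for the illustration.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return the scheme, host, and path that a strict parser sees in a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


# With the slash omitted there is no host at all; the name lands in the path.
broken = describe("http:/www.abc.com")
correct = describe("http://www.abc.com")
```

Here broken is ("http", "", "/www.abc.com") while correct is ("http", "www.abc.com", ""). A forgiving browser may guess the intended host instead, which is how the same link can lead to different places.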

Why does the WebMaven Syntax Error report indicate numerous HTML errors yet the referenced page renders correctly both in the on-line page as well as the page that has been retrieved and localized by WebMaven?

The WebMaven Syntax Error Report includes HTML tags that contain strings which do not comply with the HTML specification. The pages are rendered correctly because of the "forgiving" style in which most popular browsers operate.

In many instances, correcting these HTML errors will result in a page rendering more quickly by your browser. In some cases, correcting the HTML errors (particularly ambiguous paths - i.e. URIs that end with a directory name and no trailing slash) will also eliminate the need for your browser to request the page twice.
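
The ambiguity of a directory URI with no trailing slash shows up as soon as relative links are resolved against it. The following sketch uses the placeholder host example.com and a made-up file name page.htm.

```python
from urllib.parse import urljoin

# Without the trailing slash, "docs" is treated as a file name, so a relative
# link is resolved against its parent directory.
without_slash = urljoin("http://example.com/docs", "page.htm")

# With the trailing slash, "docs" is clearly a directory.
with_slash = urljoin("http://example.com/docs/", "page.htm")
```

without_slash is http://example.com/page.htm and with_slash is http://example.com/docs/page.htm. Servers typically answer the slash-less form with a redirect to the corrected URI, which accounts for the second request.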

Why are there so many INDEX!.HTM files in the directories created by WebMaven?

When an HTML link is specified with a path but no file name (e.g. http://cfsrexx.com/WebMaven/) neither WebMaven nor your favorite browser can identify the file name that will be served. Since WebMaven must have a local file name to save the file, it uses a default value of INDEX!.HTM. Registered copies of WebMaven allow the user to specify any default file name, overall or by site.
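
A minimal sketch of the fallback, assuming a hypothetical helper named local_file_name (the default_name parameter stands in for the user-specified default available in registered copies):

```python
import posixpath
from urllib.parse import urlsplit


def local_file_name(url, default_name="INDEX!.HTM"):
    """Use the file name from the URI, or the default when the URI names none."""
    path = urlsplit(url).path
    name = posixpath.basename(path)   # empty when the path ends with a slash
    return name or default_name
```

For http://cfsrexx.com/WebMaven/ this returns INDEX!.HTM, because the URI ends at the directory.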

Why do I see file names which have an underscore followed by 7 characters inserted before the file extension?

When possible, the same directory and file names used on the server are used by WebMaven (case preserved) when it retrieves files.

When WebMaven encounters a name on a server that is not compatible with the local client file system, it must create a suitable, unique name.
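
How the 7 characters are chosen is not documented here. Purely as an illustration of the idea, the sketch below derives them from a hash of the server name; both the hash and the list of characters treated as incompatible are assumptions, not WebMaven's actual method.

```python
import hashlib
import posixpath
import re

# Assumption: characters that common local file systems reject in file names.
INVALID = re.compile(r'[<>:"/\\|?*]')


def local_name(server_name):
    """Keep compatible names unchanged; otherwise build a unique substitute."""
    if not INVALID.search(server_name):
        return server_name                       # case preserved
    stem, ext = posixpath.splitext(server_name)
    # Assumption: 7 hex digits of a hash make the name unique and repeatable.
    tag = hashlib.sha1(server_name.encode("utf-8")).hexdigest()[:7]
    return INVALID.sub("", stem) + "_" + tag + INVALID.sub("", ext)
```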

Why do some of the directory names created by WebMaven include the domain name (without a preceding www. or ftp.)?

Out of tree directories include the abbreviated domain name to assure that paths are unique.

Why do I get different results with WebMaven when I specify a textual domain name (e.g. http://disneyland.com) than when I specify the dotted IP address for that domain name (e.g. http://208.218.3.18)?

This has nothing to do with WebMaven; rather, it is the way that Internet routing and Web servers work. To see for yourself, specify the dotted IP address for Disneyland (as shown above) and see what you get for a page. Then, enter the textual domain name (as shown above) and you will wind up at http://www.disney.com/Disneyland/.

Can I retrieve files with WebMaven onto a drive that uses the 8.3 file naming convention?

Yes and no. All of the Windows file systems (FAT16, FAT32, & NTFS) support long file names; however, OS/2 FAT partitions can NOT be used to retrieve Web sites. OS/2 users must specify HPFS partitions in the Local path value when a local path is specified in the Site Properties notebook for WebMaven.

Why does the time stamp (date and time) differ between the server and the retrieved file?

WebMaven converts the server time stamp, specified in GMT, to its local equivalent.
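
The conversion can be sketched as follows. The function name to_local and its local_zone parameter are hypothetical; passing no zone uses the time zone of the machine running the code.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime


def to_local(http_date, local_zone=None):
    """Convert an HTTP date given in GMT to the local equivalent."""
    stamp = parsedate_to_datetime(http_date)   # time zone aware, in GMT
    return stamp.astimezone(local_zone)        # None means the system's zone


# A client five hours behind GMT sees 08:49:37 GMT as 03:49:37 local time.
eastern = timezone(timedelta(hours=-5))
local = to_local("Sun, 06 Nov 1994 08:49:37 GMT", eastern)
```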

Why do the files created by WebMaven for out of tree links and HTTP errors use a single line feed character (vs. a carriage return / line feed pair) to terminate each line of the HTML file even though I am running on a Windows or OS/2 client?

WebMaven uses the same line termination sequence (either line feed or carriage return / line feed) that was used in the retrieved page.
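
A simple way to picture this is a routine that inspects the retrieved page and reuses whatever terminator it finds. The function names below are assumptions for the example.

```python
def line_ending(page_bytes):
    """Return the terminator used by a retrieved page (CR/LF pair or bare LF)."""
    return b"\r\n" if b"\r\n" in page_bytes else b"\n"


def write_lines(lines, page_bytes):
    """Join generated lines with the same terminator as the retrieved page."""
    eol = line_ending(page_bytes)
    return eol.join(lines) + eol
```

A page served from a Unix host usually arrives with bare line feeds, so files generated to match it use bare line feeds as well.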

How does WebMaven handle cookies?

WebMaven processes cookie requests independently from your browser. In other words, WebMaven manages cookies by receiving SET COOKIE data and returning that cookie data when appropriate.

The cookie data collected by WebMaven is not kept from run to run but is detailed in the Cookie Report. The WebMaven Cookie Report requires the appropriate WebMaven registration key.
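
The receive-and-return cycle can be sketched with an in-memory store that lasts for a single run. The class name RunCookies is made up, and the sketch deliberately ignores cookie paths, domains, and expiry dates.

```python
from http.cookies import SimpleCookie


class RunCookies:
    """Hold cookies per host for one run; nothing is written to disk."""

    def __init__(self):
        self.by_host = {}

    def receive(self, host, set_cookie_value):
        """Record the value of a Set-Cookie header received from a host."""
        jar = self.by_host.setdefault(host, SimpleCookie())
        jar.load(set_cookie_value)

    def header_for(self, host):
        """Build the Cookie header value to return to that host."""
        jar = self.by_host.get(host)
        if not jar:
            return ""
        return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```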

Can WebMaven process Web sites that use dynamically built URIs?

Unfortunately the answer is sometimes. Dynamically constructed URIs are a very complex issue and can result in URIs that WebMaven cannot process correctly.

WebMaven has a lite JavaScript interpreter built into it that can handle some URIs created within a script. However, if a script file is in a different directory than the page that references it (i.e. ) WebMaven will not be able to resolve the dynamically created URIs.

What happens if I retrieve files from a Unix server to a local client where the file system doesn't distinguish case yet preserves the case of file names and directories (i.e. Windows 95/98, NT, OS/2)?

With a WebMaven registration key, WebMaven defaults to preserving case for all retrieved files and directories. The default case preservation setting can be changed for each site. When two different file or directory names are found with the same characters except for case, WebMaven alters the second and subsequent occurrences of the name with trailing exclamation points (!) until the name is unique. Telling WebMaven to preserve case may result in multiple copies of the same file being stored on the local client if the Web site contains the same URIs with differing case. If you are going to retrieve a site known to be served from a file system that is not case sensitive (e.g. Windows NT), space and download time will be saved by specifying non-case sensitive URIs for the site.

Running WebMaven without a registration key causes a file named FileName.DOC and a file named filename.doc to be considered the same file, and only the first occurrence found by WebMaven will be retrieved.
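
The renaming rule for names that differ only by case might be sketched like this. Placing the exclamation points before the file extension is an assumption; the function name is hypothetical.

```python
import posixpath


def unique_local_names(server_names):
    """Map server names to local names that are unique when case is ignored."""
    taken = set()            # lower-cased local names already in use
    result = {}
    for name in server_names:
        candidate = name
        while candidate.lower() in taken:
            stem, ext = posixpath.splitext(candidate)
            candidate = stem + "!" + ext      # assumed placement of the "!"
        taken.add(candidate.lower())
        result[name] = candidate
    return result
```

Given FileName.DOC, filename.doc, and FILENAME.DOC in that order, the sketch keeps the first name and stores the others as filename!.doc and FILENAME!!.DOC.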

Can I use WebMaven to verify the links on my own Web site?

Yes, WebMaven creates a bad HTML link report. WebMaven also creates a bad IP address report. This report details any unresolved domain names or IP addresses for the retrieved site.

Both of these reports can also, at the user's option, e-mail a copy of the report detail to "Webmaster" at the domain. If "Webmaster" does not appear as an e-mail address within the retrieved files, the most commonly used e-mail address in the retrieved files at the domain will be used as the mailto: address for the reports. With Navigator 3 or later, or Internet Explorer 4.0 or later, this e-mail message is completely constructed by WebMaven. All that is necessary to send the report to the Webmaster, or alternate recipient, is to open the report file (WebMavenDomainException.htm or WebMavenLinksException.htm in the local path directory), click on "E-mail report to ...", and then send it from your mail program.

Note: E-mail programs other than Netscape or Internet Explorer may not support this facility in which case the contents of the report must be cut and pasted into the mail message. The To: and Subject: fields of the mail message should be correct as generated by WebMaven.
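
The recipient rule and the generated link can be sketched as below. Both function names and the sample subject text are assumptions for the illustration; only the To: and Subject: fields are built, since those are the fields described above.

```python
from collections import Counter
from urllib.parse import quote


def pick_recipient(addresses, domain):
    """Prefer Webmaster at the domain, else the most common address there."""
    suffix = "@" + domain.lower()
    at_domain = [a for a in addresses if a.lower().endswith(suffix)]
    for address in at_domain:
        if address.lower() == "webmaster" + suffix:
            return address
    if at_domain:
        return Counter(a.lower() for a in at_domain).most_common(1)[0][0]
    return None


def report_mailto(recipient, subject):
    """Build a mailto: URI with the Subject: field filled in."""
    return f"mailto:{recipient}?subject={quote(subject)}"
```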

Is there any convenient way to see what Java classes WebMaven has retrieved?

Yes, with a WebMaven registration key the Java class report can be generated.

What HTML tags does WebMaven handle?

WebMaven complies with the published HTML 4.0 specification. Therefore, it processes all HTML 4.0 tags which reference URIs. See the HTML tag processing table.

What HTTP client level does WebMaven present to a server?

If WebMaven is unregistered, it will function only as an HTTP/1.0 client. With a WebMaven registration key, the user can select either HTTP/1.1 (default) or HTTP/1.0 globally or per site.

When running as an HTTP/1.1 client, WebMaven is fully compliant with RFC 2068 (the HTTP/1.1 specification).
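
One visible difference between the two client levels is the request itself: HTTP/1.1 requires a Host: header, while HTTP/1.0 does not. The sketch below builds a bare request at either level; the function name build_request is hypothetical and real clients send additional headers.

```python
def build_request(host, path, version="1.1"):
    """Return the text of a minimal GET request at the chosen HTTP level."""
    lines = [f"GET {path} HTTP/{version}"]
    if version == "1.1":
        lines.append(f"Host: {host}")   # mandatory for HTTP/1.1 requests
    return "\r\n".join(lines) + "\r\n\r\n"
```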

If I enter my ID and password in the WebMaven site properties notebook, is it exposed to snoopers?

No, WebMaven encodes the password in all of the WebMaven.IPT files. The WebMaven.IPT files represent the settings notebook repository for each respective local path. Though the .IPT files themselves are plain ASCII and can be changed with your favorite editor, the password entry MUST never be changed.

How does WebMaven improve download time?

WebMaven downloading is multi-tasked. That means that while the main task parses the HTML pages, the download tasks retrieve files from the Web server asynchronously.

However, sites that have an exceptionally large number of links to other domains, particularly domains that do not respond, defeat all of the overlap that WebMaven would normally realize. This delay time is spent communicating with the domain name server used by your ISP. This is one of the reasons that some Web pages, particularly those with a large number of links to banner ads, take so long for your browser to render.

This name server time is reported as a separate value in the property values for the site.
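
The overlap between the main task and the download tasks can be pictured with a work queue served by several threads. This is a generic sketch of the technique, not WebMaven's engine: the function name retrieve, the fetch callable, and the worker count are all assumptions.

```python
import queue
import threading


def retrieve(urls, fetch, workers=4):
    """Hand URLs to download tasks that run alongside the main task."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def download_task():
        while True:
            url = pending.get()
            if url is None:            # stop marker from the main task
                break
            data = fetch(url)
            with lock:
                results[url] = data

    threads = [threading.Thread(target=download_task) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for url in urls:                   # a real main task would queue links as it parses
        pending.put(url)
    for _ in threads:
        pending.put(None)
    for thread in threads:
        thread.join()
    return results
```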

Can I use WebMaven with a proxy server?

All registered versions of WebMaven can be used through a proxy server.

What is the WebMaven.!!! file in the local path directory?

The WebMaven.!!! file is a sentinel that the download subtasks use to determine that the main task is still running. Removing this file causes the subtasks to erroneously terminate.

In the event that WebMaven terminates abnormally, which we hope never happens, the WebMaven.!!! file should have been erased. If it is not, it indicates a malfunction in the WebMaven engine.
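
A sentinel file of this kind is commonly used as sketched below. The function names are assumptions, and the sketch shows only the general pattern, not how WebMaven itself creates or checks the file.

```python
import os

SENTINEL = "WebMaven.!!!"


def start_main_task(local_path):
    """Create the sentinel when the main task begins."""
    open(os.path.join(local_path, SENTINEL), "w").close()


def main_task_running(local_path):
    """Download subtasks call this to decide whether to keep working."""
    return os.path.exists(os.path.join(local_path, SENTINEL))


def end_main_task(local_path):
    """Erase the sentinel when the main task ends."""
    os.remove(os.path.join(local_path, SENTINEL))
```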

Why do I occasionally see the Bad image replacement image on localized pages but not on the live page?

This image replaces graphic files that WebMaven could not retrieve. The most common reason WebMaven can't retrieve an image is that the URI for the image was constructed with either JavaScript or Java.

However, it is still possible for a broken link to appear in pages retrieved by WebMaven if an HTML file is sent by the server in place of the graphic file. This rarely occurs.

Why, on occasion, are there missing images on a retrieved page?

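If WebMaven finds a URI constructed with JavaScript and rendered via the document.write or document.writeln functions, it brackets the JavaScript function with comment indicators. The function is then not run on the localized page, so any images it would have written do not appear.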

Why does WebMaven look so complicated? I simply want to retrieve Internet site information.

Quite the contrary -- WebMaven is not complicated at all. Simply install it, specify a local and remote path and push "Start".

The options that are available are intended for the more experienced user, or the novice user after he/she becomes comfortable with WebMaven. Even most enterprise users will run with the default options.

Also, there is both hint text as you navigate around WebMaven and context-sensitive help. Simply give any object of the WebMaven windows the focus (either with your mouse or the tab key) and press F1. A full explanation of the item will be displayed.

Is there anything that WebMaven can't do (within its stated purpose, of course)?

Yes, there are some things that are beyond WebMaven's capability:

WebMaven cannot retrieve some URIs which are dynamically created via JavaScript or Java.