WebMaven

WebMaven is a commercial computer program currently under development by Dick Goran, President of CFSNevada, Inc. Its general purpose is to serve as an off-line imaging and management facility for the Internet or intranets.

WebMaven captures and copies all or portions of publicly available material from locations on the World Wide Web (Web sites), or from a private intranet, to the user's local computer. At the same time, WebMaven "localizes" all of the HTML links to their respective values in the copied files. WebMaven is available in both a consumer edition and an enterprise edition. A FAQ sheet and the technical specifications for WebMaven are available for review.

WebMaven captures a remote Web site based on the site's "tree" structure. HTML links which are deemed to be out of tree are replaced with a link to a page indicating the out-of-tree reference. WebMaven does not leave remote links in the localized pages. Therefore, unlike the results from similar products, there are no broken links in the WebMaven localized files.
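The in-tree test described above could be sketched in Python roughly as follows. This is only an illustration: the function name `is_in_tree` and the rule it applies (same host, and a path beneath the directory of the starting page) are assumptions, since the text does not say how WebMaven actually decides what is out of tree.

```python
from urllib.parse import urljoin, urlsplit


def is_in_tree(root_url: str, page_url: str, link: str) -> bool:
    """Return True if `link`, as found on `page_url`, lies under the tree of `root_url`.

    Assumed rule (not taken from WebMaven): the link must use http or https,
    point at the same host as the root, and have a path that begins with the
    directory part of the root URL.
    """
    # Resolve relative links against the page on which they appear.
    target = urlsplit(urljoin(page_url, link))
    root = urlsplit(root_url)

    if target.scheme not in ("http", "https"):
        return False  # mailto:, ftp:, etc. are never part of the tree
    if target.netloc.lower() != root.netloc.lower():
        return False  # a different host is out of tree

    # Directory of the root page, always ending in "/".
    root_dir = root.path.rsplit("/", 1)[0] + "/"
    return (target.path or "/").startswith(root_dir)
```

For example, with a root of `http://www.example.com/showbiz/index.html` (a made-up address), the link `today/story1.html` is in tree, while `../sports/index.html` is not.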

WebMaven has a built-in JavaScript pre-processor and a Java virtual machine interpreter so Web sites containing either are handled by WebMaven without regard to your operating system software or your preferred Web browser.

WebMaven has no browser dependencies. You use your browser of choice to view the reports created while processing a Web site.

The enterprise edition of WebMaven serves as a powerful management tool for anyone responsible for the maintenance of a Web site, whether the site is large or small. WebMaven produces numerous management reports. A summary of these reports, along with sample copies of the individual reports, is available for review.

An uncrippled but limited-feature version of WebMaven will be available via anonymous download when the product becomes available.

WebMaven Advantages
There are drawbacks to reading material which is published electronically on the Internet. The user must be connected to the Internet via a modem or some other kind of connection in order to retrieve the material. That means that the connection must be maintained all of the time while the user is reading the material.

Similarly, downloading of Internet documents is single-threaded, except for the overlapped retrieval of included images provided by today's common browsers (Netscape Navigator / Communicator, Microsoft Internet Explorer, etc.).

WebMaven overcomes this by allowing up to nine download processes to run concurrently, thus optimizing the throughput between the user's computer and the Internet. WebMaven keeps the "pipe" full at whatever speed you are connected to the Internet. This is of particular advantage to users who are billed for connect time based on hours used.
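As a rough illustration of the idea of keeping several downloads in flight at once, here is a minimal Python sketch that uses a thread pool capped at nine workers. It is not WebMaven's implementation; the names `fetch`, `download_all`, and `MAX_WORKERS` are hypothetical, and the use of threads (rather than separate processes) is an assumption made for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

MAX_WORKERS = 9  # mirrors the "up to nine" concurrent downloads in the text


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return its body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_one=fetch, max_workers=MAX_WORKERS):
    """Fetch every URL concurrently; map each URL to its bytes or its exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so a slow server does not stall the rest.
        futures = {url: pool.submit(fetch_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

Passing the fetch function as a parameter keeps the sketch easy to try out without a network connection, by substituting a stand-in function.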

There can also be delays in retrieving subsequent pages after the user has finished with the current page. Internet users find this delay especially frustrating, causing them to abandon many Web sites.

Since multiple instances of WebMaven can run concurrently, if a server for a particular site is temporarily slow in responding, the other instances of WebMaven will keep Internet transmission going at maximum speed.

WebMaven Uses

 * Off-line browsing: By downloading Web sites with WebMaven you eliminate the delays associated with slow connections. There is no "world wide wait".
 * If you are concerned about your kids having unrestricted access to the Internet, download Web sites for them and they have unfettered access to the material including all of the text, graphics, and multimedia files without being connected to the Internet.
 * If you have a requirement to view all of the images included in a site, WebMaven creates an Image Group report that shows all of the images that were referenced on the retrieved site.
 * If you have employees who need access to Internet material but you don't want them surfing the net, downloading the material with WebMaven provides access to the material without being connected.
 * Have you ever saved Internet pages to your local hard drive for later reference only to find, when you view the local files, that they contain numerous broken links? WebMaven eliminates this problem by retrieving all of the referenced non-HTML files regardless of where they exist. However, WebMaven doesn't wander "out of tree" for referenced HTML files. Therefore, you don't wind up downloading megabytes worth of unwanted files.
 * Groups like libraries and classrooms with limited Internet access that need specific Web material for teaching purposes can use WebMaven to make the material available as local files on computers that are not connected to the Internet.


 * Distribution: Material created for browser viewing can be copied to media, such as CD-ROMs, for non-electronic distribution (e.g. postal mailing) without incurring the cost of separate maintenance and programming.
 * Catalogs, technical manuals, and other documentation can be delivered for disconnected viewing.
 * Businesses that would like to use their Web site as an advertising vehicle for computer owners who are not connected to the Internet can use WebMaven for this function.


 * Disconnected viewing: If you have a need to access Internet or intranet material without having a network connection, download the material with WebMaven and access it from your local hard drive.
 * For example, if you need access to Internet or intranet material on a laptop that you use without being connected, WebMaven will download the files for you and you don't have to worry about missing files.


 * Archiving:
 * Every business and organization maintains voluminous archives of financial and other business records. Yet, they probably don't have any archives of their Web site. With Web sites becoming an ever-increasing "business document", WebMaven is the ideal tool for archiving your Web site. Unless your Web site is exceptionally large, you can burn a CD-ROM of the Web site files that WebMaven retrieves and have future access to the site material as it existed at the time the archive was created. Even backing up the files on your site is not as good as a WebMaven archive since the odds are quite good that the tree structure of the Web site will change. Furthermore, in order to reference the backed-up files you would have to have a spare Web server available to restore the backups to.
 * The legal community can archive a Web site for evidentiary use. Regardless of the side of an issue you represent, a WebMaven archive of a Web site gives you an operational replica of the Web site.

Legal statutes may prohibit downloading of certain material. Caution is advised on the reuse of copyrighted material.
 * Unattended access: If you want to download collections of images from a Web site, WebMaven will download the images for you without you having to watch your modem lights flash.
 * Similarly, if you want to download a collection of large files, for example sound or movie files, let WebMaven grab those files for you without you having to be at your computer.


 * Selective retrieval: Suppose you want just ShowBiz Today from CNN's site. WebMaven will retrieve just the "tree" that you specify as a remote path without you having to wade through other files.
 * Local file processing: Have you ever wished that you could process the contents of a Web site with your own tools? You can use WebMaven to retrieve the remote files and process them with your local programs.
 * Site management: Use WebMaven to verify that all of your Internet links are valid. WebMaven identifies all bad HTML links.
 * Use WebMaven to identify dead HTML links on your "Links" pages. WebMaven lists all domains that cannot be referenced. By removing these "dead" links, you speed up a browser's rendering of your pages.
 * If your Web pages contain a lot of graphics, most Web browsers will render your pages more quickly if the physical dimensions (height and width) of the graphic images are correctly specified in your HTML documents. With the WebMaven Image Exception report you can quickly identify any image references that contain incorrect or omitted values.
 * Most of the current Web browsers provide a "mouse over", or bubble pop up, for all of the images on a Web page. The value that is displayed comes from the ALT="" value specified for the image. The WebMaven image exception report will identify all image references where the ALT="" value has been omitted.
 * Also, the WebMaven Title report will display all of these ALT="" values and identify the pages where the images are referenced.
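The kind of check behind the Image Exception report described in the list above can be approximated with Python's standard HTML parser. The sketch below is an assumption-laden illustration, not WebMaven's own logic: the names `ImageAudit` and `audit_images` are hypothetical, an attribute is treated as omitted when it is absent or empty, and no attempt is made to verify that the stated dimensions match the actual image file.

```python
from html.parser import HTMLParser

REQUIRED = ("width", "height", "alt")  # attributes the report looks for


class ImageAudit(HTMLParser):
    """Collect <img> tags that omit WIDTH, HEIGHT, or ALT."""

    def __init__(self):
        super().__init__()
        self.exceptions = []  # list of (src, [names of missing attributes])

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        values = dict(attrs)
        missing = [name for name in REQUIRED if not values.get(name)]
        if missing:
            self.exceptions.append((values.get("src") or "(no src)", missing))


def audit_images(html: str):
    """Return the image exceptions found in one HTML document."""
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.exceptions
```

Running `audit_images` over each retrieved page and recording the page name alongside each result would give a per-page listing similar in spirit to the report.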

WebMaven as a Webmaster's Tool
The enterprise edition of WebMaven is a powerful tool for anyone who has the responsibility of maintaining a Web site.

WebMaven is the Webmaster's dream when it is necessary to move a Web site from one domain to another. With WebMaven's localize function (described below), Web pages containing complete URIs within a specified domain will be converted to relative URIs. Localize can be set to leave full URIs on other domains intact.
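The conversion of complete URIs to relative URIs might be sketched as below. This is a simplified Python illustration under stated assumptions: the function name `to_relative` is hypothetical, only the host name is compared against the specified domain, and the rewriting of the HTML itself is left out.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(page_url: str, link: str, domain: str) -> str:
    """Rewrite `link` relative to `page_url` if it points into `domain`.

    Links on other domains, and links that are already relative,
    are returned unchanged.
    """
    target = urlsplit(link)
    if target.netloc.lower() != domain.lower():
        return link

    page_dir = posixpath.dirname(urlsplit(page_url).path) or "/"
    target_path = target.path or "/"
    relative = posixpath.relpath(target_path, start=page_dir)

    # relpath drops a trailing slash; restore it for directory targets.
    if target_path.endswith("/") and not relative.endswith("/"):
        relative += "/"
    # Carry over any query string and fragment.
    if target.query:
        relative += "?" + target.query
    if target.fragment:
        relative += "#" + target.fragment
    return relative
```

Because the result no longer names the host, the same pages keep working after they are moved to a different domain, which is the scenario the paragraph above describes.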

WebMaven produces numerous reports (samples) that assist the Webmaster in maintaining a Web site.

In addition to the [Unresolved Domain Names], [Links Exception Report], and [Syntax Error Report] produced by all editions of WebMaven, the enterprise edition provides the reports shown in the following table to assist the Webmaster in managing his or her site. All of the reports are shown in the [WebMaven Summary Report] with footnotes (though they appear on the side) about the contents of each report.

Since some of these reports can be excessively large, the content of the large reports is also produced as delimited ASCII files which can be imported in programs of the user's choice.
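To illustrate what a delimited ASCII export of report rows could look like, here is a small Python sketch. The column names, the `|` delimiter, and the function name `write_delimited` are all assumptions; the text does not specify the actual file layout.

```python
import csv
import io


def write_delimited(rows, header, out, delimiter="|"):
    """Write a header line and data rows as delimited text to `out`."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Usage: build the text in memory; a real export would open a file instead.
buffer = io.StringIO()
write_delimited([("logo.gif", "index.html")], ("image", "page"), buffer)
```

A file in this shape can be loaded by most spreadsheet and database tools by naming the delimiter at import time.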

The [Unresolved Domain Names] and [Links Exception Report], which are the only reports produced by an unlicensed copy of WebMaven, contain a link to e-mail the content of each respective report to the Webmaster of the site that was retrieved. If "Webmaster" does not appear as an e-mail address within the retrieved files, the most commonly used e-mail address in the retrieved files at the domain will be used as the mailto: address for the reports. Consistent with the spirit of Internet privacy, both Netscape Navigator and Internet Explorer will prompt for permission before actually sending the report.
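The address-selection rule just described (prefer "Webmaster", otherwise fall back to the most frequently used address at the domain) can be sketched in Python as follows. The function name `report_address` and the regular expression used to find `mailto:` links are assumptions made for illustration.

```python
import re
from collections import Counter

# Capture the address part of a mailto: link, stopping at quotes, "?", ">" or spaces.
MAILTO = re.compile(r"mailto:([^\"'?>\s]+)", re.IGNORECASE)


def report_address(pages, domain):
    """Pick the mailto: address for a report from retrieved HTML pages.

    Returns webmaster@<domain> if it appears anywhere, otherwise the most
    common address at `domain`, otherwise None.
    """
    suffix = "@" + domain.lower()
    addresses = [
        address.lower()
        for html in pages
        for address in MAILTO.findall(html)
        if address.lower().endswith(suffix)
    ]
    for address in addresses:
        if address.split("@", 1)[0] == "webmaster":
            return address
    if not addresses:
        return None
    return Counter(addresses).most_common(1)[0][0]
```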

The included reports are from an actual site retrieved with WebMaven. All of the files from the site are also included, but localized to their current domain and path. Some of the report files are large and may be slow to download. The [SampleReports.zip file] (5.2 MB) contains all of the files included in the SampleReports and subordinate directories and can be downloaded.

Functions
A Q & A FAQ sheet, along with the WebMaven technical specifications, is available for review.

Localize
The "localize" capability is one of the unique features of WebMaven. If a Web site page is copied to the user's computer with currently available means, graphics, sound, and text images that are referenced on the Web page must also be copied; otherwise, the local copy is rendered with portions omitted (broken links). While these related images or files can also generally be copied, they must be retrieved one-by-one.

Even after they are copied to the local computer, they may not be rendered properly by the user's Internet browser since the references to them (HTML links) in the original document may be described in a way that references their original location on the Internet rather than the local copy. The localize capability of WebMaven overcomes this. All HTML links in the local copies of the retrieved documents are converted, where necessary, so that they are rendered properly when the local copy of the Web page is viewed.

Another unique capability of WebMaven is its ability to retrieve the operative portions (class files) of Java-based Web applications. WebMaven not only downloads all class files explicitly referenced on the Web site, it also retrieves all of the other class files required by a Java applet or object. Where possible, any Java-related data files are also retrieved.

The localized documents/files, including the related images, can be compressed (zipped) by the user for local archiving. When the archive is decompressed (unzipped), the integrity of the retrieved documents remains intact regardless of the status, or the presence or absence, of the original Web site documents.

Chron
One of the purposes of the Internet is to provide timely dissemination of information, and some Web sites are updated at frequent intervals. Some are even built dynamically for each individual request. As a result, some sites must be viewed on a regular basis, or the reader risks missing changing information which is not archived by the Web site owner.

The "chron" capability of WebMaven allows the program to log onto the Internet at regularly or irregularly scheduled dates and times and retrieve specified Web site documents. This feature allows dynamically changing Web sites to be copied to the local computer in the user's absence. These Web documents can then be viewed at the reader's leisure.

One of the best examples of a use for WebMaven is the retrieval of a multi-page document from the Internet like a white paper or book. These documents are typically published in a tree-like structure with the first page branching off into one or more subordinate branches. WebMaven has the ability to retrieve all of the subordinate pages within the tree structure that comprise the complete document.

Another feature of WebMaven is the ability to download desired information from a Web site and store it on a computer's disk. Portable computers such as laptops and notebooks can be used to view this information without being connected to the Internet.

There are many other day-to-day uses for this program (many of which the author hasn't thought of yet).

Inherent in all of the applicable uses of the program is its ability to optimize the transfer speed of the material from the Internet to the user's computer. Furthermore, it runs unattended so that the material can be gathered (downloaded) while the user is away from his or her computer.

Operating Environment
WebMaven is suitable for both the individual Internet surfer and the corporate Internet user. It will run on any personal computer running Windows 95 or Windows NT from Microsoft, or OS/2 from IBM.

WebMaven is both CPU and memory intensive when retrieving large Web sites. A minimum of a 133 MHz Pentium CPU is recommended for both the personal and enterprise versions of WebMaven, with faster CPUs preferred.

The personal version of WebMaven requires a minimum of 32 MB of memory with 64 MB recommended. The enterprise version requires a minimum of 64 MB of memory with 128 or 256 MB recommended.

Hard drive requirements are dependent on the Web sites being retrieved.

Competition
There are no known programs available that contain all of the functions included in WebMaven. While there are other programs whose purpose is to capture or mirror Web sites, they do not contain many of the technological capabilities of WebMaven. WebMaven maintains the integrity of the path (tree) structure from the original Web site along with the original names unless they conflict with the client file system. WebMaven does not encode the files it downloads, making the material viewable by any browser and portable without conversion or "export".

WebMaven processes Web sites created with CGI scripting as well as sites that use redirection, scripting, and Java applets. WebMaven has the ability to process each Java class as it is processed by the Java virtual machine and is therefore able to retrieve all associated classes and most data files referenced by these classes.

The only restriction on what WebMaven can process is HTML links that are dynamically generated through scripting.

Distribution
Initial product introduction and distribution is planned via Internet and catalog mail order sites.

A trial version of the program will be made available via the Internet for worldwide distribution (anonymous download) at no cost to the user. The trial version will have some of its features limited in either scope or size.

Full functionality of the trial version can be enabled by the user's purchasing and installing an electronic "key" for the program. The key will be available from established distribution sources.