
wespiva — Web Spider Validator


Web Spider Validator, wespiva for short, combines

  1. a web spider (robot, crawler), which follows the links between web pages,
  2. and an XHTML validator, which checks whether a page contains only valid tags, attributes and attribute values.
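
To illustrate what checking tags and attributes means, here is a minimal sketch in Python. It is not wespiva's implementation: the `ALLOWED` table is a tiny, hypothetical subset chosen for the example, whereas a real validator would work from the full XHTML DTD and would also check attribute values.

```python
import xml.etree.ElementTree as ET

# Tiny illustrative subset (assumption); a real validator would use the full XHTML DTD.
ALLOWED = {
    "html": set(),
    "body": set(),
    "p": {"class", "id"},
    "a": {"href", "class", "id"},
}


def find_issues(xhtml):
    """List unknown tags and disallowed attributes in a well-formed document."""
    issues = []
    for element in ET.fromstring(xhtml).iter():
        if element.tag not in ALLOWED:
            issues.append(f"unknown tag <{element.tag}>")
            continue
        for name in element.attrib:
            if name not in ALLOWED[element.tag]:
                issues.append(f"attribute '{name}' not allowed on <{element.tag}>")
    return issues


page = '<html><body><p align="left">hi</p><blink>x</blink></body></html>'
print(find_issues(page))
```

The sketch reports the `align` attribute on `<p>` and the `<blink>` element, because neither appears in the example table.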


Description

The purpose of this tool is to ensure high-quality, standards-compliant websites.
Xenu's Link Sleuth is a great tool for spidering and finding dead links, but it does not validate pages.
The w3.org validator is a great validation tool, but it checks only a single page at a time and is often overloaded and slow.
wespiva overcomes both restrictions by spidering and validating in one pass. It also helps when migrating larger sites to XHTML.
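
The spidering half of this idea can be sketched roughly as follows. This is an illustrative Python sketch, not wespiva's code: the `crawl` function, the `fetch` callback and the in-memory `SITE` dictionary are assumptions made so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first traversal; fetch(url) returns HTML, or None for a dead link."""
    seen = {start_url}
    queue = deque([start_url])
    visited, dead = [], []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            dead.append(url)
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop fragments such as #top.
            target, _fragment = urldefrag(urljoin(url, href))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited, dead


# Hypothetical in-memory site standing in for real HTTP requests.
SITE = {
    "http://example.test/": '<a href="a.html">A</a> <a href="missing.html">X</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="#top">top</a>',
}

visited, dead = crawl("http://example.test/", SITE.get)
print(visited)
print(dead)
```

In a combined tool, each page that was fetched successfully would be handed to the validator in the same pass, so every page is downloaded only once.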

Download

Although wespiva is programmed not to harm any computer, a crash caused by accident or by a programming error in the application or in one of the .NET functions it uses could damage your system. To avoid liability for any negative consequences of using this program, I allow its use only if you accept the following rules:

  • You back up your system before installing and using it, or you run it at your own risk in a virtual machine.
  • You will not hold me responsible for damages (lost time, crashed computer, etc.) unless the damage was caused intentionally.

wespiva Version 1.2010 (100 kb ZIP-File, 2010-01-27)
wespiva Version 0.1.9 (117 kb ZIP-File, 2008-12-19)
wespiva Version 0.1.8 (119 kb ZIP-File, 2008-12-11)
wespiva Version 0.1.7 (115 kb ZIP-File, 2008-08-21)
wespiva Version 0.1.6 (100 kb ZIP-File, 2007-09-14)

Installation

Prerequisites

wespiva runs on Windows with .NET Framework 3.5 installed.

How to run

Unzip the single file in the ZIP archive and start using it.

Frequently asked questions for wespiva

Does it run on Mono for Windows?
A special version runs on Mono 2.2, but it hangs if the form is resized while wespiva is spidering. The reason is unknown; possibly Mono has bugs in the combination of Windows.Forms and multithreading. If you don't touch the application until the scan is finished, all is well.
Will there be a Mono version for Linux/OS X?
Probably yes, if someone pays for it. If no one is willing to pay, there is no great demand for it.
How many pages can be checked in one run?
I've used it to check sites with more than 50,000 elements in less than 15 minutes. The duration depends mainly on the line speed and the responsiveness of the web server delivering the pages.
Why validation?
I'll let others speak here:

Samples

Main page with progress log
Page list
Option dialog
Report sample
Sitemap sample

Features

  • easy to use
  • easy to install (just a single exe file)
  • can be used in an intranet; no internet connection is needed for validation
  • fast (can check over 50,000 elements in less than 15 minutes)
  • detects dead links
  • finds validation issues
  • generates easy-to-understand reports
  • generates a sitemap in the standard sitemap format
  • can be called from the command line for automated, periodic checking of a site
  • spidering and validation run in a background thread, so the GUI stays responsive
  • convenient configuration, for example a grace period can be set
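
As an illustration of the sitemap feature, the following Python sketch builds a minimal sitemap. It assumes that "standard sitemap format" refers to the sitemaps.org XML protocol; the function name `build_sitemap` and the example URLs are hypothetical, and the output of wespiva itself may contain additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol (assumed to be the format meant above).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemaps.org-style XML document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap(["http://example.test/", "http://example.test/a.html"])
print(sitemap_xml)
```

A spider already knows every reachable page after a scan, so the list of successfully visited URLs is a natural input for such a function.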

Runnable from the command line

		c:\wespiva.exe "www.wissing.com" "example@example.not"
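
For periodic checks, a call like the one above could be wrapped in a small script and started by a scheduler such as the Windows Task Scheduler. The Python sketch below is only an illustration: the helper names are hypothetical, the argument order is copied from the example, and the meaning of the second argument (it looks like an email address) and of the exit code is not documented here, so treat both as assumptions.

```python
import subprocess
from pathlib import Path

WESPIVA_EXE = r"c:\wespiva.exe"  # path taken from the example call above


def build_command(site, email, exe=WESPIVA_EXE):
    """Assemble the argument list in the same order as the documented call."""
    return [exe, site, email]


def run_scan(site, email, exe=WESPIVA_EXE):
    """Start one scan; returns the exit code, or None if the exe is missing."""
    if not Path(exe).is_file():
        return None
    return subprocess.run(build_command(site, email, exe), check=False).returncode


command = build_command("www.wissing.com", "example@example.not")
print(command)
```

Passing the arguments as a list avoids quoting problems when a site address or path contains spaces.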
	

Known Bugs

  • Not a bug, but a limitation:
    Only well-formed pages are validated. If a page is not well-formed, the cause of the error is shown. Please correct these errors first. Spidering is unaffected, but validation of the page stops at these errors.
  • This version does not show the correct result for checks of external links.
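
The well-formedness limitation can be illustrated with a generic XML parser. This Python sketch is not how wespiva reports errors; it only shows why a page that does not parse as XML cannot be validated further. The function name `check_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xhtml):
    """Return None if the markup parses as XML, else a short error description."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as err:
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


good = "<html><body><p>ok</p></body></html>"
bad = "<html><body><p>unclosed</body></html>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

The second document fails because `<p>` is never closed. Note that a plain XML parser without the XHTML DTD also rejects named entities such as `&nbsp;`, so a real validator has to load the entity definitions first.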

Future Features

  • HTTPS support
  • Online version
  • Thumbnails of every web page and of graphic resources
  • Checking of inline anchor hrefs (like #top)

Already done:

  • Multi-Threading (for other than the GUI)
  • JavaScript-Extraction
  • robots.txt conformance
  • Basic Authentication
  • X.509 Certificates
  • Proxy-Support
  • Integration into our CMS
  • Text-Extraction
  • Style/CSS-Extraction
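
What robots.txt conformance means for a spider can be sketched with Python's standard `urllib.robotparser` module. The user-agent string "wespiva" and the rules below are assumptions for the example; the name under which the tool actually identifies itself is not stated here.

```python
from urllib.robotparser import RobotFileParser

# Example rules; a real spider would download /robots.txt from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# "wespiva" as user-agent name is hypothetical.
allowed_public = rules.can_fetch("wespiva", "http://example.test/index.html")
allowed_private = rules.can_fetch("wespiva", "http://example.test/private/a.html")
print(allowed_public, allowed_private)
```

A conforming spider performs this check before requesting each URL and skips the ones that are disallowed.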

History / Changes

  • 2010-01-27: some minor errors fixed, JavaScript extraction, multithreaded validation
  • 2008-12-19, version 0.1.9: robots.txt conformance, parsing error fixed

Other nice Validators

They are really good, but don't let you check whole sites:

© Christoph Wissing – 7/22/2007, updated: 1/27/2010