Sitemap Generator – Settings
The “Settings” window allows you to configure program behavior.
“Max.file size”: the maximum file size accepted. Any target file (page) larger than this is skipped and labeled as “failed”.
“Max.URL length”: the maximum length of a URL, in characters. Longer URLs are skipped.
“Server Response Timeout”: the maximum time Sitemap Generator waits for the target server to respond.
“Delay between pages”: how long the program waits before downloading the next page. Use it to control the load on the server.
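The size, URL-length and timeout limits can be pictured as checks a crawler runs on every page. The Python sketch below is not Sitemap Generator's code: it assumes hypothetical names (CrawlSettings, skip_reason, fetch_page), assumed units (bytes, characters, seconds) and made-up default values, and uses the standard urllib.request.urlopen timeout for the response limit.

```python
from dataclasses import dataclass
from typing import Optional
import urllib.request


@dataclass
class CrawlSettings:
    # Hypothetical fields mirroring the dialog; units and defaults are assumptions.
    max_file_size: int = 1_000_000    # bytes ("Max.file size")
    max_url_length: int = 2048        # characters ("Max.URL length")
    response_timeout: float = 30.0    # seconds ("Server Response Timeout")


def skip_reason(url: str, size: Optional[int], s: CrawlSettings) -> Optional[str]:
    """Return why a page would be skipped, or None if it passes the limits."""
    if len(url) > s.max_url_length:
        return "url too long"              # rejected before any download
    if size is not None and size > s.max_file_size:
        return "failed: file too large"    # oversized pages are labeled "failed"
    return None


def fetch_page(url: str, s: CrawlSettings) -> Optional[bytes]:
    """Download one page, giving up if the server does not answer in time."""
    if skip_reason(url, None, s):
        return None
    with urllib.request.urlopen(url, timeout=s.response_timeout) as resp:
        # Read one byte past the limit so an oversized page can be detected.
        body = resp.read(s.max_file_size + 1)
    if skip_reason(url, len(body), s):
        return None
    return body
```

Reading at most one byte past the limit avoids pulling a very large page fully into memory just to discard it.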
The “www” checkbox at the bottom controls whether “www.domain.com” and “domain.com” are crawled as one site or as two different sites. Usually the “www.” prefix points to the same site, but because “www.” is in fact a subdomain, it may host a separate site. If it does, use this checkbox to tell the program that these are different sites.
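One way a crawler could decide whether a link belongs to the site being scanned is to compare host names, optionally folding the “www.” prefix away. This Python sketch assumes host-based comparison; site_key, is_same_site and the www_is_same_site flag are hypothetical names, not the program's actual settings.

```python
from urllib.parse import urlsplit


def site_key(url: str, www_is_same_site: bool) -> str:
    """Host name used to decide which site a URL belongs to (hypothetical rule)."""
    host = urlsplit(url).hostname or ""   # .hostname is already lower-cased
    if www_is_same_site and host.startswith("www."):
        host = host[len("www."):]         # fold www.domain.com into domain.com
    return host


def is_same_site(start_url: str, link: str, www_is_same_site: bool) -> bool:
    """True if the link should be crawled as part of the start URL's site."""
    return site_key(start_url, www_is_same_site) == site_key(link, www_is_same_site)
```

With www_is_same_site=False, “www.domain.com” and “domain.com” produce different keys, so links to the other host are treated as external.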
You can also set an additional pause after a given number of pages; for example, you can make the generator stop for 5 seconds after every 100 pages downloaded. This is another server-load control feature: longer or more frequent pauses decrease the server load, at the cost of slower scanning.
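The two delay settings combine as a regular wait before each page plus an extra pause after every N pages. A minimal Python sketch, assuming seconds as the unit; crawl_throttled and its parameters are hypothetical, and the sleep function is passed in only so the timing can be checked without waiting.

```python
import time
from typing import Callable, Iterable


def crawl_throttled(
    urls: Iterable[str],
    download: Callable[[str], None],
    delay_between_pages: float = 0.5,  # assumed seconds ("Delay between pages")
    pause_every: int = 100,            # e.g. after every 100 pages...
    pause_seconds: float = 5.0,        # ...stop for 5 seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Download URLs one at a time, sleeping to limit load on the server."""
    count = 0
    for url in urls:
        if count > 0:
            sleep(delay_between_pages)  # wait before downloading the next page
        download(url)
        count += 1
        if pause_every > 0 and count % pause_every == 0:
            sleep(pause_seconds)        # extra pause after every N pages
    return count
```

Raising delay_between_pages or pause_seconds, or lowering pause_every, trades scanning speed for a lighter server load.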
“Stop After XXX pages”: use this to increase the maximum number of pages scanned. Do not set the limit above 50,000 pages: the tool will crash, and Google will not accept the resulting sitemap file, because Google limits a single sitemap to 50,000 links.
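A crawler can guard this setting by clamping the requested value to the 50,000-URL ceiling before it starts. The Python sketch below is illustrative only; the function names and the lower bound of 1 are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator

# A single sitemap file may list at most 50,000 URLs.
SITEMAP_URL_LIMIT = 50_000


def effective_page_limit(requested: int) -> int:
    """Clamp a user-entered "Stop After" value to 1..50,000 (assumed bounds)."""
    return max(1, min(requested, SITEMAP_URL_LIMIT))


def pages_to_scan(discovered: Iterable[str], requested_limit: int) -> Iterator[str]:
    """Yield discovered URLs, stopping once the page limit is reached."""
    return islice(discovered, effective_page_limit(requested_limit))
```

Because islice consumes the source lazily, URLs beyond the limit are never requested.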
Google (XML) Settings
Here you can define additional tags for the Google sitemap file.
These additional tags are not mandatory (Google will crawl your site in its default way without them), but you can set them here if you want. Note that the same values are applied to every page in the sitemap.
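Assuming the additional tags are the optional lastmod, changefreq and priority elements of the sitemaps.org protocol (the help text does not name them), the Python sketch below shows what “the same attributes for all pages” means in the generated XML. build_sitemap is a hypothetical helper built on the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(
    urls: Iterable[str],
    lastmod: Optional[str] = None,     # e.g. "2024-01-31" (W3C date format)
    changefreq: Optional[str] = None,  # e.g. "weekly"
    priority: Optional[str] = None,    # "0.0" to "1.0"
) -> str:
    """Return sitemap XML; each optional tag, if set, is identical for every URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    optional = (("lastmod", lastmod), ("changefreq", changefreq), ("priority", priority))
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # <loc> is the only required child
        for tag, value in optional:
            if value is not None:
                ET.SubElement(entry, tag).text = value
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Leaving an argument as None omits that tag entirely, which matches the note that these tags are optional.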
Use this combo-box to set the maximum number of simultaneous connections. In most cases more connections speed up crawling; however, the target server receives more concurrent connections, which may be a problem (it can slow the server down unless your site is hosted on a dedicated server). In general, use this option only AT YOUR OWN RISK.
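A common way to cap simultaneous connections is a fixed-size worker pool, so no more than N requests are ever in flight. The Python sketch below uses the standard concurrent.futures.ThreadPoolExecutor; crawl_concurrently and the fetch callback are hypothetical, and the program itself may work differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def crawl_concurrently(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_connections: int = 2,
) -> Dict[str, bytes]:
    """Fetch pages with at most max_connections requests running at once."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # Each worker thread handles one request at a time; map keeps input order.
        bodies = pool.map(fetch, url_list)
        return dict(zip(url_list, bodies))
```

A larger max_connections finishes sooner only while the server keeps up; past that point it mainly adds load.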
“Clear Cache” button
The “cache” is a temporary folder where the program stores all files crawled (downloaded) since it started. On exit, Sitemap Generator deletes these files. They exist only to make crawling faster: for example, if you click the “Stop” button and then restart scanning the site, the crawler takes the pages downloaded during the first run from the cache instead of downloading them again. If you want to re-download all pages from the server, click the “Clear Cache” button.
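The cache behavior described above (reuse pages from an earlier run, wipe them on “Clear Cache” or on exit) can be sketched as a temp-folder store keyed by URL. This Python sketch assumes one file per page named by a SHA-256 hash of the URL; PageCache and its methods are hypothetical, not the program's actual layout.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable


class PageCache:
    """Temp-folder cache of downloaded pages (hypothetical layout)."""

    def __init__(self) -> None:
        self.folder = Path(tempfile.mkdtemp(prefix="sitemap-cache-"))

    def _path(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe, fixed-length file name.
        return self.folder / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str, download: Callable[[str], bytes]) -> bytes:
        """Return the cached page, downloading and storing it on a miss."""
        path = self._path(url)
        if path.exists():
            return path.read_bytes()  # reuse the copy from an earlier run
        body = download(url)
        path.write_bytes(body)
        return body

    def clear(self) -> None:
        """Like the "Clear Cache" button: forget every stored page."""
        shutil.rmtree(self.folder, ignore_errors=True)
        self.folder.mkdir(parents=True, exist_ok=True)

    def close(self) -> None:
        """Delete the folder entirely, as happens on program exit."""
        shutil.rmtree(self.folder, ignore_errors=True)
```

After a stop and restart, get() returns stored pages without calling download again; after clear(), every page is fetched from the server anew.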