Sitemap Creator 0.2 beta
Sitemap Creator crawls (spiders) your website and creates XML sitemaps compatible with the standard sitemaps.org protocol supported by Google, Yahoo!, MSN and MoreOver. The script pings the Google, Yahoo!, MSN and MoreOver bots to download the sitemap file, then tracks each bot, sends you an email every time one scans your sitemap, and gives you a full report of the search engine's response.
Sitemaps are created from a CSV file, which can easily be edited in any text editor before the sitemap is created. Sitemap Creator has three built-in ranking mechanisms that decide the priorities of your pages based on the number and placement of link backs, crawled-first links, or URL structure. You can also limit the crawler by memory, run time or number of URLs.
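As a rough illustration of the CSV-to-sitemap step, here is a minimal Python sketch (the script itself is PHP, and this is not its code). The two-column layout `url,priority` with no header row and the function name `csv_to_sitemap` are assumptions made for the example; the real CSV format may differ. The element names and the 0.0 to 1.0 priority range come from the sitemaps.org protocol.

```python
import csv
import io
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def csv_to_sitemap(csv_text):
    """Build a sitemaps.org <urlset> document from CSV rows of url,priority.

    Assumed (hypothetical) layout: no header row, column 0 is the URL,
    column 1 is the priority.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="%s">' % SITEMAP_NS]
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        url = row[0].strip()
        # The protocol restricts priority to the range 0.0 to 1.0.
        priority = min(1.0, max(0.0, float(row[1])))
        lines.append("  <url>")
        # escape() turns &, < and > into XML entities so the file stays well formed.
        lines.append("    <loc>%s</loc>" % escape(url))
        lines.append("    <priority>%.1f</priority>" % priority)
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

Editing a priority in the CSV before generating the sitemap then simply changes the `<priority>` value written for that URL.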
Beta info:
- cURL is not needed anymore, all requests are processed through fsockopen.
- Bug fix to the link back ranking functions.
- Limit the crawler to a number of URLs.
- Disable crawling of specific directories or links. Regular expressions are supported.
- Limit the number of links shown on the start page.
The script was tested on PHP 5; let me know how it works on PHP 4.
An online demo might be available in the next release.
Download (build 20080514) :
sitemap_creator.tar.gz - sitemap_creator.zip


May 16th, 2008 at 6:11 pm
[...] You need a sitemap! Lucky for you GadElKareem created a little script written in PHP called Sitemap Creator. So what does it do? Upon logging into the sitemap admin section and clicking “Crawl [...]
May 17th, 2008 at 12:57 am
What URL are you supposed to submit to Google for the sitemap?
May 17th, 2008 at 1:09 am
@gtnman
Please use “Add reference to robots.txt” to show the default sitemap URL; remember to chmod 666 robots.txt.
It should look something like:
http://www.greatertalent.com/sitemap.php?do=showsitemap&sm=sitemap.xml.gz
May 19th, 2008 at 6:09 pm
Interesting note: I have tested this script on two VPS servers, both running the latest version of Plesk, on two different web hosts. It seems that the Unix command utime (which touch() uses) is not available in Plesk. Is touch() vital to the script, or will is_writable do the same thing? (I made this change, and robots.txt was generated perfectly.)
Does the cronjob regenerate robots.txt, or is this something the user must do?
May 19th, 2008 at 6:13 pm
Another note: I tested this on my Ubuntu box, where utime does exist, and it worked fine.
May 21st, 2008 at 1:44 am
@gtnman
- touch() is not related to Plesk; you need permission to modify files in that directory. I cannot find any relation between the touch() function and utime. Make sure you 'chmod 777' the data directory.
- robots.txt is not regenerated every time the sitemap is created; you only need to modify it once.
May 27th, 2008 at 4:21 pm
Hi,
please help me.
I see this error:
No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host
May 27th, 2008 at 4:22 pm
Where is the right setting for the site?
May 28th, 2008 at 11:14 pm
@abyzn
Can you enable the debug mode from the configuration file and give me the results?
June 13th, 2008 at 2:54 pm
I use your Sitemap Creator for a site and everything goes OK: it shows a table with the URLs and no error messages. But when I click “Create Sitemaps” the XML is malformed and incorrect. Any suggestions?
The Site is in ISO-8859-1 and Sitemap in UTF-8…
June 14th, 2008 at 5:13 am
@Ferran
Can you give me a link to your sitemap XML file?
June 17th, 2008 at 9:43 am
http://cordigual.com/__sitemap/sitemap.php?do=showsitemap&sm=sitemap.xml.gz
June 18th, 2008 at 9:19 am
@Ferran
Can you change the constant SMC_GSS to false to disable GSS and try again?
June 18th, 2008 at 11:06 am
@wkarim
I set SMC_GSS to false and the result is the same…
When I look at the CSV or the table in sitemap.php, the links are shown… Could it be the encoding of the files? I tried it on my sites that are in UTF-8 and it works perfectly, but when I try it on this site, which is in ISO-8859-1, the XML is corrupt… I don't understand it; the functions are correct and the config too…
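One possible explanation, not confirmed anywhere in this thread, is that text taken from an ISO-8859-1 page is written unchanged into a file declared as UTF-8; any byte above 0x7F then makes the XML invalid. The Python sketch below only illustrates that general mechanism and a transcoding fix. The helper names `is_valid_utf8` and `loc_from_latin1` are hypothetical and are not part of Sitemap Creator.

```python
from xml.sax.saxutils import escape

def is_valid_utf8(raw_bytes):
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        raw_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def loc_from_latin1(raw_url):
    """Transcode an ISO-8859-1 URL to a UTF-8 encoded <loc> element."""
    text = raw_url.decode("iso-8859-1")           # interpret the source bytes correctly
    element = "<loc>%s</loc>" % escape(text)      # entity-escape &, < and >
    return element.encode("utf-8")                # write out as UTF-8

# An accented character stored as a single ISO-8859-1 byte.
raw = b"http://example.com/caf\xe9?a=1&b=2"
print(is_valid_utf8(raw))                   # the raw bytes are not valid UTF-8
print(is_valid_utf8(loc_from_latin1(raw)))  # the transcoded element is
```

A site that is already UTF-8 would pass through such a step unchanged, which would be consistent with the script working on the UTF-8 sites mentioned above.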
July 12th, 2008 at 12:10 pm
Thanks wkarim, works well using this version.
I use it on my site tukarinfobispak.com
Any time a search engine robot reads my sitemap, an email is sent to my mailbox informing me of its activity.
July 17th, 2008 at 1:30 am
Nice script! Just one question: I have several URLs ending with “?a=page:2” which I want to exclude from being crawled. Will this be possible using SMC_DISABLED_DIRS?
July 17th, 2008 at 2:04 am
@flobster
Yes, you can use regular expressions to disable those URLs; examples are included in the config file.
ex : define('SMC_DISABLED_DIRS', '\?a=page:2');
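To show how a pattern like the one above behaves, here is a small Python sketch of regex-based URL exclusion. It is only an illustration: the names `DISABLED_PATTERNS` and `is_disabled` are hypothetical, and how the PHP script actually applies SMC_DISABLED_DIRS (delimiters, anchoring, multiple patterns) is an assumption not covered in this thread.

```python
import re

# Hypothetical pattern list; '\?a=page:2' is the pattern from the reply above.
DISABLED_PATTERNS = [r"\?a=page:2"]

def is_disabled(url, patterns=DISABLED_PATTERNS):
    """Return True if any exclusion pattern matches anywhere in the URL."""
    return any(re.search(pattern, url) for pattern in patterns)

for url in ("http://example.com/list?a=page:2",
            "http://example.com/list?a=page:20",
            "http://example.com/list"):
    print(url, is_disabled(url))
```

Because the pattern is not anchored, it also matches `?a=page:20` and similar URLs. If only the exact ending should be excluded, a pattern such as `\?a=page:2$` would be needed, and `\?a=page:\d+$` would cover every page number.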
July 24th, 2008 at 5:45 am
Hi,
I get this error:
WARNING: Page http://mym8.eu/ is redirecting to home.php
DEBUG: URL "http://mym8.eu/" Blacklisted. Reason : Empty Page
Can you please advise?
Thank you in advance.
July 24th, 2008 at 8:39 am
@Bo
It should continue crawling the site normally, but http://mym8.eu/ will not show in the sitemap as it’s a redirect page.
July 31st, 2008 at 9:34 pm
Installed; it crawls fine after logging in via http://thenetradio.co.uk/sitemap/sitemap.php
Once the crawl has finished I press “Create Sitemaps” and get “Please crawl thenetradio.co.uk/index.php first.”
The server is running PHP 5.2.6.
Any ideas?
Mike
August 5th, 2008 at 10:01 pm
Same here. I upgraded to the new version and still no luck: when I click the Crawl link it displays a number, then does nothing. If I click any other link it shows that the cache has around 36 elements but nothing else, no list, nothing. I don't know what else to do.
August 5th, 2008 at 10:13 pm
I have been checking the log file and the only error in it is this one:
[05-Aug-2008 15:06:56] PHP Fatal error: Maximum execution time of 30 seconds exceeded in /home/xxxxxxx/public_html/sitemap/.function.inc.php on line 295
August 6th, 2008 at 5:51 pm
@mikey, SearkeCom
Please enable debug mode to give me a better idea about what is happening.
August 13th, 2008 at 4:40 pm
I'm running PHP 4 and 0.2b and cannot get it to work. I have got to the point where everything runs fine, but Google and the others return errors. The report below says the file 20080813 could not be found, and there is no such file:
/hsphere/local/home/mgruppe/forkortelse.dk/sitemap/data/sites/www.forkortelse.dk_sitemaps/20080813 could not be found
IP -: http://whois.domaintools.com/66.249.71.79
Date -: 04:26:44 pm ( Wednesday 13 August 2008 )
Bot -: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Location -: http://www.forkortelse.dk/sitemap.php?do=showsitemap&sm=20080813.xml.gz
If I go to http://www.forkortelse.dk/sitemap.php?do=showsitemap&sm=20080813.xml.gz
it reports that something is wrong with the XML file.
The file attributes should be OK.
Thanks for a nice script!
BR
Henrik
August 14th, 2008 at 12:06 am
@king
It looks like you pinged Google with a sitemap and then deleted it. Try pinging Google again with the right sitemap name, or add it manually in Google Webmaster Tools.
September 8th, 2008 at 8:15 am
I installed it on my site and it has worked fine so far. I created a sitemap and managed to ping all the search engines.
Thanks
Khal
September 17th, 2008 at 2:49 pm
I have tried this script, unmodified, on PHP 4 and I get the following message when I run it:
WARNING: Connection failed (111) Connection refused
DEBUG: URL "http://www.tractiontime.co.uk/" Blacklisted. Reason : Empty Page
The page I presume it’s referring to is index.php, and I know it's not empty. Can you shed any light on this please?
Many thanks
September 18th, 2008 at 8:46 am
@Stuart
Please check the connection from the server running the script to http://www.tractiontime.co.uk. Check what you get with < ?php echo file_get_contents( 'http://www.tractiontime.co.uk/'); ?>
September 18th, 2008 at 10:38 am
Hi,
I created a new file within the sitemap directory called test.php and copied in the above short script, removing the space after the first < in your example. The results are as follows:
Warning: file_get_contents(http://www.tractiontime.co.uk/) [function.file-get-contents]: failed to open stream: Connection refused in /home/sites/tractiontime.co.uk/public_html/sitemap/test.php on line 1
September 23rd, 2008 at 10:11 pm
@Stuart
There must be something wrong with your connection.
October 8th, 2008 at 2:45 pm
When I go to my page energyenhancement.org/sitemap_creator/sitemap.php, it asks me for a login.
Where can I find it?
October 11th, 2008 at 3:34 am
@Jorge
The password is 'demopass'.
Change it to any password you like in the config file.
October 23rd, 2008 at 9:58 am
After crawling the site, the page displays all of the URLs etc. At the end of the page is the URL to add to the crontab.
The page then shows a link to “Create Sitemaps”. The sitemap has a date stamp. Is the XML sitemap created automatically when it is run from the crontab? Does it have a new time stamp?
If the ping is enabled in the config file, will the correct sitemap be “pinged” to the search engines?
October 23rd, 2008 at 10:11 am
@Antonimo
Yes, all sitemaps are time stamped and the ping is for the most recently created one.
If the request does not contain a time stamp, the script sends the most recently created sitemap.
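The fallback described above can be sketched in Python as follows. This is an illustration, not the script's actual PHP logic: it assumes stamped files are named like `20080813.xml.gz` (the form that appears in an earlier comment), and the names `newest_sitemap` and `resolve_sitemap` are hypothetical.

```python
import re

# Assumed naming scheme: YYYYMMDD.xml.gz, as seen earlier in this thread.
STAMPED = re.compile(r"^\d{8}\.xml\.gz$")

def newest_sitemap(filenames):
    """Return the stamped file with the latest date, or None if there is none."""
    stamped = [name for name in filenames if STAMPED.match(name)]
    # YYYYMMDD stamps sort chronologically when compared as plain strings.
    return max(stamped) if stamped else None

def resolve_sitemap(requested, filenames):
    """Serve the requested stamped file if it exists, else the newest one."""
    if STAMPED.match(requested) and requested in filenames:
        return requested
    return newest_sitemap(filenames)

files = ["20080813.xml.gz", "20081023.xml.gz"]
print(resolve_sitemap("sitemap.xml.gz", files))   # no stamp in the request
print(resolve_sitemap("20080813.xml.gz", files))  # explicit stamp
```

With this kind of fallback, a fixed URL such as `sm=sitemap.xml.gz` can be submitted once to the search engines and still resolve to the latest sitemap after each cron run.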
October 23rd, 2008 at 10:56 am
@wkarim
Thanks for the quick response.
Cron set up - looking forward to seeing the results.
Excellent script - kudos
October 31st, 2008 at 11:17 am
Hi again,
I am having great difficulty getting the cron to work.
I have other crons that run fine using the command /usr/bin/php -q /home/DOMAIN/public_html/BackUp.php (for example)
When I try to run the sitemap creator from cron (/usr/bin/php -q /home/DOMAIN/public_html/sitemap/sitemap.php?do=createsitemap), I receive the error “No input file specified”.
Is there a special way to do this, or can I call the “do” script from another PHP file?
October 31st, 2008 at 4:38 pm
@Antonimo
The script is not designed to run from the command line. Alternatively, you can use lynx with crontab; the command should look something like this (the URL is quoted so that the shell does not interpret the &):
lynx -dump 'http://example.com/sitemap/sitemap.php?do=createsitemap&hash=blah'
November 19th, 2008 at 10:44 pm
This looks great, but any idea why I get this error and only one page?
Warning: Division by zero in /myhost/public_html/sitemap/.function.inc.php on line 320
Finished crawling mydomain.com, Crawled 1 links
November 20th, 2008 at 1:17 am
@Dave
I cannot investigate the problem without knowing your domain. You might want to make sure no redirects are being made.