Sitemap Creator 0.2 beta

Sitemap Creator crawls (spiders) your website and creates XML sitemaps compatible with the standard sitemaps.org protocol supported by Google, Yahoo!, MSN and MoreOver. The script pings the Google, Yahoo!, MSN and MoreOver bots to download the sitemap file, then tracks the bots, emails you every time one of them scans your sitemap, and gives you a full report of the search engine's response.
Sitemaps are created from a CSV file, which can easily be edited in any text editor before the sitemap is created. Sitemap Creator has three built-in ranking mechanisms, which decide the priorities of your pages depending on the number and placement of link backs, which links were crawled first, or URL structure. You can also limit the crawler by memory, run time or number of URLs.
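
To illustrate the CSV-to-sitemap step, here is a minimal Python sketch (not the script's own PHP code). It assumes a hypothetical two-column CSV layout of URL and priority; the real CSV columns used by Sitemap Creator may differ. The function name csv_to_sitemap and the sample data are made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def csv_to_sitemap(csv_text):
    """Build a sitemaps.org <urlset> document from CSV rows.

    Assumed (hypothetical) row layout: url,priority
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and rows without a URL.
        if not row or not row[0].strip():
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = row[0].strip()
        # The priority column is optional in this sketch.
        if len(row) > 1 and row[1].strip():
            ET.SubElement(url, "priority").text = "%.1f" % float(row[1])
    # ElementTree escapes characters such as "&" in URLs for us.
    return ET.tostring(urlset, encoding="unicode")


SAMPLE_CSV = (
    "http://example.com/,1.0\n"
    "http://example.com/about.php?a=1&b=2,0.5\n"
)
SITEMAP_XML = csv_to_sitemap(SAMPLE_CSV)
print(SITEMAP_XML)
```

Because the CSV is plain text, removing a row or changing a priority value before this step changes the resulting sitemap accordingly.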

    beta info

  • cURL is not needed anymore, all requests are processed through fsockopen.
  • Big fix to the link back ranking functions.
  • The crawler can be limited to a number of URLs.
  • Crawling of specific directories or links can be disabled. Regular expressions are supported.
  • The number of links shown on the start page can be limited.

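The URL limit and the regular-expression exclusions can be pictured with a short Python sketch. This is only an illustration under assumptions, not the script's PHP implementation: the function filter_urls, its parameters, and the sample URLs and patterns are all hypothetical.

```python
import re


def filter_urls(urls, disabled_patterns, max_urls):
    """Keep URLs that match no disabled pattern, up to max_urls entries."""
    compiled = [re.compile(p) for p in disabled_patterns]
    kept = []
    for url in urls:
        # Stop as soon as the URL limit is reached.
        if len(kept) >= max_urls:
            break
        # Skip any URL matched by one of the exclusion patterns.
        if any(rx.search(url) for rx in compiled):
            continue
        kept.append(url)
    return kept


CANDIDATES = [
    "http://example.com/",
    "http://example.com/admin/login.php",
    "http://example.com/news.php?a=page:2",
    "http://example.com/news.php",
    "http://example.com/contact.php",
]
# Exclude an /admin/ directory and one specific query string; keep at most 2 URLs.
KEPT = filter_urls(CANDIDATES, [r"/admin/", r"\?a=page:2"], max_urls=2)
print(KEPT)
```

Note that the question mark is escaped in the second pattern, since an unescaped "?" is a regex quantifier rather than a literal character.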
The script was tested on PHP5; let me know how it works on PHP4.
Online demo might be available on the next release.
Download (build 20080514) :
sitemap_creator.tar.gz - sitemap_creator.zip


This entry was posted on Thursday, May 15th, 2008 at 4:24 pm and is filed under News, Programs.

40 Responses to “Sitemap Creator 0.2 beta”

  1. Sitemap Creator 0.2 beta Released « jared.brodsky Says:

    [...] You need a sitemap!  Lucky for you GadElKareem created a little script written in PHP called Sitemap Creator.  So what does it do?  Upon logging into the sitemap admin section and clicking “Crawl [...]

  2. gtnman Says:

    What is the URL you are supposed to submit to google for the sitemap?

  3. wkarim Says:

    @gtnman
    Please use “Add reference to robots.txt” to show you the default sitemap URL; remember to chmod 666 robots.txt.
    it should look something like
    http://www.greatertalent.com/sitemap.php?do=showsitemap&sm=sitemap.xml.gz

  4. gtnman Says:

    Interesting note. I have tested this script on two VPS servers, both running the latest version of Plesk, on two different web hosts. It seems that the Unix command utime (which touch() uses) is not available in Plesk. Is touch() vital to the script, or will is_writable() do the same thing? (I made this change, and the robots.txt generated perfectly.)
    Does the cronjob regenerate robots.txt, or is this something that the user must do?

  5. gtnman Says:

    Another note: I tested this on my Ubuntu box, where utime did exist, and it worked fine.

  6. wkarim Says:

    @gtnman
    - touch() is not related to Plesk; you need to have permission to modify files in that directory. I cannot find any relation between the touch() function and utime. Make sure you 'chmod 777' the data directory.
    - robots.txt is not regenerated every time the sitemap is created; you only need to modify it once.

  7. abyzn Says:

    hi
    help me
    i see this error:

    No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host

  8. abyzn Says:

    where is the right setting for site

  9. wkarim Says:

    @abyzn
    Can you enable the debug mode from the configuration file and give me the results?

  10. Ferran Says:

    I use your Sitemap Creator for a site and everything goes OK: it shows a table with the URLs and no error messages. But when I click “Create Sitemaps” the XML is malformed and incorrect. Any suggestions?
    The Site is in ISO-8859-1 and Sitemap in UTF-8…

  11. wkarim Says:

    @Ferran
    Can you give me a link to your sitemap XML file?

  12. Ferran Says:

    http://cordigual.com/__sitemap/sitemap.php?do=showsitemap&sm=sitemap.xml.gz

  13. links for 2008-06-18 « Free Open Source Directory Says:

    [...] Sitemap Creator 0.2 beta :: GadElKareem (tags: Sitemap Creator 0.2 beta :: GadElKareem) [...]

  14. wkarim Says:

    @Ferran
    Can you change the constant SMC_GSS to false to disable GSS and try again?

  15. Ferran Says:

    @wkarim
    I set SMC_GSS to false and the result is the same…
    When I look at the CSV or the table in sitemap.php, the links are shown… Could it be the encoding of the files? I tried it on my sites that are in UTF-8 and it works perfectly, but when I try it on this site, which is in ISO-8859-1, the XML is corrupt… I don't understand it; the functions are correct and the config too…

  16. bispak Says:

    Thanks wkarim, works well using this version.

    I use it on my site tukarinfobispak.com

    Any time a search engine robot reads my sitemap, an email is sent to my mailbox informing me about its activity.

  17. flobster Says:

    Nice script! Just one question: I have several URLs ending with “?a=page:2” which I want to exclude from being crawled. Will this be possible using SMC_DISABLED_DIRS?

  18. wkarim Says:

    @flobster
    Yes, you can use regular expressions to disable those URLs; examples are included in the config file.
    ex : define('SMC_DISABLED_DIRS', '\?a=page:2');

  19. Bo Says:

    hi,
    i get this error:
    WARNING: Page http://mym8.eu/ is redirecting to home.php
    DEBUG: URL “http://mym8.eu/” Blacklisted. Reason : Empty Page

    can you please advise?
    thank you in advance

  20. wkarim Says:

    @Bo
    It should continue crawling the site normally, but http://mym8.eu/ will not show in the sitemap as it’s a redirect page.

  21. SearleCom Says:

    Installed, crawls fine after logging in via http://thenetradio.co.uk/sitemap/sitemap.php

    Once crawl has finished I press Create Sitemaps and get Please crawl thenetradio.co.uk/index.php first.

    Server is running PHP 5.2.6

    Any ideas?

    Mike

  22. mikey Says:

    Same here. I upgraded to the new version and still no luck: when I click the Crawl link it displays a number and then does nothing. If I click any other link, it shows that the cache has around 36 elements, but nothing else: no list, nothing. I don't know what else to do.

  23. mikey Says:

    I have been checking the log file and the only error in it is this one:

    [05-Aug-2008 15:06:56] PHP Fatal error: Maximum execution time of 30 seconds exceeded in /home/xxxxxxx/public_html/sitemap/.function.inc.php on line 295

  24. wkarim Says:

    @mikey, SearleCom
    Please enable debug mode to give me a better idea about what is happening.

  25. king Says:

    I'm running PHP4 and 0.2b and cannot get it to run. I have got to the point where everything runs fine, but Google and others return errors. The one below says the file 20080813 is not present, and there is no such file.

    /hsphere/local/home/mgruppe/forkortelse.dk/sitemap/data/sites/www.forkortelse.dk_sitemaps/20080813 could not be found IP -: http://whois.domaintools.com/66.249.71.79
    Date -: 04:26:44 pm ( Wednesday 13 August 2008 ) Bot -: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) Location -: http://www.forkortelse.dk/sitemap.php?do=showsitemap&sm=20080813.xml.gz

    If I go to http://www.forkortelse.dk/sitemap.php?do=showsitemap&sm=20080813.xml.gz
    it reports that something is wrong with the XML file.

    File attributes should be OK.
    Thanks for a nice script!

    BR
    Henrik

  26. wkarim Says:

    @king
    It looks like you pinged Google with a sitemap and then deleted it. Try pinging Google again with the right sitemap name, or add it manually in Google Webmaster Tools.

  27. Khal Says:

    I installed it on my site and worked fine so far, i created a sitemap and managed to ping all search engines.
    Thanks
    Khal

  28. Stuart Says:

    I have tried this script, unmodified, on PHP4 and I get the following message when I run it:
    WARNING: Connection failed (111) Connection refused
    DEBUG: URL “http://www.tractiontime.co.uk/” Blacklisted. Reason : Empty Page

    The page I presume it’s referring to is index.php, and I know it's not empty. Can you shed any light on this please?
    Many thanks

  29. wkarim Says:

    @Stuart
    please check your connection from the server running the script to http://www.tractiontime.co.uk, check what you get with < ?php echo file_get_contents( 'http://www.tractiontime.co.uk/'); ?>

  30. Stuart Says:

    Hi,
    I created a new file within the sitemap directory called test.php and copied in the above short script, removing the space after the first < in your example. The results are as follows:
    Warning: file_get_contents(http://www.tractiontime.co.uk/) [function.file-get-contents]: failed to open stream: Connection refused in /home/sites/tractiontime.co.uk/public_html/sitemap/test.php on line 1

  31. wkarim Says:

    @Stuart
    There must be something wrong with your connection.

  32. jorge Says:

    When I go to my page energyenhancement.org/sitemap_creator/sitemap.php it asks me for a login.
    Where can I find it?

  33. wkarim Says:

    @Jorge
    The password is 'demopass'.
    Change it to any password you like in the config file.

  34. Antonimo Says:

    After crawling the site, the page displays all of the URLs etc. At the end of the page is the URL to add to the crontab.

    The page then shows a link to “Create Sitemaps”. The sitemap has a date stamp. Is the XML sitemap created automatically when it is run from the crontab? Does it have a new time stamp?

    If the ping is enabled in the config file, will the correct sitemap be “pinged” to the search engines?

  35. wkarim Says:

    @Antonimo
    Yes, all sitemaps are time stamped and the ping is for the newest one created.
    If the request does not contain a time stamp, the script will send the most recently created sitemap.

  36. Antonimo Says:

    @wkarim
    Thanks for the quick response.
    Cron set up - looking forward to seeing the results.
    Excellent script - kudos

  37. Antonimo Says:

    Hi again,

    I am having great difficulty getting the cron to work.

    I have other crons that run fine using the command /usr/bin/php -q /home/DOMAIN/public_html/BackUp.php (for example)

    When I try to run the sitemap creator from cron (/usr/bin/php -q /home/DOMAIN/public_html/sitemap/sitemap.php?do=createsitemap), I receive an error: “No input file specified”

    Is there a special way to do this, or can I call the “do” script from another PHP file?

  38. wkarim Says:

    @Antonimo
    The script is not designed to run from the command line; alternatively, you can use lynx with crontab. The command should look something like this (the URL is quoted so the shell does not interpret the &):
    lynx -dump "http://example.com/sitemap/sitemap.php?do=createsitemap&hash=blah"

  39. Dave Says:

    This looks great, but any idea why I get this error and only one page?

    Warning: Division by zero in /myhost/public_html/sitemap/.function.inc.php on line 320

    Finished crawling mydomain.com, Crawled 1 links

  40. wkarim Says:

    @Dave
    I cannot investigate the problem without knowing your domain. You might want to make sure no redirects are made.

 
