
Sitemap Creator 0.2a: Create Sitemaps 0.9 valid for Google, Yahoo, MSN, Ask.com and Moreover

New Sitemap Creator Beta available
Sitemap Creator 0.2a is different from version 0.1: it is now able to crawl your website, create your sitemaps, ping Google, Yahoo, MSN, Ask.com and Moreover.com with the location of your sitemaps, and send you email alerts when sitemaps are created or crawled by a bot. The crawler saves sitemap data into an easy-to-edit CSV file.

Download (build 20080109):
sitemap_creator.tar.gz | sitemap_creator.zip

  • Pingback: Sitemap Creator 0.1 : Create Sitemaps 0.9 valid for Google, Yahoo! and MSN Sitemaps :: GadElKareem

  • Has anyone gotten this to work? I have spent several hours on it with no luck; I can't get it to redirect, I think.

  • @Jerry : Can you describe the error?

  • Mike

    I’ve also tried with no success (although I’m not proficient with PHP). Are there more detailed instructions available?

  • I get:
    Parse error: syntax error, unexpected ‘=’, expecting ‘)’ in /home/hqcodec/public_html/.function.inc.php on line 99

  • @Flash Buddy:
    What PHP version are you using?
    A quick fix would be to remove the ‘&’ before ‘$val’ on that line.
    Let me know if it works.

  • Chris

    I had the same error as Jerry and removed the “&” on the various lines with errors.
    Now the program starts, but if I click on “Crawl” I get the following error: “Call to undefined function: curl_setopt_array() in /www/htdocs/v031207/.function.inc.php on line 218”
    My PHP version: 4.4.8

  • @Chris
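    You should either recompile PHP with cURL support or set ‘SMC_USE_CURL’ to false in the configuration file. Note that curl_setopt_array() only exists from PHP 5.1.3 on, so on PHP 4.4.8 disabling cURL is the option that applies.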
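A minimal sketch of deriving that flag at runtime instead of hard-coding it, assuming SMC_USE_CURL is a plain define() in .config.inc.php, as the replies in this thread show. The safe_mode/open_basedir check mirrors the CURLOPT_FOLLOWLOCATION warning reported later in this thread; the variable name $smcCurlUsable is hypothetical.

```php
<?php
// Sketch for .config.inc.php, assuming SMC_USE_CURL is a plain define().
// Enable cURL only when curl_setopt_array() exists (PHP 5.1.3+ built with
// cURL) and neither safe_mode nor open_basedir is set; on the PHP versions
// discussed in this thread either one blocks CURLOPT_FOLLOWLOCATION.
$smcCurlUsable = function_exists('curl_setopt_array')
    && !ini_get('safe_mode')
    && !ini_get('open_basedir');

if (!defined('SMC_USE_CURL')) {
    define('SMC_USE_CURL', $smcCurlUsable);
}
```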

  • The script is tested against PHP v5.2.5; however, I tried to make it as compatible as possible with previous versions of PHP.
    The error
    ‘Parse error: syntax error, unexpected ‘=’, expecting ‘)’ in /.function.inc.php on line 99’ is due to using a default value on a parameter passed by reference, which PHP 4 (e.g. 4.2.2) does not support.

    Solution:
    download Sitemap Creator 0.2 for PHP 4.2.2
    (fixed in the new build)
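A small illustration of the limitation described above. The helper name clean_list() is hypothetical; the real signature on line 99 of .function.inc.php is not shown here. The sketch only shows the PHP 4-friendly pattern of passing by value and returning the result:

```php
<?php
// PHP 5+ accepts a default value on a by-reference parameter:
//     function clean_list(&$val = array()) { ... }
// PHP 4 rejects that signature with "unexpected '=', expecting ')'".
// Hypothetical PHP 4-friendly variant: take the argument by value and
// return the modified copy instead of changing the caller's variable.
function clean_list($val = array())
{
    foreach ($val as $key => $item) {
        $val[$key] = trim($item);
    }
    return $val;
}

$items = clean_list(array(' a ', 'b '));
```

The trade-off is that callers must assign the return value (`$items = clean_list($items);`) instead of relying on in-place modification.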

  • nit

    After I install and edit the config file, go to the web page and enter the password I put in the config file, I get this:

    Warning: Cannot modify header information – headers already sent by (output started at /home/osgames/public_html/sitemap/sitemap/sitemap.php:54) in /home/osgames/public_html/sitemap/sitemap/.function.inc.php on line 703

    When I click on the “Crawl site” link, it puts me back at the password screen, and it just goes on and on…

    php -v
    PHP 5.2.1 (cli) (built: Feb 23 2007 08:00:24)
    Copyright (c) 1997-2007 The PHP Group
    Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
    with Zend Extension Manager v1.2.0, Copyright (c) 2003-2006, by Zend Technologies
    with Zend Optimizer v3.2.2, Copyright (c) 1998-2006, by Zend Technologies

    MySQL is 4.1.22

    Thanks.

  • nit

    Forgot to add:

    the redirect doesn’t work either. It loads for a moment, then it’s a blank page.

    Every link puts me back to the login screen.

  • @nit
    Thank you for reporting this error; I’m not sure how I didn’t get it on my server!
    A new build is available with all reported errors fixed.

  • nit

    np thx for this nice script..

  • I don’t know what I have done, but I get lots of errors:

    Warning: Division by zero in C:\xampp\htdocs\New Folder\sitemap\.config.inc.php on line 52

    Warning: Cannot modify header information – headers already sent by (output started at C:\xampp\htdocs\New Folder\sitemap\.config.inc.php:52) in C:\xampp\htdocs\New Folder\sitemap\sitemap.php on line 49

    Warning: file(C:\xampp\htdocs\New Folder\sitemap/data/sites/) [function.file]: failed to open stream: No such file or directory in C:\xampp\htdocs\New Folder\sitemap\.function.inc.php on line 570

    Warning: Invalid argument supplied for foreach() in C:\xampp\htdocs\New Folder\sitemap\.function.inc.php on line 572

    I don’t know what’s happening. Can anyone help me?

  • Please re-download and extract the files and run the script without any editing; it appears that you’ve added unneeded characters to the config file.

  • Hi,
    How do I exclude the directories I don’t want listed or crawled BEFORE running the script, and where is the blacklist file or directory?

  • @Reiner
    The current version, Sitemap Creator 0.2a, does not have an exclude-directory feature; you can edit the created CSV file with any text editor to do that.
    The blacklist directory path depends on your configuration: the default is sitemap/data/errors/, and you can change it in the config file via SMC_DATA_ERRORS.

  • shaun

    I click “Crawl site” and it says “Crawling, please wait”, and after 4 seconds I get this error:

    Fatal error: Call to undefined function: stripos() in /home/shaun/public_html/site/sitemap/.function.inc.php on line 233

  • shaun

    Ahh, I’m running PHP 4.4.8, and stripos() is PHP 5 only. I changed stripos to strpos and now it is running; however, it isn’t case-insensitive anymore. Not sure what the result will be, but it is running.

  • @shaun
    Yeah, sorry for that one… you can use this instead:
    strpos(strtolower($header['content_type']), 'text')
    I guess the script might need PHP 5 to work; anyway, let me know if this fix works.
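Instead of editing each call, a common pattern is a compatibility shim that defines stripos() only when it is missing. This is a sketch, not part of Sitemap Creator; its lower-casing follows the same strtolower() approach as the fix above, and the sample $header value is made up for illustration:

```php
<?php
// Hypothetical compatibility shim: define stripos() only when the running
// PHP lacks it (PHP 4). It compares lower-cased copies with strpos().
if (!function_exists('stripos')) {
    function stripos($haystack, $needle, $offset = 0)
    {
        return strpos(strtolower($haystack), strtolower($needle), $offset);
    }
}

// A content-type check in the same style as the script's.
$header = array('content_type' => 'TEXT/html; charset=UTF-8');
$isText = isset($header['content_type'])
    && stripos($header['content_type'], 'text') !== false;
```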

  • Does this thing work???

  • @Mafiozy
    Yes, it’s working.

  • Hello,

    I need some help installing this script.
    http://www.webynux.net/sitemap.php is a blank page…

    I think I have a problem configuring .config.inc.php.

    Could you please help me?

  • You need to access the script through
    /sitemap/sitemap.php
    I think you have disabled error reporting in your php.ini file; you need to enable it to see what error you have.
    Also, please let me know your PHP version.

  • Warning: tempnam() [function.tempnam]: open_basedir restriction in effect. File(/tmp) is not within the allowed path(s): (/home/www/tmp/:/usr/local/lib/php:/usr/local/bin:/home/www/pfff) in /home/www/pfff/www/sitemap/.function.inc.php on line 87

    Warning: curl_setopt_array() [function.curl-setopt-array]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in /home/www/pfff/www/sitemap/.function.inc.php on line 218
    ERROR6: Couldn’t resolve host ‘wwwwebynuxnet’ for URL http://wwwwebynuxnet/

    No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host

    Why are the “.” characters missing between www, webynux and net?

  • Hello,

    I just solved the matter by uploading the config file without any changes, BUT now my website can’t be crawled…

    Here’s the error message:

    No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host

    but I don’t know how to set the domain…

    Please help 😉

  • @pfff
    I think you do not have enough privileges to run cURL on your server; please disable cURL:
    define('SMC_USE_CURL', false);
    Then clean your cache and blacklist folders, as the index page might have been blacklisted.
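For reference, emptying those folders can be scripted. This sketch assumes the default layout sitemap/data/cache/ and sitemap/data/errors/ (the SMC_DATA_CACHE and SMC_DATA_ERRORS defaults quoted later in this thread); the helper clear_dir() is hypothetical and removes only regular files:

```php
<?php
// Hypothetical helper: delete the regular files inside a folder, leaving the
// folder itself and any sub-folders alone. Note that glob('*') skips
// dot-files. Intended for the cache and blacklist (errors) folders.
function clear_dir($dir)
{
    $files = glob(rtrim($dir, '/') . '/*');
    if ($files === false) {
        return 0;
    }
    $removed = 0;
    foreach ($files as $file) {
        if (is_file($file) && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}

// Example (adjust paths if you changed SMC_DATA):
// clear_dir('sitemap/data/cache/');
// clear_dir('sitemap/data/errors/');
```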

  • Hi,
    I get the following message:

    No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host

    I have an index.php in the root folder which redirects to http://www.example.com/md/index.php

    index.php

    Any idea what’s going wrong here?

  • Some additional info:
    CURL:
    define('SMC_USE_CURL', false);
    Then clean your cache and blacklist folders as the index page might have been blacklisted

    I did this also.

  • @Ruud
    You have your domain set with www at the beginning, while no link on your pages begins with ‘www’. The script can crawl sub-domains of the configured domain (e.g. www.example.com when the domain is example.com), but not the reverse.
    Set your domain without ‘www’.
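A rough illustration of the scope rule described in that reply, written as a hypothetical in_scope() helper; it is not Sitemap Creator's actual matching code. A link is kept when its host equals the configured domain or ends with "." plus that domain:

```php
<?php
// Hypothetical illustration of the rule described above; not Sitemap
// Creator's actual code. A URL is in scope when its host equals the
// configured domain or is a sub-domain of it.
function in_scope($url, $domain)
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $domain = strtolower($domain);
    $suffix = '.' . $domain;
    return $host === $domain
        || substr($host, -strlen($suffix)) === $suffix;
}
```

Under this rule, in_scope('http://www.example.com/a.html', 'example.com') is true, while in_scope('http://example.com/a.html', 'www.example.com') is false, which is why dropping the www helps.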

  • webmaster

    Warning: Cannot modify header information – headers already sent by (output started at /var/www/virtual/luvshades.com/htdocs/sitemap.php:54) in /var/www/virtual/luvshades.com/htdocs/.function.inc.php on line 703

    When I try any function, it goes back to the sign-in page.

  • @webmaster
    Please download the new build; it should fix this problem.

  • Okay, I removed the www, but I still get this:
    NOTICE: Document type is text/html for URL http://example.com/md/index.php
    NOTICE: Document type is text/html for URL http://example.com/
    No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host

  • @Ruud
    Please replace the files with the original ones and empty the cache and errors folders.
    That error should not be possible!

  • webmaster

    Hi,

    I have downloaded the most recent script and still get the same error.

  • @webmaster
    Can you copy and paste line 54 of sitemap.php?

  • webmaster

    script language=”javascript” type=”text/javascript

    I had to remove the brackets to post.

  • @webmaster
    That’s not the same line as in the latest build; please re-download.

  • “please replace the files with the original ones”
    What do you mean by this?
    I changed http://www.example.com to http://example.com everywhere.

    Cache and errors are empty; the result is the same.

  • webmaster

    Hi,

    Now I get:

    Warning: tempnam() [function.tempnam]: open_basedir restriction in effect. File(/tmp) is not within the allowed path(s): (/var/www/virtual/anysite.com/:/usr/share/php/:/tmp/) in /var/www/virtual/anysite.com/htdocs/sitemap/.function.inc.php on line 87

    Warning: file_exists() [function.file-exists]: open_basedir restriction in effect. File(/smc_cookies) is not within the allowed path(s): (/var/www/virtual/anysite.com/:/usr/share/php/:/tmp/) in /var/www/virtual/anysite.com/htdocs/sitemap/.function.inc.php on line 143

    Also, I have two questions:

    1. I see a folder with sitemaps. Is there supposed to be a separate sitemap.php for the root and then a separate folder called sitemap?

    2. The script keeps stating that my “data” folder may not exist or might not be writable. However, it does exist and is writable.

  • @Ruud
    Remove all edited PHP files and replace them with the ones from the latest build.
    Do not add ‘http://’ at the beginning of your domain.

  • @webmaster
    For the warnings, you either need to fix permissions or disable the use of cURL in the config file.

    1. Yes, there are two sitemap.php files, and you are supposed to use the one inside the sitemap folder.
    To extract directly on your server, run:
    tar xzf sitemap_creator.tar.gz

    2. Same as above.

  • I use your latest build,
    Sitemap Creator 0.2 alpha build 20080109,

    and use this in .config.inc.php:
    define('SMC_SITE', $_SERVER['HTTP_HOST']);

    if ( isset($header['content_type']) && strpos(strtolower($header['content_type']), 'text') === false ){
    _error("Document type is {$header['content_type']} for URL {$url}");

    After this it stops, I think.

  • @Ruud
    If you’re using $_SERVER['HTTP_HOST'], then you need to access the script from http://example.com, not http://www.example.com, because HTTP_HOST holds whatever host name the browser requested.

  • This is how I access the script:

    http://example.com/sitemap/sitemap.php

  • @Ruud
    Yes, that should work.

  • neeraj

    Hello,
    I have installed it at mmmec.com, but I have a forum under mmmec.com/forum. How can I make it map my forum too, and where should I set the link depth for it?

  • @neeraj
    As long as there’s a link to the forum, it should crawl it with no problem.

  • I get the error “No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host”. What must I do?

  • @Dhyar IRdiansyah
    Make sure the links on the page use the same domain as your localhost, or change the domain in the config file.

  • Guys and girls… I downloaded the most recent version from hotscripts.com and edited .config.inc.php. All I changed was the prefix for my email and the password (I did not touch the HTTP_HOST). I uploaded the files to my site, ran chmod 777 on the data folder, and accessed the site via this link: http://domain.com/sitemap/sitemap.php (NO WWW)

    It went through and indexed everything fine. Works like a charm, thanks webmaster!

  • @Andrew
    Thank you for your comment.

  • Is there a way to limit the depth in a single directory? I have a calendar, and once the crawler goes into that area it fills the sitemap up fast.

  • @Darren
    I’m afraid this is not available; I will try to include it as an option in the config file in future releases.
    A workaround is to edit the CSV file and delete the links you do not want included in the sitemap, or, if you do not want that directory crawled at all, blacklist it.
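The CSV workaround can be automated. The sketch below assumes the URL is the first column of the crawl CSV (the exact column layout isn't documented in this thread), and filter_csv() is a hypothetical helper; '/calendar/' would be an example prefix for Darren's case:

```php
<?php
// Hypothetical sketch: copy a crawl CSV, dropping rows whose URL path starts
// with one of the excluded prefixes. Assumes the URL is in the first column.
function filter_csv($in, $out, array $excludedPrefixes)
{
    $src = fopen($in, 'r');
    if ($src === false) {
        return false;
    }
    $dst = fopen($out, 'w');
    if ($dst === false) {
        fclose($src);
        return false;
    }
    $kept = 0;
    while (($row = fgetcsv($src, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue; // fgetcsv() returns array(null) for a blank line
        }
        $path = (string) parse_url((string) $row[0], PHP_URL_PATH);
        $excluded = false;
        foreach ($excludedPrefixes as $prefix) {
            if (strpos($path, $prefix) === 0) {
                $excluded = true;
                break;
            }
        }
        if (!$excluded) {
            fputcsv($dst, $row, ',', '"', '\\');
            $kept++;
        }
    }
    fclose($src);
    fclose($dst);
    return $kept;
}
```

Writing to a separate output file keeps the original CSV intact until you have checked the result.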

  • I keep getting this error when crawling. I’ve tried raising the timeout, to no avail:

    WARNING: Connection failed (0)
    WARNING: Connection failed (0)
    NOTICE: Document type is application/xml for URL http://mysite.com/clients/announcements.xml
    WARNING: Connection failed (0)
    NOTICE: Document type is application/zip for URL http://mysite.com/themes/dreamland.zip
    NOTICE: Document type is application/zip for URL http://mysite.com/themes/Green-Glow_000.zip
    NOTICE: Document type is application/zip for URL http://mysite.com/themes/mountain-dawn.zip
    NOTICE: Document type is application/zip for URL http://mysite.com/themes/yalla.zip

  • @Tammy
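    This might be related to a concurrent-connection or per-session bandwidth limit on your server.
    To solve this problem, try commenting out this line:
    $get .= "Connection: close\r\n\r\n";
    in the .function.inc.php file. Note that this line also supplies the blank line that ends the request headers, so if you comment it out, keep that terminator (e.g. $get .= "\r\n";), otherwise the server may wait for more headers.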
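To see why that header line matters, here is a hypothetical build_get_request() that assembles a raw HTTP/1.1 request string of the sort written to a socket. It is not the script's own code; it only shows that the header block must end with an empty line whether or not Connection: close is sent:

```php
<?php
// Hypothetical sketch, not the script's code: build a raw HTTP/1.1 GET
// request. However the headers are chosen, the block must end with an empty
// line ("\r\n\r\n"), or the server keeps waiting for more headers.
function build_get_request($host, $path, $closeConnection = true)
{
    $get  = "GET {$path} HTTP/1.1\r\n";
    $get .= "Host: {$host}\r\n";
    if ($closeConnection) {
        $get .= "Connection: close\r\n";
    }
    $get .= "\r\n"; // blank line that terminates the header block
    return $get;
}
```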

  • Thank you, wkarim. I did that, but now when it crawls it runs for the whole 350 seconds and then tells me:
    No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host

  • @Tammy
    Please change SMC_USE_WWW to true and add www to your domain, as all the links shown there include www.

  • Pingback: Sitemap Creator - Create sitemaps for Google, Yahoo, MSN and Ask.com :: php and javascript

  • This is just what I was looking for for a week, and 50 scripts later, yep, it works like a charm. My site is huge, so I upped the memory to 600 and the timeout to 1800, and it runs like a Swiss watch.

    I CANNOT SAY THANK YOU ENOUGH!!!

  • @shawn
    Glad it helped; thanks for the nice words.

  • wkarim,

    Can you tell me about these settings:
    /*internal settings*/
    define('SMC_VERSION', '0.2a');
    define('SMC_DATA_CACHE', SMC_DATA.'cache/');
    define('SMC_DATA_SITES', SMC_DATA.'sites/');
    define('SMC_DATA_ERRORS', SMC_DATA.'errors/');
    define('SMC_DATA_SITEMAPS', SMC_DATA_SITES.SMC_SITE.'_sitemaps/');
    $pings = array (
    'Google' => 'http://www.google.com/webmasters/sitemaps/ping?sitemap=',
    'Yahoo' => 'http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=SitemapWriter&url=',
    'Live Search' => 'http://webmaster.live.com/ping.aspx?siteMap=',
    'Ask.com' => 'http://submissions.ask.com/ping?sitemap=',
    'MoreOver' => 'http://api.moreover.com/ping?u=',
    );
    Does any of this have to be edited, and will this script automatically ping the search engines? Google asks for sitemap.xml in your root directory, and the script does not save one there that I can see; please explain how this works! The script is awesome. Also, in the admin section it would be nice if the admin could edit or add to the robots.txt file. I have set up a cron task to run every hour and it seems to work, but does the cron task save a new sitemap or just update the existing one? I have an auction site, so it will be changing all the time.

  • Sitemap: http://floafieds.com/sitemap.php?do=showsitemap&sm=sitemap.xml.gz
    When Google fetches this, it gets an internal server error. Is there any fix?

  • @shawn
    – You do not need to edit the internal settings.
    – The script provides another page, ‘sitemap.php’, which should be placed in the root directory; it redirects the bot to the sitemap location.
    – There is an option in the admin area to add the sitemap URL to robots.txt.
    – The script creates a new sitemap when the crawl date differs by a day from the last created sitemap; you can create a cron job to delete old sitemaps.
    – The sitemap at the posted URL does not exist; please regenerate it.
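A possible cron helper for pruning old sitemaps, sketched under assumptions: the folder is whatever SMC_DATA_SITEMAPS points to, and generated files end in .xml.gz (as in sitemap20080711.xml.gz further down the thread). The function name and the age threshold are hypothetical:

```php
<?php
// Hypothetical cron helper: delete generated sitemap files older than
// $maxAgeDays from the sitemaps folder, judged by file modification time.
// The *.xml.gz pattern is an assumption based on names in this thread.
function prune_sitemaps($dir, $maxAgeDays, $now = null)
{
    $now = ($now === null) ? time() : $now;
    $cutoff = $now - $maxAgeDays * 86400;
    $deleted = array();
    $files = glob(rtrim($dir, '/') . '/*.xml.gz');
    if ($files === false) {
        return $deleted;
    }
    foreach ($files as $file) {
        if (is_file($file) && filemtime($file) < $cutoff && unlink($file)) {
            $deleted[] = basename($file);
        }
    }
    return $deleted;
}
```

A small PHP file that calls, say, prune_sitemaps('/path/to/sitemaps/', 7) can then be run daily from cron; the path is a placeholder.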

  • So do you need to add anything to these?
    'Google' => 'http://www.google.com/webmasters/sitemaps/ping?sitemap=',
    'Yahoo' => 'http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=SitemapWriter&url=',
    'Live Search' => 'http://webmaster.live.com/ping.aspx?siteMap=',
    'Ask.com' => 'http://submissions.ask.com/ping?sitemap=',
    'MoreOver' => 'http://api.moreover.com/ping?u=',

  • Sorry, I do have this sitemap.php file, and it is in the root directory. There is also a reference to it in robots.txt!

    I have 3 domains and want them all crawled separately. They seem to be working, but what do I need to tell Google my sitemap is? sitemap.php?

    If we get this working properly, I will help you promote it by placing a link on my sites!

  • @shawn
    – No, you mostly do not need to edit this.
    – For crawling more than one domain, keep SMC_SITE set to $_SERVER['HTTP_HOST'] and run the script from each domain; every domain will produce its own sitemap, which you can submit in Google Webmaster Tools.
    – You are welcome to show a link back to Sitemap Creator.
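For context on the $pings array quoted above: each entry looks like an endpoint prefix that expects the sitemap URL appended to it. A hedged sketch of that idea follows; build_ping_urls() is hypothetical and only builds the URLs without sending requests. The sitemap URL must be URL-encoded because it contains its own ? and &. Several of these 2008-era endpoints have since been retired, so treat them as examples only.

```php
<?php
// Hypothetical sketch: build one ping URL per engine by appending the
// URL-encoded sitemap location to each endpoint prefix from $pings.
// It only builds the URLs; it does not send any requests.
function build_ping_urls(array $pings, $sitemapUrl)
{
    $urls = array();
    foreach ($pings as $engine => $prefix) {
        $urls[$engine] = $prefix . urlencode($sitemapUrl);
    }
    return $urls;
}

$pings = array(
    'Google'  => 'http://www.google.com/webmasters/sitemaps/ping?sitemap=',
    'Ask.com' => 'http://submissions.ask.com/ping?sitemap=',
);
$urls = build_ping_urls($pings, 'http://example.com/sitemap.php?do=showsitemap&sm=sitemap.xml.gz');
```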

  • sl

    stripos() and curl_setopt_array() are not PHP 4.x compliant, so I fixed the following lines in .function.inc.php.

    line 217 (keep it):
    $ch = curl_init($url);

    and replace the curl_setopt_array() call on line 218 with:
    foreach ($options as $myoption => $myvalue) {
        curl_setopt($ch, $myoption, $myvalue);
    }

    line 233:
    if ( isset($header['content_type']) && stripos($header['content_type'], 'text') === false ){

    changed to:
    if ( isset($header['content_type']) && strpos(strtolower($header['content_type']), 'text') === false ){
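The same loop can be wrapped as a shim so the original curl_setopt_array() call can stay untouched. This is a sketch, not part of the script; like the built-in function, it stops and returns false at the first option that cannot be set:

```php
<?php
// Hypothetical shim for PHP versions older than 5.1.3 that have the cURL
// extension but lack curl_setopt_array(). Like the built-in, it returns
// false as soon as one option cannot be set and skips the remaining ones.
if (function_exists('curl_setopt') && !function_exists('curl_setopt_array')) {
    function curl_setopt_array($ch, $options)
    {
        foreach ($options as $option => $value) {
            if (!curl_setopt($ch, $option, $value)) {
                return false;
            }
        }
        return true;
    }
}
```

If the cURL extension is missing entirely, the shim is not defined, and SMC_USE_CURL should be set to false as suggested earlier in the thread.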

  • Thanks for the sitemap script; it’s working great.
    Nice work.

  • sl, I changed .function.inc.php as you described and it now crawls my site, but when I go to create the sitemaps it says “Please crawl domain.com first”. Any suggestions?

  • I assume that this script does not work with PHP safe mode?

  • I turned off safe mode; however, when I crawl the site it crawls perfectly, then I click “Create Sitemaps” and it tells me I must crawl the site first.

    I have cURL disabled and www set to true.

  • @bispak
    Make sure the script finishes crawling before you create the sitemap.

  • I’m sure crawling finished, with the result:
    Finished crawling http://www.tukarinfobispak.com, Crawled 72 links
    Took 26.69 Seconds, using 2MB of memory

    and then at the end it mentions the URL to add to crontab or scheduled tasks.

    Still, when I click “Create Sitemaps” I get the answer: “Please crawl tukarinfobispak.com first”.

    I use PHP Version 5.2.5

  • When I click “Display CSV File”, the result is the same as when I click “Create Sitemaps”:
    “Please crawl tukarinfobispak.com first”

  • The same result happens with or without “www”.
    The crawl succeeds, but the CSV file and sitemap are not created.
    What’s wrong? Am I just unlucky?
    🙁

  • @bispak
    Please use the new version, Sitemap Creator 0.2b.

  • Thanks, I’ll download and try Sitemap Creator 0.2b.

  • I downloaded, installed and ran Sitemap Creator 0.2b.
    Voilà, it works well. Yippeeee…

    sitemap20080711.xml.gz Created successfully

    Where can I see sitemap20080711.xml.gz?

    When I submit the sitemap to Google as SMC advises, I don’t know what to submit, since I can’t find the XML file under the root or sitemap folder.

    Please advise further, and sorry to keep you busy with my questions. Hope you don’t mind.

  • Frank

    Hi,
    First, thank you for this great script. It works perfectly in a browser, but it doesn’t work from the console: the only output is the HTML markup from sitemap.php, nothing is crawled and no sitemap is created. Do you have a workaround for crawling from the console/cron?
    Thank You

  • @Frank
    Good point! It has not been tested from the console; however, I will consider adding a console interface in future versions.

  • Sitemap Creator is the best.

  • It works well on my site, but how do I add the sitemap in Google Webmaster Tools? Which file should be added?

  • @prakash
    It should be at http://example.com/sitemap.php?do=showsitemap&sm=sitemap.xml.gz
    You might use rewrite rules to create a simpler URL.
    Make sure you use the new version at http://gadelkareem.com/2008/05/15/sitemap-creator-02-beta/

  • DoubleClic

    I have an error with URLs containing a quote.
    Example:
    http://www.site.com/she's.htm

    The script returns only the first part:
    http://www.site.com/she

    Thanks for your script.

  • @DoubleClic
    You might need to urlencode() your URL:
    http://www.example.com/she's.htm should be http://www.example.com/she%27s.htm
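A minimal sketch of encoding only the path of a URL, one segment at a time, so the '/' separators survive. It uses rawurlencode(), which encodes spaces as %20 rather than '+' and is therefore the usual choice for path segments; the helper name encode_path() is hypothetical. Don't run it on a path that is already encoded, or '%' will be encoded twice:

```php
<?php
// Hypothetical helper: percent-encode each segment of a URL path while
// keeping the "/" separators, so characters such as ' become %27.
function encode_path($path)
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

$url = 'http://www.example.com' . encode_path("/she's.htm");
```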

  • Great script. It works perfectly.

  • Pingback: Sitemap Creator- create sitemap for google,yahoo,msn and ask.com | SAYFARZ

  • Great, I needed this for my site.

  • Thank you for the script. It worked OK, I think. But now I have two questions:
    1. Does the program automatically ping Google, Yahoo and the others?
    2. Where is the XML file created, so we can put it in Google Webmaster Tools?

    Thank you

  • The sitemap looks good on screen using the URL below, but Google Webmaster Tools shows errors and says wrong file type.
    http://www.reddotdeals.com/sitemap.php?do=showsitemap&sm=20090328.xml.gz

    I actually copied the output this generates, saved it as sitemap.xml, and that works perfectly.

    I would suggest the script create ‘sitemap.xml’ either on demand or via cron.

    Respect for the work on this script.

    Michael
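Until such an option exists, a hypothetical helper could decompress a generated .xml.gz into a plain sitemap.xml. The paths are placeholders (the generated files live under the SMC_DATA_SITEMAPS folder quoted earlier in this thread), and gzdecode() needs the zlib extension and PHP 5.4 or later:

```php
<?php
// Hypothetical helper: write a plain sitemap.xml from a generated .xml.gz
// so it can be submitted directly. Returns the number of bytes written,
// or false if the input cannot be read or decompressed.
function export_plain_sitemap($gzPath, $xmlPath)
{
    $compressed = file_get_contents($gzPath);
    if ($compressed === false) {
        return false;
    }
    $xml = gzdecode($compressed);
    if ($xml === false) {
        return false;
    }
    return file_put_contents($xmlPath, $xml);
}
```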

  • Thanks for it.
    I have downloaded it.

  • Can you tell me where to download the newest version?

  • iam

    My WP has a comment like this:
    couldn’t resolve host ‘search.yahooapis.com’

  • Nice work, thanks!

  • Wkarim,

    Your script works excellently on my site. You are the best.

    I have a question: I got a 404 error on a very, very long URL. The URL also has “,” and “+”. How can I overcome this error? It’s very important that this URL gets indexed. I will have many more like it, so I don’t want to edit them manually every time. Please advise. Here is the URL that had the 404 error:

    http://cupodona.com/search/word/Valentines,+Valentine's,+jewelry,+diamond,+chocolate,+flower,+her,+him,+love,+candy,+heart,+ring,+fragrance,+wine,+lingerie,+stone,+perfume/1.html

    Thanks in advance for your advice.

  • The URL placed in my robots.txt file gets a 404 error when I enter it in the browser’s address bar. Here is the URL: http://www.wereviewitall.com/sitemap.php?do=showsitemap&sm=sitemap.xml.gz

    Is this correct?

  • Sorry, I discovered that it is correct. I did not have the first sitemap.php in the correct directory.

    Does it submit the correct type of sitemap (xml or text)?

  • Why are you all trying to do your link building with nofollow links instead of writing real comments about whether you got the script working, testing it on your site, and letting us know any feedback or how you overcame installation issues? That is what this comment area is about.