Sitemap Creator 0.2 beta

A new version is available. Click here.

Sitemap Creator crawls your website and creates XML sitemaps compatible with the standard sitemaps.org protocol supported by Google, Yahoo!, MSN, and MoreOver. The script pings Google, Yahoo!, MSN, and MoreOver so their bots download the sitemap file, then tracks each bot, sends you an email on every scan of your sitemap, and gives you a full report of the search engine's response.
Sitemaps are created from a CSV file, which can easily be edited with any text editor before the sitemap is created.


Sitemap Creator has three built-in ranking mechanisms that decide the priorities of your pages based on the number and placement of link backs, first-crawled links, or URL structure. You can also limit the crawler by memory, run time, or number of URLs.

    beta info

  • cURL is no longer needed; all requests are processed through fsockopen.
  • Bug fix to the link-back ranking functions.
  • The crawler can be limited to a set number of URLs.
  • Crawling of specific directories or links can be disabled; regular expressions are supported.
  • The number of links shown on the start page can be limited.

The script was tested on PHP5; let me know how it works on PHP4.
An online demo may be available with the next release.
Download (build 20080514) :
sitemap_creator.tar.gz | sitemap_creator.zip

Recommended posts:

  • Pingback: Sitemap Creator 0.2 beta Released « jared.brodsky

  • http://www.greatertalent.com gtnman

    What is the URL you are supposed to submit to google for the sitemap?

  • http://gadelkareem.com wkarim

    @gtnman
    Please use “Add reference to robots.txt” to show the default sitemap URL, and remember to chmod 666 robots.txt.
    It should look something like
    http://www.greatertalent.com/sitemap.php?do=showsitemap&sm=sitemap.xml.gz

  • http://www.greatertalent.com gtnman

    Interesting note. I have tested this script on two VPS servers, both running the latest version of Plesk, on two different web hosts. It seems that the unix command utime (which touch() uses) is not available in Plesk. Is touch vital to the script, or will is_writable do the same thing? (I made this change, and the robots.txt generated perfectly.)
    Does the cronjob re-generate robots.txt, or is this something that the user must do?

  • http://www.greatertalent.com gtnman

    Another note: I tested this on my ubuntu box, on which utime did exist, and it worked fine.

  • http://gadelkareem.com wkarim

    @gtnman
    - touch() is not related to Plesk; you need to have permissions to modify files in that directory. I cannot find any relation between the touch() function and utime. Make sure you 'chmod 777' the data directory.
    - robots.txt is not generated every time the sitemap is created; you only need to modify it once.

  • http://mirsoft.net abyzn

    hi
    help me
    i see this error:

    No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host

  • http://mirsoft.net abyzn

    where is the right setting for site

  • http://gadelkareem.com wkarim

    @abyzn
    Can you enable the debug mode from the configuration file and give me the results?

  • http://www.etdom.com Ferran

    I use your Sitemap Creator for a site and all goes OK: it shows a table with the URLs and no error messages. But when I click “Create Sitemaps” the XML is malformed and incorrect. Any suggestion?
    The Site is in ISO-8859-1 and Sitemap in UTF-8…

  • http://gadelkareem.com wkarim

    @Ferran
    Can you give me a link to your sitemap XML file?

  • http://www.etdom.com Ferran
  • Pingback: links for 2008-06-18 « Free Open Source Directory

  • http://gadelkareem.com wkarim

    @Ferran
    can you change constant SMC_GSS to false to disable GSS and try again?

  • http://www.etdom.com Ferran

    @wkarim
    I set to false SMC_GSS and the result is the same…
    When I look at CSV or the table of sitemap.php show the links… It may be the codification of the files? I try in my sites in UTF and works perfectly, but when i try in this site that’s in ISO, the xml is corrupt… I don’t understand it, the functions are correct and config too…

  • http://www.tukarinfobispak.com bispak

    Thanks wkarim, works well using this version.

    I use it on my site tukarinfobispak.com

    anytime search engine robot read my sitemap, email sent to my mailbox inform about their activity.

  • flobster

    Nice script! Just one question: I have several URLs ending with “?a=page:2” which I want to exclude from being crawled. Will this be possible using SMC_DISABLED_DIRS?

  • http://gadelkareem.com wkarim

    @flobster
    Yes, you can use regular expressions to disable those URLs; examples are included in the config file.
    ex: define('SMC_DISABLED_DIRS', '\?a=page:2');
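For readers wondering how such a pattern is applied, here is a minimal sketch of the kind of filter implied above. The constant name comes from the config file; the `is_disabled()` wrapper and the `#` delimiters are assumptions for illustration, not the script's actual code.

```php
<?php
// Sketch of a SMC_DISABLED_DIRS-style URL filter.
// The constant name is from the config file; is_disabled() and the
// '#' regex delimiters are assumptions, not the script's real code.
define('SMC_DISABLED_DIRS', '\?a=page:2');

function is_disabled($url)
{
    // Wrap the configured pattern in delimiters and test the URL.
    return (bool) preg_match('#' . SMC_DISABLED_DIRS . '#', $url);
}

var_dump(is_disabled('http://example.com/?a=page:2')); // bool(true)
var_dump(is_disabled('http://example.com/?a=page:1')); // bool(false)
```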

  • http://mym8.eu Bo

    hi,
    i get this error:
    WARNING: Page http://mym8.eu/ is redirecting to home.php
    DEBUG: URL “http://mym8.eu/” Blacklisted. Reason : Empty Page

    can you please advice?
    thank you in advance

  • http://gadelkareem.com wkarim

    @Bo
    It should continue crawling the site normally, but http://mym8.eu/ will not show in the sitemap as it’s a redirect page.

  • http://thenetradio.co.uk SearleCom

    Installed, crawls fine after logging in via http://thenetradio.co.uk/sitemap/sitemap.php

    Once crawl has finished I press Create Sitemaps and get Please crawl thenetradio.co.uk/index.php first.

    Server is running PHP 5.2.6.

    Any ideas?

    Mike

  • http://www.tiendalacor.com mikey

    Same here. I upgraded to the new version and still no luck: when I click the Crawl link it displays a number, then does nothing. If I click any other link it shows that the cache has around 36 elements, but nothing else, no list, nothing. I don’t know what else to do.

  • http://www.tiendalacor.com mikey

    I’ve been checking the log file, and the only error in it is this one:

    [05-Aug-2008 15:06:56] PHP Fatal error: Maximum execution time of 30 seconds exceeded in /home/xxxxxxx/public_html/sitemap/.function.inc.php on line 295

  • http://gadelkareem.com wkarim

    @mikey, SearleCom
    Please enable debug mode to give me a better idea about what is happening.

  • king

    I’m running PHP4 and 0.2b and cannot get it to run. I have got to the point where everything runs fine, but Google and the others return errors. The one below says the file is not present in 20080813, and there is no file:

    /hsphere/local/home/mgruppe/forkortelse.dk/sitemap/data/sites/www.forkortelse.dk_sitemaps/20080813 could not be found IP -: http://whois.domaintools.com/66.249.71.79
    Date -: 04:26:44 pm ( Wednesday 13 August 2008 ) Bot -: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) Location -: http://www.forkortelse.dk/sitemap.php?do=showsitemap&sm=20080813.xml.gz

    if i go to http://www.forkortelse.dk/sitemap.php?do=showsitemap&sm=20080813.xml.gz
    it returns that something is wrong with the xml file

    file attributes should be ok
    Thanks for a nice script!

    BR
    Henrik

  • http://gadelkareem.com wkarim

    @king
    Looks like you pinged Google with a sitemap and then deleted it. Try pinging Google again with the right sitemap name, or add it manually in Google Webmaster Tools.

  • http://www.bahherbalife.com Khal

    I installed it on my site and worked fine so far, i created a sitemap and managed to ping all search engines.
    Thanks
    Khal

  • http://www.tractiontime.co.uk Stuart

    I have tried this script, unmodified, on PHP4, and I get the following message when I run it:
    WARNING: Connection failed (111) Connection refused
    DEBUG: URL “http://www.tractiontime.co.uk/” Blacklisted. Reason : Empty Page

    The page I presume it’s referring to is index.php, and I know it’s not empty. Can you shed any light on this, please?
    Many thanks

  • http://gadelkareem.com wkarim

    @Stuart
    please check your connection from the server running the script to http://www.tractiontime.co.uk, check what you get with < ?php echo file_get_contents( 'http://www.tractiontime.co.uk/'); ?>

  • http://www.tractiontime.co.uk Stuart

    Hi,
    I created a new file within the sitemap directory called test.php and copied in the above short script, removing the space after the first < in your example. The results are as follows:
    Warning: file_get_contents(http://www.tractiontime.co.uk/) [function.file-get-contents]: failed to open stream: Connection refused in /home/sites/tractiontime.co.uk/public_html/sitemap/test.php on line 1

  • http://gadelkareem.com wkarim

    @Stuart
    There must be something wrong with your connection.
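For anyone else hitting “Connection refused”, a quick probe along the lines of what the script does with fsockopen (per the release notes) can help narrow it down. This is a generic sketch, not the script's code; the host is a placeholder.

```php
<?php
// Generic connectivity probe, a sketch only: returns the HTTP status
// line on success, or false on failure (the "Connection failed (111)
// Connection refused" case). The host below is a placeholder.
function probe_host($host, $port = 80, $timeout = 10)
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return false; // inspect $errno / $errstr for the reason
    }
    fwrite($fp, "HEAD / HTTP/1.0\r\nHost: $host\r\n\r\n");
    $status = trim(fgets($fp)); // e.g. "HTTP/1.1 200 OK"
    fclose($fp);
    return $status;
}

var_dump(probe_host('www.example.com'));
```

If this also fails, the problem is with outbound connections from the server rather than with Sitemap Creator itself.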

  • jorge

    When I go to my page energyenhancement.org/sitemap_creator/sitemap.php it asks me for a login.
    Where can I find it?

  • http://gadelkareem.com wkarim

    @Jorge
    The password is ‘demopass’.
    Change it to any password you like in the config file.

  • Antonimo

    After crawling the site, the page displays all of the URLs etc. At the end of the page is the URL to add to the crontab.

    The page then shows a link to “Create Sitemaps” – The sitemap has a date stamp. Is the XML sitemap created automatically when it is run from the crontab? Does it have a new time stamp?

    If the Ping is enabled in the config file, will the correct sitemap be “pinged” to the search engines?

  • http://gadelkareem.com wkarim

    @Antonimo
    Yes, all sitemaps are time-stamped, and the ping is for the newest one created.
    If the request does not contain a time stamp, the script will send the most recently created sitemap.

  • Antonimo

    @wkarim
    Thanks for the quick response.
    Cron set up – looking forward to seeing the results.
    Excellent script – kudos

  • Antonimo

    Hi again,

    I am having great difficulty getting the cron to work.

    I have other crons that run fine using the command /usr/bin/php -q /home/DOMAIN/public_html/BackUp.php (for example)

    When I try to run the sitemap creator from cron (/usr/bin/php -q /home/DOMAIN/public_html/sitemap/sitemap.php?do=createsitemap) I receive an error, “No input file specified”.

    Is there a special way to do this or can I call the “do” script from another php file?

  • http://gadelkareem.com wkarim

    @Antonimo
    The script is not designed to run from the command line; alternatively, you can use lynx with crontab. The command should look something like
    lynx -dump 'http://example.com/sitemap/sitemap.php?do=createsitemap&hash=blah'

  • Dave

    This looks great, but any idea why I get this error on one page?

    Warning: Division by zero in /myhost/public_html/sitemap/.function.inc.php on line 320

    Finished crawling mydomain.com, Crawled 1 links

  • http://gadelkareem.com wkarim

    @Dave
    I cannot investigate the problem without knowing your domain; you might want to make sure no redirects are made.

  • antiwow

    I get a message after crawling,

    Crawler Timed out after 200 seconds while crawling www. etc

    After that, it won’t let me generate a sitemap.

    I looked at the config, but couldn’t find anything.

    any ideas?

  • http://gadelkareem.com wkarim

    @antiwow
    Try enabling the debugging mode in the config file.

  • http://www.luciffere.ro Luciffere

    Hello… This is the best free sitemap creator… How is the cron function stopped or restarted? How can I set up a cron job for my site to make another sitemap monthly? Do I insert the link generated at the end of the crawl into a cron job from cPanel?
    Thanks

  • http://gadelkareem.com wkarim

    @Luciffere
    Thanks for the comment
    You need to refer to the cPanel documentation for creating cron jobs; for UNIX crontab, you may use the Lynx command-line browser to run the script.

  • http://www.luciffere.ro Luciffere

    After the script finished the job, it told me to put this link, http://www.luciffere.ro/sitemap/sitemap.php?do=createsitemap&secure=4c7a34d25eff9121c49658dbceadf694, in a cron job…
    My question is: if I put this link in my cron job and set it to monthly, Saturday 1 12:00 AM, will the script make a sitemap on the first day of every month and ping Google, Yahoo, etc.? Do I need to log in at http://www.google.com/webmasters/tools to insert the link to the sitemap? I had a manual sitemap at Google, http://www.luciffere.ro/sitemap.xml, but I deleted it. Is it necessary to insert a link into Google manually?
    Thanks for your work…

  • http://gadelkareem.com wkarim

    @Luciffere
    you can add “http://www.luciffere.ro/sitemap.php?do=showsitemap&sm=sitemap.xml.gz” to Google webmaster tools. For MSN webmaster tools please create a rewrite rule for this URL…it should look like “http://www.luciffere.ro/sitemap.xml.gz”

  • http://coma.su Drone

    Your script is very cool, but some webmasters may need to adjust max_execution_time in php.ini or in an .htaccess file.

  • http://www.klassiskabyggvaror.se Peter

    I’ve been using your SMC for a while. Nice work.
    I just realized that SMC cuts a URL where a space appears, instead of inserting %20. This breaks my links, and the sitemap points to a number of pages with MySQL errors from the incomplete URL.
    Is there more than the cleanurl() function to correct this?
    Is it about text encoding? I’m using iso-8859-1.

  • http://gadelkareem.com wkarim

    @Peter
    Anchor links should have escaped URIs; that means you should replace any spaces in your links with %20.
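A minimal illustration of that fix. Plain str_replace is used here because rawurlencode() applied to a full URL would also escape the slashes and colon; the URL is a made-up example.

```php
<?php
// Escape spaces in an already-assembled URL before the crawler sees it.
// rawurlencode() on the whole URL would also mangle "://", so a plain
// replacement of the space character is the simplest fix here.
$url = 'http://www.example.com/my page.html';
$escaped = str_replace(' ', '%20', $url);
echo $escaped . "\n"; // http://www.example.com/my%20page.html
```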

  • http://www.luciffere.ro luciffere

    Hello. I use your script, and it generates the sitemap, but Google tells me: Unsupported file format.
    Your Sitemap does not appear to be in a supported format. Please ensure it meets our Sitemap guidelines and resubmit.

    Address of my sitemap is: http://www.luciffere.ro/sitemap/sitemap.php?do=showsitemap&sm=20081221.xml.gz

  • http://www.klassiskabyggvaror.se Peter

    The URL sent to the search engines seems to be wrong.
    In my case it is missing the directory: sitemap_creator_0.2a/
    At least in the URL returned from MoreOver to my email.
    I have all default settings except SMC_SITE, where $_SERVER['HTTP_HOST'] doesn’t work; I replaced it with ‘klassiskabyggvaror.se’.

  • http://www.klassiskabyggvaror.se Peter

    I think I got it working. Found the 0.2b and did some reading.

    For editing: is it only the CSV I need to edit?
    What about the: data/sites/klassiskabyggvaro.se_sitemaps/20090114

    Thanks for a good script and active blog!

  • http://gadelkareem.com wkarim

    @luciffere
    You should submit http://www.luciffere.ro/sitemap.php?do=showsitemap&sm=20081221.xml.gz instead (note that there is no “sitemap” directory in the URL).
    @Peter
    You need to edit the CSV before creating your sitemap, in case you would like to change something.

  • Pedro

    Hi, I have installed the script and apparently works very fine (last version). I have Php 4x

    I have some doubts.
    A. Do I need to crawl http://www.mysite.com first and then mysite.com?

    B. What time interval for cron is the best option?

    Thanks

  • Pedro

    Another doubt.
    99% of the priorities in my sitemap are 0.1. Do I need to change the priorities manually?

    Thanks.

  • Pedro

    I think the lynx -dump command doesn’t work in my cron jobs. Any alternatives?

  • http://gadelkareem.com wkarim

    @Pedro
    - I think the cron job could run every week
    - priorities depend on your link structure; you might need to check your SEO
    - you can either install lynx ( yum install lynx ) or use curl

  • http://www.luciffere.ro luciffere

    Hello again… I want to make a sitemap with your script for http://www.sauber.ro, but the script crawls only 13 links… Many links that are linked from the first pages are listed with a 403 code, but they all work… What is the problem?
    Thanks

  • http://www.savef1.co.uk Rod

    I am having this error message: No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host

    can you advise me my code is as below:
    'http://www.google.com/webmasters/sitemaps/ping?sitemap=',
    'Yahoo' => 'http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=SitemapWriter&url=',
    'Live Search' => 'http://webmaster.live.com/ping.aspx?siteMap=',
    'Ask.com' => 'http://submissions.ask.com/ping?sitemap=',
    'MoreOver' => 'http://api.moreover.com/ping?u=',
    );

    ?>


  • http://scooterswebdirectory.com Scooter

    I found no major problems with your script, installing or otherwise! I was wondering if IE7 defeats the timeout in setcookie (smc_pass)?
    I have altered the time and have yet to see it not show a new login page. I can work around it by closing the browser. Do you have a “logout” feature, or must I always close the browser?
    Thanks for a neat script!!

  • http://gadelkareem.com wkarim

    @Scooter
    The cookie should be deleted when the browser is closed; try clearing your cookies if you want to log out.
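For a manual logout, expiring the cookie should also work. A sketch: the cookie name smc_pass is taken from the comment above, but this snippet is an assumption, not part of the script.

```php
<?php
// Hypothetical logout: expire the smc_pass cookie so the script's login
// prompt appears again. Must run before any output is sent, because
// setcookie() writes an HTTP header.
setcookie('smc_pass', '', time() - 3600);
```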

  • Harold

    I have successfully used Sitemap Creator 0.2 beta for over 2 months without any problems. Just recently Google can no longer read my sitemap and when I personally visited my sitemap it says

    “XML Parsing Error: no element found
    Location: http://cybercircuits.co.cc/sitemap.php?do=showsitemap&sm=20090209.xml.gz
    Line Number 1, Column 1:”

    Any ideas?

  • http://0daynews.org/ ZDN

    giving SMC a whirl and noticing some problems and have some suggestions…

    this is on LAMP with PHP v5.2.8 and a website of ~700 pages

    bugs???

    define('SMC_USE_BLACKLIST', true);

    - if set to false, SMC is unable to crawl at all (using http://domain/sitemap/sitemap.php?do=crawl). This is 100% repeatable.

    If I change it back to ‘true’, SMC fails to crawl and says “No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host”. I’m a bit foggy here, but I think I had to delete the SMC cache to get it to run again.

    suggestions…

    -ability to edit priority and frequency from browser instead of CSV file, and ability to select and edit multiple files at once.

    -ability to make priority and frequency static instead of being changed when re-crawled. ability to set per directory would be good here too.

    -ability to set default priority and frequency for a directory, or a directory and all sub directories, so any new or existing files found in these directories during a crawl or by a future crawl will inherit the same priority/freq.

    -meaningful names for blacklist directory files or, better yet, new link for sitemap.php that shows what was blacklisted

    -default protection for /sitemap/ directory so public can’t access

    i’ve been looking for quite a long time for a sitemap script that can auto-ping and auto-update (CRON) and i like what you have done. looks VERY promising!

  • http://www.ntandme.com Daniel

    Hi Karim, thanks a lot for this awesome script. I installed it with only a few adaptations and it works like a charm. It crawled 7395(!) pages on my site (time limit: 7000, memory: 600) without a hiccup. The sitemap was successfully published to the different search engines as well.

    Only issue right now is the created sitemap only contains 500 pages. :( Checked the config a couple of times, but found nothing. Am I missing something?

    Cheers
    Daniel

  • http://gadelkareem.com wkarim

    @ZDN
    thanks for the suggestions, I’ll consider them in new releases
    @Daniel
    Can you check the CSV file and let me know how many URLs it has?

  • http://www.blackpoolevents.co.uk Blackpool Shows

    Hi, thanks for creating this great sitemap creator. It is the best one I have used. I do have a couple of things, please don’t consider them criticisms they are not.
    1. I have an htaccess redirect on my site so that domain.com is rewritten to http://www.domain.com. This does cause a problem with the emails. It’s not a real problem, but I guess I am not the only one who uses mod_rewrite; maybe it’s something to consider in future versions?
    2. I have one page that has a space in the page name, and that is caught in the blacklist. I must admit the page name with a space was an error of mine many years ago, and I have never got round to fixing it, so it’s my fault for bad page naming.

    So thanks very much for a great sitemap creator.

    Regards
    Pete

  • http://gadelkareem.com wkarim

    @Blackpool
    thank you for your comments
    1. You will need to add your domain with ‘www’ in the config file.
    2. You need to add a redirect header from that URL to a new URL without spaces, and replace the old one on all your pages. Check the Google webmaster guidelines for more info.

  • http://www.ilmiobusiness.net mauro

    Hi Karim, thanks a lot for this awesome script.
    It required me to modify my website, because ./ links caused recursive crawling. I suggest you remove (trim) /./ URLs.
    If a URL is href=”./pagename.html” (which I use to redirect to index.html), it does not prepend the domain base URL but links to http://pagename.html.
    I hope this help other people!
    Best regards,
    Mauro

  • http://www.hognutz.com cdl

    The script runs great, no problems at all – Thanks for all the work.

    My problem is getting a cron job to run it. My host uses cPanel X. I have tried everything I can think of and either get:
    1. No input file specified.
    -OR-
    2. The complete text of the script.

    Here’s the “Command to run” I’m using:
    php /home/DOMAIN/public_html/sitemap/sitemap.php?do=createsitemap&secure=16312581c5c30b78630b89d2205e8675

    Anyone work this out on cPanel X?

  • http://gadelkareem.com wkarim

    @mauro
    I will investigate this more with next releases
    @cdl
    You need a program like lynx or cURL to execute the script; it cannot be executed from the PHP CLI.
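If neither lynx nor curl is available, a tiny wrapper that cron runs through the PHP CLI might work instead, since it turns the request back into an HTTP fetch. This is a sketch, not part of the script; the filename, URL, and secure hash are placeholders.

```php
<?php
// cron_sitemap.php (hypothetical): run from cron as
//   /usr/bin/php -q /path/to/cron_sitemap.php
// It fetches the createsitemap URL over HTTP, which is essentially what
// "lynx -dump" does. URL and hash below are placeholders.
function run_sitemap_cron($url)
{
    $response = @file_get_contents($url);
    if ($response === false) {
        fwrite(STDERR, "Request failed: $url\n");
        return false;
    }
    echo $response; // crawl/creation report, mailed by cron if configured
    return true;
}

run_sitemap_cron('http://example.com/sitemap/sitemap.php?do=createsitemap&secure=YOURHASH');
```

This assumes PHP on the server is allowed to open outbound HTTP connections (allow_url_fopen enabled).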

  • http://www.dpcgamers.com alek

    I got questions about this script.

    1 – Is there a way to filter some directories or files? (e.g. /directoryname/*)

    2 – I’ve put define('SMC_SITE', $_SERVER['dpcgamers.com']); but I still get the error saying “No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host”.

  • http://www.not-only-pixel.de minobu

    First: your script is a realy nice work, thanks for that stuff!

    I had a similar bug like mauro. I solved it with the following patch. Maybe this helps someone.

    .function.inc.php – arround line 288:
    ————-
    /** original **/
    $url = preg_replace('#/[^/]*$#', '/', $url);

    /** replace with this **/
    $url = str_replace('http:', '', $url);
    $url = str_replace('/', '', $url);
    $url = SMC_SCHEME . $url . '/';
    ————-

    And if you are already coding you may add something to line 252
    ————-
    /** original **/
    if( !empty($sub) && preg_match('#\.(ico|png|jpg|gif|css|js)(\?.*)?$#i', $sub) ) # excluding graphics
    /** after change **/
    if( !empty($sub) && preg_match('#\.(ico|png|jpg|gif|css|js|pdf|doc|eps)(\?.*)?$#i', $sub) ) # excluding graphics and other non-HTML documents
    ————-

    @wkarim
    A user defined filter setting to rip vars like &uid=999 from the url would be nice in next release.

  • http://www.not-only-pixel.de minobu

    $url = str_replace('http:', '', $url);
    $url = str_replace('/', '', $url);
    $url = SMC_SCHEME . $url . '/';

  • simon

    This is the best script I have used.
    Thanks for it.

    Could you do a meta/keyword one? That would be good.

    Or does anyone know of one that crawls your site to make them, the way this script crawls your site for a sitemap?

  • http://www.goldstark.com yannick

    Is there a way to get URLs that are in JavaScript, like document.location='http://…'?

  • http://gadelkareem.com wkarim

    @yannick
    I am afraid the crawler can not understand JavaScript.

  • hotep

    Thank you for the excellent script! :-)

    Anybody know the best parameters to include when trying to parse phpBB version 2?

    The script just hangs when trying to crawl the bulletin board.

  • http://www.store.epubliuspost.com Jim

    Every time I attempt to crawl my site with your program I get a “Zero Size Reply”. What is this telling me and how do I fix it?

    Thanks

  • http://harryy.us Harry

    Warning: Division by zero in /var/www/sitemap/.function.inc.php on line 320

    Help please :(

    Using lighttpd.


  • http://www.ceriacell.com Ceri

    Hi, thanks for sharing; I need it…

  • http://www.transeurotour.ro Luciffere

    I want to crawl http://www.transeurotour.ro but:
    No Pages were crawled, Please make sure you have set your site domain correctly and you have valid connection to host
    The script worked in the past, and I didn’t make any modifications. What happened?
    Thank you.

  • http://www.mixd.com.br Fabricio Sahdo

    Hi dude, I have a question.

    Why, after 44 seconds, does my crawler redirect to a 404 Not Found page?
    Does anybody have an explanation?

    thx

  • http://www.compreblindado.com.br blindados

    Why did the crawler take my sub-domain?
    I only specified the main domain :S

  • http://www.farejadorweb.com.br guia rio preto

    So great, I liked so much, thx!!

  • Terry

    Hi there, I think the script is great, so far so good. I just can’t get it to crawl deeper than the root directory http://www.example.com. It will not look into sub-directory folders, e.g. http://www.example.com/folder. Any ideas, anyone?

  • Terry

    It’s OK, I found out why sub-folders were not being crawled!
    There have to be links to them first, silly me.

  • Terry

    whether not weather

  • peps

    Thanks for this great script!! Works great!

    How do I add .pdf and .docx to the allowed file types so that they will be added to the sitemap?

    Thanks!

  • peps

    anyone?

  • peps

    Still figuring out how… anyone?
    I get this notice:
    NOTICE: Document type is application/pdf for URL http://example.com/test.pdf