Contains information about a page or file the crawler found and received during the crawling process.
Located in /libs/PHPCrawler/PHPCrawlerDocumentInfo.class.php (line 7)
Some internal benchmark results as an array.
The number of bytes the crawler received of the content of the document.
The content of the requested document (html-sourcecode or content of file).
Will be empty if "received" is FALSE, and the source will be incomplete if "received_completely" is FALSE!
The temporary file to which the content was received.
Will be NULL if the content wasn't received to the temporary file.
The content-type of the page or file, e.g. "text/html" or "image/gif".
Cookies sent by the server.
The average data transfer rate for this document.
The time it took to receive the document.
The code of the error, if any, that occurred while requesting/receiving the document.
(See PHPCrawlerRequestErrors::ERROR_... - constants)
Indicates whether an error occurred while requesting/receiving the document.
A human-readable string describing the error, if any, that occurred while requesting/receiving the document.
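The three error fields above are typically read together. The following is a minimal sketch, not the library's own code: ErrorInfoStub is a hypothetical stand-in for the document-info object, and the property names error_occured, error_code and error_string are assumptions, since the names are not given in the text above. The code value and message are made-up sample data.

```php
<?php
// Hypothetical stand-in for the document-info object; only the three
// error-related fields are modelled. The property names are assumptions.
class ErrorInfoStub
{
    public $error_occured = false;
    public $error_code = null;
    public $error_string = null;
}

// Returns a one-line summary of the request outcome.
function describeRequestOutcome($info)
{
    if (!$info->error_occured) {
        return "ok";
    }
    return "error " . $info->error_code . ": " . $info->error_string;
}

$failed = new ErrorInfoStub();
$failed->error_occured = true;
$failed->error_code = 3;                   // made-up sample code
$failed->error_string = "Socket timeout";  // made-up sample message

echo describeRequestOutcome($failed), "\n";
echo describeRequestOutcome(new ErrorInfoStub()), "\n";
```

Checking the flag first avoids interpreting an unset code or string as a real error.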
The name of the requested page or file, e.g. "page.html".
The complete HTTP-header the webserver sent in response to the request for this page or file.
The complete HTTP-request-header the crawler sent to the server (debugging info).
The host-part of the URL of the requested page or file, e.g. "www.foo.com".
The HTTP-statuscode the webserver responded with, e.g. 200 (OK) or 404 (file not found).
A numeric array containing information about all links that were found in the source of the page.
Every element of that numeric array is itself an array with the following keys:
link_raw - the raw link as it was found
url_rebuild - the fully qualified URL the link leads to
linkcode - the html-codepart that contained the link
linktext - the linktext the link was wrapped around (may be empty)
So e.g. $page_data["links_found"][5]["link_raw"] contains the sixth link that was found in the current page, since the array is zero-indexed. (May be something like "../../foo.html").
A numeric array containing a PHPCrawlerURLDescriptor-object for every link that was found in the page.
Example: Printing the second raw link that was found on the page
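A minimal sketch of such an access, using stand-in objects so that it runs without the library. UrlDescriptorStub is a hypothetical substitute for PHPCrawlerURLDescriptor, its two properties mirror the link_raw and url_rebuild keys described above, and the sample links are made up.

```php
<?php
// Hypothetical stand-in for a PHPCrawlerURLDescriptor object;
// only two of the described fields are modelled.
class UrlDescriptorStub
{
    public $link_raw;
    public $url_rebuild;

    public function __construct($link_raw, $url_rebuild)
    {
        $this->link_raw = $link_raw;
        $this->url_rebuild = $url_rebuild;
    }
}

// Made-up sample data in the shape described above:
// a numeric array with one descriptor per link found.
$descriptors = array(
    new UrlDescriptorStub("index.html", "http://www.foo.com/index.html"),
    new UrlDescriptorStub("../../foo.html", "http://www.foo.com/foo.html"),
);

// Numeric arrays are zero-indexed, so the second link sits at index 1.
$secondRawLink = $descriptors[1]->link_raw;
echo $secondRawLink, "\n";
```

With a real document-info object, the array would come from the property holding the descriptors rather than from a local variable.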
All meta-tag attributes found in the source of the document.
The path in the URL of the requested page or file, e.g. "/page/".
The port of the URL the request was sent to, e.g. 80.
The protocol-part of the URL of the page or file, e.g. "http://"
The query-part of the URL of the requested page or file, e.g. "?x=y".
Flag indicating whether content was received from the page or file.
Flag indicating whether content was completely received from the page or file.
The content of the current document may not have been received completely due to limits set with PHPCrawler::setContentSizeLimit() and PHPCrawler::setTrafficLimit().
Alias for received_completely, was spelled wrong in previous versions of phpcrawl.
Will be true if the content was received into a temporary file.
The content is stored in the temporary file $pageInfo->content_tmp_file in this case.
Will be true if the content was received into local memory.
You will have access to the content of the current page or file through $pageInfo->source.
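Because the content may end up either in memory or in a temporary file, code that consumes it usually has to handle both cases. The sketch below shows one way to do that. ContentInfoStub is a hypothetical stand-in for the document-info object, and the flag names received_to_memory and received_to_file are assumptions; source and content_tmp_file are the properties mentioned above.

```php
<?php
// Hypothetical stand-in for the document-info object.
// The two flag names are assumptions, not taken from the text above.
class ContentInfoStub
{
    public $received_to_memory = false;
    public $received_to_file = false;
    public $source = "";
    public $content_tmp_file = null;
}

// Returns the document content from memory or from the temporary file,
// or null if no content is available.
function readDocumentContent($pageInfo)
{
    if ($pageInfo->received_to_memory) {
        return $pageInfo->source;
    }
    if ($pageInfo->received_to_file && $pageInfo->content_tmp_file !== null) {
        $content = file_get_contents($pageInfo->content_tmp_file);
        return $content === false ? null : $content;
    }
    return null;
}

// Case 1: content held in memory.
$inMemory = new ContentInfoStub();
$inMemory->received_to_memory = true;
$inMemory->source = "<html>in memory</html>";
$memoryContent = readDocumentContent($inMemory);

// Case 2: content stored in a temporary file (created here for the demo).
$tmpFile = tempnam(sys_get_temp_dir(), "doc");
file_put_contents($tmpFile, "<html>on disk</html>");
$onDisk = new ContentInfoStub();
$onDisk->received_to_file = true;
$onDisk->content_tmp_file = $tmpFile;
$fileContent = readDocumentContent($onDisk);
unlink($tmpFile);

// Case 3: nothing was received.
$emptyContent = readDocumentContent(new ContentInfoStub());

echo $memoryContent, "\n", $fileContent, "\n";
```

For large documents it may be preferable to stream the temporary file instead of loading it whole. file_get_contents is used here only to keep the sketch short.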
The complete URL of the page that contained the link to this document.
The html-sourcecode that contained the link to the current document.
(E.g. <a href="../foo.html">LINKTEXT</a>)
The linktext of the link that "linked" to this document.
E.g. if the referring link was <a href="../foo.html">LINKTEXT</a>, the referring linktext is "LINKTEXT". May contain html-tags of course.
Contains the raw link as it was found in the content of the referring URL. (E.g. "../foo.html")
The complete HTTP-header the webserver responded with this page or file as a PHPCrawlerResponseHeader-object.
Same as "content", the content of the requested document.
Indicates whether the traffic-limit set by the user was reached after downloading this document.
The complete, fully qualified URL of the page or file, e.g. "http://www.foo.com/bar/page.html?x=y".
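The protocol, host, port, path, file and query parts described above combine into this complete URL. The following sketch illustrates that relationship using the example values from the text. The function name joinUrlParts is hypothetical, and leaving out the default ports 80 and 443 is an assumption about how the parts fit together, not a statement about the library's own logic.

```php
<?php
// Joins the individual URL parts into one URL string.
// Assumption: the port is only written out when it is not the default
// port for the protocol (80 for http://, 443 for https://).
function joinUrlParts($protocol, $host, $port, $path, $file, $query)
{
    $defaultPorts = array("http://" => 80, "https://" => 443);
    $isDefault = isset($defaultPorts[$protocol])
        && $defaultPorts[$protocol] === (int) $port;
    $portPart = $isDefault ? "" : ":" . (int) $port;

    return $protocol . $host . $portPart . $path . $file . $query;
}

// Example values taken from the descriptions above.
$url = joinUrlParts("http://", "www.foo.com", 80, "/bar/", "page.html", "?x=y");
echo $url, "\n";

// Same document on a non-default port.
$urlAltPort = joinUrlParts("http://", "www.foo.com", 8080, "/bar/", "page.html", "?x=y");
echo $urlAltPort, "\n";
```

Note that the parts already carry their delimiters ("http://", a leading and trailing "/" on the path, a leading "?" on the query), so plain concatenation is enough.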
Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by phpDocumentor 1.4.4