Docs For Class PHPCrawlerDocumentInfo

var: Array containing some internal benchmark-results for receiving and processing this document. The keys are the identifiers, the values are the benchmark-times.
section: 10 Benchmarks
access: public

int $bytes_received = 0 (line 141)

The number of bytes the crawler received of the content of the document.

var: Received bytes
section: 2 Content-related information
access: public

string $content = "" (line 159)

The content of the requested document (html-sourcecode or content of file).

Will be empty if "received" is FALSE, and will be incomplete if "received_completely" is FALSE.

section: 2 Content-related information
access: public
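
A minimal sketch of guarding access to the content, assuming an object that exposes the public properties documented here. The stdClass object only stands in for a real PHPCrawlerDocumentInfo instance, and the helper name describeContentState() is hypothetical.

```php
<?php
// Hypothetical helper: reports how far the content can be trusted.
// $info is assumed to expose the "received" and "received_completely" flags.
function describeContentState($info)
{
    if (!$info->received) {
        return "no content";        // $content will be empty
    }
    if (!$info->received_completely) {
        return "partial content";   // $content is cut off somewhere
    }
    return "complete content";
}

// Stand-in object for illustration; a real crawler run would supply this.
$info = new stdClass();
$info->received = true;
$info->received_completely = false;
$info->content = "<html><body>trunc";

echo describeContentState($info), "\n";
```

Checking both flags before parsing avoids treating a truncated document as a complete one.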

string $content_tmp_file = null (line 177)

The temporary file to which the content was received.

Will be NULL if the content wasn't received to the temporary file.

section: 2 Content-related information
access: public

string $content_type = "" (line 149)

The content-type of the page or file, e.g. "text/html" or "image/gif".

var: The content-type
section: 2 Content-related information
access: public

array $cookies = array() (line 193)

Cookies sent by the server.

var: Numeric array containing all sent cookies as PHPCrawlerCookieDescriptor-objects.
section: 2 Content-related information
access: public

float $data_transfer_rate = null (line 309)

The average data-transfer rate for this document.

var: The rate in bytes per second.
section: 10 Benchmarks
access: public

float $data_transfer_time = null (line 301)

The time it took to receive the document.

var: The time in seconds
section: 10 Benchmarks
access: public

int $error_code = null (line 278)

The code of the error that may have occurred while requesting/receiving the document.

(See PHPCrawlerRequestErrors::ERROR_... - constants)

var: One of the PHPCrawlerRequestErrors::ERROR_ ... constants.
section: 8 Error-handling
access: public

bool $error_occured = false (line 269)

Indicates whether an error occurred while requesting/receiving the document.

var: TRUE if an error occurred.
section: 8 Error-handling
access: public

string $error_string = null (line 286)

A human-readable string describing the error that may have occurred while requesting/receiving the document.

var: A human readable error-string.
section: 8 Error-handling
access: public
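
A minimal sketch of how the three error-related properties might be combined into a log line. The helper name formatRequestError() is hypothetical, the stdClass object stands in for a real PHPCrawlerDocumentInfo instance, and the error code used is a made-up value rather than one of the PHPCrawlerRequestErrors constants.

```php
<?php
// Hypothetical helper: builds a log line from the error-related properties.
// Returns null when no error was flagged for the document.
function formatRequestError($info)
{
    if (!$info->error_occured) {
        return null;
    }
    return sprintf("Error %d: %s", $info->error_code, $info->error_string);
}

// Stand-in object with made-up values for illustration.
$info = new stdClass();
$info->error_occured = true;
$info->error_code = 3;
$info->error_string = "Example error message";

echo formatRequestError($info), "\n";
```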

string $file = "" (line 47)

The name of the requested page or file, e.g. "page.html".

section: 1 URL-related information
access: public

string $header = "" (line 71)

The complete HTTP-header the webserver sent in response to the request for this page or file.

section: 2 Content-related information
access: public

string $header_send = "" (line 86)

The complete HTTP-request-header the crawler sent to the server (debugging info).

access: public

string $host = "" (line 31)

The host-part of the URL of the requested page or file, e.g. "www.foo.com".

section: 1 URL-related information
access: public

int $http_status_code = null (line 185)

The HTTP-statuscode the webserver returned for the request, e.g. 200 (OK) or 404 (file not found).

section: 2 Content-related information
access: public

array $links_found = array() (line 211)

A numeric array containing information about all links that were found in the source of the page.

Every element of that array is itself an array with the following keys:

link_raw - contains the raw link as it was found
url_rebuild - contains the fully qualified URL the link leads to
linkcode - the html-codepart that contained the link
linktext - the linktext the link was laid over (may be empty)

So e.g. $page_data["links_found"][5]["link_raw"] contains the sixth link that was found in the current page, since indices start at 0. (May be something like "../../foo.html").

section: 3 Information about found links
access: public
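
A minimal sketch of walking an array shaped like $links_found, assuming the four keys listed in this entry. The helper name collectRebuiltUrls() and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: collects the rebuilt URLs from a links_found-style array.
function collectRebuiltUrls(array $linksFound)
{
    $urls = array();
    foreach ($linksFound as $link) {
        $urls[] = $link["url_rebuild"];
    }
    return $urls;
}

// Made-up sample data following the documented key layout.
$linksFound = array(
    array(
        "link_raw"    => "../../foo.html",
        "url_rebuild" => "http://www.foo.com/foo.html",
        "linkcode"    => '<a href="../../foo.html">Foo</a>',
        "linktext"    => "Foo",
    ),
    array(
        "link_raw"    => "bar.html",
        "url_rebuild" => "http://www.foo.com/bar/bar.html",
        "linkcode"    => '<a href="bar.html"></a>',
        "linktext"    => "",   // linktext may be empty
    ),
);

print_r(collectRebuiltUrls($linksFound));
```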

array $links_found_url_descriptors = array() (line 224)

A numeric array containing a PHPCrawlerURLDescriptor-object for every link that was found in the page.

Example: Printing the third raw link that was found on the page (indices start at 0)

echo $PageInfo->links_found_url_descriptors[2]->link_raw;

var: Numeric array containing PHPCrawlerURLDescriptor-objects
section: 3 Information about found links
access: public

array $meta_attributes = array() (line 330)

All meta-tag attributes found in the source of the document.

var: Associative array containing all found meta-attributes. The keys are the meta-names, the values the content of the attributes. (like $tags["robots"] = "nofollow")
section: 2 Content-related information
access: public
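
A minimal sketch of querying an array shaped like $meta_attributes, assuming the key/value layout described in this entry. The helper name hasNofollow() is hypothetical, and the case-insensitive substring match is a design choice of this sketch, not documented crawler behaviour.

```php
<?php
// Hypothetical helper: true if the "robots" meta-attribute mentions "nofollow".
function hasNofollow(array $metaAttributes)
{
    if (!isset($metaAttributes["robots"])) {
        return false;
    }
    // stripos() returns false when the needle is absent, so compare strictly.
    return stripos($metaAttributes["robots"], "nofollow") !== false;
}

// Made-up sample data for illustration.
$tags = array("robots" => "noindex, NOFOLLOW", "author" => "Jane Doe");

var_dump(hasNofollow($tags));
```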

string $path = "" (line 39)

The path in the URL of the requested page or file, e.g. "/page/".

section: 1 URL-related information
access: public

int $port (line 63)

The port of the URL the request was sent to, e.g. 80

section: 1 URL-related information
access: public

string $protocol = "" (line 23)

The protocol-part of the URL of the page or file, e.g. "http://"

section: 1 URL-related information
access: public

string $query = "" (line 55)

The query-part of the URL of the requested page or file, e.g. "?x=y".

section: 1 URL-related information
access: public

bool $received = false (line 94)

Flag indicating whether content was received from the page or file.

var: TRUE if the crawler received at least some source/content of this page or file.
section: 2 Content-related information
access: public

bool $received_completely = false (line 105)

Flag indicating whether content was completely received from the page or file.

The content of the current document may not be received completely due to settings made with PHPCrawler::setContentSizeLimit() or PHPCrawler::setTrafficLimit().

var: TRUE if the crawler received the complete source/content of this page or file.
section: 2 Content-related information
access: public

mixed $received_completly = false (line 113)

Alias for received_completely, was spelled wrong in previous versions of phpcrawl.

deprecated:
section: 11 Deprecated
access: public

bool $received_to_file = false (line 133)

Will be true if the content was received into a temporary file.

The content is stored in the temporary file $pageInfo->content_tmp_file in this case.

section: 2 Content-related information
access: public

bool $received_to_memory = false (line 123)

Will be true if the content was received into local memory.

You will have access to the content of the current page or file through $pageInfo->source.

section: 2 Content-related information
access: public
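
A minimal sketch of reading the content regardless of where it was stored, assuming the $received_to_file, $content_tmp_file and $content properties behave as described in these entries. The helper name readDocumentContent() is hypothetical, and the temporary file is created here only to make the sketch self-contained.

```php
<?php
// Hypothetical helper: returns the document content from the temporary file
// if it was received to file, otherwise from the in-memory property.
function readDocumentContent($info)
{
    if ($info->received_to_file && $info->content_tmp_file !== null) {
        return file_get_contents($info->content_tmp_file);
    }
    return $info->content;
}

// Stand-in setup: write a sample body to a temporary file.
$tmp = tempnam(sys_get_temp_dir(), "doc");
file_put_contents($tmp, "file body");

$info = new stdClass();
$info->received_to_file   = true;
$info->received_to_memory = false;
$info->content_tmp_file   = $tmp;
$info->content            = "";

$body = readDocumentContent($info);
echo $body, "\n";

unlink($tmp); // clean up the sample file
```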

string $referer_url = null (line 232)

The complete URL of the page that contained the link to this document.

section: 7 Referer information
access: public

string $refering_linkcode = null (line 242)

The html-sourcecode that contained the link to the current document.

(E.g. <a href="../foo.html">LINKTEXT</a>)

section: 7 Referer information
access: public

string $refering_linktext = null (line 261)

The linktext of the link that "linked" to this document.

E.g. if the referring link was <a href="../foo.html">LINKTEXT</a>, the referring linktext is "LINKTEXT". May contain html-tags of course.

section: 7 Referer information
access: public

string $refering_link_raw = null (line 250)

Contains the raw link as it was found in the content of the referring URL. (E.g. "../foo.html")

section: 7 Referer information
access: public

PHPCrawlerResponseHeader $responseHeader (line 79)

The complete HTTP-header the webserver sent in response to the request for this page or file, as a PHPCrawlerResponseHeader-object.

section: 2 Content-related information
access: public

string $source = "" (line 167)

Same as "content", the content of the requested document.

section: 2 Content-related information
access: public

bool $traffic_limit_reached = false (line 293)

Indicates whether the traffic-limit set by the user was reached after downloading this document.

var: TRUE if traffic-limit was reached.
access: public

string $url = "" (line 15)

The complete, fully qualified URL of the page or file, e.g. "http://www.foo.com/bar/page.html?x=y".

section: 1 URL-related information
access: public
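
A minimal sketch of how the URL-related properties appear to fit together, using sample values in the format of the examples given for $protocol, $host, $path, $file and $query. That the parts concatenate to $url when the default port is used is an assumption of this sketch, not something this documentation states. The helper name joinUrlParts() is hypothetical.

```php
<?php
// Hypothetical helper: joins the documented URL parts back together.
// Assumption: no explicit port is needed because the default port is in use.
function joinUrlParts($info)
{
    return $info->protocol . $info->host . $info->path . $info->file . $info->query;
}

// Stand-in object with made-up sample values.
$info = new stdClass();
$info->protocol = "http://";
$info->host     = "www.foo.com";
$info->path     = "/bar/";
$info->file     = "page.html";
$info->query    = "?x=y";
$info->port     = 80;

echo joinUrlParts($info), "\n";
```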

Methods

setLinksFoundArray (line 337)

Workaround-method, copies and converts the array $links_found_url_descriptors to $links_found.

access: public

void setLinksFoundArray ()

toArray (line 357)

Returns an array with all properties of this class.

access: public

array toArray ()

Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by phpDocumentor 1.4.4