Class PHPCrawlerHTTPRequest

Description

Class for performing HTTP-requests.

Located in /libs/PHPCrawler/PHPCrawlerHTTPRequest.class.php (line 8)

Variable Summary
Method Summary
PHPCrawlerHTTPRequest __construct ()
void addCookie (string $name, string $value)
void addCookieDescriptors (array $cookies)
bool addLinkSearchContentType (string $regex)
void addPostData ( $key,  $value)
bool addReceiveContentType (string $regex)
bool addStreamToFileContentType (string $regex)
string buildCookieHeader ()
array buildPostContent ()
void clearCookies ()
void clearPostData ()
bool decideStreamToFile (string $response_header)
bool enableAggressiveLinkSearch (bool $mode)
bool openSocket (int &$error_code, string &$error_string)
string prepareHTTPRequestQuery (string $query)
string readResponseContent ([bool $stream_to_file = false], int &$error_code, string &$error_string, string &$document_received_completely, string &$bytes_received)
string readResponseHeader (int &$error_code, string &$error_string)
void sendRequestHeader ( $request_header_lines)
void setBasicAuthentication ( $username,  $password)
bool setContentSizeLimit (int $bytes)
bool setFindRedirectURLs ( $mode)
void setHeaderCheckCallbackFunction ( &$obj,  $method_name)
bool setLinkExtractionTags (array $tag_array)
void setProxy ( $proxy_host,  $proxy_port, [ $proxy_username = null], [ $proxy_password = null])
void setTmpFile (string $tmp_file)
void setUrl (PHPCrawlerURLDescriptor $UrlDescriptor)
Variables
int $content_size_limit = 0 (line 30)

Limit for content-size to receive

  • var: The limit in bytes
  • access: protected
mixed $cookie_array = array() (line 113)

Array containing cookies to send with the request

  • access: protected
mixed $data_transfer_time = 0 (line 44)

The time it took to receive data-packets for the request.

  • access: protected
PHPCrawlerDNSCache $DNSCache (line 94)

DNS-cache

  • access: public
mixed $global_traffic_count = 0 (line 37)

Global counter for traffic this instance of the HTTPRequest-class caused.

  • access: protected
mixed $header_check_callback_function = null (line 134)
  • access: protected
mixed $lastResponseHeader (line 106)

The last response-header this request-instance received.

  • access: protected
PHPCrawlerLinkFinder $LinkFinder (line 101)

Link-finder object

  • access: protected
array $linksearch_content_types = array("#text/html# i") (line 66)

Contains all rules defining the content-types of documents that should get checked for links.

  • var: Numeric array containing the regex-rules
  • access: protected
array $post_data = array() (line 120)

Array containing POST-data to send with the request

  • access: protected
array $proxy (line 127)

The proxy to use

  • var: Array containing the keys "proxy_host", "proxy_port", "proxy_username", "proxy_password".
  • access: protected
array $receive_content_types = array() (line 51)

Contains all rules defining the content-types that should be received

  • var: Numeric array containing the regex-rules
  • access: protected
array $receive_to_file_content_types = array() (line 59)

Contains all rules defining the content-types of pages/files that should be streamed directly to a temporary file (instead of to memory)

  • var: Numeric array containing the regex-rules
  • access: protected
mixed $socket (line 132)

The socket used for HTTP-requests

  • access: protected
mixed $socketConnectTimeout = 5 (line 18)

Timeout-value for socket-connection

  • access: public
mixed $socketReadTimeout = 2 (line 23)

Socket-read-timeout

  • access: public
string $tmpFile = "phpcrawl.tmp" (line 73)

The TMP-File to use when a page/file should be streamed to file.

  • access: protected
PHPCrawlerURLDescriptor $UrlDescriptor (line 80)

The URL for the request as PHPCrawlerURLDescriptor-object

  • access: protected
array $url_parts = array() (line 87)

The parts of the URL for the request as returned by PHPCrawlerUtils::splitURL()

  • access: protected
mixed $userAgentString = "PHPCrawl" (line 13)

The user-agent-string

  • access: public
Methods
Constructor __construct (line 136)
  • access: public
PHPCrawlerHTTPRequest __construct ()
addCookie (line 172)

Adds a cookie to send with the request.

  • access: public
void addCookie (string $name, string $value)
  • string $name: Cookie-name
  • string $value: Cookie-value
addCookieDescriptor (line 182)

Adds a cookie to send with the request.

  • access: public
void addCookieDescriptor (PHPCrawlerCookieDescriptor $Cookie)
addCookieDescriptors (line 193)

Adds a bunch of cookies to send with the request

  • access: public
void addCookieDescriptors (array $cookies)
  • array $cookies: Numeric array containing cookies as PHPCrawlerCookieDescriptor-objects
addLinkSearchContentType (line 995)

Adds a rule to the list of rules that decide which kinds of documents get checked for links (based on their content-type).

  • return: TRUE if the rule was successfully added
bool addLinkSearchContentType (string $regex)
  • string $regex: Regular-expression defining the rule
addPostData (line 241)

Adds post-data to send with the request.

  • access: public
void addPostData ( $key,  $value)
  • $key
  • $value
addReceiveContentType (line 909)

Adds a rule to the list of rules that decide which pages or files, based on their content-type, should be received.

If the content-type of a requested document doesn't match with the given rules, the request will be aborted after the header was received.

  • return: TRUE if the rule was added to the list. FALSE if the given regex is not valid.
  • access: public
bool addReceiveContentType (string $regex)
  • string $regex: The rule as a regular-expression
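The rule matching described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the function names sketchMatchesContentType() and sketchIsValidRegex() are hypothetical, and the behaviour for an empty rule list is an assumption.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: returns true if the given
// Content-Type value matches at least one of the regex rules.
function sketchMatchesContentType(string $content_type, array $rules): bool
{
    // Assumption: an empty rule list means "no restriction".
    if (count($rules) === 0) {
        return true;
    }

    foreach ($rules as $regex) {
        if (preg_match($regex, $content_type) === 1) {
            return true;
        }
    }

    return false;
}

// Hypothetical helper: a rule is only accepted if it compiles as a PCRE pattern.
function sketchIsValidRegex(string $regex): bool
{
    // preg_match() returns false for an invalid pattern; the warning is suppressed.
    return @preg_match($regex, "") !== false;
}

$rules = array("#text/html#i", "#image/(png|jpeg)#i");
var_dump(sketchMatchesContentType("text/html; charset=utf-8", $rules));
var_dump(sketchMatchesContentType("application/pdf", $rules));
```

A request whose content-type matches none of the rules would, per the description above, be aborted once the header has been received.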
addStreamToFileContentType (line 929)

Adds a rule to the list of rules that decide what types of content should be streamed directly to the temporary file.

If a content-type of a page or file matches with one of these rules, the content will be streamed directly into the temporary file given in setTmpFile() without claiming local RAM.

  • return: TRUE if the rule was added to the list and the regex is valid.
  • access: public
bool addStreamToFileContentType (string $regex)
  • string $regex: The rule as a regular-expression
buildCookieHeader (line 836)

Builds the cookie-header-part for the header to send.

  • return: The cookie-header-part, e.g. "Cookie: test=bla; palimm=palaber"
  • access: protected
string buildCookieHeader ()
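To illustrate the format of the returned header-part, here is a minimal sketch that joins name/value pairs into a single "Cookie:" line. The function sketchBuildCookieHeader() and its plain associative-array input are assumptions made for this example; the class itself keeps its cookies in $cookie_array and may build the header differently.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: builds a "Cookie:" header line
// from an associative array of cookie-name => cookie-value.
function sketchBuildCookieHeader(array $cookies): string
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . "=" . $value;
    }

    // Assumption: with no cookies set, no header-part is produced.
    if (count($pairs) === 0) {
        return "";
    }

    return "Cookie: " . implode("; ", $pairs);
}

echo sketchBuildCookieHeader(array("test" => "bla", "palimm" => "palaber")), "\n";
```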
buildPostContent (line 813)

Builds the post-content from the postdata-array for the header to send with the request (MIME-style)

  • return: Numeric array containing the lines of the POST-part for the header
  • access: protected
array buildPostContent ()
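As a rough idea of what "MIME-style" post-content lines can look like, the following sketch builds a multipart/form-data body as a numeric array of lines. The function sketchBuildPostContent(), the boundary value, and the exact line layout are assumptions for illustration; the lines the class actually produces may differ.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: turns key/value pairs into the
// lines of a multipart/form-data body, using the given boundary string.
function sketchBuildPostContent(array $post_data, string $boundary): array
{
    $lines = array();

    foreach ($post_data as $key => $value) {
        $lines[] = "--" . $boundary;
        $lines[] = 'Content-Disposition: form-data; name="' . $key . '"';
        $lines[] = "";       // empty line separates part headers from the value
        $lines[] = $value;
    }

    // The closing delimiter ends the multipart body.
    $lines[] = "--" . $boundary . "--";

    return $lines;
}

// The boundary here is an arbitrary example value.
$lines = sketchBuildPostContent(array("user" => "foo", "lang" => "en"), "sketch-boundary");
echo implode("\r\n", $lines), "\r\n";
```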
buildRequestHeader (line 701)

Builds the request-header from the given settings.

  • return: Numeric array containing the lines of the request-header
  • access: protected
array buildRequestHeader ()
clearCookies (line 205)

Removes all cookies to send with the request.

  • access: public
void clearCookies ()
clearPostData (line 249)

Removes all post-data to send with the request.

  • access: public
void clearPostData ()
decideRecevieContent (line 863)

Checks whether the content of this page/file should be received (based on the content-type and the applied rules)

  • return: TRUE if the content should be received
  • access: protected
bool decideRecevieContent (PHPCrawlerResponseHeader $responseHeader)
decideStreamToFile (line 883)

Checks whether the content of this page/file should be streamed directly to file.

  • return: TRUE if the content should be streamed to TMP-file
  • access: protected
bool decideStreamToFile (string $response_header)
  • string $response_header: The response-header
enableAggressiveLinkSearch (line 278)

Enables/disables aggressive link search

  • access: public
bool enableAggressiveLinkSearch (bool $mode)
  • bool $mode
getGlobalTrafficCount (line 983)

Returns the global traffic this instance of the HTTPRequest-class caused so far.

  • return: The traffic in bytes.
  • access: public
int getGlobalTrafficCount ()
openSocket (line 426)

Opens the socket to the host.

  • return: TRUE if socket could be opened, otherwise FALSE.
  • access: protected
bool openSocket (int &$error_code, string &$error_string)
  • int &$error_code: Error-code by reference if an error occurred.
  • string &$error_string: Error-string by reference
prepareHTTPRequestQuery (line 777)

Prepares the given HTTP-query-string for the HTTP-request.

  • access: protected
string prepareHTTPRequestQuery (string $query)
  • string $query
readResponseContent (line 590)

Reads the response-content.

  • return: The response-content/source. May be empty if an error occurred or data was streamed to the tmp-file.
  • access: protected
string readResponseContent ([bool $stream_to_file = false], int &$error_code, string &$error_string, string &$document_received_completely, string &$bytes_received)
  • bool $stream_to_file: If TRUE, the content will be streamed directly to the temporary file and this method will not return the content as a string.
  • int &$error_code: Error-code by reference if an error occurred.
  • string &$error_string: Error-string by reference
  • string &$document_received_completely: Flag indicating whether the content was received completely, passed by reference
  • string &$bytes_received: Number of bytes received, passed by reference
readResponseHeader (line 511)

Reads the response-header.

  • return: The response-header or NULL if an error occurred
  • access: protected
string readResponseHeader (int &$error_code, string &$error_string)
  • int &$error_code: Error-code by reference if an error occurred.
  • string &$error_string: Error-string by reference
sendRequest (line 296)

Sends the HTTP-request and receives the page/file.

  • return: PHPCrawlerDocumentInfo-object containing all information about the received page/file
  • access: public
PHPCrawlerDocumentInfo sendRequest ()
sendRequestHeader (line 490)

Sends the request-header.

  • access: protected
void sendRequestHeader ( $request_header_lines)
  • $request_header_lines
setBasicAuthentication (line 266)

Sets basic-authentication login-data for protected URLs.

  • access: public
void setBasicAuthentication ( $username,  $password)
  • $username
  • $password
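For context, HTTP basic authentication sends the login-data as a base64-encoded "username:password" pair in an Authorization header. The sketch below shows that encoding only; the function sketchBuildBasicAuthHeader() is hypothetical, and how this class adds the header to its request internally is not documented here.

```php
<?php
// Hypothetical helper, not part of PHPCrawl: builds the Authorization header
// line for HTTP basic authentication from the given login-data.
function sketchBuildBasicAuthHeader(string $username, string $password): string
{
    // Credentials are joined with a colon and base64-encoded.
    return "Authorization: Basic " . base64_encode($username . ":" . $password);
}

echo sketchBuildBasicAuthHeader("Aladdin", "open sesame"), "\n";
```

Note that base64 is an encoding, not encryption, so this login-data is readable by anyone who can see the request.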
setContentSizeLimit (line 968)

Sets the size-limit in bytes for content the request should receive.

  • access: public
bool setContentSizeLimit (int $bytes)
  • int $bytes
setFindRedirectURLs (line 229)

Specifies whether redirect-links set in http-headers should get searched for.

  • access: public
bool setFindRedirectURLs ( $mode)
  • $mode
setHeaderCheckCallbackFunction (line 286)
  • access: public
void setHeaderCheckCallbackFunction ( &$obj,  $method_name)
  • &$obj
  • $method_name
setLinkExtractionTags (line 216)

Sets the html-tags from which to extract links.

  • access: public
bool setLinkExtractionTags (array $tag_array)
  • array $tag_array: Numeric array containing the tags, e.g. array("href", "src", "url", ...)
setProxy (line 254)
  • access: public
void setProxy ( $proxy_host,  $proxy_port, [ $proxy_username = null], [ $proxy_password = null])
  • $proxy_host
  • $proxy_port
  • $proxy_username
  • $proxy_password
setTmpFile (line 945)

Sets the temporary file to use when content of found documents should be streamed directly into a temporary file.

  • access: public
void setTmpFile (string $tmp_file)
  • string $tmp_file: The TMP-file to use.
setUrl (line 158)

Sets the URL for the request.

  • access: public
void setUrl (PHPCrawlerURLDescriptor $UrlDescriptor)

Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by phpDocumentor 1.4.4