Class PHPCrawlerUtils

Description

Static utility methods used by phpcrawl.

Located in /libs/PHPCrawler/PHPCrawlerUtils.class.php (line 8)

Method Summary
static string buildURLFromLink (string $link, PHPCrawlerUrlPartsDescriptor $BaseUrlParts)
static string buildURLFromParts (array $url_parts, [bool $normalize = false])
static bool checkRegexPattern ( $pattern)
static bool checkStringAgainstRegexArray (string &$string, array $regex_array)
static mixed deserializeFromFile (string $file)
static string getBaseUrlFromMetaTag ( &$html_source)
static array getCookiesFromHeader (string $header, string $source_url)
static string getHeaderValue (string $header, string $directive)
static int getHTTPStatusCode (string $header)
static array getMetaTagAttributes (string &$html_source)
static string getRedirectURLFromHeader ( &$header)
static string getRootUrl (string $url)
static string getSystemTempDir ()
static bool isUTF8String (string $string)
static bool isValidUrlString (string $string)
static string normalizeURL (string $url)
static void rmDir ( $dir)
static void serializeToFile ( $target_file,  $data)
static void sort2dArray ( &$array,  $sort_args)
static array splitURL (string $url)
Methods
static method buildURLFromLink (line 228)

Reconstructs a fully qualified and normalized URL from a given link, relative to the URL the link was found in.

  • return: The rebuilt, fully qualified and normalized URL the link leads to (e.g. "http://www.foo.com/page.htm"), or NULL if the link couldn't be rebuilt correctly.
  • access: public
static string buildURLFromLink (string $link, PHPCrawlerUrlPartsDescriptor $BaseUrlParts)
  • string $link: The link (e.g. "../page.htm")
  • PHPCrawlerUrlPartsDescriptor $BaseUrlParts: The parts of the URL the link was found in (e.g. "http://www.foo.com/folder/index.html")
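
The resolution step described above can be sketched in plain PHP. This is a simplified illustration, not PHPCrawl's implementation: the function name sketchResolveLink is hypothetical, it takes the base URL as a string rather than a PHPCrawlerUrlPartsDescriptor, and it does not treat query strings or fragments specially.

```php
<?php
// Hypothetical helper: resolves a link against the URL it was found in.
function sketchResolveLink(string $link, string $baseUrl): ?string
{
    $base = parse_url($baseUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return null; // base URL is not usable
    }
    $root = $base['scheme'] . '://' . $base['host']
          . (isset($base['port']) ? ':' . $base['port'] : '');

    // Already absolute: nothing to resolve.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;
    }
    // Protocol-relative link: reuse the base scheme.
    if (strncmp($link, '//', 2) === 0) {
        return $base['scheme'] . ':' . $link;
    }

    $basePath = $base['path'] ?? '/';
    if ($link !== '' && $link[0] === '/') {
        $path = $link; // root-relative link
    } else {
        // Relative link: append to the directory of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $link;
    }

    // Collapse "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    return $root . '/' . ltrim(implode('/', $out), '/');
}
```

With the example values from the parameter list, sketchResolveLink("../page.htm", "http://www.foo.com/folder/index.html") yields "http://www.foo.com/page.htm".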
static method buildURLFromParts (line 123)

Builds a URL from its individual parts.

  • return: The URL
  • access: public
static string buildURLFromParts (array $url_parts, [bool $normalize = false])
  • array $url_parts:

    Array containing the URL parts. The keys should be:

    "protocol" (e.g. "http://") OPTIONAL
    "host" (e.g. "www.bla.de")
    "path" (e.g. "/test/palimm/") OPTIONAL
    "file" (e.g. "index.htm") OPTIONAL
    "port" (e.g. 80) OPTIONAL
    "auth_username" OPTIONAL
    "auth_password" OPTIONAL

  • bool $normalize: If TRUE, the URL will be returned normalized (e.g. http://www.foo.com/path/ instead of http://www.foo.com:80/path/).
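
As a rough illustration of how such a parts array might be assembled, here is a minimal sketch in plain PHP. The function name sketchBuildUrl, the fallback values for missing keys, and the default-port table are assumptions made for this example; PHPCrawl's own method may behave differently in edge cases.

```php
<?php
// Hypothetical helper, not PHPCrawl's implementation.
function sketchBuildUrl(array $parts, bool $normalize = false): string
{
    $protocol = $parts['protocol'] ?? 'http://'; // assumed fallback
    $host     = $parts['host'];                  // the only required key
    $path     = $parts['path'] ?? '/';
    $file     = $parts['file'] ?? '';
    $port     = $parts['port'] ?? null;

    // Optional credentials go in front of the host.
    $auth = '';
    if (isset($parts['auth_username'], $parts['auth_password'])) {
        $auth = $parts['auth_username'] . ':' . $parts['auth_password'] . '@';
    }

    // When normalizing, omit the port if it is the default for the protocol.
    $defaults  = ['http://' => 80, 'https://' => 443];
    $isDefault = $port !== null && ($defaults[$protocol] ?? null) === (int) $port;
    $portPart  = ($port === null || ($normalize && $isDefault)) ? '' : ':' . $port;

    return $protocol . $auth . $host . $portPart . $path . $file;
}
```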
static method checkRegexPattern (line 194)

Checks whether a given RegEx-pattern is valid or not.

  • access: public
static bool checkRegexPattern ( $pattern)
  • $pattern
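
One common way to validate a PCRE pattern in PHP is to try compiling it against an empty subject. The sketch below shows that idea; sketchIsValidRegex is a hypothetical name and this is not necessarily how PHPCrawl performs the check.

```php
<?php
// Hypothetical helper: TRUE if PCRE can compile the pattern.
function sketchIsValidRegex(string $pattern): bool
{
    // preg_match() returns false (and raises a warning, suppressed here)
    // when the pattern cannot be compiled.
    return @preg_match($pattern, '') !== false;
}
```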
static method checkStringAgainstRegexArray (line 388)

Checks whether a given string matches one of the given regular expressions.

  • return: TRUE if one of the regexes matches the string, otherwise FALSE.
  • access: public
static bool checkStringAgainstRegexArray (string &$string, array $regex_array)
  • string &$string: The string
  • array $regex_array: Numeric array containing the regular expressions to check against.
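
The check amounts to a loop over the patterns that stops at the first match. A minimal sketch, with the hypothetical name sketchMatchesAny and assuming every entry is a valid PCRE pattern:

```php
<?php
// Hypothetical helper: TRUE as soon as one pattern matches the string.
function sketchMatchesAny(string $string, array $regexes): bool
{
    foreach ($regexes as $regex) {
        if (preg_match($regex, $string) === 1) {
            return true;
        }
    }
    return false; // no pattern matched (or the array was empty)
}
```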
static method deserializeFromFile (line 506)

Returns deserialized data that is stored in a file.

  • return: The data or NULL if the file doesn't exist
  • access: public
static mixed deserializeFromFile (string $file)
  • string $file: The file containing the serialized data
static method getBaseUrlFromMetaTag (line 350)

Returns the base-URL specified in a meta-tag in the given HTML-source

  • return: The base-URL or NULL if not found.
  • access: public
static string getBaseUrlFromMetaTag ( &$html_source)
  • &$html_source
static method getCookiesFromHeader (line 435)

Returns all cookies from the given response-header.

  • return: Numeric array containing all cookies as PHPCrawlerCookieDescriptor-objects.
  • access: public
static array getCookiesFromHeader (string $header, string $source_url)
  • string $header: The response-header
  • string $source_url: URL the cookie was sent from.
static method getHeaderValue (line 416)

Gets the value of a header-directive from the given HTTP-header.

Example:

  PHPCrawlerUtils::getHeaderValue($header, "content-type");

  • return: The value of the given directive found in the header. Or NULL if not found.
  • access: public
static string getHeaderValue (string $header, string $directive)
  • string $header: The HTTP-header
  • string $directive: The header-directive
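
To illustrate what such a lookup involves, the sketch below extracts a directive with a case-insensitive, multi-line regular expression. The name sketchHeaderValue is hypothetical and the code is an assumption about the general approach, not PHPCrawl's source.

```php
<?php
// Hypothetical helper: returns the value of "Directive: value" or NULL.
function sketchHeaderValue(string $header, string $directive): ?string
{
    // Match the directive at the start of a line, ignore case, and
    // trim surrounding blanks and the trailing CR of a CRLF line ending.
    $pattern = '/^' . preg_quote($directive, '/') . ':[ \t]*(.*?)[ \t]*\r?$/mi';

    return preg_match($pattern, $header, $m) === 1 ? $m[1] : null;
}
```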
static method getHTTPStatusCode (line 207)

Gets the HTTP-statuscode from a given response-header.

  • return: The status-code or NULL if no status-code was found.
  • access: public
static int getHTTPStatusCode (string $header)
  • string $header: The response-header
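
The status code sits in the first line of the response header (for example "HTTP/1.1 200 OK"). A minimal sketch of reading it, using the hypothetical name sketchStatusCode:

```php
<?php
// Hypothetical helper: reads the three-digit code from the status line.
function sketchStatusCode(string $header): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $header, $m) === 1) {
        return (int) $m[1];
    }
    return null; // no status line found
}
```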
static method getMetaTagAttributes (line 586)

Gets all meta-tag attributes from the given HTML-source.

  • return: Associative array containing all found meta-attributes. The keys are the meta-names, the values the content of the attributes (e.g. $tags["robots"] = "nofollow").
  • access: public
static array getMetaTagAttributes (string &$html_source)
  • string &$html_source
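
The shape of the returned array can be illustrated with a small regex-based sketch. It is deliberately simplified: sketchMetaTags is a hypothetical name, and the pattern only recognizes tags where the name attribute directly precedes the content attribute, which real-world HTML does not guarantee.

```php
<?php
// Hypothetical helper: maps lower-cased meta names to their content values.
function sketchMetaTags(string $html): array
{
    $pattern = '/<meta\s+name\s*=\s*["\']([^"\']+)["\']\s+content\s*=\s*["\']([^"\']*)["\']/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $tags = [];
    foreach ($matches as $m) {
        $tags[strtolower($m[1])] = $m[2];
    }
    return $tags;
}
```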
static method getRedirectURLFromHeader (line 367)

Returns the redirect-URL from the given HTTP-header

  • return: The redirect-URL or NULL if not found.
  • access: public
static string getRedirectURLFromHeader ( &$header)
  • &$header
static method getRootUrl (line 458)

Returns the normalized root-URL of the given URL

  • return: The root-URL, e.g. "http://www.foo.com"
  • access: public
static string getRootUrl (string $url)
  • string $url: The URL, e.g. "www.foo.com/something/index.html"
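
A root URL can be derived with PHP's built-in parse_url(). The sketch below is an assumption about the general idea, not PHPCrawl's code: sketchRootUrl is a hypothetical name, and prepending "http://" to scheme-less input is a choice made for this example.

```php
<?php
// Hypothetical helper: scheme + host (+ non-default port) of a URL.
function sketchRootUrl(string $url): ?string
{
    // parse_url() needs a scheme to recognize the host part.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $p = parse_url($url);
    if ($p === false || !isset($p['host'])) {
        return null;
    }

    $scheme = strtolower($p['scheme']);
    $root   = $scheme . '://' . strtolower($p['host']);

    // Keep the port only if it differs from the scheme's default.
    $defaults = ['http' => 80, 'https' => 443];
    if (isset($p['port']) && ($defaults[$scheme] ?? null) !== $p['port']) {
        $root .= ':' . $p['port'];
    }
    return $root;
}
```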
static method getSystemTempDir (line 568)

Determines the system's temporary directory.

  • access: public
static string getSystemTempDir ()
static method isUTF8String (line 614)

Checks whether the given string is a UTF-8 encoded string.

Taken from http://www.php.net/manual/de/function.mb-detect-encoding.php (comment from "prgss at bk dot ru")

  • return: TRUE if the string is UTF-8 encoded.
  • access: public
static bool isUTF8String (string $string)
  • string $string: The string
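
For comparison, a different and shorter technique than the one referenced above is to let PCRE validate the string: with the "u" modifier, preg_match() fails on invalid UTF-8 input. The sketch uses the hypothetical name sketchIsUtf8 and is not the code PHPCrawl ships.

```php
<?php
// Hypothetical helper: TRUE if the string is valid UTF-8.
function sketchIsUtf8(string $string): bool
{
    // The empty pattern always matches, so the result is 1 for valid
    // UTF-8 and false when the subject contains invalid byte sequences.
    return @preg_match('//u', $string) === 1;
}
```

Note that a plain ASCII string also passes this check, since ASCII is a subset of UTF-8.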
static method isValidUrlString (line 630)

Checks whether the given string is a valid, urlencoded URL (by RFC)

  • return: TRUE if the string is a valid url-string.
  • access: public
static bool isValidUrlString (string $string)
  • string $string: The string
static method normalizeURL (line 179)

Normalizes a URL

E.g. converts http://www.foo.com:80/path/ to http://www.foo.com/path/

  • return: The normalized URL, or NULL on failure
  • access: public
static string normalizeURL (string $url)
  • string $url
static method rmDir (line 469)

Deletes a directory recursively

  • access: public
static void rmDir ( $dir)
  • $dir
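
Recursive deletion generally means removing every entry of a directory before removing the directory itself. A minimal sketch of that pattern, with the hypothetical name sketchRemoveDir; error handling and permission checks are omitted, and symbolic links are unlinked rather than followed (an assumption of this example):

```php
<?php
// Hypothetical helper: deletes a directory and everything below it.
function sketchRemoveDir(string $dir): void
{
    if (!is_dir($dir)) {
        return;
    }
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the self and parent references
        }
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (is_dir($path) && !is_link($path)) {
            sketchRemoveDir($path); // descend into subdirectories first
        } else {
            unlink($path);
        }
    }
    rmdir($dir); // the directory is empty now
}
```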
static method serializeToFile (line 493)

Serializes data (objects, arrays etc.) and writes it to the given file.

  • access: public
static void serializeToFile ( $target_file,  $data)
  • $target_file
  • $data
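
serializeToFile and deserializeFromFile form a round trip. The sketch below shows that pairing with PHP's built-in serialize() and unserialize(); the two function names are hypothetical and the code is an assumption about the general approach, not PHPCrawl's source.

```php
<?php
// Hypothetical helper: writes the serialized form of $data to a file.
function sketchSerializeToFile(string $targetFile, $data): void
{
    file_put_contents($targetFile, serialize($data));
}

// Hypothetical helper: reads the data back, or NULL if the file is missing.
function sketchDeserializeFromFile(string $file)
{
    if (!is_file($file)) {
        return null;
    }
    return unserialize(file_get_contents($file));
}
```

unserialize() should only be applied to files the application wrote itself, since deserializing untrusted input is unsafe.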
static method sort2dArray (line 519)

Sorts a two-dimensional array.

  • access: public
static void sort2dArray ( &$array,  $sort_args)
  • &$array
  • $sort_args
static method splitURL (line 27)

Splits a URL into its parts

  • return:

    An array containing the parts of the URL

    The keys are:

    "protocol" (e.g. "http://")
    "host" (e.g. "www.bla.de")
    "path" (e.g. "/test/palimm/")
    "file" (e.g. "index.htm")
    "domain" (e.g. "foo.com")
    "port" (e.g. 80)
    "auth_username"
    "auth_password"

  • access: public
static array splitURL (string $url)
  • string $url: The URL
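
To make the key layout concrete, here is a sketch that produces an array of the same shape from PHP's built-in parse_url(). It is not PHPCrawl's implementation: sketchSplitUrl is a hypothetical name, the "domain" value is naively taken as the last two host labels (wrong for hosts such as "www.foo.co.uk"), and the default ports and empty-string fallbacks are assumptions.

```php
<?php
// Hypothetical helper: splits a URL into the keys listed above.
function sketchSplitUrl(string $url): ?array
{
    $p = parse_url($url);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return null;
    }

    // Everything up to the last slash is the path, the rest is the file.
    $fullPath = $p['path'] ?? '/';
    $slash    = strrpos($fullPath, '/');
    $labels   = explode('.', $p['host']);

    return [
        'protocol'      => $p['scheme'] . '://',
        'host'          => $p['host'],
        'path'          => substr($fullPath, 0, $slash + 1),
        'file'          => (string) substr($fullPath, $slash + 1),
        'domain'        => implode('.', array_slice($labels, -2)), // naive
        'port'          => $p['port'] ?? ($p['scheme'] === 'https' ? 443 : 80),
        'auth_username' => $p['user'] ?? '',
        'auth_password' => $p['pass'] ?? '',
    ];
}
```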

Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by phpDocumentor 1.4.4