Class PHPCrawlerURLCacheBase

Description

Abstract base class for the URL-caching implementations.

  • abstract:

Located in /libs/PHPCrawler/UrlCache/PHPCrawlerURLCacheBase.class.php (line 8)


	
			
Direct descendants
  • PHPCrawlerSQLiteURLCache: Class for caching/storing URLs/links in an SQLite database file.
  • PHPCrawlerMemoryURLCache: Class for caching/storing URLs/links in memory.
Method Summary
void addLinkPriorities (array $priority_array)
void addLinkPriority (string $regex, int $level)
void addURL (PHPCrawlerURLDescriptor $UrlDescriptor)
void addURLs (array $urls)
void cleanup ()
void clear ()
bool containsURLs ()
array getAllURLs ()
string getDistinctURLHash (PHPCrawlerURLDescriptor $UrlDescriptor)
PHPCrawlerURLDescriptor getNextUrl ()
void getUrlPriority ($url)
void markUrlAsFollowed (PHPCrawlerURLDescriptor $UrlDescriptor)
void purgeCache ()
Variables
int $url_distinct_property = self::URLHASH_URL (line 17)

Defines which property of a URL is used to ensure that each URL is cached only once.

  • var: One of the URLHASH_* constants
  • access: public
mixed $url_priorities = array() (line 10)
  • access: protected
Methods
addLinkPriorities (line 133)

Adds several link priorities at once.

  • access: public
void addLinkPriorities (array $priority_array)
  • array $priority_array: Numeric array in which each element is an array with the keys "match" and "level"
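
The expected shape of $priority_array can be sketched as follows. PriorityRuleStore is a hypothetical stand-in class, not part of PHPCrawl, and the sketch assumes that addLinkPriorities() simply forwards each element to addLinkPriority(); the documentation above does not confirm that.

```php
<?php
// Hypothetical stand-in for a concrete URL cache; not part of PHPCrawl.
class PriorityRuleStore
{
    /** @var array Collected rules, each with the keys "match" and "level". */
    public $url_priorities = array();

    // Stores a single rule: a regular expression plus a priority level.
    public function addLinkPriority($regex, $level)
    {
        $this->url_priorities[] = array("match" => $regex, "level" => (int) $level);
    }

    // Assumption: each well-formed element is forwarded to addLinkPriority().
    public function addLinkPriorities(array $priority_array)
    {
        foreach ($priority_array as $rule) {
            if (isset($rule["match"], $rule["level"])) {
                $this->addLinkPriority($rule["match"], $rule["level"]);
            }
        }
    }
}

// Numeric array; every element carries the subkeys "match" and "level".
$priority_array = array(
    array("match" => '#/products/#', "level" => 10),
    array("match" => '#\.pdf$#i',    "level" => 5),
);

$store = new PriorityRuleStore();
$store->addLinkPriorities($priority_array);
```

The example patterns and levels are illustrative only.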
addLinkPriority (line 118)

Adds a link-priority level for URLs matching the given regular expression.

  • access: public
void addLinkPriority (string $regex, int $level)
  • string $regex
  • int $level
addURL (line 47)

Adds a URL to the URL cache.

  • abstract:
  • access: public
void addURL (PHPCrawlerURLDescriptor $UrlDescriptor)

addURLs (line 54)

Adds several URLs to the URL cache.

  • abstract:
  • access: public
void addURLs (array $urls)
  • array $urls: A numeric array containing the URLs as PHPCrawlerURLDescriptor-objects

cleanup (line 73)

Cleans up once the cache is no longer needed.

  • abstract:
  • access: public
void cleanup ()

clear (line 40)

Removes all URLs and all priority-rules from the URL-cache.

  • abstract:
  • access: public
void clear ()

containsURLs (line 61)

Checks whether any URLs are left in the cache.

  • abstract:
  • access: public
bool containsURLs ()

getAllURLs (line 35)

Returns all URLs currently cached in the URL-cache.

  • return: Numeric array containing all URLs as PHPCrawlerURLDescriptor-objects
  • abstract:
  • access: public
array getAllURLs ()

getDistinctURLHash (line 85)

Returns the distinct hash for the given URL, which ensures that no URL is cached more than once.

  • return: The hash or NULL if no distinct-hash should be used.
  • access: protected
string getDistinctURLHash (PHPCrawlerURLDescriptor $UrlDescriptor)
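
A minimal sketch of how such a distinct hash could be derived from $url_distinct_property. Everything here is an assumption for illustration: SketchUrlDescriptor and its property names are hypothetical stand-ins for PHPCrawlerURLDescriptor, md5() is just one possible hash, and the mapping of URLHASH_URL to the full URL and URLHASH_RAWLINK to the raw link text is inferred from the constant names only.

```php
<?php
// Hypothetical stand-in for PHPCrawlerURLDescriptor; property names are assumed.
class SketchUrlDescriptor
{
    public $url_rebuild; // full, absolute URL
    public $link_raw;    // link text exactly as found in the page

    public function __construct($url_rebuild, $link_raw)
    {
        $this->url_rebuild = $url_rebuild;
        $this->link_raw = $link_raw;
    }
}

// Values as listed under "Class Constants".
define("URLHASH_URL", 1);
define("URLHASH_RAWLINK", 2);
define("URLHASH_NONE", 3);

// Returns a uniqueness key for the descriptor, or null when no
// distinct hash should be used. md5() is an assumption.
function sketchDistinctUrlHash(SketchUrlDescriptor $descriptor, $url_distinct_property)
{
    if ($url_distinct_property === URLHASH_URL) {
        return md5($descriptor->url_rebuild);
    }
    if ($url_distinct_property === URLHASH_RAWLINK) {
        return md5($descriptor->link_raw);
    }
    return null; // URLHASH_NONE: every entry is kept
}

// Two different raw links that resolve to the same full URL.
$a = new SketchUrlDescriptor("http://www.example.com/page.html", "page.html");
$b = new SketchUrlDescriptor("http://www.example.com/page.html", "./page.html");
```

Under these assumptions, $a and $b share one hash with URLHASH_URL, get different hashes with URLHASH_RAWLINK, and both yield NULL with URLHASH_NONE.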
getNextUrl (line 28)

Returns the next URL from the cache that should be crawled.

  • abstract:
  • access: public
PHPCrawlerURLDescriptor getNextUrl ()

getUrlPriority (line 98)

Gets the priority-level of the given URL

  • access: protected
void getUrlPriority ($url)
  • $url: The URL whose priority level is looked up
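
One way such a lookup could work is sketched below. sketchUrlPriority() is a hypothetical helper, not the library's implementation; it assumes that rules are stored with the keys "match" and "level", that the first matching rule wins, and that a URL matching no rule gets level 0.

```php
<?php
// Hypothetical helper, not part of PHPCrawl.
// $url_priorities: numeric array of rules with the keys "match" and "level".
// Assumptions: first matching rule wins; unmatched URLs get level 0.
function sketchUrlPriority(array $url_priorities, $url)
{
    foreach ($url_priorities as $rule) {
        if (preg_match($rule["match"], $url) === 1) {
            return (int) $rule["level"];
        }
    }
    return 0;
}

// Illustrative rules, listed from highest to lowest level.
$rules = array(
    array("match" => '#/products/#', "level" => 10),
    array("match" => '#\.pdf$#i',    "level" => 5),
);

$product_level = sketchUrlPriority($rules, "http://www.example.com/products/1");
$pdf_level     = sketchUrlPriority($rules, "http://www.example.com/doc.PDF");
$default_level = sketchUrlPriority($rules, "http://www.example.com/");
```

Because the first match wins in this sketch, the order of the rules matters; keeping them sorted by level, highest first, makes the strongest rule apply when several patterns match.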
markUrlAsFollowed (line 68)

Marks the given URL in the cache as "followed"

  • abstract:
  • access: public
void markUrlAsFollowed (PHPCrawlerURLDescriptor $UrlDescriptor)

purgeCache (line 78)

Purges inconsistent entries from the URL cache.

  • abstract:
  • access: public
void purgeCache ()

Class Constants
URLHASH_NONE = 3 (line 21)
URLHASH_RAWLINK = 2 (line 20)
URLHASH_URL = 1 (line 19)

Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by phpDocumentor 1.4.4