Class PHPCrawlerLinkFinder

Description

Class for finding links in HTML-documents.

Located in /libs/PHPCrawler/PHPCrawlerLinkFinder.class.php (line 8)

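Variable Summary
bool $aggressive_search
PHPCrawlerUrlPartsDescriptor $baseUrlParts
array $extract_tags
bool $find_redirect_urls
mixed $found_links_map
PHPCrawlerURLCache $LinkCache
array $meta_attributes
PHPCrawlerURLDescriptor $SourceUrl
mixed $top_lines_processed
Method Summary
PHPCrawlerLinkFinder __construct ()
void addLinkToCache ( $link_raw,  $link_code, [ $link_text = ""])
void findLinksInHTMLChunk ( &$html_source)
void findRedirectLinkInHeader ( &$http_header)
array getAllMetaAttributes ()
array getAllURLs ()
void processHTTPHeader (string &$header)
void resetLinkCache ()
void setSourceUrl (PHPCrawlerURLDescriptor $SourceUrl)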
Variables
bool $aggressive_search = true (line 22)

Specifies whether links are also searched for outside of HTML tags.

  • access: public
PHPCrawlerUrlPartsDescriptor $baseUrlParts (line 55)

Parts of the base URL as a PHPCrawlerUrlPartsDescriptor object.

  • access: protected
array $extract_tags = array("href", "src", "url", "location", "codebase", "background", "data", "profile", "action", "open") (line 15)

Numeric array containing the names of all attributes and keywords (such as href and src) to extract links from.

  • access: public
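As a rough illustration of how a list like $extract_tags could drive link extraction, the following sketch collects the quoted values of the listed attributes from an HTML chunk. The function extractAttributeLinks() is hypothetical and is not part of PHPCrawlerLinkFinder; the class's actual matching logic may differ, and the sketch assumes quoted attribute values and PHP 7 or later.

```php
<?php
// Hypothetical helper, NOT part of PHPCrawlerLinkFinder: collects the values
// of the given attributes from an HTML chunk with a regular expression.
function extractAttributeLinks(string $html, array $attributes): array
{
    // Build an alternation such as "href|src" from the attribute names.
    $names = implode('|', array_map(function ($name) {
        return preg_quote($name, '/');
    }, $attributes));

    // Match name="value" or name='value', case-insensitively.
    $pattern = '/\b(?:' . $names . ')\s*=\s*(["\'])(.*?)\1/is';

    preg_match_all($pattern, $html, $matches);

    // Group 2 holds the attribute values; drop duplicates and reindex.
    return array_values(array_unique($matches[2]));
}

$html = '<a href="/about.html">About</a> <img src=\'logo.png\'> <a HREF="/about.html">Again</a>';
$links = extractAttributeLinks($html, array("href", "src"));
print_r($links);
```

Keeping the attribute names in an array, as $extract_tags does, means the pattern can be adjusted without touching the matching code.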
bool $find_redirect_urls = true (line 29)

Specifies whether redirect-links set in HTTP headers should be detected.

  • access: public
mixed $found_links_map = array() (line 57)
  • access: protected
PHPCrawlerURLCache $LinkCache (line 43)

Cache for storing found links/urls

  • access: protected
array $meta_attributes = array() (line 64)

Meta-attributes found in the html-source.

  • access: protected
PHPCrawlerURLDescriptor $SourceUrl (line 36)

The URL of the HTML source in which links are searched.

  • access: protected
mixed $top_lines_processed = false (line 48)

Flag indicating whether the top lines of the HTML-source were processed.

  • access: protected
Methods
Constructor __construct (line 66)
  • access: public
PHPCrawlerLinkFinder __construct ()
addLinkToCache (line 232)
  • access: protected
void addLinkToCache ( $link_raw,  $link_code, [ $link_text = ""])
  • $link_raw
  • $link_code
  • $link_text
findLinksInHTMLChunk (line 134)

Searches for links in the given HTML-chunk and adds found links to the internal link-cache.

  • access: public
void findLinksInHTMLChunk ( &$html_source)
  • &$html_source
findRedirectLinkInHeader (line 109)

Checks for a redirect-URL in the given http-header and adds it to the internal link-cache.

  • access: protected
void findRedirectLinkInHeader ( &$http_header)
  • &$http_header
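The kind of check described for findRedirectLinkInHeader can be sketched as follows. This is a minimal standalone example under assumptions: findLocationHeader() is a hypothetical name, it only reads the Location header line, and it returns the value instead of adding it to a link-cache as the method above does. It uses a nullable return type, so it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper, NOT part of PHPCrawlerLinkFinder: returns the value of
// the Location header from a raw HTTP response header, or null if absent.
function findLocationHeader(string $httpHeader): ?string
{
    // Header names are case-insensitive ("i"); the "m" flag lets ^ match
    // at the start of every header line.
    if (preg_match('/^location:[ \t]*(\S+)/mi', $httpHeader, $match)) {
        return $match[1];
    }
    return null;
}

$header = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Server: example\r\n"
        . "Location: http://www.example.com/new/\r\n\r\n";

var_dump(findLocationHeader($header));
```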
getAllMetaAttributes (line 276)

Returns all meta-tag attributes found so far in the document.

  • return: Associative array containing all found meta-attributes. The keys are the meta-names, the values are the contents of the attributes (for example $tags["robots"] = "nofollow").
  • access: public
array getAllMetaAttributes ()
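To illustrate the shape of the returned array, here is a minimal sketch that builds a name => content map from meta tags. The function extractMetaAttributes() is hypothetical and not part of the class. It assumes that the name attribute precedes content and that both values are double-quoted, and lowercasing the keys is a choice made for this sketch only.

```php
<?php
// Hypothetical helper, NOT part of PHPCrawlerLinkFinder: builds an associative
// array of meta-name => content pairs from an HTML source.
function extractMetaAttributes(string $html): array
{
    $meta = array();

    // Assumption: name comes before content and both values are double-quoted.
    $pattern = '/<meta\s+name\s*=\s*"([^"]*)"\s+content\s*=\s*"([^"]*)"/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // Lowercase the meta-name so lookups such as $tags["robots"] work
        // regardless of how the page spells it (a choice of this sketch).
        $meta[strtolower($m[1])] = $m[2];
    }
    return $meta;
}

$tags = extractMetaAttributes(
    '<meta name="robots" content="nofollow"><meta name="Author" content="Jane">'
);
print_r($tags);
```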
getAllURLs (line 263)

Returns all URLs/links found so far in the document.

  • return: Numeric array containing all URLs as PHPCrawlerURLDescriptor-objects
  • access: public
array getAllURLs ()
processHTTPHeader (line 89)

Processes the response-header of the document.

  • access: public
void processHTTPHeader (string &$header)
  • string &$header: The response-header of the document.
resetLinkCache (line 100)

Resets/clears the internal link-cache.

  • access: public
void resetLinkCache ()
setSourceUrl (line 78)

Sets the source-URL of the document to find links in.

  • access: public
void setSourceUrl (PHPCrawlerURLDescriptor $SourceUrl)

Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by phpDocumentor 1.4.4