Class PHPCrawlerRobotsTxtParser

Description

Class for parsing robots.txt-files.

Located in /libs/PHPCrawler/PHPCrawlerRobotsTxtParser.class.php (line 8)

Variable Summary
PHPCrawlerHTTPRequest $PageRequest
Method Summary
PHPCrawlerRobotsTxtParser __construct ()
array buildRegExpressions (array &$applying_lines, string $base_url)
array getApplyingLines ( &$robots_txt_content,  $user_agent_string)
string getRobotsTxtContent (PHPCrawlerURLDescriptor $Url)
static PHPCrawlerURLDescriptor getRobotsTxtURL (PHPCrawlerURLDescriptor $Url)
array parseRobotsTxt (PHPCrawlerURLDescriptor $Url, string $user_agent_string)
Variables
PHPCrawlerHTTPRequest $PageRequest (line 15)

A PHPCrawlerHTTPRequest-object for requesting robots.txt-files.

  • access: protected
Methods
static method getRobotsTxtURL (line 218)

Returns the Robots.txt-URL related to the given URL

  • return: The robots.txt-URL related to the passed URL.
  • access: public
static PHPCrawlerURLDescriptor getRobotsTxtURL (PHPCrawlerURLDescriptor $Url)
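The mapping can be sketched in plain PHP. This is a hypothetical standalone helper (the actual method takes and returns PHPCrawlerURLDescriptor objects): the robots.txt-URL always sits at the root of the host, regardless of the path of the given URL.

```php
<?php
// Hypothetical sketch: derive the robots.txt-URL from any URL on the
// same host by discarding the path and appending "/robots.txt".
function robotsTxtUrlFor(string $url): string
{
    $parts = parse_url($url);
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/robots.txt';
}

echo robotsTxtUrlFor('http://www.example.com/some/deep/page.html');
// http://www.example.com/robots.txt
```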
Constructor __construct (line 17)
  • access: public
PHPCrawlerRobotsTxtParser __construct ()
buildRegExpressions (line 151)

Returns an array containing regular expressions corresponding to the given robots.txt-style "Disallow"-lines

  • return: Numeric array containing regular-expressions created for each "disallow"-line.
  • access: protected
array buildRegExpressions (array &$applying_lines, string $base_url)
  • array &$applying_lines: Numeric array containing "disallow"-lines.
  • string $base_url: Base-URL the robots.txt-file was found in.
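The translation of a single "Disallow"-path into a regular expression can be sketched as follows. This is a simplified, hypothetical version (the actual method processes whole raw rule-lines against the base-URL): the path is regex-escaped and anchored at the start of the URL so every URL below it matches.

```php
<?php
// Simplified sketch: build a PCRE from a "Disallow"-path so that it
// matches every URL on the base-URL starting with that path.
function disallowLineToRegExp(string $disallow_path, string $base_url): string
{
    // Escape regex metacharacters in both parts, then anchor at the start.
    $pattern = preg_quote(rtrim($base_url, '/'), '#') . preg_quote($disallow_path, '#');
    return '#^' . $pattern . '#';
}

$regexp = disallowLineToRegExp('/private/', 'http://www.example.com/');
var_dump((bool) preg_match($regexp, 'http://www.example.com/private/secret.html')); // matches
var_dump((bool) preg_match($regexp, 'http://www.example.com/public/page.html'));    // no match
```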
getApplyingLines (line 67)

Returns all raw lines in the given robots.txt-content that apply to the given user-agent-string.

  • return: Numeric array with found lines
  • access: protected
array getApplyingLines ( &$robots_txt_content,  $user_agent_string)
  • &$robots_txt_content
  • $user_agent_string
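The line-selection can be sketched like this. It is a simplified, hypothetical version (the actual method may handle record boundaries and multiple consecutive "User-agent"-lines differently): a record applies if its agent-token is "*" or occurs in the given user-agent string.

```php
<?php
// Simplified sketch: collect the rule-lines of every "User-agent"-record
// whose agent-token applies to the given user-agent string.
function applyingLines(string $robots_txt_content, string $user_agent_string): array
{
    $lines = preg_split('/\r\n|\r|\n/', $robots_txt_content);
    $applies = false;
    $result = [];

    foreach ($lines as $line) {
        $line = trim($line);
        if (preg_match('/^User-agent:\s*(.*)$/i', $line, $m)) {
            $agent = trim($m[1]);
            $applies = ($agent === '*' || stripos($user_agent_string, $agent) !== false);
        } elseif ($applies && $line !== '') {
            $result[] = $line;
        }
    }
    return $result;
}

$txt = "User-agent: *\nDisallow: /tmp/\n\nUser-agent: badbot\nDisallow: /\n";
print_r(applyingLines($txt, 'PHPCrawl'));
// Only the "*"-record applies to this agent: a single "Disallow: /tmp/" line.
```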
getRobotsTxtContent (line 194)

Retrieves the content of a robots.txt-file

  • return: The content of the robots.txt or NULL if no robots.txt was found.
  • access: protected
string getRobotsTxtContent (PHPCrawlerURLDescriptor $Url)
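A minimal retrieval sketch, assuming a plain fetch (the class itself performs the request through its PHPCrawlerHTTPRequest object; the helper name below is hypothetical):

```php
<?php
// Hypothetical simplified fetch: return the file content, or NULL when
// the robots.txt could not be retrieved (mirroring the documented
// NULL-return of getRobotsTxtContent).
function fetchRobotsTxt(string $robots_txt_url): ?string
{
    $content = @file_get_contents($robots_txt_url);
    return $content === false ? null : $content;
}
```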
parseRobotsTxt (line 34)

Parses the robots.txt-file related to the given URL and returns regular-expression-rules corresponding to the contained "disallow"-rules that are addressed to the given user-agent.

  • return: Numeric array containing regular-expressions for each "disallow"-rule defined in the robots.txt-file that's addressed to the given user-agent.
  • access: public
array parseRobotsTxt (PHPCrawlerURLDescriptor $Url, string $user_agent_string)
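A hedged usage sketch of the public API described above, assuming the library's class files are loadable from the documented location and that PHPCrawlerURLDescriptor accepts a URL string in its constructor:

```php
<?php
// Hypothetical usage sketch (class and file names taken from this
// documentation; constructor arguments are an assumption).
require_once 'libs/PHPCrawler/PHPCrawlerRobotsTxtParser.class.php';

$parser = new PHPCrawlerRobotsTxtParser();
$url = new PHPCrawlerURLDescriptor('http://www.example.com/');
$rules = $parser->parseRobotsTxt($url, 'PHPCrawl');

// Each entry is a regular expression for one "disallow"-rule; a URL is
// blocked if any of the expressions matches it.
foreach ($rules as $regexp) {
    if (preg_match($regexp, 'http://www.example.com/private/page.html')) {
        echo "URL is disallowed by robots.txt\n";
    }
}
```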

Documentation generated on Sun, 20 Jan 2013 21:18:50 +0200 by phpDocumentor 1.4.4