[phpcrawl] element index

All elements
a b c d e f g h i k l m n o p q r s t u v w _
_
__construct
PHPCrawlerRobotsTxtParser::__construct() in PHPCrawlerRobotsTxtParser.class.php
__construct
PHPCrawlerResponseHeader::__construct() in PHPCrawlerResponseHeader.class.php
Initiates a new PHPCrawlerResponseHeader.
__construct
PHPCrawlerSQLiteCookieCache::__construct() in PHPCrawlerSQLiteCookieCache.class.php
__construct
PHPCrawlerSQLiteURLCache::__construct() in PHPCrawlerSQLiteURLCache.class.php
Initiates an SQLite-URL-cache.
__construct
PHPCrawlerURLDescriptor::__construct() in PHPCrawlerURLDescriptor.class.php
Initiates a URL-descriptor
__construct
PHPCrawlerProcessCommunication::__construct() in PHPCrawlerProcessCommunication.class.php
Initiates a new PHPCrawlerProcessCommunication-object.
__construct
PHPCrawlerLinkFinder::__construct() in PHPCrawlerLinkFinder.class.php
__construct
PHPCrawlerCookieDescriptor::__construct() in PHPCrawlerCookieDescriptor.class.php
Initiates a new PHPCrawlerCookieDescriptor-object.
__construct
PHPCrawlerDNSCache::__construct() in PHPCrawlerDNSCache.class.php
__construct
PHPCrawlerDocumentInfoQueue::__construct() in PHPCrawlerDocumentInfoQueue.class.php
Initiates a PHPCrawlerDocumentInfoQueue
__construct
PHPCrawlerHTTPRequest::__construct() in PHPCrawlerHTTPRequest.class.php
__construct
PHPCrawler::__construct() in PHPCrawler.class.php
Initiates a new crawler.
a
$abort_reason
PHPCrawlerStatus::$abort_reason in PHPCrawlerStatus.class.php
Reason the crawling-process was aborted.
$abort_reason
PHPCrawlerProcessReport::$abort_reason in PHPCrawlerProcessReport.class.php
Reason the crawling-process was aborted
$aggressive_search
PHPCrawlerLinkFinder::$aggressive_search in PHPCrawlerLinkFinder.class.php
Specifies whether links will also be searched outside of HTML-tags
$auth_password
PHPCrawlerUrlPartsDescriptor::$auth_password in PHPCrawlerUrlPartsDescriptor.class.php
$auth_username
PHPCrawlerUrlPartsDescriptor::$auth_username in PHPCrawlerUrlPartsDescriptor.class.php
addBasicAuthentication
PHPCrawlerUserSendDataCache::addBasicAuthentication() in PHPCrawlerUserSendDataCache.class.php
Adds a basic-authentication (username and password) to the list of authentications that will be sent with requests.
addBasicAuthentication
PHPCrawler::addBasicAuthentication() in PHPCrawler.class.php
Adds a basic-authentication (username and password) to the list of basic authentications that will be sent with requests.
addContentTypeReceiveRule
Adds a rule to the list of rules that decide which pages or files - regarding their content-type - should be received
addCookie
PHPCrawlerMemoryCookieCache::addCookie() in PHPCrawlerMemoryCookieCache.class.php
Adds a cookie to the cookie-cache.
addCookie
PHPCrawlerSQLiteCookieCache::addCookie() in PHPCrawlerSQLiteCookieCache.class.php
Adds a cookie to the cookie-cache.
addCookie
PHPCrawlerHTTPRequest::addCookie() in PHPCrawlerHTTPRequest.class.php
Adds a cookie to send with the request.
addCookie
PHPCrawlerCookieCacheBase::addCookie() in PHPCrawlerCookieCacheBase.class.php
Adds a cookie to the cookie-cache.
addCookieDescriptor
PHPCrawlerHTTPRequest::addCookieDescriptor() in PHPCrawlerHTTPRequest.class.php
Adds a cookie to send with the request.
addCookieDescriptors
PHPCrawlerHTTPRequest::addCookieDescriptors() in PHPCrawlerHTTPRequest.class.php
Adds a bunch of cookies to send with the request
addCookies
PHPCrawlerCookieCacheBase::addCookies() in PHPCrawlerCookieCacheBase.class.php
Adds a bunch of cookies to the cookie-cache.
addCookies
PHPCrawlerMemoryCookieCache::addCookies() in PHPCrawlerMemoryCookieCache.class.php
Adds a bunch of cookies to the cookie-cache.
addCookies
PHPCrawlerSQLiteCookieCache::addCookies() in PHPCrawlerSQLiteCookieCache.class.php
Adds a bunch of cookies to the cookie-cache.
addDocumentInfo
PHPCrawlerDocumentInfoQueue::addDocumentInfo() in PHPCrawlerDocumentInfoQueue.class.php
Adds a PHPCrawlerDocumentInfo-object to the queue
addFollowMatch
PHPCrawler::addFollowMatch() in PHPCrawler.class.php
Alias for addURLFollowRule().
addLinkExtractionTags
PHPCrawler::addLinkExtractionTags() in PHPCrawler.class.php
Sets the list of html-tags from which links should be extracted.
addLinkPriorities
PHPCrawlerURLCacheBase::addLinkPriorities() in PHPCrawlerURLCacheBase.class.php
Adds a bunch of link-priorities
addLinkPriority
PHPCrawlerURLCacheBase::addLinkPriority() in PHPCrawlerURLCacheBase.class.php
Adds a Link-Priority-Level
addLinkPriority
PHPCrawler::addLinkPriority() in PHPCrawler.class.php
Adds a regular expression together with a priority-level to the list of rules that decide which links should be preferred.
addLinkSearchContentType
Adds a rule to the list of rules that decide in what kind of documents the crawler should search for links (regarding their content-type)
addLinkSearchContentType
PHPCrawlerHTTPRequest::addLinkSearchContentType() in PHPCrawlerHTTPRequest.class.php
Adds a rule to the list of rules that decide what kind of documents should get checked for links (regarding their content-type)
addLinkToCache
PHPCrawlerLinkFinder::addLinkToCache() in PHPCrawlerLinkFinder.class.php
addNonFollowMatch
PHPCrawler::addNonFollowMatch() in PHPCrawler.class.php
Alias for addURLFilterRule().
addPostData
PHPCrawlerHTTPRequest::addPostData() in PHPCrawlerHTTPRequest.class.php
Adds post-data to send with the request.
addPostData
PHPCrawler::addPostData() in PHPCrawler.class.php
Adds post-data together with a URL-rule to the list of post-data to send with requests.
addPostData
PHPCrawlerUserSendDataCache::addPostData() in PHPCrawlerUserSendDataCache.class.php
Adds post-data together with a URL-regex to the list of post-data to send with requests.
addReceiveContentType
PHPCrawlerHTTPRequest::addReceiveContentType() in PHPCrawlerHTTPRequest.class.php
Adds a rule to the list of rules that decide which pages or files - regarding their content-type - should be received
addReceiveContentType
PHPCrawler::addReceiveContentType() in PHPCrawler.class.php
Alias for addContentTypeReceiveRule().
addReceiveToMemoryMatch
Has no function anymore!
addReceiveToTmpFileMatch
Alias for addStreamToFileContentType().
addStreamToFileContentType
PHPCrawlerHTTPRequest::addStreamToFileContentType() in PHPCrawlerHTTPRequest.class.php
Adds a rule to the list of rules that decide what types of content should be streamed directly to the temporary file.
addStreamToFileContentType
Adds a rule to the list of rules that decide what types of content should be streamed directly to a temporary file.
addURL
PHPCrawlerSQLiteURLCache::addURL() in PHPCrawlerSQLiteURLCache.class.php
Adds a URL to the url-cache
addURL
PHPCrawlerURLCacheBase::addURL() in PHPCrawlerURLCacheBase.class.php
Adds a URL to the url-cache
addURL
PHPCrawlerMemoryURLCache::addURL() in PHPCrawlerMemoryURLCache.class.php
Adds a URL to the url-cache
addURLFilterRule
PHPCrawlerURLFilter::addURLFilterRule() in PHPCrawlerURLFilter.class.php
Adds a rule to the list of rules that decide which URLs found on a page should be ignored by the crawler.
addURLFilterRule
PHPCrawler::addURLFilterRule() in PHPCrawler.class.php
Adds a rule to the list of rules that decide which URLs found on a page should be ignored by the crawler.
addURLFilterRules
PHPCrawlerURLFilter::addURLFilterRules() in PHPCrawlerURLFilter.class.php
Adds a bunch of rules to the list of rules that decide which URLs found on a page should be ignored by the crawler.
addURLFollowRule
PHPCrawlerURLFilter::addURLFollowRule() in PHPCrawlerURLFilter.class.php
addURLFollowRule
PHPCrawler::addURLFollowRule() in PHPCrawler.class.php
Adds a rule to the list of rules that decide which URLs found on a page should be followed explicitly.
addURLs
PHPCrawlerMemoryURLCache::addURLs() in PHPCrawlerMemoryURLCache.class.php
Adds a bunch of URLs to the url-cache
addURLs
PHPCrawlerURLCacheBase::addURLs() in PHPCrawlerURLCacheBase.class.php
Adds a bunch of URLs to the url-cache
addURLs
PHPCrawlerSQLiteURLCache::addURLs() in PHPCrawlerSQLiteURLCache.class.php
Adds a bunch of URLs to the url-cache
addURL_Entry
SMCCrawler::addURL_Entry() in SitemapCreatorCrawler.class.php
add URL entry $entries
b
$baseUrlParts
PHPCrawlerLinkFinder::$baseUrlParts in PHPCrawlerLinkFinder.class.php
Parts of the base-url as PHPCrawlerUrlPartsDescriptor-object
$basic_authentications
PHPCrawlerUserSendDataCache::$basic_authentications in PHPCrawlerUserSendDataCache.class.php
Array containing basic-authentications to send.
$benchmarks
PHPCrawlerDocumentInfo::$benchmarks in PHPCrawlerDocumentInfo.class.php
Some internal benchmark-results as array.
$benchmark_results
PHPCrawlerBenchmark::$benchmark_results in PHPCrawlerBenchmark.class.php
$benchmark_startcount
PHPCrawlerBenchmark::$benchmark_startcount in PHPCrawlerBenchmark.class.php
$benchmark_starttimes
PHPCrawlerBenchmark::$benchmark_starttimes in PHPCrawlerBenchmark.class.php
$bytes_received
PHPCrawlerDocumentInfo::$bytes_received in PHPCrawlerDocumentInfo.class.php
The number of bytes the crawler received of the content of the document.
$bytes_received
PHPCrawlerProcessReport::$bytes_received in PHPCrawlerProcessReport.class.php
The total number of bytes the crawler received altogether.
$bytes_received
PHPCrawlerStatus::$bytes_received in PHPCrawlerStatus.class.php
Number of bytes the crawler-instance received so far
buildCookieHeader
PHPCrawlerHTTPRequest::buildCookieHeader() in PHPCrawlerHTTPRequest.class.php
Builds the cookie-header-part for the header to send.
buildPostContent
PHPCrawlerHTTPRequest::buildPostContent() in PHPCrawlerHTTPRequest.class.php
Builds the post-content from the postdata-array for the header to send with the request (MIME-style)
buildRegExpressions
PHPCrawlerRobotsTxtParser::buildRegExpressions() in PHPCrawlerRobotsTxtParser.class.php
Returns an array containing regular-expressions corresponding to the given robots.txt-style "Disallow"-lines
buildRequestHeader
PHPCrawlerHTTPRequest::buildRequestHeader() in PHPCrawlerHTTPRequest.class.php
Builds the request-header from the given settings.
buildURLFromLink
PHPCrawlerUtils::buildURLFromLink() in PHPCrawlerUtils.class.php
Reconstructs a fully qualified and normalized URL from a given link relating to the URL the link was found in.
buildURLFromParts
PHPCrawlerUtils::buildURLFromParts() in PHPCrawlerUtils.class.php
Builds a URL from its single parts.
c
$child_process_number
PHPCrawler::$child_process_number in PHPCrawler.class.php
Number of child-process (NOT the PID!)
$class_version
PHPCrawler::$class_version in PHPCrawler.class.php
$content
PHPCrawlerDocumentInfo::$content in PHPCrawlerDocumentInfo.class.php
The content of the requested document (html-sourcecode or content of file).
$content_length
PHPCrawlerResponseHeader::$content_length in PHPCrawlerResponseHeader.class.php
The content-length as stated in the header.
$content_size_limit
PHPCrawlerHTTPRequest::$content_size_limit in PHPCrawlerHTTPRequest.class.php
Limit for content-size to receive
$content_tmp_file
PHPCrawlerDocumentInfo::$content_tmp_file in PHPCrawlerDocumentInfo.class.php
The temporary file to which the content was received.
$content_type
PHPCrawlerDocumentInfo::$content_type in PHPCrawlerDocumentInfo.class.php
The content-type of the page or file, e.g. "text/html" or "image/gif".
$content_type
PHPCrawlerResponseHeader::$content_type in PHPCrawlerResponseHeader.class.php
The content-type
$CookieCache
PHPCrawler::$CookieCache in PHPCrawler.class.php
The PHPCrawlerCookieCache-Object
$cookies
PHPCrawlerResponseHeader::$cookies in PHPCrawlerResponseHeader.class.php
All cookies found in the header
$cookies
PHPCrawlerDocumentInfo::$cookies in PHPCrawlerDocumentInfo.class.php
Cookies sent by the server.
$cookies
PHPCrawlerMemoryCookieCache::$cookies in PHPCrawlerMemoryCookieCache.class.php
$cookie_array
PHPCrawlerHTTPRequest::$cookie_array in PHPCrawlerHTTPRequest.class.php
Array containing cookies to send with the request
$cookie_handling_enabled
PHPCrawler::$cookie_handling_enabled in PHPCrawler.class.php
Flag cookie-handling enabled/disabled
$cookie_send_time
PHPCrawlerCookieDescriptor::$cookie_send_time in PHPCrawlerCookieDescriptor.class.php
The time the cookie was sent
$crawlerStatus
PHPCrawlerProcessCommunication::$crawlerStatus in PHPCrawlerProcessCommunication.class.php
$crawler_uniqid
PHPCrawlerProcessCommunication::$crawler_uniqid in PHPCrawlerProcessCommunication.class.php
$crawler_uniqid
PHPCrawler::$crawler_uniqid in PHPCrawler.class.php
UID of this instance of the crawler
$CurrentDocumentInfo
PHPCrawlerURLFilter::$CurrentDocumentInfo in PHPCrawlerURLFilter.class.php
Current PHPCrawlerDocumentInfo-object of the current document
checkForAbort
PHPCrawler::checkForAbort() in PHPCrawler.class.php
Checks if the crawling-process should be aborted.
checkRegexPattern
PHPCrawlerUtils::checkRegexPattern() in PHPCrawlerUtils.class.php
Checks whether a given RegEx-pattern is valid or not.
checkStringAgainstRegexArray
Checks whether a given string matches with one of the given regular-expressions.
childProcessAlive
PHPCrawlerProcessCommunication::childProcessAlive() in PHPCrawlerProcessCommunication.class.php
Checks whether any child-processes are (still) running.
cleanup
PHPCrawlerSQLiteURLCache::cleanup() in PHPCrawlerSQLiteURLCache.class.php
Cleans up the cache after it is not needed anymore.
cleanup
PHPCrawlerURLCacheBase::cleanup() in PHPCrawlerURLCacheBase.class.php
Do cleanups after the cache is not needed anymore
cleanup
PHPCrawler::cleanup() in PHPCrawler.class.php
Cleans up the crawler after it has finished.
cleanup
PHPCrawlerMemoryURLCache::cleanup() in PHPCrawlerMemoryURLCache.class.php
Has no function in this class.
clear
PHPCrawlerSQLiteURLCache::clear() in PHPCrawlerSQLiteURLCache.class.php
Removes all URLs and all priority-rules from the URL-cache.
clear
PHPCrawlerMemoryURLCache::clear() in PHPCrawlerMemoryURLCache.class.php
Removes all URLs and all priority-rules from the URL-cache.
clear
PHPCrawlerURLCacheBase::clear() in PHPCrawlerURLCacheBase.class.php
Removes all URLs and all priority-rules from the URL-cache.
clearCookies
PHPCrawlerHTTPRequest::clearCookies() in PHPCrawlerHTTPRequest.class.php
Removes all cookies to send with the request.
clearPostData
PHPCrawlerHTTPRequest::clearPostData() in PHPCrawlerHTTPRequest.class.php
Removes all post-data to send with the request.
containsURLs
PHPCrawlerURLCacheBase::containsURLs() in PHPCrawlerURLCacheBase.class.php
Checks whether there are URLs left in the cache or not.
containsURLs
PHPCrawlerSQLiteURLCache::containsURLs() in PHPCrawlerSQLiteURLCache.class.php
Checks whether there are URLs left in the cache that should be processed or not.
containsURLs
PHPCrawlerMemoryURLCache::containsURLs() in PHPCrawlerMemoryURLCache.class.php
Checks whether there are URLs left in the cache or not.
createPreparedInsertStatement
PHPCrawlerSQLiteURLCache::createPreparedInsertStatement() in PHPCrawlerSQLiteURLCache.class.php
Creates the prepared statement for inserting URLs into the database (if not done yet)
createPreparedStatements
PHPCrawlerDocumentInfoQueue::createPreparedStatements() in PHPCrawlerDocumentInfoQueue.class.php
createWorkingDirectory
PHPCrawler::createWorkingDirectory() in PHPCrawler.class.php
Creates the working-directory for this instance of the crawler.
d
$data_throughput
PHPCrawlerProcessReport::$data_throughput in PHPCrawlerProcessReport.class.php
The average data-throughput in bytes per second.
$data_transfer_rate
PHPCrawlerDocumentInfo::$data_transfer_rate in PHPCrawlerDocumentInfo.class.php
The average data-transfer-rate for this document.
$data_transfer_time
PHPCrawlerHTTPRequest::$data_transfer_time in PHPCrawlerHTTPRequest.class.php
The time it took to receive data-packets for the request.
$data_transfer_time
PHPCrawlerDocumentInfo::$data_transfer_time in PHPCrawlerDocumentInfo.class.php
The time it took to receive the document.
$db_analyzed
PHPCrawlerSQLiteURLCache::$db_analyzed in PHPCrawlerSQLiteURLCache.class.php
$DNSCache
PHPCrawlerHTTPRequest::$DNSCache in PHPCrawlerHTTPRequest.class.php
DNS-cache
$DocumentInfoQueue
PHPCrawler::$DocumentInfoQueue in PHPCrawler.class.php
DocumentInfoQueue-object
$documents_received
PHPCrawlerStatus::$documents_received in PHPCrawlerStatus.class.php
Number of documents the crawler-instance received so far
$document_limit
PHPCrawler::$document_limit in PHPCrawler.class.php
Limit of documents to receive
$domain
PHPCrawlerCookieDescriptor::$domain in PHPCrawlerCookieDescriptor.class.php
Cookie-domain
$domain
PHPCrawlerUrlPartsDescriptor::$domain in PHPCrawlerUrlPartsDescriptor.class.php
decideRecevieContent
PHPCrawlerHTTPRequest::decideRecevieContent() in PHPCrawlerHTTPRequest.class.php
Checks whether the content of this page/file should be received (based on the content-type and the applied rules)
decideStreamToFile
PHPCrawlerHTTPRequest::decideStreamToFile() in PHPCrawlerHTTPRequest.class.php
Checks whether the content of this page/file should be streamed directly to file.
deserializeFromFile
PHPCrawlerUtils::deserializeFromFile() in PHPCrawlerUtils.class.php
Returns deserialized data that is stored in a file.
disableExtendedLinkInfo
Has no function anymore.
e
$entries
SMCCrawler::$entries in SitemapCreatorCrawler.class.php
Array containing the entries.
$error_code
PHPCrawlerDocumentInfo::$error_code in PHPCrawlerDocumentInfo.class.php
The code of the error that may have occurred while requesting/receiving the document.
$error_occured
PHPCrawlerDocumentInfo::$error_occured in PHPCrawlerDocumentInfo.class.php
Indicates whether an error occurred while requesting/receiving the document.
$error_string
PHPCrawlerDocumentInfo::$error_string in PHPCrawlerDocumentInfo.class.php
A human-readable string describing the error that may have occurred while requesting/receiving the document.
$expires
PHPCrawlerCookieDescriptor::$expires in PHPCrawlerCookieDescriptor.class.php
Expire-string, e.g. "Sat, 08-Aug-2020 23:59:08 GMT"
$expire_timestamp
PHPCrawlerCookieDescriptor::$expire_timestamp in PHPCrawlerCookieDescriptor.class.php
Expire-date as unix-timestamp
$extract_tags
PHPCrawlerLinkFinder::$extract_tags in PHPCrawlerLinkFinder.class.php
Numeric array containing all tags to extract links from
enableAggressiveLinkSearch
Enables or disables aggressive link-searching.
enableAggressiveLinkSearch
PHPCrawlerHTTPRequest::enableAggressiveLinkSearch() in PHPCrawlerHTTPRequest.class.php
Enables/disables aggressive link-search
enableCookieHandling
PHPCrawler::enableCookieHandling() in PHPCrawler.class.php
Enables or disables cookie-handling.
enableLastModifiedCount
SMCCrawler::enableLastModifiedCount() in SitemapCreatorCrawler.class.php
Enable or disable last-Modified calculation $LastModifiedCount
enableResumption
PHPCrawler::enableResumption() in PHPCrawler.class.php
Prepares the crawler for process-resumption.
f
$file
PHPCrawlerDocumentInfo::$file in PHPCrawlerDocumentInfo.class.php
The name of the requested page or file, e.g. "page.html".
$file
PHPCrawlerUrlPartsDescriptor::$file in PHPCrawlerUrlPartsDescriptor.class.php
$files_received
PHPCrawlerProcessReport::$files_received in PHPCrawlerProcessReport.class.php
The total number of documents the crawler received.
$file_limit_reached
PHPCrawlerProcessReport::$file_limit_reached in PHPCrawlerProcessReport.class.php
Will be TRUE if the page/file-limit was reached.
$find_redirect_urls
PHPCrawlerLinkFinder::$find_redirect_urls in PHPCrawlerLinkFinder.class.php
Specifies whether redirect-links set in http-headers should get found.
$first_content_url
PHPCrawlerStatus::$first_content_url in PHPCrawlerStatus.class.php
$follow_redirects_till_content
$found_links_map
PHPCrawlerLinkFinder::$found_links_map in PHPCrawlerLinkFinder.class.php
filterUrls
PHPCrawlerURLFilter::filterUrls() in PHPCrawlerURLFilter.class.php
Filters the given URLs (contained in the given PHPCrawlerDocumentInfo-object) by the given rules.
findLinksInHTMLChunk
PHPCrawlerLinkFinder::findLinksInHTMLChunk() in PHPCrawlerLinkFinder.class.php
Searches for links in the given HTML-chunk and adds found links to the internal link-cache.
findRedirectLinkInHeader
PHPCrawlerLinkFinder::findRedirectLinkInHeader() in PHPCrawlerLinkFinder.class.php
Checks for a redirect-URL in the given http-header and adds it to the internal link-cache.
fromURL
PHPCrawlerUrlPartsDescriptor::fromURL() in PHPCrawlerUrlPartsDescriptor.class.php
Returns the PHPCrawlerUrlPartsDescriptor-object for the given URL.
g
$general_follow_mode
PHPCrawlerURLFilter::$general_follow_mode in PHPCrawlerURLFilter.class.php
The general follow-mode of the crawler
$global_traffic_count
PHPCrawlerHTTPRequest::$global_traffic_count in PHPCrawlerHTTPRequest.class.php
Global counter for traffic this instance of the HTTPRequest-class caused.
getAllBenchmarks
PHPCrawlerBenchmark::getAllBenchmarks() in PHPCrawlerBenchmark.class.php
Returns all registered benchmark-results.
getAllMetaAttributes
PHPCrawlerLinkFinder::getAllMetaAttributes() in PHPCrawlerLinkFinder.class.php
Returns all meta-tag attributes found so far in the document.
getAllURLs
PHPCrawlerMemoryURLCache::getAllURLs() in PHPCrawlerMemoryURLCache.class.php
Returns all URLs currently cached in the URL-cache.
getAllURLs
PHPCrawlerLinkFinder::getAllURLs() in PHPCrawlerLinkFinder.class.php
Returns all URLs/links found so far in the document.
getAllURLs
PHPCrawlerURLCacheBase::getAllURLs() in PHPCrawlerURLCacheBase.class.php
Returns all URLs currently cached in the URL-cache.
getAllURLs
PHPCrawlerSQLiteURLCache::getAllURLs() in PHPCrawlerSQLiteURLCache.class.php
Has no function in this class
getApplyingLines
PHPCrawlerRobotsTxtParser::getApplyingLines() in PHPCrawlerRobotsTxtParser.class.php
Function returns all RAW lines in the given robots.txt-content that apply to the given useragent-string.
getBaseUrlFromMetaTag
PHPCrawlerUtils::getBaseUrlFromMetaTag() in PHPCrawlerUtils.class.php
Returns the base-URL specified in a meta-tag in the given HTML-source
getBasicAuthenticationForUrl
PHPCrawlerUserSendDataCache::getBasicAuthenticationForUrl() in PHPCrawlerUserSendDataCache.class.php
Returns the basic-authentication (username and password) that should be sent to the given URL.
getCallCount
PHPCrawlerBenchmark::getCallCount() in PHPCrawlerBenchmark.class.php
getChildPIDs
PHPCrawlerProcessCommunication::getChildPIDs() in PHPCrawlerProcessCommunication.class.php
Returns the PIDs of all running child-processes
getCookiesForUrl
PHPCrawlerMemoryCookieCache::getCookiesForUrl() in PHPCrawlerMemoryCookieCache.class.php
Returns all cookies from the cache that are addressed to the given URL
getCookiesForUrl
PHPCrawlerCookieCacheBase::getCookiesForUrl() in PHPCrawlerCookieCacheBase.class.php
Returns all cookies from the cache that are addressed to the given URL
getCookiesForUrl
PHPCrawlerSQLiteCookieCache::getCookiesForUrl() in PHPCrawlerSQLiteCookieCache.class.php
Returns all cookies from the cache that are addressed to the given URL
getCookiesFromHeader
PHPCrawlerUtils::getCookiesFromHeader() in PHPCrawlerUtils.class.php
Returns all cookies from the given response-header.
getCrawlerId
PHPCrawler::getCrawlerId() in PHPCrawler.class.php
Returns the unique ID of the instance of the crawler
getCrawlerStatus
PHPCrawlerProcessCommunication::getCrawlerStatus() in PHPCrawlerProcessCommunication.class.php
Returns/reads the current crawler-status
getDistinctURLHash
PHPCrawlerURLCacheBase::getDistinctURLHash() in PHPCrawlerURLCacheBase.class.php
Returns the distinct-hash for the given URL that ensures that no URLs are cached more than once.
getDocumentInfoCount
PHPCrawlerDocumentInfoQueue::getDocumentInfoCount() in PHPCrawlerDocumentInfoQueue.class.php
Returns the current number of PHPCrawlerDocumentInfo-objects in the queue
getElapsedTime
PHPCrawlerBenchmark::getElapsedTime() in PHPCrawlerBenchmark.class.php
Gets the elapsed time for the given benchmark.
getFromHeaderLine
PHPCrawlerCookieDescriptor::getFromHeaderLine() in PHPCrawlerCookieDescriptor.class.php
Returns a PHPCrawlerCookieDescriptor-object initiated by the given cookie-header-line.
getGlobalTrafficCount
PHPCrawlerHTTPRequest::getGlobalTrafficCount() in PHPCrawlerHTTPRequest.class.php
Returns the global traffic this instance of the HTTPRequest-class caused so far.
getHeaderValue
PHPCrawlerUtils::getHeaderValue() in PHPCrawlerUtils.class.php
Gets the value of a header-directive from the given HTTP-header.
getHTTPStatusCode
PHPCrawlerUtils::getHTTPStatusCode() in PHPCrawlerUtils.class.php
Gets the HTTP-statuscode from a given response-header.
getIP
PHPCrawlerDNSCache::getIP() in PHPCrawlerDNSCache.class.php
Returns the IP for the given hostname.
getLastModified
SMCCrawler::getLastModified() in SitemapCreatorCrawler.class.php
get Last-Modified header
getMaxPriorityLevel
PHPCrawlerMemoryURLCache::getMaxPriorityLevel() in PHPCrawlerMemoryURLCache.class.php
Returns the highest priority-level a URL exists in the cache for.
getMetaTagAttributes
PHPCrawlerUtils::getMetaTagAttributes() in PHPCrawlerUtils.class.php
Gets all meta-tag attributes from the given HTML-source.
getmicrotime
PHPCrawlerBenchmark::getmicrotime() in PHPCrawlerBenchmark.class.php
Returns the current time in seconds and milliseconds.
getNextDocumentInfo
PHPCrawlerDocumentInfoQueue::getNextDocumentInfo() in PHPCrawlerDocumentInfoQueue.class.php
Returns a PHPCrawlerDocumentInfo-object from the queue
getNextUrl
PHPCrawlerURLCacheBase::getNextUrl() in PHPCrawlerURLCacheBase.class.php
Returns the next URL from the cache that should be crawled.
getNextUrl
PHPCrawlerSQLiteURLCache::getNextUrl() in PHPCrawlerSQLiteURLCache.class.php
Returns the next URL from the cache that should be crawled.
getNextUrl
PHPCrawlerMemoryURLCache::getNextUrl() in PHPCrawlerMemoryURLCache.class.php
Returns the next URL from the cache that should be crawled.
getPostDataForUrl
PHPCrawlerUserSendDataCache::getPostDataForUrl() in PHPCrawlerUserSendDataCache.class.php
Returns the post-data (key and value) that should be sent to the given URL.
getProcessReport
PHPCrawler::getProcessReport() in PHPCrawler.class.php
Returns summarizing report-information about the crawling-process after it has finished.
getRedirectURLFromHeader
PHPCrawlerUtils::getRedirectURLFromHeader() in PHPCrawlerUtils.class.php
Returns the redirect-URL from the given HTTP-header
getReport
PHPCrawler::getReport() in PHPCrawler.class.php
Returns an array with summarizing report-information after the crawling-process has finished
getRobotsTxtContent
PHPCrawlerRobotsTxtParser::getRobotsTxtContent() in PHPCrawlerRobotsTxtParser.class.php
Retrieves the content of a robots.txt-file
getRobotsTxtURL
PHPCrawlerRobotsTxtParser::getRobotsTxtURL() in PHPCrawlerRobotsTxtParser.class.php
Returns the Robots.txt-URL related to the given URL
getRootUrl
PHPCrawlerUtils::getRootUrl() in PHPCrawlerUtils.class.php
Returns the normalized root-URL of the given URL
getSystemTempDir
PHPCrawlerUtils::getSystemTempDir() in PHPCrawlerUtils.class.php
Determines the system's temporary-directory.
getUrlCount
PHPCrawlerSQLiteURLCache::getUrlCount() in PHPCrawlerSQLiteURLCache.class.php
getUrlPriority
PHPCrawlerURLCacheBase::getUrlPriority() in PHPCrawlerURLCacheBase.class.php
Gets the priority-level of the given URL
go
PHPCrawler::go() in PHPCrawler.class.php
Starts the crawling process in single-process-mode.
goMultiProcessed
PHPCrawler::goMultiProcessed() in PHPCrawler.class.php
Starts the crawler using multiple processes.
h
$header
PHPCrawlerDocumentInfo::$header in PHPCrawlerDocumentInfo.class.php
The complete HTTP-header the webserver responded with for this page or file.
$header_check_callback_function
$header_raw
PHPCrawlerResponseHeader::$header_raw in PHPCrawlerResponseHeader.class.php
The raw HTTP-header as it was sent by the server
$header_send
PHPCrawlerDocumentInfo::$header_send in PHPCrawlerDocumentInfo.class.php
The complete HTTP-request-header the crawler sent to the server (debugging info).
$host
PHPCrawlerDocumentInfo::$host in PHPCrawlerDocumentInfo.class.php
The host-part of the URL of the requested page or file, e.g. "www.foo.com".
$host
PHPCrawlerUrlPartsDescriptor::$host in PHPCrawlerUrlPartsDescriptor.class.php
$host_ip_array
PHPCrawlerDNSCache::$host_ip_array in PHPCrawlerDNSCache.class.php
Array for caching IPs of the requested hostnames
$http_status_code
PHPCrawlerResponseHeader::$http_status_code in PHPCrawlerResponseHeader.class.php
The HTTP-statuscode
$http_status_code
PHPCrawlerDocumentInfo::$http_status_code in PHPCrawlerDocumentInfo.class.php
The HTTP-statuscode the webserver responded with for the request, e.g. 200 (OK) or 404 (file not found).
handleDocumentInfo
SMCCrawler::handleDocumentInfo() in SitemapCreatorCrawler.class.php
get access to all information about a page or file the crawler found and received.
handleDocumentInfo
PHPCrawler::handleDocumentInfo() in PHPCrawler.class.php
Override this method to get access to all information about a page or file the crawler found and received.
handleHeaderInfo
PHPCrawler::handleHeaderInfo() in PHPCrawler.class.php
Overridable method that will be called after the header of a document was received and BEFORE the content will be received.
handlePageData
PHPCrawler::handlePageData() in PHPCrawler.class.php
Override this method to get access to all information about a page or file the crawler found and received.
hostInCache
PHPCrawlerDNSCache::hostInCache() in PHPCrawlerDNSCache.class.php
Checks whether a hostname is already cached.
i
$is_chlid_process
PHPCrawler::$is_chlid_process in PHPCrawler.class.php
Flag indicating whether this instance is running in a child-process (if crawler runs multi-processed)
$is_parent_process
PHPCrawler::$is_parent_process in PHPCrawler.class.php
Flag indicating whether this instance is running in the parent-process (if crawler runs multi-processed)
$is_redirect_url
PHPCrawlerURLDescriptor::$is_redirect_url in PHPCrawlerURLDescriptor.class.php
Flag indicating whether this URL was the target of an HTTP-redirect.
initChildProcess
PHPCrawler::initChildProcess() in PHPCrawler.class.php
Overridable method that will be called by every used child-process just before it starts the crawling-procedure.
initCrawlerProcess
PHPCrawler::initCrawlerProcess() in PHPCrawler.class.php
Initiates a crawler-process
isUTF8String
PHPCrawlerUtils::isUTF8String() in PHPCrawlerUtils.class.php
Checks whether the given string is a UTF8-encoded string.
isValidUrlString
PHPCrawlerUtils::isValidUrlString() in PHPCrawlerUtils.class.php
Checks whether the given string is a valid, urlencoded URL (by RFC)
k
keepRedirectUrls
PHPCrawlerURLFilter::keepRedirectUrls() in PHPCrawlerURLFilter.class.php
Filters out all non-redirect-URLs from the URLs given in the PHPCrawlerDocumentInfo-object
killChildProcesses
PHPCrawlerProcessCommunication::killChildProcesses() in PHPCrawlerProcessCommunication.class.php
Kills all running child-processes
l
$LastModifiedCount
SMCCrawler::$LastModifiedCount in SitemapCreatorCrawler.class.php
get Last Modified header
$lastResponseHeader
PHPCrawlerHTTPRequest::$lastResponseHeader in PHPCrawlerHTTPRequest.class.php
The last response-header this request-instance received.
$LinkCache
PHPCrawlerLinkFinder::$LinkCache in PHPCrawlerLinkFinder.class.php
Cache for storing found links/urls
$LinkCache
PHPCrawler::$LinkCache in PHPCrawler.class.php
The PHPCrawlerLinkCache-Object
$linkcode
PHPCrawlerURLDescriptor::$linkcode in PHPCrawlerURLDescriptor.class.php
The html-codepart that contained the link to this URL, e.g. "<a href="../foo.html">LINKTEXT</a>"
$LinkFinder
PHPCrawlerHTTPRequest::$LinkFinder in PHPCrawlerHTTPRequest.class.php
Link-finder object
$linksearch_content_types
PHPCrawlerHTTPRequest::$linksearch_content_types in PHPCrawlerHTTPRequest.class.php
Contains all content-type rules defining which documents should get checked for links.
$links_followed
PHPCrawlerProcessReport::$links_followed in PHPCrawlerProcessReport.class.php
The total number of links/URLs the crawler found and followed.
$links_followed
PHPCrawlerStatus::$links_followed in PHPCrawlerStatus.class.php
Number of links the crawler-instance followed so far
$links_found
PHPCrawlerDocumentInfo::$links_found in PHPCrawlerDocumentInfo.class.php
A numeric array containing information about all links that were found in the source of the page.
$links_found_url_descriptors
PHPCrawlerDocumentInfo::$links_found_url_descriptors in PHPCrawlerDocumentInfo.class.php
A numeric array containing a PHPCrawlerURLDescriptor-object for every link that was found in the page.
$linktext
PHPCrawlerURLDescriptor::$linktext in PHPCrawlerURLDescriptor.class.php
The linktext or html-code the link to this URL was laid over.
$link_priority_array
PHPCrawler::$link_priority_array in PHPCrawler.class.php
$link_raw
PHPCrawlerURLDescriptor::$link_raw in PHPCrawlerURLDescriptor.class.php
The raw link to this URL as it was found in the HTML-source, e.g. "../dunno/index.php"
m
top
$memory_peak_usage
PHPCrawlerProcessReport::$memory_peak_usage in PHPCrawlerProcessReport.class.php
The peak memory-usage the crawling-process caused.
$meta_attributes
PHPCrawlerLinkFinder::$meta_attributes in PHPCrawlerLinkFinder.class.php
Meta-attributes found in the html-source.
$meta_attributes
PHPCrawlerDocumentInfo::$meta_attributes in PHPCrawlerDocumentInfo.class.php
All meta-tag attributes found in the source of the document.
$multiprocess_mode
PHPCrawlerProcessCommunication::$multiprocess_mode in PHPCrawlerProcessCommunication.class.php
$multiprocess_mode
PHPCrawler::$multiprocess_mode in PHPCrawler.class.php
Multiprocess-mode the crawler is running in.
markUrlAsFollowed
PHPCrawlerURLCacheBase::markUrlAsFollowed() in PHPCrawlerURLCacheBase.class.php
Marks the given URL in the cache as "followed"
markUrlAsFollowed
PHPCrawlerMemoryURLCache::markUrlAsFollowed() in PHPCrawlerMemoryURLCache.class.php
Has no function in this memory-cache.
markUrlAsFollowed
PHPCrawlerSQLiteURLCache::markUrlAsFollowed() in PHPCrawlerSQLiteURLCache.class.php
Marks the given URL in the cache as "followed"
n
top
$name
PHPCrawlerCookieDescriptor::$name in PHPCrawlerCookieDescriptor.class.php
Cookie-name
normalizeURL
PHPCrawlerUtils::normalizeURL() in PHPCrawlerUtils.class.php
Normalizes a URL
o
top
$obey_nofollow_tags
PHPCrawlerURLFilter::$obey_nofollow_tags in PHPCrawlerURLFilter.class.php
Defines whether nofollow-tags should get obeyed.
$obey_robots_txt
PHPCrawler::$obey_robots_txt in PHPCrawler.class.php
Defines whether robots.txt-file should be obeyed
$only_count_received_documents
Defines if only documents that were received will be counted.
obeyNoFollowTags
PHPCrawler::obeyNoFollowTags() in PHPCrawler.class.php
Decides whether the crawler should obey "nofollow"-tags
obeyRobotsTxt
PHPCrawler::obeyRobotsTxt() in PHPCrawler.class.php
Decides whether the crawler should parse and obey robots.txt-files.
openConnection
PHPCrawlerSQLiteURLCache::openConnection() in PHPCrawlerSQLiteURLCache.class.php
Creates the sqlite-db-file and opens connection to it.
openConnection
PHPCrawlerDocumentInfoQueue::openConnection() in PHPCrawlerDocumentInfoQueue.class.php
Creates the sqlite-db-file and opens connection to it.
openConnection
PHPCrawlerSQLiteCookieCache::openConnection() in PHPCrawlerSQLiteCookieCache.class.php
Creates the sqlite-db-file and opens connection to it.
openSocket
PHPCrawlerHTTPRequest::openSocket() in PHPCrawlerHTTPRequest.class.php
Opens the socket to the host.
p
top
$PageRequest
PHPCrawlerRobotsTxtParser::$PageRequest in PHPCrawlerRobotsTxtParser.class.php
A PHPCrawlerHTTPRequest-object for requesting robots.txt-files.
$PageRequest
PHPCrawler::$PageRequest in PHPCrawler.class.php
The PHPCrawlerHTTPRequest-Object
$path
PHPCrawlerDocumentInfo::$path in PHPCrawlerDocumentInfo.class.php
The path in the URL of the requested page or file, e.g. "/page/".
$path
PHPCrawlerCookieDescriptor::$path in PHPCrawlerCookieDescriptor.class.php
Cookie-path
$path
PHPCrawlerUrlPartsDescriptor::$path in PHPCrawlerUrlPartsDescriptor.class.php
$PDO
PHPCrawlerDocumentInfoQueue::$PDO in PHPCrawlerDocumentInfoQueue.class.php
$PDO
PHPCrawlerSQLiteCookieCache::$PDO in PHPCrawlerSQLiteCookieCache.class.php
$PDO
PHPCrawlerSQLiteURLCache::$PDO in PHPCrawlerSQLiteURLCache.class.php
PDO-object for querying SQLite-file.
$porcess_abort_reason
PHPCrawler::$porcess_abort_reason in PHPCrawler.class.php
The reason why the process was aborted/finished.
$port
PHPCrawlerDocumentInfo::$port in PHPCrawlerDocumentInfo.class.php
The port of the URL the request was sent to, e.g. 80
$port
PHPCrawlerUrlPartsDescriptor::$port in PHPCrawlerUrlPartsDescriptor.class.php
$post_data
PHPCrawlerHTTPRequest::$post_data in PHPCrawlerHTTPRequest.class.php
Array containing POST-data to send with the request
$post_data
PHPCrawlerUserSendDataCache::$post_data in PHPCrawlerUserSendDataCache.class.php
Array containing post-data to send.
$PreparedInsertStatement
PHPCrawlerSQLiteURLCache::$PreparedInsertStatement in PHPCrawlerSQLiteURLCache.class.php
Prepared statement for inserting URLs into the db-file as PDOStatement-object.
$prepared_statements_created
PHPCrawlerDocumentInfoQueue::$prepared_statements_created in PHPCrawlerDocumentInfoQueue.class.php
$ProcessCommunication
PHPCrawler::$ProcessCommunication in PHPCrawler.class.php
ProcessCommunication-object
$process_runtime
PHPCrawlerProcessReport::$process_runtime in PHPCrawlerProcessReport.class.php
The total time the crawling-process was running in seconds.
$protocol
PHPCrawlerDocumentInfo::$protocol in PHPCrawlerDocumentInfo.class.php
The protocol-part of the URL of the page or file, e.g. "http://"
$protocol
PHPCrawlerUrlPartsDescriptor::$protocol in PHPCrawlerUrlPartsDescriptor.class.php
$proxy
PHPCrawlerHTTPRequest::$proxy in PHPCrawlerHTTPRequest.class.php
The proxy to use
PHPCrawlerCookieCacheBase.class.php
PHPCrawlerCookieCacheBase.class.php in PHPCrawlerCookieCacheBase.class.php
PHPCrawlerMemoryCookieCache.class.php
PHPCrawlerMemoryCookieCache.class.php in PHPCrawlerMemoryCookieCache.class.php
PHPCrawlerSQLiteCookieCache.class.php
PHPCrawlerSQLiteCookieCache.class.php in PHPCrawlerSQLiteCookieCache.class.php
PHPCrawler.class.php
PHPCrawler.class.php in PHPCrawler.class.php
PHPCrawlerBenchmark.class.php
PHPCrawlerBenchmark.class.php in PHPCrawlerBenchmark.class.php
PHPCrawlerCookieDescriptor.class.php
PHPCrawlerCookieDescriptor.class.php in PHPCrawlerCookieDescriptor.class.php
PHPCrawlerDNSCache.class.php
PHPCrawlerDNSCache.class.php in PHPCrawlerDNSCache.class.php
PHPCrawlerDocumentInfo.class.php
PHPCrawlerDocumentInfo.class.php in PHPCrawlerDocumentInfo.class.php
PHPCrawlerHTTPRequest.class.php
PHPCrawlerHTTPRequest.class.php in PHPCrawlerHTTPRequest.class.php
PHPCrawlerLinkFinder.class.php
PHPCrawlerLinkFinder.class.php in PHPCrawlerLinkFinder.class.php
PHPCrawlerProcessReport.class.php
PHPCrawlerProcessReport.class.php in PHPCrawlerProcessReport.class.php
PHPCrawlerResponseHeader.class.php
PHPCrawlerResponseHeader.class.php in PHPCrawlerResponseHeader.class.php
PHPCrawlerRobotsTxtParser.class.php
PHPCrawlerRobotsTxtParser.class.php in PHPCrawlerRobotsTxtParser.class.php
PHPCrawlerStatus.class.php
PHPCrawlerStatus.class.php in PHPCrawlerStatus.class.php
PHPCrawlerURLDescriptor.class.php
PHPCrawlerURLDescriptor.class.php in PHPCrawlerURLDescriptor.class.php
PHPCrawlerURLFilter.class.php
PHPCrawlerURLFilter.class.php in PHPCrawlerURLFilter.class.php
PHPCrawlerUrlPartsDescriptor.class.php
PHPCrawlerUrlPartsDescriptor.class.php in PHPCrawlerUrlPartsDescriptor.class.php
PHPCrawlerUserSendDataCache.class.php
PHPCrawlerUserSendDataCache.class.php in PHPCrawlerUserSendDataCache.class.php
PHPCrawlerUtils.class.php
PHPCrawlerUtils.class.php in PHPCrawlerUtils.class.php
PHPCrawlerDocumentInfoQueue.class.php
PHPCrawlerDocumentInfoQueue.class.php in PHPCrawlerDocumentInfoQueue.class.php
PHPCrawlerProcessCommunication.class.php
PHPCrawlerProcessCommunication.class.php in PHPCrawlerProcessCommunication.class.php
PHPCrawlerMemoryURLCache.class.php
PHPCrawlerMemoryURLCache.class.php in PHPCrawlerMemoryURLCache.class.php
PHPCrawlerSQLiteURLCache.class.php
PHPCrawlerSQLiteURLCache.class.php in PHPCrawlerSQLiteURLCache.class.php
PHPCrawlerURLCacheBase.class.php
PHPCrawlerURLCacheBase.class.php in PHPCrawlerURLCacheBase.class.php
parseRobotsTxt
PHPCrawlerRobotsTxtParser::parseRobotsTxt() in PHPCrawlerRobotsTxtParser.class.php
Parses the robots.txt-file related to the given URL and returns regular-expression-rules corresponding to the "disallow"-rules it contains that are addressed to the given user-agent.
PHPCrawler
PHPCrawler in PHPCrawler.class.php
PHPCrawl mainclass
PHPCrawlerBenchmark
PHPCrawlerBenchmark in PHPCrawlerBenchmark.class.php
A static benchmark-class for doing benchmarks within phpcrawl.
PHPCrawlerCookieCacheBase
PHPCrawlerCookieCacheBase in PHPCrawlerCookieCacheBase.class.php
Abstract baseclass for storing cookies.
PHPCrawlerCookieDescriptor
PHPCrawlerCookieDescriptor in PHPCrawlerCookieDescriptor.class.php
Describes a cookie within the PHPCrawl-system.
PHPCrawlerDNSCache
PHPCrawlerDNSCache in PHPCrawlerDNSCache.class.php
Simple DNS-cache used by phpcrawl.
PHPCrawlerDocumentInfo
PHPCrawlerDocumentInfo in PHPCrawlerDocumentInfo.class.php
Contains information about a page or file the crawler found and received during the crawling-process.
PHPCrawlerDocumentInfoQueue
PHPCrawlerDocumentInfoQueue in PHPCrawlerDocumentInfoQueue.class.php
Queue for PHPCrawlerDocumentInfo-objects
PHPCrawlerHTTPRequest
PHPCrawlerHTTPRequest in PHPCrawlerHTTPRequest.class.php
Class for performing HTTP-requests.
PHPCrawlerLinkFinder
PHPCrawlerLinkFinder in PHPCrawlerLinkFinder.class.php
Class for finding links in HTML-documents.
PHPCrawlerMemoryCookieCache
PHPCrawlerMemoryCookieCache in PHPCrawlerMemoryCookieCache.class.php
Class for storing/caching cookies in memory.
PHPCrawlerMemoryURLCache
PHPCrawlerMemoryURLCache in PHPCrawlerMemoryURLCache.class.php
Class for caching/storing URLs/links in memory.
PHPCrawlerProcessCommunication
PHPCrawlerProcessCommunication in PHPCrawlerProcessCommunication.class.php
Class containing methods for process handling and communication
PHPCrawlerProcessReport
PHPCrawlerProcessReport in PHPCrawlerProcessReport.class.php
Contains summarizing information about a crawling-process after the process is finished.
PHPCrawlerResponseHeader
PHPCrawlerResponseHeader in PHPCrawlerResponseHeader.class.php
Describes an HTTP response-header within the phpcrawl-system.
PHPCrawlerRobotsTxtParser
PHPCrawlerRobotsTxtParser in PHPCrawlerRobotsTxtParser.class.php
Class for parsing robots.txt-files.
PHPCrawlerSQLiteCookieCache
PHPCrawlerSQLiteCookieCache in PHPCrawlerSQLiteCookieCache.class.php
Class for storing/caching cookies in a SQLite-db-file.
PHPCrawlerSQLiteURLCache
PHPCrawlerSQLiteURLCache in PHPCrawlerSQLiteURLCache.class.php
Class for caching/storing URLs/links in a SQLite-database-file.
PHPCrawlerStatus
PHPCrawlerStatus in PHPCrawlerStatus.class.php
Describes the current status of a crawler-instance.
PHPCrawlerURLCacheBase
PHPCrawlerURLCacheBase in PHPCrawlerURLCacheBase.class.php
Abstract baseclass for implemented URL-caching classes.
PHPCrawlerURLDescriptor
PHPCrawlerURLDescriptor in PHPCrawlerURLDescriptor.class.php
Describes a URL within the PHPCrawl-system.
PHPCrawlerURLFilter
PHPCrawlerURLFilter in PHPCrawlerURLFilter.class.php
Class for filtering URLs by given filter-rules.
PHPCrawlerUrlPartsDescriptor
PHPCrawlerUrlPartsDescriptor in PHPCrawlerUrlPartsDescriptor.class.php
Describes the single parts of a URL.
PHPCrawlerUserSendDataCache
PHPCrawlerUserSendDataCache in PHPCrawlerUserSendDataCache.class.php
Cache for storing user-data to send with requests, like cookies, post-data and basic-authentications.
PHPCrawlerUtils
PHPCrawlerUtils in PHPCrawlerUtils.class.php
Static util-methods used by phpcrawl.
prepareHTTPRequestQuery
PHPCrawlerHTTPRequest::prepareHTTPRequestQuery() in PHPCrawlerHTTPRequest.class.php
Prepares the given HTTP-query-string for the HTTP-request.
printAllBenchmarks
PHPCrawlerBenchmark::printAllBenchmarks() in PHPCrawlerBenchmark.class.php
processHTTPHeader
PHPCrawlerLinkFinder::processHTTPHeader() in PHPCrawlerLinkFinder.class.php
Processes the response-header of the document.
processRobotsTxt
PHPCrawler::processRobotsTxt() in PHPCrawler.class.php
processUrl
PHPCrawler::processUrl() in PHPCrawler.class.php
Receives and processes the given URL
purgeCache
PHPCrawlerMemoryURLCache::purgeCache() in PHPCrawlerMemoryURLCache.class.php
Has no function in this class.
purgeCache
PHPCrawlerSQLiteURLCache::purgeCache() in PHPCrawlerSQLiteURLCache.class.php
Cleans/purges the URL-cache from inconsistent entries.
purgeCache
PHPCrawlerURLCacheBase::purgeCache() in PHPCrawlerURLCacheBase.class.php
Cleans/purges the URL-cache from inconsistent entries.
q
top
$query
PHPCrawlerDocumentInfo::$query in PHPCrawlerDocumentInfo.class.php
The query-part of the URL of the requested page or file, e.g. "?x=y".
$queue_max_size
PHPCrawlerDocumentInfoQueue::$queue_max_size in PHPCrawlerDocumentInfoQueue.class.php
r
top
$received
PHPCrawlerDocumentInfo::$received in PHPCrawlerDocumentInfo.class.php
Flag indicating whether content was received from the page or file.
$received_completely
PHPCrawlerDocumentInfo::$received_completely in PHPCrawlerDocumentInfo.class.php
Flag indicating whether content was completely received from the page or file.
$received_completly
PHPCrawlerDocumentInfo::$received_completly in PHPCrawlerDocumentInfo.class.php
Alias for received_completely, was spelled wrong in previous versions of phpcrawl.
$received_to_file
PHPCrawlerDocumentInfo::$received_to_file in PHPCrawlerDocumentInfo.class.php
Will be true if the content was received into a temporary file.
$received_to_memory
PHPCrawlerDocumentInfo::$received_to_memory in PHPCrawlerDocumentInfo.class.php
Will be true if the content was received into local memory.
$receive_content_types
PHPCrawlerHTTPRequest::$receive_content_types in PHPCrawlerHTTPRequest.class.php
Contains all rules defining the content-types that should be received
$receive_to_file_content_types
Contains all rules defining the content-types of pages/files that should be streamed directly to a temporary file (instead of to memory)
$referer_url
PHPCrawlerDocumentInfo::$referer_url in PHPCrawlerDocumentInfo.class.php
The complete URL of the page that contained the link to this document.
$refering_linkcode
PHPCrawlerDocumentInfo::$refering_linkcode in PHPCrawlerDocumentInfo.class.php
The html-sourcecode that contained the link to the current document.
$refering_linktext
PHPCrawlerDocumentInfo::$refering_linktext in PHPCrawlerDocumentInfo.class.php
The linktext of the link that "linked" to this document.
$refering_link_raw
PHPCrawlerDocumentInfo::$refering_link_raw in PHPCrawlerDocumentInfo.class.php
Contains the raw link as it was found in the content of the referring URL. (E.g. "../foo.html")
$refering_url
PHPCrawlerURLDescriptor::$refering_url in PHPCrawlerURLDescriptor.class.php
The URL of the page that contained the link to the URL described here.
$responseHeader
PHPCrawlerDocumentInfo::$responseHeader in PHPCrawlerDocumentInfo.class.php
The complete HTTP-header the webserver sent in response to the request for this page or file, as a PHPCrawlerResponseHeader-object.
$resumtion_enabled
PHPCrawler::$resumtion_enabled in PHPCrawler.class.php
Flag indicating whether resumption is activated
$resumtion_enabled
PHPCrawlerProcessCommunication::$resumtion_enabled in PHPCrawlerProcessCommunication.class.php
Flag indicating whether resumption is activated
$RobotsTxtParser
PHPCrawler::$RobotsTxtParser in PHPCrawler.class.php
The RobotsTxtParser-Object
readResponseContent
PHPCrawlerHTTPRequest::readResponseContent() in PHPCrawlerHTTPRequest.class.php
Reads the response-content.
readResponseHeader
PHPCrawlerHTTPRequest::readResponseHeader() in PHPCrawlerHTTPRequest.class.php
Reads the response-header.
registerChildPID
PHPCrawlerProcessCommunication::registerChildPID() in PHPCrawlerProcessCommunication.class.php
Registers the PID of a child-process
reset
PHPCrawlerBenchmark::reset() in PHPCrawlerBenchmark.class.php
Resets the clock for the given benchmark.
resetAll
PHPCrawlerBenchmark::resetAll() in PHPCrawlerBenchmark.class.php
Resets all clocks for all benchmarks.
resetLinkCache
PHPCrawlerLinkFinder::resetLinkCache() in PHPCrawlerLinkFinder.class.php
Resets/clears the internal link-cache.
resume
PHPCrawler::resume() in PHPCrawler.class.php
Resumes the crawling-process with the given crawler-ID
rmDir
PHPCrawlerUtils::rmDir() in PHPCrawlerUtils.class.php
Deletes a directory recursively
s
top
$socket
PHPCrawlerHTTPRequest::$socket in PHPCrawlerHTTPRequest.class.php
The socket used for HTTP-requests
$socketConnectTimeout
PHPCrawlerHTTPRequest::$socketConnectTimeout in PHPCrawlerHTTPRequest.class.php
Timeout-value for socket-connection
$socketReadTimeout
PHPCrawlerHTTPRequest::$socketReadTimeout in PHPCrawlerHTTPRequest.class.php
Socket-read-timeout
$source
PHPCrawlerDocumentInfo::$source in PHPCrawlerDocumentInfo.class.php
Same as "content", the content of the requested document.
$SourceUrl
PHPCrawlerLinkFinder::$SourceUrl in PHPCrawlerLinkFinder.class.php
The URL of the html-source to find links from
$source_domain
PHPCrawlerCookieDescriptor::$source_domain in PHPCrawlerCookieDescriptor.class.php
The domain the cookie was sent from
$source_url
PHPCrawlerCookieDescriptor::$source_url in PHPCrawlerCookieDescriptor.class.php
The URL the cookie was sent from
$source_url
PHPCrawlerResponseHeader::$source_url in PHPCrawlerResponseHeader.class.php
The URL of the website the header was received from.
$sqlite_db_file
PHPCrawlerSQLiteURLCache::$sqlite_db_file in PHPCrawlerSQLiteURLCache.class.php
$sqlite_db_file
PHPCrawlerSQLiteCookieCache::$sqlite_db_file in PHPCrawlerSQLiteCookieCache.class.php
$sqlite_db_file
PHPCrawlerDocumentInfoQueue::$sqlite_db_file in PHPCrawlerDocumentInfoQueue.class.php
$starting_url
PHPCrawlerURLFilter::$starting_url in PHPCrawlerURLFilter.class.php
The fully qualified and normalized URL the crawling-process was started with.
$starting_url
PHPCrawler::$starting_url in PHPCrawler.class.php
The URL the crawler should start with.
$starting_url_parts
PHPCrawlerURLFilter::$starting_url_parts in PHPCrawlerURLFilter.class.php
The URL-parts of the starting-url.
sendRequest
PHPCrawlerHTTPRequest::sendRequest() in PHPCrawlerHTTPRequest.class.php
Sends the HTTP-request and receives the page/file.
sendRequestHeader
PHPCrawlerHTTPRequest::sendRequestHeader() in PHPCrawlerHTTPRequest.class.php
Send the request-header.
serializeToFile
PHPCrawlerUtils::serializeToFile() in PHPCrawlerUtils.class.php
Serializes data (objects, arrays etc.) and writes it to the given file.
setAggressiveLinkExtraction
Alias for enableAggressiveLinkSearch()
setBaseURL
PHPCrawlerURLFilter::setBaseURL() in PHPCrawlerURLFilter.class.php
Sets the base-URL of the crawling process some rules relate to
setBasicAuthentication
PHPCrawlerHTTPRequest::setBasicAuthentication() in PHPCrawlerHTTPRequest.class.php
Sets basic-authentication login-data for protected URLs.
setConnectionTimeout
PHPCrawler::setConnectionTimeout() in PHPCrawler.class.php
Sets the timeout in seconds for connection attempts to hosting webservers.
setContentSizeLimit
PHPCrawler::setContentSizeLimit() in PHPCrawler.class.php
Sets the content-size-limit for content the crawler should receive from documents.
setContentSizeLimit
PHPCrawlerHTTPRequest::setContentSizeLimit() in PHPCrawlerHTTPRequest.class.php
Sets the size-limit in bytes for content the request should receive.
setCookieHandling
PHPCrawler::setCookieHandling() in PHPCrawler.class.php
Alias for enableCookieHandling()
setCrawlerStatus
PHPCrawlerProcessCommunication::setCrawlerStatus() in PHPCrawlerProcessCommunication.class.php
Sets/writes the current crawler-status
setFindRedirectURLs
PHPCrawlerHTTPRequest::setFindRedirectURLs() in PHPCrawlerHTTPRequest.class.php
Specifies whether redirect-links set in http-headers should get searched for.
setFollowMode
PHPCrawler::setFollowMode() in PHPCrawler.class.php
Sets the basic follow-mode of the crawler.
setFollowRedirects
PHPCrawler::setFollowRedirects() in PHPCrawler.class.php
Defines whether the crawler should follow redirects sent with headers by a webserver or not.
setFollowRedirectsTillContent
Defines whether the crawler should follow HTTP-redirects until first content was found, regardless of defined filter-rules and follow-modes.
setHeaderCheckCallbackFunction
setLinkExtractionTags
PHPCrawlerHTTPRequest::setLinkExtractionTags() in PHPCrawlerHTTPRequest.class.php
Sets the html-tags from which to extract/find links.
setLinkExtractionTags
PHPCrawler::setLinkExtractionTags() in PHPCrawler.class.php
Sets the list of html-tags the crawler should search for links in.
setLinksFoundArray
PHPCrawlerDocumentInfo::setLinksFoundArray() in PHPCrawlerDocumentInfo.class.php
Workaround-method, copies and converts the array $links_found_url_descriptors to $links_found.
setPageLimit
PHPCrawler::setPageLimit() in PHPCrawler.class.php
Sets a limit to the number of pages/files the crawler should follow.
setPort
PHPCrawler::setPort() in PHPCrawler.class.php
Sets the port to connect to for crawling the starting-url set in setUrl().
setProxy
PHPCrawler::setProxy() in PHPCrawler.class.php
Assigns a proxy-server the crawler should use for all HTTP-Requests.
setProxy
PHPCrawlerHTTPRequest::setProxy() in PHPCrawlerHTTPRequest.class.php
setSourceUrl
PHPCrawlerLinkFinder::setSourceUrl() in PHPCrawlerLinkFinder.class.php
Sets the source-URL of the document to find links in
setStreamTimeout
PHPCrawler::setStreamTimeout() in PHPCrawler.class.php
Sets the timeout in seconds for waiting for data on an established server-connection.
setTmpFile
PHPCrawler::setTmpFile() in PHPCrawler.class.php
Has no function anymore.
setTmpFile
PHPCrawlerHTTPRequest::setTmpFile() in PHPCrawlerHTTPRequest.class.php
Sets the temporary file to use when content of found documents should be streamed directly into a temporary file.
setTrafficLimit
PHPCrawler::setTrafficLimit() in PHPCrawler.class.php
Sets a limit to the number of bytes the crawler should receive altogether during the crawling-process.
setUrl
PHPCrawlerHTTPRequest::setUrl() in PHPCrawlerHTTPRequest.class.php
Sets the URL for the request.
setURL
PHPCrawler::setURL() in PHPCrawler.class.php
Sets the URL of the first page the crawler should crawl (root-page).
setUrlCacheType
PHPCrawler::setUrlCacheType() in PHPCrawler.class.php
Defines what type of cache will be internally used for caching URLs.
setUserAgentString
PHPCrawler::setUserAgentString() in PHPCrawler.class.php
Sets the "User-Agent" identification-string that will be sent with HTTP-requests.
setWorkingDirectory
PHPCrawler::setWorkingDirectory() in PHPCrawler.class.php
Sets the working-directory the crawler should use for storing temporary data.
SMCCrawler
SMCCrawler in SitemapCreatorCrawler.class.php
Loading external PHPCrawler-class
sort2dArray
PHPCrawlerUtils::sort2dArray() in PHPCrawlerUtils.class.php
Sorts a two-dimensional array.
splitURL
PHPCrawlerUtils::splitURL() in PHPCrawlerUtils.class.php
Splits a URL into its parts
starControllerProcessLoop
Starts the loop of the controller-process (main-process).
start
PHPCrawlerBenchmark::start() in PHPCrawlerBenchmark.class.php
Starts the clock for the given benchmark.
startChildProcessLoop
PHPCrawler::startChildProcessLoop() in PHPCrawler.class.php
Starts the loop of a child-process.
stop
PHPCrawlerBenchmark::stop() in PHPCrawlerBenchmark.class.php
Stops the benchmark-clock for the given benchmark.
t
top
$temporary_benchmarks
PHPCrawlerBenchmark::$temporary_benchmarks in PHPCrawlerBenchmark.class.php
$tmpFile
PHPCrawlerHTTPRequest::$tmpFile in PHPCrawlerHTTPRequest.class.php
The TMP-File to use when a page/file should be streamed to file.
$top_lines_processed
PHPCrawlerLinkFinder::$top_lines_processed in PHPCrawlerLinkFinder.class.php
Flag indicating whether the top lines of the HTML-source were processed.
$traffic_limit
PHPCrawler::$traffic_limit in PHPCrawler.class.php
Limit of bytes to receive
$traffic_limit_reached
PHPCrawlerProcessReport::$traffic_limit_reached in PHPCrawlerProcessReport.class.php
Will be TRUE if the crawling-process stopped because the traffic-limit was reached.
$traffic_limit_reached
PHPCrawlerDocumentInfo::$traffic_limit_reached in PHPCrawlerDocumentInfo.class.php
Indicates whether the traffic-limit set by the user was reached after downloading this document.
toArray
PHPCrawlerUrlPartsDescriptor::toArray() in PHPCrawlerUrlPartsDescriptor.class.php
toArray
PHPCrawlerProcessReport::toArray() in PHPCrawlerProcessReport.class.php
Returns an array with all properties of this class.
toArray
PHPCrawlerDocumentInfo::toArray() in PHPCrawlerDocumentInfo.class.php
Returns an array with all properties of this class.
u
top
$url
PHPCrawlerDocumentInfo::$url in PHPCrawlerDocumentInfo.class.php
The complete, fully qualified URL of the page or file, e.g. "http://www.foo.com/bar/page.html?x=y".
$urlcache_purged
PHPCrawler::$urlcache_purged in PHPCrawler.class.php
Flag indicating whether the URL-cache was purged at the beginning of a crawling-process
$UrlDescriptor
PHPCrawlerHTTPRequest::$UrlDescriptor in PHPCrawlerHTTPRequest.class.php
The URL for the request as PHPCrawlerURLDescriptor-object
$UrlFilter
PHPCrawler::$UrlFilter in PHPCrawler.class.php
The UrlFilter-Object
$urls
PHPCrawlerMemoryURLCache::$urls in PHPCrawlerMemoryURLCache.class.php
$url_cache_type
PHPCrawler::$url_cache_type in PHPCrawler.class.php
URL cache-type.
$url_distinct_property
PHPCrawlerURLCacheBase::$url_distinct_property in PHPCrawlerURLCacheBase.class.php
Defines which property of a URL is used to ensure that each URL is only cached once.
$url_filter_rules
PHPCrawlerURLFilter::$url_filter_rules in PHPCrawlerURLFilter.class.php
Array containing regex-rules for URLs that should NOT be followed.
$url_follow_rules
PHPCrawlerURLFilter::$url_follow_rules in PHPCrawlerURLFilter.class.php
Array containing regex-rules for URLs that should be followed.
$url_map
PHPCrawlerMemoryURLCache::$url_map in PHPCrawlerMemoryURLCache.class.php
$url_parts
PHPCrawlerHTTPRequest::$url_parts in PHPCrawlerHTTPRequest.class.php
The parts of the URL for the request as returned by PHPCrawlerUtils::splitURL()
$url_priorities
PHPCrawlerURLCacheBase::$url_priorities in PHPCrawlerURLCacheBase.class.php
$url_rebuild
PHPCrawlerURLDescriptor::$url_rebuild in PHPCrawlerURLDescriptor.class.php
The complete, fully qualified and normalized URL
$userAgentString
PHPCrawlerHTTPRequest::$userAgentString in PHPCrawlerHTTPRequest.class.php
The user-agent-string
$UserSendDataCache
PHPCrawler::$UserSendDataCache in PHPCrawler.class.php
UserSendDataCache-object.
$user_abort
PHPCrawlerProcessReport::$user_abort in PHPCrawlerProcessReport.class.php
Will be TRUE if the crawling-process stopped because the overridable function handleDocumentInfo() returned a negative value.
updateCrawlerStatus
PHPCrawlerProcessCommunication::updateCrawlerStatus() in PHPCrawlerProcessCommunication.class.php
Updates the status of the crawler
URLHASH_NONE
PHPCrawlerURLCacheBase::URLHASH_NONE in PHPCrawlerURLCacheBase.class.php
URLHASH_RAWLINK
PHPCrawlerURLCacheBase::URLHASH_RAWLINK in PHPCrawlerURLCacheBase.class.php
URLHASH_URL
PHPCrawlerURLCacheBase::URLHASH_URL in PHPCrawlerURLCacheBase.class.php
urlHostInCache
PHPCrawlerDNSCache::urlHostInCache() in PHPCrawlerDNSCache.class.php
Checks whether the hostname of the given URL is already cached
urlMatchesRules
PHPCrawlerURLFilter::urlMatchesRules() in PHPCrawlerURLFilter.class.php
Checks whether a given URL matches the rules.
v
top
$value
PHPCrawlerCookieDescriptor::$value in PHPCrawlerCookieDescriptor.class.php
Cookie-value
w
top
$working_base_directory
PHPCrawler::$working_base_directory in PHPCrawler.class.php
Base-directory for temporary directories
$working_directory
PHPCrawlerProcessCommunication::$working_directory in PHPCrawlerProcessCommunication.class.php
$working_directory
PHPCrawler::$working_directory in PHPCrawler.class.php
Complete path to the temporary directory
$working_directory
PHPCrawlerDocumentInfoQueue::$working_directory in PHPCrawlerDocumentInfoQueue.class.php