Full index

a
$abort_reason
PHPCrawlerStatus::$abort_reason in PHPCrawlerStatus.class.php
Reason for aborting the crawling-process.
$abort_reason
PHPCrawlerProcessReport::$abort_reason in PHPCrawlerProcessReport.class.php
Reason why the crawling-process was aborted.
$aggressive_search
PHPCrawlerLinkFinder::$aggressive_search in PHPCrawlerLinkFinder.class.php
Specifies whether links will also be searched for outside of HTML-tags.
$auth_password
PHPCrawlerUrlPartsDescriptor::$auth_password in PHPCrawlerUrlPartsDescriptor.class.php
$auth_username
PHPCrawlerUrlPartsDescriptor::$auth_username in PHPCrawlerUrlPartsDescriptor.class.php
ABORTREASON_FILELIMIT_REACHED
PHPCrawlerAbortReasons::ABORTREASON_FILELIMIT_REACHED in PHPCrawlerAbortReasons.class.php
Crawling-process aborted because the file-limit set by the user was reached.
ABORTREASON_PASSEDTHROUGH
PHPCrawlerAbortReasons::ABORTREASON_PASSEDTHROUGH in PHPCrawlerAbortReasons.class.php
Crawling-process aborted because everything is done (all URLs were passed through).
ABORTREASON_TRAFFICLIMIT_REACHED
PHPCrawlerAbortReasons::ABORTREASON_TRAFFICLIMIT_REACHED in PHPCrawlerAbortReasons.class.php
Crawling-process aborted because the traffic-limit set by the user was reached.
ABORTREASON_USERABORT
PHPCrawlerAbortReasons::ABORTREASON_USERABORT in PHPCrawlerAbortReasons.class.php
Crawling-process aborted because the handleDocumentInfo-method returned a negative value
addBasicAuthentication
PHPCrawler::addBasicAuthentication() in PHPCrawler.class.php
Adds a basic-authentication (username and password) to the list of basic authentications that will be sent with requests.
addBasicAuthentication
PHPCrawlerUserSendDataCache::addBasicAuthentication() in PHPCrawlerUserSendDataCache.class.php
Adds a basic-authentication (username and password) to the list of authentications that will be sent with requests.
addContentTypeReceiveRule
PHPCrawler::addContentTypeReceiveRule() in PHPCrawler.class.php
Adds a rule to the list of rules that decide which pages or files (regarding their content-type) should be received.
addCookie
PHPCrawlerMemoryCookieCache::addCookie() in PHPCrawlerMemoryCookieCache.class.php
Adds a cookie to the cookie-cache.
addCookie
PHPCrawlerCookieCacheBase::addCookie() in PHPCrawlerCookieCacheBase.class.php
Adds a cookie to the cookie-cache.
addCookie
PHPCrawlerSQLiteCookieCache::addCookie() in PHPCrawlerSQLiteCookieCache.class.php
Adds a cookie to the cookie-cache.
addCookie
PHPCrawlerHTTPRequest::addCookie() in PHPCrawlerHTTPRequest.class.php
Adds a cookie to send with the request.
addCookieDescriptor
PHPCrawlerHTTPRequest::addCookieDescriptor() in PHPCrawlerHTTPRequest.class.php
Adds a cookie to send with the request.
addCookieDescriptors
PHPCrawlerHTTPRequest::addCookieDescriptors() in PHPCrawlerHTTPRequest.class.php
Adds a bunch of cookies to send with the request
addCookies
PHPCrawlerSQLiteCookieCache::addCookies() in PHPCrawlerSQLiteCookieCache.class.php
Adds a bunch of cookies to the cookie-cache.
addCookies
PHPCrawlerCookieCacheBase::addCookies() in PHPCrawlerCookieCacheBase.class.php
Adds a bunch of cookies to the cookie-cache.
addCookies
PHPCrawlerMemoryCookieCache::addCookies() in PHPCrawlerMemoryCookieCache.class.php
Adds a bunch of cookies to the cookie-cache.
addDocumentInfo
PHPCrawlerDocumentInfoQueue::addDocumentInfo() in PHPCrawlerDocumentInfoQueue.class.php
Adds a PHPCrawlerDocumentInfo-object to the queue
addEngine
SitemapCreator::addEngine() in SitemapCreator.class.php
Adds a ping-URL to the array $engines.
addEntry
SitemapCreator::addEntry() in SitemapCreator.class.php
Manually adds a URL entry to $entries.
addFollowMatch
PHPCrawler::addFollowMatch() in PHPCrawler.class.php
Alias for addURLFollowRule().
addLinkExtractionTags
PHPCrawler::addLinkExtractionTags() in PHPCrawler.class.php
Sets the list of HTML-tags from which links should be extracted.
addLinkPriorities
PHPCrawlerURLCacheBase::addLinkPriorities() in PHPCrawlerURLCacheBase.class.php
Adds a bunch of link-priorities
addLinkPriority
PHPCrawlerURLCacheBase::addLinkPriority() in PHPCrawlerURLCacheBase.class.php
Adds a Link-Priority-Level
addLinkPriority
PHPCrawler::addLinkPriority() in PHPCrawler.class.php
Adds a regular expression together with a priority-level to the list of rules that decide which links should be preferred.
addLinkSearchContentType
PHPCrawlerHTTPRequest::addLinkSearchContentType() in PHPCrawlerHTTPRequest.class.php
Adds a rule to the list of rules that decide which kinds of documents should be checked for links (regarding their content-type).
addLinkSearchContentType
PHPCrawler::addLinkSearchContentType() in PHPCrawler.class.php
Adds a rule to the list of rules that decide in which kinds of documents the crawler should search for links (regarding their content-type).
addLinkToCache
PHPCrawlerLinkFinder::addLinkToCache() in PHPCrawlerLinkFinder.class.php
addNonFollowMatch
PHPCrawler::addNonFollowMatch() in PHPCrawler.class.php
Alias for addURLFilterRule().
addPostData
PHPCrawlerHTTPRequest::addPostData() in PHPCrawlerHTTPRequest.class.php
Adds post-data to send with the request.
addPostData
PHPCrawler::addPostData() in PHPCrawler.class.php
Adds post-data together with a URL-rule to the list of post-data to send with requests.
addPostData
PHPCrawlerUserSendDataCache::addPostData() in PHPCrawlerUserSendDataCache.class.php
Adds post-data together with a URL-regex to the list of post-data to send with requests.
addReceiveContentType
PHPCrawler::addReceiveContentType() in PHPCrawler.class.php
Alias for addContentTypeReceiveRule().
addReceiveContentType
PHPCrawlerHTTPRequest::addReceiveContentType() in PHPCrawlerHTTPRequest.class.php
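Adds a rule to the list of rules that decide which pages or files (regarding their content-type) should be received.
addReceiveToMemoryMatch
PHPCrawler::addReceiveToMemoryMatch() in PHPCrawler.class.php
Has no function anymore!
addReceiveToTmpFileMatch
PHPCrawler::addReceiveToTmpFileMatch() in PHPCrawler.class.php
Alias for addStreamToFileContentType().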
addSitemapEXT
SitemapCreator::addSitemapEXT() in SitemapCreator.class.php
Adds the gzip-extension to the sitemap filename if $use_gzip is enabled.
addStreamToFileContentType
PHPCrawlerHTTPRequest::addStreamToFileContentType() in PHPCrawlerHTTPRequest.class.php
Adds a rule to the list of rules that decide what types of content should be streamed directly to the temporary file.
addStreamToFileContentType
PHPCrawler::addStreamToFileContentType() in PHPCrawler.class.php
Adds a rule to the list of rules that decide what types of content should be streamed directly to a temporary file.
addToRobots
SitemapCreator::addToRobots() in SitemapCreator.class.php
Adds the sitemap-index URL to the robots.txt file.
addURL
PHPCrawlerURLCacheBase::addURL() in PHPCrawlerURLCacheBase.class.php
Adds a URL to the url-cache
addURL
PHPCrawlerSQLiteURLCache::addURL() in PHPCrawlerSQLiteURLCache.class.php
Adds a URL to the url-cache
addURL
PHPCrawlerMemoryURLCache::addURL() in PHPCrawlerMemoryURLCache.class.php
Adds a URL to the url-cache
addURLFilterRule
PHPCrawlerURLFilter::addURLFilterRule() in PHPCrawlerURLFilter.class.php
Adds a rule to the list of rules that decide which URLs found on a page should be ignored by the crawler.
addURLFilterRule
PHPCrawler::addURLFilterRule() in PHPCrawler.class.php
Adds a rule to the list of rules that decide which URLs found on a page should be ignored by the crawler.
addURLFilterRules
PHPCrawlerURLFilter::addURLFilterRules() in PHPCrawlerURLFilter.class.php
Adds a bunch of rules to the list of rules that decide which URLs found on a page should be ignored by the crawler.
addURLFollowRule
PHPCrawlerURLFilter::addURLFollowRule() in PHPCrawlerURLFilter.class.php
addURLFollowRule
PHPCrawler::addURLFollowRule() in PHPCrawler.class.php
Adds a rule to the list of rules that decide which URLs found on a page should be followed explicitly.
addURLs
PHPCrawlerURLCacheBase::addURLs() in PHPCrawlerURLCacheBase.class.php
Adds a bunch of URLs to the url-cache
addURLs
PHPCrawlerMemoryURLCache::addURLs() in PHPCrawlerMemoryURLCache.class.php
Adds a bunch of URLs to the url-cache
addURLs
PHPCrawlerSQLiteURLCache::addURLs() in PHPCrawlerSQLiteURLCache.class.php
Adds a bunch of URLs to the url-cache
addURL_Entry
SMCCrawler::addURL_Entry() in SitemapCreatorCrawler.class.php
Adds a URL entry to $entries.
addXMLURLSet
SitemapCreator::addXMLURLSet() in SitemapCreator.class.php
Adds a single piece of XML code to $xml_url_set.
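The addLinkPriority, addURL, containsURLs and getNextUrl entries describe a URL-cache in which links matching a priority-rule are handed out first. The following is a minimal Python sketch of that idea, not PHPCrawl's PHP implementation. The class and method names are hypothetical, and the sketch assumes that the first matching rule wins, that URLs matching no rule get level 0, and that URLs within one level are served first-in, first-out.

```python
import re
from collections import deque


class PriorityURLCache:
    """Hypothetical sketch: URLs are bucketed by the level of the first matching rule."""

    def __init__(self):
        self.rules = []      # list of (compiled regex, priority level)
        self.buckets = {}    # priority level -> deque of URLs (FIFO within a level)
        self.seen = set()    # every URL ever added, so no URL is cached twice

    def add_link_priority(self, pattern, level):
        self.rules.append((re.compile(pattern, re.IGNORECASE), level))

    def url_priority(self, url):
        # First matching rule wins; URLs matching no rule get level 0 (assumption).
        for regex, level in self.rules:
            if regex.search(url):
                return level
        return 0

    def add_url(self, url):
        if url in self.seen:
            return False
        self.seen.add(url)
        self.buckets.setdefault(self.url_priority(url), deque()).append(url)
        return True

    def contains_urls(self):
        # Empty deques are falsy, so this is True only if some URL is still waiting.
        return any(self.buckets.values())

    def get_next_url(self):
        # Serve the highest non-empty priority level first.
        for level in sorted(self.buckets, reverse=True):
            if self.buckets[level]:
                return self.buckets[level].popleft()
        return None
```

For example, after add_link_priority(r"/news/", 10), a URL under /news/ is returned by get_next_url() before any URL that was added earlier but matches no rule.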
b
$baseUrlParts
PHPCrawlerLinkFinder::$baseUrlParts in PHPCrawlerLinkFinder.class.php
Parts of the base-url as PHPCrawlerUrlPartsDescriptor-object
$basic_authentications
PHPCrawlerUserSendDataCache::$basic_authentications in PHPCrawlerUserSendDataCache.class.php
Array containing basic-authentications to send.
$benchmarks
PHPCrawlerDocumentInfo::$benchmarks in PHPCrawlerDocumentInfo.class.php
Some internal benchmark-results as array.
$benchmark_results
PHPCrawlerBenchmark::$benchmark_results in PHPCrawlerBenchmark.class.php
$benchmark_startcount
PHPCrawlerBenchmark::$benchmark_startcount in PHPCrawlerBenchmark.class.php
$benchmark_starttimes
PHPCrawlerBenchmark::$benchmark_starttimes in PHPCrawlerBenchmark.class.php
$bytes_received
PHPCrawlerDocumentInfo::$bytes_received in PHPCrawlerDocumentInfo.class.php
The number of bytes the crawler received of the content of the document.
$bytes_received
PHPCrawlerProcessReport::$bytes_received in PHPCrawlerProcessReport.class.php
The total number of bytes the crawler received altogether.
$bytes_received
PHPCrawlerStatus::$bytes_received in PHPCrawlerStatus.class.php
Number of bytes the crawler-instance received so far
buildCookieHeader
PHPCrawlerHTTPRequest::buildCookieHeader() in PHPCrawlerHTTPRequest.class.php
Builds the cookie-header-part for the header to send.
buildPostContent
PHPCrawlerHTTPRequest::buildPostContent() in PHPCrawlerHTTPRequest.class.php
Builds the post-content from the postdata-array for the header to send with the request (MIME-style)
buildRegExpressions
PHPCrawlerRobotsTxtParser::buildRegExpressions() in PHPCrawlerRobotsTxtParser.class.php
Returns an array containing regular-expressions corresponding to the given robots.txt-style "Disallow"-lines
buildRequestHeader
PHPCrawlerHTTPRequest::buildRequestHeader() in PHPCrawlerHTTPRequest.class.php
Builds the request-header from the given settings.
buildURLFromLink
PHPCrawlerUtils::buildURLFromLink() in PHPCrawlerUtils.class.php
Reconstructs a fully qualified and normalized URL from a given link, relative to the URL of the page the link was found in.
buildURLFromParts
PHPCrawlerUtils::buildURLFromParts() in PHPCrawlerUtils.class.php
Builds a URL from its single parts.
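buildURLFromLink resolves a link against the URL of the page it was found in and normalizes the result. Below is a rough Python sketch that uses the standard urllib.parse module. The function name is hypothetical, and the normalization steps (lowercasing scheme and host, dropping default ports and fragments, using "/" for an empty path) are assumptions that may differ from what PHPCrawl actually does.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Ports that can be dropped because they are the scheme's default (assumption).
DEFAULT_PORTS = {"http": 80, "https": 443}


def build_url_from_link(link, source_url):
    """Resolve `link` against the URL of the page it was found in, then normalize it."""
    parts = urlsplit(urljoin(source_url, link.strip()))
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is returned lowercased
    if ":" in host:  # IPv6 literal: restore the brackets that .hostname strips
        host = f"[{host}]"
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    if parts.username is not None:  # keep credentials such as user:pass@
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = f"{userinfo}@{netloc}"
    path = parts.path or "/"  # an empty path means the root document
    # The fragment ("#...") is dropped: it never changes the requested document.
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

With this sketch, the link "../img/logo.gif" found on "http://WWW.Example.com:80/docs/page.html" becomes "http://www.example.com/img/logo.gif".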
c
$child_process_number
PHPCrawler::$child_process_number in PHPCrawler.class.php
Number of child-process (NOT the PID!)
$classpath
SitemapCreator::$classpath in SitemapCreator.class.php
$class_version
PHPCrawler::$class_version in PHPCrawler.class.php
$class_version
SitemapCreator::$class_version in SitemapCreator.class.php
$content
PHPCrawlerDocumentInfo::$content in PHPCrawlerDocumentInfo.class.php
The content of the requested document (html-sourcecode or content of file).
$content_length
PHPCrawlerResponseHeader::$content_length in PHPCrawlerResponseHeader.class.php
The content-length as stated in the header.
$content_size_limit
PHPCrawlerHTTPRequest::$content_size_limit in PHPCrawlerHTTPRequest.class.php
Limit for content-size to receive
$content_tmp_file
PHPCrawlerDocumentInfo::$content_tmp_file in PHPCrawlerDocumentInfo.class.php
The temporary file to which the content was received.
$content_type
PHPCrawlerDocumentInfo::$content_type in PHPCrawlerDocumentInfo.class.php
The content-type of the page or file, e.g. "text/html" or "image/gif".
$content_type
PHPCrawlerResponseHeader::$content_type in PHPCrawlerResponseHeader.class.php
The content-type
$CookieCache
PHPCrawler::$CookieCache in PHPCrawler.class.php
The PHPCrawlerCookieCache-Object
$cookies
PHPCrawlerMemoryCookieCache::$cookies in PHPCrawlerMemoryCookieCache.class.php
$cookies
PHPCrawlerResponseHeader::$cookies in PHPCrawlerResponseHeader.class.php
All cookies found in the header
$cookies
PHPCrawlerDocumentInfo::$cookies in PHPCrawlerDocumentInfo.class.php
Cookies sent by the server.
$cookie_array
PHPCrawlerHTTPRequest::$cookie_array in PHPCrawlerHTTPRequest.class.php
Array containing cookies to send with the request
$cookie_handling_enabled
PHPCrawler::$cookie_handling_enabled in PHPCrawler.class.php
Flag indicating whether cookie-handling is enabled or disabled
$cookie_send_time
PHPCrawlerCookieDescriptor::$cookie_send_time in PHPCrawlerCookieDescriptor.class.php
The time the cookie was sent
$Crawler
SitemapCreator::$Crawler in SitemapCreator.class.php
$crawlerStatus
PHPCrawlerProcessCommunication::$crawlerStatus in PHPCrawlerProcessCommunication.class.php
$crawler_reports
SitemapCreator::$crawler_reports in SitemapCreator.class.php
$crawler_uniqid
PHPCrawlerProcessCommunication::$crawler_uniqid in PHPCrawlerProcessCommunication.class.php
$crawler_uniqid
PHPCrawler::$crawler_uniqid in PHPCrawler.class.php
UID of this instance of the crawler
$CurrentDocumentInfo
PHPCrawlerURLFilter::$CurrentDocumentInfo in PHPCrawlerURLFilter.class.php
PHPCrawlerDocumentInfo-object of the current document
calcFrequency
SitemapCreator::calcFrequency() in SitemapCreator.class.php
Calculates the frequency for each entry ($priority_mode).
calcPriority
SitemapCreator::calcPriority() in SitemapCreator.class.php
Calculates the priority for each entry ($priority_mode).
checkForAbort
PHPCrawler::checkForAbort() in PHPCrawler.class.php
Checks if the crawling-process should be aborted.
checkRegexPattern
PHPCrawlerUtils::checkRegexPattern() in PHPCrawlerUtils.class.php
Checks whether a given RegEx-pattern is valid or not.
checkStringAgainstRegexArray
PHPCrawlerUtils::checkStringAgainstRegexArray() in PHPCrawlerUtils.class.php
Checks whether a given string matches one of the given regular-expressions.
childProcessAlive
PHPCrawlerProcessCommunication::childProcessAlive() in PHPCrawlerProcessCommunication.class.php
Checks whether any child-processes are (still) running.
cleanup
PHPCrawlerURLCacheBase::cleanup() in PHPCrawlerURLCacheBase.class.php
Do cleanups after the cache is not needed anymore
cleanup
PHPCrawler::cleanup() in PHPCrawler.class.php
Cleans up the crawler after it has finished.
cleanup
PHPCrawlerMemoryURLCache::cleanup() in PHPCrawlerMemoryURLCache.class.php
Has no function in this class.
cleanup
PHPCrawlerSQLiteURLCache::cleanup() in PHPCrawlerSQLiteURLCache.class.php
Cleans up the cache after it is not needed anymore.
clear
PHPCrawlerURLCacheBase::clear() in PHPCrawlerURLCacheBase.class.php
Removes all URLs and all priority-rules from the URL-cache.
clear
PHPCrawlerSQLiteURLCache::clear() in PHPCrawlerSQLiteURLCache.class.php
Removes all URLs and all priority-rules from the URL-cache.
clear
PHPCrawlerMemoryURLCache::clear() in PHPCrawlerMemoryURLCache.class.php
Removes all URLs and all priority-rules from the URL-cache.
clearCookies
PHPCrawlerHTTPRequest::clearCookies() in PHPCrawlerHTTPRequest.class.php
Removes all cookies to send with the request.
clearPostData
PHPCrawlerHTTPRequest::clearPostData() in PHPCrawlerHTTPRequest.class.php
Removes all post-data to send with the request.
containsURLs
PHPCrawlerURLCacheBase::containsURLs() in PHPCrawlerURLCacheBase.class.php
Checks whether there are URLs left in the cache or not.
containsURLs
PHPCrawlerSQLiteURLCache::containsURLs() in PHPCrawlerSQLiteURLCache.class.php
Checks whether there are URLs left in the cache that should be processed or not.
containsURLs
PHPCrawlerMemoryURLCache::containsURLs() in PHPCrawlerMemoryURLCache.class.php
Checks whether there are URLs left in the cache or not.
Crawl
SitemapCreator::Crawl() in SitemapCreator.class.php
Start the crawl process
createPreparedInsertStatement
PHPCrawlerSQLiteURLCache::createPreparedInsertStatement() in PHPCrawlerSQLiteURLCache.class.php
Creates the prepared statement for inserting URLs into the database (if not done yet)
createPreparedStatements
PHPCrawlerDocumentInfoQueue::createPreparedStatements() in PHPCrawlerDocumentInfoQueue.class.php
CreateSitemaps
SitemapCreator::CreateSitemaps() in SitemapCreator.class.php
Creates the sitemap files and the sitemap index.
createWorkingDirectory
PHPCrawler::createWorkingDirectory() in PHPCrawler.class.php
Creates the working-directory for this instance of the crawler.
csvFile
SitemapCreator::csvFile() in SitemapCreator.class.php
Gets the CSV file path.
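checkRegexPattern and checkStringAgainstRegexArray can be sketched in Python with the re module. This is an illustration only: PHPCrawl works with PHP's PCRE patterns, which include delimiters (for example #\.jpg$# i), whereas the sketch below assumes bare Python patterns, and the function names are hypothetical.

```python
import re


def check_regex_pattern(pattern):
    """Return True if `pattern` compiles as a (Python) regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def check_string_against_regex_array(string, patterns):
    """Return True if `string` matches at least one of the given patterns."""
    return any(re.search(pattern, string) for pattern in patterns)
```

Validating each user-supplied rule with check_regex_pattern before storing it means a malformed rule is rejected up front instead of failing later during matching.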
d
$data_dir
SitemapCreator::$data_dir in SitemapCreator.class.php
Data directory Path.
$data_throughput
PHPCrawlerProcessReport::$data_throughput in PHPCrawlerProcessReport.class.php
The average data-throughput in bytes per second.
$data_transfer_rate
PHPCrawlerDocumentInfo::$data_transfer_rate in PHPCrawlerDocumentInfo.class.php
The average data-transfer-rate for this document.
$data_transfer_time
PHPCrawlerHTTPRequest::$data_transfer_time in PHPCrawlerHTTPRequest.class.php
The time it took to receive data-packets for the request.
$data_transfer_time
PHPCrawlerDocumentInfo::$data_transfer_time in PHPCrawlerDocumentInfo.class.php
The time it took to receive the document.
$db_analyzed
PHPCrawlerSQLiteURLCache::$db_analyzed in PHPCrawlerSQLiteURLCache.class.php
$DNSCache
PHPCrawlerHTTPRequest::$DNSCache in PHPCrawlerHTTPRequest.class.php
DNS-cache
$DocumentInfoQueue
PHPCrawler::$DocumentInfoQueue in PHPCrawler.class.php
DocumentInfoQueue-object
$documents_received
PHPCrawlerStatus::$documents_received in PHPCrawlerStatus.class.php
Number of documents the crawler-instance received so far
$document_limit
PHPCrawler::$document_limit in PHPCrawler.class.php
Limit of documents to receive
$domain
PHPCrawlerUrlPartsDescriptor::$domain in PHPCrawlerUrlPartsDescriptor.class.php
$domain
PHPCrawlerCookieDescriptor::$domain in PHPCrawlerCookieDescriptor.class.php
Cookie-domain
decideRecevieContent
PHPCrawlerHTTPRequest::decideRecevieContent() in PHPCrawlerHTTPRequest.class.php
Checks whether the content of this page/file should be received (based on the content-type and the applied rules)
decideStreamToFile
PHPCrawlerHTTPRequest::decideStreamToFile() in PHPCrawlerHTTPRequest.class.php
Checks whether the content of this page/file should be streamed directly to file.
deserializeFromFile
PHPCrawlerUtils::deserializeFromFile() in PHPCrawlerUtils.class.php
Returns deserialized data that is stored in a file.
disableExtendedLinkInfo
PHPCrawler::disableExtendedLinkInfo() in PHPCrawler.class.php
Has no function anymore.
e
$engines
SitemapCreator::$engines in SitemapCreator.class.php
Ping URLs of the search engines sitemaps API
$entries
SitemapCreator::$entries in SitemapCreator.class.php
Array containing the entries of the sitemap.
$entries
SMCCrawler::$entries in SitemapCreatorCrawler.class.php
Array containing the entries.
$entries_per_sitemap
SitemapCreator::$entries_per_sitemap in SitemapCreator.class.php
Maximum number of entries per sitemap file
$error_code
PHPCrawlerDocumentInfo::$error_code in PHPCrawlerDocumentInfo.class.php
The code of the error that may have occurred while requesting/receiving the document.
$error_occured
PHPCrawlerDocumentInfo::$error_occured in PHPCrawlerDocumentInfo.class.php
Indicates whether an error occurred while requesting/receiving the document.
$error_string
PHPCrawlerDocumentInfo::$error_string in PHPCrawlerDocumentInfo.class.php
A human-readable string describing the error that may have occurred while requesting/receiving the document.
$expires
PHPCrawlerCookieDescriptor::$expires in PHPCrawlerCookieDescriptor.class.php
Expire-string, e.g. "Sat, 08-Aug-2020 23:59:08 GMT"
$expire_timestamp
PHPCrawlerCookieDescriptor::$expire_timestamp in PHPCrawlerCookieDescriptor.class.php
Expire-date as unix-timestamp
$extract_tags
PHPCrawlerLinkFinder::$extract_tags in PHPCrawlerLinkFinder.class.php
Numeric array containing all tags to extract links from
enableAggressiveLinkSearch
PHPCrawlerHTTPRequest::enableAggressiveLinkSearch() in PHPCrawlerHTTPRequest.class.php
Enables or disables aggressive link-search.
enableAggressiveLinkSearch
PHPCrawler::enableAggressiveLinkSearch() in PHPCrawler.class.php
Enables or disables aggressive link-searching.
enableCookieHandling
PHPCrawler::enableCookieHandling() in PHPCrawler.class.php
Enables or disables cookie-handling.
enableLastModifiedCount
SMCCrawler::enableLastModifiedCount() in SitemapCreatorCrawler.class.php
Enables or disables the Last-Modified calculation ($LastModifiedCount).
enableResumption
PHPCrawler::enableResumption() in PHPCrawler.class.php
Prepares the crawler for process-resumption.
ERROR_HOST_UNREACHABLE
PHPCrawlerRequestErrors::ERROR_HOST_UNREACHABLE in PHPCrawlerRequestErrors.class.php
Error-Code: Host not reachable
ERROR_NO_HTTP_HEADER
PHPCrawlerRequestErrors::ERROR_NO_HTTP_HEADER in PHPCrawlerRequestErrors.class.php
Error-Code: Host didn't respond with a valid HTTP-header.
ERROR_PROXY_UNREACHABLE
PHPCrawlerRequestErrors::ERROR_PROXY_UNREACHABLE in PHPCrawlerRequestErrors.class.php
Error-Code: Proxy not reachable
ERROR_SOCKET_TIMEOUT
PHPCrawlerRequestErrors::ERROR_SOCKET_TIMEOUT in PHPCrawlerRequestErrors.class.php
Error-Code: Socket timed out while reading data.
ERROR_SSL_NOT_SUPPORTED
PHPCrawlerRequestErrors::ERROR_SSL_NOT_SUPPORTED in PHPCrawlerRequestErrors.class.php
Error-Code: SSL/HTTPS not supported (probably openssl-extension not installed)
ERROR_TMP_FILE_NOT_WRITEABLE
PHPCrawlerRequestErrors::ERROR_TMP_FILE_NOT_WRITEABLE in PHPCrawlerRequestErrors.class.php
Error-Code: Could not write or create TMP-file.
example.php
example.php in example.php
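$expires holds the cookie's expire-string and $expire_timestamp holds the same date as a Unix timestamp. Below is a minimal Python sketch of that conversion. It assumes either the dashed format shown for $expires or the RFC 1123 form, and an English (C) locale for day and month names. The function name is hypothetical.

```python
import calendar
import time


def expire_string_to_timestamp(expires):
    """Convert a cookie expire-string such as "Sat, 08-Aug-2020 23:59:08 GMT"
    into a Unix timestamp (seconds since the epoch, UTC), or None if unparsable."""
    # Accept the dashed cookie format and the RFC 1123 space-separated form.
    for fmt in ("%a, %d-%b-%Y %H:%M:%S GMT", "%a, %d %b %Y %H:%M:%S GMT"):
        try:
            parsed = time.strptime(expires.strip(), fmt)
        except ValueError:
            continue
        # timegm interprets the struct_time as UTC (time.mktime would use local time).
        return calendar.timegm(parsed)
    return None
```

For the example string in the $expires entry, this returns 1596931148.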
f
$file
PHPCrawlerUrlPartsDescriptor::$file in PHPCrawlerUrlPartsDescriptor.class.php
$file
PHPCrawlerDocumentInfo::$file in PHPCrawlerDocumentInfo.class.php
The name of the requested page or file, e.g. "page.html".
$files_received
PHPCrawlerProcessReport::$files_received in PHPCrawlerProcessReport.class.php
The total number of documents the crawler received.
$file_limit_reached
PHPCrawlerProcessReport::$file_limit_reached in PHPCrawlerProcessReport.class.php
Will be TRUE if the page/file-limit was reached.
$find_redirect_urls
PHPCrawlerLinkFinder::$find_redirect_urls in PHPCrawlerLinkFinder.class.php
Specifies whether redirect-links set in HTTP-headers should be found.
$first_content_url
PHPCrawlerStatus::$first_content_url in PHPCrawlerStatus.class.php
$follow_redirects_till_content
$found_links_map
PHPCrawlerLinkFinder::$found_links_map in PHPCrawlerLinkFinder.class.php
$frequency_mode
SitemapCreator::$frequency_mode in SitemapCreator.class.php
Frequency mode
$frequency_types
SitemapCreator::$frequency_types in SitemapCreator.class.php
Array containing the frequency types as keys and the maximum time in seconds as values.
filterUrls
PHPCrawlerURLFilter::filterUrls() in PHPCrawlerURLFilter.class.php
Filters the given URLs (contained in the given PHPCrawlerDocumentInfo-object) by the given rules.
findLinksInHTMLChunk
PHPCrawlerLinkFinder::findLinksInHTMLChunk() in PHPCrawlerLinkFinder.class.php
Searches for links in the given HTML-chunk and adds found links to the internal link-cache.
findRedirectLinkInHeader
PHPCrawlerLinkFinder::findRedirectLinkInHeader() in PHPCrawlerLinkFinder.class.php
Checks for a redirect-URL in the given http-header and adds it to the internal link-cache.
FREQUENCY_Disable
SitemapCreator::FREQUENCY_Disable in SitemapCreator.class.php
Disables frequency calculations
FREQUENCY_LAST_MODIFIED
SitemapCreator::FREQUENCY_LAST_MODIFIED in SitemapCreator.class.php
Most recently modified pages get a higher frequency.
FREQUENCY_PRIORITY
SitemapCreator::FREQUENCY_PRIORITY in SitemapCreator.class.php
Higher-priority pages get a higher frequency.
fromURL
PHPCrawlerUrlPartsDescriptor::fromURL() in PHPCrawlerUrlPartsDescriptor.class.php
Returns the PHPCrawlerUrlPartsDescriptor-object for the given URL.
g
$general_follow_mode
PHPCrawlerURLFilter::$general_follow_mode in PHPCrawlerURLFilter.class.php
The general follow-mode of the crawler
$global_traffic_count
PHPCrawlerHTTPRequest::$global_traffic_count in PHPCrawlerHTTPRequest.class.php
Global counter for traffic this instance of the HTTPRequest-class caused.
getAllBenchmarks
PHPCrawlerBenchmark::getAllBenchmarks() in PHPCrawlerBenchmark.class.php
Returns all registered benchmark-results.
getAllMetaAttributes
PHPCrawlerLinkFinder::getAllMetaAttributes() in PHPCrawlerLinkFinder.class.php
Returns all meta-tag attributes found so far in the document.
getAllURLs
PHPCrawlerLinkFinder::getAllURLs() in PHPCrawlerLinkFinder.class.php
Returns all URLs/links found so far in the document.
getAllURLs
PHPCrawlerMemoryURLCache::getAllURLs() in PHPCrawlerMemoryURLCache.class.php
Returns all URLs currently cached in the URL-cache.
getAllURLs
PHPCrawlerSQLiteURLCache::getAllURLs() in PHPCrawlerSQLiteURLCache.class.php
Has no function in this class
getAllURLs
PHPCrawlerURLCacheBase::getAllURLs() in PHPCrawlerURLCacheBase.class.php
Returns all URLs currently cached in the URL-cache.
getApplyingLines
PHPCrawlerRobotsTxtParser::getApplyingLines() in PHPCrawlerRobotsTxtParser.class.php
Returns all raw lines in the given robots.txt-content that apply to the given user-agent string.
getBaseUrlFromMetaTag
PHPCrawlerUtils::getBaseUrlFromMetaTag() in PHPCrawlerUtils.class.php
Returns the base-URL specified in a meta-tag in the given HTML-source
getBasicAuthenticationForUrl
PHPCrawlerUserSendDataCache::getBasicAuthenticationForUrl() in PHPCrawlerUserSendDataCache.class.php
Returns the basic-authentication (username and password) that should be sent to the given URL.
getCallCount
PHPCrawlerBenchmark::getCallCount() in PHPCrawlerBenchmark.class.php
getChildPIDs
PHPCrawlerProcessCommunication::getChildPIDs() in PHPCrawlerProcessCommunication.class.php
Returns the PIDs of all running child-processes
getCookiesForUrl
PHPCrawlerMemoryCookieCache::getCookiesForUrl() in PHPCrawlerMemoryCookieCache.class.php
Returns all cookies from the cache that are addressed to the given URL
getCookiesForUrl
PHPCrawlerSQLiteCookieCache::getCookiesForUrl() in PHPCrawlerSQLiteCookieCache.class.php
Returns all cookies from the cache that are addressed to the given URL
getCookiesForUrl
PHPCrawlerCookieCacheBase::getCookiesForUrl() in PHPCrawlerCookieCacheBase.class.php
Returns all cookies from the cache that are addressed to the given URL
getCookiesFromHeader
PHPCrawlerUtils::getCookiesFromHeader() in PHPCrawlerUtils.class.php
Returns all cookies from the given response-header.
getCrawlerId
PHPCrawler::getCrawlerId() in PHPCrawler.class.php
Returns the unique ID of the instance of the crawler
getCrawlerStatus
PHPCrawlerProcessCommunication::getCrawlerStatus() in PHPCrawlerProcessCommunication.class.php
Returns/reads the current crawler-status
getDataDir
SitemapCreator::getDataDir() in SitemapCreator.class.php
Get data directory path $data_dir
getDistinctURLHash
PHPCrawlerURLCacheBase::getDistinctURLHash() in PHPCrawlerURLCacheBase.class.php
Returns the distinct-hash for the given URL that ensures that no URL is cached more than once.
getDocumentInfoCount
PHPCrawlerDocumentInfoQueue::getDocumentInfoCount() in PHPCrawlerDocumentInfoQueue.class.php
Returns the current number of PHPCrawlerDocumentInfo-objects in the queue
getElapsedTime
PHPCrawlerBenchmark::getElapsedTime() in PHPCrawlerBenchmark.class.php
Gets the elapsed time for the given benchmark.
getEntries
SitemapCreator::getEntries() in SitemapCreator.class.php
Gets the array of URL sets ($entries).
getFromHeaderLine
PHPCrawlerCookieDescriptor::getFromHeaderLine() in PHPCrawlerCookieDescriptor.class.php
Returns a PHPCrawlerCookieDescriptor-object initiated by the given cookie-header-line.
getGlobalTrafficCount
PHPCrawlerHTTPRequest::getGlobalTrafficCount() in PHPCrawlerHTTPRequest.class.php
Returns the global traffic this instance of the HTTPRequest-class caused so far.
getHeaderValue
PHPCrawlerUtils::getHeaderValue() in PHPCrawlerUtils.class.php
Gets the value of a header-directive from the given HTTP-header.
getHTTPStatusCode
PHPCrawlerUtils::getHTTPStatusCode() in PHPCrawlerUtils.class.php
Gets the HTTP-statuscode from a given response-header.
getIP
PHPCrawlerDNSCache::getIP() in PHPCrawlerDNSCache.class.php
Returns the IP for the given hostname.
getLastModified
SMCCrawler::getLastModified() in SitemapCreatorCrawler.class.php
Gets the Last-Modified header.
getMaxPriorityLevel
PHPCrawlerMemoryURLCache::getMaxPriorityLevel() in PHPCrawlerMemoryURLCache.class.php
Returns the highest priority-level for which a URL exists in the cache.
getMetaTagAttributes
PHPCrawlerUtils::getMetaTagAttributes() in PHPCrawlerUtils.class.php
Gets all meta-tag attributes from the given HTML-source.
getmicrotime
PHPCrawlerBenchmark::getmicrotime() in PHPCrawlerBenchmark.class.php
Returns the current time in seconds and milliseconds.
getNextDocumentInfo
PHPCrawlerDocumentInfoQueue::getNextDocumentInfo() in PHPCrawlerDocumentInfoQueue.class.php
Returns a PHPCrawlerDocumentInfo-object from the queue
getNextUrl
PHPCrawlerMemoryURLCache::getNextUrl() in PHPCrawlerMemoryURLCache.class.php
Returns the next URL from the cache that should be crawled.
getNextUrl
PHPCrawlerSQLiteURLCache::getNextUrl() in PHPCrawlerSQLiteURLCache.class.php
Returns the next URL from the cache that should be crawled.
getNextUrl
PHPCrawlerURLCacheBase::getNextUrl() in PHPCrawlerURLCacheBase.class.php
Returns the next URL from the cache that should be crawled.
getPostDataForUrl
PHPCrawlerUserSendDataCache::getPostDataForUrl() in PHPCrawlerUserSendDataCache.class.php
Returns the post-data (key and value) that should be sent to the given URL.
getProcessReport
PHPCrawler::getProcessReport() in PHPCrawler.class.php
Returns summarizing report-information about the crawling-process after it has finished.
getRedirectURLFromHeader
PHPCrawlerUtils::getRedirectURLFromHeader() in PHPCrawlerUtils.class.php
Returns the redirect-URL from the given HTTP-header.
getReport
PHPCrawler::getReport() in PHPCrawler.class.php
Returns an array with summarizing report-information after the crawling-process has finished.
getRobotsTxtContent
PHPCrawlerRobotsTxtParser::getRobotsTxtContent() in PHPCrawlerRobotsTxtParser.class.php
Retrieves the content of a robots.txt-file.
getRobotsTxtURL
PHPCrawlerRobotsTxtParser::getRobotsTxtURL() in PHPCrawlerRobotsTxtParser.class.php
Returns the Robots.txt-URL related to the given URL
getRootUrl
PHPCrawlerUtils::getRootUrl() in PHPCrawlerUtils.class.php
Returns the normalized root-URL of the given URL
getSitemapDirName
SitemapCreator::getSitemapDirName() in SitemapCreator.class.php
Get sitemap directory name
getSitemapPath
SitemapCreator::getSitemapPath() in SitemapCreator.class.php
Get sitemap file path
getSitemapsDir
SitemapCreator::getSitemapsDir() in SitemapCreator.class.php
Get sitemap directory path $sitemaps_dir
getSitemapURL
SitemapCreator::getSitemapURL() in SitemapCreator.class.php
Get sitemap file URL
getSystemTempDir
PHPCrawlerUtils::getSystemTempDir() in PHPCrawlerUtils.class.php
Determines the system's temporary-directory.
getUrlCount
PHPCrawlerSQLiteURLCache::getUrlCount() in PHPCrawlerSQLiteURLCache.class.php
getUrlPriority
PHPCrawlerURLCacheBase::getUrlPriority() in PHPCrawlerURLCacheBase.class.php
Gets the priority-level of the given URL
go
PHPCrawler::go() in PHPCrawler.class.php
Starts the crawling process in single-process-mode.
goMultiProcessed
PHPCrawler::goMultiProcessed() in PHPCrawler.class.php
Starts the crawler by using multiple processes.
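getHTTPStatusCode and getHeaderValue both read values out of a raw HTTP response header. The Python sketch below shows one way to do this. The function names are hypothetical, and the sketch assumes that header names are matched case-insensitively and that the first matching line wins.

```python
import re


def get_http_status_code(header):
    """Return the status code from the status line of a raw response header, or None."""
    match = re.match(r"HTTP/\d+(?:\.\d+)?\s+(\d{3})", header)
    return int(match.group(1)) if match else None


def get_header_value(header, directive):
    """Return the value of the first header line named `directive`, or None."""
    # splitlines() handles both "\r\n" and "\n"; [1:] skips the status line.
    for line in header.splitlines()[1:]:
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep and name.strip().lower() == directive.lower():
            return value.strip()
    return None
```

Splitting on the first colon only keeps values such as "http://www.example.com/new.html" in a Location header intact.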
h
$header
PHPCrawlerDocumentInfo::$header in PHPCrawlerDocumentInfo.class.php
The complete HTTP-header the webserver responded with for this page or file.
$header_check_callback_function
$header_raw
PHPCrawlerResponseHeader::$header_raw in PHPCrawlerResponseHeader.class.php
The raw HTTP-header as it was sent by the server
$header_send
PHPCrawlerDocumentInfo::$header_send in PHPCrawlerDocumentInfo.class.php
The complete HTTP-request-header the crawler sent to the server (debugging info).
$host
PHPCrawlerDocumentInfo::$host in PHPCrawlerDocumentInfo.class.php
The host-part of the URL of the requested page or file, e.g. "www.foo.com".
$host
PHPCrawlerUrlPartsDescriptor::$host in PHPCrawlerUrlPartsDescriptor.class.php
$host_ip_array
PHPCrawlerDNSCache::$host_ip_array in PHPCrawlerDNSCache.class.php
Array for caching IPs of the requested hostnames
$http_status_code
PHPCrawlerResponseHeader::$http_status_code in PHPCrawlerResponseHeader.class.php
The HTTP-statuscode
$http_status_code
PHPCrawlerDocumentInfo::$http_status_code in PHPCrawlerDocumentInfo.class.php
The HTTP-statuscode the webserver responded for the request, e.g. 200 (OK) or 404 (file not found).
handleDocumentInfo
SMCCrawler::handleDocumentInfo() in SitemapCreatorCrawler.class.php
Gets access to all information about a page or file the crawler found and received.
handleDocumentInfo
PHPCrawler::handleDocumentInfo() in PHPCrawler.class.php
Override this method to get access to all information about a page or file the crawler found and received.
handleHeaderInfo
PHPCrawler::handleHeaderInfo() in PHPCrawler.class.php
Overridable method that will be called after the header of a document has been received and BEFORE its content is received.
handlePageData
PHPCrawler::handlePageData() in PHPCrawler.class.php
Override this method to get access to all information about a page or file the crawler found and received.
hostInCache
PHPCrawlerDNSCache::hostInCache() in PHPCrawlerDNSCache.class.php
Checks whether a hostname is already cached.
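A DNS-cache like this one boils down to a hostname-to-IP map that is consulted before resolving. The following is a minimal sketch under that assumption; the class `SimpleDnsCache` and its methods are hypothetical and not PHPCrawlerDNSCache's actual code. The resolver is injectable so the cache can be exercised without network access; by default it uses PHP's `gethostbyname()`, which returns the hostname unchanged when the lookup fails.

```php
<?php
// Sketch only: a hypothetical host => IP cache in the spirit of
// PHPCrawlerDNSCache (not its actual implementation).
final class SimpleDnsCache
{
    /** @var array<string,string> lowercase hostname => IP */
    private array $hostIps = [];
    /** @var callable resolver taking a hostname and returning an IP string */
    private $resolver;

    public function __construct(?callable $resolver = null)
    {
        // gethostbyname() returns the hostname itself if resolution fails.
        $this->resolver = $resolver ?? 'gethostbyname';
    }

    public function hostInCache(string $host): bool
    {
        return isset($this->hostIps[strtolower($host)]);
    }

    public function getIP(string $host): string
    {
        $key = strtolower($host); // hostnames are case-insensitive
        if (!isset($this->hostIps[$key])) {
            // Resolve only once per host; note that failed lookups are cached too.
            $this->hostIps[$key] = ($this->resolver)($host);
        }
        return $this->hostIps[$key];
    }
}
```

Injecting the resolver keeps the caching logic separate from the network lookup, which also makes it easy to swap in a different resolution strategy.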
i
top
$is_chlid_process
PHPCrawler::$is_chlid_process in PHPCrawler.class.php
Flag indicating whether this instance is running in a child-process (if crawler runs multi-processed)
$is_parent_process
PHPCrawler::$is_parent_process in PHPCrawler.class.php
Flag indicating whether this instance is running in the parent-process (if crawler runs multi-processed)
$is_redirect_url
PHPCrawlerURLDescriptor::$is_redirect_url in PHPCrawlerURLDescriptor.class.php
Flag indicating whether this URL was target of an HTTP-redirect.
initChildProcess
PHPCrawler::initChildProcess() in PHPCrawler.class.php
Overridable method that will be called by every used child-process just before it starts the crawling-procedure.
initCrawler
SitemapCreator::initCrawler() in SitemapCreator.class.php
Initiate the crawler $Crawler
initCrawlerProcess
PHPCrawler::initCrawlerProcess() in PHPCrawler.class.php
Initiates a crawler-process
isDataDirWritable
SitemapCreator::isDataDirWritable() in SitemapCreator.class.php
Check if data directory is writable $data_dir
isUTF8String
PHPCrawlerUtils::isUTF8String() in PHPCrawlerUtils.class.php
Checks whether the given string is a UTF-8-encoded string.
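A common stand-alone way to perform such a check in PHP relies on PCRE's `u` (UTF-8) modifier: `preg_match()` fails and returns `false` when the subject is not valid UTF-8. This is a minimal sketch; the helper name `is_utf8_string` is hypothetical and this is not necessarily how PHPCrawlerUtils::isUTF8String() is implemented.

```php
<?php
// Sketch only: hypothetical helper, not PHPCrawlerUtils::isUTF8String().
// With the "u" modifier, preg_match() returns false for invalid UTF-8
// and 1 for valid input (the empty pattern always matches).
function is_utf8_string(string $str): bool
{
    return preg_match('//u', $str) === 1;
}

var_dump(is_utf8_string("Grüße"));    // bool(true)
var_dump(is_utf8_string("\xC3\x28")); // bool(false): 0xC3 is not followed by a continuation byte
```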
isValidUrlString
PHPCrawlerUtils::isValidUrlString() in PHPCrawlerUtils.class.php
Checks whether the given string is a valid, urlencoded URL (by RFC)
k
top
keepRedirectUrls
PHPCrawlerURLFilter::keepRedirectUrls() in PHPCrawlerURLFilter.class.php
Filters out all non-redirect-URLs from the URLs given in the PHPCrawlerDocumentInfo-object
killChildProcesses
PHPCrawlerProcessCommunication::killChildProcesses() in PHPCrawlerProcessCommunication.class.php
Kills all running child-processes
l
top
$LastModifiedCount
SMCCrawler::$LastModifiedCount in SitemapCreatorCrawler.class.php
get Last Modified header
$lastResponseHeader
PHPCrawlerHTTPRequest::$lastResponseHeader in PHPCrawlerHTTPRequest.class.php
The last response-header this request-instance received.
$LinkCache
PHPCrawlerLinkFinder::$LinkCache in PHPCrawlerLinkFinder.class.php
Cache for storing found links/urls
$LinkCache
PHPCrawler::$LinkCache in PHPCrawler.class.php
The PHPCrawlerLinkCache-Object
$linkcode
PHPCrawlerURLDescriptor::$linkcode in PHPCrawlerURLDescriptor.class.php
The html-codepart that contained the link to this URL, e.g. "<a href="../foo.html">LINKTEXT</a>"
$LinkFinder
PHPCrawlerHTTPRequest::$LinkFinder in PHPCrawlerHTTPRequest.class.php
Link-finder object
$linksearch_content_types
PHPCrawlerHTTPRequest::$linksearch_content_types in PHPCrawlerHTTPRequest.class.php
Contains all rules defining which content-types of documents should get checked for links.
$links_followed
PHPCrawlerProcessReport::$links_followed in PHPCrawlerProcessReport.class.php
The total number of links/URLs the crawler found and followed.
$links_followed
PHPCrawlerStatus::$links_followed in PHPCrawlerStatus.class.php
Number of links the crawler-instance followed so far
$links_found
PHPCrawlerDocumentInfo::$links_found in PHPCrawlerDocumentInfo.class.php
A numeric array containing information about all links that were found in the source of the page.
$links_found_url_descriptors
PHPCrawlerDocumentInfo::$links_found_url_descriptors in PHPCrawlerDocumentInfo.class.php
A numeric array containing a PHPCrawlerURLDescriptor-object for every link that was found in the page.
$linktext
PHPCrawlerURLDescriptor::$linktext in PHPCrawlerURLDescriptor.class.php
The linktext or html-code the link to this URL was laid over.
$link_priority_array
PHPCrawler::$link_priority_array in PHPCrawler.class.php
$link_raw
PHPCrawlerURLDescriptor::$link_raw in PHPCrawlerURLDescriptor.class.php
The raw link to this URL as it was found in the HTML-source, e.g. "../dunno/index.php"
m
top
$memory_peak_usage
PHPCrawlerProcessReport::$memory_peak_usage in PHPCrawlerProcessReport.class.php
The peak memory-usage the crawling-process caused.
$meta_attributes
PHPCrawlerLinkFinder::$meta_attributes in PHPCrawlerLinkFinder.class.php
Meta-attributes found in the html-source.
$meta_attributes
PHPCrawlerDocumentInfo::$meta_attributes in PHPCrawlerDocumentInfo.class.php
All meta-tag attributes found in the source of the document.
$min_frequency
SitemapCreator::$min_frequency in SitemapCreator.class.php
Minimum frequency
$min_priority
SitemapCreator::$min_priority in SitemapCreator.class.php
Minimum Priority
$multiprocess_mode
PHPCrawler::$multiprocess_mode in PHPCrawler.class.php
Multiprocess-mode the crawler is running in.
$multiprocess_mode
PHPCrawlerProcessCommunication::$multiprocess_mode in PHPCrawlerProcessCommunication.class.php
markUrlAsFollowed
PHPCrawlerURLCacheBase::markUrlAsFollowed() in PHPCrawlerURLCacheBase.class.php
Marks the given URL in the cache as "followed"
markUrlAsFollowed
PHPCrawlerSQLiteURLCache::markUrlAsFollowed() in PHPCrawlerSQLiteURLCache.class.php
Marks the given URL in the cache as "followed"
markUrlAsFollowed
PHPCrawlerMemoryURLCache::markUrlAsFollowed() in PHPCrawlerMemoryURLCache.class.php
Has no function in this memory-cache.
MPMODE_CHILDS_EXECUTES_USERCODE
PHPCrawlerMultiProcessModes::MPMODE_CHILDS_EXECUTES_USERCODE in PHPCrawlerMultiProcessModes.class.php
Crawler runs in multiprocess-mode, usercode is executed by child-processes directly.
MPMODE_NONE
PHPCrawlerMultiProcessModes::MPMODE_NONE in PHPCrawlerMultiProcessModes.class.php
Crawler runs in a single process
MPMODE_PARENT_EXECUTES_USERCODE
PHPCrawlerMultiProcessModes::MPMODE_PARENT_EXECUTES_USERCODE in PHPCrawlerMultiProcessModes.class.php
Crawler runs in multiprocess-mode, usercode is executed by parent-process only.
n
top
$name
PHPCrawlerCookieDescriptor::$name in PHPCrawlerCookieDescriptor.class.php
Cookie-name
$now
SitemapCreator::$now in SitemapCreator.class.php
Current time().
normalizeURL
PHPCrawlerUtils::normalizeURL() in PHPCrawlerUtils.class.php
Normalizes a URL
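URL normalization typically means mapping equivalent spellings of a URL to one canonical string. The sketch below assumes one plausible rule set (lowercase scheme and host, drop the scheme's default port, default an empty path to "/", drop the fragment); the function `normalize_url` is hypothetical and PHPCrawlerUtils::normalizeURL() may apply different rules.

```php
<?php
// Sketch only: hypothetical rules, not necessarily those of
// PHPCrawlerUtils::normalizeURL(). User/password parts are ignored here.
function normalize_url(string $url): ?string
{
    $p = parse_url($url);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return null; // not an absolute URL
    }
    $scheme = strtolower($p['scheme']);
    $host   = strtolower($p['host']);
    $defaultPorts = ['http' => 80, 'https' => 443];

    // Keep the port only if it differs from the scheme's default.
    $port = '';
    if (isset($p['port']) && ($defaultPorts[$scheme] ?? null) !== $p['port']) {
        $port = ':' . $p['port'];
    }
    $path  = $p['path'] ?? '/'; // an empty path becomes "/"
    $query = isset($p['query']) ? '?' . $p['query'] : '';
    // The fragment ("#...") is dropped: it is never sent to the server.
    return $scheme . '://' . $host . $port . $path . $query;
}

echo normalize_url('HTTP://WWW.Foo.com:80/bar/page.html?x=y#top'), "\n";
// http://www.foo.com/bar/page.html?x=y
```

The path is left case-sensitive on purpose, since servers may treat "/Page" and "/page" as different resources.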
o
top
$obey_nofollow_tags
PHPCrawlerURLFilter::$obey_nofollow_tags in PHPCrawlerURLFilter.class.php
Defines whether nofollow-tags should get obeyed.
$obey_robots_txt
PHPCrawler::$obey_robots_txt in PHPCrawler.class.php
Defines whether the robots.txt-file should be obeyed
$only_count_received_documents
Defines whether only documents that were received will be counted.
obeyNoFollowTags
PHPCrawler::obeyNoFollowTags() in PHPCrawler.class.php
Decides whether the crawler should obey "nofollow"-tags
obeyRobotsTxt
PHPCrawler::obeyRobotsTxt() in PHPCrawler.class.php
Decides whether the crawler should parse and obey robots.txt-files.
openConnection
PHPCrawlerSQLiteURLCache::openConnection() in PHPCrawlerSQLiteURLCache.class.php
Creates the sqlite-db-file and opens a connection to it.
openConnection
PHPCrawlerDocumentInfoQueue::openConnection() in PHPCrawlerDocumentInfoQueue.class.php
Creates the sqlite-db-file and opens a connection to it.
openConnection
PHPCrawlerSQLiteCookieCache::openConnection() in PHPCrawlerSQLiteCookieCache.class.php
Creates the sqlite-db-file and opens a connection to it.
openSocket
PHPCrawlerHTTPRequest::openSocket() in PHPCrawlerHTTPRequest.class.php
Opens the socket to the host.
openURL
SitemapCreator::openURL() in SitemapCreator.class.php
Opens the URL and gets the response body or an error
p
top
$PageRequest
PHPCrawler::$PageRequest in PHPCrawler.class.php
The PHPCrawlerHTTPRequest-Object
$PageRequest
PHPCrawlerRobotsTxtParser::$PageRequest in PHPCrawlerRobotsTxtParser.class.php
A PHPCrawlerHTTPRequest-object for requesting robots.txt-files.
$path
PHPCrawlerCookieDescriptor::$path in PHPCrawlerCookieDescriptor.class.php
Cookie-path
$path
PHPCrawlerDocumentInfo::$path in PHPCrawlerDocumentInfo.class.php
The path in the URL of the requested page or file, e.g. "/page/".
$path
PHPCrawlerUrlPartsDescriptor::$path in PHPCrawlerUrlPartsDescriptor.class.php
$PDO
PHPCrawlerSQLiteCookieCache::$PDO in PHPCrawlerSQLiteCookieCache.class.php
$PDO
PHPCrawlerDocumentInfoQueue::$PDO in PHPCrawlerDocumentInfoQueue.class.php
$PDO
PHPCrawlerSQLiteURLCache::$PDO in PHPCrawlerSQLiteURLCache.class.php
PDO-object for querying SQLite-file.
$porcess_abort_reason
PHPCrawler::$porcess_abort_reason in PHPCrawler.class.php
The reason why the process was aborted/finished.
$port
PHPCrawlerDocumentInfo::$port in PHPCrawlerDocumentInfo.class.php
The port of the URL the request was sent to, e.g. 80
$port
PHPCrawlerUrlPartsDescriptor::$port in PHPCrawlerUrlPartsDescriptor.class.php
$post_data
PHPCrawlerHTTPRequest::$post_data in PHPCrawlerHTTPRequest.class.php
Array containing POST-data to send with the request
$post_data
PHPCrawlerUserSendDataCache::$post_data in PHPCrawlerUserSendDataCache.class.php
Array containing post-data to send.
$PreparedInsertStatement
PHPCrawlerSQLiteURLCache::$PreparedInsertStatement in PHPCrawlerSQLiteURLCache.class.php
Prepared statement for inserting URLs into the db-file as PDOStatement-object.
$prepared_statements_created
PHPCrawlerDocumentInfoQueue::$prepared_statements_created in PHPCrawlerDocumentInfoQueue.class.php
$priority_mode
SitemapCreator::$priority_mode in SitemapCreator.class.php
Priority mode
$ProcessCommunication
PHPCrawler::$ProcessCommunication in PHPCrawler.class.php
ProcessCommunication-object
$process_runtime
PHPCrawlerProcessReport::$process_runtime in PHPCrawlerProcessReport.class.php
The total time the crawling-process was running in seconds.
$protocol
PHPCrawlerDocumentInfo::$protocol in PHPCrawlerDocumentInfo.class.php
The protocol-part of the URL of the page or file, e.g. "http://"
$protocol
PHPCrawlerUrlPartsDescriptor::$protocol in PHPCrawlerUrlPartsDescriptor.class.php
$proxy
PHPCrawlerHTTPRequest::$proxy in PHPCrawlerHTTPRequest.class.php
The proxy to use
PHPCrawlerCookieCacheBase.class.php
PHPCrawlerCookieCacheBase.class.php in PHPCrawlerCookieCacheBase.class.php
PHPCrawlerMemoryCookieCache.class.php
PHPCrawlerMemoryCookieCache.class.php in PHPCrawlerMemoryCookieCache.class.php
PHPCrawlerSQLiteCookieCache.class.php
PHPCrawlerSQLiteCookieCache.class.php in PHPCrawlerSQLiteCookieCache.class.php
PHPCrawlerAbortReasons.class.php
PHPCrawlerAbortReasons.class.php in PHPCrawlerAbortReasons.class.php
PHPCrawlerMultiProcessModes.class.php
PHPCrawlerMultiProcessModes.class.php in PHPCrawlerMultiProcessModes.class.php
PHPCrawlerRequestErrors.class.php
PHPCrawlerRequestErrors.class.php in PHPCrawlerRequestErrors.class.php
PHPCrawlerUrlCacheTypes.class.php
PHPCrawlerUrlCacheTypes.class.php in PHPCrawlerUrlCacheTypes.class.php
PHPCrawler.class.php
PHPCrawler.class.php in PHPCrawler.class.php
PHPCrawlerBenchmark.class.php
PHPCrawlerBenchmark.class.php in PHPCrawlerBenchmark.class.php
PHPCrawlerCookieDescriptor.class.php
PHPCrawlerCookieDescriptor.class.php in PHPCrawlerCookieDescriptor.class.php
PHPCrawlerDNSCache.class.php
PHPCrawlerDNSCache.class.php in PHPCrawlerDNSCache.class.php
PHPCrawlerDocumentInfo.class.php
PHPCrawlerDocumentInfo.class.php in PHPCrawlerDocumentInfo.class.php
PHPCrawlerHTTPRequest.class.php
PHPCrawlerHTTPRequest.class.php in PHPCrawlerHTTPRequest.class.php
PHPCrawlerLinkFinder.class.php
PHPCrawlerLinkFinder.class.php in PHPCrawlerLinkFinder.class.php
PHPCrawlerProcessReport.class.php
PHPCrawlerProcessReport.class.php in PHPCrawlerProcessReport.class.php
PHPCrawlerResponseHeader.class.php
PHPCrawlerResponseHeader.class.php in PHPCrawlerResponseHeader.class.php
PHPCrawlerRobotsTxtParser.class.php
PHPCrawlerRobotsTxtParser.class.php in PHPCrawlerRobotsTxtParser.class.php
PHPCrawlerStatus.class.php
PHPCrawlerStatus.class.php in PHPCrawlerStatus.class.php
PHPCrawlerURLDescriptor.class.php
PHPCrawlerURLDescriptor.class.php in PHPCrawlerURLDescriptor.class.php
PHPCrawlerURLFilter.class.php
PHPCrawlerURLFilter.class.php in PHPCrawlerURLFilter.class.php
PHPCrawlerUrlPartsDescriptor.class.php
PHPCrawlerUrlPartsDescriptor.class.php in PHPCrawlerUrlPartsDescriptor.class.php
PHPCrawlerUserSendDataCache.class.php
PHPCrawlerUserSendDataCache.class.php in PHPCrawlerUserSendDataCache.class.php
PHPCrawlerUtils.class.php
PHPCrawlerUtils.class.php in PHPCrawlerUtils.class.php
PHPCrawlerDocumentInfoQueue.class.php
PHPCrawlerDocumentInfoQueue.class.php in PHPCrawlerDocumentInfoQueue.class.php
PHPCrawlerProcessCommunication.class.php
PHPCrawlerProcessCommunication.class.php in PHPCrawlerProcessCommunication.class.php
PHPCrawlerMemoryURLCache.class.php
PHPCrawlerMemoryURLCache.class.php in PHPCrawlerMemoryURLCache.class.php
PHPCrawlerSQLiteURLCache.class.php
PHPCrawlerSQLiteURLCache.class.php in PHPCrawlerSQLiteURLCache.class.php
PHPCrawlerURLCacheBase.class.php
PHPCrawlerURLCacheBase.class.php in PHPCrawlerURLCacheBase.class.php
parseRobotsTxt
PHPCrawlerRobotsTxtParser::parseRobotsTxt() in PHPCrawlerRobotsTxtParser.class.php
Parses the robots.txt-file related to the given URL and returns regular-expression-rules corresponding to the "disallow"-rules it contains that are addressed to the given user-agent.
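The idea of turning "disallow"-rules into regular expressions can be sketched as follows. This is a simplified illustration under stated assumptions, not PHPCrawlerRobotsTxtParser's code: the function `robots_disallow_regexes` is hypothetical, it merges the rules of every record matching the agent (including "*") instead of letting a specific record override "*", and it treats each Disallow value as a plain path prefix (no "*" or "$" wildcards).

```php
<?php
// Sketch only: hypothetical helper, simplified robots.txt semantics.
function robots_disallow_regexes(string $robotsTxt, string $userAgent): array
{
    $rules = [];
    $applies = false;      // does the current record apply to $userAgent?
    $inAgentLines = false; // are we reading the record's User-agent lines?

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if (!$inAgentLines) { // first User-agent line starts a new record
                $applies = false;
                $inAgentLines = true;
            }
            if ($value === '*' || ($value !== '' && stripos($userAgent, $value) !== false)) {
                $applies = true;
            }
            continue;
        }
        $inAgentLines = false;
        if ($field === 'disallow' && $applies && $value !== '') {
            // Anchored prefix match on the URL path; "#" is the delimiter.
            $rules[] = '#^' . preg_quote($value, '#') . '#';
        }
    }
    return $rules;
}

$robotsTxt = "User-agent: *\nDisallow: /private/\n\nUser-agent: OtherBot\nDisallow: /\n";
foreach (robots_disallow_regexes($robotsTxt, 'PHPCrawl') as $rule) {
    echo $rule, "\n"; // each rule is a prefix pattern for URL paths
}
```

An empty "Disallow:" value is skipped because it means "allow everything" in robots.txt.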
PHPCrawler
PHPCrawler in PHPCrawler.class.php
PHPCrawl main class
PHPCrawlerAbortReasons
PHPCrawlerAbortReasons in PHPCrawlerAbortReasons.class.php
Contains all possible abortreasons for a crawling-process.
PHPCrawlerBenchmark
PHPCrawlerBenchmark in PHPCrawlerBenchmark.class.php
A static benchmark-class for doing benchmarks within phpcrawl.
PHPCrawlerCookieCacheBase
PHPCrawlerCookieCacheBase in PHPCrawlerCookieCacheBase.class.php
Abstract base class for storing cookies.
PHPCrawlerCookieDescriptor
PHPCrawlerCookieDescriptor in PHPCrawlerCookieDescriptor.class.php
Describes a cookie within the PHPCrawl-system.
PHPCrawlerDNSCache
PHPCrawlerDNSCache in PHPCrawlerDNSCache.class.php
Simple DNS-cache used by phpcrawl.
PHPCrawlerDocumentInfo
PHPCrawlerDocumentInfo in PHPCrawlerDocumentInfo.class.php
Contains information about a page or file the crawler found and received during the crawling-process.
PHPCrawlerDocumentInfoQueue
PHPCrawlerDocumentInfoQueue in PHPCrawlerDocumentInfoQueue.class.php
Queue for PHPCrawlerDocumentInfo-objects
PHPCrawlerHTTPRequest
PHPCrawlerHTTPRequest in PHPCrawlerHTTPRequest.class.php
Class for performing HTTP-requests.
PHPCrawlerLinkFinder
PHPCrawlerLinkFinder in PHPCrawlerLinkFinder.class.php
Class for finding links in HTML-documents.
PHPCrawlerMemoryCookieCache
PHPCrawlerMemoryCookieCache in PHPCrawlerMemoryCookieCache.class.php
Class for storing/caching cookies in memory.
PHPCrawlerMemoryURLCache
PHPCrawlerMemoryURLCache in PHPCrawlerMemoryURLCache.class.php
Class for caching/storing URLs/links in memory.
PHPCrawlerMultiProcessModes
PHPCrawlerMultiProcessModes in PHPCrawlerMultiProcessModes.class.php
Multiprocessing-modes currently supported by phpcrawl.
PHPCrawlerProcessCommunication
PHPCrawlerProcessCommunication in PHPCrawlerProcessCommunication.class.php
Class containing methods for process handling and communication
PHPCrawlerProcessReport
PHPCrawlerProcessReport in PHPCrawlerProcessReport.class.php
Contains summarizing information about a crawling-process after the process is finished.
PHPCrawlerRequestErrors
PHPCrawlerRequestErrors in PHPCrawlerRequestErrors.class.php
Contains all possible error codes for errors that may appear during an HTTP-request.
PHPCrawlerResponseHeader
PHPCrawlerResponseHeader in PHPCrawlerResponseHeader.class.php
Describes an HTTP response-header within the phpcrawl-system.
PHPCrawlerRobotsTxtParser
PHPCrawlerRobotsTxtParser in PHPCrawlerRobotsTxtParser.class.php
Class for parsing robots.txt-files.
PHPCrawlerSQLiteCookieCache
PHPCrawlerSQLiteCookieCache in PHPCrawlerSQLiteCookieCache.class.php
Class for storing/caching cookies in a SQLite-db-file.
PHPCrawlerSQLiteURLCache
PHPCrawlerSQLiteURLCache in PHPCrawlerSQLiteURLCache.class.php
Class for caching/storing URLs/links in a SQLite-database-file.
PHPCrawlerStatus
PHPCrawlerStatus in PHPCrawlerStatus.class.php
Describes the current status of a crawler-instance.
PHPCrawlerURLCacheBase
PHPCrawlerURLCacheBase in PHPCrawlerURLCacheBase.class.php
Abstract base class for implemented URL-caching classes.
PHPCrawlerUrlCacheTypes
PHPCrawlerUrlCacheTypes in PHPCrawlerUrlCacheTypes.class.php
Possible cache-types for caching found URLs within the phpcrawl-system.
PHPCrawlerURLDescriptor
PHPCrawlerURLDescriptor in PHPCrawlerURLDescriptor.class.php
Describes a URL within the PHPCrawl-system.
PHPCrawlerURLFilter
PHPCrawlerURLFilter in PHPCrawlerURLFilter.class.php
Class for filtering URLs by given filter-rules.
PHPCrawlerUrlPartsDescriptor
PHPCrawlerUrlPartsDescriptor in PHPCrawlerUrlPartsDescriptor.class.php
Describes the single parts of a URL.
PHPCrawlerUserSendDataCache
PHPCrawlerUserSendDataCache in PHPCrawlerUserSendDataCache.class.php
Cache for storing user-data to send with requests, like cookies, post-data and basic-authentications.
PHPCrawlerUtils
PHPCrawlerUtils in PHPCrawlerUtils.class.php
Static util-methods used by phpcrawl.
ping
SitemapCreator::ping() in SitemapCreator.class.php
Ping search engines
prepareHTTPRequestQuery
PHPCrawlerHTTPRequest::prepareHTTPRequestQuery() in PHPCrawlerHTTPRequest.class.php
Prepares the given HTTP-query-string for the HTTP-request.
prepareSitemapsDir
SitemapCreator::prepareSitemapsDir() in SitemapCreator.class.php
Creates sitemaps directory $sitemaps_dir
printAllBenchmarks
PHPCrawlerBenchmark::printAllBenchmarks() in PHPCrawlerBenchmark.class.php
PRIORITY_CRAWLED_FIRST
SitemapCreator::PRIORITY_CRAWLED_FIRST in SitemapCreator.class.php
Pages crawled first get higher priority
PRIORITY_Disable
SitemapCreator::PRIORITY_Disable in SitemapCreator.class.php
Disables priority calculations
PRIORITY_URL_STRUCTURE
SitemapCreator::PRIORITY_URL_STRUCTURE in SitemapCreator.class.php
Deeper paths get lower priority
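A depth-based priority of this kind can be sketched by counting path segments and lowering the sitemap priority (range 0.0 to 1.0 in the sitemaps.org protocol) per level. The step of 0.1 and the floor `$minPriority` are illustrative assumptions, and `priority_by_depth` is a hypothetical helper, not SitemapCreator's actual calculation.

```php
<?php
// Sketch only: hypothetical depth-based priority in the spirit of
// PRIORITY_URL_STRUCTURE; step and floor values are assumptions.
function priority_by_depth(string $url, float $minPriority = 0.1, float $step = 0.1): float
{
    $path = parse_url($url, PHP_URL_PATH) ?: '/';
    // Count non-empty path segments: "/" -> 0, "/a/" -> 1, "/a/b.html" -> 2.
    $depth = count(array_filter(explode('/', $path), 'strlen'));
    // Never drop below the configured minimum priority.
    return round(max($minPriority, 1.0 - $step * $depth), 1);
}

foreach (['http://www.foo.com/', 'http://www.foo.com/a/b.html'] as $url) {
    printf("%s => %.1f\n", $url, priority_by_depth($url));
}
```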
processHTTPHeader
PHPCrawlerLinkFinder::processHTTPHeader() in PHPCrawlerLinkFinder.class.php
Processes the response-header of the document.
processRobotsTxt
PHPCrawler::processRobotsTxt() in PHPCrawler.class.php
processUrl
PHPCrawler::processUrl() in PHPCrawler.class.php
Receives and processes the given URL
purgeCache
PHPCrawlerSQLiteURLCache::purgeCache() in PHPCrawlerSQLiteURLCache.class.php
Cleans/purges inconsistent entries from the URL-cache.
purgeCache
PHPCrawlerURLCacheBase::purgeCache() in PHPCrawlerURLCacheBase.class.php
Cleans/purges inconsistent entries from the URL-cache.
purgeCache
PHPCrawlerMemoryURLCache::purgeCache() in PHPCrawlerMemoryURLCache.class.php
Has no function in this class.
putContent
SitemapCreator::putContent() in SitemapCreator.class.php
Write XML to disk
q
top
$query
PHPCrawlerDocumentInfo::$query in PHPCrawlerDocumentInfo.class.php
The query-part of the URL of the requested page or file, e.g. "?x=y".
$queue_max_size
PHPCrawlerDocumentInfoQueue::$queue_max_size in PHPCrawlerDocumentInfoQueue.class.php
r
top
$received
PHPCrawlerDocumentInfo::$received in PHPCrawlerDocumentInfo.class.php
Flag indicating whether content was received from the page or file.
$received_completely
PHPCrawlerDocumentInfo::$received_completely in PHPCrawlerDocumentInfo.class.php
Flag indicating whether content was completely received from the page or file.
$received_completly
PHPCrawlerDocumentInfo::$received_completly in PHPCrawlerDocumentInfo.class.php
Alias for $received_completely; the property was misspelled in previous versions of phpcrawl.
$received_to_file
PHPCrawlerDocumentInfo::$received_to_file in PHPCrawlerDocumentInfo.class.php
Will be true if the content was received into a temporary file.
$received_to_memory
PHPCrawlerDocumentInfo::$received_to_memory in PHPCrawlerDocumentInfo.class.php
Will be true if the content was received into local memory.
$receive_content_types
PHPCrawlerHTTPRequest::$receive_content_types in PHPCrawlerHTTPRequest.class.php
Contains all rules defining the content-types that should be received
$receive_to_file_content_types
Contains all rules defining the content-types of pages/files that should be streamed directly to a temporary file (instead of to memory)
$referer_url
PHPCrawlerDocumentInfo::$referer_url in PHPCrawlerDocumentInfo.class.php
The complete URL of the page that contained the link to this document.
$refering_linkcode
PHPCrawlerDocumentInfo::$refering_linkcode in PHPCrawlerDocumentInfo.class.php
The html-sourcecode that contained the link to the current document.
$refering_linktext
PHPCrawlerDocumentInfo::$refering_linktext in PHPCrawlerDocumentInfo.class.php
The linktext of the link that "linked" to this document.
$refering_link_raw
PHPCrawlerDocumentInfo::$refering_link_raw in PHPCrawlerDocumentInfo.class.php
Contains the raw link as it was found in the content of the referring URL (e.g. "../foo.html").
$refering_url
PHPCrawlerURLDescriptor::$refering_url in PHPCrawlerURLDescriptor.class.php
The URL of the page that contained the link to the URL described here.
$responseHeader
PHPCrawlerDocumentInfo::$responseHeader in PHPCrawlerDocumentInfo.class.php
The complete HTTP-header the webserver responded with for this page or file, as a PHPCrawlerResponseHeader-object.
$resumtion_enabled
PHPCrawlerProcessCommunication::$resumtion_enabled in PHPCrawlerProcessCommunication.class.php
Flag indicating whether resumption is activated
$resumtion_enabled
PHPCrawler::$resumtion_enabled in PHPCrawler.class.php
Flag indicating whether resumption is activated
$RobotsTxtParser
PHPCrawler::$RobotsTxtParser in PHPCrawler.class.php
The RobotsTxtParser-Object
readFromCSV
SitemapCreator::readFromCSV() in SitemapCreator.class.php
Read from CSV file and add to $entries
readResponseContent
PHPCrawlerHTTPRequest::readResponseContent() in PHPCrawlerHTTPRequest.class.php
Reads the response-content.
readResponseHeader
PHPCrawlerHTTPRequest::readResponseHeader() in PHPCrawlerHTTPRequest.class.php
Reads the response-header.
readSitemap
SitemapCreator::readSitemap() in SitemapCreator.class.php
Read sitemap file from disk
registerChildPID
PHPCrawlerProcessCommunication::registerChildPID() in PHPCrawlerProcessCommunication.class.php
Registers the PID of a child-process
removeSitemaps
SitemapCreator::removeSitemaps() in SitemapCreator.class.php
Delete the sitemap directory and all sitemap files
reset
PHPCrawlerBenchmark::reset() in PHPCrawlerBenchmark.class.php
Resets the clock for the given benchmark.
resetAll
PHPCrawlerBenchmark::resetAll() in PHPCrawlerBenchmark.class.php
Resets all clocks for all benchmarks.
resetLinkCache
PHPCrawlerLinkFinder::resetLinkCache() in PHPCrawlerLinkFinder.class.php
Resets/clears the internal link-cache.
resume
PHPCrawler::resume() in PHPCrawler.class.php
Resumes the crawling-process with the given crawler-ID
rmDir
PHPCrawlerUtils::rmDir() in PHPCrawlerUtils.class.php
Deletes a directory recursively
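Recursive deletion has to remove a directory's contents before the directory itself. A minimal sketch using PHP's SPL iterators in child-first order follows; `rm_dir_recursive` is a hypothetical helper and PHPCrawlerUtils::rmDir() may be implemented differently.

```php
<?php
// Sketch only: hypothetical helper, not PHPCrawlerUtils::rmDir().
function rm_dir_recursive(string $dir): bool
{
    if (!is_dir($dir)) {
        return false;
    }
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST // contents before their directory
    );
    foreach ($items as $item) {
        // Symlinked directories are not descended into; the link itself is unlinked.
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());
        } else {
            unlink($item->getPathname());
        }
    }
    return rmdir($dir);
}

// Usage: build a small tree in the temp directory, then remove it.
$base = sys_get_temp_dir() . '/rmdir_demo_' . uniqid();
mkdir($base . '/a/b', 0777, true);
file_put_contents($base . '/a/b/file.txt', 'x');
var_dump(rm_dir_recursive($base));
```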
s
top
$site
SitemapCreator::$site in SitemapCreator.class.php
The URL of the website. It should be fully qualified and normalized.
$sitemaps_count
SitemapCreator::$sitemaps_count in SitemapCreator.class.php
Number of sitemap files created
$sitemaps_dir
SitemapCreator::$sitemaps_dir in SitemapCreator.class.php
Sitemaps directory path, created automatically in prepareSitemapsDir()
$sitemaps_url
SitemapCreator::$sitemaps_url in SitemapCreator.class.php
Base URL of the sitemaps; the sitemap file name will be appended to the end of this URL.
$socket
PHPCrawlerHTTPRequest::$socket in PHPCrawlerHTTPRequest.class.php
The socket used for HTTP-requests
$socketConnectTimeout
PHPCrawlerHTTPRequest::$socketConnectTimeout in PHPCrawlerHTTPRequest.class.php
Timeout-value for socket-connection
$socketReadTimeout
PHPCrawlerHTTPRequest::$socketReadTimeout in PHPCrawlerHTTPRequest.class.php
Socket-read-timeout
$source
PHPCrawlerDocumentInfo::$source in PHPCrawlerDocumentInfo.class.php
Same as "content", the content of the requested document.
$SourceUrl
PHPCrawlerLinkFinder::$SourceUrl in PHPCrawlerLinkFinder.class.php
The URL of the html-source to find links from
$source_domain
PHPCrawlerCookieDescriptor::$source_domain in PHPCrawlerCookieDescriptor.class.php
The domain the cookie was sent from
$source_url
PHPCrawlerResponseHeader::$source_url in PHPCrawlerResponseHeader.class.php
The URL of the website the header was received from.
$source_url
PHPCrawlerCookieDescriptor::$source_url in PHPCrawlerCookieDescriptor.class.php
The URL the cookie was sent from
$sqlite_db_file
PHPCrawlerSQLiteCookieCache::$sqlite_db_file in PHPCrawlerSQLiteCookieCache.class.php
$sqlite_db_file
PHPCrawlerSQLiteURLCache::$sqlite_db_file in PHPCrawlerSQLiteURLCache.class.php
$sqlite_db_file
PHPCrawlerDocumentInfoQueue::$sqlite_db_file in PHPCrawlerDocumentInfoQueue.class.php
$starting_url
PHPCrawler::$starting_url in PHPCrawler.class.php
The URL the crawler should start with.
$starting_url
PHPCrawlerURLFilter::$starting_url in PHPCrawlerURLFilter.class.php
The fully qualified and normalized URL the crawling-process was started with.
$starting_url_parts
PHPCrawlerURLFilter::$starting_url_parts in PHPCrawlerURLFilter.class.php
The URL-parts of the starting-url.
sendRequest
PHPCrawlerHTTPRequest::sendRequest() in PHPCrawlerHTTPRequest.class.php
Sends the HTTP-request and receives the page/file.
sendRequestHeader
PHPCrawlerHTTPRequest::sendRequestHeader() in PHPCrawlerHTTPRequest.class.php
Sends the request-header.
serializeToFile
PHPCrawlerUtils::serializeToFile() in PHPCrawlerUtils.class.php
Serializes data (objects, arrays etc.) and writes it to the given file.
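Serializing to a file is essentially `serialize()` plus a file write, with `unserialize()` for the way back. The helpers below are a hypothetical sketch (`serialize_to_file`, `deserialize_from_file`), not PHPCrawlerUtils' actual code; they take an exclusive lock while writing, on the assumption that several processes might touch the same file.

```php
<?php
// Sketch only: hypothetical helpers in the spirit of
// PHPCrawlerUtils::serializeToFile().
function serialize_to_file(string $file, $data): void
{
    // LOCK_EX takes an exclusive (advisory) lock for the duration of the write.
    if (file_put_contents($file, serialize($data), LOCK_EX) === false) {
        throw new RuntimeException("Could not write $file");
    }
}

function deserialize_from_file(string $file)
{
    $raw = file_get_contents($file);
    if ($raw === false) {
        throw new RuntimeException("Could not read $file");
    }
    return unserialize($raw);
}

$file = tempnam(sys_get_temp_dir(), 'phpcrawl');
$status = ['links_followed' => 42, 'urls' => ['http://www.foo.com/']];
serialize_to_file($file, $status);
var_dump(deserialize_from_file($file) === $status);
```

Since `unserialize()` can instantiate arbitrary classes, it should only be used on files the crawler wrote itself, never on untrusted input.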
setAggressiveLinkExtraction
Alias for enableAggressiveLinkSearch()
setBaseURL
PHPCrawlerURLFilter::setBaseURL() in PHPCrawlerURLFilter.class.php
Sets the base-URL of the crawling-process that some rules relate to
setBasicAuthentication
PHPCrawlerHTTPRequest::setBasicAuthentication() in PHPCrawlerHTTPRequest.class.php
Sets basic-authentication login-data for protected URLs.
setConnectionTimeout
PHPCrawler::setConnectionTimeout() in PHPCrawler.class.php
Sets the timeout in seconds for connection attempts to hosting webservers.
setContentSizeLimit
PHPCrawler::setContentSizeLimit() in PHPCrawler.class.php
Sets the content-size-limit for content the crawler should receive from documents.
setContentSizeLimit
PHPCrawlerHTTPRequest::setContentSizeLimit() in PHPCrawlerHTTPRequest.class.php
Sets the size-limit in bytes for content the request should receive.
setCookieHandling
PHPCrawler::setCookieHandling() in PHPCrawler.class.php
Alias for enableCookieHandling()
setCrawlerDefaults
SitemapCreator::setCrawlerDefaults() in SitemapCreator.class.php
Load default crawler settings for $Crawler
setCrawlerStatus
PHPCrawlerProcessCommunication::setCrawlerStatus() in PHPCrawlerProcessCommunication.class.php
Sets/writes the current crawler-status
setDataDir
SitemapCreator::setDataDir() in SitemapCreator.class.php
Sets the data directory path of the sitemaps $data_dir
setEntries
SitemapCreator::setEntries() in SitemapCreator.class.php
Set the URL entries manually $entries
setEntriesPerSitemap
SitemapCreator::setEntriesPerSitemap() in SitemapCreator.class.php
Sets the number of URLs for each sitemap file $entries_per_sitemap
setFindRedirectURLs
PHPCrawlerHTTPRequest::setFindRedirectURLs() in PHPCrawlerHTTPRequest.class.php
Specifies whether redirect-links set in http-headers should get searched for.
setFollowMode
PHPCrawler::setFollowMode() in PHPCrawler.class.php
Sets the basic follow-mode of the crawler.
setFollowRedirects
PHPCrawler::setFollowRedirects() in PHPCrawler.class.php
Defines whether the crawler should follow redirects sent with headers by a webserver or not.
setFollowRedirectsTillContent
Defines whether the crawler should follow HTTP-redirects until first content was found, regardless of defined filter-rules and follow-modes.
setFrequency
SitemapCreator::setFrequency() in SitemapCreator.class.php
Set Frequency mode $frequency_mode
setHeaderCheckCallbackFunction
setLinkExtractionTags
PHPCrawler::setLinkExtractionTags() in PHPCrawler.class.php
Sets the list of html-tags the crawler should search for links in.
setLinkExtractionTags
PHPCrawlerHTTPRequest::setLinkExtractionTags() in PHPCrawlerHTTPRequest.class.php
Sets the html-tags from which to extract/find links.
setLinksFoundArray
PHPCrawlerDocumentInfo::setLinksFoundArray() in PHPCrawlerDocumentInfo.class.php
Workaround-method, copies and converts the array $links_found_url_descriptors to $links_found.
setMinFrequency
SitemapCreator::setMinFrequency() in SitemapCreator.class.php
Set minimum frequency value for all URLs $min_frequency
setMinPriority
SitemapCreator::setMinPriority() in SitemapCreator.class.php
Set minimum Priority value for all URLs $min_priority
setPageLimit
PHPCrawler::setPageLimit() in PHPCrawler.class.php
Sets a limit to the number of pages/files the crawler should follow.
setPort
PHPCrawler::setPort() in PHPCrawler.class.php
Sets the port to connect to for crawling the starting-url set in setUrl().
setPriority
SitemapCreator::setPriority() in SitemapCreator.class.php
Set Priority mode $priority_mode
setProxy
PHPCrawler::setProxy() in PHPCrawler.class.php
Assigns a proxy-server the crawler should use for all HTTP-Requests.
setProxy
PHPCrawlerHTTPRequest::setProxy() in PHPCrawlerHTTPRequest.class.php
setSite
SitemapCreator::setSite() in SitemapCreator.class.php
Sets the URL of the website $site
setSitemapURL
SitemapCreator::setSitemapURL() in SitemapCreator.class.php
Sets the URL of the sitemap files $sitemaps_url
setSourceUrl
PHPCrawlerLinkFinder::setSourceUrl() in PHPCrawlerLinkFinder.class.php
Sets the source-URL of the document to find links in
setStreamTimeout
PHPCrawler::setStreamTimeout() in PHPCrawler.class.php
Sets the timeout in seconds for waiting for data on an established server-connection.
setTmpFile
PHPCrawlerHTTPRequest::setTmpFile() in PHPCrawlerHTTPRequest.class.php
Sets the temporary file to use when content of found documents should be streamed directly into a temporary file.
setTmpFile
PHPCrawler::setTmpFile() in PHPCrawler.class.php
Has no function anymore.
setTrafficLimit
PHPCrawler::setTrafficLimit() in PHPCrawler.class.php
Sets a limit to the number of bytes the crawler should receive altogether during the crawling-process.
setUrl
PHPCrawlerHTTPRequest::setUrl() in PHPCrawlerHTTPRequest.class.php
Sets the URL for the request.
setURL
PHPCrawler::setURL() in PHPCrawler.class.php
Sets the URL of the first page the crawler should crawl (root-page).
setUrlCacheType
PHPCrawler::setUrlCacheType() in PHPCrawler.class.php
Defines what type of cache will be internally used for caching URLs.
setUserAgentString
PHPCrawler::setUserAgentString() in PHPCrawler.class.php
Sets the "User-Agent" identification-string that will be sent with HTTP-requests.
setWorkingDirectory
PHPCrawler::setWorkingDirectory() in PHPCrawler.class.php
Sets the working-directory the crawler should use for storing temporary data.
SitemapCreator
SitemapCreator in SitemapCreator.class.php
Sitemap Creator creates XML sitemaps files compatible with the standard sitemaps.org protocol and supported by Google and Bing.
SitemapCreator.class.php
SitemapCreator.class.php in SitemapCreator.class.php
SitemapCreatorCrawler.class.php
SitemapCreatorCrawler.class.php in SitemapCreatorCrawler.class.php
SMCCrawler
SMCCrawler in SitemapCreatorCrawler.class.php
Loading external PHPCrawler-class
sort2dArray
PHPCrawlerUtils::sort2dArray() in PHPCrawlerUtils.class.php
Sorts a two-dimensional array.
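Sorting a two-dimensional array usually means ordering a list of rows by one column. A minimal sketch with `usort()` and the spaceship operator follows; the function `sort_rows_by` and its parameters are hypothetical and need not match the signature of PHPCrawlerUtils::sort2dArray().

```php
<?php
// Sketch only: hypothetical helper, not PHPCrawlerUtils::sort2dArray().
function sort_rows_by(array $rows, string $column, bool $descending = false): array
{
    usort($rows, function (array $a, array $b) use ($column, $descending): int {
        $cmp = $a[$column] <=> $b[$column];
        return $descending ? -$cmp : $cmp;
    });
    return $rows; // $rows is a copy, so the caller's array is left untouched
}

$links = [
    ['url' => 'http://www.foo.com/b', 'priority' => 2],
    ['url' => 'http://www.foo.com/a', 'priority' => 5],
    ['url' => 'http://www.foo.com/c', 'priority' => 1],
];
print_r(array_column(sort_rows_by($links, 'priority', true), 'url'));
```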
splitURL
PHPCrawlerUtils::splitURL() in PHPCrawlerUtils.class.php
Splits a URL into its parts
starControllerProcessLoop
Starts the loop of the controller-process (main-process).
start
PHPCrawlerBenchmark::start() in PHPCrawlerBenchmark.class.php
Starts the clock for the given benchmark.
startChildProcessLoop
PHPCrawler::startChildProcessLoop() in PHPCrawler.class.php
Starts the loop of a child-process.
stop
PHPCrawlerBenchmark::stop() in PHPCrawlerBenchmark.class.php
Stops the benchmark-clock for the given benchmark.
t
top
$temporary_benchmarks
PHPCrawlerBenchmark::$temporary_benchmarks in PHPCrawlerBenchmark.class.php
$tmpFile
PHPCrawlerHTTPRequest::$tmpFile in PHPCrawlerHTTPRequest.class.php
The TMP-File to use when a page/file should be streamed to file.
$top_lines_processed
PHPCrawlerLinkFinder::$top_lines_processed in PHPCrawlerLinkFinder.class.php
Flag indicating whether the top lines of the HTML-source were processed.
$traffic_limit
PHPCrawler::$traffic_limit in PHPCrawler.class.php
Limit of bytes to receive
$traffic_limit_reached
PHPCrawlerProcessReport::$traffic_limit_reached in PHPCrawlerProcessReport.class.php
Will be TRUE if the crawling-process stopped because the traffic-limit was reached.
$traffic_limit_reached
PHPCrawlerDocumentInfo::$traffic_limit_reached in PHPCrawlerDocumentInfo.class.php
Indicates whether the traffic-limit set by the user was reached after downloading this document.
toArray
PHPCrawlerUrlPartsDescriptor::toArray() in PHPCrawlerUrlPartsDescriptor.class.php
toArray
PHPCrawlerProcessReport::toArray() in PHPCrawlerProcessReport.class.php
Returns an array with all properties of this class.
toArray
PHPCrawlerDocumentInfo::toArray() in PHPCrawlerDocumentInfo.class.php
Returns an array with all properties of this class.
u
top
$url
PHPCrawlerDocumentInfo::$url in PHPCrawlerDocumentInfo.class.php
The complete, fully qualified URL of the page or file, e.g. "http://www.foo.com/bar/page.html?x=y".
$urlcache_purged
PHPCrawler::$urlcache_purged in PHPCrawler.class.php
Flag indicating whether the URL-cache was purged at the beginning of a crawling-process.
$UrlDescriptor
PHPCrawlerHTTPRequest::$UrlDescriptor in PHPCrawlerHTTPRequest.class.php
The URL for the request as PHPCrawlerURLDescriptor-object
$UrlFilter
PHPCrawler::$UrlFilter in PHPCrawler.class.php
The UrlFilter-object.
$urls
PHPCrawlerMemoryURLCache::$urls in PHPCrawlerMemoryURLCache.class.php
$url_cache_type
PHPCrawler::$url_cache_type in PHPCrawler.class.php
URL cache-type.
$url_distinct_property
PHPCrawlerURLCacheBase::$url_distinct_property in PHPCrawlerURLCacheBase.class.php
Defines which property of a URL is used to ensure that each URL is only cached once.
$url_filter_rules
PHPCrawlerURLFilter::$url_filter_rules in PHPCrawlerURLFilter.class.php
Array containing regex-rules for URLs that should NOT be followed.
$url_follow_rules
PHPCrawlerURLFilter::$url_follow_rules in PHPCrawlerURLFilter.class.php
Array containing regex-rules for URLs that should be followed.
$url_map
PHPCrawlerMemoryURLCache::$url_map in PHPCrawlerMemoryURLCache.class.php
$url_parts
PHPCrawlerHTTPRequest::$url_parts in PHPCrawlerHTTPRequest.class.php
The parts of the URL for the request as returned by PHPCrawlerUtils::splitURL()
$url_priorities
PHPCrawlerURLCacheBase::$url_priorities in PHPCrawlerURLCacheBase.class.php
$url_rebuild
PHPCrawlerURLDescriptor::$url_rebuild in PHPCrawlerURLDescriptor.class.php
The complete, fully qualified and normalized URL.
$useragent
SitemapCreator::$useragent in SitemapCreator.class.php
$userAgentString
PHPCrawlerHTTPRequest::$userAgentString in PHPCrawlerHTTPRequest.class.php
The user-agent-string
$UserSendDataCache
PHPCrawler::$UserSendDataCache in PHPCrawler.class.php
UserSendDataCache-object.
$user_abort
PHPCrawlerProcessReport::$user_abort in PHPCrawlerProcessReport.class.php
Will be TRUE if the crawling-process stopped because the overridable function handleDocumentInfo() returned a negative value.
$use_gzip
SitemapCreator::$use_gzip in SitemapCreator.class.php
Whether to save sitemaps in gzip format.
updateCrawlerStatus
PHPCrawlerProcessCommunication::updateCrawlerStatus() in PHPCrawlerProcessCommunication.class.php
Updates the status of the crawler
URLCACHE_MEMORY
PHPCrawlerUrlCacheTypes::URLCACHE_MEMORY in PHPCrawlerUrlCacheTypes.class.php
URLs get cached in local RAM. Best performance.
URLCACHE_SQLITE
PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE in PHPCrawlerUrlCacheTypes.class.php
URLs get cached in a SQLite-database-file. Recommended for spidering huge websites.
URLHASH_NONE
PHPCrawlerURLCacheBase::URLHASH_NONE in PHPCrawlerURLCacheBase.class.php
URLHASH_RAWLINK
PHPCrawlerURLCacheBase::URLHASH_RAWLINK in PHPCrawlerURLCacheBase.class.php
URLHASH_URL
PHPCrawlerURLCacheBase::URLHASH_URL in PHPCrawlerURLCacheBase.class.php
urlHostInCache
PHPCrawlerDNSCache::urlHostInCache() in PHPCrawlerDNSCache.class.php
Checks whether the hostname of the given URL is already cached
urlMatchesRules
PHPCrawlerURLFilter::urlMatchesRules() in PHPCrawlerURLFilter.class.php
Checks whether a given URL matches the rules.
useGzip
SitemapCreator::useGzip() in SitemapCreator.class.php
Use gzip-compressed sitemap files (sets $use_gzip).
v
top
$value
PHPCrawlerCookieDescriptor::$value in PHPCrawlerCookieDescriptor.class.php
Cookie-value
validSitemapName
SitemapCreator::validSitemapName() in SitemapCreator.class.php
Validates sitemap filename
w
top
$working_base_directory
PHPCrawler::$working_base_directory in PHPCrawler.class.php
Base-directory for temporary directories
$working_directory
PHPCrawlerDocumentInfoQueue::$working_directory in PHPCrawlerDocumentInfoQueue.class.php
$working_directory
PHPCrawlerProcessCommunication::$working_directory in PHPCrawlerProcessCommunication.class.php
$working_directory
PHPCrawler::$working_directory in PHPCrawler.class.php
Complete path to the temporary directory
writeIndex
SitemapCreator::writeIndex() in SitemapCreator.class.php
Write sitemap index file
writeSitemap
SitemapCreator::writeSitemap() in SitemapCreator.class.php
Write sitemap file
writeToCSV
SitemapCreator::writeToCSV() in SitemapCreator.class.php
Write to CSV file
x
top
$xml_foot
SitemapCreator::$xml_foot in SitemapCreator.class.php
$xml_head
SitemapCreator::$xml_head in SitemapCreator.class.php
$xml_url_set
SitemapCreator::$xml_url_set in SitemapCreator.class.php
XML string containing sitemap <urlset></urlset> elements
_
top
__construct
PHPCrawlerSQLiteCookieCache::__construct() in PHPCrawlerSQLiteCookieCache.class.php
__construct
PHPCrawlerRobotsTxtParser::__construct() in PHPCrawlerRobotsTxtParser.class.php
__construct
PHPCrawlerSQLiteURLCache::__construct() in PHPCrawlerSQLiteURLCache.class.php
Initiates an SQLite-URL-cache.
__construct
PHPCrawlerURLDescriptor::__construct() in PHPCrawlerURLDescriptor.class.php
Initiates a URL-descriptor.
__construct
SitemapCreator::__construct() in SitemapCreator.class.php
Initiates a new Sitemap.
__construct
PHPCrawlerResponseHeader::__construct() in PHPCrawlerResponseHeader.class.php
Initiates a new PHPCrawlerResponseHeader.
__construct
PHPCrawlerProcessCommunication::__construct() in PHPCrawlerProcessCommunication.class.php
Initiates a new PHPCrawlerProcessCommunication-object.
__construct
PHPCrawlerDNSCache::__construct() in PHPCrawlerDNSCache.class.php
__construct
PHPCrawlerCookieDescriptor::__construct() in PHPCrawlerCookieDescriptor.class.php
Initiates a new PHPCrawlerCookieDescriptor-object.
__construct
PHPCrawlerDocumentInfoQueue::__construct() in PHPCrawlerDocumentInfoQueue.class.php
Initiates a PHPCrawlerDocumentInfoQueue
__construct
PHPCrawlerHTTPRequest::__construct() in PHPCrawlerHTTPRequest.class.php
__construct
PHPCrawlerLinkFinder::__construct() in PHPCrawlerLinkFinder.class.php
__construct
PHPCrawler::__construct() in PHPCrawler.class.php
Initiates a new crawler.