-
$abort_reason
-
The reason for aborting the crawling-process.
-
$abort_reason
-
The reason why the crawling-process was aborted.
-
$aggressive_search
-
Specifies whether the crawler should also search for links outside of HTML-tags.
-
$auth_password
-
-
$auth_username
-
-
addBasicAuthentication
-
Adds a basic-authentication (username and password) to the list of authentications that will be sent with requests.
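A minimal sketch of how such an authentication could be registered. The URL-regex and the credentials are made-up example values, and $crawler is assumed to be an instance of a PHPCrawler subclass:

```php
<?php
// Assumption: $crawler is an instance of a PHPCrawler subclass.
// The credentials are only sent with requests whose URL matches the regex.
$crawler->addBasicAuthentication("#http://www\.example\.com/protected/.*#", "myuser", "topsecret");
```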
-
addBasicAuthentication
-
Adds a basic-authentication (username and password) to the list of basic authentications that will be sent with requests.
-
addContentTypeReceiveRule
-
Adds a rule to the list of rules that decide which pages or files (regarding their content-type) should be received.
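For illustration, a rule like the following is typically used to restrict receiving to HTML documents. This is only a sketch; $crawler is assumed to be an instance of a PHPCrawler subclass:

```php
<?php
// Assumption: $crawler is an instance of a PHPCrawler subclass.
// Only receive the content of documents whose content-type matches the regex.
$crawler->addContentTypeReceiveRule("#text/html#");
```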
-
addCookie
-
Adds a cookie to the cookie-cache.
-
addCookie
-
Adds a cookie to the cookie-cache.
-
addCookie
-
Adds a cookie to send with the request.
-
addCookie
-
Adds a cookie to the cookie-cache.
-
addCookieDescriptor
-
Adds a cookie to send with the request.
-
addCookieDescriptors
-
Adds a bunch of cookies to send with the request.
-
addCookies
-
Adds a bunch of cookies to the cookie-cache.
-
addCookies
-
Adds a bunch of cookies to the cookie-cache.
-
addCookies
-
Adds a bunch of cookies to the cookie-cache.
-
addDocumentInfo
-
Adds a PHPCrawlerDocumentInfo-object to the queue
-
addFollowMatch
-
Alias for addURLFollowRule().
-
addLinkExtractionTags
-
Adds HTML-tags to the list of tags from which links should be extracted.
-
addLinkPriorities
-
Adds a bunch of link-priorities
-
addLinkPriority
-
Adds a Link-Priority-Level
-
addLinkPriority
-
Adds a regular expression together with a priority-level to the list of rules that decide which links should be preferred.
-
addLinkSearchContentType
-
Adds a rule to the list of rules that decide in which kinds of documents (regarding their content-type) the crawler should search for links.
-
addLinkSearchContentType
-
Adds a rule to the list of rules that decide which kinds of documents (regarding their content-type) should get checked for links.
-
addLinkToCache
-
-
addNonFollowMatch
-
Alias for addURLFilterRule().
-
addPostData
-
Adds post-data to send with the request.
-
addPostData
-
Adds post-data together with a URL-rule to the list of post-data to send with requests.
-
addPostData
-
Adds post-data together with a URL-regex to the list of post-data to send with requests.
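A hedged sketch of how post-data might be attached to matching requests; the URL-regex and the field names are made-up example values:

```php
<?php
// Assumption: $crawler is an instance of a PHPCrawler subclass.
// The post-data is only sent with requests whose URL matches the regex.
$crawler->addPostData("#http://www\.example\.com/login\.php#",
                      array("username" => "me", "password" => "secret"));
```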
-
addReceiveContentType
-
Adds a rule to the list of rules that decide which pages or files (regarding their content-type) should be received.
-
addReceiveContentType
-
Alias for addContentTypeReceiveRule().
-
addReceiveToMemoryMatch
-
Has no function anymore!
-
addReceiveToTmpFileMatch
-
Alias for addStreamToFileContentType().
-
addStreamToFileContentType
-
Adds a rule to the list of rules that decide what types of content should be streamed directly to a temporary file.
-
addStreamToFileContentType
-
Adds a rule to the list of rules that decide what types of content should be streamed directly to a temporary file.
-
addURL
-
Adds a URL to the URL-cache.
-
addURL
-
Adds a URL to the URL-cache.
-
addURL
-
Adds a URL to the URL-cache.
-
addURLFilterRule
-
Adds a rule to the list of rules that decide which URLs found on a page should be ignored by the crawler.
-
addURLFilterRule
-
Adds a rule to the list of rules that decide which URLs found on a page should be ignored by the crawler.
-
addURLFilterRules
-
Adds a bunch of rules to the list of rules that decide which URLs found on a page should be ignored by the crawler.
-
addURLFollowRule
-
-
addURLFollowRule
-
Adds a rule to the list of rules that decide which URLs found on a page should be followed explicitly.
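Follow-rules and filter-rules are both plain regular expressions, as in this sketch (the URLs and patterns are made-up examples; $crawler is assumed to be an instance of a PHPCrawler subclass):

```php
<?php
// Assumption: $crawler is an instance of a PHPCrawler subclass.
// Only follow URLs pointing into the (hypothetical) /docs/ section ...
$crawler->addURLFollowRule("#http://www\.example\.com/docs/.*#");
// ... and ignore links to image files.
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");
```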
-
addURLs
-
Adds a bunch of URLs to the URL-cache.
-
addURLs
-
Adds a bunch of URLs to the URL-cache.
-
addURLs
-
Adds a bunch of URLs to the URL-cache.
-
addURL_Entry
-
-
$child_process_number
-
Number of the child-process (NOT the PID!).
-
$class_version
-
-
$content
-
The content of the requested document (html-sourcecode or content of file).
-
$content_length
-
The content-length as stated in the header.
-
$content_size_limit
-
Limit for content-size to receive
-
$content_tmp_file
-
The temporary file to which the content was received.
-
$content_type
-
The content-type of the page or file, e.g. "text/html" or "image/gif".
-
$content_type
-
The content-type
-
$CookieCache
-
The PHPCrawlerCookieCache-Object
-
$cookies
-
All cookies found in the header
-
$cookies
-
Cookies sent by the server.
-
$cookies
-
-
$cookie_array
-
Array containing cookies to send with the request
-
$cookie_handling_enabled
-
Flag indicating whether cookie-handling is enabled or disabled.
-
$cookie_send_time
-
The time the cookie was sent.
-
$crawlerStatus
-
-
$crawler_uniqid
-
-
$crawler_uniqid
-
UID of this instance of the crawler
-
$CurrentDocumentInfo
-
Current PHPCrawlerDocumentInfo-object of the current document
-
checkForAbort
-
Checks if the crawling-process should be aborted.
-
checkRegexPattern
-
Checks whether a given RegEx-pattern is valid or not.
-
checkStringAgainstRegexArray
-
Checks whether a given string matches with one of the given regular-expressions.
-
childProcessAlive
-
Checks whether any child-processes are (still) running.
-
cleanup
-
Cleans up the cache once it is not needed anymore.
-
cleanup
-
Performs cleanups after the cache is not needed anymore.
-
cleanup
-
Cleans up the crawler after it has finished.
-
cleanup
-
Has no function in this class.
-
clear
-
Removes all URLs and all priority-rules from the URL-cache.
-
clear
-
Removes all URLs and all priority-rules from the URL-cache.
-
clear
-
Removes all URLs and all priority-rules from the URL-cache.
-
clearCookies
-
Removes all cookies to send with the request.
-
clearPostData
-
Removes all post-data to send with the request.
-
containsURLs
-
Checks whether there are URLs left in the cache or not.
-
containsURLs
-
Checks whether there are URLs left in the cache that should be processed or not.
-
containsURLs
-
Checks whether there are URLs left in the cache or not.
-
createPreparedInsertStatement
-
Creates the prepared statement for inserting URLs into the database (if not done yet).
-
createPreparedStatements
-
-
createWorkingDirectory
-
Creates the working-directory for this instance of the crawler.
-
$general_follow_mode
-
The general follow-mode of the crawler
-
$global_traffic_count
-
Global counter for traffic this instance of the HTTPRequest-class caused.
-
getAllBenchmarks
-
Returns all registered benchmark-results.
-
getAllMetaAttributes
-
Returns all meta-tag attributes found so far in the document.
-
getAllURLs
-
Returns all URLs currently cached in the URL-cache.
-
getAllURLs
-
Returns all URLs/links found so far in the document.
-
getAllURLs
-
Returns all URLs currently cached in the URL-cache.
-
getAllURLs
-
Has no function in this class
-
getApplyingLines
-
Returns all raw lines in the given robots.txt-content that apply to the given useragent-string.
-
getBaseUrlFromMetaTag
-
Returns the base-URL specified in a meta-tag in the given HTML-source
-
getBasicAuthenticationForUrl
-
Returns the basic-authentication (username and password) that should be sent to the given URL.
-
getCallCount
-
-
getChildPIDs
-
Returns the PIDs of all running child-processes.
-
getCookiesForUrl
-
Returns all cookies from the cache that are addressed to the given URL.
-
getCookiesForUrl
-
Returns all cookies from the cache that are addressed to the given URL.
-
getCookiesForUrl
-
Returns all cookies from the cache that are addressed to the given URL.
-
getCookiesFromHeader
-
Returns all cookies from the given response-header.
-
getCrawlerId
-
Returns the unique ID of the instance of the crawler
-
getCrawlerStatus
-
Returns/reads the current crawler-status
-
getDistinctURLHash
-
Returns the distinct-hash for the given URL, ensuring that no URL is cached more than once.
-
getDocumentInfoCount
-
Returns the current number of PHPCrawlerDocumentInfo-objects in the queue
-
getElapsedTime
-
Gets the elapsed time for the given benchmark.
-
getFromHeaderLine
-
Returns a PHPCrawlerCookieDescriptor-object initiated by the given cookie-header-line.
-
getGlobalTrafficCount
-
Returns the global traffic this instance of the HTTPRequest-class caused so far.
-
getHeaderValue
-
Gets the value of a header-directive from the given HTTP-header.
-
getHTTPStatusCode
-
Gets the HTTP-statuscode from a given response-header.
-
getIP
-
Returns the IP for the given hostname.
-
getLastModified
-
Gets the Last-Modified-header.
-
getMaxPriorityLevel
-
Returns the highest priority-level for which a URL exists in the cache.
-
getMetaTagAttributes
-
Gets all meta-tag attributes from the given HTML-source.
-
getmicrotime
-
Returns the current time in seconds and milliseconds.
-
getNextDocumentInfo
-
Returns a PHPCrawlerDocumentInfo-object from the queue
-
getNextUrl
-
Returns the next URL from the cache that should be crawled.
-
getNextUrl
-
Returns the next URL from the cache that should be crawled.
-
getNextUrl
-
Returns the next URL from the cache that should be crawled.
-
getPostDataForUrl
-
Returns the post-data (key and value) that should be send to the given URL.
-
getProcessReport
-
Returns summarizing report-information about the crawling-process after it has finished.
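A sketch of reading the report after a finished run. The property names follow the PHPCrawlerProcessReport entries; $crawler is assumed to have finished crawling (e.g. after a call to go()):

```php
<?php
// Assumption: $crawler is a PHPCrawler subclass instance that has
// already finished crawling (e.g. after a call to go()).
$report = $crawler->getProcessReport();
echo "Links followed: " . $report->links_followed . "\n";
echo "Documents received: " . $report->files_received . "\n";
echo "Bytes received: " . $report->bytes_received . "\n";
```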
-
getRedirectURLFromHeader
-
Returns the redirect-URL from the given HTML-header
-
getReport
-
Returns an array with summarizing report-information after the crawling-process has finished.
-
getRobotsTxtContent
-
Retrieves the content of a robots.txt-file.
-
getRobotsTxtURL
-
Returns the robots.txt-URL related to the given URL.
-
getRootUrl
-
Returns the normalized root-URL of the given URL
-
getSystemTempDir
-
Determines the system's temporary directory.
-
getUrlCount
-
-
getUrlPriority
-
Gets the priority-level of the given URL
-
go
-
Starts the crawling process in single-process-mode.
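A minimal single-process run might look like this sketch, where MyCrawler is assumed to be a user-defined subclass of PHPCrawler and the start-URL is a made-up example:

```php
<?php
// Assumption: MyCrawler is a user-defined subclass of PHPCrawler
// that overrides handleDocumentInfo().
$crawler = new MyCrawler();
$crawler->setURL("www.example.com"); // example start-URL
$crawler->go();                      // single-process mode
```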
-
goMultiProcessed
-
Starts the crawler using multiple processes.
-
$header
-
The complete HTTP-header the webserver responded with for this page or file.
-
$header_check_callback_function
-
-
$header_raw
-
The raw HTTP-header as it was sent by the server.
-
$header_send
-
The complete HTTP-request-header the crawler sent to the server (debugging info).
-
$host
-
The host-part of the URL of the requested page or file, e.g. "www.foo.com".
-
$host
-
-
$host_ip_array
-
Array for caching IPs of the requested hostnames
-
$http_status_code
-
The HTTP-statuscode
-
$http_status_code
-
The HTTP-statuscode the webserver responded for the request, e.g. 200 (OK) or 404 (file not found).
-
handleDocumentInfo
-
Override this method to get access to all information about a page or file the crawler found and received.
-
handleDocumentInfo
-
Override this method to get access to all information about a page or file the crawler found and received.
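A sketch of a typical override; PHPCrawlerDocumentInfo exposes properties such as $url and $http_status_code (see the corresponding entries in this index):

```php
<?php
// A sketch of overriding handleDocumentInfo() in a PHPCrawler subclass.
class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
  {
    // Print the URL and HTTP status-code of each received document.
    echo $DocInfo->url . " (" . $DocInfo->http_status_code . ")\n";
    // Returning a negative value aborts the crawling-process.
  }
}
```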
-
handleHeaderInfo
-
Overridable method that will be called after the header of a document was received and BEFORE the content will be received.
-
handlePageData
-
Override this method to get access to all information about a page or file the crawler found and received.
-
hostInCache
-
Checks whether a hostname is already cached.
-
$PageRequest
-
A PHPCrawlerHTTPRequest-object for requesting robots.txt-files.
-
$PageRequest
-
The PHPCrawlerHTTPRequest-Object
-
$path
-
The path in the URL of the requested page or file, e.g. "/page/".
-
$path
-
Cookie-path
-
$path
-
-
$PDO
-
-
$PDO
-
-
$PDO
-
PDO-object for querying SQLite-file.
-
$porcess_abort_reason
-
The reason why the process was aborted/finished.
-
$port
-
The port of the URL the request was sent to, e.g. 80.
-
$port
-
-
$post_data
-
Array containing POST-data to send with the request
-
$post_data
-
Array containing post-data to send.
-
$PreparedInsertStatement
-
Prepared statement for inserting URLs into the db-file, as a PDOStatement-object.
-
$prepared_statements_created
-
-
$ProcessCommunication
-
ProcessCommunication-object
-
$process_runtime
-
The total time the crawling-process was running in seconds.
-
$protocol
-
The protocol-part of the URL of the page or file, e.g. "http://"
-
$protocol
-
-
$proxy
-
The proxy to use
-
PHPCrawlerCookieCacheBase.class.php
-
-
PHPCrawlerMemoryCookieCache.class.php
-
-
PHPCrawlerSQLiteCookieCache.class.php
-
-
PHPCrawler.class.php
-
-
PHPCrawlerBenchmark.class.php
-
-
PHPCrawlerCookieDescriptor.class.php
-
-
PHPCrawlerDNSCache.class.php
-
-
PHPCrawlerDocumentInfo.class.php
-
-
PHPCrawlerHTTPRequest.class.php
-
-
PHPCrawlerLinkFinder.class.php
-
-
PHPCrawlerProcessReport.class.php
-
-
PHPCrawlerResponseHeader.class.php
-
-
PHPCrawlerRobotsTxtParser.class.php
-
-
PHPCrawlerStatus.class.php
-
-
PHPCrawlerURLDescriptor.class.php
-
-
PHPCrawlerURLFilter.class.php
-
-
PHPCrawlerUrlPartsDescriptor.class.php
-
-
PHPCrawlerUserSendDataCache.class.php
-
-
PHPCrawlerUtils.class.php
-
-
PHPCrawlerDocumentInfoQueue.class.php
-
-
PHPCrawlerProcessCommunication.class.php
-
-
PHPCrawlerMemoryURLCache.class.php
-
-
PHPCrawlerSQLiteURLCache.class.php
-
-
PHPCrawlerURLCacheBase.class.php
-
-
parseRobotsTxt
-
Parses the robots.txt-file related to the given URL and returns regular-expression rules corresponding to the contained "disallow"-rules that are addressed to the given user-agent.
-
PHPCrawler
-
The main class of PHPCrawl.
-
PHPCrawlerBenchmark
-
A static benchmark-class for doing benchmarks within phpcrawl.
-
PHPCrawlerCookieCacheBase
-
Abstract baseclass for storing cookies.
-
PHPCrawlerCookieDescriptor
-
Describes a cookie within the PHPCrawl-system.
-
PHPCrawlerDNSCache
-
Simple DNS-cache used by phpcrawl.
-
PHPCrawlerDocumentInfo
-
Contains information about a page or file the crawler found and received during the crawling-process.
-
PHPCrawlerDocumentInfoQueue
-
Queue for PHPCrawlerDocumentInfo-objects
-
PHPCrawlerHTTPRequest
-
Class for performing HTTP-requests.
-
PHPCrawlerLinkFinder
-
Class for finding links in HTML-documents.
-
PHPCrawlerMemoryCookieCache
-
Class for storing/caching cookies in memory.
-
PHPCrawlerMemoryURLCache
-
Class for caching/storing URLs/links in memory.
-
PHPCrawlerProcessCommunication
-
Class containing methods for process handling and communication
-
PHPCrawlerProcessReport
-
Contains summarizing information about a crawling-process after the process is finished.
-
PHPCrawlerResponseHeader
-
Describes an HTTP response-header within the phpcrawl-system.
-
PHPCrawlerRobotsTxtParser
-
Class for parsing robots.txt-files.
-
PHPCrawlerSQLiteCookieCache
-
Class for storing/caching cookies in a SQLite-db-file.
-
PHPCrawlerSQLiteURLCache
-
Class for caching/storing URLs/links in a SQLite-database-file.
-
PHPCrawlerStatus
-
Describes the current status of a crawler-instance.
-
PHPCrawlerURLCacheBase
-
Abstract baseclass for implemented URL-caching classes.
-
PHPCrawlerURLDescriptor
-
Describes a URL within the PHPCrawl-system.
-
PHPCrawlerURLFilter
-
Class for filtering URLs by given filter-rules.
-
PHPCrawlerUrlPartsDescriptor
-
Describes the single parts of a URL.
-
PHPCrawlerUserSendDataCache
-
Cache for storing user-data to send with requests, like cookies, post-data and basic-authentications.
-
PHPCrawlerUtils
-
Static util-methods used by phpcrawl.
-
prepareHTTPRequestQuery
-
Prepares the given HTTP-query-string for the HTTP-request.
-
printAllBenchmarks
-
-
processHTTPHeader
-
Processes the response-header of the document.
-
processRobotsTxt
-
-
processUrl
-
Receives and processes the given URL
-
purgeCache
-
Has no function in this class.
-
purgeCache
-
Cleans/purges the URL-cache from inconsistent entries.
-
purgeCache
-
Cleans/purges the URL-cache from inconsistent entries.
-
$received
-
Flag indicating whether content was received from the page or file.
-
$received_completely
-
Flag indicating whether content was completely received from the page or file.
-
$received_completly
-
Alias for received_completely; it was misspelled in previous versions of phpcrawl.
-
$received_to_file
-
Will be true if the content was received into temporary file.
-
$received_to_memory
-
Will be true if the content was received into local memory.
-
$receive_content_types
-
Contains all rules defining the content-types that should be received
-
$receive_to_file_content_types
-
Contains all rules defining the content-types of pages/files that should be streamed directly to a temporary file (instead of to memory)
-
$referer_url
-
The complete URL of the page that contained the link to this document.
-
$refering_linkcode
-
The html-sourcecode that contained the link to the current document.
-
$refering_linktext
-
The linktext of the link that "linked" to this document.
-
$refering_link_raw
-
Contains the raw link as it was found in the content of the referring URL (e.g. "../foo.html").
-
$refering_url
-
The URL of the page that contained the link to the URL described here.
-
$responseHeader
-
The complete HTTP-header the webserver responded with for this page or file, as a PHPCrawlerResponseHeader-object.
-
$resumtion_enabled
-
Flag indicating whether resumption is enabled.
-
$resumtion_enabled
-
Flag indicating whether resumption is enabled.
-
$RobotsTxtParser
-
The RobotsTxtParser-Object
-
readResponseContent
-
Reads the response-content.
-
readResponseHeader
-
Reads the response-header.
-
registerChildPID
-
Registers the PID of a child-process
-
reset
-
Resets the clock for the given benchmark.
-
resetAll
-
Resets all clocks for all benchmarks.
-
resetLinkCache
-
Resets/clears the internal link-cache.
-
resume
-
Resumes the crawling-process with the given crawler-ID
-
rmDir
-
Deletes a directory recursively.
-
$socket
-
The socket used for HTTP-requests
-
$socketConnectTimeout
-
Timeout-value for socket-connection
-
$socketReadTimeout
-
Socket-read-timeout
-
$source
-
Same as "content", the content of the requested document.
-
$SourceUrl
-
The URL of the html-source to find links from
-
$source_domain
-
The domain the cookie was sent from.
-
$source_url
-
The URL the cookie was sent from.
-
$source_url
-
The URL of the website the header was received from.
-
$sqlite_db_file
-
-
$sqlite_db_file
-
-
$sqlite_db_file
-
-
$starting_url
-
The fully qualified and normalized URL the crawling-process was started with.
-
$starting_url
-
The URL the crawler should start with.
-
$starting_url_parts
-
The URL-parts of the starting-url.
-
sendRequest
-
Sends the HTTP-request and receives the page/file.
-
sendRequestHeader
-
Send the request-header.
-
serializeToFile
-
Serializes data (objects, arrays etc.) and writes it to the given file.
-
setAggressiveLinkExtraction
-
Alias for enableAggressiveLinkSearch()
-
setBaseURL
-
Sets the base-URL of the crawling-process to which some rules relate.
-
setBasicAuthentication
-
Sets basic-authentication login-data for protected URLs.
-
setConnectionTimeout
-
Sets the timeout in seconds for connection tries to hosting webservers.
-
setContentSizeLimit
-
Sets the content-size-limit for content the crawler should receive from documents.
-
setContentSizeLimit
-
Sets the size-limit in bytes for content the request should receive.
-
setCookieHandling
-
Alias for enableCookieHandling()
-
setCrawlerStatus
-
Sets/writes the current crawler-status
-
setFindRedirectURLs
-
Specifies whether redirect-URLs set in HTTP-headers should be searched for.
-
setFollowMode
-
Sets the basic follow-mode of the crawler.
-
setFollowRedirects
-
Defines whether the crawler should follow redirects sent with headers by a webserver or not.
-
setFollowRedirectsTillContent
-
Defines whether the crawler should follow HTTP-redirects until first content was found, regardless of defined filter-rules and follow-modes.
-
setHeaderCheckCallbackFunction
-
-
setLinkExtractionTags
-
Sets the html-tags from which links should be extracted.
-
setLinkExtractionTags
-
Sets the list of html-tags the crawler should search for links in.
-
setLinksFoundArray
-
Workaround-method, copies and converts the array $links_found_url_descriptors to $links_found.
-
setPageLimit
-
Sets a limit to the number of pages/files the crawler should follow.
-
setPort
-
Sets the port to connect to for crawling the starting-url set in setUrl().
-
setProxy
-
Assigns a proxy-server the crawler should use for all HTTP-Requests.
-
setProxy
-
-
setSourceUrl
-
Sets the source-URL of the document to find links in
-
setStreamTimeout
-
Sets the timeout in seconds for waiting for data on an established server-connection.
-
setTmpFile
-
Has no function anymore.
-
setTmpFile
-
Sets the temporary file to use when content of found documents should be streamed directly into a temporary file.
-
setTrafficLimit
-
Sets a limit to the number of bytes the crawler should receive altogether during the crawling-process.
-
setUrl
-
Sets the URL for the request.
-
setURL
-
Sets the URL of the first page the crawler should crawl (root-page).
-
setUrlCacheType
-
Defines what type of cache will be internally used for caching URLs.
-
setUserAgentString
-
Sets the "User-Agent" identification-string that will be sent with HTTP-requests.
-
setWorkingDirectory
-
Sets the working-directory the crawler should use for storing temporary data.
-
SMCCrawler
-
Loads the external PHPCrawler-class.
-
sort2dArray
-
Sorts a two-dimensional array.
-
splitURL
-
Splits a URL into its parts.
-
starControllerProcessLoop
-
Starts the loop of the controller-process (main-process).
-
start
-
Starts the clock for the given benchmark.
-
startChildProcessLoop
-
Starts the loop of a child-process.
-
stop
-
Stops the benchmark-clock for the given benchmark.
-
$url
-
The complete, full qualified URL of the page or file, e.g. "http://www.foo.com/bar/page.html?x=y".
-
$urlcache_purged
-
Flag indicating whether the URL-cache was purged at the beginning of a crawling-process.
-
$UrlDescriptor
-
The URL for the request as PHPCrawlerURLDescriptor-object
-
$UrlFilter
-
The UrlFilter-Object
-
$urls
-
-
$url_cache_type
-
The URL-cache-type.
-
$url_distinct_property
-
Defines which property of an URL is used to ensure that each URL is only cached once.
-
$url_filter_rules
-
Array containing regex-rules for URLs that should NOT be followed.
-
$url_follow_rules
-
Array containing regex-rules for URLs that should be followed.
-
$url_map
-
-
$url_parts
-
The parts of the URL for the request as returned by PHPCrawlerUtils::splitURL()
-
$url_priorities
-
-
$url_rebuild
-
The complete, full qualified and normalized URL
-
$userAgentString
-
The user-agent-string
-
$UserSendDataCache
-
The UserSendDataCache-object.
-
$user_abort
-
Will be TRUE if the crawling-process stopped because the overridable function handleDocumentInfo() returned a negative value.
-
updateCrawlerStatus
-
Updates the status of the crawler
-
URLHASH_NONE
-
-
URLHASH_RAWLINK
-
-
URLHASH_URL
-
-
urlHostInCache
-
Checks whether the hostname of the given URL is already cached
-
urlMatchesRules
-
Checks whether a given URL matches the rules.