Documentation
Overview
Package config takes care of the configuration file parsing.
Types
type Config
type Config struct {
	// CloneDir is the path to the folder where all repositories are cloned.
	CloneDir string `json:"clone_dir"`

	// TarRepos tells whether repositories shall be stored as tar archives.
	TarRepos bool `json:"tar_repositories"`

	// TmpDir can be used to specify a temporary working directory. If
	// left unspecified, the default system temporary directory will be used.
	// If you have a ramdisk, you are advised to use it here.
	TmpDir string `json:"tmp_dir"`

	// TmpDirFileSizeLimit can be used to specify the maximum size in GB of an
	// object to be temporarily placed in TmpDir for processing. Files larger
	// than this value will not be processed in TmpDir.
	TmpDirFileSizeLimit float64 `json:"tmp_dir_file_size_limit"`

	// MaxFetcherWorkers defines the maximum number of workers for the
	// repository fetching task.
	// It defaults to 1, but if your machine has good I/O throughput and a good
	// CPU, you probably want to increase this conservative value for
	// performance reasons. Note that fetching is I/O and network bound
	// more than CPU bound, so you probably do not want to increase this
	// value too much.
	MaxFetcherWorkers uint `json:"max_fetcher_workers"`

	// FetchTimeInterval corresponds to the time to wait between two full
	// repository fetching periods.
	FetchTimeInterval string `json:"fetch_time_interval"`

	// FetchLanguages is the list of programming languages to fetch.
	// If the list is empty or nil, the fetcher will fetch all repositories,
	// independently of the language.
	FetchLanguages []string `json:"fetch_languages"`

	// ThrottlerWaitTime can be used to specify how much time to wait, in
	// seconds, before resuming normal operations if the error rate is too high
	// (defaults to 1800).
	ThrottlerWaitTime uint `json:"throttler_wait_time"`

	// SlidingWindowSize can be used to specify the sliding window size to
	// consider for error throttling (defaults to 60).
	SlidingWindowSize uint `json:"throttler_sliding_window_size"`

	// LeakInterval corresponds to the time, in milliseconds, the throttler
	// waits before discarding an error (defaults to 1000, i.e. 1 second).
	LeakInterval uint `json:"throttler_leak_interval"`

	// Crawlers is a group of crawler configurations.
	Crawlers []CrawlerConfig `json:"crawlers"`

	// CrawlingTimeInterval corresponds to the time to wait between two full
	// crawling periods.
	CrawlingTimeInterval string `json:"crawling_time_interval"`

	// Database is the database configuration.
	Database DatabaseConfig `json:"database"`
}
Config is the main configuration structure.
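The documented defaults (1 fetcher worker, a throttler wait time of 1800 seconds, a sliding window of 60, a leak interval of 1000 ms) can be illustrated with a small sketch. The `throttlerSettings` type and both functions below are hypothetical and are not part of the package. Treating a zero value as "unset" is also an assumption, since the package may apply its defaults differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// throttlerSettings is a hypothetical, trimmed-down mirror of the fetcher and
// throttler fields of Config. It reuses the JSON keys documented above.
type throttlerSettings struct {
	MaxFetcherWorkers uint `json:"max_fetcher_workers"`
	ThrottlerWaitTime uint `json:"throttler_wait_time"`
	SlidingWindowSize uint `json:"throttler_sliding_window_size"`
	LeakInterval      uint `json:"throttler_leak_interval"`
}

// withDefaults replaces zero values with the defaults documented on Config.
// Assumption: a zero value means the key was not provided.
func withDefaults(s throttlerSettings) throttlerSettings {
	if s.MaxFetcherWorkers == 0 {
		s.MaxFetcherWorkers = 1
	}
	if s.ThrottlerWaitTime == 0 {
		s.ThrottlerWaitTime = 1800 // seconds
	}
	if s.SlidingWindowSize == 0 {
		s.SlidingWindowSize = 60
	}
	if s.LeakInterval == 0 {
		s.LeakInterval = 1000 // milliseconds
	}
	return s
}

// decodeSettings decodes a JSON document and then applies the defaults.
func decodeSettings(data []byte) (throttlerSettings, error) {
	var s throttlerSettings
	if err := json.Unmarshal(data, &s); err != nil {
		return s, err
	}
	return withDefaults(s), nil
}

func main() {
	s, err := decodeSettings([]byte(`{"max_fetcher_workers": 4, "throttler_wait_time": 600}`))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```

Keys that are absent from the JSON document decode to zero, which is why the sketch can fill them in after decoding.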
func ReadConfig
ReadConfig reads a JSON-formatted configuration file, verifies the values of the configuration parameters, and fills the Config structure.
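The read, verify, and fill sequence described for ReadConfig might look roughly like the sketch below. The signature of ReadConfig is not shown on this page, so the sketch uses its own hypothetical names (`miniConfig`, `parseMiniConfig`, `readMiniConfig`) and covers only a few fields. The two validation checks are illustrative assumptions and are not the package's actual rules.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// miniConfig is a hypothetical subset of Config used only for this sketch.
type miniConfig struct {
	CloneDir            string  `json:"clone_dir"`
	TmpDir              string  `json:"tmp_dir"`
	TmpDirFileSizeLimit float64 `json:"tmp_dir_file_size_limit"`
}

// parseMiniConfig decodes JSON data and applies illustrative checks.
// Assumption: the real package may verify different things.
func parseMiniConfig(data []byte) (*miniConfig, error) {
	var cfg miniConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	if cfg.CloneDir == "" {
		return nil, errors.New("clone_dir must be set")
	}
	if cfg.TmpDirFileSizeLimit < 0 {
		return nil, errors.New("tmp_dir_file_size_limit must not be negative")
	}
	return &cfg, nil
}

// readMiniConfig reads the file at path and hands its content to the parser.
func readMiniConfig(path string) (*miniConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return parseMiniConfig(data)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sketch <config.json>")
		return
	}
	cfg, err := readMiniConfig(os.Args[1])
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("%+v\n", *cfg)
}
```

Splitting parsing from file access keeps the validation logic testable without touching the file system.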
type CrawlerConfig
type CrawlerConfig struct {
	// Type defines the crawler type (e.g. "github").
	Type string `json:"type"`

	// Languages is the list of programming languages of interest.
	Languages []string `json:"languages"`

	// Limit limits the number of repositories to crawl. Set this value to 0 to
	// not use a limit. Otherwise, crawling will stop when "limit" repositories
	// have been fetched.
	// Note that the behavior differs slightly depending on whether
	// UseSearchAPI is set to true. When using the search API, this limit
	// corresponds to the number of repositories to crawl per language listed
	// in "languages". Otherwise, this is a global limit, regardless of the
	// language.
	Limit int64 `json:"limit"`

	// SinceID corresponds to the repository ID (e.g. the GitHub repository ID
	// in the case of the github crawler) from which to start querying
	// repositories.
	// Note that this value is ignored when using the search API.
	SinceID int `json:"since_id"`

	// Fork indicates whether "fork" repositories need to be crawled or not.
	Fork bool `json:"fork"`

	// OAuthAccessToken is the API token. If not provided, crawld will work but
	// the number of API calls is usually limited to a low number.
	// For instance, in the case of the GitHub crawler, unauthenticated
	// requests are limited to 60 per hour whereas authenticated requests go up
	// to 5000 per hour.
	OAuthAccessToken string `json:"oauth_access_token"`

	// UseSearchAPI specifies whether to use the search API or not. The number
	// of results returned by a search API is usually limited. For instance,
	// the GitHub search API limits the results to 1000 repositories.
	// In the case of the github crawler, this means that the maximum number of
	// repositories that can be crawled is 1000 per language (the github crawler
	// orders the results by repository popularity with regard to the number of
	// stars). When a lot of data is wanted, this option shall therefore be set
	// to false.
	UseSearchAPI bool `json:"use_search_api"`
}
CrawlerConfig is a configuration for a crawler.
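How Limit interacts with UseSearchAPI can be made concrete with a small calculation. The sketch below is an interpretation of the field comments and not code from the package: `crawlerLimits`, `searchAPICap`, and `maxRepositories` are hypothetical names, the cap of 1000 is the GitHub search API figure quoted above, and treating negative limits like 0 is an assumption.

```go
package main

import "fmt"

// crawlerLimits is a hypothetical subset of CrawlerConfig.
type crawlerLimits struct {
	Languages    []string
	Limit        int64
	UseSearchAPI bool
}

// searchAPICap is the per-language result cap quoted for the GitHub search API.
const searchAPICap int64 = 1000

// maxRepositories returns an upper bound on the number of repositories a
// crawler could return, and whether such a bound exists at all.
func maxRepositories(c crawlerLimits) (int64, bool) {
	if !c.UseSearchAPI {
		// Global limit; 0 means no limit (negative values are treated
		// the same way here, which is an assumption).
		if c.Limit <= 0 {
			return 0, false
		}
		return c.Limit, true
	}
	// Search API: the limit applies per language and cannot exceed the cap.
	perLanguage := searchAPICap
	if c.Limit > 0 && c.Limit < searchAPICap {
		perLanguage = c.Limit
	}
	return perLanguage * int64(len(c.Languages)), true
}

func main() {
	search := crawlerLimits{Languages: []string{"go", "rust"}, Limit: 5000, UseSearchAPI: true}
	n, bounded := maxRepositories(search)
	fmt.Println(n, bounded) // prints "2000 true"

	listing := crawlerLimits{Languages: []string{"go", "rust"}, Limit: 0}
	n, bounded = maxRepositories(listing)
	fmt.Println(n, bounded) // prints "0 false"
}
```

The first case shows why the search API is a poor fit for large crawls: a limit of 5000 still yields at most 1000 repositories per language.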
type DatabaseConfig
type DatabaseConfig struct {
	// HostName is the hostname, or IP address, of the database server.
	HostName string `json:"hostname"`

	// Port is the PostgreSQL port.
	Port uint `json:"port"`

	// UserName is the PostgreSQL user that has access to the database.
	UserName string `json:"username"`

	// Password is the password of the database user.
	Password string `json:"password"`

	// DBName is the database name.
	DBName string `json:"dbname"`

	// SSLMode defines the SSL mode for the connection to the database.
	// Refer to sslModes for the possible values and their meaning.
	SSLMode string `json:"ssl_mode"`
}
DatabaseConfig holds the PostgreSQL database connection information.
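The fields of DatabaseConfig map naturally onto a PostgreSQL connection URI. The sketch below shows one way to assemble such a URI with the standard library. How the package itself builds its connection string is not documented here, so `dbSettings` and `connectionURL` are hypothetical names and the URI form is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// dbSettings is a hypothetical mirror of DatabaseConfig.
type dbSettings struct {
	HostName string
	Port     uint
	UserName string
	Password string
	DBName   string
	SSLMode  string
}

// connectionURL assembles a postgres:// URI. net/url takes care of escaping
// special characters in the user name, password, and query values.
func connectionURL(c dbSettings) string {
	query := url.Values{}
	query.Set("sslmode", c.SSLMode)

	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(c.UserName, c.Password),
		Host:     net.JoinHostPort(c.HostName, strconv.FormatUint(uint64(c.Port), 10)),
		Path:     "/" + c.DBName,
		RawQuery: query.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(connectionURL(dbSettings{
		HostName: "localhost",
		Port:     5432,
		UserName: "crawld",
		Password: "secret",
		DBName:   "crawld",
		SSLMode:  "disable",
	}))
	// prints "postgres://crawld:secret@localhost:5432/crawld?sslmode=disable"
}
```

The sketch does not validate SSLMode, because the set of accepted values (sslModes) is not listed on this page.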