Fitter + Fitter CLI
Fitter - a new way to collect information from APIs and websites
Fitter CLI - a small command-line tool that returns Fitter results for test/debug/home usage
Fitter Lib - a library that provides the functionality of Fitter CLI
Ways to collect information
- Server - parses the response from an API or HTTP request (uses http.Client)
- Browser - emulates a real browser using Chromium + Docker + Playwright/Cypress and reads the DOM
- Static - parses a static string as data
Formats that can be parsed
- JSON - parses JSON to get specific information
- XML - parses the XML tree to get specific information
- HTML - parses the DOM tree to get specific information
- XPath - parses the DOM tree to get specific information, but via XPath
Use as a library
go get github.com/PxyUp/fitter
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PxyUp/fitter/lib"
	"github.com/PxyUp/fitter/pkg/config"
)

func main() {
	res, err := lib.Parse(&config.Item{
		ConnectorConfig: &config.ConnectorConfig{
			ResponseType: config.Json,
			Url:          "https://random-data-api.com/api/appliance/random_appliance",
			ServerConfig: &config.ServerConnectorConfig{
				Method: http.MethodGet,
			},
		},
		Model: &config.Model{
			ObjectConfig: &config.ObjectConfig{
				Fields: map[string]*config.Field{
					"my_id": {
						BaseField: &config.BaseField{
							Type: config.Int,
							Path: "id",
						},
					},
					"generated_id": {
						BaseField: &config.BaseField{
							Generated: &config.GeneratedFieldConfig{
								UUID: &config.UUIDGeneratedFieldConfig{},
							},
						},
					},
					"generated_array": {
						ArrayConfig: &config.ArrayConfig{
							RootPath: "@this|@keys",
							ItemConfig: &config.ObjectConfig{
								Field: &config.BaseField{
									Type: config.String,
								},
							},
						},
					},
				},
			},
		},
	}, nil, nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res.ToJson())
}
Output:
{
"generated_array": ["id","uid","brand","equipment"],
"my_id": 6000,
"generated_id": "26b08b73-2f2e-444d-bcf2-dac77ac3130e"
}
How to use Fitter
Download the latest version from the release page
or run it locally:
go run cmd/fitter/main.go --path=./examples/config_api.json
Arguments
- --path - string[config.yaml] - path to the Fitter configuration file
- --verbose - bool[false] - enable logging
- --plugins - string[""] - path to the Fitter plugins directory
- --log-level - enum["info", "error", "debug", "fatal"] - set the log level (only applies when verbose is true)
How to use Fitter_CLI
Download the latest version from the release page
or run it locally:
go run cmd/cli/main.go --path=./examples/cli/config_cli.json
Arguments
- --path - string[config.yaml] - path to the Fitter_CLI configuration file
- --copy - bool[false] - copy the result to the clipboard
- --pretty - bool[true] - make the result human-readable (also affects the copied value)
- --verbose - bool[false] - enable logging
- --omit-error-pretty - bool[false] - output the raw value if it cannot be pretty-printed
- --plugins - string[""] - path to the Fitter plugins directory
- --log-level - enum["info", "error", "debug", "fatal"] - set the log level (only applies when verbose is true)
./fitter_cli_${VERSION} --path=./examples/cli/config_cli.json --copy=true
Examples:
- Server version: HackerNews + Quotes + Guardian News - using API + HTML + XPath parsing
- Chromium version: Guardian News + Quotes - using HTML parsing + browser emulation
- Docker version: Guardian News + Quotes - using HTML parsing + browser from a Docker image
- Playwright version: Guardian News + Quotes - using HTML parsing + browser from the Playwright framework
- Playwright version: England Cities + Weather - using HTML + XPath parsing + browser from the Playwright framework
- JSON version: Generate pagination - using the static connector to generate a pagination array
- Server version: Get current time - gets the time from a URL and formats it
Configuration
Connector
The connector defines how the data is fetched
type ConnectorConfig struct {
ResponseType ParserType `json:"response_type" yaml:"response_type"`
Url string `json:"url" yaml:"url"`
Attempts uint32 `json:"attempts" yaml:"attempts"`
StaticConfig *StaticConnectorConfig `json:"static_config" yaml:"static_config"`
ServerConfig *ServerConnectorConfig `json:"server_config" yaml:"server_config"`
BrowserConfig *BrowserConnectorConfig `yaml:"browser_config" json:"browser_config"`
PluginConnectorConfig *PluginConnectorConfig `json:"plugin_connector_config" yaml:"plugin_connector_config"`
ReferenceConfig *ReferenceConnectorConfig `yaml:"reference_config" json:"reference_config"`
IntSequenceConfig *IntSequenceConnectorConfig `json:"int_sequence_config" yaml:"int_sequence_config"`
}
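- ResponseType - enum["HTML", "json", "xpath"] - format of the data returned by the connector
- Attempts - number of attempts the connector makes to fetch the data
- Url - address to request
Config can be one of:
- StaticConfig - fetch data from a provided string
- ServerConfig - fetch data with the Go http.Client
- BrowserConfig - fetch data by emulating a browser
- PluginConnectorConfig - fetch data through a plugin
- ReferenceConfig - use prefetched data from references
- IntSequenceConfig - generate an integer sequence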
Example:
{
"response_type": "xpath",
"attempts": 3,
"url": "https://openweathermap.org/find?q={PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
}
}
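As a rough illustration of the "one of" rule, the sketch below decodes a connector config like the example above and reports which connector block is present. The types are simplified local stand-ins declared for this example (only the JSON keys come from the struct above), and connectorKind and parseConnector are hypothetical helpers, not part of Fitter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// connectorConfig is a simplified local stand-in for config.ConnectorConfig.
// Nested blocks stay as raw JSON so the example needs no Fitter import.
type connectorConfig struct {
	ResponseType  string          `json:"response_type"`
	URL           string          `json:"url"`
	Attempts      uint32          `json:"attempts"`
	StaticConfig  json.RawMessage `json:"static_config"`
	ServerConfig  json.RawMessage `json:"server_config"`
	BrowserConfig json.RawMessage `json:"browser_config"`
}

// parseConnector decodes a JSON connector config into the stand-in type.
func parseConnector(raw string) (connectorConfig, error) {
	var c connectorConfig
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

// connectorKind (hypothetical helper) names the first connector block present.
func connectorKind(c connectorConfig) string {
	switch {
	case len(c.StaticConfig) > 0:
		return "static"
	case len(c.ServerConfig) > 0:
		return "server"
	case len(c.BrowserConfig) > 0:
		return "browser"
	}
	return "none"
}

func main() {
	c, err := parseConnector(`{
		"response_type": "xpath",
		"attempts": 3,
		"url": "https://openweathermap.org/find?q={PL}",
		"browser_config": {"playwright": {"timeout": 30, "wait": 30, "browser": "Chromium"}}
	}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(connectorKind(c), c.Attempts) // browser 3
}
```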
PluginConnectorConfig
A connector can be defined via the plugin system. To use it, pass the location of the plugins to Fitter/Fitter CLI with the following flag:
... --plugins=./examples/plugin
--plugins - loads all files with the ".so" extension in the provided folder (subdirectories are excluded)
type PluginConnectorConfig struct {
Name string `json:"name" yaml:"name"`
Config json.RawMessage `json:"config" yaml:"config"`
}
{
"name": "connector",
"config": {
"name": "Elon"
}
}
- Name - name of the plugin
- Config - json config of the plugin
How to build plugin
Build plugin
go build -buildmode=plugin -gcflags="all=-N -l" -o examples/plugin/connector.so examples/plugin/hardcoder/connector.go
Make sure you export a Plugin variable that implements the pl.ConnectorPlugin interface
Example for CLI:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_plugin.json#L5
Plugin example:
package main

import (
	"encoding/json"
	"fmt"

	"github.com/PxyUp/fitter/pkg/builder"
	"github.com/PxyUp/fitter/pkg/config"
	"github.com/PxyUp/fitter/pkg/logger"
	pl "github.com/PxyUp/fitter/pkg/plugins/plugin"
)

var (
	_ pl.ConnectorPlugin = &plugin{}

	Plugin plugin
)

type plugin struct {
	log  logger.Logger
	Name string `json:"name" yaml:"name"`
}

func (pl *plugin) Get(parsedValue builder.Jsonable, index *uint32) ([]byte, error) {
	return []byte(fmt.Sprintf(`{"name": "%s"}`, pl.Name)), nil
}

func (pl *plugin) SetConfig(cfg *config.PluginConnectorConfig, logger logger.Logger) {
	pl.log = logger
	if cfg.Config != nil {
		err := json.Unmarshal(cfg.Config, pl)
		if err != nil {
			pl.log.Errorw("cant unmarshal plugin configuration", "error", err.Error())
			return
		}
	}
}
ReferenceConnectorConfig
Connector that returns prefetched data from references
type ReferenceConnectorConfig struct {
Name string `yaml:"name" json:"name"`
}
Example
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L66
- Name - reference name from references map
IntSequenceConnectorConfig
An improved version of the static connector that generates an integer sequence as its result
type IntSequenceConnectorConfig struct {
Start int `json:"start" yaml:"start"`
End int `json:"end" yaml:"end"`
Step int `json:"step" yaml:"step"`
}
- Start[0] - start of the sequence (included)
- End[0] - end of the sequence (excluded from the result, like a range in most languages)
- Step[1] - interval between values
Example (generates [0, 1]):
{
"start": 0,
"end": 2
}
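The start/end/step semantics can be sketched in a few lines of Go. This is only an illustration of the rules listed for this connector, not Fitter's own code; intSequence is a hypothetical helper, and treating a non-positive step as 1 is an assumption made for this example.

```go
package main

import "fmt"

// intSequence follows the documented semantics: start is included and end is
// excluded. Assumption: a step of 0 (or a negative one) falls back to 1.
func intSequence(start, end, step int) []int {
	if step <= 0 {
		step = 1
	}
	seq := []int{}
	for v := start; v < end; v += step {
		seq = append(seq, v)
	}
	return seq
}

func main() {
	fmt.Println(intSequence(0, 2, 0))  // [0 1]
	fmt.Println(intSequence(0, 10, 3)) // [0 3 6 9]
}
```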
StaticConnectorConfig
Connector type that fetches data from a provided string
type StaticConnectorConfig struct {
Value string `json:"value" yaml:"value"`
}
- Value - static string used as data; can be HTML or JSON
Example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_static_connector.json#L5
{
"value": "[1,2,3,4,5]"
}
ServerConnectorConfig
Connector type that fetches data using the Go http.Client (a server-side request, like curl)
type ServerConnectorConfig struct {
Method string `json:"method" yaml:"method"`
Headers map[string]string `yaml:"headers" json:"headers"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Body string `yaml:"body" json:"body"`
Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
- Method - all HTTP methods are supported: GET, POST, PUT, DELETE, PATCH, OPTIONS, HEAD
- Headers - predefined headers sent with the request; the parsed value can be injected into them
- Timeout[sec] - request timeout; the default is 60 sec unless a value is provided
- Body - body of the request; the parsed value can be injected
- Proxy - proxy configuration for the request
Example:
{
"method": "GET",
"proxy": {
"server": "http://localhost:8080",
"username": "pyx"
}
}
Right now the default timeout is 10 sec.
type ProxyConfig struct {
// Proxy to be used for all requests. HTTP and SOCKS proxies are supported, for example
// `http://myproxy.com:3128` or `socks5://myproxy.com:3128`. Short form `myproxy.com:3128`
// is considered an HTTP proxy.
Server string `json:"server" yaml:"server"`
// Optional username to use if HTTP proxy requires authentication.
Username string `json:"username" yaml:"username"`
// Optional password to use if HTTP proxy requires authentication.
Password string `json:"password" yaml:"password"`
}
- Server - address of the proxy server, including the scheme
- Username - username for the proxy (can be empty)
- Password - password for the proxy (can be empty)
{
"server": "http://localhost:8080",
"username": "pyx"
}
- FITTER_HTTP_WORKER - int[1000] - environment variable that sets the number of concurrent HTTP workers
BrowserConnectorConfig
Connector type that fetches data by emulating a browser
type BrowserConnectorConfig struct {
Chromium *ChromiumConfig `json:"chromium" yaml:"chromium"`
Docker *DockerConfig `json:"docker" yaml:"docker"`
Playwright *PlaywrightConfig `json:"playwright" yaml:"playwright"`
}
Config can be one of:
- Chromium - use a locally installed Chromium to fetch data
- Docker - spin up a Docker container to fetch data
- Playwright - use the Playwright framework to fetch data
Example:
{
"docker": {
"wait": 10000,
"image": "docker.io/zenika/alpine-chrome:with-node",
"entry_point": "chromium-browser",
"purge": true
}
}
Chromium
Use a locally installed Chromium to fetch the data
type ChromiumConfig struct {
Path string `yaml:"path" json:"path"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
Flags []string `yaml:"flags" json:"flags"`
}
- Path - path to the Chromium binary
- Timeout[sec] - timeout for the Chromium execution
- Wait[msec] - page load timeout
- Flags - flags for Chromium; default: "--headless", "--proxy-auto-detect", "--temp-profile", "--incognito", "--disable-logging", "--disable-extensions", "--no-sandbox"
Example:
{
"path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"wait": 10000
}
Docker
Use Docker to spin up a container that fetches the data
type DockerConfig struct {
Image string `yaml:"image" json:"image"`
EntryPoint string `json:"entry_point" yaml:"entry_point"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
Flags []string `yaml:"flags" json:"flags"`
Purge bool `json:"purge" yaml:"purge"`
NoPull bool `yaml:"no_pull" json:"no_pull"`
PullTimeout uint32 `yaml:"pull_timeout" json:"pull_timeout"`
}
Docker default image: docker.io/zenika/alpine-chrome
- Image - image from the Docker registry (include the registry host)
- EntryPoint - command that is run inside the container
- Timeout[sec] - timeout for running the container (excluding the image pull)
- Wait[msec] - page load timeout (works only for Chromium-based containers)
- Flags - command arguments for running containers; default for Chromium-based images: "--no-sandbox", "--headless", "--proxy-auto-detect", "--temp-profile", "--incognito", "--disable-logging", "--disable-gpu"
- Purge - remove the container after the work is done (like docker rm)
- NoPull - prevent pulling the image
- PullTimeout - timeout for pulling the image
- DOCKER_HOST - string - (EnvOverrideHost) to set the URL to the docker server.
- DOCKER_API_VERSION - string - (EnvOverrideAPIVersion) to set the version of the API to use, leave empty for latest.
- DOCKER_CERT_PATH - string - (EnvOverrideCertPath) to specify the directory from which to load the TLS certificates (ca.pem, cert.pem, key.pem).
- DOCKER_TLS_VERIFY - bool - (EnvTLSVerify) to enable or disable TLS verification (off by default)
Example:
{
"wait": 10000,
"image": "docker.io/zenika/alpine-chrome:with-node",
"entry_point": "chromium-browser",
"purge": true
}
Playwright
Run browsers via playwright framework
type PlaywrightConfig struct {
Browser PlaywrightBrowser `json:"browser" yaml:"browser"`
Install bool `yaml:"install" json:"install"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
TypeOfWait *playwright.WaitUntilState `json:"type_of_wait" yaml:"type_of_wait"`
PreRunScript string `json:"pre_run_script" yaml:"pre_run_script"`
Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
- Browser - enum["Chromium", "FireFox", "WebKit"] - which browser to use
- Install - whether the browser should be installed
- Timeout[sec] - timeout for running Playwright
- Wait[sec] - page load timeout
- TypeOfWait - enum["load", "domcontentloaded", "networkidle", "commit"] - which page state to wait for; default is "load"
- PreRunScript[""] - script executed before the page content is read; also supports the {PL} placeholder
- Proxy - proxy configuration for the request
Example
{
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
Model
The model defines the result of the scraping
type Model struct {
ObjectConfig *ObjectConfig `yaml:"object_config" json:"object_config"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
BaseField *BaseField `json:"base_field" yaml:"base_field"`
}
Config can be one of:
- ObjectConfig - configuration of object format
- ArrayConfig - configuration of array format
- BaseField - configuration of single/generated field
Example:
{
"object_config": {}
}
ObjectConfig
Configuration of the object and fields
type ObjectConfig struct {
Fields map[string]*Field `json:"fields" yaml:"fields"`
Field *BaseField `json:"field" yaml:"field"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
}
Config can be one of:
- Fields - map of field definitions; the key is the field name, the value is its configuration
- Field - used for array elements; each element is deserialized as a basic type such as "string" or "int" (for arrays of basic types)
- ArrayConfig - used for array elements; deserializes arrays of arrays
Example:
{
"fields": {
"title": {
"base_field": {
"type": "string",
"path": "type"
}
}
}
}
ArrayConfig
Configuration of the array and fields
type ArrayConfig struct {
RootPath string `json:"root_path" yaml:"root_path"`
ItemConfig *ObjectConfig `json:"item_config" yaml:"item_config"`
LengthLimit uint32 `json:"length_limit" yaml:"length_limit"`
StaticConfig *StaticArrayConfig `json:"static_array" yaml:"static_array"`
}
- RootPath - selector for the root element of the array, or for the repeated element in the case of HTML parsing; the size of the array is the number of child elements under the root
- LengthLimit - limits the size of a generated array (has no effect on static arrays)
Config can be one of:
- ItemConfig - configuration of each element of the array
- StaticConfig - configuration of the static array
Example:
{
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
Field
Common configuration of a field
type Field struct {
BaseField *BaseField `json:"base_field" yaml:"base_field"`
ObjectConfig *ObjectConfig `json:"object_config" yaml:"object_config"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
FirstOf []*Field `json:"first_of" yaml:"first_of"`
}
Config can be one of:
- BaseField - the field is deserialized as a basic type such as "string" or "int"
- ObjectConfig - the field is a nested object
- ArrayConfig - the field is an array
- FirstOf - the first field that resolves to a non-empty value is selected
Example:
{
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
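The FirstOf rule (take the first candidate that resolves to a non-empty value) can be sketched as below. This only illustrates the selection order; resolver and firstOf are hypothetical names, and treating "empty" as the empty string is an assumption made for this example.

```go
package main

import "fmt"

// resolver stands in for one configured field: it returns the parsed value,
// or "" when nothing was found (assumption: empty means the empty string).
type resolver func() string

// firstOf returns the first non-empty result, trying candidates in order.
func firstOf(candidates ...resolver) (string, bool) {
	for _, resolve := range candidates {
		if v := resolve(); v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	value, ok := firstOf(
		func() string { return "" },      // e.g. a selector that matched nothing
		func() string { return "12 C" },  // first candidate with a value wins
		func() string { return "unused" }, // never evaluated
	)
	fmt.Println(value, ok) // 12 C true
}
```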
BaseField
Used when we want to get a single static value or generate a new one
type BaseField struct {
Type FieldType `yaml:"type" json:"type"`
Path string `yaml:"path" json:"path"`
HTMLAttribute string `json:"html_attribute" yaml:"html_attribute"`
Generated *GeneratedFieldConfig `yaml:"generated" json:"generated"`
FirstOf []*BaseField `json:"first_of" yaml:"first_of"`
}
- Type - enum["null", "boolean", "string", "int", "int64", "float", "float64", "array", "object", "html", "raw_string"] - type of the field to parse. Important: the html type only works with connectors that return HTML (HTMLAttribute has no effect in this case).
- Path - selector used for parsing (relative when the field is an array child)
- HTMLAttribute - only takes effect in HTML parsing via goquery; specifies which attribute should be parsed.
Important: by default the "string" type is trimmed and all special characters are replaced; if you need the plain string, use "raw_string"
Config can be one of, or empty:
- Generated - the field is generated using a custom configuration
- FirstOf - the first field that resolves to a non-empty value is selected
Examples
{
"generated": {
"uuid": {}
}
}
{
"type": "string",
"path": "text()"
}
GeneratedFieldConfig
Generates a field on the fly
type GeneratedFieldConfig struct {
UUID *UUIDGeneratedFieldConfig `yaml:"uuid" json:"uuid"`
Static *StaticGeneratedFieldConfig `yaml:"static" json:"static"`
Formatted *FormattedFieldConfig `json:"formatted" yaml:"formatted"`
Plugin *PluginFieldConfig `yaml:"plugin" json:"plugin"`
Calculated *CalculatedConfig `yaml:"calculated" json:"calculated"`
File *FileFieldConfig `yaml:"file" json:"file"`
Model *ModelField `yaml:"model" json:"model"`
}
Config can be one of:
- UUID - generate random UUID V4
- Static - generate static field
- Formatted - format field
- Model - model generated from the other connector and model
- Plugin - plugin field
- Calculated - calculated field
- File - file field (for download file from server)
Examples:
{
"uuid": {}
}
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L58
{
"model": {
"type": "array",
"model": {
"array_config": {
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
},
"connector_config": {
"response_type": "HTML",
"attempts": 3,
"url": "http://www.quotationspage.com/random.php",
"browser_config": {
"chromium": {
"path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"wait": 10000
}
}
}
}
}
UUID
Generates a random UUID V4 on the fly; can be used to generate a unique ID
type UUIDGeneratedFieldConfig struct {
Regexp string `yaml:"regexp" json:"regexp"`
}
- Regexp - matcher used to extract part of the generated UUID
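To show what a regexp matcher does to a UUID, the sketch below generates a version 4 UUID with the standard library and keeps only the part matched by a pattern. It is an illustration, not Fitter's code: newUUIDv4 and uuidPart are hypothetical helpers, and returning the first match is an assumption about how the matcher is applied.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"regexp"
)

// newUUIDv4 builds a random (version 4) UUID string from 16 random bytes.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// uuidPart returns the first match of pattern in id (assumed behaviour).
func uuidPart(id, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	return re.FindString(id), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	// Keep only the first eight hex characters of the UUID.
	short, err := uuidPart(id, `^[0-9a-f]{8}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, short)
}
```

With this pattern, the UUID from the library example output, 26b08b73-2f2e-444d-bcf2-dac77ac3130e, would be reduced to 26b08b73.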
Static
Generate static field
type StaticGeneratedFieldConfig struct {
Type FieldType `yaml:"type" json:"type"`
Value string `json:"value" yaml:"value"`
}
- Type - enum["null", "boolean", "string", "int","int64","float","float64"] - type of the field
- Value - string value of the field
Example
{
"type": "int",
"value": "65"
}
Formatted Field Config
Generates a formatted field using the value passed from the parent base field
type FormattedFieldConfig struct {
Template string `yaml:"template" json:"template"`
}
Example: https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L98
{
"template": "https://news.ycombinator.com/item?id={PL}"
}
File Field
Field used to download a file from a server to the local machine
type FileFieldConfig struct {
Config *ServerConnectorConfig `yaml:"config" json:"config"`
Url string `yaml:"url" json:"url"`
FileName string `json:"file_name" yaml:"file_name"`
Path string `json:"path" yaml:"path"`
}
- Config - ServerConnectorConfig; the default Fitter http.Client is used to send the request
- Url - URL of the file to download. Important: the parent value can be injected into the URL as a string
- FileName - local file name used to store the file. By default, Fitter tries to get the file name from the response header and then from the URL. Important: the parent value can be injected as a string.
- Path - local parent directory used to store the file. The default path is the process directory. Important: the parent value can be injected as a string
The result of the field is the local file path as a string
{
"url": "https://images.shcdn.de/resized/w680/p/dekostoff-gobelinstoff-panel-oriental-cat-46-x-46_P19-KP_2.jpg",
"path": "/Users/pxyup/fitter/bin",
"config": {
"method": "GET"
}
}
With a propagated URL (the parent value is injected as a string)
{
"url": "https://picsum.photos{PL}",
"path": "/Users/pxyup/fitter/bin",
"config": {
"method": "GET"
}
}
Config example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image.json
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image_multiple.json
Calculated field
Field whose value is calculated from an expression; the resulting type depends on the expression
type CalculatedConfig struct {
Type FieldType `yaml:"type" json:"type"`
Expression string `yaml:"expression" json:"expression"`
}
- Type - resulting type of the expression
- Expression - expression to calculate (evaluated with an external expression library)
Predefined values:
fRes - the raw result (with its proper type) of parsing the base field
fIndex - the index in the parent array (only if the parent is an array field)
{
"type": "boolean",
"expression": "fRes > 500"
}
Plugin field
The field can be produced by an external Fitter plugin
type PluginFieldConfig struct {
Name string `json:"name" yaml:"name"`
Config json.RawMessage `json:"config" yaml:"config"`
}
- Name - name of the plugin(without extension just name)
- Config - json config of the plugin
Model Field
Field type that is generated on the fly from a new model and connector
type ModelField struct {
// Type of parsing
ConnectorConfig *ConnectorConfig `yaml:"connector_config" json:"connector_config"`
// Model of the response
Model *Model `yaml:"model" json:"model"`
Type FieldType `yaml:"type" json:"type"`
Path string `yaml:"path" json:"path"`
Expression string `yaml:"expression" json:"expression"`
}
- ConnectorConfig - which connector to use. Important: the parent value can be injected into the connector URL as a string
- Model - configuration of the underlying model
- Type - enum["null", "boolean", "string", "int", "int64", "float", "float64", "array", "object"] - type of the generated field
- Path - JSON selector used to extract a specific value from the generated field
- Expression - string that can be used for post-processing of the Model (the Path field is ignored)
Examples:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L60
{
"type": "array",
"model": {
"array_config": {
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
}
}
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_weather.json#L37
{
"type": "string",
"path": "temp.temp",
"model": {
"object_config": {
"fields": {
"temp": {
"base_field": {
"type": "string",
"path": "//div[@id='forecast_list_ul']//td/b/a/@href",
"generated": {
"model": {
"type": "string",
"model": {
"object_config": {
"fields": {
"temp": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
}
},
"connector_config": {
"response_type": "HTML",
"attempts": 4,
"url": "https://openweathermap.org{PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "FireFox",
"type_of_wait": "networkidle"
}
}
}
}
}
}
}
}
}
},
"connector_config": {
"response_type": "xpath",
"attempts": 3,
"url": "https://openweathermap.org/find?q={PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
}
}
}
Static Array Config
Provide static(fixed length) array generation
type StaticArrayConfig struct {
Items map[uint32]*Field `yaml:"items" json:"items"`
Length uint32 `yaml:"length" json:"length"`
}
- Items - map[uint32]*Field - key is index in array, value is field definition
- Length - if set (1 or more), defines a custom length for the array
Examples:
{
"0": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
{
"length": 4,
"0": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
{
"length": 4,
"2": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
Placeholder list
- {PL} - injects the value
- {INDEX} - injects the index in the parent array
- {HUMAN_INDEX} - injects the index in the parent array in human-friendly form
- {{{json_path}}} - gets information from a propagated "object"/"array" field
- {{{RefName=SomeName}}} - gets a reference value by name
- {{{RefName=SomeName json.path}}} - gets a reference value by name and extracts a value by JSON path
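The three simple placeholders can be pictured as plain string substitution, as in the sketch below. This is an illustration only: expand is a hypothetical helper, the triple-brace placeholders are not handled, and treating {HUMAN_INDEX} as the 1-based form of {INDEX} is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expand replaces the simple placeholders in a template.
// Assumption: {HUMAN_INDEX} is the 1-based form of {INDEX}.
func expand(template, value string, index int) string {
	r := strings.NewReplacer(
		"{PL}", value,
		"{INDEX}", strconv.Itoa(index),
		"{HUMAN_INDEX}", strconv.Itoa(index+1),
	)
	return r.Replace(template)
}

func main() {
	fmt.Println(expand("https://news.ycombinator.com/item?id={PL}", "8863", 0))
	// https://news.ycombinator.com/item?id=8863
	fmt.Println(expand("item {HUMAN_INDEX} (index {INDEX})", "", 2))
	// item 3 (index 2)
}
```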
References
A special map that is prefetched (before any processing) and can be used by a connector or a placeholder
Can be used for:
- Caching a JWT token and using it in headers
- Cache values
- Etc
Reference
type Reference struct {
*ModelField
Expire uint32 `yaml:"expire" json:"expire"`
}
- ModelField - embedded struct; all of its fields can be used
- Expire[sec] - how long after fetching the reference expires (0 means it never expires)
For Fitter
type RefMap map[string]*Reference
type Config struct {
// Other Config Fields
Limits *Limits `yaml:"limits" json:"limits"`
References RefMap `json:"references" yaml:"references"`
}
For Fitter Cli
type RefMap map[string]*Reference
type CliItem struct {
// Other Config Fields
Limits *Limits `yaml:"limits" json:"limits"`
References RefMap `json:"references" yaml:"references"`
}
- References - map[string]*Reference - object where the key is the reference name (can be used by a connector or a placeholder) and the value is a Reference
- Limits - limits configuration, described in the Limits section
Example
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L2
{
"references": {
"TokenRef": {
"expire": 10,
"connector_config": {
"response_type": "json",
"static_config": {
"value": "\"plain token\""
}
},
"model": {
"base_field": {
"type": "string"
}
}
},
"TokenObjectRef": {
"connector_config": {
"response_type": "json",
"static_config": {
"value": "{\"token\":\"token from object\"}"
}
},
"model": {
"object_config": {
"fields": {
"token": {
"base_field": {
"type": "string",
"path": "token"
}
}
}
}
}
}
}
}
Limits
Provides limits to prevent DDoS and excessive memory usage
type Limits struct {
HostRequestLimiter HostRequestLimiter `yaml:"host_request_limiter" json:"host_request_limiter"`
ChromiumInstance uint32 `yaml:"chromium_instance" json:"chromium_instance"`
DockerContainers uint32 `yaml:"docker_containers" json:"docker_containers"`
PlaywrightInstance uint32 `yaml:"playwright_instance" json:"playwright_instance"`
}
- HostRequestLimiter - map[string]int64 - limit per host name; the key is the host, the value is the number of parallel requests (used by the server connector)
- ChromiumInstance - number of parallel Chromium instances
- DockerContainers - number of parallel Docker containers
- PlaywrightInstance - number of parallel Playwright instances
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L2
{
"limits": {
"host_request_limiter": {
"hacker-news.firebaseio.com": 5
},
"chromium_instance": 3,
"docker_containers": 3,
"playwright_instance": 3
}
}
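A per-host limit on parallel requests is commonly built from one semaphore per host. The sketch below shows that idea with buffered channels; it is an assumption about the general technique, not Fitter's implementation, and hostLimiter, newHostLimiter and peakConcurrency are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// hostLimiter holds one buffered channel per limited host; the channel
// capacity is the number of requests allowed to run in parallel.
type hostLimiter struct {
	slots map[string]chan struct{}
}

func newHostLimiter(limits map[string]int64) *hostLimiter {
	l := &hostLimiter{slots: make(map[string]chan struct{}, len(limits))}
	for host, n := range limits {
		if n > 0 {
			l.slots[host] = make(chan struct{}, n)
		}
	}
	return l
}

// do runs fn, waiting for a free slot first when the host has a limit.
func (l *hostLimiter) do(host string, fn func()) {
	if ch, ok := l.slots[host]; ok {
		ch <- struct{}{}
		defer func() { <-ch }()
	}
	fn()
}

// peakConcurrency starts jobs goroutines and returns the highest number that
// were inside the limiter at the same time.
func peakConcurrency(l *hostLimiter, host string, jobs int) int64 {
	var (
		wg            sync.WaitGroup
		mu            sync.Mutex
		running, peak int64
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.do(host, func() {
				mu.Lock()
				running++
				if running > peak {
					peak = running
				}
				mu.Unlock()
				time.Sleep(5 * time.Millisecond) // stands in for the HTTP request
				mu.Lock()
				running--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	limiter := newHostLimiter(map[string]int64{"hacker-news.firebaseio.com": 5})
	peak := peakConcurrency(limiter, "hacker-news.firebaseio.com", 20)
	fmt.Println(peak <= 5) // true
}
```

Hosts without an entry in the map run without a limit in this sketch.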
Roadmap
- Add browser scenario for preparing, after parsing
- Add scrolling support for scenario
- Add pagination support for scenario
- Add notification methods for Fitter: Webhook/Queue