README ¶
Fitter + Fitter CLI
Fitter - new way for collect information from the API's/Websites
Fitter CLI - small cli command which provide result from Fitter for test/debug/home usage
Fitter Lib - library which provide functional of fitter CLI as a library
Way to collect information
- Server - parsing response from some API's or http request(usage of http.Client)
- Browser - emulate real browser using chromium + docker + playwright/cypress and get DOM information
- Static - parsing static string as data
Format which can be parsed
- JSON - parsing JSON to get specific information
- XML - parsing xml tree to get specific information
- HTML - parsing dom tree to get specific information
- XPath - parsing dom tree to get specific information but by xpath
Use like a library
go get github.com/PxyUp/fitter
package main
import (
"fmt"
"github.com/PxyUp/fitter/lib"
"github.com/PxyUp/fitter/pkg/config"
"log"
"net/http"
)
func main() {
res, err := lib.Parse(&config.Item{
ConnectorConfig: &config.ConnectorConfig{
ResponseType: config.Json,
Url: "https://random-data-api.com/api/appliance/random_appliance",
ServerConfig: &config.ServerConnectorConfig{
Method: http.MethodGet,
},
},
Model: &config.Model{
ObjectConfig: &config.ObjectConfig{
Fields: map[string]*config.Field{
"my_id": {
BaseField: &config.BaseField{
Type: config.Int,
Path: "id",
},
},
"generated_id": {
BaseField: &config.BaseField{
Generated: &config.GeneratedFieldConfig{
UUID: &config.UUIDGeneratedFieldConfig{},
},
},
},
"generated_array": {
ArrayConfig: &config.ArrayConfig{
RootPath: "@this|@keys",
ItemConfig: &config.ObjectConfig{
Field: &config.BaseField{
Type: config.String,
},
},
},
},
},
},
},
}, nil, nil, nil, nil)
if err != nil {
log.Fatal(err)
}
fmt.Println(res.ToJson())
}
Output:
{
"generated_array": ["id","uid","brand","equipment"],
"my_id": 6000,
"generated_id": "26b08b73-2f2e-444d-bcf2-dac77ac3130e"
}
How to use Fitter
Download latest version from the release page
or locally:
go run cmd/fitter/main.go --path=./examples/config_api.json
Arguments
- --path - string[config.yaml] - path for the configuration of the Fitter
- --verbose - bool[false] - enable logging
- --plugins - string[""] - path for plugins for Fitter
- --log-level - enum["info", "error", "debug", "fatal"] - set log level(only if verbose set to true)
How to use Fitter_CLI
Download latest version from the release page
or locally:
go run cmd/cli/main.go --path=./examples/cli/config_cli.json
Arguments
- --path - string[config.yaml] - path for the configuration of the Fitter_CLI
- --copy - bool[false] - copy information into clipboard
- --pretty - bool[true] - make readable result(also affect on copy)
- --verbose - bool[false] - enable logging
- --omit-error-pretty - bool[false] - Provide pure value if pretty is invalid
- --plugins - string[""] - path for plugins for Fitter
- --log-level - enum["info", "error", "debug", "fatal"] - set log level(only if verbose set to true)
- --input - string[""] - specify input value for formatting. Examples:
--input=\""124"\"
--input=124
--input='{"test": 5}'
./fitter_cli_${VERSION} --path=./examples/cli/config_cli.json --copy=true
Examples:
- Server version HackerNews + Quotes + Guardian News - using API + HTML + XPath parsing
- Chromium version Guardian News + Quotes - using HTML parsing + browser emulation
- Docker version Docker version: Guardian News + Quotes - using HTML parsing + browser from Docker image
- Playwright version Playwright version: Guardian News + Quotes - using HTML parsing + browser from Playwright framework
- Playwright version Playwright version: England Cities + Weather - using HTML + XPath parsing + browser from Playwright framework
- JSON version Generate pagination - using static connector for generate pagination array
- Server version Get current time - get time from url and format it
Configuration
Connector
It is the way how you fetch the data
type ConnectorConfig struct {
ResponseType ParserType `json:"response_type" yaml:"response_type"`
Url string `json:"url" yaml:"url"`
Attempts uint32 `json:"attempts" yaml:"attempts"`
StaticConfig *StaticConnectorConfig `json:"static_config" yaml:"static_config"`
IntSequenceConfig *IntSequenceConnectorConfig `json:"int_sequence_config" yaml:"int_sequence_config"`
ServerConfig *ServerConnectorConfig `json:"server_config" yaml:"server_config"`
BrowserConfig *BrowserConnectorConfig `yaml:"browser_config" json:"browser_config"`
PluginConnectorConfig *PluginConnectorConfig `json:"plugin_connector_config" yaml:"plugin_connector_config"`
ReferenceConfig *ReferenceConnectorConfig `yaml:"reference_config" json:"reference_config"`
FileConfig *FileConnectorConfig `json:"file_config" yaml:"file_config"`
}
- ResponseType - enum["HTML", "json","xpath"] - in which format data comes from the connector
- Attempts - how many attempts to use for fetch data by connector
- Url - define which address to request
Config can be one of:
- ServerConfig
- BrowserConfig
- StaticConfig
- PluginConnectorConfig
- ReferenceConfig
- IntSequenceConfig
- FileConfig
Example:
{
"response_type": "xpath",
"attempts": 3,
"url": "https://openweathermap.org/find?q={PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
}
}
PluginConnectorConfig
Connector can be defined via plugin system. For use that you need apply next flags to Fitter/Cli(location of the plugins):
... --plugins=./examples/plugin
--plugins - looking for all files with ".so" extension in provided folder(subdirs excluded)
type PluginConnectorConfig struct {
Name string `json:"name" yaml:"name"`
Config json.RawMessage `json:"config" yaml:"config"`
}
{
"name": "connector",
"config": {
"name": "Elon"
}
}
- Name - name of the plugin
- Config - json config of the plugin
How to build plugin
Build plugin
go build -buildmode=plugin -gcflags="all=-N -l" -o examples/plugin/connector.so examples/plugin/hardcoder/connector.go
Make sure you export Plugin variable which implements pl.ConnectorPlugin interface
Example for CLI:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_plugin.json#L5
Plugin example:
package main
import (
"encoding/json"
"fmt"
"github.com/PxyUp/fitter/pkg/config"
"github.com/PxyUp/fitter/pkg/logger"
"github.com/PxyUp/fitter/pkg/builder"
pl "github.com/PxyUp/fitter/pkg/plugins/plugin"
)
var (
_ pl.ConnectorPlugin = &plugin{}
Plugin plugin
)
type plugin struct {
log logger.Logger
Name string `json:"name" yaml:"name"`
}
func (pl *plugin) Get(parsedValue builder.Interfacable, index *uint32, input builder.Interfacable) ([]byte, error) {
return []byte(fmt.Sprintf(`{"name": "%s"}`, pl.Name)), nil
}
func (pl *plugin) SetConfig(cfg *config.PluginConnectorConfig, logger logger.Logger) {
pl.log = logger
if cfg.Config != nil {
err := json.Unmarshal(cfg.Config, pl)
if err != nil {
pl.log.Errorw("cant unmarshal plugin configuration", "error", err.Error())
return
}
}
}
ReferenceConnectorConfig
Connector which allow get prefetched data from references
type ReferenceConnectorConfig struct {
Name string `yaml:"name" json:"name"`
}
Example
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L66
- Name - reference name from references map
IntSequenceConnectorConfig
Improved version of static connector which generate int sequence as result
type IntSequenceConnectorConfig struct {
Start int `json:"start" yaml:"start"`
End int `json:"end" yaml:"end"`
Step int `json:"step" yaml:"step"`
}
- Start[0] - start point for generation(included)
- End[0] - end point for generation(excluded from final result like range in any lang)
- Step[1] - interval for sequence
Example
{
"start": 0,
"end": 2
// Generate [0, 1]
}
FileConnectorConfig
Connector type which fetch data from provided file
type FileConnectorConfig struct {
Path string `yaml:"path" json:"path"`
UseFormatting bool `yaml:"use_formatting" json:"use_formatting"`
}
- Path - file path. Support formatting
- UseFormatting[false] - use formatting file content or not
StaticConnectorConfig
Connector type which fetch data from provided string
type StaticConnectorConfig struct {
Value string `json:"value" yaml:"value"`
Raw json.RawMessage `json:"raw" yaml:"raw"`
}
- Value - static string as data, can be html, json
- Raw - accept raw json. Example. Also support formatting
Example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_static_connector.json#L5
{
"value": "[1,2,3,4,5]"
}
ServerConnectorConfig
Connector type which fetch data using golang http.Client(server side request like curl)
type ServerConnectorConfig struct {
Method string `json:"method" yaml:"method"`
Headers map[string]string `yaml:"headers" json:"headers"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Body string `yaml:"body" json:"body"`
Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
- Method - supported all http methods: GET, POST, PUT, DELETE, PATCH, OPTIONS, HEAD
- Headers - predefine headers for using during request can be injected into value
- Timeout[sec] - default 60sec timeout or used provided
- Body - body of the request, parsed value can be injected
- Proxy - setup proxy for request config
Example:
{
"method": "GET",
"proxy": {
"server": "http://localhost:8080",
"username": "pyx"
}
}
Right now default timeout it is 10 sec.
type ProxyConfig struct {
// Proxy to be used for all requests. HTTP and SOCKS proxies are supported, for example
// `http://myproxy.com:3128` or `socks5://myproxy.com:3128`. Short form `myproxy.com:3128`
// is considered an HTTP proxy.
Server string `json:"server" yaml:"server"`
// Optional username to use if HTTP proxy requires authentication.
Username string `json:"username" yaml:"username"`
// Optional password to use if HTTP proxy requires authentication.
Password string `json:"password" yaml:"password"`
}
- Server - address with schema of proxy server. Also support formatting
- Username - username for proxy(can be empty). Also support formatting
- Password - password for proxy(can be empty). Also support formatting
{
"server": "http://localhost:8080",
"username": "pyx"
}
- FITTER_HTTP_WORKER - int[1000] - default concurrent HTTP workers
BrowserConnectorConfig
Connector type which emulate fetching of data via browser
type BrowserConnectorConfig struct {
Chromium *ChromiumConfig `json:"chromium" yaml:"chromium"`
Docker *DockerConfig `json:"docker" yaml:"docker"`
Playwright *PlaywrightConfig `json:"playwright" yaml:"playwright"`
}
Config can be one of:
- Chromium - use local installed Chromium for fetch data
- Docker - use docker as service for spin up container for fetch data
- Playwright - use playwright framework for fetch data
Example:
{
"docker": {
"wait": 10000,
"image": "docker.io/zenika/alpine-chrome:with-node",
"entry_point": "chromium-browser",
"purge": true
}
}
Chromium
Use locally installed Chromium for fetch the data
type ChromiumConfig struct {
Path string `yaml:"path" json:"path"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
Flags []string `yaml:"flags" json:"flags"`
}
- Path - path to binary of Chromium
- Timeout[sec] - timeout for execution of the chromium
- Wait[msec] - timeout of page loading
- Flags - flags for Chromium default: "--headless", "--proxy-auto-detect", "--temp-profile", "--incognito", "--disable-logging", "--disable-extensions", "--no-sandbox"
Example:
{
"path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"wait": 10000
}
Docker
Use Docker for spin up container for fetch data
type DockerConfig struct {
Image string `yaml:"image" json:"image"`
EntryPoint string `json:"entry_point" yaml:"entry_point"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
Flags []string `yaml:"flags" json:"flags"`
Purge bool `json:"purge" yaml:"purge"`
NoPull bool `yaml:"no_pull" json:"no_pull"`
PullTimeout uint32 `yaml:"pull_timeout" json:"pull_timeout"`
}
Docker default image: docker.io/zenika/alpine-chrome
- Image - image for the docker registry(provide with registry host)
- EntryPoint - cmd which will be run inside container
- Timeout[sec] - timeout for run container(without pulling image)
- Wait[msec] - timeout of page loading (works just for Chromium based containers)
- Flags - cmd arguments for run containers, default for Chromium based: "--no-sandbox","--headless", "--proxy-auto-detect", "--temp-profile", "--incognito", "--disable-logging", "--disable-gpu"
- Purge - should we remove container after work done(like docker rm)
- NoPull - prevent pulling of the image
- PullTimeout - define timeout for pull contains
- DOCKER_HOST - string - (EnvOverrideHost) to set the URL to the docker server.
- DOCKER_API_VERSION - string - (EnvOverrideAPIVersion) to set the version of the API to use, leave empty for latest.
- DOCKER_CERT_PATH - string - (EnvOverrideCertPath) to specify the directory from which to load the TLS certificates (ca.pem, cert.pem, key.pem).
- DOCKER_TLS_VERIFY - bool - (EnvTLSVerify) to enable or disable TLS verification (off by default)
Example:
{
"wait": 10000,
"image": "docker.io/zenika/alpine-chrome:with-node",
"entry_point": "chromium-browser",
"purge": true
}
Playwright
Run browsers via playwright framework
type PlaywrightConfig struct {
Browser PlaywrightBrowser `json:"browser" yaml:"browser"`
Install bool `yaml:"install" json:"install"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
TypeOfWait *playwright.WaitUntilState `json:"type_of_wait" yaml:"type_of_wait"`
PreRunScript string `json:"pre_run_script" yaml:"pre_run_script"`
Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
- Browser - enum["Chromium", "FireFox", "WebKit"] - which browser to use
- Install - should we install browser
- Timeout[sec] - timeout to run playwright
- Wait[sec] - timeout of page loading
- TypeOfWait - enum["load", "domcontentloaded", "networkidle", "commit"] which state of page we waiting, default is "load"
- PreRunScript[""] - script which will be executed before reading content of the page. Also support placeholder {PL}
- Proxy - setup proxy for request config
Example
{
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
Model
With model we define result of the scrapping
type Model struct {
ObjectConfig *ObjectConfig `yaml:"object_config" json:"object_config"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
BaseField *BaseField `json:"base_field" yaml:"base_field"`
IsArray bool `json:"is_array" yaml:"is_array"`
}
Config can be one of:
- ObjectConfig - configuration of object format
- ArrayConfig - configuration of array format
- BaseField - configuration of single/generated field
- IsArray - bool[false] - force indicate that field is array(usable in case of model field with base field)
Example:
{
"object_config": {}
}
ObjectConfig
Configuration of the object and fields
type ObjectConfig struct {
Fields map[string]*Field `json:"fields" yaml:"fields"`
Field *BaseField `json:"field" yaml:"field"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
}
Config can be one of:
- Fields - map of each field definition; key - field name, value - configuration
- Field - used for element of array; fields which will be deserialized like basic type like "string", "int" and etc (used here for case array of basic types)
- ArrayConfig - used for element of array; deserialization array of array
Example:
{
"fields": {
"title": {
"base_field": {
"type": "string",
"path": "type"
}
}
}
}
ArrayConfig
Configuration of the array and fields
type ArrayConfig struct {
RootPath string `json:"root_path" yaml:"root_path"`
Reverse bool `yaml:"reverse" json:"reverse"`
ItemConfig *ObjectConfig `json:"item_config" yaml:"item_config"`
LengthLimit uint32 `json:"length_limit" yaml:"length_limit"`
StaticConfig *StaticArrayConfig `json:"static_array" yaml:"static_array"`
}
- RootPath - selector for find root element of the array or repeated element in case of html parsing, size of array will be amount of children element under the root
- Reverse - bool[false] - indicate that need use reverse iteration(n to 1)
- LengthLimit - for define size of array only for generated(not working for static)
Config can be one of:
- ItemConfig - configuration of each element of the array
- StaticConfig - configuration of the static array
Example:
{
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
Field
Common of the field
type Field struct {
BaseField *BaseField `json:"base_field" yaml:"base_field"`
ObjectConfig *ObjectConfig `json:"object_config" yaml:"object_config"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
FirstOf []*Field `json:"first_of" yaml:"first_of"`
}
Config can be one of:
- BaseField - fields which will be deserialized like basic type like "string", "int" and etc
- ObjectConfig - in case our field in nested object
- ArrayConfig - in case our field in array
- FirstOf - first not empty resolved field will be selected
Example:
{
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
BaseField
In case we want get some static information or generate new one
type BaseField struct {
Type FieldType `yaml:"type" json:"type"`
Path string `yaml:"path" json:"path"`
HTMLAttribute string `json:"html_attribute" yaml:"html_attribute"`
Generated *GeneratedFieldConfig `yaml:"generated" json:"generated"`
FirstOf []*BaseField `json:"first_of" yaml:"first_of"`
}
- FieldType - enum["null", "boolean", "string", "int", "int64", "float", "float64", "array", "object", "html", "raw_string"] - static field for parse. Important: type html will only works from connector which return HTML (HTMLAttribute - have no effect in this case). Example
- Path - selector(relative in case it is array child) for parsing
- HTMLAttribute - extra value which have effect only in HTML parsing via goquery. Here you can specify which attribute need to be parsed.
Important: by default "string" type trimmed and all special chars is replaced, if you need plain string use "raw_string"
Config can be one of or empty:
- Generated - field can be generated one which custom configuration
- FirstOf - first not empty resolved field will be selected
Examples
{
"generated": {
"uuid": {}
}
}
{
"type": "string",
"path": "text()"
}
GeneratedFieldConfig
Provide functionality of generating field on the flight
type GeneratedFieldConfig struct {
UUID *UUIDGeneratedFieldConfig `yaml:"uuid" json:"uuid"`
Static *StaticGeneratedFieldConfig `yaml:"static" json:"static"`
Formatted *FormattedFieldConfig `json:"formatted" yaml:"formatted"`
Plugin *PluginFieldConfig `yaml:"plugin" json:"plugin"`
Calculated *CalculatedConfig `yaml:"calculated" json:"calculated"`
File *FileFieldConfig `yaml:"file" json:"file"`
Model *ModelField `yaml:"model" json:"model"`
FileStorageField *FileStorageField `json:"file_storage" yaml:"file_storage"`
}
Config can be one of:
- UUID - generate random UUID V4
- Static - generate static field
- Formatted - format field
- Model - model generated from the other connector and model
- Plugin - plugin field
- Calculated - calculated field
- File - file field (for download file from server)
- FileStorage - file field which can be saved to local file
Examples:
{
"uuid": {}
}
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L58
{
"model": {
"type": "array",
"model": {
"array_config": {
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
},
"connector_config": {
"response_type": "HTML",
"attempts": 3,
"browser_config": {
"url": "http://www.quotationspage.com/random.php",
"chromium": {
"path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"wait": 10000
}
}
}
}
}
UUID
Generate random UUID V4 on the flight, can be used for generate uniq id
type UUIDGeneratedFieldConfig struct {
Regexp string `yaml:"regexp" json:"regexp"`
}
- Regexp - provide matcher which can be used for get part of generated uuid
Static
Generate static field
type StaticGeneratedFieldConfig struct {
Type FieldType `yaml:"type" json:"type"`
Value string `json:"value" yaml:"value"`
}
- Type - enum["null", "boolean", "string", "int","int64","float","float64"] - type of the field
- Value - string value of the field
Example
{
"type": "int",
"value": "65"
}
Formatted Field Config
Generate formatted field which will pass value from parent base field
type FormattedFieldConfig struct {
Template string `yaml:"template" json:"template"`
}
Example: https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L98
{
"template": "https://news.ycombinator.com/item?id={PL}"
}
File Storage Field
Field can be used for store field result as local file
type FileStorageField struct {
Content string `json:"content" yaml:"content"`
Raw json.RawMessage `yaml:"raw" yaml:"raw"`
FileName string `json:"file_name" yaml:"file_name"`
Path string `json:"path" yaml:"path"`
Append bool `json:"append" yaml:"append"`
}
- Content - template string for content. Important: can be with inject of the parent value as a string
- Raw - raw json content of field. Important: can be with inject of the parent value as a string
- FileName - local file name for storing file. By default, it is try get FileName from header, after that from url. Important: can be with inject of the parent value as a string.
- Path - local file parent directory for storing file. Default path it is process directory. Important: can be with inject of the parent value as a string
- Append[false] - append to file or not
{
"content": "{{{id}}}, {{{message}}}\n",
"append": true,
"file_name": "{{{id}}}.csv",
"path": "/Users/pxyup/fitter/examples/cli/test/csv"
}
File Field
Field can be used for download file from server locally
type FileFieldConfig struct {
Config *ServerConnectorConfig `yaml:"config" json:"config"`
Url string `yaml:"url" json:"url"`
FileName string `json:"file_name" yaml:"file_name"`
Path string `json:"path" yaml:"path"`
}
- Config - ServerConfig use default fitter http.Client for send request
- Url - url of the image. Important: URL in the connector can be with inject of the parent value as a string
- FileName - local file name for storing file. By default, it is try get FileName from header, after that from url. Important: can be with inject of the parent value as a string.
- Path - local file parent directory for storing file. Default path it is process directory. Important: can be with inject of the parent value as a string
Result of the field will be local file path as string
{
"url": "https://images.shcdn.de/resized/w680/p/dekostoff-gobelinstoff-panel-oriental-cat-46-x-46_P19-KP_2.jpg",
"path": "/Users/pxyup/fitter/bin",
"config": {
"method": "GET"
}
}
With propagated URL (inject of the parent value as a string)
{
"url": "https://picsum.photos{PL}",
"path": "/Users/pxyup/fitter/bin",
"config": {
"method": "GET"
}
}
Config example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image.json
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image_multiple.json
Calculated field
Field can generate different types depends from expression
type CalculatedConfig struct {
Type FieldType `yaml:"type" json:"type"`
Expression string `yaml:"expression" json:"expression"`
}
- Type - resulting type of expression\
- Expression - expression for calculation (we use this lib for calculated expression)
FNull - alias for builder.Nullvalue
FNil - alias for nil
isNull(value T) - function for check is value is FNull
fRes - it is raw(with proper type) result from the parsing base field
fIndex - it is index in parent array(only if parent was array field)
fResJson - it is JSON string representation of the raw result
{
"type": "bool",
"expression": "fRes > 500"
}
Plugin field
Field can be some external plugin for fitter
type PluginFieldConfig struct {
Name string `json:"name" yaml:"name"`
Config json.RawMessage `json:"config" yaml:"config"`
}
- Name - name of the plugin(without extension just name)
- Config - json config of the plugin
Model Field
Field type which can be generated on the flight by news model and connector
type ModelField struct {
// Type of parsing
ConnectorConfig *ConnectorConfig `yaml:"connector_config" json:"connector_config"`
// Model of the response
Model *Model `yaml:"model" json:"model"`
Type FieldType `yaml:"type" json:"type"`
Path string `yaml:"path" json:"path"`
Expression string `yaml:"expression" json:"expression"`
}
- ConnectorConfig - which connector to use. Important: URL in the connector can be with inject of the parent value as a string
- Model - configuration of the underhood model
- Type - enum["null", "boolean", "string", "int", "int64", "float", "float64", "array", "object"] - type of generated field
- Path - in case we cant extract some information from generated field we can use json selector for extract
- Expression - string which can be used for post processing of the Model (ignoring path field)
Examples:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L60
{
"type": "array",
"model": {
"array_config": {
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
}
}
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_weather.json#L37
{
"type": "string",
"path": "temp.temp",
"model": {
"object_config": {
"fields": {
"temp": {
"base_field": {
"type": "string",
"path": "//div[@id='forecast_list_ul']//td/b/a/@href",
"generated": {
"model": {
"type": "string",
"model": {
"object_config": {
"fields": {
"temp": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
}
},
"connector_config": {
"response_type": "HTML",
"attempts": 4,
"url": "https://openweathermap.org{PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "FireFox",
"type_of_wait": "networkidle"
}
}
}
}
}
}
}
}
}
},
"connector_config": {
"response_type": "xpath",
"attempts": 3,
"url": "https://openweathermap.org/find?q={PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
}
}
}
Static Array Config
Provide static(fixed length) array generation
type StaticArrayConfig struct {
Items map[uint32]*Field `yaml:"items" json:"items"`
Length uint32 `yaml:"length" json:"length"`
}
- Items - map[uint32]*Field - key is index in array, value is field definition
- Length - if set(1+) can be used for define custom length of array
Examples:
{
"0": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
{
"length": 4,
"0": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
{
"length": 4,
"2": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
Placeholder list
- {PL} - for inject value
- {INDEX} - for inject index in parent array
- {HUMAN_INDEX} - for inject index in parent array in human way
- {{{json_path}}} - will get information from propagated "object"/"array" field
- {{{RefName=SomeName}}} - get reference value by name. Example
- {{{RefName=SomeName json.path}}} - get reference value by name and extract value by json path. Example
- {{{FromEnv=ENV_KEY}}} - get value from environment variable
- {{{FromExp=fRes + 5 + fIndex}}} - get value from the expression. Predefined values
- {{{FromInput=.}}} or {{{FromInput=json.path}}} - get value from input of trigger or library
Examples:
{{{FromExp="{{{FromEnv=TEST_VAL}}}" + "hello"}}}
Current time is: {PL} with token from TokenRef={{{RefName=TokenRef}}} and TokenObjectRef={{{RefName=TokenObjectRef token}}}
Current time is: {PL} with token from TokenRef={{{RefName=TokenRef}}} and TokenObjectRef={{{RefName=TokenObjectRef token}}}
TokenRef={{{RefName=TokenRef}}} and TokenObjectRef={{{RefName=TokenObjectRef token}}} Object={{{value}}} {PL} Env={{{FromEnv=TEST_VAL}}} {INDEX} {HUMAN_INDEX}
References
Special map which prefetched(before any processing) and can be user for connector or for placeholder
Can be used for:
- Cache jwt token and use them in headers
- Cache values
- Etc
Reference
type Reference struct {
*ModelField
Expire uint32 `yaml:"expire" json:"expire"`
}
- ModelField - is embedded struct, you can use same fields
- Expire[sec] - duration when reference is expired after fetching. Not set => forever cached. Set to 0 => every time re-fetch. Set to n > 0 => cached for n second
For Fitter
type RefMap map[string]*Reference
type Config struct {
// Other Config Fields
Limits *Limits `yaml:"limits" json:"limits"`
References RefMap `json:"references" yaml:"references"`
}
For Fitter Cli
type RefMap map[string]*Reference
type CliItem struct {
// Other Config Fields
Limits *Limits `yaml:"limits" json:"limits"`
References RefMap `json:"references" yaml:"references"`
}
- References - map[string]*Reference - object where is key if ReferenceName (can be user for connector or placeholder) and value is Reference
- Limits
Example
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L2
{
"references": {
"TokenRef": {
"expire": 10,
"connector_config": {
"response_type": "json",
"static_config": {
"value": "\"plain token\""
}
},
"model": {
"base_field": {
"type": "string"
}
}
},
"TokenObjectRef": {
"connector_config": {
"response_type": "json",
"static_config": {
"value": "{\"token\":\"token from object\"}"
}
},
"model": {
"object_config": {
"fields": {
"token": {
"base_field": {
"type": "string",
"path": "token"
}
}
}
}
}
}
}
}
Limits
Provide limitation for prevent DDOS, big usage of memory
type Limits struct {
HostRequestLimiter HostRequestLimiter `yaml:"host_request_limiter" json:"host_request_limiter"`
ChromiumInstance uint32 `yaml:"chromium_instance" json:"chromium_instance"`
DockerContainers uint32 `yaml:"docker_containers" json:"docker_containers"`
PlaywrightInstance uint32 `yaml:"playwright_instance" json:"playwright_instance"`
}
- HostRequestLimiter - map[string]int64 - limitation per host name, key is host, value is amount of parallel request(usage for server connector)
- ChromiumInstance - amount of parallel chromium instance
- DockerContainers - amount of parallel docker instance
- PlaywrightInstance - amount of parallel playwright instance
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L2
{
"limits": {
"host_request_limiter": {
"hacker-news.firebaseio.com": 5
},
"chromium_instance": 3,
"docker_containers": 3,
"playwright_instance": 3
}
}
Roadmap
- Add browser scenario for preparing, after parsing
- Add scrolling support for scenario
- Add pagination support for scenario
- Add notification methods for Fitter: Webhook/Queue