http
A generic extractor that uses the HTTP response from an external API to
construct assets of the following types:
If the API call is successful, the user-specified script has access to the
response and can use it to construct and emit assets. Currently, Tengo is the
only supported script engine.
Refer to the Tengo documentation for the script language syntax and supported
functionality: https://github.com/d5/tengo/tree/v2.13.0#references.
Tengo standard library modules can also be imported and used if required
(except the `os` module).
Usage
```yaml
source:
  scope: gotocompany
  type: http
  config:
    request:
      route_pattern: "/api/v1/endpoint"
      url: "https://example.com/api/v1/endpoint"
      query_params:
        - key: param_key
          value: param_value
      method: "POST"
      headers:
        "User-Id": "1a4336bc-bc6a-4972-83c1-d6426b4d79c3"
      content_type: application/json
      accept: application/json
      body:
        key: value
      timeout: 5s
    success_codes: [ 200 ]
    concurrency: 3
    script:
      engine: tengo
      source: |
        asset := new_asset("user")
        // modify the asset using 'response'...
        emit(asset)
```
| Key | Type | Example | Description | Required? |
|-----|------|---------|-------------|-----------|
| `request` | Object | see Request | The configuration for constructing and sending the HTTP request. | ✅ |
| `success_codes` | []int | `[200]` | The list of status codes considered a successful response. Default is `[200]`. | ✘ |
| `concurrency` | int | `5` | Number of concurrent child requests (made via `execute_request`) to execute. Default is `5`. | ✘ |
| `script.engine` | string | `tengo` | Script engine. Only `"tengo"` is currently supported. | ✅ |
| `script.source` | string | see Worked Example | Tengo script used to map the response into 0 or more assets. | ✅ |
| `script.max_allocs` | int | `10000` | The maximum number of object allocations allowed while the script runs. Default is `5000`. | ✘ |
| `script.max_const_objects` | int | `1000` | The maximum number of constant objects in the compiled script. Default is `500`. | ✘ |
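To make the defaults and the role of `success_codes` concrete, here is a minimal Go sketch. The `Config` type, its field names and the helper functions are hypothetical, chosen for illustration; this is not the extractor's own code, and treating a zero or negative value as "unset" is an assumption of the sketch.

```go
package main

import "fmt"

// Config models the optional top-level fields from the table above.
// The struct and field names are hypothetical, for illustration only.
type Config struct {
	SuccessCodes    []int
	Concurrency     int
	MaxAllocs       int64
	MaxConstObjects int
}

// withDefaults fills any unset field with its documented default.
// Assumption: zero or negative values count as "unset".
func withDefaults(c Config) Config {
	if len(c.SuccessCodes) == 0 {
		c.SuccessCodes = []int{200}
	}
	if c.Concurrency <= 0 {
		c.Concurrency = 5
	}
	if c.MaxAllocs <= 0 {
		c.MaxAllocs = 5000
	}
	if c.MaxConstObjects <= 0 {
		c.MaxConstObjects = 500
	}
	return c
}

// shouldRunScript reports whether a response status code is listed in
// success_codes; the script only runs when this returns true.
func shouldRunScript(c Config, status int) bool {
	for _, code := range c.SuccessCodes {
		if code == status {
			return true
		}
	}
	return false
}

func main() {
	cfg := withDefaults(Config{Concurrency: 3})
	fmt.Println(cfg.SuccessCodes, cfg.Concurrency, cfg.MaxAllocs, cfg.MaxConstObjects)
	fmt.Println(shouldRunScript(cfg, 200), shouldRunScript(cfg, 404))
}
```

With only `concurrency: 3` set, as in the Usage recipe, every other field falls back to its default, and a `404` response would not reach the script.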
Request
| Key | Type | Example | Description | Required? |
|-----|------|---------|-------------|-----------|
| `route_pattern` | string | `/api/v1/endpoint` | The route pattern to report as the `http.route` tag in metrics. | ✅ |
| `url` | string | `http://example.com/api/v1/endpoint` | The HTTP endpoint to send the request to. | ✅ |
| `query_params` | []{key, value} | `[{"key":"s","value":"One Piece"}]` | The query parameters to add to the request URL. | ✘ |
| `method` | string | `GET`/`POST` | The HTTP method to use for the request. Default is `GET`. | ✘ |
| `headers` | map[string]string | `{"Api-Token": "..."}` | Headers to send in the HTTP request. | ✘ |
| `content_type` | string | `application/json` | Content type used to encode the request body. Also sent as the `Content-Type` header. | ✅ |
| `accept` | string | `application/json` | Sent as the `Accept` header. Also determines the format used to decode the response body. | ✅ |
| `body` | Object | `{"key": "value"}` | The request body to send. | ✘ |
| `timeout` | string | `1s` | Timeout for the HTTP request. Default is `5s`. | ✘ |
Notes
- In case of conflicts between query parameters present in `request.url` and
  `request.query_params`, `request.query_params` takes precedence.
- Currently, only `application/json` is supported for encoding the request
  body and decoding the response body. If `Content-Type` and `Accept` headers
  are added under `request.headers`, they are ignored and overridden.
- The script is only executed if the response status code matches one of the
  provided `success_codes`.
- Tengo is the only supported script engine.
- Tengo's `os` stdlib module cannot be imported and used in the script.
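The query-parameter precedence rule in the first note can be illustrated with Go's `net/url` package. This is a sketch, not the extractor's code: `QueryParam` and `buildURL` are hypothetical names, and using `Values.Set` (which replaces existing values, so the last duplicate key in `query_params` wins) is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// QueryParam mirrors one {key, value} entry of request.query_params.
type QueryParam struct {
	Key   string
	Value string
}

// buildURL merges query_params into the query string already present in
// rawURL. Set replaces existing values for a key, so a query_params entry
// wins over a parameter with the same name in the URL.
func buildURL(rawURL string, params []QueryParam) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, p := range params {
		q.Set(p.Key, p.Value)
	}
	u.RawQuery = q.Encode() // Encode sorts keys alphabetically
	return u.String(), nil
}

func main() {
	got, err := buildURL("https://example.com/api/v1/endpoint?s=Naruto&page=2",
		[]QueryParam{{Key: "s", Value: "One Piece"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Here `s=Naruto` from the URL is replaced by `s=One Piece` from `query_params`, while `page=2` is kept.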
Script Globals
recipe_scope
The value of the scope specified in the recipe (string).
With the following example recipe:

```yaml
source:
  scope: integration
  type: http
  config:
    #...
```

The value of `recipe_scope` will be `integration`.
response
The HTTP response received, with the `status_code`, `header` and `body` fields.
For example:
```json
{
  "status_code": "200",
  "header": {
    "link": "</products?page=5&perPage=20>;rel=self,</products?page=0&perPage=20>;rel=first,</products?page=4&perPage=20>;rel=previous,</products?page=6&perPage=20>;rel=next,</products?page=26&perPage=20>;rel=last"
  },
  "body": [
    {
      "id": 1,
      "name": "Widget #1"
    },
    {
      "id": 2,
      "name": "Widget #2"
    },
    {
      "id": 3,
      "name": "Widget #3"
    }
  ]
}
```
Header names are always lowercase. See the Worked Example for detailed usage.
new_asset(string): Asset
Takes a single string parameter and returns an asset instance. The type
parameter can be one of the following:
The asset can then be modified in the script to set the properties that are
available for the given asset type.
WARNING: Do not overwrite the `data` property; set fields on it instead.
Otherwise, translating the script object into a proto fails.
```
// Bad
asset.data = {full_name: "Daiyamondo Jozu"}

// Good
asset.data.full_name = "Daiyamondo Jozu"
```
emit(Asset)
Takes an asset and emits it so that it can be consumed by the
processors and sinks.
execute_request(...requests): []Response
Takes one or more requests and executes them with the concurrency defined in
the recipe. The results are returned as an array; each item in the array is
either an error or the HTTP response. The request object supports the
properties defined in the Request section above.
A request can fail due to transient errors, such as network errors. The script
must handle these cases.
```
if !response.body.success {
  exit()
}

reqs := []
for j in response.body.jobs {
  reqs = append(reqs, {
    url: format("http://my.server.com/jobs/%s/config", j.id),
    method: "GET",
    content_type: "application/json",
    accept: "application/json",
    timeout: "5s"
  })
}

responses := execute_request(reqs...)
for r in responses {
  if is_error(r) {
    // TODO: Handle it appropriately. The error value has the request and
    // error string:
    // r.value.{request, error}
    continue
  }

  asset := new_asset("job")
  asset.name = r.body.name
  exec_cfg := r.body["execution-config"]
  asset.data.attributes = {
    "job_id": r.body.jid,
    "job_parallelism": exec_cfg["job-parallelism"],
    "config": exec_cfg["user-config"]
  }
  emit(asset)
}
```
If the request passed to the function fails validation, a runtime error is
thrown.
exit()
Terminates the script execution.
Output
The output of the extractor depends on the user-specified script. It can emit
zero or more assets.
Worked Example
Let's consider a service that returns a list of users when a GET request is
made to the endpoint `http://my_user_service.company.com/api/v1/users`, in the
following format:
```json
{
  "success": "<bool>",
  "message": "<string>",
  "data": [
    {
      "manager_name": "<string>",
      "terminated": "<string: true/false>",
      "fullname": "<string>",
      "location_name": "<string>",
      "work_email": "<string: email>",
      "supervisory_org_id": "<string>",
      "supervisory_org_name": "<string>",
      "preferred_last_name": "<string>",
      "business_title": "<string>",
      "company_name": "<string>",
      "cost_center_id": "<string>",
      "preferred_first_name": "<string>",
      "product_name": "<string>",
      "cost_center_name": "<string>",
      "employee_id": "<string>",
      "manager_id": "<string>",
      "location_id": "<string: ID/IN>",
      "manager_id_2": "<string>",
      "termination_date": "<string: YYYY-MM-DD>",
      "company_hierarchy": "<string>",
      "company_id": "<string>",
      "preferred_middle_name": "<string>",
      "preferred_social_suffix": "<string>",
      "legal_middle_name": "<string>",
      "manager_email_2": "<string: email>",
      "legal_first_name": "<string>",
      "manager_name_2": "<string>",
      "manager_email": "<string: email>",
      "legal_last_name": "<string>"
    }
  ]
}
```
Assuming authentication can be done using an `Api-Token` header, we can use
the following recipe:
```yaml
source:
  scope: production
  type: http
  config:
    request:
      url: "http://my_user_service.company.com/api/v1/users"
      method: "GET"
      headers:
        "Api-Token": "1a4336bc-bc6a-4972-83c1-d6426b4d79c3"
      content_type: application/json
      accept: application/json
      timeout: 5s
    success_codes: [ 200 ]
    script:
      engine: tengo
      source: |
        if !response.body.success {
          exit()
        }

        users := response.body.data
        for u in users {
          if u.work_email == "" {
            continue
          }

          asset := new_asset("user")
          // URN format: "urn:{service}:{scope}:{type}:{id}"
          asset.urn = format("urn:%s:%s:user:%s", "my_usr_svc", recipe_scope, u.employee_id)
          asset.name = u.fullname
          asset.service = "my_usr_svc"
          // asset.type = "user" // not required, new_asset("user") sets the field.
          asset.data.email = u.work_email
          asset.data.username = u.employee_id
          asset.data.first_name = u.legal_first_name
          asset.data.last_name = u.legal_last_name
          asset.data.full_name = u.fullname
          asset.data.display_name = u.fullname
          asset.data.title = u.business_title
          asset.data.status = u.terminated == "true" ? "suspended" : "active"
          asset.data.manager_email = u.manager_email
          asset.data.attributes = {
            manager_id: u.manager_id,
            cost_center_id: u.cost_center_id,
            supervisory_org_name: u.supervisory_org_name,
            location_id: u.location_id,
            service_job_id: response.header["x-job-id"]
          }
          emit(asset)
        }
```
This emits a `user` asset for each user object in `response.body.data` (users
without an email are skipped). Note that the response headers can be accessed
under `response.header` and used as needed.
Caveats
The following features are currently not supported:
- Explicit authentication support, e.g. Basic auth, OAuth, OAuth2, JWT.
- Retries with configurable backoff.
- Content types other than `application/json` for the request/response body.
Contributing
Refer to
the contribution guidelines
for information on contributing to this module.