maxcompute
Usage
The maxcompute extractor allows you to extract metadata from MaxCompute tables and schemas.
It supports configuration for project name, endpoint, access keys, schema name, exclusions, and concurrency.
source:
  name: maxcompute
  config:
    project_name: goto_test
    endpoint_project: http://goto_test-maxcompute.com
    access_key:
      id: access_key_id
      secret: access_key_secret
    schema_name: DEFAULT
    exclude:
      schemas:
        - schema_a
        - schema_b
      tables:
        - schema_c.table_a
    concurrency: 10
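Here is a minimal Python sketch showing one way the two exclude lists could be interpreted. It rests on assumptions the extractor's source does not confirm here: exclude.schemas drops every table in a listed schema, exclude.tables entries use the schema.table form shown in the sample, and matching is exact and case-sensitive. The helper name is_excluded and the table names "orders" and "table_b" are hypothetical.

```python
from typing import Iterable


def is_excluded(schema: str, table: str,
                excluded_schemas: Iterable[str],
                excluded_tables: Iterable[str]) -> bool:
    """Hypothetical helper: return True if schema.table should be skipped.

    Assumption (not confirmed by the extractor): exact, case-sensitive
    matching, with exclude.tables entries written as "schema.table".
    """
    # A listed schema excludes every table inside it.
    if schema in set(excluded_schemas):
        return True
    # Otherwise, check the fully qualified table name.
    return f"{schema}.{table}" in set(excluded_tables)


# Exclusion lists copied from the sample config above.
EXCLUDED_SCHEMAS = ["schema_a", "schema_b"]
EXCLUDED_TABLES = ["schema_c.table_a"]

print(is_excluded("schema_a", "orders", EXCLUDED_SCHEMAS, EXCLUDED_TABLES))   # True
print(is_excluded("schema_c", "table_a", EXCLUDED_SCHEMAS, EXCLUDED_TABLES))  # True
print(is_excluded("schema_c", "table_b", EXCLUDED_SCHEMAS, EXCLUDED_TABLES))  # False
```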
| Key | Type | Example | Description | Required |
| --- | --- | --- | --- | --- |
| project_name | string | goto_test | MaxCompute project name | required |
| endpoint_project | string | http://goto_test-maxcompute.com | Endpoint URL of the MaxCompute project | required |
| access_key.id | string | access_key_id | Access key ID | required |
| access_key.secret | string | access_key_secret | Access key secret | required |
| schema_name | string | DEFAULT | Default schema name | optional |
| exclude.schemas | []string | ["schema_a", "schema_b"] | List of schemas to exclude | optional |
| exclude.tables | []string | ["schema_c.table_a"] | List of tables to exclude, in schema.table form | optional |
| concurrency | int | 10 | Number of concurrent requests to MaxCompute | optional |
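You can check the required and optional settings before a run. The sketch below is a hypothetical pre-flight check, not part of the extractor. It takes the sample recipe's config section as an already-parsed Python dict, leaving out YAML parsing so the example stays within the standard library. It reports any required key from the table that is missing or empty. It also assumes, without confirmation from the extractor, that concurrency must be a positive integer when it is set.

```python
from typing import Any, Dict, List

# Keys marked "required" in the configuration table, as dotted paths.
REQUIRED_KEYS = ["project_name", "endpoint_project", "access_key.id", "access_key.secret"]


def lookup(config: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as "access_key.id" through nested dicts; None if absent."""
    node: Any = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def config_errors(config: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; not the extractor's own validation."""
    errors = [f"missing required key: {key}"
              for key in REQUIRED_KEYS if lookup(config, key) in (None, "")]
    # Assumption: concurrency, when present, must be a positive integer.
    concurrency = config.get("concurrency")
    if concurrency is not None and (type(concurrency) is not int or concurrency < 1):
        errors.append("concurrency must be a positive integer")
    return errors


# The sample recipe's config section, already parsed into a dict.
sample = {
    "project_name": "goto_test",
    "endpoint_project": "http://goto_test-maxcompute.com",
    "access_key": {"id": "access_key_id", "secret": "access_key_secret"},
    "schema_name": "DEFAULT",
    "exclude": {"schemas": ["schema_a", "schema_b"], "tables": ["schema_c.table_a"]},
    "concurrency": 10,
}

print(config_errors(sample))  # []
```

Checking for empty strings as well as absent keys is a deliberate choice, because an empty access key is as unusable as a missing one.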
Notes
Outputs
| Field | Sample Value | Description |
| --- | --- | --- |
| resource.urn | project_name.schema_name.table_name | Resource URN |
| resource.name | table_name | Table name |
| resource.service | maxcompute | Source service |
| description | table description | Table description |
| schema | []Column | List of the table's columns |
| properties.partition_data | "partition_data": {"partition_field": "data_date", "require_partition_filter": false, "time_partition": {"partition_by": "DAY", "partition_expire": 0}} | Partition-related data for time and range partitioning |
| properties.partition_field | created_at | Field on which the table is time-partitioned |
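The sketch below shows how the output fields fit together by assembling one table record as a plain dict. The function names and the table name "orders" are hypothetical, and the properties fields are left out for brevity. The URN follows the project_name.schema_name.table_name sample shown above, but the extractor's actual URN format and record type may differ.

```python
from typing import Any, Dict, List


def table_urn(project_name: str, schema_name: str, table_name: str) -> str:
    """Build a URN in the sample form project_name.schema_name.table_name."""
    return f"{project_name}.{schema_name}.{table_name}"


def build_table_record(project_name: str, schema_name: str, table_name: str,
                       description: str, columns: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical record shape mirroring the Outputs table (not the real type)."""
    return {
        "resource": {
            "urn": table_urn(project_name, schema_name, table_name),
            "name": table_name,
            "service": "maxcompute",
        },
        "description": description,
        "schema": columns,
    }


# "orders" is a made-up table name used only for illustration.
record = build_table_record(
    "goto_test", "DEFAULT", "orders", "table description",
    [{"name": "total_price", "data_type": "decimal", "is_nullable": True}],
)
print(record["resource"]["urn"])  # goto_test.DEFAULT.orders
```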
Partition Data
| Field | Sample Value | Description |
| --- | --- | --- |
| partition_field | created_at | Field on which the table is partitioned, by either TimePartitioning or RangePartitioning. If the field is empty for TimePartitioning, _PARTITIONTIME is returned instead. |
| require_partition_filter | true | Whether every query on the MaxCompute table must include at least one predicate that references only the partitioning column |
| time_partition.partition_by | HOUR | Partition type: HOUR, DAY, MONTH, or YEAR |
| time_partition.partition_expire_seconds | 0 | Time, in seconds, after which data in this partition expires. A value of 0 means it never expires. |
| range_partition.interval | 10 | Width of each range interval |
| range_partition.start | 0 | Start of the partition range (inclusive) |
| range_partition.end | 100 | End of the partition range (exclusive) |
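The range, expiry, and fallback rules in the table come down to simple arithmetic. The following Python sketch makes three assumptions. It maps a value to the lower bound of its range partition, with start inclusive, end exclusive, and buckets of width interval. It treats partition_expire_seconds == 0 as never expiring. It substitutes _PARTITIONTIME for an empty time-partition field. All function names are hypothetical. Treating data as expired once its age reaches the limit is an assumption about the boundary, not documented behavior.

```python
from typing import Optional


def range_partition_start(value: int, start: int, end: int, interval: int) -> Optional[int]:
    """Return the lower bound of the range partition holding value, or None.

    Assumes buckets [start, start + interval), [start + interval, ...) up to
    end, with start inclusive and end exclusive as the table describes.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not (start <= value < end):
        return None  # outside the partitioned range
    return start + ((value - start) // interval) * interval


def is_expired(age_seconds: int, partition_expire_seconds: int) -> bool:
    """0 means never expire; the >= boundary is an assumption."""
    return partition_expire_seconds > 0 and age_seconds >= partition_expire_seconds


def reported_partition_field(field: str, time_partitioned: bool) -> str:
    """Mirror the documented fallback: empty time-partition field -> _PARTITIONTIME."""
    return "_PARTITIONTIME" if time_partitioned and not field else field


# Sample range from the table: start=0, end=100, interval=10.
print(range_partition_start(37, 0, 100, 10))   # 30
print(range_partition_start(100, 0, 100, 10))  # None
print(is_expired(10**9, 0))                    # False
```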
Column
| Field | Sample Value |
| --- | --- |
| name | total_price |
| description | item's total price |
| data_type | decimal |
| is_nullable | true |
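Code that processes the schema list can map each Column entry onto a small record type. The following is a hypothetical Python mirror of the four fields above, not a type exported by the extractor. The nullable_columns helper is likewise illustrative only.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Column:
    """Hypothetical Python mirror of the Column fields listed above."""
    name: str
    description: str
    data_type: str
    is_nullable: bool


def nullable_columns(columns: List[Column]) -> List[str]:
    """Illustrative helper: names of columns that allow NULL values."""
    return [c.name for c in columns if c.is_nullable]


price = Column(name="total_price", description="item's total price",
               data_type="decimal", is_nullable=True)
print(asdict(price))
print(nullable_columns([price]))  # ['total_price']
```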
Join
| Field | Sample Value |
| --- | --- |
| urn | project_name.schema_name.table_name |
| count | 3 |
| conditions | ["ON target.column_1 = source.column_1 and target.param_name = source.param_name", "ON DATE(target.event_timestamp) = DATE(source.event_timestamp)"] |
Contributing
Refer to the contribution guidelines for information on contributing to this module.