awssolutionsconstructsawskinesisstreamsgluejob

Version: v2.2.0
Published: Jan 1, 2022 License: Apache-2.0 Imports: 10 Imported by: 0

README

aws-kinesisstreams-gluejob module


All classes are under active development and subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.


Reference Documentation: https://docs.aws.amazon.com/solutions/latest/constructs/
Language Package
Python aws_solutions_constructs.aws_kinesis_streams_gluejob
TypeScript @aws-solutions-constructs/aws-kinesisstreams-gluejob
Java software.amazon.awsconstructs.services.kinesisstreamsgluejob

This AWS Solutions Construct deploys a Kinesis Stream and configures an AWS Glue Job to perform custom ETL transformation with the appropriate resources/properties for interaction and security. It also creates an S3 bucket where the Python script for the AWS Glue Job can be uploaded.

Here is a minimal deployable pattern definition in TypeScript:

import * as glue from '@aws-cdk/aws-glue';
import * as s3assets from '@aws-cdk/aws-s3-assets';
import {KinesisstreamsToGluejob} from '@aws-solutions-constructs/aws-kinesisstreams-gluejob';

const fieldSchema: glue.CfnTable.ColumnProperty[] = [
    {
        name: 'id',
        type: 'int',
        comment: 'Identifier for the record',
    },
    {
        name: 'name',
        type: 'string',
        comment: 'Name for the record',
    },
    {
        name: 'address',
        type: 'string',
        comment: 'Address for the record',
    },
    {
        name: 'value',
        type: 'int',
        comment: 'Value for the record',
    },
];

const customEtlJob = new KinesisstreamsToGluejob(this, 'CustomETL', {
    glueJobProps: {
        command: {
            name: 'gluestreaming',
            pythonVersion: '3',
            scriptLocation: new s3assets.Asset(this, 'ScriptLocation', {
                path: `${__dirname}/../etl/transform.py`,
            }).s3ObjectUrl,
        },
    },
    fieldSchema: fieldSchema,
});
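Because the Glue table schema is meant to mirror the JSON structure of the records in the stream, a column list like the `fieldSchema` above can be derived from a sample record. The following is a minimal sketch and not part of the construct: `ColumnSketch`, `glueTypeOf`, and `inferFieldSchema` are hypothetical names, and the type mapping (integers to `int`, other numbers to `double`, booleans to `boolean`, everything else to `string`) is an assumption that a real schema may need to refine.

```typescript
// Hypothetical stand-in for glue.CfnTable.ColumnProperty (name, type, comment),
// declared locally so the sketch runs without the CDK installed.
interface ColumnSketch {
    name: string;
    type: string;
    comment?: string;
}

// Map one sample JSON value to a column type name.
// Assumed mapping for illustration only; nested values fall back to 'string'.
function glueTypeOf(value: unknown): string {
    if (typeof value === 'number') {
        return Math.floor(value) === value ? 'int' : 'double';
    }
    if (typeof value === 'boolean') {
        return 'boolean';
    }
    return 'string';
}

// Build a column list from the top-level keys of a sample record.
function inferFieldSchema(sample: { [key: string]: unknown }): ColumnSketch[] {
    return Object.keys(sample).map((name) => ({
        name,
        type: glueTypeOf(sample[name]),
    }));
}

const sampleRecord = { id: 1, name: 'alpha', address: '1 Main St', value: 42 };
const inferred = inferFieldSchema(sampleRecord);
console.log(inferred.map((c) => `${c.name}:${c.type}`).join(', '));
```

The result has the same shape as the hand-written `fieldSchema`, minus the `comment` fields, which still have to be written by hand.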

Initializer

new KinesisstreamsToGluejob(scope: Construct, id: string, props: KinesisstreamsToGluejobProps);

Parameters

Pattern Construct Props

Name Type Description
existingStreamObj? kinesis.Stream Existing instance of Kinesis Stream. Providing both this and kinesisStreamProps will cause an error.
kinesisStreamProps? kinesis.StreamProps Optional user-provided props to override the default props for the Kinesis stream.
glueJobProps? cfnJob.CfnJobProps User provided props to override the default props for the AWS Glue Job.
existingGlueJob? cfnJob.CfnJob Existing instance of AWS Glue Job. Providing both this and glueJobProps will cause an error.
fieldSchema? CfnTable.ColumnProperty[] User provided schema structure to create an AWS Glue Table.
existingTable? CfnTable Existing instance of AWS Glue Table. If this is set, tableProps and fieldSchema are ignored.
tableProps? CfnTableProps User provided AWS Glue Table props to override default props used to create a Glue Table.
existingDatabase? CfnDatabase Existing instance of AWS Glue Database. If this is set, then databaseProps is ignored.
databaseProps? CfnDatabaseProps User provided Glue Database Props to override the default props used to create the Glue Database.
outputDataStore? SinkDataStoreProps User provided properties for the S3 bucket that stores the Glue Job output. Currently, the only supported datastore type is S3.
createCloudWatchAlarms? boolean Whether to create recommended CloudWatch alarms for Kinesis Data Stream. Default value is set to true.

SinkDataStoreProps

Name Type Description
existingS3OutputBucket? Bucket Existing instance of S3 bucket where the data should be written. Providing both this and outputBucketProps will cause an error.
outputBucketProps BucketProps User provided bucket properties to create the S3 bucket to store the output from the AWS Glue Job.
datastoreType SinkStoreType Sink data store type.

SinkStoreType

Enumeration of data store types that could include S3, DynamoDB, DocumentDB, RDS, or Redshift. The current construct implementation supports only S3; other output types may be added in the future.

Name Type Description
S3 string S3 storage type
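
The props tables above name three pairs that cannot be combined: existingStreamObj with kinesisStreamProps, existingGlueJob with glueJobProps, and existingS3OutputBucket with outputBucketProps. A minimal sketch of that kind of check follows. It is illustrative only and is not the construct's actual implementation: `PropsSketch` and `findConflicts` are hypothetical names, and the prop values are typed as `unknown` so the snippet runs without the CDK installed.

```typescript
// Hypothetical, CDK-free view of the props that take part in the rules above.
interface PropsSketch {
    existingStreamObj?: unknown;
    kinesisStreamProps?: unknown;
    existingGlueJob?: unknown;
    glueJobProps?: unknown;
    outputDataStore?: {
        existingS3OutputBucket?: unknown;
        outputBucketProps?: unknown;
    };
}

// Return one message per pair of props that the tables say must not be combined.
function findConflicts(props: PropsSketch): string[] {
    const conflicts: string[] = [];
    if (props.existingStreamObj !== undefined && props.kinesisStreamProps !== undefined) {
        conflicts.push('existingStreamObj and kinesisStreamProps');
    }
    if (props.existingGlueJob !== undefined && props.glueJobProps !== undefined) {
        conflicts.push('existingGlueJob and glueJobProps');
    }
    const sink = props.outputDataStore;
    if (sink !== undefined
        && sink.existingS3OutputBucket !== undefined
        && sink.outputBucketProps !== undefined) {
        conflicts.push('existingS3OutputBucket and outputBucketProps');
    }
    return conflicts;
}

// A props object that sets both halves of the Glue Job pair.
const conflicting = findConflicts({ existingGlueJob: {}, glueJobProps: {} });
console.log(conflicting);
```

Running a check like this before instantiating the construct surfaces a conflicting configuration early, instead of at synthesis time.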

Pattern Properties

Name Type Description
kinesisStream kinesis.Stream Returns an instance of the Kinesis stream created or used by the pattern.
glueJob CfnJob Returns an instance of AWS Glue Job created by the construct.
glueJobRole iam.Role Returns an instance of the IAM Role created by the construct for the Glue Job.
database CfnDatabase Returns an instance of AWS Glue Database created by the construct.
table CfnTable Returns an instance of the AWS Glue Table created by the construct.
outputBucket? s3.Bucket Returns an instance of the output bucket created by the construct for the AWS Glue Job.
cloudwatchAlarms? cloudwatch.Alarm[] Returns an array of recommended CloudWatch Alarms created by the construct for Kinesis Data stream.

Default settings

Without any overrides, the out-of-the-box implementation of the construct sets the following defaults:

Amazon Kinesis Stream
  • Configure least privilege access IAM role for Kinesis Stream
  • Enable server-side encryption for Kinesis Stream using AWS Managed KMS Key
  • Deploy best practices CloudWatch Alarms for the Kinesis Stream
Glue Job
  • Create a Glue Security Config that configures encryption for CloudWatch, Job Bookmarks, and S3. CloudWatch and Job Bookmarks are encrypted using AWS Managed KMS Key created for AWS Glue Service. The S3 bucket is configured with SSE-S3 encryption mode
  • Configure service role policies that allow AWS Glue to read from Kinesis Data Streams
Glue Database
  • Create an AWS Glue database. An AWS Glue Table will be added to the database. This table defines the schema for the records buffered in the Amazon Kinesis Data Stream
Glue Table
  • Create an AWS Glue table. The table schema definition is based on the JSON structure of the records buffered in the Amazon Kinesis Data Stream
IAM Role
  • A job execution role that has privileges to 1) read the ETL script from the S3 bucket location, 2) read records from the Kinesis Stream, and 3) execute the Glue Job
Output S3 Bucket
  • An S3 bucket to store the output of the ETL transformation. This bucket is passed as an argument to the created Glue Job so that the ETL script can write data to it
CloudWatch Alarms
  • A CloudWatch Alarm to report when the consumer application is reading data slower than expected
  • A CloudWatch Alarm to report when consumer record processing is falling behind (to avoid risk of data loss due to record expiration)
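
The second alarm guards against records expiring before they are read. The reasoning can be sketched as simple arithmetic: compare the age of the oldest unread record (reported by Kinesis as the `GetRecords.IteratorAgeMilliseconds` metric) with the stream's retention period, which defaults to 24 hours. This is a sketch under stated assumptions, not the alarm definition the construct deploys: `isFallingBehind` is a hypothetical helper, and the 50% `fraction` is an illustrative value.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// True when the oldest unread record has used up at least `fraction` of the
// retention window. The default fraction is an assumption for illustration.
function isFallingBehind(
    iteratorAgeMs: number,
    retentionHours: number = 24,
    fraction: number = 0.5,
): boolean {
    return iteratorAgeMs >= retentionHours * MS_PER_HOUR * fraction;
}

console.log(isFallingBehind(2 * MS_PER_HOUR));  // false: 2h is well under 12h
console.log(isFallingBehind(13 * MS_PER_HOUR)); // true: 13h exceeds 12h
```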

Architecture

Architecture Diagram

Reference Implementation

A sample use case which uses this pattern is available under use_cases/aws-custom-glue-etl.

© Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Documentation

Overview

CDK Constructs for streaming data from an Amazon Kinesis Data Stream to a custom AWS Glue ETL Job for processing

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func KinesisstreamsToGluejob_IsConstruct

func KinesisstreamsToGluejob_IsConstruct(x interface{}) *bool

Checks if `x` is a construct.

Returns: true if `x` is an object created from a class which extends `Construct`.

Deprecated: use `x instanceof Construct` instead.

func NewKinesisstreamsToGluejob_Override

func NewKinesisstreamsToGluejob_Override(k KinesisstreamsToGluejob, scope constructs.Construct, id *string, props *KinesisstreamsToGluejobProps)

Constructs a new instance of KinesisstreamsToGluejob based on the values set in @props.

Types

type KinesisstreamsToGluejob

type KinesisstreamsToGluejob interface {
	constructs.Construct
	CloudwatchAlarms() *[]awscloudwatch.Alarm
	Database() awsglue.CfnDatabase
	GlueJob() awsglue.CfnJob
	GlueJobRole() awsiam.IRole
	KinesisStream() awskinesis.Stream
	Node() constructs.Node
	OutputBucket() *map[string]interface{}
	Table() awsglue.CfnTable
	ToString() *string
}

func NewKinesisstreamsToGluejob

func NewKinesisstreamsToGluejob(scope constructs.Construct, id *string, props *KinesisstreamsToGluejobProps) KinesisstreamsToGluejob

Constructs a new instance of KinesisstreamsToGluejob based on the values set in @props.

type KinesisstreamsToGluejobProps

type KinesisstreamsToGluejobProps struct {
	// Whether to create recommended CloudWatch alarms.
	CreateCloudWatchAlarms *bool `json:"createCloudWatchAlarms"`
	// The props the construct should use to create the Glue database.
	//
	// If @database is set
	// then this property is ignored. If neither @database nor @databaseprops is provided, the
	// construct will define a GlueDatabase resource.
	DatabaseProps *awsglue.CfnDatabaseProps `json:"databaseProps"`
	// Glue Database for this construct.
	//
	// If not provided the construct will create a new Glue Database.
	// The database is where the schema for the data in Kinesis Data Streams is stored
	ExistingDatabase awsglue.CfnDatabase `json:"existingDatabase"`
	// Existing GlueJob configuration.
	//
	// If this property is provided, any properties provided through @glueJobProps are ignored
	ExistingGlueJob awsglue.CfnJob `json:"existingGlueJob"`
	// Existing instance of Kinesis Data Stream.
	//
	// If not set, it will create an instance
	ExistingStreamObj awskinesis.Stream `json:"existingStreamObj"`
	// Glue Table for this construct. If not provided, the construct will create a new Table in the database.
	//
	// This table should define the schema for the records in the Kinesis Data Streams.
	// One of @tableprops or @table or @fieldSchema is mandatory. If @tableprops is provided then
	// @table and @fieldSchema are ignored. If @table is provided, @fieldSchema is ignored
	ExistingTable awsglue.CfnTable `json:"existingTable"`
	// Structure of the records in the Amazon Kinesis Data Streams.
	//
	// An example of such a definition is shown below.
	// Either @table or @fieldSchema is mandatory. If @table is provided then @fieldSchema is ignored
	//   "FieldSchema": [{
	//     "name": "id",
	//     "type": "int",
	//     "comment": "Identifier for the record"
	//   }, {
	//     "name": "name",
	//     "type": "string",
	//     "comment": "The name of the record"
	//   }, {
	//     "name": "type",
	//     "type": "string",
	//     "comment": "The type of the record"
	//   }, {
	//     "name": "numericvalue",
	//     "type": "int",
	//     "comment": "Some value associated with the record"
	//   }]
	FieldSchema *[]*awsglue.CfnTable_ColumnProperty `json:"fieldSchema"`
	// User provides props to override the default props for Glue ETL Jobs.
	//
	// Providing both this and
	// existingGlueJob will cause an error.
	//
	// This parameter is defined as `any` to not enforce passing the Glue Job role which is a mandatory parameter
	// for CfnJobProps. If a role is not passed, the construct creates one for you and attaches the appropriate
	// role policies
	//
	// The default props will set the Glue Version 2.0, with 2 Workers and WorkerType as G.1X. For details on
	// defining a Glue Job, refer to the following documentation: https://docs.aws.amazon.com/glue/latest/webapi/API_Job.html
	GlueJobProps interface{} `json:"glueJobProps"`
	// User provided props to override the default props for the Kinesis Stream.
	KinesisStreamProps interface{} `json:"kinesisStreamProps"`
	// The output data stores where the transformed data should be written.
	//
	// Currently supported data stores
	// include only S3; other stores may be added in the future.
	OutputDataStore *awssolutionsconstructscore.SinkDataStoreProps `json:"outputDataStore"`
	// The table properties for the construct to create the table.
	//
	// One of @tableprops or @table
	// or @fieldSchema is mandatory. If @tableprops is provided then @table and @fieldSchema
	// are ignored. If @table is provided, @fieldSchema is ignored
	TableProps *awsglue.CfnTableProps `json:"tableProps"`
}
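
The comments on TableProps, ExistingTable, and FieldSchema describe a precedence among the three table inputs. A minimal sketch of that resolution follows, using the ordering stated in the TableProps comment (table props first, then an existing table, then the field schema). It is not the construct's implementation: `TableInputsSketch` and `resolveTableSource` are hypothetical names, and because the README props table instead describes existingTable as overriding tableProps, the ordering here is an assumption to verify against the construct's source.

```typescript
type TableSource = 'tableProps' | 'existingTable' | 'fieldSchema';

// Hypothetical, CDK-free view of the three table-related props.
interface TableInputsSketch {
    tableProps?: unknown;
    existingTable?: unknown;
    fieldSchema?: unknown[];
}

// Pick which input defines the Glue table, or fail if none is given.
// Ordering follows the TableProps doc comment (an assumption, see the lead-in).
function resolveTableSource(inputs: TableInputsSketch): TableSource {
    if (inputs.tableProps !== undefined) {
        return 'tableProps';
    }
    if (inputs.existingTable !== undefined) {
        return 'existingTable';
    }
    if (inputs.fieldSchema !== undefined) {
        return 'fieldSchema';
    }
    throw new Error('One of tableProps, existingTable, or fieldSchema must be provided');
}

console.log(resolveTableSource({ fieldSchema: [{ name: 'id', type: 'int' }] }));
```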

Directories

Path Synopsis
Package jsii contains the functionality needed for jsii packages to initialize their dependencies and themselves.
