postgres

package
v0.2.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 8, 2024 License: Apache-2.0 Imports: 7 Imported by: 0

README

PostgresWriter Plugin Documentation

Quick Introduction

The PostgresWriter plugin enables writing data to Postgres/Greenplum databases. Under the hood, PostgresWriter connects to remote Postgres/Greenplum databases using github.com/lib/pq and database/sql, executing corresponding SQL statements to write data into the Postgres/Greenplum database.

Implementation Principles

PostgresWriter connects to remote Postgres/Greenplum databases via github.com/lib/pq. It generates SQL statements for writing based on user-provided configuration information and go-etl's custom data types from the Reader. These statements are then sent to the remote Postgres/Greenplum database for execution.

Postgres/Greenplum implements specific queries by utilizing the query process defined in dbmswriter, calling go-etl's custom storage/database DBWrapper. DBWrapper encapsulates numerous interfaces from database/sql and abstracts the database dialect, Dialect. For Postgres/Greenplum, it adopts the Dialect implemented in storage/database/postgres.

Based on your configured writeMode, it generates either:

  • insert into... (which may fail to insert conflicting rows in case of primary key/unique index conflicts)

or

  • copy in ... which behaves similarly to insert into but offers faster performance. For optimal performance, data is buffered in memory and written only when the memory reaches a predefined threshold.

Functionality Description

Configuration Example

Configuring a job to synchronously write data to a Postgres/Greenplum database:

{
    "job":{
        "content":[
            {
               "writer":{
                    "name": "postgreswriter",
                    "parameter": {
                        "username": "postgres",
                        "password": "123456",
                        "writeMode": "insert",
                        "column": ["*"],
                        "session": [],
                        "preSql": [],
                        "connection":  {
                                "url": "postgres://192.168.15.130:5432/postgres?sslmode=disable",
                                "table": {
                                    "schema":"destination",
                                    "name":"type_table"
                                }
                         },
                        "preSql": ["create table a like b"],
                        "postSql": ["drop table a"],
                        "batchTimeout": "1s",
                        "batchSize":1000
                    }
               }
            }
        ]
    }
}
Parameter Description
url
  • Description: Primarily used to configure the connection information for the remote end. The basic format is: postgres://ip:port/db, where ip:port represents the IP address and port of the Postgres database, and db indicates the default database to connect to. It is similar to the connection configuration information of pq, except that the username and password are extracted from the connection configuration for easier encryption in the future. Unlike pq, read/write timeouts can be configured using readTimeout/writeTimeout in the same format as batchTimeout.
  • Required: Yes
  • Default: None
username
  • Description: Used to configure the username for the Postgres database.
  • Required: Yes
  • Default: None
password
  • Description: Used to configure the password for the Postgres database.
  • Required: Yes
  • Default: None
name

Describes the Postgres table information.

schema
  • Description: Primarily used to configure the schema name of the Postgres table.
  • Required: Yes
  • Default: None
table
  • Description: Primarily used to configure the table name of the Postgres table.
  • Required: Yes
  • Default: None
column
  • Description: A set of column names from the configured table that need to be synchronized, described using a JSON array. Users can use * to indicate that all columns should be used by default, for example, ["*"].

    Supports column pruning, allowing only selected columns to be exported.

    Supports column reordering, meaning columns can be exported in an order different from the table schema.

    Supports constant configuration. Users need to follow the PostgreSQL syntax format: ["id", "'hello'::varchar", "true", "2.5::real", "power(2,3)"] where id is a regular column name, 'hello'::varchar is a string constant, true is a boolean value, 2.5 is a floating-point number, and power(2,3) is a function.

  • Required: Yes

  • Default: None

writeMode
  • Description: Write mode. "insert" represents writing data using the insert into method, while "copyIn" represents writing data using the copy in method.
  • Required: No
  • Default: insert
batchTimeout
  • Description: Primarily used to configure the timeout interval for each batch write operation. The format is: number + unit, where the unit can be s for seconds, ms for milliseconds, or us for microseconds. If the specified time interval is exceeded, the data will be written directly. This parameter, along with batchSize, can be adjusted for optimal write performance.
  • Required: No
  • Default: 1s
batchSize
  • Description: Primarily used to configure the size of each batch write operation. If the specified size is exceeded, the data will be written directly. This parameter, along with batchTimeout, can be adjusted for optimal write performance.
  • Required: No
  • Default: 1000
preSql
  • Description: Primarily used for SQL statement groups executed before writing data. Do not use select statements as they will result in an error.
  • Required: No
  • Default: None
postSql
  • Description: Primarily used for SQL statement groups executed after writing data. Do not use select statements as they will result in an error.
  • Required: No
  • Default: None
Type Conversion

Currently, PostgresWriter supports most Postgres types, but there may be some individual types that are not supported. Please check your types accordingly.

Below is a conversion table for PostgresWriter with regards to Postgres types:

go-etl Type Postgres Data Type
bool boolean
bigInt bigint, bigserial, integer, smallint, serial, smallserial
decimal double precision, decimal, numeric, real
string varchar, text
time date, time, timestamp
bytes char

Performance Report

Pending testing.

Constraints and Limitations

Database Encoding Issues

Currently, only the utf8 character set is supported.

FAQ

(Frequently Asked Questions section to be added if applicable.)

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Job

type Job struct {
	*dbms.Job
}

Job

type Task

type Task struct {
	*dbms.Task
}

Task

func (*Task) StartWrite

func (t *Task) StartWrite(ctx context.Context, receiver plugin.RecordReceiver) (err error)

StartWrite

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer Writer

func (*Writer) Job

func (w *Writer) Job() spiwriter.Job

Job Job

func (*Writer) ResourcesConfig

func (w *Writer) ResourcesConfig() *config.JSON

ResourcesConfig Plugin Resource Configuration

func (*Writer) Task

func (w *Writer) Task() spiwriter.Task

Task Task

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL