datagen

package
v3.30.0
Published: Feb 1, 2023 License: Apache-2.0, Apache-2.0 Imports: 29 Imported by: 0

README

Datagen Tool

Help Usage

Usage of datagen:
  -c, --concurrency int             Number of concurrent sources and indexing routines to launch. (default 1)
      --dry-run                     Dry run - just flag parsing.
  -e, --end-at uint                 ID at which to stop generating records.
      --kafka.batch-size int        Number of records to generate before sending them to Kafka all at once. Generally, larger means better throughput and more memory usage. (default 1000)
      --kafka.hosts strings         Comma separated list of host:port pairs for Kafka. (default [])
      --kafka.registry-url string   Location of Confluent Schema Registry. Must start with 'https://' if you want to use TLS.
      --kafka.subject string        Kafka schema subject.
      --kafka.topic string          Kafka topic to post to.
      --pilosa.batch-size int       Number of records to read before indexing all of them at once. Generally, larger means better throughput and more memory usage. 1,048,576 might be a good number.
      --pilosa.cache-length uint    Number of batches of ID mappings to cache. (default 64)
      --pilosa.hosts strings        Comma separated list of host:port pairs for Pilosa. (default [])
      --pilosa.index string         Name of Pilosa index.
      --seed int                    Seed to use for any random number generation.
  -s, --source string               Source generator type. Running datagen with no arguments will list the available source types.
  -b, --start-from uint             ID at which to start generating records.
  -t, --target string               Destination for the generated data: [kafka, pilosa]. (default "pilosa")
      --track-progress              Periodically print status updates on how many records have been sourced.

Example Usage

The following command creates 100 records (IDs 0 through 99) in the Pilosa index named equipment, using the equipment data generator.

datagen --source=equipment --pilosa.index=equipment --end-at=99

Adding New Sources

TODO: redo README (or delete?)

If you're looking to add a new Source to datagen, the best thing to do is use the special "custom" datagen source (datagen --source=custom --custom-config=somefile.yaml) and write a somefile.yaml which describes the data you want to generate. An example can be found in datagen/testdata/custom.yaml, and there are some more in the molecula/technical-validation repo.

Documentation

Index

Constants

const (
	TargetFeaturebase = "featurebase"
	TargetKafka       = "kafka"
	TargetKafkaStatic = "kafkastatic"
	TargetServerless  = "serverless"
)

Variables

This section is empty.

Functions

func AddThousandSep

func AddThousandSep(num uint64) string
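
AddThousandSep has no doc comment here. The following is a minimal sketch of what a function with this signature might do, assuming it groups decimal digits in threes with a comma. The lowercase name addThousandSep is a hypothetical stand-in, not the package's implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// addThousandSep is a hypothetical stand-in for AddThousandSep.
// Assumption: the separator is a comma and digits are grouped in threes.
func addThousandSep(num uint64) string {
	s := strconv.FormatUint(num, 10)
	// Walk from the right, inserting a comma before each group of three digits.
	for i := len(s) - 3; i > 0; i -= 3 {
		s = s[:i] + "," + s[i:]
	}
	return s
}

func main() {
	fmt.Println(addThousandSep(1048576))
}
```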

Types

type AllFieldTypes

type AllFieldTypes struct{}

AllFieldTypes implements Sourcer, and returns a data set containing one of every field type.

func (*AllFieldTypes) DefaultEndAt

func (a *AllFieldTypes) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*AllFieldTypes) Info

func (a *AllFieldTypes) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*AllFieldTypes) PrimaryKeyFields

func (a *AllFieldTypes) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*AllFieldTypes) Source

func (a *AllFieldTypes) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.
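
The documentation does not say how the record space is divided between concurrent sources. As an illustration only, the sketch below splits an inclusive ID range into contiguous, near-equal chunks, one per source. The idRange type and the partition function are hypothetical names; the real package may partition differently.

```go
package main

import "fmt"

// idRange is a hypothetical inclusive range of record IDs.
type idRange struct{ Start, End uint64 }

// partition splits the inclusive range [start, end] into at most n
// contiguous chunks whose sizes differ by at most one.
// Assumption: the range does not span the full uint64 space.
func partition(start, end uint64, n int) []idRange {
	if n < 1 || end < start {
		return nil
	}
	total := end - start + 1
	if uint64(n) > total {
		n = int(total) // never create empty partitions
	}
	size := total / uint64(n)
	extra := total % uint64(n)
	out := make([]idRange, 0, n)
	cur := start
	for i := 0; i < n; i++ {
		length := size
		if uint64(i) < extra {
			length++ // spread the remainder over the first chunks
		}
		out = append(out, idRange{Start: cur, End: cur + length - 1})
		cur += length
	}
	return out
}

func main() {
	// Four sources sharing IDs 0 through 99.
	for i, r := range partition(0, 99, 4) {
		fmt.Printf("source %d: %d-%d\n", i, r.Start, r.End)
	}
}
```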

type AllFieldTypesSource

type AllFieldTypesSource struct {
	// contains filtered or unexported fields
}

AllFieldTypesSource is an instance of a source generated by the Sourcer implementation AllFieldTypes.

func (*AllFieldTypesSource) Close

func (a *AllFieldTypesSource) Close() error

func (*AllFieldTypesSource) Record

func (a *AllFieldTypesSource) Record() (idk.Record, error)

Record implements idk.Source.

func (*AllFieldTypesSource) Schema

func (a *AllFieldTypesSource) Schema() []idk.Field

Schema implements idk.Source.

type Bank

type Bank struct {
	// contains filtered or unexported fields
}

Bank implements Sourcer.

func (*Bank) DefaultEndAt

func (b *Bank) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*Bank) Info

func (b *Bank) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Bank) PrimaryKeyFields

func (b *Bank) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Bank) Source

func (b *Bank) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type BankSource

type BankSource struct {
	// contains filtered or unexported fields
}

BankSource is a data generator which generates data for all Pilosa field types.

func (*BankSource) ABA

func (b *BankSource) ABA() string

ABA returns a random 9 numeric digit string with about 27000 possible values.

func (*BankSource) Close

func (b *BankSource) Close() error

func (*BankSource) CustomAudiences

func (b *BankSource) CustomAudiences() string

CustomAudiences returns a fake Custom Audience string.

func (*BankSource) Db

func (b *BankSource) Db() string

Db returns a db.

func (*BankSource) KafkaRecord

func (b *BankSource) KafkaRecord() string

KafkaRecord might implement idk.KafkaSource. TODO: we will want to generate the Kafka record from the idk record. Automatic handling would be nice, but allowing per-source handling is probably important, e.g. to handle special cases for individual Kafka environments.

func (*BankSource) KafkaSchema

func (b *BankSource) KafkaSchema() string

KafkaSchema implements idk.KafkaSource.

func (*BankSource) Record

func (b *BankSource) Record() (idk.Record, error)

Record implements idk.Source.

func (*BankSource) Schema

func (b *BankSource) Schema() []idk.Field

Schema implements idk.Source.

func (*BankSource) UserID

func (b *BankSource) UserID() int

UserID returns a user ID.

type Claim

type Claim struct{}

Claim implements Sourcer.

func (*Claim) DefaultEndAt

func (c *Claim) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*Claim) Info

func (c *Claim) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Claim) PrimaryKeyFields

func (c *Claim) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Claim) Source

func (c *Claim) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type ClaimSource

type ClaimSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func (*ClaimSource) Close

func (s *ClaimSource) Close() error

func (*ClaimSource) Record

func (s *ClaimSource) Record() (idk.Record, error)

func (*ClaimSource) Schema

func (s *ClaimSource) Schema() []idk.Field

func (*ClaimSource) Seed

func (s *ClaimSource) Seed(seed int64)

type Custom

type Custom struct {
	CustomConfig    *CustomConfig
	IDKAndGenFields *IDKAndGenFields
	// contains filtered or unexported fields
}

Custom implements Sourcer.

func (*Custom) DefaultEndAt

func (_ *Custom) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface. Not used for Custom.

func (*Custom) Info

func (_ *Custom) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Custom) PrimaryKeyFields

func (c *Custom) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Custom) Source

func (c *Custom) Source(cfg SourceConfig) idk.Source

Source returns a configured Custom idk.Source. It does not currently have explicit support for concurrency or partitioned data generation as some of the other datagen sources do.

type CustomConfig

type CustomConfig struct {
	// Fields describe the data to be generated.
	Fields []*GenField `json:"fields"`

	// IDKParams configure how the generated data should be treated by
	// the IDK (and probably ultimately ingested into FeatureBase).
	IDKParams IDKParams `json:"idk_params"`
}

CustomConfig represents the JSON/yaml configuration for the "custom" datagen source.

func (*CustomConfig) GetIDKFields

func (cc *CustomConfig) GetIDKFields() (*IDKAndGenFields, error)

GetIDKFields figures out what IDK Fields should be created based on the generated data fields and the IDK configuration parameters. If multiple IDK fields are generated for a single genField, the corresponding entries beyond the first in IDKAndGenFields.genFields will be nil. We'll only create one generator and use the value it generates for all those fields in the generated record.

func (*CustomConfig) PostUnmarshal

func (c *CustomConfig) PostUnmarshal() error

PostUnmarshal does post-processing of the CustomConfig object after the basic JSON/yaml unmarshal, like converting time strings to durations. There's probably an elegant way to do this during the Unmarshal, but I couldn't figure it out.
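
A minimal sketch of this kind of post-processing step, assuming the duration strings use the syntax accepted by time.ParseDuration (the source does not state the format). The stepConfig type and its postUnmarshal method are hypothetical and only mirror the MinStepDuration and MaxStepDuration string fields of GenField.

```go
package main

import (
	"fmt"
	"time"
)

// stepConfig is a hypothetical, reduced stand-in for a config struct whose
// string fields are filled in by the JSON/yaml unmarshal.
type stepConfig struct {
	MinStepDuration string
	MaxStepDuration string

	// Parsed values, populated by postUnmarshal.
	minStep time.Duration
	maxStep time.Duration
}

// postUnmarshal converts the duration strings after the basic unmarshal.
// Assumption: the strings follow time.ParseDuration syntax, e.g. "1s" or "1h30m".
func (c *stepConfig) postUnmarshal() error {
	var err error
	if c.MinStepDuration != "" {
		if c.minStep, err = time.ParseDuration(c.MinStepDuration); err != nil {
			return fmt.Errorf("parsing min_step_duration: %w", err)
		}
	}
	if c.MaxStepDuration != "" {
		if c.maxStep, err = time.ParseDuration(c.MaxStepDuration); err != nil {
			return fmt.Errorf("parsing max_step_duration: %w", err)
		}
	}
	return nil
}

func main() {
	c := &stepConfig{MinStepDuration: "1s", MaxStepDuration: "1h30m"}
	if err := c.postUnmarshal(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.minStep, c.maxStep)
}
```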

type CustomSource

type CustomSource struct {
	// contains filtered or unexported fields
}

CustomSource is an instance of a source generated by the Sourcer implementation Custom.

func (*CustomSource) Close

func (c *CustomSource) Close() error

func (*CustomSource) Record

func (cs *CustomSource) Record() (idk.Record, error)

Record uses the configured generators to emit a custom record. If any of the generators are nil, it uses the value from the last non-nil generator.

func (*CustomSource) Schema

func (cs *CustomSource) Schema() []idk.Field

Schema returns the IDK fields which were determined from the data generation and ingest configuration parameters in the custom yaml file.

type Customer

type Customer struct{}

Customer implements Sourcer.

func (*Customer) DefaultEndAt

func (c *Customer) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*Customer) Info

func (c *Customer) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Customer) PrimaryKeyFields

func (c *Customer) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Customer) Source

func (c *Customer) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type CustomerSource

type CustomerSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func (*CustomerSource) Close

func (s *CustomerSource) Close() error

func (*CustomerSource) Record

func (s *CustomerSource) Record() (idk.Record, error)

func (*CustomerSource) Schema

func (s *CustomerSource) Schema() []idk.Field

func (*CustomerSource) Seed

func (s *CustomerSource) Seed(seed int64)

type Equipment

type Equipment struct{}

Equipment implements Sourcer.

func (*Equipment) DefaultEndAt

func (e *Equipment) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*Equipment) Info

func (e *Equipment) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Equipment) PrimaryKeyFields

func (e *Equipment) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Equipment) Source

func (e *Equipment) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type EquipmentSource

type EquipmentSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func (*EquipmentSource) Close

func (s *EquipmentSource) Close() error

func (*EquipmentSource) Record

func (s *EquipmentSource) Record() (idk.Record, error)

func (*EquipmentSource) Schema

func (s *EquipmentSource) Schema() []idk.Field

func (*EquipmentSource) Seed

func (s *EquipmentSource) Seed(seed int64)

type Example

type Example struct{}

Example implements Sourcer, and returns a very basic data set. It can be used as an example for writing additional custom Sourcers.

func (*Example) DefaultEndAt

func (e *Example) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*Example) Info

func (e *Example) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Example) PrimaryKeyFields

func (e *Example) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Example) Source

func (e *Example) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface. NOTE: avro Name fields can only contain certain characters. See: https://avro.apache.org/docs/current/spec.html#names

type ExampleSource

type ExampleSource struct {
	// contains filtered or unexported fields
}

ExampleSource is an instance of a source generated by the Sourcer implementation Example.

func (*ExampleSource) Close

func (e *ExampleSource) Close() error

func (*ExampleSource) Record

func (e *ExampleSource) Record() (idk.Record, error)

Record implements idk.Source.

func (*ExampleSource) Schema

func (e *ExampleSource) Schema() []idk.Field

Schema implements idk.Source.

type FeatureBaseConfig

type FeatureBaseConfig struct {
	OrganizationID string `flag:"org-id" short:"" help:"auto-assigned organization ID"`
	DatabaseID     string `flag:"db-id" short:"" help:"auto-assigned database ID"`
	TableName      string `flag:"table-name" short:"" help:"human friendly table name"`
}

FeatureBaseConfig is meant to represent the scoped (featurebase.*) configuration options to be used when target = mds. These are really just a sub-set of idk.Main, containing only those arguments that really apply to datagen.

type FieldGenerator

type FieldGenerator interface {
	// Generate produces a value. It takes in the previous value as an
	// optimization—some implementations may opt to reuse a slice, for
	// example.
	Generate(oldval interface{}) (interface{}, error)
}

FieldGenerator is an interface for generating values for a particular field.
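
To illustrate the oldval optimization mentioned in the interface comment, here is a sketch of an implementation that reuses the previous value's backing array. The FieldGenerator interface is copied locally so the example is self-contained; counterBytes is a hypothetical generator, not one from this package.

```go
package main

import (
	"fmt"
	"strconv"
)

// FieldGenerator mirrors the interface documented above.
type FieldGenerator interface {
	Generate(oldval interface{}) (interface{}, error)
}

// counterBytes is a hypothetical generator that emits an increasing counter
// rendered as decimal digits in a byte slice.
type counterBytes struct{ n uint64 }

// Generate reuses the previous value's backing array when it is a []byte,
// which avoids allocating a fresh slice per record.
func (c *counterBytes) Generate(oldval interface{}) (interface{}, error) {
	var buf []byte
	if prev, ok := oldval.([]byte); ok {
		buf = prev[:0] // keep capacity, drop contents
	}
	buf = strconv.AppendUint(buf, c.n, 10)
	c.n++
	return buf, nil
}

func main() {
	var g FieldGenerator = &counterBytes{}
	var val interface{}
	for i := 0; i < 3; i++ {
		var err error
		val, err = g.Generate(val) // pass the previous value back in
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(val.([]byte)))
	}
}
```

Because the slice is reused, a caller that needs to keep an earlier value must copy it before calling Generate again.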

type FixedProbabilitySourceFieldGenerator

type FixedProbabilitySourceFieldGenerator struct {
	// contains filtered or unexported fields
}

func NewFixedProbabilitySourceFieldGenerator

func NewFixedProbabilitySourceFieldGenerator(vals []string, probs []float64, r *rand.Rand) (*FixedProbabilitySourceFieldGenerator, error)

func (*FixedProbabilitySourceFieldGenerator) Generate

func (g *FixedProbabilitySourceFieldGenerator) Generate(_ interface{}) (interface{}, error)
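
FixedProbabilitySourceFieldGenerator is undocumented here. Judging only by its name and constructor arguments, it presumably picks one of vals according to the weights in probs. The sketch below shows one common way to do that with cumulative weights; weightedPicker, newWeightedPicker, and newRand are hypothetical names, and the validation rules are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// weightedPicker is a hypothetical generator that returns vals[i] with
// probability proportional to probs[i].
type weightedPicker struct {
	vals []string
	cum  []float64 // running sum of the weights
	r    *rand.Rand
}

// newRand is a small helper returning a seeded random source.
func newRand(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

func newWeightedPicker(vals []string, probs []float64, r *rand.Rand) (*weightedPicker, error) {
	if len(vals) == 0 || len(vals) != len(probs) {
		return nil, errors.New("vals and probs must be non-empty and the same length")
	}
	cum := make([]float64, len(probs))
	total := 0.0
	for i, p := range probs {
		if p < 0 {
			return nil, errors.New("probabilities must be non-negative")
		}
		total += p
		cum[i] = total
	}
	if total <= 0 {
		return nil, errors.New("probabilities must sum to a positive value")
	}
	return &weightedPicker{vals: vals, cum: cum, r: r}, nil
}

// Generate draws a point in [0, total) and returns the first value whose
// cumulative weight exceeds it.
func (w *weightedPicker) Generate(_ interface{}) (interface{}, error) {
	x := w.r.Float64() * w.cum[len(w.cum)-1]
	for i, c := range w.cum {
		if x < c {
			return w.vals[i], nil
		}
	}
	return w.vals[len(w.vals)-1], nil
}

func main() {
	p, err := newWeightedPicker([]string{"red", "green", "blue"}, []float64{0.7, 0.2, 0.1}, newRand(42))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i := 0; i < 5; i++ {
		v, _ := p.Generate(nil)
		fmt.Println(v)
	}
}
```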

type GenField

type GenField struct {
	Name            string  `json:"name"`
	Type            string  `json:"type"`
	Distribution    string  `json:"distribution"`
	Min             int64   `json:"min"`
	Max             int64   `json:"max"`
	MinFloat        float64 `json:"min_float"`
	MaxFloat        float64 `json:"max_float"`
	Repeat          bool    `json:"repeat"`
	Step            int64   `json:"step"`
	S               float64 `json:"s"`
	V               float64 `json:"v"`
	SourceFile      string  `json:"source_file"`
	GeneratorType   string  `json:"generator_type"`
	Cardinality     uint64  `json:"cardinality"`
	Charset         string  `json:"charset"`
	MinLen          uint64  `json:"min_len"`
	MaxLen          uint64  `json:"max_len"`
	MinNum          uint64  `json:"min_num"`
	MaxNum          uint64  `json:"max_num"`
	MinStepDuration string  `json:"min_step_duration"`
	MaxStepDuration string  `json:"max_step_duration"`

	MinDate    time.Time  `json:"min_date"`
	MaxDate    time.Time  `json:"max_date"`
	TimeFormat TimeFormat `json:"time_format"`
	TimeUnit   idk.Unit   `json:"time_unit"`
	NullChance float64    `json:"null_chance"` // between 0 and 1.0. Fraction of the time that we get a null value for this field
	// contains filtered or unexported fields
}

GenField describes one field of the data to be generated. Many of the struct fields are only used for certain values of the 'Type' field. See the commented custom.yaml example as a reference.

GenField says nothing about what should be done with the generated data, or how it might map to FeatureBase fields.

NOTE: If you add fields here, make sure you add them to the custom UnmarshalJSON method below.

func (*GenField) DefaultIDKField

func (g *GenField) DefaultIDKField() (idk.Field, error)

DefaultIDKField returns default IDK configuration for a given GenField. This is convenient for the common case of generated data wanting to be ingested into FeatureBase so that the user doesn't have to specify explicit "idk_params" configuration for every field.

func (*GenField) Generator

func (g *GenField) Generator(r *rand.Rand) (fg FieldGenerator, err error)

Generator creates a data generator based on the GenField.

type IDKAndGenFields

type IDKAndGenFields struct {
	// contains filtered or unexported fields
}

IDKAndGenFields keeps the IDK schema and generated data in sync. schema and genFields will always be the same length, and the idk.Fields in schema will always be non-nil. Items in genFields may be nil which indicates that the corresponding idk.Field in schema will use data generated by the last non-nil genField. e.g.

schema:    IDField,   RecordTimeField,  TimestampField
genFields: Gen{"id"}, Gen{"timestamp"}, nil

The TimestampField will get values from Gen{"timestamp"} since its corresponding genField is nil.
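
The nil-entry rule can be sketched as follows. This is an illustration under assumptions, not the package's code: gen and fillRecord are hypothetical names, and generators are modeled as plain functions.

```go
package main

import "fmt"

// gen is a hypothetical stand-in for a per-field generator.
type gen func() interface{}

// fillRecord produces one value per schema position. A nil generator means
// "reuse the value produced by the last non-nil generator".
func fillRecord(gens []gen) []interface{} {
	rec := make([]interface{}, len(gens))
	var last interface{}
	for i, g := range gens {
		if g != nil {
			last = g() // generate once ...
		}
		rec[i] = last // ... and share it with any following nil entries
	}
	return rec
}

func main() {
	gens := []gen{
		func() interface{} { return uint64(7) },              // feeds IDField
		func() interface{} { return "2023-02-01T00:00:00Z" }, // feeds RecordTimeField
		nil, // TimestampField reuses the timestamp value
	}
	fmt.Println(fillRecord(gens))
}
```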

func (*IDKAndGenFields) Append

func (ig *IDKAndGenFields) Append(f idk.Field, g *GenField)

Append adds to IDKAndGenFields and ensures that schema and genFields stay in sync.

type IDKParams

type IDKParams struct {
	Fields           map[string][]IngestField `json:"fields"`
	PrimaryKeyConfig PrimaryKeyConfig         `json:"primary_key_config"`
}

IDKParams describes how data from CustomConfig.Fields should map to IDK fields.

type IncreasingTimestampGenerator

type IncreasingTimestampGenerator struct {
	R *rand.Rand

	MinDate         time.Time
	MaxDate         time.Time
	MinStepDuration time.Duration
	MaxStepDuration time.Duration
	Repeat          bool

	OutputFormat TimeFormat // "time", "int"
	Unit         idk.Unit   // "s", "ms", "us", "ns"
	// contains filtered or unexported fields
}

func (*IncreasingTimestampGenerator) Generate

func (g *IncreasingTimestampGenerator) Generate(_ interface{}) (interface{}, error)

type IngestField

type IngestField struct {
	Type         string    `json:"type"`
	Name         string    `json:"name"`
	Layout       string    `json:"layout"`
	Epoch        time.Time `json:"epoch"`
	Unit         string    `json:"unit"`
	Keyed        bool      `json:"keyed"`
	Mutex        bool      `json:"mutex"`
	TimeQuantum  string    `json:"time_quantum"`
	Min          *int64    `json:"min"`
	Max          *int64    `json:"max"`
	Scale        int64     `json:"scale"`
	ForeignIndex string    `json:"foreign_index"`
	Granularity  string    `json:"granularity"`
	TTL          string    `json:"ttl"`
}

IngestField is a json/yaml configuration for an IDK Field. See the ToIDKField method for how this configuration maps to each possible IDK Field.

func (IngestField) ToIDKField

func (f IngestField) ToIDKField(g *GenField) (idk.Field, error)

ToIDKField converts the general IngestField to a specific IDK Field.

type Item

type Item struct{}

Item implements Sourcer.

func (*Item) DefaultEndAt

func (i *Item) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*Item) Info

func (i *Item) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Item) PrimaryKeyFields

func (i *Item) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Item) Source

func (i *Item) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type ItemSource

type ItemSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func (*ItemSource) Close

func (s *ItemSource) Close() error

func (*ItemSource) Record

func (s *ItemSource) Record() (idk.Record, error)

func (*ItemSource) Schema

func (s *ItemSource) Schema() []idk.Field

func (*ItemSource) Seed

func (s *ItemSource) Seed(seed int64)

type ItemSourceOption

type ItemSourceOption func(s *ItemSource) error

func OptItemEndAt

func OptItemEndAt(end uint64) ItemSourceOption

func OptItemStartFrom

func OptItemStartFrom(start uint64) ItemSourceOption
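
ItemSourceOption follows the functional-options pattern. The source does not show how the options are consumed, so the sketch below uses hypothetical lowercase types and a hypothetical constructor, newItemSource, to show the usual shape.

```go
package main

import (
	"errors"
	"fmt"
)

// itemSource is a hypothetical, reduced stand-in for ItemSource.
type itemSource struct {
	startFrom uint64
	endAt     uint64
}

// itemSourceOption mirrors the shape of ItemSourceOption.
type itemSourceOption func(s *itemSource) error

func optStartFrom(start uint64) itemSourceOption {
	return func(s *itemSource) error {
		s.startFrom = start
		return nil
	}
}

func optEndAt(end uint64) itemSourceOption {
	return func(s *itemSource) error {
		s.endAt = end
		return nil
	}
}

// newItemSource is a hypothetical constructor that applies each option in
// order and stops at the first error.
func newItemSource(opts ...itemSourceOption) (*itemSource, error) {
	s := &itemSource{}
	for _, opt := range opts {
		if err := opt(s); err != nil {
			return nil, err
		}
	}
	if s.endAt < s.startFrom {
		return nil, errors.New("end must not be before start")
	}
	return s, nil
}

func main() {
	s, err := newItemSource(optStartFrom(100), optEndAt(199))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.startFrom, s.endAt)
}
```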

type KafkaConfig

type KafkaConfig struct {
	idk.ConfluentCommand
	Topic             string `short:"" help:"Kafka topic to post to."`
	Subject           string `short:"" help:"Kafka schema subject."`
	BatchSize         int    `` /* 152-byte string literal not displayed */
	ReplicationFactor int    `short:"" help:"set replication factor for kafka cluster"`
	NumPartitions     int    `short:"" help:"set partition for kafka cluster"`
}

KafkaConfig is meant to represent the scoped (kafka.*) configuration options to be used when target = kafka. These are really just a sub-set of kafka.PutSource, containing only those arguments that really apply to datagen.

type KitchenSink

type KitchenSink struct {
	// contains filtered or unexported fields
}

KitchenSink implements Sourcer.

func (*KitchenSink) DefaultEndAt

func (k *KitchenSink) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*KitchenSink) Info

func (k *KitchenSink) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*KitchenSink) PrimaryKeyFields

func (k *KitchenSink) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*KitchenSink) Source

func (k *KitchenSink) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type KitchenSinkKeyed

type KitchenSinkKeyed struct {
	// contains filtered or unexported fields
}

KitchenSinkKeyed implements Sourcer.

func (*KitchenSinkKeyed) DefaultEndAt

func (k *KitchenSinkKeyed) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*KitchenSinkKeyed) Info

func (k *KitchenSinkKeyed) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*KitchenSinkKeyed) PrimaryKeyFields

func (k *KitchenSinkKeyed) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*KitchenSinkKeyed) Source

func (k *KitchenSinkKeyed) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type KitchenSinkKeyedSource

type KitchenSinkKeyedSource struct {
	// contains filtered or unexported fields
}

func NewKitchenSinkKeyedSource

func NewKitchenSinkKeyedSource(start, end uint64) *KitchenSinkKeyedSource

func (*KitchenSinkKeyedSource) Close

func (k *KitchenSinkKeyedSource) Close() error

func (*KitchenSinkKeyedSource) Record

func (k *KitchenSinkKeyedSource) Record() (idk.Record, error)

func (*KitchenSinkKeyedSource) Schema

func (k *KitchenSinkKeyedSource) Schema() []idk.Field

type KitchenSinkSource

type KitchenSinkSource struct {
	// contains filtered or unexported fields
}

KitchenSinkSource is a data generator which generates data for all Pilosa field types.

func (*KitchenSinkSource) Close

func (k *KitchenSinkSource) Close() error

func (*KitchenSinkSource) Record

func (k *KitchenSinkSource) Record() (idk.Record, error)

func (*KitchenSinkSource) Schema

func (k *KitchenSinkSource) Schema() []idk.Field

type Main

type Main struct {
	KafkaPut *kafka.PutSource `flag:"-"`

	Source string `short:"s" flag:"source" help:"Source generator type. Running datagen with no arguments will list the available source types."`
	Target string `short:"t" flag:"target" help:"Destination for the generated data: [featurebase, kafka, kafkastatic, mds]."`

	Concurrency int `short:"c" flag:"concurrency" help:"Number of concurrent sources and indexing routines to launch."`

	StartFrom uint64 `short:"b" help:"ID at which to start generating records."`
	EndAt     uint64 `short:"e" help:"ID at which to stop generating records."`

	Seed int64 `short:"" help:"Seed to use for any random number generation."`

	TrackProgress bool `short:"" help:"Periodically print status updates on how many records have been sourced."`

	UseShardTransactionalEndpoint bool `flag:"use-shard-transactional-endpoint" help:"Use experimental transactional endpoint"`

	CustomConfig string `short:"" help:"File from which to pull configuration for 'custom' source."`

	// Used strictly for configuration of the targets.
	Pilosa      PilosaConfig
	Kafka       KafkaConfig
	Serverless  ServerlessConfig
	FeatureBase FeatureBaseConfig `flag:"featurebase" help:"qualified featurebase table"`

	DryRun bool `help:"Dry run - just flag parsing."`

	AuthToken string `flag:"auth-token" help:"Authentication token for FeatureBase"`
	Verbose   bool   `flag:"verbose" help:"Enable extended logging for debug"`
	Datadog   bool   `flag:"datadog" help:"Enable datadog profiling"`
	// contains filtered or unexported fields
}

Main is the top-level datagen struct. It represents datagen-specific configuration parameters, as well as sub-level parameters specific to each target.

func NewMain

func NewMain() *Main

NewMain returns a new instance of Main.

func (*Main) NoStats

func (m *Main) NoStats()

NoStats is a helper for tests.

func (*Main) PilosaClient

func (m *Main) PilosaClient() *pilosaclient.Client

PilosaClient is a helper for tests. The datagen command used to embed idk.Main, so datagen tests could call Main.PilosaClient(), but idk.Main is no longer directly embedded.

func (*Main) Preload

func (m *Main) Preload() error

Preload configures the Sources based on the concurrency and start/end range.

func (*Main) PrintPlan

func (m *Main) PrintPlan()

func (*Main) Run

func (m *Main) Run() error

Run generates data for the specified target.

func (*Main) Sources

func (m *Main) Sources() []idk.Source

Sources returns the list of sources. This is typically called by tests, after the generator has created the sources during Preload().

type Network

type Network struct {
	// contains filtered or unexported fields
}

Network implements Sourcer.

func (*Network) DefaultEndAt

func (n *Network) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*Network) Info

func (n *Network) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Network) PrimaryKeyFields

func (n *Network) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Network) Source

func (n *Network) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type NetworkSource

type NetworkSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func (*NetworkSource) Close

func (s *NetworkSource) Close() error

func (*NetworkSource) Record

func (s *NetworkSource) Record() (idk.Record, error)

func (*NetworkSource) Schema

func (s *NetworkSource) Schema() []idk.Field

func (*NetworkSource) Seed

func (s *NetworkSource) Seed(seed int64)

type NullChanceGenerator

type NullChanceGenerator struct {
	R          *rand.Rand
	NullChance float64
	G          FieldGenerator
}

func (*NullChanceGenerator) Generate

func (g *NullChanceGenerator) Generate(oldval interface{}) (interface{}, error)
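
NullChanceGenerator carries no doc comment. Based on its fields and on the null_chance comment in GenField, it plausibly wraps another generator and returns a null value some fraction of the time. A sketch under that assumption, with hypothetical local types:

```go
package main

import (
	"fmt"
	"math/rand"
)

// FieldGenerator mirrors the interface documented in this package.
type FieldGenerator interface {
	Generate(oldval interface{}) (interface{}, error)
}

// constGen is a hypothetical generator that always returns the same value.
type constGen struct{ v interface{} }

func (c constGen) Generate(_ interface{}) (interface{}, error) { return c.v, nil }

// nullChance is a hypothetical stand-in with the same fields as
// NullChanceGenerator.
type nullChance struct {
	R          *rand.Rand
	NullChance float64 // between 0 and 1.0
	G          FieldGenerator
}

// newRand is a small helper returning a seeded random source.
func newRand(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// Generate returns nil with probability NullChance and otherwise delegates
// to the wrapped generator.
func (g *nullChance) Generate(oldval interface{}) (interface{}, error) {
	if g.R.Float64() < g.NullChance {
		return nil, nil
	}
	return g.G.Generate(oldval)
}

func main() {
	g := &nullChance{R: newRand(1), NullChance: 0.5, G: constGen{v: "x"}}
	for i := 0; i < 5; i++ {
		v, _ := g.Generate(nil)
		fmt.Println(v)
	}
}
```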

type Person

type Person struct{}

Person implements Sourcer.

func (*Person) DefaultEndAt

func (e *Person) DefaultEndAt() uint64

DefaultEndAt sets the endAt record value for the case where one is not provided. It implements the Sourcer interface.

func (*Person) Info

func (p *Person) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Person) PrimaryKeyFields

func (e *Person) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Person) Source

func (p *Person) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type PersonSource

type PersonSource struct {
	// contains filtered or unexported fields
}

PersonSource is an instance of a source generated by the Sourcer implementation Person.

func (*PersonSource) Close

func (ps *PersonSource) Close() error

func (*PersonSource) Record

func (ps *PersonSource) Record() (idk.Record, error)

Record implements idk.Source.

func (*PersonSource) Schema

func (ps *PersonSource) Schema() []idk.Field

Schema implements idk.Source.

type PilosaConfig

type PilosaConfig struct {
	Hosts       []string `short:"" help:"Comma separated list of host:port pairs for FeatureBase."`
	Index       string   `short:"" help:"Name of FeatureBase index."`
	BatchSize   int      `` /* 177-byte string literal not displayed */
	CacheLength uint64   `help:"Number of batches of ID mappings to cache."`
}

PilosaConfig is meant to represent the scoped (pilosa.*) configuration options to be used when target = pilosa. These are really just a sub-set of idk.Main, containing only those arguments that really apply to datagen.

type PrimaryKeyConfig

type PrimaryKeyConfig struct {
	Field string `json:"field"`
}

PrimaryKeyConfig specifies what should be interpreted as the record identifier in FeatureBase.

type RandomStringGenerator

type RandomStringGenerator struct {
	R *rand.Rand

	MinLen  uint64
	MaxLen  uint64
	Charset string
}

func (*RandomStringGenerator) Generate

func (g *RandomStringGenerator) Generate(prev interface{}) (interface{}, error)
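
RandomStringGenerator carries no doc comment, so the following is only a guess at the idea suggested by its field names: pick a length between MinLen and MaxLen, then fill it with characters drawn from Charset. The `randomString` type and the `newRandomString` helper are hypothetical stand-ins, the bounds are assumed to be inclusive, the charset is assumed to be ASCII, and the `prev` argument is ignored because its role is not documented.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// randomString is a hypothetical stand-in with the same exported fields as
// RandomStringGenerator. It is not the package's type.
type randomString struct {
	R       *rand.Rand
	MinLen  uint64
	MaxLen  uint64
	Charset string
}

// newRandomString is an illustrative helper, not part of the package.
func newRandomString(seed int64, minLen, maxLen uint64, charset string) *randomString {
	return &randomString{
		R:       rand.New(rand.NewSource(seed)),
		MinLen:  minLen,
		MaxLen:  maxLen,
		Charset: charset,
	}
}

// Generate returns a string whose length lies in [MinLen, MaxLen] (assumed
// inclusive). Characters are picked by byte index, so Charset should be ASCII.
func (g *randomString) Generate(_ interface{}) (interface{}, error) {
	if g.Charset == "" || g.MaxLen < g.MinLen {
		return nil, errors.New("invalid generator configuration")
	}
	n := g.MinLen + uint64(g.R.Int63n(int64(g.MaxLen-g.MinLen+1)))
	b := make([]byte, n)
	for i := range b {
		b[i] = g.Charset[g.R.Intn(len(g.Charset))]
	}
	return string(b), nil
}

func main() {
	g := newRandomString(1, 3, 8, "abcdef")
	for i := 0; i < 3; i++ {
		v, err := g.Generate(nil)
		if err != nil {
			panic(err)
		}
		fmt.Println(v)
	}
}
```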

type Seedable

type Seedable interface {
	Seed(int64)
}

Seedable is an interface representing anything for which a seed can be provided.
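
As a rough illustration of how a type might satisfy Seedable, the sketch below re-declares the interface locally and implements it on a made-up `diceSource` type. Only the `Seed(int64)` method comes from the documentation above; everything else is assumed for the example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Seedable mirrors the interface documented above.
type Seedable interface {
	Seed(int64)
}

// diceSource is a hypothetical type used only for illustration.
type diceSource struct {
	r *rand.Rand
}

// Seed replaces the random number generator with one derived from seed.
func (d *diceSource) Seed(seed int64) {
	d.r = rand.New(rand.NewSource(seed))
}

// Roll returns a value in [1, 6]. Seed must be called first.
func (d *diceSource) Roll() int {
	return d.r.Intn(6) + 1
}

func main() {
	a, b := &diceSource{}, &diceSource{}
	for _, s := range []Seedable{a, b} {
		s.Seed(42)
	}
	// Two sources seeded identically produce the same sequence.
	fmt.Println(a.Roll() == b.Roll()) // prints "true"
}
```

Giving each source its own `*rand.Rand` keeps runs reproducible for a given seed, which is presumably the purpose of the `--seed` flag.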

type SequentialUintFieldGenerator

type SequentialUintFieldGenerator struct {
	Min    uint64
	Max    uint64
	Repeat bool
	Step   uint64
	// contains filtered or unexported fields
}

SequentialUintFieldGenerator generates unsigned integers sequentially from a minimum up to a maximum with a configurable step. If Repeat is true, it starts over from the beginning after reaching the maximum.

func (*SequentialUintFieldGenerator) Generate

func (g *SequentialUintFieldGenerator) Generate(_ interface{}) (interface{}, error)
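
The behavior described for SequentialUintFieldGenerator can be sketched roughly as follows. This is a standalone illustration, not the package's implementation: the `seqUint` type, its unexported `next` and `started` fields, the inclusive treatment of Max, and the choice to return an error once a non-repeating sequence is exhausted are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// seqUint is a hypothetical stand-in mirroring the exported fields of
// SequentialUintFieldGenerator. It is not the package's type.
type seqUint struct {
	Min, Max uint64
	Repeat   bool
	Step     uint64
	next     uint64 // assumed internal cursor
	started  bool   // assumed flag so the first call starts at Min
}

// Generate returns the next value in the sequence. The argument is ignored,
// matching the documented signature.
func (g *seqUint) Generate(_ interface{}) (interface{}, error) {
	if !g.started {
		g.next, g.started = g.Min, true
	}
	if g.next > g.Max {
		if !g.Repeat {
			return nil, errors.New("sequence exhausted") // assumed behavior
		}
		g.next = g.Min // wrap around to the beginning
	}
	v := g.next
	g.next += g.Step
	return v, nil
}

func main() {
	g := &seqUint{Min: 10, Max: 14, Step: 2, Repeat: true}
	for i := 0; i < 5; i++ {
		v, _ := g.Generate(nil)
		fmt.Println(v) // prints 10, 12, 14, 10, 12 on successive lines
	}
}
```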

type ServerlessConfig added in v3.30.0

type ServerlessConfig struct {
	Address string `short:"" help:"Controller host:port to connect to"`
}

ServerlessConfig holds the configuration options used when target = serverless. These are a subset of idk.Main, containing only the arguments that apply to datagen.

type ShiftingStringGenerator

type ShiftingStringGenerator struct {
	// contains filtered or unexported fields
}

func NewShiftingStringGenerator

func NewShiftingStringGenerator(g *GenField, r *rand.Rand) (*ShiftingStringGenerator, error)

func (*ShiftingStringGenerator) Generate

func (g *ShiftingStringGenerator) Generate(_ interface{}) (interface{}, error)

type Site

type Site struct{}

Site implements Sourcer.

func (*Site) DefaultEndAt

func (s *Site) DefaultEndAt() uint64

DefaultEndAt returns the endAt record value to use when one is not provided. It implements the Sourcer interface.

func (*Site) Info

func (s *Site) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Site) PrimaryKeyFields

func (s *Site) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Site) Source

func (s *Site) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type SiteSource

type SiteSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func (*SiteSource) Close

func (s *SiteSource) Close() error

func (*SiteSource) Record

func (s *SiteSource) Record() (idk.Record, error)

func (*SiteSource) Schema

func (s *SiteSource) Schema() []idk.Field

func (*SiteSource) Seed

func (s *SiteSource) Seed(seed int64)

type Sizing

type Sizing struct{}

Sizing implements Sourcer.

func (*Sizing) DefaultEndAt

func (s *Sizing) DefaultEndAt() uint64

DefaultEndAt returns the endAt record value to use when one is not provided. It implements the Sourcer interface.

func (*Sizing) Info

func (s *Sizing) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Sizing) PrimaryKeyFields

func (s *Sizing) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Sizing) Source

func (s *Sizing) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type SizingSource

type SizingSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

SizingSource is an idk.Source that generates data useful for determining the on-disk footprint of different types of data. The results can be extrapolated to estimate the infrastructure size needed for various data. Typically one shard width of data is generated.

func (*SizingSource) Close

func (s *SizingSource) Close() error

func (*SizingSource) Record

func (s *SizingSource) Record() (idk.Record, error)

func (*SizingSource) Schema

func (s *SizingSource) Schema() []idk.Field

func (*SizingSource) Seed

func (s *SizingSource) Seed(seed int64)

type SourceConfig

type SourceConfig struct {
	// contains filtered or unexported fields
}

SourceConfig is the configuration required by a Sourcer when creating a new Source.

type SourceGenerator

type SourceGenerator interface {
	Info() string
	Sources(string, SourceGeneratorConfig) ([]string, []idk.Source, error)
	Config() SourceGeneratorConfig
}

SourceGenerator is an interface for anything which can generate data by providing one or more idk.Source. It also contains a method for getting information about the supported sources.

Info() - returns information about the generator along with a list of supported types.

Sources() - provided a string key and configuration, returns a list of primary key fields and a list of sources.

type SourceGeneratorConfig

type SourceGeneratorConfig struct {
	StartFrom    uint64
	EndAt        uint64
	Concurrency  int
	Seed         int64
	CustomConfig string
}

SourceGeneratorConfig provides configuration values used by implementations of the SourceGenerator interface.
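
The Source methods in this package are documented as generating records for a partition of the record space determined by the concurrency value. One plausible way to derive such partitions from StartFrom, EndAt, and Concurrency is sketched below. The `config` and `span` types and the `partition` function are hypothetical, and the sketch assumes a half-open range split into contiguous, near-equal pieces, which the documentation does not confirm.

```go
package main

import "fmt"

// config mirrors a subset of SourceGeneratorConfig for illustration.
type config struct {
	StartFrom   uint64
	EndAt       uint64
	Concurrency int
}

// span is a hypothetical half-open range [Start, End) of record IDs.
type span struct{ Start, End uint64 }

// partition splits [StartFrom, EndAt) into Concurrency contiguous spans whose
// sizes differ by at most one. Whether datagen treats EndAt as exclusive, or
// splits contiguously at all, is an assumption of this sketch.
func partition(c config) []span {
	if c.Concurrency < 1 || c.EndAt <= c.StartFrom {
		return nil
	}
	total := c.EndAt - c.StartFrom
	n := uint64(c.Concurrency)
	base, extra := total/n, total%n
	spans := make([]span, 0, n)
	start := c.StartFrom
	for i := uint64(0); i < n; i++ {
		size := base
		if i < extra {
			size++ // spread the remainder over the first spans
		}
		spans = append(spans, span{start, start + size})
		start += size
	}
	return spans
}

func main() {
	for _, s := range partition(config{StartFrom: 0, EndAt: 10, Concurrency: 3}) {
		fmt.Printf("[%d, %d)\n", s.Start, s.End)
	}
	// prints:
	// [0, 4)
	// [4, 7)
	// [7, 10)
}
```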

type Sourcer

type Sourcer interface {
	Source(SourceConfig) idk.Source
	PrimaryKeyFields() []string
	DefaultEndAt() uint64
	Info() string
}

Sourcer is an interface describing a type which can generate sources, which in this case are `idk.Source`. It contains a few additional methods which are used to configure the source.

func NewAllFieldTypes

func NewAllFieldTypes(cfg SourceGeneratorConfig) Sourcer

NewAllFieldTypes returns a new instance of AllFieldTypes.

func NewBank

func NewBank(cfg SourceGeneratorConfig) Sourcer

NewBank returns a new instance of a Bank data generator.

func NewClaim

func NewClaim(cfg SourceGeneratorConfig) Sourcer

NewClaim returns a new instance of Claim.

func NewCustom

func NewCustom(cfg SourceGeneratorConfig) Sourcer

NewCustom returns a new instance of Custom.

func NewCustomer

func NewCustomer(cfg SourceGeneratorConfig) Sourcer

NewCustomer returns a new instance of Customer.

func NewEquipment

func NewEquipment(cfg SourceGeneratorConfig) Sourcer

NewEquipment returns a new instance of Equipment.

func NewExample

func NewExample(cfg SourceGeneratorConfig) Sourcer

NewExample returns a new instance of Example.

func NewItem

func NewItem(cfg SourceGeneratorConfig) Sourcer

NewItem returns a new instance of Item.

func NewKitchenSink

func NewKitchenSink(cfg SourceGeneratorConfig) Sourcer

NewKitchenSink returns a new instance of KitchenSink.

func NewKitchenSinkKeyed

func NewKitchenSinkKeyed(cfg SourceGeneratorConfig) Sourcer

NewKitchenSinkKeyed returns a new instance of KitchenSinkKeyed.

func NewNetwork

func NewNetwork(cfg SourceGeneratorConfig) Sourcer

NewNetwork returns a new instance of Network.

func NewPerson

func NewPerson(cfg SourceGeneratorConfig) Sourcer

NewPerson returns a new instance of Person.

func NewSite

func NewSite(cfg SourceGeneratorConfig) Sourcer

NewSite returns a new instance of Site.

func NewSizing

func NewSizing(cfg SourceGeneratorConfig) Sourcer

NewSizing returns a new instance of Sizing.

func NewStringPK

func NewStringPK(cfg SourceGeneratorConfig) Sourcer

NewStringPK returns a new instance of StringPK.

func NewTimeseries

func NewTimeseries(cfg SourceGeneratorConfig) Sourcer

NewTimeseries returns a new instance of Timeseries.

func NewTransaction

func NewTransaction(cfg SourceGeneratorConfig) Sourcer

NewTransaction returns a new instance of Transaction.

func NewTransaction1

func NewTransaction1(cfg SourceGeneratorConfig) Sourcer

NewTransaction1 returns a new instance of Transaction1.

func NewWarranty

func NewWarranty(cfg SourceGeneratorConfig) Sourcer

NewWarranty returns a new instance of Warranty.

type StringPK

type StringPK struct{}

StringPK implements Sourcer.

func (*StringPK) DefaultEndAt

func (s *StringPK) DefaultEndAt() uint64

DefaultEndAt returns the endAt record value to use when one is not provided. It implements the Sourcer interface.

func (*StringPK) Info

func (s *StringPK) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*StringPK) PrimaryKeyFields

func (s *StringPK) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*StringPK) Source

func (s *StringPK) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type StringPKSource

type StringPKSource struct {
	// contains filtered or unexported fields
}

func (*StringPKSource) Close

func (s *StringPKSource) Close() error

func (*StringPKSource) Record

func (s *StringPKSource) Record() (idk.Record, error)

func (*StringPKSource) Schema

func (s *StringPKSource) Schema() []idk.Field

func (*StringPKSource) Seed

func (s *StringPKSource) Seed(seed int64)

type StringSetGenerator

type StringSetGenerator struct {
	G FieldGenerator // for generating individual strings
	R *rand.Rand

	MinNum uint64
	MaxNum uint64
}

func NewStringSetGenerator

func NewStringSetGenerator(g *GenField, r *rand.Rand, fg FieldGenerator) (*StringSetGenerator, error)

func (*StringSetGenerator) Generate

func (g *StringSetGenerator) Generate(prev interface{}) (interface{}, error)

type TimeFormat

type TimeFormat string

type Timeseries

type Timeseries struct {
	// contains filtered or unexported fields
}

Timeseries implements Sourcer.

func (*Timeseries) DefaultEndAt

func (t *Timeseries) DefaultEndAt() uint64

DefaultEndAt returns the endAt record value to use when one is not provided. It implements the Sourcer interface.

func (*Timeseries) Info

func (t *Timeseries) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Timeseries) PrimaryKeyFields

func (t *Timeseries) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Timeseries) Source

func (t *Timeseries) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type TimeseriesSource

type TimeseriesSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func NewTimeseriesSource

func NewTimeseriesSource(start, end uint64) *TimeseriesSource

func (*TimeseriesSource) Close

func (s *TimeseriesSource) Close() error

func (*TimeseriesSource) Record

func (s *TimeseriesSource) Record() (idk.Record, error)

func (*TimeseriesSource) Schema

func (s *TimeseriesSource) Schema() []idk.Field

func (*TimeseriesSource) Seed

func (s *TimeseriesSource) Seed(seed int64)

type Transaction

type Transaction struct {
	// contains filtered or unexported fields
}

Transaction implements Sourcer.

func (*Transaction) DefaultEndAt

func (t *Transaction) DefaultEndAt() uint64

DefaultEndAt returns the endAt record value to use when one is not provided. It implements the Sourcer interface.

func (*Transaction) Info

func (t *Transaction) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Transaction) PrimaryKeyFields

func (t *Transaction) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Transaction) Source

func (t *Transaction) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type Transaction1

type Transaction1 struct {
	// contains filtered or unexported fields
}

Transaction1 implements Sourcer.

func (*Transaction1) DefaultEndAt

func (t *Transaction1) DefaultEndAt() uint64

DefaultEndAt returns the endAt record value to use when one is not provided. It implements the Sourcer interface.

func (*Transaction1) Info

func (t *Transaction1) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Transaction1) PrimaryKeyFields

func (t *Transaction1) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Transaction1) Source

func (t *Transaction1) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type Transaction1Source

type Transaction1Source struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func NewTransaction1Source

func NewTransaction1Source(start, end uint64) *Transaction1Source

func (*Transaction1Source) Close

func (s *Transaction1Source) Close() error

func (*Transaction1Source) Record

func (s *Transaction1Source) Record() (idk.Record, error)

func (*Transaction1Source) Schema

func (s *Transaction1Source) Schema() []idk.Field

func (*Transaction1Source) Seed

func (s *Transaction1Source) Seed(seed int64)

type TransactionSource

type TransactionSource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func (*TransactionSource) Close

func (s *TransactionSource) Close() error

func (*TransactionSource) Record

func (s *TransactionSource) Record() (idk.Record, error)

func (*TransactionSource) Schema

func (s *TransactionSource) Schema() []idk.Field

func (*TransactionSource) Seed

func (s *TransactionSource) Seed(seed int64)

type UintSetGenerator

type UintSetGenerator struct {
	G FieldGenerator
	R *rand.Rand

	MinNum uint64
	MaxNum uint64
}

func (*UintSetGenerator) Generate

func (g *UintSetGenerator) Generate(_ interface{}) (interface{}, error)

type UniformFloatGenerator

type UniformFloatGenerator struct {
	R        *rand.Rand
	MinFloat float64
	MaxFloat float64
}

func (*UniformFloatGenerator) Generate

func (g *UniformFloatGenerator) Generate(_ interface{}) (interface{}, error)

type UniformIntFieldGenerator

type UniformIntFieldGenerator struct {
	R   *rand.Rand
	Min int64
	Max int64
}

UniformIntFieldGenerator generates random integers between Min and Max with a uniform distribution.

func (*UniformIntFieldGenerator) Generate

func (g *UniformIntFieldGenerator) Generate(_ interface{}) (interface{}, error)
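
A minimal sketch of uniform integer generation over a range, using only `math/rand` from the standard library. The `uniformInt` type and `newUniformInt` helper are hypothetical, and treating both Min and Max as inclusive is an assumption, since the documentation only says "between Min and Max".

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// uniformInt is a hypothetical stand-in with the same exported fields as
// UniformIntFieldGenerator. It is not the package's type.
type uniformInt struct {
	R   *rand.Rand
	Min int64
	Max int64
}

// newUniformInt is an illustrative helper, not part of the package.
func newUniformInt(seed, lo, hi int64) *uniformInt {
	return &uniformInt{R: rand.New(rand.NewSource(seed)), Min: lo, Max: hi}
}

// Generate draws uniformly from [Min, Max], treating both bounds as
// inclusive (an assumption).
func (g *uniformInt) Generate(_ interface{}) (interface{}, error) {
	width := g.Max - g.Min + 1
	if g.Max < g.Min || width <= 0 {
		// width <= 0 also catches int64 overflow for very wide ranges,
		// which would otherwise make Int63n panic.
		return nil, errors.New("invalid range")
	}
	return g.Min + g.R.Int63n(width), nil
}

func main() {
	g := newUniformInt(1, -3, 3)
	for i := 0; i < 5; i++ {
		v, err := g.Generate(nil)
		if err != nil {
			panic(err)
		}
		fmt.Println(v)
	}
}
```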

type UniformStringSourceFieldGenerator

type UniformStringSourceFieldGenerator struct {
	Source []string
	R      *rand.Rand
}

func (*UniformStringSourceFieldGenerator) Generate

func (g *UniformStringSourceFieldGenerator) Generate(_ interface{}) (interface{}, error)

type Warranty

type Warranty struct{}

Warranty implements Sourcer.

func (*Warranty) DefaultEndAt

func (w *Warranty) DefaultEndAt() uint64

DefaultEndAt returns the endAt record value to use when one is not provided. It implements the Sourcer interface.

func (*Warranty) Info

func (w *Warranty) Info() string

Info describes what this implementation of Sourcer generates. It implements the Sourcer interface.

func (*Warranty) PrimaryKeyFields

func (w *Warranty) PrimaryKeyFields() []string

PrimaryKeyFields returns the fields from the schema which should be used as the index's primary key.

func (*Warranty) Source

func (w *Warranty) Source(cfg SourceConfig) idk.Source

Source returns an idk.Source which will generate records for a partition of the entire record space, determined by the concurrency value. It implements the Sourcer interface.

type WarrantySource

type WarrantySource struct {
	Log logger.Logger
	// contains filtered or unexported fields
}

func (*WarrantySource) Close

func (s *WarrantySource) Close() error

func (*WarrantySource) Record

func (s *WarrantySource) Record() (idk.Record, error)

func (*WarrantySource) Schema

func (s *WarrantySource) Schema() []idk.Field

func (*WarrantySource) Seed

func (s *WarrantySource) Seed(seed int64)

type ZipfianIntFieldGenerator

type ZipfianIntFieldGenerator struct {
	Z   *rand.Zipf
	Min int64
}

func (*ZipfianIntFieldGenerator) Generate

func (g *ZipfianIntFieldGenerator) Generate(_ interface{}) (interface{}, error)
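
ZipfianIntFieldGenerator is undocumented, but its fields suggest a `*rand.Zipf` draw offset by Min. The sketch below shows that idea with the standard library's `rand.NewZipf`. The `zipfInt` type, the `newZipfInt` helper, the skew parameters (s = 1.1, v = 1), and the interpretation of Min as an additive offset are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// zipfInt is a hypothetical stand-in with the same exported fields as
// ZipfianIntFieldGenerator. It is not the package's type.
type zipfInt struct {
	Z   *rand.Zipf
	Min int64
}

// newZipfInt is an illustrative helper. rand.NewZipf requires s > 1 and
// v >= 1, and the resulting Zipf yields values in [0, imax].
func newZipfInt(seed, lo int64, imax uint64) *zipfInt {
	r := rand.New(rand.NewSource(seed))
	return &zipfInt{Z: rand.NewZipf(r, 1.1, 1, imax), Min: lo}
}

// Generate returns Min plus a Zipf-distributed offset, so values close to
// Min are the most frequent.
func (g *zipfInt) Generate(_ interface{}) (interface{}, error) {
	return g.Min + int64(g.Z.Uint64()), nil
}

func main() {
	g := newZipfInt(1, 100, 9)
	counts := make(map[int64]int)
	for i := 0; i < 10000; i++ {
		v, _ := g.Generate(nil)
		counts[v.(int64)]++
	}
	for n := int64(100); n <= 109; n++ {
		fmt.Printf("%d: %d\n", n, counts[n])
	}
}
```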

type ZipfianStringSourceFieldGenerator

type ZipfianStringSourceFieldGenerator struct {
	Source []string
	Z      *rand.Zipf
}

func (*ZipfianStringSourceFieldGenerator) Generate

func (g *ZipfianStringSourceFieldGenerator) Generate(_ interface{}) (interface{}, error)
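
ZipfianStringSourceFieldGenerator is likewise undocumented. A plausible reading of its fields is that a Zipf-distributed index selects an entry from Source, so earlier entries come up more often. The sketch below assumes exactly that. The `zipfString` type, the `newZipfString` helper, and the skew parameters are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// zipfString is a hypothetical stand-in with the same exported fields as
// ZipfianStringSourceFieldGenerator. It is not the package's type.
type zipfString struct {
	Source []string
	Z      *rand.Zipf
}

// newZipfString is an illustrative helper, not part of the package.
func newZipfString(seed int64, source []string) (*zipfString, error) {
	if len(source) == 0 {
		return nil, errors.New("source must not be empty")
	}
	r := rand.New(rand.NewSource(seed))
	// imax is the largest value Uint64 may return, so it must be the last
	// valid index of source.
	z := rand.NewZipf(r, 1.5, 2, uint64(len(source)-1))
	return &zipfString{Source: source, Z: z}, nil
}

// Generate picks an entry from Source using a Zipf-distributed index.
func (g *zipfString) Generate(_ interface{}) (interface{}, error) {
	return g.Source[g.Z.Uint64()], nil
}

func main() {
	g, err := newZipfString(1, []string{"a", "b", "c", "d"})
	if err != nil {
		panic(err)
	}
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		v, _ := g.Generate(nil)
		counts[v.(string)]++
	}
	fmt.Println(counts)
}
```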
