Documentation ¶
Overview ¶
Package dataset contains the qri ("query") dataset document definition. This package contains the base definition, as well as a number of subpackages that build from this base to add functionality as necessary. Datasets take inspiration from HTML documents, delineating semantic purpose to predefined tags of the document, but instead of orienting around presentational markup, dataset documents emphasize interoperability and composition. The principal encoding format for a dataset document is JSON.
Alpha-Keys: Dataset documents are designed to produce consistent checksums when encoded for storage & transmission. To keep hashing consistent, map keys are sorted lexicographically for encoding. This applies to all fields of a dataset document except the body of a dataset, where users may need to dictate the ordering of map keys.
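Go's standard library already provides this property for map-backed documents: encoding/json writes map keys in sorted order. A minimal sketch of the behavior the alpha-keys rule relies on:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sortedJSON marshals a map with encoding/json, which writes map keys in
// sorted (lexicographic) order — the property that keeps encoded dataset
// documents, and therefore their checksums, deterministic.
func sortedJSON(m map[string]interface{}) string {
	data, _ := json.Marshal(m)
	return string(data)
}

func main() {
	// keys come out alphabetized regardless of insertion order
	fmt.Println(sortedJSON(map[string]interface{}{
		"structure": "st", "commit": "cm", "meta": "md", "body": "bd",
	}))
	// {"body":"bd","commit":"cm","meta":"md","structure":"st"}
}
```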
Pod ("Plain Old Data") Pattern: To maintain high interoperability, dataset documents must support encoding & decoding ("coding", or "serialization") to and from many formats. Fields of dataset documents that leverage "exotic" custom types are accompanied by a "Plain Old Data" variant, denoted by a "Pod" suffix in their name. Plain-Old-Data variants use only basic go types: string, bool, int, float64, []interface{}, etc., and have methods for clean encoding and decoding to their exotic forms.
Index ¶
- Constants
- Variables
- func AbstractColumnName(i int) string
- func AccuralDuration(p string) time.Duration
- func ComponentTypePrefix(k Kind, str string) string
- func HashBytes(data []byte) (hash string, err error)
- func JSONHash(m json.Marshaler) (hash string, err error)
- type CSVOptions
- type Citation
- type Commit
- func (cm *Commit) Assign(msgs ...*Commit)
- func (cm *Commit) DropDerivedValues()
- func (cm *Commit) DropTransientValues()
- func (cm *Commit) IsEmpty() bool
- func (cm *Commit) MarshalJSON() ([]byte, error)
- func (cm *Commit) MarshalJSONObject() ([]byte, error)
- func (cm *Commit) UnmarshalJSON(data []byte) error
- type DataFormat
- type Dataset
- func (ds *Dataset) Assign(datasets ...*Dataset)
- func (ds *Dataset) BodyFile() qfs.File
- func (ds *Dataset) DropDerivedValues()
- func (ds *Dataset) DropTransientValues()
- func (ds *Dataset) IsEmpty() bool
- func (ds *Dataset) MarshalJSON() ([]byte, error)
- func (ds *Dataset) OpenBodyFile(ctx context.Context, resolver qfs.PathResolver) (err error)
- func (ds *Dataset) PathMap(ignore ...string) map[string]string
- func (ds *Dataset) SetBodyFile(file qfs.File)
- func (ds *Dataset) SignableBytes() ([]byte, error)
- func (ds *Dataset) SigningBytes() []byte
- func (ds *Dataset) UnmarshalJSON(data []byte) error
- type FormatConfig
- type JSONOptions
- type Kind
- type License
- type Meta
- func (md *Meta) Assign(metas ...*Meta)
- func (md *Meta) DropDerivedValues()
- func (md *Meta) DropTransientValues()
- func (md *Meta) IsEmpty() bool
- func (md *Meta) MarshalJSON() ([]byte, error)
- func (md *Meta) MarshalJSONObject() ([]byte, error)
- func (md *Meta) Meta() map[string]interface{}
- func (md *Meta) Set(key string, val interface{}) (err error)
- func (md *Meta) SetArbitrary(key string, val interface{}) (err error)
- func (md *Meta) UnmarshalJSON(data []byte) error
- type Readme
- func (r *Readme) Assign(readmeConfigs ...*Readme)
- func (r *Readme) DropDerivedValues()
- func (r *Readme) DropTransientValues()
- func (r *Readme) InlineScriptFile(ctx context.Context, resolver qfs.PathResolver) error
- func (r *Readme) IsEmpty() bool
- func (r *Readme) MarshalJSON() ([]byte, error)
- func (r *Readme) MarshalJSONObject() ([]byte, error)
- func (r *Readme) OpenRenderedFile(ctx context.Context, resolver qfs.PathResolver) (err error)
- func (r *Readme) OpenScriptFile(ctx context.Context, resolver qfs.PathResolver) (err error)
- func (r *Readme) RenderedFile() qfs.File
- func (r *Readme) ScriptFile() qfs.File
- func (r *Readme) SetRenderedFile(file qfs.File)
- func (r *Readme) SetScriptFile(file qfs.File)
- func (r *Readme) ShallowCompare(b *Readme) bool
- func (r *Readme) UnmarshalJSON(data []byte) error
- type Stats
- type Structure
- func (s *Structure) Abstract() *Structure
- func (s *Structure) Assign(structures ...*Structure)
- func (s *Structure) DataFormat() DataFormat
- func (s *Structure) DropDerivedValues()
- func (s *Structure) DropTransientValues()
- func (s *Structure) Hash() (string, error)
- func (s *Structure) IsEmpty() bool
- func (s *Structure) JSONSchema() (*jsonschema.Schema, error)
- func (s Structure) MarshalJSON() (data []byte, err error)
- func (s Structure) MarshalJSONObject() ([]byte, error)
- func (s *Structure) RequiresTabularSchema() bool
- func (s *Structure) UnmarshalJSON(data []byte) (err error)
- type Theme
- type Transform
- func (q *Transform) Assign(qs ...*Transform)
- func (q *Transform) DropDerivedValues()
- func (q *Transform) DropTransientValues()
- func (q *Transform) InlineScriptFile(ctx context.Context, resolver qfs.PathResolver) error
- func (q *Transform) IsEmpty() bool
- func (q Transform) MarshalJSON() ([]byte, error)
- func (q Transform) MarshalJSONObject() ([]byte, error)
- func (q *Transform) OpenScriptFile(ctx context.Context, resolver qfs.PathResolver) (err error)
- func (q *Transform) ScriptFile() qfs.File
- func (q *Transform) SetScriptFile(file qfs.File)
- func (q *Transform) ShallowCompare(b *Transform) bool
- func (q *Transform) UnmarshalJSON(data []byte) error
- type TransformResource
- type TransformStep
- type User
- type Viz
- func (v *Viz) Assign(visConfigs ...*Viz)
- func (v *Viz) DropDerivedValues()
- func (v *Viz) DropTransientValues()
- func (v *Viz) IsEmpty() bool
- func (v *Viz) MarshalJSON() ([]byte, error)
- func (v *Viz) MarshalJSONObject() ([]byte, error)
- func (v *Viz) OpenRenderedFile(ctx context.Context, resolver qfs.PathResolver) (err error)
- func (v *Viz) OpenScriptFile(ctx context.Context, resolver qfs.PathResolver) (err error)
- func (v *Viz) RenderedFile() qfs.File
- func (v *Viz) ScriptFile() qfs.File
- func (v *Viz) SetRenderedFile(file qfs.File)
- func (v *Viz) SetScriptFile(file qfs.File)
- func (v *Viz) ShallowCompare(b *Viz) bool
- func (v *Viz) UnmarshalJSON(data []byte) error
- type XLSXOptions
Constants ¶
const (
	// KindDataset is the current kind for datasets
	KindDataset = Kind("ds:" + CurrentSpecVersion)
	// KindBody is the current kind for body components
	KindBody = Kind("bd:" + CurrentSpecVersion)
	// KindMeta is the current kind for metadata components
	KindMeta = Kind("md:" + CurrentSpecVersion)
	// KindStructure is the current kind for structure components
	KindStructure = Kind("st:" + CurrentSpecVersion)
	// KindTransform is the current kind for transform components
	KindTransform = Kind("tf:" + CurrentSpecVersion)
	// KindCommit is the current kind for commit components
	KindCommit = Kind("cm:" + CurrentSpecVersion)
	// KindViz is the current kind for viz components
	KindViz = Kind("vz:" + CurrentSpecVersion)
	// KindReadme is the current kind for readme components
	KindReadme = Kind("rm:" + CurrentSpecVersion)
	// KindStats is the current kind for stats components
	KindStats = Kind("sa:" + CurrentSpecVersion)
)
const CurrentSpecVersion = "0"
CurrentSpecVersion is the current version of the dataset spec.
Variables ¶
var (
	// ErrNoBody occurs when a dataset has no body component, but one is expected
	ErrNoBody = fmt.Errorf("dataset has no body component")
	// ErrInlineBody is the error for attempting to generate a body file when
	// body data is stored as native go types
	ErrInlineBody = fmt.Errorf("dataset body is inlined")
	// ErrNoResolver is an error for missing-but-needed resolvers
	ErrNoResolver = fmt.Errorf("no resolver available to fetch path")
)
var (
	// BaseSchemaArray is a minimum schema to constitute a dataset, specifying
	// the top level of the document is an array
	BaseSchemaArray = map[string]interface{}{"type": "array"}
	// BaseSchemaObject is a minimum schema to constitute a dataset, specifying
	// the top level of the document is an object
	BaseSchemaObject = map[string]interface{}{"type": "object"}
)
var ErrUnknownDataFormat = fmt.Errorf("Unknown Data Format")
ErrUnknownDataFormat is the expected error for when a data format is missing or unknown
Functions ¶
func AbstractColumnName ¶
AbstractColumnName is the "base26" value of a column index, used to generate short, SQL-valid, deterministic column names.
func AccuralDuration ¶
AccuralDuration takes an ISO 8601 periodicity measure & returns a time.Duration. Invalid periodicities return time.Duration(0).
func ComponentTypePrefix ¶ added in v0.3.0
ComponentTypePrefix prefixes a string with a two letter component type identifier & a colon. Example: ComponentTypePrefix(KindDataset, "hello") == "ds:hello"
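The documented behavior can be sketched with a trivial re-implementation; the helper below is illustrative only, taking the two-letter code from a kind string like "ds:0":

```go
package main

import "fmt"

// prefix re-implements the documented behavior for illustration: take the
// two-letter component code from a kind string, then join it to the value
// with a colon.
func prefix(kind, s string) string {
	return kind[:2] + ":" + s
}

func main() {
	fmt.Println(prefix("ds:0", "hello")) // ds:hello
}
```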
Types ¶
type CSVOptions ¶
type CSVOptions struct {
	// HeaderRow specifies whether this csv file has a header row or not
	HeaderRow bool `json:"headerRow"`
	// If LazyQuotes is true, a quote may appear in an unquoted field and a
	// non-doubled quote may appear in a quoted field.
	LazyQuotes bool `json:"lazyQuotes"`
	// Separator is the field delimiter.
	// It is set to comma (',') by NewReader.
	// Comma must be a valid rune and must not be \r, \n,
	// or the Unicode replacement character (0xFFFD).
	Separator rune `json:"separator,omitempty"`
	// VariadicFields permits records to have a variable number of fields.
	// Avoid using this.
	VariadicFields bool `json:"variadicFields"`
}
CSVOptions specifies configuration details for csv files. This will expand in the future to interoperate with the OKFN CSV spec.
func NewCSVOptions ¶
func NewCSVOptions(opts map[string]interface{}) (*CSVOptions, error)
NewCSVOptions creates a CSVOptions pointer from a map
func (*CSVOptions) Format ¶
func (*CSVOptions) Format() DataFormat
Format announces the CSV Data Format for the FormatConfig interface
func (*CSVOptions) Map ¶
func (o *CSVOptions) Map() map[string]interface{}
Map returns a map[string]interface representation of the configuration
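These options line up closely with Go's standard encoding/csv reader settings, which is a reasonable mental model for what each field controls. The sketch below shows that correspondence using only the standard library; the function name is illustrative and not part of this package:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readWithOptions sketches how CSVOptions fields map onto encoding/csv:
// Separator -> Reader.Comma, LazyQuotes -> Reader.LazyQuotes, and
// VariadicFields -> Reader.FieldsPerRecord = -1 (variable-length records).
func readWithOptions(data string, sep rune, lazy, variadic bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = sep
	r.LazyQuotes = lazy
	if variadic {
		r.FieldsPerRecord = -1 // permit a variable number of fields per record
	}
	return r.ReadAll()
}

func main() {
	records, _ := readWithOptions("a;b\n1;2;3\n", ';', true, true)
	fmt.Println(records) // [[a b] [1 2 3]]
}
```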
type Citation ¶
type Citation struct { Name string `json:"name,omitempty"` URL string `json:"url,omitempty"` Email string `json:"email,omitempty"` }
Citation is a place that this dataset drew its information from.
type Commit ¶
type Commit struct {
	// Author of this commit
	Author *User `json:"author,omitempty"`
	// Message is an optional commit message
	Message string `json:"message,omitempty"`
	// Path is the location of this commit, transient
	// derived
	Path string `json:"path,omitempty"`
	// Qri is this commit's qri kind
	// derived
	Qri string `json:"qri,omitempty"`
	// Signature is a base58 encoded privateKey signing of Title
	Signature string `json:"signature,omitempty"`
	// Time this dataset was created. Required.
	Timestamp time.Time `json:"timestamp"`
	// Title of the commit. Required.
	Title string `json:"title"`
	// RunID is only present if an automated script was executed during the commit.
	// Commits with non-empty RunIDs imply the existence of a transform component
	RunID string `json:"runID,omitempty"`
}
Commit encapsulates information about changes to a dataset in relation to other entries in a given history. Commit is directly analogous to the concept of a Commit Message in the git version control system. A full commit defines the administrative metadata of a dataset, answering "who made this dataset, when, and why"
func NewCommitRef ¶
NewCommitRef creates an empty struct with its internal path set.
func UnmarshalCommit ¶
UnmarshalCommit tries to extract a dataset type from an empty interface. Pairs nicely with datastore.Get() from github.com/ipfs/go-datastore
func (*Commit) Assign ¶
Assign collapses all properties of a set of Commits onto one. This is directly inspired by Javascript's Object.assign.
func (*Commit) DropDerivedValues ¶ added in v0.1.4
func (cm *Commit) DropDerivedValues()
DropDerivedValues resets all derived fields to their default values.
func (*Commit) DropTransientValues ¶
func (cm *Commit) DropTransientValues()
DropTransientValues removes values that cannot be recorded when the dataset is rendered immutable, usually by storing it in a cafs
func (*Commit) MarshalJSON ¶
MarshalJSON implements the json.Marshaler interface for Commit. Empty Commit instances with a non-empty path marshal to their path value; otherwise, Commit marshals to an object.
func (*Commit) MarshalJSONObject ¶
MarshalJSONObject always marshals to a json Object, even if the Commit is empty or a reference.
func (*Commit) UnmarshalJSON ¶
UnmarshalJSON implements json.Unmarshaller for Commit
type DataFormat ¶
type DataFormat int
DataFormat represents different types of data formats. Formats specified here have some degree of support within the dataset packages. TODO - consider placing this in a subpackage: dataformats
const (
	// UnknownDataFormat is the default dataformat, meaning
	// that a data format should always be specified when
	// using the DataFormat type
	UnknownDataFormat DataFormat = iota
	// CSVDataFormat specifies comma separated value-formatted data
	CSVDataFormat
	// JSONDataFormat specifies Javascript Object Notation-formatted data
	JSONDataFormat
	// CBORDataFormat specifies RFC 7049 Concise Binary Object Representation
	// read more at cbor.io
	CBORDataFormat
	// XMLDataFormat specifies eXtensible Markup Language-formatted data
	// currently not supported.
	XMLDataFormat
	// XLSXDataFormat specifies microsoft excel formatted data
	XLSXDataFormat
)
func ParseDataFormatString ¶
func ParseDataFormatString(s string) (df DataFormat, err error)
ParseDataFormatString takes a string representation of a data format TODO (b5): trim "." prefix, remove prefixed map keys
func SupportedDataFormats ¶
func SupportedDataFormats() []DataFormat
SupportedDataFormats gives a slice of data formats that are expected to work with this dataset package. As we work through support for different formats, the last step of providing full support to a format will be an addition to this slice
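The DataFormat pattern above — an iota-based enum with string round-tripping — can be sketched in miniature. The type and function names below are illustrative, not the package's actual tables:

```go
package main

import "fmt"

// format is a minimal sketch of the DataFormat enum pattern: an iota-based
// type whose zero value means "unknown", with String and parse round-trips.
type format int

const (
	unknownFormat format = iota
	csvFormat
	jsonFormat
)

// String maps each enum value to its lowercase name.
func (f format) String() string {
	return [...]string{"", "csv", "json"}[f]
}

// parseFormat is the inverse of String, erroring on unrecognized names,
// mirroring the documented behavior of ParseDataFormatString.
func parseFormat(s string) (format, error) {
	switch s {
	case "csv":
		return csvFormat, nil
	case "json":
		return jsonFormat, nil
	}
	return unknownFormat, fmt.Errorf("Unknown Data Format")
}

func main() {
	f, _ := parseFormat("csv")
	fmt.Println(f) // csv
}
```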
func (DataFormat) MarshalJSON ¶
func (f DataFormat) MarshalJSON() ([]byte, error)
MarshalJSON satisfies the json.Marshaler interface
func (DataFormat) String ¶
func (f DataFormat) String() string
String implements stringer interface for DataFormat
func (*DataFormat) UnmarshalJSON ¶
func (f *DataFormat) UnmarshalJSON(data []byte) error
UnmarshalJSON satisfies the json.Unmarshaler interface
type Dataset ¶
type Dataset struct {
	// Body represents dataset data with native go types.
	// Datasets have at most one body. Body, BodyBytes, and BodyPath
	// work together, often with only one field used at a time
	Body interface{} `json:"body,omitempty"`
	// BodyBytes is for representing dataset data as a slice of bytes
	BodyBytes []byte `json:"bodyBytes,omitempty"`
	// BodyPath is the path to the hash of raw data as it resolves on the network
	BodyPath string `json:"bodyPath,omitempty"`
	// Commit contains author & change message information that describes this
	// version of a dataset
	Commit *Commit `json:"commit,omitempty"`
	// ID is an identifier string for this dataset.
	ID string `json:"id,omitempty"`
	// Meta contains all human-readable meta about this dataset intended to aid
	// in discovery and organization of this document
	Meta *Meta `json:"meta,omitempty"`
	// name reference for this dataset, transient
	Name string `json:"name,omitempty"`
	// Location of this dataset, transient
	Path string `json:"path,omitempty"`
	// Peername of dataset owner, transient
	Peername string `json:"peername,omitempty"`
	// PreviousPath connects datasets to form a historical merkle-DAG of snapshots
	// of this document, creating a version history
	PreviousPath string `json:"previousPath,omitempty"`
	// ProfileID of dataset owner, transient
	ProfileID string `json:"profileID,omitempty"`
	// Readme is a path to the readme file for this dataset
	Readme *Readme `json:"readme,omitempty"`
	// Number of versions this dataset has, transient
	NumVersions int `json:"numVersions,omitempty"`
	// Qri is a key for both identifying this document type, and versioning the
	// dataset document definition itself.
	// derived
	Qri string `json:"qri"`
	// Structure of this dataset
	Structure *Structure `json:"structure,omitempty"`
	// Stats is a component containing statistical metadata about the dataset body
	Stats *Stats `json:"stats,omitempty"`
	// Transform is a path to the transformation that generated this resource
	Transform *Transform `json:"transform,omitempty"`
	// Viz stores configuration data related to representing a dataset as
	// a visualization
	Viz *Viz `json:"viz,omitempty"`
	// contains filtered or unexported fields
}
Dataset is a document for describing & storing structured data. Dataset documents are designed to satisfy the FAIR principle of being Findable, Accessible, Interoperable, and Reproducible, in relation to other dataset documents and related-but-separate technologies such as data catalogs, HTTP APIs, and data package formats. Datasets are designed to be stored and distributed on content-addressed (identified-by-hash) systems. The dataset document definition is built from a research-first principle, valuing direct interoperability with existing standards over novel definitions or specifications.
func NewDatasetRef ¶
NewDatasetRef creates a Dataset pointer with the internal path property specified, and no other fields.
func UnmarshalDataset ¶
UnmarshalDataset tries to extract a dataset type from an empty interface. Pairs nicely with datastore.Get() from github.com/ipfs/go-datastore
func (*Dataset) Assign ¶
Assign collapses all properties of a group of datasets onto one. This is directly inspired by Javascript's Object.assign.
func (*Dataset) BodyFile ¶
BodyFile exposes bodyFile if one is set. Callers that use the file in any way (eg. by calling Read) should consume the entire file and call Close
func (*Dataset) DropDerivedValues ¶ added in v0.1.4
func (ds *Dataset) DropDerivedValues()
DropDerivedValues resets all set-on-save fields to their default values
func (*Dataset) DropTransientValues ¶
func (ds *Dataset) DropTransientValues()
DropTransientValues removes values that cannot be recorded when the dataset is rendered immutable, usually by storing it in a cafs. Note that DropTransientValues does *not* drop the transient values of child components of a dataset; each component's DropTransientValues method must be called separately.
func (*Dataset) IsEmpty ¶
IsEmpty checks to see if dataset has any fields other than the Path & Qri fields
func (*Dataset) MarshalJSON ¶
MarshalJSON uses a map to combine meta & standard fields. Marshalling a map[string]interface{} automatically alpha-sorts the keys.
func (*Dataset) OpenBodyFile ¶
OpenBodyFile sets the byte stream of file data, prioritizing:
* erroring when the body is inline
* creating an in-place file from bytes
* passing BodyPath to the resolver
Once resolved, the file is set to an internal field, which is accessible via the BodyFile method. Separating into two steps decouples loading from access.
func (*Dataset) PathMap ¶ added in v0.3.0
PathMap converts all path references in a dataset into a map keyed by component name. Keys are only present in the map if the component exists on the dataset. Present components that do not have a path are represented by the empty string. Any components specified in ignore are omitted from the map.
func (*Dataset) SetBodyFile ¶
SetBodyFile assigns the bodyFile.
func (*Dataset) SignableBytes ¶
SignableBytes produces the portion of a commit message used for signing. The format for signable bytes is:
* commit timestamp in nanosecond-RFC3339 format, UTC timezone
* newline character
* dataset structure checksum string
The checksum string should be a base58-encoded multihash of the dataset data.
DEPRECATED - use SigningBytes instead.
func (*Dataset) SigningBytes ¶ added in v0.3.0
SigningBytes produces a set of bytes for signing to establish authorship of a dataset. The signing bytes are a newline-delimited, alpha-sorted list of components within the dataset, where each component is identified by a two letter prefix and a colon ':' character:
two_letter_component_type ':' component_value
The component value for all components except commit is the path of the component. For the commit component, the value is the value of commit.Timestamp in nanosecond-RFC3339 format, UTC timezone.
When used in conjunction with a merkelized filesystem, path values are also content checksums. A signature of SigningBytes on a merkelized filesystem affirms time, author, and contents. When used with a mutable filesystem, SigningBytes is a weaker claim that only affirms time, author, and path values.
func (*Dataset) UnmarshalJSON ¶
UnmarshalJSON implements json.Unmarshaller
type FormatConfig ¶
type FormatConfig interface {
	// Format gives the data format being configured
	Format() DataFormat
	// Map gives an object of configuration details
	Map() map[string]interface{}
}
FormatConfig is the interface for data format configurations
func NewXLSXOptions ¶
func NewXLSXOptions(opts map[string]interface{}) (FormatConfig, error)
NewXLSXOptions creates a XLSXOptions pointer from a map
func ParseFormatConfigMap ¶
func ParseFormatConfigMap(f DataFormat, opts map[string]interface{}) (FormatConfig, error)
ParseFormatConfigMap returns a FormatConfig implementation for a given data format and options map, often used in decoding from recorded formats like, say, JSON
type JSONOptions ¶
type JSONOptions struct {
Options map[string]interface{}
}
JSONOptions specifies configuration details for the json file format.
func NewJSONOptions ¶
func NewJSONOptions(opts map[string]interface{}) (*JSONOptions, error)
NewJSONOptions creates a JSONOptions pointer from a map
func (*JSONOptions) Format ¶
func (*JSONOptions) Format() DataFormat
Format announces the JSON Data Format for the FormatConfig interface
func (*JSONOptions) Map ¶
func (o *JSONOptions) Map() map[string]interface{}
Map returns a map[string]interface representation of the configuration
type Kind ¶
type Kind string
Kind is a short identifier for all types of qri dataset objects. Kind does three things:
1. Distinguish qri datasets from other formats
2. Distinguish different types (Dataset/Structure/Transform/etc.)
3. Distinguish between versions of the dataset spec
Kind is a string in the format 2_letter_prefix + ':' + version
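A kind string's shape can be checked with a simple pattern; the regular expression below is an illustrative sketch of the documented format, not the package's actual validation logic:

```go
package main

import (
	"fmt"
	"regexp"
)

// validKind is an illustrative check of the documented shape:
// two lowercase letters, a colon, then a non-empty version string.
var validKind = regexp.MustCompile(`^[a-z]{2}:.+$`)

func main() {
	fmt.Println(validKind.MatchString("md:0")) // true
	fmt.Println(validKind.MatchString("meta")) // false
}
```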
func (*Kind) UnmarshalJSON ¶
UnmarshalJSON implements the json.Unmarshaler interface, rejecting any strings that are not a valid kind.
type Meta ¶
type Meta struct {
	// AccessURL is a URL to access the dataset
	AccessURL string `json:"accessURL,omitempty"`
	// AccrualPeriodicity is the frequency with which dataset changes. Must be an
	// ISO 8601 repeating duration
	AccrualPeriodicity string `json:"accrualPeriodicity,omitempty"`
	// Citations is a slice of assets used to build this dataset
	Citations []*Citation `json:"citations"`
	// Contributors
	Contributors []*User `json:"contributors,omitempty"`
	// Description follows the DCAT sense of the word, it should be around a
	// paragraph of human-readable text
	Description string `json:"description,omitempty"`
	// DownloadURL is a URL that should / must lead directly to the data itself
	DownloadURL string `json:"downloadURL,omitempty"`
	// HomeURL is a path to a "home" resource
	HomeURL string `json:"homeURL,omitempty"`
	// Identifier is for *other* data catalog specifications. Identifier should
	// not be used or relied on to be unique, because this package does not
	// enforce any of these rules.
	Identifier string `json:"identifier,omitempty"`
	// Keywords is a slice of keyword strings
	Keywords []string `json:"keywords,omitempty"`
	// Language is a list of languages this dataset is written in
	Language []string `json:"language,omitempty"`
	// License will automatically parse to & from a string value if provided as a
	// raw string
	License *License `json:"license,omitempty"`
	// Path is the location of meta, transient
	// derived
	Path string `json:"path,omitempty"`
	// Qri is required, must be qri:md:[version]
	// derived
	Qri string `json:"qri,omitempty"`
	// ReadmeURL is a path to a dataset readme file, not part of the DCAT spec,
	// but a common convention in software dev
	ReadmeURL string `json:"readmeURL,omitempty"`
	// Title of this dataset
	Title string `json:"title,omitempty"`
	// Theme is a "category" for the dataset
	Theme []string `json:"theme,omitempty"`
	// Version is the version identifier for this dataset
	Version string `json:"version,omitempty"`
	// contains filtered or unexported fields
}
Meta contains human-readable descriptive metadata that qualifies and distinguishes a dataset. Well-defined Meta should aid in making datasets Findable by describing a dataset in generalizable taxonomies that can aggregate across other dataset documents. Because dataset documents are intended to interoperate with many other data storage and cataloging systems, meta fields and conventions are derived from existing metadata formats whenever possible
func NewMetaRef ¶
NewMetaRef creates a Meta pointer with the internal path property specified, and no other fields.
func (*Meta) Assign ¶
Assign collapses all properties of a group of metadata structs onto one. This is directly inspired by Javascript's Object.assign.
func (*Meta) DropDerivedValues ¶ added in v0.1.4
func (md *Meta) DropDerivedValues()
DropDerivedValues resets all set-on-save fields to their default values
func (*Meta) DropTransientValues ¶
func (md *Meta) DropTransientValues()
DropTransientValues removes values that cannot be recorded when the dataset is rendered immutable, usually by storing it in a cafs
func (*Meta) MarshalJSON ¶
MarshalJSON uses a map to combine meta & standard fields. Marshalling a map[string]interface{} automatically alpha-sorts the keys.
func (*Meta) MarshalJSONObject ¶
MarshalJSONObject always marshals to a json Object, even if meta is empty or a reference
func (*Meta) Set ¶
Set writes value to key in metadata, erroring if the type is invalid. Input values are expected to be json.Unmarshal types.
func (*Meta) SetArbitrary ¶
SetArbitrary is for implementing the ArbitrarySetter interface defined by base/fill_struct.go
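The Set/SetArbitrary split can be sketched as a small pattern: known keys route to typed fields, anything else lands in a catch-all map. The type and field names below are illustrative, not the package's actual implementation:

```go
package main

import "fmt"

// meta sketches the ArbitrarySetter pattern: a typed field for known keys
// plus an unexported catch-all map for everything else.
type meta struct {
	Title string
	extra map[string]interface{}
}

// Set routes known keys to typed fields, erroring on type mismatches, and
// delegates unknown keys to SetArbitrary.
func (m *meta) Set(key string, val interface{}) error {
	switch key {
	case "title":
		s, ok := val.(string)
		if !ok {
			return fmt.Errorf("title must be a string")
		}
		m.Title = s
		return nil
	}
	return m.SetArbitrary(key, val)
}

// SetArbitrary stores any key/value pair without validation.
func (m *meta) SetArbitrary(key string, val interface{}) error {
	if m.extra == nil {
		m.extra = map[string]interface{}{}
	}
	m.extra[key] = val
	return nil
}

func main() {
	m := &meta{}
	m.Set("title", "air quality")
	m.Set("sensor", "pm2.5") // unknown key, stored arbitrarily
	fmt.Println(m.Title, m.extra["sensor"]) // air quality pm2.5
}
```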
func (*Meta) UnmarshalJSON ¶
UnmarshalJSON implements json.Unmarshaller
type Readme ¶ added in v0.2.0
type Readme struct {
	// Format designates the readme syntax. Only supported
	// formats are "html" and "md"
	Format string `json:"format,omitempty"`
	// Path is the location of a readme, transient
	// derived
	Path string `json:"path,omitempty"`
	// Qri should always be "rm:0"
	// derived
	Qri string `json:"qri,omitempty"`
	// ScriptBytes is for representing a script as a slice of bytes, transient
	ScriptBytes []byte `json:"scriptBytes,omitempty"`
	// ScriptPath is the path to the script that created this
	ScriptPath string `json:"scriptPath,omitempty"`
	// RenderedPath is the path to the file rendered using the readme script and the body
	RenderedPath string `json:"renderedPath,omitempty"`
	// contains filtered or unexported fields
}
Readme stores configuration data related to a dataset's readme.
func NewReadmeRef ¶ added in v0.2.0
NewReadmeRef creates an empty struct with its internal path set.
func UnmarshalReadme ¶ added in v0.2.0
UnmarshalReadme tries to extract a resource type from an empty interface. Pairs nicely with datastore.Get() from github.com/ipfs/go-datastore
func (*Readme) Assign ¶ added in v0.2.0
Assign collapses all properties of a group of readmes onto one. This is directly inspired by Javascript's Object.assign.
func (*Readme) DropDerivedValues ¶ added in v0.2.0
func (r *Readme) DropDerivedValues()
DropDerivedValues resets all set-on-save fields to their default values
func (*Readme) DropTransientValues ¶ added in v0.2.0
func (r *Readme) DropTransientValues()
DropTransientValues removes values that cannot be recorded when the dataset is rendered immutable, usually by storing it in a cafs
func (*Readme) InlineScriptFile ¶ added in v0.2.0
InlineScriptFile opens the script file, reads its contents, and assigns it to scriptBytes.
func (*Readme) IsEmpty ¶ added in v0.2.0
IsEmpty checks to see if Readme has any fields other than the internal path
func (*Readme) MarshalJSON ¶ added in v0.2.0
MarshalJSON satisfies the json.Marshaler interface
func (*Readme) MarshalJSONObject ¶ added in v0.2.0
MarshalJSONObject always marshals to a json Object, even if Readme is empty or a reference
func (*Readme) OpenRenderedFile ¶ added in v0.2.0
OpenRenderedFile generates a byte stream of the rendered data
func (*Readme) OpenScriptFile ¶ added in v0.2.0
OpenScriptFile generates a byte stream of script data, prioritizing creating an in-place file from ScriptBytes when defined, fetching from the passed-in resolver otherwise.
func (*Readme) RenderedFile ¶ added in v0.2.0
RenderedFile exposes renderedFile if one is set. Callers that use the file in any way (eg. by calling Read) should consume the entire file and call Close
func (*Readme) ScriptFile ¶ added in v0.2.0
ScriptFile exposes scriptFile if one is set. Callers that use the file in any way (eg. by calling Read) should consume the entire file and call Close
func (*Readme) SetRenderedFile ¶ added in v0.2.0
SetRenderedFile assigns the unexported renderedFile
func (*Readme) SetScriptFile ¶ added in v0.2.0
SetScriptFile assigns the unexported scriptFile
func (*Readme) ShallowCompare ¶ added in v0.3.0
ShallowCompare is an equality check that ignores Path values. Intended for comparing components across different persistence states, ShallowCompare returns true if all exported fields in the component have the same value (with the exception of Path). ShallowCompare does not consider scriptFile or renderedFile.
func (*Readme) UnmarshalJSON ¶ added in v0.2.0
UnmarshalJSON satisfies the json.Unmarshaler interface
type Stats ¶ added in v0.3.0
type Stats struct {
	Path  string      `json:"path,omitempty"`
	Qri   string      `json:"qri,omitempty"`
	Stats interface{} `json:"stats,omitempty"`
}
Stats is a component that contains statistical metadata about the body of a dataset
func NewStatsRef ¶ added in v0.3.0
NewStatsRef creates an empty struct with its path set.
func (*Stats) Assign ¶ added in v0.3.0
Assign collapses all properties of a group of Stats components onto one
func (*Stats) DropDerivedValues ¶ added in v0.3.0
func (sa *Stats) DropDerivedValues()
DropDerivedValues resets all set-on-save fields to their default values
func (*Stats) IsEmpty ¶ added in v0.3.0
IsEmpty checks to see if stats has any fields other than Path set
func (Stats) MarshalJSON ¶ added in v0.3.0
MarshalJSON satisfies the json.Marshaler interface
func (Stats) MarshalJSONObject ¶ added in v0.3.0
MarshalJSONObject always marshals to a json Object, even if Stats is empty or a reference
func (*Stats) UnmarshalJSON ¶ added in v0.3.0
UnmarshalJSON satisfies the json.Unmarshaler interface
type Structure ¶
type Structure struct {
	// Checksum is a base58-encoded multihash checksum of the entire data
	// file this structure points to. This is different from IPFS
	// hashes, which are calculated after breaking the file into blocks
	// derived
	Checksum string `json:"checksum,omitempty"`
	// Compression specifies any compression on the source data,
	// if empty assume no compression
	Compression string `json:"compression,omitempty"`
	// Depth is the maximum nesting level of composite types in the dataset.
	// eg: depth 1 == [], depth 2 == [[]]
	// derived
	Depth int `json:"depth,omitempty"`
	// Encoding specifies character encoding, assume utf-8 if not specified
	Encoding string `json:"encoding,omitempty"`
	// ErrCount is the number of errors returned by validating data
	// against this schema. required
	// derived
	ErrCount int `json:"errCount,omitempty"`
	// Entries is the number of top-level entries in the dataset. With tabular data
	// this is the same as the number of "rows"
	// derived
	Entries int `json:"entries,omitempty"`
	// Format specifies the format of the raw data MIME type
	Format string `json:"format"`
	// FormatConfig removes as much ambiguity as possible about how
	// to interpret the specified format.
	// FormatConfig FormatConfig `json:"formatConfig,omitempty"`
	FormatConfig map[string]interface{} `json:"formatConfig,omitempty"`
	// Length is the length of the data object in bytes.
	// must always match & be present
	// derived
	Length int `json:"length,omitempty"`
	// Path is the location of this structure, transient
	// derived
	Path string `json:"path,omitempty"`
	// Qri should always be KindStructure
	// derived
	Qri string `json:"qri"`
	// Schema contains the schema definition for the underlying data, schemas
	// are defined using the IETF json-schema specification. for more info
	// on json-schema see: https://json-schema.org
	Schema map[string]interface{} `json:"schema,omitempty"`
	// Strict requires schema validation to pass without error. Datasets with
	// strict: true can have additional functionality and performance speedups
	// that come with being able to assume that all data is valid
	Strict bool `json:"strict,omitempty"`
}
Structure defines the characteristics of a dataset document necessary for a machine to interpret the dataset body. Structure fields are things like the data encoding format (JSON, CSV, etc.) and the length of the dataset body in bytes, stored in a rigid form intended for machine use. A well-defined structure & accompanying software should allow the end user to spend more time focusing on the data itself. Two dataset documents that both have a defined structure will have some degree of natural interoperability, depending first on the amount of detail provided in a dataset's structure, and then on the natural comparability of the datasets.
func NewStructureRef ¶
NewStructureRef creates an empty Structure with its internal path set
func UnmarshalStructure ¶
UnmarshalStructure tries to extract a structure type from an empty interface. Pairs nicely with datastore.Get() from github.com/ipfs/go-datastore
func (*Structure) Abstract ¶
Abstract returns this structure instance in its "Abstract" form, stripping all nonessential values & renaming all schema field names to standard variable names
func (*Structure) Assign ¶
Assign collapses all properties of a group of structures onto one. This is directly inspired by Javascript's Object.assign
func (*Structure) DataFormat ¶
func (s *Structure) DataFormat() DataFormat
DataFormat gives format as a DataFormat type, returning UnknownDataFormat in any case where the Format field is an invalid string
func (*Structure) DropDerivedValues ¶ added in v0.1.4
func (s *Structure) DropDerivedValues()
DropDerivedValues resets all derived fields to their default values
func (*Structure) DropTransientValues ¶
func (s *Structure) DropTransientValues()
DropTransientValues removes values that cannot be recorded when the dataset is rendered immutable, usually by storing it in a cafs
func (*Structure) IsEmpty ¶
IsEmpty checks to see if structure has any fields other than the internal path
func (*Structure) JSONSchema ¶
func (s *Structure) JSONSchema() (*jsonschema.Schema, error)
JSONSchema parses the Schema field into a json-schema
func (Structure) MarshalJSON ¶
MarshalJSON satisfies the json.Marshaler interface
func (Structure) MarshalJSONObject ¶
MarshalJSONObject always marshals to a json Object, even if Structure is empty or a reference
func (*Structure) RequiresTabularSchema ¶ added in v0.2.0
RequiresTabularSchema returns true if the structure's specified data format requires a JSON schema that describes a rectangular data shape
func (*Structure) UnmarshalJSON ¶
UnmarshalJSON satisfies the json.Unmarshaler interface
type Theme ¶
type Theme struct {
	Description     string `json:"description,omitempty"`
	DisplayName     string `json:"display_name,omitempty"`
	ImageDisplayURL string `json:"image_display_url,omitempty"`
	ID              string `json:"id,omitempty"`
	Name            string `json:"name,omitempty"`
	Title           string `json:"title,omitempty"`
}
Theme is pulled from the Project Open Data Schema version 1.1
type Transform ¶
type Transform struct {
	// Config outlines any configuration that would affect the resulting hash
	Config map[string]interface{} `json:"config,omitempty"`
	// Path is the location of the transform object, transient
	Path string `json:"path,omitempty"`
	// Qri should always equal KindTransform
	Qri string `json:"qri,omitempty"`
	// Resources is a map of all datasets referenced in this transform, with
	// alphabetical keys generated by datasets in order of appearance within the
	// transform
	Resources map[string]*TransformResource `json:"resources,omitempty"`
	// ScriptBytes is for representing a script as a slice of bytes, transient
	// Deprecated - use Steps instead
	ScriptBytes []byte `json:"scriptBytes,omitempty"`
	// ScriptPath is the path to the script that produced this transformation.
	// Deprecated - use Steps instead
	ScriptPath string `json:"scriptPath,omitempty"`
	// Secrets is a map of secret values used in the transformation, transient.
	// TODO (b5): make this not-transient by censoring the values used, but not keys
	Secrets map[string]string `json:"secrets,omitempty"`
	Steps   []*TransformStep  `json:"steps,omitempty"`
	// Syntax this transform was written in
	// Deprecated - syntax is defined per-step
	Syntax string `json:"syntax,omitempty"`
	// SyntaxVersion is an identifier for the application and version number that
	// produced the result
	// Deprecated - use steps.Syntax with a version suffix instead
	SyntaxVersion string `json:"syntaxVersion,omitempty"`
	// Syntaxes is a map of syntaxes used in this transform to their version identifier
	Syntaxes map[string]string `json:"syntaxes,omitempty"`
	// contains filtered or unexported fields
}
Transform is a record of executing a transformation on data. Transforms can theoretically be anything from an SQL query, a jupyter notebook, the state of an ETL pipeline, etc., so long as the input is zero or more datasets and the output is a single dataset. Ideally, transforms should contain all the machine-necessary bits to deterministically execute the algorithm referenced in "ScriptPath".
func NewTransformRef ¶
NewTransformRef creates a Transform pointer with the internal path property specified, and no other fields.
func UnmarshalTransform ¶
UnmarshalTransform tries to extract a resource type from an empty interface. Pairs nicely with datastore.Get() from github.com/ipfs/go-datastore
func (*Transform) Assign ¶
Assign collapses all properties of a group of queries onto one. This is directly inspired by Javascript's Object.assign
func (*Transform) DropDerivedValues ¶ added in v0.1.4
func (q *Transform) DropDerivedValues()
DropDerivedValues resets all set-on-save fields to their default values
func (*Transform) DropTransientValues ¶
func (q *Transform) DropTransientValues()
DropTransientValues removes values that cannot be recorded when the dataset is rendered immutable, usually by storing it in a cafs
func (*Transform) InlineScriptFile ¶ added in v0.2.0
InlineScriptFile opens the script file, reads its contents, and assigns it to scriptBytes
func (*Transform) IsEmpty ¶
IsEmpty checks to see if transform has any fields other than the internal path
func (Transform) MarshalJSON ¶
MarshalJSON satisfies the json.Marshaler interface
func (Transform) MarshalJSONObject ¶
MarshalJSONObject always marshals to a json Object, even if Transform is empty or a reference
func (*Transform) OpenScriptFile ¶
OpenScriptFile generates a byte stream of script data prioritizing creating an in-place file from ScriptBytes when defined, fetching from the passed-in resolver otherwise
func (*Transform) ScriptFile ¶
ScriptFile gives the internal file, if any. Callers that use the file in any way (eg. by calling Read) should consume the entire file and call Close
func (*Transform) SetScriptFile ¶
SetScriptFile assigns the scriptFile
func (*Transform) ShallowCompare ¶ added in v0.3.0
ShallowCompare is an equality check that ignores Path values. Intended for comparing transform components across different persistence states, ShallowCompare returns true if all exported fields in the component have the same value (with the exception of Path). ShallowCompare does not consider scriptFile or renderedFile
func (*Transform) UnmarshalJSON ¶
UnmarshalJSON satisfies the json.Unmarshaler interface
type TransformResource ¶
type TransformResource struct {
Path string `json:"path"`
}
TransformResource describes an external data dependency. The prime use case is importing other datasets, but in the future this may be expanded to include details that specify resources other than datasets (urls?), and details for interpreting the resource (eg. a selector to specify that only a subset of a resource is required)
func (*TransformResource) UnmarshalJSON ¶
func (r *TransformResource) UnmarshalJSON(data []byte) error
UnmarshalJSON implements json.Unmarshaler, allowing both string and object representations
type TransformStep ¶ added in v0.3.0
type TransformStep struct {
	Name     string      `json:"name"`           // human-readable name for the step, used for display purposes
	Path     string      `json:"path,omitempty"` // path to this step if persisted separately on disk
	Syntax   string      `json:"syntax"`         // execution environment, eg: "starlark", "qri-sql"
	Category string      `json:"category"`       // syntax-specific sub-typing
	Script   interface{} `json:"script"`         // input text for transform step. often code, or a query.
}
TransformStep is a unit of operation in a transform script
type User ¶
type User struct {
	ID       string `json:"id,omitempty"`
	Fullname string `json:"name,omitempty"`
	Email    string `json:"email,omitempty"`
}
User is a placeholder for talking about people, groups, organizations
type Viz ¶
type Viz struct {
	// Format designates the visualization configuration syntax. currently the
	// only supported syntax is "html"
	Format string `json:"format,omitempty"`
	// Path is the location of a viz, transient
	// derived
	Path string `json:"path,omitempty"`
	// Qri should always be "vc:0"
	// derived
	Qri string `json:"qri,omitempty"`
	// ScriptBytes is for representing a script as a slice of bytes, transient
	ScriptBytes []byte `json:"scriptBytes,omitempty"`
	// ScriptPath is the path to the script that created this
	ScriptPath string `json:"scriptPath,omitempty"`
	// RenderedPath is the path to the file rendered using the viz script and the body
	RenderedPath string `json:"renderedPath,omitempty"`
	// contains filtered or unexported fields
}
Viz stores configuration data related to representing a dataset as a visualization
func UnmarshalViz ¶
UnmarshalViz tries to extract a resource type from an empty interface. Pairs nicely with datastore.Get() from github.com/ipfs/go-datastore
func (*Viz) Assign ¶
Assign collapses all properties of a group of structures onto one. This is directly inspired by Javascript's Object.assign
func (*Viz) DropDerivedValues ¶ added in v0.1.4
func (v *Viz) DropDerivedValues()
DropDerivedValues resets all set-on-save fields to their default values
func (*Viz) DropTransientValues ¶
func (v *Viz) DropTransientValues()
DropTransientValues removes values that cannot be recorded when the dataset is rendered immutable, usually by storing it in a cafs
func (*Viz) MarshalJSON ¶
MarshalJSON satisfies the json.Marshaler interface
func (*Viz) MarshalJSONObject ¶
MarshalJSONObject always marshals to a json Object, even if Viz is empty or a reference
func (*Viz) OpenRenderedFile ¶
OpenRenderedFile generates a byte stream of the rendered data
func (*Viz) OpenScriptFile ¶
OpenScriptFile generates a byte stream of script data prioritizing creating an in-place file from ScriptBytes when defined, fetching from the passed-in resolver otherwise
func (*Viz) RenderedFile ¶
RenderedFile exposes renderedFile if one is set. Callers that use the file in any way (eg. by calling Read) should consume the entire file and call Close
func (*Viz) ScriptFile ¶
ScriptFile exposes scriptFile if one is set. Callers that use the file in any way (eg. by calling Read) should consume the entire file and call Close
func (*Viz) SetRenderedFile ¶
SetRenderedFile assigns the unexported renderedFile
func (*Viz) SetScriptFile ¶
SetScriptFile assigns the unexported scriptFile
func (*Viz) ShallowCompare ¶ added in v0.3.0
ShallowCompare is an equality check that ignores Path values. Intended for comparing viz components across different persistence states, ShallowCompare returns true if all exported fields in the component have the same value (with the exception of Path). ShallowCompare does not consider scriptFile or renderedFile
func (*Viz) UnmarshalJSON ¶
UnmarshalJSON satisfies the json.Unmarshaler interface
type XLSXOptions ¶
type XLSXOptions struct {
SheetName string `json:"sheetName,omitempty"`
}
XLSXOptions specifies configuration details for the xlsx file format
func (*XLSXOptions) Format ¶
func (*XLSXOptions) Format() DataFormat
Format announces the XLSX data format for the FormatConfig interface
func (*XLSXOptions) Map ¶
func (o *XLSXOptions) Map() map[string]interface{}
Map structures XLSXOptions as a map of string keys to values
Source Files ¶
Directories ¶
Path | Synopsis
---|---
compression | Package compression is a horrible hack & should be replaced as soon as humanly possible
dsgraph | Package dsgraph is a placeholder package for linking queries, resources, and metadata until proper packaging & architectural decisions can be made
dsio | Package dsio defines writers & readers for operating on "container" data structures (objects and arrays)
replacecr | Package replacecr defines a wrapper for replacing solo carriage return characters (\r) with carriage-return + line feed (\r\n)
dsstats | Package dsstats calculates statistical metadata for a given dataset
histosketch | Package histosketch introduces the histosketch implementation based on https://github.com/aaw/histosketch. histogram_sketch is an implementation of the Histogram Sketch data structure described in Ben-Haim and Tom-Tov's "A Streaming Parallel Decision Tree Algorithm" in Journal of Machine Learning Research 11 (http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf)
dstest | Package dstest defines an interface for reading test cases from static files, leveraging directories of test dataset input files & expected output files
dsviz | Package dsviz renders the viz component of a dataset, returning a qfs.File of data. HTML rendering uses go's html/template package to generate html documents from an input dataset
generate | Package generate is for generating random data from given structures
tabular | Package tabular defines functions for working with rectangular datasets