files

package
v0.48.1
Published: Mar 8, 2024 License: MIT Imports: 34 Imported by: 0

Documentation

Overview

Package files contains functionality for dealing with files, including remote files (e.g. HTTP). The files.Files type is the central API for interacting with files.

Variables

var (
	OptHTTPRequestTimeout = options.NewDuration(
		"http.request.timeout",
		nil,
		time.Second*10,
		"HTTP/S request initial response timeout duration",
		`How long to wait for initial response from a HTTP/S endpoint before timeout
occurs. Reading the body of the response, such as a large HTTP file download,
is not affected by this option. Example: 500ms or 3s.

Contrast with http.response.timeout.`,
		options.TagSource,
	)
	OptHTTPResponseTimeout = options.NewDuration(
		"http.response.timeout",
		nil,
		0,
		"HTTP/S request completion timeout duration",
		`How long to wait for the entire HTTP transaction to complete. This includes
reading the body of the response, such as a large HTTP file download. Typically
this is set to 0, indicating no timeout.

Contrast with http.request.timeout.`,
		options.TagSource,
	)
	OptHTTPSInsecureSkipVerify = options.NewBool(
		"https.insecure-skip-verify",
		nil,
		false,
		"Skip HTTPS TLS verification",
		"Skip HTTPS TLS verification. Useful when downloading against self-signed certs.",
		options.TagSource,
	)
	OptDownloadContinueOnError = downloader.OptContinueOnError
	OptDownloadCache           = downloader.OptCache
)
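
The difference between the two timeout options can be illustrated with the standard library alone. The sketch below is not how package files wires these options; newHTTPClient is a hypothetical helper, and mapping http.request.timeout onto http.Transport.ResponseHeaderTimeout is an assumption that only approximates "initial response" (it does not cover connection setup).

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newHTTPClient is a hypothetical helper (not part of package files) showing
// one way the three HTTP options could map onto net/http.
func newHTTPClient(requestTimeout, responseTimeout time.Duration, insecureSkipVerify bool) *http.Client {
	tr := &http.Transport{
		// Analogous to http.request.timeout: bounds the wait for the
		// response headers only. Reading the body is not affected.
		ResponseHeaderTimeout: requestTimeout,
		// Analogous to https.insecure-skip-verify. Only for self-signed
		// certs in trusted environments.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: insecureSkipVerify},
	}
	return &http.Client{
		Transport: tr,
		// Analogous to http.response.timeout: bounds the entire exchange,
		// including reading the body. Zero means no timeout.
		Timeout: responseTimeout,
	}
}

func main() {
	// Defaults as documented above: 10s for the initial response, no limit
	// on the overall transaction, TLS verification enabled.
	c := newHTTPClient(10*time.Second, 0, false)
	tr := c.Transport.(*http.Transport)
	fmt.Println(tr.ResponseHeaderTimeout, c.Timeout)
}
```

With these settings a slow server that never sends headers is abandoned after ten seconds, while a large download that starts promptly may run for as long as it needs.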
var OptCacheLockTimeout = options.NewDuration(
	"cache.lock.timeout",
	nil,
	time.Second*5,
	"Wait timeout to acquire cache lock",
	`Wait timeout to acquire cache lock. During this period, retry will occur
if the lock is already held by another process. If zero, no retry occurs.`,
)

OptCacheLockTimeout is the time allowed to acquire a cache lock.

See also: driver.OptIngestCache.
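
The retry semantics described for cache.lock.timeout can be sketched as follows. This is an illustration under stated assumptions, not sq's lock implementation: acquireLock and lockDemo are hypothetical names, and the lock is modelled as an exclusively created file.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// acquireLock is a hypothetical helper, not sq's implementation. It creates
// lockPath exclusively, retrying until timeout elapses if the lock is already
// held. With a zero timeout, exactly one attempt is made.
func acquireLock(lockPath string, timeout time.Duration) (func(), error) {
	deadline := time.Now().Add(timeout)
	for {
		f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
		if err == nil {
			_ = f.Close()
			return func() { _ = os.Remove(lockPath) }, nil
		}
		if !errors.Is(err, os.ErrExist) {
			return nil, err // an IO problem, not contention
		}
		if timeout <= 0 || !time.Now().Before(deadline) {
			return nil, fmt.Errorf("acquire lock %s: %w", lockPath, err)
		}
		time.Sleep(20 * time.Millisecond) // back off before retrying
	}
}

// lockDemo reports whether a first acquire succeeded, whether a second
// zero-timeout acquire failed while the lock was held, and whether a third
// acquire succeeded after the lock was released.
func lockDemo() (bool, bool, bool) {
	dir, err := os.MkdirTemp("", "lock-sketch")
	if err != nil {
		return false, false, false
	}
	defer os.RemoveAll(dir)
	lockPath := filepath.Join(dir, "cache.lock")

	unlock, err := acquireLock(lockPath, 0)
	if err != nil {
		return false, false, false
	}
	_, err = acquireLock(lockPath, 0)
	secondFailed := err != nil
	unlock()

	unlock, err = acquireLock(lockPath, 100*time.Millisecond)
	if err != nil {
		return true, secondFailed, false
	}
	unlock()
	return true, secondFailed, true
}

func main() {
	fmt.Println(lockDemo())
}
```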

Functions

func DefaultCacheDir

func DefaultCacheDir() (dir string)

DefaultCacheDir returns the sq cache dir. This is generally in USER_CACHE_DIR/*/sq, but could also be in TEMP_DIR/*/sq/cache or similar. It is not guaranteed that the returned dir exists or is accessible.

func DefaultTempDir

func DefaultTempDir() (dir string)

DefaultTempDir returns the default sq temp dir. It is not guaranteed that the returned dir exists or is accessible.

func DetectMagicNumber

func DetectMagicNumber(ctx context.Context, newRdrFn NewReaderFunc,
) (detected drivertype.Type, score float32, err error)

DetectMagicNumber is a TypeDetectFunc that detects the "magic number" from the start of files.
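
As an illustration of magic-number detection, the sketch below checks a stream for the 16-byte header that begins every SQLite 3 database file. It does not reproduce DetectMagicNumber itself. Type, TypeNone and TypeSQLite are local stand-ins for the drivertype package, and detectSQLiteMagic and detectBytes are hypothetical names.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
)

// Type is a local stand-in for drivertype.Type; the values are illustrative.
type Type string

const (
	TypeNone   Type = ""
	TypeSQLite Type = "sqlite3"
)

// NewReaderFunc mirrors the signature documented in this package.
type NewReaderFunc func(ctx context.Context) (io.ReadCloser, error)

// sqliteMagic is the 16-byte header at the start of a SQLite 3 database file.
var sqliteMagic = []byte("SQLite format 3\x00")

// detectSQLiteMagic has the shape of a TypeDetectFunc. It returns a score of 1
// when the header matches, and a score of 0 otherwise.
func detectSQLiteMagic(ctx context.Context, newRdrFn NewReaderFunc) (Type, float32, error) {
	r, err := newRdrFn(ctx)
	if err != nil {
		return TypeNone, 0, err
	}
	defer r.Close() // the detector closes any reader it opens

	head := make([]byte, len(sqliteMagic))
	_, err = io.ReadFull(r, head)
	switch {
	case errors.Is(err, io.EOF), errors.Is(err, io.ErrUnexpectedEOF):
		return TypeNone, 0, nil // too short to carry the header
	case err != nil:
		return TypeNone, 0, err // a genuine IO problem
	}
	if bytes.Equal(head, sqliteMagic) {
		return TypeSQLite, 1, nil
	}
	return TypeNone, 0, nil
}

// detectBytes is a demo convenience that runs the detector over in-memory data.
func detectBytes(data []byte) (Type, float32, error) {
	newRdrFn := func(context.Context) (io.ReadCloser, error) {
		return io.NopCloser(bytes.NewReader(data)), nil
	}
	return detectSQLiteMagic(context.Background(), newRdrFn)
}

func main() {
	typ, score, err := detectBytes([]byte("SQLite format 3\x00rest of file"))
	fmt.Println(typ, score, err)
}
```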

Types

type Files

type Files struct {
	// contains filtered or unexported fields
}

Files is the centralized API for interacting with files. It provides a uniform mechanism for reading files, whether from local disk, stdin, or remote HTTP.

func New

func New(ctx context.Context, optReg *options.Registry, cfgLock lockfile.LockFunc,
	tmpDir, cacheDir string,
) (*Files, error)

New returns a new Files instance. The caller must invoke Files.Close when done with the instance.

func (*Files) AddDriverDetectors

func (fs *Files) AddDriverDetectors(detectFns ...TypeDetectFunc)

AddDriverDetectors adds driver type detectors.

func (*Files) AddStdin

func (fs *Files) AddStdin(ctx context.Context, f *os.File) error

AddStdin copies f to fs's cache: the stdin data in f is later accessible via Files.NewReader(src) where src.Handle is source.StdinHandle; f's type can be detected via DetectStdinType.

func (*Files) CacheClearAll

func (fs *Files) CacheClearAll(ctx context.Context) error

CacheClearAll clears the entire cache dir. Note that this operation is distinct from Files.doCacheSweep.

func (*Files) CacheClearSource

func (fs *Files) CacheClearSource(ctx context.Context, src *source.Source, clearDownloads bool) error

CacheClearSource clears the ingest cache for src. If arg clearDownloads is true, the source's download dir is also cleared. The caller should typically first acquire the cache lock for src via Files.cacheLockFor.

func (*Files) CacheDir

func (fs *Files) CacheDir() string

CacheDir returns the cache dir. It is not guaranteed that the returned dir exists.

func (*Files) CacheDirFor

func (fs *Files) CacheDirFor(src *source.Source) (dir string, err error)

CacheDirFor returns the cache dir for src. It is not guaranteed that the returned dir exists or is accessible.

func (*Files) CacheLockAcquire

func (fs *Files) CacheLockAcquire(ctx context.Context, src *source.Source) (unlock func(), err error)

CacheLockAcquire acquires the cache lock for src. The caller must invoke the returned unlock func.

func (*Files) CachePaths

func (fs *Files) CachePaths(src *source.Source) (srcCacheDir, cacheDB, checksums string, err error)

CachePaths returns the paths to the cache files for src. There is no guarantee that these files exist, or are accessible. It's just the paths.

func (*Files) CachedBackingSourceFor

func (fs *Files) CachedBackingSourceFor(ctx context.Context, src *source.Source) (backingSrc *source.Source,
	ok bool, err error,
)

CachedBackingSourceFor returns the underlying backing source for src, if it exists. If it does not exist, ok returns false.

func (*Files) Close

func (fs *Files) Close() error

Close closes any open resources.

func (*Files) CreateTemp

func (fs *Files) CreateTemp(pattern string, clean bool) (*os.File, error)

CreateTemp creates a new temporary file in fs's temp dir with the given filename pattern, as per the os.CreateTemp docs. If arg clean is true, the file is added to the cleanup sequence invoked by fs.Close. It is the caller's responsibility to close the returned file.
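
The clean flag can be pictured with a small stand-in type. The sketch below is an assumption about the general pattern, not the Files implementation: tempFiles and tempDemo are hypothetical, and cleanup is modelled as a list of paths removed on Close.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// tempFiles is a hypothetical stand-in for the temp-file bookkeeping
// described above. It is not the Files type.
type tempFiles struct {
	dir     string
	cleanup []string // paths to remove when Close is invoked
}

// CreateTemp creates a temp file in t.dir. If clean is true, the file is
// removed by Close. The caller still closes the returned file.
func (t *tempFiles) CreateTemp(pattern string, clean bool) (*os.File, error) {
	f, err := os.CreateTemp(t.dir, pattern)
	if err != nil {
		return nil, err
	}
	if clean {
		t.cleanup = append(t.cleanup, f.Name())
	}
	return f, nil
}

// Close removes every file registered for cleanup.
func (t *tempFiles) Close() error {
	var errs []error
	for _, p := range t.cleanup {
		if err := os.Remove(p); err != nil && !errors.Is(err, os.ErrNotExist) {
			errs = append(errs, err)
		}
	}
	t.cleanup = nil
	return errors.Join(errs...)
}

// tempDemo reports whether the temp file existed before Close and whether it
// still exists afterwards.
func tempDemo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "temp-sketch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	tf := &tempFiles{dir: dir}
	f, err := tf.CreateTemp("sketch-*.tmp", true)
	if err != nil {
		return false, false, err
	}
	name := f.Name()
	if err = f.Close(); err != nil { // closing the file is the caller's job
		return false, false, err
	}
	_, statErr := os.Stat(name)
	before := statErr == nil

	if err = tf.Close(); err != nil {
		return before, false, err
	}
	_, statErr = os.Stat(name)
	return before, statErr == nil, nil
}

func main() {
	fmt.Println(tempDemo())
}
```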

func (*Files) DetectStdinType

func (fs *Files) DetectStdinType(ctx context.Context) (drivertype.Type, error)

DetectStdinType detects the type of stdin as previously added by AddStdin. An error is returned if AddStdin was not first invoked. If the type cannot be detected, TypeNone and nil are returned.

func (*Files) DetectType

func (fs *Files) DetectType(ctx context.Context, handle, loc string) (drivertype.Type, error)

DetectType returns the driver type of loc. This may result in loading files into the cache.

func (*Files) Filesize

func (fs *Files) Filesize(ctx context.Context, src *source.Source) (size int64, err error)

Filesize returns the file size of src.Location. If the source is being ingested asynchronously, this function may block until loading completes. An error is returned if src is not a document/file source.

func (*Files) NewBuffer added in v0.48.0

func (fs *Files) NewBuffer() ioz.Buffer

NewBuffer returns a new ioz.Buffer instance which may be in-memory or on-disk, or both, for use as a temporary buffer for potentially large data that may not fit in memory. The caller MUST invoke ioz.Buffer.Close on the returned buffer when done.

func (*Files) NewReader

func (fs *Files) NewReader(ctx context.Context, src *source.Source, ingesting bool) (io.ReadCloser, error)

NewReader returns a new io.ReadCloser for src.Location. Arg ingesting is a performance hint that indicates that the reader is being used to ingest data (as opposed to, say, sampling the data for type detection). It's an error to invoke NewReader for a src after having invoked it for the same src with ingesting=true.

If src.Handle is StdinHandle, AddStdin must first have been invoked.

The caller must close the reader.

func (*Files) Ping

func (fs *Files) Ping(ctx context.Context, src *source.Source) error

Ping implements a ping mechanism for document sources (local or remote files).

func (*Files) TempDir

func (fs *Files) TempDir() string

TempDir returns the temp dir. It is not guaranteed that the returned dir exists.

func (*Files) WriteIngestChecksum

func (fs *Files) WriteIngestChecksum(ctx context.Context, src, backingSrc *source.Source) (err error)

WriteIngestChecksum is invoked (after successful ingestion) to write the checksum of the source document file vs the ingest DB. Thus, if the source document changes, the checksum will no longer match, and the ingest DB will be considered invalid.
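
The invalidation idea can be sketched with a content hash. How package files actually computes and stores its checksums is not documented here, so the SHA-256 content hash, the sidecar checksum file, and the names fileChecksum, writeChecksum, cacheValid and checksumDemo are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// fileChecksum returns the hex-encoded SHA-256 of the file at path.
func fileChecksum(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err = io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// writeChecksum records the current checksum of srcPath at sumPath. It would
// be invoked after a successful ingest.
func writeChecksum(srcPath, sumPath string) error {
	sum, err := fileChecksum(srcPath)
	if err != nil {
		return err
	}
	return os.WriteFile(sumPath, []byte(sum), 0o600)
}

// cacheValid reports whether the recorded checksum still matches srcPath.
func cacheValid(srcPath, sumPath string) (bool, error) {
	want, err := os.ReadFile(sumPath)
	if err != nil {
		return false, err
	}
	got, err := fileChecksum(srcPath)
	if err != nil {
		return false, err
	}
	return got == string(want), nil
}

// checksumDemo reports cache validity before and after the source changes.
func checksumDemo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "checksum-sketch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "data.csv")
	sum := filepath.Join(dir, "checksums.txt")

	if err = os.WriteFile(src, []byte("a,b\n1,2\n"), 0o600); err != nil {
		return false, false, err
	}
	if err = writeChecksum(src, sum); err != nil {
		return false, false, err
	}
	before, err := cacheValid(src, sum)
	if err != nil {
		return false, false, err
	}

	// Changing the source document invalidates the recorded checksum.
	if err = os.WriteFile(src, []byte("a,b\n1,3\n"), 0o600); err != nil {
		return before, false, err
	}
	after, err := cacheValid(src, sum)
	return before, after, err
}

func main() {
	fmt.Println(checksumDemo())
}
```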

type NewReaderFunc

type NewReaderFunc func(ctx context.Context) (io.ReadCloser, error)

NewReaderFunc is a func that returns an io.ReadCloser. The caller is responsible for closing the returned io.ReadCloser.

type TypeDetectFunc

type TypeDetectFunc func(ctx context.Context, newRdrFn NewReaderFunc) (
	detected drivertype.Type, score float32, err error)

TypeDetectFunc interrogates a byte stream to determine the source driver type. A score is returned indicating the confidence that the driver type has been detected. A score <= 0 is failure, a score >= 1 is success; intermediate values indicate some level of confidence. An error is returned only if an IO problem occurred. The implementation gets access to the byte stream by invoking newRdrFn, and is responsible for closing any reader it opens.
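
One way to combine several detectors under the scoring rules above is to stop at the first score >= 1 and otherwise keep the highest positive score. This page does not state how Files selects among detectors, so the selection policy below is an assumption. Type and TypeNone are local stand-ins for the drivertype package, and detectBest, fixedScore and pick are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// Type is a local stand-in for drivertype.Type.
type Type string

const TypeNone Type = ""

// NewReaderFunc and TypeDetectFunc mirror the signatures documented above.
type NewReaderFunc func(ctx context.Context) (io.ReadCloser, error)

type TypeDetectFunc func(ctx context.Context, newRdrFn NewReaderFunc) (Type, float32, error)

// detectBest runs each detector in order. A score >= 1 wins immediately;
// otherwise the highest score above 0 wins. If no detector scores above 0,
// TypeNone is returned.
func detectBest(ctx context.Context, newRdrFn NewReaderFunc, detectFns ...TypeDetectFunc) (Type, float32, error) {
	best, bestScore := TypeNone, float32(0)
	for _, fn := range detectFns {
		typ, score, err := fn(ctx, newRdrFn)
		if err != nil {
			return TypeNone, 0, err // detectors return errors only for IO problems
		}
		if score >= 1 {
			return typ, score, nil
		}
		if score > bestScore {
			best, bestScore = typ, score
		}
	}
	return best, bestScore, nil
}

// fixedScore returns a stub detector that always reports typ with score.
func fixedScore(typ Type, score float32) TypeDetectFunc {
	return func(context.Context, NewReaderFunc) (Type, float32, error) {
		return typ, score, nil
	}
}

// pick is a demo convenience that runs detectBest over an empty stream.
func pick(detectFns ...TypeDetectFunc) (Type, float32, error) {
	newRdrFn := func(context.Context) (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader("")), nil
	}
	return detectBest(context.Background(), newRdrFn, detectFns...)
}

func main() {
	typ, score, err := pick(fixedScore("csv", 0.4), fixedScore("json", 0.9), fixedScore("tsv", 0))
	fmt.Println(typ, score, err)
}
```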

Directories

Path: internal/downloader
Synopsis: Package downloader provides a mechanism for getting files from HTTP/S URLs, making use of a mostly RFC-compliant cache.
