Documentation ¶
Overview ¶
Package oxfordflowers102 provides tools to download and cache the dataset and a `train.Dataset` implementation that can be used to train models using GoMLX (http://github.com/gomlx/gomlx/).
See the README.md file for details. The dataset's home page is at https://www.robots.ox.ac.uk/~vgg/data/flowers/102/
Usage example: see the example under `(*Dataset) Partition`.
Index ¶
- Variables
- func DownloadAndParse(baseDir string) error
- func InMemoryDataset(manager *Manager, baseDir string, imageSize int, name string, ...) (inMemoryDataset *data.InMemoryDataset, err error)
- func ParseImages(dirPath string) error
- func ParseLabels(filePath string) error
- func ReadExample(idx int) (img image.Image, label int32, err error)
- type Dataset
Variables ¶
var (
	DownloadBaseURL = "https://www.robots.ox.ac.uk/~vgg/data/flowers/102/"
	DownloadSubdir  = "downloads"

	DownloadFilesAndChecksums = []struct {
		File, Checksum, UntarDir string
	}{
		{"102flowers.tgz", "", "jpg"},
		{"imagelabels.mat", "4903e94206bac23bf772aadf06451916df56b58fc483a62db32a97b82656651d", ""},
		{"setid.mat", "46b8678f91fd95d3c8f4feab80d271a6c834a1dd896fe29fd3e6ad9ce5c8dccd", ""},
	}
)
var (
	// AllLabels of the dataset. Converted to 0-based (0 to 101).
	// Only available after DownloadAndParse is successfully called.
	AllLabels []int32

	// AllImages holds the paths to the images of the dataset.
	// Only available after DownloadAndParse is successfully called.
	AllImages []string

	// NumExamples is the number of examples (images and labels) in the dataset.
	// Only available after DownloadAndParse is successfully called.
	NumExamples int

	// ImagesDir is where the images are stored.
	// Only available after DownloadAndParse is successfully called.
	ImagesDir string

	// NumLabels is 102, hence the name.
	NumLabels = 102

	// Names of all the 102 flowers in the dataset.
	Names = []string{} /* 102 elements not displayed */
)
Functions ¶
func DownloadAndParse ¶
DownloadAndParse downloads the "Oxford Flowers 102" dataset files to baseDir and untars the image archive. If the files were already downloaded, the existing copies are reused.
After the download, the contents of the files are parsed and the globals AllLabels, AllImages, NumExamples and ImagesDir are set.
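The "reuse the previous copy" check described above can be sketched with the standard library alone. This is a minimal sketch, not the package's implementation: it assumes the 64-character hex strings in DownloadFilesAndChecksums are SHA-256 digests and that an empty checksum means "accept any existing copy". The helper names `hexDigest`, `fileDigest` and `needsDownload` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// hexDigest returns the hex-encoded SHA-256 digest of data.
func hexDigest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// fileDigest streams the file at path through SHA-256.
func fileDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// needsDownload (hypothetical helper) reports whether path must be fetched:
// either it is missing, or want is non-empty and the file's digest differs.
// Assumption: an empty want means no checksum is listed for the file.
func needsDownload(path, want string) (bool, error) {
	if _, err := os.Stat(path); errors.Is(err, fs.ErrNotExist) {
		return true, nil
	} else if err != nil {
		return false, err
	}
	if want == "" {
		return false, nil // nothing to verify: trust the existing copy
	}
	got, err := fileDigest(path)
	if err != nil {
		return false, err
	}
	return got != want, nil
}

func main() {
	dir, err := os.MkdirTemp("", "flowers-cache")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "setid.mat")
	content := []byte("stand-in content, not the real file")

	need, _ := needsDownload(path, hexDigest(content))
	fmt.Println("needs download before writing:", need)

	if err := os.WriteFile(path, content, 0o644); err != nil {
		panic(err)
	}
	need, _ = needsDownload(path, hexDigest(content))
	fmt.Println("needs download after writing:", need)
}
```

Streaming the file through the hash keeps memory use constant, which matters for the image archive far more than for the small `.mat` files.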
func InMemoryDataset ¶
func InMemoryDataset(manager *Manager, baseDir string, imageSize int, name string, partitionSeed int64, partitionFrom, partitionTo float64) ( inMemoryDataset *data.InMemoryDataset, err error)
InMemoryDataset creates a `data.InMemoryDataset` with the Oxford Flowers 102 images, each scaled to `imageSize` x `imageSize` pixels: the image is resized and then cropped at the center.
A cached version is automatically saved in `baseDir`, prefixed with `name`, if `name` is not empty. If a cache file is found, it is used instead of re-reading and processing all the images.
It takes a partition of the data, defined by `partitionFrom` and `partitionTo`. Both take values from 0.0 to 1.0 and mark the start and end of the fraction of the dataset to take, which allows arbitrary train/validation/test sizes. The `partitionSeed` can be changed to generate different assignments of examples to partitions, but the same seed should be used for all the partitions of one dataset, so that they are drawn from the same assignment.
If the cache is not found, it automatically calls DownloadAndParse to download and untar the original images, if they are not yet downloaded.
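One way to picture the partitioning by `partitionSeed`, `partitionFrom` and `partitionTo` is a seeded permutation of the example indices, from which each partition takes a contiguous slice. The sketch below is an illustration under that assumption, not the package's actual algorithm; `partitionIndices` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math/rand"
)

// partitionIndices (hypothetical) shuffles the indices 0..n-1 with a
// generator seeded by seed and returns the ones that fall in the
// fraction [from, to) of the shuffled order.
func partitionIndices(n int, seed int64, from, to float64) []int {
	rng := rand.New(rand.NewSource(seed))
	perm := rng.Perm(n) // same seed => same permutation on every call
	start := int(from * float64(n))
	end := int(to * float64(n))
	return perm[start:end]
}

func main() {
	const n = 100
	seed := int64(42)
	train := partitionIndices(n, seed, 0, 0.8)
	valid := partitionIndices(n, seed, 0.8, 0.9)
	test := partitionIndices(n, seed, 0.9, 1.0)
	fmt.Println(len(train), len(valid), len(test))
}
```

Because every call with the same seed rebuilds the same permutation, and the `to` of one partition is the `from` of the next, the slices cannot overlap. With different seeds per partition that guarantee is lost, which is why one seed should be shared.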
func ParseImages ¶
func ParseLabels ¶
Types ¶
type Dataset ¶
type Dataset struct {
// contains filtered or unexported fields
}
Dataset implements train.Dataset, and yields one image at a time. It pre-transforms the image to the target `imageSize`.
func NewDataset ¶
NewDataset returns a Dataset for one epoch that yields one image at a time. It reads the images from disk, and the parsing can be parallelized. See `data.NewParallelDataset`.
The images are resized and cropped to `imageSize x imageSize` pixels, with the crop taken from the center.
It doesn't support batching, but you can use GoMLX's `data.Batch` for that.
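The resize-and-center-crop step can be illustrated with the standard `image` package. This is a sketch under assumptions the documentation does not confirm: that the shorter side is scaled to `imageSize` and that the longer side is trimmed equally at both ends. It uses nearest-neighbor sampling for brevity and samples the central square of the source directly, which gives the same result as resizing first and cropping afterwards, up to rounding. `resizeAndCenterCrop` and `makeStripes` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// resizeAndCenterCrop (hypothetical) returns a size x size image built from
// the central square of src, using nearest-neighbor sampling.
func resizeAndCenterCrop(src image.Image, size int) *image.RGBA {
	b := src.Bounds()
	w, h := b.Dx(), b.Dy()
	short := w
	if h < w {
		short = h
	}
	// Top-left corner of the central short x short square in src.
	offX := b.Min.X + (w-short)/2
	offY := b.Min.Y + (h-short)/2

	dst := image.NewRGBA(image.Rect(0, 0, size, size))
	for y := 0; y < size; y++ {
		sy := offY + y*short/size
		for x := 0; x < size; x++ {
			sx := offX + x*short/size
			dst.Set(x, y, src.At(sx, sy))
		}
	}
	return dst
}

// makeStripes builds a test image whose red channel encodes the column:
// the pixel at column x has R = 10*x.
func makeStripes(w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: uint8(10 * x), A: 255})
		}
	}
	return img
}

func main() {
	src := makeStripes(6, 2) // wide image: the crop keeps columns 2 and 3
	out := resizeAndCenterCrop(src, 2)
	fmt.Println(out.Bounds().Dx(), out.Bounds().Dy())
	fmt.Println(out.RGBAAt(0, 0).R, out.RGBAAt(1, 0).R)
}
```

For real photographs a smoother interpolation (bilinear or similar) would normally be preferred over nearest-neighbor; the geometry of the crop stays the same.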
func (*Dataset) Partition ¶ added in v0.4.0
Partition allows one to partition the dataset into different parts -- typically "train", "validation" and "test". This should be called before the start of an epoch.
It takes a seed number based on which the partitions will be selected, and the range of elements specified as `from` and `to`: these are float values that represent the slice (from 0.0 to 1.0) of the examples that go into this dataset.
Example:
	seed := int64(42)
	dsTrain := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0, 0.8)   // 80%
	dsValid := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0.8, 0.9) // 10%
	dsTest := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0.9, 1.0)  // 10%
func (*Dataset) Shuffle ¶
Shuffle randomizes the order of the images. This should be called before the start of an epoch.
Once shuffled, every time the dataset is reset, it is reshuffled.
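The "reshuffle on every reset" behavior can be sketched with a toy dataset that yields example indices instead of images. Everything here is hypothetical: `indexDataset`, its `Reset` and `Next` methods, and the explicit seed passed to `Shuffle` are illustration only and are not the package's API.

```go
package main

import (
	"fmt"
	"math/rand"
)

// indexDataset is a hypothetical stand-in for a dataset: it yields example
// indices rather than images.
type indexDataset struct {
	order []int
	next  int
	rng   *rand.Rand // nil until Shuffle is called
}

func newIndexDataset(n int) *indexDataset {
	order := make([]int, n)
	for i := range order {
		order[i] = i
	}
	return &indexDataset{order: order}
}

// Shuffle turns shuffling on and shuffles the order for the upcoming epoch.
func (ds *indexDataset) Shuffle(seed int64) *indexDataset {
	ds.rng = rand.New(rand.NewSource(seed))
	ds.reshuffle()
	return ds
}

func (ds *indexDataset) reshuffle() {
	ds.rng.Shuffle(len(ds.order), func(i, j int) {
		ds.order[i], ds.order[j] = ds.order[j], ds.order[i]
	})
}

// Reset starts a new epoch, reshuffling if Shuffle was called earlier.
func (ds *indexDataset) Reset() {
	ds.next = 0
	if ds.rng != nil {
		ds.reshuffle()
	}
}

// Next returns the next index, or false once the epoch is exhausted.
func (ds *indexDataset) Next() (int, bool) {
	if ds.next >= len(ds.order) {
		return 0, false
	}
	idx := ds.order[ds.next]
	ds.next++
	return idx, true
}

func main() {
	ds := newIndexDataset(5).Shuffle(42)
	for epoch := 0; epoch < 2; epoch++ {
		var seen []int
		for idx, ok := ds.Next(); ok; idx, ok = ds.Next() {
			seen = append(seen, idx)
		}
		fmt.Println("epoch", epoch, "order:", seen)
		ds.Reset()
	}
}
```

Keeping the random generator on the dataset means each epoch visits every example exactly once, while the visiting order keeps changing between epochs.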
func (*Dataset) Yield ¶
Yield implements train.Dataset interface. It returns `ds` (the Dataset pointer) as spec.
It yields one example at a time. Each example consists of:
- `inputs`: three values: the image itself, a scalar `int32` with the index of the example, and the type of flower (from 0 to `NumLabels-1` = 101). The index of the example can be used, for instance, to split the dataset (into training/validation/test).
- `labels`: the type of flower (same as `inputs[2]`), an `int32` value from 0 to `NumLabels-1`.