oxfordflowers102

package

v0.9.1 Published: Apr 20, 2024 License: Apache-2.0 Imports: 23 Imported by: 0

Repository

github.com/gomlx/gomlx

README ¶

Oxford Flowers 102 Dataset

https://www.robots.ox.ac.uk/~vgg/data/flowers/102/

A 102-category dataset of flowers. The flowers chosen are ones commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories.

The dataset is divided into a training set, a validation set and a test set. The training set and validation set each consist of 10 images per class (totalling 1020 images each). The test set consists of the remaining 6149 images (minimum 20 per class). The total download is ~330 MB.

More information is available on the TensorFlow Datasets page:

https://www.tensorflow.org/datasets/catalog/oxford_flowers102

This package provides a train.Dataset with the images.

Under it you will also find a demo that trains a diffusion model, following the Keras example at:

https://keras.io/examples/generative/ddim/

Documentation ¶

Overview ¶

Package oxfordflowers102 provides tools to download and cache the dataset and a `train.Dataset` implementation that can be used to train models using GoMLX (http://github.com/gomlx/gomlx/).

See the README.md file for details. The dataset's home page is at https://www.robots.ox.ac.uk/~vgg/data/flowers/102/

Usage example:

Index ¶

Variables
func DownloadAndParse(baseDir string) error
func InMemoryDataset(manager *Manager, baseDir string, imageSize int, name string, ...) (inMemoryDataset *data.InMemoryDataset, err error)
func ParseImages(dirPath string) error
func ParseLabels(filePath string) error
func ReadExample(idx int) (img image.Image, label int32, err error)
type Dataset
- func NewDataset(dtype shapes.DType, imageSize int) *Dataset

Constants ¶

This section is empty.

Variables ¶

var (
	DownloadBaseURL           = "https://www.robots.ox.ac.uk/~vgg/data/flowers/102/"
	DownloadSubdir            = "downloads"
	DownloadFilesAndChecksums = []struct {
		File, Checksum, UntarDir string
	}{

		{"102flowers.tgz", "", "jpg"},
		{"imagelabels.mat", "4903e94206bac23bf772aadf06451916df56b58fc483a62db32a97b82656651d", ""},
		{"setid.mat", "46b8678f91fd95d3c8f4feab80d271a6c834a1dd896fe29fd3e6ad9ce5c8dccd", ""},
	}
)

var (
	// AllLabels of the dataset. Converted to 0-based (0 to 101).
	// Only available after DownloadAndParse is successfully called.
	AllLabels []int32

	// AllImages of the dataset, the path to the images that is.
	// Only available after DownloadAndParse is successfully called.
	AllImages []string

	// NumExamples is the number of examples (images and labels) in the dataset.
	// Only available after DownloadAndParse is successfully called.
	NumExamples int

	// ImagesDir where images are stored. Only available after DownloadAndParse is
	// successfully called.
	ImagesDir string

	// NumLabels is 102, hence the name.
	NumLabels = 102

	// Names of all the 102 flowers in the dataset.
	Names = []string{}/* 102 elements not displayed */

)

Functions ¶

func DownloadAndParse ¶

func DownloadAndParse(baseDir string) error

DownloadAndParse downloads the "Oxford Flowers 102" dataset files to baseDir and untars them. If the files are already downloaded, the previous copies are used.

After download, the contents of the files are parsed, and the global AllLabels is set.

func InMemoryDataset ¶

func InMemoryDataset(manager *Manager, baseDir string, imageSize int, name string,
	partitionSeed int64, partitionFrom, partitionTo float64) (
	inMemoryDataset *data.InMemoryDataset, err error)

InMemoryDataset creates a `data.InMemoryDataset` with the Oxford Flowers 102 images, using the given `imageSize` for both height and width: each image is resized and then cropped at the center.

A cached version is automatically saved in `baseDir`, prefixed with `name`, if `name` is not empty. If a cache file is found, it is used instead of re-reading and processing all the images.

It takes a partition of the data, defined by `partitionFrom` and `partitionTo`. They take values from 0.0 to 1.0 and represent the fraction of the dataset to take. They enable selection of arbitrary train/validation/test sizes. The `partitionSeed` can be used to generate different assignments -- the same seed should be used for the different partitions of the dataset.

If the cache is not found, it automatically calls DownloadAndParse to download and untar the original images, if they are not yet downloaded.

func ParseImages ¶

func ParseImages(dirPath string) error

func ParseLabels ¶

func ParseLabels(filePath string) error

func ReadExample ¶

func ReadExample(idx int) (img image.Image, label int32, err error)

ReadExample reads the image and label for the example idx. The example idx must satisfy 0 <= idx < NumExamples.

Types ¶

type Dataset ¶

type Dataset struct {
	// contains filtered or unexported fields
}

Dataset implements train.Dataset, and yields one image at a time. It pre-transforms the image to the target `imageSize`.

func NewDataset ¶

func NewDataset(dtype shapes.DType, imageSize int) *Dataset

NewDataset returns a Dataset for one epoch that yields one image at a time. It reads them from disk, and the parsing can be parallelized. See `data.NewParallelDataset`.

The images are resized and cropped to `imageSize x imageSize` pixels, cut from the middle.

It doesn't support batching, but you can use GoMLX's `data.Batch` for that.

func (*Dataset) Name ¶

func (ds *Dataset) Name() string

Name implements train.Dataset interface.

func (*Dataset) Partition ¶ added in v0.4.0

func (ds *Dataset) Partition(seed int64, from, to float64) *Dataset

Partition allows one to partition the dataset into different parts -- typically "train", "validation" and "test". This should be called before the start of an epoch.

It takes a seed number based on which the partitions will be selected, and the range of elements specified as `from` and `to`: these are float values that represent the slice (from 0.0 to 1.0) of the examples that go into this dataset.

Example:

seed := int64(42)
dsTrain := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0, 0.8)   // 80%
dsValid := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0.8, 0.9) // 10%
dsTest := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0.9, 1.0)  // 10%
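One way to see why the same seed must be used for every partition is the following sketch. It assumes, purely for illustration, that a partition is a slice of a seeded permutation of the example indices. `partitionIndices` is a hypothetical helper and may differ from how `Partition` selects examples internally. The example count 8189 is the sum of the split sizes quoted in the README (1020 + 1020 + 6149).

```go
package main

import (
	"fmt"
	"math/rand"
)

// partitionIndices (hypothetical helper) returns the example indices that
// fall in the [from, to) fraction of a seeded permutation of 0..n-1.
func partitionIndices(n int, seed int64, from, to float64) []int {
	perm := rand.New(rand.NewSource(seed)).Perm(n)
	start := int(from * float64(n))
	end := int(to * float64(n))
	return perm[start:end]
}

func main() {
	const n = 8189
	seed := int64(42)

	train := partitionIndices(n, seed, 0, 0.8)
	valid := partitionIndices(n, seed, 0.8, 0.9)
	test := partitionIndices(n, seed, 0.9, 1.0)

	// With a shared seed the three slices come from the same permutation,
	// so they are disjoint and together cover every example.
	fmt.Println(len(train), len(valid), len(test), len(train)+len(valid)+len(test) == n)
}
```

With different seeds, each call would slice a different permutation, so an example could land in both the training and the test partition.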

func (*Dataset) Reset ¶

func (ds *Dataset) Reset()

Reset implements train.Dataset interface.

func (*Dataset) Shuffle ¶

func (ds *Dataset) Shuffle() *Dataset

Shuffle will shuffle the order of the images. This should be called before the start of an epoch.

Once shuffled, every time the dataset is reset, it is reshuffled.
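The reshuffle-on-reset behavior can be pictured with a small index iterator. This is a sketch under assumptions: `indexIterator` is a hypothetical stand-in for the dataset's internal ordering, not part of the package.

```go
package main

import (
	"fmt"
	"math/rand"
)

// indexIterator (hypothetical type) walks over example indices once per epoch.
type indexIterator struct {
	order   []int
	next    int
	shuffle bool
	rng     *rand.Rand
}

func newIndexIterator(n int, seed int64) *indexIterator {
	order := make([]int, n)
	for i := range order {
		order[i] = i
	}
	return &indexIterator{order: order, rng: rand.New(rand.NewSource(seed))}
}

// Shuffle enables shuffling and shuffles the current order right away.
func (it *indexIterator) Shuffle() *indexIterator {
	it.shuffle = true
	it.reshuffle()
	return it
}

func (it *indexIterator) reshuffle() {
	it.rng.Shuffle(len(it.order), func(i, j int) {
		it.order[i], it.order[j] = it.order[j], it.order[i]
	})
}

// Reset starts a new epoch; once shuffling is enabled, it reshuffles.
func (it *indexIterator) Reset() {
	it.next = 0
	if it.shuffle {
		it.reshuffle()
	}
}

// Next returns the next index, or false once the epoch is exhausted.
func (it *indexIterator) Next() (int, bool) {
	if it.next >= len(it.order) {
		return 0, false
	}
	idx := it.order[it.next]
	it.next++
	return idx, true
}

func main() {
	it := newIndexIterator(5, 1).Shuffle()
	for epoch := 0; epoch < 2; epoch++ {
		var seen []int
		for idx, ok := it.Next(); ok; idx, ok = it.Next() {
			seen = append(seen, idx)
		}
		fmt.Println("epoch", epoch, "order:", seen)
		it.Reset()
	}
}
```

Every epoch still visits each index exactly once; only the order changes between resets.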

func (*Dataset) Yield ¶

func (ds *Dataset) Yield() (spec any, inputs []tensor.Tensor, labels []tensor.Tensor, err error)

Yield implements train.Dataset interface. It returns `ds` (the Dataset pointer) as spec.

It yields one example at a time. Each consists of:

`inputs`: three values: the image itself, a scalar `int32` with the index of the example, and the type of flower (from 0 to `NumLabels-1`, i.e. 101). The index of the example can be used, for instance, to split the dataset (into training/validation/test).
`labels`: the type of flower (same as `inputs[2]`), an `int32` value from 0 to `NumLabels-1`.

Directories ¶

Path	Synopsis
diffusion	Package diffusion contains an example diffusion model, trained on the Oxford Flowers 102 dataset.
train