inceptionv3

package
v0.9.1
Published: Apr 20, 2024 License: Apache-2.0 Imports: 16 Imported by: 0

README

Inception V3 Model

This library creates the model architecture and optionally loads the pre-trained weights from Google.

Reference: Rethinking the Inception Architecture for Computer Vision (CVPR 2016), http://arxiv.org/abs/1512.00567

Based on the Keras implementation:

- Source: https://github.com/keras-team/keras/blob/v2.12.0/keras/applications/inception_v3.py
- Documentation: https://keras.io/api/applications/inceptionv3/
Documentation

Overview

Package inceptionv3 provides a pre-trained InceptionV3 model, or simply its structure.

This library creates the model architecture and optionally loads the pre-trained weights from Google. It can be used with or without the top-layer.

Reference: Rethinking the Inception Architecture for Computer Vision (CVPR 2016), http://arxiv.org/abs/1512.00567

Based on the Keras implementation:

- Source: https://github.com/keras-team/keras/blob/v2.12.0/keras/applications/inception_v3.py
- Documentation: https://keras.io/api/applications/inceptionv3/

To use it, start with BuildGraph. If using the pre-trained weights, call DownloadAndUnpackWeights once -- it is a no-op if the weights have already been downloaded and unpacked.

If using it for transfer learning, be mindful that it uses batch normalization, which has its own considerations; see the discussion in https://pub.towardsai.net/batchnorm-for-transfer-learning-df17d2897db6 .

Transfer learning example using this model:

var (
	flagDataDir = flag.String("data", "~/work/my_model", "Directory where to save and load model data.")
	flagInceptionPreTrained = flag.Bool("pretrained", true, "If using inception model, whether to use the pre-trained weights to transfer learn")
	flagInceptionFineTuning = flag.Bool("finetuning", true, "If using inception model, whether to fine-tune the inception model")
)

func ModelGraph(ctx *context.Context, spec any, inputs []*Node) []*Node {
	_ = spec // Not needed.
	image := inputs[0]
	channelsConfig := timage.ChannelsLast
	// PreprocessImage also rescales the values to [-1, 1]; here we assume
	// the input pixel values range from 0 to 255.
	image = inceptionv3.PreprocessImage(image, 255.0, channelsConfig)

	var preTrainedPath string
	if *flagInceptionPreTrained {
		preTrainedPath = *flagDataDir
	}
	logits := inceptionv3.BuildGraph(ctx, image).PreTrained(preTrainedPath).
		SetPooling(inceptionv3.MaxPooling).Trainable(*flagInceptionFineTuning).Done()
	if !*flagInceptionFineTuning {
		logits = StopGradient(logits) // We don't want to train the inception model.
	}
	logits = FnnOnTop(ctx, logits)
	return []*Node{logits}
}

func main() {
	…
	if *flagInceptionPreTrained {
		err := inceptionv3.DownloadAndUnpackWeights(*flagDataDir)
		AssertNoError(err)
	}
	…
}

Index

Constants

View Source
const (
	// WeightsURL is the URL for the whole model, including the top layer, a 1000-classes linear layer on top.
	WeightsURL = "https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5"

	// WeightsH5Checksum is the SHA256 checksum of the weights file.
	WeightsH5Checksum = "00c9ea4e4762f716ac4d300d6d9c2935639cc5e4d139b5790d765dcbeea539d0"

	// WeightsH5Name is the name of the local ".h5" file with the weights.
	WeightsH5Name = "weights.h5"

	// UnpackedWeightsName is the name of the subdirectory that will hold the unpacked weights.
	UnpackedWeightsName = "gomlx_weights"
)
View Source
const BuildScope = "InceptionV3"

BuildScope is used by BuildGraph as a new sub-scope for the InceptionV3 layers.

View Source
const ClassificationImageSize = 299

ClassificationImageSize is the image size required when using the InceptionV3 model for classification: images must be 299 x 299.

View Source
const EmbeddingSize = 2048

EmbeddingSize is the size of the embedding output (if not using the top layer).

View Source
const MinimumImageSize = 75

MinimumImageSize is the minimum width and height required for input images.

View Source
const NumberOfClasses = 1000

NumberOfClasses when using the top layer.

Variables

This section is empty.

Functions

func DownloadAndUnpackWeights

func DownloadAndUnpackWeights(baseDir string) (err error)

DownloadAndUnpackWeights to the given baseDir. It only does the work if the files are not there yet (downloaded and unpacked).

It is verbose and shows a progress bar when downloading/unpacking. It is quiet if there is nothing to do, that is, if the files are already there.

func KidMetric added in v0.4.0

func KidMetric(dataDir string, kidImageSize int, maxImageValue float64, channelsConfig timage.ChannelsAxisConfig) metrics.Interface

KidMetric returns a metric that takes a generated image and a label image and returns a measure of similarity.

[Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) was proposed as a replacement for the popular [Fréchet Inception Distance (FID)](https://arxiv.org/abs/1706.08500) metric for measuring image generation quality. Both metrics measure the difference between the generated and training distributions in the representation space of an InceptionV3 network pretrained on ImageNet.

The implementation is based on the Keras one, described in https://keras.io/examples/generative/ddim/

To directly calculate KID, as opposed to using it as a metric, see NewKidBuilder below.

Parameters:

  • `dataDir`: directory where to download and unpack the InceptionV3 weights. They are reused from there in subsequent calls.
  • `kidImageSize`: resize input images (labels and predictions) to `kidImageSize x kidImageSize` before running the KID metric calculation. It should be between 75 and 299. Smaller values make the metric faster.
  • `maxImageValue`: maximum value the images can take in any channel -- if set to 0, the pixel values are not rescaled and the images are expected to have values between -1.0 and 1.0. Passed to the `PreprocessImage` function.
  • `channelsConfig`: informs what is the channels axis, commonly set to `timage.ChannelsLast`. Passed to `PreprocessImage` function.

Note: `timage` refers to package `github.com/gomlx/gomlx/types/tensor/image`.
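To make the metric concrete: KID is a maximum mean discrepancy (MMD) computed with the polynomial kernel k(x, y) = (x·y/d + 1)^3 from the Keras DDIM example, applied to InceptionV3 feature vectors. The sketch below computes that estimator in plain Go over tiny hand-made feature vectors; it mirrors the formula only and is not this package's implementation:

```go
package main

import "fmt"

// polyKernel is the polynomial kernel (x·y/d + 1)^3 used by the Keras DDIM
// example, where d is the feature dimension.
func polyKernel(x, y []float64) float64 {
	d := float64(len(x))
	dot := 0.0
	for i := range x {
		dot += x[i] * y[i]
	}
	v := dot/d + 1.0
	return v * v * v
}

// kidEstimate computes the KID estimator between two batches of feature
// vectors: the mean off-diagonal kernel within each batch, minus twice the
// mean cross-kernel. In the real metric the vectors are InceptionV3 embeddings.
func kidEstimate(labels, preds [][]float64) float64 {
	n, m := float64(len(labels)), float64(len(preds))
	var kxx, kyy, kxy float64
	for i := range labels {
		for j := range labels {
			if i != j {
				kxx += polyKernel(labels[i], labels[j])
			}
		}
	}
	for i := range preds {
		for j := range preds {
			if i != j {
				kyy += polyKernel(preds[i], preds[j])
			}
		}
	}
	for i := range labels {
		for j := range preds {
			kxy += polyKernel(labels[i], preds[j])
		}
	}
	return kxx/(n*(n-1)) + kyy/(m*(m-1)) - 2*kxy/(n*m)
}

func main() {
	a := [][]float64{{1, 0}, {0.9, 0.1}, {1.1, -0.1}}
	b := [][]float64{{0, 1}, {0.1, 0.9}, {-0.1, 1.1}}
	// Similar distributions score lower than dissimilar ones.
	fmt.Println(kidEstimate(a, a) < kidEstimate(a, b))
}
```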

func PathToTensor

func PathToTensor(baseDir, tensorName string) string

PathToTensor returns the path to tensorName (name within the h5 file).

func PreprocessImage

func PreprocessImage(image *Node, maxValue float64, channelsConfig timage.ChannelsAxisConfig) *Node

PreprocessImage converts the image to a format usable by the InceptionV3 model.

It performs 3 tasks:

  • Scales the values to the range -1.0 to 1.0: this is how the model was originally trained. It requires `maxValue` to be carefully set to the maximum value of the images -- it is assumed the images are scaled from 0 to `maxValue`. Set `maxValue` to zero to skip this step.
  • It removes the alpha channel, in case it is provided.
  • The minimum image size accepted by InceptionV3 is 75x75. If any size is smaller than that, it will be resized accordingly, while preserving the aspect ratio.

Input `image` must have a batch dimension (rank=4), have either 3 or 4 channels, and its values must be scaled from 0 to `maxValue` (unless `maxValue` is set to 0, which skips value scaling).
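The minimum-size rule in the last bullet can be sketched as follows. Only the aspect-ratio-preserving upscale to a 75-pixel minimum is taken from the description above; the exact rounding the package uses is an assumption here:

```go
package main

import (
	"fmt"
	"math"
)

// minSizeResize returns the (height, width) an image would be upscaled to so
// that its smaller spatial dimension reaches MinimumImageSize (75), while
// preserving the aspect ratio. Images already large enough are unchanged.
func minSizeResize(h, w int) (int, int) {
	const minimumImageSize = 75
	if h >= minimumImageSize && w >= minimumImageSize {
		return h, w
	}
	smaller := h
	if w < smaller {
		smaller = w
	}
	scale := float64(minimumImageSize) / float64(smaller)
	return int(math.Round(float64(h) * scale)), int(math.Round(float64(w) * scale))
}

func main() {
	h, w := minSizeResize(50, 100)
	fmt.Println(h, w) // 75 150
	h, w = minSizeResize(100, 200)
	fmt.Println(h, w) // 100 200
}
```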

func ScaleImageValues added in v0.3.1

func ScaleImageValues(image *Node, maxValue float64) *Node

ScaleImageValues scales the `image` values to the range -1.0 to 1.0, assuming it is provided with values from 0.0 to `maxValue`.

This is presumably how the model was trained, so you will want this when using the pre-trained weights; it is not necessary when training from scratch.

Be careful when setting `maxValue`: a wrong value can cause odd behavior, so double-check that it matches your input data.
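The transformation itself is a simple affine rescale. Below is a sketch of the element-wise formula, not the package's graph-level implementation (which operates on `*Node` tensors):

```go
package main

import "fmt"

// scaleToUnitRange maps a pixel value in [0, maxValue] to [-1, 1], the
// element-wise transformation ScaleImageValues applies to the whole image.
func scaleToUnitRange(v, maxValue float64) float64 {
	return 2*v/maxValue - 1
}

func main() {
	fmt.Println(scaleToUnitRange(0, 255))     // -1
	fmt.Println(scaleToUnitRange(255, 255))   // 1
	fmt.Println(scaleToUnitRange(127.5, 255)) // 0
}
```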

Types

type Config

type Config struct {
	// contains filtered or unexported fields
}

Config for instantiating an InceptionV3 model. After the configuration is set, call Done, and it will build the InceptionV3 graph with the loaded variables.

See BuildGraph to construct a Config object and for a usage example.

func BuildGraph

func BuildGraph(ctx *context.Context, image *Node) *Config

BuildGraph for InceptionV3 model.

For a model with pre-trained weights, call Config.PreTrained.

It returns a Config object that can be further configured. Once the configuration is finished, call `Done` and it will return the embedding (or classification) of the given image.

See example in the package inceptionv3 documentation.

Parameters:

  • ctx: context.Context where variables are created and loaded. Variables will be re-used if they were already created before in the current scope. That means one can call BuildGraph more than once, and have the same model be used for more than one input -- for instance, for 2-tower models. To instantiate more than one model with different weights, just use the context in a different scope.
  • image: image tensor (`*Node`) on which to apply the model. There must be 3 channels, and the values must be scaled from -1.0 to 1.0 -- see PreprocessImage to scale the image accordingly if needed. If using ClassificationTop(true), the images must be of size 299x299 (defined as the constant `ClassificationImageSize`). Otherwise, the minimum image size is 75x75.

The original model has weights in `shapes.F32`. (TODO: If the image has a different `DType`, it will try to convert the weights and work the model fully on the image's `DType`. This hasn't been extensively tested, so no guarantees of quality.)

The implementation follows closely the definition in https://github.com/keras-team/keras/blob/v2.12.0/keras/applications/inception_v3.py

func (*Config) BatchNormScale added in v0.3.1

func (cfg *Config) BatchNormScale(value bool) *Config

BatchNormScale sets whether to add a scaling variable to the BatchNorm layers. It defaults to false. If set to true, it is initialized with 1.0, so it has no impact if not fine-tuned.

The original model doesn't use it, but it may be handy if training from scratch.

func (*Config) ChannelsAxis

func (cfg *Config) ChannelsAxis(channelsAxisConfig timage.ChannelsAxisConfig) *Config

ChannelsAxis configures the axis for the channels (aka. "depth" or "features") dimension. The default is `timage.ChannelsLast`, meaning the "channels" dimension comes last.

Note: `timage` refers to package `github.com/gomlx/gomlx/types/tensor/image`.

It returns the modified Config object, so calls can be cascaded.

func (*Config) ClassificationTop

func (cfg *Config) ClassificationTop(useTop bool) *Config

ClassificationTop configures whether to use the very top classification layer at the top of the model.

Typically, if using only the embeddings, set this to false. If actually classifying images, set this to true: the model will include a final linear layer and return the logits for each of the 1000 ImageNet classes.

This is only useful if PreTrained weights are configured.

It returns the modified Config object, so calls can be cascaded.

func (*Config) Done

func (cfg *Config) Done() (output *Node)

Done builds the graph based on the configuration set.

func (*Config) PreTrained

func (cfg *Config) PreTrained(baseDir string) *Config

PreTrained configures the graph to load the pre-trained weights. It takes as an argument `baseDir`, the directory where the weights have been downloaded with DownloadAndUnpackWeights -- use the same value used there.

The default is not to use the pre-trained weights, which will build an untrained InceptionV3 graph.

It returns the modified Config object, so calls can be cascaded.

func (*Config) SetPooling

func (cfg *Config) SetPooling(pooling Pooling) *Config

SetPooling configures the pooling applied at the very top of the model.

If set to NoPooling, the default, it returns a 4D tensor with 2048 channels (see ChannelsAxis for the order of axes). If set to MaxPooling or MeanPooling, it pools the spatial dimensions using Max or Mean, respectively.

This is only used if not using ClassificationTop.

It returns the modified Config object, so calls can be cascaded.

func (*Config) Trainable

func (cfg *Config) Trainable(trainable bool) *Config

Trainable configures whether the variables created will be set as trainable or not -- see `context.Variable`.

If using pre-trained weights as frozen values, set this to false -- and consider using `StopGradient()` on the value returned by Done, to prevent any gradients from propagating. It's an error to configure this to false if not using pre-trained weights (see PreTrained). The default is true, which allows for fine-tuning of the InceptionV3 model.

Notice that with `Trainable(false)` it will also mark the batch normalization for inference only.

It returns the modified Config object, so calls can be cascaded.

type KidBuilder added in v0.4.0

type KidBuilder struct {
	// contains filtered or unexported fields
}

KidBuilder builds the graph to calculate [Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) between two sets of images. See details in KidMetric.

func NewKidBuilder added in v0.4.0

func NewKidBuilder(dataDir string, kidImageSize int, maxImageValue float64, channelsConfig timage.ChannelsAxisConfig) *KidBuilder

NewKidBuilder configures a KidBuilder.

KidBuilder builds the graph to calculate [Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) between `labels` and `predictions` batches of images. The metric is normalized by the `labels` images, so it's not symmetric.

See details in KidMetric.

  • `dataDir`: directory where to download and unpack the InceptionV3 weights. They are reused from there in subsequent calls.
  • `kidImageSize`: resize input images (labels and predictions) to `kidImageSize x kidImageSize` before running the KID metric calculation. It should be between 75 and 299. Smaller values make the metric faster.
  • `maxImageValue`: maximum value the images can take in any channel -- if set to 0, the pixel values are not rescaled and the images are expected to have values between -1.0 and 1.0. Passed to the `PreprocessImage` function.
  • `channelsConfig`: informs what is the channels axis, commonly set to `timage.ChannelsLast`. Passed to `PreprocessImage` function.

Note: `timage` refers to package `github.com/gomlx/gomlx/types/tensor/image`.

func (*KidBuilder) BuildGraph added in v0.4.0

func (builder *KidBuilder) BuildGraph(ctx *context.Context, labels, predictions []*Node) (output *Node)

BuildGraph returns the mean KID score of two batches, see KidMetric.

It returns a scalar with the mean distance between the images provided in labels and predictions.
type Pooling

type Pooling int

Pooling to be used at the top of the model

const (
	NoPooling Pooling = iota
	MaxPooling
	MeanPooling
)
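What MaxPooling does to the spatial axes can be illustrated on plain Go slices. This is a sketch of global max pooling over one non-empty [height][width][channels] feature map, not the package's graph implementation, which works on the [batch, h, w, 2048] output:

```go
package main

import (
	"fmt"
	"math"
)

// globalMaxPool reduces a non-empty [height][width][channels] feature map to
// a per-channel vector by taking the maximum over all spatial positions --
// the reduction SetPooling(MaxPooling) applies to the spatial axes.
func globalMaxPool(fm [][][]float64) []float64 {
	channels := len(fm[0][0])
	out := make([]float64, channels)
	for c := range out {
		out[c] = math.Inf(-1)
	}
	for _, row := range fm {
		for _, pixel := range row {
			for c, v := range pixel {
				if v > out[c] {
					out[c] = v
				}
			}
		}
	}
	return out
}

func main() {
	// A 2x2 feature map with 2 channels.
	fm := [][][]float64{
		{{1, 2}, {3, 0}},
		{{-1, 5}, {2, 2}},
	}
	fmt.Println(globalMaxPool(fm)) // [3 5]
}
```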

func (Pooling) String

func (i Pooling) String() string
