inceptionv3

package
v0.9.1
Published: Apr 20, 2024 License: Apache-2.0 Imports: 16 Imported by: 0

README

Inception V3 Model

This library creates the model architecture and optionally loads the pre-trained weights from Google.

Reference: Rethinking the Inception Architecture for Computer Vision (CVPR 2016), http://arxiv.org/abs/1512.00567

Based on the Keras implementation:

- Source: https://github.com/keras-team/keras/blob/v2.12.0/keras/applications/inception_v3.py
- Documentation: https://keras.io/api/applications/inceptionv3/
Documentation

Overview

Package inceptionv3 provides a pre-trained InceptionV3 model, or simply its structure.

This library creates the model architecture and optionally loads the pre-trained weights from Google. It can be used with or without the top-layer.

Reference: Rethinking the Inception Architecture for Computer Vision (CVPR 2016), http://arxiv.org/abs/1512.00567

Based on the Keras implementation:

- Source: https://github.com/keras-team/keras/blob/v2.12.0/keras/applications/inception_v3.py
- Documentation: https://keras.io/api/applications/inceptionv3/

To use it, start with BuildGraph. If using the pre-trained weights, call DownloadAndUnpackWeights once -- it is a no-op if the weights have already been downloaded and unpacked.

If using it for transfer learning, be mindful that it uses batch normalization, which has its own considerations; see the discussion in https://pub.towardsai.net/batchnorm-for-transfer-learning-df17d2897db6 .

Transfer learning example using this model:

var (
	flagDataDir = flag.String("data", "~/work/my_model", "Directory where to save and load model data.")
	flagInceptionPreTrained = flag.Bool("pretrained", true, "If using inception model, whether to use the pre-trained weights to transfer learn")
	flagInceptionFineTuning = flag.Bool("finetuning", true, "If using inception model, whether to fine-tune the inception model")
)

func ModelGraph(ctx *context.Context, spec any, inputs []*Node) []*Node {
	_ = spec // Not needed.
	image := inputs[0]
	channelsConfig := timage.ChannelsLast
	// PreprocessImage also rescales the values to [-1, 1]; here we assume
	// the input pixel values range from 0 to 255.
	image = inceptionv3.PreprocessImage(image, 255.0, channelsConfig)

	var preTrainedPath string
	if *flagInceptionPreTrained {
		preTrainedPath = *flagDataDir
	}
	logits := inceptionv3.BuildGraph(ctx, image).PreTrained(preTrainedPath).
		SetPooling(inceptionv3.MaxPooling).Trainable(*flagInceptionFineTuning).Done()
	if !*flagInceptionFineTuning {
		logits = StopGradient(logits) // We don't want to train the inception model.
	}
	logits = FnnOnTop(ctx, logits)
	return []*Node{logits}
}

func main() {
	…
	if *flagInceptionPreTrained {
		err := inceptionv3.DownloadAndUnpackWeights(*flagDataDir)
		AssertNoError(err)
	}
	…
}

Index

Constants

View Source
const (
	// WeightsURL is the URL for the whole model, including the top layer, a 1000-classes linear layer on top.
	WeightsURL = "https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5"

	// WeightsH5Checksum is the SHA256 checksum of the weights file.
	WeightsH5Checksum = "00c9ea4e4762f716ac4d300d6d9c2935639cc5e4d139b5790d765dcbeea539d0"

	// WeightsH5Name is the name of the local ".h5" file with the weights.
	WeightsH5Name = "weights.h5"

	// UnpackedWeightsName is the name of the subdirectory that will hold the unpacked weights.
	UnpackedWeightsName = "gomlx_weights"
)
View Source
const BuildScope = "InceptionV3"

BuildScope is used by BuildGraph as a new sub-scope for the InceptionV3 layers.

View Source
const ClassificationImageSize = 299

ClassificationImageSize is the image size required when using the InceptionV3 model for classification: images must be 299 x 299.

View Source
const EmbeddingSize = 2048

EmbeddingSize is the size of the embedding output (if not using the top layer).

View Source
const MinimumImageSize = 75

MinimumImageSize is the minimum width and height required for input images.

View Source
const NumberOfClasses = 1000

NumberOfClasses when using the top layer.

Variables

This section is empty.

Functions

func DownloadAndUnpackWeights

func DownloadAndUnpackWeights(baseDir string) (err error)

DownloadAndUnpackWeights to the given baseDir. It only does the work if the files are not there yet (downloaded and unpacked).

It is verbose and shows a progress bar when downloading/unpacking. It is quiet if there is nothing to do, that is, if the files are already there.

func KidMetric added in v0.4.0

func KidMetric(dataDir string, kidImageSize int, maxImageValue float64, channelsConfig timage.ChannelsAxisConfig) metrics.Interface

KidMetric returns a metric that takes a generated image and a label image and returns a measure of similarity.

[Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) was proposed as a replacement for the popular [Fréchet Inception Distance (FID)](https://arxiv.org/abs/1706.08500) metric for measuring image generation quality. Both metrics measure the difference between the generated and training distributions in the representation space of an InceptionV3 network pretrained on ImageNet.

The implementation is based on the Keras one, described in https://keras.io/examples/generative/ddim/

To directly calculate KID, as opposed to using it as a metric, see NewKidBuilder below.

Parameters:

  • `dataDir`: directory where to download and unpack the InceptionV3 weights. They are reused from there in subsequent calls.
  • `kidImageSize`: resize input images (labels and predictions) to `kidImageSize x kidImageSize` before running the KID metric calculation. It should be between 75 and 299. Smaller values make the metric faster.
  • `maxImageValue`: maximum value the images can take in any channel -- if set to 0, the pixel values are not rescaled and the images are expected to have values between -1.0 and 1.0. Passed to the `PreprocessImage` function.
  • `channelsConfig`: informs what is the channels axis, commonly set to `timage.ChannelsLast`. Passed to `PreprocessImage` function.

Note: `timage` refers to package `github.com/gomlx/gomlx/types/tensor/image`.
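To make the metric concrete: KID is a maximum mean discrepancy (MMD) computed with the polynomial kernel k(x, y) = (x·y/d + 1)^3 from the Keras DDIM example, applied to InceptionV3 feature vectors. The sketch below computes that estimator in plain Go over tiny hand-made feature vectors; it mirrors the formula only and is not this package's implementation:

```go
package main

import "fmt"

// polyKernel is the polynomial kernel (x·y/d + 1)^3 used by the Keras DDIM
// example, where d is the feature dimension.
func polyKernel(x, y []float64) float64 {
	d := float64(len(x))
	dot := 0.0
	for i := range x {
		dot += x[i] * y[i]
	}
	v := dot/d + 1.0
	return v * v * v
}

// kidEstimate computes the KID estimator between two batches of feature
// vectors: the mean off-diagonal kernel within each batch, minus twice the
// mean cross-kernel. In the real metric the vectors are InceptionV3 embeddings.
func kidEstimate(labels, preds [][]float64) float64 {
	n, m := float64(len(labels)), float64(len(preds))
	var kxx, kyy, kxy float64
	for i := range labels {
		for j := range labels {
			if i != j {
				kxx += polyKernel(labels[i], labels[j])
			}
		}
	}
	for i := range preds {
		for j := range preds {
			if i != j {
				kyy += polyKernel(preds[i], preds[j])
			}
		}
	}
	for i := range labels {
		for j := range preds {
			kxy += polyKernel(labels[i], preds[j])
		}
	}
	return kxx/(n*(n-1)) + kyy/(m*(m-1)) - 2*kxy/(n*m)
}

func main() {
	a := [][]float64{{1, 0}, {0.9, 0.1}, {1.1, -0.1}}
	b := [][]float64{{0, 1}, {0.1, 0.9}, {-0.1, 1.1}}
	// Similar distributions score lower than dissimilar ones.
	fmt.Println(kidEstimate(a, a) < kidEstimate(a, b))
}
```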

func PathToTensor

func PathToTensor(baseDir, tensorName string) string

PathToTensor returns the path to tensorName (name within the h5 file).

func PreprocessImage

func PreprocessImage(image *Node, maxValue float64, channelsConfig timage.ChannelsAxisConfig) *Node

PreprocessImage converts the image to a format usable by the InceptionV3 model.

It performs 3 tasks:

  • Scales the values to the range -1.0 to 1.0: this is how the model was originally trained. It requires `maxValue` to be carefully set to the maximum value of the images -- it is assumed the images are scaled from 0 to `maxValue`. Set `maxValue` to zero to skip this step.
  • It removes the alpha channel, in case it is provided.
  • The minimum image size accepted by InceptionV3 is 75x75. If any size is smaller than that, it will be resized accordingly, while preserving the aspect ratio.

Input `image` must have a batch dimension (rank=4), have either 3 or 4 channels, and its values must be scaled from 0 to `maxValue` (unless `maxValue` is set to 0, which skips value scaling).
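The minimum-size rule in the last bullet can be sketched as follows. Only the aspect-ratio-preserving upscale to a 75-pixel minimum is taken from the description above; the exact rounding the package uses is an assumption here:

```go
package main

import (
	"fmt"
	"math"
)

// minSizeResize returns the (height, width) an image would be upscaled to so
// that its smaller spatial dimension reaches MinimumImageSize (75), while
// preserving the aspect ratio. Images already large enough are unchanged.
func minSizeResize(h, w int) (int, int) {
	const minimumImageSize = 75
	if h >= minimumImageSize && w >= minimumImageSize {
		return h, w
	}
	smaller := h
	if w < smaller {
		smaller = w
	}
	scale := float64(minimumImageSize) / float64(smaller)
	return int(math.Round(float64(h) * scale)), int(math.Round(float64(w) * scale))
}

func main() {
	h, w := minSizeResize(50, 100)
	fmt.Println(h, w) // 75 150
	h, w = minSizeResize(100, 200)
	fmt.Println(h, w) // 100 200
}
```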

func ScaleImageValues added in v0.3.1

func ScaleImageValues(image *Node, maxValue float64) *Node

ScaleImageValues scales the `image` values to the range -1.0 to 1.0, assuming it is provided with values from 0.0 to `maxValue`.

This is presumably how the model was trained, so you will want this when using the pre-trained weights; it is not necessary when training from scratch.

Be careful when setting `maxValue`: a wrong value can cause odd behavior, so double-check that it matches your input data.
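The transformation itself is a simple affine rescale. Below is a sketch of the element-wise formula, not the package's graph-level implementation (which operates on `*Node` tensors):

```go
package main

import "fmt"

// scaleToUnitRange maps a pixel value in [0, maxValue] to [-1, 1], the
// element-wise transformation ScaleImageValues applies to the whole image.
func scaleToUnitRange(v, maxValue float64) float64 {
	return 2*v/maxValue - 1
}

func main() {
	fmt.Println(scaleToUnitRange(0, 255))     // -1
	fmt.Println(scaleToUnitRange(255, 255))   // 1
	fmt.Println(scaleToUnitRange(127.5, 255)) // 0
}
```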

Types

type Config

type Config struct {
	// contains filtered or unexported fields
}

Config for instantiating an InceptionV3 model. After the configuration is set, call Done, and it will build the InceptionV3 graph with the loaded variables.

See BuildGraph to construct a Config object and for a usage example.

func BuildGraph

func BuildGraph(ctx *context.Context, image *Node) *Config

BuildGraph for InceptionV3 model.

For a model with pre-trained weights, call Config.PreTrained.

It returns a Config object that can be further configured. Once the configuration is finished, call `Done` and it will return the embedding (or classification) of the given image.

See example in the package inceptionv3 documentation.

Parameters:

  • ctx: context.Context where variables are created and loaded. Variables will be re-used if they were already created before in the current scope. That means one can call BuildGraph more than once, and have the same model be used for more than one input -- for instance, for 2-tower models. To instantiate more than one model with different weights, just use the context in a different scope.
  • image: image tensor (`*Node`) on which to apply the model. There must be 3 channels, and the values must be scaled from -1.0 to 1.0 -- see PreprocessImage to scale the image accordingly if needed. If using ClassificationTop(true), the images must be of size 299x299 (defined as the constant `ClassificationImageSize`). Otherwise, the minimum image size is 75x75.

The original model has weights in `shapes.F32`. (TODO: If the image has a different `DType`, it will try to convert the weights and work the model fully on the image's `DType`. This hasn't been extensively tested, so no guarantees of quality.)

The implementation follows closely the definition in https://github.com/keras-team/keras/blob/v2.12.0/keras/applications/inception_v3.py

func (*Config) BatchNormScale added in v0.3.1

func (cfg *Config) BatchNormScale(value bool) *Config

BatchNormScale sets whether to add a scaling variable to the BatchNorm layers. It defaults to false. If set to true, it is initialized with 1.0, so it has no impact if not fine-tuned.

The original model doesn't use it, but it may be handy if training from scratch.

func (*Config) ChannelsAxis

func (cfg *Config) ChannelsAxis(channelsAxisConfig timage.ChannelsAxisConfig) *Config

ChannelsAxis configures the axis for the channels (aka. "depth" or "features") dimension. The default is `timage.ChannelsLast`, meaning the "channels" dimension comes last.

Note: `timage` refers to package `github.com/gomlx/gomlx/types/tensor/image`.

It returns the modified Config object, so calls can be cascaded.

func (*Config) ClassificationTop

func (cfg *Config) ClassificationTop(useTop bool) *Config

ClassificationTop configures whether to use the very top classification layer at the top of the model.

Typically, if using only the embeddings, set this to false. If actually classifying images, set this to true: the model will include a final linear layer and return the logits for each of the 1000 ImageNet classes.

This is only useful if PreTrained weights are configured.

It returns the modified Config object, so calls can be cascaded.

func (*Config) Done

func (cfg *Config) Done() (output *Node)

Done builds the graph based on the configuration set.

func (*Config) PreTrained

func (cfg *Config) PreTrained(baseDir string) *Config

PreTrained configures the graph to load the pre-trained weights. It takes as an argument `baseDir`, the directory where the weights have been downloaded with DownloadAndUnpackWeights -- use the same value used there.

The default is not to use the pre-trained weights, which will build an untrained InceptionV3 graph.

It returns the modified Config object, so calls can be cascaded.

func (*Config) SetPooling

func (cfg *Config) SetPooling(pooling Pooling) *Config

SetPooling configures the pooling applied at the very top of the model.

If set to NoPooling, the default, it returns a 4D tensor with 2048 channels (see ChannelsAxis for the order of axes). If set to MaxPooling or MeanPooling, it pools the spatial dimensions using Max or Mean, respectively.

This is only used if not using ClassificationTop.

It returns the modified Config object, so calls can be cascaded.

func (*Config) Trainable

func (cfg *Config) Trainable(trainable bool) *Config

Trainable configures whether the variables created will be set as trainable or not -- see `context.Variable`.

If using pre-trained weights as frozen values, set this to false -- and consider using `StopGradient()` on the value returned by Done, to prevent any gradients from propagating. It's an error to configure this to false if not using pre-trained weights (see PreTrained). The default is true, which allows for fine-tuning of the InceptionV3 model.

Notice that with `Trainable(false)` it will also mark the batch normalization for inference only.

It returns the modified Config object, so calls can be cascaded.

type KidBuilder added in v0.4.0

type KidBuilder struct {
	// contains filtered or unexported fields
}

KidBuilder builds the graph to calculate [Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) between two sets of images. See details in KidMetric.

func NewKidBuilder added in v0.4.0

func NewKidBuilder(dataDir string, kidImageSize int, maxImageValue float64, channelsConfig timage.ChannelsAxisConfig) *KidBuilder

NewKidBuilder configures a KidBuilder.

KidBuilder builds the graph to calculate [Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) between `labels` and `predictions` batches of images. The metric is normalized by the `labels` images, so it's not symmetric.

See details in KidMetric.

  • `dataDir`: directory where to download and unpack the InceptionV3 weights. They are reused from there in subsequent calls.
  • `kidImageSize`: resize input images (labels and predictions) to `kidImageSize x kidImageSize` before running the KID metric calculation. It should be between 75 and 299. Smaller values make the metric faster.
  • `maxImageValue`: maximum value the images can take in any channel -- if set to 0, the pixel values are not rescaled and the images are expected to have values between -1.0 and 1.0. Passed to the `PreprocessImage` function.
  • `channelsConfig`: informs what is the channels axis, commonly set to `timage.ChannelsLast`. Passed to `PreprocessImage` function.

Note: `timage` refers to package `github.com/gomlx/gomlx/types/tensor/image`.

func (*KidBuilder) BuildGraph added in v0.4.0

func (builder *KidBuilder) BuildGraph(ctx *context.Context, labels, predictions []*Node) (output *Node)

BuildGraph returns the mean KID score of two batches, see KidMetric.

It returns a scalar with the mean distance between the images provided in labels and predictions.
type Pooling

type Pooling int

Pooling to be used at the top of the model

const (
	NoPooling Pooling = iota
	MaxPooling
	MeanPooling
)
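What MaxPooling does to the spatial axes can be illustrated on plain Go slices. This is a sketch of global max pooling over one non-empty [height][width][channels] feature map, not the package's graph implementation, which works on the [batch, h, w, 2048] output:

```go
package main

import (
	"fmt"
	"math"
)

// globalMaxPool reduces a non-empty [height][width][channels] feature map to
// a per-channel vector by taking the maximum over all spatial positions --
// the reduction SetPooling(MaxPooling) applies to the spatial axes.
func globalMaxPool(fm [][][]float64) []float64 {
	channels := len(fm[0][0])
	out := make([]float64, channels)
	for c := range out {
		out[c] = math.Inf(-1)
	}
	for _, row := range fm {
		for _, pixel := range row {
			for c, v := range pixel {
				if v > out[c] {
					out[c] = v
				}
			}
		}
	}
	return out
}

func main() {
	// A 2x2 feature map with 2 channels.
	fm := [][][]float64{
		{{1, 2}, {3, 0}},
		{{-1, 5}, {2, 2}},
	}
	fmt.Println(globalMaxPool(fm)) // [3 5]
}
```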

func (Pooling) String

func (i Pooling) String() string
