Documentation ¶
Overview ¶
Package vfilter provides filtering methods for the vision package. These apply tensor.Tensor filters to a 2D visual input via the Conv (convolution) function, using a filter-parallel approach: each goroutine processes a different filter in a set of filters, e.g., different angles of Gabor filters. This is coarse-grained and strictly parallel, and thus very efficient.
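The filter-parallel scheme described above can be sketched with plain slices and the standard library. This is only an illustration under stated assumptions, not the package's implementation: convOne and convParallel are hypothetical names, filters are square, and the image is assumed to be large enough (already padded) that no bounds checks are needed.

```go
package main

import (
	"fmt"
	"sync"
)

// convOne applies one fsz x fsz filter (row major) to a w x h image
// (row major), writing one sum per valid position into out.
// Hypothetical helper: it assumes the image is already padded.
func convOne(flt []float32, fsz int, img []float32, w, h int, out []float32) {
	ow, oh := w-fsz+1, h-fsz+1
	for y := 0; y < oh; y++ {
		for x := 0; x < ow; x++ {
			var sum float32
			for fy := 0; fy < fsz; fy++ {
				for fx := 0; fx < fsz; fx++ {
					sum += flt[fy*fsz+fx] * img[(y+fy)*w+(x+fx)]
				}
			}
			out[y*ow+x] = sum
		}
	}
}

// convParallel runs convOne for each filter in its own goroutine.
// Each goroutine writes only to its own output slice, so no locking is needed.
func convParallel(flts [][]float32, fsz int, img []float32, w, h int) [][]float32 {
	ow, oh := w-fsz+1, h-fsz+1
	outs := make([][]float32, len(flts))
	var wg sync.WaitGroup
	for i := range flts {
		outs[i] = make([]float32, ow*oh)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			convOne(flts[i], fsz, img, w, h, outs[i])
		}(i)
	}
	wg.Wait()
	return outs
}

func main() {
	img := []float32{
		1, 2, 3,
		4, 5, 6,
		7, 8, 9,
	}
	flts := [][]float32{
		{-1, 1, -1, 1}, // horizontal difference
		{-1, -1, 1, 1}, // vertical difference
	}
	outs := convParallel(flts, 2, img, 3, 3)
	fmt.Println(outs[0], outs[1]) // [2 2 2 2] [6 6 6 6]
}
```

Because each filter owns its output, the goroutines never contend for memory, which is the sense in which the scheme is coarse-grained.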
image.go contains routines for converting an image into the float32 tensor.Float32 that is required for doing the convolution:
- RGBToGrey converts an RGB image to a greyscale float32 tensor.
The MaxPool function performs max-pooling over filtered results to reduce dimensionality, consistent with standard deep convolutional neural network (DCNN) approaches.
Geom manages the geometry for going from an input image to the filtered output of that image.
Unlike the C++ version, no wrapping or clipping is supported directly: all input images must be padded so that the filters can be applied with an appropriate padding border, guaranteeing that there are no bounds issues. See WrapPad for wrapping-based padding and FadePad for padding that fades toward the mean edge value.
Index ¶
- Constants
- func Conv(geom *Geom, flt *tensor.Float32, img, out *tensor.Float32, gain float32)
- func Conv1(geom *Geom, flt *tensor.Float32, img, out *tensor.Float32, gain float32)
- func ConvDiff(geom *Geom, fltOn, fltOff *tensor.Float32, imgOn, imgOff, out *tensor.Float32, ...)
- func Deconv(geom *Geom, flt *tensor.Float32, img, out *tensor.Float32, gain float32)
- func EdgeAvg(tsr *tensor.Float32, padWidth int) float32
- func FadePad(tsr *tensor.Float32, padWidth int)
- func FadePadRGB(tsr *tensor.Float32, padWidth int)
- func FeatAgg(srcRows []int, trgStart int, src, out *tensor.Float32)
- func GreyTensorToImage(img *image.Gray, tsr *tensor.Float32, padWidth int, topZero bool) *image.Gray
- func LeftHalf(x int) int
- func MaxPool(psize, spc image.Point, in, out *tensor.Float32)
- func MaxReduceFilterY(in, out *tensor.Float32)
- func OuterAgg(innerPos, rowOff int, src, out *tensor.Float32)
- func RGBTensorToImage(img *image.RGBA, tsr *tensor.Float32, padWidth int, topZero bool) *image.RGBA
- func RGBToGrey(img image.Image, tsr *tensor.Float32, padWidth int, topZero bool)
- func RGBToTensor(img image.Image, tsr *tensor.Float32, padWidth int, topZero bool)
- func TensorLogNorm(tsr tensor.Tensor, ndim int)
- func UnPool(psize, spc image.Point, in, out *tensor.Float32, rnd bool)
- func WrapPad(tsr *tensor.Float32, padWidth int)
- func WrapPadRGB(tsr *tensor.Float32, padWidth int)
- type Geom
Constants ¶
const (
	// Version is the version of this package being used
	Version = "v2.0.0-dev0.0.8"
	// GitCommit is the commit just before the latest version commit
	GitCommit = "cc4e0c3"
	// VersionDate is the date-time of the latest version commit in UTC (in the format 'YYYY-MM-DD HH:MM', which is the Go format '2006-01-02 15:04')
	VersionDate = "2024-02-05 22:52"
)
Variables ¶
This section is empty.
Functions ¶
func Conv ¶
Conv performs convolution of filter over img into out. img *must* have a border (padding) so that filters are applied without any bounds checking; wrapping etc. is all done in the padding process, which is much more efficient. Computation is parallel over the different filter types (outer dim of flt), as that is the most memory-efficient arrangement. img must be a 2D tensor of image values (convert RGB to grey first). Everything must be organized row major, the tensor default. Out shape dims are: Y, X, Polarity (2), Angle, where the 2 polarities (on, off) are for positive and negative filter values, respectively.
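To make the Y, X, Polarity (2), Angle output layout concrete, here is a small sketch of the row-major offset arithmetic and of splitting a signed filter response into on and off channels. The names outIndex and splitPolarity are hypothetical, and storing the magnitude of a negative response in the off channel is an assumption of this sketch rather than something stated above.

```go
package main

import "fmt"

// outIndex returns the flat row-major offset of element (y, x, pol, ang)
// in a tensor shaped [ny][nx][2][nang]; pol is 0 (on) or 1 (off).
func outIndex(y, x, pol, ang, nx, nang int) int {
	return ((y*nx+x)*2+pol)*nang + ang
}

// splitPolarity scales a raw filter response by gain and splits it into
// on (positive part) and off (negative part) values.
// Assumption: the off channel holds the magnitude of a negative response.
func splitPolarity(sum, gain float32) (on, off float32) {
	v := sum * gain
	if v >= 0 {
		return v, 0
	}
	return 0, -v
}

func main() {
	on, off := splitPolarity(-3, 2)
	fmt.Println(on, off)                    // 0 6
	fmt.Println(outIndex(1, 2, 1, 3, 4, 4)) // 55
}
```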
func Conv1 ¶
Conv1 performs convolution of a single filter over img into out. img *must* have a border (padding) so that filters are applied without any bounds checking; wrapping etc. is all done in the padding process, which is much more efficient. Computation is parallel in image lines. img must be a 2D tensor of image values (convert RGB to grey first). Everything must be organized row major, the tensor default. Output has 2 outer dims for positive vs. negative values; inner is Y, X. TODO: add option to interleave polarity as inner-most dim.
func ConvDiff ¶
func ConvDiff(geom *Geom, fltOn, fltOff *tensor.Float32, imgOn, imgOff, out *tensor.Float32, gain, gainOn float32)
ConvDiff computes the difference of two separate filter convolutions (fltOn - fltOff) over two images into out. There are separate gain multipliers: gainOn for the On convolution and gain for the overall result. The images *must* have a border (padding) so that filters are applied without any bounds checking; wrapping etc. is all done in the padding process, which is much more efficient. Computation is parallel in image lines. imgOn and imgOff must each be a 2D tensor of image values (grey or single components). Everything must be organized row major, the tensor default. Output has 2 outer dims for positive vs. negative values; inner is Y, X.
func Deconv ¶
Deconv performs reverse convolution of filter: given the output of a filter, it accumulates an input image as the sum of filter * output activation. img *must* have a border (padding) so that filters are applied without any bounds checking; wrapping etc. is all done in the padding process, which is much more efficient. img must be a 2D tensor of image values (convert RGB to grey first). Everything must be organized row major, the tensor default. Out shape dims are: Y, X, Polarity (2), Angle, where the 2 polarities (on, off) are for positive and negative filter values, respectively.
func EdgeAvg ¶
EdgeAvg returns the average value around the effective edge of the image, located padWidth in from each side.
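As a rough illustration of averaging around the effective edge, the sketch below walks the perimeter of the unpadded region of a row-major image. edgeAvg is a hypothetical helper; counting each perimeter pixel exactly once is an assumption of this sketch (the package may weight corners differently), and the inner region is assumed to be at least 2 pixels in each direction.

```go
package main

import "fmt"

// edgeAvg averages the pixels on the perimeter of the inner (unpadded)
// region of a w x h row-major image with pad pixels of padding per side.
// Assumption: each perimeter pixel is counted once.
func edgeAvg(img []float32, w, h, pad int) float32 {
	var sum float32
	n := 0
	// top and bottom rows of the inner region, corners included
	for x := pad; x < w-pad; x++ {
		sum += img[pad*w+x] + img[(h-pad-1)*w+x]
		n += 2
	}
	// left and right columns, corners excluded
	for y := pad + 1; y < h-pad-1; y++ {
		sum += img[y*w+pad] + img[y*w+(w-pad-1)]
		n += 2
	}
	return sum / float32(n)
}

func main() {
	// 5x5 image: a 3x3 inner region surrounded by 1 pixel of padding (100s).
	img := []float32{
		100, 100, 100, 100, 100,
		100, 1, 2, 3, 100,
		100, 4, 5, 6, 100,
		100, 7, 8, 9, 100,
		100, 100, 100, 100, 100,
	}
	fmt.Println(edgeAvg(img, 5, 5, 1)) // 5
}
```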
func FadePad ¶
FadePad fills the given padding width around the sides of a float32 image, gradually fading from the edge value toward the mean edge value.
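The fading idea can be shown in one dimension. This is a sketch under assumptions: fadeRow is a hypothetical helper, the mean edge value is passed in by the caller, and the linear fade profile (reaching the mean at the outermost pixel) is a choice made for this example, not something the description above specifies.

```go
package main

import "fmt"

// fadeRow fills pad pixels on each end of row so that values move from
// the nearest inner edge value toward avg at the outer border.
// Assumption: the fade is linear in distance from the outer border.
func fadeRow(row []float32, pad int, avg float32) {
	n := len(row)
	left, right := row[pad], row[n-pad-1]
	for i := 0; i < pad; i++ {
		p := float32(i) / float32(pad) // 0 at the outer border
		row[i] = p*left + (1-p)*avg
		row[n-1-i] = p*right + (1-p)*avg
	}
}

func main() {
	row := []float32{0, 0, 2, 4, 6, 0, 0} // inner values 2, 4, 6; pad = 2
	fadeRow(row, 2, 4)                    // 4 is the mean of the two edge values
	fmt.Println(row)                      // [4 3 2 4 6 5 4]
}
```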
func FadePadRGB ¶
FadePadRGB fills the given padding width around the sides of a float32 image, gradually fading from the edge value toward the mean edge value. The RGB version iterates over the outer-most dimension of components.
func FeatAgg ¶
FeatAgg does simple aggregation of feature rows from one feature map to another. One row (inner-most of 4D dimensions) is assumed to be an angle, common across feature rows. srcRows is the list of rows in the source to copy. trgStart is the starting row in the output for the copy; the srcRows will be contiguous in the output from that row up. No bounds checking is done on the output, so it will just fail if there isn't enough room: allocate the output size before calling!
func GreyTensorToImage ¶
func GreyTensorToImage(img *image.Gray, tsr *tensor.Float32, padWidth int, topZero bool) *image.Gray
GreyTensorToImage converts a greyscale tensor to an image. It uses the existing img if it is of the correct size, otherwise it makes a new one. padWidth is the amount of padding to subtract from all sides. topZero retains the Y=0 value at the top of the tensor; otherwise it is flipped with Y=0 at the bottom, to be consistent with the emergent / OpenGL standard coordinate system.
func MaxPool ¶
MaxPool performs max-pooling over the given pool size and spacing. The size must equal spacing or 2 * spacing. Pooling is sensitive to the feature structure of the input, which must have shape: Y, X, Polarities, Angles.
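A minimal sketch of max-pooling with a pool size and spacing, restricted to a single feature plane for brevity (the function above pools each polarity and angle separately). maxPool is a hypothetical helper, and the output size formula (in - size)/spacing + 1 is an assumption of this sketch.

```go
package main

import "fmt"

// maxPool takes the maximum over size x size windows placed every spc
// pixels on a w x h row-major plane.
// Assumption: output size is (in-size)/spc + 1 in each direction.
func maxPool(in []float32, w, h, size, spc int) (out []float32, ow, oh int) {
	ow = (w-size)/spc + 1
	oh = (h-size)/spc + 1
	out = make([]float32, ow*oh)
	for oy := 0; oy < oh; oy++ {
		for ox := 0; ox < ow; ox++ {
			mx := in[(oy*spc)*w+ox*spc]
			for py := 0; py < size; py++ {
				for px := 0; px < size; px++ {
					if v := in[(oy*spc+py)*w+(ox*spc+px)]; v > mx {
						mx = v
					}
				}
			}
			out[oy*ow+ox] = mx
		}
	}
	return out, ow, oh
}

func main() {
	in := make([]float32, 16) // 4x4 plane holding 1..16
	for i := range in {
		in[i] = float32(i + 1)
	}
	out, ow, oh := maxPool(in, 4, 4, 2, 2) // size == spacing: no overlap
	fmt.Println(out, ow, oh)               // [6 8 14 16] 2 2
	out, ow, oh = maxPool(in, 4, 4, 2, 1) // size == 2 * spacing: overlapping
	fmt.Println(out, ow, oh)              // [6 7 8 10 11 12 14 15 16] 3 3
}
```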
func MaxReduceFilterY ¶
MaxReduceFilterY performs a max-pooling reduction over the inner Filter Y dimension (polarities, colors). The input must have shape: Y, X, Polarities, Angles.
func OuterAgg ¶
OuterAgg does simple aggregation of the outer-most dimension of a tensor into another 4D tensor, with Y, X as the outer-most two dimensions, starting at the given inner-most feature offset and inner row-wise offset. The inner row-wise dimension maps the outer-most dimension of the source tensor. No bounds checking is done on the output, so it will just fail if there isn't enough room: allocate the output size before calling!
func RGBTensorToImage ¶
RGBTensorToImage converts an RGB tensor to an image. It uses the existing image if it is of the correct size, otherwise it makes a new one. The tensor must have RGB components as its outer dimension. padWidth is the amount of padding to subtract from all sides. topZero retains the Y=0 value at the top of the tensor; otherwise it is flipped with Y=0 at the bottom, to be consistent with the emergent / OpenGL standard coordinate system.
func RGBToGrey ¶
RGBToGrey converts an RGB input image to a greyscale tensor in preparation for processing. padWidth is the amount of padding to add on all sides. topZero retains the Y=0 value at the top of the tensor; otherwise it is flipped with Y=0 at the bottom, to be consistent with the emergent / OpenGL standard coordinate system.
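The padWidth and topZero behavior can be sketched with the standard image packages and a flat float32 slice in place of a tensor. rgbToGrey and demoImage are hypothetical names; using color.GrayModel for the grey conversion and scaling to the 0-1 range are assumptions of this sketch, since the description above does not say how grey values are derived.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// rgbToGrey converts img to a row-major grey slice with pad pixels of
// zero padding on every side. When topZero is false, rows are flipped so
// that Y=0 is the bottom of the image.
// Assumption: grey = color.GrayModel luminance scaled to 0-1.
func rgbToGrey(img image.Image, pad int, topZero bool) (vals []float32, w, h int) {
	b := img.Bounds()
	iw, ih := b.Dx(), b.Dy()
	w, h = iw+2*pad, ih+2*pad
	vals = make([]float32, w*h)
	for y := 0; y < ih; y++ {
		ty := y
		if !topZero {
			ty = ih - 1 - y
		}
		for x := 0; x < iw; x++ {
			g := color.GrayModel.Convert(img.At(b.Min.X+x, b.Min.Y+y)).(color.Gray)
			vals[(ty+pad)*w+(x+pad)] = float32(g.Y) / 255
		}
	}
	return vals, w, h
}

// demoImage returns a 2x2 image with a single white pixel at the top left.
func demoImage() image.Image {
	img := image.NewRGBA(image.Rect(0, 0, 2, 2))
	img.Set(0, 0, color.White)
	return img
}

func main() {
	vals, w, h := rgbToGrey(demoImage(), 1, false)
	fmt.Println(w, h)             // 4 4
	fmt.Println(vals[9], vals[5]) // 1 0: the white pixel moved to the lower inner row
}
```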
func RGBToTensor ¶
RGBToTensor converts an RGB input image to an RGB tensor with RGB components as the outer dimension. padWidth is the amount of padding to add on all sides. topZero retains the Y=0 value at the top of the tensor; otherwise it is flipped with Y=0 at the bottom, to be consistent with the emergent / OpenGL standard coordinate system.
func TensorLogNorm ¶
TensorLogNorm computes 1 + log of all the numbers and then does Max Div renorm so the result is normalized in the 0-1 range. It is computed on the first ndim dims of the tensor, where 0 = all values, 1 = norm each of the sub-dimensions under the first outer-most dimension, etc. ndim must be < NumDims() if not 0 (otherwise it panics).
func UnPool ¶
UnPool performs inverse max-pooling over the given pool size and spacing. This is very dumb and either uses a random number if rnd = true, or just copies the max pooled value over all of the individual elements that were pooled. A smarter solution would require maintaining the index of the max item, but that requires more infrastructure. The size must equal spacing or 2 * spacing. Pooling is sensitive to the feature structure of the input, which must have shape: Y, X, Polarities, Angles.
func WrapPad ¶
WrapPad fills the given padding width around the sides of a float32 image by wrapping, i.e., the padding for the left side of the image is the (mirrored) bits from the right side of the image, etc.
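A one-dimensional sketch of wrap-around padding follows. wrapPadRow is a hypothetical helper, and it reads the description as a plain periodic wrap in which the copied pixels keep their original order; whether the package reverses them is not settled by the text above. The sketch also assumes the padding is no wider than the inner region.

```go
package main

import "fmt"

// wrapPadRow fills pad pixels on each end of row with values from the
// opposite end of the inner region, as if the row were periodic.
// Assumption: copied pixels keep their order (no reflection).
func wrapPadRow(row []float32, pad int) {
	n := len(row)
	inner := n - 2*pad
	for i := 0; i < pad; i++ {
		row[i] = row[i+inner]     // left padding from the right end of the inner region
		row[n-pad+i] = row[pad+i] // right padding from the left end of the inner region
	}
}

func main() {
	row := []float32{0, 0, 1, 2, 3, 4, 0, 0} // inner values 1..4, pad = 2
	wrapPadRow(row, 2)
	fmt.Println(row) // [3 4 1 2 3 4 1 2]
}
```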
func WrapPadRGB ¶
WrapPadRGB fills the given padding width around the sides of a float32 image by wrapping, i.e., the padding for the left side of the image is the (mirrored) bits from the right side of the image, etc. The RGB version iterates over the outer-most dimension of components.
Types ¶
type Geom ¶
type Geom struct {
	// size of input -- computed from image or set
	In image.Point
	// size of output -- computed
	Out image.Point
	// starting border into image -- must be >= FiltRt
	Border image.Point
	// spacing -- number of pixels to skip in each direction
	Spacing image.Point
	// full size of filter
	FiltSz image.Point
	// computed size of left/top size of filter
	FiltLt image.Point
	// computed size of right/bottom size of filter (FiltSz - FiltLeft)
	FiltRt image.Point
}
Geom contains the filtering geometry info for a given filter pass.
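To show how the computed fields might relate to the ones that are set, here is a sketch using a local copy of the struct. The names geom, leftHalf, newGeom, and squareGeom are hypothetical; FiltRt = FiltSz - FiltLt follows the field comment above, while taking FiltLt as the integer half of FiltSz and Out as (In - 2*Border) / Spacing are assumptions of this sketch, not confirmed formulas.

```go
package main

import (
	"fmt"
	"image"
)

// geom mirrors the fields of Geom for illustration only.
type geom struct {
	In, Out, Border, Spacing, FiltSz, FiltLt, FiltRt image.Point
}

// leftHalf returns the integer half of x (assumed left/top filter extent).
func leftHalf(x int) int { return x / 2 }

// newGeom fills in the computed fields from the ones that are set.
// Assumption: Out = (In - 2*Border) / Spacing, per dimension.
func newGeom(in, border, spacing, filtSz image.Point) geom {
	g := geom{In: in, Border: border, Spacing: spacing, FiltSz: filtSz}
	g.FiltLt = image.Pt(leftHalf(filtSz.X), leftHalf(filtSz.Y))
	g.FiltRt = filtSz.Sub(g.FiltLt)
	g.Out = image.Pt((in.X-2*border.X)/spacing.X, (in.Y-2*border.Y)/spacing.Y)
	return g
}

// squareGeom is a convenience wrapper for equal X and Y values.
func squareGeom(in, border, spacing, filtSz int) geom {
	return newGeom(image.Pt(in, in), image.Pt(border, border),
		image.Pt(spacing, spacing), image.Pt(filtSz, filtSz))
}

func main() {
	g := squareGeom(128, 12, 4, 12)
	fmt.Println(g.Out, g.FiltLt, g.FiltRt) // (26,26) (6,6) (6,6)
}
```

In this example the border of 12 satisfies the stated constraint that Border must be >= FiltRt (6).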