package nn

v0.0.0-...-e05d22d
Published: Dec 2, 2024 License: MIT

Documentation

Constants

const (
	COCOPerson       = 0
	COCOBicycle      = 1
	COCOCar          = 2
	COCOMotorcycle   = 3
	COCOAirplane     = 4
	COCOBus          = 5
	COCOTrain        = 6
	COCOTruck        = 7
	COCOBoat         = 8
	COCOTrafficLight = 9
	COCOFireHydrant  = 10
	COCOStopSign     = 11
	COCOParkingMeter = 12
	COCOBench        = 13
	COCOBird         = 14
	COCOCat          = 15
	COCODog          = 16
)
const DefaultNmsIouThreshold = 0.45

const DefaultProbabilityThreshold = 0.5

Variables

var COCOClasses = []string{
	"person",
	"bicycle",
	"car",
	"motorcycle",
	"airplane",
	"bus",
	"train",
	"truck",
	"boat",
	"traffic light",
	"fire hydrant",
	"stop sign",
	"parking meter",
	"bench",
	"bird",
	"cat",
	"dog",
	"horse",
	"sheep",
	"cow",
	"elephant",
	"bear",
	"zebra",
	"giraffe",
	"backpack",
	"umbrella",
	"handbag",
	"tie",
	"suitcase",
	"frisbee",
	"skis",
	"snowboard",
	"sports ball",
	"kite",
	"baseball bat",
	"baseball glove",
	"skateboard",
	"surfboard",
	"tennis racket",
	"bottle",
	"wine glass",
	"cup",
	"fork",
	"knife",
	"spoon",
	"bowl",
	"banana",
	"apple",
	"sandwich",
	"orange",
	"broccoli",
	"carrot",
	"hot dog",
	"pizza",
	"donut",
	"cake",
	"chair",
	"couch",
	"potted plant",
	"bed",
	"dining table",
	"toilet",
	"tv",
	"laptop",
	"mouse",
	"remote",
	"keyboard",
	"cell phone",
	"microwave",
	"oven",
	"toaster",
	"sink",
	"refrigerator",
	"book",
	"clock",
	"vase",
	"scissors",
	"teddy bear",
	"hair drier",
	"toothbrush",
}

COCO class names. The COCO constants above are indices into this slice.
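
For example (assuming this package is imported as nn):

// cocoLabel maps a class constant to its label.
func cocoLabel() string {
	return nn.COCOClasses[nn.COCODog] // "dog"
}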

Functions

func LoadClassFile

func LoadClassFile(filename string) ([]string, error)

Load a text file with one class name per line
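
A minimal sketch ("classes.txt" is a hypothetical path):

func loadClasses() ([]string, error) {
	// The file contains one class name per line, eg "person\ncar\nbear".
	return nn.LoadClassFile("classes.txt")
}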

func MergeSimilarAbstractObjects

func MergeSimilarAbstractObjects(input []ProcessedObject, abstractClasses map[int]bool, minIoU float32) []int

Scan all pairs of objects in 'input'. If a pair has a high IoU, both objects are abstract, and their concrete classes differ, then merge them. For example: a small pickup might get detected by the NN as both a "car" and a "truck", with slightly different bounding boxes. This results in two detected objects: a car and a truck. After creating abstract classes, we'll have a car, a truck, and two vehicles. The goal of this function is to squash those two vehicles into a single vehicle. Returns the indices of the objects that should be retained.
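
A sketch of the car/truck-to-vehicle squash described above; the vehicleClass parameter and the 0.9 IoU threshold are assumptions for illustration:

func squashVehicles(objects []nn.ProcessedObject, vehicleClass int) []nn.ProcessedObject {
	// Mark which Class values are abstract (here, just the hypothetical "vehicle").
	abstract := map[int]bool{vehicleClass: true}
	// keep holds the indices of the objects that survive merging.
	keep := nn.MergeSimilarAbstractObjects(objects, abstract, 0.9)
	merged := make([]nn.ProcessedObject, 0, len(keep))
	for _, i := range keep {
		merged = append(merged, objects[i])
	}
	return merged
}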

func MergeSimilarObjects

func MergeSimilarObjects(input []ProcessedObject, mergeMap map[string]string, classes []string, minIoU float32) []int

Scan all pairs of objects in 'input'. If a pair has a high IoU and their classes are specified in 'mergeMap', then merge them into a single object. Returns the indices of the objects that should be retained.
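
A sketch, under the assumption that mergeMap maps one class name onto the class it should be merged with:

func mergeCarsAndTrucks(objects []nn.ProcessedObject, classes []string) []int {
	// Hypothetical rule: a "truck" overlapping a "car" merges with the car.
	mergeMap := map[string]string{"truck": "car"}
	return nn.MergeSimilarObjects(objects, mergeMap, classes, 0.9)
}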

Types

type DetectionParams

type DetectionParams struct {
	ProbabilityThreshold float32 // Value between 0 and 1. Lower values will find more objects. Zero value will use the default.
	NmsIouThreshold      float32 // Value between 0 and 1. Lower values will merge more objects together into one. Zero value will use the default.
	Unclipped            bool    // If true, don't clip boxes to the neural network boundaries
}

NN object detection parameters

func NewDetectionParams

func NewDetectionParams() *DetectionParams

Create a default DetectionParams object
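
A minimal sketch of tweaking the defaults (the 0.7 threshold is an arbitrary choice):

func strictParams() *nn.DetectionParams {
	params := nn.NewDetectionParams()
	// Raise the bar above DefaultProbabilityThreshold (0.5) to cut false positives.
	params.ProbabilityThreshold = 0.7
	return params
}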

type DetectionResult

type DetectionResult struct {
	CameraID    int64             `json:"cameraID"`
	ImageWidth  int               `json:"imageWidth"`
	ImageHeight int               `json:"imageHeight"`
	Objects     []ObjectDetection `json:"objects"`
	FramePTS    time.Time         `json:"framePTS"`
}

Results of an NN object detection run

type ImageBatch

type ImageBatch struct {
	BatchSize   int    // Number of images in this batch
	BatchStride int    // Number of bytes between each image
	Width       int    // Image width
	Height      int    // Image height
	Stride      int    // Image stride (bytes from one row to the next)
	NChan       int    // Number of channels (eg 3 for RGB)
	Pixels      []byte // The images
}

ImageBatch is 1 or more images sent to a Neural Network

func MakeImageBatch

func MakeImageBatch(batchSize, batchStride, width, height, nchan, stride int, pixels []byte) ImageBatch

Set up an ImageBatch struct for 1 or more images

func MakeImageBatchSingle

func MakeImageBatchSingle(width, height, nchan, stride int, pixels []byte) ImageBatch

Set up an ImageBatch struct for a single image

func (*ImageBatch) Image

func (b *ImageBatch) Image(i int) ImageCrop
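
A sketch that wraps a single tightly packed RGB image as a batch and reads it back as a crop:

func singleImageBatch(width, height int, rgb []byte) nn.ImageCrop {
	// Tightly packed RGB: 3 channels, stride = width * 3 bytes per row.
	batch := nn.MakeImageBatchSingle(width, height, 3, width*3, rgb)
	// Image(0) returns the first (and only) image in the batch.
	return batch.Image(0)
}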

type ImageCrop

type ImageCrop struct {
	NChan       int    // Number of channels (eg 3 for RGB)
	Pixels      []byte // The whole image
	ImageWidth  int    // The width of the original image, held in Pixels
	ImageHeight int    // The height of the original image, held in Pixels
	CropX       int    // Origin of crop X
	CropY       int    // Origin of crop Y
	CropWidth   int    // The width of this crop
	CropHeight  int    // The height of this crop
}

ImageCrop is a crop of an image. In C we would represent this as a pointer and a stride, but since that's not memory safe, we carry the whole image along with the crop bounds. Once we get into the C world for NN inference, we can use pointers and strides. To create an ImageCrop, start with WholeImage(), and then use Crop() to get a sub-crop.
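
For example (the quarter-inset crop is arbitrary):

func centerCrop(nchan, width, height int, pixels []byte) nn.ImageCrop {
	whole := nn.WholeImage(nchan, pixels, width, height)
	// Take the middle of the image; Crop panics if the bounds are invalid.
	return whole.Crop(width/4, height/4, width*3/4, height*3/4)
}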

func WholeImage

func WholeImage(nchan int, pixels []byte, width, height int) ImageCrop

Return a 'crop' of the entire image

func (ImageCrop) Crop

func (c ImageCrop) Crop(x1, y1, x2, y2 int) ImageCrop

Return a crop of the crop (new crop is relative to existing). If any parameter is out of bounds, we panic

func (ImageCrop) Pointer

func (c ImageCrop) Pointer() unsafe.Pointer

Return a pointer to the start of the crop

func (ImageCrop) Stride

func (c ImageCrop) Stride() int

func (ImageCrop) ToBatch

func (c ImageCrop) ToBatch() ImageBatch

Return an ImageBatch containing this image

type ImageLabels

type ImageLabels struct {
	Frame   int               `json:"frame,omitempty"` // For video, this is the frame number
	Objects []ObjectDetection `json:"objects"`
}

type InferenceOptions

type InferenceOptions struct {
	MinSize        int      // Minimum size of object, in pixels. If max(width, height) >= MinSize, then use the object
	MaxVideoHeight int      // If video height is larger than this, then scale it down to this size (0 = no scaling)
	StartFrame     int      // Start processing at frame (0 = start at beginning)
	EndFrame       int      // Stop processing at frame (0 = process to end)
	Classes        []string // List of class names to detect (eg ["person", "car", "bear"]). Classes not included in the list are ignored.
	StdOutProgress bool     // Emit progress to stdout
}

NN analysis options for RunInferenceOnVideoFile

type ModelConfig

type ModelConfig struct {
	Architecture string   `json:"architecture"` // eg "yolov8"
	Width        int      `json:"width"`        // eg 320
	Height       int      `json:"height"`       // eg 256
	Classes      []string `json:"classes"`      // eg ["person", "bicycle", "car", ...]
}

ModelConfig is saved in a JSON file along with the weights of the NN model

func LoadModelConfig

func LoadModelConfig(filename string) (*ModelConfig, error)

Load model config from a JSON file
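
A minimal sketch ("model.json" is a hypothetical path to the config saved alongside the weights):

func loadConfig() (*nn.ModelConfig, error) {
	return nn.LoadModelConfig("model.json")
}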

type ModelSetup

type ModelSetup struct {
	BatchSize            int
	ProbabilityThreshold float32 // Same as nn.DetectionParams.ProbabilityThreshold
	NmsIouThreshold      float32 // Same as nn.DetectionParams.NmsIouThreshold
}

This was created for the Hailo accelerator interface. It overlaps heavily with DetectionParams.

func NewModelSetup

func NewModelSetup() *ModelSetup

type ObjectDetection

type ObjectDetection struct {
	Class      int     `json:"class"`
	Confidence float32 `json:"confidence"`
	Box        Rect    `json:"box"`
}

ObjectDetection is an object that a neural network has found in an image

func TiledInference

func TiledInference(model ObjectDetector, img ImageCrop, _params *DetectionParams, nThreads int) ([]ObjectDetection, error)

Run tiled inference on the image. We look at the width and height of the model; if the image is larger, we split the image up into tiles, run each tile through the model, and then merge the per-tile detections back into a single result set. If the model is larger than the image, then we just run the model directly, so it is safe to call TiledInference on any image without incurring any performance loss.
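
A minimal sketch (the thread count of 4 is an arbitrary choice):

func detectTiled(model nn.ObjectDetector, img nn.ImageCrop) ([]nn.ObjectDetection, error) {
	// Default params; tiling only happens if img exceeds the model size.
	return nn.TiledInference(model, img, nn.NewDetectionParams(), 4)
}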

type ObjectDetector

type ObjectDetector interface {
	// Close closes the detector (you MUST call this when finished, because it's a C++ object underneath)
	Close()

	// DetectObjects returns a list of objects detected in the batch of images.
	// nchan is expected to be 3, and batch is a batch of 24-bit RGB images.
	// You can create a default DetectionParams with NewDetectionParams()
	DetectObjects(batch ImageBatch, params *DetectionParams) ([][]ObjectDetection, error)

	// Model Config.
	// Callers assume that ModelConfig will remain constant, so don't change it
	// once the detector has been created.
	Config() *ModelConfig
}

ObjectDetector is given an image, and returns zero or more detected objects
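
A sketch of a single detection pass over one image (how the detector itself is constructed is implementation-specific and not shown here):

func detectOnce(det nn.ObjectDetector, img nn.ImageCrop) ([]nn.ObjectDetection, error) {
	// Close is mandatory: the detector wraps a C++ object.
	defer det.Close()
	results, err := det.DetectObjects(img.ToBatch(), nn.NewDetectionParams())
	if err != nil {
		return nil, err
	}
	// One slice of detections per image in the batch.
	return results[0], nil
}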

type Point

type Point struct {
	X int32 `json:"x"`
	Y int32 `json:"y"`
}

func (Point) Distance

func (p Point) Distance(b Point) float32

type ProcessedObject

type ProcessedObject struct {
	Raw   ObjectDetection // Raw NN output
	Class int             // If this is an abstract class (eg "vehicle"), then it will be different from Raw.Class (eg "car" or "truck")
}

ProcessedObject is an ObjectDetection that has undergone some post-processing

type Rect

type Rect struct {
	X      int32 `json:"x"`
	Y      int32 `json:"y"`
	Width  int32 `json:"width"`
	Height int32 `json:"height"`
}

func MakeRect

func MakeRect(x, y, width, height int) Rect

func (Rect) Area

func (r Rect) Area() int

func (Rect) Center

func (r Rect) Center() Point

func (Rect) IOU

func (r Rect) IOU(b Rect) float32

Intersection over Union
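
For example:

func iouExample() float32 {
	a := nn.MakeRect(0, 0, 100, 100)
	b := nn.MakeRect(50, 0, 100, 100) // covers the right half of a
	// Intersection = 5000, union = 15000, so IoU = 1/3.
	return a.IOU(b)
}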

func (Rect) Intersection

func (r Rect) Intersection(b Rect) Rect

func (*Rect) MaxDelta

func (r *Rect) MaxDelta(b Rect) int

func (*Rect) Offset

func (r *Rect) Offset(dx, dy int)

func (*Rect) String

func (r *Rect) String() string

func (Rect) Union

func (r Rect) Union(b Rect) Rect

func (Rect) X2

func (r Rect) X2() int32

func (Rect) Y2

func (r Rect) Y2() int32

type ResizeTransform

type ResizeTransform struct {
	OffsetX int32
	OffsetY int32
	ScaleX  float32
	ScaleY  float32
}

ResizeTransform expresses a transformation that we've made on an image (eg resizing, or resizing + moving). When applying forward, we first scale and then offset.
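
A sketch, assuming the forward direction maps original-image coordinates to the resized image, so ApplyBackward maps detections from NN space back to original-image space:

func backToOriginal(detections []nn.ObjectDetection, scaleX, scaleY float32) {
	// Pure scaling, no offset (ie no letterboxing).
	t := nn.ResizeTransform{ScaleX: scaleX, ScaleY: scaleY}
	t.ApplyBackward(detections) // modifies the detections in place
}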

func IdentityResizeTransform

func IdentityResizeTransform() ResizeTransform

func (*ResizeTransform) ApplyBackward

func (r *ResizeTransform) ApplyBackward(detections []ObjectDetection)

func (*ResizeTransform) ApplyForward

func (r *ResizeTransform) ApplyForward(detections []ObjectDetection)

type ThreadingMode

type ThreadingMode int
const (
	ThreadingModeSingle   ThreadingMode = iota // Force the NN library to run inference on a single thread
	ThreadingModeParallel                      // Allow the NN library to run multiple threads while executing a model
)

type VideoLabels

type VideoLabels struct {
	Classes []string       `json:"classes"`
	Frames  []*ImageLabels `json:"frames"`
	Width   int            `json:"width"`  // Image width. Useful when inference is run at different resolution to original image
	Height  int            `json:"height"` // Image height. Useful when inference is run at different resolution to original image
}

VideoLabels contains labels for each video frame

func RunInferenceOnVideoFile

func RunInferenceOnVideoFile(model ObjectDetector, inputFile string, options InferenceOptions) (*VideoLabels, error)

Run NN inference on every frame of a video
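
A minimal sketch (the MinSize and Classes choices are arbitrary):

func labelVideo(model nn.ObjectDetector, videoPath string) (*nn.VideoLabels, error) {
	options := nn.InferenceOptions{
		MinSize: 32,                        // skip objects smaller than 32 px
		Classes: []string{"person", "car"}, // ignore all other classes
	}
	return nn.RunInferenceOnVideoFile(model, videoPath, options)
}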
