Documentation ¶
Index ¶
- Constants
- Variables
- func LoadClassFile(filename string) ([]string, error)
- func MergeSimilarAbstractObjects(input []ProcessedObject, abstractClasses map[int]bool, minIoU float32) []int
- func MergeSimilarObjects(input []ProcessedObject, mergeMap map[string]string, classes []string, ...) []int
- type DetectionParams
- type DetectionResult
- type ImageBatch
- type ImageCrop
- type ImageLabels
- type InferenceOptions
- type ModelConfig
- type ModelSetup
- type ObjectDetection
- type ObjectDetector
- type Point
- type ProcessedObject
- type Rect
- func (r Rect) Area() int
- func (r Rect) Center() Point
- func (r Rect) IOU(b Rect) float32
- func (r Rect) Intersection(b Rect) Rect
- func (r *Rect) MaxDelta(b Rect) int
- func (r *Rect) Offset(dx, dy int)
- func (r *Rect) String() string
- func (r Rect) Union(b Rect) Rect
- func (r Rect) X2() int32
- func (r Rect) Y2() int32
- type ResizeTransform
- type ThreadingMode
- type VideoLabels
Constants ¶
const (
	COCOPerson       = 0
	COCOBicycle      = 1
	COCOCar          = 2
	COCOMotorcycle   = 3
	COCOAirplane     = 4
	COCOBus          = 5
	COCOTrain        = 6
	COCOTruck        = 7
	COCOBoat         = 8
	COCOTrafficLight = 9
	COCOFireHydrant  = 10
	COCOStopSign     = 11
	COCOParkingMeter = 12
	COCOBench        = 13
	COCOBird         = 14
	COCOCat          = 15
	COCODog          = 16
)
const DefaultNmsIouThreshold = 0.45
const DefaultProbabilityThreshold = 0.5
Variables ¶
var COCOClasses = []string{
"person",
"bicycle",
"car",
"motorcycle",
"airplane",
"bus",
"train",
"truck",
"boat",
"traffic light",
"fire hydrant",
"stop sign",
"parking meter",
"bench",
"bird",
"cat",
"dog",
"horse",
"sheep",
"cow",
"elephant",
"bear",
"zebra",
"giraffe",
"backpack",
"umbrella",
"handbag",
"tie",
"suitcase",
"frisbee",
"skis",
"snowboard",
"sports ball",
"kite",
"baseball bat",
"baseball glove",
"skateboard",
"surfboard",
"tennis racket",
"bottle",
"wine glass",
"cup",
"fork",
"knife",
"spoon",
"bowl",
"banana",
"apple",
"sandwich",
"orange",
"broccoli",
"carrot",
"hot dog",
"pizza",
"donut",
"cake",
"chair",
"couch",
"potted plant",
"bed",
"dining table",
"toilet",
"tv",
"laptop",
"mouse",
"remote",
"keyboard",
"cell phone",
"microwave",
"oven",
"toaster",
"sink",
"refrigerator",
"book",
"clock",
"vase",
"scissors",
"teddy bear",
"hair drier",
"toothbrush",
}
COCO classes
Functions ¶
func LoadClassFile ¶
func LoadClassFile(filename string) ([]string, error)
Load a text file with one class name per line.
func MergeSimilarAbstractObjects ¶
func MergeSimilarAbstractObjects(input []ProcessedObject, abstractClasses map[int]bool, minIoU float32) []int
Scan all pairs of objects in 'input', and if two objects have a high IoU, both are abstract objects, and their concrete classes differ, then merge them. For example, a small pickup might be detected by the NN as both a "car" and a "truck", with slightly different bounding boxes. This results in two detected objects: a car and a truck. After creating abstract classes, we'll have car, truck, and two vehicles. The goal of this function is to squash those two vehicles into a single vehicle. Returns the indices of the objects that should be retained.
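For illustration, a minimal sketch of applying the returned indices, assuming 'objects' and 'abstractClasses' were built earlier and using a placeholder threshold of 0.7:

keep := MergeSimilarAbstractObjects(objects, abstractClasses, 0.7)
merged := make([]ProcessedObject, 0, len(keep))
for _, i := range keep {
	merged = append(merged, objects[i]) // retain only the surviving objects
}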
func MergeSimilarObjects ¶
func MergeSimilarObjects(input []ProcessedObject, mergeMap map[string]string, classes []string, minIoU float32) []int
Scan all pairs of objects in 'input', and if they have a high IoU and their classes are specified in 'mergeMap', then merge them into a single object. Returns the indices of the objects that should be retained.
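A similar sketch for the concrete variant. The exact semantics of mergeMap are an assumption here: we read it as mapping a class name onto the class it should merge into, with "truck"→"car" as a purely illustrative entry:

mergeMap := map[string]string{"truck": "car"} // hypothetical: fold overlapping trucks into cars
keep := MergeSimilarObjects(objects, mergeMap, COCOClasses, 0.7)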
Types ¶
type DetectionParams ¶
type DetectionParams struct {
	ProbabilityThreshold float32 // Value between 0 and 1. Lower values will find more objects. Zero value will use the default.
	NmsIouThreshold      float32 // Value between 0 and 1. Lower values will merge more objects together into one. Zero value will use the default.
	Unclipped            bool    // If true, don't clip boxes to the neural network boundaries
}
NN object detection parameters
func NewDetectionParams ¶
func NewDetectionParams() *DetectionParams
Create a default DetectionParams object
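A short usage sketch; the values below are illustrative, not recommendations:

params := NewDetectionParams()
params.ProbabilityThreshold = 0.4 // below the 0.5 default, so more (less confident) objects are found
params.NmsIouThreshold = 0.5      // above the 0.45 default, so fewer overlapping boxes are merged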
type DetectionResult ¶
type DetectionResult struct {
	CameraID    int64             `json:"cameraID"`
	ImageWidth  int               `json:"imageWidth"`
	ImageHeight int               `json:"imageHeight"`
	Objects     []ObjectDetection `json:"objects"`
	FramePTS    time.Time         `json:"framePTS"`
}
Results of an NN object detection run
type ImageBatch ¶
type ImageBatch struct {
	BatchSize   int    // Number of images in this batch
	BatchStride int    // Number of bytes between each image
	Width       int    // Image width
	Height      int    // Image height
	Stride      int    // Image stride (bytes from one row to the next)
	NChan       int    // Number of channels (eg 3 for RGB)
	Pixels      []byte // The images
}
ImageBatch is 1 or more images sent to a Neural Network
func MakeImageBatch ¶
func MakeImageBatch(batchSize, batchStride, width, height, nchan, stride int, pixels []byte) ImageBatch
Set up an ImageBatch struct for 1 or more images
func MakeImageBatchSingle ¶
func MakeImageBatchSingle(width, height, nchan, stride int, pixels []byte) ImageBatch
Set up an ImageBatch struct for a single image
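For example, a tightly packed RGB image (stride = width * nchan) could be wrapped like this; the dimensions are placeholders:

width, height, nchan := 640, 480, 3
stride := width * nchan               // no padding between rows
pixels := make([]byte, height*stride) // RGB pixel data goes here
batch := MakeImageBatchSingle(width, height, nchan, stride, pixels)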
func (*ImageBatch) Image ¶
func (b *ImageBatch) Image(i int) ImageCrop
type ImageCrop ¶
type ImageCrop struct {
	NChan       int    // Number of channels (eg 3 for RGB)
	Pixels      []byte // The whole image
	ImageWidth  int    // The width of the original image, held in Pixels
	ImageHeight int    // The height of the original image, held in Pixels
	CropX       int    // Origin of crop X
	CropY       int    // Origin of crop Y
	CropWidth   int    // The width of this crop
	CropHeight  int    // The height of this crop
}
ImageCrop is a crop of an image. In C we would represent this as a pointer and a stride, but since that's not memory safe, we must resort to this kind of thing. Once we get into the C world for NN inference, then we can use strides etc. To create an ImageCrop, start with WholeImage(), and then use Crop() to get a sub-crop.
func WholeImage ¶
Return a 'crop' of the entire image
func (ImageCrop) Crop ¶
Return a crop of the crop (the new crop is relative to the existing one). If any parameter is out of bounds, we panic.
func (ImageCrop) ToBatch ¶
func (c ImageCrop) ToBatch() ImageBatch
Return an ImageBatch containing this image
type ImageLabels ¶
type ImageLabels struct {
	Frame   int               `json:"frame,omitempty"` // For video, this is the frame number
	Objects []ObjectDetection `json:"objects"`
}
type InferenceOptions ¶
type InferenceOptions struct {
	MinSize        int      // Minimum size of object, in pixels. If max(width, height) >= MinSize, then use the object
	MaxVideoHeight int      // If video height is larger than this, then scale it down to this size (0 = no scaling)
	StartFrame     int      // Start processing at frame (0 = start at beginning)
	EndFrame       int      // Stop processing at frame (0 = process to end)
	Classes        []string // List of class names to detect (eg ["person", "car", "bear"]). Classes not included in the list are ignored.
	StdOutProgress bool     // Emit progress to stdout
}
NN analysis options for RunInferenceOnVideoFile
type ModelConfig ¶
type ModelConfig struct {
	Architecture string   `json:"architecture"` // eg "yolov8"
	Width        int      `json:"width"`        // eg 320
	Height       int      `json:"height"`       // eg 256
	Classes      []string `json:"classes"`      // eg ["person", "bicycle", "car", ...]
}
ModelConfig is saved in a JSON file along with the weights of the NN model
func LoadModelConfig ¶
func LoadModelConfig(filename string) (*ModelConfig, error)
Load model config from a JSON file
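A typical call might look like this (the filename is hypothetical):

cfg, err := LoadModelConfig("models/yolov8s.json")
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%s %dx%d, %d classes\n", cfg.Architecture, cfg.Width, cfg.Height, len(cfg.Classes))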
type ModelSetup ¶
type ModelSetup struct {
	BatchSize            int
	ProbabilityThreshold float32 // Same as nn.DetectionParams.ProbabilityThreshold
	NmsIouThreshold      float32 // Same as nn.DetectionParams.NmsIouThreshold
}
This was created for the Hailo accelerator interface. Too much overlap with DetectionParams!!!
func NewModelSetup ¶
func NewModelSetup() *ModelSetup
type ObjectDetection ¶
type ObjectDetection struct {
	Class      int     `json:"class"`
	Confidence float32 `json:"confidence"`
	Box        Rect    `json:"box"`
}
ObjectDetection is an object that a neural network has found in an image
func TiledInference ¶
func TiledInference(model ObjectDetector, img ImageCrop, _params *DetectionParams, nThreads int) ([]ObjectDetection, error)
Run tiled inference on the image. We look at the width and height of the model, and if the image is larger, then we split the image up into tiles and run each tile through the model. We then merge the detections from all tiles back into a single result. If the model is larger than the image, then we just run the model directly, so it is safe to call TiledInference on any image without incurring a performance penalty.
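A sketch of running tiled inference over a full image, building the ImageCrop directly from its documented fields. 'detector', 'pixels', 'width' and 'height' are assumed to exist already:

img := ImageCrop{
	NChan:       3,
	Pixels:      pixels, // tightly packed 24-bit RGB
	ImageWidth:  width,
	ImageHeight: height,
	CropWidth:   width, // crop covers the whole image
	CropHeight:  height,
}
objects, err := TiledInference(detector, img, NewDetectionParams(), 4) // 4 worker threads (illustrative)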
type ObjectDetector ¶
type ObjectDetector interface {
	// Close closes the detector (you MUST call this when finished, because it's a C++ object underneath)
	Close()

	// DetectObjects returns a list of objects detected in the batch of images.
	// nchan is expected to be 3, and batch is a batch of 24-bit RGB images.
	// You can create a default DetectionParams with NewDetectionParams()
	DetectObjects(batch ImageBatch, params *DetectionParams) ([][]ObjectDetection, error)

	// Model Config.
	// Callers assume that ModelConfig will remain constant, so don't change it
	// once the detector has been created.
	Config() *ModelConfig
}
ObjectDetector is given an image, and returns zero or more detected objects
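A usage sketch, assuming 'detector' was created by a backend-specific constructor that this page doesn't cover:

defer detector.Close() // required: the detector wraps a C++ object

batch := MakeImageBatchSingle(width, height, 3, width*3, rgbPixels)
results, err := detector.DetectObjects(batch, NewDetectionParams())
if err != nil {
	log.Fatal(err)
}
for _, obj := range results[0] { // one slice of detections per image in the batch
	fmt.Println(detector.Config().Classes[obj.Class], obj.Confidence, obj.Box)
}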
type ProcessedObject ¶
type ProcessedObject struct {
	Raw   ObjectDetection // Raw NN output
	Class int             // If this is an abstract class (eg "vehicle"), then it will be different from Raw.Class (eg "car" or "truck")
}
ProcessedObject is an ObjectDetection that has undergone some post-processing
type Rect ¶
type Rect struct {
	X      int32 `json:"x"`
	Y      int32 `json:"y"`
	Width  int32 `json:"width"`
	Height int32 `json:"height"`
}
func (Rect) Intersection ¶
func (r Rect) Intersection(b Rect) Rect
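The geometry helpers listed in the index compose in the obvious way; for example (values are arbitrary):

a := Rect{X: 10, Y: 10, Width: 100, Height: 50}
b := Rect{X: 40, Y: 20, Width: 100, Height: 50}
inter := a.Intersection(b) // overlapping region
union := a.Union(b)        // bounding box covering both
iou := a.IOU(b)            // intersection area / union area, in [0, 1]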
type ResizeTransform ¶
ResizeTransform expresses a transformation that we've made on an image (eg resizing, or resizing + moving). When applying forward, we first scale and then offset.
func IdentityResizeTransform ¶
func IdentityResizeTransform() ResizeTransform
func (*ResizeTransform) ApplyBackward ¶
func (r *ResizeTransform) ApplyBackward(detections []ObjectDetection)
func (*ResizeTransform) ApplyForward ¶
func (r *ResizeTransform) ApplyForward(detections []ObjectDetection)
type ThreadingMode ¶
type ThreadingMode int
const (
	ThreadingModeSingle   ThreadingMode = iota // Force the NN library to run inference on a single thread
	ThreadingModeParallel                      // Allow the NN library to run multiple threads while executing a model
)
type VideoLabels ¶
type VideoLabels struct {
	Classes []string       `json:"classes"`
	Frames  []*ImageLabels `json:"frames"`
	Width   int            `json:"width"`  // Image width. Useful when inference is run at different resolution to original image
	Height  int            `json:"height"` // Image height. Useful when inference is run at different resolution to original image
}
VideoLabels contains labels for each video frame
func RunInferenceOnVideoFile ¶
func RunInferenceOnVideoFile(model ObjectDetector, inputFile string, options InferenceOptions) (*VideoLabels, error)
Run NN inference on every frame of a video
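Putting the pieces together, a sketch of labelling a whole video; 'detector' is assumed to exist, and the filename is a placeholder:

options := InferenceOptions{
	MinSize:        20,                        // skip objects smaller than 20 pixels
	MaxVideoHeight: 720,                       // downscale larger frames before inference
	Classes:        []string{"person", "car"}, // ignore all other classes
	StdOutProgress: true,
}
labels, err := RunInferenceOnVideoFile(detector, "camera1.mp4", options)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("labelled %d frames\n", len(labels.Frames))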