Documentation ¶
Overview ¶
Package vision provides a client for the Google Cloud Vision API.
Google Cloud Vision allows easy integration of vision detection features into developer applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content. For more information about Cloud Vision, read the Google Cloud Vision API Documentation at https://cloud.google.com/vision/docs.
Creating Images ¶
The Cloud Vision API supports a variety of image file formats, including JPEG, PNG8, PNG24, Animated GIF (first frame only), and RAW. See https://cloud.google.com/vision/docs/image-best-practices#image_types for the complete list of formats. Be aware that Cloud Vision sets upper limits on file size as well as on the total combined size of all images in a request. Reducing your file size can significantly improve throughput; however, be careful not to reduce image quality in the process. See https://cloud.google.com/vision/docs/image-best-practices#image_sizing for current file size limits.
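Before uploading, a cheap local check can catch files that are empty, too large, or not in a recognizable format. The sketch below is not part of this package. It uses the standard library's net/http.DetectContentType to sniff the type and compares the size against maxImageBytes, a hypothetical placeholder rather than the documented limit. The check is deliberately narrow: formats the API accepts but DetectContentType does not recognize, such as RAW, would be rejected.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// maxImageBytes is a hypothetical limit for illustration only; look up the
// current limits on the image-sizing page before relying on any number.
const maxImageBytes = 10 << 20 // 10 MiB (assumed, not an official value)

// checkImage is a local pre-flight check: it rejects empty or oversized
// payloads and accepts only a few formats that net/http can sniff.
func checkImage(data []byte) (string, error) {
	if len(data) == 0 {
		return "", errors.New("empty image")
	}
	if len(data) > maxImageBytes {
		return "", fmt.Errorf("image is %d bytes, over the assumed limit of %d", len(data), maxImageBytes)
	}
	mime := http.DetectContentType(data)
	switch mime {
	case "image/jpeg", "image/png", "image/gif":
		return mime, nil
	default:
		return mime, fmt.Errorf("unsupported or unrecognized type %q", mime)
	}
}

func main() {
	// The 8-byte PNG signature is enough for DetectContentType to recognize PNG.
	png := []byte("\x89PNG\r\n\x1a\n")
	mime, err := checkImage(png)
	fmt.Println(mime, err)
}
```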
Creating an Image instance does not perform an API request.
Use NewImageFromReader to obtain an image from any io.ReadCloser, such as an open file. NewImageFromReader closes the reader after reading it:
	f, err := os.Open("path/to/image.jpg")
	if err != nil { ... }
	defer f.Close() // harmless even though NewImageFromReader also closes f
	img, err := vision.NewImageFromReader(f)
	if err != nil { ... }
Use NewImageFromGCS to refer to an image in Google Cloud Storage:
img := vision.NewImageFromGCS("gs://my-bucket/my-image.png")
Annotating Images ¶
Client.Annotate is the most general method in the package. It can run multiple detections on multiple images with a single API call.
To describe the detections you want to perform on an image, create an AnnotateRequest and specify the maximum number of results to return for each detection of interest. The exceptions are safe search and image properties, where a boolean is used instead.
	resultSlice, err := client.Annotate(ctx, &vision.AnnotateRequest{
		Image:      img,
		MaxLogos:   5,
		MaxTexts:   100,
		SafeSearch: true,
	})
	if err != nil { ... }
You can pass multiple AnnotateRequests to Client.Annotate in a single call. The return value is a slice of *Annotations, one for each request. Each Annotations value may contain an Error along with one or more successful results; the field for each failed detection is nil.
	result := resultSlice[0]
	if result.Error != nil { ... } // some detections failed
	for _, logo := range result.Logos { ... }
	for _, text := range result.Texts { ... }
	if result.SafeSearch != nil { ... }
Other methods on Client run a single detection on a single image. For instance, Client.DetectFaces runs face detection on the provided Image. These methods return results of the appropriate type directly (for example, DetectFaces returns []*FaceAnnotation and DetectSafeSearch returns *SafeSearchAnnotation). The error return value incorporates both API call errors and the detection errors stored in Annotations.Error, simplifying your logic.
	faces, err := client.DetectFaces(ctx, img, 10) // maximum of 10 faces
	if err != nil { ... }
Here faces is a slice of FaceAnnotations. The Face field of each FaceAnnotation provides easy access to the positions of facial features:
	fmt.Println(faces[0].Face.Nose.Tip)
	fmt.Println(faces[0].Face.Eyes.Left.Pupil)
This package is experimental and subject to API changes.
Index ¶
- Constants
- type AnnotateRequest
- type Annotations
- type Chin
- type Client
- func (c *Client) Annotate(ctx context.Context, requests ...*AnnotateRequest) ([]*Annotations, error)
- func (c *Client) Close() error
- func (c *Client) DetectFaces(ctx context.Context, img *Image, maxResults int) ([]*FaceAnnotation, error)
- func (c *Client) DetectImageProps(ctx context.Context, img *Image) (*ImageProps, error)
- func (c *Client) DetectLabels(ctx context.Context, img *Image, maxResults int) ([]*EntityAnnotation, error)
- func (c *Client) DetectLandmarks(ctx context.Context, img *Image, maxResults int) ([]*EntityAnnotation, error)
- func (c *Client) DetectLogos(ctx context.Context, img *Image, maxResults int) ([]*EntityAnnotation, error)
- func (c *Client) DetectSafeSearch(ctx context.Context, img *Image) (*SafeSearchAnnotation, error)
- func (c *Client) DetectTexts(ctx context.Context, img *Image, maxResults int) ([]*EntityAnnotation, error)
- type ColorInfo
- type Ears
- type EntityAnnotation
- type Eye
- type Eyebrow
- type Eyebrows
- type Eyes
- type FaceAnnotation
- type FaceLandmarks
- type FaceLikelihoods
- type Image
- type ImageProps
- type LatLng
- type LatLngRect
- type Likelihood
- type Mouth
- type Nose
- type Property
- type SafeSearchAnnotation
Constants ¶
const (
	// LikelihoodUnknown means the likelihood is unknown.
	LikelihoodUnknown = Likelihood(pb.Likelihood_UNKNOWN)

	// VeryUnlikely means the image is very unlikely to belong to the feature specified.
	VeryUnlikely = Likelihood(pb.Likelihood_VERY_UNLIKELY)

	// Unlikely means the image is unlikely to belong to the feature specified.
	Unlikely = Likelihood(pb.Likelihood_UNLIKELY)

	// Possible means the image possibly belongs to the feature specified.
	Possible = Likelihood(pb.Likelihood_POSSIBLE)

	// Likely means the image is likely to belong to the feature specified.
	Likely = Likelihood(pb.Likelihood_LIKELY)

	// VeryLikely means the image is very likely to belong to the feature specified.
	VeryLikely = Likelihood(pb.Likelihood_VERY_LIKELY)
)
const Scope = "https://www.googleapis.com/auth/cloud-platform"
Scope is the OAuth2 scope required by the Google Cloud Vision API.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AnnotateRequest ¶
type AnnotateRequest struct {
	// Image is the image to annotate.
	Image *Image
	// MaxFaces is the maximum number of faces to detect in the image.
	// Specifying a number greater than zero enables face detection.
	MaxFaces int
	// MaxLandmarks is the maximum number of landmarks to detect in the image.
	// Specifying a number greater than zero enables landmark detection.
	MaxLandmarks int
	// MaxLogos is the maximum number of logos to detect in the image.
	// Specifying a number greater than zero enables logo detection.
	MaxLogos int
	// MaxLabels is the maximum number of labels to detect in the image.
	// Specifying a number greater than zero enables label detection.
	MaxLabels int
	// MaxTexts is the maximum number of separate pieces of text to detect in the
	// image. Specifying a number greater than zero enables text detection.
	MaxTexts int
	// SafeSearch specifies whether a safe-search detection should be run on the image.
	SafeSearch bool
	// ImageProps specifies whether image properties should be obtained for the image.
	ImageProps bool
}
An AnnotateRequest specifies an image to annotate and the features to look for in that image.
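Which detections a request enables follows directly from the field comments: any Max* field greater than zero turns that detection on, while SafeSearch and ImageProps are booleans. The sketch below makes that rule explicit using annotateRequest, a hypothetical local mirror of the feature fields (not the package type; Image is omitted because it plays no part here).

```go
package main

import "fmt"

// annotateRequest is a hypothetical mirror of AnnotateRequest's feature fields.
type annotateRequest struct {
	MaxFaces, MaxLandmarks, MaxLogos, MaxLabels, MaxTexts int
	SafeSearch, ImageProps                                bool
}

// enabledFeatures lists the detections a request turns on: a Max* count
// greater than zero enables that detection; the two booleans are switches.
func enabledFeatures(r annotateRequest) []string {
	var out []string
	counts := []struct {
		name string
		max  int
	}{
		{"faces", r.MaxFaces},
		{"landmarks", r.MaxLandmarks},
		{"logos", r.MaxLogos},
		{"labels", r.MaxLabels},
		{"texts", r.MaxTexts},
	}
	for _, c := range counts {
		if c.max > 0 {
			out = append(out, c.name)
		}
	}
	if r.SafeSearch {
		out = append(out, "safe search")
	}
	if r.ImageProps {
		out = append(out, "image properties")
	}
	return out
}

func main() {
	// Same settings as the overview example: logos, texts and safe search.
	fmt.Println(enabledFeatures(annotateRequest{MaxLogos: 5, MaxTexts: 100, SafeSearch: true}))
}
```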
type Annotations ¶
type Annotations struct {
	// Faces holds the results of face detection.
	Faces []*FaceAnnotation
	// Landmarks holds the results of landmark detection.
	Landmarks []*EntityAnnotation
	// Logos holds the results of logo detection.
	Logos []*EntityAnnotation
	// Labels holds the results of label detection.
	Labels []*EntityAnnotation
	// Texts holds the results of text detection.
	Texts []*EntityAnnotation
	// SafeSearch holds the results of safe-search detection.
	SafeSearch *SafeSearchAnnotation
	// ImageProps contains properties of the annotated image.
	ImageProps *ImageProps

	// If non-nil, then one or more of the attempted annotations failed.
	// Non-nil annotations are guaranteed to be correct, even if Error is
	// non-nil.
	Error error
}
Annotations contains all the annotations performed by the API on a single image. A nil field indicates either that the corresponding feature was not requested, or that annotation failed for that feature.
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client is a Google Cloud Vision API client.
func NewClient ¶
NewClient creates a new vision client.
Example ¶
package main

import (
	"cloud.google.com/go/vision"
	"golang.org/x/net/context"
)

func main() {
	ctx := context.Background()
	client, err := vision.NewClient(ctx)
	if err != nil {
		// TODO: handle error.
	}
	// Use the client.

	// Close the client when finished.
	if err := client.Close(); err != nil {
		// TODO: handle error.
	}
}
func (*Client) Annotate ¶
func (c *Client) Annotate(ctx context.Context, requests ...*AnnotateRequest) ([]*Annotations, error)
Annotate annotates multiple images, each with a potentially different set of features.
Example (OneImage) ¶
package main

import (
	"fmt"

	"cloud.google.com/go/vision"
	"golang.org/x/net/context"
)

func main() {
	ctx := context.Background()
	client, err := vision.NewClient(ctx)
	if err != nil {
		// TODO: handle error.
	}
	annsSlice, err := client.Annotate(ctx, &vision.AnnotateRequest{
		Image:      vision.NewImageFromGCS("gs://my-bucket/my-image.png"),
		MaxLogos:   100,
		MaxTexts:   100,
		SafeSearch: true,
	})
	if err != nil {
		// TODO: handle error.
	}
	anns := annsSlice[0]
	if anns.Logos != nil {
		fmt.Println(anns.Logos)
	}
	if anns.Texts != nil {
		fmt.Println(anns.Texts)
	}
	if anns.SafeSearch != nil {
		fmt.Println(anns.SafeSearch)
	}
	if anns.Error != nil {
		fmt.Printf("at least one of the features failed: %v", anns.Error)
	}
}
func (*Client) DetectFaces ¶
func (c *Client) DetectFaces(ctx context.Context, img *Image, maxResults int) ([]*FaceAnnotation, error)
DetectFaces performs face detection on the image. At most maxResults results are returned.
Example ¶
package main

import (
	"fmt"

	"cloud.google.com/go/vision"
	"golang.org/x/net/context"
)

func main() {
	ctx := context.Background()
	client, err := vision.NewClient(ctx)
	if err != nil {
		// TODO: handle error.
	}
	img := vision.NewImageFromGCS("gs://my-bucket/my-image.png")
	faces, err := client.DetectFaces(ctx, img, 10)
	if err != nil {
		// TODO: handle error.
	}
	fmt.Println(faces[0].Face.Nose.Tip)
	fmt.Println(faces[0].Face.Eyes.Left.Pupil)
}
func (*Client) DetectImageProps ¶
func (c *Client) DetectImageProps(ctx context.Context, img *Image) (*ImageProps, error)
DetectImageProps computes properties of the image.
func (*Client) DetectLabels ¶
func (c *Client) DetectLabels(ctx context.Context, img *Image, maxResults int) ([]*EntityAnnotation, error)
DetectLabels performs label detection on the image. At most maxResults results are returned.
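Label results often need post-filtering. A minimal sketch, assuming a hypothetical stand-in entity type with only Description and Score, keeps labels whose Score reaches a caller-chosen minScore (an application threshold, not an API parameter) and orders them by Score, highest first.

```go
package main

import (
	"fmt"
	"sort"
)

// entity is a hypothetical stand-in for vision.EntityAnnotation.
type entity struct {
	Description string
	Score       float32 // overall score in [0, 1]
}

// topLabels returns the labels with Score >= minScore, highest score first.
func topLabels(labels []entity, minScore float32) []entity {
	var out []entity
	for _, l := range labels {
		if l.Score >= minScore {
			out = append(out, l)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	labels := []entity{{"sky", 0.62}, {"tower", 0.97}, {"crowd", 0.31}}
	for _, l := range topLabels(labels, 0.5) {
		fmt.Printf("%s %.2f\n", l.Description, l.Score)
	}
}
```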
func (*Client) DetectLandmarks ¶
func (c *Client) DetectLandmarks(ctx context.Context, img *Image, maxResults int) ([]*EntityAnnotation, error)
DetectLandmarks performs landmark detection on the image. At most maxResults results are returned.
func (*Client) DetectLogos ¶
func (c *Client) DetectLogos(ctx context.Context, img *Image, maxResults int) ([]*EntityAnnotation, error)
DetectLogos performs logo detection on the image. At most maxResults results are returned.
func (*Client) DetectSafeSearch ¶
func (c *Client) DetectSafeSearch(ctx context.Context, img *Image) (*SafeSearchAnnotation, error)
DetectSafeSearch performs safe-search detection on the image.
func (*Client) DetectTexts ¶
func (c *Client) DetectTexts(ctx context.Context, img *Image, maxResults int) ([]*EntityAnnotation, error)
DetectTexts performs text detection on the image. At most maxResults results are returned.
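The EntityAnnotation.BoundingPoly documentation describes the layout of text results: an entry for the entire detected text, followed by one entry per word. Assuming the slice returned by DetectTexts follows that order, the full text and the individual words can be separated as below. textEntity is a hypothetical stand-in for EntityAnnotation.

```go
package main

import "fmt"

// textEntity is a hypothetical stand-in for vision.EntityAnnotation.
type textEntity struct {
	Description string
}

// splitTexts assumes the first entry covers the whole detected text and
// the remaining entries are individual words.
func splitTexts(texts []textEntity) (full string, words []string) {
	if len(texts) == 0 {
		return "", nil
	}
	full = texts[0].Description
	for _, t := range texts[1:] {
		words = append(words, t.Description)
	}
	return full, words
}

func main() {
	full, words := splitTexts([]textEntity{{"HELLO WORLD"}, {"HELLO"}, {"WORLD"}})
	fmt.Printf("%q %q\n", full, words)
}
```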
type ColorInfo ¶
type ColorInfo struct {
	// RGB components of the color.
	Color color.NRGBA64

	// Score is the image-specific score for this color, in the range [0, 1].
	Score float32

	// PixelFraction is the fraction of pixels the color occupies in the image,
	// in the range [0, 1].
	PixelFraction float32
}
ColorInfo describes a color found in an image: its RGB components, its score, and the fraction of the image it occupies.
type EntityAnnotation ¶
type EntityAnnotation struct {
	// ID is an opaque entity ID. Some IDs might be available in Knowledge Graph (KG).
	// For more details on KG please see:
	// https://developers.google.com/knowledge-graph/
	ID string

	// Locale is the language code for the locale in which the entity textual
	// description (next field) is expressed.
	Locale string

	// Description is the entity textual description, expressed in the language of Locale.
	Description string

	// Score is the overall score of the result. Range [0, 1].
	Score float32

	// Confidence is the accuracy of the entity detection in an image.
	// For example, for an image containing the Eiffel Tower, this field represents
	// the confidence that there is a tower in the query image. Range [0, 1].
	Confidence float32

	// Topicality is the relevancy of the ICA (Image Content Annotation) label to the
	// image. For example, the relevancy of 'tower' to an image containing
	// 'Eiffel Tower' is likely higher than to an image containing a distant towering
	// building, though the confidence that there is a tower may be the same.
	// Range [0, 1].
	Topicality float32

	// BoundingPoly is the image region to which this entity belongs. Not filled currently
	// for label detection. For text detection, BoundingPolys
	// are produced for the entire text detected in an image region, followed by
	// BoundingPolys for each word within the detected text.
	BoundingPoly []image.Point

	// Locations contains the location information for the detected entity.
	// Multiple LatLng structs can be present since one location may indicate the
	// location of the scene in the query image, and another the location of the
	// place where the query image was taken. Location information is usually
	// present for landmarks.
	Locations []LatLng

	// Properties are additional optional Property fields.
	// For example a different kind of score or string that qualifies the entity.
	Properties []Property
}
An EntityAnnotation describes the results of a landmark, label, logo or text detection on an image.
type FaceAnnotation ¶
type FaceAnnotation struct {
	// BoundingPoly is the bounding polygon around the face. The coordinates of
	// the bounding box are in the original image's scale, as returned in
	// ImageParams. The bounding box is computed to "frame" the face in
	// accordance with human expectations. It is based on the landmarker
	// results. Note that one or more x and/or y coordinates may not be
	// generated in the BoundingPoly (the polygon will be unbounded) if only a
	// partial face appears in the image to be annotated.
	BoundingPoly []image.Point

	// FDBoundingPoly is tighter than BoundingPoly, and
	// encloses only the skin part of the face. Typically, it is used to
	// eliminate the face from any image analysis that detects the "amount of
	// skin" visible in an image. It is not based on the landmarker results, only
	// on the initial face detection, hence the fd (face detection) prefix.
	FDBoundingPoly []image.Point

	// Face holds the detected face landmarks.
	Face FaceLandmarks

	// RollAngle indicates the amount of clockwise/anti-clockwise rotation of
	// the face relative to the image vertical, about the axis perpendicular to
	// the face. Range [-180,180].
	RollAngle float32

	// PanAngle is the yaw angle: the leftward/rightward angle that the face is
	// pointing, relative to the vertical plane perpendicular to the image. Range
	// [-180,180].
	PanAngle float32

	// TiltAngle is the pitch angle: the upwards/downwards angle that the face is
	// pointing relative to the image's horizontal plane. Range [-180,180].
	TiltAngle float32

	// DetectionConfidence is the detection confidence. The range is [0, 1].
	DetectionConfidence float32

	// LandmarkingConfidence is the face landmarking confidence. The range is [0, 1].
	LandmarkingConfidence float32

	// Likelihoods expresses the likelihood of various aspects of the face.
	Likelihoods *FaceLikelihoods
}
A FaceAnnotation describes the results of face detection on an image.
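The pose angles can be used to select faces that look toward the camera. A sketch under assumptions: faceAngles is a hypothetical mirror of three FaceAnnotation fields, and the maxDeg threshold is an application choice, not something the API defines. Roll is ignored because rotating the face within the image plane does not turn it away from the camera.

```go
package main

import (
	"fmt"
	"math"
)

// faceAngles is a hypothetical mirror of FaceAnnotation's pose fields, in degrees.
type faceAngles struct {
	RollAngle, PanAngle, TiltAngle float32
}

// isRoughlyFrontal reports whether both yaw (pan) and pitch (tilt) are
// within maxDeg degrees of facing the camera.
func isRoughlyFrontal(f faceAngles, maxDeg float64) bool {
	return math.Abs(float64(f.PanAngle)) <= maxDeg &&
		math.Abs(float64(f.TiltAngle)) <= maxDeg
}

func main() {
	fmt.Println(isRoughlyFrontal(faceAngles{RollAngle: 30, PanAngle: -8, TiltAngle: 5}, 15))
	fmt.Println(isRoughlyFrontal(faceAngles{PanAngle: 60}, 15))
}
```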
type FaceLandmarks ¶
type FaceLandmarks struct {
	Eyebrows Eyebrows
	Eyes     Eyes
	Ears     Ears
	Nose     Nose
	Mouth    Mouth
	Chin     Chin
	Forehead *r3.Vector
}
FaceLandmarks contains the positions of facial features detected by the service.
type FaceLikelihoods ¶
type FaceLikelihoods struct {
	// Joy is the likelihood that the face expresses joy.
	Joy Likelihood
	// Sorrow is the likelihood that the face expresses sorrow.
	Sorrow Likelihood
	// Anger is the likelihood that the face expresses anger.
	Anger Likelihood
	// Surprise is the likelihood that the face expresses surprise.
	Surprise Likelihood
	// UnderExposed is the likelihood that the face is under-exposed.
	UnderExposed Likelihood
	// Blurred is the likelihood that the face is blurred.
	Blurred Likelihood
	// Headwear is the likelihood that the face has headwear.
	Headwear Likelihood
}
FaceLikelihoods expresses the likelihood of various aspects of a face.
type Image ¶
type Image struct {
	// Rect is a rectangle on the Earth's surface represented by the
	// image. It is optional.
	Rect *LatLngRect

	// LanguageHints is a list of languages to use for text detection. In most
	// cases, leaving this field nil yields the best results since it enables
	// automatic language detection. For languages based on the Latin alphabet,
	// setting LanguageHints is not needed. In rare cases, when the language of
	// the text in the image is known, setting a hint will help get better
	// results (although it will be a significant hindrance if the hint is
	// wrong). Text detection returns an error if one or more of the specified
	// languages is not one of the supported languages (See
	// https://cloud.google.com/translate/v2/translate-reference#supported_languages).
	LanguageHints []string
	// contains filtered or unexported fields
}
An Image represents the contents of an image to run detection algorithms on, along with metadata. Images may be described by their raw bytes, or by a reference to a Google Cloud Storage (GCS) object.
func NewImageFromGCS ¶
func NewImageFromGCS(gcsPath string) *Image
NewImageFromGCS returns an image that refers to an object in Google Cloud Storage. gcsPath must be a valid Google Cloud Storage URI of the form "gs://BUCKET/OBJECT".
You may optionally set Rect and LanguageHints on the returned Image before using it.
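Since gcsPath must have the form gs://BUCKET/OBJECT, a caller may want to validate user-supplied paths before building an Image. parseGCSPath below is a hypothetical helper, not part of this package, and it checks only the overall shape; actual bucket and object naming rules are stricter.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseGCSPath splits a "gs://BUCKET/OBJECT" URI into its bucket and object
// parts, rejecting a missing prefix, an empty bucket, or an empty object.
func parseGCSPath(uri string) (bucket, object string, err error) {
	const prefix = "gs://"
	if !strings.HasPrefix(uri, prefix) {
		return "", "", errors.New("missing gs:// prefix")
	}
	rest := strings.TrimPrefix(uri, prefix)
	i := strings.IndexByte(rest, '/')
	if i <= 0 || i == len(rest)-1 {
		return "", "", errors.New("want gs://BUCKET/OBJECT")
	}
	// Everything after the first slash is the object name, which may itself
	// contain slashes.
	return rest[:i], rest[i+1:], nil
}

func main() {
	fmt.Println(parseGCSPath("gs://my-bucket/my-image.png"))
	fmt.Println(parseGCSPath("my-bucket/my-image.png"))
}
```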
func NewImageFromReader ¶
func NewImageFromReader(r io.ReadCloser) (*Image, error)
NewImageFromReader reads the bytes of an image from r, then closes r.
You may optionally set Rect and LanguageHints on the returned Image before using it.
type ImageProps ¶
type ImageProps struct {
	// DominantColors describes the dominant colors of the image.
	DominantColors []*ColorInfo
}
ImageProps describes properties of the image itself, like the dominant colors.
type LatLng ¶
type LatLng struct {
	// Lat is the latitude in degrees. It must be in the range [-90.0, +90.0].
	Lat float64
	// Lng is the longitude in degrees. It must be in the range [-180.0, +180.0].
	Lng float64
}
A LatLng is a point on the Earth's surface, represented with a latitude and longitude.
type LatLngRect ¶
type LatLngRect struct {
Min, Max LatLng
}
A LatLngRect is a rectangular area on the Earth's surface, represented by a minimum and maximum latitude and longitude.
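The documented ranges make a LatLng easy to validate, and a LatLngRect supports a simple containment test. A minimal sketch using hypothetical local mirror types. It assumes the rectangle does not cross the antimeridian (±180° longitude), and the coordinates in main are approximate values chosen for illustration.

```go
package main

import "fmt"

// latLng and latLngRect are hypothetical mirrors of vision.LatLng and vision.LatLngRect.
type latLng struct{ Lat, Lng float64 }
type latLngRect struct{ Min, Max latLng }

// valid reports whether p is within the documented ranges:
// latitude in [-90, 90] and longitude in [-180, 180].
func valid(p latLng) bool {
	return p.Lat >= -90 && p.Lat <= 90 && p.Lng >= -180 && p.Lng <= 180
}

// contains reports whether p lies inside r, edges included. It assumes
// Min.Lng <= Max.Lng; a rectangle spanning the antimeridian needs extra handling.
func (r latLngRect) contains(p latLng) bool {
	return p.Lat >= r.Min.Lat && p.Lat <= r.Max.Lat &&
		p.Lng >= r.Min.Lng && p.Lng <= r.Max.Lng
}

func main() {
	paris := latLngRect{Min: latLng{48.81, 2.22}, Max: latLng{48.91, 2.47}}
	eiffel := latLng{48.8584, 2.2945}
	fmt.Println(valid(eiffel), paris.contains(eiffel))
}
```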
type Likelihood ¶
type Likelihood int
A Likelihood is an approximate representation of a probability.
type SafeSearchAnnotation ¶
type SafeSearchAnnotation struct {
	// Adult is the likelihood that the image contains adult content.
	Adult Likelihood

	// Spoof is the likelihood that an obvious modification was made to the
	// image's canonical version to make it appear funny or offensive.
	Spoof Likelihood

	// Medical is the likelihood that this is a medical image.
	Medical Likelihood

	// Violence is the likelihood that this image represents violence.
	Violence Likelihood
}
SafeSearchAnnotation describes the results of a SafeSearch detection on an image.
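Assuming the Likelihood constants keep the ordering of the underlying protobuf enum (UNKNOWN = 0 up to VERY_LIKELY = 5), likelihoods can be compared with ordinary operators. The sketch below applies a threshold to each SafeSearchAnnotation category. likelihood and safeSearch are hypothetical local mirrors of the package types, and the threshold is up to the application.

```go
package main

import "fmt"

// likelihood mirrors vision.Likelihood; the values are assumed to follow the
// protobuf enum order, so a larger value means more likely.
type likelihood int

const (
	likelihoodUnknown likelihood = iota // 0
	veryUnlikely                        // 1
	unlikely                            // 2
	possible                            // 3
	likely                              // 4
	veryLikely                          // 5
)

// safeSearch mirrors vision.SafeSearchAnnotation.
type safeSearch struct {
	Adult, Spoof, Medical, Violence likelihood
}

// flagged returns the categories rated at or above threshold. Unknown (0)
// never passes any threshold above Unknown.
func flagged(s safeSearch, threshold likelihood) []string {
	var out []string
	for _, c := range []struct {
		name string
		l    likelihood
	}{{"adult", s.Adult}, {"spoof", s.Spoof}, {"medical", s.Medical}, {"violence", s.Violence}} {
		if c.l >= threshold {
			out = append(out, c.name)
		}
	}
	return out
}

func main() {
	s := safeSearch{Adult: veryUnlikely, Spoof: possible, Medical: unlikely, Violence: likely}
	fmt.Println(flagged(s, likely))
}
```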