videox

package
v0.0.0-...-e05d22d
Published: Dec 2, 2024 License: MIT Imports: 23 Imported by: 0

README

Annex-B performance hit

On a Raspberry Pi 5, our Annex-B encoder (the bit that adds the emulation prevention bytes) encodes at 716 MB/s and decodes at 916 MB/s. For comparison, memcpy on this platform runs at 4690 MB/s.

You can use misc_test.cpp to measure the speed yourself (instructions at top of that file).

I don't have enough numbers right now to figure out the total system impact, but my gut doesn't like it. It seems plausible that one should be able to improve the speed of the encoder, but I don't know how. The alternative that I'm considering is to delay encoding to Annex-B for as long as possible - perhaps even doing it in the browser immediately before display.

If we're recording to disk, then it would be useful to avoid this penalty completely, but that precludes us from using regular video formats like mp4. On the other hand, we might want to avoid regular formats anyway.

Documentation

Constants

const DebugVideoDecodeTimes = false

If true, report the decode FPS

const EnableEmulationPreventBytesEscaping = true

Topic: $ANNEXB-CONFUSION

Here's the story: When we receive packets from Hikvision cameras, via github.com/bluenviron/gortsplib, the packets are supposedly NALUFormatRBSP, aka raw data bits, with no start codes, and no emulation prevention bytes. The codecs seem to want packets in Annex-B encoding, so we dutifully encode the raw packets into Annex-B, with emulation prevention bytes added. HOWEVER, when we activate this code path, we get sporadic errors from ffmpeg, telling us that we've got bad frames. If we comment out the code that does the emulation prevention byte injection, then these errors go away. To be clear, we must inject the start codes. This is unambiguous. It's the emulation prevention bytes that cause errors. This confusion is the reason for this constant.

At some point we'll hopefully learn more, and make better sense of this. Right now the culprit could be any one of these:

1. Hikvision cameras
2. gortsplib
3. The way I'm using the h264 codec in ffmpeg
4. My Annex-B encoder
5. My understanding

UPDATE WITH ANSWER

I have come to the conclusion that my Hikvision cameras are sending data with emulation prevention bytes added to the byte stream, but without start codes. So this has led me to store two pieces of state with each NALU:

1. Does it have a start code?
2. How is the payload encoded?

I initially thought that the presence of a start code should be synonymous with the presence of emulation prevention bytes, but I've learned that this is not the case.
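One way to investigate a stream like this is to scan payloads for the byte patterns that separate escaped from unescaped data. The sketch below is illustrative only and is not taken from the package: classifyPayload and the escapeState constants are hypothetical names. It relies on the fact that a correctly escaped payload never contains 00 00 00, 00 00 01 or 00 00 02, whereas 00 00 03 is what an escaper emits.

```go
package main

import "fmt"

// escapeState is a hypothetical classification, not part of the videox API.
type escapeState int

const (
	escapeUnknown       escapeState = iota // no telltale pattern found
	escapeDefinitelyRaw                    // contains 00 00 0x with x <= 2
	escapeLikelyEscaped                    // contains 00 00 03 and no forbidden pattern
)

// classifyPayload inspects a NALU payload (start code already removed).
func classifyPayload(payload []byte) escapeState {
	state := escapeUnknown
	for i := 0; i+2 < len(payload); i++ {
		if payload[i] != 0 || payload[i+1] != 0 {
			continue
		}
		switch payload[i+2] {
		case 0, 1, 2:
			// An escaper would have inserted 0x03 here, so this data is unescaped.
			return escapeDefinitelyRaw
		case 3:
			state = escapeLikelyEscaped
		}
	}
	return state
}

func main() {
	fmt.Println(classifyPayload([]byte{0x67, 0, 0, 3, 1}) == escapeLikelyEscaped)
	fmt.Println(classifyPayload([]byte{0x67, 0, 0, 1}) == escapeDefinitelyRaw)
}
```

The result is only a hint: unescaped data can contain 00 00 03 by coincidence, which is why the sketch says "likely" rather than "definitely" for that case.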

Variables

var ErrResourceTemporarilyUnavailable = errors.New("Resource temporarily unavailable") // common response from avcodec_receive_frame if a frame is not available

Functions

func AnnexBWorstSize

func AnnexBWorstSize(startCodeLen, rawLen int) int

Return the worst case size of an Annex-B encoded packet, given the size of the raw packet (including a 3 byte start code).
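As a rough illustration of where such a bound comes from, the sketch below computes one safe upper limit. The function worstCaseAnnexBSize is a hypothetical stand-in, and the package's actual formula may be tighter. The reasoning is that each inserted 0x03 needs two zero bytes emitted since the previous insertion, so at most one byte is added per two raw bytes.

```go
package main

import "fmt"

// worstCaseAnnexBSize is a hypothetical stand-in for AnnexBWorstSize.
// It returns an upper bound: the start code, the raw bytes, and at most
// one emulation prevention byte for every two raw bytes.
func worstCaseAnnexBSize(startCodeLen, rawLen int) int {
	return startCodeLen + rawLen + rawLen/2
}

func main() {
	// 3 byte start code + 100 raw bytes + up to 50 escape bytes.
	fmt.Println(worstCaseAnnexBSize(3, 100)) // prints 153
}
```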

func DecodeAnnexB

func DecodeAnnexB(encoded []byte) []byte

Decode an Annex-B encoded packet into a Raw Byte Sequence Payload (RBSP). We assume that you're handling the 3 or 4 byte NALU prefix outside of this function.
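For readers unfamiliar with the decoding step, here is a minimal sketch of removing emulation prevention bytes. It is not the package's implementation; stripEmulationPrevention is a hypothetical name, and like DecodeAnnexB it assumes the start code has already been removed by the caller.

```go
package main

import "fmt"

// stripEmulationPrevention drops every 0x03 that directly follows two
// zero bytes. The input must not include a start code.
func stripEmulationPrevention(encoded []byte) []byte {
	out := make([]byte, 0, len(encoded))
	zeros := 0 // length of the current run of zero bytes
	for _, b := range encoded {
		if zeros >= 2 && b == 3 {
			zeros = 0 // skip the escape byte and restart the zero run
			continue
		}
		if b == 0 {
			zeros++
		} else {
			zeros = 0
		}
		out = append(out, b)
	}
	return out
}

func main() {
	fmt.Printf("% x\n", stripEmulationPrevention([]byte{0x65, 0, 0, 3, 1}))
}
```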

func DecodeAnnexBSize

func DecodeAnnexBSize(encoded []byte) int

Return the number of bytes needed to decode an Annex-B encoded packet. This function is for analysis of camera streams. In ordinary usage, we just call DecodeAnnexB().

func DecodeClosestImageInPacketList

func DecodeClosestImageInPacketList(codec Codec, packets []*VideoPacket, targetTime time.Time, cache *FrameCache, videoCacheKey string) (*cimg.Image, time.Time, error)

Decode the list of packets, and return the decoded image whose presentation time is closest to targetTime. If targetTime is zero, then we return the first image coming out of the decoder. If cache is not nil, then we will insert/query the provided cache. videoCacheKey is the key for this video. We use {videoCacheKey-PTS} as the complete cache key.

func DecodeFirstImageInPacketList

func DecodeFirstImageInPacketList(codec Codec, packets []*VideoPacket) (*cimg.Image, time.Time, error)

Decode the list of packets, and return the first image that successfully decodes

func DecodeSinglePacketToImage

func DecodeSinglePacketToImage(codec Codec, packet *VideoPacket) (*cimg.Image, error)

Creates a decoder and attempts to decode a single IDR packet. This was built for extracting a thumbnail during a long recording. Obviously this is a bit expensive, because you're creating a decoder for just a single frame.

func EncodeAnnexB

func EncodeAnnexB(raw []byte, startCodeLen int, flags AnnexBEncodeFlags) []byte

Encode an RBSP (Raw Byte Sequence Payload) into Annex-B format, optionally adding a 3 or 4 byte start code (00.00.01 or 00.00.00.01) to the beginning of the encoded byte stream. We also add the "emulation prevention byte" (0x03) where necessary, if the relevant flag is set. If startCodeLen is zero, then we do not add a start code.
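The encoding described above can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: encodeAnnexB is a hypothetical name, and a plain bool stands in for the AnnexBEncodeFlags argument. An emulation prevention byte is inserted whenever two zero bytes have just been written and the next raw byte is 0x03 or less.

```go
package main

import "fmt"

// encodeAnnexB is a hypothetical sketch, not the videox implementation.
// startCodeLen may be 0 (no start code), 3 or 4.
func encodeAnnexB(raw []byte, startCodeLen int, escape bool) []byte {
	out := make([]byte, 0, startCodeLen+len(raw)+len(raw)/2)
	switch startCodeLen {
	case 3:
		out = append(out, 0, 0, 1)
	case 4:
		out = append(out, 0, 0, 0, 1)
	}
	zeros := 0 // zero bytes written since the last non-zero or escape byte
	for _, b := range raw {
		if escape && zeros == 2 && b <= 3 {
			out = append(out, 3) // emulation prevention byte
			zeros = 0
		}
		out = append(out, b)
		if b == 0 {
			zeros++
		} else {
			zeros = 0
		}
	}
	return out
}

func main() {
	fmt.Printf("% x\n", encodeAnnexB([]byte{0x65, 0, 0, 1}, 3, true))
}
```

With escape set to false and startCodeLen set to zero the function degenerates to a copy, which matches the remark on AnnexBEncodeFlagNone.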

func EncodeAnnexBInto

func EncodeAnnexBInto(raw []byte, startCodeLen int, flags AnnexBEncodeFlags, dst []byte) (encodedSize int, bufferSizeOK bool)

Encode an RBSP (Raw Byte Sequence Payload) into Annex-B format, optionally adding a 3 byte start code (00.00.01) to the beginning of the encoded byte stream. This encoding adds the "emulation prevention byte" (0x03) where necessary.

func ExtractFrame

func ExtractFrame(srcFilename string, atSecond float64, outputWidth int) ([]byte, error)

Extract a single frame from a video file and return the JPEG bytes. If outputWidth is zero, then we use the same width as the input video.

func ExtractVideoDuration

func ExtractVideoDuration(srcFilename string) (time.Duration, error)

Extract the duration of a video file

func FirstLikelyAnnexBEncodedIndex

func FirstLikelyAnnexBEncodedIndex(encoded []byte) int

func IsVisualPacket

func IsVisualPacket(t h264.NALUType) bool

func NALUStartCode

func NALUStartCode(length int) []byte

func NumPlanes

func NumPlanes(pixelFormat AVPixelFormat) int

func ParseBinFilename

func ParseBinFilename(filename string) (packetNumber int, naluNumber int, timeNS int64)

This is just used for debugging and testing

func ParseH264SPS

func ParseH264SPS(nalu []byte) (width, height int, err error)

Parse a raw SPS NALU (not Annex-B). On Rpi5, this takes 305ns for a 50 byte SPS packet, which is typical on my Hikvisions. On AMD Ryzen 9 5900X, this takes 94ns.

func RunAppCombinedOutput

func RunAppCombinedOutput(app_name string, args []string) ([]byte, error)

app_name is an executable, such as "ffmpeg" or "ffprobe". args must not include the executable name as the first parameter. Returns the string output from exec.Cmd's "CombinedOutput" method.

func TranscodeMediumQualitySeekable

func TranscodeMediumQualitySeekable(srcFilename, dstFilename string) error

Transcode the high quality video stream into a slightly lower quality stream, with keyframes every 8 frames, and with noise reduction. This is for use on our training platform, where people need to be able to seek randomly inside a video.

func TranscodeSeekable

func TranscodeSeekable(srcFilename, dstFilename string) error

Transcode a video to make it easy for a low powered mobile browser to seek to random video positions

Types

type AVPixelFormat

type AVPixelFormat int

Export some of the ffmpeg C pixel formats to Go

const (
	AVPixelFormatYUV420P AVPixelFormat = C.AV_PIX_FMT_YUV420P
	AVPixelFormatRGB24   AVPixelFormat = C.AV_PIX_FMT_RGB24
)

type AnnexBEncodeFlags

type AnnexBEncodeFlags int

Flags that control how EncodeAnnexB works

const (
	AnnexBEncodeFlagNone                        AnnexBEncodeFlags = 0 // This is nonsensical - it is simply a memcpy
	AnnexBEncodeFlagAddEmulationPreventionBytes AnnexBEncodeFlags = 1 // Add emulation prevention bytes (0x03) where necessary
)

type Codec

type Codec string
const (
	CodecH264 Codec = "h264"
	CodecH265 Codec = "h265"
)

func ParseCodec

func ParseCodec(codec string) (Codec, error)

func (Codec) ToFFmpeg

func (c Codec) ToFFmpeg() string

Return the string that FFmpeg uses to identify this codec

type Frame

type Frame struct {
	Image *accel.YUVImage // Image (might be a deep reference into ffmpeg memory)
	PTS   int64           // Presentation time in native time units. Use VideoDecoder.FrameTimeToDuration() to convert to a time.Duration
}

A decoded frame

func (*Frame) DeepClone

func (f *Frame) DeepClone() *Frame

Return a deep clone of the frame (new image memory)

type FrameCache

type FrameCache struct {
	MaxMemory  int // Maximum bytes of RAM to use
	MemoryUsed int // Current bytes of RAM used
	// contains filtered or unexported fields
}

FrameCache is used to speed up the fetching of individual frames while a user is seeking around in a video. We cache YUV images.

func NewFrameCache

func NewFrameCache(maxMemory int) *FrameCache

NewFrameCache creates a new FrameCache with the given maximum memory usage

func (*FrameCache) AddFrame

func (f *FrameCache) AddFrame(key string, frame *accel.YUVImage)

Add a frame to the cache

func (*FrameCache) GetFrame

func (f *FrameCache) GetFrame(key string) *accel.YUVImage

Return the frame or nil

func (*FrameCache) MakeKey

func (f *FrameCache) MakeKey(videoKey string, framePTSUnixMS int64) string

type MPGTSEncoder

type MPGTSEncoder struct {
	// contains filtered or unexported fields
}

MPGTSEncoder encodes H264 NALUs into MPEG-TS.

func NewMPEGTSEncoder

func NewMPEGTSEncoder(log logs.Log, output io.Writer, sps []byte, pps []byte) (*MPGTSEncoder, error)

NewMPEGTSEncoder allocates an MPGTSEncoder.

func (*MPGTSEncoder) Close

func (e *MPGTSEncoder) Close() error

Close closes all the MPGTSEncoder resources.

func (*MPGTSEncoder) Encode

func (e *MPGTSEncoder) Encode(nalus []NALU, pts time.Duration) error

Encode encodes H264 NALUs into MPEG-TS.

type NALU

type NALU struct {
	PayloadIsAnnexB  bool
	PayloadNoEscapes bool // True if PayloadIsAnnexB BUT we know that we have no "emulation prevention bytes", so we can avoid decoding them.
	Payload          []byte
}

Codec NALU

func WrapRawNALU

func WrapRawNALU(raw []byte) NALU

Wrap a raw buffer in a NALU object. The memory is not cloned, and no prefix bytes are added.

func (*NALU) AsAnnexB

func (n *NALU) AsAnnexB() NALU

Return payload data, but make sure it's in AnnexB format, and has a start code of 00.00.01 or 00.00.00.01

func (*NALU) AsRBSP

func (n *NALU) AsRBSP() NALU

Return payload data, but make sure it's in RBSP format, with no start code

func (*NALU) DeepClone

func (n *NALU) DeepClone() NALU

func (*NALU) IsAnnexBWithStartCode

func (n *NALU) IsAnnexBWithStartCode() bool

Returns true if the NALU has a start code, and the payload is encoded with emulation prevention bytes

func (*NALU) IsRBSPWithNoStartCode

func (n *NALU) IsRBSPWithNoStartCode() bool

Returns true if the NALU has no start code, and the payload is not encoded with emulation prevention bytes

func (*NALU) PayloadOnly

func (n *NALU) PayloadOnly() []byte

Returns only the payload, without any start code

func (*NALU) StartCodeLen

func (n *NALU) StartCodeLen() int

Returns the length of the start code. Possible return values:

0: No start code
3: 00 00 01
4: 00 00 00 01
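To illustrate the three return values, the sketch below inspects the leading bytes of a buffer. This is an assumption about how such a check could be written, not the package's code: startCodeLen is a hypothetical free function, and the real method may also consult the NALU's PayloadIsAnnexB field.

```go
package main

import "fmt"

// startCodeLen is a hypothetical helper that returns 4 for a
// 00 00 00 01 prefix, 3 for a 00 00 01 prefix, and 0 otherwise.
func startCodeLen(buf []byte) int {
	// Check the longer prefix first, because 00 00 00 01 also
	// contains 00 00 01 one byte further in.
	if len(buf) >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1 {
		return 4
	}
	if len(buf) >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1 {
		return 3
	}
	return 0
}

func main() {
	fmt.Println(startCodeLen([]byte{0, 0, 1, 0x67}))    // prints 3
	fmt.Println(startCodeLen([]byte{0, 0, 0, 1, 0x67})) // prints 4
	fmt.Println(startCodeLen([]byte{0x67, 0x42}))       // prints 0
}
```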

func (*NALU) Type

func (n *NALU) Type() h264.NALUType

Return the NALU type

type PacketBuffer

type PacketBuffer struct {
	Packets []*VideoPacket
}

A list of packets, with some helper functions

func ExtractFsvPackets

func ExtractFsvPackets(input []fsv.NALU) *PacketBuffer

Convert FSV packets to our VideoPacket format

func LoadBinDir

func LoadBinDir(dir string) (*PacketBuffer, error)

Opposite of PacketBuffer.DumpBin. NOTE: We don't attempt to inject SPS and PPS into the PacketBuffer, but this would be trivial for H264: just look at the first byte of the payload (0x67 and 0x68 for SPS and PPS).
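The first-byte check mentioned in the note works because the low five bits of the H264 NALU header byte hold the NALU type. The sketch below shows the idea; naluType is a hypothetical helper, not a function in this package, and it assumes the payload has no start code.

```go
package main

import "fmt"

// naluType returns the H264 NALU type stored in the low 5 bits of the
// first payload byte, or -1 for an empty payload.
func naluType(payload []byte) int {
	if len(payload) == 0 {
		return -1
	}
	return int(payload[0] & 0x1f)
}

func main() {
	fmt.Println(naluType([]byte{0x67})) // prints 7 (SPS)
	fmt.Println(naluType([]byte{0x68})) // prints 8 (PPS)
}
```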

func (*PacketBuffer) DecodeHeader

func (r *PacketBuffer) DecodeHeader() (width, height int, err error)

Decode SPS and PPS to extract header information

func (*PacketBuffer) DumpBin

func (r *PacketBuffer) DumpBin(dir string) error

Dump each NALU to a .raw file

func (*PacketBuffer) ExtractThumbnail

func (r *PacketBuffer) ExtractThumbnail() (*cimg.Image, error)

Decode the center-most keyframe. This is O(1), assuming no errors or funny business like no keyframes.

func (*PacketBuffer) FindClosestPacketWallPTS

func (r *PacketBuffer) FindClosestPacketWallPTS(wallPTS time.Time, keyframeOnly bool) int

Find the packet with the WallPTS closest to the given time
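A closest-time lookup of this kind might look like the sketch below. It is a simplified illustration and not the package's code: the packet type, the at helper, and the -1 result for "nothing found" are all assumptions, and the real method may search more efficiently than this linear scan.

```go
package main

import (
	"fmt"
	"time"
)

// packet is a hypothetical, reduced stand-in for VideoPacket.
type packet struct {
	wallPTS  time.Time
	keyframe bool
}

// at is a small helper that builds a time from Unix seconds.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// findClosest returns the index of the packet whose wallPTS is nearest
// to target, or -1 if no packet qualifies (an assumed convention).
func findClosest(packets []packet, target time.Time, keyframeOnly bool) int {
	best := -1
	var bestDelta time.Duration
	for i, p := range packets {
		if keyframeOnly && !p.keyframe {
			continue
		}
		delta := p.wallPTS.Sub(target)
		if delta < 0 {
			delta = -delta
		}
		if best == -1 || delta < bestDelta {
			best, bestDelta = i, delta
		}
	}
	return best
}

func main() {
	packets := []packet{{at(0), true}, {at(10), false}, {at(20), true}}
	fmt.Println(findClosest(packets, at(12), false)) // prints 1
	fmt.Println(findClosest(packets, at(12), true))  // prints 2
}
```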

func (*PacketBuffer) FindFirstIDR

func (r *PacketBuffer) FindFirstIDR() int

Returns the index of the first keyframe in the buffer, or -1 if none found

func (*PacketBuffer) FirstNALUOfType

func (r *PacketBuffer) FirstNALUOfType(ofType h264.NALUType) *NALU

Returns the first NALU of the given type, or nil if none found

func (*PacketBuffer) HasIDR

func (r *PacketBuffer) HasIDR() bool

Returns true if we have at least one keyframe in the buffer

func (*PacketBuffer) IndexOfFirstNALUOfType

func (r *PacketBuffer) IndexOfFirstNALUOfType(ofType h264.NALUType) (packetIdx int, indexInPacket int)

func (*PacketBuffer) ResetPTS

func (r *PacketBuffer) ResetPTS()

Adjust all PTS values so that the first frame starts at time 0
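The adjustment amounts to subtracting the first packet's PTS from every packet. The sketch below shows that idea on a reduced, hypothetical packet type; it assumes the packets are already ordered so that the first one has the earliest PTS.

```go
package main

import (
	"fmt"
	"time"
)

// packet is a hypothetical, reduced stand-in for VideoPacket.
type packet struct {
	pts time.Duration
}

// resetPTS shifts every PTS so that the first packet starts at 0.
func resetPTS(packets []packet) {
	if len(packets) == 0 {
		return
	}
	offset := packets[0].pts
	for i := range packets {
		packets[i].pts -= offset
	}
}

func main() {
	packets := []packet{
		{pts: 500 * time.Millisecond},
		{pts: 540 * time.Millisecond},
		{pts: 580 * time.Millisecond},
	}
	resetPTS(packets)
	fmt.Println(packets[0].pts, packets[1].pts, packets[2].pts) // prints 0s 40ms 80ms
}
```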

func (*PacketBuffer) SaveToMP4

func (r *PacketBuffer) SaveToMP4(filename string) error

func (*PacketBuffer) SaveToMPEGTS

func (r *PacketBuffer) SaveToMPEGTS(log logs.Log, output io.Writer) error

Extract saved buffer into an MPEGTS stream

type PayloadFormat

type PayloadFormat int8

PayloadFormat tells us the state of the payload, such as whether it has been escaped for Annex-B

const (
	PayloadRawBytes PayloadFormat = iota // Not escaped (RBSP)
	PayloadAnnexB                        // Annex-B escaped (SODB)
)

type VideoDecoder

type VideoDecoder struct {
	// contains filtered or unexported fields
}

VideoDecoder is a wrapper around ffmpeg, for decoding videos

func NewVideoFileDecoder

func NewVideoFileDecoder(filename string) (*VideoDecoder, error)

Create a new decoder that will decode a file

func NewVideoStreamDecoder

func NewVideoStreamDecoder(codec Codec) (*VideoDecoder, error)

Create a new decoder that you will feed with packets

func (*VideoDecoder) Close

func (d *VideoDecoder) Close()

func (*VideoDecoder) Decode

func (d *VideoDecoder) Decode(packet *VideoPacket) (*Frame, error)

Decode the packet and return a copy of the YUV image. This is used when decoding a stream (not a file).

func (*VideoDecoder) DecodeDeepRef

func (d *VideoDecoder) DecodeDeepRef(packet *VideoPacket) (*Frame, error)

WARNING: The image returned is only valid while the decoder is still alive, and it will be clobbered by the subsequent DecodeDeepRef/Decode(). The pixels in the returned image are not a garbage-collected Go slice. They point directly into the libavcodec decode buffer. That's why the function name has the "DeepRef" suffix.

func (*VideoDecoder) FrameTimeToDuration

func (d *VideoDecoder) FrameTimeToDuration(pts int64) time.Duration

Convert a native frame time to a time.Duration

func (*VideoDecoder) Height

func (d *VideoDecoder) Height() int

func (*VideoDecoder) NextFrame

func (d *VideoDecoder) NextFrame() (*Frame, error)

NextFrame reads the next frame from a file and returns a copy of the YUV image.

func (*VideoDecoder) NextFrameDeepRef

func (d *VideoDecoder) NextFrameDeepRef() (*Frame, error)

NextFrameDeepRef will read the next frame from a file and return a deep reference into the libavcodec decoded image buffer. The next call to NextFrame/NextFrameDeepRef will invalidate that image.

func (*VideoDecoder) Width

func (d *VideoDecoder) Width() int

type VideoEncoder

type VideoEncoder struct {
	InputPixelFormat AVPixelFormat
	// contains filtered or unexported fields
}

func NewVideoEncoder

func NewVideoEncoder(codec, format, filename string, width, height int, pixelFormatIn, pixelFormatOut AVPixelFormat, encoderType VideoEncoderType, fps int) (*VideoEncoder, error)

NewVideoEncoder creates a new video encoder. You must Close() a video encoder when you are done using it, otherwise you will leak ffmpeg objects.

func (*VideoEncoder) Close

func (v *VideoEncoder) Close()

func (*VideoEncoder) WriteImage

func (v *VideoEncoder) WriteImage(pts time.Duration, data [][]uint8, stride []int) error

Write an RGB (single plane) or YUV (3 planes) image to the encoder

func (*VideoEncoder) WriteNALU

func (v *VideoEncoder) WriteNALU(dts, pts time.Duration, nalu NALU) error

func (*VideoEncoder) WritePacket

func (v *VideoEncoder) WritePacket(dts, pts time.Duration, packet *VideoPacket) error

func (*VideoEncoder) WriteTrailer

func (v *VideoEncoder) WriteTrailer() error

type VideoEncoderType

type VideoEncoderType int
const (
	VideoEncoderTypePackets     VideoEncoderType = C.EncoderTypePackets     // Sending pre-encoded packets/NALUs to the encoder
	VideoEncoderTypeImageFrames VideoEncoderType = C.EncoderTypeImageFrames // Sending image frames to the encoder
)

type VideoPacket

type VideoPacket struct {
	RawRecvID   int64     // Arbitrary monotonically increasing ID of raw received. Used to detect dropped packets, or other issues like that.
	ValidRecvID int64     // Arbitrary monotonically increasing ID of useful decoded packets. Used to detect dropped packets, or other issues like that.
	RecvTime    time.Time // Wall time when the packet was received. This is obviously subject to network jitter etc, so not a substitute for PTS
	H264NALUs   []NALU
	H264PTS     time.Duration
	WallPTS     time.Time // Reference wall time combined with the received PTS. We consider this the ground truth/reality of when the packet was recorded.
	IsBacklog   bool      // a bit of a hack to inject this state here. maybe an integer counter would suffice? (eg nBacklogPackets)
}

VideoPacket is what we store in our ring buffer

func ClonePacket

func ClonePacket(nalusIn [][]byte, pts time.Duration, recvTime time.Time, wallPTS time.Time, isPayloadAnnexBEncoded bool) *VideoPacket

Clone a packet of NALUs and return the cloned packet. NOTE: gortsplib re-uses buffers, which is why we copy the payloads. NOTE2: I think that after upgrading gortsplib in Jan 2024, it no longer re-uses buffers, so I should revisit the requirement of our deep clone here.

func (*VideoPacket) Clone

func (p *VideoPacket) Clone() *VideoPacket

Deep clone of packet buffer

func (*VideoPacket) EncodeToAnnexBPacket

func (p *VideoPacket) EncodeToAnnexBPacket() []byte

Encode all NALUs in the packet into AnnexB format (i.e. with 00,00,01 prefix bytes)

func (*VideoPacket) FirstNALUOfType

func (p *VideoPacket) FirstNALUOfType(t h264.NALUType) *NALU

Returns the first NALU of the given type, or nil if none exists

func (*VideoPacket) HasIDR

func (p *VideoPacket) HasIDR() bool

Returns true if this packet has a keyframe

func (*VideoPacket) HasType

func (p *VideoPacket) HasType(t h264.NALUType) bool

Return true if this packet has a NALU of type t inside

func (*VideoPacket) IsIFrame

func (p *VideoPacket) IsIFrame() bool

Return true if this packet has one NALU which is an intermediate frame

func (*VideoPacket) PayloadBytes

func (p *VideoPacket) PayloadBytes() int

Returns the number of bytes of NALU data. If the NALUs have annex-b prefixes, then these are included in the size.

func (*VideoPacket) Summary

func (p *VideoPacket) Summary() string
