Documentation ¶
Overview ¶
Package magicnumber contains the magic number matchers for identifying file types that are expected to be handled by the Defacto2 server application. Magic numbers are not always accurate and should be used as hints combined with other checks such as file extension matching.
Usually, the magic number is the first few bytes of a file that uniquely identify the file type. But a number of document formats also check the final few bytes of a file.
The magic number byte values are sourced from the following:
Index ¶
- Constants
- Variables
- func AAC(r io.ReaderAt) bool
- func ASCII(r io.ReaderAt) bool
- func Ansi(r io.ReaderAt) bool
- func ArcFree(r io.ReaderAt) bool
- func ArcSEA(r io.ReaderAt) bool
- func Arj(r io.ReaderAt) bool
- func Avi(r io.ReaderAt) bool
- func Avif(r io.ReaderAt) bool
- func Bmp(r io.ReaderAt) bool
- func Bzip2(r io.ReaderAt) bool
- func CSI(r io.ReaderAt) bool
- func Cab(r io.ReaderAt) bool
- func CodePage(r io.ReaderAt) bool
- func ConvLatin1(p []byte) (string, error)
- func ConvSize(p []byte) int64
- func Daa(r io.ReaderAt) bool
- func DosKWAJ(r io.ReaderAt) bool
- func DosSZDD(r io.ReaderAt) bool
- func Empty(r io.ReaderAt) bool
- func Flac(r io.ReaderAt) bool
- func Flv(r io.ReaderAt) bool
- func Gif(r io.ReaderAt) bool
- func Gzip(r io.ReaderAt) bool
- func Hlp(r io.ReaderAt) bool
- func ID3v220(r io.ReaderAt) string
- func ID3v22Frame(id [3]byte, data ...byte) string
- func ID3v230(r io.ReaderAt) string
- func ID3v23Frame(id [4]byte, data ...byte) string
- func ISO(r io.ReaderAt) bool
- func IT(r io.ReaderAt) bool
- func Ico(r io.ReaderAt) bool
- func Iff(r io.ReaderAt) bool
- func Ilbm(r io.ReaderAt) bool
- func IlbmDecode(r io.ReaderAt) (int, int)
- func Ivr(r io.ReaderAt) bool
- func Jpeg(r io.ReaderAt) bool
- func Jpeg2000(r io.ReaderAt) bool
- func JpegNoSuffix(r io.ReaderAt) bool
- func Length(r io.ReaderAt) int64
- func LzhLha(r io.ReaderAt) bool
- func M4v(r io.ReaderAt) bool
- func MK(r io.ReaderAt) bool
- func MSComp(r io.ReaderAt) bool
- func MSExe(r io.ReaderAt) bool
- func MTM(r io.ReaderAt) bool
- func Mdf(r io.ReaderAt) bool
- func Midi(r io.ReaderAt) bool
- func Mp3(r io.ReaderAt) bool
- func Mp4(r io.ReaderAt) bool
- func Mpeg(r io.ReaderAt) bool
- func MusicID3v1(r io.ReaderAt) string
- func MusicID3v2(r io.ReaderAt) string
- func MusicIT(r io.ReaderAt) string
- func MusicMK(r io.ReaderAt) string
- func MusicMTM(r io.ReaderAt) string
- func MusicTracker(r io.ReaderAt) string
- func MusicXM(r io.ReaderAt) string
- func NonISO889591(b byte) bool
- func NonWindows1252(b byte) bool
- func NotASCII(b byte) bool
- func NotPlainText(b byte) bool
- func Nri(r io.ReaderAt) bool
- func Ogg(r io.ReaderAt) bool
- func Pcx(r io.ReaderAt) bool
- func Pdf(r io.ReaderAt) bool
- func PkImplode(r io.ReaderAt) bool
- func PkReduce(r io.ReaderAt) bool
- func PkShrink(r io.ReaderAt) bool
- func Pklite(r io.ReaderAt) bool
- func Pksfx(r io.ReaderAt) bool
- func Pkzip(r io.ReaderAt) bool
- func PkzipMulti(r io.ReaderAt) bool
- func Png(r io.ReaderAt) bool
- func QTMov(r io.ReaderAt) bool
- func RIFF(r io.ReaderAt) bool
- func Rar(r io.ReaderAt) bool
- func Rarv5(r io.ReaderAt) bool
- func Ripscrip(r io.ReaderAt) bool
- func Rtf(r io.ReaderAt) bool
- func Tar(r io.ReaderAt) bool
- func Tiff(r io.ReaderAt) bool
- func Txt(r io.ReaderAt) bool
- func TxtLatin1(r io.ReaderAt) bool
- func TxtWindows(r io.ReaderAt) bool
- func Utf16(r io.ReaderAt) bool
- func Utf32(r io.ReaderAt) bool
- func Utf8(r io.ReaderAt) bool
- func Wave(r io.ReaderAt) bool
- func Webp(r io.ReaderAt) bool
- func Wmv(r io.ReaderAt) bool
- func X7z(r io.ReaderAt) bool
- func XM(r io.ReaderAt) bool
- func XZ(r io.ReaderAt) bool
- func ZStd(r io.ReaderAt) bool
- func Zip64(r io.ReaderAt) bool
- func Zoo(r io.ReaderAt) bool
- type Extension
- type Finder
- type Matcher
- type NewExecutable
- type PortableExecutable
- type Signature
- func Archive(r io.ReaderAt) (Signature, error)
- func Archives() []Signature
- func ArchivesBBS() []Signature
- func DiscImage(r io.ReaderAt) (Signature, error)
- func DiscImages() []Signature
- func Document(r io.ReaderAt) (Signature, error)
- func Documents() []Signature
- func Find(r io.ReaderAt) Signature
- func Image(r io.ReaderAt) (Signature, error)
- func Images() []Signature
- func MatchExt(filename string, r io.ReaderAt) (bool, Signature, error)
- func Program(r io.ReaderAt) (Signature, error)
- func Programs() []Signature
- func Text(r io.ReaderAt) (Signature, error)
- func Texts() []Signature
- func Video(r io.ReaderAt) (Signature, error)
- func Videos() []Signature
- type Windows
- type WindowsName
Examples ¶
Constants ¶
const ID3v1Size = 128
ID3v1Size is the minimum buffer size of an ID3 v1 tag.
Variables ¶
var ErrNilReader = errors.New("nil reader")
Functions ¶
func ASCII ¶
ASCII returns true if the reader exclusively contains printable ASCII characters. Today, ASCII characters form the first characters of the Unicode character set, but historically ASCII was a 7 and 8-bit character encoding standard found on most microcomputers, personal computers, and the early Internet.
func Ansi ¶
Ansi returns true if the reader contains some common ANSI escape codes. For speed and to avoid false positives, it only matches the ANSI escape codes for bold, normal and reset text.
func Avif ¶
Avif matches the AV1 Image File image format in the reader, also known as AVIF. It is a newer image format based on the AV1 video codec from the Alliance for Open Media. The detection method is not fully accurate and should only be used as a hint.
func CSI ¶ added in v1.0.3
CSI returns true if the reader contains three or more common Control Sequence Introducer (CSI) escape codes that are used in ANSI encoded texts. This is a heuristic function and does not guarantee that the reader contains ANSI encoded text.
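The "three or more" rule can be sketched as a simple count. This is only an illustration under assumptions: `countCSI` and `looksLikeANSI` are hypothetical names, and the sketch counts every ESC `[` pair rather than the specific common sequences the package checks for.

```go
package main

import (
	"bytes"
	"fmt"
)

// countCSI counts occurrences of the two-byte Control Sequence
// Introducer, ESC (0x1b) followed by '['.
func countCSI(p []byte) int {
	return bytes.Count(p, []byte{0x1b, '['})
}

// looksLikeANSI is a hypothetical helper that applies the
// "three or more" rule. It is a heuristic, not a guarantee.
func looksLikeANSI(p []byte) bool {
	const threshold = 3
	return countCSI(p) >= threshold
}

func main() {
	text := []byte("\x1b[1mbold\x1b[0m plain \x1b[31mred\x1b[0m")
	fmt.Println(countCSI(text), looksLikeANSI(text))
	fmt.Println(looksLikeANSI([]byte("no escape codes here")))
}
```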
func CodePage ¶ added in v1.0.1
CodePage returns true if the reader is a possible IBM code page text file of the kind often found on DOS and 16-bit Windows computers.
This function is heuristic and checks for the following:
- no multiple nulls before the EOF marker
- require IBM PC/Microsoft newlines
- number of newlines should be at least (80 columns / length of file), halved
func ConvLatin1 ¶
ConvLatin1 converts a byte slice to a Latin-1 (ISO-8859-1) string.
func ConvSize ¶
ConvSize converts a byte slice to an integer. The byte slice is read as a synchsafe integer, where only the lower 7 bits of each byte are used and the most significant bit is always zero.
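As a minimal sketch of synchsafe decoding, the function below shifts in 7 bits per byte. The name `synchsafe` is hypothetical and this is not the package's own implementation.

```go
package main

import "fmt"

// synchsafe decodes a big-endian synchsafe integer, where only the
// lower 7 bits of each byte carry data.
func synchsafe(p []byte) int64 {
	var n int64
	for _, b := range p {
		n = n<<7 | int64(b&0x7f)
	}
	return n
}

func main() {
	// 0x02 contributes 2<<7 = 256, 0x01 contributes 1.
	fmt.Println(synchsafe([]byte{0x00, 0x00, 0x02, 0x01}))
}
```

Because each byte holds 7 bits, a four byte synchsafe value has a maximum of 2^28 - 1.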
func DosKWAJ ¶
DosKWAJ returns true if the reader begins with the KWAJ compression signature, found in some DOS executables.
func Gif ¶
Gif matches the image Graphics Interchange Format. There are two versions of the GIF format, GIF87a and GIF89a.
func Hlp ¶
Hlp returns true if the reader contains the Windows Help File signature. This is a generic signature for Windows help files and does not differentiate between the various versions of the help file format.
func ID3v220 ¶
ID3v220 reads the ID3 v2.2 tags from the reader and returns the song, artist and year. The v2.2 tag is obsolete but still found in the wild.
func ID3v22Frame ¶
ID3v22Frame reads the ID3 v2.2 frame in the byte slice and returns the frame data as a string. The frame header contains a 3 byte identifier followed by a 3 byte size.
func ID3v230 ¶
ID3v230 reads the ID3 v2.3 and ID3 v2.4 tags from the reader and returns the song, artist and year. The v2.3 and v2.4 tags are the most common ID3 tags found in MP3 files. For our purposes, v2.3 and v2.4 tags are treated the same, as there is no difference in the metadata used.
func ID3v23Frame ¶
ID3v23Frame reads the ID3 v2.3 and v2.4 frame in the byte slice and returns the frame data as a string. The frame header contains a 4 byte identifier followed by a 4 byte size.
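The two frame header layouts described for ID3v22Frame and ID3v23Frame can be sketched as below. The names `frameV22` and `frameV23` are hypothetical. The v2.3 size is read as a plain 32-bit big-endian value; the ID3 v2.4 specification stores it as a synchsafe integer instead, which this sketch does not handle.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frameV22 splits an ID3 v2.2 frame header: a 3 byte identifier
// followed by a 3 byte big-endian size.
func frameV22(p []byte) (id string, size int, ok bool) {
	if len(p) < 6 {
		return "", 0, false
	}
	size = int(p[3])<<16 | int(p[4])<<8 | int(p[5])
	return string(p[:3]), size, true
}

// frameV23 splits an ID3 v2.3 frame header: a 4 byte identifier
// followed by a 4 byte big-endian size. The 2 flag bytes that
// follow the size are ignored here.
func frameV23(p []byte) (id string, size int, ok bool) {
	if len(p) < 8 {
		return "", 0, false
	}
	return string(p[:4]), int(binary.BigEndian.Uint32(p[4:8])), true
}

func main() {
	id, size, _ := frameV22([]byte{'T', 'T', '2', 0, 0, 5})
	fmt.Println(id, size)
	id, size, _ = frameV23([]byte{'T', 'I', 'T', '2', 0, 0, 1, 0, 0, 0})
	fmt.Println(id, size)
}
```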
func ISO ¶
ISO returns true if the reader contains the ISO 9660 CD-ROM filesystem signature. To be accurate, it requires at least 36KB of data to be read.
func Iff ¶
Iff matches the Interchange File Format image. This is a generic wrapper format originally created by Electronic Arts for storing data in chunks.
func Ilbm ¶
Ilbm matches the InterLeaved Bitmap image format. Created by Electronic Arts, it conforms to the IFF standard.
func IlbmDecode ¶
IlbmDecode reads the InterLeaved Bitmap image format in the reader and returns the width and height.
func JpegNoSuffix ¶
JpegNoSuffix matches the JPEG File Interchange Format v1 image. This is a less accurate method than Jpeg as it does not check the final bytes.
func Mp3 ¶
Mp3 matches the MPEG-1 Audio Layer 3 audio format. This only checks for the ID3v2 tag and not the audio data. Songs with no ID3v2 tag will not be detected, including files that only have an ID3v1 tag.
func MusicID3v1 ¶
MusicID3v1 reads the ID3 v1 tag from the reader and returns the song, artist and year. The ID3 v1 tag is a 128 byte tag at the end of an MP3 audio file.
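The ID3 v1 tag uses fixed-width fields: "TAG", then a 30 byte title, 30 byte artist, 30 byte album, 4 byte year, 30 byte comment and a 1 byte genre. A minimal sketch of parsing it, assuming the hypothetical function `id3v1` and treating the text as plain ASCII rather than converting from Latin-1:

```go
package main

import (
	"bytes"
	"fmt"
)

const id3v1Size = 128

// id3v1 parses the fixed-width fields of an ID3 v1 tag taken from the
// final 128 bytes of a file.
func id3v1(tag []byte) (title, artist, year string, ok bool) {
	if len(tag) != id3v1Size || !bytes.HasPrefix(tag, []byte("TAG")) {
		return "", "", "", false
	}
	field := func(start, end int) string {
		// Fields are padded with NUL bytes or spaces.
		return string(bytes.TrimRight(tag[start:end], "\x00 "))
	}
	return field(3, 33), field(33, 63), field(93, 97), true
}

func main() {
	tag := make([]byte, id3v1Size)
	copy(tag, "TAG")
	copy(tag[3:], "Example Song")
	copy(tag[33:], "Example Artist")
	copy(tag[93:], "1994")
	fmt.Println(id3v1(tag))
}
```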
func MusicID3v2 ¶
MusicID3v2 reads the ID3 v2 tag from the reader and returns the song, artist and year. The ID3 v2 tag is a variable length tag at the start of an MP3 audio file.
func MusicIT ¶
MusicIT returns the Impulse Tracker song or title from the reader if available. The Impulse Tracker format is a tracked music format created by Jeffrey Lim.
func MusicMK ¶
MusicMK returns the MOD song or title from the reader if available. The Soundtracker MOD format is a tracked music format created by Karsten Obarski on the Commodore Amiga. The original MOD format had no signature, but the M.K. signature was added by Mahoney & Kaktus in their MOD samples and became a common signature in the MOD format.
Common MOD formats include the original Ultimate Soundtracker, Protracker, FastTracker II...
func MusicMTM ¶
MusicMTM returns the MultiTracker song or title from the reader if available. The MultiTracker format is a tracked music format created by the scene group Renaissance.
func MusicTracker ¶
MusicTracker returns the tracked music format found in the reader and the name or title of the song if available. The tracked music formats include MultiTracker, Impulse Tracker, Extended Module, and 4 channel MODule music.
Modland has a large collection of tracked music format documentation.
func MusicXM ¶
MusicXM returns the eXtended Module song or title from the reader if available. The XM format was originally used by FastTracker II (FT2) and later modified by other trackers.
func NonISO889591 ¶
NonISO889591 returns true if the byte is not a printable ISO/IEC 8859-1 character.
func NonWindows1252 ¶
NonWindows1252 returns true if the byte is not a printable Windows-1252 character.
func NotASCII ¶
NotASCII returns true if the byte is not a printable ASCII character. Most control characters are not printable ASCII characters, but exceptions are made for the ESC (escape) character, which is used in ANSI escape codes, and the EOF (end of file) character, which is used in DOS.
func NotPlainText ¶
NotPlainText returns true if the byte is not a printable plain text character. Plain text characters include any printable ASCII character as well as any "extended ASCII" character.
func Nri ¶
Nri returns true if the reader contains the Nero CD image signature. This method is untested.
func PkImplode ¶
PkImplode matches the PKWARE Implode method zip archive format. This is a legacy method and is generally not supported in modern ZIP tools and libraries.
func PkReduce ¶
PkReduce matches the PKWARE Reduce method zip archive format. This is a legacy method and is generally not supported in modern ZIP tools and libraries.
func PkShrink ¶
PkShrink matches the PKWARE Shrink method zip archive format. This is a legacy method and is generally not supported in modern ZIP tools and libraries.
func Pklite ¶
Pklite matches the PKLITE archive format in the reader, which is a compressed executable format for DOS and 16-bit Windows.
func Pksfx ¶
Pksfx matches the PKSFX archive format in the reader, which is a self-extracting archive format.
func PkzipMulti ¶
PkzipMulti matches the PKWARE Multi-Volume Zip archive format.
func Ripscrip ¶
Ripscrip returns true if the reader contains the RIPscrip signature. This is a vector graphics format used in BBS systems in the early 1990s.
func Txt ¶
Txt returns true if the reader exclusively contains plain text ASCII characters, control characters or "extended ASCII characters".
There is a 2% threshold for non-plain text characters such as ASCII control characters which are not printable but often found in plain text files for 8-bit microcomputers.
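The 2% tolerance can be sketched as a ratio check. Both `isPlain` and `mostlyText` are hypothetical, and the byte classification here (tab, newline, carriage return, printable ASCII and bytes of 0x80 and above) is an assumption that may differ from the package's own rules.

```go
package main

import (
	"bytes"
	"fmt"
)

// isPlain is a stand-in classification: common whitespace, printable
// ASCII, and "extended ASCII" bytes count as plain text.
func isPlain(b byte) bool {
	switch {
	case b == '\t', b == '\n', b == '\r':
		return true
	case b >= 0x20 && b != 0x7f:
		return true
	}
	return false
}

// mostlyText reports whether at most 2% of the bytes are not plain text.
func mostlyText(p []byte) bool {
	if len(p) == 0 {
		return false
	}
	bad := 0
	for _, b := range p {
		if !isPlain(b) {
			bad++
		}
	}
	return bad*100 <= len(p)*2
}

func main() {
	text := bytes.Repeat([]byte("a"), 100)
	text[0], text[1] = 0x00, 0x01 // 2 of 100 bytes are control codes
	fmt.Println(mostlyText(text))
	text[2] = 0x02 // 3 of 100 bytes exceeds the threshold
	fmt.Println(mostlyText(text))
}
```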
func TxtLatin1 ¶
TxtLatin1 returns true if the reader exclusively contains plain text ISO/IEC 8859-1 characters, commonly known as the Latin-1 character set.
func TxtWindows ¶
TxtWindows returns true if the reader exclusively contains plain text Windows-1252 characters. This is an extension of the Latin-1 character set with additional typography characters and was the default character set for English in legacy versions of Microsoft Windows.
Types ¶
type NewExecutable ¶
type NewExecutable int
NewExecutable represents the New Executable file type, a format used by Microsoft and IBM from the mid-1980s to improve on the limitations of the MS-DOS MZ executable format.
const (
	NoneNE        NewExecutable = iota - 1 // Not a New Executable
	UnknownNE                              // Unknown New Executable
	OS2Exe                                 // Microsoft IBM OS/2 New Executable
	Windows286Exe                          // Windows requiring an Intel 286 CPU New Executable
	DOSv4Exe                               // MS-DOS v4 New Executable
	Windows386Exe                          // Windows requiring an Intel 386 CPU New Executable
)
func (NewExecutable) String ¶
func (ne NewExecutable) String() string
type PortableExecutable ¶
type PortableExecutable uint16
PortableExecutable represents the Portable Executable file type, a format used by Microsoft for executables, object code, DLLs, FON Font files, and others. In this implementation, only executables for desktop Windows are considered.
const (
	UnknownPE  PortableExecutable = 0x0    // Unknown Portable Executable
	Intel386PE PortableExecutable = 0x14c  // Intel 386 Portable Executable
	AMD64PE    PortableExecutable = 0x8664 // AMD64 Portable Executable
	ARMPE      PortableExecutable = 0x1c0  // ARM Portable Executable
	ARM64PE    PortableExecutable = 0xaa64 // ARM64 Portable Executable
	ItaniumPE  PortableExecutable = 0x200  // Itanium Portable Executable
)
type Signature ¶
type Signature int
Signature represents a file type signature.
const (
	ZeroByte Signature = iota - 2
	Unknown
	ElectronicArtsIFF
	AV1ImageFile
	JPEGFileInterchangeFormat
	JPEG2000
	PortableNetworkGraphics
	GraphicsInterchangeFormat
	GoogleWebP
	TaggedImageFileFormat
	BMPFileFormat
	PersonalComputereXchange
	InterleavedBitmap
	MicrosoftIcon
	RIPscrip
	MPEG4
	QuickTimeMovie
	QuickTimeM4V
	MicrosoftAudioVideoInterleave
	MicrosoftWindowsMedia
	MPEG
	FlashVideo
	RealPlayer
	MusicalInstrumentDigitalInterface
	MPEG1AudioLayer3
	MPEGAdvancedAudioCoding
	OggVorbisCodec
	FreeLosslessAudioCodec
	WaveAudioForWindows
	MusicExtendedModule
	MusicMultiTrackModule
	MusicImpulseTracker
	MusicProTracker
	PKWAREZipShrink
	PKWAREZipReduce
	PKWAREZipImplode
	PKWAREZip64
	PKWAREZip
	PKWAREMultiVolume
	PKLITE
	PKSFX
	TapeARchive
	RoshalARchive
	RoshalARchivev5
	GzipCompressArchive
	Bzip2CompressArchive
	X7zCompressArchive
	XZCompressArchive
	ZStandardArchive
	FreeArc
	ARChiveSEA
	YoshiLHA
	ZooArchive
	ArchiveRobertJung
	MicrosoftCABinet
	MicrosoftDOSKWAJ
	MicrosoftDOSSZDD
	MicrosoftExecutable
	MicrosoftCompoundFile
	CDISO9660
	CDNero
	CDPowerISO
	CDAlcohol120
	WindowsHelpFile
	PortableDocumentFormat
	RichTextFormat
	UTF8Text
	UTF16Text
	UTF32Text
	ANSIEscapeText
	PlainText
)
func Archive ¶
Archive reads all the bytes from the reader and returns the file type signature if the file is a known archive of files or Unknown if the file is not an archive.
Example ¶
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/Defacto2/magicnumber"
)

func main() {
	f1, err := os.Open(filepath.Join("testdata", "TEST.cab"))
	if err != nil {
		panic(err)
	}
	defer f1.Close()
	f2, err := os.Open(filepath.Join("testdata", "README.md"))
	if err != nil {
		panic(err)
	}
	defer f2.Close()
	sign1, err := magicnumber.Archive(f1)
	if err != nil {
		panic(err)
	}
	fmt.Println(sign1)
	sign2, err := magicnumber.Archive(f2)
	if err != nil {
		panic(err)
	}
	fmt.Println(sign2)
}

Output:

Microsoft cabinet
binary data
func ArchivesBBS ¶
func ArchivesBBS() []Signature
ArchivesBBS returns all the archive file type signatures that were commonly used in the BBS online era of the 1980s and early 1990s. Eventually these were replaced by the universal ZIP format using the Deflate and Store compression methods.
func DiscImage ¶
DiscImage reads all the bytes from the reader and returns the file type signature if the file is a known CD disk image or Unknown if the file is not a disk image.
func DiscImages ¶
func DiscImages() []Signature
DiscImages returns all the CD disk image file type signatures.
func Document ¶
Document reads all the bytes from the reader and returns the file type signature if the file is a known document or Unknown if the file is not a document.
func Find ¶
Find returns the file type signature found in the reader.
Example ¶
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/Defacto2/magicnumber"
)

func main() {
	f, err := os.Open(filepath.Join("testdata", "TEST.cab"))
	if err != nil {
		panic(err)
	}
	defer f.Close()
	sign := magicnumber.Find(f)
	fmt.Println(sign.String())
	fmt.Println(sign.Title())
}

Output:

Microsoft cabinet
Microsoft Cabinet
func Image ¶
Image reads all the bytes from the reader and returns the file type signature if the file is a known image or Unknown if the file is not an image.
func MatchExt ¶
MatchExt determines if the reader matches the file type signature expected from the extension of the filename. It returns true if the file type matches. The found signature is always returned, whether or not it matches the extension.
A PNG encoded image using the filename TEST.PNG will return true and the PortableNetworkGraphics signature. A PNG encoded image using the filename TEST.JPG will return false and the PortableNetworkGraphics signature.
func Program ¶
Program reads all the bytes from the reader and returns the file type signature if the file is a known DOS or Windows program or Unknown if the file is not a program.
func Programs ¶
func Programs() []Signature
Programs returns all the program file type signatures for Microsoft operating systems, DOS and Windows.
func Text ¶
Text reads the first 512 bytes from the reader and returns the file type signature if the file is a known plain text file or Unknown if the file is not a text file.
type Windows ¶
type Windows struct {
	TimeDateStamp time.Time          // The time the executable was compiled, only included in PE files
	Major         int                // Major minimum version, for example, Windows 3.0 would be 3
	Minor         int                // Minor minimum version, for example, Windows 3.0 would be 0
	NE            NewExecutable      // The New Executable, a legacy format replaced by the Portable Executable format
	PE            PortableExecutable // The Portable Executable CPU architecture
	PE64          bool               // True if the executable is a 64-bit Portable Executable (PE32+)
}
Windows represents the Windows specific information in the executable header.
func FindExecutable ¶
FindExecutable reads the first 1KB from the reader and returns the specific information contained within the executable headers. Both the New Executable and Portable Executable formats are supported, which are commonly used by IBM and Microsoft desktop operating systems from PC/MS-DOS to modern Windows.
Example ¶
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/Defacto2/magicnumber"
)

func main() {
	f, err := os.Open(filepath.Join("testdata", "binaries", "windows9x", "7za920", "7za.exe"))
	if err != nil {
		panic(err)
	}
	defer f.Close()
	win, err := magicnumber.FindExecutable(f)
	if err != nil {
		panic(err)
	}
	fmt.Println(win.String())
}

Output:

Windows NT v4.0
func NE ¶
NE returns the New Executable file type from the byte slice.
Windows programs that are New Executables are usually for the ancient Windows 2 or 3.x editions. Windows v2 came in two versions, Windows 2 (for the 286 CPU) and Windows/386, while Windows 3.0+ unified support for both CPUs. The New Executable format was replaced by the Portable Executable format in Windows 95/NT.
If a Windows program is detected, the major and minor version numbers are returned, for example, a Windows 3.0 requirement would return 3 and 0.
func PE ¶
PE returns the Portable Executable file type from the byte slice.
The Portable Executable format is used by Microsoft for executables, object code, DLLs, FON Font files, and others. In this implementation, only executables for desktop Windows are considered. The information returned is the CPU architecture, the Windows NT version, and the time the executable was compiled.
The major and minor version numbers are not always accurate.
type WindowsName ¶
WindowsName represents the Windows version names and their minimum version numbers.
func WindowsNames ¶
func WindowsNames() WindowsName
WindowsNames returns the Windows version names and their minimum version numbers. The minimum version numbers are based on the minimum system version required by the executable, and not the libraries or system calls in use by the program.
The minimum version numbers were discontinued by Microsoft in Windows 8.1 and may not be accurate for modern programs.