Documentation ¶
Index ¶
- Constants
- Variables
- func GNUAwkCheckSyntax(smell Smell) error
- func GoCheckSyntax(smell Smell) error
- func IsAltShellScript(smell Smell) bool
- func PHPCheckSyntax(smell Smell) error
- func POSIXShCheckSyntax(smell Smell) error
- func PerlishCheckSyntax(smell Smell) error
- func PythonCheckSyntax(smell Smell) error
- func UnixCheckSyntax(smell Smell) error
- type Smell
- type SniffConfig
Constants ¶
const Version = "0.0.27"
Version is semver.
Variables ¶
var ALTEXTENSIONS = map[string]bool{ ".osh": true, ".lksh": true, ".csh": true, ".cshrc": true, ".tcsh": true, ".tcshrc": true, ".fish": true, ".fishrc": true, ".ion": true, ".ionrc": true, ".rc": true, ".rcrc": true, ".tsh": true, ".etsh": true, ".elv": true, }
ALTEXTENSIONS collects some alternative shell script file extensions.
var ALTFILENAMES = map[string]bool{ "csh.login": true, "csh.logout": true, "rc.elv": true, }
ALTFILENAMES matches some alternative shell script profile filenames.
var ALTINTERPRETERS = map[string]bool{ "osh": true, "lksh": true, "csh": true, "tcsh": true, "fish": true, "ion": true, "rc": true, "tsh": true, "etsh": true, "elvish": true, }
ALTINTERPRETERS collects some alternative shell interpreters.
var BOMS = map[string]bool{ "\uFFBBBF": true, "\uFEFF": true, "\uFFFE": true, "\u0000FEFF": true, "\uFFFE0000": true, "\u2B2F7638": true, "\u2B2F7639": true, "\u2B2F762B": true, "\u2B2F762F": true, }
BOMS acts as a registry set of known byte order mark sequences. See https://en.wikipedia.org/wiki/Byte_order_mark for more information.
var FullBashInterpreters = map[string]bool{ "bash": true, "bash4": true, }
FullBashInterpreters notes when a shell has the basic modern bash features, as opposed to subsets such as ash, dash, posh, ksh, and zsh.
var INTERPRETERS2POSIXyNESS = map[string]bool{ "sh": true, "tsh": false, "etsh": false, "bash": true, "bash4": true, "bosh": true, "yash": true, "zsh": true, "hsh": true, "lksh": false, "ksh": true, "ksh88": true, "pdksh": true, "ksh93": true, "mksh": true, "oksh": true, "rksh": true, "dash": true, "posh": true, "ash": true, "csh": false, "tcsh": false, "fish": false, "rc": false, "python": false, "jython": false, "perl": false, "perl6": false, "ruby": false, "jruby": false, "php": false, "lua": false, "node": false, "awk": false, "gawk": false, "sed": false, "swift": false, "tclsh": false, "ion": false, "elvish": false, "expect": false, "stash": false, }
INTERPRETERS2POSIXyNESS is a fairly exhaustive map of interpreters to whether or not the interpreter is a POSIX compatible shell. Newly minted interpreters can be added by stank contributors.
var Interpreter2SyntaxValidator = map[string]func(Smell) error{ "generic-sh": POSIXShCheckSyntax, "sh": POSIXShCheckSyntax, "ash": UnixCheckSyntax, "bash": UnixCheckSyntax, "bash4": UnixCheckSyntax, "dash": UnixCheckSyntax, "posh": UnixCheckSyntax, "elvish": UnixCheckSyntax, "ksh": UnixCheckSyntax, "ksh88": UnixCheckSyntax, "ksh93": UnixCheckSyntax, "mksh": UnixCheckSyntax, "oksh": UnixCheckSyntax, "pdksh": UnixCheckSyntax, "rksh": UnixCheckSyntax, "lksh": UnixCheckSyntax, "bosh": UnixCheckSyntax, "osh": UnixCheckSyntax, "yash": UnixCheckSyntax, "zsh": UnixCheckSyntax, "csh": UnixCheckSyntax, "tcsh": UnixCheckSyntax, "rc": UnixCheckSyntax, "fish": UnixCheckSyntax, "make": UnixCheckSyntax, "gmake": UnixCheckSyntax, "bmake": UnixCheckSyntax, "pmake": UnixCheckSyntax, "perl": PerlishCheckSyntax, "perl6": PerlishCheckSyntax, "ruby": PerlishCheckSyntax, "node": PerlishCheckSyntax, "iojs": PerlishCheckSyntax, "php": PHPCheckSyntax, "python": PythonCheckSyntax, "python3": PythonCheckSyntax, "go": GoCheckSyntax, "gawk": GNUAwkCheckSyntax, }
Interpreter2SyntaxValidator provides syntax validator delegates, if one is available.
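The delegate table above lends itself to a simple lookup-and-call pattern. The following is a minimal sketch of that pattern, not the library's own code: the `smell` type, the `validators` map, the stub delegates, `errNoValidator`, and `checkSyntax` are all hypothetical stand-ins so that the example runs without importing stank.

```go
package main

import (
	"errors"
	"fmt"
)

// smell is a hypothetical, trimmed-down stand-in for stank's Smell type.
type smell struct {
	Path        string
	Interpreter string
}

// errNoValidator is a hypothetical sentinel for interpreters
// that have no registered syntax validator.
var errNoValidator = errors.New("no syntax validator registered")

// validators mirrors the shape of Interpreter2SyntaxValidator,
// using stub delegates that always succeed.
var validators = map[string]func(smell) error{
	"sh":   func(s smell) error { return nil },
	"bash": func(s smell) error { return nil },
}

// checkSyntax looks up the delegate for the smell's interpreter
// and runs it, or reports that none is available.
func checkSyntax(s smell) error {
	validate, ok := validators[s.Interpreter]
	if !ok {
		return fmt.Errorf("%w: %q", errNoValidator, s.Interpreter)
	}
	return validate(s)
}

func main() {
	fmt.Println(checkSyntax(smell{Path: "build.sh", Interpreter: "sh"}))
	fmt.Println(checkSyntax(smell{Path: "script.lua", Interpreter: "lua"}))
}
```

Checking the `ok` result of the map lookup matters here: calling a nil function value from a missing key would panic.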
var KshInterpreters = map[string]bool{ "ksh": true, "ksh88": true, "pdksh": true, "ksh93": true, "mksh": true, "oksh": true, "rksh": true, }
KshInterpreters notes when a shell is a member of the modern ksh family.
var LOWEREXTENSIONS2CONFIG = map[string]bool{ ".shrc": true, ".shinit": true, ".profile": true, ".bash_profile": true, ".bashrc": true, ".bash_login": true, ".bash_logout": true, ".ashrc": true, ".dashrc": true, ".kshrc": true, ".zshenv": true, ".zprofile": true, ".zshrc": true, ".zlogin": true, ".zlogout": true, ".cshrc": true, ".tcshrc": true, ".fishrc": true, ".rcrc": true, ".ionrc": true, }
LOWEREXTENSIONS2CONFIG is a fairly exhaustive map of lowercase file extensions to whether or not they represent shell script configurations. Newly minted extensions can be added by stank contributors.
var LOWEREXTENSIONS2INTERPRETER = map[string]string{
".sh": "sh",
".shrc": "sh",
".shinit": "sh",
".bash": "bash",
".bashrc": "bash",
".zsh": "zsh",
".zshrc": "zsh",
".zlogin": "zsh",
".zlogout": "zsh",
".hsh": "hsh",
".ksh": "ksh",
".lkshrc": "lksh",
".kshrc": "ksh",
".ksh88": "ksh",
".pdksh": "pdksh",
".pdkshrc": "pdksh",
".ksh93": "ksh93",
".ksh93rc": "ksh93",
".mksh": "mksh",
".mkshrc": "mksh",
".dash": "dash",
".dashrc": "dash",
".poshrc": "posh",
"ash": "ash",
".ashrc": "ash",
".zshenv": "zsh",
".zprofile": "zsh",
".csh": "csh",
".cshrc": "csh",
".tcsh": "tcsh",
".tcshrc": "tcsh",
".fish": "fish",
".fishrc": "fish",
".rc": "rc",
".rcrc": "rc",
".ion": "ion",
".ionrc": "ion",
".profile": "sh",
".bash_profile": "bash",
".bash_login": "bash",
".bash_logout": "bash",
".zshprofile": "zsh",
".elv": "elvish",
".php": "php",
".lua": "lua",
".mf": "make",
".makefile": "make",
".gnumakefile": "gmake",
".bsdmakefile": "bmake",
".pmakefile": "pmake",
".awk": "awk",
".gawk": "gawk",
".sed": "sed",
}
LOWEREXTENSIONS2INTERPRETER is a fairly exhaustive map of lowercase file extensions to their corresponding interpreters. Newly minted config extensions can be added by stank contributors.
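Because the keys are lowercase, a caller would normally lowercase the extension before the lookup. The sketch below illustrates that idea with a small hypothetical excerpt of the map; `extensionToInterpreter` and `interpreterForPath` are illustrative names, not part of the stank API.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extensionToInterpreter is a small hypothetical excerpt in the
// shape of LOWEREXTENSIONS2INTERPRETER.
var extensionToInterpreter = map[string]string{
	".sh":   "sh",
	".bash": "bash",
	".zsh":  "zsh",
	".fish": "fish",
}

// interpreterForPath lowercases the file extension and looks it up.
// The boolean result reports whether the extension is known.
func interpreterForPath(pth string) (string, bool) {
	ext := strings.ToLower(filepath.Ext(pth))
	interpreter, ok := extensionToInterpreter[ext]
	return interpreter, ok
}

func main() {
	for _, pth := range []string{"deploy.BASH", "prompt.fish", "README"} {
		interpreter, ok := interpreterForPath(pth)
		fmt.Printf("%s: interpreter=%q known=%t\n", pth, interpreter, ok)
	}
}
```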
var LOWEREXTENSIONS2POSIXyNESS = map[string]bool{}/* 101 elements not displayed */
LOWEREXTENSIONS2POSIXyNESS is a fairly exhaustive map of lowercase file extensions to whether or not they represent POSIX shell scripts. Newly minted extensions can be added by stank contributors.
var LOWERFILENAMES2CONFIG = map[string]bool{ "shrc": true, "shinit": true, "profile": true, "login": true, "logout": true, "bash_login": true, "bash_logout": true, "zshenv": true, "zprofile": true, "zshrc": true, "zlogin": true, "zlogout": true, "csh.login": true, "csh.logout": true, "tcsh.login": true, "tcsh.logout": true, "rcrc": true, "rc.elv": true, }
LOWERFILENAMES2CONFIG is a fairly exhaustive map of lowercase filenames to whether or not they represent shell script configurations. Newly minted config filenames can be added by stank contributors.
var LOWERFILENAMES2INTERPRETER = map[string]string{
".shrc": "sh",
".shinit": "sh",
".bashrc": "bash",
".zshrc": "zsh",
".zlogin": "zsh",
".zlogout": "zsh",
".lkshrc": "lksh",
".kshrc": "ksh",
".pdkshrc": "pdksh",
".ksh93rc": "ksh93",
".mkshrc": "mksh",
".dashrc": "dash",
".poshrc": "posh",
".ashrc": "ash",
".zshenv": "zsh",
".zprofile": "zsh",
".cshrc": "csh",
".tcshrc": "tcsh",
".fishrc": "fish",
".rcrc": "rc",
".ionrc": "ion",
"profile": "sh",
".login": "sh",
".logout": "sh",
"zshenv": "zsh",
"zprofile": "zsh",
"zshrc": "zsh",
"zlogin": "zsh",
"zlogout": "zsh",
"csh.login": "csh",
"csh.logout": "csh",
"tcsh.login": "tcsh",
"tcsh.logout": "tcsh",
"rc.elv": "elvish",
"makefile": "make",
"gnumakefile": "gmake",
"bsdmakefile": "bmake",
"pmakefile": "pmake",
}
LOWERFILENAMES2INTERPRETER is a fairly exhaustive map of lowercase filenames to their corresponding interpreters. Newly minted config filenames can be added by stank contributors.
var LOWERFILENAMES2POSIXyNESS = map[string]bool{ "shrc": true, "shinit": true, ".profile": true, "profile": true, "login": true, "logout": true, "bash_login": true, "bash_logout": true, "zshenv": true, "zprofile": true, "zshrc": true, "zlogin": true, "zlogout": true, "csh.login": false, "csh.logout": false, "tcsh.login": false, "tcsh.logout": false, "rcrc": false, "makefile": false, "readme": false, "changelog": false, "rc.elv": false, "thumbs.db": false, }
LOWERFILENAMES2POSIXyNESS is a fairly exhaustive map of lowercase filenames to whether or not they represent POSIX shell scripts. Newly minted config filenames can be added by stank contributors.
var LOWERMACHINEEXTENSIONS = map[string]bool{ ".sample": true, }
LOWERMACHINEEXTENSIONS collects a rather truncated survey of machine-generated file extensions likely to not be edited directly by most shell script authors.
Functions ¶
func GNUAwkCheckSyntax ¶ added in v0.0.17
GNUAwkCheckSyntax validates syntax for GNU awk files.
func GoCheckSyntax ¶ added in v0.0.17
GoCheckSyntax validates syntax for Go.
func IsAltShellScript ¶ added in v0.0.12
IsAltShellScript returns whether a smell represents a non-POSIX, but nonetheless similar kind of low-level shell script language.
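One plausible way to make such a decision is to consult the three ALT* registries in turn. The sketch below assumes that approach; the actual implementation may weigh these signals differently. The maps are abbreviated excerpts, and `looksLikeAltShell` is a hypothetical helper that takes a path and interpreter name rather than a Smell.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Abbreviated, hypothetical excerpts in the shape of
// ALTINTERPRETERS, ALTEXTENSIONS, and ALTFILENAMES.
var (
	altInterpreters = map[string]bool{"csh": true, "tcsh": true, "fish": true}
	altExtensions   = map[string]bool{".csh": true, ".tcsh": true, ".fish": true}
	altFilenames    = map[string]bool{"csh.login": true, "csh.logout": true}
)

// looksLikeAltShell reports whether any of the three registries
// matches the interpreter, lowercase extension, or lowercase filename.
// This is an assumed combination rule, not the library's own logic.
func looksLikeAltShell(pth, interpreter string) bool {
	name := strings.ToLower(filepath.Base(pth))
	ext := strings.ToLower(filepath.Ext(pth))
	return altInterpreters[interpreter] || altExtensions[ext] || altFilenames[name]
}

func main() {
	fmt.Println(looksLikeAltShell("setup.fish", ""))
	fmt.Println(looksLikeAltShell("/etc/csh.login", ""))
	fmt.Println(looksLikeAltShell("build.sh", "sh"))
}
```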
func PHPCheckSyntax ¶ added in v0.0.17
PHPCheckSyntax validates syntax for PHP.
func POSIXShCheckSyntax ¶ added in v0.0.17
POSIXShCheckSyntax validates syntax for strict POSIX sh compliance.
func PerlishCheckSyntax ¶ added in v0.0.17
PerlishCheckSyntax validates syntax for Perl, Ruby, and Node.js.
func PythonCheckSyntax ¶ added in v0.0.17
PythonCheckSyntax validates syntax for Python.
func UnixCheckSyntax ¶ added in v0.0.17
UnixCheckSyntax validates syntax for the wider UNIX shell family.
Types ¶
type Smell ¶
type Smell struct { Path string Filename string Basename string Extension string Symlink bool Shebang string Interpreter string InterpreterFlags []string LineEnding string FinalEOL *bool ContainsCR bool Permissions os.FileMode Directory bool OwnerExecutable bool Library bool BOM bool POSIXy bool Bash bool Ksh bool AltShellScript bool CoreConfiguration bool MachineGenerated bool }
Smell describes the overall impression of a file's POSIXyness, using several factors to determine with a reasonably high accuracy whether or not the file is a POSIX compatible shell script.
An idiomatic shebang preferably leads the file, such as #!/bin/bash, #!/bin/zsh, #!/bin/sh, etc. This represents good form when writing shell scripts, in particular ensuring that the script will be evaluated by the right interpreter, even if the extension is omitted or a generic ".sh". Shell scripts, whether executable applications or source'able libraries, should include a shebang. One attribute not analyzed by this library is unix file permission bits. Application shell scripts should set the executable bit(s) to 1, while shell scripts intended to be sourced or imported should not set these bits. Either way, the bits have hardly any correlation with the POSIXyness of a file, as the false positives and false negatives are too frequent.
Common filenames for POSIX compatible scripts include .profile, .login, .bashrc, .bash_profile, .zshrc, .kshrc, .envrc*, and names for git hooks. The stank library will catalog some of these standard names, though application-specific filenames are various and sundry. Ultimately, all files containing POSIX compatible shell content should include a shebang, to help interpreters, editors, and linters identify POSIX shell content.
File extension is another way to estimate a script's POSIXyness. For example, ".bash", ".ksh", ".posh", ".sh", etc. would each indicate a POSIX compatible shell script, whereas ".py", ".pl", ".rb", ".csh", ".rc", and so on would indicate a nonPOSIX script. File extensions are often omitted or set to a generic ".sh" for command line applications, in which case the extension is insufficient for establishing the POSIX vs. nonPOSIX nature of the script. This is why shebangs are so important; while file extensions can be helpful, shell scripts really rely more so on the shebang for self identification, and extensions aren't always desirable, as unix CLI applications prefer to omit the extension from the filename for brevity. Note that some filenames such as ".profile" may be logically considered to have basename "" (blank) and extension ".profile", or basename ".profile" with extension ".profile", or else basename ".profile" and extension "" (blank). In practice, Go treats both the basename and extension for these kinds of files as containing ".profile", and Smell will behave accordingly.
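The dotfile behavior described above can be observed directly with the standard `path/filepath` package. In this small sketch, `splitName` is a hypothetical helper that simply pairs `filepath.Base` with `filepath.Ext`; how Smell populates its own Basename and Extension fields is not shown here.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// splitName returns the final path element and the extension
// exactly as path/filepath reports them.
func splitName(pth string) (base, ext string) {
	return filepath.Base(pth), filepath.Ext(pth)
}

func main() {
	for _, pth := range []string{"/home/user/.profile", "install.sh", "configure"} {
		base, ext := splitName(pth)
		fmt.Printf("%-20s base=%q ext=%q\n", pth, base, ext)
	}
}
```

For a dotfile with no further dots, both results are the full name, because `filepath.Ext` returns the suffix starting at the final dot of the final path element.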
File encoding is also a sensitive matter for shell scripts. Generally, the ASCII subset is recommended for maximum portability. If your terminal supports it, the LANG environment variable can be altered to accept UTF-8 and other encodings, enabling raw UTF-8 data to be used in script contents. However, this restricts your scripts to running only on systems explicitly configured to match the encoding/locale of your script, and tends to further limit the platforms for your script to specifically GNU libc Linux distributions, so using nonASCII content in your scripts is inadvisable. Shell scripts conforming to POSIX should really use pure ASCII characters. NonUTF-8 encodings such as UTF-16, UTF-32, and even nonUnicode encodings like EBCDIC, Latin1, and KOI8-R usually indicate a nonPOSIX shell script, or even a localization file or other nonscript. These encodings are encountered less often than ASCII and UTF-8, and are generally considered legacy formats. For performance reasons, the stank library will not attempt to discern the exact encoding of a file, but merely report whether the file leads with a byte order marker such as 0xEFBBBF (UTF-8) or 0xFEFF (UTF-16, UTF-32). If a BOM is present, then the file is Unicode, which may lead to a stank warning, as POSIX shell scripts are best written in pure ASCII, for maximum cross-platform compatibility. BOMs are outside of the 127 max integer range for ASCII values, so a file with a BOM is likely not a POSIX shell script, while a file without a BOM may be a POSIX shell script.
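A leading byte order mark can be detected by comparing the first few bytes of a file against known prefixes. The sketch below works on raw bytes with `bytes.HasPrefix`; the `knownBOMs` table and `detectBOM` function are hypothetical and cover only the common UTF-8, UTF-16, and UTF-32 marks, not the library's full BOMS registry.

```go
package main

import (
	"bytes"
	"fmt"
)

// knownBOMs is an illustrative subset of byte order marks.
// Longer marks come first so that UTF-32 LE (FF FE 00 00)
// is not mistaken for UTF-16 LE (FF FE).
var knownBOMs = []struct {
	name string
	mark []byte
}{
	{"UTF-32 BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32 LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF-16 BE", []byte{0xFE, 0xFF}},
	{"UTF-16 LE", []byte{0xFF, 0xFE}},
}

// detectBOM reports the name of the byte order mark that leads
// head, if any. Only the first four bytes are ever inspected.
func detectBOM(head []byte) (string, bool) {
	for _, bom := range knownBOMs {
		if bytes.HasPrefix(head, bom.mark) {
			return bom.name, true
		}
	}
	return "", false
}

func main() {
	withBOM := append([]byte{0xEF, 0xBB, 0xBF}, []byte("#!/bin/sh\n")...)
	fmt.Println(detectBOM(withBOM))
	fmt.Println(detectBOM([]byte("#!/bin/sh\n")))
}
```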
Line endings for POSIX shell scripts should be LF="\n" in C-style notation. Alternative line endings such as CRLF="\r\n", ancient Macintosh CR="\r", and bizarre forms like vertical tab (ASCII code 0x0B) or form feed (ASCII code 0x0C) are possible in a fuzzing sense, but may lead to undefined behavior depending on the particular shell interpreter. For the purposes of identifying POSIX vs nonPOSIX scripts, a Smell will look for LF, CRLF, and CR; and ignore the presence or absence of these other exotic whitespace separators. NonPOSIX scripts written in Windows, such as Python and Ruby scripts, are ideally written with LF line endings, though it is common to observe CRLF endings, as Windows users more frequently invoke these as "python script.py", "ruby script.rb", rather than the bare "script" or dot slash "./script" forms typically used by unix administrators. For performance, the stank library will not report possible multiple line ending styles, such as poorly formatted text files featuring both CRLF and LF line endings. The library will simply report the first confirmed line ending style.
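Reporting only the first confirmed line ending amounts to a single forward scan that stops at the first LF or CR. The following is a minimal sketch of that idea over an in-memory buffer; `firstLineEnding` is a hypothetical helper, and a CR that happens to fall on the last byte of a partial buffer would be reported as a bare CR.

```go
package main

import "fmt"

// firstLineEnding scans data and returns the first line ending found:
// "\n" (LF), "\r\n" (CRLF), or "\r" (CR). It returns "" if none occurs.
// Later, possibly different, line endings are never examined.
func firstLineEnding(data []byte) string {
	for i, b := range data {
		switch b {
		case '\n':
			return "\n"
		case '\r':
			if i+1 < len(data) && data[i+1] == '\n' {
				return "\r\n"
			}
			return "\r"
		}
	}
	return ""
}

func main() {
	samples := []string{"echo hi\n", "echo hi\r\necho bye\n", "echo hi\recho bye", "echo hi"}
	for _, s := range samples {
		fmt.Printf("%q -> %q\n", s, firstLineEnding([]byte(s)))
	}
}
```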
Moreover, POSIX line ending LF is expected at the end of a text file, so a final end of line character "\n" is good form. Common unix utilities such as cat expect this final EOL, and will misrender the successive shell prompt when processing files that omit the final EOL. Make expects a final EOL, and gcc may produce malformed .c code if the .h header files neglect to include a final EOL. For performance reasons, the stank library will not attempt to read the entire file to report on the presence/absence of a final EOL. Shell script authors should nonetheless configure their text editors to consistently include a final EOL in the vast majority of text file formats.
A POSIXy flag indicates that, to the best of the stank library's ability, a file is identified as either very likely a POSIX shell script, or something else. Something else encompasses nonPOSIX scripts such as Csh, Tcsh, Python, Ruby, and Lua scripts, as well as nonscript files such as multimedia images, audio, rich text documents, machine code, and other nonUTF-8, nonASCII content.
Scripts referencing "sh" are generally considered to be POSIX sh, ignoring unmarked legacy Thompson sh scripts.
Unknown, even more obscure languages are assumed to be non-POSIXy.
Languages with duplicate names (e.g. oil shell osh vs. OpenSolaris oil shell) are generally assumed not to be POSIXy, as they cannot be disambiguated without more specific information (shebang names, file extensions).
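The rule that unknown interpreters default to non-POSIXy falls out naturally from a Go `map[string]bool`, where a missing key yields `false`. A minimal sketch, assuming a small hypothetical excerpt in the shape of INTERPRETERS2POSIXyNESS and an illustrative `isPOSIXy` helper:

```go
package main

import "fmt"

// interpreterPOSIXyness is a hypothetical excerpt in the shape of
// INTERPRETERS2POSIXyNESS.
var interpreterPOSIXyness = map[string]bool{
	"sh":     true,
	"bash":   true,
	"ksh":    true,
	"csh":    false,
	"fish":   false,
	"python": false,
}

// isPOSIXy reports whether the interpreter is registered as POSIX
// compatible. Interpreters absent from the map yield the zero value
// false, so unknown languages are treated as non-POSIXy.
func isPOSIXy(interpreter string) bool {
	return interpreterPOSIXyness[interpreter]
}

func main() {
	for _, name := range []string{"sh", "csh", "some-new-shell"} {
		fmt.Printf("%s: POSIXy=%t\n", name, isPOSIXy(name))
	}
}
```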
func Sniff ¶
func Sniff(pth string, config SniffConfig) (Smell, error)
Sniff analyzes the holistic smell of a given file path, returning a Smell record of key indicators tending towards either POSIX compliance or noncompliance, including a flag for the final "POSIXy" trace scent of the file.
For performance, if the scent of one or more attributes obviously indicates POSIX or nonPOSIX, Sniff() may short-circuit, setting the POSIXy flag and returning a record with some attributes set to zero value.
Polyglot and multiline shebangs are technically possible in languages that do not support native POSIX-style shebang comments ( see https://rosettacode.org/wiki/Multiline_shebang ). However, Sniff() can reliably identify only ^#!.+$ POSIX-style shebangs, and will populate the Shebang field accordingly.
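To make the `^#!.+$` form concrete, the sketch below matches a first line against that pattern and extracts an interpreter name and flags. It is illustrative only: `parseShebang` is a hypothetical helper, and the handling of an `env` launcher (as in `#!/usr/bin/env bash`) is an assumption rather than a description of how Sniff populates Interpreter and InterpreterFlags.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// shebangPattern is the POSIX-style single-line shebang form.
var shebangPattern = regexp.MustCompile(`^#!.+$`)

// parseShebang extracts the interpreter basename and any flags from
// the first line of a file. ok is false when the line is not a shebang.
func parseShebang(firstLine string) (interpreter string, flags []string, ok bool) {
	line := strings.TrimRight(firstLine, "\r\n")
	if !shebangPattern.MatchString(line) {
		return "", nil, false
	}
	fields := strings.Fields(strings.TrimPrefix(line, "#!"))
	if len(fields) == 0 {
		return "", nil, false
	}
	// Assumption: skip a leading env launcher so that
	// "#!/usr/bin/env bash" resolves to "bash".
	if filepath.Base(fields[0]) == "env" && len(fields) > 1 {
		fields = fields[1:]
	}
	return filepath.Base(fields[0]), fields[1:], true
}

func main() {
	for _, line := range []string{"#!/bin/sh\n", "#!/usr/bin/env bash -e\n", "echo hi\n"} {
		interpreter, flags, ok := parseShebang(line)
		fmt.Printf("%q -> interpreter=%q flags=%q ok=%t\n", line, interpreter, flags, ok)
	}
}
```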
If an I/O problem occurs during analysis, an error value will be set. Otherwise, the error value will be nil.
type SniffConfig ¶ added in v0.0.9
SniffConfig bundles together the various options when sniffing files for POSIXyNESS.