Overview
This package compiles graph files describing a non-deterministic automaton to a compact(ed) deterministic automaton.
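Compiling a nondeterministic automaton into a deterministic one is classically done with the subset (powerset) construction, where every DFA state stands for the set of NFA states reachable on the same input. The overview does not say which algorithm ccda uses or how it compacts the result. The Go sketch below therefore only illustrates the textbook construction. The `NFA` and `DFA` types and the `Determinize` function are hypothetical and are not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// NFA is a hypothetical, minimal NFA type used only for this sketch.
// Edges[s][r] lists the successors of state s on rune r; r == 0 marks an ε-transition.
type NFA struct {
	Start int
	Final map[int]bool
	Edges map[int]map[rune][]int
}

// closure returns the ε-closure of a set of NFA states.
func (n NFA) closure(states []int) map[int]bool {
	seen := map[int]bool{}
	stack := append([]int(nil), states...)
	for len(stack) > 0 {
		s := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if !seen[s] {
			seen[s] = true
			stack = append(stack, n.Edges[s][0]...)
		}
	}
	return seen
}

// key turns a state set into a canonical name such as "[0 1]".
func key(set map[int]bool) string {
	ids := make([]int, 0, len(set))
	for s := range set {
		ids = append(ids, s)
	}
	sort.Ints(ids)
	return fmt.Sprint(ids)
}

// DFA (hypothetical) names its states by the key of the NFA state set they stand for.
type DFA struct {
	Start string
	Final map[string]bool
	Trans map[string]map[rune]string
}

// Determinize runs the subset construction over the given alphabet.
func Determinize(n NFA, alphabet []rune) DFA {
	start := n.closure([]int{n.Start})
	d := DFA{Start: key(start), Final: map[string]bool{}, Trans: map[string]map[rune]string{}}
	for queue := []map[int]bool{start}; len(queue) > 0; queue = queue[1:] {
		cur, k := queue[0], key(queue[0])
		if _, done := d.Trans[k]; done {
			continue // this subset was already expanded
		}
		d.Trans[k] = map[rune]string{}
		for s := range cur {
			d.Final[k] = d.Final[k] || n.Final[s]
		}
		for _, r := range alphabet {
			var next []int
			for s := range cur {
				next = append(next, n.Edges[s][r]...)
			}
			if len(next) == 0 {
				continue // no successor: the DFA rejects r from this state
			}
			target := n.closure(next)
			d.Trans[k][r] = key(target)
			queue = append(queue, target)
		}
	}
	return d
}

// Accepts runs the DFA over the whole input.
func (d DFA) Accepts(input string) bool {
	cur := d.Start
	for _, r := range input {
		next, ok := d.Trans[cur][r]
		if !ok {
			return false
		}
		cur = next
	}
	return d.Final[cur]
}

func main() {
	// NFA for (a|b)*b: state 0 loops on a and b; b also leads to the final state 1.
	n := NFA{Start: 0, Final: map[int]bool{1: true},
		Edges: map[int]map[rune][]int{0: {'a': {0}, 'b': {0, 1}}}}
	d := Determinize(n, []rune{'a', 'b'})
	fmt.Println(len(d.Trans), d.Accepts("abab"), d.Accepts("aba"))
}
```

For this three-transition NFA, the construction yields two DFA states, `[0]` and `[0 1]`. Only `[0 1]` is final.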
Usage

    ccda [IN] OUT

Compiles the input graph file IN and writes the automaton to OUT. If IN is omitted, the graph file is read from stdin.

    ccda -R [-start] [-end] IN [FILE...]

Reads the compact(ed) automaton from IN and rewrites the contents of FILE... to stdout, using the automaton as a rewrite lexicon. If no files are given, the input is read from stdin.
Graph file syntax
Lines starting with `#` are comments, except for the directives `#final`, `#define`, `#include` and `#rename` described below; empty lines are ignored.

    # State names
    # 0 denotes the initial state; there cannot be more than one initial state.
    # State names are strings without any whitespace. They cannot contain `@` and/or `\`
    # and cannot start with `_`.
    # Final states have to be marked using `#final NAME REWRITE`.
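The naming rules above can be written as a small Go check. `validStateName` is a hypothetical helper, not ccda's API, and ccda's own validation may differ. Rejecting the empty string is an added assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// validStateName (hypothetical) applies the state naming rules: no whitespace,
// no `@` or `\`, and no leading `_`. Rejecting "" is an assumption of this sketch.
func validStateName(name string) bool {
	switch {
	case name == "", strings.HasPrefix(name, "_"):
		return false // empty, or starts with `_`
	case strings.ContainsAny(name, `@\`):
		return false // contains `@` or `\`
	default:
		return strings.IndexFunc(name, unicode.IsSpace) < 0 // no whitespace
	}
}

func main() {
	for _, n := range []string{"0", "noun_end", "_tmp", "a@b", `x\y`, "two words"} {
		fmt.Printf("%-12q %v\n", n, validStateName(n))
	}
}
```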

    # Final states
    # To mark the state NAME final and set its rewrite string, use:
    #final NAME final string data ...
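The following is a minimal Go sketch of reading a `#final` line. It assumes the keyword is followed by whitespace and that the rewrite string is everything after NAME, so it may contain spaces, as `final string data ...` suggests. `parseFinal` is a hypothetical name. Returning an empty rewrite string when none is given is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parseFinal (hypothetical) splits a `#final NAME REWRITE` line into the state
// name and its rewrite string. Assumption: everything after NAME, trimmed, is
// the rewrite string; a missing rewrite string yields "".
func parseFinal(line string) (name, rewrite string, ok bool) {
	if !strings.HasPrefix(line, "#final") {
		return "", "", false
	}
	rest := line[len("#final"):]
	if rest == "" || !unicode.IsSpace(rune(rest[0])) {
		return "", "", false // e.g. "#finalize" is not the #final directive
	}
	rest = strings.TrimSpace(rest)
	if rest == "" {
		return "", "", false // NAME is required
	}
	if i := strings.IndexFunc(rest, unicode.IsSpace); i >= 0 {
		return rest[:i], strings.TrimSpace(rest[i:]), true
	}
	return rest, "", true
}

func main() {
	name, rewrite, ok := parseFinal("#final 1 final string data ...")
	fmt.Printf("%q %q %v\n", name, rewrite, ok)
}
```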

    # Transitions
    # A transition leads from one state to the next.
    # Use valid state names to reference different states.
    # To denote a transition from state SRC to state DST accepting EXPR,
    # use: SRC DST EXPR
    # To denote an empty (automatic) transition from SRC to DST, use: SRC DST
    # Note that any leading and/or trailing whitespace around SRC, EXPR and DST
    # is ignored.
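Both transition forms can be parsed roughly as follows. `Transition` and `parseTransition` are hypothetical. Splitting on whitespace and taking the remainder of the line after DST as EXPR is an assumption about how ccda tokenizes these lines.

```go
package main

import (
	"fmt"
	"strings"
)

// Transition is a hypothetical representation of one transition line.
// An empty Expr marks an empty (automatic) transition.
type Transition struct {
	Src, Dst, Expr string
}

// parseTransition parses "SRC DST EXPR" or "SRC DST", ignoring surrounding
// whitespace. Treating the rest of the line after DST as EXPR is an assumption.
func parseTransition(line string) (Transition, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return Transition{}, fmt.Errorf("transition needs SRC and DST: %q", line)
	}
	rest := strings.TrimSpace(line)
	rest = strings.TrimSpace(rest[len(fields[0]):]) // drop SRC
	rest = strings.TrimSpace(rest[len(fields[1]):]) // drop DST
	return Transition{Src: fields[0], Dst: fields[1], Expr: rest}, nil
}

func main() {
	for _, l := range []string{"0 1 [a-z]", "  1 0  ", "0 2 <l>"} {
		tr, err := parseTransition(l)
		fmt.Printf("%+v %v\n", tr, err)
	}
}
```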

    # Replacements (macros)
    # To denote the replacement of XXX with YYY in all EXPR, use:
    #define XXX YYY
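Macro expansion amounts to textual substitution before an EXPR is compiled. This sketch assumes that XXX and YYY contain no whitespace and that definitions are applied in declaration order. The names `macro`, `parseDefine` and `expandMacros` are hypothetical. The example reuses the `<d>` and `<l>` macros from the expression examples.

```go
package main

import (
	"fmt"
	"strings"
)

// macro (hypothetical) holds one `#define XXX YYY` line.
type macro struct{ name, body string }

// parseDefine reads a `#define XXX YYY` line; this sketch assumes neither
// XXX nor YYY contains whitespace.
func parseDefine(line string) (macro, bool) {
	f := strings.Fields(line)
	if len(f) != 3 || f[0] != "#define" {
		return macro{}, false
	}
	return macro{name: f[1], body: f[2]}, true
}

// expandMacros substitutes every macro in expr. Applying the definitions in
// declaration order is an assumption, not documented behaviour.
func expandMacros(expr string, defs []macro) string {
	for _, d := range defs {
		expr = strings.ReplaceAll(expr, d.name, d.body)
	}
	return expr
}

func main() {
	var defs []macro
	for _, l := range []string{"#define <d> [0-9]", "#define <l> [a-z]"} {
		if m, ok := parseDefine(l); ok {
			defs = append(defs, m)
		}
	}
	fmt.Println(expandMacros("<l>", defs), expandMacros("<d>+", defs))
}
```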

    # Special symbols
    # The following symbols have a special meaning in EXPR:
    # - `.()[]*+?\`
    # To use any of the above symbols literally in an expression,
    # you have to escape them using `\`. This includes symbols
    # in macros and within square brackets (`[...]`).
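Escaping a literal string for use in an EXPR can be automated. The hypothetical `escapeLiteral` helper below escapes exactly the symbols listed above and nothing else.

```go
package main

import (
	"fmt"
	"strings"
)

// special lists exactly the symbols named above; other characters with a
// possible meaning (e.g. `^` or `-` inside [...], or `@`) are not handled here.
const special = `.()[]*+?\`

// escapeLiteral (hypothetical) prefixes every special symbol with `\` so that
// s is matched literally when used in an EXPR.
func escapeLiteral(s string) string {
	var b strings.Builder
	for _, r := range s {
		if strings.ContainsRune(special, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(escapeLiteral("3.14"), escapeLiteral("[|]"), escapeLiteral("(a+b)*"))
}
```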

    # Include graph files
    # To include the contents of another file, use:
    #include /path/to/file

    # Renaming state names
    # To rename state names (or parts of state names), use:
    #rename OLD NEW
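The following sketch shows what a rename might do to transition lines. It assumes the rename applies to the SRC and DST names of transitions that have already been read, and never to EXPR. `Transition` and `applyRename` are hypothetical. Which lines a `#rename` actually affects is not documented.

```go
package main

import (
	"fmt"
	"strings"
)

// Transition is a hypothetical representation of one transition line.
type Transition struct{ Src, Dst, Expr string }

// applyRename (hypothetical) replaces every occurrence of from with to inside
// the SRC and DST state names, leaving EXPR untouched. Substring matching
// follows the "(or parts of state names)" wording; the scope is an assumption.
func applyRename(ts []Transition, from, to string) []Transition {
	out := make([]Transition, len(ts))
	for i, t := range ts {
		t.Src = strings.ReplaceAll(t.Src, from, to)
		t.Dst = strings.ReplaceAll(t.Dst, from, to)
		out[i] = t
	}
	return out
}

func main() {
	ts := []Transition{{"0", "noun_a", "[a-z]+"}, {"noun_a", "1", "s"}}
	fmt.Println(applyRename(ts, "noun", "n"))
}
```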

    ##################################################
    # Expressions (assuming a final state named `1`) #
    ##################################################

    # Character classes
    # Accepted language: A|B|...|Z|a|b|...|z
    0 1 [A-Za-z]

    # Negated character classes
    # Accepts any sequence of characters without a or b
    0 1 [^ab]*

    # Dot accepts anything
    # Accepts any sequence of characters.
    0 1 .*

    # One or more matches
    # Accepted language: (a|b|...|z)(a|b|...|z)*
    0 1 ([a-z])+

    # Zero or more matches
    # Accepted language: (a|b|...|z)*
    0 1 [a-z]*

    # Optional matches
    # Accepted language: (0|1|...|9)+((.(0|1|...|9)+)|ε)
    0 1 [0-9]+(\.[0-9]+)?

    # Combination of expressions
    # Accepted language: (abc)*(0|1|...|9)+
    0 1 (abc)*[0-9]+
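To experiment with what the expressions above accept, most of them have a close counterpart in Go's `regexp` package. This is only an approximation, not ccda's matcher. The patterns are anchored with `^...$` because an automaton accepts whole inputs. The `(?s)` flag is needed so that Go's `.` also matches newlines. The decimal-number example uses `\.` because an unescaped `.` accepts any character. The `example` type and the `accepts` helper are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// example (hypothetical) pairs an expression from above with a rough Go regexp
// counterpart. Patterns are anchored because the automaton accepts whole inputs.
type example struct {
	expr string
	re   *regexp.Regexp
}

var examples = []example{
	{`[A-Za-z]`, regexp.MustCompile(`^[A-Za-z]$`)},
	{`[^ab]*`, regexp.MustCompile(`^[^ab]*$`)},
	{`.*`, regexp.MustCompile(`(?s)^.*$`)}, // (?s): let . match \n as well
	{`([a-z])+`, regexp.MustCompile(`^([a-z])+$`)},
	{`[a-z]*`, regexp.MustCompile(`^[a-z]*$`)},
	{`[0-9]+(\.[0-9]+)?`, regexp.MustCompile(`^[0-9]+(\.[0-9]+)?$`)},
	{`(abc)*[0-9]+`, regexp.MustCompile(`^(abc)*[0-9]+$`)},
}

// accepts reports whether the Go counterpart of expr matches the whole input.
func accepts(expr, input string) bool {
	for _, ex := range examples {
		if ex.expr == expr {
			return ex.re.MatchString(input)
		}
	}
	panic("no Go counterpart for " + expr)
}

func main() {
	for _, in := range []string{"", "Q", "xyz", "3.14", "abcabc42"} {
		for _, ex := range examples {
			fmt.Printf("%-20s %-10q %v\n", ex.expr, in, accepts(ex.expr, in))
		}
	}
}
```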

    # Empty transitions
    # Accepted language: (a|b|...|z)*
    0 1 [a-z]
    1 0
    0 1
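Empty transitions are what make the graph non-deterministic. A direct simulation keeps a set of active states. After every step, it adds all states reachable through empty transitions (the ε-closure). The sketch below hard-codes the three transitions above and is only an illustration, since ccda itself compiles the graph into a deterministic automaton instead. The names `edge`, `closure` and `accepts` are hypothetical.

```go
package main

import "fmt"

// edge (hypothetical) is one transition of the example graph;
// match == nil marks an empty transition.
type edge struct {
	src, dst int
	match    func(rune) bool
}

// The graph above: 0 1 [a-z], 1 0 (empty), 0 1 (empty); state 1 is final.
var (
	graph = []edge{
		{0, 1, func(r rune) bool { return r >= 'a' && r <= 'z' }},
		{1, 0, nil},
		{0, 1, nil},
	}
	final = map[int]bool{1: true}
)

// closure extends set with every state reachable through empty transitions.
func closure(set map[int]bool) map[int]bool {
	for changed := true; changed; {
		changed = false
		for _, e := range graph {
			if e.match == nil && set[e.src] && !set[e.dst] {
				set[e.dst] = true
				changed = true
			}
		}
	}
	return set
}

// accepts tracks the set of active states while reading input.
func accepts(input string) bool {
	cur := closure(map[int]bool{0: true})
	for _, r := range input {
		next := map[int]bool{}
		for _, e := range graph {
			if e.match != nil && cur[e.src] && e.match(r) {
				next[e.dst] = true
			}
		}
		cur = closure(next)
	}
	for s := range cur {
		if final[s] {
			return true
		}
	}
	return false
}

func main() {
	for _, in := range []string{"", "abc", "a1"} {
		fmt.Printf("%q %v\n", in, accepts(in))
	}
}
```

The empty string is accepted because the empty transition `0 1` already reaches the final state before any input is read.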

    # Macros
    # Accepted language: (a|b|...|z)(0|1|...|9)+
    #define <d> [0-9]
    #define <l> [a-z]
    0 2 <l>
    2 1 <d>+

    # Dictionaries
    # Accepted language: abc|def
    0 1 @dict
    @dict abc
    @dict def

    # Accepted language: [0-9](abc|def)
    0 1 [0-9]@dict
    @dict abc
    @dict def

    # Accepted language: (abc|def)[0-9]
    0 1 @dict[0-9]
    @dict abc
    @dict def

    # Accepted language: (abc|def)ghi
    0 1 (@dict)ghi
    @dict abc
    @dict def

    # Escape syntax:
    # Accepted language: ([|])*
    0 1 ([\[\]])*

    # Escape sequences:
    # Accepted language: iä🦖
    0 1 \x69\u00e4\U0001F996
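These three escape forms coincide with Go's string-literal escapes, so `strconv.Unquote` can decode them for comparison. That ccda interprets them identically is an assumption, and `decodeEscapes` is a hypothetical helper. Note that Go's `\xHH` produces a single byte, which equals the intended character only for ASCII.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeEscapes (hypothetical) interprets \xHH, \uHHHH and \UHHHHHHHH by reusing
// Go's string literal rules via strconv.Unquote. Caveats: \xHH yields a single
// byte, a `"` in s breaks the quoting, and escapes Go does not know (such as \[)
// make Unquote fail.
func decodeEscapes(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	s, err := decodeEscapes(`\x69\u00e4\U0001F996`)
	fmt.Println(s, err)
}
```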

    # Unicode classes:
    # Unicode classes can be used in [...] expressions or directly in
    # normal expressions. Use \pN to refer to the Unicode class N or
    # \p{NAME} to refer to the Unicode class NAME.
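Go's `regexp` package uses the same notation: `\pN` takes a one-letter name and `\p{NAME}` takes a longer one. Both are resolved against Unicode general categories and scripts (for example `N`, `Lu` or `Greek`). Whether ccda accepts exactly the same names is an assumption. The sketch only shows the two usages, once directly in an expression and once inside `[...]`.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// \pN used directly in an expression: one or more characters of category N.
	numbers = regexp.MustCompile(`^\pN+$`)
	// \p{Greek} used inside a [...] class, here combined with whitespace.
	greekText = regexp.MustCompile(`^[\p{Greek}\s]+$`)
)

func main() {
	fmt.Println(numbers.MatchString("42٣"), numbers.MatchString("4a"))
	fmt.Println(greekText.MatchString("αβγ δ"), greekText.MatchString("abc"))
}
```

`٣` (U+0663, ARABIC-INDIC DIGIT THREE) belongs to category `Nd` and is therefore matched by `\pN`.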