Directories ¶
Path | Synopsis |
---|---|
cmd/analysestats | analysestats analyses a set of 'best', 'conf', and 'hocr' files in a directory, outputting results to a .csv file for further investigation. |
cmd/avg-lines | avg-lines prints a report of the average confidence for each line, sorted from worst to best. |
cmd/boxtotxt | boxtotxt converts a Tesseract .box file to plain text. |
cmd/bucket-lines | bucket-lines copies image-text line pairs into different directories according to the average character probability for the line. |
cmd/dehyphenate | dehyphenate does basic dehyphenation on an hOCR file. |
cmd/dlgbook | dlgbook is a wrapper around getgbook which gets metadata and uses it to save to a specially formatted directory. |
cmd/eeboxmltohocr | eeboxmltohocr converts the XML from an EEBO download to hOCR, which can be easily incorporated into a searchable PDF. |
cmd/extracthocrlines | extracthocrlines copies the text and corresponding image section for each line of an hOCR file into separate files, which is useful for OCR training. |
cmd/fonttobytes | fonttobytes outputs a font file as a series of bytes in Go format, allowing a font to be easily embedded into a Go binary. |
cmd/hocrtotxt | hocrtotxt prints the text from an hOCR file. |
cmd/iiifdownloader | iiifdownloader attempts to download every page of a IIIF book in the best available quality, given a manifest URL. |
cmd/pare-gt | pare-gt moves some ground truth, ensuring that the same proportions of each ground truth source are represented in the moved section. |
cmd/pgconf | pgconf prints the total confidence for a page of hOCR. |
pkg/hocr | hocr contains structures and functions for parsing and analysing hOCR files. |
pkg/line | line contains various functions to manipulate OCR lines. |
pkg/prob | prob processes .prob files generated by ocropus. |
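
As an illustration of the kind of report avg-lines is described as producing (average confidence per line, sorted from worst to best), here is a minimal sketch. It is not the tool's actual implementation: the `lineConf` type, the `average` and `worstFirst` helpers, and the sample data are all hypothetical, and the sketch assumes each line already comes with a list of numeric confidence values.

```go
package main

import (
	"fmt"
	"sort"
)

// lineConf is a hypothetical record: a line identifier plus the
// confidence values an OCR engine reported for that line.
type lineConf struct {
	Name  string
	Confs []float64
}

// average returns the arithmetic mean of vals, or 0 for an empty slice.
func average(vals []float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range vals {
		sum += v
	}
	return sum / float64(len(vals))
}

// worstFirst returns a copy of lines ordered by ascending average
// confidence, so the least reliable lines come first.
func worstFirst(lines []lineConf) []lineConf {
	out := make([]lineConf, len(lines))
	copy(out, lines)
	sort.SliceStable(out, func(i, j int) bool {
		return average(out[i].Confs) < average(out[j].Confs)
	})
	return out
}

func main() {
	// Sample data invented for this sketch.
	lines := []lineConf{
		{Name: "line_1", Confs: []float64{90, 80}},
		{Name: "line_2", Confs: []float64{40, 60}},
		{Name: "line_3", Confs: []float64{70}},
	}
	for _, l := range worstFirst(lines) {
		fmt.Printf("%s %.1f\n", l.Name, average(l.Confs))
	}
}
```

Sorting a copy leaves the caller's slice untouched, and the stable sort keeps lines with equal averages in their original order.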