uspto-bulk-data-tool

Version: v1.0.0 (latest)
Published: Jul 2, 2024
License: MIT

Repository: github.com/diverged/uspto-bulk-data-tool

README

USPTO-Bulk-Data-Tool

A configurable tool for concurrent processing of U.S. Patent and Trademark Office (USPTO) bulk data zip files.

At this time, the tool supports the following USPTO bulk data products:

- Patent Grant Full Text Data (No Images) (2004 - Present)
- Patent Application Full Text Data (No Images) (2004 - Present)

Given a directory of USPTO zip files, the application will produce one of the following outputs:

- Complete XML files of individual documents split out from the zip
- JSON files of individual documents
  - Selective (non-exhaustive) parsing of main document fields
  - Structured patent claims representing referential relationships, as in the original PatentPublicData tool
  - HTML formatting of Abstract and Description fields
- Apache Parquet files corresponding to the input bulk zip files

Usage

Clone this repository and edit config.toml as needed. The most important settings are the first three, in the [required] table:

[required]
inputdirectory = "data/in"
outputdirectory = "data/out"
outputmode = "json"

For the most basic setup, create a data/in directory in the project root and populate it with the zip files to process.

Then, from the project root directory, run:

make run

To run the application from somewhere other than the project root, pass the path to a config.toml file as the executable's single optional argument.
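The argument handling described above can be sketched as follows. Two details are assumptions, not documented behavior. The first is the fallback to `config.toml` in the working directory when no argument is given, inferred from the `make run` workflow. The second is rejecting extra arguments. `configPath` and `defaultConfigPath` are hypothetical names:

```go
package main

import (
	"fmt"
	"os"
)

// defaultConfigPath is an assumed fallback: ./config.toml in the
// working directory, as when running from the project root.
const defaultConfigPath = "config.toml"

// configPath returns the single optional argument when present,
// otherwise the default path. More than one argument is treated as an error.
func configPath(args []string) (string, error) {
	switch len(args) {
	case 0, 1:
		return defaultConfigPath, nil
	case 2:
		return args[1], nil
	default:
		return "", fmt.Errorf("usage: %s [path/to/config.toml]", args[0])
	}
}

func main() {
	path, err := configPath(os.Args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("using config:", path)
}
```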

License

MIT

Directories

Path
cmd
internal
  config
  controller
  logger
  outputhandler
