README
godlp
I. Introduction
To protect enterprise data security and privacy, godlp provides a set of methods for finding and handling sensitive data, including sensitive-data detection algorithms, de-identification (masking) methods, business-customizable configuration options, and the ability to process large volumes of data. godlp can apply a variety of privacy compliance standards to classify and label raw data, determine its sensitivity level, and apply the corresponding de-identification.
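II. Key Capabilities
godlp supports both structured data (JSON data, KV data, Go maps) and unstructured data (multilingual strings).
1. Automatic discovery of sensitive data
DLP has many built-in sensitive-data detection rules. It identifies the sensitive types present in raw data so that sensitive information can be handled properly.
2. De-identification of sensitive data
DLP supports multiple masking algorithms, so a business can apply different de-identification to sensitive data as needed.
3. Business-customizable configuration
Beyond the default detection and handling rules, a business can configure custom YAML rules to suit its situation. DLP carries out the corresponding data processing according to the configuration passed in.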
III. Getting Started
go get github.com/VisionaryZeng/dmsdk@latest
Sample code is in mainrun/mainrun.go.
Run the following commands in the godlp source root to build and run:
make
make run
make test
make bench
API Description
dlpheader defines the data structures and constants that the godlp SDK needs. The SDK provides the following APIs for detecting and de-identifying sensitive information.
- ApplyConfig(conf string) error
  - Applies a configuration passed in as a string.
- ApplyConfigFile(filePath string) error
  - Applies the configuration file at filePath.
- Detect(inputText string) ([]*DetectResult, error)
  - Detects sensitive information in a string.
- DetectMap(inputMap map[string]string) ([]*DetectResult, error)
  - Detects sensitive information in a KV map (map[string]string).
- DetectJSON(jsonText string) ([]*DetectResult, error)
  - Detects sensitive information in a JSON string.
- Deidentify(inputText string) (string, []*DetectResult, error)
  - Detects sensitive information in a string first, then returns the string masked according to the rules, together with the detection results.
- DeidentifyMap(inputMap map[string]string) (map[string]string, []*DetectResult, error)
  - Detects sensitive information in a KV map first, then returns the map masked according to the rules.
- ShowResults(resultArray []*DetectResult)
  - Prints the detection results to the console.
- Mask(inputText string, methodName string) (string, error)
  - Masks inputText directly with a predefined method from MaskRules in the config.
- Close()
  - Closes the engine object and releases the memory of its inner objects.
- GetVersion() string
  - Returns the DLP SDK version string.
- RegisterMasker(maskName string, maskFunc func(string) (string, error)) error
  - Registers a custom (DIY) masking function.
- NewLogProcesser() logs.Processor
  - Creates a log processor for the logs package, used to de-identify logs.
- MaskStruct(inObj interface{}) (interface{}, error)
  - Masks a struct object directly, according to the mask rules defined in its field tags.
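IV. Rule File
See conf.yml for the rule file.
The config file is written in YAML and has three parts: Global, MaskRules, and Rules.
- Global contains settings that affect DLP as a whole, such as the API version, the IDs of disabled rules, and whether a backend service is used to assist detection.
- MaskRules contains the configuration of masking operations, such as character masking and replacement.
- Rules contains the detection and handling rules. A detection pass consists of three sequential stages: Detect, Filter, and Verify. Handling must reference a mask rule defined in MaskRules.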
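The three-stage detection flow described for Rules can be pictured with a small standalone sketch. It does not use godlp's rule engine or the conf.yml schema. The regular expression, the filter list, the Luhn check, and the helper names findSensitive and maskAll are hypothetical stand-ins, chosen only to show a candidate passing through Detect, Filter, and Verify in order before a masking step is applied.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// candidateRe is a hypothetical Detect rule: runs of 13 to 19 digits.
var candidateRe = regexp.MustCompile(`\d{13,19}`)

// filtered is a hypothetical Filter list of values known to be harmless.
var filtered = map[string]bool{"0000000000000000": true}

// luhnValid is a hypothetical Verify step: the Luhn checksum used by card numbers.
func luhnValid(digits string) bool {
	sum := 0
	double := false
	for i := len(digits) - 1; i >= 0; i-- {
		d := int(digits[i] - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

// findSensitive runs Detect, Filter, and Verify in order and returns the survivors.
func findSensitive(text string) []string {
	var out []string
	for _, c := range candidateRe.FindAllString(text, -1) { // Detect
		if filtered[c] { // Filter
			continue
		}
		if !luhnValid(c) { // Verify
			continue
		}
		out = append(out, c)
	}
	return out
}

// maskAll replaces every surviving match with asterisks of the same length.
func maskAll(text string) string {
	for _, s := range findSensitive(text) {
		text = strings.ReplaceAll(text, s, strings.Repeat("*", len(s)))
	}
	return text
}

func main() {
	fmt.Println(maskAll("card 4111111111111111, order 1234567890123456"))
}
```

In this sketch the order number matches the Detect pattern but fails the checksum in Verify, so it is left untouched. Only the value that passes all three stages is masked.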
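V. Architecture
godlp is built around the Engine struct, and the Engine object implements the EngineAPI interface. The interface is implemented directly in sdk.go, sdkdeidentify.go, sdkdetect.go, and sdkmask.go. Deidentify and mask operations call further into the detector and mask submodules in the subdirectories.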
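5.1 Files
- sdk.go: implements the business-independent APIs of the EngineAPI interface, such as Close().
- sdk_test.go: unit test cases.
- sdkconfig.go: implements the configuration APIs, such as ApplyConfig().
- sdkdeidentify.go: implements the de-identification APIs.
- sdkdetect.go: implements the sensitive-information detection APIs.
- sdkinternal.go: implements the internal functions of the Engine object.
- sdkmask.go: implements the direct masking APIs.
- conf.yml: the built-in default configuration file, containing the rules maintained by DLP.
- bindata.go: data file generated by go generate; it embeds conf.yml.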
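5.2 Subdirectories
- conf: implements the DlpConf struct and handles configuration files.
- detector: internal implementation of the sensitive-information detection logic.
- errlist: list of error messages.
- mask: internal implementation of direct masking.
- util: helper functions.
- dlpheader: interface header definitions for the DLP SDK.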
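VI. Acknowledgements
Since the DLP project was launched, it has depended on the hard work of its developers. Our sincerest thanks go to everyone who has written code for DLP. The names below are listed in no particular order.
- 丁保增: responsible for the DLP 1.0 module that verifies detected information.
- 王聪: responsible for the DLP 1.0 official website, the JSON detection and handling modules, and integration with multiple projects.
- 王赛: responsible for the DLP 1.0 de-identification module.
- 苏宁宁: responsible for DLP 1.0 performance and accuracy testing.
- 王帅: responsible for the DLP 1.0 API header files.
- 鲁云飞: responsible for the DLP 1.0 AI module and NLP service.
- 石岚: responsible for the DLP 1.0 AI module, the big-data processing API module, releases, and more.
- 黄勇辉: responsible for the DLP 1.0 AI module; optimized and updated a large number of rules.
- 张宇鹏: contributed to the DLP 1.0 AI module.
- 李赛南: contributed to the DLP 1.0 AI module.
- 王珩: responsible for the DLP 1.0 format-preserving encryption and order-preserving encryption modules.
- 夏世文: responsible for DLP 1.0 performance optimization and rule code implementation; carried out most of the joint development work with multiple projects.
- 罗同龙: submitted a PR for DLP 2.0 that optimizes log processing performance.
- 乔鑫: responsible for the DLP 2.0 server-side code, SDK performance optimization, and technical implementation.
- 杨经宇: responsible for the overall DLP 1.0 and 2.0 project.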
Documentation
Overview ¶
Package dlp provides the DLP SDK API implementation.
sdkconfig.go implements the config-related APIs.
sdkdeidentify.go implements the deidentify-related APIs.
sdkdetect.go implements the DLP detect APIs.
sdkinternal.go implements the internal APIs for DLP.
sdkmask.go implements the Mask API.
Index ¶
- Constants
- Variables
- func Asset(name string) ([]byte, error)
- func AssetDigest(name string) ([sha256.Size]byte, error)
- func AssetDir(name string) ([]string, error)
- func AssetInfo(name string) (os.FileInfo, error)
- func AssetNames() []string
- func AssetString(name string) (string, error)
- func B2S(b []byte) string
- func Digests() (map[string][sha256.Size]byte, error)
- func MustAsset(name string) []byte
- func MustAssetString(name string) string
- func NewEngine(callerID string) (dlpheader.EngineAPI, error)
- func RestoreAsset(dir, name string) error
- func RestoreAssets(dir, name string) error
- func S2B(s string) (b []byte)
- type DIYMaskWorker
- type DescribeRulesResponse
- type Engine
- func (I *Engine) ApplyConfig(confString string) error
- func (I *Engine) ApplyConfigDefault() error
- func (I *Engine) ApplyConfigFile(filePath string) error
- func (I *Engine) Close()
- func (I *Engine) Deidentify(inputText string) (outputText string, retResults []*dlpheader.DetectResult, retErr error)
- func (I *Engine) DeidentifyJSON(jsonText string) (outStr string, retResults []*dlpheader.DetectResult, retErr error)
- func (I *Engine) DeidentifyJSONByResult(jsonText string, detectResults []*dlpheader.DetectResult) (outStr string, retErr error)
- func (I *Engine) DeidentifyMap(inputMap map[string]string) (outMap map[string]string, retResults []*dlpheader.DetectResult, retErr error)
- func (I *Engine) Detect(inputText string) (retResults []*dlpheader.DetectResult, retErr error)
- func (I *Engine) DetectJSON(jsonText string) (retResults []*dlpheader.DetectResult, retErr error)
- func (I *Engine) DetectMap(inputMap map[string]string) (retResults []*dlpheader.DetectResult, retErr error)
- func (I *Engine) DisableAllRules() error
- func (I *Engine) GetDefaultConf() string
- func (I *Engine) GetVersion() string
- func (I *Engine) Mask(inputText string, methodName string) (outputText string, err error)
- func (I *Engine) MaskStruct(inPtr interface{}) (outPtr interface{}, retErr error)
- func (I *Engine) NewDIYMaskWorker(maskName string, maskFunc func(string) (string, error)) (mask.MaskAPI, error)
- func (I *Engine) NewEmptyLogProcessor() dlpheader.Processor
- func (I *Engine) NewLogProcessor() dlpheader.Processor
- func (I *Engine) RegisterMasker(maskName string, maskFunc func(string) (string, error)) error
- func (I *Engine) ShowDlpConf() error
- func (I *Engine) ShowResults(results []*dlpheader.DetectResult)
- type HttpResponseBase
- type ResultList
Constants ¶
const (
	// outer const values
	Version     = "v1.2.15"
	PackageName = "github.com/VisionaryZeng/dmsdk"
	FullVer     = PackageName + "@" + Version
)
Constant values for dlp.
const (
	DEF_MAX_INPUT     = 1024 * 1024                      // 1MB, the max input string length
	DEF_LIMIT_ERR     = "<--[DLP] Log Limit Exceeded-->" // append to log if limit is exceeded
	DEF_MAX_LOG_ITEM  = 16                               // max input items for log
	DEF_RESULT_SIZE   = 4                                // default results size for array allocation
	DEF_LineBlockSize = 1024                             // default line block
	DEF_CUTTER        = " /\r\n\\[](){}:=\"',"           // default cutter for finding KV object in string
	DEF_MAX_ITEM      = 1024 * 4                         // max input items for MAP API
	DEF_MAX_CALL_DEEP = 5                                // max call depth for MaskStruct
)
Constants for default values.
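The comment on DEF_CUTTER calls it the default cutter for finding KV objects in a string. One plausible reading is that its characters act as delimiters when free text is scanned for key and value tokens. The sketch below only illustrates that reading with strings.FieldsFunc. The cutter constant is copied from the documented value, while splitTokens is a hypothetical helper and not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// cutter mirrors the documented DEF_CUTTER value.
const cutter = " /\r\n\\[](){}:=\"',"

// splitTokens is a hypothetical helper that breaks text at any cutter character
// and drops the empty pieces between adjacent delimiters.
func splitTokens(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return strings.ContainsRune(cutter, r)
	})
}

func main() {
	fmt.Println(splitTokens(`{"phone": "13800138000", name=alice}`))
}
```

With this delimiter set, both JSON-like and key=value fragments fall apart into alternating key and value tokens, which is the kind of input a KV-oriented rule could then inspect.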
const AssetDebug = false
AssetDebug is true if the assets were built with the debug flag enabled.
Variables ¶
var (
	DEF_MAX_LOG_INPUT     int32 = 1024 // default 1KB, the max input length for log, change it in conf
	DEF_MAX_REGEX_RULE_ID int32 = 0    // default 0, no regex rule is used for log by default, change it in conf
)
var DEF_CFG string
Embeds conf.yml as an asset in the Go binary.
Functions ¶
func Asset ¶
Asset loads and returns the asset for the given name. It returns an error if the asset could not be found or could not be loaded.
func AssetDigest ¶
AssetDigest returns the digest of the file with the given name. It returns an error if the asset could not be found or the digest could not be loaded.
func AssetDir ¶
AssetDir returns the file names below a certain directory embedded in the file by go-bindata. For example if you run go-bindata on data/... and data contains the following hierarchy:
data/
  foo.txt
  img/
    a.png
    b.png
then AssetDir("data") would return []string{"foo.txt", "img"}, AssetDir("data/img") would return []string{"a.png", "b.png"}, AssetDir("foo.txt") and AssetDir("notexist") would return an error, and AssetDir("") will return []string{"data"}.
func AssetInfo ¶
AssetInfo loads and returns the asset info for the given name. It returns an error if the asset could not be found or could not be loaded.
func AssetString ¶
AssetString returns the asset contents as a string (instead of a []byte).
func MustAsset ¶
MustAsset is like Asset but panics when Asset would return an error. It simplifies safe initialization of global variables.
func MustAssetString ¶
MustAssetString is like AssetString but panics when Asset would return an error. It simplifies safe initialization of global variables.
func RestoreAsset ¶
RestoreAsset restores an asset under the given directory.
func RestoreAssets ¶
RestoreAssets restores an asset under the given directory recursively.
Types ¶
type DIYMaskWorker ¶
type DIYMaskWorker struct {
// contains filtered or unexported fields
}
DIYMaskWorker stores maskFunc and maskName.
func (*DIYMaskWorker) GetRuleName ¶
func (I *DIYMaskWorker) GetRuleName() string
GetRuleName is required by mask.MaskAPI
func (*DIYMaskWorker) Mask ¶
func (I *DIYMaskWorker) Mask(in string) (string, error)
Mask is required by mask.MaskAPI
func (*DIYMaskWorker) MaskResult ¶
func (I *DIYMaskWorker) MaskResult(res *dlpheader.DetectResult) error
MaskResult is required by mask.MaskAPI
type DescribeRulesResponse ¶
type DescribeRulesResponse struct {
	HttpResponseBase
	Rule []byte `json:"rule,omitempty"`
	Crc  uint32 `json:"crc,omitempty"` // CRC of the rule
}
type Engine ¶
type Engine struct {
	Version string
	// contains filtered or unexported fields
}
Engine Object implements all DLP API functions
func (*Engine) ApplyConfig ¶
ApplyConfig applies a configuration passed in as a string.
func (*Engine) ApplyConfigDefault ¶
func (*Engine) ApplyConfigFile ¶
ApplyConfigFile applies the configuration file at filePath.
func (*Engine) Close ¶
func (I *Engine) Close()
Close releases inner objects, such as the detector and masker.
func (*Engine) Deidentify ¶
func (I *Engine) Deidentify(inputText string) (outputText string, retResults []*dlpheader.DetectResult, retErr error)
Deidentify first detects sensitive information in the string, then returns the string masked according to the rules, together with the detection results.
func (*Engine) DeidentifyJSON ¶
func (I *Engine) DeidentifyJSON(jsonText string) (outStr string, retResults []*dlpheader.DetectResult, retErr error)
DeidentifyJSON first detects sensitive information in jsonText, then masks it according to the rules and returns the masked JSON object as a string, together with the detection results.
func (*Engine) DeidentifyJSONByResult ¶
func (I *Engine) DeidentifyJSONByResult(jsonText string, detectResults []*dlpheader.DetectResult) (outStr string, retErr error)
DeidentifyJSONByResult masks the JSON according to the []*dlpheader.DetectResult passed in and returns the masked JSON object as a string. You may want to call DetectJSON first to obtain the []*dlpheader.DetectResult.
func (*Engine) DeidentifyMap ¶
func (I *Engine) DeidentifyMap(inputMap map[string]string) (outMap map[string]string, retResults []*dlpheader.DetectResult, retErr error)
DeidentifyMap first detects sensitive information in the KV map (map[string]string), then returns the map masked according to the rules.
func (*Engine) Detect ¶
func (I *Engine) Detect(inputText string) (retResults []*dlpheader.DetectResult, retErr error)
Detect finds sensitive information in the input string.
func (*Engine) DetectJSON ¶
func (I *Engine) DetectJSON(jsonText string) (retResults []*dlpheader.DetectResult, retErr error)
DetectJSON detects sensitive information in a JSON string.
func (*Engine) DetectMap ¶
func (I *Engine) DetectMap(inputMap map[string]string) (retResults []*dlpheader.DetectResult, retErr error)
DetectMap detects sensitive information in a KV map (map[string]string).
func (*Engine) DisableAllRules ¶
ApplyConfigDefault will use the embedded local config; it is only used by the DLP team. Business code must not use it.
func (*Engine) GetDefaultConf ¶
GetDefaultConf returns the default config string.
func (*Engine) MaskStruct ¶
MaskStruct masks a struct object directly, according to the mask rules defined in its field tags. It modifies the object itself: pass in a pointer, and a pointer is returned.
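Tag-driven masking of this kind is commonly built on the reflect package. The following is a minimal sketch of the idea under stated assumptions, not the package's implementation: the tag value "star", the maskers table, and the helpers maskStruct and walk are all hypothetical, and real rule names would come from MaskRules in the config. The depth limit mirrors the documented DEF_MAX_CALL_DEEP value.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"strings"
)

// maskers maps hypothetical rule names to masking functions.
var maskers = map[string]func(string) string{
	"star": func(s string) string { return strings.Repeat("*", len(s)) },
}

// maxDepth mirrors the documented DEF_MAX_CALL_DEEP limit.
const maxDepth = 5

// maskStruct rewrites tagged string fields of the struct behind ptr in place
// and returns the same pointer.
func maskStruct(ptr interface{}) (interface{}, error) {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return nil, errors.New("maskStruct needs a non-nil pointer to a struct")
	}
	walk(v.Elem(), 0)
	return ptr, nil
}

// walk visits exported fields and descends into nested structs up to maxDepth.
func walk(v reflect.Value, depth int) {
	if depth >= maxDepth {
		return
	}
	t := v.Type()
	for i := 0; i < v.NumField(); i++ {
		f := v.Field(i)
		if !f.CanSet() {
			continue // unexported field, cannot be modified
		}
		switch f.Kind() {
		case reflect.String:
			if fn, ok := maskers[t.Field(i).Tag.Get("mask")]; ok {
				f.SetString(fn(f.String()))
			}
		case reflect.Struct:
			walk(f, depth+1)
		}
	}
}

type Address struct {
	City   string
	Street string `mask:"star"`
}

type User struct {
	Name  string
	Phone string `mask:"star"`
	Addr  Address
}

func main() {
	u := &User{Name: "alice", Phone: "13800138000", Addr: Address{City: "Paris", Street: "Main St"}}
	out, err := maskStruct(u)
	fmt.Println(out.(*User).Phone, u.Addr.Street, err)
}
```

Passing a pointer is what makes in-place modification possible: fields reached through a non-pointer value are not settable through reflection, which is why the sketch rejects plain struct values.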
func (*Engine) NewDIYMaskWorker ¶
func (I *Engine) NewDIYMaskWorker(maskName string, maskFunc func(string) (string, error)) (mask.MaskAPI, error)
NewDIYMaskWorker creates mask.MaskAPI object
func (*Engine) NewEmptyLogProcessor ¶
NewEmptyLogProcessor creates a log processor that does nothing. Business code must not use it.
func (*Engine) NewLogProcessor ¶
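NewLogProcessor creates a log processor for the logs package. After it has been called, the engine can be used only for log processing, because the rules are specially optimized for logs and are not suitable for the other APIs.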
func (*Engine) RegisterMasker ¶
RegisterMasker registers a custom (DIY) masking function.
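A custom masker only has to match the func(string) (string, error) shape shown in the RegisterMasker signature. The sketch below defines one such function on its own so it can run without the SDK. The function keepEnds and the rule name "KeepEnds" are hypothetical, and the registration call appears only as a comment because it needs a configured engine.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// keepEnds is a hypothetical masker with the func(string) (string, error)
// shape that RegisterMasker accepts. It keeps the first and last rune and
// replaces everything between them with asterisks.
func keepEnds(in string) (string, error) {
	r := []rune(in)
	if len(r) < 3 {
		return "", errors.New("input too short to mask")
	}
	return string(r[0]) + strings.Repeat("*", len(r)-2) + string(r[len(r)-1]), nil
}

func main() {
	// With an engine in hand, registration would presumably look like:
	//   err := eng.RegisterMasker("KeepEnds", keepEnds)
	out, err := keepEnds("13800138000")
	fmt.Println(out, err)
}
```

Working on runes rather than bytes keeps multi-byte characters intact. Returning an error for very short inputs is a choice made for this sketch; a real masker might instead mask the whole value.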
func (*Engine) ShowResults ¶
func (I *Engine) ShowResults(results []*dlpheader.DetectResult)
ShowResults prints the detection results to the console.
type HttpResponseBase ¶
type ResultList ¶
type ResultList []*dlpheader.DetectResult
ResultList is defined so that results can be sorted in mergeResults.
func (ResultList) Contain ¶
func (a ResultList) Contain(i, j int) bool
Contain checks whether a[i] contains a[j]
func (ResultList) Equal ¶
func (a ResultList) Equal(i, j int) bool
Equal checks whether positions are equal
func (ResultList) Less ¶
func (a ResultList) Less(i, j int) bool
Less function is used for sort in mergeResults
func (ResultList) Swap ¶
func (a ResultList) Swap(i, j int)
Swap function is used for sort in mergeResults
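The ResultList methods suggest a sort-then-merge pass over detection results. The sketch below shows one way such a pass could work, using a hypothetical span type in place of dlpheader.DetectResult, whose fields are not shown here. The ordering in Less, the half-open range convention, and the merge helper are assumptions made for illustration, not the behavior of mergeResults.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical stand-in for a detection result: a half-open
// byte range [Start, End) in the input text.
type span struct{ Start, End int }

// spanList mirrors the method set documented for ResultList, plus Len,
// which sort.Interface requires.
type spanList []span

func (a spanList) Len() int      { return len(a) }
func (a spanList) Swap(i, j int) { a[i], a[j] = a[j], a[i] }

// Less orders by start offset, then puts longer spans first.
func (a spanList) Less(i, j int) bool {
	if a[i].Start != a[j].Start {
		return a[i].Start < a[j].Start
	}
	return a[i].End > a[j].End
}

// Equal reports whether a[i] and a[j] cover the same range.
func (a spanList) Equal(i, j int) bool { return a[i] == a[j] }

// Contain reports whether a[i] fully covers a[j].
func (a spanList) Contain(i, j int) bool {
	return a[i].Start <= a[j].Start && a[j].End <= a[i].End
}

// merge sorts in (in place) and drops every span covered by the most
// recently kept one. After sorting, checking only that span is enough.
func merge(in spanList) spanList {
	sort.Sort(in)
	var out spanList
	last := -1 // index in `in` of the most recently kept span
	for i := range in {
		if last >= 0 && in.Contain(last, i) {
			continue
		}
		out = append(out, in[i])
		last = i
	}
	return out
}

func main() {
	fmt.Println(merge(spanList{{5, 8}, {0, 10}, {2, 4}, {12, 15}, {12, 15}}))
}
```

Spans that merely overlap are both kept in this sketch. Only duplicates and fully covered spans are removed.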
Directories
Path | Synopsis |
---|---|
conf | Package conf provides configuration handler for dlp |
detector | Package detector implements detector functions |
dlpheader | Package dlpheader defines API information about DLP SDK, including DetectResult, mask methods and API functions. |
mask | Package mask implements Mask API |