Documentation ¶
Overview ¶
Package embargo performs the embargo for all sidestream data. A sidestream test is published if it is more than one year old, or if its server IP is in the list of M-Lab server IPs excluding the samknows sites. Otherwise the test is embargoed and saved in a private bucket, and it is published later, once it is more than one year old.
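The publish-or-embargo rule above can be sketched as a small predicate. This is only an illustration: shouldPublish, its parameters, and the sample whitelist are hypothetical and not part of the package API. Dates are assumed to be integers in yyyymmdd format, matching the cutoffDate convention used elsewhere in this package.

```go
package main

import "fmt"

// shouldPublish is a hypothetical helper, not part of the package API.
// testDate and cutoffDate are integers in yyyymmdd format; cutoffDate is
// assumed to be the date exactly one year before "now".
func shouldPublish(testDate, cutoffDate int, serverIP string, whitelist map[string]struct{}) bool {
	// Tests older than the cutoff are always published.
	if testDate < cutoffDate {
		return true
	}
	// Newer tests are published only when the server IP is whitelisted.
	_, ok := whitelist[serverIP]
	return ok
}

func main() {
	// Hypothetical whitelist with a single site IP.
	whitelist := map[string]struct{}{"4.34.58.34": {}}
	cutoff := 20160601

	fmt.Println(shouldPublish(20170315, cutoff, "4.34.58.34", whitelist))
	fmt.Println(shouldPublish(20170315, cutoff, "173.205.3.39", whitelist))
	fmt.Println(shouldPublish(20160101, cutoff, "173.205.3.39", whitelist))
}
```

In this sketch a test that fails both checks is the one that would go to the private bucket.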
Package embargo implements loading of site IPs from a public URL or a local file, and checks whether an IP is in the whitelist, which is the list of all sites except the samknows sites.
Parse a filename and return components such as log-time, IP, etc. Filename example: 20170315T01:00:00Z_173.205.3.39_0.web100
Package gcs implements a simple library for basic operations given bucket names and file name/prefix, such as ls, cp, rm, etc. on Google Cloud Storage.
Implement the unembargo process, which runs when previously embargoed files are more than one year old.
Index ¶
- func CompareBuckets(sourceBucket string, destBucket string) bool
- func CopyOneFile(sourceBucket string, destBucket string, fileName string) bool
- func CreateBucket(projectID string, bucketName string) bool
- func CreateService() *storage.Service
- func DeleteBucket(bucketName string) bool
- func DeleteFiles(bucketName string, prefixFileName string) bool
- func FilterSiteIPs(body []byte) (map[string]struct{}, error)
- func FormatDateAsInt(t time.Time) int
- func GetFileNamesFromBucket(bucketName string) []string
- func GetFileNamesWithPrefix(service *storage_v1.Service, bucketName string, prefixFileName string) (map[string]bool, error)
- func SyncTwoBuckets(sourceBucket string, destBucket string, prefixFileName string) bool
- func UnEmbargoOneDayLegacyFiles(sourceBucket string, destBucket string, prefixFileName string) error
- func UnembargoCron(date int) error
- func UpdateWhitelist() error
- func UploadFile(bucketName string, fileName string, targetdir string) bool
- type EmbargoConfig
- func (ec *EmbargoConfig) EmbargoOneDayData(date string, cutoffDate int) error
- func (ec *EmbargoConfig) EmbargoOneTar(content io.Reader, tarfileName string, moreThanOneYear bool) error
- func (ec *EmbargoConfig) EmbargoSingleFile(filename string) error
- func (ec *EmbargoConfig) SplitFile(content io.Reader, moreThanOneYear bool) (bytes.Buffer, bytes.Buffer, error)
- func (ec *EmbargoConfig) WriteResults(tarfileName string, embargoBuf, publicBuf bytes.Buffer) error
- type FileName
- type FileNameParser
- type Site
- type UnembargoConfig
- type WhitelistChecker
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CompareBuckets ¶
CompareBuckets reports whether two buckets contain exactly the same files. It returns true if they do.
func CopyOneFile ¶
CopyOneFile copies one file from one bucket to another bucket. It returns true on success. ("cp")
func CreateBucket ¶
CreateBucket creates a new bucket. It returns true if the bucket already exists or is created successfully.
func CreateService ¶
CreateService creates the GCS service used by the following functions.
func DeleteBucket ¶
DeleteBucket deletes the bucket if it is empty. ("rmdir")
func DeleteFiles ¶
DeleteFiles deletes all files with the specified prefix from the bucket. ("rm")
func FilterSiteIPs ¶
FilterSiteIPs parses the given bytes and returns a set (map[string]struct{}) of site IPs, filtering out all samknows sites. TODO: make the filter use positive checks, including the list of things other than samknows, rather than excluding samknows.
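A minimal sketch of this kind of filter follows. It is not the package's implementation: filterSiteIPs is a hypothetical stand-in, the input is assumed to be a JSON array of objects shaped like the Site type, a samknows site is assumed to be one whose hostname contains "samknows", and both IPv4 and IPv6 addresses are assumed to be collected. The hostnames and addresses in the example are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Site mirrors the documented struct used for parsing the site JSON file.
type Site struct {
	Hostname string `json:"hostname"`
	Ipv4     string `json:"ipv4"`
	Ipv6     string `json:"ipv6"`
}

// filterSiteIPs is a hypothetical stand-in for FilterSiteIPs.
// Assumption: samknows sites are identified by "samknows" in the hostname.
func filterSiteIPs(body []byte) (map[string]struct{}, error) {
	var sites []Site
	if err := json.Unmarshal(body, &sites); err != nil {
		return nil, err
	}
	ips := make(map[string]struct{})
	for _, s := range sites {
		if strings.Contains(s.Hostname, "samknows") {
			continue // excluded from the whitelist
		}
		// Skip empty address fields so "" never ends up in the set.
		if s.Ipv4 != "" {
			ips[s.Ipv4] = struct{}{}
		}
		if s.Ipv6 != "" {
			ips[s.Ipv6] = struct{}{}
		}
	}
	return ips, nil
}

func main() {
	// Made-up sites using documentation-reserved addresses.
	body := []byte(`[
		{"hostname": "mlab1.example.net", "ipv4": "192.0.2.1", "ipv6": "2001:db8::1"},
		{"hostname": "samknows1.example.net", "ipv4": "192.0.2.10", "ipv6": ""}
	]`)
	ips, err := filterSiteIPs(body)
	if err != nil {
		panic(err)
	}
	for ip := range ips {
		fmt.Println(ip)
	}
}
```

Using map[string]struct{} as the result gives constant-time membership checks without storing a value per IP.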
func FormatDateAsInt ¶
FormatDateAsInt returns a date as an integer in the format yyyymmdd.
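One way to produce such an integer is plain arithmetic on the date fields. The sketch below is an assumption about how this could be done, not the package's code; formatDateAsInt and dateAsInt are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// dateAsInt packs a calendar date into an integer of the form yyyymmdd.
// It is a hypothetical helper used only for this illustration.
func dateAsInt(year, month, day int) int {
	return year*10000 + month*100 + day
}

// formatDateAsInt is a hypothetical stand-in for FormatDateAsInt.
func formatDateAsInt(t time.Time) int {
	return dateAsInt(t.Year(), int(t.Month()), t.Day())
}

func main() {
	t := time.Date(2017, time.March, 15, 1, 0, 0, 0, time.UTC)
	fmt.Println(formatDateAsInt(t)) // 20170315
}
```

Integers in this format compare in calendar order, which is what makes a yyyymmdd cutoff date convenient.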
func GetFileNamesFromBucket ¶
GetFileNamesFromBucket returns an array of the file names in the bucket with the given name. ("ls")
func GetFileNamesWithPrefix ¶
func GetFileNamesWithPrefix(service *storage_v1.Service, bucketName string, prefixFileName string) (map[string]bool, error)
GetFileNamesWithPrefix gets the file names in the given bucket that have the given prefix. It uses the service passed in as the first argument.
func SyncTwoBuckets ¶
SyncTwoBuckets copies all files with prefixFileName from sourceBucket to destBucket if they are not already there. It returns true on success.
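The selection step of such a sync, deciding which files still need to be copied, can be sketched without touching GCS. The helper below is hypothetical and works on plain name lists; it assumes the destination listing is a map[string]bool like the one GetFileNamesWithPrefix returns. The file names in the example are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// missingInDest is a hypothetical helper, not part of the package API.
// It returns the source names that start with prefix and are not yet
// present in destNames, preserving the order of sourceNames.
func missingInDest(sourceNames []string, destNames map[string]bool, prefix string) []string {
	var missing []string
	for _, name := range sourceNames {
		if strings.HasPrefix(name, prefix) && !destNames[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	source := []string{
		"sidestream/2017/03/15/a.tgz",
		"sidestream/2017/03/15/b.tgz",
		"sidestream/2017/03/16/c.tgz",
	}
	dest := map[string]bool{"sidestream/2017/03/15/a.tgz": true}

	// Only b.tgz matches the prefix and is absent from the destination.
	fmt.Println(missingInDest(source, dest, "sidestream/2017/03/15"))
}
```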
func UnEmbargoOneDayLegacyFiles ¶
func UnEmbargoOneDayLegacyFiles(sourceBucket string, destBucket string, prefixFileName string) error
UnEmbargoOneDayLegacyFiles unembargoes one day of data in sourceBucket and writes the output to destBucket. The date is passed as prefixFileName in the format sidestream/yyyy/mm/dd.
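Since other functions in this package take the date as an integer in yyyymmdd format, a caller may need to turn that integer into a prefix of this form. The following is a sketch under that assumption; prefixForDate is a hypothetical helper and not part of the package.

```go
package main

import "fmt"

// prefixForDate is a hypothetical helper, not part of the package API.
// It converts a date given as an integer yyyymmdd into the prefix
// format sidestream/yyyy/mm/dd.
func prefixForDate(date int) string {
	year := date / 10000
	month := date / 100 % 100
	day := date % 100
	return fmt.Sprintf("sidestream/%04d/%02d/%02d", year, month, day)
}

func main() {
	fmt.Println(prefixForDate(20170315)) // sidestream/2017/03/15
}
```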
func UnembargoCron ¶
func UpdateWhitelist ¶
func UpdateWhitelist() error
UpdateWhitelist loads the site IP json file again and updates the whitelist in memory.
Types ¶
type EmbargoConfig ¶
type EmbargoConfig struct {
// contains filtered or unexported fields
}
EmbargoConfig is a struct that performs all embargo procedures.
var EmbargoSingleton *EmbargoConfig
EmbargoSingleton is the singleton pointer to the EmbargoConfig object.
func GetEmbargoConfig ¶
func GetEmbargoConfig(siteIPFile string) (*EmbargoConfig, error)
GetEmbargoConfig creates a new EmbargoConfig and returns it.
func (*EmbargoConfig) EmbargoOneDayData ¶
func (ec *EmbargoConfig) EmbargoOneDayData(date string, cutoffDate int) error
EmbargoOneDayData embargoes one day of files. The input date is a string in the format yyyymmdd. The cutoffDate is an integer in the format yyyymmdd. TODO: handle midway crash. Since the source bucket is unchanged, if it fails in the middle, we just rerun it for that specific day.
func (*EmbargoConfig) EmbargoOneTar ¶
func (ec *EmbargoConfig) EmbargoOneTar(content io.Reader, tarfileName string, moreThanOneYear bool) error
EmbargoOneTar processes one tar file and splits it into two files. The embargoed part is saved in a private bucket, and the unembargoed part is saved in a public bucket. The private file has a different name, so it can be copied to the public bucket directly when it becomes one year old. The tarfileName is like 20170516T000000Z-mlab1-atl06-sidestream-0000.tgz
func (*EmbargoConfig) EmbargoSingleFile ¶
func (ec *EmbargoConfig) EmbargoSingleFile(filename string) error
EmbargoSingleFile embargoes the input file.
func (*EmbargoConfig) SplitFile ¶
func (ec *EmbargoConfig) SplitFile(content io.Reader, moreThanOneYear bool) (bytes.Buffer, bytes.Buffer, error)
SplitFile splits one tar file into two buffers.
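The general technique of routing the entries of one tar stream into two new archives can be sketched with the standard archive/tar package. This is a simplified illustration, not the package's implementation: splitTar, makeTar, and listTar are hypothetical helpers, the input is assumed to be an uncompressed tar stream, and the decision for each entry is delegated to a caller-supplied predicate rather than to the whitelist and the moreThanOneYear flag.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// splitTar is a hypothetical, simplified analogue of SplitFile.
// Entries for which isPublic returns true go to the public archive, all
// others to the embargoed one. Assumption: content is uncompressed tar.
func splitTar(content io.Reader, isPublic func(name string) bool) (*bytes.Buffer, *bytes.Buffer, error) {
	embargoBuf, publicBuf := new(bytes.Buffer), new(bytes.Buffer)
	embargoTar, publicTar := tar.NewWriter(embargoBuf), tar.NewWriter(publicBuf)
	r := tar.NewReader(content)
	for {
		hdr, err := r.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, nil, err
		}
		w := embargoTar
		if isPublic(hdr.Name) {
			w = publicTar
		}
		if err := w.WriteHeader(hdr); err != nil {
			return nil, nil, err
		}
		if _, err := io.Copy(w, r); err != nil {
			return nil, nil, err
		}
	}
	// Close both writers so each archive gets its end-of-archive marker.
	if err := embargoTar.Close(); err != nil {
		return nil, nil, err
	}
	return embargoBuf, publicBuf, publicTar.Close()
}

// makeTar builds a small in-memory tar archive with one entry per name.
func makeTar(names []string) *bytes.Buffer {
	buf := new(bytes.Buffer)
	w := tar.NewWriter(buf)
	for _, name := range names {
		body := []byte("data for " + name)
		hdr := &tar.Header{Name: name, Mode: 0600, Size: int64(len(body))}
		if err := w.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := w.Write(body); err != nil {
			panic(err)
		}
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf
}

// listTar returns the entry names of a tar stream, in order.
func listTar(r io.Reader) []string {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names
		}
		if err != nil {
			panic(err)
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	isPublic := func(name string) bool { // hypothetical one-IP whitelist
		parts := strings.Split(name, "_")
		return len(parts) == 3 && parts[1] == "4.34.58.34"
	}
	in := makeTar([]string{
		"20170225T23:00:00Z_4.34.58.34_0.web100",
		"20170315T01:00:00Z_173.205.3.39_0.web100",
	})
	embargoBuf, publicBuf, err := splitTar(in, isPublic)
	if err != nil {
		panic(err)
	}
	fmt.Println("public:", listTar(publicBuf))
	fmt.Println("embargoed:", listTar(embargoBuf))
}
```

A real input named like the .tgz example would presumably need gzip decompression before the tar reader; that step is left out here.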
func (*EmbargoConfig) WriteResults ¶
func (ec *EmbargoConfig) WriteResults(tarfileName string, embargoBuf, publicBuf bytes.Buffer) error
WriteResults writes results to GCS.
type FileName ¶
type FileName struct {
Name string
}
func (*FileName) GetLocalIP ¶
GetLocalIP parses the filename and returns the IP. For the old format, it returns an empty string.
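A minimal sketch of extracting the IP from a name such as 20170315T01:00:00Z_173.205.3.39_0.web100 follows. It assumes the new format is three underscore-separated fields with the IP in the middle; the old format is not described here, so the sketch simply treats any name without that shape, or without a valid IP in the second field, as old format. localIP is a hypothetical stand-in for the method.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// localIP is a hypothetical stand-in for (*FileName).GetLocalIP.
// Assumption: new-format names look like <log-time>_<IP>_<suffix>.
func localIP(name string) string {
	parts := strings.Split(name, "_")
	if len(parts) != 3 {
		return "" // treated as old format in this sketch
	}
	// Reject fields that are not valid IPv4 or IPv6 addresses.
	if net.ParseIP(parts[1]) == nil {
		return ""
	}
	return parts[1]
}

func main() {
	fmt.Println(localIP("20170315T01:00:00Z_173.205.3.39_0.web100"))
	fmt.Printf("%q\n", localIP("no-ip-in-this-name.web100"))
}
```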
type FileNameParser ¶
type FileNameParser interface {
GetLocalIP()
GetDate()
}
type Site ¶
type Site struct {
Hostname string `json:"hostname"`
Ipv4 string `json:"ipv4"`
Ipv6 string `json:"ipv6"`
}
Site is a struct for parsing the site JSON file.
type UnembargoConfig ¶
type UnembargoConfig struct {
// contains filtered or unexported fields
}
func NewUnembargoConfig ¶
func NewUnembargoConfig(privateBucketName, publicBucketName string) *UnembargoConfig
func (*UnembargoConfig) Unembargo ¶
func (nc *UnembargoConfig) Unembargo(date int) error
Unembargo unembargoes the data of the input date, given in the format yyyymmdd. TODO(dev): add more validity checks for the input date.
type WhitelistChecker ¶
type WhitelistChecker struct {
EmbargoWhiteList map[string]struct{}
}
WhitelistChecker is a struct containing the map EmbargoWhiteList, which holds the M-Lab site IPs except those of the samknows sites.
func (*WhitelistChecker) CheckInWhiteList ¶
func (wc *WhitelistChecker) CheckInWhiteList(fileName string) bool
CheckInWhiteList checks whether the IP in fileName is in the embargo whitelist. The filename is like: 20170225T23:00:00Z_4.34.58.34_0.web100. For a file whose IP is in the site IP list, it returns true; for a file whose IP is not in the site IP list, it returns false.
func (*WhitelistChecker) LoadFromLocalWhitelist ¶
func (wc *WhitelistChecker) LoadFromLocalWhitelist(path string) error
LoadFromLocalWhitelist loads embargo IP whitelist from a local file.
func (*WhitelistChecker) LoadFromURL ¶
func (wc *WhitelistChecker) LoadFromURL(jsonURL string) error
LoadFromURL loads the embargo IP whitelist from a public URL. TODO: add unittest for this func.