filewalker

v0.0.0-...-e091414
Published: Mar 3, 2025 License: MIT Imports: 25 Imported by: 1

README

Filewalker

A high-performance, security-focused filesystem traversal and monitoring system with advanced threat detection capabilities.

Overview

Filewalker combines blazing-fast concurrent filesystem traversal with sophisticated security monitoring capabilities. It's designed for both performance and security, making it ideal for:

  • Security Monitoring: Detect suspicious files and behaviors in real-time
  • File System Analysis: Process large directory structures efficiently
  • Threat Detection: Identify malicious files using hash-based detection
  • Process Monitoring: Correlate file changes with process activities
  • Performance Benchmarking: Measure filesystem operations with detailed metrics

Core Capabilities

Filesystem Traversal
  • Concurrent Processing: Up to 8x faster than standard library with worker pools
  • Flexible Filtering: Filter by size, pattern, modification time, and more
  • Progress Tracking: Real-time statistics on processing speed and file counts
  • Graceful Cancellation: Context-aware operations with clean shutdown
  • Memory Efficiency: Controlled buffer sizes prevent memory spikes
  • Symlink Handling: Configurable policies with cycle detection
Security Monitoring
  • File Analysis: Hash computation, size analysis, and extension checking
  • Threat Feed Integration: Dynamic updates from external threat intelligence
  • Process Correlation: Track which processes modify files
  • Behavioral Analysis: Detect suspicious patterns of system activity
  • Real-time Alerting: Immediate notification of potential threats
  • HTTP API: RESTful access to alerts and monitoring data

Architecture

Filewalker employs a producer-consumer architecture with a worker pool model for maximum throughput and controlled resource usage.

graph TB
    subgraph "Input Layer"
        Root[Root Directory]
        Config[Configuration]
    end

    subgraph "Core Engine"
        Producer[Directory Traversal]
        TaskQueue[Task Queue]
        WorkerPool[Worker Pool]
        ErrorHandler[Error Handler]
    end

    subgraph "Security Layer"
        FileAnalyzer[File Analyzer]
        ThreatFeed[Threat Feed]
        ProcessTracker[Process Tracker]
        BehaviorMonitor[Behavior Monitor]
    end

    subgraph "Output Layer"
        Alerts[Alert System]
        Stats[Statistics]
        API[HTTP API]
        Logger[Structured Logger]
    end

    Root --> Producer
    Config --> Producer
    Config --> WorkerPool
    Config --> FileAnalyzer
    Config --> ThreatFeed

    Producer --> TaskQueue
    TaskQueue --> WorkerPool
    WorkerPool --> ErrorHandler
    WorkerPool --> FileAnalyzer
    WorkerPool --> Stats

    FileAnalyzer --> ThreatFeed
    FileAnalyzer --> ProcessTracker
    ProcessTracker --> BehaviorMonitor
    
    FileAnalyzer --> Alerts
    BehaviorMonitor --> Alerts
    
    Alerts --> API
    Stats --> API
    ErrorHandler --> Logger
    Alerts --> Logger
    Stats --> Logger

    classDef input fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    classDef core fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
    classDef security fill:#fff3e0,stroke:#ff6f00,stroke-width:2px;
    classDef output fill:#f3e5f5,stroke:#6a1b9a,stroke-width:2px;

    class Root,Config input;
    class Producer,TaskQueue,WorkerPool,ErrorHandler core;
    class FileAnalyzer,ThreatFeed,ProcessTracker,BehaviorMonitor security;
    class Alerts,Stats,API,Logger output;
Key Components
  1. Directory Traversal Engine

    • Single producer goroutine walks the directory tree
    • Buffered channel controls memory usage and backpressure
    • Worker pool processes files in parallel
    • Atomic counters track progress without locks
  2. Security Monitoring System

    • File analyzer computes hashes and evaluates suspiciousness
    • Process tracker correlates file operations with running processes
    • Behavior monitor identifies suspicious patterns of activity
    • Alert system aggregates and prioritizes security events
  3. API and Integration Layer

    • HTTP server provides RESTful access to alerts and statistics
    • Structured logging with configurable verbosity
    • Threat feed integration for up-to-date malicious hash detection

Security Features

File-Based Threat Detection

Filewalker employs multiple strategies to identify suspicious files:

  • Hash Verification: Compare file hashes against known malicious signatures
  • Extension Analysis: Flag files with suspicious extensions (.exe, .bat, etc.)
  • Size Anomalies: Detect unusually large files that may indicate data exfiltration
  • Permission Analysis: Identify files with unusual permission settings
  • Fast Hashing: Optimized algorithm for large files (>10MB) using first/last chunks
Process-Level Visibility

One of Filewalker's most powerful features is its ability to correlate file changes with the processes that modified them:

  • Process Identification: Determine which process has a file open
  • Command Line Inspection: Analyze the full command line of suspicious processes
  • Parent-Child Relationships: Track process hierarchy for deeper context
  • Network Connection Monitoring: Identify processes with suspicious network activity
  • Cross-Platform Support: Works on Linux, macOS, and Windows (with platform-specific optimizations)
Behavioral Monitoring

Filewalker goes beyond simple file scanning to detect suspicious patterns of behavior:

  • Script-to-Binary Modifications: Detect when scripts modify executable files
  • Sensitive Directory Access: Monitor changes to system directories
  • Privilege Escalation: Identify operations performed with elevated privileges
  • Unusual Access Patterns: Flag abnormal file access sequences
  • Temporal Analysis: Detect rapid or coordinated file modifications
Real-Time Monitoring

The system includes event-driven monitoring capabilities:

  • File System Events: Track create, modify, delete, and rename operations
  • Recursive Directory Watching: Monitor entire directory trees
  • Filtering Options: Include/exclude paths and patterns
  • Low-Latency Alerts: Immediate notification of suspicious events

Performance Design

Filewalker is engineered for high performance across various workloads:

Concurrency Model
// Worker Pool Implementation
for i := 0; i < concurrency; i++ {
    go func() {
        for task := range taskQueue {
            // Process file
            // Update atomic counters
        }
    }()
}
  • Configurable Concurrency: Adjust worker count based on available CPU cores
  • Dynamic Load Balancing: Idle workers pull the next task from the shared queue, so work spreads evenly
  • Controlled Buffering: Prevent memory spikes during large traversals
  • Context Propagation: Clean cancellation throughout the system
Memory Efficiency
  • Atomic Counters: Lock-free statistics tracking
  • Buffered Channels: Control backpressure and prevent OOM conditions
  • Object Reuse: Minimize allocations for common operations
  • Streaming Processing: Handle large files without loading them entirely into memory
Optimized File Hashing

For large files, Filewalker uses a smart hashing approach:

┌─────────────────────────────────────────────────┐
│                  Large File                     │
└─────────────────────────────────────────────────┘
   ▲                                           ▲
   │                                           │
   │                                           │
┌──┴──┐                                     ┌──┴──┐
│First│                                     │Last │
│1 MB │                                     │1 MB │
└─────┘                                     └─────┘
   │                                           │
   ▼                                           ▼
┌─────────────────────────────────────────────────┐
│               Combined SHA-256 Hash             │
└─────────────────────────────────────────────────┘
  • Much faster on large files: about 49x at 100 MB and 445x at 1 GB in the File Hashing Performance benchmarks
  • Maintains strong detection capabilities
  • Automatically used for files larger than 10MB
  • Configurable chunk size for different security requirements

Usage Examples

Basic File Traversal
err := filewalker.WalkLimit(ctx, "/path/to/dir", func(path string, info os.FileInfo, err error) error {
    // Process file
    return nil
}, 8) // 8 concurrent workers
With Filtering and Progress
filter := filewalker.FilterOptions{
    MinSize: 1024,
    Pattern: "*.log",
    ExcludeDir: []string{".git", "node_modules"},
}

progressFn := func(stats filewalker.Stats) {
    fmt.Printf("Processed: %d files, %d dirs, %.2f MB/s\n", 
        stats.FilesProcessed, stats.DirsProcessed, stats.SpeedMBPerSec)
}

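// WalkLimitWithFilter takes no progress callback, so pass both through WalkOptions.
err := filewalker.WalkLimitWithOptions(ctx, "/path/to/dir", walkFn, filewalker.WalkOptions{
    Filter:   filter,
    Progress: progressFn,
})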
Security Monitoring
// Start the monitoring system
filewalker.Start("/path/to/monitor", "config.json", ":8080", "admin", "password", 8)

// Access alerts via HTTP
// GET http://localhost:8080/alerts
// GET http://localhost:8080/behavioral-alerts

Configuration

Filewalker can be configured via JSON:

{
  "malicious_hashes": {
    "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef": true
  },
  "suspicious_extensions": [".exe", ".bat", ".sh", ".ps1"],
  "max_size_threshold": 104857600,
  "threat_feed_url": "https://example.com/threat-feed",
  "threat_feed_interval": "1h",
  "log_file_path": "/var/log/filewalker.log"
}

Benchmarks

Filewalker significantly outperforms standard library traversal:

Workers          Time (ns/op)     Memory (B/op)   Allocs/op   Speedup
filepath.Walk    2,980,779,125    4,217,888       26,152      baseline
2 workers        1,436,847,583    4,683,048       26,213      2.07x faster
4 workers        722,509,938      4,682,936       26,208      4.13x faster
8 workers        360,482,125      4,684,189       26,210      8.27x faster
File Hashing Performance
File Size   Full Hash    Fast Hash   Speedup
1 MB        10 ms        10 ms       1x
100 MB      980 ms       20 ms       49x
1 GB        9,800 ms     22 ms       445x
10 GB       98,000 ms    25 ms       3,920x

License

MIT License. See the LICENSE file for details.

Author

Built with ❤️ by TFMV

Documentation

Constants

This section is empty.

Variables

var SuspiciousExecutionPaths = []string{
	"/tmp",
	"/var/tmp",
	"/dev/shm",
	"/run",
	"/var/run",
	"/proc",
	"C:\\Windows\\Temp",
	"C:\\Temp",
	"C:\\Users\\Public",
}

SuspiciousExecutionPaths contains paths that are suspicious for executing files from

var SuspiciousPaths = []string{
	"/etc/passwd",
	"/etc/shadow",
	"/etc/sudoers",
	"/etc/ssh",
	"/etc/crontab",
	"/etc/cron.d",
	"/boot",
	"/sbin",
	"/bin",
	"/usr/bin",
	"/usr/sbin",
	"/usr/local/bin",
	"/lib",
	"/lib64",
	"/usr/lib",
	"/usr/lib64",
	"/System/Library",
	"/Library/StartupItems",
	"/Library/LaunchAgents",
	"/Library/LaunchDaemons",
	"C:\\Windows\\System32",
	"C:\\Windows\\SysWOW64",
	"C:\\Program Files",
	"C:\\Program Files (x86)",
	"C:\\Windows\\Tasks",
	"C:\\Windows\\Temp",
}

SuspiciousPaths contains paths that are sensitive and should be monitored for modifications
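
How the package matches a file against these lists is not shown here. As a minimal sketch, assuming prefix matching on Unix-style paths, a hypothetical helper `underAny` could check whether a path equals a listed entry or lies beneath it. Matching on a separator boundary keeps "/bin" from also matching "/binary". Windows-style separators are not handled in this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// underAny reports whether path equals one of the prefixes or lies
// beneath it, comparing on "/" boundaries. Hypothetical helper.
func underAny(path string, prefixes []string) bool {
	for _, p := range prefixes {
		if path == p || strings.HasPrefix(path, strings.TrimRight(p, "/")+"/") {
			return true
		}
	}
	return false
}

func main() {
	sensitive := []string{"/etc/ssh", "/bin", "/usr/bin"}
	for _, p := range []string{"/etc/ssh/sshd_config", "/binary/tool", "/home/user/notes.txt"} {
		fmt.Println(p, underAny(p, sensitive))
	}
}
```
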

var SuspiciousProcessNames = []string{
	"nc", "netcat", "ncat",
	"socat", "cryptcat",
	"nmap", "zenmap",
	"wireshark", "tcpdump",
	"mimikatz",
	"psexec",
	"powershell", "pwsh",
	"cmd.exe",
	"bash", "sh", "zsh",
	"python", "python3", "perl", "ruby",
	"wget", "curl",
	"ssh", "telnet", "rdesktop",
}

SuspiciousProcessNames contains process names that are commonly associated with malicious activity

Functions

func AddBehavioralAlert

func AddBehavioralAlert(alert BehavioralAlert)

AddBehavioralAlert adds a new behavioral alert to the global store

func AnalyzeBehavior

func AnalyzeBehavior(event FileEvent, logger *zap.Logger)

AnalyzeBehavior analyzes a file event for suspicious behavior

func BasicAuthMiddleware

func BasicAuthMiddleware(next http.HandlerFunc, username, password string) http.HandlerFunc

BasicAuthMiddleware provides basic authentication. In production, use a *real* auth system.
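
For orientation, a middleware with this shape could look like the sketch below. It is not the package's source: `basicAuth` and `statusFor` are hypothetical names, and the use of `subtle.ConstantTimeCompare` is an assumption about how credentials might be compared to limit timing differences.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// basicAuth wraps next and rejects requests whose Basic credentials do
// not match username and password. Hypothetical stand-in.
func basicAuth(next http.HandlerFunc, username, password string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		userOK := subtle.ConstantTimeCompare([]byte(u), []byte(username)) == 1
		passOK := subtle.ConstantTimeCompare([]byte(p), []byte(password)) == 1
		if !ok || !userOK || !passOK {
			w.Header().Set("WWW-Authenticate", `Basic realm="restricted"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next(w, r)
	}
}

// statusFor sends one in-memory request through the wrapped handler and
// returns the resulting HTTP status code.
func statusFor(user, pass string) int {
	h := basicAuth(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}, "admin", "secret")
	req := httptest.NewRequest(http.MethodGet, "/alerts", nil)
	if user != "" {
		req.SetBasicAuth(user, pass)
	}
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("admin", "secret"), statusFor("admin", "wrong"))
}
```

Basic authentication sends credentials with every request, so it is only reasonable over TLS, which is consistent with the note above about using a real auth system in production.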

func LoadConfig

func LoadConfig(configPath string) error

LoadConfig loads configuration from external sources (file, env, etc.), which makes it suitable for production use.

func SetConfigForTest

func SetConfigForTest(cfg Config)

SetConfigForTest sets the global configuration for testing purposes. This function should only be used in tests.

func Start

func Start(rootDir, configFile, httpAddr, authUser, authPassword string, concurrency int)

func StartBehavioralMonitoring

func StartBehavioralMonitoring(logger *zap.Logger)

StartBehavioralMonitoring initializes behavioral monitoring

func UpdateThreatFeed

func UpdateThreatFeed(logger *zap.Logger) error

UpdateThreatFeed fetches and updates the malicious hashes from a threat feed. Exported for testing.

func WalkLimit

func WalkLimit(ctx context.Context, root string, walkFn filepath.WalkFunc, limit int) error

WalkLimit walks the file tree with a limit on concurrency.

func WalkLimitWithFilter

func WalkLimitWithFilter(ctx context.Context, root string, walkFn filepath.WalkFunc, limit int, filter FilterOptions) error

WalkLimitWithFilter walks the file tree with a limit on concurrency and applies filtering. Exported for testing.

func WalkLimitWithOptions

func WalkLimitWithOptions(ctx context.Context, root string, walkFn filepath.WalkFunc, opts WalkOptions) error

WalkLimitWithOptions walks the file tree using the error handling, filtering, progress, logging, buffering, symlink, and memory settings in opts.

func WalkLimitWithProgress

func WalkLimitWithProgress(ctx context.Context, root string, walkFn filepath.WalkFunc, limit int, progressFn ProgressFn) error

WalkLimitWithProgress walks the file tree with a limit on concurrency and reports progress. Exported for testing.

Types

type BehavioralAlert

type BehavioralAlert struct {
	Timestamp    time.Time   `json:"timestamp"`
	Type         string      `json:"type"`
	Description  string      `json:"description"`
	Severity     string      `json:"severity"` // "low", "medium", "high", "critical"
	ProcessInfo  ProcessInfo `json:"process_info,omitempty"`
	FileEvent    *FileEvent  `json:"file_event,omitempty"`
	RelatedPaths []string    `json:"related_paths,omitempty"`
}

BehavioralAlert represents a suspicious behavior detected by the system

func GetBehavioralAlerts

func GetBehavioralAlerts() []BehavioralAlert

GetBehavioralAlerts returns all behavioral alerts

type Config

type Config struct {
	MaliciousHashes      map[string]bool `json:"malicious_hashes"`
	SuspiciousExtensions []string        `json:"suspicious_extensions"`
	MaxSizeThreshold     int64           `json:"max_size_threshold"` // In bytes
	YaraRules            []string        // Add Yara rules support
	ThreatFeedURL        string          `json:"threat_feed_url"`      // URL for dynamic threat feed
	ThreatFeedInterval   time.Duration   `json:"threat_feed_interval"` // How often to refresh
	LogFilePath          string          `json:"log_file_path"`        // Path for log file
}

Configuration and External Data

func GetConfig

func GetConfig() Config

GetConfig returns a *copy* of the current configuration (thread-safe).

type ErrorHandling

type ErrorHandling int
const (
	ErrorHandlingContinue ErrorHandling = iota
	ErrorHandlingStop
	ErrorHandlingSkip
)

type FileEvent

type FileEvent struct {
	Path           string      `json:"path"`
	Hash           string      `json:"hash"`
	Size           int64       `json:"size"`
	Mode           os.FileMode `json:"mode"`
	ModTime        time.Time   `json:"mod_time"`
	Suspicious     bool        `json:"suspicious"`
	Reason         string      `json:"reason"`
	Timestamp      time.Time   `json:"timestamp"`
	User           string      `json:"user,omitempty"`            // User who owns the file (if available)
	Process        string      `json:"process,omitempty"`         // Process associated to the file (if available) - Requires more advanced monitoring
	ParentProcess  string      `json:"parent_process,omitempty"`  // Useful in advanced investigations
	PID            int         `json:"pid,omitempty"`             // Process ID that modified the file
	PPID           int         `json:"ppid,omitempty"`            // Parent Process ID
	CmdLine        string      `json:"cmdline,omitempty"`         // Full command line of the process
	NetConnections []string    `json:"net_connections,omitempty"` // Network connections associated with the process
}

type FileMonitor

type FileMonitor struct {
	// contains filtered or unexported fields
}

FileMonitor represents a real-time file monitor

func MonitorDirectories

func MonitorDirectories(ctx context.Context, paths []string, recursive bool, eventHandler func(FileEvent), logger *zap.Logger) (*FileMonitor, error)

MonitorDirectories starts real-time monitoring of directories

func NewFileMonitor

func NewFileMonitor(options FileMonitorOptions) (*FileMonitor, error)

NewFileMonitor creates a new file monitor

func (*FileMonitor) Start

func (m *FileMonitor) Start() error

Start starts the file monitor

func (*FileMonitor) Stop

func (m *FileMonitor) Stop()

Stop stops the file monitor

type FileMonitorOptions

type FileMonitorOptions struct {
	Paths           []string        // Paths to monitor
	RecursiveWatch  bool            // Whether to watch directories recursively
	EventHandler    func(FileEvent) // Function to call when a file event is detected
	ExcludePaths    []string        // Paths to exclude from monitoring
	IncludePatterns []string        // File patterns to include (e.g., "*.exe")
	ExcludePatterns []string        // File patterns to exclude
	Logger          *zap.Logger     // Logger to use
}

FileMonitorOptions contains options for real-time file monitoring

type FilterOptions

type FilterOptions struct {
	MinSize        int64
	MaxSize        int64
	Pattern        string
	ExcludeDir     []string
	IncludeTypes   []string
	ModifiedAfter  time.Time
	ModifiedBefore time.Time
}

type HTTPClientInterface

type HTTPClientInterface interface {
	Get(url string) (*http.Response, error)
}

HTTPClientInterface defines the interface for HTTP clients

HTTPClient is the client used for HTTP requests; it can be mocked in tests.

type LogLevel

type LogLevel zapcore.Level

LogLevel uses zapcore.Level for better integration with zap.

const (
	LogLevelError LogLevel = LogLevel(zapcore.ErrorLevel)
	LogLevelWarn  LogLevel = LogLevel(zapcore.WarnLevel)
	LogLevelInfo  LogLevel = LogLevel(zapcore.InfoLevel)
	LogLevelDebug LogLevel = LogLevel(zapcore.DebugLevel)
)

type MemoryLimit

type MemoryLimit struct {
	SoftLimit int64 // Pause processing
	HardLimit int64 // Stop processing
}

type ProcessInfo

type ProcessInfo struct {
	PID            int
	PPID           int
	ProcessName    string
	CmdLine        string
	NetConnections []string
}

ProcessInfo contains information about a process

type ProgressFn

type ProgressFn func(stats Stats)

type Stats

type Stats struct {
	FilesProcessed int64         // Number of files processed
	DirsProcessed  int64         // Number of directories processed
	EmptyDirs      int64         // Number of empty directories
	BytesProcessed int64         // Total bytes processed
	ErrorCount     int64         // Number of errors encountered
	ElapsedTime    time.Duration // Total time elapsed
	AvgFileSize    int64         // Average file size in bytes
	SpeedMBPerSec  float64       // Processing speed in MB/s
}

type SymlinkHandling

type SymlinkHandling int
const (
	SymlinkFollow SymlinkHandling = iota
	SymlinkIgnore
	SymlinkReport
)

type WalkOptions

type WalkOptions struct {
	ErrorHandling   ErrorHandling
	Filter          FilterOptions
	Progress        ProgressFn
	Logger          *zap.Logger
	LogLevel        LogLevel
	BufferSize      int
	SymlinkHandling SymlinkHandling
	MemoryLimit     MemoryLimit
}
