filemanager

package module
v0.5.2 Latest
Published: Nov 24, 2024 License: MIT Imports: 33 Imported by: 1

README

FileManager Package

The FileManager package is a powerful and flexible solution for handling and processing files in Go. It provides a convenient way to manage file storage, retrieval, and processing using a plugin-based architecture.

Versions

  • v0.5.2 updated Upload Handling to read in content of the managed file after completion of the upload
  • v0.5.1 added helper CreateManagedFileFromResponseBody to create a ManagedFile from an http.Response.Body (io.ReadCloser)
  • v0.5.0 lots of fixes and addition of an optional logger function to the FileManager struct
  • v0.4.4 added support for multiple output files in processing recipes with pattern-based file naming and examples. Added support for creating paths with output file names.
  • v0.4.2 fixed public URL generation
  • v0.4.1 Added fm.RunProcessingStep helper to run a single processing step instead of a pre-loaded recipe.
  • v0.4.0 Added 2 helpers to create ManagedFiles without Processing from multipart.FileHeader and another from a local file path.
  • v0.3.0 Added support for multiple output files, improved error handling, and enhanced processing status updates with resulting file information.

Features

  • File storage and retrieval: Store and retrieve files from different storage types (public, private, temporary).
  • File processing: Process files using various processing plugins, such as image manipulation, PDF manipulation, and more.
  • Recipe-based processing: Define processing recipes that specify a sequence of processing steps to be applied to files.
  • Upload handling: Handle file uploads and trigger processing recipes based on the uploaded files.

Usage

Initialization

To start using the FileManager package, you need to initialize a new instance of the FileManager struct:

import "github.com/itsatony/go-filemanager"

fm := filemanager.NewFileManager(publicBasePath, privateBasePath, baseURL, tempPath, logger)

  • publicBasePath: The base path for storing public files.
  • privateBasePath: The base path for storing private files.
  • baseURL: The base URL for accessing files.
  • tempPath: The path for storing temporary files.
  • logger: The optional logger function (a filemanager.LogAdapter, that is, a func(logLevel string, logContent string)) added in v0.5.0.

Adding Processing Plugins

To add processing plugins to the FileManager, use the AddProcessingPlugin method:

fm.AddProcessingPlugin("image_manipulation", &filemanager.ImageManipulationPlugin{})
fm.AddProcessingPlugin("pdf_manipulation", &filemanager.PDFManipulationPlugin{})
fm.AddProcessingPlugin("pdf_text_extractor", &filemanager.PDFTextExtractorPlugin{})
fm.AddProcessingPlugin("format_converter", &filemanager.FormatConverterPlugin{})
fm.AddProcessingPlugin("exif_metadata_extractor", &filemanager.ExifMetadataExtractorPlugin{})

// ClamAVPlugin has unexported fields; create it with NewClamAVPlugin (TCP only). clamavTCPAddress is a placeholder.
if clamavPlugin, err := filemanager.NewClamAVPlugin(clamavTCPAddress); err != nil {
    // Handle the error
} else {
    fm.AddProcessingPlugin("clamav", clamavPlugin)
}

Loading Recipes

To load processing recipes from a directory, use the LoadRecipes method:

err := fm.LoadRecipes("path/to/recipes")
if err != nil {
    // Handle the error
}

The recipe files should be in YAML format and stored in the specified directory.

Processing Files

To process a file using a specific recipe, use the ProcessFile method:

// file is a *filemanager.ManagedFile, for example one returned by fm.CreateManagedFileFromPath.
fileProcess := filemanager.NewFileProcess("example.jpg", "image_processing_recipe")

statusCh := make(chan *filemanager.FileProcess)
go func() {
    fm.ProcessFile(file, "image_processing_recipe", fileProcess, statusCh)
}()

for processUpdate := range statusCh {
    latestStatus := processUpdate.GetLatestProcessingStatus()
    if latestStatus.Error != nil {
        // Handle the processing error
    } else if latestStatus.Done {
        // Processing completed successfully
    } else {
        // Processing progress update
        fmt.Printf("Processing progress: %d%% - %s\n", latestStatus.Percentage, latestStatus.StatusDescription)
    }
}
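
The loop above relies on a producer that sends updates on a channel while a consumer ranges over it. The following is a minimal, self-contained sketch of that pattern only. The status type and the runSteps and collect functions are hypothetical stand-ins, not part of the package, and the sketch assumes that the producer closes the channel when it is finished; this README does not state whether ProcessFile does so.

```go
package main

import "fmt"

// status is a trimmed-down, hypothetical stand-in for ProcessingStatus.
type status struct {
	Percentage  int
	Description string
	Done        bool
	Err         error
}

// runSteps simulates a processor: it reports progress after each step and
// closes the channel when finished so that a range loop can terminate.
func runSteps(steps []string, statusCh chan<- status) {
	defer close(statusCh)
	for i, name := range steps {
		statusCh <- status{
			Percentage:  (i + 1) * 100 / len(steps),
			Description: name,
			Done:        i == len(steps)-1,
		}
	}
}

// collect drains the channel until it is closed and returns every reported
// percentage plus the first error seen, if any.
func collect(statusCh <-chan status) (percentages []int, firstErr error) {
	for s := range statusCh {
		if s.Err != nil && firstErr == nil {
			firstErr = s.Err
		}
		percentages = append(percentages, s.Percentage)
	}
	return percentages, firstErr
}

func main() {
	statusCh := make(chan status)
	go runSteps([]string{"scan", "resize", "store"}, statusCh)
	percentages, err := collect(statusCh)
	fmt.Println(percentages, err) // prints: [33 66 100] <nil>
}
```

If the producer never closes the channel, a consumer like this would have to stop on its own once it sees a status marked as done or carrying an error.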

Handling File Uploads

To handle file uploads and trigger processing recipes, use the HandleFileUpload method. In the example below:

  • We create a statusCh channel to receive status updates, including upload progress.
  • A goroutine handles the upload asynchronously by calling HandleFileUpload with the fileReader (an io.Reader representing the file data), the fileProcess, and the statusCh channel. If the upload fails, the goroutine reports the error and returns.
  • After a successful upload, the same goroutine calls ProcessFile with the uploaded file, the recipe name, the fileProcess, and the statusCh channel.
  • A for loop consumes the updates from statusCh. If the latest status contains an error, we handle it. If it is marked as done (Done is true), we handle the completion and list the resulting files. Otherwise it is an upload or processing progress update, and we print Percentage and StatusDescription.

fileProcess := filemanager.NewFileProcess("uploaded_file.pdf", "upload_processing_recipe")

statusCh := make(chan *filemanager.FileProcess)
go func() {
    file, err := fm.HandleFileUpload(fileReader, fileProcess, statusCh)
    if err != nil {
        fmt.Printf("Upload error: %v\n", err)
        return
    }
    
    // ProcessFile has no return value; processing errors are reported on statusCh.
    fm.ProcessFile(file, "upload_processing_recipe", fileProcess, statusCh)
}()

for processUpdate := range statusCh {
    latestStatus := processUpdate.LatestStatus
    if latestStatus.Error != nil {
        fmt.Printf("Processing error: %v\n", latestStatus.Error)
    } else if latestStatus.Done {
        fmt.Printf("Processing completed successfully\n")
        for _, resultingFile := range latestStatus.ResultingFiles {
            fmt.Printf("Resulting file: %s\n", resultingFile.FileName)
            fmt.Printf("  Local file path: %s\n", resultingFile.LocalFilePath)
            fmt.Printf("  URL: %s\n", resultingFile.URL)
            fmt.Printf("  File size: %d bytes\n", resultingFile.FileSize)
            fmt.Printf("  MIME type: %s\n", resultingFile.MimeType)
        }
    } else {
        fmt.Printf("Progress: %d%% - %s\n", latestStatus.Percentage, latestStatus.StatusDescription)
    }
}
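
Upload progress of the kind printed above is typically produced by wrapping the incoming io.Reader. The sketch below is a simplified, hypothetical variant of that idea and is not the package's ProgressReader: the progressReader type, its onUpdate callback, and copyWithProgress are assumptions made for illustration, and the total size is assumed to be known in advance.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// progressReader wraps an io.Reader and reports the percentage read so far
// through a callback. It is an illustrative stand-in, not the package type.
type progressReader struct {
	reader   io.Reader
	size     int64 // total expected bytes
	uploaded int64 // bytes read so far
	onUpdate func(percent int)
}

// Read forwards to the wrapped reader and reports progress after every
// read that returned data.
func (r *progressReader) Read(p []byte) (int, error) {
	n, err := r.reader.Read(p)
	if n > 0 {
		r.uploaded += int64(n)
		if r.size > 0 && r.onUpdate != nil {
			r.onUpdate(int(r.uploaded * 100 / r.size))
		}
	}
	return n, err
}

// copyWithProgress reads src to the end in chunkSize pieces and returns the
// percentages that were reported along the way.
func copyWithProgress(src string, chunkSize int) []int {
	var updates []int
	pr := &progressReader{
		reader:   strings.NewReader(src),
		size:     int64(len(src)),
		onUpdate: func(percent int) { updates = append(updates, percent) },
	}
	buf := make([]byte, chunkSize)
	for {
		if _, err := pr.Read(buf); err != nil {
			break // io.EOF ends the loop
		}
	}
	return updates
}

func main() {
	fmt.Println(copyWithProgress("0123456789", 4)) // prints: [40 80 100]
}
```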

Example Recipes

Here are a few example recipes that demonstrate the usage of different processing plugins:

File Upload Recipe

This recipe accepts image/jpeg, image/png, and application/pdf files with a minimum size of 1 byte and a maximum size of 50 MB (52428800 bytes). It runs the ClamAV plugin to scan uploaded files for viruses and defines one output format, original, that is written to two target paths: an upload location and a backup location. The target file names are built from metadata values.

name: file_upload_with_metadata
accepted_mime_types:
  - image/jpeg
  - image/png
  - application/pdf
min_file_size: 1
max_file_size: 52428800
processing_steps:
  - plugin_name: clamav
output_formats:
  - format: original
    target_file_names:
      # the metadata needs to be supplied when submitting a ManagedFile for processing by populating the metadata map
      # metadata.process_id is set by the filemanager processor
      - "uploads/{metadata.user_id}/{metadata.incoming_filename}"
      - "backups/{metadata.process_id}/{metadata.incoming_filename}"
    # public, private, temp
    storage_type: public
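
The {metadata.…} placeholders in target_file_names are filled from the file's metadata map. The sketch below illustrates one way such an expansion could work. It is not the package's ReplaceFileNameVariables: expandFileName, the regular expression, and the choice to leave unknown keys untouched are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches tokens such as {metadata.user_id}.
var placeholder = regexp.MustCompile(`\{metadata\.([A-Za-z0-9_]+)\}`)

// expandFileName replaces every {metadata.key} token with the matching
// value from metadata. Tokens without a matching key are left as they are.
func expandFileName(pattern string, metadata map[string]any) string {
	return placeholder.ReplaceAllStringFunc(pattern, func(token string) string {
		key := placeholder.FindStringSubmatch(token)[1]
		if value, ok := metadata[key]; ok {
			return fmt.Sprint(value)
		}
		return token
	})
}

func main() {
	metadata := map[string]any{"user_id": 42, "incoming_filename": "report.pdf"}
	fmt.Println(expandFileName("uploads/{metadata.user_id}/{metadata.incoming_filename}", metadata))
	// prints: uploads/42/report.pdf
	fmt.Println(expandFileName("backups/{metadata.process_id}/{metadata.incoming_filename}", metadata))
	// prints: backups/{metadata.process_id}/report.pdf
}
```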

Image Processing Recipe

name: image_processing_recipe
accepted_mime_types:
  - image/jpeg
  - image/png
min_file_size: 1024
max_file_size: 10485760
processing_steps:
  - plugin_name: image_manipulation
    params:
      format: webp
      width: 800
      height: 600
      aspect_ratio: "4:3"
output_formats:
  - format: webp
    target_file_names:
      - processed_image.webp
    storage_type: public

This recipe processes image files by converting them to WebP format, resizing them to 800x600 pixels, and cropping them to a 4:3 aspect ratio. The processed image is stored as a public file.
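
Cropping to a 4:3 aspect ratio comes down to a small piece of arithmetic. The sketch below computes the largest centered crop rectangle for a given ratio. The cropToAspect function is hypothetical, and the README does not say how the plugin positions the crop or in which order it crops and resizes, so treat this purely as a worked example of the geometry.

```go
package main

import "fmt"

// cropToAspect returns the largest centered rectangle with aspect ratio
// ratioW:ratioH that fits inside a srcW x srcH image. (x, y) is the
// top-left corner of the crop, (w, h) its size.
func cropToAspect(srcW, srcH, ratioW, ratioH int) (x, y, w, h int) {
	if srcW*ratioH > srcH*ratioW {
		// Source is wider than the target ratio: keep the full height
		// and trim the left and right sides.
		h = srcH
		w = srcH * ratioW / ratioH
	} else {
		// Source is taller than (or equal to) the target ratio: keep the
		// full width and trim the top and bottom.
		w = srcW
		h = srcW * ratioH / ratioW
	}
	return (srcW - w) / 2, (srcH - h) / 2, w, h
}

func main() {
	// A 1920x1080 (16:9) source cropped to 4:3.
	fmt.Println(cropToAspect(1920, 1080, 4, 3)) // prints: 240 0 1440 1080
}
```

The resulting 1440x1080 region could then be scaled to the 800x600 target without distortion, because both have the same 4:3 ratio.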

PDF Text Extraction Recipe

name: pdf_text_extraction_recipe
accepted_mime_types:
  - application/pdf
min_file_size: 1024
max_file_size: 52428800
processing_steps:
  - plugin_name: pdf_text_extractor
    params:
      output_format: markdown
output_formats:
  - format: md
    target_file_names:
      - extracted_text.md
    storage_type: private

This recipe extracts text from PDF files and converts it to Markdown format. The extracted text is stored as a private file.

Upload Processing Recipe

name: upload_processing_recipe
accepted_mime_types:
  - image/jpeg
  - image/png
  - application/pdf
min_file_size: 1024
max_file_size: 52428800
processing_steps:
  - plugin_name: image_manipulation
    params:
      format: jpg
      width: 1200
      height: 800
  - plugin_name: pdf_manipulation
    params:
      manipulation_type: compress
      compression_level: medium
output_formats:
  - format: jpg
    target_file_names:
      - processed_upload.jpg
    storage_type: public
  - format: pdf
    target_file_names:
      - compressed_upload.pdf
    storage_type: private

This recipe processes uploaded files based on their MIME type. If the uploaded file is an image, it is converted to JPEG format and resized to 1200x800 pixels. If the uploaded file is a PDF, it is compressed using the medium compression level. The processed files are stored as public (for images) and private (for PDFs) files.
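
The accepted_mime_types, min_file_size, and max_file_size fields of a recipe act as an acceptance check before any processing step runs. The sketch below shows what such a check could look like using only the standard library. The recipeLimits type, its check method, and the use of http.DetectContentType for sniffing are assumptions for illustration; the package's own validation and its GuessMimeType helper may behave differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Local error values that mirror the messages of the package's
// ErrInvalidMimeType and ErrInvalidFileSize.
var (
	errInvalidMimeType = errors.New("invalid MIME type")
	errInvalidFileSize = errors.New("invalid file size")
)

// recipeLimits holds the acceptance fields of a recipe.
type recipeLimits struct {
	AcceptedMimeTypes []string
	MinFileSize       int64
	MaxFileSize       int64
}

// check validates the size first, then sniffs the MIME type from the
// content and compares it against the accepted list.
func (r recipeLimits) check(content []byte) (string, error) {
	size := int64(len(content))
	if size < r.MinFileSize || size > r.MaxFileSize {
		return "", errInvalidFileSize
	}
	mimeType := http.DetectContentType(content)
	for _, accepted := range r.AcceptedMimeTypes {
		if mimeType == accepted {
			return mimeType, nil
		}
	}
	return mimeType, errInvalidMimeType
}

func main() {
	limits := recipeLimits{
		AcceptedMimeTypes: []string{"image/jpeg", "image/png", "application/pdf"},
		MinFileSize:       1024,
		MaxFileSize:       52428800,
	}
	// A fake PDF: the signature followed by padding to pass the size check.
	pdf := append([]byte("%PDF-1.7\n"), make([]byte, 2048)...)
	fmt.Println(limits.check(pdf))             // accepted as application/pdf
	fmt.Println(limits.check([]byte("%PDF-"))) // rejected: below min_file_size
}
```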

Included Plugins / Processors

The FileManager package comes with several built-in plugins and processors that can be used to manipulate and process files. Here's a list and description of the plugins and processors available:

Image Manipulation Plugin

The Image Manipulation plugin allows you to perform various image processing operations on image files. It supports the following parameters:

  • format: The output format of the processed image. Supported formats: "jpg", "png", "webp".
  • width: The desired width of the processed image in pixels.
  • height: The desired height of the processed image in pixels.
  • aspect_ratio: The desired aspect ratio of the processed image. Supported aspect ratios: "1:1", "4:3", "16:9", "21:9".

This plugin can be used to resize, crop, and convert image files to different formats.

PDF Text Extractor Plugin

The PDF Text Extractor plugin allows you to extract text from PDF files and convert it to plain text or Markdown format. It supports the following parameter:

  • output_format: The output format of the extracted text. Supported formats: "text", "markdown".

This plugin is useful for extracting text content from PDF files and converting it to a more readable and editable format.

PDF Manipulation Plugin

The PDF Manipulation plugin allows you to perform various operations on PDF files, such as extracting pages, merging PDFs, compressing PDFs, and reordering pages. It supports the following parameters:

  • manipulation_type: The type of manipulation to perform on the PDF. Supported types: "extract", "merge", "compress", "reorder".
  • start_page (for "extract"): The starting page number to extract (inclusive).
  • end_page (for "extract"): The ending page number to extract (inclusive).
  • merge_files (for "merge"): An array of file names to be merged with the base PDF.
  • compression_level (for "compress"): The compression level to apply. Supported levels: "low", "medium", "high".
  • page_order (for "reorder"): An array of page numbers representing the desired order of pages.

This plugin provides powerful capabilities for manipulating PDF files, such as extracting specific pages, merging multiple PDFs into one, compressing PDFs to reduce file size, and reordering pages.
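
Recipe parameters are declared as map[string]any, so a processing step has to check types before using them. The sketch below shows one defensive way to read start_page and end_page for an "extract" step. The intParam and pageRange helpers are hypothetical, and the assumption that pages are numbered from 1 is not confirmed by this README.

```go
package main

import "fmt"

// intParam reads an integer parameter from a decoded params map. YAML
// decoders usually yield int, JSON decoders float64, so both are accepted.
func intParam(params map[string]any, key string) (int, error) {
	switch v := params[key].(type) {
	case int:
		return v, nil
	case int64:
		return int(v), nil
	case float64:
		return int(v), nil
	case nil:
		return 0, fmt.Errorf("missing parameter %q", key)
	default:
		return 0, fmt.Errorf("parameter %q has unexpected type %T", key, v)
	}
}

// pageRange validates start_page and end_page (both inclusive).
// Assumption: page numbers start at 1.
func pageRange(params map[string]any) (start, end int, err error) {
	if start, err = intParam(params, "start_page"); err != nil {
		return 0, 0, err
	}
	if end, err = intParam(params, "end_page"); err != nil {
		return 0, 0, err
	}
	if start < 1 || end < start {
		return 0, 0, fmt.Errorf("invalid page range %d-%d", start, end)
	}
	return start, end, nil
}

func main() {
	params := map[string]any{"manipulation_type": "extract", "start_page": 2, "end_page": 5}
	fmt.Println(pageRange(params)) // prints: 2 5 <nil>
}
```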

ClamAV Plugin

The ClamAV plugin allows you to scan files for viruses using the ClamAV antivirus engine. It doesn't require any additional parameters.

This plugin is useful for ensuring the security of uploaded files by scanning them for known viruses and malware using the ClamAV engine.

These plugins and processors can be used individually or chained together in processing recipes to create custom file processing workflows. The FileManager package provides flexibility and extensibility, allowing you to easily add new plugins and processors to meet your specific requirements.
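
Adding a plugin of your own means implementing the ProcessingPlugin interface. The sketch below does so with a hypothetical ChecksumPlugin that stores a SHA-256 digest in each file's metadata. To stay self-contained it redeclares trimmed-down copies of ManagedFile, FileProcess, and ProcessingPlugin locally; in real code you would use the package's types instead. How recipe params reach a plugin is not visible in the interface and is not covered here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Trimmed-down local copies of the documented types, redeclared only so
// that this sketch compiles without the module.
type ManagedFile struct {
	FileName string
	Content  []byte
	MetaData map[string]any
}

type FileProcess struct{ ID string }

type ProcessingPlugin interface {
	Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error)
}

// ChecksumPlugin is a hypothetical plugin that records the SHA-256 digest
// of each file's content under the "sha256" metadata key.
type ChecksumPlugin struct{}

func (p *ChecksumPlugin) Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error) {
	for _, f := range files {
		if f == nil {
			return nil, errors.New("nil file in input")
		}
		sum := sha256.Sum256(f.Content)
		if f.MetaData == nil {
			f.MetaData = map[string]any{}
		}
		f.MetaData["sha256"] = hex.EncodeToString(sum[:])
	}
	return files, nil
}

// Compile-time check that ChecksumPlugin satisfies the interface.
var _ ProcessingPlugin = (*ChecksumPlugin)(nil)

func main() {
	files := []*ManagedFile{{FileName: "a.txt", Content: []byte("abc")}}
	out, err := (&ChecksumPlugin{}).Process(files, &FileProcess{ID: "FP-demo"})
	fmt.Println(out[0].MetaData["sha256"], err)
}
```

With the real package, such a plugin would be registered through fm.AddProcessingPlugin under a name of your choice and referenced by that name in a recipe's processing_steps.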

For details on how to use these plugins and processors, see the example recipes above and the API documentation below.

Format Converter Processor Plugin

The Format Converter Processor plugin allows you to convert various file formats into text-based versions suitable for further processing or injection into Large Language Models (LLMs). It currently supports the following file format conversions:

  • DOCX to plain text
  • DOCX to Markdown
  • Excel (XLS, XLSX) to CSV

The plugin uses the following libraries for file format conversions:

  • github.com/yuin/goldmark for DOCX to Markdown conversion
  • github.com/360EntSecGroup-Skylar/excelize/v2 for Excel to CSV conversion

Limitations

  • The DOCX to plain text conversion is currently a placeholder implementation that assumes the content is already in plain text format. You may need to replace it with a custom implementation or a library that converts DOCX to plain text.
  • The Excel to CSV conversion currently converts only the first sheet of the Excel file. If you need to handle multiple sheets or specify a specific sheet, you may need to modify the convertExcelToCSV function accordingly.
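
If you need to convert a sheet other than the first, the selection logic is small. The sketch below models a workbook as plain Go data and writes one chosen sheet as CSV with the standard library. The workbook type and sheetToCSV are hypothetical and do not use excelize or the plugin's convertExcelToCSV; reading the rows from an actual Excel file is left out on purpose.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"errors"
	"fmt"
)

// workbook is an in-memory stand-in for a parsed spreadsheet: the sheet
// names in file order plus the rows of every sheet.
type workbook struct {
	SheetOrder []string
	Sheets     map[string][][]string
}

// sheetToCSV encodes the named sheet as CSV. An empty name selects the
// first sheet, which matches the limitation described above.
func sheetToCSV(wb workbook, name string) (string, error) {
	if name == "" {
		if len(wb.SheetOrder) == 0 {
			return "", errors.New("workbook has no sheets")
		}
		name = wb.SheetOrder[0]
	}
	rows, ok := wb.Sheets[name]
	if !ok {
		return "", fmt.Errorf("sheet %q not found", name)
	}
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.WriteAll(rows); err != nil { // WriteAll also flushes
		return "", err
	}
	return buf.String(), nil
}

func main() {
	wb := workbook{
		SheetOrder: []string{"Summary", "Totals"},
		Sheets: map[string][][]string{
			"Summary": {{"k", "v"}},
			"Totals":  {{"item", "note"}, {"a", "x, y"}},
		},
	}
	out, err := sheetToCSV(wb, "Totals")
	fmt.Print(out, err)
}
```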

Please refer to the plugin's source code for more details on its implementation and functionality.

Exif Metadata Extractor Plugin

The Exif Metadata Extractor plugin allows you to extract Exif metadata from image files. It retrieves information such as camera make, model, capture date and time, GPS coordinates, focal length, aperture, exposure time, and ISO speed ratings.

The plugin uses the following library for Exif metadata extraction:

  • github.com/rwcarlsen/goexif/exif for extracting Exif metadata from image files

Exif Metadata Extractor Usage

To use the Exif Metadata Extractor plugin, include it in your processing pipeline by adding the following configuration to your recipe:

processing_steps:
  - plugin_name: exif_metadata_extractor

The plugin will automatically detect image files based on their MIME type and extract the Exif metadata.

Installation

To use the FileManager package in your Go project, you need to install it using the following command:

go get github.com/itsatony/go-filemanager

Contributing

Contributions to the FileManager package are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

License

The FileManager package is open-source software licensed under the MIT License.

TODO / Roadmap

  • tests
  • less annoying printing

Documentation

Constants

const FILE_PROCESS_ID_LENGTH = 16
const FILE_PROCESS_ID_PREFIX = "FP"
const Version = "0.5.1"

Variables

var (
	ErrLocalFileNotFound = errors.New("local file not found")
	ErrUrlNotMapped      = errors.New("url not mapped to local file")
)
var (
	ErrRecipeNotFound           = errors.New("recipe not found")
	ErrInvalidMimeType          = errors.New("invalid MIME type")
	ErrInvalidFileSize          = errors.New("invalid file size")
	ErrProcessingPluginNotFound = errors.New("processing plugin not found")
)
var (
	ErrNilResponseBody = errors.New("response body is nil")
)

Functions

func DownloadFileFromUrl

func DownloadFileFromUrl(url string, localFilePath string) (err error)

func FileExists

func FileExists(filePath string) bool

func GuessMimeType

func GuessMimeType(filepath string) (string, error)

func NID added in v0.2.1

func NID(prefix string, length int) (nid string)

func ReplaceFileNameVariables added in v0.4.3

func ReplaceFileNameVariables(fileName string, file *ManagedFile) string

Types

type ClamAVPlugin

type ClamAVPlugin struct {
	// contains filtered or unexported fields
}

func NewClamAVPlugin

func NewClamAVPlugin(tcpConnection string) (*ClamAVPlugin, error)

NewClamAVPlugin creates a new ClamAVPlugin instance. It only works with a TCP connection; for example, the address can be read from configuration with tcp := viper.GetString("CLAMAV_TCP").

func (*ClamAVPlugin) Process

func (p *ClamAVPlugin) Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error)

type ExifMetadataExtractorPlugin added in v0.1.2

type ExifMetadataExtractorPlugin struct{}

func (*ExifMetadataExtractorPlugin) Process added in v0.1.2

func (p *ExifMetadataExtractorPlugin) Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error)

type FileManager

type FileManager struct {
	// contains filtered or unexported fields
}

func NewFileManager

func NewFileManager(publicLocalBasePath, privateLocalBasePath, baseUrl, tempPath string, logger LogAdapter) *FileManager

func (*FileManager) AddProcessingPlugin

func (fm *FileManager) AddProcessingPlugin(name string, plugin ProcessingPlugin)

func (*FileManager) CreateManagedFileFromFileHeader added in v0.4.0

func (fm *FileManager) CreateManagedFileFromFileHeader(fileHeader *multipart.FileHeader, targetStorageType FileStorageType) (*ManagedFile, error)

CreateManagedFileFromFileHeader creates a ManagedFile from a multipart.FileHeader which is typical in HTTP file uploads.

func (*FileManager) CreateManagedFileFromPath added in v0.4.0

func (fm *FileManager) CreateManagedFileFromPath(localPath string, targetStorageType FileStorageType) (*ManagedFile, error)

CreateManagedFileFromPath creates a ManagedFile from a given local path.

func (*FileManager) CreateManagedFileFromResponseBody added in v0.5.1

func (fm *FileManager) CreateManagedFileFromResponseBody(filename string, responseBody io.ReadCloser, targetStorageType FileStorageType) (*ManagedFile, error)

CreateManagedFileFromResponseBody creates a ManagedFile from a response body. It will NOT close the response body; the caller remains responsible for closing it.
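
Because the body is not closed for you, the caller keeps ownership of it. The sketch below illustrates that contract with standard library types only. The readBody function is a hypothetical stand-in for a helper that consumes a body without closing it, and trackingBody exists only to make the open or closed state visible.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// trackingBody is a small io.ReadCloser that records whether Close ran.
type trackingBody struct {
	io.Reader
	closed bool
}

func (b *trackingBody) Close() error {
	b.closed = true
	return nil
}

// newTrackingBody builds a body with the given content.
func newTrackingBody(content string) *trackingBody {
	return &trackingBody{Reader: strings.NewReader(content)}
}

// readBody stands in for a helper that reads the whole body and
// deliberately leaves closing it to the caller.
func readBody(body io.ReadCloser) ([]byte, error) {
	return io.ReadAll(body)
}

func main() {
	body := newTrackingBody("payload")
	defer body.Close() // closing stays the caller's responsibility

	data, err := readBody(body)
	fmt.Println(string(data), err, body.closed) // prints: payload <nil> false
}
```

With a real http.Response, the equivalent is the usual defer resp.Body.Close() right after a successful request, before handing resp.Body to the helper.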

func (*FileManager) GetBaseUrl

func (aifm *FileManager) GetBaseUrl() string

func (*FileManager) GetLocalPathForFile

func (aifm *FileManager) GetLocalPathForFile(target FileStorageType, filename string) string

func (*FileManager) GetLocalPathOfUrl

func (aifm *FileManager) GetLocalPathOfUrl(url string) (localPath string, err error)

func (*FileManager) GetLocalTemporaryFilePath

func (aifm *FileManager) GetLocalTemporaryFilePath(fileName string) string

func (*FileManager) GetLocalTemporaryPath

func (aifm *FileManager) GetLocalTemporaryPath() string

func (*FileManager) GetPrivateLocalBasePath

func (aifm *FileManager) GetPrivateLocalBasePath() string

func (*FileManager) GetPrivateLocalFilePath

func (aifm *FileManager) GetPrivateLocalFilePath(fileName string) string

func (*FileManager) GetPublicLocalBasePath

func (aifm *FileManager) GetPublicLocalBasePath() string

func (*FileManager) GetPublicLocalFilePath

func (aifm *FileManager) GetPublicLocalFilePath(fileName string) string

func (*FileManager) GetPublicUrlForFile

func (aifm *FileManager) GetPublicUrlForFile(localFilePath string) (pubUrl string, err error)

func (*FileManager) GetRecipe added in v0.2.1

func (fm *FileManager) GetRecipe(name string) (Recipe, error)

func (*FileManager) HandleFileUpload

func (fm *FileManager) HandleFileUpload(r io.Reader, fileProcess *FileProcess, statusCh chan<- *FileProcess) (*ManagedFile, error)

func (*FileManager) LoadRecipes

func (fm *FileManager) LoadRecipes(recipesDir string) error

func (*FileManager) LogTo added in v0.5.0

func (fm *FileManager) LogTo(level string, message string)

func (*FileManager) ProcessFile

func (fm *FileManager) ProcessFile(file *ManagedFile, recipeName string, fileProcess *FileProcess, statusCh chan<- *FileProcess)

func (*FileManager) RunProcessingStep added in v0.4.1

func (fm *FileManager) RunProcessingStep(file *ManagedFile, pluginName string, params map[string]any, targetStorageType FileStorageType) (*ManagedFile, error)

RunProcessingStep applies a single processing step to a ManagedFile.

type FileProcess added in v0.2.1

type FileProcess struct {
	ID                string
	IncomingFileName  string
	RecipeName        string
	ProcessingUpdates []ProcessingStatus
	LatestStatus      *ProcessingStatus
}

func NewFileProcess added in v0.2.1

func NewFileProcess(incomingFileName, recipeName string) *FileProcess

func (*FileProcess) AddProcessingUpdate added in v0.2.1

func (fp *FileProcess) AddProcessingUpdate(update ProcessingStatus)

func (*FileProcess) GetLatestProcessingStatus added in v0.2.1

func (fp *FileProcess) GetLatestProcessingStatus() *ProcessingStatus

type FileStorageType

type FileStorageType string
const (
	FileStorageTypePrivate FileStorageType = "private"
	FileStorageTypeTemp    FileStorageType = "temp"
	FileStorageTypePublic  FileStorageType = "public"
)

type FormatConverterPlugin added in v0.1.2

type FormatConverterPlugin struct{}

func (*FormatConverterPlugin) Process added in v0.1.2

func (p *FormatConverterPlugin) Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error)

type ImageManipulationPlugin

type ImageManipulationPlugin struct{}

func (*ImageManipulationPlugin) Process

func (p *ImageManipulationPlugin) Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error)

type LogAdapter added in v0.5.0

type LogAdapter func(logLevel string, logContent string)
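
A LogAdapter is an ordinary function value, so any logger can be bridged with a small closure. The sketch below forwards messages to the standard library's log package. The LogAdapter type is redeclared locally only so that the example compiles on its own, and newStdLogAdapter and logOnce are hypothetical helper names.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// LogAdapter mirrors the documented type; in real code use
// filemanager.LogAdapter instead of this local copy.
type LogAdapter func(logLevel string, logContent string)

// newStdLogAdapter returns an adapter that writes "[level] message" lines
// to the given standard library logger.
func newStdLogAdapter(l *log.Logger) LogAdapter {
	return func(logLevel string, logContent string) {
		l.Printf("[%s] %s", logLevel, logContent)
	}
}

// logOnce is a demo helper: it sends one message through an adapter that
// writes to a buffer and returns what was written.
func logOnce(level, content string) string {
	var buf bytes.Buffer
	adapter := newStdLogAdapter(log.New(&buf, "", 0))
	adapter(level, content)
	return buf.String()
}

func main() {
	fmt.Print(logOnce("info", "recipes loaded")) // prints: [info] recipes loaded
}
```

An adapter built this way can be passed as the logger argument of NewFileManager.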

type ManagedFile

type ManagedFile struct {
	FileName         string         `json:"fileName"`
	MimeType         string         `json:"mimetype"`
	URL              string         `json:"url"`
	LocalFilePath    string         `json:"localFilePath"`
	FileSize         int64          `json:"fileSize"`
	MetaData         map[string]any `json:"metaData"`
	ProcessingErrors []string       `json:"processingErrors"`
	Content          []byte         `json:"-"`
}

func (*ManagedFile) EnsureFileIsLocal

func (entity *ManagedFile) EnsureFileIsLocal(fm *FileManager, target FileStorageType) (file *ManagedFile, err error)

func (*ManagedFile) EnsurePublicURL

func (entity *ManagedFile) EnsurePublicURL(fm *FileManager) (pubUrl string, err error)

func (*ManagedFile) GetFileName

func (entity *ManagedFile) GetFileName() string

func (*ManagedFile) GetLocalFilePathWithoutFileName

func (entity *ManagedFile) GetLocalFilePathWithoutFileName() string

func (*ManagedFile) GetMetaData

func (entity *ManagedFile) GetMetaData(key string) (value any)

func (*ManagedFile) Save

func (file *ManagedFile) Save() error

func (*ManagedFile) SetMetaData

func (entity *ManagedFile) SetMetaData(key string, value any)

func (*ManagedFile) UpdateFilesize

func (entity *ManagedFile) UpdateFilesize() int64

func (*ManagedFile) UpdateMimeType

func (entity *ManagedFile) UpdateMimeType() string

type OutputFormat

type OutputFormat struct {
	Format          string          `yaml:"format"`
	TargetFileNames []string        `yaml:"target_file_names"`
	StorageType     FileStorageType `yaml:"storage_type"` // public, private, temp
}

type PDFManipulationPlugin

type PDFManipulationPlugin struct{}

func (*PDFManipulationPlugin) Process

func (p *PDFManipulationPlugin) Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error)

type PDFTextExtractorPlugin

type PDFTextExtractorPlugin struct{}

func (*PDFTextExtractorPlugin) Process

func (p *PDFTextExtractorPlugin) Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error)

type ProcessingPlugin

type ProcessingPlugin interface {
	Process(files []*ManagedFile, fileProcess *FileProcess) ([]*ManagedFile, error)
}

type ProcessingResultFile added in v0.3.0

type ProcessingResultFile struct {
	FileName      string
	LocalFilePath string
	URL           string
	FileSize      int64
	MimeType      string
}

type ProcessingStatus

type ProcessingStatus struct {
	ProcessID         string
	TimeStamp         int // js timestamp in unix milliseconds
	ProcessorName     string
	StatusDescription string
	Percentage        int
	Error             error
	Done              bool
	ResultingFiles    []ProcessingResultFile
}

type ProcessingStep

type ProcessingStep struct {
	PluginName string         `yaml:"plugin_name"`
	Params     map[string]any `yaml:"params"`
}

type ProgressReader

type ProgressReader struct {
	Reader      io.Reader
	Size        int64
	Uploaded    int64
	StatusCh    chan<- *FileProcess
	FileProcess *FileProcess
	Done        bool
}

func (*ProgressReader) Read

func (r *ProgressReader) Read(p []byte) (int, error)

type Recipe

type Recipe struct {
	Name              string           `yaml:"name"`
	AcceptedMimeTypes []string         `yaml:"accepted_mime_types"`
	MinFileSize       int64            `yaml:"min_file_size"`
	MaxFileSize       int64            `yaml:"max_file_size"`
	ProcessingSteps   []ProcessingStep `yaml:"processing_steps"`
	OutputFormats     []OutputFormat   `yaml:"output_formats"`
}
