sortfile

package
v0.0.1-rc20230124
Warning

This package is not in the latest version of its module.

Published: Jan 24, 2023 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package sortfile provides functions to sort a file by lines, using either an in-memory sort or an external merge sort.

## Usage

```go
import "github.com/KEINOS/go-sortfile/sortfile"
```

Constants

const (
	LF   = "\n"    // LF is the line feed character
	CR   = "\r"    // CR is the carriage return character
	CRLF = CR + LF // CRLF is the carriage return and line feed character
)

Variables

This section is empty.

Functions

func ExternalFile

func ExternalFile(sizeFileIn, sizeChunk datasize.InBytes, ptrFileIn io.Reader, ptrFileOut io.Writer, isLess func(string, string) bool) error

ExternalFile sorts the file using external merge sort (K-way merge sort).

The isLess argument is a function that compares two lines and reports whether a should sort before b. If isLess is nil, the default function below is used.

  // Default isLess function
  func isLess(a, b string) bool {
	     return a < b // to reverse the sort, use a > b
  }

If sizeFileIn is smaller than sizeChunk, the whole file fits in a single chunk, so using InMemory() instead is recommended.
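
The arithmetic behind that recommendation can be sketched as follows, assuming (this page does not confirm it) that the input is split into chunks of at most sizeChunk bytes. numChunks is a hypothetical helper, and plain uint64 values stand in for datasize.InBytes.

```go
package main

import "fmt"

// numChunks returns ceil(sizeFile / sizeChunk): how many chunks of at most
// sizeChunk bytes are needed to cover sizeFile bytes.
// Hypothetical helper; uint64 stands in for datasize.InBytes.
func numChunks(sizeFile, sizeChunk uint64) uint64 {
	if sizeChunk == 0 {
		panic("sizeChunk must be greater than zero")
	}
	n := sizeFile / sizeChunk
	if sizeFile%sizeChunk != 0 {
		n++ // a partial last chunk still counts as a chunk
	}
	return n
}

func main() {
	// A 145-byte file with 64-byte chunks needs a 3-way merge.
	fmt.Println(numChunks(145, 64)) // 3
	// A chunk larger than the file yields a single chunk, so merging
	// adds nothing over a plain in-memory sort.
	fmt.Println(numChunks(145, 1024)) // 1
}
```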

Example
package main

import (
	"log"
	"os"
	"path/filepath"

	"github.com/KEINOS/go-sortfile/sortfile"
	"github.com/KEINOS/go-sortfile/sortfile/datasize"
)

func main() {
	exitOnError := func(err error) {
		if err != nil {
			log.Fatal(err)
		}
	}

	// Input file path (the sorted result is written to stdout)
	pathFileIn := filepath.Join("testdata", "sorted_chunks", "input_shuffled.txt")

	// Get file and memory information
	sizeFileIn, _, err := datasize.File(pathFileIn)
	exitOnError(err)

	sizeMemoryFree, err := datasize.AvailableMemory()
	exitOnError(err)

	// Open the file to read
	fileIn, err := os.Open(pathFileIn)
	exitOnError(err)

	defer fileIn.Close()

	fileOut := os.Stdout

	// External merge sort with sizeMemoryFree as the chunk size. Passing nil
	// as isLess selects the default (ascending) sort function.
	err = sortfile.ExternalFile(sizeFileIn, sizeMemoryFree, fileIn, fileOut, nil)
	exitOnError(err)
}
Output:

Alice
Bob
Carol
Charlie
Dave
Ellen
Eve
Frank
Isaac
Ivan
Justin
Mallet
Mallory
Marvin
Matilda
Oscar
Pat
Peggy
Steve
Trent
Trudy
Victor
Walter
Zoe

func FileExists

func FileExists(pathFile string) bool

FileExists returns true if the path exists and is a file.

Example
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/KEINOS/go-sortfile/sortfile"
)

func main() {
	for _, pathTarget := range []string{
		filepath.Join("testdata", "sorted_chunks", "input_shuffled.txt"), // Existing file
		os.TempDir(),                // Exists but is not a file
		"unknown-non-existing-file", // Does not exist
	} {
		exists := sortfile.FileExists(pathTarget)

		fmt.Println("Is file:", exists, ":", pathTarget)
	}
}
Output:

Is file: true : testdata/sorted_chunks/input_shuffled.txt
Is file: false : /var/folders/8c/lmckjks95fj4h_jqzw4v3k_w0000gn/T/
Is file: false : unknown-non-existing-file

func FromPath

func FromPath(pathFileIn, pathFileOut string, forceExternalSort bool) error

FromPath sorts the file by lines and stores the result in the given path.

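It sorts in-memory if the file size is smaller than the currently available memory; otherwise it uses the external merge sort. Set forceExternalSort to true to always use the external merge sort; with false, the method is detected automatically.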
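
The selection rule described above can be sketched as a small predicate. This is a hypothetical helper (useExternalSort is not part of the package), it uses plain uint64 sizes instead of datasize.InBytes, and it assumes that forceExternalSort set to true always selects the external merge sort, as the parameter name suggests.

```go
package main

import "fmt"

// useExternalSort reports whether the external merge sort would be chosen:
// always when forced, otherwise only when the file does not fit in memory.
// Hypothetical helper mirroring the rule described in the documentation.
func useExternalSort(sizeFile, sizeMemoryFree uint64, forceExternalSort bool) bool {
	if forceExternalSort {
		return true
	}
	return sizeFile >= sizeMemoryFree
}

func main() {
	fmt.Println(useExternalSort(145, 8<<30, false))    // false: fits in memory
	fmt.Println(useExternalSort(145, 8<<30, true))     // true: forced
	fmt.Println(useExternalSort(16<<30, 8<<30, false)) // true: larger than free memory
}
```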

It is similar to FromPathFunc() but it uses the default isLess() function.

Example
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/KEINOS/go-sortfile/sortfile"
)

func main() {
	exitOnError := func(err error) {
		if err != nil {
			log.Fatal(err)
		}
	}

	// Input and output file paths
	pathFileIn := filepath.Join("testdata", "sorted_chunks", "input_shuffled.txt")
	pathFileOut := filepath.Join(os.TempDir(), "pkg-sortfile_example_from_path.txt")

	// Clean up the output file after the example
	defer func() {
		exitOnError(os.Remove(pathFileOut))
	}()

	// Sort file in-memory since the file size is small
	forceExternalSort := false // auto detect

	err := sortfile.FromPath(pathFileIn, pathFileOut, forceExternalSort)
	exitOnError(err)

	// Print the result
	data, err := os.ReadFile(pathFileOut)
	exitOnError(err)

	fmt.Println(string(data))
}
Output:

Alice
Bob
Carol
Charlie
Dave
Ellen
Eve
Frank
Isaac
Ivan
Justin
Mallet
Mallory
Marvin
Matilda
Oscar
Pat
Peggy
Steve
Trent
Trudy
Victor
Walter
Zoe

func FromPathFunc

func FromPathFunc(pathFileIn, pathFileOut string, forceExternalSort bool, isLess func(string, string) bool) error

FromPathFunc sorts the file by lines using the given isLess function and stores the result in the given path.

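It sorts in-memory if the file size is smaller than the currently available memory; otherwise it uses the external merge sort. Set forceExternalSort to true to always use the external merge sort; with false, the method is detected automatically.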

It is similar to FromPath() but it allows you to specify your own isLess() function. If isLess is nil, it will use the default isLess() function.

  // Default isLess function
  func isLess(a, b string) bool {
	     return a < b // to reverse the sort, use a > b
  }

Example
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/KEINOS/go-sortfile/sortfile"
)

func main() {
	exitOnError := func(err error) {
		if err != nil {
			log.Fatal(err)
		}
	}

	// Input and output file paths
	pathFileIn := filepath.Join("testdata", "sorted_chunks", "input_shuffled.txt")
	pathFileOut := filepath.Join(os.TempDir(), "pkg-sortfile_example_from_path.txt")

	// Clean up the output file after the example
	defer func() {
		exitOnError(os.Remove(pathFileOut))
	}()

	// Sort file in-memory since the file size is small
	forceExternalSort := false // auto detect

	// User defined sort function (reverse sort)
	isLess := func(a, b string) bool {
		return a > b
	}

	err := sortfile.FromPathFunc(pathFileIn, pathFileOut, forceExternalSort, isLess)
	exitOnError(err)

	// Print the result
	data, err := os.ReadFile(pathFileOut)
	exitOnError(err)

	fmt.Println(string(data))
}
Output:

Zoe
Walter
Victor
Trudy
Trent
Steve
Peggy
Pat
Oscar
Matilda
Marvin
Mallory
Mallet
Justin
Ivan
Isaac
Frank
Eve
Ellen
Dave
Charlie
Carol
Bob
Alice

func InMemory

func InMemory(numLines int, input io.Reader, output io.Writer, isLess func(string, string) bool) error

InMemory reads the lines from the given io.Reader, sorts them in memory and writes the result to the given io.Writer. Note that the number of lines (numLines) must be known in advance, for example from datasize.File().

In most cases FromPath() is the better choice, since it automatically detects whether to use the in-memory sort or the external merge sort.

Example

Example of using the in-memory sort.

package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/KEINOS/go-sortfile/sortfile"
	"github.com/KEINOS/go-sortfile/sortfile/datasize"
)

func main() {
	pathFileIn := filepath.Join("testdata", "sorted_chunks", "input_shuffled.txt")

	// Get the file size and the number of lines in the file
	sizeFile, numLines, err := datasize.File(pathFileIn)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("File size:", sizeFile)

	// Open the input file
	ptrFileIn, err := os.Open(pathFileIn)
	if err != nil {
		log.Fatal(err)
	}

	defer ptrFileIn.Close()

	// Output to stdout
	ptrFileOut := os.Stdout

	// Custom sort function as reverse alphabetical order
	isLess := func(a, b string) bool {
		return a > b
	}

	// Sort the file in-memory using the custom isLess function above.
	if err := sortfile.InMemory(numLines, ptrFileIn, ptrFileOut, isLess); err != nil {
		log.Fatal(err)
	}
}
Output:

File size: 145 Bytes
Zoe
Walter
Victor
Trudy
Trent
Steve
Peggy
Pat
Oscar
Matilda
Marvin
Mallory
Mallet
Justin
Ivan
Isaac
Frank
Eve
Ellen
Dave
Charlie
Carol
Bob
Alice

Types

This section is empty.

Directories

Path Synopsis
Package chunk is a chunk file manager.
Package datasize defines the type InBytes which represents a size in bytes.
Package inmemory provides sorting algorithms for in-memory data.
