orcgen

package module
v2.0.2 Latest
Warning

This package is not in the latest version of its module.

Published: Oct 13, 2024 License: MIT Imports: 6 Imported by: 1

README

Orcgen

Orcgen is a Go package that converts web pages and HTML content to file formats such as PDF, PNG, and JPEG. It uses the Rod library under the hood for the page conversion.

Functionalities and packages

Orcgen provides the following functionalities:

  • Conversion of web pages and HTML content to a static file (PNG, PDF...).

    • The Generate function covers the simple case; if you need to configure the page first, you can access all the webdriver functionalities.
    • You can also use the other functions in orcgen.go, as shown in the examples.

Package folder

  • FileInfo: A struct that standardizes return values and file saves. Its Output function writes the content to an output file.

  • Handlers: The implementations of the page file save functionality (PDF / Screenshots).

  • Webdriver: A simple wrapper over the Rod library.

Installation

To use Orcgen, you can install it via Go modules:

    go get github.com/luabagg/orcgen/v2

Then you can import it in your Go code:

    import "github.com/luabagg/orcgen/v2"

Usage Example

    import "github.com/luabagg/orcgen/v2"

    // Webpage conversion
    orcgen.Generate(
        "https://www.github.com",
        orcgen.ScreenshotConfig{
            Format: "webp",
        },
        "github.webp",
    )

    // HTML conversion
    orcgen.Generate(
        []byte("my html"),
        orcgen.PDFConfig{
            Landscape:         true,
            PrintBackground:   true,
            PreferCSSPageSize: true,
        },
        "html.pdf",
    )

The package comes with examples that demonstrate the functions and features provided by Orcgen. They are the best starting point if you are using this package for the first time.

You can find them in the examples_test.go file.

Contributors

Orcgen is an open-source project, and contributions from other developers are welcome. If you encounter any issues or have suggestions for improvement, please submit them on the project's GitHub page.

Documentation

Overview

Package orcgen generates files from HTML or URLs: you can pass any webpage URL, or the contents of an HTML file.

The file is generated according to the configured handler. You can also configure the webdriver to control the page before saving the file.

Example

Examples of how to use the package structs directly.

screenshotHandler := screenshot.New()
screenshotHandler.SetFullPage(false)

wd := webdriver.FromDefault()
defer wd.Close()

// Using the page directly to run a search before taking the screenshot:
page := wd.UrlToPage("https://google.com")
wd.WaitLoad(page)
page.MustInsertText("github orcgen package golang").Keyboard.Type(input.Enter)
wd.WaitLoad(page)

// Using the handler directly - creates a PNG of the Google search:
fileinfo, err := screenshotHandler.GenerateFile(page)
if err == nil {
	// Output must be called to create a new file.
	filename := "google.png"
	fileinfo.Output(getName(filename))
	fmt.Printf("%s generated successfully\n", filename)
}

// With NewHandler function - creates a PDF of the Google search:
// The extension is not checked, so make sure the output name matches the config.
// e.g. if you use a PagePrintToPDF config, the output must be a PDF file.
fileinfo, err = orcgen.NewHandler(orcgen.PDFConfig{
	PrintBackground: true,
	PageRanges:      "1,2",
}).GenerateFile(page)

if err == nil {
	filename := "google.pdf"
	fileinfo.Output(getName(filename))
	fmt.Printf("%s generated successfully\n", filename)
}
Output:

google.png generated successfully
google.pdf generated successfully

Constants

This section is empty.

Variables

This section is empty.

Functions

func ConvertHTML

func ConvertHTML[Config handlers.Config](handler handlers.FileHandler[Config], html []byte) (*fileinfo.Fileinfo, error)

ConvertHTML converts the bytes using the given handler, and returns a Fileinfo object.

handler is a Handler instance (see pkg/handlers). html is the HTML content as a byte slice; to convert a file, read it first with os.ReadFile(filepath).

The connection with the Browser is automatically closed.
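As a minimal sketch of the file-path case mentioned above, the helper below reads an HTML file into a byte slice that could then be passed as the html argument. The name loadHTML and the empty-file check are assumptions made for illustration; they are not part of orcgen.

```go
package main

import (
	"fmt"
	"os"
)

// loadHTML reads an HTML file into memory so the bytes can be handed to a
// converter such as orcgen.ConvertHTML. Hypothetical helper, not part of orcgen.
func loadHTML(path string) ([]byte, error) {
	html, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("reading %q: %w", path, err)
	}
	// Rejecting empty input is an assumption of this sketch, not an orcgen rule.
	if len(html) == 0 {
		return nil, fmt.Errorf("%q is empty", path)
	}
	return html, nil
}
```

The returned slice would be used in place of getHTML() in the example that follows, after checking the error.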

Example

ExampleConvertHTML gives examples using the ConvertHTML function.

// Converting the HTML file to a JPG file.
filename := "html.jpg"
fileinfo, err := orcgen.ConvertHTML(
	screenshot.New().SetConfig(orcgen.ScreenshotConfig{
		Format: "jpeg",
	}),
	getHTML(),
)
// Only write the file if the conversion succeeded.
if err == nil {
	err = fileinfo.Output(getName(filename))
}
if err == nil {
	fmt.Printf("%s generated successfully\n", filename)
}

// Converting the HTML file to a PDF file.
filename = "html.pdf"
fileinfo, err = orcgen.ConvertHTML(pdf.New(), getHTML())
if err == nil {
	err = fileinfo.Output(getName(filename))
}
if err == nil {
	fmt.Printf("%s generated successfully\n", filename)
}
Output:

html.jpg generated successfully
html.pdf generated successfully

func ConvertWebpage

func ConvertWebpage[Config handlers.Config](handler handlers.FileHandler[Config], url string) (*fileinfo.Fileinfo, error)

ConvertWebpage converts the url using the given handler, and returns a Fileinfo object.

handler is a Handler instance (see pkg/handlers). url is converted as configured; if the page needs special handling before conversion, check the Webdriver docs.

The connection with the Browser is automatically closed.

Example

ExampleConvertWebpage gives examples using the ConvertWebpage function.

// Converting the Facebook homepage to a PNG file.
filename := "facebook.png" // png is the default extension for screenshots.
fileinfo, err := orcgen.ConvertWebpage(
	screenshot.New(), "https://www.facebook.com",
)

// Only write the file if the conversion succeeded.
if err == nil {
	err = fileinfo.Output(getName(filename))
}
if err == nil {
	fmt.Printf("%s generated successfully\n", filename)
}

// Converting the X homepage to a PDF file.
filename = "x.pdf"
fileinfo, err = orcgen.ConvertWebpage(
	pdf.New().SetFullPage(true), "https://www.x.com",
)

if err == nil {
	err = fileinfo.Output(getName(filename))
}
if err == nil {
	fmt.Printf("%s generated successfully\n", filename)
}
Output:

facebook.png generated successfully
x.pdf generated successfully

func Generate

func Generate[T string | []byte, Config handlers.Config](html T, config Config, output string) error

Generate generates a file from the given HTML / URL and outputs it to the given path.

The file extension of the output path is not checked, so make sure it matches the configured format.
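Because the extension is not validated, a caller may want to guard against mismatches before calling Generate. The sketch below is one possible approach; checkExtension is a hypothetical helper that is not part of orcgen, and treating "jpg" and "jpeg" as equivalent is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// checkExtension returns an error when the extension of output does not match
// the expected format. Hypothetical helper, not part of orcgen.
func checkExtension(output, format string) error {
	ext := strings.ToLower(strings.TrimPrefix(filepath.Ext(output), "."))
	want := strings.ToLower(format)
	// Assumption: "jpg" and "jpeg" name the same format.
	if ext == "jpg" {
		ext = "jpeg"
	}
	if want == "jpg" {
		want = "jpeg"
	}
	if ext != want {
		return fmt.Errorf("output %q has extension %q, want %q", output, ext, want)
	}
	return nil
}
```

For instance, checkExtension("github.webp", "webp") returns nil, while checkExtension("html.pdf", "png") returns an error, so the caller can stop before a browser is ever launched.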

Example

ExampleGenerate uses the Generate function to write to the output.

// Converting the GitHub homepage to a webp file.
filename := "github.webp"
err := orcgen.Generate(
	"https://www.github.com",
	orcgen.ScreenshotConfig{
		Format: "webp",
	},
	getName(filename),
)
if err == nil {
	fmt.Printf("%s generated successfully\n", filename)
}

// Converting the HTML file to a PDF file.
filename = "html.pdf"
err = orcgen.Generate(
	getHTML(),
	orcgen.PDFConfig{
		Landscape:           true,
		DisplayHeaderFooter: true,
		PrintBackground:     true,
		MarginTop:           new(float64),
		MarginBottom:        new(float64),
		MarginLeft:          new(float64),
		MarginRight:         new(float64),
		PreferCSSPageSize:   true,
	},
	getName(filename),
)
if err == nil {
	fmt.Printf("%s generated successfully\n", filename)
}
Output:

github.webp generated successfully
html.pdf generated successfully

func NewHandler

func NewHandler[Config handlers.Config](config Config) handlers.FileHandler[Config]

NewHandler creates a handler from the config.

It checks the config type and instantiates the handler accordingly.
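The general technique of choosing a handler from a generic config type can be sketched with a type switch. This is not orcgen's implementation: every type and function below (pdfConfig, screenshotConfig, handler, newHandler) is a simplified stand-in invented for illustration, whereas the real package uses rod's proto types and returns a handlers.FileHandler[Config].

```go
package main

// Stand-in config types (hypothetical; orcgen aliases rod's proto types).
type pdfConfig struct{ Landscape bool }
type screenshotConfig struct{ Format string }

// config restricts the generic parameter to the two supported config types.
type config interface {
	pdfConfig | screenshotConfig
}

// handler is a simplified stand-in for a file handler.
type handler interface {
	Kind() string
}

type pdfHandler struct{ cfg pdfConfig }

func (pdfHandler) Kind() string { return "pdf" }

type screenshotHandler struct{ cfg screenshotConfig }

func (screenshotHandler) Kind() string { return "screenshot" }

// newHandler inspects the dynamic type of cfg and builds the matching handler.
func newHandler[C config](cfg C) handler {
	switch c := any(cfg).(type) {
	case pdfConfig:
		return pdfHandler{cfg: c}
	case screenshotConfig:
		return screenshotHandler{cfg: c}
	}
	// Unreachable: the constraint only admits the two cases above.
	return nil
}
```

The conversion to any is needed because Go does not allow a type switch directly on a value of a type parameter.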

Example

ExampleNewHandler shows how to use the NewHandler function to create a new handler.

screenshotHandler := orcgen.NewHandler(
	orcgen.ScreenshotConfig{},
)
screenshotHandler.SetFullPage(true)

pdfHandler := orcgen.NewHandler(
	orcgen.PDFConfig{
		PrintBackground: false,
	},
)
pdfHandler.SetFullPage(false)
Output:

Types

type PDFConfig added in v2.0.2

type PDFConfig = proto.PagePrintToPDF

type ScreenshotConfig added in v2.0.2

type ScreenshotConfig = proto.PageCaptureScreenshot

Directories

Path                      Synopsis
pkg/fileinfo              Package fileinfo is used for file information control.
pkg/handlers/pdf          Package pdf is used to generate PDFs from the rod Page instance.
pkg/handlers/screenshot   Package screenshot is used to generate screenshots from the rod Page instance.
pkg/webdriver             Package webdriver provides a wrapper for the rod library to perform browser operations.
