extended

Version: v1.0.0
Published: May 27, 2022 | License: MIT | Imports: 6 | Imported by: 6

README

80-bit Extended-Precision Floating-Point Numbers

This is a Go library that provides a type for representing 80-bit extended-precision floating-point numbers. It is licensed under the terms of the MIT license; see LICENSE.txt for details.

Example

package main

import (
	"encoding/binary"
	"fmt"

	"github.com/depp/extended"
)

func main() {
	e := extended.Extended{
		SignExponent: 0x3fff,
		Fraction:     0xC000000000000000,
	}
	// Value: 1.500
	fmt.Printf("Value: %.3f\n", e)

	// Float64: 1.500
	f64 := e.Float64()
	fmt.Printf("Float64: %.3f\n", f64)

	// Value: 100.75
	// SignExponent: 0x4005
	// Fraction: 0xc980000000000000
	e = extended.FromFloat64(100.75)
	fmt.Println("Value:", e)
	fmt.Printf("SignExponent: 0x%04x\n", e.SignExponent)
	fmt.Printf("Fraction: 0x%016x\n", e.Fraction)

	// Binary (big endian): 4005c980000000000000
	var buf [extended.ByteSize]byte
	e.PutBytes(binary.BigEndian, buf[:])
	fmt.Printf("Binary (big endian): %x\n", buf[:])

	// Binary (little endian): 00000000000080c90540
	e.PutBytes(binary.LittleEndian, buf[:])
	fmt.Printf("Binary (little endian): %x\n", buf[:])
}

Rounding, Infinity, and NaN

This library uses round-to-even when converting from 80-bit floats to 64-bit floats. This should be what you’re used to, and what you expect! In round-to-even, when an 80-bit float is exactly half-way between two possible float64 values, the value with a zero in the least-significant bit is chosen (or the value with the larger exponent is chosen, if the values have different exponents).
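The tie-breaking rule can be illustrated on the fraction alone. The following is a minimal sketch, not the package's implementation: roundToEven53 is a hypothetical helper that assumes a normalized 64-bit fraction (ones place set in bit 63) and reduces it to the 53 significant bits of a float64.

```go
package main

import "fmt"

// roundToEven53 (hypothetical helper) rounds a 64-bit fraction to 53
// significant bits, breaking ties toward an even result. carry reports that
// rounding overflowed into the next power of two, in which case a caller
// would add one to the exponent.
func roundToEven53(frac uint64) (mant uint64, carry bool) {
	const dropped = 11                      // 64 - 53 low bits are discarded
	const half = uint64(1) << (dropped - 1) // half a unit in the last kept place
	mant = frac >> dropped
	rem := frac & (1<<dropped - 1)
	if rem > half || (rem == half && mant&1 == 1) {
		mant++
	}
	if mant == 1<<53 {
		return mant >> 1, true
	}
	return mant, false
}

func main() {
	for _, frac := range []uint64{
		0x8000000000000400, // tie, kept bits even: rounds down
		0x8000000000000c00, // tie, kept bits odd: rounds up
		0xfffffffffffffc00, // tie that carries into the next exponent
	} {
		mant, carry := roundToEven53(frac)
		fmt.Printf("%#x -> mantissa %#x, carry %v\n", frac, mant, carry)
	}
}
```

The third input shows the parenthetical case: the tie sits between two values with different exponents, and the one with the larger exponent wins.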

Values which, after rounding, fall outside the range of possible float64 values are flushed to zero or infinity.

Infinity and NaN are preserved. Different types of NaN values are not distinguished from each other, but the sign of NaN values is preserved during conversion.
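As a rough illustration of what preserving infinity and the sign of NaN could look like, here is a sketch using only the standard library. It assumes the x87 convention that an all-ones exponent (0x7fff) marks a special value and that the low 63 fraction bits separate infinity from NaN; the package's exact handling of unusual encodings is not stated here. specialToFloat64 and describe are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
)

// specialToFloat64 (hypothetical) maps an infinity or NaN in the 80-bit
// layout to a float64. ok is false for ordinary finite values.
// Assumption: exponent 0x7fff marks a special value, and the low 63
// fraction bits are all zero for infinity, as in the x87 format.
func specialToFloat64(signExponent uint16, fraction uint64) (f float64, ok bool) {
	if signExponent&0x7fff != 0x7fff {
		return 0, false
	}
	sign := 1.0
	if signExponent&0x8000 != 0 {
		sign = -1.0
	}
	if fraction<<1 == 0 {
		return math.Inf(int(sign)), true
	}
	// Every NaN collapses to one payload; only the sign bit is carried over.
	return math.Copysign(math.NaN(), sign), true
}

// describe reports the value together with its sign bit, since fmt prints
// NaN without a sign.
func describe(f float64) string {
	return fmt.Sprintf("%v signbit=%v", f, math.Signbit(f))
}

func main() {
	inf, _ := specialToFloat64(0xffff, 0x8000000000000000)
	fmt.Println(describe(inf))
	nan, _ := specialToFloat64(0xffff, 0xc000000000000000)
	fmt.Println(describe(nan))
}
```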

Documentation

Overview

Package extended provides conversions to and from 80-bit "extended" floating-point numbers (float80).

Note that while NaNs are handled by this package, the distinction between quiet NaN and signaling NaN is not preserved during conversions.

Constants

const ByteSize = 10

ByteSize is the size, in bytes, of the binary representation of an extended-precision float.

Variables

var ErrIsNaN = errors.New("value is NaN")

ErrIsNaN indicates that the value is NaN and cannot be converted.

Types

type Extended

type Extended struct {
	// The sign is stored as the high bit. The low 15 bits contain the exponent,
	// with a bias of 16383.
	SignExponent uint16

	// The fraction includes a ones place as the high bit. The value in the ones
	// place may be zero.
	Fraction uint64
}

An Extended is an 80-bit extended-precision floating-point number.
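The field comments above translate directly into bit operations. The sketch below uses a local stand-in type, float80, with the same two fields; the accessor names are hypothetical and are not part of the package.

```go
package main

import "fmt"

// float80 is a local stand-in that mirrors the two documented fields.
type float80 struct {
	SignExponent uint16
	Fraction     uint64
}

// negative reports whether the sign bit (the high bit) is set.
func (x float80) negative() bool { return x.SignExponent&0x8000 != 0 }

// exponent returns the low 15 bits with the bias of 16383 removed.
func (x float80) exponent() int { return int(x.SignExponent&0x7fff) - 16383 }

// onesPlace reports whether the high bit of the fraction is set.
func (x float80) onesPlace() bool { return x.Fraction>>63 == 1 }

func main() {
	// 100.75, as shown in the README example.
	x := float80{SignExponent: 0x4005, Fraction: 0xc980000000000000}
	fmt.Println(x.negative(), x.exponent(), x.onesPlace())
}
```

For 100.75 the unbiased exponent is 6, since 100.75 lies between 64 and 128.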

func FromBytes

func FromBytes(order binary.ByteOrder, b []byte) (e Extended)

FromBytes deserializes an extended-precision float from its binary representation. The binary representation takes 10 bytes.

func FromBytesBigEndian

func FromBytesBigEndian(b []byte) (e Extended)

FromBytesBigEndian deserializes an extended-precision float from its binary representation in big endian. The binary representation takes 10 bytes.

func FromBytesLittleEndian

func FromBytesLittleEndian(b []byte) (e Extended)

FromBytesLittleEndian deserializes an extended-precision float from its binary representation in little endian. The binary representation takes 10 bytes.
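The byte layout implied by the README output (4005c980000000000000 in big endian, 00000000000080c90540 in little endian) can be decoded with encoding/binary alone. This is a sketch of that layout, not the package's code; float80, decodeBigEndian and decodeLittleEndian are hypothetical names, and both functions panic if b holds fewer than 10 bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// float80 is a local stand-in that mirrors the two documented fields.
type float80 struct {
	SignExponent uint16
	Fraction     uint64
}

// decodeBigEndian reads the sign/exponent word first, then the fraction.
func decodeBigEndian(b []byte) float80 {
	return float80{
		SignExponent: binary.BigEndian.Uint16(b[0:2]),
		Fraction:     binary.BigEndian.Uint64(b[2:10]),
	}
}

// decodeLittleEndian reads the fraction first, then the sign/exponent word.
func decodeLittleEndian(b []byte) float80 {
	return float80{
		SignExponent: binary.LittleEndian.Uint16(b[8:10]),
		Fraction:     binary.LittleEndian.Uint64(b[0:8]),
	}
}

func main() {
	be := []byte{0x40, 0x05, 0xc9, 0x80, 0, 0, 0, 0, 0, 0}
	le := []byte{0, 0, 0, 0, 0, 0, 0x80, 0xc9, 0x05, 0x40}
	x, y := decodeBigEndian(be), decodeLittleEndian(le)
	fmt.Printf("%#x %#x\n", x.SignExponent, x.Fraction)
	fmt.Println(x == y) // both encodings describe the same value
}
```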

func FromFloat64

func FromFloat64(x float64) (e Extended)

FromFloat64 converts a 64-bit floating-point number to an 80-bit extended floating-point number.
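Widening is exact for ordinary values, because the 80-bit format has both a longer fraction and a wider exponent range than float64. A minimal sketch of the bit manipulation follows; widen is a hypothetical helper that handles only zero and normal inputs, and it is not the package's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// widen (hypothetical) converts a zero or normal float64 into the
// sign/exponent word and 64-bit fraction of the 80-bit layout. ok is false
// for subnormals, infinities and NaN, which this sketch does not handle.
func widen(x float64) (signExponent uint16, fraction uint64, ok bool) {
	bits := math.Float64bits(x)
	sign := uint16(bits>>63) << 15
	exp := int(bits>>52) & 0x7ff
	mant := bits & (1<<52 - 1)
	switch {
	case exp == 0 && mant == 0:
		return sign, 0, true // signed zero
	case exp == 0 || exp == 0x7ff:
		return 0, 0, false // subnormal, infinity, or NaN
	}
	// Rebias the exponent from 1023 to 16383 and make the implicit leading
	// one explicit in bit 63; the 52 stored bits move up by 11.
	return sign | uint16(exp-1023+16383), 1<<63 | mant<<11, true
}

func main() {
	se, frac, _ := widen(100.75)
	fmt.Printf("SignExponent: %#04x\n", se)
	fmt.Printf("Fraction: %#016x\n", frac)
}
```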

func (Extended) Append

func (e Extended) Append(buf []byte, fmt byte, prec int) []byte

Append appends the string form of the number to buf and returns the result.

func (Extended) BigFloat

func (e Extended) BigFloat() (*big.Float, error)

BigFloat converts the number to an arbitrary-precision float. Returns ErrIsNaN if the value is NaN, because NaN cannot be represented by big.Float.
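To show why NaN needs an error path, here is a sketch of an equivalent conversion built on math/big. toBigFloat and errNaN are local stand-ins, not the package's identifiers; the NaN test assumes the x87 convention (exponent 0x7fff with nonzero low fraction bits), and denormal encodings are not treated specially.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// errNaN is a local stand-in for the package's ErrIsNaN.
var errNaN = errors.New("value is NaN")

// toBigFloat (hypothetical) builds fraction * 2^(exponent - 16383 - 63).
func toBigFloat(signExponent uint16, fraction uint64) (*big.Float, error) {
	negative := signExponent&0x8000 != 0
	exp := int(signExponent & 0x7fff)
	if exp == 0x7fff {
		if fraction<<1 != 0 {
			return nil, errNaN // big.Float has no NaN representation
		}
		return new(big.Float).SetInf(negative), nil
	}
	// SetUint64 on a zero-precision Float selects 64 bits, so the fraction
	// is stored exactly.
	z := new(big.Float).SetUint64(fraction)
	z.SetMantExp(z, exp-16383-63)
	if negative {
		z.Neg(z)
	}
	return z, nil
}

func main() {
	f, err := toBigFloat(0x4005, 0xc980000000000000)
	fmt.Println(f, err)
	_, err = toBigFloat(0x7fff, 0xc000000000000000)
	fmt.Println(err)
}
```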

func (Extended) Float64

func (e Extended) Float64() float64

Float64 returns the value of this 80-bit floating-point number as a float64. The result is rounded to the nearest float64, breaking ties towards even in the least-significant bit. Values which, after rounding, would be outside the range of a float64 are flushed to zero or infinity.
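The range behavior can be seen with a rough standard-library approximation. approxFloat64 below is a hypothetical helper, not the package's algorithm: it relies on Go's integer-to-float conversion and math.Ldexp, ignores the NaN and infinity encodings, and may round twice for results in the float64 subnormal range.

```go
package main

import (
	"fmt"
	"math"
)

// approxFloat64 (hypothetical) evaluates fraction * 2^(exponent - 16383 - 63).
// math.Ldexp returns an infinity on overflow and zero on underflow.
func approxFloat64(signExponent uint16, fraction uint64) float64 {
	exp := int(signExponent&0x7fff) - 16383 - 63
	f := math.Ldexp(float64(fraction), exp)
	if signExponent&0x8000 != 0 {
		f = -f
	}
	return f
}

func main() {
	fmt.Println(approxFloat64(0x3fff, 0xc000000000000000)) // an ordinary value
	fmt.Println(approxFloat64(0x7ffe, 0x8000000000000000)) // 2^16383, too large
	fmt.Println(approxFloat64(0x0001, 0x8000000000000000)) // 2^-16382, too small
}
```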

func (Extended) Format

func (e Extended) Format(s fmt.State, format rune)

Format implements fmt.Formatter.
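Implementing fmt.Formatter is what lets a verb such as %.3f reach a custom type, as in the README example. The sketch below shows the general pattern on a hypothetical wrapper type, wide, around a float64; it ignores width and flags and says nothing about how the package formats its own values.

```go
package main

import (
	"fmt"
	"strconv"
)

// wide is a hypothetical wrapper type used only to demonstrate fmt.Formatter.
type wide struct{ v float64 }

// Format forwards the verb and any explicit precision to strconv.
func (w wide) Format(s fmt.State, verb rune) {
	prec, ok := s.Precision()
	if !ok {
		prec = -1 // shortest representation that round-trips
	}
	switch verb {
	case 'e', 'E', 'f', 'g', 'G':
		s.Write(strconv.AppendFloat(nil, w.v, byte(verb), prec, 64))
	case 'v':
		s.Write(strconv.AppendFloat(nil, w.v, 'g', prec, 64))
	default:
		fmt.Fprintf(s, "%%!%c(wide=%g)", verb, w.v)
	}
}

// render formats x through the wide type with the given format string.
func render(format string, x float64) string {
	return fmt.Sprintf(format, wide{x})
}

func main() {
	fmt.Println(render("Value: %.3f", 1.5))
	fmt.Println(render("Value: %v", 100.75))
}
```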

func (Extended) PutBytes

func (e Extended) PutBytes(order binary.ByteOrder, b []byte)

PutBytes serializes the value as binary and writes it to a byte slice. The binary representation takes 10 bytes.

func (Extended) PutBytesBigEndian

func (e Extended) PutBytesBigEndian(b []byte)

PutBytesBigEndian serializes the value as a big-endian binary value and writes it to a byte slice. The binary representation takes 10 bytes.

func (Extended) PutBytesLittleEndian

func (e Extended) PutBytesLittleEndian(b []byte)

PutBytesLittleEndian serializes the value as a little-endian binary value and writes it to a byte slice. The binary representation takes 10 bytes.
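The two layouts can be reproduced with encoding/binary, which makes the byte order of the README output easy to check. This is a sketch with hypothetical helper names (putBigEndian, putLittleEndian, hexOf), not the package's code; the destination slice must hold at least 10 bytes or the slicing panics.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const byteSize = 10 // mirrors the documented ByteSize constant

// putBigEndian writes the sign/exponent word first, then the fraction.
func putBigEndian(b []byte, signExponent uint16, fraction uint64) {
	binary.BigEndian.PutUint16(b[0:2], signExponent)
	binary.BigEndian.PutUint64(b[2:10], fraction)
}

// putLittleEndian writes the fraction first, then the sign/exponent word.
func putLittleEndian(b []byte, signExponent uint16, fraction uint64) {
	binary.LittleEndian.PutUint64(b[0:8], fraction)
	binary.LittleEndian.PutUint16(b[8:10], signExponent)
}

// hexOf returns both encodings of one value as hex strings.
func hexOf(signExponent uint16, fraction uint64) (be, le string) {
	var buf [byteSize]byte
	putBigEndian(buf[:], signExponent, fraction)
	be = fmt.Sprintf("%x", buf[:])
	putLittleEndian(buf[:], signExponent, fraction)
	le = fmt.Sprintf("%x", buf[:])
	return be, le
}

func main() {
	be, le := hexOf(0x4005, 0xc980000000000000) // 100.75
	fmt.Println("Binary (big endian):", be)
	fmt.Println("Binary (little endian):", le)
}
```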

func (Extended) String

func (e Extended) String() string

String converts the extended-precision value to a string.

func (Extended) Text

func (e Extended) Text(format byte, prec int) string

Text converts the floating-point number to a string using the given format specifier and precision.
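The signatures of Text and Append match those of the same-named methods on big.Float. Assuming (the source does not confirm this) that the format byte and precision follow the same conventions, the standard-library methods give a feel for the arguments. The helpers text and appendText below are hypothetical and operate on a 64-bit-precision big.Float rather than on an Extended.

```go
package main

import (
	"fmt"
	"math/big"
)

// text (hypothetical) formats x via big.Float.Text at 64 bits of precision.
func text(x float64, format byte, prec int) string {
	return new(big.Float).SetPrec(64).SetFloat64(x).Text(format, prec)
}

// appendText (hypothetical) appends the formatted number to buf via
// big.Float.Append and returns the extended slice.
func appendText(buf []byte, x float64, format byte, prec int) []byte {
	return new(big.Float).SetPrec(64).SetFloat64(x).Append(buf, format, prec)
}

func main() {
	fmt.Println(text(1.5, 'f', 3))                                 // fixed, 3 decimals
	fmt.Println(text(100.75, 'e', 2))                              // exponent form
	fmt.Println(string(appendText([]byte("x = "), 1.5, 'f', 1))) // appended to a prefix
}
```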
