float

Version: v0.0.0-...-56010e2
Published: Aug 2, 2021 License: MIT Imports: 6 Imported by: 0

README

80-bit IEEE 754 extended double precision floating-point library for Go

The float package is a software implementation of floating-point arithmetic that conforms to the 80-bit IEEE 754 extended double precision floating-point format.

This package is derived from the original SoftFloat package and was implemented as a basis for a Motorola M68881/M68882 FPU emulation in pure Go.

Example

package float_test

import (
    "fmt"
    "github.com/jenska/float"
)

func ExampleX80() {
    pi := float.X80Pi
    pi2 := pi.Add(pi)
    sqrtpi2 := pi2.Sqrt()
    epsilon := sqrtpi2.Mul(sqrtpi2).Sub(pi2)
    fmt.Println(epsilon)
    // Output: -0.000000000000000000433680868994
}

Error Handling

TODOs
  • improve test coverage
  • add examples
  • improve error handling
  • log/ln operations
  • atan
  • benchmarks

Documentation

Constants

const (
	TininessAfterRounding  = 0
	TininessBeforeRounding = 1
)

Software IEC/IEEE floating-point underflow tininess-detection mode.

const (
	RoundNearestEven = 0
	RoundToZero      = 1
	RoundDown        = 2
	RoundUp          = 3
)

Software IEC/IEEE floating-point rounding mode.
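
As a rough illustration of what the four modes mean, the sketch below applies each of them when rounding a float64 to an integral value. It uses only the standard library; the constants are local copies of the values listed above, and roundToIntegral is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// Local copies of the rounding-mode values listed above; this sketch does
// not import the float package.
const (
	RoundNearestEven = 0
	RoundToZero      = 1
	RoundDown        = 2
	RoundUp          = 3
)

// roundToIntegral (hypothetical helper) rounds x to an integral value
// under the given rounding mode.
func roundToIntegral(x float64, mode int) float64 {
	switch mode {
	case RoundToZero:
		return math.Trunc(x) // toward zero
	case RoundDown:
		return math.Floor(x) // toward negative infinity
	case RoundUp:
		return math.Ceil(x) // toward positive infinity
	default: // RoundNearestEven
		return math.RoundToEven(x) // nearest, ties to even
	}
}

func main() {
	for _, x := range []float64{2.5, -2.5} {
		fmt.Println(x,
			roundToIntegral(x, RoundNearestEven),
			roundToIntegral(x, RoundToZero),
			roundToIntegral(x, RoundDown),
			roundToIntegral(x, RoundUp))
	}
	// Output:
	// 2.5 2 2 2 3
	// -2.5 -2 -2 -3 -2
}
```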

const (
	ExceptionInvalid   = 0x01
	ExceptionDenormal  = 0x02
	ExceptionDivbyzero = 0x04
	ExceptionOverflow  = 0x08
	ExceptionUnderflow = 0x10
	ExceptionInexact   = 0x20
)

Software IEC/IEEE floating-point exception flags.

Variables

var (
	X80Zero     = newFromHexString("00000000000000000000") // 0
	X80One      = newFromHexString("3FFF8000000000000000") // 1
	X80MinusOne = newFromHexString("BFFF8000000000000000") // -1
	X80E        = newFromHexString("4000ADF85458A2BB4800") // e
	X80Pi       = newFromHexString("4000C90FDAA22168C000") // pi
	X80Sqrt2    = newFromHexString("BFFFB504F333F9DE6800") // sqrt(2)
	X80Log2E    = newFromHexString("3FFFB8AA3B295C17F000") // Log2(e)
	X80Ln2      = newFromHexString("3FFEB17217F7D1CF7800") // Ln(2)
	X80InfPos   = newFromHexString("7FFF8000000000000000") // inf+
	X80InfNeg   = newFromHexString("FFFF8000000000000000") // inf-
	X80NaN      = newFromHexString("7FFFC000000000000000") // NaN
)

"constants" for the X80 format

var DetectTininess = TininessAfterRounding

DetectTininess selects the underflow tininess-detection mode.

var Exception int = 0

Exception holds the software IEC/IEEE floating-point exception flags.

var RoundingMode = RoundNearestEven

RoundingMode selects the software IEC/IEEE floating-point rounding mode.

var RoundingPrecision = 80

RoundingPrecision selects the software IEC/IEEE extended double-precision rounding precision. Valid values are 32, 64, and 80.

Functions

func Raise

func Raise(x int)

Raise any or all of the software IEC/IEEE floating-point exception flags.
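
The exception flags are bit values, so several can be combined in one int. The sketch below models that with local stand-ins: exception, raise, and raised are hypothetical names, and the assumption that raising a flag ORs it into the accumulated set (as in the original SoftFloat) is not confirmed by the documentation above.

```go
package main

import "fmt"

// Local copies of the exception flag values listed in the Constants section.
const (
	ExceptionInvalid   = 0x01
	ExceptionDenormal  = 0x02
	ExceptionDivbyzero = 0x04
	ExceptionOverflow  = 0x08
	ExceptionUnderflow = 0x10
	ExceptionInexact   = 0x20
)

// exception is a local stand-in for the package-level Exception variable.
var exception int

// raise is a hypothetical stand-in for Raise. It ORs the given flags into
// the accumulated set, which is an assumption about the package's behaviour.
func raise(flags int) { exception |= flags }

// raised reports whether all of the given flags are currently set.
func raised(flags int) bool { return exception&flags == flags }

func main() {
	exception = 0 // clear the flags before the operation of interest
	raise(ExceptionOverflow | ExceptionInexact)
	fmt.Println(raised(ExceptionOverflow), raised(ExceptionInvalid))
	// Output: true false
}
```

Under this model the flags are sticky: they stay set until the caller resets the variable to 0.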

Types

type X80

type X80 struct {
	// contains filtered or unexported fields
}

X80 represents the 80-bit extended double precision floating-point type.

Example
package main

import (
	"fmt"

	"github.com/jenska/float"
)

func main() {
	pi := float.X80Pi
	pi2 := pi.Add(pi)
	sqrtpi2 := pi2.Sqrt()
	epsilon := sqrtpi2.Mul(sqrtpi2).Sub(pi2)
	fmt.Println(epsilon)
}
Output:

-0.000000000000000000433680868994

func Float32ToFloatX80

func Float32ToFloatX80(a float32) X80

Float32ToFloatX80 returns the result of converting the single-precision floating-point value `a' to the extended double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func Float64ToFloatX80

func Float64ToFloatX80(a float64) X80

Float64ToFloatX80 returns the result of converting the double-precision floating-point value `a' to the extended double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func Int32ToFloatX80

func Int32ToFloatX80(a int32) X80

Int32ToFloatX80 returns the result of converting the 32-bit two's complement integer `a' to the extended double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func Int64ToFloatX80

func Int64ToFloatX80(a int64) X80

Int64ToFloatX80 returns the result of converting the 64-bit two's complement integer `a' to the extended double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func NewFromBytes

func NewFromBytes(b []byte, order binary.ByteOrder) X80

NewFromBytes returns a new extended double-precision float decoded from a byte slice in the given byte order (binary.LittleEndian or binary.BigEndian).

func NewFromFloat64

func NewFromFloat64(a float64) X80

NewFromFloat64 returns the result of converting the double-precision floating-point value `a' to the extended double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Add

func (a X80) Add(b X80) X80

Add returns the result of adding the extended double-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Append

func (a X80) Append(dst []byte, fmt byte, prec int) []byte

Append appends the string form of the floating-point number a, as generated by Format, to dst and returns the extended buffer.

func (X80) Bytes

func (a X80) Bytes(order binary.ByteOrder) []byte

Bytes returns the extended double-precision float as a byte slice in the given byte order (binary.LittleEndian or binary.BigEndian).

func (X80) Div

func (a X80) Div(b X80) X80

Div returns the result of dividing the extended double-precision floating-point value `a' by the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Eq

func (a X80) Eq(b X80) bool

Eq returns true if the extended double-precision floating-point value `a' is equal to the corresponding value `b', and false otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) EqSignaling

func (a X80) EqSignaling(b X80) bool

EqSignaling returns true if the extended double-precision floating-point value `a' is equal to the corresponding value `b', and false otherwise. The invalid exception is raised if either operand is a NaN. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Format

func (a X80) Format(fmt byte, prec int) string

Format converts the extended floating-point number a to a string, according to the format fmt. It rounds the result assuming that the original was obtained from a floating-point value of 80 bits.

The format fmt is one of 'b' (-ddddp±ddd, a binary exponent), 'e' (-d.dddde±dd, a decimal exponent), 'E' (-d.ddddE±dd, a decimal exponent), or 'f' (-ddd.dddd, no exponent).

The precision prec controls the number of digits (excluding the exponent) printed by the 'e', 'E', and 'f' formats: it is the number of digits after the decimal point.

func (X80) Ge

func (a X80) Ge(b X80) bool

Ge returns true if the extended double-precision floating-point value `a' is greater than or equal to the corresponding value `b', and false otherwise.

func (X80) GeQuiet

func (a X80) GeQuiet(b X80) bool

GeQuiet returns true if the extended double-precision floating-point value `a' is greater than or equal to the corresponding value `b', and false otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Gt

func (a X80) Gt(b X80) bool

Gt returns true if the extended double-precision floating-point value `a' is greater than the corresponding value `b', and false otherwise.

func (X80) GtQuiet

func (a X80) GtQuiet(b X80) bool

GtQuiet returns true if the extended double-precision floating-point value `a' is greater than the corresponding value `b', and false otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Internal

func (a X80) Internal() string

Internal returns the internal representation of the 80-bit float value in hex format.

func (X80) IsNaN

func (a X80) IsNaN() bool

IsNaN returns true if the value is NaN, otherwise false.

func (X80) IsSignalingNaN

func (a X80) IsSignalingNaN() bool

IsSignalingNaN returns true if the value is a signaling NaN, otherwise false.

func (X80) Le

func (a X80) Le(b X80) bool

Le returns true if the extended double-precision floating-point value `a' is less than or equal to the corresponding value `b', and false otherwise.

func (X80) LeQuiet

func (a X80) LeQuiet(b X80) bool

LeQuiet returns true if the extended double-precision floating-point value `a' is less than or equal to the corresponding value `b', and false otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Lt

func (a X80) Lt(b X80) bool

Lt returns true if the extended double-precision floating-point value `a' is less than the corresponding value `b', and false otherwise. The comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) LtQuiet

func (a X80) LtQuiet(b X80) bool

LtQuiet returns true if the extended double-precision floating-point value `a' is less than the corresponding value `b', and false otherwise. Quiet NaNs do not cause an exception. Otherwise, the comparison is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Mul

func (a X80) Mul(b X80) X80

Mul returns the result of multiplying the extended double-precision floating-point values `a' and `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Rem

func (a X80) Rem(b X80) X80

Rem returns the remainder of the extended double-precision floating-point value `a' with respect to the corresponding value `b'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) RoundToInt

func (a X80) RoundToInt() X80

RoundToInt rounds the extended double-precision floating-point value `a' to an integer, and returns the result as an extended double-precision floating-point value. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) Sqrt

func (a X80) Sqrt() X80

Sqrt returns the square root of the extended double-precision floating-point value `a'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) String

func (a X80) String() string

func (X80) Sub

func (a X80) Sub(b X80) X80

Sub returns the result of subtracting the extended double-precision floating-point value `b' from the corresponding value `a'. The operation is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) ToFloat32

func (a X80) ToFloat32() float32

ToFloat32 returns the result of converting the extended double-precision floating-point value `a' to the single-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) ToFloat64

func (a X80) ToFloat64() float64

ToFloat64 returns the result of converting the extended double-precision floating-point value `a' to the double-precision floating-point format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.

func (X80) ToInt32

func (a X80) ToInt32() int32

ToInt32 returns the result of converting the extended double-precision floating-point value `a' to the 32-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic, which means in particular that the conversion is rounded according to the current rounding mode. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned.

func (X80) ToInt32RoundZero

func (a X80) ToInt32RoundZero() int32

ToInt32RoundZero returns the result of converting the extended double-precision floating-point value `a' to the 32-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic, except that the conversion is always rounded toward zero. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned.

func (X80) ToInt64

func (a X80) ToInt64() int64

ToInt64 returns the result of converting the extended double-precision floating-point value `a' to the 64-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic, which means in particular that the conversion is rounded according to the current rounding mode. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned.

func (X80) ToInt64RoundZero

func (a X80) ToInt64RoundZero() int64

ToInt64RoundZero returns the result of converting the extended double-precision floating-point value `a' to the 64-bit two's complement integer format. The conversion is performed according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic, except that the conversion is always rounded toward zero. If `a' is a NaN, the largest positive integer is returned. Otherwise, if the conversion overflows, the largest integer with the same sign as `a' is returned.
