Floating point
The BASIC assembler, as standard, does not have any support for true floating point instructions.
You can convert integers to an implementation-defined 'floating point' format and
perform basic mathematics with them (most usually fixed point), but you cannot interact with a
floating point co-processor and do things the 'native' way.
There are, however, patches which extend the assembler to support FP instructions.
Parts of this documentation have been taken from the ARM Assembler manual.
The ARM processor can interface with up to sixteen co-processors. The ARM3 and later have virtual
co-processors within the ARM to handle internal control functions. But the first co-processor
that was available was the floating point processor. This chip handles floating point maths to
the IEEE standard.
A standard ARM floating point instruction set has been defined, so that the code may be used
across all RISC OS machines. If the actual hardware does not exist, then the instructions are
trapped and executed by the floating point emulator module (FPEmulator). The program does not
need to know whether or not the FP co-processor is present. The only real difference will be
speed of execution.
If you are interested in the co-processor aspect, read the
document on co-processor access.
The ARM IEEE FP system has eight high precision FP registers (F0 to F7). The register format is
irrelevant as you cannot access these registers directly; a register's contents are only 'visible'
when transferred to memory or to an ARM register. In memory, an FP register consumes three
words, but as the FP system will be reloading its own registers, the format of these three words
is considered irrelevant.
There is also an FPSR (floating point status register) which, similar to the ARM's own PSR,
holds the status information that an application might require. Each of the flags available has
a 'trap' which allows the application to enable or disable traps associated with the given
error.
The FPSR also allows you to distinguish between different implementations of the FP system.
There may also be an FPCR (floating point control register). This holds information that the
application should not access, such as flags to turn the FP unit on and off. Typically, hardware
will have an FPCR, software will not.
FP units can be software implementations such as the FPEmulator modules, hardware implementations
such as the FP chip (and support code), or a combination of both.
The best example of a 'both' that I can think of is the Warm Silence Software patch that will
utilise the 80x87 chip on suitably equipped PC co-processor cards as a floating point processor
for ARM FP operations. Talk about resource sharing...!
Results are calculated as though to infinite precision, then rounded to the
length required. The rounding may be to nearest, to +infinity (P), to -infinity (M), or to zero (Z).
The default is round to nearest; in the event of a tie, the result is rounded to even.
The working precision is 80 bits, comprising a 64 bit mantissa, a 15 bit exponent, and a sign
bit. Specific instructions that work with single precision may provide better performance in
some implementations - notably fully-software-based ones.
The FPSR contains the necessary status for the FP system. The IEEE flags are always present, but the result flags are only available after an FP compare operation.
Floating point instructions should not be used from SVC mode.
The Exception Flags Byte is the lower byte of the FPSR:

bit:    7 - 5      4     3     2     1     0
FPSR:   Reserved   INX   UFL   OFL   DVZ   IVO

Whenever an exception condition arises, the appropriate cumulative exception flag in bits 0 to 4 will be set to 1. If the relevant trap enable bit is set, then an exception is also delivered to the user's program in a manner specific to the operating system. (Note that in the case of underflow, the state of the trap enable bit determines under which conditions the underflow flag will be set.) These flags can only be cleared by a WFS instruction.
IVO - invalid operation
The IVO flag is set when an operand is invalid for the operation to be performed. Invalid
operations, as defined by IEEE 754, include the likes of infinity - infinity, 0 × infinity,
0 ÷ 0, infinity ÷ infinity, the square root of a negative number, and any operation on a
signalling NaN.
DVZ - division by zero
The DVZ flag is set if the divisor is zero and the dividend a finite, non-zero number. A
correctly signed infinity is returned if the trap is disabled. The flag is also set for LOG(0)
and for LGN(0). Negative infinity is returned if the trap is disabled.
OFL - overflow
The OFL flag is set whenever the destination format's largest number is exceeded in magnitude by
what the rounded result would have been were the exponent range unbounded. As overflow is
detected after rounding a result, whether overflow occurs or not after some operations depends
on the rounding mode.
If the trap is disabled either a correctly signed infinity is returned, or the format's largest
finite number. This depends on the rounding mode and floating point system used.
UFL - underflow
Two correlated events contribute to underflow: tininess (the creation of a result so small that
it cannot be represented as a normalised number in the destination format) and loss of accuracy
when representing that tiny result.
INX - inexact
The INX flag is set if the rounded result of an operation is not exact (different from the
value computable with infinite precision), or overflow has occurred while the OFL trap was
disabled, or underflow has occurred while the UFL trap was disabled. OFL or UFL traps take
precedence over INX.
The INX flag is also set when computing SIN or COS, with the exceptions of SIN(0) and COS(0).
The old FPE and the FPPC system may differ in their handling of the INX flag. Because of this
inconsistency we recommend that you do not enable the INX trap.
Precision is:

S - single
D - double
E - double extended
P - packed decimal
EP - extended packed decimal
Rounding modes are:

(no letter) - nearest
P - plus infinity
M - minus infinity
Z - zero
LDF<condition><precision><fp register>, <address>
Load Floating Point value
The address can be in the forms [Rn], [Rn, #offset] (pre-indexed, with optional '!' writeback),
or [Rn], #offset (post-indexed), where the offset is a multiple of 4 in the range -1020 to +1020.
STF<condition><precision><fp register>, <address>
Store floating point value.
The address can be in the forms [Rn], [Rn, #offset] (pre-indexed, with optional '!' writeback),
or [Rn], #offset (post-indexed), where the offset is a multiple of 4 in the range -1020 to +1020.
LFM and SFM

These are similar in idea to LDM and STM, but they will not be described here because some versions
of FPEmulator do not support them. The FP module in RISC OS 3.1x (2.87) does, as do (I think)
later versions. If you know that your software will only operate on a system that supports SFM,
then use it. Otherwise you'll need to 'fake' it with a sequence of STFs; likewise, fake LFM with
a sequence of LDFs.
FLT<condition><precision><rounding> <fp register>, <register>
FLT<condition><precision><rounding> <fp register>, #<value>
Convert integer to floating point, either an ARM register or an absolute value.
FIX<condition><rounding> <register>, <fp register>
Convert floating point to integer.
WFS<condition> <register>
Write floating point status register with the contents of the ARM register specified.
RFS<condition> <register>
Read floating point status register into the ARM register specified.
WFC<condition> <register>
Write floating point control register with the contents of the ARM register specified.
Supervisor mode only, and only on hardware that supports it.
RFC<condition> <register>
Read floating point control register into the ARM register specified.
Supervisor mode only, and only on hardware that supports it.
Floating point coprocessor data operations:
The formats of these instructions are:

<binary op><condition><precision><rounding> <FP destination>, <FP operand 1>, <FP operand 2 | #value>
<unary op><condition><precision><rounding> <FP destination>, <FP operand | #value>
The binary operations are...
ADF -
Add
DVF -
Divide
FDV -
Fast Divide - only defined to work with single precision
FML -
Fast Multiply - only defined to work with single precision
FRD -
Fast Reverse Divide - only defined to work with single precision
MUF -
Multiply
POL -
Polar Angle
POW -
Power
RDF -
Reverse Divide
RMF -
Remainder
RPW -
Reverse Power
RSF -
Reverse Subtract
SUF -
Subtract
The unary operations are...
ABS -
Absolute Value
ACS -
Arc Cosine
ASN -
Arc Sine
ATN -
Arc Tangent
COS -
Cosine
EXP -
Exponent
LOG -
Logarithm to base 10
LGN -
Logarithm to base e
MVF -
Move
MNF -
Move Negated
NRM -
Normalise
RND -
Round to integral value
SIN -
Sine
SQT -
Square Root
TAN -
Tangent
URD -
Unnormalised Round
CMF<condition><precision><rounding> <fp register 1>, <fp register 2>
Compare FP register 2 with FP register 1.
The variant CMFE compares with exception.
CNF<condition><precision><rounding> <fp register 1>, <fp register 2>
Compare FP register 2 with the negative of FP register 1.
The variant CNFE compares with exception.
Compares are provided with and without the exception that could arise if the numbers are unordered (ie one or both of them is not-a-number). To comply with IEEE 754, the CMF instruction should be used to test for equality (ie when a BEQ or BNE is used afterwards) or to test for unorderedness (in the V flag). The CMFE instruction should be used for all other tests (BGT, BGE, BLT, BLE afterwards).
When the AC bit in the FPSR is clear, the ARM flags N, Z, C, V refer to the following after
compares:
N = Less than
Z = Equal
C = Greater than, or equal
V = Unordered
And when the AC bit is set, the flags refer to:
N = Less than
Z = Equal
C = Greater than, or equal, or unordered
V = Unordered
In APCS code with objasm, to store a floating point value you would use the directive DCF, appending 'S' for single precision or 'D' for double (ie DCFS or DCFD).
Here is a brief example. We MUL two numbers, but use the floating point unit instead of the ARM's multiplication instruction. This could be modified to multiply two floating point numbers, and give a floating point response, but as it is only a short example, it will simply use two integers.
REM >fpmul
REM
REM Short example to multiply two integers via the
REM floating point unit. Totally pointless, but...

DIM code% 20
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  .multiply
  FLTS F0, R0
  FLTS F1, R1
  FMLS F2, F0, F1
  FIXS R0, F2
  MOVS PC, R14
  ]
NEXT

INPUT "First number : "one%
INPUT "Second number : "two%
A% = one%
B% = two%
result% = USR(multiply)
PRINT "The result is "+STR$(result%)
END

There is no option to download this program, as standard BASIC won't touch it. However, you can include FP statements if you can 'build' the instructions.
This version will work in BASIC:
REM >fpmul
REM
REM Short example to multiply two integers via the
REM floating point unit. Totally pointless, but...

DIM code% 20
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  .multiply
  EQUD &EE000110 ; FLTS F0, R0
  EQUD &EE011110 ; FLTS F1, R1
  EQUD &EE902101 ; FMLS F2, F0, F1
  EQUD &EE100112 ; FIXS R0, F2
  MOVS PC, R14
  ]
NEXT

INPUT "First number : "one%
INPUT "Second number : "two%
A% = one%
B% = two%
result% = USR(multiply)
PRINT "The result is "+STR$(result%)
END
One final thing... Remember to use the appropriate precision for what you are doing.
REM >precision
REM
REM Short example to show how data can be 'lost' due
REM to using incorrect precision.

ON ERROR PRINT REPORT$ + " at " + STR$(ERL/10) : END

DIM code% 64
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  EXT 1
  .single_precision
  FLTS F0, R0
  FIX R0, F0
  MOV PC, R14
  .double_precision
  FLTD F0, R0
  FIX R0, F0
  MOV PC, R14
  .doubleext_precision
  FLTE F0, R0
  FIX R0, F0
  MOV PC, R14
  ]
NEXT

A% = &1ffffff
PRINT "Original input is " + STR$~A%
PRINT "Single precision " + STR$~(USR(single_precision))
PRINT "Double precision " + STR$~(USR(double_precision))
PRINT "Double extended " + STR$~(USR(doubleext_precision))
PRINT
END

The result of this program is:
Original input is 1FFFFFF
Single precision 2000000
Double precision 1FFFFFF
Double extended 1FFFFFF

You don't need to use double precision everywhere, though, as it will be that much slower. Simply keep this in mind if you are dealing with large numbers.
In order to test the actual speed differences, I wrote a test program:
DIM code% 64
FOR loop% = 0 TO 2 STEP 2
  P% = code%
  [ OPT loop%
  MOV R0, #23
  MOV R1, #1<<16
  .timetest
  FLTD F0, R0
  FLTD F1, R0
  MUFD F2, F0, F1
  SUBS R1, R1, #1
  BNE timetest
  MOV PC, R14
  ]
NEXT

t% = TIME
CALL code%
PRINT "That took "+STR$(TIME - t%)+" centiseconds."
END

I tried various precisions, and also the fast multiply. It showed something interesting. So I tried multiplication, and addition. All with the same data (input 23).
Here are my results for a million (roughly) convert-and-process operations (ARM710 processor, FPEmulator 4.14):
Operation       Fast single   Single   Double   Double extended
Multiplication  1731cs        1755cs   1965cs   1712cs
Division        2169cs        2169cs   2618cs   2479cs
Addition        n/a           1684cs   1899cs   1646cs

This seems to show that double extended precision is the fastest on my machine for a selection of operations. Thus, it is incorrect to simply assume that more complexity takes longer. My personal suspicion is that the FP system's internal format is double extended, so working directly in that precision avoids any loss from converting the value to a different precision.
The moral here? Don't be afraid to experiment...