        The 920ATC Instruction Set.
        ===========================
        Terry Froggatt, 7th January 2017.


Background.
-----------

In these notes, for simplicity,
statements about the 920B also apply to the 903,
statements about the 920C also apply to the 905.
The reader is assumed to be familiar with these computers.
OMP = Operators Monitor Panel.

The 920ATC was designed by Maritime Aircraft Systems Division
in Rochester, a rebranding of Airborne Computing Division,
which had moved from Borehamwood to Rochester in around 1970.

The 920ATC instruction set was based on the earlier 920A 920B 920M 920C
series, which were products of Mobile Computing Division in Borehamwood.

Following on from Airborne Computing Division's use of
the 920B in Nimrod Mk 1 and the 920M in Jaguar,
the 920ATC was developed for the Nimrod upgrades.

I understand that folk at RAE Farnborough had told Rochester that
they would be unlikely to be awarded the Nimrod upgrade computer
contract unless their offering included a hardware stack and
hardware floating point, which rival offerings did include.
So a hardware stack and hardware floating point were "provided".

The stack was somewhat impotent, being only 64 words long
shared across all four program levels, so it was probably
long enough to hold the data associated with expressions,
but probably not for holding subroutine parameters, and
it was certainly not intended to hold subroutine links.

The floating point used a rather odd format, providing a meagre
18 mantissa bits whilst providing an excessive 18 exponent bits,
with lazy standardisation, in contrast to the standardised 28/7
or 35/18 formats supported by software, and there were assorted
problems with the implementation.

The Nimrod upgrade software was written in CORAL, using a compiler
purchased from CAP Reading by RAE Farnborough. CAP developed it as
a cross-compiler on a 1900 (sometimes at Bankside power station),
then it was rehosted as a 905 native compiler. (I wrote SODAR to
integrate CAP CORAL with 905 RADOS). Later it was rehosted at
Rochester to be a 4080 cross-compiler, largely by Roger Holmes.

Roger & I are both pretty sure that the compiler was never modified
to use the stack, or to use the 18/18 floating point hardware,
although it did compile for 28/7 packed QF format. I did write a
version of the software interpreter QF to use the 18/18 format,
and some optimised matrix operations and a square root routine
in 18/18 format, all in SIR callable from CAP CORAL programs as
code procedures. But it is unclear what, if anything, translated
any floating constants in the CORAL code into 18/18 format. At the
time we did consider converting constants from 28/7 to 18/18 within
the Loader, but Roger reports that they were not suitably tagged.
I suppose constants could be set up using a 2-integer overlay,
which would not be too difficult in the 18/18 unstandardised format.

The fact that neither the stack nor the floating point hardware was
used extensively deprives me of a constraint, when reading the
manuals, that "it must mean this, because that would not work".


Evolution.
----------

Trying to describe the 920ATC instruction set is somewhat like
building a replica of Cody's "Flyer" or building a replica of the
EDSAC computer: "which version do you want?". I felt it unwise
to give details in the BCS/CCS "Our Computer Heritage" and I've
never felt inclined to modify my "SIM900" simulator to include it.

I've taken "Specification for a CPU 920ATC (with parallel peripheral
interface), Type No. 59-031-01", 240/A01-03/00544, MASD, March 1977,
issued to RAE Farnborough in July 1977, as the most useful source.
On page 4, this talks of "the servicing & maintenance policy to be
adopted in respect of the 'B' model units" and I've taken it that
this is the specification of "the 920ATC 'B' model".

I also have the earlier "Electrical Design Specification for the
920 ATC (PCB Version), Working Paper 240/107/NRD/316/011, Issue A,
MASD, 25/9/73", which may describe what presumably was
not known as "the 920ATC 'A' model" until the B model was mooted.
In this model there is no hardware B-register (the B-registers
are entirely in core), and the mode bits differ significantly.

I have my own notes & correspondence in a buff wallet file "920 ATC",
covering from January 1972 to August 1976, finishing with a note
saying that "The 920 ATC Floating Point Saga continues in the Sonics
Cross Compiler files, especially item 828" ("Arguments for & against
In-Line Code for Floating Point on the 920ATC'B' models") and then
in "MASD KALMAN", a maroon ring-binder, covering work that I did
between April 1977 and September 1978 (on a freelance basis,
after I'd left Elliotts at the end of 1976).

Included in the "920 ATC" wallet file is a 2-sheet blueprint,
"920 ATC Microprogram, (B model Sonics)", Drawing 40SK2260, dated
June 1975. It provides a level of detail not available in the above
specifications, and I have used it for the explanation of the floating
point instructions below, although there are several aspects of
the implementation which cannot be deduced from the microprogram.
I studied this in detail in 1977 and found problems with the
floating point hardware which I recorded in "MASD KALMAN".

Some modifications were mooted subsequent to the B-model:
   Correcting the implementation of floating point zero,
      although I'm not sure which problems were to be fixed and how.
   Changing function 0 in floating mode so as not to alter Q.
      My Kalman routines state that they assume this to be the case,
      suggesting that it may not have been the case when I wrote them.
      At a chance meeting (on 11th December 2016) with Ken Edwards
      (once of RAE), he tells me that he wrote a Kalman filter for
      the 920ATC and was unaware that there was any floating point.
      It is possible that the routines, which I'd been paid to write,
      were never offered to Ken because function 0 was never altered.
   Changing function 10 (and 15 7170) in floating mode
      so as to add +2 rather than +1.
   Changing B-modified function 11 so as to be a "far call"
      (only when enabled by yet another mode bit?).

However, I do not know which if any of these changes was ever
implemented. Erik's 920ME may give us some clues, (although it uses
an AMD chipset like the MC1800, not the 54181 ALU which I think the
920ATC used, so the microprogram will be different). And just as
I was unaware of the success of the 12-bit machines after I left,
there could have been other developments that I'm unaware of.


Q-Register Corruption.
----------------------

On 920A, functions 6,7,8,9 all corrupt Q,
   (I forgot to mention function 6 in "Our Computer Heritage" E5X3-1).
On 920B, just functions 7 & 9 corrupt Q.
On 920M, only function 7 corrupts Q.
On 920C, none of these functions corrupt Q.

But on all of these 920 variants,
B-modification (of any function) corrupts the Q-register during
the operand address calculation, before the instruction is obeyed.
So /12 is OK (taking operands from A & store, with result in A & Q),
but /3 is meaningless (it corrupts Q then stores it).

Although I have not yet found it in the specification,
my own notes say that, on the 920ATC, "it seems that
B-modification does not corrupt Q", so /3 is OK.
My notes go on to point out that this is essential,
because the exponent of floating point numbers is held in Q.
If B-modification corrupted Q, it would be impossible to
B-modify floating-point instructions (except function 4, load).
I made extensive use of this in my optimised matrix routines.

(Note that this issue is distinct from the deliberate setting
of the Q-register by function 0, load B-register, and whether
this is sensible in floating point mode).


Program Counters and Store Size.
--------------------------------

On 920A 920B 920M, the SCRs (Sequence Control Register = program counter)
for the four priority levels are held in core store locations, 0 2 4 6,
and at the start of every instruction the appropriate Register has to
be read from core, incremented, and written back.

On 920C and 920ATC, the current level's SCR is held in what is described
as a "hardware" register, (held on flip-flops, rather than in core store)
making most instructions faster. When an Interrupt or Terminate causes a
level change, the hardware register has to be written to the old level's
SCR in core, then the new level's SCR is read into the hardware register.

So whilst it was possible to write into the current level's SCR,
on 920A 920B 920M, (perhaps to implement a switch jump), this would
have no effect on a 920C or 920ATC. But nobody ever did this. It's
OK to write to the SCR of other levels, typically to initialise them.

The 920B supports up to 65536 words of store, addresses wrap
round at 16 bits, and the top two address bits are ignored.
The 920C supports up to 131072 words of store, addresses wrap
round at 17 bits, and the top one address bit is ignored.
I think that this difference gave rise to some problems
running Algol on the 920C (because Algol uses the top two
bits to distinguish different sorts of parameter addresses).

The 920ATC supports up to 262144 words of store.
I think that this difference could give rise to some problems
running Fortran on the 920ATC (because Fortran uses the top
bit to distinguish direct & indirect parameter addresses).
On 920ATC, addresses can be made to wrap round at 17 bits
when only 17 bits are needed, by unlinking two pins on the
rear CPU connector, (see B-model specification page 17).
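
The wrap-round behaviour described above can be sketched in Python.
This is an illustration only, not taken from any 900-series document:
it assumes that address bits which a machine does not decode are simply
dropped, and the flagged parameter addresses are hypothetical values.

```python
# Illustrative model only, not taken from any 900-series document.
# Assumption: address bits the machine does not decode are dropped.
DECODED_BITS = {
    "920B": 16,
    "920C": 17,
    "920ATC": 18,
    "920ATC, pins unlinked": 17,
}

def effective_address(machine, word):
    """Return the store address actually used for an 18-bit word."""
    return word & ((1 << DECODED_BITS[machine]) - 1)

TOP_BIT = 1 << 17    # bit 18, the flag the text says Fortran uses
NEXT_BIT = 1 << 16   # bit 17, the second flag the text says Algol uses

fortran_param = TOP_BIT | 1000   # hypothetical flagged address
algol_param = NEXT_BIT | 1000    # hypothetical flagged address

for machine in DECODED_BITS:
    print(machine,
          effective_address(machine, fortran_param),
          effective_address(machine, algol_param))
```

In this model a flag bit is harmless only while the machine ignores it;
once the bit is decoded, the flagged word addresses a different location.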

On 920C and 920ATC, the top bit of the Sequence Control Registers
in core is used to hold the H-bit, which is the Address Mode
Register of that level, Relative-v-Absolute. This leads to the
statement that, even though the 920ATC can have 262144 words
of store for data, only the first 131072 words can hold program,
(see B-model specification page 6 (C)).
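
A minimal Python sketch of why the H-bit limits program addresses,
assuming only what is stated above: the core SCR is one 18-bit word
whose top bit is H. The function names are hypothetical.

```python
# Sketch of an 18-bit core SCR word: top bit = H, the rest = address.
H_BIT = 1 << 17            # bit 18 of the word
ADDRESS_MASK = H_BIT - 1   # the 17 bits left over for the address

def pack_scr(h, address):
    """Build the core SCR word; only 17 address bits fit beside H."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit beside the H-bit")
    return (H_BIT if h else 0) | address

def unpack_scr(word):
    """Split a core SCR word into (h, address)."""
    return (word >> 17) & 1, word & ADDRESS_MASK

print(unpack_scr(pack_scr(1, 131071)))   # highest address that survives
```

Any address of 131072 or above cannot be saved in this layout, which
is the restriction to the first 131072 words for program.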

On 920ATC, the hardware SCR is described as having 18 bits,
excluding the H-bit which is described as a separate register.
This suggests that a program which runs entirely on top level
could actually have code anywhere in a 262144-word store.
(Function 11 saves the usual 13 N-bits of S in store and 4 F-bits
rather than 5 bits in Q, so intermodule calls would need care).

In my "920 ATC" wallet I have an Internal Communication from
Howard A Jones dated 26th August 1976, which notes that
"The specification for the 920ATC restricts the amount of program
storage to the lower 128K words but allows extension of the memory
for 'data only' storage up to 256K words. This is compatible with
the 905. Under special circumstances (i.e. level 1 operation only)
the current hardware design permits code to be put above 128K.
This is not compatible with the 905, and is a facility which
would appear to have limited applications". This is all true.

Howard then considers how to "prevent code being placed above 128K",
by changing either the hardware or the Coral linker, and requests
"Modify the Linker such that software produced will run on both
905's and existing 920ATC's" [his apostrophes]. This is very odd.
905 software will run on a 920ATC (by unlinking two pins on the
rear CPU connector if it happens to use the sign bit of addresses).
But 920ATC software which puts program beyond 128K won't run on a
905 anyway, just like 920ATC software which uses the stack or the
floating point hardware. At most, a linker warning is needed,
that the program will run only on top level and only on a 920ATC.

As recorded in emails between myself & Andrew Herbert in May 2015,
there is a snag in the SCR implementation on the 920C & 905.
Operating the JUMP (or JUMP II) key places the 920C or 905 onto top
level, but it does NOT save the hardware SCR into the appropriate
core SCR before setting the hardware SCR to the address on the WG
keys (or 8181). On 920A 920B 920M the interrupted SCR is not lost,
being already in core. So some correct 920A 920B 920M programs fail on
920C=905, specifically my simulator for 12-bit 900s (for which I issued
a patch). I don't know whether the 920ATC gets this right or not.

The onboard store size of the 'A' model was 16K.
The onboard store size of the 'B' model was 32K.


B-Registers.
------------

On 920A 920B 920M 920C, the B-Registers for the four priority levels
are held in core store locations, 1 3 5 7. Writing to these directly,
rather than using function 0, is generally deprecated, (because it
constrains the code to a given level and requires the code to be in
the first 8192 words or to use absolute address mode), but it is
certainly not illegal. It happens within initial instructions and
in various loaders, and it is explicitly described in the 920M & 920C
facts cards (for example "Multiply A by B, 12 1 or 3 or 5 or 7").
The early 920ATC models were like this too.

On 920ATC 'B' model, the current level's B-Register is held in a
hardware register, making modified instructions faster, and for
compatibility the 920ATC has to explicitly detect writing into the
current B-Register address: "The contents of the current program
level B register core location will be identical to the hardware
B register at the end of every instruction" (see B-model specification
page 8 (C)), except after a RESET (when the hardware B is cleared) or
an external data transfer which includes B's core location (unlikely).
And when an Interrupt or Terminate causes a level change, the new
level's core B-Register is copied into the hardware register.

(At points where the floating-point microcode swaps operands around,
it would be faster to hijack the hardware B-Register as an extra
workspace, then recover it from core. My notes show that I'd thought
of this potential optimisation at the time, but the floating microcode
was probably inherited from before the hardware B-register model.)


Instruction Set.
----------------

The 920ATC (as given in the B-model specification)
implements the instruction set of the 920C:
    0:  Load B, Load Q
    1:  Add
    2:  Negate & Add, Load Q
    3:  Store Q
    4:  Load A
    5:  Store A
    6:  Collate
    7:  Jump if A = 0 (block relative)
    8:  Jump          (block relative)
    9:  Jump if A < 0 (block relative)
   10:  Count
   11:  Store S   (13 N-bits in store, 4 F-bits in Q, /-bit not saved)
   12:  Multiply  (leaving Q1 undefined, see notes below)
   13:  Divide    (setting Q1 := 0, A1 := 1, as usual)
   14 0    to 14 36  :  Left shift  (longer shifts undefined)
   14 2048 to 14 4095:  Block transfer into memory
   14 4096 to 14 6143:  Block transfer out of memory
   14 8156 to 14 8191:  Right shift (longer shifts undefined)
   15 0    to 15 2047:  Input from peripheral to A
   15 2048 to 15 2056:  Input to CPU depending on particular OMP used
        (No statement that the existing A is left-shifted 7 places)
   15 4096 to 15 6143:  Output from A to peripheral
   15 6144 to 15 6152:  Output from CPU depending on particular OMP used
   15 7168:  Program Terminate (save H & S, restore H & S and B)
   15 7169:  Skip if Standardised
   15 7170:  Increment B, Skip if B's N-bits = 0
   15 7171:  A := keys if fitted, else A := 8177
   15 7172:  A-to-Q:  Q(18-2):=A(17-1), Q(1):=0
   15 7173:  Q-to-A:  A(17-1):=Q(18-2), A(18):=0
   15 7174:  A-to-B:  B:=A, Q:=A (see notes below)
   15 7175:  B-to-A:  A:=B, Q:=B (see notes below)
   15 7176:  Set relative addressing (on current level)
   15 7177:  Set absolute addressing (on current level)
   -- ----:  Program Interrupt (save H & S, restore H & S and B)

There are actually some slight differences here.
   Shifts in either direction are undefined beyond 36 places.
      This limit also appears in 920M & 920C Facts Cards.
   Q1 is undefined after function 12, Multiply.
      Peter Lawrence's "Programming Compatibility of 900 Series
      Computers" says that "Q1 := 1 if A < 0 otherwise Q1 := 0",
      on the 920A 920B 920M, and indeed it is a consequence of
      the Booth's algorithm used throughout the 900 series,
      that the sign bit of one of the operands ends up in Q1.
      The 920B microprogram starts by moving A into Q and clearing A,
      then it adds or subtracts the store operand M to or from A,
      in a loop which also right-shifts A & Q. But I've just
      (12th November 2016) noticed that the 920ATC is different.
      The 920ATC microprogram starts by placing the store operand
      into Q, moving A to J and setting A to the inverse of A,
      and clearing M, then it loops adding either A+1 or J
      to M and shifting M & Q, and it finally copies M into A.
      So it looks as though the sign bit of the store operand
      ends up in Q1. I don't know which sign bit the 920C gives.
   The B-model specification page 31 says that "Interrupts will
      be permitted to become active after any instruction" whereas
      on 920A 920B 920M 920C they are inhibited after Function 0.
      When a program saves its Q-register with function 3, this will
      be right shifted one place, and if this is reloaded using the
      defined effect of function 0 or 2, a "14 1" one place left shift
      is needed to restore the original value. If an interrupt occurs
      just before this shift, the bottom bit (of the 17) will be lost,
      using the interrupt instructions published in various Facts Cards,
      which themselves use function 3 to save the lower-level Q-register.
      Inhibiting interrupts after function 0 avoids losing this bit
      provided the Q-register is loaded via function 0 not function 2.
      Certainly on the 920M Jaguar program we used admittedly slower
      interrupt instructions which saved all 18 bits of the Q-register,
      so that we were free to load Q with either function 0 or 2,
      whichever was most efficient, function 2 being slightly faster.
      Also we were then free to use all 18 bits of Q anywhere.
      On 920ATC where the Q-register holds the floating point exponent,
      it is essential to use the interrupt instructions which save all
      of Q, so there is no need to inhibit interrupts after function 0.
      There is a slight complication that programs (like "C3") which
      use the TRACE facility do assume that function 0 cannot be TRACEd.
   As shown above, instructions 15 7174 A-to-B & 15 7175 B-to-A on
      920ATC also load the transferred value into the Q-register,
      whereas neither the 920C nor 905 Facts cards suggest this.
      In my "920 ATC" wallet I have an Internal Communication from
      Noel J Turner dated 18th March 1974, stating that the 15 7174
      instruction on 905 has an undocumented side effect of placing
      A into Q as well as into B. It makes no comment about 15 7175,
      but it does note that the value loaded into Q usefully differs
      from the shifted value loaded by the A-to-Q instruction.
      I ran some tests on 8th February 1974 to check other aspects of
      the 905 microcode but I can find no evidence that I checked Noel's
      statement for 15 7174 or the equivalent statement for 15 7175.
      It is probably wise to treat the effect of 15 7174 & 15 7175
      on the Q-register as undefined, both on 920C=905 and on 920ATC.
   There is no statement that the existing Accumulator is left-shifted
      7 places before a paper tape or TeleType input; this appears
      in the OMP specification, and is presumably implemented within
      the OMP which does have an 18-bit interface to the 920ATC.
      (It would not be possible to implement this shift within the
      paper tape station of a 920B which only has an 8-bit interface).
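
The Q-register save and restore hazard described in the list above can
be sketched in Python. This is a simplified model under stated
assumptions: only the Q half of the double-length "14 1" shift is
modelled, and the interrupt routine is assumed to restore Q by loading
its saved copy and shifting left one place. The function names are
hypothetical.

```python
# Simplified model of saving Q with function 3 and reloading it.
MASK18 = (1 << 18) - 1

def store_q(q):
    """Function 3: Q is stored shifted right one place."""
    return (q & MASK18) >> 1

def load_q(word):
    """Function 0 or 2: Q is loaded with the store operand as it is."""
    return word & MASK18

def shift_left_1(q):
    """'14 1', modelling the Q half only (an assumption)."""
    return (q << 1) & MASK18

def reload(saved, interrupted):
    """Reload Q from 'saved'; optionally interrupt before the shift."""
    q = load_q(saved)
    if interrupted:
        # The interrupt routine saves Q with function 3, then restores
        # it in the same load-and-shift manner (an assumption).
        handler_copy = store_q(q)
        q = shift_left_1(load_q(handler_copy))
    return shift_left_1(q)

original_q = 0b110
saved = store_q(original_q)
print(bin(reload(saved, interrupted=False)))
print(bin(reload(saved, interrupted=True)))
```

With the interrupt taken between the load and the shift, the lowest
of the 17 saved bits does not survive in this model.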

It is explicitly stated that function 15 instructions can be
B-modified "from one 15 instruction to another". I believe that
a similar statement is equally true for function 14 instructions,
since both statements are true across the rest of the 900 series.
Thus left shifts can be B-modified into right shifts & vice versa,
(a useful property which is not true on many other makes of computer).

The 920ATC (as given in the B-model specification)
implements these extra instructions:
   15 7678:  Reset Overflow flag O/F, Skip if Overflow was not set.
   15 8079:  Input fault holding register (level 2 interrupts)
   15 8090:  Reset cycle monitor
   15 8094:  Reset system fault
   15 8107:  Generate internal interrupts
                Level1 := A1, Level2 := A2, Level3 := A3
   15 8111:  Input level 3 interrupts
   15 8123:  Set external interrupt enable register
   15 8126:  Set system fault (causes level 2 interrupt)
   15 8143:  Read the four current-level mode bits & AP
                A18:=FLP, A17:=ASM, A16:=PAR, A15:=O/F, A(7-1):=AP
   15 8154:  Reset Accumulator pointer register, AP := 127
   15 8171:  Set store protection register (in 2K blocks within 32K)
   15 8187:  Set the four current-level mode bits
                FLP:=A18, ASM:=A17, PAR:=A16, O/F:=A15


The Mode Bits.
--------------

The 920ATC 'A' model specification of the mode bits below differs from
the 920ATC 'B' model specification of the mode bits above:
   15 8170:  FLP := 0, Reset floating point mode
   15 8171:  FLP := 1, Set floating point mode
   15 8186:  ASM := 0, Reset accumulator stacking mode
   15 8187:  ASM := 1, Set accumulator stacking mode
   15 8143:  Read mode bits:  A18 := FLP, A17 := ASM.
Although the 'A' model has no instructions for setting or getting the
PAR bit, it did have one. ("Performing any of these instructions will
register the required incrementation by setting at [sic] latch (PAR)").

All of FLP ASM PAR are described as single bits, and I know that
the early 920ATC certainly did not have an FLP bit per level.
In a meeting on 27th January 1976, it was recorded that "Only one
fixed/floating point mode flag [is] provided rather than one per
level. This incurs a software overhead in a multi level program
when changing from fixed to floating point mode" with a proposed
solution "This can be overcome by a minor hardware change [detail]
in all SP division machines without impact on delivery dates.
The opportunity would also be taken to correct a problem in the
same area on the Accumulator Stacking Mode (ASM)". A note dated
9th April 76 details the required specification changes.

I leave it as a challenge to the reader to devise interrupt code
to save the lower level A & Q without knowing what modes are
currently in use, and read and save its mode bits, before
setting the modes required for the interrupt code, and later
to restore the lower level modes and A & Q before terminating.

The 920ATC 'B' model has four mode bits per interrupt level,
which are selected from 16 flip-flops by the interrupt level
and so do not need to be saved and restored:
   FLP:  The Floating Point Mode Register
   ASM:  The Accumulator Stacking Mode Register
   PAR:  Increment Accumulator Register
   O/F:  Overflow Register
All 16 bits are set to their 920C compatible states by RESET.
I don't know if they are set by any JUMP facility in the OMP,
if not you might have to RESET before JUMP, which would be
inconvenient and error prone, and would also clear A & Q.

Thus on the 920ATC 'B' model you cannot change one of the mode
bits without knowing, or reading, the values of the other bits.
(The ability to AND and OR into the mode bits would be useful).
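
The read-modify-write needed to change a single mode bit can be
sketched in Python, using the bit positions given for 15 8143 and
15 8187 in the 'B' model list. The function names are hypothetical,
and the masking that Python does here would have to be done in the
Accumulator by the program.

```python
# Bit positions as listed for 15 8143 / 15 8187 (A18 is the top bit).
FLP = 1 << 17   # A18
ASM = 1 << 16   # A17
PAR = 1 << 15   # A16
OVF = 1 << 14   # A15, the O/F bit
MODE_MASK = FLP | ASM | PAR | OVF
AP_MASK = 0x7F  # A(7-1)

def read_modes(modes, ap):
    """15 8143: A receives the four mode bits and AP."""
    return (modes & MODE_MASK) | (ap & AP_MASK)

def set_modes(a):
    """15 8187: all four mode bits are replaced from A."""
    return a & MODE_MASK

def set_one_mode(modes, ap, bit, on):
    """Change one bit: read all four, edit in A, write all four."""
    a = read_modes(modes, ap)
    a = (a | bit) if on else (a & ~bit)
    return set_modes(a)

modes = ASM | OVF
modes = set_one_mode(modes, 100, FLP, True)
print(bool(modes & FLP), bool(modes & ASM), bool(modes & OVF))
```

Writing a constant to 15 8187 without the read would clear or set the
other three bits as a side effect.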

Note that the fastest way to zero the current level mode bits is
"6 +0, 15 8187". This is because function 6 (collate), which is
not affected by the floating mode, is significantly faster than
function 4 (load A & Q) in floating mode, and is slightly faster
than function 4 (load A) in fixed mode which has to test the mode.


Accumulator Stacking Mode.
--------------------------

The 920ATC 'B' model specification pages 17 to 19 state:

   The mode of operation will be determined
   by the state of the current levels ASM bit.
   This is loaded by instructions and read
   by program.

   The accumulator pointer register (AP) is
   used to indicate which accumulator within the
   stack is being operated on. The stack is
   held in store locations 64 to 127. AP is
   reset to 127 by CPU Reset or by a 0 15 8154
   instruction, following the first 4 instruction
   performed whilst ASH and PAR are set the AP will
   be set to 64. AP can be read at any time by a
   0 15 8143 instruction.

   The accumulator pointer AP will be incremented
   if a LOAD A is preceded by one of the following
   arithmetic functions 1, 4, 6, 12, 13 or 14
   (shift) instruction. Performing any of these
   instructions with the current levels ASM bit
   set shall register the required incrementation
   by setting PAR on the current level. An
   arithmetic function performed when ASM is set,
   and of address 0, will decrement AP and place
   the result in AP-1. Address 0 when used with
   functions, 1, 2, 4, 6, 12 and 13 shall cause
   the function operands to be taken from the top
   two stack locations. Address 0 is independent
   of the state of the modifier bit. i.e. the
   modifier bit should be 0 for normal operation.
   If it is required to access location 0 in ASM
   it may be achieved by a modified instruction.
   Specifically, the A Register shall be the top
   stack location and the operand, normally read
   from store, shall be read from the top less one
   stack location. A LOAD A instruction preceded
   by a 5, 7 or 9 instruction, shall prevent AP
   from incrementing i.e. PAR will be reset by
   these instructions.

   NOTE:
   Care is required in programming because
   certain 15 instructions affect the accumulator.

   The functions 0, 2, 3, 8, 10, 11, 14 (Block
   Transfer) and 15 will have no effect on PAR.
   Generally function 2 shall be interpreted as
   both Negate and Add and LOAD Q. If the
   function 2 is followed by a LOAD A instruction
   then the effect shall be determined by the
   state of PAR. i.e. If PAR is set function 2
   will have the effect of a Negate and Add, or
   if PAR is not set function 2 will have the
   effect of LOAD Q.

   Accumulator stacking mode will not affect the
   instruction execution times except for a
   LOAD A instruction when PAR is set.

And as noted subsequently (in the specification and herein):

   The facility of using floating point arithmetic
   when in accumulator stacking mode will be provided.

The fact that no timings are altered by Stacking Mode,
except for function 4, makes me suspect that the actual
implementation differs from the description. I think it
likely that the "top" accumulator is not held, or duplicated,
in the stack, but is only held in the normal A-register.
(Only the microcode for function 4 tests PAR, see below).

So my understanding of this is that, in ASM mode:
   functions which place a result into the Accumulator set PAR,
   functions which consume a result in the Accumulator clear PAR,
   and the remaining functions do not alter PAR.
If an unconsumed result is about to be overwritten by a function
4, that function 4 pauses whilst the result is stacked; and
operands are unstacked by unmodified instructions with N=0.

It's odd that information is given about accessing location 0,
given that the effects of doing this are already not portable.
On the assumption that B-modification is permitted in stacking
mode in the normal way, and that B-modified instructions often
have N=0, I deduce that location 0 can only be accessed using
B-modified instructions with N>0 and the B-register set to -N.

I think that the statement about function 2 is confusing.
Surely (unless in floating point mode) function 2 always both
Negates & Adds to the Accumulator and loads the Q-register,
regardless of the state of PAR, and without altering PAR.

Note that there are cases when the stacking logic can go wrong.
For example, if a value is calculated in A then shifted into Q
prior to loading A, or if a multiplication of positive integers is
performed and saved with function 3, a spurious stack can occur.
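
These cases can be sketched in Python by tracking PAR alone. The sets
of functions which set or clear PAR are those quoted from the
specification above; unstacking by N=0 operands is ignored, and the
label "14 shift" is just a hypothetical tag for a function 14 shift.

```python
# Track PAR through a sequence of functions, counting how often a
# function 4 finds PAR set and so stacks the old Accumulator.
SETS_PAR = {1, 4, 6, 12, 13, "14 shift"}
CLEARS_PAR = {5, 7, 9}
# Functions 0, 2, 3, 8, 10, 11, block transfers and 15 leave PAR alone.

def count_stackings(functions):
    par = False
    stacked = 0
    for f in functions:
        if f == 4 and par:
            stacked += 1   # old A is pushed before the load
        if f in SETS_PAR:
            par = True
        elif f in CLEARS_PAR:
            par = False
    return stacked

print(count_stackings([4, 1, 5, 4]))        # load, add, store A, load
print(count_stackings([4, "14 shift", 4]))  # result shifted into Q
print(count_stackings([4, 12, 3, 4]))       # product saved from Q
```

In the last two sequences the result has been consumed via Q, but PAR
is still set, so the next function 4 stacks a value nobody wants.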

I've just (13th November 2016) found a note written by T Steve
Chubb dated 19.2.73, saying that "Since the mini-stacks are only 16
locations in length [per level] then an overflow is quite likely".
Of course there is no need to statically partition the stack
into equal parts, or even into unequal parts, provided each level
uses it correctly. But there is NO detection of stack overflow
when the 64 word total is exceeded. I presume that the stack
pointer then wraps back from 127 to 64 as it does initially.

Although AP is described as a 7-bit register, I presume that
it is really a 6-bit register plus a hard-wired +64 bit.
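
The presumed pointer behaviour can be sketched in Python. This is a
guess at the arithmetic only: a 6-bit counter with 64 added, wrapping
in both directions (the downward wrap is an assumption).

```python
# AP modelled as a 6-bit counter plus a hard-wired +64.
STACK_BASE = 64
STACK_SIZE = 64

def step_ap(ap, delta):
    """Move AP by delta, staying within locations 64 to 127."""
    return STACK_BASE + ((ap - STACK_BASE + delta) % STACK_SIZE)

def step_ap_bits(ap, delta):
    """The same thing expressed as a 6-bit counter OR-ed with 64."""
    return 64 | ((ap + delta) & 63)

print(step_ap(127, 1))   # first push after AP has been reset to 127
print(step_ap(127, 2))   # floating point mode moves two at a time
```

Nothing in this model can signal overflow: a 65th push simply lands
on the oldest entry.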

The use of locations 64 to 127 for the stack is in direct conflict
with their use on the 920C for controlling autonomous transfers,
for example RADOS disk transfers use locations 88 to 95.


Floating Point Mode.
--------------------

The 920ATC 'B' model specification pages 19 to 21 state:

   This mode of operation will be determined by
   the state of the current levels FLP bit.

   The notation "a2^q" shall be read as meaning
   that the A register contains the mantissa,
   which shall be interpreted directly as a
   fixed point 18 bit binary fraction using
   'two's complement' notation for negative
   numbers. The Q register shall contain the
   exponent, the contents being interpreted
   as a binary integer.

   The notation m2^m+1 shall be interpreted
   in a like manner, m being the contents of
   store location of address M and (m + 1) being
   the contents of store location (M + 1). The
   store address of the operand (M) shall be
   formed as for fixed point working.

   Numbers will not be standardised at the end
   of instructions since this only results in a
   loss of time. Any necessary standardisation will
   be handled at the beginning of the floating
   point instructions.

   The floating point microprogram will operate
   to preserve maximum accuracy consistent with
   18 bit operation. However, for the divide
   operation, if the dividend is standardised on
   entry to the instruction, it will be destandard-
   ised by a shift one place right before
   calculation commences. The divisor is
   standardised by the microprogram before division
   takes place.

   If division by zero is called, then the
   mantissa of the quotient will be as the
   dividend but the exponent is set to:-
   Quotient Exponent:=
   Dividend Exponent - Divisor Exponent + 18

   If an add or negate and add instruction is
   called up with either of the mantissas
   being zero then the answer given will be the
   non-zero mantissa together with its exponent -
   unless however the exponent of the zero
   mantissa is more than 63 greater than the
   non-zero mantissa's exponent when the answer
   given will be the non-zero mantissa with an
   exponent equal to the larger exponent minus
   36 to 39.

   The facility of using floating point arithmetic
   when in accumulator stacking mode will be provided.

   In this mode, since two word operands are
   involved, the stack will expand and retract
   two locations at a time. The top of stack
   location will be used for storage of the
   mantissa and the top less one location will
   be used for exponent storage. In floating
   point mode, accumulator stacking will not
   affect the instruction times except for a Load A
   instruction if PAR is set.

Page 13 of the specification may be thought to contradict this,
"The mantissa will be in the A register and the exponent held
in the Q register. Both will be expressed in fixed point binary".
However it is clear that Q represents the exponent as an Integer.
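
As an illustration, the a2^q notation can be decoded in Python. This
sketch assumes that the exponent in Q is a signed (two's complement)
18-bit integer, which the specification does not state explicitly.

```python
from fractions import Fraction

MASK18 = (1 << 18) - 1

def signed18(word):
    """Interpret an 18-bit word as a two's complement integer."""
    word &= MASK18
    return word - (1 << 18) if word & (1 << 17) else word

def value(a, q):
    """a is an 18-bit fraction, q an integer exponent (assumed signed)."""
    mantissa = Fraction(signed18(a), 1 << 17)
    return mantissa * Fraction(2) ** signed18(q)

print(value(1 << 16, 1))   # mantissa 0.5, exponent 1
print(value(1 << 15, 2))   # same value, not standardised
print(value(1 << 17, 0))   # most negative mantissa
```

The first two calls show that one value has several representations
when numbers are not standardised.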

The 920ATC 'B' model specification page 30 lists the
instructions which change when in floating point mode:
    1:  Add,                     a2^q := a2^q + m2^m+1
    2:  Negate & Add,            a2^q := m2^m+1 - a2^q
    3:  Convert to fixed point,  a := a * 2^(q-M), q := M
           O/F:=1 if number is about to overflow during shifting
    4:  Load A,                  a2^q := m2^m+1
    5:  Store A,                 m:=a, m+1 := q
   12:  Multiply,                a2^q := a2^q * m2^m+1
           In multiply, the product is partially standardised
           i.e. only up to a maximum of 18 shifts will occur
           on a double length answer formed during the multiply
           algorithm in order to standardise the product before
           truncation to a single length mantissa.
   13:  Divide,                  a2^q := a2^q / m2^m+1

Page 16 of the specification explains "For convert to fixed point,
(Function 3 in floating point mode) the number of shifts of the
mantissa, contained in the A Register shall be equal to q-M" where
M = the address N field of the instruction plus the contents of the
B register if the instruction is B-modified, "always as if h = 1",
thus clarifying that the 8K module address is never added in.
Note that, as for shifts, M is used, not the contents of location M.


Lazy Standardisation.
---------------------

The argument for "lazy" standardisation is only partly valid.
It certainly only wastes time to left-shift the result of an
add or subtract, or to left-shift the result of a multiply by more
than 18 places, given an even chance that the result is only going
to be right-shifted at the start of a subsequent add or subtract.
But as a general rule, it is better to tidy up a result when it
is calculated rather than whenever it is used, on the grounds
that it should be used at least once, but may be used more often.
If several items are all to be multiplied by the same value, (for
example, by the sine or cosine of an axis rotation angle), it pays
to standardise that value, to reduce shifting after every multiply.
(The argument is similar to that for ADSL broadband: information
is uploaded only once, with a view to being downloaded at least
once, so with only limited bandwidth available, it is sensible
to make the speeds asymmetric, with download faster than upload).
Lazy standardisation also reduces division accuracy, see below.


Floating Point Microcode.
-------------------------

I have a diagram "920 ATC Processor Scheme" which shows
   A 4-word RAM containing the registers: A Q S J,
   A separate M register which can be shifted in situ,
   An arithmetic logic unit (5*54181?), taking one input from the
      selected RAM register and the other from M or M-inverse.
A Q S are known to the programmer, J & M are not.

Before the individual routines are entered, the instruction has
been read into the bus buffer, and tested for '/'. The address
of the operand (or of the peripheral, or the shift distance) has
been read into both J & M, but the operand has not been accessed.

Microcode for Floating Function 1.
   Read first store value into M;
   Increment store address in J;
   J+M+1 into J;  J-M-1 into M;  J-M-1 into J;  (thus swap J & M)
   Q+(-1-second store value)+1 into M;
   J+M+1 into J;  J-M-1 into M;  J-M-1 into J;  (thus swap J & M)
   (thus J now holds the difference of the exponents)
   J-1 into J;
   if J was not <0 before adding -1 then
      J+1 into J;
      if J was <0 before adding +1 then goto EXP_EQ; end if;
      (Q > second store value)
      -1-J into J;  J*2 into J;  J/2 into shift counter (KSBC);
         (shift counter now second store value - Q - 1)
   else
      (Q < second store value)
      J+M+1 into J;  (Q - second store value + M)
      J-M-1 into M;  (Q - second store value - 1)
         and this into shift counter (KSBC);
      Q-M-1 into Q;  (second store value)
      J-M-1 into M;  (first store value)
      A+M+1 into A;  A-M-1 into M;  A-M-1 into A;  (thus swap A & M)
   end if;
   (Larger exponent in Q, Smaller-Larger-1 in shift counter)
   (Mantissa with larger exponent in A, other mantissa in M)
   loop
      Increment shift counter (1TSC);
      if A not standardised then
         Q-1 into Q;  A*2 into A;
         if SM1 then goto EXP_EQ; end if;  (test shift counter)
      else  (no need to test if standardised again)
         loop
            M/2 into M;
            if SM1 then goto EXP_EQ; end if;  (test shift counter)
            Increment shift counter (1TSC);
               (assumed to take effect after above test)
         end loop;
      end if;
   end loop;
EXP_EQ:
   A+M into M;
   if signs of A & M before the add agree and after it disagree then
      Q+1 into Q; M/2 into M;
   end if;
   M into A;
I have a one-page sketched flowchart of floating functions 1 & 2,
(which I think came with the microprogram), showing floating add
in two symmetric parts, to be entered according to which exponent
was greater. This clearly disagrees with the microprogram above,
in which the times taken by X+Y and Y+X can differ substantially.
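
The three-step register swap which recurs in the microcode above
(J+M+1 into J; J-M-1 into M; J-M-1 into J) can be checked with a
short Python sketch. It assumes 18-bit registers with wrap-around
arithmetic; the names MASK and swap_via_alu are illustrative only
and do not come from the microprogram. Note that X-M-1 is simply
X plus the inverse of M, and X+M+1 is X plus M with a carry in,
which fits an ALU whose second input is M or M-inverse.

```python
MASK = (1 << 18) - 1  # assumed 18-bit registers, arithmetic modulo 2^18

def swap_via_alu(j, m):
    """Swap two registers using only the forms X+M+1 and X-M-1."""
    j = (j + m + 1) & MASK   # J+M+1 into J
    m = (j - m - 1) & MASK   # J-M-1 into M  (M now holds the old J)
    j = (j - m - 1) & MASK   # J-M-1 into J  (J now holds the old M)
    return j, m

# The swap holds for any pair of 18-bit values, including extremes.
print(swap_via_alu(0o123456, 0o654321) == (0o654321, 0o123456))  # True
print(swap_via_alu(0, MASK) == (MASK, 0))                        # True
```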

Microcode for Floating Function 2.
   -1-A into A;  A+1 into A;  (thus A := -A)
   if A was not <0 before adding +1 and A now is <0 then
      Q+1 into Q;  A/2 into A(1-17);
   end if;
   goto Function 1;
So if A was &400000, before and after negation, the overflow
is corrected, by setting it to &200000 and incrementing Q.
The reverse is not implemented (of standardising &200000 before
negation from &600000 after negation to &400000 and decrementing Q).
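
The negation step can be sketched in Python as follows. This is a
model of the three microcode lines only, assuming an 18-bit word
and assuming that "A/2 into A(1-17)" leaves the sign bit clear
(which is what the &200000 result quoted above requires). The
function name negate_with_fixup is hypothetical.

```python
MASK = 0o777777   # 18-bit word
SIGN = 0o400000   # top (sign) bit

def negate_with_fixup(a, q):
    """A := -A in 18-bit 2's complement, repairing the one overflow case."""
    inv = (MASK - a) & MASK      # -1-A into A
    neg = (inv + 1) & MASK       # A+1 into A
    if not (inv & SIGN) and (neg & SIGN):
        # Only reached when the original A was &400000.
        q += 1                   # Q+1 into Q
        neg >>= 1                # A/2, assumed to clear the sign bit
    return neg, q

print([oct(x) for x in negate_with_fixup(0o400000, 5)])  # ['0o200000', '0o6']
print([oct(x) for x in negate_with_fixup(0o200000, 5)])  # ['0o600000', '0o5']
```

The second call shows the unimplemented reverse case: &200000
negates to the unstandardised &600000 and Q is left unchanged.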

Microcode for Floating Function 3.
   J+1 into M;  (spurious step shared with function 5)
   -1-Q into M;
   J+M into both J and M;  (target exponent -1-Q)
   -1-J into J;            (Q-target exponent)
   Q+M+1 into Q;           (target exponent)
   if J<0 then
      (Q < target exponent)
      A into M;
      loop
         J+1 into J;  M/2 into M;
         exit loop when not J<0;
      end loop;
      M into A;
   else  (J >= 0)
      (Q >= target exponent)
      J-1 into J;
      loop
         exit loop when J<0;
         J-1 into J;
         if A is standardised then
            set overflow flag and exit loop;
         end if;
         A*2 into A;
      end loop;
   end if;
My understanding of this instruction is that it replaces the floating
point number in A & Q with a (usually) different representation of
the same floating point number, but with the given target exponent.
If overflow occurs, the flag is set, and the shifting of A stops, but
the microcode shows that Q has already been set to the target exponent,
so the values of A & Q no longer correspond, and so the number is lost.
The microcode confirms that the shifts can go in either direction,
and there appears to be nothing to limit the shifts to 18 places.
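
A minimal Python sketch of the control flow above may make the
overflow case concrete. It is a model under stated assumptions,
not a transcription of the hardware: A is held as a signed Python
integer in the 18-bit range, "standardised" is taken to mean that
the sign bit differs from the next bit, and the value represented
is taken to be A * 2^(Q-17), the scaling implied by the 500/10
division example in these notes. All names are illustrative.

```python
def is_standardised(a):
    """Assumed test: sign bit differs from the next bit of 18-bit A."""
    return a >= (1 << 16) or a < -(1 << 16)

def convert_to_fixed(a, q, target):
    """Follow the microcode's loops; returns (a, q, overflow)."""
    j = q - target            # Q - target exponent
    q = target                # Q is overwritten before any shifting
    overflow = False
    if j < 0:                 # Q < target exponent: shift right
        while True:
            j += 1
            a >>= 1           # arithmetic shift, sign preserved
            if not j < 0:
                break
    else:                     # Q >= target exponent: shift left
        j -= 1
        while not j < 0:
            j -= 1
            if is_standardised(a):
                overflow = True   # shifting stops, Q already changed
                break
            a <<= 1
    return a, q, overflow

def value(a, q):
    """Value represented, under the assumed scaling A * 2^(Q-17)."""
    return a * 2.0 ** (q - 17)

print(convert_to_fixed(500, 17, 10))   # (64000, 10, False)
a, q, ovf = convert_to_fixed(500, 17, 8)
print(a, q, ovf, value(a, q))          # 128000 8 True 250.0
```

In the second call the original 500 has become 250: the flag is
set, but A and Q no longer describe the same number.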

Microcode for Floating Function 4, with PAR false.
   Read first store value into A;
   Increment store address;
   Read second store value into Q;

Microcode for Floating Function 4, with PAR true.
   Q+(-1-first store value) into M;
   Increment AP;
   Q-M-1 into Q;  (so Q now holds the first store value)
   Q+M+1 into M;  (so M now holds the original Q)
   Write it to stack;
   Increment AP;
   A into M;
   Write it to stack;
   Q into M;
   M into A;  (so A now holds the first store value)
   Increment store address;
   Read second store value into Q;
I have just realised for the first time (21st November 2016) that the
stacking operations above place A & Q into store in the reverse order
to that used by Function 5. That's OK, it assumes that they will be
unstacked by an instruction (with N=0) which unstacks A before it
unstacks Q (and which ignores "increment store address" herein).
But, whilst fixed point values anywhere within the stack can be
accessed via their (non-zero) store address if required, floating
point values within the stack cannot be meaningfully accessed.

Microcode for Floating Function 5.
   Write A into first store location;
   Increment store address;
   Write Q into second store location;

Microcode for Floating Function 12.
   Set shift counter (KSBC);
   -1-A into both A and M;
   Read first store value into M;
   A+1 into A;  A-1 into A;
   if A was not <0 before adding +1 and A was <0 after adding +1 then
      (it was &377777 then &400000, so the original A was &400000)
      Q+1 into Q;  A/2 into A(1-17);
   end if;
   Q+M+1 into Q;  Q-M+1 into M;  Q-M+1 into Q;  (thus swap Q & M)
   J+1 into J;                          (increment store address)
   J+M+1 into J;  J-M-1 into M;  J-M-1 into J;  (thus swap J & M)
   J-(-1-second store value)-1 into J;         (sum of exponents)
   0 into M;
   loop
      Increment shift counter (1TSC);
      Q/2 into Q;  M/2 into M;  with carry from M into Q;
      if Q1 xor S0 then
         exit loop when SM2;
      else                 (not (Q1 xor S0))
         if Q then         (M := M - original A)
            A+M+1 into M;
         else              (M := M + original A)
            -1-A into A;  A+M into M;  -1-A into A;
            (this is particularly messy, fixed-point multiply
            can hold A in J, and so does these 3 steps in 1).
         end if;
      end if;
      exit loop when SM1;
   end loop;
   Set shift counter (KSBC);
   Increment shift counter (1TSC);
   M into A;
   loop
      exit loop when A standardised;
      Increment shift counter (1TSC);
      J-1 into J;
      Q*2 into Q;  M*2 into M;  with carry from Q into M;
      M into A;
      exit loop when SM1;
   end loop;
   J into M;  M into Q;

Microcode for Floating Function 13.
   Set shift counter (KSBC);
   Read first store value into M;
   J+1 into J;                          (increment store address)
   J+M+1 into J;  J-M-1 into M;  J-M-1 into J;  (thus swap J & M)
   loop
      exit loop when J standardised;
      Increment shift counter (1TSC);
      Q+1 into Q;  J*2 into J;
      if SM1 then
         Q+(-1-second store value)+1 into Q;  goto END_DV;
      end if;
   end loop;
   if A standardised then
      A+M+1 into A;  A-M-1 into M;  A-M-1 into A;  (thus swap A & M)
      Q+1 into Q;  M/2 into M;
      A+M+1 into A;  A-M-1 into M;  A-M-1 into A;  (thus swap A & M)
   end if;
   Q+(-1-second store value)+1 into Q;
   M*2 into M;  -1-J into J;  A into M;  0 into A;
   Set shift counter (KSBC);
   loop
      if A19 xor M18 then  (M := M - denominator)
         Increment shift counter (1TSC);
         J+M+1 into M;  A*2+1 into A;  M*2 into M;
      else                 (M := M + denominator)
         Increment shift counter (1TSC);
         -1-J into J;  J+M into M;  -1-J into J;  A*2 into A;  M*2 into M;
      end if;
      exit loop when SM1;
   end loop;
   A*2+1 into A;
END_DV:
As noted earlier "if the dividend is standardised on entry to the
instruction, it will be destandardised by a shift one place right
before calculation commences. The divisor is standardised by the
microprogram before division takes place." But the dividend
is not standardised before that one place destandardisation.
As a consequence, forcing the 1 into the least significant bit of the
result can have an unexpectedly large effect on what otherwise might
be an exact result. I have a worked example in my "920 ATC" wallet
of dividing 500 by 10: when dividing (A=500, Q=17) by (A=10, Q=17)
the divisor is standardised to (A=81920, Q=4) giving a result of
(A=801, Q=13) or 50.0625. If the dividend were almost standardised
to (A=64000, Q=10) the result would be (A=102401, Q=6) or 50.0005.
Likewise dividing 1 by 3 can give results between 0.2500 & 0.3333,
depending on how well standardised the 1 is.
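
The figures quoted above can be reproduced with a simplified
Python model. It is not microcode-accurate: the 18-step loop is
replaced by a truncating integer division, only positive operands
are handled, and the value represented is assumed to be
A * 2^(Q-17). The function names are hypothetical.

```python
def standardise(a, q):
    """Shift a positive mantissa left until a >= 2^16, adjusting q."""
    if a <= 0:
        raise ValueError("this sketch handles positive mantissas only")
    while a < (1 << 16):
        a, q = a << 1, q - 1
    return a, q

def divide(a, q, d, dq):
    """Simplified model of Function 13; returns (mantissa, exponent)."""
    if a >= (1 << 16):                # standardised dividend:
        a, q = a >> 1, q + 1          # destandardise one place right
    d_std, d_q = standardise(d, dq)   # divisor is always standardised
    quotient = (a << 17) // d_std     # fractional divide, truncated
    return quotient | 1, q - d_q      # 1 forced into the bottom bit

def value(a, q):
    """Value represented, under the assumed scaling A * 2^(Q-17)."""
    return a * 2.0 ** (q - 17)

print(divide(500, 17, 10, 17))        # (801, 13)
print(value(*divide(500, 17, 10, 17)))    # 50.0625
print(divide(64000, 10, 10, 17))      # (102401, 6)
print(value(*divide(1, 17, 3, 17)))   # 0.25
print(value(*divide(65536, 1, 3, 17)))
```

With an unstandardised dividend most of the 18 quotient bits are
leading zeros, so the forced bottom bit carries far more weight.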


Much Ado About Nothing.
-----------------------

I was familiar with the software floating point interpreter QF,
for use in SIR assembly code programs, well before the hardware
floating point for the 920ATC was first mooted, and I was aware
that, when two numbers are added or subtracted, the mantissae
first have to be aligned, by shifting the smaller number right
a number of places determined by the difference of the exponents.

I was also aware that this could generate shifts well in excess
of 36 places (beyond which further shifts have no effect), but
because the 903=920B allows shifts up to 2048 places, this never
was a problem (just a waste of time) using the packed 28/7 format.
Using the unpacked 35/18 format there was some risk of generating a
block transfer instruction, but I'm not aware that this ever happened.
The only issue of ALGOL for which I have the source potentially has
the same problems, calculating expressions using the 35/18 format,
although generally using variables unpacked from the 28/7 format.
I have a note which states that shifts on a 920M are only valid up to
48 places, and the various facts cards say that shifts are limited to
36 or 48 places. In recognition of these problems, later versions of
the floating point software (in the FORTRAN issued with RADOS, in
the version of QF which we used with CAP CORAL, and in my 900 BASIC)
took appropriate action when the exponents differed by more than 36.

So when I was first told of the 18/18 format, I was worried that
this might be implemented naively, leading to shifts, to align
the operands in Add and Negate-&-Add, of up to 2^18-1 places
(even though the Accumulator is all 0s or all 1s after 17 shifts),
and taking some seconds (during which interrupts would be ignored).
The engineers offered to avoid this by interpreting the required
shift distance modulo 64, but of course this would lead to gross
errors, for example adding 2^-64 to 1 would give 2. They could not
understand why anyone would want to add two numbers whose exponents
differed by 64 or more. I had to explain that if we could predict the
exponents when we were writing the code, we wouldn't need floating point.
(In my "920 ATC" wallet I have an undated Internal Communication
from Tony Acton: "At present during Add or Negate & Add instructions
in FLP mode the exponent difference must be restricted to 6 bits.
Would a 12 bit difference be satisfactory i.e. exponents limited
to 11 bits?").
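
The modulo 64 error is easy to demonstrate. The Python sketch below
is an idealised adder, not the 920ATC microcode: it aligns by
right-shifting the operand with the smaller exponent, and assumes
the value represented is A * 2^(Q-17). The clamp of 18 places in
the second call is an illustrative choice; all names are invented
for the example.

```python
def add_aligned(a1, q1, a2, q2, distance):
    """Add (a1,q1) and (a2,q2), q1 >= q2, shifting a2 right by distance."""
    total = a1 + (a2 >> distance)
    q = q1
    if not -(1 << 17) <= total < (1 << 17):   # mantissa overflow
        total, q = total >> 1, q + 1
    return total, q

def value(a, q):
    """Value represented, under the assumed scaling A * 2^(Q-17)."""
    return a * 2.0 ** (q - 17)

one = (1 << 16, 1)         # 1.0
tiny = (1 << 16, -63)      # 2^-64
diff = one[1] - tiny[1]    # exponent difference of 64

# Shift distance taken modulo 64: the operands are not aligned at all.
wrong = add_aligned(*one, *tiny, distance=diff % 64)
# Shift distance clamped instead: the tiny operand shifts out to zero.
right = add_aligned(*one, *tiny, distance=min(diff, 18))

print(value(*wrong))   # 2.0
print(value(*right))   # 1.0
```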

There is one important difference between the 920ATC hardware and
all of the software packages. In the software, zero is represented
with a zero mantissa and the exponent is of no consequence, and the
arithmetic routines test for zero and take special action where needed.
The 920ATC Function 7, jump if zero, does not have a floating point
variant, it just tests the accumulator in fixed or floating mode.
(In fact the 920ATC hardware has no easy way to test the accumulator
for zero: Function 7 jumps if the accumulator is not negative but
the accumulator minus one is negative). So from this perspective,
920ATC floating zero has a zero mantissa and the exponent is of no
consequence. But the hardware arithmetic routines do not test for zero.
(Whilst explicitly adding or subtracting zero is an unlikely operation,
subtracting a value from zero to obtain its absolute value is common).
It follows that the exponent of zero has to be a large negative number,
to ensure that the other operand is not right-shifted prior to an add
or subtract. (In my Kalman routines, I used an exponent of -65536.
Whilst -131072 might seem logical, there is no overflow checking when
the exponents are compared). But as a consequence, the zero itself
will be potentially subject to a long right shift.

So the earlier software packages contained latent errors regarding long
shifts but they had little impact due to the special treatment of zero,
whereas on the 920ATC, the lack of special treatment for zero guarantees
that there is a long shift problem. I was assured by the engineers that
they had fixed this problem, possibly by detecting when the exponent
difference in Function 1 is greater than 18 (or perhaps greater than
32 or 64), and limiting the value written into the shift counter to this
value, (which would make more sense than interpreting it modulo 64).

Some uses of the shift counter are clear. It is probably
   only a 5 or 6 bit register. It is set (to something) by KSBC,
   incremented by 1TSC (or is it ITSC?), and tested by SM1 (& SM2?).
It is used to control the main 18-step loops in Multiply & Divide,
   and to limit the post-standardisation in Multiply and the
   pre-standardisation in Divide to 18 shifts, when the value is zero.
It is set & tested by Add (and Negate & Add), although I can see nothing
   in the microprogram to distinguish this setting (ideally to the lower
   of 18 and the exponent difference) from the other uses which set 18.
The J register, rather than the shift counter, is used to control
   the loop in floating Function 3. I suspect that this instruction
   could generate arbitrarily long right shifts for any accumulator
   value and arbitrarily long left shifts if the accumulator is zero.

There is a further problem with zeros (which had still not been
resolved when I left). Explicitly coded zeros can be written with
a large negative exponent, but zeros arising from calculation
inherit their exponent from the operands. Thus, in W:=(X+Y)+Z,
if X and Y happen to be equal and opposite, X+Y will have a zero
mantissa, but its exponent could be greater than that of Z, thus
incorrectly forcing Z to be right shifted before adding it to zero.
The best that can be said of this is that it could be viewed
as no worse than an error in the bottom bit of either X or Y
(making X+Y non-zero, and shifting Z accordingly).

Also the 920ATC 'B' model specification pages 19 to 21 (given above)
indicate that these big zeros will give incorrect results, if
"the exponent of the zero mantissa is more than 63 greater than the
non-zero mantissa's exponent when the answer given will be the non-zero
mantissa with an exponent equal to the larger exponent minus 36 to 39."


Floating Point Convergence.
---------------------------

This section is not specific to 920ATC, it is also relevant
to the software floating point package QF, as used in 900 SIR,
ALGOL and FORTRAN. Whether in 18/18, 28/7 or 35/18 format,
these all use 2's complement mantissae, whereas many other
floating point implementations use sign-&-magnitude mantissae.

When a positive mantissa is shifted right a long way it becomes
zero, whereas a negative mantissa never becomes zero, no matter
how far it is shifted right; it becomes and then remains all 1s.

When it first becomes all 1s it is still correct, representing
a negative value somewhere between the bottom bit inclusive and
zero exclusive, depending on the value of the bits shifted out.
But after another shift, the value represented is between half
of a bottom bit and zero, and so should be replaced by zero.
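
Python's >> on integers is an arithmetic shift, so it shows the
same behaviour. In the sketch below the function align and its
width parameter are hypothetical; it assumes an 18-bit mantissa
and applies the rule just described, replacing the operand by
zero once it has been shifted the full width.

```python
def align(mantissa, places, width=18):
    """Right-shift for alignment, flushing to zero at `width` places."""
    if places >= width:
        return 0              # less than half a bottom bit remains
    return mantissa >> places

# Plain arithmetic shifts: positive goes to 0, negative sticks at -1.
print(131071 >> 17, -131072 >> 17, -1 >> 100)    # 0 -1 -1

# With the flush rule, 17 places still gives all 1s, 18 gives zero.
print(align(-131072, 17), align(-131072, 18))    # -1 0
```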

In 900 BASIC, which effectively uses 29/8 & 35/13 formats,
I've been careful to do this, and I've documented an important
consequence of this within the user manual, which I was able
to exploit when implementing the mathematical functions within BASIC:

"When two numbers are added or subtracted, if one is negligible
compared with the other, the result is exactly as if the one number
were zero (regardless of the signs of the numbers). This property
of the floating point arithmetic enables the summing of series
to be terminated as soon as the term has no effect on the sum;
without waiting until the term itself underflows to zero,
and without resorting to a machine-dependent "epsilon"."
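
The termination test described in the quotation can be sketched in
Python. This uses Python's own binary floating point rather than
any 900-series format, and sums the exponential series purely as
an illustration; the function name exp_series is invented here.

```python
import math

def exp_series(x):
    """Sum 1 + x + x^2/2! + ... until a term leaves the sum unchanged."""
    total, term, n = 1.0, 1.0, 0
    while True:
        n += 1
        term *= x / n
        new_total = total + term
        if new_total == total:    # term is negligible: stop here,
            return total          # no "epsilon" constant is needed
        total = new_total

print(exp_series(1.0), math.e)
print(exp_series(-1.0), math.exp(-1.0))
```

The loop relies on a negligible term, of either sign, leaving the
sum exactly unchanged, which is the property quoted above.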

From the RADOS source files for the FORTRAN library, I can see
that this 900 FORTRAN limits right shifts to 36 places. Likewise,
in a version of the QF interpreter used in 28/7 format with CAP CORAL,
shifts are limited to 32 places. In both cases, operands requiring
more than this many right shifts are treated as if zero.

But in the 920ATC hardware, in the SIR QF interpreter, and in the
only issue of the ALGOL interpreter for which I have the source,
no special action is taken, so negative values, however small,
always cause some negative drift.
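
The drift can be illustrated with integer mantissae in Python. The
sketch is an idealised accumulation, not a model of any particular
package: the function accumulate, its parameters, and the figures
used are all invented for the example, with limit=36 standing in
for the kind of shift limit described above.

```python
def accumulate(total, term, places, count, limit=None):
    """Add `term`, right-shifted by `places`, to `total` `count` times."""
    for _ in range(count):
        if limit is not None and places > limit:
            shifted = 0                # operand treated as zero
        else:
            shifted = term >> places   # negatives stick at all 1s (-1)
        total += shifted
    return total

start = 1 << 16
drifted = accumulate(start, -5, 40, 1000)             # no shift limit
steady = accumulate(start, -5, 40, 1000, limit=36)    # with a limit

print(start - drifted)   # 1000
print(start - steady)    # 0
```

Each negligible negative term costs one bottom bit when no limit
is applied, whereas a positive term of the same size costs nothing.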


Initial Instructions.
---------------------

920A, 920B=903, 920M, 920C=905, all have built-in Initial
Instructions, which appear to occupy locations 8180 to 8191,
and hide the core store locations in that address range.
Sometimes, and always on machines with more than 8K of store,
they can be turned off, to avoid having a gap in the store,
either with the program terminate instruction 15 7168, (or,
but only on 920C=905, by selecting absolute addressing, 15 7177).

Programs frequently use multiple entry points, to select options
and to initiate actions, especially when no TeleType is available,
and these reactivate initial instructions, even though entry point
8181 for initial instructions has not been selected. Which is why
the 16K versions of ALGOL, FORTRAN, and MASIR make more use of
the assumed TeleType to select options and actions.

The 920ATC 'A' model specification section 1.5 "Program Loading"
states that "Programs will be loaded from punched tape via the
OMP connector. Initial instructions will first be loaded into
memory under control of the OMP or program loading unit. Control
will then be transferred to the initial instructions program
for reading in the tape."

Thus, initial instructions are initially copied from the OMP
into core store locations 8180 to 8191, overwriting whatever
is there, but then they may themselves be overwritten by the
actions of the loaded program, with no need to "turn them off".

I can find no mention of initial instructions in the 920ATC
'B' model specification, so I assume that by then they were
viewed as entirely a matter for the OMP.


Operators Monitor Panel Options.
--------------------------------

The 920ATC 'B' model specification Appendix B states
"Four OMP options are available with different interface facilities
for use with Paper Tape Stations, Teletype, VDUs etc."

OMP type 25-017-02 interface to PTS 240/131-03/0231/A/G01.
   Supports 15 2048 (with 7-place shift) and 15 6144 only.

OMP type 25-017-01 interface to PTS 240/131-03/0231/A/G01 [plus]
One V24 interface to device such as a Teletype at 110 bits/sec.
   Supports 15 2048 & 15 2052 (with 7-place shift) and 15 6144 & 15 6148,
   also 15 2049, giving teleprinter status only, in bits A3 & A4.

OMP type 90-168-01 interface to PTS 240/131-03/0231/A/G01 [plus]
Four V24 interfaces for 4 devices at any combination of rates
from 9600, 4800, 2400, 1200, 600, 300, and 110 bits/sec.
   Tape input  at 2048, V24 inputs  at 2050 2052 2054 2056,
   Tape output at 6144, V24 outputs at 6146 6148 6150 6152,
   Status at 2049 2051 2053 2055, Control at 6145.

OMP type 25-017-04 = OMP type 90-168-01 in free standing enclosure.

In my "920 ATC" wallet file, I have an Internal Communication
which I wrote on 1/10/75, querying whether these OMPs had
the usual Tape-v-TeleType Override switches. The answer was
essentially "No, but we could provide them on a separate panel".


*** EOF ***