This report was prepared in 1997 by John H. Letcher, Professor of Computer Science of the University of Tulsa and President of Synergistic Consultants Incorporated.
The primary purpose of this study was to answer the question: is it possible to employ 100BaseT ethernet to provide all of the interprocessor communication in a real-time multiprocessing system built out of distinct modules? A secondary purpose was to quantify the loss in efficiency that this configuration would produce compared with more elaborate and expensive alternative approaches, such as Symmetric MultiProcessing (SMP) under an operating system such as Windows NT, or multi-master busses interconnecting the processors.
The answer to the question of ethernet practicality is a qualified yes. For stable system operation, contention in the communication protocols must be avoided. Establishing a protocol for messaging (i.e., taking turns, or a slave processor responding only to active polling) essentially defines away the possibility of a message packet collision. This allows the specification of a guaranteed stable system.
The use of ethernet as the mechanism for interprocessor communication degrades each computer by at most a few percentage points compared with the more complex and expensive multi-master busses which could provide the interconnection of the system processors. Since the overhead imposed by ethernet communication is only a few percent, even a system with as few as a half dozen processors offers more computational power as an ethernet distributed system than as a single system with SMP. Furthermore, the extendibility of the distributed system is readily apparent.
Finally, the wall clock time required to carry on ethernet communication (over that of multi-master busses) will provide a delay in processing of at most one period of the system clock (1/60 sec.) over that possible with the more expensive busses. This author does not see any harmful effects of this slight additional delay.
The design specification for a stable real-time system is outlined in this report as a way to prove the practicality of using ethernet techniques for interprocessor communication. The calculated timing diagrams show a very large margin of safety (with regard to timing) that is offered by this proposed system: With a defined communication load of under 10,000 bytes total in both directions, the processor burden is only a few percent of the total time available. Furthermore, the total time required to carry out all communication is less than ten percent of the available time; that is, from the start of any period of the clock, the communication is finished before ten percent of the period has been used. This offers a huge margin for error to correct for almost all interprocessor communications errors.
For the purposes of this study, it is taken as a given that there is a need for a computer system that performs a set of real-time tasks, each of which is coded in the form of a procedure or process. These tasks can be organized so that each of the processes is called once and only once during the period of the primary system clock (1/60 sec.) Each process can be characterized by a number (expressed with the dimensions of time) that is the maximum amount of time that is required to execute the process, no matter what data are presented to it. These times will be called the Processor Characteristic Time (PCT).
Assuming that the total demand of these tasks (the sum of the Characteristic Times), if run on a single processor, exceeds the time allotted for a single period of the clock (16666 usec.), the question arises: how may multiple processors (computers) be used to make up for the inability of one processor to accomplish the job at hand within the proper time, with complete assurance that all required processing is actually accomplished within the period of the primary clock? Furthermore, there must be assurance that this processing is accomplished by all processors within each and every period of the primary clock.
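The stability condition just described can be illustrated with a short check. Only the 16666 usec. period comes from the text; the PCT values in the example are hypothetical:

```python
PERIOD_USEC = 16_666  # one period of the 1/60 sec primary system clock

def single_processor_stable(pct_usec):
    # A single processor is stable only if the sum of the Processor
    # Characteristic Times fits within one period of the primary clock.
    return sum(pct_usec) <= PERIOD_USEC

# Hypothetical PCTs: these five processes total 21,000 usec, which
# exceeds the period, so the load must be distributed.
print(single_processor_stable([9000, 3000, 3000, 3000, 3000]))
```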
Although there are many computer configuration possibilities to implement this system, we will choose to examine only three choices, namely 1) the use of multiple processors in a Symmetric Multi-Processing (SMP) mode using an operating system such as Windows NT, 2) the use of a set of independent processors, each with a local memory, all sharing a common multi-master bus and a block of shared memory, and 3) a group of independent processors interconnected only by means of an industry standard networking bus. For this study we will consider the use of 100BaseT ethernet techniques.
Although, in general, real-time computer systems may take many forms, we choose to consider only one special case of such systems: a system comprised of a single process, called the Host, in a master/slave relationship with a set of slave processes called Clients. The operation of this system takes place in regular cycles, each with a period of 1/60 sec. Interprocessor communication takes place in the form of messages, which are blocks of data passed from the local memory of one process to that of another.
Each process must complete all of the calculations and perform all of its expected actions within the allowed time frame, the period of the system clock. The inability to do so constitutes a system failure. We shall also refer to this situation as system instability. A stable system is therefore a system that never misses a time deadline and performs all of the actions expected of it (the required tasks), never experiencing a failure by any processor in any time period.
All processes must be stable for the system as a whole to be stable.
The system shall be organized so that one processor, called the HOST, drives all of the operations of the entire system. The other processors, called CLIENT_i, i=1,...,NC, where NC is the number of client processors, receive a single data block from the Host, perform calculations, take actions appropriate to the process at the appropriate time, and return a block of information to the Host. Every period of the primary clock stands alone functionally; that is, in each time period each process receives at least one block of data, performs calculations, and returns at least one block of data to be used by other processors. During each period of the clock, each process uses exogenous data acquired in this time period only. Of course, each process stores data for future use in its own local memory area, to be used for any purpose needed by the process.
Back door communication from Client to Client is not allowed. If a Client needs data, it must acquire it by communication of this need to the Host by the conventional message blocks.
In the design specification process, after the function of each process is decided, it is possible to characterize each process with a Characteristic Time (defined to be T0 for the Host and Ti, i=1,...,NC for the Clients). Again, the Characteristic Time is the maximum amount of time that the processor will use each time the process is executed, no matter what values of the data are presented. It will be possible to show the conditions of stability (meaning the lack of failure) for options 1 and 2, above. However, for option 3, making statements with regard to stability must be qualified by limitations of the timings of the interaction between processors. This is to say that it is possible to organize the timing relationships to induce instability.
The system must be specified and implemented so that there exists no possibility of system failure, unless there is a serious hardware malfunction or a double occurrence of an improbable noise corruption of an interprocessor message.
Fast ethernet is a broadcast technique that employs a transmission media and controller cards which are mass produced and very inexpensive. All ethernet conforms to the IEEE 802.3 Standard specification. Earlier versions of ethernet transmitted at the rate of ten million signal bits per second, used inexpensive coaxial cable, and employed Manchester encoding and decoding to achieve proper clock synchronization of the send/receive units. Fast ethernet transmits at the rate of 100 Mbps. Cabling is possible using a variety of methods: two Category 5 unshielded twisted pairs (UTP) or two shielded twisted pairs (STP) (100BaseTX), two optical fibers (100BaseFX), or four Category 3 or Category 5 UTP (100BaseT4). Any of these provides sufficient bandwidth for the interprocessor communications of the proposed system.
The ethernet controllers are intended to operate using the Carrier Sense Multiple Access with Collision Detection (CSMA/CD) technique, in which each message is sent to all listeners equally. Since each sender may transmit whenever it senses that the transmission medium is idle, there is a possibility that two stations will transmit at the same time. This is true because the propagation time between stations is finite, so two stations can start transmitting at roughly the same moment, neither knowing that the other has begun. When this occurs, each sending station sends a short jamming signal to make sure that the rest of the system knows that a collision has occurred. After a collision is detected, each sending station remains idle for a random amount of time before attempting to resend the message whose transmission contributed to the collision (1-persistent CSMA/CD with a randomized hold-off). At least two senders are in this situation; that is, each wants to send. Hopefully one station starts transmitting long enough before another that the second sees the first station's transmission (and waits), so that another collision is not generated. If the maximum random hold-off is set to too small a value, the probability of a second collision increases; yet if the time is set too long, there is a high probability of missing one of the real-time deadlines, thereby generating a system failure.
Binary exponential hold-off techniques do not work well in a system with rigid (relatively short) time deadlines. Failure to receive even one of the interprocessor messages within the allotted time slot constitutes system failure. Therefore, contention must not be allowed in the design of a system in which stability (lack of failure) is an absolute requirement.
Fortunately, one can impose (by software design) an environment which is positively driven by the Host, so that even with one message corrupted by noise in any time period, this system will remain stable.
The message structure used by ethernet (and Fast Ethernet) (called the MAC frame for the 802.3 protocol) consists of the following fields:
1. Preamble (7 octets of bits of alternating 0's and 1's, used to establish synchronization of the clock of sender and receiver)
2. Start Frame Delimiter (1 octet with the value of 10101011)
3. Destination Address (2 or 6 octets) (2 chosen for this system)
4. Source Address (2 or 6 octets) (2 chosen)
5. LLC Data (>= 0 bits, the message to be sent)
6. Pad (a number of octets added to ensure that the frame length has a total transmission time that is at least twice the propagation delay of the longest broadcast path; this ensures proper collision detection operation)
7. Frame Check Sequence (4 octets for error control)
In the contemplated real-time system, the cable lengths are extremely short (tens of meters, not hundreds), so the requirement for Pad octets is essentially nonexistent. The point here is that the overhead in each message is therefore 16 octets (7 + 1 + 2 + 2 + 4) in addition to the LLC data itself.
Earlier ethernet transmission employed Manchester encoding/decoding of the
data bits to produce signal bits. Here, the number of signal bits is double
the number of data bits. This is done to facilitate the synchronization of
the signal sampling clocks in the receiver circuits with that of the
sender, sometimes referred to as self-clocking, a term which is misleading.
This clock synchronization is accomplished by supplying a guaranteed
transition in the middle of the time interval that a data bit is sent.
With Fast Ethernet, it was felt that Manchester encoding was a wasteful use of broadcast bandwidth. Therefore, other methods have been employed for all of the 100Base techniques. For 100BaseX, a unidirectional data rate of 100 Mbps is achieved by transmitting over a single link (single twisted pair or single optical fiber); 4B/5B-NRZI encoding is employed. 100BaseTX uses two pairs of twisted pair cable; both STP and Category 5 UTP are allowed, and the MLT-3 signaling scheme is used. (Appendix 13A of the Stallings textbook on Data and Computer Communications, pages 451-457, is an excellent reference for the details of the above-mentioned signaling techniques.) The details of these techniques will not be presented further here. The importance of all of this is that 16 signal bits for each 8 data bits (one byte) is a very conservative estimate, providing an upper bound to be used in the timing calculations below.
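Under this conservative bound (16 signal bits per data byte, plus the 16 octets of frame overhead computed earlier), the wire time for the system's traffic can be estimated. The 8000-byte payload anticipates the message sizes used in the examples below:

```python
RATE_BPS = 100_000_000     # Fast Ethernet signaling rate
SIGNAL_BITS_PER_BYTE = 16  # conservative bound: 16 signal bits per data byte
FRAME_OVERHEAD_BYTES = 16  # preamble + SFD + addresses + FCS (2-octet addresses)

def wire_time_usec(payload_bytes):
    # Upper bound on the time a payload of this size occupies the bus.
    total_bytes = payload_bytes + FRAME_OVERHEAD_BYTES
    return total_bytes * SIGNAL_BITS_PER_BYTE / RATE_BPS * 1_000_000

# 8000 payload bytes cost about 1283 usec, well under ten percent
# of the 16,666 usec clock period.
print(wire_time_usec(8000))
```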
All Ethernet controllers worth discussing operate in a straightforward manner: a finite state machine on the ethernet controller card handles all of the tasks of sending and receiving packets of data. The computer itself need only send a short message to this microcontroller stating that it wishes to send or receive a packet (a block of data), together with the location in memory of the message (where it is found if it is to be sent, or where it is to be placed if it is to be received) and the other parameters of each message. In a well designed system each message type would have a template message prepared by the computer during initialization. Therefore the actions required by each computer to handle each message take a very short amount of time to perform (under 20 microseconds).
An important fact to realize is that the act of sending, receiving, and all other communication tasks are handled by the microcontroller on board the ethernet card. These tasks present essentially no burden whatsoever on the computer and its other tasks. (To be sure, when the microcontroller is moving a message by DMA to or from main memory, the computer and the microcontroller share the data bus for memory transactions; the impact of this on the computer is minimal.)
The operation of an ethernet controller card is a relatively simple matter to effect. Every action (or more properly, requested action) is initiated by copying a simple data block into the internal registers of the ethernet controller. Part of this block is comprised of the command and a memory location of where the packet is to be placed. The destination address must also be specified. This command block may be issued either in an interrupt routine or not, depending upon the situation. Writing a packet is a matter of telling the controller where the data block is located in memory (by giving it a pointer), giving the specification of the destination address (which may be an individual computer, or may be designated as intended for all stations, i.e., a broadcast message), and then writing a command byte with the proper value to tell the controller to act. The time to execute all of the above is on the order of under 10 microseconds for each request. The processor is now free to do whatever it wishes (except that the block of data to be sent may not be touched). When the transmission is complete, the controller usually posts an interrupt to alert the computer and to allow it to initiate another write command.
Similarly, to accept a packet of data from the outside world, the computer
sends a message to the controller (by loading various I/O registers on the
controller card) to offer a space in memory to hold an incoming data
packet. Only when a packet with the proper destination address is received
is the computer informed by means of an interrupt. Then, the computer is
free to issue a command to accept another packet. Many modern controllers
allow the posting of a number of requests without waiting for the first
request to complete.
The time required by the computer for each of the chain of interrupt
responses will be so short and efficiently carried out that each packet
sent or received represents under 20 microseconds of time by the computer.
The tasks of sending, receiving, encoding, decoding, checking for
transmission errors, etc. are handled entirely by the ethernet controller
card. These tasks do not provide any burden on the computer except for the
messaging time and the slowing of the computer caused by the sharing of the
data bus by the DMA movement of the data blocks from the main memory to the
data memory on board the ethernet controller card.
An ethernet controller card is actually two totally independent devices: one to send and one to receive. It is possible for the receive unit to
listen to the ethernet bus and accept the packet that is currently being
sent by the sender unit of this controller card. Since the packet is fully
formed, along with a Frame Check Sequence, it is possible to positively
affirm that the packet was properly sent. By using very short (shielded)
cables, it is unlikely that a reception error would occur that is not
observed by the sender. Therefore, for ordinary operation each controller
acts as its own monitor with regard to the correctness of its transmission.
In the extremely unlikely event of a reception error which has not been
caught by the self monitoring process, this can be handled by a simple
negative acknowledge message sent after all normal transmission in a period
to allow retransmission of a block, except that the retransmission of a
Host Data Block described below will be handled immediately because of the
importance of these data to each Client. In the examples given below it
will be seen there is ample time for retransmission of a single block in
any time period.
When a message block is sent the probability of occurrence of a single bit
error or short burst is small. If the system is properly implemented with
adequate power, shielding, etc., the probability that another single bit
error occurs within a short interval of time is given by a Poisson
distribution of pair-wise stochastically independent events. The maximum in
this distribution is at zero time meaning that the occurrence of improbable
independent events tends to cluster. Nevertheless, for this discussion, the
probability of a single transmission error is very small and the
probability of two of these errors within the same time period is
vanishingly small.
Within the address space of the processes, there exist a set of data
structures (blocks of memory) that are used for storage and for
communication of information among the processes. These are defined as
follows:
1.1 The Host Data Block (HDB). This is global memory if the system employs a single processor or multiple processors sharing a multimaster bus. For the system using ethernet communications this block is replicated in the local memory of each of the processes (Host, Client_1, Client_2, etc.)
1.2 Data storage for other items required by Host processing which is
local to the Host process.
2.1 The Client_i Data Blocks (CDB_1, CDB_2, etc.). This is global memory if the system employs a single processor or multiple processors sharing a multimaster bus. For the system using ethernet communications, these blocks are replicated in the local memory of each of the processes.
2.2 Data storage for other items required by Client processing which
is local to the Client process.
There exists a set of processes:
1. An interrupt routine that responds to the Primary Clock Tick. This
occurs every 1/60 sec.
2. An interrupt routine that responds to an interrupt caused by the
ethernet controller.
3. The program Sequencer Process, MAIN, that actively drives the
calling of each of the non-interrupt processes.
4. The INIT process, an initialization routine to fill each data block
of memory with appropriate values at startup or restart.
5. The HOST process. All of the important tasks to be performed by the
Host are performed in this single module. This software reads the
Client Data Blocks, obtained during the previous time period, performs
calculations, performs I/O operations with its local devices and
prepares the Host Data Block for sending to the Client modules. A time
period for the Host Process starts upon the receipt of the primary
timer tick interrupt.
6. The CLIENT_i Processes. All of the tasks to be performed by each
Client are carried out by this process or are directly called by this
process. I/O operations with the devices local to the process are also
performed. This software reads the Host Data Block, performs its work
and prepares the Client Data Block for transmission to the Host. A
time period for a Client starts immediately after the receipt of the
Host Data Block. Each Client must finish its work before the receipt
of the next HDB. The Client must have prepared the CDB areas of memory
within the allotted time.
Processes 1 and 2 operate under interrupt control. The others do not.
For a single processor system, the program Sequencer is a very simple loop
set up in the following manner:
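The original listing does not survive in this copy; the following is a minimal hypothetical sketch of such a sequencer loop. Here wait_for_tick stands in for synchronization with the primary clock, and its returning False is purely an illustrative shutdown convention:

```python
def sequencer(wait_for_tick, host_process, client_processes):
    # wait_for_tick blocks until the next 1/60 sec primary clock tick;
    # returning False is a hypothetical shutdown convention for testing.
    while wait_for_tick():
        host_process()                  # HOST is called once per period
        for client in client_processes:
            client()                    # each CLIENT is called once per period
```

Because every process is called exactly once per period, in a fixed order, no two processes ever touch the shared data blocks at the same time.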
Please notice that this simple sequencer defines away the problem of access
to the same memory by multiple processes that might occur in
multi-processor systems. By positively orchestrating the sending of
messages in the distributed systems we will accomplish the same end.
Each of the above processes (that is, each call to the process) is characterized by a Process Characteristic Time. For a single processor system to be stable, the clock period T must be greater than the sum of the Process Characteristic Times (T0 + T1 + ... + TNC) plus the time required to service the interrupts.
The HOST process must be capable of executing the following activities:
Using stored data and new data received recently from the Clients, HOST
performs calculations to generate blocks of data to be sent to the Clients.
During this time, the Host process performs actions that control peripheral
devices. The Host Process is called by the sequencer, once in each time
period.
Each Client process copies the block(s) generated by the Host, as above, and performs its tasks, culminating in the preparation of the Client Data Block to be sent to the Host.
The actual time that is spent in the Sequencer (except for intentional
waiting for a timer tick) is extremely small (well under one percent of the
available processing time) so that this will be ignored.
In a well written system, the time required to service each interrupt
(timer and ethernet controller) is short. A very conservative estimate of
under 20 usec. per event shall be used in the timing examples below. Since
the interrupt routines must share the processor with the code that is not
run under interrupt control, any excess time spent in the interrupt routine
is taken away from the non-interrupt routines. In the examples given below
the fraction of time required to support timers and ethernet controllers
with interrupt processing will be at most only a few percent of the
available time. For very rough calculations, the time required by the interrupt routines can be ignored; however, for the timing diagram given below, each contribution is shown, even though this renders the diagram not to scale.
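A rough bound on this burden can be computed from the 20 usec. per event figure; the count of interrupt events per period used here is an assumption (roughly a dozen packet and timer interrupts):

```python
PERIOD_USEC = 16_666   # one period of the primary clock
USEC_PER_EVENT = 20    # conservative cost of one interrupt response
EVENTS_PER_PERIOD = 12 # assumed packet and timer interrupts per period

def interrupt_burden_fraction():
    # Fraction of each clock period consumed by interrupt servicing.
    return EVENTS_PER_PERIOD * USEC_PER_EVENT / PERIOD_USEC

# roughly 1.4 percent of the available time
print(interrupt_burden_fraction())
```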
Let us assume that there exists the situation that the total execution time
(the sum of the Processor Characteristic Times of all processes plus the
execution time of servicing interrupts) exceeds the allowed time period, T.
We have no choice other than to use a faster processor or to use multiple processors, specified so that each processor is capable of finishing its work in the allowed time period. The next step is to distribute the processing load over a number of processors. These processors can be interconnected in various ways:
1. Using a local bus type interconnect which is employed in a variety
of commercial motherboard designs. The hardware would employ an
operating system, such as the SMP processing capability of the
Microsoft Windows NT operating system, to distribute the processing
load among a number of processors.
2. Using a multi-master bus, such as Multibus-II, to allow sets of
computers, each with its own memory, to share a common memory address
space. The simplest configuration is where each processor has its own
local address space, yet shares with all other processors a common
memory area to be used for interprocessor communication. This area of
memory will hold the primary copies of the HDB and the CDB blocks.
Perhaps, some additional memory is used for interprocessor
synchronization.
3. Using a set of totally independent processors, each with its own local memory and each capable of operating as a stand-alone computer. The set of processors share, at most, a common source of power. All interprocessor communication is carried out by the use of an industry standard networking bus, chosen for its acceptance as an IEEE standard and supplied by a variety of vendors whose products are mutually compatible with one another.
The choice of IEEE standard networking techniques depends upon a set of
complex issues. The two major contenders are ethernet and token ring. It is
well known that token ring is stable under heavy loads (a property not shared by ethernet), but it operates at relatively slow rates (4 or 16 Mbps). The higher transmission rate of 100 Mbps offered by Fast Ethernet seems to be the proper choice. However, we must remember that ethernet employs a
contention technique. This system will collapse under a very heavy system
load, so for this technique to be used in a real-time system with stern
short time deadlines, it means that we must impose a set of rules with
regard to who may transmit and with regard to when each may send in order
to prevent the multiple collisions that will characterize a system
collapse. This failure is not the result of bad programming, rather the
possibility of collapse is implicit in the design of ethernet.
On the single processor machine, the message blocks are areas of global
memory. Sending a block consists of nothing more than a processor executing
a memory block move. The time required to perform this task (assuming that
the memory is fast enough not to require the insertion of wait states) is
two clocks of the memory bus clock for each read and write of the data
object transfer. For an Intel 386 class computer, the data object can be
four bytes per transfer. For sending 10,000 bytes each 16666 usec., this is
effectively free. So, the use of multi-master busses or local busses for
interprocess communication will provide only a trivial overhead using
options 1 and 2, above. Therefore, in the simple calculations that follow
we will make the assumption that this overhead is free, that is, it takes
no additional time. The calculation of the processor time required to
perform ethernet communications will be our measure of the comparison of
the different techniques.
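The "effectively free" claim can be checked with a rough calculation. The two-clocks-per-transfer and four-bytes-per-transfer figures come from the text; the 33 MHz memory bus clock is an assumption for a 386-class machine:

```python
BUS_CLOCK_HZ = 33_000_000  # assumed 386-class memory bus clock
BYTES_PER_TRANSFER = 4     # one data object per transfer (Intel 386 class)
CLOCKS_PER_TRANSFER = 2    # one read plus one write, no wait states

def block_move_usec(nbytes):
    # Time for the processor to move a memory block of nbytes.
    transfers = nbytes / BYTES_PER_TRANSFER
    return transfers * CLOCKS_PER_TRANSFER / BUS_CLOCK_HZ * 1_000_000

# moving 10,000 bytes takes about 152 usec, under one percent
# of the 16,666 usec clock period.
print(block_move_usec(10_000))
```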
The software for the HOST and CLIENT processes can be written, easily, so
that each may be used in any of the above three configuration choices
without any software modification, whatsoever. Only the additional software
that moves the message blocks from the address space of one processor to
another will perhaps differ from one configuration to another. The time
required to execute the tasks of communication will obviously take away
from the amount available. Nevertheless, the remaining time in the period
may be used by the HOST and CLIENT processes.
The writing of the code for the HOST and CLIENT processes is by far the
majority of the effort in the implementation of these systems. No person
involved in these efforts need know which system configuration has been
chosen, because it makes no difference in how the HOST and CLIENT processes
are written. The use of ethernet communications is transparent to most of
the systems programmers.
For an ethernet based system, the Host is a standalone computer running the
simple sequencer process:
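The listing itself is not reproduced in this copy; the following is a minimal hypothetical sketch. The interrupt routines maintain the Time State, and wait_for_sync0 stands in for blocking until TS becomes 0 (its returning False is an illustrative shutdown convention):

```python
def host_sequencer(wait_for_sync0, host_process):
    # wait_for_sync0 blocks until the interrupt routines set the Time
    # State to 0 (SYNC_0 sent at the primary tick); returning False is
    # a hypothetical shutdown convention for testing.
    while wait_for_sync0():
        host_process()  # read last period's CDBs, compute, build the new HDB
```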
Each Client is also a standalone computer. Each of these is running a
simple sequencer process such as:
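Again the listing is not reproduced here; a minimal hypothetical sketch, in which wait_for_hdb stands in for blocking until this period's Host Data Block has been received and copied to working storage:

```python
def client_sequencer(wait_for_hdb, client_process):
    # wait_for_hdb blocks until this period's Host Data Block has been
    # received; returning False is a hypothetical shutdown convention.
    while wait_for_hdb():
        client_process()  # must finish, with the CDB prepared, before the next HDB
```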
Supporting the above is a set of interrupt service routines to drive the
ethernet communications and to synchronize processing with the primary
clock. These routines are listed below.
In all of the examples given below, it is assumed that the number of Client
processes is four; i.e., NC=4.
The interrupt routines for the Host must be capable of performing the
following:
1. Receive the Primary Tick Interrupt and allow the sequencer to start
its loop. Then, send a broadcast packet called SYNC_0 immediately.
This message alerts all processors of the start of a Primary Time
Period.
2. Receive the SYNC_0 packet and set the Time State (TS) equal to 0.
3. Send a broadcast packet to all listeners. This message contains
the HDB.
4. Receive the HDB packet and then send a broadcast packet called
POLL_1. This signals that Client 1 must send its message (CDB_1), now.
The Time State is set equal to the Client number, i.e., 1.
5. Receive the CDB_1 packet and immediately send the POLL_2 packet.
This signals that Client 2 must send its message now. The Time State
is set equal to the Client number, i.e., 2.
6. Receive the CDB_2 packet and then send a broadcast packet called
POLL_3. This signals that Client 3 must send its message now. The Time
State is set equal to the Client number, i.e., 3.
7. Receive the CDB_3 packet and then send a broadcast packet called
POLL_4. This signals that Client 4 must send its message now. The Time
State is set equal to the Client number, i.e., 4.
8. Receive the CDB_4 packet and then send a broadcast packet called SYNC_5. This signals the end of the messaging sequence for this time period.
9. Receive the SYNC_5 packet and set the Time State (TS) equal to 5.
The interrupt routines for Client_i must be capable of the following:
1. Receive the SYNC_0 packet and set the Time State (TS) equal to 0.
2. Receive the HDB packet. Copy this block immediately to working
storage.
3. Receive the POLL_i packet. Set the Time State = i. If the value of i is equal to the Client number for this process, immediately send a packet to the Host containing this process's CDB_i.
4. Receive the SYNC_5 packet and set the Time State (TS) equal to 5.
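The two interrupt-routine lists above can be exercised with a small simulation (the packet names come from the text; the code structure is hypothetical). Because every packet is sent only in response to the previous one, exactly one station transmits at any moment, which is how the protocol defines away collisions:

```python
NC = 4  # number of Client processes, as in the examples

def host_isr(pkt):
    # Host interrupt handler: maps a received packet to the packet the
    # Host sends next, following steps 4 through 8 of the list above.
    if pkt == "HDB":
        return "POLL_1"
    if pkt.startswith("CDB_"):
        i = int(pkt[4:])
        return f"POLL_{i + 1}" if i < NC else "SYNC_5"
    return None

def client_isr(client_no, pkt):
    # Client interrupt handler: responds with its CDB only to its own poll.
    if pkt == f"POLL_{client_no}":
        return f"CDB_{client_no}"
    return None

def run_one_period():
    bus = ["SYNC_0", "HDB"]  # the Host sends these on the primary tick
    pkt = "HDB"
    while pkt != "SYNC_5":
        nxt = host_isr(pkt)
        if nxt is None:      # a poll: exactly one Client may answer
            replies = [r for c in range(1, NC + 1)
                       if (r := client_isr(c, pkt))]
            assert len(replies) == 1  # polling leaves no room for collisions
            nxt = replies[0]
        bus.append(nxt)
        pkt = nxt
    return bus

print(run_one_period())
```

Running one period produces the fixed sequence SYNC_0, HDB, POLL_1, CDB_1, ..., POLL_4, CDB_4, SYNC_5: eleven packets, never two in flight at once.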
Up until now, no statements were made concerning the internal organization of the HOST and CLIENT processes. Since the HDB and the CDBs are simply contiguous blocks of memory with no content discernible by the messaging software, we need to characterize these blocks only by size. For our proposed system, the Host Data Block is assumed to be 4000 bytes long. Furthermore, we assume that each of the four CLIENT machines produces a CDB which is 1000 bytes in length. It will be seen that the timing calculations are sensitive only to total length; how this length is distributed among the processors is of no significance. We are really assuming that we have 8000 bytes to send in each and every clock period.
The operation of the system is described in the enclosed timing diagram.
This is a plot of activity as a function of time. The activity is
considered to be busy (the resource is in use by this activity) or it is
not. The busy regions are shown as rectangles. The absence of a rectangle
means that this resource is either idle or is being used for other
purposes.
Shown on the timing diagram are the six activities of interest in the
telecommunications portion of this system. These are:
1. The ethernet bus. When a packet is sent by any station a rectangle
on the timing diagram shows that the bus is busy. Any attempt by another station to transmit a packet will result in a collision, thereby generating a garbage packet. Even if the timing diagram shows
an idle state, not every station may transmit because of the finite
propagation delay of the transmission medium. Simply checking for idle
does not offer complete assurance that a new packet can be sent
without the possibility of a collision, unless there is a rule imposed
on the software that gives priority to one process or another to send
at that instant.
2. The Host interrupt service activity either sends a packet or
handles the receipt of a packet that has been addressed to the Host.
All times not shown as busy (by the presence of a rectangle) are
available for the non-interrupt processing performed by the HOST
process, so the time not so marked is the processing time available to
the HOST process. The HOST Process Characteristic Time (PCT) must be
less than the time shown as unmarked on the timing diagram; as long as
the PCT is less than the unmarked time, the system will be stable.
3-6. Each Client interrupt service activity either sends a packet or
handles the receipt of a packet that has been addressed to that
Client. All times not shown as busy (by the presence of a rectangle)
are available for the non-interrupt processing performed by that
CLIENT process, so the time not so marked is the processing time
available to the CLIENT process. Each CLIENT Process Characteristic
Time must be less than the time shown as unmarked on the timing
diagram; as long as the PCT is less than the unmarked time, the system
will be stable.
Only one clock period is shown. It is assumed that all other clock periods
follow precisely the same pattern.
The clock period for this system is 16666 usec. This is the total time
available for each processor. No time may be 'saved' from one time period
to another. Therefore, if we can show that one period is stable, then all
time periods will be stable.
The activities shown as rectangles labeled 'R' represent the execution time
of the software that responds to an interrupt from the ethernet controller
caused by the receipt of a packet addressed to this station. Normally, the
command readying the controller to receive another packet is issued at this
time, or was already issued in a previous time period. No block move of the
data is required because the next read request will point to a different
block of memory. The time to execute this block is given the fixed value of
20 usec. (a conservative estimate).
The activities shown as rectangles labeled 'S' represent the software that
commands the ethernet controller to send a packet of information from the
designated memory location to the designated ethernet address. For the
proposed system, all messages (except perhaps the sending of a CDB by a
Client) are broadcast messages, so each is received by all stations. The
time to execute this computer code is given the fixed value of 20 usec. (a
conservative estimate).
The use of a polling technique is preferable to a round-robin method in
which each process takes its turn after the receipt of a block from a
specified neighbor. The advantage of the positive polling technique is that
if a CDB received by the Host is in error, the Host can immediately re-poll
the Client. The improper receipt of a poll by a Client is indicated by the
lack of a CDB message block within a short, measurable interval of time.
If the HOST senses a transmission error in the sending of an HDB (remember
that the HOST is also listening for broadcast messages), then the Host is
free to send the HDB again immediately. Better still, the HOST should
pause for a brief amount of time to see if any Client sends a negative
acknowledgement of an error in the receipt of the HDB. Chances are that all
stations will see the same error. Each will send (or try to send) a
negative acknowledgement, and a collision is assured. This collision
indicates to the HOST that the HDB was not received correctly, allowing an
immediate retransmission of the HDB. This is essential because of the
timely nature of these data.
For the purposes of this report we will assume that the number of signal
bits required to send the messages is double the number of data bits.
Transmitting 8000*8*2 = 128,000 bits will then require
128,000/100,000,000 = 1.28 msec. This means that under normal
circumstances, the ethernet bus is busy only 1.28/16.66 = 0.0768, or
7.7%, of the time.
To support the ethernet communication, the Host uses 16 R and S blocks. At
a cost of 20 usec. per block, the total time used is 0.320 msec = 320 usec.
The total elapsed time for communication is this number plus the actual
transmission time. This is 0.320+1.280= 1.6 msec.
As the total period of the clock is 16.66 msec., the communication tasks
are accomplished within the first ten percent of the clock period. The
remaining ninety percent of the time is available for retransmission of
important blocks in the unlikely event of an error. It is also available
for optional and non-critical transmissions.
The timing diagram shows the accumulated processor and activity time. For
the proposed system, all of the necessary blocks are transmitted within the
first ten percent of the period. Nevertheless, time states are defined that
are seen by all processes to allow each to monitor the overall operation of
the system. Since the primary clock tick interrupt causes the HOST to send
a SYNC_0 block, the receipt of this block is a clear, well-defined
synchronization signal for the entire system (including the Host), as these
blocks are received by all stations at the same instant (not counting the
transmission medium propagation delay, which is due only to cable lengths
and can be corrected for in software).
Enclosed with this report as Figure 1 and Figure 2 are two timing diagrams
for the proposed system. Figure 1 shows a system which uses a round-robin
protocol in which each processor knows precisely when it may transmit by
the receipt of a block of known type from another station. This order has
been worked out in advance. That is, each processor knows where its
transmission 'fits' in the sequence. Each processor is on its honor to
transmit at the proper time and only at the proper time.
Figure 2 is a timing diagram for a system that uses active polling by the
HOST process. The advantages of this technique outweigh the effect of the
slight amount of additional processing and the slight additional delay in
completing all transmissions, over that seen with the round-robin example.
However, there may be a situation where timing is extremely tight, in which
case the round-robin technique could be used to advantage.
One of the primary purposes of this study was to determine whether or not
Fast Ethernet could be adequate for the interprocessor communication in a
real-time multiple processor system. The answer to this question is
clearly yes. However, it must be realized that we have achieved this level
of performance by imposing on the ethernet communications a set of rules
that prevent system collapse. That is, if no controls were imposed upon the
message sending sequence, there would exist a finite probability that the
system would start into a downward spiral of collisions leading ultimately
to system failure. We choose to impose the rules that prevent this
instability.
The total processing demand (which is most demanding on the Host) is
320 usec out of the 16666 usec period of the clock. A multi-master bus,
even if absolutely perfect, would buy back only 320/16666, or 1.9%, of the
available processor time. The other advantages, such as modularity, far
outweigh this very small penalty that we pay for ethernet communications.
The total elapsed time for messaging is under 10% of the available time.
The use of any other configuration would buy back only this amount of wall
clock time. Normally, this will not cause the delay in processing of data
that would slow the overall real-time performance of the system.
As an added benefit, it is seen that small embedded controllers can be
designed and produced very inexpensively. These controllers would listen to
the ethernet communication and pick out the specific bits of information
that satisfy their needs. Such a listen-only attachment places no burden
whatsoever on the running of the remaining parts of the real-time system.
This means that, for example, an altimeter can be attached to the system by
an ethernet cable and operate without an operating system, yet have the
processor power to properly model the ballistic properties of the device,
freeing the host computer from this computational burden.
The writing of the code for the HOST and CLIENT processes constitutes by
far the greater part of the effort in implementing these systems. No person
involved in these efforts need know which system configuration has been
chosen.
PROGRAM MAIN
C declare the global areas of memory
C declare the local areas of memory
CALL INIT
1: wait for a primary timer tick interrupt
CALL HOST
CALL CLIENT_1
CALL CLIENT_2
...
CALL CLIENT_NC
GO TO 1
END
PROGRAM MAIN
C declare global memory blocks
C declare local memory blocks
CALL INIT
1: wait for the primary clock tick interrupt
send the SYNC_0 and the HDB
CALL HOST
GO TO 1
END
PROGRAM MAIN
C declare global memory blocks
C declare local memory blocks
CALL INIT
2: wait for HDB to be received
prepare to send the CDB upon receipt of a poll
CALL CLIENT
GO TO 2
END