100BaseT Ethernet for Interprocessor Communication in Real-Time Systems

This report was prepared in 1997 by John H. Letcher, Professor of Computer Science of the University of Tulsa and President of Synergistic Consultants Incorporated.

Executive Summary

The primary purpose of this study was to answer the question: is it possible to employ 100BaseT ethernet to provide all of the interprocessor communication in a real-time multiprocessing system built out of distinct modules? A secondary purpose was to state the loss in efficiency that this configuration would produce relative to more elaborate and expensive alternative approaches, such as the use of Symmetric MultiProcessing (SMP) under an operating system such as Windows NT, or the use of multi-master busses interconnecting the processors.

The answer to the question of ethernet practicality is a qualified yes. For stable system operation, contention in the communication protocols must be avoided. Establishing a protocol for messaging (i.e., taking turns, or a slave processor responding only to active polling) all but defines away the possibility of a message packet collision. This allows for the specification of a guaranteed stable system.

The use of ethernet as the mechanism for interprocessor communication degrades each computer by at most a few percentage points relative to the more complex and expensive multi-master busses which could provide the interconnection of the system processors. Even in a system with as few as a half dozen processors, since the overhead imposed by ethernet communication is only a few percent, the ethernet distributed system offers more computational power than a single system with SMP. Furthermore, the extendibility of the distributed system is readily apparent.

Finally, the wall clock time required to carry on ethernet communication (over that of multi-master busses) will provide a delay in processing of at most one period of the system clock (1/60 sec.) over that possible with the more expensive busses. This author does not see any harmful effects of this slight additional delay.

The design specification for a stable real-time system is outlined in this report as a way to prove the practicality of using ethernet techniques for interprocessor communication. The calculated timing diagrams show a very large margin of safety (with regard to timing) that is offered by this proposed system: With a defined communication load of under 10,000 bytes total in both directions, the processor burden is only a few percent of the total time available. Furthermore, the total time required to carry out all communication is less than ten percent of the available time; that is, from the start of any period of the clock, the communication is finished before ten percent of the period has been used. This offers a huge margin for error to correct for almost all interprocessor communications errors.

Introduction

For the purposes of this study, it is taken as a given that there is a need for a computer system that performs a set of real-time tasks, each of which is coded in the form of a procedure or process. These tasks can be organized so that each of the processes is called once and only once during the period of the primary system clock (1/60 sec.). Each process can be characterized by a number (expressed with the dimensions of time) that is the maximum amount of time required to execute the process, no matter what data are presented to it. This time will be called the Processor Characteristic Time (PCT) of the process.

Assuming that the total demand of these tasks (the sum of the Characteristic Times, if run on a single processor) exceeds the time allotted for a single period of the clock (16666 usec.), the question arises: how may multiple processors (computers) be used to make up for the inability of one processor to accomplish the job at hand within the proper time, with complete assurance that all required processing is actually accomplished within the period of the primary clock? Furthermore, there must be assurance that this processing is accomplished by all processors within each and every period of the primary clock.

Although there are many computer configuration possibilities to implement this system, we will choose to examine only three choices, namely 1) the use of multiple processors in a Symmetric Multi-Processing (SMP) mode using an operating system such as Windows NT, 2) the use of a set of independent processors, each with a local memory, all sharing a common multi-master bus and a block of shared memory, and 3) a group of independent processors interconnected only by means of an industry standard networking bus. For this study we will consider the use of 100BaseT ethernet techniques.

Real-Time Systems

Although, in general, real-time computer systems may take many forms, we choose to consider only one special case of such systems: a system comprised of a single process, called the Host, in a master/slave relationship with a set of slave processes called Clients. The operation of this system takes place in regular cycles, each with a period of 1/60 sec. Interprocessor communication takes place in the form of messages, which are blocks of data passed from the local memory of one process to that of another.

Each process must complete all of its calculations and perform all of its expected actions within the allowed time frame, the period of the system clock. The inability to do so constitutes a system failure. We shall also refer to this situation as system instability. A stable system is therefore one that never misses a time deadline and performs all of the actions expected of it (the required tasks) without a failure by any processor in any time period.

All processes must be stable for the system as a whole to be stable.

System Organization

The system shall be organized so that one processor, called the HOST, drives all of the operations of the entire system. The other processors, called CLIENT_i, i=1,...,NC, where NC is the number of client processors, receive a single data block from the Host, perform calculations, take actions appropriate to the process and, at the appropriate time, return a block of information to the Host. Every period of the primary clock stands alone functionally; that is, in each time period each process receives at least one block of data, performs calculations and returns at least one block of data to be used by other processors. During each period of the clock, each process uses exogenous data acquired in this time period only. Of course, each process stores data for future use in its own local memory area to be used for any purpose needed by the process.

Back door communication from Client to Client is not allowed. If a Client needs data, it must acquire it by communication of this need to the Host by the conventional message blocks.

In the design specification process, after the function of each process is decided, it is possible to characterize each process with a Characteristic Time (defined to be T0 for the Host and Ti, i=1,...,NC for the Clients). Again, the Characteristic Time is the maximum amount of time that the processor will use each time the process is executed, no matter what values of the data are presented. It will be possible to show the conditions of stability (meaning the lack of failure) for options 1 and 2, above. However, for option 3, making statements with regard to stability must be qualified by limitations of the timings of the interaction between processors. This is to say that it is possible to organize the timing relationships to induce instability.

The system must be specified and implemented so that there exists no possibility of system failure, unless there is a serious hardware malfunction or a double occurrence of an improbable noise corruption of an interprocessor message.

Fast Ethernet Communications

Fast ethernet is a broadcast technique that employs transmission media and controller cards which are mass produced and very inexpensive. All ethernet conforms to the IEEE 802.3 Standard specification. Earlier versions of ethernet transmitted at the rate of ten million signal bits per second, used inexpensive coaxial cable, and employed Manchester encoding and decoding to achieve proper clock synchronization of the send/receive units. Fast ethernet transmits at the rate of 100 Mbps. Cabling is possible using a variety of methods: two Category 5 unshielded twisted pairs (UTP) or two shielded twisted pairs (STP) (100BaseTX), two optical fibers (100BaseFX), or four Category 3 or Category 5 UTP (100BaseT4). Any of these provides sufficient bandwidth for the interprocessor communications of the proposed system.

The ethernet controllers are intended to operate using the Carrier Sense Multiple Access with Collision Detection (CSMA/CD) technique, in which each message is sent to all listeners equally. Since each sender may transmit whenever it senses that the transmission medium is idle, there is a possibility that two stations will transmit at the same time. This can happen because the propagation time between stations is finite, so two stations may start transmitting at roughly the same time, neither knowing that the other has started transmission. When this occurs, each sending station sends a short jamming signal (to make sure that the rest of the system knows that this is a collision). A 1-persistent algorithm with a random hold-off is then used to determine when to resend the message whose transmission was cut short by the collision. At least two senders are in the same situation; that is, each wants to send. Each sending station remains idle for a random amount of time after the detection of a collision. Hopefully one station will start transmitting long enough before another that the second sees the first station's transmission (and waits), so that another collision is not generated. If the maximum random time is set to too small a value, the probability of a second collision increases, yet if the time is set too long, there is a high probability of missing one of the real-time deadlines, thereby generating a system failure.

Binary exponential hold-off (backoff) techniques do not work well in a system with rigid, relatively short time deadlines. The failure to receive any one of the interprocessor messages within the allotted time slot constitutes system failure. Therefore, contention should not be allowed in the design of a system in which stability (lack of failure) is an absolute requirement.
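To give a sense of scale, the sketch below estimates the worst-case cumulative hold-off that a single frame could suffer under the standard 802.3 truncated binary exponential backoff rules (512-bit-time slot, backoff exponent capped at 10, frame abandoned after 16 attempts). The figures assume 100 Mbps operation and a maximally unlucky sequence of random draws; they are illustrative only.

    #include <stdio.h>

    /* Worst-case cumulative backoff for one frame under the standard 802.3
     * truncated binary exponential backoff rules, assuming 100 Mbps operation.
     * Slot time = 512 bit times = 5.12 usec at 100 Mbps.  After the n-th
     * collision a station waits r slots, r drawn from 0 .. 2^min(n,10) - 1,
     * and the frame is abandoned after 16 attempts.                          */
    int main(void)
    {
        const double slot_usec = 512.0 / 100.0;      /* 5.12 usec per slot */
        double worst_usec = 0.0;
        int n;

        for (n = 1; n <= 15; n++) {                  /* backoffs precede attempts 2..16 */
            int k = (n < 10) ? n : 10;               /* exponent capped at 10           */
            long max_slots = (1L << k) - 1;          /* worst possible random draw      */
            worst_usec += max_slots * slot_usec;
        }
        printf("worst-case cumulative backoff: %.0f usec\n", worst_usec);
        printf("primary clock period:          16666 usec\n");
        return 0;
    }

Even one frame can, in the worst case, be delayed by more than two full clock periods, which is why contention must be designed out of the system rather than merely made unlikely.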

Fortunately, one can impose (by software design) an environment which is positively driven by the Host, so that even with one message corrupted by noise in any time period, this system will remain stable.

The message structure used by ethernet (and Fast Ethernet) (called the MAC frame for the 802.3 protocol) consists of the following fields:

1. Preamble (7 octets of bits of alternating 0's and 1's, used to establish synchronization of the clock of sender and receiver)

2. Start Frame Delimiter (1 octet with the value of 10101011)

3. Destination Address (2 or 6 octets) (2 chosen for this system)

4. Source Address (2 or 6 octets) (2 chosen)

5. LLC Data (>= 0 bits, the message to be sent)

6. Pad (a number of octets added to ensure that the frame has a total transmission time that is at least twice the propagation delay of the longest broadcast path; this ensures proper collision detection operation)

7. Frame Check Sequence (4 octets for error control)

In the contemplated real-time system, the cable lengths are extremely short (in tens of meters, not hundreds) so that a requirement of Pad octets is essentially nonexistent. The point here is that the number of bytes (octets) in each message is therefore 16+. For a long message (>160 bytes), the overhead associated with adding a preamble and postamble to the transmitted parcel is under ten percent of the signal transmission time.
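As a check on the figures above, the short fragment below tallies the per-frame overhead from the field sizes listed (7-octet preamble, 1-octet start delimiter, 2-octet destination and source addresses as chosen for this system, 4-octet frame check sequence) and reports the overhead fraction for message sizes of the kind used later in this report. It is a simple arithmetic sketch, not controller code.

    #include <stdio.h>

    /* Per-frame overhead octets, from the field list above:
     * preamble (7) + start delimiter (1) + destination (2) + source (2) + FCS (4). */
    #define FRAME_OVERHEAD_OCTETS (7 + 1 + 2 + 2 + 4)

    static double overhead_fraction(int payload_octets)
    {
        return (double)FRAME_OVERHEAD_OCTETS /
               (double)(FRAME_OVERHEAD_OCTETS + payload_octets);
    }

    int main(void)
    {
        printf("160-byte message:  %.1f%% overhead\n", 100.0 * overhead_fraction(160));
        printf("1000-byte CDB:     %.1f%% overhead\n", 100.0 * overhead_fraction(1000));
        return 0;
    }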

Earlier ethernet transmission employed Manchester encoding/decoding of the data bits to produce signal bits. Here, the number of signal bits is double the number of data bits. This is done to facilitate the synchronization of the signal sampling clocks in the receiver circuits with that of the sender, sometimes referred to as self-clocking, a term which is misleading. This clock synchronization is accomplished by supplying a guaranteed transition in the middle of the time interval that a data bit is sent.

With Fast Ethernet, it was felt that Manchester encoding was a wasteful use of broadcasting bandwidth. Therefore, other methods have been employed for all of the 100Base techniques. For 100BaseX, a unidirectional data rate of 100 Mbps is achieved by transmitting over a single link (single twisted pair or single optical fiber); 4B/5B-NRZI encoding is employed. 100BaseTX uses two pairs of twisted pair cable; both STP and Category 5 UTP are allowed, and the MLT-3 signaling scheme is used. (Appendix 13A of the Stallings textbook on Data and Computer Communications, pages 451-457, is an excellent reference for the details of the signaling techniques mentioned above.) The details of these techniques will not be presented further here. The importance of all of this is that estimating 16 signal bits for each 8 data bits (one byte) is a very conservative estimate. This provides an upper bound to be used in the timing calculations below.

Operation of Ethernet Controllers

All Ethernet controllers worth discussing operate in a straightforward manner: a finite state machine on the ethernet controller card handles all of the tasks of sending and receiving packets of data. The computer itself need only send a short command to this microcontroller stating that it wishes to send or receive a packet (a block of data), together with the location in memory of the message (where it resides if it is to be sent, or where it is to be placed if it is to be received). In a well designed system each message type would have a template prepared by the computer during initialization. Therefore the actions required of each computer to handle each message take a very short amount of time (under 20 microseconds).

An important fact to realize is that the act of sending, receiving and all other communication tasks are handled by the microcontroller on board the ethernet card. These tasks present essentially no burden whatsoever on the computer and its other tasks. (To be sure, when the microcontroller is moving a message by DMA to or from main memory, the computer and the microcontroller share the data bus for memory transactions; the impact of this on the computer is minimal.)

The operation of an ethernet controller card is a relatively simple matter to effect. Every action (or more properly, requested action) is initiated by copying a simple data block into the internal registers of the ethernet controller. A part of this block is comprised of the command and a memory location of where the packet is to be placed. The destination address must also be specified. This command block may be issued either in an interrupt routine or not, depending upon the situation. Writing a parcel is a matter of telling the controller where the data block is located in memory by giving it a pointer, giving the specification of the destination address (which may be an individual computer or may be designated as a broadcast message intended for all stations), and then writing a command byte with the proper value to tell the controller to act. The time to execute all of the above is on the order of under 10 microseconds for each request. The processor is then free to do whatever it wishes (except that the block of data to be sent may not be touched). When the transmission completes, the controller usually posts an interrupt to alert the computer and to allow it to initiate another write command.

Similarly, to accept a packet of data from the outside world, the computer sends a message to the controller (by loading various I/O registers on the controller card) to offer a space in memory to hold an incoming data packet. Only when a packet with the proper destination address is received is the computer informed by means of an interrupt. Then, the computer is free to issue a command to accept another packet. Many modern controllers allow the posting of a number of requests without waiting for the first request to complete.

The time required by the computer for each of the chain of interrupt responses will be so short and efficiently carried out that each packet sent or received represents under 20 microseconds of time by the computer. The tasks of sending, receiving, encoding, decoding, checking for transmission errors, etc. are handled entirely by the ethernet controller card. These tasks do not provide any burden on the computer except for the messaging time and the slowing of the computer caused by the sharing of the data bus by the DMA movement of the data blocks from the main memory to the data memory on board the ethernet controller card.
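The sketch below shows the general shape of the send and receive requests described above. The register layout, structure and routine names, and the base address are hypothetical, invented only to illustrate the sequence of events; they are not the programming interface of any particular controller (the Intel parts cited in the references define their own command-block formats).

    #include <stdint.h>

    /* Hypothetical memory-mapped command interface of an ethernet controller.
     * The layout, names and base address are illustrative only; a real part
     * defines its own command-block format.                                  */
    struct enet_cmd {
        volatile uint16_t command;     /* CMD_SEND or CMD_RECEIVE                  */
        volatile uint16_t dest_addr;   /* 2-octet destination (0xFFFF = broadcast) */
        volatile uint32_t buffer;      /* physical address of the data block       */
        volatile uint16_t length;      /* number of octets to move                 */
        volatile uint16_t go;          /* a nonzero write starts the action        */
    };

    #define CMD_SEND        1u
    #define CMD_RECEIVE     2u
    #define ADDR_BROADCAST  0xFFFFu
    #define CTRL            ((struct enet_cmd *)0x000D0000)   /* assumed base address */

    /* Ask the controller to transmit a prepared block; returns at once.  The
     * block must not be modified until the send-complete interrupt arrives.  */
    static void enet_send_block(uint16_t dest, uint32_t buf, uint16_t len)
    {
        CTRL->command   = CMD_SEND;
        CTRL->dest_addr = dest;
        CTRL->buffer    = buf;
        CTRL->length    = len;
        CTRL->go        = 1;      /* a handful of register writes: well under 10 usec */
    }

    /* Offer the controller a buffer for the next incoming packet addressed to us. */
    static void enet_post_receive(uint32_t buf, uint16_t len)
    {
        CTRL->command = CMD_RECEIVE;
        CTRL->buffer  = buf;
        CTRL->length  = len;
        CTRL->go      = 1;
    }

In use, INIT would prepare one such command block per message type, so that at run time only the buffer pointer and the go byte need to be written.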

An ethernet controller card is actually two totally independent devices: one to send and one to receive. It is possible for the receive unit to listen to the ethernet bus and accept the packet that is currently being sent by the sender unit of the same controller card. Since the packet is fully formed, along with a Frame Check Sequence, it is possible to positively affirm that the packet was properly sent. With very short (shielded) cables, it is unlikely that a reception error would occur that is not observed by the sender. Therefore, for ordinary operation each controller acts as its own monitor with regard to the correctness of its transmission. In the extremely unlikely event of a reception error which has not been caught by the self-monitoring process, the error can be handled by a simple negative acknowledge message sent after all normal transmission in a period, allowing retransmission of the block; the exception is the Host Data Block described below, whose retransmission is handled immediately because of the importance of these data to each Client. In the examples given below it will be seen that there is ample time for retransmission of a single block in any time period.

When a message block is sent, the probability of occurrence of a single bit error or short burst error is small. If the system is properly implemented with adequate power, shielding, etc., these errors can be modeled as pair-wise stochastically independent events whose interarrival times follow the exponential distribution of a Poisson process. The maximum of this distribution is at zero time, meaning that occurrences of improbable independent events tend to cluster. Nevertheless, for this discussion, the probability of a single transmission error is very small and the probability of two such errors within the same time period is vanishingly small.

Program Organization of a Proposed System

Within the address space of the processes, there exist a set of data structures (blocks of memory) that are used for storage and for communication of information among the processes. These are defined as follows:

1.1 The Host Data Block (HDB). This is global memory if the system employs a single processor or multiple processors sharing a multimaster bus. For the system using ethernet communications, this block is replicated in the local memory of each of the processes (Host, Client_1, Client_2, etc.).

1.2 Data storage for other items required by Host processing which is local to the Host process.

2.1 The Client Data Blocks (CDB_1, CDB_2, etc.). These are global memory if the system employs a single processor or multiple processors sharing a multimaster bus. For the system using ethernet communications, these blocks are replicated in the local memory of each of the processes.

2.2 Data storage for other items required by Client processing which is local to the Client process.

There exists a set of processes:

1. An interrupt routine that responds to the Primary Clock Tick. This occurs every 1/60 sec.

2. An interrupt routine that responds to an interrupt caused by the ethernet controller.

3. The program Sequencer Process, MAIN, that actively drives the calling of each of the non-interrupt processes.

4. The INIT process, an initialization routine to fill each data block of memory with appropriate values at startup or restart.

5. The HOST process. All of the important tasks to be performed by the Host are performed in this single module. This software reads the Client Data Blocks, obtained during the previous time period, performs calculations, performs I/O operations with its local devices and prepares the Host Data Block for sending to the Client modules. A time period for the Host Process starts upon the receipt of the primary timer tick interrupt.

6. The CLIENT_i Processes. All of the tasks to be performed by each Client are carried out by this process or are directly called by this process. I/O operations with the devices local to the process are also performed. This software reads the Host Data Block, performs its work and prepares the Client Data Block for transmission to the Host. A time period for a Client starts immediately after the receipt of the Host Data Block. Each Client must finish its work before the receipt of the next HDB. The Client must have prepared the CDB areas of memory within the allotted time.

Processes 1 and 2 operate under interrupt control. The others do not.

For a single processor system, the program Sequencer is a very simple loop set up in the following manner:

 
             PROGRAM MAIN
        c  declare the global areas of memory
        c  declare the local areas of memory
             CALL INIT
          A: Wait for a Primary Timer Tick interrupt
             CALL HOST
             CALL CLIENT_1
             CALL CLIENT_2
               ...
             CALL CLIENT_NC
             GO TO A
             END
 

Please notice that this simple sequencer defines away the problem of access to the same memory by multiple processes that might occur in multi-processor systems. By positively orchestrating the sending of messages in the distributed systems we will accomplish the same end.

Each of the above processes (that is, each call to the process) is characterized by a Process Characteristic Time. For a single processor system to be stable, the clock period T must be greater than the sum of the Processor Characteristic Times (T0 + T1 + ... + TNC) plus the time spent servicing interrupts.
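Stated as a design-time check, the condition reads as follows. The Characteristic Times used here are placeholders only (this report does not fix them); they are chosen to illustrate the case, taken up below, in which a single processor cannot carry the whole load.

    #include <stdio.h>

    #define PERIOD_USEC 16666L    /* primary clock period, 1/60 sec */
    #define NC          4         /* number of Client processes     */

    int main(void)
    {
        /* Placeholder Characteristic Times in usec: T[0] is the Host, T[1..NC] the Clients. */
        long T[NC + 1] = { 9000, 3000, 3000, 3000, 3000 };
        long interrupt_usec = 400;     /* generous allowance for interrupt service */
        long total = interrupt_usec;
        int i;

        for (i = 0; i <= NC; i++)
            total += T[i];

        printf("total demand %ld usec against a period of %ld usec: %s\n",
               total, PERIOD_USEC,
               (total < PERIOD_USEC) ? "stable on a single processor"
                                     : "the load must be distributed");
        return 0;
    }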

The HOST process must be capable of executing the following activities: using stored data and new data received recently from the Clients, HOST performs calculations to generate blocks of data to be sent to the Clients. During this time, the Host process also performs actions that control peripheral devices. The Host process is called by the sequencer once in each time period. Each Client process copies the block(s) generated by the Host, as above, and performs its tasks, culminating in the preparation of the Client Data Block to be sent to the Host.

The actual time that is spent in the Sequencer (except for intentional waiting for a timer tick) is extremely small (well under one percent of the available processing time) so that this will be ignored.

In a well written system, the time required to service each interrupt (timer and ethernet controller) is short. A very conservative estimate of under 20 usec. per event shall be used in the timing examples below. Since the interrupt routines must share the processor with the code that is not run under interrupt control, any excess time spent in the interrupt routines is taken away from the non-interrupt routines. In the examples given below the fraction of time required to support timers and ethernet controllers with interrupt processing is at most a few percent of the available time. For very rough calculations, the time required by the interrupt routines can be ignored; however, for the timing diagrams given below, each contribution is shown, even though this means the diagrams are not drawn to scale.

System Options

Let us assume that the total execution time (the sum of the Processor Characteristic Times of all processes plus the execution time of servicing interrupts) exceeds the allowed time period, T. We then have no choice other than to use a faster processor or to use multiple processors, specified so that each processor is capable of finishing its work in the allowed time period. The next step is to distribute the processing load over a number of processors. These processors can be interconnected in various ways:

1. Using a local bus type interconnect which is employed in a variety of commercial motherboard designs. The hardware would employ an operating system, such as the SMP processing capability of the Microsoft Windows NT operating system, to distribute the processing load among a number of processors.

2. Using a multi-master bus, such as Multibus-II, to allow sets of computers, each with its own memory, to share a common memory address space. The simplest configuration is where each processor has its own local address space, yet shares with all other processors a common memory area to be used for interprocessor communication. This area of memory will hold the primary copies of the HDB and the CDB blocks. Perhaps, some additional memory is used for interprocessor synchronization.

3. Using a set of totally independent processors, each with its own local memory and each is capable of operating as a stand alone computer. The set of processors share, at most, a common source of power. All interprocessor communication is carried out by the use of an industry standard networking bus, chosen for its acceptance as an IEEE standard, which is supplied by a variety of vendors where all vendors products are mutually compatible with each other.

The choice of IEEE standard networking techniques depends upon a set of complex issues. The two major contenders are ethernet and token ring. It is well known that token ring is stable under heavy loads (a property not shared by ethernet), but it operates at relatively slow rates (4 or 16 Mbps). The higher transmission rate of 100 Mbps offered by Fast Ethernet therefore seems to be the proper choice. However, we must remember that ethernet employs a contention technique. Such a system will collapse under a very heavy load, so for this technique to be used in a real-time system with stern, short time deadlines, we must impose a set of rules with regard to who may transmit and when, in order to prevent the multiple collisions that characterize a system collapse. This failure is not the result of bad programming; rather, the possibility of collapse is implicit in the design of ethernet.

On the single processor machine, the message blocks are areas of global memory. Sending a block consists of nothing more than a processor executing a memory block move. The time required to perform this task (assuming that the memory is fast enough not to require the insertion of wait states) is two clocks of the memory bus for each read and each write of the data object transferred. For an Intel 386 class computer, the data object can be four bytes per transfer. For sending 10,000 bytes every 16666 usec., this is effectively free. So, the use of multi-master busses or local busses for interprocessor communication provides only a trivial overhead using options 1 and 2, above. Therefore, in the simple calculations that follow we will make the assumption that this overhead is free, that is, it takes no additional time. The calculation of the processor time required to perform ethernet communications will be our measure of comparison of the different techniques.
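To put a rough number on "effectively free", assume, purely for illustration, a 33 MHz memory bus with no wait states. Moving 10,000 bytes as 4-byte objects requires 2,500 transfers; at two bus clocks for the read and two for the write of each object, that is 10,000 bus clocks, or about 10,000/33,000,000 = 300 usec. per period, which is under two percent of the 16666 usec. available.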

The software for the HOST and CLIENT processes can be written, easily, so that each may be used in any of the above three configuration choices without any software modification whatsoever. Only the additional software that moves the message blocks from the address space of one processor to another will differ from one configuration to another. The time required to execute the tasks of communication will obviously take away from the amount of time available. Nevertheless, the remaining time in the period may be used by the HOST and CLIENT processes.

The writing of the code for the HOST and CLIENT processes is by far the majority of the effort in the implementation of these systems. No person involved in these efforts need know which system configuration has been chosen, because it makes no difference in how the HOST and CLIENT processes are written. The use of ethernet communications is transparent to most of the systems programmers.

For an ethernet based system, the Host is a standalone computer running the simple sequencer process:

 
                   PROGRAM MAIN
            C   declare global memory blocks
            C   declare local memory blocks
                   CALL INIT
             1: wait for the primary clock tick interrupt
                 send the SYNC_0 and the HDB
                   CALL HOST
                   GO TO 1
                   END
 

Each Client is also a standalone computer. Each of these is running a simple sequencer process such as:

 
                   PROGRAM MAIN
            C   declare global memory blocks
            C   declare local memory blocks
                   CALL INIT
             2: wait for HDB to be received
                 prepare to send the CDB upon receipt of a poll
                   CALL CLIENT
                   GO TO 2
                   END
 

Supporting the above is a set of interrupt service routines to drive the ethernet communications and to synchronize processing with the primary clock. These routines are listed below.

The Interrupt Service Routines

In all of the examples given below, it is assumed that the number of Client processes is four; i.e., NC=4.

The interrupt routines for the Host must be capable of performing the following (a sketch of this handler in C follows the list):

1. Receive the Primary Tick Interrupt and allow the sequencer to start its loop. Then, send a broadcast packet called SYNC_0 immediately. This message alerts all processors of the start of a Primary Time Period.

2. Receive the SYNC_0 packet and set the Time State (TS) equal to 0.

3. Send a broadcast packet to all listeners. This message contains the HDB.

4. Receive the HDB packet and then send a broadcast packet called POLL_1. This signals that Client 1 must send its message (CDB_1), now. The Time State is set equal to the Client number, i.e., 1.

5. Receive the CDB_1 packet and immediately send the POLL_2 packet. This signals that Client 2 must send its message now. The Time State is set equal to the Client number, i.e., 2.

6. Receive the CDB_2 packet and then send a broadcast packet called POLL_3. This signals that Client 3 must send its message now. The Time State is set equal to the Client number, i.e., 3.

7. Receive the CDB_3 packet and then send a broadcast packet called POLL_4. This signals that Client 4 must send its message now. The Time State is set equal to the Client number, i.e., 4.

8. Receive the CDB_4 packet and then send a broadcast packet called SYNC_5. This signals the end of the messaging sequence for this time period.

9. Receive the SYNC_5 packet and set the Time State (TS) equal to 5.
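
A sketch of this handler in C is given below. The packet type codes, the enet_send() helper and the time_state variable are assumptions made for illustration; they are not a vendor interface, and a real implementation would also re-arm the receiver and check the error status reported by the controller.

    /* Sketch of the Host's ethernet and timer interrupt handling, following
     * the nine steps above.  The packet type codes, enet_send() helper and
     * time_state variable are assumptions made for illustration only.        */

    enum pkt { SYNC_0, HDB, POLL_1, POLL_2, POLL_3, POLL_4,
               CDB_1, CDB_2, CDB_3, CDB_4, SYNC_5 };

    extern void enet_send(enum pkt type);  /* hypothetical: broadcast the prepared block */
    extern volatile int time_state;        /* the Time State (TS) seen by all code       */

    /* Called from the ethernet controller's receive interrupt. */
    void host_on_packet(enum pkt type)
    {
        switch (type) {
        case SYNC_0: time_state = 0;                    break;  /* step 2 */
        case HDB:    time_state = 1; enet_send(POLL_1); break;  /* step 4 */
        case CDB_1:  time_state = 2; enet_send(POLL_2); break;  /* step 5 */
        case CDB_2:  time_state = 3; enet_send(POLL_3); break;  /* step 6 */
        case CDB_3:  time_state = 4; enet_send(POLL_4); break;  /* step 7 */
        case CDB_4:  enet_send(SYNC_5);                 break;  /* step 8 */
        case SYNC_5: time_state = 5;                    break;  /* step 9 */
        default:                                        break;
        }
    }

    /* Called from the primary clock tick interrupt (steps 1 and 3):
     * announce the new period, then broadcast the Host Data Block.  */
    void host_on_tick(void)
    {
        enet_send(SYNC_0);
        enet_send(HDB);
    }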

The interrupt routines for Client_i must be capable of the following (again, a sketch in C follows the list):

1. Receive the SYNC_0 packet and set the Time State (TS) equal to 0.

2. Receive the HDB packet. Copy this block immediately to working storage.

3. Receive the POLL_i packet. Set the Time State = i. If the value of i is equal to the Client number for this process, send a packet to the Host containing this process's CDB_i, immediately.

4. Receive the SYNC_5 packet and set the Time State (TS) equal to 5.
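
The Client side is the mirror image, shown in the sketch below using the same illustrative packet type codes as the Host sketch. The helper routines and the my_client_number variable are again assumptions; each Client answers only the poll carrying its own number and otherwise merely tracks the Time State.

    /* Sketch of a Client's ethernet interrupt handling, following the four
     * steps above.  The helpers and my_client_number are assumptions only.   */

    enum pkt { SYNC_0, HDB, POLL_1, POLL_2, POLL_3, POLL_4,
               CDB_1, CDB_2, CDB_3, CDB_4, SYNC_5 };

    extern int  my_client_number;           /* 1..NC, fixed at initialization       */
    extern volatile int time_state;         /* the Time State (TS)                  */
    extern volatile int hdb_received;       /* lets the sequencer leave its wait    */

    extern void copy_hdb_to_working_storage(void);
    extern void enet_send_cdb(void);        /* transmit this Client's CDB to the Host */

    void client_on_packet(enum pkt type)
    {
        switch (type) {
        case SYNC_0:                              /* step 1 */
            time_state = 0;
            break;
        case HDB:                                 /* step 2 */
            copy_hdb_to_working_storage();
            hdb_received = 1;                     /* the sequencer may now CALL CLIENT */
            break;
        case POLL_1: case POLL_2: case POLL_3: case POLL_4:   /* step 3 */
            time_state = 1 + (type - POLL_1);
            if (time_state == my_client_number)
                enet_send_cdb();                  /* answer only our own poll */
            break;
        case SYNC_5:                              /* step 4 */
            time_state = 5;
            break;
        default:
            break;
        }
    }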

A Proposed System

Up until now, no statements have been made concerning the internal organization of the HOST and CLIENT processes. Since the HDB and the CDBs are simply contiguous blocks of memory with no content discernable by the messaging software, we need to characterize these blocks only by size. For our proposed system, the Host Data Block is assumed to be 4000 bytes long. Furthermore, we assume that each of the four CLIENT machines produces a CDB which is 1000 bytes in length. It will be seen that the timing calculations are sensitive only to the total length; how this length is distributed among the processors is of no significance. We are really assuming that we have 8000 bytes to send in each and every clock period.

The operation of the system is described in the enclosed timing diagram. This is a plot of activity as a function of time. The activity is considered to be busy (the resource is in use by this activity) or it is not. The busy regions are shown as rectangles. The absence of a rectangle means that this resource is either idle or is being used for other purposes.

Shown on the timing diagram are the six activities of interest in the telecommunications portion of this system. These are:

1. The ethernet bus. When a packet is sent by any station, a rectangle on the timing diagram shows that the bus is busy. Any attempt to transmit a packet by another station will result in a collision, thereby generating a garbage packet. Even if the timing diagram shows an idle state, not every station may transmit, because of the finite propagation delay of the transmission medium. Simply checking for idle does not offer complete assurance that a new packet can be sent without the possibility of a collision, unless there is a rule imposed on the software that gives priority to one process or another to send at that instant.

2. The Host interrupt service activity is to cause the sending of a packet or the processing to handle the receipt of a packet that has been addressed to it. All other times not shown as busy (by the presence of a rectangle) are available for the non-interrupt processing that is performed by the HOST process, so the time not so marked is the processing time that is available for the HOST process. The HOST Process Characteristic Time must be less than the time shown as unmarked on the timing diagram. That is, just as long as the PCT is less than the unmarked time, the system will be stable.

3-6. The Client interrupt service activity is to cause the sending of a packet or the processing to handle the receipt of a packet that has been addressed to it. All other times not shown as busy (by the presence of a rectangle) are available for the non-interrupt processing that is performed by this CLIENT process, so the time not so marked is the processing time that is available for the CLIENT process. This CLIENT Process Characteristic Time must be less than the time shown as unmarked on the timing diagram, that is just as long as the PCT is less than the unmarked time, the system will be stable.

Only one clock period is shown. It is assumed that all other clock periods follow precisely the same pattern.

The clock period for this system is 16666 usec. This is the total time available for each processor. No time may be 'saved' from one time period to another. Therefore, if we can show that one period is stable, then all time periods will be stable.

The activities shown as rectangles labeled 'R' represent the execution time of the software that responds to an interrupt from the ethernet controller caused by the receipt of a packet addressed to this station. Normally, the command to ready the controller to receive another packet is performed at this time, or was already performed in a previous time period. No block move of the data is required because the next read request will point to a different block of memory. The time to execute this block is given the fixed value of 20 usec. (a conservative estimate).

The activities shown as rectangles labeled 'S' represent the software that sends a command to the ethernet controller to send a packet of information from the designated memory location to the designated ethernet address. For the proposed system, all messages (except perhaps the sending of a CDB by a Client) are broadcast messages, so all of them are received by all stations. The time to execute this computer code is given the fixed value of 20 usec. (a conservative estimate).

The use of a polling technique is preferable to a round-robin method where each process takes its turn after the receipt of a block from a specified neighbor. The advantage to the positive polling technique is that if a CDB is received by the Host and is in error, the Host can immediately re-poll the Client. The improper receipt of a poll by a Client is indicated by the lack of a CDB message block within a short measurable block of time.

If the HOST senses a transmission error in the sending of a HDB (remember that the HOST is also listening for broadcast messages), then the Host is free to send the HDB again, immediately. Better still, the HOST should pause for a brief amount of time to see if any Client sends a negative acknowledge message of an error in the receipt of the HDB. Chances are all stations will see the same error. Each will send (or try to send) a negative acknowledgement. A collision is assured. This collision designates to the HOST that the HDB was not received correctly, allowing an immediate retransmission of the HDB. This is essential because of the timely nature of these data.

For the purposes of this report we will assume that the number of signal bits required to send the messages is double the number of data bits. The transmission of 8000 bytes then requires 8000*8*2 = 128,000 signal bits, or 128,000/100,000,000 = 1.28 msec. This states that under normal circumstances, the ethernet bus is busy only 1.28/16.66 = 0.077, or about 7.7%, of the time.

To support the ethernet communication, the Host executes 16 R and S blocks. At a cost of 20 usec. per block, the total processor time used is 320 usec = 0.32 msec. The total elapsed time for communication is this number plus the actual transmission time, 0.32 + 1.28 = 1.6 msec. As the total period of the clock is 16.66 msec., the communication tasks are accomplished within the first ten percent of the clock period. The remaining ninety percent of the time is available for retransmission of important blocks in the unlikely event of an error. It is also available for optional and non-critical transmissions.
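The fragment below consolidates the arithmetic of the last two paragraphs; the byte counts, block counts and per-block cost are the assumed values of this section, and the elapsed figure is the same conservative serial sum used in the text.

    #include <stdio.h>

    int main(void)
    {
        /* Assumed traffic per period (this section's figures). */
        const double period_usec = 16666.0;
        const double bytes_total = 8000.0;                   /* 4000-byte HDB + 4 x 1000-byte CDB */
        const double signal_bits = bytes_total * 8.0 * 2.0;  /* two signal bits per data bit      */
        const double bus_usec    = signal_bits / 100.0;      /* 100 signal bits per usec          */

        const int    host_blocks = 16;                       /* R and S blocks executed by the Host */
        const double block_usec  = 20.0;                     /* conservative cost per block         */
        const double cpu_usec    = host_blocks * block_usec;

        printf("bus busy        : %5.0f usec (%4.1f%% of the period)\n",
               bus_usec, 100.0 * bus_usec / period_usec);
        printf("Host CPU cost   : %5.0f usec (%4.1f%% of the period)\n",
               cpu_usec, 100.0 * cpu_usec / period_usec);
        printf("elapsed, serial : %5.2f msec\n", (bus_usec + cpu_usec) / 1000.0);
        return 0;
    }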

The timing diagram shows the accumulated processor and activity time. For the proposed system, all of the necessary blocks are transmitted within the first ten percent of the period. Nevertheless, time states are defined that are seen by all processes to allow each to monitor the overall operation of the system. Since the primary clock tick interrupt causes the HOST to send a SYNC_0 block, the receipt of this block is a clear well defined synchronization signal for the entire system (including the Host), as these blocks are received by all at the same instant (not counting the transmission medium propagation delay which can be corrected for in software since it is due to cable lengths, only.)

Enclosed with this report as Figure 1 and Figure 2 are two timing diagrams for the proposed system. Figure 1 shows a system which uses a round-robin protocol in which each processor knows precisely when it may transmit by the receipt of a block of known type from another station. This order has been worked out in advance. That is, each processor knows where its transmission 'fits' in the sequence. Each processor is on its honor to transmit at the proper time and only at the proper time.

Figure 2 is a timing diagram for a system that uses active polling by the HOST process. The advantages of this technique outweigh the effect of the slight amount of additional processing and the slight additional delay in completing all transmissions, compared with the round-robin example. However, there may be situations where timing is extremely tight in which the round-robin technique could be used to advantage.

Conclusions

One of the primary purposes of this study was to determine whether or not Fast Ethernet could be adequate for the interprocessor communication in a real-time multiple processor system. The answer to this question is clearly yes. However, it must be realized that this level of performance is achieved by imposing on the ethernet communications a set of rules to prevent the system collapse that could otherwise occur. That is, if no controls are imposed upon the message sending sequence, then there exists a finite probability that the system will start into the downward spiral of collisions leading ultimately to system failure. We choose to impose rules that prevent this instability.

The total processing demand (which is the most demanding on the Host) is 320 usec out of a total of 16666 usec, the period of the clock. A multimaster bus, if absolutely perfect, would buy back only 320/16666 or 1.9% of the available processor time. The other advantages of modularity etc. far outweigh this very small penalty that we pay for ethernet communications.

The total elapsed time for messaging is under 10% of the available time. The use of any other configuration would buy back only this amount of wall clock time. Normally, this will not cause the delay in processing of data that would slow the overall real-time performance of the system.

As an added benefit, it is seen that small embedded controllers can be designed and produced very inexpensively. These controllers would listen to the ethernet communication and pick out the specific bits of information that satisfy the needs of that controller. This listen-only attachment offers no burden whatsoever to the running of the remaining parts of the real-time system. This means that, for example, an altimeter can be attached to the system by an ethernet cable and operate without an operating system, yet have the processor power to properly model the ballistic properties of the device, freeing the host computer from this computational burden.

The writing of the code for the HOST and CLIENT processes is by far the majority of the effort in the implementation of these systems. No person involved in these efforts need know which system configuration has been chosen.

References

Stallings, William. Data and Computer Communications, Fifth Edition. Prentice Hall, 1997. ISBN 0-02-415425-3.

Intel Corporation. S/W Specification: Internal Specification, Intel Ethernet Controller Card, Revision 3.2x. April 30, 1992.

Intel Corporation. The 82586 Local Area Network Coprocessor, Local Area Networks. October 1989.