
COMPILER FILTERS

Introduction

All serious computer programmers use more than one computer language. Different languages have features that cause us to favor one language over another, depending, of course, on the task at hand. Sometimes, to complete a programming task and produce software that performs according to a specification, the use of one particular language is mandatory. Yet other routines, necessary for the current task, are available but are written in another language.

For this and a number of other reasons, it is desirable for the programmer to be able to merge sets of object modules (.obj) produced by different compilers into a single load module (.exe). This is possible only if the compilers (the programs that translate the source modules [.for, .c, .cpp, etc.] into object modules) write their object modules in a mutually compatible format, or in a set of formats that can all be read correctly by the linker (the program that reads the sets of object modules and produces the executable load module).

Compilers from different vendors are seldom mutually compatible.

Unfortunately, this incompatibility goes much further than the different object file formats. For compilers to be truly compatible, subroutine calls must be made in precisely the same way. This is just one example.

This author uses Fortran, C, and C++ languages on a daily basis. When a source module is created in one language, it seems foolish to have to translate (or transliterate) it into another language, just for compatibility reasons. So, we invariably end up with sets of routines written in different languages. We want to use all of these sets of routines and we want to use them together.

Also, in using multiple languages, we do not want to change the way we write in any of the languages.

Achieving this compatibility is quite a task. For this to be possible, we must choose the compilers, linker(s), library generators, and operating system(s) carefully. Then, a set of programs, called compiler filters, can be written to rewrite the output of each compiler to achieve this mutual compatibility. This document describes what has been done to make our compilers "play together".

In order for the remainder of this document to be understood, it is assumed that the reader is totally familiar with the use of X86 assembly language. It is not the intention that this document give this background information.

The compiler filter software described in this document was designed and implemented by John H. Letcher, Professor of Computer Science at the University of Tulsa and President of Synergistic Consultants Incorporated.

Choice of Vendor Supplied Software

On systems running Microsoft Windows NT (3.51, 4.0 and 2000), this author has chosen to use the Fortran-77 compiler from Microway. For the C and C++ languages, the Visual C/C++ 5.0 compiler has been chosen. The assembler chosen is the Microsoft Macro Assembler, MASM 6.0. The linker and library generator are supplied by Microsoft.

For a variety of reasons, it is also desirable to be able to translate Fortran source modules directly into the C programming language. For this purpose, this author uses the Fortran to C translation program Promula.Fortran (PFC). This generates a homogeneous C environment for porting software to Linux (Unix, Solaris, etc.). Yet, we may still write in Fortran, if we wish.

Compiler Filter Philosophy

To achieve mutual compatibility, it was chosen to instruct each compiler to output assembly language. Then, a single assembler could be used to generate the object modules. Since only one program produces the objects, there is clearly no compatibility problem with regard to object file formats. However, serious problems still remain. The problem here is that the assembly code generated by one compiler will not "play together" with the assembly code produced by another compiler.

Now, our task is to produce a set of programs that each read an assembly language source module produced by one compiler and translate the module into assembly language conforming to a fixed (common) set of specifications. Then, each module will be compatible with every other. Furthermore, the linker will not even know which compiler has produced the module.

Assembly Language Conventions

A number of choices must be made in the design of the compiler filters. Of great importance is the choice of how subroutines are called. That is, should the stack be used to pass subroutine argument locations? All of the really fast Fortran compilers of thirty years ago (e.g., from Cray) did not use stacks. These techniques were abandoned, not because stacks make compilers easier to write (which they do), but because the older techniques did not allow recursion. Since C flatly requires recursion (which is good), C passes subroutine arguments by pushing them onto a stack. But now we have to ask: should the arguments be pushed in order from right to left or from left to right? Also, does the called routine or the calling routine pop the arguments off the stack? Of the four possible answers to these two questions, Microsoft has used almost all of the options.

To simplify life, it was decided to adopt, as the common standard, the format and structure used by the Visual C/C++ 5.0 compiler when it writes assembly language. This saved writing one compiler filter.

The Filter Program, NDPPREP.EXE

This filter, NDPPREP.EXE, translates the assembly language output of the Microway Fortran compiler into a compatible assembly language format.

The Filter Programs, PFCPREP.EXE and MAKELONG.EXE

The filter program, PFCPREP.EXE, translates the C language output of the Promula.Fortran System into a format compatible with Microsoft C language conventions.

The Filter Program, DLLPREP.EXE

Once routines are written in Fortran, C or C++, we would like to be able to call these routines directly from Visual Basic and Powerbuilder. This filter produces dynamic link libraries (.dll) and source code to be included in the Visual Basic and Powerbuilder source modules. Then the Fortran or C routines may be called directly by Visual Basic and Powerbuilder programs.

The Structure of an Assembly Language Module

Consider the Fortran source module:


      SUBROUTINE MAINSUB
      COMMON/LOOK/I,J,K
      I=5
      J=7
      CALL MYSUB(I,J,K)
      RETURN
      END

      SUBROUTINE MYSUB(I,J,K)
      K=I+J
      RETURN
      END

The assembly language module produced by the Microway Fortran compiler from the above Fortran source is:


; NDP Version 4.6.0 -- 03/18/95
; fcom  -OLMA -X22 -X37 -X171 -X210 -X214 -X215 -X226 -X244 -X247 -X266 -X325
;  -X334 -X335 -X357 -X358 -X382 -X474 -X592 -X682 -X683 -X899 -X908
;  -X925 -X928 -X929 -X939 -X1002 -X1006 -X1010 -X1011 -X1016
; name mysub.for
 .386
 .387
 assume cs:codeseg
 assume ds:dataseg
codeseg segment para use32 public 'code'
codeseg ends
dataseg segment para use32 public 'data'
 extrn __vms_fortran:dword
dataseg ends
codeseg segment use32 para public 'code'
_mainsub_ proc near
;     .bf
 mov dword ptr ds:_look_,5      ; 00000000
 nop
 mov dword ptr ds:_look_+4,7    ; 0000000b
 lea eax,dword ptr ds:_look_+8  ; 00000015
 push eax                       ; 0000001b
 lea eax,dword ptr ds:_look_+4  ; 0000001c
 push eax                       ; 00000022
 push offset ds:_look_          ; 00000023
 add cl,0
 call _mysub_                   ; 00000028
 add esp,12                     ; 00000030
;     .ef
 ret                            ; 00000033
_mainsub_ endp
codeseg ends
dataseg segment para use32 public 'data'
dataseg ends
codeseg segment use32 para public 'code'
_mysub_ proc near
 push ebx                       ; 00000034
;     .bf
 mov eax,dword ptr [esp]+8      ; 00000035
 mov ecx,dword ptr [esp]+12     ; 00000039
 mov ebx,dword ptr [esp]+16     ; 0000003d
 mov eax,dword ptr [eax]        ; 00000041
 add eax,dword ptr [ecx]        ; 00000043
 mov dword ptr [ebx],eax        ; 00000045
;     .ef
 pop ebx                        ; 00000047
 ret                            ; 00000048
_mysub_ endp
codeseg ends
dataseg segment para use32 public 'data'
;_i eax local
;_j ecx local
;_k ebx local
;_i [esp]+8 local
;_j [esp]+12 local
;_k [esp]+16 local
dataseg ends
codeseg segment use32 para public 'code'
codeseg ends
dataseg segment para use32 public 'data'
 public _mysub_
 @12 struct 1t
 x db 12 dup (?)
 @12 ends
 externdef _look_:@12
 public _mainsub_
dataseg ends
codeseg segment use32 para public 'code'
codeseg ends
 end

The filter program NDPPREP.EXE reads the above file and produces this:


 .386
 .387
 assume cs:_TEXT
 assume ds:_DATA

_TEXT segment para use32 public 'code'

_mainsub proc near

 mov dword ptr ds:_look,5
 nop
 mov dword ptr ds:_look+4,7
 lea eax,dword ptr ds:_look+8
 push eax
 lea eax,dword ptr ds:_look+4
 push eax
 push offset ds:_look
 add cl,0
 call _mysub
 add esp,12

 ret
_mainsub endp

_mysub proc near
 push ebx

 mov eax,dword ptr [esp]+8
 mov ecx,dword ptr [esp]+12
 mov ebx,dword ptr [esp]+16
 mov eax,dword ptr [eax]
 add eax,dword ptr [ecx]
 mov dword ptr [ebx],eax

 pop ebx
 ret
_mysub endp
_TEXT ends

_DATA segment para use32 public 'data'
 public _mysub
 @12 struct 1t
 x db 12 dup (?)
 @12 ends
 public _mainsub

COMM _look   :BYTE:12

_DATA ends
 end

Please notice all of the differences between the two assembly language source modules.
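Three of the differences are easy to state: the Microway names lose their trailing underscore (_mysub_ becomes _mysub), the segments codeseg and dataseg are renamed _TEXT and _DATA, and the trailing comments are dropped. The source of NDPPREP.EXE is not reproduced in this document, so the following C fragment is only a sketch of how such a line-by-line rewrite could be done. The function names (filter_line, rewrite_symbol, is_symbol_char) and the set of characters allowed in a symbol are assumptions made for illustration, and the sketch does not attempt the other changes (merging the segments, or replacing externdef with COMM).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters that may appear inside an assembler symbol (assumed set). */
static int is_symbol_char(int c)
{
    return isalnum(c) || c == '_' || c == '@' || c == '$' || c == '?';
}

/* Copy one symbol of length len to out, applying the renaming rules seen
   in the two listings. Returns the number of characters written, which is
   never more than len. */
static size_t rewrite_symbol(const char *sym, size_t len, char *out)
{
    if (len == 7 && memcmp(sym, "codeseg", 7) == 0) {
        memcpy(out, "_TEXT", 5);
        return 5;
    }
    if (len == 7 && memcmp(sym, "dataseg", 7) == 0) {
        memcpy(out, "_DATA", 5);
        return 5;
    }
    /* _name_ -> _name : drop only the trailing underscore */
    if (len > 2 && sym[0] == '_' && sym[len - 1] == '_')
        len--;
    memcpy(out, sym, len);
    return len;
}

/* Rewrite one line of Microway-style assembly into out. The output is never
   longer than the input, so out_size >= strlen(in) + 1 is always enough. */
void filter_line(const char *in, char *out, size_t out_size)
{
    size_t n = strlen(in);
    size_t i = 0;   /* read position in the input line */
    size_t o = 0;   /* write position in the output line */

    if (out_size == 0)
        return;

    /* Stop at a comment or at the end of the line. */
    while (i < n && in[i] != ';' && in[i] != '\n') {
        if (is_symbol_char((unsigned char)in[i])) {
            size_t start = i;
            while (i < n && is_symbol_char((unsigned char)in[i]))
                i++;
            if (o + (i - start) >= out_size)
                break;                      /* output buffer too small */
            o += rewrite_symbol(in + start, i - start, out + o);
        } else {
            if (o + 1 >= out_size)
                break;                      /* output buffer too small */
            out[o++] = in[i++];
        }
    }

    /* Remove the blanks that stood in front of a stripped comment. */
    while (o > 0 && (out[o - 1] == ' ' || out[o - 1] == '\t'))
        o--;
    out[o] = '\0';
}
```

Given the line " call _mysub_" followed by its offset comment, filter_line writes " call _mysub"; a name such as __vms_fortran, which has no trailing underscore, passes through unchanged.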

The assembly language file starts with the directives .386 and .387 to tell the assembler that we wish to use the full 32-bit X86 instruction set. Two segments are defined, a code segment and a data segment. The code segment is supposed to be invariant under code execution. A stack segment is used but it need not be named within this module.

When a subroutine is called, the subroutine argument locations are pushed onto the stack in right-to-left order: in the listing above, the location of K is pushed first and the location of I last. Then the subroutine is entered by the call instruction. Immediately following the call instruction, the stack pointer ESP is adjusted (add esp,12) to remove the argument location pointers that had been pushed before the call. In other words, the calling program removes the argument locations from the stack.
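Because it is the locations of the arguments that are pushed, a C routine that stands in for MYSUB, or that calls it, sees three pointers. The fragment below is a sketch of that view, assuming that a Fortran INTEGER and a C int are both four bytes (consistent with the dword offsets in the listing). The helper add_via_mysub is hypothetical and exists only to show a call.

```c
/* C view of SUBROUTINE MYSUB(I,J,K): each argument arrives as an address,
   so the result is returned by storing through the third pointer. */
void mysub(int *i, int *j, int *k)
{
    *k = *i + *j;
}

/* Hypothetical caller: passes the locations of its own variables, just as
   MAINSUB passes the locations of I, J and K. */
int add_via_mysub(int a, int b)
{
    int result = 0;
    mysub(&a, &b, &result);
    return result;
}
```

Calling add_via_mysub(5, 7) mirrors the Fortran example, where I=5 and J=7 give K=12.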

A convention has been adopted that the value in EBX is pushed onto the stack immediately on entry and popped off just before the return, ret. This is used when the code is not terribly complicated.

At the time of execution of the first instruction after the push ebx instruction, the stack looks like this:


Offset Value
  0    The value in EBX
  4    Return Address
  8    Subroutine Argument #1 Location
 12    Subroutine Argument #2 Location
 16    Subroutine Argument #3 Location
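The table can be reproduced with a toy model of the stack. The C fragment below is a simulation only, not real stack manipulation: the structure toy_stack, the toy_ function names, and the values passed in as addresses are all invented for illustration. It replays the pushes made in the listing and reads the slots back the way mov eax,dword ptr [esp]+8 does.

```c
#include <stdint.h>

#define STACK_SLOTS 16

/* A toy model of the 32-bit stack: esp is a byte offset into mem, and the
   stack grows toward lower addresses, as it does on the X86. */
struct toy_stack {
    uint32_t mem[STACK_SLOTS];
    uint32_t esp;               /* byte offset of the current top of stack */
};

void toy_init(struct toy_stack *s)
{
    s->esp = STACK_SLOTS * 4;   /* empty: esp points just past the array */
}

void toy_push(struct toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Read the dword at [esp]+offset. */
uint32_t toy_peek(const struct toy_stack *s, uint32_t offset)
{
    return s->mem[(s->esp + offset) / 4];
}

/* Replay the sequence from the listing: the caller pushes the locations of
   K, J and I, the call instruction pushes the return address, and the
   called routine pushes EBX. */
void toy_enter_mysub(struct toy_stack *s, uint32_t addr_i, uint32_t addr_j,
                     uint32_t addr_k, uint32_t ret_addr, uint32_t ebx)
{
    toy_push(s, addr_k);
    toy_push(s, addr_j);
    toy_push(s, addr_i);
    toy_push(s, ret_addr);
    toy_push(s, ebx);
}

/* Undo the frame in the same order as the listing. */
void toy_leave_mysub(struct toy_stack *s)
{
    s->esp += 4;    /* pop ebx */
    s->esp += 4;    /* ret removes the return address */
    s->esp += 12;   /* the caller's add esp,12 removes the three locations */
}
```

After toy_enter_mysub, toy_peek at offsets 0, 4, 8, 12 and 16 returns the saved EBX, the return address, and the locations of arguments #1, #2 and #3, in that order. After toy_leave_mysub, esp is back where it started.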

If, however, the code needs the full use of the registers, normally as pointers, the push ebx is replaced with


     mov ebp,esp
     push esi
     push edi
     push ebx

Upon exit from the routine, the stack must be returned to its original state.

Filter Batch Files

Listed below is a set of batch files, each to be run from the NT Command Prompt. Notice that the files are named NT or ND followed immediately by A (for assembler), F (for Fortran), C (for C) or CPP (for C++). Then we have the letters TO followed by the designation of the purpose of the batch file. For example, NTATOOBJ converts an assembly file into an object module.

NTNEWLIB.BAT (create a new Library using NT.obj and PLIB.obj)

     DEL FLIBSCI.LIB
     LIB /OUT:FLIBSCI.LIB NT.OBJ
     LIB FLIBSCI.LIB PLIB.OBJ

NTLINK.BAT (link mainline program with Library producing %.EXE)

     LINK %1.OBJ FLIBSCI.LIB LIBC.LIB /NOLOGO

NTATOOBJ.BAT (assemble %.asm to %.obj using MASM 6.11)

     ML /c %1.ASM /nologo

NTATOLIB.BAT (assemble %.asm into %.obj and place in the Library)

     ML /c %1.ASM /nologo
     LIB FLIBSCI.LIB %1.obj /nologo

NTCTOASM.BAT (compile %.c into %.asm)

     CL -c -G6 /Ox -nologo -Fa%1.ASM %1.c

NTCTOOBJ.BAT (compile %.c into %.obj using the Visual C++ Compiler)

     CL -c -G6 /Ox -nologo -Fo%1.obj %1.c

NTCTOLIB.BAT (compile %.c into %.obj and place in Library)

     CL -c -G6 /Ox -nologo -Fo%1.obj %1.c
     LIB FLIBSCI.LIB %1.obj /nologo

NTCTODLL.BAT (compile %.c into %.obj and create %.dll)

     DLLPREP %1
     CL -c -Gz -G6 /Ox -nologo -Fa%1.ASM %1.c
     ML /c /nologo %1.ASM
     LINK %1.OBJ LIBC.LIB /DEF:%1.DEF /DLL /nologo

NTFTOASM.BAT (translate %.for into %.c then compile %.c into %.asm)

     PFC %1 BO
     PFCPREP %1
     CL -c -G6 /Ox -nologo -Fa%1.ASM %1.c

NTFTOOBJ.BAT (translate %.for into %.c then compile %.c into %.obj)

     PFC %1 BO
     PFCPREP %1
     CL -c -G6 /Ox -nologo -Fo%1.obj %1.c

NTFTOLIB.BAT (translate %.for into %.c, compile into %.obj and place in Library)

     PFC %1 BO
     PFCPREP %1
     CL -c -G6 /Ox -nologo -Fo%1.obj %1.c
     LIB FLIBSCI.LIB %1.obj /nologo

NTFTODLL.BAT (translate %.for into %.c, compile into %.obj, create %.dll)

     PFC %1 BO
     PFCPREP %1
     DLLPREP %1
     CL -c -Gz -G6 /Ox -nologo -Fa%1.ASM %1.c
     ML /c /nologo %1.ASM
     LINK %1.OBJ LIBC.LIB /DEF:%1.DEF /DLL /nologo

NDFTOASM.BAT (compile %.for into %.asm using the Microway F-77 Compiler)

     SET NDP=.
     SET BIN=.
     SET LIB=.
     SET INCLUDE=.
     MFIW -S -x928 -on %1.FOR
     NDPPREP %1

NDFTOOBJ.BAT (compile %.for into %.obj using the Microway F-77 Compiler)

     SET NDP=.
     SET BIN=.
     SET LIB=.
     SET INCLUDE=.
     MFIW -S -x928 -on %1.FOR
     NDPPREP %1
     ML /c /nologo %1.ASM

NDFTOLIB.BAT (compile %.for into %.obj and place it in the library)

     SET NDP=.
     SET BIN=.
     SET LIB=.
     SET INCLUDE=.
     MFIW -S -x928 -on %1.FOR
     NDPPREP %1
     ML /c /nologo %1.ASM
     LIB FLIBSCI.LIB %1.obj /nologo

Other Considerations

One nasty problem remained: how to handle memory allocation. The C language allows three kinds of storage: 1) local to an individual routine, 2) global to this source module only, and 3) global storage, known to all. Local variable values have a nasty habit of going away after leaving the routine, as they are normally placed on the stack.
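The three kinds of C storage can be shown in a few lines. This is a generic illustration, not code taken from the filters, and every name in it is invented. It also shows the usual C remedy for a local value that must survive between calls: declaring it static, so that it is placed in memory rather than on the stack.

```c
int shared_total = 0;            /* 3) global storage, known to all modules */
static int module_calls = 0;     /* 2) global to this source module only */

/* 1) local storage: scratch exists only while double_it is running. */
int double_it(int value)
{
    int scratch = value * 2;
    module_calls++;
    shared_total += value;
    return scratch;
}

/* A static local keeps its value from one call to the next. */
int running_sum(int value)
{
    static int kept = 0;
    kept += value;
    return kept;
}

/* module_calls is invisible outside this file, so expose it by a function. */
int calls_so_far(void)
{
    return module_calls;
}
```

Two successive calls, running_sum(3) and then running_sum(4), return 3 and 7, because kept is not lost when the routine returns.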

The Fortran language allows for 1a) local variables on the stack, 1b) local variables placed in memory, and 2 and 3) global storage, which is set up in named blocks of storage (COMMON blocks), accessible by anyone who knows and uses the name. Storage is allocated as an extern block of type char.
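For the COMMON block LOOK of the earlier example, which the filtered assembly declares as twelve bytes (COMM _look :BYTE:12), the C side might look like the sketch below. It assumes four-byte INTEGERs at byte offsets 0, 4 and 8, as in the listing. The accessor names look_get and look_set are hypothetical. In a real multi-module build the block would be declared extern here and defined once elsewhere; it is defined in place only so that the fragment stands alone.

```c
#include <string.h>

/* Stand-in for COMMON /LOOK/ I,J,K: a block of 12 bytes of type char. */
char look[12];

/* Byte offsets of I, J and K inside the block (assumes 4-byte INTEGERs). */
enum { LOOK_I = 0, LOOK_J = 4, LOOK_K = 8 };

/* Read the integer stored at the given byte offset. memcpy is used so that
   the char block is never accessed through an int pointer. */
int look_get(int offset)
{
    int value;
    memcpy(&value, look + offset, sizeof value);
    return value;
}

/* Store an integer at the given byte offset. */
void look_set(int offset, int value)
{
    memcpy(look + offset, &value, sizeof value);
}
```

With these, look_set(LOOK_I, 5) and look_set(LOOK_J, 7) followed by look_set(LOOK_K, look_get(LOOK_I) + look_get(LOOK_J)) leave 12 in the slot for K, which is what MAINSUB computes.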