
Saturday, November 23, 2013

STAGES OF COMPILATION.............ARTICLE 12

GENERAL COMPILER

(You are free to skip this paragraph if you already know the general compilation process.)
The picture above shows the steps involved in compilation. A typical C compiler toolchain follows these steps. The first step takes the file containing valid C syntax and performs an operation called pre-processing. The output of that step is taken by a tool called the compiler, which generates assembly code. The assembly code is read by the assembler, which converts it into object code. Finally the linker steps in, makes the necessary modifications, and converts the object code into an executable binary file (an .exe file on Windows). That is a very brief walk through the stages of compilation. These are the steps involved in converting the high-level language written by the user into machine instructions, or into whatever target code the compiler produces.

GCC COMPILER:

Let's see how the GCC compiler works and the stages involved in its compilation.

From the picture you will have a pretty good idea that compilers are all pretty much similar and work in a similar fashion. Our objective is to understand in detail how GCC works, what goes into it, and what components it uses. Because GCC, like Linux, is open source, we can see each and every detail, unlike proprietary software such as Turbo C, which hides all these steps.

Whenever you present a '.c' file to compile and build an executable, GCC carries out the compilation in several steps. When I use the term compiler here, I am not referring to a single tool but to a collection of tools. The compiler distribution called GCC is a set of tools.

Let's take a simple C program as an example: a program which adds two numbers.
 

Step 1: 

             
The first stage of compilation in GCC (and in most other C compilers) begins when we give the .c file to GCC: it first invokes a tool called the pre-processor. The pre-processor looks through the code and identifies all the lines which start with '#'.

Directives that begin with '#', such as #include, #define, #ifdef and #ifndef, are resolved by the pre-processor. It does not compile anything; it only rewrites the text of the program. An #include line is replaced by the contents of the named header file, a #define records a macro whose uses are then expanded throughout the code, and #ifdef and #ifndef keep or drop blocks of code. Comments are also stripped at this stage; apart from that and macro expansion, the rest of the source code is left untouched. Header files generally contain function prototypes, and it is these prototypes that get pasted in place of the #include lines during the pre-processing phase. The significance of this step will become clearer as you progress through my future posts.

Step 2:

In the second stage of compilation GCC uses a tool called the compiler. Its input is the output generated by the pre-processor in the first stage. The compiler takes the pre-processed code and converts it into assembly instructions. It does not generate machine code straight away. It generates assembly instructions for the architecture we are compiling the code for, because assembly instructions are architecture-specific: different architectures such as x86 and ARM each use and understand only their own assembly code. Below is a screen shot of assembly code in case you are not familiar with assembly.
  
                        
  These are the assembly instructions which my GCC compiler generated for my x86 architecture.


Don't get confused looking at all that code. It is pretty confusing for a beginner. As you progress through the process of learning you will slowly understand it. The %eax, %edx, %ecx, etc. that you are seeing are in fact registers, which the processor uses to store values.

These instructions are executed by the processor, which uses the stack to hold local variables and return addresses.


Note:-
The terms compilation and compiler mean different things. The compiler is a tool, while compilation is the whole process required to convert a C program into an executable.








Step 3:

In the third stage of compilation we use a tool called the assembler. This tool takes the output generated by the compiler as its input: it takes in the assembly code and converts it into machine instructions for the CPU. The processor understands only machine instructions, and it is the job of the assembler to generate them. The output generated by the assembler is the object code.
                                         
Machine instructions are generally rather difficult to read, but if you want to see what they look like, I have placed a screen shot below.


What you are seeing above is not exactly the way it is stored. What I mean is that the push, pop, add, inc, etc. mnemonics you see in the listing are not present in the original object code. The tool called "objdump", which I used to open this object file (it is in ELF format), adds this extra text purely for the user's understanding. The actual machine instructions are the hex bytes shown next to each mnemonic. In case that was not clear, just look at the screen shots below.
These are the machine instructions which I mentioned above. Now you must be wondering why the machine instructions are not in 1's and 0's.

We have learned that the processor does not understand any format except 0's and 1's. That's true, but the objdump tool which I used to open the object file displays the machine instructions in hex format so that they are easier for us to read. Each hex digit stands for four bits.
 The numbers you are seeing are the opcodes, or operation codes, along with their operands, which the processor understands. The processor cannot understand human language, so opcodes are used. For example, in 32-bit x86 code the byte 58 means pop %eax, while on another processor the same number could mean add, subtract, multiply or some other operation. It all varies between processors. Check your processor's manual for its specific opcode instruction set.



The instructions you are seeing in the screen shot are also added by the objdump tool to give the user a better understanding of the machine instructions.
Let's get back to the stages of compilation.









 Step 4:

In this step a tool called the linker steps in, takes the object code, and converts it into a binary executable file. It does much more than just create an executable file, but we will see all of that in future posts.

The above steps are the same for essentially all the C compilers on the market. Proprietary compilers like Turbo C, which unlike GCC are not free, do not show all these steps. That's the beauty of Linux.
 
