
Thursday, November 28, 2013

UNDERSTAND PRE-PROCESSOR.............ARTICLE 13

  PRE PROCESSOR

We have already seen how to compile a program with gcc and looked at each step involved in the process of compilation. gcc is a compiler tool chain, i.e. a collection of tools, and the gcc command acts as a wrapper around all of them. In the process of compilation we came across the four steps gcc takes to turn a program into an executable binary file (refer to my earlier article, Stages of Compilation).

The first phase of the compilation process is the pre-processing phase, where the given input program is pre-processed to generate an output which is then taken as input by the other tools involved in compilation. Let's now study the pre-processor and learn how it processes the .c file.

WHAT IS MEANT BY PRE-PROCESSING???

Pre-processing can generally be described as a process in which we prepare something for a specific application so that it produces a better outcome. In computer science, pre-processing is a step where we take input data or a program, make certain modifications to it, and produce an output which will be used by another application.

PRE-PROCESSING IN GCC

The pre-processing phase is the first step in the stages of compilation. The pre-processor processes all the " # " directives in the program, such as #include, #define, #ifdef and #endif, and replaces them with the code they stand for. This is the general functionality of the pre-processor.

 TOOL USED FOR PRE-PROCESSING

The tool gcc uses to perform pre-processing is called the pre-processor. When we give a program to gcc, it normally converts the program straight into a binary executable, but we can also instruct gcc to stop after the first phase of compilation.

Consider a C program called add.c. It is a very simple program for adding two numbers.
The program takes two numbers a and b, adds the values and stores the result in the variable c. It has only a single print statement. To compile this program normally we use the command gcc <<filename.c>> and an executable binary file a.out is created.

Let us perform only the first stage of compilation by instructing gcc to do only the pre-processing. gcc then invokes the pre-processor and stops. To do this, simply type the following command:

command:   gcc <<filename.c>>   -E  -o  <<outputfilename.i>>

Here gcc is given the special flag " -E " after the filename. This flag instructs gcc to run only the pre-processor and then stop. The -o flag names the output file; by convention, pre-processed C files use the " .i " extension. Without -o, the pre-processed output is printed to the terminal instead.

Let's pre-process the code for adding the two numbers.

Here I pre-processed the file add.c and sent the output to a file called add1.i.
If you look carefully at the screen-shot above, you can see that when I ran the ls command it displayed a file called add1.i in green font, which is the pre-processed file.
There is a command called " file " which takes a file name as input and displays the file type as output.
Let's check the file type of add1.i.

Here it shows that add1.i is an ASCII text file, which means it can be opened with a regular text editor.
Let's open the pre-processed file add1.i using the vim text editor. The command to open add1.i is
vim add1.i



The above screen-shot displays only a small part of the entire file. I have scrolled down about 19% of the file, as shown at the bottom right of the screen-shot. Check out the video below.

In this video I show the entire pre-processed output of the add.c program. The pre-processor processed all the " # " directives; in this case, that means the #include <stdio.h> line. At the end of the clip I show clearly that only the # directives were processed and that the rest of the code is completely untouched by the pre-processor.




 Hence we can conclude that the pre-processor only understands the # directives and processes the lines which start with " # ". The contents of the header file stdio.h were copied into the program in place of the #include line, because the compiler itself works only on the pre-processed source and does not handle #include lines. In the same way, the pre-processor expands any macros so that the compiler receives plain C code.
  

HOW THE PRE-PROCESSOR LOOKS FOR THE FILE AND PROCESSES IT

By now you should understand what the pre-processor does. Let us now learn how the pre-processor looks for files and what gcc checks along the way.
There is a flag " -v ", which stands for verbose. Many commands support this flag. It prints verbose output alongside the normal output of the command. Here, verbose output means that gcc reports the operations it performed while generating the output.
By using the -v flag we can check what gcc is actually doing while generating the pre-processed file.

command:     gcc add.c -E -v -o add1.i


 The output generated by the above command will look like this.

  This is what gcc does while building the pre-processed output of a C program. I know it looks very messy and difficult to follow. The first thing gcc reports is its " specs " (the output typically begins with "Using built-in specs.").
  The specs tell the gcc driver which tools to run and which options to pass to them. Next, gcc reports the target platform, i.e. the platform and processor for which it builds the output. On my system it shows i686-linux-gnu, which is my architecture. I hope I don't have to place a screen-shot of that line; it is on the fourth line. The next few lines show how this gcc was configured and which version it is. After that, gcc invokes the tool called " cc1 ".
cc1 is the C compiler proper, and in gcc the pre-processor is built into it. Because we passed -E, cc1 only performs pre-processing here: it expands all the macros and copies the contents of the header files into the add1.i file. In the screen-shot below I have highlighted the cc1 tool.

gcc passes on the flags we specified manually and adds the rest of the flags implicitly. If you look carefully, you can also find the search paths, which indicate where gcc looks for header files and libraries. I will highlight some of the predefined paths displayed in the output.

I hope I have covered all the major topics related to the pre-processor. If I learn anything new about the pre-processor, I will add it to this article and notify everyone about it.

Saturday, November 23, 2013

STAGES OF COMPILATION.............ARTICLE 12

GENERAL COMPILER

(You are free to skip this paragraph if you already know the general compilation process.)
The picture above shows the steps involved in compilation. These are the steps followed by most C compilers. The first step is taking the file which contains valid C syntax and performing an operation called pre-processing. The output of this step is taken by a tool called the compiler, which generates assembly code. The assembly code is read by the assembler, which converts it into object code. The linker then steps in, makes the necessary modifications and converts the object code into an executable binary file (a .exe file on Windows). I went through the stages of compilation very briefly here. These are the steps by which a compiler converts the high level language written by the user into machine instructions, or other target code, depending on the compiler.

GCC COMPILER:

Let's see how the Gcc compiler works and the stages involved in compilation.

 From the picture you might have got a pretty good idea that all compilers are pretty much similar and work in a similar fashion. Our objective is to understand in detail how Gcc works, what goes into it and what components it uses. Because Gcc, like Linux, is open source, we can see every detail, unlike proprietary software such as Turbo-C, which hides all these steps.

Whenever you present a ' .c ' file to be compiled into an executable, Gcc carries out the compilation in several steps. When I use the term compiler here, I am not referring to a single tool but to a collection of tools. The compiler distribution called Gcc is a set of tools.

Let's take a simple C program as an example: a program which adds two numbers.
 

Step 1: 

             
   The first stage of compilation in Gcc (and in most other compilers) begins when we give the .c file to Gcc. Gcc first invokes a tool called the pre-processor. The pre-processor looks into the code and identifies all the lines which start with " # ".

The ' # ' directives, such as #include, #define, #ifdef and #ifndef, are resolved by the pre-processor. The pre-processor makes sense of these lines and processes them. Only the directives which start with ' # ' are resolved; the rest of the source code remains untouched. The header files pulled in by #include generally contain function prototypes, which the pre-processor copies into the program during the pre-processing phase. The significance of this step will become clearer in my future posts.

Step 2:

In the second stage of compilation Gcc uses a tool called the compiler. Its input is the output generated by the pre-processor in the first stage. The compiler takes the pre-processed code and converts it into assembly instructions. It does not generate machine code straight away; it generates assembly instructions for the architecture we are compiling the code for, because assembly instructions are architecture specific. Different architectures such as x86 and ARM each use and understand only their own assembly code. Below is a screen-shot of some assembly code in case you are not familiar with assembly.
  
                        
  These are the assembly instructions which my Gcc compiler generated for my x86 architecture.


Don't get confused looking at all that code. It is pretty confusing for a beginner. As you progress through the process of learning you will slowly understand it. The %eax, %edx, %ecx, etc. that you are seeing are in fact registers, which we use to store values.

Many of these instructions also work with the stack, a region of memory where local variables and return addresses are kept.


Note:-
The terms compilation and compiler mean different things. The compiler is a tool, while compilation is the whole process required to convert a C program into an executable output.








Step 3:

In the third stage of compilation we use a tool called the assembler. This tool takes the output generated by the compiler as its input. The assembler takes the assembly code and converts it into machine instructions. These machine instructions are for the CPU. The processor understands only machine instructions, and it is the job of the assembler to generate them. The output generated by the assembler is the object code.

Machine instructions are generally difficult to read, but if you want to see what they look like, I have placed a screen-shot below.


What you are seeing above is not exactly the way it is stored. What I mean is that the push, pop, add, inc, etc. instructions that you are seeing at the extreme left are not present in the original object code. The tool called " objdump ", which I used to open this object file (which is in ELF format), adds this extra text only for the user's understanding. The actual machine instructions are on the extreme right of the screen-shot above. In case you did not understand, just look at the screen-shots below.
These are the machine instructions which I mentioned above. Now you must be wondering why the machine instructions are not in 1's and 0's.

We have learned that the processor does not understand any format except 0's and 1's. That's true, but the objdump tool which I used to open the object file converts the machine instructions to hex format for us to read.
 The numbers you are seeing are the opcodes, or operation codes, which the processor understands. The processor cannot understand human language, so opcodes are used. For example, on some processor the number 58 might mean add, subtract, multiply or some other operation. It all varies between processors. Check your processor's manual for its specific opcode instruction set.



The instructions you are seeing in the screen-shot are also added by the objdump tool, to give the user a better understanding of the machine instructions.
Let's get back to the stages of compilation.









 Step 4:

In this step a tool called the linker steps in, takes the object code and converts it into a binary executable file. It does much more than just create an executable file, but we will see all that in the future.

The above steps are the same for most C compilers on the market. Proprietary compilers such as Turbo-C, which unlike Gcc are not free, do not show all these steps. That's the beauty of Linux.