Tuesday, November 9, 2010

Automatically Compiling C/C++ Programs by Creating a Makefile

This is a fairly complete and interesting article. Based on it and a few other references found online, I have copied the approach here for now:

Preface

Compiling a program made of one source file is easy. Compiling one made of a few sources is slightly annoying, but may be automated via a simple shell script. Anything larger than that would start to get on your nerves. This is where makefiles are helpful.
A makefile is a collection of instructions that should be used to compile your program. Once you modify some source files, and type the command "make" (or "gmake" on systems where GNU make is installed under that name), your program will be recompiled using as few compilation commands as possible. Only the files you modified and those dependent upon them will be recompiled. Of course, this is not done by magic. You need to supply the rules for compiling various files and file types, and the list of dependencies between files (if file "A" was changed, then files "B", "C" and "D" also need to be re-compiled), but that only has to be done once.

The Structure Of A Makefile

A typical makefile contains lines of the following types:
  • Variable Definitions - these lines define values for variables. For example:
    
         CFLAGS = -g -Wall
         SRCS = main.c file1.c file2.c
         CC = gcc
         

  • Dependency Rules - these lines define under what conditions a given file (or a type of file) needs to be re-compiled, and how to compile it. For example:
    
         main.o: main.c
                 gcc -g -Wall -c main.c
         

    This rule means that "main.o" has to be recompiled whenever "main.c" is modified. The rule continues to tell us that in order to compile "main.o", the command "gcc -g -Wall -c main.c" needs to be executed.
    Note that each line in the commands list must begin with a TAB character. "make" is quite picky about the makefile's syntax.

  • Comments - any line beginning with a "#" sign. Lines that contain only white-space are ignored as well.

Order Of Compilation

When a makefile is executed, we tell the make command to compile a specific target. A target is just some name that appears at the beginning of a rule. It can be a name of a file to create, or just a name that is used as a starting point (such a target is also called a "phony" target).
When make is invoked, it first evaluates all variable assignments, from top to bottom, and when it encounters a rule "A" whose target matches the given target (or the first rule, if no target was supplied), it tries to evaluate this rule. First, it tries to recursively handle the dependencies that appear in the rule. If a given dependency has no matching rule, but there is a file on the disk with this name, the dependency is assumed to be up-to-date. After all the dependencies have been checked, if any of them required handling, or refers to a file newer than the target, or the target file does not exist yet, the command list for rule "A" is executed, one command at a time.
This might appear a little complex, but the example in the next section will make things clear.

Starting Small - A Single-Source Makefile Example

Let's first see a simple example of a makefile that is used to compile a program made of a single source file:


# top-level rule to create the program.
all: main

# compiling the source file.
main.o: main.c
        gcc -g -Wall -c main.c

# linking the program.
main: main.o
        gcc -g main.o -o main

# cleaning everything that can be automatically recreated with "make".
clean:
        /bin/rm -f main main.o

A few notes about this makefile:
  1. Not all rules have to be used in every invocation of make. The "clean" rule, for example, is not normally used when building the program, but it may be used to remove the object files created, to save disk space.
  2. A rule doesn't need to have any dependencies. This means if we tell make to handle its target, it will always execute its commands list (as long as no file with the target's name exists), as in the "clean" rule above.
  3. A rule doesn't need to have any commands. For example, the "all" rule is just used to invoke other rules, but does not need any commands of its own. It is just convenient to make sure that if someone runs make without a target name, this rule will get executed, due to being the first rule encountered.
  4. We used the full path to the "rm" command, instead of just typing "rm", because many users have this command aliased to something else (for example, "rm" aliased to "rm -i"). By using a full path, we avoid any aliases.
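
The "clean" rule in note 2 always runs only while no file named "clean" exists in the directory. A common safeguard is to declare such targets as phony. The following is a minimal sketch of the same makefile, assuming GNU make; the ".PHONY" line is an addition that is not part of the original example, and command lines must begin with a TAB character:

```makefile
# Declare the targets that do not name real files, so that make handles
# them even if a file called "all" or "clean" happens to exist.
.PHONY: all clean

# top-level rule to create the program.
all: main

# compiling the source file.
main.o: main.c
	gcc -g -Wall -c main.c

# linking the program.
main: main.o
	gcc -g main.o -o main

# cleaning everything that can be automatically recreated with "make".
clean:
	/bin/rm -f main main.o
```

With this declaration, "make clean" runs its command list regardless of what files are present.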

Getting Bigger - A Multi-Source Makefile Example

In anything but the simplest programs, we usually have more than one source file. This is where using a makefile starts to pay off. Making a change to one file requires re-compilation of only the modified file, and then re-linking the program. Here is an example of such a makefile:


# top-level rule to compile the whole program.
all: prog

# program is made of several source files.
prog: main.o file1.o file2.o
        gcc main.o file1.o file2.o -o prog

# rule for file "main.o".
main.o: main.c file1.h file2.h
        gcc -g -Wall -c main.c

# rule for file "file1.o".
file1.o: file1.c file1.h
        gcc -g -Wall -c file1.c

# rule for file "file2.o".
file2.o: file2.c file2.h
        gcc -g -Wall -c file2.c

# rule for cleaning files generated during compilations.
clean:
        /bin/rm -f prog main.o file1.o file2.o

A few notes:
  • There is one rule here for each source file. This causes some redundancy, but later on we'll see how to get rid of it.
  • We add a dependency on included files (file1.h, file2.h) where applicable. If one of these interface-definition files changes, the files that include it might need a re-compilation too. This is not always true, but it is better to perform a redundant compilation than to have object files that are not synchronized with the source code.

Using Compiler And Linker Flags

As one can see, there are many repetitive patterns in the rules for our makefile. For example, what if we wanted to change the flags for compilation, to use optimization (-O) instead of adding debug info (-g)? We would need to go and change this flag for each rule. This might not look like much work with 3 source files, but it will be tedious when we have a few tens of files, possibly spread over a few directories.
The solution to this problem is using variables to store various flags, and even command names. This is especially useful when trying to compile the same source code with different compilers, or on different platforms, where even a basic command such as "rm" might reside in a different directory on each platform.
Let's see the same makefile, but this time with the introduction of variables:


# use "gcc" to compile source files.
CC = gcc
# the linker is also "gcc". It might be something else with other compilers.
LD = gcc
# Compiler flags go here.
CFLAGS = -g -Wall
# Linker flags go here. Currently there aren't any, but if we switch to
# code optimization, we might add "-s" here to strip debug info and symbols.
LDFLAGS =
# use this command to erase files.
RM = /bin/rm -f
# list of generated object files.
OBJS = main.o file1.o file2.o
# program executable file name.
PROG = prog

# top-level rule, to compile everything.
all: $(PROG)

# rule to link the program
$(PROG): $(OBJS)
        $(LD) $(LDFLAGS) $(OBJS) -o $(PROG)

# rule for file "main.o".
main.o: main.c file1.h file2.h
        $(CC) $(CFLAGS) -c main.c

# rule for file "file1.o".
file1.o: file1.c file1.h
        $(CC) $(CFLAGS) -c file1.c

# rule for file "file2.o".
file2.o: file2.c file2.h
        $(CC) $(CFLAGS) -c file2.c

# rule for cleaning re-compilable files.
clean:
        $(RM) $(PROG) $(OBJS)

A few notes:
  • We define many variables in this makefile. This will make it very easy to modify compile flags, compiler used, etc. It is good practice to define even things that might seem like they'll never change. In time - they will.
  • We still have a problem with the fact that we define one rule for each source file. If we want to change this rule's format, it will be rather tedious. The next section will show us how to avoid this problem.
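
Keeping flags in variables also means they can be changed without editing the makefile at all. The fragment below is a small sketch of that idea. The invocation shown in its comments relies on make's standard behaviour that a variable given on the command line takes precedence over an ordinary assignment in the makefile:

```makefile
# Default settings: compile with debug info and all warnings.
CC = gcc
CFLAGS = -g -Wall

# Running plain "make" compiles with the defaults above.
# Running:  make CFLAGS="-O -Wall"
# compiles with optimization instead, without touching this file.
main.o: main.c
	$(CC) $(CFLAGS) -c main.c
```

Object files that were already built with the old flags are not recompiled automatically, so it is safest to run "make clean" before switching flags.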

A Rule For Everyone - Using "File Type" Rules

So, the next phase would be to eliminate the redundant rules, and try to use one rule for all source files. After all, they are all compiled in the same way. Here is a modified makefile:


# we'll skip all the variable definitions, just take them from the previous
# makefile
.
.
# linking rule remains the same as before.
$(PROG): $(OBJS)
        $(LD) $(LDFLAGS) $(OBJS) -o $(PROG)

# now comes a meta-rule for compiling any "C" source file.
%.o: %.c
        $(CC) $(CFLAGS) -c $<

We should explain two things about our meta-rule (GNU make's documentation calls it a "pattern rule"):
  1. The "%" character is a wildcard, that matches any part of a file's name. If we mention "%" several times in a rule, they all must match the same value, for a given rule invocation. Thus, our rule here means "A file with a '.o' suffix is dependent on a file with the same name, but a '.c' suffix".
  2. The "$<" string refers to the first dependency that was matched by the rule (in our case - the full name of the source file). There are other similar strings, such as "$@" which refers to the full target name, or "$*", which refers to the part that was matched by the "%" character.
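
To illustrate these strings, here is a sketch of the makefile with both rules written in terms of them. It assumes GNU make; "$^", which expands to the full dependency list, is not mentioned in the text above and is included only for illustration:

```makefile
CC = gcc
LD = gcc
CFLAGS = -g -Wall
LDFLAGS =
OBJS = main.o file1.o file2.o
PROG = prog

all: $(PROG)

# "$@" expands to the target (prog), "$^" to all the dependencies
# (main.o file1.o file2.o).
$(PROG): $(OBJS)
	$(LD) $(LDFLAGS) $^ -o $@

# "$<" expands to the first dependency (the ".c" file), "$@" to the
# matching ".o" file.
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
```

Written this way, neither rule repeats a file name, so adding a source file only requires extending the OBJS list.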

Automatic Creation Of Dependencies

One problem with the use of implicit rules is that we lose the full list of dependencies that are unique to each file. This can be overcome by using extra rules for each file that only contain dependencies, but no commands. These can be added manually, or be generated automatically in one of various ways. Here is one example, using the "makedepend" Unix program.


# define the list of source files.
SRCS = main.c file1.c file2.c
.
.
# most of the makefile remains as it was before.
# at the bottom, we add these lines:

# rule for building dependency lists, and writing them to a file
# named ".depend".
depend:
        $(RM) .depend
        makedepend -f- -- $(CFLAGS) -- $(SRCS) > .depend

# now add a line to include the dependency list. The leading "-" (GNU make)
# keeps make from failing while ".depend" does not exist yet.
-include .depend

Now, if we run "make depend", the "makedepend" program will scan the given source files, create a dependency list for each of them, and write appropriate rules to the file ".depend". Since this file is then included by the makefile, these dependencies will be checked when we compile the program itself.
There are many other ways to generate dependency lists. Programmers interested in this issue are advised to read about the compiler's "-M" flag, and to read the manual page of "makedepend" carefully. Also note that GNU make's info pages suggest a different way of having dependency lists created automatically when compiling a program. The advantage is that the dependency list never gets out of date. The disadvantage is that it is often run for no reason, thus slowing down the compilation phase. Only experimenting will show you exactly when to use each approach.
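
As one hedged illustration of the compiler-based route, the sketch below assumes GNU make and gcc. It uses gcc's "-MMD" and "-MP" flags (relatives of "-M") to write a ".d" dependency file next to each object file as a side effect of compiling. The DEPS variable name is made up for this example, and this is not necessarily the method described in GNU make's info pages:

```makefile
CC = gcc
CFLAGS = -g -Wall
SRCS = main.c file1.c file2.c
# derive the object list from the source list.
OBJS = $(SRCS:.c=.o)
# one dependency file per object file (hypothetical variable name).
DEPS = $(OBJS:.o=.d)
PROG = prog

all: $(PROG)

$(PROG): $(OBJS)
	$(CC) $(OBJS) -o $(PROG)

# -MMD writes main.d, file1.d, ... while compiling; -MP adds dummy
# targets for the headers, so deleting a header does not break the build.
%.o: %.c
	$(CC) $(CFLAGS) -MMD -MP -c $< -o $@

# the leading "-" tells make not to complain when the files do not exist yet.
-include $(DEPS)

clean:
	/bin/rm -f $(PROG) $(OBJS) $(DEPS)
```

Because the ".d" files are refreshed on every compilation, there is no separate "make depend" step to remember in this variant.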

Note: my apologies to the author of the original article regarding copyright.
