GEM - GNU C Compiler Extensions Framework
GCC Hacks Alexey Smirnov GRC’06 http://gcchacks.info
Introduction The GNU Compiler Collection includes compilers for C, C++, Java, and other languages, together with their libraries. Standard compiler for Linux. Latest release: GCC 4.1. http://gcc.gnu.org
Introduction GEM: a compiler extensibility framework. Examples: syntactic sugar, BCC, Propolice, etc. Dynamically loaded modules simplify development and deployment.
Overview GCC 3.4 tutorial. GEM overview. Hacks, hacks, hacks.
GCC Architecture The driver program gcc finds the appropriate compiler and calls the compiler, assembler, and linker. For the C language: cc1, as, collect2. This presentation covers cc1.
GCC Architecture Front end, middle end, back end.
Representations AST: abstract syntax tree. RTL: register transfer language. Object: assembly code of the target platform. Other representations are used for optimizations.
GCC Initialization cc1 is the preprocessor and compiler. toplev.c: toplev_main() performs command-line option processing, front end/back end initialization, and global scope creation. The front end is initialized with standard types (char_type_node, integer_type_node, unsigned_type_node) and built-in functions (builtin_memcpy, builtin_strlen). These objects are instances of tree.
Tree data type A tree node has a code and operands. MODIFY_EXPR: an assignment expression, with operands TREE_OPERAND(t,0) and TREE_OPERAND(t,1). ARRAY_TYPE: an array type; TREE_TYPE(t) is the type of an array element, TYPE_DOMAIN(t) is the type of the index. CALL_EXPR: a function call; TREE_OPERAND(t,0) is the called function, TREE_OPERAND(t,1) is the function arguments. debug_tree() prints out the AST.
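The "code plus operands" idea can be illustrated with a toy model. This is only a sketch: GCC's real tree type is a much larger union, and every name below (toy_tree, TOY_CODE, TOY_OPERAND, toy_build_assign) is hypothetical and chosen to mirror TREE_CODE and TREE_OPERAND, not taken from GCC.

```c
#include <stdlib.h>

/* Toy model only: GCC's real tree is a large union, not this struct. */
enum toy_code { TOY_INTEGER_CST, TOY_VAR_DECL, TOY_MODIFY_EXPR };

typedef struct toy_node {
    enum toy_code code;        /* what kind of node this is */
    struct toy_node *ops[2];   /* operands, NULL when unused */
    long value;                /* payload for TOY_INTEGER_CST */
    const char *name;          /* payload for TOY_VAR_DECL */
} *toy_tree;

/* Accessors shaped like TREE_CODE and TREE_OPERAND. */
#define TOY_CODE(t)       ((t)->code)
#define TOY_OPERAND(t, i) ((t)->ops[(i)])

/* Allocate a node with the given code and up to two operands. */
static toy_tree toy_make(enum toy_code code, toy_tree op0, toy_tree op1)
{
    toy_tree t = calloc(1, sizeof *t);
    if (t == NULL)
        abort();
    t->code = code;
    t->ops[0] = op0;
    t->ops[1] = op1;
    return t;
}

/* Build the tree for "name = value": operand 0 is the target,
   operand 1 is the assigned value. */
static toy_tree toy_build_assign(const char *name, long value)
{
    toy_tree lhs = toy_make(TOY_VAR_DECL, NULL, NULL);
    toy_tree rhs = toy_make(TOY_INTEGER_CST, NULL, NULL);
    lhs->name = name;
    rhs->value = value;
    return toy_make(TOY_MODIFY_EXPR, lhs, rhs);
}
```

Calling toy_build_assign("x", 5) yields a TOY_MODIFY_EXPR node whose operand 0 is the variable and whose operand 1 is the constant, which is the same shape the slide describes for MODIFY_EXPR.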
Parser The parser reads identifier after identifier. get_identifier() maps a char* to a tree with the IDENTIFIER_NODE code. A declaration is a tree node with a _DECL code. lookup_name() returns the declaration corresponding to a symbol. A separate symbol table is not constructed; the C_DECL_INVISIBLE attribute is used instead.
AST to RTL to assembly start_decl() / finish_decl(), start_function() / finish_function(), tree build_function_call (tree function, tree params). When a function is parsed, it is converted to RTL either immediately or after the whole file is parsed, depending on the option -funit-at-a-time (see finish_function()). Assembly code is generated from RTL; output_asm_insn() is executed for each instruction.
GEM Framework The idea is similar to that of LSM (Linux Security Modules). A module is loaded using an option: -fextension-module=test.gem. Hooks are placed throughout the GCC code, from the AST to assembly output. New hooks are added when needed.
GEM Framework Hooks: gem_handle_option, gem_c_common_nodes_and_builtins, gem_macro_name, gem_macro_def, gem_start_decl, gem_start_func, gem_finish_function, gem_output_asm_insn
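A hook framework of this kind is commonly built as a table of function pointers that the host checks before calling. The following is a minimal sketch under that assumption, not GEM's actual code: the struct, the toy_ and sample_ names, and the -fsample option are all hypothetical, and dynamic loading of the module is left out so the example stays self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table; field names are illustrative, not GEM's. */
struct toy_hooks {
    int  (*handle_option)(const char *opt);   /* 1 if the option was consumed */
    void (*finish_function)(const char *fn);  /* called after each function */
};

static struct toy_hooks toy_active;   /* all NULL: no module installed */
static int toy_finished_count;        /* bookkeeping for the sample module */

/* Call sites in the host test the pointer, so an absent hook is a no-op. */
static int toy_run_handle_option(const char *opt)
{
    return toy_active.handle_option ? toy_active.handle_option(opt) : 0;
}

static void toy_run_finish_function(const char *fn)
{
    if (toy_active.finish_function)
        toy_active.finish_function(fn);
}

/* A sample "module": recognizes one option and counts finished functions. */
static int sample_handle_option(const char *opt)
{
    return strcmp(opt, "-fsample") == 0;
}

static void sample_finish_function(const char *fn)
{
    (void)fn;
    toy_finished_count++;
}

/* Module entry point: fills in only the hooks it cares about. */
static void sample_module_init(struct toy_hooks *hooks)
{
    hooks->handle_option = sample_handle_option;
    hooks->finish_function = sample_finish_function;
}
```

Before sample_module_init(&toy_active) is called, every toy_run_ function does nothing; afterwards the host reaches the module's code through the table. Adding a new hook means adding one field and one guarded call site.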
Traversing an AST with walk_tree
  static tree callback(tree *tp, …) {
    switch (TREE_CODE(*tp)) {
    case CALL_EXPR: …
    case VAR_DECL: …
    }
    return NULL_TREE;
  }
  walk_tree(&t, callback, NULL, NULL);
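The traversal pattern can be tried outside GCC with a toy tree. The sketch below is an assumption-laden stand-in for walk_tree, not its implementation: it visits nodes in pre-order, passes each node by pointer so a callback could replace it, and stops as soon as a callback returns non-NULL. All toy_ names and the count_calls callback are hypothetical.

```c
#include <stddef.h>

enum toy_code { TOY_CONST, TOY_VAR, TOY_CALL, TOY_ASSIGN };

struct toy_node {
    enum toy_code code;
    struct toy_node *ops[2];   /* children, NULL when unused */
};

/* Callback type: return NULL to continue, non-NULL to stop the walk. */
typedef struct toy_node *(*toy_walk_fn)(struct toy_node **tp, void *data);

/* Pre-order walk. Returns the first non-NULL callback result, else NULL. */
static struct toy_node *toy_walk(struct toy_node **tp, toy_walk_fn fn,
                                 void *data)
{
    struct toy_node *hit;
    int i;

    if (*tp == NULL)
        return NULL;
    hit = fn(tp, data);
    if (hit != NULL)
        return hit;
    if (*tp == NULL)           /* the callback may have removed the node */
        return NULL;
    for (i = 0; i < 2; i++) {
        hit = toy_walk(&(*tp)->ops[i], fn, data);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}

/* Example callback: count TOY_CALL nodes via the data pointer. */
static struct toy_node *count_calls(struct toy_node **tp, void *data)
{
    if ((*tp)->code == TOY_CALL)
        ++*(int *)data;
    return NULL;
}
```

Passing the address of the root, as in toy_walk(&root_ptr, count_calls, &n), mirrors the walk_tree(&t, callback, NULL, NULL) call on the slide; the pointer-to-pointer argument is what lets a callback rewrite the node it is visiting.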
Creating trees
  t = build_int_2(val, 0);
  build1(ADDR_EXPR, build_pointer_type(T_T(t)), t);
  build(MODIFY_EXPR, TREE_TYPE(left), left, val);
Hacks Syntactic sugar, operating systems, security
Syntactic Sugar When a compiler error occurs, fix the compiler rather than the program. Examples: function overloading as in C++; toString() in each structure as in Java; invoking a block of code from a function as in Ruby; using functions to initialize a variable; default argument values.
Security DIRA: detection, identification, and repair of control hijacking attacks. PASAN: signature and patch generation. Propolice: -fstack-protector.
Operating Systems Dusk: develop in userland, install at kernel level.
Function Overloading Two functions: void add(int, int); void add(int, char*); The idea is to replace the function name so that it includes the argument types: add_i_i, add_i_pch. Hooks used: gem_start_decl(), gem_start_function(), gem_build_function_call().
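The renaming scheme can be sketched as a small standalone function. Only the two encodings shown on the slide are used (int as "i", char* as "pch"); the enum, the function names, and the buffer-based interface are hypothetical and are not the cfo_ helpers from the module.

```c
#include <stdio.h>
#include <string.h>

/* Only the two argument types from the slide are modeled. */
enum toy_type { TOY_INT, TOY_PTR_CHAR };

/* Map an argument type to its name suffix. */
static const char *toy_type_code(enum toy_type t)
{
    switch (t) {
    case TOY_INT:      return "i";
    case TOY_PTR_CHAR: return "pch";
    }
    return "?";
}

/* Write name followed by "_<code>" per argument into out.
   Returns 0 on success, -1 if the buffer is too small. */
static int toy_mangle(char *out, size_t cap, const char *name,
                      const enum toy_type *args, size_t nargs)
{
    size_t used;
    size_t i;
    int n = snprintf(out, cap, "%s", name);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;
    for (i = 0; i < nargs; i++) {
        n = snprintf(out + used, cap - used, "_%s", toy_type_code(args[i]));
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

With arguments {TOY_INT, TOY_INT} and the name "add", the buffer receives add_i_i; with {TOY_INT, TOY_PTR_CHAR} it receives add_i_pch. The same function would be applied both when a declaration is seen and when a call is resolved, so the two sides agree on the name.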
Alias Each Declaration
  cfo_find_symtab(&t_func, func_name);
  if (t_func == NULL_TREE || DECL_BUILT_IN(t_func)) { return; }
If found, then alias and create a new declaration.
Alias Each Declaration
  strcpy(new_name, func_name);
  strcat(new_name, cfo_build_name(TREE_PURPOSE(T_O(declarator, 1))));
  cfo_find_symtab(&t_func_alias, name);
If not found:
  t_alias_attr = tree_cons(get_identifier("alias"), tree_cons(NULL_TREE, get_identifier(name), NULL_TREE), NULL_TREE);
  TYPE_ATTRIBUTES(T_T(t_func)) = t_alias_attr;
  DECL_ATTRIBUTES(t_func) = t_alias_attr;
  T_O(declarator, 0) = get_identifier(new_name);
Replace function calls
  name = cfo_build_decl_name(t_func, t_parm);
  t_new_func = get_identifier(name);
  if (t_new_func) { t_new_func = lookup_name(t_new_func); }
  *func = t_new_func;
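The call-replacement step amounts to looking up the type-qualified name and redirecting the call to that declaration. Here is a minimal sketch with a toy symbol table, assuming the mangled name has already been built; toy_decl, toy_lookup, and toy_replace_call are hypothetical names. Unlike the slide's code, this sketch leaves the callee untouched when the lookup fails, which is a choice made here for safety and not something the source states.

```c
#include <stddef.h>
#include <string.h>

/* Toy declaration: only the name matters for this sketch. */
struct toy_decl {
    const char *name;
};

/* Linear search standing in for a name lookup. Returns NULL on a miss. */
static struct toy_decl *toy_lookup(struct toy_decl *table, size_t n,
                                   const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}

/* Redirect *func to the declaration called mangled, if one exists.
   Returns 1 when the callee was replaced, 0 when it was left alone. */
static int toy_replace_call(struct toy_decl **func, struct toy_decl *table,
                            size_t n, const char *mangled)
{
    struct toy_decl *found = toy_lookup(table, n, mangled);

    if (found == NULL)
        return 0;
    *func = found;
    return 1;
}
```

Given a table holding add, add_i_i, and add_i_pch, a call that initially points at add and has arguments (int, char*) would be redirected to the add_i_pch entry.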
Conclusion GCC is a big program, so we thought it was a good idea to document it: http://en.wikibooks.org/GNU_C_Compiler_Internals GEM makes it possible to implement GCC extensions: http://www.ecsl.cs.sunysb.edu/gem Examples: programming languages, security, OS.
Thank you http://gcchacks.info
