Flow Sensitive Information Flow Analysis for C Programs


Jun Furuse¹, Dzung Dinh-Khac², and Viet Ha Nguyen²

¹ Graduate School of Information Science and Technology, the University of Tokyo

² A.N.Lab Joint Stock Company, Vietnam

Abstract. The VITC compiler aims to provide information security to legacy C applications, using type based information flow analysis. We have recently modified its typing discipline to be flow sensitive, whereas those of the other realistic information-secure compiler implementations, for Java [5] and ML [8], are flow insensitive. This is because local states in C are too frequently stored in global variables such as errno.

Language based information flow analysis verifies the non-interference property of programs: roughly speaking, a non-interferent program cannot leak its secret information to attackers unintentionally. It is a very strong measure against program security holes. However, information flow analysis has not been well applied to the C language, which is one of the most popular targets for attackers who try to steal secrets.

The goal of our VITC compiler project is to secure existing C applications by providing this information flow based security to the C language. Our static and dynamic type systems track the information flow of a C program annotated with secrecy specifications and verify its non-interference property according to them.

In this short paper, after a brief discussion of VITC's key features, we explain our recent achievement: VITC's flow sensitive information flow typing system, which is mandatory for analyzing the information flow of realistic C applications. Such imperatively written programs often use global variables such as errno to store program states, and these cannot be typed well flow-insensitively.

VITC programs are annotated with security specifications using the lattice model of security levels [2], where the security level constants ℓ, called labels, form a lattice (L, ⊑):

xRULE(L < H) // specification of the security lattice
int xSEC(L) l = 0;
int xSEC(H) h = 42;
l = h + 1;

The declaration xRULE(L < H) specifies the finite lattice L = {L, H} where L ⊑ H, with L for the lower secrecy and H for the higher. Using C type attribute syntax, the macro xSEC(ℓ) specifies that variables l and h store information of lower (L) and higher (H) secrecy respectively. Using these specifications, our static type system detects that the assignment l = h + 1 illegally leaks the higher secrecy information derived from h to l of the lower secrecy.³

So-called implicit flows are also tracked, so that insecure code like if(h){ l = 1; } can be properly rejected.
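
The two rejections above can be mimicked by a small check over the two-point lattice. The following is only a minimal sketch: the names level, level_leq, level_join and assignment_allowed are hypothetical and are not part of VITC, and the real type system works on types and constraints rather than on concrete levels.

```c
#include <stdbool.h>

/* Two-point security lattice of the example: L is below H. */
typedef enum { LEVEL_L = 0, LEVEL_H = 1 } level;

/* a ⊑ b: information of level a may flow to a location of level b. */
bool level_leq(level a, level b) { return a <= b; }

/* Least upper bound a ⊔ b. */
level level_join(level a, level b) { return a > b ? a : b; }

/* An assignment `target = source` executed under control context `pc`
   is accepted only if both the assigned value (explicit flow) and the
   control context (implicit flow) may flow to the target. */
bool assignment_allowed(level pc, level source, level target)
{
    return level_leq(level_join(pc, source), target);
}
```

Under these assumptions, l = h + 1 corresponds to assignment_allowed(LEVEL_L, LEVEL_H, LEVEL_L), and the body of if(h){ l = 1; } to assignment_allowed(LEVEL_H, LEVEL_L, LEVEL_L); both are rejected.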

To make this type system track the information flow correctly, the programs must be compiled by memory safe C compilation [6, 7]. Once memory-secured, C becomes a very imperative functional language. Thus, our static type system is partially based on one for ML [8]: to handle the functional aspects of C, C functions may have polymorphic security types.

Even with the assumption of memory safety, static analyses of C programs are very hard, due to type casts. This is also true for information flow analysis, and some flows around type casts must be checked dynamically at run-time. For example, when an expression e of type int is cast to a pointer type, as in (int *)e, we may not be able to statically determine how secret its content is. Our compiler forces programmers to write an explicit annotation here, like (int xSEC(L)*)e, which embeds code that dynamically type-checks whether the result of e is a pointer to lower security information or not.

Even when these checks for memory safety and dynamic typing fail, a VITC program must continue its execution in a failure oblivious manner [3], rather than simply aborting in a fail-safe manner. This is because careless program termination may leak secrets to whoever observes the termination: for example, a termination of the execution of if(h){ e; } at e gives a clue that h had a non-zero value.
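
The combination of a dynamic check and failure oblivious continuation can be pictured as follows. This is a sketch under strong assumptions: the tagged_int representation and the functions cast_check and read_as are hypothetical illustrations, not VITC's actual run-time layout or API.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { LEVEL_L = 0, LEVEL_H = 1 } level;

/* Hypothetical run-time representation: a memory cell tagged with the
   secrecy level of its content. */
typedef struct {
    level tag;
    int   value;
} tagged_int;

/* Dynamic check for a cast to `int xSEC(expected) *`: it succeeds only
   if the cell exists and its content is no more secret than `expected`. */
bool cast_check(const tagged_int *cell, level expected)
{
    return cell != NULL && cell->tag <= expected;
}

/* Failure oblivious read: when the check fails, execution continues
   with a caller-supplied fallback value instead of aborting, so that
   the termination behaviour does not depend on the secret. */
int read_as(const tagged_int *cell, level expected, int fallback)
{
    return cast_check(cell, expected) ? cell->value : fallback;
}
```

The design point is that a failed check changes the value that is read, not whether the program keeps running.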

3 Flow sensitive analysis

3.1 A motivating example: errno

Until recently the VITC type system was flow insensitive, as in [5, 8]; that is, a variable has a fixed secrecy in different contexts. This was acceptable as long as we type-checked very simple examples. It broke once we tried to compile more realistic applications with global variables like errno. To demonstrate the problem, let us consider the following program:

int errno; /* Global variable */

int main()
{
  int xSEC(H) h; /* variable of higher secrecy */
  int xSEC(L) l; /* variable of lower secrecy */
  if(h) { errno = 1; }
  errno = 0;
  l = errno;
}

Let errno be of secrecy ℓ. It is easily seen that after the if-statement, ℓ = H (or higher than H) due to the indirect information flow from h to errno.

³ Typing of a variable with an xSEC specification is done flow-insensitively.


In the flow insensitive information flow analysis, the secrecy of errno is fixed throughout the program. Therefore, it reports an error for the assignment l = errno, since a flow from H (higher) to L (lower) is forbidden. However, in practice the example should be accepted, since after the assignment errno = 0 the variable carries no information of higher secrecy. In C, which lacks modern language functionalities such as exceptions, a global variable like errno is often used to store states which are just locally meaningful. This kind of use of global variables for temporary states prevents secure programs from being typed with flow insensitive information flow analysis.

Flow sensitive information flow analysis gives a solution to this problem, since variables can have different secrecy after assignments. After the assignment errno = 0, the secrecy of errno can be lowered to L. If the code between errno = 0 and l = errno does not raise the secrecy of errno, the last assignment does not raise any error, since it just induces a flow from L to L.
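
The difference between the two disciplines on the errno program can be simulated over concrete levels. The sketch below is an illustration only: the assignment record and the functions insensitive_level and sensitive_level are hypothetical names, and the model tracks a single variable along one straight-line path rather than implementing the typing rules.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { LEVEL_L = 0, LEVEL_H = 1 } level;

level level_join(level a, level b) { return a > b ? a : b; }

/* One assignment to the tracked variable. `pc` is the secrecy of the
   control context, `source` that of the assigned value. `conditional`
   marks an assignment inside a branch that may be skipped. */
typedef struct {
    level pc;
    level source;
    bool  conditional;
} assignment;

/* Flow insensitive: one fixed level must cover every assignment. */
level insensitive_level(const assignment *a, size_t n)
{
    level fixed = LEVEL_L;
    for (size_t i = 0; i < n; i++)
        fixed = level_join(fixed, level_join(a[i].pc, a[i].source));
    return fixed;
}

/* Flow sensitive: the level is recomputed at each assignment. A
   conditional assignment joins with the level held before it, since
   the old value may survive. */
level sensitive_level(level initial, const assignment *a, size_t n)
{
    level current = initial;
    for (size_t i = 0; i < n; i++) {
        level written = level_join(a[i].pc, a[i].source);
        current = a[i].conditional ? level_join(current, written) : written;
    }
    return current;
}
```

Encoding errno = 1 under if(h) as { LEVEL_H, LEVEL_L, true } and errno = 0 as { LEVEL_L, LEVEL_L, false }, the insensitive level of errno is H, so l = errno is rejected, while the sensitive level after the second assignment is L, so it is accepted.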

In the literature, there have been a number of approaches to flow-sensitive information flow analysis, e.g. [1, 4]. Although they give nice theoretical results, they do not consider functions, which are very common in C programs, and this makes them less practical. We argue that our system, which is presented below, is more useful as it allows for functions.

3.2 The language

Syntax. For our formalization, we first define a small C-like language which supports global variables, conditionals, function declarations and function calls:

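\[
\begin{array}{rcll}
e & ::= & & \text{Expressions} \\
  & \mid & n & \text{Constants} \\
  & \mid & x \mid f & \text{Variables} \\
  & \mid & f(e) & \text{Function calls} \\
s & ::= & & \text{Statements} \\
  & \mid & x := e & \text{Assignments} \\
  & \mid & s;\ s & \text{Sequences} \\
  & \mid & \texttt{if}\ e\ \texttt{then}\ s\ \texttt{else}\ s & \text{Conditionals} \\
t & ::= & \texttt{int} \mid \texttt{char} \mid \ldots & \\
d & ::= & & \\
  & \mid & t\ x = n; & \text{Variable decls} \\
  & \mid & t^{\ell}\ x = n; & \text{Variables with specs} \\
  & \mid & t\ f(t\ x)\ \{\, s;\ \texttt{return}\ e;\ \} & \text{Function decls} \\
p & ::= & d \ldots d & \text{Programs}
\end{array}
\]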

The language can have security level constants ℓ only at variable declarations $t^{\ell}\ x = n;$ (e.g. int xSEC(L) x = 0;). Such variables with levels give the security specifications of a program, and their typing is flow insensitive, while the others are typed flow sensitively.

Types, constraints, conditions, and subtyping. Types τ in our type system are fully annotated with flow types λ, each of which is either a level constant ℓ or a type variable α for polymorphism. A functional type $\tau \xrightarrow{\pi} \tau$ is annotated with its effect π, the security lower bound of the side effects inside the function:

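\[
\begin{array}{rcll}
\lambda, \pi & ::= & & \text{Flow types} \\
  & \mid & \ell & \text{Level constants} \\
  & \mid & \alpha & \text{Type variables} \\
\tau & ::= & & \text{Mono-types} \\
  & \mid & t^{\lambda} & t = \texttt{int}, \texttt{char} \\
  & \mid & \tau \xrightarrow{\pi} \tau & \text{Functional types}
\end{array}
\]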


As we allow variables in types, it is necessary to have a formal manner to express the ordering relation between type variables and flow types. In our system, it is represented by type constraints (or constraints for short) of the form k ::= λ1 ⊑ λ2. A set of constraints K forms a trivial constraint system, and we write K ⊢ λ1 ⊑ λ2 when λ1 ⊑ λ2 is inferable from K.
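
One way to read the entailment K ⊢ λ1 ⊑ λ2 is as reachability in the graph of constraints, closed under reflexivity, transitivity and the lattice axioms. The sketch below assumes the two-point lattice {L, H} and a hypothetical encoding in which flow types are small integers (0 and 1 for the constants L and H, larger indices for type variables); the names constraint and entails are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: 0 and 1 are the constants L and H, indices
   from 2 up to FT_MAX - 1 stand for type variables. */
enum { FT_L = 0, FT_H = 1, FT_MAX = 16 };

typedef struct { int lo, hi; } constraint;   /* lo ⊑ hi */

/* Returns true when a ⊑ b is inferable from the n constraints in k. */
bool entails(const constraint *k, size_t n, int a, int b)
{
    bool leq[FT_MAX][FT_MAX] = {{false}};
    assert(a >= 0 && a < FT_MAX && b >= 0 && b < FT_MAX);

    for (int i = 0; i < FT_MAX; i++) {
        leq[i][i] = true;        /* reflexivity */
        leq[FT_L][i] = true;     /* L is the bottom element */
        leq[i][FT_H] = true;     /* H is the top element */
    }
    for (size_t c = 0; c < n; c++) {
        assert(k[c].lo >= 0 && k[c].lo < FT_MAX);
        assert(k[c].hi >= 0 && k[c].hi < FT_MAX);
        leq[k[c].lo][k[c].hi] = true;
    }
    /* Transitive closure (Floyd-Warshall style). */
    for (int m = 0; m < FT_MAX; m++)
        for (int i = 0; i < FT_MAX; i++)
            for (int j = 0; j < FT_MAX; j++)
                if (leq[i][m] && leq[m][j])
                    leq[i][j] = true;
    return leq[a][b];
}
```

For instance, with K = {α ⊑ β, β ⊑ γ} encoded as {2, 3} and {3, 4}, the query entails(k, 2, 2, 4) holds while entails(k, 2, 4, 2) does not.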

Similarly to [4], to allow types to be flow-sensitive, the type of a variable must be able to "vary", i.e. a variable may have different types before and after a statement (especially an assignment) is executed. Therefore, to keep track of such types of variables during program execution, we must have conditions C, which are partial maps from variables x to mono-types τ. The typing of statements must be annotated with pre- and post-conditions, which represent the types of variables before and after the statement is executed, respectively. Moreover, unlike [4], as we allow for function calls in expressions, where functions may have different pre- and post-conditions, each expression is also annotated with a pre- and a post-condition with the same meaning.

The partial order between security levels ℓ is naturally extended to the following subtyping relationships between types and conditions. The subtyping of function parameters is contravariant:

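\[
\frac{K \vdash \lambda \sqsubseteq \lambda'}
     {K \vdash t^{\lambda} \sqsubseteq t^{\lambda'}}
\]

\[
\frac{K \vdash \lambda_1 \sqsubseteq \lambda'_1 \qquad
      K \vdash \lambda'_2 \sqsubseteq \lambda_2 \qquad
      K \vdash \pi' \sqsubseteq \pi}
     {K \vdash t_2^{\lambda_2} \xrightarrow{\pi} t_1^{\lambda_1}
      \ \sqsubseteq\
      t_2^{\lambda'_2} \xrightarrow{\pi'} t_1^{\lambda'_1}}
\]

\[
\frac{\mathrm{Dom}(C_1) = \mathrm{Dom}(C_2) \qquad
      \forall x \in \mathrm{Dom}(C_1).\ K \vdash C_1(x) \sqsubseteq C_2(x)}
     {K \vdash C_1 \sqsubseteq C_2}
\]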
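
On ground levels (no type variables), the functional subtyping rule reduces to three comparisons. The following sketch assumes that the base types of parameter and result already agree and uses hypothetical names (fun_type, fun_subtype); it only illustrates the variance of each component.

```c
#include <stdbool.h>

typedef enum { LEVEL_L = 0, LEVEL_H = 1 } level;

/* Ground functional type: parameter level, effect, result level. */
typedef struct {
    level param;    /* level of the argument */
    level effect;   /* lower bound of the side effects in the body */
    level result;   /* level of the returned value */
} fun_type;

/* a ⊑ b for ground functional types, following the subtyping rule. */
bool fun_subtype(fun_type a, fun_type b)
{
    return b.param  <= a.param     /* parameters are contravariant */
        && b.effect <= a.effect    /* effects are contravariant    */
        && a.result <= b.result;   /* results are covariant        */
}
```

A function accepting H arguments, writing only to H locations and returning an L value can thus be used where a function taking L arguments, writing to L locations and returning H is expected, but not the other way round.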

It now suffices to define polymorphic functional types of the form $\forall \alpha_1 \ldots \alpha_n [K].\ \{C_{pre}\}\ \tau_1 \xrightarrow{\pi} \tau_2\ \{C_{post}\}$. Intuitively, it states that for all type variables $\alpha_i$ which satisfy the constraint set K, if the pre-condition is $C_{pre}$, then the corresponding function works as typed $\tau_1 \xrightarrow{\pi} \tau_2$ and modifies the condition to $C_{post}$.

3.3 Typing rules

Our flow sensitive typing system, depicted in Appendix A, is a mixture and an extension of a flow sensitive type system for while programs [4] and a flow insensitive polymorphic constraint typing for ML [8]. Type judgments for expressions and statements take the form K, π, Γ ⊢ {C1} · {C2}, with two kinds of typing environments. C is a condition, a flow sensitive environment which records the types of flow sensitive variables. Γ is the flow insensitive counterpart, a partial mapping from variables to polymorphic types or mono-types, which is for functions and for flow insensitive variables annotated with security specifications. K is a set of constraints, which are requirements between type variables and level constants for the judgment. π is the so-called "program counter", which denotes the secrecy of the program execution flow. Unlike the type systems ours is based on, conditions and a program counter also appear in the judgment for expressions, K, π, Γ ⊢ {C1} e : τ {C2}, since we suppose that function calls with side effects may occur inside the expression e.

The core of flow sensitivity is the rule t-Asgn: the types $t^{\lambda'}$ of flow sensitive variables are modified after assignments to those of the assigned values, $t^{\lambda}$. Thus errno can have different security levels at different points. Apart from this rule, pre- and post-conditions must join correctly at each computation step. On the other hand, assignments to flow insensitive variables with security specifications are typed not by t-Asgn but by t-AsgnInsens. This is very similar to the classical flow insensitive typings: the type of the variable at an assignment must be equal to that of the assigned value and is never modified.

The type of a function is also flow insensitive; therefore it is also bound in Γ, with a polymorphic type. Its polymorphic type is instantiated (t-Instantiate) and then applied (t-FunCall) for each application independently, in order to achieve polymorphism. The type instantiation S must be meaningful: S(K) must be satisfiable (⊨ S(K)) and must not contain contradictory constraints like H ⊑ L.

The judgment for declarations has the form K0, Γ, C0 ⊢ d. Since all the definitions are declared at the top level once and for all, we have no notion of the program counter nor of pre- and post-conditions, but only the initial condition C0. The global constraints K0 are the constraints which must hold throughout the program and be satisfiable.

The rule t-FunDecl types a function declaration. The function body s; return e; must be typed under a constraint set K0 + Kα and the pre-condition extended for the function argument x. The free variables α introduced in the typing of the body are the targets of type generalization, and every constraint in the extended part Kα of the constraint set must involve the generalized variables α. In the polymorphic type of the function, these generalized variables quantify the mono-type, the pre- and post-conditions and the extended part of the constraint set, Kα.

4 Type inference and implementation issues

The type inference algorithm is obtained almost automatically, by using the typing rules bottom-up and then checking the satisfiability of the constraints. A problem arises at function declarations, since t-FunDecl uses polymorphic recursion, which requires complex inference. Currently our algorithm does not support polymorphic recursion: a recursive function is typed monomorphically inside its body.
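
The satisfiability check on collected constraints can be sketched for the two-point lattice as a least-solution computation: H is propagated along constraints until a fixed point, and the set is unsatisfiable exactly when the constant L is forced to H. The encoding and the names constraint and solve are assumptions made for this illustration; the paper does not describe the solver actually used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: 0 and 1 are the constants L and H, indices
   from 2 up to FT_MAX - 1 stand for type variables. */
enum { FT_L = 0, FT_H = 1, FT_MAX = 16 };

typedef struct { int lo, hi; } constraint;   /* lo ⊑ hi */

/* Computes the least solution of the n constraints in k over {L, H}.
   On return, is_high[v] tells whether v must be H. The result is
   false when the constraints force H ⊑ L, i.e. are unsatisfiable.
   All indices in k are assumed to lie in [0, FT_MAX). */
bool solve(const constraint *k, size_t n, bool is_high[FT_MAX])
{
    for (int i = 0; i < FT_MAX; i++)
        is_high[i] = false;
    is_high[FT_H] = true;

    /* Propagate H along lo ⊑ hi until nothing changes. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t c = 0; c < n; c++) {
            if (is_high[k[c].lo] && !is_high[k[c].hi]) {
                is_high[k[c].hi] = true;
                changed = true;
            }
        }
    }
    return !is_high[FT_L];
}
```

With the constraints H ⊑ α and α ⊑ β the least solution maps both variables to H; adding β ⊑ L makes the set unsatisfiable, which corresponds to a rejected program.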

Our implementation, based on this formalization, successfully types various errno examples like the one in Section 3.1. Sometimes programmers are forced to lower the secrecy level of errno by inserting reset assignments like errno = 0, but this is easy, and the cost is negligible compared with the information security obtained.

Flow sensitive typing is required for information flow analysis of C programs, since they often use global variables such as errno in order to store states that are just locally meaningful. We have formalized and implemented such a flow sensitive polymorphic information flow typing system for C.

We leave a formal proof of the soundness of the system as future work. We believe that it is not very difficult, since the proof in [4] can be adjusted to our system.


Currently flow sensitivity is only permitted for pure integers, and pointers are typed flow insensitively. Flow sensitivity for pointers is left as future work, which will require detailed pointer analysis, as pointed out in [1].

References

1. David Clark, Chris Hankin, and Sebastian Hunt. Information flow for Algol-like languages. Computer Languages, 28(1), 2002.

2. Dorothy E. Denning. A lattice model of secure information flow. Commun. ACM, 19(5):236–243, 1976.

3. Martin Rinard et al. Enhancing server availability and security through failure-oblivious computing, December 2004.

4. Sebastian Hunt and David Sands. On flow-sensitive security types. In Proc. Principles of Programming Languages, 33rd Annual ACM SIGPLAN-SIGACT Symposium (POPL'06), pages 79–90, Charleston, South Carolina, USA, January 2006. ACM Press.

5. Andrew C. Myers. JFlow: Practical mostly-static information flow control. In Symposium on Principles of Programming Languages, pages 228–241, 1999.

6. George C. Necula, Scott McPeak, and Westley Weimer. CCured: type-safe retrofitting of legacy code. In Symposium on Principles of Programming Languages, pages 128–139, 2002.

7. Yutaka Oiwa, Tatsurou Sekiguchi, Eijiro Sumii, and Akinori Yonezawa. Fail-safe ANSI-C compiler: An approach to making C programs secure (progress report), Feb. 2003.

8. François Pottier and Vincent Simonet. Information flow inference for ML. In Symposium on Principles of Programming Languages, pages 319–330, 2002.


A Flow sensitive typing rules

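(t-Const)
\[
K, \pi, \Gamma \vdash \{C\}\ n : t^{\lambda}\ \{C\}
\]

(t-Var)
\[
\frac{C(x) = \tau \ \text{ or } \ \Gamma(x) = \tau}
     {K, \pi, \Gamma \vdash \{C\}\ x : \tau\ \{C\}}
\]

(t-Instantiate)
\[
\frac{\Gamma(f) = \forall \alpha [K].\ \{C_1\}\ \tau\ \{C_2\} \qquad \models S(K)}
     {S(K), \pi, \Gamma \vdash \{S(C_1)\}\ f : S(\tau)\ \{S(C_2)\}}
\]

(t-FunCall)
\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : \tau\ \{C_2\} \qquad
      K, \pi, \Gamma \vdash \{C_2\}\ f : \tau \xrightarrow{\pi} \tau'\ \{C_3\}}
     {K, \pi, \Gamma \vdash \{C_1\}\ f(e) : \tau'\ \{C_3\}}
\]

(t-SubExp)
\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : \tau\ \{C_2\} \qquad
      K \vdash \tau \sqsubseteq \tau' \qquad
      K \vdash C_1 \sqsubseteq C'_1 \qquad
      K \vdash C'_2 \sqsubseteq C_2 \qquad
      K \vdash \pi' \sqsubseteq \pi}
     {K, \pi', \Gamma \vdash \{C'_1\}\ e : \tau'\ \{C'_2\}}
\]

(t-SubStmt)
\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ s\ \{C_2\} \qquad
      K \vdash C_1 \sqsubseteq C'_1 \qquad
      K \vdash C'_2 \sqsubseteq C_2 \qquad
      K \vdash \pi' \sqsubseteq \pi}
     {K, \pi', \Gamma \vdash \{C'_1\}\ s\ \{C'_2\}}
\]

(t-Skip)
\[
K, \pi, \Gamma \vdash \{C\}\ \texttt{skip}\ \{C\}
\]

(t-Seq)
\[
\frac{K, \pi, \Gamma \vdash \{C_i\}\ s_i\ \{C_{i+1}\} \qquad i = 1, 2}
     {K, \pi, \Gamma \vdash \{C_1\}\ s_1;\ s_2\ \{C_3\}}
\]

(t-Asgn)
\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \qquad
      K \vdash \pi \sqsubseteq \lambda \qquad
      C_2(x) = t^{\lambda'} \qquad
      C_3 = C_2[x : t^{\lambda}]}
     {K, \pi, \Gamma \vdash \{C_1\}\ x := e\ \{C_3\}}
\]

(t-AsgnInsens)
\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \qquad
      K \vdash \pi \sqsubseteq \lambda \qquad
      \Gamma(x) = t^{\lambda}}
     {K, \pi, \Gamma \vdash \{C_1\}\ x := e\ \{C_2\}}
\]

(t-Cond)
\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \qquad
      K \vdash \pi' \sqsupseteq \pi \sqcup \lambda \qquad
      K, \pi', \Gamma \vdash \{C_2\}\ s_i\ \{C_3\} \qquad i = 1, 2}
     {K, \pi, \Gamma \vdash \{C_1\}\ \texttt{if}\ e\ \texttt{then}\ s_1\ \texttt{else}\ s_2\ \{C_3\}}
\]

(t-VarDecl)
\[
\frac{C(x) = t^{\lambda} \qquad \models K_0}
     {K_0, \Gamma, C \vdash t_n\ x = n;}
\]

(t-VarDeclInsens)
\[
\frac{\Gamma(x) = t_n^{\ell} \qquad \models K_0}
     {K_0, \Gamma, C \vdash t_n^{\ell}\ x = n;}
\]

(t-FunDecl)
\[
\frac{\begin{array}{c}
      \Gamma(f) = \forall \alpha [K_\alpha].\ \{C_1\}\ t^{\lambda} \xrightarrow{\pi} t'^{\lambda'}\ \{C_3\} \\
      \alpha = FV(\{C_1\}\ t^{\lambda} \xrightarrow{\pi} t'^{\lambda'}\ \{C_3\}) \setminus (FV(\Gamma) \cup FV(C)) \\
      K_0 + K_\alpha, \pi, \Gamma \vdash \{C_1[x : t^{\lambda}]\}\ s\ \{C_2\} \\
      K_0 + K_\alpha, \pi, \Gamma \vdash \{C_2\}\ e : t'^{\lambda'}\ \{C_3\} \\
      \models K_0 + K_\alpha \qquad \mathrm{Dom}(C_1) \subseteq \mathrm{Dom}(C) \\
      \forall k \in K_\alpha.\ FV(k) \cap \alpha \neq \emptyset \qquad
      \forall k \in K_0.\ FV(k) \cap \alpha = \emptyset
      \end{array}}
     {K_0, \Gamma, C \vdash t'\ f(t\ x)\ \{\, s;\ \texttt{return}\ e;\ \}}
\]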
