Julia Introspects

Aug 20, 2013   #julialang  #tutorial  #code  #llvm  #ast 

When people are introspective, they’re thinking about how their minds work, about how and why they think what they do. The Julia language has some impressive facilities for letting you see how the compilers’ mind works. Using convenient built-in functions that are available both at the REPL1 and in normal code, Julia allows you to see the layers of internal representation that your code goes through, from the parsed AST to native assembly code.

This allows you to answer some otherwise difficult questions very easily. (and allows you to peer into the inner workings of the compiler, which is just plain fun.)

##A Simple Question: I wonder if it matters which of these I use?

One of the questions I have pondered is whether two syntaxes for assigning variables make a performance difference. Each of these two approaches assigns the same two values to the same two variables.

The most straight-forward approach:

function linear_foo()
  x = 4
  y = 5
end

Sometimes this looks nicer:

function tuple_foo()
  x,y = 4,5
end

The question that we’re specifcally wondering about is whether it matters which syntax we use, from a speed stand point. (The concern is that the second syntax might implicitly create a tuple and waste time messing around with it.) In most languages, if you really cared, you’d make a microbenchmark to get an approximate answer. Instead, I’m going to go take a look at the optimized version of the AST. (It’s much easier and more exact than benchmarking. :)

In Julia, you can just look at the optimized version of the AST for any generic function2. All it takes is a single call to code_typed.

The value of the last expression in a Julia function is implictly returned, so the only change to our first approach is to add the return statement:

julia> code_typed(linear_foo,())
1-element Any Array:
:($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote  # none, line 2:
        x = 4 # line 3:
        y = 5
        return 5
    end)))

The only difference in the optimized version of the second one is that it returns a tuple. This is solely due to the expression evaluating to a tuple.

julia> code_typed(tuple_foo,())
1-element Any Array:
 :($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote  # none, line 2:
        x = 4
        y = 5
        return top(tuple)(4,5)::(Int64,Int64)
    end)))

For example:

function tuple_foo2()
  x,y = 4,5
  y
end

becomes

julia> code_typed(tuple_foo2,())
1-element Any Array:
 :($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote  # none, line 2:
        x = 4
        y = 5 # line 3:
        return y::Int64
    end)))

As you may have guessed, this is by no means the final internal represetation or the final optimization pass. These functions can be optimized down to nearly nothing, as we can see if we take a look at the assembly code:

julia> code_native(linear_foo,()) # returns the value of y
	.text
Filename: none
Source line: 3
	push	RBP
	mov	RBP, RSP
	mov	EAX, 5
Source line: 3
	pop	RBP
	ret

julia> code_native(tuple_foo,()) # returns a tuple of (x,y)
	.text
Filename: none
Source line: 2
	push	RBP
	mov	RBP, RSP
	mov	EAX, 83767488
Source line: 2
	pop	RBP
	ret

julia> code_native(tuple_foo2,()) # returns the value of y
	.text
Filename: none
Source line: 3
	push	RBP
	mov	RBP, RSP
	mov	EAX, 5
Source line: 3
	pop	RBP
	ret

Layers

This post will cover five layers of internal representations of Julia code. Except for the first one, each layer is accessible via a normal generic function that takes a generic function and a tuple of argument types (to specify which method you want to examine).

If you are uncertain about the signature of the method you’re calling, the macro @which will be useful to you.3 The internal representation of your code in the compiler is called an Abstract Syntax Tree (AST); the AST format is specific to Julia. Julia uses LLVM to create machine-specific assembly code; LLVM has its own intermediate representation (IR). The Julia compiler generates LLVM IR to tell LLVM what the generated assembly code should do. The native assembly code is specific to your computer’s architecture. You can see the documentation for these functions in the official manual here.

  1. The AST after parsing
  2. The AST after lowering
  3. The AST after type inference and optimization
  4. The LLVM IR
  5. The assembly code

Layer 1: The AST

When the parser takes your code in (as a String), it will produce an AST (Abstract Syntax Tree). The AST is the compiler’s representation of your code. It is like you turning my written sentances into your mental representation of their structure/meaning. If you’re familiar with writing macros in Julia, this will be old news to you. This representation is not saved, so if we want to see it, we’ll need to quote the expression.

julia> :(2 + 2)
:(+(2,2))

Above, we can see that the infix + operator just becomes a function call. This is identical to if you use + as a normal function:

julia> :(+(2,2))
:(+(2,2))

Slightly more interesting:

julia> :(1 + 2 + 3 + 4 + 5)
:(+(1,2,3,4,5))

julia> :(1 + 2 - 3 - 4 + 5)
:(+(-(-(+(1,2),3),4),5))

julia> :(1-2-3-4-5)
:(-(-(-(-(1,2),3),4),5))

This lets you see that + becomes one function call even with a lot of args, while - does not.

This is the same quoting used in macros; you can also use a quote block, such as:

quote
  2 + 2
end

Layer 2: The Lowered AST

code_lowered(generic_function, (types_arg_list,))

While quoting will work on any expression, the rest of these layers involve calling a function that takes a generic function and a tuple of argument types. For example, code_lowered(linear_foo,()) returns the lowered AST of our function from the start of this post.

code_lowered will return the lowered AST for any method of a generic function. Lowering in general is the process of moving from surface syntax (highest) to machine code (lowest). Here, lowering involves transforming the AST in ways that make it simpler. This includes unnesting some expressions and desugaring some syntax into the function calls indicated.

The lowered AST is stored for every generic function. This will work on methods you write and on ones in packages and on ones from the base libraries. code_lowered is a normal generic function: it will work from the REPL and from any Julia code you write.

Examples

You can call it on one of the simple functions we defined earlier:

julia> code_lowered(linear_foo,())
1-element Any Array:
 :($(Expr(:lambda, {}, {{:x,:y},{{:x,:Any,18},{:y,:Any,18}},{}}, quote  # none, line 2:
        x = 4 # line 3:
        y = 5
        return 5
    end)))

Or you could call it on a built-in function from Base:

julia> code_lowered(+,(Int,Int))
1-element Any Array:
 :($(Expr(:lambda, {:x,:y}, {{},{{:x,:Any,0},{:y,:Any,0}},{}}, quote  # int.jl, line 36:
        return box(Int64,add_int(unbox(Int64,x),unbox(Int64,y)))
    end)))

The + function also has a single-argument method:

julia> +(5)
5

If you want to make a unary tuple, use a trailing comma:

julia> cl = code_lowered(+,(Int,)) #trailing comma to make (Int,) a tuple
1-element Any Array:
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # operators.jl, line 39:
        return x
    end)))

As you can see below, the value you get back is a one-dimensional Any array of Exprs. Expr is the type used to represent an expression in the AST; you also use them when writing macros.

julia> typeof(cl)
Array{Any,1}

julia> typeof(cl[1]) # Julia Arrays are indexed from 1
Expr

code_lowered returns an Array because it sometimes returns multiple (or 0) values. It will return an entry for each matching method:

julia> code_lowered(+,(Any,))
3-element Any Array:
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # bool.jl, line 35:
        return int(x)
    end)))     
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # operators.jl, line 39:
        return x
    end)))     
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # abstractarray.jl, line 264:
        return x
    end)))

An example of getting no results:

julia> code_lowered(+,(String,String))
0-element Any Array

There is no + for Strings because Julia uses * as the string concatenation operator.

julia> code_lowered(*,(String,String))
1-element Any Array:
 :($(Expr(:lambda, {:(s::top(apply_type)(Vararg,String))}, {{},{{:s,:Any,0}},{}}, quote  # string.jl, line 72:
        return top(apply)(string,s)
    end)))

It’s easier to see what lowering does if you take a look at examples involving control flow. For example, if you define this function:

function myloop(x::Int)
  result = 0  
  for i=1:x
    result += x
  end
  result
end

You can see a loop in the lowered code:

julia> code_lowered(myloop,(Int,))
1-element Any Array:
 :($(Expr(:lambda, {:x}, {{:result,:#s6,:#s5,:i},{{:x,:Any,0},{:result,:Any,2},{:#s6,:Any,2},{:#s5,:Any,18},{:i,:Any,18}},{}}, quote  # none, line 2:
        result = 0 # line 3:
        #s6 = 1
        #s5 = x
        1: 
        unless top(<=)(#s6,#s5) goto 2
        i = #s6 # line 4:
        result = +(result,x)
        3: 
        #s6 = top(convert)(top(typeof)(#s6),top(+)(1,#s6))
        goto 1
        2: 
        0:  # line 6:
        return result
    end)))

If you want to see what happens to an if-statment, you could use this example:

function lessthan5(x::Int)
  if x < 5
    return true
  else
    return false
  end
end

You can see that, like the loop, this is also lowered into an unless and a goto.

julia> code_lowered(lessthan5,(Int,))
1-element Any Array:
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # none, line 2:
        unless <(x,5) goto 0 # line 3:
        return true
        goto 1
        0:  # none, line 5:
        return false
        1: 
    end)))

Layer 3: The Type-inferred, optimized AST

code_typed(generic_function, (types_arg_list,))

code_typed returns the type-inferred and optimized version of the Julia AST. It is the last layer that is internal to Julia.

How to Call code_typed

You need to be extra careful to use trailing commas for single-argument functions with code_typed:

julia> code_typed(+,(Int))
0-element Any Array

julia> code_lowered(+,(Int))
ERROR: no method code_lowered(Function,DataType)

julia> code_typed(+,(Int,))
1-element Any Array:
 :($(Expr(:lambda, {:x}, {{},{{:x,Int64,0}},{}}, quote  # operators.jl, line 39:
        return x::Int64
    end)))

Structure of the Return Value

You should be getting an Array of Exprs back. This value can be a bit complicated and hard to understand. It has three fields: head, args, and typ. (You can find this out by calling the function names on it.)

  • head is a Symbol that tells you what kind of expression this is. For Exprs returned by code_typed, this will always be :lambda.

  • typ is a DataType. Currently, it will always be Any for Exprs returned by code_typed.

  • args is a 1-dimensional Any Array (Array{Any,1}). It’s the interesting part: it contains information about the body of the function and the variables used there.

There are three parts to args:

  1. Symbols of the names of function arguments. This has type Array{Any,1}.
  2. An Array{Any,1} of length 3. It contains details about all variables used in the function (local, captured, and arguments).
  3. An Expr representing the body of the generic function.

The middle part (2) above has more structure to examine:

  1. An Array{Any,1} of Symbols. This contains a Symbol for the name of each local variable.
  2. An Array{Any,1} of length-3 Array{Any,1}s describing each local variable and argument. I’ll get to the format of the length-3 Arrays in a moment.
  3. An Array{Any,1} of length-3 Array{Any,1}s describing each captured variable. These entries have the same format as above.

The triple that describes each used variable is an Array{Any,1}. It consists of a Symbol of the variable name, a DataType for the inferred type of the variable, and an Int64 of bit flags describing how the variable is used.

The lowest 5 bits of the Int64 are used as bit flags.4 From most to least significant, these bits represent whether the variable: [is assigned once][is const][is assigned by inner function][is assigned][is captured]. So, if no bits are set, then the value will be 0. If a variable is captured, but not const or assigned to, then it will have a value of 1. If a local variable is assigned to, then it would have a value of 2.

An Example: 0-args, just assigning to local vars

The function:

function foo()
  x = 4
  y = 5
end

The result of code_typed(foo,()):

1-element Any Array:
 :($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote  # none, line 2:
        x = 4 # line 3:
        y = 5
        return 5
    end)))
  • the .head field is :lambda, as expected.
  • the .typ field is Any, also as expected.
  • the .args field is:
3-element Any Array:
{}                                                                   
{{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}                           
quote  # none, line 2:
   x = 4 # line 3:
   y = 5
   return 5
end

Let’s talk more about args:

  • .args[1] is empty because we took no arguments.
  • .args[2][1] contains the names of our two local variables, x and y.
  • .args[2][2] contains a description of each of those local variables. The Int64 values indicate that x and y have been inferred to be of that type, despite no type annotations in the code. The value 18 indicates that the set bit flags are “is assigned once” (16) and “is assigned by inner function” (4).
  • .args[2][3] is empty because we did not capture any variables.
  • .args[3] is the Expr representing the body of the function. You may notice that it is nearly identical to the original version of the code.

Layer 4: LLVM IR

code_llvm(generic_function, (types_arg_list,))

Calling code_llvm prints the LLVM IR for the function. This is going to be more unfamiliar-looking that the previous layers, since it looks like a complicated kind of assembly code, rather than being Julia-specific. It also differs in that it prints out the code, not returning a maniputable value to you.

Usage Examples

julia> code_llvm(linear_foo,())

define i64 @julia_linear_foo() {
top:
  ret i64 5, !dbg !4355
}

julia> code_llvm(+,(Int,))

define i64 @"julia_+823"(i64) {
top:
  ret i64 %0, !dbg !4361
}

julia> code_llvm(+,(Int,Int))

define i64 @"julia_+824"(i64, i64) {
top:
  %2 = add i64 %1, %0, !dbg !4367
  ret i64 %2, !dbg !4367
}

Note that now trying to get multiple results is going to end in an error:

julia> code_llvm(+,(Any,))
ERROR: no method found for the specified argument types
 in _dump_function at reflection.jl:110
 in code_llvm at reflection.jl:115

Happily, accidently non-tuple types also result in an error:

julia> code_llvm(+,(Int))
ERROR: no method code_llvm(Function,DataType)

Layer 5: Assembly Code

code_native(generic_function, (types_arg_list,))

Calling code_native prints the native assembly code for the specified method.

Usage Example

julia> code_native(+,(Int,Int))
	.text
Filename: int.jl
Source line: 36
	push	RBP
	mov	RBP, RSP
Source line: 36
	add	RDI, RSI
	mov	RAX, RDI
	pop	RBP
	ret

Calling it with a non-tuple or for signatures that don’t exist results in an error:

julia> code_native(+,(Int))
ERROR: no method code_native(Function,DataType)

julia> code_native(+,(Any,))
ERROR: no method found for the specified argument types
 in _dump_function at reflection.jl:110
 in code_native at reflection.jl:116

##Footnotes:


  1. The Julia REPL and normal Julia code (in files) are equally powerful and have all the same capabilities. These are not interpreter directives, like :t in GHCI. ↩︎

  2. If your function has a name, it’s a generic function. ↩︎

  3. The @which macro lets you see which method of a function would be called with a particular set of arguments, without actually calling it. For example:

    :::jl
    julia> @which 2+2
    +(x::Int64,y::Int64) at int.jl:36
    

    This means that you should look in the Julia source tree, in the base folder, for a file called int.jl, and you’ll find the method for +(Int64,Int64) defined there.

    It’s less helpful for functions defined in the REPL:

    :::jl
    julia> @which lessthan5(4)
    lessthan5(x::Int64) at none:2
    
    ↩︎
  4. To see for yourself what the bit flags field is, you should take a look around lines 2357-2381 of julia/src/julia-syntax.scm:

    :::scm
    ;; record whether var is captured
    (define (vinfo:set-capt! v c) (set-car! (cddr v)
                                            (if c
                                                (logior (caddr v) 1)
                                                (logand (caddr v) -2))))
    ;; whether var is assigned
    (define (vinfo:set-asgn! v a) (set-car! (cddr v)
                                            (if a
                                                (logior (caddr v) 2)
                                                (logand (caddr v) -3))))
    ;; whether var is assigned by an inner function
    (define (vinfo:set-iasg! v a) (set-car! (cddr v)
                                            (if a
                                                (logior (caddr v) 4)
                                                (logand (caddr v) -5))))
    ;; whether var is const
    (define (vinfo:set-const! v a) (set-car! (cddr v)
                                            (if a
                                                (logior (caddr v) 8)
                                                (logand (caddr v) -9))))
    ;; whether var is assigned once
    (define (vinfo:set-sa! v a) (set-car! (cddr v)
                                            (if a
                                                (logior (caddr v) 16)
                                                (logand (caddr v) -17))))
    
    ↩︎