In my last installment, we added the ability to create new steering behaviors to Bayts using the Lua scripting language. Unfortunately, the resulting implementation was horribly slow. So today we’ll explore using a very important tool: the profiler. For this article, I’ll be using Shark, Apple’s free profiler. These techniques apply just as well to profilers on other platforms.

It’s been several months since I last looked at this material, mainly because I’ve been spending time with my new son, so some of it is a little rusty in my memory. Hopefully I get it all right!

Any time you’re confronted with a performance issue, the first thing you should do is fire up a profiler. It can be tempting to just plow into the code and search for hot spots by hand, and you may even have a fair amount of success going that route. But chances are it’ll take you a lot longer than if you use a profiler. You might also end up making micro-optimizations that have no real impact on performance but make your code less maintainable, and that is something you should definitely avoid.

So, on that note, we’ll run Bayts in Shark and see what we can come up with. For this test, I’m going to use the Time Profile mode, with a time limit of 30 seconds. These are usually the default settings for Shark. Since Bayts is currently so simple, with no user interaction and no real launch time work, we can just launch the app and let it run for the 30 seconds, then stop it when it’s done, and we’ll get a pretty good sample. In more complicated projects, you’ll often need to get to a certain point in the game before starting the sampling. Either way, you’ll want to launch the app, then find it in Shark’s processes list and either click Shark’s Start button, or press Option+Esc.

Running the current version of Bayts through Shark’s Time Profile will give you something looking a little like this:

[Image: Shark Profile 1]

From this, we can see that we’re spending a bit over 13% of our time in a Lua function, luaS_newlstr. We could easily blame Lua at this point, but it would be better to see where this function is being called from, since the cost might still be due to inefficiencies in our code. Shark lets us drill down for a closer look:

[Image: Shark Profile 2]

Now we see that nearly all of the time in luaS_newlstr is within one of our functions, lua_pushbayt. As you’ll recall from last time, this function pushes all of the information about a Bayt object onto Lua’s stack so it can be used in a Lua function.

I didn’t dig too deeply into the Lua code, but I’m willing to bet that there’s not much we could do to improve the performance of luaS_newlstr. So how about avoiding it altogether, or at least minimizing how often we call it?

To start, we need to look at what code path actually leads us through luaS_newlstr, since we aren’t calling it directly from lua_pushbayt. Fortunately, it’s not too hard to figure out, especially since we already have a snapshot of the appropriate call stack in Shark. Looks like there’s a function in between, lua_setfield. A quick glance at the source for lua_pushbayt confirms that we call this function a boatload of times:

    void lua_pushbayt(lua_State* L, const Bayt& b)
    {
        lua_createtable(L, 0, 12);

        int tableIdx = lua_gettop(L);

        lua_pushvector(L, b.getPosition());
        lua_setfield(L, tableIdx, "position");

        lua_pushvector(L, b.getLastPosition());
        lua_setfield(L, tableIdx, "last_position");
        …

It goes on from there for quite a while.
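To get a feel for why every lua_setfield call lands in luaS_newlstr: the key arrives as a plain C string, so Lua first has to turn it into one of its own interned strings, which means hashing the characters and looking them up in a string table on each call. The sketch below is a toy model of that idea and not Lua's actual code. ToyInterner, simulatePushBayt, and kFields are hypothetical names invented for illustration, and only the five field names that appear in this article are used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Toy model of string interning. Hypothetical names throughout; this is
// not Lua's implementation, only the general shape of the work involved.
class ToyInterner
{
public:
    // Returns the address of the single shared copy of `key`. Every call
    // hashes the characters and probes the table, even on a repeat key.
    const std::string* intern(const char* key)
    {
        assert(key != nullptr);
        ++lookups_;
        return &*table_.insert(std::string(key)).first;
    }

    std::size_t lookups() const { return lookups_; }
    std::size_t uniqueStrings() const { return table_.size(); }

private:
    std::unordered_set<std::string> table_;
    std::size_t lookups_ = 0;
};

// Simulates building one Bayt table. Only the five field names that
// appear in the article are used; the real function sets more.
inline void simulatePushBayt(ToyInterner& interner)
{
    static const char* const kFields[] = {
        "position", "last_position", "id", "velocity", "minUrgency"
    };
    for (const char* field : kFields)
    {
        interner.intern(field);
    }
}
```

In this model, pushing 100 Bayts performs 500 lookups even though only five distinct strings ever exist. The repeated lookups, not the creation of new strings, are the cost that scales with the number of Bayts.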

Do we really need to put all of this information on Lua’s stack? Let’s take a look at the one steering behavior we implemented in Lua:

    function SteeringTest(bayt, friends, enemies)
       local change = { [0] = 0, [1] = 0, [2] = 0 }
       local i

       for i = 1, friends.NearbyBaytCount - 1, 1 do
          local index = friends.indexes[i]

          if friends[index].id ~= bayt.id then
             change = vectorAdd(change, friends[index].velocity)
          end
       end

       change = setVectorMagnitude(change, bayt.minUrgency)

       return change
    end

In this example, it looks like we inspect only three fields from any given Bayt: id, velocity, and minUrgency. So, for this function at least, we’re pushing a whole lot of stuff onto Lua’s stack that never gets used. And we know that doing so is somewhat expensive. What to do?

We could assume this function is pretty representative and try pushing only these three pieces of data onto the stack. But that is a major assumption backed by almost no evidence, and it has the potential to be far too limiting for people wishing to write new scripts.

Another solution is to not push any data about the Bayt onto the stack, but instead just give the Lua script a handle which it can pass back to some functions we’ll supply to get the data it needs. That seems like a much better general solution. So let’s give it a shot.

Lua has a mechanism called “light user data” which lets you hand a Lua script an opaque pointer. The script cannot look inside it or modify what it points to, but it can pass the value back into your custom code. We’ll use that mechanism to push handles to Bayts onto the Lua stack, instead of the whole object as we did before. What will be the handle? Why, a pointer to the Bayt, of course!

Now our huge lua_pushbayt function has been reduced to a single line of code:

    void lua_pushbayt(lua_State* L, const Bayt& b)
    {
        lua_pushlightuserdata(L, const_cast<Bayt*>(&b));
    }

Of course, we have to provide a way for the Lua script to get data it’s interested in. This will be done with a set of functions taking the form:

    extern "C" int lua_getBaytPosition(lua_State *L)
    {
        validateArgumentCount(L, 1);

        // The handle is the only argument, so it sits at the top of the stack.
        Bayt* b = lua_tobayt(L, lua_gettop(L));

        if (b)
        {
            lua_pushvector(L, b->getPosition());
        }
        else
        {
            lua_pushvector(L, BaytVector(3, 0.0));
        }

        // Number of results left on the stack for the Lua caller.
        return 1;
    }

There is a function like this for every member of a Bayt object that we want to expose to Lua scripts. The function lua_tobayt is as straightforward as lua_pushbayt:

    Bayt* lua_tobayt(lua_State* L, int index)
    {
        return static_cast<Bayt*>(lua_touserdata(L, index));
    }
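The push and fetch functions above are two halves of one round trip: the type is erased on the way into Lua and restored on the way out. Here is a minimal sketch of that round trip with Lua taken out of the picture, so it can be compiled on its own. The Bayt struct is a cut-down stand-in, and toHandle, fromHandle, getMinUrgency, and handleRoundTripExample are hypothetical names that only mirror the shape of lua_pushbayt, lua_tobayt, and the accessor functions.

```cpp
#include <cassert>

// Stand-in for the article's Bayt class (assumption: only two fields modeled).
struct Bayt
{
    int id;
    double minUrgency;
};

// Mirrors lua_pushbayt: erase the type and hand out an opaque pointer.
inline void* toHandle(const Bayt& b)
{
    return const_cast<Bayt*>(&b);
}

// Mirrors lua_tobayt: restore the type. Nothing here can verify that the
// pointer really refers to a live Bayt; the cast simply trusts the caller.
inline Bayt* fromHandle(void* handle)
{
    return static_cast<Bayt*>(handle);
}

// Accessor in the style of lua_getBaytPosition: a null handle yields a
// default value instead of a crash.
inline double getMinUrgency(void* handle)
{
    const Bayt* b = fromHandle(handle);
    return b ? b->minUrgency : 0.0;
}

// Usage example: the handle survives the round trip, and null is harmless.
inline void handleRoundTripExample()
{
    Bayt b{7, 0.25};
    void* h = toHandle(b);
    assert(fromHandle(h) == &b);
    assert(getMinUrgency(h) == 0.25);
    assert(getMinUrgency(nullptr) == 0.0);
}
```

The null check earns its keep in the real code because lua_touserdata returns NULL when the value at that index is not userdata at all. What it cannot catch is a handle to a Bayt that has since been destroyed, so this design relies on scripts not holding on to handles past the call in which they received them.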

So, what does this buy us? Quite a bit, actually. We’ve gone from “go-get-a-cup-of-coffee-between-frames” slow to, on my machine, around 80fps. For comparison, the pure C++ version runs at about 115fps on this machine. Hmm, still losing quite a bit. But, then again, we knew we would when switching to using a scripting language. Still seems like a lot, though… Let’s look at Shark again.

[Image: Shark Profile 2]

It does look like we spend a little bit of extra time allocating and deallocating memory, but nothing too significant. Technically, we could probably do better than we are right now, but to be honest, I don’t think it’s worth it at the moment. If this system becomes more complex, and involves more scripting, then we’ll revisit the issue, but for now, it’s Good Enough.
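For a sense of scale on the frame rates quoted earlier, it can help to convert them into time per frame, since that is the unit a profiler works in. The helpers below are a small sketch of that conversion; frameTimeMs and addedFrameCostMs are hypothetical names, not part of Bayts.

```cpp
#include <cassert>

// Converts a frame rate into the duration of a single frame, in milliseconds.
inline double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time spent per frame when the rate drops from `fastFps` to `slowFps`.
inline double addedFrameCostMs(double fastFps, double slowFps)
{
    return frameTimeMs(slowFps) - frameTimeMs(fastFps);
}
```

With the numbers from my machine, 80fps is 12.5 ms per frame and 115fps is roughly 8.7 ms, so the scripted version costs about 3.8 ms more per frame, or around 44% more time than the pure C++ version.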

The improved version of Bayts can be found, as always, in the Google Code SVN repository:

http://programmicon.googlecode.com/svn/tags/LuaScripting-0.03


2 Comments so far

  1. Jon on July 2, 2008 3:19 pm

    I suggest implementing a simple block allocator in your ScriptManager class. That made a big (15-20%) difference in my (single-threaded) application. Using a block size of 16-32 means that many short-lived allocations are serviced very cheaply.

  2. Andy Molloy on August 20, 2008 9:05 am

    Sounds like a good idea, I’ll give it a try when I get a chance. Thanks!