Jump to content
Search In
  • More options...
Find results that contain...
Find results in...

Optimizations in your C++ Program


Admiral Refuge
 Share

Recommended Posts

Hi everyone.
This is my very first real article on anything programming related, so bare with me.

I'm not actually going to go too far into this, just going to bring up some small things that you can do to make your programs more efficient (sometimes slightly, other times greatly).

Now, for those compiling with gcc, it's a pretty smart compiler, and it will most likely assume what you're trying to do, and correct it accordingly.  Though the same cannot be said about every compiler, or every version of every compiler; such as the while vs for (I'll get to that in a moment); I think microsoft and borland's compile it literally (optimization, aside).
So it's better to be safe than sorry, especially if you don't know if your code will compiler on various platforms.

**while() vs for()**
_NOTE:My compiler ended up auto-optimizing this, so I couldn't create some assembly code to match my argument (though you can google it up, and you'll find some code generated on other compilers which support this argument), but I can generate some assembly that would look similar, if you want_.

There can be abit of controversy when it comes to for vs while for an infinite loop.
In a literal compilation, while() and for() will not do the same thing.
Consider the following two:
```
while(1)
  std::cout<<"oh noes!"<What the program may compile to in the first example (via pseudo-assembly) is:
```
label:
Check if '1'
print "oh noes!" endline
jump to 'label'

```Here's the second code:
```
label:
print "oh noes!" endline
jump to 'label'

```Where you see "", that's actually where the two semi-colons for the for loop are.
So in short..
A while() loop, since it's conditional, will validate to make sure that the condition '1' is true, if so, it will continue the loop, whereas a for() statement would have no statement to condition, so it simply loops.
One fun thing I like to do in my code, is this:
```
#define EVER ;;
for(EVER) //muhaha!

```
**Postfix vs Prefix operators**
One thing I see alot in code, is the use of 'var++' over '++var'.
Protip: Unless you need the value of 'var' before incrementing it, use pre-incremental.
You may think, "what harm can ++i do?".  Honestly, not that much, if it's just an integer.
See, when you use ++i, it increments i, and returns it's value.  When you're using i++, it stores the value of i, increments it, and returns it's old value.
In assembly, this is just switching of three commands, but what about with objects?
A great example I found [here](http://www.codeguru.com/forum/showthread.php?t=231052), is the following:
```
class MyClass
{
public:
    MyClass& operator++()    //Prefix increment operator  (++x)
    {
        //Perform increment operation
        return *this;
    }

    MyClass  operator++(int) //Postfix increment operator (x++)
    {
        MyClass temp = *this;
        //Perform increment operation
        return temp;
    }
};

```As you see, a new object is initialized, holding the old object, and returns that object.
This means that you're taking a performance shot to the stack every time you use postfix.

**Dynamic allocation vs static (for lack of a better term) allocation.**
First off, let me define dynamic and static:
```
{
  //static:
  object obj;
  int foo;
  foo = 0;
  obj.setValue(foo);
}
{
  //dynamic:
  object *obj = new object;
  int *foo = new int;
  *foo = 0;
  obj->setValue(*foo);
  delete foo;
  delete obj;
}
//Please excuse my dereferencing if it's wrong.

```Now, the primary difference between the dynamic and static, is that when you dynamically allocate data, it is saved in the heap; whereas if you statically allocate it, it is saved on the stack.
So for large objects, it's a good idea to dynamically allocate it, to preserve the stack.
Another benefit of dynamically allocated objects, is that they don't have to follow the law of scope, and in a sense, they never go out of scope until they are manually deleted.

**Two-Dimensional Array Access**
Access wouldn't be the word – I should say traverse a two-dimensional array.
But the question is, which is better to use?
ary[x][y] or ary[y][x]?

It may not seem like there's any difference, but one is actually much slower than the other.
Here's an example of what the array foo[3][3] would look like in memory (don't question my epic pant skillz ;D):
![](http://i49.tinypic.com/2qdslqv.png)
Putting more into perspective, consider the two:
```
for(x=0; x<3; ++x)
  for(y=0; y<3; ++y)
    foo[x][y] = 0;
for(x=0; x<3; ++x)
  for(y=0; y<3; ++y)
    foo[y][x] = 0;

```Here is what the first five steps of the first for/for loop is doing in memory:
(the red dot represents where in memory the program is pointing to, and the arrow represents where that pointer is pointing to).
![](http://i46.tinypic.com/wt8zgo.png)
As you can see, it's using the pointer foo[0] to reach each location of memory that is next to it.
Next, we look at the second part:
![](http://i46.tinypic.com/2ewpgkw.png)

As you can see, the second way, the data pointer is throwing itself all over memory, instead of moving though it one block at a time.
To prove it's length, here is a program I threw together (well, I modified it from the original C program my professor used last year) that creates an array that's 0xF00*0xF00 in size, and traverses it:
http://pastebin.com/f45636265
Here are the results for when I ran it:
x->y: 0.0500
y->x: 0.1000
As expected, the latter was about half as slow.

**switch() vs else if**
I generated some assembly output for this one (I generated assembly output for the rest as well, but those aren't worth looking at).
One thing I try to do, is use switch() as much as possible, as opposed to else if.
Consider the following two programs:
```
int main()
{
  int myVar = 0;
  int foo;
  if(myVar == 1)
    ++myVar;
  else if(myVar == 2)
    --myVar;
  else if(myVar ==  3)
    myVar++;
  else if(myVar ==  4)
    myVar--;
  else if(myVar ==  5)
    --myVar;
  else if(myVar ==  6)
    myVar=0x3;
  else if(myVar ==  7)
    myVar=0xff;
  else if(myVar ==  8)
    myVar=0xff;
  else if(myVar ==  9)
    myVar=0x2a;
  else if(myVar ==  10)
    myVar=0x801;
  else if(myVar ==  11)
    myVar=0x6;
  else if(myVar ==  12)
    myVar=0xfa;
  else if(myVar ==  13)
    myVar=0x0354;
  else if(myVar ==  14)
    myVar=myVar+8;
  else if(myVar ==  15)
    myVar=415;
  else
    myVar=0xF;
  foo = myVar;
return 0;
}

``````
int main()
{
  int myVar = 0;
  int foo;
  switch(myVar)
  {
  case 1:
      ++myVar;
      break;
    case 2:
    --myVar;
    break;
  case 3:
    myVar++;
    break;
  case 4:
    myVar--;
    break;
  case 5:
    --myVar;
    break;
  case 6:
    myVar=0x3;
    break;
  case 7:
    myVar=0xff;
    break;
  case 8:
    myVar=0xff;
    break;
  case 9:
    myVar=0x2a;
    break;
  case 10:
    myVar=0x801;
    break;
  case 11:
    myVar=0x6;
    break;
  case 12:
    myVar=0xfa;
    break;
  case 13:
    myVar=0x0354;
    break;
  case 14:
    myVar=myVar+8;
    break;
  case 15:
    myVar=415;
    break;
  }
  foo = myVar;
  return 0;
}

```The brackets are left out, as it would make the else if too long.
Now, when this is compiled, the first program would end up compiling into a mess of cmp and jmp commands, one-by-one, working its way down.
If myVar happened to be 15, it would end up doing 15 conditionals.  The worst-case scenario here, would be that myVar isn't a number listed, and the result would be a test of every possible condition, until it reaches the else.

As for the switch(), that actually manages to be pretty efficient (well… VERY efficient).
This is because the second will compile into jump tables -- or hash maps.  This means that there is only _one_ test done, and it knows which to jump to.
You can find the assembly output for the first program here: http://pastebin.com/f511c7140 and the second here: http://pastebin.com/f654d9226

We now have a situation though.  And that's the question of "what should we do with strings?".
I ran into this exact problem when I was writing my virtual machine, well, more specifically, my assembler.
In my first version, I was using a mess of else if and strcmp; here's a snippet of my parser, so you can see what I mean:
```
  char* first = new char[256];
  //Get first 4 characters
  while(line[j]!='\0' && line[j]!=' ' && line[j] != '\n' && line[j]!=';' && line[j]!='\r')
    first[j] = line[j++]; //Get instruction
  first[j++]='\0';
/*Flow*/
  if(!(strcmp("jmp", first)))
  {
    if(jumps(line, second, j, result, table, size))
    {
      result.dword = 1;
      return result;
    }
    else
      result.byte[3] = 0x20;//Set the highest byte
  }
  else if(!(strcmp("jz", first)))
  {
    if(jumps(line, second, j, result, table, size))
    {
      result.dword = 1;
      return result;
    }
    else
      result.byte[3] = 0x24;//Set the highest byte
  }
  else if(!(strcmp("jg", first)))
  //etc

```This honestly _was_ killing me!
The reason for this issue, is because a string (whether it's the std string, or a char*) is pretty much an array of characters, and you can't treat an array as one value.

My ultimate solution, was to hash the string, throw the solution keys into an enumeration, and run a switch() on the key – it greatly improved the efficiency, and the num helped make things more sensible.

There are other solutions though; one, is to use nested switch() statements -- they have been used professionally before, but at the cost of making your wonderful code look horrible, also, the efficiency drops the more nested switches you use; here's an example that does something for the following strings: "jmp", "je", "cat"
```
switch(str[0])
{
  case 'j':
    switch(str[1])
    {
      case 'm':
      //do stuff for jmp
      break;
      case 'e':
      //do stuff for je
      break;
    }
    break;
  case 'c':
    switch(str[1])
    {
      case 'a':
      //do stuff for "cat"
      break;
    }
    break;
}
```
The second, is to use the STL Container classes, and use the strings as the key, and just have the entries point to functions like in a hash table.

Other than that, I can't really think of any other ways of doing this.

**Inline vs Non-Inline**
Just a quickie on here.
If you call a certain function many times (esepcially a small function, such as void "addXY(int& x, int& y, int& z) {z = x+y;}", you may want to consider making it inline.
Use inline sparingly though, because it doesn't always result in faster code, and usually results in _longer_ code

**Preprocessors**
Another controversy is whether to use #define (and macros) or use typedef (or functions).
Since the preprocessors are made during compile-time rather than run-time, they will be faster, but I personally use typedef over #define due to the fact I feel it's safer to work to work with type-safe objects then constants (you can fund much discussion on this subject).

**Compiler Flags**
For those using GCC, there are afew flags you should be aware of when compiling your code.
First, the -O (not -o) flag, you can choose the level of compiler-size optimization for your code:
-O0
This option turns all optimizations off.
-O1
This option will try to reduce the code's size and efficiency while trying not to slow down the compiling process.
-O2
Level two optimization will enable as just about all the compiler optimizations that shouldn't hurt your program.  This one may show down the actual time it will take to compile, but it's worth it.
I'd always recommend -O2
-O3
This turns all the optimizations on; there's a chance it will break your code though, I don't touch it.
-mtune and -march
mtune will tune your program to your CPU type, and march will generate assembly for your specific CPU (it will assume that mtune=march).
If you don't mind your program only working on your computer, you can use:
-march=native
Which will detect your CPU type, and act accordingly.
If you want your binary to execute on any machine (well, any x86), you can use -march=generic (or -mtune=generic); it will use all of those slow legacy instructions for the 8086 :O
You can see the rest of the -m optimizations here:
http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

Well, I hope this helped someone – I enjoyed writing it atleat :P

```
Link to comment
Share on other sites

@Admiral Refuge: arrays with more than one index shouldn't be used as they aren't properly standardised in C compilers (a.k.a. what might seem to work in one compiler, might break the whole application in another). Basically, I don't even know why they are part of the C/C++ languages.

btw. [the solution](http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html) for everything (in C/C++).

Also, try to stick to low-level features as much as possible. Classes and such might be cool and neat to use, but when you have to use templated operator overloading, it will surely be a pain to get it working with just **one** compiler (I love how the C/C++ standards are flawed).

One of the best _optimisation_ techniques in C/C++ is to get the stuff you can get done at compile-time done at compile-time instead of run-time. If you have some heavy calculations to be done like vectors, quaternions or whatever then use SIMD (SSE, 3DNow!, MMX, Altivec, etc.).

Also, allow me to make you aware of the fact that _formal_ C/C++ is incredibly slow compared to Assembly as the whole language is stack-oriented and not register-oriented. So prevent having as much functions and class methods as possible. If the calls aren't inlined, you are basically screwed.

Regards,
  Godlord.
Link to comment
Share on other sites

  • 4 months later...
Hmm weird that an array of arrays isn't supported (Or is a multi-dimensional array not an array of arrays ?).

Also templates are a tad flawed with MSVC it will allow them to be inside a class while with for example GCC they can't be.
Link to comment
Share on other sites

@Xeross:

> Hmm weird that an array of arrays isn't supported (Or is a multi-dimensional array not an array of arrays ?).
>
> Also templates are a tad flawed with MSVC it will allow them to be inside a class while with for example GCC they can't be.

They aren't the same and they're both supported, but multi-dimensional arrays aren't properly standardised because compilers might decide the order of the elements themselves.

ex. 1: array of arrays.
```
char **objArray = new char *[4];
objArray[0] = new char[5];
objArray[1] = new char[8];
objArray[2] = new char[3];
objArray[3] = new char[20];
```
ex. 2: multi-dimensional array.
```
char **objArray = new char [4][4];
```

Regards,
  Stephan.
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...