I agree with R. Pate and Todd Gardner; a std::set might be a good idea here. Even if you’re stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.

Let’s compare three approaches:

Just using vector, sort + unique

sort( vec.begin(), vec.end() );

vec.erase( unique( vec.begin(), vec.end() ), vec.end() );

Convert to set (manually)

set

unsigned size = vec.size();

for( unsigned i = 0; i < size; ++i ) s.insert( vec[i] );
vec.assign( s.begin(), s.end() );
Convert to set (using a constructor)
set

vec.assign( s.begin(), s.end() );

Here’s how these perform as the number of duplicates changes:

Summary: when the number of duplicates is large enough, it’s actually faster to convert to a set and then dump the data back into a vector.

And for some reason, doing the set conversion manually seems to be faster than using the set constructor — at least on the toy random data that I used.

I redid Nate Kohl’s profiling and got different results. For my test case, directly sorting the vector is always more efficient than using a set. I added a new more efficient method, using an unordered_set.

Keep in mind that the unordered_set method only works if you have a good hash function for the type you need uniqued and sorted. For ints, this is easy! (The standard library provides a default hash which is simply the identity function.) Also, don’t forget to sort at the end since unordered_set is, well, unordered ðŸ™‚

I did some digging inside the set and unordered_set implementation and discovered that the constructor actually construct a new node for every element, before checking its value to determine if it should actually be inserted (in Visual Studio implementation, at least).

Here are the 5 methods:

f1: Just using vector, sort + unique

sort( vec.begin(), vec.end() );

vec.erase( unique( vec.begin(), vec.end() ), vec.end() );

f2: Convert to set (using a constructor)

set

vec.assign( s.begin(), s.end() );

f3: Convert to set (manually)

set

for (int i : vec)

s.insert(i);

vec.assign( s.begin(), s.end() );

f4: Convert to unordered_set (using a constructor)

unordered_set

vec.assign( s.begin(), s.end() );

sort( vec.begin(), vec.end() );

f5: Convert to unordered_set (manually)

unordered_set

for (int i : vec)

s.insert(i);

vec.assign( s.begin(), s.end() );

sort( vec.begin(), vec.end() );

I did the test with a vector of 100,000,000 ints chosen randomly in ranges [1,10], [1,1000], and [1,100000]

The results (in seconds, smaller is better):

range f1 f2 f3 f4 f5

[1,10] 1.6821 7.6804 2.8232 6.2634 0.7980

[1,1000] 5.0773 13.3658 8.2235 7.6884 1.9861

[1,100000] 8.7955 32.1148 26.5485 13.3278 3.9822