Bucket and Radix sort

Bucket sort is also called bin sort works by distributing the elements of an array into a number of buckets. The computational complexity estimates involve the number of buckets.

Set up an array of initially empty "buckets".
    Go over the original array, putting each object in its bucket.
    Sort each non-empty bucket.
    Visit buckets in order and put elements back into the original array.

title

If we have to order a continuous range then buckets represent intervals. If we have more identical element then buckets let be some dynamic structure like a queue. Because of FIFO properties we can store the same element in original order.

In our implementation the sort function has a functor parameter. It is the key for ordering. If we have objects and not raw types, we have to implement some comparison operator between them. The key function returns back a comparable data field.

In [ ]:
template<typename Iter, typename Keyfun>
    void bucketSort(Iter begin, Iter end, Keyfun f) {
        typedef typename std::iterator_traits<Iter>::value_type T;
        int minimum = f(*begin), maximum = f(*begin);
        for (auto it = begin; it != end; it++) {
            if (f(*it) < minimum)
                minimum = f(*it);
            if (f(*it) > maximum)
                maximum = f(*it);
        }
        size_t size = maximum - minimum + 1;
        std::queue<T>* buckets = new std::queue<T>[size];
        for (auto it = begin; it != end; it++)
            buckets[f(*it) - minimum].push(*it);
        auto it = begin;
        for (size_t j = 0; j < size; j++) {
            while (!buckets[j].empty()) {
                *(it++) = buckets[j].front();
                buckets[j].pop();
        }
        }
        delete[] buckets;
    }
    

In raw types the key function is the identity function, so returns back itself the item.

In [1]:
int identity(int x) { return x; }
    

The next example we have dates and we’d like to order them. If we have two dates and their year fields are equal we have to check the month fields and if they are also equal the decision remain to day fields.

In [ ]:
struct Date {
        int year, month, day;
        Date(int y, int m, int d) : year(y), month(m), day(d) {}
    
        bool operator<(const Date &rhs) {
            return (this->year < rhs.year || (this->year == rhs.year && (this->month < rhs.month || (this->month == rhs.month && this->day < rhs.day))));
        }
    };
    
    int getYear(Date d) { return d.year; }
    int getMonth(Date d) { return d.month; }
    int getDay(Date d) { return d.day; }
    

A special bucket sort called radix sort keep the original order inside the fields. To achieve this sort first we have to order by day then month and finally year.

In [ ]:
void radixSort(Date* vec, size_t n) {
        bucketSort(vec, vec + n, getDay);
        bucketSort(vec, vec + n, getMonth);
        bucketSort(vec, vec + n, getYear);
    }
    

The main function test the bucket sort with raw data and then test the implemented radix sort with dates.

In [ ]:
#include <iostream>
    #include <random>
    #include <functional>
    #include <ctime>
    
    const int n = 10;
    
    ////////// Using random generator engine //////////
    std::default_random_engine gen((unsigned int)std::time(0));
    std::uniform_int_distribution<int> distribution(1, 50);
    auto randInt = std::bind(distribution, gen); // using randInt() generate integer number between (1-50)
    
    template<class Iter>
    void printArray(Iter begin, Iter end) {
        for (auto i = begin; i != end; i++) {
            std::cout << *i << ", ";
        }
        std::cout << std::endl;
    }
    
    std::cout << "Testing sort algorithms ...\n\n";
    
    int* test = new int[n];
    
    auto generate = [](decltype(test[0]) i){ i = randInt(); };
    std::for_each (test, test + n, generate);
    
    std::cout << "*** Bucket sort: ***\n";
    std::cout << "original:  ";
    printArray(test, test + n);
    
    bucketSort(test, test + n, identity);
    
    std::cout << "sorted:  ";
    printArray(test, test + n);
    std::cout << "\n";
    
    delete[] test;
    
    std::cout << "*** Radix sort: ***\n";
    std::vector<Date> dates;
    while (dates.size() < n) {
        dates.push_back(Date(2010 + rand() % 6, rand() % 13 + 1, rand() % 28 + 1));
    }
    
    auto first = &dates[0];
    radixSort(first, 10);
    
    for (unsigned int i = 0; i < dates.size(); ++i)
        std::cout << dates[i].year << '.' << dates[i].month << '.' << dates[i].day << std::endl;
    
    delete[] test;