java中List的sort源码解读

最近看了一些排序相关的文章，因此比较好奇，Java中的排序是如何做的。本片文章介绍的是JDK1.8，List中的sort方法。

先来看看List中的sort是怎么写的：

@SuppressWarnings({"unchecked", "rawtypes"})
default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();
    Arrays.sort(a, (Comparator) c);
    ListIterator<E> i = this.listIterator();
    for (Object e : a) {
        i.next();
        i.set((E) e);
    }
}

首先，你需要传入一个比较器作为参数，这个好理解，毕竟你肯定要定一个比较标准。然后就是将list转换成一个数组，再对这个数组进行排序，排序完之后，再利用iterator重新改变list。

接着，我们再来看看Arrays.sort：

public static <T> void sort(T[] a, Comparator<? super T> c) {
    if (c == null) {
        sort(a);
    } else {
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a, c);
        else
            TimSort.sort(a, 0, a.length, c, null, 0, 0);
    }
}

public static void sort(Object[] a) {
    if (LegacyMergeSort.userRequested)
        legacyMergeSort(a);
    else
        ComparableTimSort.sort(a, 0, a.length, null, 0, 0);
}

static final class LegacyMergeSort {
    private static final boolean userRequested =
        java.security.AccessController.doPrivileged(
            new sun.security.action.GetBooleanAction(
                "java.util.Arrays.useLegacyMergeSort")).booleanValue();
}

这样可以看出，其实排序的核心就是TimSort，LegacyMergeSort大致意思是表明如果版本很旧的话，就用这个，新版本是不会采用这种排序方式的。

我们再来看看TimSort的实现：

private static final int MIN_MERGE = 32;
static <T> void sort(T[] a, int lo, int hi, Comparator<? super T> c,
                     T[] work, int workBase, int workLen) {
    assert c != null && a != null && lo >= 0 && lo <= hi && hi <= a.length;

    int nRemaining  = hi - lo;
    if (nRemaining < 2)
        return;  // Arrays of size 0 and 1 are always sorted

    // If array is small, do a "mini-TimSort" with no merges
    if (nRemaining < MIN_MERGE) {
        // 获得最长的递增序列
        int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
        binarySort(a, lo, hi, lo + initRunLen, c);
        return;
    }

    /**
     * March over the array once, left to right, finding natural runs,
     * extending short natural runs to minRun elements, and merging runs
     * to maintain stack invariant.
     */
    TimSort<T> ts = new TimSort<>(a, c, work, workBase, workLen);
    int minRun = minRunLength(nRemaining);
    do {
        // Identify next run
        int runLen = countRunAndMakeAscending(a, lo, hi, c);

        // If run is short, extend to min(minRun, nRemaining)
        if (runLen < minRun) {
            int force = nRemaining <= minRun ? nRemaining : minRun;
            binarySort(a, lo, lo + force, lo + runLen, c);
            runLen = force;
        }

        // Push run onto pending-run stack, and maybe merge
        ts.pushRun(lo, runLen);
        ts.mergeCollapse();

        // Advance to find next run
        lo += runLen;
        nRemaining -= runLen;
    } while (nRemaining != 0);

    // Merge all remaining runs to complete sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;
}

如果小于2个，代表不再不需要排序；如果小于32个，则采用优化的二分排序。怎么优化的呢？首先获得最长的递增序列：

private static <T> int countRunAndMakeAscending(T[] a, int lo, int hi,
                                                Comparator<? super T> c) {
    assert lo < hi;
    int runHi = lo + 1;
    if (runHi == hi)
        return 1;

    // Find end of run, and reverse range if descending
    if (c.compare(a[runHi++], a[lo]) < 0) { // Descending
        // 一开始是递减序列，就找出最长递减序列的最后一个下标
        while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0)
            runHi++;
        // 逆转前面的递减序列
        reverseRange(a, lo, runHi);
    } else {                              // Ascending
        while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) >= 0)
            runHi++;
    }

    return runHi - lo;
}

接着进行二分排序：

private static <T> void binarySort(T[] a, int lo, int hi, int start,
                                   Comparator<? super T> c) {
    assert lo <= start && start <= hi;
    if (start == lo)
        start++;
    for ( ; start < hi; start++) {
        T pivot = a[start];

        // Set left (and right) to the index where a[start] (pivot) belongs
        int left = lo;
        int right = start;
        assert left <= right;
        /*
         * Invariants:
         *   pivot >= all in [lo, left).
         *   pivot <  all in [right, start).
         */
        // start位置是递增序列后的第一个数的位置
        // 从前面的递增序列中找出start位置的数应该处于的位置
        while (left < right) {
            // >>> 无符号右移
            int mid = (left + right) >>> 1;
            if (c.compare(pivot, a[mid]) < 0)
                right = mid;
            else
                left = mid + 1;
        }
        assert left == right;

        /*
         * The invariants still hold: pivot >= all in [lo, left) and
         * pivot < all in [left, start), so pivot belongs at left.  Note
         * that if there are elements equal to pivot, left points to the
         * first slot after them -- that's why this sort is stable.
         * Slide elements over to make room for pivot.
         */
        int n = start - left;  // The number of elements to move
        // Switch is just an optimization for arraycopy in default case
        // 比pivot大的数往后移动一位
        switch (n) {
            case 2:  a[left + 2] = a[left + 1];
            case 1:  a[left + 1] = a[left];
                     break;
            default: System.arraycopy(a, left, a, left + 1, n);
        }
        a[left] = pivot;
    }
}

好了，待排序数量小于32个的讲完了，现在来说说大于等于32个情况。首先，获得一个叫minRun的东西，这是个啥含义呢：

int minRun = minRunLength(nRemaining);
private static int minRunLength(int n) {
    assert n >= 0;
    int r = 0;      // Becomes 1 if any 1 bits are shifted off
    while (n >= MIN_MERGE) {
        // 这里我没搞懂的是为什么不直接将(n & 1)赋值给r，而要做一次逻辑或。
        r |= (n & 1);
        n >>= 1;
    }
    return n + r;
}

各种位运算符，MIN_MERGE默认为32，如果n小于此值，那么返回n本身。否则会将n不断地右移，直到小于MIN_MERGE，同时记录一个r值，r代表最后一次移位n时，n最低位是0还是1。
其实看注释比较容易理解：

Returns the minimum acceptable run length for an array of the specified length. Natural runs shorter than this will be extended with binarySort.
Roughly speaking, the computation is: If n < MIN_MERGE, return n (it's too small to bother with fancy stuff).
Else if n is an exact power of 2, return MIN_MERGE/2.
Else return an int k, MIN_MERGE/2 <= k <= MIN_MERGE, such that n/k is close to, but strictly less than, an exact power of 2. For the rationale, see listsort.txt.

返回结果其实就是用于接下来的合并排序中。

接下来就是一个while循环

do {
    // Identify next run
    // 获得一个最长递增序列
    int runLen = countRunAndMakeAscending(a, lo, hi, c);

    // If run is short, extend to min(minRun, nRemaining)
    // 如果最长递增序列
    if (runLen < minRun) {
        int force = nRemaining <= minRun ? nRemaining : minRun;
        binarySort(a, lo, lo + force, lo + runLen, c);
        runLen = force;
    }

    // Push run onto pending-run stack, and maybe merge
    // lo——runLen为将要被归并的范围
    ts.pushRun(lo, runLen);
    // 归并
    ts.mergeCollapse();

    // Advance to find next run
    lo += runLen;
    nRemaining -= runLen;
} while (nRemaining != 0);

这样，假设你的每次归并排序的两个序列为r1和r2，r1肯定是有序的，r2也已经被排成递增序列了，因此这样的归并排序就比较特殊了。

为什么要用归并排序呢，因为归并排序的时间复杂度永远为O(nlogn)，空间复杂度为O(n)，以空间换取时间。

好了，以上就是针对Java中的排序做的一次总结，但具体的归并代码还没有分析，其实我自己也没有完全研究透，为什么minRun的取值是这样的，这也和TimSort中的stackLen有关，有兴趣的小伙伴可以在下方留言，我们可以一起探讨。