KEMBAR78
Release v2.0.0 — Wheelbarrow · Morwenn/cpp-sort · GitHub
Skip to content

v2.0.0 — Wheelbarrow

Latest

Choose a tag to compare

@Morwenn Morwenn released this 21 Sep 17:41

DOI

It's already been a whole decade since the inception of what would soon become cpp-sort, and it had accumulated a number of deprecated components over the years; it was high time to blow all that dust off with a breaking change. It was also a good opportunity to start requiring C++17: it's indeed been a very long time since I last had my hands on a compiler that did not support most C++17 features (even in continuous integration, which made guaranteeing support for older compilers more difficult). That change allows to get rid of a lot of internal polyfills—naturally leading to reduced maintenance effort—, and to slightly improve user experience here and there.

This release also focuses on improving the ecosystem around measures of disorder: over the years, this project has become one of the first results when searching for "measures of presortedness", yet I was forced to admit that my own knowledge of the topic was extremely surface-level, and that the documentation was far from enough when not outright incorrect. Beyond clarifying the difference between the terms "measure of disorder" and "measure of presortedness", the wiki now clearly documents some important axioms and properties, what measures of disorder satisfy them, and how they relate to adaptive sorting. The test suite was made more robust in this area, with RapidCheck being used to perform much more thorough tests of the different axioms, properties, lemmas and theorems from the literature. Special thanks to Slimane Benloucif for all the resources, enlightening conversations, and exchanges of points of views, without whom this release would have been much more tamer on this front.

Overall version 2.0.0 is almost solely focused on quality and making the project a better resource, without adding a single big new feature. Do not worry though, I do plan to bring back features to the table in the near future, with new sorters and measures of disorder on the way. Among my various goals, I would also like to add more safety-related features to the library, and to document some sorting-adjacent topics and implementation tricks from the library in a less rigid form on my blog. What I do manage to work on will obviously depend on my motivation, though I do hope that leaving C++14 behind will help me with that.

Note: if you're an avid reading of the library's release notes, you might remember that I had been teasing a v2.0.0 breaking change for a long time, with ambitious changes. That effort is still undergoing, but does not correspond to the current version: I have hit a lot of road-blockers along the way, and progress has mostly stalled. I can't make any promise about a timeline, but as of today that future version targets C++23.

Removals

cppsort::sort, cppsort::stable_sort and default_sorter (#168)

The very concept of cpp-sort is to put a handful of sorting-related algorithms and tools into the users' hands and to give them some amount of choice. Providing a default sorter kind of went against that, and the one I picked at the time was supposedly "best effort" but truly ad-hoc, likely not answering anybody's needs. Additionally it was a bloated mess, leading to measurable compilation slowdown and unreadable error messages.

If you feel nostalgic, you can still combine the bits and pieces yourself to bring back its spirit:

struct old_default_sorter:
    cppsort::self_sort_adapter<
        cppsort::hybrid_adapter<
            cppsort::small_array_adapter<
                cppsort::low_comparisons_sorter,
                std::make_index_sequence<14u>
            >,
            cppsort::quick_sorter,
            cppsort::pdq_sorter
        >
    >
{};

template<>
struct cppsort::stable_adapter<default_sorter>:
    cppsort::merge_sorter
{
    stable_adapter() = default;

    constexpr explicit stable_adapter(const default_sorter&) noexcept:
        stable_adapter()
    {}
};

cppsort::sort and cppsort::stable_sort (#167)

Despite looking cute and easy to use, these functions basically brought nothing to the table except a well-known interface for default_sorter, which got nuked as mentioned in the previous section. stable_sort was merely a wrapper hiding stable_adapter, and as such not much more useful. Both of the functions had heavy enough template tinkering to make overloads non-ambiguous, which happened to make compile times slower and error messages ever more unreadable when something went wrong.

In v2.0.0, the preferred solution is to use sorters and adapters directly.

drop_merge_sorter, split_sorter and verge_sorter

These sorters have respectively been replaced with drop_merge_adapter, split_adapter and verge_adapter. If you want the old sorters back, you need to explicitly wrap them as follows:

struct drop_merge_sorter:
    cppsort::drop_merge_adapter<cppsort::pdq_sorter>
{
    drop_merge_sorter() = default;
};

struct split_sorter:
    cppsort::split_adapter<cppsort::pdq_sorter>
{
    split_sorter() = default;
};

struct verge_sorter:
    cppsort:verge_adapter<
        cppsort::hybrid_adapter<
            cppsort::pdq_sorter,
            cppsort::quick_merge_sorter
        >
    >
{
    verge_sorter() = default;
};

template<>
struct cppsort::stable_adapter<verge_sorter>:
    cppsort::stable_t<
        cppsort::verge_adapter<
            cppsort::hybrid_adapter<
                cppsort::pdq_sorter,
                cppsort::quick_merge_sorter
            >
        >
    >
{
    stable_adapter() = default;

    constexpr explicit stable_adapter(const verge_sorter&):
        cppsort::stable_t<
            cppsort::verge_adapter<
                cppsort::hybrid_adapter<
                    cppsort::pdq_sorter,
                    cppsort::quick_merge_sorter
                >
            >
        >()
    {}
};

Those reimplementations are also examples of how the existing components of the library compose.

block_sorter

The old block_sorter has been renamed to wiki_sorter a few versions ago. The reason behind that change is that block sorts are nowadays a class of sorting algorithms more than a single algorithm: grail_sorter is another block sort, and many more have been invented in the last decade (many of them by members of the Discord server The Studio, @The-Studio-Discord), some of which differ widely in how they are implemented.

As such if felt unjust to keep the name "block sort" for a specific implementation thereof, especially since there is not one consensual "basic" block sort (unlike insertion, selection or heapsorts). Ironically enough, "wikisort" is also a bad name according to its author, who intended for his project to be "like a wiki around techniques to implement a block sort", but that's what the algorithm is generally called nowadays.

counting_adapter

This old sorter adapter eventually gave birth to a subcategory of adapters with a slightly different interface: metrics. It can be replaced with metrics::comparisons, which similarly counts the number of comparisons performed by a comparison sorter.

Metrics are a category of sorter adapters meant to gather information about a sorter while it is running, that return the gathered information wrapped into cppsort::utility::metric. As such, the main difference between counting_adapter and metrics::comparisons is that the latter returns utility::metric<CountType, comparisons_tag> instead of CountType.

probe::par

When I originally started adding measures of disorder to the library, I was discovering the very concept of disorder, and the literature around it at the same time. Most such measures had no implementation, neither in the paper nor in the wild, and the naming was pretty confusing. As a result lots of mistakes were made: along incorrectly implemented measures, I also accidentally implemented the same measure twice, under two different names: $Dis(X)$ and $Par(X)$. Those measures were introduced in different papers, but are fundamentally the same, as evidenced by Estivill-Castro, Mannila and Wood in Right invariant metrics and measures of presortedness.

A few years later as I was going through the literature and optimizing my implementation, I realized my mistake and removed $Par(X)$, reflecting the decision of Estivill-Castro and Wood to use the name $Dis(X)$ is all of their subsequent papers. It wasn't all for nothing: around the same time I added a section to the documentation describing the various names under which different measures of disorder appear in the literature. It was after realizing that Altman and Igarashi described $Radius(X)$ in Roughly Sorting: Sequential and Parallel Approach, which I thought was a new measure of presortedness before realizing it was yet another name for $Dis(X)$ and $Par(X)$.

utility::static_const

static_const was a hack originally used by Eric Niebler to solve ODR issues while circumventing the lack of inline variables. We used it to create global instances of sorters:

namespace
{
    constexpr auto&& awesome_sort
        = cppsort::utility::static_const<awesome_sorter>::value;
}

With C++17 we get inline variables, which render static_const useless, hence its deletion:

inline constexpr awesome_sorter awesome_sort{};

utility::make_integer_range and utility::make_index_range

Old unused features that don't even interact with the rest of the library.

Other breaking changes

probe::enc

The implementation of probe::enc was changed: the previous implementation returned $Enc(X) - 1$, which does not satisfy all of the axioms needed to make it a measure of presortedness. The new version implements it as $M_{Enc}$ instead, a variation of $Enc$ proposed by Vladimir Estivill-Castro in Sorting and Measures of Disorder that satisfies more axioms of measures of presortedness. It is defined as follows:

$$ M_{Enc}(X)= \begin{cases} 0 & \text{if } X \text{ is sorted,}\\ Enc(X_{tail}) & \text{otherwise, where } X_{tail} \text{ is } X \text{ without its leading ascending run.} \end{cases} $$

The main difference for users is that it returns $1$ more than the previous implementation for some sequences. The result of max_for_size accordingly changes from $\frac{|X| + 1}{2} - 1$ to $\frac{|X|}{2}$.

Miscellaneous

The following changes are less likely to immediately affect code in the wild, but are worth mentioning:

  • utility::projection_base is now a CRTP base class, which can help the compiler to better optimize projection chains (#212).
  • The comparison adapter make_projection_compare was renamed in projection_compare, and the returned type from projection_compare to projection_compare_t. This change makes the naming similar to the other comparison adapters in the library.
  • The O(n²) fallback of the measure of disorder probe::osc was removed: it was used when there wasn't enough heap memory available for the main O(n log n) algorithm, but didn't do so gracefully: it was all or nothing. I wanted to move away from such surprising changes of complexity, but also to get rid of quadratic algorithms as much as reasonable.
  • CMake options that didn't start with CPPSORT_ were removed. Equivalents options with a CPPSORT_ prefix can be used instead.

Bug fixes

  • Fix probe::osc.max_for_size(1) which used to return some garbage value.
  • Fix probe::osc.max_for_size for even sizes: it used to unconditionally return $\frac{|X|(|X| - 2) - 1}{2}$, which is fine with odd sizes, but the actual bound for even sizes is $\frac{|X|(|X| - 2)}{2}$.

Improvements

Tooling

Benchmarks:

  • The errorbar-plot benchmark now cycles through its list of markers when there's not enough markers for all displayed lines.
  • A new dist::max distribution accepting a factor parameter in the range [0.0, 1.0] was added. It generates an ascending sequence $X$ of elements, except that its first element is swapped with the one at position $\lfloor factor * |X| \rfloor$. This can be used to test whether an algorithm is Max-adaptive. The following graph displays how regular dist::max is:

Graph showing the number of runs generated by dist::max with different percentages

Documentation:

  • Replace all occurrences of "iterable" with either "range", "container", "collection" or "sequence" depending on the context.
  • Rewrite, expand and clarify the documentation about measures of disorder:
    • Define what a measure of disorder is, make it clear that measures of presortedness are a subset of measures of disorder that satisfy a specific set of additional axioms.
    • When a measure of disorder we implement does not satisfy one of those axioms, clearly document it, with a counterexample.
    • Document the monotonicity and prefix monotonicity properties, described as desirable by Valdimir Estivill-Castro in Sorting and Measures of Disorder.
    • Specify whether the measures of disorder we implement are monotonic or not.
    • Clearly document the specific notation used through this page of the documentation.
    • Provide better definitions of what constitutes an adaptive sorting algorithm, and of the operator used to establish a partial order on measures of disorder.
    • Mention that $Exc$, the minimum number of exchanges to bring a collection in sorted order, is NP-hard (issue #233) in the presence of elements that compare equivalent. That result was proven by Amir et al in On the Cost of Interchange Rearrangement in Strings, and explains why probe::exc does not always return the smallest possible number of exchanges in the presence of equivalent elements.
    • Clarify that probe::sus returns the minimum number of ascending subsequences that can be used to decompose a sequence minus 1.
  • Update the average complexity of some sorters to reflect that they adapt to some measure of disorder: for example the average complexity of mel_sorter is now documented as $n \log{} Enc(X)$.
  • Improve and extend the Original research section about the partial ordering of $Mono$, providing tentative proofs that $Mono \not \equiv Max$ and $Mono \not \equiv SUS$.
  • Remove the section about C++17 in the changelog page: everything that used to be described there is now supported by default.
  • Remove diff marks mentioning versions 1.x.y of the library. The plan is to reset such marks after every new major version.

Test suite:

  • Don't force the use of specific linkers anymore when compiling the test suite with CMake. Depending on the operating system and compiler, the test suite CMakeLists.txt used to force the use of gold or lld, nowadays it uses whatever CMake is configured to use without overwriting user options.
  • Test all sorters with a "descending plateau" distribution: a strictly descending subsequence followed by a plateau of equivalent elements, followed by another strictly descending subsequence.
  • Improve tests for measures of disorder:
    • Add a property test to ensure that, for any measure of disorder $M$, we have $M(X) \le M.max\_for\_size(|X|)$
    • Add a property test for order isomorphism, arguably the most important property of measures of disorder, and the second axiom defined by Mannila that measures of presortedness have to satisfy: given a measure of presortedness $M$ and two sequences $X$ and $Y$, if the relative order of all elements of $X$ and $Y$ is the same, then $M(X) = M(Y)$. The property ensures that only the order of elements is involved in the computation of the disorder.
    • Add a property test verifying that $M(subsequence(X)) \le M(X)$ for most of the library's measures of disorder (Mannila axiom 3 for measures of presortedness).
    • Add a property test for $M(XY) = M(X) \le M(Y)$ if $X \le Y$ for most of the library's measures of disorder (Mannila axiom 4 for measures of presortedness).
    • Add a property test for the following stronger bound: $M(XY) = M(X) + M(Y)$ if $X \le Y$. The property is satisfied by probe::exc, probe::ham, probe::inv, probe::rem, probe::runs and probe::spear.
    • Add a property test for prefix monotonicity, a property of interest satisfied some measures of disorder, described by Vladimir Estivill-Castro in Sorting and Measures of Disorder: given a measure of disorder $M$ and three sequences $X$, $Y$ and $Z$, then $M$ satisfies the prefix monotonicity property if $M(XZ) \le M(YZ)$ when $X \le Z$, $Y \le Z$ and $M(X) \le M(Y)$ (where $XZ$ is the concatenation of sequences $X$ and $Z$). This property is satisfied by probe::dis, probe::exc, probe::enc, probe::ham, probe::inv, probe::max, probe::rem, probe::runs, probe::spear and probe::sus.
    • Add a property test for monotonicity, another property of interest satisfied some measures of disorder, described in Sorting and Measures of Disorder: given a measure of disorder $M$ and four sequences $W$, $X$, $Y$ and $Z$, then $M$ satisfies the monotonicity property if $M(WXZ) \le M(WYZ)$ when, $W \le X \le Z$, $W \le Y \le Z$, and $M(X) \le M(Y)$. This property implies prefix monotonicity and is satisfied by probe::dis, probe::exc, probe::ham, probe::inv, probe::max, probe::rem, probe::runs, probe::spear and probe::sus.
    • Add a property test checking that $M( \langle n, 1, 2, ..., n - 1 \rangle ) \le |X|$ for most probes (Theorem 3.18-1 in Sorting and Measures of Disorder).
    • Add a property test checking that $M( \langle 2, 1, 4, 3, 6, 5, ... \rangle ) \le \frac{|X| * M(\langle 2, 1 \rangle)}{2}$ for most probes (Theorem 3.18-2 in Sorting and Measures of Disorder).
    • Add a property test checking that $M(Reversed(X)) = M(X)$ for probe::mono and probe::osc.
    • Add a property test for $Dis(XY) = \max{}(Dix(X), Dis(Y))$.
  • Bump downloaded version of Catch2 to v3.10.0.

Miscellaneous:

  • New Git version tagging scheme: instead of 1.x.y, we now use the more conventional v2.x.y scheme instead (prefix v).
  • Require CMake 3.11 or greater to build the library.
  • The CMake option CPPSORT_BUILD_TESTING now defaults to the standard option BUILD_TESTING, which means that it is now OFF by default.
  • Remove the vendored CMake tool Crascit/DownloadProject now that FetchContent is unconditionally available. Thanks for all those years together!
  • Change the cmake_minimum_required version to 3.5.0 in test_package, making it compatible with CMake 4.
  • Require C++17 in the Conan recipe that ships with the project.
  • CI: use the mold linker where possible.

Known bugs

I didn't manage to fix every bug I could find since the previous release, so you might want to check the list of known bugs.