tuna.blog
← Writing

Building a price-time-priority order book in C++

A matching engine does one thing. An order arrives, and it matches against the best resting orders on the other side under price-time priority: best price wins, and at the same price the earliest order wins. Get that wrong and you have either mispriced a trade or broken the fairness rule that every exchange depends on.

I wrote one in C++17 to find the engine's ceiling. Matching runs at p50 0.42µs, and cancels clear at 1.36M per second. The speed is not the point of this post. The point is a smaller puzzle inside it: "find the best price" and "cancel any order by id" both have to be constant time, and no single container gives you both. This is the design that does.

This is part 1 of a series. It covers the data structure and the matching loop.

The shape of it

The engine is single-threaded and synchronous. An order goes in, a list of events comes out. Everything downstream (the WebSocket server, the dashboard) is I/O fan-out that never touches the book, so the engine's throughput has nothing to do with the 10Hz display rate.

What it maintains is a live depth ladder. Resting buy orders below, sell orders above, both sorted toward the spread in the middle.

40100.06
25100.05
70100.04
spread
8599.98
5099.97
ord-17ord-42ord-88
3099.96

What "fast" has to mean

Four operations, each with a hard constraint:

  1. Best bid and best ask. Read on every incoming order, so this wants O(1).
  2. Insert a resting order at its price level, kept sorted.
  3. Match at a level by filling the oldest order first (FIFO).
  4. Cancel by id. In real markets cancels run 20 to 30% of message volume, so a scan is out of the question.

A sorted tree handles 1 through 3 but cancel-by-id is O(n) on it. A hash map gives O(1) cancel but knows nothing about price order. The whole trick is to keep both and let each do what it is good at.

Step 1: the order, and a floating-point smell

order.hpp
struct Order {
    std::string id;
    OrderSide   side;       // BUY / SELL
    OrderType   type;       // LIMIT / MARKET
    double price, quantity, remaining;
    bool is_filled() const { return remaining <= 1e-12; }
    double fill(double qty) { double f = std::min(qty, remaining); remaining -= f; return f; }
};
Copied to clipboard

Look at the 1e-12 on the highlighted line. fill() subtracts doubles over and over, so a fully filled order can land on remaining = 1e-15 instead of a clean zero. A phantom order like that would hold a price level open forever, and the epsilon hides it. That number comes back at the end of the post.

Step 2: two trees, opposite orderings

order_book.hpp
// Bids: highest price first  -> begin() is the best bid
std::map<double, OrderList, std::greater<double>> bids_;
// Asks: lowest price first   -> begin() is the best ask
std::map<double, OrderList> asks_;
Copied to clipboard

std::map is a red-black tree, and begin() is O(1) amortized because the tree caches its leftmost node. Flip the comparator to std::greater on the bid side and the best price (highest bid, lowest ask) sits at begin() on both sides. Each value is a std::list<OrderPtr>, the FIFO queue at that price: push_back on arrival, front() on match. Price-time priority falls out of the container choice for free. This layout, sorted price levels each holding a time-ordered queue, is the standard order-book model that venues like CME Globex run.

Step 3: the O(1) cancel trick

Here is the part I like. A second map, keyed by order id, holds an iterator that points straight at the order's node inside its price-level list:

order_book.hpp
struct OrderLocation {
    OrderPtr order;
    OrderList::iterator it;   // points INTO the price-level list
    bool   is_bid;
    double price_level;
};
std::unordered_map<std::string, OrderLocation> orders_;
Copied to clipboard

On insert, grab the iterator right after the push:

order_book.cpp
auto& level = bids_[order->price];
level.push_back(order);
auto it = std::prev(level.end());
orders_[order->id] = { order, it, true, order->price };
Copied to clipboard

Cancel is then a hash lookup, an erase by the stored iterator, and dropping the level if it just went empty:

order_book.cpp
auto level_it = bids_.find(loc.price_level);
if (level_it != bids_.end()) {
    level_it->second.erase(loc.it);              // O(1) list splice-out
    if (level_it->second.empty()) bids_.erase(level_it);
}
orders_.erase(it);
Copied to clipboard
orders_["ord-42"]stored iteratorord-17ord-42ord-88

Why std::list and not std::vector? A std::list iterator stays valid when you insert or erase elsewhere in the list. A vector invalidates the stored iterator on every push_back or erase, and the O(1)-cancel trick falls apart. That one requirement decides the container. The cost is poor cache locality and a heap allocation per order, which part 4 of this series fixes.

Step 4: walking the book

To match a buy order you keep taking the best ask and filling against it until the order is done or the price stops crossing:

matching_engine.cpp
auto try_match_buy = [&](bool check_price) {
    while (!order->is_filled()) {
        auto passive = book_.best_ask_order();
        if (!passive) break;
        if (check_price && passive->price > order->price) break;
        double fill_qty   = std::min(order->remaining, passive->remaining);
        double fill_price = passive->price;          // trade prints at the PASSIVE price
        order->fill(fill_qty);
        passive->fill(fill_qty);
        events.push_back(make_trade_event(order, passive, fill_price, fill_qty));
        if (passive->is_filled()) book_.remove_front_ask();
    }
};
Copied to clipboard

The highlighted line is the one people get wrong. A trade prints at the resting order's price, the maker's price, not the incoming order's. That gap is the spread the aggressor pays. It is why a sell at 99 that hits bids at 100 and 99 fills the 100 bid first and prints at 100.

Step 5: limit versus market

Same loop, one flag:

  • MARKET runs with check_price = false. It sweeps any level, and whatever cannot fill is rejected rather than parked in the book. That is immediate-or-cancel behavior.
  • LIMIT runs with check_price = true. It stops once the best opposite price no longer crosses the limit, and whatever is left rests in the book and emits ORDER_ACCEPTED. A marketable-limit order crosses what it can and rests the remainder.

Does it actually go fast?

I ran benchmarks/benchmark_engine.py on an Apple Silicon machine, driving the engine through the same Python-to-C++ boundary the server uses. These are end-to-end numbers, not a micro-benchmark of the C++ in isolation:

OperationThroughputp50p99
Limit insert737K /s0.25µs1.21µs
Market match424K /s0.42µs0.92µs
Cancel1.36M /s0.74µs avgn/a

Cancel comes out fastest, which is the payoff. It is the pure hash-lookup plus list-splice path with no tree traversal in the common case, exactly what the dual structure promised.

What I would change

Two honest caveats.

The prices should be integers. Every price and quantity here is a double, and prices are keys in a std::map. That only works because every generated price runs through round(price, 2) first, so the keys stay bit-identical. Real exchanges store prices as integer ticks (int64) for exactly this reason. The 1e-12 epsilon from step 1 is the tell. It exists to hide floating-point drift that an integer design would not have.

This is a teaching engine, not a colocation engine. A production matching engine would not use std::map<double, ...> at all. It would index a flat array by tick offset for O(1), cache-friendly level access, store orders as plain-old-data, and thread them with intrusive lists so there is zero heap allocation on the hot path. Public C++ books built that way report 1.4M transactions per second and up. That is the subject of part 4.

The lesson I keep relearning: the O(1)-cancel requirement is what picked the data structure. Start from "cancel must be constant time and must not invalidate," and the std::list plus stored-iterator design is the only thing that fits. The throughput was a side effect of getting that one constraint right.

Part 2 takes the same engine across the language boundary: how pybind11 marshals these events into Python, and where the latency actually goes once a browser is watching.