How to design high-frequency trading systems and its architecture. Part I

HFT trading systems, electronic trading, software engineer, low latency

In this article, I would like to talk about how to implement a low latency system, the modules needed and its system architecture and focus on the software development side.

Other parts need to be covered to have a complete system, and they might be things like network communications, switches/routers, specialized servers, FPGAs, system operating tweaking, like kernel-bypassing, etc. But, as I said, in this article I will be more focused on software design and its architecture.

This is a basic approach, to what I consider best practices for this kind of trading system. Depending on different factors, markets, and compliances, this may vary, but some basics will remain the same for any trading system.

Of course, the key factor here is being able to handle a considerable amount of incoming information, response time to external events, internal response time, and capabilities to provide the highest throughput and lowest latency.

When I design low latency systems, I always found the following challenges:

  • The system must be able to handle multiple strategies, making sure that the system will not underperform as we add more strategies.
  • Order book reconstruction from different venues. If we take as an example like the forex market, there are tons of different venues and ECNs, all of them using different API and connectivity standards. The system must be able to handle those different sources and aggregate them into an internal and well-defined data structure.
  1. Data Feed Handler

The feed handler is one of the most important components of any algorithmic trading system. Without accurate market data, any high-frequency or algorithmic strategy won’t be able to make correct decisions. So, the idea of this module is to capture market data from different venues, to allow our strategies, generate correct decisions, and keep a representation of venues’ limit order book.

A large number of market updates can be received per second for each market venue, and your internal representation of each limit order book should change accordingly.

This process should go like this:


The link between venues and your system is usually made using FIX protocol, so I have to put a FIX engine, in order to communicate with them (receive and send messages) and provide the core infrastructure that interfaces and manages the connectivity into the venues using FIX protocol.

Receiving message will be receiving market updates, and sending, will be sending orders generated by my strategies.

There are two options to implement a FIX Engine: custom made, or a commercial option. To implement a custom FIX engine, you will need to put too many man-hours and you can make sure that you will optimize the communication for a low latency environment. Firms like large banks and institutions will prefer their own FIX engine, so they can have ownership of the entire code.

I prefer to go with the commercial option, since there are too many good options out there, and you just need to plug their library, and you will be ready to go.

Note that besides network and communication latency, there also will be a decode/parse latency, so I’ve to take care of it as well. Parsing is a string manipulation, hence very expensive in terms of CPU cycles and memory management.

There are venues that also started to provide connectivity through newer and better protocols since FIX is becoming too slow for nowadays algorithms. FAST, ITCH/OUTCH are binary protocols and I always try to use them as long they are available. They will use a different engine to communicate, but the concept remains the same.

  1. Limit Order Book

Once the system has its connectivity between the venues, I will need to update all the events reported by them: order updates/adds or deletions (trades if needed as well).

For each event received I will go into my internal order book and make the changes it needs to be made.

Usually, all venues send these updates using a unique identifier (EntryID) and the update type (update is an insert, update or delete) so you can accurately replicate their limit order book.

Here I show how the limit order book reconstruction works:


One important thing to keep in mind here is what type of data structure we want to use to hold the order book data. Remember that we can get several million updates per second, so our data structure should be fast enough to find an entry or delete it.

From previous recommendations, I stated that the best data structure is to use plain arrays: one for the bids, one for the asks. They provide the best performance.

Moreover, pre-allocate as much memory as you can.

Dynamic allocation is expensive, should not be used in critical paths like updating a limit order book data structure and you will end up having a large amount of overhead per allocation.

But since we are building a low latency system, we need to pre-allocate our data structures. And I’m going to show you the difference between pre-allocating and dynamic allocation, and its performance impact.

On the system initialization phase, pre-allocate these arrays: bidsArray and askArrays, and let’s say we are going to store a 10-level depth of the book, hence, we will need to pre-allocate 10-element bid/ask array. And we are going to move/replace elements, NOT removing and creating since the allocation process will consume too much time.

The following code shows how dynamic allocation works:


As you can see, we add or delete, and then I sort the entire array, so we can keep an ordered structure.

In the next section, I show how to pre-allocate and reuse.


You define the array with 10 elements on it and you will reuse all those elements. You may notice that instead of deleting the element, I’m clearing its values, and once ordered, sent them at the beginning of the array.

Once I run a performance test between them, counting CPU cycles taken on each case on a sample of 100,000 iterations,  I get the following results.


Wow, a 60% difference. Not a surprise for me, allocation/deallocation is expensive. And this was a very basic/simple example!!!

You can check this code example on my gist repository

Check the part II and III


Ariel Silahian

Keywords: #hft #quants #forex #fx #risk $EURUSD $EURGBP $EURJPY

One thought on “How to design high-frequency trading systems and its architecture. Part I

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.