
Navigating Infrastructure as Code (IaC) in a non-cloud trading environment

Nicolás Demarchi

About the author


Nicolás is a seasoned software engineer with over 15 years of experience. He joined Optiver in 2019 with the aim of developing an infrastructure management platform and currently leads the Linux, Networks, and Platform infrastructure teams.

This blog post is an overview of a talk that I gave at EuroPython 2023. If you prefer, you can watch the full presentation via the conference site here.

In the high-performance landscape of algorithmic trading, technological infrastructure isn’t just important—it’s critical. While Infrastructure as Code (IaC) is a well-established practice in cloud-based solutions, its application in non-cloud environments presents unique challenges, especially in latency-sensitive environments like ours at Optiver.

In this post, I’ll go into these specific challenges and the solutions we’ve developed at Optiver.

The evolution of trading infrastructure

Optiver’s first-ever trade took place in 1986. In those days, we had a single trader on the floor of the Amsterdam Options Exchange, where brokers communicated orders by voice in a traditional trading floor setting. Fast forward to the present: trading has evolved into computer signals on wires, and data centres, rather than trading floors, are the home of the exchanges.

Today, any entity that wants to join an exchange needs a way to connect to the exchange system that is hosted in a data centre. It’s possible to use brokers to connect, but of course, big market making companies like Optiver must build their own infrastructure to remain competitive.

It would also be possible—theoretically—to use cloud providers and a tool like Terraform to provision infrastructure. For example, you could have a simple Point of Presence (PoP) in the data centre where the exchange is hosted and connect this PoP to infrastructure running on the closest cloud-provider region. But in today’s trading landscape, success is constrained by the frequency of processors and latency of networks. As transactions occur in fractions of a second, even minor delays can translate to missed opportunities and reduced profitability. Competitive participants must operate at these high speeds, or risk falling behind in a market where microseconds, even nanoseconds, can mean the difference between profit and loss.

If you want to be at the top of the industry, building your own custom infrastructure has a lot of competitive advantages.

Modern exchange infrastructure

Figure 1: Single exchange system

The exchange system is hosted in a data centre, and every member is connected to it. Using UDP multicast, the exchange guarantees that all members receive the same information at the same time, for example that a stock option traded 10 lots at €5.00. It’s then up to your systems and applications to process the data, decide how to react and potentially send a new order back to the exchange.
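
As a rough illustration of the receiving side, here is a minimal Python sketch of joining an IPv4 multicast group with the standard `socket` module. The group address and port are placeholders, not real exchange feed details, and production feed handlers look nothing like this; it only shows the mechanism by which a member subscribes to a multicast stream.

```python
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure used to join an IPv4 multicast group.

    "0.0.0.0" lets the operating system pick the local interface.
    """
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket subscribed to the given multicast group and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group)
    )
    return sock


# Hypothetical usage (placeholder group and port):
# feed = open_receiver("239.1.1.1", 30001)
# datagram, sender = feed.recvfrom(2048)
```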

This scenario means that we have to build all our networking and compute, including external connectivity, and connectivity to our offices where the traders are.

For each of your colocations, you are required to take care of the physical space, power, racks, cables, switches and servers, connectivity, firmware, OS, configurations, etc. That’s already not a simple problem to solve, but now multiply that by dozens of exchanges around the globe, and you can see the scale of the challenges we face at Optiver.

Infrastructure as Code

In recent decades, Infrastructure as Code (IaC) has been the approach that took infrastructure to the next level: no more spreadsheets, manually maintained diagrams or tribal knowledge (like using pet names for servers). As engineers we don’t want to depend on our memories, some mnemonic rule, or even a spreadsheet to manage our environment. We want a system to take care of that, so that we can scale easily.

IaC also gives you the power to enforce a standard, set up information assurance, build orchestration and have a good interface for applications to grow from. The true test of your infrastructure is the question: “Can you rebuild it from scratch?” If the answer is a fearless yes, you’ve passed the test.

Sadly, there is no “de facto” OSS solution to build on-prem infra. The “K8s of on-prem” has yet to be built. We suggest exploring OpenStack, MaaS and RackN before embarking on a custom-made solution.

Netbox is another interesting choice. After considering these options, we concluded that it was probably the closest match for solving part of our problem, but it didn’t have the flexibility and functionality we needed. Of course, we still use many open-source technologies to build our stack. NAPALM is a great community-driven library for interacting with switches, and our main system is built on top of Django, FastAPI and Celery.

Our Approach: IaC without the cloud

Figure 2: Optiver’s IaC system

Implementing a standard

Now back to the original question: “How can we use IaC without the cloud?”

Step 0 is to have a standard. It’s unrealistic to build any platform to manage infrastructure without having a standard to implement. Any code we write should simply be an implementation of the standard, but our system should support more than one standard simultaneously, as retrofitting existing remote data centres is always going to lag behind your ideal standard to some degree. The standard should dictate everything from the colour of the cables to the networking architecture to the OS configuration.
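
To make the idea of several coexisting standards concrete, here is a small Python sketch. Every name and value in it (the `Standard` fields, the version labels, the site names) is invented for illustration and does not describe Optiver’s actual standard; the point is only that each site is pinned to the standard it was built against, and code looks the standard up rather than hard-coding it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standard:
    """One version of an infrastructure standard (all fields are hypothetical)."""
    version: str
    uplink_cable_colour: str
    os_image: str
    switches_per_rack: int


# Two standards coexist: older sites stay on v1 until they are retrofitted.
STANDARDS = {
    "v1": Standard("v1", "blue", "linux-image-2021", 1),
    "v2": Standard("v2", "yellow", "linux-image-2023", 2),
}

# Each site is pinned to the standard it was built against (placeholder names).
SITE_STANDARD = {"colo-a": "v2", "colo-b": "v1"}


def standard_for(site: str) -> Standard:
    """Look up the standard that a given site implements."""
    return STANDARDS[SITE_STANDARD[site]]
```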

Our intent systems

In our implementation, our “Infra-Intent” systems represent our infrastructure, and we can model the different realities in a relational database. For example, a data centre has racks, and in these racks we have switches and servers, which are all interconnected with cables. You can compare this system with a cloud provider’s web console, where you can see a VPC that has virtual machines, each of which has interfaces that belong to a subnet.
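
A relational model of this kind could look roughly like the sketch below, which uses Python’s built-in `sqlite3` module. The table and column names are assumptions made for the example, not the real Infra-Intent schema, which is certainly richer.

```python
import sqlite3

# Hypothetical schema: data centres contain racks, racks contain devices,
# and cables connect a port on one device to a port on another.
SCHEMA = """
CREATE TABLE data_centre (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE rack (
    id INTEGER PRIMARY KEY,
    data_centre_id INTEGER NOT NULL REFERENCES data_centre(id),
    name TEXT NOT NULL
);
CREATE TABLE device (
    id INTEGER PRIMARY KEY,
    rack_id INTEGER NOT NULL REFERENCES rack(id),
    name TEXT UNIQUE NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('switch', 'server'))
);
CREATE TABLE cable (
    id INTEGER PRIMARY KEY,
    a_device_id INTEGER NOT NULL REFERENCES device(id),
    a_port TEXT NOT NULL,
    b_device_id INTEGER NOT NULL REFERENCES device(id),
    b_port TEXT NOT NULL
);
"""


def build_example() -> sqlite3.Connection:
    """Create an in-memory database with one rack, one switch and one server."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO data_centre (id, name) VALUES (1, 'colo-a')")
    db.execute("INSERT INTO rack (id, data_centre_id, name) VALUES (1, 1, 'r01')")
    db.executemany(
        "INSERT INTO device (id, rack_id, name, kind) VALUES (?, ?, ?, ?)",
        [(1, 1, "sw01", "switch"), (2, 1, "srv01", "server")],
    )
    db.execute(
        "INSERT INTO cable (a_device_id, a_port, b_device_id, b_port) "
        "VALUES (1, 'eth1', 2, 'eth0')"
    )
    return db


def devices_in_data_centre(db: sqlite3.Connection, name: str) -> list:
    """Return (device name, kind) rows for every device in a data centre."""
    rows = db.execute(
        "SELECT device.name, device.kind FROM device "
        "JOIN rack ON device.rack_id = rack.id "
        "JOIN data_centre ON rack.data_centre_id = data_centre.id "
        "WHERE data_centre.name = ? ORDER BY device.name",
        (name,),
    )
    return rows.fetchall()
```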

Using the web API we can define our infrastructure, and once that’s there, it can be consumed by our provisioning pipelines, which simply read that source and make it reality. A provisioning pipeline can be a process generating a switch configuration, or a pipeline that knows how to install an OS image on a bare-metal server and configure it. Decoupling these pipelines from our intent system allows us to easily change our provisioning tooling as we see fit.

Defining the infrastructure in the Infra-Intent system is a key step for our automation, but also for enforcing our standard. Provisioning all the resources requires hundreds of API calls that a human would otherwise have to carry out in the required order. To do that, we have code that can take a high-level definition and make all the required API calls. Just imagine Terraform making API calls to a cloud provider when you ask it to create a VM. This abstracts away the low-level complexity of device configuration and allows engineers to manage our infrastructure effectively at scale.
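
The expansion from a high-level definition into ordered API calls can be sketched in a few lines of Python. The endpoints, payload fields and naming scheme below are all hypothetical; the sketch only builds the list of calls and leaves sending them to whatever HTTP client you prefer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """One request against a hypothetical intent API."""
    method: str
    path: str
    payload: dict


def expand_rack(data_centre: str, rack: str, servers: int) -> list[ApiCall]:
    """Expand 'one rack with N servers' into intent-API calls in dependency order.

    The rack is created first, then its switch, then each server followed by
    the cable that links it to the switch.
    """
    switch = f"{rack}-sw01"
    calls = [
        ApiCall("POST", "/racks", {"data_centre": data_centre, "name": rack}),
        ApiCall("POST", "/devices", {"rack": rack, "name": switch, "kind": "switch"}),
    ]
    for i in range(1, servers + 1):
        server = f"{rack}-srv{i:02d}"
        calls.append(
            ApiCall("POST", "/devices", {"rack": rack, "name": server, "kind": "server"})
        )
        calls.append(
            ApiCall("POST", "/cables", {"a": f"{switch}:eth{i}", "b": f"{server}:eth0"})
        )
    return calls
```

Even this toy definition turns one line of intent ("a rack with two servers") into six calls, which hints at why a real colocation needs hundreds.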

Truth collectors and audits

Finally, our truth collectors take a snapshot of how our infrastructure looks in reality, and audits compare this reality to our intention. If something is different, we generate an action for a human to check what happened. In effect, we have tests for our infrastructure in the same way we have them for code.
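
Conceptually, an audit is a diff between two views of the same devices. The Python sketch below assumes both intent and collected truth are available as plain dictionaries keyed by device name; that representation, and the attribute names used in the example, are assumptions for illustration only.

```python
def audit(intent: dict[str, dict], truth: dict[str, dict]) -> list[str]:
    """Compare intended device attributes with observed ones.

    Returns one human-readable finding per discrepancy; an empty list
    means reality matches intent.
    """
    findings = []
    for device in sorted(intent.keys() | truth.keys()):
        if device not in truth:
            findings.append(f"{device}: in intent but not observed")
        elif device not in intent:
            findings.append(f"{device}: observed but not in intent")
        else:
            # Only attributes that the intent defines are checked.
            for key in sorted(intent[device]):
                wanted = intent[device][key]
                seen = truth[device].get(key)
                if wanted != seen:
                    findings.append(
                        f"{device}: {key} is {seen!r}, intent is {wanted!r}"
                    )
    return findings
```

Each finding would become an action for an engineer to investigate, much like a failing test in a code repository.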

Real-life use case

When we create a new colocation, we first write our high-level definition files. Then we run our piece of code that will create all the resources in our intent system; this includes everything from the cables to the firmware version of a server. Once we have that, we can export patching instructions for our Data Centre Engineering team, who will do the physical work and connect everything as expected. Once that’s done, they can run a pipeline to verify that things are connected according to intent (so the same process as for our audits), and when that’s green they can just run the pipelines to provision the devices.
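
As an example of one step in that flow, patching instructions can be derived directly from the cable records held in the intent system. The Python sketch below uses a hypothetical `Cable` record and an invented instruction format; the real export will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cable:
    """An intended cable between two device ports (fields are hypothetical)."""
    a_device: str
    a_port: str
    b_device: str
    b_port: str
    colour: str


def patching_instructions(cables: list[Cable]) -> list[str]:
    """Turn intended cables into numbered instructions, ordered by device and port."""
    ordered = sorted(cables, key=lambda c: (c.a_device, c.a_port))
    return [
        f"{n}. Connect {c.a_device} port {c.a_port} "
        f"to {c.b_device} port {c.b_port} ({c.colour} cable)"
        for n, c in enumerate(ordered, start=1)
    ]
```

Because the instructions and the later verification pipeline both read from the same intent records, a green verification run means the physical patching matches what was exported.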

Your future in Infrastructure Software Engineering

I could go on for pages about this topic, but I hope this glimpse is enough to give an idea of our challenges and of what our Infrastructure Software Engineers are building.

If you’re excited about exploring the world of infrastructure as code and contributing to an environment that thrives on creativity and innovation, visit our job post below. We would love to hear from you.
