Senior SRE Specialist

Meet your recruiter Anastasiia Shtanoprud
https://www.linkedin.com...
Vacancy details
DevOps Engineering
DevOps Engineer (AWS)
Senior
Bulgaria, Croatia, Poland, Portugal, Spain, Ukraine
Remote

Let’s breathe life into great tech ideas! With more than 3,000 people globally, Intellias is a company where benchmark technological solutions are born. Join in and take your part in digitalizing the world.

What project we have for you

We are a publicly traded FTSE 250 FinTech company running mobile, web and desktop platforms that help our clients trade stocks and shares, leveraged products, futures and options, and crypto.

We are ambitious. Over 340,000 people already use our platforms. We’re global with offices in 18 countries and products in 16 regions. We’re hungry to move faster, ship better products for our customers and grow our user base. We believe in high autonomy, and we want people who are looking to do things differently in order to create better experiences for our customers.

We work in cross-functional teams and are laser-focused on increasing the number of active clients we serve to drive sustainable growth.

Your team
The SRE Team comprises highly skilled software engineers dedicated to embedding performance and reliability into our trading platform. You’ll work with cutting-edge distributed systems handling high-throughput, low-latency trading operations that demand zero downtime.

As a Site Reliability Engineer, you’ll champion reliability patterns, improve observability, establish 24/7 operations, and drive operational excellence across our crypto trading platform infrastructure and associated applications.

What you will do

System Reliability & 24/7 Operations

  • This role excludes on-call support.
  • Implement comprehensive monitoring and observability using OpenTelemetry and distributed tracing
  • Establish and maintain 24/7 operational readiness including automated deployments, blue/green releases, and zero-downtime patching strategies
  • Define and track Service Level Objectives (SLOs) and Error Budgets for critical crypto trading services
  • Identify and eliminate single points of failure in distributed systems

Application Instrumentation & Observability

  • Instrument Java applications with OpenTelemetry spans, metrics, and traces
  • Work hands-on with development teams to add observability to their code
  • Guide teams on implementing meaningful SLIs that reflect user experience

Technical Leadership & Enablement

  • Partner with development teams on system design, capacity planning, and architectural reviews
  • Provide technical guidance and hands-on support for teams transitioning from traditional deployments to containerized infrastructure
  • Mentor developers on reliability patterns including circuit breakers, retry logic, and fault tolerance
  • Lead by example – write production code that demonstrates SRE best practices

Software Development & Automation

  • Write clean, maintainable code in Java and Python following industry best practices
  • Build automation tools and CI/CD pipelines that embed reliability practices
  • Contribute to application codebases to implement instrumentation and reliability patterns
  • Apply software engineering discipline including version control, code reviews, and testing

What you need for this

Java development experience – Must be able to read, write, and instrument Java code. Deep understanding of JVM internals and experience with complex distributed Java applications

Observability & Instrumentation – Hands-on experience with OpenTelemetry, distributed tracing concepts (spans, trace context propagation), and observability platforms such as Honeycomb, Datadog, Dynatrace, Splunk or Grafana. Strong understanding of OpenTelemetry Collector pipelines, including data transformation, enrichment, and labeling, use of processors (attributes, resource, transform, span, tail sampling), and propagation of custom business identifiers (e.g., customer/tenant/transaction IDs) across services to enable end-to-end trace correlation between heterogeneous systems, applications, and environments.

SLO/SLI Expertise – Proven experience defining SLOs based on SLIs, establishing error budgets, and working with development teams on reliability measurement

Reliability Patterns – Solid understanding of circuit breakers, retry logic, bulkheads, and other fault tolerance patterns

Cloud – AWS & Kubernetes Platform Engineering – Strong hands-on experience with AWS as the primary cloud provider, including production workloads on Amazon EKS. Proven expertise in Kubernetes networking, covering ingress and egress controllers (e.g., ALB / NGINX / Envoy), service configuration and fine-tuning (requests/limits, HPA/VPA, pod disruption budgets, network policies), and traffic management. Demonstrated ability to investigate and optimize performance and reliability using metrics, logs, and traces, complemented by chaos engineering practices (fault injection, node/pod failures, network latency, dependency outages) to validate system resilience and high availability under real-world failure scenarios.

Message Brokers – Production experience with ActiveMQ, Kafka, or similar messaging systems

Containerization – Hands-on experience with container orchestration (Nomad experience is advantageous, Kubernetes acceptable)

CI/CD – Experience building and maintaining deployment pipelines, preferably with GitLab

Experience Requirements

  • Track record in high-throughput, production environments (financial services, trading platforms, or similar mission-critical systems preferred)
  • Demonstrated ability to improve system reliability and performance at scale
  • Experience working collaboratively with development teams to implement observability and reliability improvements
  • Strong troubleshooting skills in distributed systems environments

Core Competencies

  • Systems thinking approach to problem-solving
  • Excellent communication skills for cross-functional collaboration and technical enablement
  • Ability to balance hands-on development work with operational responsibilities
  • Strong bias toward automation and eliminating manual toil
  • Comfortable working in a fast-paced environment with evolving requirements

What it’s like to work at Intellias

At Intellias, where technology takes center stage, people always come before processes. By creating a comfortable atmosphere in our team, we empower individuals to unlock their true potential and achieve extraordinary results. That’s why we offer a range of benefits that support your well-being and charge your professional growth.
We are committed to fostering equity, diversity, and inclusion as an equal opportunity employer. All applicants will be considered for employment without discrimination based on race, color, religion, age, gender, nationality, disability, sexual orientation, gender identity or expression, veteran status, or any other characteristic protected by applicable law.
We welcome and celebrate the uniqueness of every individual. Join Intellias for a career where your perspectives and contributions are vital to our shared success.

Skills

AWS
Java
Kafka/ActiveMQ
Kubernetes
Observability
OpenTelemetry
Tracing
