Site Reliability Engineer, UK
Partly
Location
London, United Kingdom
Employment Type
Full time
Location Type
On-site
Department
Engineering
Note: Partly is headquartered in the UK, with a Product and Engineering base in Christchurch, NZ and an early presence in San Francisco, US. This position is office-based in London.
🚀 Our story
Partly's mission is to connect the world's parts and we're doing that by building the first global platform for replacement parts, starting with auto parts. Our big vision is to accelerate the world towards a sustainable future where waste is eliminated and all replacement parts are universally searchable, accessible and available to all.
Founded by ex-Rocket Lab engineers, we utilise cutting-edge technology to solve challenging but exciting problems that make a huge impact in a $1.9 trillion industry. We've more than tripled our team over the last 12 months and expect to double in size again over the coming 12 months. We're a global team spanning both Europe and Australasia.
We provide a scalable digital infrastructure solution to some of the world's largest businesses and the most exciting startups. Partly's solutions are integrated across hundreds of companies globally, providing the backbone for cataloguing and managing parts online.
Our investors include Blackbird Ventures (Canva, CultureAmp etc.), Square Peg, Octopus Ventures, Hillfarrance, Icehouse, Peter Beck (Rocket Lab), Akshay Kothari (Notion Co-Founder) and Dylan Field (Figma Co-Founder).
We're continuing to build a world-class team and ensuring Partly is a place where people can do the best work of their lives. We're proud of the culture we've built at Partly, and our values are lived throughout every experience.
🖍️ This role
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed systems, ensuring that both internally critical and externally visible services have the reliability, uptime, and performance appropriate to clients' needs while enabling a fast rate of improvement. SREs maintain constant awareness of system capacity and performance, ensuring our networks, platforms, and tools are scalable, secure, and reliable so engineers can focus on delivering impactful software. This senior role demands high autonomy, leadership, and strategic thinking, making it ideal for those excited by the challenge of designing and supporting the infrastructure that connects the world's parts.
💻 What will you do
Reliability Engineering: Ensure the stability, scalability, and security of our cloud infrastructure and of the Partly and third-party applications running in our Kubernetes-powered clusters. Leverage Infrastructure-as-Code and automation (Terraform for GCP, GitOps with ArgoCD, custom scripts in Python/Bash, etc.) to deploy and manage workloads and resources in a repeatable, automated way.
Cost Optimisation: Monitor and optimise costs across our cloud and on-prem infrastructure, ensuring we get maximum value from our investments. Make recommendations for resource allocation or architecture changes to improve cost-efficiency without sacrificing reliability or performance.
Cross-Functional Collaboration: Work closely with developers, data engineers, and leadership to plan infrastructure needs and improvements. Provide tooling, guidance and training to the engineering team on SRE practices, and collaborate during software delivery to ensure smooth integrations from code to production.
Software Engineering: Make sure our software meets high production readiness standards. When you see a problem or an opportunity to improve, you drive the solution.
Troubleshooting: Participate in incident resolution and give developers a helping hand in debugging applications, networks, databases, and compute systems.
Want to learn more about the problems we're solving and the culture we're building at Partly? Hear directly from our team here: https://shorturl.at/iAFUX
🥷 Your skills
Software Engineering: You excel at developing and maintaining large, established software systems beyond simple scripts and utilities. You definitely know what makes software maintainable and you are able to write robust code.
Firmly grounded computer science fundamentals: Including data structures, concurrency, architecture, APIs, testing, and design patterns.
System engineering fundamentals: You most likely know how to deploy and use a memory or stack-sampling profiler, how to locate excessive lock contention, how to identify network issues, etc.
SRE Expertise: Hands-on experience with modern SRE practices and tooling – for example, containerization (Docker/Kubernetes), infrastructure-as-code (Terraform), and GitOps workflows (ArgoCD or equivalent). You have designed, built, and maintained scalable infrastructure and CI/CD systems.
Cloud & Systems Knowledge: Deep familiarity with at least one major cloud platform and the Linux operating system. You can tune servers, manage databases/storage, and wrangle Kubernetes clusters.
Ownership & Leadership: High degree of ownership and bias for action, with a proactive approach to solving problems. You take initiative and don’t wait to be told what to do. You have demonstrated leadership through mentoring junior engineers or leading small teams/projects, even if not formally a manager. We’re seeking a track record of ownership over critical systems and successful delivery of complex projects.
Collaboration & Communication: Excellent communication skills (written and verbal) and a collaborative attitude. You can work across teams and departments – from explaining technical issues to non-technical colleagues, to coordinating with engineers on deployments. You value teamwork and knowledge sharing.
Adaptability: Willingness to wear multiple hats and adapt to evolving needs. In a fast-growing startup environment, requirements can change – you’re excited by the chance to learn new skills, take on new challenges, and grow with the role.
Bonus Points:
Experience in a high-growth startup environment, which means you’re used to the pace and ambiguity.
Any prior experience maintaining security compliance and certifications in a company is a plus.
If you have used specific tools we use (GCP, ArgoCD, GitLab CI, Kafka, etc.), that’s great – if not, you can learn quickly.
Significant experience running production workloads on Apache Cassandra and/or Postgres databases is a plus.
Experience developing software in the Rust programming language, and the ability to mentor other developers on Rust best practices, is also a plus.
Please note: if you don't have all the skills/experience listed above but believe you could be outstanding in this role, please still consider applying. Many folks, especially those from underrepresented or marginalised groups, often count themselves out. Please allow us to learn more about you and why you're exceptional!
🪅 Benefits
High trust, low process and no bureaucracy. We hire exceptional people whose judgment we trust. This means we proactively remove any process or rules that slow us down (for example, our expense policy is simply the “red face test”).
Competitive base salary + equity. We offer competitive salaries and generous equity options for all full-time employees, ensuring everyone shares in the financial upside when we win.
Flexible working hours. Choose when to work based on what time you’re most effective (no mandatory or set hours). We combine flexibility with an office-first approach (in cities where we have critical mass, i.e. London, Christchurch, Auckland).
Focus Days. Two days per week with zero meetings, dedicated solely to uninterrupted deep work.
Take time when you need it. We don’t ask questions or care if people have a negative leave balance. We work extremely hard and trust our team to take the time they need to recharge.
Learn from the best. Whether it’s during a ‘Lunch n Learn’ or hearing from a unicorn CEO at a Fireside chat, you’ll have the opportunity to constantly learn from the world’s best.
Quarterly season openers across the UK and EU. Connect regularly at the nearest centralised location for a week of collaboration, big-picture planning and team events.
Team connection. Monthly team lunches, celebrating our wins, happy hours and more!
Parental leave and flexible return to work. Do what works for you. Primary carers can return with 4-day weeks (on 100% pay for the first 12 weeks). Secondary carers get 10 days full pay.
Payroll Giving: We encourage generous giving and donate to the high-impact charities you support.
CycleSaver: UK employees can now save up to 47% on Lime, Forest, Beryl, or Santander cycle subscriptions through CycleSaver, enjoying the health benefits of cycling to work with flexible, hassle-free monthly plans instead of bike ownership.