Job Description
Role
Keelvar is looking for a Lead DevOps/SRE to join our growing Engineering & Product team. We are proud to be a remote-first company, and this role can be based remotely in Ireland, Germany, Spain or the UK.
Reporting directly to one of our engineering managers, you will join our team of SREs, who work closely with our cross-functional teams. Each team works on a specific product or product area and includes software engineers, data scientists, product managers and UI designers.
Responsibilities
- Provide strategic leadership in defining, planning, implementing, iterating on, and maintaining Keelvar’s cloud infrastructure in AWS, ensuring alignment with the company’s goals and scaling requirements.
- Mentor and guide the SRE team in fostering a continuous deployment ecosystem that enables the product engineering teams to release changes to customers quickly, reliably, and sustainably through automated deployment pipelines.
- Collaborate closely with engineering leadership and cross-functional teams to drive initiatives that enhance the availability, performance, and resilience of critical services.
- Lead infrastructure and application security efforts, partnering with product engineering teams and the security and compliance team to incorporate DevSecOps principles, implement secure infrastructure, enforce defence-in-depth principles, and advocate for security best practices.
- Oversee and enhance production system monitoring to ensure optimal availability, latency, and overall system health, and provide strategic recommendations for improvements.
- Oversee and prioritise tickets and incoming requests to the SRE team, ensuring timely responses to incidents, operational tasks, and support requests from product and engineering teams. Develop processes to triage, track, and resolve issues efficiently, and maintain clear communication with stakeholders about ticket status and resolution timelines.
- Develop, test, and evolve disaster recovery plans to ensure business continuity, leading periodic drills to prepare for system failures or disasters.
- Drive the design and implementation of monitoring and alerting strategies, ensuring that application SLAs and SLOs are properly defined, met and exceeded.
- Identify and lead initiatives for technical, operational, and process improvements, enabling continuous optimisation of SRE practices and team efficiency.
- Establish and enforce DevOps best practices across the organisation, promoting a culture of continuous integration, continuous delivery, and high system resilience.
- Provide expert guidance on load testing, performance profiling, and capacity planning to support product scalability.
- Maintain comprehensive documentation for infrastructure, processes, and procedures to facilitate knowledge sharing and onboarding.
- Stay at the forefront of industry advancements by actively researching and adopting emerging SRE and cloud computing best practices to improve our processes and infrastructure.
Your profile
- 7+ years of experience in SRE or DevOps, including at least 2 years in a leadership or senior engineering capacity.
- Proven track record in managing and scaling cloud infrastructure, preferably within AWS, with hands-on expertise in automation, infrastructure as code, and cloud-native architectures.
- Strong technical background in CI/CD pipeline development, with experience building and optimising automated deployment pipelines in agile, product-focused environments.
- Proficient in DevSecOps principles, with a deep understanding of security best practices in infrastructure and application deployment.
- Experienced in managing monitoring, logging, and alerting systems for high-availability applications, with strong knowledge of defining and meeting SLAs and SLOs.
- Advanced skills in infrastructure-as-code tools such as Pulumi, and familiarity with container orchestration platforms such as AWS ECS or Kubernetes.
- Programming skills in one or more languages, such as Python or shell scripting, to support automation and tooling.
- Solid understanding of networking, database management, and system performance tuning, with the ability to diagnose and resolve complex issues.
- Proven ability to mentor and guide SRE/DevOps teams, fostering a collaborative, continuous-learning environment.
- You thrive in an environment of mutual respect, openness and collaboration, and you enjoy getting things done at a quick pace.
- You have a sense of ownership, and you enjoy taking initiative and leading projects through to completion.
Why us?
Here at Keelvar, we are proud to be a remote-first organisation and we offer some great perks.
- Competitive salary with a Series B-backed, fast-growing organisation
- 25 days' holiday, increasing to 26 after 3 years and to 27 after 5 years, plus your birthday off on us
- Flexible working hours with a positive approach to work-life balance
- An inclusive, collaborative, innovative culture
- Generous leave offerings including Wellbeing days
- Technology that enables you to perform to your best
If you really like the sound of the role but don’t meet every listed criterion exactly, we still want to hear from you. You could be the exact fit for this or any of our other roles.
We are also a diverse group, and we intend to continue attracting and retaining diverse talent in our organisation. We’re committed to an inclusive and diverse Keelvar. We do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.
Position ID
1081285