[What the role is]
We are seeking a skilled Ansible Automation and Elastic Observability Engineer to design, implement, and maintain our infrastructure automation and observability platform. The successful candidate will develop automation solutions using Ansible whilst building comprehensive observability capabilities with the Elastic Stack to monitor, analyse, and optimise our IT infrastructure and applications.
[What you will be working on]
Design and implement automation frameworks for infrastructure, applications, and processes supporting critical information infrastructure (CII)
Develop and maintain observability solutions, including monitoring dashboards, alerts, and metrics collection frameworks
Build self-healing systems and automated remediation solutions to ensure system reliability and performance
Develop and maintain Logstash pipelines for ingesting, transforming, and enriching data from sources including application logs, system metrics, and business data
Develop comprehensive monitoring dashboards in Kibana or Grafana to track system health, performance metrics, and business KPIs
Troubleshoot cluster issues, performance bottlenecks, and data ingestion problems
Participate in incident management, on-call rotations, and post-incident reviews, and implement the improvements they identify
Create and maintain comprehensive documentation, runbooks, and best practices for automation and observability
Assist with change management for the observability and automation platforms, including version upgrades, hotfixes, and platform administration tasks
Troubleshoot and resolve complex issues related to Elastic and Ansible components
[What we are looking for]
A relevant university degree and at least 3 years of relevant work experience.
At least 3 years of hands-on experience with Elasticsearch, Logstash, Kibana, and Beats/Elastic Agent.
Strong understanding of distributed systems, search algorithms, and data structures.
Proficiency in Linux system administration and command-line tools.
Experience with containerisation technologies such as Docker and Kubernetes.
Solid experience applying the Elastic Stack to log management and observability use cases.
Understanding of observability principles including metrics, logs, and traces (MLT) and their implementation in distributed systems.
Experience with application performance monitoring (APM) tools and distributed tracing technologies.
Solid programming skills in Python, Java, or similar languages for custom plugin development and automation.
Experience with configuration management tools such as Ansible, Puppet, or Chef.
Knowledge of scripting languages including Bash and PowerShell for operational tasks.
Ability to succeed in a fast-paced, high-demand environment
Excellent oral and written communication skills
Experience working with infrastructure as code technologies such as Terraform is preferred.
Strong analytical and problem-solving capabilities
Excellent project management and organisational skills
Ability to work effectively under pressure and manage multiple priorities
Strong service-oriented mindset with focus on operational excellence
As part of the shortlisting process for this role, you may be required to complete a medical declaration and/or undergo further assessment.
This is a 2-year contract position. All applicants will be notified whether or not they have been shortlisted within 4 weeks of the closing date of this job posting.