Responsibilities
As the Head of Observability and Automation (O&A), you will be responsible for leading and overseeing the team that ensures the stability, reliability, and scalability of our banking platforms and services. This role requires a visionary leader with a deep understanding of Site Reliability Engineering (SRE) principles, a strategic mindset, and a hands-on approach to problem-solving. The ideal candidate will have a strong background in software engineering and infrastructure management, and a proven track record of driving reliability improvements in complex technical environments.
Key Responsibilities:
- Leadership & Strategy:
- Develop and execute the overall O&A strategy in alignment with the bank's business goals and technological vision.
- Lead, mentor, and manage a team of engineers, fostering a culture of collaboration, innovation, and continuous improvement.
- Define and implement the O&A framework and policy, standardizing reliability practices across the organization.
- Establish and lead a Centre of Excellence (COE) for O&A, promoting best practices and continuous learning grounded in SRE principles and practices.
- Reliability & Performance:
- Design, implement, and maintain robust monitoring and alerting systems to ensure high availability and performance of banking services.
- Drive the adoption of best practices in system design, capacity planning, and performance optimization.
- Identify and mitigate potential risks to system reliability, proactively addressing issues before they impact customers.
- Monitoring & Analysis:
- Develop and implement monitoring & analysis strategies to proactively identify and address potential issues within the technology infrastructure.
- Utilize data-driven insights to optimize system performance, reliability, and scalability.
- Collaborate with cross-functional teams to establish monitoring tools and metrics, ensuring alignment with business objectives.
- Automation & Tooling:
- Champion automation efforts to streamline operational processes, reduce manual intervention, and increase system efficiency.
- Lead and implement AI/ML initiatives to transform our O&A landscape.
- Collaboration & Communication:
- Work closely with the application & infrastructure teams to ensure that reliability is built into the architecture and design of new features and services.
- Communicate reliability goals, progress, and challenges to executive leadership and other stakeholders.
- Promote a culture of transparency and accountability within the team and across the organization.
- Vendor Management:
- Manage relationships with external vendors and service providers, ensuring their services align with the bank’s reliability and performance standards.
- Negotiate contracts and service level agreements (SLAs) with vendors to secure favourable terms and ensure accountability.
- Continuously evaluate vendor performance, addressing any issues and exploring new opportunities to optimize service delivery.
- Collaborate with vendors to stay abreast of the latest O&A technologies and tools that can enhance the bank’s infrastructure and reliability.
Qualifications:
- Education & Experience:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Minimum 10 years of experience in software engineering, infrastructure management, or a related technical field.
- Minimum 5 years of experience in a leadership role within an SRE or DevOps team, preferably in the banking or financial services industry.
- Technical Skills:
- Proficiency in programming languages such as Python, Go, Java, or similar.
- Strong understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- Experience with CI/CD pipelines.
- Deep knowledge of monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.).
- Soft Skills:
- Excellent leadership, mentoring, and team-building skills.
- Strong problem-solving and analytical abilities.
- Effective communication and interpersonal skills, with the ability to convey complex technical concepts to non-technical stakeholders.
- Strategic thinking and a proactive approach to identifying and addressing potential issues.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.