WHO IS KLAARA?
KLAARA provides AI-driven solutions tailored to the needs of customers in capital markets, insurance, and reinsurance, as well as various other organizations. Our cutting-edge platform empowers businesses to transform their unstructured data into actionable intelligence, fuelling better decision-making and driving competitive advantage.
KEY RESPONSIBILITIES:
We are seeking a highly skilled Solution Architect to lead the Infrastructure and Cloud Architecture of our AI-driven platform, particularly in the financial and banking sectors. In this role, you will be responsible for designing scalable, secure, and resilient cloud-native environments that support complex AI/ML workloads within the firm and at customer sites, ensuring optimal instrumentation, scalability, and compliance with security standards.
You will collaborate closely with cross-functional teams including Machine Learning Engineering, Security, Software Engineers, Business Analysts, and Customers to ensure seamless delivery, performance testing, and documentation of our solutions. This is a hands-on role requiring strong technical acumen, cross-functional teamwork, and occasional code fixes.
You are expected to:
- Design, implement, and manage scalable cloud infrastructure (AWS, Azure, or GCP) tailored for AI/ML workloads.
- Lead and execute the implementation of software deployment packages at customer sites, ensuring robust instrumentation and scalable performance.
- Design, develop, and execute load and scalability tests to validate system reliability under various conditions.
- Implement and maintain monitoring and alerting systems to proactively identify and resolve issues in development, staging, and production environments.
- Participate in incident response and root cause analysis, and contribute to continuous improvement of system reliability.
- Develop and maintain automation scripts and tools to streamline deployment, infrastructure provisioning, and routine operational tasks.
- Create and maintain clear, comprehensive documentation for customers on deployment, configuration, and operations.
- Collaborate with internal development teams, business analysts, and customers to align technical solutions with business requirements.
- Troubleshoot and resolve deployment and integration issues, occasionally contributing fixes in TypeScript or Rust.
- Ensure compliance with security and regulatory requirements, particularly in Banking environments.
- Maintain and optimize CI/CD and MLOps pipelines and deployment automation using Jenkins and Atlassian tools.
- Manage and maintain development, staging, and production environments, ensuring consistency and high availability.
- Support and manage infrastructure components using Linux, Docker, and Kubernetes.
- Integrate and manage application components such as Kong, OpenTelemetry, Kafka, RabbitMQ, and MySQL.
- Work with systems designed using microservice architecture, ensuring seamless deployment and integration across services.
- Utilize Infrastructure as Code (IaC) tools such as Terraform, Ansible, or Helm to provision and manage infrastructure effectively.
REQUIREMENTS
- Ph.D. or Master's degree in Information Technology, Computer Science, Engineering, or a related field.
- Minimum of 10 years of experience in Cloud Operations, Infrastructure Engineering, and Automation, with at least 5 years in AWS or similar Cloud operations (Azure, GCP), preferably within the Banking or Financial Services industry.
- Proven experience with CI/CD tools such as Jenkins and the Atlassian suite (Bitbucket, Jira, Confluence).
- Solid understanding of mainstream cloud products and services including Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and OpenShift.
- Hands-on experience with Linux, Docker, Kubernetes, and cloud-native deployment practices.
- Solid understanding of microservice architecture and experience deploying and managing distributed systems.
- Experience with monitoring and observability tools, especially OpenTelemetry, and managing alerts and incident response processes.
- Proficiency in automation and scripting (e.g., Bash, Python, or similar) for deployment and infrastructure tasks.
- Experience building and implementing Infrastructure as Code (IaC) with tools such as Terraform or OpenTofu.
- Experience with API- and microservices-based architecture patterns for deploying ML models in the cloud.
- Familiarity with API gateways (e.g., Kong), message brokers (Kafka, RabbitMQ), and databases (MySQL, PostgreSQL).
- Ability to design and execute performance, load, and scalability tests.
- Basic proficiency in TypeScript and/or Rust, sufficient for reading code and implementing minor fixes.
- Strong understanding of security, compliance, and risk management practices in financial services or banking.
- Excellent communication and collaboration skills; able to interact with technical and non-technical stakeholders.
Job Types: Full-time, Permanent
Pay: Up to $15,663.10 per month
Supplemental Pay:
- Attendance bonus
- Performance bonus
Education:
- Bachelor's or equivalent (Preferred)
Work Location: Hybrid remote in Downtown Core