Job Description:

SLOs & error budgets - Define, track, and evangelize latency and availability targets for our payment APIs.
Observability - Deploy Cloud Monitoring, Cloud Trace, Error Reporting, and dashboards; integrate alerts via Incident.io and Slack for on-call.
Incident lifecycle - Establish blameless postmortems, guardrails, and runbooks to drive learning and prevent recurrence.
CI/CD golden path - Codify Cloud Build pipelines and automated canary rollouts for Cloud Functions / Cloud Run.
Infrastructure as Code - Manage GCP resources; embed security, IAM least-privilege, and cost controls by default.
Performance & cost tuning - Profile hot paths (BigQuery, Firestore, Pub/Sub) and implement caching or concurrency improvements to keep user latency under 100 ms.
Developer tooling - Eliminate toil by improving local-to-prod parity and secrets management, and by making environments spin up with a single command.
Culture carrier - Instill reliability thinking across engineering and product as the first platform-focused hire.

Requirements:

5+ years of experience building and operating production systems at scale, ideally on Google Cloud or a similar serverless stack, and preferably in fast-paced or startup settings.
Hands-on fluency with Firebase, Cloud Build, Cloud Run/Functions, Pub/Sub, Cloud SQL/Spanner, and VPC Service Controls.
Strong coding skills in Python or Go for automation, with an eye on maintainability.
Demonstrated record of driving observability, on‑call and cost optimisation in a fast‑moving environment.
Excellent collaboration and communication skills to work effectively with cross-functional teams.
Experience in payments, PCI‑DSS, or crypto settlement flows is a bonus.

Tech note: we are 99% serverless. There are no pet VMs to patch, but the stakes are higher: every cold-start, DB connection pool, and retry policy can impact real money transfers. You’ll architect for resiliency and velocity.

Platform & Reliability Engineer
