We are seeking an experienced DevOps Engineer to join our dynamic team in Singapore. The ideal candidate will be responsible for designing, implementing, and maintaining our cloud infrastructure, CI/CD pipelines, and monitoring systems to ensure high availability, scalability, and security of our services.
Required Technical Skills
Core Programming & Frameworks:
- Proficiency in Python with hands-on experience in FastAPI or Flask frameworks
- Strong understanding of RESTful API development and microservices architecture
Containerization & Orchestration:
- Extensive experience with Docker and Docker Compose
- Container lifecycle management and optimization
Google Cloud Platform (GCP) Infrastructure:
- Automated backup solutions on Google Cloud Storage (GCS)
- Google Container Registry for private Docker repositories
- Google Artifact Registry for private PyPI package management
- Cloud IAM for access control and permission management
- VPC Firewall Rules configuration and network security
- Cloud Load Balancing and Cloud CDN setup
MySQL/PostgreSQL Database Administration:
- Database performance monitoring and query optimization
- Disk space management and storage optimization
- Automated backup strategies and data archival/purging policies
- Database maintenance and cleanup procedures
Logging & Observability:
- Implementation of centralized logging solutions (ELK Stack, Cloud Logging)
- Log aggregation, parsing, and visualization using tools like Grafana, Kibana, or Cloud Monitoring
- Structured logging best practices
Monitoring & Alerting:
- Performance monitoring setup using Prometheus, Grafana, or Cloud Monitoring
- Application Performance Monitoring (APM) tools integration
- Alert configuration and incident response automation
- SLA/SLO monitoring and reporting
Message Queue Management:
- Apache Kafka cluster setup, configuration, and maintenance
- Topic management, partition optimization, and consumer group monitoring
- Kafka Connect and Schema Registry management
CI/CD Pipeline:
- Design and implementation of automated deployment pipelines
- GitHub Actions CI/CD integration and pipeline optimization
- Automated testing integration and deployment strategies
- Blue-green and canary deployment patterns
Network & Security:
- Cloudflare Tunnel (cloudflared) configuration and management
- SSH tunneling and secure remote access solutions
- Nginx web server configuration and optimization
- systemd service management and daemon configuration
- Network troubleshooting and performance optimization
Preferred Additional Skills
Container Orchestration:
- Kubernetes (K8s) deployment and cluster management experience
- Helm chart creation and management
- Pod autoscaling and resource optimization
Task Management & Automation:
- Celery (Python) for distributed task queue management
- Cron job centralization and monitoring
- Workflow orchestration tools experience
Data Analysis & Documentation:
- Jupyter Notebook and pandas for data analysis and reporting
- Data pipeline automation and ETL processes
Knowledge Management:
- Experience creating and maintaining internal wikis or runbooks
- Technical documentation and knowledge-sharing platforms
- GitBook, Confluence, or similar documentation tools
Additional Technical Areas (Nice to Have)
- Redis caching solutions
- Cost optimization and resource management in cloud environments
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field
- 3+ years of experience in DevOps, Site Reliability Engineering, or related roles
- Strong problem-solving skills and ability to work in a fast-paced environment
- Excellent communication skills
- Experience working in Agile/Scrum development environments