Key Responsibilities:
- Drive the Site Reliability Engineering (SRE) agenda to improve the availability, reliability, and performance of services.
- Drive observability for our applications.
- Drive the optimise-operate initiative, for example by reducing operational toil.
- Work with application teams to set up SLIs, SLOs, and error budgets for their applications.
- Work with the enterprise team to deploy SRE enablers and initiatives.
- Strong background in machine learning and deep learning algorithms.
- Proficiency in Python for developing Gen-AI models.
- Ability to design and implement scalable and efficient AI systems.
- Skills in data preprocessing and feature engineering for AI model training.
- Ability to stay updated with the latest advancements in generative AI research and incorporate them into work.
- Expert-level knowledge of multiple operating systems (AIX, Linux, Wintel, Solaris) for BAU support, upgrades, and maintenance.
- Knowledge of OS security and hardening.
- Knowledge of, or hands-on experience with, patch management.
- In-depth knowledge of LVM, SAN allocation, file system extension, and creating new file systems in clustered and non-clustered environments.
- ESXi and vSphere systems administration and support, including vMotion, HA, DRS, vCenter Operations Manager, vCenter Service Manager, vCenter Configuration Manager, and Site Recovery Manager.
- Administering cloud-based and OpenShift-based infrastructure deployments. Administration tasks include provisioning and de-provisioning of resources.
- Support audits, infrastructure and network security scans, and disaster recovery and security-related drills.
- Capacity review & performance management across all platform systems.
- Knowledge of middleware components such as JBoss, Apache, WebSphere Application Server, and MQ.
- Knowledge of the SSL certificate procurement and renewal process.
- Knowledge of MariaDB, Oracle, and DB2 databases, including backups, DB restarts, access issues, and DB upgrade support.
- Very good understanding of SAN configuration, including EMC/Hitachi LUNs on UNIX (AIX/Solaris/Linux) servers.
- Manage firewall, GTM, and LTM configuration requests.
- Ability to develop simple and complex shell scripts to meet requirements and automate tasks.
- Effective in dealing with crisis calls / critical issues for business-critical services.
- Proven experience providing technical guidance to teams in a productivity-driven environment.
- Experience in at least two areas of IT infrastructure support, i.e. Production Support, Application Support, and Infrastructure Support.
- Explore, learn and deploy new technologies that will help the company to reduce cost or improve operational efficiencies.
- Excellent troubleshooting and analytical skills.
- Communication and interpersonal skills.
- Able to work across cultures and on a 24x7 basis.
Job Type: Full-time
Pay: From $7,132.34 per month
Experience:
- Site Reliability Engineer: 3 years (Required)
- Python: 1 year (Required)