Key Responsibilities:

Drive the Site Reliability Engineering (SRE) agenda to improve the availability, reliability, and performance of services.
Drive observability for our applications.
Drive the optimise-operate initiative, e.g. reducing operational toil.
Work with application teams to set up service level indicators (SLIs), service level objectives (SLOs), and error budgets for their applications.
Work with the enterprise team to deploy SRE enablers and initiatives.

Requirements:

Strong background in machine learning and deep learning algorithms.
Proficiency in Python for developing generative AI (Gen-AI) models.
Ability to design and implement scalable and efficient AI systems.
Skills in data preprocessing and feature engineering for AI model training.
Ability to stay up to date with the latest advancements in generative AI research and apply them in practice.
Expert-level knowledge of operating systems (AIX, Linux, Wintel, Solaris) for business-as-usual (BAU) support, upgrades, and maintenance.
Knowledge of OS security and hardening.
Knowledge of, and hands-on experience with, patch management.
In-depth knowledge of LVM, SAN allocation, file system extension, and creating new file systems in clustered and non-clustered environments.
ESXi and vSphere systems administration and support, including vMotion, HA, DRS, vCenter Operations Manager, vCenter Service Manager, vCenter Configuration Manager, and Site Recovery Manager.
Administering cloud-based and OpenShift-based infrastructure deployments, including provisioning and de-provisioning of resources.
Supporting audits, infrastructure and network security scans, and disaster recovery and security drills.
Capacity reviews and performance management across all platform systems.
Knowledge of middleware components such as JBoss, Apache, WebSphere Application Server, and MQ.
Knowledge of the SSL certificate procurement and renewal process.
Knowledge of MariaDB, Oracle, and DB2 database support, including backups, restarts, access issues, and upgrades.
Very good understanding of SAN configuration of EMC/Hitachi LUNs on UNIX (AIX, Solaris, Linux) servers.
Manage firewall, GTM, and LTM configuration requests.
Ability to develop simple and complex shell scripts as requirements dictate, including for automation.
Effective at handling crisis calls and critical issues affecting business-critical services.
Proven experience providing technical guidance to teams in a productivity-driven environment.
Experience in at least two of the following areas of IT infrastructure support: production support, application support, and infrastructure support.
Explore, learn, and deploy new technologies that help the company reduce costs or improve operational efficiency.
Excellent troubleshooting and analytical skills.
Communication and interpersonal skills.
Ability to work across cultures and to work 24/7.

Job Type: Full-time

Pay: From $7,132.34 per month

Experience:

Site Reliability Engineer: 3 years (Required)
Python: 1 year (Required)