Responsibilities:
- Design, develop, and maintain scalable data pipelines using Apache Spark.
- Write clean, efficient, and reusable code in Java or Scala.
- Work with HBase and Hive for data storage, processing, and retrieval in a distributed environment.
- Optimize queries and data operations using advanced SQL techniques.
- Ensure high availability and scalability of distributed data processing systems.
- Collaborate with cross-functional engineering teams to integrate data systems with microservices.
- Implement best practices in data engineering and ensure data quality, integrity, and consistency.
- Work in Unix/Linux environments to manage data infrastructure and automation.
- Participate in system architecture design discussions and recommend solutions to improve data platform efficiency.
- Provide support during development, testing, deployment, and operational phases of data solutions.
Required Skills & Qualifications:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- 7–9 years of experience in data engineering or backend systems development.
- Strong hands-on experience with Apache Spark.
- Proficiency in Java or Scala (mandatory – Python developers will not be considered).
- Experience with HBase and Hive in production environments.
- Solid understanding of SQL and relational database systems.
- In-depth knowledge of distributed systems architecture and scalable system design.
- Experience working in Unix/Linux environments.
- Familiarity with the Spring Framework and microservices-based architecture.
- Strong problem-solving, debugging, and analytical skills.
- Excellent communication skills and the ability to work in collaborative, cross-functional teams.
- Willingness to work on weekends or public holidays, when required, for production deployments and cutover activities.
Nice to Have:
- Exposure to Kafka or similar streaming platforms.
- Experience with cloud platforms (AWS, GCP, Azure) for data infrastructure.
- Familiarity with CI/CD pipelines in data engineering projects.