Key Responsibilities
- Design, develop, and maintain data pipelines, ETL/ELT processes, and data integration workflows.
- Architect and optimize data lakes, data warehouses, and streaming platforms.
- Work with structured, semi-structured, and unstructured data at scale.
- Implement real-time and batch data processing solutions.
- Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality data solutions.
- Ensure data security, lineage, governance, and compliance across platforms.
- Optimize queries, data models, and storage for performance and cost efficiency.
- Automate processes and adopt DevOps/DataOps practices for CI/CD in data engineering.
- Troubleshoot complex data-related issues and resolve production incidents.
- Mentor junior engineers and contribute to technical strategy and best practices.
Technical Skills (Must-Have Requirements)
Programming & Scripting
- Proficiency in Python, Scala, or Java for data engineering.
- Strong SQL skills (query optimization, tuning, advanced joins, window functions).
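For illustration, a minimal PySpark sketch of the window-function work the SQL bullet above implies; the table and column names are hypothetical:

```python
# Minimal sketch: keep each customer's most recent order using a window
# function, i.e. ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).
# Assumes a local PySpark install; all names are made up for illustration.
from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("window-demo").getOrCreate()

orders = spark.createDataFrame(
    [("c1", "2024-01-05", 120.0), ("c1", "2024-02-11", 80.0),
     ("c2", "2024-01-20", 200.0)],
    ["customer_id", "order_date", "amount"],
)

# Rank rows within each customer, newest first, then keep rank 1.
w = Window.partitionBy("customer_id").orderBy(F.col("order_date").desc())
latest = orders.withColumn("rn", F.row_number().over(w)).filter("rn = 1")
latest.show()
```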
Big Data & Distributed Systems
- Expertise with Apache Spark, Hadoop, Hive, HBase, Flink, Kafka.
- Hands-on with streaming frameworks (Kafka Streams, Spark Streaming, Flink).
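As a sketch of the streaming side, reading a Kafka topic with Spark Structured Streaming; the broker address and topic name are assumptions, and the spark-sql-kafka connector package must be available:

```python
# Minimal sketch of a streaming pipeline: consume a Kafka topic with Spark
# Structured Streaming and maintain a running count per key.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
          .option("subscribe", "clickstream")                # hypothetical topic
          .load())

# Kafka keys/values arrive as bytes; cast before aggregating.
counts = (events.selectExpr("CAST(key AS STRING) AS key")
          .groupBy("key").count())

# "complete" output mode re-emits the full aggregate each trigger.
query = (counts.writeStream.outputMode("complete")
         .format("console").start())
query.awaitTermination()
```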
Cloud & Data Platforms
- Deep knowledge of AWS (Redshift, Glue, EMR, Athena, S3, Kinesis), Azure (Synapse, Data Factory, Databricks, ADLS), or GCP (BigQuery, Dataflow, Pub/Sub, Dataproc).
- Experience with Snowflake, Databricks, or Teradata.
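For illustration, a minimal boto3 sketch of running an Athena query over data in S3; the database, table, bucket, and region are hypothetical, and AWS credentials are assumed to be configured in the environment:

```python
# Minimal sketch: submit a serverless SQL query to Athena with boto3.
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region is an assumption

response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) FROM events GROUP BY event_type",
    QueryExecutionContext={"Database": "analytics"},  # hypothetical database
    ResultConfiguration={
        "OutputLocation": "s3://my-bucket/athena-results/"  # hypothetical bucket
    },
)
print(response["QueryExecutionId"])  # poll get_query_execution with this id
```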
ETL/ELT & Orchestration
- Strong experience with orchestration tools such as Airflow, Luigi, Azkaban, or Prefect.
- Experience with ETL tools such as Informatica, Talend, or SSIS.
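A minimal Airflow (2.x) sketch of the orchestration pattern these tools share, a DAG chaining extract, transform, and load stubs; the dag_id and schedule are illustrative only:

```python
# Minimal sketch of an Airflow DAG wiring extract -> transform -> load.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull from the source system

def transform():
    ...  # clean and reshape

def load():
    ...  # write to the warehouse

with DAG(dag_id="example_etl", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3  # declare task dependencies
```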
Data Modeling & Storage
- Experience with Data Lakes, Data Warehouses, and Lakehouse architectures.
- Knowledge of Star Schema, Snowflake Schema, and Normalization/Denormalization.
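For concreteness, a minimal star-schema sketch: one fact table keyed to two dimension tables. sqlite3 is used only so the DDL runs anywhere; a real warehouse would use its own dialect, distribution keys, and so on:

```python
# Minimal sketch of a star schema: facts in the middle, dimensions around it.
import sqlite3

ddl = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT,
                          month INTEGER, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, sku TEXT,
                          category TEXT);
CREATE TABLE fact_sales  (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    revenue     REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)  # measures live in fact_sales; context in the dims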
DevOps & Automation
- Proficiency in CI/CD (Jenkins, GitLab, Azure DevOps) for data pipelines.
- Experience with Docker, Kubernetes, Terraform, Ansible for infrastructure automation.
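One concrete DataOps practice behind the CI/CD bullet is unit-testing transformations so a pipeline can ship automatically on every commit; a minimal sketch, with a made-up deduplication step as the unit under test:

```python
# Minimal sketch of a pipeline unit test that a CI job (Jenkins, GitLab CI,
# Azure DevOps) could run on each commit. The transformation is illustrative.
def dedupe_latest(rows):
    """Keep one row per id, preferring the greatest 'ts' value."""
    latest = {}
    for row in rows:
        key = row["id"]
        if key not in latest or row["ts"] > latest[key]["ts"]:
            latest[key] = row
    return list(latest.values())

def test_dedupe_latest():
    rows = [{"id": 1, "ts": 1}, {"id": 1, "ts": 2}, {"id": 2, "ts": 1}]
    out = dedupe_latest(rows)
    assert len(out) == 2
    assert {"id": 1, "ts": 2} in out

if __name__ == "__main__":
    test_dedupe_latest()
    print("ok")
```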
Other Key Skills
- Strong knowledge of Data Governance, MDM, Data Quality, and Metadata Management.
- Familiarity with Graph Databases (Neo4j) and Time-Series Databases (InfluxDB, TimescaleDB).
- Understanding of machine learning data pipelines (feature engineering, model serving).
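As a sketch of the ML-pipeline side, a leakage-safe rolling feature built with pandas; the column names are hypothetical:

```python
# Minimal sketch of feature engineering for an ML data pipeline: a rolling
# aggregate per user computed only from *past* events.
import pandas as pd

events = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03",
                          "2024-01-01"]),
    "amount": [10.0, 20.0, 30.0, 5.0],
})

events = events.sort_values(["user_id", "ts"])
# Shift by one before rolling so each row only sees earlier events,
# avoiding target leakage when this column is fed to a model.
events["avg_amount_prev2"] = (
    events.groupby("user_id")["amount"]
    .transform(lambda s: s.shift(1).rolling(2, min_periods=1).mean())
)
print(events)
```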
Qualifications
- Bachelor’s/Master’s degree in Computer Science, Data Engineering, or related field.
- 7–10 years of experience in data engineering or big data development.
- At least 2–3 large-scale, end-to-end data platform implementations.
- Preferred certifications:
  - AWS Certified Data Analytics – Specialty
  - Google Professional Data Engineer
  - Databricks Certified Data Engineer