Key Responsibilities
- Design, develop, and maintain data pipelines, ETL/ELT processes, and data integration workflows.
- Architect and optimize data lakes, data warehouses, and streaming platforms.
- Work with structured, semi-structured, and unstructured data at scale.
- Implement real-time and batch data processing solutions.
- Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality data solutions.
- Ensure data security, lineage, governance, and compliance across platforms.
- Optimize queries, data models, and storage for performance and cost efficiency.
- Automate processes and adopt DevOps/DataOps practices for CI/CD in data engineering.
- Troubleshoot complex data-related issues and resolve production incidents.
- Mentor junior engineers and contribute to technical strategy and best practices.
Technical Skills (Must-Have Requirements)
Programming & Scripting
- Proficiency in Python, Scala, or Java for data engineering.
- Strong SQL skills (query optimization, tuning, advanced joins, window functions).
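For illustration, a minimal PySpark sketch of the window-function work the SQL bullet above implies; the table and column names are hypothetical:

```python
# Minimal sketch: keep each customer's most recent order using a window
# function, i.e. ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).
# Assumes a local PySpark install; all names are made up for illustration.
from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("window-demo").getOrCreate()

orders = spark.createDataFrame(
    [("c1", "2024-01-05", 120.0), ("c1", "2024-02-11", 80.0),
     ("c2", "2024-01-20", 200.0)],
    ["customer_id", "order_date", "amount"],
)

# Rank rows within each customer, newest first, then keep rank 1.
w = Window.partitionBy("customer_id").orderBy(F.col("order_date").desc())
latest = orders.withColumn("rn", F.row_number().over(w)).filter("rn = 1")
latest.show()
```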
Big Data & Distributed Systems
- Expertise with Apache Spark, Hadoop, Hive, HBase, Flink, Kafka.
- Hands-on with streaming frameworks (Kafka Streams, Spark Streaming, Flink).
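As a sketch of the streaming side, reading a Kafka topic with Spark Structured Streaming; the broker address and topic name are assumptions, and the spark-sql-kafka connector package must be available:

```python
# Minimal sketch of a streaming pipeline: consume a Kafka topic with Spark
# Structured Streaming and maintain a running count per key.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
          .option("subscribe", "clickstream")                # hypothetical topic
          .load())

# Kafka keys/values arrive as bytes; cast before aggregating.
counts = (events.selectExpr("CAST(key AS STRING) AS key")
          .groupBy("key").count())

# "complete" output mode re-emits the full aggregate each trigger.
query = (counts.writeStream.outputMode("complete")
         .format("console").start())
query.awaitTermination()
```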
Cloud & Data Platforms
- Deep knowledge of AWS (Redshift, Glue, EMR, Athena, S3, Kinesis), Azure (Synapse, Data Factory, Databricks, ADLS), or GCP (BigQuery, Dataflow, Pub/Sub, Dataproc).
- Experience with Snowflake, Databricks, or Teradata.
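For illustration, a minimal boto3 sketch of running an Athena query over data in S3; the database, table, bucket, and region are hypothetical, and AWS credentials are assumed to be configured in the environment:

```python
# Minimal sketch: submit a serverless SQL query to Athena with boto3.
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region is an assumption

response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) FROM events GROUP BY event_type",
    QueryExecutionContext={"Database": "analytics"},  # hypothetical database
    ResultConfiguration={
        "OutputLocation": "s3://my-bucket/athena-results/"  # hypothetical bucket
    },
)
print(response["QueryExecutionId"])  # poll get_query_execution with this id
```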
ETL/ELT & Orchestration
- Strong experience with orchestration tools such as Airflow, Luigi, Azkaban, or Prefect.
- Experience with ETL tools such as Informatica, Talend, or SSIS.
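A minimal Airflow (2.x) sketch of the orchestration pattern these tools share, a DAG chaining extract, transform, and load stubs; the dag_id and schedule are illustrative only:

```python
# Minimal sketch of an Airflow DAG wiring extract -> transform -> load.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull from the source system

def transform():
    ...  # clean and reshape

def load():
    ...  # write to the warehouse

with DAG(dag_id="example_etl", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3  # declare task dependencies
```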
Data Modeling & Storage
- Experience with Data Lakes, Data Warehouses, and Lakehouse architectures.
- Knowledge of Star Schema, Snowflake Schema, and Normalization/Denormalization.
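For concreteness, a minimal star-schema sketch: one fact table keyed to two dimension tables. sqlite3 is used only so the DDL runs anywhere; a real warehouse would use its own dialect, distribution keys, and so on:

```python
# Minimal sketch of a star schema: facts in the middle, dimensions around it.
import sqlite3

ddl = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT,
                          month INTEGER, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, sku TEXT,
                          category TEXT);
CREATE TABLE fact_sales  (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    revenue     REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)  # measures live in fact_sales; context in the dims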
DevOps & Automation
- Proficiency in CI/CD (Jenkins, GitLab, Azure DevOps) for data pipelines.
- Experience with Docker, Kubernetes, Terraform, Ansible for infrastructure automation.
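One concrete DataOps practice behind the CI/CD bullet is unit-testing transformations so a pipeline can ship automatically on every commit; a minimal sketch, with a made-up deduplication step as the unit under test:

```python
# Minimal sketch of a pipeline unit test that a CI job (Jenkins, GitLab CI,
# Azure DevOps) could run on each commit. The transformation is illustrative.
def dedupe_latest(rows):
    """Keep one row per id, preferring the greatest 'ts' value."""
    latest = {}
    for row in rows:
        key = row["id"]
        if key not in latest or row["ts"] > latest[key]["ts"]:
            latest[key] = row
    return list(latest.values())

def test_dedupe_latest():
    rows = [{"id": 1, "ts": 1}, {"id": 1, "ts": 2}, {"id": 2, "ts": 1}]
    out = dedupe_latest(rows)
    assert len(out) == 2
    assert {"id": 1, "ts": 2} in out

if __name__ == "__main__":
    test_dedupe_latest()
    print("ok")
```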
Other Key Skills
- Strong knowledge of Data Governance, MDM, Data Quality, and Metadata Management.
- Familiarity with Graph Databases (Neo4j) and Time-Series Databases (InfluxDB, TimescaleDB).
- Understanding of machine learning data pipelines (feature engineering, model serving).
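As a sketch of the ML-pipeline side, a leakage-safe rolling feature built with pandas; the column names are hypothetical:

```python
# Minimal sketch of feature engineering for an ML data pipeline: a rolling
# aggregate per user computed only from *past* events.
import pandas as pd

events = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03",
                          "2024-01-01"]),
    "amount": [10.0, 20.0, 30.0, 5.0],
})

events = events.sort_values(["user_id", "ts"])
# Shift by one before rolling so each row only sees earlier events,
# avoiding target leakage when this column is fed to a model.
events["avg_amount_prev2"] = (
    events.groupby("user_id")["amount"]
    .transform(lambda s: s.shift(1).rolling(2, min_periods=1).mean())
)
print(events)
```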
Qualifications
- Bachelor’s/Master’s degree in Computer Science, Data Engineering, or related field.
- 7–10 years of experience in data engineering or big data development.
- At least 2–3 large-scale, end-to-end data platform implementations.
- Preferred certifications:
  - AWS Certified Data Analytics – Specialty
  - Google Professional Data Engineer
  - Databricks Certified Data Engineer