AWS Data Engineer

2 months ago


Gurugram, India Infogain Full time

AWS Data Engineer (Senior) with skills Data Engineering, Kafka, Python, Scala, PostgreSQL Development, AWS - EKS, AWS - CloudFormation, Data Modeling, ETL, Apache Hive, AWS-Apps, AWS-Infra, Apache Airflow, SQL, Datadog, Splunk, Apache Spark, AWS DBA for location Gurugram, India

Posted on: July 30

ROLES & RESPONSIBILITIES

We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. The ideal candidate will have extensive experience in ETL, Data Modelling, and Data Architecture. Proficiency in ETL optimization, designing, coding, and tuning big data processes using Scala is essential, along with hands-on experience in stream data processing using Spark, Kafka, and Spark Structured Streaming.

Additionally, the candidate should have extensive experience in building data platforms using a variety of technologies, including Scala, SQL/PLSQL, PostgreSQL, SQL Server, Teradata, Spark, Spark Structured Streaming, Kafka, Parquet/ORC, Data Modelling (Relational, Dimensional, and E-R Modelling), ETL, RDS (PostgreSQL, MySQL), Splunk, Datadog, Airflow, Git, CI/CD (Jenkins), JIRA, Confluence, IntelliJ IDEA, Agile (Scrum/Kanban), on-call and operations, code review, the RCP Framework, Querybook, build, deployment, and release processes (CI/CD), Backstage, PagerDuty, and Spinnaker.

Key Responsibilities:

Hands-on experience developing a data platform and its components: data lake, cloud data warehouse, APIs, and batch and streaming data pipelines. Experience building data pipelines and applications to stream and process large datasets at low latency.

· Develop and maintain batch and stream processing data solutions using Apache Spark, Kafka, and Spark Structured Streaming.

· Work on orchestration using Airflow to automate and manage data workflows.

· Utilize project management tools like JIRA and Confluence to track progress and collaborate with the team.

· Develop data processing workflows utilizing Spark, SQL/PLSQL, and Scala to transform and cleanse raw data into a usable format.

· Implement data storage solutions leveraging Parquet/ORC formats on platforms such as PostgreSQL, SQL Server, Teradata, and RDS (PostgreSQL, MySQL).

· Optimize data storage and retrieval performance through efficient data modelling techniques, including Relational, Dimensional, and E-R modelling.

· Maintain data integrity and quality by implementing robust validation and error handling mechanisms within ETL processes.

· Automate deployment processes using CI/CD tools like Jenkins and Spinnaker to ensure reliable and consistent releases.

· Monitor and troubleshoot data pipelines using monitoring tools like Datadog and Splunk to identify performance bottlenecks and ensure system reliability.

· Participate in Agile development methodologies such as Scrum/Kanban, including sprint planning, daily stand-ups, and retrospective meetings.

· Conduct code reviews to ensure adherence to coding standards, best practices, and scalability considerations.

· Manage and maintain documentation using tools like Confluence to ensure clear and up-to-date documentation of data pipelines, schemas, and processes.

· Provide on-call support for production data pipelines, responding to incidents and resolving issues in a timely manner.

· Collaborate with cross-functional teams including developers, data scientists, and operations teams to address complex data engineering challenges.

· Stay updated on emerging technologies and industry trends to continuously improve data engineering processes and tools.

· Contribute to the development of reusable components and frameworks to streamline data engineering tasks across projects.

· Utilize version control systems like Git to manage codebase and collaborate effectively with team members.

· Leverage IDEs like IntelliJ IDEA for efficient development and debugging of data engineering code.

· Adhere to security best practices in handling sensitive data and implementing access controls within the data lake environment.
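The validation and error-handling responsibility above can be sketched in plain Python. This is a minimal, illustrative pattern only; the record schema, field names, and validation rules below are invented for the example and are not part of the role's actual stack:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Order:
    # Hypothetical record schema; the posting does not specify real schemas.
    order_id: str
    amount: float

def validate(raw: dict) -> Tuple[Optional[Order], Optional[str]]:
    """Return (record, None) on success, or (None, reason) on failure."""
    if not raw.get("order_id"):
        return None, "missing order_id"
    try:
        amount = float(raw["amount"])
    except (KeyError, TypeError, ValueError):
        return None, "bad amount"
    if amount < 0:
        return None, "negative amount"
    return Order(raw["order_id"], amount), None

def run_batch(rows):
    """Split a batch into clean records and a quarantine list of rejects."""
    good, rejected = [], []
    for row in rows:
        record, reason = validate(row)
        if reason is None:
            good.append(record)
        else:
            rejected.append((row, reason))
    return good, rejected
```

In a production pipeline the same split would typically be expressed over Spark DataFrames, with rejected rows written to a quarantine table for inspection rather than returned in memory.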
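The dimensional-modelling bullet can likewise be illustrated with a toy star-schema load in Python: a dimension table keyed by a surrogate key, and a fact loader that resolves natural keys to surrogate keys. All table, column, and function names here are hypothetical:

```python
# Minimal star-schema sketch. In a real warehouse these would be tables,
# not in-memory lists of dicts.
customer_dim = [
    {"customer_sk": 1, "customer_id": "C100", "segment": "retail"},
    {"customer_sk": 2, "customer_id": "C200", "segment": "wholesale"},
]

def build_sk_lookup(dim_rows, natural_key, surrogate_key):
    """Map natural keys to surrogate keys for fact-table loading."""
    return {row[natural_key]: row[surrogate_key] for row in dim_rows}

def load_fact(raw_sales, sk_lookup):
    """Attach surrogate keys; rows with unknown customers go to a reject list."""
    facts, rejects = [], []
    for sale in raw_sales:
        sk = sk_lookup.get(sale["customer_id"])
        if sk is None:
            rejects.append(sale)
        else:
            facts.append({"customer_sk": sk, "amount": sale["amount"]})
    return facts, rejects
```

At warehouse scale this lookup becomes a join against the dimension table in SQL or Spark, but the surrogate-key resolution step is the same idea.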

Good-to-Know Skills:

· Programming Languages: Python, Bash/Unix/Linux

· Big Data Technologies: Hive, Avro, Apache Iceberg, Delta Format

· Cloud Services: EC2, ECS, S3, SNS, SQS, CloudWatch

· Databases: DynamoDB, Redis

· Containerization and Orchestration: Docker, Kubernetes

· Developer Tools: GitHub Copilot

· Additional Skills: Maven, CLI/SDK

Nice-to-Have Skills:

· Networking: Subnets, Routes

· Big Data Technologies: Flink


EXPERIENCE

6-8 Years

SKILLS

Primary Skill: Data Engineering Sub Skill(s): Data Engineering Additional Skill(s): Kafka, Python, Scala, PostgreSQL Development, AWS - EKS, AWS - CloudFormation, Data Modeling, ETL, Apache Hive, AWS-Apps, AWS-Infra, Apache Airflow, SQL, Datadog, Splunk, Apache Spark, AWS DBA

ABOUT THE COMPANY

Infogain is a human-centered digital platform and software engineering company based out of Silicon Valley. We engineer business outcomes for Fortune companies and digital natives in the technology, healthcare, insurance, travel, telecom, and retail & CPG industries using technologies such as cloud, microservices, automation, IoT, and artificial intelligence. We accelerate experience-led transformation in the delivery of digital platforms. Infogain is also a Microsoft (NASDAQ: MSFT) Gold Partner and Azure Expert Managed Services Provider (MSP).

Infogain, an Apax Funds portfolio company, has offices in California, Washington, Texas, the UK, the UAE, and Singapore, with delivery centers in Seattle, Houston, Austin, Kraków, Noida, Gurgaon, Mumbai, Pune, and Bengaluru.

