Databricks Engineer
Company Description
EXL is a digital transformation partner that focuses on collaboration and tailoring solutions to meet the unique needs of each client. With expertise in transformation, data science, and change management, we help businesses make better decisions, improve customer relationships, and enhance revenue growth. Our approach involves listening, learning, and adapting methodologies to drive intelligence into digital operations and deliver successful outcomes.
Role Description
We are seeking a highly motivated and experienced Senior Databricks Engineer with a proven track record of building robust data pipelines and data warehousing solutions within the Databricks ecosystem. The ideal candidate will be a self-driven individual with a deep understanding of data modeling, a passion for data quality, and expertise in implementing end-to-end Databricks projects. This role will be instrumental in leveraging Databricks' cutting-edge technologies to transform our data landscape and drive data-informed decision-making within the healthcare sector.
Responsibilities:
- Databricks Data Pipeline Development: Design, develop, and maintain complex data pipelines using Delta Live Tables (DLT) to support SCD Type 1 and SCD Type 2 requirements (see the DLT sketch after this list).
- ADF Pipelines: Design and implement complex Azure Data Factory pipelines that handle diverse data sources and transformations, drawing on an in-depth understanding of activities, triggers, and control flow (a run-trigger sketch follows this list).
- Data Integration: Connect to varied data sources (databases, files, cloud storage) and transform data using ADF activities or external compute (Databricks, Azure Functions), applying established integration patterns and best practices.
- Data Modeling: Design and implement efficient and scalable data models for data warehouses and data marts.
- Data Warehousing: Build and maintain data warehouses utilizing Databricks' capabilities, ensuring data integrity and performance.
- Delta Lake Expertise: Leverage Delta Lake features such as ACID transactions, time travel, and schema evolution for robust data management (see the time-travel sketch below).
- Unity Catalog: Use Unity Catalog for data governance, access control, and data discovery within the Databricks environment (a grants sketch follows this list).
- Delta Sharing: Facilitate secure, governed data sharing with internal and external stakeholders using Delta Sharing (see the sharing sketch below).
- CI/CD: Implement and manage continuous integration and deployment processes for Databricks notebooks and projects (a deployment sketch follows this list).
- Data Governance: Implement data masking and other governance practices using Unity Catalog to safeguard sensitive data (see the column-mask sketch below).
- Azure Ecosystem: Work with related Azure services such as Azure Blob Storage, Azure SQL Database, Azure Data Lake Storage, and Azure Synapse Analytics.
- ADF Data Flows: Design and optimize mapping and wrangling data flows for efficient data transformations.
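To make the expectations concrete, the sketches below illustrate several of the items above. Starting with the DLT bullet: a minimal sketch of an SCD Type 2 pipeline, assuming a hypothetical patients_raw change feed keyed by patient_id and ordered by updated_at (this code runs only inside a Delta Live Tables pipeline):

```python
import dlt
from pyspark.sql.functions import col

# Target streaming table that will hold the slowly changing dimension.
dlt.create_streaming_table("patients_scd2")

# Apply upserts from the change feed; stored_as_scd_type=2 keeps full
# history (__START_AT/__END_AT columns), while 1 would overwrite in place.
dlt.apply_changes(
    target="patients_scd2",
    source="patients_raw",          # hypothetical upstream DLT table or view
    keys=["patient_id"],
    sequence_by=col("updated_at"),  # ordering column for late or out-of-order events
    stored_as_scd_type=2,
)
```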
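For the ADF bullet, a hedged sketch of triggering and polling a pipeline run from Python with the azure-mgmt-datafactory SDK; the subscription, resource group, factory, pipeline, and parameter names are all placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# All names below are placeholders for a real subscription and factory.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = client.pipelines.create_run(
    resource_group_name="rg-data",
    factory_name="adf-ingest",
    pipeline_name="pl_copy_claims",
    parameters={"load_date": "<yyyy-mm-dd>"},
)
status = client.pipeline_runs.get("rg-data", "adf-ingest", run.run_id).status
print(status)  # e.g. InProgress / Succeeded / Failed
```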
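For the Delta Lake bullet, time travel and schema evolution in two lines each, against a hypothetical clinical.claims table; assumes a Databricks notebook where spark is already in scope:

```python
# Time travel: query the table as of an earlier version (VERSION AS OF,
# or TIMESTAMP AS OF for a point in time). Version 42 is hypothetical.
previous = spark.sql("SELECT * FROM clinical.claims VERSION AS OF 42")

# Schema evolution: allow an append to add newly introduced columns
# instead of failing on a schema mismatch. Landing path is hypothetical.
new_batch_df = spark.read.parquet("/mnt/landing/claims/")
(new_batch_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("clinical.claims"))
```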
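For the Unity Catalog bullet, access control is plain SQL over the three-level namespace; a sketch of granting a hypothetical data_analysts group read access:

```python
# Unity Catalog namespace is catalog.schema.table; a reader needs
# privileges at each level. Group and object names are hypothetical.
spark.sql("GRANT USE CATALOG ON CATALOG clinical TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA clinical.claims TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE clinical.claims.encounters TO `data_analysts`")
```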
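For the Delta Sharing bullet, a sketch of publishing a table to an external recipient; share, table, and recipient names are hypothetical, and the sharing identifier is a placeholder:

```python
# Create a share, add a table to it, and grant a recipient access.
spark.sql("CREATE SHARE IF NOT EXISTS partner_share")
spark.sql("ALTER SHARE partner_share ADD TABLE clinical.claims.encounters")
spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_org "
          "USING ID 'azure:<region>:<metastore-uuid>'")
spark.sql("GRANT SELECT ON SHARE partner_share TO RECIPIENT partner_org")
```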
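For the CI/CD bullet, one common approach is Databricks Asset Bundles driven from a CI job; a minimal sketch, assuming the Databricks CLI is installed on the runner, authentication comes from the CI secret store, and a "prod" target is defined in the bundle configuration:

```python
import subprocess

# Validate the bundle definition, then deploy it to the target workspace.
subprocess.run(["databricks", "bundle", "validate"], check=True)
subprocess.run(["databricks", "bundle", "deploy", "--target", "prod"], check=True)
```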
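Finally, for the data governance bullet, a sketch of column-level masking in Unity Catalog: a SQL UDF gates the raw value on group membership and is then attached to the column. Function, table, and group names are hypothetical:

```python
# Only members of the phi_readers group see the real value; everyone
# else sees a redacted string.
spark.sql("""
    CREATE OR REPLACE FUNCTION clinical.claims.mask_ssn(ssn STRING)
    RETURN CASE WHEN is_account_group_member('phi_readers')
                THEN ssn ELSE '***-**-****' END
""")
spark.sql("ALTER TABLE clinical.claims.patients "
          "ALTER COLUMN ssn SET MASK clinical.claims.mask_ssn")
```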
Qualifications:
- 5+ years of experience in data engineering or a related field.
- 3+ years of hands-on experience with Databricks, including Delta Live Tables, Unity Catalog, Delta Sharing, and SQL warehouses.
- Expertise in data modeling, data warehousing, and ETL/ELT processes.
- Proficiency in SQL and Python or Scala for Databricks development.
- Experience with CI/CD pipelines for Databricks notebooks and projects.
- Strong understanding of data governance and security principles.
- Experience with Azure cloud infrastructure and resource management.
- Excellent communication and collaboration skills.
- Self-motivated and able to work independently with minimal supervision.
- Experience in the healthcare sector is a plus.
- Experience with Databricks streaming tables and materialized views.
- Knowledge of data privacy regulations like HIPAA.
- Familiarity with other cloud platforms (AWS, GCP).
- Contributions to open-source Databricks projects.