Role Profile: Senior Data Engineering Lead, Internal Data Ingestion Platform
Role Overview
The Senior Data Engineering Lead will spearhead the design, implementation, and technical oversight of an advanced internal data ingestion platform, enabling scalable and efficient ingestion of diverse datasets. The role involves leading a team of data engineers, driving the development of configurable Spark ETL pipelines, and leveraging modern data architecture principles to support downstream analytics and visualisation requirements.
Core Responsibilities
- Team Leadership and Development:
  - Recruit, train, and manage a high-performing data engineering team.
  - Foster a collaborative environment that promotes innovation and technical excellence.
  - Establish best practices for data engineering processes and mentor team members to develop their skills.
- Platform Ownership and Technical Oversight:
  - Act as the product owner for the internal data ingestion platform, ensuring alignment with business needs and long-term scalability.
  - Provide technical oversight for the design and development of configurable Spark ETL pipelines that consume data from Kafka streams.
  - Leverage Airflow for pipeline orchestration, ensuring efficient scheduling and execution of workflows.
  - Manage the hosting and deployment of the platform on Azure Kubernetes Service (AKS).
- Data Processing and Storage:
  - Oversee the integration of preprocessing tasks in Databricks, ensuring seamless transformation of raw data.
  - Implement a hybrid storage architecture, utilising Snowflake, on-premise SQL warehouses, and Delta Lake to meet varied storage and access needs.
  - Maintain the Medallion architecture (bronze/silver/gold layers) to organise and optimise data for downstream analytics.
- Metadata and Discoverability:
  - Design and implement metadata capture processes to improve data discovery for internal consultants.
  - Ensure that metadata aids in understanding data lineage, quality, and context for effective utilisation.
- Enablement for Data Visualisation and Analytics:
  - Ensure the data platform provides well-organised datasets to downstream consultants for use with tools like PowerBI.
  - Optimise pipelines to deliver high-quality, query-ready datasets that align with analytical and visualisation needs.
- Technical Innovation and Optimisation:
  - Continuously evaluate and incorporate emerging technologies and practices to enhance the platform’s performance and scalability.
  - Optimise the Spark-based ETL processes for handling large-scale data efficiently and reliably.
  - Ensure pipelines are built with fault tolerance, monitoring, and logging to minimise downtime.
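To make the "configurable Spark ETL pipeline" responsibility concrete, the sketch below shows one way a dataset's ingestion could be described declaratively and resolved into Medallion-layer Delta paths. This is a minimal illustration, not the platform's actual design; the dataset name, Kafka topic, and storage paths are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical, minimal pipeline configuration: one entry per ingested dataset.
# A real platform config would also carry schema, SLAs, ownership, and quality rules.
@dataclass(frozen=True)
class IngestionConfig:
    dataset: str       # logical dataset name, e.g. "orders"
    kafka_topic: str   # source Kafka topic the Spark job subscribes to
    lake_root: str     # base path of the Delta Lake storage location

    def layer_path(self, layer: str) -> str:
        """Resolve the Delta path for a Medallion layer (bronze/silver/gold)."""
        if layer not in ("bronze", "silver", "gold"):
            raise ValueError(f"unknown Medallion layer: {layer}")
        return f"{self.lake_root}/{layer}/{self.dataset}"

cfg = IngestionConfig(
    dataset="orders",
    kafka_topic="prod.orders.v1",
    lake_root="abfss://lake@example.dfs.core.windows.net",
)
print(cfg.layer_path("bronze"))
# → abfss://lake@example.dfs.core.windows.net/bronze/orders
# A Spark job would read from cfg.kafka_topic and land raw records at the
# bronze path before promotion through silver and gold.
```

Keeping the pipeline definition as data rather than code is what makes the pipelines "configurable": onboarding a new dataset means adding a config entry, not writing a new Spark job.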
Key Skills and Qualifications
- Technical Expertise:
  - Extensive experience with Apache Spark, Kafka, and Airflow for data ingestion and orchestration.
  - Proficiency in designing and deploying solutions on Azure Kubernetes Service (AKS).
  - Deep understanding of Databricks for data preprocessing and transformation.
  - Hands-on experience with Snowflake, SQL warehouses, and Delta Lake for data storage.
  - Knowledge of the Medallion architecture (bronze/silver/gold layers) and its practical applications.
- Team Leadership and Collaboration:
  - Proven ability to lead and develop data engineering teams.
  - Strong collaboration skills, working with data analysts, consultants, and cross-functional teams.
  - Excellent communication skills to bridge technical and business requirements.
- Problem Solving and Scalability:
  - Expertise in designing scalable systems to handle diverse datasets efficiently.
  - Strong analytical and troubleshooting skills to optimise pipelines and resolve integration challenges.
- Metadata and Discoverability:
  - Familiarity with metadata management tools and techniques to enhance data discoverability and usability.
  - Understanding of data lineage and quality management.
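As a reference point for the Airflow orchestration experience listed above, the sketch below shows a minimal DAG in the Airflow 2.x TaskFlow style, sequencing the bronze/silver/gold stages. The DAG id, schedule, and task bodies are hypothetical placeholders; in practice each task would trigger the corresponding Spark or Databricks job rather than run logic inline.

```python
# Hypothetical Airflow 2.x DAG sketching Medallion-layer orchestration.
# Task bodies are placeholders, not a real implementation.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def medallion_ingestion():
    @task
    def ingest_bronze():
        ...  # e.g. trigger the Spark job consuming the Kafka topic

    @task
    def refine_silver():
        ...  # e.g. run Databricks preprocessing and cleansing

    @task
    def publish_gold():
        ...  # e.g. build query-ready tables for PowerBI consumers

    ingest_bronze() >> refine_silver() >> publish_gold()

medallion_ingestion()
```

Expressing the bronze → silver → gold dependency chain in the DAG is what gives the platform the scheduling, retry, and monitoring guarantees the responsibilities section calls for.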
Preferred Qualifications
- Experience in hybrid cloud and on-premise data environments.
- Hands-on experience with PowerBI and similar data visualisation tools.
- Familiarity with advanced orchestration and automation frameworks beyond Airflow.
Success Metrics
- Fully operational data ingestion platform with high scalability and reliability.
- Efficient delivery of clean, structured datasets aligned with the Medallion architecture.
- Improved metadata management processes that enhance internal consultant productivity.
- A well-trained, high-performing data engineering team.
Outcomes
This role will ensure the organisation’s data ingestion platform is a robust foundation for analytics, enabling efficient and effective use of data for business insights while supporting scalability and evolving business needs.