Job Title: Back End Data Engineer
Location: Houston, Texas
The Back End Data Engineer develops the back end of a data processing application: business logic, database interactions, user authentication, configuration, data processing, and performance and scalability tuning. The Data Engineer implements requirements according to standard engineering practices and company standards, performs requirements analysis and design, and evaluates technologies and patterns suitable for the solution. The position reports to the software project manager.
• Build prototypes, products and systems that meet the project's quality standards and requirements
• Provide technical leadership and documentation to developers and stakeholders
• Contribute to and support reuse through common components that are well documented and tested
• Provide timely corrective actions on all assigned defects and issues
• Contribute to the development plan by providing task estimates
• Design and implement solutions for problems arising from large-scale, high-velocity data processing
• Take end-to-end ownership of all assigned tasks
• Design, build and maintain efficient, reusable and reliable code
• Test implementations, troubleshoot and correct problems
• Work both as an individual contributor and within a team
• Ensure high-quality software development with complete documentation and traceability
• Fulfil organizational responsibilities (sharing knowledge and experience with other teams/groups)
• Conduct technical training sessions; write white papers, case studies, blogs, etc.
• Bachelor's degree or higher in Computer Science or a related field, with a minimum of 10 years of working experience
Skills and knowledge
• 7+ years of software development experience in Big Data technologies (Spark/Hive/Hadoop)
• Experience designing, building and maintaining data processing pipelines with Apache NiFi, Spark jobs and Airflow
• Good experience with Databricks, Delta Lake and Spark on Kubernetes
• Understanding of databases, data warehouses and data lakes
• Experience working with a Hadoop distribution, with a good understanding of core concepts and best practices
• Good experience building and tuning Spark pipelines in Scala/Python
• Good experience writing complex Hive/Delta Lake queries to derive business-critical insights
• Good programming experience with Java/Python/Scala
• Experience with Azure Databricks; exposure to Kubernetes/Kinesis is good to have
• Experience with NoSQL technologies: HBase, Azure Tables, Pinot, DynamoDB
• Strong DevOps and data engineering skills
• Strong in PySpark and SQL
• Experience with CI/CD
• Exposure to Power BI, Spotfire and Dataiku
• Experience in application profiling, bottleneck analysis and performance tuning
• Knowledge of and experience with version control tools (Git preferred but not mandatory)
Nice to have
• Experience building, testing and maintaining tools and infrastructure to support data science initiatives
• Good communication and cross-functional skills
• Experience deploying machine learning models into production environments
• Knowledge of containerization and orchestration (such as Docker, Kubernetes)
• Experience with ML training/retraining, model registries and ML model performance measurement
• Oil and gas industry experience
• Architectural expertise