Title: Data Engineer-ETL
Location(s): Houston, TX
Contract: 1 Year
Job Summary
- The Wells Data Foundation team empowers our Wells business users by building real-time data ingestion pipelines.
- Team members work closely with business SMEs, field personnel, and 3rd party SaaS vendors to deliver these solutions. The role is responsible for ETL (Extract, Transform, Load) procedures covering current and historical data from multiple internal and external data sources, in particular:
- Extract data from multiple data sources through various available data vendors.
- Most of the data is structured or semi-structured while a small portion is unstructured.
- Transform data to reconcile changes in file formats and schemas over time or across multiple data sources for the same type of information.
- Align data at various temporal resolutions (e.g., aggregation or disaggregation at various levels in time).
- Store files in a structured way for future reference and develop a data schema; load data into a database for convenient querying and analysis.
- Pipeline for current data: maintain automated pipelines to ingest data continuously.
- Ingestion and curation of historical data is a one-time effort.
- Once ingestion pipelines for most data sources are established and stabilized, support switches to maintaining the existing pipelines, with new data sources added from time to time.
- Clean data to enable quick research turnaround, together with a data retrieval API supported in an Excel Add-in and Python packages.
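As a minimal sketch of the transform and temporal-alignment duties above (reconciling column-name changes across sources, then aggregating to a coarser time resolution with pandas); all column names, well IDs, and values here are hypothetical, not from the posting:

```python
import pandas as pd

# Hypothetical mapping from vendor-specific column names to a common schema.
COLUMN_MAP = {"WellID": "well_id", "rate": "oil_rate"}

def reconcile(df: pd.DataFrame) -> pd.DataFrame:
    """Rename vendor-specific columns onto the common schema."""
    return df.rename(columns=COLUMN_MAP)[["well_id", "oil_rate"]]

# Two deliveries of the same data type with differing schemas over time.
old = pd.DataFrame({"WellID": ["W1", "W1"], "rate": [100.0, 110.0]},
                   index=pd.to_datetime(["2024-01-01", "2024-01-02"]))
new = pd.DataFrame({"well_id": ["W1"], "oil_rate": [120.0]},
                   index=pd.to_datetime(["2024-02-01"]))

combined = pd.concat([reconcile(old), reconcile(new)])

# Temporal alignment: aggregate daily readings to monthly means per well.
monthly = combined.groupby("well_id").resample("MS")["oil_rate"].mean()
print(monthly.loc[("W1", pd.Timestamp("2024-01-01"))])  # 105.0
```

The same pattern extends to schema drift within one vendor's feed: keep the mapping in version control and apply it at ingestion time, so downstream queries always see the common schema.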
Skills required:
- Fluent in Python and packages related to data processing, such as pandas.
- Familiar with database technologies/SQL, data warehouses, and related concepts.