Data Engineer
Full Time
Dar es Salaam
Stanbic Bank Tanzania
Stanbic Bank Tanzania Limited is a licensed banking institution in terms of the Banking and Financial Institutions Act of 2006.
Job Purpose
The Data Engineer will play a pivotal role in building and operationalizing the minimally inclusive data necessary for the enterprise data and analytics initiatives, following industry-standard practices and tools. The bulk of the data engineer’s work will be in building, managing, and optimizing data pipelines and then moving these data pipelines effectively into production for key data and analytics consumers such as business/data analysts, data scientists or any persona that needs curated data for data and analytics use cases across the enterprise. The data engineer must guarantee compliance with data governance and data security requirements while creating, improving, and operationalizing these integrated and reusable data pipelines, and will be the key interface in operationalizing data and analytics on behalf of the business unit(s) and organizational outcomes.
Key Responsibilities
Build data pipelines: Managed data pipelines consist of a series of stages through which data flows (for example, from data sources or endpoints of acquisition to integration to consumption for specific use cases). These data pipelines must be created, maintained and optimized as workloads move from development to production for specific use cases. Architecting, creating and maintaining data pipelines will be the primary responsibility of the data engineer.
Drive automation through effective metadata management: The data engineer will be responsible for using innovative and modern tools, techniques and architectures to partially or completely automate the most common, repeatable and tedious data preparation and integration tasks in order to minimize manual and error-prone processes and improve productivity. The data engineer will also need to assist with renovating the data management infrastructure to drive automation in data integration and management.
This will include (but not be limited to):
- Learning and using modern data preparation, integration and AI-enabled metadata management tools and techniques.
- Tracking data consumption patterns.
- Performing intelligent sampling and caching.
- Monitoring schema changes.
- Recommending (or sometimes even automating) existing and future integration flows.
- The newly hired data engineer will need strong collaboration skills in order to work with varied stakeholders within the organization. In particular, the data engineer will work in close relationship with data science teams and with business (data) analysts in refining their data requirements for various data and analytics initiatives and their data consumption requirements.
- The data engineer should be curious and knowledgeable about new data initiatives and how to address them. This includes applying their data and/or domain understanding in addressing new data requirements. They will also be responsible for proposing appropriate (and innovative) data ingestion, preparation, integration and operationalization techniques to address these data requirements optimally. The data engineer will be required to train counterparts such as data scientists, data analysts, line-of-business (LOB) users or any other data consumers in these data pipelining and preparation techniques, which make it easier for them to integrate and consume the data they need for their own use cases.
- The data engineer will be considered a blend of data and analytics “expert,” “data guru” and “fixer.” This role will promote the available data and analytics capabilities and expertise to business unit leaders and educate them in leveraging these capabilities to achieve their business goals.
Qualifications
Foundational knowledge of Data Management practices –
- Strong experience with various data management architectures such as data warehouse, data lake, data hub and relational database management systems (RDBMS), and with supporting processes such as data integration, governance and metadata management.
- Strong ability to design, build and manage data pipelines for data structures encompassing data transformation, data models, schemas, metadata and workload management.
- Strong experience in working with large, heterogeneous datasets in building and optimizing data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies. These should include ETL/ELT, data replication/CDC, message-oriented data movement, API design and access, and emerging data ingestion and integration technologies such as stream data integration, complex event processing (CEP) and data virtualization.
- Basic experience in working with data governance/data quality and data security teams, and specifically with information stewards and privacy and security officers, in moving data pipelines into production with appropriate data quality, governance and security standards and certification.
Ability to build quick prototypes and to translate prototypes into data products and services in a diverse ecosystem –
- Demonstrated success in working with large, heterogeneous datasets to extract business value using popular data preparation tools such as Trifacta, Paxata, Unifi and others to reduce or even automate parts of the tedious data preparation tasks.
- Strong experience with popular database programming languages, including SQL, PL/SQL and others for relational databases, and certifications on emerging NoSQL/Hadoop-oriented databases such as MongoDB, Cassandra and others for nonrelational databases.
- Strong experience in working with SQL-on-Hadoop tools and technologies, including Hive, Impala, Presto and others from an open-source perspective, and Hortonworks DataFlow (HDF), Dremio, Informatica, Talend and others from a commercial vendor perspective.
- Strong experience with advanced analytics tools for object-oriented or functional scripting using languages such as R, Python, Java, C++, Scala and others.
- Strong experience in working with both open-source and commercial message queuing technologies such as Kafka, JMS, Azure Service Bus, Amazon Simple Queue Service (SQS) and others; stream data integration technologies such as Apache NiFi, Apache Beam, Apache Kafka Streams and Amazon Kinesis; and stream analytics technologies such as Apache Kafka KSQL, Apache Spark Streaming, Apache Samza and others.
Ability to automate pipeline development –
- Strong experience in working with DevOps capabilities such as version control, automated builds, testing and release management, using tools like Git, Jenkins, Puppet and Ansible.
Ability to collaborate with technical and business personas –
- Strong experience in working with data science teams in refining and optimizing data science and machine learning models and algorithms.
- Demonstrated success in working with both IT and business while integrating analytics and data science output into business processes and workflows.
- Basic experience working with popular data discovery, analytics and BI software tools such as Tableau, Qlik, Power BI and others for semantic-layer-based data discovery.
- Basic understanding of popular open-source and commercial data science platforms such as Python, R, KNIME, Alteryx and others is a strong plus but not required.
Exposure to hybrid deployments: cloud and on-premises –
- Demonstrated ability to work across multiple deployment environments (cloud, on-premises and hybrid) and multiple operating systems, and with containerization technologies such as Docker, Kubernetes, Amazon Elastic Container Service (ECS) and others.
- Adept in agile methodologies and capable of applying DevOps and, increasingly, DataOps principles to data pipelines to improve the communication, integration, reuse and automation of data flows between data managers and consumers across the organization.