Geovani Leitao

👋
I’m a Senior Data Engineer with 8+ years of experience in Python, PySpark, SQL, Airflow, Databricks, AWS, and Terraform. Expert in Data Warehouses, Data Marts, and Data Lakes built on robust and scalable pipelines. Passionate about complex problem-solving and eager for new challenges.
 

Introduction


I am a Senior Data Engineer with over 8 years of experience in large-scale projects utilizing technologies such as Python, PySpark, SQL, Airflow, Databricks, AWS, Terraform, and more. I excel in designing and modeling Data Warehouses, Data Marts, and Data Lakes powered by robust, scalable, and automated data pipelines. I am highly collaborative, passionate about solving complex problems, and eager to take on new challenges.
 
Technologies: Python, Terraform, CI/CD, PySpark, DevOps, Data Lakes, Data Marts, Data Warehouses, Apache Airflow, AWS Glue, Databricks, Amazon S3, Amazon Redshift, Oracle Database, ETL, SQL, and more.
 

Experience


Senior Data Engineer at Semantix - Client PagSeguro

2023 - 2023
Python
AWS Ecosystem
Jenkins
Data Lake
SQL
ETL

  • Led a comprehensive CI/CD project, ensuring efficient and reliable software delivery throughout the development lifecycle.
  • Architected a robust and scalable CI/CD platform tool specifically tailored for AWS Redshift, optimizing the deployment and management of data pipelines and analytics workflows.
  • Developed the MVP of the CI/CD platform tool using Python (see the sketch below).
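
A minimal sketch of what such a Python deployment tool for Redshift could look like. This is illustrative only: it assumes versioned SQL files applied through psycopg2, and the folder layout, table names, and environment variables are hypothetical.

    # Hypothetical sketch: apply versioned SQL migration files to Redshift.
    import os
    import glob
    import psycopg2

    def apply_pending_migrations(conn, migrations_dir="migrations"):
        with conn.cursor() as cur:
            # Track which migration files have already been applied.
            cur.execute(
                "CREATE TABLE IF NOT EXISTS deploy_history "
                "(file_name VARCHAR(256), applied_at TIMESTAMP)"
            )
            cur.execute("SELECT file_name FROM deploy_history")
            applied = {row[0] for row in cur.fetchall()}

            for path in sorted(glob.glob(os.path.join(migrations_dir, "*.sql"))):
                name = os.path.basename(path)
                if name in applied:
                    continue  # skip migrations that already ran
                with open(path) as f:
                    cur.execute(f.read())
                cur.execute(
                    "INSERT INTO deploy_history (file_name, applied_at) "
                    "VALUES (%s, GETDATE())",
                    (name,),
                )
                print(f"applied {name}")
        conn.commit()

    if __name__ == "__main__":
        # Connection details would come from CI/CD secrets in practice.
        conn = psycopg2.connect(
            host=os.environ["REDSHIFT_HOST"],
            port=5439,
            dbname=os.environ["REDSHIFT_DB"],
            user=os.environ["REDSHIFT_USER"],
            password=os.environ["REDSHIFT_PASSWORD"],
        )
        apply_pending_migrations(conn)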
 

Senior Data Engineer at Central Ailos

2015 - 2023
Python
AWS Ecosystem
Airflow
PySpark
Databricks
Data Lake
SQL
ETL
Data Warehouse
Terraform
CI/CD
Datamart
 
  • Architected and implemented AWS Cloud infrastructure using Terraform for data management.
  • Led migration from on-premises to AWS Cloud, including construction of a data lake/warehouse in S3 + Redshift.
  • Built ETLs extracting data from multiple systems, managing thousands of tables, and enabling daily data ingestion into the AWS S3 Data Lake using AWS Glue and Databricks.
  • Created and maintained business datamarts using Databricks and PySpark.
  • Orchestrated data pipelines with Airflow, and provided technical support and knowledge-sharing to the team.
  • Resolved complex bugs and optimized data pipeline processes, improving load performance and speeding up data delivery to the business team.
 

Java Developer at Systextil

2013 - 2014
SQL
Java
  • Developed and modified screens for the Systextil ERP system.
  • Added new functionalities to existing system processes.
  • Created new processes to enhance the system.
  • Created and manipulated database objects (Oracle).
 

Skills
SQL
I’ve used SQL throughout my professional life for managing and manipulating relational databases: querying, inserting, updating, and deleting data.
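
As a small illustration of those operations, a throwaway example using Python’s built-in sqlite3 driver (the table and column names are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

    # insert
    conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ana", "Blumenau"))
    # update
    conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Curitiba", "Ana"))
    # query
    for row in conn.execute("SELECT id, name, city FROM customers"):
        print(row)
    # delete
    conn.execute("DELETE FROM customers WHERE name = ?", ("Ana",))
    conn.commit()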
Python
A popular high-level programming language widely used in data engineering. Over the past 4 years, I’ve utilized Python for tasks such as data extraction, transformation, loading, and pipeline orchestration.
 
PySpark
A Python library for Apache Spark, a distributed computing framework used for big data processing and analysis, which I have used over the past 4 years.
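
A hedged sketch of a typical PySpark aggregation job of the kind this involves (the paths and column names are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-example").getOrCreate()

    # Read raw data, aggregate it, and write the result back as Parquet.
    orders = spark.read.parquet("s3://example-bucket/raw/orders/")  # hypothetical path
    daily_totals = (
        orders
        .withColumn("order_date", F.to_date("created_at"))
        .groupBy("order_date")
        .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("order_count"))
    )
    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")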
 
Terraform
I have used Terraform for 2 years to build, deploy, and maintain analytics infrastructure as code safely and efficiently. Terraform uses a high-level configuration language and provides automation and collaboration features.
 
CI/CD
An automated process for building, testing, and deploying software that ensures quick, reliable, and high-quality delivery. I have 2 years of DevOps experience building and maintaining CI/CD pipelines.
AWS
A cloud computing platform that provides computing power, storage, databases, and many other services. With 4 years of experience, I have used AWS infrastructure to design, build, and manage reliable and scalable cloud solutions for data engineering.
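
For illustration, a short sketch using boto3, the AWS SDK for Python, to land a file in S3 and inspect the prefix (the bucket and key names are made up):

    import boto3

    s3 = boto3.client("s3")

    # Upload a local extract to the data lake's raw zone (hypothetical bucket/key).
    s3.upload_file("daily_extract.csv", "example-data-lake", "raw/sales/2023-01-01/daily_extract.csv")

    # List what landed under that prefix.
    response = s3.list_objects_v2(Bucket="example-data-lake", Prefix="raw/sales/")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])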
Airflow
An open-source platform for programmatically authoring, scheduling, and monitoring workflows as directed acyclic graphs (DAGs). Over the past 4 years, I have used Airflow to orchestrate data pipelines and automate complex data workflows in various data engineering projects.
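
A minimal example of an Airflow DAG of the kind used for this orchestration (the DAG id, schedule, and task bodies are placeholders):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from the source system")  # placeholder

    def load():
        print("write data to the data lake")  # placeholder

    # A two-task DAG that runs daily: extract, then load.
    with DAG(
        dag_id="example_daily_ingestion",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task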
Databricks
A cloud-based platform built around Apache Spark, Databricks offers a powerful and scalable environment for data processing and analytics. Over the past 4 years, I have used it to build scripts for data lakes and data marts.
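
A hedged sketch of the kind of script used to build a data mart table on Databricks with PySpark (the database, table, and column names are assumptions):

    from pyspark.sql import functions as F

    try:
        spark  # on Databricks, a SparkSession named `spark` is provided automatically
    except NameError:
        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()

    # Join two (hypothetical) lake tables and aggregate into a mart table.
    sales = spark.table("lake.sales")
    customers = spark.table("lake.customers")

    revenue_by_segment = (
        sales.join(customers, "customer_id")
             .groupBy("segment")
             .agg(F.sum("amount").alias("total_revenue"))
    )
    revenue_by_segment.write.mode("overwrite").saveAsTable("mart.revenue_by_segment")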
ETL
A data integration approach that involves extracting data from source systems, transforming it into the desired format, and loading it into a target system. With over 8 years of experience, I have used ETL techniques to design and implement complex data integration solutions, ensuring the accuracy and integrity of the data.
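
A toy, self-contained sketch of the extract-transform-load pattern in plain Python (the file name and schema are invented):

    import csv
    import sqlite3

    # Extract: read raw rows from a CSV export (hypothetical file).
    with open("orders_raw.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: normalize types and filter out cancelled orders.
    cleaned = [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
        for r in rows
        if r["status"] != "cancelled"
    ]

    # Load: write the cleaned rows into a target table.
    conn = sqlite3.connect("warehouse.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (order_id, amount) VALUES (:order_id, :amount)", cleaned
    )
    conn.commit()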
 
Data Lakes
Over the past 4 years, I have designed, built, and maintained data lakes on AWS. A data lake is a centralized repository that stores structured and unstructured data at any scale.
Data Marts
Over the past 4 years, I have built and maintained data marts on AWS. A data mart is a subset of a data warehouse designed to serve specific business functions or departments, commonly used for reporting and analysis.
Data Warehouse
These centralized repositories integrate data from different sources and offer a comprehensive view of the data for reporting and analysis. As a data engineer, I have created, managed, and maintained data warehouses for the past 8 years.
 

Languages

English 🇺🇸

Advanced speaker

Portuguese 🇧🇷

Native speaker

Education

Bachelor's Degree in Computer Science
2014 - 2018 / FURB Universidade de Blumenau
 
Postgraduate Degree in Data Science
2019 - 2021 / FURB Universidade de Blumenau