Geovani Leitao

👋
I’m a Senior Data Engineer with 8+ years of experience in Python, PySpark, SQL, Airflow, Databricks, AWS, and Terraform. Expert in Data Warehouses, Data Marts, and Data Lakes built on robust and scalable pipelines. Passionate about complex problem-solving and eager for new challenges.
 

Introduction


I am a Senior Data Engineer with over 8 years of experience in large-scale projects utilizing technologies such as Python, PySpark, SQL, Airflow, Databricks, AWS, Terraform, and more. I excel in designing and modeling Data Warehouses, Data Marts, and Data Lakes powered by robust, scalable, and automated data pipelines. I am highly collaborative, passionate about solving complex problems, and eager to take on new challenges.
 
Technologies: Python, Terraform, CI/CD, PySpark, DevOps, Data Lakes, Data Marts, Data Warehouses, Apache Airflow, AWS Glue, Databricks, Amazon S3, Amazon Redshift, Oracle Database, ETL, SQL, and more.
 

Experience


Senior Data Engineer at Semantix - Client PagSeguro

2023 - 2023
Python
AWS Ecosystem
Jenkins
Data Lake
SQL
ETL

  • Led a comprehensive CI/CD project, ensuring efficient and reliable software delivery throughout the development lifecycle.
  • Architected a robust and scalable CI/CD platform tool specifically tailored for AWS Redshift, optimizing the deployment and management of data pipelines and analytics workflows.
  • Developed the MVP of the CI/CD platform tool using Python (see the sketch below).
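
A minimal sketch of what such a Python deployment tool for Redshift could look like. This is illustrative only: it assumes versioned SQL files applied through psycopg2, and the folder layout, table names, and environment variables are hypothetical.

    # Hypothetical sketch: apply versioned SQL migration files to Redshift.
    import os
    import glob
    import psycopg2

    def apply_pending_migrations(conn, migrations_dir="migrations"):
        with conn.cursor() as cur:
            # Track which migration files have already been applied.
            cur.execute(
                "CREATE TABLE IF NOT EXISTS deploy_history "
                "(file_name VARCHAR(256), applied_at TIMESTAMP)"
            )
            cur.execute("SELECT file_name FROM deploy_history")
            applied = {row[0] for row in cur.fetchall()}

            for path in sorted(glob.glob(os.path.join(migrations_dir, "*.sql"))):
                name = os.path.basename(path)
                if name in applied:
                    continue  # skip migrations that already ran
                with open(path) as f:
                    cur.execute(f.read())
                cur.execute(
                    "INSERT INTO deploy_history (file_name, applied_at) "
                    "VALUES (%s, GETDATE())",
                    (name,),
                )
                print(f"applied {name}")
        conn.commit()

    if __name__ == "__main__":
        # Connection details would come from CI/CD secrets in practice.
        conn = psycopg2.connect(
            host=os.environ["REDSHIFT_HOST"],
            port=5439,
            dbname=os.environ["REDSHIFT_DB"],
            user=os.environ["REDSHIFT_USER"],
            password=os.environ["REDSHIFT_PASSWORD"],
        )
        apply_pending_migrations(conn)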
 

Senior Data Engineer at Central Ailos

2015 - 2023
Python
AWS Ecosystem
Airflow
PySpark
Databricks
Data Lake
SQL
ETL
Data Warehouse
Terraform
CI/CD
Datamart
 
  • Architected and implemented AWS Cloud infrastructure using Terraform for data management.
  • Led migration from on-premises to AWS Cloud, including construction of a data lake/warehouse in S3 + Redshift.
  • Built ETLs extracting data from multiple systems, managing thousands of tables, and enabling daily data ingestion into the AWS S3 Data Lake using AWS Glue and Databricks.
  • Created and maintained business datamarts using Databricks and PySpark.
  • Orchestrated data pipelines with Airflow, and provided technical support and knowledge-sharing to the team.
  • Resolved complex bugs and optimized data pipeline processes, improving load performance and speeding up data delivery to the business team.
 

Java Developer at Systextil

2013 - 2014
SQL
Java
  • Developed and modified screens for the Systextil ERP system.
  • Added new functionalities to existing system processes.
  • Created new processes to enhance the system.
  • Created and manipulated database objects (Oracle).
 

Skills
SQL
I’ve used SQL throughout my professional life for managing and manipulating relational databases: querying, inserting, updating, and deleting data.
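
As a small illustration of those operations, a throwaway example using Python’s built-in sqlite3 driver (the table and column names are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

    # insert
    conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ana", "Blumenau"))
    # update
    conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Curitiba", "Ana"))
    # query
    for row in conn.execute("SELECT id, name, city FROM customers"):
        print(row)
    # delete
    conn.execute("DELETE FROM customers WHERE name = ?", ("Ana",))
    conn.commit()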
Python
A popular high-level programming language widely used in data engineering. Over the past 4 years, I’ve utilized Python for tasks such as data extraction, transformation, loading, and pipeline orchestration.
 
PySpark
A Python library for Apache Spark, a distributed computing framework used for big data processing and analysis, which I have used over the past 4 years.
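
A hedged sketch of a typical PySpark aggregation job of the kind this involves (the paths and column names are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-example").getOrCreate()

    # Read raw data, aggregate it, and write the result back as Parquet.
    orders = spark.read.parquet("s3://example-bucket/raw/orders/")  # hypothetical path
    daily_totals = (
        orders
        .withColumn("order_date", F.to_date("created_at"))
        .groupBy("order_date")
        .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("order_count"))
    )
    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")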
 
Terraform
I have used Terraform for 2 years to build, deploy, and maintain analytics infrastructure as code safely and efficiently. Terraform uses a high-level configuration language and provides automation and collaboration features.
 
CI/CD
An automated process for building, testing, and deploying software that ensures quick, reliable, and high-quality delivery. I have 2 years of DevOps experience building and maintaining CI/CD pipelines.
AWS
A cloud computing platform that provides computing power, storage, databases, and many other services. With 4 years of experience, I have used AWS infrastructure to design, build, and manage reliable and scalable cloud solutions for data engineering.
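
For illustration, a short sketch using boto3, the AWS SDK for Python, to land a file in S3 and inspect the prefix (the bucket and key names are made up):

    import boto3

    s3 = boto3.client("s3")

    # Upload a local extract to the data lake's raw zone (hypothetical bucket/key).
    s3.upload_file("daily_extract.csv", "example-data-lake", "raw/sales/2023-01-01/daily_extract.csv")

    # List what landed under that prefix.
    response = s3.list_objects_v2(Bucket="example-data-lake", Prefix="raw/sales/")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])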
Airflow
An open-source platform for programmatically authoring, scheduling, and monitoring workflows as directed acyclic graphs (DAGs). Over the past 4 years, I have used Airflow to orchestrate data pipelines and automate complex data workflows in various data engineering projects.
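
A minimal example of an Airflow DAG of the kind used for this orchestration (the DAG id, schedule, and task bodies are placeholders):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from the source system")  # placeholder

    def load():
        print("write data to the data lake")  # placeholder

    # A two-task DAG that runs daily: extract, then load.
    with DAG(
        dag_id="example_daily_ingestion",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task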
Databricks
A cloud-based platform built around Apache Spark, Databricks offers a powerful and scalable environment for data processing and analytics. Over the past 4 years, I have used it to build scripts for data lakes and data marts.
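
A hedged sketch of the kind of script used to build a data mart table on Databricks with PySpark (the database, table, and column names are assumptions):

    from pyspark.sql import functions as F

    try:
        spark  # on Databricks, a SparkSession named `spark` is provided automatically
    except NameError:
        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()

    # Join two (hypothetical) lake tables and aggregate into a mart table.
    sales = spark.table("lake.sales")
    customers = spark.table("lake.customers")

    revenue_by_segment = (
        sales.join(customers, "customer_id")
             .groupBy("segment")
             .agg(F.sum("amount").alias("total_revenue"))
    )
    revenue_by_segment.write.mode("overwrite").saveAsTable("mart.revenue_by_segment")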
ETL
A data integration approach that involves extracting data from source systems, transforming it into the desired format, and loading it into a target system. With over 8 years of experience, I have used ETL techniques to design and implement complex data integration solutions, ensuring the accuracy and integrity of the data.
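
A toy, self-contained sketch of the extract-transform-load pattern in plain Python (the file name and schema are invented):

    import csv
    import sqlite3

    # Extract: read raw rows from a CSV export (hypothetical file).
    with open("orders_raw.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: normalize types and filter out cancelled orders.
    cleaned = [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
        for r in rows
        if r["status"] != "cancelled"
    ]

    # Load: write the cleaned rows into a target table.
    conn = sqlite3.connect("warehouse.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (order_id, amount) VALUES (:order_id, :amount)", cleaned
    )
    conn.commit()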
 
Data Lakes
Over the past 4 years, I have designed, built, and maintained data lakes on AWS. A data lake is a centralized repository that stores structured and unstructured data at any scale.
Data Marts
Over the past 4 years, I have built and maintained data marts on AWS. A data mart is a subset of a data warehouse designed to serve specific business functions or departments, commonly used for reporting and analysis.
Data Warehouse
These centralized repositories integrate data from different sources and offer a comprehensive view of the data for reporting and analysis. As a data engineer, I have created, managed, and maintained data warehouses for the past 8 years.
 

Languages

English 🇺🇸

Advanced speaker

Portuguese 🇧🇷

Native speaker

Education

Bachelor's Degree in Computer Science
2014 - 2018 / FURB Universidade de Blumenau
 
Postgraduate Degree in Data Science
2019 - 2021 / FURB Universidade de Blumenau