Gonzalo Lencina

About Me

Data Engineer with over six years of experience in cloud infrastructures, data engineering, and data processing. Proficient in Python, Spark, Scala, AWS, and Kubernetes. I specialize in building scalable data pipelines, optimizing workflows with CI/CD practices, and implementing cost-effective, high-performing architectures. Experienced in leading projects within remote and multicultural environments.

  • Age: 29
  • Residence: Spain
  • Address: Calle Arco De Poniente
  • E-mail: gonza0305@hotmail.com
  • Phone: +34 695427476

What I Do

ETL and Data Pipelines

Design and implement Extract, Transform, Load (ETL) processes and scalable data pipelines for processing large datasets using tools like Spark, PySpark, and Airflow. These pipelines focus on data quality, automation, and efficient handling of large-scale data.
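
A minimal sketch of how such a pipeline can be scheduled, assuming Airflow with the Spark provider installed; the DAG id, script path, and connection id are illustrative placeholders rather than values from a real project:

```python
# Illustrative Airflow DAG: submit a daily PySpark cleaning job.
# All names and paths are placeholders; the transform logic lives in the PySpark script.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    clean_events = SparkSubmitOperator(
        task_id="clean_events",
        application="jobs/clean_events.py",           # PySpark script with the ETL logic
        conn_id="spark_default",
        application_args=["--run-date", "{{ ds }}"],  # logical date drives output partitioning
    )
```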

Cloud Infrastructure and Design

Develop and manage cloud-based infrastructures using AWS services such as EMR, Lambda, Glue, ECS, and Athena, designing cost-efficient, scalable architectures and streamlining cloud workflows with tools like Terraform.
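
As one small example of this kind of workflow, a Python job can hand a query to Athena over data catalogued in Glue; the database, table, and results bucket below are hypothetical:

```python
# Illustrative sketch: start an Athena query from Python with boto3.
# Database, table, and output bucket are placeholders.
import boto3

athena = boto3.client("athena", region_name="eu-west-1")

response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS events FROM raw_events GROUP BY event_type",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Started query:", response["QueryExecutionId"])
```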

Data Modeling and Processing

Specialize in creating optimized data models to support data integration and analytics, leveraging NoSQL databases and improving the performance of querying and data storage systems.
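
A brief sketch of the access-pattern-driven modeling this involves, using DynamoDB as an example NoSQL store; the table name, keys, and attributes are hypothetical:

```python
# Illustrative single-table DynamoDB access pattern: the partition key groups a
# customer's records and the sort key orders them by date, so common queries
# never need a full table scan. All names are placeholders.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb", region_name="eu-west-1").Table("example-orders")

response = table.query(
    KeyConditionExpression=Key("pk").eq("CUSTOMER#42")
    & Key("sk").between("ORDER#2024-01-01", "ORDER#2024-01-31"),
)
for item in response["Items"]:
    print(item["sk"], item.get("total"))
```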

Workflow Automation and CI/CD

Implement and optimize CI/CD workflows using tools like Jenkins, ArgoCD, and GitHub Actions. This includes automating deployments and ensuring continuous integration of pipelines and infrastructure.

Fun Facts

  • Happy Clients: 5
  • Working Hours: 10,400
  • Awards Won: 15

Resume

Education

2019-2020
University Carlos III of Madrid

Master in Analytical Methods for Big Data

2013-2019
University Carlos III of Madrid

Dual Bachelor in Computer Science and Business Administration

Experience

2024 - Current
Data Pebbles

Senior Data Engineer

Leading data engineering projects using PySpark and Python within a Kubernetes environment. Optimized data visualization and monitoring with Elasticsearch, Kibana, and Grafana. Implemented and managed CI/CD workflows using ArgoCD. Contributed to migrating from Kubernetes (k8s) to k3s, achieving a 40% reduction in runtime and costs by enhancing system efficiency.

2023 - 2024
Roamler

Data Engineer

Built scalable data pipelines with Spark and Scala, and engineered a scraping engine on AWS that improved data quality and volume by 30%. Managed key AWS services (EMR, Lambdas, ECS, Glue, ECK) for processing and storage, and streamlined infrastructure provisioning with Terraform and workflow automation with Airflow.

2022 - 2023
Azerion

Data Engineer

Developed and managed AWS cloud infrastructures using Spark and Scala for data processing, integrating NoSQL for flexibility. Oversaw key AWS services (EC2, EMR, Glue, Lambda) for streaming and used Athena for querying, improving data architecture and storage.

2021 - 2022
Carrefour

Big Data Developer

Implemented CI/CD processes with Jenkins and conducted data analysis using Cloudera tools. Utilized Spark-Scala and Control-M for efficient data processing and workflow management, enhancing performance and data integration.

2020 - 2021
PUE

Big Data Developer

Used Spark-Scala and MapReduce for efficient data processing, analyzed data with Cloudera for insights, and improved data integration with Apache Kafka, NiFi, and Oozie.

Design Skills

  • ETL: 95%
  • AWS: 90%
  • Data Modeling: 85%
  • Kubernetes: 80%

Coding Skills

  • Python: 95%
  • SQL: 90%
  • Spark: 100%
  • Scala: 75%

Knowledge

  • Kafka
  • Elasticsearch
  • Docker
  • Grafana
  • CI/CD
  • Data Scraping
  • Git
  • Problem-Solving
  • Flexibility

Certificates

Databricks Certified Data Engineer Professional

Membership ID: 96041297
Feb 2024

Cloudera Dataflow - NiFi

Membership ID: 633560e5a492
Sep 2021

Portfolio

Scraping Application on AWS

The scraping application, built on AWS, automates scalable web scraping using Lambda functions, Glue Jobs, Spark, and Beautiful Soup for data extraction. EC2 instances and Fargate sidecars manage proxies dynamically via DynamoDB to prevent blocking. Data is validated with Great Expectations and stored in S3 buckets partitioned by date. With Route 53, an Application Load Balancer, Auto Scaling, and CloudWatch, the system ensures efficient resource management, monitoring, and reliability.
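
A condensed sketch of what the Lambda-side extraction step could look like under this architecture; the table, bucket, and field names are placeholders, and requests and Beautiful Soup would be bundled with the function package:

```python
# Illustrative Lambda handler: fetch a page through a proxy registered in DynamoDB,
# parse it with Beautiful Soup, and store the result in a date-partitioned S3 prefix.
# All resource names are placeholders.
import datetime
import json

import boto3
import requests
from bs4 import BeautifulSoup

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")


def handler(event, context):
    url = event["url"]

    # Pick a proxy maintained by the EC2/Fargate proxy manager.
    proxy = dynamodb.Table("example-proxies").get_item(Key={"pk": "active"})["Item"]["address"]

    html = requests.get(url, proxies={"https": proxy}, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    record = {"url": url, "title": soup.title.string if soup.title else None}

    key = f"raw/date={datetime.date.today().isoformat()}/{context.aws_request_id}.json"
    s3.put_object(Bucket="example-scraping-bucket", Key=key, Body=json.dumps(record))
    return {"status": "ok", "key": key}
```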

Reporting Pipeline in Kubernetes

The Reporting Pipeline in Kubernetes processes and transforms financial data using Spark and Python, generating reports like SFTR. It validates reports with DTCC for compliance and stores them in NFS. Deployed in Kubernetes, it leverages ArgoCD for management and logs to Grafana, Kibana, and Spark History Server for monitoring. OpenTelemetry ensures detailed workflow observability, making it a scalable and reliable financial reporting solution.
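
A simplified sketch of the report-generation step, assuming validated trades already sit on the NFS mount; the paths and columns are placeholders rather than the actual SFTR schema:

```python
# Illustrative PySpark job: read validated trade data, aggregate it per counterparty,
# and write the report to an NFS-mounted path. Paths and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-sftr-report").getOrCreate()

trades = spark.read.parquet("/mnt/nfs/input/trades/")

report = (
    trades.filter(F.col("report_date") == "2024-01-31")
    .groupBy("counterparty_id")
    .agg(F.count("*").alias("trade_count"), F.sum("notional").alias("total_notional"))
)

report.write.mode("overwrite").parquet("/mnt/nfs/reports/sftr/2024-01-31/")
```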

Financial Data Enrichment

The Financial Data Integration Pipelines project processes and merges financial data from Bloomberg and Reuters using Spark and Python, referencing static tables for accuracy. Results are written to Elasticsearch for analysis, with data also published to Kafka queues for consumption by other teams. Orchestrated by Airflow with scheduled tasks and managed through ArgoCD in Kubernetes, the system ensures scalability and efficiency. Logs are captured in Grafana, Kibana, and Spark History Server, while OpenTelemetry and Prometheus provide observability and performance metrics. This project exemplifies a modern, monitored, and collaborative data integration pipeline.
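
A condensed sketch of the enrichment join and the two publication steps; it assumes the elasticsearch-hadoop and spark-sql-kafka connectors are on the classpath, and the hosts, topic, and column names are placeholders:

```python
# Illustrative PySpark job: merge two market-data feeds on a shared instrument id,
# index the result into Elasticsearch, and publish the same records to Kafka.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-enrichment").getOrCreate()

bloomberg = spark.read.parquet("s3://example-feeds/bloomberg/")
reuters = spark.read.parquet("s3://example-feeds/reuters/")

enriched = bloomberg.join(reuters, on="instrument_id", how="left").withColumn(
    "enriched_at", F.current_timestamp()
)

# Index into Elasticsearch for analysis (requires the elasticsearch-hadoop connector).
enriched.write.format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "elasticsearch.example.internal") \
    .option("es.resource", "enriched-instruments") \
    .mode("append").save()

# Publish the same records to Kafka for downstream teams.
enriched.selectExpr("CAST(instrument_id AS STRING) AS key", "to_json(struct(*)) AS value") \
    .write.format("kafka") \
    .option("kafka.bootstrap.servers", "kafka.example.internal:9092") \
    .option("topic", "enriched-instruments") \
    .save()
```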

Consumer Habits and Trends

The Global Consumer Habits Analysis Platform enables users to perform worldwide searches on consumption habits through an intuitive web interface. User-specified parameters initiate AWS EMR containers, which utilize Spark and Scala to process historical data stored in S3. The transformed data is stored in Amazon Redshift, where it supports Business Intelligence teams in generating reports and dashboards. The platform ensures scalability with on-demand EMR clusters and efficient analytics with Redshift. Logs and metrics are monitored via Grafana and OpenTelemetry, while Jenkins handles CI/CD, automating deployments and ensuring system reliability. This solution delivers actionable insights with a focus on scalability and real-time analytics.
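
A trimmed sketch of how a user-triggered search can spin up one of these on-demand EMR clusters via boto3; the release label, instance types, class name, and S3 paths are illustrative:

```python
# Illustrative sketch: launch a transient EMR cluster that runs the Spark/Scala job
# over historical data in S3 and terminates when the step finishes. All names are placeholders.
import boto3

emr = boto3.client("emr", region_name="eu-west-1")

response = emr.run_job_flow(
    Name="consumer-habits-analysis",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # tear the cluster down after the step
    },
    Steps=[
        {
            "Name": "consumer-habits-spark-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--class", "com.example.ConsumerHabitsJob",
                    "s3://example-artifacts/consumer-habits.jar",
                    "--country", "ES",
                ],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster:", response["JobFlowId"])
```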

Real-Time Streaming Pipeline

This Real-Time Streaming Pipeline processes high-velocity data from Beeswax using Spark Streaming, AWS Kinesis, and Scala. The system ingests, filters, and transforms data in real-time within an EMR container, ensuring high-quality output. The processed data is stored in partitioned S3 buckets for efficient retrieval and future use. Advanced features like windowed aggregations, stateful operations, and backpressure handling enhance its streaming capabilities. Monitoring is enabled through OpenTelemetry, with logs and metrics visualized in Grafana. The architecture is highly scalable, leveraging auto-scaling EMR clusters and designed for seamless integration of additional data sources and real-time analytics.
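
A reduced sketch of the ingest-and-aggregate core of such a pipeline; the "kinesis" source assumes a Kinesis connector is available on the cluster (option names vary by connector, as on EMR), and the stream name, columns, and S3 paths are placeholders:

```python
# Illustrative Structured Streaming job: read events from Kinesis, count them per
# campaign in 5-minute windows, and sink the results to S3. All names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-streaming").getOrCreate()

events = (
    spark.readStream.format("kinesis")          # requires a Kinesis connector on the cluster
    .option("streamName", "example-beeswax-events")
    .option("region", "eu-west-1")
    .option("startingPosition", "LATEST")
    .load()
)

parsed = events.select(
    F.get_json_object(F.col("data").cast("string"), "$.campaign_id").alias("campaign_id"),
    F.col("approximateArrivalTimestamp").alias("event_time"),
)

# Windowed aggregation with a watermark to bound state for late-arriving events.
counts = (
    parsed.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "campaign_id")
    .count()
)

query = (
    counts.writeStream.outputMode("append")
    .format("parquet")
    .option("path", "s3://example-streaming-output/impressions/")
    .option("checkpointLocation", "s3://example-streaming-output/checkpoints/impressions/")
    .start()
)
query.awaitTermination()
```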

Contact

Madrid, Spain

+34 695427476

gonza0305@hotmail.com

Available for freelance work
