
Gonzalo Lencina
Senior Data Engineer
About Me
Data Engineer with over six years of experience in cloud infrastructures, data engineering, and data processing. Proficient in Python, Spark, Scala, AWS, and Kubernetes. I specialize in building scalable data pipelines, optimizing workflows with CI/CD practices, and implementing cost-effective, high-performing architectures. Experienced in leading projects within remote and multicultural environments.
- Age: 29
- Residence: Spain
- Address: Calle Arco De Poniente
- E-mail: gonza0305@hotmail.com
- Phone: +34 695427476
What I Do
Testimonials

Gonzalo is a valuable worker for any team. He has top analytical and problem-solving skills, but is also caring and friendly. Working shoulder to shoulder with him was a wonderful experience in which I learnt a lot. Solving the most stressful scenarios with pragmatism and humour, we always delivered on time. I couldn't ask for anything more in a workmate.

Gonzalo is an exceptional data engineer. We have worked together on several projects, and his dedication and commitment have been admirable. He possesses a strong technological stack and a high level of expertise in it. Working with him has been very rewarding as he is a highly efficient and proactive professional.

As a data engineer, Gonzalo has a deep understanding of data management and processing, as well as the technical skills needed to work with large and complex data sets. He is proficient in a variety of programming languages and tools, including Spark, Hadoop, and SQL, and has a talent for developing innovative and effective solutions to complex data-related challenges. I strongly recommend Gonzalo for any data engineering position. His technical expertise, problem-solving skills, and strong work ethic make him an asset to any team.

Working with Gonzalo at DataPebbles was a great experience. He’s a skilled data engineer with a solid understanding of building effective solutions and solving problems efficiently. What stood out to me was his supportive and approachable nature. He was always ready to collaborate and support the team when needed. His dedication and practical mindset contributed to the success of our projects, and I’m confident he’ll be a great addition to any team!
Clients
Fun Facts
- Happy Clients: 5
- Working Hours: 10400
- Awards Won: 15
Resume
Education
2019-2020
University Carlos III of Madrid
Master in Analytical Methods for Big Data
2013-2019
University Carlos III of Madrid
Dual Bachelor in Computer Science and Business Administration
Experience
2024 - Current
DataPebbles
Senior Data Engineer
Leading data engineering projects using PySpark and Python within a Kubernetes environment. Optimized data visualization and monitoring with Elasticsearch, Kibana, and Grafana. Implemented and managed CI/CD workflows using ArgoCD. Contributed to migrating from Kubernetes (k8s) to k3s, achieving a 40% reduction in runtime and costs by enhancing system efficiency.
2023 - 2024
Roamler
Data Engineer
Built scalable data pipelines with Spark and Scala, and engineered a scraping engine on AWS that improved data quality and volume by 30%. Managed key AWS services (EMR, Lambdas, ECS, Glue, ECK) for processing and storage, and streamlined infrastructure provisioning with Terraform and workflow automation with Airflow.
2022 - 2023
Azerion
Data Engineer
Developed and managed AWS cloud infrastructures using Spark and Scala for data processing, integrating NoSQL for flexibility. Oversaw key AWS services (EC2, EMR, Glue, Lambda) for streaming and used Athena for querying, improving data architecture and storage.
2021 - 2022
Carrefour
Big Data Developer
Implemented CI/CD processes with Jenkins and conducted data analysis using Cloudera tools. Utilized Spark-Scala and Control-M for efficient data processing and workflow management, enhancing performance and data integration.
2020 - 2021
PUE
Big Data Developer
Used Spark-Scala and MapReduce for efficient data processing, analyzed data with Cloudera for insights, and improved data integration with Apache Kafka, NiFi, and Oozie.
Design Skills
ETL
AWS
Data Modeling
Kubernetes
Coding Skills
Python
SQL
Spark
Scala
Knowledge
- Kafka
- Elasticsearch
- Docker
- Grafana
- CI/CD
- Data Scraping
- Git
- Problem-Solving
- Flexibility
Certificates

Databricks Certified Data Engineer Professional

Cloudera Dataflow - NiFi
Portfolio
Scraping Application AWS
The scraping application, built on AWS, automates scalable web scraping using Lambda functions, Glue Jobs, Spark, and Beautiful Soup for data extraction. EC2 instances and Fargate sidecars manage proxies dynamically via DynamoDB to prevent blocking. Data is validated with Great Expectations and stored in S3 buckets partitioned by date. With Route 53, an Application Load Balancer, Auto Scaling, and CloudWatch, the system ensures efficient resource management, monitoring, and reliability.
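As a rough illustration of the Lambda side of this design, the sketch below picks a proxy registered in DynamoDB, scrapes one page with Beautiful Soup, and lands the parsed record in a date-partitioned S3 prefix. It is a simplified assumption, not the production code; the table and bucket names are placeholders.

import json
from datetime import datetime, timezone

import boto3
import requests
from bs4 import BeautifulSoup

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

PROXY_TABLE = "scraper-proxies"   # hypothetical table maintained by the Fargate sidecars
RAW_BUCKET = "scraper-raw-data"   # hypothetical raw landing bucket


def pick_proxy() -> str:
    """Return any currently healthy proxy registered in DynamoDB."""
    table = dynamodb.Table(PROXY_TABLE)
    items = table.scan(Limit=10).get("Items", [])
    healthy = [i["url"] for i in items if i.get("healthy")]
    if not healthy:
        raise RuntimeError("no healthy proxies available")
    return healthy[0]


def handler(event, context):
    """Scrape one URL passed in the event and store the parsed result in S3."""
    url = event["url"]
    proxy = pick_proxy()
    resp = requests.get(url, proxies={"https": proxy}, timeout=30)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    record = {
        "url": url,
        "title": soup.title.string if soup.title else None,
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }

    # Partition the raw landing zone by scrape date, mirroring the S3 layout described above.
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = f"raw/date={day}/{abs(hash(url))}.json"
    s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=json.dumps(record))
    return {"stored_key": key}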
Reporting Kubernetes
The Reporting Pipeline in Kubernetes processes and transforms financial data using Spark and Python, generating reports such as SFTR. It validates reports with DTCC for compliance and stores them in NFS. Deployed in Kubernetes, it leverages ArgoCD for management and logs to Grafana, Kibana, and Spark History Server for monitoring. OpenTelemetry ensures detailed workflow observability, making it a scalable and reliable financial reporting solution.
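A minimal PySpark sketch of the report-generation step, assuming an illustrative trade schema and NFS mount paths; the real schema, transformation rules, and the DTCC validation step are not shown.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sftr-report-sketch").getOrCreate()

# Assumed input path on the NFS mount shared by the cluster.
trades = spark.read.parquet("/mnt/nfs/raw/trades")

# Aggregate the day's trades into a report-shaped dataset (columns are illustrative).
report = (
    trades
    .filter(F.col("report_date") == F.current_date())
    .withColumn("notional_eur", F.col("notional") * F.col("fx_rate_eur"))
    .groupBy("counterparty_lei", "asset_class")
    .agg(
        F.count("*").alias("trade_count"),
        F.sum("notional_eur").alias("total_notional_eur"),
    )
)

# Reports are written to NFS before the separate compliance validation runs.
report.write.mode("overwrite").parquet("/mnt/nfs/reports/sftr/daily")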
Enrichment Financial Data
The Financial Data Integration Pipelines project processes and merges financial data from Bloomberg and Reuters using Spark and Python, referencing static tables for accuracy. Results are written to Elasticsearch for analysis, with data also published to Kafka queues for consumption by other teams. Orchestrated by Airflow with scheduled tasks and managed through ArgoCD in Kubernetes, the system ensures scalability and efficiency. Logs are captured in Grafana, Kibana, and Spark History Server, while OpenTelemetry and Prometheus provide observability and performance metrics. This project exemplifies a modern, monitored, and collaborative data integration pipeline.
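The core merge-and-publish step could look roughly like the PySpark sketch below. Feed paths, column names, hosts, and topic names are assumptions; the Elasticsearch write requires the elasticsearch-spark connector and the Kafka write requires the spark-sql-kafka package on the classpath.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("enrichment-sketch").getOrCreate()

# Assumed inputs: one feed per vendor plus a static reference table.
bloomberg = spark.read.parquet("s3://feeds/bloomberg/latest").select(
    "isin", F.col("price").alias("bbg_price"))
reuters = spark.read.parquet("s3://feeds/reuters/latest").select(
    "isin", F.col("price").alias("rtr_price"))
reference = spark.read.parquet("s3://reference/instruments")

# Merge both vendors and enrich with the static reference data.
merged = (
    bloomberg.join(reuters, on="isin", how="outer")
    .join(reference, on="isin", how="left")
    .withColumn("price", F.coalesce(F.col("bbg_price"), F.col("rtr_price")))
)

# Publish the enriched records to Elasticsearch for analysis.
(merged.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "elasticsearch:9200")
    .option("es.resource", "enriched-instruments")
    .mode("append")
    .save())

# Publish the same records to a Kafka topic for downstream teams.
(merged.selectExpr("CAST(isin AS STRING) AS key", "to_json(struct(*)) AS value")
    .write
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("topic", "enriched-instruments")
    .save())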
Consumer Habits and Trends
The Global Consumer Habits Analysis Platform enables users to perform worldwide searches on consumption habits through an intuitive web interface. User-specified parameters initiate AWS EMR containers, which utilize Spark and Scala to process historical data stored in S3. The transformed data is stored in Amazon Redshift, where it supports Business Intelligence teams in generating reports and dashboards. The platform ensures scalability with on-demand EMR clusters and efficient analytics with Redshift. Logs and metrics are monitored via Grafana and OpenTelemetry, while Jenkins handles CI/CD, automating deployments and ensuring system reliability. This solution delivers actionable insights with a focus on scalability and real-time analytics.
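The production jobs are written in Scala; the EMR step can be sketched in PySpark as below. The parameter set, S3 layout, and Redshift connection details are placeholders, and the JDBC write assumes the Redshift JDBC driver is available on the cluster.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("consumer-habits-sketch").getOrCreate()

# Parameters would normally come from the web interface that launches the EMR step.
params = {"country": "ES", "category": "groceries", "year": 2023}

history = spark.read.parquet("s3://consumer-habits/history/")  # assumed layout

# Filter the historical data by the user's search and aggregate per month and age group.
aggregated = (
    history
    .filter((F.col("country") == params["country"]) &
            (F.col("category") == params["category"]) &
            (F.col("year") == params["year"]))
    .groupBy("month", "age_group")
    .agg(F.avg("spend_eur").alias("avg_spend_eur"),
         F.countDistinct("consumer_id").alias("consumers"))
)

# Load the result into Redshift over JDBC so BI teams can build dashboards on it.
(aggregated.write
    .format("jdbc")
    .option("url", "jdbc:redshift://redshift-cluster:5439/analytics")
    .option("dbtable", "consumer_habits_monthly")
    .option("user", "etl_user")
    .option("password", "***")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save())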
Real-Time Streaming Pipeline
This Real-Time Streaming Pipeline processes high-velocity data from Beeswax using Spark Streaming, AWS Kinesis, and Scala. The system ingests, filters, and transforms data in real-time within an EMR container, ensuring high-quality output. The processed data is stored in partitioned S3 buckets for efficient retrieval and future use. Advanced features like windowed aggregations, stateful operations, and backpressure handling enhance its streaming capabilities. Monitoring is enabled through OpenTelemetry, with logs and metrics visualized in Grafana. The architecture is highly scalable, leveraging auto-scaling EMR clusters and designed for seamless integration of additional data sources and real-time analytics.
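The production pipeline is in Scala; as a rough PySpark sketch of the same pattern, the snippet below reads from a Kinesis stream (assuming a Kinesis connector such as spark-sql-kinesis is on the classpath; option names vary by connector version), applies a watermarked windowed aggregation, and writes partitioned Parquet to S3 with checkpointing. Stream, schema, and bucket names are placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Ingest raw records from Kinesis via a third-party/EMR-provided connector.
raw = (spark.readStream
    .format("kinesis")
    .option("streamName", "beeswax-events")
    .option("region", "eu-west-1")
    .load())

# Parse the binary payload into typed columns (schema is illustrative).
events = (raw
    .selectExpr("CAST(data AS STRING) AS json")
    .select(F.from_json("json", "event_time TIMESTAMP, campaign STRING, bid DOUBLE").alias("e"))
    .select("e.*"))

# Watermarked windowed aggregation: a stateful operation with bounded lateness.
windowed = (events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "campaign")
    .agg(F.count("*").alias("bids"), F.avg("bid").alias("avg_bid")))

# Land results in partitioned S3 paths; the checkpoint location provides fault tolerance.
query = (windowed.writeStream
    .format("parquet")
    .option("path", "s3://streaming-output/bids/")
    .option("checkpointLocation", "s3://streaming-output/checkpoints/bids/")
    .partitionBy("campaign")
    .outputMode("append")
    .start())

query.awaitTermination()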