Veronika Y.
Big Data Engineer
Veronika is an expert operating at the intersection of Data Engineering and Cloud Platforms, specializing in architecting resilient data processing systems and optimizing high-load cloud environments. Her approach centers on building transparent, scalable pipelines that do more than move bytes: they empower businesses with high-quality data while minimizing operational overhead.

In the data domain, Veronika possesses deep expertise in implementing Medallion Architectures using Databricks and Delta Lake. She has designed end-to-end workflows, from high-throughput web scraping systems powered by Python and Kafka to production-ready analytical data marts. By prioritizing incremental processing, multi-stage data quality validation, and rigorous Spark performance tuning, she significantly accelerates time-to-insight while reducing compute resource consumption.

Beyond pipeline engineering, Veronika actively drives the evolution of platform solutions. She has hands-on experience managing large-scale Kubernetes (AWS EKS) clusters, where she implemented Karpenter for intelligent autoscaling. This not only enhanced system elasticity under fluctuating workloads but also substantially optimized cloud expenditure through strategic instance selection and node lifecycle management.

Veronika is committed to fostering stable production environments where automation (CI/CD, IaC) and observability (monitoring and alerting) are the baseline. She champions an engineering-first mindset in which infrastructure reliability and data integrity go hand in hand, ensuring predictable business outcomes at any scale.
Key Expertise
Experience
7+ years
Timezone
CET (GMT +1)
1. URL Scraping Pipeline
Project overview:
Built an end-to-end, high-throughput URL scraping platform to continuously discover, schedule, and scrape web pages at scale. The pipeline coordinates scraping demand, executes distributed scraping jobs, and emits structured outputs for downstream processing and monitoring.
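For illustration, a minimal sketch of such a scrape-worker loop, assuming a Kafka topic of scheduled URL jobs and a topic for structured results. Topic names, brokers, and the job/artifact fields here are hypothetical, not the production schema:

```python
# Minimal scrape-worker sketch (hypothetical topics/brokers, not production code).
import json

import requests
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

consumer = KafkaConsumer(
    "url-jobs",                      # hypothetical topic of scheduled scrape jobs
    bootstrap_servers="localhost:9092",
    group_id="scrape-workers",       # consumer group enables horizontal scale-out
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    enable_auto_commit=False,        # commit only after a job is fully handled
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for msg in consumer:
    job = msg.value                  # e.g. {"job_id": ..., "url": ...}
    try:
        resp = requests.get(job["url"], timeout=10)
        artifact = {                 # normalized, structured output for downstream consumers
            "job_id": job["job_id"],
            "url": job["url"],
            "status": resp.status_code,
            "content": resp.text,
        }
        producer.send("scrape-results", artifact)
    except requests.RequestException as exc:
        producer.send("scrape-failures", {"job_id": job["job_id"], "error": str(exc)})
    consumer.commit()                # at-least-once processing semantics
```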
Responsibilities:
- Designed the orchestration flow that turns scraping signals into scheduled work units and manages execution windows, retries, and backpressure.
- Built distributed scraping services to fetch pages, normalize responses, and produce consistent scraping artifacts for downstream consumers.
- Optimized throughput and reliability by improving batching, error handling, and recovery logic for failed scraping attempts.
- Automated operational workflows (configuration, environment-based deployment, logging/metrics hooks) to support QA/prod parity and faster incident response.
- Implemented data quality safeguards to validate scraping outputs and prevent duplicate/invalid jobs from propagating through the pipeline (see the sketch after this list).
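The retry and deduplication logic referenced above can be sketched roughly as follows. The thresholds and the in-memory seen-set are illustrative only; a production system would back the dedup state with a persistent store:

```python
# Illustrative retry/backoff and output-validation helpers (thresholds are made up).
import hashlib
import time

import requests

_seen_job_keys: set[str] = set()   # stand-in for persistent dedup state (e.g. a key-value store)

def job_key(url: str) -> str:
    """Canonical key for deduplicating scrape jobs for the same URL."""
    return hashlib.sha256(url.strip().lower().encode("utf-8")).hexdigest()

def should_schedule(url: str) -> bool:
    """Drop duplicate jobs before they enter the pipeline."""
    key = job_key(url)
    if key in _seen_job_keys:
        return False
    _seen_job_keys.add(key)
    return True

def fetch_with_retry(url: str, attempts: int = 3) -> requests.Response:
    """Retry transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)   # 1s, then 2s, ...

def is_valid_artifact(resp: requests.Response) -> bool:
    """Basic quality gate before emitting output downstream."""
    return resp.status_code == 200 and len(resp.text) > 100
```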
Achievements:
Deployed a reliable production pipeline that schedules scraping workloads, adapts rescrape cadence based on content change signals, and scales horizontally to handle high-volume URL inventories while reducing unnecessary rescraping.
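The change-driven rescrape cadence can be illustrated with a simple multiplicative scheme; the interval bounds and factors below are invented for the example, not the production tuning:

```python
# Illustrative cadence adjustment: back off on unchanged content, speed up on change.
import hashlib

MIN_INTERVAL_H, MAX_INTERVAL_H = 6, 720   # hypothetical bounds (hours)

def content_fingerprint(body: str) -> str:
    """Fingerprint a page body so change can be detected cheaply."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def next_interval_hours(prev_interval: float, old_fp: str, new_fp: str) -> float:
    if new_fp == old_fp:
        # Content unchanged: rescrape less often, avoiding wasted fetches.
        return min(prev_interval * 2, MAX_INTERVAL_H)
    # Content changed: rescrape more often to track it closely.
    return max(prev_interval / 2, MIN_INTERVAL_H)
```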
2. Kubernetes Autoscaling & Capacity Optimization
Project overview:
Built and maintained AWS EKS node provisioning with Karpenter by defining scalable, workload-aware node pools to automatically right-size capacity and improve cluster elasticity for data/compute services.
Responsibilities:
- Designed Karpenter node pools and provisioning constraints to match workload requirements (CPU/memory, architecture, zones, taints/tolerations), improving scheduling reliability and reducing manual node management (see the manifest sketch after this list).
- Automated cluster capacity management by codifying pool policies (limits, consolidation/expiration, disruption controls) to support safe scaling and predictable operations.
- Optimized EC2 costs by selecting appropriate instance families/sizes, leveraging spot where suitable, and eliminating over-provisioned capacity through right-sizing and consolidation strategies.
- Standardized environment configuration for repeatable deployments across stages (e.g., QA/prod) through infrastructure-as-code practices.
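As a rough illustration of such a pool definition (Karpenter v1 schema), the manifest below shows the kind of constraints, taints, capacity-type mix, and disruption controls described above. All names, limits, and values are placeholders, not the actual production settings:

```yaml
# Illustrative Karpenter NodePool; values are placeholders, not production settings.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: data-workloads            # hypothetical pool name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h           # recycle nodes periodically for patching/hygiene
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # use spot where workloads tolerate interruption
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      taints:
        - key: workload-type
          value: data
          effect: NoSchedule      # only pods tolerating the taint land on this pool
  limits:
    cpu: "500"                    # hard cap on total pool capacity
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m          # repack underutilized nodes to cut spend
```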
Achievements:
Enabled faster scale-out and more cost-efficient Kubernetes compute by introducing Karpenter pools tuned for different workload profiles, while reducing EC2 spend via right-sizing and lifecycle optimization.
3. Databricks Lakehouse Ingestion & Medallion Architecture
Project overview:
Built and operated a Databricks Lakehouse ingestion and transformation framework on Delta Lake, implementing a Medallion Architecture (Bronze/Silver/Gold) to move data from raw landing through curated layers into analytics-ready datasets for reporting and KPI consumption.
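As one common way to implement the Bronze landing step on Databricks, here is a minimal sketch using Auto Loader with schema evolution; the paths and table names are hypothetical, and `spark` is the session Databricks provides:

```python
# Bronze ingestion sketch via Databricks Auto Loader (hypothetical paths/tables).
# Assumes it runs on Databricks, where `spark` is the provided SparkSession.
from pyspark.sql import functions as F

raw = (
    spark.readStream.format("cloudFiles")            # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/lake/_schemas/events")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # tolerate schema drift
    .load("/Volumes/lake/landing/events")
)

(
    raw.withColumn("_ingested_at", F.current_timestamp())  # ingestion metadata
    .writeStream
    .option("checkpointLocation", "/Volumes/lake/_checkpoints/bronze_events")
    .trigger(availableNow=True)                      # incremental, batch-style run
    .toTable("bronze.events")
)
```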
Responsibilities:
- Owned Medallion pipelines end-to-end: Bronze ingestion (schema drift, metadata, partitioning), Silver curation (standardization, deduplication, rule enforcement), and Gold outputs (consumption-ready models and aggregations).
- Implemented upsert and incremental-load patterns using Delta Lake's ACID transaction guarantees (see the MERGE sketch after this list).
- Tuned Spark performance (partition strategy, file sizing, query optimization) to keep jobs efficient at scale (a small maintenance example follows the achievements below).
- Partnered with stakeholders to translate reporting needs into curated tables and maintain shared metric definitions.
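The incremental upsert pattern mentioned above typically relies on Delta's MERGE. A minimal PySpark sketch, assuming a Databricks session and hypothetical table and key names:

```python
# Minimal Delta upsert sketch (table and column names are hypothetical).
from delta.tables import DeltaTable

def upsert_to_silver(spark, updates_df):
    """Merge a batch of changed rows into a curated Silver table."""
    target = DeltaTable.forName(spark, "silver.orders")
    (
        target.alias("t")
        .merge(updates_df.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()      # update existing keys in place
        .whenNotMatchedInsertAll()   # insert new keys
        .execute()                   # single ACID transaction on the Delta table
    )
```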
Achievements:
- Delivered production-grade Delta pipelines with incremental processing, schema consistency, and integrated data quality validation.
- Improved KPI reliability and time-to-insight by standardizing raw-to-curated flows and enforcing consistent dataset definitions.
- Increased operational stability through repeatable Databricks job execution, structured logging, and pipeline observability.
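File sizing and layout tuning of the kind described in this project is often expressed directly as Delta maintenance commands. A small illustration, again with hypothetical table and column names and a Databricks-provided `spark` session:

```python
# Illustrative Delta layout maintenance (table/column names are hypothetical).
# Assumes a Databricks session where `spark` is available.

# Target larger files to reduce small-file overhead on reads.
spark.sql("ALTER TABLE silver.orders SET TBLPROPERTIES ('delta.targetFileSize' = '128mb')")

# Compact small files and co-locate rows by a frequent filter column.
spark.sql("OPTIMIZE silver.orders ZORDER BY (order_date)")
```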
Ready to Work with Veronika Y.?
Big Data Engineer
Share your project details and our team will review the match and confirm availability.
We respond within 24 hours.