Dany D.
Lead Data & ML Engineer
Key Expertise
Experience: 8+ years
Timezone: CET (UTC +1)
Skills: AI / ML · Languages · Databases · Infrastructure · Frameworks · Integrations & Protocols
1. Agentic Automation Platform for Document-Intensive Workflows
Project overview:
The project involved architecting a greenfield agentic AI platform that automates the end-to-end processing of high-volume, document-heavy business cases for a regulated enterprise. A supervisor-style agent graph routes each case through a set of specialist agents that handle ingestion, enrichment, validation, coordination, and resolution, replacing manual review queues while keeping a human-in-the-loop checkpoint on high-stakes transitions. The agent layer sits on top of a cloud-native Databricks data platform with Unity Catalog governance, declarative streaming ingestion from an object-store landing zone, and a multi-region, multi-tenant infrastructure baseline.
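The supervisor-and-specialist topology described above can be sketched as a routing loop with a human-in-the-loop gate. This is a minimal, illustrative stand-in: the stage names come from the description, but `run_case`, `handlers`, and `approve` are hypothetical, and the real platform is built on an agent-graph framework rather than a hand-rolled loop.

```python
# Illustrative supervisor-style routing: a case moves through specialist
# stages, with a human-in-the-loop gate before high-stakes transitions.
PIPELINE = ["ingestion", "enrichment", "validation", "coordination", "resolution"]
HIGH_STAKES = {"resolution"}  # stages gated by a human checkpoint

def run_case(case, handlers, approve):
    """Route a case through each specialist; pause for approval on
    high-stakes transitions instead of proceeding automatically."""
    for stage in PIPELINE:
        if stage in HIGH_STAKES and not approve(case, stage):
            case["status"] = f"awaiting_review:{stage}"
            return case
        case = handlers[stage](case)
    case["status"] = "resolved"
    return case
```

The point of the gate is that a rejected approval parks the case in a reviewable state rather than letting the graph proceed.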
Responsibilities:
- Designed the supervisor-and-specialist agent topology and the typed contracts exchanged between agents, with versioned schemas validated at every inter-agent boundary.
- Architected the Databricks data platform on a major public cloud: regional metastore provisioning, per-tenant workspaces for non-production and production, declarative streaming ingestion with quality expectations derived from schema definitions, and a quarantine path for records failing validation.
- Built the infrastructure-as-code hierarchy using Terraform and a dependency-orchestration layer, organised by region, business domain, environment, and stack, with reusable modules for metastore, workspace, catalog/schema/external location, and declarative permissions.
- Developed an internal data-engineering framework that wraps pipeline tasks with standardised configuration loading, logging, and schema enforcement, so that engineers author pipelines declaratively rather than by assembling Spark primitives.
- Implemented tenant isolation across storage (prefix/bucket partitioning), encryption (per-tenant keys), metadata (tagging for governance and cost allocation), and configuration (runtime-resolved rather than compile-time coupled).
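The typed, versioned inter-agent contracts from the first bullet can be sketched with a validating envelope checked at every boundary. Field names and the version constant are assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass

SCHEMA_VERSION = 2  # hypothetical current contract version

@dataclass(frozen=True)
class CaseMessage:
    """Versioned, typed message envelope exchanged between agents."""
    schema_version: int
    case_id: str
    payload: dict

    def __post_init__(self):
        # Validation at construction time: a malformed message can
        # never exist inside the graph, only fail at the boundary.
        if self.schema_version != SCHEMA_VERSION:
            raise ValueError(f"unsupported schema version {self.schema_version}")
        if not self.case_id:
            raise ValueError("case_id must be non-empty")
        if not isinstance(self.payload, dict):
            raise TypeError("payload must be a mapping")

def receive(raw: dict) -> CaseMessage:
    """Boundary check: reject rather than propagate bad input."""
    return CaseMessage(**raw)
```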
Achievements:
Shipped the supervisor and specialist agents into production with durable state checkpointing, full execution tracing, and a repeatable end-to-end scenario suite covering happy paths and edge cases. Delivered the underlying Databricks platform as a reusable blueprint that subsequent internal business units onboarded against shared infrastructure modules rather than greenfield environments. Established strictly validated data contracts between agents so that malformed or incomplete messages are caught at the boundary and never propagate through the graph.
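The durable state checkpointing mentioned above rests on a simple idea: persist agent state after each completed stage so a crashed run resumes rather than restarts. A minimal sketch, with the atomic write-then-rename trick; the state shape and paths are assumptions.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write atomically: temp file + rename, so a crash mid-write
    never leaves a torn checkpoint behind."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path, default=None):
    """Resume from the last completed stage, or start fresh."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)
```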
Technology stack:
2. AI-Driven Retail Execution Platform
Project overview:
The project involved delivering an enterprise data and AI platform for a multinational consumer-goods company to orchestrate daily sales-execution planning for its field teams across several major retail channels and international markets. The platform combines a medallion-architecture lakehouse on Databricks with a portfolio of production ML models that translate raw retailer feeds, inventory signals, compliance data, and third-party audits into a ranked set of outlet-level tasks delivered to reps each morning. The system operates as a multi-tenant codebase where each retailer channel is onboarded as a configurable tenant rather than a fork.
Responsibilities:
- Architected the bronze/silver/gold lakehouse on Databricks with parallel bronze ingestion, dozens of silver transformation tables, and a downstream gold layer consumed by the ML pipeline.
- Designed and implemented the ML inference DAG with explicit task dependencies, combining gradient-boosted forecasting, unsupervised segmentation, rules-driven compliance flagging, and a final prioritisation step that blends model-impact scoring with recency/cooldown constraints.
- Built a schema-governance framework using typed column and table definitions for consistent DDL management and evolution across bronze, silver, and gold layers.
- Implemented tenant-specific variation points (data sources, engineered features, enabled/disabled model outputs, output schema) so that a single codebase serves all downstream channels without branching.
- Stood up the CI/CD pipeline with lint, strict type checking, security scanning, asset-bundle deployment to layered target environments, and automated semantic versioning.
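The final prioritisation step described in the second bullet, blending model-impact scoring with recency/cooldown constraints, can be sketched as follows. The cooldown length, field names, and the recency weight are illustrative, not the production values.

```python
from datetime import date

COOLDOWN_DAYS = 7  # hypothetical: don't re-task an outlet within a week

def prioritise(tasks, today):
    """Rank tasks by blended score, dropping outlets still in cooldown."""
    eligible = [t for t in tasks
                if (today - t["last_visited"]).days >= COOLDOWN_DAYS]

    def score(t):
        # Recency bonus: the longer since the last visit, the higher
        # the boost on top of the model-impact score.
        days_since = (today - t["last_visited"]).days
        return t["impact"] + 0.01 * days_since

    return sorted(eligible, key=score, reverse=True)
```

The cooldown filter keeps high-impact outlets from monopolising every morning's task list, while the recency term gradually resurfaces neglected ones.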
Achievements:
Brought a full production ML portfolio (demand forecasting at multiple time horizons, behavioural segmentation, compliance scoring, stock-availability risk, pricing anomaly detection, and a final task-ranking model) online and into daily operation. Reduced onboarding time for new retail channels from a multi-month custom build to a configuration exercise. Established a fully typed, validated configuration stack that catches misconfigurations before pipeline execution, eliminating an entire class of runtime failures.
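The "fully typed, validated configuration stack" can be illustrated with a frozen dataclass that validates at load time, so a misconfiguration fails fast instead of surfacing mid-pipeline. The field names and allowed channel set here are assumptions for the sketch.

```python
from dataclasses import dataclass

ALLOWED_CHANNELS = {"grocery", "pharmacy", "convenience"}  # illustrative

@dataclass(frozen=True)
class TenantConfig:
    """Per-channel tenant configuration, validated on construction."""
    channel: str
    enabled_models: tuple
    forecast_horizon_days: int

    def __post_init__(self):
        # Fail at config-load time, never at pipeline runtime.
        if self.channel not in ALLOWED_CHANNELS:
            raise ValueError(f"unknown channel: {self.channel!r}")
        if self.forecast_horizon_days <= 0:
            raise ValueError("forecast_horizon_days must be positive")
        if not self.enabled_models:
            raise ValueError("at least one model must be enabled")
```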
Technology stack:
3. Cloud Lakehouse with Change-Data-Capture Ingestion
Project overview:
The project involved designing and delivering a cloud-native data platform for a financial-services institution moving off a fragmented legacy ETL stack. The platform is built around a medallion lakehouse on Databricks, declarative streaming transformations for the silver layer, and log-based change-data-capture from operational relational sources via a managed Kafka service. A config-driven pipeline layer decouples table onboarding from code changes, and a data-quality engine splits each stream into a clean sink and a quarantine sink for audit and remediation.
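The config-driven onboarding model can be sketched with a single table entry: each silver table is declared as data, and the pipeline layer materialises it. All keys, paths, and column names below are illustrative, not the institution's actual configuration schema.

```python
import json

# Hypothetical per-table configuration entry: schema, constraints,
# paths and source format declared as data, not code.
TABLE_CONFIG = json.loads("""
{
  "name": "transactions_silver",
  "source_format": "json",
  "source_path": "landing/transactions/",
  "target_path": "silver/transactions/",
  "schema": {"txn_id": "string", "amount": "decimal(18,2)", "ts": "timestamp"},
  "constraints": {"txn_id": "mandatory", "amount": "mandatory", "ts": "optional"}
}
""")

def mandatory_columns(cfg):
    """Columns whose missing values route a record to quarantine."""
    return [c for c, rule in cfg["constraints"].items() if rule == "mandatory"]
```

Onboarding a new dataset then means adding one such entry, not writing a new pipeline.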
Responsibilities:
- Architected the bronze/silver/gold layout on cloud object storage and the declarative streaming transformation pipeline, where each silver table is materialised from a configuration entry declaring schema, constraints, paths, and source format.
- Built the data-quality engine (per-column mandatory checks, type-cast verification, row-level faulty-record flagging) and the dual-stream writer pattern that sinks valid and faulty rows into separate Delta destinations for downstream reconciliation.
- Implemented the CDC ingestion path: managed Kafka as the transport, log-based source connectors against the relational systems, an object-storage sink connector for landing, and a schema registry for evolution and serialisation governance.
- Wrote the Terraform IaC covering resource group, redundant object storage with container layout, secret store with role-based access control and managed secrets for platform credentials, private virtual network with service endpoints, workspace, and orchestration layer.
- Established the CI/CD pipeline, a local test harness with session-scoped Spark fixtures, and a dev path mirroring the production storage topology so engineers can iterate without hitting cloud resources.
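The dual-stream writer pattern from the second bullet can be sketched as a batch split: per-column mandatory checks plus type-cast verification, with failing rows carrying their failure reasons into the quarantine sink. This is a pure-Python stand-in under assumed column names; the production version runs on Spark against Delta destinations.

```python
def check_row(row, mandatory, casts):
    """Return a list of failure reasons; empty means the row is clean."""
    errors = [f"missing:{c}" for c in mandatory if row.get(c) in (None, "")]
    for col, cast in casts.items():
        if row.get(col) is not None:
            try:
                cast(row[col])
            except (TypeError, ValueError):
                errors.append(f"bad_cast:{col}")
    return errors

def split_batch(rows, mandatory, casts):
    """Split a batch into clean and quarantine streams."""
    clean, quarantine = [], []
    for row in rows:
        errors = check_row(row, mandatory, casts)
        if errors:
            quarantine.append({**row, "_errors": errors})  # keep lineage
        else:
            clean.append(row)
    return clean, quarantine
```

Annotating quarantined rows with their failure reasons is what makes downstream reconciliation and remediation tractable.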
Achievements:
Replaced hand-written, per-table pipeline code with a declarative JSON configuration model: onboarding a new dataset becomes a configuration exercise rather than an engineering project. Materially reduced downstream data-quality incidents through per-column mandatory and type-cast validation, with full lineage of failed records into a quarantine table. Introduced log-based CDC with exactly-once delivery semantics, eliminating the polling overhead and latency of the previous arrangement while preserving schema evolution through the schema registry.
Technology stack: