Dany D.

Lead Data & ML Engineer

Key Expertise

Declarative Data Engineering · Advanced Stream Processing · Real-time CDC Pipelines · Medallion Lakehouse Design · MLOps · Agentic AI Architecture

Experience

8+ years

Timezone

CET (UTC +1)

Skills

AI / ML

LightGBM · statsmodels · LangGraph · LangChain · MLflow

Languages

Python

Databases

Delta Lake · PostgreSQL · Unity Catalog · Auto Loader · Databricks

Infrastructure

Kafka · Terraform · Kubernetes · AWS · Azure · Azure DevOps Pipelines · GitLab CI · Datadog · centralized logging · ruff · mypy · bandit

Frameworks

Scikit-learn · PySpark · Pydantic · typed configuration framework · Databricks Asset Bundles · Databricks Workflows · declarative streaming pipelines · pytest

Integrations & Protocols

Model Context Protocol · log-based CDC connectors · Power BI

1. Agentic Automation Platform for Document-Intensive Workflows

AI Architect & Tech Lead Data Engineer · 2025-2026

Project overview:

The project involved architecting a greenfield agentic AI platform that automates the end-to-end processing of high-volume, document-heavy business cases for a regulated enterprise. A supervisor-style agent graph routes each case through a set of specialist agents that handle ingestion, enrichment, validation, coordination, and resolution, replacing manual review queues while keeping a human-in-the-loop checkpoint on high-stakes transitions. The agent layer sits on top of a cloud-native Databricks data platform with Unity Catalog governance, declarative streaming ingestion from an object-store landing zone, and a multi-region, multi-tenant infrastructure baseline.
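
As a minimal sketch, a supervisor-and-specialist graph of this kind can be expressed in LangGraph roughly as follows; the state fields, agent names, and routing rules below are illustrative assumptions, not the project's actual topology:

```python
# Hypothetical sketch of a supervisor-style agent graph in LangGraph.
# Agent names and routing rules are illustrative, not the actual system.
from typing import Literal, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph


class CaseState(TypedDict):
    case_id: str
    stage: str          # last completed stage for this case
    needs_human: bool   # human-in-the-loop flag for high-stakes transitions


def supervisor(state: CaseState) -> CaseState:
    # In the real system this would inspect the case and decide the next hop.
    return state


def route(state: CaseState) -> Literal["ingestion", "validation", "resolution", "__end__"]:
    # Route each case to the next specialist agent based on its stage.
    if state["stage"] == "new":
        return "ingestion"
    if state["stage"] == "ingested":
        return "validation"
    if state["stage"] == "validated":
        return "resolution"
    return "__end__"


graph = StateGraph(CaseState)
graph.add_node("supervisor", supervisor)
graph.add_node("ingestion", lambda s: {**s, "stage": "ingested"})
graph.add_node("validation", lambda s: {**s, "stage": "validated"})graph.add_node("resolution", lambda s: {**s, "stage": "resolved"})

graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", route)
for agent in ("ingestion", "validation", "resolution"):
    graph.add_edge(agent, "supervisor")  # every specialist reports back

# Durable state checkpointing, as described in the achievements below;
# a production system would use a persistent checkpointer instead.
app = graph.compile(checkpointer=MemorySaver())
```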

Responsibilities:

  • Designed the supervisor-and-specialist agent topology and the typed contracts exchanged between agents, with versioned schemas validated at every inter-agent boundary (see the contract sketch after this list).
  • Architected the Databricks data platform on a major public cloud: regional metastore provisioning, per-tenant workspaces for non-production and production, declarative streaming ingestion with quality expectations derived from schema definitions, and a quarantine path for records failing validation.
  • Built the infrastructure-as-code hierarchy using Terraform and a dependency-orchestration layer, organised by region, business domain, environment, and stack, with reusable modules for metastore, workspace, catalog/schema/external location, and declarative permissions.
  • Developed an internal data-engineering framework that wraps pipeline tasks with standardised configuration loading, logging, and schema enforcement, so that engineers author pipelines declaratively rather than by assembling Spark primitives.
  • Implemented tenant isolation across storage (prefix/bucket partitioning), encryption (per-tenant keys), metadata (tagging for governance and cost allocation), and configuration (runtime-resolved rather than compile-time coupled).
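
For illustration, a typed inter-agent contract of the kind described above might look like the following Pydantic sketch; the message fields and version format are hypothetical:

```python
# Illustrative typed inter-agent contract; field names are assumptions.
from pydantic import BaseModel, Field, ValidationError


class EnrichmentResult(BaseModel):
    """Message emitted by the enrichment agent, consumed by validation."""
    schema_version: str = Field(pattern=r"^\d+\.\d+$")  # versioned schema
    case_id: str
    extracted_fields: dict[str, str]
    confidence: float = Field(ge=0.0, le=1.0)


def receive(payload: dict) -> EnrichmentResult:
    # Validate at the boundary: malformed messages never enter the graph.
    try:
        return EnrichmentResult.model_validate(payload)
    except ValidationError as exc:
        raise ValueError(f"contract violation at agent boundary: {exc}") from exc
```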

Achievements:

Shipped the supervisor and specialist agents into production with durable state checkpointing, full execution tracing, and a repeatable end-to-end scenario suite covering happy paths and edge cases. Delivered the underlying Databricks platform as a reusable blueprint that subsequent internal business units onboarded against shared infrastructure modules rather than greenfield environments. Established strictly validated data contracts between agents so that malformed or incomplete messages are caught at the boundary and never propagate through the graph.

Technology stack:

LangGraph · LangChain · Python · Pydantic · PySpark · Databricks · declarative streaming pipelines · Terraform · AWS · PostgreSQL · MLflow · Model Context Protocol · GitLab CI · Datadog

2. AI-Driven Retail Execution Platform

Lead Data & ML Engineer · 2023-2024

Project overview:

The project involved delivering an enterprise data and AI platform for a multinational consumer-goods company to orchestrate daily sales-execution planning for its field teams across several major retail channels and international markets. The platform combines a medallion-architecture lakehouse on Databricks with a portfolio of production ML models that translate raw retailer feeds, inventory signals, compliance data, and third-party audits into a ranked set of outlet-level tasks delivered to reps each morning. The system operates as a multi-tenant codebase where each retailer channel is onboarded as a configurable tenant rather than a fork.

Responsibilities:

  • Architected the bronze/silver/gold lakehouse on Databricks with parallel bronze ingestion, dozens of silver transformation tables, and a downstream gold layer consumed by the ML pipeline.
  • Designed and implemented the ML inference DAG with explicit task dependencies, combining gradient-boosted forecasting, unsupervised segmentation, rules-driven compliance flagging, and a final prioritisation step that blends model-impact scoring with recency/cooldown constraints (sketched after this list).
  • Built a schema-governance framework using typed column and table definitions for consistent DDL management and evolution across bronze, silver, and gold layers (see the typed-definition sketch below).
  • Implemented tenant-specific variation points (data sources, engineered features, enabled/disabled model outputs, output schema) so that a single codebase serves all downstream channels without branching.
  • Stood up the CI/CD pipeline with lint, strict type checking, security scanning, asset-bundle deployment to layered target environments, and automated semantic versioning.
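
As a rough illustration of the prioritisation step, blending model-impact scores with a recency/cooldown constraint could be sketched as below; the cooldown window, field names, and values are assumptions, not production parameters:

```python
# Illustrative blend of model-impact scoring with a recency cooldown.
from datetime import date, timedelta

COOLDOWN = timedelta(days=14)  # suppress tasks resurfaced too soon (assumed window)


def priority(impact_score: float, last_assigned: date | None, today: date) -> float:
    # A task still inside its cooldown window gets zero priority.
    if last_assigned is not None and today - last_assigned < COOLDOWN:
        return 0.0
    return impact_score


tasks = [
    {"outlet": "A-101", "impact_score": 0.92, "last_assigned": date(2024, 1, 2)},
    {"outlet": "B-207", "impact_score": 0.75, "last_assigned": None},
]
today = date(2024, 1, 10)
ranked = sorted(tasks, key=lambda t: -priority(t["impact_score"], t["last_assigned"], today))
# B-207 ranks first: A-101 was assigned 8 days ago and is still cooling down.
```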
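
And a minimal sketch of typed column and table definitions driving DDL generation, in the spirit of the schema-governance framework above; the API shape and table are hypothetical:

```python
# Hypothetical typed schema definitions that generate consistent DDL.
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str           # Spark SQL type, e.g. "STRING", "DECIMAL(12,2)"
    nullable: bool = True


@dataclass(frozen=True)
class Table:
    catalog: str
    schema: str
    name: str
    columns: tuple[Column, ...]

    def ddl(self) -> str:
        cols = ",\n  ".join(
            f"{c.name} {c.dtype}{'' if c.nullable else ' NOT NULL'}"
            for c in self.columns
        )
        return (
            f"CREATE TABLE IF NOT EXISTS "
            f"{self.catalog}.{self.schema}.{self.name} (\n  {cols}\n) USING DELTA"
        )


silver_sales = Table(
    catalog="silver", schema="sales", name="daily_outlet_sales",
    columns=(
        Column("outlet_id", "STRING", nullable=False),
        Column("sale_date", "DATE", nullable=False),
        Column("net_revenue", "DECIMAL(12,2)"),
    ),
)
print(silver_sales.ddl())  # one definition, one source of truth for evolution
```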

Achievements:

Brought a full production ML portfolio (demand forecasting at multiple time horizons, behavioural segmentation, compliance scoring, stock-availability risk, pricing anomaly detection, and a final task-ranking model) online and into daily operation. Reduced onboarding time for new retail channels from a multi-month custom build to a configuration exercise. Established a fully typed, validated configuration stack that catches misconfigurations before pipeline execution, eliminating an entire class of runtime failures.

Technology stack:

Databricks · PySpark · Delta Lake · Python · Scikit-learn · LightGBM · statsmodels · typed configuration framework · Pydantic · Databricks Asset Bundles · Azure DevOps Pipelines · Power BI · ruff · mypy · bandit

3. Cloud Lakehouse with Change-Data-Capture Ingestion

Senior Data Engineer & Architect · 2022-2023

Project overview:

The project involved designing and delivering a cloud-native data platform for a financial-services institution moving off a fragmented legacy ETL stack. The platform is built around a medallion lakehouse on Databricks, declarative streaming transformations for the silver layer, and log-based change-data-capture from operational relational sources via a managed Kafka service. A config-driven pipeline layer decouples table onboarding from code changes, and a data-quality engine splits each stream into a clean sink and a quarantine sink for audit and remediation.

Responsibilities:

  • Architected the bronze/silver/gold layout on cloud object storage and the declarative streaming transformation pipeline, where each silver table is materialised from a configuration entry declaring schema, constraints, paths, and source format.
  • Built the data-quality engine (per-column mandatory checks, type-cast verification, row-level faulty-record flagging) and the dual-stream writer pattern that sinks valid and faulty rows into separate Delta destinations for downstream reconciliation (sketched after this list).
  • Implemented the CDC ingestion path: managed Kafka as the transport, log-based source connectors against the relational systems, an object-storage sink connector for landing, and a schema registry for evolution and serialisation governance (see the connector registration sketch below).
  • Wrote the Terraform IaC covering resource group, redundant object storage with container layout, secret store with role-based access control and managed secrets for platform credentials, private virtual network with service endpoints, workspace, and orchestration layer.
  • Established the CI/CD pipeline, a local test harness with session-scoped Spark fixtures, and a dev path mirroring the production storage topology so engineers can iterate without hitting cloud resources.
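
A condensed PySpark sketch of the dual-stream writer pattern described above; the table names, flag column, and mandatory-column checks are simplified assumptions:

```python
# Sketch: valid rows to the clean sink, faulty rows to a quarantine sink.
import pyspark.sql.functions as F
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()


def split_and_write(df: DataFrame, mandatory: list[str], table: str) -> None:
    # Flag rows that fail any per-column mandatory check.
    is_faulty = F.lit(False)
    for col in mandatory:
        is_faulty = is_faulty | F.col(col).isNull()
    flagged = df.withColumn("_is_faulty", is_faulty)

    # Valid and faulty rows land in separate Delta destinations.
    (
        flagged.filter(~F.col("_is_faulty"))
        .drop("_is_faulty")
        .write.format("delta").mode("append")
        .saveAsTable(f"silver.{table}")
    )
    (
        flagged.filter(F.col("_is_faulty"))
        .write.format("delta").mode("append")
        .saveAsTable(f"quarantine.{table}")  # full lineage for remediation
    )
```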
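
Registering a log-based CDC source connector against the managed Kafka service could look roughly like this; the Debezium connector class is a representative choice, and all hosts, names, and credentials are placeholders:

```python
# Illustrative connector registration via the Kafka Connect REST API.
import requests

connector = {
    "name": "orders-cdc",
    "config": {
        # Debezium-style log-based source connector (assumed here).
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "orders-db.internal",
        "database.port": "5432",
        "database.user": "cdc_reader",
        "database.password": "<resolved-from-secret-store>",
        "database.dbname": "orders",
        "table.include.list": "public.orders",
        "topic.prefix": "cdc",
    },
}

resp = requests.post("http://connect.internal:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
```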

Achievements:

Replaced hand-written, per-table pipeline code with a declarative JSON configuration model: onboarding a new dataset becomes a configuration exercise rather than an engineering project. Materially reduced downstream data-quality incidents through per-column mandatory and type-cast validation, with full lineage of failed records into a quarantine table. Introduced log-based CDC with exactly-once delivery semantics, eliminating the polling overhead and latency of the previous arrangement while preserving schema evolution through the schema registry.
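
For a sense of what one such declarative entry can look like, a hypothetical reconstruction follows; the keys, paths, and dataset are illustrative, not the project's actual configuration:

```python
# Hypothetical shape of one declarative silver-table entry.
SILVER_TABLES = [
    {
        "name": "transactions",
        "source_format": "parquet",
        "source_path": "abfss://bronze@lake.dfs.core.windows.net/transactions/",
        "schema": {"txn_id": "string", "amount": "decimal(12,2)", "booked_at": "timestamp"},
        "constraints": {"txn_id": "mandatory", "amount": "mandatory"},
    },
]
# A generic pipeline iterates over entries like this one, so onboarding a
# new dataset means adding a configuration entry rather than writing code.
```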

Technology stack:

Databricks · PySpark · Delta Lake · declarative streaming pipelines · Auto Loader · Kafka · log-based CDC connectors · Kubernetes · Terraform · Azure · centralized logging · Azure DevOps Pipelines