Agentic Automation Platform for Document-Intensive Workflows

AI Architect & Tech Lead · Data Engineer
2025 - 2026

Dany D.

Lead Data & ML Engineer · Data Engineer & Big Data

Key Expertise

Declarative Data Engineering · Advanced Stream Processing · Real-time CDC Pipelines · Medallion Lakehouse Design · MLOps · Agentic AI Architecture

Experience

8+ years

Timezone

CET (UTC +1)

Skills

AI / ML

LightGBM · statsmodels · LangGraph · LangChain · MLflow

Languages

Python

Databases

Delta Lake · PostgreSQL · Unity Catalog · Auto Loader · Databricks

Infrastructure

Kafka · Terraform · Kubernetes · AWS · Azure · Azure DevOps Pipelines · GitLab CI · Datadog · centralized logging · ruff · mypy · bandit

Frameworks

Scikit-learn · PySpark · Pydantic · typed configuration framework · Databricks Asset Bundles · Databricks Workflows · declarative streaming pipelines · pytest

Integrations & Protocols

Model Context Protocol · log-based CDC connectors · Power BI

Overview

The project involved architecting a greenfield agentic AI platform that automates the end-to-end processing of high-volume, document-heavy business cases for a regulated enterprise. A supervisor-style agent graph routes each case through a set of specialist agents that handle ingestion, enrichment, validation, coordination, and resolution, replacing manual review queues while keeping a human-in-the-loop checkpoint on high-stakes transitions. The agent layer sits on top of a cloud-native Databricks data platform with Unity Catalog governance, declarative streaming ingestion from an object-store landing zone, and a multi-region, multi-tenant infrastructure baseline.
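
As an illustration, a supervisor graph of this shape can be wired in LangGraph roughly as follows. This is a minimal sketch: the state fields, stage names, and routing table are invented stand-ins, not the production topology.

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph


class CaseState(TypedDict):
    """State shared by every agent; the fields are illustrative."""
    case_id: str
    stage: str  # last completed processing stage


def supervisor(state: CaseState) -> CaseState:
    # In production this node would apply an LLM- or rule-driven routing policy.
    return state


def ingest(state: CaseState) -> dict:
    return {"stage": "ingested"}  # specialist bodies elided


def validate(state: CaseState) -> dict:
    return {"stage": "validated"}


def resolve(state: CaseState) -> dict:
    return {"stage": "resolved"}


def route(state: CaseState) -> str:
    # The supervisor picks the next specialist from the case's current stage.
    next_stage = {"new": "ingestion", "ingested": "validation", "validated": "resolution"}
    return next_stage.get(state["stage"], END)


builder = StateGraph(CaseState)
builder.add_node("supervisor", supervisor)
builder.add_node("ingestion", ingest)
builder.add_node("validation", validate)
builder.add_node("resolution", resolve)
builder.add_edge(START, "supervisor")
builder.add_conditional_edges("supervisor", route, ["ingestion", "validation", "resolution", END])
for specialist in ("ingestion", "validation"):
    builder.add_edge(specialist, "supervisor")  # specialists report back to the supervisor

builder.add_edge("resolution", END)

# Durable checkpointing plus a human-in-the-loop pause before the high-stakes step.
graph = builder.compile(checkpointer=MemorySaver(), interrupt_before=["resolution"])
```

Compiling with a checkpointer yields durable, resumable per-case threads, and interrupt_before models the human-in-the-loop checkpoint before the high-stakes resolution step.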

Achievements

  • Shipped the supervisor and specialist agents into production with durable state checkpointing, full execution tracing, and a repeatable end-to-end scenario suite covering happy paths and edge cases (a test sketch follows below).
  • Delivered the underlying Databricks platform as a reusable blueprint that subsequent internal business units onboarded against shared infrastructure modules rather than greenfield environments.
  • Established strictly validated data contracts between agents so that malformed or incomplete messages are caught at the boundary and never propagate through the graph.
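
A repeatable scenario suite over a graph like the sketch above might be parameterised with pytest along these lines; the module path, scenario table, and expected stages are hypothetical:

```python
import pytest

from case_platform.graph import graph  # hypothetical module exposing the compiled graph

SCENARIOS = {
    # scenario name -> (initial case state, expected terminal stage); all invented
    "happy_path": ({"case_id": "C-001", "stage": "new"}, "resolved"),
    "already_validated": ({"case_id": "C-002", "stage": "validated"}, "resolved"),
}


@pytest.mark.parametrize("name", sorted(SCENARIOS))
def test_case_reaches_expected_stage(name):
    initial, expected = SCENARIOS[name]
    config = {"configurable": {"thread_id": f"test-{name}"}}  # isolated checkpoint thread
    graph.invoke(initial, config)       # runs until the human-review pause
    final = graph.invoke(None, config)  # resuming with None stands in for approval
    assert final["stage"] == expected
```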

Responsibilities

  • Designed the supervisor-and-specialist agent topology and the typed contracts exchanged between agents, with versioned schemas validated at every inter-agent boundary (a contract sketch follows this list).
  • Architected the Databricks data platform on a major public cloud: regional metastore provisioning, per-tenant workspaces for non-production and production, declarative streaming ingestion with quality expectations derived from schema definitions, and a quarantine path for records failing validation (see the streaming sketch below).
  • Built the infrastructure-as-code hierarchy using Terraform and a dependency-orchestration layer, organised by region, business domain, environment, and stack, with reusable modules for metastore, workspace, catalog/schema/external location, and declarative permissions.
  • Developed an internal data-engineering framework that wraps pipeline tasks with standardised configuration loading, logging, and schema enforcement, so that engineers author pipelines declaratively rather than by assembling Spark primitives (a wrapper sketch follows below).
  • Implemented tenant isolation across storage (prefix/bucket partitioning), encryption (per-tenant keys), metadata (tagging for governance and cost allocation), and configuration (runtime-resolved rather than compile-time coupled; see the config sketch below).
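
The typed inter-agent contracts from the first bullet could look roughly like this in Pydantic; the message fields, version number, and names are invented for illustration:

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class EnrichedCaseV2(BaseModel):
    """Illustrative v2 contract for messages leaving the enrichment agent."""
    schema_version: Literal[2] = 2  # versioned so consumers can reject unknown shapes
    case_id: str = Field(min_length=1)
    document_uris: list[str] = Field(min_length=1)
    risk_score: float = Field(ge=0.0, le=1.0)


def accept_message(payload: dict) -> EnrichedCaseV2:
    """Validate at the boundary so malformed messages never enter the graph."""
    try:
        return EnrichedCaseV2.model_validate(payload)
    except ValidationError:
        # In the platform, rejects would be routed to a quarantine/dead-letter path.
        raise
```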
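
The declarative ingestion and quarantine path from the second bullet is a common Delta Live Tables expectations pattern; here is a sketch with placeholder table names, rules, and paths:

```python
import dlt  # Delta Live Tables runtime, available inside a Databricks DLT pipeline

# In practice the quality rules would be derived from the schema definitions.
RULES = {
    "case_id_present": "case_id IS NOT NULL",
    "amount_non_negative": "amount >= 0",
}


@dlt.table(comment="Raw cases streamed from the object-store landing zone")
def cases_raw():
    return (
        spark.readStream.format("cloudFiles")  # Auto Loader; `spark` is provided by DLT
        .option("cloudFiles.format", "json")
        .load("/landing/cases/")  # placeholder landing path
    )


@dlt.table(comment="Cases passing every quality expectation")
@dlt.expect_all_or_drop(RULES)
def cases_clean():
    return dlt.read_stream("cases_raw")


@dlt.table(comment="Quarantine sink for records failing any expectation")
def cases_quarantine():
    failed_any = " OR ".join(f"NOT ({rule})" for rule in RULES.values())
    return dlt.read_stream("cases_raw").where(failed_any)
```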
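
The internal framework from the fourth bullet might expose tasks roughly as below; load_config and enforce_schema are hypothetical stand-ins for its real, undisclosed helpers:

```python
import functools
import logging
from types import SimpleNamespace


def load_config(config_key: str) -> SimpleNamespace:
    """Stub standing in for the framework's typed-configuration loader (hypothetical)."""
    return SimpleNamespace(output_schema=None)


def enforce_schema(df, schema) -> None:
    """Stub standing in for the framework's schema-enforcement check (hypothetical)."""


def pipeline_task(config_key: str):
    """Wrap a task with standardised config loading, logging, and schema enforcement."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            cfg = load_config(config_key)
            log = logging.getLogger(fn.__qualname__)
            log.info("task started: %s", config_key)
            df = fn(cfg, *args, **kwargs)
            enforce_schema(df, cfg.output_schema)
            log.info("task finished: %s", config_key)
            return df
        return wrapper
    return decorator


@pipeline_task("silver.cases_enriched")  # config key is illustrative
def enrich_cases(cfg):
    ...  # the task body would build and return a DataFrame declaratively
```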
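
Finally, the runtime-resolved tenant configuration from the last bullet could be modelled as a small typed settings object; the fields and naming scheme are illustrative:

```python
from pydantic import BaseModel


class TenantConfig(BaseModel):
    """Per-tenant settings resolved at runtime, never baked into the deployed artifact."""
    tenant_id: str
    storage_prefix: str        # per-tenant object-store prefix
    kms_key_alias: str         # per-tenant encryption key
    cost_tags: dict[str, str]  # tagging for governance and cost allocation


def resolve_tenant(tenant_id: str, env: str) -> TenantConfig:
    # A real deployment would look these up in a config store; the values are invented.
    return TenantConfig(
        tenant_id=tenant_id,
        storage_prefix=f"s3://lakehouse-{env}/tenants/{tenant_id}/",
        kms_key_alias=f"alias/{env}-{tenant_id}",
        cost_tags={"tenant": tenant_id, "env": env},
    )
```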

Technologies Used

LangGraph · LangChain · Python · Pydantic · PySpark · Databricks · declarative streaming pipelines · Terraform · AWS · PostgreSQL · MLflow · Model Context Protocol · GitLab CI · Datadog
This project was delivered by Dany D.

More Projects by Dany D.

2023 - 2024

AI-Driven Retail Execution Platform

Lead Data & ML Engineer

The project involved delivering an enterprise data and AI platform for a multinational consumer-goods company to orchestrate daily sales-execution planning for its field teams across several major retail channels and international markets. The platform combines a medallion-architecture lakehouse on Databricks with a portfolio of production ML models that translate raw retailer feeds, inventory signals, compliance data, and third-party audits into a ranked set of outlet-level tasks delivered to reps each morning. The system operates as a multi-tenant codebase where each retailer channel is onboarded as a configurable tenant rather than a fork.

Databricks · PySpark · Delta Lake · Python · Scikit-learn +10
2022 - 2023

Cloud Lakehouse with Change-Data-Capture Ingestion

Senior Data Engineer & Architect

The project involved designing and delivering a cloud-native data platform for a financial-services institution moving off a fragmented legacy ETL stack. The platform is built around a medallion lakehouse on Databricks, declarative streaming transformations for the silver layer, and log-based change-data-capture from operational relational sources via a managed Kafka service. A config-driven pipeline layer decouples table onboarding from code changes, and a data-quality engine splits each stream into a clean sink and a quarantine sink for audit and remediation.

Databricks · PySpark · Delta Lake · declarative streaming pipelines · Auto Loader +7
