Azure Data Factory

Microsoft's cloud-based data integration service for building, scheduling, and orchestrating data workflows and ETL pipelines at scale.

Starting at ~$1 per 1,000 activity runs (Azure IR); ~$1.50 per 1,000 (Self-Hosted IR)

Overview

Azure Data Factory (ADF) is Microsoft's fully managed, serverless cloud data integration service designed to ingest, prepare, transform, and orchestrate data across hybrid and multi-cloud environments. Positioned as the data movement and transformation backbone of the Azure analytics stack, ADF enables organizations to build complex extract-transform-load (ETL) and extract-load-transform (ELT) workflows without provisioning or managing underlying infrastructure. It serves as the connective tissue between operational data sources — on-premises SQL Server, SAP, Oracle, mainframes, REST APIs, SaaS platforms — and modern cloud destinations such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Microsoft Fabric, Snowflake, and Azure SQL Database.

At its core, ADF is built around four conceptual primitives: linked services (connection definitions to external systems), datasets (typed views over data structures), activities (units of work such as copy, lookup, or data flow execution), and pipelines (logical groupings of activities that execute together). Pipelines can be authored visually through the browser-based Data Factory Studio, programmatically via REST APIs, ARM templates, Bicep, Terraform, or the Python and .NET SDKs. The visual authoring experience is a major differentiator, allowing data engineers and analysts to drag-and-drop sources, transformations, and sinks without writing Spark or SQL code, while still permitting custom code activities for advanced logic.
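
To make the relationship between the four primitives concrete, here is a minimal sketch in plain Python. It is an illustrative model only, not the ADF SDK: the class names, field names, and resource names (`OnPremSql`, `OrdersTable`, and so on) are assumptions made for this example, and the dictionary it emits only loosely mirrors the shape of a pipeline definition.

```python
from dataclasses import dataclass, field
import json


@dataclass
class LinkedService:
    """Connection definition to an external system (hypothetical model)."""
    name: str
    kind: str


@dataclass
class Dataset:
    """Typed view over data reachable through a linked service."""
    name: str
    linked_service: LinkedService


@dataclass
class Activity:
    """A unit of work; here, a copy from one dataset to another."""
    name: str
    kind: str
    source: Dataset
    sink: Dataset


@dataclass
class Pipeline:
    """Logical grouping of activities that execute together."""
    name: str
    activities: list = field(default_factory=list)

    def to_dict(self) -> dict:
        # Activities refer to datasets by name, and datasets refer to linked services.
        return {
            "name": self.name,
            "properties": {
                "activities": [
                    {
                        "name": a.name,
                        "type": a.kind,
                        "inputs": [{"referenceName": a.source.name}],
                        "outputs": [{"referenceName": a.sink.name}],
                    }
                    for a in self.activities
                ]
            },
        }


# All names below are made up for illustration.
sql = LinkedService("OnPremSql", "SqlServer")
lake = LinkedService("DataLake", "AzureBlobFS")
orders_table = Dataset("OrdersTable", sql)
orders_parquet = Dataset("OrdersParquet", lake)
pipeline = Pipeline(
    "IngestOrders",
    [Activity("CopyOrders", "Copy", orders_table, orders_parquet)],
)

print(json.dumps(pipeline.to_dict(), indent=2))
```

The point of the layering is reuse: one linked service can back many datasets, and one dataset can feed many activities.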

ADF supports more than 100 native connectors covering Azure services, AWS, Google Cloud Platform, enterprise SaaS applications (Salesforce, ServiceNow, Workday, Dynamics 365), file systems, databases, and generic protocols (ODBC, REST, OData, FTP/SFTP). For on-premises and private network sources, the Self-Hosted Integration Runtime acts as a secure data movement gateway, while the Azure Integration Runtime handles cloud-to-cloud transfers. A third option, the SSIS Integration Runtime, allows organizations to lift-and-shift existing SQL Server Integration Services packages into the cloud with minimal refactoring — a critical migration path for enterprises modernizing legacy ETL estates.

For transformations, ADF offers Mapping Data Flows, a code-free transformation environment that compiles visual logic into Apache Spark jobs executed on managed compute clusters. This decouples engineers from cluster management while still providing the scalability of distributed processing. Power Query activities bring self-service M-language transformations into pipelines, and external compute services (Azure Databricks, HDInsight, Azure Functions, Azure Batch) can be invoked for custom workloads. Triggers support scheduled, tumbling-window, event-based (storage events), and manual pipeline execution, enabling both batch and near-real-time orchestration patterns.

ADF integrates tightly with Azure DevOps and GitHub for source control, supporting branch-based development, pull request workflows, and CI/CD deployment via ARM templates. Monitoring capabilities include a built-in pipeline run dashboard, integration with Azure Monitor and Log Analytics, and alerting through Azure Action Groups. Security features include managed identity authentication, customer-managed key encryption, private endpoints, and integration with Microsoft Purview for data lineage and governance. As of 2024-2025, Microsoft has been positioning Microsoft Fabric Data Factory as the next-generation evolution of the service within the unified Fabric platform, though standalone ADF remains fully supported and continues to receive feature updates for customers not yet adopting Fabric.

🎨

Vibe Coding Friendly?

Difficulty: intermediate

Suitability for vibe coding depends on your experience level and the specific use case.

Key Features

100+ Built-in Connectors

ADF provides pre-built connectors to over 100 data sources and sinks including Azure services, AWS S3, Google BigQuery, Salesforce, SAP, Oracle, MongoDB, REST APIs, and file formats like Parquet, Avro, and JSON. Each connector handles authentication, pagination, schema detection, and data type mapping automatically. Connectors are fully managed by Microsoft, receiving regular updates for API changes and new features without requiring user intervention. Linked Services store connection configurations securely using Azure Key Vault integration, and parameterized datasets enable reusable connector definitions across multiple pipelines.

Mapping Data Flows

Mapping Data Flows provide a visual, code-free interface for designing complex data transformations that execute on auto-scaled Apache Spark clusters managed by ADF. Users can perform joins, aggregations, pivots, window functions, derived columns, and conditional splits through a drag-and-drop canvas with real-time data preview. The underlying Spark code is generated automatically, eliminating the need for Spark expertise. Debug mode allows interactive testing with sample data, and the data flow graph provides execution metrics including row counts, timing, and partition distribution. Transformations support schema drift handling for semi-structured data and can process datasets ranging from megabytes to terabytes.

Integration Runtime Options

ADF offers three Integration Runtime types: Azure IR for cloud-to-cloud data movement with auto-resolve region selection, Self-hosted IR for secure access to on-premises and private network data sources without opening inbound firewall ports (the agent makes outbound connections only), and Azure-SSIS IR for running existing SSIS packages in a managed cloud environment. The Azure IR supports Managed Virtual Network with private endpoints for network-isolated data movement. Self-hosted IR supports high-availability clusters with multiple nodes and can be shared across data factories. Each runtime type is optimized for its connectivity scenario, and multiple runtimes can coexist within a single data factory to handle diverse network topologies.
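
As a rough decision aid, the choice between the three runtimes can be sketched as a small function. This is a simplification under stated assumptions: the function name and its two boolean inputs are hypothetical, and real deployments may also reach private Azure resources from an Azure IR through a Managed Virtual Network.

```python
from enum import Enum


class Runtime(Enum):
    AZURE = "Azure IR"
    SELF_HOSTED = "Self-hosted IR"
    AZURE_SSIS = "Azure-SSIS IR"


def choose_runtime(runs_ssis_package: bool, source_is_private: bool) -> Runtime:
    """Pick a runtime type from two coarse facts about the workload (hypothetical helper)."""
    if runs_ssis_package:
        # Existing SSIS packages need the dedicated SSIS runtime.
        return Runtime.AZURE_SSIS
    if source_is_private:
        # On-premises or private-network sources go through the self-hosted agent.
        return Runtime.SELF_HOSTED
    # Everything else is cloud-to-cloud movement.
    return Runtime.AZURE


print(choose_runtime(runs_ssis_package=False, source_is_private=True).value)
```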

Event-Based and Custom Triggers

Beyond basic schedule triggers, ADF supports tumbling window triggers for backfill scenarios with dependency chaining, storage event triggers that fire when blobs are created or deleted, and custom event triggers that respond to Azure Event Grid topics. This enables event-driven architectures where pipelines execute automatically in response to data arrival, system events, or business process signals. Tumbling window triggers maintain their own execution state, supporting retry of failed windows and catch-up execution for gaps. Triggers can be parameterized to pass runtime context (file names, timestamps, event metadata) into pipeline parameters, enabling dynamic pipeline behavior based on the triggering event.
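
The tumbling-window idea (fixed-size, contiguous, non-overlapping time slices, each tracked independently) can be illustrated without any Azure dependency. The sketch below is not how ADF implements it; the function names and the `succeeded` set are assumptions used to show how retrying failed windows and catching up on gaps fall out of per-window state.

```python
from datetime import datetime, timedelta


def tumbling_windows(start: datetime, end: datetime, size: timedelta):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs."""
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    current = start
    while current + size <= end:
        yield current, current + size
        current += size


def windows_to_run(start, end, size, succeeded):
    """Return every window whose start is not recorded as succeeded.

    Failed windows and windows that never ran are treated the same way:
    both are simply missing from the `succeeded` set.
    """
    return [w for w in tumbling_windows(start, end, size) if w[0] not in succeeded]


start = datetime(2024, 1, 1, 0, 0)
end = datetime(2024, 1, 1, 4, 0)
hour = timedelta(hours=1)

# Pretend the 00:00 and 02:00 windows already completed successfully.
succeeded = {datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 2, 0)}

for window_start, window_end in windows_to_run(start, end, hour, succeeded):
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each pending window carries its own start and end timestamps, which is the kind of runtime context a trigger would pass into pipeline parameters.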

CI/CD and Source Control Integration

ADF natively integrates with Azure DevOps Git and GitHub for version-controlled pipeline development, enabling branching, pull requests, and code review workflows for data pipelines. Teams can promote pipelines across dev, staging, and production environments using automated ARM template deployment via Azure DevOps release pipelines or GitHub Actions. The publish branch stores generated ARM templates, and parameterized linked services allow environment-specific configurations (connection strings, credentials, resource references) to be injected at deployment time. This enables enterprise-grade DevOps practices for data integration, including automated testing of pipeline configurations and rollback capabilities through version history.
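
Injecting environment-specific values at deployment time usually comes down to producing one ARM parameters file per environment. The sketch below shows one way to do that in Python. The parameter names (`factoryName`, `keyVaultBaseUrl`) and all values are hypothetical; only the outer `$schema` / `contentVersion` / `parameters` layout follows the standard ARM deployment-parameters format.

```python
import json

# Hypothetical defaults, as they might appear for a dev factory.
BASE_PARAMETERS = {
    "factoryName": "adf-example-dev",
    "keyVaultBaseUrl": "https://kv-example-dev.vault.azure.net/",
}

# Hypothetical per-environment overrides applied on top of the defaults.
ENVIRONMENT_OVERRIDES = {
    "dev": {},
    "staging": {
        "factoryName": "adf-example-stg",
        "keyVaultBaseUrl": "https://kv-example-stg.vault.azure.net/",
    },
    "prod": {
        "factoryName": "adf-example-prod",
        "keyVaultBaseUrl": "https://kv-example-prod.vault.azure.net/",
    },
}


def build_parameters_file(environment: str) -> dict:
    """Merge base values with one environment's overrides into ARM parameter-file form."""
    if environment not in ENVIRONMENT_OVERRIDES:
        raise KeyError(f"unknown environment: {environment}")
    values = {**BASE_PARAMETERS, **ENVIRONMENT_OVERRIDES[environment]}
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


print(json.dumps(build_parameters_file("prod"), indent=2))
```

Keeping secrets out of these files, for example by passing only a Key Vault URL as above, is what lets the same template be promoted unchanged across environments.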

Pricing Plans

Pipeline orchestration

~$1 per 1,000 activity runs (Azure IR); ~$1.50 per 1,000 (Self-Hosted IR)

Data movement (Copy activity)

~$0.25 per DIU-hour (Azure IR); ~$0.10 per hour (Self-Hosted IR)

Data flow execution (Mapping Data Flows)

~$0.193 per vCore-hour (General Purpose); ~$0.343 per vCore-hour (Memory Optimized)

Azure-SSIS Integration Runtime

From ~$0.33/hr (Standard D1 v2) up to several dollars/hr for larger nodes

Inactive pipelines

~$0.80 per inactive pipeline per month
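
Because each meter is billed separately, a back-of-the-envelope estimate is easiest as a small calculation. The sketch below uses the approximate Azure IR rates quoted above and ignores the Self-Hosted and SSIS meters. The function name, the rate-table keys, and the example workload are assumptions, and actual rates vary by region and change over time.

```python
# Approximate rates from this page, in USD; treat them as placeholders.
RATES_USD = {
    "per_1000_activity_runs": 1.00,
    "per_diu_hour": 0.25,
    "per_vcore_hour_general_purpose": 0.193,
    "per_inactive_pipeline_month": 0.80,
}


def estimate_monthly_cost(activity_runs, diu_hours, vcore_hours, inactive_pipelines):
    """Return a per-meter cost breakdown plus a total, rounded to cents."""
    breakdown = {
        "orchestration": activity_runs / 1000 * RATES_USD["per_1000_activity_runs"],
        "data_movement": diu_hours * RATES_USD["per_diu_hour"],
        "data_flows": vcore_hours * RATES_USD["per_vcore_hour_general_purpose"],
        "inactive_pipelines": inactive_pipelines * RATES_USD["per_inactive_pipeline_month"],
    }
    breakdown = {meter: round(cost, 2) for meter, cost in breakdown.items()}
    breakdown["total"] = round(sum(breakdown.values()), 2)
    return breakdown


# Hypothetical workload: 30,000 activity runs, 200 DIU-hours of copying,
# an 8-vCore data flow cluster running 50 hours, and 3 inactive pipelines.
estimate = estimate_monthly_cost(
    activity_runs=30_000,
    diu_hours=200,
    vcore_hours=8 * 50,
    inactive_pipelines=3,
)
for meter, cost in estimate.items():
    print(f"{meter}: ${cost:.2f}")
```

In this example the data flow meter is the largest line item, even though the cluster runs for only 50 hours.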

Best Use Cases

🎯

Consolidating data from 100+ heterogeneous sources (on-premises SQL Server, Salesforce, SAP, S3, REST APIs) into Azure Synapse or Azure Data Lake for enterprise analytics and reporting

⚡

Migrating legacy SQL Server Integration Services (SSIS) ETL packages to the cloud using Azure-SSIS Integration Runtime without rewriting transformation logic

🔧

Building nightly or hourly batch ETL pipelines that extract data from operational databases, apply Mapping Data Flow transformations, and load into a data warehouse for BI dashboards in Power BI

🚀

Orchestrating multi-step data processing workflows that span Azure Databricks notebooks for ML feature engineering, Azure Functions for custom logic, and Synapse SQL for final aggregation

💡

Implementing event-driven data pipelines that automatically trigger when new files arrive in Azure Blob Storage or Azure Data Lake, processing and routing data to downstream systems in near-real-time

🔄

Running metadata-driven ingestion frameworks where a single parameterized pipeline dynamically processes hundreds of tables based on configuration stored in a control database, reducing pipeline maintenance overhead
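
The metadata-driven pattern in the last use case can be sketched as follows: rows from a control table are turned into one parameter set per table, and a single parameterized pipeline is run once per set. The control-table columns, parameter names, and sink path layout here are all hypothetical.

```python
# Stand-in for rows read from a control database (hypothetical schema).
CONTROL_ROWS = [
    {"schema": "sales", "table": "orders", "enabled": True},
    {"schema": "sales", "table": "customers", "enabled": True},
    {"schema": "hr", "table": "payroll", "enabled": False},
]


def build_run_parameters(rows):
    """Produce one pipeline parameter set per enabled control-table row."""
    runs = []
    for row in rows:
        if not row["enabled"]:
            # Disabled tables are skipped without touching the pipeline itself.
            continue
        runs.append(
            {
                "sourceSchema": row["schema"],
                "sourceTable": row["table"],
                "sinkPath": f"raw/{row['schema']}/{row['table']}/",
            }
        )
    return runs


for params in build_run_parameters(CONTROL_ROWS):
    print(params)
```

Adding a table then means inserting a control row rather than authoring a new pipeline.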

Limitations & What It Can't Do

We believe in transparent reviews. Here's what Azure Data Factory doesn't handle well:

⚠ Azure Data Factory is fundamentally a batch and micro-batch orchestration platform. It is not designed for real-time stream processing, sub-second event handling, or continuous low-latency data feeds. Mapping Data Flow clusters have a cold-start latency of several minutes, making iterative development slow and ruling out workloads that need quick turnaround. The pricing model spans multiple meters (orchestration, DIU-hours, vCore-hours, integration runtime hours), which makes cost prediction difficult and can produce surprising bills under heavy data flow usage. Debugging is split across several monitoring views, and Spark error messages from Mapping Data Flows are often cryptic. Cross-region data movement incurs egress charges that are not always obvious upfront. The SSIS Integration Runtime, while useful for migration, runs on always-on managed VMs and is comparatively expensive for low-utilization workloads. ADF's expression language is more constrained than a general-purpose programming language, so genuinely complex conditional or looping logic often requires offloading to Azure Functions or Databricks. Finally, Microsoft's parallel investment in Fabric Data Factory creates strategic ambiguity for new adopters deciding between the two products.

Pros & Cons

✓ Pros

✓ Over 100 pre-built connectors covering Azure, AWS, GCP, SaaS applications, on-premises databases, and legacy mainframes, which eliminates most custom integration code
✓ Visual, code-free authoring through Data Factory Studio with Mapping Data Flows that compile to managed Spark jobs, making it accessible to non-developers while still scaling to large datasets
✓ SSIS Integration Runtime provides a lift-and-shift path for existing SQL Server Integration Services packages, a unique advantage for enterprises modernizing legacy Microsoft ETL estates
✓ Fully serverless with consumption-based pricing: no clusters to provision, patch, or scale, and the platform handles autoscaling of execution infrastructure
✓ Deep integration with the broader Azure ecosystem including Synapse Analytics, Data Lake Storage, Key Vault, Purview, Monitor, and managed identities for end-to-end governance and security
✓ Native CI/CD support via Azure DevOps and GitHub with ARM template publishing, enabling proper source control, code review, and multi-environment deployment workflows

✗ Cons

✗ Pricing model is notoriously complex: pipeline orchestration, data movement (DIU-hours), data flow execution (vCore-hours), and integration runtime hours are all metered separately, making cost forecasting difficult
✗ Mapping Data Flows have noticeable cluster startup latency (often 4-6 minutes per debug or job run) that makes iterative development slow and unsuitable for low-latency micro-batch workloads
✗ Streaming and true real-time processing are weak. ADF is fundamentally a batch and micro-batch tool; for sub-second event processing you need Azure Stream Analytics, Event Hubs, or Databricks Structured Streaming
✗ Strategic ambiguity between standalone ADF and Microsoft Fabric Data Factory creates uncertainty about long-term investment, with some new features landing in Fabric first
✗ Debugging complex pipelines and Mapping Data Flows can be painful: error messages from underlying Spark jobs are often opaque and require drilling into multiple monitoring panes to diagnose

Frequently Asked Questions

What is the difference between Azure Data Factory and Microsoft Fabric Data Factory?

Azure Data Factory is the standalone, mature PaaS service available as an independent Azure resource, billed on a granular pay-per-use model. Microsoft Fabric Data Factory is a re-imagined version embedded inside the Microsoft Fabric SaaS platform, sharing capacity-based pricing with the rest of Fabric (Power BI, Synapse, OneLake) and introducing new experiences like Dataflow Gen2 and Fabric pipelines. They share many concepts and connectors but are separate products with different pricing, governance, and integration models. Microsoft continues to invest in both, but new strategic features increasingly debut in Fabric first.

How does Azure Data Factory handle on-premises data sources?

ADF connects to on-premises and private-network data sources through the Self-Hosted Integration Runtime (SHIR), a lightweight agent installed on a Windows machine inside your network. The SHIR establishes outbound-only encrypted connections to the Azure Data Factory service, eliminating the need for inbound firewall rules or VPN tunnels. It supports clustering for high availability and load balancing across multiple nodes, and handles credential management locally so secrets never leave the network.

Can Azure Data Factory replace SQL Server Integration Services (SSIS)?

Yes, in two ways. First, ADF can natively rebuild SSIS workflows using its own pipeline and Mapping Data Flow capabilities, which is the recommended modernization path. Second, the SSIS Integration Runtime allows you to lift-and-shift existing SSIS packages into ADF with minimal changes, running them on managed Azure SSIS instances. This is unique to Azure and gives Microsoft-shop customers a gradual migration option rather than forcing a full rewrite.

How does Azure Data Factory pricing actually work?

ADF uses several separate consumption meters: pipeline orchestration (per activity run), data movement (per Data Integration Unit-hour for the Copy activity), data flow execution (per vCore-hour of the Spark cluster running Mapping Data Flows), SSIS Integration Runtime (per hour of provisioned compute), and inactive pipeline charges. Costs vary significantly based on workload patterns — a heavy data flow job can be far more expensive than a simple copy of the same data volume. Microsoft's pricing calculator and the cost analysis blade in Azure Cost Management are essential tools for forecasting.

Does Azure Data Factory support real-time or streaming data?

Not in the true streaming sense. ADF supports event-based triggers that fire pipelines in response to blob storage or custom events, and it can process micro-batches on tight schedules (intervals of a few minutes; check the current service limits for the minimum trigger interval), but it is not a stream processing engine. For sub-second latency, complex event processing, or continuous ingestion of high-velocity event streams, Microsoft recommends pairing ADF with Azure Event Hubs, Azure Stream Analytics, or Databricks Structured Streaming.

What's New in 2026

Through 2025 and into 2026, Microsoft has continued to converge Azure Data Factory capabilities with Microsoft Fabric Data Factory, with new pipeline activities and Dataflow Gen2 features generally landing in Fabric first before being backported to standalone ADF. Notable evolutions include expanded Microsoft Fabric and OneLake connectors as first-class targets, deeper integration with Microsoft Purview for automated lineage capture across pipeline runs, and broader support for Delta Lake and Iceberg table formats in sinks. The connector library has expanded with additional SaaS and modern data platform integrations, and Mapping Data Flows have received performance improvements around cluster startup and incremental refresh patterns. Copilot in Data Factory (within Fabric) introduces natural-language pipeline authoring and transformation suggestions, signaling Microsoft's direction for AI-assisted data engineering. Customers running standalone ADF should monitor Microsoft's Fabric roadmap closely, as the strategic center of gravity for new investment is clearly shifting toward Fabric while the standalone product remains in mainstream support.
