Microsoft's cloud-based data integration service for building, scheduling, and orchestrating data workflows and ETL pipelines at scale.
Azure Data Factory (ADF) is Microsoft's fully managed, serverless cloud data integration service designed to ingest, prepare, transform, and orchestrate data across hybrid and multi-cloud environments. Positioned as the data movement and transformation backbone of the Azure analytics stack, ADF enables organizations to build complex extract-transform-load (ETL) and extract-load-transform (ELT) workflows without provisioning or managing underlying infrastructure. It serves as the connective tissue between operational data sources — on-premises SQL Server, SAP, Oracle, mainframes, REST APIs, SaaS platforms — and modern cloud destinations such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Microsoft Fabric, Snowflake, and Azure SQL Database.
At its core, ADF is built around four conceptual primitives: linked services (connection definitions to external systems), datasets (typed views over data structures), activities (units of work such as copy, lookup, or data flow execution), and pipelines (logical groupings of activities that execute together). Pipelines can be authored visually through the browser-based Azure Data Factory Studio, or programmatically via REST APIs, ARM templates, Bicep, Terraform, or the Python and .NET SDKs. The visual authoring experience is a major differentiator: data engineers and analysts can drag and drop sources, transformations, and sinks without writing Spark or SQL code, while custom code activities remain available for advanced logic.
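To make the relationship between the four primitives concrete, here is a minimal Python sketch that assembles simplified JSON definitions for a linked service, two datasets, a Copy activity, and a pipeline. All names (BlobStorageLS, RawOrders, CuratedOrders, OrdersPipeline) and the make_dataset helper are hypothetical, and the JSON is trimmed to the fields needed to show how the pieces reference each other; a real factory definition carries more properties.

```python
import json

# Linked service: how to connect to an external system.
# The connection string is a placeholder, not a real value.
linked_service = {
    "name": "BlobStorageLS",  # hypothetical name
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<placeholder>"},
    },
}


def make_dataset(name, linked_service_name, container, file_name):
    """Dataset: a named view over data reachable through a linked service."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": file_name,
                }
            },
        },
    }


source_ds = make_dataset("RawOrders", linked_service["name"], "raw", "orders.csv")
sink_ds = make_dataset("CuratedOrders", linked_service["name"], "curated", "orders.csv")

# Activity: one unit of work; a Copy activity reads one dataset and writes another.
copy_activity = {
    "name": "CopyOrders",
    "type": "Copy",
    "inputs": [{"referenceName": source_ds["name"], "type": "DatasetReference"}],
    "outputs": [{"referenceName": sink_ds["name"], "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "DelimitedTextSink"},
    },
}

# Pipeline: a logical grouping of activities that execute together.
pipeline = {"name": "OrdersPipeline", "properties": {"activities": [copy_activity]}}

if __name__ == "__main__":
    print(json.dumps(pipeline, indent=2))
```

The references run in one direction: the pipeline contains activities, activities point at datasets by name, and datasets point at a linked service by name. That is why a single linked service can back many datasets and pipelines.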
ADF supports more than 100 native connectors covering Azure services, AWS, Google Cloud Platform, enterprise SaaS applications (Salesforce, ServiceNow, Workday, Dynamics 365), file systems, databases, and generic protocols (ODBC, REST, OData, FTP/SFTP). For on-premises and private network sources, the Self-Hosted Integration Runtime acts as a secure data movement gateway, while the Azure Integration Runtime handles cloud-to-cloud transfers. A third option, the SSIS Integration Runtime, allows organizations to lift-and-shift existing SQL Server Integration Services packages into the cloud with minimal refactoring — a critical migration path for enterprises modernizing legacy ETL estates.
For transformations, ADF offers Mapping Data Flows, a code-free transformation environment that compiles visual logic into Apache Spark jobs executed on managed compute clusters. This decouples engineers from cluster management while still providing the scalability of distributed processing. Power Query activities bring self-service M-language transformations into pipelines, and external compute services (Azure Databricks, HDInsight, Azure Functions, Azure Batch) can be invoked for custom workloads. Triggers support scheduled, tumbling-window, event-based (storage events), and manual pipeline execution, enabling both batch and near-real-time orchestration patterns.
ADF integrates tightly with Azure DevOps and GitHub for source control, supporting branch-based development, pull request workflows, and CI/CD deployment via ARM templates. Monitoring capabilities include a built-in pipeline run dashboard, integration with Azure Monitor and Log Analytics, and alerting through Azure Action Groups. Security features include managed identity authentication, customer-managed key encryption, private endpoints, and integration with Microsoft Purview for data lineage and governance. As of 2024-2025, Microsoft has been positioning Microsoft Fabric Data Factory as the next-generation evolution of the service within the unified Fabric platform, though standalone ADF remains fully supported and continues to receive feature updates for customers not yet adopting Fabric.
ADF provides pre-built connectors to over 100 data sources and sinks including Azure services, AWS S3, Google BigQuery, Salesforce, SAP, Oracle, MongoDB, REST APIs, and file formats like Parquet, Avro, and JSON. Each connector handles authentication, pagination, schema detection, and data type mapping automatically. Connectors are fully managed by Microsoft, receiving regular updates for API changes and new features without requiring user intervention. Linked Services store connection configurations securely using Azure Key Vault integration, and parameterized datasets enable reusable connector definitions across multiple pipelines.
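As an illustration of Key Vault-backed linked services and parameterized datasets, the following Python sketch builds simplified JSON definitions for both. The names (KeyVaultLS, SqlDbLS, BlobStorageLS, GenericCsv, the secret name) and the two helper functions are hypothetical, and the JSON shapes are abbreviated; treat it as a sketch of the idea rather than a complete definition.

```python
def key_vault_secret(vault_linked_service, secret_name):
    """Reference a secret held in Key Vault instead of embedding the value."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": vault_linked_service,
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }


# Linked service whose connection string is resolved from Key Vault at runtime.
sql_linked_service = {
    "name": "SqlDbLS",  # hypothetical name
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": key_vault_secret("KeyVaultLS", "sql-connection-string")
        },
    },
}

# One dataset definition reused for many files: container and file name
# are parameters, filled in through expressions when a pipeline runs.
parameterized_dataset = {
    "name": "GenericCsv",  # hypothetical name
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "BlobStorageLS",
            "type": "LinkedServiceReference",
        },
        "parameters": {
            "container": {"type": "String"},
            "fileName": {"type": "String"},
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": {"value": "@dataset().container", "type": "Expression"},
                "fileName": {"value": "@dataset().fileName", "type": "Expression"},
            }
        },
    },
}


def dataset_reference(dataset_name, **parameters):
    """Build the reference an activity uses, supplying dataset parameter values."""
    return {
        "referenceName": dataset_name,
        "type": "DatasetReference",
        "parameters": parameters,
    }


orders_ref = dataset_reference("GenericCsv", container="raw", fileName="orders.csv")
customers_ref = dataset_reference("GenericCsv", container="raw", fileName="customers.csv")
```

Both references point at the same dataset definition and differ only in the parameter values they pass, which is what makes one connector definition reusable across pipelines.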
Mapping Data Flows provide a visual, code-free interface for designing complex data transformations that execute on auto-scaled Apache Spark clusters managed by ADF. Users can perform joins, aggregations, pivots, window functions, derived columns, and conditional splits through a drag-and-drop canvas with real-time data preview. The underlying Spark code is generated automatically, eliminating the need for Spark expertise. Debug mode allows interactive testing with sample data, and the data flow graph provides execution metrics including row counts, timing, and partition distribution. Transformations support schema drift handling for semi-structured data and can process datasets ranging from megabytes to terabytes.
ADF offers three Integration Runtime types: Azure IR for cloud-to-cloud data movement with auto-resolve region selection, Self-hosted IR for secure access to on-premises and private network data sources without opening firewall ports, and Azure-SSIS IR for running existing SSIS packages in a managed cloud environment. The Azure IR supports Managed Virtual Network with private endpoints for network-isolated data movement. Self-hosted IR supports high-availability clusters with multiple nodes and can be shared across data factories. Each runtime type is optimized for its connectivity scenario, and multiple runtimes can coexist within a single data factory to handle diverse network topologies.
Beyond basic schedule triggers, ADF supports tumbling window triggers for backfill scenarios with dependency chaining, storage event triggers that fire when blobs are created or deleted, and custom event triggers that respond to Azure Event Grid topics. This enables event-driven architectures where pipelines execute automatically in response to data arrival, system events, or business process signals. Tumbling window triggers maintain their own execution state, supporting retry of failed windows and catch-up execution for gaps. Triggers can be parameterized to pass runtime context (file names, timestamps, event metadata) into pipeline parameters, enabling dynamic pipeline behavior based on the triggering event.
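The tumbling-window behavior described above (fixed, contiguous windows, retry of failed windows, and catch-up for gaps) can be sketched in plain Python. This models the concept only and is not ADF's implementation; the function names, the state dictionary, and the windowStart/windowEnd parameter names are assumptions made for illustration. In ADF itself, the window boundaries are exposed to the pipeline through trigger output expressions such as @trigger().outputs.windowStartTime.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, interval):
    """Return contiguous, non-overlapping (window_start, window_end) pairs.

    Only complete windows that finish at or before `end` are included.
    """
    windows = []
    current = start
    while current + interval <= end:
        windows.append((current, current + interval))
        current += interval
    return windows


def windows_to_run(windows, state):
    """Select windows that have not succeeded yet.

    `state` maps a window start time to its last known status. Failed
    windows are retried; windows with no recorded run are catch-up work.
    """
    return [w for w in windows if state.get(w[0]) != "Succeeded"]


def pipeline_parameters(window):
    """Turn a window into the runtime context handed to a pipeline run."""
    window_start, window_end = window
    return {
        "windowStart": window_start.isoformat(),
        "windowEnd": window_end.isoformat(),
    }


# Six hourly windows, of which two succeeded, one failed, three never ran.
day = datetime(2024, 1, 1)
windows = tumbling_windows(day, day + timedelta(hours=6), timedelta(hours=1))
state = {
    day: "Succeeded",
    day + timedelta(hours=1): "Failed",
    day + timedelta(hours=2): "Succeeded",
}
pending = windows_to_run(windows, state)
```

Because each window is identified by its start time and tracked independently, rerunning the 01:00 window does not disturb the windows that already succeeded, and each run receives its own time slice as parameters.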
ADF natively integrates with Azure DevOps Git and GitHub for version-controlled pipeline development, enabling branching, pull requests, and code review workflows for data pipelines. Teams can promote pipelines across dev, staging, and production environments using automated ARM template deployment via Azure DevOps release pipelines or GitHub Actions. The publish branch stores generated ARM templates, and parameterized linked services allow environment-specific configurations (connection strings, credentials, resource references) to be injected at deployment time. This enables enterprise-grade DevOps practices for data integration, including automated testing of pipeline configurations and rollback capabilities through version history.
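Injecting environment-specific values at deployment time usually comes down to producing one ARM parameter file per environment from a shared base. The Python sketch below shows that idea under stated assumptions: the parameter names (factoryName, SqlDbLS_connectionString), the factory names, and the build_parameter_file helper are hypothetical, and a real release pipeline would typically pass the resulting file to its ARM deployment step.

```python
import copy

# Base parameter file in the standard ARM deployment-parameters shape.
# Parameter names and values are hypothetical placeholders.
base_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {"value": "adf-orders-dev"},
        "SqlDbLS_connectionString": {"value": "<dev-placeholder>"},
    },
}


def build_parameter_file(base, overrides):
    """Return a copy of `base` with environment-specific values applied.

    Unknown parameter names are rejected so that a typo in an override
    fails the build instead of being silently ignored at deployment.
    """
    unknown = set(overrides) - set(base["parameters"])
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    result = copy.deepcopy(base)
    for name, value in overrides.items():
        result["parameters"][name] = {"value": value}
    return result


prod_parameters = build_parameter_file(
    base_parameters,
    {
        "factoryName": "adf-orders-prod",
        "SqlDbLS_connectionString": "<prod-placeholder>",
    },
)
```

Keeping the template itself identical across environments and varying only the parameter file is what makes promotions repeatable: the same artifact that passed review in dev is the one deployed to production.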
Orchestration: ~$1 per 1,000 activity runs (Azure IR); ~$1.50 per 1,000 (Self-Hosted IR)
Data movement: ~$0.25 per DIU-hour (Azure IR); ~$0.10 per hour (Self-Hosted IR)
Data flow execution: ~$0.193 per vCore-hour (General Purpose); ~$0.343 (Memory Optimized)
Azure-SSIS IR: from ~$0.33/hr (Standard D1 v2) up to several dollars/hr for larger nodes
Inactive pipelines: ~$0.80 per inactive pipeline per month
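To see how these consumption-based rates combine, here is a rough Python estimate for a workload running on the Azure IR. It assumes the per-1,000-activity-run rate applies to orchestration, the DIU-hour rate to copy data movement, and the General Purpose vCore-hour rate to Mapping Data Flow execution; the workload figures and the estimate_monthly_cost function are hypothetical, and the approximate rates are taken from the list above rather than from a current price sheet.

```python
# Approximate Azure IR rates in USD, copied from the figures listed above.
RATES = {
    "per_1000_activity_runs": 1.00,
    "per_diu_hour": 0.25,
    "per_data_flow_vcore_hour": 0.193,  # General Purpose
}


def estimate_monthly_cost(activity_runs, diu_hours, data_flow_vcores, data_flow_hours):
    """Estimate a monthly bill from three usage dimensions.

    Ignores Self-Hosted IR, Azure-SSIS IR, and inactive-pipeline charges.
    """
    orchestration = activity_runs / 1000 * RATES["per_1000_activity_runs"]
    data_movement = diu_hours * RATES["per_diu_hour"]
    data_flow = data_flow_vcores * data_flow_hours * RATES["per_data_flow_vcore_hour"]
    return {
        "orchestration": round(orchestration, 2),
        "data_movement": round(data_movement, 2),
        "data_flow": round(data_flow, 2),
        "total": round(orchestration + data_movement + data_flow, 2),
    }


# Hypothetical month: 30,000 activity runs, 100 DIU-hours of copy,
# and an 8-vCore data flow cluster running for 10 hours in total.
estimate = estimate_monthly_cost(
    activity_runs=30_000,
    diu_hours=100,
    data_flow_vcores=8,
    data_flow_hours=10,
)
```

For this hypothetical month the estimate comes to about $30 for orchestration, $25 for data movement, and $15.44 for data flows, roughly $70.44 in total. Note that data flow cost scales with cluster size multiplied by running time, so it tends to dominate once transformations run for hours rather than minutes.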
Through 2025 and into 2026, Microsoft has continued to converge Azure Data Factory capabilities with Microsoft Fabric Data Factory, with new pipeline activities and Dataflow Gen2 features generally landing in Fabric first before being backported to standalone ADF. Notable evolutions include expanded Microsoft Fabric and OneLake connectors as first-class targets, deeper integration with Microsoft Purview for automated lineage capture across pipeline runs, and broader support for Delta Lake and Iceberg table formats in sinks. The connector library has expanded with additional SaaS and modern data platform integrations, and Mapping Data Flows have received performance improvements around cluster startup and incremental refresh patterns. Copilot in Data Factory (within Fabric) introduces natural-language pipeline authoring and transformation suggestions, signaling Microsoft's direction for AI-assisted data engineering. Customers running standalone ADF should monitor Microsoft's Fabric roadmap closely, as the strategic center of gravity for new investment is clearly shifting toward Fabric while the standalone product remains in mainstream support.