Microsoft Research's code-first autonomous agent framework that converts natural language into executable Python code for data analytics, statistical modeling, and complex multi-step computational workflows.
TaskWeaver is a code-first agent framework from Microsoft Research that takes a fundamentally different approach to AI-powered task execution. Published as an academic paper (arXiv:2311.17541) and released under the MIT license, TaskWeaver converts natural language requests into executable Python code rather than relying on text-based reasoning chains — making it uniquely powerful for data analytics, statistical modeling, and computational workflows where data fidelity matters.
Most agent frameworks — LangChain, CrewAI, AutoGen — use a text-based approach where agents describe actions in natural language and tools return text results. This works for simple lookups but breaks down for data-intensive tasks. When you serialize a 50,000-row DataFrame to text between agent steps, you lose precision, structure, and the ability to perform complex operations efficiently.
TaskWeaver solves this by generating actual Python code that operates on native data structures directly in memory. A request like 'find the top 10 customers by lifetime value, excluding those who churned in Q4' doesn't get translated into a series of text-based tool calls — it becomes executable Python that loads the data into a DataFrame, applies filters, computes aggregations, and returns structured results. The data never leaves Python's type system.
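A hand-written sketch of what such generated code could look like, using plain Python lists and dicts for brevity (TaskWeaver would typically emit pandas operations; the customer records below are invented):

```python
# Sketch of generated code for "find the top 10 customers by lifetime value,
# excluding those who churned in Q4". Records are hypothetical stand-ins
# for a loaded dataset.
customers = [
    {"name": "Acme", "ltv": 120_000, "churned_q4": False},
    {"name": "Globex", "ltv": 95_000, "churned_q4": True},
    {"name": "Initech", "ltv": 87_500, "churned_q4": False},
]

# Filter and rank on native Python objects; no text serialization between steps.
active = [c for c in customers if not c["churned_q4"]]
top10 = sorted(active, key=lambda c: c["ltv"], reverse=True)[:10]

print([c["name"] for c in top10])  # structured result, still inside Python's type system
```

The point is not the specific code but that every intermediate value stays a typed Python object rather than a string passed between tool calls.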
This is TaskWeaver's single biggest advantage over text-based frameworks: for any workflow involving numerical computation, statistical analysis, data transformation, or visualization, the code-first approach produces dramatically more reliable results. Where LangChain agents might hallucinate numbers during text serialization, TaskWeaver's generated code either runs correctly or throws a traceable error.
TaskWeaver's architecture cleanly separates three concerns:
The Planner receives user requests and decomposes them into a sequence of sub-tasks. Each sub-task has a clear objective and dependencies on previous steps. The Planner can revise its plan based on execution results, enabling adaptive workflows where early results inform later analysis.

The Code Generator takes each sub-task and produces Python code to accomplish it. The generator has access to the plugin registry (custom functions the team has defined), conversation history, and results from previous steps. It generates complete, executable code blocks — not snippets or pseudocode.

The Code Executor runs generated code in a managed Python process. In local mode, code runs directly on the host with full library access. For production deployments, TaskWeaver supports sandboxed execution environments that restrict file system access, network calls, and system operations while still allowing computational work.

This three-part architecture makes debugging straightforward: you can inspect the plan, review generated code, and trace execution results at each step. When something goes wrong, you know exactly where and why.
TaskWeaver's plugin system lets teams add custom capabilities through Python functions paired with YAML description files. The YAML manifest tells the agent what the plugin does, what parameters it accepts, and what it returns — the agent then incorporates these plugins into generated code when relevant.
Common plugin use cases include database connectors (SQL queries against internal data warehouses), API wrappers (pulling data from CRM, ERP, or analytics platforms), custom analytics functions (proprietary scoring algorithms, domain-specific statistical tests), and data transformation utilities (ETL functions, format converters).
The plugin architecture is one of TaskWeaver's strongest features for enterprise adoption: teams can expose their existing Python tooling to the agent without modifying the core framework.
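As an illustration, a plugin manifest might look something like the following. The field names here follow the pattern described above (name, description, parameters, returns) but should be treated as a sketch; check the TaskWeaver documentation for the exact schema. The manifest sits alongside a Python function of the same name that implements the behavior.

```yaml
# sales_db_query.yaml -- hypothetical plugin manifest for an internal database connector
name: sales_db_query
enabled: true
description: >-
  Run a read-only SQL query against the internal sales warehouse
  and return the result as a DataFrame.
parameters:
  - name: query
    type: str
    required: true
    description: The SQL query to execute.
returns:
  - name: df
    type: DataFrame
    description: Query results with one row per record.
```

When a user request touches sales data, the Code Generator can emit code that calls `sales_db_query(...)` directly instead of reinventing the connection logic.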
Against LangChain agents: TaskWeaver wins decisively for data analytics workflows where data fidelity matters. LangChain has a vastly larger ecosystem and community, making it better for general-purpose agent building with broad integrations. If your use case is primarily data analysis and computation, TaskWeaver is the stronger choice.
Against AutoGen (Microsoft): Both come from Microsoft, but serve different purposes. AutoGen focuses on multi-agent conversations and collaboration patterns. TaskWeaver focuses on single-agent code execution for analytical tasks. They can complement each other in larger systems.
Against CrewAI: CrewAI emphasizes role-based multi-agent orchestration for business workflows. TaskWeaver is purpose-built for code generation and execution. CrewAI is more accessible to non-developers; TaskWeaver produces better results for technical users doing data work.
TaskWeaver excels in scenarios where an AI agent needs to actually compute results rather than just retrieve information.
TaskWeaver is a research project, not a commercial product. Development follows an academic cadence with updates tied to research milestones rather than a regular release schedule. The community is small — measured in hundreds of GitHub stars rather than the tens of thousands that LangChain commands. Documentation is thorough for core features but thin on production deployment patterns.
The framework requires genuine Python proficiency. If your team isn't comfortable reading and debugging generated Python code, TaskWeaver will be frustrating. It's designed for data scientists and engineers, not business analysts or no-code users.
Code generation quality depends heavily on the underlying LLM. GPT-4 class models produce reliable code for most analytical tasks; smaller models struggle with complex multi-step workflows and produce more errors that require manual intervention.
For teams that fit the profile — Python-proficient data practitioners who need reliable AI-assisted analytics — TaskWeaver offers a uniquely powerful approach that no text-based framework can match for data-intensive work.
Converts natural language requests directly into executable Python code that operates on native data structures like pandas DataFrames, NumPy arrays, and standard Python objects. Unlike text-based tool chaining, data never leaves Python's type system between steps, eliminating precision loss and serialization errors.
Use Case:
A data analyst asks 'find correlations between sales and weather data for the past year, excluding holidays' — TaskWeaver generates Python code that loads both datasets, merges on date, filters holidays, computes a correlation matrix, and produces a visualization, all in one executable pipeline.
Three-component system providing clean separation of concerns: the Planner decomposes requests into sub-tasks with dependency tracking, the Code Generator produces complete Python programs for each sub-task, and the Code Executor runs code in a managed process. Plans adapt dynamically based on intermediate results.
Use Case:
A request to 'analyze customer churn factors and build a prediction model' gets decomposed into data loading, feature engineering, exploratory analysis, model training, and evaluation sub-tasks — each generating separate, traceable Python code blocks.
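The resulting plan can be pictured as an ordered list of sub-tasks with dependencies. This structure is illustrative, not TaskWeaver's internal format:

```python
# Hypothetical plan for "analyze customer churn factors and build a prediction model".
plan = [
    {"id": 1, "task": "load customer data",      "depends_on": []},
    {"id": 2, "task": "engineer churn features", "depends_on": [1]},
    {"id": 3, "task": "exploratory analysis",    "depends_on": [2]},
    {"id": 4, "task": "train prediction model",  "depends_on": [2]},
    {"id": 5, "task": "evaluate model",          "depends_on": [4]},
]

# Sanity check: every dependency must point to an earlier sub-task,
# so the plan can be executed (and debugged) step by step, in order.
ids_seen = set()
for step in plan:
    assert all(d in ids_seen for d in step["depends_on"])
    ids_seen.add(step["id"])
print("plan is executable in order")
```

Because each sub-task produces its own code block and result, a failure in step 4 leaves steps 1 through 3 inspectable and reusable.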
Extend TaskWeaver with custom Python functions paired with YAML manifest files that describe parameters, return types, and usage context. The agent discovers available plugins and incorporates them into generated code when they match the current task requirements.
Use Case:
A data engineering team creates plugins for their Snowflake data warehouse, Salesforce CRM API, and proprietary risk scoring algorithm — TaskWeaver agents automatically use these tools when generating code for relevant analytical queries.
Maintains full execution context across conversation turns, including loaded data, computed variables, and intermediate results. Follow-up requests build on previous analysis without re-executing earlier steps, enabling iterative analytical workflows.
Use Case:
An analyst loads a dataset in turn 1, applies demographic filters in turn 2, and requests a specific visualization in turn 3 — TaskWeaver references the already-filtered DataFrame directly without reloading or recomputing.
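The mechanism can be illustrated with a toy session loop (a hypothetical helper, not TaskWeaver's API): each turn executes code against a shared namespace, so later turns reuse variables created earlier without reloading data.

```python
# Toy stateful session: one persistent namespace shared across turns.
session: dict = {}

def run_turn(code: str) -> None:
    exec(code, session)  # the shared namespace persists between calls

run_turn("rows = [{'age': 25, 'spend': 40}, {'age': 61, 'spend': 75}]")  # turn 1: load
run_turn("seniors = [r for r in rows if r['age'] >= 60]")                # turn 2: filter
run_turn("total = sum(r['spend'] for r in seniors)")                     # turn 3: build on it

print(session["total"])  # computed from the already-filtered list, nothing re-run
```

Turn 3 never touches the raw dataset; it operates on the filtered result left behind by turn 2, which is exactly the iterative workflow described above.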
Runs generated Python code in configurable execution environments with safety guardrails. Supports both local mode with full library access and restricted sandbox mode that limits file system access, network operations, and system calls while preserving computational capabilities.
Use Case:
A university deploys TaskWeaver for student analytics projects — sandbox mode ensures generated code can perform computations and create visualizations but cannot access system files, make external network calls, or modify the host environment.
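A much-simplified illustration of the sandboxing idea (not TaskWeaver's actual executor, and a restricted builtins table alone is not a real security boundary; production sandboxes also isolate the process itself):

```python
# Run code with a restricted builtins table: computation works, file access does not.
SAFE_BUILTINS = {"sum": sum, "min": min, "max": max, "len": len, "range": range}

def run_sandboxed(code: str) -> dict:
    env = {"__builtins__": SAFE_BUILTINS}  # no open(), no __import__()
    exec(code, env)
    return env

env = run_sandboxed("total = sum(range(10))")
print(env["total"])  # arithmetic is allowed

blocked = False
try:
    run_sandboxed("open('/etc/passwd')")  # file access raises NameError
except NameError:
    blocked = True
print("file access blocked:", blocked)
```

The same pattern generalizes: allow the callables analytical code needs, withhold the ones that reach the file system, network, or interpreter internals.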
Validates generated code before execution to detect potential issues including syntax errors, undefined variables, and unsafe operations. Automatically attempts to fix detected issues before running code, reducing execution failures and improving reliability.
Use Case:
When the Code Generator produces a script with a misspelled variable name or incorrect function signature, the verification step catches the error, applies a fix, and executes the corrected code — avoiding runtime failures that would interrupt the analytical workflow.
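A small sketch of a pre-execution check in this spirit (hypothetical, not TaskWeaver's implementation): parse the code with the standard `ast` module, then flag names that are used but never assigned.

```python
# Static pre-flight check: catch syntax errors and undefined names before running.
import ast
import builtins

def find_issues(code: str) -> list[str]:
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    defined = set(dir(builtins))      # builtins count as defined
    used = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)  # assignment defines the name
            else:
                used.append(node.id)
    return [f"undefined name: {n}" for n in used if n not in defined]

print(find_issues("x = 1\nprint(x + y)"))  # flags the undefined y
print(find_issues("if True print(1)"))     # flags the syntax error
```

A real verifier would also handle imports, function scopes, and use-before-definition ordering, but the principle is the same: cheap static analysis before expensive execution.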
TaskWeaver development continues at its research pace through 2026 with incremental improvements to code verification and plugin management. The framework remains the go-to choice for code-first agent execution in data analytics workflows. Microsoft's broader agent ecosystem has expanded significantly with AutoGen 0.4 and Semantic Kernel reaching maturity, but TaskWeaver maintains its unique niche as the Python code generation specialist.