Master TaskWeaver with our step-by-step tutorial, detailed feature walkthrough, and expert tips.
1. Clone the TaskWeaver repository from GitHub (git clone https://github.com/microsoft/TaskWeaver.git) and install dependencies with pip install -r requirements.txt in a Python 3.10+ environment.
2. Copy project/taskweaver_config.json.example to taskweaver_config.json and configure your LLM API credentials (OpenAI API key, or Azure OpenAI endpoint and deployment name).
3. Run the interactive CLI with python -m taskweaver -p ./project to start a conversation session, then try a data analytics query like 'load the sample CSV and show me basic statistics'.
4. Create your first custom plugin by adding a Python function in the project/plugins/ directory with a matching YAML description file that defines parameters and usage context.
5. Review the official documentation at microsoft.github.io/TaskWeaver for advanced configuration, including sandbox execution mode, conversation memory settings, and multi-model support.
💡 Quick Start: Follow these steps in order to get up and running with TaskWeaver quickly.
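The configuration step can be sketched as a short script. The dotted key names below (llm.api_type, llm.api_base, llm.api_key, llm.model) are assumptions; compare them with the taskweaver_config.json.example file in your checkout before relying on them. Reading the key from an OPENAI_API_KEY environment variable and the gpt-4o default are also choices made only for this sketch.

```python
import json
import os
import tempfile
from pathlib import Path


def write_config(project_dir: Path, model: str = "gpt-4o") -> Path:
    """Write a minimal taskweaver_config.json into project_dir.

    The key names are assumptions; check them against the .example file.
    """
    config = {
        "llm.api_type": "openai",  # assumed key; Azure OpenAI would need a different value
        "llm.api_base": "https://api.openai.com/v1",
        "llm.api_key": os.environ.get("OPENAI_API_KEY", "<your-api-key>"),
        "llm.model": model,
    }
    path = project_dir / "taskweaver_config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Demo in a throwaway directory so nothing in a real project is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_config(Path(tmp))
    written_keys = sorted(json.loads(config_path.read_text(encoding="utf-8")))
    print(written_keys)
```

In a real checkout you would pass the project directory instead of a temporary one, and keep the resulting file out of version control because it contains a credential.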
Explore the key features that make TaskWeaver powerful for multi-agent builder workflows.
Converts natural language requests directly into executable Python code that operates on native data structures like pandas DataFrames, NumPy arrays, and standard Python objects. Unlike text-based tool chaining, this approach keeps data inside Python's type system between steps, which avoids the precision loss and serialization errors that come from round-tripping values through text.
A data analyst asks 'find correlations between sales and weather data for the past year, excluding holidays' — TaskWeaver generates Python code that loads both datasets, merges on date, filters holidays, computes a correlation matrix, and produces a visualization, all in one executable pipeline.
Three-component system providing clean separation of concerns: the Planner decomposes requests into sub-tasks with dependency tracking, the Code Generator produces complete Python programs for each sub-task, and the Code Executor runs code in a managed process. Plans adapt dynamically based on intermediate results.
A request to 'analyze customer churn factors and build a prediction model' gets decomposed into data loading, feature engineering, exploratory analysis, model training, and evaluation sub-tasks — each generating separate, traceable Python code blocks.
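Dependency tracking between sub-tasks can be pictured as a small graph. The sketch below is not TaskWeaver's planner; it only shows how a plan like the churn example could be ordered so that every sub-task runs after the ones it depends on. The sub-task names and the dependency edges are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each sub-task maps to the set of sub-tasks it depends on.
plan = {
    "load_data": set(),
    "feature_engineering": {"load_data"},
    "exploratory_analysis": {"load_data"},
    "train_model": {"feature_engineering"},
    "evaluate_model": {"train_model"},
}


def execution_order(plan):
    """Return sub-tasks in an order where every dependency comes first."""
    return list(TopologicalSorter(plan).static_order())


order = execution_order(plan)
print(order)
```

A plan with a circular dependency would make TopologicalSorter raise graphlib.CycleError, which is a convenient point to reject or repair a malformed plan.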
Extend TaskWeaver with custom Python functions paired with YAML manifest files that describe parameters, return types, and usage context. The agent discovers available plugins and incorporates them into generated code when they match the current task requirements.
A data engineering team creates plugins for their Snowflake data warehouse, Salesforce CRM API, and proprietary risk scoring algorithm — TaskWeaver agents automatically use these tools when generating code for relevant analytical queries.
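The pairing of a Python function with a manifest can be sketched as follows. This is an illustration of the idea, not TaskWeaver's plugin interface: the risk_score function, the manifest field names, and the manifest_matches helper are all hypothetical, and the manifest is written as a dict so the sketch needs no YAML parser. Consult the official plugin documentation for the real schema and registration mechanism.

```python
import inspect


def risk_score(amount: float, days_overdue: int) -> float:
    """Hypothetical plugin function: a toy risk score capped at 1.0."""
    raw = 0.001 * amount + 0.02 * days_overdue
    return min(raw, 1.0)


# What the paired YAML description might contain, expressed as a dict.
# Field names here are assumptions, not the official schema.
manifest = {
    "name": "risk_score",
    "description": "Score the payment risk of an invoice between 0 and 1.",
    "parameters": [
        {"name": "amount", "type": "float", "required": True},
        {"name": "days_overdue", "type": "int", "required": True},
    ],
    "returns": [{"name": "score", "type": "float"}],
}


def manifest_matches(func, manifest):
    """Check that the manifest names the function and lists exactly its parameters, in order."""
    declared = [p["name"] for p in manifest["parameters"]]
    actual = list(inspect.signature(func).parameters)
    return manifest["name"] == func.__name__ and declared == actual


print(manifest_matches(risk_score, manifest))
```

A check like this is worth running in CI, since a manifest that drifts from the function signature would lead the agent to generate calls with the wrong arguments.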
Maintains full execution context across conversation turns, including loaded data, computed variables, and intermediate results. Follow-up requests build on previous analysis without re-executing earlier steps, enabling iterative analytical workflows.
An analyst loads a dataset in turn 1, applies demographic filters in turn 2, and requests a specific visualization in turn 3 — TaskWeaver references the already-filtered DataFrame directly without reloading or recomputing.
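One way to picture this is a namespace that outlives each turn. The toy Session class below is a hypothetical illustration of that idea using exec with a shared dictionary; TaskWeaver's own executor is more involved, and exec should never be given untrusted code outside a sandbox.

```python
class Session:
    """Toy session that keeps one namespace alive across turns."""

    def __init__(self):
        self.namespace = {}

    def run(self, code: str) -> None:
        # exec() reads and writes the shared namespace, so later turns
        # can use variables defined by earlier ones.
        exec(code, self.namespace)


session = Session()
# Turn 1: load data (inline rows stand in for a real dataset).
session.run("rows = [{'age': 34, 'spend': 120}, {'age': 17, 'spend': 40}, {'age': 52, 'spend': 300}]")
# Turn 2: apply a demographic filter to the already-loaded rows.
session.run("adults = [r for r in rows if r['age'] >= 18]")
# Turn 3: compute on the filtered result without reloading anything.
session.run("total_spend = sum(r['spend'] for r in adults)")
print(session.namespace["total_spend"])
```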
Runs generated Python code in configurable execution environments with safety guardrails. Supports both local mode with full library access and restricted sandbox mode that limits file system access, network operations, and system calls while preserving computational capabilities.
A university deploys TaskWeaver for student analytics projects — sandbox mode ensures generated code can perform computations and create visualizations but cannot access system files, make external network calls, or modify the host environment.
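As a rough illustration of running generated code outside the host process, the sketch below starts a fresh interpreter with a time limit. The run_isolated helper is hypothetical and is not how TaskWeaver implements its execution modes. It provides process isolation and a timeout only; it does not restrict file system or network access, which requires a container or an OS-level sandbox.

```python
import subprocess
import sys


def run_isolated(code: str, timeout_s: float = 5.0) -> tuple[int, str, str]:
    """Run code in a fresh interpreter and return (exit code, stdout, stderr).

    Raises subprocess.TimeoutExpired if the code runs longer than timeout_s.
    This is NOT a security sandbox: the child can still touch files and the network.
    """
    result = subprocess.run(
        # -I is isolated mode: ignores PYTHON* environment variables
        # and the user site-packages directory.
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode, result.stdout, result.stderr


exit_code, out, err = run_isolated("print(sum(range(10)))")
print(exit_code, out.strip())
```

A crash or infinite loop in the generated code then ends the child process rather than the session that launched it.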
Validates generated code before execution to detect potential issues including syntax errors, undefined variables, and unsafe operations. Automatically attempts to fix detected issues before running code, reducing execution failures and improving reliability.
When the Code Generator produces a script with a misspelled variable name or incorrect function signature, the verification step catches the error, applies a fix, and executes the corrected code — avoiding runtime failures that would interrupt the analytical workflow.
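A pre-execution check of this kind can be approximated with Python's ast module. The sketch below is a simplified stand-in, not TaskWeaver's verifier: it catches syntax errors and imports of modules on a blocklist, and it does not detect undefined variables or attempt automatic fixes. The BLOCKED_MODULES policy and the verify function are assumptions made for this example.

```python
import ast

# Example policy only; choose a list that fits your deployment.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil"}


def verify(code: str) -> list[str]:
    """Return a list of problems found in code; an empty list means it passed."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" is treated as "os"
            if root in BLOCKED_MODULES:
                problems.append(f"line {node.lineno}: import of blocked module '{root}'")
    return problems


print(verify("total = sum([1, 2, 3])"))
print(verify("import os\nprint(os.listdir('.'))"))
print(verify("print("))
```

Static import checks are easy to bypass (for example through __import__), so a check like this complements sandboxed execution and does not replace it.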
TaskWeaver generates and executes real Python code that works with native data structures like DataFrames, while LangChain agents pass text between steps. For data analytics workflows specifically — loading datasets, computing statistics, generating visualizations — TaskWeaver produces significantly more reliable results because data never gets serialized to text. LangChain has a much larger ecosystem and community, making it better for general-purpose agent building with broad integrations.
TaskWeaver supports any OpenAI-compatible API endpoint, including GPT-4, GPT-4o, GPT-3.5 Turbo, Azure OpenAI Service deployments, and open-source models served through compatible APIs (like vLLM or Ollama with OpenAI compatibility). Code generation quality scales with model capability — GPT-4 class models handle complex multi-step analytics reliably, while smaller models may produce errors on sophisticated tasks.
TaskWeaver is functional and battle-tested for internal tools and data science workflows, but it carries research-project caveats. There is no commercial support, SLA, or dedicated operations team. Teams using TaskWeaver in production typically add their own error handling, monitoring, and deployment infrastructure. It is well-suited for internal analytics tools and research environments but may need additional hardening for customer-facing applications.
TaskWeaver is not suited to non-technical users. It is designed for developers and data scientists who are comfortable with Python: you need Python proficiency to set up the framework, write plugins, debug generated code, and configure the execution environment. Non-technical users should look at no-code alternatives like CrewAI Studio or pre-built analytics chatbots.
TaskWeaver includes automated code verification that checks generated code before execution, plus a sandbox execution mode that restricts file system access, network calls, and system operations. In local mode, generated code runs with the same permissions as the user, so production deployments should use sandbox mode or containerized environments for safety.
Both are Microsoft projects but serve different purposes. AutoGen focuses on multi-agent conversations and collaboration patterns — multiple agents talking to each other. TaskWeaver focuses on single-agent code execution for analytical tasks — one agent that writes and runs Python code to solve data problems. They can work together in larger architectures where AutoGen orchestrates multiple TaskWeaver agents.
Now that you know how to use TaskWeaver, it's time to put this knowledge into practice.
Tutorial updated March 2026