aitoolsatlas.ai


© 2026 aitoolsatlas.ai. All rights reserved.

Find the right AI tool in 2 minutes. Independent reviews and honest comparisons of 875+ AI tools.

Automation

FDM-1

Foundation model for computer use trained on an 11-million-hour video dataset that can perform complex computer actions such as CAD modeling, website navigation, and real-world tasks at 30 FPS.

Starting at: Custom (contact sales)
Visit FDM-1 →

Overview

FDM-1 is a foundation model for computer use (listed in our Automation category) that performs complex multi-step tasks such as CAD modeling, website exploration, and real-world driving at 30 FPS; pricing is available only through enterprise engagements with Standard Intelligence. It is built for research labs, engineering teams, and enterprises that need long-horizon agentic computer-use capabilities far beyond those of traditional VLM-based agents.

Released on February 23, 2026 by Standard Intelligence (standard intelligence pbc), FDM-1 represents a fundamental departure from the prior recipe of fine-tuning vision-language models on contractor-annotated screenshots. Instead, FDM-1 was trained on a portion of an 11-million-hour screen recording video dataset, labeled using a custom inverse dynamics model. The architecture combines a video encoder that compresses nearly 2 hours of 30 FPS video into just 1 million tokens, an inverse dynamics model for action labeling, and a forward dynamics model that predicts future video frames conditioned on actions. This long-context training enables FDM-1 to act on minutes of context rather than the few seconds typical of conventional computer-use agents, and it consistently improves with scale.

Based on our analysis of 870+ AI tools in the directory, FDM-1 is one of the few foundation models specifically architected for general computer use rather than retrofitted from a chat-based VLM. Compared to the other Automation tools in our directory — most of which wrap GPT-4 or Claude with screenshot-based browser automation — FDM-1 trains and infers directly on video, which lets it learn unsupervised from internet-scale corpora of coding livestreams, gameplay, film editing, and CAD tutorials. Demonstrations published with the launch include extruding faces on an n-gon to model a gear in Blender, fuzzing websites, and even driving a car in the real world. The team also showed test-time compute techniques using OS checkpoints (forking VMs at successful operations like "extrude" or "select") to expand the model's effective reasoning budget on long-horizon tasks. FDM-1 is positioned as the first computer-use model with the long-context training depth needed to serve as a genuine coworker for CAD, finance, engineering, and eventually ML research workflows.

🎨 Vibe Coding Friendly?

Difficulty: intermediate

Suitability for vibe coding depends on your experience level and the specific use case.

Learn about Vibe Coding →


Key Features

11-Million-Hour Video Training Corpus

FDM-1 is trained on a portion of Standard Intelligence's 11-million-hour screen recording dataset — orders of magnitude larger than the largest open computer-use dataset, which is under 20 hours of 30 FPS video. This internet-scale corpus is what the team argues is necessary for general computer-use agents, paralleling how GPT-3 required an internet-scale text corpus.

Video Encoder with 1M-Token / 2-Hour Compression

The custom video encoder compresses nearly 2 hours of 30 FPS footage into roughly 1 million tokens. This compression ratio is what makes long-context computer-use training feasible, allowing the model to reason over minutes of behavior rather than the few seconds available to screenshot-based VLM agents.
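The stated compression ratio is easy to sanity-check with back-of-envelope arithmetic. The 2-hour and 1-million-token figures come from the launch description; the per-frame breakdown below is our own calculation, not a published specification:

```python
# Back-of-envelope check of the stated compression ratio:
# ~2 hours of 30 FPS video into roughly 1 million tokens.
FPS = 30
HOURS = 2
TOKEN_BUDGET = 1_000_000

frames = HOURS * 3600 * FPS  # 216,000 frames in 2 hours at 30 FPS
tokens_per_frame = TOKEN_BUDGET / frames

print(f"{frames} frames -> {tokens_per_frame:.2f} tokens/frame")
# 216000 frames -> 4.63 tokens/frame
```

At under five tokens per frame, the encoder must lean heavily on temporal redundancy between consecutive frames; for comparison, typical VLM image tokenizers spend hundreds of tokens on a single screenshot.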

Inverse Dynamics Model for Unsupervised Labeling

Rather than relying on expensive contractor annotations, Standard Intelligence trained an inverse dynamics model that infers the actions (clicks, keystrokes, mouse movement) that produced any given video segment. This unlocks unsupervised training on coding livestreams, gameplay, CAD tutorials, and other organic computer-use video on the internet.
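Conceptually, the labeling step slides over consecutive frames and asks the inverse dynamics model which action produced each transition, turning raw recordings into (observation, action) training pairs. The sketch below is purely illustrative — the `Action` type and function names are our assumptions, not Standard Intelligence's actual API:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # e.g. "click", "keypress", "mouse_move"
    payload: dict  # coordinates, key code, etc.

def label_video(frames, inverse_dynamics):
    """Turn an unlabeled screen recording into (frame, action) pairs:
    the inverse dynamics model infers the action that transformed each
    frame into the next one."""
    return [
        (prev, inverse_dynamics(prev, cur))
        for prev, cur in zip(frames, frames[1:])
    ]

# Toy stand-in: frames are integers, and the "action" is their delta.
frames = [0, 1, 3, 6]
labeled = label_video(frames, lambda a, b: Action("move", {"delta": b - a}))
print(labeled[0])  # (0, Action(kind='move', payload={'delta': 1}))
```

The key property this pipeline illustrates: no human annotation appears anywhere — only a trained model mapping frame pairs to actions, which is what unlocks internet-scale video as training data.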

Forward Dynamics Model at 30 FPS

FDM-1's forward dynamics component predicts future video frames conditioned on actions, generating output at native 30 FPS. This continuous-time formulation is what enables smooth mouse trajectories in CAD demos and the realistic driving demonstration, where discrete screenshot agents would stutter or skip frames.
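At inference time this amounts to a closed control loop: the agent proposes an action from the current frame, and the dynamics model (or the real environment) yields the next frame, 30 times per second. A minimal sketch under that assumption, with all names hypothetical:

```python
def rollout(policy, dynamics, frame, steps):
    """Closed-loop control sketch: the policy picks an action from the
    current frame; the forward dynamics model predicts the next frame
    conditioned on that action. At 30 FPS, one minute of continuous
    control is 60 * 30 = 1,800 steps."""
    trajectory = [frame]
    for _ in range(steps):
        action = policy(frame)
        frame = dynamics(frame, action)
        trajectory.append(frame)
    return trajectory

# Toy stand-in with scalar "frames": the policy always outputs +1,
# and the dynamics simply adds the action to the frame.
traj = rollout(policy=lambda f: 1, dynamics=lambda f, a: f + a, frame=0, steps=5)
print(traj)  # [0, 1, 2, 3, 4, 5]
```

The step count is why continuous-time formulation matters: a screenshot agent sampling once every few seconds sees two orders of magnitude fewer decision points over the same wall-clock interval.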

OS Checkpointing and Forking VMs for Test-Time Compute

For long-horizon tasks like CAD modeling, FDM-1 uses a forking virtual machine that snapshots the operating system at successful intermediate operations such as extrude or select. The agent can branch and retry from these checkpoints, providing a test-time compute mechanism analogous to tree search in code or math reasoning.
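The checkpoint mechanism behaves like tree search over system states: snapshot after each validated operation, and on failure restore the latest snapshot rather than restarting the whole task. A hypothetical sketch of that control flow — deep copies stand in for VM snapshots here; the real system forks virtual machines:

```python
import copy

def run_with_checkpoints(steps, execute, validate, state, max_retries=3):
    """Retry each step from the last validated checkpoint instead of
    restarting the whole task -- a test-time compute mechanism analogous
    to tree search in code or math reasoning."""
    checkpoint = copy.deepcopy(state)  # stands in for an OS/VM snapshot
    for step in steps:
        for _ in range(max_retries):
            candidate = execute(copy.deepcopy(checkpoint), step)
            if validate(candidate, step):
                checkpoint = candidate  # commit: new restore point
                break
        else:
            raise RuntimeError(f"step {step!r} failed after {max_retries} tries")
    return checkpoint

# Toy example: each step appends its name; validation checks it landed.
final = run_with_checkpoints(
    steps=["select", "extrude", "transform"],
    execute=lambda st, s: st + [s],
    validate=lambda st, s: st[-1] == s,
    state=[],
)
print(final)  # ['select', 'extrude', 'transform']
```

The design choice worth noting: retries are scoped to a single operation, so a failure on "transform" never forces redoing "select" and "extrude" — that is what converts extra compute into reliability on long-horizon tasks.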

Pricing Plans

Enterprise

Custom (contact sales)

  • ✓ Full access to FDM-1 foundation model for computer use
  • ✓ 30 FPS native video inference for long-horizon tasks
  • ✓ CAD modeling, website automation, and multi-step workflow capabilities
  • ✓ OS checkpoint and forking VM infrastructure for test-time compute
  • ✓ Custom deployment and integration support
  • ✓ Research partnership and co-development options
See Full Pricing → · Free vs Paid → · Is it worth it? →


Best Use Cases

🎯

Long-horizon CAD modeling workflows in tools like Blender where an agent must perform tens of continuous mouse movements and operations such as extrude, select, and transform without losing context

⚡

Automated website exploration and fuzzing for QA and security research, where the agent must navigate complex multi-step flows beyond the few-second context of screenshot-based agents

🔧

Enterprise research partnerships exploring computer-use coworkers for finance, engineering, or ML research where minute-scale context and 30 FPS control are required

🚀

Embodied or physical-world tasks demonstrated by the team, such as driving a car, where continuous-time video understanding outperforms discrete screenshot reasoning

💡

Internal R&D teams building on top of a foundation model rather than orchestrating a VLM with screenshot tools and per-task RL environments

🔄

Dataset and infrastructure teams that want to leverage internet-scale unlabeled video (livestreams, gameplay, tutorials) instead of paying for contractor-annotated computer-use data

Limitations & What It Can't Do

We believe in transparent reviews. Here's what FDM-1 doesn't handle well:

  • ⚠ Closed model with no public API, SDK, or self-serve onboarding documented as of the February 23, 2026 release
  • ⚠ No published benchmark scores against competing computer-use models (OSWorld, WebArena, etc.) — capabilities shown only through demo videos
  • ⚠ Test-time compute approach relies on forking VMs and OS checkpoints, which requires substantial infrastructure to reproduce
  • ⚠ Driving and other real-world demos are research showcases, not productized capabilities with safety certifications
  • ⚠ No disclosed information on languages, supported operating systems, or accessibility for non-English interfaces

Pros & Cons

✓ Pros

  • ✓ First computer-use foundation model trained on internet-scale video (11M hours), versus the largest open computer-use dataset of under 20 hours of 30 FPS video
  • ✓ Native 30 FPS video processing enables continuous control like smooth mouse movement and CAD operations rather than discrete screenshot-by-screenshot reasoning
  • ✓ Highly efficient video encoder compresses nearly 2 hours of footage into just 1M tokens, unlocking minute-scale context windows
  • ✓ Unsupervised training via the inverse dynamics model removes the bottleneck of expensive contractor-labeled screenshots
  • ✓ Test-time compute via OS checkpoints / forking VMs lets the model retry from validated intermediate states on long-horizon tasks
  • ✓ Demonstrably general — the same model performs CAD modeling, website fuzzing, and real-world driving without task-specific RL environments

✗ Cons

  • ✗ No public API, pricing page, or self-serve access — gated to enterprise and research partners
  • ✗ Capabilities are demonstrated through curated video clips rather than peer-reviewed benchmarks against established computer-use leaderboards
  • ✗ Released February 23, 2026, so production track record, reliability, and safety guardrails are unproven at scale
  • ✗ Inference at 30 FPS on minute-long video contexts implies significant GPU cost not disclosed publicly
  • ✗ No documentation of supported operating systems, integrations, or developer tooling beyond the research blog post

Frequently Asked Questions

What is FDM-1 and who built it?

FDM-1 is a foundation model for general computer use built by Standard Intelligence (standard intelligence pbc), announced February 23, 2026. Unlike prior computer-use agents that fine-tune a vision-language model on screenshots, FDM-1 trains and infers directly on video at 30 FPS. It was trained on a portion of an 11-million-hour screen recording dataset labeled by a custom inverse dynamics model. The team positions it as the first fully general computer action model.

How is FDM-1 different from screenshot-based agents like Claude Computer Use or OpenAI's Operator?

Traditional computer-use agents fine-tune a VLM on contractor-annotated screenshots, which limits them to a few seconds of context, low framerates, and short-horizon tasks. FDM-1 instead trains directly on 30 FPS video and uses a video encoder that compresses ~2 hours into 1M tokens, giving it minute-scale context. It also avoids per-task reinforcement learning environments, learning unsupervised from the open internet's video corpus. Based on our analysis of 870+ AI tools, this is the only Automation entry that trains a custom video foundation model end-to-end for computer use.

What can FDM-1 actually do today?

Standard Intelligence demonstrated FDM-1 performing multi-action CAD sequences in Blender (including extruding faces on an n-gon to make a gear), exploring and fuzzing complex websites, and driving a car in the real world — all at 30 FPS. The CAD demo uses OS checkpoints created at successful operations (extrude, select, etc.) to enable test-time compute via a forking VM. The blog post emphasizes that capabilities consistently improve with scale, and the team frames the current model as the first step toward CAD, finance, engineering, and ML-research coworker agents.

How much does FDM-1 cost and how do I access it?

FDM-1 has no published pricing or self-serve access as of the February 23, 2026 announcement. Standard Intelligence describes it as a research milestone in a blog post at si.inc/posts/fdm1/, and access appears to be limited to enterprise or research partnerships. Compared to other Automation tools in our directory that publish $20–$200/month tiers, FDM-1 sits firmly in the enterprise / contact-sales segment with no free or developer tier announced.

What are the technical components of FDM-1's training recipe?

The training recipe has three core components, all described in the launch post. First, a video encoder that compresses approximately 2 hours of 30 FPS video into 1 million tokens, enabling long-context training. Second, an inverse dynamics model that labels raw screen recordings with the actions that produced them, removing the need for contractor annotation. Third, a forward dynamics model that predicts future frames conditioned on actions, which is the component used to drive the agent at inference time.


What's New in 2026

FDM-1 was announced on February 23, 2026 as Standard Intelligence's first publicly disclosed foundation model for computer use. The launch post introduces the 11-million-hour screen recording dataset, a video encoder that compresses ~2 hours into 1M tokens, inverse and forward dynamics models, and OS-checkpoint test-time compute via forking VMs. Demonstrations include CAD modeling in Blender, website fuzzing, and real-world car driving — all at 30 FPS.


Quick Info

Category

Automation

Website

si.inc/posts/fdm1/
🔄 Compare with alternatives →

Try FDM-1 Today

Get started with FDM-1 and see if it's the right fit for your needs.

Get Started →


More about FDM-1

Pricing · Review · Alternatives · Free vs Paid · Pros & Cons · Worth It? · Tutorial

📚 Related Articles

10 AI Automation Workflows Every Small Business Should Build in 2026

Stop drowning in repetitive tasks. These 10 AI automation workflows help small businesses save time on email, customer support, invoicing, social media, and more — with practical setup guidance using accessible tools.

2026-03-14 · 12 min read

Beginner's Guide to AI Automation for Business (2026)

A jargon-free guide to AI automation for business owners. Learn what AI can and can't do, the five functions where it saves the most time, and a practical 4-week implementation plan with real tool recommendations.

2026-03-12 · 10 min read

Complete Guide to AI Social Media Automation in 2026: From Content Creation to Performance Analytics

Managing social media accounts across five or six platforms used to mean hiring a dedicated team or spending your weekends writing captions. AI tools have compressed that workflow. A single marketer can now draft platform-specific posts, schedule them across channels, and track p…

2026-04-15 · 5 min read

How to Build an AI Agent in 2026: Complete No-Code Guide for Business Automation

Two years ago, learning **how to build an AI agent** required a Python environment, API credentials, and at least a weekend of debugging async functions. That barrier has dropped sharply. Visual workflow builders now let operations managers, marketers, and solo founders assemble…

2026-04-09 · 5 min read