Foundation model for computer use, trained on an 11-million-hour video dataset, that can perform complex computer actions like CAD modeling and website navigation, as well as real-world tasks, at 30 FPS.
FDM-1 is an Automation foundation model for computer use that performs complex multi-step tasks like CAD modeling, website exploration, and real-world driving at 30 FPS, with pricing available only through enterprise engagements with Standard Intelligence. It is built for research labs, engineering teams, and enterprises that need long-horizon agentic computer-use capabilities far beyond traditional VLM-based agents.
Released on February 23, 2026 by Standard Intelligence (Standard Intelligence PBC), FDM-1 represents a fundamental departure from the prior recipe of fine-tuning vision-language models on contractor-annotated screenshots. Instead, FDM-1 was trained on a portion of an 11-million-hour screen recording video dataset, labeled using a custom inverse dynamics model. The architecture combines a video encoder that compresses nearly 2 hours of 30 FPS video into just 1 million tokens, an inverse dynamics model for action labeling, and a forward dynamics model that predicts future video frames conditioned on actions. This long-context training enables FDM-1 to act on minutes of context rather than the few seconds typical of conventional computer-use agents, and the model is reported to improve consistently with scale.
Based on our analysis of 870+ AI tools in the directory, FDM-1 is one of the few foundation models specifically architected for general computer use rather than retrofitted from a chat-based VLM. Compared to the other Automation tools in our directory, most of which wrap GPT-4 or Claude with screenshot-based browser automation, FDM-1 trains and infers directly on video, which lets it learn unsupervised from internet-scale corpora of coding livestreams, gameplay, film editing, and CAD tutorials. Demonstrations published with the launch include extruding faces on an n-gon to model a gear in Blender, fuzzing websites, and even driving a car in the real world. The team also showed test-time compute techniques using OS checkpoints (forking VMs at successful operations like 'extrude' or 'select') to expand the model's effective reasoning budget on long-horizon tasks. FDM-1 is positioned as the first computer-use model with the long-context training depth needed to serve as a genuine coworker for CAD, finance, engineering, and eventually ML research workflows.
FDM-1 is trained on a portion of Standard Intelligence's 11-million-hour screen recording dataset, orders of magnitude larger than the largest open computer-use dataset, which is under 20 hours of 30 FPS video. This internet-scale corpus is what the team argues is necessary for general computer-use agents, paralleling how GPT-3 required an internet-scale text corpus.
The custom video encoder compresses nearly 2 hours of 30 FPS footage into roughly 1 million tokens. This compression ratio is what makes long-context computer-use training feasible, allowing the model to reason over minutes of behavior rather than the few seconds available to screenshot-based VLM agents.
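The compression figure can be turned into a rough per-frame token budget. The arithmetic below is a back-of-the-envelope sketch that uses only the numbers quoted above. It treats "nearly 2 hours" as exactly 2 hours, which is an assumption, so the true tokens-per-frame value would be slightly higher. The function name `tokens_per_frame` is illustrative and not part of any FDM-1 interface.

```python
# Back-of-the-envelope arithmetic from the quoted figures only.
FPS = 30             # frames per second, as stated
HOURS = 2            # "nearly 2 hours", rounded up (assumption)
TOKENS = 1_000_000   # "roughly 1 million tokens"


def tokens_per_frame(hours: float, fps: int, tokens: int) -> float:
    """Average number of tokens available per video frame."""
    frames = hours * 3600 * fps
    return tokens / frames


frames = HOURS * 3600 * FPS
print(frames)                                          # 216000
print(round(tokens_per_frame(HOURS, FPS, TOKENS), 2))  # 4.63
print(round(TOKENS / (HOURS * 3600), 1))               # 138.9 tokens per second
```

Under these assumptions the encoder spends fewer than five tokens per frame on average, which is what allows minutes of context to fit in a single sequence.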
Rather than relying on expensive contractor annotations, Standard Intelligence trained an inverse dynamics model that infers the actions (clicks, keystrokes, mouse movement) that produced any given video segment. This unlocks unsupervised training on coding livestreams, gameplay, CAD tutorials, and other organic computer-use video on the internet.
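To make the labeling idea concrete, here is a deliberately tiny, hand-written stand-in for an inverse dynamics step. It is not Standard Intelligence's model, which is learned and operates on real video. The toy frames, the single-pixel cursor, and the helper names `find_cursor`, `label_actions`, and `make_frame` are all assumptions made for illustration. The sketch only shows the input/output shape of the task: consecutive frames go in, inferred actions come out.

```python
from typing import List, Tuple

Frame = List[List[int]]  # toy frame: 1 marks the cursor pixel, 0 elsewhere


def find_cursor(frame: Frame) -> Tuple[int, int]:
    """Locate the (x, y) position of the single cursor pixel."""
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value == 1:
                return x, y
    raise ValueError("no cursor pixel in frame")


def label_actions(frames: List[Frame]) -> List[Tuple[int, int]]:
    """Infer one (dx, dy) mouse-move label per consecutive frame pair."""
    positions = [find_cursor(f) for f in frames]
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def make_frame(x: int, y: int, width: int = 4, height: int = 3) -> Frame:
    """Build a toy frame with the cursor at (x, y)."""
    return [[1 if (cx, cy) == (x, y) else 0 for cx in range(width)]
            for cy in range(height)]


video = [make_frame(0, 0), make_frame(1, 0), make_frame(3, 2)]
labels = label_actions(video)
print(labels)  # [(1, 0), (2, 2)]
```

A learned inverse dynamics model replaces the hand-written `find_cursor` logic with a network that also has to recover clicks and keystrokes, but the resulting (video, action) pairs play the same role as training labels.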
FDM-1's forward dynamics component predicts future video frames conditioned on actions, generating output at native 30 FPS. This continuous-time formulation is what enables smooth mouse trajectories in CAD demos and the realistic driving demonstration, where discrete screenshot agents would stutter or skip frames.
For long-horizon tasks like CAD modeling, FDM-1 uses a forking virtual machine that snapshots the operating system at successful intermediate operations such as extrude or select. The agent can branch and retry from these checkpoints, providing a test-time compute mechanism analogous to tree search in code or math reasoning.
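The checkpointing idea can be sketched as a retry loop over snapshots. The code below is a minimal illustration under stated assumptions: `ToyVM`, `run_with_checkpoints`, and `flaky_attempt` are hypothetical names, the "VM" is just a Python list of completed steps, and the loop is a simple retry-from-checkpoint scheme rather than a full tree search. It is not the mechanism FDM-1 actually uses.

```python
import copy
from typing import Callable, List


class ToyVM:
    """Hypothetical stand-in for a forkable VM; state is a list of steps."""

    def __init__(self) -> None:
        self.completed: List[str] = []

    def snapshot(self) -> List[str]:
        return copy.deepcopy(self.completed)

    def restore(self, snap: List[str]) -> None:
        self.completed = copy.deepcopy(snap)


def run_with_checkpoints(vm: ToyVM, plan: List[str],
                         attempt: Callable[[ToyVM, str], bool],
                         max_retries: int = 5) -> bool:
    """Run each step, rolling back to the last good snapshot on failure."""
    checkpoint = vm.snapshot()
    for step in plan:
        for _ in range(max_retries):
            if attempt(vm, step):
                checkpoint = vm.snapshot()  # success: move the fork point
                break
            vm.restore(checkpoint)          # failure: discard the bad state
        else:
            return False                    # retries exhausted for this step
    return True


seen = set()


def flaky_attempt(vm: ToyVM, step: str) -> bool:
    """Toy policy: the first try at each step corrupts state and fails."""
    if step not in seen:
        seen.add(step)
        vm.completed.append("corrupted:" + step)
        return False
    vm.completed.append(step)
    return True


vm = ToyVM()
ok = run_with_checkpoints(vm, ["select", "extrude"], flaky_attempt)
print(ok, vm.completed)  # True ['select', 'extrude']
```

Because every failed attempt is rolled back, the corrupted intermediate states never accumulate, and extra attempts translate directly into extra test-time compute.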
Custom (contact sales)
FDM-1 was announced on February 23, 2026 as Standard Intelligence's first publicly disclosed foundation model for computer use. The launch post introduces the 11-million-hour screen recording dataset, a video encoder that compresses ~2 hours into 1M tokens, inverse and forward dynamics models, and OS-checkpoint test-time compute via forking VMs. Demonstrations include CAD modeling in Blender, website fuzzing, and real-world car driving, all at 30 FPS.