Self-Operating Computer
An open-source framework to enable multimodal models to operate a computer.
Using the same inputs and outputs as a human operator, this framework enables multimodal AI models to view the screen and decide on a series of mouse and keyboard actions to reach an objective.
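The observe-decide-act cycle described above can be pictured with a minimal sketch. It assumes a hypothetical `Action` type and caller-supplied `capture_screen`, `decide`, and `execute` functions. None of these names come from the framework itself. The stubs at the bottom stand in for a real screenshot tool, model call, and mouse/keyboard driver so the example runs without a display or API key.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical action format; the framework's real action schema may differ.
@dataclass
class Action:
    kind: str       # illustrative set: "click", "type", or "done"
    x: int = 0      # screen coordinates, used by "click"
    y: int = 0
    text: str = ""  # characters to enter, used by "type"


def operate(
    objective: str,
    capture_screen: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: screenshot in, one mouse/keyboard action out, until done or capped."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                    # what a human would see
        action = decide(objective, screenshot, history)  # model picks the next step
        if action.kind == "done":                        # model reports the objective is reached
            break
        execute(action)                                  # what a human would do with mouse/keyboard
        history.append(action)
    return history


# Stubbed components (hypothetical) so the sketch runs anywhere.
script = [Action("click", 100, 200), Action("type", text="hello"), Action("done")]


def fake_decide(objective: str, screenshot: bytes, history: List[Action]) -> Action:
    return script[len(history)]  # replay a fixed plan instead of calling a model


performed: List[Action] = []
result = operate("type hello into the search box", lambda: b"", fake_decide, performed.append)
print([a.kind for a in result])  # ['click', 'type']
```

Keeping the model behind a single `decide` callable is one way to swap multimodal backends without touching the loop, and the `max_steps` cap guards against a model that never reports completion.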
Integration
The framework is currently integrated with GPT-4o, o1, Gemini Pro Vision, Claude 3 and LLaVA.
Compatibility
Designed to work across operating systems and with various multimodal models.
This project is compatible with macOS, Windows, and Linux (with an X server installed).
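The compatibility list above can be illustrated with a rough preflight check. It uses Python's standard `platform.system()`, which reports `"Darwin"` on macOS. It treats a non-empty `DISPLAY` environment variable as evidence that an X server is reachable on Linux. That `DISPLAY` heuristic and the `platform_supported` name are assumptions for this sketch, not checks the project is known to perform.

```python
import os
import platform
from typing import Mapping, Optional


def platform_supported(system: Optional[str] = None,
                       env: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic sketch: is this a macOS, Windows, or X-enabled Linux host?"""
    system = system or platform.system()  # e.g. "Darwin", "Windows", "Linux"
    env = os.environ if env is None else env
    if system in ("Darwin", "Windows"):
        return True
    if system == "Linux":
        # Assumption: a non-empty DISPLAY variable means an X server is available.
        return bool(env.get("DISPLAY"))
    return False


print(platform_supported("Linux", {"DISPLAY": ":0"}))  # True
print(platform_supported("Linux", {}))                  # False
```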
Join the Discussion and Contribute on GitHub
We encourage contributions and discussion via the Self-Operating Computer GitHub page.
Our team is unable to provide custom support at this time.