AI Blog

Robots are becoming more and more like humans. Thanks to AI, machines are gaining skills that were once reserved for humans. Until now, many of these machines required precise programming for the tasks they were supposed to perform, which limited their flexibility. Today, the emergence of large language models (LLMs) and the development of multimodal AI systems have opened the way for something groundbreaking in AI-powered robotics.

Building on Gemini 2.0, Google has introduced Gemini Robotics and Gemini Robotics-ER, a pair of AI models that not only understand language but can visually perceive their environment, plan the execution of activities, and accept additional instructions on the fly from a human observer. This is groundbreaking because individual robots gain something like visual perception, language understanding, and the ability to learn new tasks in real time. It means robots can now perform tasks they were never explicitly programmed for and make adaptive decisions on their own, drawing on analysis of images, sound, and general knowledge. This opens a wide range of possibilities for both the development and the application of machines in the near future. As a result, today's robots are capable of far more than they were just a few years ago, and training them takes much less time and money.

Language models are already learning to combine verbal commands with visual cues from the environment and to take physical actions on their own. What does this mean? Such a robot can receive directions in ordinary everyday language and then decide for itself how to grab a cup, arrange fruit in a container being moved by a human, or assemble a complex figure.
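
To make this concrete, here is a minimal, hypothetical sketch of how such a vision-language-action loop might be wired up. None of these names (`model.predict`, `robot.camera`, and so on) come from the Gemini Robotics API; they are invented for the example and only illustrate the idea of mapping camera frames plus a plain-language instruction to motor actions.

```python
import time

def run_instruction(model, robot, instruction: str, timeout_s: float = 30.0) -> bool:
    """Repeatedly feed the latest camera frame and the natural-language
    instruction to a multimodal policy, then execute the action it returns."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        frame = robot.camera.read()      # current visual observation
        action = model.predict(          # hypothetical policy call:
            image=frame,                 # pixels + text in,
            instruction=instruction,     # low-level motion out
        )
        if action.is_done:               # policy signals task completion
            return True
        robot.execute(action)            # send joint/gripper commands
    return False                         # gave up after the timeout

# e.g. run_instruction(policy, arm, "put the banana in the clear bin")
```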

The ability, shown in videos by the engineers, to respond to unpredictable changes such as a sudden shift in an object's position suggests that the robot understands what it is doing and is no longer a machine merely replaying a typed-in program. Today's robot prototypes such as ALOHA can perform complex manual tasks: precisely folding paper, packing delicate items, and adjusting their motion sequences as needed.

From a programming standpoint, it is no longer necessary to painstakingly teach the robot every possible case by trial and error. It is enough to give it a target command or a description of the desired outcome, and the multimodal architecture allows a single algorithm to control different types of machines, as sketched below.
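
As an illustration of that shift, the following sketch contrasts one goal-conditioned policy with per-machine programming. Everything here (the `Embodiment` class, `policy.plan`, the driver objects) is an assumption made up for the example; the point is only that one model plans from a goal description while a thin hardware adapter executes the plan on each robot type.

```python
class Embodiment:
    """Thin adapter exposing a uniform interface for one robot type."""
    def __init__(self, name: str, controller):
        self.name = name
        self.controller = controller   # hardware-specific driver (assumed)

def perform(policy, embodiment: Embodiment, goal: str) -> None:
    # One multimodal policy plans for any embodiment; only the low-level
    # controller that carries out each step differs per machine.
    plan = policy.plan(goal=goal, embodiment=embodiment.name)
    for step in plan:
        embodiment.controller.execute(step)

# The same goal string could drive two very different machines:
# perform(policy, Embodiment("bi-arm ALOHA", aloha_driver), "fold the paper fox")
# perform(policy, Embodiment("humanoid", humanoid_driver), "fold the paper fox")
```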

Such advanced capabilities, however, raise questions about safety and liability. Google is betting on a holistic approach in which every decision of a robot is evaluated both physically (will it cause a collision?) and semantically (does it conform to ethical principles of behavior?). To this end, a so-called robot constitution inspired by Asimov's laws of robotics has been developed, along with the ASIMOV benchmark for evaluating situations and the potential risks arising from complex robot actions. The safety aspect is becoming increasingly important, because devices controlled by advanced AI must be resistant to attempts to manipulate the model that could "persuade" it to perform undesirable or dangerous tasks.
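
A simplified sketch of that two-layer check might look as follows. The thresholds, the `simulate` forward model, and the `classifier` are all assumptions invented for this example, not Google's actual implementation.

```python
FORCE_LIMIT_N = 20.0      # illustrative force cap in newtons (assumed)
SAFETY_THRESHOLD = 0.9    # illustrative classifier cutoff (assumed)

def is_physically_safe(action, world_state, simulate) -> bool:
    """Reject actions whose predicted trajectory collides or pushes too hard."""
    traj = simulate(action, world_state)   # forward-model rollout (assumed interface)
    return not traj.has_collision and traj.max_force < FORCE_LIMIT_N

def is_semantically_safe(instruction: str, classifier) -> bool:
    """Ask a safety classifier whether the requested behavior is acceptable,
    in the spirit of the ASIMOV-style evaluations mentioned above."""
    return classifier.score(instruction) > SAFETY_THRESHOLD

def gated_execute(robot, action, instruction, world_state, simulate, classifier):
    """Run both gates before letting any command reach the hardware."""
    if not is_semantically_safe(instruction, classifier):
        raise PermissionError("instruction rejected by the semantic safety gate")
    if not is_physically_safe(action, world_state, simulate):
        raise RuntimeError("action rejected by the physical safety gate")
    robot.execute(action)
```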

The benefits of deploying such robots could be enormous, bringing a revolution to everyday human life: assistants that improve our work, support for people with disabilities, conversation in many languages, and eventually general-purpose robots that can do almost anything.

Gemini Robotics is the first prototype of a robot capable of interpreting tasks on its own, and through it the vision of machines able to handle any task is becoming ever clearer.
