By the time Erik Schluntz co-founded Cobalt Robotics in 2016, he had already carved a few notches in his belt.
The engineer-cum-tech-entrepreneur had worked on SpaceX’s Flight Software Team, testing a fuel system on the Falcon 9 rocket in 2015; had prototyped new uses for low-power electronics on Google X’s Smart Contact Lens project; and had temporarily set aside his studies in electrical engineering at Harvard to start Posmetrics, a data collection tool that would help businesses collect customer feedback.
Is it any wonder that in 2018, at the ripe old age of 24, Schluntz was named to Forbes 30 Under 30 for his work on a security robot?
Today, Schluntz’s role as chief technology officer (CTO) at Cobalt Robotics in San Mateo, Calif., allows him to combine his interests in software and hardware. “Robotics is so interdisciplinary that you need to understand the software, the mechanical hardware, the electrical design, as well as the human factors and economic factors in order to create a solution that works and solves across all these domains,” he said.
Cobalt’s robots currently work as security guards by night and recharge on their off hours. But Schluntz and his team are working on expanding their fleet of robots to become multi-purpose. “We’re looking at having them become virtual receptionists during the day, especially at remote offices, where a lot of places now have smaller satellite offices and don’t have a full-time receptionist,” he said. “The robot could be the receptionist and have a sign-in screen where you can tap to talk to a human operator if needed. And then, at night it becomes a security guard robot.”
In the following edited version of an interview with Machine Design, Schluntz calls out four technology strides that are transforming the fields of robotics and AI in the near term. These technologies are stoking wider adoption, he said, but are merely precursors to what lies ahead.
Machine Design: Cobalt’s work to make the user experience easy or seamless is a big factor in your success. Part of your work involves Large Language Models (LLMs). [In lay terms, these are AI tools trained on large amounts of text to read, analyze and generate language by predicting the next word.] Why will LLMs grow in importance in the field of AI in the coming years?
Erik Schluntz: Large Language Models are really exciting to me, simply because language is the interface that humans use to communicate with each other. Almost anything can be expressed in language. Whether it’s a security problem, where you would tell the security guard what to look for, or a design problem, where you would tell the engineer what needs to be built, language is the universal interface between everything.
Language models are not just good for completing text, but they’re good for everything when you think about text as this universal interface. As an example, a language model can help control a robot. Someone could instruct the robot to, “Make me a peanut butter and jelly sandwich.” And then the language model will understand what a peanut butter and jelly sandwich is, and could even break that down into steps.
Before, a person designing a robot, such as a home helper robot, would have to manually program in every possible thing that the robot could do. Language models encode all of this common sense, or contextual knowledge, that humans have. The interesting way that happens is that they were basically trained by reading the entire internet: They were given sentences from the internet and instructed to predict the next word. It’s really fascinating that in order to do this well, these large language models basically had to form an understanding of how everything connects together. And so, it really does give this common sense that feels very human, and that robots and AI have really been missing before this.
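[The "predict the next word" training objective Schluntz describes can be illustrated with a toy sketch. The code below is an assumption-laden simplification, not how production LLMs work: real models use neural networks trained on vast amounts of text, whereas this example only counts which word follows which in three made-up sentences. The function names and the tiny corpus are hypothetical.]

```python
from collections import Counter, defaultdict


def train_next_word_counts(sentences):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next_word(counts, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text standing in for "sentences from the internet".
corpus = [
    "spread peanut butter on the bread",
    "spread jelly on the bread",
    "put the bread on the plate",
]
model = train_next_word_counts(corpus)
print(predict_next_word(model, "peanut"))  # most common follower of "peanut"
```

[Even this crude counter picks up that "butter" tends to follow "peanut." The common sense Schluntz refers to comes from doing the same kind of prediction at a vastly larger scale and with far more capable models.]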
MD: That brings us to the second trend your work calls out, which is combining LLMs with robotics. You mentioned common sense. The idea of training on large amounts of data can be concerning. I think about how you train these models and how they respond. And I think about good data versus bad data. And I think about biased data. Help me understand and contextualize why this is an important area of growth.
ES: Those are really good concerns. And there are a lot of people worried that, on the entire internet, there’s a lot of language that you don’t want your language model repeating. Just like raising a kid, you wouldn’t want them reading certain parts of the internet. Reddit comes to mind, as it’s a place where I’m sure there’s all sorts of nasty language. I think there are a lot of researchers working on how to filter out this content, and make sure that these language models can understand the difference between real news and fake news or offensive content, or [obscene] content.
Within robotics, those concerns are a little bit less pressing, because what matters there is this common sense of the physical world: knowing that if the robot is instructed to make a sandwich, that means taking two pieces of bread and putting things between them, without someone needing to explicitly program that into the computer.
I’m sure I could be surprised, but I don’t think there’s a lot of room for offensive or bad content to sneak into these physical world understandings. I don’t think there are a lot of people trolling the internet and writing fake instructions on how to make sandwiches to mess up future robots. So that’s one of the reasons I’m most excited by this physical world common sense; it seems to be pretty accurate on the internet.
The biggest research that has gone into this so far has come from Google. They worked on combining a language model with a robot, and it was able to do very general tasks, using the language model to convert the high-level instruction into step-by-step actions that the robot could do. That way, the creators of the robot can just focus on making individual, small tasks that a robot can do: for instance, picking something up, bringing it somewhere or combining things. And then the language model can figure out how to convert a high-level instruction into a sequence of those small tasks.
So back to the sandwich example. If someone gave the instruction of, “Make a peanut butter and jelly sandwich,” it could convert the language into the step-by-step instructions of “find bread, find peanut butter, find jelly, put them together.” In the past, that could also be programmed by hand.
But the really cool thing about the language model now is that once you’ve done this, it knows how to make any kind of sandwich that the internet knows how to make. You could also ask it to make a Reuben, and it would know how to make a Reuben, whereas previously, someone would have to go and program in the instructions for any possible kind of sandwich. So, it creates this very general-purpose ability to scale to many different instructions.
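[The division of labor described above, with hand-built small skills on one side and a language model that sequences them on the other, can be sketched in a few lines. This is a minimal illustration under stated assumptions and not Google's or Cobalt's implementation: the skill names, the `stub_planner` function and its recipe table are all hypothetical, and the lookup table merely stands in for a real language model, which would produce the steps itself.]

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (skill name, argument)


# Hypothetical primitive skills a robot maker might implement by hand.
def find(item: str) -> str:
    return f"found {item}"


def pick_up(item: str) -> str:
    return f"picked up {item}"


def combine(item: str) -> str:
    return f"combined {item}"


SKILLS: Dict[str, Callable[[str], str]] = {
    "find": find,
    "pick_up": pick_up,
    "combine": combine,
}


def stub_planner(instruction: str) -> List[Step]:
    """Stand-in for a language model: maps an instruction to skill steps.

    Assumption: a real system would query an LLM here instead of this table.
    """
    recipes = {
        "peanut butter and jelly sandwich": ["bread", "peanut butter", "jelly"],
        "reuben": ["rye bread", "corned beef", "swiss cheese", "sauerkraut"],
    }
    for dish, ingredients in recipes.items():
        if dish in instruction.lower():
            steps = [("find", item) for item in ingredients]
            steps.append(("combine", ", ".join(ingredients)))
            return steps
    return []  # instruction not understood


def execute(steps: List[Step]) -> List[str]:
    """Run each planned step, rejecting any skill the robot does not have."""
    log = []
    for skill, argument in steps:
        if skill not in SKILLS:
            raise ValueError(f"unknown skill: {skill}")
        log.append(SKILLS[skill](argument))
    return log


plan = stub_planner("Make a peanut butter and jelly sandwich")
for line in execute(plan):
    print(line)
```

[In this sketch, the skills never change when a new sandwich is requested; only the plan does. Replacing the lookup table with a language model is what would let the same small set of skills cover instructions nobody programmed in advance.]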
And, as a roboticist, one of the interesting things that I heard in the past was that “if it’s just as hard to describe a task as it is to actually do it, it will never be automated.” I think the language models could be a solution to that. You can just say, “Do it,” and it will have the common sense to do it. And you don’t need to tediously program every step, which would probably make even the funny example of sandwich making prohibitively expensive to build.
MD: So that brings us to the question, “Will the robot take my job?”
ES: I think this is a really fascinating topic. Five years ago, everyone thought that by now truck drivers and taxi drivers would all be out of a job, replaced by self-driving cars. That clearly hasn’t materialized. People think of blue-collar work as something that can be automated because a lot of it has been automated in the past. But the interesting thing about AI is that it is best at things that humans are bad at. Humans and AI have very complementary skill sets.
One of the earliest things that AI was amazing at was chess. And before this happened, people thought chess was the pinnacle of human intelligence. AI actually was really good at that even 50 years ago, but couldn’t do very basic things that humans take for granted. So, AI progress is actually happening from the opposite end of the spectrum from what we all expected.
And if you look at things like the generative models for text and images that have come out this year, such as GPT-3, DALL-E and Stable Diffusion, you see AI able to do things like generating artwork really effectively. One would think creativity might be the last thing that ever gets automated. And it seems like it’s happening in the other direction. However, I’m optimistic that these things are not going to replace jobs, but augment them.
MD: The ideas of a humanoid robot and a general-purpose robot, I think, go hand in hand. Talk a little bit about where we are with the humanoid robot.
ES: Humanoid robots and general-purpose humanoids have a fascinating history. There is a large graveyard of companies that tried to make general-purpose robots. In the past, general purpose kind of meant “no purpose.” And they would say it’s general purpose because they couldn’t come up with a use case for it. People doing that failed and went out of business, and those that made much more specific robots for warehouses or for security have done quite well.
The really interesting thing in the last year is Tesla coming out with the Tesla Bot and launching this project to try to leapfrog all the way to the end. The idea there is that ultimately, in a world where tons of different things are done by robots, everything at the end of the day is built for a human interface.
A humanoid robot is this universal interface to the physical world, where, if a human can use it, a human-like robot can use it too. Tesla made a lot of interesting progress over the last year, and basically went from just an idea to a walking robot, which they demoed at AI Day in Palo Alto, Calif., recently.
What to note, though, is that Tesla caught up to the state of the art incredibly quickly—in about a year. But they’ve just reached the edge of it. I’m very curious to see whether they will actually be able to push beyond the current state of the art and do new things that others have not been able to do. There’s a big jump between catching up and extending beyond that. So, I’m really looking forward to seeing that in this next year.
For humanoid robots getting out into the wild, that will be a much longer-term thing. Legs are much, much more expensive to build than a simple wheel. And luckily for Cobalt, and a lot of other robotics companies, human spaces are also built for people. Wheelchairs and robots with wheels are able to get around really well in most of our environments. Jumping to full humanoid adds a lot of complexity and a lot of cost that I don’t think is worth it right now, or even 10 years from now, when it is more fully featured and can get to those last edge cases that robots with wheels can’t get to today.