Meta's New Segment Anything Model for Identification Is a Big Deal, Experts Say

SAM can recognize objects it hasn’t seen before

  • Meta’s new AI image segmentation tool could lead to advances like better tagging of photos on social media.
  • The SAM model was trained on a vast database of images.
  • There’s a race to find better ways for computers to detect and recognize objects.
Image: A portrait of a person with image segmentation overlaying half the facial features. Credit: Prostock Studio / Getty Images

Computers are getting closer to human levels of visual perception with improved abilities to detect and recognize objects. 

Meta is rolling out an AI image segmentation model that can see and isolate objects in an image even if it has never seen them before. The model, called the Segment Anything Model (SAM), brings faster and more accurate image recognition and reduces reliance on humans to label objects.

"The flexibility of SAM allows it to be applied to a variety of industries and use cases, such as agriculture, retail, medical imagery, and geospatial imagery, leading to improved outcomes and increased efficiency," Ulrik Stig Hansen, the president at Encord, a software company that recently integrated SAM into its product, told Lifewire, in an email interview. 

Image Segmentation in AI

Meta's software could be a significant boon to computer vision researchers. SAM is an image segmentation model that can respond to text prompts or user clicks to isolate specific objects within an image, Meta researchers wrote in a blog post.

One fundamental problem in the field of computer vision is how to get the software to recognize and understand objects it hasn't seen before. The approach used by SAM is image segmentation, which involves dividing an image into multiple segments or regions, each representing a specific object or area of interest. 
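To make the idea of dividing an image into regions concrete, here is a minimal Python sketch. It is a toy illustration, not how SAM works: it treats a tiny grid of numbers as an image and groups touching pixels that share the same value into numbered regions. The function name `label_regions` and the sample grid are invented for this example.

```python
from collections import deque


def label_regions(grid):
    """Give every connected group of equal-valued pixels its own region ID.

    Pixels count as connected when they touch horizontally or vertically.
    Returns (labels, region_count).
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 means "not labeled yet"
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Start a new region here and spread to matching neighbors.
            value = grid[r][c]
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and labels[ny][nx] == -1
                        and grid[ny][nx] == value
                    ):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return labels, next_id


# Hypothetical 3x4 "image" with three flat-colored areas.
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 1],
]
labels, count = label_regions(toy_image)
print(count)  # 3
for row in labels:
    print(row)
```

Real photos rarely contain perfectly flat areas of color, which is part of why learned models such as SAM are needed in practice.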

SAM uses interactive segmentation, with a human guiding the model by refining results, and automatic segmentation, where the model does it by itself after being trained on hundreds or thousands of annotated objects. The dataset used to teach SAM contains more than 1.1 billion segmentation masks collected from 11 million licensed and privacy-preserving images, which Meta says is 400 times more masks than any existing segmentation dataset.
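The scale of those figures is easier to grasp with a little arithmetic. The short Python sketch below uses only the rounded numbers quoted above, so the results are approximations; the size of the previous largest dataset is merely what the 400-times claim would imply if taken at face value, not a reported figure.

```python
# Rounded figures quoted in the article.
total_masks = 1_100_000_000   # "more than 1.1 billion" segmentation masks
total_images = 11_000_000     # 11 million images
scale_factor = 400            # "400 times more masks"

# Average number of masks per image.
masks_per_image = total_masks // total_images
print(masks_per_image)  # 100

# Size of the previous largest dataset implied by the 400x claim (an inference).
implied_previous_largest = total_masks // scale_factor
print(implied_previous_largest)  # 2750000
```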

The vast dataset lets SAM generalize to new types of objects and images beyond what it was trained on. As a result, the researchers claim that AI practitioners will no longer need to collect their own segmentation data and instead can use the open-source SAM model.

SAM has a head start in recognizing objects and has already learned a general idea of what things are. It can generate "masks" for any object in any image or video, even for objects and images it has not previously encountered. Masking involves identifying an object based on the changes in contrast at its edges and separating it from the rest of the scene. Meta researchers said SAM is general enough for many uses. 
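The contrast-based description of masking can be sketched in a few lines of Python. This is a classic technique called region growing, shown here as a simplified stand-in and not as SAM's actual method, which relies on a trained neural network. Starting from one chosen pixel (much like a user's click), the mask spreads to neighboring pixels until it hits a sharp jump in brightness. The function name `grow_mask`, the `max_step` threshold, and the sample image are assumptions made for this example.

```python
from collections import deque


def grow_mask(image, seed, max_step=20):
    """Return a 0/1 mask of the region around `seed`.

    The mask spreads between touching pixels whose brightness differs by at
    most `max_step`, so it stops at high-contrast edges.
    """
    rows, cols = len(image), len(image[0])
    mask = [[0] * cols for _ in range(rows)]
    seed_y, seed_x = seed
    mask[seed_y][seed_x] = 1
    queue = deque([(seed_y, seed_x)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols and not mask[ny][nx]:
                # A small brightness change means "still the same object".
                if abs(image[ny][nx] - image[y][x]) <= max_step:
                    mask[ny][nx] = 1
                    queue.append((ny, nx))
    return mask


# Hypothetical grayscale image: a bright 2x2 object on a dark background.
image = [
    [10, 12, 11, 10],
    [11, 200, 205, 12],
    [10, 198, 202, 11],
    [12, 10, 11, 10],
]
for row in grow_mask(image, seed=(1, 1)):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```

A fixed threshold like this breaks down on textured or poorly lit objects, which is one reason a model trained on many examples can do better than a hand-written rule.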

"In the future, SAM could be used to help power applications in numerous domains that require finding and segmenting any object in any image," the researchers said. "For the AI research community and others, SAM could become a component in larger AI systems for a more general multimodal understanding of the world, for example, understanding a webpage's visual and text content. In the AR/VR domain, SAM could enable selecting an object based on a user's gaze and then 'lifting' it into 3D."

Better Object Recognition

Meta's new tools will mean more accurate and efficient image recognition, tech analyst Iu Ayala Portella, the CEO of Gradient Insight, said in an email. For example, a social media platform might be able to automatically recognize and tag objects in your photos, even if they're not commonly recognized objects. "Or, imagine a self-driving car that can better detect and navigate around unexpected obstacles," Portella said.

Image: Meta SAM image tool. Credit: Meta

SAM is one of a growing number of improvements to how computers see things. Google has also developed several object detection methods for its AI-powered image recognition systems, some of which are similar to the methods used by Meta. For example, there's Single Shot MultiBox Detector (SSD), an object detection method developed by Google that uses a single deep neural network to detect objects in an image.

But observers said that SAM is unique as an open-source model. 

"The new object recognition and separation model that Meta recently released is a big step forward in computer vision," Vladimir Fomenko, the director of Infatica, said in an email. "This approach proves essential because it enables machines to learn and recognize items in photographs they have never seen before, significantly expanding their capabilities. Automating tasks such as identifying and categorizing things, locating product flaws, and improving picture search results are all possible with the help of this innovation."
