Is AI-powered computer vision developing common sense? Maybe so, says Facebook.
Also, you can now turn your friend into a musical instrument and add immersive effects to your Clips videos.
Computer vision inches towards common sense with Facebook's latest research
Facebook's AI research department has been working to advance and scale computer vision algorithms. It has made steady progress, and one interesting development is "self-supervised learning."
Self-supervised learning involves figuring out the essential parts of a data set without any labeled data. You give the system, for example, a thousand sentences to study, then show it ten more that have several words missing. The system fills in the blanks based on what it has seen before. The concept is trickier with pictures and video, but possible, according to Facebook.
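To make the fill-in-the-blank idea concrete, here is a minimal toy sketch in Python. It is not Facebook's method, only an illustration of learning from unlabeled text: it counts which word appeared between each pair of neighboring words, then uses those counts to guess a missing word. The function names (`train`, `fill_blank`), the `___` blank marker, and the three-sentence corpus are all made up for this example.

```python
from collections import Counter, defaultdict

def train(sentences):
    """Count which word appears between each (previous, next) pair of words."""
    context_counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start/end markers so first and last words have context too.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word, nxt in zip(words, words[1:], words[2:]):
            context_counts[(prev, nxt)][word] += 1
    return context_counts

def fill_blank(context_counts, sentence, blank="___"):
    """Replace each blank with the word most often seen in that context."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    for i, word in enumerate(words):
        if word == blank:
            candidates = context_counts.get((words[i - 1], words[i + 1]))
            if candidates:  # leave the blank alone if the context is unseen
                words[i] = candidates.most_common(1)[0][0]
    return " ".join(words[1:-1])

# Hypothetical unlabeled "training" text: no one has annotated anything.
corpus = [
    "the dog chased the ball",
    "the cat chased the mouse",
    "the dog chased the cat",
]
model = train(corpus)
print(fill_blank(model, "the dog ___ the ball"))
```

No one labels anything here; the text itself supplies the answers, which is the core of the approach. Real systems replace the counting with large neural networks.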
The company's DINO system (self-DIstillation with NO labels) learns to find things of interest in videos of people, animals, and objects quite well without any labeled data. It does this by treating the video as a complex, interrelated set rather than a sequence of images to be analyzed one by one in order.
The system can get a sense of things like "an object with this general shape goes from left to right." That information feeds into other knowledge; for example, when two objects overlap, the system knows they're not the same thing, just touching in those frames. That knowledge can then be applied to other situations. In effect, it develops a basic sense of visual meaning.
For instance, an AI trained on 500 dog pictures and 500 cat pictures will recognize both, but it won't have any idea that the two are similar. DINO, by contrast, picks up that dogs and cats are more similar to one another than either is to cars.
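One common way to express "more similar to one another than to cars" in numbers is to compare feature vectors with cosine similarity. The sketch below is only an illustration under stated assumptions: the three-dimensional vectors are invented for the example and are not DINO outputs, and nothing here reflects how DINO itself computes or compares features.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up feature vectors, for illustration only (not real model output).
features = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

dog_cat = cosine_similarity(features["dog"], features["cat"])
dog_car = cosine_similarity(features["dog"], features["car"])
print(dog_cat > dog_car)  # the dog vector is closer to cat than to car
```

A classifier trained only to output "dog" or "cat" has no need for such a shared space, which is why the comparison in the paragraph above is notable.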
Transform your friend into an AR musical synthesizer
Indie developer Lucas Rizzotto has made an augmented reality synthesizer app that can turn people into musical instruments.
The app works via granular synthesis, which breaks an audio sample into tiny fragments that are then manipulated to create new sounds. In the initial iteration, Rizzotto constructed a virtual cube filled with a particle cloud of sound fragments. Users can interact with the cube and generate notes with hand gestures.
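For readers curious what granular synthesis looks like in code, here is a minimal sketch using only Python's standard library. It is a generic illustration, not Rizzotto's implementation: the sample rate, the grain length, the Hann window used to fade each grain, and every function name (`sine_wave`, `hann_window`, `granulate`) are assumptions chosen for the example, and a plain sine wave stands in for the music sample.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def sine_wave(freq_hz, seconds):
    """Generate a sine tone to stand in for a recorded music sample."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def hann_window(length):
    """Fade-in/fade-out envelope so each grain starts and ends silently."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def granulate(source, grain_len, num_grains, hop, seed=0):
    """Cut random grains from `source` and overlap-add them into a new signal."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    window = hann_window(grain_len)
    output = [0.0] * (hop * (num_grains - 1) + grain_len)
    for g in range(num_grains):
        # Pick a random fragment of the source sample.
        start = rng.randrange(len(source) - grain_len + 1)
        offset = g * hop  # where this grain lands in the output
        for i in range(grain_len):
            output[offset + i] += source[start + i] * window[i]
    return output

source = sine_wave(440.0, 0.5)  # half a second of A4
texture = granulate(source, grain_len=400, num_grains=40, hop=200)
print(len(texture))
```

In an interactive instrument, parameters such as where each grain is read from or how densely grains are packed would typically be driven by user input, such as the hand gestures described above, rather than by a random number generator.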
In the final form of the instrument, Rizzotto configured it to use his partner's arms as the target for the sound particles, so he can play notes by gesturing over them. "This was by far the most rewarding AR project I've ever built. I think it strikes the perfect balance of exciting and crazy, as well as deeply meaningful," Rizzotto says in the video's description.
The app is available for HoloLens 2 or a PC-tethered VR headset to subscribers of Rizzotto's Patreon page. He also plans to release the app as an NFT.
Clips adds immersive new AR spaces
Clips, Apple's video creation app, now offers more ways to record captivating videos with AR Spaces. The feature, powered by LiDAR on iPhone 12 Pro and iPad Pro models, lets creators transform their space by adding immersive visual effects that map to the contours of a room.
Users can scan their space and see a live preview of effects that bring dynamic lighting, falling objects, and immersive scenes to life. Using the rear camera, users will see the effects appear on walls, floors, surfaces, and furniture.
The app recognizes people in the video and projects the AR effects in front of and behind them. AR Spaces can also be combined with animated stickers, text labels, and emoji overlays and recorded in Clips.
That's all for now. See you next time!