Old universities tend to isolate disciplines into separate buildings. In the 1940s, however, MIT housed a diverse group of scientists in a single temporary building - "The Plywood Palace" - accidentally creating one of the most creative spaces in the world for decades to come. The same multidisciplinarity is at the heart of Aalto University in Helsinki today, where the science, business, and art schools are merged for cross-pollination by design.
Tapio Friberg was a research assistant at Aalto studying mining water circulation, and at the time, a project where students build real-life satellites was underway. By accident, he wandered into one of their kickoff events, and after the show, the professor quizzed the room on people's backgrounds. Everybody seemed to be an electrical engineer.
"Do you have anything for an image processing guy?" Tapio asked out of curiosity.
"I don't, but I know somebody who might," the professor replied. Next - in a scene that could be straight out of an Aalto University advertisement - the professor escorted Tapio into the offices of ICEYE next door. Today he works for the company as a senior machine learning engineer in Luxembourg.
ICEYE is a world leader in synthetic-aperture radar (SAR) microsatellites, with a bold mission to continuously monitor the entire globe. The company is an Aalto University spin-off started in 2014, and it secured a respectable $136M in Series D funding in early 2022.
Where classic optical imaging satellites rely on visible and near-infrared light, which has very short wavelengths, SAR technology is based on significantly longer ones. These microwaves penetrate clouds and produce reliable data regardless of weather or lighting. ICEYE has built a constellation of small and affordable SAR satellites that, perhaps for the first time in the world, allows reliable around-the-clock observation of the earth's surface.
The ability to persistently monitor locations is critical for detecting high-frequency change, and it makes it possible to react to events much quicker than before. In time-critical cases like disaster relief, their satellites can provide real-time information through any storm, clouds, or debris.
The ICEYE constellation produces vast amounts of unique data, and with new data come great opportunities. One of these opportunities is monitoring deforestation, a topic that has gotten Tapio's attention.
There are places under cloud cover where illegal logging can happen without the fear of getting caught. With traditional satellites, the role of the observer is to come afterward and record the damage. "Using our continuous SAR imaging, the passive observer can be turned into an active participant, potentially interrupting the deforestation in the act," he explains.
But SAR images alone are not enough. The amount of raw data is so vast that no manual process could hope to capitalize on it. This is where Tapio's background in machine learning and image processing comes into play. He and his small team have trained a deforestation monitoring model capable of scanning hundreds of thousands of square miles repeatedly. "SAR is a change detection machine," he sums up, explaining that the real value comes from piling images on top of each other over time.
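To make the idea of piling images on top of each other concrete, here is a minimal sketch of a classic log-ratio change detector. It is not ICEYE's model: the function name, the 3 dB threshold, and the tiny 2x2 images are all hypothetical, and it assumes that clearing shows up as a drop in backscatter, which is a simplification.

```python
import math


def log_ratio_change(stack, latest, threshold_db=3.0):
    """Flag pixels whose backscatter dropped versus the temporal mean of a stack.

    stack:  list of 2-D images (lists of rows) of linear backscatter power, all > 0
    latest: one 2-D image of the same shape
    Returns a 2-D list of booleans, True where the drop exceeds threshold_db.
    """
    n = len(stack)
    rows, cols = len(latest), len(latest[0])
    changed = []
    for r in range(rows):
        row_flags = []
        for c in range(cols):
            # Averaging over time suppresses the noise of any single image.
            mean_power = sum(img[r][c] for img in stack) / n
            # Ratio of history to latest image, expressed in decibels.
            drop_db = 10.0 * math.log10(mean_power / latest[r][c])
            row_flags.append(drop_db > threshold_db)
        changed.append(row_flags)
    return changed


# Hypothetical toy data: three past images and one new one.
history = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.2, 0.9], [1.0, 1.1]],
    [[0.8, 1.1], [1.0, 0.9]],
]
latest = [[1.0, 0.2], [1.0, 1.0]]
change_mask = log_ratio_change(history, latest)
```

Only the top-right pixel, whose power fell from roughly 1.0 to 0.2, ends up flagged. The longer the history, the more stable the per-pixel mean becomes, which is one way to read the remark about value coming from stacking images over time.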
Training ML models with exotic data and increased temporal granularity was not straightforward, though. No "SAR ImageNet" is available, and manual labeling can be a brutal undertaking. The existing datasets for historical deforestation have a yearly resolution, while the data from ICEYE satellites is a daily stream.
The solution used the annual data as labels, training a model to transform SAR images into a generic forest-or-not probability map. This model then serves as the encoder, the first part of the final model, which can detect daily deforestation. We recommend reading Tapio’s article for a deeper dive into the modeling process.
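One way to picture the labeling step is to pair each daily acquisition with the annual forest map for its year. The sketch below only illustrates that pairing under assumed names: `pair_with_annual_labels`, the placeholder strings standing in for image and mask arrays, and the choice to skip unlabeled years are all hypothetical, not details from ICEYE's pipeline.

```python
from datetime import date


def pair_with_annual_labels(acquisitions, annual_masks):
    """Attach the annual forest mask for each image's year as a coarse label.

    acquisitions: list of (acquisition_date, image) tuples
    annual_masks: dict mapping a year to that year's forest-or-not mask
    Returns a list of (image, mask) training pairs.
    """
    pairs = []
    for acquired_on, image in acquisitions:
        mask = annual_masks.get(acquired_on.year)
        if mask is None:
            continue  # no annual label exists for this year, so skip the image
        pairs.append((image, mask))
    return pairs


# Hypothetical placeholders; real inputs would be image and mask arrays.
acquisitions = [
    (date(2020, 3, 1), "sar_2020_03_01"),
    (date(2020, 3, 2), "sar_2020_03_02"),
    (date(2021, 7, 15), "sar_2021_07_15"),
    (date(2022, 1, 5), "sar_2022_01_05"),
]
annual_masks = {2020: "forest_mask_2020", 2021: "forest_mask_2021"}
training_pairs = pair_with_annual_labels(acquisitions, annual_masks)
```

Many daily images share one yearly mask, so the labels are coarse in time. That is presumably why the forest-or-not model is used as a building block rather than as the daily detector itself.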
Teamwork and tooling
Today, Tapio is part of a growing machine learning team at ICEYE, originally spun off from the analytics side. He feels the job title scene hasn't kept up with the times and identifies fluid roles like algorithm developer, data gatherer, and infrastructure builder. The team even discusses these labels during recruiting. He thinks it's a good idea to try a bit of everything, but eventually, it's more efficient when people specialize. "An efficient machine learning team should be an experiment factory – conveyor belts and all," he says.
Success in ML comes from a high experimentation rate driven by tooling choices. "The faster we can create experiments, the better we are at our job," he continues. Every tool is there to ramp up that rate in the fastest sustainable way. For example, moving to PyTorch and later to PyTorch Lightning were great leaps in productivity for the team, as they made GPU training trivial. "It's amazing how much the basic infrastructure has improved during the past years," he rejoices.
For orchestrating the experiments, ICEYE has chosen the Valohai MLOps platform. "We evaluated Valohai being the least oppressive, allowing us scaling power without making us jump through too many hoops," he remembers. "We want to move fast, be able to change direction fast, and leave a trail of documentation while doing so."
The Future of ML
As data scientists worldwide are painfully aware, it is extremely hard to predict the future. Tapio pins his professional hopes on semi-supervised and self-supervised learning, as ICEYE models need to sift through huge amounts of different data. The cost of labeling large satellite images means ICEYE still has a large pool of untapped data, and these new techniques are developing into a state where they might start to make an impact.
On a more personal level, he has been pondering world models and mentions the book I Am a Strange Loop by Douglas R. Hofstadter. The book takes a deep dive into paradoxes within layers of the human consciousness, which loosely links with the future of machine learning. He anticipates that "agents navigating around lower-dimensional latent spaces - instead of the raw, brutal reality - will be an important part of expanding the domain of problems where machine learning will be robust enough."
Finally, for all the upcoming ML pioneers and data science enthusiasts, there is one piece of advice: Don't fall in love with a single idea. The best practice is to set up a personal dead man's switch for all projects upfront: Ideas might deserve three months of hopium, but many don’t pan out, and then it’s time to pull the switch and move on. "Sometimes the necessary information simply isn't in the data, and dropping ideas is a big part of keeping the experiment factory healthy," he concludes.