There's a problem in machine learning you'll likely start hearing more about. No, it's not hallucinations. It concerns what happens when data that machines learn from turns out to be false or inaccurate, or becomes outdated. Imagine the consequences of a legal case regarding the right to be forgotten.
Machine unlearning, or forgetting, aims to remove the influence of a specific subset of training examples from a trained model without losing the model's other beneficial properties. It's akin to updating the value of one cell in an Excel document: the whole document isn't reprocessed, only the cells that depend on that value.
It's a complex challenge preoccupying industry minds; Google even launched a competition earlier this summer to determine the best solutions. There's no single solution but a myriad of pathways for practitioners working with batch processing, streaming data, and large language models (LLMs).
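To make the spreadsheet analogy concrete, here is a minimal Python sketch of dependency-driven recomputation. The `Sheet` class, its method names, and the cell names are hypothetical and exist only for illustration; this is not how any particular product or unlearning method works, and it assumes formulas form no cycles.

```python
from collections import defaultdict


class Sheet:
    """Toy spreadsheet: changing a cell recomputes only the cells that depend on it."""

    def __init__(self):
        self.values = {}                     # cell name -> current value
        self.formulas = {}                   # cell name -> (function, input cell names)
        self.dependents = defaultdict(list)  # cell name -> formula cells that read it
        self.recomputed = []                 # log of formula cells that were recomputed

    def set_value(self, name, value):
        # Store a plain value, then refresh whatever reads this cell.
        self.values[name] = value
        self._propagate(name)

    def set_formula(self, name, func, inputs):
        # Register a formula cell and record which cells it reads.
        self.formulas[name] = (func, inputs)
        for cell in inputs:
            self.dependents[cell].append(name)
        self._recompute(name)

    def _recompute(self, name):
        func, inputs = self.formulas[name]
        self.values[name] = func(*(self.values[c] for c in inputs))
        self.recomputed.append(name)
        self._propagate(name)

    def _propagate(self, name):
        # Only direct and transitive dependents are touched (assumes no cycles).
        for dependent in self.dependents[name]:
            self._recompute(dependent)


sheet = Sheet()
sheet.set_value("A1", 10)
sheet.set_value("A2", 5)
sheet.set_value("B1", 7)
sheet.set_formula("A3", lambda a, b: a + b, ["A1", "A2"])  # reads A1 and A2
sheet.set_formula("B2", lambda b: b * 2, ["B1"])           # reads B1 only

sheet.recomputed.clear()
sheet.set_value("A1", 20)  # correct a single cell
print(sheet.values["A3"], sheet.recomputed)
```

After the correction to `A1`, only `A3` is recomputed; `B2` is left alone because it never read the changed cell. Unlearning in a trained model is far harder than this, since the influence of one example is not tracked as an explicit dependency.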
French deep tech startup Pathway has found one way forward and, in doing so, solves a pervasive problem in industries like shipping and supply chain. CEO and co-founder Zuzanna Stamirowska is the author of the state-of-the-art model for forecasting maritime trade published by the National Academy of Sciences of the USA.
While working on this project, she saw that the digitization of the logistics industry was slowed down by the lack of a software infrastructure capable of doing automated reasoning on top of data streams in real-time.
IoT, artificial intelligence, big data analytics, and automation converge in this space, but problems persist despite this combinatorial innovation. This was the spark to launch Pathway.
"Currently, it's tough to design systems that can process information in real-time and have systems that react to changes on the fly. This is problematic where businesses rely on data for immediate decision making."
The Pathway data processing framework grew from close collaboration with large organisations performing IoT data analytics at scale.
Imagine extracting the value of data from sensors on shipping containers that track cargo shipments. It's not just about combining seemingly disparate data from the thousands of cargo loads that make up a shipment but also accommodating low-bandwidth environments, where data syncs to the cloud with a delay.
The result is a mismatched deluge of delayed and real-time data that has to be managed at speed. It creates a challenge for data processing, as the data platform has to unlearn out-of-date data continually.
"We observed the problem first-hand and determined how to not only bridge the gap but maybe even push further some technological advances to address these very practical, tangible challenges across freight and logistics industries."
The tech startup has found a way to overcome this by continuously training AI systems and LLMs with new streaming data, so that data points can be revised without a new full batch data upload. This makes it possible to correct inaccurate source information and improve system outputs.
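The idea of revising a data point without a full reload can be sketched with a simple streaming aggregate. The `RunningMean` class, its methods, and the container readings below are hypothetical and are not Pathway's API; the sketch only shows how a corrected or withdrawn reading can be applied as a small update to a running result, which is much simpler than unlearning in a trained model.

```python
class RunningMean:
    """Running average that supports correcting or withdrawing single readings."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.readings = {}  # reading id -> value currently included in the mean

    def upsert(self, reading_id, value):
        # If the reading was seen before, remove its old contribution first.
        if reading_id in self.readings:
            self.total -= self.readings[reading_id]
            self.count -= 1
        self.readings[reading_id] = value
        self.total += value
        self.count += 1

    def retract(self, reading_id):
        # Remove a reading entirely, e.g. because it turned out to be wrong.
        value = self.readings.pop(reading_id)
        self.total -= value
        self.count -= 1

    @property
    def mean(self):
        return self.total / self.count if self.count else None


# Hypothetical handling times (in hours) reported by three containers.
handling = RunningMean()
handling.upsert("c1", 10.0)
handling.upsert("c2", 20.0)
handling.upsert("c3", 90.0)    # faulty sensor reading
mean_before = handling.mean

handling.upsert("c3", 30.0)    # late correction arrives for the same reading
mean_corrected = handling.mean

handling.retract("c1")         # reading withdrawn altogether
mean_after_retract = handling.mean
print(mean_before, mean_corrected, mean_after_retract)
```

Each correction costs a constant amount of work regardless of how many readings came before, which is the property that matters when delayed and real-time data arrive mixed together.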
And it's gained valuable traction in industries reliant on accurate real-time data. With shipping company CMA CGM, Pathway improved the precision of container gate-out ETAs (estimated times of arrival) and improved terminal operations, speeding up container handling and lowering business and environmental costs.
However, it's worth stressing that Pathway is industry-agnostic and applicable to various data types, including table-like data, time series, IoT messages, event streams, things-in-motion, graphs, and ontologies. The company is focused on product refinement and ensuring the platform's intelligence is easy to use in observability and other challenges across various industries.
Lead image: Chuttersnap.