Jan Chorowski, who is training artificial intelligence on logbooks kept by the captains of the 17th-century Dutch fleet, in conversation with Monika Redzisz
Monika Redzisz: I’ve placed an order with AliExpress. My shipment from China has been delayed by a month. Maybe it’s because of the coronavirus, and maybe the ship was attacked by the pirates of the Caribbean… I don’t care! I’m still waiting for my package and I don’t know when it’s going to be delivered. Can NavAlgo prevent such situations from happening in the future?
Jan Chorowski: We are working on it. There is nothing surprising in customers wanting to know where their shipment is, when it is going to be delivered and whether it is in good condition. I’ve experienced that too. I once ordered Chinese RC cars that I needed for a science festival. The package came one month after the scheduled date. Back then you didn’t get any text messages telling you where your shipment was. Unfortunately, the more dependent we are on mass transportation services, the less we know about them. The only thing we’re told is the estimated delivery time. That is why we came up with the idea of creating algorithms that would streamline the whole process, provide information about any potential transportation problems and, should they occur, tell us how long the delay will be. And I think we’re getting there.
How long will we have to wait for the results?
So far we have done two research projects focused on estimated delivery times and on the analysis of data from IoT loggers. One of them was carried out in collaboration with CMA CGM, the fourth biggest shipowner in the world, while the other was done together with DB Schenker. Big transportation companies have changed their approach: they have been investing in devices installed in containers that let them track the shipments they carry. It’s the first step toward getting all the necessary transportation information.
What other advantages can such a technology bring apart from the fact that we won’t have to worry about our package?
Ultimately, it will help us optimize the use of resources. For example, navigation capable of precisely estimating delivery times will allow us to better manage our own time.
Likewise, to draft an optimal plan for ordered goods, we need reliable forecasts of how long it will take to transport them. It will no longer be necessary to order goods far in advance or to keep them in stock just to be on the safe side. Today it’s cheaper to order an extra 10 percent than to deal with negative opinions from 3 percent of your online customers. Today’s forecasts are nothing but wild guesses. The question is how inaccurate they are and what we can do to make them more precise with the use of computers.
How did you come up with the idea for a company in that line of business?
NavAlgo is the sum of my interests and those of my co-founders: Zuzanna Kosowska, Adrian Kosowski and Bartek Dudek. Bartek and Adrian specialize in graph algorithms. Zuzanna has a PhD in the development of transport connections in maritime trade. I’m an assistant professor at the Faculty of Mathematics and Computer Science, University of Wrocław, and I deal with speech processing and speech recognition. Although we are experts in different fields, we all share the same passion for logistics. It is based on a graph structure: nodes and the connections between them, for example cities and the road network. It can be used to model a variety of problems: from searching for the shortest path, for example in online navigation, to categorizing such networks, i.e. identifying similarities between two networks. Surprisingly, my experience with speech analysis has also proven useful in my work for NavAlgo.
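To make the “nodes and connections” picture concrete, here is a minimal sketch of shortest-path search on a toy road network; the city names, travel times and the dijkstra helper are illustrative assumptions for this example, not NavAlgo’s actual code.

```python
import heapq

def dijkstra(graph, source):
    """Shortest travel times from `source` to every reachable node.

    `graph` maps each node to a list of (neighbour, edge_weight) pairs.
    """
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, weight in graph[node]:
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist

# Toy road network: nodes are cities, weights are travel times in hours.
roads = {
    "Wroclaw": [("Poznan", 2.5), ("Warsaw", 4.0)],
    "Poznan":  [("Wroclaw", 2.5), ("Gdansk", 3.5)],
    "Warsaw":  [("Wroclaw", 4.0), ("Gdansk", 4.5)],
    "Gdansk":  [("Poznan", 3.5), ("Warsaw", 4.5)],
}

print(dijkstra(roads, "Wroclaw"))
# {'Wroclaw': 0, 'Poznan': 2.5, 'Warsaw': 4.0, 'Gdansk': 6.0}
```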
How long have you been dealing with speech recognition?
Since 2013. I graduated from the Faculty of Microsystem Electronics and Photonics at the Wrocław University of Science and Technology. I did my PhD at the University of Louisville in the USA, although my supervisor, professor Jacek Żurada, was of Polish origin. I came back to Poland as an expert specializing in neural networks. Back then, neural networks had started improving performance in almost every domain in which they were used. That was the beginning of the deep learning boom.
I then joined Google, which offered me a job after I had managed to improve neural models for speech recognition. I worked in Mountain View for six months and continued to cooperate with them for another year and a half, but during that time I worked from home in Poland. Although I was flattered by Google’s interest in my speech recognition work, I also realized that after my return to Poland I wouldn’t be able to compete with the industry on my own. I decided to take a step back and immerse myself in something new: unsupervised speech recognition.
Why did you choose this particular area?
That’s what my gut was telling me. Nowadays, if you want a good system, you need a huge amount of data. And those data have to be prepared by people relying on human intelligence. If you want to work on a speech recognition solution, you need hundreds or even thousands of audio recordings transcribed by humans. But what if we didn’t have to transcribe the speech? What if we could record what’s on the radio? Would the computer learn to recognize speech just as children do? Children don’t read any transcriptions; they learn a language by ear. Could the models do that too? In that area the pressure from the industry is not as big and there are fewer competitors, although you have to be aware that Facebook, for example, is working on this too.
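As a rough illustration of the idea, and not of Chorowski’s actual models, the sketch below trains a tiny autoencoder purely to reconstruct audio feature frames, so its only learning signal is the audio itself and no transcripts are involved; the architecture, sizes and the random stand-in “recordings” are assumptions made for this example.

```python
import torch
from torch import nn

class FrameAutoencoder(nn.Module):
    """Learns a compact code for each audio frame without any transcripts."""
    def __init__(self, n_mels=80, code_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_mels, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_mels))

    def forward(self, frames):
        codes = self.encoder(frames)       # compact representation per frame
        return self.decoder(codes), codes  # reconstruction plus learned codes

# Stand-in for untranscribed recordings: a batch of mel-spectrogram frames.
frames = torch.randn(32, 80)

model = FrameAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    reconstruction, _codes = model(frames)
    # The training signal is the audio itself: no human labels anywhere.
    loss = nn.functional.mse_loss(reconstruction, frames)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```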
But why would you want to do that? The way it is done now is satisfactory, isn’t it?
It is, but you have to pay people to do it, which makes the system quite costly.
Tagging millions of pictures and describing what they contain has become a lucrative business. But the need to gather data is a serious limitation. It’s not a big deal if the data can be prepared by ordinary people. The real problem begins when the job can be done only by specialists, as in the case of medical image recognition.
A new profession has emerged in China: people describe objects for artificial intelligence.
Not only in China. Also in Africa. It’s a good job…
Is working for artificial intelligence an OK job?
Yes. Almost everyone can do that. You just need to know how to tag cars, pedestrians, roofs, swimming pools… Everyone can co-develop artificial intelligence systems.
There is something paradoxical about it.
That’s why I would like to escape that. I would like to create a system that would be able to learn by itself how to interpret what it sees. But it’s also very fascinating and challenging from the research perspective: how do you achieve the same goal without tags? How do you construct a model capable of recognizing the world without referring to any descriptions? We humans learn differently. So it looks like artificial intelligence is not so intelligent after all. It still requires our help. Bearing that in mind, we should be aware that it won’t solve the problems we can’t solve.
To me, it’s just the beginning of a long journey. I wish the machines could learn by themselves. I would like them to be able to scan the world around them and to use the pictures gathered to draw relevant conclusions. For example about what is happening on the road when we’re driving a car. People see objects and make assumptions about how a specific class of objects might behave. We assume that cars will stay on the road. We assume that pedestrians will either stay on the sidewalk or cross the street. We try to anticipate their movement. It would be fantastic if computers could analyze pictures with certain objects and figure out what these objects are and what they will do.
What texts do you use for training purposes?
Handwritten logbooks kept by the captains of the Dutch fleet exploring Australia and Tasmania in the 17th century.
Why is handwriting so important for you?
Because many AI teams are now trying to use untagged data. For about a year and a half, the best models that understand language have been built with the use of tens of gigabytes of text. That’s really a lot, more than we will ever read in our entire lives. These models see almost everything on the Internet and use it to build their own picture of sentences and content. They are able to learn which sentences mean the same thing and which sentences contradict each other. As far as language is concerned, this has already happened. In terms of speech processing, you can say that such systems are almost ready for use. But, as far as I know, no one has achieved anything similar for handwriting. We’re still working on it because, for now, we are not satisfied with the results.
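Below is a small sketch of what “learning which sentences mean the same thing” can look like in practice. It assumes the open-source sentence-transformers library and the publicly available “all-MiniLM-L6-v2” checkpoint, which was pretrained on large amounts of raw text; this is generic tooling chosen for illustration, not the models described in the interview.

```python
from sentence_transformers import SentenceTransformer, util

# Pretrained on huge amounts of untagged text; we only use it here.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The container ship left the port on Monday.",
    "The vessel departed from the harbour at the start of the week.",
    "The shipment has been delayed by a month.",
]

# Each sentence becomes a vector; similar meanings end up close together.
embeddings = model.encode(sentences, convert_to_tensor=True)
similarity = util.cos_sim(embeddings, embeddings)
print(similarity)  # the first two sentences score much higher than the third
```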
Jan Chorowski, PhD, is the Head of AI at NavAlgo and an assistant professor at the Faculty of Mathematics and Computer Science, University of Wrocław. He received his master’s degree from the Wrocław University of Science and Technology, his PhD from the University of Louisville, and his habilitation from the University of Wrocław. He has collaborated with numerous research teams, including Google Brain, Microsoft Research and Yoshua Bengio’s laboratory at the University of Montreal. He also led a research team during the JSALT 2019 workshop organized by Johns Hopkins University. His professional interests include the use of neural networks for intuitive problems, ones that are easy for humans but difficult for machines, such as speech and natural language processing. At NavAlgo, Chorowski has been developing artificial intelligence solutions for objects in motion.