One of the great emerging fields of computing is the use of big data and machine learning: a process whereby large datasets teach computers to translate text, interpret human speech, categorize images, and so on. The problem is that, so far, this requires large amounts of data and a lot of computing power.
This paradigm is largely at odds with the kind of computing most people would prefer. We would rather not send our voice data out to the Internet, or have the Internet always listening to or watching us, in order to get the benefits of machine learning. And while advances in technology will let us crunch the data on smaller devices, it will remain difficult to hold the corpus of data needed for training and use.
It remains to be seen whether smaller datasets or synthesized datasets (where a large dataset is somehow compressed or distilled into its important parts) will emerge. So how do we get big data into our relatively small computers?
It is likely that the problem will drive the emergence of more distributed systems, something many have wanted and waited for. Distributed, or collaborative, computing allows your computer(s) to participate in processing larger datasets. Projects like the Search for Extra-Terrestrial Intelligence (SETI) have used such distributed computing for over a decade.
The main challenge will be finding ways to break up the data sent to the distributed system so that privacy is protected. That is, if you send the whole voice capture to the distributed system (as you do, AFAIK, with cloud services like Apple’s Siri), you run into the same privacy issues as with the cloud model.
Instead, it should be possible to break up inputs (video or audio) and send portions (possibly with some redundancy, depending on, e.g., whether word breaks can be determined locally) to several systems, letting each return only a partial recognition of the whole.
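The splitting idea above can be sketched in a few lines. This is only an illustration, not a real protocol: the chunk size, the overlap (the "redundancy" mentioned above), and the treatment of the input as a flat list of samples are all assumptions made for the example.

```python
def split_with_overlap(samples, chunk_size, overlap):
    """Split `samples` into chunks of `chunk_size`, each sharing
    `overlap` samples with the next chunk, so a word cut off at one
    chunk's edge can still be recovered from its neighbor.
    (Hypothetical helper for illustration only.)"""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + chunk_size])
        if start + chunk_size >= len(samples):
            break
    return chunks

# Example: 10 samples, chunks of 4 with an overlap of 2.
# Each chunk would go to a different node; no node sees the whole input.
print(split_with_overlap(list(range(10)), 4, 2))
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

The overlap is the privacy/accuracy trade-off in miniature: more overlap gives each node a better chance of catching a word boundary, but also shows each node a little more of the input.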
It also remains to be seen whether this piecemeal approach will be as capable as the whole-system approach in all cases. While this splitting undoubtedly takes place inside whole systems like Siri, the reassembly and final processing surely happen over the whole input. That final step may not be easily managed over a distributed system, at least not while protecting privacy.
Consider asking, “what is the time in Rome?”, which might be transcribed slightly wrong due to pronunciation as “what is the dime in Rome?” In a whole-system approach it is likely easier to infer dime → time at some late step than it is when each node hands back only a partial result and the final recipient knows little about how it was made. In a question like this, the final text is probably destined for a search engine, which will likely correct the error (though it could take the question literally and answer, “It is the €0.10 coin.”).
For situations where the voice command provides insufficient context for local correction, this could be a greater challenge.
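A minimal sketch of the kind of late-stage correction described above, using Python's standard-library `difflib`. The tiny vocabulary of words expected in a "what is the … in <place>?" question is a made-up assumption for the example; a real system would use a language model rather than a hand-picked word list.

```python
import difflib

# Hypothetical vocabulary of words expected in this question context.
CONTEXT_WORDS = {"time", "weather", "temperature", "date"}

def correct_in_context(words):
    """Replace any word that is not in the expected vocabulary but is
    very close to one of its entries (e.g. 'dime' -> 'time')."""
    corrected = []
    for w in words:
        if w in CONTEXT_WORDS:
            corrected.append(w)
            continue
        close = difflib.get_close_matches(w, CONTEXT_WORDS, n=1, cutoff=0.7)
        corrected.append(close[0] if close else w)
    return corrected

print(correct_in_context("what is the dime in Rome".split()))
# → ['what', 'is', 'the', 'time', 'in', 'Rome']
```

The point of the sketch is the dependency it exposes: the correction only works because the corrector sees the whole question and knows what kind of word belongs in that slot. A node holding only a fragment has neither, which is exactly the difficulty raised above.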
The good news is that it does look possible for distributed systems to replace proprietary cloud solutions. The questions are when and how they will emerge, and where they might fall short.