Image, text and speech recognition, stock prediction, disease detection, error reduction, human extension – the field of machine learning is wide and vast. While it has been around since the fifties, we are reaching the point when is taking part in our lives. It is transforming into something more routinely, growing all around us without giving notice. This seems to come out of science fiction but is real and with the current technological advances goes with us all the time.
Just think about your smart mobile device: it knows the music you prefer, tells you the best routes to go home, listens to voice commands, orders images by the people and places in them. It reads and helps you to write messages, texts or emails. Not only confirmed by that, it connects with a smart car that drives itself to the airport for us to take an airplane, almost completely controlled by machines, to go on a long vacation trip to our favorite place to relax and think of something else.
It takes time and training for a machine to learn how to be useful or achieve a simple human task. That is one of the principal reasons the progress has been slow so far but that is changing remarkably fast, specifically with mobile devices like the iPads and iPhones. The hardware architectures are getting more impressive and the work that 50 years ago took months to do in top computers now takes hours or even minutes to accomplish. Apple is aware of this and is releasing tools for us developers to take advantage of this and create new and impressive experiences.
Since iOS 10 hit the market we were able to get our hands over Siri speech recognition, a sound type machine learning algorithm in which the device records some audio and then classifies it into text. That text can latter be processed in detail to get more out of it, much in the same way in which Siri gives some sense to it and tells you a joke.
Based in how a human brain work, a neural network consists of a net of interconnected neurons that process electrical signals which create some meaning. For example, my friend José says -hola- and my eardrum will transform the vibrations in the air and will send signals to the brain that will be processed and generate the thought that he is greeting me. What it appears to be a simple task, requires me to already know a few things that I learned with time, like understand what a word means or how it is pronounced in another language. In a similar manner machines need to do the same. They need to create connections between sounds, words and languages and they do that through a series of mathematical operations to each neuron and the connections between them.
We use weights to measure the relation between neurons and we use biases to learn from the error they generate while interacting. The machine learning process is made in an operation called gradient descent that what it does is to approximate an input to an expected output by measuring the error in its prediction. The input could be an image of a dog and the output could be any type of animal, the gradient descent process will iterate a number of times and with each iteration will be improving the result, bird, snake, horse, cat, dog.
By this time Apple has released 2 new APIs, BNNS and metal CNN to create neural networks, both provide us with operations mostly focused to process images through convolution. They are limited to infer knowledge by using pre-trained models and not to train new ones, an advantage that machines have is that they can transfer their knowledge, weights and biases, in the same way you copy and paste a file. If you pre-trained a machine with hundreds of thousands of images to distinguish a variety of objects you can transfer that knowledge to a smartphone and use it. The result would not be as accurate but would be good enough to do the job with the interpretation of your data; You can still try to do your own backpropagation and gradient descent operations to reduce the error if you want your application to learn something but it will be computationally costly and not very efficient. I am pretty sure apple is already working on some solutions to this but in the meantime, we can only wait or do it ourselves.
The first API, BNNS, stands for Basic Neural Network Subroutines, is optimized to take full advantage of the CPU. It has a series of activation functions and network filters. The activations are made to calculate the value of each neuron through linear, sigmoid, tanhs, abs activation functions and filters to process fragments of images to extract features. The most common operations for doing this and are the ones provided in the API are max pooling and average pooling which calculate the max or the average values in groups of image pixels that create what the machine understands as a feature.
In a very similar way the second API (Metal Performance Shaders CNN or Convolutional Network), has some variations, but the most important difference is the use of the GPU over the CPU to increase the performance of the network. This is because the data is transformed into vectors. Vector operations are more efficient using GPUs. With both APIs you can accomplish similar tasks but metal is optimized for real time computer vision so it will depend, does it need to process images from the camera in real time? Then use metal or, it will take an image from your library to do something with it? You can use BNN, It will be up to you and your project, interpret numbers or text from pictures of handwritten stuff. It can create a separate folder for your images, estimate the change in the Apple stock trough a year, generate random jokes or poems, and possibly generate a new song.
Only time will tell what we can accomplish but we sure are living in interesting times and is becoming an exciting experience to do and create all kinds of stuff.
Guillermo Irigoyen, iOS Computer Systems Analyst en AT&T, CTO and Co-Founder at Darc Data. Guillermo has 8 years of experience in software development, started as a multimedia artist, focussing in the building and programming of art installations for museums and events with programming languages like JAVA, C and Python controlling Linux boxes, Raspberries, Arduinos and servers.
QA Manager at Ubertesters
What’s the best way to make machine learning code/scripts testing for iOS and Android? To reply to this question, I’ll divide my overview to 3 main short points.
Machine learning technology has to deal with the big and uncertain amount of data.We are facing the problem when ML model doesn’t have a right or wrong result (“0” or “1”) as in the traditional software.
Not to fail during the testing sessions we should keep in mind the ML scenarios and permanently ask ourselves:
Have a question about ML testing? Send your e-mail to Alex directly!
Get in touch, fill out the form below, and an Ubertesters representative will contact you shortly to find out how we can help you.
Want to see the Ubertesters platform at work? Please fill out the form below and we'll get in touch with you as quickly as possible.
Fill out a quick 20 sec form to get your free quote.
Please try again later.