This week I attended an artificial intelligence workshop organized by the company I work for and very nicely delivered by Richard Jarvis from DXC. I thought it would be nice to share some of the things I learned there, along with some personal thoughts on buzzwords like artificial intelligence and machine learning.
First of all, there is a huge misconception that those who are not math geniuses should stay away from this topic. I learned during the workshop that this is simply not true. Sure, if you really want to dive deep into research on machine learning algorithms, you need to know mathematics. Fortunately, the world probably won’t need as many researchers as it needs people who understand how things work and who can build applications and bring value to users by relying on research that has already been done. Why do I say that? Because plenty of machine learning algorithms are already freely available, and there are also paid platforms that offer machine learning as a service to anybody interested. During the workshop we used H2O Flow, which is open source. Machine learning as a service is also offered by Microsoft Azure, AWS, IBM and many more. Basically, all major tech companies offer plenty of opportunities to play around with machine learning algorithms without the need to architect and develop those algorithms yourself.
So if machine learning algorithms are at our fingertips, why is it not so obvious how to use them in a meaningful way? The answer is easy: data. You simply can’t do machine learning without reliable data, and that is where most of the work goes today. The good thing is that, for data sets too, we can rely on work that has already been done in the industry. For the workshop we used the “Climate change” data set on Kaggle. On the platform you can find hundreds of data sets to play with. Microsoft has also recently published most of the data sets that were used during 20 years of research in the field. Various universities have likewise published the data sets used in their research. So getting helpful data sets is not that challenging.
Data sets, however, need to be prepared before they can be fed to machine learning algorithms. During the workshop we wanted to use the climate change data set mentioned above to predict temperatures. If you simply take the data set and feed it to a machine learning algorithm, the outcome won’t be that accurate. The first thing we did was remove the rows with missing data, and the predictions got better. However, they were still not good enough. To improve the accuracy of machine learning models, we can also enrich the available data set by adding new columns, for instance with values that can be calculated or deduced from the existing data. This kind of information is technically called “features”. The idea is that predictions become more accurate as we add new features that are meaningful for our purpose. On the climate change data set we added, for instance, a “months” column so that the algorithm could take the month into consideration when predicting temperatures. I also dropped all the data older than 80 years, since inaccurate measurements can be a problem. This is why preparing a data set is the most time-consuming task in machine learning: one has to take into consideration every aspect that could improve the outcome.
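To make those three preparation steps a bit more concrete, here is a minimal sketch in plain Python. It is not what we ran in H2O Flow during the workshop. The function name `prepare_rows`, the field names `date` and `temperature`, and the sample values are all made up for illustration, and the 80-year cutoff is simplified to a comparison of calendar years.

```python
from datetime import date


def prepare_rows(rows, reference_year, max_age_years=80):
    """Clean temperature rows and add a 'month' feature.

    Hypothetical helper: the field names 'date' and 'temperature'
    are assumptions, not the columns of the Kaggle data set.
    """
    cutoff_year = reference_year - max_age_years
    prepared = []
    for row in rows:
        # Step 1: skip rows with a missing measurement.
        if row["temperature"] is None:
            continue
        measured_on = date.fromisoformat(row["date"])
        # Step 2: skip measurements older than the cutoff year.
        if measured_on.year < cutoff_year:
            continue
        # Step 3: add the month as a new feature column.
        prepared.append({**row, "month": measured_on.month})
    return prepared


# Invented sample data, only to exercise the function.
rows = [
    {"date": "1850-01-01", "temperature": 0.7},   # too old
    {"date": "1990-01-01", "temperature": 2.1},
    {"date": "1990-07-01", "temperature": None},  # missing value
    {"date": "2000-02-01", "temperature": 3.0},
]
prepared = prepare_rows(rows, reference_year=2020)
for row in prepared:
    print(row)
```

Only the two complete, recent rows survive, each with an extra `month` field that a model could then use as an input alongside the original columns.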
Nowadays the IT industry is lacking developers. A few years from now the need for developers will decrease, but the need for people skilled in data preparation for machine learning will probably grow rapidly. That’s why I think that developers today should really try to plan for their future and get in touch with machine learning. Data already means a lot and will mean even more in the future. The ability to use data sets in a skilled way will open a lot of new doors for those who possess it. Companies that strategically move towards machine learning and artificial intelligence will have a competitive advantage. Companies that are able to create meaningful data sets will thrive. That’s why I think that data will be the new global currency. Not necessarily the data itself, but the skill to do something meaningful with a given data set.