How open source is transforming data science

Tuesday 21 May 2019 | AI, Data Insight & Analytics, Marketing Technology

By John Tansley

Machine learning is increasingly part of everyday life: home assistants recognise our voices, websites push highly customised recommendations, and airports use face recognition. These developments seem to have come together very quickly, over the space of a couple of years, and their rise is inextricably linked with the rise of open source approaches. Today, we have access to a powerful range of machine learning and data science tools that we didn’t have a few years back and, intriguingly, most of them are free. How did this happen? Why are things both better and cheaper? In this post, I’ll dig into why I feel open source has enabled and supported this huge growth of powerful new machine learning approaches.


What is open source?

Open source covers any software for which the source code (the programming instructions that make up any piece of software) can be viewed by anyone.

This doesn’t necessarily mean that the software is free of charge, although that is often the case as well. What this openness does do, however, is enable sharing and collaboration in creating and updating the software. It is this collaboration that has led to open source tools becoming part of our daily lives. The internet runs largely on open source Linux servers, while 2 billion Android devices have reshaped the way we communicate. Bitcoin currently holds around £70 billion. For this vast amount of money to be secure, the core Bitcoin code has to be open source so that it is completely transparent, trusted, and verifiable. The open source approach isn’t limited to software: in April 2019, Toyota open sourced 24,000 of its patents on hybrid cars to help stimulate further innovation across the industry.

Who contributes to open source?

One major advantage of open source is the huge number of contributors and developers. GitHub, probably the most commonly used platform for sharing open source code, had around 2 million contributors in 2017. Interestingly, 24,000 of those contributors came from large IT companies such as Google, Microsoft, IBM or Amazon. This vast number of contributors ensures that the main open source projects are incredibly well maintained, tested, and updated. Individual contributors tend to be motivated to work on problems in which they have a personal interest. Developers will often create tools that they would like to use themselves, which keeps projects well aligned with developer needs and current gaps in the market. The rationale behind commercial companies contributing to open source may seem harder to understand. Surely the most commercially pragmatic approach would be to download any code of interest, make changes as needed, and then keep that code closed to maintain a competitive advantage? This misses the fact that ongoing development by other companies will then pass you by. To get the very latest contributions and updates, it is generally in a company’s interest to submit and share its own improvements. This ensures that it has access to both its own updates and the most up-to-date enhancements from the community.

Source control underpins open source software

One of the key enablers of open source tools is source control, provided by tools such as Git and hosting platforms such as GitHub. Source control enables many users to contribute to the same code base without causing too much confusion. All changes are traceable and linked to a particular contributor. If you’ve ever ended up with countless slightly different versions of the same presentation on your laptop, you need source control! Source control has fundamentally changed the way software is developed and shared.
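To make the idea of traceable changes concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions, not how Git or GitHub are actually implemented: the `Commit` and `History` names are hypothetical, and each change simply records its author, a message, and the identifier of the change that came before it.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One recorded change (hypothetical toy model, not Git's real format)."""
    author: str
    message: str
    content: str
    parent_id: Optional[str]  # id of the previous commit, None for the first

    @property
    def commit_id(self) -> str:
        # The id is derived from the parent, author, message and content,
        # so editing any earlier change would alter every later id.
        payload = "|".join(
            [str(self.parent_id), self.author, self.message, self.content]
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:10]


class History:
    """An append-only list of commits, each linked to its predecessor."""

    def __init__(self) -> None:
        self.commits: List[Commit] = []

    def commit(self, author: str, message: str, content: str) -> Commit:
        parent_id = self.commits[-1].commit_id if self.commits else None
        new_commit = Commit(author, message, content, parent_id)
        self.commits.append(new_commit)
        return new_commit

    def log(self) -> List[Tuple[str, str, str]]:
        # Newest change first: (id, author, message).
        return [(c.commit_id, c.author, c.message) for c in reversed(self.commits)]


history = History()
history.commit("alice", "Add first draft of slides", "Slide 1: Intro")
history.commit("bob", "Fix title on slide 1", "Slide 1: Introduction")

for commit_id, author, message in history.log():
    print(commit_id, author, message)
```

Every entry in the log names who made the change and why, which is the property that replaces the pile of "final_v2" presentation files with a single, ordered history.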

One big advantage of source control is that it provides a centralised repository of knowledge. Before the widespread use of source control, there was often much duplication of effort in the implementation of new algorithms, with many separate companies maintaining their own versions. With the advent of open source, we see this happening much less: commercial software houses increasingly link into open source algorithms rather than maintaining their own versions. This has a major effect on how open source algorithms are updated and improved, because far more contributors are working on one core implementation, leading to a much faster innovation cycle. In fact, I would go as far as to say that the main advantage of open source is not that it is free, but that it enables rapid cumulative improvement.

The seemingly unstoppable rise of data science over the last few years has been enabled by this collaborative approach of the open source community. Advanced techniques, such as the natural language processing that assistants like Siri or Alexa use to understand us, are now taken for granted. It will be very exciting to see what other capabilities we’ll be taking for granted in a few years’ time.

What next?

In my next post, I’ll look at how these open source benefits can also be realised within your data science team, by using source control and notebook-based techniques to work more effectively and collaboratively.

Find out more

If you’d like to find out more about how Forecaster leverages the power of open source to provide best-of-breed demand forecasting, get in touch.

