Julia on the Upswing: Why Data Scientists are Choosing Julia

By S G Rickman On Nov 18, 2022

In the ever-evolving field of data science, the onus is on data scientists to keep track of developments in algorithms, technology stacks, databases, and languages. One such development is Julia, a programming language that has received considerable attention in recent years for its speed and ease of use.

What is Julia?

Julia, a relative newcomer among data science languages, is a high-level, general-purpose programming language designed with scientific computing in mind. Its creators, Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah, came from different backgrounds but shared an ambition: one language that combined the strengths of all the languages they used.

In short, Julia was meant to be open source with a liberal license, as fast as C, as general-purpose as Python, as statistics-friendly as R, easy to learn, and compiled. With that vision in mind, Julia was first released publicly in 2012.

Julia’s claim to fame

Several features make Julia attractive for scientific computing and machine learning (ML):

Free and Open Source: Julia is released under the permissive MIT License, and its source code is hosted on GitHub, where anyone can read it and contribute changes.
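Parallelism: Julia was designed with parallel computing in mind. Multithreading (Threads.@threads, Threads.@spawn) and distributed computing (the Distributed standard library) are built in, and unlike CPython, Julia has no global interpreter lock preventing threads from running code in parallel.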
High execution speed: In many benchmarks, Julia's performance approaches that of C and Fortran, two of the fastest languages for numerical work.
Compatible with Jupyter: Julia runs in Jupyter notebooks through the IJulia kernel (Julia is the "Ju" in Jupyter) and is supported by editors such as VS Code and Vim.
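Tailored for ML: Julia does not need an external package (such as NumPy for Python) for the numerical building blocks of ML. "Vanilla" Julia has built-in multidimensional arrays, and its LinearAlgebra standard library provides matrix factorizations and solvers for systems of equations.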
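The parallelism and matrix-support points above can be illustrated with nothing beyond Julia itself. The sketch below assumes Julia 1.3 or later and uses only Base plus the LinearAlgebra standard library: it splits a reduction into chunks that run as parallel tasks via Threads.@spawn, then fits a straight line by least squares with the backslash operator. The helper name threaded_sum and the toy data are hypothetical, chosen only for illustration.

```julia
using LinearAlgebra  # standard library that ships with Julia; no installation needed

# Hypothetical helper: apply `f` to every element of a non-empty vector and sum the
# results, splitting the work into chunks that run as separate tasks.
# Start Julia with `julia --threads=4` (or set JULIA_NUM_THREADS) to use several
# threads; with a single thread the same code still runs, just sequentially.
function threaded_sum(f, xs::AbstractVector; nchunks::Integer = Threads.nthreads())
    chunk_len = max(1, cld(length(xs), nchunks))            # elements per chunk
    tasks = [Threads.@spawn sum(f, chunk)                   # one task per chunk
             for chunk in Iterators.partition(xs, chunk_len)]
    return sum(fetch.(tasks))                               # wait for and combine partial sums
end

println("sum of squares of 1..1000: ", threaded_sum(abs2, 1:1000))

# Ordinary least squares with built-in matrices: fit y ≈ β₀ + β₁x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = 1.0 .+ 2.0 .* x                 # toy data lying exactly on y = 1 + 2x
X = hcat(ones(length(x)), x)        # design matrix: intercept column and x column
β = X \ y                           # for a non-square X, `\` returns the least-squares solution

println("fitted coefficients: ", β)
println("residual norm: ", norm(X * β - y))
```

The chunked-task pattern is used rather than a shared accumulator so that no two tasks ever write to the same variable.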

Julia for Data Science

Julia compared to Python and R

Julia was built to offer the best of what existing languages provided. Python and R are the most widely used languages for ML, statistical analysis, and data visualization. Together they dominate the data world and overshadow similar languages, but Julia has steadily been distinguishing itself from the pack. Here is how it compares with the two incumbents:

Figure: Benchmark times normalized against the C implementation (source: https://julialang.org/benchmarks/)

Speed and Performance:

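Taking C as the baseline, Python is slower than C, and in these benchmarks R is mostly slower than Python. Julia's execution times, however, are comparable to C's. The main reason is that Julia compiles each function just in time (via LLVM) into native machine code specialized for the types of its arguments, whereas the standard implementations of Python and R interpret code.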
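To make the compilation point concrete: in Julia an explicit loop like the one below is idiomatic and fast, because the first call compiles a version of the function specialized for the argument type, and later calls reuse that native code. The function name loop_sum is hypothetical, and the printed timings depend on the machine, so no numbers are claimed here.

```julia
# Hypothetical example: a hand-written accumulation loop.
# Julia compiles a separate specialization for each concrete argument type
# (e.g. Vector{Float64}, Vector{Int}) the first time it is called with that type.
function loop_sum(xs::AbstractVector{T}) where {T<:Number}
    acc = zero(T)          # start from the additive identity of the element type
    for x in xs
        acc += x
    end
    return acc
end

xs = rand(1_000_000)

t_first  = @elapsed loop_sum(xs)   # includes JIT compilation for Vector{Float64}
t_second = @elapsed loop_sum(xs)   # reuses the already-compiled native code
println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

In the REPL, @code_native loop_sum(xs) (from the InteractiveUtils standard library, which the REPL loads automatically) prints the machine instructions generated for this specialization.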

Libraries and APIs:

Python has a vast ecosystem of libraries and APIs, and R has a smaller one. Julia, as one of the newer languages, has comparatively few libraries and APIs available.

Community Support:

Python has a very large developer community, while R has a comparatively smaller one. Julia, still early in its adoption, has a much smaller but growing community.

Machine Learning Support in Julia

Common libraries

• GmmFlow.jl

• Clustering.jl

• QuickShiftClustering.jl (Hierarchical clustering)

• MultivariateStats.jl (PCA)

Julia packages cover a broad range of machine learning problems, including supervised learning (classification and regression) and unsupervised learning (cluster analysis and dimensionality reduction).
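As an illustration of the dimensionality-reduction case, principal component analysis can be written in a few lines of plain Julia using the LinearAlgebra and Statistics standard libraries. This is a from-scratch sketch, not the MultivariateStats.jl API; it assumes observations are stored as the rows of X, and the function name pca_project and the toy data are hypothetical.

```julia
using LinearAlgebra, Statistics  # both are standard libraries bundled with Julia

# Hypothetical helper: project the rows of `X` (observations × features)
# onto the top `k` principal components, computed via the SVD.
function pca_project(X::AbstractMatrix{<:Real}, k::Integer)
    Xc = X .- mean(X; dims = 1)              # center each feature (column)
    F = svd(Xc)                              # Xc = F.U * Diagonal(F.S) * F.V'
    W = F.V[:, 1:k]                          # top-k principal directions (features × k)
    explained = F.S .^ 2 ./ sum(F.S .^ 2)    # fraction of variance per component
    return Xc * W, explained[1:k]            # scores (observations × k), variance ratios
end

# Toy data: 5 observations of 3 features, made up for illustration only.
X = [1.0 2.0 0.5;
     2.0 4.1 0.4;
     3.0 5.9 0.6;
     4.0 8.2 0.5;
     5.0 9.8 0.4]

scores, ratios = pca_project(X, 2)
println("scores size: ", size(scores))
println("variance explained by the first two components: ", ratios)
```

Because the first two features are almost perfectly correlated, nearly all of the variance lands on the first component.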

It also supports deep learning architectures such as ConvNet and TextRNN, among many others.

Pros and Cons of Julia

Pros:

1. Julia's speed and ease of implementation make it a desirable programming language for data science.

2. Its syntax is intuitive and will feel familiar to Python users.

3. It can call Python functions directly (for example through PyCall.jl), and several Julia packages wrap popular Python libraries.

4. It supports a growing range of machine learning algorithms.

Cons:

1. Community support is still limited, although it is growing steadily.

2. Some wrapper libraries, such as Pandas.jl (a wrapper around Python's pandas), can run slowly in a local Jupyter setup.

3. Imported libraries incur a high initial compilation time, and some tasks require several libraries. For example, reading a CSV file into a data frame requires two packages: CSV.jl and DataFrames.jl.

4. Some deep learning functions offer less flexibility in parameter tuning than their Python counterparts.
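The CSV case from point 3 looks like this in practice. The sketch assumes the CSV.jl and DataFrames.jl packages are already installed; the file contents and column names are made up for illustration.

```julia
# Requires two packages, installed once with:
#   import Pkg; Pkg.add(["CSV", "DataFrames"])
using CSV, DataFrames

# Write a small, made-up CSV file so the example is self-contained.
path = tempname() * ".csv"
write(path, """
    city,temp_c
    Oslo,4.5
    Lima,19.0
    Pune,27.3
    """)

# CSV.jl does the parsing; DataFrames.jl supplies the table type the result is stored in.
df = CSV.read(path, DataFrame)

println(size(df))       # (rows, columns)
println(describe(df))   # per-column summary
```

Wrapping the first CSV.read call in @time will usually show that a large share of the time is compilation, which is the start-up cost described in point 3; a second call in the same session is typically much faster.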

Julia on the rise

Since its release, Julia has found applications across multiple industries. NASA has been using it to model animal, plant, and human migration patterns and their responses to climate change. BlackRock, one of the largest asset management companies, has been using Julia for time series analytics and big-data applications. Researchers at MIT have used Julia to program robots to climb stairs and walk on hazardous, uneven terrain.

Data and data science have grown rapidly, which makes fast, simple programming languages increasingly important. Julia still has some way to go in building out its data science ecosystem (documentation, community support, libraries, and packages), but it excels in speed. It can potentially reduce time-to-market where code execution time is the main bottleneck. It is also worth trying for simple ML algorithms or heavy numerical computation, since community support for basic algorithms is good. Julia is evolving steadily and is a language to watch in data science.

Resources

1. Getting started – https://docs.julialang.org/en/v1/manual/getting-started/

2. ML library (Flux.jl) – https://fluxml.ai/Flux.jl/stable/

3. Time series – https://discourse.julialang.org/t/simple-flux-lstm-for-time-series/35494

4. Sample problems – https://github.com/FluxML/model-zoo

5. Julia docs – https://docs.julialang.org/en/v1/

Author:

Vedang Dalal, Lead Analyst, Merkle

The post Julia on the Upswing: Why Data Scientists are Choosing Julia appeared first on Analytics Insight.