What Exactly is R?
Last Updated on December 18, 2020 by Editorial Team
Author(s): Johar M. Ashfaque
R is perhaps one of the most powerful and most popular platforms for statistical programming and applied machine learning.
What isΒ R?
R is an open-source environment for statistical programming and visualization.
R is many things, which might be confusing atΒ first:
- R is a computer language.
- R is an interpreter.
- R is a platform.
Where R CameΒ From
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, as an implementation of the S programming language. Its development started in 1993. A version was made available on FTP released under the GNU GPL in 1995. The larger core group and open source project was set up inΒ 1997.
It started as an experiment by the authors to implement a statistical testbed in Lisp using a syntax like that provided in S. As it developed, it took on more of the syntax and features of S, eventually surpassing it in capability and scope.
The Key Features ofΒ R
R is a tool to use when you need to analyze data, plot data, or build a statistical model for data. It is ideal for one-off analyses, prototyping, and academic work but not suited to building models to be deployed in scalable or operational environments.
The Benefits ofΒ R
The three key benefits of usingΒ R:
- Open Source: R is free and open-source. You can download it right now and start using it. You can read the source code, learn from it, and modify it to meet yourΒ needs.
- Packages: R is popular because it has a vast number of very powerful algorithms implemented as third party libraries called packages. It is common for academics in statistical fields to release their methods as R packages, meaning that you have direct access to some state-of-the-art methods.
- Maturity: R is inspired by propriety statistical language S, using and improving the idioms and metaphors useful for statistical computing, like working in matrices, vectors, and dataΒ frames.
For more information about R packages, check out the CRAN (Comprehensive R Archive Network.). The Machine Learning & Statistical Learning view that lists packages for machine learning will be of great interest.
Difficulties withΒ R
There are three key difficulties with usingΒ R:
- Inconsistency: Each algorithm is implemented with its own parameters, naming conventions, and parameters. This can be very frustrating and requires deep reading of the documentation with each new package that youΒ use.
- Documentation: There is a lot of documentation, but it is generally direct and terse. The built-in help is rarely helpful, driving you constantly to the web for complete working examples from which you must derive your useΒ case.
- Scalability: R is intended for use on data that fit into memory on one machine. It is not intended for use with streaming data, big data, or working across multiple machines.
The language is a little obtuse, but as a programmer, you will have little difficulty in picking it up and adapting examples to yourΒ needs.
Who is UsingΒ R?
Commercial companies now support R. For example, Revolution R is a commercially supported version of R with extensions useful for enterprises such as an IDE. Oracle, IBM, Mathematica, MATLAB, SPSS, SAS, and others provide integration with R and their platforms.
The Kaggle platform for data science competitions and the KDnuggets polls both point out R as the most popular platform for a successful practicing data scientist.
Bio: Johar Ashfaque is a Trainee Clinical Coder and Data Scientist at NHS EKHUFT. Prior to this Johar was a Post-Doc at MPI-SWS in Germany. He earned his Ph.D. in Theoretical Physics (String Theory and Phenomenology) from the University of Liverpool and his BSc. from the University of Kent at Canterbury, UK.
What Exactly is R? was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Published via Towards AI