RMarkdown: Working With Data and R Markdown
September 12, 2020
Generating documents from markdown is all well and good, but one of the main draws of RMarkdown is that it can pull data from external sources.
While the main purpose of R Markdown is to run R in a document, I personally prefer working with Python.
To run arbitrary Python code you can make use of the R package called reticulate (installed in my R Markdown docker image).
Setup
First we need to run some R code to load the library we want and to set up the Python virtual environment.
``` {r setup, include = FALSE}
library(reticulate)
virtualenv_create("my-proj")
py_install("matplotlib", envname="my-proj")
py_install("pandas", envname="my-proj")
use_virtualenv("my-proj")
```
Breaking this down, we have the following:
- `{r setup, include = FALSE}`: the braces indicate that the chunk in the specified language is executed at build time, and `include = FALSE` hides both the code and its results. Use this for setup code.
- In R we load the reticulate library so we can use Python.
- We create a virtual environment called my-proj. You can name this whatever you want; it's not important because it is created inside the docker container and thrown away at the end of the build.
- We install the matplotlib and pandas packages into that Python environment.
- Finally, we tell reticulate to use the my-proj virtual environment for the rest of this document.
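If you want to confirm that the setup worked, a small check like the one below can help. It is only a sketch: the helper name `missing_packages` is made up for this example, and it assumes you run it in a python chunk after the setup chunk. It uses only the Python standard library to print which interpreter is active and which of the expected packages cannot be found.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the package names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# sys.prefix points at the environment the interpreter is running from,
# so it should mention the virtual environment if it was picked up.
print("Python environment:", sys.prefix)

# An empty list means both packages from the setup chunk are available.
print("Missing packages:", missing_packages(["matplotlib", "pandas"]))
```

An empty list for the missing packages suggests the installs from the setup chunk landed in the environment the document is actually using.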
Creating a Graph from Python
Next we can run a block of python code which will generate a graph we want to put on the page.
``` {python, echo = FALSE}
import matplotlib.pyplot as plt
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]
plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')
plt.show()  # explicitly render the figure; some reticulate versions need this
```
Breaking this code chunk down:
- The `{python, echo = FALSE}` header, as you have probably guessed, executes the Python code in the block. `echo = FALSE` is similar to `include = FALSE` above, but instead of hiding the block altogether it hides only the code and still displays the result (in this case a nice graph).
- The remainder of the code is just a simple example of using the matplotlib Python library to create a graph.
Creating a table from Python
If you want to output a table from data gathered from a script you can do the following:
``` {python, include = FALSE}
import pandas as pd
mydata = [
    {
        "Id": 1,
        "Message": "fooo"
    },
    {
        "Id": 2,
        "Message": "bar"
    }
]
pandadata = pd.DataFrame(data=mydata)
```
```{r, echo = FALSE}
knitr::kable(py$pandadata, caption = "Data from python")
```
Breakdown of the code above:
- You will see 2 blocks of code: the python block and the r block. We generate the data in Python and then use the r block to display it, reaching the Python variable through `py$pandadata`.
- You will also see the use of the pandas package. This lets us create a pandas data frame which R can turn into a table using kable. I actually like this because it keeps the data and presentation a little separate.
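In practice the rows usually come out of a script rather than being typed by hand. Here is a minimal sketch of that step, using a hypothetical helper called `build_rows` (not part of pandas or reticulate) that turns a list of strings into the same list-of-dicts shape as `mydata` above. The result can then be passed to `pd.DataFrame(data=mydata)` exactly as before.

```python
def build_rows(messages):
    """Turn a list of message strings into the list-of-dicts shape used above."""
    return [
        {"Id": index, "Message": message}
        for index, message in enumerate(messages, start=1)
    ]


# Stand-in for whatever your script actually gathers.
mydata = build_rows(["fooo", "bar"])
print(mydata)
```

Keeping the gathering step in its own function means the python chunk only produces data, and the r chunk stays responsible for how it looks.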
Importing a csv file into a table
This is probably the easiest of them all. You just need to add the following code block.
```{r, echo = FALSE}
knitr::kable(read.csv("./test.csv", header = TRUE))
```
Final thoughts
As you can see, there is a lot you can do with this. You also have the option of generating data on the file system as part of a script and then pulling it in via regular markdown.
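As a rough sketch of that last option, the script below writes a small CSV file using only the Python standard library. The file name `test.csv` matches the CSV example earlier in this post; the `write_rows` helper and the sample rows are just illustrative assumptions, so swap in whatever your own script produces.

```python
import csv
from pathlib import Path


def write_rows(path, rows, fieldnames):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Illustrative data only; a real script would gather this from somewhere.
rows = [
    {"Id": 1, "Message": "fooo"},
    {"Id": 2, "Message": "bar"},
]
write_rows("test.csv", rows, ["Id", "Message"])

# Show what was written so you can check it before building the document.
print(Path("test.csv").read_text(encoding="utf-8"))
```

Run this before the build (or from a chunk earlier in the document) and the `read.csv` block will pick the file up.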
RMarkdown Series
- 01 - Replacing MS Office
- 02 - Setting up R Markdown
- 03 - How to do common word tasks in R Markdown
- 04 - Generating presentations in R Markdown and Reveal.js
- 05 - Working with data and R Markdown
- 06 - Generating flow charts
- 07 - Creating books in Bookdown
- 08 - Misc other tools
- 09 - Co-operating with other people