I use the reticulate R to Python interface quite regularly. Besides the occasional grumbling when a numeric is once again converted to a float, the transfer time of pandas DataFrames and their memory usage have been a regular topic.
To get some data and maybe a definitive answer to the question “What is the best way to transfer tabular data from Python to R?” I set up a pair of tiny scripts.
tl;dr: It depends. If you have relatively small tables (< 10,000 rows), returning pandas DataFrames is your best choice. For anything larger, returning Arrow Tables seems to be the way to go.
The setup: transfer data from pandas DataFrames on the Python side to data.frames / data.tables on the R side.
Comparison of transfer times for different numbers of rows with a fixed number of columns, filled with random numbers.
Sizes: 20 columns of random integers between 1 and 100, with row counts of:
Use a minimal class that generates the random input and stores the resulting tables as members (see the sketch below).
Use 4 different methods to get the different sizes from Python to R. Generating everything up front keeps data creation out of the measured transfer time. (Will repetition make it quicker?)
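A minimal sketch of what such a generator class could look like, assuming pandas and pyarrow on the Python side; the class and method names are my own, and only two of the four return types (the two named in the tl;dr) are shown:

```python
import numpy as np
import pandas as pd
import pyarrow as pa


class TableGenerator:
    """Pre-generates one random table so the benchmark measures transfer only."""

    def __init__(self, n_rows, n_cols=20):
        rng = np.random.default_rng()
        # 20 columns of random integers between 1 and 100 (high is exclusive)
        data = rng.integers(1, 101, size=(n_rows, n_cols))
        self.df = pd.DataFrame(data, columns=[f"col_{i}" for i in range(n_cols)])
        self.table = pa.Table.from_pandas(self.df)

    def as_pandas(self):
        # reticulate converts the returned DataFrame to an R data.frame
        return self.df

    def as_arrow(self):
        # returned as an Arrow Table, which the R arrow package picks up
        return self.table
```

From R, the class can then be instantiated via reticulate's `import()` and each accessor timed separately, so only the Python-to-R conversion lands in the measurement.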
R: 4.2.0 with libraries:
Python: 3.9.12 with libraries:
If you want to check the results on your machine, you can find the code on GitHub.