Jekyll2021-11-05T10:42:55+00:00https://rinprod.com/feed.xmlRinProd.comRinProd.com is a resource aimed specifically at those looking to run the R statistical programming language in enterprise environments and is licensed under the Creative Commons Attribution 4.0 International License.Writing Production R2019-03-01T01:10:33+00:002019-03-01T01:10:33+00:00https://rinprod.com/writing-production-r<p>Hopefully you’ve realised by this point that this site isn’t so much about R as it is the R ecosystem and infrastructure around R. So it’s not the best place for information on writing production R. If that’s what you’re interested in though, the broader R community has you covered.</p>
<p>Here are some links to the best resources for writing better R code:</p>
<h1 id="absolute-beginners">Absolute beginners</h1>
<ul>
<li><a href="https://fg2re.sellorm.com">Field Guide to the R Ecosystem</a>: a short guide for those new to the R ecosystem and targeted specifically at Ops teams and managers.</li>
</ul>
<h1 id="starting-out">Starting out</h1>
<ul>
<li>Garrett Grolemund and Hadley Wickham’s “<a href="https://r4ds.had.co.nz/">R for Data Science</a>” is a very popular text for new users. It’s so popular, in fact, that it has spawned its own <a href="https://www.rfordatasci.com/">R for Data Science Online Learning Community</a> of learners and mentors who help newcomers with the language and work on problems together.</li>
</ul>
<h1 id="version-control">Version Control</h1>
<ul>
<li>Jenny Bryan and team’s “<a href="https://happygitwithr.com/">Happy git with R</a>” is a fantastic starting point for using git (and specifically GitHub.com) for version control.</li>
</ul>
<h1 id="more-advanced-users">More advanced users</h1>
<ul>
<li>The “<a href="https://csgillespie.github.io/efficientR/">Efficient R Programming</a>” book, by Colin Gillespie and Robin Lovelace, is an excellent resource for squeezing maximum performance from the language.</li>
<li>Hadley Wickham’s “<a href="http://adv-r.had.co.nz/">Advanced R</a>” is a hugely well-regarded exploration of some more advanced topics.</li>
</ul>
<h1 id="shiny">Shiny</h1>
<ul>
<li><a href="https://kellobri.github.io/shiny-prod-book/">Shiny in production</a> (Kelly O’Briant and Sean Lopp): a supplement to the ‘Shiny in Production’ 2 day workshop delivered at RStudio::conf 2019.</li>
</ul>
<h1 id="other-resources">Other resources</h1>
<p>If you’d like to suggest other useful resources, please <a href="https://github.com/rinprod/rinprod.com/issues">raise an issue</a> in this site’s GitHub repo.</p>The talk that started it all2019-01-31T00:40:33+00:002019-01-31T00:40:33+00:00https://rinprod.com/the-post-that-started-it-all<p>These slides were originally presented to an audience of R users by <a href="https://twitter.com/sellorm">Mark Sellors</a> at <a href="https://rstudio.com/conference">RStudio::conf</a> 2019 in Austin, Texas.</p>
<p>A video of the talk is available on the <a href="https://resources.rstudio.com/rstudio-conf-2019/r-in-production">RStudio website</a>.</p>
<h2 id="slide-1">Slide 1</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-00.jpg" alt="" /></p>
<p>In this talk we’ll look at some techniques for getting R running in production in your company.</p>
<p>Many R users find it difficult to get R outside of the data science bubble and into wider use within the business. Often, this is attributed to unhelpful IT departments, unwilling to adopt new approaches or methods. Whilst this perception is common, the reality is often that IT departments are not familiar with R and rarely have the bandwidth (or budget) to learn more about it.</p>
<p>In the slides that follow, we’ll take a look at the two main approaches that people use to successfully get R into production and begin delivering the benefits of R to the wider business.</p>
<h2 id="slide-2--3">Slide 2 & 3</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-01.jpg" alt="" /></p>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-02.jpg" alt="" /></p>
<p>There’s no magic formula to running any language in production and R is no different. You won’t find any weird tricks or any magic functions in this talk.</p>
<h2 id="slide-4">Slide 4</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-04.jpg" alt="" /></p>
<p>Our starting position is that all of the technical barriers to running R in production are (comparatively) easy to overcome, but that it’s the cultural issues that slow us down. It’s these cultural barriers we’re going to focus on here.</p>
<h2 id="slide-6">Slide 6</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-05.jpg" alt="" /></p>
<p>What is production anyway?</p>
<h2 id="slide-7">Slide 7</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-06.jpg" alt="" /></p>
<p>This is the sort of image that people will often think of when the word “production” is used: large-scale data centres running huge systems. But is that really “production”, or is it just “scale”?</p>
<h2 id="slide-8--9">Slide 8 & 9</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-07.jpg" alt="" /></p>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-08.jpg" alt="" /></p>
<p>Production is anything that is run <strong>repeatedly (or continuously)</strong> and is <strong>relied upon</strong>. Acknowledging this is the key to running any language in production. For those of us working on data products, “relied upon” generally means that the outputs are used in a decision-making process somewhere. Production systems can be relied upon by thousands of people or by a single person; the scale is not important.</p>
<h2 id="slide-10">Slide 10</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-09.jpg" alt="" /></p>
<p>R is a great language to run in production. It’s mature, stable, has many existing production users, an extensive package ecosystem and it’s essentially become the lingua franca of data.</p>
<h2 id="slide-11--12">Slide 11 & 12</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-10.jpg" alt="" /></p>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-11.jpg" alt="" /></p>
<p>So, how do we get there? There are two main techniques that I’ve seen people use and I refer to these as the left-hand and right-hand paths.</p>
<h2 id="slide-13--14">Slide 13 & 14</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-12.jpg" alt="" /></p>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-13.jpg" alt="" /></p>
<p>The left-hand path is the path of magic.</p>
<p>This simple technique works well, but can backfire. The basic goal is to impress a decision maker higher up in the organisation who can exert downward pressure on the business in general and the IT team in particular. If you can impress this person enough they’ll push through your project to allow you to run R in production.</p>
<p>The main issue with this approach is that it’s quite confrontational and is unlikely to make you any friends. That said, there are many examples of it working well and enabling the business to take this important step.</p>
<h2 id="slide-15--16">Slide 15 & 16</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-14.jpg" alt="" /></p>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-15.jpg" alt="" /></p>
<p>The other option is the right-hand path.</p>
<p>This approach directly addresses the elephant in the room: getting R past your IT team and into
production.</p>
<p>The first thing we need to do here is clear up this IT-team-as-the-enemy trope. In the vast majority of
large organisations the IT team exists purely to enact the technical will of the business and the way it
works is an expression of that will. In general IT teams are intentionally slow to change and risk-averse.
As the gatekeepers of an organisation’s infrastructure the IT team – or ops, or devops or whatever they’re
called in your business – have the ultimate responsibility for the security of the business’s information.
As a consequence they have a natural tendency towards conservatism when it comes to the infrastructure that
they’re responsible for.</p>
<p>This responsibility often puts them at odds with the goal of a data scientist, which is generally to
access and leverage data. Given these conflicting priorities, it is unsurprising that there is sometimes
friction between these groups.</p>
<h2 id="slide-17">Slide 17</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-16.jpg" alt="" /></p>
<p>Data science and software engineering aren’t the same thing, though there is significant overlap.</p>
<p>In well-run data science teams I don’t usually expect to see much more than code reviews and methodological reviews - we must be sure the code does what we think it does and that the statistical methodologies used are appropriate for the task at hand.</p>
<p>Enterprise-scale software engineering teams, however, generally have many more hoops to jump through in their work, and that work is often siloed into very narrow bands to facilitate hand-offs between each activity. For example, a developer may be able to run automated and unit tests themselves, but their code will often go to a specific test team for further testing, such as User Acceptance Testing (UAT), before passing on to the next stage in the release process. This release process tends towards the lengthy and extremely rigorous. In some organisations the length of time it will take to release even a simple application to their production environment is so long that even the thought of bringing in anything new can be extremely problematic.</p>
<h2 id="slide-18">Slide 18</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-17.jpg" alt="" /></p>
<p>These are just a few of the things you might need to get your head around when working to get R into a production setting.</p>
<p>To be clear, I’m not suggesting that all data scientists need to learn this stuff. For many there will be no need and for many others no interest - and that’s absolutely fine - what we’re talking about here is building a deeper understanding of a different area of the business that you might have to work with. In some organisations this role can be taken on by specialist “R Admins” who understand the work of the IT team <strong>and</strong> the data science team and can act as facilitators between the two.</p>
<h2 id="slide-19">Slide 19</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-18.jpg" alt="" /></p>
<p>Unless you work in a very small company, you’ll likely have to build bridges with other teams in order to land your work with R in production. Get to know these people and what drives them. There’s often more common ground than you might think. At the end of the day getting R into production is about ensuring <strong>confidence</strong> in your work and building bridges with other teams within the organisation.</p>
<h2 id="slide-20">Slide 20</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-19.jpg" alt="" /></p>
<p>Use this checklist as the basis for your own list of what needs to be considered to get your work into production.</p>
<p>One of my favourites from this list is “Support”: Who will provide support to your application once it goes live? Do you want to receive support calls at 3am if something breaks or does someone else need to be trained up for that role?</p>
<h2 id="slide-21--22">Slide 21 & 22</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-20.jpg" alt="" /></p>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-21.jpg" alt="" /></p>
<p>If you can negotiate all of that and help the business to gain the confidence in your work that it needs, you’ll make it to production - congratulations!</p>
<p>And if you do make it to production please share your experiences - production stories will help us all to raise our game and demonstrate the validity and utility of running R in production.</p>
<h2 id="slide-23">Slide 23</h2>
<p><img src="/static/2020-01-31-the-post-that-started-it-all/r-in-prod-22.jpg" alt="" /></p>
<ul>
<li><a href="https://fg2re.sellorm.com">Field Guide to the R Ecosystem</a>: a guide for those new to the R ecosystem and targeted specifically at Ops teams and managers</li>
<li><a href="https://kellobri.github.io/shiny-prod-book/">Shiny in production</a> (Kelly O’Briant and Sean Lopp): a supplement to the ‘Shiny in Production’ 2 day workshop delivered at RStudio::conf 2019</li>
<li><a href="https://github.com/ThinkR-open/companies-using-r">Companies using R</a> (Colin Fay/ThinkR): a great resource to see what others are already doing with R</li>
</ul>