The R4GC Book
Learn Data Science with R: Notes and tutorials from the R4GC Community ‘Lunch and Learn’ meetups
16 December 2021
Raison d’être
This book is prepared as part of the R4GC Community's skill-enhancement and knowledge-gathering exercise.
It aims to consolidate the knowledge base that the R4GC Community is gathering across its various community portals and discussions.
It also serves to illustrate - as always with source code - one of the most powerful features of R: the collaborative, peer-reviewed development of data science code and reports using R Markdown.
R4GC Community
The R4GC Community (formerly called the “Use R!” GCCollab community) was created in March 2021 to bring together R users across the Government of Canada. Here we gather and curate the knowledgebase related to the use of R within the Government of Canada. Everyone is welcome to join, whether you are an advanced R user, are just starting to learn it, or simply want to learn more about data science and how it is done.
The idea to create this group came during the GC Data2021 Conference workshop on Data Engineering Challenges and Solutions: Demonstration of Shiny [^gcdata2021]. The highest voted question during the discussion there was: “How can I get more help for our members to enhance their knowledge and ‘spread the word’ and raise more awareness with regard to this tool?” The creation of this community group is the answer to this question.
By November 2021, the R4GC GCCollab group had become one of the largest active groups of data science practitioners in Canada, with over 250 members. The weekly “Lunch and Learn Data Science with R” meetups organized by the R4GC Community have been attended by data practitioners from over twenty government departments, and have generated hundreds of questions and answers, a dozen tutorials, multiple open-to-use applications, and thousands of lines of open code.
“Lunch and Learn Data Science with R” meetups
“Building advanced Data Science skills using R, together - one meeting at a time!”
These informal meetings are held weekly at Friday lunchtime (from 12:05 to 12:55). There, data scientists who want to deepen their knowledge of R and other data science subjects get together to show and discuss their R code and share their data coding tricks and methodologies. Normally, each session is focused on a particular subject or project, with the code shared on GCcode.
No registration is required to join the meeting. However, in order to view the notes and video recordings from these meetups, you need to join this sub-group: https://gccollab.ca/groups/about/7855030. For the agenda and MS Teams dial-in numbers, please see the Group Events page at https://gccollab.ca/event_calendar/group/7391537.
Community portals
The R4GC community makes use of the following collaborative platforms provided by Shared Services Canada.
GCcode group - r4gc
URL: https://gccode.ssc-spc.gc.ca/r4gc
GCcode is the GitLab solution that is accessible from within the GC network. As such, it allows one to view and update (pull and push) code and documentation with a single click of a button from RStudio on a GC laptop. A tutorial on how to do this has been developed. The ‘r4gc’ group has been created within GCcode, where the code, tutorials and other resources are gathered. It contains three main folders:
/codes. - This is where “raw” (not reviewed, unedited) R code contributed by the GC community is uploaded. Currently, this includes code for analyzing and visualizing PSES (Public Service Employee Survey) results, ATIP requests and COVID-19 statistics, as well as various scripts that ease day-to-day work and maintenance. Some of this code is ready to be turned into packages; some consists of short snippets taken from open-source textbooks, blogs, and question-and-answer portals such as www.stackoverflow.org and www.rseek.org.
/gc-packages. - This is where the work on packages being developed from the submitted “raw” codes is happening. Currently it includes repositories for building packages to process PSES results, COVID-19 data, and the utility functions package for data engineering and efficient data processing.
/resources. - This is where the rest of the knowledge base is gathered, including the tutorials, slides, and code presented at the community's weekly ‘Lunch and Learn’ meetups.
GCcollab: R4GC (Use R)
URL: https://gccollab.ca/groups/profile/7391537/enuse-rfruse-r
GCcollab allows registered users to participate in the discussion from both within and outside the GC network. This makes it convenient for gathering information from any source, including those that may not be available from within the GC network. To facilitate the curation of knowledge, a number of discussion threads have been created there to address the topics of highest interest to the R4GC community. These are reviewed and updated regularly, commonly as part of the community's weekly meetups.
GCwiki: UseR!
URL: https://wiki.gccollab.ca/UseR!
This platform is used to consolidate all discussion topics in one place and link them with other data science resources in the wiki space.
GitHub: open-canada
URL: https://open-canada.github.io/UseR/
In line with the GC policies of open science and open data, and since most information gathered by the R4GC community is unclassified and comes from the public domain, a public-facing organizational account has been created on GitHub (https://github.com/open-canada) for sharing and growing the R4GC community knowledgebase. This is where public-facing community outputs are gathered, including the growing collection of Web Apps and code built with contributions from GC data scientists using open source tools and data, located at https://github.com/open-canada/Apps.
Community presentations
International Methodology Symposium (October 2021)
The work of the R4GC community was presented at the 2021 International Methodology Symposium organized by Statistics Canada in October 2021. The slides for this presentation are available in English and en français.
GC Data Conference (February 2021)
The earlier work of community members was also presented at the 2021 GC Data Conference Data Literacy Fest Workshop “Data Engineering Challenges and Solutions: Demo of Shiny” in February 2021. The video recording of this workshop was made available on YouTube by the conference organizers.
Book source code
This book is built using the bookdown R package in RStudio. It is hosted on the Open Canada GitHub site at https://open-canada.github.io/r4gc. Its source code is located at https://github.com/open-canada/r4gc.
Thus built, the book enables easy collaboration, transparency and peer-reviewing.
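For readers who would like to try building the book locally, the following is a minimal sketch rather than the editors' documented procedure. It assumes that the source has already been downloaded from the repository above, that the bookdown package is installed, and that the entry file is named `index.Rmd` (the usual bookdown convention, not confirmed here). The helper name `build_book` is hypothetical.

```r
# Sketch only: build_book() is a hypothetical helper, not part of bookdown.
# Assumes book_dir holds a local copy of the book source and that its
# entry file is index.Rmd (an assumption based on bookdown conventions).
build_book <- function(book_dir = ".", input = "index.Rmd") {
  if (!requireNamespace("bookdown", quietly = TRUE)) {
    stop("Install the bookdown package first: install.packages('bookdown')")
  }
  # Render from inside the book directory, then restore the working directory.
  old_wd <- setwd(book_dir)
  on.exit(setwd(old_wd), add = TRUE)
  # Produce the HTML (gitbook) version of the book.
  bookdown::render_book(input, output_format = "bookdown::gitbook")
}

# Example call (not run here; the path is a placeholder):
# build_book("path/to/r4gc")
```

In RStudio, the "Build Book" button in the Build pane is typically an equivalent route when the project is recognized as a bookdown project.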
Disclaimer
The views and opinions expressed in this book are those of the authors and do not necessarily reflect the official policy or position of any agency of the Government of Canada.
Contributors
R4GC is a collaborative effort of many people who have contributed to the development of the knowledgebase that is gathered in this book. They are listed below.
Jonathan Dench, Joseph Stinziano, Henry Luan, Eric Littlewood, Philippe-Israel Morin, Tony Machado, Maxime Girouard, Martin Jean, Tim Roy, Mehrez Samaali, Sylvain Paquet, Dejan Pavlic, Utku Suleymanoglu.
Additionally, much support has been received from the wider international R community through the stackoverflow.org portal and knowledge-sharing events organized by RStudio, as well as from several other Government of Canada employees who have remained anonymous.
Their help is greatly appreciated.
Licenses
This book is licensed under the Creative Commons Attribution 4.0 License, and is (and will always be) free to use.
Key principles
Open data, open source, open science
This book contains only information and knowledge obtained from the public domain, in keeping with Open Government principles.
Chatham House Rule
This book is prepared using the Chatham House Rule. The Chatham House Rule helps create a trusted environment to understand and resolve complex problems through dialog and timely open communication. Its guiding spirit is: share the information you receive, but do not reveal the identity of who said it. Hence, no attributions are made and the identity of speakers and participants is not disclosed. The book is based on the views and code contributed by community members as part of ongoing community events and interactions. Offered as a means to facilitate discussion, the document does not constitute an analytical document, nor does it represent any formal position of any organization involved.
How to contribute
Any chapter of this book can be edited by simply clicking on the “edit” button, which leads to the corresponding source Rmd file in the book’s repo. There you can make a change to the document (in doing so, the repo will be forked to your GitHub account) and submit it to the book editor by opening a pull request (GitHub's equivalent of a merge request). Alternatively, you can always contact the R4GC group lead at the contact listed below and attend the R4GC weekly meetups.
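As a rough illustration of where the “edit” button leads: GitHub's web editor is reached through URLs of the form `https://github.com/<owner>/<repo>/edit/<branch>/<path>`. The helper below is hypothetical and only assembles such a URL; the branch name `main` and the file name `index.Rmd` are assumptions, not details confirmed by the book's repo.

```r
# Sketch only: edit_url() is a hypothetical helper for illustration.
# The default branch ("main") and the chapter file name are assumptions.
edit_url <- function(rmd_file, repo = "open-canada/r4gc", branch = "main") {
  # GitHub's web editor lives at /<owner>/<repo>/edit/<branch>/<path>.
  sprintf("https://github.com/%s/edit/%s/%s", repo, branch, rmd_file)
}

edit_url("index.Rmd")
# "https://github.com/open-canada/r4gc/edit/main/index.Rmd"
```

Passing the result to `utils::browseURL()` would open it in a browser. For a signed-in user without write access to the repo, GitHub then offers to fork it, as described above.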