The R4GC Book

32 Canada-related Open source R codes and packages

32.1 Packages

32.1.1 Packages on CRAN

One can go to https://packagemanager.rstudio.com/ to search and explore packages.

Install them from Tools -> Install Packages in RStudio, or directly using the line below:

install.packages('cansim')
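When several of the CRAN packages listed in this chapter are needed at once, it can be convenient to install only the ones that are missing. The following is a minimal sketch using base R only; the helper name `missing_packages` and the `wanted` vector are illustrative and not part of any package.

```r
# Packages from this chapter that the text lists as being on CRAN
# (edit this vector to suit your needs).
wanted <- c("cansim", "cancensus", "statcanR", "canadamaps")

# Hypothetical helper: return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(wanted)

# Only install in an interactive session, so that sourcing this script
# in a batch job never triggers a download by surprise.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Guarding the call with `interactive()` is a design choice for this sketch; remove it if the script is meant to provision a fresh machine unattended.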

32.1.1.1 rgovcan - Easy Access to the Canadian Open Government Portal

https://github.com/open-canada/rgovcan

An R package to interact with the Open Canada API (see https://open.canada.ca/en/access-our-application-programming-interface-api), to search and download datasets (see the licence at https://open.canada.ca/en/open-government-licence-canada). It is our hope that we will be able to bring this package up to the standard of an rOpenSci package (see this issue on ropensci/wishlist: https://github.com/ropensci/wishlist/issues/27).

This package makes extensive use of ckanr to access the Canadian government’s CKAN REST API.

library("rgovcan")

# Search the portal by keyword and keep the first 10 matching records
dfo_search <- govcan_search(keywords = c("dfo"), records = 10)
dfo_search

# Another possibility is to start with a package id corresponding to an actual record
id <- "7ac5fe02-308d-4fff-b805-80194f8ddeb4" # Package ID
id_search <- govcan_get_record(record_id = id)
id_search

# List the resources (files) attached to a record or to a set of search results
id_resources <- govcan_get_resources(id_search)
id_resources
dfo_resources <- govcan_get_resources(dfo_search)

# Download the resources of the single record into a local folder
path <- "tmp/data/"
dir.create(path, recursive = TRUE, showWarnings = FALSE)
govcan_dl_resources(id_resources, path = path)
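
Since rgovcan builds on ckanr, a similar keyword search can be sketched with ckanr directly. This is a hedged illustration rather than the way rgovcan does it internally: the wrapper name `search_open_canada` is hypothetical, the portal URL `https://open.canada.ca/data/en` is an assumption about the CKAN endpoint, and the code assumes each returned record carries `id` and `title` strings. It needs the ckanr package and internet access.

```r
# Hypothetical wrapper around ckanr::package_search().
# The default `url` is an assumed CKAN endpoint for the Open Canada portal.
search_open_canada <- function(query, n = 5,
                               url = "https://open.canada.ca/data/en") {
  if (!requireNamespace("ckanr", quietly = TRUE)) {
    stop("Install ckanr first: install.packages('ckanr')")
  }
  # package_search() queries the CKAN "package_search" action
  res <- ckanr::package_search(q = query, rows = n, url = url)
  # Keep only the record id and title of each result
  data.frame(
    id = vapply(res$results, function(p) p$id, character(1)),
    title = vapply(res$results, function(p) p$title, character(1)),
    stringsAsFactors = FALSE
  )
}

# Run the query only in an interactive session (it requires network access)
if (interactive()) {
  print(search_open_canada("dfo", n = 10))
}
```

The ids returned this way are the same kind of package id that `govcan_get_record()` accepts above.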

32.1.1.2 CANSIM2R - Directly Extracts Complete CANSIM Data Tables

Extract CANSIM (Statistics Canada) tables and transform them into readily usable data in panel (wide) format. It can also extract more than one table at a time and produce the resulting merge by time period and geographical region.

32.1.1.3 cansim - Accessing Statistics Canada Data Table and Vectors

https://github.com/mountainMath/cansim

Searches for, accesses, and retrieves new-format and old-format Statistics Canada data tables, as well as individual vectors, as tidy data frames. This package deals with encoding issues, allows for bilingual English or French language data retrieval, and bundles convenience functions to make it easier to work with retrieved table data. Optional caching features are provided.
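
A minimal sketch of retrieving one table in both languages is shown here. The wrapper name `fetch_statcan_table` is hypothetical, and the table number "17-10-0005" is used purely for illustration; substitute any valid Statistics Canada table number. The code needs the cansim package and internet access.

```r
# Hypothetical wrapper around cansim::get_cansim().
fetch_statcan_table <- function(table_number, language = "english") {
  if (!requireNamespace("cansim", quietly = TRUE)) {
    stop("Install cansim first: install.packages('cansim')")
  }
  # get_cansim() downloads the table and returns it as a tidy data frame
  cansim::get_cansim(table_number, language = language)
}

# Run the download only in an interactive session (it requires network access)
if (interactive()) {
  tbl_en <- fetch_statcan_table("17-10-0005")                      # English labels
  tbl_fr <- fetch_statcan_table("17-10-0005", language = "french") # French labels
  print(head(tbl_en))
}
```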

32.1.1.4 cancensus

https://github.com/mountainMath/cancensus

See also censusmapper

32.1.1.5 https://github.com/mountainMath/tongfen

32.1.1.6 https://github.com/mountainMath/statcanXtabs

32.1.1.7 https://github.com/warint/statcanR

statcanR

Easily connect to Statistics Canada’s Web Data Service with R. Open economic data (formerly known as CANSIM tables, now identified by Product IDs (PID)) are accessible as a data frame, directly in the user’s R environment.

32.1.1.8 install.packages('canadamaps')

Terrestrial maps with simplified topologies for Census Divisions, Agricultural Regions, Economic Regions, Federal Electoral Divisions and Provinces.

32.2 Packages not on CRAN

32.2.0.1 https://github.com/VLucet/rgovcan

rgovcan: easy access to the Canadian Open Government Portal

32.2.0.2 https://github.com/bcgov/canwqdata

canwqdata: an R package to download open water quality data from Environment and Climate Change Canada’s National Long-term Water Quality Monitoring Data.

32.3 Codes on GitHub

32.3.1 Expenditures and procurement

32.3.1.1 https://github.com/nmarum/canadadefencespending

Data wrangling and exploratory analysis of publicly available data about National Defence expenditures and defence procurement.

32.3.2 Health and Environment

32.3.2.1 https://github.com/dbuijs/HealthCanadaOpenData

A project for importing open data from Health Canada’s website into R.

32.3.3 Elections

32.3.3.1 2019 Canadian election forecast

32.3.3.2 https://github.com/thisismactan/Canada-2019

This is the repository for the Election StatSheet 2019 Canadian election forecast.

32.3.3.3 2015 Canadian Election Data

https://github.com/lchski/canada-2015-federal-election-data

32.3.3.4 FPTP Election Strategizer

“FPTP Election Strategizer: A data-science driven tool to mitigate the political biases of the First Past The Post electoral system in Canada” - Research article, with R code and visualizations in support of Fair Vote Canada

https://github.com/ivi-m/election-strategizer/

32.3.4 Ottawa

32.3.4.1 https://github.com/whipson/Ottawa_Bicycles

Data and cleaning scripts for Ottawa Bicycle Counter shiny app.

32.3.4.2 https://github.com/lchski/ottawa-fire-stations

Mapping Ottawa’s fire stations

32.3.5 Vancouver

Many codes, packages, and blogs related to Vancouver and Canada, as well as other utility functions, are developed by MountainMath (Jens von Bergmann). Source: https://github.com/mountainMath

32.3.6 Other related packages

From jennybc

googlesheets: Google Spreadsheets R API

googleComputeEngineR: forked from cloudyr/googleComputeEngineR