The R4GC Book

29 Open Ontario Data

29.0.1

COVID-19 cases in hospital and ICU, by Ontario Health (OH) region: https://data.ontario.ca/datastore/dump/e760480e-1f95-4634-a923-98161cfb02fa?bom=True
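The link above is a direct CSV dump, so it can be read straight into R. The following is a minimal sketch in base R, not a definitive recipe: the helper names `ontario_dump_url()` and `read_ontario_dump()` are hypothetical, the URL pattern is simply copied from the link above, and the column layout of the file is not assumed (the code only inspects whatever comes back).

```r
# Build a datastore "dump" URL from a resource ID.
# Assumption: the URL pattern is the one used by the link above.
ontario_dump_url <- function(resource_id, bom = TRUE) {
  url <- paste0("https://data.ontario.ca/datastore/dump/", resource_id)
  if (bom) paste0(url, "?bom=True") else url
}

# Hypothetical helper: download the CSV and return a data frame,
# or NULL (with a message) if the download or parsing fails.
read_ontario_dump <- function(resource_id) {
  url <- ontario_dump_url(resource_id)
  tryCatch(
    # "UTF-8-BOM" tells R to drop the byte order mark requested by ?bom=True
    read.csv(url, fileEncoding = "UTF-8-BOM",
             stringsAsFactors = FALSE, check.names = FALSE),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Resource ID taken from the link above
hosp <- read_ontario_dump("e760480e-1f95-4634-a923-98161cfb02fa")

# Inspect the structure only if the download succeeded
if (!is.null(hosp)) str(hosp)
```

Returning `NULL` on failure keeps a script or R Markdown report from stopping when the portal is unreachable; for large files, `data.table::fread()` is a faster alternative to `read.csv()`.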

Deaths by cause, sex and age: https://data2.ontario.ca/en/dataset/deaths-by-cause-sex-and-age (status note on the record: "We are reviewing the data in this record to determine if it can be made open.")

Vital statistics datasets (catalogue search): https://data.ontario.ca/en/dataset?q=vital+statistics

COVID-19 vaccine data in Ontario: https://data.ontario.ca/en/dataset/covid-19-vaccine-data-in-ontario