Fall 22 edition of Berkeley Datahub Newsletter
Get regular updates about the cutting-edge work happening with Berkeley Datahub and across the Berkeley Data Science Teaching Stack
Team Update:
The Datahub team had a major transition during the Fall 22 semester. Tech lead Yuvi Panda transitioned out of the team after running the complex infrastructure for 5+ years, paving the way for Shane Knapp as the new technical lead. Shane previously supported Research Engineering in Electrical Engineering and Computer Science, and started running the day-to-day operations of Datahub in October 2022.
Product Updates:
New Hub deployments:
a11y hub: Accessibility is one of the key priorities for both the Datahub team and the Jupyter accessibility team. Different teams within the Jupyter community are currently making changes to both the JupyterLab and RetroLab distributions in order to make them accessible. The Datahub team deployed a new a11y hub to test accessibility-related changes and provide concrete feedback to the upstream Jupyter team. As part of this effort, the team installed the alpha version of Jupyter Notebook 7.0, which had updates to fix some of the common accessibility issues reported in previous versions of notebooks. The team tested the alpha version and recommended changes for the stable version of Notebook 7.0 to the upstream Jupyter team.
CEE hub: The team deployed a new Civil and Environmental Engineering hub to launch a Jupyter Desktop environment with QGIS for a Civil Engineering course named Engineering Geology. QGIS is a cross-platform, open source geospatial application that students will use for Geographic Information System (GIS) exploration as part of their coursework. Jupyter-QGIS is a tool built to deploy the Linux-native QGIS software in cloud-native Datahub environments.
New Features:
Support for private GitHub repositories: The team configured Datahubs such as the Biology and EECS hubs to launch Jupyter notebooks from private GitHub repositories using nbgitpuller links. The Datahub team advocates for an open source ethos across technology and curriculum so that many instructors and students can benefit from free access to resources. However, if instructors require private GitHub repositories for legal and compliance purposes, then support for private GitHub repositories can be enabled.
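For readers unfamiliar with nbgitpuller links, here is a minimal sketch in Python of how such a link is assembled. The hub address, organization and notebook path are hypothetical placeholders, not actual Berkeley values. Authentication for a private repository is assumed to be handled by the hub's configuration, which is not shown here.

```python
from urllib.parse import urlencode


def build_nbgitpuller_link(hub_url, repo_url, branch="main", urlpath=None):
    """Build an nbgitpuller link that pulls repo_url into the user's hub home.

    hub_url, repo_url and urlpath are supplied by the caller; the values
    used in the example below are hypothetical.
    """
    params = {"repo": repo_url, "branch": branch}
    if urlpath:
        # Where to send the user after the pull, e.g. a specific notebook.
        params["urlpath"] = urlpath
    # urlencode percent-encodes the repository URL and the path.
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


link = build_nbgitpuller_link(
    "https://hub.example.edu",                        # hypothetical hub
    "https://github.com/example-org/private-course",  # hypothetical repository
    urlpath="tree/private-course/lab01.ipynb",        # hypothetical notebook
)
print(link)
```

An instructor would share the printed link with students. Clicking it pulls the repository into the student's hub account and opens the named notebook.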
Database Deployment: The team deployed a PostgreSQL database in the Data 101 hub for a Data Engineering course. Students use this hub to connect their Jupyter Notebooks to the PostgreSQL database and perform database operations in Python as part of their assignments.
Shiny Applications: Shiny applications can be launched using the Shiny hub and the R Datahub. The goal of deploying Shiny applications is to support instructors (mainly from Social Science) who use the R programming language as part of their courses. A Political Science course named “Intro to Empirical Analysis and Quantitative Methods” currently deploys Shiny dashboards as a pedagogical tool.
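As a rough illustration of what a database operation from a notebook looks like, the sketch below uses Python's DB-API pattern (connect, execute, fetch). To stay self-contained it uses the standard library's in-memory sqlite3 as a stand-in rather than PostgreSQL, and the table and rows are made up for the example. A PostgreSQL driver such as psycopg2 follows the same pattern, but connects with the server's host and credentials and uses %s instead of ? as the placeholder.

```python
import sqlite3


def rows_at_or_above(conn, min_score):
    """Return (name, score) rows with score >= min_score, highest first."""
    cur = conn.cursor()
    # Parameterized query: the driver substitutes the value safely.
    cur.execute(
        "SELECT name, score FROM grades WHERE score >= ? ORDER BY score DESC",
        (min_score,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows


# Stand-in database: in-memory SQLite with hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?)",
    [("ada", 91), ("grace", 84), ("alan", 77)],
)
conn.commit()

result = rows_at_or_above(conn, 80)
print(result)
```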
Canvas-based authentication: Most of the Datahubs deployed at Berkeley will start using bCourses (Canvas) based authentication during Spring 23. Users will log in using their bCourses credentials to launch Datahub. There are two reasons for making this change:
a) Improve the stability of the hubs by removing single points of failure, and
b) Allow the team to collect course enrollment and user data more extensively, which can improve support practices.
Repo2JupyterLite Action: JupyterLite is a JupyterLab distribution that runs entirely in the web browser, backed by in-browser language kernels. Setting up JupyterLite for a course doesn’t require an extensive understanding of the cloud or of back-end infrastructure. Repo2Jupyterlite-action is a GitHub Action built recently by Yuvi Panda to publish a GitHub repository full of interactive notebooks statically to GitHub Pages with JupyterLite.
UX Improvements:
When a large number of students try to access Datahub concurrently during peak load times (such as an assignment deadline), kernels may take a long time to load. Previously, students had no way of knowing how long they would need to wait for a kernel to launch, which could be frustrating for anyone hitting the delay close to a deadline. With the recent infrastructure update, the Datahub loading page now shows students a helpful message about the time it takes for kernels to launch. The team hopes this change will reduce that frustration.
Instructor Spotlight:
Instructor Name: Ryan Edwards
Course Taught: Econometrics (Econ 140)
Semester: Fall 22
Ryan Edwards used the R Datahub to launch Jupyter Notebooks with R kernels. Ryan taught a class of 400+ undergraduate students during Fall 22. He is one of the few instructors who used R-based Jupyter Notebooks instead of RStudio as a pedagogical tool for a relatively large class. He had a positive experience using Datahub, which he tweeted about some time ago:
I just finished teaching #Econometrics to 400 undergraduates at Berkeley. I ran it in R using Jupyter notebooks running on Berkeley's Datahub, an elegant and flexible solution for hands-on learning with 400. The pedagogical approach is fully modern and applied, a perfect fit for Berkeley undergraduates, many of whom have seen Data Science using similar building blocks. But MM places causal inference front and center, training the next generation of economists and data scientists.
Scheduled Workshops:
A Jupyter community workshop with an explicit focus on education is scheduled for January 24 to 26, 2023, in Paris. You can find more details using this link.