May edition of Berkeley Datahub Newsletter
Get regular updates about the cutting-edge work happening with Berkeley Datahub and across the Berkeley Data Science Teaching Stack.
Datahub Product Updates:
Secure GitHub Authentication
gh-scoped-creds is a new tool that makes it simple to securely grant push access to GitHub from JupyterHub or High-Performance Computing (HPC) systems. A recent update to the Datahub infrastructure enables gh-scoped-creds. Here is a Jupyter blog post that provides the detailed rationale for this functionality and a short demo of securely pushing changes to GitHub. This tool was successfully deployed in the Stat 159 course (Collaborative and Reproducible Data Science) taught by Fernando Perez in Spring 2022.
Workshop Hub
The workshop hub (workshop.datahub.berkeley.edu) is the Datahub infrastructure set up to help instructors run data science workshops. It offers 1 GB of RAM, with basic Python packages pre-installed, to run the required computational workflows. Currently, Berkeley Unboxing Data Science (BUDS) uses the workshop hub to run its data science workshops with high school students during the summer. BUDS is a new Computing, Data Science, and Society (CDSS) summer program that immerses high school students in the world of data science and research.
Dashboarding via R Shiny
Shiny is an R package that makes it easy to build interactive web apps straight from R. The R Shiny server in Datahub can be used to build interactive dashboards. Here is an example of an R Shiny plot rendered in Datahub.
A short demo of a Shiny application is below.
If you would like to experiment with Shiny, you can use the examples in the shared GitHub repository: https://github.com/rstudio/shiny-examples
Nbgitpuller updates
Nbgitpuller allows pulling data from a public GitHub repository. In response to requests from multiple users, the recent update also allows nbgitpuller to pull data from a private GitHub repository. A GitHub app must be installed to enable nbgitpuller to fetch data from a private GitHub repository. Refer to this detailed description to enable nbgitpuller to pull content from a private GitHub repository.
Instructor Spotlight
Graduate Student Instructor: Elise LePage (a Ph.D. student working on Mathematical Physics problems with Professor Mina Aganagic)
Courses Taught: Physics 88 (Data Science Applications in Physics) with Professor Heather Gray
How did you leverage Berkeley Datahub and other tools within the Data Science Teaching Stack as part of your course?
I acted as a GSI for the Physics 88 course for a couple of semesters. This course can be considered an intro-level physics + coding course for undergrads. The profile of students taking this course is pretty diverse. A few students have a lot of experience with coding in Python, while many students lack that experience. The course workflow involves instructors uploading homework to the GitHub repository and distributing it as unique links. Students use those links to access their homework in Datahub.
Did the tools help improve student engagement and/or learning outcomes? If yes, how?
Using Datahub as part of Physics 88 saved students a lot of time and helped them avoid the frustration of setting up their local environments. It also saved precious in-person classroom time that would otherwise have gone to debugging students' local setups.
What are your biggest learnings using the stack?
Students get excited about engaging with datasets and visualizing Physics problems. In particular, they enjoyed exploring datasets from Physics-based experiments and Astro-related datasets. Some of the exciting problems that engaged the students in this class:
Visualize planetary orbits using numerical integration
Visualize pendulum data
Work on datasets involving scattering problems
Work on datasets involving collision and decay related problems
Students work in groups for their assignments and final projects. Currently, they are using Google Colab for teamwork-related stuff. Collaborating with other students via Jupyter notebooks is one of THE most important features I would expect to see in Datahub.
What would your advice be for faculty who intend to adopt these tools?
Datahub is a good idea for integrating data-related stuff as part of the coursework. The Jupyter Notebook format is a great way to visualize existing data. Keep the content less nebulous and work on interesting datasets that students want to learn more about to make the coursework engaging.