February Edition of the Berkeley Datahub Newsletter
Get regular updates about the cutting-edge work happening with Berkeley Datahub and across the Berkeley Data Science Teaching Stack.
Datahub Product Updates:
Real-Time File Sharing:
Are you a student interested in collaborating with fellow students on a team project? Or an instructor who wants to share assignment solutions or course materials with your teaching team? Datahub currently offers Dropbox-like functionality for sharing selected files and folders with collaborators, using a third-party application called Syncthing. Syncthing lets hub users collaborate in real time: you can create a file or folder and share it with collaborators on the same hub or on different hubs. Interested in exploring this feature? Refer to this documentation, which walks you through the process step by step.
Linux Desktop in Datahub:
Are you an instructor thinking about moving some of your labs that use Linux applications to the cloud? Datahub can run arbitrary Linux desktop applications on your hub instance. This means that any application that runs on a Linux desktop can run inside your browser, with reasonably usable latency. This is especially helpful when part of your workflow involves existing scientific software that is not web-friendly. This functionality is currently enabled in the EECS and Stat 159 hubs. EECS students use this environment to access some of the circuit-diagramming applications required for their labs. Refer to this recent post on how this works. Interested in exploring the desktop environment? Check it out here.
Secure GitHub Authentication:
Are you an instructor interested in introducing your students to GitHub and wondering how to do that securely via Datahub? One recent update to Datahub is secure GitHub authentication (thanks, Yuvi Panda). This means you can securely connect to your GitHub account from your hub instance. Why is this important? The primary reason is to give users better security as they interact with their GitHub repositories. The detailed goals for this feature are listed below.
Allow users on a JupyterHub to grant push access to specific repositories only, rather than to all of their repositories.
Do not store long-term credentials (like personal access tokens or SSH keys) on disk, as they may get archived or fall into the wrong hands in the future.
Give GitHub organization admins visibility into, and control over, what can be pushed to their repositories from remote systems. This matters because on a remote system (like a JupyterHub or a shared cluster), the files of users with push access to a repository may be accessible to that system's other admins, who could then use those credentials. This has serious implications for supply-chain security: credentials might be stolen or lost, and vulnerable or malicious code could be pushed to the repository.
This functionality is currently enabled in the Stat 159 hub.
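This newsletter does not describe how Datahub implements this feature, so the following is only a hypothetical sketch of the second goal above (keeping credentials off disk). It relies on git's real credential-helper protocol: git runs a helper, writes key=value lines such as "protocol=https" and "host=github.com" to its standard input, and reads "username=" and "password=" lines back. The function name and the environment variable GH_SHORT_LIVED_TOKEN are assumptions made for illustration; they are not part of Datahub.

```python
#!/usr/bin/env python3
import os
import sys
from typing import Optional


def answer_credential_request(request: str, token: Optional[str]) -> str:
    """Build a reply for git's credential-helper "get" action.

    `request` is the key=value text git writes to the helper's stdin,
    for example "protocol=https\nhost=github.com\n".
    An empty reply tells git this helper has nothing to offer.
    """
    fields = {}
    for line in request.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key] = value
    # Only answer HTTPS requests for github.com, and only when a token is available.
    if fields.get("protocol") != "https" or fields.get("host") != "github.com" or not token:
        return ""
    # GitHub accepts a token in the password field for HTTPS git operations;
    # "x-access-token" is the username convention used with GitHub App tokens.
    return f"username=x-access-token\npassword={token}\n"


if __name__ == "__main__":
    # git calls the helper as "<helper> get|store|erase". This sketch answers
    # only "get"; "store" and "erase" do nothing, so no token is written to disk.
    if len(sys.argv) > 1 and sys.argv[1] == "get":
        # Hypothetical: the short-lived token arrives via an environment variable.
        reply = answer_credential_request(sys.stdin.read(), os.environ.get("GH_SHORT_LIVED_TOKEN"))
        sys.stdout.write(reply)
```

To try the idea, one could save the script as an executable named `git-credential-shortlived` somewhere on the PATH and run `git config credential.helper shortlived`; git then prefixes the name with `git credential-` when it needs a password. Because the token only lives in the process environment, it expires along with the session rather than lingering in a file.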
Faculty Spotlight:
Instructor: Douglas Dreger
Course Name: EPS 130 (Strong Motion Seismology)
How did you leverage Berkeley Datahub as part of your course?
Our department, Earth and Planetary Science, wanted to expose our undergraduate students to computational tools early in their curriculum, helping them develop a skill set that supports their study of upper-division course topics and benefits them after graduation. Toward this goal, we developed a connector course called PyEarth (EPS88) that builds on the topics students learn in DATA8 and is geared toward the interests and needs of students who have declared, or are considering, a major in Earth Science. This would not have been possible without the Berkeley Datahub. A common issue when introducing computational tools in a physical science class that is not a “programming” class is the wide range of student experience and the even wider range of hardware and software that students have at their disposal. The Berkeley Datahub eliminates the latter by providing a computational environment that is accessible through the web and is therefore platform-independent. This frees up valuable instructor time that would otherwise go to helping students get their computers ready and redirects it to the lesson plans. Importantly for students, it removes the up-front frustration of problems with operating system, compiler, and environment setup, which can be a barrier to downstream learning.
I will add that for the EPS88 connector course, the structure and accessibility of the Datahub have made both fully remote and hybrid in-class instruction easy.
Did the stack help improve student engagement and/or learning outcomes? If yes, how?
I have started using Datahub in an upper-division seismology class (EPS130). In the past, homework for this class required computational analyses that could be carried out with Excel, Matlab, Mathcad, or even C, depending on the background of the student. That is, assignments required some level of data analysis, but the choice of tool was left to the student (while, of course, giving them ideas on how to approach the problem given their background). I am still in the process of reformulating the assignments, but I am finding that porting them to Python, run within a Jupyter environment on Datahub, has greatly improved the overall outcome: the more experienced students continue to do well, and the less computer-savvy students are also doing well on the assignments. Valuable time is therefore spent on the seismology lessons rather than on figuring out how to do something and whether Excel or Matlab should be used. The process of defining assignments on bCourses and then linking to the assignment on Datahub works very well. In addition, when demonstrating or teaching how to approach a problem, it is very useful for the instructor to work on exactly the same platform the students will use for the assignment.
Previous Workshops with Sample Notebooks from Domain Instructors:
Workshop with Political Science: (11/17/2021)
Political Science X Modules Slide Deck
Workshop with Sociology: (1/14/2022)
Sociology X Modules Slide Deck
Workshop with Civil and Environmental Engineering: (1/19/2022)
Civil and Environmental Engineering X Modules Slide Deck