Archived Data Sets

February 10, 2025 (updated February 12, 2025)

Last week saw a flurry of messages about how to find archived data sets. This is the list of resources and links from those messages. The bulk of this list came from the Data Rescue Project (@datarescue2025.bsky.social) and was shared by Melissa Haendel. Please check the Data Rescue Project page for new updates. The Data Rescue Project now has a homepage: https://www.datarescueproject.org/about-data-rescue-project/

Larger and Established Data / Website Efforts

End of Term Crawl

The main coordinated effort to archive websites
Datasets have been more of a challenge, especially data embedded in databases.

EDGI

They focus on environmental data and are a good organization to follow for updates.
They work with the Public Environmental Data Project (see below).

Public Environmental Data Project

A coalition committed to preserving and providing public access to federal environmental data.
January 31, 2025 – CDC’s Social Vulnerability Index and Environmental Justice Index
January 24, 2025 – Council on Environmental Quality EJScorecard
January 24, 2025 – Climate and Economic Justice Screening Tool

Harvard’s Library Innovation Lab Team

They have been focusing on data.gov and released their data on Feb 6, 2025: https://lil.law.harvard.edu/blog/2025/02/06/announcing-data-gov-archive/
- #SafeguardingResearch is in contact with them to mirror data on servers outside US jurisdiction

ICPSR

Overview of ICPSR’s data rescue activities to date:
- Downloaded ~2800 files from various sources requested by researchers; all the files ICPSR collected will soon be available via a dropbox link.
- Examining CDC data dump from archive.org to assess what might be missing.
  - Ideally, this will also be a resource for those looking for data to see what is and isn’t available.
- ICPSR staff and allies are generating metadata for each of the datasets we have so that we can make them available through an existing archive at ICPSR (DataLumos, openICPSR, or the Resource Center for Minority Data, depending on our timeline and some technical issues we’re working out).
ICPSR DataLumos – They have older versions of a lot of major data, including a recent addition from the CDC.

IPUMS

They have data and have been working on cataloging efforts.
A notification went out yesterday that they will share more soon.

Dryad

Generalist repository available to help with data publication, storage, and preservation.

Synapse

Generalist biology and biomedical data repository available to help with data publication, storage, and preservation.

Silencing Science Tracker

Joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund.
Tracks government attempts to restrict or prohibit scientific research, education or discussion, or the publication or use of scientific information.

OSF

Generalist repository for archiving, sharing, and storing all types of research outputs, not limited to preprints or only data.
OSF is available as an option for pre-prints of articles if, for some reason, they cannot be posted on official sources.
Many universities also have institutional repositories where research (articles, data, dissertations, etc) from that institution can be posted. They also have preservation mandates. An example is Penn’s ScholarlyCommons.

The Climate Mirror Project

Has NOAA data pulled during the 2017 data rescue.

Open Energy Data Initiative

A volunteer pointed out that “key equity data” is missing from the Dept. of Energy and reported being able to find it on this site. The site also includes additional data from DOE.

Wayback Machine

The Wayback Machine is an initiative of the Internet Archive, a 501(c)(3) non-profit, building a digital library of Internet sites and other cultural artifacts in digital form. Other projects include Open Library & archive-it.org.
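For readers who want to check whether a page was captured, here is a minimal sketch in Python that queries the Wayback Machine's public availability endpoint (https://archive.org/wayback/available) for the snapshot closest to a given date. The helper names (build_query, closest_snapshot, lookup) are illustrative, not part of any official client, and the example URL and date are placeholders chosen to match the CDC captures mentioned elsewhere in this list.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the availability query for a page, optionally near a timestamp.

    timestamp is a string such as YYYYMMDD or YYYYMMDDhhmmss.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the URL of the closest archived snapshot, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, timestamp=None):
    """Query the endpoint over the network and return the snapshot URL, if any."""
    with urlopen(build_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup("cdc.gov", "20250128") would, assuming network access and an existing capture, return the address of the snapshot nearest that date; it returns None when no capture is reported.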

Data Rescue Events

University of Washington-based Data Rescue
- Hosted by the University of Washington Center for Advances in Libraries, Museums, and Archives (CALMA), this series of data rescues followed the model from 2017. The spreadsheet of data reviewed at the events is available: Data Tracking List – Data Rescue 2025 (Responses).xlsx
- It is unclear if they are hosting more.
Healthy Regions Policy Lab at UIUC
- https://emails.illinois.edu/newsletter/02/615978402.html
- Includes CDC, EPA, and HRSA Data
Stanford’s Big Local News
- They are running a federal data collection collaborative

Smaller/Ad Hoc Rescue Efforts/ Data Archiving Activists

UCSB LSIT Data Mirroring
- Mirrored and archived public data on locally hosted git server
- Includes retrieved data sets from CDC, NIH, and NOAA
CDC Page on Internet Archive
- A special archive created on IA of all CDC datasets publicly available as of January 28, 2025
- uploaded by DataHoarders (we think)
Datasets in Dataverse
- Data uploaded by the Climate Change and Health Research Coordinating Center (CAFE)
  - CAFE is looking for a location, potentially outside the US, to duplicate the contents of their collection
- Includes CDC’s Social Vulnerability Index data.
- Most of what’s being placed here is data focusing on health and the environment.
- The DataRefuge collection from the 2017 DataRefuge initiative can be opened for more deposits
Safeguarding Research
- Organizer is Henrik Schönemann; https://fedihum.org/@lavaeolus
- There is a forum: https://safeguarding-research.discourse.group/ (admin = Henrik)
  - Based in the EU, USA and globally – they have access to 1-2 PB of storage (with more on the way) and people willing to seed
  - Currently, we’ve got around 1TB of data backed up
    - Including >100.000 PDFs from academia.edu (“transgender”, “Queer Studies”, “intersex”, “nonbinary” etc. – see the forum for the full list)
    - 350GB web archive of CDC, including all 30.000 files from archive.cdc.gov, and much more
    - “We’re working on providing a central index of archives, with metadata about who archived what, when, to be disseminated widely alongside torrent files and act as both a central point of coordination for archivers to assess what new work is needed, and a mass distribution channel.”
  - Possible contact to CERN, will update asap
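The central index quoted in the Safeguarding Research entry above (metadata about who archived what, and when) could take many forms. As a rough sketch only, and not the project's actual format, one entry might be modelled in Python as below; every field name here is an assumption made for illustration, and the SHA-256 checksum is an addition that lets mirrors verify their copies.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ArchiveRecord:
    """One hypothetical index entry: who archived what, and when."""
    source_url: str    # where the data originally lived
    archived_by: str   # person or group that made the copy
    archived_on: str   # ISO 8601 date of the capture
    sha256: str        # checksum so mirrors can verify their copy
    size_bytes: int


def make_record(source_url, archived_by, archived_on, content):
    """Describe a captured file (given as bytes) as an index entry."""
    digest = hashlib.sha256(content).hexdigest()
    return ArchiveRecord(source_url, archived_by, archived_on, digest, len(content))


def to_index(records):
    """Serialize entries as JSON, sorted by source URL for stable diffs."""
    ordered = sorted(records, key=lambda r: r.source_url)
    return json.dumps([asdict(r) for r in ordered], indent=2)
```

A plain JSON file like this is easy to distribute alongside torrent files and to compare between groups, which is the coordination role the quote describes.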
Data Hoarder
- A reddit community that is coordinating efforts to rescue data.
Data Hoarding
- Index of resources and archives related to data hoarding, web archival and self-hosting.
ArchiveTeam Warriors
- They run a distributed crawler. Anyone can install it to help contribute.
- US Federal Data page
- Data is uploaded to Archive.org by volunteers
Data Liberation Project
- Note: It looks like the project may have stalled in September 2024. Send info if you know more about them.
- Run by BigLocalNews and MuckRock, which are good groups to follow.

Tools for Data Rescues

DCN Curating Data for Data Rescues
- Provides key insights for curating data and the types of questions that need to be asked.
Data Management Checklist For Data Rescues (from MIT)
- Checklist to assist with curating data rescue efforts.
#RStats package from @ropensci.org
- gitcellar downloads and archives all repos, issues, and PRs from a GitHub organization in one shot: docs.ropensci.org/gitcellar/
WebRecorder.net
- According to an email: has archived 8TB+ of government sites, some from the End-of-Term-Archive seed list, some from EDGI Slack requests, and many sites independently
ArchiveBox.io
- According to an email: has also archived government datasets from data.gov, CIBP, USCIS, NOAA, NASA, NSIDC, and more
Awesome-datahoarding
- Provides a list of tools for web harvesting, etc.
Awesome Web Archiving
- Another curated list of web archiving tools
DataRescue Workflow
- This is the workflow from the original data rescue/DataRefuge project in 2017.
- Many of the tools are no longer working, but the workflow is still useful. UW used this to create their workflow above.
- The challenge with the original project was where to store and how to make discoverable the large amounts of data captured.
- Part of this effort is also housed in the Harvard Dataverse Repository and can be opened for more data deposits
- There is a CKAN instance with some of the 2017 data.
https://govdiff.com/
- Tool created by Jerome Paulos to show side-by-side changes in government websites.
How You Can Help Archive U.S. Government Data Right Now: Install Archive Team Warrior
- This is a reddit post, but it lists instructions for how to archive and the tools needed to be able to contribute. Figured it would best be categorized here.

Library Guides to Data Rescues

American Univ: https://subjectguides.library.american.edu/data_rescue (Now shared through Springshare)
Univ of MN: https://libguides.umn.edu/govpubs/admin
Salem State: https://libguides.salemstate.edu/datapreservation
Butler: https://libguides.butler.edu/archiveddatasources
Hamilton: https://libguides.hamilton.edu/c.php?g=132443&p=10779226
Albany: https://libguides.library.albany.edu/c.php?g=1450281&p=10779581
GODORT: https://godort.libguides.com/c.php?g=1450475&p=10780944

Articles on current efforts

Call to arms: What government information librarians can do to help save critical federal information from being lost – Blogpost from FGI (Free Government Information)
Why EDGI is Archiving Public Environmental Data – blog post from EDGI
Preserving federal health data – by The Journalist’s Resource out of the Harvard Kennedy School
- As the US government removes health websites and data, here’s a list of non-government data alternatives and archives – by The Journalist’s Resource
Archivists Work to Identify the Thousands of Datasets Disappearing from Data.gov – by 404 Media; interviews with EOT and James Jacobs
The scramble to back up CDC.gov – by Garbage Day; mentions some coordinating efforts by Health Professionals and Journalists to gather the CDC data
Lending a hand with EOT Crawl – blog post from the PEGI Project.
As the Trump admin deletes online data, scientists and digital librarians rush to save it – Salon Magazine. Talks about EOT.
Three Efforts to Preserve Government Data as a New Trump Administration Approaches – Union of Concerned Scientists
What’s at Stake if the Data at Federal Agencies Disappears? – Union of Concerned Scientists
Researchers rush to preserve federal health databases before they disappear from government websites – by The Journalist’s Resource

Articles for context

CDC Site Restores Some Purged Files from NYT
Thousands of U.S. Government Web Pages Have Been Taken Down Since Friday by Ethan Singer
The Government Information Crisis Is Bigger Than You Think It Is blog post by Free Government Information
CDC removes gender, equity references in public health material from WaPo
BREAKING NEWS: CDC orders mass retraction and revision of submitted research across all science and medicine journals from Inside Medicine
A Look at Federal Health Data Taken Offline from KFF
As Data Goes Off-Line Under Trump, Environmental Researchers Are Uploading Backups from Inside Higher Ed
The mad dash to protect environmental data from Donald Trump from The Verge
Some federal health websites restored, others still down, after data purge from VPM
Trump orders USDA to take down websites referencing climate crisis from The Guardian

Existing Alternative Data Sources

Thanks to Brianne Dosch for suggesting the section and some of the bullets.

PolicyMap – offers a free tier that can be used to view basic information down to the tract level, but more detailed data and functionality require a subscription; available at some universities
- Purged Federal Agency Data Available
FRED – They have some demographic data as well; free and open source
Census Reporter – is a free, open-source platform focused on making American Community Survey (ACS) data more accessible, including the recent upload of the 2022 1-Year ACS data
Esri – for mapping users, the GIS vendor publishes several U.S. Census Bureau data sets, including the ACS, through its ArcGIS Online Platform
IPUMS – Even when the government operates normally, many analysts turn to Minnesota Population Center products to access ACS, Current Population Survey microdata and Decennial Census data
Social Explorer – historical Census data and more; available at some universities
SimplyAnalytics – has internally processed American Community Surveys; available at some universities
American College of Obstetricians and Gynecologists – Hosting copies of immunization schedules and contraceptive use guidance from the CDC
https://www.ebi.ac.uk/ena/browser/home – The European Nucleotide Archive (ENA) provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. Mirrors SRA public data

Economic Indicators

National League of Cities: Federal Grant Navigation Equity Dashboard
- This tool aggregated data from many sources – it seems to still be able to categorize disadvantaged communities (by environmental and economic standards), as well as other critical data denotations that are increasingly hard to access
ALICE Economic Vitality Dashboard and Report (2022 w/ 2024 update)
- This resource specifically provides data on work, housing, and community resources for households below the ALICE threshold (Asset Limited, Income Constrained, Employed). The data is provided by the U.S. Census Bureau’s Public Use Microdata Sample (PUMS, 202!)
National Equity Atlas Dashboards
- A data and policy tool that provides a detailed report card on racial and economic equity – this tool can provide a holistic Racial Equity Index snapshot of communities. The Atlas draws its data from a unique regional equity indicators database developed and maintained by two private institutions: PolicyLink and the USC Equity Research Institute (ERI).

Public Health

County Health Rankings & Roadmaps (CHR&R)
- A program of University of Wisconsin’s Population Health Institute, this data tool aims to highlight the symbiotic nature of health and equity by factoring in physical environment, social and economic indicators, clinical care, and health behaviors to health outcomes.
  - They also recommend these additional health data platforms:
  - America’s Health Rankings report is a health assessment tool based on state-level health indicators.
  - Congressional District Health Dashboard pulls together local data on the health and well-being for each congressional district.
City Health Dashboard
- From NYU Langone Health, this platform provides 40+ measures of health and factors affecting health across five areas (Health Behaviors, Social and Economic Factors, Physical Environment, Health Outcomes, and Clinical Care) for 970+ cities across the U.S.

Biocuration 2019 – Workshop Reports

June 21, 2019 (updated June 26, 2019)

GREEKC

The COST Action GREEKC held a workshop inviting community feedback on its work to align efforts to curate, standardize, archive and share information about the regulation of gene expression. A status report was presented by the Work Group leaders, and feedback on the organisation of the next events was received. Some of the feedback that GREEKC needs can still be given through these surveys: “The Work of Curators” and “The Experience of Curators”. One of the main discussion points concerned a re-design of the Sequence Ontology, and a comprehensive set of term requests necessary to annotate the regulatory genome is now being worked on with the SO team (Eilbeck group, Utah). The status of the SO was further discussed with a much wider group of users within the Biocuration community, at an impromptu lunch discussion later during the Biocuration 2019 main event. We hope to be able to present a significantly updated SO at our upcoming workshops, the first week of November 2019.

The IMEx Consortium of Molecular Interaction databases

The IMEx Consortium is a collaboration between interaction databases willing to share data and curation effort. This workshop focused on the coordination and further definition of curation practices. Topics covered were curation coordination tools such as IMExcentral and targeted curation practices, glycan-related physical interactions, nucleic acid-protein interactions and the influence of variation upon interaction outcome. In a joint session with the GREEKC community, transcription factor-target gene interactions and causal relationships were also discussed, developing already active areas of collaboration between the two communities on the representation of this type of data. If you are interested in contributing to the work of IMEx, contact us at intact-help@ebi.ac.uk.

Practical ontology applications, tooling and interoperability best practices for FAIRification

This workshop provided an interactive introduction to FAIRification and interoperability best practices in the context of ontology services and semantic web technologies such as the OBO Foundry, ontology service suite at the EMBL-EBI and ELIXIR’s Recommended Interoperability Resources (RIRs). The day started with a general introduction to interoperable data management and FAIR principles before a series of talks and practical demonstrations on resources including the OBO Foundry in general, and specifically OBO core, the single cell expression Atlas (scAtlas), the EMBL-EBI Ontology Tooling Suite and a number of ELIXIR Recommended Interoperability Resources (RIRs) such as FAIRsharing (slides), InterMine and BridgeDb. The workshop concluded with an open-floor discussion on the needs of the biocuration community with respect to FAIR resources and ontologies, and ontology and FAIRification best practice.

Curating Evidence for Gene:Disease Validity for Clinical Omics

Three Gene Curation Coalition (GenCC) member groups (Genomics England PanelApp, ClinGen and Orphanet) presented an overview of their gene curation strategies and focus, leading to dialogue about the merits and challenges of each approach. The conversations reinforced some of the challenges we face in performing manual curation of gene:disease associations, and the rules we have in place to ensure consistent annotation. We reviewed where we could most benefit from incorporating additional ontologies and mappings into our resources, and areas that required further clarity; it quickly became apparent that even the term ‘panel’ can be ambiguous given its different use between resources: are we talking about a panel of genes, or a panel of people? Curators are already familiar with the need for consistent curation terminology, and the workshop provided the perfect opportunity to poll attendees for their views on clinical evidence descriptions. We were then able to demonstrate how the recent efforts of the GenCC to establish consensus terms for validating gene:disease associations will allow us to work together and allow efficient data sharing. Overall, we hope that the workshop provided an insight into the roles and diversity of data curation in the clinical setting.

Equality, Diversity, and Inclusion (EDI)

The introductory slides explained what these terms mean and how they are being embraced by scientific institutes in different countries. This was followed by a more in-depth presentation by the invited speaker Dr Saher Ahmed, head of EDI at the Wellcome Sanger Institute, Cambridge, UK, who discussed gender discrepancies in the workplace, and highlighted some efforts at Sanger to address these issues, such as pay transparency, changes to their leave policies, and creating a family-friendly workplace. The remaining time was spent exchanging views on the gender pay gap, maternity, paternity & carers leave, cultural differences in working practices and accessibility. As an outcome of this workshop, attendees agreed there is a need for the ISB to create an EDI subcommittee and that this workshop should be held at subsequent Biocuration meetings. The EDI subcommittee is currently being formed, and the exact roles are to be defined, but they will address issues including a code of conduct amongst the Society as a whole and at conferences, and accessibility at conferences and for ISB activities.

“Not Everything That Counts Can Be Counted” – how biological resources should be evaluated

As scientific data output continues to grow, biological resources are increasingly critical for data discoverability and re-use. However, many highly-used biological resources find it increasingly difficult to secure and maintain funding. This discordance implies that the value of curated resources to the research infrastructure is still not fully appreciated in the wider scientific community, or that the money saved by curated resources is not fully factored into funding models. In this workshop, we hoped to address questions surrounding this disparity. A short introduction to the issues was provided by Marc Gillespie. A funder’s perspective was provided by David Carr (Wellcome Open Research). Jo McEntyre (EBI, Literature Services, UK) provided an overview of the Elixir indicators designed to evaluate resource quality in a standardized way. Two major priorities emerged from the discussion. Firstly, knowledge bases not only capture data but also synthesize new knowledge. The differences in requirements for evaluating archive and knowledge-base databases should be made clearer. Secondly, there is an urgent need to educate the scientific community and funding bodies about the hidden work and benefits of data curation. Suggestions, ideas and recommendations gathered during the conference and post-meeting are documented here, and we encourage curators to add further ideas, with an aim to develop into an ISB position paper during 2019-2020.

Biocuration in Industry

The Biocuration in Industry workshop was organised by Jane Lomax (SciBite) and Yasmin Alam-Faruque (Eagle Genomics) with an aim to discuss the experiences of, and challenges faced by, non-academic biocurators. The workshop attracted ~100 participants, with most coming from academia. The workshop began with short talks from commercial companies, including Nebion, Hoffmann-La Roche, Healx and Eagle Genomics, who described their curation pipelines, standards and scientific interests, which included cancer immunotherapy, microbiomes and Parkinson’s disease. A common theme was the use of public standards and ontologies, emphasising the importance of key resources such as MONDO, GO, HPO and MeSH to aid drug discovery and knowledge management. This also came through in the subsequent panel discussion where the panel agreed that, in order to maintain the high quality of these resources, there is an onus on the commercial sector to contribute back improvements to these open-source efforts. The main challenge for the panel, as in the academic sector, is data cleansing to create high-quality and reproducible datasets for downstream processes. However, this was seen as a valuable, and transferable, skill for biocurators as the biomedical industry increasingly recognises the need for clean data.

The Phenotypes Traversing All the Organisms (POTATO)

The POTATO workshop is part of an ongoing effort to reconcile phenotype ontologies across species. This, the second workshop in the series, brought together 24 curators and ontology developers from a variety of backgrounds including representatives of many important groups in the phenotype curation space: Monarch Initiative, the Alliance of Genome Resources, ZFIN, PomBase, dictyBase, PHIBase, GO, SGD, HPO, FlyBase, MGI, Phenoscape and more. The Phenotype Ontology Reconciliation Effort aims to align phenotype ontologies using a common set of design patterns. These design patterns depend on a variety of external ontologies including the Phenotype and Trait Ontology (PATO) and the multi-species anatomy ontology, Uberon. The workshop included training in editing these two ontologies. It also featured an extended session to develop a strategy to deal with shortcomings and current limitations of PATO and its usage, as identified by the Phenotype Ontology Reconciliation group. During this session, focus groups discussed a number of PATO related issues, including how to improve PATO definitions in general and how to improve PATO representation of increased and decreased amounts (including absence), frequencies and rates. A number of edits to PATO have already been implemented as a result of this work. The results of the discussion are currently being written up as a meeting report, which will guide future improvements to PATO.

Data Licensing Workshop

The data licensing workshop at Biocuration 2019 was focused on helping scientists to understand important factors in the selection of a data license, as well as the implications of that selection on downstream use and reuse. We had a diverse line up of speakers who each shared their unique perspective — data owners, data miners, and a legal expert — followed by a robust discussion among all participants. The goal of the workshop was not to achieve consensus on the “best” license, but rather to share experiences, perspectives, and questions.

Mapping the Landscape of Biocuration

This well-attended pre-conference workshop asked questions such as: What is the state of biocuration in 2019? Where are biocurators based? What are their skills and levels of expertise? What training do they need? What are the tools they use? As part of an ELIXIR Implementation Study, members of EMBL-EBI, FAIRsharing.org/Oxford and SIB ran a survey to capture information on biocurators and the resources they run, the life science/health domains they operate in, and their expertise and training requirements. In the workshop we described the current biocuration landscape, and ran an interactive session to compile feedback on career progression and training roadblocks. Slides from the workshop can be found here (Survey: 10.7490/f1000research.1116798.1; FAIRsharing: 10.7490/f1000research.1116785.1; TeSS: 10.7490/f1000research.1116784.1). More information on the Implementation Study and follow-up work can be found here: https://elixir-europe.org/about-us/implementation-studies/mapping-biocuration

Postgraduate Certificate in Biocuration – University of Cambridge, UK

May 15, 2018 (updated June 4, 2018)

Launching in October 2018, the Post Graduate Certificate in Biocuration at the University of Cambridge is the first formal educational qualification in the field of Biocuration.

Developed collaboratively between the University of Cambridge’s Institute of Continuing Education (ICE) and EMBL-EBI, this programme has been designed to provide biocurators with a set of practical skills that are applicable across the biological sciences. Whether you are new to biocuration and looking to develop your skills, or an established curator looking to gain a recognised qualification, this course will provide a strong foundation in the principles of biocuration with additional focus on computational skills, data management and user-experience.

The course is divided into three modules, each including a 3-day workshop followed by a period of self-study through online activities. You do not need to be based in Cambridge to study for this course, but you must be able to attend all workshops.

For more details on the course and to apply, please visit: http://www.ice.cam.ac.uk/course/postgraduate-certificate-biocuration

Applications close 30th June 2018.

CTD turned 10!

March 11, 2015 (updated March 19, 2015)

The Comparative Toxicogenomics Database (CTD) recently celebrated its 10-year anniversary on the web. Since its beginnings, CTD has been devoted to centralizing and harmonizing information about genes responding to environmental toxic agents across diverse species. The database has now evolved into a premier toxicology resource, allowing scientists to discover information and develop testable hypotheses about the biological consequences of chemical exposure (both environmental and drug). Today, CTD includes over 24 million toxicogenomic connections relating chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, Gene Ontology annotations, and pathways.

This celebratory milestone was recently published in the journal Nucleic Acids Research, which summarized the history and evolution of CTD, including descriptions of curation processes, new content, and enhanced visualization and analysis tools. The article also detailed a new “Pathway View” tool that leverages gene interaction data from BioGRID to allow users to build unique toxicogenomic interaction modules connecting chemical exposure to disease events.

As it was ten years ago, CTD today is still managed by a small team of biologists and software engineers who work with both the toxicology and biocuration communities to advance understanding of chemical-gene-disease data and how best to extract and code this information from the published literature. All CTD data are freely available to the public. CTD content has also been disseminated further into the scientific community via more than 55 other databases that routinely incorporate CTD’s annotations. If interested in establishing links to CTD data, please notify us and follow these instructions.

International Society for Biocuration

A non profit organization for biocurators, developers, and researchers with an interest in biocuration

Category: Biocuration Highlight
