Nominations Open for 2025 Biocuration Awards

The ISB is calling for nominations for three Excellence in Biocuration awards in 2025 through the end of May:

1. Early Career Award (https://forms.gle/pGABYnqSqPgiDsVe9)

2. Advanced Career Award (https://forms.gle/Tyfa25BcrM8ejDsC9)

3. Exceptional Contribution to Biocuration (i.e., lifetime award; https://forms.gle/3E2b85eSteaoXzSA6)

Self-nominations are welcome! See previous awards on our site: https://www.biocuration.org/community/biocuration-career-awards.

Call for Proposals to Host the 2027 International Biocuration Conference

Dear Colleagues,

The Executive Committee of the International Society for Biocuration (ISB) is pleased to open the call to host the 20th International Biocuration Conference in Asia/Oceania, preferably during April or May 2027, though nearby dates may also be considered.

Individuals and organizations interested in applying may do so by sending a proposal to the ISB Executive Committee (intsocbio@gmail.com) on or before August 31st, 2025.

The successful bidder will be notified by October 1st, 2025. The ISB Executive Committee will publicly announce the selected organization or individuals during the Annual General Meeting held virtually in October.

Format

Interested individuals or organizations are invited to submit their application via the following form:
Application to Host Biocuration Conference 2027

Applicants may later be asked to provide further details about the proposed venue, proposed dates, strategies for broad community engagement and fair gender representation, and more.

In a continued effort to bring our meeting to curators in all geographic regions, we encourage ISB members across the Asia/Oceania region to put forward proposals to bring the ISB meeting to their region once again, or for the first time!

REGION ROTATION:

  • North and South America
  • Europe and Africa
  • Asia and Oceania

For more information about the ISB and our previous conferences, please visit http://www.biocuration.org.

Your colleagues at the ISB Executive Committee.

Biocuration Community Survey 2025

This survey aims to gather and analyze information about the field of biocuration.

This survey is being conducted by the International Society for Biocuration (ISB) to identify potential gaps or inequities among biocurators and to identify areas where the ISB may be able to take actions to improve awareness of biocuration.

This is a follow-up with our community to assess the progress made since we began surveying the community in 2017. The results from past surveys are available here: https://www.biocuration.org/dissemination/survey-results/ (see ISB Career Description Survey Results).

The resulting data will be aggregated, analyzed, and shared with the community. No identifying information will be revealed when reporting the results of this survey.

Thank you for your participation. The survey will close Friday, March 28th, 2025.

Equity, Diversity, Inclusion and Accessibility Officer

The International Society for Biocuration (ISB) is committed to building an inclusive and diverse network of biocurators, ontologists, data stewards and others who work to improve the quality of data wherever they may work. The EDI subcommittee has worked hard to establish a set of guidelines to promote equity, diversity, inclusion, and accessibility for the society. With these guidelines in place, and given the difficulty of maintaining an active committee in the past year, the executive committee has decided to establish an Equity, Diversity, Inclusion and Accessibility Officer.

This officer will be charged with:

  1. Acting as a point person for ISB members to communicate EDIA concerns.
  2. Reviewing applications for Biocuration conference organizers for any EDIA concerns.
  3. Working with the Biocuration conference liaison to ensure the annual conference is following EDIA guidelines.
  4. Acting as a point person to think ahead for any potential EDIA blindspots.

The past few years have seen the first Biocuration conference in India (2024), the first fully hybrid Biocuration conference (2025), and plans for the first Biocuration conference in Africa (2026). We fund travel fellowships to enable curators from low-income countries to attend Biocuration conferences. This year we have increased the number of microgrants and inclusivity grants available to members to two of each type. We have also revised and updated our guidelines for conference organizers.

We thank Mary Ann Tuli and the members of the EDI committee for their tireless work over the years to guide the society policies to where they are now.

We thank Luana Licata for volunteering to be the inaugural EDIA Officer!

Archived Data Sets

Last week saw a flurry of messages about how to find archived data sets. This is the list of resources and links from those messages. The bulk of this list came from the Data Rescue Project (@datarescue2025.bsky.social) and was shared by Melissa Haendel. Please check the Data Rescue Project page for new updates. The Data Rescue Project now has a homepage: https://www.datarescueproject.org/about-data-rescue-project/

Larger and Established Data / Website Efforts

End of Term Crawl 

  • The main coordinated effort to archive websites
  • Datasets have been more of a challenge, especially data embedded in databases.

EDGI

Public Environmental Data Project 

Harvard’s Library Innovation Lab Team

ICPSR

  • Overview of ICPSR’s data rescue activities to date:
    • Downloaded ~2800 files from various sources requested by researchers; all the files ICPSR collected will soon be available via a dropbox link.
    • Examining CDC data dump from archive.org to assess what might be missing.
      • Ideally will also be a resource for those looking for data to see what is/isn’t available.
    • ICPSR staff and allies are generating metadata for each of the datasets we have so that we can make them available through an existing archive at ICPSR (DataLumos, openICPSR, or the Resource Center for Minority Data, depending on our timeline and some technical issues we’re working out)
  • ICPSR DataLumos – They have older versions of a lot of major data, including a recent addition from the CDC.

IPUMS

  • They have data and have been working on cataloging efforts
  • Notification went out yesterday that they will share more soon.

Dryad

  • Generalist repository available to help with data publication, storage, and preservation.

Synapse

  • Generalist biology and biomedical data repository available to help with data publication, storage, and preservation.

Silencing Science Tracker

  • Joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund.
  • Tracks government attempts to restrict or prohibit scientific research, education or discussion, or the publication or use of scientific information.

OSF

  • Generalist repository for archiving, sharing, and storing all types of research outputs, not limited to preprints or only data.
  • OSF is available as an option for pre-prints of articles if, for some reason, they cannot be posted on official sources.
  • Many universities also have institutional repositories where research (articles, data, dissertations, etc) from that institution can be posted. They also have preservation mandates. An example is Penn’s ScholarlyCommons.

The Climate Mirror Project

  • Has NOAA data pulled during the 2017 data rescue.

Open Energy Data Initiative

  • A volunteer has pointed out that “key equity data” is missing from the Dept of Energy and reports being able to find it on this site. The site also includes additional data from DOE.

Wayback Machine

Data Rescue Events

Smaller/Ad Hoc Rescue Efforts/ Data Archiving Activists

  • UCSB LSIT Data Mirroring
    • Mirrored and archived public data on locally hosted git server
    • Includes retrieved data sets from CDC, NIH, and NOAA
  • CDC Page on Internet Archive
    • A special archive created on IA of all CDC datasets publicly available as of January 28, 2025
    • Uploaded by DataHoarders (we think)
  • Datasets in Dataverse
    • Data uploaded by the Climate Change and Health Research Coordinating Center (CAFE)
      • CAFE is looking for a potentially non-US-based location to duplicate the contents of their collection
    • Includes CDC’s Social Vulnerability Index data.  
    • Most of what’s being placed here is data focusing on health and the environment.
    • DataRefuge from 2017 DataRefuge initiative can be opened for more deposits 
  • Safeguarding Research
    • Organizer is Henrik Schönemann; https://fedihum.org/@lavaeolus
    • There is a forum: https://safeguarding-research.discourse.group/ (admin = Henrik)
      • Based in EU, USA and global – got access to Update 1-2 PB (and more on the way) of storage & people willing to seed
      • Currently, we’ve got around 1TB of data backed up
        • Including >100.000 PDFs from academia.edu (“transgender”, “Queer Studies”, “intersex”, “nonbinary” etc. – see the forum for the full list)
        • 350GB web archive of CDC, including all 30.000 files from archive.cdc.gov, and much more
        • “We’re working on providing a central index of archives, with metadata about who archived what, when, to be disseminated widely alongside torrent files and act as both a central point of coordination for archivers to assess what new work is needed, and a mass distribution channel.”
      • Possible contact to CERN, will update asap
  • Data Hoarder
    • A reddit community that is coordinating efforts to rescue data. 
  • Data Hoarding 
    • index of resources and archives related to data hoarding, web archival and self hosting. 
  • ArchiveTeam Warriors
    • They run a distributed crawler. Anyone can install it to help contribute.
    • US Federal Data page
    • Data is uploaded to Archive.org by volunteers
  • Data Liberation Project
    • Note: It looks like the project may have stalled in September 2024. Send info if you know more about them.
    • Run by BigLocalNews and MuckRock, which are good groups to follow.

Tools for Data Rescues

Library Guides to Data Rescues

Articles on current efforts

Articles for context

Existing Alternative Data Sources

Thanks to Brianne Dosch for suggesting the section and some of the bullets.

  • PolicyMap – offers a free tier that can be used to view basic information down to the tract level, but more detailed data and functionality require a subscription; available at some universities
  • FRED – They have some demographic data as well; free and open source
  • Census Reporter – a free, open-source platform focused on making American Community Survey (ACS) data more accessible, including the recent upload of the 2022 1-Year ACS data
  • Esri – for mapping users, the GIS vendor publishes several U.S. Census Bureau data sets, including the ACS, through its ArcGIS Online Platform
  • IPUMS – Even when the government operates normally, many analysts turn to Minnesota Population Center products to access ACS, Current Population Survey microdata and Decennial Census data
  • Social Explorer – historical Census data and more; available at some universities
  • SimplyAnalytics – has internally processed American Community Surveys; available at some universities
  • American College of Obstetricians and Gynecologists – Hosting copies of immunization schedules and contraceptive use guidance from the CDC
  • https://www.ebi.ac.uk/ena/browser/home – The European Nucleotide Archive (ENA) provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. Mirrors SRA public data

Economic Indicators 

  • National League of Cities: Federal Grant Navigation Equity Dashboard 
    • This tool aggregated data from many sources – it seems to still be able to categorize disadvantaged communities (by environmental and economic standards), as well as other critical data denotations that are increasingly hard to access 
  • ALICE Economic Vitality Dashboard and Report (2022 w/ 2024 update)
    • This resource specifically provides data on work, housing, and community resources for households below the ALICE threshold (Asset Limited, Income Constrained, Employed). The data is provided by the U.S. Census Bureau’s Public Use Microdata Sample (PUMS, 202!) 
  • National Equity Atlas Dashboards
    • A data and policy tool that provides a detailed report card on racial and economic equity – this tool can provide a holistic Racial Equity Index snapshot of communities. The Atlas draws its data from a unique regional equity indicators database developed and maintained by two private institutions: PolicyLink and the USC Equity Research Institute (ERI).

Public Health 

  • County Health Rankings & Roadmaps (CHR&R)
    • A program of University of Wisconsin’s Population Health Institute, this data tool aims to highlight the symbiotic nature of health and equity by factoring in physical environment, social and economic indicators, clinical care, and health behaviors to health outcomes. 
      • They also recommend these additional health data platforms: 
  • City Health Dashboard
    • From NYU Langone Health, this platform provides 40+ measures of health and factors affecting health across five areas (Health Behaviors, Social and Economic Factors, Physical Environment, Health Outcomes, and Clinical Care) for 970+ cities across the U.S.

Biocuration 2025 Preliminary Schedule of Talks

Schedule of talks for April 7-9

DAY 1

Keynote: Tanya Berger-Wolf

Director of the Translational Data Analytics Institute, Director of Imageomics Institute, PI of AI and Biodiversity Change (ABC) Global Climate Center, Ohio State University.

Day 1, Session 1: Data Standards & Ontologies

  • Encouraging authors to use and cite data in public repositories; a publisher perspective.
    • Bastien Molcrette
    • Data Publications, Data Standards, Fair Data Principles, Public Data Resources
  • DO Spanish: enhancing DEI via a standardized workflow for translating ontology and website content
    • Lynn Schriml
    • Curation, Data Sharing, Disease, Ontologies
  • Harnessing Community Power for Long-Term Success of the Mondo Disease Ontology
    • Sabrina Toro
    • Curation, Data Standards, Disease, Ontologies
  • The Earth Metabolome Initiative Ontology  
    • Tarcisio Mendes de Farias
    • Data Modeling, Knowledge Graphs, Omics Data, Ontologies

Keynote: Nirav Merchant

Director of the Data Science Institute at University of Arizona, PI CyVerse

Day 1, Session 2: Artificial Intelligence

  • From Lab Bench to Web: A Strategy for Making Biomedical Data Findable and Accessible
    • Christina Parry
    • Data Standards, Fair Data Principles, Graph Databases, Repositories
  • Extending Ontology for Biomarkers of Aging using OLIVE
    • Hande Kucuk McGinty
    • Artificial Intelligence, Knowledge Graphs, Large Language Models, Ontologies
  • Building the Lighthouse: Guiding LLM-Powered Biocuration with Domain Knowledge and Context
    • Harry Caufield
    • Generative Artificial Intelligence, Large Language Models, Literature Mining, Ontologies
  • AI Curation Methods for NASA Scientific Data
    • Walter Alvarado
    • Artificial Intelligence, Curation, Large Language Models, Metadata
  • Plant Reactome: A plant pathways Knowledgebase and discovery platform
    • Sushma Naithani
    • Artificial Intelligence, Curation, Functional Gene Annotations, Knowledge Graphs

Day 1, Session 3: Data Sharing, Databases & Knowledgebases

  • Single-cell comparative transcriptomics for hundreds of species?
    • Frederic Bastian
    • Comparative Data, Curation, Data Standards, Gene Expression
  • Epitope-Driven Annotations in Protein Resources
    • Randi Vita
    • Databases, Ontologies, Proteins, Epitopes
  • Towards FAIR Phenome: Indian Crop Phenome Database at Indian Biological Data Centre (IBDC)
    • Sonia Balyan
    • Data Sharing, Data Standards, Databases, Phenotypes
  • Making Rare Disease Data Available in the Rare Disease Cures Accelerator-Data and Analytics Platform 
    • Nicole Vasilevsky
    • Curation, Data Sharing, Disease, Fair Data Principles
  • Project ‘Shail’: Curating a mountain
    • Saurabh Raghuvanshi
    • Curation, Databases, Drug Discovery, Genomics 
  • Import of Human GWAS Data and Mapping of EFO to multiple ontologies at the Rat Genome Database
    • Stan Laulederkind
    • Curation, Disease, Genomics, Ontologies

DAY 2

Keynote: Paul Thomas

Director, Division of Bioinformatics, Director of the Gene Sequence, Function, and Health Laboratory Initiative, University of Southern California, PI Gene Ontology, PI PANTHER

Day 2, Session 1: Gene/Protein Functional Prediction 

  • DisProt: The Manually Curated Resource for Intrinsically Disordered Proteins
    • M. Victoria Nugnes
    • Curation, Databases, Ontologies, Proteins
  • A Large Scale Crowdsourcing of the Fifth Critical Assessment of Protein Function Annotation
    • Iddo Friedberg
    • Annotations, Artificial Intelligence, Functional Protein Annotations, Public Data Resources
  • New Synteny visualizations on Xenbase
    • Malcolm Fisher
    • Annotations, Comparative Data, Genomes, Synteny

Day 2, Session 2: Gene/Protein Functional Prediction

  • Cross-species quantification of function annotations provides insights into disease-associated uncharacterized human genes
    • Parnal Joshi
    • Annotations, Comparative Analysis, Data Analysis, Functional Protein Annotations
  • Leveraging the AlphaFold Database for enhanced protein function annotation
    • Paulyna Magaña
    • Annotations, Functional Protein Annotations, Protein Structure Prediction, Proteins
  • Leveraging Large Language Models for Gene Summary Generation at the Alliance of Genome Resources
    • Valerio Arnaboldi
    • Large Language Models, Literature Mining, Automated Gene Summaries, Text Summarization
  • Life Cycle Events for Protein Family Models: Birth, Maturation, Cloning, Retirement
    • Daniel Haft
    • Bacteria, Data Sharing, Functional Protein Annotations

Keynote: Andy Hickl

Chief Technology Officer, Allen Institute

Day 2, Session 3: Natural Language Processing 

  • Semi-automated curation of post-translational modification relationships using automated knowledge extraction and assembly
    • Benjamin Gyori
    • Artificial Intelligence, Curation, Databases, Literature Mining
  • Characterization and automated classification of sentences in the biomedical literature: a case study for biocuration of gene expression and protein kinase activity
    • Daniela Raciti
    • Curation, Machine Learning, Community Curation, Sentence Classification
  • Enhancing the SIB Literature Services with annotations to support biocuration
    • Deborah Caucheteur
    • Annotations, Curation, Data Analysis, Literature Mining
  • Protein structure enrichment through text mining
    • Melanie Vollmar
    • Annotations, Literature Mining, Natural Language Processing, Protein structures
  • Enhancing data annotation in ChEMBL for robust analyses
    • Sybilla Corbett
    • Annotations, Curation, Fair Data Principles, Natural Language Processing

Day 2, Session 4: Glycans 

  • Glycan Archetypes: definitions, implementations and applications for standardizing glycan structure data
    • Kiyoko Aoki-Kinoshita
    • Data Standards, Databases, Glycans, Ontologies
  • BiomarkerKB: Biomarker-centric data modeling and knowledge integration for translational research
    • Raja Mazumder
    • Curation, Databases, Glycans, Knowledge Graphs
  • Inferring Tissue and Cell-type Glycosyltransferase Specificity from Single-Cell Gene Expression Data
    • Nathan Edwards
    • Annotations, Glycans, Machine Learning

DAY 3

Keynote: Shannon Farrell

Data Curation Network/Univ. Minnesota

Day 3, Session 1: Data Curation 

  • It’s Now or Never: Delays in Biocuration Disproportionately Affect Understudied Proteins
    • An Phan
    • Curation, Data Analysis, Functional Gene Annotations, Literature Mining
  • How have standards in genomics evolved since the first microbial genome was published 3 decades ago?
    • Chris Hunter
    • Data Standards, Metadata, Ontologies, Repositories

Keynote: Sandra Orchard

ISB 2023 Exceptional Contribution to Biocuration Awardee, EMBL-European Bioinformatics Institute – UK

Day 3, Session 2: Data Curation

  • The global biodata infrastructure: how, where, who, and what?
    • Chuck Cook
    • Databases, Infrastructure, Literature Mining, Public Data Resources

Executive Committee 2024 Election Winners

As announced at the Annual General Meeting earlier this week, we would like to congratulate the following three individuals who have been elected by members of the International Society for Biocuration to serve on the society’s Executive Committee from 2024:

  1. M. Victoria Nugnes (University of Padua)
  2. Sonia Balyan (Indian Biological Data Centre)
  3. Peter Uetz (Virginia Commonwealth University)

We would also like to thank the election officer, Harry Caufield, and the nominating committee for making this election run smoothly:

  1. Sabrina Toro (Chair)
  2. Pascale Gaudet
  3. Silvio Tosatto
  4. Saurabh Raghuvanshi
  5. Raja Mazumder

More information on the Executive Committee can be found here.

Alliance of Genome Resources Webinar – An Introduction to the Alliance

Carol Bult (ORCID: 0000-0001-9433-210X) will present “An Introduction to the Alliance” on September 19th, 2024. The webinar will cover searching the Alliance, gene pages, and disease pages, and will include a Q&A session. This is the first in an ongoing series.

Register by September 18 to receive the Zoom URL. You can find the registration link here: https://www.alliancegenome.org/news/webinar-an-introduction-to-the-alliance
