ISB response to NIH RFI: NIH Strategic Plan for Data Science

On behalf of the International Society for Biocuration (ISB), we provide the following response to the Request for Information: NIH STRATEGIC PLAN FOR DATA SCIENCE, which describes NIH’s overarching goals, strategic objectives, and implementation tactics for modernizing the NIH-funded biomedical data-resource ecosystem.

We are a community highly involved in the development and maintenance of biological and biomedical databases and in the task of biocuration: the translation and integration of information relevant to biology into a database, in a way that enables the integration of the scientific literature as well as large data sets (distilling data into knowledge). The International Society for Biocuration (ISB) community includes, among others, biocurators, software developers, bioinformaticians, and standards developers. We are thus familiar with the pitfalls of current funding mechanisms for databases and recognize the importance of developing a different model, which is what the strategic plan for data science intends to address. In this response, we focus exclusively on selected aspects of Goal 2: Promote Modernization of the Data-Resources Ecosystem, and Goal 4: Enhance Workforce Development for Biomedical Data Science.

Information requested:

* The appropriateness of the goals of the plan and of the strategies and implementation tactics proposed to achieve them:
Goal 2: Promote Modernization of the Data-Resources Ecosystem
Whilst the ISB is generally supportive of the statements made in this RFI, we feel that some of the terminology used needs to be improved. The RFI refers to databases and repositories interchangeably. It should be noted that the term database is an overarching term, and we see the separation as being between primary data repositories, such as members of the INSDC (http://www.insdc.org/), with set submission criteria and minimal subsequent expert curation of the data (biocuration), and knowledgebases [1]. Both repositories and knowledgebases are thus types of databases. We suggest that the terms database, repository and knowledgebase be clearly defined. Here are our proposed definitions and changes to the text:

A database is a computerized storehouse of data that provides a standardized way for locating, adding, removing, and changing data [2].

Data Repositories and Knowledgebases: What’s the Difference?
Data repositories and knowledgebases are both types of databases which store, organize, validate, and make accessible the core data related to a particular system or set of technologies. In the case of a data repository, the data is deposited by researchers following a set of guidelines and, other than ensuring the guidelines are adhered to, receives minimal subsequent input or modification.

Knowledgebases accumulate, organize, and link growing bodies of information related to the deposited data. A knowledgebase may contain information about gene models, transcript/protein expression patterns, splicing variants, localization, and protein-protein interaction and pathway networks related to an organism or set of organisms. Knowledgebases typically require significant semi-automated as well as manual biocuration by domain experts (e.g., literature-based gene ontology and phenotype annotations) beyond the quality assurance/quality control and annotation needed for data repositories.

We propose that the definition of biocuration is added to the glossary.

Biocuration is the extraction of knowledge from unstructured biological data (typically but not limited to publications) into a structured, computable form. Biocurators are typically Ph.D. level biologists, often with lab bench experience, coupled with specialized expertise in computational knowledge representation. Their work entails the synthesis and integration of information from multiple sources, including, for example, peer-reviewed papers, large-scale projects, or conference abstracts. They contact authors directly for clarification, digest supplemental information, and resolve identifiers, in order to accurately capture a researcher’s conclusion and their evidence for that conclusion. Biocurators strive to distill the current ‘best view’ from conflicting sources and ensure that their resources provide data that is not only Findable, Accessible, Interoperable, and Reusable (FAIR), but also Traceable, appropriately Licensed, and inter-Connected (collectively, the FAIR-TLC principles) [3].

Goal 4: Enhance Workforce Development for Biomedical Data Science
Again, the ISB is in favor of this proposed goal, as training different stakeholders in data science is key for the NIH to achieve the stewardship goals outlined in the NIH-wide strategic plan. However, the enhancement of the workforce is only discussed in terms of data scientists, and we believe biocurators are relevant stakeholders as well.
In section 4.1 “In addition, NIH will recruit a cohort of data scientists and others with expertise in areas such as project management, systems engineering, and computer science from the private sector and academia for short-term (1- to 3-year) national service sabbaticals. These “NIH Data Fellows” will be embedded within a range of high-profile, transformative NIH projects such as All of Us, the Cancer MoonshotSM and the BRAIN initiative and will serve to provide innovation and expertise not readily available within the federal government.”
We think that biocurators would offer a unique perspective to these NIH projects given their training in formulating and using standards, in data analysis and integration, and in working with a variety of research communities for adoption of FAIR principles [3]. We suggest that biocurators are explicitly listed and considered as potential “NIH Data Fellows”.
One of the ISB goals is to train the next generation of biocurators, and we have developed/collected training materials that could be used by NIH for training grant reviewers (https://www.biocuration.org/dissemination/biocuration-training-materials/).

* Opportunities for NIH to partner in achieving these goals:
NIH should establish a closer interaction with the International Society for Biocuration (ISB) to learn about biocuration and data science. ISB could collect/prepare training materials that could contribute to NIH training goals. ISB members could serve as NIH Data Fellows.
NIH should consult FAIRsharing (a catalogue of data preservation, management and sharing policies from international funding agencies, regulators and journals) and the BioDBcore guidelines [4-5], a community-defined, uniform, generic description of the core attributes of biological databases that helps ensure consistency and interoperability between resources.
Encourage and provide guidance to R01 and R21 proposal writers to budget correctly for data sharing. Dumping data into a repository is not trivial: it takes time to deposit data with adequate information. There need to be clear instructions to grant recipients to submit structured data to journals and/or databases. The biocuration community could help identify a few examples of how such structured data can be submitted. In addition, minimal common standards for databases are already described in the BioDBcore guidelines, mentioned in the previous point.
There should be more emphasis on how NIH intramural researchers could collaborate with external groups to link resources. The plan discusses linking all NIH data resources in detail. However, there is a need to also link to external resources and vice versa.

* Additional concepts that should be included in the plan:
We propose that the definitions of database and biocuration be added to the glossary.

* Performance measures and milestones that could be used to gauge the success of elements of the plan and inform course corrections:
Nothing to comment at this point

* Any other topic the respondent feels is relevant for NIH to consider in developing this strategic plan:

Sustained long-term funding for key resources. Whilst we appreciate that resources need to be constantly re-evaluated and shown to be keeping pace with the demands of new technologies and new use cases, constantly moving from one short-term grant to another, with no guarantee of renewed funding, is not beneficial to the growth of a resource or to the user community that relies on it.

References:
1. Gabella C, Durinx C, Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case. F1000Res. 2017 Nov 27;6. Pii: ELIXIR-2051. doi: 10.12688/f1000research.12989.1. eCollection 2017. PubMed PMID: 29333230; PubMed Central PMCID: PMC5747334.

2. Mount D. Bioinformatics: Sequence and Genome Analysis, Second Edition (2004). Chapter 2. Cold Spring Harbor Laboratory Press

3. International Society for Biocuration. Biocuration: Distilling data into knowledge. PLOS Biology (2018) in press.

4. Gaudet P, Bairoch A, Field D, Sansone SA, Taylor C, Attwood TK, Bateman A, Blake JA, Bult CJ, Cherry JM, Chisholm RL, Cochrane G, Cook CE, Eppig JT, Galperin MY, Gentleman R, Goble CA, Gojobori T, Hancock JM, Howe DG, Imanishi T, Kelso J, Landsman D, Lewis SE, Karsch Mizrachi I, Orchard S, Ouellette BF, Ranganathan S, Richardson L, Rocca-Serra P, Schofield PN, Smedley D, Southan C, Tan TW, Tatusova T, Whetzel PL, White O, Yamasaki C; BioDBCore Working Group. Towards BioDBcore: a community-defined information specification for biological databases. Database (Oxford). (2011) baq027. doi:10.1093/database/baq027. Print 2011. PubMed PMID: 21205783; PubMed Central PMCID: PMC3017395.

5. Gaudet P, Bairoch A, Field D, Sansone SA, Taylor C, Attwood TK, Bateman A, Blake JA, Bult CJ, Cherry JM, Chisholm RL, Cochrane G, Cook CE, Eppig JT, Galperin MY, Gentleman R, Goble CA, Gojobori T, Hancock JM, Howe DG, Imanishi T, Kelso J, Landsman D, Lewis SE, Mizrachi IK, Orchard S, Ouellette BF, Ranganathan S, Richardson L, Rocca-Serra P, Schofield PN, Smedley D, Southan C, Tan TW, Tatusova T, Whetzel PL, White O, Yamasaki C; BioDBCore Working Group. Towards BioDBcore: a community-defined information specification for biological databases. (2011) Nucleic Acids Res. 39(Database issue):D7-10. doi:10.1093/nar/gkq1173. Epub 2010 Nov 18. PubMed PMID: 21097465; PubMed Central PMCID: PMC3013734.

Additional information requested:
Name: Cecilia Arighi, Nicole Vasilevsky and Sandra Orchard
Work Email: intsocbio@gmail.com
Name of Organization: International Society for Biocuration (ISB) (www.biocuration.org)

 

For members of advocacy groups or professional societies (optional): Please indicate your role and indicate whether you are responding on behalf of your organization.
Cecilia Arighi is the Chair of the Society, Nicole Vasilevsky is the Secretary and Sandra Orchard the Treasurer. This RFI is submitted on behalf of the ISB.

Sent April 01, 2018

Biocuration Exchange Fellowship

The International Society for Biocuration is pleased to announce Luana Licata, from the University of Rome Tor Vergata as the first recipient of the Biocuration Exchange Fellowship. During her fellowship, she will visit the Protein Function Team at EMBL-EBI and the Gene Annotation Team, UCL London, to learn Gene Ontology (GO) annotation.

The Biocuration Exchange Fellowship is a short-term fellowship to promote collaborations and exchanges between groups working in the field of biocuration. The fellowship funds the visit of a biocurator to another laboratory or organization with long experience in biocuration. This visit constitutes a unique opportunity to learn new methods, experience biocuration in different settings and/or in different fields, and to establish mutually beneficial collaborations across groups and disciplines.

More information on Biocuration Exchange Fellowship can be found here.

FSCI Summer Training

The FORCE11 Scholarly Communications Institute at the University of California, San Diego is a week-long summer training course, incorporating intensive coursework, seminar participation, group activities, lectures and hands-on training. Participants will attend courses taught by world-leading experts in scholarly communications. Participants will also have the opportunity to discuss the latest trends and gain expertise in new technologies in research flow, new forms of publication, new standards and expectations, and new ways of measuring and demonstrating success that are transforming science and scholarship.

FORCE11 Scholarly Communication Institute (FSCI)

July 30 – August 3, 2018
University of California San Diego (UCSD)
UCSD School of Medicine – Medical Education and Telemedicine (MET). La Jolla, CA 92161 USA

https://www.force11.org/fsci/2018

Contact name: Stephanie Hagstrom
info@force11.org

Eleanor Williams on winning the Biocuration Award 2018

Mouse dissociated embryonic kidney cells reforming into a kidney organoid – from the Image Data Resource

In mid-January, I got an email saying I had won the 2018 Biocuration Career Award from the International Society for Biocuration!  I was excited to win the award, thrilled to be going to the Biocuration conference in China and wondered what I would say in a presentation about “my career” at the conference.

That week I had just started in a new biocuration position and after hearing about the award, my new colleagues suggested I give a lunchtime talk on my scientific background so that the others in the team could get to know me better.  As I worked on a timeline and slides about each of my past biocuration positions, I realized that there actually was a lot that I wanted to say.

I have worked on several different projects over the last 17 years that I would describe as “biocuration.” These jobs have varied greatly. I started curating data in a research position at the Fred Hutchinson Cancer Research Center, collating information about mouse and human olfactory receptors. Then back in the UK I worked in a more “service” role in the established resources of ArrayExpress and Expression Atlas at the European Bioinformatics Institute (EMBL-EBI), curating data submissions and talking with submitters, and I started to delve into the world of ontologies. More recently I have been the primary curator setting up the metadata processing pipeline for the Image Data Resource at the University of Dundee. This resource was built from scratch, and went where few others have dared to go, creating a robust data repository for complex bioimaging data, with the added value of metadata integration. It has been immensely satisfying to see this project grow both in size and reputation.

Now, after spending a lifetime in academia, I am off on a new adventure working as a Scientific Curator for a company, Genomics England, set up to provide a genomic medicine service for the UK National Health Service using data from the 100,000 Genomes Project.

My biocuration career has been driven partly by external forces, i.e. what fits with family life and funding opportunities, and partly by my own interests and desire to develop skills. In hindsight each job seems a logical progression from the last, with new experiences, new skills, new technologies (GitHub and conference calling were not around in the early 2000s) and new challenges that have made for a fun and interesting career.

I am very proud to receive the International Society for Biocuration Career Award. It has helped me reflect on my work and recognize that I have had a career and not just a series of jobs! I hope that by sharing my story and the skills I have found to be most useful, I can help others think about their own journeys. It is also wonderful to appreciate the international community of biocurators who meet together to share experiences and recognize the importance of data and biocuration in the sciences. I am looking forward to meeting many of you in Shanghai!


Dr Eleanor Williams will be presenting her work at the International Biocuration Conference in Shanghai in April 2018. Her talk will be on:

Title: Curating bioimaging data – lessons from the first 40 terabytes

Synopsis: I have experienced three very different types of biocuration work in my career. My first taste of biocuration was in a research lab curating information about olfactory receptor genes. I then moved to work on the well-established functional genomics databases of ArrayExpress and the Expression Atlas at EMBL-EBI, curating data submissions. From there I moved to a project at the University of Dundee setting up the Image Data Resource for bioimaging data. This was my biggest curation challenge, starting almost from scratch to develop a method to capture the biomolecular, experimental and analytic annotations associated with images, and to create a pipeline to populate the database. I will describe the most useful biocuration skills I have learnt and some of the challenges I’ve encountered. I will finish by describing my new position, working as a scientific curator in a company performing analyses of the genomes of patients with rare diseases.

 

ISB members’ survey

The ISB is continually looking to improve and optimise its services for members. To help improve the benefits of your membership we are inviting you to fill in a short survey. In this way, you can tell us what you have observed so far and what you would like to see.

We appreciate your participation to help us ensure that we meet (or surpass) all your expectations.

This survey should take approximately 5 minutes to fill in.

Complete survey on Google Forms

We would like to have all responses back by March 21st, 2018 so we can report back to the community at the Biocuration meeting in Shanghai in April.

If you have any problems accessing Google Forms, please contact the ISB Exec at intsocbio [at] gmail.com

Microgrant report: Arighi Oct.2017

ISB-Microgrant report of the BioCreative VI workshop
By Cecilia Arighi

The BioCreative VI workshop took place on October 18-20, 2017 in Bethesda, Maryland, USA.

BioCreative is a community-wide effort for evaluating text mining systems applied to the biological and biomedical domain. The meeting attracted participants from the biomedical natural language processing, biocuration, literature/publishing, research and funding domains (over 60 workshop participants, one third being students), and 36 teams participated in the track activities (with representation from America, Asia, Europe and Australia).

The scientific program covered:

  1. talks related to the individual tracks (run prior to the workshop) on biocuration-relevant topics (assignment of bioentity IDs to facilitate downstream curation; text mining services for triage for human kinases; extraction of causal network information using the Biological Expression Language; mining protein-protein interactions affected by mutations; and annotation of chemical-protein interactions);
  2. a panel on innovation in biomedical digital curation, with views presented by users, publishers and literature service providers;
  3. a panel on funding stakeholders, where funding opportunities and needs for text mining and collaborations were presented by representatives from various funding agencies;
  4. a general session for text mining topics that showcased other interesting bioNLP work;
  5. two keynote speakers (Dr. Patricia Flatley Brennan, Director, National Library of Medicine, talked about the future of data-powered health, and Dr. Hongfang Liu, Mayo Clinic, discussed opportunities and challenges of text mining in precision medicine);
  6. a poster session.

Additional points discussed included the challenges of using real data over a gold standard; the strategic direction of BioCreative; and the relationship with other NLP challenge evaluations.

Corpora and datasets from the different tracks are publicly available (with prior registration). The workshop proceedings are publicly available on the BioCreative website.

Funds were used towards the rental of a room for the poster session, and the ISB was listed as a sponsor.

The OHSU Library Data Science Institute Promotes Biocuration and the ISB to Librarians and Researchers

By Nicole Vasilevsky

The Oregon Health & Science University Library in Portland, Oregon hosted the “OHSU Library Data Science Institute” (GitHub repo) from November 6-8, 2017 in downtown Portland.

The event was targeted towards researchers, librarians and information specialists with an interest in gaining beginner-level skills in data science. The goal was to provide face-to-face, interactive instruction over a three-day workshop. The learning objectives for the training were:

  • Increase awareness of key skills in data science and how these can be applied to the participants’ own daily practices, such as research or serving patrons
  • Increase confidence with using data science techniques
  • Increase the ability of participants to use or apply data science techniques in problems outlined in the course

Over 75 participants attended the three-day event. They came from within and outside Portland, including Washington, Idaho, California, British Columbia and Kansas. The workshop covered topics relevant to the biocuration community, such as biomedical data standards; data description, sharing and reuse; and data cleaning and preparation. All of the materials are openly available on our website. I gave a brief talk on the “Trials and Tribulations of a Biocurator”, describing the lessons I have learned as a biocurator, how I wish I had known as a bench researcher what I know now, and how my biocuration skills can be applied in my current role as well. We hope that we instilled the value of biocuration and proper data management in researchers and librarians alike, and that they will apply the skills they learned to better manage and curate their data. We informed participants about efforts that are currently underway at the International Society for Biocuration, and distributed ISB stickers as well. Funds from the micro-grant were used to provide coffee each morning for attendees, which was greatly appreciated, and the ISB is listed as a sponsor on our website.

Deadline extension for Biocuration 2018

If you haven’t yet submitted your abstract for talks, posters or workshops for the 11th International Biocuration Conference, fear not! The deadline has been extended to 15th January 2018. 

Abstracts for talks and posters

Full details on abstract submissions can be found in the original call. They have a maximum length of 300 words and can cover a wide variety of topics from data ontologies to precision medicine.

Submit your abstracts at https://easychair.org/cfp/biocuration2018

Any questions should be sent to biocuration2018@126.com

Workshop proposals

Workshops are a great way to build connections, crowdsource ideas and promote topics and standards in the global community. If you’ve got an idea for a workshop, you can submit a proposal by emailing biocuration2018@126.com.

This should be a short paragraph (or two) describing the following:

  • Proposed scope and main objective, and their relation to biocuration
  • Brief discussion of why the topic is of particular interest at this time
  • Suggested format (talks, panel discussion, etc.)
  • Potential speakers, panels, or other activities

Visas

Biocuration 2018 is set to be a great conference and Shanghai is a wonderful location. Many people will require a visa so please remember to apply well in advance.

If you need an invitation letter for the visa, please send an email to biocuration2018@126.com, including your registration form. Local organizers will send the invitation letter within two days.

We look forward to seeing you in April!

Biocuration 2018 Call for Abstracts

DEADLINE EXTENDED: 15th January 2018

The 11th International Biocuration Conference will be held from April 08-11, 2018 in Shanghai, China. The conference is a unique event for biocurators and developers of biological databases to discuss their work, promote collaborations, and foster a sense of community in this very active and growing area of research.

You are invited to submit an abstract for a talk or poster presentation at the upcoming conference. This is a great occasion to enhance the recognition of your work and of our profession by the greater biological research communities.

This year abstracts are invited for the following topic areas:

  • Precision Medicine
  • Phenotypes, genotypes, and variants
  • Data Standards and Ontologies
  • Text Mining
  • Functional Annotation
  • Community Annotation
  • Data Integration and Visualization
  • Deep Learning in the curation process
  • Software, Applications and Systems in biocuration
  • Curation Standards and Best Practice; inference from evidence; data and annotation quality

Abstracts on topics outside the above will also be considered for presentation.

Please be mindful of the following deadlines:

Submission deadline: EXTENDED to: January 15, 2018

Notifications: January 15, 2018

Conference: April 8-11, 2018

Please submit your abstract here: https://easychair.org/cfp/biocuration2018

All questions about submissions should be emailed to biocuration2018@126.com

Before submitting your abstracts please take into consideration:
  • All abstracts must be written in English.
  • The title and the abstract should be entered in plain text and should not contain HTML elements.
  • The abstract should be written in an unstructured format. Separate sections for Background, Materials and Methods, etc., are not needed.
  • Copy-paste text may include hidden formatting that exceeds the character limit. We recommend either: saving as ‘text only’ in your editor or e-mail program, OR copy-pasting it into Notepad and then onto the website.
  • Abstracts should not exceed 300 words, with a limit of 2000 characters (with spaces).
  • Some web browsers do not accept abstracts close to the 2000-character count.
  • If you have special symbols in your text, please ensure you are using Unicode characters; otherwise these will not be recognized.
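
The length and plain-text requirements above can be checked locally before pasting an abstract into the submission form. The following is a minimal Python sketch, not an official conference tool: the function name `check_abstract` and the simple pattern used to spot HTML-like tags are illustrative assumptions, and the submission website may count words or characters slightly differently.

```python
import re

MAX_WORDS = 300    # word limit stated in the guidelines
MAX_CHARS = 2000   # character limit, counted with spaces


def check_abstract(text):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Words are approximated by splitting on whitespace.
    if len(text.split()) > MAX_WORDS:
        problems.append("more than 300 words")
    # Characters are counted including spaces.
    if len(text) > MAX_CHARS:
        problems.append("more than 2000 characters")
    # Rough heuristic for leftover HTML elements such as <b> or </p>.
    if re.search(r"</?[A-Za-z][^>]*>", text):
        problems.append("contains HTML-like tags")
    return problems


if __name__ == "__main__":
    print(check_abstract("Curating bioimaging data: lessons learned."))
```

Because some browsers reportedly reject abstracts close to the 2000-character count, it may be sensible to leave a small margin rather than aim for the exact limit.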
