Einat Sitbon

Roundtables – Prediction in real life – defining the predictability domain of models

Evogene

Bio

I have worked in the biotech industry for eleven years in various computational biology roles. For the past few years I have been developing computational tools for the analysis of biological data at Evogene, aimed at improving plants for agriculture. This includes machine learning for the identification of insecticide genes, and phenotypic and genotypic analysis of complex traits in plants. In my work, I use various data sources, ranging from genomic information to plant phenotype measurements.

I currently use machine learning algorithms and am exploring deep learning, in addition to more traditional modeling. During my PhD studies at the Weizmann Institute, I used various computational tools to explore biological questions. I am fascinated by the complexity and variability of biology, and by the intriguing questions and challenges it presents.

Abstract

Predictive models developed for the life sciences must account for the complexity and limitations of the data available for training. When a model is used to make predictions on new data, it is assumed that the new data is similar enough to the training set for the predictions to be correct. When this assumption does not hold, the results can range from comic to disastrous.

One example from agriculture is a model built to predict the toxicity of new herbicide molecules to honeybees. The model is trained on a set of known herbicide molecules whose toxicity was measured experimentally. Can we use this model to predict the honeybee toxicity of a new molecule that is not a herbicide?

Similarity is hard to define: objects can be similar in some respects but very different in others. Issues include the “curse of dimensionality”, comparing similarity across models whose training sets differ in diversity, structure within the training data, and more.
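
To make the “curse of dimensionality” concrete, the following is a minimal sketch on synthetic data, not taken from the work described here. It assumes uniformly random points and Euclidean distance, and the function name `relative_contrast` is made up for this illustration. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance; as the number of dimensions grows this contrast shrinks, so "near" and "far" become harder to tell apart.

```python
import math
import random


def relative_contrast(n_points, dim, rng):
    """Return (farthest - nearest) / nearest for one random query point.

    Points and query are drawn uniformly from the unit hypercube
    (an assumption made only for this illustration).
    """
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
contrast = {dim: relative_contrast(500, dim, rng) for dim in (2, 10, 100, 1000)}

for dim, value in contrast.items():
    print(f"dim={dim:5d}  relative contrast={value:.2f}")
```

With many descriptors per object, a plain distance to the training set therefore carries less information than it does in two or three dimensions, which is one reason distance-based similarity checks need care.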

Here I will present several approaches we explored for deciding whether a given model is applicable to new data.
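
As a generic illustration of the question, and not necessarily one of the approaches presented in the session, a common distance-based check can be sketched as follows. It assumes each object is a numeric descriptor vector, uses Euclidean distance, and flags a query as outside the domain when its mean distance to its k nearest training points exceeds a cutoff derived from the training set itself. The names `mean_knn_distance`, `fit_threshold`, and `in_domain`, the choice of k=5 and the 95% quantile, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def mean_knn_distance(point, reference, k):
    """Mean Euclidean distance from `point` to its k nearest rows in `reference`."""
    nearest = sorted(math.dist(point, ref) for ref in reference)[:k]
    return sum(nearest) / len(nearest)


def fit_threshold(train, k=5, quantile=0.95):
    """Cutoff taken from the training set's own leave-one-out k-NN distances."""
    scores = sorted(
        mean_knn_distance(row, train[:i] + train[i + 1:], k)
        for i, row in enumerate(train)
    )
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]


def in_domain(query, train, threshold, k=5):
    """True if the query is about as close to the training data as training points are to each other."""
    return mean_knn_distance(query, train, k) <= threshold


rng = random.Random(0)
# Hypothetical training set: 200 objects with 3 descriptors each
# (a stand-in for, e.g., descriptors of known herbicide molecules).
train = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
threshold = fit_threshold(train)

near_query = [0.1, -0.2, 0.0]  # resembles the training data
far_query = [8.0, 8.0, 8.0]    # unlike anything seen in training

print(in_domain(near_query, train, threshold))
print(in_domain(far_query, train, threshold))
```

A check like this only says whether the query resembles the training set in the chosen descriptor space; it says nothing about whether those descriptors capture what matters for the predicted property.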

Discussion Points

  • What a predictability domain is and why it is important
  • Predictability in the life sciences
  • Different approaches to predictability
  • Distance metrics and the curse of dimensionality
  • Real life examples

Planned Agenda
