Técnicas de IA para Biologia (AI Techniques for Biology)
1 - Introduction
André Lamúrias
Introduction
Summary
- Course structure and assessment
- AI and the origin of Artificial Neural Networks
- Machine Learning
- The power of nonlinear transformations
- What deep learning offers
Introduction
Course Overview
Overview
Objectives
- Overview of two important AI fields in biology
- A practical introduction (some theory, some practice)
Two parts
- Deep learning (sub-symbolic)
- Build and train deep neural networks
- Apply to (semi-)realistic problems (fully realistic ones require more computation power)
- Ontologies (symbolic)
- Understand and use tools for inference with biological knowledge
Overview
Instructor
- André Lamúrias (a.lamurias@fct.unl.pt)
Assessment:
- 2 short assignments, one for each part (25% each)
- 1 test (or exam), on April 22nd (during class; date and time to be confirmed)
Website
Overview
Main Bibliography (part 1)
- I. Goodfellow et al., Deep Learning, MIT Press, 2016
- S. Skansi, Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence, Springer, 2018
- A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, O'Reilly Media, Inc., 2017
- P. Singh and A. Manure, Learn TensorFlow 2.0, Springer, 2020
Overview
Main Bibliography (part 2)
- P. Robinson and S. Bauer, Introduction to Bio-Ontologies, Chapman & Hall, 2011
- C. Dessimoz and N. Škunca, The Gene Ontology Handbook, Springer, 2017
- G. Antoniou et al., A Semantic Web Primer, MIT Press, 2012
- F. Baader et al., The Description Logic Handbook, Cambridge University Press, 2010
- S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Pearson, 2020
Overview
Software
- Python 3.x + TensorFlow 2 (see the sanity check below)
- Options
- virtualenv
- Anaconda
- Docker
- Windows Subsystem for Linux (Ubuntu on Windows)
- Google Colab
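Whichever option you choose, a minimal sanity check (a sketch; it assumes TensorFlow 2 is installed in the active environment) confirms the setup:

```python
# Minimal environment check: runs the same under virtualenv, Anaconda,
# Docker, WSL or Google Colab, assuming TensorFlow 2 is installed.
import tensorflow as tf

print(tf.__version__)                          # expect something like 2.x
print(tf.config.list_physical_devices("GPU"))  # [] if no GPU is visible
```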
Introduction
Artificial Intelligence
Artificial Intelligence
The beginning of AI
- 1956: Dartmouth Summer Research Project on Artificial Intelligence
- John McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon
- "proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it"
- Initially, the most successful AI approach was rule processing
- Expert systems, logic programming, ...
- Rule-based expert systems
- Rules provided by humans
- Computer does inference to reach conclusions
Artificial Intelligence
The beginning of AI
- Rule-based expert systems
- Rules provided by humans
- Computer does inference to reach conclusions
- E.g. MYCIN, 1975 (Shortliffe, A model of inexact reasoning in medicine); a sketch of how such a rule can be coded follows below
If:
(1) the stain of the organism is gram positive, and
(2) the morphology of the organism is coccus, and
(3) the growth conformation of the organism is chains
Then:
there is suggestive evidence (0.7) that the identity of
the organism is streptococcus
- Such systems were initially quite successful in specific areas
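Here is a minimal Python sketch of how a rule like the one above can be represented and fired; the dictionaries, the `apply_rule` helper and the simple matching logic are illustrative assumptions, not MYCIN's actual implementation:

```python
# Illustrative sketch of a certainty-factor rule, not MYCIN's real code.
facts = {
    "stain": "gram positive",
    "morphology": "coccus",
    "growth_conformation": "chains",
}

rule = {
    "if": {"stain": "gram positive",
           "morphology": "coccus",
           "growth_conformation": "chains"},
    "then": ("identity", "streptococcus"),
    "cf": 0.7,  # certainty factor attached to the conclusion
}

def apply_rule(rule, facts):
    """Fire the rule if every premise matches a known fact."""
    if all(facts.get(attr) == value for attr, value in rule["if"].items()):
        attr, value = rule["then"]
        return f"suggestive evidence ({rule['cf']}) that the {attr} is {value}"
    return None

print(apply_rule(rule, facts))
```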
Artificial Intelligence
The beginning of AI
- Problems with rule-based expert systems
- Computational complexity
- Rigid rules, less adaptive
- Knowledge acquisition problem
Artificial Intelligence
The beginning of Neural Networks
- The modelling of neurons predates modern AI
- 1943: McCulloch & Pitts, model of neuron
BruceBlaus, Chris 73: CC-BY, source Wikipedia
Artificial Intelligence
The beginning of Neural Networks
- The modelling of neurons predates modern AI
- 1943: McCulloch & Pitts, model of neuron
Chrislb, CC-BY, source Wikipedia
Artificial Intelligence
The perceptron, the first learning machine
- 1958: Rosenblatt, perceptron could learn to distinguish examples
«the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.»
New York Times, 1958
Wightman and Rosenblatt. Source: Cornell Chronicle
Artificial Intelligence
Perceptron
- Linear combination of the $d$ inputs and a threshold function:
$$ y = \sum \limits_{j=1}^{d} w_jx_j + w_0 \quad s(y) = \begin{cases} 1, &y > 0 \\ 0, &y \leq 0 \end{cases}$$
- Training rule for the perceptron:
$$w_i = w_i + \Delta w_i \qquad \Delta w_i = \eta (t-o)x_i$$
- Adjust weights slightly to correct misclassified examples, where $\eta$ is the learning rate, $t$ the target and $o$ the perceptron's output (see the sketch below).
- Inputs with larger values receive larger adjustments.
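A minimal NumPy sketch of this training rule; the AND data set, learning rate and epoch count are illustrative choices, not Rosenblatt's original setup:

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=50):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i."""
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend x_0 = 1 for the bias w_0
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_n, t_n in zip(X, t):
            o = 1 if w @ x_n > 0 else 0       # threshold function s(y)
            w += eta * (t_n - o) * x_n        # updates only when o != t_n
    return w

# Illustrative example: the linearly separable AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
print(train_perceptron(X, t))
```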
Artificial Intelligence
Perceptron
- First implemented on an IBM 704, 1958
- Learned to distinguish between cards punched on the right and punched on the left after 50 examples
Rosenblatt and IBM 704. Source: Cornell Chronicle
Artificial Intelligence
Perceptron
- But then it was actually built as a machine
A camera producing 20×20 pixel images, for image recognition
Electric motors adjusting potentiometers that implemented the input weights
Mark I Perceptron (Wikipedia)
Artificial Intelligence
Perceptron
- Seemed a promising start
- But the perceptron is just a linear model
Artificial Intelligence
Perceptron
- It's a single neuron, so a linear classifier
- similar to logistic regression, which was already known
Artificial Intelligence
Neural Networks
- A very promising early start with neuron and perceptron:
- 1943: McCulloch & Pitts, model of neuron
- 1958: Rosenblatt, perceptron and learning algorithm
- But these turned out to be equivalent to generalized linear models
- And in 1969, Perceptrons (Minsky and Papert) showed that single-layer perceptrons cannot solve nonlinearly separable problems such as XOR; multi-layer networks would be needed, but no method for training them was known
1960s–mid 1980s: "AI Winter", affecting ANN research in particular
- Logic systems ruled AI for the larger part of this period
- But eventually funding was cut drastically
- 1986: Rumelhart, Hinton and Williams show that backpropagation can be used to train multi-layer networks
Introduction
Machine Learning
Machine Learning
What is machine learning?
- "Field of study that gives computers the ability to learn without being explicitly programmed"
(Samuel, 1959)
- "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E"
(Mitchell, 1997)
Machine Learning
Machine Learning problem
- A task that the system must perform
- A measure of its performance
- The data used to improve its performance
- Examples:
- Spam filtering
- Image classification
- Medical diagnosis
- Speech recognition
- Autonomous driving
- Clustering, feature representation, ...
Machine Learning
Basic kinds of ML problems
- Unsupervised learning
- No need for labels in data;
- Find structure in data
- Clustering is a common example, but we will see applications in deep learning
ML Problems
- Example: clustering images
Grouping image search results using features from the images and surrounding HTML (Cai et al., Clustering of WWW Image Search Results, 2004)
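As a minimal illustration of unsupervised clustering (using synthetic 2-D points and scikit-learn's KMeans, not the image/HTML features of Cai et al.):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic blobs of points; purely illustrative data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # blob around (0, 0)
               rng.normal(5, 1, (50, 2))])  # blob around (5, 5)

# No labels are given; k-means finds the two groups on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)
```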
Machine Learning
Basic kinds of ML problems
- Unsupervised learning
- Supervised learning
- Uses labelled data and aims at predicting classes or values
- Continuous values: Regression
- Discrete classes: Classification
ML Problems
Supervised learning
- Example: face identification
Valenti et al., Machine Learning Techniques for Face Analysis, 2008
Machine Learning
Basic kinds of ML problems
- Unsupervised learning
- Supervised learning
- Semi-supervised learning
- Mixes labeled and unlabeled data
- Can be useful to increase the size of the data set
Machine Learning
Basic kinds of ML problems
- Unsupervised learning
- Supervised learning
- Semi-supervised learning
- Self-supervised Learning
- Labels are intrinsic to the data
Machine Learning
Basic kinds of ML problems
- Unsupervised learning
- Supervised learning
- Semi-supervised learning
- Self-supervised Learning
- Reinforcement learning
- Optimize a policy from rewards, without immediate feedback for each instance
Machine Learning
Can solve different kinds of problems
- Extracting new features and finding relations
- Approximating a target
- Optimizing policy
Machine Learning
The rise of machine learning
- In the 1990s, AI shifted from knowledge-driven to data-driven with new ML algorithms
- E.g. 1992: Vapnik et al. publish the kernel trick for SVMs
- 1995: SVM (Cortes & Vapnik), Random Forest (Ho)
- 1997: Multi-layer and convolutional networks for check processing in the USA (LeCun)
- 1998: MNIST database (LeCun). Benchmarks, libraries and competitions
Introduction
The power of nonlinearity
Nonlinearity
Linear classification
Nonlinearity
Linear classification, e.g. Logistic Regression
$$g(\vec{x},\widetilde{w})=P(C_1|\vec{x}) \qquad g(\vec{x},\widetilde{w}) = \frac{1}{1+e^{-(\vec{w}^T\vec{x}+w_0)}}$$
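A minimal scikit-learn sketch of this linear classifier; the synthetic data set is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-attribute data, for illustration only.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           random_state=0)

clf = LogisticRegression().fit(X, y)
print(clf.coef_, clf.intercept_)   # the learned w and w_0
print(clf.predict_proba(X[:3]))    # estimated P(C_0|x) and P(C_1|x)
```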
Nonlinearity
Nonlinear expansion of attributes
- We can expand the attributes non-linearly ($x_1 \times x_2$)
Nonlinearity
Nonlinear expansion of attributes
- We can expand further ($x_1,x_2,x_1x_2,x_1^2,x_2^2$)
Nonlinearity
Nonlinear expansion of attributes
- We can expand further ($x_1,x_2,x_1x_2,x_1^2,x_2^2,x_1^3,x_2^3$)
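A minimal sketch of such an expansion with scikit-learn's PolynomialFeatures, using XOR as an illustrative problem that is not linearly separable in the original attributes (the regularization strength is an arbitrary choice):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])            # XOR: not linearly separable

poly = PolynomialFeatures(degree=2)   # adds x1*x2, x1^2, x2^2 (and a constant)
X_poly = poly.fit_transform(X)

clf = LogisticRegression(C=100, max_iter=1000).fit(X_poly, y)
print(clf.predict(X_poly))            # separable after expansion: [0 1 1 0]
```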
Nonlinearity
Nonlinear expansion of attributes
- With logistic regression this is not practical
- Support Vector Machines do this automatically
$$\underset{\vec{\alpha}}{\operatorname{arg\,max}}\sum\limits_{n=1}^N \alpha_n -\frac{1}{2} \sum\limits_{n=1}^{N}\sum\limits_{m=1}^{N} \alpha_n \alpha_m y_n y_m K(\vec{x}_n,\vec{x}_m) $$
- where $\vec{\alpha}$ is a vector of coefficients and $K(\vec{x}_n,\vec{x}_m) = \phi(\vec{x}_n)^T\phi(\vec{x}_m)$ is the kernel function for some non-linear expansion $\phi$ of our original data
Nonlinearity
Nonlinear expansion of attributes
- Example, using a polynomial kernel: $K(\vec{x},\vec{z}) = (\vec{x}^T\vec{z} + 1)^2$
Nonlinearity
Nonlinear expansion of attributes
- Example, using a polynomial kernel: $K(\vec{x},\vec{z}) = (\vec{x}^T\vec{z} + 1)^3$
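A minimal scikit-learn sketch; SVC's polynomial kernel computes $(\gamma\,\vec{x}^T\vec{z} + c_0)^d$, so gamma=1 and coef0=1 reproduce the kernel above (the XOR data and C value are illustrative choices):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])                   # XOR again

# kernel="poly", degree=2, gamma=1, coef0=1  ->  K(x, z) = (x^T z + 1)^2
clf = SVC(kernel="poly", degree=2, gamma=1, coef0=1, C=10).fit(X, y)
print(clf.predict(X))                        # [0 1 1 0]
print(clf.dual_coef_)                        # y_n * alpha_n of the support vectors
```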
Deep Learning
No free lunch
No free lunch
No-free-lunch theorems (Wolpert and Macready, 1997)
"[I]f an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems."
Important for two reasons:
- No single model can be best at all tasks:
- We need to create different models optimized for different tasks
- Overfitting
- The chosen hypothesis may fit the training data so closely that it does not generalize
Overfitting
Nonlinearity is important for capturing patterns in data
- But can lead to loss of generalization
"With great power comes great overfitting"
Benjamin Parker (attributed)
Overfitting
Occurs when the model adjusts to noise
- Some details are informative about patterns in the population
- Some are particular to the data sample and do not generalize
Overfitting
Occurs when the model adjusts to noise
- Measuring overfitting:
- Evaluate outside the training set (see the split sketch after this list)
- Validation set: used for selecting best model, hyperparameters, ...
- Test set: used to obtain unbiased estimate of the true error
- Preventing overfitting:
- Adjust training (regularization)
- Select adequate model
- Use more data (allows more powerful models)
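A minimal sketch of such a train/validation/test split with scikit-learn; the 60/20/20 proportions and synthetic data are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First hold out 40%, then split that half-and-half into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4,
                                                    random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```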
Machine Learning
What do we have in "classical" machine learning?
- Many algorithms do nonlinear transformations
- Many different models
- Great diversity, with different algorithms
- The right features
- Feature extraction usually done by the user
- Preventing overfitting
- Method depends on the algorithm
- Ability to use large amounts of data
Machine Learning
Deep learning helps solve these problems
- Nonlinear transformations, stacked
- Many different models
- but all built from artificial neurons
- The right features
- can be determined automatically by the model during training
- Preventing overfitting
- Ability to use large amounts of data
Introduction
Summary
- Overview of the course
- AI and Machine learning
- Nonlinear transformations and Overfitting
- The promise of deep learning
Further reading:
- Skansi, Introduction to Deep Learning, Chapter 1
- Goodfellow et al., Deep Learning, Chapters 1 and 5