about this course
Machines have had some ability to understand and generate text for several decades.
More recently, neural approaches have revolutionized the field.
The most striking and impressive models are large Transformer-based models
trained on large amounts of text.
But if "Attention is All You Need,"
then maybe we don't need those large language models at all.
If "Language Models are Few-Shot Learners,"
then maybe small language models can learn from a few examples too.
Maybe we can develop a small language model that better understands our language
if we "pay attention" to the language.
Training a model to examine the links between words in a sequence,
and to model those relationships directly, teaches it to understand human language.
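As a rough illustration (a minimal NumPy sketch, not code from this course; the function and variable names are hypothetical), self-attention scores every pair of positions in a sequence and uses those scores to mix information across the sequence:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score every query against every key: these scores are the "links between words".
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of every position in the sequence.
    return weights @ V

# Toy self-attention: 4 token positions, 8-dimensional vectors (illustrative only).
x = np.random.default_rng(0).normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)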
Super-sizing a model super-sizes the cost. It does not super-size the understanding.
Larger models tend to perform better than smaller models, but
performance gains diminish as model size increases.
At large sizes, language models can become superficially fluent
without truly understanding human language. Such
"Stochastic Parrots"
simply repeat long sequences that they learned from the training data.
So we need to "pay attention" to those datasets
and ask what the model has learned.
Language models perform better in the domain that they have been trained on.
A model trained on Wikipedia will not understand Huckleberry Finn.
So we also need to ask if there is good reason to believe that
a large language model can be fine-tuned for a given task.
Many times there will be a good reason. Sometimes there will not.
And when the language is not English, a large language model
that can be fine-tuned for any task might not exist at all.
In those cases, we need to "pay attention" to the language, so that
we can train a small language model that understands our (non-English) language.
what you will learn
This course will compare the performance of RNNs, Transformers, BERT, and GPT to that of earlier approaches.
And it will pay particular attention to how those performance gains were achieved.
Did the researchers develop a better model? Or did they train a larger model?
For example, the fluency and translation quality of neural translation models
far surpass those of phrase-based statistical models,
even in low-resource settings.
But what's important is how those performance gains were achieved.
Instead of translating words or phrases, the neural approach attempts to understand context.
Neural models translate better than phrase-based models because they attempt to
create a sentence in the target language with the same meaning as the source sentence.
In that spirit, this course will explore neural approaches to natural language processing.
Comparing them, it will ask how we can develop models that better understand our language.
By training small, comparably sized models, we can
compare approaches. Holding model size constant, we'll ask
which training or fine-tuning technique performs best on a given task.
By identifying the techniques that work well at small scale,
we'll find the techniques that work exceptionally well at large scale.