
Trang 1

Writing Code for NLP

Trang 2

Who we are

Matt Gardner (nlpmattg) is a research scientist on AllenNLP. He was the original architect of AllenNLP, and he co-hosts the NLP Highlights podcast.

Mark is a research engineer on AllenNLP. He helped build AllenNLP and its precursor DeepQA with Matt, and has implemented many of the models in the demos.

Joel is a research engineer on AllenNLP, although you may know him better from "I Don't Like Notebooks", from "Fizz Buzz in Tensorflow", or from his book Data Science from Scratch.

Trang 3

BREAK

Trang 4

What we expect you know already

Trang 5

modern (neural) NLP

Trang 6

Python

Trang 7

the difference between good science and bad science

Trang 8

What you'll learn today

Trang 9

how to write code in a way that facilitates good science and reproducible experiments

Trang 10

how to write code in a way that makes your life easier

Trang 11

The Elephant in the Room: AllenNLP

- AllenNLP represents our experiences and opinions about how best to write research code

- We use it in most of our examples

- We hope you come out of this tutorial wanting to give it a try

- But the tutorial should be useful even if you never use AllenNLP

Trang 12

Two modes of writing research code

Trang 13

1: prototyping

2: writing components

Trang 14

Prototyping New Models

Trang 15

Main goals during prototyping

- Write code quickly

- Run experiments, keep track of what you tried

- Analyze model behavior - did it do what you wanted?

Trang 16

- Write code quickly

Trang 17

Writing code quickly - Use a framework!

Trang 18

- Training loop?

model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM,
                   len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
...
sentence_in = prepare_sequence(sentence, word_to_ix)
targets = prepare_sequence(tags, tag_to_ix)
...
                      accuracy=accuracy)
        optimizer.step()
    else:
        validation_loss += loss.item()
        t.set_postfix(validation_loss=validation_loss / (i + 1),
                      accuracy=accuracy)
validation_losses.append(validation_loss)

Trang 20

- Tensorboard logging?

- Model checkpointing?

- Complex data processing, with smart batching?

- Computing span representations?

- Bi-directional attention matrices?

- Easily thousands of lines of code!

Trang 21

- Don’t start from scratch! Use someone else’s components

Trang 22

- But


- Make sure you can bypass the abstractions when you need to

Trang 24

Writing code quickly - Get a good starting place

Trang 25

- First step: get a baseline running

- This is good research practice, too

Trang 26

- Could be someone else’s code as long as you can read it


Trang 28

- Even better if this code already modularizes what you want to change

Add ELMo / BERT here

Trang 29

- Re-implementing a SOTA baseline is incredibly helpful for understanding what’s going on, and where some decisions might have been made better

Trang 30

Writing code quickly - Copy first, refactor later

- CS degree:


We’re prototyping! Just go fast and find something that works, then go back and refactor (if you made something useful)

Trang 33

- Really bad idea: using inheritance to share code for related models

- Instead: just copy the code, figure out how to share later, if it makes sense

Trang 34

Writing code quickly - Do use good code style

- CS degree:


Trang 39

Meaningful names

Trang 40

Shape comments on tensors

Trang 41

Comments describing non-obvious logic

Trang 42

Write code for people, not machines

Trang 43

Writing code quickly - Minimal testing (but not no testing)

- CS degree:


Trang 45

- A test that checks experimental behavior is a waste of time

Trang 46

- But, some parts of your code aren’t experimental

Trang 47

- And even experimental parts can have useful tests

Trang 48

Make sure that data processing works consistently, that tensor operations run, and that gradients are non-zero

Trang 49

Run on small test fixtures, so the debugging cycle is seconds, not minutes

Trang 50

Writing code quickly - How much to hard-code?

- Which one should I do?

Trang 51

I’m just prototyping! Why shouldn’t I just hard-code an embedding layer?

Trang 52

Why so abstract?

Trang 53

On the parts that aren’t what you’re focusing on, you start simple. Later you can add ELMo, etc., without rewriting your code.

Trang 54

This also makes controlled experiments easier (both for you and for people who come after you)

Trang 55

And it helps you think more clearly about the pieces of your model
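A rough sketch of what "abstract" buys you, in plain Python rather than PyTorch so it stays self-contained. `TextEmbedder`, `LookupEmbedder`, `CharFeatureEmbedder` and `Tagger` are hypothetical names made up for this example (they are not AllenNLP classes), and the "embeddings" are toy numbers. The point is only that the model depends on an interface, so swapping the embedder is a one-line change and everything else is held fixed.

```python
import random
from typing import Dict, List, Protocol


class TextEmbedder(Protocol):
    """Anything that maps a list of tokens to one vector per token."""

    def output_dim(self) -> int: ...

    def embed(self, tokens: List[str]) -> List[List[float]]: ...


class LookupEmbedder:
    """Simple starting point: one fixed random vector per word type."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self._dim = dim
        self._rng = random.Random(seed)
        self._table: Dict[str, List[float]] = {}

    def output_dim(self) -> int:
        return self._dim

    def embed(self, tokens: List[str]) -> List[List[float]]:
        for token in tokens:
            if token not in self._table:
                self._table[token] = [self._rng.uniform(-1, 1) for _ in range(self._dim)]
        return [self._table[token] for token in tokens]


class CharFeatureEmbedder:
    """Stand-in for a fancier embedder: [token length, number of capitals]."""

    def output_dim(self) -> int:
        return 2

    def embed(self, tokens: List[str]) -> List[List[float]]:
        return [[float(len(t)), float(sum(c.isupper() for c in t))] for t in tokens]


class Tagger:
    """The 'model' only knows about the TextEmbedder interface."""

    def __init__(self, embedder: TextEmbedder) -> None:
        self.embedder = embedder

    def encode(self, tokens: List[str]) -> List[float]:
        # shape: (num_tokens, embedding_dim)
        embedded = self.embedder.embed(tokens)
        # shape: (embedding_dim,), the average over tokens
        dim = self.embedder.output_dim()
        return [sum(vec[d] for vec in embedded) / len(embedded) for d in range(dim)]


sentence = ["Matt", "likes", "NLP"]
baseline_encoding = Tagger(LookupEmbedder(dim=4)).encode(sentence)
variant_encoding = Tagger(CharFeatureEmbedder()).encode(sentence)
```

Only the constructor argument differs between the two runs, which is exactly the shape a controlled experiment wants.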

Trang 56

- Run experiments, keep track of what you tried

Trang 57

Running experiments - Keep track of what you ran

- You run a lot of stuff when you’re prototyping; it can be hard to keep track of what happened when, and with what code
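If you have no experiment-tracking tool at all, even something as small as the following helps. This is a minimal sketch, not a description of any particular tool: `record_run` and the `run_info.json` file name are made up for this example, and it assumes the code lives in a git repository (it falls back to "unknown" otherwise).

```python
import json
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import Any, Dict


def current_git_commit() -> str:
    """Return the current commit hash, or 'unknown' outside a git repository."""
    try:
        result = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"


def record_run(run_dir: Path, config: Dict[str, Any]) -> Path:
    """Write what was run (code version, command, config) next to the results."""
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "started_at": time.strftime("%Y-%m-%d %H:%M:%S"),
        "git_commit": current_git_commit(),
        "command": sys.argv,
        "config": config,
    }
    path = run_dir / "run_info.json"
    path.write_text(json.dumps(record, indent=2))
    return path


# Usage: one directory per run, written before training starts.
with tempfile.TemporaryDirectory() as tmp:
    info_path = record_run(Path(tmp) / "lstm_glove_run1",
                           {"encoder": "lstm", "embedding": "glove"})
    saved = json.loads(info_path.read_text())
```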

Trang 58

Trang 59

This is important!

Trang 60

- Currently in invite-only alpha; public beta coming soon

- https://github.com/allenai/beaker

- https://beaker-pub.allenai.org

Trang 61

Trang 64

Running experiments - Controlled experiments

- Which one gives more understanding?

Trang 65

Important for putting your work in context

Trang 66

But… too many moving parts, hard to know what caused the difference

Trang 67

Very controlled experiments, varying one thing: we can make causal claims

Trang 68

How do you set up your code for this?

Trang 69

Trang 70

Possible ablations

Trang 71

GloVe vs character CNN vs ELMo vs BERT

Trang 72

LSTM vs Transformer vs GatedCNN vs QRNN

Trang 73

- Not good: modifying code to run different variants; hard to keep track of what you ran

- Better: configuration files or separate scripts, one per variant, so you have a record of exactly what you ran
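One way this can look, as a sketch under assumptions: each variant is a small JSON config, and the code checks that two configs differ in exactly the thing you meant to vary. `ExperimentSpec`, `load_spec`, `changed_fields` and the option names are invented for this example; AllenNLP has its own configuration format, which is not what is shown here.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical option names, mirroring the ablations listed above.
ALLOWED_CHOICES = {
    "embedder": {"glove", "char_cnn", "elmo", "bert"},
    "encoder": {"lstm", "transformer", "gated_cnn", "qrnn"},
}


@dataclass(frozen=True)
class ExperimentSpec:
    embedder: str
    encoder: str
    hidden_dim: int


def load_spec(config_text: str) -> ExperimentSpec:
    """Parse one experiment config and reject unknown component names."""
    raw = json.loads(config_text)
    for key, allowed in ALLOWED_CHOICES.items():
        if raw[key] not in allowed:
            raise ValueError(f"unknown {key}: {raw[key]!r}")
    return ExperimentSpec(**raw)


def changed_fields(a: ExperimentSpec, b: ExperimentSpec) -> List[str]:
    """Names of the settings that differ between two experiments."""
    first, second = asdict(a), asdict(b)
    return [name for name in first if first[name] != second[name]]


BASELINE_CONFIG = '{"embedder": "glove", "encoder": "lstm", "hidden_dim": 200}'
ELMO_CONFIG = '{"embedder": "elmo", "encoder": "lstm", "hidden_dim": 200}'

baseline = load_spec(BASELINE_CONFIG)
variant = load_spec(ELMO_CONFIG)
difference = changed_fields(baseline, variant)
```

If `difference` has more than one entry, the comparison is no longer an ablation of a single component.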

Trang 74

- Analyze model behavior - did it do what you wanted?

Trang 75

Analyze results - Tensorboard

- Crucial tool for understanding model behavior during training

- There is no better visualizer. If you don’t use this, start now

- A good training loop will give you this for free, for any model

Trang 77

Trang 78

Tensorboard will find optimisation bugs for you for free.

Here, the gradient for the embedding is 2 orders of magnitude different from the rest of the gradients.

Embeddings have sparse gradients (only some embeddings are updated), but the momentum coefficients from Adam are calculated for the whole embedding every step.

from allennlp.training.optimizers import DenseSparseAdam

(uses sparse accumulators for gradient moments)
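To see why this matters, here is a toy illustration in plain Python. It is not AllenNLP's DenseSparseAdam and makes no claim about its implementation; `ScalarAdam` and `skip_missing` are made up for this sketch. Each "embedding row" is a single scalar, and a gradient of `None` means the row did not appear in the batch. With the dense update, the row keeps moving on stale momentum; with the sparse-style update, it stays put.

```python
import math
from typing import List, Optional, Tuple


class ScalarAdam:
    """Adam over a list of scalar parameters (one scalar per 'embedding row')."""

    def __init__(self, num_params: int, lr: float = 0.001, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8,
                 skip_missing: bool = False) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.skip_missing = skip_missing
        self.params = [0.0] * num_params
        self.m = [0.0] * num_params  # first-moment estimates
        self.v = [0.0] * num_params  # second-moment estimates
        self.t = 0

    def step(self, grads: List[Optional[float]]) -> None:
        self.t += 1
        for i, grad in enumerate(grads):
            if grad is None:
                if self.skip_missing:
                    continue  # sparse-style: leave the row and its moments alone
                grad = 0.0    # dense-style: the row just has a zero gradient
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * grad
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * grad * grad
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def run(skip_missing: bool) -> Tuple[float, float]:
    """Row 1 gets a gradient once, then is absent from five more batches."""
    optimizer = ScalarAdam(num_params=2, skip_missing=skip_missing)
    optimizer.step([1.0, 1.0])
    after_first_step = optimizer.params[1]
    for _ in range(5):
        optimizer.step([1.0, None])
    return after_first_step, optimizer.params[1]


dense_first, dense_last = run(skip_missing=False)
sparse_first, sparse_last = run(skip_missing=True)
```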

Trang 81

Analyze results - Look at your data!

- Good:

Trang 82

- Better:


Trang 84

- Best:

How do you design your code for this?

We’ll say more later, but the key points are:

- Separate data processing that also works on JSON

- Model needs to run without labels / computing loss
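A toy sketch of those two points, with every name invented for this example (`text_to_instance`, `read_training_line`, `forward`, `predict_json`) and a keyword-counting "model" standing in for a real one. The same data processing serves the training file and a JSON request, and the model returns predictions either way, adding a loss only when a label is present.

```python
import json
from typing import Any, Dict, List, Optional

LABELS = ["negative", "positive"]
POSITIVE_WORDS = {"good", "great", "love"}


def text_to_instance(text: str, label: Optional[str] = None) -> Dict[str, Any]:
    """The single place where raw text becomes model input; the label is optional."""
    instance: Dict[str, Any] = {"tokens": text.lower().split()}
    if label is not None:
        instance["label"] = LABELS.index(label)
    return instance


def read_training_line(line: str) -> Dict[str, Any]:
    """Training data format assumed here: 'label<TAB>text'."""
    label, text = line.rstrip("\n").split("\t")
    return text_to_instance(text, label)


def forward(tokens: List[str], label: Optional[int] = None) -> Dict[str, Any]:
    """Toy model: always returns predictions, and a loss only if a label is given."""
    score = sum(token in POSITIVE_WORDS for token in tokens) / max(len(tokens), 1)
    probs = [1.0 - score, score]
    output: Dict[str, Any] = {"probs": probs, "prediction": LABELS[int(score > 0)]}
    if label is not None:
        output["loss"] = 1.0 - probs[label]
    return output


def predict_json(json_text: str) -> str:
    """Demo or analysis entry point: JSON in, JSON out, no labels needed."""
    instance = text_to_instance(json.loads(json_text)["sentence"])
    return json.dumps(forward(**instance))


training_output = forward(**read_training_line("positive\tI love this"))
demo_output = json.loads(predict_json('{"sentence": "a great movie"}'))
```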

Trang 87

Key point during prototyping: the components that you use matter. A lot.

Trang 88

We’ll give specific thoughts on designing components after the break

Trang 89

Developing Good Processes

Trang 90

Source Control

Trang 91

We Hope You're Already Using Source Control!

- makes it easy to safely experiment with code changes

○ if things go wrong, just revert!

Trang 92

Trang 93

Trang 94

Trang 95

That's right, code reviews!

Trang 96

About Code Reviews

... and clear, readable code allows your code reviews to be discussions of your modeling decisions

... are wrong because of a bug

Trang 101

Continuous Integration (+ Build Automation)


Continuous Integration

always be merging (into a branch)

Build Automation

always be running your tests (+ other checks)

(this means you have to write tests)

Trang 103

Example: Typical AllenNLP PR

Trang 105

if you're not building a library that lots of other people rely on, you probably don't need all these steps

Trang 106

but you do need some of them

Trang 107

Testing Your Code

Trang 108

What do we mean by "test your code"?

Trang 109

Write Unit Tests

a unit test is an automated check that a small part of your code works correctly

Trang 110

What should I test?

Trang 111

If You're Prototyping, Test the Basics

Trang 112

Prototyping? Test the Basics

def test_read_from_file(self):
    conll_reader = Conll2003DatasetReader()
    instances = conll_reader.read('data/conll2003.txt')
    instances = ensure_list(instances)

    expected_labels = ['I-ORG', 'O', 'I-PER', 'O', 'O', 'I-LOC', 'O']

    fields = instances[0].fields
    tokens = [t.text for t in fields['tokens'].tokens]
    # ... (assertions on the tokens are missing from the slide; the expected token list ends with '.')

    fields = instances[1].fields
    tokens = [t.text for t in fields['tokens'].tokens]

Trang 113

Prototyping? Test the Basics

assert len(tags[0]) == 7
assert len(tags[1]) == 7
# ... (the loop over predicted tag ids is missing from the slide)
    tag = idx_to_token[tag_id]
    assert tag in {'O', 'I-ORG', 'I-PER', 'I-LOC'}

Trang 114

If You're Writing Reusable Components, Test Everything

Trang 115

Test Everything

test your model can train, save, and load

Trang 116

Test Everything

test that it's computing / backpropagating gradients

Trang 117

Test Everything

but how?

Trang 118

Use Test Fixtures

create tiny datasets that look like the real thing

The###DET dog###NN ate###V the###DET apple###NN

Everybody###NN read###V that###DET book###NN
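As a sketch of how such a fixture gets used: the two sentences above go into a tiny file, and a test reads them back. `read_tagged_sentences` is a hypothetical reader written for this example, not an AllenNLP dataset reader; the only thing taken from the slide is the `word###TAG` format.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

FIXTURE = (
    "The###DET dog###NN ate###V the###DET apple###NN\n"
    "Everybody###NN read###V that###DET book###NN\n"
)


def read_tagged_sentences(path: Path, delimiter: str = "###") -> List[List[Tuple[str, str]]]:
    """Read one sentence per line, each token written as word<delimiter>tag."""
    sentences = []
    with open(path) as data_file:
        for line in data_file:
            line = line.strip()
            if not line:
                continue
            pairs = []
            for item in line.split():
                word, tag = item.rsplit(delimiter, 1)
                pairs.append((word, tag))
            sentences.append(pairs)
    return sentences


def test_read_from_fixture() -> List[List[Tuple[str, str]]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sentences.tsv"
        path.write_text(FIXTURE)
        sentences = read_tagged_sentences(path)
    assert len(sentences) == 2
    assert [word for word, _ in sentences[0]] == ["The", "dog", "ate", "the", "apple"]
    assert [tag for _, tag in sentences[1]] == ["NN", "V", "DET", "NN"]
    return sentences


fixture_sentences = test_read_from_fixture()
```

The whole test touches nine tokens, so it runs in well under a second.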

Trang 119

Use Test Fixtures

use them to create tiny pretrained models

It’s ok if the weights are essentially random. We’re not testing that the model is any good.

Trang 120

Use Test Fixtures

○ detect logic errors

○ detect malformed outputs

○ detect incorrect outputs
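Here is a sketch of a train / save / load round-trip test on a fixture that small. `TinyTagger` is a made-up most-frequent-tag model, used only so the example runs without PyTorch; with a real model the test has the same shape (train briefly on the fixture, save, reload, compare predictions).

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]


class TinyTagger:
    """Toy model: tags each word with the tag it was seen with most often."""

    def __init__(self, default_tag: str = "NN") -> None:
        self.default_tag = default_tag
        self.best_tag: Dict[str, str] = {}

    def train(self, data: List[TaggedSentence]) -> None:
        counts: Dict[str, Counter] = {}
        for sentence in data:
            for word, tag in sentence:
                counts.setdefault(word.lower(), Counter())[tag] += 1
        self.best_tag = {word: c.most_common(1)[0][0] for word, c in counts.items()}

    def predict(self, words: List[str]) -> List[str]:
        return [self.best_tag.get(word.lower(), self.default_tag) for word in words]

    def save(self, path: Path) -> None:
        state = {"default_tag": self.default_tag, "best_tag": self.best_tag}
        path.write_text(json.dumps(state))

    @classmethod
    def load(cls, path: Path) -> "TinyTagger":
        state = json.loads(path.read_text())
        model = cls(state["default_tag"])
        model.best_tag = state["best_tag"]
        return model


def test_model_can_train_save_and_load() -> List[str]:
    fixture = [
        [("The", "DET"), ("dog", "NN"), ("ate", "V"), ("the", "DET"), ("apple", "NN")],
        [("Everybody", "NN"), ("read", "V"), ("that", "DET"), ("book", "NN")],
    ]
    model = TinyTagger()
    model.train(fixture)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.json"
        model.save(path)
        reloaded = TinyTagger.load(path)
    sentence = ["The", "dog", "read", "that", "unknown"]
    # Malformed output: one tag per input word, no more, no fewer.
    assert len(reloaded.predict(sentence)) == len(sentence)
    # Save / load must not change what the model predicts.
    assert reloaded.predict(sentence) == model.predict(sentence)
    return model.predict(sentence)


predicted_tags = test_model_can_train_save_and_load()
```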

Trang 121

Use your knowledge to write clever tests

def test_attention_is_normalised_correctly(self):
    # In order to test the attention, we'll make the weight which
    # computes the logits zero, so the attention distribution is
    # uniform over the sentence. This lets us check that the
    # computed spans are just the averages of their representations.
    ...
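The same trick in a form that runs on its own, as a sketch in plain Python (the `softmax` and `attend` helpers are written for this example and are not the AllenNLP span extractor the slide is testing). With a zero weight vector every logit is 0, the softmax is uniform, and the attention-weighted sum has to equal the plain average.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors: List[List[float]], weight: List[float]) -> List[float]:
    """Attention-weighted sum of vectors; each logit is dot(vector, weight)."""
    logits = [sum(x * w for x, w in zip(vec, weight)) for vec in vectors]
    attention = softmax(logits)
    dim = len(vectors[0])
    return [sum(a * vec[d] for a, vec in zip(attention, vectors)) for d in range(dim)]


def test_attention_is_normalised_correctly() -> List[float]:
    # shape: (span_length, embedding_dim)
    span = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
    zero_weight = [0.0, 0.0]
    result = attend(span, zero_weight)
    # Uniform attention, so the result must be the column-wise mean.
    expected = [3.0, 5.0]
    assert all(math.isclose(r, e) for r, e in zip(result, expected))
    return result


uniform_result = test_attention_is_normalised_correctly()
```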

Trang 123

Pre-Break Summary

○ Difference between prototyping and building components

○ When should you transition?

○ Good ways to analyse results

○ How to write good tests

○ How to know what to test

○ Why you should do code reviews

Trang 125

Reusable Components

Trang 126

What are the right abstractions for NLP?

Trang 127

The Right Abstractions

consistently proven useful

Trang 128

Things That We Use A Lot

Trang 129

Things That Require a Fair Amount of Code

sequence of tensors with a single tensor

Trang 130

Things That Have Many Variations

Trang 131

Things that reflect our higher-level thinking

○ text, almost certainly

Trang 132

Along the way, we need to worry about some things that make NLP tricky

Trang 133

Inputs are text, but neural models want tensors

Trang 134

Inputs are sequences of things, and order matters

Trang 135

Inputs can vary in length

Some sentences are short

Whereas other sentences are so long that by the time you finish reading them you've already forgotten what they started off talking about and you have to go back and read them a second time in order to remember the parts at the beginning
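The usual answer to varying lengths is padding plus a mask. A minimal sketch with nested lists standing in for tensors; `PADDING_ID`, `pad_batch` and `masked_lengths` are names chosen for this example:

```python
from typing import List, Tuple

PADDING_ID = 0  # assumption: index 0 is reserved for padding


def pad_batch(batch: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence to the longest one and return (padded ids, mask)."""
    max_len = max(len(sequence) for sequence in batch)
    # shape of both outputs: (batch_size, max_len)
    padded = [seq + [PADDING_ID] * (max_len - len(seq)) for seq in batch]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return padded, mask


def masked_lengths(mask: List[List[int]]) -> List[int]:
    """Recover the true sequence lengths from the mask."""
    return [sum(row) for row in mask]


short_sentence = [5, 8]
long_sentence = [3, 9, 4, 7, 2]
padded_ids, padding_mask = pad_batch([short_sentence, long_sentence])
lengths = masked_lengths(padding_mask)
```

Everything downstream (attention, pooling, loss) then has to respect the mask, which is one reason this logic is worth writing once and reusing.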

Trang 136

Reusable Components in AllenNLP

Trang 137

AllenNLP is built on PyTorch

Trang 138

and is inspired by the question "what higher-level components would help NLP researchers do their research better + more easily?"

Trang 139

under the covers, every piece of a model is a torch.nn.Module and every number is part of a torch.Tensor

Trang 140

but we want you to be able to reason at a higher level most of the time

Trang 141

hence the higher level concepts

Trang 142

the Model

class Model(torch.nn.Module, Registrable):
    def __init__(self,
                 vocab: Vocabulary,
                 regularizer: RegularizerApplicator = None) -> None: ...

    def forward(self, *inputs) -> Dict[str, torch.Tensor]: ...

    def get_metrics(self, reset: bool = False) -> Dict[str, float]: ...

Trang 143

○ which is good, since at inference / prediction time you don't have one

you'd want in an output dataset or a demo

Trang 144

every NLP project needs a Vocabulary

class Vocabulary(Registrable):
    def __init__(self,
                 counter: Dict[str, Dict[str, int]] = None,
                 min_count: Dict[str, int] = None,
                 max_vocab_size: Union[int, Dict[str, int]] = None,
                 non_padded_namespaces: Iterable[str] = DEFAULT_NON_PADDED_NAMESPACES,
                 pretrained_files: Optional[Dict[str, str]] = None,
                 only_include_pretrained_words: bool = False,
                 tokens_to_add: Dict[str, List[str]] = None,
                 min_pretrained_embeddings: Dict[str, int] = None) -> None: ...

    @classmethod
    def from_instances(cls, instances: Iterable['Instance']) -> 'Vocabulary': ...

    def add_token_to_namespace(self, token: str, namespace: str = 'tokens') -> int: ...

    def get_token_index(self, token: str, namespace: str = 'tokens') -> int: ...

    def get_token_from_index(self, index: int, namespace: str = 'tokens') -> str:
        return self._index_to_token[namespace][index]

    def get_vocab_size(self, namespace: str = 'tokens') -> int:
        return len(self._token_to_index[namespace])
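To make the namespace idea concrete, here is a stripped-down sketch with the same method names as the slide. `TinyVocabulary` is not the AllenNLP class: the `<pad>` / `<unk>` strings and the choice of which namespaces get them are assumptions made for this example, and counting, pretrained files and size limits are left out entirely.

```python
from typing import Dict, Iterable

PADDING_TOKEN = "<pad>"   # assumption for this sketch
UNKNOWN_TOKEN = "<unk>"   # assumption for this sketch


class TinyVocabulary:
    """Maps strings to integer ids, separately for each namespace."""

    def __init__(self, non_padded_namespaces: Iterable[str] = ("labels",)) -> None:
        self._non_padded = set(non_padded_namespaces)
        self._token_to_index: Dict[str, Dict[str, int]] = {}
        self._index_to_token: Dict[str, Dict[int, str]] = {}

    def _namespace(self, namespace: str) -> Dict[str, int]:
        """Create the namespace on first use; padded ones start with pad and unk."""
        if namespace not in self._token_to_index:
            special = [] if namespace in self._non_padded else [PADDING_TOKEN, UNKNOWN_TOKEN]
            self._token_to_index[namespace] = {tok: i for i, tok in enumerate(special)}
            self._index_to_token[namespace] = dict(enumerate(special))
        return self._token_to_index[namespace]

    def add_token_to_namespace(self, token: str, namespace: str = "tokens") -> int:
        mapping = self._namespace(namespace)
        if token not in mapping:
            index = len(mapping)
            mapping[token] = index
            self._index_to_token[namespace][index] = token
        return mapping[token]

    def get_token_index(self, token: str, namespace: str = "tokens") -> int:
        mapping = self._namespace(namespace)
        if token in mapping:
            return mapping[token]
        # Unseen words fall back to the unknown token (KeyError in non-padded namespaces).
        return mapping[UNKNOWN_TOKEN]

    def get_token_from_index(self, index: int, namespace: str = "tokens") -> str:
        return self._index_to_token[namespace][index]

    def get_vocab_size(self, namespace: str = "tokens") -> int:
        return len(self._token_to_index[namespace])


vocab = TinyVocabulary()
for word in "the dog ate the apple".split():
    vocab.add_token_to_namespace(word)
for tag in ["DET", "NN", "V"]:
    vocab.add_token_to_namespace(tag, namespace="labels")
sentence_ids = [vocab.get_token_index(word) for word in "the zebra ate".split()]
```

Words and labels get independent id spaces, and only the word namespace reserves ids for padding and unknown tokens.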
