Tensorflow + PyTorch in 10 minutes

A machine learning cheat sheet, because there are too many ways to do any one thing.

This page is a living document that records an opinionated but sufficient subset of the scaffolding required to build a production-level project on Tensorflow or PyTorch. The Tensorflow section is complete, and a PyTorch overview is underway.

The snippets below are aimed at advanced uses of TF and PyTorch. Frameworks like Keras/FastAI/Lightning/Catalyst are intentionally excluded.

Tensorflow 2.x

Preparing Data

Data for TF/Keras models is best handled as tf.data.Dataset objects.

Creating Datasets

Datasets can be created with the from_tensors or from_tensor_slices methods, which, despite their names, take any tensor-ish object as input. This includes NumPy arrays, Python lists, and TF tensors.

import tensorflow as tf

tensor = tf.constant([[1, 2], [3, 4]])

dataset = tf.data.Dataset.from_tensors(tensor)       # [[1, 2], [3, 4]] 1x element of shape (2, 2)
dataset = tf.data.Dataset.from_tensor_slices(tensor) # [1, 2], [3, 4]   2x elements of shape (2,)

Processing Datasets

Often we’ll want to take a list of filenames and process, say, the images in those files. To do this, map the dataset over a parsing function. Specify num_parallel_calls=tf.data.experimental.AUTOTUNE when mapping to allow TF to use builtin heuristics to parallelize the mapping.

import glob
import tensorflow as tf

def parse_function(fname):
    # Read the raw bytes, then decode them into an image tensor
    raw_bytes = tf.io.read_file(fname)
    image = tf.io.decode_jpeg(raw_bytes)
    return image

fnames = glob.glob('images/*.jpg')
dataset = tf.data.Dataset.from_tensor_slices(fnames)
# map is a method of the dataset instance, not of the Dataset class
dataset = dataset.map(parse_function, num_parallel_calls=tf.data.experimental.AUTOTUNE)

Batch, shuffle, and repeat the dataset when training for multiple epochs. Call repeat before batch so that batches stay the same size when the dataset size is not a multiple of the batch size.

dataset = dataset.repeat().shuffle(buffer_size=100, seed=0)
dataset = dataset.batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)
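
Why the order of repeat and batch matters can be seen without TF at all. The following is a minimal pure-Python analogy built on itertools, not tf.data itself; the batch and repeat helpers are hypothetical stand-ins for the Dataset methods of the same name, and a finite repeat count of 2 is assumed so the output can be printed.

```python
from itertools import chain, islice

def batch(items, batch_size):
    """Yield lists of up to batch_size consecutive items (stand-in for Dataset.batch)."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk

def repeat(make_iterable, count):
    """Chain `count` fresh passes over the data (stand-in for Dataset.repeat(count))."""
    return chain.from_iterable(make_iterable() for _ in range(count))

data = list(range(10))  # 10 elements, not a multiple of the batch size
batch_size = 4

# repeat -> batch: batches are filled across the epoch boundary
repeat_then_batch = [len(b) for b in batch(repeat(lambda: data, 2), batch_size)]

# batch -> repeat: every epoch ends with a short batch
batch_then_repeat = [len(b) for b in repeat(lambda: batch(data, batch_size), 2)]

print(repeat_then_batch)  # [4, 4, 4, 4, 4]
print(batch_then_repeat)  # [4, 4, 2, 4, 4, 2]
```

The trade-off is that with repeat before batch a single batch can contain elements from two consecutive epochs.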

It’s often helpful to combine labels and data with zip:

image_data = [[1,2], [3,4]]
label_data = ['apple', 'banana']

image_dataset = tf.data.Dataset.from_tensor_slices(image_data)      # [1,2], [3,4]
label_dataset = tf.data.Dataset.from_tensor_slices(label_data)      # 'apple', 'banana'
final_dataset = tf.data.Dataset.zip((image_dataset, label_dataset)) # ([1,2], 'apple'), ([3,4], 'banana')

See the TF Data Performance Guide for info on optimizing dataset operations. In general: interleave when you have multiple datasets, batch before map, cache when possible.

Creating Models

I recommend subclassing tf.keras.Model even when operating in Tensorflow land. It’s fully compatible and has nice semantics. A very simple model that just wraps ResNet looks like this:

class MyModel(tf.keras.Model):
    def __init__(self, num_classes=10, name='my_model'):
        super(MyModel, self).__init__(name=name)
        # pooling='avg' collapses the spatial dimensions so the classifier emits one logit vector per image
        self.backbone = tf.keras.applications.ResNet101(input_shape=(321,321,3), weights='imagenet', include_top=False, pooling='avg')
        self.classifier = tf.keras.layers.Dense(num_classes, activation=None, kernel_regularizer=None, name='desc_fc')

    def call(self, inputs, training=True):
        # Forward the training flag so batch norm layers behave correctly
        x = self.backbone(inputs, training=training)
        logits = self.classifier(x)
        return logits

You can, of course, nest any module-like objects:

class WrapperModel(tf.keras.Model):
    def __init__(self, backbone, name='wrapper_model'):
        super(WrapperModel, self).__init__(name=name)
        self.backbone = backbone

    def call(self, inputs, training=True):
        x = self.backbone(inputs, training=training)
        return x

backbone_model = MyModel()
model = WrapperModel(backbone_model)

Training

The main steps when training models are:

1. Get the model output
2. Compute a loss
3. Compute the gradients of the loss with respect to the model weights and apply them
4. Repeat
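
The steps above can be sketched without any framework. This is a toy illustration only: the "model" is a single weight w fitted to made-up data for y = 3 * x, the gradient is derived by hand, and the learning rate and step count are arbitrary assumptions. In TF the gradient step is what tf.GradientTape automates.

```python
# Toy data for y = 3 * x; the "model" is a single weight w.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0
learning_rate = 0.01  # arbitrary choice for this sketch
losses = []

for step in range(200):  # 4. repeat
    # 1. get the model output
    preds = [w * x for x in xs]
    # 2. compute a loss (mean squared error)
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    # 3. compute the gradient d(loss)/dw = 2 * mean(error * x) and apply it
    grad = 2.0 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    w -= learning_rate * grad
    losses.append(loss)

print(round(w, 6))  # 3.0
```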

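With a model and a dataset, computing outputs is simple:

model = create_model(num_classes)        # hypothetical helper that returns a tf.keras.Model
batch = next(iter(create_dataset()))     # take(1) would return a Dataset, not a batch

probabilities = model(batch)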

To record execution for automatic differentiation and backprop, use a tf.GradientTape:

optimizer = tf.keras.optimizers.Adam()
# Pass from_logits=True if the model returns raw logits rather than probabilities
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

with tf.GradientTape() as tape:
    probabilities = model(batch)
    loss = loss_fn(labels, probabilities)

gradients = tape.gradient(loss, model.trainable_weights)
# clip_val is the maximum global norm you allow for the gradients
clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
optimizer.apply_gradients(zip(clipped, model.trainable_weights))
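
For intuition, global-norm clipping rescales every gradient by the same factor, clip_norm / max(global_norm, clip_norm), where global_norm is the L2 norm of all gradients taken together. Below is a minimal pure-Python sketch of that formula; the function name and the toy gradient values are made up for illustration, and it operates on plain lists rather than tensors.

```python
import math

def clip_by_global_norm_sketch(grads, clip_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The scale is 1.0 when the gradients are already within the limit
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm

grads = [[3.0, 4.0], [12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm_sketch(grads, clip_norm=6.5)

print(norm)     # 13.0
print(clipped)  # [[1.5, 2.0], [6.0]]
```

Because all gradients share one scale factor, the direction of the overall update is preserved, which is not the case when each tensor is clipped on its own.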

The whole train loop might look like this:

step = 0
while step < max_steps_count:
    # Assumes the iterator yields (labels, batch) pairs
    labels, batch = next(train_dataset_iterator)
    with tf.GradientTape() as tape:
        probabilities = model(batch)
        loss = compute_loss(labels, probabilities)

    gradients = tape.gradient(loss, model.trainable_weights)
    clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
    optimizer.apply_gradients(zip(clipped, model.trainable_weights))
    step += 1  # without this the loop never terminates

From there, recording the progress to Tensorboard is easy:

summary_writer = tf.summary.create_file_writer('train_logs', flush_millis=10000)
with summary_writer.as_default():
    # Pass a callable so the condition is re-evaluated on every step in eager mode
    with tf.summary.record_if(
        lambda: tf.math.equal(0, optimizer.iterations % report_interval)):
        while step < max_steps_count:
            # ... training step (see above)
            tf.summary.scalar(
                'loss/crossentropy', loss, step=optimizer.iterations.numpy())