using RandAugment with niacin¶
RandAugment (randaugment) is an algorithm that applies randomized combinations of augmentations to training data, and experimentally it appears to produce results on par with more complicated data augmentation policies like AutoAugment (autoaugment).
In the RandAugment algorithm, there is a predefined set of k augmentation functions. For each training sample, a random subset of n << k transformation functions is drawn and applied to that sample with a strength or magnitude m. Here, k, n, and m are hyperparameters of the augmentation policy.
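Conceptually, the procedure looks something like the following sketch. This is only an illustration of the idea, not niacin's implementation; in particular, the magnitude keyword argument is an assumption made for the example.

import random

def rand_augment(sample, transforms, n, m):
    """Illustrative sketch: draw n of the k available transforms and
    apply each to the sample with strength m."""
    chosen = random.sample(transforms, n)  # random subset of size n << k
    for tx in chosen:
        sample = tx(sample, magnitude=m)   # 'magnitude' is an assumed keyword, for illustration
    return sample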
Usage¶
To use RandAugment with enrichment functions from niacin, start by creating an instance of the RandAugment class, and provide a set of transformation functions to be considered. You can set n and m directly, or search over them via a hyperparameter tuning algorithm.
from niacin.augment import RandAugment
from niacin.text import en
augmentor = RandAugment([
    en.add_synonyms,
    en.add_hyponyms,
    en.add_misspelling,
    en.swap_words,
    en.add_contractions,
    en.add_whitespace,
], n=2, m=15, shuffle=False)
You can now use this object directly, to augment input data by hand:
text = [
    "No reading or writing makes a savage of men",
    "They were praying for jail, but I mastered the pen",
]
for data in text:
    for tx in augmentor:
        data = tx(data)
    print(data)
returns
No reading or wr it ing make a savage of men
They were praying for jail, but I mastered the ballpoint
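As mentioned above, n and m can also be searched over rather than set by hand. A rough random-search sketch might look like the following; evaluate_policy here is a stand-in for your own training and validation loop, and is not part of niacin.

import random

best_score, best_policy = float("-inf"), None
for _ in range(10):
    n = random.randint(1, 4)   # number of transforms applied per sample
    m = random.randint(5, 30)  # magnitude of each transform
    candidate = RandAugment([
        en.add_synonyms,
        en.add_misspelling,
        en.swap_words,
    ], n=n, m=m, shuffle=False)
    score = evaluate_policy(candidate)  # hypothetical: train and score a model using this policy
    if score > best_score:
        best_score, best_policy = score, candidate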
With PyTorch Datasets¶
If you are using the PyTorch Dataset classes defined in niacin [1], you can give the augmentor to the Dataset class, and have it apply the transformations on the fly when it retrieves data:
from niacin.text.compat.pytorch import MemoryTextDataset
from torch.utils.data import DataLoader

text = [
    "Tell them how we are funding all of these kids to go to college",
    "Tell them how we are ceasing all these wars and stopping violence",
]
dataset = MemoryTextDataset(data=text, labels=[1, 1], transforms=augmentor)
loader = DataLoader(dataset)
for epoch in range(3):
    print(epoch)
    for data, labels in loader:
        print(labels, data)
returns
0
tensor([1]) tensor([[ 0, 0, 6, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 16, 7, 0, 8, 14,
0, 0, 0, 0]])
tensor([1]) tensor([[ 2, 6, 5, 9, 4, 11, 3, 7, 0, 10, 17, 18]])
1
tensor([1]) tensor([[ 2, 6, 5, 9, 4, 13, 3, 16, 7, 0, 8, 14, 8, 12]])
tensor([1]) tensor([[ 2, 6, 9, 5, 4, 11, 3, 7, 19, 10, 17, 18]])
2
tensor([1]) tensor([[ 2, 6, 5, 9, 4, 13, 3, 16, 0, 0, 8, 14, 8, 12]])
tensor([1]) tensor([[ 2, 6, 5, 9, 0, 11, 3, 7, 0, 10, 18, 17]])
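Because the augmentor is applied each time a sample is retrieved, the encoded sequences for the same two sentences differ from epoch to epoch.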
[1] using niacin with pytorch loaders