Expedition at the GPU Technology Conference

This week the team at Expedition Technology had the opportunity to publicly discuss a couple of the compelling projects we are working on here. At NVIDIA’s GPU Technology Conference in DC (GTC DC) we presented results on computer vision and on signal processing. The talks were:

Soon you will be able to watch the videos of these talks on NVIDIA’s site for the full experience and follow along with the slides above.

The projects outlined in these talks are great examples of the type of work we tackle here at EXP and are also representative of the state-of-the-art algorithms and results we are developing. If taking on these kinds of big ideas and building solutions to address them is the sort of thing you would love to be doing, drop us a line or check out our current job postings!

Expedition Technology Wins DARPA Award to Map the IoT Via Machine Learning

(Dulles, VA) August 2, 2018 – Expedition Technology, Inc., is proud to announce the receipt of a three-year prime contract award worth up to $9.1 million from the Defense Advanced Research Projects Agency (DARPA) for the Radio Frequency Machine Learning System (RFMLS) program.

RFMLS is the first DARPA program to emphasize the application of machine learning to the RF spectrum. Machine learning is demonstrating considerable success when used in related fields including speech recognition and computer vision, but it has not yet been similarly applied to the crowded spectrum of signals that currently exists.

Through this contract, Expedition Technology and its partners will develop the foundations for applying modern data-driven Machine Learning to the RF Spectrum domain as well as develop practical applications in emerging spectrum problems which demand vastly improved discrimination performance over today’s hand-engineered RF systems. Ultimately, these innovations will result in a new generation of RF systems that are goal-driven and can learn from data rather than being hand-engineered by experts.

The four technical components of the program include: feature learning, attention and saliency, autonomous RF sensor configuration and waveform synthesis. A successful RFML system is intended to address the need for enhanced spectrum situational awareness. By discerning subtle differences in signals transmitted by mass-produced devices, RFMLS strives to identify signals intended to spoof or hack into devices in the Internet of Things (IoT). Additionally, RFMLS investigates new paradigms for the rapid evaluation of broad spectrum use to better support cognitive radio applications.

“The RFMLS program is the centerpiece of Expedition Technology’s rapidly growing portfolio of RF machine learning capabilities,” says Marc Harlacher, President and CEO. Harlacher continues, “Success in this endeavor will give our military the ability to discern and characterize signals in the increasingly crowded RF spectrum, enhancing the ability to understand what is going on in the wireless domain.”

EXP is a prime contractor for DARPA’s RFMLS program, leading a team that includes the International Computer Science Institute (ICSI) and Leidos as partnering subcontractors.

About Expedition Technology
Expedition Technology (EXP) is a leading developer of machine learning algorithms and autonomous systems for defense and intelligence C4ISR applications including radar, lidar, imaging, full motion video, communications, navigation, signal intelligence, and data analytics. As a small business with extensive experience researching, engineering, developing and operating civil and military defense and aerospace systems, EXP is applying rapidly evolving machine learning capabilities to provide our U.S. Government customers with improved situational awareness and actionable intelligence.

Fighting GAN Mode Collapse by Randomly Sampling the Latent Space 

At Expedition Technology (EXP) we develop a broad set of deep learning solutions for our customers. Each deep learning development cycle typically involves:

  • Understanding the problem space
  • Getting acquainted with the research landscape
  • Tweaking an existing algorithm or developing entirely new architectures
  • Training on an army of GPUs

This is the standard process, but it comes with a constraint: it requires very large, diverse data sets to get good results. As many of our customers’ problems grow more sophisticated, having enough data to satisfy that constraint is becoming ever rarer. In these cases where data is scarce, there is a necessary additional step: amplifying the data that you have.

For help with this, we have been turning to Generative Adversarial Networks (GANs). Despite their wide-ranging success, deep generative methods are hindered by well-known drawbacks such as unstable minima and mode collapse. We have recently made progress regarding the latter and would like to share our methods with the rest of the deep learning community. In this post we will introduce GANs, describe mode collapse, and then explain how we’ve attempted to mitigate this problem while adding justifications and results to support our claims.

GANs

Generative Adversarial Networks [1] (GANs) are an incredible technology. Although classification and segmentation are necessary problems, they don’t have the catchy, easy-to-appreciate results GANs do. After all, you can’t become a great artist just by learning to distinguish Van Gogh from Monet. You have to actually pick up a paintbrush and try your hand at it. Similarly, if we strive to make intelligent systems, they must be able to not only discriminate, but to generate believable outputs. That’s where we cross the border from a passive to an active agent.

[6] – Architecture for a GAN generating MNIST digits

GANs operate by combining two networks: one that creates output, and one that provides feedback. The ‘generator’, as it’s called, is provided a random input and tries to return a correspondingly random output. The ‘discriminator’ then compares this generated sample to real-world ones and gives a score between zero and one of how believable it is. It’s really just a competition: the generator is trying to fool an ever-improving discriminator. If you let them duke it out a few million times, you end up with a discriminator that learns to tell real samples from fake ones, as well as a generator that does a pretty good job at making realistic-looking samples.
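For reference, the original GAN paper [1] formalizes this competition as a two-player minimax game over a value function V(D, G). Here D(x) is the discriminator's probability that x is real, G(z) is the generator's output for a random input z drawn from a prior p_z, and p_data is the distribution of real samples:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator pushes V up by scoring real samples near 1 and generated ones near 0; the generator pushes V down by making D(G(z)) large.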

This is a powerful tool, as it theoretically allows for creating unlimited additional data. If the generated samples are within the set of all possible inputs, then we can turn 100 data points into 1000 by letting the generator hallucinate 900 new but plausible examples.

Mode collapse

There’s a problem, though. Let’s look at the following situation [2] as a GAN tries to make pictures of cars:

  1. After bumbling around for a bit, the generator learns to draw convincing Honda Civics
  2. The discriminator picks up on this and starts labeling most Honda Civics as generated
  3. In response to this, the generator tweaks its algorithm a bit and begins making a similar but separate class – Honda Accords
  4. Now the discriminator has to adjust, so it starts calling Honda Accords fake
  5. While the discriminator is distracted by Accords, the opportunity presents itself to start making convincing Civics again, which the generator happily reverts to
  6. Repeat steps 2-5

This infinite loop of similar outputs is termed mode collapse, and it is one of the things preventing GANs from being widely used as a data amplification tool. The consequence of mode collapse is that we cannot create an unlimited supply of unique samples, since our generator only flicks back and forth between a couple of very similar outputs. This minimally satisfies the job of fooling the discriminator but is ultimately unhelpful if we are trying to stretch the effectiveness of our currently available data.
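One rough way to spot the behavior described above, shown here as an illustration rather than as the method used in this post, is to label a batch of generated samples with a pretrained classifier and measure how many classes (modes) appear and how evenly. The helper `mode_coverage` and the label lists below are hypothetical stand-ins for real classifier output.

```python
from collections import Counter
import math


def mode_coverage(labels, num_modes):
    """Return (fraction of modes seen, normalized entropy) for a list of
    predicted class labels of generated samples.

    A healthy generator covers all modes roughly evenly (both values near 1);
    a collapsed one covers only a few modes (both values near 0).
    """
    counts = Counter(labels)
    coverage = len(counts) / num_modes
    total = len(labels)
    # Shannon entropy of the empirical label distribution, in nats
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Normalize by the maximum possible entropy, log(num_modes)
    return coverage, entropy / math.log(num_modes)


# Hypothetical classifier outputs for 1000 generated car images:
# mode 0 = Civic, mode 1 = Accord, modes 2-9 = other car models.
collapsed = [0] * 600 + [1] * 400
healthy = [i % 10 for i in range(1000)]

print(mode_coverage(collapsed, 10))  # covers 2 of 10 modes
print(mode_coverage(healthy, 10))    # covers all 10 modes evenly
```

A generator stuck in the Civic/Accord loop would score low on both numbers no matter how convincing each individual image is.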

How to avoid mode collapse

To address this, we decided to add a constraint: the generator outputs must be random, but in such a way that any such random output is believable. An intuitive way to enforce this is to find some compressed space X that is densely packed with examples, such that any point within that space corresponds to a true data sample. If we can also find a bijection f: X→Y from X, our densely packed space, to Y, our space of real examples, then we can randomly sample X and convert those points to plausible outputs.

Luckily for us, autoencoders are great at finding exactly such a space and such a function. The basic idea is that an autoencoder takes an input, compresses it into a lower-dimensional vector, then reconstructs the input from that vector. The bottleneck in the middle therefore contains the relevant information about the input in fewer variables, providing us with a compressed space, referred to as the latent space. The decoder, given a point in that space, recreates the input that was encoded, which provides us with our bijection f. This relies on two assumptions that we provide evidence for in the “Reasoning for why this works” section below.

[5] – Architecture for an autoencoder that compresses MNIST digits

What does this all mean? If we set up an autoencoder to densely encode inputs to a latent space, then any randomly sampled point in that latent space should give a realistic, equally random output upon decoding. Somewhat surprisingly, with a small enough dimensionality of the latent space, this actually works.
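To make the encoder/decoder picture concrete, here is a deliberately tiny stand-in: a linear "autoencoder" whose encoder projects 3-D points onto an orthonormal basis of a plane and whose decoder maps the two latent coordinates back. It assumes the real data lie exactly on that plane, which a trained autoencoder on MNIST cannot assume; the basis vectors and the names `encode` and `decode` are illustrative only.

```python
import random

# Orthonormal basis of a 2-D plane inside 3-D space (toy assumption:
# every "real" sample lies on this plane).
B1 = (1 / 2 ** 0.5, 1 / 2 ** 0.5, 0.0)
B2 = (0.0, 0.0, 1.0)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def encode(x):
    """Compress a 3-D sample to its 2 latent coordinates."""
    return (dot(x, B1), dot(x, B2))


def decode(z):
    """Map 2 latent coordinates back to a 3-D point on the plane."""
    return tuple(z[0] * a + z[1] * b for a, b in zip(B1, B2))


# A "real" sample on the plane survives the round trip (up to float error).
x = (2.0, 2.0, -1.0)
x_hat = decode(encode(x))
print(x_hat)

# A randomly sampled latent point decodes to a point that also lies on the
# plane, i.e. a point of the same kind as the real data (here: x[0] == x[1]).
rng = random.Random(0)
z = (rng.uniform(-3, 3), rng.uniform(-3, 3))
print(decode(z))
```

In this linear toy every latent point decodes to valid data by construction; the argument in this post is that a small enough latent space pushes a real autoencoder toward the same property.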

Our architecture for the L-GAN

To employ this effectively, we make a small GAN that finds a sub-basis of this latent space, and then take random samples from this sub-basis. In practice, this means that we train a GAN to generate a batch of vectors, enforce their orthogonality through their pairwise dot products, and then take random linear combinations of these vectors. The discriminator then decides whether these linear combinations are convincing latent space encodings. Those that fool the discriminator get decoded into realistic samples. Because the sampling is random and the decoder is (approximately) a bijection, our results are random elements that are indiscernible from the true data. See the figure below for some non-cherry-picked eights generated by the network.

Random 8’s generated by our GAN + Decoder

The reason for having the GAN find a sub-basis is that it is difficult to find a perfect dimensionality of the latent space. This means that not every one of the axes is guaranteed to be utilized evenly. Therefore, it is more sensible to choose a dimensionality that allows the autoencoder some leniency, and to then let the generator learn the necessary basis of ‘highest plausibility’.
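The sampling step of the architecture can be sketched as follows. This is a hedged illustration, not EXP's code: it assumes the generator has already produced k latent-space vectors, measures non-orthogonality as the sum of squared pairwise dot products (one plausible reading of "enforce their orthogonality through their pairwise dot products"), and draws uniform random coefficients for the linear combinations. `orthogonality_penalty`, `random_combinations`, and the coefficient range are hypothetical choices.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonality_penalty(vectors):
    """Sum of squared dot products over all distinct pairs of vectors.

    Zero exactly when the vectors are mutually orthogonal; adding it to the
    generator's loss pushes the generated sub-basis toward orthogonality.
    """
    total = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            total += dot(vectors[i], vectors[j]) ** 2
    return total


def random_combinations(vectors, n, rng, low=-1.0, high=1.0):
    """Draw n random linear combinations of the sub-basis vectors.

    Each combination is a candidate latent code; in the L-GAN these would be
    scored by the discriminator and, if convincing, passed to the decoder.
    """
    dim = len(vectors[0])
    samples = []
    for _ in range(n):
        coeffs = [rng.uniform(low, high) for _ in vectors]
        samples.append([sum(c * v[d] for c, v in zip(coeffs, vectors))
                        for d in range(dim)])
    return samples


# Stand-in for a generator output: 3 vectors in a 5-D latent space.
basis = [[1.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0, 0.0]]
skewed = [[1.0, 0.0, 0.0, 0.0, 0.0],
          [1.0, 1.0, 0.0, 0.0, 0.0]]

print(orthogonality_penalty(basis))   # mutually orthogonal, so 0.0
print(orthogonality_penalty(skewed))  # one overlapping pair, so 1.0

codes = random_combinations(basis, n=4, rng=random.Random(0))
print(len(codes), len(codes[0]))      # 4 latent codes of length 5
```

Note that three vectors span only a 3-D subspace of the 5-D latent space, which mirrors the point above: the generator is free to use fewer directions than the autoencoder provides.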

This approach is reminiscent of variational autoencoders (VAEs) [4], which also encode the data samples for the purposes of generation. VAEs, however, sample the latent space differently: for each input, the encoder outputs a mean and a standard deviation, and a latent sample is drawn by scaling standard normal noise by that standard deviation and adding the mean. In our approach, the encoder simply defines the latent space, which is then sampled by a wholly separate GAN.
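For contrast with the L-GAN's random linear combinations, the VAE sampling step from [4] (often called the reparameterization trick) is sketched below, assuming the usual diagonal Gaussian with per-coordinate mean and standard deviation. `vae_sample` is a hypothetical helper and `mu`/`sigma` stand in for encoder outputs.

```python
import random


def vae_sample(mu, sigma, rng):
    """Reparameterized VAE sample for one input: each latent coordinate is
    mu + sigma * eps, with eps drawn from a standard normal distribution.
    `mu` and `sigma` stand in for the encoder's outputs for that input."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


rng = random.Random(42)
mu, sigma = [0.5, -1.0], [0.1, 0.2]  # hypothetical encoder outputs
print(vae_sample(mu, sigma, rng))     # a point scattered around mu

# Averaging many samples recovers the encoder's mean.
draws = [vae_sample([2.0], [0.5], rng)[0] for _ in range(2000)]
print(sum(draws) / len(draws))        # close to 2.0
```

The key difference is where the randomness is anchored: a VAE sample is tied to one input's predicted mean and spread, while the L-GAN samples the latent space independently of any particular input.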

Reasoning for why this works

There are two critical assumptions that substantiate our approach:

  1. The latent space is densely packed
  2. The decoder approaches a bijection

We provide two points of evidence that the latent space is densely packed. The first is a thought experiment. Given inputs that have 10 independent variables and an encoded vector of length 5, we should expect an autoencoder to learn to utilize every degree of freedom to its fullest extent. If, instead, it only used three of the five axes provided to it, the autoencoder would be further from representing the ten independent variables of the input space, implying that an easy lower minimum is available on the error landscape. This comes with a caveat: our encodings need to be smaller in dimensionality than the number of independent variables in the input space. Such a requirement ensures that the optimal encoder takes advantage of every axis provided to it. Put simply, if you don’t give the encoder enough dimensions to represent all of the information, it must learn to take advantage of everything it has.
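The arithmetic behind this thought experiment can be checked in a toy linear case. Assume, for illustration only, 10 independent unit-variance input variables and an "encoder" that simply keeps some of the coordinates, with dropped coordinates decoded as zero. Every dropped coordinate adds its full variance to the reconstruction error, so a 5-dimensional code that uses only 3 of its axes leaves error behind that using all 5 would remove. A learned nonlinear autoencoder is far more complex; `reconstruction_error` is a hypothetical helper.

```python
import random


def reconstruction_error(samples, axes_used):
    """Mean squared reconstruction error when only the first `axes_used`
    coordinates are kept and the rest are decoded as zero."""
    total = 0.0
    for x in samples:
        # Kept coordinates are reconstructed exactly; dropped ones become 0.
        total += sum(v * v for v in x[axes_used:])
    return total / len(samples)


rng = random.Random(0)
# 10 independent standard normal variables per sample
data = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(5000)]

err3 = reconstruction_error(data, 3)  # code of size 5, only 3 axes used
err5 = reconstruction_error(data, 5)  # all 5 axes used
print(round(err3, 2), round(err5, 2))  # roughly 7 and 5
```

The gap of about 2 units of error is exactly the "easy lower minimum" the paragraph refers to: training would push an underused encoder to start using the idle axes.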

The second point is empirical, as seen by traveling through a latent space. It turns out that if we encode two handwritten MNIST digits to a latent space, the points between their encodings also represent plausible outputs, as seen in the figure [3] below. This implies that, given two known points in latent space, a randomly chosen point between them is likely to also represent a believable output. Our approach differs in that it builds a separate latent space for each digit rather than a single latent space for all of them, but the result should still hold.

[3] – Movement in the latent space from the encoding of a five to the encoding of a nine
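A walk like the one in the figure can be produced by decoding evenly spaced points between two encodings. The sketch below covers only the interpolation step and assumes plain linear interpolation (the post does not say which scheme [3] uses); `interpolate` and the 2-D encodings are hypothetical.

```python
def interpolate(z_a, z_b, steps):
    """Return `steps` evenly spaced latent points from z_a to z_b inclusive.

    Each point is (1 - t) * z_a + t * z_b for t in [0, 1]; decoding them in
    order gives a gradual transition such as five -> nine.
    """
    if steps < 2:
        raise ValueError("need at least the two endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical 2-D encodings of a "five" and a "nine"
z_five, z_nine = [0.0, 2.0], [4.0, -2.0]
for z in interpolate(z_five, z_nine, 5):
    print(z)
```

If the latent space is densely packed, each intermediate point should decode to a plausible digit rather than noise.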

Regarding the second assumption: the decoder is not a true bijection, in part due to the discrete nature of the dataset. However, we can make a case that the decoder of a functional autoencoder will approach a bijection, as long as the encodings map to a densely packed space. We do this by showing that the encoder approaches a bijection from true inputs to unique points in the latent space. The decoder, as the inverse of the encoder, must then learn the inverse bijection.

Before explaining why the decoder approaches a bijection, we want to touch on why this is necessary. A bijection is a function f: X→Y that is both ‘onto’ and ‘one-to-one’. This means that any possible value O ∈ {Outputs} has exactly one corresponding input I for which f(I) = O. If both the encoder and the decoder are bijections, then any point randomly sampled in the latent space must have a unique, correspondingly random point in the true data space.

We can claim that the encoder is ‘onto’ as a consequence of our reasoning for the latent space being densely filled. In order to fill that dimensionality, the encoder must attempt to map the inputs into different locations within the latent space. As such, if the whole constrained-dimensionality latent space is filled, then the encoder is onto. We can also show that a working autoencoder’s encoder is ‘one-to-one’ by contradiction. If it were not one-to-one, then two different inputs could map to the same latent representation. Due to the assumption that the autoencoder is functional, this point in the latent space would be decoded back out to the two different inputs. This is not possible by the definition of a function. As such, an optimal encoder approaches a bijection, therefore the decoder must also do the same.
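The one-to-one step can be written out explicitly. Here E and D are labels introduced for the encoder and decoder, X is the latent space and Y the space of real examples as above, and "functional autoencoder" is idealized as perfect reconstruction, D(E(y)) = y:

```latex
\begin{aligned}
&\text{Let } E : Y \to X \text{ and } D : X \to Y \text{ satisfy } D(E(y)) = y \text{ for all } y \in Y.\\
&\text{Suppose } y_1 \neq y_2 \text{ but } E(y_1) = E(y_2) = x. \text{ Then}\\
&\qquad y_1 = D(E(y_1)) = D(x) = D(E(y_2)) = y_2,\\
&\text{contradicting } y_1 \neq y_2. \text{ Hence } E \text{ is one-to-one; if } E \text{ is also onto } X,\\
&\text{then } E \text{ is a bijection and } D = E^{-1} \text{ is a bijection as well.}
\end{aligned}
```

In practice reconstruction is only approximate, which is why the argument concludes that the decoder approaches, rather than equals, a bijection.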

These assumptions come together for the logic of our generative approach. Autoencoders can find a latent space in which every point maps to plausible outputs, and simultaneously approximate the bijection between this latent space and the output space. Therefore, randomly sampling the dense latent space corresponds to randomly sampling the set of realistic data samples. The quality of decoded samples is then a direct result of how ‘bijective’ the encoding and decoding operations are.

Results

The ultimate goal is to amplify our existing data by generating new samples that are indiscernible from the original set. To this end, we set up an experiment where we trained a basic MNIST classifier on the full train set, on a tenth of the train set, and on a tenth of the train set along with generated samples. The GAN in this case was also trained on the same tenth.

We trained the GAN on each digit independently and created 5000 new samples for each. Upon training the classifier with GAN input, we split each batch as either 25, 50 or 75 percent composed of generated digits. The rest of each batch was taken from the tenth of the train set.
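A minimal sketch of how such mixed batches can be assembled, assuming simple random sampling with replacement from each pool (the post does not specify how batches were drawn). `mixed_batch`, the batch size of 64, and the tagged placeholder pools are hypothetical; real pools would hold images and labels.

```python
import random


def mixed_batch(real, generated, batch_size, generated_fraction, rng):
    """Build one training batch in which `generated_fraction` of the
    examples come from the generated pool and the rest from the real pool."""
    n_gen = round(batch_size * generated_fraction)
    batch = ([rng.choice(generated) for _ in range(n_gen)] +
             [rng.choice(real) for _ in range(batch_size - n_gen)])
    rng.shuffle(batch)  # avoid a fixed real/generated ordering in the batch
    return batch


rng = random.Random(0)
# Placeholder pools, tagged with their source so the mix can be checked.
real = [("real", i) for i in range(6000)]
generated = [("gen", i) for i in range(5000)]

for frac in (0.25, 0.5, 0.75):
    batch = mixed_batch(real, generated, 64, frac, rng)
    print(frac, sum(src == "gen" for src, _ in batch))
```

Fixing the proportion per batch, rather than pooling everything and shuffling, keeps the ratio of real to generated examples constant throughout training.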

We found that the network trained on a tenth of the dataset plus generated samples can be more accurate on the test set than the network trained without generated samples. Specifically, we see a relative decrease in the error rate of up to 17% (from 6% to 5%) after training on our amplified dataset, although the 25/75 mix performed worse than training on the tenth alone.

Train set                                       Test set accuracy
All train data                                  96.85%
Tenth of train data                             94%
Tenth of train data and generated (75/25)       94.3%
Tenth of train data and generated (50/50)       95%
Tenth of train data and generated (25/75)       92.6%
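The 17% figure is a relative reduction in error rate. Working it out from the accuracies in the table, comparing the tenth-of-data baseline with the best amplified run (50/50):

```python
def error_rate(accuracy_pct):
    """Convert a test accuracy in percent to an error rate in percent."""
    return 100.0 - accuracy_pct


baseline = error_rate(94.0)   # tenth of train data only: 6% error
amplified = error_rate(95.0)  # tenth of train data + generated, 50/50: 5% error
relative_drop = (baseline - amplified) / baseline
print(f"{relative_drop:.1%}")  # 16.7%, i.e. roughly 17%
```

In absolute terms the improvement is one percentage point of accuracy; the relative framing expresses it as a share of the remaining error.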


References:

  1. Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. “Generative adversarial nets.” In Advances in neural information processing systems, pp. 2672-2680. 2014
  2. Nibali, http://aiden.nibali.org/blog/2017-01-18-mode-collapse-gans/
  3. Despois, https://medium.com/@juliendespois/latent-space-visualization-deep-learning-bits-2-bd09a46920df
  4. Kingma, Welling, “Auto-Encoding Variational Bayes”, https://arxiv.org/pdf/1312.6114.pdf, 2013
  5. Chollet, “Building Autoencoders in Keras”, https://blog.keras.io/building-autoencoders-in-keras.html, 2016
  6. Chablani, “GAN – Introduction and Implementation”, https://towardsdatascience.com/gan-introduction-and-implementation-part1-implement-a-simple-gan-in-tf-for-mnist-handwritten-de00a759ae5c, 2017