I hiked to Rattlesnake Ledge, just 1,200 feet of elevation gain and 4 miles roundtrip. I hike this trail maybe 5 times each year. Whenever I have visitors from out of town, we go to Snoqualmie Falls and this trail. It has the highest reward-to-effort ratio, with sweeping views of the Snoqualmie mountain range and Rattlesnake Lake below. The water level of Rattlesnake Lake is highly variable, sometimes receding to reveal huge tree stumps. Even when the trail is mobbed on weekends, it is still worth it.
J and I hiked to Poo Poo Point via this popular trail. It was about 1,800 feet of elevation gain and 4 miles roundtrip. On an unreasonably warm winter weekend day, we shared this trail with every family and their dogs. The trail was pleasant and forested.
At the top was a clearing. The hilltop was about 20 degrees warmer, sunny and bright, with Mount Rainier clear and imposing. A paraglider ran a few meters, parachute catching air, and soundly cleared the treetops.
I hiked to the Big Four Ice Caves, though calling it a “hike” is a bit of a stretch. It was more a casual jaunt along a flat boardwalk, only 2 miles roundtrip. This time of year, snowstorms dump on the Cascades every week, so the trail was icy and slick the whole way.
Signs warned hikers not to go inside or near the caves, since people had died in years past from cave-ins and tumbling rocks. I observed the singular cave from afar, its mouth partially covered by a recent avalanche.
In previous approaches to generating an image from a sentence using GANs, the entire sentence was encoded as a single vector, and the GAN was conditioned on this vector. The Attentional Generative Adversarial Network (AttnGAN) also conditions on the sentence vector, but improves on the previous approaches by refining the image in multiple stages using word vectors as well.
The authors propose two novel components in the AttnGAN: an attentional generative network, and a deep attentional multimodal similarity model (DAMSM).
The attentional generative network works as follows. First, the model calculates a hidden state as a function of the sentence vector and some random noise. A low-resolution image is generated from this hidden state. Next, the image is improved iteratively in stages. At each stage, there is an attention model. The attention model takes in the hidden state of the previous stage as well as the word features, and calculates a word-context vector for each subregion of the image, emphasizing the words that need more representation in that subregion. The stage then has a model that takes in the word-context vectors and the previous hidden state to calculate the new hidden state. A higher-resolution image is generated from the new hidden state. For example, in the paper, at one iteration the model prioritizes the word “red” for certain regions of the image, so when the bird image is refined, the bird is more red in the new image.
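To make the attention step concrete, here is a minimal sketch in plain Python. It assumes the word features have already been projected to the same dimension as the hidden state, and it uses a plain dot product followed by a softmax as the attention score. The function names and the toy numbers are mine, not the paper's.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def word_context(region_hidden, word_feats):
    """For each image subregion, attend over the words of the sentence.

    region_hidden: N hidden vectors, one per subregion (from the previous stage)
    word_feats:    T word vectors, assumed already projected to the same dimension
    Returns (contexts, weights): N word-context vectors and N x T attention weights.
    """
    contexts, weights = [], []
    for h in region_hidden:
        # How relevant is each word to this subregion?
        beta = softmax([dot(h, e) for e in word_feats])
        # Word-context vector: attention-weighted sum of the word vectors.
        ctx = [sum(b * e[d] for b, e in zip(beta, word_feats))
               for d in range(len(h))]
        weights.append(beta)
        contexts.append(ctx)
    return contexts, weights


# Toy example (made-up numbers): 3 words, 2 subregions, 2-dim features.
words = [[1.0, 0.0], [0.0, 4.0], [1.0, 1.0]]   # "small", "red", "bird"
regions = [[0.0, 1.0], [1.0, 0.0]]
contexts, weights = word_context(regions, words)
```

In this toy example, the first subregion puts most of its attention on the second word (“red”), so its word-context vector is dominated by that word's features. In the real model, the next stage would combine this vector with the previous hidden state to produce the new hidden state.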
The loss from the attentional generative network has two main parts: unconditional loss that reflects whether the image is real or fake (for a more realistic-looking image) and conditional loss that reflects whether the image and sentence match.
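As a rough sketch of how the two parts combine for the generator at a single stage: the function below assumes the discriminator outputs probabilities, and that the two terms are averaged over the batch and weighted equally. The function name is hypothetical.

```python
import math


def generator_stage_loss(d_uncond, d_cond):
    """Generator loss for one stage, from discriminator outputs on generated images.

    d_uncond: probabilities that each generated image is real
    d_cond:   probabilities that each generated image is real and matches its sentence
    Both are lists of floats in (0, 1]. The equal 1/2 weighting is an assumption.
    """
    uncond = -sum(math.log(p) for p in d_uncond) / len(d_uncond)
    cond = -sum(math.log(p) for p in d_cond) / len(d_cond)
    return 0.5 * uncond + 0.5 * cond


# A generator that fools the discriminator more often gets a lower loss.
weak = generator_stage_loss([0.5, 0.5], [0.5, 0.5])
strong = generator_stage_loss([0.9, 0.9], [0.9, 0.9])
```

The unconditional term pushes the image toward looking real, while the conditional term pushes it toward matching the sentence. In a multi-stage model, the total would sum a loss like this over the stages.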
There is an additional loss term that comes from the DAMSM, and this loss term makes sure each word is represented in the image (not just looking at the entire sentence as in the attentional generative network loss). The DAMSM maps words and image regions into a common semantic space to measure how much the words and image regions match up. Sentence text is encoded with a bi-LSTM. The image is encoded with a CNN into the text feature space. For each word in the sentence, attention is applied to the image encoding, so for each word, we have a region-context vector describing how the image represents that word. Then the DAMSM loss compares each word's region-context vector to that word's text encoding. So visually, to minimize DAMSM loss for the word “red,” the bird should be a vibrant red versus a muddied-red color.
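Here is a simplified sketch of the word-level matching idea, not the paper's exact formulation. It assumes cosine similarity for both the attention scores and the final comparison, and a log-sum-exp to combine the per-word scores. The temperature parameters `gamma_attn` and `gamma_agg` and the toy vectors are my own placeholders.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_score(word_feats, region_feats, gamma_attn=5.0, gamma_agg=5.0):
    """Simplified image-sentence match score built from word-level matches.

    word_feats:   T word vectors from the text encoder
    region_feats: N image-region vectors, assumed to be in the same space
    gamma_attn, gamma_agg: assumed sharpening factors (placeholder values)
    Returns (score, per_word), where per_word[i] is the match for word i.
    """
    per_word = []
    for e in word_feats:
        # Attend over image regions for this word.
        alpha = softmax([gamma_attn * cosine(e, v) for v in region_feats])
        # Region-context vector: how the image represents this word.
        ctx = [sum(a * v[d] for a, v in zip(alpha, region_feats))
               for d in range(len(e))]
        per_word.append(cosine(ctx, e))
    # Log-sum-exp acts as a smooth maximum over the per-word matches.
    score = math.log(sum(math.exp(gamma_agg * r) for r in per_word)) / gamma_agg
    return score, per_word


# Toy example (made-up vectors): axis 0 is "redness", axis 1 is "birdness".
words = [[1.0, 0.0], [0.0, 1.0]]        # "red", "bird"
vivid = [[1.0, 0.0], [0.0, 1.0]]        # one region is clearly red
muddy = [[0.5, 0.5], [0.0, 1.0]]        # the red region is washed out
score_vivid, words_vivid = match_score(words, vivid)
score_muddy, words_muddy = match_score(words, muddy)
```

The vivid image scores higher than the muddy one, and the gap comes almost entirely from the word “red.” A training loss would then reward high scores for matching image-sentence pairs relative to mismatched pairs in the batch.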
The paper reports impressive results and shows how they change as model hyperparameters are adjusted. Two stages of refinement in the attentional generative network perform better (as measured by inception score and R-precision) than one stage of refinement, and one stage is better than none. This shows that iterative attention-based refinements improve image quality. Also, performance improves as the weight applied to the DAMSM loss increases, which shows that the DAMSM helps. In addition, the paper claims that training an AttnGAN is more stable than training other GANs, since mode collapse never occurred.
I thought the paper did a good job detailing how the model works, so the results should be clear and reproducible. Qualitatively, AttnGAN did well on the Caltech-UCSD Birds dataset, but the images generated from MS-COCO do not look realistic. MS-COCO has more objects and more complex scenarios.
From the AttnGAN trained on MS-COCO, here’s an example of an image generated from the caption, “A man and women rest as their horse stands on the trail.”
Grass and horse parts are visible. Textures are sharp. But the objects do not have a reasonable shape. Interestingly, when this image is run through a Fast R-CNN object detector trained on VGG, the brown blob is labeled “horse” with 89% confidence.
J and I hiked the Bugandae trail of Bukhansan, a mountain in north Seoul. At first the trail was made up of bricks, with the occasional car driving by. Then the trail narrowed into an endless staircase of white rock.
I described Bukhansan as a “non-trivial hike.” Someone in the hotel overheard me and said that phrase is an oxymoron. I packed light for our trip, so I did not have my hiking boots or poles. But all around me, the locals were fully decked out in sleek matching hiking sets, some with scarves tied around their necks. Basically, we shared the trail with a bunch of dignified, trendy, well-prepared, physically fit old people.
As I continued hiking, my knees became wobbly, and I wished I had purchased hiking poles from one of the numerous purveyors at the base of the mountain. The hike was tiring, but I took a simple pleasure in every step. The leaves were a beautiful red, the trees around me had the novelty of being a distinctly Asian variety, and the air was fresher than the smog of the city. Along steep sections, there was a thick metal rope for hikers to grab onto. The trees along the route had smooth bark where thousands of hikers had latched on for stability.
I was traveling minimally. I had a backpack full of water and snacks, and I’m a lightweight person. But J was having some trouble, so we took frequent breaks. I held his backpack, and it was unreasonably heavy. I wondered what he had packed, because I thought that I was carrying all the essentials for our hike.
“What’s in here?” I asked.
A laptop, an empty metal thermos, and a liter of Japanese lube.