Heather Lake

I hiked to Heather Lake, roughly 5 miles roundtrip with 1000 ft. of elevation gain. Along the trail, we passed by stumps of old-growth trees, with new, thinner trunks shooting off the stumps. We hiked by waterfalls and over rickety wooden boardwalks. A section of trail was flooded by shallow running water. Near the lake, the trail was covered in snow. I had waterproof boots, microspikes, and poles, so there were no issues. I even had mats to sit on in the snow. It’s great to be geared up!

Heather Lake panorama

The lake was mostly covered in ice and mushy snow. We rested on the shore in the snow, a strange contrast to the bright, sunny, 70° weather. We could hear the roar of waterfalls on the other side of the lake, a robust flow from the snowmelt. As we milled at the lake, the number of arriving hikers started to pick up, and on our way down there were some traffic jams. We also missed a turn, and ended up doing a loop through the snow, stepping over tree branches and walking through mud.

Overall, I enjoyed the hike to Heather Lake. It was leisurely, and the views at the lake were gorgeous.

Spring Reading: Lolita, One Hundred Years of Solitude

I finally read Lolita, by Vladimir Nabokov. I like Nabokov’s effusive prose. It is so good.

I read One Hundred Years of Solitude, about the rise and fall of the Buendía family over seven generations. At the beginning of the book, I read it as realistic fiction due to the matter-of-fact tone. But then flying carpets and other magical elements were introduced, and I realized these things were taken for granted as completely ordinary, rather than meant to be interpreted as metaphor. Adding to the realism of this fantasy novel, the book interweaves actual historical figures and events into the story, such as the banana massacre. Every time I picked up the book, I felt somber afterwards. The decline of the family and their village is foreshadowed and feels inevitable. Buendía family members are born, grow up, live a unique and solitary existence of their own making, then die. In each generation, the children are named after other family members, so everyone has one of a few names, and the generations follow a cyclical pattern. Earlier events in the book are often recalled. The weight of prior generations stacks up, so that by the end of the book, the mention of a single room recalls several generations’ worth of memories in that room. At the end of the novel, a mystery introduced at the beginning of the book is finally resolved, and everything comes full circle.

Geocaching part 8: Electric Boogaloo

I moved back to the westside, so I have a long commute. I started geocaching again to pass the time until traffic dies down. Oftentimes, the coordinates given for the cache are off, but they get me to the general vicinity. So I have to rely on a punny name for clues. Here are my latest finds by campus.

There was a Honeywell box geocache under a lamppost skirt near the Honeywell building.

geocache by Honeywell

This “basset” cache was found in between rocks in a parking lot.

a geocache for rockhounds
a geocache with a basset hound photo

This “tired” cache was found near a golf course.

geocache in a tire planter

This tree hugging cache was found in a tree by a parking lot.

geocache hugging a tree
closeup of tree hugging geocache

This cache was found on the side of a bike trail. I took a travel bug to bring overseas.

geocache by a bike trail
geocache with dog drawn on lid
geocache travel bug

This cache was tricky, because the coordinates pointed to a different lamppost. But the hint was “Black,” so when I saw the black tape I looked under the heavy metal lamppost skirt.

lamppost with black tape
micro bison geocache under heavy lamppost skirt

Nisqually National Wildlife Refuge

I took a 5-mile walk in the Nisqually National Wildlife Refuge. We rolled in when the visitor center opened at 9 AM and borrowed binoculars there.

At the start of the trail, we saw dozens of sparrows diving in the air and flapping erratically, in contrast to the steady glide of larger birds. We saw several gaggles of Canada geese. Whenever the geese took flight, they would shatter the silence with their loud honking. On the Twin Barns Loop Trail, we tried to find the three baby owls, but apparently they had changed trees. On the Estuary Trail, we spent some time observing two statuesque herons. They slowly waded in the water, then stood patiently still as they fished. We also saw crows, red-winged blackbirds, various species of seagulls, and even an eagle soaring over a narrow strip of trees in the middle of the mudflats.

The visitor center overlooks a freshwater marsh. As we walked farther along the trail, the freshwater started to mix with the saltwater of Puget Sound, and we could smell the saltiness in the air.

An overlook on the Estuary Trail

I was surprised by the length of the boardwalks. The boardwalk to get to the Puget Sound Overlook was a mile long. The landscape was surreal: flat grassy marshes and mudflats (it was low tide) as far as the eye could see in all directions.

I thoroughly enjoyed my time birdwatching. Fellow birdwatchers were all friendly, eager to share the location of any birds that were spotted. Many brought a full-size telescope or a camera with telephoto lens. As we walked back to the parking lot, we passed by a lot of families, so we were glad that we were able to enjoy the wildlife refuge when it was uncrowded. The trails are all flat, so the wildlife refuge is a place I would consider taking my parents for a relaxing stroll.


Afterwards, we walked around Olympia. I ate a crab benedict for brunch. We saw the old legislative building and the current state capitol. The gray marble interior and chandelier felt cold and unwelcoming compared to the natural beauty that the capitol building overlooks. Outside one of the chambers, there are portraits of current Washington statesmen. One portrait stood out from the rest: a man wearing black sunglasses. It turns out that man is the Lieutenant Governor, has accomplished quite a lot as a politician, and is blind. We strolled along the nearby boardwalk at Percival Landing, which displayed sculptures along its length. We climbed a wooden tower to get a view of the lake. Then we made our way to the farmers market. All these locations were within ten minutes of each other. Olympia’s core area is conveniently walkable.

Umtanum Ridge Crest

I wanted to get away from the unceasing Seattle rain (at record levels this year!), so I drove east towards Yakima, where the skies are blue and the sun beats down relentlessly. I hiked Umtanum Ridge Crest, a 6-mile roundtrip hike with 2400 ft. of elevation gain.

Though I was only 2 hours away from Puget Sound, the Umtanum Canyon region felt like another world. The coniferous trees of the Sound were swapped for desert flora: short grasses and sagebrush. Wildflowers were in bloom (blue and purple drops, yellow flowers in star and circle shapes), peppering the rolling hills. Overgrown shrubs encroached on the trail.

Beginning of the trail

There was no forest cover. The packed dirt trail was exposed, winding through hills, always with a moderate incline. We trudged along the dusty path of loose rock, walking past waterfalls and rocky caves.

After some winding turns, we could see the end, the top of a mountain. The trail turned extremely steep. Any steeper and the trail would be a scramble. There were some incredibly fit freaks of nature doing a 50K race, and they ran up and down the ridge with great agility, undaunted by the ridiculous incline.  We pushed along, legs burning, but spurred on by the sight of the end of the trail.

Stacked rocks at the end of the trail

At the top, we soaked in the panoramic view. The side we had come up had a better view than the other side of the mountain. Looking behind us, we could see a massive caldera, with a single yellow tree inside. The valley undulated below us.

Umtanum Ridge Crest panorama

We ran back down the mountain, as it was more efficient than walking down slowly. The wind died down. The bugs, which gave the hike the white noise of a constant buzzing hum, swarmed thicker as we descended, no longer deterred by strong winds. I kept swatting them away from my face.

As we trekked back, we passed the familiar curves of the trail, the caves, the waterfalls, the live railroad tracks, and the green suspension bridge.

On the way home, we passed by a store that advertised in big letters, “APPLES”, “ANTIQUES”, and interestingly, “ASPARAGUS.” We stopped by for groceries and ice cream.

For the next few days, my legs ached. It hurt to walk, especially up staircases, and even to stand up. I will remember this hike fondly. Washington’s diversity of ecosystems is astounding!

Bridal Veil Falls and Lake Serene

I hiked to Lake Serene, making a detour to see Bridal Veil Falls along the way, bringing the hike to 8 miles roundtrip with 2000 feet of elevation gain.

The start of the trail was wide and flat. At around the 2-mile mark, the trail branched to climb upwards to Bridal Veil Falls. There was a steep snowfield we had to cross. I brought microspikes, which I got to use for the first time. The falls were powerful. Water beat the rocks below and produced a far-reaching spray.

Bridal Veil Falls

On the way down from the falls, I tried glissading, but I could not stop myself on the steep, slick snow. My heart raced as I slid down out of control. Luckily, there was a tree branch I could grab on to. Had I fallen farther, there were some patches of shrubs below that probably would have stopped my fall. After that incident, my fellow hikers gave me advice on how to use microspikes. Instead of glissading down without an ice axe to self-arrest, they said to “trust the equipment, trust the microspikes to work.” Rather than step gingerly on the snow, they said to take firm steps to create footholds, toe-first while ascending and heel-first while descending.

Back at the junction, we continued on towards Lake Serene. We passed the lower falls, which were nearly as impressive as Bridal Veil Falls, also wide with a large throughput of water. There were clear swimming holes at the base of the falls. But this was not a day for swimming: during the hike, the weather alternated between rain, sleet, and snow.

The lower falls on the way to Lake Serene

The flat trail turned into a slog of switchbacks, a stairmaster consisting alternately of actual wooden stairs, roots, and rocks. At higher elevation, we donned our traction devices again as the switchbacks became completely covered in snow. After the switchbacks, we hiked through precipitous snowfields on narrow trails forged by whoever hiked before us. At one point, there was a fairly large drop from the snowpack trail into a creek. We had to slide down, cross the creek, then lift ourselves back onto the trail.

Lake Serene covered in snow

When we finally reached the lake, I was elated. I had eaten breakfast, but the hike made me hungry, and I felt a dull and growing burning in my stomach as time went on. I guzzled down a sandwich while admiring the lake, which was covered in snow. It was certainly serene, watching the quiet lake while snowflakes fell. On the way back, the clouds opened up and we saw a rainbow in the misty blue sky. I was surprised, hiking back, seeing that we had travelled so far.

The snow made this hike challenging for me, and it was not a hike that I would have been comfortable doing alone. I am thankful for my fellow hikers, who lent me their hats to keep away the precipitation, let me borrow trekking poles, gave me advice, and pulled me up steep sections. Most of all, they were all very friendly, humorous, and supportive. Back in the parking lot, I felt relief, glad to have made it and flush with the feeling of accomplishment and expanded capabilities. I will feel more confident and capable doing hikes with this terrain in the future.

Oyster Dome

I hiked to Oyster Dome from Chuckanut Drive (Highway 11). The hike was 6.5 miles roundtrip with 2000 feet of elevation gain. We walked through forest dense with ferns and trees covered in emerald moss. We passed small waterfalls, some old-growth conifers, and story-tall moss-covered boulders.

View at the Oyster Dome summit

Unfortunately, it was a cloudy day, and at the summit, we were surrounded by a thick fog that blocked any view of the Sound. We snacked in the rain, then went back down the trail. As we descended, the clouds broke. We could see shellfish farms. Underwater lines covered in shellfish were arranged in neat rows, akin to the rows of crops in a field.

Shellfish farms were visible

Overall, this hike was quite enjoyable. The hike was easy enough that I stuffed my pack with stout, wine, and snacks. Since I did not have to work hard to reach the summit, I did not feel much disappointment that the fog had spoiled the view. This hike was low enough that there was no snow, only mud, making it an ideal early season hike. I wouldn’t mind hiking Oyster Dome again on a dry, sunny day, but I imagine on such days it would be thronged with people.

Measures of similarity in the 20 newsgroups dataset

The 20 newsgroups dataset is a collection of posts on 20 topics, ranging from cryptology to guns to baseball. I looked at 3 measures of similarity: Jaccard, cosine, and L2. Comparing each article with every other article, and taking the average similarity for each pair of newsgroups, we get the following heat maps:

Cosine similarity heat map
Jaccard similarity heat map
L2 similarity heat map

Cosine similarity seems the most reasonable, because it considers the relative frequency of words instead of the absolute frequency. Take two articles, A and B, where B is the same as A except that each word appears twice as many times in B. A similarity measure ought to indicate that the articles are highly similar. Here the Jaccard similarity would be 0.5, the cosine similarity would be 1, and the L2 similarity would be some non-zero number. With Jaccard and L2 similarity, the number of words in each article has some influence on the similarity measure, so when one article has a lot more words than another, the two will appear more dissimilar.
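To make the doubled-article case concrete, here is a minimal sketch. The definitions are assumptions on my part, chosen to be consistent with the 0.5 figure above: Jaccard as the sum of element-wise minimums over the sum of maximums, and L2 similarity as the negative Euclidean distance. The word counts are made up.

```python
import math

def jaccard(x, y):
    """Weighted Jaccard (assumed definition): sum of mins over sum of maxes."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 0.0

def cosine(x, y):
    """Cosine of the angle between the two count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def l2_similarity(x, y):
    """Negative Euclidean distance (assumed), so larger means more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical bag-of-words counts: B repeats every word of A twice.
article_a = [3, 1, 0, 2]
article_b = [2 * count for count in article_a]

print(jaccard(article_a, article_b))        # half of the counts overlap
print(cosine(article_a, article_b))         # same direction, so about 1
print(l2_similarity(article_a, article_b))  # non-zero, grows with article length
```

Only cosine ignores the length of the article here, which is the property argued for above.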

Let’s look at the cosine similarity plot, but with values < 0.45 removed:

Cosine similarity > 0.45 heat map

Pairs of similar newsgroups include soc.religion.christian + soc.religion.christian, talk.politics.guns + talk.politics.guns, and soc.religion.christian + talk.politics.guns. Perhaps these two newsgroups have similar demographics. Other similar pairs include soc.religion.christian + alt.atheism and soc.religion.christian + talk.religion.misc. This seems plausible, since there is some overlap in discussing religion or the lack of it.


Next, we look at nearest-neighbor counts. For each article in a newsgroup, we find the article in another newsgroup that has the largest similarity to it, and count these nearest neighbors for each pair of newsgroups.

Jaccard similarity nearest-neighbor heat map

The average similarity plots are symmetric because each similarity measure is itself symmetric: for any articles x and y, the formula returns the same value for (x, y) and (y, x), since nothing depends on the order of the bag-of-words vectors.

The nearest-neighbor plot is asymmetric. If an article A has the largest Jaccard similarity to an article B, that does not mean that B has the largest Jaccard similarity to A. For example, say there are three articles X, Y, and Z. X and Y are similar, but Z is very different from both. If Z is most similar to, say, X, that does not mean X is most similar to Z. In this case, X is most similar to Y. So, just because an article in a newsgroup M has the largest similarity to an article in a newsgroup N does not mean that an article in newsgroup N will have the largest similarity to an article in newsgroup M.
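The X, Y, Z example can be checked with a few lines. This is only a sketch: the count vectors are invented, and Jaccard is assumed to be the sum of minimums over the sum of maximums.

```python
def jaccard(x, y):
    """Weighted Jaccard (assumed definition): sum of mins over sum of maxes."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 0.0

# Hypothetical articles: X and Y share most words, Z shares almost none.
articles = {
    "X": [2, 1, 0, 0],
    "Y": [2, 1, 1, 0],
    "Z": [1, 0, 0, 5],
}

def nearest_neighbor(name):
    """Return the other article with the largest Jaccard similarity."""
    others = [other for other in articles if other != name]
    return max(others, key=lambda other: jaccard(articles[name], articles[other]))

nearest = {name: nearest_neighbor(name) for name in articles}
print(nearest)  # Z points at X, but X points at Y
```

The similarity values themselves are symmetric. The asymmetry comes only from taking the maximum separately for each article.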

Looking at the Jaccard nearest-neighbor heat map, these groups are similar: talk.religion.misc + alt.atheism, soc.religion.christian + alt.atheism, rec.sport.hockey + rec.sport.baseball, comp.sys.ibm.pc.hardware + comp.os.ms-windows.misc, and comp.sys.mac.hardware + comp.sys.ibm.pc.hardware.

Comparing the Jaccard plots, there is some overlap in similar newsgroups, such as soc.religion.christian + alt.atheism. In the nearest-neighbor plot, there are some newsgroups that appear similar that do not seem similar in the average similarity plot, such as comp.sys.mac.hardware + comp.sys.ibm.pc.hardware and rec.sport.hockey + rec.sport.baseball. Average similarity plots appear to have a more even distribution of similarity measures, whereas the counts in the nearest-neighbor plot are mostly low with some high counts.

Average similarity is better suited to comparing newsgroups. With nearest neighbors, each article has some discrete influence on similarity, so disparate newsgroups could wrongly appear similar. It could be the case that the articles in a newsgroup are extremely dissimilar to articles in other newsgroups, such as the articles in misc.forsale. Looking at the Jaccard and cosine average similarity plots, it appears misc.forsale is dissimilar to the other newsgroups. In the nearest-neighbor plot, a noticeable number of articles in misc.forsale are nearest neighbors to comp.sys.ibm.pc.hardware, probably because there are a lot of PCs for sale, but not the other way around. Likewise, the articles in rec.sport.hockey and rec.sport.baseball might not be similar to each other, but they are more similar to each other than to other newsgroups.


Next, we look at how reducing the number of dimensions affects the quality of results for measures of similarity. Here’s the cosine similarity nearest-neighbor heat map:

Cosine similarity nearest-neighbor heat map

Now we reduce the number of dimensions to a target dimension d with a random projection whose entries are drawn from a standard normal distribution.
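One way to read this step is as a Gaussian random projection: multiply the article-by-word matrix by a random matrix with d columns. The sketch below assumes that reading. The toy data, the seed, and the function names are mine, not from the original analysis.

```python
import numpy as np

def random_projection(X, d, rng):
    """Project the rows of X (articles x words) down to d dimensions."""
    k = X.shape[1]
    M = rng.standard_normal((k, d))  # entries drawn from N(0, 1)
    return X @ M

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    return unit @ unit.T

rng = np.random.default_rng(0)
# Made-up word counts standing in for the bag-of-words matrix.
X = rng.poisson(1.0, size=(6, 2000)).astype(float)

X_reduced = random_projection(X, d=100, rng=rng)
full = cosine_similarity_matrix(X)
reduced = cosine_similarity_matrix(X_reduced)

# Largest change in any pairwise similarity caused by the projection.
print(np.abs(full - reduced).max())
```

Similarities are then computed on 100 columns instead of 2000, at the cost of some distortion.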

Cosine similarity nearest-neighbor heat map, d=10
Cosine similarity nearest-neighbor heat map, d=25
Cosine similarity nearest-neighbor heat map, d=50
Cosine similarity nearest-neighbor heat map, d=100

 

Wall-clock times (seconds)

With no dimension reduction, calculating cosine similarities took 202.858168125 sec, and finding nearest neighbors took 0.902053117752 sec.

d     dimension reduction   calculating cosine similarities   finding nearest neighbors
10    1.89096689224         34.9237360954                     0.762381076813
25    4.87242698669         81.7924189568                     0.696530103683
50    8.96683502197         158.616721869                     0.77707695961
100   18.9475579262         319.640784025                     0.732910871506

For dimension reduction and calculating cosine similarities, wall-clock time increased roughly linearly with d, while the time to find nearest neighbors stayed about the same.

Target dimension d=100 gave results comparable to the original embedding.


Now let’s look at a single article, and see how cosine similarities compare after dimension reduction.

Cosine similarities for a single article, d=10
Cosine similarities for a single article, d=25
Cosine similarities for a single article, d=50
Cosine similarities for a single article, d=100

The error is the vertical distance from a point on the scatterplot to the line y=x. As d increases, the sum of the errors and the standard deviation of the errors get smaller, because more of the information in the original full-dimensional word vectors is retained.
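As a rough sketch of this error measure, assume each point pairs an original cosine similarity (x) with the similarity after dimension reduction (y), so the error is |y - x|. The numbers below are invented, and using the population standard deviation is my choice, not necessarily what was used for the results here.

```python
import statistics

def error_summary(original_sims, reduced_sims):
    """Sum and standard deviation of the vertical distances to y = x."""
    errors = [abs(r - o) for o, r in zip(original_sims, reduced_sims)]
    return sum(errors), statistics.pstdev(errors)

# Hypothetical similarities of one article to three others.
original = [0.10, 0.40, 0.70]
reduced = [0.20, 0.35, 0.70]

total_error, error_spread = error_summary(original, reduced)
print(total_error, error_spread)
```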

Looking at the target dimension vs. sum of errors:

d      sum of errors
10     300.6589293
25     113.3587640
50     74.9733475
100    67.0568351

It appears that the sum of errors decreases with diminishing returns as d increases, leveling off toward an asymptote.


Now we try dimension reduction with a random sign (±1) instead of a normal distribution.
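A random sign projection can be sketched the same way as a Gaussian one, with only the random matrix changed. This assumes each entry is +1 or -1 with equal probability. The names and toy data are hypothetical.

```python
import numpy as np

def random_sign_matrix(k, d, rng):
    """k x d matrix whose entries are +1 or -1 with equal probability."""
    return rng.choice([-1.0, 1.0], size=(k, d))

def random_sign_projection(X, d, rng):
    """Project the rows of X (articles x words) down to d dimensions."""
    return X @ random_sign_matrix(X.shape[1], d, rng)

rng = np.random.default_rng(0)
# Made-up word counts standing in for the bag-of-words matrix.
X = rng.poisson(1.0, size=(6, 2000)).astype(float)

S = random_sign_matrix(2000, 25, rng)
X_reduced = random_sign_projection(X, d=25, rng=rng)
print(X_reduced.shape)
```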

Cosine similarity nearest-neighbor heat map, random sign, d=10
Cosine similarity nearest-neighbor heat map, random sign, d=25
Cosine similarity nearest-neighbor heat map, random sign, d=50
Cosine similarity nearest-neighbor heat map, random sign, d=100
Cosine similarities for a single article, random sign, d=10
Cosine similarities for a single article, random sign, d=25
Cosine similarities for a single article, random sign, d=50
Cosine similarities for a single article, random sign, d=100
d     sum of errors, random normal distribution   sum of errors, random sign
10    300.6589293                                 191.6908585
25    113.358764                                  111.0387588
50    74.9733475                                  84.4925114
100   67.0568351                                  66.6889494

The results of dimension reduction by random sign and random normal distribution were similar. For both dimensionally-reduced matrices, the plot for d=100 was comparable to the one with full dimensions.

There was an attempt

I tried to hike to Mailbox Peak on the new trail. It’s about 4,000 feet of elevation gain to the mailbox. At around 3,000 feet of elevation gain, the snow became deeper and slicker, and the trail turned steep. I would have needed traction devices and trekking poles to continue. It was hailing and there was limited tree cover at that elevation. I was not feeling particularly energetic to begin with, so I turned back.

As I hiked back down through the trees, the hail turned to snow. Then as I reached lower elevations, the snow turned to rain. I passed all the familiar landmarks from my ascent: burnt trees, waterfalls, bridges, then back to leafy brush. I was disappointed that I didn’t make it to the mailbox, but I know I made the right choice in turning back.

PCA on data from the 1000 genomes project

I took a dataset of individuals from the 1000 Genomes Project, with a subsample of ~10,000 nucleobases for each individual, and gave each nucleobase a binary encoding based on the mode for that nucleobase position.
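Here is a small sketch of one such encoding. The direction of the encoding is an assumption: 0 when an individual has the most common nucleobase at a position, 1 otherwise. The toy genomes are made up, and ties for the mode are broken alphabetically by this code.

```python
import numpy as np

def binary_encode(genomes):
    """Encode each nucleobase as 0 if it matches the column mode, else 1."""
    genomes = np.asarray(genomes)
    encoded = np.zeros(genomes.shape, dtype=int)
    for j in range(genomes.shape[1]):
        bases, counts = np.unique(genomes[:, j], return_counts=True)
        mode = bases[np.argmax(counts)]  # most common base at position j
        encoded[:, j] = (genomes[:, j] != mode).astype(int)
    return encoded

# Three hypothetical individuals, four positions each.
toy_genomes = [list("ACGT"), list("ACGA"), list("TCGA")]
encoded = binary_encode(toy_genomes)
print(encoded)
```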


The individuals were from 7 African populations:

YRI: Yoruba in Ibadan, Nigeria
LWK: Luhya in Webuye, Kenya
GWD: Gambian in Western Divisions in the Gambia
MSL: Mende in Sierra Leone
ESN: Esan in Nigeria
ASW: Americans of African Ancestry in SW USA
ACB: African Caribbeans in Barbados

A map of the populations in the data

Plotting the first and second principal components, we see the components capture geographic information.
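For reference, the projection onto principal components can be sketched with a plain SVD on the centered matrix. scikit-learn's PCA class would be the more usual route. The random binary matrix below is a stand-in, not the genome data.

```python
import numpy as np

def principal_component_scores(X, n_components):
    """Project centered rows of X onto the top principal directions."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
# Stand-in for the individuals x positions binary matrix.
X = rng.integers(0, 2, size=(20, 50)).astype(float)

scores = principal_component_scores(X, n_components=4)
v1, v2 = scores[:, 0], scores[:, 1]  # the two axes of a v1 vs. v2 scatterplot
print(scores.shape)
```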

v1 vs. v2 components, grouped by population

On the v1 axis, the populations appear genetically similar except for LWK, ACB, and ASW. The LWK population in east Africa is relatively dissimilar to the populations on the west coast of Africa. Populations ACB and ASW are even more dissimilar and have a wide spread. Perhaps there is greater genetic diversity for the ACB and ASW populations because they are more likely to have mixed ancestry. So the first principal component captures genetic similarity to west African coast populations.

On the v2 axis, we see GWD in a cluster, MSL in a cluster, and ESN, YRI, and LWK in a cluster. ACB and ASW span both the MSL and the ESN/YRI/LWK clusters. So the second principal component separates the two populations on the western part of the coast (GWD + MSL) from the other central and eastern populations (ESN/YRI/LWK), while suggesting individuals in the ACB and ASW populations could have ancestry from either region.


Plotting the first and third principal components, we see the third component captures gender.

v1 vs. v3 components, grouped by gender

Plotting the first and fourth principal components, we see the fourth component captures whether the individual belongs to the LWK population.

v1 vs. v4 components, grouped by population

Python libraries used: numpy, scikit-learn, pandas, matplotlib