Visualizing Everest Expeditions

An End-To-End Data Visualization Project

Image By Author

I love looking at other people’s data visualization work, especially from people who also focus on design, like Giorgia Lupi, Nadieh Bremer (I particularly love her Patchwork Kingdoms), and Shirley Wu. Nadieh and Shirley are the authors of the book Data Sketches, and one thing I really appreciate about it is that they explain the process behind every visualization, from initial concept to final result, highlighting the issues they ran into and the technologies they used. I appreciate how they demystify the process and, in that spirit, I’m sharing the process behind the data visualization above.

Getting The Data

While looking for datasets related to mountaineering, I stumbled across The Himalayan Database:

A compilation of records for all expeditions that have climbed in the Nepal Himalaya. The database is based on the expedition archives of Elizabeth Hawley, a longtime journalist based in Kathmandu, and it is supplemented by information gathered from books, alpine journals and correspondence with Himalayan climbers.

There are two main ways to access the data:

  1. Download an application with embedded data (doing this on Mac requires installing a compatibility layer to run Windows applications).
  2. Use the Himalayan Database Online, which allows you to access data by clicking through buttons and options.

It’s worth noting that the Himalayan Database website says this about the second option:

The Himalayan Database Online is a subset of the downloadable version of the Himalayan Database that provides the most commonly used features of the database. This will make the database more accessible for many users, especially for those with tablets and smartphones.

It’s unclear exactly what’s missing from the online version so I opted to borrow a friend’s Windows PC to get the full dataset. The data I wanted was information on expeditions, which turned out to be a small CSV file with only 11,185 rows and 65 columns (this was expedition information for all Himalayan peaks since 1905). Here’s an example of the first few rows:

       expid peakid  year  season  host            route1            route2 route3 route4      nation               leaders                                        sponsor  success1  success2  success3  success4 ascent1 ascent2 ascent3 ascent4  claimed  disputed     countries                                   approach   bcdate   smtdate  smttime  smtdays  totdays  termdate  termreason                                         termnote  highpoint  traverse    ski  parapente  camps  rope  totmembers  smtmembers  mdeaths  tothired  smthired  hdeaths  nohired  o2used  o2none  o2climb  o2descent  o2sleep  o2medical  o2taken  o2unkwn                           othersmts                                          campsites                           accidents achievment agency  comrte  stdrte  primrte  primmem  primref primid   chksum
0 ANN260101 ANN2 1960 1 1 NW Ridge-W Ridge NaN NaN NaN UK J. O. M. Roberts NaN True False False False 1st NaN NaN NaN False False India, Nepal Marshyangdi->Hongde->Sabje Khola 3/15/60 5/17/60 1530.0 63 0 - - 1 NaN 7937 False False False 6 0 10 2 0 9 1 0 False True False True False True False False False Climbed Annapurna IV (ANN4-601-01) BC(15/03,3350m),ABC(4575m),C1(5365m),C2(5800m)... NaN NaN NaN False False False False False NaN 2442047
1 ANN269301 ANN2 1969 3 1 NW Ridge-W Ridge NaN NaN NaN Yugoslavia Ales Kunaver Mountaineering Club of Slovenia True False False False 2nd NaN NaN NaN False False NaN Marshyangdi->Hongde->Sabje Khola 9/25/69 10/22/69 1800.0 27 31 10/26/69 1 NaN 7937 False False False 6 0 10 2 0 0 0 0 False False True False False False False False False Climbed Annapurna IV (ANN4-693-02) LowBC(25/09,3950m),BC(27/09,4650m),C1(27/09,53... Draslar frostbitten hands and feet NaN NaN False False False False False NaN 2445501
2 ANN273101 ANN2 1973 1 1 W Ridge-N Face NaN NaN NaN Japan Yukio Shimamura Sangaku Doshikai Annapurna II Expedition 1973 True False False False 3rd NaN NaN NaN False False NaN Marshyangdi->Pisang->Salatang Khola 3/16/73 5/6/73 2030.0 51 0 - - 1 NaN 7937 False False False 5 0 6 1 0 8 0 0 False False True False False False False False False NaN BC(16/03,3300m),C1(21/03,4200m),C2(10/04,5000m... NaN NaN NaN False False False False False NaN 2446797
3 ANN278301 ANN2 1978 3 1 N Face-W Ridge NaN NaN NaN UK Richard J. Isherwood British Annapurna II Expedition False False False False NaN NaN NaN NaN False False NaN Marshyangdi->Pisang->Salatang Khola 9/8/78 10/2/78 NaN 24 27 10/5/78 4 Abandoned at 7000m (on A-IV) due to bad weather 7000 False False False 0 0 2 0 0 0 0 0 True False True False False False False False False NaN BC(08/09,5190m),xxx(02/10,7000m) NaN NaN NaN False False False False False NaN 2448822
4 ANN279301 ANN2 1979 3 1 N Face-W Ridge NW Ridge of A-IV NaN NaN UK Paul Moores NaN False False False False NaN NaN NaN NaN False False NaN Pokhara->Marshyangdi->Pisang->Sabje Khola - - 10/18/79 NaN 0 0 10/20/79 4 Abandoned at 7160m due to high winds 7160 False False False 0 0 3 0 0 0 0 0 True False True False False False False False False NaN BC(3500m),ABC,Biv1,Biv2,Biv3,Biv4,Biv5,xxx(18/... NaN NaN NaN False False False False False NaN 2449204

I was only interested in expeditions for Everest, so I filtered the data to peakid = EVER, which leaves us with 2306 records and 65 columns:

         expid peakid  year  season  host                          route1 route2 route3 route4       nation            leaders                                     sponsor  success1  success2  success3  success4 ascent1 ascent2 ascent3 ascent4  claimed  disputed            countries     approach    bcdate   smtdate  smttime  smtdays  totdays  termdate  termreason                                           termnote  highpoint  traverse    ski  parapente  camps  rope  totmembers  smtmembers  mdeaths  tothired  smthired  hdeaths  nohired  o2used  o2none  o2climb  o2descent  o2sleep  o2medical  o2taken  o2unkwn                     othersmts                                          campsites                                          accidents achievment                  agency  comrte  stdrte  primrte  primmem  primref primid   chksum
106 EVER88401 EVER 1988 4 1 S Col-SE Ridge NaN NaN NaN Belgium Herman Detienne Herman Detienne Everest/Lhotse Winter 1988 False False False False NaN NaN NaN NaN False False Netherlands, Poland NaN 11/10/88 12/22/88 NaN 42 0 - - 6 Abandoned at 8700m due to bad weather and Sher... 8700 False False False 4 0 17 0 0 10 0 1 False True False True False True True False False Climbed Lhotse (LHOT-884-01) BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m... Dewaele exhausted, shocked, needed O2 and was ... NaN Mountain Travel True True False False False NaN 2449641
108 EVER88402 EVER 1988 4 1 SW Face (Bonington rte) NaN NaN NaN S Korea Park Young-Bae Park Young-Bae Everest Winter 1988 False False False False NaN NaN NaN NaN False False NaN NaN 12/8/88 1/10/89 NaN 33 35 1/12/89 4 Abandoned at 7800m due to wind (no snow in gully) 7800 False False False 4 0 11 0 0 7 0 0 False False True False False False False False False NaN BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m... NaN NaN NaN False False False False False NaN 2449660
382 EVER89306 EVER 1989 3 2 N Col-N Face (Great Couloir) NaN NaN NaN USA Keith Brown US Chomolungma Expedition False False False False NaN NaN NaN NaN False False NaN NaN 8/21/89 10/14/89 NaN 54 55 10/15/89 4 Abandoned at 7800m due to wind 7800 False False False 2 0 2 0 0 0 0 0 True True False False False True False False False NaN BC(21/08,5200m),ABC(25/08,6400m),C1(28/08,7000... NaN NaN NaN False False False False False NaN 2449938
383 EVER89310 EVER 1989 3 2 N Face (Japanese Couloir) NaN NaN NaN Italy Lorenzo Mazzoleni Italian Everest Expedition False False False False NaN NaN NaN NaN False False NaN NaN 9/1/89 9/25/89 NaN 24 36 10/7/89 4 Abandoned at 7500m due to bad weather and lack... 7500 False False False 2 0 4 0 0 0 0 0 True False True False False False False False False NaN BC(01/09,5200m),ABC(08/09,5600m),C1(11/09,6200... NaN NaN Trekking International False False False False False NaN 2449919
384 EVER89305 EVER 1989 3 2 N Col-N Face (Messner Couloir) NaN NaN NaN Switzerland Norbert Joos Everest Team on N Face False False False False NaN NaN NaN NaN False False Italy, W Germany by road via 8/17/89 9/25/89 NaN 39 45 10/1/89 5 Abandoned at 8100m due to deep snow and bad we... 8100 False False False 1 0 6 0 0 0 0 0 True False True False False False False False False NaN BC(17/08,5200m),ABC(27/08,6450m),C1(06/09,7800... NaN NaN Sherpa Society False False False False False NaN 2449919

Exploring Potentially Relevant Columns

I wanted to see if I could construct the elevation profile for all Everest expeditions, so I pulled out a few columns I thought might be relevant (based solely on the column names):

         expid  peakid  year                          route1  route2  route3  route4  claimed  disputed      bcdate     smtdate  termdate  termreason                                           termnote  highpoint                                          campsites
106  EVER88401    EVER  1988                  S Col-SE Ridge     NaN     NaN     NaN    False     False  1988-11-10  1988-12-22       - -           6  Abandoned at 8700m due to bad weather and Sher...       8700  BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m...
108  EVER88402    EVER  1988         SW Face (Bonington rte)     NaN     NaN     NaN    False     False  1988-12-08  1989-01-10   1/12/89           4  Abandoned at 7800m due to wind (no snow in gully)       7800  BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m...
382  EVER89306    EVER  1989    N Col-N Face (Great Couloir)     NaN     NaN     NaN    False     False  1989-08-21  1989-10-14  10/15/89           4                     Abandoned at 7800m due to wind       7800  BC(21/08,5200m),ABC(25/08,6400m),C1(28/08,7000...
383  EVER89310    EVER  1989       N Face (Japanese Couloir)     NaN     NaN     NaN    False     False  1989-09-01  1989-09-25   10/7/89           4  Abandoned at 7500m due to bad weather and lack...       7500  BC(01/09,5200m),ABC(08/09,5600m),C1(11/09,6200...
384  EVER89305    EVER  1989  N Col-N Face (Messner Couloir)     NaN     NaN     NaN    False     False  1989-08-17  1989-09-25   10/1/89           5  Abandoned at 8100m due to deep snow and bad we...       8100  BC(17/08,5200m),ABC(27/08,6450m),C1(06/09,7800...

Many of the columns have a clear interpretation:

  • expid = Identifier for distinct expeditions
  • peakid = Identifier for distinct peaks (e.g., peakid = EVER filters data to Everest expeditions).
  • year = Year of the expedition.
  • route1, route2, route3, route4 = General route information.
  • disputed = Whether the summit was disputed or not.
  • termdate = Termination date for the expedition.
  • termreason = Termination reason.
  • termnote = Note on the termination reason.
  • highpoint = Highest point reached.
  • campsites = Campsites used by expedition members with information on date and campsite elevation.

However, the values in three of the columns were a bit confusing:

  • claimed = Expedition members claim they summited (?)
  • bcdate = Base camp date (?)
  • smtdate = Summit date (?)

It is unclear whether bcdate (which stands for “Base Camp Date”) represents the day the expedition team arrived at base camp or the day the expedition left base camp (towards the summit). The interpretation of the smtdate column (which stands for “Summit Date”) is similarly unclear because teams that did not summit Everest still have a smtdate. After cross-referencing with other columns in the table (like termnote) and with trip reports from the Himalayan Database Online, I concluded that:

  • bcdate is not standardized across records. However, it represents a point in time during which the expedition was at base camp.
  • smtdate is the date at which the team reached their highpoint (the value specified in highpoint), not the summit.

claimed was a little harder to understand. I guessed it was a flag indicating whether members of the expedition “claimed” to reach the summit without proof/verification. Here are two expedition reports with expeditions where the claimed flag is set to True:

  • EVER84305: Success unrecognized. Vos’s summit claim has been disputed by Czech team Sherpas.
  • EVER11193: Mudgal was not seen by the May 14 IMG summit team and they were the only ones high on the route at his supposed summit time (7:25 am). The only others on the route were two Alpine Ascents members who summited quite a bit a later (9:30 am).

It seems that the disputed and claimed flags indicate similar things, with claimed meaning the summit is unverified but not flat out disputed.

Selecting Relevant Columns & Filtering Data

After some thought, I decided I wanted to use the campsites field to approximate the elevation profile of the different Everest expeditions:

# Campsite information for one Everest expedition
campsite = 'BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m),C3(19/12,7500m),C4(10/01,7800m),xxx(10/01,7800m)'

Each of these entries can be treated as a vertical waypoint for the expedition. In this case, the expedition started at 5400m on 08/12, then moved to 6000m on 10/12, and so on. These waypoints are precisely what I want to plot.

Unfortunately, some values in the campsites column seem to have missing/incomplete data. Because of this I selected a few other columns that could help corroborate/contradict/complete the campsite information. The final dataframe had the following 9 columns:

  • expid & year, used to identify distinct records (expid is not unique due to a change in century so year is needed to construct a unique expedition identifier: expid = expid + year).
  • campsites, we’ll try to recover elevation over time from this column.
  • termreason, used to remove expedition records that are rumored to have happened, where information is not available, where the expedition was summiting a different peak (perhaps a subpeak), where the expedition didn’t reach base camp, didn’t attempt the summit, or where the termination reason is unknown.
  • bcdate, smtdate, & highpoint, used to corroborate/contradict/complete information in the campsites column.
  • disputed & claimed, used to remove expeditions with unreliable information (as discussed in the previous section).

After removing missing values, creating a new expedition identifier, removing cases where termreason = [0, 2, 11, 12, 13, 14], and dropping columns used for filtering, we end up with a dataframe like this:

            expid  year      bcdate     smtdate                                          campsites  highpoint
0  EVER88401-1988  1988  1988-11-10  1988-12-22  BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m...       8700
1  EVER88402-1988  1988  1988-12-08  1989-01-10  BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m...       7800
2  EVER89306-1989  1989  1989-08-21  1989-10-14  BC(21/08,5200m),ABC(25/08,6400m),C1(28/08,7000...       7800
3  EVER89310-1989  1989  1989-09-01  1989-09-25  BC(01/09,5200m),ABC(08/09,5600m),C1(11/09,6200...       7500
4  EVER89305-1989  1989  1989-08-17  1989-09-25  BC(17/08,5200m),ABC(27/08,6450m),C1(06/09,7800...       8100
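The filtering described above can be sketched with pandas roughly as follows. This is a minimal illustration rather than the original code: the toy rows are made up, and I am assuming that "removing missing values" refers to rows with no campsites text.

```python
import pandas as pd

# Toy stand-in for the expeditions table (illustrative values only).
df = pd.DataFrame({
    "expid": ["EVER88401", "EVER88402", "ANN260101", "EVER00101"],
    "peakid": ["EVER", "EVER", "ANN2", "EVER"],
    "year": [1988, 1988, 1960, 2000],
    "termreason": [6, 4, 1, 0],
    "claimed": [False, False, False, False],
    "disputed": [False, False, False, False],
    "campsites": ["BC(10/11,5350m)", "BC(08/12,5400m)", "BC(15/03,3350m)", None],
})

EXCLUDED_TERMREASONS = [0, 2, 11, 12, 13, 14]  # codes listed in the text

# Keep Everest expeditions only.
everest = df[df["peakid"] == "EVER"].copy()

# expid alone is not unique across centuries, so append the year.
everest["expid"] = everest["expid"] + "-" + everest["year"].astype(str)

# Drop unreliable or unusable records.
everest = everest[~everest["termreason"].isin(EXCLUDED_TERMREASONS)]
everest = everest[~(everest["claimed"] | everest["disputed"])]
everest = everest.dropna(subset=["campsites"])  # assumption: "missing" means no campsites text

# Drop the columns that were only needed for filtering.
everest = everest.drop(columns=["peakid", "termreason", "claimed", "disputed"])
everest = everest.reset_index(drop=True)

print(everest)
```

Building the new identifier before filtering keeps it aligned with the year column, whatever rows are later removed.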

Issues With The “campsites” Field

Missing Dates & Elevations

Most entries in the campsites column have this form:

'BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m),C3(19/12,7500m),C4(10/01,7800m),xxx(10/01,7800m)'

A string of camp names, dates (in DD/MM format), and elevations (in meters). In the example above, the first camp is “BC” (base camp), the expedition camped there on 08/12, and the elevation of the campsite was 5400m. Unfortunately, dates and elevations are sometimes missing. For example, expedition EVER84101 (or EVER84101–1984 under the new expedition ID) has the following campsite information:

BC(08/03,5100m),C1,C2,C3.ABC,C4(12/04,7000m),C5(25/04,7680m),C6(15/05,8230m),C7(19/05,8540m),xxx(20/05,8600m)

As you can see, some campsite dates and elevations are missing for this expedition. We should be able to estimate campsite elevations by looking at other expeditions, at least for most campsites. For example, the campsite C1 was probably around 6000m in elevation (based on the first campsite record I shared). We can also estimate a date for the BC campsite (if it were missing) based on the value in the bcdate column, but we can’t estimate dates for the C1, C2, and C3 campsites. This means we can’t use these campsites as waypoints for the expedition. I decided to drop any waypoint for which I couldn’t recover both a date and an elevation from the expedition’s elevation profile.

Missing Information Altogether

In other cases, the campsite field simply has text that says things like “See route note descriptions for individual teams” or “see route details”. In theory it’s possible to recover some information from the route details text which looks something like this:

South Side Camp Details:
BC at closest possible site to Icefall
C1 at top of Icefall
C2 at bottom of Lhotse Face in Cwm (normal site)
C3 on Lhotse Face right of Geneva Spur
C4 at South Col
C5 on SE Ridge (halfway South Col-South Summit).

BC(28/03,5350m),C1(04/04,6000m),ABC(11/04,6500m),C3(19/04,7300m),
C4(28/04,7986m),C5(04/05,8300m),Smt(05,10/05)

North Side Camp Details:

BC(06/03,5154m),C1(11/03,5500m),C2(12/03,6000m),ABC C3(17/03,6500m),
C4(01/04,7028m),C5(08/04,7790m),C6(02/05,8200m),C7(04/05,8680m),Smt(05/05)

South team had to hurry up to catch up with N side climbers because were 10 days late at BC due to transport problems to BC (after 29 March in almost daily walkie-talkie radio contact with north side). No mountaineering problems but communications problems among 3 nationalities who had "quite different" ideas especially "Chinese who had quite a bureaucratic inflexible attitude." Chinese were CMA staffers or TMA staff members, not army who had to obey orders from North BC or later when contact established by radio with Beijing, orders from Beijing of what they must do - where as most Japanese members paid some fees to join. Nepalese on salary from NMA but were quite familiar to Japanese. Tibetans could not speak frankly to others because "observers" were in BC who were Han Chinese and Han Chinese at BC could not differ with Beijing.

South side climbing not much snow, so had packed ice. Icefall very dry and stable - on summit climb 5 May, deep snow on ridge. 5 May summit group planned to be 2 traversing parties: Top Bahadur Khatri in 2nd group but miscalculated oxygen supply with not enough on South Col: 24 bottles were there Ok but 17 or 18 empty on 4 May when these 6 and 6 support members on Col or above, hoping all to go to top. 2 Chinese were to go so went, Ang Phurba followed them to C5 to 8300m on 4 May - not enough oxygen at 8300m for Kitamura to go with them, so he stayed at C4 and tried to go to top on 5th from C4 early morning in very strong wind. Khatri and Isona and supporting members including Sungdare did not try to go up because oxygen at Col only enough for one summiter from there. Kitamura reached South Summit at 3:00 pm; was told it too late for him to continue, leadership felt, and told Yamada to bring him down.

7 May Beijing said climb finished but young Japanese and some Nepalese climbers did not agree. 6 Nepalese went to South Col: Sungdare, Padam Bahadur Tamang, Ang Karma, Ang Rita (Thami), Narayan Shrestha and Hira Bahadur Rana - why all 6 did not reach summit Isono doesn't know. Nepalese defied end of climb decision and it was their country, but Japanese were not permitted to continue, much to their unhappiness.

South Chinese climbing leader stayed in North Col camp with Shigehiro at end of climbing period.

North side 3 stages of climb:

1) make camps and carry loads to N Col,
2) do same to C6 with loads but not occupy higher camps - then all down
to BC for rest,
3) complete climb

In 1st stages down to BC when bad weather came after C3 established. In 2nd stage C6 reached but not slept in for 1st time on 9th April. In 3rd stage C6 occupied and rest of climb completed. Did not at any time have to wait for south side climbers to make progress because south side route easier to climb and number of camps less and fact summit date fixed for 5 May. team led by Yamada who knew south route. 6 Yamada selected before leaving Japan so could descend without south side climbers to guide them down. Oxygen used in C5 sleeping, same members climbing above C5, all members sleeping in C6 and C8 and climbing above C7. From summit Yamada left about 10:30 am with Ang Lhakpa and Cerin Douji. Linert Yamada party at top and left 11:00 am. Lhakpa Sona and Yamoumoto left 12:30 pm. Camera crew reached top just few minutes before south side trio arrived and left at 1:00 pm. 2nd traverse team in C5 ready to go to top (had ferried oxygen and food and fuel to C6 on 6 May) when decision taken late night 6 May by Chinese leader (final decision taken 3:00 am 7 May Chinese time) so 2nd team to N Col 7 May. In 2nd team was Mitani who would have been his 4th ascent.

[ADDITIONAL TEXT WAS REMOVED FOR THIS EXAMPLE]

You can see the campsite information we want is available near the start of the document (there are actually two teams):

south_side = 'BC(28/03,5350m),C1(04/04,6000m),ABC(11/04,6500m),C3(19/04,7300m),C4(28/04,7986m),C5(04/05,8300m),Smt(05,10/05)'
north_side = 'BC(06/03,5154m),C1(11/03,5500m),C2(12/03,6000m),ABC C3(17/03,6500m),C4(01/04,7028m),C5(08/04,7790m),C6(02/05,8200m),C7(04/05,8680m),Smt(05/05)'

I decided not to attempt to recover campsite information from the route details. The main reason was that the text was not available in the expeditions data I extracted, and I would need to either:

  • borrow my friend’s laptop again to see if the route details data was available,
  • or write a web scraper with Selenium to extract route details from The Himalayan Database Online page.

Then, I would need to automate extracting campsite information. Because I was creating this visualization for fun, I decided not to go down this road and instead removed this type of record from the visualization.

Multiple Teams

Another problem with campsite descriptions is that some expeditions have information for multiple teams:

BC(05-06/08,5250m),ABC(15-16/08,5821m),C1(20/08,6248m); W-Ridge.C2(03,05/09,6700m),C3(08/09,6900m)xxx(10/09,7240m); N-Face-xxx(23/09,7315m) (see route notes)

In these cases, I kept only the first route.
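One simple way to keep only the first route is to cut the string at the first semicolon. This is a sketch under the assumption that routes are always separated by semicolons, as they are in the example above; `first_route` is a hypothetical helper name, not something from the original code.

```python
def first_route(campsites: str) -> str:
    """Return the text before the first semicolon (assumed route separator)."""
    return campsites.split(";", 1)[0].strip()


raw = (
    "BC(05-06/08,5250m),ABC(15-16/08,5821m),C1(20/08,6248m); "
    "W-Ridge.C2(03,05/09,6700m),C3(08/09,6900m)xxx(10/09,7240m); "
    "N-Face-xxx(23/09,7315m) (see route notes)"
)

print(first_route(raw))
```

Strings without a semicolon pass through unchanged, so the helper can be applied to every record.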

Reconstructing Elevation Profiles

To reconstruct elevation profiles from the campsite field I implemented the following logic:

[STEP 1] — Split the campsite information by commas, provided the comma is not within parentheses. Here’s an example of how this split would work:

# Original string
orig_str = 'BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m),C3(06/12,7300m),C4(21/12,8000m),xxx(22/12,8700m)'

# Split by commas not in parentheses
splt_str = ['BC(10/11,5350m)', 'C1(01/12,6050m)', 'C2(03/12,6400m)', 'C3(06/12,7300m)', 'C4(21/12,8000m)', 'xxx(22/12,8700m)']
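One way to do this split is a regular expression with a negative lookahead. The sketch below is not necessarily how the original code does it; it assumes parentheses are balanced and never nested, and `split_camps` is a hypothetical helper name.

```python
import re

# A comma counts as a separator only if it is NOT followed by a ")" that
# comes before any other parenthesis, i.e. only if it sits outside "(...)".
TOP_LEVEL_COMMA = re.compile(r",(?![^()]*\))")


def split_camps(campsites: str) -> list[str]:
    """Split a campsites string on commas that are outside parentheses."""
    parts = TOP_LEVEL_COMMA.split(campsites)
    return [part.strip() for part in parts if part.strip()]


orig_str = (
    "BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m),"
    "C3(06/12,7300m),C4(21/12,8000m),xxx(22/12,8700m)"
)
print(split_camps(orig_str))
```

Entries that have no parentheses at all (such as a bare C1) are still split off as their own elements.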

[STEP 2] — For each element in the list, try to recover the campsite name, date, and elevation. This can be done by matching the following regex patterns and saving them in a tuple:

date_patt = r'\b\d{2}/\d{2}\b'  # Match a string in the format of two digits, a forward slash, and another two digits, surrounded by word boundaries
elev_patt = r'\b\d{4}m\b' # Match a string that consists of exactly four digits followed by the letter 'm', and the whole pattern is expected to be a whole word
camp_patt = r'([^\(]*)\(' # Matches anything before an open parentheses

For example, extracting the campsite, date, and elevation information from splt_str above yields the following list of tuples:

# Create (DD/MM, elevation, campsite) tuples 
camp_tuples = [('10/11', 5350.0, 'BC'), ('01/12', 6050.0, 'C1'), ('03/12', 6400.0, 'C2'), ('06/12', 7300.0, 'C3'), ('21/12', 8000.0, 'C4'), ('22/12', 8700.0, 'xxx')]
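A rough sketch of this extraction, reusing the three patterns above, is shown below. Two details are my own assumptions: I add a capture group to the elevation pattern so the digits can be converted to a float, and for entries without parentheses (such as a bare C1) I treat the whole entry as the campsite name. `parse_camp` is a hypothetical helper name.

```python
import re

DATE_PATT = re.compile(r"\b\d{2}/\d{2}\b")   # DD/MM
ELEV_PATT = re.compile(r"\b(\d{4})m\b")      # four digits followed by "m"
CAMP_PATT = re.compile(r"([^\(]*)\(")        # anything before the first "("


def parse_camp(entry: str):
    """Return a (date, elevation, campsite) tuple; missing parts are None."""
    date = DATE_PATT.search(entry)
    elev = ELEV_PATT.search(entry)
    camp = CAMP_PATT.match(entry)
    # Assumption: with no "(", the entry itself is the campsite name.
    name = camp.group(1).strip() if camp else entry.strip()
    return (
        date.group(0) if date else None,
        float(elev.group(1)) if elev else None,
        name or None,
    )


splt_str = ["BC(10/11,5350m)", "C1(01/12,6050m)", "C2(03/12,6400m)",
            "C3(06/12,7300m)", "C4(21/12,8000m)", "xxx(22/12,8700m)"]
camp_tuples = [parse_camp(entry) for entry in splt_str]
print(camp_tuples)
```

Returning None for a missing date or elevation makes the later drop-or-fill steps straightforward to express.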

[STEP 3] — Drop tuples where the date is missing (without a date, the elevation and campsite information is not sufficient for creating an elevation profile). Don’t worry if we accidentally remove a base camp tuple because of a missing date (we will add this back in Step 7).

[STEP 4] — If date is present in a tuple but elevation is missing, check if campsite name is present. If campsite name is present, get campsite elevation from other records.

[STEP 5] If any tuple still has missing elevation information, drop it. There’s no way we can recover this.
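Steps 3 to 5 can be sketched as follows. The article does not say how an elevation is taken "from other records", so using the median elevation per campsite name is purely an assumption made for illustration; both function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def typical_elevations(all_tuples):
    """Median elevation per campsite name across all expeditions (assumed rule)."""
    by_camp = defaultdict(list)
    for tuples in all_tuples:
        for _date, elev, camp in tuples:
            if camp and elev is not None:
                by_camp[camp].append(elev)
    return {camp: median(elevs) for camp, elevs in by_camp.items()}


def fill_elevations(tuples, typical):
    """Apply Steps 3 to 5 to one expedition's (date, elevation, campsite) tuples."""
    filled = []
    for date, elev, camp in tuples:
        if date is None:
            continue                   # Step 3: no date, not usable
        if elev is None:
            elev = typical.get(camp)   # Step 4: borrow from other records
        if elev is None:
            continue                   # Step 5: still unknown, drop it
        filled.append((date, elev, camp))
    return filled


records = [
    [("10/11", 5350.0, "BC"), ("01/12", 6050.0, "C1")],
    [("08/12", 5400.0, "BC"), ("10/12", 6000.0, "C1")],
    [("08/03", 5100.0, "BC"), (None, None, "C1"), ("12/04", None, "C1"), ("20/04", None, "C9")],
]
typical = typical_elevations(records)
print(fill_elevations(records[2], typical))
```

In the last record, the undated C1 entry is dropped, the dated C1 entry borrows an elevation, and C9 is dropped because no other record mentions it.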

[STEP 6] — Drop tuples where the elevation exceeds the elevation in the highpoint field (this is a contradiction).

[STEP 7] — By this point all tuples in a list will have a date and an elevation, and we can assume most lists of tuples look like this (note that I removed the campsite name):

# Standard campsite list
[('10/11', 5350.0), ('01/12', 6050.0), ('03/12', 6400.0), ('06/12', 7300.0), ('21/12', 8000.0), ('22/12', 8700.0)]

Some lists, however, will be empty. Fortunately, we can infer two waypoints for any expedition:

  • Base camp’s elevation and date (from the base camp elevation listed in other records and from the bcdate field).
  • The turnaround point’s elevation and date (from the highpoint and smtdate fields).

Appending the base camp elevation and date to the start of each list, and the turnaround elevation and date to the end, we guarantee that each record has at least two waypoints: start and end.
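A minimal sketch of this step is below. It assumes the waypoint dates have already been converted to full dates, and it uses 5360.0 as the base camp elevation only because that value appears as the first waypoint in the results shown in this article. `add_endpoints` is a hypothetical helper name.

```python
from datetime import date


def add_endpoints(waypoints, bcdate, bc_elev, smtdate, highpoint):
    """Return waypoints with a base camp start and a turnaround end added."""
    start = (bcdate, float(bc_elev))
    end = (smtdate, float(highpoint))
    return [start, *waypoints, end]


# An expedition whose campsites field yielded nothing usable still gets
# a two-point profile (dates and high point taken from EVER89306 above).
profile = add_endpoints(
    [],
    bcdate=date(1989, 8, 21),
    bc_elev=5360.0,          # assumed base camp elevation
    smtdate=date(1989, 10, 14),
    highpoint=7800,
)
print(profile)
```

If the parsed list already starts at base camp or ends at the high point, this produces duplicate waypoints, which the next step removes.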

[STEP 8] — Drop tuples where elevation is lower than the elevation of a previous tuple (these were mostly edge cases where the regex pattern wasn’t general enough to extract the data correctly, but also occurred due to issues with the input data such as listing an elevation higher than the height of Everest). Make sure to also drop duplicate tuples (same date and elevation) if any.
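Steps 6 and 8 are both simple filters over the waypoint list. The sketch below shows one possible reading of them: I assume "a previous tuple" means the most recently kept waypoint and that the list is already in chronological order. The function names are hypothetical.

```python
from datetime import date


def drop_above_highpoint(waypoints, highpoint):
    """Step 6: discard waypoints higher than the recorded high point."""
    return [(d, elev) for d, elev in waypoints if elev <= highpoint]


def drop_descents_and_duplicates(waypoints):
    """Step 8: keep a waypoint only if it is new and not lower than the last kept one."""
    kept = []
    for wp in waypoints:
        if wp in kept:
            continue  # exact duplicate (same date and elevation)
        if kept and wp[1] < kept[-1][1]:
            continue  # lower than the previous kept waypoint
        kept.append(wp)
    return kept


raw = [
    (date(1988, 11, 10), 5360.0),
    (date(1988, 12, 1), 6050.0),
    (date(1988, 12, 3), 640.0),    # badly parsed elevation
    (date(1988, 12, 6), 7300.0),
    (date(1988, 12, 21), 9000.0),  # higher than Everest itself
    (date(1988, 12, 22), 8700.0),
    (date(1988, 12, 22), 8700.0),  # duplicate of the turnaround point
]
cleaned = drop_descents_and_duplicates(drop_above_highpoint(raw, 8700))
print(cleaned)
```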

Steps 1–8 produce a final elevation profile (i.e., a sequence of elevation waypoints) for each remaining expedition (note that I converted the dates to YYYY/MM/DD format using the information in the year column):

            expid                                        camp_tuples
0  EVER88401-1988  [(1988/11/10, 5360.0), (1988/12/01, 6050.0), (...
1  EVER88402-1988  [(1988/12/08, 5360.0), (1988/12/10, 6000.0), (...
2  EVER89306-1989  [(1989/08/21, 5360.0), (1989/08/25, 6400.0), (...
3  EVER89310-1989  [(1989/09/01, 5360.0), (1989/09/08, 5600.0), (...
4  EVER89305-1989  [(1989/08/17, 5360.0), (1989/08/27, 6450.0), (...
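The date conversion needs some care because winter expeditions cross a year boundary (for example, EVER88402 starts in December 1988 and turns around in January 1989). A hedged sketch of one way to handle this is below; it assumes the year column is the year of the first waypoint and that waypoints are listed chronologically. `to_full_dates` is a hypothetical helper name.

```python
from datetime import date


def to_full_dates(day_months, start_year):
    """Convert 'DD/MM' strings to dates, moving to the next year when the month goes backwards."""
    year = start_year
    prev_month = None
    full_dates = []
    for day_month in day_months:
        day, month = (int(part) for part in day_month.split("/"))
        if prev_month is not None and month < prev_month:
            year += 1  # e.g. December followed by January
        full_dates.append(date(year, month, day))
        prev_month = month
    return full_dates


dates = to_full_dates(["08/12", "10/12", "14/12", "19/12", "10/01"], 1988)
print([d.strftime("%Y/%m/%d") for d in dates])
```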

Plotting The Waypoints

The waypoint information (campsite date/elevation tuples) for each expedition can be transformed as follows:


            expid        date       y   x
0  EVER00101-2000  2000-04-07  5360.0   0
1  EVER00101-2000  2000-04-23  6450.0  16
2  EVER00101-2000  2000-05-01  7200.0  24
3  EVER00101-2000  2000-05-24  8849.0  47

where y is the elevation in meters and x is the number of days since the start of the expedition. After interpolating the value of y with a monotonic cubic interpolant (SciPy’s PchipInterpolator, which smooths the data while preserving monotonicity) and plotting with matplotlib, we get the following image:

The yellow lines are expeditions I wanted to highlight and the red x’s were added to help align text in Illustrator during a later step.
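For reference, the transformation and interpolation for a single expedition could look roughly like this. It is a sketch using the EVER00101-2000 waypoints from the table above; the number of interpolation points (200) is an arbitrary choice of mine, and the plotting call is left out.

```python
from datetime import date

import numpy as np
from scipy.interpolate import PchipInterpolator

waypoints = [
    (date(2000, 4, 7), 5360.0),
    (date(2000, 4, 23), 6450.0),
    (date(2000, 5, 1), 7200.0),
    (date(2000, 5, 24), 8849.0),
]

# x = days since the first waypoint, y = elevation in meters.
start = waypoints[0][0]
x = np.array([(d - start).days for d, _ in waypoints], dtype=float)
y = np.array([elev for _, elev in waypoints])

# PCHIP requires strictly increasing x, so waypoints that share a date
# would need to be merged before this point.
curve = PchipInterpolator(x, y)
x_dense = np.linspace(x[0], x[-1], 200)
y_dense = curve(x_dense)

print(x)            # days since start
print(y_dense[:5])  # first few interpolated elevations
```

Each expedition's (x_dense, y_dense) pair can then be drawn as one line, for example with matplotlib's plot function.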

I naturally wondered what on earth these extreme points were: the one to the left (2 days to summit? really?) and the one to the right (over 90 days?). It turns out the point to the left is Nirmal Purja’s summit from “Project Possible” (yes, the guy from the “14 Peaks: Nothing Is Impossible” documentary). The point on the right was a grueling expedition that ended in frostbite.

I decided to truncate the data to 80 days to make the bulk of the data easier to visualize. The final plot looked something like this:

The yellow profiles correspond to Nirmal Purja’s ascent, the first ascent, the first female ascent, and the longest expedition (to the turnaround point) under 90 days (I looked at the “achievements” field in the database to find the expedition IDs of the “first ascent” expeditions).

I was a bit bothered by the “kinks” in the second to last waypoint for most expeditions. I decided I wanted to smooth this out a bit more even if it did mean compromising on the exact elevation value (I was more interested in viewing trends, not exact values). Instead of doing this smoothing in Python directly (which would likely involve increasing the interpolation resolution and/or removing the problematic waypoint after interpolation), I decided to take care of this in Illustrator (where curve smoothing is straightforward).

Edit In Illustrator & Photoshop

After saving the previous plot as an SVG, you can open it in Adobe Illustrator, select all paths, and use the Object > Path > Smooth slider to smooth them:

Left: Original paths. Right: Smoothed paths. Smoothing in Illustrator is quick but smoothed paths no longer follow waypoints exactly.

In Illustrator you can select paths with similar properties (such as line thickness and color) and adjust them in the same way. In this case, I selected all lines and increased their transparency (the elevation profiles create a hairlike texture after increasing transparency). I also added some annotations and removed the red x’s once I no longer needed them (their positions were calculated in Python and used to align the “80% of ascents above 8500m…” text):

Adjust colors & opacity, add text, create layout.

As a final touch I exported the image as PNG and added a paper texture in Photoshop:

Add paper texture by overlaying a lightened paper image in multiply mode (you can’t see it here, but there is a paper texture in the final image shared at the start of this article).

That’s it!

Final Thoughts

Creating this visualization was relatively straightforward: both data processing and plotting were done in Python with the layout designed in Illustrator and final touches (texture) added in Photoshop. This sort of approach works well here because I wanted to create a static visualization. If you need a dynamic plot or have a more complex visualization, using D3.js to plot the data (instead of simply doing this directly through matplotlib in Python) might be more appropriate.


Visualizing Everest Expeditions was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


3 ANN278301 ANN2 1978 3 1 N Face-W Ridge NaN NaN NaN UK Richard J. Isherwood British Annapurna II Expedition False False False False NaN NaN NaN NaN False False NaN Marshyangdi->Pisang->Salatang Khola 9/8/78 10/2/78 NaN 24 27 10/5/78 4 Abandoned at 7000m (on A-IV) due to bad weather 7000 False False False 0 0 2 0 0 0 0 0 True False True False False False False False False NaN BC(08/09,5190m),xxx(02/10,7000m) NaN NaN NaN False False False False False NaN 2448822
4 ANN279301 ANN2 1979 3 1 N Face-W Ridge NW Ridge of A-IV NaN NaN UK Paul Moores NaN False False False False NaN NaN NaN NaN False False NaN Pokhara->Marshyangdi->Pisang->Sabje Khola - - 10/18/79 NaN 0 0 10/20/79 4 Abandoned at 7160m due to high winds 7160 False False False 0 0 3 0 0 0 0 0 True False True False False False False False False NaN BC(3500m),ABC,Biv1,Biv2,Biv3,Biv4,Biv5,xxx(18/... NaN NaN NaN False False False False False NaN 2449204

I was only interested in expeditions for Everest, so I filtered the data to peakid = EVER, which leaves us with 2306 records and 65 columns:
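A minimal sketch of this filter in pandas, assuming the expeditions table is already loaded as a DataFrame with the peakid column shown above. The small inline frame is only a stand-in for the real CSV, whose file name is not given here.

```python
import pandas as pd

# Stand-in for the full expeditions table (the real one has 65 columns).
expeditions = pd.DataFrame(
    {
        "expid": ["ANN260101", "EVER88401", "EVER88402"],
        "peakid": ["ANN2", "EVER", "EVER"],
        "year": [1960, 1988, 1988],
    }
)

# Keep only Everest expeditions; the original row index is preserved.
everest = expeditions[expeditions["peakid"] == "EVER"]
print(everest.shape)
```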

         expid peakid  year  season  host                          route1 route2 route3 route4       nation            leaders                                     sponsor  success1  success2  success3  success4 ascent1 ascent2 ascent3 ascent4  claimed  disputed            countries     approach    bcdate   smtdate  smttime  smtdays  totdays  termdate  termreason                                           termnote  highpoint  traverse    ski  parapente  camps  rope  totmembers  smtmembers  mdeaths  tothired  smthired  hdeaths  nohired  o2used  o2none  o2climb  o2descent  o2sleep  o2medical  o2taken  o2unkwn                     othersmts                                          campsites                                          accidents achievment                  agency  comrte  stdrte  primrte  primmem  primref primid   chksum
106 EVER88401 EVER 1988 4 1 S Col-SE Ridge NaN NaN NaN Belgium Herman Detienne Herman Detienne Everest/Lhotse Winter 1988 False False False False NaN NaN NaN NaN False False Netherlands, Poland NaN 11/10/88 12/22/88 NaN 42 0 - - 6 Abandoned at 8700m due to bad weather and Sher... 8700 False False False 4 0 17 0 0 10 0 1 False True False True False True True False False Climbed Lhotse (LHOT-884-01) BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m... Dewaele exhausted, shocked, needed O2 and was ... NaN Mountain Travel True True False False False NaN 2449641
108 EVER88402 EVER 1988 4 1 SW Face (Bonington rte) NaN NaN NaN S Korea Park Young-Bae Park Young-Bae Everest Winter 1988 False False False False NaN NaN NaN NaN False False NaN NaN 12/8/88 1/10/89 NaN 33 35 1/12/89 4 Abandoned at 7800m due to wind (no snow in gully) 7800 False False False 4 0 11 0 0 7 0 0 False False True False False False False False False NaN BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m... NaN NaN NaN False False False False False NaN 2449660
382 EVER89306 EVER 1989 3 2 N Col-N Face (Great Couloir) NaN NaN NaN USA Keith Brown US Chomolungma Expedition False False False False NaN NaN NaN NaN False False NaN NaN 8/21/89 10/14/89 NaN 54 55 10/15/89 4 Abandoned at 7800m due to wind 7800 False False False 2 0 2 0 0 0 0 0 True True False False False True False False False NaN BC(21/08,5200m),ABC(25/08,6400m),C1(28/08,7000... NaN NaN NaN False False False False False NaN 2449938
383 EVER89310 EVER 1989 3 2 N Face (Japanese Couloir) NaN NaN NaN Italy Lorenzo Mazzoleni Italian Everest Expedition False False False False NaN NaN NaN NaN False False NaN NaN 9/1/89 9/25/89 NaN 24 36 10/7/89 4 Abandoned at 7500m due to bad weather and lack... 7500 False False False 2 0 4 0 0 0 0 0 True False True False False False False False False NaN BC(01/09,5200m),ABC(08/09,5600m),C1(11/09,6200... NaN NaN Trekking International False False False False False NaN 2449919
384 EVER89305 EVER 1989 3 2 N Col-N Face (Messner Couloir) NaN NaN NaN Switzerland Norbert Joos Everest Team on N Face False False False False NaN NaN NaN NaN False False Italy, W Germany by road via 8/17/89 9/25/89 NaN 39 45 10/1/89 5 Abandoned at 8100m due to deep snow and bad we... 8100 False False False 1 0 6 0 0 0 0 0 True False True False False False False False False NaN BC(17/08,5200m),ABC(27/08,6450m),C1(06/09,7800... NaN NaN Sherpa Society False False False False False NaN 2449919

Exploring Potentially Relevant Columns

I wanted to see if I could construct the elevation profile for all Everest expeditions, so I pulled out a few columns I thought might be relevant (based solely on the column names):

         expid peakid  year                          route1 route2 route3 route4  claimed  disputed     bcdate    smtdate  termdate  termreason                                           termnote  highpoint                                          campsites
106 EVER88401 EVER 1988 S Col-SE Ridge NaN NaN NaN False False 1988-11-10 1988-12-22 - - 6 Abandoned at 8700m due to bad weather and Sher... 8700 BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m...
108 EVER88402 EVER 1988 SW Face (Bonington rte) NaN NaN NaN False False 1988-12-08 1989-01-10 1/12/89 4 Abandoned at 7800m due to wind (no snow in gully) 7800 BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m...
382 EVER89306 EVER 1989 N Col-N Face (Great Couloir) NaN NaN NaN False False 1989-08-21 1989-10-14 10/15/89 4 Abandoned at 7800m due to wind 7800 BC(21/08,5200m),ABC(25/08,6400m),C1(28/08,7000...
383 EVER89310 EVER 1989 N Face (Japanese Couloir) NaN NaN NaN False False 1989-09-01 1989-09-25 10/7/89 4 Abandoned at 7500m due to bad weather and lack... 7500 BC(01/09,5200m),ABC(08/09,5600m),C1(11/09,6200...
384 EVER89305 EVER 1989 N Col-N Face (Messner Couloir) NaN NaN NaN False False 1989-08-17 1989-09-25 10/1/89 5 Abandoned at 8100m due to deep snow and bad we... 8100 BC(17/08,5200m),ABC(27/08,6450m),C1(06/09,7800...

Many of the columns have a clear interpretation:

  • expid = Identifier for distinct expeditions
  • peakid = Identifier for distinct peaks (e.g., peakid = EVER filters data to Everest expeditions).
  • year = Year of the expedition.
  • route1, route2, route3, route4 = General route information.
  • disputed = Whether the summit was disputed or not.
  • termdate = Termination date for the expedition.
  • termreason = Termination reason.
  • termnote = Note on the termination reason.
  • highpoint = Highest point reached.
  • campsites = Campsites used by expedition members with information on date and campsite elevation.

However, the values in three of the columns were a bit confusing:

  • claimed = Expedition members claim they summited (?)
  • bcdate = Base camp date (?)
  • smtdate = Summit date (?)

It is unclear whether bcdate (which stands for “Base Camp Date”) represents the day the expedition team arrived at base camp or the day the expedition left basecamp (towards the summit). The interpretation of the smtdate column (which stands for “Summit Date”) is similarly unclear because teams that did not summit Everest still have a smtdate. After cross-referencing with other columns in the table (like termnote) and with trip reports from the Himalayan Database Online I concluded that:

  • bcdate is not standardized across records. However, it represents a point in time during which the expedition was at base camp.
  • smtdate is the date at which the team reached their highpoint (the value specified in highpoint), not the summit.

claimed was a little harder to understand. I guessed it was a flag indicating whether members of the expedition “claimed” to reach the summit without proof/verification. Here are two expedition reports with expeditions where the claimed flag is set to True:

  • EVER84305: Success unrecognized. Vos’s summit claim has been disputed by Czech team Sherpas.
  • EVER11193: Mudgal was not seen by the May 14 IMG summit team and they were the only ones high on the route at his supposed summit time (7:25 am). The only others on the route were two Alpine Ascents members who summited quite a bit a later (9:30 am).

It seems that the disputed and claimed flags indicate similar things, with claimed meaning the summit is unverified but not flat out disputed.

Selecting Relevant Columns & Filtering Data

After some thought, I decided I wanted to use the campsites field to approximate the elevation profile of the different Everest expeditions:

# Campsite information for one Everest expedition
campsite = 'BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m),C3(19/12,7500m),C4(10/01,7800m),xxx(10/01,7800m)'

Each of these entries can be treated as a vertical waypoint for the expedition. In this case, the expedition started at 5400m on 08/12, then moved to 6000m on 10/12, and so on. These waypoints are precisely what I want to plot.

Unfortunately, some values in the campsites column seem to have missing/incomplete data. Because of this I selected a few other columns that could help corroborate/contradict/complete the campsite information. The final dataframe had the following 9 columns:

  • expid & year, used to identify distinct records (expid is not unique due to a change in century so year is needed to construct a unique expedition identifier: expid = expid + year).
  • campsites, we’ll try to recover elevation over time from this column.
  • termreason, used to remove expedition records that are rumored to have happened, where information is not available, where the expedition was summiting a different peak (perhaps a subpeak), where the expedition didn’t reach base camp, didn’t attempt the summit, or where the termination reason is unknown.
  • bcdate, smtdate, & highpoint, used to corroborate/contradict/complete information in the campsites column.
  • disputed & claimed, used to remove expeditions with unreliable information (as discussed in the previous section).

After removing missing values, creating a new expedition identifier, removing cases where termreason = [0, 2, 11, 12, 13, 14], and dropping columns used for filtering, we end up with a dataframe like this:

            expid  year      bcdate     smtdate                                          campsites  highpoint
0  EVER88401-1988  1988  1988-11-10  1988-12-22  BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m...       8700
1  EVER88402-1988  1988  1988-12-08  1989-01-10  BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m...       7800
2  EVER89306-1989  1989  1989-08-21  1989-10-14  BC(21/08,5200m),ABC(25/08,6400m),C1(28/08,7000...       7800
3  EVER89310-1989  1989  1989-09-01  1989-09-25  BC(01/09,5200m),ABC(08/09,5600m),C1(11/09,6200...       7500
4  EVER89305-1989  1989  1989-08-17  1989-09-25  BC(17/08,5200m),ABC(27/08,6450m),C1(06/09,7800...       8100
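The filtering described above could look roughly like the sketch below. It is not necessarily the code used for this article: the column names, the termreason codes, and the "expid-year" identifier format come from the text, while the DEMO rows, the shortened campsites strings, and the choice to call dropna() on all nine columns are assumptions made for illustration.

```python
import pandas as pd

# Illustrative stand-in for the Everest subset. The first two rows reuse
# (shortened) values shown in the article; the DEMO rows are made up so that
# each filter has something to remove.
everest = pd.DataFrame(
    {
        "expid": ["EVER88401", "EVER88402", "DEMO00001", "DEMO00002", "DEMO00003"],
        "year": [1988, 1988, 1990, 1991, 1992],
        "campsites": [
            "BC(10/11,5350m),C1(01/12,6050m)",
            "BC(08/12,5400m),C1(10/12,6000m)",
            "BC(5350m)",
            "BC(5350m)",
            None,  # missing value, removed by dropna()
        ],
        "termreason": [6, 4, 12, 1, 4],
        "bcdate": ["1988-11-10", "1988-12-08", "1990-04-01", "1991-04-01", "1992-04-01"],
        "smtdate": ["1988-12-22", "1989-01-10", "1990-05-01", "1991-05-01", "1992-05-01"],
        "highpoint": [8700, 7800, 6000, 8850, 7000],
        "disputed": [False, False, False, True, False],
        "claimed": [False, False, False, False, False],
    }
)

EXCLUDED_TERMREASONS = [0, 2, 11, 12, 13, 14]
COLUMNS = ["expid", "year", "campsites", "termreason", "bcdate",
           "smtdate", "highpoint", "disputed", "claimed"]

# Remove rows with missing values in the selected columns.
df = everest[COLUMNS].dropna()

# Drop excluded termination reasons and unreliable (disputed/claimed) records.
keep = (
    ~df["termreason"].isin(EXCLUDED_TERMREASONS)
    & ~df["disputed"]
    & ~df["claimed"]
)
df = df[keep].copy()

# Build the unique identifier and drop the columns only used for filtering.
df["expid"] = df["expid"] + "-" + df["year"].astype(str)
df = df.drop(columns=["termreason", "disputed", "claimed"]).reset_index(drop=True)
print(df)
```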

Issues With The “campsites” Field

Missing Dates & Elevations

Most entries in the campsites column have this form:

'BC(08/12,5400m),C1(10/12,6000m),C2(14/12,6300m),C3(19/12,7500m),C4(10/01,7800m),xxx(10/01,7800m)'

A string of camp names, dates (in DD/MM format), and elevations (in meters). In the example above, the first camp is “BC” (base camp), the expedition camped there on 08/12, and the elevation of the campsite was 5400m. Unfortunately, dates and elevations are sometimes missing. For example, expedition EVER84101 (or EVER84101–1984 under the new expedition ID) has the following campsite information:

BC(08/03,5100m),C1,C2,C3.ABC,C4(12/04,7000m),C5(25/04,7680m),C6(15/05,8230m),C7(19/05,8540m),xxx(20/05,8600m)

As you can see, some campsite dates and elevations are missing for this expedition. We should be able to estimate campsite elevations by looking at other expeditions, at least for most campsites. For example, the campsite C1 was probably around 6000m in elevation (based on the first campsite record I shared). We can also estimate a date for the BC campsite (if it were missing) based on the value in the bcdate column, but we can’t estimate dates for the C1, C2, and C3 campsites. This means we can’t use these campsites as waypoints for the expedition. I decided to drop from an expedition’s elevation profile any waypoint for which I couldn’t recover both a date and an elevation.

Missing Information Altogether

In other cases, the campsite field simply has text that says things like “See route note descriptions for individual teams” or “see route details”. In theory it’s possible to recover some information from the route details text which looks something like this:

South Side Camp Details:
BC at closest possible site to Icefall
C1 at top of Icefall
C2 at bottom of Lhotse Face in Cwm (normal site)
C3 on Lhotse Face right of Geneva Spur
C4 at South Col
C5 on SE Ridge (halfway South Col-South Summit).

BC(28/03,5350m),C1(04/04,6000m),ABC(11/04,6500m),C3(19/04,7300m),
C4(28/04,7986m),C5(04/05,8300m),Smt(05,10/05)

North Side Camp Details:

BC(06/03,5154m),C1(11/03,5500m),C2(12/03,6000m),ABC C3(17/03,6500m),
C4(01/04,7028m),C5(08/04,7790m),C6(02/05,8200m),C7(04/05,8680m),Smt(05/05)

South team had to hurry up to catch up with N side climbers because were 10 days late at BC due to transport problems to BC (after 29 March in almost daily walkie-talkie radio contact with north side). No mountaineering problems but communications problems among 3 nationalities who had "quite different" ideas especially "Chinese who had quite a bureaucratic inflexible attitude." Chinese were CMA staffers or TMA staff members, not army who had to obey orders from North BC or later when contact established by radio with Beijing, orders from Beijing of what they must do - where as most Japanese members paid some fees to join. Nepalese on salary from NMA but were quite familiar to Japanese. Tibetans could not speak frankly to others because "observers" were in BC who were Han Chinese and Han Chinese at BC could not differ with Beijing.

South side climbing not much snow, so had packed ice. Icefall very dry and stable - on summit climb 5 May, deep snow on ridge. 5 May summit group planned to be 2 traversing parties: Top Bahadur Khatri in 2nd group but miscalculated oxygen supply with not enough on South Col: 24 bottles were there Ok but 17 or 18 empty on 4 May when these 6 and 6 support members on Col or above, hoping all to go to top. 2 Chinese were to go so went, Ang Phurba followed them to C5 to 8300m on 4 May - not enough oxygen at 8300m for Kitamura to go with them, so he stayed at C4 and tried to go to top on 5th from C4 early morning in very strong wind. Khatri and Isona and supporting members including Sungdare did not try to go up because oxygen at Col only enough for one summiter from there. Kitamura reached South Summit at 3:00 pm; was told it too late for him to continue, leadership felt, and told Yamada to bring him down.

7 May Beijing said climb finished but young Japanese and some Nepalese climbers did not agree. 6 Nepalese went to South Col: Sungdare, Padam Bahadur Tamang, Ang Karma, Ang Rita (Thami), Narayan Shrestha and Hira Bahadur Rana - why all 6 did not reach summit Isono doesn't know. Nepalese defied end of climb decision and it was their country, but Japanese were not permitted to continue, much to their unhappiness.

South Chinese climbing leader stayed in North Col camp with Shigehiro at end of climbing period.

North side 3 stages of climb:

1) make camps and carry loads to N Col,
2) do same to C6 with loads but not occupy higher camps - then all down
to BC for rest,
3) complete climb

In 1st stages down to BC when bad weather came after C3 established. In 2nd stage C6 reached but not slept in for 1st time on 9th April. In 3rd stage C6 occupied and rest of climb completed. Did not at any time have to wait for south side climbers to make progress because south side route easier to climb and number of camps less and fact summit date fixed for 5 May. team led by Yamada who knew south route. 6 Yamada selected before leaving Japan so could descend without south side climbers to guide them down. Oxygen used in C5 sleeping, same members climbing above C5, all members sleeping in C6 and C8 and climbing above C7. From summit Yamada left about 10:30 am with Ang Lhakpa and Cerin Douji. Linert Yamada party at top and left 11:00 am. Lhakpa Sona and Yamoumoto left 12:30 pm. Camera crew reached top just few minutes before south side trio arrived and left at 1:00 pm. 2nd traverse team in C5 ready to go to top (had ferried oxygen and food and fuel to C6 on 6 May) when decision taken late night 6 May by Chinese leader (final decision taken 3:00 am 7 May Chinese time) so 2nd team to N Col 7 May. In 2nd team was Mitani who would have been his 4th ascent.

[ADDITIONAL TEXT WAS REMOVED FOR THIS EXAMPLE]

You can see the campsite information we want is available near the start of the document (there are actually two teams):

south_side = BC(28/03,5350m),C1(04/04,6000m),ABC(11/04,6500m),C3(19/04,7300m), C4(28/04,7986m),C5(04/05,8300m),Smt(05,10/05)
north_side = BC(06/03,5154m),C1(11/03,5500m),C2(12/03,6000m),ABC C3(17/03,6500m),C4(01/04,7028m),C5(08/04,7790m),C6(02/05,8200m),C7(04/05,8680m),Smt(05/05)

I decided not to attempt to recover campsite information from the route details. The main reason was that the text was not available in the expeditions data I extracted, and I would need to either:

  • borrow my friend’s laptop again to see if the route details data was available,
  • or write a web scraper with Selenium to extract route details from The Himalayan Database Online page.

Then, I would need to automate extracting campsite information. Because I was creating this visualization for fun, I decided not to go down this road and instead removed this type of record from the visualization.

Multiple Teams

Another problem with campsite descriptions is that some expeditions have information for multiple teams:

BC(05-06/08,5250m),ABC(15-16/08,5821m),C1(20/08,6248m); W-Ridge.C2(03,05/09,6700m),C3(08/09,6900m)xxx(10/09,7240m); N-Face-xxx(23/09,7315m) (see route notes)

In these cases, I kept only the first route.

Reconstructing Elevation Profiles

To reconstruct elevation profiles from the campsite field I implemented the following logic:

[STEP 1] — Split the campsite information by commas, provided the comma is not within parentheses. Here’s an example of how this split would work:

# Original string
orig_str = 'BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m),C3(06/12,7300m),C4(21/12,8000m),xxx(22/12,8700m)'

# Split by commas not in parentheses
splt_str = ['BC(10/11,5350m)', 'C1(01/12,6050m)', 'C2(03/12,6400m)', 'C3(06/12,7300m)', 'C4(21/12,8000m)', 'xxx(22/12,8700m)']
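One way to implement this split is a regular expression with a negative lookahead. This is a sketch rather than the exact code behind the article; SPLIT_PATT and split_campsites are hypothetical names, and the pattern assumes parentheses are never nested.

```python
import re

# A comma counts as a separator only if it is NOT followed by a ")" with no
# "(" or ")" in between, i.e. only if it sits outside a parenthesized group.
SPLIT_PATT = re.compile(r",(?![^()]*\))")


def split_campsites(campsites: str) -> list[str]:
    """Split a campsites string on commas that are outside parentheses."""
    return [part.strip() for part in SPLIT_PATT.split(campsites) if part.strip()]


orig_str = ('BC(10/11,5350m),C1(01/12,6050m),C2(03/12,6400m),'
            'C3(06/12,7300m),C4(21/12,8000m),xxx(22/12,8700m)')
splt_str = split_campsites(orig_str)
print(splt_str)
```

Entries without parentheses, such as a bare C1, come through as their own list items, so later steps can decide what to do with them.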

[STEP 2] — For each element in the list, try to recover the campsite name, date, and elevation. This can be done by matching the following regex patterns and saving them in a tuple:

date_patt = r'\b\d{2}/\d{2}\b'  # Match a string in the format of two digits, a forward slash, and another two digits, surrounded by word boundaries
elev_patt = r'\b\d{4}m\b' # Match a string that consists of exactly four digits followed by the letter 'm', and the whole pattern is expected to be a whole word
camp_patt = r'([^\(]*)\(' # Matches anything before an open parentheses

For example, extracting the campsite, date, and elevation information from splt_str above yields the following list of tuples:

# Create (DD/MM, elevation, campsite) tuples 
camp_tuples = [('10/11', 5350.0, 'BC'), ('01/12', 6050.0, 'C1'), ('03/12', 6400.0, 'C2'), ('06/12', 7300.0, 'C3'), ('21/12', 8000.0, 'C4'), ('22/12', 8700.0, 'xxx')]
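A sketch of how the three patterns could be applied to each entry. The helper name parse_entry is hypothetical, and falling back to the raw text as the campsite name when there is no opening parenthesis is an assumption, not something stated above.

```python
import re

date_patt = r"\b\d{2}/\d{2}\b"   # DD/MM
elev_patt = r"\b\d{4}m\b"        # four digits followed by "m"
camp_patt = r"([^\(]*)\("        # anything before an opening parenthesis


def parse_entry(entry):
    """Return a (date, elevation, campsite) tuple; missing parts become None."""
    date = re.search(date_patt, entry)
    elev = re.search(elev_patt, entry)
    camp = re.search(camp_patt, entry)
    return (
        date.group() if date else None,
        float(elev.group()[:-1]) if elev else None,   # strip the trailing "m"
        camp.group(1).strip() if camp else entry.strip(),  # assumed fallback
    )


splt_str = ['BC(10/11,5350m)', 'C1(01/12,6050m)', 'C2(03/12,6400m)',
            'C3(06/12,7300m)', 'C4(21/12,8000m)', 'xxx(22/12,8700m)']
camp_tuples = [parse_entry(entry) for entry in splt_str]
print(camp_tuples)
```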

[STEP 3] — Drop tuples where the date is missing (without a date, the elevation and campsite information is not sufficient for creating an elevation profile). Don’t worry if we accidentally remove a base camp tuple because of a missing date (we will add this back in Step 7).

[STEP 4] — If date is present in a tuple but elevation is missing, check if campsite name is present. If campsite name is present, get campsite elevation from other records.

[STEP 5] If any tuple still has missing elevation information, drop it. There’s no way we can recover this.

[STEP 6] — Drop tuples where the elevation exceeds the elevation in the highpoint field (this is a contradiction).
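Steps 3 to 6 could be sketched as follows. The function names are hypothetical, and using the median elevation per campsite name as the value taken "from other records" is an assumption; the text above does not say how that lookup was built.

```python
from statistics import median


def typical_elevations(all_tuples):
    """Median known elevation per campsite name across all records (assumed rule)."""
    by_camp = {}
    for tuples in all_tuples:
        for _date, elev, camp in tuples:
            if elev is not None and camp:
                by_camp.setdefault(camp, []).append(elev)
    return {camp: median(values) for camp, values in by_camp.items()}


def clean_tuples(tuples, highpoint, typical):
    """Apply steps 3-6 to one expedition's (date, elevation, campsite) tuples."""
    cleaned = []
    for date, elev, camp in tuples:
        if date is None:            # Step 3: no date, no waypoint
            continue
        if elev is None:            # Step 4: borrow elevation from other records
            elev = typical.get(camp)
        if elev is None:            # Step 5: still unknown, drop it
            continue
        if elev > highpoint:        # Step 6: contradicts the highpoint field
            continue
        cleaned.append((date, elev, camp))
    return cleaned


other_records = [
    [('10/11', 5350.0, 'BC'), ('01/12', 6050.0, 'C1')],
    [('08/12', 5400.0, 'BC'), ('10/12', 6000.0, 'C1')],
]
typical = typical_elevations(other_records)

raw = [('08/03', 5100.0, 'BC'), (None, None, 'C1'), ('10/03', None, 'C1'),
       ('12/03', None, 'C9'), ('20/05', 9000.0, 'xxx')]
cleaned = clean_tuples(raw, highpoint=8600, typical=typical)
print(cleaned)
```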

[STEP 7] — By this point all tuples in a list will have a date and an elevation, and we can assume most lists of tuples look like this (note that I removed campsite name):

# Standard campsite list
[('10/11', 5350.0), ('01/12', 6050.0), ('03/12', 6400.0), ('06/12', 7300.0), ('21/12', 8000.0), ('22/12', 8700.0)]

Some lists, however, will be empty. Fortunately, we can infer two waypoints for any expedition:

  • Base camp’s elevation and date (from the base camp elevation listed in other records and from the bcdate field).
  • The turnaround point’s elevation and date (from the highpoint and smtdate fields).

Appending the base camp elevation and date to the start of each list, and the turnaround elevation and date to the end, we guarantee that each record has at least two waypoints: start and end.
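A small sketch of this step, assuming the dates have already been expanded to full YYYY/MM/DD strings. The name add_endpoints is hypothetical, and the 5360 m base camp elevation is taken from the output shown later in this article rather than derived here.

```python
def add_endpoints(waypoints, bc_date, bc_elev, smt_date, highpoint):
    """Prepend the base camp waypoint and append the turnaround waypoint."""
    start = (bc_date, float(bc_elev))
    end = (smt_date, float(highpoint))
    return [start] + list(waypoints) + [end]


# Even an expedition with no usable campsite entries gets two waypoints.
profile = add_endpoints([], "1988/11/10", 5360.0, "1988/12/22", 8700)
print(profile)
```

Adding the endpoints unconditionally can duplicate an existing base camp or turnaround entry, which is why duplicates are removed in the next step.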

[STEP 8] — Drop tuples where elevation is lower than the elevation of a previous tuple (these were mostly edge cases where the regex pattern wasn’t general enough to extract the data correctly, but also occurred due to issues with the input data such as listing an elevation higher than the height of Everest). Make sure to also drop duplicate tuples (same date and elevation) if any.
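One possible reading of this step, as a sketch: walk through the waypoints in order and keep a waypoint only if it is not lower than the last one kept and is not an exact duplicate. The name enforce_non_decreasing is hypothetical.

```python
def enforce_non_decreasing(waypoints):
    """Drop waypoints lower than an earlier kept one, and exact duplicates."""
    kept = []
    for date, elev in waypoints:
        if kept and elev < kept[-1][1]:
            continue  # lower than a previous waypoint
        if (date, elev) in kept:
            continue  # same date and elevation already present
        kept.append((date, elev))
    return kept


waypoints = [
    ("1988/11/10", 5360.0),
    ("1988/11/10", 5360.0),   # duplicate
    ("1988/12/01", 6050.0),
    ("1988/12/03", 6000.0),   # lower than the previous waypoint
    ("1988/12/22", 8700.0),
]
print(enforce_non_decreasing(waypoints))
```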

Steps 1–8 produce a final elevation profile (i.e., a sequence of elevation waypoints) for each remaining expedition (note that I converted the dates to YYYY/MM/DD format using the information in the year column):

            expid                                        camp_tuples
0  EVER88401-1988  [(1988/11/10, 5360.0), (1988/12/01, 6050.0), (...
1  EVER88402-1988  [(1988/12/08, 5360.0), (1988/12/10, 6000.0), (...
2  EVER89306-1989  [(1989/08/21, 5360.0), (1989/08/25, 6400.0), (...
3  EVER89310-1989  [(1989/09/01, 5360.0), (1989/09/08, 5600.0), (...
4  EVER89305-1989  [(1989/08/17, 5360.0), (1989/08/27, 6450.0), (...
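The date conversion mentioned above might look like the sketch below. The helper to_full_dates is hypothetical, and the rollover rule (bump the year whenever the month goes backwards) is an assumption that only holds if the entries are listed chronologically, as in winter expeditions that run from December into January.

```python
from datetime import date


def to_full_dates(ddmm_list, year):
    """Expand DD/MM strings to YYYY/MM/DD, rolling into the next year if needed."""
    full_dates = []
    current_year = year
    prev_month = None
    for ddmm in ddmm_list:
        day, month = (int(part) for part in ddmm.split("/"))
        if prev_month is not None and month < prev_month:
            current_year += 1  # assumed: entries are in chronological order
        full_dates.append(date(current_year, month, day).strftime("%Y/%m/%d"))
        prev_month = month
    return full_dates


print(to_full_dates(["08/12", "10/12", "10/01"], 1988))
```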

Plotting The Waypoints

The waypoint information (campsite date/elevation tuples) for each expedition can be transformed as follows:


            expid        date       y   x
0  EVER00101-2000  2000-04-07  5360.0   0
1  EVER00101-2000  2000-04-23  6450.0  16
2  EVER00101-2000  2000-05-01  7200.0  24
3  EVER00101-2000  2000-05-24  8849.0  47

where y is the elevation in meters and x is the number of days since the start of the expedition. After interpolating the value of y using a monotonic cubic spline (PchipInterpolator to smooth the data while preserving monotonicity) and plotting with matplotlib we get the following image:

The yellow lines are expeditions I wanted to highlight and the red x’s were added to help align text in Illustrator during a later step.
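The transformation and interpolation described above can be sketched with NumPy and SciPy as follows, using the four waypoints from the table. The number of interpolation points (200) is an arbitrary choice for illustration, and the plotting call is left out.

```python
from datetime import date

import numpy as np
from scipy.interpolate import PchipInterpolator

# Waypoints for one expedition: (date, elevation in meters).
waypoints = [
    (date(2000, 4, 7), 5360.0),
    (date(2000, 4, 23), 6450.0),
    (date(2000, 5, 1), 7200.0),
    (date(2000, 5, 24), 8849.0),
]

# x = days since the first waypoint, y = elevation.
start = waypoints[0][0]
x = np.array([(d - start).days for d, _ in waypoints], dtype=float)
y = np.array([elev for _, elev in waypoints])

# PCHIP passes through every waypoint and does not overshoot between them,
# so a non-decreasing profile stays non-decreasing after interpolation.
pchip = PchipInterpolator(x, y)
x_dense = np.linspace(x[0], x[-1], 200)
y_dense = pchip(x_dense)
print(x, y_dense[:3])
```

PchipInterpolator needs strictly increasing x values, so two waypoints recorded on the same day would have to be merged or dropped before this step. Each expedition's (x_dense, y_dense) pair can then be drawn as one line with matplotlib.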

I naturally wondered what on earth these extreme points were: the one on the left (2 days to summit? really?) and the one on the right (over 90 days?). It turns out the point on the left is Nirmal Purja’s summit from “Project Possible” (yes, the guy from the “14 Peaks: Nothing Is Impossible” documentary). The point on the right was a grueling expedition that ended in frostbite.

I decided to truncate the data to 80 days to make the bulk of the data easier to visualize. The final plot looked something like this:

The yellow profiles correspond to Nirmal Purja’s ascent, the first ascent, the first female ascent, and the longest expedition (to the turnaround point) under 90 days (I looked at the achievment field in the database, which is how the column name is spelled, to find the expedition IDs of the “first ascent” expeditions).

I was a bit bothered by the “kinks” in the second to last waypoint for most expeditions. I decided I wanted to smooth this out a bit more even if it did mean compromising on the exact elevation value (I was more interested in viewing trends, not exact values). Instead of doing this smoothing in Python directly (which would likely involve increasing the interpolation resolution and/or removing the problematic waypoint after interpolation), I decided to take care of this in Illustrator (where curve smoothing is straightforward).

Edit In Illustrator & Photoshop

After saving the previous plot as an SVG, you can open it in Adobe Illustrator, select all paths, and use the Object > Path > Smooth slider to smooth them:

Left: Original paths. Right: Smoothed paths. Smoothing in Illustrator is quick but smoothed paths no longer follow waypoints exactly.

In Illustrator you can select paths with similar properties (such as line thickness and color) and adjust them in the same way. In this case, I selected all lines and increased their transparency (the elevation profiles create a hairlike texture after increasing transparency). I also added some annotations and removed the red x’s once I no longer needed them (their positions were calculated in Python and used to align the “80% of ascents above 8500m…” text):

Adjust colors & opacity, add text, create layout.

As a final touch I exported the image as PNG and added a paper texture in Photoshop:

Add paper texture by overlaying a lightened paper image in multiply mode (you can’t see it here, but there is a paper texture in the final image shared at the start of this article).

That’s it!

Final Thoughts

Creating this visualization was relatively straightforward: both data processing and plotting were done in Python with the layout designed in Illustrator and final touches (texture) added in Photoshop. This sort of approach works well here because I wanted to create a static visualization. If you need a dynamic plot or have a more complex visualization, using D3.js to plot the data (instead of simply doing this directly through matplotlib in Python) might be more appropriate.


Visualizing Everest Expeditions was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
