Edit: I’ve since found a few sites that have done something similar to what I’ve done here, so check out:
So I found myself in a particular position: I was trying out this new website, Kaggle, and trying to learn a bit of dataframe coding. That is basically the whole idea of taking giant datasets and extracting, transforming and analysing them. It was going well until I stopped caring about the problems. We started exploring wine and it was putting me to sleep.
How to make this boring work more interesting? Well, I started looking into other datasets. Which could I actually use and find fun? I happened on a very interesting one from 8a.nu. Some guy had basically scraped the entire website, which got them to file a DMCA against him and threaten to sue. He had collected half a gig of data, which he made freely available to everyone! Then someone else pointed out that he had scraped the names of every climber, and he had to anonymize the database.
However, he did not delete it, which is the most important part!
So what I ended up doing below was to explore the data using Python, Pandas and some neato graphing tools. Shout-outs to matplotlib and seaborn.
Well, first I needed to know: Who uses this site? Is it truly those good for nothing Euros? Well, the data is frightening!
Well, it is to be expected that a much larger country such as the USA would have more users. The Spanish do have good representation for their size (about 1/8 the size of the USA but 1/2 the user base), but this isn’t a European site per se. I am surprised, in a way. I always thought that this site was largely Euro driven, but it makes sense now given that the language is English.
Next: is it a male-led site? The results will SHOCK you (they won’t).
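For anyone wanting to reproduce the per-country count, here is a minimal sketch in pandas. The `users` table and its `country` column are stand-ins I made up for illustration; the real dataset may name things differently.

```python
import pandas as pd

# Hypothetical stand-in for the user table; real column names may differ.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "country": ["USA", "ESP", "USA", "NOR", "USA", "ESP"],
})

# Count users per country, largest group first.
users_per_country = users["country"].value_counts()
print(users_per_country)
```

The same `value_counts()` call on a sex column would give the male/female split.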
About 52,000 males to 10,000 females. I think this is still a win considering how climbing for grades can sometimes be a very male-centered or dominated sport. However I was very proud to see the ladies showing up (and looking at their stats as well). 10,000 women means about that many datapoints to start with for later visualization.
Outside of this, some other cool data is that Boulder, Colorado is the 4th biggest climbing city on this website and Madrid is extremely dense in terms of climbers (top spot by far, not counting that it appears twice in the top 20 due to a capitalization issue). I was surprised to see Oslo and Stockholm show up quite high as well. Seattle was another big climbing city, believe it or not.
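The duplicate Madrid entries could probably be merged by normalizing the city strings before counting. A rough sketch, assuming a free-text `city` column (a hypothetical name) with inconsistent casing and stray whitespace:

```python
import pandas as pd

# Hypothetical city column with the kind of casing variants described above.
users = pd.DataFrame({
    "city": ["Madrid", "madrid", "MADRID ", "Oslo", "Boulder", None],
})

# Trim whitespace and force one casing so the variants collapse into one label.
users["city_clean"] = users["city"].str.strip().str.title()

# Missing cities are skipped by value_counts() by default.
city_counts = users["city_clean"].value_counts()
print(city_counts)
```

This only fixes casing and whitespace; different spellings of the same city would still need a manual mapping.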
More data visualization, trying weird shit
So next I thought hey, can I maybe find some sort of regression and plot BMI vs. Grade?
Woof. This sucks. I didn’t really filter anything and just thought the data would lead me to the conclusion. Visualization doesn’t really work like that. And this regression is dog-shit. What does this mean? At a BMI of 10 you’ll climb a grade of 60 (I’ll explain what this means in a second here)?
So I had to start being a bit more intelligent and I started by filtering for only higher grades and using a swarm plot.
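Roughly, the filtering step could look like the sketch below. The table, the column names, and the cutoff of 60 are all assumptions for illustration, and treating a zero height or weight as "missing" is my guess at how blanks are stored.

```python
import pandas as pd

# Hypothetical per-user rows: one hardest grade each, plus reported height and weight.
climbers = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "height_cm": [180, 165, 0, 175],
    "weight_kg": [70, 55, 60, 68],
    "hardest_grade_id": [64, 62, 66, 49],
})

# Drop rows with a missing (zero) height or weight before dividing.
valid = climbers[(climbers["height_cm"] > 0) & (climbers["weight_kg"] > 0)].copy()

# BMI = weight in kg divided by height in metres, squared.
valid["bmi"] = valid["weight_kg"] / (valid["height_cm"] / 100) ** 2

# Keep only the higher grades; the cutoff of 60 is an arbitrary example.
MIN_GRADE_ID = 60
hard_senders = valid[valid["hardest_grade_id"] >= MIN_GRADE_ID]
print(hard_senders)
```

A frame like `hard_senders` is the sort of thing that can be handed to seaborn’s `swarmplot` with the grade on one axis and `bmi` on the other.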
So this data is the hardest sport grade someone has done (one grade per user) vs. their BMI (calculated from reported height and weight). This is an interesting graph because swarm plots basically try to show density. The grades are the internal database grades that were used to relate YDS to Euro to bouldering grades. Here’s what they mean:
Which turns out to also be a nice conversion table, in case anyone wants to use it. Once I saw the swarm acting like some sort of data distribution it was obvious. Time to start being intelligent and use the dreaded box plot.
Beautiful. Exactly what I wanted to know. People tend to be slightly skinnier at higher grades (the top grades don’t have too many people and so I am not convinced anything after 8c+ is worth digging into). Moreover, much of the top and bottom drops out in terms of BMI. Extremely skinny people put on muscle and extremely fat people lose weight?
Box plots are great as they show where the 25/50/75 quartiles are and how they relate to the 0 and 100. They also flag outliers, of which there are a few in this case. Do I believe that someone with a BMI of 35 sent an 8b+? Not really, but maybe!
No surprise in this graph of women’s BMI vs. grades. There are 2000 lady sends in here, and on average the ladies are svelter than the men. However, they are still within the healthy range and none of the ladies are extremely underweight, it seems. 75% of ladies tend to be in the 19-21 range here.
Here’s age. This is a really weird one. First off, because no one uses the split grade of 14d and 15a, there isn’t much in the way of number 73 here (I should have run the nonzero filter on this one). But what’s surprising is that there seem to be lots of 30-year-old crushers out there, and it seems like even into the late 30s people are climbing extremely hard. Between 14a and 14c, however, there’s definitely a move towards younger. However, I think you can probably try hard at any age!
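To make the quartile and outlier idea concrete, here is a small sketch of the numbers a box plot is built from. The BMI values are made up, and the 1.5 × IQR cutoff is the usual default convention (matplotlib’s `whis=1.5`), not something specific to this dataset.

```python
import pandas as pd

# Made-up BMI values for a single grade, with one suspicious entry.
bmi = pd.Series([19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 35.0])

# The box spans the 25th to 75th percentile, with a line at the median.
q1, median, q3 = bmi.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bmi[(bmi < lower_fence) | (bmi > upper_fence)]

print(q1, median, q3)
print(outliers.tolist())
```

With that convention the whiskers stop at the most extreme points that are not outliers, so they only reach the true 0 and 100 when nothing gets flagged.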
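One way to keep a nearly empty grade like 73 from distorting the plot would be to drop grades with too few sends before plotting. This is only a sketch of that idea, with made-up rows, hypothetical column names, and an arbitrary minimum count:

```python
import pandas as pd

# Made-up rows: one hardest send per user, with the climber's age.
sends = pd.DataFrame({
    "grade_id": [62, 62, 62, 64, 64, 73, 75, 75],
    "age": [31, 27, 35, 38, 41, 24, 22, 26],
})

# Arbitrary threshold: a grade needs at least this many sends to be plotted.
MIN_SENDS = 2

# Attach each row's per-grade sample size, then keep the well-populated grades.
sends_per_grade = sends.groupby("grade_id")["age"].transform("count")
filtered = sends[sends_per_grade >= MIN_SENDS]
print(filtered)
```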
(why is 64=13c such an old guy grade though?)
I thought this data may be screaming, “hey, check age distribution!” and so I did.
Yep, just as predicted. These ages are overrepresented. Well, even in the general population it was easy to see:
Weirdly, when I looked at the ages of all users through all grades, they seemed to pile up more on the younger side than the older side. This seems to make some sort of sense to me as it takes a long time to develop the physical characteristics to climb hard. The body is so interesting and climbing is a weirdly mental and neurological sport. Also thirty year olds are bored at work more often and also need to log data as they become middle managers.
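For a quick look at an age distribution without drawing anything, the ages can be binned into decades and counted. The ages below are invented purely to show the mechanics:

```python
import pandas as pd

# Invented ages, only to show how the binning works.
ages = pd.Series([17, 22, 25, 28, 31, 33, 35, 38, 44, 52])

# Decade-wide bins, closed on the left: [10, 20), [20, 30), and so on.
bins = [10, 20, 30, 40, 50, 60]
age_groups = pd.cut(ages, bins=bins, right=False)

# Count per bin in bin order, then convert to a share of all users.
dist = age_groups.value_counts(sort=False)
share = dist / dist.sum()
print(share)
```

Comparing `share` for the hard senders against `share` for all users is one way to check whether an age group is overrepresented.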
Finally, I needed to know: are tall guys at an advantage?
Yeah, not really. Instead it seems to trend a little bit down when it comes to higher grades. Shorties are probably more able to maintain lower weight, perhaps lessening injury risk? Tall people, are you injured? Let me know!
What did I learn with all this?
So some good stuff came out of this whole affair:
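A trend like that can be checked numerically with the median height per grade plus a plain correlation. The numbers below are toy values chosen to show the mechanics, not results from the dataset, and the column names are hypothetical:

```python
import pandas as pd

# Toy values only; these are not taken from the real data.
climbers = pd.DataFrame({
    "hardest_grade_id": [62, 62, 64, 64, 66, 66],
    "height_cm": [180, 176, 178, 174, 175, 171],
})

# Median height for each grade, ordered by grade.
median_height = climbers.groupby("hardest_grade_id")["height_cm"].median()

# Pearson correlation between grade and height; negative means shorter at harder grades.
r = climbers["hardest_grade_id"].corr(climbers["height_cm"])

print(median_height)
print(round(r, 2))
```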
- I think I know what ideal BMI looks like now, around 21 for most climbers
- I think shorties will always be able to send the hard grades
- Women tend to weigh even less than men, being around 19-20 average for the higher grades
- Age seems irrelevant to sending until about 40; maybe even right now the best climbers are in their thirties
- 8a is not only for Euros
- None of this correlation actually has causation built into it
Interesting stuff. If there are any other things people think of here, I’d love to know what to look into. Here I mostly used a script that scrubbed all the data down to the hardest grade per user, and found this. But who knows? Maybe there is actual value in individual ticks (of which there are many thousands more than there are users).
What is not interesting, though, is that there isn’t much more big data available to us in climbing. One thing I’d love to know is whether Ben Moon is sitting on a ton of data with his Moonboard. If so, that would be excellent to look over.
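The hardest-grade-per-user step boils down to a groupby and a max. This is a sketch rather than the actual script, with a hypothetical `ascents` table, and it assumes a bigger grade id means a harder grade:

```python
import pandas as pd

# Hypothetical tick list: one row per logged ascent.
ascents = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3],
    "grade_id": [53, 62, 49, 64, 60, 55],
})

# Assumes a larger grade_id is a harder grade; keep each user's maximum.
hardest = (
    ascents.groupby("user_id")["grade_id"]
    .max()
    .rename("hardest_grade_id")
    .reset_index()
)
print(hardest)
```

Merging `hardest` back onto the user table on `user_id` gives one row per climber with their height, weight, age and top grade.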
Anyhow, keep being about a 20-22 BMI!