By Charlotte Weber
It is summer, and in a PhD student’s life this means only one thing – no, not holidays – but time to go to a summer school! That sounds fun, doesn’t it? But what is a summer school, and why should I go to one?
First of all, summer schools usually provide intense courses running over a week or two with some hands-on experience. Second, summer schools are generally open to everyone, so you don’t have to be enrolled at the host university, which makes it a great networking opportunity because the participants often come from all over the world.
So during my summer, I went to the University of Oslo, where I attended the “Oslo Summer School in Comparative Social Science Studies 2017” and took this year’s course in “Collecting and Analyzing Big Data” for a full week.
The course was mostly methodological, meaning we were taught how to use the programming language Python to analyze data, and Big Data in particular. In our case, Big Data simply meant loads of data. For example, imagine an Excel file where it would take you something like an hour to scroll to the bottom. You don’t want to extract or transform this type of data manually, right? Not only would you mess up your data and introduce errors by copying, pasting, and cutting, but you might also go a little nuts trying to find the exact piece of information you are looking for. It just wouldn’t be cool (or fun) to go in there and fiddle with the data if there is just too much of it. Yet, if you need to analyze that type of thing, there are smart and cool ways to do so – one of them being Python! Python lets you write some code so that all the data extraction, manipulation, and analysis can be done without you actually having to mess around in the file itself. You can simply apply your commands to the whole file or to selected rows and columns, extract only the data with certain properties, or do almost anything else you can imagine. It’s so wonderful, I’m still in awe!
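To give a flavour of “extract only the data with certain properties”, here is a minimal sketch using only Python’s built-in `csv` module. It is an illustration under assumptions, not the course material: the sample data, the column names, and the helper functions `filter_rows` and `column_mean` are all made up for this example.

```python
import csv
import io

# Hypothetical sample data standing in for a much larger file.
SAMPLE_CSV = """country,year,population_millions
Norway,2017,5.3
Germany,2017,82.7
Norway,2016,5.2
Sweden,2017,10.1
"""


def filter_rows(csv_file, column, value):
    """Return every row (as a dict) whose `column` equals `value`."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row[column] == value]


def column_mean(rows, column):
    """Average a numeric column over the given rows."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)


# io.StringIO lets the string behave like an open file;
# with a real file you would pass open("my_data.csv", newline="") instead.
rows_2017 = filter_rows(io.StringIO(SAMPLE_CSV), "year", "2017")
print(len(rows_2017), "rows from 2017")
print("mean population:", round(column_mean(rows_2017, "population_millions"), 2))
```

The same few lines work whether the file has four rows or four million, which is exactly the point: the code does the scrolling for you.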
Besides data analysis, we also got to learn how to scrape the web. Just think of the internet and the amount of information available, just floating out there, all the sites, all the data! Collecting that kind of information on a larger scale by hand, however, would take you a very, very long time. Your hand would probably fall off before you managed to copy and paste all the information you might want or need. So here again, you want to automate some of these processes by writing some pretty code and scraping the info with Python. You just run your code and – whoop – all the info is there, saved and stored on your computer! Easy as that! Beautiful!
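As a rough idea of what scraping code can look like, the sketch below pulls all link addresses out of a piece of HTML using Python’s built-in `html.parser` module. Everything here is an assumption for illustration: the sample page and the names `LinkCollector` and `extract_links` are invented, and the course may well have used other tools.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.org/data.csv">Data</a>
  <p>No link here</p>
  <a href="/about">About</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

For a live website you would first fetch the page, for instance with `urllib.request.urlopen`, and then feed its text to the same parser. It is also worth checking a site’s terms of use before scraping it.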
So, after all, what did I take from this summer school? As so often, it is not only the things I was taught but also the people I met and the experiences I had. It felt great to let my inner nerd out and find like-minded people around me who wanted to spend lunch breaks in front of screens trying to ‘crack a problem’ or ‘finally scrape this website’. Also, it is always such a pleasure to meet PhDs from all over the place, with many stories to tell and PhD experiences to share.
Maybe the most fascinating thing for me was that in this course, we were all interested in learning the same thing. No matter the discipline we came from, we still wanted to learn the same method. This is something I hardly see, even though I work in a very multi-/inter-disciplinary environment. One of the bigger struggles working with other disciplines is often the methods and the different ways to tackle a problem. But at the summer school, the anthropologists, economists, sociologists, and biologists all sat in the same room and everyone was writing the same code, using the same program, to do the same thing: collect and analyze data.
“The world is one big data problem.” – Andrew McAfee, MIT scientist
That made me realize at the end of the week that data is really what connects us. We might have our own methods within each of our disciplines, but the amount of data is growing every day and will probably keep increasing in the coming years. So while we need to try to stay up to date to tackle tomorrow’s (data) problems, it is great to know that we can, after all, bond over data!