How to find information in the stuff

November 12, 2024

30in30, women-in-tech

This post is part of my 30in30 challenge, where I write 30 minutes every day for 30 working days. Due to my limited time for this challenge, the content will be only very lightly researched and edited. The idea is to just write. Find my voice, and find the courage to publish. To follow my curiosity wherever it may take me.


Have you ever wondered how successful people made it in the field of technology? What was their background, their journey, their story?

I think about this a lot, and it's often one of the first questions I ask: what is your background? How did you get into tech? How did you get to where you are now, and what can I learn from you?

I realize this sounds very selfish, but I won't lie. I do want to learn, and what better way to learn than from the people who are already where I want to be?

I find people's stories fascinating. I used to worry that my background in psychology was pointless and that I had wasted years on a university education I'm not even using (spoiler: I didn't need to worry), but the more I learn about other people, the more I see that the best of them also had a very non-linear journey in life.

Karen Spärck Jones was one of these people.

Born in 1935, she studied at Cambridge University, UK. Her bachelor's degree was in history, and her PhD was in philosophy. Karen joined the Cambridge Language Research Unit and became fascinated by natural language processing and information retrieval. She aimed to get computers to understand people rather than teach people how to talk to computers.

She came up with inverse document frequency, which measures how much information a word provides based on how common or rare it is across all documents: the rarer the word, the more it tells you about a document that contains it. By combining statistics with linguistics, she came up with formulas for computers to interpret relationships between words. To this day, her work underpins how most modern search engines rank results by their relevance to the search query.
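To make the idea a bit more concrete, here is a minimal sketch in Python. It uses the common textbook form of inverse document frequency, log(N / n), where N is the number of documents and n is the number of documents containing the word. This is not necessarily the exact formula from Karen's original paper, and the function name, the toy documents, and the choice to give unseen words a weight of 0 are all my own assumptions for illustration.

```python
import math


def inverse_document_frequency(term, documents):
    """Textbook IDF: log(N / n), where N is the number of documents
    and n is the number of documents that contain the term."""
    # Count documents containing the term (naive whitespace tokenization).
    containing = sum(1 for doc in documents if term in doc.lower().split())
    if containing == 0:
        # Assumption for this sketch: a word that never appears gets weight 0.
        return 0.0
    return math.log(len(documents) / containing)


# Hypothetical toy collection, just for illustration.
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]

for term in ("the", "cat", "bird"):
    print(term, round(inverse_document_frequency(term, documents), 3))
```

In this toy collection, "the" appears in every document and scores 0, while "bird" appears in only one and scores highest. Common words carry little information; rare words carry a lot.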

During her research, the internet was in the early stages of public adoption. It is estimated that in the 90s, the internet consisted of 23,500 websites with about 300 terabytes of data.

This is what Karen said during a lecture in 1994:

"That stuff (information) is absolutely flooding on to the internet. So the question is - how to find the information in the stuff. You can easily find something. But finding what you really need is much more difficult."

Dare to guess how much information is on the internet today?

One estimate is that it is about 120 zettabytes (ZB) of data. That is about 120,000,000,000,000 gigabytes. That's a whole lot of bytes, if you ask me!
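If you want to check the zeros yourself, here is a quick sketch of the arithmetic in Python. It assumes decimal (SI) units, so 1 ZB is 10^21 bytes, and it takes the 120 ZB and 300 TB estimates from this post at face value. The variable names are my own.

```python
# Decimal (SI) byte units; an assumption, since binary units would differ slightly.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

today_bytes = 120 * ZETTABYTE    # estimate for today, from this post
nineties_bytes = 300 * TERABYTE  # estimate for the 90s, from this post

today_gigabytes = today_bytes // GIGABYTE
growth_factor = today_bytes // nineties_bytes

print(f"{today_gigabytes:,} GB")
print(f"{growth_factor:,} times more data than in the 90s")
```

By these estimates, there is roughly 400 million times more stuff to search through now than when Karen gave her lecture.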

The stuff is really flooding on to the internet now.

Not only was Karen the mind behind inverse document frequency, she was also a big advocate for the involvement of women in these fields. She believed in the importance of diversity in tech, particularly in computing, to drive innovation and more inclusive technology. As one of the very few women in her field in her time, Karen saw the critical need for women's involvement in developing technology that affects all of society.

In fact, her slogan is slowly becoming my own personal motto:

"I think it's very important to get more women into computing. Computing is too important to be left to men (alone)."

Hear, hear, Karen. I couldn't have said it better.