Friday, June 18, 2021

I'm Not Saying I'm a Trendsetter...

...But my parents sure were!

    My name is Treyson (sounds like Jason), and in this life I have met only 2 other people with that name. People butcher the pronunciation so frequently that I pretty much go by Trey-Sahn at this point. There are folks I have known professionally for years who have never gotten it right.

    A few years ago, for a project, I was working with this Census dataset, which lists how many people were born in any year with any name. To show up on the list for a given year, a name has to have at least 5 occurrences. Everyone who knows me knows I am going to say that the absolute best way to go through this is with Alteryx, so I am going to get to the point now, and then later I will show you how I did the analysis.

According to this dataset, the very first year that there were at least 5 people born with the name Treyson was 1990. I was born in 1989.


If you look at the chart below (also made in Alteryx), you will see that the US hit peak Treysondom during the time I was at university (2008-2011) and we have had a slow decline since.

    So that's a fun thing. Data is great and you can learn a lot. What else do you need from me? Okay, now we are going to look at the technique and why Alteryx is so good for getting quick insights from a dataset like this one (100+ txt files).

    This is a high-level overview. You can get a copy of my workflow here. Alteryx has the Directory tool, which lets you grab a list of all files in a directory. So once you have downloaded and unzipped the data from the census site above, you just have to point that tool at the newly unzipped folder containing the text files. Dynamic Input, which is my favorite tool, lets you query a bunch of files all at once, as long as they have the same structure. All of these files have the same 3 fields (name, gender, and count), which makes it really easy. I also bring in the file name so that we can grab the year each record comes from. Then we clean up some data points: creating dates from text, changing field types, etc. And then we graph. You can download the workflow and follow along, and even take a look at the popularity of your name or your kids' names. This took about 5 minutes to create and 10 seconds to run. Super fun stuff.
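
    If you don't have Alteryx handy, the same steps (list the files, read them all with one shared structure, pull the year out of the file name, then total things up) can be roughed out in plain Python. This is just a minimal sketch, not a translation of my workflow. It assumes each file is comma-separated with no header row, and that each file name contains a four-digit year; the function names are made up for illustration.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path


def load_name_counts(folder, target_name):
    """Return {year: total count} for target_name across every .txt file in folder."""
    totals = defaultdict(int)
    # Roughly the Directory tool step: list every text file in the folder.
    for path in sorted(Path(folder).glob("*.txt")):
        # Assumption: the file name contains a four-digit year.
        match = re.search(r"\d{4}", path.stem)
        if match is None:
            continue  # skip files we cannot date
        year = int(match.group())
        # Roughly the Dynamic Input step: every file has the same 3 fields.
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                if len(row) != 3:
                    continue  # skip blank or malformed lines
                name, _gender, count = row
                if name == target_name:
                    # The count arrives as text, so convert it before adding.
                    totals[year] += int(count)
    return dict(totals)


def first_year(counts):
    """Earliest year in which the name appears, or None if it never does."""
    return min(counts) if counts else None


def peak_year(counts):
    """Year with the highest total count, or None if there is no data."""
    return max(counts, key=counts.get) if counts else None
```

    Point `load_name_counts` at the unzipped folder along with a name, then pass the result to `first_year` and `peak_year`. The counts are summed across the gender field so that each year ends up with a single number per name, which is the shape you need for a chart.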


