CodeCopyCoffee

Category: Code, Educational

Bash Random Word Generator

Today was my first day at a new job, and as such I was setting new passwords, which triggered a memory. At my previous job we used to have a “master password” that was used for shared test accounts, which was a certain number of random words separated by spaces (they no longer have shared accounts or master passwords over there – take off your hacker pants and relax, people). I really enjoyed those passwords, so I decided to make my own random word generator. NOW LISTEN. I didn’t intend to use Bash, necessarily, but when I searched for a tutorial to follow that’s what popped up. More specifically, this LinuxConfig tutorial. I will admit, it lost me at “Create a new script and paste the following code into it”. Sure, the post describes, high-level, what each step of the script is doing. HOWEVER. If you’re familiar with my work, you know I appreciate a tutorial that holds your hand and explains everything in great detail so that folks at every level can walk away with some transferable skills and not just the thing they came for.

So today we are going to do two things:
1. Walk through the LinuxConfig tutorial and figure out exactly what’s going on
2. Walk through the less snazzy beginner-friendly version I came up with on my own

LinuxConfig Bash Random Word Generator – Explained

Let’s start with the code:

The first thing this script does on lines 6 through 12 is check whether exactly 1 argument has been given. If not, it prompts the user to specify the number of words they want to generate and rerun the script with that number as an argument. The 1>&2 at the end of each line redirects the echoed statement from stdout to stderr, basically making those statements error messages instead of standard output. The script then exits with exit code 0 so that the user can rerun the script with an argument as directed.
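To see the stdout-versus-stderr distinction in action, here is a minimal sketch of that kind of argument check. The function name require_one_arg and the message wording are my own placeholders, not copied from the LinuxConfig script, and unlike that script this version returns a non-zero status so the failure case is easy to detect.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only (not the LinuxConfig code).
require_one_arg() {
  # $# holds the number of arguments that were passed in
  if [ "$#" -ne 1 ]; then
    # 1>&2 sends echo's output to stderr instead of stdout
    echo "Please specify how many random words you would like to generate" 1>&2
    echo "example: ./random-word-generator 3" 1>&2
    return 1
  fi
}
```

Calling require_one_arg 2>/dev/null with no arguments prints nothing at all, because both messages travel on stderr and stderr has been thrown away.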

Next, three variables are established on lines 15, 16, and 19:
[Image: bash variables]

The comment # Constants is confusing because one of them is changing. X is going to be used as a counter in the while loop that we’ll look at next. ALL_NON_RANDOM_WORDS stores the path to a file that contains a list of words (you probably have this file already if you’re on a Mac or on most Linux distributions). The non_random_words variable uses wc -l, a standard command-line utility (not a Bash built-in), to count the number of lines in the words file. Each word is on its own line in this file, so non_random_words ends up having a value around 235,886.
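If you want to try the line-counting step on its own, here is a small sketch. The function name count_lines is hypothetical, and reading the file through < is my choice (it keeps the file name out of wc's output), not necessarily what the tutorial does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of lines in a file.
count_lines() {
  local file="$1"
  local n
  # Redirecting the file into wc means only the number is printed
  n=$(wc -l < "$file")
  # Some versions of wc pad the number with spaces; arithmetic expansion trims them
  echo $(( n ))
}
```

On a machine that has the file, count_lines /usr/share/dict/words should print the same word count the script stores in non_random_words.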

Okay, time for the while loop:
[Image: bash while loop]

Let’s talk about the easy part first. $X was set to 0 in the last step, and $1 is the argument supplied by the user as to how many random words they want. On line 28, X is incremented every time the loop runs. The loop will generate one random word each time it runs, so it will execute until it has output the number of words the user requested.
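Stripped of the random-word logic, the counter pattern looks roughly like this. It is a sketch with made-up names (print_passes, wanted, x) standing in for the script's $1 and X.

```shell
#!/usr/bin/env bash
# Hypothetical demo of a counter-driven while loop.
print_passes() {
  local wanted="$1"   # plays the role of $1 in the script
  local x=0           # plays the role of X
  # -lt means "less than", so the loop stops once x reaches wanted
  while [ "$x" -lt "$wanted" ]; do
    echo "pass number $x"
    x=$(( x + 1 ))
  done
}

print_passes 3
```

This prints three lines, numbered 0 through 2, which is why starting the counter at 0 and testing with "less than" gives exactly the requested number of passes.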

A random number is being generated on lines 25 and 26. It’s what I like to call “big ugly math”, and I suck at pretty little math, but I’m going to do my best to break it down because that is how much I care about you. On Linux and other Unix-like systems, /dev/urandom (and /dev/random) are special files that act as random number generators. The od command on line 25 stands for octal dump and is used to dump files in octal and other formats (such as hex, decimal… basically, different ways of representing numerical data). Next we have some option flags specifying how we want to format the data. -N3 limits the dump to the first 3 bytes, -An specifies that output should have no byte offset column, and -i displays results as a decimal integer. For more info about od and to see some examples of how the flags affect the output of the data, see this GeeksForGeeks post and/or this LinuxHint post.
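Here is a small sketch of the od step by itself, using the same three flags. The wrapper name random_24bit is mine, and the stated range assumes a little-endian machine (which covers typical x86 and ARM laptops), where 3 bytes can hold values from 0 to 16,777,215.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the od call described above.
random_24bit() {
  local raw
  # -N3: read only 3 bytes; -An: no offset column; -i: print as a decimal integer
  raw=$(od -N3 -An -i /dev/urandom)
  # od pads its output with spaces; arithmetic expansion trims them
  echo $(( raw ))
}

random_24bit
```

Run it a few times and you should see a different number on each run.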

Now that we’ve formatted some data, line 26 uses awk to do some math. The basic syntax is awk -flags variables {program} file. The -v flag allows us to pass in some values as awk variables, f and r. f is being set to 0, and r is set to the value of the $non_random_words variable, which we established previously is something like 235,886 (depending on exactly how many words your version of the file contains). The printf command in awk isn’t just for printing to the screen, it’s also used for formatting data. %i\n specifies that the result should be printed as an integer (not a float). This is important because line 27 is going to use the random number generated on line 26 as a line number for the words file, and line numbers are integers. Then there’s the math itself, which scales the raw random value down so that it never exceeds the number of words in the file. I will be honest – I am MYSTIFIED by f in all of this. Since f=0, I don’t understand the point of adding it to r, or anything else for that matter. I’ve run the program with and without it and I can’t tell a difference in functionality. TBH, I do not think you need f, but if you know why we need f, please find me on Instagram/Twitter/LinkedIn and educate me. It’s driving me bananas, honestly.
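To make the math less ugly, here is a sketch you can feed known values. It assumes the formula has the shape f + r * $1 / 16777216, where 16777216 is 2 to the power of 24 (one more than the biggest 3-byte value); that shape and the helper name scale_to_range are assumptions on my part. One possible reading of f, also an assumption, is that it is an offset: with f=0 the results start at 0, and with f=1 they would start at 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale a raw 24-bit value into the range 0..r-1.
# The formula f + r * $1 / 16777216 is an assumption about the script's math.
scale_to_range() {
  local raw="$1"   # a value from 0 to 16777215 (3 random bytes)
  local r="$2"     # number of lines in the words file
  # %i truncates the result to a whole number
  echo "$raw" | awk -v f=0 -v r="$r" '{ printf "%i\n", f + r * $1 / 16777216 }'
}

scale_to_range 0 235886          # smallest possible raw value
scale_to_range 8388608 235886    # exactly halfway
scale_to_range 16777215 235886   # largest possible raw value
```

With a 235,886-word file, the smallest raw value maps to 0, the halfway value maps to 117943, and the largest maps to 235885.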

“Hold on,” you might now be saying. “This seems like a lot of work to generate a random number. What about Bash’s $RANDOM variable? Why can’t we use that?” Great question. I used it in my version, as you’ll see, but there is a very good reason it won’t work well in this situation. Each time you read it, $RANDOM gives you a value between 0 and 32,767. The words file in use here has around 235,886 words. If we were to use $RANDOM, we would barely scratch the surface of that file. Plus, since it’s in alphabetical order, we’d be limited to words beginning with a through c, which is kinda boring, unless of course we worked some additional magic to jump around the file – and at that point, it makes more sense not to use $RANDOM at all.
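For the curious, the “additional magic” could take several forms. One common trick, which is my own illustration and not something either script does, is to glue two $RANDOM values together to get a bigger number. The helper name big_random is hypothetical, and the modulo step introduces a slight bias, so treat this as a toy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a 30-bit value out of two 15-bit $RANDOM reads.
big_random() {
  local max="$1"
  # Shift the first value left by 15 bits and fill the low bits with the second
  local combined=$(( (RANDOM << 15) | RANDOM ))   # 0 .. 1073741823
  # Modulo brings it into 0..max-1; adding 1 gives a usable line number
  echo $(( combined % max + 1 ))
}

big_random 235886
```

That reaches every line of a 235,886-line file, but it is more machinery than a single read of $RANDOM.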

On line 27 the sed command is used to select the word from the words file that is on the line that corresponds to the $random_number variable set above, and then echo it to the screen. The path to the words file, you will recall, is stored in the $ALL_NON_RANDOM_WORDS variable.
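Here is a sketch of that line-picking step that you can run without the real words file. The helper name nth_line and the three-word demo file are stand-ins I made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only line number n of a file.
nth_line() {
  local n="$1" file="$2"
  # On line n, q prints the line and quits; every earlier line is dropped by d
  sed "${n}q;d" "$file"
}

# Demo with a throwaway file standing in for the real words list
demo_file=$(mktemp)
printf '%s\n' apple banana cherry > "$demo_file"
nth_line 2 "$demo_file"   # prints: banana
rm -f "$demo_file"
```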

Okay, that was a lot. Go grab some coffee and a snack, and then I’ll walk you through my version.

My Version – Explained

Again, let’s start with the code as a whole and then we’ll walk through each piece. If you want a copy-and-pasteable version I’ve put it on GitHub.
[Image: bash script for random word generator]
On line 3 I’m using read with the prompt (-p) flag to output a prompt, accept user input, and save said input in a variable called num. I like taking input from the user instead of accepting arguments for two reasons: 1) If I’m not the person running the script, the user doesn’t have to read the script first in order to understand which arguments are required and what the limitations on those might be (in this case, a number from 1 to 10) – all they have to do is run it, and they’re guided to the result. Actually, let’s be honest: even if I am the person running the script, I have way too many scripts to keep track of which ones require which arguments passed in which order. 2) It feels like I’m playing a choose-your-own-adventure text game (Zork, anyone?) and adds a sprinkle of joy to my script-running experience.
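A minimal sketch of that prompt step, wrapped in a function so it can be reused. The name ask_count and the prompt wording are placeholders, not the exact text from my script, and the -r flag is an extra I added here so backslashes in the input are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask for a number and print back whatever was typed.
ask_count() {
  local num
  # -p shows the prompt (only when input comes from a terminal)
  # -r keeps backslashes in the input literal
  read -r -p "How many random words would you like (1-10)? " num
  echo "$num"
}
```

Calling ask_count in a terminal shows the prompt and waits; piping a value in, as in echo 4 | ask_count, skips the prompt and just prints 4.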

The if statement on lines 5 through 9 checks if the user’s input is a number between 1 and 10, and if it isn’t, gently chastises them and exits the program with a code of 1 (i.e. not a success code, which would be 0). The important thing to note here is that user input, even if it looks like an integer, is always a string. Bash doesn’t have explicit conversion functions the way JavaScript (parseInt()) or Ruby (to_i) do. Instead, Bash uses context clues (with a little help from us) to figure out the data type we want. In this case, the double parentheses (( … )) tell Bash that we need $num to be used as an integer.
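Here is a sketch of a range check built on (( … )). The function name is_valid_count is hypothetical, and the digits-only guard and the 10# prefix are extras I am adding for this example rather than something the script above necessarily has.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for whole numbers from 1 to 10.
is_valid_count() {
  local num="$1"
  # Guard (my addition): inside (( )), a non-numeric string is treated as a variable name
  [[ "$num" =~ ^[0-9]+$ ]] || return 1
  # 10# forces base 10 so input like 08 is not read as octal
  (( 10#$num >= 1 && 10#$num <= 10 ))
}
```

You would use it like if ! is_valid_count "$num"; then echo "Please enter a number from 1 to 10"; exit 1; fi.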

Next, I did some setup for myself on lines 11 through 17:
[Image: bash code]

If you read through the LinuxConfig version, you’ll see the author used a file inherent to Linux systems that’s located in /usr/share/dict/words. I ran into two issues with this file. First, it was 235,886 lines long, and I wanted to use Bash’s $RANDOM variable, which we previously established only returns a value as high as 32,767. Second, a lot of the words in that file are way out of left field (a few that came up in my tests: awapuhi, bathycolpic, and cardionephric). There’s nothing wrong with that from a practical standpoint, but the amusement I get from these kinds of passwords has to do with the juxtaposition of common words resulting in an amusing mental image. For personal preference reasons and the aforementioned numerical reason, I decided to create my own file.

I used this random word generator to produce a list of 2,466 words (the maximum I could get returned in one go) that I then copied and pasted into the file you see referenced on line 12. If you were determined to go for the maximum number of unique words I’m sure you could do it, but since I’m just messing around with this script this was good enough for my purposes. The same way LinuxConfig did, I then used the wc (word count) utility with the -l (line) flag to count the number of lines in the file, since each word was on its own line. I saved the file path and the word count in the variables $words and $word_count respectively.

Note: When writing Bash in a professional context I adhere to the convention that variable names should be in ALL_CAPS. When I’m just writing my own code for fun, I don’t want it yelling at me.

Finally, I set up two empty arrays on lines 16 and 17 that the subsequent for loops are going to fill.

The first for loop on lines 20 to 23 looks like this:
[Image: bash for loop]

You’ll recall that on line 3 I asked the user how many words they want to generate, 1 through 10, and saved their answer to a variable called $num. Here I’m using a for loop so that for each random word they requested, I’m going to generate a random number between 1 and the number of words in that word file I have (the $word_count variable). I’m then going to push that random number into the empty $num_arr array I created earlier. Therefore, if the user entered 4, our $num_arr will look something like: (12 1398 462 988). Remember, that’s what Bash arrays look like syntactically, so if you’re more used to JavaScript it might help you to visualize the array as [12, 1398, 462, 988]. Basically, we now have an array of random integers between 1 and 2,466 (because that’s how many words I have in my file specifically) and the length of our array is based on the user’s input.
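As a rough sketch of that idea, here is one way to fill an array with random line numbers. The function name fill_num_arr and the exact loop shape are guesses for illustration, so they may not match the script in the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fills the global array num_arr with random line numbers.
fill_num_arr() {
  local num="$1" word_count="$2"
  local n
  num_arr=()   # start from an empty array
  for (( n = 0; n < num; n++ )); do
    # RANDOM % word_count is 0..word_count-1, so add 1 to get a valid line number
    num_arr+=( $(( RANDOM % word_count + 1 )) )
  done
}

fill_num_arr 4 2466
echo "${num_arr[@]}"
```

The + 1 matters: without it the array could contain 0, and there is no line 0 in a file.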

Let’s move on to our last block of code:
[Image: bash for loop]

This for loop works like a for each loop in other languages: for each value in the $num_arr array, I’m setting a variable called current_word equal to one of the words in my words file, and then I’m putting that word into my $word_arr array, which you’ll recall is the second empty array I created earlier. Let’s look at line 27. The value of $i is the value of the current array element – not an index number – so if we’re using the example array above on the first loop $i == 12. We then use sed to look for the 12th line of our words file, the path to which is stored in the variable $words. The q is for quit: once sed reaches the 12th line, it prints that line and stops reading the file. d is for delete, but nothing in the file is actually deleted; it just discards every line before the 12th without printing it. Together, q;d outputs only the one line (and in this case, word, since each line has only one word on it) that we want. This Stack Overflow post and this section of the sed docs have more info on q;d.
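Here is a sketch of that for-each pattern that runs against any small file. The function name pick_words is hypothetical, and it takes the file path and line numbers as arguments instead of reading the $words and $num_arr variables directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the words found on the given line numbers.
pick_words() {
  local file="$1"
  shift                      # the remaining arguments are line numbers
  local word_arr=()
  local i current_word
  for i in "$@"; do          # i is the value itself, not an index
    current_word=$(sed "${i}q;d" "$file")
    word_arr+=( "$current_word" )
  done
  echo "${word_arr[@]}"
}

# Demo with a throwaway four-word file
demo_file=$(mktemp)
printf '%s\n' alpha bravo charlie delta > "$demo_file"
pick_words "$demo_file" 3 1   # prints: charlie alpha
rm -f "$demo_file"
```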

On the last line of this script, line 31, I echo the $word_arr array like so: echo ${word_arr[@]}. The result is the random words the user requested. If our user entered 4, they might get a result like:
[Image: running a bash script and the result]

Tell me that doesn’t sound like a group of young dinosaurs taking a math test and they are NOT happy about it. Funny, right?

I hope you’ve enjoyed learning to make a Bash Random Word Generator two different ways, and I hope you enjoyed my way more (just kidding). At this point I should probably just change the name of this blog to #BashIsLife because I say this every time and every time it’s a lie BUT I swear I will write about other languages eventually. I’m working on taking my PHP skills up a notch and I want to get back into Ruby, maybe we can do that together? Anyway, have a wonderful week, gentlefriends, and I will see you next time for a non-Bash (probably Bash) tutorial!

</ XOXO>

Enjoy my content and want to show your appreciation? You can share this post, pay it forward by teaching someone else, or buy me a coffee!

[Photo credit: Andreas Fickl via Unsplash]
