Trie data structure from scratch


Its been a while since I posted a new tutorial, and better yet a data structure I haven’t covered yet. So here is Trie. What I’ve seen is there are more interview questions regarding usage of Tries, so we’ll cover implementation, usage and advantages, and some ideas for interview questions.

What is a Trie?

Also known as a prefix tree, a Trie is a search kind of tree. In most use cases, the prefix tree stores words as the keys (leaf nodes), and the characters (their ASCII unicode values) are nodes that refer to specific letter in the alphabet. Its different from a binary tree in that you can have multiple child nodes branching from parent nodes (similar to a B-Tree however different time complexity).

In both pictures below, we can see the root is blank, but its successive children are each based on a specific character or letter of a dictionary. For example, suppose you wanted to insert a new word that begins with letter “T”, the Trie finds the character already, and appends the new word by traversing downwards until no matching characters are found, then creating new nodes until all characters of the new word have been inserted. The leaf node now represents a valid word in the Trie.

Methodology for implementation

In any Trie, note that the root itself refers to no character, its blank. However, successive child nodes all refer to a single character in the alphabet.

In any data structure, we first create the Node. In this case, we’ll call it TrieNode. Its characteristics are ideally that it has an array of type TrieNode, of size 26 (all the letters of the alphabet) where each index refers to a character. It also has a boolean that determines if a node completes a full word or not. Here’s a simple implementation in Java.

public class TrieNode {
    public static final int ALPHABET_SIZE = 26;
    TrieNode[] children = new TrieNode[ALPHABET_SIZE];
    boolean endOfWord;
    TrieNode(){
        endOfWord = false;
        for(int i = 0; i < ALPHABET_SIZE; i++){
            children[i] = null;
        }
    }
}
Each node in the Trie contains an array where each index maps to a specific ASCII character, and also indicates whether or not the node completes a full word.

Now we get to the full Trie implementation. I’ll only cover insertion, searching, and display (traversal). I’ll list each algorithm step by step.

Insertion:

  1. Pass in a string as input to insert into the trie
  2. Create a TrieNode pointer to the root, so that we can traverse the trie
  3. Traverse the entire string:
    1. Get the ASCII index by getting the character at the i-th index, and then shift the ASCII value so that we get only values 0-25
    2. If our current pointer finds that the child node at the ASCII index is null, we create a new TrieNode at that index
    3. Otherwise, and in any case, update the current TrieNode pointer to the newly added TrieNode
  4. Since we’ve finished traversing the entire string, we set the current TrieNode pointer to be the end of a completed word (boolean value).

Searching:

  1. Pass in a string as input to search in the trie
  2. Create a TrieNode pointer to the root, so that we can traverse the trie
  3. Traverse the entire string:
    1. Get the ASCII index by getting the character at the i-th index, and then shift the ASCII value so that we get only values 0-25
    2. If our current pointer finds the child node at the ASCII index to be null, we return false
    3. Otherwise, update the current TrieNode pointer
  4. We finish traversing the entire string, we return true if our current TrieNode pointer is not null, AND our current pointer is looking at the end of a word.

Display (Trie traversal):

Trie traversal means performing recursive calls to each successive child nodes, in other words, performing a depth first traversal. For this example, its best to keep the recursion private (a helper method) and the actual function call public.

  1. We pass in a root node, a StringBuilder (to build a string to display), and a integer index called ‘level’ to insert characters at particular index (depth within the Trie).
  2. Our base case: if the current node we look at is the end of a word (boolean check), we print the string. We also clear the StringBuilder from the current level index to the end so as to not print any redundant characters.
  3. Loop through all the indices for the children array of the current node, our size is the alphabet size.
  4. If the current child node is not null:
    1. Insert into the string builder AT the current level in the Trie, shift the integer to correct ASCII code (i + ‘a’), convert to string.
    2. Recurse at the child node on the i-th index. Example: display(node.children[i]…)
class Trie {
    
    TrieNode root;
    
    /* Initializing the root */
    public Trie() {
        root = new TrieNode();
    }
    
    /** Inserts a word into the trie. */
    public void insert(String word) {
        int index;
        TrieNode pCrawl = root;

        for(int i = 0; i < word.length(); i++){
            index = word.charAt(i) - 'a';
            if(pCrawl.children[index] == null){
                pCrawl.children[index] = new TrieNode();
            }
            pCrawl = pCrawl.children[index];
        }
        pCrawl.endOfWord = true;
    }
    
    /** Returns if the word is in the trie. */
    public boolean search(String word) {
        int index;
        TrieNode pCrawl = root;
        
        for(int i = 0; i < word.length(); i++){
            index = word.charAt(i) - 'a';
            if(pCrawl.children[index] == null){
                return false;
            }
            pCrawl = pCrawl.children[index];
        }
        return (pCrawl != null && pCrawl.endOfWord);
    }
    
    /*Display the contents of the Trie*/
    public void display() {
    	StringBuilder sb = new StringBuilder();
    	//we pass in a root node, string builder, and a level (index) to insert chars at
    	displayHelper(root, sb, 0);
    }
    private void displayHelper(TrieNode node, StringBuilder str, int level) {
    	//base case for displaying a full word (key)
    	//if a node is the end of a word (boolean), we print
    	if(node.endOfWord) {
    		//clear any chars remaining from previous words inserted into the string builder
    		str.delete(level, str.length()); 
    		System.out.println(str.toString());
    	}
    	
    	//loop through all the indices through a child array
    	//if a non null child is found, append the character to the String 'str'
    	//and recursively call the helper method on its child node
    	for(int i = 0; i < TrieNode.ALPHABET_SIZE; i++) {
    		if(node.children[i] != null) {
    			//insert a char at the level index
    			
    			//example: level = 2, char is 'y'
    			//our current string builder contains: 
    			//t r a n k s b y e a....
    			//_ _ _ _ _ _ _ _ _ _
    			
    			//now, we replace
    			//t r y <--insert at index 2, we'll clear other chars at the base case
    			//_ _ _
    			str.insert(level, Character.toString((char) (i + 'a')));
    			displayHelper(node.children[i], str, level + 1);
    		}
    	}
    }
}

Usage and efficiency

Trie has many uses, for example auto-completion in search engines, spell checking, phone book searching, dictionary games such as Boggle, and dictionary/IP routing based algorithms in general. They can be more efficient than a hash table in that lookup is faster in the worst case (due to possible hashing collisions). It can also provide alphabetical ordering of entries. On other hand, they require much more space than a hash table, in that memory is allocated for each node created.

Space: O(w*n) w is the amount of words in the trie, n is the amount of nodes in the trie.

Time:

  • Search – O(m) m is the length of the search key, O(1) best case if your search key is just ‘a’.
  • Insertions – O(m) m is the length of the search key
  • Delete – O(m) m is the length of the search key

Practice and further reading

Some practice interview questions that exercise usage of a Trie:

  1. https://leetcode.com/problems/add-and-search-word-data-structure-design/description/
  2. https://leetcode.com/problems/top-k-frequent-words/description/
  3. https://leetcode.com/problems/implement-magic-dictionary/description/

Further readings:

I hope to post more tutorials, I made this one a little lighter and more concise. Please rate and comment, I hope this has helped you in any way. Cheers.

One comment

  1. This Cleaning international company is giving attention professionalismtheir trained workers and tracks the timely growth of their vocational training. To our professionals by virtue of pretty challenging works, all will ensure the cleanliness of the most inaccessible sites.
    Select optimal for your company mode – day, evening or night.

    Coming maid is private housekeeper, which comes at any convenient for client time and performs the list on the eve of the stipulated works. This popular work, allowing to maintain office, apartment, house, or other housing in complete order, and not spending all free time on the decision homemade problems. Ifyou required coming worker Midtown Manhattan, our firm provide her for your firms.

    our company can order service maid Broadway Flushing several times in month / week. Maid will come to your apartment / house / office in advance approved time , and fulfillpreset list of goals. Customer enough totalonly give an assessment high quality made work and enjoy clean and comfortable housing .

    Molly maid New-York – Maids Brooklyn

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s