Thursday 27 July 2017

All about Trie data structure !


Contents :
  1. Introduction to Trie
  2. Insertion of a string in Trie
  3. Searching a string in a Trie
  4. Application : Auto-Complete the keyword

  • Introduction to Trie :
    Trie is a data structure which is very efficient for information retrieval and in competitive programming it has many applications. Using tries we can search a keyword in O(n) time in a dictionary where n is the length of keyword. To insert an element in a trie we also need O(n) time. But it is not memory efficient as each node of a trie consists of multiple branches where each branch represents a possible character of keys. We need to mark last node for every string so that when we search for a key in trie we must know its endpoint. Lets see how data is stored in a trie with the help of an example.
    So this is how data is stored in a trie. Green nodes are the special nodes indicating the end of a particular string. Lets see the structure we need to make this trie. First of all we need a boolean variable which is true for special nodes. We need various pointers depending on type of data, lets consider our data consists of only lowercase alphabets. So we need 26 pointers.
    1
    2
    3
    4
    struct trie{
        bool isleaf;      // True when special node
        trie * next[26];  // Pointers to next possible charatcers
    };
    

    So, now we know the structure of trie, lets move to insertion part.
  • Insertion of a string in Trie :
    We are given the string to insert in the trie and the root node of the trie. In order to insert the string in the trie we iterate through all the characters in the given string. Initially we are at root node. So we need to check if there is a pointer to node corresponding to the first character of the string. If found then just jump to that node and check the next character of the string. But if node is not found (i.e. NULL), at this point we need to create a new node and save the pointer to that node in the root node and jump to that node. Similarly do this for every character of the string. Once all the characters are found (corresponding pointers) or inserted in the trie mark the special variable of last node as true.
    Implementation :
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    trie * getNode(){
        trie * node  = new trie();
        node->isleaf = 0;
        for(int i = 0 ; i < 26 ; i ++){
            node->next[i] = NULL;
        }
        return node;
    }
    
    
    void insert(string s, trie *root){
        trie * node = root;
        for(int i = 0 ; i < s.size() ; i ++){
            int index = s[i] - 'a';        // index of pointer corresponding to ith char of string 
            if(node->next[index] == NULL){ // if pointer does'nt exists
                trie * temp = getNode();   // create a new node
                node->next[index] = temp;  // save the pointer to new node 
            }
            node = node->next[index];      // jump to next node 
        }
        node->isleaf = true;               // mark special node : True
    }
    
    // inside int main()move to
    trie *root = getNode();
    insert("happy" , root);
    
    Complexity :
    As we have only one loop for covering all the characters of the string (0 to size of keyword), so time complexity is of order n i.e. O(n), where n is size of keyword.
  • Searching a string in a Trie :
    Searching is also an easier task once you understand the insertion part clearly. We start from root node and look the next node corresponding to the first character of the string. If node is not found then return false as string is not present in the trie. If corresponding node is found we move to next character of the string and jump to the corresponding node. We do the same for all other characters of the string. Once we checked all the characters of the string, we need one final check which is to check the special variable of the last node. If special variable of last node is true it means that string exists in the trie otherwise the given string is one of the prefix of a bigger string that exists in the trie.
    Implementation :
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    bool search(string s, trie * root){
        trie * node = root;
        for(int i = 0 ; i < s.size() ; i ++){
            int index = s[i] - 'a';
            if(!node->next[index]){    // Checking the pointer corresponding to given string 
                return 0;              // Not Found
            }
            node = node->next[index];
        }
        return (node->isleaf == true); // Return true if special is true                   
    }
    
    Complexity : O(n), n = length of keyword
  • Application : Auto-Complete the keyword
    This problem was asked in many coding interviews, so thats why i think it is important to discuss here.
    This is interesting because soon we will create our own program which provide suggestions for the entered keyword. This is similar to google suggestions when we type something in search box.
    Problem : Given a string s, print all the strings which are stored in a trie and have prefix s.
    Solution : Initially we are at root node. First of all we find the given string s in the trie. If it is not found then there could not be any other string with prefix s. But if it is found we reached to the node N corresponding the last character of the string s. Now we need to print all the words which can be formed from the nodes beneath the node N. For this we can apply DFS with a little modification. So DFS will print all the strings with prefix s. It will print only when it will reach the special node. We need 3 parameters for DFS which are :
    1. Pointer to current node
    2. String s which is prefix for all the strings
    3. Remaining string, lets call it suffix
    Lets see the implementation part for this problem !
    Implementation :
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    72
    73
    74
    75
    76
    77
    78
    79
    80
    81
    82
    83
    84
    85
    86
    87
    88
    89
    90
    91
    92
    93
      #include<bits/stdc++.h>
    
      using namespace std;
    
      typedef long long ll ;
    
      struct trie{
          bool isleaf;  
          trie * next[26];
          
      };
    
      trie * getNode(){
          trie * node  = new trie();
          node->isleaf = 0;
          for(int i = 0 ; i < 26 ; i ++){
              node->next[i] = NULL;
          }
          return node;
      }
    
    
      void insert(string s, trie *root){
          trie * node = root;
          for(int i = 0 ; i < s.size() ; i ++){
              int index = s[i] - 'a';
              if(node->next[index] == NULL){
                  trie * temp = getNode();
                  node->next[index] = temp;
              }
              node = node->next[index];
          }
          node->isleaf = 1;
      }
    
    
      trie *search(string s, trie * root){
          trie * node = root;
          for(int i = 0 ; i < s.size() ; i ++){
              int index = s[i] - 'a';
              if(!node->next[index]){
                  return NULL;
              }
              node = node->next[index];
          }
          return node;    
      }
    
    
      void dfs(trie* root, string prefix, string suffix){  
      // Print all the suggestions recursively
          if(root->isleaf == 1){
              cout << prefix << suffix << endl;
          }
          for(int i = 0 ; i < 26 ; i ++){
              if(root->next[i]){
                  string newSuffix;
                  newSuffix = suffix;
                  newSuffix.push_back(i + 'a');
                  dfs(root->next[i], prefix, newSuffix);
              }
          }
      }
    
      int predict(string s, trie * root){ // Show suggestions
          trie * node = root;
          if(!(node = search(s, root))){
              return -1;
          }
          dfs(node, s, "");
      }
    
      int main(){
          int n;
          cin >> n;
          trie *root = getNode();
          for(int i = 0 ; i < n ; i ++){
              string s;
              cin >> s;
              insert(s, root);
          }
          int Q;
          cin >> Q;    // Q : queries
          while(Q--){
              string s;
              cin >> s;   // We need suggestions for string s.
              int x = predict(s, root);
              if(x == -1){
                  cout << "No keywords found" << endl;
              }
              
          }
      }
    

    So after reading this post you get to know a lot about Trie data structure and if you have any doubt or you find some error or bug in the program please don't hesitate to write in the comment section.
    Thank You   
THE END !!

No comments:

Post a Comment

Featured Posts

Euler Totient Function

Hello Coders, I am writing this blogpost after a very long time. I am really sorry for that. :( This blogpost is related to a mathemat...