Make delicious recipes!

Find lexical order of alphabets from a given dictionary of words



Problem: Given a dictionary of words, find the order of alphabets that make that dictionary.

Example:
If dictionary = {
		"2ad",
		"2bd",
		"abb",
		"abd"
	}

Then following relationships can be gathered:

2 is before a (because 2bd comes before abb, check first char)
a is before b (because 2ad comes before 2bd, check second char)
b is before d (because abb comes before abd, check third char)

If we traverse the path from each character 2, a, b and d, we get the complete path only from 2.
Hence the complete order of alphabets in the above is 2,a,b,d

Solution:
Dictionary is a sorted data-structure.
So if we traverse the first alphabets of each word in the dictionary, we will get lexical order for some of alphabets.
This order of alphabets will be lexicographically sorted, but it may have some holes in it.
For example: In the above case, first letters only contain the alphabets 2 and a.

This can be resolved by looking at the second level of alphabets.
But there is a catch.
The second level of alphabets can be matched only if first level of alphabets is same.
For example, in the above, we can only compare second level of alphabets in 2ad with 2bd but we cannot compare second level of alphabets in 2bd with abb

Once all these comparisons are done level-wise, we can create a map which will store immediate lower-level characters per alphabet.
Then we simply traverse this map for each character and wherever the length of the path becomes equal to number of alphabets, we get a solution.

Example: For the above dictionary : {"2ad", "2bd", "abb", "abd"},
  1. We build a Map<Character, Set<Character>> of characters as:
    chars coming after 2: [a] (by seeing first char of "2bd" and "abb")
    chars coming after a: [b] (by seeing second char of "2ad" and "2bd")
    chars coming after b: [d] (by seeing third char of "abb", "abd")

  2. We traverse this map.
    Since order of hash-map is not guaranteed, let's say we first got 'a'
    Following 'a' through the map, the order we get is 'a', 'b', 'd' which is less than number of characters 4
    So, we ditch 'a' and go to the next char in the map until the length of ordered set obtained becomes 4

  3. If we use a LinkedHashSet and LinkedHashMap, this order can be obtained much earlier.


A slightly more complex example

Let the words be:
AC
AD
BDG
CG

Then our map would look like:

1st Pass:
A: [B]
B: [C]

2nd Pass:
A: [B]
B: [C]
C: [D]

3rd Pass:
No new information obtained.
So we begin to traverse the map. Longest path is obtained if we traverse from A and that is A -> B -> C -> D
However, number of unique alphabets in the given word set is 5.
Thus, for this case, we have to conclude that the number of words is insufficient to determine dictionary order.

However, if one more word was present say BDC, then 3rd pass would yield:
A: [B]
B: [C]
C: [D, G]

Still, we get maximum path as either:
A -> B -> C -> D or
A -> B -> C -> G

Still not resolved !

Finally, if we had BG present in there, then 2nd pass would give:
A: [B]
B: [C]
C: [D]
D: [G]

And this would give the correct answer: A -> B -> C -> D -> G

Java code



import java.util.*;
import java.util.Map.Entry;

public class OrderOfCharactersFromDictionary 
{

    public static void main(String[] args) 
    {
        int numChars=7;

        Set<String> dict = generateRandomDictionary (numChars);
        int maxWordLen = findMaxWordLength (dict);
        
        HashMap<Character, Set<Character>> graph = buildGraph(numChars, maxWordLen, dict);
        printGraph (graph);
        
        LinkedHashSet<Character> result = new LinkedHashSet<Character> ();
        findOrder (graph, result, numChars);
        
        System.out.print ("\nResult: ");
        for (Character c : result)
            System.out.print (c + ", ");
    }


    static HashMap<Character, Set<Character>> buildGraph (int numChars, int maxWordLen, Set<String> dict)
    {
        HashMap<Character, Set<Character>> grph = new HashMap<Character, Set<Character>> ();
        
        for (int i=0; i<maxWordLen; i++)
        {
            Iterator<String> itr = dict.iterator();
            String word1 = itr.next();
            
            while (itr.hasNext())
            {
                String word2 = itr.next();
                if (word1.length() > i && word2.length() > i)
                {
                    if (word1.substring(0, i).equals(word2.substring(0, i))) // IMP: Most important line of whole code                 
                    {
                        char c1 = word1.charAt(i);
                        char c2 = word2.charAt(i);
                        if (c1 != c2)
                        {
                            Set<Character> children1 = addOrGetChar(grph, c1);
                            Set<Character> children2 = addOrGetChar(grph, c1);
                            children1.add(c2);
                        }
                    }
                }
                word1 = word2;
            }
        }
        
        return grph;
    }
    

    // Major function to traverse the built graph of characters and find the existence of a path-length=numChars
    static void findOrder(HashMap<Character, Set<Character>> graph, LinkedHashSet<Character> result, int numChars)    
    {
        Iterator<Entry<Character, Set<Character>>> itr = graph.entrySet().iterator();
        while(itr.hasNext())
        {
            Entry<Character, Set<Character>> ele = itr.next();
            
            result.add(ele.getKey());
            findOrder (ele.getKey(), result, graph, numChars);
            if (result.size() == numChars)
                return;
            result.clear();
        }
    }
    
    
    // Auxiliary function used with upper function
    static void findOrder(
            Character key, 
            LinkedHashSet<Character> result, 
            HashMap<Character, Set<Character>> graph,
            int numChars
    ) {
        
        if (graph.containsKey(key) == false)
            return;
        
        for (Character c : graph.get(key))
        {
            if (result.contains(c))
                return;
            result.add(c);
            findOrder (c, result, graph, numChars);
            if (result.size() == numChars)
                return;
            result.remove(c);
        }
    }


    // Helper function to check existence of key.
    // Adds the key if it is not present in the map.
    static Set<Character> addOrGetChar(HashMap<Character, Set<Character>> grph, char c) 
    {
        Set<Character> children = grph.get(c);
        if (children == null)
        {
            children = new HashSet<Character>();
            grph.put(c, children);
        }
        return children;
    }
    
    
    
    // Helper function to find the maximum word length
    static int findMaxWordLength (Set<String> dict)
    {
        int maxWordLen = 0;
        for (String word: dict)
            if (word.length() > maxWordLen)
                maxWordLen = word.length();
        return maxWordLen;
    }
    
    
    
    // Helper function to create a random list of words
    static Set<String> generateRandomDictionary (int numChars)
    {
        Set <String> dictionary = new HashSet <String> ();
        int charsPresent[] = new int[numChars];
        int charsProcessed = 0;
        int size = dictionary.size();
        
        while (charsProcessed < numChars || size < 2*numChars)
        {
            int rndm = ((int)(Math.random()*50))%numChars;
            char c = (char) ('a' + rndm);
            
            if (charsPresent[rndm] == 0)
                charsProcessed++;
            charsPresent[rndm]++;
            
            if (Math.random() < 0.3)
            {
                dictionary.add("" + c);
            } 
            else 
            {
                Set <String> dictionary2 = new HashSet <String> ();
                for (String w: dictionary)
                {
                    if (Math.random() > 0.5 && w.length() < 6)
                        dictionary2.add(w + c);
                    else
                        dictionary2.add(w);
                }
                dictionary = dictionary2;
            }
            size = dictionary.size();
        }
        
        
        Set<String> set = new TreeSet<String> ();
        for (String w : dictionary)
            set.add(w);
        for (String w: set)
            System.out.println(w);
        
        return set;
    }

    
    // Helper function to print the graph of characters
    static void printGraph(HashMap<Character, Set<Character>> graph) 
    {
        System.out.println();
        Iterator<Entry<Character, Set<Character>>> itr = graph.entrySet().iterator();      
        while(itr.hasNext())
        {
            Entry<Character, Set<Character>> ele = itr.next();
            
            System.out.print("\n" + ele.getKey() + " = ");
            for (Character c : ele.getValue())
                System.out.print(c + ", ");
        }
        System.out.println();
    }

}


Example executions:


Generate Dictionary 1:
aababc
b
bdabbd
cbbacd
cbbcab
cd
dbda
dcdaca


Graph of alphabets
b = d, c, 
c = d, 
a = b, c, 


Result: a, b, c, d






Generate Dictionary 2:
ae
afabeb
afdcef
b
c
decdac
dfafcf
ecaefb
fafcae
fdcfef
ffab
gaefbd
gcfabd
gfbdeb

Graph of alphabets
f = g, 
d = f, e, 
e = f, 
b = c, 
c = f, d, 
a = d, b, c, 

Result: a, b, c, d, e, f, g, 



Order analysis:
Order complexity of this algorithm is O(mn) where m is the number of words in the dictionary and n is the maximum word-length





Like us on Facebook to remain in touch
with the latest in technology and tutorials!


Got a thought to share or found a
bug in the code?
We'd love to hear from you:

Name:
Email: (Your email is not shared with anybody)
Comment:

Facebook comments:

Site Owner: Sachin Goyal