Make delicious recipes!

Using a dictionary of words, make words from line without spaces



Problem: Given a dictionary of words and a string having no spaces, find out if the line can be completely broken up into words from the dictionary

Example: If dictionary = { "i", "like", "care", "take", "takers", "caretakers", "ice", "ball", "iceball"} and
input = "ilikeiceball", then the program should be able to break up the input string into its constituent words.


Solution:
A recursive solution for this is as follows:

import java.util.ArrayList;
import java.util.HashSet;

public class LineToWords 
{

    static String [] dictionary = 
        { "i", "like", "care", "take", "takers", "caretakers", "ice", "ball", "iceball"};

    static HashSet<String> dictionarySet = new HashSet<String> ();
    static String input =  "ilikeiceball";


    public static void main (String args[])
    {
        for (String word: dictionary)
            dictionarySet.add(word);

        ArrayList <String> words = new ArrayList <String> ();
        boolean possible = linetoWords (words, 0, 0);
    }

    private static boolean linetoWords(ArrayList<String> words, int startPos, int pos) 
    {
        if (startPos >= input.length())
            return true;

        while (pos <= input.length())
        {
            String word = input.substring(startPos, pos);
            System.out.println(startPos + " to " + pos + ", word=" + word);
            
            if (dictionarySet.contains(word))
            {
                words.add(word);
                if (linetoWords(words, pos, pos))
                    return true;
                words.remove(words.size()-1);
            }
            pos++;
        }
        return false;
    }
}


Example executions:

0 to 0, word=
0 to 1, word=i
1 to 1, word=
1 to 2, word=l
1 to 3, word=li
1 to 4, word=lik
1 to 5, word=like
5 to 5, word=
5 to 6, word=i
6 to 6, word=
6 to 7, word=c
6 to 8, word=ce
6 to 9, word=ceb
6 to 10, word=ceba
6 to 11, word=cebal
6 to 12, word=ceball
5 to 7, word=ic
5 to 8, word=ice
8 to 8, word=
8 to 9, word=b
8 to 10, word=ba
8 to 11, word=bal
8 to 12, word=ball
Line to words' breakup: possible
i, like, ice, ball, 


Order analysis:
The above solution is exponential because it tries to solve the same problem again and again.
For example, in the above output, observe the line "5 to 6, word=i", since "i" is a valid word in the dictionary, it assumes that "i" may have been a correct word and then goes on to parse the remaining string.
On failing, it gives up the word "i" as a good choice and tries other combinations.
In doing so, several words in the remaining string are analyzed again, although they have been already analyzed.

Clearly, the algorithm can be improved by applying dynamic programming.
By using memoization in the above algorithm, a hash-map can be used to store valid list of words against every index in the input.


A tabular approach can also be used where one alphabet is introduced one at a time in the outer loop.
And the inner loop checks valid word formation with that alphabet and a lookup.
The order for such an algorithm would be O(n2) and is shown below:



import java.util.ArrayList;
import java.util.HashSet;

public class LineToWords 
{

    static String [] dictionary = 
        { "i", "like", "care", "take", "takers", "caretakers", "ice", "ball", "iceball"};

    static HashSet<String> dictionarySet = new HashSet<String> ();
    static String input =  "ilikeiceball";


    public static void main (String args[])
    {
        for (String word: dictionary)
            dictionarySet.add(word);

        ArrayList <String> words = new ArrayList <String> ();
        boolean possible = linetoWordsDP (words, 0, 0);

        System.out.println ("Line to words' breakup: " + (possible?"possible":"not possible"));
        for (String word: words)
            System.out.print(word + ", ");
    }

    
    private static boolean linetoWordsDP (ArrayList<String> words, int startPos, int pos) 
    {
        ArrayList <ArrayList<String>> lookup = getLookup ();
        
        
        for (int i=0; i<=input.length(); i++)
        {
            for (int j=i; j>=0; j--)
            {
                String word = input.substring(j,i);
                if (dictionarySet.contains(word)==false)
                    continue;
                
                ArrayList<String> wordsAtIndexJ = lookup.get(j);
                ArrayList<String> wordsAtIndexI = lookup.get(i);
                
                if (wordsAtIndexJ.size() == 0) // no valid word list at index j
                {
                    if (j!=0) // if j is 0, only then we can form a word from j to i
                        continue;
                }
                else
                {
                    wordsAtIndexI.addAll(wordsAtIndexJ);
                }
                wordsAtIndexI.add(word);
                break;
            }
        }
        
        ArrayList<String> finalWords = lookup.get(lookup.size()-1);
        boolean possible =  finalWords.size() != 0;
        if (possible)
            words.addAll (finalWords);
        
        return possible;
    }

    
    private static ArrayList<ArrayList<String>> getLookup() 
    {
        ArrayList <ArrayList<String>> lookup = new ArrayList <ArrayList<String>> (input.length()+1);
        for (int i=0;i<=input.length();i++)
            lookup.add(new ArrayList<String> ());
        
        return lookup;
    }
}








Like us on Facebook to remain in touch
with the latest in technology and tutorials!


Got a thought to share or found a
bug in the code?
We'd love to hear from you:

Name:
Email: (Your email is not shared with anybody)
Comment:

Facebook comments:

Site Owner: Sachin Goyal