August 26, 2012

Finding LCA of two nodes in a tree



A tree is given say T, we’ve to find out the lowest common ancestor of two nodes. LCAT(u,v) – given nodes u, v in T(tree), return the node furthest from the root that is an ancestor of both u and v. In below figure, node in green is LCA of u and v.

We’ll use RMQ(Range Minimum Query) to solve LCA problem as both problems could be reduced to each other and solving RMQ seems easy. In case you don't know RMQ please go through this link RMQ tutorial on TopCoder

If an algorithm has preprocessing time f(n) and query time g(n), we will say that the algorithm has complexity {f(n),g(n)}. There’s an lemma which states that:
If there is {f(n),g(n)} solution to RMQ then there exists {O(f(2n-1))+O(n), O(g(n))+O(1)} time solution exists for LCA.

For RMQ we can use Segment Tree. Processing time f(n) for segment tree is O(N) and query time is O(lgN). Hence putting these {O(N),O(lgN)} in above lemma we get {O(N),O(lgN)} time for LCA also.
First of all we’ve to reduce LCA problem into an RMQ problem. Reduction process is as follows:
An Euler tour in an undirected graph is a tour that traverses each edge of the graph exactly once. An undirected graph has a closed Euler tour iff(if and only if) it is connected and each vertex has an even degree. Here we’ve tree and we’ll visit every edge twice. Well, both of the statements seems contradictory but for the understanding, take it as Euler tour of tree.
We’ll build three arrays, Euler tour array E[], Level of nodes in tree L[] and representative array R[] which will hold the first occurrence of node in Euler tour.

  • E[1,..., 2*N-1] - the nodes visited in an Euler Tour of TE[i] is the label of i-th visited node in the tour. E[i] is the label of the i-th node visited in the Euler tour of i. The Euler Tour of  T  is the sequence of nodes we obtain if we write down the label of each node each time it is visited during a DFS(Depth First Search). The array of the Euler tour has length 2*N-1 because we start at the root and subsequently output a node each time we traverse an edge. We traverse each of the N-1 edges twice, once in each direction.
  • L[1,..., 2*N-1] - the levels of the nodes visited in the Euler Tour. L[i] is the level of node E[i]. Let the level of a node be its distance from the root. Compute the level array , L[1,…,2*N-1] where L[i] is the level of node E[i] in the Euler Tour.
  • R[1,...N] in which R[i] is representative of node is the index of the first occurrence of node i in E[].

In implementation things we’ve to do are:
1.  Run DFS on tree and build E[], L[], R[]
2.  Build segment Tree for L[]
3.  For two query nodes, u and v, first of all we find values in R[], i.e. R[u] and R[v]. This will give us two index of E[]. Suppose X=R[u] and Y=R[v]. X and Y are indices in E[] which gives us range for RMQ. As L[] keeps the level of nodes, we’ll perform query on L[]. We’ve to find that index for which holds the minimum value between L[X] and L[Y]. The node corresponding to that index in E[] will be our answer.
The picture shows an example and is self-explanatory:



Implementation for this example is given below. There are three functions which are quite basic but still deserves some explanation
  • dfs(root,level): This is just dfs of tree with some additions. We store nodes in E[] two times, first when we start exploring that node and second time when we returns from it. In the 'traverse' part of dfs we've three exact lines which are outside of 'traverse' because we've to store the node again after the traversal is finished to the node.
  • segmentTree(int node, int start, int end): This function builds the segment tree. node is the index of root node. start is starting index of L[], i.e. 1 and end is the last index of L[], i.e. 2*N-2 
  • query(int node, int start, int end, int left, int right): node, start, end has the same meaning as in the last function. left is the left node for query, similarly right is the right node for query. These left, right comes from R[] and the range of left to right is queried in L[]
Here's the code for the above example


# include <iostream>
# include <vector>
# include <climits>
# include <algorithm>

using namespace std;

# define traverse(C, itr) for(typeof((C).begin())itr=(C).begin(); itr!=(C).end();itr++)

const int n = 10;	//	number of nodes
// There are n nodes, so n-1 edges. We've to traverse every edge twice, so 2*n-2
// root is counted twice and arrays are 0-indexed hence 2*n-2+1+1 = 2*n
//	E[] --> Euler tour array, L[] --> Level of nodes
//  R[] --> Representative of node in E[], tree[] --> Segment Tree
int E[2*n], L[2*n], R[n+1], tree[100];
vector<int> G[n+1];	// Graph to store undirected graph
bool color[n+1];		// To mark nodes visited in DFS

void dfs (int, int);
void segmentTree (int, int, int);
int query (int, int, int, int, int);

int main() {
    fill(tree,tree+100,-1);
    fill(color,color+n+1,true);	// all nodes are unvisited
    fill(R,R+n+1,INT_MAX);		// to get first occurence of node
    int a, b;
    // n is number of nodes, hence there are n-1 edges
    for (int I=1; I<n ; I++) {
        cin >> a >> b;
        G[a].push_back(b);       //	two pushbacks as its undirected graph
        G[b].push_back(a);
    }
    dfs (1,0);		// root node is 1 and its level is 0
    //	third argument is 2*n-2 as (2*n-1)th is reptition of first node
    segmentTree(1,1,2*n-2);
//	Displaying contents of E[], L[], R[]

    for (int I=1; I<=2*n-1 ; I++ ) {
        cout << E[I]<<" ";
    }
    cout <<endl;

    for (int I=1; I<=2*n-1 ; I++ ) {
        cout << L[I]<<" ";
    }
    cout <<endl;

    for (int I=1; I<=n ; I++ ) {
        cout << R[I]<<" ";
    }
    cout <<endl;

   	//	give two nodes for which query is done
   	//	for e.g. take a=10, b=7 as per example
    cin >> a >> b;
    int l = R[a];
    int r = R[b];
    if (l > r)swap(l,r);// left index should always be less than right index
    int ans = query(1,1,2*n-2, l, r);
    cout << E[ans] <<endl;
    return 0;
}

void dfs (int root, int level) {
    static int index = 1;
    E[index] = root;
    L[index] = level;
    if (R[root] > index)
        R[root] = index;
    index++;
    color[root] = false;
    traverse(G[root],itr) {
        int curr = *itr;
        if (color[curr] == true) {
            dfs(curr, level+1);
            //	to keep records of finished node after traversal
            E[index] = root;
            L[index] = level;
            index++;
        }
    }
}

void segmentTree(int node, int start, int end) {
    if (start == end) {
        tree[node] = start;
        return ;
    }
    segmentTree(2*node, start, (start+end)>>1);
    segmentTree(2*node+1, ((start+end)>>1)+1, end);
    // keep the index of minimum
    if ( L[tree[2*node]] <=  L[tree[2*node+1]])
        tree[node]  = tree[2*node];
    else tree[node] = tree[2*node+1];
}

int query (int node, int start, int end, int left, int right) {
    if (left > end || right <start)
        return -1;
    if (left <= start && right >= end)
        return tree[node];
    int leftpart= query (2*node, start, (start+end)>>1, left, right);
    int rightpart = query (2*node+1, ((start+end)>>1)+1,end, left, right);
    if (leftpart == -1)return rightpart;
    if (rightpart == -1) return leftpart;
    // we want that index which has minimum value
    return L[leftpart]<L[rightpart] ? leftpart: rightpart;
}





August 15, 2012

How many paths of given length in a graph?

A graph is given and also a value N where N is the path length between two nodes in a graph.  Every edge is considered to be of length one. We've to tell how many different paths of given length could be in graph.
    Well, there's a easy solution to this. We've to build an adjacency matrix for the given graph and then raise this matrix up to the power N. Now just get the value of matrix[X][Y] where X is starting node and Y is ending node. That's it.
    E.g. There's an ant at top of tetrahedron and can go to any other vertex connected to it. Moving from one vertex to another increases the traversed length by one. Value of path length is given, say L. We've to tell in how many ways the ant can return to the vertex from where it started after traversing length L. Technically we've to find out the number of different cyclic paths with the length of L from starting vertex to itself.
   As it's a tetrahedron so this is a regular graph of degree 3. Just build the adjacency matrix and raise it to the power L.

Problem Statement:  http://codeforces.com/problemset/problem/166/E
My solution:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

public class tetrahedron{
    final static int MOD = 1000000007;
    final static int dim = 4;
    public static void main (String []args)throws IOException{
        BufferedReader br = new BufferedReader (new InputStreamReader(System.in));
        int T, N;
        long array[][] = new long[4][4];
        array[0][0] = 0;
        array[0][1] = 1;
        array[0][2] = 1;
        array[0][3] = 1;
        array[1][0] = 1;
        array[1][1] = 0;
        array[1][2] = 1;
        array[1][3] = 1;
        array[2][0] = 1;
        array[2][1] = 1;
        array[2][2] = 0;
        array[2][3] = 1;
        array[3][0] = 1;
        array[3][1] = 1;
        array[3][2] = 1;
        array[3][3] = 0;
        N = Integer.parseInt(br.readLine());
        long [][] ans = power_fun(array,N);
        long result = ans[3][3]%MOD;
        System.out.println(result);    
    }

    private static long [][] mul (long [][] A, long [][]B){
        long [][]res = new long [dim][dim];
        for (int I=0; I<dim;I++){
            for (int J=0;J<dim;J++){
                for (int K=0;K<dim;K++){
                    res[I][J] += (A[I][K]*B[K][J])%MOD;
                    res[I][J] %= MOD;
                }
            }
        }
        return res;
    }

    private static long [][] power_fun (long [][] base, int pow){
        if (pow == 1 )return base;
        long [][] temp = new long[dim][dim];
        temp = mul (base, base);
        temp = power_fun (temp, pow>>1);
        if ((pow &1) ==1)
            temp = mul (temp,base);
        return temp;
    }
}

August 13, 2012

Lazy propagation in Segment Tree


UPDATE 1: I've written a tutorial explaining basics of Segment tree along with example. You can find it at https://techienotes.info/2015/08/31/segment-tree-tutorial/

UPDATE 2: I've re-written this tutorial about lazy propagation. That updated article explains the technique in a much clear manner with a lot of comments in the code. I strongly recommend you to go through the new article. You can find it at https://techienotes.info/2015/09/01/lazy-propagation-in-segment-tree/


Question: An array of integers of $N$ elements from $1$ to $N$ is given. Two types of operations are to be done. You can add or subtract a value in the range of $p$ to $q$ where $1\le{p,q}\le{N}$ or you can ask for sum of elements between range of $p$ to $q$.



Solution: One can go for bruteforce but it is too slow. Suppose there are $T$ operations and each operation would require $O(n)$ in worst case for a loop $p$ to $q$. This approach leads to $O(T\cdot{N})$ complexity. If $T$ and $N$ both are $10^6$, this will lead to $O(10^{12})$ which isn't acceptable. However, this problem could be solved by other approaches like BIT(Binary Indexed Trees) and so on. I'm going to solve it by Segment Trees based on arrays. This array based Segment tree consumes more memory but it's easy to implement.
 I assume you know at least how to build a Segment Tree. Suppose initially all the elements of array are zero and that's why we don't have to build it explicitly. We can start updating it. What I'm using to update the range is either lazy propagation or something very similar to it. Well whatever it is, here's the idea.
For a given range find that parent node close to root as much as possible which includes the given range completely. Now update this node with value$\times$range of this node. Range of this node means the range of array for which this node is responsible. After this, we set a flag on it. Idea behind flag is that this range will be updated when the query for sum will be fired. When there is query for sum of a range, we start with root and proceed downwards the tree. If flag is set on any node, value of this node will be updated and flag will be passed on to its children, i.e flag of this node will unset and will be set on its children. In this procedure of finding sum we update only those nodes which we encounter in our range. 
Things become easy with example, so let's take this one. 

Problem Statement: http://www.spoj.pl/problems/HORRIBLE/
Here's my code:

import java.io.BufferedReader;
import java.io.IOException;
import java.util.Arrays;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

class horrible{
    private static long tree[], helper[];
    private static boolean color[];

    public static void main (String [] args) throws IOException{
        BufferedReader br = new BufferedReader (new InputStreamReader (System.in));
        StringTokenizer st =null;
        int t = Integer.parseInt(br.readLine());
        for(;t>0;t--){
            st = new StringTokenizer (br.readLine());
            int N = Integer.parseInt(st.nextToken());
            int C = Integer.parseInt(st.nextToken());
            init(N);
            for (int I=1;I<=C;I++){
                st = new StringTokenizer(br.readLine());
                if (st.nextToken().equals("0")){
                    int P = Integer.parseInt(st.nextToken());
                    int Q = Integer.parseInt(st.nextToken());
                    long V = Long.parseLong(st.nextToken());
                    update (1,1, N, P,Q, V);
                }
                else {
                    int P = Integer.parseInt(st.nextToken());
                    int Q = Integer.parseInt(st.nextToken());
                    long ans = sum_of_range (1,1,N, P,Q);
                    System.out.println(ans);
                }
            }
        }
    }

    private static void init(int N){
        tree = new long [262145];
        helper = new long [262145];
        color  = new boolean [262145];
        Arrays.fill (tree,0);
        Arrays.fill (helper, 0);
        Arrays.fill (color, false);
    }

    private static   void update (int node, int start, int end, int left, int right, long value){
        if (left > end || right <start )
            return ;
        if (start == end ){
            tree[node]+= value;
            if (color[node]==true){
                tree[node] += helper[node];
                helper[node]=0;
            }
            return;
        }
        if (left <=start && right >= end){
            helper[2*node] += value;
            helper[2*node+1] += value;
            color[2*node] = color[2*node+1] = true;
            tree[node] += (value *(end-start+1));
            return ;
        }
        else if (left <= end && right >end)
            tree[node ] += (value*(end-left+1));
        else if (start <= right && left <start)
            tree[node] += (value*(right-start+1));
        else if (left >= start && right <= end)
            tree[node] += (value*(right-left+1));

        update (2*node, start, (start+end)/2, left, right, value);
        update ((2*node)+1, ((start+end)/2)+1, end, left, right, value);
        return;
    }


    private static long sum_of_range (int node, int start, int end, int left, int right){
        if (left > end || right <start)
            return 0;
        if (color[node] == true){
            tree[node] += (helper[node] *(end-start+1));
            color[node]=false;
            if (start != end){
                color[2*node]=color[2*node+1] = true;
                helper[2*node] += helper[node];
                helper[2*node+1] += helper[node];
            }
            helper[node]=0;
        }
        if (left <= start && right >= end){
            return tree[node];
        }
        long s1 = sum_of_range(2*node, start, (start+end)/2, left, right);
        long s2 = sum_of_range(2*node+1,((start+end)/2)+1,end, left,right);
        return s1+s2;
    }
}





First of all, size of tree which is $262145$ isn't random figure. We build a segment tree for a given array with with $N$ leaf nodes. For a binary tree with $N$ leaf nodes its height would be $\lceil(\log_2N)\rceil$. So we need space of tree$[2^{\lceil(\log_2N)\rceil + 1}]$.Here $+1$ is because at $n^{th}$ level of binary tree number of elements is almost equal to the number of all nodes from root node to (n-1)th level in worst case so you need one extra level of height. In this case $N = 10^5$, $\lceil(\log_210^5)\rceil+1 = 18$ and $2^{18} = 262144$. Well here figure is $262145$, it's because in my practice. I prefer to allocate one more element to reduce off-by-one errors. Next, there are two other arrays helper and color. color is a boolean array to set/unset the flag on node and helper array is for storing the values in buffer until a query for sum is fired. e.g. there are five consecutive update queries and there is a node which is part of all of these queries. Suppose the five queries were to update the range by $2, 4, 1, 6, 3$. Now for every update query these values will added to helper. After these queries helper[node] will have value of $2+4+1+6+3=16$ assuming initially it was zero. Say, sixth query is for sum. Now, for e.g. if this node has range $6$ to $10$ then after sum query it will be $16\times(10-6+1) = 80$. $10-6+1$ is because this node is responsible for array element $6,7,8,9$ and $10$; a total of five elements. Do not forget to set the helper[node] to $0$ after updation. This approach is called lazy because any node won't be updated until it's needed. The values propagates from root to leaf nodes in lazy way.
Rest of the details could be easily figured out in the code itself. If something isn't clear, please leave a comment.

August 2, 2012

Minimum number of insertions in a string to make it palindrome.




Question: You are given a string and have to find out minimum number of characters to insert in it so that it becomes palindrome. A palindrome is a string which looks same either it is read from left to right or from from right to left.





Solution: Take any string, for e.g. “SARGAM”. This is not palindrome but if you delete the characters 'S', 'G', 'M' what remains is “ARA” and this is palindrome. Now this palindrome was already in the given string and its length is three. So basically we've to find out the length of longest palindrome which is already hidden inside in given string. In this way, our
required answer = length of given string - length of longest palindrome hidden inside it
    Now our task is to find the longest palindrome. If you think a little hard, you'll immediately find that this task is equivalent to finding the Longest Common Subsequence(LCS) of a given string and its reverse, which is a classical DP(Dynamic Programming) problem. I won't go into details of LCS as there are already lots of tutorial on the web. I'll dive directly into calculations. I'll highly recommend you to understand LCS problem and it's solution first, in case if you don't know.

    Given string is “SARGAM”, reverse of it would be “MAGRAS”. We've to find the LCS between these two strings.
Suppose there are two arrays, array_orig[]=”SARGAM” and array_rev[]=”MAGRAS” which we have to compare. We'll compare all characters of array_rev[] with array_orig[] one by one. We'll take first character of array_rev, which is 'M' and compare it with 'S' then 'A' then 'R' and so on till the length of array_orig[]. While comparing if both the characters are same it means that we'll add 'one' to the length of longest palindrome till found. If characters doesn't match then we'll simply take the maximum of values till found. If these lines doesn't make any sense to you, you should better check your concepts on finding LCS.
    In the given table, cell entries gives the length of longest palindrome till that cell. So the final cell which is highlighted in red color is length of longest palindrome in the given string. '0' in bold in row and column entry means comparison with empty string, i.e. of length zero.
 
Cell entries are filled with the formula
   if characters match:
            array$[I][J] =$ array$[I-1][J-1] + 1$
   else   array$[I][J] = $max (array$[I-1][J]$, array$[I][J-1]$);

// array is the name of this table given here.


0 S A R G A M
0 0 0 0 0 0 0 0
M 0 0 0 0 0 0 1
A 0 0 1 1 1 1 1
G 0 0 1 1 2 2 2
R 0 0 1 2 2 2 2
A 0 0 1 2 2 3 3
S 0 1 1 2 2 3 3


This method demands the two dimensional array which would need $O(n^2)$ space for string of length $'n'$. It means for the given string of length $100000$ characters you would need $O(10^{10})$ space which is too much. A little awareness could save you from this trouble. Observe that in array states we need only array$[I][J]$, array$[I-1][J]$ and array$[I][J-1]$, that's it. So we need only the current state and it's last state. For this, we'll need array$[2][length+2]$ where $'2'$ is because of two states and 'length' is length of string given.
After calculating the row entry of array$[I][J]$ copy it to array$[I-1][J]$.

Finally, required answer $= 6$(length of "SARGAM")$ - 3 $(red entry in the table)$ = 3$
So, the given string is "sargam" and after 3 insertions it would become "sMargRamS". Here, upper-case characters are inserted to make it palindrome.

Original Problem Statement can be found here:
http://www.spoj.pl/problems/AIBOHP 

My solution:
# include <stdio.h>
# include <string.h>

inline int max(int a, int b)
{return a>=b?a:b;}

int main()
{
    int tcases;
    scanf("%d",&tcases);
    char str[6100];
    int dp[2][6100];
    while (tcases--){
        scanf("%s",str);
        memset(dp, 0, sizeof dp);
        int I, J, K, length;
        length=strlen(str);
        for(I=length-1;I>=0;I--){
            char ch=str[I];
            for (J=0;J<length;J++)
                if (ch==str[J])
                    dp[1][J+1]=dp[0][J]+1;
                else dp[1][J+1]=max(dp[1][J], dp[0][J+1]);

            for (K=0;K<=length;K++)
                dp[0][K]=dp[1][K];
        }
        printf("%d\n",length-dp[1][J]);
    }
    return 0;
}
References:
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem