Fast Business Process Similarity Search with Feature-based Estimation
Zhiqiang Yan*, Remco Dijkman, Paul Grefen
Contents: Business Process Similarity Search; Process Graph Similarity Estimation; Feature Matching and Process Graph Similarity; Evaluation; Conclusion
Business Process Similarity Search Given a process model repository and a query process model, similarity search returns all process models in the repository that are similar to the query.
Business Process Similarity Search [Figure: example process models, marked "similar to" and "not similar to" the query.]
State of the art • Dijkman et al. (BPM09) present algorithms that rank all business process models in a repository based on their similarity to a given query process model. • However, these algorithms compare the query model with every model in the repository. How can this be improved?
Process Graph Similarity Estimation • Model sets: relevant, potentially relevant, irrelevant. • Only the models in the "potentially relevant" set are ranked with similarity algorithms, e.g., those of BPM09. How to estimate? [Figure: the repository split into relevant, potentially relevant, and irrelevant sets; only the potentially relevant set is ranked.]
Features Features are: • small fragments • characteristic of a model This makes them suitable for quick, rough measurements. Features are: • labels • structures (start, stop, split, join, and regular (sequence)) • role of a node • combination of nodes
Features • Node feature • Label: {Buy Goods, Receive Goods, Verify Invoice} • Role: {(start, split), (regular), (join, stop)} • Seq(2) feature: {(Buy Goods, Verify Invoice), (Buy Goods, Receive Goods), (Receive Goods, Verify Invoice)} • Split(3) feature: {(Buy Goods, (Verify Invoice, Receive Goods))} • Merge(3) feature: {((Buy Goods, Receive Goods), Verify Invoice)} The number in parentheses is the number of nodes in the feature.
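The feature sets on this slide can be derived mechanically from a process graph. The following is a minimal sketch, not the authors' implementation: the function name `extract_features`, the edge-list representation, and the rule that a node is "regular" when it has none of the other four roles are all assumptions made for illustration.

```python
from collections import defaultdict

def extract_features(nodes, edges):
    """Derive role, sequence, split, and join features from a process graph.

    nodes: list of node labels; edges: list of (source, target) label pairs.
    Hypothetical helper: the representation is an assumption, not from the slides.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    roles = {}
    for n in nodes:
        r = set()
        if not preds[n]:
            r.add("start")      # no incoming edge
        if not succs[n]:
            r.add("stop")       # no outgoing edge
        if len(succs[n]) > 1:
            r.add("split")      # more than one outgoing edge
        if len(preds[n]) > 1:
            r.add("join")       # more than one incoming edge
        if not r:
            r.add("regular")    # assumption: regular means none of the above
        roles[n] = frozenset(r)

    seqs = set(edges)  # Seq(2): every edge is a two-node sequence
    splits = {(n, frozenset(succs[n])) for n in nodes if len(succs[n]) > 1}
    joins = {(frozenset(preds[n]), n) for n in nodes if len(preds[n]) > 1}
    return roles, seqs, splits, joins

# The three-node example from the slide.
nodes = ["Buy Goods", "Receive Goods", "Verify Invoice"]
edges = [("Buy Goods", "Verify Invoice"),
         ("Buy Goods", "Receive Goods"),
         ("Receive Goods", "Verify Invoice")]
roles, seqs, splits, joins = extract_features(nodes, edges)
```

On the slide's example this yields the roles (start, split), (regular), and (join, stop), three Seq(2) features, one Split(3) feature, and one Merge(3) feature.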
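Label Feature Similarity
\[ \mathrm{lSim}(l_1, l_2) = 1.0 - \frac{\mathrm{Ed}(l_1, l_2)}{\max(|l_1|, |l_2|)} \]
where $\mathrm{Ed}(l_1, l_2)$ is the string edit distance between label1 and label2, and $\max(|l_1|, |l_2|)$ is the maximum length of label1 and label2. • Label feature • Example: $\mathrm{lSim}(l_1, l_2) = 1.0 - 7/13 = 0.46$ • $\mathrm{lSim}(l_1, l_2) \ge lcutoff$: similar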
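A short sketch of the label similarity computation, assuming the string edit distance is the standard Levenshtein distance over raw characters (the slide does not say whether labels are normalized first). The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def label_similarity(l1, l2):
    """lSim(l1, l2) = 1.0 - Ed(l1, l2) / max(|l1|, |l2|)."""
    longest = max(len(l1), len(l2))
    if longest == 0:
        return 1.0  # assumption: two empty labels count as identical
    return 1.0 - edit_distance(l1, l2) / longest
```

For instance, `label_similarity("kitten", "sitting")` evaluates 1.0 - 3/7, since the edit distance between those two strings is 3 and the longer one has 7 characters.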
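Role Feature Similarity
\[
\mathrm{rSim}(n_1, n_2) =
\begin{cases}
1 & \text{if } start \in croles \wedge stop \in croles \\
\mathrm{avg}\!\left(1 - \frac{\mathrm{abs}(|{\bullet}n_1| - |{\bullet}n_2|)}{|{\bullet}n_1| + |{\bullet}n_2|},\; 1\right) & \text{if } start \notin croles \wedge stop \in croles \\
\mathrm{avg}\!\left(1,\; 1 - \frac{\mathrm{abs}(|n_1{\bullet}| - |n_2{\bullet}|)}{|n_1{\bullet}| + |n_2{\bullet}|}\right) & \text{if } start \in croles \wedge stop \notin croles \\
\mathrm{avg}\!\left(1 - \frac{\mathrm{abs}(|{\bullet}n_1| - |{\bullet}n_2|)}{|{\bullet}n_1| + |{\bullet}n_2|},\; 1 - \frac{\mathrm{abs}(|n_1{\bullet}| - |n_2{\bullet}|)}{|n_1{\bullet}| + |n_2{\bullet}|}\right) & \text{if } start \notin croles \wedge stop \notin croles
\end{cases}
\]
where $croles = role(n_1) \cap role(n_2)$, $|{\bullet}n|$ is the number of inputs of node $n$, and $|n{\bullet}|$ is the number of outputs. • Role feature • Example: similarity of input role: $1 - 0/(1+1) = 1$; similarity of output role: $1 - 2/2 = 0$ • $\mathrm{rSim}(n_1, n_2) = (1 + 0)/2 = 0.5$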
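The four cases can be sketched as follows. This assumes the reading in which the input term is replaced by 1 when both nodes are start nodes and the output term is replaced by 1 when both are stop nodes; the function names and the degree-based signature are hypothetical.

```python
def degree_similarity(d1, d2):
    """1 - abs(d1 - d2) / (d1 + d2), for input or output counts."""
    total = d1 + d2
    if total == 0:
        return 1.0  # assumption: two nodes without such edges are identical here
    return 1.0 - abs(d1 - d2) / total

def role_similarity(roles1, in1, out1, roles2, in2, out2):
    """rSim(n1, n2) from the roles and the input/output counts of two nodes."""
    croles = set(roles1) & set(roles2)
    both_start = "start" in croles
    both_stop = "stop" in croles
    if both_start and both_stop:
        return 1.0
    # A shared start role fixes the input term at 1; a shared stop role fixes the output term at 1.
    input_sim = 1.0 if both_start else degree_similarity(in1, in2)
    output_sim = 1.0 if both_stop else degree_similarity(out1, out2)
    return (input_sim + output_sim) / 2

# Slide example: both nodes have 1 input; one has 2 outputs, the other has none.
example = role_similarity({"split"}, 1, 2, {"stop"}, 1, 0)
```

With these inputs the input term is 1 and the output term is 0, so `example` is 0.5, matching the worked example on the slide.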
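Discriminative Role Feature
\[
\mathrm{disc}(n_1, n_2) =
\begin{cases}
1 & \text{if } \exists\, r \in role(n_1) \cap role(n_2) : discriminative(r) \\
0 & \text{otherwise}
\end{cases}
\]
• Discriminative role feature • A role $r$ is discriminative if $|\{n \mid n \in N,\ r \in role(n)\}| / |N| \le dcutoff$ • [Figure: discriminative power]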
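A minimal sketch of the discriminative-role test. It assumes that N is the set of all nodes under consideration (the slide does not say whether this is one model or the whole repository); the function names and the dictionary input are hypothetical.

```python
from collections import Counter

def discriminative_roles(node_roles, dcutoff):
    """Return the roles r for which |{n in N : r in role(n)}| / |N| <= dcutoff.

    node_roles: dict mapping each node in N to its set of roles.
    """
    total = len(node_roles)
    counts = Counter(r for roles in node_roles.values() for r in roles)
    return {r for r, c in counts.items() if c / total <= dcutoff}

def disc(roles1, roles2, discriminative):
    """1 if the two nodes share at least one discriminative role, else 0."""
    return 1 if set(roles1) & set(roles2) & set(discriminative) else 0

# Illustrative data only: five nodes, most of them regular.
node_roles = {"a": {"start"}, "b": {"regular"}, "c": {"regular"},
              "d": {"regular"}, "e": {"stop"}}
rare = discriminative_roles(node_roles, dcutoff=0.25)
```

Here "start" and "stop" each occur on one node in five (0.2), so they are discriminative under a cutoff of 0.25, while "regular" (0.6) is not.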
Feature Similarity • Role feature • $\mathrm{rSim}(n_1, n_2) \cdot \mathrm{disc}(n_1, n_2) \ge rcutoff$: similar
Feature Matching • Node feature matching rules: • $\mathrm{lSim}(l_1, l_2) \ge lcutoff_h$: matched • $\mathrm{lSim}(l_1, l_2) \ge lcutoff_m$ and $\mathrm{rSim}(n_1, n_2) \cdot \mathrm{disc}(n_1, n_2) \ge rcutoff$: matched • Sequence, split, and join feature matching rules: • based on node feature matching
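The two node matching rules can be sketched as a single predicate. The slide only says that sequence, split, and join matching is based on node matching, so `sequences_match` below is one possible reading (position-wise node matching), not the authors' stated rule. All names and the cutoff values in the usage note are placeholders.

```python
def nodes_match(lsim, rsim, disc, lcutoff_h, lcutoff_m, rcutoff):
    """Apply the two node feature matching rules.

    Rule 1: the label similarity alone reaches lcutoff_h.
    Rule 2: the label similarity reaches lcutoff_m and rSim * disc reaches rcutoff.
    """
    if lsim >= lcutoff_h:
        return True
    return lsim >= lcutoff_m and rsim * disc >= rcutoff

def sequences_match(seq1, seq2, node_matcher):
    """Assumed rule: two sequences match if their nodes match position by position."""
    return len(seq1) == len(seq2) and all(
        node_matcher(a, b) for a, b in zip(seq1, seq2))
```

For example, with placeholder cutoffs `lcutoff_h=0.8`, `lcutoff_m=0.2`, and `rcutoff=1.0`, a pair with `lsim=0.46`, `rsim=1.0`, `disc=1` matches through the second rule, while the same pair with `rsim=0.5` does not.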
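Feature-based Process Graph Similarity and Pre-Selection
\[ \mathrm{GSim}(g_1, g_2) = \frac{m_1 + m_2}{n_1 + n_2} \]
where $m_1 + m_2$ is the number of features that are matched and $n_1 + n_2$ is the number of features in $g_1$ and $g_2$. [Figure: the thresholds $\mathrm{GSim} = ratio_r$ and $\mathrm{GSim} = ratio_p$ separate the relevant, potentially relevant, and irrelevant sets.] Improved?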
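A sketch of the similarity estimate and the pre-selection step. It assumes that ratio_r is the upper threshold and ratio_p the lower one, so that models at or above ratio_r are relevant, models between the two thresholds are potentially relevant, and the rest are irrelevant; the slide shows only the two thresholds and the three sets. Function names are hypothetical.

```python
def graph_similarity(m1, n1, m2, n2):
    """GSim(g1, g2) = (m1 + m2) / (n1 + n2).

    m1, m2: matched features in g1 and g2; n1, n2: total features in g1 and g2.
    """
    total = n1 + n2
    if total == 0:
        return 0.0  # assumption: graphs without features are treated as dissimilar
    return (m1 + m2) / total

def preselect(gsim, ratio_r, ratio_p):
    """Classify a model by its estimated similarity (assumes ratio_r >= ratio_p)."""
    if gsim >= ratio_r:
        return "relevant"
    if gsim >= ratio_p:
        return "potentially relevant"
    return "irrelevant"
```

Only models classified as "potentially relevant" would then be passed to the more expensive ranking algorithm, which is where the time saving is expected to come from.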
Quality Evaluation • 100 'document' processes • 10 'search' processes • 'documents' relevant to each 'search' determined by human judgement • retrieve 'documents' based on features • compare automatically retrieved results with human judgement • compute precision(R)
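The slide does not define precision(R). The sketch below assumes it denotes R-precision from information retrieval: the fraction of relevant documents among the top R retrieved results, where R is the number of documents judged relevant for that search. The function name and the sample data are illustrative only.

```python
def r_precision(ranked, relevant):
    """Precision at rank R, where R = number of relevant documents.

    ranked: retrieved documents in rank order; relevant: documents judged relevant.
    """
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        return 0.0  # assumption: undefined case reported as 0
    hits = sum(1 for doc in ranked[:r] if doc in relevant)
    return hits / r

# Illustrative data: 3 relevant documents, 2 of them in the top 3 results.
score = r_precision(["a", "x", "b", "c"], {"a", "b", "c"})
```

In this example `score` is 2/3, because two of the three relevant documents appear among the first three results.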
Time Evaluation • 604 'document' processes • 10 'search' processes • measure the time consumed
Conclusions • 7 types of features to pre-select processes • Node and Path(2) features work well • Larger features do not help • Search time is reduced • Precision(R) is stable