Fast Business Process Similarity Search with Feature-based Estimation

Zhiqiang Yan*, Remco Dijkman, Paul Grefen

Contents
• Business Process Similarity Search
• Process Graph Similarity Estimation
• Feature Matching and Process Graph Similarity
• Evaluation
• Conclusion

Business Process Similarity Search
Given a process model repository and a query process, similarity search returns all processes in the repository that are similar to the query process.

Business Process Similarity Search (figure: example process models that are similar to, and not similar to, a query process)

State of the art
• Dijkman et al. (BPM09) present algorithms that rank all the business process models in a repository based on their similarity to a given query process model.
• However, these algorithms compare the query model with every model in the repository. How can this be improved?

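Process Graph Similarity Estimation
• Divide the models into three sets: relevant, potentially relevant and irrelevant.
• Rank only the models in the “potentially relevant” set with the ranking algorithms, e.g., those of BPM09.
• How can similarity be estimated?
(figure: the repository is split into relevant, potentially relevant and irrelevant models; only the potentially relevant models are ranked)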

Features
Features are:
• small fragments
• characteristic for a model
This makes them suitable for quick, rough measurements.

Feature types:
• labels
• structures (start, stop, split, join and regular (sequence))
• the role of a node
• combinations of nodes

Features (example)
The number in parentheses is the number of nodes in the feature.
• Node feature: label {Buy Goods, Receive Goods, Verify Invoice}; role {(start, split), (regular), (join, stop)}
• Seq(2) feature: {(Buy Goods, Verify Invoice), (Buy Goods, Receive Goods), (Receive Goods, Verify Invoice)}
• Split(3) feature: {(Buy Goods, (Verify Invoice, Receive Goods))}
• Merge(3) feature: {((Buy Goods, Receive Goods), Verify Invoice)}
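The following minimal Python sketch extracts these feature types from the example model. It assumes the model is given as a list of (source, target) edges between uniquely labelled nodes. Several choices here are our own illustrative assumptions and do not come from the slides: the function name `extract_features`, the edge-list input, giving a node the role regular when no other role applies, and emitting one Split(3)/Merge(3) feature per pair of branches.

```python
from collections import defaultdict
from itertools import combinations


def extract_features(edges):
    """Extract label, role, Seq(2), Split(3) and Merge(3) features.

    Assumption: `edges` is a list of (source_label, target_label) pairs
    and labels uniquely identify nodes.
    """
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)
        nodes.update((a, b))

    roles = {}
    for n in nodes:
        r = set()
        if not pred[n]:
            r.add("start")   # no incoming edges
        if not succ[n]:
            r.add("stop")    # no outgoing edges
        if len(succ[n]) > 1:
            r.add("split")   # more than one outgoing edge
        if len(pred[n]) > 1:
            r.add("join")    # more than one incoming edge
        roles[n] = r or {"regular"}  # assumed: regular when no other role applies

    seq2 = set(edges)  # every edge is a two-node sequence
    # Assumed: one feature per pair of branches, branch labels sorted
    split3 = {(n, pair) for n in nodes for pair in combinations(sorted(succ[n]), 2)}
    merge3 = {(pair, n) for n in nodes for pair in combinations(sorted(pred[n]), 2)}
    return {"label": nodes, "role": roles, "seq2": seq2,
            "split3": split3, "merge3": merge3}


example = [("Buy Goods", "Receive Goods"),
           ("Buy Goods", "Verify Invoice"),
           ("Receive Goods", "Verify Invoice")]
features = extract_features(example)
```

On the example model, this reproduces the label, role, Seq(2), Split(3) and Merge(3) sets listed on the slide. The only difference is that each branch pair is printed in sorted order.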

Label Feature Similarity
• Label feature: the similarity of two labels l1 and l2 is their string edit distance Ed(l1,l2), normalized by the length of the longer label:
$$ lSim(l_1,l_2) = 1.0 - \frac{Ed(l_1,l_2)}{\max(|l_1|,|l_2|)} $$
• Example: lSim(l1,l2) = 1.0 - 7/13 = 0.46
• If lSim(l1,l2) >= lcutoff, the labels are considered similar.
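The label similarity can be sketched in Python with a character-level Levenshtein distance standing in for Ed. We assume plain, case-sensitive character edits; the slides do not specify any preprocessing. The function names `edit_distance` and `label_sim` are hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance (insert, delete, substitute; each costs 1)."""
    prev = list(range(len(t) + 1))  # distances from "" to prefixes of t
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]


def label_sim(l1: str, l2: str) -> float:
    """lSim = 1.0 - Ed(l1, l2) / max(|l1|, |l2|)."""
    longest = max(len(l1), len(l2))
    if longest == 0:
        return 1.0  # assumed: two empty labels are identical
    return 1.0 - edit_distance(l1, l2) / longest
```

For example, "kitten" and "sitting" have edit distance 3 and the longer label has 7 characters, so their lSim is 1.0 - 3/7, about 0.57. The pair would count as similar for any lcutoff at or below that value.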

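Role Feature Similarity
• Role feature: let croles = role(n1) ∩ role(n2), let ${}^{\bullet}n$ be the set of input nodes of n and $n^{\bullet}$ the set of its output nodes. Then
$$
rSim(n_1,n_2) =
\begin{cases}
1 & \text{if } start \in croles \wedge stop \in croles \\
avg\left(1 - \frac{\left|\,|{}^{\bullet}n_1| - |{}^{\bullet}n_2|\,\right|}{|{}^{\bullet}n_1| + |{}^{\bullet}n_2|},\ 1\right) & \text{if } start \notin croles \wedge stop \in croles \\
avg\left(1,\ 1 - \frac{\left|\,|n_1^{\bullet}| - |n_2^{\bullet}|\,\right|}{|n_1^{\bullet}| + |n_2^{\bullet}|}\right) & \text{if } start \in croles \wedge stop \notin croles \\
avg\left(1 - \frac{\left|\,|{}^{\bullet}n_1| - |{}^{\bullet}n_2|\,\right|}{|{}^{\bullet}n_1| + |{}^{\bullet}n_2|},\ 1 - \frac{\left|\,|n_1^{\bullet}| - |n_2^{\bullet}|\,\right|}{|n_1^{\bullet}| + |n_2^{\bullet}|}\right) & \text{if } start \notin croles \wedge stop \notin croles
\end{cases}
$$
• Example: the similarity of the input role is 1 - 0/(1+1) = 1 and the similarity of the output role is 1 - 2/2 = 0, so rSim(n1,n2) = (1+0)/2 = 0.5.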
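The four cases of rSim collapse into one rule. The input part is 1 when both nodes share the start role, and so is the output part when both share the stop role. Otherwise each part is 1 - |a - b| / (a + b) over the input or output counts. Below is a minimal Python sketch of that rule. The function names and the guard against a zero denominator are our assumptions.

```python
def _count_sim(a: int, b: int) -> float:
    """1 - |a - b| / (a + b); the a == b == 0 guard is an assumption."""
    if a + b == 0:
        return 1.0
    return 1.0 - abs(a - b) / (a + b)


def role_sim(roles1: set, roles2: set,
             in1: int, in2: int, out1: int, out2: int) -> float:
    """rSim(n1, n2) from the role sets and input/output node counts."""
    croles = roles1 & roles2
    # Shared start role: neither node has inputs, so the input part is 1
    in_sim = 1.0 if "start" in croles else _count_sim(in1, in2)
    # Shared stop role: neither node has outputs, so the output part is 1
    out_sim = 1.0 if "stop" in croles else _count_sim(out1, out2)
    return (in_sim + out_sim) / 2
```

With one input node each and 2 versus 0 output nodes (roles {split} and {stop}), this gives (1 + 0)/2 = 0.5, matching the slide's example.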

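Discriminative Role Feature
• Discriminative power: a role r is discriminative if only a small fraction of the nodes N have it:
$$ \frac{|\{n \mid n \in N,\ r \in role(n)\}|}{|N|} \le dcutoff \Rightarrow discriminative(r) $$
• Discriminative role feature:
$$ disc(n_1,n_2) = \begin{cases} 1 & \text{if } \exists\, r \in role(n_1) \cap role(n_2) : discriminative(r) \\ 0 & \text{otherwise} \end{cases} $$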
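A minimal Python sketch of the discriminative-role test follows. It assumes N is supplied as a list with one role set per node. Which nodes make up N (one model or the whole repository) is not stated on the slide, so that choice is left to the caller. The function names are hypothetical.

```python
from collections import Counter


def discriminative_roles(node_roles: list, dcutoff: float) -> set:
    """Roles carried by at most a fraction dcutoff of the nodes in N."""
    total = len(node_roles)
    if total == 0:
        return set()
    counts = Counter(r for roles in node_roles for r in roles)
    return {r for r, c in counts.items() if c / total <= dcutoff}


def disc(roles1: set, roles2: set, discriminative: set) -> int:
    """1 if the two nodes share at least one discriminative role, else 0."""
    return 1 if (roles1 & roles2) & discriminative else 0


# Hypothetical node set: one start/split node, three regular nodes, one join/stop node
nodes = [{"start", "split"}, {"regular"}, {"regular"}, {"regular"}, {"join", "stop"}]
d = discriminative_roles(nodes, dcutoff=0.4)
```

Here regular occurs on 3 of 5 nodes (0.6 > 0.4), so it is not discriminative, and two regular nodes get disc = 0. Any pair sharing start, stop, split or join gets disc = 1.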

Feature Similarity
• Role feature: nodes n1 and n2 are considered similar if rSim(n1,n2) * disc(n1,n2) >= rcutoff.

Feature Matching
• Node feature matching rules:
• lSim(l1,l2) >= lcutoffh: the nodes are matched.
• lSim(l1,l2) >= lcutoffm and rSim(n1,n2) * disc(n1,n2) >= rcutoff: the nodes are matched.
• Sequence, split and join feature matching rules are based on node feature matching.
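The two node matching rules can be sketched in Python as shown below. The slides only say that sequence, split and join matching are "based on" node matching. We assume one possible reading: a Seq(2) feature matches another when both of its corresponding node pairs match. The function names and cutoff values are hypothetical.

```python
from typing import Callable, Tuple


def nodes_match(l_sim: float, r_sim: float, disc: int,
                lcutoff_h: float, lcutoff_m: float, rcutoff: float) -> bool:
    """Node feature matching from precomputed lSim, rSim and disc values."""
    # Rule 1: the labels alone are similar enough
    if l_sim >= lcutoff_h:
        return True
    # Rule 2: moderately similar labels plus a similar, discriminative role
    return l_sim >= lcutoff_m and r_sim * disc >= rcutoff


def seq_match(f1: Tuple[str, str], f2: Tuple[str, str],
              node_match: Callable[[str, str], bool]) -> bool:
    """Assumed rule: Seq(2) features match if both node pairs match."""
    (a1, b1), (a2, b2) = f1, f2
    return node_match(a1, a2) and node_match(b1, b2)
```

With hypothetical cutoffs lcutoffh = 0.8, lcutoffm = 0.5 and rcutoff = 0.5, a label similarity of 0.6 is accepted only when rSim * disc reaches 0.5. Keeping the role test behind a lower label cutoff lets roles rescue borderline labels without matching nodes on roles alone.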

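Feature-based Process Graph Similarity and Pre-Selection
$$ GSim(g_1,g_2) = \frac{m_1 + m_2}{n_1 + n_2} $$
where m1 and m2 are the numbers of matched features in g1 and g2, and n1 and n2 are the numbers of features in g1 and g2.
• Two thresholds, GSim = ratior and GSim = ratiop, divide the models into relevant, potentially relevant and irrelevant sets.
• Is the search improved?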
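Below is a minimal Python sketch of GSim and the three-way pre-selection. It assumes two things the slide's figure does not spell out: models with GSim >= ratior are relevant, and models with ratiop <= GSim < ratior are potentially relevant. Only the potentially relevant set would then be passed to the expensive ranking algorithm. The function names and threshold values are hypothetical.

```python
def gsim(m1: int, m2: int, n1: int, n2: int) -> float:
    """GSim = (m1 + m2) / (n1 + n2); 0.0 for two feature-less graphs (assumed)."""
    return (m1 + m2) / (n1 + n2) if n1 + n2 else 0.0


def preselect(scores: dict, ratio_r: float, ratio_p: float):
    """Split models by GSim into relevant / potentially relevant / irrelevant.

    Assumed boundaries: >= ratio_r is relevant, >= ratio_p is potentially
    relevant, everything else is irrelevant.
    """
    relevant, potential, irrelevant = [], [], []
    for model, score in scores.items():
        if score >= ratio_r:
            relevant.append(model)
        elif score >= ratio_p:
            potential.append(model)
        else:
            irrelevant.append(model)
    return relevant, potential, irrelevant
```

For instance, 3 + 4 matched features out of 5 + 5 gives GSim = 0.7. With ratior = 0.8 and ratiop = 0.3, that model would land in the potentially relevant set and still be ranked.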

Quality Evaluation
• 100 'document' processes
• 10 'search' processes
• The 'documents' relevant to each 'search' are determined by human judgement.
• 'Documents' are retrieved based on features.
• The automatically retrieved results are compared with human judgement by computing precision(R).
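We assume precision(R) denotes the standard R-precision measure: the precision over the top R retrieved documents, where R is the number of documents judged relevant by humans. A minimal Python sketch under that assumption follows. The document IDs are made up.

```python
def r_precision(ranked: list, relevant: set) -> float:
    """Fraction of the top-R ranked documents that are relevant, R = |relevant|."""
    r = len(relevant)
    if r == 0:
        return 0.0  # assumed convention when nothing is relevant
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


# Hypothetical query: humans judged d1, d2 and d3 relevant (R = 3)
score = r_precision(["d1", "d5", "d2", "d7"], {"d1", "d2", "d3"})
```

Two of the top three results (d1, d2) are relevant, so R-precision is 2/3 for this query.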

Quality Evaluation

Time Evaluation
• 604 'document' processes
• 10 'search' processes
• Measure the time consumed by the search.

Time Evaluation

Conclusions
• 7 types of features are used to pre-select processes.
• Node and Path(2) features work well.
• Larger features do not help.
• Search time is reduced.
• Precision(R) remains stable.
