Community search

Discovering communities in a network, known as community detection or community discovery, is a fundamental problem in network science that has attracted much attention over the past several decades. In recent years, with the growth of research on big data, a related but different problem called community search has attracted great attention from both academia and industry. Community search aims to find the most likely community that contains a given query node, which makes it a query-dependent variant of the community detection problem. A detailed survey of community search can be found in [1], which reviews the recent studies.[2][3][4][5][6][7][8][9][10][11]

Main advantages

As pointed out by the first work on community search,[2] published at SIGKDD 2010, many existing community detection/discovery methods consider the static community detection problem, where the graph needs to be partitioned a priori with no reference to query nodes. Community search, in contrast, focuses on the most likely communities containing the query vertex. The main advantages of community search over community detection/discovery are listed below:

(1) High personalization.[3][9][10] Community detection/discovery often uses the same global criterion to decide whether a subgraph qualifies as a community. In other words, the criterion is fixed and predetermined. In reality, however, communities for different vertices may have very different characteristics. Community search allows users to specify personalized query conditions, and these conditions also make the returned communities easier to interpret.

For example, a recent work[9] focuses on attributed graphs, where nodes are associated with attributes such as keywords, and tries to find communities, called attributed communities, that exhibit both strong structural cohesiveness and keyword cohesiveness. Users specify a query node together with two further query conditions: (1) a value k, the minimum degree required of vertices in the expected communities; and (2) a set of keywords, which controls the semantics of the expected communities. The returned communities can be easily interpreted through the keywords shared by all the community members. More details can be found in [11].
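To make the query conditions above concrete, the following Python sketch answers a simplified attributed community query. It is only an illustration and does not reproduce the algorithms of [9]: it assumes, as a simplification, that every member must carry all of the query keywords, and the names `attributed_community`, `adj`, and `keywords` are hypothetical. The graph is assumed to be undirected and stored as a dictionary that maps each node to a set of neighbours.

```python
from collections import deque


def attributed_community(adj, keywords, q, k, query_keywords):
    """Return the connected subgraph containing q in which every member
    has all query keywords and at least k neighbours inside the subgraph.
    Returns an empty set if no such subgraph exists.

    Simplifying assumption (not taken from the cited work): every member
    must carry ALL query keywords.
    """
    required = set(query_keywords)
    # Keep only vertices whose keyword set covers every required keyword.
    candidates = {v for v in adj if required <= keywords.get(v, set())}
    if q not in candidates:
        return set()

    # Degree of each candidate inside the keyword-induced subgraph.
    deg = {v: sum(1 for u in adj[v] if u in candidates) for v in candidates}

    # Peel vertices whose degree is below k until none remain.
    queue = deque(v for v in candidates if deg[v] < k)
    removed = set(queue)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in candidates and u not in removed:
                deg[u] -= 1
                if deg[u] < k:
                    removed.add(u)
                    queue.append(u)
    if q in removed:
        return set()

    # Collect the connected component of q among the surviving vertices.
    community, frontier = {q}, deque([q])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u in candidates and u not in removed and u not in community:
                community.add(u)
                frontier.append(u)
    return community


# Toy graph: a 4-clique in which node "c" lacks the keyword "ml".
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
keywords = {
    "a": {"db", "ml"},
    "b": {"db", "ml"},
    "c": {"db"},
    "d": {"db", "ml"},
}

print(sorted(attributed_community(adj, keywords, "a", 2, {"ml"})))  # ['a', 'b', 'd']
print(sorted(attributed_community(adj, keywords, "a", 3, {"db"})))  # ['a', 'b', 'c', 'd']
```

In the first query the shared keyword "ml" explains why "c" is excluded, which illustrates how the keyword condition makes the result interpretable.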

(2) High efficiency. With the rapid growth of social networks in recent years, many real-world graphs have become very large. For example, social networks such as Facebook and Twitter have on the order of hundreds of millions to billions of users. Because community detection/discovery typically finds all the communities in an entire social network, it can be very costly and time-consuming. In contrast, community search often works on a subgraph, which is much more efficient. Moreover, detecting all the communities in an entire social network is often unnecessary. In real applications such as recommendation and social media marketing, people usually focus on the communities they are actually interested in, rather than on all communities.

Some recent studies[4][9] have shown that, for million-scale graphs, community search often takes less than 1 second to find a well-defined community, which is generally much faster than many existing community detection/discovery methods. This also implies that community search is more suitable for finding communities in big graphs.

(3) Support for dynamically evolving graphs.[3] Almost all real-life graphs evolve over time. Since community detection often uses the same global criterion to find communities, it is not sensitive to updates of nodes and edges in the graph. In other words, the detected communities may lose freshness after a short period of time. In contrast, community search handles this easily, since it searches for communities in an online manner, based on a query request.

Community search often uses well-defined, fundamental graph metrics to formulate the cohesiveness of communities. The commonly used metrics include k-core (minimum degree),[2][4][6][7][9] k-truss,[5][8] and k-edge-connectivity.[12][13] Among these measures, the k-core metric is the most popular one, and it has been used in many recent studies, as surveyed in [1].
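As an illustration of the k-core metric, the Python sketch below returns the connected k-core that contains a query node: the connected subgraph around the query node in which every vertex has at least k neighbours. It peels low-degree vertices from the whole graph, which is the simplest formulation rather than the local or index-based methods of the cited studies. The function names `build_adjacency` and `k_core_community` are hypothetical, and the graph is assumed to be simple and undirected.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency dictionary from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_core_community(adj, q, k):
    """Return the connected k-core containing q, or an empty set."""
    if q not in adj:
        return set()

    # Repeatedly remove vertices whose remaining degree is below k.
    deg = {v: len(adj[v]) for v in adj}
    queue = deque(v for v in adj if deg[v] < k)
    removed = set(queue)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                if deg[u] < k:
                    removed.add(u)
                    queue.append(u)
    if q in removed:
        return set()

    # Breadth-first search from q over the surviving vertices.
    community, frontier = {q}, deque([q])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u not in removed and u not in community:
                community.add(u)
                frontier.append(u)
    return community


# Toy graph: a 4-clique {a, b, c, d}, a pendant node e, and a triangle {x, y, z}.
edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"),
    ("x", "y"), ("y", "z"), ("x", "z"),
]
adj = build_adjacency(edges)

print(sorted(k_core_community(adj, "a", 3)))  # ['a', 'b', 'c', 'd']
print(sorted(k_core_community(adj, "x", 2)))  # ['x', 'y', 'z']
print(sorted(k_core_community(adj, "e", 2)))  # []
```

The answer depends on the query node: the same graph yields different communities for "a" and "x", and none for "e" when k = 2.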

References

  1. Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, Xuemin Lin. 2019. A Survey of Community Search over Big Graphs. arXiv link: https://arxiv.org/abs/1904.12539.
  2. Mauro Sozio and Aristides Gionis. 2010. The community-search problem and how to plan a successful cocktail party. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '10). ACM, New York, NY, USA, 939-948. DOI=https://dx.doi.org/10.1145/1835804.1835923
  3. Wanyun Cui, Yanghua Xiao, Haixun Wang, Yiqi Lu, and Wei Wang. 2013. Online search of overlapping communities. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13). ACM, New York, NY, USA, 277-288. DOI=https://dx.doi.org/10.1145/2463676.2463722
  4. Wanyun Cui, Yanghua Xiao, Haixun Wang, and Wei Wang. 2014. Local search of communities in large graphs. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 991-1002. DOI=https://dx.doi.org/10.1145/2588555.2612179
  5. Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. 2014. Querying k-truss community in large and dynamic graphs. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 1311-1322. DOI=https://dx.doi.org/10.1145/2588555.2610495
  6. Rong-Hua Li, Lu Qin, Jeffrey Xu Yu, and Rui Mao. 2015. Influential community search in large networks. Proc. VLDB Endow. 8, 5 (January 2015), 509-520. DOI=https://dx.doi.org/10.14778/2735479.2735484
  7. Nicola Barbieri, Francesco Bonchi, Edoardo Galimberti, and Francesco Gullo. 2015. Efficient and effective community search. Data Min. Knowl. Discov. 29, 5 (September 2015), 1406-1433. DOI=https://dx.doi.org/10.1007/s10618-015-0422-1
  8. Xin Huang, Laks V. S. Lakshmanan, Jeffrey Xu Yu, and Hong Cheng. 2015. Approximate closest community search in networks. Proc. VLDB Endow. 9, 4 (December 2015), 276-287. DOI=https://dx.doi.org/10.14778/2856318.2856323
  9. Yixiang Fang, Reynold Cheng, Siqiang Luo, Jiafeng Hu. 2016. Effective community search for large attributed graphs. Proc. VLDB Endow. 9, 12, 1233-1244.
  10. Yixiang Fang, Reynold Cheng, Xiaodong Li, Siqiang Luo, Jiafeng Hu. 2017. Effective community search over large spatial graphs. Proc. VLDB Endow. 10, 6, 709-720.
  11. http://i.cs.hku.hk/~yxfang/acq.html
  12. Lijun Chang, Xuemin Lin, Lu Qin, Jeffrey Xu Yu, and Wenjie Zhang. 2015. Index-based optimal algorithms for computing Steiner components with maximum connectivity. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, 459-474.
  13. Jiafeng Hu, Xiaowei Wu, Reynold Cheng, Siqiang Luo, and Yixiang Fang. 2017. On minimal Steiner maximum-connected subgraph queries. IEEE Transactions on Knowledge and Data Engineering 29, 11 (2017), 2455-2469.