Overview

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection.
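
The idea of treating walks as sentences can be illustrated with a minimal sketch. It is not the package's implementation: the function names, the toy graph, and the parameter values are assumptions made for illustration. Each vertex starts a fixed number of walks, and each walk is cut off after a fixed number of steps.

```python
import random


def random_walk(graph, start, walk_length, rng):
    """Return a truncated random walk of at most `walk_length` vertices."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(graph, walks_per_vertex, walk_length, seed=0):
    """Start `walks_per_vertex` walks from every vertex, in shuffled order."""
    rng = random.Random(seed)
    vertices = list(graph)
    corpus = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)
        for vertex in vertices:
            corpus.append(random_walk(graph, vertex, walk_length, rng))
    return corpus


# Toy undirected graph stored as adjacency lists (hypothetical data).
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
corpus = build_corpus(graph, walks_per_vertex=2, walk_length=5)
print(len(corpus))  # 8 walks: 2 per vertex
```

Each walk in ``corpus`` plays the role of a sentence whose words are vertex ids, so the corpus could then be handed to a skip-gram style language model to learn one vector per vertex.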

Code

An implementation of DeepWalk is available on GitHub.

Usage

Example Usage

$ deepwalk --input example_graphs/karate.adjlist --output karate.embeddings

--input: input_filename

  1. --format adjlist: for an adjacency list, e.g::

     1 2 3 4 5 6 7 8 9 11 12 13 14 18 20 22 32
     2 1 3 4 8 14 18 20 22 31
     3 1 2 4 8 9 10 14 28 29 33
     ...
    
  2. --format edgelist: for an edge list, e.g::

     1 2
     1 3
     1 4
     ...
    
  3. --format mat: for a Matlab MAT file containing an adjacency matrix (note: you must also specify the variable name of the adjacency matrix with --matfile-variable-name)
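
To make the two text formats concrete, here is a minimal sketch of how such files could be read into a plain dictionary. It is not the package's own loader: ``parse_adjlist`` and ``parse_edgelist`` are hypothetical helper names, and treating every edge as undirected is an assumption of this example.

```python
from collections import defaultdict


def parse_adjlist(lines):
    """Each line holds a vertex followed by its neighbors, whitespace separated."""
    graph = {}
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            graph[tokens[0]] = tokens[1:]
    return graph


def parse_edgelist(lines):
    """Each line holds two vertex ids forming one edge.

    Assumption: edges are undirected, so both directions are stored.
    """
    graph = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 2:
            u, v = tokens[0], tokens[1]
            graph[u].append(v)
            graph[v].append(u)
    return dict(graph)


# The same triangle written in both formats (made-up data).
adj = parse_adjlist(["1 2 3", "2 1 3", "3 1 2"])
edges = parse_edgelist(["1 2", "1 3", "2 3"])
print(adj == edges)  # True
```

Vertex ids are kept as strings here, which avoids assuming that every id in a file is numeric.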

--output: output_filename

The output representations are written in skipgram format: the first line is a header giving the number of nodes and the dimension *d*, and every other line holds a node id followed by its *d*-dimensional representation::

    34 64
    1 0.016579 -0.033659 0.342167 -0.046998 ...
    2 -0.007003 0.265891 -0.351422 0.043923 ...
    ...
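
The output file can be read back with a few lines of Python. The sketch below assumes the layout described above (a header line, then one node per line) and uses a hypothetical helper name, ``load_embeddings``; the sample values are made up.

```python
import io

import numpy as np


def load_embeddings(handle):
    """Read the header, then one node id plus d floats per line."""
    num_nodes, dim = map(int, handle.readline().split())
    node_ids, rows = [], []
    for line in handle:
        tokens = line.split()
        if not tokens:  # skip blank lines
            continue
        node_ids.append(tokens[0])
        rows.append([float(x) for x in tokens[1:]])
    matrix = np.array(rows, dtype=np.float64)
    if matrix.shape != (num_nodes, dim):
        raise ValueError("header does not match the data")
    return node_ids, matrix


# Tiny stand-in for an output file: 2 nodes, 3 dimensions (hypothetical values).
sample = io.StringIO("2 3\n1 0.1 0.2 0.3\n2 -0.4 0.5 0.6\n")
node_ids, matrix = load_embeddings(sample)
print(node_ids, matrix.shape)  # ['1', '2'] (2, 3)
```

For a real file, pass an open file object instead of the ``io.StringIO`` stand-in, for example ``open("karate.embeddings")``. Row ``i`` of ``matrix`` is the representation of ``node_ids[i]``, which is a convenient shape for feeding a classifier.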

Full Command List

The full list of command line options is available with $ deepwalk --help

Requirements

  • numpy
  • scipy

(These may have to be installed independently.)

Installation

  1. cd deepwalk

  2. pip install -r requirements.txt

  3. python setup.py install

Citing

If you find DeepWalk useful in your research, we ask that you cite the following paper::

    @inproceedings{Perozzi:2014:DOL:2623330.2623732,
     author = {Perozzi, Bryan and Al-Rfou, Rami and Skiena, Steven},
     title = {DeepWalk: Online Learning of Social Representations},
     booktitle = {Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
     series = {KDD '14},
     year = {2014},
     isbn = {978-1-4503-2956-9},
     location = {New York, New York, USA},
     pages = {701--710},
     numpages = {10},
     url = {http://doi.acm.org/10.1145/2623330.2623732},
     doi = {10.1145/2623330.2623732},
     acmid = {2623732},
     publisher = {ACM},
     address = {New York, NY, USA},
     keywords = {deep learning, latent representations, learning with partial labels, network classification, online learning, social networks},
    }