Generating a network graph of Twitter followers using Python and NetworkX


In this article I show how, starting from a single twitter account, we can build up a network graph of twitter followers and then visualize that network using the NetworkX library.

The steps are:

  1. Starting from an initial seed account, collect followers using the snowball sampling technique.
  2. Process the collected twitter data to generate an output file of relationships between twitter accounts.
  3. Visualize the network data as a network graph using the NetworkX library.
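
As a rough illustration of step 1, snowball sampling can be thought of as a breadth-first walk outward from the seed account, limited by a maximum depth. The sketch below is not the get_followers.py script used in this article. `get_followers` is a hypothetical callable standing in for a (cached) API lookup, and the toy dictionary is invented data.

```python
from collections import deque


def snowball_sample(seed, get_followers, max_depth, max_per_account=200):
    """Breadth-first snowball sample starting from `seed`.

    `get_followers` is a hypothetical callable (screen name -> list of
    screen names); in practice it would wrap a cached Twitter API call.
    Returns a dict mapping each visited account to the followers collected.
    """
    visited = {}
    queue = deque([(seed, 0)])
    while queue:
        account, depth = queue.popleft()
        # Skip accounts already expanded and anything beyond the depth limit.
        if account in visited or depth >= max_depth:
            continue
        followers = get_followers(account)[:max_per_account]
        visited[account] = followers
        for follower in followers:
            queue.append((follower, depth + 1))
    return visited


# Toy data standing in for API responses (invented for illustration).
toy_network = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["seed"],
    "c": ["d"],
}
sample = snowball_sample("seed", lambda name: toy_network.get(name, []), max_depth=2)
print(sorted(sample))
```

With `max_depth=2` only the seed and its direct followers are expanded; accounts two hops away are recorded as followers but not queried themselves.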

Step 1. Collect follower data from the Twitter API

You will need API keys to query the Twitter API. I have described how to get them in previous articles, e.g. Collecting tweets using Python.

When you interact with the Twitter API you quickly learn that you need to cache data as you go along. The API is rate limited, so any script that doesn’t cache responses will halt frequently when it hits a rate limit. The solution is to check for cached data before making an API call; on a cache miss, query the API and write the returned data to disk.
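
A minimal sketch of that cache-first pattern, written for Python 3. The helper name `cached_fetch` and the `fetch` callable are assumptions made for illustration (in a real script `fetch` would wrap the rate-limited API request); this is not the code from get_followers.py.

```python
import json
import os
import tempfile


def cached_fetch(cache_dir, key, fetch):
    """Return cached JSON for `key`, calling `fetch(key)` only on a cache miss.

    `fetch` is a hypothetical callable standing in for the real API request.
    """
    path = os.path.join(cache_dir, "%s.json" % key)
    if os.path.exists(path):
        # Cache hit: no API call is made.
        with open(path) as f:
            return json.load(f)
    data = fetch(key)  # cache miss: this is the rate-limited call
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    with open(path, "w") as f:
        json.dump(data, f, indent=1)
    return data


# Demonstration with a fake API that records how often it is called.
calls = []


def fake_api(user_id):
    calls.append(user_id)
    return {"id": user_id, "followers_ids": []}


cache_dir = os.path.join(tempfile.mkdtemp(), "twitter-users")
first = cached_fetch(cache_dir, 12345, fake_api)
second = cached_fetch(cache_dir, 12345, fake_api)
print(len(calls))
```

The second call is served from disk, so the fake API is only hit once.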

I use two directories for cached data. The directory ‘following’ contains a CSV file for each twitter account queried. The name of each file is the screen name of the twitter account and the content is a tab-delimited list: each row contains the twitter id, screen name and account name of a follower, up to a maximum of 200 followers.

$ ls -lh following/
-rw-r--r-- 1 mark mark 7.1K Aug 14 21:04 TEDxMtHood.csv
-rw-r--r-- 1 mark mark 7.0K Aug 14 21:21 TEDxYYC.csv
-rw-r--r-- 1 mark mark 5.7K Aug 15 07:29 TEDxCibeles.csv
-rw-r--r-- 1 mark mark 2.8K Aug 15 07:30 TEDxProvidence.csv
-rw-r--r-- 1 mark mark 6.9K Aug 15 07:46 TEDxUHasselt.csv
-rw-r--r-- 1 mark mark  625 Aug 15 07:46 TEDxWestVillage.csv
-rw-r--r-- 1 mark mark  196 Aug 15 07:46 TEDxESPRIT.csv
-rw-r--r-- 1 mark mark 2.9K Aug 15 08:02 TEDxUU.csv

$ cat following/TEDxESPRIT.csv
XXXXXXXXX       dediil  hedil jabou
XXXXXXXXX       MehdiBJemia     Mehdi Ben Jemia
XXXXXXXXX       _willywall      _william
XXXXXXXX        MirakHikimori   Hello Hikimori
XXXXXXXX        maroo_king      Marou
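
Reading one of these files back is straightforward with the standard csv module. The sketch below assumes exactly three tab-separated columns per row, as in the sample above; the function name `read_following` and the ids in the demo are placeholders, not taken from the scripts in this article.

```python
import csv
import io


def read_following(handle):
    """Parse rows of `twitter id <TAB> screen name <TAB> account name`."""
    rows = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        twitter_id, screen_name, name = row
        rows.append((twitter_id, screen_name, name))
    return rows


# Made-up rows in the same shape as the sample above (ids are placeholders).
sample_file = io.StringIO("111\tdediil\thedil jabou\n222\t_willywall\t_william\n")
for twitter_id, screen_name, name in read_following(sample_file):
    print(screen_name, "->", name)
```

For a real file, pass `open("following/TEDxESPRIT.csv", newline="")` instead of the in-memory sample.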

The second directory, ‘twitter-users’, is a cache of twitter user details. Each file contains cached data for one twitter user, including friend and follower counts and a list of follower IDs (up to a maximum of 5000 follower IDs can be queried from the API).

$ ls -lh twitter-users/
-rw-r--r-- 1 mark mark  252 Jul 24 16:45 XXXXXXXXX.json
-rw-r--r-- 1 mark mark  57K Jul 24 16:46 XXXXXXXX.json
-rw-r--r-- 1 mark mark 6.3K Jul 24 17:01 XXXXXXXXXX.json

... Lots more ...


$ cat twitter-users/XXXXXXXX.json
{
 "name": "TEDxSingapore",
 "friends_count": 147,
 "followers_count": 12814,
 "followers_ids": [
  XXXXXXXXXX,
  XXXXXXXXXX,
  XXXXXXXXX,
  ...
  XXXXXXXXXX,
  XXXXXXXXXX
 ],
 "id": XXXXXXXX,
 "screen_name": "TEDxSingapore"
}

Here is the script to collect this data:

Python file: get_followers.py

I ran this script twice: first without a filter on the screen name but limiting the maximum number of following accounts to 20, then again, this time filtering for accounts starting with ‘TED’ (line 102) and allowing up to 200 following accounts to be queried. This gives a mix of TED and non-TED twitter accounts. Running the script:

$ python get_followers.py -s TEDxSingapore -d 3

Max Depth: 3
Found 147 friends for TEDxSingapore
Found 200 friends for TEDWomen
Already been here.
Found 72 friends for TEDxDanteSchool
Found 33 friends for TEDHelp
Retrieving user details for twitter id XXXXXXXX from API...

... Lots more output ...

Step 2. Process twitter data to generate an output file of relationships between twitter accounts

The script below processes the data collected from the twitter API and generates an edge list, that is, a list of relationships between twitter accounts. Each row also includes a weight value: the total number of followers of the first twitter account, as retrieved from the API. The weight can be used later to prune the network graph.

Python file: twitter_network.py

The output generated from this script:

...

TEDxSingapore   trendwatchingAP 12814
adaptev TEDxSingapore   321
IS_magazine     TEDxSingapore   9955
trendwatchingAP TEDxSingapore   678
TEDxSingapore   GuyKawasaki     12814
TEDxSingapore   InnovateAP      12814
TEDxSingapore   InnosightTeam   12814
TEDxSingapore   ScottDAnthony   12814
TEDxSingapore   WorldAndScience 12814
TEDxSingapore   EntMagazine     12814
...
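
To make the format concrete, here is a minimal sketch of how rows in this shape could be written. It is not the twitter_network.py script; the function name `write_edges` and the two input dictionaries are assumptions for illustration, populated with a few values from the output above.

```python
import csv
import io


def write_edges(following, followers_count, out):
    """Write `source <TAB> target <TAB> weight` rows and return the row count.

    `following` maps an account to the accounts it is linked to, and
    `followers_count` maps an account to its total number of followers.
    The weight of each edge is the followers count of the first account.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    for source in sorted(following):
        for target in following[source]:
            writer.writerow([source, target, followers_count.get(source, 0)])
            count += 1
    return count


following = {
    "TEDxSingapore": ["trendwatchingAP", "GuyKawasaki"],
    "adaptev": ["TEDxSingapore"],
}
followers_count = {"TEDxSingapore": 12814, "adaptev": 321}

buffer = io.StringIO()
written = write_edges(following, followers_count, buffer)
print(buffer.getvalue())
```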

Step 3. Visualizing the Network using the NetworkX library

We now have all the data we need to generate a network graph. Here are the steps used to visualize the network graph:

  1. Create a directed graph (net.DiGraph) containing all the edge data including metadata.
  2. Remove nodes based on how connected they are to other nodes in the network (i.e. remove poorly connected nodes).
  3. Remove edges that have fewer than a minimum number of followers.
  4. Split nodes into two separate categories, ‘TED’ and ‘non-TED’ sets.
  5. Render each nodeset
  6. Render edges between nodes
  7. Render node labels
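
The two pruning steps (2 and 3) can be illustrated without any graph library. The sketch below works directly on edge tuples and is only meant to show the idea; visualize.py does the equivalent on a NetworkX DiGraph. The function names and thresholds are invented for the example, and the node pruning here is a single pass rather than a repeated one.

```python
from collections import Counter


def prune_nodes(edges, min_degree):
    """Drop edges touching any node with fewer than `min_degree` connections."""
    degree = Counter()
    for source, target, _ in edges:
        degree[source] += 1
        degree[target] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(s, t, w) for s, t, w in edges if s in keep and t in keep]


def prune_edges(edges, min_weight):
    """Keep edges whose weight (followers count of the source) is >= `min_weight`."""
    return [(s, t, w) for s, t, w in edges if w >= min_weight]


# A few rows from the edge list shown in step 2.
edges = [
    ("TEDxSingapore", "GuyKawasaki", 12814),
    ("TEDxSingapore", "InnovateAP", 12814),
    ("adaptev", "TEDxSingapore", 321),
    ("trendwatchingAP", "TEDxSingapore", 678),
    ("TEDxSingapore", "trendwatchingAP", 12814),
]

core = prune_nodes(edges, min_degree=2)
core = prune_edges(core, min_weight=1000)
print(core)
```

Pruning nodes first and edges second mirrors the order of the steps above; swapping the order can give a different result, because removing edges changes how connected each node is.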

Here is the code to generate the twitter network image. I wrote this code in an IPython Notebook, which is why Line 3 has a magic command that causes matplotlib output to be rendered in the browser:

Python file: visualize.py

  • Line 7: load edge data from disk.
  • Lines 9-13: create a directed graph from the edge data and populate a dictionary with the followers count data.
  • Line 18: centre and restrict the size of the graph around the SEED node (TEDxSingapore).
  • Lines 20-29: method to prune the network graph by eliminating nodes that don’t meet the filter criteria.
  • Lines 31-41: method to prune the network graph by eliminating edges that don’t meet the filter criteria.
  • Lines 44 and 46: remove nodes and edges from the network that don’t meet the filter criteria.
  • Lines 67-73: for each nodeset draw the nodes; the size of each node is based on the log value of the followers count.
  • Line 76: draw the network edges.
  • Lines 80-83: draw the network labels, using matplotlib directly rather than the net.draw_networkx_labels() method.
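
The nodeset split and the log-based node size can be sketched as follows. This is an illustration rather than an excerpt from visualize.py: the `scale` factor and the `+ 1` inside the logarithm (to avoid taking the log of zero) are my own assumptions, while the ‘TED’ prefix test follows the filter described in step 1.

```python
import math


def split_nodes(names):
    """Group account names into 'TED' and 'Not TED' nodesets by prefix."""
    nodesets = {"TED": [], "Not TED": []}
    for name in names:
        key = "TED" if name.startswith("TED") else "Not TED"
        nodesets[key].append(name)
    return nodesets


def node_size(followers, scale=100.0):
    """Node size grows with the log of the followers count.

    `scale` and the `+ 1` offset are illustrative choices, not values
    taken from the original script.
    """
    return scale * math.log(followers + 1)


followers_count = {"TEDxSingapore": 12814, "adaptev": 321}
colourmap = {"Not TED": "red", "TED": "green"}

for nodeset, names in split_nodes(followers_count).items():
    sizes = [node_size(followers_count[name]) for name in names]
    print(nodeset, colourmap[nodeset], names, sizes)
```

Using the logarithm keeps accounts with very large follower counts from dwarfing everything else in the plot.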

Output from running the script in IPython Notebook:

g:  119567
core after node pruning:  958
core after edge pruning:  198
Not TED 38
TED 160
colourmap:  {'Not TED': 'red', 'TED': 'green'}

[Figure: twitter network]


  • Shamit Bagchi

    I seem to be getting a rate limiting error (429) on line 110 of get_followers.py -> c = list(tweepy.Cursor(api.followers, id=user['id']).items()). I give a sleep of 2 minutes and yet this keeps recurring.

    • You need to wait longer than two minutes if you are rate limited. From memory you should wait for 15 minutes before making another API request.

  • I am trying to get my own followers, but I always get this error:

    >>> python ./get_followers.py -s Spanishwalker -d 3
    File "<stdin>", line 1
    python ./get_followers.py -s Spanishwalker -d 3
    ^
    SyntaxError: invalid syntax

    • The scripts on this page require Python 2. It looks from the error message that you are using Python 3. If you are not sure which version of python you are using you can type: "python --version" on the command line.

  • BrianOJee

    Hi, This is a great tutorial. I am trying to run this but can’t render the graph. I am running the code in Pycharm rather than IPython. I have tried including plt.show() at the end of visualize.py to display the graph, but have had no luck. Is there a way to run this locally? PS. I am new to networkx. Thanks , Brian

  • What if we don’t want to filter for accounts starting with ‘TED’, but just all of them. I tried to delete that chunk of code (from 102 to 133) and I always get an error of indentation, but I cannot find what the problem is. A kind of basic, I know, but…

  • arvind

    I ran get_followers.py and the following errors were generated.
    How can I rectify these problems, please?
    ————————————————————-

    python get_followers.py -s -d 1

    Max Depth: 1

    Traceback (most recent call last):
      File "twitter_data_col.py", line 173, in <module>
        matches = api.lookup_users(screen_names=[twitter_screenname])
      File "/usr/lib/python2.7/dist-packages/tweepy/api.py", line 164, in lookup_users
        return self._lookup_users(list_to_csv(user_ids), list_to_csv(screen_names))
      File "/usr/lib/python2.7/dist-packages/tweepy/binder.py", line 179, in _call
        return method.execute()
      File "/usr/lib/python2.7/dist-packages/tweepy/binder.py", line 162, in execute
        raise TweepError(error_msg, resp)
    tweepy.error.TweepError: Twitter error response: status code = 403

  • arvind

    I executed your code and got the following errors.
    How can we remove them? Can you help?
    ——————————————

    python twitter_data_col.py -s -d 1

    Max Depth: 1

    ---error---

    Traceback (most recent call last):
      File "twitter_data_col.py", line 173, in <module>
        matches = api.lookup_users(screen_names=[twitter_screenname])
      File "/usr/lib/python2.7/dist-packages/tweepy/api.py", line 164, in lookup_users
        return self._lookup_users(list_to_csv(user_ids), list_to_csv(screen_names))
      File "/usr/lib/python2.7/dist-packages/tweepy/binder.py", line 179, in _call
        return method.execute()
      File "/usr/lib/python2.7/dist-packages/tweepy/binder.py", line 162, in execute
        raise TweepError(error_msg, resp)
    tweepy.error.TweepError: Twitter error response: status code = 403

  • whichwitcho-o

    Hi Mark Kay, I found your tutorial very helpful. But could you please provide further instruction on how to get followers for a list of specific users? Say I have screen_names = [‘ABC’, ‘TEDxSingapore’, ‘TEDWomen’]. Thank you very much!

  • Akshay Govande

    Sir, the get followers code shows the error "argument -s/--screen_name is required". If I set required to false, it then shows a value error in int. I’m using Windows 7 and Python 2.7.