0 votes
in Blockchain by (4.1k points)

I have all 150 GB of Bitcoin blocks. Now what? How do I open and read them in Python? I need to extract every hash160 used so far.

I tried to open them with Berkeley DB, but without success; it seems these files aren't Berkeley DB files. Also, what is the difference between the blkxxxxx.dat and revxxxxx.dat files? The revxxxxx.dat files seem to be noticeably smaller.

1 Answer

0 votes
by (14.4k points)

You can extract all used addresses from the Bitcoin blockchain over RPC from Bitcoin Core with a short Python script. Because Bitcoin Core handles all the parsing, this is much easier than reading the block files yourself. Just make sure Bitcoin Core is running with txindex=1, which getrawtransaction needs in order to look up arbitrary transactions.

You also need to install the following dependency before running the code:

sudo pip install python-bitcoinrpc

Here’s a Python script that extracts all used addresses:

import sys

from bitcoinrpc.authproxy import AuthServiceProxy

# Placeholder connection settings (assumed values): use the rpcuser and
# rpcpassword from your own bitcoin.conf. 8332 is the default mainnet RPC port.
RPC_ADDRESS = "127.0.0.1:8332"
RPC_USER = "bitcoinrpc"
RPC_PASSWORD = "change-me"


def connect(address, user, password):
    return AuthServiceProxy("http://%s:%s@%s" % (user, password, address))


def extract_block_addresses(rpc, block_hash):
    """Return every address paid by an output in the given block."""
    block = rpc.getblock(block_hash)
    addresses = []
    for tx in block["tx"]:
        raw_tx = rpc.getrawtransaction(tx, True)
        if "vout" not in raw_tx:
            sys.stderr.write("Transaction %s has no 'vout': %s\n" % (tx, raw_tx))
            continue
        for vout in raw_tx["vout"]:
            if "scriptPubKey" not in vout:
                sys.stderr.write("Vout %s of Transaction %s has no 'scriptPubKey'\n" % (vout, tx))
                continue
            script = vout["scriptPubKey"]
            if script["type"] == "nulldata":
                # arbitrary data (OP_RETURN): there is no address to record
                continue
            elif "addresses" in script:
                addresses.extend(script["addresses"])
            elif "address" in script:
                # Bitcoin Core 22.0 and later report a single "address" field
                addresses.append(script["address"])
            else:
                sys.stderr.write("Can't handle %s transaction output type in transaction %s\n" % (script["type"], raw_tx))
    return addresses


if __name__ == "__main__":
    # Optional arguments: first and last block height (0 means "up to the tip").
    start_block = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    end_block = int(sys.argv[2]) if len(sys.argv) > 2 else 0

    rpc = connect(RPC_ADDRESS, RPC_USER, RPC_PASSWORD)
    if end_block == 0:
        end_block = rpc.getblockcount()

    for b in range(start_block, end_block + 1):
        try:
            addrs = extract_block_addresses(rpc, rpc.getblockhash(b))
        except Exception:
            # The connection may have dropped: reconnect and retry the block once.
            rpc = connect(RPC_ADDRESS, RPC_USER, RPC_PASSWORD)
            addrs = extract_block_addresses(rpc, rpc.getblockhash(b))
        for addr in addrs:
            print(addr)
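The script prints Base58 addresses, while the question asks for hash160 values. The two are related by Base58Check: a legacy address encodes one version byte, the 20-byte hash160, and a 4-byte checksum. Below is a minimal sketch of the conversion using only the standard library. The function names `base58check_decode` and `address_to_hash160` are made up for this example, and the sketch assumes legacy (P2PKH/P2SH) addresses only; bech32 addresses starting with bc1 use a different encoding and are rejected with a ValueError.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_decode(address):
    """Decode a Base58Check string into (version_byte, payload)."""
    num = 0
    for ch in address:
        # str.index raises ValueError for characters outside the alphabet
        num = num * 58 + B58_ALPHABET.index(ch)
    raw = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' in the string stands for one leading zero byte.
    pad = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * pad + raw
    if len(raw) < 5:
        raise ValueError("too short to be Base58Check")
    body, checksum = raw[:-4], raw[-4:]
    expected = hashlib.sha256(hashlib.sha256(body).digest()).digest()[:4]
    if checksum != expected:
        raise ValueError("bad checksum")
    return body[0], body[1:]


def address_to_hash160(address):
    """Return the 20-byte hash160 of a legacy address as a hex string."""
    _version, payload = base58check_decode(address)
    if len(payload) != 20:
        raise ValueError("payload is not a hash160")
    return payload.hex()
```

For example, `address_to_hash160("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")` gives the hash160 of the genesis block's coinbase address. You could pipe the script's output through this function line by line, logging the addresses it rejects.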

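Bitcoin Core runs with 4 RPC threads by default, so it can pay off to run four instances of the script in parallel, each on its own block range, and to compress the output. In the commands below, extract_addresses.py is a placeholder for whatever file name you saved the script under:

time python extract_addresses.py 1 100000 2> bad_transaction-1.log | gzip -9 > addresses-1.gz &

time python extract_addresses.py 100000 200000 2> bad_transaction-2.log | gzip -9 > addresses-2.gz &

time python extract_addresses.py 200000 300000 2> bad_transaction-3.log | gzip -9 > addresses-3.gz &

time python extract_addresses.py 300000 2> bad_transaction-4.log | gzip -9 > addresses-4.gz &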
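The block ranges above are picked by hand. As a rough sketch, they could also be computed so that the chunks are contiguous and do not overlap. The helper name `split_block_range` is hypothetical, and the ranges are inclusive on both ends to match how the script treats its two arguments.

```python
def split_block_range(start, end, parts):
    """Split the inclusive range [start, end] into up to `parts`
    contiguous, non-overlapping inclusive (first, last) ranges."""
    if parts < 1 or end < start:
        raise ValueError("need parts >= 1 and end >= start")
    total = end - start + 1
    parts = min(parts, total)          # never create empty ranges
    size, extra = divmod(total, parts)
    ranges = []
    first = start
    for i in range(parts):
        # The first `extra` chunks get one additional block each.
        last = first + size - 1 + (1 if i < extra else 0)
        ranges.append((first, last))
        first = last + 1
    return ranges


if __name__ == "__main__":
    for first, last in split_block_range(1, 400000, 4):
        print(first, last)
```

Each printed pair can be passed to one instance of the script as its start and end block.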

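If your hard drive proves to be the bottleneck, run a single instance instead (extract_addresses.py again stands for whatever file name you saved the script under):

time python extract_addresses.py 2> bad.log | gzip -9 > addresses.gz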

The same address usually appears in many transactions, so the output contains duplicates. To list each address only once, use this:

zcat addresses.gz | sort -u
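If you ran several instances, the same deduplication can be done across all the compressed files from Python. This is only a sketch: `unique_addresses` is a name invented here, and it holds every distinct address in memory, so for the full chain the `sort -u` pipeline above, which can spill to disk, is likely the safer choice.

```python
import gzip


def unique_addresses(paths):
    """Read gzip-compressed files with one address per line and
    return the distinct addresses in sorted order."""
    seen = set()
    for path in paths:
        with gzip.open(path, "rt") as handle:
            for line in handle:
                addr = line.strip()
                if addr:               # skip blank lines
                    seen.add(addr)
    return sorted(seen)
```

A call such as `unique_addresses(["addresses-1.gz", "addresses-2.gz"])` would merge the output of two instances.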
