
I have downloaded all 150 GB of Bitcoin blocks. Now what? How do I open and read them in Python? I need to extract every hash160 that has been used so far.

I tried to open them with Berkeley DB, without success; it seems these files aren't Berkeley DB. And what is the difference between the blkxxxxx.dat and revxxxxx.dat files anyway? The revxxxxx.dat files seem to be noticeably smaller.

1 Answer


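You can extract all the used addresses (P2PKH and P2SH addresses each encode a hash160) from the Bitcoin blockchain with a short Python script that queries Bitcoin Core over JSON-RPC. Because Bitcoin Core handles all the parsing of the block files, this is much easier than reading them yourself. Just make sure bitcoind is running with txindex=1 (in bitcoin.conf or as a command-line option); without the transaction index, getrawtransaction cannot look up arbitrary confirmed transactions by txid.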
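Regarding the file-format part of the question: blk*.dat files are not Berkeley DB. They contain raw blocks in the order Bitcoin Core received them (not necessarily by height), each record being the 4-byte network magic, a 4-byte little-endian length and the serialized block. rev*.dat files hold undo data (the outputs each block spent), which Bitcoin Core uses to disconnect blocks during a reorganization, so they are smaller and contain nothing you need here. Below is a minimal sketch of walking the records of one blk file and computing each block hash. It assumes mainnet data and a file that is not XOR-obfuscated (recent Bitcoin Core releases can obfuscate block files on disk); the function name is illustrative, and decoding the transactions inside each block is left out.

```python
import hashlib
import struct
from typing import BinaryIO, Iterator, Tuple

# Network magic that prefixes every record in a mainnet blk*.dat file
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_blk_records(f: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (block_hash_hex, raw_block) for each record in a blk*.dat stream.

    Assumes mainnet data and a file that is not XOR-obfuscated.
    """
    while True:
        prefix = f.read(8)
        if len(prefix) < 8:
            return  # end of file
        magic = prefix[:4]
        (size,) = struct.unpack("<I", prefix[4:])
        if magic != MAINNET_MAGIC:
            # The unused, preallocated tail of a blk file is zero-filled
            return
        raw_block = f.read(size)
        if len(raw_block) < size:
            return  # truncated record, e.g. a file still being written
        # Block hash: double SHA-256 of the 80-byte header, displayed byte-reversed
        header = raw_block[:80]
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        yield digest[::-1].hex(), raw_block
```

For example, open a file such as blocks/blk00000.dat in binary mode and iterate over iter_blk_records(f) to list its block hashes and sizes. Parsing transactions and scripts on top of this is exactly the work the RPC approach lets Bitcoin Core do for you.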

Also, make sure the following dependency is installed before running the code:

sudo pip install python-bitcoinrpc

Here's a Python script (bitcoin-addresses.py) that prints every address paid to by the outputs in a range of blocks:

import sys

from bitcoinrpc.authproxy import AuthServiceProxy

RPC_ADDRESS = "127.0.0.1:8332"
RPC_USER = "user"
RPC_PASSWORD = "password"


def connect(address, user, password):
    # Open a JSON-RPC connection to the local bitcoind
    return AuthServiceProxy("http://%s:%s@%s" % (user, password, address))


def extract_block_addresses(rpc, block_hash):
    # Collect every address paid to by the outputs of one block
    block = rpc.getblock(block_hash)
    addresses = []
    for txid in block["tx"]:
        # Decoded transaction; needs txindex=1 for non-wallet transactions
        raw_tx = rpc.getrawtransaction(txid, True)
        if "vout" not in raw_tx:
            sys.stderr.write("Transaction %s has no 'vout': %s\n" % (txid, raw_tx))
            continue  # skip this transaction, keep processing the block
        for vout in raw_tx["vout"]:
            spk = vout.get("scriptPubKey")
            if spk is None:
                sys.stderr.write("Vout %s of transaction %s has no 'scriptPubKey'\n" % (vout, txid))
                continue
            if spk.get("type") == "nulldata":
                # OP_RETURN output with arbitrary data and no address
                continue
            if "addresses" in spk:
                # Older Bitcoin Core releases return a list of addresses
                addresses.extend(spk["addresses"])
            elif "address" in spk:
                # Newer releases return a single "address" field
                addresses.append(spk["address"])
            else:
                sys.stderr.write("Can't handle %s output type in transaction %s\n" % (spk.get("type"), txid))
    return addresses


def print_block_addresses(rpc, height):
    block_hash = rpc.getblockhash(height)
    for addr in extract_block_addresses(rpc, block_hash):
        print(addr)


if __name__ == "__main__":
    # Usage: bitcoin-addresses.py [start_block] [end_block]; omitting end_block means the chain tip
    start_block = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    end_block = int(sys.argv[2]) if len(sys.argv) > 2 else 0

    rpc = connect(RPC_ADDRESS, RPC_USER, RPC_PASSWORD)
    if end_block == 0:
        end_block = rpc.getblockcount()

    for height in range(start_block, end_block + 1):
        try:
            print_block_addresses(rpc, height)
        except Exception:
            # The HTTP connection can drop during long runs: reconnect and retry once
            rpc = connect(RPC_ADDRESS, RPC_USER, RPC_PASSWORD)
            print_block_addresses(rpc, height)
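The script prints addresses, but the question asks for hash160 values. For the common output types the hash160 can be read straight from the hex field of each output's scriptPubKey in the decoded transaction. Here is a sketch under that assumption; the helper name hash160_from_script is hypothetical, it covers only the P2PKH, P2SH and P2WPKH templates, and other types (P2PK, bare multisig, P2WSH, Taproot) return None.

```python
import re
from typing import Optional

# Standard output-script templates that embed a 20-byte hash160,
# matched against the lowercase hex of a scriptPubKey
_HASH160_TEMPLATES = (
    re.compile(r"76a914([0-9a-f]{40})88ac"),  # P2PKH: OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
    re.compile(r"a914([0-9a-f]{40})87"),      # P2SH: OP_HASH160 <20> OP_EQUAL
    re.compile(r"0014([0-9a-f]{40})"),        # P2WPKH: OP_0 <20>
)


def hash160_from_script(script_hex: str) -> Optional[str]:
    """Return the hash160 (hex) embedded in a standard script, or None."""
    script_hex = script_hex.lower()
    for pattern in _HASH160_TEMPLATES:
        # fullmatch anchors the template to the whole script
        match = pattern.fullmatch(script_hex)
        if match:
            return match.group(1)
    return None
```

Inside the output loop you could call hash160_from_script(vout["scriptPubKey"]["hex"]) and print the result instead of, or alongside, the address.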

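Bitcoin Core serves RPC requests with 4 threads by default (the rpcthreads setting), so running four instances in parallel over separate block ranges can speed things up. Piping each instance's output through gzip keeps the address lists small: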

time python bitcoin-addresses.py 1 100000 2> bad_transaction-1.log | gzip -9 > addresses-1.gz &
time python bitcoin-addresses.py 100001 200000 2> bad_transaction-2.log | gzip -9 > addresses-2.gz &
time python bitcoin-addresses.py 200001 300000 2> bad_transaction-3.log | gzip -9 > addresses-3.gz &
time python bitcoin-addresses.py 300001 2> bad_transaction-4.log | gzip -9 > addresses-4.gz &
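Choosing the ranges by hand is easy to get wrong, for instance when a boundary height ends up in two ranges. A small hypothetical helper that splits an inclusive height range into contiguous, non-overlapping pieces, one per instance:

```python
from typing import List, Tuple


def split_heights(start: int, end: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [start, end] into at most `parts`
    contiguous, non-overlapping inclusive ranges of near-equal length."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    total = end - start + 1
    base, extra = divmod(total, parts)
    ranges = []
    lo = start
    for i in range(parts):
        # The first `extra` ranges get one additional height
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer heights than parts
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

Equal height ranges are not equal work, though: later blocks contain far more transactions, so the last range will take the longest, and you may prefer wider early ranges.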

If the hard drive turns out to be the bottleneck, parallel instances won't help much; simply run a single instance over the whole chain:

time python bitcoin-addresses.py 2> bad.log | gzip -9 > addresses.gz

A reused address appears once for every output that pays it, so the raw list contains duplicates. To get each address only once, sort and deduplicate the list:

zcat addresses.gz | sort -u
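If you post-process the address list rather than the scripts, each legacy address (starting with 1 or 3 on mainnet) is a Base58Check encoding of one version byte, the 20-byte hash160 and a 4-byte checksum, so the hash160 can be recovered by decoding it. A self-contained sketch; the function names are illustrative, and bech32 (bc1...) SegWit addresses use a different encoding that this does not handle (they raise ValueError here).

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_decode(s: str) -> bytes:
    """Decode a Base58Check string and return the payload without its checksum."""
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)  # raises ValueError on invalid characters
    # Each leading '1' encodes one leading zero byte
    pad = len(s) - len(s.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b""
    raw = b"\x00" * pad + body
    payload, checksum = raw[:-4], raw[-4:]
    if hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] != checksum:
        raise ValueError("bad checksum")
    return payload


def address_to_hash160(addr: str) -> str:
    """Return the hex hash160 of a mainnet P2PKH (0x00) or P2SH (0x05) address."""
    payload = base58check_decode(addr)
    if len(payload) != 21 or payload[0] not in (0x00, 0x05):
        raise ValueError("not a mainnet P2PKH/P2SH address")
    return payload[1:].hex()
```

You could feed the deduplicated list through address_to_hash160 line by line and skip lines that raise ValueError.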
