Performance

This section compares clickhouse-driver performance over Native interface with TSV and JSONEachRow formats available over HTTP interface.

clickhouse-driver returns rows already parsed into Python data types; the driver performs all transformations for you.

When you read data over the HTTP interface, you may need to cast strings into Python types yourself.
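
For example (a minimal sketch, assuming a local server and the perftest.ontime table created in the next section), the same FlightDate and Year values come back as plain strings over HTTP but as a datetime.date and an int from the driver:

import requests
from clickhouse_driver import Client

query = "SELECT FlightDate, Year FROM perftest.ontime LIMIT 1"

# HTTP interface: every value arrives as text
resp = requests.get('http://localhost:8123/', params={'query': query + ' FORMAT TSV'})
print(resp.text.split('\t'))      # e.g. ['2017-01-01', '2017\n'] -- plain strings

# Native interface: values arrive as Python objects
client = Client.from_url('clickhouse://localhost')
print(client.execute(query))      # e.g. [(datetime.date(2017, 1, 1), 2017)]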

Test data

Sample data for testing is taken from the ClickHouse docs.

Create database and table:

DROP DATABASE IF EXISTS perftest;

CREATE DATABASE perftest;

CREATE TABLE perftest.ontime (
  Year UInt16,
  Quarter UInt8,
  Month UInt8,
  DayofMonth UInt8,
  DayOfWeek UInt8,
  FlightDate Date,
  UniqueCarrier FixedString(7),
  AirlineID Int32,
  Carrier FixedString(2),
  TailNum String,
  FlightNum String,
  OriginAirportID Int32,
  OriginAirportSeqID Int32,
  OriginCityMarketID Int32,
  Origin FixedString(5),
  OriginCityName String,
  OriginState FixedString(2),
  OriginStateFips String,
  OriginStateName String,
  OriginWac Int32,
  DestAirportID Int32,
  DestAirportSeqID Int32,
  DestCityMarketID Int32,
  Dest FixedString(5),
  DestCityName String,
  DestState FixedString(2),
  DestStateFips String,
  DestStateName String,
  DestWac Int32,
  CRSDepTime Int32,
  DepTime Int32,
  DepDelay Int32,
  DepDelayMinutes Int32,
  DepDel15 Int32,
  DepartureDelayGroups String,
  DepTimeBlk String,
  TaxiOut Int32,
  WheelsOff Int32,
  WheelsOn Int32,
  TaxiIn Int32,
  CRSArrTime Int32,
  ArrTime Int32,
  ArrDelay Int32,
  ArrDelayMinutes Int32,
  ArrDel15 Int32,
  ArrivalDelayGroups Int32,
  ArrTimeBlk String,
  Cancelled UInt8,
  CancellationCode FixedString(1),
  Diverted UInt8,
  CRSElapsedTime Int32,
  ActualElapsedTime Int32,
  AirTime Int32,
  Flights Int32,
  Distance Int32,
  DistanceGroup UInt8,
  CarrierDelay Int32,
  WeatherDelay Int32,
  NASDelay Int32,
  SecurityDelay Int32,
  LateAircraftDelay Int32,
  FirstDepTime String,
  TotalAddGTime String,
  LongestAddGTime String,
  DivAirportLandings String,
  DivReachedDest String,
  DivActualElapsedTime String,
  DivArrDelay String,
  DivDistance String,
  Div1Airport String,
  Div1AirportID Int32,
  Div1AirportSeqID Int32,
  Div1WheelsOn String,
  Div1TotalGTime String,
  Div1LongestGTime String,
  Div1WheelsOff String,
  Div1TailNum String,
  Div2Airport String,
  Div2AirportID Int32,
  Div2AirportSeqID Int32,
  Div2WheelsOn String,
  Div2TotalGTime String,
  Div2LongestGTime String,
  Div2WheelsOff String,
  Div2TailNum String,
  Div3Airport String,
  Div3AirportID Int32,
  Div3AirportSeqID Int32,
  Div3WheelsOn String,
  Div3TotalGTime String,
  Div3LongestGTime String,
  Div3WheelsOff String,
  Div3TailNum String,
  Div4Airport String,
  Div4AirportID Int32,
  Div4AirportSeqID Int32,
  Div4WheelsOn String,
  Div4TotalGTime String,
  Div4LongestGTime String,
  Div4WheelsOff String,
  Div4TailNum String,
  Div5Airport String,
  Div5AirportID Int32,
  Div5AirportSeqID Int32,
  Div5WheelsOn String,
  Div5TotalGTime String,
  Div5LongestGTime String,
  Div5WheelsOff String,
  Div5TailNum String
) ENGINE = MergeTree
PARTITION BY Year
ORDER BY (Carrier, FlightDate)
SETTINGS index_granularity = 8192;

Download data for the year 2017:

for s in `seq 2017 2017`
do
for m in `seq 1 12`
do
wget https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_${s}_${m}.zip
done
done

Insert data into ClickHouse:

for i in *.zip; do echo $i; unzip -cq $i '*.csv' | sed 's/\.00//g' | clickhouse-client --query="INSERT INTO perftest.ontime FORMAT CSVWithNames"; done

Required packages

pip install clickhouse-driver requests clickhouse-connect

For fast JSON parsing we’ll use the ujson package:

pip install ujson
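
For our purposes ujson.loads is a drop-in replacement for the standard library json.loads, so the JSONEachRow snippets below would work with either module:

from ujson import loads

# one JSONEachRow line parsed into a dict, exactly as in the scripts below
row = loads('{"Year": 2017, "FlightDate": "2017-01-01"}')
print(list(row.values()))  # [2017, '2017-01-01']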

Installed packages:

$ pip freeze
backports.zoneinfo==0.2.1
certifi==2022.12.7
charset-normalizer==3.0.1
clickhouse-connect==0.5.0
clickhouse-driver==0.2.5
idna==3.4
lz4==4.3.2
pytz==2022.7.1
pytz-deprecation-shim==0.1.0.post0
requests==2.28.2
tzdata==2022.7
tzlocal==4.2
ujson==5.7.0
urllib3==1.26.14
zstandard==0.19.0

For clickhouse-connect we need to turn off compression with compress=False to eliminate decompression overhead. This package also adds a LIMIT clause to queries by default; we disable that with query_limit=None.
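
Both options are passed when the client is constructed, as in the clickhouse-connect snippets below:

import clickhouse_connect

# no response compression and no implicit LIMIT added to queries
client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)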

Versions

Machine: Linux klebedev-ThinkPad-T460 5.15.0-57-generic #63-Ubuntu SMP Thu Nov 24 13:43:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Python: Python 3.8.12 (default, Apr 13 2022, 21:16:23) [GCC 11.2.0]

Benchmarking

Let’s pick the number of rows for testing with clickhouse-client.

SELECT count() FROM ontime WHERE FlightDate < '2017-01-04'
45202

SELECT count() FROM ontime WHERE FlightDate < '2017-01-10'
131848

SELECT count() FROM ontime WHERE FlightDate < '2017-01-16'
217015

SELECT count() FROM ontime WHERE FlightDate < '2017-02-01'
450017

SELECT count() FROM ontime WHERE FlightDate < '2017-02-18'
697813

The scripts below can be benchmarked with the following one-liner:

for d in 2017-01-04 2017-01-10 2017-01-16 2017-02-01 2017-02-18; do /usr/bin/time -f "%e s / %M kB" python script.py $d; done

Here /usr/bin/time will measure:

  • elapsed real (wall clock) time used by the process, in seconds;

  • maximum resident set size of the process during its lifetime, in kilobytes.
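
If you prefer to take the same two measurements from inside Python, a minimal sketch (assuming Linux, where ru_maxrss is reported in kilobytes) could look like this:

import resource
import time

start = time.monotonic()
# ... run the benchmarked snippet here ...
elapsed = time.monotonic() - start

# peak resident set size of this process, in kilobytes on Linux
max_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print('{:.2f} s / {} kB'.format(elapsed, max_rss_kb))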

Plain text without parsing

Let’s take the plain-text response from the ClickHouse server as a baseline.

Fetching unparsed data with pure requests (1); the output format, TSV or JSONEachRow, is passed as the second command-line argument

import sys
import requests

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT {}".format(sys.argv[1], sys.argv[2])
data = requests.get('http://localhost:8123/', params={'query': query})

Parsed rows

A line split into elements will be considered “parsed” for the TSV format (2)

import sys
import requests

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

data = [line.decode('utf-8').split('\t') for line in resp.iter_lines(chunk_size=10000)]

Now we cast each element to its data type (2.5)

from datetime import date
import sys
import requests


def get_python_type(ch_type):
    if ch_type.startswith('Int') or ch_type.startswith('UInt'):
        return int

    elif ch_type == 'String' or ch_type.startswith('FixedString'):
        return None

    elif ch_type == 'Date':
        return lambda value: date(*[int(x) for x in value.split('-')])

    raise ValueError(f'Unsupported type: "{ch_type}"')


resp = requests.get('http://localhost:8123', params={'query': 'describe table perftest.ontime FORMAT TSV'})
ch_types = [x.split('\t')[1] for x in resp.text.split('\n') if x]
python_types = [get_python_type(x) for x in ch_types]

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

data = []

for line in resp.iter_lines(chunk_size=10000):
    data.append([cls(x) if cls else x for x, cls in zip(line.decode('utf-8').split('\t'), python_types)])

The JSONEachRow format can be loaded with ujson.loads (3)

import sys
import requests
from ujson import loads

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

data = [list(loads(line).values()) for line in resp.iter_lines(chunk_size=10000)]

Get fully parsed rows with clickhouse-driver in Native format (4)

import sys
from clickhouse_driver import Client

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
client = Client.from_url('clickhouse://localhost')

data = client.execute(query)

Get fully parsed rows with clickhouse-connect (14)

import sys
import clickhouse_connect

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)

data = client.query(query).result_rows

Iteration over rows

Iteration over TSV (5)

import sys
import requests

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

for line in resp.iter_lines(chunk_size=10000):
    line = line.decode('utf-8').split('\t')

Now we cast each element to its data type (5.5)

from datetime import date
import sys
import requests


def get_python_type(ch_type):
    if ch_type.startswith('Int') or ch_type.startswith('UInt'):
        return int

    elif ch_type == 'String' or ch_type.startswith('FixedString'):
        return None

    elif ch_type == 'Date':
        return lambda value: date(*[int(x) for x in value.split('-')])

    raise ValueError(f'Unsupported type: "{ch_type}"')


resp = requests.get('http://localhost:8123', params={'query': 'describe table perftest.ontime FORMAT TSV'})
ch_types = [x.split('\t')[1] for x in resp.text.split('\n') if x]
python_types = [get_python_type(x) for x in ch_types]

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

for line in resp.iter_lines(chunk_size=10000):
    line = [cls(x) if cls else x for x, cls in zip(line.decode('utf-8').split('\t'), python_types)]

Iteration over JSONEachRow (6)

import sys
import requests
from ujson import loads

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

for line in resp.iter_lines(chunk_size=10000):
    line = list(loads(line).values())

Iteration over rows with clickhouse-driver in Native format (7)

import sys
from clickhouse_driver import Client

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
client = Client.from_url('clickhouse://localhost')

for row in client.execute_iter(query):
    pass

Iteration over rows with clickhouse-connect (17)

import sys
import clickhouse_connect

query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)

rv = client.query(query)
with rv:
    for row in rv.stream_rows():
        pass

Iteration over string rows

OK, but what if we need only string columns?

Iteration over TSV (8)

import sys
import requests

cols = [
    'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
    'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
    'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
    'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
    'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
    'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
    'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
    'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
    'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
    'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
]

query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(', '.join(cols), sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

for line in resp.iter_lines(chunk_size=10000):
    line = line.decode('utf-8').split('\t')

Iteration over JSONEachRow (9)

import sys
import requests
from ujson import loads

cols = [
    'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
    'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
    'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
    'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
    'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
    'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
    'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
    'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
    'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
    'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
]

query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(', '.join(cols), sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

for line in resp.iter_lines(chunk_size=10000):
    line = list(loads(line).values())

Iteration over string rows with clickhouse-driver in Native format (10)

import sys
from clickhouse_driver import Client

cols = [
    'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
    'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
    'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
    'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
    'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
    'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
    'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
    'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
    'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
    'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
]

query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
client = Client.from_url('clickhouse://localhost')

for row in client.execute_iter(query):
    pass

Iteration over string rows with clickhouse-connect (15)

import sys
import clickhouse_connect

cols = [
    'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
    'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
    'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
    'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
    'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
    'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
    'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
    'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
    'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
    'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
]

query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)

rv = client.query(query)
with rv:
    for row in rv.stream_rows():
        pass

Iteration over int rows

Iteration over TSV (11)

import sys
import requests

cols = [
    'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
    'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
    'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
    'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
    'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
    'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
    'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
    'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
]

query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(', '.join(cols), sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

for line in resp.iter_lines(chunk_size=10000):
    line = [int(x) for x in line.split(b'\t')]

Iteration over JSONEachRow (12)

import sys
import requests
from ujson import loads

cols = [
    'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
    'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
    'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
    'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
    'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
    'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
    'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
    'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
]

query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(', '.join(cols), sys.argv[1])
resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})

for line in resp.iter_lines(chunk_size=10000):
    line = list(loads(line).values())

Iteration over int rows with clickhouse-driver in Native format (13)

import sys
from clickhouse_driver import Client

cols = [
    'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
    'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
    'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
    'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
    'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
    'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
    'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
    'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
]

query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
client = Client.from_url('clickhouse://localhost')

for row in client.execute_iter(query):
    pass

Iteration over int rows with clickhouse-connect (16)

import sys
import clickhouse_connect

cols = [
    'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
    'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
    'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
    'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
    'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
    'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
    'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
    'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
]

query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)

rv = client.query(query)
with rv:
    for row in rv.stream_rows():
        pass

Results

The tables below contain timing and memory benchmark results for the snippets above; column headers show the approximate number of selected rows.

JSON in the tables is shorthand for JSONEachRow.

Plain text without parsing: timing

Rows                           50k       131k      217k      450k      697k
Naive requests.get TSV (1)     0.35 s    0.56 s    0.83 s    1.15 s    1.72 s
Naive requests.get JSON (1)    0.99 s    1.80 s    2.77 s    5.15 s    7.80 s

Plain text without parsing: memory

Rows                           50k       131k      217k      450k      697k
Naive requests.get TSV (1)     52 MB     110 MB    167 MB    323 MB    489 MB
Naive requests.get JSON (1)    263 MB    726 MB    1.88 GB   2.42 GB   3.75 GB

Parsed rows: timing

Rows                                50k       131k      217k      450k      697k
requests.get TSV (2)                0.83 s    1.97 s    3.32 s    7.90 s    13.13 s
requests.get TSV with cast (2.5)    1.59 s    4.31 s    6.99 s    15.60 s   25.89 s
requests.get JSON (3)               2.78 s    5.55 s    9.23 s    21.45 s   31.50 s
clickhouse-driver Native (4)        0.87 s    1.50 s    2.21 s    4.20 s    6.32 s
clickhouse-connect (14)             0.89 s    1.72 s    2.46 s    4.85 s    7.19 s

Parsed rows: memory

Rows                                50k       131k      217k      450k      697k
requests.get TSV (2)                182 MB    487 MB    794 MB    1.63 GB   2.51 GB
requests.get TSV with cast (2.5)    138 MB    359 MB    579 MB    1.18 GB   1.82 GB
requests.get JSON (3)               136 MB    351 MB    565 MB    1.15 GB   1.77 GB
clickhouse-driver Native (4)        155 MB    343 MB    530 MB    1.04 GB   1.58 GB
clickhouse-connect (14)             139 MB    333 MB    524 MB    1.05 GB   1.61 GB

Iteration over rows: timing

Rows                                50k       131k      217k      450k      697k
requests.get TSV (5)                0.48 s    0.91 s    1.28 s    2.57 s    3.72 s
requests.get TSV with cast (5.5)    1.25 s    3.05 s    4.77 s    9.67 s    15.04 s
requests.get JSON (6)               1.80 s    4.48 s    7.10 s    14.45 s   22.17 s
clickhouse-driver Native (7)        0.72 s    1.38 s    2.01 s    3.65 s    5.45 s
clickhouse-connect (17)             0.85 s    1.62 s    2.12 s    4.12 s    6.05 s

Iteration over rows: memory

Rows                                50k       131k      217k      450k      697k
requests.get TSV (5)                22 MB     22 MB     22 MB     22 MB     22 MB
requests.get TSV with cast (5.5)    22 MB     22 MB     22 MB     22 MB     22 MB
requests.get JSON (6)               24 MB     24 MB     24 MB     24 MB     24 MB
clickhouse-driver Native (7)        91 MB     93 MB     93 MB     94 MB     94 MB
clickhouse-connect (17)             68 MB     68 MB     68 MB     68 MB     68 MB

Iteration over string rows: timing

Rows                             50k       131k      217k      450k      697k
requests.get TSV (8)             0.44 s    0.57 s    0.77 s    1.40 s    1.94 s
requests.get JSON (9)            1.03 s    2.46 s    3.87 s    7.76 s    11.96 s
clickhouse-driver Native (10)    0.63 s    1.06 s    1.44 s    2.45 s    3.57 s
clickhouse-connect (15)          0.62 s    1.13 s    1.53 s    2.84 s    4.00 s

Iteration over string rows: memory

Rows                             50k       131k      217k      450k      697k
requests.get TSV (8)             22 MB     22 MB     22 MB     22 MB     22 MB
requests.get JSON (9)            24 MB     24 MB     24 MB     24 MB     24 MB
clickhouse-driver Native (10)    77 MB     79 MB     79 MB     79 MB     79 MB
clickhouse-connect (15)          60 MB     60 MB     60 MB     60 MB     60 MB

Iteration over int rows: timing

Rows                             50k       131k      217k      450k      697k
requests.get TSV (11)            0.81 s    1.66 s    2.61 s    5.08 s    7.91 s
requests.get JSON (12)           0.97 s    2.02 s    3.29 s    6.50 s    10.00 s
clickhouse-driver Native (13)    0.55 s    0.78 s    1.02 s    1.73 s    2.44 s
clickhouse-connect (16)          0.54 s    0.79 s    1.01 s    1.68 s    2.20 s

Iteration over int rows: memory

Rows                             50k       131k      217k      450k      697k
requests.get TSV (11)            22 MB     22 MB     22 MB     22 MB     22 MB
requests.get JSON (12)           24 MB     24 MB     24 MB     24 MB     24 MB
clickhouse-driver Native (13)    71 MB     72 MB     72 MB     73 MB     73 MB
clickhouse-connect (16)          41 MB     41 MB     41 MB     41 MB     41 MB

Conclusion

If you need to fetch a significant number of rows from the ClickHouse server as text, then the TSV format is your choice. See the Iteration over string rows results.

But if you need to work with Python data types, then you should take a look at the drivers using the Native format. For most data types the driver uses binary pack() / unpack() for serialization / deserialization, which is considerably faster than calling cls(x) on every element, as in scripts (2.5) and (5.5).
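
A rough illustration of the difference (a minimal sketch, not the driver's actual code): unpacking a whole column of Int32 values with one struct.unpack call versus casting each text value with int():

import struct
import timeit

n = 100_000
binary_column = struct.pack('<{}i'.format(n), *range(n))  # an Int32 column as raw little-endian bytes
text_column = [str(i) for i in range(n)]                  # the same column as TSV-style strings

# one binary unpack for the whole column vs. one int() call per element
print(timeit.timeit(lambda: struct.unpack('<{}i'.format(n), binary_column), number=100))
print(timeit.timeit(lambda: [int(x) for x in text_column], number=100))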

It doesn’t matter which interface you use if you only work with a small number of rows.