Cassandra with large number of columns per row


I set up Cassandra with the default configuration on a clean AWS instance and inserted 10,000 columns into a single row, each column holding about 1 MB of data. I used this Ruby (1.9.3) script:
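# Cas_client, TestColumnFamily and TestRow are set up earlier in a.rb (not shown)
10000.times do
  key = rand(36**8).to_s(36)               # random column name, up to 8 base-36 chars
  value = rand(36**1024).to_s(36) * 1024   # ~1 MB value: up to 1024 chars repeated 1024 times
  Cas_client.insert(TestColumnFamily, TestRow, {key => value})
end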

Every time I run this script, it crashes with:
/usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException
from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/base_transport.rb:87:in `read_all'
from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:104:in `read_frame'
from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:69:in `read_into_buffer'
from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `read_message_begin'
from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `receive_message'
from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:251:in `recv_batch_mutate'
from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:243:in `batch_mutate'
from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:150:in `handled_proxy'
from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:60:in `batch_mutate'
from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/protocol.rb:7:in `_mutate'
from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/cassandra.rb:463:in `insert'
from a.rb:6:in `block in <main>'
from a.rb:3:in `times'
from a.rb:3:in `<main>'

Cassandra itself keeps running normally, so I then ran another Ruby script to count how many columns had been inserted:
p cas_client.count_columns(TestColumnFamily,TestRow)

This script crashes as well, with the same error message, and the Cassandra process stays at 100% CPU usage.

Environment:
AWS m1.xlarge instance (15 GB memory, 800 GB disk, 4 CPU cores)
cassandra-1.1.2
ruby-1.9.3-p194
jdk-7u6-linux-x64
ruby-gems:
cassandra (0.15.0)
thrift (0.8.0)
thrift_client (0.8.1)

What is the problem?

Solutions/Answers:

Solution 1:

10,000 columns at 1 MB each is about 10 GB of data.
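As a quick check of that arithmetic, here is a small sketch that computes an upper bound on what the question's loop writes, reusing the same value expression. `MAX_VALUE_BYTES` and `COLUMNS` are names introduced here for illustration; "1 MB" is taken as 1 MiB (1024 × 1024 bytes).

```ruby
# Upper-bound estimate of the data written by the question's insert loop.
# rand(36**1024).to_s(36) produces at most 1024 base-36 digits, and the
# script repeats that string 1024 times, so each value is at most 1 MiB.
MAX_VALUE_BYTES = 1024 * 1024
COLUMNS = 10_000

total_bytes = MAX_VALUE_BYTES * COLUMNS
total_gib = total_bytes / 1024.0**3

puts "per column: <= #{MAX_VALUE_BYTES} bytes"
puts "whole row:  <= #{total_bytes} bytes (~#{total_gib.round(1)} GiB)"

# Generate one value exactly as the script does and confirm it fits the bound.
sample = rand(36**1024).to_s(36) * 1024
puts "sample value: #{sample.bytesize} bytes"
```

The bound works out to 10,485,760,000 bytes, roughly 9.8 GiB (about 10.5 GB in decimal units), so "10 gigs" is the right order of magnitude either way.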

Cassandra's RPC interface uses Thrift, which requires the entire return value of an RPC call to fit in memory. Reading all the columns of this row in one call would therefore mean building a 10 GB Thrift object in memory, which is not practical, especially in Ruby.
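A common way to stay within that limit is to read a wide row in fixed-size slices instead of in a single call, so that each Thrift response stays small. The sketch below shows only the paging logic, run against an in-memory Hash that stands in for the row. `fetch_slice` and `each_column_paged` are hypothetical helpers, not part of the cassandra gem; in real code `fetch_slice` is assumed to wrap a column-slice query on the client (for example the gem's `get` with slice options such as `:start` and `:count`), which should be checked against the gem's documentation for the version in use.

```ruby
# Hypothetical stand-in for one slice query against a wide row
# (row is a Hash of column name => value). Cassandra returns columns
# sorted by name; sorting the keys here mimics that ordering.
def fetch_slice(row, start, count)
  # The slice start is inclusive, like a Cassandra slice range.
  names = row.keys.sort.select { |name| name >= start }.first(count)
  names.map { |name| [name, row[name]] }
end

# Yield every column of the row, never fetching more than
# page_size columns in a single "request".
def each_column_paged(row, page_size)
  raise ArgumentError, "page_size must be at least 2" if page_size < 2

  start = ""
  first_page = true
  loop do
    slice = fetch_slice(row, start, page_size)
    # After the first page, the first column returned is the last one
    # already yielded (inclusive start), so skip it.
    slice = slice.drop(1) unless first_page
    break if slice.empty?

    slice.each { |name, value| yield name, value }
    start = slice.last[0]
    first_page = false
  end
end

# Example row with 25 columns named col001..col025.
row = Hash[(1..25).map { |i| [format("col%03d", i), "value #{i}"] }]

count = 0
each_column_paged(row, 10) { |_name, _value| count += 1 }
puts "columns counted page by page: #{count}"
```

With 1 MB columns and a page size of 10, each response would be on the order of 10 MB rather than 10 GB; the trade-off is many small round trips instead of one huge one.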
