Iterating through Cassandra wide row with CQL3



  • How can I pull in a range of Composite columns with CQL3?

    Consider the following:

    CREATE TABLE Stuff (
        a int,
        b text,
        c text,
        d text,
        PRIMARY KEY (a,b,c)
    );
    
    

    In Cassandra what this effectively does is creates a ColumnFamily with integer rows (values of a) and with CompositeColumns composed of the values of b and c and the literal string 'd'. Of course this is all covered up by CQL3 so that we will think that we're inserting into individual database rows... but I digress.

    And consider the following set of inputs:

    INSERT INTO Stuff (a,b,c,d) VALUES (1,'A','P','whatever0');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'A','Q','whatever1');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'A','R','whatever2');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'A','S','whatever3');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'A','T','whatever4');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'B','P','whatever5');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'B','Q','whatever6');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'B','R','whatever7');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'B','S','whatever8');
    INSERT INTO Stuff (a,b,c,d) VALUES (1,'B','T','whatever9');
    
    

    In my current use case, I want to read all of the values of Stuff, n values at a time. How do I do this? Here's my current take using n=4:

    SELECT * FROM Stuff WHERE a=1 LIMIT 4;
    
    

    And as expected I get:

     a | b | c | d
    ---+---+---+-----------
     1 | A | P | whatever0
     1 | A | Q | whatever1
     1 | A | R | whatever2
     1 | A | S | whatever3
    
    

    The trouble that I run into is how do I get the next 4? Here is my attempt:

    SELECT * FROM Stuff WHERE a=1 AND b='A' AND c>'S' LIMIT 4;
    
    

    This doesn't work because we've constrained b to equal 'A' - which is a reasonable thing to do! But I've found nothing in the CQL3 syntax that allows me to keep iterating anyway. I wish I could do something like:

    SELECT * FROM Stuff WHERE a=1 AND {b,c} > {'A','S'} LIMIT 4;
    
    

    How do I achieve my desired result. Namely, how do I make CQL3 return:

     a | b | c | d
    ---+---+---+-----------
     1 | A | T | whatever0
     1 | B | P | whatever1
     1 | B | Q | whatever2
     1 | B | R | whatever3
    
    

    Auto paging is done https://issues.apache.org/jira/browse/CASSANDRA-4415, it's release to Cassandra 2.0.1

    After reading through CQL3 document I have not found a way to achieve the desired effect.

    You can however fake the desired effect with a series of CQL queries. Consider that I want to page though items in the above model 4 at a time. It's easy enough to get the first 4:

    SELECT * FROM a = 1 LIMIT 4;

    However there is no way to get the next 4 in a single query. But I can do it piecewise. The last item from the above query is

     a | b | c | d
    ---+---+---+-----------
     1 | A | S | whatever3
    
    

    So I can issue a query to start from here and get everything until the next value of b:

    SELECT * FROM a = 1 WHERE b='A' and c>'S' LIMIT 4;

    And in this case I'll get a single CQL3 row:

     a | b | c | d
    ---+---+---+-----------
     1 | A | T | whatever4
    
    

    (Now, if I'd gotten 4 rows, I would hit the limit and I would start again next time with the last element of that set. But for now I just have one row.) So, to get the rest I iterate from that point and get the remaining 3 rows:

    SELECT * FROM a = 1 WHERE b > 'A' LIMIT 3;
    
    

    And I continue on with this same algorithm until I've incrementally scanned as far as I please.

    In the example above the PRIMARY KEY is composed of 3 elements meaning that under CQL in Cassandra the column names are CompositeColumns of 2 elements (...well basically, but the difference doesn't matter here). And because the CompositeColumns are of 2 elements you have to make 2 queries as I've demoed here. In general though, if the PRIMARY KEY is of n elements, then you will have to make n-1 queries to fake a scan of the CQL table (a.k.a Cassandra row).


    Update: Indeed, CQL3 doesn't have a server side cursor, (see the "CQL3 pagination" section here), and if you wanted to fake it, you'd have to use something described above (read further on that link so see my basic idea elaborated by the post's author).

    However, there is a JIRA issue regarding the server-side cursor which will be available in Cassandra 2 and which is already present in Cassandra 2 Beta.

    There is also a related JIRA issue that would make it much easier to implement the client-side cursor as I've hinted at above. But it stands unresolved.


    Update2: JIRA issue is now fixed.

    You can now query using tuple/vector syntax WHERE (c1, c2) > (1, 0)

    What you try to do is to get pagination stuff in Cassandra. CQL3 doesn't support this. You should create a column which is suitable for comparison, i.e. for less than, greater than operations and this column should form increasing/decreasing sequence. Indeed, as jorgebg noticed above, concatenation of b+c would fit this.

    Hong

    select * from stuff where a = 1 and (b,c) > ('A','S') limit 4;

    来自:https://stackoverflow.com/questions/17664438/iterating-through-cassandra-wide-row-with-cql3



最新帖子

推荐阅读