Conversation

@apppies commented Mar 19, 2017

Adds support for obtaining chunked responses from InfluxDb.
After all chunks have been received, they are processed and returned in one list.
Chunks are not stitched together, as a chunk can be split on series boundaries as well as on row count. The consumer has to handle that part.

It would be nice to process each chunk as soon as its HTTP data has been received, but that would (I think) require a lot more work.
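
For illustration, the consumption pattern looks roughly like this (a minimal sketch; influxClient, database and query are assumed to be set up elsewhere):

var series = await influxClient.Client.QueryChunkedAsync(database, query, 10000);

// One Serie entry is returned per received chunk, so the same logical
// series can appear more than once here and has to be merged by the caller.
foreach (var serie in series)
{
    Console.WriteLine($"{serie.Name}: {serie.Values.Count} rows in this chunk");
}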

@tihomir-kit (Owner)

Hey man, thanks for this. I'll try to deploy it to nuget.org tonight in v6.0.2.

Just one quick question, as I haven't had the time to take a more careful look at the code - for the end user, is this as simple as consuming any other API call? The end user doesn't need to collect these chunks into a collection and then put them together? It works out of the box and puts the whole message together in the background?

@apppies (Author) commented Mar 20, 2017

To answer your question: no, there is still work left for the user.
But thanks to your question I just found the statement_id being returned in the InfluxDB response. Let me see if I can fit that in.

I'm currently using it like this, where DataPoint is an OxyPlot DataPoint:

var query = "SELECT value FROM measurement"
    + $" WHERE time >= '{start:yyyy-MM-dd HH:mm:ss}'"
    + $" AND time < '{end:yyyy-MM-dd HH:mm:ss}'"
    + " AND (item_tag = '" + string.Join("' OR item_tag = '", tags) + "')"
    + " GROUP BY item_tag";
var response = await influxClient.Client.QueryChunkedAsync(database, query, 50000);

// Collect the values belonging to the same series
var datapoints = new Dictionary<string, List<DataPoint>>();
foreach (var serie in response)
{
    var tag = serie.Tags.First().Value;
    var data = new List<DataPoint>(serie.Values.Count);
    for (int i = 0; i < serie.Values.Count; i++)
    {
        // Values[i][0] is the timestamp (returned as object, so convert
        // before feeding it to OxyPlot), Values[i][1] is the field value
        var newDatapoint = new DataPoint(
            DateTimeAxis.ToDouble(Convert.ToDateTime(serie.Values[i][0])),
            Convert.ToDouble(serie.Values[i][1]));
        data.Add(newDatapoint);
    }

    // The same series can span multiple chunks, so append instead of overwrite
    if (datapoints.ContainsKey(tag))
        datapoints[tag].AddRange(data);
    else
        datapoints.Add(tag, data);
}

@apppies (Author) commented Mar 20, 2017

Taking a quick look, it seems doable by using the statement_id and partial values in the response. The difficulty might be that the chunks for different series can be interleaved.
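
Something like this could handle the merging (an untested sketch; Chunk is a hypothetical holder for one parsed response chunk, not an existing type in the library):

// Merge interleaved chunks by (statement_id, series name, tags), so rows
// belonging to the same logical series end up in one Serie.
var merged = new Dictionary<string, Serie>();

foreach (var chunk in chunks) // chunks: the parsed response chunks (hypothetical)
{
    foreach (var serie in chunk.Series)
    {
        var tagKey = string.Join(",", serie.Tags.Select(t => t.Key + "=" + t.Value));
        var key = chunk.StatementId + "|" + serie.Name + "|" + tagKey;

        if (merged.ContainsKey(key))
        {
            // Same logical series seen before: append this chunk's rows.
            foreach (var row in serie.Values)
                merged[key].Values.Add(row);
        }
        else
        {
            merged[key] = serie;
        }
    }
}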

@tihomir-kit (Owner)

Hi,

thanks for the PR. Just to let you know, I'll merge this in about a week or two.

Cheers!

@apppies (Author) commented Mar 27, 2017

There is a slight problem with chunks: since there is no longer a limit on the amount of data InfluxDB sends back, it is rather easy to get an out-of-memory exception when running a query with a large result set.

@tihomir-kit merged commit a195654 into tihomir-kit:master on Apr 23, 2017
@tihomir-kit (Owner)

Merged this. Seems really solid. Thanks!

@tihomir-kit (Owner)

Also, while I was merging the epoch time format from another PR, I had to change the code a bit because overloads and additional methods started piling up, so chunkSize ended up being an optional param on one of the original methods. Not a huge change, but once I deploy the NuGet package, you'll probably have a few things to update if you used your own build in the meantime.

@yellboy commented Aug 22, 2017

Hello pootzko. Is this deployed to NuGet yet? I just updated to 7.0.3 and this method doesn't seem to be included. Thanks.

@tihomir-kit (Owner)

@yellboy hi!

I removed the separate method and added an optional chunkSize param to the QueryAsync() and MultiQueryAsync() methods. If the param is not passed, the request won't be chunked.
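
Usage then looks roughly like this (a minimal sketch; dbName, query and the client setup are assumed):

// Unchunked, as before:
var result = await client.Client.QueryAsync(dbName, query);

// Chunked: pass the optional chunkSize; the whole result is still
// collected and returned in one go.
var chunked = await client.Client.QueryAsync(dbName, query, chunkSize: 10000);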

Please see:

https://github.com/pootzko/InfluxData.Net/blob/41fc728a5d85e9fdba5d7f3f485c3eaca5805599/InfluxData.Net.InfluxDb/ClientModules/BasicClientModule.cs#L24

https://github.com/pootzko/InfluxData.Net/blob/41fc728a5d85e9fdba5d7f3f485c3eaca5805599/InfluxData.Net.InfluxDb/ClientModules/ClientModuleBase.cs#L54

Let me know if that helped. Cheers!

@yellboy commented Aug 22, 2017

Thanks for the answer. It answers my question, but what I actually wanted was to fetch each chunk asynchronously and do some processing on every chunk, so that I don't have to wait for all chunks to be fetched before starting the processing. However, if I didn't misunderstand the docs, there is currently no way to do this except manually. I guess I will have to go that way. If I find a nice way to do this that could enrich this library, I will fork and create a PR. Thanks for the quick answer once again.

@tihomir-kit (Owner)

I'm actually not fully sure that will work (if you need the payload), as the beginning and the end might contain crucial parts of the payload, such as some headers or something like that. So you might not be able to use the payload that's in the middle, but I'm curious what you will find out. Good luck, and let us know! :)

@yellboy commented Sep 20, 2017

Hi @pootzko! I am not completely sure what you mean. Here is what I needed: with a measurement containing, for example, 100000 rows, I wanted to fetch all the data and process it. However, this would take too much time if done synchronously, because the data would first be fetched in full and only then processed. What I did instead was create queries with LIMIT and OFFSET clauses, with LIMIT being, let's say, 10000. That means creating 10 threads that run in parallel, each thread fetching one chunk and then processing it (see the sketch below). This made the process a lot faster, because processing starts before all the data is fetched. But I don't think this should be part of your library, since it is nothing more than creating queries with LIMIT and OFFSET clauses.
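
In a simplified form (ProcessAsync stands for my own processing step; measurement, database and the client setup are placeholders):

const int limit = 10000;
const int pageCount = 10;

// Fire off one query per page; each page is processed as soon as it
// arrives, instead of waiting for the full result set.
var tasks = Enumerable.Range(0, pageCount).Select(async page =>
{
    var query = $"SELECT value FROM measurement LIMIT {limit} OFFSET {page * limit}";
    var series = await influxClient.Client.QueryAsync(database, query);
    await ProcessAsync(series);
});

await Task.WhenAll(tasks);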

@tihomir-kit (Owner)

I see. Yeah, I'm not sure the lib should be doing what you needed either. The chunked option simply means more async under the hood; it's not really supposed to be something you "hook into" to process the data in parallel.

I think your approach is great for your needs, and I'm glad you solved it.
