Adding basic chunk support #39

Conversation
Hey man, thanks for this. I'll try to deploy it to nuget.org tonight in v6.0.2. Just one quick question, as I haven't had the time to take a more careful look at the code: for the end user, is this as simple as consuming any other API call? The end user doesn't need to collect these chunks into a collection and then put them together? It works out of the box and puts the whole message together in the background?
To your question: no, there is still work left for the user. I'm currently using it like this, where DataPoint is an OxyPlot data point:

```csharp
var query = "SELECT value FROM measurement"
    + $" WHERE time >= '{start:yyyy-MM-dd HH:mm:ss}'"
    + $" AND time < '{end:yyyy-MM-dd HH:mm:ss}'"
    + " AND (item_tag = '" + string.Join("' OR item_tag = '", tags) + "')"
    + " GROUP BY item_tag";

var response = await influxClient.Client.QueryChunkedAsync(database, query, 50000);

// Collect the values belonging to the same series
var datapoints = new Dictionary<string, List<DataPoint>>();
foreach (var serie in response)
{
    var tag = serie.Tags.First().Value;
    var data = new List<DataPoint>(serie.Values.Count);
    for (int i = 0; i < serie.Values.Count; i++)
    {
        var newDatapoint = new DataPoint(DateTimeAxis.ToDouble(serie.Values[i][0]), Convert.ToDouble(serie.Values[i][1]));
        data.Add(newDatapoint);
    }
    if (datapoints.ContainsKey(tag))
        datapoints[tag].AddRange(data);
    else
        datapoints.Add(tag, data);
}
```
Taking a quick look, it seems doable by using the statement_id and partial values in the response. The difficulty might be that the chunks for different series can be interleaved.
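The interleaving problem mentioned above can be sketched like this. Note that `Chunk` and `Serie` below are hypothetical shapes roughly mirroring InfluxDB's chunked JSON response (`statement_id`, series name, tags, rows, `partial` flag), not the library's actual types; the idea is just to key rows by statement id plus series identity so interleaved chunks still land in the right bucket:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var c1 = new Chunk(0, new List<Serie> {
    new Serie("m", new Dictionary<string, string> { { "tag", "a" } },
        new List<object[]> { new object[] { 1, 2 } }) }, true);
var c2 = new Chunk(0, new List<Serie> {
    new Serie("m", new Dictionary<string, string> { { "tag", "a" } },
        new List<object[]> { new object[] { 3, 4 } }) }, false);

var merged = ChunkMerger.Merge(new[] { c1, c2 });
Console.WriteLine(merged.Count); // the two interleaved chunks collapse into one series

// Hypothetical shapes mirroring InfluxDB's chunked JSON response.
record Serie(string Name, Dictionary<string, string> Tags, List<object[]> Values);
record Chunk(int StatementId, List<Serie> Series, bool Partial);

static class ChunkMerger
{
    // Merge interleaved chunks: rows belonging to the same statement and the
    // same series (name + tag set) are appended to one result series.
    public static Dictionary<(int, string), List<object[]>> Merge(IEnumerable<Chunk> chunks)
    {
        var merged = new Dictionary<(int, string), List<object[]>>();
        foreach (var chunk in chunks)
        {
            foreach (var serie in chunk.Series)
            {
                // Build a stable key from the series name and its sorted tag set.
                var tagKey = string.Join(",",
                    serie.Tags.OrderBy(t => t.Key).Select(t => $"{t.Key}={t.Value}"));
                var key = (chunk.StatementId, $"{serie.Name}|{tagKey}");
                if (!merged.TryGetValue(key, out var rows))
                    merged[key] = rows = new List<object[]>();
                rows.AddRange(serie.Values);
            }
        }
        return merged;
    }
}
```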
Hi, thanks for the PR. Just to let you know, I'll merge this in about a week or two. Cheers!
There is a slight problem with chunks: since there is no longer a limit on the amount of data InfluxDB sends, it is rather easy to get an out-of-memory exception after running a query with a large result set.
Merged this. Seems really solid. Thanks!
Also, while I was merging the epoch time format from another PR, I had to change the code a bit because overloads and additional methods started piling up, so chunkSize ended up being an optional param on one of the original methods. Not a huge change, but once I deploy the NuGet package, you'll probably have a few things to update if you used your own build in the meantime.
Hello pootzko. Is this already deployed to NuGet? I just updated to 7.0.3 and this method does not seem to be included. Thanks.
@yellboy hi! I removed the separate method and instead added an optional chunkSize parameter to one of the original query methods. Let me know if that helped. Cheers!
Thanks for the answer. It answers my question, but I actually wanted to fetch each chunk asynchronously and do some processing on every chunk as it arrives, so that I don't have to wait for all chunks to be fetched before the processing starts. However, unless I misunderstood the docs, there is no way to do this right now except manually. I guess I will have to go that way. If I find a nice way to do this that could enrich this library, I will fork and create a PR. Thanks for the quick answer once again.
I'm actually not fully sure that will work (if you need the payload), as the beginning and the end might contain crucial parts of the payload, such as headers or something like that. So you might not be able to use the payload that's in the middle, but I'm curious what you will find out. Good luck, and let us know! :)
Hi @pootzko! I am not completely sure what you mean. Here is what I needed: given a measurement containing, for example, 100000 rows, I wanted to fetch all the data and process it. However, this would take too much time if done synchronously, because the data would first have to be fetched in full and only then processed. What I did instead was create queries with LIMIT and OFFSET clauses, with LIMIT being, let's say, 10000. That means creating 10 threads that run in parallel, each thread fetching one chunk and then processing it. This made the process a lot faster, because processing starts before all the data is fetched. But I don't think this should be part of your library, since it is nothing more than creating queries containing LIMIT and OFFSET clauses.
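The LIMIT/OFFSET approach described above can be sketched roughly like this. `fetchPage` and `process` are hypothetical caller-supplied delegates (in practice `fetchPage` would run one paged query through the InfluxDB client); the point is that each page is processed as soon as its own query completes, rather than after the whole result set has arrived:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var offsets = new List<int>();
var processed = 0;

await PagedFetcher.FetchAllAsync(25, 10,
    (limit, offset) =>
    {
        // In real use this would be e.g. a "... LIMIT {limit} OFFSET {offset}" query.
        lock (offsets) offsets.Add(offset);
        return Task.FromResult<object>(offset);
    },
    page => System.Threading.Interlocked.Increment(ref processed));

Console.WriteLine(processed); // three pages cover 25 rows at LIMIT 10

static class PagedFetcher
{
    // fetchPage(limit, offset) runs one paged query and returns its rows;
    // process consumes one page. Pages are fetched concurrently, and each
    // page is processed the moment its own fetch completes.
    public static Task FetchAllAsync(int totalRows, int limit,
        Func<int, int, Task<object>> fetchPage, Action<object> process)
    {
        var tasks = Enumerable.Range(0, (totalRows + limit - 1) / limit)
            .Select(async i =>
            {
                var page = await fetchPage(limit, i * limit);
                process(page); // processing starts before other pages arrive
            });
        return Task.WhenAll(tasks);
    }
}
```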
I see. Yeah, I'm not sure the lib should be doing what you needed either. The chunked option simply means more async under the hood; it's not really supposed to be something you "hook into" to process the data in parallel. I think your approach is great for your needs, and I'm glad you solved it.
Adds support for obtaining chunked responses from InfluxDb.
After all chunks have been received they are processed and returned in one list.
Chunks are not stitched together, as a chunk can be split on either series boundaries or row count. The consumer has to handle that part.
It would be nice to process each chunk as soon as the http data has been received, but that requires (I think) a lot more work.
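The stitching left to the consumer could look roughly like this, assuming a hypothetical `Serie` shape with a name and a list of rows (the real InfluxData.Net series type may differ, and a production version would also need to account for tags, as in the usage example earlier in the thread):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var chunked = new[]
{
    new Serie("measurement", new List<object[]> { new object[] { 1 } }),
    new Serie("measurement", new List<object[]> { new object[] { 2 } }),
};

var stitched = SerieStitcher.Stitch(chunked);
Console.WriteLine(stitched.Count); // the two chunk-series collapse into one

// Hypothetical series shape; the library's actual type may differ.
record Serie(string Name, List<object[]> Values);

static class SerieStitcher
{
    // Concatenate the rows of every chunk-series that shares a name, so each
    // logical series appears exactly once in the result.
    public static List<Serie> Stitch(IEnumerable<Serie> chunkedSeries) =>
        chunkedSeries
            .GroupBy(s => s.Name)
            .Select(g => new Serie(g.Key, g.SelectMany(s => s.Values).ToList()))
            .ToList();
}
```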