Clickhousify samples endpoint #11393
84339f7 to 3b627d6
3b627d6 to fb11408
Happy to discuss. I thought the new ClickHouse work would use the clean architecture approach. I can give you more details tomorrow.
374a048 to ae4e13a
2210ab2 to e329f81
8060e80 to f5cd65c
6855908 to 5fdbe7e
Looks good, but needs a couple of changes. Still looking at the derived sample table.
@Service
@Profile("clickhouse")
public class FetchMetaSamplesHeadersUseCase {
I would remove these use cases that add HTTP headers and put that logic in the controller.
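One way to read this suggestion, as a rough sketch only: the use case returns just the count, and the controller is the only layer that turns it into a response header. The sketch uses plain JDK types instead of Spring so that it stays self-contained. `GetMetaSamplesUseCase`, `SampleController`, and the `total-count` header name are hypothetical stand-ins for illustration, not names confirmed by this PR.

```java
import java.util.List;
import java.util.Map;

// Hypothetical use case: returns domain data only and knows nothing about HTTP.
class GetMetaSamplesUseCase {
    private final List<String> sampleIds;

    GetMetaSamplesUseCase(List<String> sampleIds) {
        this.sampleIds = sampleIds;
    }

    // Stands in for a COUNT(*) query against the sample table.
    int execute() {
        return sampleIds.size();
    }
}

// Hypothetical controller: the single place where HTTP headers are built.
class SampleController {
    // Assumed header name, used here for illustration only.
    static final String TOTAL_COUNT_HEADER = "total-count";

    private final GetMetaSamplesUseCase useCase;

    SampleController(GetMetaSamplesUseCase useCase) {
        this.useCase = useCase;
    }

    // Wraps the plain count from the use case in a header map.
    Map<String, String> getMetaSamplesHeaders() {
        return Map.of(TOTAL_COUNT_HEADER, String.valueOf(useCase.execute()));
    }
}
```

With this split, the three `...HeadersUseCase` classes would no longer be needed, since each controller method could call the corresponding count use case directly.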
@Service
@Profile("clickhouse")
public class GetMetaSamplesHeadersUseCase {
Same comment as above
@Service
@Profile("clickhouse")
public class GetMetaSamplesOfPatientInStudyHeadersUseCase {
Same here: I would remove this HTTP headers use case.
5fdbe7e to 342ca0d
sample_unique_id String,
sample_unique_id_base64 String,
sample_stable_id String,
patient_unique_id String,
patient_unique_id_base64 String,
patient_stable_id String,
cancer_study_identifier LowCardinality(String),
internal_id Int,
-- fields below are needed for the DETAILED projection
sequenced Int,
copy_number_segment_present Int,
patient_internal_id Int,
sample_type String,
cancer_study_id Int,
type_of_cancer_id String,
cancer_study_name String,
cancer_study_description String,
cancer_study_public Int,
cancer_study_pmid String,
cancer_study_citation String,
cancer_study_groups String,
cancer_study_status Int,
cancer_study_import_date DateTime,
reference_genome String
Also, would be curious how much speed we get with the wide column vs just doing a simple join or using a dictionary. Don't want to just randomly add more columns to these tables if they aren't needed
I think we should still keep this logic and the two additional columns (sequenced and copy_number_segment_present). The rest of the columns are straightforward to obtain with a simple join and probably won't cause much of a performance issue. I'll look into that.
Just tried moving everything, including the sequenced and copy_number_segment_present logic, to the SQL layer. It doesn't seem to degrade performance significantly. The wide table is slightly faster, but the difference is probably negligible.
Reduced the number of columns in 34f3b77.
For now I'm keeping sequenced and copy_number_segment_present, as well as sample_type and patient_internal_id, but it is also possible to derive them in SQL if we don't want to add any new columns to sample_derived at all.
1ac1cbc to 7b2dc5b
);
}

public Sample(
We should have a comment here explaining the difference between these Sample constructor overloads.
List<String> sampleIds,
ProjectionType projection
) {
return switch (projection) {
@onursumer where do the pagination params go?
@alisman This method doesn't support pagination. The legacy implementation supports pagination only for certain methods where we get all samples. I kept it the same way for the ClickHouse implementation.
Lines 79 to 80 in fa1847f
Integer pageSize,
Integer pageNumber,
Lines 110 to 111 in fa1847f
Integer pageSize,
Integer pageNumber,
Lines 133 to 134 in fa1847f
Integer pageSize,
Integer pageNumber,
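For the methods that do accept `pageSize` and `pageNumber`, the two parameters usually translate into a LIMIT/OFFSET pair. The following is a minimal sketch of that translation, assuming zero-based page numbers and assuming that a null `pageSize` or `pageNumber` means "no pagination". The `Pagination` class is hypothetical and not part of this PR, and negative inputs are not validated here.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
class Pagination {
    // Row offset for a zero-based page number (assumption); null when unpaginated.
    static Integer offset(Integer pageSize, Integer pageNumber) {
        if (pageSize == null || pageNumber == null) {
            return null;
        }
        return pageSize * pageNumber;
    }

    // In-memory equivalent of "LIMIT pageSize OFFSET offset".
    static <T> List<T> page(List<T> items, Integer pageSize, Integer pageNumber) {
        Integer offset = offset(pageSize, pageNumber);
        if (offset == null) {
            return items; // no pagination requested: return everything
        }
        int from = Math.min(offset, items.size());
        int to = Math.min(from + pageSize, items.size());
        return items.subList(from, to);
    }
}
```

Methods that take an explicit list of sample IDs, like the one under review, would skip this step entirely, which matches the behaviour described in the reply above.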
7b2dc5b to d0a8ece
<sql id="sampleColumnsSummary">
    <include refid="sampleColumnsId" />,
    sample_type as sampleType,
    patient_internal_id as patientId
</sql>

<sql id="sampleColumnsDetailed">
    <include refid="sampleColumnsSummary" />,
    sequenced as sequenced,
    copy_number_segment_present as copyNumberSegmentPresent,
    patient_internal_id as "patient.internalId",
    patient_stable_id as "patient.stableId",
    cs.cancer_study_id as "patient.cancerStudyId",
    cs.cancer_study_identifier as "patient.cancerStudyIdentifier",
    cs.cancer_study_id as "patient.cancerStudy.cancerStudyId",
    cs.cancer_study_identifier as "patient.cancerStudy.cancerStudyIdentifier",
    cs.type_of_cancer_id as "patient.cancerStudy.typeOfCancerId",
    cs.name as "patient.cancerStudy.name",
    cs.description as "patient.cancerStudy.description",
    cs.public as "patient.cancerStudy.publicStudy",
    cs.pmid as "patient.cancerStudy.pmid",
    cs.citation as "patient.cancerStudy.citation",
    cs.groups as "patient.cancerStudy.groups",
    cs.status as "patient.cancerStudy.status",
    cs.import_date as "patient.cancerStudy.importDate",
    rg.name as "patient.cancerStudy.referenceGenome"
</sql>
I would say this is a good abstraction, but the extra layers make it easier to get lost.
The balance is to avoid duplicating code without abstracting too much.
<sql id="selectSample">
    SELECT
    <include refid="sampleColumnsId" />
    FROM sample_derived
</sql>

<sql id="selectSummarySample">
    SELECT
    <include refid="sampleColumnsSummary" />
    FROM sample_derived
</sql>

<sql id="selectDetailedSample">
    SELECT
    <include refid="sampleColumnsDetailed" />
    <include refid="fromJoinedTable" />
</sql>

<sql id="selectMetaSample">
    SELECT
    COUNT(*) as totalCount
    FROM sample_derived
</sql>
I would remove these select fragments, but keep the other abstractions.
One small change, but after that it is good.
d0a8ece to 8fc22ac
8fc22ac to e3ae3f3
e3ae3f3 to cbf8bed
Fix #11377