Conversation

onursumer
Member

Fix #11377

@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch 2 times, most recently from 84339f7 to 3b627d6 Compare February 13, 2025 00:28
@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch from 3b627d6 to fb11408 Compare February 13, 2025 18:35
@onursumer onursumer marked this pull request as ready for review February 13, 2025 19:27
Collaborator

@haynescd haynescd left a comment

Happy to discuss. I thought the new ClickHouse work would use the clean architecture approach. I can give you more details tomorrow.

@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch 5 times, most recently from 374a048 to ae4e13a Compare February 24, 2025 21:00
@dippindots dippindots requested a review from haynescd February 25, 2025 16:46
@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch 3 times, most recently from 2210ab2 to e329f81 Compare February 26, 2025 18:22
@sonarqubecloud

@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch 4 times, most recently from 8060e80 to f5cd65c Compare February 28, 2025 19:43
@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch 3 times, most recently from 6855908 to 5fdbe7e Compare March 3, 2025 20:37
Collaborator

@haynescd haynescd left a comment

Looks good (needs a couple of changes). I am still looking at the derived sample table.


@Service
@Profile("clickhouse")
public class FetchMetaSamplesHeadersUseCase {
Collaborator

I would remove these use cases that add HTTP headers and put this logic in the controller.
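A minimal sketch of what this suggestion could look like, in plain Java without the web framework. All names here (`GetMetaSamplesUseCase`, `SampleController`, `HttpResponse`, and the `total-count` header name) are assumptions for illustration, not the PR's actual classes. The idea is that the use case returns only the count, and the controller is the only layer that turns it into a header.

```java
import java.util.List;
import java.util.Map;

// Hypothetical use case: returns domain data only and knows nothing about HTTP.
final class GetMetaSamplesUseCase {
    private final List<String> sampleIds;

    GetMetaSamplesUseCase(List<String> sampleIds) {
        this.sampleIds = sampleIds;
    }

    // Returns the total number of matching samples.
    long execute() {
        return sampleIds.size();
    }
}

// Minimal stand-in for an HTTP response; a real controller would use the framework's type.
record HttpResponse<T>(Map<String, String> headers, T body) {}

// Hypothetical controller: the only place where header names appear.
final class SampleController {
    static final String TOTAL_COUNT_HEADER = "total-count"; // assumed header name

    private final GetMetaSamplesUseCase getMetaSamples;

    SampleController(GetMetaSamplesUseCase getMetaSamples) {
        this.getMetaSamples = getMetaSamples;
    }

    // Builds a body-less response whose header carries the count from the use case.
    HttpResponse<Void> fetchSamplesMeta() {
        long total = getMetaSamples.execute();
        return new HttpResponse<>(Map.of(TOTAL_COUNT_HEADER, Long.toString(total)), null);
    }
}
```

With this split, the separate `...HeadersUseCase` classes would no longer be needed, since each one only wraps a count in a header.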


@Service
@Profile("clickhouse")
public class GetMetaSamplesHeadersUseCase {
Collaborator

Same comment as above


@Service
@Profile("clickhouse")
public class GetMetaSamplesOfPatientInStudyHeadersUseCase {
Collaborator

Same here: I would remove these HTTP headers use cases.

@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch from 5fdbe7e to 342ca0d Compare March 7, 2025 18:14
Comment on lines 57 to 80
sample_unique_id String,
sample_unique_id_base64 String,
sample_stable_id String,
patient_unique_id String,
patient_unique_id_base64 String,
patient_stable_id String,
cancer_study_identifier LowCardinality(String),
internal_id Int,
-- fields below are needed for the DETAILED projection
sequenced Int,
copy_number_segment_present Int,
patient_internal_id Int,
sample_type String,
cancer_study_id Int,
type_of_cancer_id String,
cancer_study_name String,
cancer_study_description String,
cancer_study_public Int,
cancer_study_pmid String,
cancer_study_citation String,
cancer_study_groups String,
cancer_study_status Int,
cancer_study_import_date DateTime,
reference_genome String
Collaborator

Also, I would be curious how much speed we gain with the wide columns versus a simple join or a dictionary. I don't want to add more columns to these tables if they aren't needed.

Member Author

https://github.com/cBioPortal/cbioportal/pull/11393/files#diff-90bb502239e23d89960abe06053613ae64223d06b81e263425d1c32026d5728fR86-R104

I think we should still keep this logic and the two additional columns (sequenced and copy_number_segment_present). The rest of the columns are straightforward to obtain with a simple join and probably won't cause much of a performance issue. I'll look into that.

Member Author

I just tried moving everything, including the sequenced and copy_number_segment_present logic, to the SQL layer. It doesn't seem to degrade performance significantly. The wide table is slightly faster, but the difference is probably negligible.

Member Author

Reduced the number of columns in 34f3b77.

For now keeping sequenced and copy_number_segment_present as well as sample_type and patient_internal_id, but it is also possible to derive them in SQL if we don't want to add any new columns to sample_derived at all.

@dippindots dippindots requested a review from haynescd April 15, 2025 15:08
@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch from 1ac1cbc to 7b2dc5b Compare April 17, 2025 17:56
);
}

public Sample(
Contributor

We should have a comment here explaining the difference between these Sample overloads
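As an illustration of the kind of comment being asked for, here is a sketch of a record with a documented secondary constructor. The fields and the mapping of constructors to projections are assumptions made for the example; the PR's actual `Sample` overloads may take different arguments.

```java
/**
 * Hypothetical, trimmed-down sample record (field names are illustrative only).
 * The canonical constructor takes every field and is meant for projections
 * that load the sample type and the owning patient.
 */
record Sample(Integer internalId, String stableId, String sampleType, Integer patientId) {

    /**
     * Identifier-only overload (assumed to back an ID-style projection).
     * Fields that such a query does not select are left as null rather than
     * filled with placeholder values.
     */
    Sample(Integer internalId, String stableId) {
        this(internalId, stableId, null, null);
    }
}
```

A short Javadoc on each overload, stating which query or projection uses it and which fields stay null, is usually enough to tell the constructors apart at the call site.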

List<String> sampleIds,
ProjectionType projection
) {
return switch (projection) {
Contributor

@onursumer where do pagination params go?

Member Author

@alisman This method doesn't support pagination. The legacy implementation supports pagination only for certain methods where we get all samples. I kept it the same way for the ClickHouse implementation.
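To make the shape of such a method concrete, here is a sketch of a fetch-by-IDs use case that switches on the projection and takes no paging parameters. The enum values, `SampleView`, `SampleRepository`, and `FetchSamplesUseCase` are all assumed names for illustration, and the handling of `META` is a guess, not the PR's behavior.

```java
import java.util.List;

// Assumed projection levels; the enum used in the PR may differ.
enum ProjectionType { ID, SUMMARY, DETAILED, META }

// Hypothetical, trimmed-down result row.
record SampleView(String stableId, String sampleType, Boolean sequenced) {}

// Hypothetical repository with one query per projection level.
interface SampleRepository {
    List<SampleView> fetchIds(List<String> sampleIds);
    List<SampleView> fetchSummary(List<String> sampleIds);
    List<SampleView> fetchDetailed(List<String> sampleIds);
}

final class FetchSamplesUseCase {
    private final SampleRepository repository;

    FetchSamplesUseCase(SampleRepository repository) {
        this.repository = repository;
    }

    // No page size, page number, or sort arguments: the caller's ID list bounds the result.
    List<SampleView> execute(List<String> sampleIds, ProjectionType projection) {
        return switch (projection) {
            case ID -> repository.fetchIds(sampleIds);
            case SUMMARY -> repository.fetchSummary(sampleIds);
            case DETAILED -> repository.fetchDetailed(sampleIds);
            // Assumption: counts are served elsewhere, so META yields no rows here.
            case META -> List.of();
        };
    }
}
```

Because the switch covers every enum constant, the compiler rejects the method if a new projection is added without a matching case.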

@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch from 7b2dc5b to d0a8ece Compare April 22, 2025 18:35
Comment on lines +165 to +209
<sql id="sampleColumnsSummary">
<include refid="sampleColumnsId" />,
sample_type as sampleType,
patient_internal_id as patientId
</sql>

<sql id="sampleColumnsDetailed">
<include refid="sampleColumnsSummary" />,
sequenced as sequenced,
copy_number_segment_present as copyNumberSegmentPresent,
patient_internal_id as "patient.internalId",
patient_stable_id as "patient.stableId",
cs.cancer_study_id as "patient.cancerStudyId",
cs.cancer_study_identifier as "patient.cancerStudyIdentifier",
cs.cancer_study_id as "patient.cancerStudy.cancerStudyId",
cs.cancer_study_identifier as "patient.cancerStudy.cancerStudyIdentifier",
cs.type_of_cancer_id as "patient.cancerStudy.typeOfCancerId",
cs.name as "patient.cancerStudy.name",
cs.description as "patient.cancerStudy.description",
cs.public as "patient.cancerStudy.publicStudy",
cs.pmid as "patient.cancerStudy.pmid",
cs.citation as "patient.cancerStudy.citation",
cs.groups as "patient.cancerStudy.groups",
cs.status as "patient.cancerStudy.status",
cs.import_date as "patient.cancerStudy.importDate",
rg.name as "patient.cancerStudy.referenceGenome"
</sql>
Collaborator

I would say this is a good abstraction, but the extra layers make it easier to get lost.

The balance is to avoid duplicating code without abstracting too much.

Comment on lines 193 to 215
<sql id="selectSample">
SELECT
<include refid="sampleColumnsId" />
FROM sample_derived
</sql>

<sql id="selectSummarySample">
SELECT
<include refid="sampleColumnsSummary" />
FROM sample_derived
</sql>

<sql id="selectDetailedSample">
SELECT
<include refid="sampleColumnsDetailed" />
<include refid="fromJoinedTable" />
</sql>

<sql id="selectMetaSample">
SELECT
COUNT(*) as totalCount
FROM sample_derived
</sql>
Collaborator

I would remove these, but keep the other abstractions

Collaborator

@haynescd haynescd left a comment

Small change, but after that it is good.

@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch from d0a8ece to 8fc22ac Compare April 24, 2025 19:38
@onursumer onursumer force-pushed the clickhousify-samples-endpoint branch from e3ae3f3 to cbf8bed Compare April 29, 2025 17:35
@sonarqubecloud

@onursumer onursumer merged commit 14b4975 into cBioPortal:master Apr 30, 2025
19 of 22 checks passed
@onursumer onursumer deleted the clickhousify-samples-endpoint branch April 30, 2025 16:54
Development

Successfully merging this pull request may close these issues.

Clickhouseify the samples endpoint

4 participants