
Commit d7eaaf0

Oleh Borysevych and dariabezkorovaina authored

Duet AI code-explanation and code-generation prompts basic (#30686)

* parquet added
* csv and avro added
* Apply suggestions from code review
* whitespace fixed

Co-authored-by: Daria Bezkorovaina <[email protected]>
1 parent 33a0bfb commit d7eaaf0

8 files changed

Lines changed: 884 additions & 0 deletions


Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
Prompt:
What does this code do?

```java
package gcs;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.json.JsonIO;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.Validation;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.transforms.Create;

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class WriteJsonIO {

    @DefaultSchema(JavaFieldSchema.class)
    public static class SampleRecord implements Serializable {
        public int id;
        public String month;
        public String amount;

        public SampleRecord() {
        }

        public SampleRecord(int id, String month, String amount) {
            this.id = id;
            this.month = month;
            this.amount = amount;
        }
    }

    public interface WriteJsonOptions extends PipelineOptions {

        @Description("A file path to write JSON files to")
        @Validation.Required
        String getFilePath();

        void setFilePath(String filePath);
    }

    public static void main(String[] args) {

        WriteJsonOptions options =
            PipelineOptionsFactory.fromArgs(args)
                .withValidation().as(WriteJsonOptions.class);

        Pipeline p = Pipeline.create(options);

        List<SampleRecord> rows =
            Arrays.asList(
                new SampleRecord(1, "January", "$1000"),
                new SampleRecord(2, "February", "$2000"),
                new SampleRecord(3, "March", "$3000"));

        p.apply("Create Records", Create.of(rows))
            .apply(
                "Write Records to JSON File",
                JsonIO.<SampleRecord>write(options.getFilePath())
                    .withNumShards(1));
        p.run();
    }
}
```

Response:
This code snippet uses the Apache Beam JsonIO connector to write a list of `SampleRecord` objects to a JSON file.

The `SampleRecord` class defines the schema for the JSON records and is annotated with `@DefaultSchema(JavaFieldSchema.class)` to indicate that its fields should be used as the schema for the JSON records.

```java
@DefaultSchema(JavaFieldSchema.class)
public static class SampleRecord implements Serializable {
    public int id;
    public String month;
    public String amount;

    public SampleRecord() {
    }

    public SampleRecord(int id, String month, String amount) {
        this.id = id;
        this.month = month;
        this.amount = amount;
    }
}
```

The `WriteJsonOptions` interface specifies command-line options for configuring the output file path. The `@Description` annotation provides a description of the option, and `@Validation.Required` indicates that the option is required.

```java
public interface WriteJsonOptions extends PipelineOptions {

    @Description("A file path to write JSON files to")
    @Validation.Required
    String getFilePath();

    void setFilePath(String filePath);
}
```

The `PipelineOptionsFactory` class generates `PipelineOptions` from command-line arguments. The `Pipeline.create` method creates a data pipeline, defined as a sequence of transformations to be applied to the data.

```java
WriteJsonOptions options =
    PipelineOptionsFactory.fromArgs(args)
        .withValidation().as(WriteJsonOptions.class);

Pipeline p = Pipeline.create(options);
```

The pipeline generates a list of `SampleRecord` objects and applies the `Create` transform to create a `PCollection` from this list. Subsequently, the `JsonIO.write` transform is used to write the `PCollection` to a JSON file, with the `withNumShards` method specifying the number of output shards.

```java
List<SampleRecord> rows =
    Arrays.asList(
        new SampleRecord(1, "January", "$1000"),
        new SampleRecord(2, "February", "$2000"),
        new SampleRecord(3, "March", "$3000"));

p.apply("Create Records", Create.of(rows))
    .apply(
        "Write Records to JSON File",
        JsonIO.<SampleRecord>write(options.getFilePath())
            .withNumShards(1));
```

Finally, the code snippet invokes the `Pipeline.run` method to execute the pipeline.

```java
p.run();
```
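
To make the result concrete, here is a plain-Java sketch (not part of the Beam pipeline, and not using JsonIO itself) of the newline-delimited JSON that writing the three sample records would produce. The field order shown is an assumption; the real output order depends on the schema Beam derives from `SampleRecord`.

```java
// Illustration only: approximate JSON lines for the three sample records,
// built with String.format rather than Beam's JsonIO.
public class JsonOutputSketch {
    static String toJsonLine(int id, String month, String amount) {
        return String.format("{\"id\":%d,\"month\":\"%s\",\"amount\":\"%s\"}", id, month, amount);
    }

    public static void main(String[] args) {
        System.out.println(toJsonLine(1, "January", "$1000"));
        System.out.println(toJsonLine(2, "February", "$2000"));
        System.out.println(toJsonLine(3, "March", "$3000"));
    }
}
```

Because `withNumShards(1)` is set, all three lines would land in a single output file.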
Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
Prompt:
What does this code do?

```java
public class WriteCsvFile {

    @DefaultSchema(JavaFieldSchema.class)
    public static class ExampleRecord implements Serializable {
        public int id;
        public String month;
        public String amount;

        public ExampleRecord() {
        }

        public ExampleRecord(int id, String month, String amount) {
            this.id = id;
            this.month = month;
            this.amount = amount;
        }
    }

    public interface WriteCsvFileOptions extends PipelineOptions {

        @Description("A file path to write CSV files to")
        @Validation.Required
        String getFilePath();

        void setFilePath(String filePath);
    }

    public static void main(String[] args) {
        WriteCsvFileOptions options =
            PipelineOptionsFactory.fromArgs(args)
                .withValidation().as(WriteCsvFileOptions.class);

        Pipeline p = Pipeline.create(options);

        List<ExampleRecord> rows =
            Arrays.asList(
                new ExampleRecord(1, "January", "$1000"),
                new ExampleRecord(2, "February", "$2000"),
                new ExampleRecord(3, "March", "$3000"));

        CSVFormat csvFormat =
            CSVFormat.DEFAULT.withHeaderComments("CSV file created by Apache Beam")
                .withCommentMarker('#');

        p.apply("Create collection", Create.of(rows))
            .apply(
                "Write to CSV file",
                CsvIO.<ExampleRecord>write(options.getFilePath(), csvFormat)
                    .withNumShards(1));
        p.run();
    }
}
```

Response:
This code snippet writes data to a CSV file using the `CsvIO.write` transform from the CsvIO connector provided by Apache Beam.

```java
@DefaultSchema(JavaFieldSchema.class)
public static class ExampleRecord implements Serializable {
    public int id;
    public String month;
    public String amount;

    public ExampleRecord() {
    }

    public ExampleRecord(int id, String month, String amount) {
        this.id = id;
        this.month = month;
        this.amount = amount;
    }
}
```

The `ExampleRecord` class represents a row in the CSV file, defining three fields: `id`, `month`, and `amount`. The `@DefaultSchema(JavaFieldSchema.class)` annotation instructs the Apache Beam SDK to use the `JavaFieldSchema` class to generate the schema for the `ExampleRecord` class.

```java
public interface WriteCsvFileOptions extends PipelineOptions {

    @Description("A file path to write CSV files to")
    @Validation.Required
    String getFilePath();

    void setFilePath(String filePath);
}
```

The `WriteCsvFileOptions` interface defines a custom option for specifying the file path to write the CSV files to.

```java
WriteCsvFileOptions options =
    PipelineOptionsFactory.fromArgs(args)
        .withValidation().as(WriteCsvFileOptions.class);

Pipeline p = Pipeline.create(options);
```

The `Pipeline.create` method creates a data pipeline using the options defined in the `WriteCsvFileOptions` interface.

```java
List<ExampleRecord> rows =
    Arrays.asList(
        new ExampleRecord(1, "January", "$1000"),
        new ExampleRecord(2, "February", "$2000"),
        new ExampleRecord(3, "March", "$3000"));
```

Subsequently, the code snippet creates a list of `ExampleRecord` objects to be written to the CSV file.

```java
CSVFormat csvFormat =
    CSVFormat.DEFAULT.withHeaderComments("CSV file created by Apache Beam")
        .withCommentMarker('#');
```

To write the data to a CSV file, the pipeline creates a `CSVFormat` object with a header comment and a comment marker.

```java
p.apply("Create collection", Create.of(rows))
    .apply(
        "Write to CSV file",
        CsvIO.<ExampleRecord>write(options.getFilePath(), csvFormat)
            .withNumShards(1));
```

The code applies the `Create` transform to generate a collection of `ExampleRecord` objects. Then, the `CsvIO.write` transform is applied to write the collection to a CSV file, with the `withNumShards` method specifying the number of shards to use when writing the file.

```java
p.run();
```

Finally, the code snippet invokes the `Pipeline.run` method to execute the pipeline.
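
As a rough illustration, the following plain-Java sketch (no Beam or Commons CSV on the classpath) renders approximately what the single output shard could contain, given the configured header comment and `'#'` comment marker. The exact quoting and header handling in real CsvIO output is produced by Commons CSV and may differ.

```java
// Illustration only: an approximation of one CsvIO output shard.
public class CsvOutputSketch {
    static String renderShard() {
        StringBuilder sb = new StringBuilder();
        // Header comment emitted via withHeaderComments(...) and marker '#'.
        sb.append("# CSV file created by Apache Beam\n");
        sb.append("1,January,$1000\n");
        sb.append("2,February,$2000\n");
        sb.append("3,March,$3000\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(renderShard());
    }
}
```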
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
Prompt:
What does this code do?

```java
public class ReadGenericAvroIO {

    private static final Logger LOG = LoggerFactory.getLogger(ReadGenericAvroIO.class);

    public interface ReadAvroOptions extends PipelineOptions {
        @Description("Path to the Avro file to read from")
        @Default.String("gs://[your-bucket-name]/[your-object-name]/filename.avro")
        String getInput();

        void setInput(String value);
    }

    public static void main(String[] args) {
        ReadAvroOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(ReadAvroOptions.class);

        Pipeline p = Pipeline.create(options);

        String fields = "[{ \"name\": \"month\", \"type\": \"string\" }, { \"name\": \"amount\", \"type\": \"string\" }]";
        String schema = "{\"type\": \"record\", \"name\": \"AvroRecord\",\"fields\":" + fields + "}";
        Schema avroSchema = Schema.parse(schema);

        p.apply(AvroIO.readGenericRecords(avroSchema).from(options.getInput()))
            .apply("Parse and log", ParDo.of(new DoFn<GenericRecord, String>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    GenericRecord record = c.element();
                    String message = "Month: " + record.get("month") + ". Amount: " + record.get("amount");
                    LOG.info(message);
                    c.output(message);
                }
            }));

        p.run();
    }
}
```

Response:
This code reads data from an Apache Avro file using the `AvroIO.readGenericRecords` transform from the AvroIO connector provided by Apache Beam.

```java
public interface ReadAvroOptions extends PipelineOptions {
    @Description("Path to the Avro file to read from")
    @Default.String("gs://[your-bucket-name]/[your-object-name]/filename.avro")
    String getInput();

    void setInput(String value);
}
```

The `ReadAvroOptions` interface defines the command-line argument `--input`, which specifies the path to the Avro file and sets the default value using the `@Default` annotation.

```java
ReadAvroOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(ReadAvroOptions.class);

Pipeline p = Pipeline.create(options);
```

The `PipelineOptionsFactory` class parses the command-line arguments and creates a `ReadAvroOptions` instance. This instance is then used to create a pipeline with the provided pipeline options.

```java
String fields = "[{ \"name\": \"month\", \"type\": \"string\" }, { \"name\": \"amount\", \"type\": \"string\" }]";
String schema = "{\"type\": \"record\", \"name\": \"AvroRecord\",\"fields\":" + fields + "}";
Schema avroSchema = Schema.parse(schema);
```

The AvroIO connector requires a schema to read Avro files. Hence, the schema is defined as a string and parsed into a `Schema` object.

```java
p.apply(AvroIO.readGenericRecords(avroSchema).from(options.getInput()))
    .apply("Parse and log", ParDo.of(new DoFn<GenericRecord, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            GenericRecord record = c.element();
            String message = "Month: " + record.get("month") + ". Amount: " + record.get("amount");
            LOG.info(message);
            c.output(message);
        }
    }));
```

In this segment, the `ParDo` transform processes each `GenericRecord` object from the Avro file. Each `GenericRecord` object is then formatted into a string and logged accordingly.

```java
p.run();
```

Finally, the pipeline is executed to read the Avro file using the AvroIO connector, parse the `GenericRecord` objects, format them, and output the results.
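
The string the `DoFn` builds for each record can be verified in isolation. The sketch below reproduces that exact formatting in plain Java (no Beam or Avro dependency); the sample values are hypothetical inputs, not data from the source.

```java
// Illustration only: the message format produced by the DoFn above,
// extracted into a standalone helper.
public class AvroMessageSketch {
    static String format(Object month, Object amount) {
        // Same concatenation as in processElement.
        return "Month: " + month + ". Amount: " + amount;
    }

    public static void main(String[] args) {
        System.out.println(format("January", "$1000"));
    }
}
```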

0 commit comments