
Java Dataset Class Code Examples


This article compiles typical usage examples of the Java Dataset class from the org.kitesdk.data package. If you are wondering what the Dataset class does, how to use it, or what real-world code using it looks like, the curated examples below should help.



The Dataset class belongs to the org.kitesdk.data package. Twenty code examples of the Dataset class are shown below, sorted by popularity.
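
Before the numbered examples, here is a minimal sketch of the load-and-read pattern that most of the examples below share. This is illustrative only: the dataset URI is a hypothetical placeholder, and it assumes a dataset of Avro GenericRecord already exists at that location.

import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetReader;
import org.kitesdk.data.Datasets;

// Load an existing dataset by URI (the path is a placeholder for illustration)
Dataset<GenericRecord> events = Datasets.load("dataset:hdfs:/tmp/data/events");

// Iterate over every record, closing the reader in a finally block,
// just as the examples below do
DatasetReader<GenericRecord> reader = null;
try {
  reader = events.newReader();
  for (GenericRecord record : reader) {
    System.out.println(record);
  }
} finally {
  if (reader != null) {
    reader.close();
  }
}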

Example 1: testTimedFileRolling

import org.kitesdk.data.Dataset; // import the required package/class
@Test
public void testTimedFileRolling()
    throws EventDeliveryException, InterruptedException {
  // use a new roll interval
  config.put("kite.rollInterval", "1"); // in seconds

  DatasetSink sink = sink(in, config);

  Dataset<GenericRecord> records = Datasets.load(FILE_DATASET_URI);

  // run the sink
  sink.start();
  sink.process();

  Assert.assertEquals("Should have committed", 0, remaining(in));

  Thread.sleep(1100); // sleep longer than the roll interval
  sink.process(); // rolling happens in the process method

  Assert.assertEquals(Sets.newHashSet(expected), read(records));

  // wait until the end to stop because it would close the files
  sink.stop();
}
 
Developer: moueimei | Project: flume-release-1.7.0 | Lines: 25 | Source: TestDatasetSink.java


Example 2: dataset

import org.kitesdk.data.Dataset; // import the required package/class
/**
 * Ensures the given dataset exists, creating it if it doesn't
 * and updating the schema if necessary.
 */
protected void dataset(String uri, DatasetDescriptor descriptor) {

  try {

    Datasets.create(uri, descriptor);
  } catch (DatasetExistsException e) {

    Dataset existingDataset = Datasets.load(uri);

    DatasetDescriptor updated;

    // The given descriptor might not have a location,
    // so use the current one.
    if (descriptor.getLocation() == null) {
      updated = new DatasetDescriptor.Builder(descriptor)
          .location(existingDataset.getDescriptor().getLocation())
          .build();
    } else {

      updated = descriptor;
    }

    Datasets.update(uri, updated);
  }
}
 
Developer: rbrush | Project: kite-apps | Lines: 30 | Source: AbstractApplication.java
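
For context, here is a hypothetical call to the dataset(...) helper above, as it might appear in a subclass of AbstractApplication. The dataset URI and the record schema are placeholders invented for illustration.

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.kitesdk.data.DatasetDescriptor;

// Build a simple Avro schema (placeholder fields for illustration)
Schema schema = SchemaBuilder.record("ExampleEvent")
    .fields()
    .requiredString("id")
    .requiredLong("timestamp")
    .endRecord();

// Ensure the dataset exists, creating or updating it as needed
// (the Hive dataset URI is a placeholder)
dataset("dataset:hive:default/example_events",
    new DatasetDescriptor.Builder().schema(schema).build());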


Example 3: run

import org.kitesdk.data.Dataset; // import the required package/class
@Override
public int run(String[] args) throws Exception {
  // Load the users dataset
  Dataset<Record> users = Datasets.load(
      "dataset:hdfs:/tmp/data/users", Record.class);

  // Get a reader for the dataset and read all the users
  DatasetReader<Record> reader = null;
  try {
    reader = users.newReader();
    for (GenericRecord user : reader) {
      System.out.println(user);
    }

  } finally {
    if (reader != null) {
      reader.close();
    }
  }

  return 0;
}
 
Developer: kite-sdk | Project: kite-examples | Lines: 23 | Source: ReadUserDatasetGeneric.java


Example 4: run

import org.kitesdk.data.Dataset; // import the required package/class
@Override
public int run(String[] args) throws Exception {
  // Load the users dataset
  Dataset<Record> users = Datasets.load(
      "dataset:hive?dataset=users", Record.class);

  // Get a reader for the dataset and read all the users
  DatasetReader<Record> reader = null;
  try {
    reader = users.newReader();
    for (GenericRecord user : reader) { // iterate the reader opened above rather than leaking a second one
      System.out.println(user);
    }

  } finally {
    if (reader != null) {
      reader.close();
    }
  }

  return 0;
}
 
Developer: kite-sdk | Project: kite-examples | Lines: 23 | Source: ReadHiveUserDatasetGeneric.java


Example 5: run

import org.kitesdk.data.Dataset; // import the required package/class
@Override
public int run(String[] args) throws Exception {
  // Load the users dataset
  Dataset<Record> users = Datasets.load(
      "dataset:hdfs:/tmp/data/users", Record.class);

  // Get a reader for the dataset and read all the users
  DatasetReader<Record> reader = null;
  try {
    reader = users.with("favoriteColor", "green").newReader();
    for (GenericRecord user : reader) {
      System.out.println(user);
    }

  } finally {
    if (reader != null) {
      reader.close();
    }
  }

  return 0;
}
 
Developer: kite-sdk | Project: kite-examples | Lines: 23 | Source: ReadUserDatasetGenericOnePartition.java


Example 6: run

import org.kitesdk.data.Dataset; // import the required package/class
@Override
public int run(String[] args) throws Exception {
  // Load the products dataset
  Dataset<Product> products = Datasets.load(
      "dataset:hdfs:/tmp/data/products", Product.class);

  // Get a reader for the dataset and read all the products
  DatasetReader<Product> reader = null;
  try {
    reader = products.newReader();
    for (Product product : reader) {
      System.out.println(product);
    }

  } finally {
    if (reader != null) {
      reader.close();
    }
  }

  return 0;
}
 
Developer: kite-sdk | Project: kite-examples | Lines: 23 | Source: ReadProductDatasetPojo.java


Example 7: run

import org.kitesdk.data.Dataset; // import the required package/class
@Override
public int run(String[] args) throws Exception {

  // Load the events dataset
  Dataset<GenericRecord> events = Datasets.load("dataset:hive:/tmp/data/default/events");

  // Get a reader for the dataset and read all the events
  DatasetReader<GenericRecord> reader = events.newReader();
  try {
    for (GenericRecord event : reader) {
      System.out.println(event);
    }
  } finally {
    reader.close();
  }

  return 0;
}
 
Developer: kite-sdk | Project: kite-examples | Lines: 19 | Source: ReadDataset.java


Example 8: run

import org.kitesdk.data.Dataset; // import the required package/class
@Override
public int run(String[] args) throws Exception {
  Dataset<Record> movies = Datasets.load(
      "dataset:hdfs:/tmp/data/movies", Record.class);

  DatasetReader<Record> reader = null;
  try {
    reader = movies.newReader();
    for (Record rec : reader) {
      System.err.println("Movie: " + rec);
    }

  } finally {
    if (reader != null) {
      reader.close();
    }
  }

  return 0;
}
 
Developer: kite-sdk | Project: kite-examples | Lines: 21 | Source: ReadMovies.java


Example 9: testParquetDataset

import org.kitesdk.data.Dataset; // import the required package/class
@Test
public void testParquetDataset() throws EventDeliveryException {
  Datasets.delete(FILE_DATASET_URI);
  Dataset<GenericRecord> created = Datasets.create(FILE_DATASET_URI,
      new DatasetDescriptor.Builder(DESCRIPTOR)
          .format("parquet")
          .build());

  DatasetSink sink = sink(in, config);

  // run the sink
  sink.start();
  sink.process();

  // the transaction should not commit during the call to process
  assertThrows("Transaction should still be open", IllegalStateException.class,
      new Callable() {
        @Override
        public Object call() throws EventDeliveryException {
          in.getTransaction().begin();
          return null;
        }
      });
  // The records won't commit until the call to stop()
  Assert.assertEquals("Should not have committed", 0, read(created).size());

  sink.stop();

  Assert.assertEquals(Sets.newHashSet(expected), read(created));
  Assert.assertEquals("Should have committed", 0, remaining(in));
}
 
Developer: moueimei | Project: flume-release-1.7.0 | Lines: 32 | Source: TestDatasetSink.java


Example 10: createDataset

import org.kitesdk.data.Dataset; // import the required package/class
private static Dataset createDataset(Schema schema,
    CompressionType compressionType, String uri) {
  DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
      .schema(schema)
      .format(Formats.PARQUET)
      .compressionType(compressionType)
      .build();
  return Datasets.create(uri, descriptor, GenericRecord.class);
}
 
Developer: aliyun | Project: aliyun-maxcompute-data-collectors | Lines: 10 | Source: ParquetJob.java
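
A hypothetical invocation of the helper above; since createDataset(...) is private to ParquetJob, a call like this would live in the same class. The schema and URI are placeholders invented for illustration.

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.kitesdk.data.CompressionType;
import org.kitesdk.data.Dataset;

// Placeholder schema for a two-column table
Schema schema = SchemaBuilder.record("ExampleRow")
    .fields()
    .requiredString("id")
    .requiredString("value")
    .endRecord();

// Create a Snappy-compressed Parquet dataset at a placeholder URI
Dataset created = createDataset(schema, CompressionType.Snappy,
    "dataset:hdfs:/tmp/data/example_rows");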


Example 11: configureInputFormat

import org.kitesdk.data.Dataset; // import the required package/class
@Override
protected void configureInputFormat(Job job, String tableName, String tableClassName,
    String splitByCol) throws ClassNotFoundException, IOException {
  fileType = getInputFileType();

  super.configureInputFormat(job, tableName, tableClassName, splitByCol);

  if (isHCatJob) {
    SqoopHCatUtilities.configureExportInputFormat(options, job, context.getConnManager(),
        tableName, job.getConfiguration());
    return;
  } else if (fileType == FileType.AVRO_DATA_FILE) {
    LOG.debug("Configuring for Avro export");
    configureGenericRecordExportInputFormat(job, tableName);
  } else if (fileType == FileType.PARQUET_FILE) {
    LOG.debug("Configuring for Parquet export");
    configureGenericRecordExportInputFormat(job, tableName);
    FileSystem fs = FileSystem.get(job.getConfiguration());
    String uri = "dataset:" + fs.makeQualified(getInputPath());
    Exception caughtException = null;
    try {
      DatasetKeyInputFormat.configure(job).readFrom(uri);
    } catch (DatasetNotFoundException e) {
      LOG.warn(e.getMessage(), e);
      LOG.warn("Trying to get data schema from parquet file directly");
      caughtException = e;
    }
    if (caughtException instanceof DatasetNotFoundException) { // instanceof is null-safe
      DatasetDescriptor descriptor = getDatasetDescriptorFromParquetFile(job, fs, uri);
      Dataset dataset = Datasets.create(uri, descriptor, GenericRecord.class);
      DatasetKeyInputFormat.configure(job).readFrom(dataset);
    }
  }

  FileInputFormat.addInputPath(job, getInputPath());
}
 
Developer: aliyun | Project: aliyun-maxcompute-data-collectors | Lines: 37 | Source: HdfsOdpsImportJob.java


Example 12: verifyHiveDataset

import org.kitesdk.data.Dataset; // import the required package/class
private void verifyHiveDataset(String tableName, Object[][] valsArray) {
  String datasetUri = String.format("dataset:hive:default/%s",
      tableName.toLowerCase());
  assertTrue(Datasets.exists(datasetUri));
  Dataset dataset = Datasets.load(datasetUri);
  assertFalse(dataset.isEmpty());

  DatasetReader<GenericRecord> reader = dataset.newReader();
  try {
    List<String> expectations = new ArrayList<String>();
    if (valsArray != null) {
      for (Object[] vals : valsArray) {
        expectations.add(Arrays.toString(vals));
      }
    }

    while (reader.hasNext() && expectations.size() > 0) {
      String actual = Arrays.toString(
          convertGenericRecordToArray(reader.next()));
      assertTrue("Expect record: " + actual, expectations.remove(actual));
    }
    assertFalse(reader.hasNext());
    assertEquals(0, expectations.size());
  } finally {
    reader.close();
  }
}
 
Developer: aliyun | Project: aliyun-maxcompute-data-collectors | Lines: 28 | Source: TestHiveImport.java


Example 13: testMultiTableImportAsParquetFormat

import org.kitesdk.data.Dataset; // import the required package/class
public void testMultiTableImportAsParquetFormat() throws IOException {
  String [] argv = getArgv(new String[]{"--as-parquetfile"}, null);
  runImport(new ImportAllTablesTool(), argv);

  Path warehousePath = new Path(this.getWarehouseDir());
  int i = 0;
  for (String tableName : this.tableNames) {
    Path tablePath = new Path(warehousePath, tableName);
    Dataset dataset = Datasets.load("dataset:file:" + tablePath);

    // dequeue the expected value for this table. This
    // list has the same order as the tableNames list.
    String expectedVal = Integer.toString(i++) + ","
        + this.expectedStrings.get(0);
    this.expectedStrings.remove(0);

    DatasetReader<GenericRecord> reader = dataset.newReader();
    try {
      GenericRecord record = reader.next();
      String line = record.get(0) + "," + record.get(1);
      assertEquals("Table " + tableName + " expected a different string",
          expectedVal, line);
      assertFalse(reader.hasNext());
    } finally {
      reader.close();
    }
  }
}
 
Developer: aliyun | Project: aliyun-maxcompute-data-collectors | Lines: 29 | Source: TestAllTables.java


Example 14: getExecutor

import org.kitesdk.data.Dataset; // import the required package/class
@VisibleForTesting
KiteDatasetExecutor getExecutor(LinkConfiguration linkConfiguration, String uri, Schema schema,
    FileFormat format) {
  // Note that instead of creating a dataset at destination, we create a
  // temporary dataset by every KiteLoader instance. They will be merged when
  // all data portions are written successfully. Unfortunately, KiteLoader is
  // not able to pass the temporary dataset uri to KiteToDestroyer. So we
  // delegate KiteDatasetExecutor to manage name convention for datasets.
  uri = KiteDatasetExecutor.suggestTemporaryDatasetUri(linkConfiguration.linkConfig, uri);
  LOG.info("Constructed temporary dataset URI: " + uri);
  Dataset<GenericRecord> dataset =
      KiteDatasetExecutor.createDataset(uri, schema, format);
  return new KiteDatasetExecutor(dataset);
}
 
Developer: vybs | Project: sqoop-on-spark | Lines: 15 | Source: KiteLoader.java


Example 15: getSchema

import org.kitesdk.data.Dataset; // import the required package/class
@Override
public Schema getSchema(InitializerContext context,
    LinkConfiguration linkConfig, FromJobConfiguration fromJobConfig) {
  String uri = ConfigUtil.buildDatasetUri(
      linkConfig.linkConfig, fromJobConfig.fromJobConfig.uri);
  Dataset dataset = Datasets.load(uri);
  org.apache.avro.Schema avroSchema = dataset.getDescriptor().getSchema();
  return AvroDataTypeUtil.createSqoopSchema(avroSchema);
}
 
Developer: vybs | Project: sqoop-on-spark | Lines: 10 | Source: KiteFromInitializer.java


Example 16: createDataset

import org.kitesdk.data.Dataset; // import the required package/class
/**
 * Creates a new dataset.
 */
public static Dataset<GenericRecord> createDataset(String uri, org.apache.sqoop.schema.Schema schema,
    FileFormat format) {
  Schema datasetSchema = KiteDataTypeUtil.createAvroSchema(schema);
  Format datasetFormat = KiteDataTypeUtil.toFormat(format);
  DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
      .property("kite.allow.csv", "true")
      .schema(datasetSchema)
      .format(datasetFormat)
      .build();
  return Datasets.create(uri, descriptor);
}
 
Developer: vybs | Project: sqoop-on-spark | Lines: 15 | Source: KiteDatasetExecutor.java


Example 17: getExecutor

import org.kitesdk.data.Dataset; // import the required package/class
@VisibleForTesting
KiteDatasetExecutor getExecutor(String uri, Schema schema,
    FileFormat format) {
  Dataset<GenericRecord> dataset =
      KiteDatasetExecutor.createDataset(uri, schema, format);
  return new KiteDatasetExecutor(dataset);
}
 
Developer: vybs | Project: sqoop-on-spark | Lines: 8 | Source: KiteToDestroyer.java


Example 18: run

import org.kitesdk.data.Dataset; // import the required package/class
public void run(@DataIn(name = "event_stream", type = SmallEvent.class)
                JavaDStream<SmallEvent> stream,
                @DataOut(name = "event_output", type = SmallEvent.class)
                Dataset<SmallEvent> output) {

  SparkDatasets.save(stream, output);
}
 
Developer: rbrush | Project: kite-apps | Lines: 8 | Source: StreamingSparkJob.java


Example 19: testJobConfiguration

import org.kitesdk.data.Dataset; // import the required package/class
@Test
public void testJobConfiguration() {

  Map<String,String> settings = ImmutableMap.of("test.app.level.setting", "appvalue",
      "kite.job.write-config-job.test.job.level.setting", "jobvalue",
      "kite.job.write-config-job.output.kv-output.test.output.level.setting", "outputvalue");

  AppContext context = new AppContext(settings, getConfiguration());

  TestScheduler scheduler = TestScheduler.load(WriteConfigOutputApp.class, context);

  Instant nominalTime = new DateTime(2015, 5, 15, 12, 0, 0, 0, DateTimeZone.UTC).toInstant();

  scheduler.runScheduledJobs(nominalTime);

  Dataset<KeyValues> ds = Datasets.load(WriteConfigOutputApp.OUTPUT_DATASET, KeyValues.class);

  DatasetReader<KeyValues> reader = ds.newReader();

  try {

    KeyValues kv = reader.next();

    Assert.assertEquals(ImmutableMap.of(
            "test.app.level.setting", "appvalue",
            "test.job.level.setting", "jobvalue",
            "output.kv-output.test.output.level.setting", "outputvalue"),
        kv.getJobsettings());

    Assert.assertEquals(ImmutableMap.of(
            "test.app.level.setting", "appvalue",
            "test.job.level.setting", "jobvalue",
            "test.output.level.setting", "outputvalue"),
        kv.getOutputsettings());

  } finally {
    Closeables.closeQuietly(reader);
  }
}
 
Developer: rbrush | Project: kite-apps | Lines: 40 | Source: SchedulableJobTest.java


Example 20: run

import org.kitesdk.data.Dataset; // import the required package/class
public void run(@DataIn(name = "event_stream", type = ExampleEvent.class)
                JavaDStream<ExampleEvent> stream,
                @DataOut(name = "event_output", type = ExampleEvent.class)
                Dataset<ExampleEvent> output) {

  SparkDatasets.save(stream, output);
}
 
Developer: rbrush | Project: kite-apps | Lines: 8 | Source: TopicToDatasetJob.java



Note: The org.kitesdk.data.Dataset class examples in this article were compiled from source code and documentation hosted on GitHub, MSDocs, and similar platforms. The snippets were selected from open-source projects contributed by their respective authors, and copyright remains with the original authors; consult each project's license before reusing or redistributing the code.

