Brown Bag Seminar on Technologies for Data Sciences

Organizers: Duncan Temple Lang & Jerome Braun

The idea is to identify, become familiar with, discuss, and compare technologies that are increasingly used in research and industry in the area of "Data Sciences". We'll explore these together informally, educating each other as we go.

The following are some sample topics we might discuss. The list is intentionally vague and incomplete. I strongly encourage each of you to suggest others. Just send me email.

Also, I really want and expect people to volunteer to research one of these or other related topics and to give a presentation. You don't have to be an expert in the topic, just interested in exploring it and willing to spend time investigating it, with as much of my help and guidance as you want. You can partner with somebody and give a joint presentation. Please email me if you have identified a topic and are willing to give a talk.

  • Describe computing challenges/problems you are working on and get input from the group
    Presentations on work in progress, where you want feedback about approaches and design, are very welcome. They can be short: introduce the basic task, pose the questions, and then we can query and discuss.
  • Strategies & Packages for Handling Big Data in R
  • Parallel & Distributed Computing
    • Scheduling
    • Fault Tolerance
    • Queueing software
    • MPI, multi-core, snow
  • Cloud Computing & Interfaces
  • Hadoop
  • Functional programming for parallelization
  • MapReduce
    succinct overview
  • PIG programming language
  • Text Search and Processing
    • Lucene
    • ElasticSearch
    • Solr
  • NoSQL databases
  • GPUs (Graphics Processing Units) for Scientific Computing
  • Visualization for Large Data
  • Choice of BLAS
    Benchmarks paper
  • Dynamic Documents & Reproducible Research
    • Sweave, knitr, etc.
    • odfWeave, RWordXML
    • XDynDocs - XML and XSL
    • Emacs Org mode, esp. the Babel extension
  • Interfacing to Native/compiled code
    • Rcpp & Co. ("Not your father's C++")
  • Writing R software & packages
  • Programming Languages
    C, C++, Python, Julia, MATLAB, ...
  • Version Control
  • Productivity tools
    Workflow, Editors, Programming Environment, etc.
  • Statistics
    • Regularization
    • Controlling False Discovery Rate (FDR) for massively multiple testing
  • Streaming data and real-time estimation
  • Network Discovery and Analysis
    • Gene Regulatory Networks
    • Communities in Social Media
    • Collaboration/Co-author Networks (e.g., academic papers, Wikipedia, etc.)

  • Duncan Temple Lang <duncan@wald.ucdavis.edu>
    Last modified: Tue Apr 17 15:19:29 PDT 2012