Software

  1. Software escrow
  2. Testing
  3. Hadoop
  4. Apache Spark

Software escrow

  • third party that stores source code, compiled code, and manuals
  • hands over the stored material to the customer only under specific conditions agreed in the contract
  • protects the customer if the software developer goes bankrupt ≡ absence of patches

Testing

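  • fuzzing: feed random or malformed input to a program to trigger crashes
  • static analyzer: inspects code for defects without running it
  • Valgrind: detects memory leaks and invalid memory accesses at run time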
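
The fuzzing idea above can be sketched in a few lines. This is a minimal illustration, not a real fuzzing tool: `parse_record` is a hypothetical function under test (1-byte length prefix followed by a payload), and `fuzz` and its parameters are names made up for this example.

```python
import random

def parse_record(data: bytes):
    """Hypothetical parser under test: 1-byte length prefix, then payload."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return length, payload

def fuzz(target, runs=1000, max_len=16, seed=0):
    """Call target with random byte strings and collect unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so a failing input can be reproduced
    failures = []
    for _ in range(runs):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # documented rejection of bad input, not a bug
        except Exception as exc:  # anything else counts as a finding
            failures.append((data, exc))
    return failures

failures = fuzz(parse_record)
```

An empty `failures` list only means that no random input triggered an unexpected exception in this run; it does not prove the parser is correct.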

Hadoop

  • node types
    • NameNode:
      • coordinates data movement: keeps the map of where every block is stored and where it is replicated
      • active/standby
      • block size: 64 MB (Hadoop 1.x default), 128 MB (Hadoop 2.x and later default)
    • DataNode: store data
      • does not use RAID: redundancy comes from replicating blocks across nodes
  • MapReduce
    • batch processing, not real-time
    • query is broken into smaller tasks and distributed across nodes
    • intermediate results are written to disk: high disk I/O ≡ latency
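
The "query is broken into smaller tasks" idea can be sketched as a word count in plain Python. This is a single-process illustration of the map, shuffle, and reduce steps, not the Hadoop API; all function names here are made up for the example, and each input chunk stands in for a block that a real cluster would process on a separate node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def map_reduce(chunks):
    mapped = []
    for chunk in chunks:  # on a cluster, each chunk would be a separate map task
        mapped.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = map_reduce(["big data big", "data node"])
print(counts)
```

In this sketch everything stays in memory; in Hadoop the output of the map step goes to disk before the reduce step reads it, which is where the latency noted above comes from.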

Apache Spark

  • in-memory processing: keeps intermediate data in RAM instead of writing it to disk
  • Spark Streaming: real-time processing
  • integrates with messaging systems (e.g., Kafka)
  • DStream: discretized stream ≡ sequence of micro-batches
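
The micro-batch idea behind DStream can be illustrated without Spark. The sketch below is plain Python, not the Spark Streaming API: `micro_batches` and the sample events are hypothetical, and the batch interval of 1.0 second is an arbitrary choice for the example.

```python
from collections import Counter

def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-width time windows."""
    batches = {}
    for timestamp, value in events:
        index = int(timestamp // batch_interval)  # which window the event falls into
        batches.setdefault(index, []).append(value)
    # Windows with no events are skipped in this simplified sketch.
    return [batches[i] for i in sorted(batches)]

# Hypothetical stream: (seconds since start, payload)
events = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (2.5, "c"), (2.9, "a")]

batches = micro_batches(events, batch_interval=1.0)
per_batch_counts = [Counter(batch) for batch in batches]
```

Each small batch is then processed with ordinary batch logic (here, a count per window), which is why micro-batching gives near-real-time rather than per-event processing.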