
Cloud Collections using the Actor Framework

The Actor Framework provides a platform for writing Actors that store state in the cloud, with replication across multiple machines to provide high availability. An obvious use is to store data for an application running on a device. Instead of using a database, the most natural way to do this is through existing abstractions that represent data, such as the .NET collection interfaces like System.Collections.Generic.IList<T>. This document describes building the client-side portion of these collections.

System.Cloud.Collections

We have built some compelling client-side collections that store their data in the cloud, using actors. Specifically, we have produced CloudList<T> and CloudDictionary<TKey, TValue> modeled after .NET’s existing generic collections. We intend to grow this over time to match the .NET collections that exist today.

Some simple observations informed our design.

1) The .NET collection interfaces are a natural way of programming today, and interoperability with them is paramount to pulling data from the cloud into an application in a transparent way.

2) The .NET collection interfaces are not asynchronous. For better responsiveness & resource utilization, apps should not block the UI thread or, ideally, any thread.

3) Failures are more common over a network connection than in local memory, by several orders of magnitude.

In response, we’ve introduced the following interfaces. While CloudList<T> implements IList<T> and friends today, the asynchronous interfaces below should help improve application responsiveness and let newly written code pay closer attention to error handling when working with Tasks.

namespace System.Cloud.Collections
{
   public interface ICollectionAsync<T> : IObservable<T>
   {
       Task<int> CountAsync { get; }
       Task<bool> IsReadOnlyAsync { get; }
       TimeSpan Timeout { get; set; }
 
       Task AddAsync(T item);
       Task ClearAsync();
       Task<bool> ContainsAsync(T item);
       Task CopyToAsync(T[] array, int arrayIndex);
       Task<bool> RemoveAsync(T item);
   }
 
   public interface IListAsync<T> : ICollectionAsync<T>
   {
       Task<T> GetItemAsync(int index);
       Task SetItemAsync(int index, T value);
       Task<int> IndexOfAsync(T item);
       Task InsertAsync(int index, T item);
       Task RemoveAtAsync(int index);
 
       // Less chatty versions
       Task AddAsync(IEnumerable<T> items);
       Task RemoveRangeAsync(int index, int count);
   }
 
   public interface IDictionaryAsync<TKey, TValue> : ICollectionAsync<KeyValuePair<TKey, TValue>>
   {
       Task<TValue> GetValueAsync(TKey key);
       Task SetValueAsync(TKey key, TValue value);
       Task<Tuple<bool, TValue>> TryGetValueAsync(TKey key);
 
       // No AddAsync - use SetValueAsync instead. We have no atomic operation to add iff a value is not in the dictionary.
       Task<bool> ContainsKeyAsync(TKey key);
       Task<bool> RemoveAsync(TKey key);
 
       // Bulk operations
       Task<ICollection<TValue>> GetValuesAsync(IEnumerable<TKey> keys);
       Task SetValuesAsync(IEnumerable<TKey> keys, IEnumerable<TValue> values);
       Task RemoveAsync(IEnumerable<TKey> keys);
 
       ICollection<TKey> Keys { get; }
       ICollection<TValue> Values { get; }
   }
}
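As a minimal sketch of how calling code might consume the Task-based shape above, the following awaits a TryGetValueAsync-style method, unpacks the Tuple<bool, TValue> result, and handles a timeout. FakeAsyncDictionary, its SimulateTimeout flag, and AsyncLookupDemo are hypothetical in-memory stand-ins written for this illustration; they are not part of System.Cloud.Collections, and the real collections may surface failures differently.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in that mimics only the shape of
// SetValueAsync and TryGetValueAsync from IDictionaryAsync<TKey, TValue>.
public sealed class FakeAsyncDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();

    // Assumption for this sketch: lets the demo simulate a network failure.
    public bool SimulateTimeout { get; set; }

    public Task SetValueAsync(TKey key, TValue value)
    {
        _items[key] = value;
        return Task.CompletedTask;
    }

    public async Task<Tuple<bool, TValue>> TryGetValueAsync(TKey key)
    {
        await Task.Yield(); // pretend the lookup left the calling thread
        if (SimulateTimeout)
            throw new TimeoutException("Simulated timeout.");
        TValue value;
        bool found = _items.TryGetValue(key, out value);
        return Tuple.Create(found, value);
    }
}

public static class AsyncLookupDemo
{
    // Awaits the lookup instead of blocking, and reports failures to the caller.
    public static async Task<string> DescribeAsync(
        FakeAsyncDictionary<string, int> scores, string player)
    {
        try
        {
            Tuple<bool, int> result = await scores.TryGetValueAsync(player);
            return result.Item1
                ? player + ": " + result.Item2
                : player + ": no score";
        }
        catch (TimeoutException)
        {
            return player + ": server unavailable";
        }
    }
}
```

Because the Tuple carries the "found" flag alongside the value, the caller needs no out parameter, which async methods cannot declare.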

CloudList

With these interfaces in place, CloudList<T> looks like a natural extension of List<T> for the cloud:

namespace System.Cloud.Collections
{
   public class CloudList<T> : IListAsync<T>, IList<T>, IDisposable
   {
       public CloudList(String server, int port, String listName);
       public CloudList(String hostEndpoint, String listName);
       public CloudList(String hostEndpoint, String listName, bool connectThroughGateway, int numReplicas = 2);
 
       public void Dispose(); // Disconnect from the actor (close sockets, etc)
       public void Destroy(); // Destroy the CloudList’s storage in the cloud
 
       // In addition to methods from IList<T> & IListAsync<T>, we’ve added: 
       public Task<T[]> ToArrayAsync();
       public Task SortAsync();
       public Task AddRangeAsync(IEnumerable<T> items);
       public Task RemoveRangeAsync(int index, int count);
   }
}

CloudList<T> communicates with a ListActor in the Actor Framework to create a list with the specified name and store all of its values in memory. By default, the list’s data is replicated across two machines in the cloud, but it is not persisted to disk. So if one machine goes down, your data is still safe, but if both are turned off, for example due to a power outage, all data is lost. You can increase the number of replicas by setting numReplicas as desired. We hope to implement some form of persistence in the future.

Data Binding to UI Controls

Further, we must support INotifyCollectionChanged to data bind a control to a CloudList<T>. For this purpose, we have ObservableCloudList<T>, which subscribes to the actor and listens for published changes. The only complication beyond the existing ObservableCollection<T> is that we must have some way of communicating with the application’s UI thread, so we require users to provide a SynchronizationContext.

   public class ObservableCloudList<T> : CloudList<T>, INotifyCollectionChanged,
                                         IReadOnlyObservableList<T>
   {
       public ObservableCloudList(String server, int port, String listName, SynchronizationContext uiThreadSyncContext);
       public ObservableCloudList(String hostEndpoint, String listName, SynchronizationContext uiThreadSyncContext);
       public ObservableCloudList(String hostEndpoint, String listName, SynchronizationContext uiThreadSyncContext, bool useGateway = false, int numReplicas = 2);
 
       public event NotifyCollectionChangedEventHandler CollectionChanged;
   }

This is sufficient for apps like a grocery list written in WPF, accessed from two machines (ideally running on two phones). For example, one person can be at the store buying items, while someone at home can know that certain items have been purchased and add other items to the grocery list with minimal delay. See the GroceryList sample included with the Actor Framework. Here is how an ObservableCloudList<T> can be initialized, using a user-defined data type GroceryItem:

       // Run this on the UI thread.
       public static void InitializeGroceries()
       {
           // Use ObservableCloudList (similar to ObservableCollection) which implements INotifyCollectionChanged, so UI updates itself.
           try
           {
               // ObservableCloudList<T> needs the UI thread's synchronization context to respond to changes from other clients appropriately.
               _groceries = new ObservableCloudList<GroceryItem>("localhost", 9000, "Family Grocery List", SynchronizationContext.Current);
 
               if (_groceries.Count == 0)
               {
                   // Note - if we use AddAsync here, consider blocking until finished on each operation if we want to preserve order.
                   _groceries.Add(new GroceryItem("Milk", 1));
                   _groceries.Add(new GroceryItem("Cereal", 2));
                   _groceries.Add(new GroceryItem("Soda - 24 pack", 2));
                   _groceries.Add(new GroceryItem("The Macallan - 15 year", 1));
               }
           }
           catch (SocketException e)
           {
               // error handling
               Contract.Assert(e == null, e.Message);
           }
           catch (TimeoutException e)
           {
               MessageBox.Show(String.Format("Timed out when contacting the server for a CloudList. {0}", e.Message), "GroceryList timed out connecting to server");
           }
       }

The app then simply data binds to the ObservableCloudList<T> in the same way it would with a local List<T> or ObservableCollection<T>. That’s it – no additional work required, though error handling and using async methods would improve the app’s quality.
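The comment in the sample above notes that order matters when switching to AddAsync. One way to keep insertion order without blocking a thread is to await each call before issuing the next. The sketch below assumes a hypothetical FakeAsyncList<T> whose AddAsync completes after a short delay; it is an in-memory stand-in for illustration, not the CloudList<T> implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in: AddAsync finishes after a short delay,
// roughly like a round trip to a remote list.
public sealed class FakeAsyncList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly object _gate = new object();

    public async Task AddAsync(T item)
    {
        await Task.Delay(5); // simulated network latency
        lock (_gate) { _items.Add(item); }
    }

    public T[] ToArray()
    {
        lock (_gate) { return _items.ToArray(); }
    }
}

public static class OrderedAddDemo
{
    // Awaits each add before starting the next, so items arrive in order.
    public static async Task<string[]> AddInOrderAsync(
        FakeAsyncList<string> list, IEnumerable<string> items)
    {
        foreach (string item in items)
        {
            await list.AddAsync(item);
        }
        return list.ToArray();
    }
}
```

Starting all of the adds at once and waiting on them together would finish sooner, but then nothing guarantees the order in which they are applied.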

This client-side development should be an extremely intuitive way for .NET developers to access & store data in the cloud. This should work for high score lists and any other place developers use a list.

Mixing LINQ Queries & Databinding using ObservableCloudList<T>

Sometimes you want to apply a simple transformation to objects before data binding them to a list. For example, perhaps you have a list of account names as Strings, but you want to data bind a ListBox to client-side objects representing those accounts. You also want updates to the original collection to be pushed through to your UI: if you add or remove an account name from the list, the UI should update itself automatically. Conceptually, you’re looking for a simple LINQ query that projects a String to your rich, UI-aware client-side objects:

   _listBox.ItemsSource = accountNames.Select(name => new Person(name));

As discussed above, the INotifyCollectionChanged interface goes a long way to solving this problem. However, it is insufficient on its own because LINQ queries lose the type identity. By default, the call to Select above returns a plain IEnumerable<Person>, which doesn’t implement INotifyCollectionChanged, so any updates are lost.

We have fixed this problem for ObservableCloudList<T>. This required adding in a marker interface:

   public interface IReadOnlyObservableList<T> : IReadOnlyList<T>, INotifyCollectionChanged
   {
   }

Using this marker interface, we can then provide an extension method that the C# compiler will prefer over the standard LINQ Select, because its receiver type is more specific than IEnumerable<T>:

 
   public static class LinqExtensions
   {
       public static IReadOnlyObservableList<U> Select<T, U>(this IReadOnlyObservableList<T> self, Func<T, U> projection);
   }

With an appropriate implementation, Select will now preserve the INotifyCollectionChanged aspect of an ObservableCloudList<T> while also doing the projection required by Select. Internally this call to Select separates out the CollectionChanged event stream from the collection’s contents, projects both independently, and merges them back together.
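To illustrate the idea, here is a simplified sketch of one possible implementation; it is not the Actor Framework's actual code. ProjectedObservableList<T, U> and LocalObservableList<T> are hypothetical names introduced for this example, and LocalObservableList<T> is a local stand-in for ObservableCloudList<T>. The wrapper projects contents on demand and re-raises Add and Remove events with projected items, falling back to Reset for every other action.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Collections.Specialized;
using System.Linq;

public interface IReadOnlyObservableList<T> : IReadOnlyList<T>, INotifyCollectionChanged
{
}

// Hypothetical local stand-in for ObservableCloudList<T>.
public sealed class LocalObservableList<T> : ObservableCollection<T>, IReadOnlyObservableList<T>
{
}

// Hypothetical wrapper: projects both the contents and the change events.
public sealed class ProjectedObservableList<T, U> : IReadOnlyObservableList<U>
{
    private readonly IReadOnlyObservableList<T> _source;
    private readonly Func<T, U> _projection;

    public ProjectedObservableList(IReadOnlyObservableList<T> source, Func<T, U> projection)
    {
        _source = source;
        _projection = projection;
        _source.CollectionChanged += OnSourceChanged;
    }

    public event NotifyCollectionChangedEventHandler CollectionChanged;

    // Contents are projected lazily, on each access.
    public int Count => _source.Count;
    public U this[int index] => _projection(_source[index]);

    public IEnumerator<U> GetEnumerator() =>
        Enumerable.Select(_source, _projection).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Change events are projected separately, then re-raised from this wrapper.
    private void OnSourceChanged(object sender, NotifyCollectionChangedEventArgs e)
    {
        NotifyCollectionChangedEventArgs projected;
        switch (e.Action)
        {
            case NotifyCollectionChangedAction.Add:
                projected = new NotifyCollectionChangedEventArgs(
                    e.Action, Project(e.NewItems), e.NewStartingIndex);
                break;
            case NotifyCollectionChangedAction.Remove:
                projected = new NotifyCollectionChangedEventArgs(
                    e.Action, Project(e.OldItems), e.OldStartingIndex);
                break;
            default:
                // Simplification for this sketch: ask listeners to re-read everything.
                projected = new NotifyCollectionChangedEventArgs(
                    NotifyCollectionChangedAction.Reset);
                break;
        }
        CollectionChanged?.Invoke(this, projected);
    }

    private IList Project(IList items) =>
        items.Cast<T>().Select(_projection).ToList();
}

public static class LinqExtensions
{
    public static IReadOnlyObservableList<U> Select<T, U>(
        this IReadOnlyObservableList<T> self, Func<T, U> projection)
    {
        return new ProjectedObservableList<T, U>(self, projection);
    }
}
```

This sketch calls the projection again on every read and on removed items, so it suits cheap, side-effect-free projections. A fuller implementation would likely cache projected objects and unsubscribe from the source when it is no longer needed.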

CloudDictionary

Similar to Dictionary<TKey, TValue>, we provide a CloudDictionary<TKey, TValue> for mapping keys to values. We’ve also produced a CloudStringDictionary<TValue>, since many dictionaries use Strings as keys and our actors work best with keys that map easily to Strings. With CloudDictionary, you can look up values in a way that is natural for your program. Consider a CloudDictionary<Customer, List<Order>>:

 
   CloudDictionary<Customer, List<Order>> orderMap = new CloudDictionary<Customer, List<Order>>(fabricAddress, "Customer Orders");
   Customer customer = …
   List<Order> orders = orderMap[customer];

It's important that any changes to values stored within the dictionary are written back to the dictionary after you’re done. Unlike .NET collections, this write step is required to make your changes visible on other machines. This is conceptually like updating a value in a database – any local mutations are purely local until you update the value remotely.
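The write-back requirement can be sketched with a small in-memory model. FakeRemoteOrders below is a hypothetical stand-in that copies values on every read and write, which approximates what serialization to a remote store does; it is an assumption made for illustration, not CloudDictionary itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a remote dictionary: values are copied on the way
// in and out, much as they would be when serialized across the network.
public sealed class FakeRemoteOrders
{
    private readonly Dictionary<string, List<string>> _store =
        new Dictionary<string, List<string>>();

    public List<string> this[string customer]
    {
        get { return new List<string>(_store[customer]); }   // copy out
        set { _store[customer] = new List<string>(value); }  // copy in
    }
}

public static class WriteBackDemo
{
    // Returns the stored order count before and after the write-back step.
    public static (int Before, int After) Run()
    {
        var orderMap = new FakeRemoteOrders();
        orderMap["alice"] = new List<string> { "order-1" };

        List<string> orders = orderMap["alice"];
        orders.Add("order-2");                   // changes the local copy only
        int before = orderMap["alice"].Count;    // the store has not seen the change

        orderMap["alice"] = orders;              // write the value back
        int after = orderMap["alice"].Count;     // the store now reflects it

        return (before, after);
    }
}
```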

With this support, you can think of the Actor Framework as providing the foundation for a NoSQL store, and our cloud collections let you deal with this remote data store in the most natural way possible for a .NET developer.

Here is the public set of APIs:

namespace System.Cloud.Collections
{
   public class CloudDictionary<TKey, TValue> : IDictionaryAsync<TKey, TValue>, IDictionary<TKey, TValue>, IDisposable
   {
       public CloudDictionary(String hostEndpoint, String dictionaryName,
           Func<TKey, String> toIdentityString = null,
           Func<String, TKey> fromIdentityString = null,
           bool connectThroughGateway = false);
 
       public void Dispose();
       public void Destroy(); // Destroy the CloudDictionary’s storage in the cloud
 
       // All methods from IDictionary<TKey, TValue> and IDictionaryAsync<TKey, TValue>
   }
}

Subtle notes on Equality & Hashing

Equality and hashing are on the surface easy concepts, but the .NET Framework allows types to override GetHashCode & Equals, and allows users of a specific Dictionary to provide their own IEqualityComparer instances. This allows a wide range of flexibility, such as supporting case-insensitive string lookups, as well as hashing & comparing only a small number of fields of a key. These two approaches to customizing the notion of equality are useful, but don’t fit as well with where we want to grow the Actor Framework. Specifically, we’d like to fully support clients in JavaScript and other languages where such notions of equality and hashing are represented differently, if at all. So we do not currently let users pass custom implementations of equality or hashing functions to a CloudDictionary.

Under the covers, the best mapping to actors is to use Strings for keys. So CloudDictionary<TKey, TValue> is simply layered on top of a CloudStringDictionary<TValue>. The CloudDictionary takes two functions that map a TKey to and from an identity String, encapsulating the useful fields that define equality between two TKeys. For example, if you have a Customer object and are interested in only the customer’s name but not which city they live in, you can provide a function that maps Customer to just the first & last name fields. All lookups of Customer instances would then ignore the city in all comparisons. By default, we use JSON serialization to convert to and from a string, which may include a few more fields than you intend. Understanding the version-tolerant serialization attributes and, where necessary, ISerializable will give you the right foundation for building useful key types.
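As a sketch of the Customer case described above, the pair of functions below maps a customer to and from an identity String built from only the first and last name. The Customer type, its property names, and the '|' separator are assumptions made for this illustration.

```csharp
using System;

// Hypothetical key type for illustration.
public sealed class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string City { get; set; }
}

public static class CustomerIdentity
{
    // Only the name fields take part in identity; City is deliberately left out.
    // Assumption: names never contain the '|' separator.
    public static string ToIdentityString(Customer customer)
    {
        return customer.FirstName + "|" + customer.LastName;
    }

    // Rebuilds a key from its identity String. City cannot be recovered,
    // because it was never part of the identity.
    public static Customer FromIdentityString(string identity)
    {
        string[] parts = identity.Split('|');
        return new Customer { FirstName = parts[0], LastName = parts[1] };
    }
}
```

Based on the constructor signature shown earlier, these two methods would be passed as the toIdentityString and fromIdentityString arguments, after which two Customer instances that differ only in City resolve to the same entry.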

Last edited Jun 4, 2013 at 12:08 AM by BrianGru, version 7
