Scott Colestock's talk on AppFabric Caching

Presentation Transcript

Everything you wanted to know about Velocity (but were afraid to cache) : 

Scott Colestock
scott@marcatopartners.com
Marcato Partners, LLC

About Scott : 

- Scott Colestock
- scott@colestock.net
- Twitter: scolestock
- Marcato Partners (MarcatoPartners.com)
  - One of three partners
  - Focused on agile coaching
  - Focused on helping early-stage startup ventures in the mobile space

What is it? : 

Velocity is a distributed key/value cache that provides .NET developers with a way to increase performance and scalability when writing data-centric applications.

What is it? (2) : 

- The combined RAM available to all servers in a Velocity cluster is presented to Velocity clients as a unified whole
- Any serializable CLR object can be stored
- Actual location within the cluster is transparent
- The client is a simple key/value API at heart
- Runs as a service accessed across the network
- Additional servers can be added on demand

What we’ll cover : 

- What motivates this product/technology
- Terms / Pictures / Concepts
- Deploy / Install Process
- A lap around the API & Admin model
- Demos
- Gotchas

Motivation : 

- Data-centric applications have been the norm for a long while
  - Relational data
  - More recently, “service-obtained” data
- Velocity is about increasing performance by bringing the data physically closer to the consumer
  - Reduce pressure on underlying data stores/services
- Velocity can be about storing data in value-added form (logically closer to the consumer)
  - Object graphs
  - Output caching (not explicit in V1)
  - Aggregated data in XML or other transformed formats

Motivation (2) : 

- Databases are always a point of high contention as you scale out, and tuning is expensive
- Are your data retrieval sprocs getting harder to maintain, with excessive SQL chops required?
- Service calls for reference data (internal/external) are often slow or intentionally throttled
- Caching has always been considered a solution for these issues…

Motivation (3) : 

- Machine-local caching solutions (like Microsoft’s “Enterprise Library Caching Application Block”) can provide a partial answer
  - Easy key/value API
  - Flexible store (memory, disk-backed, etc.)
  - Flexible expiration and eviction policy
- Limitations:
  - Limited by the memory available to a single node…
  - Application recycles typically mean you lose the cache
  - In a load-balanced environment, a large data set means you will frequently “miss” when attempting to load from cache…

Motivation (4) : 

[Diagram: three machine-local caches, holding keys 3, 5, 23; keys 7, 11, 47; and keys 12, 16, 33]

Machine-local caches wind up being sparsely populated when used with a load balancer (if the data set has many keys)
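
The sparse-population effect can be illustrated with a small simulation. This is only a sketch in Python, not anything from the talk or the Velocity API: the round-robin routing, the `count_misses` helper, and the request pattern are all assumptions chosen to make the effect visible.

```python
def count_misses(requests, num_servers, shared):
    """Count cache misses behind a round-robin load balancer.

    requests: sequence of keys, in arrival order.
    shared:   True for one logical cache, False for one cache per server.
    """
    # With a shared cache, every server sees the same set of cached keys.
    caches = [set()] if shared else [set() for _ in range(num_servers)]
    misses = 0
    for i, key in enumerate(requests):
        server = i % num_servers            # round-robin routing (assumption)
        cache = caches[0] if shared else caches[server]
        if key not in cache:
            misses += 1                     # go to the database, then cache it
            cache.add(key)
    return misses


# 12 requests cycling over 4 keys, spread across 3 servers.
requests = [i % 4 for i in range(12)]
local_misses = count_misses(requests, num_servers=3, shared=False)
shared_misses = count_misses(requests, num_servers=3, shared=True)
```

In this contrived pattern every request lands on a server that has not yet seen that key, so the per-server caches miss on all 12 requests while the single logical cache misses only on the first request for each of the 4 keys.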

Motivation (5) : 

- With machine-local caches, you have no central place to update/delete cached items
  - This means you can only cache data that can afford to be stale by some time period
  - If the time period is short, you need a low TTL (time-to-live, aka expiration), which means more cache misses
  - You can’t cache data that must have changes visible to the system in (near) real time
- With a single logical cache, you have one cache to shoot in the event of an update/delete
  - Might be able to live with no expiration

Windows Server AppFabric Caching : 

- History: AppFabric caching was a separate component
  - Public debut at TechEd 2008 (earlier?)
  - Codename: Velocity
- “Dublin” was a separate effort, focused on providing a hosting and management environment around WCF/WF
- November 2009: Technologies grouped under heading of “Windows Server AppFabric”
- RTW in June 2010…

Relationship to Windows Azure AppFabric : 

- Service bus:
  - Handle communication and authentication for accessing applications
  - Expose apps through firewalls, NAT gateways, etc.
  - Assist cloud-based apps talking to on-premise apps
  - Other composite app scenarios; pub/sub
- Access Control Service:
  - Allow you to avoid setting up federated identity agreements just to grant partner/customer access to your cloud-based or on-premise apps.
- Today: Only common marketing/branding with Windows Server AppFabric.
- Later: Common services for both

Cache-Aside Pattern : 

- In the current version, the out-of-box support is for the “cache-aside” pattern.
  - Check cache
  - If miss, retrieve data, then populate the cache
- Lots of other patterns you might contemplate (and simulate) with what is provided
  - Read-through/Write-through
  - Refresh-ahead/Write-behind
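
The two cache-aside steps above can be sketched in a few lines. This is a Python illustration with a plain dictionary standing in for the cache, not the AppFabric client API; `CacheAside` and `load_product` are hypothetical names, and treating `None` as a miss is a simplifying assumption.

```python
class CacheAside:
    def __init__(self, cache, load_from_source):
        self.cache = cache                    # any dict-like key/value store
        self.load_from_source = load_from_source

    def get(self, key):
        value = self.cache.get(key)           # 1. check cache
        if value is None:                     # 2. miss: go to the source
            value = self.load_from_source(key)
            self.cache[key] = value           # 3. populate the cache
        return value


calls = []

def load_product(key):
    calls.append(key)                         # record each trip to the source
    return {"id": key, "name": f"product-{key}"}

products = CacheAside(cache={}, load_from_source=load_product)
first = products.get(42)     # miss: loads from the source
second = products.get(42)    # hit: served from the cache
```

The calling code owns the "load on miss" logic, which is what distinguishes cache-aside from read-through, where the cache itself would know how to reach the source.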

Logical Hierarchy : 

[Diagram labels: Cache Cluster; Server A / Cache Host A; Server B / Cache Host B; Server C / Cache Host C; Named Cache: Product Catalog; Default Cache; Region: Sports; Region 1; Region 3]

- Client apps work with a single logical unit of cache
- Server process is DistributedCacheService.exe
- Caches explicitly created with TTL, expiration, HA policy
- Regions represent a partition of data (subset of key/value pairs). Live on one node. Unit of replication/failover.
- Regions can be implicit or explicit. Use explicit only for bulk gets or searching.

Physical Layout : 

[Diagram labels: Load Balancer; Web Server A, B, C (each IIS 7.x); Cache Cluster with Cache Server A, B, C (each running a Cache Host)]

Combined Deployment : 

Combined Deployment

Physical Layout : 

- Configuration store contains cache policies and global partition map (how keys divide into regions, which servers have which regions)
- If using the SQL config store, servers will send heartbeat to SQL. Otherwise, heartbeat goes to one or more “lead hosts”
- Partition map used by “Global Partition Manager” (one node in the cluster, but auto failover) to communicate routing information to Velocity clients
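
A rough way to picture the partition map is as two lookups: key to region, then region to host. The Python sketch below is a toy model under stated assumptions; the CRC32 hash, the fixed `NUM_REGIONS`, the round-robin assignment, and the `PartitionMap` class are all illustrative and say nothing about how Velocity actually hashes keys or places regions.

```python
import zlib

NUM_REGIONS = 8  # assumed number of implicit regions, for illustration only


def region_for_key(key):
    # A stable hash, so every client maps a key to the same region.
    return zlib.crc32(key.encode("utf-8")) % NUM_REGIONS


class PartitionMap:
    def __init__(self, hosts):
        self.hosts = list(hosts)
        # Spread regions over the hosts round-robin.
        self.region_to_host = {
            r: self.hosts[r % len(self.hosts)] for r in range(NUM_REGIONS)
        }

    def host_for_key(self, key):
        return self.region_to_host[region_for_key(key)]

    def fail_host(self, failed):
        # Reassign the failed host's regions to the surviving hosts.
        self.hosts.remove(failed)
        for i, (region, host) in enumerate(sorted(self.region_to_host.items())):
            if host == failed:
                self.region_to_host[region] = self.hosts[i % len(self.hosts)]


pmap = PartitionMap(["CacheHostA", "CacheHostB", "CacheHostC"])
before = pmap.host_for_key("product:42")
pmap.fail_host("CacheHostA")
after = pmap.host_for_key("product:42")
```

The sketch covers routing only. It does not move any cached data, so a region reassigned this way would start out empty unless a secondary copy existed.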

Regions as unit of replication/failover (Global Partition Manager in action) : 

[Diagram labels: Cache Cluster; Server A / Cache Host A; Server B / Cache Host B; Server C / Cache Host C; Named Cache: Product Catalog; Default Cache; Region: Sports; Region 1]

Regions as unit of replication/failover (When using Secondaries) : 

[Diagram labels: Cache Cluster; Server A / Cache Host A; Server B / Cache Host B; Server C / Cache Host C; Named Cache: Product Catalog; Default Cache; Region: Sports; Region 1; Sports secondary; Region 1 secondary]

(Updates done synchronously)
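
The idea of a region with a synchronously updated secondary can be modeled as below. This is a deliberately simplified Python sketch: `ReplicatedRegion` and its methods are hypothetical, both copies live in one process, and rebuilding a new secondary after failover is not modeled.

```python
class ReplicatedRegion:
    def __init__(self, primary_host, secondary_host):
        self.primary_host = primary_host
        self.secondary_host = secondary_host
        self.primary = {}      # data held on the primary host
        self.secondary = {}    # copy held on the secondary host

    def put(self, key, value):
        # Synchronous update: both copies are written before put() returns.
        self.primary[key] = value
        if self.secondary_host is not None:
            self.secondary[key] = value

    def get(self, key):
        return self.primary.get(key)

    def fail_primary(self):
        # Promote the secondary; the region now has no backup copy
        # until a new secondary is built (not modeled here).
        self.primary_host, self.secondary_host = self.secondary_host, None
        self.primary, self.secondary = self.secondary, {}


sports = ReplicatedRegion("CacheHostA", "CacheHostB")
sports.put("sku-1", "soccer ball")
sports.fail_primary()
```

Because the write reaches the secondary before `put` returns, the item is still readable after the primary is lost, at the cost of extra latency on every write.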

Local Cache : 

- Local cache is an option that can be enabled when creating the cache client (DataCacheFactory)
- Allows a local cache to be populated that will prevent a network hop (and serialization) if the request can be satisfied locally
- Best when data set is (relatively) small, changes infrequently, and stale data is acceptable
- Can expire via TTL or notifications (which might be late/lost)
- Can specify max object count before evicting LRU
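
The TTL and max-object-count behavior described above can be sketched as a small in-process cache. This is a Python stand-in, not the AppFabric local cache: the `LocalCache` class, its parameters, and the injected fake clock are assumptions made so the example is self-contained, and notification-based invalidation is left out.

```python
from collections import OrderedDict


class LocalCache:
    def __init__(self, max_count, ttl_seconds, clock):
        self.max_count = max_count
        self.ttl = ttl_seconds
        self.clock = clock                  # callable returning current time
        self.items = OrderedDict()          # key -> (value, expires_at)

    def put(self, key, value):
        self.items[key] = (value, self.clock() + self.ttl)
        self.items.move_to_end(key)         # newest entry is most recent
        while len(self.items) > self.max_count:
            self.items.popitem(last=False)  # evict least recently used

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None                     # miss: caller goes to the cluster
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.items[key]             # expired by TTL
            return None
        self.items.move_to_end(key)         # mark as recently used
        return value


now = [0.0]                                 # fake clock for the example
local = LocalCache(max_count=2, ttl_seconds=30, clock=lambda: now[0])
local.put("a", 1)
local.put("b", 2)
local.get("a")                              # "a" is now more recent than "b"
local.put("c", 3)                           # over the limit: evicts "b"
evicted = local.get("b")
now[0] = 31.0                               # advance past the TTL
expired = local.get("a")
```

Both `evicted` and `expired` come back as `None` here, the first because of the object-count limit and the second because of the TTL. Until the TTL fires, the local copy can be stale relative to the cluster, which is why the slide limits this option to data where that is acceptable.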

Data Types and Caching Considerations : 

- Reference Data: Product catalogs, “lookup” tables, other slow-moving content
  - Safe to cache for a defined period of time because you probably live with staleness already
  - “Local” cache option might be desirable for small data sets
- Activity Data: Shopping carts or other transient transaction state
  - Accessed for read and write operations, but not shared. Low/no concurrency considerations (exclusive write).
  - Safe to cache for reads and keep in cache for writes
- Resource Data: Inventory, Orders, and other core transactional data
  - Accessed concurrently for read and write
  - Caching will require a concurrency model to be chosen and managed

Deploy/Install Considerations : 

- Windows “Application Server” Role required
- A few critical updates (see install guide)
- .NET 3.5 SP1 for cache clients; .NET 4 for servers
- You’ll need PowerShell 2 (already in Win7/Win2k8R2)
- Windows XP cannot be a client…
- “Install” and “Configure” for AppFabric are two distinct steps

Deploy/Install Considerations : 

- Primary screen of interest is choosing your configuration store:
  - XML/File share
  - SQL-based
- File share avoids the need for SQL Server, but requires that some nodes in the cache cluster be special (“Lead Hosts”)
- Using SQL as the configuration store is the better engineering choice for production, though you may have other reasons to avoid it.

Deploy/Install Considerations : 

- As you build out your AppFabric Cache Cluster, you will do “New Cluster” on the first node, and “Join Cluster” on subsequent nodes
- Ultimately, all of Windows Server AppFabric is a set of features underneath the Application Server Role, so standard command line installations work:

  Setup.exe /install /i cachingservice,cacheclient,cacheadmin /l:c:\temp\setup.log

AppFabric as Application Server “Role Service” : 

AppFabric as Application Server “Role Service”

Deploy/Install Considerations : 

- Can do a “Cache client” install for clients, or for internal apps, just incorporate client assemblies in your own build/deploy process
  - Microsoft.ApplicationServer.Caching.Core.dll
  - Microsoft.ApplicationServer.Caching.Client.dll
  - Microsoft.WindowsFabric.Common.dll
  - Microsoft.WindowsFabric.Data.Common.dll

Caching Classes : 

- DataCacheFactory
  - DataCacheFactory()
  - DataCacheFactory(configuration)
  - DataCache GetCache(string cache)
  - GetDefaultCache()
- DataCacheFactoryConfiguration
  - LocalCacheProperties
  - NotificationProperties
  - SecurityProperties
  - DataCacheServerEndpoint[] Servers
  - (Can set these via configuration)
- DataCache

DataCache with DataCacheItemVersion : 

- GetCacheItem: returns tags and version info
- GetIfNewer: lets you use that version info!
- Put and Remove have overloads that take version info
  - Allows for an optimistic concurrency model
  - Will only succeed if version information matches what is current for the cached item
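
The optimistic model behind version-aware Put can be sketched as follows. This is a Python illustration of the concept only: `VersionedCache`, `VersionMismatch`, and the integer version counter are assumptions, not the shape of DataCacheItemVersion or of the real overloads.

```python
class VersionMismatch(Exception):
    pass


class VersionedCache:
    def __init__(self):
        self.items = {}                      # key -> (value, version)

    def get_item(self, key):
        return self.items[key]               # (value, version)

    def put(self, key, value, expected_version=None):
        current = self.items.get(key)
        current_version = current[1] if current else 0
        if expected_version is not None and expected_version != current_version:
            raise VersionMismatch(key)       # someone else updated it first
        self.items[key] = (value, current_version + 1)
        return current_version + 1


cache = VersionedCache()
cache.put("stock:42", 10)
value_a, version_a = cache.get_item("stock:42")   # client A reads
value_b, version_b = cache.get_item("stock:42")   # client B reads
cache.put("stock:42", value_a - 1, expected_version=version_a)  # A wins
try:
    cache.put("stock:42", value_b - 1, expected_version=version_b)
    b_succeeded = True
except VersionMismatch:
    b_succeeded = False                      # B must re-read and retry
```

Nobody holds a lock at any point. The second writer simply finds out at write time that its copy is out of date, which suits items that are rarely contended.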

DataCache and Locking : 

- GetAndLock: Allows you to lock a cache item for a specified time period, even if not present (will fail if already locked)

  public Object GetAndLock (string key, TimeSpan timeout, out DataCacheLockHandle lockHandle, bool forceLock)

  - Useful when attempting to get multiple servers to coordinate “cache pre-load” activity
- PutAndUnlock: Update and unlock an item, with given key and lock handle
- Unlock: Explicitly unlock, optionally extending the TTL
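
The "cache pre-load" coordination mentioned above can be sketched with a lock-handle model. This Python example only mirrors the idea of GetAndLock and PutAndUnlock; `LockingCache`, `ItemLocked`, the integer handles, and the fake clock are hypothetical, and real lock expiry and error reporting will differ.

```python
import itertools


class ItemLocked(Exception):
    pass


class LockingCache:
    def __init__(self, clock):
        self.clock = clock
        self.values = {}
        self.locks = {}                       # key -> (handle, expires_at)
        self._handles = itertools.count(1)

    def get_and_lock(self, key, timeout_seconds):
        lock = self.locks.get(key)
        if lock is not None and self.clock() < lock[1]:
            raise ItemLocked(key)             # another caller holds the lock
        handle = next(self._handles)
        self.locks[key] = (handle, self.clock() + timeout_seconds)
        return self.values.get(key), handle   # value may be None if absent

    def put_and_unlock(self, key, value, handle):
        lock = self.locks.get(key)
        if lock is None or lock[0] != handle:
            raise ItemLocked(key)             # released or re-acquired by another caller
        self.values[key] = value
        del self.locks[key]


now = [0.0]
cache = LockingCache(clock=lambda: now[0])
_, handle = cache.get_and_lock("catalog", timeout_seconds=60)  # server 1 wins
try:
    cache.get_and_lock("catalog", timeout_seconds=60)          # server 2 loses
    second_got_lock = True
except ItemLocked:
    second_got_lock = False
cache.put_and_unlock("catalog", ["bike", "ball"], handle)      # pre-load done
```

Only the server that obtained the handle does the expensive load. The others see the lock failure and can wait and re-read rather than repeating the work, and the timeout keeps a crashed holder from blocking the item indefinitely.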

DataCache and Tags/Regions : 

- Explicitly created regions live on a single node…can create a hot spot for both call volume and memory growth
- But they offer bulk retrieval and flexible tag-based retrieves
- For secondary indexes, instead of regions: simulate secondary indexes with your own secondary-to-primary mapping cache
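
One way to read the "secondary-to-primary mapping cache" suggestion is as a second cache whose values are sets of primary keys. The Python sketch below uses two dictionaries as stand-ins for two named caches; the product shape and the helper names are made up for illustration.

```python
products = {}         # primary cache: product id -> product
ids_by_category = {}  # mapping cache: category -> set of product ids


def put_product(product):
    old = products.get(product["id"])
    if old is not None:
        # Keep the mapping consistent when the category changes.
        ids_by_category[old["category"]].discard(product["id"])
    products[product["id"]] = product
    ids_by_category.setdefault(product["category"], set()).add(product["id"])


def get_by_category(category):
    ids = ids_by_category.get(category, set())
    # Skip ids whose primary entry has since expired or been evicted.
    return [products[i] for i in sorted(ids) if i in products]


put_product({"id": 1, "category": "sports", "name": "ball"})
put_product({"id": 2, "category": "sports", "name": "bike"})
put_product({"id": 3, "category": "garden", "name": "rake"})
sports = get_by_category("sports")
```

Against a real distributed cache the two writes in `put_product` would not be atomic, so the mapping entry itself probably needs locking or versioning, and readers should tolerate ids that no longer resolve.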

Administrative Model : 

- Administration for AppFabric Caching is done purely through PowerShell
- Can administer the entire Cache Cluster from wherever the administrative portion of the install has been done; all nodes are addressable from a single command line location
- Use-CacheCluster points the shell at a particular cluster to administer
- Get-Command -module DistributedCacheAdministration

Gotchas : 

- Balance number of nodes in cluster with memory per node. Too many nodes = cluster overhead, too much memory per node = GC overhead
- If you don’t use the SQL config store, you need to manually run Start-CacheHost after reboot
- Consider the nature of data stored in cache, and secure appropriately (don’t let cache be weakest link)
  - SQL config store requires high SQL privileges right now at point of install
  - Currently service runs as network service account
- Consider what you will do when cache is down
  - You can go after source of truth
- How do you avoid leaving stale data in the cache?
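
For the "cache is down" case, one possible shape is to treat the cache as optional and fall back to the source of truth. This Python sketch assumes a hypothetical `CacheUnavailable` error and a `DownCache` test double; the exception a real client raises, and whether falling back is safe for your load, are things to check for your own system.

```python
class CacheUnavailable(Exception):
    """Stand-in for whatever error the cache client raises when it is down."""


class DownCache:
    def get(self, key):
        raise CacheUnavailable("cluster unreachable")

    def put(self, key, value):
        raise CacheUnavailable("cluster unreachable")


def get_with_fallback(cache, key, load_from_source):
    try:
        value = cache.get(key)
        if value is not None:
            return value
    except CacheUnavailable:
        # Cache is down: serve from the source of truth and skip the put.
        return load_from_source(key)
    value = load_from_source(key)             # ordinary cache miss
    try:
        cache.put(key, value)
    except CacheUnavailable:
        pass                                  # best effort; the value is still valid
    return value


order = get_with_fallback(DownCache(), "order:7", lambda k: {"id": k})
```

The request still succeeds, but every call now hits the underlying store, so the database has to be able to absorb the full load for as long as the cache is out.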

Resources : 

- AppFabric Caching and Deployment Guide: http://bit.ly/AppFabMgmt
- AppFabric Development Center: http://bit.ly/AppFabDevCtr
- AppFabric Forums: http://bit.ly/AppFabForum
- NHibernate integration: http://sourceforge.net/projects/nhcontrib/files/NHibernate.Caches/
- Entity Framework integration (basis for): http://code.msdn.microsoft.com/EFProviderWrappers
- Recent MSDN: http://msdn.microsoft.com/en-us/magazine/ff714581.aspx
- Ayende’s series on distributed hash tables (excellent): http://ayende.com/Blog/archive/2008/08/09/Patterns-for-using-Distributed-Hash-Tables-Conclusion.aspx

Thank you - : 

Questions?