Trading System Architecture PDF

Trading Floor Architecture

Executive Overview

Increased competition, higher market data volume, and new regulatory demands are some of the driving forces behind industry changes. Firms are trying to maintain their competitive edge by constantly changing their trading strategies and increasing the speed of trading.

A viable architecture has to include the latest technologies from both the network and application domains. It has to be modular to provide a manageable path for evolving each component with minimal disruption to the overall system. The architecture proposed by this paper is therefore based on a services framework. We examine services such as ultra-low latency messaging, latency monitoring, multicast, computing, storage, data and application virtualization, trading resiliency, trading mobility, and thin client.

The solution to the complex requirements of the next-generation trading platform must be built with a holistic mindset, crossing the boundaries of traditional silos such as business and technology or applications and networking.

The main goal of this document is to provide guidelines for building an ultra-low latency trading platform while optimizing the raw throughput and message rate for both market data and FIX trading orders.

To achieve this, we propose the following latency reduction technologies:

- High-speed interconnect: InfiniBand or 10 Gbps connectivity for the trading cluster
- High-speed messaging bus
- Application acceleration via RDMA without application re-coding
- Real-time latency monitoring and re-direction of trading traffic to the path with minimum latency

Industry Trends and Challenges

Next-generation trading architectures have to respond to increased demands for speed, volume, and efficiency. For example, the volume of options market data is expected to double after the introduction of options penny trading in 2007. There are also regulatory demands for best execution, which require handling price updates at rates that approach 1M msg/sec for exchanges.
They also require visibility into the freshness of the data and proof that the client got the best possible execution.

In the short term, speed of trading and innovation are key differentiators. An increasing number of trades are handled by algorithmic trading applications placed as close as possible to the trade execution venue. A challenge with these "black-box" trading engines is that they compound the volume increase by issuing orders only to cancel them and re-submit them. The cause of this behavior is a lack of visibility into which venue offers the best execution. The human trader is now a "financial engineer," a "quant" (quantitative analyst) with programming skills, who can adjust trading models on the fly. Firms develop new financial instruments such as weather derivatives or cross-asset class trades, and they need to deploy the new applications quickly and in a scalable fashion.

In the long term, competitive differentiation should come from analysis, not just knowledge. The star traders of tomorrow assume risk, achieve true client insight, and consistently beat the market (source IBM: www-935.ibmservicesusimcpdfge510-6270-trader. pdf).

Business resilience has been a main concern of trading firms since September 11, 2001. Solutions in this area range from redundant data centers situated in different geographies and connected to multiple trading venues to virtual trader solutions that offer power traders most of the functionality of a trading floor in a remote location.

The financial services industry is one of the most demanding in terms of IT requirements. The industry is experiencing an architectural shift towards Services-Oriented Architecture (SOA), Web services, and virtualization of IT resources. SOA takes advantage of the increase in network speed to enable dynamic binding and virtualization of software components. This allows the creation of new applications without losing the investment in existing systems and infrastructure.
The concept has the potential to revolutionize the way integration is done, enabling significant reductions in the complexity and cost of such integration (gigaspacesdownloadMerrilLynchGigaSpacesWP. pdf).

Another trend is the consolidation of servers into data center server farms, while trader desks have only KVM extensions and ultra-thin clients (for example, SunRay and HP blade solutions). High-speed Metro Area Networks enable market data to be multicast between different locations, which makes the virtualization of the trading floor possible.

High-Level Architecture

Figure 1 depicts the high-level architecture of a trading environment. The ticker plant and the algorithmic trading engines are located in the high performance trading cluster in the firm's data center or at the exchange. The human traders are located in the end-user applications area.

Functionally there are two application components in the enterprise trading environment: publishers and subscribers. The messaging bus provides the communication path between publishers and subscribers.

There are two types of traffic specific to a trading environment:

Market Data: Carries pricing information for financial instruments, news, and other value-added information such as analytics. It is unidirectional and very latency sensitive, typically delivered over UDP multicast. It is measured in updates/sec. and in Mbps. Market data flows from one or more external feeds, coming from market data providers such as stock exchanges, data aggregators, and ECNs. Each provider has its own market data format. The data is received by feed handlers, specialized applications that normalize and clean the data and then send it to data consumers, such as pricing engines, algorithmic trading applications, or human traders. Sell-side firms also send the market data to their clients, buy-side firms such as mutual funds, hedge funds, and other asset managers. Some buy-side firms may opt to receive direct feeds from exchanges, which reduces latency.
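The feed handler role described above (normalize each provider's format, clean the data, pass it on) can be sketched roughly as follows. This is only an illustration under stated assumptions: the two input formats, the field names (`sym`, `px_cents`, `qty`), the `Tick` type, and the cleaning rule are all hypothetical and do not correspond to any real exchange or vendor feed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tick:
    """Normalized update in a common in-house format (hypothetical)."""
    symbol: str
    price: float
    size: int
    source: str


def _clean(tick: Tick) -> Optional[Tick]:
    # Cleaning rule (assumption): discard non-positive prices and sizes.
    return tick if tick.price > 0 and tick.size > 0 else None


def normalize_feed_a(raw: str) -> Optional[Tick]:
    """Hypothetical feed A: pipe-delimited text such as 'csco|27.15|300'."""
    parts = raw.strip().split("|")
    if len(parts) != 3:
        return None  # malformed record, dropped
    try:
        return _clean(Tick(parts[0].upper(), float(parts[1]), int(parts[2]), "A"))
    except ValueError:
        return None


def normalize_feed_b(raw: dict) -> Optional[Tick]:
    """Hypothetical feed B: a mapping with the price in integer cents."""
    try:
        tick = Tick(str(raw["sym"]).upper(), raw["px_cents"] / 100.0, int(raw["qty"]), "B")
    except (KeyError, TypeError, ValueError):
        return None
    return _clean(tick)


ticks = [
    normalize_feed_a("csco|27.15|300"),
    normalize_feed_a("garbage"),
    normalize_feed_b({"sym": "CSCO", "px_cents": 2716, "qty": 100}),
]
clean = [t for t in ticks if t is not None]
```

Downstream consumers such as pricing engines then only need to understand the single normalized type, whichever provider the update came from.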
Figure 1 Trading Architecture for a Buy Side/Sell Side Firm

There is no industry standard for market data formats. Each exchange has its proprietary format. Financial content providers such as Reuters and Bloomberg aggregate different sources of market data, normalize it, and add news or analytics. Examples of consolidated feeds are RDF (Reuters Data Feed), RWF (Reuters Wire Format), and Bloomberg Professional Services Data.

To deliver lower latency market data, both vendors have released real-time market data feeds that are less processed and carry fewer analytics:

Bloomberg B-Pipe: With B-Pipe, Bloomberg de-couples its market data feed from its distribution platform, because a Bloomberg terminal is not required to receive B-Pipe. Wombat and Reuters Feed Handlers have announced support for B-Pipe.

A firm may decide to receive feeds directly from an exchange to reduce latency. The gain in transmission speed can be between 150 milliseconds and 500 milliseconds. These feeds are more complex and more expensive, and the firm has to build and maintain its own ticker plant (financetechfeaturedshowArticle. jhtmlarticleID60404306).

Trading Orders: This type of traffic carries the actual trades. It is bi-directional and very latency sensitive. It is measured in messages/sec. and Mbps. The orders originate from a buy-side or sell-side firm and are sent to trading venues such as an exchange or ECN for execution. The most common format for order transport is FIX (Financial Information eXchange, fixprotocol. org). The applications that handle FIX messages are called FIX engines, and they interface with order management systems (OMS).

An optimization to FIX is called FAST (FIX Adapted for Streaming), which uses a compression schema to reduce message length and, in effect, reduce latency. FAST is targeted more at the delivery of market data and has the potential to become a standard. FAST can also be used as a compression schema for proprietary market data formats.
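To make the FIX format mentioned above more concrete, the following sketch builds and parses a message in the classic tag=value encoding, including the BodyLength (tag 9) and CheckSum (tag 10) fields. It is a simplified illustration, not a FIX engine: the required session header fields (sender, target, sequence number, sending time) are omitted, repeating groups are not handled, and the helper names `build_fix` and `parse_fix` are made up for this example.

```python
from typing import Dict, List, Tuple

SOH = "\x01"  # FIX field delimiter


def build_fix(msg_type: str, fields: List[Tuple[int, str]],
              begin_string: str = "FIX.4.2") -> bytes:
    """Assemble a tag=value message with BodyLength (9) and CheckSum (10)."""
    body = f"35={msg_type}{SOH}" + "".join(f"{tag}={val}{SOH}" for tag, val in fields)
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    without_checksum = (head + body).encode("ascii")
    # CheckSum: sum of all preceding bytes modulo 256, as three digits.
    checksum = sum(without_checksum) % 256
    return without_checksum + f"10={checksum:03d}{SOH}".encode("ascii")


def parse_fix(raw: bytes) -> Dict[int, str]:
    """Split into {tag: value} and verify the checksum (no repeating groups)."""
    pairs = [f.split("=", 1) for f in raw.decode("ascii").split(SOH) if f]
    msg = {int(tag): val for tag, val in pairs}
    trailer_start = raw.rindex(b"\x0110=") + 1  # first byte of the "10=" field
    if sum(raw[:trailer_start]) % 256 != int(msg[10]):
        raise ValueError("bad FIX checksum")
    return msg


# A pared-down New Order Single (35=D): buy 100 CSCO, limit 27.15.
order = build_fix("D", [
    (11, "ORD1"),   # ClOrdID
    (55, "CSCO"),   # Symbol
    (54, "1"),      # Side: 1 = buy
    (38, "100"),    # OrderQty
    (40, "2"),      # OrdType: 2 = limit
    (44, "27.15"),  # Price
])
parsed = parse_fix(order)
```

Every field repeats its tag number and a delimiter as text, which is the redundancy a compression schema such as FAST is designed to remove.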
To reduce latency, firms may opt to establish Direct Market Access (DMA).

DMA is the automated process of routing a securities order directly to an execution venue, thereby avoiding the intervention of a third party (towergroupresearchcontentglossary. jsppage1ampglossaryId383). DMA requires a direct connection to the execution venue.

The messaging bus is middleware software from vendors such as Tibco, 29West, Reuters RMDS, or an open source platform such as AMQP. The messaging bus uses a reliable mechanism to deliver messages. The transport can be done over TCP/IP (TibcoEMS, 29West, RMDS, and AMQP) or UDP/multicast (TibcoRV, 29West, and RMDS). One important concept in message distribution is the "topic stream," which is a subset of market data defined by criteria such as ticker symbol, industry, or a certain basket of financial instruments. Subscribers join topic groups mapped to one or more sub-topics in order to receive only the relevant information. In the past, all traders received all market data. At current traffic volumes, this would be sub-optimal.

The network plays a critical role in the trading environment. Market data is carried to the trading floor, where the human traders are located, via a Campus or Metro Area high-speed network. High availability and low latency, as well as high throughput, are the most important metrics.

The high performance trading environment has most of its components in the data center server farm. To minimize latency, the algorithmic trading engines need to be located close to the feed handlers, FIX engines, and order management systems. An alternate deployment model has the algorithmic trading systems located at an exchange or at a service provider with fast connectivity to multiple exchanges.

Deployment Models

There are two deployment models for a high performance trading platform.
Firms may choose to have a mix of the two:

Data center of the trading firm (Figure 2): This is the traditional model, where a full-fledged trading platform is developed and maintained by the firm with communication links to all the trading venues. Latency varies with the speed of the links and the number of hops between the firm and the venues.

Figure 2 Traditional Deployment Model

Co-location at the trading venue (exchanges, financial service providers) (Figure 3): The trading firm deploys its automated trading platform as close as possible to the execution venues to minimize latency.

Figure 3 Hosted Deployment Model

Services-Oriented Trading Architecture

We propose a services-oriented framework for building the next-generation trading architecture. This approach provides a conceptual framework and an implementation path based on modularization and minimization of inter-dependencies.

This framework provides firms with a methodology to:

- Evaluate their current state in terms of services
- Prioritize services based on their value to the business
- Evolve the trading platform to the desired state using a modular approach

The high performance trading architecture relies on the following services, as defined by the services architecture framework represented in Figure 4.

Figure 4 Service Architecture Framework for High Performance Trading

Ultra-Low Latency Messaging Service

This service is provided by the messaging bus, which is a software system that solves the problem of connecting many-to-many applications. The system consists of:

- A set of pre-defined message schemas
- A set of common command messages
- A shared application infrastructure for sending the messages to recipients. The shared infrastructure can be based on a message broker or on a publish/subscribe model.

The key requirements for the next-generation messaging bus are (source 29West):

- Lowest possible latency (for example, less than 100 microseconds)
- Stability under heavy load (for example, more than 1.4 million messages per second)
- Control and flexibility (rate control and configurable transports)

There are efforts in the industry to standardize the messaging bus. Advanced Message Queuing Protocol (AMQP) is an example of an open standard championed by J.P. Morgan Chase and supported by a group of vendors such as Cisco, Envoy Technologies, Red Hat, TWIST Process Innovations, Iona, 29West, and iMatix. Two of the main goals are to provide a simpler path to interoperability for applications written on different platforms, and modularity so that the middleware can be easily evolved.

In very general terms, an AMQP server is analogous to an e-mail server, with each exchange acting as a message transfer agent and each message queue as a mailbox. The bindings define the routing tables in each transfer agent. Publishers send messages to individual transfer agents, which then route the messages into mailboxes. Consumers take messages from mailboxes, which creates a powerful and flexible model that is simple (source: amqp. orgtikiwikitiki-index. phppageOpenApproachWhyAMQP).

Latency Monitoring Service

The main requirements for this service are:

- Sub-millisecond granularity of measurements
- Near-real-time visibility without adding latency to the trading traffic
- Ability to differentiate application processing latency from network transit latency
- Ability to handle high message rates
- A programmatic interface for trading applications to receive latency data, so that algorithmic trading engines can adapt to changing conditions
- Correlation of network events with application events for troubleshooting purposes

Latency can be defined as the time interval between when a trade order is sent and when the same order is acknowledged and acted upon by the receiving party.

Addressing latency is a complex problem that requires a holistic approach, one that identifies all sources of latency and applies different technologies at different layers of the system.
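The definition of latency above (order sent to order acknowledged), together with the requirement to separate application processing time from network transit time, can be sketched as a small tracker. This is a minimal illustration under assumptions: the `LatencyTracker` class and its method names are hypothetical, and the application processing time is assumed to be reported by the receiving side or derived from network taps.

```python
import time
from statistics import median
from typing import Dict, List


class LatencyTracker:
    """Pairs each order with its acknowledgment and records the interval."""

    def __init__(self) -> None:
        self._sent_ns: Dict[str, int] = {}
        self.totals_us: List[float] = []

    def on_order_sent(self, order_id: str, t_ns: int) -> None:
        self._sent_ns[order_id] = t_ns

    def on_ack(self, order_id: str, t_ns: int,
               app_processing_ns: int = 0) -> Dict[str, float]:
        """Return total latency split into application and network parts."""
        total_ns = t_ns - self._sent_ns.pop(order_id)
        sample = {
            "total_us": total_ns / 1000,
            "application_us": app_processing_ns / 1000,
            # Whatever is not application processing is attributed to the network.
            "network_us": (total_ns - app_processing_ns) / 1000,
        }
        self.totals_us.append(sample["total_us"])
        return sample

    def median_us(self) -> float:
        return median(self.totals_us)


tracker = LatencyTracker()
t0 = time.monotonic_ns()
tracker.on_order_sent("ORD1", t0)
# Assumed for illustration: the ack arrives 850 us later, of which the
# receiving application reports 300 us of processing time.
sample = tracker.on_ack("ORD1", t0 + 850_000, app_processing_ns=300_000)
```

A trading engine could poll such a tracker per venue or per route and prefer the path with the lowest recent median, which is the kind of programmatic interface the requirements list asks for.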
Figure 5 depicts the variety of components that can introduce latency at each layer of the OSI stack. It also maps each source of latency to a possible solution and a monitoring solution. This layered approach can give firms a more structured way of attacking the latency issue, whereby each component can be thought of as a service and treated consistently across the firm.

Maintaining an accurate measure of the dynamic state of this time interval across alternative routes and destinations can be of great assistance in tactical trading decisions. The ability to identify the exact location of delays, whether in the customer's edge network, the central processing hub, or the transaction application level, significantly determines the ability of service providers to meet their trading service-level agreements (SLAs). For buy-side and sell-side firms, as well as for market data syndicators, the quick identification and removal of bottlenecks translates directly into enhanced trade opportunities and revenue.

Figure 5 Latency Management Architecture

Cisco Low-Latency Monitoring Tools

Traditional network monitoring tools operate with a granularity of minutes or seconds. Next-generation trading platforms, especially those supporting algorithmic trading, require latencies of less than 5 ms and extremely low levels of packet loss. On a Gigabit LAN, a 100 ms microburst can cause 10,000 transactions to be lost or excessively delayed.

Cisco offers its customers a choice of tools to measure latency in a trading environment:

- Bandwidth Quality Manager (BQM) (OEM from Corvil)
- Cisco AON-based Financial Services Latency Monitoring Solution (FSMS)

Bandwidth Quality Manager

Bandwidth Quality Manager (BQM) 4.0 is a next-generation network application performance management product that enables customers to monitor and provision their network for controlled levels of latency and loss performance.
While BQM is not exclusively targeted at trading networks, its microsecond visibility combined with intelligent bandwidth provisioning features makes it ideal for these demanding environments.

Cisco BQM 4.0 implements a broad set of patented and patent-pending traffic measurement and network analysis technologies that give the user unprecedented visibility and understanding of how to optimize the network for maximum application performance.

Cisco BQM is now supported on the Cisco Application Deployment Engine (ADE) product family. The Cisco ADE product family is the platform of choice for Cisco network management applications.

BQM Benefits

Cisco BQM micro-visibility is the ability to detect, measure, and analyze traffic events that induce latency, jitter, and loss, down to microsecond levels of granularity with per-packet resolution. This enables Cisco BQM to detect and determine the impact of traffic events on network latency, jitter, and loss. Critical for trading environments is that BQM can support one-way latency, loss, and jitter measurements for both TCP and UDP (multicast) traffic. This means it reports seamlessly for both trading traffic and market data feeds.

BQM allows the user to specify a comprehensive set of thresholds (against microburst activity, latency, loss, jitter, utilization, etc.) on all interfaces. BQM then operates a background rolling packet capture. Whenever a threshold violation or other potential performance degradation event occurs, it triggers Cisco BQM to store the packet capture to disk for later analysis. This allows the user to examine in full detail both the application traffic that was affected by the performance degradation ("the victims") and the traffic that caused the performance degradation ("the culprits"). This can reduce the time spent diagnosing and resolving network performance issues.

BQM is also able to provide detailed bandwidth and quality of service (QoS) policy provisioning recommendations, which the user can directly apply to achieve the desired network performance.
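The general idea of a rolling capture that is only persisted when a threshold is crossed can be sketched with a fixed-size ring buffer. This is not BQM's implementation; the `RollingCapture` class, its parameters, and the in-memory `saved` list (standing in for "store to disk") are assumptions made purely for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[bytes, float]  # (packet bytes, measured latency in microseconds)


class RollingCapture:
    """Keeps the most recent packets; snapshots them on a threshold violation."""

    def __init__(self, max_packets: int, latency_threshold_us: float) -> None:
        self.ring: Deque[Sample] = deque(maxlen=max_packets)  # oldest entries fall off
        self.latency_threshold_us = latency_threshold_us
        self.saved: List[List[Sample]] = []  # stand-in for captures written to disk

    def observe(self, packet: bytes, latency_us: float) -> bool:
        """Record one packet; return True if it triggered a saved capture."""
        self.ring.append((packet, latency_us))
        if latency_us > self.latency_threshold_us:
            # The snapshot holds the offending packet plus the traffic just before it.
            self.saved.append(list(self.ring))
            return True
        return False


capture = RollingCapture(max_packets=3, latency_threshold_us=500.0)
for i, latency in enumerate([100.0, 120.0, 110.0, 130.0, 900.0]):
    capture.observe(f"pkt{i}".encode(), latency)
```

Because the snapshot includes the packets that preceded the violation, an analyst can look for the traffic that caused the degradation as well as the traffic that suffered from it.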
BQM Measurements Illustrated

To understand the difference between some of the more conventional measurement techniques and the visibility provided by BQM, we can look at some comparison graphs. In the first set of graphs (Figure 6 and Figure 7), we see the difference between the latency measured by BQM's Passive Network Quality Monitor (PNQM) and the latency measured by injecting ping packets every 1 second into the traffic stream.

In Figure 6, we see the latency reported by 1-second ICMP ping packets for real network traffic (it is divided by 2 to give an estimate of the one-way delay). It shows the delay comfortably below about 5 ms for almost all of the time.

Figure 6 Latency Reported by 1-Second ICMP Ping Packets for Real Network Traffic

In Figure 7, we see the latency reported by PNQM for the same traffic at the same time. By measuring the one-way latency of the actual application packets, we get a radically different picture. Here the latency is seen to hover around 20 ms, with occasional bursts far higher. The explanation is that because ping sends packets only every second, it completely misses most of the application traffic latency. In fact, ping results typically indicate only round-trip propagation delay rather than realistic application latency across the network.

Figure 7 Latency Reported by PNQM for Real Network Traffic

In the second example (Figure 8), we see the difference in reported link load or saturation levels between a 5-minute average view and a 5 ms microburst view (BQM can report on microbursts down to about 10-100 nanosecond accuracy). The green line shows the average utilization at 5-minute averages to be low, perhaps up to 5 Mbits/s. The dark blue plot shows the 5 ms microburst activity reaching between 75 Mbits/s and 100 Mbits/s, effectively the LAN speed. BQM shows this level of granularity for all applications, and it also gives clear provisioning rules that enable the user to control or neutralize these microbursts.
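The gap between a 5-minute average and a 5 ms view can be reproduced with simple arithmetic. The sketch below uses synthetic traffic chosen for illustration (one 62,500-byte burst in every twentieth 5 ms interval, otherwise idle); these numbers are assumptions, not measurements from the figures.

```python
def to_mbps(byte_count: int, window_ms: int) -> float:
    """Average rate over one window in Mbit/s: bits / (milliseconds * 1000)."""
    return byte_count * 8 / (window_ms * 1000)


# Synthetic traffic (assumption): 5 minutes split into 5 ms bins; every
# 20th bin carries a 62,500-byte burst and all other bins are idle.
BIN_MS = 5
bins = [62_500 if i % 20 == 0 else 0 for i in range(60_000)]

five_minute_avg = to_mbps(sum(bins), BIN_MS * len(bins))
peak_5ms = max(to_mbps(b, BIN_MS) for b in bins)

print(f"5-minute average: {five_minute_avg:.1f} Mbit/s")
print(f"peak 5 ms window: {peak_5ms:.1f} Mbit/s")
```

With these inputs the same traffic averages 5 Mbit/s over five minutes yet reaches 100 Mbit/s within individual 5 ms windows, which is why coarse averages can hide link saturation.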
Figure 8 Difference in Reported Link Load Between a 5-Minute Average View and a 5 ms Microburst View

BQM Deployment in the Trading Network

Figure 9 shows a typical BQM deployment in a trading network.

Figure 9 Typical BQM Deployment in a Trading Network

BQM can then be used to answer these types of questions:

- Are any of my Gigabit LAN core links saturated for more than X milliseconds? Is this causing loss? Which links would benefit most from an upgrade to Etherchannel or 10 Gigabit speeds?
- What application traffic is causing the saturation of my 1 Gigabit links?
- Is any of the market data experiencing end-to-end loss?
- How much additional latency does the failover data center experience? Is this link sized correctly to deal with microbursts?
- Are my traders getting low latency updates from the market data distribution layer? Are they seeing any delays greater than X milliseconds?

Being able to answer these questions saves both time and money in running the trading network.

BQM is an essential tool for gaining visibility in market data and trading environments. It provides granular end-to-end latency measurements in complex infrastructures that experience high-volume data movement. Effectively detecting microbursts at sub-millisecond levels and receiving expert analysis of a particular event is invaluable to trading floor architects. Smart bandwidth provisioning recommendations, such as sizing and what-if analysis, provide greater agility to respond to volatile market conditions. As the explosion of algorithmic trading and increasing message rates continues, BQM, combined with its QoS tool, provides the capability to implement QoS policies that can protect critical trading applications.

Cisco Financial Services Latency Monitoring Solution

Cisco and Trading Metrics have collaborated on latency monitoring solutions for FIX order flow and market data monitoring.
Cisco AON technology is the foundation for a new class of network-embedded products and solutions that help merge intelligent networks with application infrastructure, based on either service-oriented or traditional architectures. Trading Metrics is a leading provider of analytics software for network infrastructure and application latency monitoring purposes (tradingmetrics).

The Cisco AON Financial Services Latency Monitoring Solution (FSMS) correlates two kinds of events at the point of observation:

- Network events correlated directly with coincident application message handling
- Trade order flow and matching market update events

Using time stamps asserted at the point of capture in the network, real-time analysis of these correlated data streams permits precise identification of bottlenecks across the infrastructure while a trade is being executed or market data is being distributed. By monitoring and measuring latency early in the cycle, financial companies can make better decisions about which network service, and which intermediary, market, or counterparty, to select for routing trade orders. Likewise, this knowledge allows more streamlined access to updated market data (stock quotes, economic news, etc.), which is an important basis for initiating, withdrawing from, or pursuing market opportunities.

The components of the solution are:

- AON hardware in three form factors:
  - AON Network Module for Cisco 2600/2800/3700/3800 routers
  - AON Blade for the Cisco Catalyst 6500 series
  - AON 8340 Appliance
- Trading Metrics M&A 2.0 software, which provides the monitoring and alerting application, displays latency graphs on a dashboard, and issues alerts when deviations occur (tradingmetricsTMbrochure. pdf).

Figure 10 AON-Based FIX Latency Monitoring

Cisco IP SLA

Cisco IP SLA is an embedded network management tool in Cisco IOS that allows routers and switches to generate synthetic traffic streams which can be measured for latency, jitter, packet loss, and other criteria (ciscogoipsla ).
Two key concepts are the source of the generated traffic and the target. Both of these run an IP SLA "responder," which has the responsibility to timestamp the control traffic before it is sourced and returned by the target (for a round-trip measurement). Various traffic types can be sourced within IP SLA; they are aimed at different metrics and target different services and applications. The UDP jitter operation is used to measure one-way and round-trip delay and to report variations. Because the traffic is time stamped on both the sending and target devices using the responder capability, the round-trip delay is characterized as the delta between the two timestamps.

A new feature was introduced in IOS 12.3(14)T, IP SLA Sub Millisecond Reporting, which allows timestamps to be displayed with a resolution in microseconds, thus providing a level of granularity not previously available. This new feature has made IP SLA relevant to campus networks, where network latency is typically in the range of 300-800 microseconds, and where the ability to detect trends and spikes (brief trends) based on microsecond granularity counters is a requirement for customers engaged in time-sensitive electronic trading environments.

As a result, IP SLA is now being considered by significant numbers of financial organizations, as they are all faced with requirements to:

- Report baseline latency to their users
- Trend baseline latency over time
- Respond quickly to traffic bursts that cause changes in the reported latency

Sub-millisecond reporting is necessary for these customers, since many campus networks and backbones are currently delivering under a second of latency across several switch hops. Electronic trading environments have generally worked to eliminate or minimize all areas of device and network latency to deliver rapid order fulfillment to the business.
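The arithmetic behind a timestamp-based delay and jitter measurement of the kind described above can be sketched as follows. This is an illustration only, not how Cisco IP SLA is implemented: it assumes the sender and target clocks are synchronized, uses made-up probe timestamps in microseconds, and defines jitter simply as the absolute change in delay between consecutive probes.

```python
from typing import List


def one_way_delays_us(send_us: List[int], recv_us: List[int]) -> List[int]:
    """Delay of each probe: receive timestamp minus send timestamp."""
    return [r - s for s, r in zip(send_us, recv_us)]


def jitter_us(delays: List[int]) -> List[int]:
    """Variation between consecutive probe delays."""
    return [abs(b - a) for a, b in zip(delays, delays[1:])]


def spikes(delays: List[int], threshold_us: int) -> List[int]:
    """Indexes of probes whose delay exceeds the threshold."""
    return [i for i, d in enumerate(delays) if d > threshold_us]


# Hypothetical probes sent every 20 ms (timestamps in microseconds).
sent = [0, 20_000, 40_000, 60_000]
received = [450, 20_520, 41_300, 60_480]

delays = one_way_delays_us(sent, received)   # [450, 520, 1300, 480]
variation = jitter_us(delays)                # [70, 780, 820]
slow = spikes(delays, threshold_us=800)      # [2]
```

With only millisecond resolution, three of these four probes would be indistinguishable, which illustrates why microsecond timestamps matter on a campus network.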
Reporting that network response times are "just under one millisecond" is no longer sufficient; the granularity of latency measurements reported across a network segment or backbone needs to be closer to 300-800 microseconds, with a degree of resolution of 100 microseconds.

IP SLA recently added support for IP multicast test streams, which can measure market data latency.

A typical network topology is shown in Figure 11 with the IP SLA shadow routers, sources, and responders.

Figure 11 IP SLA Deployment

Computing Services

Computing services cover a wide range of technologies with the goal of eliminating memory and CPU bottlenecks created by the processing of network packets. Trading applications consume high volumes of market data, and the servers have to dedicate resources to processing network traffic instead of application processing.

Transport processing: At high speeds, network packet processing can consume a significant amount of server CPU cycles and memory. An established rule of thumb states that 1 Gbps of network bandwidth requires 1 GHz of processor capacity (source: Intel white paper on I/O acceleration, inteltechnologyioacceleration306517.pdf).

Intermediate buffer copying: In a conventional network stack implementation, data needs to be copied by the CPU between network buffers and application buffers. This overhead is worsened by the fact that memory speeds have not kept up with increases in CPU speeds. For example, processors like the Intel Xeon are approaching 4 GHz, while RAM chips hover around 400 MHz (for DDR 3200 memory) (source: Intel, inteltechnologyioacceleration306517.pdf).

Context switching: Every time an individual packet needs to be processed, the CPU performs a context switch from application context to network traffic context. This overhead could be reduced if the switch occurred only when the whole application buffer is complete.

Figure 12 Sources of Overhead in Data Center Servers

TCP Offload Engine (TOE): Offloads transport processor cycles to the NIC.
Moves TCP/IP protocol stack buffer copies from system memory to NIC memory.

Remote Direct Memory Access (RDMA): Enables a network adapter to transfer data directly from application to application without involving the operating system. Eliminates intermediate and application buffer copies (memory bandwidth consumption).

Kernel bypass: Direct user-level access to hardware. Dramatically reduces application context switches.

Figure 13 RDMA and Kernel Bypass

InfiniBand is a point-to-point (switched fabric) bidirectional serial communication link that implements RDMA, among other features. Cisco offers an InfiniBand switch, the Server Fabric Switch (SFS): ciscoapplicationpdfenusguestnetsolns500c643cdccont0900aecd804c35cb. pdf.

Figure 14 Typical SFS Deployment

Trading applications benefit from the reduction in latency and latency variability, as shown by a test performed with the Cisco SFS and Wombat Feed Handlers by Stac Research.

Application Virtualization Service

De-coupling applications from the underlying operating system and server hardware enables them to run as network services. One application can be run in parallel on multiple servers, or multiple applications can be run on the same server, as the best resource allocation dictates. This decoupling enables better load balancing and disaster recovery for business continuance strategies. The process of re-allocating computing resources to an application is dynamic. Using an application virtualization system such as Data Synapse's GridServer, applications can migrate, using pre-configured policies, to under-utilized servers in a supply-matches-demand process (wwwworkworldsupp2005ndc1022105virtual. htmlpage2).
There are many business advantages for financial firms that adopt application virtualization:

- Faster time to market for new products and services
- Faster integration of firms following merger and acquisition activity
- Increased application availability
- Better workload distribution, which creates more "head room" for processing spikes in trading volume
- Operational efficiency and control
- Reduction in IT complexity

Currently, application virtualization is not used in the trading front office. One use case is risk modeling, such as Monte Carlo simulations. As the technology evolves, it is conceivable that some trading platforms will adopt it.

Data Virtualization Service

To effectively share resources across distributed enterprise applications, firms must be able to leverage data across multiple sources in real time while ensuring data integrity. With solutions from data virtualization software vendors such as GemStone or Tangosol (now Oracle), financial firms can access heterogeneous sources of data as a single system image that enables connectivity between business processes and unrestrained application access to distributed caching. The net result is that all users have instant access to these data resources across a distributed network (gridtoday030210101061.html).

This is called a data grid and is the first step in the process of creating what Gartner calls Extreme Transaction Processing (XTP) (gartnerDisplayDocumentrefgsearchampid500947). Technologies such as data and application virtualization enable financial firms to perform real-time complex analytics, event-driven applications, and dynamic resource allocation.

One example of data virtualization in action is a global order book application. An order book is the repository of active orders that is published by the exchange or other market makers. A global order book aggregates orders from around the world from markets that operate independently.
The biggest challenge for the application is scalability over WAN connectivity because it has to maintain state. Today's data grids are localized in data centers connected by Metro Area Networks (MAN). This is mainly because the applications themselves have limits: they have been developed without the WAN in mind.

Figure 15 GemStone GemFire Distributed Caching

Before data virtualization, applications used database clustering for failover and scalability. This solution is limited by the performance of the underlying database. Failover is slower because the data is committed to disk. With data grids, the data that is part of the active state is cached in memory, which drastically reduces the failover time. Scaling the data grid means just adding more distributed resources, providing more deterministic performance compared to a database cluster.

Multicast Service

Market data delivery is a perfect example of an application that needs to deliver the same data stream to hundreds and potentially thousands of end users. Market data services have been implemented with TCP or UDP broadcast as the network layer, but those implementations have limited scalability. Using TCP requires a separate socket and sliding window on the server for each recipient. UDP broadcast requires a separate copy of the stream for each destination subnet. Both of these methods exhaust the resources of the servers and the network. The server side must transmit and service each of the streams individually, which requires larger and larger server farms. On the network side, the required bandwidth for the application increases in a linear fashion. For example, to send a 1 Mbps stream to 1000 recipients using TCP requires 1 Gbps of bandwidth.

IP multicast is the only way to scale market data delivery. To deliver a 1 Mbps stream to 1000 recipients, IP multicast would require 1 Mbps. The stream can be delivered by as few as two servers: one primary and one backup for redundancy.
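The scaling argument above reduces to a small calculation of how much bandwidth the sending side needs under each transport. The function below is a sketch of that arithmetic only; its name, parameters, and the idealized model (no protocol overhead, no retransmissions) are assumptions for illustration.

```python
def sender_bandwidth_mbps(stream_mbps: float, recipients: int,
                          transport: str, subnets: int = 1) -> float:
    """Bandwidth the sender must emit to reach every recipient (idealized)."""
    if transport == "tcp":
        # One unicast copy of the stream per recipient.
        return stream_mbps * recipients
    if transport == "udp-broadcast":
        # One copy of the stream per destination subnet.
        return stream_mbps * subnets
    if transport == "ip-multicast":
        # A single copy; the network replicates it where paths diverge.
        return stream_mbps
    raise ValueError(f"unknown transport: {transport}")


tcp_load = sender_bandwidth_mbps(1, 1000, "tcp")                 # 1000 Mbps = 1 Gbps
multicast_load = sender_bandwidth_mbps(1, 1000, "ip-multicast")  # 1 Mbps
```

The TCP figure grows linearly with the number of recipients, while the multicast figure stays constant no matter how many receivers join.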
There are two main phases of market data delivery to the end user. In the first phase, the data stream must be brought from the exchange into the brokerage's network. Typically the feeds are terminated in a data center on the customer premises. The feeds are then processed by a feed handler, which may normalize the data stream into a common format and then republish it to the application messaging servers in the data center.

The second phase involves injecting the data stream into the application messaging bus, which feeds the core infrastructure of the trading applications. The large brokerage houses have thousands of applications that use the market data streams for various purposes, such as live trades, long-term trending, arbitrage, etc. Many of these applications listen to the feeds and then republish their own analytical and derivative information. For example, a brokerage may compare the price of CSCO to the option prices of CSCO on another exchange and then publish ratings that a different application may monitor to determine how far they are out of synchronization.

Figure 16: Market Data Distribution Players

The delivery of these data streams is typically over a reliable multicast transport protocol, traditionally Tibco Rendezvous. Tibco RV operates in a publish and subscribe environment. Each financial instrument is given a subject name, such as CSCO.last. Each application server can request the individual instruments of interest by their subject name and receive just that subset of the information. This is called subject-based forwarding or filtering. Subject-based filtering is patented by Tibco.

A distinction should be made between the first and second phases of market data delivery. The delivery of market data from the exchange to the brokerage is mostly a one-to-many application. The only exception to the unidirectional nature of market data may be retransmission requests, which are usually sent using unicast.
The trading applications, however, are definitely many-to-many applications and may interact with the exchanges to place orders.

Figure 17: Market Data Architecture

Design Issues

Number of Groups/Channels to Use

Many application developers consider using thousands of multicast groups to give them the ability to divide up products or instruments into small buckets. Normally these applications send many small messages as part of their information bus. Usually several messages are sent in each packet, which is received by many users. Sending fewer messages in each packet increases the overhead necessary for each message. In the extreme case, sending only one message in each packet quickly reaches the point of diminishing returns: there is more overhead sent than actual data. Application developers must find a reasonable compromise between the number of groups and breaking up their products into logical buckets.

Consider, for example, the Nasdaq Quotation Dissemination Service (NQDS). The instruments are broken up alphabetically. This approach allows for straightforward network/application management, but does not necessarily allow for optimized bandwidth utilization for most users. A user of NQDS who is interested in technology stocks, and would like to subscribe to just CSCO and INTL, would have to pull down all the data for the first two groups of NQDS. Understanding how users pull down the data and then organizing it into appropriate logical groups optimizes the bandwidth for each user.

In many market data applications, optimizing the data organization would be of limited value. Typically customers bring all the data into a few machines and filter the instruments there. Using more groups is just more overhead for the stack and does not help the customers conserve bandwidth. Another approach might be to keep the groups down to a minimum level and use UDP port numbers to further differentiate if necessary.
The other extreme would be to use just one multicast group for the entire application and then have the end user filter the data. In some situations this may be sufficient.

Intermittent Sources

A common issue with market data applications is servers that send data to a multicast group and then go silent for more than 3.5 minutes. These intermittent sources may cause thrashing of state on the network and can introduce packet loss during the window of time when soft state and then hardware shortcuts are being created.

PIM-Bidir or PIM-SSM

The first and best solution for intermittent sources is to use PIM-Bidir for many-to-many applications and PIM-SSM for one-to-many applications. Neither of these optimizations of the PIM protocol relies on data-driven events to create forwarding state. That means that as long as the receivers are subscribed to the streams, the network has the forwarding state created in the hardware switching path. Intermittent sources are not an issue with PIM-Bidir and PIM-SSM.

Null Packets

In PIM-SM environments a common method to make sure forwarding state is created is to send a burst of null packets to the multicast group before the actual data stream. The application must efficiently ignore these null data packets so that they do not affect performance. The sources only need to send the burst of packets if they have been silent for more than 3 minutes. A good practice is to send the burst if the source has been silent for more than a minute. Many financial firms send out an initial burst of traffic in the morning, after which all well-behaved sources do not have problems.

Periodic Keepalives or Heartbeats

An alternative approach for PIM-SM environments is for sources to send periodic heartbeat messages to the multicast groups. This is a similar approach to the null packets, but the packets can be sent on a regular timer so that the forwarding state never expires.
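As a rough illustration of the keepalive approach, the sketch below shows only the timer logic a source might use. The class name `KeepaliveTimer`, the injected clock, and the 60-second default are assumptions chosen to match the one-minute practice mentioned above; they do not come from any particular messaging product. A real source would send a small packet to the multicast group whenever `due()` returns true.

```python
import time


class KeepaliveTimer:
    """Tracks how long a multicast source has been silent (hypothetical helper)."""

    def __init__(self, max_silence_s: float = 60.0, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_sent = clock()

    def record_send(self) -> None:
        """Call after every packet, whether market data or keepalive."""
        self._last_sent = self._clock()

    def due(self) -> bool:
        """True once the source has been silent for max_silence_s or longer."""
        return self._clock() - self._last_sent >= self.max_silence_s


# Demonstration with a fake clock so no real waiting is needed.
fake_now = [0.0]
timer = KeepaliveTimer(max_silence_s=60.0, clock=lambda: fake_now[0])
fake_now[0] = 45.0
print(timer.due())   # still within the silence budget
fake_now[0] = 61.0
print(timer.due())   # silent too long: send a keepalive, then record_send()
```

Keeping the interval at about a minute leaves a wide margin below the roughly three-minute silence after which, according to the text, forwarding state starts to be lost.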
(S, G) Expiry Timer

Finally, Cisco has made a modification to the operation of the (S, G) expiry timer in IOS. There is now a CLI knob that makes the (S, G) expiry timer configurable, so the state for an (S, G) can stay alive for hours without any traffic being sent. This approach should be considered a workaround until PIM-Bidir or PIM-SSM is deployed or the application is fixed.

RTCP Feedback

A common issue with real-time voice and video applications that use RTP is the use of RTCP feedback traffic. Unnecessary use of the feedback option can create excessive multicast state in the network. If the RTCP traffic is not required by the application, it should be avoided.

Fast Producers and Slow Consumers

Today many servers providing market data are attached at Gigabit speeds, while the receivers are attached at different speeds, usually 100 Mbps. This creates the potential for receivers to drop packets and request retransmissions, which creates more traffic that the slowest consumers cannot handle, continuing the vicious circle. The solution needs to be some type of access control in the application that limits the amount of data that one host can request. QoS and other network functions can mitigate the problem, but ultimately the subscriptions need to be managed in the application.

Tibco Heartbeats

TibcoRV has had the ability to use IP multicast for the heartbeat between the TICs for many years. However, some brokerage houses are still using very old versions of TibcoRV that rely on UDP broadcast for resiliency. This limitation is often cited as a reason to maintain a Layer 2 infrastructure between TICs located in different data centers. These older versions of TibcoRV should be phased out in favor of the versions that support IP multicast.

Multicast Forwarding Options

PIM Sparse Mode

The standard IP multicast forwarding protocol used today for market data delivery is PIM Sparse Mode. It is supported on all Cisco routers and switches and is well understood.
PIM-SM can be used in all the network components from the exchange, FSP, and brokerage. There are, however, some long-standing issues and unnecessary complexity associated with a PIM-SM deployment that could be avoided by using PIM-Bidir and PIM-SSM. These are covered in the next sections. The main components of the PIM-SM implementation are: PIM Sparse Mode v2; Shared Tree (spt-threshold infinity), a design option in the brokerage or in the exchange.

The LMAX Architecture

Over the last few years we keep hearing that "the free lunch is over" [1] - we can't expect increases in individual CPU speed. So to write fast code we need to explicitly use multiple processors with concurrent software. This is not good news - writing concurrent code is very hard.
Locks and semaphores are hard to reason about and hard to test - meaning we are spending more time worrying about satisfying the computer than we are solving the domain problem. Various concurrency models, such as Actors and Software Transactional Memory, aim to make this easier - but there is still a burden that introduces bugs and complexity.

So I was fascinated to hear about a talk at QCon London in March last year from LMAX. LMAX is a new retail financial trading platform. Its business innovation is that it is a retail platform - allowing anyone to trade in a range of financial derivative products [2]. A trading platform like this needs very low latency - trades have to be processed quickly because the market is moving rapidly. A retail platform adds complexity because it has to do this for lots of people. So the result is more users, with lots of trades, all of which need to be processed quickly. [3]

Given the shift to multi-core thinking, this kind of demanding performance would naturally suggest an explicitly concurrent programming model - and indeed this was their starting point. But the thing that got people's attention at QCon was that this wasn't where they ended up. In fact they ended up by doing all the business logic for their platform: all trades, from all customers, in all markets - on a single thread. A thread that will process 6 million orders per second using commodity hardware. [4]

Processing lots of transactions with low latency and none of the complexities of concurrent code - how can I resist digging into that? Fortunately another difference LMAX has to other financial companies is that they are quite happy to talk about their technological decisions. So now that LMAX has been in production for a while, it's time to explore their fascinating design.
Overall Structure

Figure 1: LMAX's architecture in three blobs

At a top level, the architecture has three parts: the business logic processor [5], the input disruptor, and the output disruptors.

As its name implies, the business logic processor handles all the business logic in the application. As I indicated above, it does this as a single-threaded Java program which reacts to method calls and produces output events. Consequently it's a simple Java program that doesn't require any platform frameworks to run other than the JVM itself, which allows it to be easily run in test environments.

Although the Business Logic Processor can run in a simple environment for testing, there is rather more involved choreography to get it to run in a production setting. Input messages need to be taken off a network gateway and unmarshaled, replicated and journaled. Output messages need to be marshaled for the network. These tasks are handled by the input and output disruptors. Unlike the Business Logic Processor, these are concurrent components, since they involve IO operations which are both slow and independent. They were designed and built especially for LMAX, but they (like the overall architecture) are applicable elsewhere.

Business Logic Processor

Keeping it all in memory

The Business Logic Processor takes input messages sequentially (in the form of a method invocation), runs business logic on each one, and emits output events. It operates entirely in-memory; there is no database or other persistent store. Keeping all data in-memory has two important benefits. Firstly it's fast - there's no database to provide slow IO to access, nor is there any transactional behavior to execute since all the processing is done sequentially. The second advantage is that it simplifies programming - there's no object/relational mapping to do. All the code can be written using Java's object model without having to make any compromises for the mapping to a database.
Using an in-memory structure has an important consequence - what happens if everything crashes? Even the most resilient systems are vulnerable to someone pulling the power. The heart of dealing with this is Event Sourcing - which means that the current state of the Business Logic Processor is entirely derivable by processing the input events. As long as the input event stream is kept in a durable store (which is one of the jobs of the input disruptor) you can always recreate the current state of the business logic engine by replaying the events.

A good way to understand this is to think of a version control system. Version control systems are a sequence of commits; at any time you can build a working copy by applying those commits. VCSs are more complicated than the Business Logic Processor because they must support branching, while the Business Logic Processor is a simple sequence.

So, in theory, you can always rebuild the state of the Business Logic Processor by reprocessing all the events. In practice, however, that would take too long should you need to spin one up. So, just as with version control systems, LMAX can make snapshots of the Business Logic Processor state and restore from the snapshots. They take a snapshot every night during periods of low activity. Restarting the Business Logic Processor is fast: a full restart - including restarting the JVM, loading a recent snapshot, and replaying a day's worth of journals - takes less than a minute.

Snapshots make starting up a new Business Logic Processor faster, but not quickly enough should a Business Logic Processor crash at 2pm. As a result LMAX keeps multiple Business Logic Processors running all the time [6]. Each input event is processed by multiple processors, but all but one processor has its output ignored. Should the live processor fail, the system switches to another one. This ability to handle fail-over is another benefit of using Event Sourcing.
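The snapshot-plus-replay idea, and the way a replica fed the same events ends up in the same state, can be sketched in a few lines. This is a toy model under stated assumptions and not LMAX's code: the `Processor` class, the account-balance state, and the `restore` helper are all hypothetical names invented for the illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Toy stand-in for a Business Logic Processor; state is a dict of balances."""
    balances: dict = field(default_factory=dict)
    applied: int = 0  # how many journal events have been applied so far

    def apply(self, event: tuple) -> None:
        account, delta = event
        self.balances[account] = self.balances.get(account, 0) + delta
        self.applied += 1

    def snapshot(self) -> "Processor":
        return copy.deepcopy(self)


def restore(snapshot: Processor, journal: list) -> Processor:
    """Rebuild current state from a snapshot plus the events journaled after it."""
    processor = copy.deepcopy(snapshot)
    for event in journal[processor.applied:]:
        processor.apply(event)
    return processor


journal = [("alice", 100), ("bob", 50), ("alice", -30), ("bob", 20)]

live = Processor()
for event in journal[:2]:
    live.apply(event)
snap = live.snapshot()            # stands in for the nightly snapshot
for event in journal[2:]:
    live.apply(event)

replica = Processor()             # a second processor fed the same input events
for event in journal:
    replica.apply(event)

recovered = restore(snap, journal)
print(live.balances, replica.balances, recovered.balances)
```

Because the state depends only on the ordered input events, the live processor, the replica, and the processor rebuilt from the snapshot all agree, which is what makes both restart and fail-over workable.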
By event sourcing into replicas they can switch between processors in a matter of microseconds. As well as taking snapshots every night, they also restart the Business Logic Processors every night. The replication allows them to do this with no downtime, so they continue to process trades 24/7.

For more background on Event Sourcing, see the draft pattern on my site from a few years ago. The article is more focused on handling temporal relationships than on the benefits that LMAX use, but it does explain the core idea.

Event Sourcing is valuable because it allows the processor to run entirely in-memory, but it has another considerable advantage for diagnostics. If some unexpected behavior occurs, the team copies the sequence of events to their development environment and replays them there. This allows them to examine what happened much more easily than is possible in most environments.

This diagnostic capability extends to business diagnostics. There are some business tasks, such as in risk management, that require significant computation that isn't needed for processing orders. An example is getting a list of the top 20 customers by risk profile based on their current trading positions. The team handles this by spinning up a replicate domain model and carrying out the computation there, where it won't interfere with the core order processing. These analysis domain models can have variant data models, keep different data sets in memory, and run on different machines.

Tuning performance

So far I've explained that the key to the speed of the Business Logic Processor is doing everything sequentially, in-memory. Just doing this (and nothing really stupid) allows developers to write code that can process 10K TPS [7]. They then found that concentrating on the simple elements of good code could bring this up into the 100K TPS range.
This just needs well-factored code and small methods - essentially this allows Hotspot to do a better job of optimizing and for CPUs to be more efficient in caching the code as it's running.

It took a bit more cleverness to go up another order of magnitude. There are several things that the LMAX team found helpful to get there. One was to write custom implementations of the Java collections that were designed to be cache-friendly and careful with garbage [8]. An example of this is using primitive Java longs as hashmap keys with a specially written array-backed Map implementation (LongToObjectHashMap). In general they've found that choice of data structures often makes a big difference. Most programmers just grab whatever List they used last time rather than thinking about which implementation is the right one for this context. [9]

Another technique to reach that top level of performance is putting attention into performance testing. I've long noticed that people talk a lot about techniques to improve performance, but the one thing that really makes a difference is to test it. Even good programmers are very good at constructing performance arguments that end up being wrong, so the best programmers prefer profilers and test cases to speculation. [10] The LMAX team has also found that writing tests first is a very effective discipline for performance tests.

Programming Model

This style of processing does introduce some constraints into the way you write and organize the business logic. The first of these is that you have to tease out any interaction with external services. An external service call is going to be slow, and with a single thread will halt the entire order processing machine. As a result you can't make calls to external services within the business logic. Instead you need to finish that interaction with an output event, and wait for another input event to pick it back up again.

I'll use a simple non-LMAX example to illustrate.
Imagine you are making an order for jelly beans by credit card. A simple retailing system would take your order information, use a credit card validation service to check your credit card number, and then confirm your order - all within a single operation. The thread processing your order would block while waiting for the credit card to be checked, but that block wouldn't be very long for the user, and the server can always run another thread on the processor while it's waiting.

In the LMAX architecture, you would split this operation into two. The first operation would capture the order information and finish by outputting an event (credit card validation requested) to the credit card company. The Business Logic Processor would then carry on processing events for other customers until it received a credit-card-validated event in its input event stream. On processing that event it would carry out the confirmation tasks for that order.

Working in this kind of event-driven, asynchronous style is somewhat unusual - although using asynchrony to improve the responsiveness of an application is a familiar technique. It also helps the business process be more resilient, as you have to be more explicit in thinking about the different things that can happen with the remote application.

A second feature of the programming model lies in error handling. The traditional model of sessions and database transactions provides a helpful error handling capability. Should anything go wrong, it's easy to throw away everything that happened so far in the interaction. Session data is transient, and can be discarded, at the cost of some irritation to the user if in the middle of something complicated. If an error occurs on the database side you can roll back the transaction.

LMAX's in-memory structures are persistent across input events, so if there is an error it's important to not leave that memory in an inconsistent state. However there's no automated rollback facility.
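To make the jelly bean example concrete, here is a minimal sketch of the split into two operations. It is not LMAX code: the `OrderProcessor` class, its method names, the event tuples, and the card numbers are all made up for illustration. The point is only that the first handler returns immediately after emitting an output event, and the second handler is triggered later by an ordinary input event.

```python
from collections import deque


class OrderProcessor:
    """Hypothetical single-threaded processor for the jelly bean example."""

    def __init__(self):
        self.pending = {}      # order_id -> item, awaiting card validation
        self.confirmed = []    # ids of confirmed orders
        self.output = deque()  # output events, to be marshaled elsewhere

    def on_order_placed(self, order_id, card_number, item):
        # Operation 1: capture the order, emit a request, and return at once.
        self.pending[order_id] = item
        self.output.append(("card-validation-requested", order_id, card_number))

    def on_card_validated(self, order_id, ok):
        # Operation 2: arrives later as a normal input event.
        item = self.pending.pop(order_id, None)
        if item is None:
            return  # unknown or already handled order: leave state untouched
        if ok:
            self.confirmed.append(order_id)
            self.output.append(("order-confirmed", order_id))
        else:
            self.output.append(("order-rejected", order_id))


blp = OrderProcessor()
blp.on_order_placed(1, "4111-0000", "jelly beans")
blp.on_order_placed(2, "4222-0000", "liquorice")  # handled without waiting on order 1
blp.on_card_validated(2, ok=True)                 # replies may arrive in any order
blp.on_card_validated(1, ok=False)
print(list(blp.output))
```

Nothing in the processor ever blocks on the validation service, so other customers' events keep flowing between the two operations.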
As a consequence the LMAX team puts a lot of attention into ensuring the input events are fully valid before doing any mutation of the in-memory persistent state. They have found that testing is a key tool in flushing out these kinds of problems before going into production.

Input and Output Disruptors

Although the business logic occurs in a single thread, there are a number of tasks to be done before we can invoke a business object method. The original input for processing comes off the wire in the form of a message; this message needs to be unmarshaled into a form convenient for the Business Logic Processor to use. Event Sourcing relies on keeping a durable journal of all the input events, so each input message needs to be journaled onto a durable store. Finally the architecture relies on a cluster of Business Logic Processors, so we have to replicate the input messages across this cluster. Similarly on the output side, the output events need to be marshaled for transmission over the network.

Figure 2: The activities done by the input disruptor (using UML activity diagram notation)

The replicator and journaler involve IO and therefore are relatively slow. After all, the central idea of the Business Logic Processor is that it avoids doing any IO. Also these three tasks are relatively independent: all of them need to be done before the Business Logic Processor works on a message, but they can be done in any order. So unlike with the Business Logic Processor, where each trade changes the market for subsequent trades, there is a natural fit for concurrency.

To handle this concurrency the LMAX team developed a special concurrency component, which they call a Disruptor [11]. The LMAX team have released the source code for the Disruptor with an open source licence.

At a crude level you can think of a Disruptor as a multicast graph of queues where producers put objects on it that are sent to all the consumers for parallel consumption through separate downstream queues.
When you look inside you see that this network of queues is really a single data structure - a ring buffer. Each producer and consumer has a sequence counter to indicate which slot in the buffer it's currently working on. Each producer/consumer writes its own sequence counter but can read the others' sequence counters. This way the producer can read the consumers' counters to ensure the slot it wants to write in is available without any locks on the counters. Similarly a consumer can ensure it only processes messages once another consumer is done with them by watching the counters.

Figure 3: The input disruptor coordinates one producer and four consumers

Output disruptors are similar but they only have two sequential consumers for marshaling and output. [12] Output events are organized into several topics, so that messages can be sent to only the receivers who are interested in them. Each topic has its own disruptor.

The disruptors I've described are used in a style with one producer and multiple consumers, but this isn't a limitation of the design of the disruptor. The disruptor can work with multiple producers too; in this case it still doesn't need locks. [13]

A benefit of the disruptor design is that it makes it easier for consumers to catch up quickly if they run into a problem and fall behind. If the unmarshaler has a problem when processing on slot 15 and returns when the receiver is on slot 31, it can read data from slots 16-30 in one batch to catch up. This batch read of the data from the disruptor makes it easier for lagging consumers to catch up quickly, thus reducing overall latency.

I've described things here with one each of the journaler, replicator, and unmarshaler - this indeed is what LMAX does. But the design would allow multiple of these components to run. If you ran two journalers then one would take the even slots and the other journaler would take the odd slots. This allows further concurrency of these IO operations should this become necessary.
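The sequence-counter mechanics described above can be sketched as follows. This is a single-threaded teaching model, not the Disruptor's actual API: the `RingBuffer` class and its method names are assumptions made for illustration, and the real component relies on memory barriers and cache-line padding that are not modeled here. The sketch does show the three ideas from the text: each party owns its own counter, the producer checks the slowest consumer before reusing a slot, and a lagging consumer catches up in one batch.

```python
class RingBuffer:
    """Illustrative ring buffer with per-consumer sequence counters."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._slots = [None] * size
        self._mask = size - 1   # sequence & mask maps a sequence to a slot
        self.cursor = -1        # last sequence published (written by producer only)
        self.consumers = {}     # name -> last sequence read (each written by its owner)

    def add_consumer(self, name: str) -> None:
        self.consumers[name] = -1

    def try_publish(self, value) -> bool:
        next_seq = self.cursor + 1
        slowest = min(self.consumers.values(), default=self.cursor)
        # Refuse to wrap onto a slot the slowest consumer has not read yet.
        if next_seq - slowest > len(self._slots):
            return False
        self._slots[next_seq & self._mask] = value
        self.cursor = next_seq
        return True

    def poll(self, name: str) -> list:
        """Return everything published since this consumer last read, as one batch."""
        start = self.consumers[name] + 1
        batch = [self._slots[s & self._mask] for s in range(start, self.cursor + 1)]
        self.consumers[name] = self.cursor
        return batch


rb = RingBuffer(8)
rb.add_consumer("journaler")
rb.add_consumer("unmarshaler")
for n in range(6):
    rb.try_publish(f"msg-{n}")
print(rb.poll("journaler"))    # all six messages in one batch
rb.try_publish("msg-6")
print(rb.poll("journaler"))    # only the newest message
print(rb.poll("unmarshaler"))  # the lagging consumer catches up on all seven
```

The sequence numbers keep growing while the mask wraps them onto the fixed set of slots, which is why a power-of-two size is convenient.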
The ring buffers are large: 20 million slots for the input buffer and 4 million slots for each of the output buffers. The sequence counters are 64-bit long integers that increase monotonically even as the ring slots wrap. [14] The buffer is set to a size that's a power of two so the compiler can do an efficient modulus operation to map from the sequence counter number to the slot number. Like the rest of the system, the disruptors are bounced overnight. This bounce is mainly done to wipe memory so that there is less chance of an expensive garbage collection event during trading. (I also think it's a good habit to regularly restart, so that you rehearse how to do it for emergencies.)

The journaler's job is to store all the events in a durable form, so that they can be replayed should anything go wrong. LMAX does not use a database for this, just the file system. They stream the events onto the disk. In modern terms, mechanical disks are horribly slow for random access, but very fast for streaming - hence the tag-line "disk is the new tape". [15]

Earlier on I mentioned that LMAX runs multiple copies of its system in a cluster to support rapid failover. The replicator keeps these nodes in sync. All communication in LMAX uses IP multicasting, so clients don't need to know which IP address is the master node. Only the master node listens directly to input events and runs a replicator. The replicator broadcasts the input events to the slave nodes. Should the master node go down, its lack of heartbeat will be noticed, another node becomes master, starts processing input events, and starts its replicator. Each node has its own input disruptor and thus has its own journal and does its own unmarshaling.

Even with IP multicasting, replication is still needed because IP messages can arrive in a different order on different nodes. The master node provides a deterministic sequence for the rest of the processing.
The unmarshaler turns the event data from the wire into a Java object that can be used to invoke behavior on the Business Logic Processor. Therefore, unlike the other consumers, it needs to modify the data in the ring buffer so it can store this unmarshaled object. The rule here is that consumers are permitted to write to the ring buffer, but each writable field can only have one parallel consumer that's allowed to write to it. This preserves the principle of only having a single writer. [16]

Figure 4: The LMAX architecture with the disruptors expanded

The disruptor is a general purpose component that can be used outside of the LMAX system. Usually financial companies are very secretive about their systems, keeping quiet even about items that aren't germane to their business. Not only has LMAX been open about its overall architecture, they have open-sourced the disruptor code - an act that makes me very happy. Not only will this allow other organizations to make use of the disruptor, it will also allow for more testing of its concurrency properties.

Queues and their lack of mechanical sympathy

The LMAX architecture caught people's attention because it's a very different way of approaching a high performance system to what most people are thinking about. So far I've talked about how it works, but haven't delved too much into why it was developed this way. This tale is interesting in itself, because this architecture didn't just appear. It took a long time of trying more conventional alternatives, and realizing where they were flawed, before the team settled on this one.

Most business systems these days have a core architecture that relies on multiple active sessions coordinated through a transactional database. The LMAX team were familiar with this approach, and confident that it wouldn't work for LMAX. This assessment was founded in the experiences of Betfair - the parent company who set up LMAX. Betfair is a betting site that allows people to bet on sporting events.
It handles very high volumes of traffic with a lot of contention - sports bets tend to burst around particular events. To make this work they have one of the hottest database installations around and have had to do many unnatural acts in order to make it work. Based on this experience they knew how difficult it was to maintain Betfair's performance and were sure that this kind of architecture would not work for the very low latency that a trading site would require. As a result they had to find a different approach.

Their initial approach was to follow what so many are saying these days - that to get high performance you need to use explicit concurrency. For this scenario, this means allowing orders to be processed by multiple threads in parallel. However, as is often the case with concurrency, the difficulty comes because these threads have to communicate with each other. Processing an order changes market conditions and these conditions need to be communicated.

The approach they explored early on was the Actor model and its cousin SEDA. The Actor model relies on independent, active objects with their own thread that communicate with each other via queues. Many people find this kind of concurrency model much easier to deal with than trying to do something based on locking primitives.

The team built a prototype exchange using the actor model and did performance tests on it. What they found was that the processors spent more time managing queues than doing the real logic of the application. Queue access was a bottleneck.

When pushing performance like this, it starts to become important to take account of the way modern hardware is constructed. The phrase Martin Thompson likes to use is "mechanical sympathy". The term comes from race car driving and it reflects the driver having an innate feel for the car, so they are able to feel how to get the best out of it.
Many programmers, and I confess I fall into this camp, don't have much mechanical sympathy for how programming interacts with hardware. What's worse is that many programmers think they have mechanical sympathy, but it's built on notions of how hardware used to work that are now many years out of date.

One of the dominant factors with modern CPUs that affects latency is how the CPU interacts with memory. These days going to main memory is a very slow operation in CPU terms. CPUs have multiple levels of cache, each of which is significantly faster. So to increase speed you want to get your code and data in those caches.

At one level, the actor model helps here. You can think of an actor as its own object that clusters code and data, which is a natural unit for caching. But actors need to communicate, which they do through queues - and the LMAX team observed that it's the queues that interfere with caching.

The explanation runs like this: in order to put some data on a queue, you need to write to that queue. Similarly, to take data off the queue, you need to write to the queue to perform the removal. This is write contention - more than one client may need to write to the same data structure. To deal with the write contention a queue often uses locks. But if a lock is used, that can cause a context switch to the kernel. When this happens the processor involved is likely to lose the data in its caches.

The conclusion they came to was that to get the best caching behavior, you need a design that has only one core writing to any memory location [17]. Multiple readers are fine; processors often use special high-speed links between their caches. But queues fail the one-writer principle.

This analysis led the LMAX team to a couple of conclusions. Firstly it led to the design of the disruptor, which determinedly follows the single-writer constraint.
Secondly it led to the idea of exploring the single-threaded business logic approach, asking the question of how fast a single thread can go if it's freed of concurrency management.

The essence of working on a single thread is to ensure that you have one thread running on one core, the caches warm up, and as much memory access as possible goes to the caches rather than to main memory. This means that both the code and the working set of data need to be as consistently accessed as possible. Also, keeping small objects with code and data together allows them to be swapped between the caches as a unit, simplifying the cache management and again improving performance.

An essential part of the path to the LMAX architecture was the use of performance testing. The consideration and abandonment of an actor-based approach came from building and performance testing a prototype. Similarly, many of the steps in improving the performance of the various components were enabled by performance tests. Mechanical sympathy is very valuable - it helps to form hypotheses about what improvements you can make, and guides you to forward steps rather than backward ones - but in the end it's the testing that gives you the convincing evidence.

Performance testing in this style, however, is not a well-understood topic. Regularly the LMAX team stresses that coming up with meaningful performance tests is often harder than developing the production code. Again mechanical sympathy is important to developing the right tests. Testing a low-level concurrency component is meaningless unless you take into account the caching behavior of the CPU.

One particular lesson is the importance of writing tests against null components to ensure the performance test is fast enough to really measure what real components are doing. Writing fast test code is no easier than writing fast production code, and it's too easy to get false results because the test isn't as fast as the component it's trying to measure.
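The null-component idea can be sketched as follows. This is an assumed, simplified harness, not the LMAX team's test code: `null_component`, `real_component`, `time_per_call` and `harness_report` are hypothetical names, and the arithmetic in `real_component` is only a stand-in for real work. The point is to time the harness driving a do-nothing component first, so you know how much of any later measurement is the harness itself.

```python
import time
from typing import Callable


def null_component(event: int) -> None:
    """Does nothing, so timing it measures only the cost of the harness."""


def real_component(event: int) -> int:
    """Hypothetical stand-in for the component under test."""
    return (event * 31 + 7) % 1_000_003


def time_per_call(component: Callable[[int], object], events: int) -> float:
    """Return the mean wall-clock seconds per call over `events` calls."""
    start = time.perf_counter()
    for event in range(events):
        component(event)
    return (time.perf_counter() - start) / events


def harness_report(events: int = 200_000) -> dict:
    """Time the null component first, then the real one, with the same loop."""
    baseline = time_per_call(null_component, events)
    measured = time_per_call(real_component, events)
    return {
        "harness_seconds_per_call": baseline,
        "component_seconds_per_call": measured,
        # A ratio close to 1 means the harness dominates the measurement.
        "measured_to_baseline_ratio": (
            measured / baseline if baseline > 0 else float("inf")
        ),
    }
```

If the ratio is close to 1, the test is mostly timing its own loop and says little about the component, which is the false-result trap described above.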
Should you use this architecture?

At first glance, this architecture appears to be for a very small niche. After all, the driver that led to it was to be able to run lots of complex transactions with very low latency - most applications don't need to run at 6 million TPS.

But the thing that fascinates me about this application is that they have ended up with a design which removes much of the programming complexity that plagues many software projects. The traditional model of concurrent sessions surrounding a transactional database isn't free of hassles. There's usually a non-trivial effort that goes into the relationship with the database. Object/relational mapping tools can ease much of the pain of dealing with a database, but they don't deal with it all. Most performance tuning of enterprise applications involves futzing around with SQL.

These days, you can get more main memory into your servers than us old guys could get as disk space. More and more applications are quite capable of putting all their working set in main memory - thus eliminating a source of both complexity and sluggishness. Event Sourcing provides a way to solve the durability problem for an in-memory system, and running everything in a single thread solves the concurrency issue. The LMAX experience suggests that as long as you need less than a few million TPS, you'll have enough performance headroom.

There is a considerable overlap here with the growing interest in CQRS. An event-sourced, in-memory processor is a natural choice for the command side of a CQRS system. (Although the LMAX team does not currently use CQRS.)

So what indicates you shouldn't go down this path? This is always a tricky question for little-known techniques like this, since the profession needs more time to explore its boundaries. A starting point, however, is to think of the characteristics that encourage the architecture.
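As a brief aside before turning to those characteristics, the event-sourced, in-memory processor mentioned above can be sketched in a few lines. This is a toy illustration under stated assumptions, not LMAX code: the account domain, the `Deposited` and `Withdrawn` events and the `InMemoryProcessor` class are all invented, and the journal is a plain list standing in for durable storage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Deposited:
    account: str
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    account: str
    amount: int


@dataclass
class InMemoryProcessor:
    """Single-threaded processor: all state lives in memory."""

    balances: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)  # stand-in for a durable log

    def handle(self, event) -> None:
        self.journal.append(event)  # record the input event first
        self._apply(event)          # then update the in-memory state

    def _apply(self, event) -> None:
        current = self.balances.get(event.account, 0)
        if isinstance(event, Deposited):
            self.balances[event.account] = current + event.amount
        elif isinstance(event, Withdrawn) and event.amount <= current:
            self.balances[event.account] = current - event.amount
        # A withdrawal larger than the balance is ignored, deterministically.

    @classmethod
    def replay(cls, journal) -> "InMemoryProcessor":
        """Rebuild the in-memory state by re-processing the recorded events."""
        processor = cls()
        for event in journal:
            processor.handle(event)
        return processor
```

Because processing is deterministic and happens on one thread, replaying the same journal always reproduces the same state, which is what lets the in-memory state be rebuilt after a restart without a transactional database.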
One characteristic is that this is a connected domain where processing one transaction always has the potential to change how following ones are processed. With transactions that are more independent of each other, there's less need to coordinate, so using separate processors running in parallel becomes more attractive.

LMAX concentrates on figuring out the consequences of how events change the world. Many sites are more about taking an existing store of information and rendering various combinations of that information to as many eyeballs as they can find - e.g. think of any media site. Here the architectural challenge often centers on getting your caches right.

Another characteristic of LMAX is that this is a backend system, so it's reasonable to consider how applicable it would be for something acting in an interactive mode. Increasingly web applications are helping us get used to server systems that react to requests, an aspect that does fit in well with this architecture. Where this architecture goes further than most such systems is its absolute use of asynchronous communications, resulting in the changes to the programming model that I outlined earlier.

These changes will take some getting used to for most teams. Most people tend to think of programming in synchronous terms and are not used to dealing with asynchrony. Yet it's long been true that asynchronous communication is an essential tool for responsiveness. It will be interesting to see if the wider use of asynchronous communication in the JavaScript world, with AJAX and node.js, will encourage more people to investigate this style. The LMAX team found that while it took a bit of time to adjust to the asynchronous style, it soon became natural and often easier. In particular, error handling was much easier to deal with under this approach.

The LMAX team certainly feels that the days of the coordinating transactional database are numbered.
The fact that you can write software more easily using this kind of architecture and that it runs more quickly removes much of the justification for the traditional central database.

For my part, I find this a very exciting story. Much of my goal is to concentrate on software that models complex domains. An architecture like this provides good separation of concerns, allowing people to focus on Domain-Driven Design and keeping much of the platform complexity well separated. The close coupling between domain objects and databases has always been an irritation - approaches like this suggest a way out.
