Archive by Author

Ad Network vs Ad Exchange

The video from OpenX gives a good comparison between an ad network and an ad exchange. A few of the points stood out to me as important:

  1. Transparency – the publisher sees who is winning their inventory, while the advertiser can bid at the impression level based on the value of each impression they see.
  2. Reach – I assume an exchange has a much larger audience reach than an individual ad network, unless the ad network has its own unique audience.
  3. Bid war – Given good transparency, advertisers are willing to bid more, which ends up as a higher paycheck for the publisher.
  4. Data becomes more important – If a buyer uses RTB (real-time bidding) to bid on impressions, they need to develop their own data intelligence in order to value each impression and put a price on it.

It is quite tempting for publishers and advertisers to jump into an exchange and see if they can make more money. However, I have some concerns:

  1. Does the exchange handle fraud?
  2. Does it support pricing models other than CPM?
  3. Do rule-based bidders need to rely on the audience segmentation info provided by the exchange?

Design Robust Restful API

This article summarizes some key design decisions for exposing in-house services to the public through a REST API. It covers URL path design based on the “resource oriented model”, versioning, authentication and authorization, and asynchronous call handling.

URL Path Design

Check out this video first:

  • Use resource oriented model - Every resource type should be a noun, normally represented as a collection. HTTP verbs (GET, PUT, POST, DELETE) are used to manipulate the resources.
    • GET: read (cacheable)
    • PUT: modify (caller provides id)
    • POST: create the resource; also used to call a method, RPC style.
    • DELETE: remove
  • Each resource should have only 2 APIs: one for the collection, in plural form, and one for a particular entity.
  • The collection API can take ?search=… to locate the set of entities you want.
  • No verb should be in the URL path.
  • Complex variations should be handled with ?xxxx query parameters. Don’t complicate your URL.
  • Pagination: prefer ?offset=50&limit=200
  • Field extraction: use ?fields=xxx,yyy
  • Formatting: take advantage of a file extension like dogs.json
  • For input parameters, put everything into the URL rather than the HTTP headers, which are reserved for OAuth.
  • Error handling: Use HTTP status codes to indicate the outcome (i.e. 200 = success, 400 = wrong request from the client, 500 = error on the server side). A human-readable error message, together with a hint on how to fix the problem, should be sent back in the HTTP response body.
  • Operations can be synchronous or asynchronous.
  • In a GET operation, by default the container only returns the URL references of its immediate children. An optional parameter “expand” can be used to request the actual representation of all children and descendants.
  • An API should expose only the function semantics and nothing about its implementation details, which allows the implementation to evolve continuously without breaking the client interface. A good API should focus on doing one thing well, rather than multiple things with different purposes. Each API must be self-contained and must not rely on any specific call sequence to work correctly.
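
To make the error-handling guideline more concrete, here is a minimal Python sketch of an error response that pairs an HTTP status code with a human-readable message and a hint. The function name error_response and the JSON field names (status, error, hint) are my own assumptions for illustration, not part of any framework or standard.

```python
import json
from http import HTTPStatus

def error_response(status, message, hint):
    """Return (status code, JSON body) for a failed request."""
    body = {"status": status.value, "error": message, "hint": hint}
    return status.value, json.dumps(body)

# A client sent a bad pagination parameter, so this is a 400, not a 500.
code, body = error_response(
    HTTPStatus.BAD_REQUEST,
    "limit must be a positive integer",
    "Try ?offset=0&limit=25",
)
print(code, body)
```

The caller gets the machine-readable status from the code and the fix from the body, so neither has to be guessed from the other.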

Example:

  1. List all persons. GET /persons
  2. Find a person with a particular id. GET /persons/123
  3. Get partial fields. GET /persons/123?fields=(name,age)
  4. Find a person’s particular friend. GET /persons/123/friends/456
  5. Find all persons named John. GET /persons/search?q=(name,eq,John)
  6. Find all dogs whose master is John. GET /persons/search?q=(name,eq,John)/dogs
  7. Create a person with a server assigned id. POST /persons?name=Dave&age=10
  8. Create a person with a client assigned id. PUT /persons/123?name=Dave&age=10
  9. Ask the person to perform an action. POST /persons/123/action/travel?location=Euro
  10. Remove a person. DELETE /persons/123
  11. Return a page of result. GET /persons/search?q=(name,eq,John)&offset=1&limit=25
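
As a rough sketch of how a server could honor the pagination and field-extraction parameters, the Python below parses ?offset=, ?limit= and ?fields= with the standard library and applies them to an in-memory list. The PERSONS data, the list_persons function and the default limit of 25 are assumptions for illustration, and the sketch uses the plain ?fields=xxx,yyy form rather than the parenthesized form in the examples.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory collection standing in for the /persons resource.
PERSONS = [
    {"id": i, "name": "John" if i % 2 == 0 else "Dave", "age": 20 + i}
    for i in range(10)
]

def list_persons(url):
    """Apply ?offset=, ?limit= and ?fields= from the URL to the collection."""
    query = parse_qs(urlparse(url).query)
    offset = int(query.get("offset", ["0"])[0])
    limit = int(query.get("limit", ["25"])[0])
    page = PERSONS[offset:offset + limit]
    if "fields" in query:
        wanted = query["fields"][0].split(",")
        page = [{k: p[k] for k in wanted if k in p} for p in page]
    return page

result = list_persons("/persons?offset=2&limit=3&fields=name,age")
print(result)
```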

Async call handling

When an operation takes a long time to complete, an asynchronous mode should be used. In a polling approach, a transient transaction object is returned immediately to the caller. The caller can then use GET requests to poll for the result of the operation. We can also use a notification approach. In this case, the caller passes along a callback URI when making the request. The server will invoke the callback URI to POST the result when it is done.
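
The polling approach can be sketched in Python as follows. This is an in-process simulation, not a real HTTP server: the JOBS dictionary stands in for the transient transaction objects, and submit, poll and slow_add are hypothetical names I made up for the example.

```python
import threading
import time
import uuid

# Hypothetical in-memory store of transient transaction objects.
JOBS = {}

def submit(work, *args):
    """Start a long-running operation and return its transaction id immediately."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "result": None}

    def run():
        # Store the result before flipping the status, so a poller that
        # sees "done" always sees the result too.
        JOBS[job_id]["result"] = work(*args)
        JOBS[job_id]["status"] = "done"

    threading.Thread(target=run).start()
    return job_id

def poll(job_id):
    """What a GET on the transaction object would return."""
    return dict(JOBS[job_id])

def slow_add(a, b):
    time.sleep(0.1)  # pretend this takes a long time
    return a + b

job_id = submit(slow_add, 2, 3)
while poll(job_id)["status"] != "done":
    time.sleep(0.02)
final = poll(job_id)
print(final)
```

In the notification approach, the run function would POST the result to the caller's callback URI instead of waiting to be polled.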

Versioning

  • If you follow the minimal API design approach, a newer version is usually about adding parameters to your original API rather than removing parameters.
  • Keep backward compatibility by using the same URL (e.g. http://xyz.com/v1/path/…). On the implementation side, you keep only the implementation that takes the latest version of the API parameters as input. In other words, you are prepared to receive requests of the older version as well as the latest version, but for an older request you substitute default values for the newer parameters that it is missing, and then send the request (with all parameters filled in) to the latest implementation.
  • For an incompatible change, use a different URL endpoint for the new version (e.g. http://xyz.com/v2/path/…). You also keep the corresponding implementations (v1 and v2) behind those endpoints. Depending on whether you decide to keep supporting the older version, you may want to introduce a deprecation process. Unfortunately there is no standard way to indicate in the response that an API will be deprecated. One possible way is to put a flag in the HTTP header of the response to indicate when the API will be deprecated.
  • Most people put versioning info in the URL (e.g. http://xyz.com/v1/path/…). This is debatable, though: others argue that the version shouldn’t be part of the URL, and that you can instead use the content type to specify which version of the content you want.
Reference
  1. http://thereisnorightway.blogspot.com/2011/02/versioning-and-types-in-resthttp-api.html
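
Here is a small Python sketch of the backward-compatible case, where older requests are upgraded by filling in defaults before they reach the single latest implementation. The parameter names (sort, include_inactive) and function names are hypothetical and only serve the illustration.

```python
# Hypothetical defaults for parameters introduced in later versions of the API.
NEW_PARAM_DEFAULTS = {"sort": "id", "include_inactive": False}

def latest_implementation(params):
    """The single implementation; it always expects the full parameter set."""
    return {"handled_with": dict(params)}

def handle_request(params):
    """Accept old or new requests, filling in defaults for missing parameters."""
    full = {**NEW_PARAM_DEFAULTS, **params}  # caller's values win over defaults
    return latest_implementation(full)

old_style = handle_request({"name": "John"})
new_style = handle_request({"name": "John", "sort": "age"})
print(old_style)
print(new_style)
```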

Authentication and Authorization

The authentication call must be the first call made and must precede any other application API calls. As for API security, an app-level key together with the OAuth 2.0 protocol should be used for authentication and authorization.

Modern principles for building large distributed system

CAP Theorem

Consistency, Availability and Partition tolerance – CAP. You can get at most 2 out of 3.

BASE

In large deployments and high throughput scenarios, ACID breaks down because it requires actions to be “all or nothing”, effectively creating bottlenecks while clients wait for transaction handles. If you’ve done any distributed or concurrent computing, you know that contention for shared resources is a primary cause of failure, including problems like deadlock, starvation and race conditions. For the sake of reducing single points of failure (bottlenecks), some distributed systems like Riak implement eventual consistency, where storage operations are accepted immediately and then propagated across the cluster asynchronously. This makes the system nearly always available for writes and reads.

  • Basically Available
  • Soft state
  • Eventual consistency (Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state).

Compared to ACID, a BASE system forfeits C and I in exchange for availability.

Vector Clock for Conflict Resolution

Since we accept eventual consistency for the sake of availability, we need to deal with inconsistency. To do so, Riak tags each datum with a vector clock that internally reveals the datum’s lineage (who modified what version). When there is a conflict (that is, two parallel versions of the same datum), Riak returns both versions so that your application can decide how to resolve it. The idea is to perform conflict resolution at read time rather than at write time. In an SQL database with this kind of conflict, your transaction might fail and roll back, forcing you to resolve it and retry anyway.
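
To show how a vector clock can tell a newer version from a true conflict, here is a minimal Python sketch of the general technique. It is not Riak's actual implementation: the clock is assumed to be a plain dictionary from node name to counter, and the function names are my own.

```python
def descends(a, b):
    """True if clock a has seen everything clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"  # parallel versions: the application resolves them at read time

def increment(clock, node):
    """Return a copy of the clock after `node` performs a write."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

base = increment({}, "x")      # first write, handled by node x
left = increment(base, "y")    # one client updates through node y
right = increment(base, "z")   # another client updates the same base through node z

print(compare(left, base))
print(compare(left, right))
```

The second comparison is the interesting one: neither clock descends from the other, so both versions have to be returned to the reader.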

Load Distribution over Consistent Hashing

Traditional hashing hashes the key and takes it modulo the number of machines to identify which machine stores the key (i.e. hash(k) mod N). This works well until you add or remove cache machines, for then N changes and every object needs to be rehashed with the new N. Until you have reshuffled the cached objects to their new machines, you may experience cache misses. On top of that, the reshuffle process could be very slow if the data is big. Consistent hashing solves this problem. The idea is to create a 64-bit ring in which each point represents either a key or a machine. When a key is hashed, a point on the 64-bit ring is identified. You then walk clockwise to the next machine point and have that machine handle the key. For better resource allocation, you can create more than one point per machine based on its capacity (i.e. virtual nodes) and spread them across the ring for more uniform allocation.

  1. http://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html
  2. http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
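
Below is a minimal Python sketch of a consistent hash ring with virtual nodes, to illustrate the idea described above. The class name HashRing, the use of the first 8 bytes of SHA-256 as the 64-bit ring position, and the choice of 100 virtual nodes per machine are all assumptions made for the example.

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string to a point on a 64-bit ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class HashRing:
    def __init__(self, machines, vnodes=100):
        self.vnodes = vnodes
        self.points = []   # sorted ring positions
        self.owner = {}    # ring position -> machine name
        for machine in machines:
            self.add(machine)

    def add(self, machine):
        """Place `vnodes` points for this machine on the ring."""
        for i in range(self.vnodes):
            point = ring_position(f"{machine}#{i}")
            bisect.insort(self.points, point)
            self.owner[point] = machine

    def lookup(self, key):
        """Walk clockwise from the key's point to the next machine point."""
        index = bisect.bisect_right(self.points, ring_position(key))
        return self.owner[self.points[index % len(self.points)]]

keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["m1", "m2", "m3"])
before = {k: ring.lookup(k) for k in keys}
ring.add("m4")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved, all of them to m4")
```

Only the keys that now fall just before one of the new machine's points change owner. With hash(k) mod N, going from 3 to 4 machines would remap most keys instead.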

Replica Synchronization with Merkle Tree

Robust failure recovery

If a node goes down in your cluster, its replicas will take over for it until it comes back, a feature known as hinted handoff. If it doesn’t come back — as happens all too often on EC2 — you can add a new node and the cluster will rebalance.

Reference

  • http://seancribbs.com/tech/2010/02/06/why-riak-should-power-your-next-rails-app/

How to get search content for millions of keywords per day at low price?

Recently I had a project that requires me to analyze Google and Yahoo organic search content and ad copy over millions of keywords. Before digging into deep semantic analysis or even machine learning, I first need to crawl the HTML content from each search engine in parallel. Initially, I thought this could be achieved easily. Then I ran into roadblocks like search quotas and rate limiting over a period of time from each search engine. To get around these, I looked into the API calls. However, they are very pricey. Google charges $5/1000 queries, while Yahoo BOSS is cheaper but still $0.8/1000 queries. With 1 million queries per day, that is $5,000 a day with Google or $800 a day with Yahoo BOSS. I have no way to afford that!! What should I do?

Can YQL or 80legs help?

I started thinking about virtually increasing the daily search quota by querying from different machines. However, if I need to rent boxes from Amazon, it could cost me hundreds a month to meet my need. Plus, Google may figure that out and block my boxes, and then I need to find other boxes. It is doable but quite labor intensive to stay below the radar. Then I thought about leveraging YQL, as Yahoo has quite a lot of machines powering it. However, I found out that Google and even Yahoo have put up a robots.txt to cut YQL out. I faced the same problem when trying out 80legs.

Can IronWorker help?

I did more research and found IronWorker, where I only pay for the processing time rather than renting a bunch of physical boxes. The price is reasonable and their cloud is big. But they give no guarantee on how many boxes they will provision to run your job. If they allocate only a few boxes, I may get them banned quickly. On the other hand, to use IronWorker I need to create a crawl script and upload it to my account. That is not bad, but I need to re-engineer my code to leverage it, because the crawl becomes a batch process rather than a real-time request and response model.

Proxy farm is here to save?

I moved on and looked at how others get around this. They use proxies to hide their identities. I first tried public proxies. (As a poor engineer, I always try to find the free solution first ^v^). My experience with public proxies was bad: most of them are blocked, slow and unreliable. Then I tried paying a little money for shared proxies. Since they are shared, you will be blocked if someone else uses the proxy and leaves a footprint for Google. From what I heard, Google is attacking the private proxy providers by banning at the subnet level rather than by IP. So shared proxies still did not meet my reliability requirement. Then I started shopping for private proxies, looking for providers that actually have proxies across many subnets, to reduce my chance of being blocked completely. Some providers look good, but they charge around $4 per proxy per month. If I use 5 threads to run my search queries, I may need 30 proxies on standby (NOTE: I adopt a 1:6 thread-to-proxy ratio to stay under the radar). That is not too expensive, but I may scale to 100 proxies, which would end up costing me a few hundred dollars a month. Eventually, I came across a great and cheap service called myprivateproxy.net. It is around $2 per private proxy per month, it has great feedback from users, and the speed is extremely fast.

I am running a reliability test on it right now. I need to run it for a period of time to confirm whether it is good or not. If it works well, I will have my company use this service as well. I will post my results in the next post. Stay tuned!!

PS. If you are interested in MPP, you can use the promotion code I got: “warrior”. This code saves you 15% on recurring charges. On top of the 10% extra proxies as a discount, the deal is really great!!

Extract info from Apache Log

This article shows how to extract information from the Apache access log. Since Apache is the most common web server, the Linux shell commands for this should be quite useful.

Extract unique IP from Apache access log

  • Locate the Apache access log file.
  • Then print one line of it to see the format.
  • Find the IP address in that line and count which field it is. (The first field is $1.)
  • Count how many unique IPs there are.
  • Once you get the IP list, you can download the CSV from MaxMind and look up each IP’s geo location. I have written a program to process this IP list and figure out how much US traffic I am getting.
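
The original shell commands for these steps are not shown here, so as a stand-in this is a small Python sketch of the same idea: take the first whitespace-separated field of each line as the client IP and count the unique values. The sample log lines and the function name count_ips are made up for illustration, and a real run would read the lines from your access log file instead.

```python
from collections import Counter

# Made-up sample lines in Apache "combined" log format.
SAMPLE_LOG = """\
10.0.0.1 - - [10/Oct/2012:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
10.0.0.2 - - [10/Oct/2012:13:55:40 -0700] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.1 - - [10/Oct/2012:13:56:01 -0700] "GET /b HTTP/1.1" 404 128 "-" "Googlebot/2.1"
"""

def count_ips(lines):
    """Count requests per client IP, assuming the IP is the first field."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts

ip_counts = count_ips(SAMPLE_LOG.splitlines())
unique_ips = len(ip_counts)
print(unique_ips, "unique IPs:", dict(ip_counts))
```

If your log format puts the client IP in a different position, change the field index accordingly.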

Identify bot traffics from Apache access log

Generate the hourly traffic report

Figure out who refers us the most traffic

Function Currying in Scala – Part 4

When I first read about function currying, I had no idea what the use of it was. Why would someone want to break an argument list into multiple () with one argument each? Luckily, I finally came across an article that provides a great explanation of it. Given what you have learnt so far from my series, you should have a good foundation to understand this article. I will try to walk through its example with you here and hope you will grasp it as well.

Review the example

You can see that some syntax rules are applied here. Let me go through them one by one.

  1. The testRule1() method has return type Unit, so the ‘=’ sign is not necessary. Since there is more than one statement in the function body, we still need the { }.
  2. parserRule must be an instance variable from class Parser.
  3. The return value of parserRule is an instance of one of the case classes “Success” or “NoSuccess”, which extend the ParseResult abstract class. The returned result is used in pattern matching, like regular function chaining.
  4. The “apply” method name can be omitted, and the author wants to factor out the routine for Success as a separate function.

The author creates a method tryMatch and factors the assertion routine out as below:

There are some interesting points I want to bring up:

  1. ParseResult has [T] next to it. I haven’t talked about that in my series. It is the Scala way of writing generics.
  2. The 2nd parameter is a function type that allows you to pass the function to handle the success case.

How to use the new function above to make your test cleaner and easier

According to the pattern matching rules, the value in the ParseResult will be assigned to r if case Success is matched. That r will then be the “result” of the function literal passed in. So far so good?

Function Currying

OK, I finally get to the point where I can talk about function currying. The example in this blog says that if there are multiple statements in the success case routine, it will look ugly if you try to encapsulate all of them inside the function literal like below:

So, the author suggested using function currying, as it transforms a method taking multiple parameters into a chain of methods taking a single parameter each. Here is what he can achieve after the function currying:

However, I don’t really feel the power of function currying if it simply converts () to {}. If the author formatted the previous code a bit, he could achieve something like below. To me, it is about the same in terms of clarity.

I believe function currying should be more powerful than this example shows. I picked this example because I like the author’s refactoring process. I have googled more and noticed that this example may be a good one. However, to be honest, I am still not quite clear about the syntax itself. So, I googled more and found this statement in this article: “… Sometimes it makes sense to let people apply some arguments to your function now and others later”. It sounds like a curried function allows you to apply some of the parameters to the function now and the rest later.

Too bad that I still cannot figure out a real-life example where we need to use a curried function. If you know one, please comment on this post.

Power of Scala – Part 3

Enough comparison between Scala and Java. The first two articles in the series just help you get familiar with Scala syntax, but do not show you the actual power of Scala. In fact, I would have no motivation to pick up Scala if it didn’t give me power big enough to sacrifice my time with my family. From here onward, I will write down the key features in Scala that differentiate it from Java.

Pattern Matching

Extend the power of switch/cases

  1. Support more than simple primitive type switch/cases
  2. Support case class
  3. Type match and assignment
Simple switch/cases

//primitive
def checkPrime(number: Int) = number match {
  case 1 => true
  case 2 => true

  case _ => false
}

// you can condense cases in one line and return other data type
def toYesOrNo(choice: Int): String = choice match {
  case 1 | 2 | 3 => "yes"
  case 0 => "no"
  case _ => "error"
}

//you can compare different data types
def f(x: Any): String = x match {
  case i: Int => "integer: " + i
  case _: Double => "a double"
  case s: String => "I want to say " + s
}

//just check type. If match, assign x with obj and assign it with type Color. Then return it and assign to cast variable.
var obj = performOperation()
var cast: Color = obj match {
  case x: Color => x
  case _ => null
}

//with this addon, you can make it looks very similar to formula definition:
def fact(n: Int): Int = n match {
  case 0 => 1
  case n => n * fact(n - 1)
}


Example of case class

case class Number(value:Int)

def checkPrime(n:Number) = n match {
case Number(1) => true
case Number(2) => true
case Number(3) => true
case Number(5) => true
case Number(7) => true
case Number(_) => false
}

checkPrime(Number(12))

  1. The first statement is the key to the whole thing: defining a new case class “Number” with a single property.
  2. Case classes are a special type of class in Scala which can be matched directly by pattern matching.
  3. You can also create a new instance of a case class without the new operator
  4. For each case, we’re actually creating a new instance of Number, each with a different value.
  5. When Scala sees that the instances are the same type during comparison, it introspects the two instances and compares the property values.
  6. Case class supports inheritance.

How exception handling uses this?

Scala does not have checked exceptions, and it does not force you to catch exceptions either. Basically, it is up to you to catch them. To catch various exceptions, you don’t need multiple catch blocks in Scala. See the example below: it uses just one catch block with switch/cases to do the job.

import java.sql._
import java.net._

// Scala declares values with val and infers the type (no Java-style "Connection conn = ...")
val conn = DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "")
try {
  val stmt = conn.prepareStatement("INSERT INTO urls (url) VALUES (?)")
  stmt.setObject(1, new URL("http://www.codecommit.com"))

  stmt.executeUpdate()
  stmt.close()
} catch {
  case e: SQLException => println("Database error")
  case e: MalformedURLException => println("Bad URL")

  // catch-all for anything else
  case e: Throwable => {
    println("Some other exception type:")
    e.printStackTrace()
  }
} finally {
  conn.close()
}

Class in Scala – Part 2

Common knowledge in this space

When I first dealt with classes, the main concept I learnt was that a class encapsulates both related properties and behaviors. Then I started learning about constructors, access scope, mutability, type constraints, inheritance and so on. I would expect Scala to address those OO concepts as well, or to provide a better abstraction for them.

Define a regular class

First, let me take the example used by a blog as a starting point.

Regular class definition

//default primary constructor
class Person {
  private var name = "Daniel Spiewak" // private field that could be changed
  val ssn = 1234567890    // public constant field
  var age = 0             // public field

  // public method
  def firstName() = splitName()(0)   // public method

  // import with method scope and round() from math can only be used in this method.
  // visible to subclass and enclosing class
  protected def guessAge() = {
    import Math._
    round(random * 20)
  }

  //private method that can be accessed by object of the same class
  private def splitName() = name.split(" ")  

  //access-restricted to both the enclosing class and the enclosing package
  private[mypackage] def myMethod = "test" 

  //access-restricted to this instance but no others.
  private[this] def myMethod2 = "test2"
}


Rules:

  1. The primary constructor for a class is coded in-line in the class definition. Parameters of the primary constructor turn into fields that are initialized with the construction parameters.
  2. The primary constructor can be made private by adding the access modifier private before the parameter list.
  3. Class parameters:
    • val to make them immutable instance value whereas
    • var to make them mutable.
  4. Class parameters can be preceded by an access modifier such as private or protected. By default, it is public on both class and method.
    • protected by default limits access to only subclasses, unlike Java which also allows access to other classes in the same package.
    • modifier[package] notation in Scala gives you more control of visibility.
    • Scala has far fewer method modifiers than Java does, primarily because it doesn’t need so many. For example, Scala supports the final modifier, but it doesn’t support abstract, native or synchronized as method modifiers.
  5. Getter and setter methods are auto-generated, but in a different form from Java. To generate setXxx and getXxx as in a Java Bean, you need to use the annotation @BeanProperty.
    • If the field is private, the getter and setter are private. (NOTE: other objects of the same class can access the private field, i.e. it is class-private.)
    • If the field is a val, only a getter is generated.
    • If you don’t want any getter or setter, declare the field as private[this]. (NOTE: other objects of the same class cannot access it, i.e. it is object-private.)
  6. A class can have a companion “object” (i.e. a singleton) with the same name, but both must be in the same source file. They can access each other’s private features.
  7. Scala doesn’t force you to declare the return type for your methods.  Once again the type inference mechanism can come into play and the return type will be inferred.  The exception to this is if the method can return at different points in the execution flow (so if it has an explicit return statement).  In this case, Scala forces you to declare the return type to ensure unambiguous behavior. This “returnless” form becomes extremely important when dealing with anonymous methods.
  8. Access scope difference between Java and Scala

An example demonstrating all modifiers in Scala

In Scala

abstract class Person {
  private var age = 0

  def firstName():String
  final def lastName() = "Spiewak"

  def incrementAge() = {
    synchronized {
      age += 1
    }
  }

  @native
  def hardDriveName():String
}

In Java

public abstract class Person {
    private int age = 0;

    public abstract String firstName();

    public final String lastName() {
        return "Spiewak";
    }

    public synchronized void incrementAge() {
        age += 1;
    }

    public native String hardDriveName();
}

Rules

  1. abstract is used at the class level but not at the method level. A method definition with no body is considered abstract.
  2. final can be used at the method level. For fields, immutability is expressed with val rather than final.
  3. A native method needs an annotation to specify it.
  4. There is no synchronized method modifier in Scala, but you can use a block-level synchronized to achieve the same thing. However, we should consider moving to Scala’s actor-based concurrency model instead.

Singleton

Scala doesn’t support static members or methods, but you can use “object” to get the same capability. Basically, you can see an object as a singleton that you never instantiate yourself.

Companion object

    class MyString(val jString:String) {
      private var extraData = ""
      override def toString = jString+extraData
    }
    object MyString {
      def apply(base:String, extras:String) = {
        val s = new MyString(base)
        s.extraData = extras
        s
      }
      def apply(base:String) = new MyString(base)
    }
    println(MyString("hello"," world"))
    println(MyString("hello"))

Function in Scala – Part 1

Function Definition

def methodName(paramName: paramType, …): returnType = {//function body}

Rules:

  1. single statement doesn’t need {}
  2. semicolon after statement is optional
  3. the return type is optional, as the Scala compiler infers it from the last expression (no need for the return keyword)
  4. the return type and the “=” can be omitted when the type is Unit (i.e. void in Java)
  5. a recursive function must specify the return type.
  6. A function with no parameters can be declared without parentheses, in which case it must be called with no parentheses.
  7. Vararg parameters are declared by appending an asterisk to the parameter type
  8. You can specify default arguments
Example

//this function will return true if character 'l' is passed in
def lls(p: Char): Boolean = { p == 'l' }

//return void but >1 statement, so braces are needed
def box(s: String) {
  val border = "-" * s.length + "--\n"
  println(border + "|" + s + "|\n" + border)
}

//vararg parameters
def sum(args: Int*) = {
  var result = 0
  for (arg <- args) result += arg
  result
}

//call it:
val s = sum(1, 4, 9, 16, 25)
//1 to 5 is a range, not an Int as expected for a single argument. Use _* to pass it as an argument sequence
val s2 = sum(1 to 5: _*)

//bounded vararg input - T is a subclass of Number, input is a varying number of T
def sum[T <: Number](as: T*): Double = as.foldLeft(0d)(_ + _.doubleValue)

//default arguments
def decorate(str: String, left: String = "[", right: String = "]") = left + str + right

Function Call

Now you should have a good idea of how to define a function in Scala. Let’s look at some interesting areas related to anonymous functions and compiler syntax sugar.

Let’s start by defining a function:

def lls(p: Char): Boolean = { p == 'l' }

In StringOps, there is a count function that takes the function type specified above:
def count (p: (Char) ⇒ Boolean): Int

StringOps is the Scala way to extend the capability of the Java String.
"Hello".count(lls) // outputs 2

Functions with zero or one argument can be called without the dot and ( ). If there is a single argument, you can use {} instead of ( ).
"Hello" count lls  // outputs 2

Syntax Sugar

This kind of syntax sugar is quite confusing for a newbie like me. I believe it was created for a reason, and I notice that it is used quite extensively in the area of DSLs. Hopefully a standard syntax style will develop on top of this and make life easier. Until then, I need to try out all possible combinations and see how far Scala can go.

Here I demonstrate that the rules are correct and that more than one space can be added as long as it doesn’t confuse the compiler.

Function Literal

(x:Int, y:Int) => //function body

A function literal is used like an anonymous function in Java.
"Hello" count { p: Char => p == 'l' } // outputs 2

A function literal is shorthand for an object of class FunctionN (where N is the number of input parameters):
new Function1[Char, Boolean] {
  def apply(p: Char): Boolean = { p == 'l' }
}

You can bind a function literal to a variable:
val add = (a: Int, b: Int) => a + b
add(1, 2) // Result is 3

Function literals are useful for passing as arguments to higher-order functions. They’re also useful for defining one-liners or helper functions nested within other functions.
"Hello" count { p => p == 'l' } // outputs 2 (type inference knows p is a character)
"Hello" count { _ == 'l' } // still outputs 2 (if one parameter is passed in and its name has no significance, you can use _)

Linux kernel tuning for high volume of traffic

C500K from Urban Airship

Urban Airship was generous enough to publish how they tuned the Linux kernel to handle over 500K concurrent users. This article is just my notes, filling in some background info to give a better understanding of what they did and why.

To start with, they want to squeeze the system and have it handle as many connections as possible. The tradeoff is that the box should do less: less code, less CPU usage, and less RAM usage. So its main job is to deal with client connections and submit tasks to the queue.

Check memory usage

[root@f3 ~]# sysctl -a | grep mem
net.ipv4.udp_wmem_min = 4096
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_mem = 1549632    2066176    3099264
net.ipv4.tcp_rmem = 4096    87380    4194304
net.ipv4.tcp_wmem = 4096    16384    4194304
net.ipv4.tcp_mem = 196608    262144    393216
net.ipv4.igmp_max_memberships = 20
net.core.optmem_max = 20480
net.core.rmem_default = 129024
net.core.wmem_default = 129024
net.core.rmem_max = 131071
net.core.wmem_max = 131071
vm.lowmem_reserve_ratio = 256    256    32
vm.overcommit_memory = 0

Check system max in Internet connection

[root@f3 ~]# sysctl -a | grep file-max
fs.file-max = 1587124

//how many open file descriptors are currently being used.
[root@f3 ~]# more /proc/sys/fs/file-nr
1020    0    1587124
- 1020: total allocated file descriptor
- 0: total free allocated file descriptor
- 1587124: max number of file descriptor allowed on the system

//how many files are open.
[root@f3 ~]# lsof | wc -l
2497
[root@f3 ~]# lsof -u trffcapp | wc -l

[root@f3 ~]# vi /etc/security/limits.conf

Check max connection per user

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 2048
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 512
virtual memory          (kbytes, -v) unlimited

$ launchctl limit
cpu         unlimited      unlimited
filesize    unlimited      unlimited
data        unlimited      unlimited
stack       8388608        67104768
core        0              unlimited
rss         unlimited      unlimited
memlock     unlimited      unlimited
maxproc     1024           2048
maxfiles    2048           4096

$ sysctl -a | grep files
kern.maxfiles = 32768
kern.maxfilesperproc = 16384
kern.maxfiles: 32768
kern.maxfilesperproc: 16384
kern.num_files: 2049

temporary changes:
sudo launchctl limit maxfiles 16384 32768
ulimit -n 32768   # shell builtin: run it in the shell whose limit you want to change (sudo has no effect on it)
sudo sysctl -w kern.maxfilesperproc=16384
sudo sysctl -w kern.maxfiles=32768

For permanent changes, you need to edit the actual files:
/etc/sysctl.conf (kern.maxfilesperproc=65536)
/etc/launchd.conf

The ulimit level is set low to prevent one poorly written shell script from flooding the kernel with open files.
kern.maxfilesperproc is there to leave a little room in the max files count, so that one process can use most but not all of the kernel’s open file handle space.
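
If you want to check or adjust the per-process open file limit from inside a program rather than from the shell, here is a minimal Python sketch using the standard resource module (Unix only). The helper name raise_soft_limit is my own, and the kernel may still reject values above its own per-process maximum (for example kern.maxfilesperproc on OS X).

```python
import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors;
# the soft value is what `ulimit -n` reports.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard}")

def raise_soft_limit(target):
    """Set the soft limit to target, capped at the current hard limit."""
    _, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard_limit == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard_limit)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard_limit))
    return new_soft
```

An unprivileged process can only move the soft limit up to the hard limit. Raising the hard limit itself needs root, which is why the system-wide settings above still matter.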
